Bash
Remove the string after the first space in the first column only
The following file is tab-delimited. I am trying to remove the number after the space in the first column, i.e. turn
NbLab330C00 64506568
into NbLab330C00.
$ head LAB330_TE_annotation.gff3
##gff-version 3
##date Sun Feb 14 08:41:36 UTC 2021
##Identity: Sequence identity (0-1) between the library sequence and the target region.
##ltr_identity: Sequence identity (0-1) between the left and right LTR regions.
##tsd: target site duplication.
##seqid source sequence_ontology start end score strand phase attributes
NbLab330C00 64506568 EDTA Gypsy_LTR_retrotransposon 2 3364 20798 - . ID=TE_homo_0;Name=TE_00007365_INT;Classification=LTR/Gypsy;Sequence_ontology=SO:0002265;Identity=0.868;Method=homology
NbLab330C00 64506568 EDTA Gypsy_LTR_retrotransposon 3367 4198 3385 - . ID=TE_homo_1;Name=TE_00008087_LTR;Classification=LTR/Gypsy;Sequence_ontology=SO:0002265;Identity=0.865;Method=homology
NbLab330C00 64506568 EDTA hAT_TIR_transposon 4424 4715 1278 + . ID=TE_homo_2;Name=TE_00003964;Classification=DNA/DTA;Sequence_ontology=SO:0002279;Identity=0.834;Method=homology
NbLab330C00 64506568 EDTA hAT_TIR_transposon 5236 5453 835 + . ID=TE_homo_3;Name=TE_00001425;Classification=DNA/DTA;Sequence_ontology=SO:0002279;Identity=0.828;Method=homology
I tried the following awk command, but it also cuts off the last column:
$ awk -v OFS='\t' '{print $1,$3,$4,$5,$7,$8,$9}' LAB330_TE_annotation.gff3 > LAB330_TE_annotation.fix.gff3
$ head LAB330_TE_annotation.fix.gff3
##gff-version
##date Feb 14 08:41:36 2021
##Identity: identity (0-1) between library sequence and
##ltr_identity: identity (0-1) between left and right
##tsd: site duplication.
##seqid sequence_ontology start end strand phase attributes
NbLab330C00 EDTA Gypsy_LTR_retrotransposon 2 20798 - .
NbLab330C00 EDTA Gypsy_LTR_retrotransposon 3367 3385 - .
NbLab330C00 EDTA hAT_TIR_transposon 4424 1278 + .
NbLab330C00 EDTA hAT_TIR_transposon 5236 835 + .
How can I fix the command above?
Thanks in advance.
awk 'BEGIN{ OFS=FS="\t" } !/^#/{ sub(/ [0-9]+$/, "", $1) } 1 ' LAB330_TE_annotation.gff3 > LAB330_TE_annotation.fix.gff3
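This leaves the header lines (those starting with #) unmodified. On every other line it replaces a space followed by at least one digit at the end of the first field with an empty string. Because both FS and OFS are set to a tab, the space stays inside the first field and the remaining columns, including the attributes column, are passed through intact.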
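To try this on a small input first, here is a minimal sketch that runs the same awk program on a two-line sample. The sample data and the function name fix_first_column are made up for illustration; only the awk program itself comes from the answer above.

```shell
#!/bin/sh
# Hypothetical wrapper around the awk program from the answer; reads stdin.
fix_first_column() {
    awk 'BEGIN { OFS = FS = "\t" }            # split and join on tabs only
         !/^#/ { sub(/ [0-9]+$/, "", $1) }    # non-header lines: strip a trailing space plus digits from field 1
         1'                                   # print every line
}

# Made-up sample: one header line and one tab-delimited record
# whose first field contains a space.
sample=$(printf '##gff-version 3\nNbLab330C00 64506568\tEDTA\thAT_TIR_transposon\t4424\t4715\t1278\t+\t.\tID=TE_homo_2;Method=homology\n')

result=$(printf '%s\n' "$sample" | fix_first_column)
printf '%s\n' "$result"
```

The header line should come back unchanged, and the record should start with NbLab330C00 followed directly by a tab and EDTA, with the attributes field still present at the end.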
You can use cut to remove the second column, provided the number really is in its own tab-separated column. The default delimiter is a tab, so you do not need to specify the -d switch.
$ cut -f 1,3- LAB330_TE_annotation.gff3
##gff-version 3
##date Sun Feb 14 08:41:36 UTC 2021
##Identity: Sequence identity (0-1) between the library sequence and the target region.
##ltr_identity: Sequence identity (0-1) between the left and right LTR regions.
##tsd: target site duplication.
##seqid source sequence_ontology start end score strand phase attributes
NbLab330C00 EDTA Gypsy_LTR_retrotransposon 2 3364 20798 - . ID=TE_homo_0;Name=TE_00007365_INT;Classification=LTR/Gypsy;Sequence_ontology=SO:0002265;Identity=0.868;Method=homology
NbLab330C00 EDTA Gypsy_LTR_retrotransposon 3367 4198 3385 - . ID=TE_homo_1;Name=TE_00008087_LTR;Classification=LTR/Gypsy;Sequence_ontology=SO:0002265;Identity=0.865;Method=homology
NbLab330C00 EDTA hAT_TIR_transposon 4424 4715 1278 + . ID=TE_homo_2;Name=TE_00003964;Classification=DNA/DTA;Sequence_ontology=SO:0002279;Identity=0.834;Method=homology
NbLab330C00 EDTA hAT_TIR_transposon 5236 5453 835 + . ID=TE_homo_3;Name=TE_00001425;Classification=DNA/DTA;Sequence_ontology=SO:0002279;Identity=0.828;Method=homology
Alternatively, with GNU cut:
$ cut -f 2 --complement LAB330_TE_annotation.gff3
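A quick way to check how cut treats the columns is to run it on a made-up sample. This sketch assumes the number sits in its own tab-delimited second column; cut counts tab-separated fields only, so a number separated from the sequence ID by a plain space would stay inside field 1. The variable names and sample values are illustrative.

```shell
#!/bin/sh
# Made-up record where the number is a separate, tab-delimited second column (assumption).
record=$(printf 'NbLab330C00\t64506568\tEDTA\thAT_TIR_transposon\t4424')
header='##gff-version 3'

# Keep field 1 and fields 3 onward, i.e. drop field 2.
dropped=$(printf '%s\n' "$record" | cut -f 1,3-)

# Lines that contain no tab at all are passed through unchanged (unless -s is given).
kept_header=$(printf '%s\n' "$header" | cut -f 1,3-)

printf '%s\n%s\n' "$kept_header" "$dropped"
```

If the first field instead contains the space-separated number, as described in the question, the same cut command would drop the source column (EDTA) rather than the number, so it is worth checking the real separator before relying on it.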