Text-Processing
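Compare two text files, extract the matching lines from file2 plus additional lines
I have been fiddling with this for too long and have tried grep, join, and awk, but I cannot get the parameters right. I need help finding the right command.
I have two text files.
cat file1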
@ABC:11:ABC:1:1111:1111:1111
@ABC:22:ABC:1:1111:4444:4444
cat file2
@ABC:11:ABC:1:1111:1111:1111 1:N:0:TCCCGCGC+AGGCGGGG
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
+
#FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
@ABC:11:ABC:1:1111:2222:2222 1:N:0:TCCCGCGC+AGGCGGGG
AGGCGGGGAGGCGGGGAGGCGGGGAGGCGGGGAGGCGGGGAGGCGGGGAGGCGGGGAGGCGGGG
+
#FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
@ABC:22:ABC:1:1111:3333:3333 1:N:0:TCCCGCGC+AGGCGGGG
AGGCGGGGAGGCGGGGAGGCGGGGAGGCGGGGAGGCGGGGAGGCGGGGAGGCGGGGAGGCGGGG
+
#FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
@ABC:22:ABC:1:1111:4444:4444 1:N:0:TCCCGCGC+AGGCGGGG
TTTTTTTTTTTTTGGGGGGGGGGGGGGGGTTTTTTTTTTTTTTTTTTTGGGGGGGGGGGGGGGGGGGG
+
#FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
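I want to do two things:
**Output 1)** Based on file1, extract all lines that contain one of the strings, plus the two additional lines.
**Output 2)** Based on file1, extract all lines that do not contain any of the strings, plus the two additional lines. It should only try to match lines that start with @.
Example output 1):
cat output1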
@ABC:11:ABC:1:1111:1111:1111 1:N:0:TCCCGCGC+AGGCGGGG
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
+
#FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
@ABC:22:ABC:1:1111:4444:4444 1:N:0:TCCCGCGC+AGGCGGGG
TTTTTTTTTTTTTGGGGGGGGGGGGGGGGTTTTTTTTTTTTTTTTTTTGGGGGGGGGGGGGGGGGGGG
+
#FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
Example output 2):
cat output2
@ABC:11:ABC:1:1111:2222:2222 1:N:0:TCCCGCGC+AGGCGGGG
AGGCGGGGAGGCGGGGAGGCGGGGAGGCGGGGAGGCGGGGAGGCGGGGAGGCGGGGAGGCGGGG
+
#FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
@ABC:22:ABC:1:1111:3333:3333 1:N:0:TCCCGCGC+AGGCGGGG
AGGCGGGGAGGCGGGGAGGCGGGGAGGCGGGGAGGCGGGGAGGCGGGGAGGCGGGGAGGCGGGG
+
#FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
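(No Perl, please.)
What you show and ask for is grepping a given set of reads out of a FASTQ file. I strongly recommend not reinventing the wheel and using an existing tool such as seqkit grep instead.
Nevertheless, here is a "bash only" variant:
Four consecutive lines belong to one read. So we can put all four on a single line, separated by tabs, grep for the IDs, and then convert the tabs back to newlines.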
$ cat file2.fq|paste - - - -|grep -f file1.txt|tr "\t" "\n"
Or, for your second output, we simply use grep's invert option (-v):
$ cat file2.fq|paste - - - -|grep -v -f file1.txt|tr "\t" "\n"
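Since the question asks that only the lines starting with @ be compared, here is a minimal awk sketch of the same idea that checks just the header line of each four-line record and produces both outputs in one pass. It assumes strict four-line FASTQ records and one ID per line in the ID list. The file names ids.txt, reads.fq, matched.fq and unmatched.fq are made up for the demo, and the sequences are shortened.

```shell
#!/bin/sh
# Sketch only: run in a scratch directory, it writes four small files there.
# All file names below are illustrative, not taken from the question.

# Two read IDs to look for, one per line.
printf '%s\n' '@ABC:11:ABC:1:1111:1111:1111' '@ABC:22:ABC:1:1111:4444:4444' > ids.txt

# A four-read FASTQ file. The second read's quality line starts with "@"
# on purpose, to show that headers are found by position, not by "@".
cat > reads.fq <<'EOF'
@ABC:11:ABC:1:1111:1111:1111 1:N:0:TCCCGCGC+AGGCGGGG
AAAAAAAA
+
#FFFFFFF
@ABC:11:ABC:1:1111:2222:2222 1:N:0:TCCCGCGC+AGGCGGGG
AGGCGGGG
+
@FFFFFFF
@ABC:22:ABC:1:1111:3333:3333 1:N:0:TCCCGCGC+AGGCGGGG
AGGCGGGG
+
#FFFFFFF
@ABC:22:ABC:1:1111:4444:4444 1:N:0:TCCCGCGC+AGGCGGGG
TTTTGGGG
+
#FFFFFFF
EOF

# While reading the first file, remember every ID.
# While reading the second file, decide on each header line (line 1 of
# every 4-line record) whether the read is wanted, then send all four
# lines of that record to the corresponding output file.
awk '
    NR == FNR    { ids[$1]; next }
    FNR % 4 == 1 { keep = ($1 in ids) }
    { print > (keep ? "matched.fq" : "unmatched.fq") }
' ids.txt reads.fq
```

Because the ID is compared as a whole field rather than as a pattern, an ID that is a prefix of another ID does not cause a false match, and sequence or quality lines are never searched. If no read falls into one of the two groups, that output file is simply not created.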