Text-Processing

比較兩個文本文件,提取 file2 的匹配行以及其他行

  • February 5, 2019

一直在嘲笑這個太久並嘗試了 grep、join、awk 但我無法正確設置參數。我需要得到正確的命令。

我有兩個文本文件。

貓文件1

@ABC:11:ABC:1:1111:1111:1111
@ABC:22:ABC:1:1111:4444:4444

貓文件2

@ABC:11:ABC:1:1111:1111:1111 1:N:0:TCCCGCGC+AGGCGGGG
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
+
#FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
@ABC:11:ABC:1:1111:2222:2222 1:N:0:TCCCGCGC+AGGCGGGG
AGGCGGGGAGGCGGGGAGGCGGGGAGGCGGGGAGGCGGGGAGGCGGGGAGGCGGGGAGGCGGGG
+
#FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
@ABC:22:ABC:1:1111:3333:3333 1:N:0:TCCCGCGC+AGGCGGGG
AGGCGGGGAGGCGGGGAGGCGGGGAGGCGGGGAGGCGGGGAGGCGGGGAGGCGGGGAGGCGGGG
+
#FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
@ABC:22:ABC:1:1111:4444:4444 1:N:0:TCCCGCGC+AGGCGGGG
TTTTTTTTTTTTTGGGGGGGGGGGGGGGGTTTTTTTTTTTTTTTTTTTGGGGGGGGGGGGGGGGGGGG
+
#FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF

我想做兩件事:

**輸出 1)**基於 file1,提取包含字元串的所有行加上兩個附加字元串。

輸出 2)基於 file1,提取所有包含字元串的行加上另外兩行 - 但它應該只嘗試匹配以 @.. 開頭的行

範例輸出 1):

貓輸出1

@ABC:11:ABC:1:1111:1111:1111 1:N:0:TCCCGCGC+AGGCGGGG
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
+
#FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
@ABC:22:ABC:1:1111:4444:4444 1:N:0:TCCCGCGC+AGGCGGGG
TTTTTTTTTTTTTGGGGGGGGGGGGGGGGTTTTTTTTTTTTTTTTTTTGGGGGGGGGGGGGGGGGGGG
+
#FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF

範例輸出 2)

貓輸出2

@ABC:11:ABC:1:1111:2222:2222 1:N:0:TCCCGCGC+AGGCGGGG
AGGCGGGGAGGCGGGGAGGCGGGGAGGCGGGGAGGCGGGGAGGCGGGGAGGCGGGGAGGCGGGG
+
#FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
@ABC:22:ABC:1:1111:3333:3333 1:N:0:TCCCGCGC+AGGCGGGG
AGGCGGGGAGGCGGGGAGGCGGGGAGGCGGGGAGGCGGGGAGGCGGGGAGGCGGGGAGGCGGGG
+
#FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF

(請不要perl)

您所展示和要求的是在 fastq 文件中 grep 一組給定的讀取。我強烈建議不要重新發明輪子,而是使用seqkit grep等現有工具。

儘管如此,這裡是“僅限bash”的變體:

連續 4 行屬於一次讀取。所以我們可以將它們全部放在一行中,用製表符分隔,grep 查找 id 並將製表符轉換回新行。

$ cat file2.fq|paste - - - -|grep -f file1.txt|tr "\t" "\n"

或者對於您的第二個輸出,我們只需使用 invert 參數grep

$ cat file2.fq|paste - - - -|grep -v -f file1.txt|tr "\t" "\n

引用自:https://unix.stackexchange.com/questions/498724