Text-Processing
Pattern-match and delete the whole line
I want to delete every line of File1 whose column 1 exactly matches column 1 of File2.
File 1:

    r001:21:10 21 AAAAAATTTGC * = XM:21
    r002:21:10 21 YAAAATTTGC * = nM:21
    r001:21:10 21 TTAAAATTTGC * = XM:21
    r0012:21:10 21 LLAAAATTTGC * + XM:21
    r001:21:10 21 AAAAAATTTGC * = GM:21
File 2:

    r001:21:10
    r001:21:20
    r002:41:36
    r002:41:99
    r002:41:87
    r0012:21:1
Expected output:

    r002:21:10 21 YAAAATTTGC * = nM:21
    r0012:21:10 21 LLAAAATTTGC * + XM:21
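You can use awk:

    $ awk 'FNR==NR {a[$1]; next}; !($1 in a)' f2 f1
    r002:21:10 21 YAAAATTTGC * = nM:21
    r0012:21:10 21 LLAAAATTTGC * + XM:21

Explanation

FNR==NR {a[$1]; next}
    This condition holds only while awk is reading the first file named on the command line (f2). It stores the first field of each line as a key of the array a, then skips to the next line.
!($1 in a)
    While reading the second file (f1), this checks whether the first field is a key of a. If it is not, the line is printed.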
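To make the two-pass logic easy to try out, here is a minimal sketch that recreates the sample data from the question in a scratch directory and runs the awk one-liner with `$1` as the array key. The names `tmpdir`, `seen` and `result` are only illustrative, and the sketch assumes `mktemp -d` is available on your system.

```shell
#!/bin/sh
# Scratch directory for the sample files (assumes mktemp -d is available).
tmpdir=$(mktemp -d)

# File 1 from the question: the lines to filter.
cat > "$tmpdir/f1" <<'EOF'
r001:21:10 21 AAAAAATTTGC * = XM:21
r002:21:10 21 YAAAATTTGC * = nM:21
r001:21:10 21 TTAAAATTTGC * = XM:21
r0012:21:10 21 LLAAAATTTGC * + XM:21
r001:21:10 21 AAAAAATTTGC * = GM:21
EOF

# File 2 from the question: the IDs whose lines should be removed.
cat > "$tmpdir/f2" <<'EOF'
r001:21:10
r001:21:20
r002:41:36
r002:41:99
r002:41:87
r0012:21:1
EOF

# First pass (FNR==NR): remember column 1 of f2 as keys of "seen".
# Second pass: print only the f1 lines whose column 1 is not a key.
result=$(awk 'FNR==NR { seen[$1]; next } !($1 in seen)' "$tmpdir/f2" "$tmpdir/f1")

printf '%s\n' "$result"

rm -rf "$tmpdir"
```

One caveat with this idiom: if the first file is empty, FNR==NR is also true for every line of the second file, so nothing is printed.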
You can also do:

    $ grep -wvFf file2 file1
    r002:21:10 21 YAAAATTTGC * = nM:21
    r0012:21:10 21 LLAAAATTTGC * + XM:21
From man grep:

    -F, --fixed-strings
        Interpret PATTERN as a list of fixed strings, separated by newlines, any of which is to be matched.
    -f FILE, --file=FILE
        Obtain patterns from FILE, one per line.
    -v, --invert-match
        Invert the sense of matching, to select non-matching lines.
    -w, --word-regexp
        Select only those lines containing matches that form whole words. The test is that the matching substring must either be at the beginning of the line, or preceded by a non-word constituent character.
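Note: however, this searches the full content of every line of file1, not just the first column, so a line is also dropped when an ID from file2 appears as a whole word in any other column.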
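The difference between the two approaches only shows up when an ID appears outside column 1. The sketch below uses made-up data to illustrate it: the `r999:21:10` line is hypothetical (it is not in the question) and carries `r001:21:10` in its last column. The variable names are illustrative as well.

```shell
#!/bin/sh
# Scratch directory for the demonstration files (assumes mktemp -d is available).
tmpdir=$(mktemp -d)

# A single ID to remove.
printf '%s\n' 'r001:21:10' > "$tmpdir/file2"

# The second line is hypothetical: its column 1 is NOT listed in file2,
# but the listed ID appears in its last column.
cat > "$tmpdir/file1" <<'EOF'
r001:21:10 21 AAAAAATTTGC * = XM:21
r999:21:10 21 CCCCAATTTGC * = r001:21:10
r002:21:10 21 YAAAATTTGC * = nM:21
EOF

# grep matches the ID anywhere on the line, so it also drops the r999 line.
grep_out=$(grep -wvFf "$tmpdir/file2" "$tmpdir/file1")

# awk compares column 1 only, so the r999 line is kept.
awk_out=$(awk 'FNR==NR { seen[$1]; next } !($1 in seen)' "$tmpdir/file2" "$tmpdir/file1")

printf 'grep keeps:\n%s\n' "$grep_out"
printf 'awk keeps:\n%s\n' "$awk_out"

rm -rf "$tmpdir"
```

If the IDs can never occur outside the first column, both commands give the same result; otherwise the awk version is the one that matches the requirement as stated in the question.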