Text-Processing

Pattern match and delete entire lines

  • November 25, 2018

I want to delete every line of File1 whose Column 1 exactly matches Column 1 of File2.

File1:

r001:21:10    21    AAAAAATTTGC    *     =    XM:21
r002:21:10    21    YAAAATTTGC     *     =    nM:21
r001:21:10    21    TTAAAATTTGC    *     =    XM:21
r0012:21:10   21    LLAAAATTTGC    *     +    XM:21
r001:21:10    21    AAAAAATTTGC    *     =    GM:21

File2:

r001:21:10
r001:21:20
r002:41:36
r002:41:99
r002:41:87
r0012:21:1

Expected output:

r002:21:10    21    YAAAATTTGC     *     =    nM:21
r0012:21:10   21    LLAAAATTTGC    *     +    XM:21

You can use this awk command:

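$ awk 'FNR==NR {a[$1]; next}; !($1 in a)' file2 file1
r002:21:10    21    YAAAATTTGC     *     =    nM:21
r0012:21:10   21    LLAAAATTTGC    *     +    XM:21

Explanation

  • FNR==NR {a[$1]; next}: FNR==NR is true while awk reads the first file on the command line (file2). The first field of each line is stored as a key in array a, and next moves on to the next input line.
  • !($1 in a): while reading the second file (file1), awk checks whether the line's first field is a key in a. If it is not, the line is printed.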
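As a sketch for experimenting with this answer, the script below recreates the sample files in a scratch directory (the temporary directory and the heredocs are only a convenience, not part of the original answer) and runs the same awk program spread over several lines with comments. It assumes a POSIX shell with awk and mktemp available.

```shell
#!/bin/sh
# Recreate the sample files from the question in a scratch directory.
dir=$(mktemp -d)
cd "$dir" || exit 1

cat > file1 <<'EOF'
r001:21:10    21    AAAAAATTTGC    *     =    XM:21
r002:21:10    21    YAAAATTTGC     *     =    nM:21
r001:21:10    21    TTAAAATTTGC    *     =    XM:21
r0012:21:10   21    LLAAAATTTGC    *     +    XM:21
r001:21:10    21    AAAAAATTTGC    *     =    GM:21
EOF

cat > file2 <<'EOF'
r001:21:10
r001:21:20
r002:41:36
r002:41:99
r002:41:87
r0012:21:1
EOF

# The one-liner, spread over several lines with comments.
awk '
    # FNR==NR holds while the first file argument (file2) is read,
    # assuming file2 is not empty: store its first field as a key of
    # array a, then skip to the next input line.
    FNR == NR { a[$1]; next }

    # Reached only for file1: print the line (default action)
    # when its first field is not one of the stored keys.
    !($1 in a)
' file2 file1
```

It should print the two lines shown in the expected output.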

You can also do:

$ grep -wvFf file2 file1
r002:21:10    21    YAAAATTTGC     *     =    nM:21
r0012:21:10   21    LLAAAATTTGC    *     +    XM:21

From man grep:

  -F, --fixed-strings
         Interpret PATTERN as a  list  of  fixed  strings,  separated  by
         newlines,  any  of  which is to be matched. 
  -f FILE, --file=FILE
         Obtain  patterns  from  FILE,  one  per  line.  
  -v, --invert-match
         Invert the sense of matching, to select non-matching lines. 
  -w, --word-regexp
         Select  only  those  lines  containing  matches  that form whole
         words.  The test is that the matching substring must  either  be
         at  the  beginning  of  the  line,  or  preceded  by  a non-word
         constituent character.
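To see what -w contributes on this data, the sketch below runs grep with and without it. Without -w, the fixed string r0012:21:1 from file2 also matches as a substring of r0012:21:10, so that line would be dropped too; with -w the match must form a whole word, and because the next character is 0 (a word constituent) the line is kept. The scratch directory and printf setup are conveniences assumed for the example; it assumes a POSIX shell and GNU or BSD grep.

```shell
#!/bin/sh
# Recreate the sample files in a scratch directory.
dir=$(mktemp -d)
cd "$dir" || exit 1
printf '%s\n' \
  'r001:21:10    21    AAAAAATTTGC    *     =    XM:21' \
  'r002:21:10    21    YAAAATTTGC     *     =    nM:21' \
  'r001:21:10    21    TTAAAATTTGC    *     =    XM:21' \
  'r0012:21:10   21    LLAAAATTTGC    *     +    XM:21' \
  'r001:21:10    21    AAAAAATTTGC    *     =    GM:21' > file1
printf '%s\n' r001:21:10 r001:21:20 r002:41:36 r002:41:99 r002:41:87 r0012:21:1 > file2

echo "without -w:"
# Substring match: r0012:21:1 hits inside r0012:21:10, so only the r002 line survives.
grep -vFf file2 file1
echo "with -w:"
# Whole-word match only: the r0012:21:10 line is kept as well.
grep -wvFf file2 file1
```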

Note, however, that this matches the patterns against the entire line of file1, not just the first column.
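The note above can be illustrated with a small, hypothetical file1 line (not from the question) whose first field is not a key in file2 but whose last field is. grep drops it because the key matches as a whole word anywhere in the line; the awk solution keeps it because it compares only the first field. The file contents here are invented for the demonstration and assume a POSIX shell.

```shell
#!/bin/sh
dir=$(mktemp -d)
cd "$dir" || exit 1
# Trimmed key list taken from the question's File2.
printf '%s\n' r001:21:10 r001:21:20 > file2
# Hypothetical line: column 1 is not a key, but the last column is.
printf '%s\n' 'r009:1:1    21    ACGTACGT    *    =    r001:21:20' > file1

echo "grep:"
# Prints nothing: r001:21:20 matches as a whole word in the last column.
grep -wvFf file2 file1
echo "awk:"
# Prints the line: only column 1 is compared against the keys.
awk 'FNR == NR { a[$1]; next } !($1 in a)' file2 file1
```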

Source: https://unix.stackexchange.com/questions/117456