Text-Processing

Erasing a 2-line pattern with sed/grep/whatever

  • September 12, 2016

I have a huge CVS log file which, once cleaned of useless information, reads like this:

Working file: unmodifiedfile1.c
================
Working file: modifiedfile1.h
----------------------------------
revision 1.3
Fixed some bug
================
Working file: unmodifiedfile2.h
================
Working file: modifiedfile2.h
----------------------------------
revision 1.1
Added some feature
================
Working file: unmodifiedfile3.h

I want to remove the lines that relate to unmodified files, leaving:

Working file: modifiedfile1.h
----------------------------------
revision 1.3
Fixed some bug
================
Working file: modifiedfile2.h
----------------------------------
revision 1.1
Added some feature
================

The pattern to match is

Working file: FILENAME
================

What I have managed to do so far is the following:

sed '/Working file:/ N ; s/\n/PLACEHOLDER/' changelog.txt |
grep -v 'PLACEHOLDER===' |
sed 's/PLACEHOLDER/\n/'

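I'm sure there is a cleaner solution that my ignorance of sed keeps me from seeing... (Also, as a bonus, it would be nice to be able to delete the last line if necessary.)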

P.S.

An output ending with:

================
Working file: unmodifiedfile3.h

is also acceptable.

sed '/Working file:/ N ; s/\n/PLACEHOLDER/' changelog.txt |
grep -v 'PLACEHOLDER===' |
sed 's/PLACEHOLDER/\n/'

can indeed be shortened to:

$ sed '/Working file:/{N;/===/d}' changelog.txt 
Working file: modifiedfile1.h
----------------------------------
revision 1.3
Fixed some bug
================
Working file: modifiedfile2.h
----------------------------------
revision 1.1
Added some feature
================
Working file: unmodifiedfile3.h
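  • Deletes every line containing Working file: together with the line that follows it, if that following line contains ===. To also delete the last line when it contains Working file:, pipe the result through a second sed, as in the ip.txt example.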

Thanks to @ilkkachu for the suggestions. If the pattern needs to match at the start of a line, use ^Working file:
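To illustrate when the ^ anchor makes a difference, here is a minimal sketch. The sample log (with a commit message that happens to contain "Working file:") and the name sample.log are made up for this example. It also uses $!N rather than plain N, on the assumption that you want the last line handled the same way by GNU sed and POSIX sed.

```shell
#!/bin/sh
# Hypothetical sample: one commit message contains "Working file:" mid-line.
tmp=$(mktemp -d)
cat > "$tmp/sample.log" <<'EOF'
Working file: a.c
================
Working file: b.h
----------------------------------
revision 1.3
See Working file: note
================
Working file: c.h
EOF

# Unanchored: the commit message also matches, so it is deleted
# together with the separator line that follows it.
unanchored=$(sed '/Working file:/{$!N;/\n===/d;}' "$tmp/sample.log")

# Anchored: only lines that start with "Working file:" trigger the check,
# and /\n===/ requires the second line to start with the separator.
anchored=$(sed '/^Working file:/{$!N;/\n===/d;}' "$tmp/sample.log")

printf '%s\n' "$anchored"
rm -rf "$tmp"
```

With the anchored version the "See Working file: note" line and its separator survive; with the unanchored one they are lost.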

$ cat ip.txt 
Working file: 123
================
Working file: f1
----------------------------------
revision 1.3
Fixed some bug
================
Working file: abc
================
Working file: file
----------------------------------
revision 1.1
Added some feature
================
Working file: xyz

$ sed '/Working file:/{N;/===/d}' ip.txt | sed '${/Working file:/d}' 
Working file: f1
----------------------------------
revision 1.3
Fixed some bug
================
Working file: file
----------------------------------
revision 1.1
Added some feature
================
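The two sed processes above could probably be folded into one. The following is only a sketch under that assumption, reusing the ip.txt sample from this answer (recreated in a temporary directory so the snippet is self-contained); it is not part of the original answer.

```shell
#!/bin/sh
tmp=$(mktemp -d)
cat > "$tmp/ip.txt" <<'EOF'
Working file: 123
================
Working file: f1
----------------------------------
revision 1.3
Fixed some bug
================
Working file: abc
================
Working file: file
----------------------------------
revision 1.1
Added some feature
================
Working file: xyz
EOF

# For every line starting with "Working file:":
#   $d        a header on the very last line has nothing after it: drop it
#   N         otherwise append the next line to the pattern space
#   /\n===/d  drop both lines when the second one is a separator
result=$(sed '/^Working file:/{$d;N;/\n===/d;}' "$tmp/ip.txt")

printf '%s\n' "$result"
rm -rf "$tmp"
```

Because $d runs before N, N is never reached on the last line, so the script does not depend on how a particular sed treats N at end of input.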

This should be close to what you are after:

<cvslog sed -n '/Working file/ { N; /\n=\+$/b; :a; N; /\n=\+$/!ba; p; }'

Output:

Working file: modifiedfile1.h
----------------------------------
revision 1.3
Fixed some bug
================
Working file: modifiedfile2.h
----------------------------------
revision 1.1
Added some feature
================

Explanation

Here is the same sed script with comments:

/Working file/ {
 N                 # append next line to pattern space
 /\n=\+$/b         # is it a file separator -> next file
 :a
 N                 # append next line to pattern space
 /\n=\+$/!ba       # isn't it a file separator -> read next line
 p                 # otherwise print accumulated text
}
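As a rough illustration of why the loop around the second N is there, the sketch below runs the same logic on a made-up log in which one file has two revisions. The sample data and the name cvslog.sample are assumptions for this example. The script is written one command per line and uses ==* in place of =\+, on the assumption that you may want it to run on a sed without GNU extensions.

```shell
#!/bin/sh
# Hypothetical sample: b.h has two revisions before its closing separator.
tmp=$(mktemp -d)
cat > "$tmp/cvslog.sample" <<'EOF'
Working file: a.c
================
Working file: b.h
----------------------------
revision 1.2
Second fix
----------------------------
revision 1.1
First fix
================
Working file: c.h
================
EOF

result=$(sed -n '/^Working file/{
N
/\n==*$/b
:a
N
/\n==*$/!ba
p
}' "$tmp/cvslog.sample")

printf '%s\n' "$result"
rm -rf "$tmp"
```

The loop keeps appending lines until the pattern space ends in a line of = characters, so both revisions of b.h are printed as one block, while a.c and c.h (header followed directly by a separator) are skipped.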

awk

If you tell awk to use the file separator line as the record separator (RS), defining a sensible selection criterion becomes fairly simple:

<cvslog awk 'NF>2' RS='\n=+\n' FS='\n' ORS='\n\n'

Output:

Working file: modifiedfile1.h
----------------------------------
revision 1.3
Fixed some bug

Working file: modifiedfile2.h
----------------------------------
revision 1.1
Added some feature
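A regular expression in RS is an extension (gawk and mawk support it; POSIX leaves a multi-character RS unspecified). If that is a concern, the same selection can be sketched line by line, as below. This is an alternative technique, not the one-liner above: it keeps the ================ lines instead of replacing them with blank lines, and the variable names block and n are just illustrative.

```shell
#!/bin/sh
tmp=$(mktemp -d)
cat > "$tmp/cvslog" <<'EOF'
Working file: unmodifiedfile1.c
================
Working file: modifiedfile1.h
----------------------------------
revision 1.3
Fixed some bug
================
Working file: unmodifiedfile2.h
================
Working file: modifiedfile2.h
----------------------------------
revision 1.1
Added some feature
================
Working file: unmodifiedfile3.h
EOF

result=$(awk '
  { block = block $0 "\n"; n++ }     # collect the lines of the current block
  /^=+$/ {                           # a separator line closes the block
    if (n > 2) printf "%s", block    # more than header + separator: keep it
    block = ""; n = 0
  }
' "$tmp/cvslog")

printf '%s\n' "$result"
rm -rf "$tmp"
```

A trailing block that never reaches a separator, such as the final unmodifiedfile3.h line, is simply never printed.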

bash and coreutils

Just for fun:

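csplit -s cvslog '/=\{16\}/1' '{*}'   # -s: do not print the piece sizes
wc -l xx* |
head -n-1 |                           # drop the "total" line printed by wc
while read -r n f; do
 if (( n > 2 )); then                 # more than header + separator
   cat "$f"
 fi
done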

Output:

Working file: modifiedfile1.h
----------------------------------
revision 1.3
Fixed some bug
================
Working file: modifiedfile2.h
----------------------------------
revision 1.1
Added some feature
================

Source: https://unix.stackexchange.com/questions/308656