Grep
pcregrep: find lines with surrounding whitespace
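I have some titles that start with a Markdown `#`, and I have the following rules:

- Titles (`#`) should have exactly two blank lines above and one below.
- Subtitles (`##`, `###`, etc.) should have exactly one blank line above and one below.
- Titles take precedence over subtitles. (If the two rules conflict, use the title formatting and ignore the subtitle rule.)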
**Note:** I am trying to find all titles that do NOT follow these three restrictions.

Below are some examples of good and bad titles:
```
some text
# Title | BAD

## Subtitle | Good (Has two spaces below, is needed for next main title)


# Title | Good

## Subtitle | Bad
text
# Title | Bad

text
```
After fiddling around with regular expressions, I came up with these expressions:

Main titles:

```
((?<=\n{4})|(?<=.\n{2})|(?<=.\n))(# .*)|(# .*)(?=(\n.|\n{3}(?!# )|\n{4}))
```

Subtitles:

```
'((?<=\n{3})|(?<=.\n))(##+.*)|(##+.*)(?=\n.|\n{3}(?!# )|\n{4}.)'
```
However, much to my confusion, they do not work with `pcregrep`. Here is the `pcregrep` command I tried to run (just for completeness):

```
$ pcregrep -rniM --include='.*\.md' \
    '((?<=\n{3})|(?<=.\n))(##+.*)|(##+.*)(?=\n.|\n{3}(?!# )|\n{4}.)' \
    ~/Programming/oppgaver/src/web
```
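It does not work when I search just a single file either, and I have several other expressions that work fine.

Is there something wrong with my regex, or am I running it incorrectly?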
This solution fixes all incorrect titles.

```
sed -r '
    :loop; N; $!b loop

    s/\n+(#[^\n]+)/\n\n\1/g

    s/(#[^\n]+)\n+/\1\n\n/g

    s/\n+(#[^\n#]+)/\n\n\n\1/g
' input.txt;
```
With comments:

```
sed -r '
    ### put the whole file into the pattern space,
    # in other words, merge all lines into one line
    :loop; N; $!b loop;

    ### first traversal of the pattern space
    # searches for a line with a "#" sign (matches all cases: Titles, SubTitles, etc.),
    # takes all the empty lines above it
    # and converts them to one empty line
    s/\n+(#[^\n]+)/\n\n\1/g;

    ### second traversal of the pattern space
    # again searches for a line with a "#" sign, takes all the empty lines below it
    # and converts them to one empty line
    s/(#[^\n]+)\n+/\1\n\n/g;

    ### third traversal of the pattern space
    # searches for a single "#" sign (Titles only),
    # takes all the newlines above it (at this point there are only two of them,
    # because of the previous substitutions)
    # and converts them to three newlines
    s/\n+(#[^\n#]+)/\n\n\n\1/g
' input.txt
```
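The `:loop; N; $!b loop` idiom is what lets the later `s///` commands see newlines at all, since `sed` normally processes one line at a time. As a minimal sketch (the function name `join_lines` and the `|` separator are made up for illustration), the same loop can be tried on its own; each command is passed with a separate `-e` so the label is not affected by how a given `sed` parses semicolons:

```shell
#!/bin/sh
# join_lines: hypothetical helper that only demonstrates the slurp loop.
# N appends the next input line to the pattern space; "$!b loop" jumps back
# to the label on every line except the last, so the final s/// runs once
# over the whole input.
join_lines() {
    sed -e ':loop' -e 'N' -e '$!b loop' -e 's/\n/|/g'
}

printf 'a\nb\nc\n' | join_lines
# prints: a|b|c
```

Because the whole input ends up in memory at once, this approach is best suited to files of modest size.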
Input

```
text
# Title
## SubTitle
### SubSubTitle
# Title
## SubTitle
text
### SubSubTitle
# Title
# Title
# Title
## SubTitle
### SubSubTitle
```
Output

```
text


# Title

## SubTitle

### SubSubTitle


# Title

## SubTitle

text

### SubSubTitle


# Title


# Title


# Title

## SubTitle

### SubSubTitle
```
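To double-check a file against the spacing rules without multi-line regular expressions, a small `awk` pass may be enough. The sketch below is one possible approach, not part of the original solution, and `check_headings` is a hypothetical name. It assumes that a heading is any line starting with one or more `#` followed by a space, that a heading followed by a title needs two blank lines below it (the precedence rule), and that the very top and bottom of the file are not checked.

```shell
#!/bin/sh
# check_headings FILE: print "LINENO: TEXT" for each heading that breaks the rules.
check_headings() {
    awk '
        { line[NR] = $0 }                      # keep every line for look-around
        END {
            for (i = 1; i <= NR; i++) {
                if (line[i] !~ /^#+ /) continue
                # count blank lines directly above the heading
                above = 0
                for (j = i - 1; j >= 1 && line[j] == ""; j--) above++
                has_prev = (j >= 1)
                # count blank lines directly below the heading
                below = 0
                for (k = i + 1; k <= NR && line[k] == ""; k++) below++
                has_next = (k <= NR)
                # titles need two blank lines above, subtitles one
                want_above = (line[i] ~ /^# /) ? 2 : 1
                # a following title takes precedence and needs two blank lines
                want_below = (has_next && line[k] ~ /^# /) ? 2 : 1
                if ((has_prev && above != want_above) ||
                    (has_next && below != want_below))
                    printf "%d: %s\n", i, line[i]
            }
        }
    ' "$1"
}

sample=$(mktemp)
cat > "$sample" <<'EOF'
text
# Title

## Sub


# Title2

## Sub2
text
EOF
check_headings "$sample"
# prints:
# 2: # Title
# 9: ## Sub2
rm -f "$sample"
```

Running the same function on the output of the `sed` script above should print nothing, since every heading there already has the expected number of blank lines around it.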