Regular-Expression
How to find paragraphs that are missing their final punctuation mark
Take this example file:
this is line one of a paragraph that continues here and finishes with a full stop as it should.

Now we have a second paragraph that continues in a new line, but the full stop is missing

I simply overlooked it, typing too fast.
How can I detect this kind of error? My naive grep approach

grep "^.*[a-zA-Z]$^$" file.text

does not work (why?).
Using GNU awk:

$ awk -v RS='\n\n' '$NF !~ /[[:punct:]]$/' file
Now we have a second paragraph that continues in a new line, but the full stop is missing
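This sets the record separator to a sequence of two newlines, so each paragraph becomes one record. If the last field of a record (a word) does not end in a punctuation character (one of !"#$%&'()*+,-./:;<=>?@[\]^_`{|}~), the paragraph is printed. If it suits your text better, you can use a smaller character class such as [.!?] in place of [[:punct:]].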
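As a rough sketch of the narrower check, the snippet below builds a throwaway sample file and flags paragraphs whose last word does not end in ., ! or ?. The file name sample.txt and the function name check_paragraphs are made up for illustration. It also swaps in RS='' (awk's paragraph mode, where blank lines separate records) for the two-newline separator used above. That substitution is an assumption about convenience and is not part of the original command.

```shell
# Build a small sample file; the name sample.txt is hypothetical.
cat > sample.txt <<'EOF'
this is line one of a paragraph
that finishes with a full stop as it should.

Now we have a second paragraph
but the full stop is missing

A third paragraph that ends with a comma,
EOF

# check_paragraphs is an illustrative helper, not part of any tool.
# RS='' puts awk in paragraph mode: records are separated by blank lines
# and newlines also act as field separators, so $NF is the last word.
check_paragraphs() {
    awk -v RS='' '$NF !~ /[.!?]$/ { printf("paragraph %d: %s\n", NR, $NF) }' "$1"
}

check_paragraphs sample.txt
```

With [[:punct:]] the third paragraph would pass, because a comma is a punctuation character. The narrower class reports it as well.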
If you want the output to include the paragraph number and some decorative text, use

$ awk -v RS='\n\n' '$NF !~ /[[:punct:]]$/ { printf("ERROR (%d):\n%s\n", FNR, $0) }' file
ERROR (2):
Now we have a second paragraph that continues in a new line, but the full stop is missing
Your grep does not work because grep reads one line at a time by default. You therefore cannot expect anything to match after the $ end-of-line anchor.
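To see the line-at-a-time behaviour, here is a small sketch using a made-up file name, grep-demo.txt. The per-line pattern [a-zA-Z]$ matches every line that ends in a letter, including a line in the middle of a paragraph, so grep on its own cannot tell where a paragraph ends.

```shell
# Create a demo file (hypothetical name): two paragraphs separated by a blank line.
printf '%s\n' \
    'first line without a stop' \
    'second line ends here.' \
    '' \
    'last line of paragraph two' > grep-demo.txt

# grep tests each line separately; -n prefixes each match with its line number.
# Line 1 is reported even though it is not the end of a paragraph.
grep -n '[a-zA-Z]$' grep-demo.txt
```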