Regular-Expression

How to find paragraphs with forgotten punctuation

  • May 2, 2019

Take this example file:

this is line one of a paragraph
that continues here and finishes
with a full stop as it should.

Now we have a second paragraph
that continues in a new line, 
but the full stop is missing

I simply overlooked it, typing too fast.

How can I detect this kind of error? My naive grep approach

grep "^.*[a-zA-Z]$^$"  file.text

does not work (why?).

Using GNU awk:

$ awk -v RS='\n\n' '$NF !~ /[[:punct:]]$/' file
Now we have a second paragraph
that continues in a new line,
but the full stop is missing

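This sets the record separator to a sequence of two newlines, so each paragraph becomes one record. A multi-character RS is an extension (POSIX leaves its behavior unspecified), which is why GNU awk is named here. If the last field of the record (a word) does not end with a punctuation character (one of !"#$%&'()*+,-./:;<=>?@[\]^_`{|}~), the paragraph is printed.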

If it suits you better, you can use a smaller character class such as [.!?] in place of [[:punct:]].
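For readers without GNU awk, roughly the same check can be sketched with awk's standard paragraph mode: an empty RS makes one or more blank lines act as the record separator. This is only a sketch, combined with the narrower [.!?] class; the file name sample.txt is an assumption, and the file is recreated from the example above.

```shell
# Recreate the example file (sample.txt is a hypothetical name).
cat > sample.txt <<'EOF'
this is line one of a paragraph
that continues here and finishes
with a full stop as it should.

Now we have a second paragraph
that continues in a new line,
but the full stop is missing

I simply overlooked it, typing too fast.
EOF

# An empty RS enables paragraph mode: blank lines separate records,
# and newlines always count as field separators, so $NF is the last word.
# Print every paragraph whose last word does not end in ., ! or ?
result=$(awk -v RS= '$NF !~ /[.!?]$/' sample.txt)
printf '%s\n' "$result"
```

Unlike RS='\n\n', paragraph mode also copes with several blank lines between paragraphs.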

If you want to include the paragraph number and some decorative text in the output, use

$ awk -v RS='\n\n' '$NF !~ /[[:punct:]]$/ { printf("ERROR (%d):\n%s\n", FNR, $0) }' file
ERROR (2):
Now we have a second paragraph
that continues in a new line,
but the full stop is missing
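If the check is meant to run from a script, it may help to have it fail when something is found. The following is a minimal sketch under that assumption: it reuses the command above and adds an exit status. The variable name bad and the file name sample.txt are illustrative, not part of the original answer, and the multi-character RS still needs an awk that supports it, such as GNU awk.

```shell
# Recreate the example file (sample.txt is a hypothetical name).
cat > sample.txt <<'EOF'
this is line one of a paragraph
that continues here and finishes
with a full stop as it should.

Now we have a second paragraph
that continues in a new line,
but the full stop is missing

I simply overlooked it, typing too fast.
EOF

# Report each offending paragraph and remember that one was seen.
# "bad" stays unset for a clean file, so awk then exits with status 0.
if report=$(awk -v RS='\n\n' '
    $NF !~ /[[:punct:]]$/ {
        printf("ERROR (%d):\n%s\n", FNR, $0)
        bad = 1
    }
    END { exit bad }
' sample.txt); then
    status=0
else
    status=$?
fi

printf '%s\n' "$report"
printf 'exit status: %d\n' "$status"
```

For the example file this reports paragraph 2 and ends with an exit status of 1.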

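grep does not work because, by default, grep reads one line at a time. You therefore cannot expect the pattern to match anything beyond the $ end-of-line anchor.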
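To illustrate the point, a line-by-line check has to carry state from one line to the next, which a single grep pattern cannot do. The following is a minimal sketch in plain awk that remembers the most recent non-blank line and tests it when a blank line or the end of input is reached. The names prev, last and sample.txt are assumptions, and the narrower [.!?] class is used in place of [[:punct:]].

```shell
# Recreate the example file (sample.txt is a hypothetical name).
cat > sample.txt <<'EOF'
this is line one of a paragraph
that continues here and finishes
with a full stop as it should.

Now we have a second paragraph
that continues in a new line,
but the full stop is missing

I simply overlooked it, typing too fast.
EOF

flagged=$(awk '
    # A blank line closes a paragraph: test the last line seen before it.
    NF == 0 {
        if (prev != "" && prev !~ /[.!?][ \t]*$/)
            print last ": " prev
        prev = ""
        next
    }
    # Remember the most recent non-blank line and its line number.
    { prev = $0; last = NR }
    # The final paragraph has no blank line after it, so test it here.
    END {
        if (prev != "" && prev !~ /[.!?][ \t]*$/)
            print last ": " prev
    }
' sample.txt)
printf '%s\n' "$flagged"
```

This prints the line number and text of the final line of each unterminated paragraph, here line 7.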

Source: https://unix.stackexchange.com/questions/516689