awk + count occurrences of a string in a file

  • January 12, 2019

We have a large file like the one below.

Here is a partial listing from the file:

Topic: Ho_HTR_bvt     Partition: 31   Leader: 1007    Replicas: 1007,1008,1009        Isr: 1009,1007,1008
Topic: Ho_HTR_bvt     Partition: 32   Leader: 1008    Replicas: 1008,1009,1010        Isr: 1010,1009,1008
Topic: Ho_HTR_bvt     Partition: 33   Leader: 1009    Replicas: 1009,1010,1006        Isr: 1009,1010,1006
Topic: Ho_HTR_bvt     Partition: 34   Leader: 1010    Replicas: 1010,1006,1007        Isr: 1006,1007,1010
Topic: Ho_HTR_bvt     Partition: 35   Leader: 1006    Replicas: 1006,1008,1009        Isr: 1006,1009,1008
Topic: Ho_HTR_bvt     Partition: 36   Leader: 1007    Replicas: 1007,1009,1010        Isr: 1010,1007,1009
Topic: Ho_HTR_bvt     Partition: 37   Leader: 1008    Replicas: 1008,1010,1006        Isr: 1006,1010,1008
Topic: Ho_HTR_bvt     Partition: 38   Leader: 1009    Replicas: 1009,1006,1007        Isr: 1007,1009,1006
Topic: Ho_HTR_bvt     Partition: 39   Leader: 1010    Replicas: 1010,1007,1008        Isr: 1010,1007,1008
Topic: Ho_HTR_bvt     Partition: 40   Leader: 1006    Replicas: 1006,1009,1010        Isr: 1006,1010,1009
Topic: Ho_HTR_bvt     Partition: 41   Leader: 1007    Replicas: 1007,1010,1006        Isr: 1006,1007,1010
Topic: Ho_HTR_bvt     Partition: 42   Leader: 1008    Replicas: 1008,1006,1007        Isr: 1006,1007,1008
Topic: Ho_HTR_bvt     Partition: 43   Leader: 1009    Replicas: 1009,1007,1008        Isr: 1009,1007,1008
Topic: Ho_HTR_bvt     Partition: 44   Leader: 1010    Replicas: 1010,1008,1009        Isr: 1010,1009,1008

How can I count the number of times the string 1007 occurs?

Or any other word in the file?

Using your sample data:

$ grep -Fo 1007 file | wc -l
     19

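The grep part of the pipeline searches for the string 1007 (the -F flag is used because we are doing a string comparison, not a regular expression match). Because of the -o flag, grep prints each individual instance of the string on its own line. The number of lines returned is then counted by wc -l.

If the string occurs twice on one line of the input data, it is counted twice. If the string occurs as a substring of another word, it is counted as well.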
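The substring caveat can be checked directly. The following is a minimal sketch, assuming a grep that supports the -w (whole-word) option, which GNU and BSD grep do. The sample lines are made up for illustration and do not come from the original file.

```shell
# Hypothetical sample data: two whole-word hits on the first line,
# two substring-only hits (10071, 21007) on the second.
tmp=$(mktemp)
printf '%s\n' 'Leader: 1007 Replicas: 1007,1008' 'Isr: 10071,21007' > "$tmp"

# Counts every occurrence, including those inside longer numbers.
all=$(grep -Fo 1007 "$tmp" | wc -l)

# -w keeps only matches that are not adjacent to other word characters.
whole=$(grep -Fow 1007 "$tmp" | wc -l)

echo "all occurrences: $all"
echo "whole-word occurrences: $whole"

rm -f "$tmp"
```

Whether -w is appropriate depends on the data: in the listing above, broker IDs are separated by commas and spaces, so a whole-word match would still find each of them.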

With awk:

$ awk -v str="1007" '{ c += gsub(str, str) } END { print c }' file
19

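This counts the occurrences of the string with gsub() (this function returns the number of substitutions performed, and we apply it to each input line separately) and prints the total count at the end. The string we are interested in is passed on the command line with -v str="1007".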
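Note that gsub() interprets its first argument as a regular expression. That makes no difference for a purely numeric string such as 1007, but a search string containing characters like "." would match more than the literal text. One possible workaround, which is not part of the original answer, is a loop over index() and substr(), both of which work on literal strings. The sketch below uses made-up sample data and the hypothetical search string "a.b" to show the difference.

```shell
# Hypothetical sample data: three literal "a.b" and one "axb".
tmp=$(mktemp)
printf '%s\n' 'a.b axb a.b' 'a.b' > "$tmp"

# gsub() treats str as a regex, so "." also matches the "x" in "axb".
regex=$(awk -v str="a.b" '{ c += gsub(str, str) } END { print c + 0 }' "$tmp")

# index() searches for the literal string; count non-overlapping hits
# by repeatedly cutting the line just past each match.
literal=$(awk -v str="a.b" '
{
    line = $0
    while ((pos = index(line, str)) > 0) {
        c++
        line = substr(line, pos + length(str))
    }
}
END { print c + 0 }' "$tmp")

echo "regex count: $regex"
echo "literal count: $literal"

rm -f "$tmp"
```

Like gsub(), this loop counts non-overlapping occurrences, so both approaches give the same total for a search string without regex metacharacters.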

Quoted from: https://unix.stackexchange.com/questions/493662