Bash

Count the number of repeated occurrences in a CSV?

  • July 16, 2018

I have a CSV whose columns look like this:

Team    Other Data  More Data   Result  Time
Knicks      A          F         Loss    2p
Celtics     B          E         Win     2p
Lakers      C          D         Loss    3p
Lakers      D          C         Loss    4p
Knicks      E          B         Win     4p
Lakers      F          A         Win     5p

How can I read the CSV and output the wins and losses for each team?

For example, the output I want looks like this:

1 Loss Knicks
1 Win Knicks
1 Win Celtics
2 Loss Lakers
1 Win Lakers

Right now, I have this code:

#!/bin/bash
# Read the first (team) and fourth (result) comma-separated fields; skip the rest.
while IFS=, read -r team _ _ result _
do
 echo "$team, $result"
done < teams.csv

It produces the following output:

Team, Result   
Knicks, Loss
Celtics, Win
Lakers, Loss
Lakers, Loss
Knicks, Win

How can I count and store the number of occurrences of each result for each team? Ideally, I would like the data sorted by team.

Use arrays in awk

If the fields of the input file are separated by one or more whitespace characters, there is no need to declare a field separator:

awk 'NR>1 && NF { league[$1][$4]++ } END { for ( team in league ) for ( results in league[team] ) print league[team][results],results,team }' teams.txt

The same code, formatted for readability:

awk 'NR>1 && NF { league[$1][$4]++ }
    END { for ( team in league )
          for ( results in league[team] )
          print league[team][results],results,team }' teams.txt

Here, league[$1][$4]++ counts the wins and losses ($4, the fourth field) of each team ($1, the first field) in the league (the input file).

NR>1 means that awk ignores the header (the first line).

Similarly, NF (short for NF>0) means that awk only processes lines containing at least one field. In other words, NF skips empty lines.
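One way to see what the NR>1 && NF pattern lets through is to print the lines that survive it. This is only a minimal sketch: the function name show_data_rows and the two-column sample input are made up for illustration and are not part of the original answer.

```shell
# Hypothetical helper: print only the lines that pass the NR>1 && NF pattern.
show_data_rows() {
    # A pattern with no action prints every line that matches it.
    awk 'NR > 1 && NF' "$@"
}

# Sample input: a header on line 1 and one blank line between the data rows.
printf 'Team Result\nKnicks Loss\n\nLakers Win\n' | show_data_rows
# prints:
# Knicks Loss
# Lakers Win
```

The header is dropped because its record number is 1, and the blank line is dropped because its field count is 0.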

The NR>1 && NF part reads the input file and builds the array. Once that is done, the END part prints the array.
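Note that the nested league[$1][$4] syntax is an array of arrays, which is a GNU awk (gawk 4.0 or later) extension. Where only a POSIX awk is available, the same build-then-print logic could be sketched with a single flat array whose key joins team and result with SUBSEP. The function name count_results and the variable names count, key and part are assumptions for illustration, not the answer's own code.

```shell
# Hypothetical helper: same counting logic as the answer, but with one flat
# array keyed by "team SUBSEP result", so arrays of arrays are not needed.
count_results() {
    awk 'NR > 1 && NF { count[$1 SUBSEP $4]++ }
         END {
             for (key in count) {
                 split(key, part, SUBSEP)   # part[1] = team, part[2] = result
                 print count[key], part[2], part[1]
             }
         }' "$@"
}

# Sample rows from the question, fed on standard input.
count_results <<'EOF'
Team    Other Data  More Data   Result  Time
Knicks      A          F         Loss    2p
Celtics     B          E         Win     2p
Lakers      C          D         Loss    3p
Lakers      D          C         Loss    4p
Knicks      E          B         Win     4p
Lakers      F          A         Win     5p
EOF
```

This prints one "count result team" line per team and result pair. As with any for ... in loop in awk, the order of those lines is unspecified.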

If the fields of the input file are separated by commas, add BEGIN { FS="," ; OFS=" " } to set the input (FS) and output (OFS) field separators:

awk 'BEGIN { FS="," ; OFS=" " } NR>1 && NF { league[$1][$4]++ } END { for ( team in league ) for ( results in league[team] ) print league[team][results],results,team }' teams.csv

The same code, formatted for readability:

awk 'BEGIN { FS="," ; OFS=" " }
        NR>1 && NF { league[$1][$4]++ }
        END { for ( team in league )
              for ( results in league[team] )
              print league[team][results],results,team }' teams.csv

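Output (awk does not guarantee the order of a for ... in loop, so the lines may appear in a different order):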

1 Win Knicks
1 Loss Knicks
1 Win Lakers
2 Loss Lakers
1 Win Celtics

Append | sort -t " " -k 3 -k 2,2 to the end of that code to sort by team, and then by result within each team.

Sorted output:

1 Win Celtics
1 Loss Knicks
1 Win Knicks
2 Loss Lakers
1 Win Lakers
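To see how the two sort keys interact, the unsorted lines can be fed straight to the sort command on their own. This is a minimal sketch: the wrapper name sort_by_team is hypothetical, and LC_ALL=C is an added assumption so that the ordering does not depend on the current locale.

```shell
# Hypothetical wrapper around the sort command from the answer.
# LC_ALL=C is an added assumption to keep the ordering locale-independent.
sort_by_team() {
    # -t " "  : fields are separated by a single space
    # -k 3    : primary key starts at field 3 (the team)
    # -k 2,2  : ties are broken on field 2 only (the result)
    LC_ALL=C sort -t " " -k 3 -k 2,2
}

# The unsorted awk output, one line per argument.
printf '%s\n' '1 Win Knicks' '1 Loss Knicks' '1 Win Lakers' \
              '2 Loss Lakers' '1 Win Celtics' | sort_by_team
# prints:
# 1 Win Celtics
# 1 Loss Knicks
# 1 Win Knicks
# 2 Loss Lakers
# 1 Win Lakers
```

Because the count in field 1 is not part of either key, it travels with its line but has no effect on the ordering unless both keys tie.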

Source: https://unix.stackexchange.com/questions/456477