Bash
Counting duplicate occurrences in a CSV?
I have a CSV whose rows look like this:

    Team    Other Data  More Data  Result  Time
    Knicks  A           F          Loss    2p
    Celtics B           E          Win     2p
    Lakers  C           D          Loss    3p
    Lakers  D           C          Loss    4p
    Knicks  E           B          Win     4p
    Lakers  F           A          Win     5p
How can I read the CSV and output the wins and losses for each team?

For example, I want output like this:

    1 Loss Knicks
    1 Win Knicks
    1 Win Celtics
    2 Loss Lakers
    1 Win Lakers
Right now, I have this code:

    #!/bin/bash
    while IFS=, read -r team result
    do
        echo $team, $result
    done < teams.csv
which produces the following output:

    Team, Result
    Knicks, Loss
    Celtics, Win
    Lakers, Loss
    Lakers, Loss
    Knicks, Win
How can I count and store the number of occurrences of each result for each team? Ideally, I'd like the data sorted by team.
Use arrays in `awk`
If the fields in the input file are separated by one or more whitespace characters, you don't need to declare a field separator:

    awk 'NR>1 && NF { league[$1][$4]++ } END { for ( team in league ) for ( results in league[team] ) print league[team][results],results,team }' teams.txt
The same code, formatted for readability:

    awk '
        NR>1 && NF {
            league[$1][$4]++
        }
        END {
            for ( team in league )
                for ( results in league[team] )
                    print league[team][results], results, team
        }
    ' teams.txt
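Arrays of arrays such as `league[$1][$4]` are a GNU awk extension. If only a POSIX awk (for example mawk) is available, a composite subscript can stand in for the nested array. The following is a minimal sketch, assuming the whitespace-separated layout from the question (team in field 1, result in field 4, one header line); the function name `count_results` is hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: a POSIX awk variant of the counting one-liner.
# Assumes whitespace-separated columns: team in $1, result in $4, one header line.
count_results() {
    awk '
        NR > 1 && NF {
            # ($1, $4) is stored as the single key team SUBSEP result,
            # standing in for the gawk-only league[$1][$4]
            count[$1, $4]++
        }
        END {
            for (key in count) {
                # Split the composite key back into team and result
                split(key, part, SUBSEP)
                print count[key], part[2], part[1]
            }
        }
    ' "$@"
}

# Demo on the sample rows from the question. for-in order is unspecified,
# so the result is piped through sort (team first, then result).
printf '%s\n' \
    'Team Other Data More Data Result Time' \
    'Knicks A F Loss 2p' \
    'Celtics B E Win 2p' \
    'Lakers C D Loss 3p' \
    'Lakers D C Loss 4p' \
    'Knicks E B Win 4p' \
    'Lakers F A Win 5p' |
    count_results | sort -k 3,3 -k 2,2
```

With a real file, the call would be `count_results teams.txt | sort -k 3,3 -k 2,2`. The composite key keeps the same count, result, team output format as the gawk version, so the rest of the pipeline does not change.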
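Here, `league[$1][$4]++` counts, for each team in the league (the input file), how many times each result occurs: `$1` is the first field (the team) and `$4` is the fourth field (the result, `Win` or `Loss`). `league[$1][$4]` is an array of arrays, a GNU awk (gawk 4.0 and later) extension that POSIX awk does not support.

`NR>1` means awk ignores the header (the first line). Similarly, `NF` (short for `NF>0`) means awk only processes lines that contain at least one field; in other words, `NF` skips blank lines.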
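To see what the `NR>1 && NF` pattern lets through, here is a quick check on a made-up input containing a header, a blank line, and two data rows. The function name `show_counted_lines` is hypothetical; it only prints the records that would be counted, prefixed with their record number:

```shell
#!/bin/sh
# Hypothetical helper: print only the records that NR > 1 && NF would count,
# prefixed with their record number NR.
show_counted_lines() {
    awk 'NR > 1 && NF { print NR ": " $0 }'
}

# Made-up input: line 1 is the header, line 2 is blank.
printf 'Team Other Data More Data Result Time\n\nKnicks A F Loss 2p\nCeltics B E Win 2p\n' |
    show_counted_lines
# prints:
# 3: Knicks A F Loss 2p
# 4: Celtics B E Win 2p
```

The blank line is skipped but still advances `NR`, which is why the first counted row is record 3.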
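The `NR>1 && NF` section reads the input file and builds the array. When it's done, the `END` section prints one line per team/result pair: the count, the result, and the team.

If the fields in the input file are separated by commas, add `BEGIN { FS="," ; OFS=" " }` to set the input (`FS`) and output (`OFS`) field separators (a single space is already awk's default `OFS`, so that part is optional):

    awk 'BEGIN { FS="," ; OFS=" " } NR>1 && NF { league[$1][$4]++ } END { for ( team in league ) for ( results in league[team] ) print league[team][results],results,team }' teams.csv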
The same code, formatted for readability:

    awk '
        BEGIN {
            FS = ","
            OFS = " "
        }
        NR>1 && NF {
            league[$1][$4]++
        }
        END {
            for ( team in league )
                for ( results in league[team] )
                    print league[team][results], results, team
        }
    ' teams.csv
Output:

    1 Win Knicks
    1 Loss Knicks
    1 Win Lakers
    2 Loss Lakers
    1 Win Celtics
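Because `for ( team in league )` visits the keys in an unspecified order, add

    | sort -t " " -k 3 -k 2,2

to the end of that code to sort by team, and then by result within each team. Sorted output:

    1 Win Celtics
    1 Loss Knicks
    1 Win Knicks
    2 Loss Lakers
    1 Win Lakers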
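For comparison, the same counts can be produced without awk arrays by a different technique: sort the team/result pairs so identical pairs are adjacent, then let `uniq -c` count them. This is a sketch assuming a simple comma-separated file (no quoted commas) with the team in column 1, the result in column 4, and exactly one header line; the function name `count_with_uniq` is hypothetical. Note that its output format (a padded count followed by `team,result`) differs from the awk version:

```shell
#!/bin/sh
# Hypothetical helper: count team/result pairs with sort | uniq -c.
# Assumes a simple CSV (no quoted commas): team in column 1, result in column 4,
# exactly one header line, and a single file (or stdin) as input.
count_with_uniq() {
    tail -n +2 "$@" |         # drop the header line
        cut -s -d, -f1,4 |    # keep "team,result"; -s drops lines with no comma (e.g. blank lines)
        sort |                # put identical pairs next to each other, ordered by team then result
        uniq -c               # prefix each distinct pair with its count
}

# Demo on the question's sample rows, rewritten as comma-separated values.
printf '%s\n' \
    'Team,Other Data,More Data,Result,Time' \
    'Knicks,A,F,Loss,2p' \
    'Celtics,B,E,Win,2p' \
    'Lakers,C,D,Loss,3p' \
    'Lakers,D,C,Loss,4p' \
    'Knicks,E,B,Win,4p' \
    'Lakers,F,A,Win,5p' |
    count_with_uniq
```

Because `sort` runs before `uniq -c`, the result is already ordered by team, so no separate sort step is needed; the trade-off is that the awk version can reshape the output more freely.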