Shell-Script
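How to parse stdout that is a mix of CSV and JSON?
I'm currently taking a course where we submit our code to an autograder, which then returns our results. The format it returns is hard to parse visually, so I'd like to write a script I can use in a pipeline to make it easier to read.
Here is the autograder's output: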
Problem,Correct?,Correct Answer,Agent's Answer
"Challenge Problem B-04",0,4,-1
"Basic Problem B-12",0,1,-1
"Challenge Problem B-05",0,6,-1
"Challenge Problem B-07",0,6,-1
"Challenge Problem B-06",0,3,-1
"Basic Problem B-11",0,1,-1
"Basic Problem B-10",0,3,-1
"Challenge Problem B-03",0,3,-1
"Challenge Problem B-02",0,1,-1
"Challenge Problem B-01",0,6,-1
"Challenge Problem B-09",0,4,-1
"Challenge Problem B-08",0,4,-1
"Basic Problem B-08",0,6,-1
"Basic Problem B-09",0,5,-1
"Basic Problem B-04",0,3,-1
"Basic Problem B-05",0,4,-1
"Basic Problem B-06",0,5,-1
"Basic Problem B-07",0,6,-1
"Basic Problem B-01",0,2,-1
"Basic Problem B-02",0,5,-1
"Basic Problem B-03",0,1,-1
"Challenge Problem B-10",0,4,-1
"Challenge Problem B-11",0,5,-1
"Challenge Problem B-12",0,1,-1
{
    "Basic Problems B": {
        "Incorrect": "0",
        "Skipped": "12",
        "Correct": "0",
        "Set": "Basic Problems B"
    },
    "Challenge Problems B": {
        "Incorrect": "0",
        "Skipped": "12",
        "Correct": "0",
        "Set": "Challenge Problems B"
    }
}
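It's a mix of comma-separated values and JSON. It would be nice to put all of this into a neat table I can actually read.
Currently, I have something like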
python submit.py --provider gt --assignment error-check | column -t -s, | less -S
which outputs:
{
    "Basic Problems B": {
        "Incorrect": "0",
        "Skipped": "12",
        "Correct": "0",
        "Set": "Basic Problems B"
    },
    "Challenge Problems B": {
        "Incorrect": "0",
        "Skipped": "12",
        "Correct": "0",
        "Set": "Challenge Problems B"
    }
}
Problem                   Correct?  Correct Answer  Agent's Answer
"Challenge Problem B-04"  0         4               -1
"Basic Problem B-12"      0         1               -1
"Challenge Problem B-05"  0         6               -1
"Challenge Problem B-07"  0         6               -1
"Challenge Problem B-06"  0         3               -1
"Basic Problem B-11"      0         1               -1
"Basic Problem B-10"      0         3               -1
"Challenge Problem B-03"  0         3               -1
"Challenge Problem B-02"  0         1               -1
"Challenge Problem B-01"  0         6               -1
"Challenge Problem B-09"  0         4               -1
"Challenge Problem B-08"  0         4               -1
"Basic Problem B-08"      0         6               -1
"Basic Problem B-09"      0         5               -1
"Basic Problem B-04"      0         3               -1
"Basic Problem B-05"      0         4               -1
"Basic Problem B-06"      0         5               -1
"Basic Problem B-07"      0         6               -1
"Basic Problem B-01"      0         2               -1
"Basic Problem B-02"      0         5               -1
"Basic Problem B-03"      0         1               -1
"Challenge Problem B-10"  0         4               -1
"Challenge Problem B-11"  0         5               -1
"Challenge Problem B-12"  0         1               -1
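This gets me most of the way there. Now I'm wondering whether there is a way to handle the JSON.
I can't rely on splitting the output at a fixed line number, but I think I could split it at the first occurrence of {. I'd like to do this with as little as possible so that I can share it with classmates; the fewer dependencies, the better.
I've seen other JSON-parsing posts that suggest using external code.
The ideal output would look like this: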
Problem                   Correct?  Correct Answer  Agent's Answer
"Challenge Problem B-04"  0         4               -1
"Basic Problem B-12"      0         1               -1
"Challenge Problem B-05"  0         6               -1
"Challenge Problem B-07"  0         6               -1
"Challenge Problem B-06"  0         3               -1
"Basic Problem B-11"      0         1               -1
"Basic Problem B-10"      0         3               -1
"Challenge Problem B-03"  0         3               -1
"Challenge Problem B-02"  0         1               -1
"Challenge Problem B-01"  0         6               -1
"Challenge Problem B-09"  0         4               -1
"Challenge Problem B-08"  0         4               -1
"Basic Problem B-08"      0         6               -1
"Basic Problem B-09"      0         5               -1
"Basic Problem B-04"      0         3               -1
"Basic Problem B-05"      0         4               -1
"Basic Problem B-06"      0         5               -1
"Basic Problem B-07"      0         6               -1
"Basic Problem B-01"      0         2               -1
"Basic Problem B-02"      0         5               -1
"Basic Problem B-03"      0         1               -1
"Challenge Problem B-10"  0         4               -1
"Challenge Problem B-11"  0         5               -1
"Challenge Problem B-12"  0         1               -1

Set                   Incorrect  Skipped  Correct
Basic Problems B      0          12       0
Challenge Problems B  0          12       0
Separating the JSON from the rest of the output is very easy. This gives you only the non-JSON part:
python submit.py --provider gt --assignment error-check | sed '/{/,$d'
And this gives you only the JSON:
python submit.py --provider gt --assignment error-check | sed -n '/{/,$p'
To illustrate, I saved your sample input as file, and:
$ sed '/{/,$d' file
Problem,Correct?,Correct Answer,Agent's Answer
"Challenge Problem B-04",0,4,-1
"Basic Problem B-12",0,1,-1
"Challenge Problem B-05",0,6,-1
"Challenge Problem B-07",0,6,-1
"Challenge Problem B-06",0,3,-1
"Basic Problem B-11",0,1,-1
"Basic Problem B-10",0,3,-1
"Challenge Problem B-03",0,3,-1
"Challenge Problem B-02",0,1,-1
"Challenge Problem B-01",0,6,-1
"Challenge Problem B-09",0,4,-1
"Challenge Problem B-08",0,4,-1
"Basic Problem B-08",0,6,-1
"Basic Problem B-09",0,5,-1
"Basic Problem B-04",0,3,-1
"Basic Problem B-05",0,4,-1
"Basic Problem B-06",0,5,-1
"Basic Problem B-07",0,6,-1
"Basic Problem B-01",0,2,-1
"Basic Problem B-02",0,5,-1
"Basic Problem B-03",0,1,-1
"Challenge Problem B-10",0,4,-1
"Challenge Problem B-11",0,5,-1
"Challenge Problem B-12",0,1,-1
and:
$ sed -n '/{/,$p' file
{
    "Basic Problems B": {
        "Incorrect": "0",
        "Skipped": "12",
        "Correct": "0",
        "Set": "Basic Problems B"
    },
    "Challenge Problems B": {
        "Incorrect": "0",
        "Skipped": "12",
        "Correct": "0",
        "Set": "Challenge Problems B"
    }
}
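Both sed commands need the complete output, so the autograder either has to run twice or its output has to be saved somewhere first. As an alternative, here is a minimal sketch that splits the stream in a single pass with plain POSIX awk. It makes the same assumption as the sed commands: the JSON starts on the first line containing "{", and the CSV rows never contain one. The function name split_output and the file names in the example are hypothetical.

```shell
# Hypothetical helper: split_output CSV_FILE JSON_FILE
# Reads the autograder output on stdin. Lines before the first line that
# contains "{" go to CSV_FILE; that line and everything after it go to JSON_FILE.
# Assumes the CSV rows themselves never contain "{".
split_output() {
    awk -v csv="$1" -v json="$2" '
        index($0, "{") > 0 { injson = 1 }    # first "{" switches to JSON for good
        { print > (injson ? json : csv) }    # route each line to the right file
    '
}

# Example:
#   python submit.py --provider gt --assignment error-check | split_output results.csv summary.json
#   column -t -s, results.csv
```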
You're already handling the non-JSON part well, so I won't change that. Ideally, JSON data should be parsed with a proper JSON parser such as jq. Sadly, I don't know jq well enough to do this properly, so the best I could come up with is this rather inelegant solution. At least it does what you want (replace cat file with your python submit.py --provider gt --assignment error-check command):
$ cat file | sed -n 's/[,"]//g; s/^ *//; /{/,$p' | tac | awk -F': ' 'BEGIN{printf "%-30s%-10s%-10s%-10s\n", "Set", "Incorrect", "Skipped", "Correct"} NF==2 && !/\{/{if($1=="Set"){set=$2;data[set]["Incorrect"] = 0;data[set]["Skipped"] = 0;data[set]["Correct"] = 0;} data[set][$1]=$2}END{for(set in data){printf "%-30s%-10s%-10s%-10s\n", set,data[set]["Incorrect"],data[set]["Skipped"],data[set]["Correct"]}}'
Set                           Incorrect Skipped   Correct
Challenge Problems B          0         12        0
Basic Problems B              0         12        0
Putting all of this into a shell script gives:
#!/bin/bash

tmpFile=$(mktemp)
python submit.py --provider gt --assignment error-check > "$tmpFile";

sed '/{/,$d' "$tmpFile" | column -t -s,

sed -n 's/[,"]//g; s/^ *//; /{/,$p' "$tmpFile" | tac | awk -F': ' '
    BEGIN{
        printf "%-30s%-10s%-10s%-10s\n", "Set", "Incorrect", "Skipped", "Correct"
    }
    NF==2 && !/\{/{
        if($1=="Set"){
            set=$2;
            data[set]["Incorrect"] = 0;
            data[set]["Skipped"] = 0;
            data[set]["Correct"] = 0;
        }
        data[set][$1]=$2
    }
    END{
        for(set in data){
            printf "%-30s%-10s%-10s%-10s\n", set, data[set]["Incorrect"], data[set]["Skipped"], data[set]["Correct"]
        }
    }'
rm "$tmpFile"
which produces the following output:
$ foo.sh
Problem                   Correct?  Correct Answer  Agent's Answer
"Challenge Problem B-04"  0         4               -1
"Basic Problem B-12"      0         1               -1
"Challenge Problem B-05"  0         6               -1
"Challenge Problem B-07"  0         6               -1
"Challenge Problem B-06"  0         3               -1
"Basic Problem B-11"      0         1               -1
"Basic Problem B-10"      0         3               -1
"Challenge Problem B-03"  0         3               -1
"Challenge Problem B-02"  0         1               -1
"Challenge Problem B-01"  0         6               -1
"Challenge Problem B-09"  0         4               -1
"Challenge Problem B-08"  0         4               -1
"Basic Problem B-08"      0         6               -1
"Basic Problem B-09"      0         5               -1
"Basic Problem B-04"      0         3               -1
"Basic Problem B-05"      0         4               -1
"Basic Problem B-06"      0         5               -1
"Basic Problem B-07"      0         6               -1
"Basic Problem B-01"      0         2               -1
"Basic Problem B-02"      0         5               -1
"Basic Problem B-03"      0         1               -1
"Challenge Problem B-10"  0         4               -1
"Challenge Problem B-11"  0         5               -1
"Challenge Problem B-12"  0         1               -1
Set                           Incorrect Skipped   Correct
Challenge Problems B          0         12        0
Basic Problems B              0         12        0
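One caveat if this is meant to be shared with classmates: data[set]["Incorrect"] uses GNU awk's arrays of arrays, which mawk and the BSD awk shipped with macOS don't support, and tac isn't available on macOS by default either. Below is a minimal sketch of the same JSON step using only POSIX sed and awk. It assumes, like the script above, that the JSON is pretty-printed with one "key": "value" pair per line; instead of reversing the input, it remembers the pairs of the current object and prints a row when that object's closing brace arrives. The function name summary_table is hypothetical.

```shell
# Hypothetical helper: summary_table
# Reads the JSON part (e.g. the output of: sed -n '/{/,$p' file) on stdin and
# prints one row per problem set, in input order.
# Assumes pretty-printed JSON with one key/value pair per line, as in the sample.
summary_table() {
    sed 's/[",]//g; s/^[[:space:]]*//' |
    awk -F': ' '
        BEGIN { printf "%-30s%-10s%-10s%-10s\n", "Set", "Incorrect", "Skipped", "Correct" }
        # Key/value lines such as "Skipped: 12"; object headers contain "{"
        NF == 2 && index($0, "{") == 0 { val[$1] = $2 }
        # A lone "}" closes an object: print it if it carried a Set key.
        # The outermost "}" finds val already emptied, so it prints nothing.
        $0 == "}" {
            if ("Set" in val)
                printf "%-30s%-10s%-10s%-10s\n", val["Set"], val["Incorrect"], val["Skipped"], val["Correct"]
            split("", val)    # portable way to empty the array
        }
    '
}

# Example:
#   sed -n '/{/,$p' "$tmpFile" | summary_table
```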
It feels hacky, though; I hope someone can come up with a cleaner solution using a dedicated JSON parser.
Steeldriver kindly gave a proper jq solution in the comments, so if we combine it with the above, we get something simpler (and safer):
#!/bin/bash

tmpFile=$(mktemp)
python submit.py --provider gt --assignment error-check > "$tmpFile";

sed '/{/,$d' "$tmpFile" | column -t -s,

sed -n '/{/,$p' "$tmpFile" |
    jq -r '["Set","Incorrect","Skipped","Correct"], (.[] | [.Set,.Incorrect,.Skipped,.Correct]) | @tsv'
rm "$tmpFile"
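One optional refinement, which is my own assumption and not part of Steeldriver's comment: @tsv separates fields with tabs, so the short "Set" header and the longer set names don't line up. Piping the TSV through column -t with a tab separator aligns it like the CSV part. The sketch below also wraps the script in a function that reads from stdin, so it can sit in the pipeline from the question. The name pretty_results is hypothetical; it still needs jq and column to be installed.

```shell
# Hypothetical wrapper: pretty_results
# Reads the autograder output on stdin; same sed/jq pipeline as above, with
# jq's tab-separated output aligned by column. Requires bash (local, $'\t').
pretty_results() {
    local tmpFile
    tmpFile=$(mktemp) || return 1
    cat > "$tmpFile"

    # CSV part: everything before the first line containing "{"
    sed '/{/,$d' "$tmpFile" | column -t -s,
    echo

    # JSON part: header row plus one row per problem set, as TSV,
    # then aligned on the tab characters
    sed -n '/{/,$p' "$tmpFile" |
        jq -r '["Set","Incorrect","Skipped","Correct"], (.[] | [.Set,.Incorrect,.Skipped,.Correct]) | @tsv' |
        column -t -s $'\t'

    rm -f "$tmpFile"
}

# Example:
#   python submit.py --provider gt --assignment error-check | pretty_results | less -S
```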