Shell-Script

How to parse stdout that is a mix of CSV and JSON?

  • January 13, 2020

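I'm currently taking a course in which we submit our code to an autograder, which then returns our results. The format it returns is hard to parse visually, so I'd like to write a script that I can use in a pipeline to make it easier to read.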

Here is the autograder's output:

Problem,Correct?,Correct Answer,Agent's Answer
"Challenge Problem B-04",0,4,-1
"Basic Problem B-12",0,1,-1
"Challenge Problem B-05",0,6,-1
"Challenge Problem B-07",0,6,-1
"Challenge Problem B-06",0,3,-1
"Basic Problem B-11",0,1,-1
"Basic Problem B-10",0,3,-1
"Challenge Problem B-03",0,3,-1
"Challenge Problem B-02",0,1,-1
"Challenge Problem B-01",0,6,-1
"Challenge Problem B-09",0,4,-1
"Challenge Problem B-08",0,4,-1
"Basic Problem B-08",0,6,-1
"Basic Problem B-09",0,5,-1
"Basic Problem B-04",0,3,-1
"Basic Problem B-05",0,4,-1
"Basic Problem B-06",0,5,-1
"Basic Problem B-07",0,6,-1
"Basic Problem B-01",0,2,-1
"Basic Problem B-02",0,5,-1
"Basic Problem B-03",0,1,-1
"Challenge Problem B-10",0,4,-1
"Challenge Problem B-11",0,5,-1
"Challenge Problem B-12",0,1,-1
{
   "Basic Problems B": {
       "Incorrect": "0",
       "Skipped": "12",
       "Correct": "0",
       "Set": "Basic Problems B"
   },
   "Challenge Problems B": {
       "Incorrect": "0",
       "Skipped": "12",
       "Correct": "0",
       "Set": "Challenge Problems B"
   }
}

It's a mix of comma-separated values and JSON. It would be nice to put all of this into a neat table that I can read.

Currently, I have something like

python submit.py --provider gt --assignment error-check | column -t -s, | less -S

which outputs:

{
   "Basic Problems B": {
       "Incorrect": "0",
       "Skipped": "12",
       "Correct": "0",
       "Set": "Basic Problems B"
   },
   "Challenge Problems B": {
       "Incorrect": "0",
       "Skipped": "12",
       "Correct": "0",
       "Set": "Challenge Problems B"
   }
}
Problem                   Correct?  Correct Answer  Agent's Answer
"Challenge Problem B-04"  0         4               -1
"Basic Problem B-12"      0         1               -1
"Challenge Problem B-05"  0         6               -1
"Challenge Problem B-07"  0         6               -1
"Challenge Problem B-06"  0         3               -1
"Basic Problem B-11"      0         1               -1
"Basic Problem B-10"      0         3               -1
"Challenge Problem B-03"  0         3               -1
"Challenge Problem B-02"  0         1               -1
"Challenge Problem B-01"  0         6               -1
"Challenge Problem B-09"  0         4               -1
"Challenge Problem B-08"  0         4               -1
"Basic Problem B-08"      0         6               -1
"Basic Problem B-09"      0         5               -1
"Basic Problem B-04"      0         3               -1
"Basic Problem B-05"      0         4               -1
"Basic Problem B-06"      0         5               -1
"Basic Problem B-07"      0         6               -1
"Basic Problem B-01"      0         2               -1
"Basic Problem B-02"      0         5               -1
"Basic Problem B-03"      0         1               -1
"Challenge Problem B-10"  0         4               -1
"Challenge Problem B-11"  0         5               -1
"Challenge Problem B-12"  0         1               -1

This gets me most of the way there. Now I'm wondering whether there is a way to handle the JSON as well.

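I can't rely on splitting the output at a fixed line number, but I suppose I could split it at the first line where a { appears.

I'd like to do this with as little as possible so that I can share it with my classmates, so the fewer dependencies, the better.

I've seen other posts about JSON parsing that suggest using external tools.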

The ideal output would look like this:

Problem                   Correct?  Correct Answer  Agent's Answer
"Challenge Problem B-04"  0         4               -1
"Basic Problem B-12"      0         1               -1
"Challenge Problem B-05"  0         6               -1
"Challenge Problem B-07"  0         6               -1
"Challenge Problem B-06"  0         3               -1
"Basic Problem B-11"      0         1               -1
"Basic Problem B-10"      0         3               -1
"Challenge Problem B-03"  0         3               -1
"Challenge Problem B-02"  0         1               -1
"Challenge Problem B-01"  0         6               -1
"Challenge Problem B-09"  0         4               -1
"Challenge Problem B-08"  0         4               -1
"Basic Problem B-08"      0         6               -1
"Basic Problem B-09"      0         5               -1
"Basic Problem B-04"      0         3               -1
"Basic Problem B-05"      0         4               -1
"Basic Problem B-06"      0         5               -1
"Basic Problem B-07"      0         6               -1
"Basic Problem B-01"      0         2               -1
"Basic Problem B-02"      0         5               -1
"Basic Problem B-03"      0         1               -1
"Challenge Problem B-10"  0         4               -1
"Challenge Problem B-11"  0         5               -1
"Challenge Problem B-12"  0         1               -1

Set                   Incorrect Skipped Correct
Basic Problems B      0         12      0
Challenge Problems B  0         12      0

Separating the JSON from the non-JSON is very easy. This gives you only the non-JSON part:

python submit.py --provider gt --assignment error-check | sed '/{/,$d' 

And this gives you only the JSON:

python submit.py --provider gt --assignment error-check | sed -n '/{/,$p' 

To illustrate, I saved your sample input as file and ran:

$ sed '/{/,$d' file
Problem,Correct?,Correct Answer,Agent's Answer
"Challenge Problem B-04",0,4,-1
"Basic Problem B-12",0,1,-1
"Challenge Problem B-05",0,6,-1
"Challenge Problem B-07",0,6,-1
"Challenge Problem B-06",0,3,-1
"Basic Problem B-11",0,1,-1
"Basic Problem B-10",0,3,-1
"Challenge Problem B-03",0,3,-1
"Challenge Problem B-02",0,1,-1
"Challenge Problem B-01",0,6,-1
"Challenge Problem B-09",0,4,-1
"Challenge Problem B-08",0,4,-1
"Basic Problem B-08",0,6,-1
"Basic Problem B-09",0,5,-1
"Basic Problem B-04",0,3,-1
"Basic Problem B-05",0,4,-1
"Basic Problem B-06",0,5,-1
"Basic Problem B-07",0,6,-1
"Basic Problem B-01",0,2,-1
"Basic Problem B-02",0,5,-1
"Basic Problem B-03",0,1,-1
"Challenge Problem B-10",0,4,-1
"Challenge Problem B-11",0,5,-1
"Challenge Problem B-12",0,1,-1

$ sed -n '/{/,$p' file
{
   "Basic Problems B": {
       "Incorrect": "0",
       "Skipped": "12",
       "Correct": "0",
       "Set": "Basic Problems B"
   },
   "Challenge Problems B": {
       "Incorrect": "0",
       "Skipped": "12",
       "Correct": "0",
       "Set": "Challenge Problems B"
   }
}
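One possible refinement, offered as a sketch rather than a fix: /{/ matches the first line containing a { anywhere, so a problem name with a brace in it would end the CSV part early. Assuming the JSON block always starts with { in the first column (as it does in the sample above), anchoring the pattern with ^ avoids that. The function names csv_part and json_part are hypothetical, chosen only for illustration.

```shell
#!/bin/sh
# Hypothetical helpers: split the autograder output at the first line that
# *starts* with "{" (assumes the JSON block always begins in column 1).

# Print everything before that line (the CSV part).
csv_part() {
  sed '/^{/,$d'
}

# Print that line and everything after it (the JSON part).
json_part() {
  sed -n '/^{/,$p'
}
```

For example, python submit.py --provider gt --assignment error-check | csv_part | column -t -s, behaves like the earlier pipeline but ignores a { that appears inside a CSV field.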

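Now, you already handle the non-JSON part well, so I won't change that. Ideally, JSON data should be parsed with a JSON parser such as jq. Sadly, I don't know jq well enough to do that properly, so the best I could come up with is this rather inelegant solution. At least it does what you want (replace cat file with your python submit.py --provider gt --assignment error-check command). Note that it relies on GNU awk (gawk) for the data[set][key] arrays of arrays, and on tac from GNU coreutils: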

$ cat file | sed -n 's/[,"]//g; s/^ *//; /{/,$p'  | tac | awk -F': ' 'BEGIN{printf "%-30s%-10s%-10s%-10s\n", "Set", "Incorrect", "Skipped", "Correct"} NF==2 && !/\{/{if($1=="Set"){set=$2;data[set]["Incorrect"] = 0;data[set]["Skipped"] = 0;data[set]["Correct"] = 0;} data[set][$1]=$2}END{for(set in data){printf "%-30s%-10s%-10s%-10s\n", set,data[set]["Incorrect"],data[set]["Skipped"],data[set]["Correct"]}}' 
Set                           Incorrect Skipped   Correct   
Challenge Problems B          0         12        0         
Basic Problems B              0         12        0      

Putting all of this in a shell script gives:

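#!/bin/bash

tmpFile=$(mktemp)
python submit.py --provider gt --assignment error-check > "$tmpFile"

# Non-JSON part: everything before the first line containing "{", aligned on commas
sed '/{/,$d' "$tmpFile" | column -t -s, 
# JSON part: strip quotes, commas and indentation, then reverse the lines (tac)
# so that each "Set: ..." line comes before the other fields of its block
sed -n 's/[,"]//g; s/^ *//; /{/,$p' "$tmpFile" |
 tac |
 awk -F': ' '
   BEGIN{
     printf "%-30s%-10s%-10s%-10s\n", "Set", "Incorrect", "Skipped", "Correct"
   }
   # "key: value" lines, skipping the "Name: {" lines that open a block
   NF==2 && !/\{/{
     if($1=="Set"){
        set=$2;
        data[set]["Incorrect"] = 0;
        data[set]["Skipped"] = 0;
        data[set]["Correct"] = 0;
     } 
     data[set][$1]=$2
   }
   END{
      for(set in data){
        printf "%-30s%-10s%-10s%-10s\n", set, 
                                    data[set]["Incorrect"], 
                                    data[set]["Skipped"], 
                                    data[set]["Correct"]}
   }' 
rm "$tmpFile"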

This produces the following output:

$ foo.sh
Problem                   Correct?  Correct Answer  Agent's Answer
"Challenge Problem B-04"  0         4               -1
"Basic Problem B-12"      0         1               -1
"Challenge Problem B-05"  0         6               -1
"Challenge Problem B-07"  0         6               -1
"Challenge Problem B-06"  0         3               -1
"Basic Problem B-11"      0         1               -1
"Basic Problem B-10"      0         3               -1
"Challenge Problem B-03"  0         3               -1
"Challenge Problem B-02"  0         1               -1
"Challenge Problem B-01"  0         6               -1
"Challenge Problem B-09"  0         4               -1
"Challenge Problem B-08"  0         4               -1
"Basic Problem B-08"      0         6               -1
"Basic Problem B-09"      0         5               -1
"Basic Problem B-04"      0         3               -1
"Basic Problem B-05"      0         4               -1
"Basic Problem B-06"      0         5               -1
"Basic Problem B-07"      0         6               -1
"Basic Problem B-01"      0         2               -1
"Basic Problem B-02"      0         5               -1
"Basic Problem B-03"      0         1               -1
"Challenge Problem B-10"  0         4               -1
"Challenge Problem B-11"  0         5               -1
"Challenge Problem B-12"  0         1               -1
Set                           Incorrect Skipped   Correct   
Challenge Problems B          0         12        0         
Basic Problems B              0         12        0         
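Since the goal is to share this with classmates, portability matters: arrays of arrays are a gawk extension, so the awk program above fails under, for example, mawk (the default awk on Debian and Ubuntu) or the awk on a stock macOS, which also has no tac. The following is a sketch under the assumption that the JSON keeps one "key": "value" pair per line, as in the sample; it has not been tried against the real autograder. It uses only POSIX awk: it collects the pairs of each inner object and prints a row when that object's closing } is reached, so neither tac nor the position of the "Set" key matters. The name json_summary is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read the JSON part of the autograder output on stdin and
# print one aligned row per problem set. Uses only POSIX awk features (no gawk
# arrays of arrays) and no tac. Assumes one "key": "value" pair per line.
json_summary() {
  awk -F': ' '
    BEGIN {
      printf "%-30s%-10s%-10s%-10s\n", "Set", "Incorrect", "Skipped", "Correct"
    }
    {
      gsub(/[",]/, "")       # drop quotes and commas (this re-splits $0 on FS)
      sub(/^[ \t]+/, "")     # drop indentation
    }
    # A "key: value" line; skip the "Name: {" lines that open a block
    NF == 2 && $2 != "{" {
      val[$1] = $2
    }
    # End of an inner object: print its row, then forget its values
    $0 == "}" && ("Set" in val) {
      printf "%-30s%-10s%-10s%-10s\n", val["Set"], val["Incorrect"], val["Skipped"], val["Correct"]
      for (k in val) delete val[k]
    }
  '
}
```

It would be used as python submit.py --provider gt --assignment error-check | sed -n '/{/,$p' | json_summary. Because rows are printed as each object closes, the sets appear in the order the autograder lists them rather than in awk's unspecified for (key in array) order.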

It feels hacky, though, and I hope someone can come up with a cleaner solution using a dedicated JSON parser.


Steeldriver kindly gave a proper jq solution in the comments, so if we combine it with the above, we get something simpler (and safer):

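#!/bin/bash

tmpFile=$(mktemp)
# Delete the temporary file however the script exits
trap 'rm -f "$tmpFile"' EXIT
python submit.py --provider gt --assignment error-check > "$tmpFile"

# Non-JSON part: everything before the first "{", aligned on commas
sed '/{/,$d' "$tmpFile" | column -t -s, 
echo
# JSON part: jq emits a header row plus one row per set as tab-separated
# values; column then aligns them on the tabs
sed -n '/{/,$p' "$tmpFile" | 
 jq -r '["Set","Incorrect","Skipped","Correct"], (.[] | [.Set,.Incorrect,.Skipped,.Correct]) | @tsv' |
 column -t -s $'\t'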

Source: https://unix.stackexchange.com/questions/561467