Awk

Formatting data as a table

  • June 28, 2021

How can I take these details and convert them into a horizontal (one row per record) form?

Each record ends after Couse. Couse will never be blank or null.

Note: these four headers apply to the data below: Name, City, Age, Couse

In the second record there is no "Name" entry (it is missing), so null should take its place, with the remaining values appended after it, separated by pipes, like this: null | Ors | 11 | MB

I have data like the following in the file demo.txt:

"Name":"asxadadad  ,aaf dsf"
"City":"Mum"
"Age":"23"
"Couse":"BBS"
"City":"Ors"
"Age":"11"
"Couse":"MB"
"Name":"adad sf"
"City":"Kol"
"Age":"21"
"Couse":"BB"
"Name":"pqr"
"Age":"21"
"Couse":"NN"

Expected output:

asxadadad  ,aaf dsf | Mum  | 23 | BBS
null                | Ors  | 11 | MB
adad sf             | Kol  | 21 | BB
pqr                 | null | 21 | NN

I tried the following code, but it does not do what I want:

counter=0
var_0='Couse'

while read -r line
do
  echo "$line"

  counter=$(( counter + 1 ))

  var_1=`echo "$line" | grep -oh "Couse"`

  if [ $var_0 == $var_1 ]
  then
       head -$counter demo.txt > temp.txt
       sed -i '1,$counter' demo.txt
       counter = 0
  else
       echo "No thing to do"
  fi

done < demo.txt

Using any awk in any shell on every Unix box:

$ cat tst.awk
BEGIN {
   numTags = split("Name City Age Couse",nums2tags)
   for (tagNr=1; tagNr<=numTags; tagNr++) {
       tag = nums2tags[tagNr]
       tags2nums[tag] = tagNr
       wids[tagNr] = ( length(tag) > length("null") ? length(tag) : length("null") )
   }
   OFS=" | "
}
(NR==1) || (prevTag=="Couse") {
   numRecs++
}
{
   gsub(/^"|"$/,"")
   tag = val = $0
   sub(/".*/,"",tag)
   sub(/[^"]+":"/,"",val)

   tagNr = tags2nums[tag]
   vals[numRecs,tagNr] = val

   wid = length(val)
   wids[tagNr] = ( wid > wids[tagNr] ? wid : wids[tagNr] )

   prevTag = tag
}
END {
   # Uncomment these 3 lines if you'd like a header line printed:
   # for (tagNr=1; tagNr<=numTags; tagNr++) {
   #   printf "%-*s%s", wids[tagNr], nums2tags[tagNr], (tagNr<numTags ? OFS : ORS)
   # }

   for (recNr=1; recNr<=numRecs; recNr++) {
       for (tagNr=1; tagNr<=numTags; tagNr++) {
           val = ( (recNr,tagNr) in vals ? vals[recNr,tagNr] : "null" )
           printf "%-*s%s", wids[tagNr], val, (tagNr<numTags ? OFS : ORS)
       }
   }
}
$ awk -f tst.awk file
asxadadad  ,aaf dsf | Mum  | 23   | BBS
null                | Ors  | 11   | MB
adad sf             | Kol  | 21   | BB
pqr                 | null | 21   | NN
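The parsing step in the script above can be tried on its own. The following is a minimal sketch, not part of the original answer: `split_line` is a hypothetical helper name, and it applies the same `gsub`/`sub` calls to a single line and prints the tag and value they produce.

```shell
# Hypothetical helper (not in the original answer): show how one input
# line is split into a tag and a value by the gsub/sub calls of tst.awk.
split_line() {
    printf '%s\n' "$1" |
    awk '{
        gsub(/^"|"$/,"")          # strip the outermost double quotes
        tag = val = $0
        sub(/".*/,"",tag)         # tag: everything before the first remaining quote
        sub(/[^"]+":"/,"",val)    # val: everything after the tag, colon and quote
        printf "tag=[%s] val=[%s]\n", tag, val
    }'
}

split_line '"Name":"asxadadad  ,aaf dsf"'   # tag=[Name] val=[asxadadad  ,aaf dsf]
split_line '"City":"Mum"'                   # tag=[City] val=[Mum]
```

Because the value is taken as everything after the first `":"`, commas and spaces inside a value are preserved as-is.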

Or, if you don't want to use a hard-coded list of tags (field/column names):

$ cat tst.awk
BEGIN { OFS=" | " }
(NR==1) || (prevTag=="Couse") {
   numRecs++
}
{
   gsub(/^"|"$/,"")
   tag = val = $0
   sub(/".*/,"",tag)
   sub(/[^"]+":"/,"",val)

   if ( !(tag in tags2nums) ) {
       tagNr = ++numTags
       tags2nums[tag] = tagNr
       nums2tags[tagNr] = tag
       wids[tagNr] = ( length(tag) > length("null") ? length(tag) : length("null") )
   }

   tagNr = tags2nums[tag]
   vals[numRecs,tagNr] = val

   wid = length(val)
   wids[tagNr] = ( wid > wids[tagNr] ? wid : wids[tagNr] )

   prevTag = tag
}
END {
   for (tagNr=1; tagNr<=numTags; tagNr++) {
       printf "%-*s%s", wids[tagNr], nums2tags[tagNr], (tagNr<numTags ? OFS : ORS)
   }

   for (recNr=1; recNr<=numRecs; recNr++) {
       for (tagNr=1; tagNr<=numTags; tagNr++) {
           val = ( (recNr,tagNr) in vals ? vals[recNr,tagNr] : "null" )
           printf "%-*s%s", wids[tagNr], val, (tagNr<numTags ? OFS : ORS)
       }
   }
}
$ awk -f tst.awk file
Name                | City | Age  | Couse
asxadadad  ,aaf dsf | Mum  | 23   | BBS
null                | Ors  | 11   | MB
adad sf             | Kol  | 21   | BB
pqr                 | null | 21   | NN

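Note that the order of the columns in the output of the second script will be the order in which the tags first appear in the input. That is why it needs the header line to identify the values, unless all tags are guaranteed to appear in the input in the order you want them output.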
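To illustrate the ordering caveat, here is a small sketch that feeds the same logic as the second script a hypothetical input whose first record lacks Name. The input lines and the function name `reordered_table` are assumptions made for this example; the awk body is a condensed copy of the script above.

```shell
# Sketch only: hypothetical input in which City is the first tag seen,
# so City becomes column 1 and Name ends up last.
reordered_table() {
    printf '%s\n' \
        '"City":"Ors"' '"Age":"11"' '"Couse":"MB"' \
        '"Name":"pqr"' '"Age":"21"' '"Couse":"NN"' |
    awk '
    BEGIN { OFS=" | " }
    (NR==1) || (prevTag=="Couse") { numRecs++ }
    {
        gsub(/^"|"$/,"")
        tag = val = $0
        sub(/".*/,"",tag)
        sub(/[^"]+":"/,"",val)
        if ( !(tag in tags2nums) ) {      # first sighting fixes the column position
            tags2nums[tag] = ++numTags
            nums2tags[numTags] = tag
            wids[numTags] = ( length(tag) > length("null") ? length(tag) : length("null") )
        }
        tagNr = tags2nums[tag]
        vals[numRecs,tagNr] = val
        if ( length(val) > wids[tagNr] ) wids[tagNr] = length(val)
        prevTag = tag
    }
    END {
        for (tagNr=1; tagNr<=numTags; tagNr++)
            printf "%-*s%s", wids[tagNr], nums2tags[tagNr], (tagNr<numTags ? OFS : ORS)
        for (recNr=1; recNr<=numRecs; recNr++)
            for (tagNr=1; tagNr<=numTags; tagNr++) {
                val = ( (recNr,tagNr) in vals ? vals[recNr,tagNr] : "null" )
                printf "%-*s%s", wids[tagNr], val, (tagNr<numTags ? OFS : ORS)
            }
    }'
}

reordered_table   # header line printed: City | Age  | Couse | Name
```

With this input the header is the only reliable way to tell which column holds which tag, which is the reason the second script always prints it.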

Source: https://unix.stackexchange.com/questions/649799