How to convert a CSV with multiple sub-row items (from a column) into new columns in Awk?

April 30, 2021

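I have a CSV file whose general format is shown in the picture below.
In that CSV, several rows belong to a single column (desc). I would like to extract those items and add them to new columns named name, size, weight and glass, respectively. I have highlighted (in red) the sub-row items of an entry.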
[Image: original structure]
[Image: expected structure]
Original CSV:
page,item,id,desc,price
1,2,F1,Alpha,111.11
1,,,380 x 2280 mm (size),
1,,,Weight: 33.0 kg,
1,,,Glass: AAA,
1,,,XXX,
1,3,F2,Beta,222.22
1,,,880 x 2280 mm (size),
1,,,Weight: 68.4 kg,
1,,,Glass: BBB,
1,,,YYY,
1,4,F3,Gamma,333.33
1,,,1980 x 1580 mm (size),
1,,,Weight: 78.2 kg,
1,,,Glass: CCC,
1,,,XXX,
1,,,YYY,
1,,,ZZZ,
Expected resulting CSV:
page,item,id,name,size,weight,glass,price
1,2,F1,Alpha,380 x 2280,33.0,AAA,111.11
1,3,F2,Beta,880 x 2280,68.4,BBB,222.22
1,4,F3,Gamma,1980 x 1580,78.2,CCC,333.33
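Here name replaces the first line of desc.
UPDATE:
Some awk solutions may work for the case above but fail once a 4th item is added. For thorough testing, consider adding this to the data above: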
1,7,F4,Delta,111.11
1,,,11 x 22 mm (size),
1,,,Weight: 33.0 kg,
1,,,Glass: DDD,
1,,,Random-1,
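So, 3 important points:
The number of sub-rows in the desc column may vary.
Any sub-rows after Glass:... should be ignored.
There may be items without any sub-rows in the desc column; they should be ignored as well.
Q: How can I use Awk to remap those sub-rows into new columns?
(Or is there a more suitable tool to do this in bash?)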
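Possibly related (but not very helpful) questions:
How to join a column with multiple lines using awk
How to split a single column into multiple columns in a CSV file
How to convert rows into columns
Concatenate columns of the same csv file to create a new column with a new header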

awk 'BEGIN{ FS=OFS=","; print "page,item,id,name,size,weight,glass,price" }
   $2!=""{ price=$5; data=$1 FS $2 FS $3 FS $4; desc=""; c=0; next }
         { gsub(/ ?(mm \(size\)|Weight:|kg|Glass:) ?/, "") }
   ++c<=3{ desc=(desc==""?"":desc OFS) $4; next }
   data  { print data, desc, price; data="" }
' infile
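
To try the script above, a minimal reproduction might look like the following. It assumes the sample data from the question, plus the extra fourth item, has been saved under the hypothetical name sample.csv; result.csv is likewise only an illustrative name.

```shell
# Recreate the sample input from the question, including the extra item F4.
# sample.csv and result.csv are hypothetical file names chosen for this sketch.
cat > sample.csv <<'EOF'
page,item,id,desc,price
1,2,F1,Alpha,111.11
1,,,380 x 2280 mm (size),
1,,,Weight: 33.0 kg,
1,,,Glass: AAA,
1,,,XXX,
1,3,F2,Beta,222.22
1,,,880 x 2280 mm (size),
1,,,Weight: 68.4 kg,
1,,,Glass: BBB,
1,,,YYY,
1,4,F3,Gamma,333.33
1,,,1980 x 1580 mm (size),
1,,,Weight: 78.2 kg,
1,,,Glass: CCC,
1,,,XXX,
1,,,YYY,
1,,,ZZZ,
1,7,F4,Delta,111.11
1,,,11 x 22 mm (size),
1,,,Weight: 33.0 kg,
1,,,Glass: DDD,
1,,,Random-1,
EOF

# Same awk program as above, reading sample.csv and writing result.csv.
awk 'BEGIN{ FS=OFS=","; print "page,item,id,name,size,weight,glass,price" }
   $2!=""{ price=$5; data=$1 FS $2 FS $3 FS $4; desc=""; c=0; next }
         { gsub(/ ?(mm \(size\)|Weight:|kg|Glass:) ?/, "") }
   ++c<=3{ desc=(desc==""?"":desc OFS) $4; next }
   data  { print data, desc, price; data="" }
' sample.csv > result.csv

cat result.csv
# page,item,id,name,size,weight,glass,price
# 1,2,F1,Alpha,380 x 2280,33.0,AAA,111.11
# 1,3,F2,Beta,880 x 2280,68.4,BBB,222.22
# 1,4,F3,Gamma,1980 x 1580,78.2,CCC,333.33
# 1,7,F4,Delta,11 x 22,33.0,DDD,111.11
```

Each item in this sample has at least one sub-row after Glass:, and that extra sub-row is what triggers the print.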

The same script with explanatory comments:

awk 'BEGIN{ FS=OFS=","; print "page,item,id,name,size,weight,glass,price" }
    # this block is executed only once, before any line is read, and it:
           # sets FS (Field Separator) and OFS (Output Field Separator) to a comma
           # prints the "header" line

   $2!=""{ price=$5; data=$1 FS $2 FS $3 FS $4; desc=""; c=0; next }
    # this block is executed only when column #2 is not empty, and it:
           # saves column #5 in the "price" variable
           # saves columns #1 to #4 in the "data" variable
           # resets the "desc" variable and the counter variable "c"
           # then reads the next line, skipping the rest of the code

         { gsub(/ ?(mm \(size\)|Weight:|kg|Glass:) ?/, "") }
           # this block runs for every remaining line (the sub-rows) and
           # replaces the strings above with an empty string

   ++c<=3{ desc=(desc==""?"":desc OFS) $4; next }
    # this block runs at most 3 times per item, and it:
           # appends the description in column #4 to "desc"
           # then reads the next line, as long as the counter "c" is <= 3

    data { print data, desc, price; data="" }
    # reached from the fourth sub-row on; if "data" is not empty, it:
           # prints data, desc and price, then empties "data"
' infile
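
One thing worth noting: as written, the script prints an item only when it reaches a fourth sub-row, so an item whose last sub-row is Glass: (followed directly by the next item or by the end of the file) appears to be dropped. The following is a sketch of a variant, not part of the original answer, that prints as soon as the third sub-row has been collected. It still assumes the first three sub-rows are always size, weight and glass, in that order; edge.csv and edge_result.csv are hypothetical file names.

```shell
# Hypothetical edge-case input: no item has a sub-row after "Glass:",
# and item F9 has no sub-rows at all.
cat > edge.csv <<'EOF'
page,item,id,desc,price
1,2,F1,Alpha,111.11
1,,,380 x 2280 mm (size),
1,,,Weight: 33.0 kg,
1,,,Glass: AAA,
1,5,F9,NoSubRows,9.99
1,3,F2,Beta,222.22
1,,,880 x 2280 mm (size),
1,,,Weight: 68.4 kg,
1,,,Glass: BBB,
EOF

awk 'BEGIN { FS = OFS = ","; print "page,item,id,name,size,weight,glass,price" }
     NR == 1    { next }   # skip the input header row
     $2 != ""   { price = $5; data = $1 FS $2 FS $3 FS $4; desc = ""; c = 0; next }
     data == "" { next }   # item already printed (or none started): ignore sub-row
                { gsub(/ ?(mm \(size\)|Weight:|kg|Glass:) ?/, "") }   # strip labels and units
                { desc = (desc == "" ? "" : desc OFS) $4 }            # collect the cleaned value
     ++c == 3   { print data, desc, price; data = "" }                # third sub-row: emit the item
' edge.csv > edge_result.csv

cat edge_result.csv
# page,item,id,name,size,weight,glass,price
# 1,2,F1,Alpha,380 x 2280,33.0,AAA,111.11
# 1,3,F2,Beta,880 x 2280,68.4,BBB,222.22
```

Item F9 is skipped because the next item row overwrites data before anything is printed, which matches the requirement that items without sub-rows be ignored.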

Source: https://unix.stackexchange.com/questions/647233
