
How to split a field in a CSV and copy the row's other fields into new rows

May 31, 2019

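I have a destination that consumes a CSV file. The 6th field contains words, but its maximum length is 16 characters. If the field is longer than 16 characters, I want to duplicate the row and split the field across the copies without breaking any words.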

Current file:

"5","4","3","2","1","XYZ ABCD E"
"1","2","3","4","5","AB CDE F GHI JK LMNOP Q RS TUV W XYZ 12 3456 7890"
"9","8","7","6","5","LMN O PQ R"

Desired output:

"5","4","3","2","1","XYZ ABCD E"
"1","2","3","4","5","AB CDE F GHI JK"
"1","2","3","4","5","LMNOP Q RS TUV W"
"1","2","3","4","5","XYZ 12 3456 7890"
"9","8","7","6","5","LMN O PQ R"

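Using GNU Awk (gawk) to run fold as a coprocess and read its output back with getline into a variable (the "Getline/Variable/Coprocess" form described in the gawk manual):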

gawk -F, '
 BEGIN{
   OFS=FS; 
   cmd="fold -sw 16";
 }

 # if total length (16 + 2 for quotes) is within limit, print as-is
 length($NF) <= 18 {print; next}

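 # otherwise, fold the unquoted field into chunks of at most 16 characters
 {
   # trim the quotes, then send the text to the fold coprocess
   print substr($NF,2,length($NF)-2) |& cmd;
   # close the write end so fold sees EOF and flushes its output
   close(cmd,"to");
   # drop the last field; $0 is rebuilt from the remaining fields with OFS
   NF--;
   # emit one output row per folded chunk
   while((cmd |& getline var) > 0){

     # (optional) trim trailing whitespace
     sub(/[ \t]+$/,"",var);

     print $0, "\"" var "\"" ;
   }
   # close the read end as well, which ends the coprocess
   close(cmd,"from");
 }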
' file.csv

The sub call removes the trailing whitespace that fold leaves at the end of each line it breaks.
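To see what that sub is cleaning up, the snippet below pipes the long sample value (quotes already removed) through fold -s at widths 16 and 17 and uses sed to append a | to every line, which makes trailing spaces visible. The outputs in the comments assume GNU coreutils fold; other fold implementations may choose break points differently.

```shell
# The long 6th-field value from the sample input, without its quotes.
s='AB CDE F GHI JK LMNOP Q RS TUV W XYZ 12 3456 7890'

# Width 16 (GNU fold): four chunks; every broken line keeps a trailing space.
printf '%s\n' "$s" | fold -sw 16 | sed 's/$/|/'
# AB CDE F GHI JK |
# LMNOP Q RS TUV |
# W XYZ 12 3456 |
# 7890|

# Width 17 (GNU fold): three chunks, matching the desired rows once trimmed.
printf '%s\n' "$s" | fold -sw 17 | sed 's/$/|/'
# AB CDE F GHI JK |
# LMNOP Q RS TUV W |
# XYZ 12 3456 7890|
```

At width 16, "LMNOP Q RS TUV W" plus the following space would need 17 columns, so fold breaks before the W, which pushes the remaining words into a fourth chunk.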

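Note that to get exactly the output shown, you need fold -sw17, so that lines break at 16 characters plus the (subsequently removed) trailing space. However, doing so can exceed the 16-character limit on the last line of the folded output, because that line has no trailing space to remove.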
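One way around that trade-off, not part of the original answer, is to skip fold and do a greedy word wrap in awk itself: keep adding words to the current chunk while it stays within 16 characters, then start a new row. The sketch below uses only POSIX awk features, so gawk is not required. It assumes the text is the last field, fields contain no embedded commas or quotes, and no single word is longer than 16 characters (such a word would end up on its own over-long row instead of being cut). The function name wrap_last_field is hypothetical.

```shell
# Hypothetical helper: wrap_last_field [file...]
# Greedy word wrap of the last (quoted) CSV field into chunks of at most
# 16 characters, repeating the other fields on every output row.
# Assumes: plain comma separators (no commas or quotes inside fields)
# and no single word longer than 16 characters.
wrap_last_field() {
  awk -F, -v max=16 '
  {
    # strip the surrounding double quotes from the last field
    text = substr($NF, 2, length($NF) - 2)
    if (length(text) <= max) { print; next }

    # rebuild the leading fields, each followed by the separator
    prefix = ""
    for (i = 1; i < NF; i++) prefix = prefix $i FS

    # split on runs of blanks, then pack words greedily into chunks
    n = split(text, words, " ")
    line = ""
    for (i = 1; i <= n; i++) {
      if (line == "")
        line = words[i]
      else if (length(line) + 1 + length(words[i]) <= max)
        line = line " " words[i]
      else {
        print prefix "\"" line "\""
        line = words[i]
      }
    }
    if (line != "") print prefix "\"" line "\""
  }' "$@"
}
```

Run it as wrap_last_field file.csv. On the sample input it packs "AB CDE F GHI JK", "LMNOP Q RS TUV W" and "XYZ 12 3456 7890" (15, 16 and 16 characters), reproducing the desired output without ever exceeding the limit.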

Source: https://unix.stackexchange.com/questions/521978