
Remove inner double quotes from a comma-separated CSV whose fields are enclosed in double quotes

  • April 3, 2020

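Maybe I'm just unlucky, but my comma-separated CSV file, whose fields are wrapped in double quotes, also has double quotes and commas inside the actual text.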

So I want to turn this:

"record 1","name 1","text 1, text 2"
"record 2","name ""2""","text 2"
"record 3","name 3",""

into this:

"record 1","name 1","text 1, text 2"
"record 2","name 2","text 2"
"record 3","name 3",""

Note that I removed the double quotes from name ""2"" to get name 2, but kept the empty quoted field ,"" on line 3.

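Use csvformat to change the delimiter to a tab (csvformat -T), delete every double quote (tr -d '"'), and then change the delimiter back to a comma while quoting every field (the last stage of the pipeline: -t reads tab-separated input, -U1 quotes all output fields):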

$ csvformat -T file.csv | tr -d '"' | csvformat -t -U1
"record 1","name 1","text 1, text 2"
"record 2","name 2","text 2"
"record 3","name 3",""

csvformat is part of csvkit.

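This works no matter which characters your input contains, with two exceptions: newlines inside quoted fields (a separate problem), and tabs inside fields, because once tr has deleted the quotes that protected a tab, the final csvformat -t treats it as a field separator.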

Using GNU awk for FPAT:

$ awk -v FPAT='("[^"]*")+' -v OFS='","' '{
   for ( i=1; i<=NF; i++ ) {
       gsub(/"/,"",$i)
   }
   print "\"" $0 "\""
}' file
"record 1","name 1","text 1, text 2"
"record 2","name 2","text 2"
"record 3","name 3",""
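The final print only has to add the outer quotes because changing a field (here with gsub()) makes awk rebuild $0 with OFS between the fields, and OFS is set to ",". A minimal sketch of that rebuild in isolation, using whitespace-separated input instead of CSV; the function name quote_all is hypothetical, and $1 = $1 stands in for the gsub() call as the field modification that triggers the rebuild:

```shell
#!/bin/sh
# Hypothetical helper: re-emit whitespace-separated fields as one fully quoted CSV row.
# Assigning $1 = $1 modifies a field, so awk rebuilds $0 joined with OFS,
# the same effect gsub() on a field has in the FPAT answer.
quote_all() {
    awk -v OFS='","' '{
        $1 = $1                  # $0 becomes: one","two","three
        print "\"" $0 "\""       # add the opening and closing quotes
    }'
}

printf '%s\n' 'one two three' | quote_all   # prints "one","two","three"
```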

Or the equivalent in any awk:

$ awk -v OFS='","' '{
   orig=$0; $0=""; i=0;
   while ( match(orig,/("[^"]*")+/) ) {
       $(++i) = substr(orig,RSTART,RLENGTH)
       gsub(/"/,"",$i)
       orig = substr(orig,RSTART+RLENGTH)
   }
   print "\"" $0 "\""
}' file
"record 1","name 1","text 1, text 2"
"record 2","name 2","text 2"
"record 3","name 3",""
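Both versions rely on the regex ("[^"]*")+: a field is one or more quoted chunks written back to back, so an escaped "" inside a field only splits it into adjacent chunks instead of ending it. A minimal sketch (the function name show_fields is hypothetical) that prints what each match covers, using the same match()/RSTART/RLENGTH loop as the portable version but without removing the quotes:

```shell
#!/bin/sh
# Hypothetical helper: print every ("[^"]*")+ match on its own line.
# POSIX awk only; this mirrors the portable loop above but keeps the quotes.
show_fields() {
    awk '{
        rest = $0
        n = 0
        while (match(rest, /("[^"]*")+/)) {
            printf "field %d: %s\n", ++n, substr(rest, RSTART, RLENGTH)
            rest = substr(rest, RSTART + RLENGTH)   # continue after this match
        }
    }'
}

printf '%s\n' '"record 2","name ""2""","text 2"' | show_fields
# field 1: "record 2"
# field 2: "name ""2"""
# field 3: "text 2"
```

The second field is matched as the three chunks "name ", "2" and "", which is why deleting every quote from it leaves name 2.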

See also: whats-the-most-robust-way-to-efficiently-parse-csv-using-awk

Source: https://unix.stackexchange.com/questions/577581