Remove interior double quotes from a comma-separated CSV whose fields are enclosed in double quotes
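Maybe I'm just unlucky, but my comma-separated CSV file, whose fields are enclosed in double quotes, also has double quotes and commas inside the actual text.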
So I want to turn this:
    "record 1","name 1","text 1, text 2"
    "record 2","name ""2""","text 2"
    "record 3","name 3",""
into this:
    "record 1","name 1","text 1, text 2"
    "record 2","name 2","text 2"
    "record 3","name 3",""
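Note that I removed the double quotes from name ""2"", leaving name 2, but kept the pair of double quotes that encloses the empty last field in line 3: ,""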
Use csvformat to convert the delimiter to tabs (csvformat -T), delete all double quotes (tr -d '"'), and then convert the delimiter back to commas while quoting every field (the last step of the pipeline):
    $ csvformat -T file.csv | tr -d '"' | csvformat -t -U1
    "record 1","name 1","text 1, text 2"
    "record 2","name 2","text 2"
    "record 3","name 3",""
csvformat is part of csvkit.
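This works regardless of which other characters your input contains, with two exceptions: a tab inside a field (csvformat -T quotes that field, tr then strips the quotes, and the bare tab splits the field on the way back) and a newline inside a quoted field, which is a separate problem.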
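The same pipeline can be wrapped in a small shell function so it works on a file argument or on standard input. This is a sketch that assumes csvkit is installed; the function name strip_inner_quotes_csvkit is hypothetical, and the demo is skipped when csvformat is not on the PATH.

```shell
#!/bin/sh
# Sketch: the csvkit pipeline from the answer, wrapped in a hypothetical function.
# Assumes csvkit is installed and that no field contains a tab or a newline.
strip_inner_quotes_csvkit() {
    csvformat -T "$@" |    # commas -> tabs; fields are quoted only where needed
        tr -d '"' |        # delete every remaining double quote
        csvformat -t -U 1  # tabs -> commas, quoting every field (-U 1 = quote all)
}

# Demo on the sample input from the question (skipped without csvkit).
if command -v csvformat >/dev/null 2>&1; then
    strip_inner_quotes_csvkit <<'EOF'
"record 1","name 1","text 1, text 2"
"record 2","name ""2""","text 2"
"record 3","name 3",""
EOF
else
    echo "csvformat not found; install csvkit to run the demo" >&2
fi
```

Called with no arguments the function reads standard input, so it can sit at the end of another pipeline.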
Using GNU awk for FPAT:
    $ awk -v FPAT='("[^"]*")+' -v OFS='","' '{
        for ( i=1; i<=NF; i++ ) {
            gsub(/"/,"",$i)
        }
        print "\"" $0 "\""
    }' file
    "record 1","name 1","text 1, text 2"
    "record 2","name 2","text 2"
    "record 3","name 3",""
Or the equivalent in any awk:
    $ awk -v OFS='","' '{
        orig=$0; $0=""; i=0
        while ( match(orig,/("[^"]*")+/) ) {
            $(++i) = substr(orig,RSTART,RLENGTH)
            gsub(/"/,"",$i)
            orig = substr(orig,RSTART+RLENGTH)
        }
        print "\"" $0 "\""
    }' file
    "record 1","name 1","text 1, text 2"
    "record 2","name 2","text 2"
    "record 3","name 3",""
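For repeated use, the portable awk version can be wrapped in a shell function. This is a minimal sketch under the same assumptions as the awk answer: every field is enclosed in double quotes and no field contains a newline. The function name strip_inner_quotes is hypothetical.

```shell
#!/bin/sh
# Sketch: the portable awk answer wrapped in a hypothetical shell function.
# Assumes every field is enclosed in double quotes and no field contains a newline.
strip_inner_quotes() {
    awk -v OFS='","' '{
        orig = $0; $0 = ""; i = 0
        # each field is one or more "..." chunks, e.g. "name ""2"""
        while ( match(orig, /("[^"]*")+/) ) {
            $(++i) = substr(orig, RSTART, RLENGTH)
            gsub(/"/, "", $i)                # drop every quote inside the field
            orig = substr(orig, RSTART + RLENGTH)
        }
        print "\"" $0 "\""                   # re-add the outermost quotes
    }' "$@"
}

# Demo on the sample input from the question.
strip_inner_quotes <<'EOF'
"record 1","name 1","text 1, text 2"
"record 2","name ""2""","text 2"
"record 3","name 3",""
EOF
```

Because the function passes its arguments straight to awk, it can be called as strip_inner_quotes file.csv or used at the end of a pipeline.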
See also whats-the-most-robust-way-to-efficiently-parse-csv-using-awk.