Text-Processing
Fixing a malformed CSV with stray line breaks using only sed or perl
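I have a comma-separated CSV file, but for some reason our system inserts newline characters at random positions in the file, which breaks the whole file. I can get the number of columns in the file.
How can I fix this with sed and/or perl in a one-line command? I know it can be solved with awk, but this is for learning purposes. If perl is used, I don't want to use its built-in CSV functions. Can it be solved? I have been working on this for days and can't seem to find a solution :(
Sample malformed input (lots of randomly inserted \n):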
policyID,statecode,county,Point longitude,Some Thing Here,point_granularity
119736,FL,CLAY COUNTY,-81.711777,“Residential Lot”,1
448094,FL,CLAY COUNTY,-81.707664,“Residen
tial Lot”,3
206893,FL,CLAY COUNTY,-81.7
00455,“Residen
tial Lot”,1
333743,FL,CLAY COUNTY,-81.707703,“Residential Lot”,
3
172534,FL,CLAY COUNTY,-81.702675,“Residential Lot”,1
785275,FL,CLAY COUNTY,-81.707703,“Residential Lot”,3
995932,FL,CLAY COUNTY,-81.713882,
“Residential Lot”,1
223488,FL,CLAY COUNTY,-81.707146,“Residential Lot”,1
4335
12,FL,CLAY COUNTY,-81.704613,
“Residential Lot”,1
Desired output:
policyID,statecode,county,Point longitude,Some Thing Here,point_granularity
119736,FL,CLAY COUNTY,-81.711777,“Residential Lot”,1
448094,FL,CLAY COUNTY,-81.707664,“Residential Lot”,3
206893,FL,CLAY COUNTY,-81.700455,“Residential Lot”,1
333743,FL,CLAY COUNTY,-81.707703,“Residential Lot”,3
172534,FL,CLAY COUNTY,-81.702675,“Residential Lot”,1
785275,FL,CLAY COUNTY,-81.707703,“Residential Lot”,3
995932,FL,CLAY COUNTY,-81.713882,“Residential Lot”,1
223488,FL,CLAY COUNTY,-81.707146,“Residential Lot”,1
433512,FL,CLAY COUNTY,-81.704613,“Residential Lot”,1
$ awk -F, '{ while (NF < 6 || $NF == "") { brokenline=$0; getline; $0 = brokenline $0}; print }' file.csv
policyID,statecode,county,Point longitude,Some Thing Here,point_granularity
119736,FL,CLAY COUNTY,-81.711777,“Residential Lot”,1
448094,FL,CLAY COUNTY,-81.707664,“Residential Lot”,3
206893,FL,CLAY COUNTY,-81.700455,“Residential Lot”,1
333743,FL,CLAY COUNTY,-81.707703,“Residential Lot”,3
172534,FL,CLAY COUNTY,-81.702675,“Residential Lot”,1
785275,FL,CLAY COUNTY,-81.707703,“Residential Lot”,3
995932,FL,CLAY COUNTY,-81.713882,“Residential Lot”,1
223488,FL,CLAY COUNTY,-81.707146,“Residential Lot”,1
433512,FL,CLAY COUNTY,-81.704613,“Residential Lot”,1
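The awk code appends the next line of input to the current line for as long as the current line has fewer than six fields, or its last field is empty (that is, the line was broken right after the last field separator). Perl works similarly: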
perl -ne 'chomp;while (tr/,/,/ < 5 || /,$/) { $_ .= readline; chomp } print "$_\n"' file.csv
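Since the question also asks about sed, the same joining rule could probably be expressed there as well. The following is only a sketch, not a tested production fix: it assumes six columns, no commas or intentional newlines inside quoted fields, and a sed that accepts -E (GNU and BSD sed both do). The function name fix_csv and the sample rows are made up for illustration, and the script is spread over several lines for readability, although it can be squeezed into one line with GNU sed.

```shell
#!/bin/sh
# fix_csv (hypothetical helper name): join physical lines until the
# pattern space holds at least five commas and does not end in a comma,
# mirroring the awk test "NF < 6 || $NF == \"\"".
fix_csv() {
    sed -E '
:a
/^([^,]*,){5,}[^,]+$/!{
    $!{
        N
        s/\n//
        ba
    }
}' "$@"
}

# Sample rows adapted from the question (plain ASCII quotes assumed here).
fix_csv <<'EOF'
policyID,statecode,county,Point longitude,Some Thing Here,point_granularity
119736,FL,CLAY COUNTY,-81.711777,"Residential Lot",1
448094,FL,CLAY COUNTY,-81.707664,"Residen
tial Lot",3
206893,FL,CLAY COUNTY,-81.7
00455,"Residen
tial Lot",1
333743,FL,CLAY COUNTY,-81.707703,"Residential Lot",
3
EOF
```

The `$!` guard keeps sed from calling N on the last line, so a truncated final record is printed as-is rather than looping. Like the awk and perl versions, this cannot detect a break that falls inside the last field, because the first fragment already looks like a complete record.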