Filter a .CSV file based on the value of its 5th column and print those records to a new file

March 9, 2019

I have a .CSV file in the following format:
"column 1","column 2","column 3","column 4","column 5","column 6","column 7","column 8","column 9","column 10
"12310","42324564756","a simple string with a , comma","string with or, without commas","string 1","USD","12","70%","08/01/2013",""
"23455","12312255564","string, with, multiple, commas","string with or, without commas","string 2","USD","433","70%","07/15/2013",""
"23525","74535243123","string , with commas, and - hypens and: semicolans","string with or, without commas","string 1","CAND","744","70%","05/06/2013",""
"46476","15467534544","lengthy string, with commas, multiple: colans","string with or, without commas","string 2","CAND","388","70%","09/21/2013",""
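The 5th column of the file contains different strings. I need to filter the file based on the value of the 5th column. Specifically, I need a new file, derived from the current one, that contains only the records whose fifth field has the value "string 1".
For this I tried the following command: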
awk -F"," ' { if toupper($5) == "STRING 1") PRINT }' file1.csv > file2.csv
But it gives me the following error:
awk: { if toupper($5) == "STRING 1") PRINT }
awk: ^ syntax error
awk: { if toupper($5) == "STRING 1") PRINT }
awk: ^ syntax error
Then I used the following, which gives me strange output:
awk -F"," '$5="string 1" {print}' file1.csv > file2.csv
Output:
"column 1" "column 2" "column 3" "column 4" string 1 "column 6" "column 7" "column 8" "column 9" "column 10
"12310" "42324564756" "a simple string with a comma" string 1 without commas" "string 1" "USD" "12" "70%" "08/01/2013" ""
"23455" "12312255564" "string with string 1 commas" "string with or without commas" "string 2" "USD" "433" "70%" "07/15/2013" ""
"23525" "74535243123" "string with commas string 1 "string with or without commas" "string 1" "CAND" "744" "70%" "05/06/2013" ""
"46476" "15467534544" "lengthy string with commas string 1 "string with or without commas" "string 2" "CAND" "388" "70%" "09/21/2013" ""
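PS: I used toupper to be on the safe side, since I am not sure whether the string is in lower or upper case. I would like to know what is wrong with my code, and whether spaces in the string matter when searching for a pattern with AWK.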

awk -F '","' 'BEGIN {OFS=","} { if (toupper($5) == "STRING 1") print }' file1.csv > file2.csv

Output

"12310","42324564756","a simple string with a , comma","string with or, without commas","string 1","USD","12","70%","08/01/2013",""
"23525","74535243123","string , with commas, and - hypens and: semicolans","string with or, without commas","string 1","CAND","744","70%","05/06/2013",""

I think this is what you want.
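To spell out why the two attempts in the question misbehaved, here is a minimal sketch. The temporary file and the variable names (tmp, assigned, matched) are assumptions made for illustration; the two data rows are copied from the question. The first attempt failed to parse because the opening parenthesis after if is missing and because awk keywords are case-sensitive (print, not PRINT). The second attempt used = where == was meant.

```shell
#!/bin/sh
# Hypothetical sample file holding two rows from the question.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
"12310","42324564756","a simple string with a , comma","string with or, without commas","string 1","USD","12","70%","08/01/2013",""
"23455","12312255564","string, with, multiple, commas","string with or, without commas","string 2","USD","433","70%","07/15/2013",""
EOF

# 1) A single = assigns. $5 becomes "string 1", a non-empty string, so the
#    pattern is true for every line. Assigning to a field also makes awk
#    rebuild $0 with OFS (a space by default), which is why the commas
#    vanished in the question's output.
assigned=$(awk -F',' '$5 = "string 1" { print }' "$tmp" | wc -l | tr -d ' ')

# 2) A double == compares. With -F'","' the quote-comma-quote sequence is the
#    separator, so $5 is the bare text: string 1. The comparison is exact,
#    so spaces inside the string do matter.
matched=$(awk -F'","' 'toupper($5) == "STRING 1" { print }' "$tmp")

echo "lines printed by the assignment version: $assigned"
echo "lines printed by the comparison version:"
echo "$matched"
rm -f "$tmp"
```

Only the "string 1" row is printed by the comparison version, while the assignment version prints every line.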

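The problem with CSV is that there is no universally followed standard. If you need to deal with CSV-formatted data often, you may want to look into a more robust method than just using "," as the field separator. In this case, Perl's Text::CSV CPAN modules are well suited for the job: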
$ perl -mText::CSV_XS -WlanE '
   BEGIN {our $csv = Text::CSV_XS->new;}
   $csv->parse($_);
   my @fields = $csv->fields();
   print if $fields[4] =~ /string 1/i;
' file1.csv
"12310","42324564756","a simple string with a , comma","string with or, without commas","string 1","USD","12","70%","08/01/2013",""
"23525","74535243123","string , with commas, and - hypens and: semicolans","string with or, without commas","string 1","CAND","744","70%","05/06/2013",""
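As a small illustration of why a fixed "," separator is fragile, the sketch below counts fields with awk for two rows. The second row is a hypothetical variant (not from the question) in which the first field is written without quotes, which many CSV writers do for numbers. The file and variable names are assumptions as well.

```shell
#!/bin/sh
# Row 1 is fully quoted, as in the question. Row 2 is a made-up variant
# whose first field is unquoted.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
"12310","42324564756","a , b","c, d","string 1","USD","12","70%","08/01/2013",""
12310,"42324564756","a , b","c, d","string 1","USD","12","70%","08/01/2013",""
EOF

# Number of fields awk sees when the separator is the three characters ","
nf_quoted=$(awk -F'","' 'NR == 1 { print NF }' "$tmp")
nf_mixed=$(awk -F'","' 'NR == 2 { print NF }' "$tmp")

# What ends up in $5 for the second row
fifth_mixed=$(awk -F'","' 'NR == 2 { print $5 }' "$tmp")

echo "fully quoted row: $nf_quoted fields"
echo "row with unquoted first field: $nf_mixed fields, \$5 is '$fifth_mixed'"
rm -f "$tmp"
```

In the second row the first two columns are not split, so every later column shifts left by one and a filter on $5 silently misses the record. A real CSV parser such as the Perl module above does not depend on every field being quoted.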

Source: https://unix.stackexchange.com/questions/97070
