Bash

Ignoring delimiters inside quotes

  • May 7, 2017

I have a .csv file like this:

"ID0054XX","PT. SUMUT","18 JL.BONJOL","SUMATERA UTARA, NORTH","MEDAN","","ID9856","PDSUIDSAXXX","","","","Y"
"ID00037687","PAN INDONESIA, PT.","JALAN JENDERAL, SUDIRMAN, SENAYAN","","INDIA","","ID566543","PINBIDJAXXX","","0601","","Y"

I have a script that assigns each comma-separated value to its own variable, using , as the delimiter.

The relevant part of the script is:

IFS=,

[ ! -f $INPUT ] && { echo "$INPUT file not found"; exit 99; }

while read Key  Name    Address1        Address2        City    State   Country SwiftCode       Nid     Chips   Aba     IsSwitching
do
         echo "-------------------------------------------------------------------"

    echo "From Key : $Key"

   echo "-------------------------------------------------------------------"
         echo "-------------------------------------------------------------------"

    echo "From Name : $Name"

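The problem is that it also splits on the commas inside the quotes, whereas I want each quoted value to end up, intact, in its own variable.

I tried replacing the comma with IFS=[","] but had no luck. Any suggestions/help would be greatly appreciated.

You are doing several things wrong here: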

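  1. You are using the shell to parse text.

While this is possible, it is very inefficient. It is slow, hard to write, hard to read, and very hard to do correctly. The shell is not designed for this sort of thing.

  2. You are trying to parse a csv file without a csv parser.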

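CSV is not a simple format. You can have fields that contain the delimiter, as you do here. You can also have fields that span multiple lines. Parsing arbitrary CSV data with simple pattern matching is very, very complicated and hard to get right.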
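To make the two pitfalls above concrete, here is a minimal Python sketch (not part of the original answer) that compares a naive split on commas with the standard library's csv module. The first record is a shortened copy of the one in the question; the multi-line record is made up purely for illustration.

```python
import csv
import io

# Shortened version of the first record from the question (4 fields).
line = '"ID0054XX","PT. SUMUT","18 JL.BONJOL","SUMATERA UTARA, NORTH"'

# Naive splitting treats the comma inside the last quoted field as a separator,
# so it yields one field too many and leaves the quotes in place.
naive_fields = line.split(",")

# csv.reader honours the quotes and strips them.
parsed_fields = next(csv.reader([line]))

# Hypothetical record whose second field contains a line break.
multiline_data = '"A1","first line\nsecond line","Y"\r\n'
multiline_rows = list(csv.reader(io.StringIO(multiline_data, newline="")))

print(len(naive_fields), "fields from str.split")
print(len(parsed_fields), "fields from csv.reader")
print(len(multiline_rows), "row(s) in the multi-line sample")
```

A line-oriented tool would see the multi-line sample as two separate records, while the csv module returns it as a single row with three fields.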

The bad, hacky solution is to do something like this:

$ sed 's/","/"|"/g' file.csv | 
   while IFS='|' read -r Key Name Address1 Address2 City \
    State Country SwiftCode Nid Chips Aba IsSwitching; do 
       echo "From Key : $Key"; echo "From Name : $Name"; 
   done
From Key : "ID0054XX"
From Name : "PT. SUMUT"
From Key : "ID00037687"
From Name : "PAN INDONESIA, PT."

This replaces every "," with "|" and then uses | as the delimiter. Of course, it will fail if any of your fields can contain |.
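As a rough illustration of that caveat, the following Python sketch approximates the sed substitution and the split on |. The helper name split_like_sed_hack and the second record are hypothetical, and the function only imitates the shell pipeline; it is not a faithful reimplementation of read.

```python
def split_like_sed_hack(line):
    """Approximate: sed 's/","/"|"/g' followed by splitting on '|'."""
    return line.replace('","', '"|"').split("|")

# Shortened record based on the question: 3 fields in, 3 fields out.
ok = split_like_sed_hack('"ID00037687","PAN INDONESIA, PT.","INDIA"')

# Hypothetical record with a literal | inside the second field:
# the field is cut in two, so every later value shifts by one position.
broken = split_like_sed_hack('"ID1","FOO | BAR","Y"')

print(ok)
print(broken)
```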

The good, clean approach is to use a proper scripting language rather than the shell, together with a csv parser. For example, in Perl [1]:

$ cat file.csv | perl -MText::CSV -le '
   $csv = Text::CSV->new({binary=>1}); 
   while ($row = $csv->getline(STDIN)){ my ($Key, $Name, $Address1, $Address2, $City, $State, $Country, $SwiftCode, $Nid, $Chips, $Aba, $IsSwitching) = @$row;
print "From Key: $Key\nFrom Name: $Name";}' 
From Key: ID0054XX
From Name: PT. SUMUT
From Key: ID00037687
From Name: PAN INDONESIA, PT.
   

Or, as a script:

#!/usr/bin/perl -l
use strict;
use warnings;
use Text::CSV;

# Three-argument open with an explicit error check
open(my $fh, '<', "file.csv") or die "Cannot open file.csv: $!";
my $csv = Text::CSV->new({binary=>1}); 
while (my $row = $csv->getline($fh)){
   my (
           $Key, $Name, $Address1, $Address2, $City,
           $State, $Country, $SwiftCode, $Nid, $Chips,
           $Aba, $IsSwitching
        ) = @$row;
   print "From Key: $Key\nFrom Name: $Name";
}

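Note that you will have to install the Text::CSV module first (cpanm Text::CSV), and you will probably want to install cpanm as well (the cpanminus package on most distributions).

Or, in Python 3: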

#!/usr/bin/env python3

import csv
with open('file.csv', newline='') as csvfile:
   linereader = csv.reader(csvfile, delimiter=',', quotechar='"')
   for row in linereader:
       print("From Key: %s\nFrom Name: %s" % (row[0], row[1]))
   

Saving the Python code above as a script and running it on the file will print:

$ foo.py
From Key: ID0054XX
From Name: PT. SUMUT
From Key: ID00037687
From Name: PAN INDONESIA, PT.
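If you want each value under a name, as the variables in the original shell loop provided, one possible approach is csv.DictReader with an explicit list of field names. This is only a sketch and not part of the original answer: the names are copied from the read command in the question, and the two records are inlined through io.StringIO (instead of being read from file.csv) so that the snippet runs on its own.

```python
import csv
import io

# Field names copied from the `while read ...` loop in the question.
FIELDS = ["Key", "Name", "Address1", "Address2", "City", "State",
          "Country", "SwiftCode", "Nid", "Chips", "Aba", "IsSwitching"]

# The two records from the question, inlined so the example is self-contained.
SAMPLE = (
    '"ID0054XX","PT. SUMUT","18 JL.BONJOL","SUMATERA UTARA, NORTH",'
    '"MEDAN","","ID9856","PDSUIDSAXXX","","","","Y"\n'
    '"ID00037687","PAN INDONESIA, PT.","JALAN JENDERAL, SUDIRMAN, SENAYAN",'
    '"","INDIA","","ID566543","PINBIDJAXXX","","0601","","Y"\n'
)

# Each row becomes a dict keyed by the names in FIELDS.
records = list(csv.DictReader(io.StringIO(SAMPLE, newline=""), fieldnames=FIELDS))

for record in records:
    print("From Key: %s\nFrom Name: %s" % (record["Key"], record["Name"]))
```

To read from a real file, replace the io.StringIO(...) argument with a file object opened as open('file.csv', newline=''), as in the script above.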
   

[1] Yes, I know that is a UUoC (useless use of cat), but it is simpler to write it that way as a one-liner.

Source: https://unix.stackexchange.com/questions/363356