
How to handle multiple strings in one column

  • August 26, 2018

I have a comma-separated file that looks similar to this:

aa.com,1.21.3.4,string1 string2 K=12     K2=23  K3=45 K4=56
bb.com,5.6.7.8,string1 string2 K=66     K2=77  K3=88 K4=99

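I want to take the third column, which contains space-separated strings, and process the file so that the first two strings of column 3 are separated by a comma while the remaining strings in column 3 are ignored. The first two fields contain no spaces. Note that the number of strings in column 3 is not fixed across records: in this example there are 6 strings separated by 5 runs of spaces, but there could be more or fewer.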

All I need is the first two strings of column 3, separated by a comma, with the rest of column 3 dropped:

aa.com,1.21.3.4,string1,string2
bb.com,5.6.7.8,string1,string2

Try:

awk '{print $1, $2}' OFS=, infile
aa.com,1.21.3.4,string1,string2
bb.com,5.6.7.8,string1,string2
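This one-liner works because it does not set -F at all: with awk's default field splitting, fields are separated by runs of blanks, so the whole comma-separated prefix plus the first string becomes $1 and the second string becomes $2. Printing just those two with OFS=, rejoins them with a comma and drops everything after. A minimal sketch that prints NF, $1 and $2 for the first sample record (the input is fed with printf here instead of the original infile):

```shell
# Feed the first sample record to awk using default (whitespace) field splitting.
printf '%s\n' 'aa.com,1.21.3.4,string1 string2 K=12     K2=23  K3=45 K4=56' |
  awk '{
    print NF   # number of blank-separated fields: 6
    print $1   # aa.com,1.21.3.4,string1
    print $2   # string2
  }'
```

This is also why the one-liner depends on the first two fields containing no spaces: a space there would shift the field boundaries.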

If your first or second field could contain spaces, you would instead split on commas with -F, and work on the third field only:

awk -F, '{
          # locate the first two space-separated strings in column 3
          match($3, /[^ ]* +[^ ]*/)
          bkup = substr($3, RSTART, RLENGTH)
          gsub(/ +/, ",", bkup)   # replace the run(s) of spaces with a comma
          print $1, $2, bkup
}' OFS=, infile
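A variant of the same idea, not taken from the original answer, splits the third comma-separated field with awk's split() function instead of match() and substr(). Passing " " as the separator applies awk's default whitespace splitting to $3, so runs of spaces collapse and leading blanks are ignored. A sketch assuming the same three-field layout; the second input record is a made-up variant with a space inside its first field, to show that -F, keeps it intact:

```shell
# Sample input; the second record is hypothetical (space inside field 1).
printf '%s\n' \
  'aa.com,1.21.3.4,string1 string2 K=12     K2=23  K3=45 K4=56' \
  'b b.com,5.6.7.8,string1 string2 K=66     K2=77  K3=88 K4=99' |
  awk -F, 'BEGIN { OFS = "," }
  {
    # Split column 3 on runs of blanks into the array w (w[1], w[2], ...).
    split($3, w, " ")
    # Keep only the first two strings of column 3; drop the rest.
    print $1, $2, w[1], w[2]
  }'
```

Like the match() version, this assumes column 3 holds at least two strings; with only one, w[2] is empty and the line ends in a trailing comma.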

**Explanation** of the match()/substr() script, from man awk:

match(s, r [, a])  
         Return the position in s where the regular expression r occurs, 
         or 0 if r is not present, and set the values of RSTART and RLENGTH. (...)

substr(s, i [, n])
         Return the at most n-character substring of s starting at i.
         If n is omitted, use the rest of s.

RSTART
         The index of the first character matched by match(); 0 if no
         match.  (This implies that character indices start at one.)

RLENGTH
         The length of the string matched by match(); -1 if no match.
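To see these definitions at work, the sketch below runs the answer's match(), substr() and gsub() calls on the sample column 3 from the question, then repeats match() on a hypothetical column 3 holding a single string:

```shell
# Step through the answer's match()/substr()/gsub() calls on sample column 3.
awk 'BEGIN {
  s = "string1 string2 K=12     K2=23  K3=45 K4=56"

  # match() returns the start position and sets RSTART and RLENGTH.
  pos = match(s, /[^ ]* +[^ ]*/)
  print pos, RSTART, RLENGTH      # 1 1 15

  # substr() extracts RLENGTH characters starting at RSTART.
  first2 = substr(s, RSTART, RLENGTH)
  print first2                    # string1 string2

  # gsub() turns each run of spaces into a single comma.
  gsub(/ +/, ",", first2)
  print first2                    # string1,string2

  # Hypothetical column 3 with only one string: the regex needs a space, so no match.
  pos = match("string1", /[^ ]* +[^ ]*/)
  print pos, RSTART, RLENGTH      # 0 0 -1
}'
```

The last line shows that the script relies on column 3 holding at least two strings: with no match, substr() is called with a start of 0 and a length of -1, which in common awk implementations yields an empty string, so such a record would end in a trailing comma.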

Source: https://unix.stackexchange.com/questions/464772