How to process multiple strings in a column

August 26, 2018

I have a comma-separated file that looks similar to this:
aa.com,1.21.3.4,string1 string2 K=12     K2=23  K3=45 K4=56
bb.com,5.6.7.8,string1 string2 K=66     K2=77  K3=88 K4=99
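I want to take the third column, which contains space-separated strings. I want to process the file so that the first two strings of the third column are separated by a comma, and the remaining strings in column 3 are ignored. The first two fields do not contain spaces. Note that the number of strings in column 3 is not fixed for all records. In this example it is 6 strings separated by 5 runs of spaces, but it can be more or fewer.
All I need is to take the first two strings of column 3, separate them with a comma, and ignore the rest of column 3, so that the output looks like this: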
aa.com,1.21.3.4,string1,string2
bb.com,5.6.7.8,string1,string2

Try:

awk '{print $1, $2}' OFS=, infile
aa.com,1.21.3.4,string1,string2
bb.com,5.6.7.8,string1,string2
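
This works because, without `-F`, awk splits each record on runs of whitespace, so the commas stay inside `$1` and the second word becomes `$2`. A minimal sketch of that behaviour, feeding a sample line on stdin instead of reading `infile`; the function name `first_two` is hypothetical, and `-v OFS=,` is used here in place of the trailing `OFS=,` operand:

```shell
# Hypothetical helper: reads records on stdin and prints the comma-joined result.
first_two() {
    # Default field splitting: $1 is "aa.com,1.21.3.4,string1", $2 is "string2".
    awk -v OFS=, '{ print $1, $2 }'
}

printf '%s\n' 'aa.com,1.21.3.4,string1 string2 K=12     K2=23  K3=45 K4=56' | first_two
# prints: aa.com,1.21.3.4,string1,string2
```

Everything after the second whitespace-separated word is simply never printed, which is why the varying number of `K=` strings does not matter.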

If, on the other hand, the first or second field may contain spaces, you would do this instead:

awk -F, '{ match($3, /[^ ]* +[^ ]*/); 
          bkup=substr($3, RSTART, RLENGTH);
          gsub(/ +/, ",", bkup); # replace spaces with comma
          print $1, $2, bkup
}' OFS=, infile
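
To see the difference between the two commands, here is a sketch that runs both on a record whose first field contains a space. The sample record and the function names `simple` and `robust` are made up for illustration; `robust` follows the same logic as the command above, with a renamed variable:

```shell
# Hypothetical sample record: the first field contains a space.
record='a a.com,1.21.3.4,string1 string2 K=12     K2=23'

# Whitespace splitting: $1 is "a" and $2 is "a.com,1.21.3.4,string1".
simple() { awk -v OFS=, '{ print $1, $2 }'; }

# Comma splitting, then keep only the first two words of field 3.
robust() {
    awk -F, -v OFS=, '{
        match($3, /[^ ]* +[^ ]*/)                 # first word, spaces, second word
        first_two = substr($3, RSTART, RLENGTH)   # cut out just the matched part
        gsub(/ +/, ",", first_two)                # turn the run of spaces into a comma
        print $1, $2, first_two
    }'
}

printf '%s\n' "$record" | simple   # prints: a,a.com,1.21.3.4,string1
printf '%s\n' "$record" | robust   # prints: a a.com,1.21.3.4,string1,string2
```

The whitespace-based version breaks the first field apart, while the comma-based version leaves fields 1 and 2 untouched and only rewrites field 3.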

**Explanation:** from `man awk`:

match(s, r [, a])  
         Return the position in s where the regular expression r occurs, 
         or 0 if r is not present, and set the values of RSTART and RLENGTH. (...)

substr(s, i [, n])
         Return the at most n-character substring of s starting at i.
         If n is omitted, use the rest of s.

RSTART
         The index of the first character matched by match(); 0 if no
         match.  (This implies that character indices start at one.)

RLENGTH
         The length of the string matched by match(); -1 if no match.
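
As a small illustration of how these pieces fit together, the sketch below prints `RSTART`, `RLENGTH` and the extracted substring for one input that matches the pattern and one that does not. The function name `show_match` and the sample inputs are hypothetical:

```shell
# Hypothetical helper: reports what match() found on each input line.
show_match() {
    awk '{
        if (match($0, /[^ ]* +[^ ]*/))
            printf "RSTART=%d RLENGTH=%d substr=[%s]\n", RSTART, RLENGTH, substr($0, RSTART, RLENGTH)
        else
            printf "RSTART=%d RLENGTH=%d (no match)\n", RSTART, RLENGTH
    }'
}

printf '%s\n' 'string1 string2 K=12' | show_match
# prints: RSTART=1 RLENGTH=15 substr=[string1 string2]

printf '%s\n' 'nospace' | show_match
# prints: RSTART=0 RLENGTH=-1 (no match)
```

The pattern needs at least one space, so a column holding a single word does not match at all; in that case `substr()` with the no-match values returns an empty string.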

Source: https://unix.stackexchange.com/questions/464772
