Run a series of commands for each value in a field

February 13, 2020

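I have a TSV file. For a particular value in column 5, I want to extract all matching rows, cut out the first three columns, and count the unique rows. For example, for the string "abc" in column 5, I want
awk '$5 == "abc"' file.tsv | cut -f 1-3 | sort -u | wc -l
But I want to do this for every unique string in column 5, not just "abc". There should be something like "for i in $5", but I haven't quite got the hang of for loops. I can't write a separate command for each string because there are too many of them.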

It looks like you want something like this:

awk '{test[$5" "$1" "$2" "$3]++}END{for (t in test) print t}' file1 | cut -d' ' -f1 | sort | uniq -c

Walkthrough

test[$5" "$1" "$2" "$3]++ #populates an array with unique combinations of these fields
for (t in test) print t   #print each unique array index (field combination) once to STDOUT
cut -d' ' -f1             #extract what was the original 5th field
sort                      #yes, yes OK @Bodo
uniq -c                   #count the number of times it appears

Output

2 abc
1 def
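
The answer does not show the contents of file1. As a rough sketch, a made-up input like the one below would reproduce those counts. The sample rows, the temporary directory, and the function name count_by_field5 are all assumptions added for illustration.

```shell
#!/bin/sh
# Hypothetical sample data: the original post does not show file1.
tmpdir=$(mktemp -d)
trap 'rm -rf "$tmpdir"' EXIT

# Five tab-separated columns per row; printf reuses the format for each group of 5.
printf '%s\t%s\t%s\t%s\t%s\n' \
    a b c x abc \
    a b c y abc \
    a b d z abc \
    e f g w def > "$tmpdir/file1"

# Same pipeline as in the answer, wrapped in a function (name is made up).
count_by_field5() {
    awk '{test[$5" "$1" "$2" "$3]++}END{for (t in test) print t}' "$1" |
        cut -d' ' -f1 | sort | uniq -c
}

count_by_field5 "$tmpdir/file1"
```

Here "abc" has two distinct combinations of columns 1-3 and "def" has one. Note that uniq -c pads the count with leading spaces, and that the space-joined key assumes no field contains a space.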

Edit

While conceding defeat at the hands of @Bodo, my determination to find a workable awk solution remains, so I offer this ugly beast…

awk '!test[$5" "$1" "$2" "$3]{out[$5]++;test[$5" "$1" "$2" "$3]++}
 END {for (o in out) print o, out[o]}' file1
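
A variant of the awk-only approach, sketched on the assumption that the input is strictly tab-separated and that fields may themselves contain spaces. It sets -F'\t' and uses awk's comma subscripts (joined internally by SUBSEP) instead of a space-joined key. The function name count_unique_triples and the sample rows are made up for illustration.

```shell
#!/bin/sh
# Count distinct (col1, col2, col3) combinations per column-5 value in one pass.
count_unique_triples() {
    awk -F'\t' '
        # First time this (col5, col1, col2, col3) tuple is seen: bump the counter.
        !seen[$5, $1, $2, $3]++ { out[$5]++ }
        # The order of "for (o in out)" is unspecified, so sort afterwards if needed.
        END { for (o in out) print o "\t" out[o] }
    ' "$1"
}

# Hypothetical sample data with spaces inside fields.
tmpdir=$(mktemp -d)
trap 'rm -rf "$tmpdir"' EXIT
printf '%s\t%s\t%s\t%s\t%s\n' \
    'a 1' b c x 'abc def' \
    'a 1' b c y 'abc def' \
    a b d z abc > "$tmpdir/sample.tsv"

count_unique_triples "$tmpdir/sample.tsv" | LC_ALL=C sort
```

With the default whitespace splitting, a value such as "abc def" would be split into two fields, which is why the field separator is set explicitly here.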

This will print the expected result:
cut -f 1-3,5 file.tsv | sort -u | cut -f 4 | sort | uniq -c | awk '{ print $2, $1; }'
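Explanation:
cut -f 1-3,5 file.tsv extracts the relevant columns 1, 2, 3 and 5
sort -u keeps only the unique combinations
cut -f 4 extracts just the original column 5 values, which are now in column 4
sort | uniq -c sorts the values and counts how many times each one occurs
awk '{ print $2, $1; }' swaps the two columns so that the value comes first and the count second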
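
For comparison, the explicit loop the question had in mind could be sketched roughly as follows. It rereads the file once per distinct value, so it is likely to be slower than the single pipeline on large inputs. The function name count_per_value and the sample rows are assumptions for illustration.

```shell
#!/bin/sh
# Loop over each distinct column-5 value and run the question's pipeline on it.
count_per_value() {
    file=$1
    cut -f 5 "$file" | sort -u | while IFS= read -r value; do
        # -v passes the shell value into awk; note that awk interprets
        # backslash escapes in -v assignments, so such values would be altered.
        n=$(awk -F'\t' -v v="$value" '$5 == v' "$file" |
            cut -f 1-3 | sort -u | wc -l | tr -d ' ')
        printf '%s\t%s\n' "$value" "$n"
    done
}

# Hypothetical sample data.
tmpdir=$(mktemp -d)
trap 'rm -rf "$tmpdir"' EXIT
printf '%s\t%s\t%s\t%s\t%s\n' \
    a b c x abc \
    a b c y abc \
    a b d z abc \
    e f g w def > "$tmpdir/file.tsv"

count_per_value "$tmpdir/file.tsv"
```

The tr -d ' ' step strips the padding that some wc implementations put in front of the count.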

Source: https://unix.stackexchange.com/questions/567442
