Awk
Merge rows that share a value in any column
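I have a tab-delimited file like the one below and want to merge rows whenever they share a value in any column. There are usually 2 columns, but in some cases there can be 3.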
Input:
AMAZON NILE
ALASKA NILE
HELLO MY
MANGROVE AMAZON
MY NAME
IS NAME
Desired output:
AMAZON NILE ALASKA MANGROVE
HELLO MY NAME IS
How can this be done with awk? Would the same approach also work for the following file?
Input:
apple_bin2file strawberry_24files
mango2files strawberry_39files
apple_bin8file strawberry_39files
dastool_bin6files strawberry_40files
apple_bin6file strawberry_40files
orange_bin004file dastool_bin004files
orange_bin005file dastool_bin005files
apple_bin3file dastool_bin3files
apple_bin5file dastool_bin5files
apple_bin6file dastool_bin6files
apple_bin7file dastool_bin7files
apple_bin8file mango2files
Expected output, tab-separated:
apple_bin2file strawberry_24files
mango2files strawberry_39files apple_bin8file
dastool_bin6files strawberry_40files apple_bin6file
orange_bin004file dastool_bin004files
orange_bin005file dastool_bin005files
apple_bin3file dastool_bin3files
apple_bin5file dastool_bin5files
apple_bin7file dastool_bin7files
Apologies to those who already answered: I have updated the input file!
Using GNU awk:
gawk '
    {
        grp = 0
        # see if any of these words already have a group
        for (i=1; i<=NF; i++) {
            if (group[$i]) {
                grp = group[$i]
                break
            }
        }
        # no words have been seen before: new group
        if (!grp) {
            grp = ++n
        }
        # if we have not seen this word, add it to the output
        for (i=1; i<=NF; i++) {
            if (!group[$i]) {
                line[grp] = line[grp] $i OFS
            }
            group[$i] = grp
        }
    }
    END {
        PROCINFO["sorted_in"] = "@ind_num_asc"
        for (n in line) {
            print line[n]
        }
    }
' input.file
With the first input:
AMAZON NILE ALASKA MANGROVE
HELLO MY NAME IS
With the second input (output piped to column -t):
apple_bin2file strawberry_24files
mango2files strawberry_39files apple_bin8file
dastool_bin6files strawberry_40files apple_bin6file
orange_bin004file dastool_bin004files
orange_bin005file dastool_bin005files
apple_bin3file dastool_bin3files
apple_bin5file dastool_bin5files
apple_bin7file dastool_bin7files
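The answer assigns each line to the group of the first already-seen word it finds, so if a later line contains words from two groups that were formed separately (for example the lines A B, then C D, then B C), those two groups are not merged. Neither sample input contains such a line, but if those rows should be joined too, a union-find (disjoint-set) pass is one way to do it. This is a swapped-in technique, not the method from the answer; the function name merge_transitive is hypothetical, and the sketch assumes tab-separated input:

```shell
# merge_transitive: hypothetical name; union-find sketch that also joins
# groups bridged by a later line (not the approach used in the answer).
merge_transitive() {
    awk '
    # return the root of x, halving the path as we go
    function find(x) {
        while (parent[x] != x) {
            parent[x] = parent[parent[x]]
            x = parent[x]
        }
        return x
    }
    BEGIN { FS = OFS = "\t" }
    NF {
        for (i = 1; i <= NF; i++) {
            if (!($i in parent)) {
                # first sighting: make the word its own root, remember order
                parent[$i] = $i
                order[++nw] = $i
            }
            # union every word on the line with the first word of the line
            a = find($1); b = find($i)
            if (a != b) parent[b] = a
        }
    }
    END {
        # walk words in first-seen order; each group is printed in the
        # order its first word appeared
        for (k = 1; k <= nw; k++) {
            w = order[k]; r = find(w)
            if (!(r in line)) { roots[++ng] = r; line[r] = w }
            else line[r] = line[r] OFS w
        }
        for (g = 1; g <= ng; g++) print line[roots[g]]
    }
    ' "$@"
}
```

On the two sample inputs this gives the same groups as the answer; it only differs when a line bridges two existing groups.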