Bash

How to prepend a "title" string to every following line until a new "title" string appears. A recurring problem to solve with awk, sed, perl, etc.

  • May 29, 2020

I have a txt file with more than 1 million lines, with the following content (there is a tab between Match_n and "cggggg"):

Sequence_1
Match_1 cggggg
Match_2 gggggc
Match_3 ggggcc
Match_4 cgggcc
Match_5 agggca
Match_6 agggta
Sequence_2
Match_1 tgggca
Match_2 aggggg
Match_3 gggggc
Match_4 ggggca
Sequence_3
Match_1 cggggt
Match_2 ggggtt
Match_3 tgggga
Match_4 ggggac
Match_5 cggggc

I need the following format:

Sequence_1  Match_1 cggggg
Sequence_1  Match_2 gggggc
Sequence_1  Match_3 ggggcc
Sequence_1  Match_4 cgggcc
Sequence_1  Match_5 agggca
Sequence_1  Match_6 agggta
Sequence_2  Match_1 tgggca
Sequence_2  Match_2 aggggg
Sequence_2  Match_3 gggggc
Sequence_2  Match_4 ggggca
Sequence_3  Match_1 cggggt
Sequence_3  Match_2 ggggtt
Sequence_3  Match_3 tgggga
Sequence_3  Match_4 ggggac
Sequence_3  Match_5 cggggc

More information: there are 10,000 "Sequence_N" entries, and each one has a variable number of "Match_n cggggc" lines.

Thanks!!

awk -v OFS='\t' 'NF==1{seq=$0; next} {print seq, $0}' file
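To spell out what the one-liner does, here is the same program with comments, run on a small sample. The sample data and the variable names `sample` and `awk_out` are made up for illustration; it assumes, as in the question, that header lines have exactly one field and Match lines have more than one.

```shell
# Small tab-separated sample standing in for the real file (hypothetical data).
sample=$(printf 'Sequence_1\nMatch_1\tcggggg\nMatch_2\tgggggc\nSequence_2\nMatch_1\ttgggca')

awk_out=$(printf '%s\n' "$sample" | awk -v OFS='\t' '
  # A line with a single field is a Sequence header: remember it, print nothing.
  NF == 1 { seq = $0; next }
  # Every other line is a Match line: print the remembered header, a tab, then the line.
          { print seq, $0 }
')

printf '%s\n' "$awk_out"
```

Only one header is held in memory at a time, so this should stream through a file of a million lines without trouble.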

With sed, you can move each Sequence line you encounter into the hold space and then pull it back for every following line:

sed -e '/^Sequence/{h;d;}' -e 'G;s/\(.*\)\n\(.*\)/\2\t\1/' file

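G appends the held Sequence after the Match line, so some rearranging is needed to put Sequence ahead of Match, but that can be folded into the substitution, which is needed anyway to turn the newline separator into a tab.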
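One caveat: `\t` in the replacement part of `s` is understood by GNU sed, but as far as I know POSIX does not specify it, so other sed implementations may output a literal `t` instead. A sketch of a more portable variant, assuming a POSIX shell, passes a real tab character in through a shell variable (the names `tab`, `sample` and `sed_out` are made up for illustration):

```shell
# A literal tab character, produced by printf instead of relying on sed's \t.
tab=$(printf '\t')

# Small tab-separated sample standing in for the real file (hypothetical data).
sample=$(printf 'Sequence_1\nMatch_1\tcggggg\nMatch_2\tgggggc\nSequence_2\nMatch_1\ttgggca')

sed_out=$(printf '%s\n' "$sample" | sed \
  -e '/^Sequence/{h;d;}' \
  -e "G;s/\(.*\)\n\(.*\)/\2${tab}\1/")
# First expression: copy a Sequence line to the hold space and delete it from the output.
# Second expression: G appends a newline plus the held Sequence to the Match line,
# then s swaps the two parts and joins them with the tab.

printf '%s\n' "$sed_out"
```

The second expression is in double quotes so that `${tab}` expands; the backslashes before `(`, `)`, `n`, `1` and `2` are left untouched by the shell inside double quotes.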

Quoted from: https://unix.stackexchange.com/questions/589505