Bash
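How to put the "title" string in front of the other lines until a new "title" string appears. A recursive problem to solve with awk, sed, perl, etc.
I have a txt file with more than 1 million lines that contains the following (there is a tab between Match_n and "cggggg"):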
Sequence_1
Match_1	cggggg
Match_2	gggggc
Match_3	ggggcc
Match_4	cgggcc
Match_5	agggca
Match_6	agggta
Sequence_2
Match_1	tgggca
Match_2	aggggg
Match_3	gggggc
Match_4	ggggca
Sequence_3
Match_1	cggggt
Match_2	ggggtt
Match_3	tgggga
Match_4	ggggac
Match_5	cggggc
I need it in the following format:
Sequence_1	Match_1	cggggg
Sequence_1	Match_2	gggggc
Sequence_1	Match_3	ggggcc
Sequence_1	Match_4	cgggcc
Sequence_1	Match_5	agggca
Sequence_1	Match_6	agggta
Sequence_2	Match_1	tgggca
Sequence_2	Match_2	aggggg
Sequence_2	Match_3	gggggc
Sequence_2	Match_4	ggggca
Sequence_3	Match_1	cggggt
Sequence_3	Match_2	ggggtt
Sequence_3	Match_3	tgggga
Sequence_3	Match_4	ggggac
Sequence_3	Match_5	cggggc
More information: there are 10,000 "Sequence_N" entries, each followed by a variable number of "Match_n cggggc" lines.
Thanks!!
awk -v OFS='\t' 'NF==1{seq=$0; next} {print seq, $0}' file
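To see what the one-liner does, here is a small sketch that runs the same awk program on a shortened sample. The file names awk_sample.txt and awk_result.txt are made up for illustration, and it assumes each Sequence_N header sits alone on its line, as in the question.

```shell
# Build a small tab-separated sample (hypothetical file name).
printf 'Sequence_1\nMatch_1\tcggggg\nMatch_2\tgggggc\nSequence_2\nMatch_1\ttgggca\n' > awk_sample.txt

# NF==1: a header line with a single field; remember it and skip printing.
# Any other line: print the remembered header, OFS (a tab), then the line itself.
awk -v OFS='\t' 'NF==1{seq=$0; next} {print seq, $0}' awk_sample.txt > awk_result.txt

cat awk_result.txt
```

Because only the current header is kept in memory, the file is processed in a single pass, which should be fine for a million lines.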
With sed, you can move each Sequence line into the hold space as it is encountered, then pull it back in for every line that follows:
sed -e '/^Sequence/{h;d;}' -e 'G;s/\(.*\)\n\(.*\)/\2\t\1/' file
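Some rearranging is needed to put Sequence ahead of Match, but that can be folded into the substitution, which is needed anyway to change the newline separator.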
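The hold-space steps can be followed on a shortened sample with the sketch below. The file names sed_sample.txt and sed_result.txt are made up for illustration. One deliberate change from the command above: \t in the replacement is not specified by POSIX and may not be understood by non-GNU seds, so this variant passes a literal tab through a shell variable instead.

```shell
# Build a small tab-separated sample (hypothetical file name).
printf 'Sequence_1\nMatch_1\tcggggg\nMatch_2\tgggggc\nSequence_2\nMatch_1\ttgggca\n' > sed_sample.txt

# A literal tab character, for seds that do not expand \t in the replacement.
tab=$(printf '\t')

# /^Sequence/{h;d;}  copy the header into the hold space, print nothing for it
# G                  append newline + held header: "Match_1<TAB>cggggg\nSequence_1"
# s/.../             swap the two halves and turn the newline into a tab
sed -e '/^Sequence/{h;d;}' -e 'G' \
    -e "s/\(.*\)\n\(.*\)/\2${tab}\1/" sed_sample.txt > sed_result.txt

cat sed_result.txt
```

After G the pattern space holds the Match line first and the header second, which is why the substitution writes \2 before \1.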