Text-Formatting
How do I reformat KEGG Mapper output in Linux?
I need to reformat the KEGG Reconstruct Pathway output. In file1 I have something like this:
00550 Peptidoglycan biosynthesis (2) K01000 K02563 00511 Other glycan degradation (8) K01190 K01191 K01192 K01201 K01227 K12309
In file2 I need something like this:
00550 Peptidoglycan biosynthesis (2) K01000 K02563 00511 Other glycan degradation (6) K01190 K01191 K01192 K01201 K01227 K12309
How can I reformat it in Linux or Python?
Thanks
See how far this gets you:
awk '
!NF        {next                                     # do not process empty lines
           }
/^[0-9]+ / {sub (/\([0-9]*\)/, "(" CNT ")", PRT)     # for the "glycan" lines (leading numerical)
                                                     # correct the count in parentheses
            if (PRT) print PRT                       # print the PRT buffer (NOT first line when empty)
            PRT = ""                                 # empty it after print
            CNT = gsub (/K[0-9]*/, "&") - 1          # get the "K..." count of this line, corr. for later incr.
           }
           {PRT = sprintf ("%s%s%s", PRT, PRT?" ":"", $0)    # append this line to buffer
            CNT++                                    # increment "K..." count
           }
END        {sub (/\([0-9]*\)/, "(" CNT ")", PRT)     # see above
            print PRT
           }
' file
00550 Peptidoglycan biosynthesis (2) K01000 K02563
00511 Other glycan degradation (6) K01190 K01191 K01192 K01201 K01227 K12309
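Since the question also mentions Python, here is a rough equivalent as a minimal sketch. It assumes the same input layout the awk script implies: each pathway starts on a line with a leading number, and its K numbers follow on that line or on later lines. The function name `reformat_kegg` and the sample data layout are made up for illustration. Unlike the awk version, it counts the K identifiers themselves rather than input lines.

```python
import re

HEADER = re.compile(r"^\d+\s")      # pathway line, e.g. "00550 Peptidoglycan ..."
KO_ID = re.compile(r"\bK\d{5}\b")   # K number, e.g. "K01000"
COUNT = re.compile(r"\(\d*\)")      # the count in parentheses, e.g. "(2)"


def reformat_kegg(lines):
    """Return one line per pathway with a corrected K-number count.

    Hypothetical helper; assumes header lines start with digits and
    K numbers belong to the most recent header.
    """
    records = []                    # each record is [title, [K numbers]]
    for raw in lines:
        line = raw.strip()
        if not line:                # skip empty lines
            continue
        ids = KO_ID.findall(line)
        if HEADER.match(line):
            # Keep the title without any K numbers found on the header line.
            title = KO_ID.sub("", line).strip()
            records.append([title, ids])
        elif records:
            records[-1][1].extend(ids)

    out = []
    for title, ids in records:
        # Replace the first "(n)" with the actual number of K numbers.
        title = COUNT.sub(f"({len(ids)})", title, count=1)
        out.append(" ".join([title] + ids))
    return out


# Assumed sample input: header on its own line, one K number per line.
sample = [
    "00550 Peptidoglycan biosynthesis (2)",
    "K01000",
    "K02563",
    "00511 Other glycan degradation (8)",
    "K01190",
    "K01191",
    "K01192",
    "K01201",
    "K01227",
    "K12309",
]
for row in reformat_kegg(sample):
    print(row)
```

To run it on real data, pass an open file instead of the list, for example `reformat_kegg(open("file1"))`, and write the returned lines to file2.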