How do I reformat KEGG Mapper output on Linux?
From the KEGG Mapper "Reconstruct Pathway" output, I have something like this in file1:

    00550 Peptidoglycan biosynthesis (2)
    K01000
    K02563
    00511 Other glycan degradation (8)
    K01190
    K01191
    K01192
    K01201
    K01227
    K12309

I would like to get this instead:

    00550 Peptidoglycan biosynthesis (2) K01000 K02563
    00511 Other glycan degradation (6) K01190 K01191 K01192 K01201 K01227 K12309

How can I reformat it in Linux or Python?
    awk '
    !NF         {next}                                 # skip empty lines
    /^[0-9]+ /  {sub(/\([0-9]*\)/, "(" CNT ")", PRT)   # pathway header line (leading number):
                                                       # correct the count in the buffered record
                 if (PRT) print PRT                    # print the buffer (still empty at the first header)
                 PRT = ""                              # empty it after printing
                 CNT = gsub(/K[0-9]+/, "&") - 1        # count "K..." IDs on this line, minus 1
                                                       # to offset the increment below
                }
                {PRT = sprintf("%s%s%s", PRT, PRT ? " " : "", $0)   # append this line to the buffer
                 CNT++                                 # increment the "K..." count
                }
    END         {sub(/\([0-9]*\)/, "(" CNT ")", PRT)   # same correction for the last record
                 print PRT
                }
    ' file1

Output:

    00550 Peptidoglycan biosynthesis (2) K01000 K02563
    00511 Other glycan degradation (6) K01190 K01191 K01192 K01201 K01227 K12309
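Since the question also mentions Python, here is a minimal sketch of the same idea. It assumes each pathway header starts with a numeric ID and carries a count in parentheses, and that every KO identifier is "K" followed by five digits. The function name reformat_kegg and both regular expressions are illustrative choices of mine, not part of any KEGG tool.

```python
import re

# Assumption: a KO identifier is "K" followed by exactly five digits.
KO_ID = re.compile(r"K\d{5}")
# Assumption: a header is "<numeric ID> <name> (<count>)", optionally
# followed by KO identifiers on the same line.
HEADER = re.compile(r"^(\d+)\s+(.*?)\s*\(\d+\)(.*)$")

def reformat_kegg(lines):
    """Join each pathway header with its KO IDs and recompute the count."""
    records = []  # list of (header text without count, list of KO IDs)
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip empty lines
        match = HEADER.match(line)
        if match:
            # Start a new record; pick up any KO IDs already on the header line.
            head = f"{match.group(1)} {match.group(2)}"
            records.append((head, KO_ID.findall(match.group(3))))
        elif records:
            # Continuation line: add its KO IDs to the current record.
            records[-1][1].extend(KO_ID.findall(line))
    return [" ".join([f"{head} ({len(kos)})", *kos]) for head, kos in records]

sample = """\
00550 Peptidoglycan biosynthesis (2)
K01000
K02563
00511 Other glycan degradation (8)
K01190
K01191
K01192
K01201
K01227
K12309
"""

for row in reformat_kegg(sample.splitlines()):
    print(row)
```

To run it on the real file, pass an open file object instead of the sample, for example reformat_kegg(open("file1")). Like the awk version, it replaces the count in parentheses with the number of KO IDs actually found.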