Linux
How can I create a new file in Linux containing selected columns from many different files?
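I have a directory, ballgown, which contains about 1000 subdirectories named after samples. Each subdirectory contains a file called t_data.ctab; the file name is the same in every subdirectory.

    ballgown
    |_______TCGA-A2-A0T3-01A
    |___________ t_data.ctab
    |_______TCGA-A7-A4SA-01A
    |___________ t_data.ctab
    |_______TCGA-A7-A6VW-01A
    |___________ t_data.ctab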
As shown above, ballgown has 1000 such subdirectories. The t_data.ctab file in each of them looks like this:

    t_id  chr  strand  start  end     t_name             num_exons  length  gene_id   gene_name  cov       FPKM
    1     1    -       10060  10614   MSTRG.1.1          1          555     MSTRG.1   .          0.000000  0.000000
    2     1    +       11140  30023   MSTRG.10.1         12         3981    MSTRG.10  .          2.052715  0.284182
    3     1    -       11694  29342   MSTRG.11.1         8          6356    MSTRG.11  .          0.557588  0.077194
    4     1    +       11869  14409   ENST00000456328.2  3          1657    MSTRG.10  DDX11L1    0.000000  0.000000
    5     1    +       11937  29347   MSTRG.10.3         12         3544    MSTRG.10  .          0.000000  0.000000
    6     1    -       11959  30203   MSTRG.11.2         11         4547    MSTRG.11  .          0.369929  0.051214
    7     1    +       12010  13670   ENST00000450305.2  6          632     MSTRG.10  DDX11L1    0.000000  0.000000
    8     1    +       12108  26994   MSTRG.10.5         10         5569    MSTRG.10  .          0.057091  0.007904
    9     1    +       12804  199997  MSTRG.10.6         12         3567    MSTRG.10  .          0.000000  0.000000
    10    1    +       13010  31097   MSTRG.10.7         12         4375    MSTRG.10  .          0.000000  0.000000
    11    1    -       13068  26832   MSTRG.11.3         9          5457    MSTRG.11  .          0.995280  0.137788
From every t_data.ctab file I want to extract only the t_name and FPKM columns and combine them into one new file. In the new file, each FPKM column should be headed by its sample name. The result should look like this:

    t_name             TCGA-A2-A0T3-01A  TCGA-A7-A4SA-01A  TCGA-A7-A6VW-01A
    MSTRG.1.1          0                 0.028181          0
    MSTRG.10.1         0.284182          0.002072          0.046302
    MSTRG.11.1         0.077194          0.685535          0.105849
    ENST00000456328.2  0                 0.307315          0.038961
    MSTRG.10.3         0                 0.446015          0.009946
    MSTRG.11.2         0.051214          0.053577          0.036081
    ENST00000450305.2  0                 0.110438          0.040319
    MSTRG.10.5         0.007904          0                 1.430825
    MSTRG.10.6         0                 0                 0.221105
    MSTRG.10.7         0                 0.199354          0
    MSTRG.11.3         0.137788          0.004792          0
With two or three files, I could run cut -f6,12 on each file and then join the results. But I now have about 1000 files.
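Try this simple approach. First, from inside the ballgown directory, run:

    awk '
        FNR == 1  { print substr(FILENAME, 1, 16) > (substr(FILENAME, 1, 16) ".tmp") }
        FNR > 1   { print $12 > (substr(FILENAME, 1, 16) ".tmp") }
        NR == FNR { print $6 > "first_column.tmp" }
    ' TCGA-A*/t_data.ctab

This writes the t_name column of the first file to first_column.tmp and, for every sample, a file named after the sample with a .tmp suffix, holding the sample name followed by that sample's FPKM values. The sample name is taken as the first 16 characters of the file path, which matches names such as TCGA-A2-A0T3-01A.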
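The command above keeps one output file open per sample, which some awk implementations cannot do for around 1000 files (gawk works around the limit; others may stop with a "too many open files" error). What follows is a minimal sketch of a variant, not the original answer's code: it closes each output file once its input is finished and takes the sample name from the directory part of the path instead of a fixed 16-character prefix. The function name split_columns is hypothetical, and the sketch assumes tab-separated input and paths of the form sample/t_data.ctab.

```shell
# Hypothetical helper: writes <sample>.tmp for every input file, plus
# first_column.tmp holding the t_name column of the first file.
# Assumes tab-separated columns and paths shaped like <sample>/t_data.ctab.
split_columns() {
    awk -F'\t' '
        FNR == 1 {
            if (out != "") close(out)      # finished with the previous sample
            split(FILENAME, parts, "/")    # parts[1] is the sample directory
            out = parts[1] ".tmp"
            print parts[1] > out           # header line: the sample name
        }
        FNR > 1   { print $12 > out }                  # FPKM values
        NR == FNR { print $6 > "first_column.tmp" }    # t_name, first file only
    ' "$@"
}

# Usage, from inside the ballgown directory:
# split_columns TCGA-*/t_data.ctab
```

The resulting .tmp files have the same names and layout as those produced by the original command, so the paste step that follows applies unchanged.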
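Then paste the pieces together into a comma-separated file (-d, sets the delimiter to a comma; omit it if you want tab-separated output instead). Naming first_column.tmp explicitly keeps the t_name column first, because the order in which the shell expands a bare *.tmp depends on the locale:

    paste -d, first_column.tmp TCGA-*.tmp

    t_name,TCGA-A2-A0T3-01A,TCGA-A7-A4SA-01A,TCGA-A7-A6VW-01A
    MSTRG.1.1,0.000000,0.00000,0.0000
    MSTRG.10.1,0.284182,0.28418,0.2841
    MSTRG.11.1,0.077194,0.07719,0.0771
    ENST00000456328.2,0.000000,0.00000,0.0000
    MSTRG.10.3,0.000000,0.00000,0.0000
    MSTRG.11.2,0.051214,0.05121,0.0512
    ENST00000450305.2,0.000000,0.00000,0.0000
    MSTRG.10.5,0.007904,0.00790,0.0079
    MSTRG.10.6,0.000000,0.00000,0.0000
    MSTRG.10.7,0.000000,0.00000,0.0000
    MSTRG.11.3,0.137788,0.13778,0.1377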
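Note that paste lines files up by row position, so the result is only correct if every t_data.ctab lists the same transcripts in the same order. If that is not guaranteed, one alternative is to key the values on t_name and build the table in memory. The following is a sketch under stated assumptions rather than part of the answer above: the function name merge_fpkm is hypothetical, input is assumed to be tab-separated with t_name in column 6 and FPKM in column 12, paths are assumed to look like sample/t_data.ctab, and a transcript missing from a sample is written as NA (an arbitrary choice).

```shell
# Hypothetical helper: prints a comma-separated t_name x sample table of FPKM values.
# Assumes tab-separated input and paths shaped like <sample>/t_data.ctab.
merge_fpkm() {
    awk -F'\t' -v OFS=',' '
        FNR == 1 {                          # header line of each input file
            split(FILENAME, parts, "/")
            sample = parts[1]               # sample name = directory name
            samples[++ns] = sample
            next
        }
        {
            if (!($6 in seen)) {            # keep transcripts in first-seen order
                seen[$6] = 1
                names[++nt] = $6
            }
            fpkm[$6, sample] = $12
        }
        END {
            line = "t_name"
            for (j = 1; j <= ns; j++) line = line OFS samples[j]
            print line
            for (i = 1; i <= nt; i++) {
                line = names[i]
                for (j = 1; j <= ns; j++)   # NA marks a transcript absent from a sample
                    line = line OFS (((names[i], samples[j]) in fpkm) ? fpkm[names[i], samples[j]] : "NA")
                print line
            }
        }
    ' "$@"
}

# Usage, from inside the ballgown directory:
# merge_fpkm TCGA-*/t_data.ctab > merged.csv
```

Memory use grows with the number of transcripts times the number of samples, so for very large inputs the paste route may remain the more practical one.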