How do I create a new file with specific columns from files in multiple folders on Linux?

  • December 6, 2020

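I have .tsv files in more than 100 directories. I want to build a single tsv file that contains the required information from all of the files in those 100 directories.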

For example:

Data
|___ SOB33D
       |___ SOB33D.tsv
|___ SOB43E
       |___ SOB43E.tsv
|___ SOB58D
       |___ SOB58D.tsv
|___ SOB113A
       |___ SOB113A.tsv

The data in SOB33D.tsv looks like this:

target_id         length    eff_length  est_counts
ENST00000456328.2   1657      1525.05      0
ENST00000450305.2   632       500.105      0
ENST00000488147.1   1351      1219.05    0.492522
ENST00000619216.1   68        12.9174    0.70395
ENST00000473358.1   712       580.105      0
ENST00000469289.1   535       403.105      0

SOB43E.tsv:

target_id   length  eff_length  est_counts
ENST00000456328.2   1657    1525.05 0.174591
ENST00000450305.2   632 500.105 0
ENST00000488147.1   1351    1219.05 7.70424
ENST00000619216.1   68  12.9174 0.295008
ENST00000473358.1   712 580.105 0
ENST00000469289.1   535 403.105 0

SOB58D.tsv:

target_id   length  eff_length  est_counts
ENST00000456328.2   1657    1525.05 0.282655
ENST00000450305.2   632 500.105 0
ENST00000488147.1   1351    1219.05 2.64778
ENST00000619216.1   68  12.9174 0
ENST00000473358.1   712 580.105 0
ENST00000469289.1   535 403.105 0

SOB113A.tsv:

target_id   length  eff_length  est_counts
ENST00000456328.2   1657    1525.05 0.0225974
ENST00000450305.2   632 500.105 0
ENST00000488147.1   1351    1219.05 1.35652
ENST00000619216.1   68  12.9174 0
ENST00000473358.1   712 580.105 0
ENST00000469289.1   535 403.105 0

I am trying to use cut, and I am almost there. I want the first and second columns, which are identical in every file, plus the 4th column, which differs from file to file. So I used it like this:

paste */*.tsv | cut -f 1,2,4,8,12,16 > all_samples.tsv

In the command above, I take columns 1 and 2, which are the same in all files, and the 4th column of every file. The output looks like this:

Output:

target_id        length est_counts  est_counts  est_counts  est_counts
ENST00000456328.2   1657    0   0.174591    0.282655    0.0225974
ENST00000450305.2   632 0   0   0   0
ENST00000488147.1   1351    0.492522    7.70424 2.64778 1.35652
ENST00000619216.1   68  0.70395 0.295008    0   0
ENST00000473358.1   712 0   0   0   0
ENST00000469289.1   535 0   0   0   0
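
The field list given to cut above follows a fixed pattern: fields 1 and 2, then every 4th field. A minimal sketch of generating that list for any number of files, assuming each file has exactly four tab-separated columns (field_list is a hypothetical helper name, not part of the original post):

```shell
# field_list N: print the cut field list for N pasted files of 4 columns each,
# i.e. "1,2" followed by 4, 8, ..., 4*N.
field_list() {
    n=$1
    list="1,2"
    i=1
    while [ "$i" -le "$n" ]; do
        list="$list,$((4 * i))"
        i=$((i + 1))
    done
    printf '%s\n' "$list"
}

field_list 4    # prints 1,2,4,8,12,16
```

With 100 files this could be used as `paste */*.tsv | cut -f "$(field_list 100)"`. It only removes the need to type the list by hand; the header row would still repeat est_counts.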

Expected output:

target_id         length    SOB33D  SOB43E  SOB58D  SOB113A
ENST00000456328.2   1657    0   0.174591    0.282655    0.0225974
ENST00000450305.2   632 0   0   0   0
ENST00000488147.1   1351    0.492522    7.70424 2.64778 1.35652
ENST00000619216.1   68  0.70395 0.295008    0   0
ENST00000473358.1   712 0   0   0   0
ENST00000469289.1   535 0   0   0   0

With a small number of files I can use paste, but I have 100 files in 100 directories. So how can I create one file from all of these .tsv files, using the folder names as the column names?

Any help is appreciated. Thanks.

$ cat tst.awk
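BEGIN {
   FS=OFS="\t"
   numCols = 2          # columns 1 and 2 (target_id, length) are shared by all files
}
{
   if ( FNR == 1 ) {
       # Header line of a new file: start a new output column and
       # name it after the directory that contains the file.
       numCols++
       val = FILENAME
       sub("/[^/]+$","",val)   # strip the trailing "/file name"
       sub(".*/","",val)       # strip any leading directories
   }
   else {
       val = $4         # data line: keep est_counts
   }
   vals[FNR,1] = $1
   vals[FNR,2] = $2
   vals[FNR,numCols] = val
}
END {
   # Print one row per line of the last file read; this assumes every
   # file has the same rows in the same order.
   for (rowNr=1; rowNr<=FNR; rowNr++) {
       for (colNr=1; colNr<=numCols; colNr++) {
           printf "%s%s", vals[rowNr,colNr], (colNr<numCols ? OFS : ORS)
       }
   }
}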
$ awk -f tst.awk */estimate.tsv
target_id       length  SOB33D  SOB43E
ENST00000456328.2       1657    0       0.174591
ENST00000450305.2       632     0       0
ENST00000488147.1       1351    0.492522        7.70424
ENST00000619216.1       68      0.70395 0.295008
ENST00000473358.1       712     0       0
ENST00000469289.1       535     0       0
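
The column names in the header come from the two sub() calls applied to FILENAME in tst.awk. A small sketch that isolates that step so it can be tried on individual paths (column_name is a hypothetical wrapper added here for illustration):

```shell
# column_name PATH: print the name of the directory that directly contains
# PATH, using the same two sub() calls as tst.awk.
column_name() {
    printf '%s\n' "$1" | awk '{
        val = $0
        sub("/[^/]+$", "", val)   # drop the trailing "/file name"
        sub(".*/", "", val)       # drop any leading directories
        print val
    }'
}

column_name SOB33D/estimate.tsv         # prints SOB33D
column_name Data/SOB113A/SOB113A.tsv    # prints SOB113A
```

Because only the containing directory is kept, the same script should also work with the layout from the question by passing `*/*.tsv` instead of `*/estimate.tsv`.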

tst.awk was run on the following input (all whitespace is tabs):

$ head */estimate.tsv
==> SOB33D/estimate.tsv <==
target_id       length  eff_length      est_counts
ENST00000456328.2       1657    1525.05 0
ENST00000450305.2       632     500.105 0
ENST00000488147.1       1351    1219.05 0.492522
ENST00000619216.1       68      12.9174 0.70395
ENST00000473358.1       712     580.105 0
ENST00000469289.1       535     403.105 0

==> SOB43E/estimate.tsv <==
target_id       length  eff_length      est_counts
ENST00000456328.2       1657    1525.05 0.174591
ENST00000450305.2       632     500.105 0
ENST00000488147.1       1351    1219.05 7.70424
ENST00000619216.1       68      12.9174 0.295008
ENST00000473358.1       712     580.105 0
ENST00000469289.1       535     403.105 0
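
For comparison, the paste approach from the question can also be made to produce directory names in the header. This is only a sketch under assumptions: every file has the same rows in the same order and exactly four tab-separated columns, and merge_counts is a hypothetical function name.

```shell
# merge_counts FILE...: print target_id, length, and one est_counts column
# per file, labelled with the name of the directory containing that file.
merge_counts() {
    # Header: the two shared column names, then one directory name per file.
    printf 'target_id\tlength'
    for f in "$@"; do
        printf '\t%s' "$(basename "$(dirname "$f")")"
    done
    printf '\n'
    # Body: skip the pasted header row, keep fields 1 and 2 and every 4th field.
    paste "$@" | awk -F'\t' -v OFS='\t' 'NR > 1 {
        line = $1 OFS $2
        for (i = 4; i <= NF; i += 4) line = line OFS $i
        print line
    }'
}
```

It could be called as `merge_counts */*.tsv > all_samples.tsv`. The columns appear in the order the shell expands the glob, so SOB113A would sort before SOB33D.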

Source: https://unix.stackexchange.com/questions/622898