Zip

Reading the contents of zipped files in parallel without extracting them

  • December 13, 2016

I have the following zip archive structure:

$ unzip -l Undetermined_S0_L004_R1_001_fastqc.zip 
Archive:  Undetermined_S0_L004_R1_001_fastqc.zip
 Length     Date   Time    Name
--------    ----   ----    ----
       0  10-10-14 14:44   Undetermined_S0_L004_R1_001_fastqc/
       0  10-10-14 14:44   Undetermined_S0_L004_R1_001_fastqc/Icons/
       0  10-10-14 14:44   Undetermined_S0_L004_R1_001_fastqc/Images/
    1197  10-10-14 14:44   Undetermined_S0_L004_R1_001_fastqc/Icons/fastqc_icon.png
    1450  10-10-14 14:44   Undetermined_S0_L004_R1_001_fastqc/Icons/warning.png
    1561  10-10-14 14:44   Undetermined_S0_L004_R1_001_fastqc/Icons/error.png
    1715  10-10-14 14:44   Undetermined_S0_L004_R1_001_fastqc/Icons/tick.png
     782  10-10-14 14:44   Undetermined_S0_L004_R1_001_fastqc/summary.txt
    9095  10-10-14 14:44   Undetermined_S0_L004_R1_001_fastqc/Images/per_base_quality.png
   14381  10-10-14 14:44   Undetermined_S0_L004_R1_001_fastqc/Images/per_tile_quality.png
   23205  10-10-14 14:44   Undetermined_S0_L004_R1_001_fastqc/Images/per_sequence_quality.png
   30978  10-10-14 14:44   Undetermined_S0_L004_R1_001_fastqc/Images/per_base_sequence_content.png
   31152  10-10-14 14:44   Undetermined_S0_L004_R1_001_fastqc/Images/per_sequence_gc_content.png
    7861  10-10-14 14:44   Undetermined_S0_L004_R1_001_fastqc/Images/per_base_n_content.png
   18356  10-10-14 14:44   Undetermined_S0_L004_R1_001_fastqc/Images/sequence_length_distribution.png
   23040  10-10-14 14:44   Undetermined_S0_L004_R1_001_fastqc/Images/duplication_levels.png
    9096  10-10-14 14:44   Undetermined_S0_L004_R1_001_fastqc/Images/adapter_content.png
   58683  10-10-14 14:44   Undetermined_S0_L004_R1_001_fastqc/Images/kmer_profiles.png
  355919  10-10-14 14:44   Undetermined_S0_L004_R1_001_fastqc/fastqc_report.html
  301092  10-10-14 14:44   Undetermined_S0_L004_R1_001_fastqc/fastqc_data.txt
   10117  10-10-14 14:44   Undetermined_S0_L004_R1_001_fastqc/fastqc.fo
--------                   -------
  899680                   21 files

How can I run crimson on fastqc_data.txt from each archive in parallel? Currently I get the following error:

find `pwd`/*_fastqc.zip -type f | parallel -j 3 unzip -c {} {}/fastqc_data.txt | crimson fastqc {} | less

Usage: crimson fastqc [OPTIONS] INPUT [OUTPUT]

Error: Invalid value for "input": Path "{}" does not exist.

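You have a pipeline made up of four commands:

  • find, which lists the zip files.
  • parallel, which calls unzip to extract one file from each zip file. Since {} is replaced by the path to the zip file, you are trying to extract the file /home/user977828/stuff/Undetermined_S0_L004_R1_001_fastqc.zip/fastqc_data.txt from the archive (if the current directory is /home/user977828/stuff). No member with that name exists in the archive.
  • crimson, which receives the jumble of all the extracted files on its standard input and is called with the literal arguments fastqc and {}.
  • less.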
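
The third point can be checked without parallel or crimson installed. The following is a minimal sketch, not the original commands: xargs -I{} stands in for parallel, and fake_crimson is a hypothetical function that only reports the argument it was given.

```shell
#!/bin/sh
# Hypothetical stand-in for crimson: drain stdin, then report the argument.
fake_crimson() {
    cat >/dev/null
    echo "crimson sees: $1"
}

# xargs -I{} (standing in for parallel) substitutes {} in its own arguments.
stage1=$(printf 'a.zip\nb.zip\n' | xargs -I{} echo "unzip sees: {}")

# The next pipeline stage is a separate command started by the shell,
# so the {} on its command line is never substituted.
stage2=$(printf 'a.zip\nb.zip\n' | xargs -I{} echo "{}" | fake_crimson '{}')

printf '%s\n' "$stage1" "$stage2"
```

The last line printed is "crimson sees: {}", which matches the "Path "{}" does not exist" error in the question.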

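parallel only replaces {} in its own arguments. It cannot do anything about the other parts of the pipeline. If you want to call crimson separately on each fastqc_data.txt file, you need to pass the pipeline from unzip to crimson as an argument to parallel.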

find *_fastqc.zip -type f | sed 's/\.zip$//' |
parallel -j 3 'unzip -c {}.zip {}/fastqc_data.txt | crimson fastqc /dev/stdin' |
less
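
One possible refinement, offered as a sketch rather than a tested fix: according to the unzip manual, -c prints the name of each file as it is extracted, whereas -p writes only the file data to standard output. Whether crimson tolerates the extra lines from -c is not something the text above states, so -p may be the safer choice. The helper names member_path and read_report below are hypothetical and only illustrate how the member name is derived from the archive name.

```shell
#!/bin/sh
# Hypothetical helper: map path/to/NAME_fastqc.zip
# to the member name NAME_fastqc/fastqc_data.txt.
member_path() {
    base=${1##*/}                          # drop any leading directories
    printf '%s/fastqc_data.txt\n' "${base%.zip}"
}

# Hypothetical helper: print one archive member to stdout.
# Nothing is unpacked to disk; unzip -p emits only the file data.
read_report() {
    unzip -p "$1" "$(member_path "$1")"
}

member_path /home/user977828/stuff/Undetermined_S0_L004_R1_001_fastqc.zip
```

With GNU parallel, the same derivation can be written with the {/.} replacement string (basename without extension), which also works when the input paths are absolute: parallel -j 3 'unzip -p {} {/.}/fastqc_data.txt | crimson fastqc /dev/stdin' ::: *_fastqc.zip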

Source: https://unix.stackexchange.com/questions/329752