Shell

How do I use basename with GNU parallel?

  • June 10, 2021

I have files like these on a Linux system:

10S1_S5_L002_chrm.fasta  SRR3184711_chrm.fasta    SRR3987378_chrm.fasta  SRR4029368_chrm.fasta  SRR5204465_chrm.fasta    SRR5997546_chrm.fasta
13_S7_L003_chrm.fasta    SRR3184712_chrm.fasta    SRR3987379_chrm.fasta  SRR4029369_chrm.fasta  SRR5204520_chrm.fasta    SRR5997547_chrm.fasta
14_S8_L003_chrm.fasta    SRR3184713_chrm.fasta    SRR3987380_chrm.fasta  SRR4029370_chrm.fasta  SRR5208699_chrm.fasta    SRR5997548_chrm.fasta
17_S4_L002_chrm.fasta    SRR3184714_chrm.fasta    SRR3987415_chrm.fasta  SRR4029371_chrm.fasta  SRR5208700_chrm.fasta    SRR5997549_chrm.fasta
3_S1_L001_chrm.fasta     SRR3184715_chrm.fasta    SRR3987433_chrm.fasta  SRR4029372_chrm.fasta  SRR5208701_chrm.fasta    SRR5997550_chrm.fasta
4_S2_L001_chrm.fasta     SRR3184716_chrm.fasta    SRR3987482_chrm.fasta  SRR4029373_chrm.fasta  SRR5208770_chrm.fasta    SRR5997551_chrm.fasta
50m_S10_L004_chrm.fasta  SRR3184717_chrm.fasta    SRR3987489_chrm.fasta  SRR4029374_chrm.fasta  SRR5208886_chrm.fasta    SRR5997552_chrm.fasta
5_S3_L001_chrm.fasta     SRR3184718_chrm.fasta    SRR3987493_chrm.fasta  SRR4029375_chrm.fasta  SRR5211153_chrm.fasta    SRR6050903_chrm.fasta
65m_S11_L005_chrm.fasta  SRR3184719_chrm.fasta    SRR3987495_chrm.fasta  SRR4029376_chrm.fasta  SRR5211162_chrm.fasta    SRR6050905_chrm.fasta
6_S6_L002_chrm.fasta     SRR3184720_chrm.fasta    SRR3987647_chrm.fasta  SRR4029377_chrm.fasta  SRR5211163_chrm.fasta    SRR6050920_chrm.fasta
70m_S12_L006_chrm.fasta  SRR3184721_chrm.fasta    SRR3987651_chrm.fasta  SRR4029378_chrm.fasta  SRR5215118_chrm.fasta    SRR6050921_chrm.fasta
80m_S1_L002_chrm.fasta   SRR3184722_chrm.fasta    SRR3987657_chrm.fasta  SRR4029379_chrm.fasta  SRR5247122_chrm.fasta    SRR6050958_chrm.fasta

There are 423 of them in total. I was asked to split each one into 32 parts so that the work parallelizes well across 32 CPUs, so now I have this:

10S1_S5_L002_chrm.part-10.fasta  SRR3986254_chrm.part-26.fasta  SRR4029372_chrm.part-22.fasta    SRR5581526-1_chrm.part-20.fasta
10S1_S5_L002_chrm.part-11.fasta  SRR3986254_chrm.part-27.fasta  SRR4029372_chrm.part-23.fasta    SRR5581526-1_chrm.part-21.fasta
10S1_S5_L002_chrm.part-12.fasta  SRR3986254_chrm.part-28.fasta  SRR4029372_chrm.part-24.fasta    SRR5581526-1_chrm.part-22.fasta
10S1_S5_L002_chrm.part-13.fasta  SRR3986254_chrm.part-29.fasta  SRR4029372_chrm.part-25.fasta    SRR5581526-1_chrm.part-23.fasta
10S1_S5_L002_chrm.part-14.fasta  SRR3986254_chrm.part-2.fasta   SRR4029372_chrm.part-26.fasta    SRR5581526-1_chrm.part-24.fasta
10S1_S5_L002_chrm.part-15.fasta  SRR3986254_chrm.part-30.fasta  SRR4029372_chrm.part-27.fasta    SRR5581526-1_chrm.part-25.fasta
10S1_S5_L002_chrm.part-16.fasta  SRR3986254_chrm.part-31.fasta  SRR4029372_chrm.part-28.fasta    SRR5581526-1_chrm.part-26.fasta
10S1_S5_L002_chrm.part-17.fasta  SRR3986254_chrm.part-32.fasta  SRR4029372_chrm.part-29.fasta    SRR5581526-1_chrm.part-27.fasta
10S1_S5_L002_chrm.part-18.fasta  SRR3986254_chrm.part-3.fasta   SRR4029372_chrm.part-2.fasta     SRR5581526-1_chrm.part-28.fasta
10S1_S5_L002_chrm.part-19.fasta  SRR3986254_chrm.part-4.fasta   SRR4029372_chrm.part-30.fasta    SRR5581526-1_chrm.part-29.fasta
10S1_S5_L002_chrm.part-1.fasta   SRR3986254_chrm.part-5.fasta   SRR4029372_chrm.part-3.fasta     SRR5581526-1_chrm.part-2.fasta
10S1_S5_L002_chrm.part-20.fasta  SRR3986254_chrm.part-6.fasta   SRR4029372_chrm.part-4.fasta     SRR5581526-1_chrm.part-30.fasta
10S1_S5_L002_chrm.part-21.fasta  SRR3986254_chrm.part-7.fasta   SRR4029372_chrm.part-5.fasta     SRR5581526-1_chrm.part-31.fasta

I want to run a command from the CRISPRCasFinder tool. The command works fine when I run it on a single file, namefile.fasta, and it also works fine when I use parallel on namefile.part*.fasta.

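But when I try to make the command more generic by using basename, it has no effect. I want to use basename to keep the name of the input file in the name of the output folder.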

I tried this on a smaller dataset:

time parallel 'dossierSortie=$(basename -s .fasta {}) ; singularity exec -B $PWD /usr/local/CRISPRCasFinder-release-4.2.20/CrisprCasFinder.simg perl /usr/local/CRISPRCasFinder/CRISPRCasFinder.pl -so /usr/local/CRISPRCasFinder/sel392v2.so -cf /usr/local/CRISPRCasFinder/CasFinder-2.0.3 -drpt /usr/local/CRISPRCasFinder/supplementary_files/repeatDirection.tsv -rpts /usr/local/CRISPRCasFinder/supplementary_files/Repeat_List.csv -cas -def G --meta -out /databis/defontis/Dossier_fasta_chrm_avec_CRISPRCasFinder/Test/Result{} -in /databis/defontis/Dossier_fasta_chrm_avec_CRISPRCasFinder/Test/{}' ::: *_chrm.part*.fasta

This is what it produced:

ERR358546_chrm.part-1.fasta    SRR4029114_k141_23527.fna.bck   SRR5100341_k141_10416.fna.lcp   SRR5100345_k141_3703.fna.al1
ERR358546_chrm.part-2.fasta    SRR4029114_k141_23527.fna.bwt   SRR5100341_k141_10416.fna.llv   SRR5100345_k141_3703.fna.bck
ERR358546_chrm.part-3.fasta    SRR4029114_k141_23527.fna.des   SRR5100341_k141_10416.fna.ois   SRR5100345_k141_3703.fna.bwt
ERR358546_chrm.part-4.fasta    SRR4029114_k141_23527.fna.lcp   SRR5100341_k141_10416.fna.prj   SRR5100345_k141_3703.fna.des
ERR358546_chrm.part-5.fasta    SRR4029114_k141_23527.fna.llv   SRR5100341_k141_10416.fna.sds   SRR5100345_k141_3703.fna.lcp
ERR358546_chrm.part-6.fasta    SRR4029114_k141_23527.fna.ois   SRR5100341_k141_10416.fna.sti1  SRR5100345_k141_3703.fna.llv
ERR358546_k141_26987.fna       SRR4029114_k141_23527.fna.prj   SRR5100341_k141_10416.fna.suf   SRR5100345_k141_3703.fna.ois
ERR358546_k141_33604.fna       SRR4029114_k141_23527.fna.sds   SRR5100341_k141_10416.fna.tis   SRR5100345_k141_3703.fna.prj
ERR358546_k141_90631.fna       SRR4029114_k141_23527.fna.sti1  SRR5100341_k141_10942.fna       SRR5100345_k141_3703.fna.sds
ResultERR358546_chrm.part-3    SRR4029114_k141_23527.fna.suf   SRR5100341_k141_164.fna         SRR5100345_k141_3703.fna.sti1
ResultERR358546_chrm.part-4    SRR4029114_k141_23527.fna.tis   SRR5100341_k141_3046.fna        SRR5100345_k141_3703.fna.suf
ResultSRR4029114_chrm.part-1   SRR5100341_chrm.part-10.fasta   SRR5100341_k141_3968.fna        SRR5100345_k141_3703.fna.tis
ResultSRR4029114_chrm.part-4   SRR5100341_chrm.part-11.fasta   SRR5100341_k141_631.fna         SRR5100345_k141_4429.fna
ResultSRR5100341_chrm.part-10  SRR5100341_chrm.part-12.fasta   SRR5100341_k141_6376.fna        SRR5100345_k141_4832.fna
ResultSRR5100341_chrm.part-11  SRR5100341_chrm.part-13.fasta   SRR5100341_k141_8699.fna        SRR5100345_k141_6139.fna
ResultSRR5100341_chrm.part-3   SRR5100341_chrm.part-1.fasta    SRR5100341_k141_8892.fna        SRR5100345_k141_731.fna
ResultSRR5100341_chrm.part-9   SRR5100341_chrm.part-2.fasta    SRR5100345_chrm.part-10.fasta   SRR5100345_k141_731.fna.al1
ResultSRR5100345_chrm.part-1   SRR5100341_chrm.part-3.fasta    SRR5100345_chrm.part-1.fasta    SRR5100345_k141_731.fna.bck
ResultSRR5100345_chrm.part-4   SRR5100341_chrm.part-4.fasta    SRR5100345_chrm.part-2.fasta    SRR5100345_k141_731.fna.bwt
ResultSRR5100345_chrm.part-9   SRR5100341_chrm.part-5.fasta    SRR5100345_chrm.part-3.fasta    SRR5100345_k141_731.fna.des
SRR4029114_chrm.part-1.fasta   SRR5100341_chrm.part-6.fasta    SRR5100345_chrm.part-4.fasta    SRR5100345_k141_731.fna.lcp
SRR4029114_chrm.part-2.fasta   SRR5100341_chrm.part-7.fasta    SRR5100345_chrm.part-5.fasta    SRR5100345_k141_731.fna.llv
SRR4029114_chrm.part-3.fasta   SRR5100341_chrm.part-8.fasta    SRR5100345_chrm.part-6.fasta    SRR5100345_k141_731.fna.ois
SRR4029114_chrm.part-4.fasta   SRR5100341_chrm.part-9.fasta    SRR5100345_chrm.part-7.fasta    SRR5100345_k141_731.fna.prj
SRR4029114_chrm.part-5.fasta   SRR5100341_k141_10416.fna       SRR5100345_chrm.part-8.fasta    SRR5100345_k141_731.fna.sds
SRR4029114_k141_14384.fna      SRR5100341_k141_10416.fna.al1   SRR5100345_chrm.part-9.fasta    SRR5100345_k141_731.fna.sti1
SRR4029114_k141_16765.fna      SRR5100341_k141_10416.fna.bck   SRR5100345_k141_1211.fna        SRR5100345_k141_731.fna.suf
SRR4029114_k141_23527.fna      SRR5100341_k141_10416.fna.bwt   SRR5100345_k141_2884.fna        SRR5100345_k141_731.fna.tis
SRR4029114_k141_23527.fna.al1  SRR5100341_k141_10416.fna.des   SRR5100345_k141_3703.fna

The folder names are not what I want: I would like, for example, just ResultERR358546 instead of ResultERR358546_chrm.part-2.fasta. I don't want one result per part, only one per ID.

Your basename command only strips the fixed .fasta extension; as far as I know, it cannot remove a variable pattern.
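To make that limitation concrete, here is a small sketch comparing basename -s with plain shell parameter expansion. The variable names f, stem, and id are only illustrative, and the sketch assumes the sample ID itself contains no underscore, which holds for ERR358546_chrm.part-2.fasta but not for names such as 10S1_S5_L002_chrm.part-10.fasta. Note also that in the command from the question, dossierSortie is assigned but never referenced; the -out path uses {} directly.

```shell
#!/bin/sh
# Example input name taken from the listing in the question.
f=ERR358546_chrm.part-2.fasta

# basename -s removes only the literal suffix it is given.
stem=$(basename -s .fasta "$f")
echo "$stem"        # ERR358546_chrm.part-2

# Parameter expansion can strip a pattern instead:
# %%_* removes the longest suffix that starts with an underscore.
# (Assumption: the ID has no underscore of its own.)
id=${f%%_*}
echo "Result$id"    # ResultERR358546
```

Inside a parallel command line this would still need the shell to see {} as a variable first (for example f={} followed by ${f%%_*}), which is part of why a replacement string is usually the more convenient route.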

However, GNU parallel provides Perl expression replacement strings, which are much more powerful than basename. For example, given

$ ls *_chrm.part*.fasta
ERR358546_chrm.part-2.fasta  ERR358546_chrm.part-5.fasta  ERR358546_chrm.part-8.fasta
ERR358546_chrm.part-3.fasta  ERR358546_chrm.part-6.fasta  ERR358546_chrm.part-9.fasta
ERR358546_chrm.part-4.fasta  ERR358546_chrm.part-7.fasta

then

$ parallel echo Result'{= s:_.*$:: =}' ::: *_chrm.part*.fasta
ResultERR358546
ResultERR358546
ResultERR358546
ResultERR358546
ResultERR358546
ResultERR358546
ResultERR358546
ResultERR358546

The substitution s:_.*$:: deletes everything from the first underscore to the end of the name. Ported to your original command:

time parallel ' 
 singularity exec -B "$PWD" /usr/local/CRISPRCasFinder-release-4.2.20/CrisprCasFinder.simg \
 perl /usr/local/CRISPRCasFinder/CRISPRCasFinder.pl \
 -so /usr/local/CRISPRCasFinder/sel392v2.so \
 -cf /usr/local/CRISPRCasFinder/CasFinder-2.0.3 \
 -drpt /usr/local/CRISPRCasFinder/supplementary_files/repeatDirection.tsv \
 -rpts /usr/local/CRISPRCasFinder/supplementary_files/Repeat_List.csv \
 -cas -def G --meta \
 -out /databis/defontis/Dossier_fasta_chrm_avec_CRISPRCasFinder/Test/Result'{= s:_.*$:: =}' \
 -in /databis/defontis/Dossier_fasta_chrm_avec_CRISPRCasFinder/Test/{}
' ::: *_chrm.part*.fasta

If you want to capture the part index and include it in the name, you could change the expression to

Result'{= s:_chrm\.part-(\d+)\.fasta$:_$1: =}'

or

'{= s:_chrm\.part-(\d+)\.fasta$:Result_$1: =}'

for example.

Source: https://unix.stackexchange.com/questions/653664