How can I run this bash script in parallel?

March 25, 2017

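I need this to be more efficient.
Right now it takes up to 20 hours depending on the line (these are fairly large MCS datasets).
Split the large data file into its "shots"
Create a list of each shot name to use in the for loop
Loop through each shot and perform the same processes
Append each shot to a new data file, so that you end up with the same line as before, but processed. In this case I am repeatedly filtering the data, which is why I think this could be run in parallel.
You can ignore all the SU commands and everything inside the for loop; I just need to know how to run this in parallel (say, on 32 nodes). This is a relatively new topic for me, so an in-depth explanation would be appreciated!
Script: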
#!/bin/bash
# Split the input file into one file for each shot. NB must close each o/p file at the earliest opportunity otherwise it will crash!
susplit <$1 key=fldr stem=fldr_ verbose=1 close=1

# Create a list of shot files
ls fldr* > LIST

# Loop over each shot file; suppress direct wave; write to new concatenated output file
for i in `cat LIST`; do
   echo $i
   suchw key1=tstat key2=tstat a=200 < $i | suwind key=tracf min=10 max=400 tmin=0 tmax=6 | suweight a=0 | suresamp rf=4 | sustatic hdrs=1 sign=-1 | sureduce rv=1.52 | sumedian median=1 xshift=0 tshift=0 nmed=41 | suflip flip=3 | sureduce rv=1.52 | suflip flip=3 | suresamp rf=0.25 | suweight inv=1 a=0 | sustatic hdrs=1 sign=1 >> $2
done

# Tidy up files by removing single shot gathers and LIST
rm -f fldr* LIST &

I assume it is the for loop that you want parallelized:
#!/bin/bash
# Split the input file into one file for each shot. NB must close each o/p file at the earliest opportunity otherwise it will crash!
susplit <"$1" key=fldr stem=fldr_ verbose=1 close=1

sucit() {
   i=$1
   # Progress message goes to stderr so it does not end up in the output data
   echo "$i" >&2
   suchw key1=tstat key2=tstat a=200 < "$i" | suwind key=tracf min=10 max=400 tmin=0 tmax=6 | suweight a=0 | suresamp rf=4 | sustatic hdrs=1 sign=-1 | sureduce rv=1.52 | sumedian median=1 xshift=0 tshift=0 nmed=41 | suflip flip=3 | sureduce rv=1.52 | suflip flip=3 | suresamp rf=0.25 | suweight inv=1 a=0 | sustatic hdrs=1 sign=1
}
export -f sucit

# --keep-order writes the results in the same order as the input files
parallel --keep-order sucit ::: fldr* > "$2"

# Tidy up files by removing single shot gathers
rm -f fldr* &
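To see the pattern in isolation, here is a minimal sketch that swaps the SU pipeline for a stand-in filter. The function name process_one, the upper-casing with tr, and the three toy fldr_ files are all assumptions made for illustration; only export -f, ::: and --keep-order are the GNU Parallel pieces that matter. It falls back to a plain loop if GNU Parallel is not installed, so the sketch still runs.

```shell
#!/bin/bash
# Hypothetical stand-in for the SU pipeline: upper-case one file.
process_one() {
    echo "processing $1" >&2      # progress goes to stderr, not into the data
    tr 'a-z' 'A-Z' < "$1"
}
export -f process_one             # make the function visible to the jobs

# Three toy "shot" files in a scratch directory.
workdir=$(mktemp -d)
for n in 1 2 3; do
    printf 'shot %s\n' "$n" > "$workdir/fldr_$n"
done

if parallel --version 2>/dev/null | grep -q 'GNU parallel'; then
    # One job per file; --keep-order prints results in argument order,
    # regardless of which job finishes first.
    parallel --keep-order process_one ::: "$workdir"/fldr_* > "$workdir/out"
else
    # Fallback when GNU Parallel is not available: sequential loop.
    for f in "$workdir"/fldr_*; do process_one "$f"; done > "$workdir/out"
fi

cat "$workdir/out"
```

Without --keep-order the output is written in whatever order the jobs finish, which may not match the order of the shots in the original line.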
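Depending on what susplit does, you may be able to make it even faster. If a shot in "large_data_file" starts with <shot>\n and ends with </shot>\n, then something like this may work: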
sucpipe() {
   suchw key1=tstat key2=tstat a=200 | suwind key=tracf min=10 max=400 tmin=0 tmax=6 | suweight a=0 | suresamp rf=4 | sustatic hdrs=1 sign=-1 | sureduce rv=1.52 | sumedian median=1 xshift=0 tshift=0 nmed=41 | suflip flip=3 | sureduce rv=1.52 | suflip flip=3 | suresamp rf=0.25 | suweight inv=1 a=0 | sustatic hdrs=1 sign=1
}
export -f sucpipe

# --keep-order writes the processed blocks in the same order as they appear in the input
parallel --keep-order --block -1 --recstart '<shot>\n' --recend '</shot>\n' --pipepart -a "$1" sucpipe > "$2"
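It will try to split the big file into n blocks, where n = number of cores. The splitting is done on the fly, so no temporary files are written first. GNU Parallel then passes each block to its own sucpipe.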
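As a rough way to convince yourself that no record is lost or cut in half by the on-the-fly splitting, the sketch below runs the same options on a toy text file. Everything here is hypothetical: toy_pipe stands in for sucpipe, and the 200 "<shot>" records are made-up data. It assumes a GNU Parallel version recent enough to accept --block -1, and falls back to a single unsplit run if GNU Parallel is not installed.

```shell
#!/bin/bash
# Hypothetical stand-in for sucpipe: reads one block on stdin, upper-cases it.
toy_pipe() {
    tr 'a-z' 'A-Z'
}
export -f toy_pipe

infile=$(mktemp)
outfile=$(mktemp)

# Toy input: 200 records, each delimited by <shot> ... </shot> lines.
for n in $(seq 1 200); do
    printf '<shot>\ntrace data %s\n</shot>\n' "$n"
done > "$infile"

if parallel --version 2>/dev/null | grep -q 'GNU parallel'; then
    # One block per job slot, split only where a record ends and the next begins.
    parallel --keep-order --block -1 \
        --recstart '<shot>\n' --recend '</shot>\n' \
        --pipepart -a "$infile" toy_pipe > "$outfile"
else
    # Fallback when GNU Parallel is not available: process the file in one go.
    toy_pipe < "$infile" > "$outfile"
fi

# Every record that went in should come out exactly once.
records_in=$(grep -c '^<shot>$' "$infile")
records_out=$(grep -c '^<SHOT>$' "$outfile")
echo "records in: $records_in, records out: $records_out"
```

Because each block starts and ends on a record boundary, the filter never sees half a shot, which is what the real sucpipe would need as well.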
If bigfile is binary (i.e. not text) with a header of 3200 bytes and a record length of 1000 bytes, then this may work:
parallel -a bigfile  --pipepart --recend '' --block 1000 --header '.{3200}' ...
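Before relying on fixed sizes like these, it may be worth checking that the file really has that layout. The sketch below is only an illustration of the arithmetic: it builds a toy file with a 3200-byte header and five 1000-byte records (the record count and the fill bytes are made up), then checks that the payload divides evenly into records and computes where a given record starts.

```shell
#!/bin/bash
header_bytes=3200     # header size quoted above
record_bytes=1000     # record length quoted above
records=5             # hypothetical record count for this toy file

bigfile=$(mktemp)
# Header of 3200 'H' bytes, then 5 records filled with '1' .. '5'.
head -c "$header_bytes" /dev/zero | tr '\0' 'H' > "$bigfile"
for n in $(seq 1 "$records"); do
    head -c "$record_bytes" /dev/zero | tr '\0' "$n" >> "$bigfile"
done

size=$(wc -c < "$bigfile")
size=$((size))                          # strip padding some wc versions add
payload=$((size - header_bytes))        # bytes after the header
leftover=$((payload % record_bytes))    # must be 0 for whole records
count=$((payload / record_bytes))
echo "size=$size payload=$payload records=$count leftover=$leftover"

# Record k (1-based) starts at byte offset header_bytes + (k-1)*record_bytes.
k=3
offset=$((header_bytes + (k - 1) * record_bytes))
third=$(tail -c +"$((offset + 1))" "$bigfile" | head -c "$record_bytes")
echo "record $k starts at byte offset $offset and begins with: ${third:0:4}"
```

If leftover is not 0, the header or record size is wrong for that file, and fixed-size blocks would cut records in the wrong place.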
For more details, walk through the tutorial: man parallel_tutorial. Your command line will love you for it.

Source: https://unix.stackexchange.com/questions/353379
