How do I split a file into multiple files after every N occurrences of a pattern?

May 22, 2021

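I have a file on Linux containing the coordinates of several thousand molecules. Each molecule begins with a line containing the same pattern:
@<TRIPOS>MOLECULE
followed by further lines. I want to split the file into multiple files, each containing a fixed number of molecules. What is the simplest way to do this?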

One way is to use awk:
awk -v moleculesNum=7 '
/^@<TRIPOS>MOLECULE/{
   if((++num)%moleculesNum==1){
       close(outfile); outfile="file" (++Output)
   }
}
{ print >outfile }' infile
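This splits the original file into multiple files (file1, file2, ...), each containing at most 7 molecules. Adjust the count through the moleculesNum=7 parameter; it must be at least 2, because the modulo test never equals 1 when moleculesNum is 1.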
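To see the grouping in action, here is a small sketch that runs the same awk logic on synthetic data. The 16 three-line molecules, the mol_N and coords_N placeholder lines, and the scratch directory are assumptions made for illustration only; they are not part of the original question.

```shell
# Work in a scratch directory so the demo does not touch real data.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical input: 16 molecules, each a header line plus two data lines.
for n in $(seq 1 16); do
    printf '@<TRIPOS>MOLECULE\nmol_%s\ncoords_%s\n' "$n" "$n"
done > infile

# Same logic as the answer: open a new output file at molecules 1, 8, 15, ...
awk -v moleculesNum=7 '
/^@<TRIPOS>MOLECULE/ {
    if ((++num) % moleculesNum == 1) {
        close(outfile); outfile = "file" (++Output)
    }
}
{ print > outfile }' infile

# Count the molecule headers that ended up in each output file.
grep -c '^@<TRIPOS>MOLECULE' file1 file2 file3
```

With 16 molecules and moleculesNum=7, the counts should come out as 7, 7 and 2, which shows that the last file simply receives whatever is left over.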

Here is a bash approach based on the csplit utility:

### user customization section
tmpdir=$(mktemp -d)
prefix='outfile'
bunch=5
pat='@<TRIPOS>MOLECULE'

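## break up the input file on pattern, one piece per molecule
## (split at the matching line itself so every piece starts with its header;
## an offset of +1 would push each header onto the end of the previous piece)
csplit ./file \
 --silent \
 --elide-empty-files \
 --prefix "$tmpdir/$prefix" \
 --suffix-format='%d.tmp' \
 "/$pat/" '{*}' \
;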

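## coalesce the split up files into bunches
i=0
while :; do
 start=$(( bunch * i ))
 stop=$(( start + bunch - 1 ))
 ## stop once no pieces are left, so no empty bunch file is created
 [[ -e "$tmpdir/$prefix$start.tmp" ]] || break
 for ((j=start; j<=stop; j++)) {
   printf '%s\n' "$tmpdir/$prefix$j.tmp"
 } | xargs cat > "./$prefix.$i" 2>/dev/null || break
 (( i++ ))
done

## remove the temporary pieces
rm -rf "$tmpdir"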

The current directory will then hold the bunches as outfile.0, outfile.1, and so on.
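For a quick check of the csplit idea, the following self-contained sketch may help. It assumes GNU csplit, splits at the header line itself so that each piece holds one whole molecule, and swaps the xargs step for a plain append loop. The 12-molecule sample input and the piece prefix are hypothetical names chosen for the demo.

```shell
# Work in a scratch directory so the demo does not touch real data.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical input: 12 molecules, each a header line plus two data lines.
for n in $(seq 1 12); do
    printf '@<TRIPOS>MOLECULE\nmol_%s\ncoords_%s\n' "$n" "$n"
done > file

bunch=5
pat='@<TRIPOS>MOLECULE'

# One piece per molecule: piece0.tmp, piece1.tmp, ...
# The empty piece before the first header is elided, and numbering still starts at 0.
csplit --silent --elide-empty-files \
    --prefix piece --suffix-format='%d.tmp' \
    file "/$pat/" '{*}'

# Append piece j to bunch number j / bunch, walking the pieces in numeric order.
j=0
while [ -f "piece$j.tmp" ]; do
    cat "piece$j.tmp" >> "outfile.$(( j / bunch ))"
    j=$(( j + 1 ))
done
rm -f piece*.tmp

# Count the molecule headers in each bunch.
grep -c "$pat" outfile.*
```

With 12 molecules and bunch=5, this should leave outfile.0 and outfile.1 with 5 molecules each and outfile.2 with the remaining 2. Appending by integer division avoids creating an empty trailing file when the molecule count is an exact multiple of bunch.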

Source: https://unix.stackexchange.com/questions/650739
