Bash

csplit multiple files into multiple files

  • January 5, 2020

Hi folks,

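I'm a bit stumped on this one. I'm trying to write a bash script that uses csplit to take several input files and split them all on the same pattern. (For context: I have several TeX files of questions, separated by the \question command. I want to extract each question into its own file.)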

The code I have so far:

#!/bin/bash
# This script uses csplit to run through an input TeX file (or list of TeX files) to separate out all the questions into their own files.
# This line is for the user to input the name of the file they need questions split from.

read -ep "Type the directory and/or name of the file needed to split. If there is more than one file, enter the files separated by a space. " files

read -ep "Type the directory where you would like to save the split files: " save

read -ep "What unit do these questions belong to? " unit

# This is a check for the user to confirm the file list, and proceed if true:

echo "The file(s) being split is/are $files. Please confirm that you wish to split this file, or cancel."
select ynf in "Yes" "No"; do
   case $ynf in 
       No ) exit;;
       Yes ) echo "The split files will be saved to $save. Please confirm that you wish to save the files here."
           select ynd in "Yes" "No"; do
           case $ynd in
               Yes )
#                   This line will create a loop to conduct the script over all the files in the list.
                   for i in ${files[@]}
                   do
#                   Mass re-naming is formatted to give "question###.tex" to enable processing a large number of questions quickly.
#                   csplit is the utility used here; run "man csplit" to learn more of its functionality.
#                   the structure is "csplit [name of file] [output options] [search filter] [separator(s)].
#                   this script calls csplit, will accept the name of the file in the argument, searches the files for calls of "question", splits the file everywhere it finds a line with "question", and renames it according to the scheme [prefix]#[suffix] (the %03d in the suffix-format is what increments the numbering automatically).
#                   the '\\question' allows searching for \question, which eliminates the split for \end{questions}; I have not yet figured out how to eliminate the \begin{questions} split.
                       csplit $i --prefix=$save'/'$unit'q' --suffix-format='%03d.tex' /'\\question'/ '{*}'
                   done; exit;;
               No ) exit;;
           esac
       done
   esac
done

return

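I can confirm that it does loop over my input files as expected. However, the behavior I'm seeing is this: it splits the first file into "q1.tex q2.tex q3.tex" as expected, but when it moves on to the next file in the list, it splits that file's questions and overwrites the old files; the third file then overwrites the second file's splits, and so on. What I want is this: if, say, File1 has 3 questions, it should output: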

q1.tex
q2.tex
q3.tex

Then, if File2 has 4 questions, it should keep incrementing to:

q4.tex
q5.tex
q6.tex
q7.tex

Is there a way for csplit to detect the numbering already done in this loop and increment accordingly?

Thanks for any help you can provide!

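The csplit command keeps no context between runs (nor should it), so every invocation starts numbering its output from zero again (000 with your --suffix-format). There's no way around that within csplit itself, but you can maintain your own count value and insert it into the prefix string.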
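A minimal sketch of the "keep your own count" idea, assuming GNU csplit. Rather than building the count into --prefix, this variant lets csplit write each file's pieces into a scratch directory and then renames them with a running counter, so numbering continues across files. The function name split_questions and the scratch-directory layout are hypothetical. It also assumes you don't want the piece before the first \question (the preamble and \begin{questions}) and discards it; drop the rm line if you do want it.

```shell
#!/usr/bin/env bash
# Sketch only: continuous numbering across several input files by renaming
# csplit's output with a shell-side counter. Assumes GNU csplit.
# split_questions is a hypothetical helper name.
set -euo pipefail
shopt -s nullglob   # an empty glob expands to nothing instead of itself

split_questions() {
    local outdir=$1 unit=$2
    shift 2
    local count=0 tmp f piece
    tmp=$(mktemp -d)
    for f in "$@"; do
        # csplit restarts at 000 for every file, so split into a scratch dir first
        csplit --quiet --prefix="$tmp/part" --suffix-format='%03d.tex' \
            "$f" '/\\question/' '{*}'
        # Assumption: part000.tex is everything before the first \question
        # (preamble, \begin{questions}); discard it so only questions are numbered
        rm -f -- "$tmp/part000.tex"
        for piece in "$tmp"/part*.tex; do
            count=$((count + 1))
            mv -- "$piece" "$outdir/${unit}q$(printf '%03d' "$count").tex"
        done
    done
    rmdir -- "$tmp"
}
```

With your variables it would be called as split_questions "$save" "$unit" "${files[@]}". Because each file is split on its own, one file's trailing \end{questions} never runs into the next file's opening lines.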

Alternatively, try replacing

read -ep "Type the directory and/or name of the file needed to split. If there is more than one file, enter the files separated by a space. " files

...

for i in ${files[@]}
do
   csplit $i --prefix=$save'/'$unit'q' --suffix-format='%03d.tex' /'\\question'/ '{*}'
done

with

read -a files -ep 'Type the directory and/or name of the file needed to split. If there is more than one file, enter the files separated by a space. '

...

cat "${files[@]}" | csplit - --prefix="$save/${unit}q" --suffix-format='%03d.tex' '/\\question/' '{*}'

This is one of the relatively rare cases where using cat {file} | ... is genuinely needed, because csplit accepts only a single file argument (or - for stdin).
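A small sketch for checking what the cat | csplit pipeline does at the boundary between input files, assuming GNU csplit. The sample file contents and the scratch directory are made up for illustration. Since csplit sees one continuous stream, the numbering carries on across files, but the closing lines of one file and the opening lines of the next end up inside the last question of the earlier file.

```shell
#!/usr/bin/env bash
# Sketch: run the cat | csplit pipeline on two made-up sample files and
# inspect where the file boundary lands. Assumes GNU csplit.
set -euo pipefail

demo=$(mktemp -d)
printf '%s\n' '\begin{questions}' '\question A' '\question B' '\question C' '\end{questions}' > "$demo/f1.tex"
printf '%s\n' '\begin{questions}' '\question D' '\question E' '\question F' '\question G' '\end{questions}' > "$demo/f2.tex"

files=("$demo/f1.tex" "$demo/f2.tex")
save=$demo
unit=u
cat "${files[@]}" | csplit --quiet - --prefix="$save/${unit}q" --suffix-format='%03d.tex' '/\\question/' '{*}'

# uq000.tex: the text before the first \question (f1's \begin{questions})
# uq001.tex .. uq007.tex: one question each, numbered across both files
# uq003.tex also carries f1's \end{questions} and f2's \begin{questions}
ls "$demo"
cat "$demo/uq003.tex"
```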

I've changed your read to use an array variable, because that's what you were (correctly) trying to use in your for ... do csplit ... loop.
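A quick sketch of what read -a does with a space-separated list of names. A here-string stands in for the interactive prompt, and the file names are made up. The -r flag is an addition here, not part of the command above: it stops read from treating backslashes as escape characters. Either way, names containing spaces cannot be entered through this prompt, because read splits on whitespace.

```shell
#!/usr/bin/env bash
# Sketch: read -a splits one line of input into array elements on whitespace.
# The here-string replaces the interactive prompt; file names are made up.
read -r -a files <<< 'unit1.tex unit2.tex unit3.tex'

echo "${#files[@]}"            # number of elements: prints 3
printf '<%s>\n' "${files[@]}"  # one element per line, each wrapped in <>
```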

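Whatever you end up doing, I strongly recommend double-quoting your variables everywhere you use them, in particular any further use of the array list, such as "${files[@]}".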
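A minimal illustration of why the quotes matter. The file name containing a space and the helper count_args are made up for the demonstration. Unquoted, the array expansion is word-split (and would also be glob-expanded); quoted, each element stays one argument.

```shell
#!/usr/bin/env bash
# Sketch: quoted vs unquoted array expansion.
# 'unit 1.tex' and count_args are made up for illustration.
files=('unit 1.tex' 'unit2.tex')

count_args() { echo "$#"; }    # prints how many arguments it received

count_args ${files[@]}         # unquoted: 'unit 1.tex' splits in two, prints 3
count_args "${files[@]}"       # quoted: one argument per element, prints 2
```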

Source: https://unix.stackexchange.com/questions/560139