Bash
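csplit multiple files into multiple files
Folks-
I'm having a hard time with this one. I am trying to write a bash script that uses csplit to take multiple input files and split them all on the same pattern. (For context: I have several TeX files containing questions, each introduced by the \question command. I want to extract every question into its own file.)
The code I have so far: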
#!/bin/bash
# This script uses csplit to run through an input TeX file (or list of TeX files) to separate out all the questions into their own files.

# This line is for the user to input the name of the file they need questions split from.
read -ep "Type the directory and/or name of the file needed to split. If there is more than one file, enter the files separated by a space. " files
read -ep "Type the directory where you would like to save the split files: " save
read -ep "What unit do these questions belong to?" unit

# This is a check for the user to confirm the file list, and proceed if true:
echo "The file(s) being split is/are $files. Please confirm that you wish to split this file, or cancel."
select ynf in "Yes" "No"; do
    case $ynf in
        No ) exit;;
        Yes )
            echo "The split files will be saved to $save. Please confirm that you wish to save the files here."
            select ynd in "Yes" "No"; do
                case $ynd in
                    Yes )
                        # This line will create a loop to conduct the script over all the files in the list.
                        for i in ${files[@]}
                        do
                            # Mass re-naming is formatted to give "question###.tex" to enable processing a large number of questions quickly.
                            # csplit is the utility used here; run "man csplit" to learn more of its functionality.
                            # the structure is "csplit [name of file] [output options] [search filter] [separator(s)]".
                            # this script calls csplit, will accept the name of the file in the argument, searches the files for calls of "question", splits the file everywhere it finds a line with "question", and renames it according to the scheme [prefix]#[suffix] (the %03d in the suffix-format is what increments the numbering automatically).
                            # the '\\question' allows searching for \question, which eliminates the split for \end{questions}; eliminating the \begin{questions} split has not yet been understood.
                            csplit $i --prefix=$save'/'$unit'q' --suffix-format='%03d.tex' /'\\question'/ '{*}'
                        done; exit;;
                    No ) exit;;
                esac
            done
    esac
done
return
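I can confirm that it does loop over my input files the way I expect. However, the behavior I see is that it splits the first file into "q1.tex q2.tex q3.tex" as expected, and when it moves on to the next file in the list, it splits those questions and overwrites the old files; the third file then overwrites the second file's pieces, and so on. What I would like to happen is that, say, if File1 has 3 questions, it outputs: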
q1.tex q2.tex q3.tex
and then, if File2 has 4 questions, it continues incrementing to:
q4.tex q5.tex q6.tex q7.tex
Is there a way for csplit to detect the numbering already used earlier in this loop and increment from there?
Thanks for any help you can offer!
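The csplit command keeps no saved context between runs (nor should it), so every invocation restarts its numbering from zero. There is no way around that within csplit itself, but you could maintain your own counter and insert its value into the prefix string. Alternatively, try replacing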
read -ep "Type the directory and/or name of the file needed to split. If there is more than one file, enter the files separated by a space. " files
...
for i in ${files[@]}
do
    csplit $i --prefix=$save'/'$unit'q' --suffix-format='%03d.tex' /'\\question'/ '{*}'
done
with
read -a files -ep 'Type the directory and/or name of the file needed to split. If there is more than one file, enter the files separated by a space. '
...
cat "${files[@]}" | csplit - --prefix="$save/${unit}q" --suffix-format='%03d.tex' '/\\question/' '{*}'
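This is one of the relatively rare cases where cat {file} | ... really is needed, because csplit accepts only a single file argument (or - for stdin). I have changed your read to fill an array variable, since an array is what you were (correctly) trying to use in the for ... do csplit ... loop. Whatever you end up doing, I strongly recommend double-quoting every variable wherever you use it, and in particular any further use of the array list, such as "${files[@]}".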
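For completeness, the first option mentioned above (keeping your own count) could look roughly like the sketch below. It is a variation rather than a literal "counter in the prefix": each file is split into a scratch directory, and the pieces are then renamed with a running number. The function name split_questions, its argument order, and the decision to discard any piece that does not begin with \question (such as the preamble before the first question) are assumptions made for illustration, and the long options assume GNU csplit.

```shell
#!/bin/bash
# Hypothetical helper (not from the original post):
#   split_questions SAVE_DIR UNIT FILE...
# Writes SAVE_DIR/UNITq001.tex, UNITq002.tex, ... numbered across all FILEs.
split_questions() {
    local save=$1 unit=$2
    shift 2
    local count=0 tmp file piece
    tmp=$(mktemp -d) || return 1

    for file in "$@"; do
        # Split one input file into scratch pieces piece00000, piece00001, ...
        csplit --quiet --prefix="$tmp/piece" --suffix-format='%05d' \
            "$file" '/\\question/' '{*}' || { rm -rf "$tmp"; return 1; }

        for piece in "$tmp"/piece*; do
            # Assumption: only pieces whose first line contains \question are
            # wanted; anything else (preamble, empty piece) is dropped.
            if head -n 1 "$piece" | grep -q '\\question'; then
                count=$((count + 1))
                mv -- "$piece" "$(printf '%s/%sq%03d.tex' "$save" "$unit" "$count")"
            else
                rm -f -- "$piece"
            fi
        done
    done

    rmdir "$tmp"
}

# Example call, reusing the variables from the script in the question:
#   split_questions "$save" "$unit" "${files[@]}"
```

Compared with the cat ... | csplit - approach, this keeps a per-file step where pieces can be inspected or filtered, at the cost of an extra rename pass.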