Text-Processing
Split a file into N files with the same name but different target directories
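I want to split sourcefile.txt, which contains 10000 lines (and grows every day), into 30 equal files. I have directories named prog1 to prog30, and I want to save the pieces into these directories under the same file name: for example /prog1/myfile.txt, /prog2/myfile.txt, and so on up to /prog30/myfile.txt.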
This is my bash script, named divide.sh, which I run in the prog directory:

    #!/bin/bash
    programpath=/home/mywebsite/project/a1/
    array=/prog1/
    totalline=$(wc -l < ./sourcefile.txt)
    divide="$(( $totalline / 30 ))"
    split --lines=$divide $./prog1/myfile.txt
    exit 1
    fi
    #!/bin/bash
    # Assumes the input file is in the same folder as the script.
    INPUT=large_file.txt
    # Assumes a folder called "output" is in the same folder as the script
    # and contains folders matching the pattern prog01 prog02 ... prog30.
    # Create them with: mkdir output/prog{01..30}
    OUTPUT_FOLDER=output
    OUTPUT_FILE_FORMAT=myfile

    # split
    # -n l/30               -> 30 files, without cutting any line in half
    #                          (a plain "-n 30" splits by bytes and can break lines)
    # --numeric-suffixes=1  -> suffixes start at 01
    # $OUTPUT_FILE_FORMAT   -> prefix of the generated file names
    split -n l/30 --numeric-suffixes=1 "$INPUT" "$OUTPUT_FILE_FORMAT"

    # move each file to its respective directory
    for i in {01..30}
    do
        mv "$OUTPUT_FILE_FORMAT$i" "$OUTPUT_FOLDER/prog$i/myfile.txt"
    done
    echo "done :)"
    exit
The split command is more than enough for this task. However, the solution here requires your folder names to start at prog01 rather than prog1.
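If renaming the directories is not an option, the same idea can probably be adapted to the unpadded names prog1 .. prog30 by writing the chunks to a temporary directory first and mapping chunk numbers to directory numbers in the loop. The following is only a sketch under a few assumptions: GNU split is available, N is at most 100 (two-digit suffixes), and the function name split_into_dirs and the chunk. prefix are made up for illustration.

```shell
#!/bin/bash
# Sketch: spread the lines of a file over prog1 .. progN (unpadded names).
# Assumes GNU split and N <= 100; split_into_dirs and "chunk." are
# hypothetical names used only for this illustration.
split_into_dirs() {
    local src n tmp chunk i
    src=$1
    n=$2
    tmp=$(mktemp -d) || return 1
    # l/N writes N chunks without cutting a line in half;
    # -d -a 2 names them chunk.00, chunk.01, ...
    split -n "l/$n" -d -a 2 "$src" "$tmp/chunk." || return 1
    i=1
    while [ "$i" -le "$n" ]; do
        # chunks are numbered from 00, directories from 1
        chunk=$(printf '%s/chunk.%02d' "$tmp" "$((i - 1))")
        mv "$chunk" "prog$i/myfile.txt" || return 1
        i=$((i + 1))
    done
    rmdir "$tmp"
}
```

Run from the directory that holds prog1 .. prog30, this would be called as split_into_dirs sourcefile.txt 30. Note that l/N balances the chunks by size rather than by exact line count, so the files may differ by a few lines.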
An awk-only solution (here N equals 30 files):

    awk 'BEGIN { cmd = "wc -l <sourcefile.txt"; cmd | getline l; l = int((l + 29) / 30); close(cmd) }
         NR % l == 1 { trgt = sprintf("prog%d", ++c) }
         { print > (trgt "/myfile.txt") }' sourcefile.txt
Or let the shell count the lines in sourcefile.txt and pass the result to awk, as suggested by jthill:

    awk 'NR % l == 1 { trgt = sprintf("prog%d", ++c) }
         { print > (trgt "/myfile.txt") }' l=$(( ($(wc -l <sourcefile.txt) + 29) / 30 )) sourcefile.txt
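The (l+29)/30 expression in both commands is a ceiling division: it rounds the per-file line count up, so the leftover lines do not spill into a 31st file. A small sketch of the arithmetic with the numbers from the question (lines_per_file is a hypothetical helper name, not part of either answer):

```shell
#!/bin/bash
# Sketch: ceiling division for the per-file line count.
# lines_per_file is a hypothetical helper name.
lines_per_file() {
    # $1 = total lines, $2 = number of files; (a + b - 1) / b rounds up
    echo $(( ($1 + $2 - 1) / $2 ))
}

total=10000
n=30
floor=$(( total / n ))
ceil=$(lines_per_file "$total" "$n")
# number of files produced when each file holds that many lines
echo "floor: $floor lines per file -> $(lines_per_file "$total" "$floor") files"
echo "ceil:  $ceil lines per file -> $(lines_per_file "$total" "$ceil") files"
```

With 10000 lines, rounding down gives 333 lines per file and therefore 31 files, while rounding up gives 334 lines per file and exactly 30 files, the last one holding 314 lines. For small inputs the rounded-up count can yield fewer than 30 files (100 lines give 4 per file, so only 25 files).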