Text-Processing

Split a file into N files with the same name but different target directories

  • April 28, 2020

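I want to split sourcefile.txt, which contains 10000 lines (and grows every day), into 30 equal files. I have directories named prog1 through prog30, and I want to save the pieces into these directories under the same file name. For example: /prog1/myfile.txt, /prog2/myfile.txt, /prog30/myfile.txt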

This is my bash script, named divide.sh, which is run from the prog directory:

#!/bin/bash
programpath=/home/mywebsite/project/a1/
array=/prog1/
totalline=$(wc -l < ./sourcefile.txt)   
divide="$(( $totalline / 30 ))"   
split --lines=$divide $./prog1/myfile.txt    
exit 1
fi

A solution using split (GNU coreutils):

#!/bin/bash

# Assumes the input file is in the same folder as the script.
INPUT=large_file.txt
# Assumes a folder called "output" is in the same folder as the script
# and contains folders named prog01 prog02 ... prog30.
# Create them with: mkdir -p output/prog{01..30}
OUTPUT_FOLDER=output

OUTPUT_FILE_FORMAT=myfile

# split
# -n l/30 -> 30 files, without breaking any line in the middle
# --numeric-suffixes=1 -> file name suffixes start from 01
# $OUTPUT_FILE_FORMAT -> output file names start with this prefix
split -n l/30 --numeric-suffixes=1 "$INPUT" "$OUTPUT_FILE_FORMAT"

# move all files to their respective directories
for i in {01..30}
do
   mv "$OUTPUT_FILE_FORMAT$i" "$OUTPUT_FOLDER/prog$i/myfile.txt"
done

echo "done :)"

exit

The split command is more than adequate for this task. However, the solution here requires your folder names to start at prog01 rather than prog1.
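If renaming the directories to prog01 ... prog30 is not an option, the same split-then-move idea can be kept with the existing prog1 ... prog30 names. The following is only a sketch under stated assumptions: `split_to_dirs` and the `chunk.` prefix are hypothetical names, the prog directories are assumed to exist in the current directory, and no other files matching `chunk.*` are assumed to be present. It splits by line count (rounded up) using the POSIX `-l` option, so it does not rely on GNU-only flags.

```shell
#!/bin/bash
# split_to_dirs SRC N: hypothetical helper, not part of the original answers.
# Splits SRC into at most N pieces of equal line count and moves piece k
# to progk/myfile.txt (prog1, prog2, ...), relative to the current directory.
split_to_dirs() {
    local src=$1 n=$2 total per i=0 piece
    total=$(wc -l < "$src")
    # Round up so that no more than N pieces are produced.
    per=$(( (total + n - 1) / n ))
    [ "$per" -gt 0 ] || return 1    # empty input: nothing to split
    # split names the pieces chunk.aa, chunk.ab, ... in input order.
    split -l "$per" "$src" chunk.
    # The glob expands in sorted order, which matches the order of the pieces.
    for piece in chunk.*; do
        i=$(( i + 1 ))
        mv "$piece" "prog$i/myfile.txt"
    done
}
```

It would be called as `split_to_dirs sourcefile.txt 30` from the directory that contains prog1 through prog30. For a short input, fewer than 30 pieces may be produced, in which case the higher-numbered directories are left untouched.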

An awk-only solution (N here equals 30 files):

awk 'BEGIN{ cmd="wc -l <sourcefile.txt"; cmd|getline l; close(cmd); l=int((l+29)/30) }
   (NR-1)%l==0{ trgt=sprintf("prog%d/myfile.txt", ++c) }
   { print >trgt }' sourcefile.txt
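The `(l+29)/30` expression rounds the per-file line count up rather than down, which is what keeps the output to 30 files. A small sketch of that arithmetic in the shell, where `lines_per_file` is a hypothetical helper name used only for illustration:

```shell
#!/bin/bash
# lines_per_file TOTAL N: hypothetical helper; integer ceiling of TOTAL / N.
lines_per_file() {
    echo $(( ($1 + $2 - 1) / $2 ))
}

lines_per_file 10000 30   # prints 334
```

With 334 lines per file, the first 29 files hold 9686 lines and the 30th gets the remaining 314. Plain integer division would give 333, which covers only 9990 lines and would push the last 10 lines into a 31st target.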

Or, as suggested by jthill, let the shell compute the number of lines in sourcefile.txt and pass the result to awk:

awk '(NR-1)%l==0{ trgt=sprintf("prog%d/myfile.txt", ++c) } { print >trgt }' \
   l=$(( ($(wc -l <sourcefile.txt)+29)/30 )) sourcefile.txt
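Whichever variant is used, it may be worth checking that the pieces, concatenated in order, reproduce the source file. A minimal sketch, assuming the prog1 ... progN layout from the question; `verify_pieces` is a hypothetical name and not part of any answer:

```shell
#!/bin/bash
# verify_pieces SRC N: hypothetical helper.
# Succeeds (exit status 0) if prog1/myfile.txt ... progN/myfile.txt,
# concatenated in numeric order, are byte-identical to SRC.
verify_pieces() {
    local src=$1 n=$2 i=1
    while [ "$i" -le "$n" ]; do
        # Pieces that were never created (short input) are skipped.
        if [ -f "prog$i/myfile.txt" ]; then
            cat "prog$i/myfile.txt"
        fi
        i=$(( i + 1 ))
    done | cmp -s - "$src"
}
```

For example, `verify_pieces sourcefile.txt 30 && echo OK` run from the directory containing the prog folders. Stale myfile.txt files left over from an earlier run with a longer input would make the check fail, which is usually the desired signal.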

Source: https://unix.stackexchange.com/questions/401387