More efficient way to process a large number of files (300k+) in order to collect results?

April 19, 2018

I have a file named fields.txt containing L = 300k+ lines that look like this:
field1 field2 field3
field1 field2 field3
field1 field2 field3
... 
field1 field2 field3
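In the same folder I have N files, each containing a single string (call it s(n)) and named res-0-n-0, where n is anywhere between 0 and L. However, N < L.
I generated a file res_numbers_sorted.tmp containing the sorted list of these numbers n with the following command (I am not sure it is the most efficient way, but it is fairly fast and I need the sorted list for other purposes):
find -maxdepth 1 -type f -name "res-0-*" | sort -t'-' -k3 -n | awk -F'-' '{print $3}' >| res_numbers_sorted.tmp
The file res_numbers_sorted.tmp looks like this: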
0
1
8
... 
299963
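To make the sort key concrete, here is a small sketch of the same pipeline run on a throwaway directory. The four file names and the `demo_dir` and `sorted` variables are made up for illustration; `mktemp -d` is assumed to be available.

```shell
#!/bin/sh
# Demo of the listing pipeline on a throwaway directory (file names are made up).
demo_dir=$(mktemp -d)
trap 'rm -rf "$demo_dir"' EXIT
touch "$demo_dir/res-0-8-0" "$demo_dir/res-0-0-0" \
      "$demo_dir/res-0-10-0" "$demo_dir/res-0-1-0"

# find prints names such as ./res-0-8-0, so splitting on '-' puts n in field 3.
# sort -n compares that field numerically, so 10 sorts after 8 rather than before it.
sorted=$(cd "$demo_dir" &&
    find . -maxdepth 1 -type f -name 'res-0-*' |
    sort -t'-' -k3 -n |
    awk -F'-' '{print $3}')
printf '%s\n' "$sorted"
```

This should print 0, 1, 8 and 10 on separate lines. The pipeline runs from inside the directory, as in the question, because a '-' anywhere in a leading path would shift the field positions.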
What I want in the end is a file named results.txt that looks like this:
field1 field2 field3 s(0)
field1 field2 field3 s(1)
field1 field2 field3
...
field1 field2 field3 s(299963) 
...
field1 field2 field3
where, again, s(n) is the string contained in the file res-0-n-0.
I first achieved this by running cp fields.txt results.txt and then the following while loop:
while IFS='' read -r line; do
    # store the content of the result file in a variable
    res=$(<res-0-"$line"-0)
    # sed numbers lines from 1, whereas the list of n starts at 0
    real_line=$(( line + 1 ))
    # edit the copy made by cp above, not fields.txt itself
    sed -i "${real_line}s/.$/ ${res}/" results.txt
done < res_numbers_sorted.tmp
However, this is very slow and I need to run it several times. I suspect sed is not the right tool for this job.

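If I understand correctly, you have a fields.txt file with many lines and several res-0-n-0 files. For each line in fields.txt, you want to copy it into results.txt together with the content of the file res-0-<line_number>-0, if that file exists.
I think you can simply read fields.txt line by line and echo each line into results.txt, followed by the content of res-0-<line_number>-0 where needed.
I would use something like this: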
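#!/bin/sh

# Line numbers start at 0 to match the n in res-0-n-0
LINE_NUMBER=0
while IFS= read -r line
do
  if [ -f "res-0-$LINE_NUMBER-0" ]
  then
    # a result file exists for this line: append its content
    echo "$line $(cat "res-0-$LINE_NUMBER-0")" >> results.txt
  else
    echo "$line" >> results.txt
  fi
  # POSIX arithmetic; ((LINE_NUMBER++)) is a bashism that plain sh may not support
  LINE_NUMBER=$((LINE_NUMBER + 1))
done < fields.txt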
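As a possible variation on the loop above, the same merge can be sketched as a single awk pass, which avoids starting one cat process per result file. This is an alternative technique, not the answer's own method. The helper name `merge_results`, its three arguments, and the sample data are hypothetical, and only the first line of each res-0-n-0 file is read, on the assumption that each file holds a single string.

```shell
#!/bin/sh
# Sketch: single-pass merge with awk. merge_results is a hypothetical helper name.
merge_results() {
    # $1 = fields file, $2 = output file, $3 = directory holding the res-0-n-0 files
    awk -v dir="$3" '{
        f = dir "/res-0-" (NR - 1) "-0"   # awk counts lines from 1, the files from 0
        if ((getline s < f) > 0)          # first line of the result file, if it exists
            print $0, s
        else
            print $0                      # missing or empty file: copy the line unchanged
        close(f)                          # release the file so descriptors do not pile up
    }' "$1" > "$2"
}

# Tiny demo in a throwaway directory (sample data is made up).
demo_dir=$(mktemp -d)
trap 'rm -rf "$demo_dir"' EXIT
printf 'a1 a2 a3\nb1 b2 b3\nc1 c2 c3\nd1 d2 d3\n' > "$demo_dir/fields.txt"
echo 'S0' > "$demo_dir/res-0-0-0"
echo 'S1' > "$demo_dir/res-0-1-0"
echo 'S3' > "$demo_dir/res-0-3-0"
merge_results "$demo_dir/fields.txt" "$demo_dir/results.txt" "$demo_dir"
cat "$demo_dir/results.txt"
```

In the demo, the third line is copied without a suffix because no res-0-2-0 file exists. Unlike the sed loop in the question, which rewrites the whole file once per result, this reads fields.txt only once.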

Source: https://unix.stackexchange.com/questions/438672
