Averaging the nth line across multiple files into one averaged master file

January 30, 2022

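I have 3 files, each containing 8 lines of numeric values and text. I am trying to take the average of each line across all three files and print a new file containing those averages. Below are three example files, all named in the same format: testfile1.1, testfile1.2, testfile1.3
Testfile1.1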
1
2048
8
5
5
4
9
Lat:1
Testfile1.2
1
2048
10
7
7
4
9
Lat:1
Testfile1.3
1
2048
3
6
3
4
6
Lat:7
I would like the output file to look like this (after averaging):
Averagefile1
1
2048
7
6
5
4
8
Lat:3
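I hope this makes sense for what I am trying to do!
I have tried different combinations of awk and sed, which worked well for 3-4 lines of data, but my actual data has 2000+ lines across 40+ files.
Edit: I have figured out how to control the number of significant figures that are printed and how to adjust the regular expression so that it also matches floating-point values.
(Please let me know if I should post this as a separate question and delete this one!)
My actual data has many other lines that contain text alongside the values I want to average. I tried adding extra match strings, but that only left me more confused. In my real files, different lines need different handling: printing the text from the line, averaging the numeric data, copying the text of a line while averaging its data, and averaging dates and times.
Below are copies of 2 of the files (each line carries a comment saying what I want to do with it).
tofu1.1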
ABCDEFGH #print text into output file (same on both files)
1     # Take average of values across all the files in this line
2048  # Take average of values across all the files in this line
8     # Take average of values across all the files in this line
5     # Take average of values across all the files in this line
5     # Take average of values across all the files in this line
4     # Take average of values across all the files in this line
9.5   # Take average of values across all the files in this line
1     # Take average of values across all the files in this line
90.00  # Check and make sure value in this line across print if same
Sprite # check and see if text is same across all values and print if same
cats10   # check and see if text is same across all values and print if same
07/02/20 # See below for explanation on next 3 lines
08:32
08:32
290.000000 # average across all 3 files on this line
10.750000 # average across all 3 files on this line
SCANS23   # output should be SCANS "average of values"
INT_TIME57500 # output should be INT_TIME with sum of all values
SITE northpole   #Check if all lines are same if so print line
LONGITUDE -147.850037  # Output should be LONGITUDE%f
LATITUDE 64.859375     # Output should be LATITUDE%f
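Line 13 is the date the data was taken, and lines 14 and 15 are the start and end times. Possibly using some sort of date-to-decimal command: is there a way to take the average of dates? If one data set was taken on 07/02/20 and another on 07/02/18, could the output be 07/02/19? The times would need to be averaged as well.
tofu1.2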
ABCDEFGH #print text into output file (same on both files)
1     # Take average of values across all the files in this line
2048  # Take average of values across all the files in this line
10    # Take average of values across all the files in this line
7     # Take average of values across all the files in this line
7     # Take average of values across all the files in this line
4     # Take average of values across all the files in this line
8   # Take average of values across all the files in this line
1     # Take average of values across all the files in this line
90.00  # Check and make sure value in this line across print if same
Sprite # check and see if text is same across all values and print if same
cats10   # check and see if text is same across all values and print if same
07/02/20 # See below for explanation on next 3 lines
08:32
08:32
290.000000 # average across all 3 files on this line
10.750000 # average across all 3 files on this line
SCANS23   # output should be SCANS "average of values"
INT_TIME57500 # output should be INT_TIME with sum of all values
SITE northpole   #Check if all lines are same if so print line
LONGITUDE -147.850037  # Output should be LONGITUDE%f
LATITUDE 64.859375     # Output should be LATITUDE%f
I tried to include several line-prefix patterns in the script, but it quickly became very confusing.
awk -F: '
 FNR==1     { c++ };
 /^LATITUDE/    { a[FNR] += $6 };
 /^SCANS/    { a[FNR] += $2 };
 /^[+-]?([0-9]*[.])?[0-9]+$/ { a[FNR] += $1 };

 END {
   for (i in a) {
     printf (i==22 ? "LATITUDE%f": i==18 ? "SCANS%2.3f": "%f") "\n", a[i] / c
   }
 }' tofu1.* > askforhelp
This gives me
$ more askforhelp

90.000000
LATITUDE0.000000
290.000000
10.750000
SCANS0.000
1.000000
2048.000000
6.333333
4.666667
5.000000
4.000000
7.833333
2.666667
I also tried adding several text patterns at once, and was very confused when that attempt produced no output at all.
awk -F: '
 FNR==1     { c++ };
 /^LATITUDE/    { a[FNR] += $6 };
 /^LONGITUDE/    { a[FNR] += $5 };
 /^SITE/    { a[FNR] += $4 };
 /^INT_TIME/    { a[FNR] += $3 };
 /^SCANS/    { a[FNR] += $2 };
 /^[+-]?([0-9]*[.])?[0-9]+$/ { a[FNR] += $1 };

 END {
   for (i in a) {
     printf (i==22 ? "LATITUDE%f": 
             i==21 ? "LONGITUDE%2.3f": 
             i==20 ? "SITE%2.3f": 
             i==19 ? "INT_TIME%2.3f": 
             i==18 ? "SCANS%2.3f": "%f") "\n", a[i] / c 
   }
 }' /home/lmdjeu/test/test1.* > /home/lmdjeu/test/askforhelp

$ awk -F: '
 FNR==1     { c++ };
 /^Lat:/    { a[FNR] += $2 };
 /^[0-9]+$/ { a[FNR] += $1 };

 END {
   for (i in a) {
     printf (i==8 ? "Lat:%i" : "%i") "\n", a[i] / c
   }
 }' Testfile1.* > Averagefile1

$ cat Averagefile1
1
2048
7
6
5
4
8
Lat:3
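This uses the variable c to count the number of files read so far. c is incremented whenever the first line of a file is read (FNR==1). FNR is the per-file input record (line number) counter that awk maintains automatically; it restarts at 1 each time a new input file is read.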
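To make the counting concrete, here is a minimal sketch. The two scratch files demo.1 and demo.2 are hypothetical and exist only for illustration; the awk line prints NR (which keeps counting across files), FNR (which restarts per file) and c side by side.

```shell
# Hypothetical scratch files; the names are illustrative, not from the question.
tmp=$(mktemp -d)
printf '1\n2\n' > "$tmp/demo.1"
printf '3\n4\n' > "$tmp/demo.2"

# c is bumped once per file, on the line where FNR is back at 1.
fnr_demo=$(awk 'FNR == 1 { c++ } { print "NR=" NR, "FNR=" FNR, "c=" c }' \
    "$tmp/demo.1" "$tmp/demo.2")
printf '%s\n' "$fnr_demo"
# NR=1 FNR=1 c=1
# NR=2 FNR=2 c=1
# NR=3 FNR=1 c=2
# NR=4 FNR=2 c=2

rm -r "$tmp"
```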
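It also uses an array a to hold the running sum for each input line, with FNR as the array index. If a line contains only digits, the first (and only) field of that line is added to the array element for that line. If the line begins with the string Lat:, the second field is added instead.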
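Which field holds the number depends on the field separator. The sketch below feeds single sample lines to awk to show how -F: splits them. It appears to explain the LATITUDE0.000000 and SCANS0.000 lines in the question's output: a line with no colon is one single field, so $6 or $2 is empty and adds 0.

```shell
# With -F: the line "Lat:7" splits into two fields, and the number is $2.
lat_fields=$(printf 'Lat:7\n' | awk -F: '{ print NF, $2 }')

# "LATITUDE 64.859375" has no colon, so it is one field and $6 is empty.
colon_split=$(printf 'LATITUDE 64.859375\n' | awk -F: '{ print NF, "[" $6 "]" }')

# With the default whitespace splitting, the number is simply $2.
space_split=$(printf 'LATITUDE 64.859375\n' | awk '{ print NF, $2 }')

printf '%s\n' "$lat_fields" "$colon_split" "$space_split"
# 2 7
# 1 []
# 2 64.859375
```

A line such as SCANS23 has no separator at all, so the number has to be cut out of the line with a string function such as substr() rather than taken from a field.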
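Once all input files have been read and processed, the END block is executed. It iterates over the array and prints each element's sum divided by the number of files. Every line except line 8 is printed as a plain integer.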
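One caveat: POSIX awk does not specify the order in which for (i in a) visits the indices, which may be why the output of the attempt in the question comes out shuffled. A hedged variant is sketched below: it remembers the highest line number in a variable n (a name introduced here) and loops numerically. The demo.* files are hypothetical, and %d is used in place of %i.

```shell
# Hypothetical three-line sample files, for illustration only.
tmp=$(mktemp -d)
printf '1\n8\n9\n'  > "$tmp/demo.1"
printf '1\n10\n9\n' > "$tmp/demo.2"
printf '1\n3\n6\n'  > "$tmp/demo.3"

ordered=$(awk '
  FNR == 1   { c++ }               # one more file seen
  FNR > n    { n = FNR }           # remember the longest file
  /^[0-9]+$/ { a[FNR] += $1 }      # accumulate per line number
  END {
    for (i = 1; i <= n; i++)       # explicit numeric order, line 1 first
      printf "%d\n", a[i] / c
  }' "$tmp"/demo.*)
printf '%s\n' "$ordered"
# 1
# 7
# 8

rm -r "$tmp"
```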
For line 8, the integer is prefixed with the string Lat:. The script uses awk's ternary operator for this: condition ? result_if_true : result_if_false
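A chain of ternaries keyed on line numbers gets hard to follow once there are many line types, as in the tofu files. One possible alternative, offered only as a sketch, is to decide per line what to do with it and to store the text prefix next to the sum. Everything below is an assumption: the real files carry no trailing # comments, INT_TIME lines are summed, SCANS, LONGITUDE and LATITUDE lines and bare numbers are averaged, and any other line is printed only if it is identical in every file. The array names (sum, kind, prefix, text, differs) and the demo files are hypothetical, and date and time averaging is not covered.

```shell
tmp=$(mktemp -d)
printf 'ABCDEFGH\n9.5\nSCANS23\nINT_TIME57500\nLATITUDE 64.859375\n' > "$tmp/demo.1"
printf 'ABCDEFGH\n8\nSCANS25\nINT_TIME57500\nLATITUDE 64.859375\n'   > "$tmp/demo.2"

merged=$(awk '
  FNR == 1 { c++ }
  FNR > n  { n = FNR }
  {
    if (match($0, /^(SCANS|INT_TIME|LONGITUDE|LATITUDE) */)) {
      prefix[FNR] = substr($0, 1, RLENGTH)       # keyword plus any spaces
      sum[FNR]   += substr($0, RLENGTH + 1)      # the number after the keyword
      kind[FNR]   = ($0 ~ /^INT_TIME/) ? "sum" : "avg"
    } else if ($0 ~ /^[+-]?([0-9]*[.])?[0-9]+$/) {
      sum[FNR] += $0                             # bare number: average it
      kind[FNR] = "avg"
    } else {
      if (c == 1) text[FNR] = $0                 # first file sets the reference text
      else if (text[FNR] != $0) differs[FNR] = 1
      kind[FNR] = "text"
    }
  }
  END {
    for (i = 1; i <= n; i++) {
      if (kind[i] == "text") {
        if (i in differs) print "MISMATCH on line " i
        else print text[i]
      } else if (kind[i] == "sum") {
        printf "%s%f\n", prefix[i], sum[i]
      } else {
        printf "%s%f\n", prefix[i], sum[i] / c
      }
    }
  }' "$tmp"/demo.*)
printf '%s\n' "$merged"
# ABCDEFGH
# 8.750000
# SCANS24.000000
# INT_TIME115000.000000
# LATITUDE 64.859375

rm -r "$tmp"
```

Lines such as cats10 or 07/02/20 fall into the text branch here because they do not match the bare-number pattern, so they are only compared, not averaged.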

Source: https://unix.stackexchange.com/questions/688277
