Text-Processing

Using awk to calculate the average of each row when rows have different numbers of columns

  • October 10, 2021

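Is it possible to use awk to calculate the average of each row when the rows have different numbers of columns? I have a file like the one below, where the first column is a name. I would like to calculate the average of each row and append the result as the last column of the input file: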

Input file (data1.csv):

EMPLOYEE1,0.395314,0.384513,,
EMPLOYEE2,5.4908,5.2921,,
EMPLOYEE3,0.0002323,0.00022945,0.00023238,0.00022931
EMPLOYEE4,0.00335516,0.00328432,0.00340309,0.00327163
EMPLOYEE5,1.4816,1.4367,1.4854,1.4353
EMPLOYEE6,7.89E-06,7.93E-06,7.95E-06,7.87E-06
EMPLOYEE7,3.724E-06,3.8745E-06,3.9428E-06,3.7227E-06
EMPLOYEE8,0.699498,0.688892,0.704256,0.683486
EMPLOYEE9,33.5195,31.9736,33.6779,31.742

Expected output:

EMPLOYEE1,0.395314,0.384513,,,0.3899135
EMPLOYEE2,5.4908,5.2921,,,5.39145
EMPLOYEE3,0.0002323,0.00022945,0.00023238,0.00022931,0.00023086
EMPLOYEE4,0.00335516,0.00328432,0.00340309,0.00327163,0.00332855
EMPLOYEE5,1.4816,1.4367,1.4854,1.4353,1.45975
EMPLOYEE6,7.89E-06,7.93E-06,7.95E-06,7.87E-06,7.91E-06
EMPLOYEE7,3.72E-06,3.87E-06,3.94E-06,3.72E-06,3.82E-06
EMPLOYEE8,0.699498,0.688892,0.704256,0.683486,0.694033
EMPLOYEE9,33.5195,31.9736,33.6779,31.742,32.7282

I tried the following awk commands, but they do not compute the correct average for rows that have fewer values than the maximum NF.

awk  -F',' '{ s = 0; for (i = 2; i <= NF; i++) s += $i; print $1, (NF > 1) ? s / (NF - 1) : 0; }'  data1.csv

awk -F','  '{sum=0; for (i=2;i<=NF;i++)sum+=$i; print $0,sum/(NF-1)}'  data1.csv

But my code does not change NF from row to row. Is it possible to change NF for each row and get the desired output?
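A minimal sketch of why dividing by NF - 1 gives the wrong result here: with -F',' awk counts empty fields too, so a row such as EMPLOYEE1 still has NF = 5 even though it holds only two values. The helper name count_fields is hypothetical and exists only for this illustration.

```shell
# Hypothetical helper: for each row, report NF and how many of the
# value fields (fields 2..NF) are actually non-empty.
count_fields() {
    awk -F',' '{
        n = 0
        for (i = 2; i <= NF; i++)
            if (length($i)) n++        # count only fields that contain text
        print $1, "NF=" NF, "nonempty=" n
    }' "$@"
}

# Two rows from data1.csv: one with trailing empty fields, one full.
printf 'EMPLOYEE1,0.395314,0.384513,,\nEMPLOYEE5,1.4816,1.4367,1.4854,1.4353\n' | count_fields
```

Both rows report NF=5, but EMPLOYEE1 reports nonempty=2 while EMPLOYEE5 reports nonempty=4. The divisor therefore has to be the count of non-empty fields rather than NF - 1.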

Here is one way:

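$ awk -F',' -v OFS=',' '{
       s=0;           # sum of the non-empty values in this row
       numFields=0;   # number of non-empty values in this row
       for(i=2; i<=NF;i++){       # field 1 is the name, so start at 2
           if(length($i)){        # skip empty fields
               s+=$i;
               numFields++
           }
       }
       # append the average; print 0 if the row has no values at all
       print $0, (numFields ? s/numFields : 0)}' data1.csv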
EMPLOYEE1,0.395314,0.384513,,,0.389914
EMPLOYEE2,5.4908,5.2921,,,5.39145
EMPLOYEE3,0.0002323,0.00022945,0.00023238,0.00022931,0.00023086
EMPLOYEE4,0.00335516,0.00328432,0.00340309,0.00327163,0.00332855
EMPLOYEE5,1.4816,1.4367,1.4854,1.4353,1.45975
EMPLOYEE6,7.89E-06,7.93E-06,7.95E-06,7.87E-06,7.91e-06
EMPLOYEE7,3.724E-06,3.8745E-06,3.9428E-06,3.7227E-06,3.816e-06
EMPLOYEE8,0.699498,0.688892,0.704256,0.683486,0.694033
EMPLOYEE9,33.5195,31.9736,33.6779,31.742,32.7282

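Note that awk prints 0.389914 as the result of 0.779827/2, so the average for the first row comes out as 0.389914 and not 0.3899135. This is because awk rounds to the nearest even number, and its default output format for numbers (controlled by the OFMT variable) is %0.6g, that is, six significant digits. If you need more precision, you can do the following: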

$ awk -F',' -v OFS=',' -v OFMT='%0.7g' '{ 
       s=0; 
       numFields=0; 
       for(i=2; i<=NF;i++){ 
           if(length($i)){ 
               s+=$i; 
               numFields++
           } 
       } 
       print $0, (numFields ? s/numFields : 0)}' data1.csv 
EMPLOYEE1,0.395314,0.384513,,,0.3899135
EMPLOYEE2,5.4908,5.2921,,,5.39145
EMPLOYEE3,0.0002323,0.00022945,0.00023238,0.00022931,0.00023086
EMPLOYEE4,0.00335516,0.00328432,0.00340309,0.00327163,0.00332855
EMPLOYEE5,1.4816,1.4367,1.4854,1.4353,1.45975
EMPLOYEE6,7.89E-06,7.93E-06,7.95E-06,7.87E-06,7.91e-06
EMPLOYEE7,3.724E-06,3.8745E-06,3.9428E-06,3.7227E-06,3.816e-06
EMPLOYEE8,0.699498,0.688892,0.704256,0.683486,0.694033
EMPLOYEE9,33.5195,31.9736,33.6779,31.742,32.72825
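
To see the effect of OFMT in isolation, here is a small sketch that averages only the two EMPLOYEE1 values under different output formats. The helper names avg_with_ofmt and avg_with_printf are hypothetical. The second helper is an alternative that is not part of the answer above: it formats the number with printf instead of changing OFMT.

```shell
# Hypothetical helper: print the EMPLOYEE1 average using the OFMT given in $1.
avg_with_ofmt() {
    awk -v OFMT="$1" 'BEGIN { print (0.395314 + 0.384513) / 2 }'
}

# Hypothetical helper: same average, formatted explicitly with printf,
# which leaves OFMT untouched.
avg_with_printf() {
    awk 'BEGIN { printf "%.7g\n", (0.395314 + 0.384513) / 2 }'
}

avg_with_ofmt '%.6g'    # six significant digits, the default precision
avg_with_ofmt '%.7g'    # seven significant digits
avg_with_printf         # seven significant digits via printf
```

The first call is expected to reproduce the 0.389914 shown earlier, while the other two print 0.3899135. Keep in mind that OFMT only applies when print outputs a number directly. If the number is first concatenated into a string, for example with print $0 "," avg, awk converts it with CONVFMT instead, so in that case either set CONVFMT as well or use printf.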

Source: https://unix.stackexchange.com/questions/672506