Text-Processing
Using awk to calculate the average of each row when rows have different numbers of columns
Is it possible to use awk to calculate the average of each row, when rows have different numbers of columns? I have a file like the one below. The first column is a name, and I would like to calculate the average of each row and print the result as the last column of the input file.

Input file (data1.csv):

    EMPLOYEE1,0.395314,0.384513,,
    EMPLOYEE2,5.4908,5.2921,,
    EMPLOYEE3,0.0002323,0.00022945,0.00023238,0.00022931
    EMPLOYEE4,0.00335516,0.00328432,0.00340309,0.00327163
    EMPLOYEE5,1.4816,1.4367,1.4854,1.4353
    EMPLOYEE6,7.89E-06,7.93E-06,7.95E-06,7.87E-06
    EMPLOYEE7,3.724E-06,3.8745E-06,3.9428E-06,3.7227E-06
    EMPLOYEE8,0.699498,0.688892,0.704256,0.683486
    EMPLOYEE9,33.5195,31.9736,33.6779,31.742
Expected output:

    EMPLOYEE1,0.395314,0.384513,,,0.3899135
    EMPLOYEE2,5.4908,5.2921,,,5.39145
    EMPLOYEE3,0.0002323,0.00022945,0.00023238,0.00022931,0.00023086
    EMPLOYEE4,0.00335516,0.00328432,0.00340309,0.00327163,0.00332855
    EMPLOYEE5,1.4816,1.4367,1.4854,1.4353,1.45975
    EMPLOYEE6,7.89E-06,7.93E-06,7.95E-06,7.87E-06,7.91E-06
    EMPLOYEE7,3.72E-06,3.87E-06,3.94E-06,3.72E-06,3.82E-06
    EMPLOYEE8,0.699498,0.688892,0.704256,0.683486,0.694033
    EMPLOYEE9,33.5195,31.9736,33.6779,31.742,32.7282
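I tried the following awk commands, but they do not compute the correct average for rows that have fewer values than the maximum NF:

    awk -F',' '{ s = 0; for (i = 2; i <= NF; i++) s += $i; print $1, (NF > 1) ? s / (NF - 1) : 0; }' data1.csv

and

    awk -F',' '{sum=0; for (i=2;i<=NF;i++)sum+=$i; print $0,sum/(NF-1)}' data1.csv

My code does not change NF for each row. Is it possible to change NF per row and get the desired output?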
Here is one way:

    $ awk -F',' -v OFS=',' '{ s=0; numFields=0; for(i=2; i<=NF;i++){ if(length($i)){ s+=$i; numFields++ } } print $0, (numFields ? s/numFields : 0)}' data1.csv
    EMPLOYEE1,0.395314,0.384513,,,0.389914
    EMPLOYEE2,5.4908,5.2921,,,5.39145
    EMPLOYEE3,0.0002323,0.00022945,0.00023238,0.00022931,0.00023086
    EMPLOYEE4,0.00335516,0.00328432,0.00340309,0.00327163,0.00332855
    EMPLOYEE5,1.4816,1.4367,1.4854,1.4353,1.45975
    EMPLOYEE6,7.89E-06,7.93E-06,7.95E-06,7.87E-06,7.91e-06
    EMPLOYEE7,3.724E-06,3.8745E-06,3.9428E-06,3.7227E-06,3.816e-06
    EMPLOYEE8,0.699498,0.688892,0.704256,0.683486,0.694033
    EMPLOYEE9,33.5195,31.9736,33.6779,31.742,32.7282
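To see why dividing by NF - 1 goes wrong on the short rows, it may help to compare NF with the number of fields that actually hold a value. The following is only a minimal sketch: the function name count_fields is hypothetical and is not part of the question or the answer. It prints NF followed by the count of non-empty fields after the name column.

```shell
# Hypothetical helper: for each line on stdin, print NF and the number of
# non-empty value fields (everything after the name in field 1).
count_fields() {
    awk -F',' '{
        filled = 0
        for (i = 2; i <= NF; i++)
            if (length($i)) filled++   # empty trailing fields are skipped
        print NF, filled
    }'
}

printf 'EMPLOYEE1,0.395314,0.384513,,\n' | count_fields           # prints "5 2"
printf 'EMPLOYEE5,1.4816,1.4367,1.4854,1.4353\n' | count_fields   # prints "5 4"
```

Both rows have NF equal to 5, because awk counts the empty fields produced by the trailing commas. Those empty fields add 0 to the sum but still inflate the divisor, which is why the divisor has to be the count of non-empty fields rather than NF - 1.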
Note that awk prints 0.389914 as the result of 0.779827/2, so the average for the first line is shown as 0.389914 and not as the 0.3899135 given in the expected output. This is because awk rounds to the nearest even number, and its default print format for numbers (controlled by the OFMT variable) is %.6g, which keeps only six significant digits. If you need more precision, you can do the following:

    $ awk -F',' -v OFS=',' -v OFMT='%0.7g' '{ s=0; numFields=0; for(i=2; i<=NF;i++){ if(length($i)){ s+=$i; numFields++ } } print $0, (numFields ? s/numFields : 0)}' data1.csv
    EMPLOYEE1,0.395314,0.384513,,,0.3899135
    EMPLOYEE2,5.4908,5.2921,,,5.39145
    EMPLOYEE3,0.0002323,0.00022945,0.00023238,0.00022931,0.00023086
    EMPLOYEE4,0.00335516,0.00328432,0.00340309,0.00327163,0.00332855
    EMPLOYEE5,1.4816,1.4367,1.4854,1.4353,1.45975
    EMPLOYEE6,7.89E-06,7.93E-06,7.95E-06,7.87E-06,7.91e-06
    EMPLOYEE7,3.724E-06,3.8745E-06,3.9428E-06,3.7227E-06,3.816e-06
    EMPLOYEE8,0.699498,0.688892,0.704256,0.683486,0.694033
    EMPLOYEE9,33.5195,31.9736,33.6779,31.742,32.72825
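Setting OFMT changes how every non-integer number is formatted by print. If only the average column needs a specific precision, one alternative is to format it explicitly with printf. The following is a sketch under assumptions: the function name row_avg is hypothetical, and the %.7g precision is simply carried over from the OFMT example.

```shell
# Hypothetical helper: append each row's average to the row, formatting the
# average with printf instead of relying on OFMT. Reads the files given as
# arguments, or stdin when no file is given.
row_avg() {
    awk -F',' '{
        s = 0; n = 0
        for (i = 2; i <= NF; i++)
            if (length($i)) { s += $i; n++ }     # count only non-empty fields
        printf "%s,%.7g\n", $0, (n ? s / n : 0)  # 0 when the row has no values
    }' "$@"
}

printf 'EMPLOYEE1,0.395314,0.384513,,\n' | row_avg
# prints: EMPLOYEE1,0.395314,0.384513,,,0.3899135
```

With the input file from the question, the equivalent call would be row_avg data1.csv. The original fields are printed unchanged through $0, so only the appended average is affected by the format string.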