Efficient way to do calculations in bash

August 7, 2020

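I'm trying to calculate the geometric mean of a file full of numbers (one column).
The basic formula for the geometric mean is to take the average of the natural log (or the base-10 log) of all the values and then raise e (or 10) to that value.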
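Written out, and assuming the file holds n positive values x_1, ..., x_n (notation introduced here only for illustration), the formula reads:

```latex
\[
\mathrm{GM}(x_1,\dots,x_n)
  = \exp\!\left(\frac{1}{n}\sum_{i=1}^{n}\ln x_i\right)
  = \left(\prod_{i=1}^{n} x_i\right)^{1/n}
\]
```

Summing logs rather than multiplying the values directly keeps the intermediate result small, which matters for long inputs.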
My current bash-only script looks like this:
# Geometric Mean
count=0;
total=0; 

for i in $( awk '{ print $1; }' input.txt )
 do
   if (( $(echo " "$i" > "0" " | bc -l) )); then
       total="$(echo " "$total" + l("$i") " | bc -l )"
       ((count++))
   else
     total="$total"
   fi
 done

Geometric_Mean="$( printf "%.2f" "$(echo "scale=3; e( "$total" / "$count" )" | bc -l )" )"
echo "$Geometric_Mean"
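Essentially:
1. Check every entry in the input file to make sure it is greater than 0, calling bc every time
2. If the entry is > 0, take the natural log (l) of that value and add it to the running total, calling bc every time
3. If the entry is <= 0, do nothing
4. Calculate the geometric mean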
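This works perfectly fine for a small data set. Unfortunately, I'm trying to use it on a large one (input.txt has 250,000 values). While I believe it would eventually finish, it is extremely slow. I have never been patient enough to let it complete (45+ minutes).
I need a way to process this file more efficiently.
There are alternatives, such as using Python: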
# Import the library you need for math
import numpy as np

# Open the file and load the lines into a list of float objects
# The with-block closes the file automatically
with open('time_trial.txt', 'r') as infile:
   x = [float(line) for line in infile]

# Define a function called geo_mean
# Use numpy to create a variable "a" with the ln of all the values
# Use numpy to exponentiate the mean of a (the sum of a divided by its count)
# Note ... this will break if you have values <=0
def geo_mean(x):
   a = np.log(x)
   return np.exp(a.sum()/len(a))

print("The Geometric Mean is: ", geo_mean(x))
I'd like to avoid using Python, Ruby, Perl, etc.
Any suggestions on how to write my bash script more efficiently?

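Please don't do this in the shell. No amount of tweaking could ever make it remotely efficient. Shell loops are slow, and using the shell to parse text is just bad practice. Your whole script can be replaced by this simple awk one-liner, which will be orders of magnitude faster: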
awk 'BEGIN{E = exp(1);} $1>0{tot+=log($1); c++} END{m=tot/c; printf "%.2f\n", E^m}' file
For example, if I run it on a file containing the numbers from 1 to 100, I get:
$ seq 100 > file
$ awk 'BEGIN{E = exp(1);} $1>0{tot+=log($1); c++} END{m=tot/c; printf "%.2f\n", E^m}' file
37.99
In terms of speed, I tested your shell solution, your Python solution and the awk I gave above on a file containing the numbers from 1 to 10000:
## Shell
$ time foo.sh
3677.54

real    1m0.720s
user    0m48.720s
sys     0m24.733s

### Python
$ time foo.py
The Geometric Mean is:  3680.827182220091

real    0m0.149s
user    0m0.121s
sys     0m0.027s


### Awk
$ time awk 'BEGIN{E = exp(1);} $1>0{tot+=log($1); c++} END{m=tot/c; printf "%.2f\n", E^m}' input.txt
3680.83

real    0m0.011s
user    0m0.010s
sys     0m0.001s
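As you can see, awk is even faster than Python and far simpler to write. (The shell script's 3677.54 differs from the other two results because its final bc call sets scale=3, which truncates the exponent to three decimal places before e() is applied.) You can also make this into a "shell" script, if you like. Either like this: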
#!/bin/awk -f

BEGIN{
   E = exp(1);
} 
$1>0{
   tot+=log($1);
   c++;
}

END{
   m=tot/c; printf "%.2f\n", E^m
}
Or by saving the command in a shell script:
#!/bin/sh
awk 'BEGIN{E = exp(1);} $1>0{tot+=log($1); c++;} END{m=tot/c; printf "%.2f\n", E^m}' "$1"
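
One caveat with the command as written: if the input has no positive values, c stays at 0 and the division in the END block fails. A minimal sketch of a more defensive variant might look like the following. The function name geo_mean and the error message are made up for illustration, and it calls exp() directly rather than computing E^m:

```shell
#!/bin/sh
# geo_mean: print the geometric mean of the positive values in column 1.
# Hypothetical helper name, not part of the original answer.
# Reads the files given as arguments, or standard input if there are none.
geo_mean() {
    awk '
        $1 > 0 { tot += log($1); c++ }   # accumulate ln(x) for positive entries only
        END {
            if (c == 0) {                # nothing to average: avoid dividing by zero
                print "geo_mean: no positive values" > "/dev/stderr"
                exit 1
            }
            printf "%.2f\n", exp(tot / c)
        }
    ' "$@"
}

# Example: the numbers 1 to 100, as in the test above
seq 100 | geo_mean
```

With the numbers 1 to 100 on standard input this should print 37.99, the same value as the one-liner, while an input with no positive values produces a message on stderr and a non-zero exit status instead of an awk error.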

Quoted from: https://unix.stackexchange.com/questions/570764
