Bash
Multiprocessing/multithreading in Bash
I have a test file that looks like this:
5002 2014-11-24 12:59:37.112 2014-11-24 12:59:37.112 0.000 UDP ...... 23.234.22.106 48104 101 0 0 8.8.8.8 53 68.0 1.0 1 0.0 0 68 0 48
Each line contains a source IP and a destination IP. Here the source IP is 23.234.22.106 and the destination IP is 8.8.8.8. I do an IP lookup for each IP address and then scrape the result pages using xidel. Here is the script:

    egrep -o "([0-9]{1,3}[\.]){3}[0-9]{1,3}" test-data.csv | sort | uniq | while read i #to get network id from arin.net
    do
        xidel http://whois.arin.net/rest/ip/$i -e "//table/tbody/tr[3]/td[2] " | sed 's/\/[0-9]\{1,2\}/\n/g'
    done | sort | uniq | egrep -o "([0-9]{1,3}[\.]){3}[0-9]{1,3}" | while read j ############## to get other information from ip-tracker.org
    do
        xidel http://www.ip-tracker.org/locator/ip-lookup.php?ip=$j \
            -e "//table/tbody/tr[3]/td[2]/table/tbody/tr[2]" \
            -e "//table/tbody/tr[3]/td[2]/table/tbody/tr[3]" \
            -e "//table/tbody/tr[3]/td[2]/table/tbody/tr[4]" \
            -e "//table/tbody/tr[3]/td[2]/table/tbody/tr[5]" \
            -e "//table/tbody/tr[3]/td[2]/table/tbody/tr[6]" \
            -e "//table/tbody/tr[3]/td[2]/table/tbody/tr[7]" \
            -e "//table/tbody/tr[3]/td[2]/table/tbody/tr[8]" \
            -e "//table/tbody/tr[3]/td[2]/table/tbody/tr[9]" \
            -e "//table/tbody/tr[3]/td[2]/table/tbody/tr[10]" \
            -e "//table/tbody/tr[3]/td[2]/table/tbody/tr[11]" \
            -e "//table/tbody/tr[3]/td[2]/table/tbody/tr[12]" \
            -e "//table/tbody/tr[3]/td[2]/table/tbody/tr[13]" \
            -e "//table/tbody/tr[3]/td[2]/table/tbody/tr[14]" \
            -e "//table/tbody/tr[3]/td[2]/table/tbody/tr[15]" \
            -e "//table/tbody/tr[3]/td[2]/table/tbody/tr[16]" \
            -e "//table/tbody/tr[3]/td[2]/table/tbody/tr[17]" \
            -e "//table/tbody/tr[3]/td[2]/table/tbody/tr[18]" \
            -e "//table/tbody/tr[3]/td[2]/table/tbody/tr[19]" \
            -e "//table/tbody/tr[3]/td[2]/table/tbody/tr[20]" \
            -e "//table/tbody/tr[3]/td[2]/table/tbody/tr[21]" \
            -e "//table/tbody/tr[3]/td[2]/table/tbody/tr[22]" \
            -e "//table/tbody/tr[3]/td[2]/table/tbody/tr[23]" \
            -e "//table/tbody/tr[3]/td[2]/table/tbody/tr[24]" \
            -e "//table/tbody/tr[3]/td[2]/table/tbody/tr[25]" \
            -e "//table/tbody/tr[3]/td[2]/table/tbody/tr[26]" \
            -e "//table/tbody/tr[3]/td[2]/table/tbody/tr[27]" \
            -e "//table/tbody/tr[3]/td[2]/table/tbody/tr[28]" \
            -e "//table/tbody/tr[3]/td[2]/table/tbody/tr[29]"
    done > abcd
The first xidel scrapes arin and the second xidel scrapes ip-tracker. The output of the first xidel is the network ID, and the IP lookup is done on the basis of that network ID. The output of the second xidel looks like this:

    IP Address: 8.8.8.0 [IP Blacklist Check]
    Reverse DNS:** server can't find 0.8.8.8.in-addr.arpa: SERVFAIL
    Hostname: 8.8.8.0
    IP Lookup Location For IP Address: 8.8.8.0
    Continent:North America (NA)
    Country: United States (US)
    Capital:Washington
    State:California
    City Location:Mountain View
    Postal:94040
    Area:650
    Metro:807
    ISP:Level 3 Communications
    Organization:Level 3 Communications
    AS Number:AS15169 Google Inc.
    Time Zone: America/Los_Angeles
    Local Time:10:51:40
    Timezone GMT offset:-25200
    Sunrise / Sunset:06:26 / 19:48
    Extra IP Lookup Finder Info for IP Address: 8.8.8.0
    Continent Lat/Lon: 46.07305 / -100.546
    Country Lat/Lon: 38 / -98
    City Lat/Lon: (37.3845) / (-122.0881)
    IP Language: English
    IP Address Speed:Dialup Internet Speed [ Check Internet Speed]
    IP Currency:United States dollar($) (USD)
    IDD Code:+1
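As of now, with 1.5 million lines in my test file, this task takes 6 hours to complete. That is because the script runs serially.
Is there any way to divide this task so that the script runs in parallel and the time is greatly reduced? Any help with this would be appreciated.
PS: I am using a VM with 1 processor and 10 GB of RAM.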
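Since the script de-duplicates addresses before querying, the amount of work depends on the number of unique IPs rather than on the 1.5 million lines. The sketch below isolates that first stage so the count can be checked before any lookups are started. It is an illustration only: the function name `unique_ips` and the temporary sample file are assumptions, and `grep -oE` with `sort -u` is used as an equivalent spelling of `egrep -o ... | sort | uniq`.

```shell
# Extract every dotted-quad address from a file, one per line, de-duplicated.
# grep -oE behaves like egrep -o; sort -u behaves like sort | uniq.
unique_ips() {
    grep -oE '([0-9]{1,3}\.){3}[0-9]{1,3}' "$1" | sort -u
}

# Hypothetical sample file holding the single line shown in the question.
sample=$(mktemp)
printf '%s\n' '5002 2014-11-24 12:59:37.112 2014-11-24 12:59:37.112 0.000 UDP ...... 23.234.22.106 48104 101 0 0 8.8.8.8 53 68.0 1.0 1 0.0 0 68 0 48' > "$sample"

# List the unique addresses, then count them.
unique_ips "$sample"
unique_ips "$sample" | wc -l

rm -f "$sample"
```

Running `unique_ips test-data.csv | wc -l` on the real file would give the number of arin lookups the pipeline has to make.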
Adjust -jXXX% as needed:

    PARALLEL=-j200%
    export PARALLEL

    arin() {
        #to get network id from arin.net
        i="$@"
        xidel http://whois.arin.net/rest/ip/$i -e "//table/tbody/tr[3]/td[2] " | sed 's/\/[0-9]\{1,2\}/\n/g'
    }
    export -f arin

    iptrac() {
        # to get other information from ip-tracker.org
        j="$@"
        xidel http://www.ip-tracker.org/locator/ip-lookup.php?ip=$j \
            -e "//table/tbody/tr[3]/td[2]/table/tbody/tr[2]" \
            -e "//table/tbody/tr[3]/td[2]/table/tbody/tr[3]" \
            -e "//table/tbody/tr[3]/td[2]/table/tbody/tr[4]" \
            -e "//table/tbody/tr[3]/td[2]/table/tbody/tr[5]" \
            -e "//table/tbody/tr[3]/td[2]/table/tbody/tr[6]" \
            -e "//table/tbody/tr[3]/td[2]/table/tbody/tr[7]" \
            -e "//table/tbody/tr[3]/td[2]/table/tbody/tr[8]" \
            -e "//table/tbody/tr[3]/td[2]/table/tbody/tr[9]" \
            -e "//table/tbody/tr[3]/td[2]/table/tbody/tr[10]" \
            -e "//table/tbody/tr[3]/td[2]/table/tbody/tr[11]" \
            -e "//table/tbody/tr[3]/td[2]/table/tbody/tr[12]" \
            -e "//table/tbody/tr[3]/td[2]/table/tbody/tr[13]" \
            -e "//table/tbody/tr[3]/td[2]/table/tbody/tr[14]" \
            -e "//table/tbody/tr[3]/td[2]/table/tbody/tr[15]" \
            -e "//table/tbody/tr[3]/td[2]/table/tbody/tr[16]" \
            -e "//table/tbody/tr[3]/td[2]/table/tbody/tr[17]" \
            -e "//table/tbody/tr[3]/td[2]/table/tbody/tr[18]" \
            -e "//table/tbody/tr[3]/td[2]/table/tbody/tr[19]" \
            -e "//table/tbody/tr[3]/td[2]/table/tbody/tr[20]" \
            -e "//table/tbody/tr[3]/td[2]/table/tbody/tr[21]" \
            -e "//table/tbody/tr[3]/td[2]/table/tbody/tr[22]" \
            -e "//table/tbody/tr[3]/td[2]/table/tbody/tr[23]" \
            -e "//table/tbody/tr[3]/td[2]/table/tbody/tr[24]" \
            -e "//table/tbody/tr[3]/td[2]/table/tbody/tr[25]" \
            -e "//table/tbody/tr[3]/td[2]/table/tbody/tr[26]" \
            -e "//table/tbody/tr[3]/td[2]/table/tbody/tr[27]" \
            -e "//table/tbody/tr[3]/td[2]/table/tbody/tr[28]" \
            -e "//table/tbody/tr[3]/td[2]/table/tbody/tr[29]"
    }
    export -f iptrac

    egrep -o "([0-9]{1,3}[\.]){3}[0-9]{1,3}" test-data.csv | sort | uniq | parallel arin | sort | uniq | egrep -o "([0-9]{1,3}[\.]){3}[0-9]{1,3}" | parallel iptrac > abcd
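
A note on the job count: in GNU parallel, `-j200%` means 200% of the CPU count, so on a single-processor VM it runs two lookups at a time. Because each lookup spends most of its time waiting on the network, a fixed count such as `-j20` may help more; the best value is something to find by experiment, and the remote sites may throttle heavy use. If GNU parallel is not installed, `xargs -P` gives a similar fan-out. The sketch below illustrates that alternative only; it is not the method used above. The function name `run_lookups`, the job count of 4, and the `sleep 1` stand-in for a real xidel call are all assumptions made for the example.

```shell
# Fan out one job per input line with xargs -P (an alternative to GNU parallel).
# The inline sh command is a hypothetical stand-in for arin/iptrac:
# it sleeps to mimic network latency instead of calling xidel.
run_lookups() {
    # -n 1: one address per invocation; -P 4: up to 4 invocations at once.
    # Completion order is not guaranteed, so the output is sorted afterwards.
    xargs -n 1 -P 4 sh -c 'sleep 1; printf "looked up %s\n" "$1"' _ | sort
}

# Four addresses finish in roughly the time of one, since they run concurrently.
printf '%s\n' 8.8.8.8 23.234.22.106 1.1.1.1 9.9.9.9 | run_lookups
```

With a real lookup command in place of the stand-in, raising or lowering the `-P` value plays the same role as adjusting `-j` in the answer.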