Shell

Any way to make this chain of commands shorter or better?

  • October 9, 2019

I'm using this chain of commands to filter out bot/crawler traffic and ban the offending IP addresses. Is there any way to make it more efficient?

sudo awk -F' - |\\"' '{print $1, $7}' access.log | 
grep -i -E 'bot|crawler' | 
grep -i -v -E 'google|yahoo|bing|msn|ask|aol|duckduckgo' | 
awk '{system("sudo ufw deny from "$1" to any")}'

Here is a sample of the log file I'm parsing, the default apache2 access.log:

173.239.53.9 - - [09/Oct/2019:01:52:39 +0000] "GET /robots.txt HTTP/1.1" 200 3955 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; FSL 7.0.6.01001)"
46.229.168.143 - - [09/Oct/2019:01:54:56 +0000] "GET /robots.txt HTTP/1.1" 200 4084 "-" "Mozilla/5.0 (compatible; SemrushBot/6~bl; +http://www.semrush.com/bot.html)"
157.55.39.20 - - [09/Oct/2019:01:56:10 +0000] "GET /robots.txt HTTP/1.1" 200 3918 "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"
65.132.59.34 - - [09/Oct/2019:01:56:53 +0000] "GET /robots.txt HTTP/1.1" 200 4150 "-" "Gigabot (1.1 1.2)"
198.204.244.90 - - [09/Oct/2019:01:58:23 +0000] "GET /robots.txt HTTP/1.1" 200 4480 "-" "Mozilla/5.0 (compatible; MJ12bot/v1.4.8; http://mj12bot.com/)"
192.151.157.210 - - [09/Oct/2019:02:03:41 +0000] "GET /robots.txt HTTP/1.1" 200 4480 "-" "Mozilla/5.0 (compatible; MJ12bot/v1.4.8; http://mj12bot.com/)"
93.158.161.112 - - [09/Oct/2019:02:09:35 +0000] "GET /neighborhood/ballard/robots.txt HTTP/1.1" 404 31379 "-" "Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.com/bots)"
203.133.169.54 - - [09/Oct/2019:02:09:43 +0000] "GET /robots.txt HTTP/1.1" 200 4281 "-" "Mozilla/5.0 (compatible; Daum/4.1; +http://cs.daum.net/faq/15/4118.html?faqId=28966)"
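
As a sanity check against this sample, a print-only version of the first awk stage (run with sudo if the log needs it, as in the chain above) confirms that $1 is the client IP and $7 is the user-agent string; the "->" is purely illustrative:

# print-only check of the field split: $1 = client IP, $7 = user agent
awk -F' - |\"' '{print $1, "->", $7}' access.log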

Thanks

Use a single awk command:

awk -F' - |\"' 'tolower($7) ~ /bot|crawler/ && tolower($7) !~ /google|yahoo|bing|msn|ask|aol|duckduckgo/{system("sudo ufw deny from "$1" to any")}' access.log

This acts only on entries whose seventh field contains bot or crawler (what your first grep did) and does not contain google, yahoo, bing, msn, ask, aol, or duckduckgo (what your second grep did). For each matching line it runs sudo ufw deny from $1 to any, substituting the first field (the IP) for $1 (what your final awk did).
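
If you would rather review the bans before applying them, a small variation of the same command (a sketch, not tested against your full log) deduplicates the addresses with a seen array and prints each ufw command instead of executing it:

# dry run: print the deny commands, at most once per IP, instead of running them
awk -F' - |\"' 'tolower($7) ~ /bot|crawler/ && tolower($7) !~ /google|yahoo|bing|msn|ask|aol|duckduckgo/ && !seen[$1]++ {print "sudo ufw deny from " $1 " to any"}' access.log

Once the output looks right, pipe it to sh to apply the rules. Deduplicating also avoids invoking ufw repeatedly for crawlers that hit the server many times, like the MJ12bot entries in your sample.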

Source: https://unix.stackexchange.com/questions/545954