Shell
Any way to make this command chain shorter or better?
I'm using this command chain to filter out bot/crawler traffic and ban the offending IP addresses. Is there any way to make it more efficient?
sudo awk -F' - |\\"' '{print $1, $7}' access.log | grep -i -E 'bot|crawler' | grep -i -v -E 'google|yahoo|bing|msn|ask|aol|duckduckgo' | awk '{system("sudo ufw deny from "$1" to any")}'
Here is a sample of the log file I'm parsing (the default apache2 access.log):
173.239.53.9 - - [09/Oct/2019:01:52:39 +0000] "GET /robots.txt HTTP/1.1" 200 3955 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; FSL 7.0.6.01001)"
46.229.168.143 - - [09/Oct/2019:01:54:56 +0000] "GET /robots.txt HTTP/1.1" 200 4084 "-" "Mozilla/5.0 (compatible; SemrushBot/6~bl; +http://www.semrush.com/bot.html)"
157.55.39.20 - - [09/Oct/2019:01:56:10 +0000] "GET /robots.txt HTTP/1.1" 200 3918 "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"
65.132.59.34 - - [09/Oct/2019:01:56:53 +0000] "GET /robots.txt HTTP/1.1" 200 4150 "-" "Gigabot (1.1 1.2)"
198.204.244.90 - - [09/Oct/2019:01:58:23 +0000] "GET /robots.txt HTTP/1.1" 200 4480 "-" "Mozilla/5.0 (compatible; MJ12bot/v1.4.8; http://mj12bot.com/)"
192.151.157.210 - - [09/Oct/2019:02:03:41 +0000] "GET /robots.txt HTTP/1.1" 200 4480 "-" "Mozilla/5.0 (compatible; MJ12bot/v1.4.8; http://mj12bot.com/)"
93.158.161.112 - - [09/Oct/2019:02:09:35 +0000] "GET /neighborhood/ballard/robots.txt HTTP/1.1" 404 31379 "-" "Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.com/bots)"
203.133.169.54 - - [09/Oct/2019:02:09:43 +0000] "GET /robots.txt HTTP/1.1" 200 4281 "-" "Mozilla/5.0 (compatible; Daum/4.1; +http://cs.daum.net/faq/15/4118.html?faqId=28966)"
Thanks.
Use a single awk command:

awk -F' - |\"' 'tolower($7) ~ /bot|crawler/ && tolower($7) !~ /google|yahoo|bing|msn|ask|aol|duckduckgo/{system("sudo ufw deny from "$1" to any")}' access.log

This acts only on entries whose 7th field (the user-agent string) contains bot or crawler (what your first grep did), and only when that field does not also contain google|yahoo|bing|msn|ask|aol|duckduckgo (what your second grep did). For every matching line, it then runs sudo ufw deny from "$1" to any on the first field, the IP address (what your final awk did).
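One caution: system() invokes ufw once per matching line, so an IP that shows up many times in the log gets the same rule issued repeatedly. A minimal sketch of a dry-run variant, using a hypothetical /tmp/access_sample.log built from a few lines of the log above: it prints each ufw command once, with sort -u deduplicating the IPs, so you can review the list before actually executing anything.

```shell
# Build a small hypothetical sample log for the dry run.
cat > /tmp/access_sample.log <<'EOF'
198.204.244.90 - - [09/Oct/2019:01:58:23 +0000] "GET /robots.txt HTTP/1.1" 200 4480 "-" "Mozilla/5.0 (compatible; MJ12bot/v1.4.8; http://mj12bot.com/)"
198.204.244.90 - - [09/Oct/2019:02:03:41 +0000] "GET / HTTP/1.1" 200 4480 "-" "Mozilla/5.0 (compatible; MJ12bot/v1.4.8; http://mj12bot.com/)"
157.55.39.20 - - [09/Oct/2019:01:56:10 +0000] "GET /robots.txt HTTP/1.1" 200 3918 "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"
EOF

# Same filter as the answer, but print the IP instead of calling system();
# sort -u collapses repeat offenders to one line each, and the final awk
# formats the command without running it.
awk -F' - |\"' 'tolower($7) ~ /bot|crawler/ && tolower($7) !~ /google|yahoo|bing|msn|ask|aol|duckduckgo/ {print $1}' /tmp/access_sample.log \
  | sort -u \
  | awk '{print "sudo ufw deny from "$1" to any"}'
# → sudo ufw deny from 198.204.244.90 to any
```

Once the printed list looks right, append | sh (or pipe through xargs) to execute it; this also means ufw and sudo fire once per unique IP rather than once per log line.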