Text-Processing
Parsing an HTTP access log to get all requests that received a 429 response within one second
A typical access.log file from nginx:
000.00.000.001 - - [28/Jun/2021:06:37:02 +0100] "POST /abc/cba/ HTTP/1.1" 429 1157 "data..."
000.00.000.002 - - [28/Jun/2021:06:37:02 +0100] "POST /abc/cba/ HTTP/1.1" 200 741 "-" "data..."
000.00.000.001 - - [28/Jun/2021:06:37:02 +0100] "POST /abc/cba/ HTTP/1.1" 429 741 "-" "data..."
000.00.000.001 - - [28/Jun/2021:06:37:02 +0100] "POST /abc/cba/ HTTP/1.1" 429 741 "-" "data..."
000.00.000.001 - - [28/Jun/2021:06:37:02 +0100] "POST /abc/cba/ HTTP/1.1" 429 741 "-" "data..."
000.00.000.001 - - [28/Jun/2021:06:37:02 +0100] "POST /abc/cba/ HTTP/1.1" 429 741 "-" "data..."
000.00.000.003 - - [28/Jun/2021:06:37:03 +0100] "POST /abc/cba/ HTTP/1.1" 429 741 "-" "data..."
000.00.000.003 - - [28/Jun/2021:06:37:03 +0100] "POST /abc/cba/ HTTP/1.1" 429 741 "-" "data..."
000.00.000.003 - - [28/Jun/2021:06:37:03 +0100] "POST /abc/cba/ HTTP/1.1" 429 741 "-" "data..."
000.00.000.003 - - [28/Jun/2021:06:37:03 +0100] "POST /abc/cba/ HTTP/1.1" 429 741 "-" "data..."
000.00.000.003 - - [28/Jun/2021:06:37:03 +0100] "POST /abc/cba/ HTTP/1.1" 429 741 "-" "data..."
000.00.000.004 - - [28/Jun/2021:06:37:03 +0100] "POST /abc/cba/ HTTP/1.1" 429 741 "-" "data..."
000.00.000.004 - - [28/Jun/2021:06:37:03 +0100] "POST /abc/cba/ HTTP/1.1" 429 741 "-" "data..."
000.00.000.004 - - [28/Jun/2021:06:37:03 +0100] "POST /abc/cba/ HTTP/1.1" 429 741 "-" "data..."
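The question is: how can I get every IP address from the log file that received response code 429 on several requests made within the same second, at any point in the log? I've been trying to solve this with awk without success so far, so any hint would be appreciated. For the example above, the output would be: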
28/Jun/2021:06:37:02:
000.00.000.001
28/Jun/2021:06:37:03:
000.00.000.003
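- only emit IPs that made 5 or more such requests
- whose response status is 429
- grouped by second, covering every second in the log rather than one specific second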
Is this what you're trying to do?
$ awk -F'[[ ]+' '$9==429{print $4, $1}' file | uniq -c | awk '$1>4{print $2 ":\n" $3}'
28/Jun/2021:06:37:02:
000.00.000.001
28/Jun/2021:06:37:03:
000.00.000.003
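The field numbers in that command come from the -F'[[ ]+' separator, which treats any run of "[" and space characters as a single delimiter, so the "[" in front of the timestamp is swallowed. A small sketch that splits the first sample line the same way and numbers each field shows where $1 (the IP), $4 (the second) and $9 (the status) end up:

```shell
# First line of the sample log
line='000.00.000.001 - - [28/Jun/2021:06:37:02 +0100] "POST /abc/cba/ HTTP/1.1" 429 1157 "data..."'

# Split with the same separator (one or more "[" or space chars) and number each field
printf '%s\n' "$line" | awk -F'[[ ]+' '{ for (i = 1; i <= NF; i++) printf "$%d = %s\n", i, $i }'
# prints:
# $1 = 000.00.000.001
# $2 = -
# $3 = -
# $4 = 28/Jun/2021:06:37:02
# $5 = +0100]
# $6 = "POST
# $7 = /abc/cba/
# $8 = HTTP/1.1"
# $9 = 429
# $10 = 1157
# $11 = "data..."
```

Because the timezone lands in its own field ($5), $4 is exactly the date and time down to the second, which is what makes it usable as the "per second" grouping key.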
If the contents of the first set of quotes (e.g. "POST /abc/cba/ HTTP/1.1") aren't always 3 space-separated strings as in your sample input, then just tweak it to:
$ awk -F'[[ ]+' '{sub(/"[^"]*"/,"")} $6==429{print $4, $1}' file | uniq -c | awk '$1>4{print $2 ":\n" $3}'
28/Jun/2021:06:37:02:
000.00.000.001
28/Jun/2021:06:37:03:
000.00.000.003
If you prefer an awk-only solution for some reason:
$ awk -F'[[ ]+' '$9==429{cnt[$4":\n"$1]++} END{for (key in cnt) if (cnt[key]>4) print key}' file
28/Jun/2021:06:37:02:
000.00.000.001
28/Jun/2021:06:37:03:
000.00.000.003
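Two details are worth keeping in mind with these versions. uniq -c only collapses adjacent identical lines, so the pipelines assume that, after filtering on 429, all entries for one IP in a given second are contiguous; that holds in the sample but not necessarily when several IPs are rate-limited in the same second. The awk-only version avoids that, but POSIX leaves the order of for (key in cnt) unspecified, so its output may not come out in time order. A minimal sketch that handles both by remembering the order in which each second/IP key is first seen; the function name burst_429 is hypothetical:

```shell
# burst_429 is a hypothetical helper name. It reads access-log lines from the files
# given as arguments (or from stdin if there are none) and prints "second:" and the IP
# for every IP with 5 or more 429 responses within that second, in first-seen order.
burst_429() {
    awk -F'[[ ]+' '
        $9 == 429 {
            key = $4 ":\n" $1                     # same key/output format as the answer above
            if (!(key in cnt)) order[++n] = key   # remember the first time each key is seen
            cnt[key]++
        }
        END {
            # walk keys in first-seen order instead of the unspecified for-in order
            for (i = 1; i <= n; i++)
                if (cnt[order[i]] > 4) print order[i]
        }
    ' "$@"
}
```

Usage would be, for example, burst_429 access.log or burst_429 < access.log. Like the scripts above it uses only POSIX awk features (associative arrays and the "in" operator), and it counts correctly even when lines for different IPs are interleaved within the same second.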
All of the scripts above will work in any shell on every Unix box, since they use only mandatory POSIX tools.