Text-Processing

Parsing an HTTP access log to get all requests with a 429 response within one second

  • July 1, 2021

A typical access.log file from nginx:

000.00.000.001 - - [28/Jun/2021:06:37:02 +0100] "POST /abc/cba/ HTTP/1.1" 429 1157 "data..."
000.00.000.002 - - [28/Jun/2021:06:37:02 +0100] "POST /abc/cba/ HTTP/1.1" 200 741 "-" "data..."
000.00.000.001 - - [28/Jun/2021:06:37:02 +0100] "POST /abc/cba/ HTTP/1.1" 429 741 "-" "data..."
000.00.000.001 - - [28/Jun/2021:06:37:02 +0100] "POST /abc/cba/ HTTP/1.1" 429 741 "-" "data..."
000.00.000.001 - - [28/Jun/2021:06:37:02 +0100] "POST /abc/cba/ HTTP/1.1" 429 741 "-" "data..."
000.00.000.001 - - [28/Jun/2021:06:37:02 +0100] "POST /abc/cba/ HTTP/1.1" 429 741 "-" "data..."
000.00.000.003 - - [28/Jun/2021:06:37:03 +0100] "POST /abc/cba/ HTTP/1.1" 429 741 "-" "data..."
000.00.000.003 - - [28/Jun/2021:06:37:03 +0100] "POST /abc/cba/ HTTP/1.1" 429 741 "-" "data..."
000.00.000.003 - - [28/Jun/2021:06:37:03 +0100] "POST /abc/cba/ HTTP/1.1" 429 741 "-" "data..."
000.00.000.003 - - [28/Jun/2021:06:37:03 +0100] "POST /abc/cba/ HTTP/1.1" 429 741 "-" "data..."
000.00.000.003 - - [28/Jun/2021:06:37:03 +0100] "POST /abc/cba/ HTTP/1.1" 429 741 "-" "data..."
000.00.000.004 - - [28/Jun/2021:06:37:03 +0100] "POST /abc/cba/ HTTP/1.1" 429 741 "-" "data..."
000.00.000.004 - - [28/Jun/2021:06:37:03 +0100] "POST /abc/cba/ HTTP/1.1" 429 741 "-" "data..."
000.00.000.004 - - [28/Jun/2021:06:37:03 +0100] "POST /abc/cba/ HTTP/1.1" 429 741 "-" "data..."

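The question is how I can get, from the log file, all IP addresses that received response code 429 several times within a single second, whenever that second occurs. I have been trying to find a solution with awk but have not succeeded yet, so any hint would be appreciated. The output for the given example would be: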

28/Jun/2021:06:37:02:
000.00.000.001
28/Jun/2021:06:37:03:
000.00.000.003
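
Criteria:

  1. Only IPs that made 5 or more requests
  2. With response status 429
  3. Grouped by second, for responses occurring at any time rather than in one specific second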

Is this what you're trying to do?

$ awk -F'[[ ]+' '$9==429{print $4, $1}' file | uniq -c | awk '$1>4{print $2 ":\n" $3}'
28/Jun/2021:06:37:02:
000.00.000.001
28/Jun/2021:06:37:03:
000.00.000.003
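
To see why the timestamp is `$4` and the status is `$9` with the `-F'[[ ]+'` separator, here is a minimal sketch that prints every field of one sample line together with its index. The variable `line` and the helper `show_fields` are names made up for this illustration; they are not part of the commands above.

```shell
# One sample line copied from the log excerpt in the question.
line='000.00.000.001 - - [28/Jun/2021:06:37:02 +0100] "POST /abc/cba/ HTTP/1.1" 429 741 "-" "data..."'

# Hypothetical helper: split on runs of "[" and spaces, then list the fields.
show_fields() {
    awk -F'[[ ]+' '{ for (i = 1; i <= NF; i++) printf "$%d=%s\n", i, $i }'
}

printf '%s\n' "$line" | show_fields
```

Because the separator is any run of `[` or space characters, the ` [` before the date is swallowed as a single separator, which is why the timestamp arrives in `$4` without the leading bracket.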

If the contents of the first set of quotes (e.g. "POST /abc/cba/ HTTP/1.1") aren't always 3 space-separated strings as in your example input, then just tweak it to:

$ awk -F'[[ ]+' '{sub(/"[^"]*"/,"")} $6==429{print $4, $1}' file | uniq -c | awk '$1>4{print $2 ":\n" $3}'
28/Jun/2021:06:37:02:
000.00.000.001
28/Jun/2021:06:37:03:
000.00.000.003
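
As a rough illustration of what the `sub()` call changes, the sketch below uses a made-up log line whose quoted request contains four space-separated words instead of three. The line itself and the helper names `status_by_position` and `status_after_sub` are assumptions for demonstration only.

```shell
# Hypothetical log line: the request path contains a space, so the quoted
# request splits into 4 words rather than 3.
line='000.00.000.009 - - [28/Jun/2021:06:37:04 +0100] "GET /a b/c HTTP/1.1" 429 741 "-" "data..."'

# Fixed position: field 9 is no longer the status code for this line.
status_by_position() {
    awk -F'[[ ]+' '{ print $9 }'
}

# Remove the first quoted string; awk re-splits the record, so the status
# becomes field 6 no matter how many words the request contained.
status_after_sub() {
    awk -F'[[ ]+' '{ sub(/"[^"]*"/, "") } { print $6 }'
}

printf 'by position: %s\n' "$(printf '%s\n' "$line" | status_by_position)"
printf 'after sub:   %s\n' "$(printf '%s\n' "$line" | status_after_sub)"
```

For this line the fixed-position variant reports `HTTP/1.1"` while the `sub()` variant reports `429`.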

If you prefer an awk-only solution for some reason:

$ awk -F'[[ ]+' '$9==429{cnt[$4":\n"$1]++} END{for (key in cnt) if (cnt[key]>4) print key}' file
28/Jun/2021:06:37:02:
000.00.000.001
28/Jun/2021:06:37:03:
000.00.000.003
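
Two details may matter on real logs: the order in which `for (key in cnt)` visits keys is unspecified in awk, and `uniq -c` only counts adjacent identical lines. The following is a sketch of a variant, not the answer's own method, that prints each timestamp/IP pair at the moment its count reaches the threshold, so output follows input order and interleaved requests are still counted. The function names `report_429` and `sample`, the `min` parameter, and the interleaved sample data are all made up for this example.

```shell
# Hypothetical helper: optional first argument is the minimum number of
# 429 responses per IP per second (defaults to 5).
report_429() {
    awk -F'[[ ]+' -v min="${1:-5}" '
        # Print a timestamp/IP pair once, at the moment its count reaches min.
        $9 == 429 && ++cnt[$4, $1] == min { print $4 ":\n" $1 }
    '
}

# Made-up input in which two IPs interleave within the same second.
sample() {
cat <<'EOF'
000.00.000.001 - - [28/Jun/2021:06:37:02 +0100] "POST /abc/cba/ HTTP/1.1" 429 741 "-" "data..."
000.00.000.002 - - [28/Jun/2021:06:37:02 +0100] "POST /abc/cba/ HTTP/1.1" 429 741 "-" "data..."
000.00.000.001 - - [28/Jun/2021:06:37:02 +0100] "POST /abc/cba/ HTTP/1.1" 429 741 "-" "data..."
000.00.000.001 - - [28/Jun/2021:06:37:02 +0100] "POST /abc/cba/ HTTP/1.1" 429 741 "-" "data..."
000.00.000.002 - - [28/Jun/2021:06:37:02 +0100] "POST /abc/cba/ HTTP/1.1" 429 741 "-" "data..."
000.00.000.001 - - [28/Jun/2021:06:37:02 +0100] "POST /abc/cba/ HTTP/1.1" 429 741 "-" "data..."
000.00.000.001 - - [28/Jun/2021:06:37:02 +0100] "POST /abc/cba/ HTTP/1.1" 429 741 "-" "data..."
EOF
}

sample | report_429 5
```

On this input the sketch reports 000.00.000.001 for 28/Jun/2021:06:37:02, since that IP has five 429 responses in that second even though they are not on consecutive lines.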

All of the above scripts will work in any shell on every Unix box, using only mandatory POSIX tools.

Source: https://unix.stackexchange.com/questions/656559