Awk

Group Apache log lines by URL using sed or awk?

  • January 25, 2021

Given a file like this, /var/log/apache2/other_vhosts_access.log:

example.com:443 1.1.1.1 - - [25/Jan/2021:12:00:00 +0000] "GET /abc/def/ghi?token=jklm12 HTTP/1.1" 200 1000 "-" "Mozilla/5.0 (Macintosh; Intel...
example.com:443 2.2.2.2 - - [25/Jan/2021:12:00:00 +0000] "GET /abc/def/ghi?token=jklm13 HTTP/1.1" 200 1000 "-" "Mozilla/5.0 (Macintosh; Intel...
example.com:443 33.33.33.33 - - [25/Jan/2021:12:00:00 +0000] "GET /abc/def/ghi?token=jklm14 HTTP/1.1" 200 1000 "-" "Mozilla/5.0 (Macintosh; Intel...
example.com:443 4.4.4.4 - - [25/Jan/2021:12:00:00 +0000] "GET /abc/def/ghi?token=jklm12 HTTP/1.1" 200 1000 "-" "Mozilla/5.0 (Macintosh; Intel...

How can I aggregate the IPs, grouped by URL?

Example:

/abc/def/ghi?token=jklm12
    1.1.1.1
    4.4.4.4
/abc/def/ghi?token=jklm13
    2.2.2.2
/abc/def/ghi?token=jklm14
    33.33.33.33

I know we can probably use awk to extract certain columns, but how do I do the "grouping"?

awk '{a[$8]=a[$8] "\n\t" $2} END{for (url in a) print url, a[url]}' file

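The array a starts out empty.

  • {a[$8]=a[$8] "\n\t" $2} appends a newline, a tab, and the second field (the client IP) to the element a[$8], which is keyed by the eighth field (the request URL).
  • The END block runs only after the whole file has been read. For each key in the array, it prints the key (url) and the associated value (a[url]).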
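
If the same client requests the same URL more than once, the one-liner above lists that IP once per request. A minimal sketch of a deduplicating variant follows. It assumes the same field layout (field 2 is the client IP, field 8 is the request URL). The function name group_unique_ips_by_url and the array names seen and ips are hypothetical and not part of the original answer.

```shell
# Hypothetical helper: list each client IP at most once per URL.
# Assumes field 2 is the client IP and field 8 is the request URL.
group_unique_ips_by_url() {
    awk '
        # seen[] counts (URL, IP) pairs; the action runs only on the first sighting.
        !seen[$8, $2]++ {
            ips[$8] = ips[$8] "\n\t" $2
        }
        END {
            # The traversal order of "for (url in ips)" is unspecified in awk.
            for (url in ips)
                printf "%s%s\n", url, ips[url]
        }
    ' "$@"
}
# Possible usage: group_unique_ips_by_url /var/log/apache2/other_vhosts_access.log
```

Using printf instead of print url, a[url] also avoids the output field separator (a space) that print would place after each URL.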

Output:

/abc/def/ghi?token=jklm14
       33.33.33.33
/abc/def/ghi?token=jklm12
       1.1.1.1
       4.4.4.4
/abc/def/ghi?token=jklm13
       2.2.2.2
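
The URLs above appear in a different order than in the question's example because awk does not specify the traversal order of for (key in array). If a predictable order matters, one possible approach is to sort the input by URL first and print a heading whenever the URL changes. This is a sketch under the same field assumptions, not the original answer's method. The function name group_ips_by_url and the variable prev are hypothetical.

```shell
# Hypothetical helper: group client IPs by URL with the URLs in sorted order.
# Assumes field 2 is the client IP and field 8 is the request URL.
group_ips_by_url() {
    # Sort on field 8 so that all lines for one URL are adjacent.
    LC_ALL=C sort -k8,8 "$@" | awk '
        $8 != prev { print $8; prev = $8 }   # URL changed: print it once as a heading
        { print "\t" $2 }                    # every line: print the indented client IP
    '
}
# Possible usage: group_ips_by_url /var/log/apache2/other_vhosts_access.log
```

Because nothing is accumulated in an array, this variant keeps only one line in awk's memory at a time, at the cost of sorting the input first.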

Source: https://unix.stackexchange.com/questions/630885