Awk
Using sed or awk to group Apache log lines by URL?
Given a file such as /var/log/apache2/other_vhosts_access.log:
example.com:443 1.1.1.1 - - [25/Jan/2021:12:00:00 +0000] "GET /abc/def/ghi?token=jklm12 HTTP/1.1" 200 1000 "-" "Mozilla/5.0 (Macintosh; Intel...
example.com:443 2.2.2.2 - - [25/Jan/2021:12:00:00 +0000] "GET /abc/def/ghi?token=jklm13 HTTP/1.1" 200 1000 "-" "Mozilla/5.0 (Macintosh; Intel...
example.com:443 33.33.33.33 - - [25/Jan/2021:12:00:00 +0000] "GET /abc/def/ghi?token=jklm14 HTTP/1.1" 200 1000 "-" "Mozilla/5.0 (Macintosh; Intel...
example.com:443 4.4.4.4 - - [25/Jan/2021:12:00:00 +0000] "GET /abc/def/ghi?token=jklm12 HTTP/1.1" 200 1000 "-" "Mozilla/5.0 (Macintosh; Intel...
How can I aggregate the IPs, grouped by URL?
Example:
/abc/def/ghi?token=jklm12
    1.1.1.1
    4.4.4.4
/abc/def/ghi?token=jklm13
    2.2.2.2
/abc/def/ghi?token=jklm14
    33.33.33.33
I know we can probably use awk to extract certain columns, but how do I do the "group by"?
awk '{a[$8]=a[$8] "\n\t" $2} END{for (url in a) print url, a[url]}' file
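- The array a is initially empty.
- {a[$8]=a[$8] "\n\t" $2} extends the value of the element a[$8] with a newline and a tab, followed by the second field. With whitespace splitting, field 2 is the client IP and field 8 is the request URL.
- The END block is executed only after the whole file has been parsed. For each key in the array, it prints the key (url) and the associated value (a[url]). The order in which for (url in a) visits the keys is unspecified, so the groups do not necessarily appear in input order.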
Output:
/abc/def/ghi?token=jklm14
	33.33.33.33
/abc/def/ghi?token=jklm12
	1.1.1.1
	4.4.4.4
/abc/def/ghi?token=jklm13
	2.2.2.2
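If the groups should come out in the order each URL first appears in the log, one possible variation is to record that order in a second array. The following is a minimal sketch, not part of the original answer: the function name group_by_url, the array names ips and order, and the temporary sample file (with the user-agent field shortened to a placeholder) are all assumptions made for illustration.

```shell
#!/bin/sh
# Hypothetical helper: group client IPs (field 2) by request URL (field 8),
# keeping the URLs in first-seen order.
group_by_url() {
    awk '
        # First time this URL is seen: remember its position.
        !($8 in ips) { order[++n] = $8 }
        # Append the IP to the list kept for this URL.
        { ips[$8] = ips[$8] "\n\t" $2 }
        END {
            # Walk the URLs by their recorded position, not with for-in.
            for (i = 1; i <= n; i++)
                print order[i] ips[order[i]]
        }
    ' "$1"
}

# Sample data modelled on the question; the user-agent is a shortened placeholder.
log=$(mktemp)
cat > "$log" <<'EOF'
example.com:443 1.1.1.1 - - [25/Jan/2021:12:00:00 +0000] "GET /abc/def/ghi?token=jklm12 HTTP/1.1" 200 1000 "-" "Mozilla/5.0"
example.com:443 2.2.2.2 - - [25/Jan/2021:12:00:00 +0000] "GET /abc/def/ghi?token=jklm13 HTTP/1.1" 200 1000 "-" "Mozilla/5.0"
example.com:443 33.33.33.33 - - [25/Jan/2021:12:00:00 +0000] "GET /abc/def/ghi?token=jklm14 HTTP/1.1" 200 1000 "-" "Mozilla/5.0"
example.com:443 4.4.4.4 - - [25/Jan/2021:12:00:00 +0000] "GET /abc/def/ghi?token=jklm12 HTTP/1.1" 200 1000 "-" "Mozilla/5.0"
EOF

result=$(group_by_url "$log")
printf '%s\n' "$result"
rm -f "$log"
```

On this sample the jklm12 group is printed first, then jklm13, then jklm14, which matches the layout asked for in the question. Concatenating order[i] and ips[order[i]] directly, instead of separating them with a comma in print, also avoids the trailing space after each URL.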