Bash
Extract all traffic corresponding to a request with a parameter
For each line of access.log that contains the pattern /mypattern:
    www.example.com:80 192.0.2.17 - - [29/Sep/2017:13:49:02 +0200] "GET /mypattern?foo=bar&iptosearch=198.51.100.5
I want to extract the iptosearch parameter and display every access.log line that has this IP and contains blah. Example:
    [29/Sep/2017:13:49:02 +0200] "GET /mypattern?foo=bar&iptosearch=198.51.100.5:
    www.example3.com:80 198.51.100.5 - - [27/Sep/2017:00:00:00 +0200] "GET /hello/blah" ...
    www.example2.com:80 198.51.100.5 - - [25/Sep/2017:00:00:00 +0200] "GET /blah.html" ...
    www.example7.com:80 198.51.100.5 - - [12/Sep/2017:00:00:00 +0200] "GET /index.htm?i=blah" ...

    [27/Sep/2017:00:00:00 +0200] "GET /mypattern?iptosearch=203.0.113.2&foo2=bar5:
    www.example32.com:80 203.0.113.2 - - [15/Sep/2017:00:00:00 +0200] "GET /hello/blah" ...
    www.example215.com:80 203.0.113.2 - - [14/Sep/2017:00:00:00 +0200] "GET /blah.html" ...
This is what I am trying:
grep -f <(grep -o 'mypattern.*iptosearch=(.*)' access.log) access.log |grep blah
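But:
- the output probably won't be ordered as in my example above, with a header line followed by the list of lines for the matching iptosearch
- the header from my example ([29/Sep/2017:13:49:02 +0200] "GET /mypattern?foo=bar&iptosearch=198.51.100.5:) won't be displayed, because it doesn't contain blah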
**How can I do this so that the output looks like the example above?** Should a loop be used in this case, and if so, how?
An extended bash + grep + awk approach:
Sample access.log contents:
    www.example3.com:80 198.51.100.5 - - [27/Sep/2017:00:00:00 +0200] "GET /hello/blah" ...
    www.example2.com:80 198.51.100.5 - - [25/Sep/2017:00:00:00 +0200] "GET /blah.html" ...
    [29/Sep/2017:13:49:02 +0200] "GET /mypattern?foo=bar&iptosearch=198.51.100.5:
    www.example7.com:80 198.51.100.5 - - [12/Sep/2017:00:00:00 +0200] "GET /index.htm?i=blah" ...
    www.example32.com:80 203.0.113.2 - - [15/Sep/2017:00:00:00 +0200] "GET /hello/blah" ...
    [27/Sep/2017:00:00:00 +0200] "GET /mypattern?iptosearch=203.0.113.2&foo2=bar5:
    www.example215.com:80 203.0.113.2 - - [14/Sep/2017:00:00:00 +0200] "GET /blah.html" ...
The job:
    grep '/mypattern' access.log | while read -r l; do
        if [[ $l =~ iptosearch=(([0-9]+\.){3}[0-9]+) ]]; then
            echo "$l"
            awk -v ip="${BASH_REMATCH[1]}" '$0~ip && /blah/;END{ print "" }' access.log
        fi
    done
Output:
    [29/Sep/2017:13:49:02 +0200] "GET /mypattern?foo=bar&iptosearch=198.51.100.5:
    www.example3.com:80 198.51.100.5 - - [27/Sep/2017:00:00:00 +0200] "GET /hello/blah" ...
    www.example2.com:80 198.51.100.5 - - [25/Sep/2017:00:00:00 +0200] "GET /blah.html" ...
    www.example7.com:80 198.51.100.5 - - [12/Sep/2017:00:00:00 +0200] "GET /index.htm?i=blah" ...

    [27/Sep/2017:00:00:00 +0200] "GET /mypattern?iptosearch=203.0.113.2&foo2=bar5:
    www.example32.com:80 203.0.113.2 - - [15/Sep/2017:00:00:00 +0200] "GET /hello/blah" ...
    www.example215.com:80 203.0.113.2 - - [14/Sep/2017:00:00:00 +0200] "GET /blah.html" ...
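One caveat with the $0~ip test: awk treats ip as a regular expression and searches the whole line, so each dot matches any character and 198.51.100.5 also matches a line from 198.51.100.50. Below is a minimal sketch of a stricter variant. It assumes the client IP is always the second whitespace-separated field, as in the sample log. The function name group_by_iptosearch, the temporary file, and the www.example8.com line are hypothetical and exist only for illustration.

```bash
#!/usr/bin/env bash
# Sketch: same loop as above, but the client IP field is compared exactly.
# Assumption: the client IP is field 2 of every ordinary log line.

group_by_iptosearch() {
    local file=$1 l
    local re='iptosearch=(([0-9]+\.){3}[0-9]+)'
    grep '/mypattern' "$file" | while read -r l; do
        if [[ $l =~ $re ]]; then
            echo "$l"
            # $2 == ip is a plain string comparison: "." is literal here,
            # and a longer address such as 198.51.100.50 does not match.
            awk -v ip="${BASH_REMATCH[1]}" '$2 == ip && /blah/; END { print "" }' "$file"
        fi
    done
}

# Hypothetical sample data; the example8 line is a deliberate near miss.
log=$(mktemp)
trap 'rm -f "$log"' EXIT
cat > "$log" <<'EOF'
www.example3.com:80 198.51.100.5 - - [27/Sep/2017:00:00:00 +0200] "GET /hello/blah" ...
www.example8.com:80 198.51.100.50 - - [26/Sep/2017:00:00:00 +0200] "GET /blah.html" ...
[29/Sep/2017:13:49:02 +0200] "GET /mypattern?foo=bar&iptosearch=198.51.100.5:
www.example7.com:80 198.51.100.5 - - [12/Sep/2017:00:00:00 +0200] "GET /index.htm?i=blah" ...
EOF

group_by_iptosearch "$log"
```

With this data the header is followed by the example3 and example7 lines only. If the IP can appear in a different field in your logs, the field number has to be adjusted.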
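Details:
while read -r l ... - iterates over the lines containing /mypattern that the grep command returns
[[ $l =~ iptosearch=(([0-9]+\.){3}[0-9]+) ]] - matches each line $l against the regular expression iptosearch=(([0-9]+\.){3}[0-9]+).
BASH_REMATCH is an array variable whose members are assigned by the '=~' binary operator to the [[ conditional command. The element with index 0 is the portion of the string matching the entire regular expression. The element with index n is the portion of the string matching the nth parenthesized subexpression (...). This variable is read-only.
-v ip="${BASH_REMATCH[1]}" - passes the variable ip into the awk script
$0~ip && /blah/ - outputs only the lines that contain the current ip value and the keyword blah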
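To see which BASH_REMATCH index holds what, here is a small sketch that runs the same regular expression against one header line from the sample log. The variable names line, re, whole, ip and last are illustrative only and are not part of the answer's command.

```bash
#!/usr/bin/env bash
# Sketch: inspect BASH_REMATCH after a successful [[ =~ ]] match.
line='[29/Sep/2017:13:49:02 +0200] "GET /mypattern?foo=bar&iptosearch=198.51.100.5:'

# Keeping the regex in a variable avoids any quoting surprises inside [[ ]].
re='iptosearch=(([0-9]+\.){3}[0-9]+)'

if [[ $line =~ $re ]]; then
    whole=${BASH_REMATCH[0]}   # text matched by the entire regex
    ip=${BASH_REMATCH[1]}      # outer group: the complete IP address
    last=${BASH_REMATCH[2]}    # inner group: only its last repetition
    printf 'whole=%s\nip=%s\nlast=%s\n' "$whole" "$ip" "$last"
fi
```

The loop uses index 1 because the outer parentheses wrap the whole address. The inner group is repeated three times, so index 2 keeps only the final "digits plus dot" piece.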