Bash

Extract all traffic corresponding to a request with a parameter

  • October 4, 2017

For each line of access.log that contains the pattern /mypattern:

www.example.com:80 192.0.2.17 - - [29/Sep/2017:13:49:02 +0200] "GET /mypattern?foo=bar&iptosearch=198.51.100.5  

I want to extract the iptosearch parameter and display every line of access.log that has this IP and contains blah. Example:

[29/Sep/2017:13:49:02 +0200] "GET /mypattern?foo=bar&iptosearch=198.51.100.5: 
   www.example3.com:80 198.51.100.5 - - [27/Sep/2017:00:00:00 +0200] "GET /hello/blah" ...
   www.example2.com:80 198.51.100.5 - - [25/Sep/2017:00:00:00 +0200] "GET /blah.html" ...
   www.example7.com:80 198.51.100.5 - - [12/Sep/2017:00:00:00 +0200] "GET /index.htm?i=blah" ...

[27/Sep/2017:00:00:00 +0200] "GET /mypattern?iptosearch=203.0.113.2&foo2=bar5: 
   www.example32.com:80 203.0.113.2 - - [15/Sep/2017:00:00:00 +0200] "GET /hello/blah" ...
   www.example215.com:80 203.0.113.2 - - [14/Sep/2017:00:00:00 +0200] "GET /blah.html" ...

I am trying to do it like this:

grep -f <(grep -o 'mypattern.*iptosearch=(.*)' access.log) access.log |grep blah

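But:

  • the output probably won't be ordered as in my example above: a header, followed by the list of lines matching the corresponding iptosearch
  • the header from my example ([29/Sep/2017:13:49:02 +0200] "GET /mypattern?foo=bar&iptosearch=198.51.100.5:) won't be displayed, because it doesn't contain blah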

**How can I do this so that the output looks like the example above?** Should a loop be used in this case, and if so, how?

Extended bash + grep + awk approach:

Sample access.log contents:

www.example3.com:80 198.51.100.5 - - [27/Sep/2017:00:00:00 +0200] "GET /hello/blah" ...
www.example2.com:80 198.51.100.5 - - [25/Sep/2017:00:00:00 +0200] "GET /blah.html" ...
[29/Sep/2017:13:49:02 +0200] "GET /mypattern?foo=bar&iptosearch=198.51.100.5: 
www.example7.com:80 198.51.100.5 - - [12/Sep/2017:00:00:00 +0200] "GET /index.htm?i=blah" ...
www.example32.com:80 203.0.113.2 - - [15/Sep/2017:00:00:00 +0200] "GET /hello/blah" ...
[27/Sep/2017:00:00:00 +0200] "GET /mypattern?iptosearch=203.0.113.2&foo2=bar5: 
www.example215.com:80 203.0.113.2 - - [14/Sep/2017:00:00:00 +0200] "GET /blah.html" ...

The job:

grep '/mypattern' access.log | while read -r l; do 
   if [[ $l =~ iptosearch=(([0-9]+\.){3}[0-9]+) ]]; then 
       echo "$l"
       awk -v ip="${BASH_REMATCH[1]}" '$0~ip && /blah/;END{ print "" }' access.log
   fi
done

Output:

[29/Sep/2017:13:49:02 +0200] "GET /mypattern?foo=bar&iptosearch=198.51.100.5:
www.example3.com:80 198.51.100.5 - - [27/Sep/2017:00:00:00 +0200] "GET /hello/blah" ...
www.example2.com:80 198.51.100.5 - - [25/Sep/2017:00:00:00 +0200] "GET /blah.html" ...
www.example7.com:80 198.51.100.5 - - [12/Sep/2017:00:00:00 +0200] "GET /index.htm?i=blah" ...

[27/Sep/2017:00:00:00 +0200] "GET /mypattern?iptosearch=203.0.113.2&foo2=bar5:
www.example32.com:80 203.0.113.2 - - [15/Sep/2017:00:00:00 +0200] "GET /hello/blah" ...
www.example215.com:80 203.0.113.2 - - [14/Sep/2017:00:00:00 +0200] "GET /blah.html" ...

Details:

  • while read -r l ... - iterates over the lines containing /mypattern, as returned by the grep command
  • [[ $l =~ iptosearch=(([0-9]+\.){3}[0-9]+) ]] - matches each line $l against the regular expression iptosearch=(([0-9]+\.){3}[0-9]+)

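BASH_REMATCH is an array variable whose members are assigned by the '=~' binary operator to the [[ conditional command. The element with index 0 is the portion of the string matching the entire regular expression. The element with index n is the portion of the string matching the nth parenthesized subexpression (...). This variable is read-only.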
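To make the indexing concrete, here is a minimal sketch that applies the same regular expression to a single line and prints the captured pieces. The sample line is copied from the question. The variable names line, re, whole and ip are only illustrative.

```bash
#!/usr/bin/env bash
# Sample line taken from the question's log format.
line='[29/Sep/2017:13:49:02 +0200] "GET /mypattern?foo=bar&iptosearch=198.51.100.5'

# Same regular expression as in the loop above, kept in a variable.
re='iptosearch=(([0-9]+\.){3}[0-9]+)'

if [[ $line =~ $re ]]; then
    whole=${BASH_REMATCH[0]}   # text matched by the entire regex
    ip=${BASH_REMATCH[1]}      # first (outer) group: the full IP address
    # BASH_REMATCH[2] belongs to the inner, repeated group and holds only
    # one of its repetitions, so it is not useful here.
    printf 'whole=%s\nip=%s\n' "$whole" "$ip"
fi
```

This is why the loop uses index 1 rather than 0: index 0 would still carry the iptosearch= prefix.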

  • -v ip="${BASH_REMATCH[1]}" - passes the variable ip into the awk script
  • $0~ip && /blah/ - prints only the lines that contain the current ip value and the keyword blah
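One caveat: $0~ip treats the address as a regular expression, so each dot matches any character and 198.51.100.5 would also match a line from 198.51.100.50. If that matters, a possible variant is sketched below. It assumes the client address is always the second whitespace-separated field, as in the sample log. The function name report_by_ip is hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: same loop as above, but the IP is compared as an exact string.
# report_by_ip is a hypothetical name; $1 is the path to the log file.
report_by_ip() {
    local log=$1 l
    grep -F '/mypattern' "$log" | while read -r l; do
        if [[ $l =~ iptosearch=(([0-9]+\.){3}[0-9]+) ]]; then
            printf '%s\n' "$l"
            # Assumption: field 2 is the client address. index() does a
            # plain substring search, so nothing is interpreted as a regex.
            awk -v ip="${BASH_REMATCH[1]}" \
                '$2 == ip && index($0, "blah"); END { print "" }' "$log"
        fi
    done
}
```

Called as report_by_ip access.log, it should print the same header-plus-list layout as the loop above, only with stricter IP matching.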

Source: https://unix.stackexchange.com/questions/396017