Extracting specific lines from a log file using grep and awk

March 18, 2019

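I have a huge log file (20 million lines) that tells me whether certain URLs responded with "200 OK".

I want to extract every URL whose status is "200 OK", together with the file name that accompanies it.

Sample input: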

Spider mode enabled. Check if remote file exists.
--2019-02-06 07:38:43--  https://www.example/download/123456789
Reusing existing connection to website.
HTTP request sent, awaiting response... 
 HTTP/1.1 200 OK
 Content-Type: application/zip
 Connection: keep-alive
 Status: 200 OK
 Content-Disposition: attachment; filename="myfile123.zip"
 Last-Modified: 2019-02-06 01:38:44 +0100
 Access-Control-Allow-Origin: *
 Cache-Control: private
 X-Runtime: 0.312890
 X-Frame-Options: SAMEORIGIN
 Access-Control-Request-Method: GET,OPTIONS
 X-Request-Id: 99920e01-d308-40ba-9461-74405e7df4b3
 Date: Wed, 06 Feb 2019 00:38:44 GMT 
 X-Powered-By: Phusion Passenger 5.1.11
 Server: nginx + Phusion Passenger 5.1.11
 X-Powered-By: cloud66
Length: unspecified [application/zip]
Last-modified header invalid -- time-stamp ignored.
Remote file exists.

Spider mode enabled. Check if remote file exists.
--2019-02-06 07:38:43--  https://www.example/download/234567890
Reusing existing connection to website.
HTTP request sent, awaiting response... 
 HTTP/1.1 404 Not Found
 Content-Type: text/html; charset=utf-8
 Connection: keep-alive
 Status: 404 Not Found
 Cache-Control: no-cache
 Access-Control-Allow-Origin: *
 X-Runtime: 0.020718
 X-Frame-Options: SAMEORIGIN
 Access-Control-Request-Method: GET,OPTIONS
 X-Request-Id: bc20626b-095f-4b28-8322-ad3f294e4ee2
 Date: Wed, 06 Feb 2019 00:37:42 GMT
 X-Powered-By: Phusion Passenger 5.1.11
 Server: nginx + Phusion Passenger 5.1.11
Remote file does not exist -- broken link!!!

Expected output:

https://www.example/download/123456789 myfile123.zip

I would really like to finally understand the logic behind this.

If I do this:

awk '/: 200 OK/{print $0}' file.log

I get every line containing Status: 200 OK, but without any of the surrounding context.

If I do this:

grep -C4 "1 200 OK" file.log

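I get the context, but with a lot of "noise". I want to rearrange the output so that only the relevant information appears, on a single line.

You can use awk as follows. First store the URL in a variable. Then, when a Status line ends in OK, read the next line with getline and extract the file name from it. This requires GNU awk, because the three-argument form of match(), which stores the captured groups in an array, is a gawk extension.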
awk '/^--/{ url = $NF } 
   /^[[:space:]]+Status/ && $NF == "OK" { getline nextline; match(nextline, /filename="(.+)"/,arr); print url, arr[1] }' file
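
For awk implementations without the three-argument match(), the same idea can be sketched with sub() instead. This is a minimal sketch, not a drop-in replacement: the function name extract_ok_urls is hypothetical, and it assumes that the filename header appears somewhere after the Status line within the same record, as in the sample input.

```shell
#!/bin/sh
# extract_ok_urls (hypothetical helper name): print "URL filename" for every
# record whose Status is 200 OK. The log path is passed as $1.
# Assumption: the filename header follows the Status line within the record.
extract_ok_urls() {
    awk '
        # Request line: remember the URL and reset the per-record flag.
        /^--/ { url = $NF; ok = 0 }

        # Status header of a successful response: mark the record.
        /^[ \t]+Status: 200 OK/ { ok = 1 }

        # First filename header after a 200 OK status: strip everything
        # around the quoted name and print it next to the stored URL.
        ok && /filename="/ {
            name = $0
            sub(/.*filename="/, "", name)
            sub(/".*/, "", name)
            print url, name
            ok = 0
        }
    ' "$1"
}
```

Calling extract_ok_urls file.log should print one line per successful download. Unlike the getline version, this variant does not depend on the file name being on the line immediately after Status.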

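Alternatively, a two-step approach: first collect the file name into a shell variable, then print the URL with the name appended. As written, this gives the expected result only when the log contains a single 200 OK record, because i holds every matched file name at once.

i=`awk '/Status: 200 OK/{x=NR+1}(NR<x){getline;print $NF}' filename | awk -F "=" '{print $NF}'| sed 's/"//g'`

awk '{a[++i]=$0}/Status: 200 OK/{for(x=NR-7;x<=NR;x++)print a[x]}' filename | awk -v i="$i" '/https:/{print $NF " " i}'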

Output:

https://www.example/download/123456789 myfile123.zip
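
Since the records in the sample input are separated by blank lines, another way to look at the logic is awk's paragraph mode, where each blank-line-delimited block becomes one record and each line becomes one field. The following is a rough sketch under that assumption (the separators must be truly empty lines); the function name extract_ok_paragraphs is hypothetical.

```shell
#!/bin/sh
# extract_ok_paragraphs (hypothetical helper name): treat each blank-line
# separated block of the log as one record. The log path is passed as $1.
# Assumption: records are separated by completely empty lines.
extract_ok_paragraphs() {
    awk '
        # RS = "" enables paragraph mode; FS = "\n" makes every line a field.
        BEGIN { RS = ""; FS = "\n" }

        # Only look at records that contain a "Status: 200 OK" header line.
        /\n[ \t]+Status: 200 OK/ {
            url = ""; name = ""
            for (i = 1; i <= NF; i++) {
                if ($i ~ /^--/) {
                    # Request line: the URL is the last whitespace-separated word.
                    n = split($i, parts, " ")
                    url = parts[n]
                } else if ($i ~ /filename="/) {
                    # Header carrying the file name: keep only the quoted value.
                    name = $i
                    sub(/.*filename="/, "", name)
                    sub(/".*/, "", name)
                }
            }
            if (url != "") print url, name
        }
    ' "$1"
}
```

Because the whole record is available at once, the order of the headers inside a block does not matter here, at the cost of holding one record at a time in memory.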

Source: https://unix.stackexchange.com/questions/498942
