Awk
Extracting specific lines from a log file with grep and awk
I have a huge log file (20 million lines) that tells me whether a given URL responded with "200 OK".
I want to extract every URL whose status is "200 OK", together with the file name that comes with it.
Sample input:
Spider mode enabled. Check if remote file exists.
--2019-02-06 07:38:43-- https://www.example/download/123456789
Reusing existing connection to website.
HTTP request sent, awaiting response...
  HTTP/1.1 200 OK
  Content-Type: application/zip
  Connection: keep-alive
  Status: 200 OK
  Content-Disposition: attachment; filename="myfile123.zip"
  Last-Modified: 2019-02-06 01:38:44 +0100
  Access-Control-Allow-Origin: *
  Cache-Control: private
  X-Runtime: 0.312890
  X-Frame-Options: SAMEORIGIN
  Access-Control-Request-Method: GET,OPTIONS
  X-Request-Id: 99920e01-d308-40ba-9461-74405e7df4b3
  Date: Wed, 06 Feb 2019 00:38:44 GMT
  X-Powered-By: Phusion Passenger 5.1.11
  Server: nginx + Phusion Passenger 5.1.11
  X-Powered-By: cloud66
Length: unspecified [application/zip]
Last-modified header invalid -- time-stamp ignored.
Remote file exists.
Spider mode enabled. Check if remote file exists.
--2019-02-06 07:38:43-- https://www.example/download/234567890
Reusing existing connection to website.
HTTP request sent, awaiting response...
  HTTP/1.1 404 Not Found
  Content-Type: text/html; charset=utf-8
  Connection: keep-alive
  Status: 404 Not Found
  Cache-Control: no-cache
  Access-Control-Allow-Origin: *
  X-Runtime: 0.020718
  X-Frame-Options: SAMEORIGIN
  Access-Control-Request-Method: GET,OPTIONS
  X-Request-Id: bc20626b-095f-4b28-8322-ad3f294e4ee2
  Date: Wed, 06 Feb 2019 00:37:42 GMT
  X-Powered-By: Phusion Passenger 5.1.11
  Server: nginx + Phusion Passenger 5.1.11
Remote file does not exist -- broken link!!!
Desired output:
https://www.example/download/123456789 myfile123.zip
I would also really like to understand the logic behind the solution.
If I do:
awk '/: 200 OK/{print $0}' file.log
I get all the lines containing
Status: 200 OK
but without the surrounding context. If I do:
grep -C4 "1 200 OK" file.log
I get the context, but with a lot of "noise". I would like to rearrange the output so that the relevant information ends up on a single line.
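For comparison, a grep-only sketch (assuming the log layout shown in the sample above and the file name file.log used earlier) that keeps just the three interesting line types; it removes most of the noise, but the URL and the file name still sit on separate lines, and the URL lines of failed requests are still printed, which is why the answers below reach for awk:

grep -E '^--|Status: 200 OK|filename=' file.log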
You need to use awk as follows: first store the URL in a variable, then, when the Status line ends in OK, read the next line and take the file name from it. It is written for GNU awk, because the third argument of its match() function is used to store the captured group in an array.

awk '/^--/ { url = $NF }
/^[[:space:]]+Status/ && $NF == "OK" {
    getline nextline
    match(nextline, /filename="(.+)"/, arr)
    print url, arr[1]
}' file
# 1) pull the file name from the Content-Disposition line that follows "Status: 200 OK"
i=`awk '/Status: 200 OK/{x=NR+1}(NR<x){getline;print $NF}' filename | awk -F "=" '{print $NF}' | sed 's/"//g'`

# 2) print the 8-line window ending at the match, keep only the URL line (blanking the timestamp fields), and append the file name
awk '{a[++i]=$0}/Status: 200 OK/{for(x=NR-7;x<=NR;x++)print a[x]}' filename | awk -v i="$i" '/https:/{$1=$2="";print $0 " " i}'
Output
https://www.example/download/123456789 myfile123.zip