Awk

在 Perl 或 Python 中重寫日誌解析腳本(擺脫 awk)

  • June 12, 2022

我需要完成在日誌文件中過濾機器人活動的任務。

解決方案應僅顯示滿足以下條件的記錄(字元串)

  • 使用者登錄,使用者更改密碼,使用者在同一秒內註銷(所有 3 個操作必須在 1 秒內完成);
  • 這些操作(登錄、更改密碼、註銷)一個接一個地發生,中間沒有其他條目。

所有 3 個動作都在 1 秒內執行。

輸入數據範例

[a lot of data]
Mon, 22 Aug 2016 13:15:39 +0200|178.57.66.225|fxsciaqulmlk| - |user logged in| -
Mon, 22 Aug 2016 13:15:39 +0200|178.57.66.225|fxsciaqulmlk| - |user changed password| -
Mon, 22 Aug 2016 13:15:39 +0200|178.57.66.225|fxsciaqulmlk| - |user logged off| -
Mon, 22 Aug 2016 13:15:42 +0200|178.57.66.225|faaaaaa11111| - |user logged in| -
Mon, 22 Aug 2016 13:15:40 +0200|178.57.66.215|terdsfsdfsdf| - |user logged in| -
Mon, 22 Aug 2016 13:15:49 +0200|178.57.66.215|terdsfsdfsdf| - |user changed password| -
Mon, 22 Aug 2016 13:15:49 +0200|178.57.66.215|terdsfsdfsdf| - |user logged off| -
Mon, 22 Aug 2016 13:15:59 +0200|178.57.66.205|erdsfsdfsdf| - |user logged in| -
Mon, 22 Aug 2016 13:15:59 +0200|178.57.66.205|erdsfsdfsdf| - |user logged in| -
Mon, 22 Aug 2016 13:15:59 +0200|178.57.66.205|erdsfsdfsdf| - |user changed password| -
Mon, 22 Aug 2016 13:15:59 +0200|178.57.66.205|erdsfsdfsdf| - |user logged off| -
Mon, 22 Aug 2016 13:17:50 +0200|178.57.66.205|abcbbabab| - |user logged in| -
Mon, 22 Aug 2016 13:17:50 +0200|178.57.66.205|abcbbabab| - |user changed password| -
Mon, 22 Aug 2016 13:17:50 +0200|178.57.66.205|abcbbabab| - |user changed profile| -
Mon, 22 Aug 2016 13:17:50 +0200|178.57.66.205|abcbbabab| - |user logged off| -
Mon, 22 Aug 2016 13:19:19 +0200|178.56.66.225|fxsciaqulmla| - |user logged in| -
Mon, 22 Aug 2016 13:19:19 +0200|178.56.66.225|fxsciaqulmla| - |user changed password| -
Mon, 22 Aug 2016 13:19:19 +0200|178.56.66.225|fxsciaqulmla| - |user logged off| -
Mon, 22 Aug 2016 13:20:42 +0200|178.57.67.225|faaaa0a11111| - |user logged in| -
[a lot of data]

為了完成任務,我編寫了下面的程式碼

awk 'BEGIN { FS=" " } { c[$5]++; l[$5,c[$5]]=$0 } END { for (i in c) { if (c[i] > 2) for (j = 1; j <= c[i]; j++) print l[i,j] } }' $1 \
 | grep -E 'logged in|changed password|logged off'

用法:

./parse_log.sh 日誌文件.log

輸出:

Mon, 22 Aug 2016 13:19:19 +0200|178.56.66.225|fxsciaqulmla| - |user logged in| -
Mon, 22 Aug 2016 13:19:19 +0200|178.56.66.225|fxsciaqulmla| - |user changed password| -
Mon, 22 Aug 2016 13:19:19 +0200|178.56.66.225|fxsciaqulmla| - |user logged off| -
Mon, 22 Aug 2016 13:15:39 +0200|178.57.66.225|fxsciaqulmlk| - |user logged in| -
Mon, 22 Aug 2016 13:15:39 +0200|178.57.66.225|fxsciaqulmlk| - |user changed password| -
Mon, 22 Aug 2016 13:15:39 +0200|178.57.66.225|fxsciaqulmlk| - |user logged off| -
Mon, 22 Aug 2016 13:17:50 +0200|178.57.66.205|abcbbabab| - |user logged in| -
Mon, 22 Aug 2016 13:17:50 +0200|178.57.66.205|abcbbabab| - |user changed password| -
Mon, 22 Aug 2016 13:17:50 +0200|178.57.66.205|abcbbabab| - |user logged off| -
Mon, 22 Aug 2016 13:15:59 +0200|178.57.66.205|erdsfsdfsdf| - |user logged in| -
Mon, 22 Aug 2016 13:15:59 +0200|178.57.66.205|erdsfsdfsdf| - |user logged in| -
Mon, 22 Aug 2016 13:15:59 +0200|178.57.66.205|erdsfsdfsdf| - |user changed password| -
Mon, 22 Aug 2016 13:15:59 +0200|178.57.66.205|erdsfsdfsdf| - |user logged off| -

但我想知道用 Perl 或 Python 編寫的替代方案(使用最少的外部庫)會是什麼樣子?

在 Perl 中,這可以寫成單行,但感覺有點混亂:

perl -MTime::Piece -F'\|' -ae '$epoch=Time::Piece->strptime($F[0], "%a, %d %b %Y %H:%M:%S %z")->epoch; $diff=$epochlast2 - $epoch; $last =~ /user changed password/ && $last2 =~ /user logged in/ && $_ =~ /user logged off/ && $diff==0 && print $last2, $last, $_; $epochlast2=$epochlast; $epochlast=$epoch; $last2=$last; $last=$_' <<< "$data"
Mon, 22 Aug 2016 13:15:39 +0200|178.57.66.225|fxsciaqulmlk| - |user logged in| -
Mon, 22 Aug 2016 13:15:39 +0200|178.57.66.225|fxsciaqulmlk| - |user changed password| -
Mon, 22 Aug 2016 13:15:39 +0200|178.57.66.225|fxsciaqulmlk| - |user logged off| -
Mon, 22 Aug 2016 13:15:59 +0200|178.57.66.205|erdsfsdfsdf| - |user logged in| -
Mon, 22 Aug 2016 13:15:59 +0200|178.57.66.205|erdsfsdfsdf| - |user changed password| -
Mon, 22 Aug 2016 13:15:59 +0200|178.57.66.205|erdsfsdfsdf| - |user logged off| -
Mon, 22 Aug 2016 13:19:19 +0200|178.56.66.225|fxsciaqulmla| - |user logged in| -
Mon, 22 Aug 2016 13:19:19 +0200|178.56.66.225|fxsciaqulmla| - |user changed password| -
Mon, 22 Aug 2016 13:19:19 +0200|178.56.66.225|fxsciaqulmla| - |user logged off| -

或者,作為腳本:

use warnings;
use Time::Piece;

my $epochlast2=0;
my $epochlast=0;
my $last2="";
my $last="";

while($line = <STDIN>){
   $date=(split(/\|/, $line))[0];
   $epoch=Time::Piece->strptime($date, "%a, %d %b %Y %H:%M:%S %z")->epoch; 
   $diff=$epochlast2 - $epoch;
   if ($last =~ /user changed password/ && $last2 =~ /user logged in/ && $line =~ /user logged off/ && $diff==0) {
       print $last2, $last, $line; 
   }
   $epochlast2=$epochlast; 
   $epochlast=$epoch; 
   $last2=$last; 
   $last=$line
}

不是一個答案,但對於評論來說太大了,需要格式化 - 僅供參考,一個 awk 腳本可以執行我認為您的 python 腳本所做的事情:

awk -v column=5 '
   { records[$column] = records[$column] $0 ORS }
   END {
       for ( timestamp in records ) {
           if ( gsub(ORS,"&",records[timestamp]) > 2 ) {
               printf "%s", records[timestamp]
           }
       }
   }
' logfile.log

但是在處理之前將整個文件讀入記憶體是解決這個問題的一種非常低效的方法,你應該在每次時間改變時進行測試並列印:

awk -v column=5 '
   $column != prev {
       prt()
       records = ""
       prev = $column
   }
   { records = records $0 ORS }
   END { prt() }
   
   function prt() {
       if ( gsub(ORS,"&",records) > 2 ) {
           printf "%s", records
       }
   }
' logfile.log

引用自:https://unix.stackexchange.com/questions/705842