Matching multiple patterns and printing the results on a single line

October 21, 2013

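I need to match two patterns in a log file, grab the line that follows one of those two patterns, and finally print the three values on a single line.
Sample log file: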
2013/09/05 04:26:00          Processing Batch /fbc/dev/cebi/dod/9739867262
2013/09/05 04:26:02          Batch 9739867262 was successful
2013/09/05 04:26:02          Total Time          =  3.13 Secs
2013/09/05 04:26:02          Repository API Time =  2.96 Secs
2013/09/05 04:26:02          File System Io Time =  0.06 Secs
2013/09/05 04:26:02          Doc Validation Time =  0.03 Secs
2013/09/05 04:26:02      Ending @ Thu Sep 05 04:26:02 EDT 2013
2013/09/05 08:18:10      Starting @ Thu Sep 05 08:18:10 EDT 2013
2013/09/05 08:18:10      Starting @ Thu Sep 05 08:18:10 EDT 2013
2013/09/05 08:18:10          Processing Batch /fbc/dev/cebi/dod/9844867675
2013/09/05 08:18:10          Processing Batch /fbc/dev/cebi/dod/9886743777
2013/09/05 08:18:16          Batch 9844867675 was successful
2013/09/05 08:18:16          Total Time          =  6.00 Secs
2013/09/05 08:18:16          Repository API Time =  5.63 Secs
2013/09/05 08:18:16          File System Io Time =  0.05 Secs
2013/09/05 08:18:16          Doc Validation Time =  0.19 Secs
2013/09/05 08:18:16      Ending @ Thu Sep 05 08:18:16 EDT 2013
2013/09/05 08:18:18          Batch 9886743777 was successful
2013/09/05 08:18:18          Total Time          =  8.27 Secs
2013/09/05 08:18:18          Repository API Time =  8.52 Secs
2013/09/05 08:18:18          File System Io Time =  0.08 Secs
2013/09/05 08:18:18          Doc Validation Time =  0.47 Secs
2013/09/05 08:18:18      Ending @ Thu Sep 05 08:18:18 EDT 2013
I have the individual numbers in a file named cust_no.txt:
9739867262
9844867675
9886743777
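Taking these numbers as input, I need to match the following two patterns in the log file:
Processing Batch /fbc/dev/cebi/dod/<number>
Batch <number> was successful
The output I need:
-> On a match of the first pattern (i.e. Processing Batch /fbc/dev/cebi/dod/<number from cust_no.txt>), I need the second word, i.e. $2.
-> On a match of the second pattern (i.e. Batch <number from cust_no.txt> was successful), I need the second word, i.e. $2.
-> And the 6th word ($6) of the line right after the second pattern's match (i.e. the Total Time line).
Expected output: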
9739867262,04:26:00,04:26:02,3.13 Secs
9844867675,08:18:10,08:18:16,6.00 Secs
9886743777,08:18:10,08:18:18,8.27 Secs
To get this I tried the following, but it does not seem to work:
awk -v cn=$cust_no '{{if ($0 ~ "Processing.*" cn) st=$2 && if ($0 ~ "Customer cn was successful" et=$2; getline; tt=$4} ; print st,et,tt}

How about this:

while read number;do
   start=$(grep "Processing Batch /fbc/dev/cebi/dod/$number" log_file\
           |head -n 1|awk '{print $2}')
   end=$(grep -A 1 "Batch $number was successful" log_file\
           |head -n 2|tail -n 1|awk -v OFS=',' '{print $2,$6}')
   echo "$number,$start,$end Secs"
done <cust_no.txt
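
The loop above runs grep over log_file twice for every number, which may become slow on a large log. As a rough sketch of an alternative, the same three values can probably be collected in a single awk pass. The function name extract_batches and the array names are made up for illustration, and the field positions ($2, $4, $5, $6, $7) are assumptions taken from the sample log in the question.

```shell
# Sketch only: one awk pass over the log instead of two greps per number.
# Usage (file names as in the question): extract_batches cust_no.txt sample.log
extract_batches() {
    # $1 = file with one batch number per line, $2 = log file
    awk '
        # First file: remember the wanted numbers and their order.
        NR == FNR { if (NF) { want[$1] = 1; order[++n] = $1 } next }

        # Line right after a "was successful" match: keep the total time.
        pending != "" {
            if ($3 == "Total" && $4 == "Time") total[pending] = $6 " " $7
            pending = ""
        }

        # "Processing Batch <path>/<number>": keep the first start time seen.
        $3 == "Processing" && $4 == "Batch" {
            k = split($5, parts, "/"); pid = parts[k]
            if ((pid in want) && !(pid in start)) start[pid] = $2
        }

        # "Batch <number> was successful": keep the end time.
        $3 == "Batch" && $5 == "was" && $6 == "successful" && ($4 in want) {
            finish[$4] = $2
            pending = $4
        }

        # Print in cust_no.txt order, only for batches that finished.
        END {
            for (i = 1; i <= n; i++) {
                id = order[i]
                if (id in finish)
                    print id "," start[id] "," finish[id] "," total[id]
            }
        }
    ' "$1" "$2"
}
```

Run against the sample log and cust_no.txt from the question, this should print the three expected lines in the order the numbers appear in cust_no.txt. Note that the sketch assumes cust_no.txt is not empty, since the NR == FNR test cannot tell the two files apart otherwise.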

If you don't mind using Perl and grep, here is a solution to your problem. This is the script, called cmd.pl:

#!/usr/bin/env perl

use feature 'say';
#use Data::Dumper;

@file = `grep -f cust_no.txt -A 1 sample.log`;

my (%info, $secLineSeen, $time, $custno);

$secLineSeen = 0;
foreach my $line (@file) {
   if ($secLineSeen == 1) {
       #2013/09/05 08:18:18          Total Time          =  8.27 Secs
       (my $totTime) = ($line =~ m!\S+ \S+\s+Total Time\s+=\s+(\S+ Secs)!);
       $info{$custno}{totTime} = $totTime;
       $secLineSeen = 0;

   } elsif ($line =~ m/Processing Batch/) {
       #2013/09/05 08:18:10          Processing Batch /fbc/dev/cebi/dod/9844867675
       ($time, $custno) = ($line =~ m!\S+ (\S+)\s+Processing Batch.*/(\S+)!);
       $info{$custno}{onetwo} = $time;

   } elsif ($line =~ m/Batch.*successful/) {
       #2013/09/05 08:18:18          Batch 9886743777 was successful
       ($time, $custno) = ($line =~ m!\S+ (\S+)\s+Batch (\S+) was.*!);
       $info{$custno}{twotwo} = $time;
       $secLineSeen = 1;
   }
}

#print Dumper(\%info);

#9739867262,04:26:00,04:26:02,3.13 Secs
foreach my $key (sort keys %info) {
   say "$key,$info{$key}{onetwo},$info{$key}{twotwo},$info{$key}{totTime}";
}

Example

$ ./cmd.pl 
9739867262,04:26:00,04:26:02,3.13 Secs
9844867675,08:18:10,08:18:16,6.00 Secs
9886743777,08:18:10,08:18:18,8.27 Secs

Details

This Perl script first creates an array, @file, which contains the results of this command:

$ grep -f cust_no.txt -A 1 sample.log

This command takes the log file, sample.log, and selects every line that contains a customer number from the file cust_no.txt, like so:

2013/09/05 04:26:00          Processing Batch /fbc/dev/cebi/dod/9739867262
2013/09/05 04:26:02          Batch 9739867262 was successful
2013/09/05 04:26:02          Total Time          =  3.13 Secs
--
2013/09/05 08:18:10          Processing Batch /fbc/dev/cebi/dod/9844867675
2013/09/05 08:18:10          Processing Batch /fbc/dev/cebi/dod/9886743777
2013/09/05 08:18:16          Batch 9844867675 was successful
2013/09/05 08:18:16          Total Time          =  6.00 Secs
--
2013/09/05 08:18:18          Batch 9886743777 was successful
2013/09/05 08:18:18          Total Time          =  8.27 Secs

The one notable thing this grep command does is keep one line after every match (-A 1). That is what lets us capture the line containing "Total Time".
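
To see the effect of -A 1 in isolation, here is a small sketch that pipes a made-up six-line log into grep. The function name demo_context and the log lines are invented for illustration. Note that -A is an extension found in GNU and BSD grep, not part of POSIX grep.

```shell
# Sketch only: show what -A 1 adds to grep's output on a made-up log.
demo_context() {
    printf '%s\n' \
        'Batch 111 was successful' \
        'Total Time = 1.00 Secs' \
        'Ending' \
        'Starting' \
        'Batch 222 was successful' \
        'Total Time = 2.00 Secs' |
    grep -A 1 'was successful'   # each match plus the one line after it
}
```

With GNU grep, calling demo_context prints both "was successful" lines, each followed by its "Total Time" line, with a "--" line separating the two groups. The "Ending" and "Starting" lines are dropped. The Perl script never has to handle the "--" separators explicitly, because they match none of its patterns.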

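With this data extracted, the Perl script uses a multi-dimensional hash to store the key values from this output, following the requirements stated in the question.

Once we have finished processing @file, the hash looks like this: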

$VAR1 = {
         '9739867262' => {
                           'twotwo' => '04:26:02',
                           'totTime' => '3.13 Secs',
                           'onetwo' => '04:26:00'
                         },
         '9886743777' => {
                           'twotwo' => '08:18:18',
                           'totTime' => '8.27 Secs',
                           'onetwo' => '08:18:10'
                         },
         '9844867675' => {
                           'twotwo' => '08:18:16',
                           'totTime' => '6.00 Secs',
                           'onetwo' => '08:18:10'
                         }
       };

Finally, we iterate through this hash and print what we collected in the format specified in the question.

Source: https://unix.stackexchange.com/questions/96525
