Need help with a sed/awk shell script command for checking the Sybase error log
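I have been using the shell script posted here by Mr. Rob Verschoor, a great Sybase expert. The job is invoked every hour via cron, and it sends us an email if any of the predefined keywords are found in the error log. For easy reference, I am posting below the part of the code that may be causing the issue: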
    LAST_MARKER=$(${AWK} '/'$MARKER'/ { a=NR } END { print a }' $LOGFILE_COPY)
    LAST_MARKER=`echo "$LAST_MARKER+0"|bs`
    if [ ! "$LAST_MARKER" = "" ]
    then
        sed "1,${LAST_MARKER}d" $LOGFILE_COPY > $TMP.x
        cp $TMP.x $LOGFILE_COPY
    fi
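This has been working perfectly for the last 2 years without any issue, with only one line added by me after the first line, as follows:

    LAST_MARKER=`echo "$LAST_MARKER+0"|bs`

I added it to convert the line number, which was being returned in scientific notation, into a plain number.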
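Whether a plain `print` produces scientific notation for large line numbers depends on the awk implementation. One way to sidestep the extra conversion step is to have awk format the number itself with `printf "%d"`. The following is only a minimal sketch under that assumption; the function name `last_marker_line` and the throwaway sample file are hypothetical and not part of the original script.

```shell
#!/bin/sh
# Hypothetical helper (not part of the original script): print the line
# number of the last line matching a marker, always as a plain integer.
# $1 = marker text (awk treats it as a regular expression), $2 = file to scan
last_marker_line() {
    # printf "%d" forces integer output; prints 0 when nothing matched.
    awk -v m="$1" '$0 ~ m { a = NR } END { printf "%d\n", a }' "$2"
}

# Small demonstration on a throwaway file.
demo=$(mktemp) || exit 1
printf '%s\n' \
    'line one' \
    'server _Marker_For_Checking_Errorlog_' \
    'server _Marker_End_' > "$demo"
last_marker_line '_Marker_For_Checking_Errorlog_' "$demo"
rm -f "$demo"
```

A result of 0 means that no marker line was found, which the caller can test for before running sed.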
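For the last few days, after we disabled a monitoring tool that was filling the error log with trace messages almost every second, there seems to be a problem with finding the last marker. Basically, we used to have a large number of lines between the last marker and the new marker and never faced any issue. Now that the tool is disabled, there is no activity during non-business hours, so the last marker and the new marker end up on consecutive lines.

Earlier, the error log looked like this, with lots of messages: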
    00:0005:00000:00514:2020/04/17 10:15:59.92 server _Marker_For_Checking_Errorlog_
    00:0005:00000:00514:2020/04/17 10:15:59.92 server _Marker_End_
    ...
    0:0002:00000:00608:2020/04/16 11:12:40.88 server DBCC TRACEON 3604, SPID 608
    00:0002:00000:00608:2020/04/16 11:12:40.88 server DBCC TRACEOFF 3604, SPID 608
    00:0006:00000:00660:2020/04/16 11:13:40.47 server DBCC TRACEON 3604, SPID 660
    00:0006:00000:00660:2020/04/16 11:13:40.47 server DBCC TRACEOFF 3604, SPID 660
    00:0006:00000:00664:2020/04/16 11:13:40.51 server DBCC TRACEON 3604, SPID 664
    00:0006:00000:00664:2020/04/16 11:13:40.51 server DBCC TRACEOFF 3604, SPID 664
    00:0002:00000:00608:2020/04/16 11:13:40.54 server DBCC TRACEON 3604, SPID 608
    00:0002:00000:00608:2020/04/16 11:13:40.54 server DBCC TRACEOFF 3604, SPID 608
    00:0006:00000:00660:2020/04/16 11:13:40.87 server DBCC TRACEON 3604, SPID 660
    00:0006:00000:00660:2020/04/16 11:13:40.87 server DBCC TRACEOFF 3604, SPID 660
    00:0004:00000:00608:2020/04/16 11:14:40.92 server DBCC TRACEOFF 3604, SPID 608
    ...
    00:0005:00000:00514:2020/04/17 11:15:59.92 server _Marker_For_Checking_Errorlog_
    00:0005:00000:00514:2020/04/17 11:15:59.92 server _Marker_End_
Now the error log looks like this:
    00:0004:00000:00974:2020/04/17 09:15:28.80 server _Marker_For_Checking_Errorlog_
    00:0004:00000:00974:2020/04/17 09:15:38.80 server _Marker_End_
    00:0005:00000:00514:2020/04/17 10:15:59.92 server _Marker_For_Checking_Errorlog_
    00:0005:00000:00514:2020/04/17 10:15:59.92 server _Marker_End_
    00:0003:00000:00030:2020/04/17 11:16:01.51 server _Marker_For_Checking_Errorlog_
    00:0003:00000:00030:2020/04/17 11:16:01.51 server _Marker_End_
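The tool is unable to distinguish between the previous markers and the last marker, so it keeps resending errors that occurred 3-4 hours earlier, whereas it should send no error mail at all because nothing was written to the error log in the past hour.

I am not an expert in shell scripting, so any help with this would be greatly appreciated.

Edit: The correct behaviour of this tool is to send an email like the one below at 4:15 (the scheduled time), because predefined matching keywords were present in the last hour (between 3:15 and 4:15):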
    Checking ASE errorlog
    Fri Apr 17 04:16:06 WAT 2020
    Server=Sybaseprd
    Errorlog=/mount/ASE-15_0/install/Sybaseprd.log

    00:0006:00000:00061:2020/04/17 04:03:37.15 server Error: 1621, Severity: 18, State: 1
    00:0006:00000:00061:2020/04/17 04:03:37.15 server Type '16' not allowed before login.
    00:0004:00000:00668:2020/04/17 04:03:42.17 server Error: 1621, Severity: 18, State: 1
    00:0004:00000:00668:2020/04/17 04:03:42.17 server Type '16' not allowed before login.
    00:0004:00000:00100:2020/04/17 04:03:42.17 server Error: 1621, Severity: 18, State: 1
    00:0004:00000:00100:2020/04/17 04:03:42.17 server Type '16' not allowed before login.
    00:0012:00000:00000:2020/04/17 04:03:49.30 kernel ksmask__rpacket: Invalid tdslength value 21536, kpid: 268895208
    00:0003:00000:00932:2020/04/17 04:04:59.20 server Error: 1621, Severity: 18, State: 1
    00:0003:00000:00932:2020/04/17 04:04:59.20 server Type '3' not allowed before login.

    9 error lines found in errorlog for ASE server 'SybasePrd'

    (end)
The incorrect behaviour is as follows:
    Checking ASE errorlog
    Fri Apr 17 05:16:01 WAT 2020
    Server=SybasePrd
    Errorlog=/mount/ASE-15_0/install/Sybaseprd.log

    00:0006:00000:00061:2020/04/17 04:03:37.15 server Error: 1621, Severity: 18, State: 1
    00:0006:00000:00061:2020/04/17 04:03:37.15 server Type '16' not allowed before login.
    00:0004:00000:00668:2020/04/17 04:03:42.17 server Error: 1621, Severity: 18, State: 1
    00:0004:00000:00668:2020/04/17 04:03:42.17 server Type '16' not allowed before login.
    00:0004:00000:00100:2020/04/17 04:03:42.17 server Error: 1621, Severity: 18, State: 1
    00:0004:00000:00100:2020/04/17 04:03:42.17 server Type '16' not allowed before login.
    00:0012:00000:00000:2020/04/17 04:03:49.30 kernel ksmask__rpacket: Invalid tdslength value 21536, kpid: 268895208
    00:0003:00000:00932:2020/04/17 04:04:59.20 server Error: 1621, Severity: 18, State: 1
    00:0003:00000:00932:2020/04/17 04:04:59.20 server Type '3' not allowed before login.

    9 error lines found in errorlog for ASE server 'SybasePRD'

    (end)
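The above job was triggered at 5:15, and there were no matching lines between 4:15 and 5:15, so it should not have reported anything. As I mentioned earlier, the program kept sending this email for the next 5 scheduled runs, i.e. until 10:15, and stopped only once the number of entries written to the error log after the above errors exceeded about 40.

So the desired outcome is to find the bug in the above shell script and fix it so that it checks exactly the past hour, i.e. from the last marker to the last line of the error log. If there are no entries, meaning no lines were added since the last check, it should check nothing and report nothing, as in the case below: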
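The desired behaviour can be sketched as follows: keep only the lines after the last marker, drop the marker lines themselves, and report only when something is left. This is a minimal illustration rather than a patch to the original script. The function name `lines_since_last_marker` and the sample file are made up for the example, and the marker texts are matched as fixed substrings.

```shell
#!/bin/sh
# Hypothetical helper: print the lines that follow the last line containing
# the marker text, skipping the marker lines themselves.
# $1 = start marker text, $2 = end marker text, $3 = log file
lines_since_last_marker() {
    # Pass 1 remembers the last marker line; pass 2 prints what follows it.
    awk -v m="$1" -v m2="$2" '
        NR == FNR { if (index($0, m)) last = NR; next }
        FNR > last && !index($0, m) && !index($0, m2)
    ' "$3" "$3"
}

# Sample log: one old error followed by two back-to-back marker pairs.
log=$(mktemp) || exit 1
cat > "$log" <<'EOF'
00:0003:00000:00932:2020/04/17 04:04:59.20 server Error: 1621, Severity: 18, State: 1
00:0005:00000:00514:2020/04/17 10:15:59.92 server _Marker_For_Checking_Errorlog_
00:0005:00000:00514:2020/04/17 10:15:59.92 server _Marker_End_
00:0003:00000:00030:2020/04/17 11:16:01.51 server _Marker_For_Checking_Errorlog_
00:0003:00000:00030:2020/04/17 11:16:01.51 server _Marker_End_
EOF

new_lines=$(lines_since_last_marker '_Marker_For_Checking_Errorlog_' '_Marker_End_' "$log")
if [ -z "$new_lines" ]; then
    echo "nothing new since the last check; no report needed"
else
    printf '%s\n' "$new_lines"
fi
rm -f "$log"
```

With this sample the old error sits before the last marker, so nothing is left to report.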
    00:0004:00000:00974:2020/04/17 09:15:28.80 server _Marker_For_Checking_Errorlog_
    00:0004:00000:00974:2020/04/17 09:15:38.80 server _Marker_End_
    00:0005:00000:00514:2020/04/17 10:15:59.92 server _Marker_For_Checking_Errorlog_
    00:0005:00000:00514:2020/04/17 10:15:59.92 server _Marker_End_
    00:0003:00000:00030:2020/04/17 11:16:01.51 server _Marker_For_Checking_Errorlog_
    00:0003:00000:00030:2020/04/17 11:16:01.51 server _Marker_End_
We're somewhat stalled, so let's see if we can get the ball rolling again. Assuming the code you posted:
    LAST_MARKER=$(${AWK} '/'$MARKER'/ { a=NR } END { print a }' $LOGFILE_COPY)
    LAST_MARKER=`echo "$LAST_MARKER+0"|bs`
    if [ ! "$LAST_MARKER" = "" ]
    then
        sed "1,${LAST_MARKER}d" $LOGFILE_COPY > $TMP.x
        cp $TMP.x $LOGFILE_COPY
    fi
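is intended to delete all lines up to and including the last line that contains the text $MARKER, if it exists, from $LOGFILE_COPY, then this is how you would do that if you have tac:

    tac "$LOGFILE_COPY" |
    awk -v m="$MARKER" '$0~m{exit} 1' |
    tac > "${TMP}.x" &&
    mv "${TMP}.x" "$LOGFILE_COPY"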
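If you don't have tac, then the following 2-pass awk-only solution will run a bit slower and won't work for input coming from a pipe, but it will work for an input file of any size, whereas the tac solution above might fail if the input file is absolutely massive:

    awk -v m="$MARKER" 'NR==FNR{if ($0~m) a=NR; next} FNR>a' "$LOGFILE_COPY" "$LOGFILE_COPY" > "${TMP}.x" &&
    mv "${TMP}.x" "$LOGFILE_COPY"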
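To check that the tac pipeline and the 2-pass awk command agree, both can be run against the same small sample file. The sketch below does that. The sample contents and the function names `after_marker_tac` and `after_marker_awk` are invented for the illustration, and the comparison is skipped when tac is not installed.

```shell
#!/bin/sh
# Print the lines after the last line matching marker $1 in file $2,
# using the tac pipeline.
after_marker_tac() {
    tac "$2" | awk -v m="$1" '$0 ~ m { exit } 1' | tac
}

# Same result using two passes over the file and no tac.
after_marker_awk() {
    awk -v m="$1" 'NR == FNR { if ($0 ~ m) a = NR; next } FNR > a' "$2" "$2"
}

# Made-up sample: an old error, then two marker pairs with one line between.
sample=$(mktemp) || exit 1
printf '%s\n' \
    'server Error: 1621, Severity: 18, State: 1' \
    'server _Marker_For_Checking_Errorlog_' \
    'server _Marker_End_' \
    'server DBCC TRACEON 3604, SPID 608' \
    'server _Marker_For_Checking_Errorlog_' \
    'server _Marker_End_' > "$sample"

marker='_Marker_For_Checking_Errorlog_'
after_marker_awk "$marker" "$sample"
if command -v tac >/dev/null 2>&1; then
    [ "$(after_marker_tac "$marker" "$sample")" = "$(after_marker_awk "$marker" "$sample")" ] &&
        echo "tac and two-pass awk agree"
fi
rm -f "$sample"
```

For this sample, both variants leave only the `_Marker_End_` line that follows the last marker, which the later grep in the script filters out anyway.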
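If that is too slow (I'd be surprised if it were), then this might be a bit faster (and it will certainly be faster than the script you started with):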
    start=$(awk -v m="$MARKER" '$0~m{a=NR} END{printf "%d\n", a+1; exit (a?0:1)}' "$LOGFILE_COPY") &&
    tail -n +"$start" "$LOGFILE_COPY" > "${TMP}.x" &&
    mv "${TMP}.x" "$LOGFILE_COPY"
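One detail of this variant is the `exit (a?0:1)`: awk returns a non-zero status when no marker was found, so the `&&` chain stops and the file is left as it was. The following is a small sketch of that behaviour, using a hypothetical wrapper function `trim_to_last_marker` and throwaway files that are not part of the original script.

```shell
#!/bin/sh
# Hypothetical wrapper around the tail-based command: trim file $2 so that
# only the lines after the last line matching marker $1 remain.
# Returns non-zero, leaving the file untouched, when the marker is absent.
trim_to_last_marker() {
    start=$(awk -v m="$1" '$0 ~ m { a = NR } END { printf "%d\n", a + 1; exit (a ? 0 : 1) }' "$2") &&
        tail -n +"$start" "$2" > "$2.tmp" &&
        mv "$2.tmp" "$2"
}

dir=$(mktemp -d) || exit 1
printf '%s\n' \
    'old error' \
    'server _Marker_For_Checking_Errorlog_' \
    'server _Marker_End_' > "$dir/with_marker"
printf '%s\n' 'no marker here' > "$dir/without_marker"

# Marker present: the file is trimmed and its remaining content is shown.
trim_to_last_marker '_Marker_For_Checking_Errorlog_' "$dir/with_marker" &&
    cat "$dir/with_marker"

# Marker absent: awk exits non-zero, so tail and mv never run.
trim_to_last_marker '_Marker_For_Checking_Errorlog_' "$dir/without_marker" ||
    echo "no marker found, file left unchanged"
rm -rf "$dir"
```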
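Does that solve your problem?

Also: here is a start on how to modify the original script to fix the most basic problems in it and make it easier to read: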
    #!/bin/sh

    this_prog=$(basename "$0")

    usage() {
        echo "Usage:"
        echo "   $this_prog <servername> <login> <passwd> [<errorlog-pathname> [\"all\"]]"
    }

    #---------------------------------------------------------------------------
    # Check parameters

    if [ $# -lt 3 ] || [ $# -gt 5 ]
    then
        usage
        exit 1
    fi

    srv=$1
    login=$2
    psswd=$3
    logfile=$4
    opt=$5

    #---------------------------------------------------------------------------
    # Temp directory

    tmp=$(mktemp -d) || exit 1
    trap 'rm -f "$tmp"/*; rmdir "$tmp"; exit' 0
    logfile_copy="${tmp}/errlog"

    #---------------------------------------------------------------------------
    # Some constants; do NOT change these !

    dft_mailprog="your_mail_program"    #DO NOT CHANGE -- go to the next section
    dft_dba_mail="you@yourcompany.com yourcollege@yourcompany.com"   #DO NOT CHANGE
                                        # -- go to the next section

    #---------------------------------------------------------------------------
    # Some definitions
    #
    # mailprog must be set to your command-line mail program, like 'mail', 'mailx',
    # etc. Later in this script, it is assumed that this mail program supports
    # specifying the mail subject on the command line with the "-s" option.
    # Should you use 'sendmail', you'll have to modify the script, or do without
    # the mail subject, as 'sendmail' does not have this "-s" option.
    # NT users may want to use 'ssmtp' (part of CygWin) as their mail
    # program (also see comment below).

    mailprog="$dft_mailprog"    # define your own setting here

    # Define a list of people receiving results by email:
    dba_mail="$dft_dba_mail"    # define your own setting here

    skip_when_empty=NO    # if YES, will not send mail when no errors were found

    #---------------------------------------------------------------------------
    # The marker strings below can be set to any arbitrary string, as long
    # as this is unique and does not appear in the errorlog as part of any
    # error message.
    # These strings should not be changed anymore once you've started using
    # this script.

    marker="_Marker_For_Checking_Errorlog_"    #do not change this !
    marker2="_Marker_End_"                     #do not change this !

    #--------------------------------------------------------------------------
    # Change the below to 'gawk' (or 'nawk') if desired... This may be needed
    # when hitting built-in max. string length limits in 'awk'. 'gawk' etc.
    # tend to be more flexible.

    AWK='awk'    # awk|gawk

    #---------------------------------------------------------------------
    # Check the mail program and email addresses have been defined

    if [ "$mailprog" = "$dft_mailprog" ]
    then
        echo ""
        echo "You must first define the variable 'mailprog' in this script;"
        echo "please set it to the name of your command-line mail program,"
        echo "like 'mail', 'mailx', etc."
        echo ""
        exit 1
    fi

    if [ "$dba_mail" = "$dft_dba_mail" ]
    then
        echo ""
        echo "You must first define the variable 'dba_mail' in this script;"
        echo "please set it to a list of recipients."
        echo ""
        exit 1
    fi

    #--------------------------------------------------------------------------
    # First locate the server errorlog

    rm -f "$logfile_copy"

    if [ "$logfile" = "" ]
    then
        # Pick up the server errorlog pathname; first check if this is 12.0
        # or later to determine the method for doing this
        #
        cat << --EOF-- > "${tmp}/vchk.sql"
    select name from sysobjects    -- used for ASE version check
    where name = "sysqueryplans"
    go
    dbcc traceon(3604)
    go
    dbcc resource      -- contains errorlog pathname
    go
    --EOF--

        # The below isql session also doubles as an ASE access and
        # privilege check.
        # The SQL is fed to isql on stdin to make it run on
        # Windows NT as well ('cos the NT version of 'isql' won't understand
        # Unix-style pathnames)
        #
        < "${tmp}/vchk.sql" isql -S"$srv" -U"$login" -P"$psswd" -w500 > "${tmp}/vchk"

        if grep -q "CT-LIBRARY error" "${tmp}/vchk"
        then
            cat "${tmp}/vchk"
            echo ""
            echo "*** Note: in case you cannot connect because the ASE server is down,"
            echo "*** you can also specify the errorlog pathname explicitly."
            echo ""
            usage
            exit 1
        fi

        if grep "You must have the following role(s) to" "${tmp}/vchk"
        then
            exit 1
        fi

        # 18-Sep-2001 Corrected the test below: it said "-ne 1" instead of "-eq 1",
        #             causing it to not to identify version pre-12.0 correctly
        #             (thanks to Jean Loesch)
        #
        if [ "$(grep -c "sysqueryplans" "${tmp}/vchk")" -eq 1 ]
        then
            #--------------------------------------------------------------------------
            # This is ASE 12.0+, so locate the errorlog through @@errorlog (this isn't
            # really necessary, as dbcc resource would still work fine), but let's do
            # it anyway for educational purposes ...
            cat << --EOF-- > "${tmp}/ataterrlog.sql"
    print @@errorlog
    go
    --EOF--

            < "${tmp}/ataterrlog.sql" isql -S"$srv" -U"$login" -P"$psswd" > "${tmp}/ataterrlog"
            logfile=$( "$AWK" '{print $1}' "${tmp}/ataterrlog" )
            #--------------------------------------------------------------------------
        else    # not 12.0+
            # This is ASE pre-12.0, so locate the errorlog through dbcc resource (already
            # executed above); its output was saved in vchk
            logfile=$( "$AWK" 'sub(/.*rerrfile=/,""){print $1}' "${tmp}/vchk" )
        fi
    fi    # if $logfile = ""

    #--------------------------------------------------------------------------
    # Errorlog file name known now, check if it's there

    if [ ! -f "$logfile" ]
    then
        echo "Error accessing server errorlog file [$logfile] - file not found"
        echo "Note: this script must be run on the same host where the "
        echo "ASE errorlog file is located."
        exit 1
    fi

    cp "$logfile" "$logfile_copy"

    #--------------------------------------------------------------------------
    # Check option parameter
    #
    if [ "$opt" = "" ]
    then
        scan_all=N
    else
        scan_all=Y
        echo "Scanning the entire ASE errorlog."
    fi

    #--------------------------------------------------------------------------

    if [ "$scan_all" = "N" ]
    then
        # Skip the part of the errorlog until the last marker
        # Note: if the next line gives an error message, use a different shell
        last_marker=$("$AWK" -v marker="$marker" '$0 ~ marker { a=NR } END { print a+0 }' "$logfile_copy")
        # awk prints 0 when no marker was found; only trim when there is one
        if [ "$last_marker" -gt 0 ]
        then
            sed "1,${last_marker}d" "$logfile_copy" > "${tmp}/x" &&
            cp "${tmp}/x" "$logfile_copy"
        fi
    fi

    #--------------------------------------------------------------------------
    # Create output file

    {
        echo "Checking ASE errorlog"
        date
        echo "Server=$srv"
        echo "Errorlog=$logfile"
        echo ""
    } > "${tmp}/out"

    #--------------------------------------------------------------------------
    # Finally... search for errors in the log file. The below set of search
    # strings catches pretty much everything, but you can add any string here
    # which you would also like to search for...
    #
    # Note that these strings indicate the presence of messages that should
    # be investigated. Still, this may require further inspection of the
    # errorlog, as more messages may be present which contain additional
    # information.

    grep -Ei '(warning|severity|fail|unmirror|mirror exit|not enough|error|suspect|corrupt|correct|deadlock|critical|allow|infect|error|full|problem|unable|not found|threshold|couldn|not valid|invalid|NO_LOG|logsegment|syslogs|stacktrace)' "$logfile_copy" |
    grep -Evi '(successfull|_Marker_|(Suspect Granularity))' > "${tmp}/out2"

    nrlines=$(wc -l "${tmp}/out2" | "$AWK" '{print $1}')

    cat "${tmp}/out2" >> "${tmp}/out"

    #--------------------------------------------------------------------------
    #
    echo "$nrlines error lines found in errorlog for ASE server '$srv'"
    {
        echo ""
        echo "$nrlines error lines found in errorlog for ASE server '$srv'"
        echo ""
        echo "(end)"
    } >> "${tmp}/out"

    if [ "$skip_when_empty" = "NO" ] && [ "$nrlines" -eq 0 ]
    then
        nrlines=1    # to force it into mailing anyway
    fi

    if [ "$nrlines" -gt 0 ]
    then
        # Mail any error messages found to the list of recipients
        # (note: assumption is that the -s "subject" option is available for
        # your email program. Should you use "sendmail", it may not be
        # available, and you'd have to remove this option; when you're familiar
        # with 'sendmail', you can add the subject line yourself by inserting
        # header lines into the message file)
        #
        # Note for NT users: if you need a command-line mail program on NT,
        # consider 'ssmtp'. This is part of the CygWin package, which you need
        # anyway to run this script on NT. The download location for CygWin
        # is in the file header above.

        subj="Results of ASE errorlog check for '$srv'"
        "$mailprog" -s "$subj" "$dba_mail" < "${tmp}/out"
    fi

    #--------------------------------------------------------------------------

    if [ "$scan_all" = "N" ]
    then
        # Write a new marker to the server errorlog to indicate we got till here
        # Only do this when (i) no explicit errorlog pathname was specified and
        # (ii) only the last part of the log was scanned.

        cat << --EOF-- > "${tmp}/logprint.sql"
    dbcc logprint ("$marker")
    dbcc logprint ("$marker2")  -- need a second line to avoid missing the last line
    if @@error = 0 print "Writing marker to ASE errorlog."
    -- note: in ASE 12.0, we could use the more tidy "dbcc printolog(string)" instead
    go
    --EOF--

        < "${tmp}/logprint.sql" isql -S"$srv" -U"$login" -P"$psswd" |
        grep -Ev '(DBCC execution compl|(SA))'
    fi

    #--------------------------------------------------------------------------
    # end
    #
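There are additional improvements that could be made, and it's untested so it may contain bugs, but hopefully you can compare it against the original to see in what ways the original should be changed.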