Need help with a sed/awk shell script to check the Sybase error log

  • April 19, 2020

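I have been using a shell script posted here by Mr. Rob Verschoor, a great Sybase expert. It is called hourly from a cron job and sends us an email if anything in the error log matches one of the predefined keywords. For ease of reference, I am posting below the part of the code that is probably causing the problem: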

LAST_MARKER=$(${AWK} '/'$MARKER'/ { a=NR } END { print a }' $LOGFILE_COPY)
 LAST_MARKER=`echo "$LAST_MARKER+0"|bs`
if [ ! "$LAST_MARKER" = "" ]
  then
     sed "1,${LAST_MARKER}d" $LOGFILE_COPY > $TMP.x
     cp $TMP.x $LOGFILE_COPY
  fi

This has worked perfectly for the past 2 years without any issue, with just one line added on my side after line 1, as follows:

LAST_MARKER=`echo "$LAST_MARKER+0"|bs`

This converts the line count, which was being returned in scientific notation, into a plain number.

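A few days ago we disabled a monitoring tool that was filling the error log with trace messages almost every second, and since then there seems to be a problem finding the last marker. Basically, there used to be many lines between the last marker and the new one, and we never hit any issue. Now that the tool is disabled, there is no activity during non-working hours, so the last marker and the new marker end up on consecutive lines.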

Earlier, the error log looked like the following, with lots of messages:

00:0005:00000:00514:2020/04/17 10:15:59.92 server  _Marker_For_Checking_Errorlog_
00:0005:00000:00514:2020/04/17 10:15:59.92 server  _Marker_End_
...
00:0002:00000:00608:2020/04/16 11:12:40.88 server  DBCC TRACEON 3604, SPID 608
00:0002:00000:00608:2020/04/16 11:12:40.88 server  DBCC TRACEOFF 3604, SPID 608
00:0006:00000:00660:2020/04/16 11:13:40.47 server  DBCC TRACEON 3604, SPID 660
00:0006:00000:00660:2020/04/16 11:13:40.47 server  DBCC TRACEOFF 3604, SPID 660
00:0006:00000:00664:2020/04/16 11:13:40.51 server  DBCC TRACEON 3604, SPID 664
00:0006:00000:00664:2020/04/16 11:13:40.51 server  DBCC TRACEOFF 3604, SPID 664
00:0002:00000:00608:2020/04/16 11:13:40.54 server  DBCC TRACEON 3604, SPID 608
00:0002:00000:00608:2020/04/16 11:13:40.54 server  DBCC TRACEOFF 3604, SPID 608
00:0006:00000:00660:2020/04/16 11:13:40.87 server  DBCC TRACEON 3604, SPID 660
00:0006:00000:00660:2020/04/16 11:13:40.87 server  DBCC TRACEOFF 3604, SPID 660
00:0004:00000:00608:2020/04/16 11:14:40.92 server  DBCC TRACEOFF 3604, SPID 608
...
00:0005:00000:00514:2020/04/17 11:15:59.92 server  _Marker_For_Checking_Errorlog_
00:0005:00000:00514:2020/04/17 11:15:59.92 server  _Marker_End_

Now the error log looks like this:

   00:0004:00000:00974:2020/04/17 09:15:28.80 server  _Marker_For_Checking_Errorlog_
   00:0004:00000:00974:2020/04/17 09:15:38.80 server  _Marker_End_
   00:0005:00000:00514:2020/04/17 10:15:59.92 server  _Marker_For_Checking_Errorlog_
   00:0005:00000:00514:2020/04/17 10:15:59.92 server  _Marker_End_
   00:0003:00000:00030:2020/04/17 11:16:01.51 server  _Marker_For_Checking_Errorlog_
   00:0003:00000:00030:2020/04/17 11:16:01.51 server  _Marker_End_

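The tool fails to distinguish the previous markers from the last one, so it keeps re-sending errors that occurred 3-4 hours earlier, when it should not send any error mail at all because nothing was written to the error log in the past hour.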

I am not an expert in shell scripting, so any help with this would be highly appreciated.

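Edit: The correct behavior of this tool is to send an email like the following at 4:15 (the scheduled time), because the predefined keywords were matched in the last hour (between 3:15 and 4:15):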

Checking ASE errorlog
Fri Apr 17 04:16:06 WAT 2020
Server=Sybaseprd
Errorlog=/mount/ASE-15_0/install/Sybaseprd.log
00:0006:00000:00061:2020/04/17 04:03:37.15 server  Error: 1621, Severity: 18, State: 1
00:0006:00000:00061:2020/04/17 04:03:37.15 server  Type '16' not allowed before login.
00:0004:00000:00668:2020/04/17 04:03:42.17 server  Error: 1621, Severity: 18, State: 1
00:0004:00000:00668:2020/04/17 04:03:42.17 server  Type '16' not allowed before login.
00:0004:00000:00100:2020/04/17 04:03:42.17 server  Error: 1621, Severity: 18, State: 1
00:0004:00000:00100:2020/04/17 04:03:42.17 server  Type '16' not allowed before login.
00:0012:00000:00000:2020/04/17 04:03:49.30 kernel  ksmask__rpacket: Invalid tdslength value 21536, kpid: 268895208
00:0003:00000:00932:2020/04/17 04:04:59.20 server  Error: 1621, Severity: 18, State: 1
00:0003:00000:00932:2020/04/17 04:04:59.20 server  Type '3' not allowed before login.
9 error lines found in errorlog for ASE server 'SybasePrd'
(end)

The incorrect behavior is as follows:

Checking ASE errorlog
Fri Apr 17 05:16:01 WAT 2020
Server=SybasePrd
Errorlog=/mount/ASE-15_0/install/Sybaseprd.log
00:0006:00000:00061:2020/04/17 04:03:37.15 server  Error: 1621, Severity: 18, State: 1
00:0006:00000:00061:2020/04/17 04:03:37.15 server  Type '16' not allowed before login.
00:0004:00000:00668:2020/04/17 04:03:42.17 server  Error: 1621, Severity: 18, State: 1
00:0004:00000:00668:2020/04/17 04:03:42.17 server  Type '16' not allowed before login.
00:0004:00000:00100:2020/04/17 04:03:42.17 server  Error: 1621, Severity: 18, State: 1
00:0004:00000:00100:2020/04/17 04:03:42.17 server  Type '16' not allowed before login.
00:0012:00000:00000:2020/04/17 04:03:49.30 kernel  ksmask__rpacket: Invalid tdslength value 21536, kpid: 268895208
00:0003:00000:00932:2020/04/17 04:04:59.20 server  Error: 1621, Severity: 18, State: 1
00:0003:00000:00932:2020/04/17 04:04:59.20 server  Type '3' not allowed before login.
9 error lines found in errorlog for ASE server 'SybasePRD'
(end)

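The job above was triggered at 5:15, and there were no matching lines between 4:15 and 5:15, so it should not have reported anything. As I mentioned earlier, the program kept sending this email for the next 5 scheduled runs, i.e. until 10:15, and only stopped once the number of entries in the error log after the above errors exceeded 40 or so.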

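So the desired outcome is to find the bug in the shell script above and fix it so that it checks exactly the past hour, i.e. from the last marker to the last line of the error log. If there are no entries, meaning no lines were added since the last check, then it should check nothing and report nothing, as in the case below: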

00:0004:00000:00974:2020/04/17 09:15:28.80 server  _Marker_For_Checking_Errorlog_
   00:0004:00000:00974:2020/04/17 09:15:38.80 server  _Marker_End_
   00:0005:00000:00514:2020/04/17 10:15:59.92 server  _Marker_For_Checking_Errorlog_
   00:0005:00000:00514:2020/04/17 10:15:59.92 server  _Marker_End_
   00:0003:00000:00030:2020/04/17 11:16:01.51 server  _Marker_For_Checking_Errorlog_
   00:0003:00000:00030:2020/04/17 11:16:01.51 server  _Marker_End_

We've stalled a bit, so let's see if we can get the ball rolling again. Assuming the code you posted:

LAST_MARKER=$(${AWK} '/'$MARKER'/ { a=NR } END { print a }' $LOGFILE_COPY)
 LAST_MARKER=`echo "$LAST_MARKER+0"|bs`
if [ ! "$LAST_MARKER" = "" ]
  then
     sed "1,${LAST_MARKER}d" $LOGFILE_COPY > $TMP.x
     cp $TMP.x $LOGFILE_COPY
  fi

is intended to delete all lines up to and including the last line that contains the text $MARKER, if it exists, from $LOGFILE_COPY, then this will do it if you have tac:

tac "$LOGFILE_COPY" | awk -v m="$MARKER" '$0~m{exit} 1' | tac > "${TMP}.x" &&
mv "${TMP}.x" "$LOGFILE_COPY"
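
As a quick check of what this pipeline keeps, here is a small sketch on made-up data. The file contents and the tac_result variable are hypothetical; only the marker text and the pipeline itself come from the posts above. It assumes tac (GNU coreutils) is available.

```shell
#!/bin/sh
# Hypothetical demo data; the marker string is the one used in the question.
MARKER='_Marker_For_Checking_Errorlog_'
demo_log=$(mktemp) || exit 1

cat > "$demo_log" << 'EOF'
old error line
server _Marker_For_Checking_Errorlog_
server _Marker_End_
Error: 1621, Severity: 18, State: 1
server _Marker_For_Checking_Errorlog_
server _Marker_End_
new line after the last marker
EOF

# Reverse the file, print lines until the first marker is seen (which is
# the last marker in the original order), then reverse again.
tac_result=$(tac "$demo_log" | awk -v m="$MARKER" '$0~m{exit} 1' | tac)
printf '%s\n' "$tac_result"
rm -f "$demo_log"
```

Only the two lines that follow the final marker are printed, so the older error between the two marker pairs is dropped.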

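If you don't have tac, then the following 2-pass awk-only solution will run a bit slower and won't work for input coming from a pipe, but it works for input files of any size, whereas the tac solution above might fail if the input file is absolutely massive: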

awk -v m="$MARKER" 'NR==FNR{if ($0~m) a=NR; next} FNR>a' "$LOGFILE_COPY" "$LOGFILE_COPY" > "${TMP}.x" &&
mv "${TMP}.x" "$LOGFILE_COPY"
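
To see how the two-pass version behaves both with and without a marker, here is a sketch on made-up data. The strip_two_pass helper, the variable names and the file contents are hypothetical; the awk program is the one shown above.

```shell
#!/bin/sh
MARKER='_Marker_For_Checking_Errorlog_'

# Hypothetical helper wrapping the two-pass awk command from above.
# $1 = marker text (treated as a regex by awk), $2 = log file
strip_two_pass() {
    awk -v m="$1" 'NR==FNR{if ($0~m) a=NR; next} FNR>a' "$2" "$2"
}

with_marker=$(mktemp) || exit 1
without_marker=$(mktemp) || exit 1
printf '%s\n' 'old line' "x $MARKER" 'x _Marker_End_' 'new line' > "$with_marker"
printf '%s\n' 'line one' 'line two' > "$without_marker"

# Pass 1 records the last marker line number in a; pass 2 prints the
# lines beyond it.
kept_with=$(strip_two_pass "$MARKER" "$with_marker")
# With no marker, a stays unset (numerically 0), so every line is kept.
kept_without=$(strip_two_pass "$MARKER" "$without_marker")

printf 'with marker:\n%s\n' "$kept_with"
printf 'without marker:\n%s\n' "$kept_without"
rm -f "$with_marker" "$without_marker"
```

The file is named twice on the command line because awk has to read it once to find the last marker and once more to print, which is also why this form cannot read from a pipe.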

If that's too slow (I'd be surprised if it were), this may be a bit faster (it will certainly be faster than the script you started with):

start=$(awk -v m="$MARKER" '$0~m{a=NR} END{printf "%d\n", a+1; exit (a?0:1)}' "$LOGFILE_COPY") &&
tail -n +"$start" "$LOGFILE_COPY" > "${TMP}.x" &&
mv "${TMP}.x" "$LOGFILE_COPY"
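
The exit status matters in this variant: awk exits non-zero when no marker was seen, so the rest of the && chain is skipped and the log copy is left alone. A sketch of that behavior on made-up data follows; the lines_after_last_marker helper and the variable names are hypothetical.

```shell
#!/bin/sh
MARKER='_Marker_For_Checking_Errorlog_'

# Hypothetical helper around the awk + tail commands from above.
# Prints the lines after the last marker; prints nothing and returns
# non-zero when the file contains no marker.
lines_after_last_marker() {
    start=$(awk -v m="$1" '$0~m{a=NR} END{printf "%d\n", a+1; exit (a?0:1)}' "$2") &&
    tail -n +"$start" "$2"
}

log=$(mktemp) || exit 1

printf '%s\n' 'old line' "x $MARKER" 'x _Marker_End_' 'new line' > "$log"
found_output=$(lines_after_last_marker "$MARKER" "$log")
found_status=$?

printf '%s\n' 'line one' 'line two' > "$log"
missing_output=$(lines_after_last_marker "$MARKER" "$log")
missing_status=$?

echo "marker present: status=$found_status"
printf '%s\n' "$found_output"
echo "marker absent: status=$missing_status"
rm -f "$log"
```

The explicit printf with %d requests an integer line number rather than relying on awk's default number formatting.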

Does that solve your problem?
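
One way to check is to run the stripping step against a log shaped like the one in the question, with consecutive marker pairs and an older error in front of them. The sketch below copies sample lines from the question; the keyword list is a shortened, hypothetical stand-in for the full one, and the variable names are made up.

```shell
#!/bin/sh
MARKER='_Marker_For_Checking_Errorlog_'
log=$(mktemp) || exit 1

# Sample lines copied from the question: one old error followed by
# three marker pairs with nothing logged in between.
cat > "$log" << 'EOF'
00:0006:00000:00061:2020/04/17 04:03:37.15 server  Error: 1621, Severity: 18, State: 1
00:0004:00000:00974:2020/04/17 09:15:28.80 server  _Marker_For_Checking_Errorlog_
00:0004:00000:00974:2020/04/17 09:15:38.80 server  _Marker_End_
00:0005:00000:00514:2020/04/17 10:15:59.92 server  _Marker_For_Checking_Errorlog_
00:0005:00000:00514:2020/04/17 10:15:59.92 server  _Marker_End_
00:0003:00000:00030:2020/04/17 11:16:01.51 server  _Marker_For_Checking_Errorlog_
00:0003:00000:00030:2020/04/17 11:16:01.51 server  _Marker_End_
EOF

# Keep only what follows the last marker (two-pass awk variant).
remaining=$(awk -v m="$MARKER" 'NR==FNR{if ($0~m) a=NR; next} FNR>a' "$log" "$log")

# Shortened, hypothetical keyword list; marker lines are filtered out
# before counting.
reported=$(printf '%s\n' "$remaining" |
    grep -Ei 'error|severity|invalid' |
    grep -vc '_Marker_')

printf 'remaining after last marker:\n%s\n' "$remaining"
echo "error lines to report: $reported"
rm -f "$log"
```

With this input only the final _Marker_End_ line survives the stripping step, and the count of lines to report is 0, which is the behavior the question asks for.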


Also: here is a start on how to modify the original script to address the most basic problems in it and make it easier to read:

#!/bin/sh

this_prog=$(basename "$0")

usage()
{
   echo "Usage:"
   echo " $this_prog <servername> <login> <passwd> [<errorlog-pathname> [\"all\"]]"
}

#---------------------------------------------------------------------------

# Check parameters
if [ $# -lt 3 ] || [ $# -gt 5 ]
then
   usage
   exit 1
fi

srv=$1
login=$2
psswd=$3
logfile=$4
opt=$5

#---------------------------------------------------------------------------

# Temp directory
tmp=$(mktemp -d) || exit 1
trap 'rm -f "$tmp"/*; rmdir "$tmp"; exit' 0

logfile_copy="${tmp}/errlog"

#---------------------------------------------------------------------------

# Some constants; do NOT change these !
dft_mailprog="your_mail_program" #DO NOT CHANGE -- go to the next section
dft_dba_mail="you@yourcompany.com yourcollege@yourcompany.com" #DO NOT CHANGE
#                                                   -- go to the next section

#---------------------------------------------------------------------------

# Some definitions
#
# mailprog must be set to your command-line mail program, like 'mail', 'mailx',
# etc. Later in this script, it is assumed that this mail program supports
# specifying the mail subject on the command line with the "-s" option.
# Should you use 'sendmail', you'll have to modify the script, or do without
# the mail subject, as 'sendmail' does not have this "-s" option.
# NT users may want to use 'ssmtp' (part of CygWin) as their mail
# program (also see comment below).
mailprog="$dft_mailprog"  # define your own setting here

# Define a list of people receiving results by email:
dba_mail="$dft_dba_mail"  # define your own setting here

skip_when_empty=NO # if YES, will not send mail when no errors were found

#---------------------------------------------------------------------------

# The marker strings below can be set to any arbitrary string, as long
# as this is unique and does not appear in the errorlog as part of any
# error message.
# These strings should not be changed anymore once you've started using
# this script.
marker="_Marker_For_Checking_Errorlog_"        #do not change this !
marker2="_Marker_End_"                #do not change this !

#--------------------------------------------------------------------------

# Change the below to 'gawk' (or 'nawk') if desired... This may be needed
# when hitting built-in max. string length limits in 'awk'. 'gawk' etc.
# tend to be more flexible.
AWK='awk'   # awk|gawk

#---------------------------------------------------------------------

# Check the mail program and email addresses have been defined
if [ "$mailprog" = "$dft_mailprog" ]
then
   echo ""
   echo "You must first define the variable 'mailprog' in this script;"
   echo "please set it to the name of your command-line mail program,"
   echo "like 'mail', 'mailx', etc."
   echo ""
   exit 1
fi

if [ "$dba_mail" = "$dft_dba_mail" ]
then
   echo ""
   echo "You must first define the variable 'dba_mail' in this script;"
   echo "please set it to a list of recipients."
   echo ""
   exit 1
fi

#--------------------------------------------------------------------------

# First locate the server errorlog
rm -f "$logfile_copy"

if [ "$logfile" = "" ]
then
   # Pick up the server errorlog pathname; first check if this is 12.0
   # or later to determine the method for doing this
   #
   cat << --EOF-- > "${tmp}/vchk.sql"
select name from sysobjects  -- used for ASE version check
where name = "sysqueryplans"
go
dbcc traceon(3604)
go
dbcc resource -- contains errorlog pathname
go
--EOF--

   # The below isql session also doubles as an ASE access and
   # privilege check.
   # Using 'cat' and piping the SQL to isql is done to make it run on
   # Windows NT as well ('cos the NT version of 'isql' won't understand
   # Unix-style pathnames)
   #
   < "${tmp}/vchk.sql" isql -S"$srv" -U"$login" -P"$psswd" -w500 > "${tmp}/vchk"

   if grep -q "CT-LIBRARY error" "${tmp}/vchk"
   then
       cat "${tmp}/vchk"
       echo ""
       echo "*** Note: in case you cannot connect because the ASE server is down,"
       echo "*** you can also specify the errorlog pathname explicitly."
       echo ""
       usage
       exit 1
   fi

   if grep "You must have the following role(s) to" "${tmp}/vchk"
   then
       exit 1
   fi

   # 18-Sep-2001 Corrected the test below: it said "-ne 1" instead of "-eq 1",
   # causing it not to identify pre-12.0 versions correctly
   # (thanks to Jean Loesch)
   #
   if [ "$(grep -c "sysqueryplans" "${tmp}/vchk")" -eq 1 ]
   then
       #--------------------------------------------------------------------------

       # This is ASE 12.0+, so locate the errorlog through @@errorlog (this isn't
       # really necessary, as dbcc resource would still work fine), but let's do
       # it anyway for educational purposes ...

       cat << --EOF-- > "${tmp}/ataterrlog.sql"
print @@errorlog
go
--EOF--

       < "${tmp}/ataterrlog.sql" isql -S"$srv" -U"$login" -P"$psswd" > "${tmp}/ataterrlog"

       logfile=$( "$AWK" '{print $1}' "${tmp}/ataterrlog" )

       #--------------------------------------------------------------------------

   else # not 12.0+

       # This is ASE pre-12.0, so locate the errorlog through dbcc resource (already
       # executed above)

       logfile=$( "$AWK" 'sub(/.*rerrfile=/,""){print $1}' "${tmp}/vchk" )

   fi

fi # if $logfile = ""

#--------------------------------------------------------------------------

# Errorlog file name known now, check if it's there
if [ ! -f "$logfile" ]
then
   echo "Error accessing server errorlog file [$logfile] - file not found"
   echo "Note: this script must be run on the same host where the "
   echo "ASE errorlog file is located."
   exit 1
fi

cp "$logfile" "$logfile_copy"

#--------------------------------------------------------------------------
# Check option parameter
#
if [ "$opt" = "" ]
then
   scan_all=N
else
   scan_all=Y
   echo "Scanning the entire ASE errorlog."
fi

#--------------------------------------------------------------------------

if [ "$scan_all" = "N" ]
then

   # Skip the part of the errorlog until the last marker

   # Note: if the next line gives an error message, use a different shell

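   last_marker=$("$AWK" -v marker="$marker" '$0 ~ marker { a=NR } END { print a+0 }' "$logfile_copy")
   # a+0 prints 0 when no marker was found; skip the deletion in that case,
   # since sed would treat the range 1,0 as selecting line 1 only
   if [ "$last_marker" -gt 0 ]
   then
       sed "1,${last_marker}d" "$logfile_copy" > "${tmp}/x" &&
       cp "${tmp}/x" "$logfile_copy"
   fi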

fi

#--------------------------------------------------------------------------

# Create output file
{
   echo "Checking ASE errorlog"
   date
   echo "Server=$srv"
   echo "Errorlog=$logfile"
   echo ""
} > "${tmp}/out"

#--------------------------------------------------------------------------

# Finally... search for errors in the log file. The below set of search
# strings catches pretty much everything, but you can add any string here
# which you would also like to search for...
#
# Note that these strings indicate the presence of messages that should
# be investigated. Still, this may require further inspection of the
# errorlog, as more messages may be present which contain additional
# information.

grep -Ei '(warning|severity|fail|unmirror|mirror exit|not enough|error|suspect|corrupt|correct|deadlock|critical|allow|infect|error|full|problem|unable|not found|threshold|couldn|not valid|invalid|NO_LOG|logsegment|syslogs|stacktrace)' "$logfile_copy" |
grep -Evi '(successfull|_Marker_|(Suspect Granularity))' > "${tmp}/out2"

nrlines=$(wc -l "${tmp}/out2" | "$AWK" '{print $1}')

cat "${tmp}/out2" >> "${tmp}/out"

#--------------------------------------------------------------------------
#
echo "$nrlines error lines found in errorlog for ASE server '$srv'"

{
   echo ""
   echo "$nrlines error lines found in errorlog for ASE server '$srv'"
   echo ""
   echo "(end)"
} >> "${tmp}/out"

if [ "$skip_when_empty" = "NO" ] && [ "$nrlines" -eq 0 ]
then
   nrlines=1  # to force it into mailing anyway
fi

if [ "$nrlines" -gt 0 ]
then
   # Mail any error messages found to the list of recipients
   # (note: assumption is that the -s "subject" option is available for
   # your email program. Should you use "sendmail", it may not be
   # available, and you'd have to remove this option; when you're familiar
   # with 'sendmail', you can add the subject line yourself by inserting
   # header lines into the message file)
   #
   # Note for NT users: if you need a command-line mail program on NT,
   # consider 'ssmtp'. This is part of the CygWin package, which you need
   # anyway to run this script on NT. The download location for CygWin
   # is in the file header above.

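   subj="Results of ASE errorlog check for '$srv'"
   # $dba_mail is deliberately left unquoted so that a space-separated list
   # of recipients is passed to the mail program as separate arguments
   "$mailprog" -s "$subj" $dba_mail < "${tmp}/out"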
fi

#--------------------------------------------------------------------------

if [ "$scan_all" = "N" ]
then
   # Write a new marker to the server errorlog to indicate we got till here
   # Only do this when (i) no explicit errorlog pathname was specified and
   # (ii) only the last part of the log was scanned.

   cat << --EOF-- > "${tmp}/logprint.sql"
dbcc logprint ("$marker")
dbcc logprint ("$marker2") -- need a second line to avoid missing the last line
if @@error = 0 print "Writing marker to ASE errorlog."
-- note: in ASE 12.0, we could use the more tidy "dbcc printolog(string)" instead
go
--EOF--

   < "${tmp}/logprint.sql" isql -S"$srv" -U"$login" -P"$psswd" | grep -Ev '(DBCC execution compl|(SA))'
fi

#--------------------------------------------------------------------------
# end
#

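Other improvements could be made, and it is untested so it may contain bugs, but hopefully you can compare it with the original to see in what ways the original should be changed.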
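
One edge case worth testing when you compare versions is a log that contains no marker at all. Printing a+0 yields 0 in that case, and POSIX specifies that a range whose second address is a number not greater than the first selected line selects just that one line, so sed with the range 1,0 would be expected to drop line 1. The sketch below uses a hypothetical helper, skip_to_last_marker, with a numeric guard; the file contents are made up.

```shell
#!/bin/sh
marker='_Marker_For_Checking_Errorlog_'

# Hypothetical helper: print the lines after the last marker, or the
# whole file when no marker is present.
skip_to_last_marker() {
    last=$(awk -v m="$1" '$0 ~ m { a=NR } END { print a+0 }' "$2")
    if [ "$last" -gt 0 ]
    then
        sed "1,${last}d" "$2"
    else
        cat "$2"
    fi
}

log=$(mktemp) || exit 1
printf '%s\n' 'first line' 'second line' > "$log"   # no marker in this file

echo "unguarded sed 1,0d:"
sed "1,0d" "$log"

echo "guarded:"
guarded=$(skip_to_last_marker "$marker" "$log")
printf '%s\n' "$guarded"
rm -f "$log"
```

Comparing the two outputs on your own system shows whether the unguarded form loses the first line of a marker-free log.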

Quoted from: https://unix.stackexchange.com/questions/580887