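check_nrpe command not working on the nagios server since Debian upgrade
Yesterday I upgraded a server from Debian 9 to Debian 10. This server is supervised by nagios. Since the upgrade, I get an alert with status UNKNOWN: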
"Volumegroup array03-0 wasn't valid or wasn't specified with "-v Volumegroup", bye. false"
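The service is "VG baie03-0 usage" and its command is check_nrpe!check_vgs_array03-0. The purpose of this service is to raise an alert when the storage on the array is almost full.
The check_nrpe command is the standard one: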
# 'check_NRPE' command definition
define command{
        command_name    check_nrpe
        command_line    $USER1$/check_nrpe -H $HOSTADDRESS$ -c $ARG1$
        }
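To make the macro expansion concrete, here is a simplified bash sketch of what Nagios does with check_nrpe!check_vgs_array03-0: whatever follows the "!" becomes $ARG1$, so the NRPE daemon only receives the name check_vgs_array03-0 and has to look it up in its own configuration. This is not Nagios code; expand_check_command is a hypothetical helper, and the values of USER1 and HOSTADDRESS are assumptions taken from the check_load example elsewhere in this post.

```bash
#!/usr/bin/env bash
# Simplified illustration of Nagios macro expansion for "check_nrpe!ARG1".
# Not Nagios source code: only $USER1$, $HOSTADDRESS$ and $ARGn$ are handled.

USER1="/usr/local/nagios/libexec"   # assumption: location of check_nrpe on the nagios server
HOSTADDRESS="srv-supervised04"      # assumption: address of the supervised host

# command_line of the check_nrpe command definition, kept literal
COMMAND_LINE='$USER1$/check_nrpe -H $HOSTADDRESS$ -c $ARG1$'

expand_check_command() {
    local check_command="$1" line i
    local -a parts
    # Split the service's check_command on '!':
    # parts[0] is the command name, parts[1..n] become $ARG1$..$ARGn$.
    IFS='!' read -r -a parts <<< "$check_command"

    line=$COMMAND_LINE
    line=${line//\$USER1\$/"$USER1"}
    line=${line//\$HOSTADDRESS\$/"$HOSTADDRESS"}
    for (( i = 1; i < ${#parts[@]}; i++ )); do
        line=${line//\$ARG$i\$/"${parts[i]}"}
    done
    printf '%s\n' "$line"
}

# Prints: /usr/local/nagios/libexec/check_nrpe -H srv-supervised04 -c check_vgs_array03-0
expand_check_command 'check_nrpe!check_vgs_array03-0'
```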
If I'm not mistaken, this means there is a check_vgs_array03-0 command in /etc/nagios/nrpe.cfg on the supervised server. Let's look; here it is:
command[check_vgs_array03-0]=/usr/lib/nagios/plugins/check_vg_size -w 20 -c 10 -v array03-0
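On the supervised server, NRPE resolves that name by finding the matching command[...] line in nrpe.cfg and running what follows the "=" (as the user set by nrpe_user, typically nagios on Debian). The function below is a rough sketch of that lookup only; lookup_nrpe_command is a hypothetical name, and NRPE's real parser handles more (include files, argument substitution, and other checks) that is not reproduced here.

```bash
#!/usr/bin/env bash
# Rough sketch of how an NRPE command name maps to a command line in nrpe.cfg.
# lookup_nrpe_command is hypothetical; NRPE's real parser does much more.

lookup_nrpe_command() {
    local cfg="$1" name="$2" line
    # "|| [[ -n $line ]]" also handles a last line without a trailing newline
    while IFS= read -r line || [[ -n $line ]]; do
        if [[ $line == "command[$name]="* ]]; then
            printf '%s\n' "${line#*=}"   # everything after the first '='
            return 0
        fi
    done < "$cfg"
    return 1                             # name not defined in this file
}

# Usage with a throw-away config file containing the line from this post
cfg=$(mktemp)
printf '%s\n' 'command[check_vgs_array03-0]=/usr/lib/nagios/plugins/check_vg_size -w 20 -c 10 -v array03-0' > "$cfg"
lookup_nrpe_command "$cfg" check_vgs_array03-0
rm -f "$cfg"
```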
If I type this command directly on the supervised server, I get no error; it works:
VG array03-0 OK Available space is 805 GB;| array03-0=805GB;20;10;0;19155
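The text before the "|" is the status Nagios displays, and what follows is performance data in the usual label=value[UOM];warn;crit;min;max form. A small sketch that splits this particular line, assuming unquoted labels and space-separated perfdata items (parse_perfdata and split_plugin_output are hypothetical helpers, not part of the plugin):

```bash
#!/usr/bin/env bash
# Sketch: split a Nagios plugin output line into status text and perfdata items.
# Assumes unquoted labels and space-separated perfdata items.

parse_perfdata() {
    local item="$1" label rest value_uom value uom warn crit min max
    label=${item%%=*}
    rest=${item#*=}
    IFS=';' read -r value_uom warn crit min max <<< "$rest"
    value=${value_uom%%[!0-9.]*}      # leading numeric part, e.g. 805
    uom=${value_uom#"$value"}         # remaining unit of measure, e.g. GB
    printf 'label=%s value=%s uom=%s warn=%s crit=%s min=%s max=%s\n' \
        "$label" "$value" "$uom" "$warn" "$crit" "$min" "$max"
}

split_plugin_output() {
    local output="$1" text perf item
    local -a items=()
    text=${output%%'|'*}
    if [[ $output == *'|'* ]]; then
        perf=${output#*'|'}
        read -r -a items <<< "$perf"   # split perfdata on whitespace
    fi
    printf 'text: %s\n' "$text"
    for item in "${items[@]}"; do
        parse_perfdata "$item"
    done
}

split_plugin_output 'VG array03-0 OK Available space is 805 GB;| array03-0=805GB;20;10;0;19155'
```

For this line, the max field (19155) is the VG size in GB, which matches the "<18,71 TiB" reported by vgdisplay.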
And if I enter a volume group name that doesn't exist, I do get the error message.
The check_vg_size plugin script is:
#!/bin/bash
#check_vg_size
#set -x
# Plugin for Nagios
# Written by M. Koettenstorfer (mko@lihas.de)
# Some additions by J. Schoepfer (jsc@lihas.de)
# Major changes into functions and input/output values J. Veverka (veverka.kuba@gmail.com)
# Last Modified: 2012-11-06
#
# Description:
#
# This plugin will check howmany space in volume groups is free

# Nagios return codes
STATE_OK=0
STATE_WARNING=1
STATE_CRITICAL=2
STATE_UNKNOWN=3
STATE_DEPENDENT=4

SERVICEOUTPUT=""
SERVICEPERFDATA=""
PROGNAME=$(basename $0)
vgs_bin=`/usr/bin/whereis -b -B /sbin /bin /usr/bin /usr/sbin -f vgs | awk '{ print $2 }'`
_vgs="$vgs_bin --units=g"
bc_bin=`/usr/bin/whereis -b -B /sbin /bin /usr/bin /usr/sbin -f bc | awk '{ print $2 }'`
exitstatus=$STATE_OK #default
declare -a volumeGroups;
novg=0; #number of volume groups
allVG=false; #Will we use all volume groups we can find on system?
inPercent=false; #Use percentage for comparison?
unitsGB="GB"
unitsPercent="%"
units=$unitsGB

########################################################################
### DEFINE FUNCTIONS
########################################################################

print_usage() {
    echo "Usage: $PROGNAME -w <min size warning level in gb> -c <min size critical level in gb> -v <volumegroupname> [-a] [-p]"
    echo "If '-a' and '-v' are specified: all volumegroups defined by -v will be ommited and the remaining groups which are found on system are checked"
    echo "If '-p' is specified: the warning and critical levels are represented as the percent space left on device"
    echo ""
}

print_help() {
    print_usage
    echo ""
    echo "This plugin will check how much space is free in volume groups"
    echo "usage: "
    exit $STATE_UNKNOWN
}

checkArgValidity () {
    # Check arguments for validity
    if [[ -z $critlevel || -z $warnlevel ]] # Did we get warn and crit values?
    then
        echo "You must specify a warning and critical level"
        print_usage
        exitstatus=$STATE_UNKNOWN
        exit $exitstatus
    elif [ $warnlevel -le $critlevel ] # Do the warn/crit values make sense?
    then
        if [ $inPercent != 'true' ]
        then
            echo "CRITICAL value of $critlevel GB is less than WARNING level of $warnlevel GB"
            print_usage
            exitstatus=$STATE_UNKNOWN
            exit $exitstatus
        else
            echo "CRITICAL value of $critlevel % is higher than WARNING level of $warnlevel %"
            print_usage
            exitstatus=$STATE_UNKNOWN
            exit $exitstatus
        fi
    fi
}

#Does volume group actually exist?
volumeGroupExists () {
    local volGroup="$@"
    VGValid=$($_vgs 2>/dev/null | grep "$volGroup" | wc -l )
    if [[ -z "$volGroup" || $VGValid = 0 ]]
    then
        echo "Volumegroup $volGroup wasn't valid or wasn't specified"
        echo "with \"-v Volumegroup\", bye."
        echo false
        return 1
    else
        #The volume group exists
        echo true
        return 0
    fi
}

getNumberOfVGOnSystem () {
    local novg=$($_vgs 2>/dev/null | wc -l)
    let novg--
    echo $novg
}

getAllVGOnSystem () {
    novg=$(getNumberOfVGOnSystem)
    local found=false;
    for (( i=0; i < novg; i++)); do
        volumeGroups[$i]=$($_vgs | tail -n $(($i+1)) | head -n 1 | awk '{print $1}')
        found=true;
    done
    if ( ! $found ); then
        echo "$found"
        echo "No Volumegroup wasn't valid or wasn't found"
        exit $STATE_UNKNOWN
    fi
}

getColumnNoByName () {
    columnName=$1
    result=$($_vgs 2>/dev/null | head -n1 | awk -v name=$columnName '
        BEGIN{}
        {
            for(i=1;i<=NF;i++){
                if ($i ~ name) {print i }
            }
        }')
    echo $result
}

convertToPercent () {
    #$1 = xx%
    #$2 = 100%
    # Make values numbers only
    local input="$(echo $1 | sed 's/g//i')"
    local max="$(echo $2 | sed 's/g//i')"
    local onePercent='';
    local freePercent='';
    if [ -x "$bc_bin" ] ; then
        onePercent=$( echo "scale=2; $max / 100" | bc );
        freePercent=$( echo "$input / $onePercent" | bc );
    else
        freePercent=$(perl -e "print int((($max-$input)*100/$max))")
    fi
    echo $freePercent;
    return 0;
}

getSizesOfVolume () {
    volumeName="$1";
    #Check the actual sizes
    cnFree=`getColumnNoByName "VFree"`;
    cnSize=`getColumnNoByName "VSize"`;
    freespace=`$_vgs $volumeName 2>/dev/null | awk -v n=$cnFree '/[0-9]/{print $n}' | sed -e 's/[\.,\,].*//'`;
    fullspace=`$_vgs $volumeName 2>/dev/null | awk -v n=$cnSize '/[0-9]/{print $n}' | sed -e 's/[\.,\,].*//'`;
    if ( $inPercent ); then
        #Convert to Percents
        freespace="$(convertToPercent $freespace $fullspace)"
    fi
}

setExitStatus () {
    local status=$1
    local volGroup="$2"
    local formerStatus=$exitstatus
    if [ $status -gt $formerStatus ]
    then
        formerStatus=$status
    fi
    if [ $status = $STATE_UNKNOWN ] ; then
        SERVICEOUTPUT="${volGroup}"
        exitstatus=$STATE_UNKNOWN
        return
    fi
    if [ "$freespace" -le "$critlevel" ]
    then
        SERVICEOUTPUT=$SERVICEOUTPUT" VG $volGroup CRITICAL Available space is $freespace $units;"
        exitstatus=$STATE_CRITICAL
    elif [ "$freespace" -le "$warnlevel" ]
    then
        SERVICEOUTPUT=$SERVICEOUTPUT"VG $volGroup WARNING Available space is $freespace $units;"
        exitstatus=$STATE_WARNING
    else
        SERVICEOUTPUT=$SERVICEOUTPUT"VG $volGroup OK Available space is $freespace $units;"
        exitstatus=$STATE_OK
    fi
    SERVICEPERFDATA="$SERVICEPERFDATA $volGroup=$freespace$units;$warnlevel;$critlevel"
    if [ $inPercent != 'true' ] ; then
        SERVICEPERFDATA="${SERVICEPERFDATA};0;$fullspace"
    fi
    if [ $formerStatus -gt $exitstatus ]
    then
        exitstatus=$formerStatus
    fi
}

checkVolumeGroups () {
    checkArgValidity
    for (( i=0; i < novg; i++ )); do
        local status="$STATE_OK"
        local currentVG="${volumeGroups[$i]}"
        local groupExists="$(volumeGroupExists "$currentVG" )"
        if [ "$groupExists" = 'true' ]; then
            getSizesOfVolume "$currentVG"
            status=$STATE_OK
        else
            status=$STATE_UNKNOWN
            setExitStatus $status "${groupExists}"
            break
        fi
        setExitStatus $status "$currentVG"
    done
}

########################################################################
### RUN PROGRAM
########################################################################

########################################################################
#Read input values
while getopts ":w:c:v:h:ap" opt ;do
    case $opt in
        h)
            print_help;
            exit $exitstatus;
            ;;
        w)
            warnlevel=$OPTARG;
            ;;
        c)
            critlevel=$OPTARG;
            ;;
        v)
            if ( ! $allVG ); then
                volumeGroups[$novg]=$OPTARG;
                let novg++;
            fi
            ;;
        a)
            allVG=true;
            getAllVGOnSystem;
            ;;
        p)
            inPercent=true;
            units=$unitsPercent
            ;;
        \?)
            echo "Invalid option: -$OPTARG" >&2
            ;;
    esac
done

checkVolumeGroups
echo $SERVICEOUTPUT"|"$SERVICEPERFDATA
exit $exitstatus
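Two details of volumeGroupExists matter when reading this script. Its grep "$volGroup" is a substring (regular-expression) match, so -v array03 would also be accepted because array03-0 exists. And because vgs runs with 2>/dev/null, a failure of vgs itself yields a count of 0 and is reported exactly like a missing volume group. Below is a sketch of a stricter check, assuming vgs is on the PATH; vg_status is a hypothetical name and not part of the original plugin.

```bash
#!/usr/bin/env bash
# Hypothetical stricter alternative to volumeGroupExists (not part of check_vg_size).
# Return codes: 0 = VG listed, 1 = vgs ran but VG not listed, 3 = vgs itself failed (UNKNOWN).

vg_status() {
    local vg="$1" names rc name
    # Keep vgs's error output instead of discarding it with 2>/dev/null
    names=$(vgs --noheadings -o vg_name 2>&1)
    rc=$?
    if (( rc != 0 )); then
        echo "UNKNOWN: vgs failed with exit status $rc: $names"
        return 3
    fi
    # Compare whole names, so "array03" does not match "array03-0"
    while read -r name; do
        if [[ $name == "$vg" ]]; then
            return 0
        fi
    done <<< "$names"
    echo "Volumegroup $vg not found"
    return 1
}
```

For example, vg_status array03-0 returns 0 when the group is listed, 1 when vgs ran but did not list it, and 3 (the Nagios UNKNOWN code) when vgs itself failed, in which case the captured error text is printed so the real cause would appear in the alert.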
If I use another argument (another script) with the check_nrpe command, it works.
For example:
root@nagiosserver:/usr/local/nagios# /usr/local/nagios/libexec/check_nrpe -H srv-supervised04 -c check_load
OK - load average: 3.79, 2.99, 1.83|load1=3.790;25.000;30.000;0; load5=2.990;20.000;25.000;0; load15=1.830;15.000;20.000;0;
The VG array03-0 does exist:
root@srv-supervised04:/usr/lib/nagios/plugins# vgdisplay
  --- Volume group ---
  VG Name               array03-0
  System ID
  Format                lvm2
  Metadata Areas        1
  Metadata Sequence No  34
  VG Access             read/write
  VG Status             resizable
  MAX LV                0
  Cur LV                5
  Open LV               4
  Max PV                0
  Cur PV                1
  Act PV                1
  VG Size               <18,71 TiB
  PE Size               4,00 MiB
  Total PE              4903887
  Alloc PE / Size       4697600 / <17,92 TiB
  Free  PE / Size       206287 / <805,81 GiB
  VG UUID               OgzAMF-DGbW-3t3L-Wk7k-gY1g-s6fH-zYEKad
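So: the VG does exist. The check_vg_size plugin works when run locally, and the check_nrpe command works from the nagios server with another plugin, but check_vg_size does not work from the nagios server. The error message claims array03-0 doesn't exist when it clearly does. I didn't change anything in any of these files. The problem appeared with the Debian 9 to 10 upgrade (during the upgrade, I chose to keep my modified nrpe.cfg).
Does anyone know where this could come from? The Debian version? Maybe the new bash version? An incompatibility between the nagios server (still on Debian 9) and the supervised server (Debian 10)?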
I fixed it.
It was a permissions problem. I had to give the nagios user sudo rights on the plugin:
nagios ALL=(root) NOPASSWD: /usr/lib/nagios/plugins/check_vg_size
Then modify /etc/nrpe.cfg to add sudo in front of the command:
command[check_vgs_array03-0]= sudo /usr/lib/nagios/plugins/check_vg_size -w 20 -c 10 -v array03-0
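Why did a permissions problem show up as "wasn't valid or wasn't specified"? Presumably vgs needs root privileges here: run from a root shell (the prompts in this post are root@...) it works, but when NRPE launches the plugin as the nagios user, vgs fails. The plugin discards vgs's error output, so that failure is indistinguishable from a missing volume group. The sketch below reproduces the plugin's pipeline with a stand-in for vgs; fake_vgs_as_unprivileged_user is hypothetical and only mimics "fails and writes to stderr", it does not reproduce real LVM output.

```bash
#!/usr/bin/env bash
# Reproduces volumeGroupExists's test from check_vg_size with a failing stand-in for vgs.

# Hypothetical stand-in: fails and writes to stderr, like vgs without enough privileges.
fake_vgs_as_unprivileged_user() {
    echo "stand-in error: permission denied" >&2   # not real LVM output
    return 5                                        # arbitrary non-zero status
}
_vgs="fake_vgs_as_unprivileged_user"

# Same pipeline as in the plugin: stderr is discarded, matching lines are counted
VGValid=$($_vgs 2>/dev/null | grep "array03-0" | wc -l )
echo "VGValid=$VGValid"   # prints VGValid=0, which the plugin reports as "wasn't valid"
```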