CentOS

Script/program/application to check whether my website is online

  • August 8, 2016

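I have just deployed my web server with Apache on CentOS, and I would like to know whether anyone has a good idea for checking at a set interval whether the server has gone down, so that postfix can email me when that happens. That way I can get back to the server right away, fix the problem, and see what caused it. I assume many sites use some software or script that tells them when their service is down before customers start complaining.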

As long as the server itself is still alive, you can keep a script on it that runs a variety of checks. Here is one:

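#!/bin/bash
# Server health report: load, logins, disk/memory, OOM kills,
# busiest processes, open ports, connections, and vmstat samples.
# Run as root so that /var/log/messages is readable.
   date
   echo "uptime:"
   uptime
   echo "Currently connected:"
   w
   echo "--------------------"
   echo "Last logins:"
   last -a | head -3
   echo "--------------------"
   echo "Disk and memory usage:"
   # xargs flattens the output onto one line; the field numbers pick the
   # first filesystem listed and depend on the df/free output layout.
   df -h | xargs | awk '{print "Free/total disk: " $11 " / " $9}'
   # $17 matches the older free layout with a "-/+ buffers/cache" line;
   # adjust the field number on systems where free prints other columns.
   free -m | xargs | awk '{print "Free/total memory: " $17 " / " $8 " MB"}'
   echo "--------------------"
   # Timestamp of the first entry in the current log file.
   start_log=$(head -1 /var/log/messages | cut -c 1-12)
   # Rough OOM count: every log line that mentions "kill".
   oom=$(grep -ci kill /var/log/messages)
   echo "OOM errors since $start_log : $oom"
   echo "--------------------"
   echo "Utilization and most expensive processes:"
   # -n 1 makes top print a single batch iteration and exit.
   top -b -n 1 | head -3
   echo
   top -b -n 1 | head -10 | tail -4
   echo "--------------------"
   echo "Open TCP ports:"
   nmap -p- -T4 127.0.0.1
   echo "--------------------"
   echo "Current connections:"
   ss -s
   echo "--------------------"
   echo "processes:"
   ps auxf --width=200
   echo "--------------------"
   echo "vmstat:"
   vmstat 1 5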

This gives you output like the following:

./Server-Health.sh

Tue Jul 16 22:01:06 IST 2013
uptime:
22:01:06 up 174 days,  4:42,  1 user,  load average: 0.36, 0.25, 0.18
Currently connected:
22:01:06 up 174 days,  4:42,  1 user,  load average: 0.36, 0.25, 0.18
USER     TTY      FROM              LOGIN@   IDLE   JCPU   PCPU WHAT
tecmint   pts/0    116.72.134.162   21:48    0.00s  0.03s  0.03s sshd: tecmint [priv]
--------------------
Last logins:
tecmint   pts/0        Tue Jul 16 21:48   still logged in    116.72.134.162
tecmint   pts/0        Tue Jul 16 21:24 - 21:43  (00:19)     116.72.134.162
--------------------
Disk and memory usage:
Free/total disk: 292G / 457G
Free/total memory: 3510 / 3838 MB
--------------------
OOM errors since Jul 14 03:37 : 0
--------------------
Utilization and most expensive processes:
top - 22:01:07 up 174 days,  4:42,  1 user,  load average: 0.36, 0.25, 0.18
Tasks: 149 total,   1 running, 148 sleeping,   0 stopped,   0 zombie
Cpu(s):  0.1%us,  0.0%sy,  0.0%ni, 99.3%id,  0.6%wa,  0.0%hi,  0.0%si,  0.0%st

 PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
   1 root      20   0  3788 1128  932 S  0.0  0.0   0:32.94 init
   2 root      20   0     0    0    0 S  0.0  0.0   0:00.00 kthreadd
   3 root      RT   0     0    0    0 S  0.0  0.0   0:14.07 migration/0
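
The health script above prints a report but does not decide whether anything is wrong. As a minimal sketch of turning one of its checks into a pass/fail test, the helper below reads `df -P` output and lists filesystems at or above a usage threshold. The function name `full_filesystems` and the threshold are illustrative assumptions, not part of the original answer, and mount points containing spaces are not handled.

```shell
#!/bin/bash
# Hypothetical helper: read `df -P` style output on stdin and print
# "mountpoint usage%" for every filesystem at or above the given limit.
full_filesystems() {
    local limit=$1
    awk -v limit="$limit" '
        NR > 1 {                      # skip the header line
            use = $5                  # capacity column, e.g. "95%"
            sub(/%/, "", use)         # strip the percent sign
            if (use + 0 >= limit) print $6 " " $5
        }'
}

# When called with a threshold, check the real filesystems.
if [ "$#" -gt 0 ]; then
    df -P | full_filesystems "$1"
fi
```

Reading from stdin keeps the parsing separate from the `df` call, so the function can be tried on canned input before it is trusted on a live server.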

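There are tools such as Nagios or Icinga that can monitor servers.

They can monitor much more than just an HTTP URL: things like SSH, MySQL, whether disk space is running low, and a lot more. That may be a bit of overkill for your use case, though. If something fails, the service sends an email.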
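
If a full monitoring system is more than you need, the interval check plus email described in the question can be approximated with a short script run from cron. What follows is only a sketch under some assumptions: `curl` and a `mail` client (for example from the mailx package) are installed, the local MTA (postfix in the question) delivers the message, and the names `is_up_code` and `check_site` are made up for this example.

```shell
#!/bin/bash
# Sketch: fetch a URL and mail an alert when it does not answer properly.

# Return 0 if the HTTP status code counts as "up" (2xx or 3xx).
is_up_code() {
    case $1 in
        2??|3??) return 0 ;;
        *)       return 1 ;;
    esac
}

# check_site URL ADDRESS: fetch URL, mail ADDRESS when it looks down.
check_site() {
    local url=$1 rcpt=$2 code
    # -m 10 gives up after 10 seconds; curl prints 000 when it got no response.
    code=$(curl -s -o /dev/null -m 10 -w '%{http_code}' "$url")
    if ! is_up_code "$code"; then
        echo "$(date): $url returned HTTP status ${code:-none}" |
            mail -s "ALERT: $url is down" "$rcpt"
        return 1
    fi
}

# Run only when called with a URL and an address, e.g. from cron.
if [ "$#" -ge 2 ]; then
    check_site "$1" "$2"
fi
```

A crontab entry such as `*/5 * * * * /usr/local/bin/check-site.sh http://example.com/ you@example.com` (path, URL, and address are placeholders) would run it every five minutes. Note that a check running on the web server itself cannot report a complete outage of that machine, so it is better placed on a second host.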

Source: https://unix.stackexchange.com/questions/207367