Centos
測試腳本/程序/應用程序以檢查我的網站是否線上
我剛剛在 CentOS 中使用 Apache 部署了我的網路伺服器,我想知道是否有人對如何在伺服器出現故障時檢查每個指定時間量有什麼好的想法,然後我可以在發生這種情況時使用 postfix 給我發電子郵件,這樣我就可以回去了立即到我的伺服器並修復問題,看看是什麼導致了問題。我猜許多網站使用一些軟體/腳本在客戶開始抱怨問題之前讓他們知道他們的服務何時停止。
如果伺服器還活著,你可以在你的伺服器中有一個腳本,在它們之間進行各種測試,這是一個:
#!/bin/bash date; echo "uptime:" uptime echo "Currently connected:" w echo "--------------------" echo "Last logins:" last -a |head -3 echo "--------------------" echo "Disk and memory usage:" df -h | xargs | awk '{print "Free/total disk: " $11 " / " $9}' free -m | xargs | awk '{print "Free/total memory: " $17 " / " $8 " MB"}' echo "--------------------" start_log=`head -1 /var/log/messages |cut -c 1-12` oom=`grep -ci kill /var/log/messages` echo -n "OOM errors since $start_log :" $oom echo "" echo "--------------------" echo "Utilization and most expensive processes:" top -b |head -3 echo top -b |head -10 |tail -4 echo "--------------------" echo "Open TCP ports:" nmap -p- -T4 127.0.0.1 echo "--------------------" echo "Current connections:" ss -s echo "--------------------" echo "processes:" ps auxf --width=200 echo "--------------------" echo "vmstat:" vmstat 1 5
這將為您提供以下結果:
./Server-Health.sh Tue Jul 16 22:01:06 IST 2013 uptime: 22:01:06 up 174 days, 4:42, 1 user, load average: 0.36, 0.25, 0.18 Currently connected: 22:01:06 up 174 days, 4:42, 1 user, load average: 0.36, 0.25, 0.18 USER TTY FROM LOGIN@ IDLE JCPU PCPU WHAT tecmint pts/0 116.72.134.162 21:48 0.00s 0.03s 0.03s sshd: tecmint [priv] -------------------- Last logins: tecmint pts/0 Tue Jul 16 21:48 still logged in 116.72.134.162 tecmint pts/0 Tue Jul 16 21:24 - 21:43 (00:19) 116.72.134.162 -------------------- Disk and memory usage: Free/total disk: 292G / 457G Free/total memory: 3510 / 3838 MB -------------------- OOM errors since Jul 14 03:37 : 0 -------------------- Utilization and most expensive processes: top - 22:01:07 up 174 days, 4:42, 1 user, load average: 0.36, 0.25, 0.18 Tasks: 149 total, 1 running, 148 sleeping, 0 stopped, 0 zombie Cpu(s): 0.1%us, 0.0%sy, 0.0%ni, 99.3%id, 0.6%wa, 0.0%hi, 0.0%si, 0.0%st PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 1 root 20 0 3788 1128 932 S 0.0 0.0 0:32.94 init 2 root 20 0 0 0 0 S 0.0 0.0 0:00.00 kthreadd 3 root RT 0 0 0 0 S 0.0 0.0 0:14.07 migration/0
他們可以監控的不僅僅是 http url: things linke
ssh
,mysql
, , 如果磁碟空間變得稀缺 - - 還有更多。可能對您的使用有點矯枉過正?如果事情失敗,服務會寫郵件。