MySQL server monitor_20000 on node1 'not running' - HA cluster - Pacemaker - Corosync - DRBD
I get this error after completing the cluster configuration and then attempting a failback from node2:
mysql_service01_monitor_20000 on node1 'not running' (7): call=20, status=complete, exitreason='none'
Shutting the cluster down and restarting MariaDB:
pcs cluster stop --all
service mariadb restart
service mariadb stop
pcs cluster start --all
Everything comes back online. Then:
pcs cluster standby node1
fails over to node2, but I get this error again:
mysql_service01_monitor_20000 on node1 'not running' (7): call=77, status=complete, exitreason='none'
Attempting to fail back to node1:
pcs cluster unstandby node1
pcs cluster standby node2
It will not fail back, and shows the following:
Failed Actions:
* mysql_service01_monitor_20000 on node2 'not running' (7): call=141, status=complete, exitreason='none',
    last-rc-change='Mon Jun 13 20:33:36 2016', queued=0ms, exec=43ms
* mysql_service01_monitor_20000 on node1 'not running' (7): call=77, status=complete, exitreason='none',
    last-rc-change='Mon Jun 13 20:31:23 2016', queued=0ms, exec=42ms
* mysql_fs01_start_0 on node1 'unknown error' (1): call=113, status=complete, exitreason='Couldn't mount filesystem /dev/drbd0 on /var/lib/mysql',
    last-rc-change='Mon Jun 13 20:33:47 2016', queued=0ms, exec=53ms
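As a side note, failed actions like these stay in the status output (and keep counting against the resource) until they are cleared; a sketch of clearing them so the failback can be retried without a full cluster restart (resource names taken from the output above, commands need a live cluster):

```shell
# clear the recorded failures so Pacemaker forgets the 'not running' results
pcs resource cleanup mysql_service01
pcs resource cleanup mysql_fs01

# confirm the failcount is back to zero before retrying the failback
pcs resource failcount show mysql_service01
```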
After shutting the cluster down and restarting MariaDB on both nodes again, I start the cluster back up and get:
mysql_service01 (ocf::heartbeat:mysql): FAILED node1
After a full pcs cluster stop --all (successful), a reboot, and pcs cluster start --all, everything works!
This is somewhat hit-or-miss, but it is still HA, and I have email notifications set up for failover, so hopefully we can get through the days when node1 needs backups, shutdowns, or service restarts. But I would really like to know what is happening and how to stop it; this would certainly make my boss's demo look bad.
My configuration:
Disable the firewall and SELinux:
sed -i 's/^SELINUX=.*/SELINUX=disabled/' /etc/selinux/config
systemctl disable firewalld.service
systemctl stop firewalld.service
iptables --flush
reboot
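One detail worth flagging: with a `\(...\)` group in sed you must substitute the backreference `\1`, or drop the group entirely; `\SELINUX` in the replacement is a stray backslash. A minimal check of the simpler, anchored form against a throwaway copy of the file (the sample contents are an assumption, not your real config):

```shell
# work on a temporary copy so the real /etc/selinux/config is untouched
tmp=$(mktemp)
printf 'SELINUX=enforcing\nSELINUXTYPE=targeted\n' > "$tmp"   # hypothetical stock contents

# anchored at line start, so SELINUXTYPE= is left alone
sed -i 's/^SELINUX=.*/SELINUX=disabled/' "$tmp"

grep '^SELINUX=' "$tmp"   # SELINUX=disabled
rm -f "$tmp"
```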
Install Pacemaker + Corosync (CentOS 7):
hostnamectl set-hostname $(uname -n | sed s/\\..*//)
yum install -y pcs policycoreutils-python psmisc
echo "passwd" | passwd hacluster --stdin
systemctl start pcsd.service
systemctl enable pcsd.service
Authorize the nodes on node1:
pcs cluster auth node1 node2 -u hacluster -p passwd
pcs cluster setup --force --name mysql_cluster node1 node2
pcs cluster start --all
pcs status | grep UNCLEAN
Install DRBD and MariaDB:
rpm --import https://www.elrepo.org/RPM-GPG-KEY-elrepo.org
rpm -Uvh http://www.elrepo.org/elrepo-release-7.0-2.el7.elrepo.noarch.rpm
yum install -y kmod-drbd84 drbd84-utils mariadb-server mariadb
systemctl disable mariadb.service

cat << EOL > /etc/my.cnf
[mysqld]
symbolic-links=0
bind_address = 0.0.0.0
datadir = /var/lib/mysql
pid_file = /var/run/mariadb/mysqld.pid
socket = /var/run/mariadb/mysqld.sock
[mysqld_safe]
bind_address = 0.0.0.0
datadir = /var/lib/mysql
pid_file = /var/run/mariadb/mysqld.pid
socket = /var/run/mariadb/mysqld.sock
!includedir /etc/my.cnf.d
EOL
DRBD resource:
cat << EOL > /etc/drbd.d/mysql01.res
resource mysql01 {
 protocol C;
 meta-disk internal;
 device /dev/drbd0;
 disk /dev/sdb1;
 handlers {
  split-brain "/usr/lib/drbd/notify-split-brain.sh root";
 }
 net {
  allow-two-primaries no;
  after-sb-0pri discard-zero-changes;
  after-sb-1pri discard-secondary;
  after-sb-2pri disconnect;
  rr-conflict disconnect;
 }
 disk {
  on-io-error detach;
 }
 syncer {
  verify-alg sha1;
 }
 on node1 {
  address 192.168.1.216:7788;
 }
 on node2 {
  address 192.168.1.220:7788;
 }
}
EOL

fdisk /dev/sdb
drbdadm create-md mysql01
modprobe drbd
drbdadm up mysql01
drbdadm -- --overwrite-data-of-peer primary mysql01
drbdadm primary --force mysql01
watch cat /proc/drbd
mkfs.ext4 /dev/drbd0
mount /dev/drbd0 /mnt
df -h | grep drbd
umount /mnt
mount /dev/drbd0 /mnt   # I always get I/O errors here, so I just:
drbdadm up mysql01      # on both nodes
watch cat /proc/drbd
mount /dev/drbd0 /mnt
df -h | grep drbd
systemctl start mariadb
mysql_install_db --datadir=/mnt --user=mysql
umount /mnt
systemctl stop mariadb
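Those I/O errors on mount are usually a sign the device was not yet Primary and UpToDate when you mounted it; it is worth checking the `ds:` field of /proc/drbd before running mkfs or mount. A small sketch that parses a status line (the sample line is an assumption; on a live node read the file itself):

```shell
# hypothetical status line; on a real node use:  status=$(cat /proc/drbd)
status='0: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate C r-----'

case "$status" in
  *'ds:UpToDate/UpToDate'*) echo "disks in sync" ;;
  *)                        echo "NOT in sync - do not mkfs/mount yet" ;;
esac
```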
Pacemaker/Corosync configuration:
pcs -f clust_cfg resource create mysql_data01 ocf:linbit:drbd \
  drbd_resource=mysql01 \
  op monitor interval=30s
pcs -f clust_cfg resource master MySQLClone01 mysql_data01 \
  master-max=1 master-node-max=1 \
  clone-max=2 clone-node-max=1 \
  notify=true
pcs -f clust_cfg resource create mysql_fs01 Filesystem \
  device="/dev/drbd0" \
  directory="/var/lib/mysql" \
  fstype="ext4"
pcs -f clust_cfg resource create mysql_service01 ocf:heartbeat:mysql \
  binary="/usr/bin/mysqld_safe" \
  config="/etc/my.cnf" \
  datadir="/var/lib/mysql" \
  pid="/var/lib/mysql/mysql.pid" \
  socket="/var/lib/mysql/mysql.sock" \
  additional_parameters="--bind-address=0.0.0.0" \
  op start timeout=60s \
  op stop timeout=60s \
  op monitor interval=20s timeout=30s
pcs -f clust_cfg resource create mysql_VIP01 ocf:heartbeat:IPaddr2 \
  ip=192.168.1.215 cidr_netmask=32 nic=eth0 \
  op monitor interval=30s
pcs -f clust_cfg constraint colocation add mysql_service01 with mysql_fs01 INFINITY
pcs -f clust_cfg constraint colocation add mysql_VIP01 with mysql_service01 INFINITY
pcs -f clust_cfg constraint colocation add mysql_fs01 with MySQLClone01 INFINITY with-rsc-role=Master
pcs -f clust_cfg constraint order mysql_service01 then mysql_VIP01
pcs -f clust_cfg constraint location mysql_fs01 prefers node1=50
pcs -f clust_cfg property set stonith-enabled=false
pcs -f clust_cfg property set no-quorum-policy=ignore
pcs -f clust_cfg resource defaults resource-stickiness=200
pcs -f clust_cfg resource group add SQL-Group mysql_service01 mysql_fs01 mysql_VIP01
pcs cluster cib-push clust_cfg
pcs status
Update from the comments:
Is this sufficient? I am assuming I want the clone promoted before the FS, and the FS before the service. Also, in the Apache configuration I copied this from, I started the VIP before the web server, but in the guide I followed for SQL it started the VIP first. Any thoughts?
pcs -f clust_cfg constraint order promote MySQLClone01 then start mysql_fs01
pcs -f clust_cfg constraint order mysql_fs01 then mysql_service01
I will test and report back if it fixes it! Thanks.
This seems to have sorted the problem: failover happens as it should, though I still get the errors. But like I said, it works well! I don't like seeing the errors, but failover takes about 2 seconds.
pcs constraint order promote MySQLClone01 then start mysql_fs01
pcs constraint order mysql_fs01 then mysql_service01
A group implies both ordering and colocation. So your group says: "start mysql, then mount the filesystem, then start the VIP." Not only is that ordering incorrect, it contradicts your ordering constraints.
You should put everything except DRBD into the group, then add a single ordering constraint and a single colocation constraint tying the group to wherever DRBD is Master.
The order in which you add constraints to the cluster has absolutely no effect on the result.
Based on what you have there, it would look like this:
# pcs -f clust_cfg resource create mysql_data01 ocf:linbit:drbd \
    drbd_resource=mysql01 op monitor interval=30s
# pcs -f clust_cfg resource master MySQLClone01 mysql_data01 \
    master-max=1 master-node-max=1 clone-max=2 clone-node-max=1 \
    notify=true
# pcs -f clust_cfg resource create mysql_fs01 Filesystem \
    device="/dev/drbd0" directory="/var/lib/mysql" fstype="ext4"
# pcs -f clust_cfg resource create mysql_service01 ocf:heartbeat:mysql \
    binary="/usr/bin/mysqld_safe" config="/etc/my.cnf" \
    datadir="/var/lib/mysql" pid="/var/lib/mysql/mysql.pid" \
    socket="/var/lib/mysql/mysql.sock" \
    additional_parameters="--bind-address=0.0.0.0" \
    op start timeout=60s op stop timeout=60s \
    op monitor interval=20s timeout=30s
# pcs -f clust_cfg resource create mysql_VIP01 ocf:heartbeat:IPaddr2 \
    ip=192.168.1.215 cidr_netmask=32 nic=eth0 op monitor interval=30s
# pcs -f clust_cfg resource group add SQL-Group mysql_fs01 \
    mysql_service01 mysql_VIP01
# pcs -f clust_cfg constraint order promote MySQLClone01 \
    then start SQL-Group
# pcs -f clust_cfg constraint colocation add SQL-Group with MySQLClone01 \
    INFINITY with-rsc-role=Master
# pcs cluster cib-push clust_cfg
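After pushing the CIB, it may help to verify what actually landed; these commands need a live cluster, so this is only a sketch of the checks, not captured output:

```shell
# sanity-check the active CIB for configuration errors (verbose)
crm_verify -L -V

# should now show exactly one order and one colocation constraint for SQL-Group
pcs constraint

# confirm SQL-Group is running on the node where MySQLClone01 is Master
pcs status resources
```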