MySQL Server monitor_20000 on node1 'not running' - HA Cluster - Pacemaker - Corosync - DRBD

June 14, 2016

I get this error after completing the cluster configuration, when I try to fail back from node2:

mysql_service01_monitor_20000 on node1 'not running' (7): call=20, status=complete, exitreason='none'

Shut down the cluster and restarted MariaDB:

pcs cluster stop --all

service mariadb restart

service mariadb stop

pcs cluster start --all

Everything is online.

pcs cluster standby node1

It fails over to node2, but I get this error again...

mysql_service01_monitor_20000 on node1 'not running' (7): call=77, status=complete, exitreason='none'

Tried to fail back to node1:

pcs cluster unstandby node1
pcs cluster standby node2

It won't fail back and shows the following:

Failed Actions:
* mysql_service01_monitor_20000 on node2 'not running' (7): call=141, status=complete, exitreason='none',
last-rc-change='Mon Jun 13 20:33:36 2016', queued=0ms, exec=43ms
* mysql_service01_monitor_20000 on node1 'not running' (7): call=77, status=complete, exitreason='none',
last-rc-change='Mon Jun 13 20:31:23 2016', queued=0ms, exec=42ms
* mysql_fs01_start_0 on node1 'unknown error' (1): call=113, status=complete, exitreason='Couldn't mount filesystem /dev/drbd0 on /var/lib/mysql',
last-rc-change='Mon Jun 13 20:33:47 2016', queued=0ms, exec=53ms

After shutting down the cluster and restarting MariaDB on both nodes again, I started it back up:

mysql_service01    (ocf::heartbeat:mysql): FAILED node1

After a full pcs cluster stop --all (successful), a restart, and pcs cluster start --all, everything works!

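It's a bit random, but it is genuinely HA, and I have email notifications set up for failover, so hopefully we can run on the backup node for the rest of the day and then shut down and restart the services on node1. Still, I'd really like to know what is happening and how to stop it; this would certainly make my boss's demo look bad.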

My configuration:

Disable firewall/SELinux:

sed -i 's/^\(SELINUX=\).*/\1disabled/' /etc/selinux/config
systemctl disable firewalld.service
systemctl stop firewalld.service
iptables --flush
reboot
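Before rebooting, the SELinux substitution can be tried against a scratch file rather than the real /etc/selinux/config. The sketch below assumes GNU sed (as on CentOS 7) and uses the back-reference form `\1disabled`, which re-inserts the captured `SELINUX=` prefix; the sample file contents are illustrative only.

```shell
#!/bin/sh
# Sketch: run the SELinux edit on a throwaway file, not /etc/selinux/config.
# Assumes GNU sed (-i edits in place). Sample contents are illustrative.
tmp=$(mktemp)
trap 'rm -f "$tmp"' EXIT

printf 'SELINUX=enforcing\nSELINUXTYPE=targeted\n' > "$tmp"

# ^\(SELINUX=\) only matches a line that starts with exactly "SELINUX=",
# so the SELINUXTYPE= line is left alone; \1 puts the prefix back.
sed -i 's/^\(SELINUX=\).*/\1disabled/' "$tmp"

cat "$tmp"
```

If the printed file shows `SELINUX=disabled` and an unchanged `SELINUXTYPE=` line, the same expression is safe to run on the real file.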

Install Pacemaker + Corosync (CentOS 7):

hostnamectl set-hostname "$(uname -n | sed 's/\..*//')"
yum install -y pcs policycoreutils-python psmisc
echo "passwd" | passwd hacluster --stdin
systemctl start pcsd.service
systemctl enable pcsd.service

Authenticate on node1:

pcs cluster auth node1 node2 -u hacluster -p passwd
pcs cluster setup --force --name mysql_cluster node1 node2
pcs cluster start --all
pcs status | grep UNCLEAN

Install DRBD/MariaDB:

rpm --import https://www.elrepo.org/RPM-GPG-KEY-elrepo.org
rpm -Uvh http://www.elrepo.org/elrepo-release-7.0-2.el7.elrepo.noarch.rpm
yum install -y kmod-drbd84 drbd84-utils mariadb-server mariadb
systemctl disable mariadb.service

cat << EOL > /etc/my.cnf
[mysqld]
symbolic-links=0
bind_address            = 0.0.0.0
datadir                 = /var/lib/mysql
pid_file                = /var/run/mariadb/mysqld.pid
socket                  = /var/run/mariadb/mysqld.sock

[mysqld_safe]
bind_address            = 0.0.0.0
datadir                 = /var/lib/mysql
pid_file                = /var/run/mariadb/mysqld.pid
socket                  = /var/run/mariadb/mysqld.sock

!includedir /etc/my.cnf.d
EOL

DRBD resource:

cat << EOL > /etc/drbd.d/mysql01.res
resource mysql01 {
protocol C;
meta-disk internal;
device /dev/drbd0;
disk   /dev/sdb1;
handlers {
 split-brain "/usr/lib/drbd/notify-split-brain.sh root";
}
net {
 allow-two-primaries no;
 after-sb-0pri discard-zero-changes;
 after-sb-1pri discard-secondary;
 after-sb-2pri disconnect;
 rr-conflict disconnect;
}
disk {
 on-io-error detach;
}
syncer {
 verify-alg sha1;
}
on node1 {
 address  192.168.1.216:7788;
}
on node2 {
 address  192.168.1.220:7788;
}
}
EOL

fdisk /dev/sdb

drbdadm create-md mysql01
modprobe drbd
drbdadm up mysql01

drbdadm -- --overwrite-data-of-peer primary mysql01
drbdadm primary --force mysql01
watch cat /proc/drbd
mkfs.ext4 /dev/drbd0
mount /dev/drbd0 /mnt
df -h | grep drbd
umount /mnt
mount /dev/drbd0 /mnt # I Always get IO Errors so I just
drbdadm up mysql01 # Both nodes
watch cat /proc/drbd
mount /dev/drbd0 /mnt
df -h | grep drbd
systemctl start mariadb
mysql_install_db --datadir=/mnt --user=mysql
umount /mnt
systemctl stop mariadb
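The `watch cat /proc/drbd` steps above wait by eye for the initial sync to finish. A scriptable alternative is sketched below; it assumes the DRBD 8.4 /proc/drbd layout (lines such as ` 0: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate C r-----`), and `drbd_synced` / `wait_for_drbd_sync` are hypothetical helper names, not DRBD tools. The status file is a parameter so the check can be tried on a sample file.

```shell
#!/bin/sh
# Sketch: check whether a DRBD minor is connected and UpToDate on both sides.
# Assumes the DRBD 8.4 /proc/drbd format; helper names are hypothetical.

drbd_synced() {
    # $1: status file (default /proc/drbd), $2: minor number (default 0)
    file=${1:-/proc/drbd}
    minor=${2:-0}
    # Select the line for this minor, then require Connected + UpToDate/UpToDate.
    grep -E "^ *${minor}: " "$file" | grep -q 'cs:Connected.*ds:UpToDate/UpToDate'
}

wait_for_drbd_sync() {
    # Poll every 5 seconds; give up (return 1) after $3 attempts (default 120).
    file=${1:-/proc/drbd}
    minor=${2:-0}
    tries=${3:-120}
    while [ "$tries" -gt 0 ]; do
        drbd_synced "$file" "$minor" && return 0
        tries=$((tries - 1))
        sleep 5
    done
    return 1
}
```

For example, `wait_for_drbd_sync || echo "mysql01 not in sync yet"` could be run on the primary before handing /dev/drbd0 to Pacemaker.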

Pacemaker/Corosync configuration:

pcs -f clust_cfg resource create mysql_data01 ocf:linbit:drbd \
 drbd_resource=mysql01 \
 op monitor interval=30s
pcs -f clust_cfg resource master MySQLClone01 mysql_data01 \
 master-max=1 master-node-max=1 \
 clone-max=2 clone-node-max=1 \
 notify=true
pcs -f clust_cfg resource create mysql_fs01 Filesystem \
 device="/dev/drbd0" \
 directory="/var/lib/mysql" \
 fstype="ext4"
pcs -f clust_cfg resource create mysql_service01 ocf:heartbeat:mysql \
 binary="/usr/bin/mysqld_safe" \
 config="/etc/my.cnf" \
 datadir="/var/lib/mysql" \
 pid="/var/lib/mysql/mysql.pid" \
 socket="/var/lib/mysql/mysql.sock" \
 additional_parameters="--bind-address=0.0.0.0" \
 op start timeout=60s \
 op stop timeout=60s \
 op monitor interval=20s timeout=30s
pcs -f clust_cfg resource create mysql_VIP01 ocf:heartbeat:IPaddr2 \
 ip=192.168.1.215 cidr_netmask=32 nic=eth0 \
 op monitor interval=30s
pcs -f clust_cfg constraint colocation add mysql_service01 with mysql_fs01 INFINITY
pcs -f clust_cfg constraint colocation add mysql_VIP01 with mysql_service01 INFINITY
pcs -f clust_cfg constraint colocation add mysql_fs01 with MySQLClone01 INFINITY with-rsc-role=Master
pcs -f clust_cfg constraint order mysql_service01 then mysql_VIP01
pcs -f clust_cfg constraint location mysql_fs01 prefers node1=50
pcs -f clust_cfg property set stonith-enabled=false
pcs -f clust_cfg property set no-quorum-policy=ignore
pcs -f clust_cfg resource defaults resource-stickiness=200
pcs -f clust_cfg resource group add SQL-Group mysql_service01 mysql_fs01 mysql_VIP01
pcs cluster cib-push clust_cfg
pcs status

Update:

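Would this be enough? I assume I want the replication before the FS and the FS before the service. Also, in the Apache configuration I copied from, I start the VIP before the web server, but the guide I followed for SQL starts the VIP first. Any thoughts?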

pcs -f clust_cfg constraint order promote MySQLClone01 then start mysql_fs01
pcs -f clust_cfg constraint order mysql_fs01 then mysql_service01

I'll test it and report back on whether it fixes it! Thanks

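This seems to have fixed the problem: failover happens as it should. I still get the errors, but like I said, it works fine! I don't like seeing errors, but failover takes about 2 seconds.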

pcs constraint order promote MySQLClone01 then start mysql_fs01
pcs constraint order mysql_service01 then mysql_fs01

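A group implies both ordering and colocation. So your group says, "start mysql, then mount the filesystem, then start the VIP". Not only is that the wrong order, it also contradicts your ordering constraints.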

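You should put everything except DRBD in the group, then add a single ordering constraint and a single colocation constraint that tie the group to wherever DRBD is Master.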
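To make the ordering point concrete, here is a toy model (plain shell, not Pacemaker code) of how a group treats its member list: members start in the order listed and stop in reverse. The function names are hypothetical; the two calls compare the group from the question with the member order used in the answer's configuration.

```shell
#!/bin/sh
# Toy model of Pacemaker group semantics: start in listed order,
# stop in reverse order. Function names are hypothetical.

group_start_order() {
    out=""
    for r in "$@"; do
        out="${out:+$out -> }$r"   # append each member
    done
    printf '%s\n' "$out"
}

group_stop_order() {
    out=""
    for r in "$@"; do
        out="$r${out:+ -> $out}"   # prepend each member, reversing the list
    done
    printf '%s\n' "$out"
}

echo "question's group, start: $(group_start_order mysql_service01 mysql_fs01 mysql_VIP01)"
echo "answer's group, start:   $(group_start_order mysql_fs01 mysql_service01 mysql_VIP01)"
echo "answer's group, stop:    $(group_stop_order mysql_fs01 mysql_service01 mysql_VIP01)"
```

With the question's member list, mysqld would be started before /var/lib/mysql is mounted, which matches the "not running" monitor failures and the mount error in the failed actions above.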

The order in which you add constraints to the cluster has absolutely no effect on the outcome.

Based on what you have there, it would look like this:

# pcs -f clust_cfg resource create mysql_data01 ocf:linbit:drbd \
 drbd_resource=mysql01 op monitor interval=30s
# pcs -f clust_cfg resource master MySQLClone01 mysql_data01 \
 master-max=1 master-node-max=1 clone-max=2 clone-node-max=1 \
 notify=true
# pcs -f clust_cfg resource create mysql_fs01 Filesystem \
 device="/dev/drbd0" directory="/var/lib/mysql" fstype="ext4"
# pcs -f clust_cfg resource create mysql_service01 ocf:heartbeat:mysql \
 binary="/usr/bin/mysqld_safe" config="/etc/my.cnf" \
 datadir="/var/lib/mysql" pid="/var/lib/mysql/mysql.pid" \
 socket="/var/lib/mysql/mysql.sock" \
 additional_parameters="--bind-address=0.0.0.0" \
 op start timeout=60s op stop timeout=60s \
 op monitor interval=20s timeout=30s
# pcs -f clust_cfg resource create mysql_VIP01 ocf:heartbeat:IPaddr2 \
 ip=192.168.1.215 cidr_netmask=32 nic=eth0 op monitor interval=30s
# pcs -f clust_cfg resource group add SQL-Group mysql_fs01 \
 mysql_service01 mysql_VIP01
# pcs -f clust_cfg constraint order promote MySQLClone01 \
 then start SQL-Group
# pcs -f clust_cfg constraint colocation add SQL-Group with MySQLClone01 INFINITY with-rsc-role=Master
# pcs cluster cib-push clust_cfg

Source: https://unix.stackexchange.com/questions/289568
