Linux
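All servers crash with a core dump when I start corosync
I upgraded my servers, then started the corosync service on them one at a time. I started it on 3 servers first and waited 5 minutes. Then I started corosync on the next 4 servers, and all 7 servers crashed at the same moment. I have been using corosync for 5 years. Before the upgrade I was using: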
Kernel: 4.14.32-1-lts Corosync 2.4.2-1 Pacemaker 1.1.18-1
I have never seen this before. My guess is that something is badly broken in the new corosync version. After the upgrade I am running:
Kernel: 4.14.70-1-lts Corosync 2.4.4-3 Pacemaker 2.0.0-1
Here is my corosync.conf: https://paste.ubuntu.com/p/7KCq8pHKn3/
Can you tell me how to find the cause of the problem?
Sep 25 08:56:03 SRV-2 corosync[29089]: [TOTEM ] A new membership (10.10.112.10:56) was formed. Members joined: 7
Sep 25 08:56:03 SRV-2 corosync[29089]: [VOTEQ ] Waiting for all cluster members. Current votes: 7 expected_votes: 28
Sep 25 08:56:03 SRV-2 corosync[29089]: [VOTEQ ] Waiting for all cluster members. Current votes: 7 expected_votes: 28
Sep 25 08:56:03 SRV-2 corosync[29089]: [VOTEQ ] Waiting for all cluster members. Current votes: 7 expected_votes: 28
Sep 25 08:56:03 SRV-2 corosync[29089]: [VOTEQ ] Waiting for all cluster members. Current votes: 7 expected_votes: 28
Sep 25 08:56:03 SRV-2 corosync[29089]: [QUORUM] Members[7]: 1 2 3 4 5 6 7
Sep 25 08:56:03 SRV-2 corosync[29089]: [MAIN ] Completed service synchronization, ready to provide service.
Sep 25 08:56:03 SRV-2 corosync[29089]: [VOTEQ ] Waiting for all cluster members. Current votes: 7 expected_votes: 28
Sep 25 08:56:03 SRV-2 systemd[1]: Created slice system-systemd\x2dcoredump.slice.
Sep 25 08:56:03 SRV-2 systemd[1]: Started Process Core Dump (PID 43798/UID 0).
Sep 25 08:56:03 SRV-2 systemd[1]: corosync.service: Main process exited, code=dumped, status=11/SEGV
Sep 25 08:56:03 SRV-2 systemd[1]: corosync.service: Failed with result 'core-dump'.
Sep 25 08:56:03 SRV-2 kernel: watchdog: watchdog0: watchdog did not stop!
Sep 25 08:56:03 SRV-2 systemd-coredump[43799]: Process 29089 (corosync) of user 0 dumped core.

Stack trace of thread 29089:
#0 0x0000000000000000 n/a (n/a)
Write failed: Broken pipe

coredumpctl info
           PID: 23658 (corosync)
           UID: 0 (root)
           GID: 0 (root)
        Signal: 11 (SEGV)
     Timestamp: Mon 2018-09-24 09:50:58 +03 (1 day 3h ago)
  Command Line: corosync
    Executable: /usr/bin/corosync
 Control Group: /system.slice/corosync.service
          Unit: corosync.service
         Slice: system.slice
       Boot ID: 79d67a83f83c4804be6ded8e6bd5f54d
    Machine ID: 9b1ca27d3f4746c6bcfcdb93b83f3d45
      Hostname: SRV-1
       Storage: /var/lib/systemd/coredump/core.corosync.0.79d67a83f83c4804be6ded8e6bd5f54d.23658.153777185>
       Message: Process 23658 (corosync) of user 0 dumped core.

                Stack trace of thread 23658:
                #0 0x0000000000000000 n/a (n/a)

           PID: 5164 (corosync)
           UID: 0 (root)
           GID: 0 (root)
        Signal: 11 (SEGV)
     Timestamp: Tue 2018-09-25 08:56:03 +03 (4h 9min ago)
  Command Line: corosync
    Executable: /usr/bin/corosync
 Control Group: /system.slice/corosync.service
          Unit: corosync.service
         Slice: system.slice
       Boot ID: 2f49ec6cdcc144f0a8eb712bbfbd7203
    Machine ID: 9b1ca27d3f4746c6bcfcdb93b83f3d45
      Hostname: SRV-1
       Storage: /var/lib/systemd/coredump/core.corosync.0.2f49ec6cdcc144f0a8eb712bbfbd7203.5164.1537854963>
       Message: Process 5164 (corosync) of user 0 dumped core.

                Stack trace of thread 5164:
                #0 0x0000000000000000 n/a (n/a)
I could not find any more logs, so I cannot dig any deeper into the problem.
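With so little to go on, one small step is to pull the key facts out of the journal lines automatically, so the same check can be run on every node. The following is only a minimal sketch in Python: the function name `summarize` and the two regular expressions are assumptions of mine, written against the exact wording of the log lines quoted above, and may need adjusting for other corosync or systemd versions.

```python
import re

# Patterns match the wording of the journal lines quoted in this post.
VOTES_RE = re.compile(r"Current votes: (\d+) expected_votes: (\d+)")
EXIT_RE = re.compile(r"Main process exited, code=(\w+), status=(\S+)")

def summarize(lines):
    """Return the last reported (current, expected) votes and any abnormal exits."""
    votes = None
    exits = []
    for line in lines:
        m = VOTES_RE.search(line)
        if m:
            # Keep only the most recent vote report.
            votes = (int(m.group(1)), int(m.group(2)))
        m = EXIT_RE.search(line)
        if m:
            # Record the exit code and status of each service exit.
            exits.append((m.group(1), m.group(2)))
    return votes, exits

sample = [
    "Sep 25 08:56:03 SRV-2 corosync[29089]: [VOTEQ ] Waiting for all cluster members. Current votes: 7 expected_votes: 28",
    "Sep 25 08:56:03 SRV-2 systemd[1]: corosync.service: Main process exited, code=dumped, status=11/SEGV",
]
votes, exits = summarize(sample)
print(votes, exits)
```

In practice the input would be the saved output of `journalctl -u corosync` from each node rather than the hard-coded `sample` list. This does not explain the segfault, but it makes the vote counts and the exit status easy to compare across servers.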
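The problem was solved after downgrading to "corosync 2.4.2-1". Why are you voting "-" on this topic? As you can see, it is clear that this is the fault of either corosync or the Arch package builders.
If you run into this problem, just downgrade and save yourself the time.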