Zookeeper DNS names and leader election problems when migrating from Windows to Debian
I am migrating a kafka/zookeeper cluster from Windows to Debian wheezy.
- Java version: 1.7.0_80
- Debian version: 7.9
- Zookeeper version: 3.3.5+dfsg1-2 0
- Kafka version: 2.10-0.8.2.1
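If I configure zookeeper on the Debian servers using the IP addresses of the other Debian servers, everything works. If I use DNS names instead, leader election fails on the Debian servers.
On the Debian servers I can look up the IP of any other Debian server with the `host` command, so DNS resolution itself works.
Everything is automated: server creation, Debian installation, zookeeper installation, and zookeeper configuration. The window for manual configuration errors is therefore minimal, and the setup is easy to reproduce or change.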
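Setting `clientPortAddress=DNSNAME` makes no difference; it still fails. Nothing is configured in iptables, and there is no firewall between these servers. In what follows, servers 1-3 are Windows 2012R2 servers and servers 4-6 are Debian servers.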
This configuration works:

    server.1=testkafka400:2888:3888
    server.2=testkafka401:2888:3888
    server.3=testkafka402:2888:3888
    server.4=10.1.132.152:2888:3888
    server.5=10.1.132.153:2888:3888
    server.6=10.1.132.154:2888:3888

This configuration does not work:

    server.1=testkafka400:2888:3888
    server.2=testkafka401:2888:3888
    server.3=testkafka402:2888:3888
    server.4=testkafka403:2888:3888
    server.5=testkafka404:2888:3888
    server.6=testkafka405:2888:3888
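When I use DNS names I get the following output; the exceptions just keep repeating. Note that, for testing, the log below comes from a cluster made up of the Debian servers only, configured with DNS names. If I switch to IPs, the cluster works and an election can be held.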

    [2015-11-03 13:55:52,309] INFO Reading configuration from: /etc/zookeeper/config/zookeeper.properties (org.apache.zookeeper.server.quorum.QuorumPeerConfig)
    [2015-11-03 13:55:52,322] INFO Defaulting to majority quorums (org.apache.zookeeper.server.quorum.QuorumPeerConfig)
    [2015-11-03 13:55:52,344] INFO autopurge.snapRetainCount set to 3 (org.apache.zookeeper.server.DatadirCleanupManager)
    [2015-11-03 13:55:52,344] INFO autopurge.purgeInterval set to 24 (org.apache.zookeeper.server.DatadirCleanupManager)
    [2015-11-03 13:55:52,345] INFO Purge task started. (org.apache.zookeeper.server.DatadirCleanupManager)
    [2015-11-03 13:55:52,454] INFO Purge task completed. (org.apache.zookeeper.server.DatadirCleanupManager)
    [2015-11-03 13:55:52,472] INFO Starting quorum peer (org.apache.zookeeper.server.quorum.QuorumPeerMain)
    [2015-11-03 13:55:52,581] INFO binding to port 0.0.0.0/0.0.0.0:2181 (org.apache.zookeeper.server.NIOServerCnxnFactory)
    [2015-11-03 13:55:52,601] INFO tickTime set to 3000 (org.apache.zookeeper.server.quorum.QuorumPeer)
    [2015-11-03 13:55:52,601] INFO minSessionTimeout set to -1 (org.apache.zookeeper.server.quorum.QuorumPeer)
    [2015-11-03 13:55:52,601] INFO maxSessionTimeout set to -1 (org.apache.zookeeper.server.quorum.QuorumPeer)
    [2015-11-03 13:55:52,601] INFO initLimit set to 20 (org.apache.zookeeper.server.quorum.QuorumPeer)
    [2015-11-03 13:55:52,626] INFO Reading snapshot /etc/zookeeper/data/version-2/snapshot.0 (org.apache.zookeeper.server.persistence.FileSnap)
    [2015-11-03 13:55:52,675] INFO My election bind port: testkafka403.prod.local/127.0.1.1:3888 (org.apache.zookeeper.server.quorum.QuorumCnxManager)
    [2015-11-03 13:55:52,713] INFO LOOKING (org.apache.zookeeper.server.quorum.QuorumPeer)
    [2015-11-03 13:55:52,715] INFO New election. My id = 4, proposed zxid=0x100000014 (org.apache.zookeeper.server.quorum.FastLeaderElection)
    [2015-11-03 13:55:52,717] INFO Notification: 1 (message format version), 4 (n.leader), 0x100000014 (n.zxid), 0x1 (n.round), LOOKING (n.state), 4 (n.sid), 0x1 (n.peerEpoch) LOOKING (my state) (org.apache.zookeeper.server.quorum.FastLeaderElection)
    [2015-11-03 13:55:52,732] WARN Cannot open channel to 5 at election address testkafka404.prod.local/10.1.132.153:3888 (org.apache.zookeeper.server.quorum.QuorumCnxManager)
    java.net.SocketTimeoutException
        at java.net.SocksSocketImpl.remainingMillis(SocksSocketImpl.java:111)
        at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
        at java.net.Socket.connect(Socket.java:579)
        at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:368)
        at org.apache.zookeeper.server.quorum.QuorumCnxManager.toSend(QuorumCnxManager.java:341)
        at org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.process(FastLeaderElection.java:449)
        at org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.run(FastLeaderElection.java:430)
        at java.lang.Thread.run(Thread.java:745)
    [2015-11-03 13:55:52,737] WARN Cannot open channel to 6 at election address testkafka405.prod.local/10.1.132.154:3888 (org.apache.zookeeper.server.quorum.QuorumCnxManager)
    java.net.ConnectException: Connection refused
        at java.net.PlainSocketImpl.socketConnect(Native Method)
        at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339)
        at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:200)
        at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182)
        at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
        at java.net.Socket.connect(Socket.java:579)
        at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:368)
        at org.apache.zookeeper.server.quorum.QuorumCnxManager.toSend(QuorumCnxManager.java:341)
        at org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.process(FastLeaderElection.java:449)
        at org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.run(FastLeaderElection.java:430)
        at java.lang.Thread.run(Thread.java:745)
    [2015-11-03 13:55:52,919] WARN Cannot open channel to 6 at election address testkafka405.prod.local/10.1.132.154:3888 (org.apache.zookeeper.server.quorum.QuorumCnxManager)
    java.net.ConnectException: Connection refused
        at java.net.PlainSocketImpl.socketConnect(Native Method)
        at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339)
        at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:200)
        at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182)
        at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
        at java.net.Socket.connect(Socket.java:579)
        at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:368)
        at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectAll(QuorumCnxManager.java:402)
        at org.apache.zookeeper.server.quorum.FastLeaderElection.lookForLeader(FastLeaderElection.java:840)
        at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:762)

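We would really like to be able to use DNS names, but we don't know where to start looking for a solution. Perhaps we missed installing or enabling some important Debian or Java feature?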
OK, so I think I know what is going on here. I saw the same problem while trying to set up a 3-node Spring-XD cluster in Vagrant on Linux VMs.
This configuration worked:

    server.1=172.28.128.3:2888:3888
    server.2=172.28.128.4:2888:3888
    server.3=172.28.128.7:2888:3888

But this one did not:

    server.1=spring-xd-1:2888:3888
    server.2=spring-xd-2:2888:3888
    server.3=spring-xd-3:2888:3888
The "smoking gun" is this line in my zookeeper log:

    2015-11-26 20:48:31,439 [myid:1] - INFO [Thread-2:QuorumCnxManager$Listener@504] - My election bind port: spring-xd-1/127.0.0.1:3888

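The log in the question shows the same pattern (`My election bind port: testkafka403.prod.local/127.0.1.1:3888`), so this symptom is easy to check for mechanically. Below is a rough sketch in Python, not anything that ships with Zookeeper: the function name `loopback_election_binds` is made up for illustration, and the regular expression only assumes the "My election bind port: host/ip:port" shape seen in the two log lines quoted in this thread (IPv4 only).

```python
import ipaddress
import re

# Assumed message shape, taken from the two log lines quoted in this thread:
#   "My election bind port: <host>/<ipv4>:<port>"
BIND_RE = re.compile(
    r"My election bind port: (?P<host>[^/\s]*)/(?P<ip>\d+\.\d+\.\d+\.\d+):(?P<port>\d+)"
)


def loopback_election_binds(log_lines):
    """Return (host, ip, port) for every election listener bound to a loopback address."""
    hits = []
    for line in log_lines:
        match = BIND_RE.search(line)
        if match and ipaddress.ip_address(match.group("ip")).is_loopback:
            hits.append((match.group("host"), match.group("ip"), int(match.group("port"))))
    return hits


# The two bind lines from this thread; both land in 127.0.0.0/8.
sample = [
    "[2015-11-03 13:55:52,675] INFO My election bind port: "
    "testkafka403.prod.local/127.0.1.1:3888 (org.apache.zookeeper.server.quorum.QuorumCnxManager)",
    "2015-11-26 20:48:31,439 [myid:1] - INFO [Thread-2:QuorumCnxManager$Listener@504] - "
    "My election bind port: spring-xd-1/127.0.0.1:3888",
]
for host, ip, port in loopback_election_binds(sample):
    print(f"{host} is listening for elections on loopback address {ip}:{port}")
```

Any hit means the other nodes cannot reach that election port, which is consistent with the `Connection refused` errors in the question.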
So why is Zookeeper binding the election port on the loopback interface? Well...
My `/etc/hosts` on one of the VMs looked like this:

    127.0.0.1 spring-xd-1 localhost localhost.localdomain localhost4 localhost4.localdomain4
    ::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
    ## vagrant-hostmanager-start
    172.28.128.3 spring-xd-1
    172.28.128.4 spring-xd-2
    172.28.128.7 spring-xd-3
    ## vagrant-hostmanager-end

I removed the hostname from the `127.0.0.1` line in `/etc/hosts` and bounced the zookeeper service on all 3 nodes, and **BAM!** everything came up roses. So now the hosts file on each machine looks like this:

    127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
    ::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
    ## vagrant-hostmanager-start
    172.28.128.3 spring-xd-1
    172.28.128.4 spring-xd-2
    172.28.128.7 spring-xd-3
    ## vagrant-hostmanager-end

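Since the setup in the question is fully automated, a check like this could be scripted per node. Here is a small sketch under my own assumptions: the helper name `names_on_loopback` is hypothetical, and treating anything starting with `localhost` or `ip6-` as a harmless alias is a guess about typical hosts files, not a rule from any documentation.

```python
import ipaddress


def names_on_loopback(hosts_text):
    """Return names, other than localhost-style aliases, mapped to a loopback address."""
    offenders = []
    for raw in hosts_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        addr, *names = line.split()
        try:
            if not ipaddress.ip_address(addr).is_loopback:
                continue
        except ValueError:
            continue  # first field is not a plain IP address; skip the line
        # Assumption: localhost* and ip6-* names are expected on loopback lines.
        offenders += [n for n in names if not n.startswith(("localhost", "ip6-"))]
    return offenders


# The hosts file from before the fix, as quoted above.
before_fix = """\
127.0.0.1 spring-xd-1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
## vagrant-hostmanager-start
172.28.128.3 spring-xd-1
172.28.128.4 spring-xd-2
172.28.128.7 spring-xd-3
## vagrant-hostmanager-end
"""
print(names_on_loopback(before_fix))
```

Run against the corrected file, the same function returns an empty list. On a real node you would pass it the contents of `/etc/hosts`.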
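I'm guessing you don't see this problem on Windows because the hosts file (`C:\Windows\System32\drivers\etc\hosts`) has no `127.0.0.1` entry by default. You should be able to reproduce the problem on Windows by adding a similar line. I would call this a Zookeeper bug. Editing the hosts file was enough to prove the problem and to fix it in Vagrant, but I wouldn't recommend it for any "real" environment.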
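This would also explain why the `host` check in the question looked fine: `host` queries the DNS servers directly, whereas an application normally resolves names through the operating system's resolver, which on a typical Linux setup consults `/etc/hosts` first. A quick way to see what a process gets for a name is sketched below. It uses Python's `socket.gethostbyname` as a stand-in for the JVM's lookup (an assumption on my part that both go through the system resolver), and the function name `resolves_to_loopback` is made up for this example.

```python
import ipaddress
import socket


def resolves_to_loopback(name, resolve=socket.gethostbyname):
    """True if `name` resolves to a loopback address.

    `resolve` defaults to the system resolver; it is a parameter only so the
    check can be exercised without touching the real hosts file or DNS.
    """
    return ipaddress.ip_address(resolve(name)).is_loopback
```

Calling `resolves_to_loopback(socket.gethostname())` on each node should return `False` once the hosts file is fixed. If it returns `True`, the node's own name points at loopback, which matches the symptom described above.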
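**Edit:** According to http://ccl.cse.nd.edu/operations/condor/hostname.shtml, this appears to be a fairly common problem for clustered applications on Linux, and the recommendation there is to edit the hosts file as I described above. However, the Zookeeper documentation on clustered setup doesn't mention it.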