Virtual machine running ntpd but not synchronizing

October 20, 2015

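TL;DR
The VM runs under KVM and its time is not synchronized. After it is paused for 2 minutes, it keeps a permanent 2-minute offset. Setting up another VM with a different network configuration showed that the network configuration was preventing ntp from working. Solving that network issue is off-topic here.
However, the new VM, which has no network issue, does not resynchronize after resume either. Same test: pause for 2 minutes, then compare the date against a correctly synchronized machine. The 2-minute lag is permanent.
This seems to be a common issue, and there is debate about how to keep VMs synchronized and whether to use NTP and kvm-clock together. I found plenty of references but no answer.
The problem
I have a Debian VM where ntpd is running but not correcting the time. For instance, after a pause/resume, I get a permanent 2-minute offset.
/etc/ntp.conf is the default or close to it, nothing special: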
# /etc/ntp.conf, configuration for ntpd; see ntp.conf(5) for help

driftfile /var/lib/ntp/ntp.drift


# Enable this if you want statistics to be logged.
#statsdir /var/log/ntpstats/

statistics loopstats peerstats clockstats
filegen loopstats file loopstats type day enable
filegen peerstats file peerstats type day enable
filegen clockstats file clockstats type day enable


# You do need to talk to an NTP server or two (or three).
#server ntp.your-provider.example

# pool.ntp.org maps to about 1000 low-stratum NTP servers.  Your server will
# pick a different set every time it starts up.  Please consider joining the
# pool: <http://www.pool.ntp.org/join.html>
server 0.debian.pool.ntp.org iburst
server 1.debian.pool.ntp.org iburst
server 2.debian.pool.ntp.org iburst
server 3.debian.pool.ntp.org iburst


# Access control configuration; see /usr/share/doc/ntp-doc/html/accopt.html for
# details.  The web page <http://support.ntp.org/bin/view/Support/AccessRestrictions>
# might also be helpful.
#
# Note that "restrict" applies to both servers and clients, so a configuration
# that might be intended to block requests from certain clients could also end
# up blocking replies from your own upstream servers.

# By default, exchange time with everybody, but don't allow configuration.
restrict -4 default kod notrap nomodify nopeer noquery
restrict -6 default kod notrap nomodify nopeer noquery

# Local users may interrogate the ntp server more closely.
restrict 127.0.0.1
restrict ::1

# Clients from this (example!) subnet have unlimited access, but only if
# cryptographically authenticated.
#restrict 192.168.123.0 mask 255.255.255.0 notrust


# If you want to provide time to your local subnet, change the next line.
# (Again, the address is an example only.)
#broadcast 192.168.123.255

# If you want to listen to time broadcasts on your local subnet, de-comment the
# next lines.  Please do this only if you trust everybody on the network!
#disable auth
#broadcastclient
ntpq seems to report an issue:
# ntpq -pn
    remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
37.187.7.160    .INIT.          16 u    - 1024    0    0.000    0.000   0.000
195.154.211.37  .INIT.          16 u    - 1024    0    0.000    0.000   0.000
195.154.216.44  .INIT.          16 u    - 1024    0    0.000    0.000   0.000
95.81.173.155   .INIT.          16 u    - 1024    0    0.000    0.000   0.000
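For reading the table above: the reach column is an 8-bit shift register printed in octal, with one bit per recent poll, so 0 means none of the last eight polls was answered and 377 means all of them were. A minimal Python sketch of that decoding follows; the function names are hypothetical and only illustrate the arithmetic.

```python
def decode_reach(reach_octal: str) -> list:
    """Decode the octal 'reach' column of ntpq -p into eight booleans.

    Index 0 is the most recent poll (least significant bit).
    """
    value = int(reach_octal, 8) & 0xFF
    return [bool((value >> i) & 1) for i in range(8)]


def summarize_reach(reach_octal: str) -> str:
    """Return a short human-readable summary of the reach register."""
    polls = decode_reach(reach_octal)
    return f"{sum(polls)}/8 recent polls answered"


if __name__ == "__main__":
    # Values taken from typical ntpq output: 0 (never reached), 377 (always).
    for reach in ("0", "1", "376", "377"):
        print(reach, summarize_reach(reach))
```

With reach at 0 on every line, as here, no server reply has been received at all.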
I'm no netcat wizard, but AFAIU outgoing traffic on UDP port 123 gets through:
# nc -vvzu 37.187.7.160 123
mail.lafkor.de [37.187.7.160] 123 (ntp) open
sent 0, rcvd 0
Is this test enough to rule out a firewall issue?
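Because UDP is connectionless, a zero-I/O scan that reports "open" with "sent 0, rcvd 0" does not show that an NTP reply can come back. A stricter check is to send a real 48-byte NTP client request and wait for an answer. The following is a minimal Python sketch of that idea, not a replacement for ntpdate; the helper names are hypothetical, and unlike ntpd it sends from an ephemeral source port, so a firewall that filters on source port 123 could still treat it differently.

```python
import socket

NTP_PORT = 123


def build_request() -> bytes:
    """Build a minimal 48-byte NTP client request.

    First byte: LI=0, VN=4, Mode=3 (client), i.e. 0b00_100_011 = 0x23.
    All other fields are left at zero.
    """
    return bytes([0x23]) + bytes(47)


def parse_reply(packet: bytes) -> dict:
    """Extract the header fields needed to recognise a server reply."""
    if len(packet) < 48:
        raise ValueError("short NTP packet")
    first = packet[0]
    return {
        "leap": first >> 6,
        "version": (first >> 3) & 0x7,
        "mode": first & 0x7,  # 4 means "server"
        "stratum": packet[1],
    }


def probe(server: str, timeout: float = 3.0):
    """Send one request; return the parsed reply, or None if nothing came back."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.sendto(build_request(), (server, NTP_PORT))
            packet, _ = sock.recvfrom(512)
        except OSError:  # timeouts and resolver errors are OSError subclasses
            return None
    return parse_reply(packet)


if __name__ == "__main__":
    print(probe("0.debian.pool.ntp.org"))
```

A result of None after the timeout would be consistent with requests or replies being dropped somewhere on the path.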
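The host (also a Debian machine) has the same NTP configuration and synchronizes fine. The two machines have different network configurations, which is why I think this could be a network issue.
Are there any other useful tests I could run?
I don't think the tinker panic 0 parameter is relevant here, since it is meant to force updates across huge gaps, not a 2-minute one. Anyway, AFAIU, it affects behaviour when there is a time offset, but it would not fix the fact that ntpq -pn returns only zeros.
FWIW, here are other test outputs inspired by this question: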
# ntpq
ntpq> pe
    remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
mail.lafkor.de  .INIT.          16 u    - 1024    0    0.000    0.000   0.000
atoll.tropicdre .INIT.          16 u    - 1024    0    0.000    0.000   0.000
oods.roflcopter .INIT.          16 u    - 1024    0    0.000    0.000   0.000
ntp-3.arkena.ne .INIT.          16 u    - 1024    0    0.000    0.000   0.000
ntpq> as

ind assid status  conf reach auth condition  last_event cnt
===========================================================
 1 21025  8011   yes    no  none    reject    mobilize  1
 2 21026  8011   yes    no  none    reject    mobilize  1
 3 21027  8011   yes    no  none    reject    mobilize  1
 4 21028  8011   yes    no  none    reject    mobilize  1
ntpq> rv
associd=0 status=c012 leap_alarm, sync_unspec, 1 event, freq_set,
version="ntpd 4.2.6p5@1.2349-o Fri Apr 10 19:04:04 UTC 2015 (1)",
processor="x86_64", system="Linux/3.16.0-4-amd64", leap=11, stratum=16,
precision=-23, rootdelay=0.000, rootdisp=6683.055, refid=INIT,
reftime=00000000.00000000  Mon, Jan  1 1900  0:09:21.000,
clock=d9b51587.b7a1085f  Tue, Sep 29 2015 15:49:59.717, peer=0, tc=3,
mintc=3, offset=0.000, frequency=-0.125, sys_jitter=0.000,
clk_jitter=0.000, clk_wander=0.000
ntpq> rv 21025
associd=21025 status=8011 conf, sel_reject, 1 event, mobilize,
srcadr=mail.lafkor.de, srcport=123, dstadr=147.210.157.185, dstport=123,
leap=11, stratum=16, precision=-23, rootdelay=0.000, rootdisp=0.000,
refid=INIT, reftime=00000000.00000000  Mon, Jan  1 1900  0:09:21.000,
rec=00000000.00000000  Mon, Jan  1 1900  0:09:21.000, reach=000,
unreach=1137, hmode=3, pmode=0, hpoll=10, ppoll=10, headway=0,
flash=1600 peer_stratum, peer_dist, peer_unreach, keyid=0, offset=0.000,
delay=0.000, dispersion=15937.500, jitter=0.000, xleave=0.167,
filtdelay=     0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00,
filtoffset=    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00,
filtdisp=   16000.0 16000.0 16000.0 16000.0 16000.0 16000.0 16000.0 16000.0
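The peer status word 8011 shown by as and rv 21025 packs several fields into 16 bits. As far as I understand the ntpd status-word layout, the high byte holds flag bits plus a 3-bit selection code, and the low byte holds an event count and an event code. The Python sketch below decodes only the fields whose names appear in the output above (conf, reach, sel_reject, mobilize); the function name is hypothetical and the bit layout is my reading of the ntpd documentation, so treat it as an assumption.

```python
def decode_peer_status(status_hex: str) -> dict:
    """Split an ntpq peer status word (e.g. '8011') into its fields.

    Assumed layout: high byte = flags (top 5 bits) + selection (low 3 bits),
    low byte = event count (high nibble) + event code (low nibble).
    """
    word = int(status_hex, 16)
    high = (word >> 8) & 0xFF
    return {
        "configured": bool(high & 0x80),  # 'conf' in rv output
        "reachable": bool(high & 0x10),   # 'reach' column in 'as' output
        "select": high & 0x07,            # 0 is printed as sel_reject
        "event_count": (word >> 4) & 0xF,
        "event_code": word & 0xF,         # 1 is printed as mobilize
    }


if __name__ == "__main__":
    print(decode_peer_status("8011"))
```

For 8011 this gives a configured but unreachable association, rejected by the selection algorithm, whose only event so far is the initial mobilize, which matches the "conf yes, reach no, reject, mobilize" row in the as table.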
tcpdump / ntpdate tests
On a machine where NTP synchronization works, I start tcpdump udp port ntp, and when I restart ntpd I see this kind of output:
# tcpdump udp port ntp
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth0, link-type EN10MB (Ethernet), capture size 262144 bytes
17:31:33.719166 IP 10.0.2.15.ntp > spica.beduzar.fr.ntp: NTPv4, Client, length 48
17:31:33.736804 IP spica.beduzar.fr.ntp > 10.0.2.15.ntp: NTPv4, Server, length 48
17:31:35.973551 IP 10.0.2.15.ntp > ntp.tuxfamily.net.ntp: NTPv4, Client, length 48
17:31:35.992671 IP ntp.tuxfamily.net.ntp > 10.0.2.15.ntp: NTPv4, Server, length 48
[...]
On the machine with the issue, I see no output at all when restarting ntpd (no request, no reply). Shouldn't I at least see the requests?
On the good machine:
# ntpdate 0.debian.pool.ntp.org
29 Sep 17:24:49 ntpdate[700]: adjust time server 193.55.167.1 offset -0.005196 sec
On the bad machine:
# ntpdate 0.debian.pool.ntp.org
29 Sep 17:43:18 ntpdate[3180]: no server suitable for synchronization found
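Test with another VM
We set up another VM with the same NTP configuration but a different network configuration.
On this one, the tcpdump and ntpdate results are correct, and ntpq -pn returns good results. Apparently, the network configuration is indeed an issue on the faulty VM.
However, the new VM does not synchronize either. If I pause it so that it lags by about 100 seconds, it does not resynchronize (I mean that after a few minutes, the gap is still the same number of seconds). When ntpd is restarted, however, it synchronizes immediately.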
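The size of such a gap is what the offset figure in ntpq and ntpdate output measures. It is derived from four timestamps of one request/reply exchange using the standard NTP formulas. A minimal Python sketch, with a hypothetical function name and made-up timestamps for a client lagging by 100 seconds:

```python
def offset_and_delay(t1: float, t2: float, t3: float, t4: float):
    """Compute clock offset and round-trip delay from one NTP exchange.

    t1: client send time (client clock)
    t2: server receive time (server clock)
    t3: server send time (server clock)
    t4: client receive time (client clock)
    A positive offset means the server clock is ahead of the client clock.
    """
    offset = ((t2 - t1) + (t3 - t4)) / 2
    delay = (t4 - t1) - (t3 - t2)
    return offset, delay


if __name__ == "__main__":
    # Illustrative numbers only: client is 100 s behind, 10 ms each way,
    # 10 ms of processing on the server.
    offset, delay = offset_and_delay(1000.00, 1100.01, 1100.02, 1000.03)
    print(f"offset={offset:.3f}s delay={delay:.3f}s")
```

If the offset reported a few minutes apart stays at the same value, the clock is not being corrected yet.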
I seem to have two issues:
1. The network configuration on the first VM
2. ntp does not synchronize on either VM (unless restarted)

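Problem solved.
Network issue
The VM had a network issue preventing ntpd from succeeding. It has two eth interfaces, one of them with a gateway going through a router we do not manage directly. Although my tests did not show it, I guess some UDP frames were blocked. We set up another VM with a different network configuration, and ntpq produced better results.
Ultimately, we changed the ntp configuration so that the host broadcasts the time locally and all VMs synchronize to it. This makes more sense and minimizes the load on public ntp servers.
ntpd sets the clock in one step after a few minutes
One thing that may have misled me during my tests is that ntpd does not synchronize immediately. I thought it would detect the gap right away and then adjust the clock speed so that the clock gradually catches up with the source. In fact, we noticed that (unless ntpd is restarted) the clock does not change for a few minutes, and then all of a sudden it is set to the correct time in one step. Meanwhile, the rightmost columns of the ntpq output show that synchronization is in progress.
This ntpd behaviour probably explains why I thought ntpd was not working even when it was. I just didn't wait long enough, and I didn't understand the ntpq output.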
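The observed behaviour fits ntpd's documented defaults: offsets below the 128 ms step threshold are slewed gradually (at no more than 500 ppm), larger offsets are corrected with a single step once they have persisted for a "stepout" interval of some minutes (the exact default depends on the ntpd version), and offsets above the 1000 s panic threshold make ntpd give up unless tinker panic 0 or -g is used. The Python sketch below only illustrates that arithmetic with hypothetical function names; it is not how ntpd is implemented.

```python
STEP_THRESHOLD_S = 0.128    # default step threshold
PANIC_THRESHOLD_S = 1000.0  # default panic threshold
MAX_SLEW_PPM = 500.0        # maximum slew rate


def classify_offset(offset_s: float) -> str:
    """Say how an offset of this size is handled under default settings."""
    magnitude = abs(offset_s)
    if magnitude > PANIC_THRESHOLD_S:
        return "panic"
    if magnitude > STEP_THRESHOLD_S:
        return "step"
    return "slew"


def slew_duration_s(offset_s: float, rate_ppm: float = MAX_SLEW_PPM) -> float:
    """Seconds needed to remove an offset by slewing alone at rate_ppm."""
    return abs(offset_s) * 1e6 / rate_ppm


if __name__ == "__main__":
    gap = 120.0  # the 2-minute gap left by a pause/resume
    print(classify_offset(gap))
    print(f"{slew_duration_s(gap) / 86400:.1f} days if slewed instead")
```

A 2-minute gap is therefore in the "step" range, well under the panic threshold, and slewing it away at 500 ppm would take close to three days, which is why the correction arrives as a single jump rather than a gradual drift.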

Source: https://unix.stackexchange.com/questions/232770
