Synchronization
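Problem with chrony 3.2 synchronizing to a pool of NTP servers
I have a problem similar to "Chrony 3.1 refuses to synchronize with ntp server".
Scenario:
A newly installed server running SLES15 SP2 uses chrony 3.2. I have configured two pools of NTP servers running the official ntpd 4.2.8p15 (all on the intranet).
Problem:
Chrony "pulls" the servers from the pool, but it never gets a response from them, and I wonder why. Is the problem in chrony, in ntpd, or in my setup?
Debugging:
(I am using a hacked version of tcpdump that improves the decoding of NTP packets.) A request from ntpd looks like this (it is actually an anycast request, monitored remotely):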
    10:22:29.373395 IP (tos 0xb8, ttl 4, id 21390, offset 0, flags [DF], proto UDP (17), length 100)
        172.20.16.13.123 > 239.192.123.21.123: [udp sum ok] NTP leap indicator=0 (Nominal), Version=4, Mode=3 (Client), length=72
            Stratum 2 (secondary reference), poll 6 (64s), precision -24
            Root Delay: 0.000106, Root dispersion: 0.004196, Reference-ID: 0xac140219
              Reference Timestamp:  3808714798.372973455 (2020-09-10T08:19:58.372973)
              Originator Timestamp: 0.000000000
              Receive Timestamp:    0.000000000
              Transmit Timestamp:   3808714949.372178320 (2020-09-10T08:22:29.372178)
            MAC: Key ID: 421, SHA1-Digest=48d73ad9 5b1d2401 9a8d3c02 91b849cb 28400475
In comparison, the queries from chrony (monitored locally) look like this:
    08:52:33.338684 IP (tos 0x0, ttl 64, id 4141, offset 0, flags [DF], proto UDP (17), length 76)
        h31.51625 > h03.ntp: [bad udp cksum 0x7894 -> 0xea6e!] NTPv4, length 48
            Client, Leap indicator: (0), Stratum 0 (unspecified), poll 10 (1024s), precision 32
            Root Delay: 0.000000, Root dispersion: 0.000000, Reference-ID: (unspec)
              Reference Timestamp:  0.000000000
              Originator Timestamp: 0.000000000
              Receive Timestamp:    0.000000000
              Transmit Timestamp:   502153526.517788040 (2052/01/06 06:33:42)
                Originator - Receive Timestamp:  0.000000000
                Originator - Transmit Timestamp: 502153526.517788040 (2052/01/06 06:33:42)

    10:12:22.173989 IP (tos 0x0, ttl 64, id 58250, offset 0, flags [DF], proto UDP (17), length 76)
        h31.39573 > nm1.ntp: [bad udp cksum 0x6a92 -> 0x02d5!] NTP leap indicator=0 (Nominal), Version=4, Mode=3 (Client), length=48
            Stratum 0 (unspecified), poll 9 (512s), precision 32
            Root Delay: 0.000000, Root dispersion: 0.000000, Reference-ID: 00000000
              Reference Timestamp:  0.000000000
              Originator Timestamp: 0.000000000
              Receive Timestamp:    0.000000000
              Transmit Timestamp:   1885145870.079837521 (2095-11-03T02:06:06.079838)
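At the very least the transmit timestamp looks odd, and I don't know whether the other fields are valid.
The problem may be chrony's request packets, but it could also be some filtering on the servers that causes the requests to be ignored. I have verified that the packets reach at least one pool server, but I see no response.
In fact, one server outside the pool (the one in the last packet shown) responds as follows, preserving the odd originate timestamp: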
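To sanity-check the timestamps in these dumps, the seconds field can be converted by hand. The sketch below is only an illustration: the function name is made up, and it assumes the common convention that a 32-bit seconds value with its top bit clear belongs to the NTP era that starts in 2036. Under that assumption it reproduces the dates tcpdump printed for the ntpd request and for the second chrony request.

```python
from datetime import datetime, timedelta, timezone

# NTP timestamps count seconds from 1900-01-01 00:00:00 UTC.
NTP_EPOCH = datetime(1900, 1, 1, tzinfo=timezone.utc)


def ntp_seconds_to_utc(seconds):
    """Convert the 32-bit seconds part of an NTP timestamp to a UTC datetime.

    Assumption: values with the most significant bit clear are treated as
    belonging to the era that starts in 2036 (seconds + 2**32).
    """
    if not 0 <= seconds < 2**32:
        raise ValueError("seconds must fit in 32 bits")
    if seconds < 2**31:
        seconds += 2**32
    return NTP_EPOCH + timedelta(seconds=seconds)


# Transmit timestamp of the ntpd request shown above.
print(ntp_seconds_to_utc(3808714949).isoformat())
# Transmit timestamp of the second chrony request shown above.
print(ntp_seconds_to_utc(1885145870).isoformat())
```

The first value lands on the day the capture was taken, while the second lands decades in the future, which is what makes the chrony field look odd. The source does not say why chrony sends such a value.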
    10:12:22.174191 IP (tos 0xb8, ttl 63, id 30184, offset 0, flags [DF], proto UDP (17), length 76)
        nm1.ntp > h31.39573: [udp sum ok] NTP leap indicator=0 (Nominal), Version=4, Mode=4 (Server), length=48
            Stratum 3 (secondary reference), poll 9 (512s), precision -23
            Root Delay: 0.000518, Root dispersion: 0.025527, Reference-ID: 0xac141002
              Reference Timestamp:  3808714309.712800696 (2020-09-10T08:11:49.712801)
              Originator Timestamp: 1885145870.079837521 (2095-11-03T02:06:06.079838)
              Receive Timestamp:    3808714342.174128206 (2020-09-10T08:12:22.174128)
              Transmit Timestamp:   3808714342.174187417 (2020-09-10T08:12:22.174187)
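The reply above copies the client's transmit timestamp into its originate field, which is how an NTP client pairs a reply with its own request. A minimal sketch of that check follows; the helper names are hypothetical and the two packets are synthetic, with only the mode bits and timestamps filled in (the fraction part is an arbitrary placeholder).

```python
import struct

# Fixed 48-byte NTP header: flags, stratum, poll, precision, root delay,
# root dispersion, reference ID, then four 64-bit timestamps.
NTP_HEADER = struct.Struct("!BBbbII4sQQQQ")


def parse_ntp_header(packet):
    """Decode the header fields needed to pair a reply with a request."""
    if len(packet) < NTP_HEADER.size:
        raise ValueError("NTP packet shorter than 48 bytes")
    flags, stratum, _, _, _, _, _, _, origin, receive, transmit = (
        NTP_HEADER.unpack(packet[:NTP_HEADER.size])
    )
    return {
        "leap": flags >> 6,
        "version": (flags >> 3) & 0x7,
        "mode": flags & 0x7,
        "stratum": stratum,
        "origin": origin,
        "receive": receive,
        "transmit": transmit,
    }


def reply_matches_request(request, reply):
    """True if reply is a server packet echoing the request's transmit time."""
    req = parse_ntp_header(request)
    rep = parse_ntp_header(reply)
    return req["mode"] == 3 and rep["mode"] == 4 and rep["origin"] == req["transmit"]


# Synthetic packets (hypothetical): seconds value taken from the dump above,
# fraction chosen arbitrarily.
tx = (1885145870 << 32) | 0x12345678
request = NTP_HEADER.pack(0x23, 0, 9, 32, 0, 0, b"\x00" * 4, 0, 0, 0, tx)
reply = NTP_HEADER.pack(0x24, 3, 9, -23, 0, 0, b"\xac\x14\x10\x02", 0, tx, 1, 2)
print(reply_matches_request(request, reply))
```

Because the match is on the echoed value only, an unusual transmit timestamp in the request does not by itself stop a server from answering, which is consistent with the reply from nm1.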
More debugging information
    # chronyc -n
    chrony version 3.2
    Copyright (C) 1997-2003, 2007, 2009-2017 Richard P. Curnow and others
    chrony comes with ABSOLUTELY NO WARRANTY. This is free software, and
    you are welcome to redistribute it under certain conditions. See the
    GNU General Public License version 2 for details.

    chronyc> tracking
    Reference ID    : 00000000 ()
    Stratum         : 0
    Ref time (UTC)  : Thu Jan 01 00:00:00 1970
    System time     : 0.000000009 seconds slow of NTP time
    Last offset     : +0.000000000 seconds
    RMS offset      : 0.000000000 seconds
    Frequency       : 86.905 ppm slow
    Residual freq   : +0.000 ppm
    Skew            : 0.000 ppm
    Root delay      : 1.000000000 seconds
    Root dispersion : 1.000000000 seconds
    Update interval : 0.0 seconds
    Leap status     : Not synchronised
    chronyc> sources
    210 Number of sources = 8
    MS Name/IP address         Stratum Poll Reach LastRx Last sample
    ===============================================================================
    ^? 172.20.16.3                   0  10     0     -     +0ns[   +0ns] +/-    0ns
    ^? 172.20.16.1                   0  10     0     -     +0ns[   +0ns] +/-    0ns
    ^? 172.20.16.13                  0  10     0     -     +0ns[   +0ns] +/-    0ns
    ^? 172.20.16.14                  0  10     0     -     +0ns[   +0ns] +/-    0ns
    ^? 172.20.16.5                   0  10     0     -     +0ns[   +0ns] +/-    0ns
    ^? 172.20.16.12                  0  10     0     -     +0ns[   +0ns] +/-    0ns
    ^? 172.20.16.11                  0  10     0     -     +0ns[   +0ns] +/-    0ns
    ^- 172.20.2.1                    3  10   377   667   +16.2s[ +16.2s] +/-   36ms
    chronyc> sourcestats
    210 Number of sources = 8
    Name/IP Address            NP  NR  Span  Frequency  Freq Skew  Offset  Std Dev
    ==============================================================================
    172.20.16.3                 0   0     0     +0.000   2000.000     +0ns  4000ms
    172.20.16.1                 0   0     0     +0.000   2000.000     +0ns  4000ms
    172.20.16.13                0   0     0     +0.000   2000.000     +0ns  4000ms
    172.20.16.14                0   0     0     +0.000   2000.000     +0ns  4000ms
    172.20.16.5                 0   0     0     +0.000   2000.000     +0ns  4000ms
    172.20.16.12                0   0     0     +0.000   2000.000     +0ns  4000ms
    172.20.16.11                0   0     0     +0.000   2000.000     +0ns  4000ms
    172.20.2.1                 22  10  232m     -0.650      0.003   +16.2s    17us
    chronyc> activity
    200 OK
    8 sources online
    0 sources offline
    0 sources doing burst (return to online)
    0 sources doing burst (return to offline)
    0 sources with unknown address
    chronyc> ntpdata

    Remote address  : [UNSPEC] (00000000)
    Remote port     : 0
    Local address   : [UNSPEC] (00000000)
    Leap status     : Normal
    Version         : 0
    Mode            : Invalid
    Stratum         : 0
    Poll interval   : 0 (1 seconds)
    Precision       : 0 (1.000000000 seconds)
    Root delay      : 0.000000 seconds
    Root dispersion : 0.000000 seconds
    Reference ID    : 00000000 ()
    Reference time  : Thu Jan 01 00:00:00 1970
    Offset          : +0.000000000 seconds
    Peer delay      : 0.000000000 seconds
    Peer dispersion : 0.000000000 seconds
    Response time   : 0.000000000 seconds
    Jitter asymmetry: +0.00
    NTP tests       : 000 000 0000
    Interleaved     : No
    Authenticated   : No
    TX timestamping : Invalid
    RX timestamping : Invalid
    Total TX        : 672
    Total RX        : 0
    Total valid RX  : 0

    Remote address  : [UNSPEC] (00000000)
    Remote port     : 0
    Local address   : [UNSPEC] (00000000)
    Leap status     : Normal
    Version         : 0
    Mode            : Invalid
    Stratum         : 0
    Poll interval   : 0 (1 seconds)
    Precision       : 0 (1.000000000 seconds)
    Root delay      : 0.000000 seconds
    Root dispersion : 0.000000 seconds
    Reference ID    : 00000000 ()
    Reference time  : Thu Jan 01 00:00:00 1970
    Offset          : +0.000000000 seconds
    Peer delay      : 0.000000000 seconds
    Peer dispersion : 0.000000000 seconds
    Response time   : 0.000000000 seconds
    Jitter asymmetry: +0.00
    NTP tests       : 000 000 0000
    Interleaved     : No
    Authenticated   : No
    TX timestamping : Invalid
    RX timestamping : Invalid
    Total TX        : 672
    Total RX        : 0
    Total valid RX  : 0

    Remote address  : [UNSPEC] (00000000)
    Remote port     : 0
    Local address   : [UNSPEC] (00000000)
    Leap status     : Normal
    Version         : 0
    Mode            : Invalid
    Stratum         : 0
    Poll interval   : 0 (1 seconds)
    Precision       : 0 (1.000000000 seconds)
    Root delay      : 0.000000 seconds
    Root dispersion : 0.000000 seconds
    Reference ID    : 00000000 ()
    Reference time  : Thu Jan 01 00:00:00 1970
    Offset          : +0.000000000 seconds
    Peer delay      : 0.000000000 seconds
    Peer dispersion : 0.000000000 seconds
    Response time   : 0.000000000 seconds
    Jitter asymmetry: +0.00
    NTP tests       : 000 000 0000
    Interleaved     : No
    Authenticated   : No
    TX timestamping : Invalid
    RX timestamping : Invalid
    Total TX        : 672
    Total RX        : 0
    Total valid RX  : 0

    Remote address  : [UNSPEC] (00000000)
    Remote port     : 0
    Local address   : [UNSPEC] (00000000)
    Leap status     : Normal
    Version         : 0
    Mode            : Invalid
    Stratum         : 0
    Poll interval   : 0 (1 seconds)
    Precision       : 0 (1.000000000 seconds)
    Root delay      : 0.000000 seconds
    Root dispersion : 0.000000 seconds
    Reference ID    : 00000000 ()
    Reference time  : Thu Jan 01 00:00:00 1970
    Offset          : +0.000000000 seconds
    Peer delay      : 0.000000000 seconds
    Peer dispersion : 0.000000000 seconds
    Response time   : 0.000000000 seconds
    Jitter asymmetry: +0.00
    NTP tests       : 000 000 0000
    Interleaved     : No
    Authenticated   : No
    TX timestamping : Invalid
    RX timestamping : Invalid
    Total TX        : 672
    Total RX        : 0
    Total valid RX  : 0

    Remote address  : [UNSPEC] (00000000)
    Remote port     : 0
    Local address   : [UNSPEC] (00000000)
    Leap status     : Normal
    Version         : 0
    Mode            : Invalid
    Stratum         : 0
    Poll interval   : 0 (1 seconds)
    Precision       : 0 (1.000000000 seconds)
    Root delay      : 0.000000 seconds
    Root dispersion : 0.000000 seconds
    Reference ID    : 00000000 ()
    Reference time  : Thu Jan 01 00:00:00 1970
    Offset          : +0.000000000 seconds
    Peer delay      : 0.000000000 seconds
    Peer dispersion : 0.000000000 seconds
    Response time   : 0.000000000 seconds
    Jitter asymmetry: +0.00
    NTP tests       : 000 000 0000
    Interleaved     : No
    Authenticated   : No
    TX timestamping : Invalid
    RX timestamping : Invalid
    Total TX        : 672
    Total RX        : 0
    Total valid RX  : 0

    Remote address  : [UNSPEC] (00000000)
    Remote port     : 0
    Local address   : [UNSPEC] (00000000)
    Leap status     : Normal
    Version         : 0
    Mode            : Invalid
    Stratum         : 0
    Poll interval   : 0 (1 seconds)
    Precision       : 0 (1.000000000 seconds)
    Root delay      : 0.000000 seconds
    Root dispersion : 0.000000 seconds
    Reference ID    : 00000000 ()
    Reference time  : Thu Jan 01 00:00:00 1970
    Offset          : +0.000000000 seconds
    Peer delay      : 0.000000000 seconds
    Peer dispersion : 0.000000000 seconds
    Response time   : 0.000000000 seconds
    Jitter asymmetry: +0.00
    NTP tests       : 000 000 0000
    Interleaved     : No
    Authenticated   : No
    TX timestamping : Invalid
    RX timestamping : Invalid
    Total TX        : 672
    Total RX        : 0
    Total valid RX  : 0

    Remote address  : [UNSPEC] (00000000)
    Remote port     : 0
    Local address   : [UNSPEC] (00000000)
    Leap status     : Normal
    Version         : 0
    Mode            : Invalid
    Stratum         : 0
    Poll interval   : 0 (1 seconds)
    Precision       : 0 (1.000000000 seconds)
    Root delay      : 0.000000 seconds
    Root dispersion : 0.000000 seconds
    Reference ID    : 00000000 ()
    Reference time  : Thu Jan 01 00:00:00 1970
    Offset          : +0.000000000 seconds
    Peer delay      : 0.000000000 seconds
    Peer dispersion : 0.000000000 seconds
    Response time   : 0.000000000 seconds
    Jitter asymmetry: +0.00
    NTP tests       : 000 000 0000
    Interleaved     : No
    Authenticated   : No
    TX timestamping : Invalid
    RX timestamping : Invalid
    Total TX        : 672
    Total RX        : 0
    Total valid RX  : 0

    Remote address  : 172.20.2.1 (AC140201)
    Remote port     : 123
    Local address   : 172.20.16.31 (AC14101F)
    Leap status     : Normal
    Version         : 4
    Mode            : Server
    Stratum         : 3
    Poll interval   : 10 (1024 seconds)
    Precision       : -23 (0.000000119 seconds)
    Root delay      : 0.000534 seconds
    Root dispersion : 0.036041 seconds
    Reference ID    : AC141002 ()
    Reference time  : Thu Oct 08 08:20:28 2020
    Offset          : -16.152969360 seconds
    Peer delay      : 0.000214426 seconds
    Peer dispersion : 0.000000195 seconds
    Response time   : 0.000017658 seconds
    Jitter asymmetry: +0.23
    NTP tests       : 111 111 1111
    Interleaved     : No
    Authenticated   : No
    TX timestamping : Daemon
    RX timestamping : Daemon
    Total TX        : 1969
    Total RX        : 1969
    Total valid RX  : 1969

    chronyc> clients
    Hostname                      NTP   Drop Int IntL Last     Cmd   Drop Int  Last
    ===============================================================================
    chronyc> serverstats
    NTP packets received       : 0
    NTP packets dropped        : 0
    Command packets received   : 81
    Command packets dropped    : 0
    Client log records dropped : 0
    chronyc> rtcdata
    513 RTC driver not running
    chronyc> quit

    # journalctl -b SYSLOG_IDENTIFIER=chronyd
    -- Logs begin at Wed 2020-09-30 13:32:17 CEST, end at Thu 2020-10-08 11:27:08 CEST. --
    Sep 30 13:33:04 h31 chronyd[3522]: chronyd version 3.2 starting (+CMDMON +NTP +REFCLOCK +RTC +PRIVDROP -SCFILTER +>
    Sep 30 13:33:04 h31 chronyd[3522]: Enabled HW timestamping (TX only) on em3
    Sep 30 13:33:04 h31 chronyd[3522]: Enabled HW timestamping (TX only) on em4
    Sep 30 13:33:04 h31 chronyd[3522]: Frequency -86.905 +/- 0.107 ppm read from /var/lib/chrony/drift
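The ntpdata listing above is long, but the three counters at the end of each record carry most of the signal. As a rough sketch, the helper below (names are hypothetical) scans text in the same "key : value" layout and reports records that have sent requests but never received anything. The sample is abridged from the output above to the lines the parser actually uses.

```python
COUNTERS = ("Total TX", "Total RX", "Total valid RX")


def summarize_ntpdata(text):
    """Collect the remote address and packet counters of each ntpdata record."""
    records = []
    current = None
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # blank line or prompt, not a "key : value" pair
        key, value = key.strip(), value.strip()
        if key == "Remote address":
            # A new record starts with its remote address line.
            current = {"remote": value.split()[0] if value else ""}
            records.append(current)
        elif current is not None and key in COUNTERS:
            current[key] = int(value)
    return records


def silent_sources(records):
    """Remote addresses that were polled but never produced a received packet."""
    return [
        r["remote"]
        for r in records
        if r.get("Total TX", 0) > 0 and r.get("Total RX", 0) == 0
    ]


# Abridged from the ntpdata output above.
SAMPLE = """\
Remote address  : [UNSPEC] (00000000)
Total TX        : 672
Total RX        : 0
Total valid RX  : 0

Remote address  : 172.20.2.1 (AC140201)
Total TX        : 1969
Total RX        : 1969
Total valid RX  : 1969
"""

records = summarize_ntpdata(SAMPLE)
print(silent_sources(records))
```

A record with hundreds of transmissions and zero receptions points at the path or the server side rather than at clock quality, although it does not say which of the two is at fault.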
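I solved the problem: the mask in a restrict directive of ntpd was indeed bad, which effectively meant that all servers but one left NTP time queries unanswered. In addition, I had set minsources 3 in /etc/chrony.conf.

What makes this question interesting is how chronyd handles the situation (see "More debugging information" in the question):

- OK, a Reach value of 0 in the output of the sources command can indicate a bunch of different problems.
- ntpdata outputs a lot of data when there actually is no data. An important hint that I missed is Total RX being zero, as well as Total valid RX. But that can still have several causes.
- serverstats reporting NTP packets received as zero seems odd, as 172.20.2.1 obviously did send responses.
- activity saying 8 sources online and 0 sources offline seems highly confusing: shouldn't sources that do not respond be considered "offline" rather than "online"?

In comparison, here is the output after the problem was fixed (three sources responding):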
    Oct 08 11:29:32 h31 systemd[1]: Starting NTP client/server...
    Oct 08 11:29:32 h31 chronyd[18823]: chronyd version 3.2 starting (+CMDMON +NTP +REFCLOCK +RTC +PRIVDROP -SCFILTER >
    Oct 08 11:29:32 h31 chronyd[18823]: Enabled HW timestamping (TX only) on em3
    Oct 08 11:29:32 h31 chronyd[18823]: Enabled HW timestamping (TX only) on em4
    Oct 08 11:29:32 h31 chronyd[18823]: Frequency -86.905 +/- 0.107 ppm read from /var/lib/chrony/drift
    Oct 08 11:29:32 h31 systemd[1]: Started NTP client/server.
    Oct 09 08:09:43 h31 chronyd[18823]: Selected source 172.20.2.1
    Oct 09 08:09:43 h31 chronyd[18823]: System clock wrong by -16.101294 seconds, adjustment started
    Oct 09 08:09:27 h31 chronyd[18823]: System clock was stepped by -16.101294 seconds
    Oct 09 08:11:36 h31 chronyd[18823]: Selected source 172.20.16.3

    chronyc> tracking
    Reference ID    : AC141003 (172.20.16.3)
    Stratum         : 3
    Ref time (UTC)  : Fri Oct 09 06:21:18 2020
    System time     : 0.000007615 seconds fast of NTP time
    Last offset     : +0.000007168 seconds
    RMS offset      : 0.000022300 seconds
    Frequency       : 87.841 ppm slow
    Residual freq   : +0.002 ppm
    Skew            : 0.090 ppm
    Root delay      : 0.000269273 seconds
    Root dispersion : 0.002195312 seconds
    Update interval : 64.6 seconds
    Leap status     : Normal
    chronyc> sources
    210 Number of sources = 9
    MS Name/IP address         Stratum Poll Reach LastRx Last sample
    ===============================================================================
    ^? 172.20.16.13                  0  10     0     -     +0ns[   +0ns] +/-    0ns
    ^? 172.20.16.1                   0  10     0     -     +0ns[   +0ns] +/-    0ns
    ^? 172.20.16.5                   0  10     0     -     +0ns[   +0ns] +/-    0ns
    ^? 172.20.16.12                  0  10     0     -     +0ns[   +0ns] +/-    0ns
    ^? 172.20.16.14                  0  10     0     -     +0ns[   +0ns] +/-    0ns
    ^? 172.20.16.11                  0  10     0     -     +0ns[   +0ns] +/-    0ns
    ^- 172.20.2.1                    3   9   377   239    +15us[  +27us] +/-   27ms
    ^- 172.20.16.2                   2   8   377    65   +208us[ +215us] +/- 8147us
    ^* 172.20.16.3                   2   6   377    64    +27us[  +34us] +/- 4417us
    chronyc> sourcestats
    210 Number of sources = 9
    Name/IP Address            NP  NR  Span  Frequency  Freq Skew  Offset  Std Dev
    ==============================================================================
    172.20.16.13                0   0     0     +0.000   2000.000     +0ns  4000ms
    172.20.16.1                 0   0     0     +0.000   2000.000     +0ns  4000ms
    172.20.16.5                 0   0     0     +0.000   2000.000     +0ns  4000ms
    172.20.16.12                0   0     0     +0.000   2000.000     +0ns  4000ms
    172.20.16.14                0   0     0     +0.000   2000.000     +0ns  4000ms
    172.20.16.11                0   0     0     +0.000   2000.000     +0ns  4000ms
    172.20.2.1                  7   5   51m     +0.254      0.070   +105us    23us
    172.20.16.2                 6   3   21m     +0.219      0.218   +227us    27us
    172.20.16.3                15   7   907     +0.002      0.074    +52ns    19us
    chronyc> activity
    200 OK
    9 sources online
    0 sources offline
    0 sources doing burst (return to online)
    0 sources doing burst (return to offline)
    0 sources with unknown address
    chronyc> ntpdata
    ...

    Remote address  : 172.20.2.1 (AC140201)
    Remote port     : 123
    Local address   : 172.20.16.31 (AC14101F)
    Leap status     : Normal
    Version         : 4
    Mode            : Server
    Stratum         : 3
    Poll interval   : 9 (512 seconds)
    Precision       : -23 (0.000000119 seconds)
    Root delay      : 0.000366 seconds
    Root dispersion : 0.026947 seconds
    Reference ID    : AC14100E ()
    Reference time  : Fri Oct 09 06:11:14 2020
    Offset          : -0.000026963 seconds
    Peer delay      : 0.000219559 seconds
    Peer dispersion : 0.000000190 seconds
    Response time   : 0.000020624 seconds
    Jitter asymmetry: +0.20
    NTP tests       : 111 111 1111
    Interleaved     : No
    Authenticated   : No
    TX timestamping : Daemon
    RX timestamping : Daemon
    Total TX        : 297
    Total RX        : 296
    Total valid RX  : 296

    Remote address  : 172.20.16.2 (AC141002)
    Remote port     : 123
    Local address   : 172.20.16.31 (AC14101F)
    Leap status     : Normal
    Version         : 4
    Mode            : Server
    Stratum         : 2
    Poll interval   : 8 (256 seconds)
    Precision       : -23 (0.000000119 seconds)
    Root delay      : 0.000305 seconds
    Root dispersion : 0.007904 seconds
    Reference ID    : AC140219 ()
    Reference time  : Fri Oct 09 06:14:48 2020
    Offset          : -0.000215189 seconds
    Peer delay      : 0.000180311 seconds
    Peer dispersion : 0.000000190 seconds
    Response time   : 0.000057180 seconds
    Jitter asymmetry: +0.50
    NTP tests       : 111 111 1111
    Interleaved     : No
    Authenticated   : Yes
    TX timestamping : Daemon
    RX timestamping : Daemon
    Total TX        : 466
    Total RX        : 453
    Total valid RX  : 453

    Remote address  : 172.20.16.3 (AC141003)
    Remote port     : 123
    Local address   : 172.20.16.31 (AC14101F)
    Leap status     : Normal
    Version         : 4
    Mode            : Server
    Stratum         : 2
    Poll interval   : 6 (64 seconds)
    Precision       : -24 (0.000000060 seconds)
    Root delay      : 0.000168 seconds
    Root dispersion : 0.006165 seconds
    Reference ID    : AC140219 ()
    Reference time  : Fri Oct 09 06:18:14 2020
    Offset          : -0.000028130 seconds
    Peer delay      : 0.000198109 seconds
    Peer dispersion : 0.000000131 seconds
    Response time   : 0.000038736 seconds
    Jitter asymmetry: +0.00
    NTP tests       : 111 111 1111
    Interleaved     : No
    Authenticated   : No
    TX timestamping : Daemon
    RX timestamping : Daemon
    Total TX        : 16
    Total RX        : 16
    Total valid RX  : 16

    chronyc> serverstats
    NTP packets received       : 0
    NTP packets dropped        : 0
    Command packets received   : 353
    Command packets dropped    : 0
    Client log records dropped : 0
    chronyc> rtcdata
    513 RTC driver not running
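The Reach column in the sources listings is an 8-bit reachability register printed in octal, so 377 means the last eight polls were all answered and 0 means none were. A small sketch of how to read it follows; the function name is made up, and the bit order assumes the usual shift-register convention in which the lowest bit is the most recent poll.

```python
def reach_history(reach_octal):
    """Expand an octal Reach value into eight booleans, most recent poll first.

    Assumption: the register is shifted left on every poll and the lowest
    bit records the latest result.
    """
    register = int(reach_octal, 8) & 0xFF
    return [bool((register >> bit) & 1) for bit in range(8)]


for value in ("377", "0", "17"):
    answered = sum(reach_history(value))
    print(f"reach {value}: {answered} of the last 8 polls answered")
```

Read this way, the six pool members still showing Reach 0 after the fix have not answered a single recent poll, while the three sources with 377 answered every one.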
It seems there are some bugs in chronyd or chronyc.
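To illustrate how a wrong mask in a restrict directive can silently exclude a client, here is a simplified sketch of address-and-mask matching. The restrict address and masks below are hypothetical, since the actual values are not given in this post, and ntpd's real rule selection involves more than this single comparison. Only the client address 172.20.16.31 is taken from the output above.

```python
import ipaddress


def restrict_applies(client, address, mask):
    """Return True if an entry 'restrict <address> mask <mask>' covers client.

    Simplified model: the entry covers the client when both addresses are
    equal after being ANDed with the mask.
    """
    c = int(ipaddress.IPv4Address(client))
    a = int(ipaddress.IPv4Address(address))
    m = int(ipaddress.IPv4Address(mask))
    return (c & m) == (a & m)


client = "172.20.16.31"
# Hypothetical intended entry: the whole /24 is allowed to query.
print(restrict_applies(client, "172.20.16.0", "255.255.255.0"))
# Hypothetical bad mask: the entry now matches a single host only.
print(restrict_applies(client, "172.20.16.0", "255.255.255.255"))
```

With a mask that is too narrow, the permissive entry no longer covers the client, so a stricter default rule can take over and the server drops the query without any reply, which looks the same from the client side as packet loss.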