Synchronization

chrony 3.2 problem syncing to NTP server pools

  • October 9, 2020

I have a problem similar to "Chrony 3.1 refuses to sync with ntp servers".

Scenario:

A freshly installed server running SLES15 SP2 uses chrony 3.2. I have configured two pools of NTP servers (both on the intranet) that run the official ntpd 4.2.8p15.

Problem:

Chrony "pulls" servers from the pools, but it never gets a response from them, and I wonder why. Is the problem in chrony, in ntpd, or in my setup?

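Debugging:

(I am using a hacked version of tcpdump with improved NTP packet decoding.) A request from ntpd looks like this (it is actually an anycast request, captured from a remote host):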

10:22:29.373395 IP (tos 0xb8, ttl 4, id 21390, offset 0, flags [DF], proto UDP (17), length 100)
   172.20.16.13.123 > 239.192.123.21.123: [udp sum ok] NTP leap indicator=0 (Nominal), Version=4, Mode=3 (Client), length=72
   Stratum 2 (secondary reference), poll 6 (64s), precision -24
   Root Delay: 0.000106, Root dispersion: 0.004196, Reference-ID: 0xac140219
   Reference Timestamp:  3808714798.372973455 (2020-09-10T08:19:58.372973)
   Originator Timestamp: 0.000000000
   Receive Timestamp:    0.000000000
   Transmit Timestamp:   3808714949.372178320 (2020-09-10T08:22:29.372178)
   MAC: Key ID: 421, SHA1-Digest=48d73ad9 5b1d2401 9a8d3c02 91b849cb 28400475

In contrast, the queries from chrony (captured locally) look like this:

08:52:33.338684 IP (tos 0x0, ttl 64, id 4141, offset 0, flags [DF], proto UDP (17), length 76)
   h31.51625 > h03.ntp: [bad udp cksum 0x7894 -> 0xea6e!] NTPv4, length 48
       Client, Leap indicator:  (0), Stratum 0 (unspecified), poll 10 (1024s), precision 32
       Root Delay: 0.000000, Root dispersion: 0.000000, Reference-ID: (unspec)
         Reference Timestamp:  0.000000000
         Originator Timestamp: 0.000000000
         Receive Timestamp:    0.000000000
         Transmit Timestamp:   502153526.517788040 (2052/01/06 06:33:42)
           Originator - Receive Timestamp:  0.000000000
           Originator - Transmit Timestamp: 502153526.517788040 (2052/01/06 06:33:42)

10:12:22.173989 IP (tos 0x0, ttl 64, id 58250, offset 0, flags [DF], proto UDP (17), length 76)
   h31.39573 > nm1.ntp: [bad udp cksum 0x6a92 -> 0x02d5!] NTP leap indicator=0 (Nominal), Version=4, Mode=3 (Client), length=48
   Stratum 0 (unspecified), poll 9 (512s), precision 32
   Root Delay: 0.000000, Root dispersion: 0.000000, Reference-ID: 00000000
   Reference Timestamp:  0.000000000
   Originator Timestamp: 0.000000000
   Receive Timestamp:    0.000000000
   Transmit Timestamp:   1885145870.079837521 (2095-11-03T02:06:06.079838)

At the very least the transmit timestamp looks odd; I do not know whether the other fields are valid.
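To check what those timestamps mean, a minimal Python sketch can decode the seconds part of an NTP timestamp. It assumes the usual NTP convention of seconds counted from 1900-01-01 UTC. The `era` parameter is a name chosen here for illustration: a 32-bit seconds field wraps in 2036, and the 2095 date that tcpdump prints only comes out if the value is read as era 1.

```python
from datetime import datetime, timedelta, timezone

# NTP timestamps count seconds from 1900-01-01 00:00:00 UTC.
NTP_EPOCH = datetime(1900, 1, 1, tzinfo=timezone.utc)

def ntp_seconds_to_utc(seconds, era=0):
    """Convert the 32-bit seconds field of an NTP timestamp to a UTC datetime.

    era=0 covers 1900 to 2036; era=1 adds one full wrap of the 32-bit counter.
    """
    return NTP_EPOCH + timedelta(seconds=seconds + era * 2**32)

# Transmit timestamp of the ntpd request shown above.
print(ntp_seconds_to_utc(3808714949))
# Transmit timestamp of the second chrony request, read as era 0 and as era 1.
print(ntp_seconds_to_utc(1885145870))
print(ntp_seconds_to_utc(1885145870, era=1))
```

The ntpd value decodes to 2020-09-10 08:22:29 UTC, which matches the capture. The chrony value lands decades away from the capture time under either reading, so it does not look like a wall-clock time at all.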

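The problem could be chrony's request packet, but it could also be some filtering on the servers that causes the request to be ignored. I have verified that the packets reach at least one of the pool servers, but I see no response.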

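In fact, one server outside the pools (the one in the last packet shown) does respond, as follows, echoing the odd timestamp back as the originator timestamp: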

10:12:22.174191 IP (tos 0xb8, ttl 63, id 30184, offset 0, flags [DF], proto UDP (17), length 76)
   nm1.ntp > h31.39573: [udp sum ok] NTP leap indicator=0 (Nominal), Version=4, Mode=4 (Server), length=48
   Stratum 3 (secondary reference), poll 9 (512s), precision -23
   Root Delay: 0.000518, Root dispersion: 0.025527, Reference-ID: 0xac141002
   Reference Timestamp:  3808714309.712800696 (2020-09-10T08:11:49.712801)
   Originator Timestamp: 1885145870.079837521 (2095-11-03T02:06:06.079838)
   Receive Timestamp:    3808714342.174128206 (2020-09-10T08:12:22.174128)
   Transmit Timestamp:   3808714342.174187417 (2020-09-10T08:12:22.174187)

More debugging information

# chronyc -n
chrony version 3.2
Copyright (C) 1997-2003, 2007, 2009-2017 Richard P. Curnow and others
chrony comes with ABSOLUTELY NO WARRANTY.  This is free software, and
you are welcome to redistribute it under certain conditions.  See the
GNU General Public License version 2 for details.

chronyc> tracking
Reference ID    : 00000000 ()
Stratum         : 0
Ref time (UTC)  : Thu Jan 01 00:00:00 1970
System time     : 0.000000009 seconds slow of NTP time
Last offset     : +0.000000000 seconds
RMS offset      : 0.000000000 seconds
Frequency       : 86.905 ppm slow
Residual freq   : +0.000 ppm
Skew            : 0.000 ppm
Root delay      : 1.000000000 seconds
Root dispersion : 1.000000000 seconds
Update interval : 0.0 seconds
Leap status     : Not synchronised
chronyc> sources
210 Number of sources = 8
MS Name/IP address         Stratum Poll Reach LastRx Last sample
===============================================================================
^? 172.20.16.3                   0  10     0     -     +0ns[   +0ns] +/-    0ns
^? 172.20.16.1                   0  10     0     -     +0ns[   +0ns] +/-    0ns
^? 172.20.16.13                  0  10     0     -     +0ns[   +0ns] +/-    0ns
^? 172.20.16.14                  0  10     0     -     +0ns[   +0ns] +/-    0ns
^? 172.20.16.5                   0  10     0     -     +0ns[   +0ns] +/-    0ns
^? 172.20.16.12                  0  10     0     -     +0ns[   +0ns] +/-    0ns
^? 172.20.16.11                  0  10     0     -     +0ns[   +0ns] +/-    0ns
^- 172.20.2.1                    3  10   377   667   +16.2s[ +16.2s] +/-   36ms
chronyc> sourcestats
210 Number of sources = 8
Name/IP Address            NP  NR  Span  Frequency  Freq Skew  Offset  Std Dev
==============================================================================
172.20.16.3                 0   0     0     +0.000   2000.000     +0ns  4000ms
172.20.16.1                 0   0     0     +0.000   2000.000     +0ns  4000ms
172.20.16.13                0   0     0     +0.000   2000.000     +0ns  4000ms
172.20.16.14                0   0     0     +0.000   2000.000     +0ns  4000ms
172.20.16.5                 0   0     0     +0.000   2000.000     +0ns  4000ms
172.20.16.12                0   0     0     +0.000   2000.000     +0ns  4000ms
172.20.16.11                0   0     0     +0.000   2000.000     +0ns  4000ms
172.20.2.1                 22  10  232m     -0.650      0.003   +16.2s    17us
chronyc> activity
200 OK
8 sources online
0 sources offline
0 sources doing burst (return to online)
0 sources doing burst (return to offline)
0 sources with unknown address
chronyc> ntpdata

Remote address  : [UNSPEC] (00000000)
Remote port     : 0
Local address   : [UNSPEC] (00000000)
Leap status     : Normal
Version         : 0
Mode            : Invalid
Stratum         : 0
Poll interval   : 0 (1 seconds)
Precision       : 0 (1.000000000 seconds)
Root delay      : 0.000000 seconds
Root dispersion : 0.000000 seconds
Reference ID    : 00000000 ()
Reference time  : Thu Jan 01 00:00:00 1970
Offset          : +0.000000000 seconds
Peer delay      : 0.000000000 seconds
Peer dispersion : 0.000000000 seconds
Response time   : 0.000000000 seconds
Jitter asymmetry: +0.00
NTP tests       : 000 000 0000
Interleaved     : No
Authenticated   : No
TX timestamping : Invalid
RX timestamping : Invalid
Total TX        : 672
Total RX        : 0
Total valid RX  : 0

Remote address  : [UNSPEC] (00000000)
Remote port     : 0
Local address   : [UNSPEC] (00000000)
Leap status     : Normal
Version         : 0
Mode            : Invalid
Stratum         : 0
Poll interval   : 0 (1 seconds)
Precision       : 0 (1.000000000 seconds)
Root delay      : 0.000000 seconds
Root dispersion : 0.000000 seconds
Reference ID    : 00000000 ()
Reference time  : Thu Jan 01 00:00:00 1970
Offset          : +0.000000000 seconds
Peer delay      : 0.000000000 seconds
Peer dispersion : 0.000000000 seconds
Response time   : 0.000000000 seconds
Jitter asymmetry: +0.00
NTP tests       : 000 000 0000
Interleaved     : No
Authenticated   : No
TX timestamping : Invalid
RX timestamping : Invalid
Total TX        : 672
Total RX        : 0
Total valid RX  : 0

Remote address  : [UNSPEC] (00000000)
Remote port     : 0
Local address   : [UNSPEC] (00000000)
Leap status     : Normal
Version         : 0
Mode            : Invalid
Stratum         : 0
Poll interval   : 0 (1 seconds)
Precision       : 0 (1.000000000 seconds)
Root delay      : 0.000000 seconds
Root dispersion : 0.000000 seconds
Reference ID    : 00000000 ()
Reference time  : Thu Jan 01 00:00:00 1970
Offset          : +0.000000000 seconds
Peer delay      : 0.000000000 seconds
Peer dispersion : 0.000000000 seconds
Response time   : 0.000000000 seconds
Jitter asymmetry: +0.00
NTP tests       : 000 000 0000
Interleaved     : No
Authenticated   : No
TX timestamping : Invalid
RX timestamping : Invalid
Total TX        : 672
Total RX        : 0
Total valid RX  : 0

Remote address  : [UNSPEC] (00000000)
Remote port     : 0
Local address   : [UNSPEC] (00000000)
Leap status     : Normal
Version         : 0
Mode            : Invalid
Stratum         : 0
Poll interval   : 0 (1 seconds)
Precision       : 0 (1.000000000 seconds)
Root delay      : 0.000000 seconds
Root dispersion : 0.000000 seconds
Reference ID    : 00000000 ()
Reference time  : Thu Jan 01 00:00:00 1970
Offset          : +0.000000000 seconds
Peer delay      : 0.000000000 seconds
Peer dispersion : 0.000000000 seconds
Response time   : 0.000000000 seconds
Jitter asymmetry: +0.00
NTP tests       : 000 000 0000
Interleaved     : No
Authenticated   : No
TX timestamping : Invalid
RX timestamping : Invalid
Total TX        : 672
Total RX        : 0
Total valid RX  : 0

Remote address  : [UNSPEC] (00000000)
Remote port     : 0
Local address   : [UNSPEC] (00000000)
Leap status     : Normal
Version         : 0
Mode            : Invalid
Stratum         : 0
Poll interval   : 0 (1 seconds)
Precision       : 0 (1.000000000 seconds)
Root delay      : 0.000000 seconds
Root dispersion : 0.000000 seconds
Reference ID    : 00000000 ()
Reference time  : Thu Jan 01 00:00:00 1970
Offset          : +0.000000000 seconds
Peer delay      : 0.000000000 seconds
Peer dispersion : 0.000000000 seconds
Response time   : 0.000000000 seconds
Jitter asymmetry: +0.00
NTP tests       : 000 000 0000
Interleaved     : No
Authenticated   : No
TX timestamping : Invalid
RX timestamping : Invalid
Total TX        : 672
Total RX        : 0
Total valid RX  : 0

Remote address  : [UNSPEC] (00000000)
Remote port     : 0
Local address   : [UNSPEC] (00000000)
Leap status     : Normal
Version         : 0
Mode            : Invalid
Stratum         : 0
Poll interval   : 0 (1 seconds)
Precision       : 0 (1.000000000 seconds)
Root delay      : 0.000000 seconds
Root dispersion : 0.000000 seconds
Reference ID    : 00000000 ()
Reference time  : Thu Jan 01 00:00:00 1970
Offset          : +0.000000000 seconds
Peer delay      : 0.000000000 seconds
Peer dispersion : 0.000000000 seconds
Response time   : 0.000000000 seconds
Jitter asymmetry: +0.00
NTP tests       : 000 000 0000
Interleaved     : No
Authenticated   : No
TX timestamping : Invalid
RX timestamping : Invalid
Total TX        : 672
Total RX        : 0
Total valid RX  : 0

Remote address  : [UNSPEC] (00000000)
Remote port     : 0
Local address   : [UNSPEC] (00000000)
Leap status     : Normal
Version         : 0
Mode            : Invalid
Stratum         : 0
Poll interval   : 0 (1 seconds)
Precision       : 0 (1.000000000 seconds)
Root delay      : 0.000000 seconds
Root dispersion : 0.000000 seconds
Reference ID    : 00000000 ()
Reference time  : Thu Jan 01 00:00:00 1970
Offset          : +0.000000000 seconds
Peer delay      : 0.000000000 seconds
Peer dispersion : 0.000000000 seconds
Response time   : 0.000000000 seconds
Jitter asymmetry: +0.00
NTP tests       : 000 000 0000
Interleaved     : No
Authenticated   : No
TX timestamping : Invalid
RX timestamping : Invalid
Total TX        : 672
Total RX        : 0
Total valid RX  : 0

Remote address  : 172.20.2.1 (AC140201)
Remote port     : 123
Local address   : 172.20.16.31 (AC14101F)
Leap status     : Normal
Version         : 4
Mode            : Server
Stratum         : 3
Poll interval   : 10 (1024 seconds)
Precision       : -23 (0.000000119 seconds)
Root delay      : 0.000534 seconds
Root dispersion : 0.036041 seconds
Reference ID    : AC141002 ()
Reference time  : Thu Oct 08 08:20:28 2020
Offset          : -16.152969360 seconds
Peer delay      : 0.000214426 seconds
Peer dispersion : 0.000000195 seconds
Response time   : 0.000017658 seconds
Jitter asymmetry: +0.23
NTP tests       : 111 111 1111
Interleaved     : No
Authenticated   : No
TX timestamping : Daemon
RX timestamping : Daemon
Total TX        : 1969
Total RX        : 1969
Total valid RX  : 1969
chronyc> clients
Hostname                      NTP   Drop Int IntL Last     Cmd   Drop Int  Last
===============================================================================
chronyc> serverstats
NTP packets received       : 0
NTP packets dropped        : 0
Command packets received   : 81
Command packets dropped    : 0
Client log records dropped : 0
chronyc> rtcdata
513 RTC driver not running
chronyc> quit
# journalctl -b SYSLOG_IDENTIFIER=chronyd
-- Logs begin at Wed 2020-09-30 13:32:17 CEST, end at Thu 2020-10-08 11:27:08 CEST. --
Sep 30 13:33:04 h31 chronyd[3522]: chronyd version 3.2 starting (+CMDMON +NTP +REFCLOCK +RTC +PRIVDROP -SCFILTER +>
Sep 30 13:33:04 h31 chronyd[3522]: Enabled HW timestamping (TX only) on em3
Sep 30 13:33:04 h31 chronyd[3522]: Enabled HW timestamping (TX only) on em4
Sep 30 13:33:04 h31 chronyd[3522]: Frequency -86.905 +/- 0.107 ppm read from /var/lib/chrony/drift

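I solved the problem. The cause was indeed a bad mask in an ntpd restrict directive, which effectively left NTP time queries unanswered by all servers except one. In addition, I had set minsources 3 in /etc/chrony.conf.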
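How a wrong mask silences a server can be sketched in a few lines of Python. The restrict addresses and masks below are hypothetical (the actual restrict line is not shown here), and the matching rule is simplified to "the client address ANDed with the mask equals the restrict address ANDed with the mask". `restrict_matches` is an illustrative helper, not part of ntpd.

```python
import ipaddress

def restrict_matches(client, address, mask):
    """Return True if client falls inside the address/mask pair of a restrict line."""
    # strict=False tolerates an address that has host bits set under the mask.
    network = ipaddress.ip_network(f"{address}/{mask}", strict=False)
    return ipaddress.ip_address(client) in network

client = "172.20.16.31"  # the chrony host from the question

# Hypothetical intended entry: the whole /24 is allowed to query time.
print(restrict_matches(client, "172.20.16.0", "255.255.255.0"))
# Hypothetical mistyped mask: only .0 to .15 match, so .31 falls through.
print(restrict_matches(client, "172.20.16.0", "255.255.255.240"))
```

A client that matches no permissive entry falls back to whatever the default restrict entry says. If that default denies time service, the queries go unanswered, which matches the symptom seen here.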

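What makes this question interesting is how chronyd handled the situation (see "More debugging information" in the question):

  • Fine, the reach value in the output of sources is 0, which could indicate a number of different problems.
  • ntpdata prints a lot of data when it actually has none. An important hint I missed is that Total RX is zero, and so is Total valid RX. That could still have several causes, though.
  • serverstats reporting zero for NTP packets received seems odd, as 172.20.2.1 obviously did send responses.
  • activity reporting 8 sources online and 0 sources offline seems highly confusing: shouldn't sources that do not respond be considered "offline" rather than "online"?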
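The Total RX hint can be checked mechanically. The following is a minimal sketch, assuming the ntpdata output has been saved as text (for example captured from `chronyc -n ntpdata`) and that records are separated by blank lines as in the listing in the question. `silent_sources` is an illustrative name, and the sample is abbreviated from that listing.

```python
def silent_sources(ntpdata_text):
    """Return (remote address, total TX) for each record with TX > 0 and RX == 0."""
    silent, record = [], {}
    # The trailing "" makes sure the last record is flushed as well.
    for line in ntpdata_text.splitlines() + [""]:
        if ":" in line:
            # Split at the first colon only; values such as times contain colons too.
            key, _, value = line.partition(":")
            record[key.strip()] = value.strip()
        elif record:
            sent = int(record.get("Total TX", 0))
            received = int(record.get("Total RX", 0))
            if sent > 0 and received == 0:
                silent.append((record.get("Remote address", "?"), sent))
            record = {}
    return silent

sample = """\
Remote address  : [UNSPEC] (00000000)
Total TX        : 672
Total RX        : 0
Total valid RX  : 0

Remote address  : 172.20.2.1 (AC140201)
Total TX        : 1969
Total RX        : 1969
Total valid RX  : 1969
"""
print(silent_sources(sample))
```

A source that has sent hundreds of requests and received nothing points to the network path or to the server's access rules rather than to clock quality, which is where the restrict mask came in.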

For comparison, here is the output after the problem was solved (three sources responding):

Oct 08 11:29:32 h31 systemd[1]: Starting NTP client/server...
Oct 08 11:29:32 h31 chronyd[18823]: chronyd version 3.2 starting (+CMDMON +NTP +REFCLOCK +RTC +PRIVDROP -SCFILTER >
Oct 08 11:29:32 h31 chronyd[18823]: Enabled HW timestamping (TX only) on em3
Oct 08 11:29:32 h31 chronyd[18823]: Enabled HW timestamping (TX only) on em4
Oct 08 11:29:32 h31 chronyd[18823]: Frequency -86.905 +/- 0.107 ppm read from /var/lib/chrony/drift
Oct 08 11:29:32 h31 systemd[1]: Started NTP client/server.
Oct 09 08:09:43 h31 chronyd[18823]: Selected source 172.20.2.1
Oct 09 08:09:43 h31 chronyd[18823]: System clock wrong by -16.101294 seconds, adjustment started
Oct 09 08:09:27 h31 chronyd[18823]: System clock was stepped by -16.101294 seconds
Oct 09 08:11:36 h31 chronyd[18823]: Selected source 172.20.16.3
chronyc> tracking
Reference ID    : AC141003 (172.20.16.3)
Stratum         : 3
Ref time (UTC)  : Fri Oct 09 06:21:18 2020
System time     : 0.000007615 seconds fast of NTP time
Last offset     : +0.000007168 seconds
RMS offset      : 0.000022300 seconds
Frequency       : 87.841 ppm slow
Residual freq   : +0.002 ppm
Skew            : 0.090 ppm
Root delay      : 0.000269273 seconds
Root dispersion : 0.002195312 seconds
Update interval : 64.6 seconds
Leap status     : Normal
chronyc> sources
210 Number of sources = 9
MS Name/IP address         Stratum Poll Reach LastRx Last sample
===============================================================================
^? 172.20.16.13                  0  10     0     -     +0ns[   +0ns] +/-    0ns
^? 172.20.16.1                   0  10     0     -     +0ns[   +0ns] +/-    0ns
^? 172.20.16.5                   0  10     0     -     +0ns[   +0ns] +/-    0ns
^? 172.20.16.12                  0  10     0     -     +0ns[   +0ns] +/-    0ns
^? 172.20.16.14                  0  10     0     -     +0ns[   +0ns] +/-    0ns
^? 172.20.16.11                  0  10     0     -     +0ns[   +0ns] +/-    0ns
^- 172.20.2.1                    3   9   377   239    +15us[  +27us] +/-   27ms
^- 172.20.16.2                   2   8   377    65   +208us[ +215us] +/- 8147us
^* 172.20.16.3                   2   6   377    64    +27us[  +34us] +/- 4417us
chronyc> sourcestats
210 Number of sources = 9
Name/IP Address            NP  NR  Span  Frequency  Freq Skew  Offset  Std Dev
==============================================================================
172.20.16.13                0   0     0     +0.000   2000.000     +0ns  4000ms
172.20.16.1                 0   0     0     +0.000   2000.000     +0ns  4000ms
172.20.16.5                 0   0     0     +0.000   2000.000     +0ns  4000ms
172.20.16.12                0   0     0     +0.000   2000.000     +0ns  4000ms
172.20.16.14                0   0     0     +0.000   2000.000     +0ns  4000ms
172.20.16.11                0   0     0     +0.000   2000.000     +0ns  4000ms
172.20.2.1                  7   5   51m     +0.254      0.070   +105us    23us
172.20.16.2                 6   3   21m     +0.219      0.218   +227us    27us
172.20.16.3                15   7   907     +0.002      0.074    +52ns    19us
chronyc> activity
200 OK
9 sources online
0 sources offline
0 sources doing burst (return to online)
0 sources doing burst (return to offline)
0 sources with unknown address
chronyc> ntpdata
...
Remote address  : 172.20.2.1 (AC140201)
Remote port     : 123
Local address   : 172.20.16.31 (AC14101F)
Leap status     : Normal
Version         : 4
Mode            : Server
Stratum         : 3
Poll interval   : 9 (512 seconds)
Precision       : -23 (0.000000119 seconds)
Root delay      : 0.000366 seconds
Root dispersion : 0.026947 seconds
Reference ID    : AC14100E ()
Reference time  : Fri Oct 09 06:11:14 2020
Offset          : -0.000026963 seconds
Peer delay      : 0.000219559 seconds
Peer dispersion : 0.000000190 seconds
Response time   : 0.000020624 seconds
Jitter asymmetry: +0.20
NTP tests       : 111 111 1111
Interleaved     : No
Authenticated   : No
TX timestamping : Daemon
RX timestamping : Daemon
Total TX        : 297
Total RX        : 296
Total valid RX  : 296

Remote address  : 172.20.16.2 (AC141002)
Remote port     : 123
Local address   : 172.20.16.31 (AC14101F)
Leap status     : Normal
Version         : 4
Mode            : Server
Stratum         : 2
Poll interval   : 8 (256 seconds)
Precision       : -23 (0.000000119 seconds)
Root delay      : 0.000305 seconds
Root dispersion : 0.007904 seconds
Reference ID    : AC140219 ()
Reference time  : Fri Oct 09 06:14:48 2020
Offset          : -0.000215189 seconds
Peer delay      : 0.000180311 seconds
Peer dispersion : 0.000000190 seconds
Response time   : 0.000057180 seconds
Jitter asymmetry: +0.50
NTP tests       : 111 111 1111
Interleaved     : No
Authenticated   : Yes
TX timestamping : Daemon
RX timestamping : Daemon
Total TX        : 466
Total RX        : 453
Total valid RX  : 453

Remote address  : 172.20.16.3 (AC141003)
Remote port     : 123
Local address   : 172.20.16.31 (AC14101F)
Leap status     : Normal
Version         : 4
Mode            : Server
Stratum         : 2
Poll interval   : 6 (64 seconds)
Precision       : -24 (0.000000060 seconds)
Root delay      : 0.000168 seconds
Root dispersion : 0.006165 seconds
Reference ID    : AC140219 ()
Reference time  : Fri Oct 09 06:18:14 2020
Offset          : -0.000028130 seconds
Peer delay      : 0.000198109 seconds
Peer dispersion : 0.000000131 seconds
Response time   : 0.000038736 seconds
Jitter asymmetry: +0.00
NTP tests       : 111 111 1111
Interleaved     : No
Authenticated   : No
TX timestamping : Daemon
RX timestamping : Daemon
Total TX        : 16
Total RX        : 16
Total valid RX  : 16
chronyc> serverstats
NTP packets received       : 0
NTP packets dropped        : 0
Command packets received   : 353
Command packets dropped    : 0
Client log records dropped : 0
chronyc> rtcdata
513 RTC driver not running

It seems there are some bugs in chronyd or chronyc.

Quoted from: https://unix.stackexchange.com/questions/608751