Debian
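Server randomly crashes and reboots with no clue in the logs
The physical server I use (Debian 11 bullseye) ran well for the past year, but over the last few days it has started behaving very strangely: it crashes and reboots at random, and I cannot find the problem by checking the system logs.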
REBOOT CRASH 1 Nov 26 07:04:01 testing systemd[1]: prometheus-node-exporter-ipmitool-sensor.service: Succeeded. Nov 26 07:04:01 testing systemd[1]: Finished Collect ipmitool sensor metrics for prometheus-node-exporter. Nov 26 07:04:57 testing systemd[1]: Starting Collect ipmitool sensor metrics for prometheus-node-exporter... Nov 26 07:05:01 testing CRON[320608]: (root) CMD (command -v debian-sa1 > /dev/null && debian-sa1 1 1) Nov 26 07:05:02 testing systemd[1]: prometheus-node-exporter-ipmitool-sensor.service: Succeeded. Nov 26 07:05:02 testing systemd[1]: Finished Collect ipmitool sensor metrics for prometheus-node-exporter. Nov 26 07:05:20 testing smartd[1136]: Device: /dev/bus/0 [megaraid_disk_04], SMART Failure: FAILURE PREDICTION THRESHOLD EXCEEDED: ascq=0x5 Nov 26 07:05:57 testing systemd[1]: Starting Collect ipmitool sensor metrics for prometheus-node-exporter... Nov 26 07:06:01 testing systemd[1]: prometheus-node-exporter-ipmitool-sensor.service: Succeeded. Nov 26 07:06:01 testing systemd[1]: Finished Collect ipmitool sensor metrics for prometheus-node-exporter. Nov 26 07:28:51 testing systemd-random-seed[456]: Kernel entropy pool is not initialized yet, waiting until it is. Nov 26 07:28:51 testing systemd[1]: Starting Flush Journal to Persistent Storage... Nov 26 07:28:51 testing systemd[1]: Finished Create System Users. Nov 26 07:28:51 testing systemd[1]: Starting Create Static Device Nodes in /dev... Nov 26 07:28:51 testing systemd[1]: modprobe@drm.service: Succeeded. Nov 26 07:28:51 testing kernel: [ 0.000000] Linux version 5.10.0-9-amd64 (debian-kernel@lists.debian.org) (gcc-10 (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.2) #1 SMP Debian 5.10.70-1 (2021-09-30) Nov 26 07:28:51 testing systemd[1]: Finished Load Kernel Module drm. Nov 26 07:28:51 testing systemd[1]: Finished Coldplug All udev Devices. 
Nov 26 07:28:51 testing kernel: [ 0.000000] Command line: BOOT_IMAGE=/boot/vmlinuz-5.10.0-9-amd64 root=UUID=14f7f68b-d049-4637-8f99-5441121afaf2412 ro quiet crashkernel=2000M crashkernel=384M-:128M Nov 26 07:28:51 testing systemd[1]: Starting Helper to synchronize boot up for ifupdown... Nov 26 07:28:51 testing kernel: [ 0.000000] x86/fpu: x87 FPU will use FXSAVE Nov 26 07:28:51 testing kernel: [ 0.000000] BIOS-provided physical RAM map: Nov 26 07:28:51 testing kernel: [ 0.000000] BIOS-e820: [mem 0x0000000000010000-0x000000000009ffff] usable Nov 26 07:28:51 testing kernel: [ 0.000000] BIOS-e820: [mem 0x0000000000100000-0x00000000bc767fff] usable Nov 26 07:28:51 testing kernel: [ 0.000000] BIOS-e820: [mem 0x00000000bc768000-0x00000000bc867fff] type 20 Nov 26 07:28:51 testing systemd[1]: Finished Set the console keyboard layout. Nov 26 07:28:51 testing kernel: [ 0.000000] BIOS-e820: [mem 0x00000000bc868000-0x00000000bc967fff] reserved Nov 26 07:28:51 testing apparmor.systemd[962]: Restarting AppArmor Nov 26 07:28:51 testing kernel: [ 0.000000] BIOS-e820: [mem 0x00000000bc968000-0x00000000bca66fff] usable Nov 26 07:28:51 testing kernel: [ 0.000000] BIOS-e820: [mem 0x00000000bca67000-0x00000000bca6bfff] ACPI NVS Nov 26 07:28:51 testing kernel: [ 0.000000] BIOS-e820: [mem 0x00000000bca6c000-0x00000000bcaebfff] ACPI data Nov 26 07:28:51 testing kernel: [ 0.000000] BIOS-e820: [mem 0x00000000bcaec000-0x00000000bcf11fff] usable Nov 26 07:28:51 testing kernel: [ 0.000000] BIOS-e820: [mem 0x00000000bcf42000-0x00000000bcf68fff] usable Nov 26 07:28:51 testing kernel: [ 0.000000] BIOS-e820: [mem 0x00000000bd369000-0x00000000bf38efff] reserved ------------------------------------------------------------------------------------------------------------------------------------------------------ REBOOT CRASH 2 Nov 26 07:45:41 testing systemd[1]: prometheus-node-exporter-ipmitool-sensor.service: Succeeded. 
Nov 26 07:45:41 testing systemd[1]: Finished Collect ipmitool sensor metrics for prometheus-node-exporter. Nov 26 07:46:42 testing systemd[1]: Starting Collect ipmitool sensor metrics for prometheus-node-exporter... Nov 26 07:46:46 testing systemd[1]: prometheus-node-exporter-ipmitool-sensor.service: Succeeded. Nov 26 07:46:46 testing systemd[1]: Finished Collect ipmitool sensor metrics for prometheus-node-exporter. Nov 26 07:47:42 testing systemd[1]: Starting Collect ipmitool sensor metrics for prometheus-node-exporter... Nov 26 07:47:46 testing systemd[1]: prometheus-node-exporter-ipmitool-sensor.service: Succeeded. Nov 26 07:47:46 testing systemd[1]: Finished Collect ipmitool sensor metrics for prometheus-node-exporter. Nov 26 07:48:42 testing systemd[1]: Starting Collect ipmitool sensor metrics for prometheus-node-exporter... Nov 26 07:48:43 testing ddclient[2198]: CONNECT: checkip.dyndns.org Nov 26 07:48:43 testing ddclient[2198]: CONNECTED: using HTTP Nov 26 07:48:43 testing ddclient[2198]: SENDING: GET / HTTP/1.0 Nov 26 07:48:43 testing ddclient[2198]: SENDING: Host: checkip.dyndns.org Nov 26 07:48:43 testing ddclient[2198]: SENDING: User-Agent: ddclient/3.9.1 Nov 26 07:48:43 testing ddclient[2198]: SENDING: Connection: close Nov 26 07:48:43 testing ddclient[2198]: SENDING: Nov 26 07:48:43 testing ddclient[2198]: SENDING: Nov 26 07:48:43 testing ddclient[2198]: RECEIVE: HTTP/1.1 200 OK#015 Nov 26 07:48:43 testing ddclient[2198]: RECEIVE: Date: Fri, 26 Nov 2021 06:48:43 GMT#015 Nov 26 07:48:43 testing ddclient[2198]: RECEIVE: Content-Type: text/html#015 Nov 26 07:48:43 testing ddclient[2198]: RECEIVE: Content-Length: 104#015 Nov 26 07:48:43 testing ddclient[2198]: RECEIVE: Connection: close#015 Nov 26 07:48:43 testing ddclient[2198]: RECEIVE: Cache-Control: no-cache#015 Nov 26 07:48:43 testing ddclient[2198]: RECEIVE: Pragma: no-cache#015 Nov 26 07:48:43 testing ddclient[2198]: RECEIVE: #015 Nov 26 07:48:43 testing ddclient[2198]: RECEIVE: 
<html><head><title>Current IP Check</title></head><body>Current IP Address: 123.456.78.90</body></html>#015 Nov 26 07:48:43 testing ddclient[2198]: SUCCESS: database.testing.com: skipped: IP address was already set to 123.456.78.90. Nov 26 07:48:43 testing ddclient[2198]: SUCCESS: jenkins.testing.com: skipped: IP address was already set to 123.456.78.90. Nov 26 07:48:43 testing ddclient[2198]: SUCCESS: monitors.testing.com: skipped: IP address was already set to 123.456.78.90. Nov 26 07:48:46 testing systemd[1]: prometheus-node-exporter-ipmitool-sensor.service: Succeeded. Nov 26 07:48:46 testing systemd[1]: Finished Collect ipmitool sensor metrics for prometheus-node-exporter. Nov 26 07:49:11 testing kernel: [ 356.406208] perf: interrupt took too long (2501 > 2500), lowering kernel.perf_event_max_sample_rate to 79750 Nov 26 07:56:51 testing systemd-random-seed[448]: Kernel entropy pool is not initialized yet, waiting until it is. Nov 26 07:56:51 testing systemd[1]: Starting Flush Journal to Persistent Storage... Nov 26 07:56:51 testing systemd[1]: modprobe@drm.service: Succeeded. Nov 26 07:56:51 testing systemd[1]: Finished Load Kernel Module drm. Nov 26 07:56:51 testing kernel: [ 0.000000] Linux version 5.10.0-9-amd64 (debian-kernel@lists.debian.org) (gcc-10 (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.2) #1 SMP Debian 5.10.70-1 (2021-09-30) Nov 26 07:56:51 testing systemd[1]: Finished Coldplug All udev Devices. Nov 26 07:56:51 testing systemd[1]: Starting Helper to synchronize boot up for ifupdown... Nov 26 07:56:51 testing kernel: [ 0.000000] Command line: BOOT_IMAGE=/boot/vmlinuz-5.10.0-9-amd64 root=UUID=14f7f68b-d049-4637-1234-123456789 ro quiet crashkernel=2000M crashkernel=384M-:128M Nov 26 07:56:51 testing systemd[1]: Finished Create Static Device Nodes in /dev. 
Nov 26 07:56:51 testing kernel: [ 0.000000] x86/fpu: x87 FPU will use FXSAVE Nov 26 07:56:51 testing kernel: [ 0.000000] BIOS-provided physical RAM map: Nov 26 07:56:51 testing kernel: [ 0.000000] BIOS-e820: [mem 0x0000000000010000-0x000000000009ffff] usable Nov 26 07:56:51 testing kernel: [ 0.000000] BIOS-e820: [mem 0x0000000000100000-0x00000000bc767fff] usable Nov 26 07:56:51 testing kernel: [ 0.000000] BIOS-e820: [mem 0x00000000bc768000-0x00000000bc867fff] type 20 Nov 26 07:56:51 testing systemd[1]: Starting Rule-based Manager for Device Events and Files... Nov 26 07:56:51 testing kernel: [ 0.000000] BIOS-e820: [mem 0x00000000bc868000-0x00000000bc967fff] reserved Nov 26 07:56:51 testing kernel: [ 0.000000] BIOS-e820: [mem 0x00000000bc968000-0x00000000bca66fff] usable Nov 26 07:56:51 testing kernel: [ 0.000000] BIOS-e820: [mem 0x00000000bca67000-0x00000000bca6bfff] ACPI NVS Nov 26 07:56:51 testing kernel: [ 0.000000] BIOS-e820: [mem 0x00000000bca6c000-0x00000000bcaebfff] ACPI data Nov 26 07:56:51 testing kernel: [ 0.000000] BIOS-e820: [mem 0x00000000bcaec000-0x00000000bcf11fff] usable Nov 26 07:56:51 testing kernel: [ 0.000000] BIOS-e820: [mem 0x00000000bcf42000-0x00000000bcf68fff] usable
Nov 23 11:05:13 myserver kernel: [ 2.352549] ata_piix 0000:00:1f.2: version 2.13 Nov 23 11:05:13 myserver kernel: [ 3.576528] sd 0:2:0:0: [sda] Mode Sense: 1f 00 10 08 Nov 23 11:05:13 myserver kernel: [ 3.723306] sr 1:0:0:0: Attached scsi CD-ROM sr0 Nov 23 11:05:13 myserver kernel: [ 4.093233] PM: Image not found (code -22) Nov 23 11:05:13 myserver kernel: [ 10.167638] checking generic (d5800000 130000) vs hw (d5800000 800000) Nov 23 12:37:25 myserver PackageKit: daemon start Nov 26 07:28:51 myserver kernel: [ 0.002793] e820: update [mem 0x00000000-0x00000fff] usable ==> reserved Nov 26 07:28:51 myserver kernel: [ 0.002798] e820: remove [mem 0x000a0000-0x000fffff] usable Nov 26 07:28:51 myserver kernel: [ 0.002814] MTRR default type: uncachable Nov 26 07:28:51 myserver kernel: [ 0.002815] MTRR fixed ranges enabled: Nov 26 07:28:51 myserver kernel: [ 0.002817] 00000-9FFFF write-back Nov 26 07:28:51 myserver kernel: [ 0.002819] A0000-BFFFF uncachable Nov 26 07:28:51 myserver kernel: [ 0.002820] C0000-CBFFF write-protect Nov 26 07:28:51 myserver kernel: [ 0.002822] CC000-D3FFF write-back Nov 26 07:28:51 myserver kernel: [ 0.002823] D4000-EBFFF uncachable Nov 26 07:28:51 myserver kernel: [ 0.002825] EC000-FFFFF write-protect Nov 26 07:28:51 myserver kernel: [ 0.002826] MTRR variable ranges enabled: Nov 26 07:28:51 myserver kernel: [ 0.002829] 0 base 0000000000 mask FF80000000 write-back Nov 26 07:28:51 myserver kernel: [ 0.002831] 1 base 0080000000 mask FFC0000000 write-back Nov 26 07:28:51 myserver kernel: [ 0.002833] 2 base 0100000000 mask FF00000000 write-back Nov 26 07:28:51 myserver kernel: [ 0.002834] 3 base 0200000000 mask FE00000000 write-back Nov 26 07:28:51 myserver kernel: [ 0.002836] 4 base 0400000000 mask FC00000000 write-back Nov 26 07:28:51 myserver kernel: [ 0.002838] 5 base 0800000000 mask F800000000 write-back Nov 26 07:28:51 myserver kernel: [ 0.002840] 6 base 1000000000 mask F800000000 write-back Nov 26 07:28:51 myserver kernel: [ 0.002842] 7 base 
1800000000 mask FFC0000000 write-back Nov 26 07:28:51 myserver kernel: [ 0.002843] 8 disabled Nov 26 07:28:51 myserver kernel: [ 0.002844] 9 disabled Nov 26 07:28:51 myserver kernel: [ 0.004625] e820: update [mem 0xc0000000-0xffffffff] usable ==> reserved Nov 26 07:28:51 myserver kernel: [ 0.021048] e820: update [mem 0xba378000-0xba37afff] usable ==> reserved Nov 26 07:28:51 myserver kernel: [ 0.022384] ACPI: Local APIC address 0xfee00000 Nov 26 07:28:51 myserver kernel: [ 0.023393] On node 0 totalpages: 12582912 Nov 26 07:28:51 myserver kernel: [ 0.023395] Normal zone: 196608 pages used for memmap Nov 26 07:28:51 myserver kernel: [ 0.023396] Normal zone: 12582912 pages, LIFO batch:63 Nov 26 07:28:51 myserver kernel: [ 0.023401] On node 1 totalpages: 12569668 Nov 26 07:28:51 myserver kernel: [ 0.023402] DMA zone: 64 pages used for memmap Nov 26 07:28:51 myserver kernel: [ 0.023404] DMA zone: 3984 pages, LIFO batch:0 Nov 26 07:28:51 myserver kernel: [ 0.023405] DMA32 zone: 12019 pages used for memmap Nov 26 07:28:51 myserver kernel: [ 0.023407] DMA32 zone: 769204 pages, LIFO batch:63 Nov 26 07:28:51 myserver kernel: [ 0.023408] Normal zone: 184320 pages used for memmap Nov 26 07:28:51 myserver kernel: [ 0.023410] Normal zone: 11796480 pages, LIFO batch:63 Nov 26 07:28:51 myserver kernel: [ 0.040393] ACPI: Local APIC address 0xfee00000 Nov 26 07:28:51 myserver kernel: [ 0.040434] ACPI: IRQ0 used by override. Nov 26 07:28:51 myserver kernel: [ 0.040436] ACPI: IRQ9 used by override. Nov 26 07:28:51 myserver kernel: [ 0.049931] pcpu-alloc: s184152 r8192 d28840 u262144 alloc=1*2097152 Nov 26 07:28:51 myserver kernel: [ 0.040436] ACPI: IRQ9 used by override. 
Nov 26 07:28:51 myserver kernel: [ 0.049931] pcpu-alloc: s184152 r8192 d28840 u262144 alloc=1*2097152 Nov 26 07:28:51 myserver kernel: [ 0.049933] pcpu-alloc: [0] 00 02 04 06 08 10 12 14 [0] 16 18 20 22 -- -- -- -- Nov 26 07:28:51 myserver kernel: [ 0.049950] pcpu-alloc: [1] 01 03 05 07 09 11 13 15 [1] 17 19 21 23 -- -- -- -- Nov 26 07:28:51 myserver kernel: [ 0.950514] PCI: root bus fe: using default resources Nov 26 07:28:51 myserver kernel: [ 0.950516] PCI: Probing PCI hardware (bus fe) Nov 26 07:28:51 myserver kernel: [ 0.952145] PCI: root bus ff: using default resources Nov 26 07:28:51 myserver kernel: [ 0.952146] PCI: Probing PCI hardware (bus ff) Nov 26 07:28:51 myserver kernel: [ 0.953705] PCI: pci_cache_line_size set to 64 bytes Nov 26 07:28:51 myserver kernel: [ 0.953817] e820: reserve RAM buffer [mem 0xba378000-0xbbffffff] Nov 26 07:28:51 myserver kernel: [ 0.953820] e820: reserve RAM buffer [mem 0xbc768000-0xbfffffff] Nov 26 07:28:51 myserver kernel: [ 0.953823] e820: reserve RAM buffer [mem 0xbca67000-0xbfffffff] Nov 26 07:28:51 myserver kernel: [ 0.953826] e820: reserve RAM buffer [mem 0xbcf12000-0xbfffffff] Nov 26 07:28:51 myserver kernel: [ 0.953828] e820: reserve RAM buffer [mem 0xbcf69000-0xbfffffff] Nov 26 07:28:51 myserver kernel: [ 0.974640] system 00:00: Plug and Play ACPI device, IDs PNP0c01 (active) Nov 26 07:28:51 myserver kernel: [ 0.974699] pnp 00:01: Plug and Play ACPI device, IDs PNP0b00 (active) Nov 26 07:28:51 myserver kernel: [ 0.975062] pnp 00:02: Plug and Play ACPI device, IDs PNP0501 (active) Nov 26 07:28:51 myserver kernel: [ 0.975413] pnp 00:03: Plug and Play ACPI device, IDs PNP0501 (active) Nov 26 07:28:51 myserver kernel: [ 0.976584] system 00:04: Plug and Play ACPI device, IDs PNP0c01 (active) Nov 26 07:28:51 myserver kernel: [ 0.976656] pnp 00:05: [irq 0 disabled] Nov 26 07:28:51 myserver kernel: [ 0.976725] system 00:05: Plug and Play ACPI device, IDs IPI0001 PNP0c01 (active) Nov 26 07:28:51 myserver kernel: [ 0.977664] 
system 00:06: Plug and Play ACPI device, IDs PNP0c02 (active) Nov 26 07:28:51 myserver kernel: [ 0.977799] system 00:07: Plug and Play ACPI device, IDs PNP0c02 (active) Nov 26 07:28:51 myserver kernel: [ 1.847200] intel_idle: MWAIT substates: 0x1120 Nov 26 07:28:51 myserver kernel: [ 1.847270] Monitor-Mwait will be used to enter C-1 state Nov 26 07:28:51 myserver kernel: [ 1.847287] Monitor-Mwait will be used to enter C-3 state Nov 26 07:28:51 myserver kernel: [ 1.847390] intel_idle: v0.5.1 model 0x2C Nov 26 07:28:51 myserver kernel: [ 1.848984] intel_idle: Local APIC timer is reliable in all C-states Nov 26 07:28:51 myserver kernel: [ 2.168571] with arguments: Nov 26 07:28:51 myserver kernel: [ 2.168572] /init Nov 26 07:28:51 myserver kernel: [ 2.168573] with environment: Nov 26 07:28:51 myserver kernel: [ 2.168575] HOME=/ Nov 26 07:28:51 myserver kernel: [ 2.168576] TERM=linux Nov 26 07:28:51 myserver kernel: [ 2.168577] BOOT_IMAGE=/boot/vmlinuz-5.10.0-9-amd64 Nov 26 07:28:51 myserver kernel: [ 2.168579] crashkernel=384M-:128M Nov 26 07:28:51 myserver kernel: [ 2.336535] megaraid_sas 0000:04:00.0: BAR:0x1 BAR's base_addr(phys):0x00000000df1bc000 mapped virt_addr:0x(____ptrval____) Nov 26 07:28:51 myserver kernel: [ 2.348825] libata version 3.00 loaded. 
Nov 26 07:28:51 myserver kernel: [ 2.353143] ata_piix 0000:00:1f.2: version 2.13 Nov 26 07:28:51 myserver kernel: [ 3.577545] sd 0:2:0:0: [sda] Mode Sense: 1f 00 10 08 Nov 26 07:28:51 myserver kernel: [ 3.697853] sr 1:0:0:0: Attached scsi CD-ROM sr0 Nov 26 07:28:51 myserver kernel: [ 4.107713] PM: Image not found (code -22) Nov 26 07:28:51 myserver kernel: [ 12.677156] checking generic (d5800000 130000) vs hw (d5800000 800000) Nov 26 07:43:30 myserver kernel: [ 0.002794] e820: update [mem 0x00000000-0x00000fff] usable ==> reserved Nov 26 07:43:30 myserver kernel: [ 0.002799] e820: remove [mem 0x000a0000-0x000fffff] usable Nov 26 07:43:30 myserver kernel: [ 0.002814] MTRR default type: uncachable Nov 26 07:43:30 myserver kernel: [ 0.002815] MTRR fixed ranges enabled: Nov 26 07:43:30 myserver kernel: [ 0.002817] 00000-9FFFF write-back Nov 26 07:43:30 myserver kernel: [ 0.002819] A0000-BFFFF uncachable Nov 26 07:43:30 myserver kernel: [ 0.002821] C0000-CBFFF write-protect Nov 26 07:43:30 myserver kernel: [ 0.002822] CC000-D3FFF write-back Nov 26 07:43:30 myserver kernel: [ 0.002824] D4000-EBFFF uncachable Nov 26 07:43:30 myserver kernel: [ 0.002825] EC000-FFFFF write-protect Nov 26 07:43:30 myserver kernel: [ 0.002827] MTRR variable ranges enabled: Nov 26 07:43:30 myserver kernel: [ 0.002829] 0 base 0000000000 mask FF80000000 write-back Nov 26 07:43:30 myserver kernel: [ 0.002831] 1 base 0080000000 mask FFC0000000 write-back
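I have been monitoring CPU and RAM usage: the CPU temperature has never exceeded 34 °C, and the load has never gone above 30%. Memory sits at about 70 GB available…
I am not sure what is causing the reboots, and any help would be greatly appreciated!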
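The machine is probably crashing because of a kernel panic, which is why you see nothing in the logs: once a panic occurs, the kernel literally crashes and can no longer write anything to the logs. Anything the kernel had not synced to disk before the crash is lost.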
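To see that silent gap in a syslog file, the following is a minimal sketch rather than a definitive tool. It assumes classic syslog timestamps like those in the question, and it uses the kernel banner at `[ 0.000000]` as the boot marker. The function name `last_lines_before_boot` is made up for illustration.

```python
import re

# Kernel banner printed at the very start of every boot.
BOOT_MARKER = re.compile(r"kernel: \[\s*0\.000000\] Linux version")
# Classic syslog timestamp prefix, e.g. "Nov 26 07:28:51".
STAMP = re.compile(r"^[A-Z][a-z]{2} [ \d]\d \d\d:\d\d:\d\d")


def last_lines_before_boot(lines):
    """Return (last_line_before_boot, boot_line) pairs, one per boot marker."""
    results = []
    for i, line in enumerate(lines):
        if not BOOT_MARKER.search(line):
            continue
        match = STAMP.match(line)
        boot_stamp = match.group(0) if match else None
        j = i - 1
        # Early userspace messages may be written ahead of the kernel banner
        # with the same timestamp, so step back over those first.
        while j >= 0 and boot_stamp and lines[j].startswith(boot_stamp):
            j -= 1
        results.append((lines[j] if j >= 0 else None, line))
    return results


# Sample lines shortened from the log in the question.
sample = [
    "Nov 26 07:06:01 testing systemd[1]: Finished Collect ipmitool sensor metrics for prometheus-node-exporter.",
    "Nov 26 07:28:51 testing systemd-random-seed[456]: Kernel entropy pool is not initialized yet, waiting until it is.",
    "Nov 26 07:28:51 testing systemd[1]: Starting Flush Journal to Persistent Storage...",
    "Nov 26 07:28:51 testing kernel: [    0.000000] Linux version 5.10.0-9-amd64",
]

for before, boot in last_lines_before_boot(sample):
    print("last line before reboot:", before)
    print("first kernel line after:", boot)
```

If the last line before each boot is ordinary service chatter with no shutdown messages, as in the sample, the reboot was not a clean one, which is consistent with a panic or a hardware reset.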
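You should enable kernel core dumps using kdump. Once a panic is triggered, it writes a memory dump to a file on the local disk, which you can analyze after the machine boots using tools such as crash.
There are guides on how to enable kernel core dumps; if one does not fit your distribution, you can probably find other articles explaining how to do it. After your machine crashes and produces a core dump, you will need to use crash to analyze it. There are some nice tutorials on dedoimedo. It is not necessarily easy, but it is your only way to find clues about the crash. From the core dump you can also read the log messages that were not synced to disk before the crash.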
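Before waiting for the next crash, it may be worth checking that a crash kernel is actually reserved and loaded. The sketch below is one hedged way to do that. It reads `/proc/cmdline` and `/sys/kernel/kexec_crash_loaded`, which are standard Linux interfaces. The helper names `crashkernel_args` and `kdump_status` are made up for illustration.

```python
import re
from pathlib import Path


def crashkernel_args(cmdline):
    """Return every crashkernel= value found on a kernel command line."""
    return re.findall(r"(?:^|\s)crashkernel=(\S+)", cmdline)


def kdump_status(cmdline_path="/proc/cmdline",
                 loaded_path="/sys/kernel/kexec_crash_loaded"):
    """Report the crashkernel= reservation and whether a crash kernel is loaded."""
    cmdline_file = Path(cmdline_path)
    loaded_file = Path(loaded_path)
    # Fall back to "nothing found" if the files are missing (e.g. non-Linux host).
    cmdline = cmdline_file.read_text() if cmdline_file.exists() else ""
    loaded = loaded_file.exists() and loaded_file.read_text().strip() == "1"
    return {
        "crashkernel": crashkernel_args(cmdline),
        "crash_kernel_loaded": loaded,
    }


if __name__ == "__main__":
    print(kdump_status())
```

The kernel command line in the posted log already carries two `crashkernel=` parameters, so `crashkernel_args` would return both values for it. If `crash_kernel_loaded` is false even though memory is reserved, the kdump service has probably not loaded the capture kernel, and no dump will be written on a panic.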