Sles
kdump: kexec_file_load failed: Cannot assign requested address
Problem:

    SERVER:~ # systemctl start kdump.service
    Job for kdump.service failed because the control process exited with error code. See "systemctl status kdump.service" and "journalctl -xe" for details.
    SERVER:~ # systemctl status kdump.service
    ● kdump.service - Load kdump kernel on startup
       Loaded: loaded (/usr/lib/systemd/system/kdump.service; enabled; vendor preset: disabled)
       Active: failed (Result: exit-code) since Wed 2018-10-17 12:29:34 EDT; 1s ago
      Process: 59804 ExecStart=/lib/kdump/load.sh (code=exited, status=1/FAILURE)
     Main PID: 59804 (code=exited, status=1/FAILURE)
    Oct 17 12:29:33 SERVER systemd[1]: Starting Load kdump kernel on startup...
    Oct 17 12:29:34 SERVER load.sh[59804]: kexec_file_load failed: Cannot assign requested address
    Oct 17 12:29:34 SERVER systemd[1]: kdump.service: Main process exited, code=exited, status=1/FAILURE
    Oct 17 12:29:34 SERVER systemd[1]: Failed to start Load kdump kernel on startup.
    Oct 17 12:29:34 SERVER systemd[1]: kdump.service: Unit entered failed state.
    Oct 17 12:29:34 SERVER systemd[1]: kdump.service: Failed with result 'exit-code'.
    SERVER:~ #
Logs:

    SERVER:~ # tail /var/log/messages
    2018-10-17T12:29:33.980232-04:00 SERVER systemd[1]: Starting Load kdump kernel on startup...
    2018-10-17T12:29:34.133151-04:00 SERVER kdump[59974]: FAILED to load kdump kernel: /sbin/kexec -p /boot/vmlinuz-4.4.121-92.80-default --append="quiet console=tty0 console=ttyS0,9600 elevator=noop transparent_hugepage=never numa_balancing=disable intel_idle.max_cstate=1 elevator=deadline sysrq=yes reset_devices acpi_no_memhotplug cgroup_disable=memory irqpoll nr_cpus=1 root=kdump rootflags=bind rd.udev.children-max=8 disable_cpu_apicid=0 panic=1" --initrd=/boot/initrd-4.4.121-92.80-default-kdump -s, Result: kexec_file_load failed: Cannot assign requested address
    2018-10-17T12:29:34.133560-04:00 SERVER load.sh[59804]: kexec_file_load failed: Cannot assign requested address
    2018-10-17T12:29:34.133726-04:00 SERVER systemd[1]: kdump.service: Main process exited, code=exited, status=1/FAILURE
    2018-10-17T12:29:34.133958-04:00 SERVER systemd[1]: Failed to start Load kdump kernel on startup.
    2018-10-17T12:29:34.134105-04:00 SERVER systemd[1]: kdump.service: Unit entered failed state.
    2018-10-17T12:29:34.134233-04:00 SERVER systemd[1]: kdump.service: Failed with result 'exit-code'.
    SERVER:~ #
Version information:

    SERVER:~ # rpm -qa|grep -i kdump
    yast2-kdump-3.1.44-11.6.15.x86_64
    kdump-0.8.15-28.5.x86_64
    SERVER:~ # uname -a
    Linux SERVER 4.4.121-92.80-default #1 SMP Mon May 21 14:40:10 UTC 2018 (2afdd00) x86_64 x86_64 x86_64 GNU/Linux
    SERVER:~ #
    SERVER:~ # cat /etc/SuSE-release
    SUSE Linux Enterprise Server 12 (x86_64)
    VERSION = 12
    PATCHLEVEL = 2
    # This file is deprecated and will be removed in a future service pack or release.
    # Please check /etc/os-release for details about this release.
    SERVER:~ #
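**Question:** Why can't kdump.service start? What am I missing?
AFAIK SLES 12 doesn't need the kernel-kdump package, or am I wrong? If it does, where can I get the kernel-kdump package?
Based on https://distrowatch.com/table-mobile.php?distribution=sle&pkglist=true&version=12-sp2 the kdump version looks fine.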
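UPDATE, December 5, 2018:
- rpm -V kdump-0.8.15-28.5.x86_64; echo $? -> it is 0, so that is fine.
- I found a machine with the same kernel version, and kdump works there! But I can't find the difference between the healthy host and this bad one.
- Tried replacing the initrd, but it didn't help.
- Tried reinstalling kdump, which didn't help: rpm -e yast2-kdump; rpm -e kdump; zypper in kdump
- Tried running "systemctl unmask kdump; systemctl enable kdump; systemctl restart kdump" and "systemctl daemon-reload", which didn't help.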
UPDATE, December 7, 2018:

    cat /proc/cmdline
    BOOT_IMAGE=/vmlinuz-4.4.121-92.80-default root=/dev/mapper/vg00-lv_root splash=silent quiet showopts console=tty0 console=ttyS0,9600 elevator=noop transparent_hugepage=never crashkernel=768M numa_balancing=disable intel_idle.max_cstate=1
UPDATE, December 11, 2018: posting /proc/iomem from the node that can't start kdump:

    SERVER:~ # cat /proc/iomem
    00000000-00000fff : reserved
    00001000-0009bfff : System RAM
    0009c000-0009ffff : reserved
    000a0000-000bffff : PCI Bus 0000:00
    000c0000-000c7fff : Video ROM
    000cd800-000d53ff : Adapter ROM
    000e0000-000fffff : reserved
      000f0000-000fffff : System ROM
    00100000-5eeb0fff : System RAM
      01000000-015fbb30 : Kernel code
      015fbb31-01d59b7f : Kernel data
      01f6b000-021e8fff : Kernel bss
    5eeb1000-66eb8fff : reserved
    66eb9000-6a733fff : System RAM
    6a734000-6a742fff : reserved
    6a743000-6a743fff : System RAM
    6a744000-7a7c4fff : reserved
    7a7c5000-7cc82fff : System RAM
    7cc83000-7ccb4fff : reserved
    7ccb5000-a41b7fff : System RAM
    a41b8000-b93fefff : reserved
    b93ff000-bb3fefff : ACPI Non-volatile Storage
    bb3ff000-bb7fefff : ACPI Tables
    bb7ff000-bb7fffff : System RAM
    bb800000-cfffffff : reserved
      c0000000-cfffffff : PCI MMCONFIG 0000 [bus 00-ff]
    d0000000-e7ffbfff : PCI Bus 0000:00
      d0000000-d01fffff : PCI Bus 0000:06
        d0000000-d00fffff : 0000:06:00.0
        d0100000-d01fffff : 0000:06:00.1
      d0200000-d020ffff : 0000:00:11.0
      d03fc000-d03fcfff : 0000:00:05.4
      d03fe000-d03fe3ff : 0000:00:1a.0
        d03fe000-d03fe3ff : ehci_hcd
      d03ff000-d03ff3ff : 0000:00:1d.0
        d03ff000-d03ff3ff : ehci_hcd
      d0400000-d05fffff : PCI Bus 0000:0b
        d04f0000-d04fffff : 0000:0b:00.0
          d04f0000-d04fffff : megasas: LSI
        d0500000-d05fffff : 0000:0b:00.0
      d0600000-d0ffffff : PCI Bus 0000:11
        d0600000-d0ffffff : PCI Bus 0000:12
          d0600000-d06fffff : PCI Bus 0000:15
            d06fe000-d06fefff : 0000:15:00.0
            d06ff000-d06fffff : 0000:15:00.0
          d0700000-d0ffffff : PCI Bus 0000:13
            d0700000-d0ffffff : PCI Bus 0000:14
              d07fc000-d07fffff : 0000:14:00.0
                d07fc000-d07fffff : mgadrmfb_mmio
              d0800000-d0ffffff : 0000:14:00.0
      d1000000-d1ffffff : PCI Bus 0000:11
        d1000000-d1ffffff : PCI Bus 0000:12
          d1000000-d1ffffff : PCI Bus 0000:13
            d1000000-d1ffffff : PCI Bus 0000:14
              d1000000-d1ffffff : 0000:14:00.0
                d1000000-d1ffffff : mgadrmfb_vram
      d2000000-d5ffffff : PCI Bus 0000:06
        d2000000-d3ffffff : 0000:06:00.0
          d2000000-d3ffffff : mlx5_core
        d4000000-d5ffffff : 0000:06:00.1
          d4000000-d5ffffff : mlx5_core
    e7ffc000-e7ffcfff : dmar1
    e8000000-fbffbfff : PCI Bus 0000:80
      e8000000-e81fffff : PCI Bus 0000:81
        e8000000-e80fffff : 0000:81:00.0
        e8100000-e81fffff : 0000:81:00.1
      e9fff000-e9ffffff : 0000:80:05.4
      ea000000-edffffff : PCI Bus 0000:81
        ea000000-ebffffff : 0000:81:00.0
          ea000000-ebffffff : mlx5_core
        ec000000-edffffff : 0000:81:00.1
          ec000000-edffffff : mlx5_core
    fbffc000-fbffcfff : dmar0
    fec00000-fecfffff : PNP0003:00
      fec00000-fec003ff : IOAPIC 0
      fec01000-fec013ff : IOAPIC 1
      fec40000-fec403ff : IOAPIC 2
    fed00000-fed003ff : HPET 0
      fed00000-fed003ff : PNP0103:00
    fed12000-fed1200f : pnp 00:01
    fed12010-fed1201f : pnp 00:01
    fed1b000-fed1bfff : pnp 00:01
    fed1c000-fed1ffff : reserved
      fed1f410-fed1f414 : iTCO_wdt.0.auto
    fed45000-fed8bfff : pnp 00:01
    fee00000-feefffff : pnp 00:01
      fee00000-fee00fff : Local APIC
    ff000000-ffffffff : reserved
      ff000000-ffffffff : pnp 00:01
    100000000-1003fffffff : System RAM
    38000000000-3bfffffffff : PCI Bus 0000:00
      38000000000-38000000fff : 0000:00:1f.6
      3800000c000-3800000c00f : 0000:00:16.0
      3800000d000-3800000d00f : 0000:00:16.1
      3800000e000-3800000e0ff : 0000:00:1f.3
      38000010000-3800001ffff : 0000:00:14.0
        38000010000-3800001ffff : xhci-hcd
    3c000000000-3ffffffffff : PCI Bus 0000:80
    SERVER:~ #
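Let me answer as best I can with the information provided.
First, SLES 12 (and later) indeed does not need a kernel-kdump package. That special kernel flavor was needed only in ancient times: the panic kernel must be loaded at a different physical address than the running kernel, and back then the load address could only be changed at compile time (that is, the kernel was not relocatable).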
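Second, kdump does not start because the underlying `kexec_file_load` system call fails with `EADDRNOTAVAIL`. This happens when the system cannot allocate one or more of the buffers needed to load the panic kernel into RAM. Note that, in theory, there may be enough memory left for the panic kernel, but the allocation is subject to additional constraints imposed by the Linux kernel boot code and/or drivers, so that RAM may not be usable for loading the panic kernel. Another system may be luckier because of a different physical memory layout. As a first step, I would try to increase the reserved memory size on the kernel command line (e.g. `crashkernel=256M`), reboot, and see if it helps.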
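To see what the kernel actually reserved after a reboot, and to compare the memory layout with the host where kdump works, something like the following sketch may help. It assumes the usual /proc/iomem line format ("start-end : name") and that a successful `crashkernel=` reservation is listed there under the name "Crash kernel"; the function name `iomem_regions` is made up for this example. It sticks to POSIX shell features, so it should run under both bash and dash.

```shell
# iomem_regions NAME [FILE]
# Print every region called NAME in an iomem-style FILE (default: /proc/iomem)
# together with its size in MiB, rounded down.
# Read /proc/iomem as root: unprivileged reads may show zeroed addresses
# on some kernels.
iomem_regions() {
    name="$1"
    file="${2:-/proc/iomem}"
    # Split each line at the first " : " and keep the address range of
    # the lines whose name matches exactly.
    awk -v name="$name" '{
        sep = index($0, " : ")
        if (sep && substr($0, sep + 3) == name) {
            range = substr($0, 1, sep - 1)
            gsub(/[ \t]/, "", range)   # drop the nesting indentation
            print range
        }
    }' "$file" |
    while read -r range; do
        start=$((0x${range%-*}))       # hex start address
        end=$((0x${range#*-}))         # hex end address (inclusive)
        echo "$range $(( (end - start + 1) / 1048576 )) MiB"
    done
}
```

Run as root, `iomem_regions "Crash kernel"` should print one line per reserved region, and `iomem_regions "System RAM"` lists the RAM ranges, which can be compared between the two hosts. If the first call prints nothing, the reservation was probably not made at all; `cat /sys/kernel/kexec_crash_size` reports the reserved size in bytes as a cross-check.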