Linux
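FreeNAS guest VM enters KDB mode on a host with an AMD Ryzen 9 5950X
Problem description:
I used to run Ubuntu 20.04 on a host with an AMD Ryzen 3 3100. I installed KVM and started a FreeNAS VM, and everything worked. After I replaced the CPU, however, the FreeNAS guest no longer boots, while other guests running Ubuntu still work.
Console log from the FreeNAS guest: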
db> reboot
cpu_reset: Restarting BSP
cpu_reset_proxy: Stopped CPU 1
GDB: no debug ports present
KDB: debugger backends: ddb
KDB: current backend: ddb
Copyright (c) 1992-2019 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
    The Regents of the University of California. All rights reserved.
FreeBSD is a registered trademark of The FreeBSD Foundation.
FreeBSD 11.3-RELEASE-p14 #0 r325575+c936002dbe2(HEAD): Mon Sep 28 10:48:27 EDT 2020
    root@tnbuilds05.tn.ixsystems.net:/freenas-releng/freenas/_BE/objs/freenas-releng/freenas/_BE/os/sys/FreeNAS.amd64-DEBUG amd64
FreeBSD clang version 8.0.0 (tags/RELEASE_800/final 356365) (based on LLVM 8.0.0)
WARNING: WITNESS option enabled, expect reduced performance.
VT(vga): text 80x25
CPU: AMD EPYC-Milan Processor (3400.05-MHz K8-class CPU)
  Origin="AuthenticAMD" Id=0xa00f11 Family=0x19 Model=0x1 Stepping=1
  Features=0x783fbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,MMX,FXSR,SSE,SSE2>
  Features2=0xfff83203<SSE3,PCLMULQDQ,SSSE3,FMA,CX16,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,TSCDLT,AESNI,XSAVE,OSXSAVE,AVX,F16C,RDRAND,HV>
  AMD Features=0x2e500800<SYSCALL,NX,MMX+,FFXSR,Page1GB,RDTSCP,LM>
  AMD Features2=0xc003f7<LAHF,CMP,SVM,CR8,ABM,SSE4A,MAS,Prefetch,OSVW,Topology,PCXC>
  Structured Extended Features=0x211c07ab<FSGSBASE,TSCADJ,BMI1,AVX2,SMEP,BMI2,ERMS,INVPCID,RDSEED,ADX,SMAP,CLWB,SHA>
  Structured Extended Features2=0x40060c<UMIP,PKU,RDPID>
  Structured Extended Features3=0xac000010<IBPB,STIBP,ARCH_CAP,SSBD>
  XSAVE Features=0xf<XSAVEOPT,XSAVEC,XINUSE,XSAVES>
  IA32_ARCH_CAPS=0x69<RDCL_NO,SKIP_L1DFL_VME>
  AMD Extended Feature Extensions ID EBX=0x300d205<CLZERO,XSaveErPtr>
  SVM: NP,NRIP,NAsids=16
Hypervisor: Origin = "KVMKVMKVM"
real memory = 8489271296 (8096 MB)
avail memory = 8143572992 (7766 MB)
Event timer "LAPIC" quality 600
ACPI APIC Table: <BOCHS BXPCAPIC>
FreeBSD/SMP: Multiprocessor System Detected: 2 CPUs
FreeBSD/SMP: 2 package(s)
WARNING: VIMAGE (virtualized network stack) is a highly experimental feature.
ioapic0 <Version 1.1> irqs 0-23 on motherboard
SMP: AP CPU #1 Launched!
random: entropy device external interface
random: registering fast source Intel Secure Key RNG
random: fast provider: "Intel Secure Key RNG"
kbd1 at kbdmux0
mlx5en: Mellanox Ethernet driver 3.5.1 (April 2019)
nexus0
vtvga0: <VT VGA driver> on motherboard
cryptosoft0: <software crypto> on motherboard
aesni0: <AES-CBC,AES-XTS,AES-GCM,AES-ICM> on motherboard
padlock0: No ACE support.
acpi0: <BOCHS BXPCRSDT> on motherboard
acpi0: Power Button (fixed)
cpu0: <ACPI CPU> on acpi0
cpu1: <ACPI CPU> on acpi0
atrtc0: <AT realtime clock> port 0x70-0x71,0x72-0x77 irq 8 on acpi0
atrtc0: registered as a time-of-day clock, resolution 1.000000s
Event timer "RTC" frequency 32768 Hz quality 0
Timecounter "ACPI-fast" frequency 3579545 Hz quality 900
acpi_timer0: <24-bit timer at 3.579545MHz> port 0x608-0x60b on acpi0
pcib0: <ACPI Host-PCI bridge> port 0xcf8-0xcff on acpi0
pci0: <ACPI PCI bus> on pcib0
isab0: <PCI-ISA bridge> at device 1.0 on pci0
isa0: <ISA bus> on isab0
atapci0: <Intel PIIX3 WDMA2 controller> port 0x1f0-0x1f7,0x3f6,0x170-0x177,0x376,0xc1a0-0xc1af at device 1.1 on pci0
ata0: <ATA channel> at channel 0 on atapci0
ata1: <ATA channel> at channel 1 on atapci0
pci0: <bridge> at device 1.3 (no driver attached)
vgapci0: <VGA-compatible display> port 0xc100-0xc11f mem 0xf4000000-0xf7ffffff,0xf8000000-0xfbffffff,0xfc094000-0xfc095fff irq 10 at device 2.0 on pci0
vgapci0: Boot video device
virtio_pci0: <VirtIO PCI Network adapter> port 0xc120-0xc13f mem 0xfc096000-0xfc096fff,0xfebf0000-0xfebf3fff irq 11 at device 3.0 on pci0
vtnet0: <VirtIO Networking Adapter> on virtio_pci0
vtnet0: Ethernet address: 52:54:00:9b:85:3a
pci0: <multimedia, HDA> at device 4.0 (no driver attached)
uhci0: <Intel 82801I (ICH9) USB controller> port 0xc140-0xc15f irq 10 at device 5.0 on pci0
usbus0 on uhci0
usbus0: 12Mbps Full Speed USB v1.0
uhci1: <Intel 82801I (ICH9) USB controller> port 0xc160-0xc17f irq 10 at device 5.1 on pci0
usbus1 on uhci1
usbus1: 12Mbps Full Speed USB v1.0
uhci2: <Intel 82801I (ICH9) USB controller> port 0xc180-0xc19f irq 11 at device 5.2 on pci0
usbus2 on uhci2
usbus2: 12Mbps Full Speed USB v1.0
ehci0: <Intel 82801I (ICH9) USB 2.0 controller> mem 0xfc097000-0xfc097fff irq 11 at device 5.7 on pci0
usbus3: EHCI version 1.0
usbus3 on ehci0
usbus3: 480Mbps High Speed USB v2.0
virtio_pci1: <VirtIO PCI Console adapter> port 0xc080-0xc0bf mem 0xfc098000-0xfc098fff,0xfebf4000-0xfebf7fff irq 10 at device 6.0 on pci0
virtio_pci2: <VirtIO PCI Balloon adapter> port 0xc0c0-0xc0ff mem 0xfebf8000-0xfebfbfff irq 11 at device 7.0 on pci0
vtballoon0: <VirtIO Balloon Adapter> on virtio_pci2
virtio_pci3: <VirtIO PCI Block adapter> port 0xc000-0xc07f mem 0xfc099000-0xfc099fff,0xfebfc000-0xfebfffff irq 11 at device 8.0 on pci0
vtblk0: <VirtIO Block Adapter> on virtio_pci3
vtblk0: 5723166MB (11721045168 512 byte sectors)
acpi_syscontainer0: <System Container> on acpi0
acpi_syscontainer1: <System Container> port 0xaf00-0xaf0b on acpi0
acpi_syscontainer2: <System Container> port 0xafe0-0xafe3 on acpi0
acpi_syscontainer3: <System Container> port 0xae00-0xae13 on acpi0
atkbdc0: <Keyboard controller (i8042)> port 0x60,0x64 irq 1 on acpi0
atkbd0: <AT Keyboard> irq 1 on atkbdc0
kbd0 at atkbd0
atkbd0: [GIANT-LOCKED]
psm0: <PS/2 Mouse> irq 12 on atkbdc0
psm0: [GIANT-LOCKED]
psm0: model IntelliMouse Explorer, device ID 4
uart0: <16550 or compatible> port 0x3f8-0x3ff irq 4 flags 0x10 on acpi0
uart0: console (9600,n,8,1)
orm0: <ISA Option ROM> at iomem 0xe9800-0xeffff on isa0
attimer0: <AT timer> at port 0x40 on isa0
Timecounter "i8254" frequency 1193182 Hz quality 0
Event timer "i8254" frequency 1193182 Hz quality 100
ZFS filesystem version: 5
ZFS storage pool version: features support (5000)
Timecounters tick every 10.000 msec
freenas_sysctl: adding account.
freenas_sysctl: adding directoryservice.
freenas_sysctl: adding middlewared.
freenas_sysctl: adding network.
freenas_sysctl: adding services.
ipfw2 (+ipv6) initialized, divert enabled, nat enabled, default to accept, logging disabled
ugen2.1: <Intel UHCI root HUB> at usbus2
ugen3.1: <Intel EHCI root HUB> at usbus3
uhub0: <Intel UHCI root HUB, class 9/0, rev 1.00/1.00, addr 1> on usbus2
ugen0.1: <Intel UHCI root HUB> at usbus0
uhub1: <Intel EHCI root HUB, class 9/0, rev 2.00/1.00, addr 1> on usbus3
uhub2: <Intel UHCI root HUB, class 9/0, rev 1.00/1.00, addr 1> on usbus0
ugen1.1: <Intel UHCI root HUB> at usbus1
uhub3: <Intel UHCI root HUB, class 9/0, rev 1.00/1.00, addr 1> on usbus1
ada0 at ata0 bus 0 scbus0 target 0 lun 0
ada0: <QEMU HARDDISK 2.5+> ATA-7 device
ada0: Serial Number QM00001
ada0: 16.700MB/s transfers (WDMA2, PIO 8192bytes)
ada0: 61440MB (125829120 512 byte sectors)
cd0 at ata0 bus 0 scbus0 target 1 lun 0
cd0: <QEMU QEMU DVD-ROM 2.5+> Removable CD-ROM SCSI device
cd0: Serial Number QM00002
cd0: 16.700MB/s transfers (WDMA2, ATAPI 12bytes, PIO 65534bytes)
cd0: Attempt to query device size failed: NOT READY, Medium not present
WARNING: WITNESS option enabled, expect reduced performance.
Trying to mount root from zfs:freenas-boot/ROOT/default []...
Root mount waiting for: usbus3 usbus2 usbus1 usbus0
uhub0: 2 ports with 2 removable, self powered
uhub2: 2 ports with 2 removable, self powered
uhub3: 2 ports with 2 removable, self powered
Root mount waiting for: usbus3
Root mount waiting for: usbus3
uhub1: 6 ports with 6 removable, self powered
Root mount waiting for: usbus3
ugen3.2: <QEMU QEMU USB Tablet> at usbus3
Starting devd.
warning: KLD '/boot/kernel-debug/uhid.ko' is newer than the linker.hints file
lo0: link state changed to UP
Fatal trap 12: page fault while in kernel mode
cpuid = 1; apic id = 01
fault virtual address = 0xfffffe02311f30c0
fault code = supervisor write data, page not present
instruction pointer = 0x20:0xffffffff81016d09
Fatal trap 12: page fault while in kernel mode
cpuid = 0; apic id = 00
fault virtual address = 0xfffffe02311c60c0
stack pointer = 0x28:0xfffffe02311f1eb0
frame pointer = 0x28:0xfffffe02311f1eb0
code segment = base 0x0, limit 0xfffff, type 0x1b
             = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags = interrupt enabled, resume, IOPL = 0
current process = 99 (python3.7)
trap number = 12
panic: page fault
cpuid = 1
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe02311f1b70
vpanic() at vpanic+0x17e/frame 0xfffffe02311f1bd0
panic() at panic+0x43/frame 0xfffffe02311f1c30
trap_fatal() at trap_fatal+0x369/frame 0xfffffe02311f1c80
trap_pfault() at trap_pfault+0x62/frame 0xfffffe02311f1cd0
trap() at trap+0x2b3/frame 0xfffffe02311f1de0
calltrap() at calltrap+0x8/frame 0xfffffe02311f1de0
--- trap 0xc, rip = 0xffffffff81016d09, rsp = 0xfffffe02311f1eb0, rbp = 0xfffffe02311f1eb0 ---
bcopy() at bcopy+0x19/frame 0xfffffe02311f1eb0
fpugetregs() at fpugetregs+0x192/frame 0xfffffe02311f1f00
get_mcontext() at get_mcontext+0x1b4/frame 0xfffffe02311f1f50
sys_getcontext() at sys_getcontext+0x56/frame 0xfffffe02311f2300
amd64_syscall() at amd64_syscall+0x792/frame 0xfffffe02311f2430
fast_syscall_common() at fast_syscall_common+0x101/frame 0xfffffe02311f2430
--- syscall (421, FreeBSD ELF64, sys_getcontext), rip = 0x801c26280, rsp = 0x7fffffffd188, rbp = 0x7fffffffdcf0 ---
KDB: enter: panic
[ thread pid 99 tid 100490 ]
Stopped at kdb_enter+0x3b: movq $0,kdb_why
The CPU details reported by the BIOS are as follows:
dmidecode | grep "Processor Information" -A 54
Processor Information
    Socket Designation: AM4
    Type: Central Processor
    Family: Zen
    Manufacturer: Advanced Micro Devices, Inc.
    ID: 10 0F A2 00 FF FB 8B 17
    Signature: Family 25, Model 33, Stepping 0
    Flags:
        FPU (Floating-point unit on-chip)
        VME (Virtual mode extension)
        DE (Debugging extension)
        PSE (Page size extension)
        TSC (Time stamp counter)
        MSR (Model specific registers)
        PAE (Physical address extension)
        MCE (Machine check exception)
        CX8 (CMPXCHG8 instruction supported)
        APIC (On-chip APIC hardware supported)
        SEP (Fast system call)
        MTRR (Memory type range registers)
        PGE (Page global enable)
        MCA (Machine check architecture)
        CMOV (Conditional move instruction supported)
        PAT (Page attribute table)
        PSE-36 (36-bit page size extension)
        CLFSH (CLFLUSH instruction supported)
        MMX (MMX technology supported)
        FXSR (FXSAVE and FXSTOR instructions supported)
        SSE (Streaming SIMD extensions)
        SSE2 (Streaming SIMD extensions 2)
        HTT (Multi-threading)
    Version: AMD Ryzen 9 5950X 16-Core Processor
    Voltage: 1.1 V
    External Clock: 100 MHz
    Max Speed: 5050 MHz
    Current Speed: 3400 MHz
    Status: Populated, Enabled
    Upgrade: Socket AM4
    L1 Cache Handle: 0x0013
    L2 Cache Handle: 0x0014
    L3 Cache Handle: 0x0015
    Serial Number: Unknown
    Asset Tag: Unknown
    Part Number: Unknown
    Core Count: 16
    Core Enabled: 16
    Thread Count: 32
    Characteristics:
        64-bit capable
        Multi-Core
        Hardware Thread
        Execute Protection
        Enhanced Virtualization
        Power/Performance Control
After resetting from within KDB, I found the following information:
Fatal trap 12: page fault while in kernel mode
cpuid = 1; apic id = 01
fault virtual address = 0xfffffe02311d00c0
fault code = supervisor write data, page not present
instruction pointer = 0x20:0xffffffff81016d09
stack pointer = 0x28:0xfffffe02311ceeb0
frame pointer = 0x28:0xfffffe02311ceeb0
code segment = base 0x0, limit 0xfffff, type 0x1b
             = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags = interrupt enabled, resume, IOPL = 0
current process = 99 (python3.7)
trap number = 12
panic: page fault
cpuid = 1
KDB: stack backtrace:
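Things I have tried:
- Reinstalling the guest, which failed with the same problem and also dropped into KDB mode
- Rebooting the host, which did not fix it
Questions:
- How can I collect more detailed information from KDB?
- How can I solve the problem?
- Does FreeNAS not support the AMD Ryzen 9 5950X 16-core processor?
With Wu's help, I was able to create a test VM from a FreeNAS OS image with the following command: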
virt-install \
  --name test \
  --memory 8096 \
  --vcpus 2 \
  --cpu host-model-only \
  --cdrom /var/lib/libvirt/isos/TrueNAS-12.0-U5.1.iso \
  --disk size=30,bus=virtio \
  --network type=direct,source=enp42s0,source_mode=bridge \
  --os-type=linux \
  --os-variant freebsd11.3 \
  --graphics vnc,listen=0.0.0.0,port=20012 \
  --video vga --input tablet,bus=usb
After comparing the XML of the freeNas VM with that of the test VM, I changed the cpu element to the following:
<cpu mode='custom' match='exact' check='partial'>
  <model fallback='allow'>EPYC-Rome</model>
  <feature policy='require' name='ibpb'/>
  <feature policy='require' name='spec-ctrl'/>
  <feature policy='require' name='ssbd'/>
  <feature policy='require' name='virt-ssbd'/>
</cpu>
and ran the following commands:
virsh destroy freeNas
virsh start freeNas
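After that, the guest finally came back up.
For now I do not know why this works, because I arrived at it by trial and error rather than from any theory.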
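For anyone repeating the comparison step, here is a minimal shell sketch of how the cpu elements of two libvirt domains could be diffed. The helper names `extract_cpu` and `compare_cpu` are hypothetical, not part of libvirt. The sketch assumes the pretty-printed layout that `virsh dumpxml` normally emits, and it only shows the difference between the two definitions; it does not explain why one CPU model boots and the other panics.

```shell
# Sketch only: extract_cpu and compare_cpu are illustrative helper names.

# Print just the <cpu> element of a libvirt domain XML read from stdin.
# Assumes the opening and closing tags sit on their own lines, or that the
# element is a single self-closing tag (as with host-passthrough).
# <cputune> is not matched because the pattern requires a space or '>'
# directly after "<cpu".
extract_cpu() {
    sed -n -e '/<cpu[ >].*\/>/{p;d;}' -e '/<cpu[ >]/,/<\/cpu>/p'
}

# Show how the CPU definitions of two domains differ.
# Returns diff's exit status: 0 if identical, 1 if they differ.
compare_cpu() {
    tmp_a=$(mktemp) || return 1
    tmp_b=$(mktemp) || { rm -f "$tmp_a"; return 1; }
    virsh dumpxml "$1" | extract_cpu > "$tmp_a"
    virsh dumpxml "$2" | extract_cpu > "$tmp_b"
    diff -u "$tmp_a" "$tmp_b"
    status=$?
    rm -f "$tmp_a" "$tmp_b"
    return "$status"
}
```

With the domain names from this post, the call would be `compare_cpu freeNas test`. The cpu element can then be changed with `virsh edit freeNas`, followed by the `virsh destroy` and `virsh start` shown above, since an edited definition only takes effect when the domain is started again.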