Kernel
Kernel: BUG: unable to handle page fault for address
Today one of our machines crashed with the following kernel message:
[79648.067306] BUG: unable to handle page fault for address: 0000000004000034
[79648.067315] #PF: supervisor read access in kernel mode
[79648.067318] #PF: error_code(0x0000) - not-present page
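Judging from the call trace (see below), the error was caused by the graphics driver (i915). Presumably a kernel update would fix it, but I'm interested in the background of the problem, so I have 3 questions:
- What exactly do these 3 lines mean, or where can I find a description of these errors?
- If I enable the hardware watchdog, will it reboot the system when this error occurs?
- Can this error be caused by a hardware (memory) fault?
System: 5.4.0-91-generic, Ubuntu 20.04.1 LTS
Full dump of the kernel ring buffer (dmesg):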
[79648.067306] BUG: unable to handle page fault for address: 0000000004000034
[79648.067315] #PF: supervisor read access in kernel mode
[79648.067318] #PF: error_code(0x0000) - not-present page
[79648.067322] PGD 0 P4D 0
[79648.067328] Oops: 0000 [#1] SMP PTI
[79648.067335] CPU: 3 PID: 668 Comm: Xorg Not tainted 5.4.0-91-generic #102-Ubuntu
[79648.067338] Hardware name: Shuttle Inc. DH310S/DH310S, BIOS 1.06 03/23/2020
[79648.067349] RIP: 0010:find_get_entry+0x7a/0x170
[79648.067355] Code: b8 48 c7 45 d0 03 00 00 00 e8 d2 ff 85 00 49 89 c4 48 3d 02 04 00 00 74 e4 48 3d 06 04 00 00 74 dc 48 85 c0 74 3d a8 01 75 39 <8b> 40 34 85 c0 74 cc 8d 50 01 f0 41 0f b1 54 24 34 75 f0 48 8b 45
[79648.067359] RSP: 0018:ffffb80a8093f728 EFLAGS: 00010246
[79648.067364] RAX: 0000000004000000 RBX: 00000000000004a6 RCX: 0000000000000000
[79648.067367] RDX: 0000000000000026 RSI: ffff9a369e5ff6c0 RDI: ffffb80a8093f728
[79648.067370] RBP: ffffb80a8093f770 R08: 00000000001120d2 R09: 0000000000000000
[79648.067373] R10: ffff9a3714c8eaa0 R11: 0000000000003c64 R12: 0000000004000000
[79648.067376] R13: 00000000000004a6 R14: 0000000000000001 R15: ffff9a371bf261c0
[79648.067381] FS: 00007f5b0d819a40(0000) GS:ffff9a372ed80000(0000) knlGS:0000000000000000
[79648.067384] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[79648.067387] CR2: 0000000004000034 CR3: 000000025bf12003 CR4: 00000000003606e0
[79648.067390] Call Trace:
[79648.067401] find_lock_entry+0x1f/0xe0
[79648.067408] shmem_getpage_gfp+0xef/0x940
[79648.067417] ? __kmalloc+0x194/0x290
[79648.067424] shmem_read_mapping_page_gfp+0x44/0x80
[79648.067520] shmem_get_pages+0x250/0x650 [i915]
[79648.067530] ? __update_load_avg_se+0x23b/0x320
[79648.067538] ? update_load_avg+0x7c/0x670
[79648.067619] ____i915_gem_object_get_pages+0x22/0x40 [i915]
[79648.067692] __i915_gem_object_get_pages+0x5b/0x70 [i915]
[79648.067774] __i915_vma_do_pin+0x3ee/0x470 [i915]
[79648.067845] eb_lookup_vmas+0x68a/0xb70 [i915]
[79648.067930] ? eb_pin_engine+0x255/0x410 [i915]
[79648.067990] i915_gem_do_execbuffer+0x38f/0xc20 [i915]
[79648.067997] ? security_file_alloc+0x29/0x90
[79648.068004] ? _cond_resched+0x19/0x30
[79648.068010] ? apparmor_file_alloc_security+0x3e/0x160
[79648.068016] ? __radix_tree_replace+0x6d/0x120
[79648.068020] ? radix_tree_iter_tag_clear+0x12/0x20
[79648.068027] ? kmem_cache_alloc_trace+0x177/0x240
[79648.068035] ? __pm_runtime_resume+0x60/0x80
[79648.068040] ? recalibrate_cpu_khz+0x10/0x10
[79648.068044] ? ktime_get_mono_fast_ns+0x4e/0xa0
[79648.068048] ? __kmalloc_node+0x213/0x330
[79648.068107] i915_gem_execbuffer2_ioctl+0x1eb/0x3d0 [i915]
[79648.068112] ? radix_tree_lookup+0xd/0x10
[79648.068167] ? i915_gem_execbuffer_ioctl+0x2d0/0x2d0 [i915]
[79648.068196] drm_ioctl_kernel+0xae/0xf0 [drm]
[79648.068218] drm_ioctl+0x24a/0x3f0 [drm]
[79648.068278] ? i915_gem_execbuffer_ioctl+0x2d0/0x2d0 [i915]
[79648.068288] do_vfs_ioctl+0x407/0x670
[79648.068293] ? fput+0x13/0x20
[79648.068299] ? __sys_recvmsg+0x88/0xa0
[79648.068305] ksys_ioctl+0x67/0x90
[79648.068311] __x64_sys_ioctl+0x1a/0x20
[79648.068317] do_syscall_64+0x57/0x190
[79648.068323] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[79648.068327] RIP: 0033:0x7f5b0db7937b
[79648.068332] Code: 0f 1e fa 48 8b 05 15 3b 0d 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 0f 1f 44 00 00 f3 0f 1e fa b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d e5 3a 0d 00 f7 d8 64 89 01 48
[79648.068335] RSP: 002b:00007fff24ca5d88 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[79648.068339] RAX: ffffffffffffffda RBX: 000055eaa18c2290 RCX: 00007f5b0db7937b
[79648.068342] RDX: 00007fff24ca5db0 RSI: 0000000040406469 RDI: 000000000000000c
[79648.068345] RBP: 00007f5b0ba31000 R08: 0000000000000002 R09: 0000000000000001
[79648.068347] R10: 00007f5b0d4156a0 R11: 0000000000000246 R12: 00007fff24ca5db0
[79648.068350] R13: 000000000000000c R14: 000000000000001a R15: 0000000000000068
[79648.068354] Modules linked in: wdat_wdt nls_iso8859_1 dm_multipath scsi_dh_rdac scsi_dh_emc scsi_dh_alua snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic ledtrig_audio snd_hda_intel snd_intel_dspcfg snd_hda_codec snd_hda_core snd_hwdep snd_pcm snd_seq_midi intel_rapl_msr snd_seq_midi_event intel_rapl_common snd_rawmidi x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel snd_seq kvm rtsx_pci_ms rapl snd_seq_device intel_cstate memstick snd_timer mei_me mei snd soundcore mac_hid acpi_pad sch_fq_codel ip_tables x_tables autofs4 btrfs zstd_compress raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear i915 crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel crypto_simd cryptd i2c_algo_bit rtsx_pci_sdmmc glue_helper drm_kms_helper syscopyarea sysfillrect sysimgblt i2c_i801 fb_sys_fops r8169 rtsx_pci drm realtek ahci libahci video
[79648.068413] CR2: 0000000004000034
[79648.068418] ---[ end trace 447ad409d057183e ]---
[79648.068425] RIP: 0010:find_get_entry+0x7a/0x170
[79648.068429] Code: b8 48 c7 45 d0 03 00 00 00 e8 d2 ff 85 00 49 89 c4 48 3d 02 04 00 00 74 e4 48 3d 06 04 00 00 74 dc 48 85 c0 74 3d a8 01 75 39 <8b> 40 34 85 c0 74 cc 8d 50 01 f0 41 0f b1 54 24 34 75 f0 48 8b 45
[79648.068432] RSP: 0018:ffffb80a8093f728 EFLAGS: 00010246
[79648.068435] RAX: 0000000004000000 RBX: 00000000000004a6 RCX: 0000000000000000
[79648.068438] RDX: 0000000000000026 RSI: ffff9a369e5ff6c0 RDI: ffffb80a8093f728
[79648.068441] RBP: ffffb80a8093f770 R08: 00000000001120d2 R09: 0000000000000000
[79648.068443] R10: ffff9a3714c8eaa0 R11: 0000000000003c64 R12: 0000000004000000
[79648.068446] R13: 00000000000004a6 R14: 0000000000000001 R15: ffff9a371bf261c0
[79648.068449] FS: 00007f5b0d819a40(0000) GS:ffff9a372ed80000(0000) knlGS:0000000000000000
[79648.068452] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[79648.068455] CR2: 0000000004000034 CR3: 000000025bf12003 CR4: 00000000003606e0
[79648.067306] BUG: unable to handle page fault for address: 0000000004000034
[79648.067315] #PF: supervisor read access in kernel mode
[79648.067318] #PF: error_code(0x0000) - not-present page
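These lines mean that kernel code tried to dereference an invalid pointer. It tried to access the virtual memory address 0x0000000004000034, but that address did not correspond to any real memory page, so the page could not be faulted in. The second and third lines add context: 1) the code was running in kernel (supervisor) mode; 2) the access was a read; 3) the problem was a missing (not-present) page, not an incompatible page protection such as a write to a read-only page.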
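Both #PF lines are built from the hardware page-fault error code plus the CPU mode at the time of the fault. The Python sketch below rebuilds them from the error code. It assumes the x86 bit layout of that code (bit 0 present/protection, bit 1 write, bit 2 user, bit 3 reserved bit, bit 4 instruction fetch, bit 5 protection key), named after the X86_PF_* constants in the Linux x86 sources; the function describe_pf is a hypothetical helper, and its message wording is modeled on the lines in this log rather than taken from a specific kernel version.

```python
from typing import Tuple

# Bits of the x86 page-fault error code, named after the X86_PF_* constants
# in the Linux x86 sources.
X86_PF_PROT = 1 << 0   # 0: page not present, 1: protection violation
X86_PF_WRITE = 1 << 1  # 0: read access, 1: write access
X86_PF_USER = 1 << 2   # 0: supervisor access, 1: user access
X86_PF_RSVD = 1 << 3   # a reserved bit was set in a paging-structure entry
X86_PF_INSTR = 1 << 4  # the access was an instruction fetch
X86_PF_PK = 1 << 5     # protection-key violation


def describe_pf(error_code: int, kernel_mode: bool) -> Tuple[str, str]:
    """Hypothetical helper: rebuild the two '#PF:' lines from the error code.

    kernel_mode is not encoded in the error code; the kernel derives it
    from the saved registers, so the caller passes it in separately.
    """
    who = "user" if error_code & X86_PF_USER else "supervisor"
    if error_code & X86_PF_INSTR:
        what = "instruction fetch"
    elif error_code & X86_PF_WRITE:
        what = "write access"
    else:
        what = "read access"
    mode = "kernel" if kernel_mode else "user"

    # Bit 0 clear means the page was simply not present.
    if not (error_code & X86_PF_PROT):
        reason = "not-present page"
    elif error_code & X86_PF_RSVD:
        reason = "reserved bit violation"
    elif error_code & X86_PF_PK:
        reason = "protection keys violation"
    else:
        reason = "permissions violation"

    return (f"#PF: {who} {what} in {mode} mode",
            f"#PF: error_code(0x{error_code:04x}) - {reason}")


# Values from this Oops: error_code 0x0000, faulting code ran in kernel mode.
for line in describe_pf(0x0000, kernel_mode=True):
    print(line)
# #PF: supervisor read access in kernel mode
# #PF: error_code(0x0000) - not-present page
```

An error code of 0x0000 has every bit clear, which is why the log reports a supervisor read of a not-present page.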
This is probably a bug in the kernel or driver code.
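The invalid pointer is also visible in the register dump. The bytes marked <8b> in the first Code: line are 8b 40 34, which encode mov eax, dword [rax+0x34]; with RAX = 0000000004000000 that reads from 0000000004000034, exactly the address reported in CR2. The Python sketch below checks that arithmetic. decode_mov_r32_m32_disp8 is a hypothetical helper that decodes only this one instruction form, not a general x86 decoder.

```python
# Register values and faulting instruction bytes copied from the Oops above.
RAX = 0x0000000004000000
CR2 = 0x0000000004000034
code = bytes([0x8B, 0x40, 0x34])  # the bytes starting at "<8b>" in "Code:"

REG32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]
BASE64 = ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi"]


def decode_mov_r32_m32_disp8(insn: bytes):
    """Hypothetical helper: decode only MOV r32, [base + disp8].

    Handles opcode 0x8B with ModRM mod=01; no prefixes, no REX,
    no SIB byte and no other addressing modes.
    """
    opcode, modrm, disp = insn[0], insn[1], insn[2]
    if opcode != 0x8B:
        raise ValueError("not MOV r32, r/m32")
    mod, reg, rm = modrm >> 6, (modrm >> 3) & 0b111, modrm & 0b111
    if mod != 0b01 or rm == 0b100:
        raise ValueError("only [base + disp8] without a SIB byte is handled")
    if disp >= 0x80:  # disp8 is sign-extended
        disp -= 0x100
    return REG32[reg], BASE64[rm], disp


dest, base, disp = decode_mov_r32_m32_disp8(code)
print(f"mov {dest}, dword [{base}{disp:+#x}]")  # mov eax, dword [rax+0x34]
fault_addr = RAX + disp
print(f"{fault_addr:016x}", fault_addr == CR2)  # 0000000004000034 True
```

Kernel data on x86-64 normally lives at upper-half addresses such as the ffff9a36... values in RSI and R15, so 0x4000000 in RAX is not a plausible kernel pointer; the read through it fails before anything else goes wrong.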