Out-of-Memory
Ansible triggers the oom-killer
Running Arch Linux
uname -a:
Linux localhost 4.7.2-1-ARCH #1 SMP PREEMPT Sat Aug 20 23:02:56 CEST 2016 x86_64 GNU/Linux
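16 GB RAM, 14 GB swap
When I run large Ansible jobs, they trigger the oom-killer. I would have thought 16 GB is enough to run jobs like this, but I'm no expert on OOM logs (or on Linux memory in general), so here is the log: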
Feb 14 11:35:36 localhost kernel: Out of memory: Kill process 22698 (systemd-coredum) score 503 or sacrifice child
Feb 14 11:35:36 localhost kernel: Killed process 22698 (systemd-coredum) total-vm:880316kB, anon-rss:37604kB, file-rss:67380kB, shmem-rss:0kB
Feb 14 11:42:52 localhost kernel: ansible invoked oom-killer: gfp_mask=0x27000c0(GFP_KERNEL_ACCOUNT|__GFP_NOTRACK), order=2, oom_score_adj=0
Feb 14 11:42:52 localhost kernel: ansible cpuset=/ mems_allowed=0
Feb 14 11:42:52 localhost kernel: CPU: 0 PID: 27123 Comm: ansible Not tainted 4.7.2-1-ARCH #1
Feb 14 11:42:52 localhost kernel: Hardware name: Dell Inc. OptiPlex 7020/08WKV3, BIOS A02 11/20/2014
Feb 14 11:42:52 localhost kernel: 0000000000000286 00000000a544d0e1 ffff8803b3147b48 ffffffff812eb132
Feb 14 11:42:52 localhost kernel: ffff8803b3147d28 ffff88024193f000 ffff8803b3147bb8 ffffffff811f6e5c
Feb 14 11:42:52 localhost kernel: ffff8803b3148000 0000000000000000 ffffffff81b28920 ffffffff811789c0
Feb 14 11:42:52 localhost kernel: Call Trace:
Feb 14 11:42:52 localhost kernel: [<ffffffff812eb132>] dump_stack+0x63/0x81
Feb 14 11:42:52 localhost kernel: [<ffffffff811f6e5c>] dump_header+0x60/0x1e8
Feb 14 11:42:52 localhost kernel: [<ffffffff811789c0>] ? page_alloc_cpu_notify+0x50/0x50
Feb 14 11:42:52 localhost kernel: [<ffffffff811762fa>] oom_kill_process+0x22a/0x440
Feb 14 11:42:52 localhost kernel: [<ffffffff8117696a>] out_of_memory+0x40a/0x4b0
Feb 14 11:42:52 localhost kernel: [<ffffffff812ffe08>] ? find_next_bit+0x18/0x20
Feb 14 11:42:52 localhost kernel: [<ffffffff8117c05b>] __alloc_pages_nodemask+0xf0b/0xf30
Feb 14 11:42:52 localhost kernel: [<ffffffff8117c3d4>] alloc_kmem_pages_node+0x54/0xd0
Feb 14 11:42:52 localhost kernel: [<ffffffff81077c06>] copy_process.part.8+0x136/0x19a0
Feb 14 11:42:52 localhost kernel: [<ffffffff811a974a>] ? handle_mm_fault+0xa7a/0x1f60
Feb 14 11:42:52 localhost kernel: [<ffffffff81079647>] _do_fork+0xd7/0x3d0
Feb 14 11:42:52 localhost kernel: [<ffffffff810655f5>] ? __do_page_fault+0x1f5/0x510
Feb 14 11:42:52 localhost kernel: [<ffffffff810799e9>] SyS_clone+0x19/0x20
Feb 14 11:42:52 localhost kernel: [<ffffffff81003c07>] do_syscall_64+0x57/0xb0
Feb 14 11:42:52 localhost kernel: [<ffffffff815de861>] entry_SYSCALL64_slow_path+0x25/0x25
Feb 14 11:42:52 localhost kernel: Mem-Info:
Feb 14 11:42:52 localhost kernel: active_anon:548787 inactive_anon:232682 isolated_anon:0 active_file:28394 inactive_file:24931 isolated_file:8 unevictable:0 dirty:1 writeback:0 unstable:0 slab_reclaimable:1897009 slab_unreclaimable:19547 mapped:51240 shmem:28342 pagetables:20339 bounce:0 free:1284106 free_pcp:446 free_cma:0
Feb 14 11:42:52 localhost kernel: Node 0 DMA free:15628kB min:64kB low:80kB high:96kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15984kB managed:15900k
Feb 14 11:42:52 localhost kernel: lowmem_reserve[]: 0 3468 15978 15978
Feb 14 11:42:52 localhost kernel: Node 0 DMA32 free:1221320kB min:14632kB low:18288kB high:21944kB active_anon:274224kB inactive_anon:273556kB active_file:40556kB inactive_file:36556kB unevictable:0kB isolated(anon):0kB isolated(file):32k
Feb 14 11:42:52 localhost kernel: lowmem_reserve[]: 0 0 12510 12510
Feb 14 11:42:52 localhost kernel: Node 0 Normal free:3899476kB min:52884kB low:66104kB high:79324kB active_anon:1920924kB inactive_anon:657172kB active_file:73020kB inactive_file:63168kB unevictable:0kB isolated(anon):0kB isolated(file):0
Feb 14 11:42:52 localhost kernel: lowmem_reserve[]: 0 0 0 0
Feb 14 11:42:52 localhost kernel: Node 0 DMA: 1*4kB (U) 1*8kB (U) 0*16kB 0*32kB 2*64kB (U) 1*128kB (U) 0*256kB 0*512kB 1*1024kB (U) 1*2048kB (M) 3*4096kB (ME) = 15628kB
Feb 14 11:42:52 localhost kernel: Node 0 DMA32: 166992*4kB (UME) 68889*8kB (UE) 7*16kB (H) 11*32kB (H) 11*64kB (H) 2*128kB (H) 1*256kB (H) 0*512kB 0*1024kB 0*2048kB 0*4096kB = 1220760kB
Feb 14 11:42:52 localhost kernel: Node 0 Normal: 721354*4kB (UME) 126667*8kB (UEH) 16*16kB (H) 2*32kB (H) 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 3899072kB
Feb 14 11:42:52 localhost kernel: Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB
Feb 14 11:42:52 localhost kernel: Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
Feb 14 11:42:52 localhost kernel: 125644 total pagecache pages
Feb 14 11:42:52 localhost kernel: 43931 pages in swap cache
Feb 14 11:42:52 localhost kernel: Swap cache stats: add 2753281, delete 2709350, find 730647/1154037
Feb 14 11:42:52 localhost kernel: Free swap = 12677364kB
Feb 14 11:42:52 localhost kernel: Total swap = 14124028kB
Feb 14 11:42:52 localhost kernel: 4179504 pages RAM
Feb 14 11:42:52 localhost kernel: 0 pages HighMem/MovableOnly
Feb 14 11:42:52 localhost kernel: 84923 pages reserved
Feb 14 11:42:52 localhost kernel: 0 pages hwpoisoned
(...)
Feb 14 11:42:52 localhost kernel: Out of memory: Kill process 27876 (firefox) score 41 or sacrifice child
Feb 14 11:42:52 localhost kernel: Killed process 27876 (firefox) total-vm:4003016kB, anon-rss:1091960kB, file-rss:41516kB, shmem-rss:80216kB
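The Mem-Info counters in the log are in pages, which makes them hard to read at a glance. Below is a small Python sketch that converts a few of them to MiB. It is not a diagnosis, just arithmetic on the numbers already posted. It assumes 4 KiB pages (the usual size on x86_64), and the helper names are made up for illustration.

```python
import re

PAGE_KIB = 4  # assumption: 4 KiB pages, the usual size on x86_64

# Counters copied verbatim from the Mem-Info entry in the log (unit: pages).
MEM_INFO = (
    "active_anon:548787 inactive_anon:232682 isolated_anon:0 "
    "active_file:28394 inactive_file:24931 isolated_file:8 "
    "unevictable:0 dirty:1 writeback:0 unstable:0 "
    "slab_reclaimable:1897009 slab_unreclaimable:19547 "
    "mapped:51240 shmem:28342 pagetables:20339 bounce:0 "
    "free:1284106 free_pcp:446 free_cma:0"
)


def parse_counters(text):
    """Turn 'name:value name:value ...' into a dict of ints."""
    return {name: int(value) for name, value in re.findall(r"(\w+):(\d+)", text)}


def pages_to_mib(pages):
    """Convert a page count to MiB under the PAGE_KIB assumption."""
    return pages * PAGE_KIB / 1024


def order_to_kib(order):
    """Size of one allocation of the given order (2**order contiguous pages)."""
    return PAGE_KIB << order


counters = parse_counters(MEM_INFO)
for name in ("active_anon", "inactive_anon", "slab_reclaimable", "free"):
    print(f"{name:>18}: {pages_to_mib(counters[name]):8.0f} MiB")

# The failing request in the log was "order=2".
print(f"order=2 request: {order_to_kib(2)} KiB of contiguous memory")
```

Under that assumption the log works out to roughly 4.9 GiB free and about 7.2 GiB of reclaimable slab at the moment the oom-killer fired, and the request that failed (inside the clone() call shown in the trace) was for a single contiguous 16 KiB block.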
Here are some sysctl values I have played with. They helped a little, but it still happens on bigger jobs:
vm.overcommit_memory = 2
vm.overcommit_ratio = 100
Could some of my Ansible jobs really be consuming all of my system's memory plus swap?
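For reference, with vm.overcommit_memory = 2 the kernel stops granting new allocations once the committed address space would exceed CommitLimit, which the kernel documentation describes as (RAM - huge pages) * overcommit_ratio / 100 + swap. Here is a rough sketch of that arithmetic using the totals from the log. It assumes 4 KiB pages and treats the "pages RAM" figure as the RAM total, so the kernel's own number will be somewhat lower because of reserved pages. The function name is hypothetical.

```python
PAGE_KIB = 4  # assumption: 4 KiB pages

# Totals taken from the log in the question.
RAM_PAGES = 4179504        # "4179504 pages RAM"
SWAP_KIB = 14124028        # "Total swap = 14124028kB"
HUGETLB_KIB = 0            # hugepages_total=0 for both huge page sizes


def commit_limit_kib(ram_kib, swap_kib, overcommit_ratio, hugetlb_kib=0):
    """Approximate CommitLimit for vm.overcommit_memory = 2.

    Mirrors the documented formula:
    (RAM - huge pages) * overcommit_ratio / 100 + swap
    """
    return (ram_kib - hugetlb_kib) * overcommit_ratio // 100 + swap_kib


ram_kib = RAM_PAGES * PAGE_KIB

for ratio in (50, 100):
    limit = commit_limit_kib(ram_kib, SWAP_KIB, ratio, HUGETLB_KIB)
    print(f"overcommit_ratio={ratio:3d}: CommitLimit ~ {limit} KiB "
          f"({limit / 1024 / 1024:.1f} GiB)")
```

So with overcommit_ratio = 100 the limit is approximately RAM plus swap. On a live system the `CommitLimit` and `Committed_AS` lines in /proc/meminfo show the real limit and how much is currently committed.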
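Ansible definitely shouldn't be using that much memory. Can you give more detail about the jobs you're running (how many there are, what they do, which modules they use, examples, etc.)? I see firefox got killed in there; are you doing a lot with Firefox?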