[PVE-User] [OT?] OOM...
Falko Trojahn
trojahn+proxmox at pluspol.info
Tue Jan 10 14:00:35 CET 2017
Hello Marco,
did you ever find out more about your OOMs?
Hello all,
I'd like to get some idea what we can do here.
Since the last PVE updates last week (no idea whether this is related or not), we
have been getting OOMs, sometimes during the night. We have 5 Proxmox nodes with
Ceph and KVMs: 3 nodes are servers with Supermicro boards and >= 60 GB RAM; the
other two, Asus P6T6 boards with 12 GB (no KVMs) and 24 GB, are only there for the
transition from the old Proxmox 3.x cluster to the new 4.x one and will be sorted
out later if possible.
When we first noticed the OOMs, two KVM processes were killed one after another;
since then a Ceph OSD process has been involved at least twice (see the lists /
syslog excerpts further down).
Our Munin graphs never show a memory shortage at the time of the OOMs; there
seems to be plenty of RAM available.
So why does rados invoke the OOM killer and get the process with the most memory
killed, and how can this be prevented?
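For what it's worth, every report below shows rados requesting an order=2
allocation (gfp_mask=0x26000c0), i.e. four contiguous pages while forking, so my
current guess - and it is only a guess - is fragmentation of free memory rather
than a real shortage. What I intend to check / try next (untested here, the value
below is only an example, so please correct me if this is nonsense):

# cat /proc/buddyinfo
  (free pages per order and zone - do the higher orders drop to zero before an OOM?)
# sysctl -w vm.min_free_kbytes=262144
  (keep a larger free reserve so order>0 allocations fail less often)
# echo 1 > /proc/sys/vm/compact_memory
  (trigger one memory compaction run by hand)
# echo -1000 > /proc/<pid of a kvm guest>/oom_score_adj
  (exempt a given kvm guest from the OOM killer - this only shifts the problem)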
If more info about our config is needed, please ask.
Many thanks in advance
and best regards
Falko
Full output of the first OOM:
Jan 6 04:04:49 vm1 pvedaemon[714]: worker exit
Jan 6 04:36:34 vm1 kernel: [238186.770831] rados invoked oom-killer:
gfp_mask=0x26000c0, order=2, oom_score_adj=0
Jan 6 04:36:34 vm1 kernel: [238186.770837] rados cpuset=/ mems_allowed=0-1
Jan 6 04:36:34 vm1 kernel: [238186.770845] CPU: 6 PID: 11757 Comm:
rados Tainted: G O 4.4.35-1-pve #1
Jan 6 04:36:34 vm1 kernel: [238186.770847] Hardware name: Supermicro
X8DT3/X8DT3, BIOS 2.0c 08/30/2011
Jan 6 04:36:34 vm1 kernel: [238186.770849] 0000000000000286
00000000b47f489a ffff8800b249bb50 ffffffff813f9743
Jan 6 04:36:34 vm1 kernel: [238186.770852] ffff8800b249bd40
0000000000000000 ffff8800b249bbb8 ffffffff8120adcb
Jan 6 04:36:34 vm1 kernel: [238186.770855] ffff880f3f69ae10
ffffea001f005880 0000000100000001 0000000000000000
Jan 6 04:36:34 vm1 kernel: [238186.770858] Call Trace:
Jan 6 04:36:34 vm1 kernel: [238186.770871] [<ffffffff813f9743>]
dump_stack+0x63/0x90
Jan 6 04:36:34 vm1 kernel: [238186.770877] [<ffffffff8120adcb>]
dump_header+0x67/0x1d5
Jan 6 04:36:34 vm1 kernel: [238186.770883] [<ffffffff811925c5>]
oom_kill_process+0x205/0x3c0
Jan 6 04:36:34 vm1 kernel: [238186.770886] [<ffffffff81192a17>]
out_of_memory+0x237/0x4a0
Jan 6 04:36:34 vm1 kernel: [238186.770891] [<ffffffff81198d0e>]
__alloc_pages_nodemask+0xcee/0xe20
Jan 6 04:36:34 vm1 kernel: [238186.770894] [<ffffffff81198e8b>]
alloc_kmem_pages_node+0x4b/0xd0
Jan 6 04:36:34 vm1 kernel: [238186.770901] [<ffffffff8107f053>]
copy_process+0x1c3/0x1c00
Jan 6 04:36:34 vm1 kernel: [238186.770907] [<ffffffff8119fa37>] ?
lru_cache_add_active_or_unevictable+0x27/0xa0
Jan 6 04:36:34 vm1 kernel: [238186.770910] [<ffffffff811c24c9>] ?
handle_mm_fault+0xdb9/0x19c0
Jan 6 04:36:34 vm1 kernel: [238186.770914] [<ffffffff811c743f>] ?
__split_vma.isra.31+0x1cf/0x1f0
Jan 6 04:36:34 vm1 kernel: [238186.770917] [<ffffffff81080c20>]
_do_fork+0x80/0x360
Jan 6 04:36:34 vm1 kernel: [238186.770920] [<ffffffff81080fa9>]
SyS_clone+0x19/0x20
Jan 6 04:36:34 vm1 kernel: [238186.770926] [<ffffffff8185c276>]
entry_SYSCALL_64_fastpath+0x16/0x75
Jan 6 04:36:34 vm1 kernel: [238186.770928] Mem-Info:
Jan 6 04:36:34 vm1 kernel: [238186.770937] active_anon:6659112
inactive_anon:714162 isolated_anon:0
Jan 6 04:36:34 vm1 kernel: [238186.770937] active_file:3739306
inactive_file:3764944 isolated_file:0
Jan 6 04:36:34 vm1 kernel: [238186.770937] unevictable:880 dirty:167
writeback:0 unstable:0
Jan 6 04:36:34 vm1 kernel: [238186.770937] slab_reclaimable:282429
slab_unreclaimable:30118
Jan 6 04:36:34 vm1 kernel: [238186.770937] mapped:41400 shmem:19954
pagetables:23536 bounce:0
Jan 6 04:36:34 vm1 kernel: [238186.770937] free:58861 free_pcp:0
free_cma:0
Jan 6 04:36:34 vm1 kernel: [238186.770941] Node 0 DMA free:15884kB
min:8kB low:8kB high:12kB active_anon:0kB inactive_anon:0kB
active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB
isolated(file):0kB present:15984kB managed:15900kB mlocked:0kB dirty:0kB
writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB
slab_unreclaimable:16kB kernel_stack:0kB pagetables:0kB unstable:0kB
bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB
pages_scanned:0 all_unreclaimable? yes
Jan 6 04:36:34 vm1 kernel: [238186.770948] lowmem_reserve[]: 0 2941
30107 30107 30107
Jan 6 04:36:34 vm1 kernel: [238186.770952] Node 0 DMA32 free:120148kB
min:1532kB low:1912kB high:2296kB active_anon:1184692kB
inactive_anon:395524kB active_file:529956kB inactive_file:540236kB
unevictable:576kB isolated(anon):0kB isolated(file):0kB
present:3120448kB managed:3039560kB mlocked:576kB dirty:4kB
writeback:0kB mapped:7204kB shmem:4292kB slab_reclaimable:225644kB
slab_unreclaimable:11816kB kernel_stack:3120kB pagetables:3472kB
unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
Jan 6 04:36:34 vm1 kernel: [238186.770958] lowmem_reserve[]: 0 0 27166
27166 27166
Jan 6 04:36:34 vm1 kernel: [238186.770961] Node 0 Normal free:46660kB
min:14148kB low:17684kB high:21220kB active_anon:13025448kB
inactive_anon:1197896kB active_file:6324252kB inactive_file:6378116kB
unevictable:2904kB isolated(anon):0kB isolated(file):0kB
present:28311552kB managed:27818072kB mlocked:2904kB dirty:136kB
writeback:0kB mapped:63884kB shmem:39584kB slab_reclaimable:394676kB
slab_unreclaimable:49392kB kernel_stack:22608kB pagetables:43232kB
unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
Jan 6 04:36:34 vm1 kernel: [238186.770967] lowmem_reserve[]: 0 0 0 0 0
Jan 6 04:36:34 vm1 kernel: [238186.770971] Node 1 Normal free:52752kB
min:15748kB low:19684kB high:23620kB active_anon:12426308kB
inactive_anon:1263228kB active_file:8103016kB inactive_file:8141424kB
unevictable:40kB isolated(anon):0kB isolated(file):0kB
present:31457280kB managed:30963564kB mlocked:40kB dirty:528kB
writeback:0kB mapped:94512kB shmem:35940kB slab_reclaimable:509396kB
slab_unreclaimable:59248kB kernel_stack:18400kB pagetables:47440kB
unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
Jan 6 04:36:34 vm1 kernel: [238186.770976] lowmem_reserve[]: 0 0 0 0 0
Jan 6 04:36:34 vm1 kernel: [238186.770979] Node 0 DMA: 1*4kB (U) 1*8kB
(U) 0*16kB 0*32kB 2*64kB (U) 1*128kB (U) 1*256kB (U) 0*512kB 1*1024kB
(U) 1*2048kB (M) 3*4096kB (M) = 15884kB
Jan 6 04:36:34 vm1 kernel: [238186.770991] Node 0 DMA32: 11568*4kB
(UMEH) 9135*8kB (UMEH) 13*16kB (H) 11*32kB (H) 5*64kB (H) 0*128kB
0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 120232kB
Jan 6 04:36:34 vm1 kernel: [238186.771001] Node 0 Normal: 11978*4kB
(UME) 43*8kB (UME) 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB
0*2048kB 0*4096kB = 48256kB
Jan 6 04:36:34 vm1 kernel: [238186.771010] Node 1 Normal: 5833*4kB
(UMEH) 3694*8kB (UMEH) 6*16kB (H) 4*32kB (H) 5*64kB (H) 1*128kB (H)
1*256kB (H) 2*512kB (H) 0*1024kB 0*2048kB 0*4096kB = 54836kB
Jan 6 04:36:34 vm1 kernel: [238186.771022] Node 0 hugepages_total=0
hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB
Jan 6 04:36:34 vm1 kernel: [238186.771024] Node 0 hugepages_total=0
hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
Jan 6 04:36:34 vm1 kernel: [238186.771025] Node 1 hugepages_total=0
hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB
Jan 6 04:36:34 vm1 kernel: [238186.771027] Node 1 hugepages_total=0
hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
Jan 6 04:36:34 vm1 kernel: [238186.771028] 7524772 total pagecache pages
Jan 6 04:36:34 vm1 kernel: [238186.771030] 4 pages in swap cache
Jan 6 04:36:34 vm1 kernel: [238186.771032] Swap cache stats: add 16,
delete 12, find 0/0
Jan 6 04:36:34 vm1 kernel: [238186.771033] Free swap = 7333500kB
Jan 6 04:36:34 vm1 kernel: [238186.771034] Total swap = 7333564kB
Jan 6 04:36:34 vm1 kernel: [238186.771036] 15726316 pages RAM
Jan 6 04:36:34 vm1 kernel: [238186.771037] 0 pages HighMem/MovableOnly
Jan 6 04:36:34 vm1 kernel: [238186.771038] 267042 pages reserved
Jan 6 04:36:34 vm1 kernel: [238186.771039] 0 pages cma reserved
Jan 6 04:36:34 vm1 kernel: [238186.771040] 0 pages hwpoisoned
Jan 6 04:36:34 vm1 kernel: [238186.771041] [ pid ] uid tgid total_vm
rss nr_ptes nr_pmds swapents oom_score_adj name
Jan 6 04:36:34 vm1 kernel: [238186.771050] [ 504] 0 504 12338
5805 29 3 0 0 systemd-journal
Jan 6 04:36:34 vm1 kernel: [238186.771053] [ 512] 0 512 10415
877 23 3 0 -1000 systemd-udevd
Jan 6 04:36:34 vm1 kernel: [238186.771056] [ 987] 0 987 3378
613 12 3 0 0 mdadm
Jan 6 04:36:34 vm1 kernel: [238186.771059] [ 1546] 111 1546 25011
588 21 3 0 0 systemd-timesyn
Jan 6 04:36:34 vm1 kernel: [238186.771062] [ 1817] 0 1817 9270
662 23 3 0 0 rpcbind
Jan 6 04:36:34 vm1 kernel: [238186.771065] [ 1831] 0 1831 1272
392 8 3 0 0 iscsid
Jan 6 04:36:34 vm1 kernel: [238186.771067] [ 1832] 0 1832 1397
876 8 3 0 -17 iscsid
Jan 6 04:36:34 vm1 kernel: [238186.771070] [ 1840] 102 1840 9320
724 23 3 0 0 rpc.statd
Jan 6 04:36:34 vm1 kernel: [238186.771073] [ 1854] 0 1854 5839
49 16 3 0 0 rpc.idmapd
Jan 6 04:36:34 vm1 kernel: [238186.771075] [ 1916] 115 1916 1840
435 8 3 0 0 vnstatd
Jan 6 04:36:34 vm1 kernel: [238186.771078] [ 1917] 0 1917 6286
1042 18 3 0 0 smartd
Jan 6 04:36:34 vm1 kernel: [238186.771080] [ 1919] 0 1919 3790
481 13 3 0 0 cgmanager
Jan 6 04:36:34 vm1 kernel: [238186.771083] [ 1923] 0 1923 13796
1291 30 3 0 -1000 sshd
Jan 6 04:36:34 vm1 kernel: [238186.771085] [ 1924] 0 1924 4756
444 16 3 0 0 atd
Jan 6 04:36:34 vm1 kernel: [238186.771088] [ 1928] 0 1928 58730
506 18 3 0 0 lxcfs
Jan 6 04:36:34 vm1 kernel: [238186.771090] [ 1933] 0 1933 1022
174 7 3 0 -1000 watchdog-mux
Jan 6 04:36:34 vm1 kernel: [238186.771093] [ 1935] 0 1935 64668
950 29 3 0 0 rsyslogd
Jan 6 04:36:34 vm1 kernel: [238186.771096] [ 1945] 109 1945 10562
873 24 3 0 -900 dbus-daemon
Jan 6 04:36:34 vm1 kernel: [238186.771098] [ 1960] 0 1960 5054
640 15 3 0 0 ksmtuned
Jan 6 04:36:34 vm1 kernel: [238186.771101] [ 1965] 0 1965 7088
712 19 3 0 0 systemd-logind
Jan 6 04:36:34 vm1 kernel: [238186.771104] [ 2006] 0 2006 1064
388 8 3 0 0 acpid
Jan 6 04:36:34 vm1 kernel: [238186.771106] [ 2027] 103 2027 8346
994 21 3 0 0 ntpd
Jan 6 04:36:34 vm1 kernel: [238186.771108] [ 2030] 0 2030 3180
294 10 4 0 0 mcelog
Jan 6 04:36:34 vm1 kernel: [238186.771111] [ 2037] 0 2037 132815
908 55 4 0 0 rrdcached
Jan 6 04:36:34 vm1 kernel: [238186.771114] [ 2142] 0 2142 13160
3409 31 3 0 0 munin-node
Jan 6 04:36:34 vm1 kernel: [238186.771116] [ 2157] 0 2157 181910
38674 178 4 0 0 pmxcfs
Jan 6 04:36:34 vm1 kernel: [238186.771119] [ 2223] 0 2223 9042
956 23 3 0 0 master
Jan 6 04:36:34 vm1 kernel: [238186.771121] [ 2225] 104 2225 9599
998 24 3 0 0 qmgr
Jan 6 04:36:34 vm1 kernel: [238186.771124] [ 2246] 0 2246 61342
1860 116 3 0 0 winbindd
Jan 6 04:36:34 vm1 kernel: [238186.771126] [ 2247] 0 2247 49150
1342 93 3 0 0 nmbd
Jan 6 04:36:34 vm1 kernel: [238186.771129] [ 2249] 0 2249 62904
2340 122 3 0 0 winbindd
Jan 6 04:36:34 vm1 kernel: [238186.771131] [ 2263] 0 2263 70393
2993 136 3 0 0 smbd
Jan 6 04:36:34 vm1 kernel: [238186.771133] [ 2267] 0 2267 61342
1543 117 3 0 0 winbindd
Jan 6 04:36:34 vm1 kernel: [238186.771136] [ 2269] 0 2269 61342
1199 118 3 0 0 winbindd
Jan 6 04:36:34 vm1 kernel: [238186.771138] [ 2270] 0 2270 70393
1530 131 3 0 0 smbd
Jan 6 04:36:34 vm1 kernel: [238186.771141] [ 2322] 0 2322 6476
602 17 3 0 0 cron
Jan 6 04:36:34 vm1 kernel: [238186.771143] [ 2667] 0 2667 5011
698 15 3 0 0 bash
Jan 6 04:36:34 vm1 kernel: [238186.771146] [ 2729] 0 2729 99893
33778 150 4 0 0 ceph-mon
Jan 6 04:36:34 vm1 kernel: [238186.771148] [ 3495] 0 3495 63912
16854 120 3 0 0 pve-firewall
Jan 6 04:36:34 vm1 kernel: [238186.771151] [ 3516] 0 3516 63500
16626 121 3 0 0 pvestatd
Jan 6 04:36:34 vm1 kernel: [238186.771153] [ 3555] 0 3555 44711
7724 53 3 0 0 corosync
Jan 6 04:36:34 vm1 kernel: [238186.771156] [ 4092] 0 4092 5011
692 15 3 0 0 bash
Jan 6 04:36:34 vm1 kernel: [238186.771159] [ 4104] 0 4104 289927
52074 368 4 0 0 ceph-osd
Jan 6 04:36:34 vm1 kernel: [238186.771162] [ 4356] 0 4356 5011
690 15 3 0 0 bash
Jan 6 04:36:34 vm1 kernel: [238186.771164] [ 4367] 0 4367 438594
165825 659 4 0 0 ceph-osd
Jan 6 04:36:34 vm1 kernel: [238186.771167] [ 4559] 0 4559 5011
690 15 3 0 0 bash
Jan 6 04:36:34 vm1 kernel: [238186.771170] [ 4568] 0 4568 265170
59113 316 3 0 0 ceph-osd
Jan 6 04:36:34 vm1 kernel: [238186.771172] [ 4818] 0 4818 5011
690 15 3 0 0 bash
Jan 6 04:36:34 vm1 kernel: [238186.771175] [ 4827] 0 4827 276392
68121 337 4 0 0 ceph-osd
Jan 6 04:36:34 vm1 kernel: [238186.771178] [ 5066] 0 5066 5011
689 14 3 0 0 bash
Jan 6 04:36:34 vm1 kernel: [238186.771180] [ 5070] 0 5070 297925
63233 383 4 0 0 ceph-osd
Jan 6 04:36:34 vm1 kernel: [238186.771183] [ 5145] 0 5145 89494
22004 152 3 0 0 pvedaemon
Jan 6 04:36:34 vm1 kernel: [238186.771186] [ 5377] 0 5377 5011
689 15 3 0 0 bash
Jan 6 04:36:34 vm1 kernel: [238186.771188] [ 5384] 0 5384 278575
50903 343 4 0 0 ceph-osd
Jan 6 04:36:34 vm1 kernel: [238186.771191] [ 5718] 0 5718 5011
711 15 3 0 0 bash
Jan 6 04:36:34 vm1 kernel: [238186.771193] [ 5723] 0 5723 285856
52842 357 4 0 0 ceph-osd
Jan 6 04:36:34 vm1 kernel: [238186.771196] [ 5901] 0 5901 65792
18867 125 3 0 0 pve-ha-crm
Jan 6 04:36:34 vm1 kernel: [238186.771199] [ 5994] 0 5994 5011
688 16 3 0 0 bash
Jan 6 04:36:34 vm1 kernel: [238186.771201] [ 5998] 0 5998 445181
170112 677 4 0 0 ceph-osd
Jan 6 04:36:34 vm1 kernel: [238186.771204] [ 6321] 104 6321 10615
1350 25 3 0 0 tlsmgr
Jan 6 04:36:34 vm1 kernel: [238186.771207] [ 6424] 0 6424 65699
18720 123 3 0 0 pve-ha-lrm
Jan 6 04:36:34 vm1 kernel: [238186.771209] [ 7166] 0 7166 28072
1725 58 3 0 0 sshd
Jan 6 04:36:34 vm1 kernel: [238186.771212] [ 7168] 0 7168 6936
1015 19 3 0 0 systemd
Jan 6 04:36:34 vm1 kernel: [238186.771214] [ 7171] 0 7171 17662
768 38 3 0 0 (sd-pam)
Jan 6 04:36:34 vm1 kernel: [238186.771217] [ 7173] 0 7173 1066570
1359 96 7 0 0 console-kit-dae
Jan 6 04:36:34 vm1 kernel: [238186.771219] [ 7243] 0 7243 6339
1789 17 3 0 0 bash
Jan 6 04:36:34 vm1 kernel: [238186.771222] [31975] 0 31975 1364974
1068319 2553 8 0 0 kvm
Jan 6 04:36:34 vm1 kernel: [238186.771224] [32375] 0 32375 2432979
2088834 4576 12 0 0 kvm
Jan 6 04:36:34 vm1 kernel: [238186.771227] [ 655] 0 655 809393
520014 1453 6 0 0 kvm
Jan 6 04:36:34 vm1 kernel: [238186.771229] [ 949] 0 949 854762
535517 1493 6 0 0 kvm
Jan 6 04:36:34 vm1 kernel: [238186.771232] [ 3329] 0 3329 1077725
811812 1950 8 0 0 kvm
Jan 6 04:36:34 vm1 kernel: [238186.771234] [ 9616] 0 9616 3021
274 7 3 0 0 3dm2
Jan 6 04:36:34 vm1 kernel: [238186.771236] [ 9694] 0 9694 3021
274 7 3 0 0 3dm2
Jan 6 04:36:34 vm1 kernel: [238186.771239] [ 9695] 0 9695 3021
274 7 3 0 0 3dm2
Jan 6 04:36:34 vm1 kernel: [238186.771241] [28700] 0 28700 1308128
1065894 2453 8 0 0 kvm
Jan 6 04:36:34 vm1 kernel: [238186.771243] [29161] 0 29161 728046
442902 1310 6 0 0 kvm
Jan 6 04:36:34 vm1 kernel: [238186.771246] [14609] 0 14609 3604
469 12 3 0 0 agetty
Jan 6 04:36:34 vm1 kernel: [238186.771249] [26803] 33 26803 91263
22203 156 3 0 0 pveproxy
Jan 6 04:36:34 vm1 kernel: [238186.771251] [26832] 33 26832 90961
21909 154 3 0 0 spiceproxy
Jan 6 04:36:34 vm1 kernel: [238186.771254] [26833] 33 26833 91581
22842 157 3 0 0 spiceproxy work
Jan 6 04:36:34 vm1 kernel: [238186.771257] [26862] 0 26862 22988
429 14 3 0 0 pvefw-logger
Jan 6 04:36:34 vm1 kernel: [238186.771261] [16389] 0 16389 92120
24033 160 3 0 0 pvedaemon worke
Jan 6 04:36:34 vm1 kernel: [238186.771264] [22049] 0 22049 92120
24032 160 3 0 0 pvedaemon worke
Jan 6 04:36:34 vm1 kernel: [238186.771267] [25675] 0 25675 92120
24083 160 3 0 0 pvedaemon worke
Jan 6 04:36:34 vm1 kernel: [238186.771270] [27588] 33 27588 94012
24641 163 3 0 0 pveproxy worker
Jan 6 04:36:34 vm1 kernel: [238186.771273] [ 6338] 104 6338 9558
972 24 3 0 0 pickup
Jan 6 04:36:34 vm1 kernel: [238186.771276] [ 7267] 33 7267 93969
24499 163 3 0 0 pveproxy worker
Jan 6 04:36:34 vm1 kernel: [238186.771279] [11753] 0 11753 1059
168 7 3 0 0 sleep
Jan 6 04:36:34 vm1 kernel: [238186.771282] [11756] 33 11756 91850
23657 160 3 0 0 pveproxy worker
Jan 6 04:36:34 vm1 kernel: [238186.771284] [11757] 0 11757 41311
1924 41 3 0 0 rados
Jan 6 04:36:34 vm1 kernel: [238186.771286] Out of memory: Kill process
32375 (kvm) score 117 or sacrifice child
Jan 6 04:36:34 vm1 kernel: [238186.771515] Killed process 32375 (kvm)
total-vm:9731916kB, anon-rss:8335120kB, file-rss:20216kB
Jan 6 04:36:35 vm1 kernel: [238187.412710] vmbr0: port 3(tap153i0)
entered disabled state
Jan 6 04:36:35 vm1 kernel: [238187.413482] vmbr0: port 3(tap153i0)
entered disabled state
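A note on reading the process table in the dump above: total_vm and rss are
counted in 4 kB pages, which lines up with the final kill message, e.g. for the
killed kvm process 32375:

  total_vm: 2432979 pages * 4 kB = 9731916 kB  (= total-vm:9731916kB)
  rss:      2088834 pages * 4 kB = 8355336 kB  (= anon-rss:8335120kB + file-rss:20216kB)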
Short list of all OOMs so far:
vm1:
Jan 6 04:36:34 vm1 kernel: [238186.770831] rados invoked oom-killer:
gfp_mask=0x26000c0, order=2, oom_score_adj=0
Jan 6 04:36:34 vm1 kernel: [238186.770837] rados cpuset=/ mems_allowed=0-1
...
Jan 6 04:36:34 vm1 kernel: [238186.771286] Out of memory: Kill process
32375 (kvm) score 117 or sacrifice child
Jan 6 04:36:34 vm1 kernel: [238186.771515] Killed process 32375 (kvm)
total-vm:9731916kB, anon-rss:8335120kB, file-rss:20216kB
vm1:
Jan 6 06:44:29 vm1 kernel: [245861.452966] rados invoked oom-killer:
gfp_mask=0x26000c0, order=2, oom_score_adj=0
Jan 6 06:44:29 vm1 kernel: [245861.452971] rados cpuset=/ mems_allowed=0-1
...
Jan 6 06:44:29 vm1 kernel: [245861.453452] Out of memory: Kill process
31975 (kvm) score 59 or sacrifice child
Jan 6 06:44:29 vm1 kernel: [245861.453627] Killed process 31975 (kvm)
total-vm:5459896kB, anon-rss:4245424kB, file-rss:22740kB
vm4:
Jan 7 01:06:01 vm4 kernel: [312533.458674] rados invoked oom-killer:
gfp_mask=0x26000c0, order=2, oom_score_adj=0
Jan 7 01:06:01 vm4 kernel: [312533.458681] rados cpuset=/ mems_allowed=0
...
Jan 7 01:06:01 vm4 kernel: [312533.459281] Out of memory: Kill process
2469 (ceph-osd) score 33 or sacrifice child
Jan 7 01:06:01 vm4 kernel: [312533.459718] Killed process 2469
(ceph-osd) total-vm:1991344kB, anon-rss:812984kB, file-rss:19644kB
Jan 7 01:06:02 vm4 bash[2458]: /bin/bash: Zeile 1: 2469 Getötet
/usr/bin/ceph-osd -i 1 --pid-file /var/run/ceph/osd.1.pid -c
/etc/ceph/ceph.conf --cluster ceph -f
Jan 7 01:06:02 vm4 systemd[1]: ceph-osd.1.1483435007.558822720.service:
main process exited, code=exited, status=137/n/a
vm2:
Jan 10 02:04:20 vm2 kernel: [559245.365789] rados invoked oom-killer:
gfp_mask=0x26000c0, order=2, oom_score_adj=0
Jan 10 02:04:20 vm2 kernel: [559245.365794] rados cpuset=/ mems_allowed=0-1
...
Jan 10 02:04:20 vm2 kernel: [559245.366165] Out of memory: Kill process
13864 (kvm) score 54 or sacrifice child
Jan 10 02:04:20 vm2 kernel: [559245.366346] Killed process 13864 (kvm)
total-vm:5297888kB, anon-rss:4111816kB, file-rss:21584kB
vm5:
Jan 10 02:26:56 vm5 kernel: [553973.070463] rados invoked oom-killer:
gfp_mask=0x26000c0, order=2, oom_score_adj=0
Jan 10 02:26:56 vm5 kernel: [553973.070466] rados cpuset=/ mems_allowed=0
...
Jan 10 02:26:56 vm5 kernel: [553973.070732] Out of memory: Kill process
24324 (kvm) score 29 or sacrifice child
Jan 10 02:26:56 vm5 kernel: [553973.070845] Killed process 24324 (kvm)
total-vm:1998440kB, anon-rss:1090016kB, file-rss:17268kB
vm5:
Jan 10 02:35:56 vm5 kernel: [554512.290864] rados invoked oom-killer:
gfp_mask=0x26000c0, order=2, oom_score_adj=0
Jan 10 02:35:56 vm5 kernel: [554512.290868] rados cpuset=/ mems_allowed=0
...
Jan 10 02:35:56 vm5 kernel: [554512.291167] Out of memory: Kill process
3779 (ceph-osd) score 25 or sacrifice child
Jan 10 02:35:56 vm5 kernel: [554512.291485] Killed process 3779
(ceph-osd) total-vm:2039244kB, anon-rss:924820kB, file-rss:17924kB
Jan 10 02:35:56 vm5 bash[3762]: /bin/bash: Zeile 1: 3779 Getötet
/usr/bin/ceph-osd -i 3 --pid-file /var/run/ceph/osd.3.pid -c
/etc/ceph/ceph.conf --cluster ceph -f
Some config:
# ceph df
GLOBAL:
SIZE AVAIL RAW USED %RAW USED
26192G 16768G 9423G 35.98
POOLS:
NAME ID USED %USED MAX AVAIL OBJECTS
sataRBD 1 2812G 21.47 5873G 731583
ssdpool 2 1836G 14.02 1362G 475776
# rados df
pool name      KB           objects   clones   degraded   unfound   rd          rd KB         wr          wr KB
sataRBD        2948940764   731583    0        18         0         107126632   5066939656    77093760    8107947980
ssdpool        1925252908   475776    0        0          0         255856168   13891414666   507302782   4787138165
  total used   9878935752   1207359
  total avail  17585444468
  total space  27464380220
# ceph -s
cluster 2b1be149-adc0-4b6f-a2bc-80d19c613c71
health HEALTH_OK
monmap e5: 5 mons at
{0=192.168.10.3:6789/0,1=192.168.10.4:6789/0,2=192.168.10.5:6789/0,3=192.168.10.10:6789/0,4=192.168.10.200:6789/0}
election epoch 1434, quorum 0,1,2,3,4 0,1,2,3,4
osdmap e11494: 29 osds: 26 up, 25 in
pgmap v4737983: 1280 pgs, 2 pools, 4648 GB data, 1179 kobjects
9310 GB used, 16881 GB / 26192 GB avail
1280 active+clean
client io 50647 B/s rd, 604 kB/s wr, 113 op/s
# pveversion -v
proxmox-ve: 4.4-77 (running kernel: 4.4.35-1-pve)
pve-manager: 4.4-5 (running version: 4.4-5/c43015a5)
pve-kernel-4.4.35-1-pve: 4.4.35-77
lvm2: 2.02.116-pve3
corosync-pve: 2.4.0-1
libqb0: 1.0-1
pve-cluster: 4.0-48
qemu-server: 4.0-102
pve-firmware: 1.1-10
libpve-common-perl: 4.0-85
libpve-access-control: 4.0-19
libpve-storage-perl: 4.0-71
pve-libspice-server1: 0.12.8-1
vncterm: 1.2-1
pve-docs: 4.4-1
pve-qemu-kvm: 2.7.0-10
pve-container: 1.0-90
pve-firewall: 2.0-33
pve-ha-manager: 1.0-38
ksm-control-daemon: 1.2-1
glusterfs-client: 3.5.2-2+deb8u2
lxc-pve: 2.0.6-5
lxcfs: 2.0.5-pve2
criu: 1.6.0-1
novnc-pve: 0.5-8
smartmontools: 6.5+svn4324-1~pve80
ceph: 0.94.9-1~bpo80+1