[PVE-User] [OT?] OOM...

Falko Trojahn trojahn+proxmox at pluspol.info
Tue Jan 10 14:00:35 CET 2017


Hello Marco,

did you ever find out more about your OOMs?

Hello all,

I'd like to get some idea what we can do here.

Since the last PVE updates last week (no idea whether that is related), we
sometimes get OOMs during the night. We have 5 Proxmox nodes with Ceph and
KVMs: 3 nodes are servers with Supermicro boards and >=60 GB RAM; the other
two (Asus P6T6 boards with 12 GB (no KVMs) and 24 GB) exist only for the
transition from the old Proxmox 3.x cluster to the new 4.x cluster and will
be retired later if possible.

When we first noticed the OOMs, two KVM processes were killed one after
another; since then a Ceph OSD process has been involved at least twice
(see the syslog excerpts further down).

Our Munin graphs never show a memory shortage at the time of the OOMs;
there seems to be plenty of RAM available.

So why does a rados invocation trigger the OOM killer, which then kills the
process with the most memory, and how can this be prevented?
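One general diagnostic worth running here (an assumption on my part, not Proxmox-specific): the traces below all show order=2 allocation failures, i.e. the kernel needed 16 KB of physically contiguous memory. The total free-RAM numbers can therefore look healthy while higher-order free blocks are exhausted by fragmentation. /proc/buddyinfo shows the per-zone free-block counts directly:

```shell
# Column N counts free blocks of order N (block size = 4 KB * 2^N).
# Near-zero counts from column 2 (16 KB) upward in a Normal zone mean
# order=2 allocations can fail even with plenty of free memory overall,
# matching the "0*16kB 0*32kB ..." lines in the OOM reports below.
cat /proc/buddyinfo
```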

If more info about our config is needed, please ask.

Many thanks in advance
and best regards
Falko



Full output of the first OOM:


Jan  6 04:04:49 vm1 pvedaemon[714]: worker exit
Jan  6 04:36:34 vm1 kernel: [238186.770831] rados invoked oom-killer:
gfp_mask=0x26000c0, order=2, oom_score_adj=0
Jan  6 04:36:34 vm1 kernel: [238186.770837] rados cpuset=/ mems_allowed=0-1
Jan  6 04:36:34 vm1 kernel: [238186.770845] CPU: 6 PID: 11757 Comm:
rados Tainted: G           O    4.4.35-1-pve #1
Jan  6 04:36:34 vm1 kernel: [238186.770847] Hardware name: Supermicro
X8DT3/X8DT3, BIOS 2.0c    08/30/2011
Jan  6 04:36:34 vm1 kernel: [238186.770849]  0000000000000286
00000000b47f489a ffff8800b249bb50 ffffffff813f9743
Jan  6 04:36:34 vm1 kernel: [238186.770852]  ffff8800b249bd40
0000000000000000 ffff8800b249bbb8 ffffffff8120adcb
Jan  6 04:36:34 vm1 kernel: [238186.770855]  ffff880f3f69ae10
ffffea001f005880 0000000100000001 0000000000000000
Jan  6 04:36:34 vm1 kernel: [238186.770858] Call Trace:
Jan  6 04:36:34 vm1 kernel: [238186.770871]  [<ffffffff813f9743>]
dump_stack+0x63/0x90
Jan  6 04:36:34 vm1 kernel: [238186.770877]  [<ffffffff8120adcb>]
dump_header+0x67/0x1d5
Jan  6 04:36:34 vm1 kernel: [238186.770883]  [<ffffffff811925c5>]
oom_kill_process+0x205/0x3c0
Jan  6 04:36:34 vm1 kernel: [238186.770886]  [<ffffffff81192a17>]
out_of_memory+0x237/0x4a0
Jan  6 04:36:34 vm1 kernel: [238186.770891]  [<ffffffff81198d0e>]
__alloc_pages_nodemask+0xcee/0xe20
Jan  6 04:36:34 vm1 kernel: [238186.770894]  [<ffffffff81198e8b>]
alloc_kmem_pages_node+0x4b/0xd0
Jan  6 04:36:34 vm1 kernel: [238186.770901]  [<ffffffff8107f053>]
copy_process+0x1c3/0x1c00
Jan  6 04:36:34 vm1 kernel: [238186.770907]  [<ffffffff8119fa37>] ?
lru_cache_add_active_or_unevictable+0x27/0xa0
Jan  6 04:36:34 vm1 kernel: [238186.770910]  [<ffffffff811c24c9>] ?
handle_mm_fault+0xdb9/0x19c0
Jan  6 04:36:34 vm1 kernel: [238186.770914]  [<ffffffff811c743f>] ?
__split_vma.isra.31+0x1cf/0x1f0
Jan  6 04:36:34 vm1 kernel: [238186.770917]  [<ffffffff81080c20>]
_do_fork+0x80/0x360
Jan  6 04:36:34 vm1 kernel: [238186.770920]  [<ffffffff81080fa9>]
SyS_clone+0x19/0x20
Jan  6 04:36:34 vm1 kernel: [238186.770926]  [<ffffffff8185c276>]
entry_SYSCALL_64_fastpath+0x16/0x75
Jan  6 04:36:34 vm1 kernel: [238186.770928] Mem-Info:
Jan  6 04:36:34 vm1 kernel: [238186.770937] active_anon:6659112
inactive_anon:714162 isolated_anon:0
Jan  6 04:36:34 vm1 kernel: [238186.770937]  active_file:3739306
inactive_file:3764944 isolated_file:0
Jan  6 04:36:34 vm1 kernel: [238186.770937]  unevictable:880 dirty:167
writeback:0 unstable:0
Jan  6 04:36:34 vm1 kernel: [238186.770937]  slab_reclaimable:282429
slab_unreclaimable:30118
Jan  6 04:36:34 vm1 kernel: [238186.770937]  mapped:41400 shmem:19954
pagetables:23536 bounce:0
Jan  6 04:36:34 vm1 kernel: [238186.770937]  free:58861 free_pcp:0
free_cma:0
Jan  6 04:36:34 vm1 kernel: [238186.770941] Node 0 DMA free:15884kB
min:8kB low:8kB high:12kB active_anon:0kB inactive_anon:0kB
active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB
isolated(file):0kB present:15984kB managed:15900kB mlocked:0kB dirty:0kB
writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB
slab_unreclaimable:16kB kernel_stack:0kB pagetables:0kB unstable:0kB
bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB
pages_scanned:0 all_unreclaimable? yes
Jan  6 04:36:34 vm1 kernel: [238186.770948] lowmem_reserve[]: 0 2941
30107 30107 30107
Jan  6 04:36:34 vm1 kernel: [238186.770952] Node 0 DMA32 free:120148kB
min:1532kB low:1912kB high:2296kB active_anon:1184692kB
inactive_anon:395524kB active_file:529956kB inactive_file:540236kB
unevictable:576kB isolated(anon):0kB isolated(file):0kB
present:3120448kB managed:3039560kB mlocked:576kB dirty:4kB
writeback:0kB mapped:7204kB shmem:4292kB slab_reclaimable:225644kB
slab_unreclaimable:11816kB kernel_stack:3120kB pagetables:3472kB
unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
Jan  6 04:36:34 vm1 kernel: [238186.770958] lowmem_reserve[]: 0 0 27166
27166 27166
Jan  6 04:36:34 vm1 kernel: [238186.770961] Node 0 Normal free:46660kB
min:14148kB low:17684kB high:21220kB active_anon:13025448kB
inactive_anon:1197896kB active_file:6324252kB inactive_file:6378116kB
unevictable:2904kB isolated(anon):0kB isolated(file):0kB
present:28311552kB managed:27818072kB mlocked:2904kB dirty:136kB
writeback:0kB mapped:63884kB shmem:39584kB slab_reclaimable:394676kB
slab_unreclaimable:49392kB kernel_stack:22608kB pagetables:43232kB
unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
Jan  6 04:36:34 vm1 kernel: [238186.770967] lowmem_reserve[]: 0 0 0 0 0
Jan  6 04:36:34 vm1 kernel: [238186.770971] Node 1 Normal free:52752kB
min:15748kB low:19684kB high:23620kB active_anon:12426308kB
inactive_anon:1263228kB active_file:8103016kB inactive_file:8141424kB
unevictable:40kB isolated(anon):0kB isolated(file):0kB
present:31457280kB managed:30963564kB mlocked:40kB dirty:528kB
writeback:0kB mapped:94512kB shmem:35940kB slab_reclaimable:509396kB
slab_unreclaimable:59248kB kernel_stack:18400kB pagetables:47440kB
unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
Jan  6 04:36:34 vm1 kernel: [238186.770976] lowmem_reserve[]: 0 0 0 0 0
Jan  6 04:36:34 vm1 kernel: [238186.770979] Node 0 DMA: 1*4kB (U) 1*8kB
(U) 0*16kB 0*32kB 2*64kB (U) 1*128kB (U) 1*256kB (U) 0*512kB 1*1024kB
(U) 1*2048kB (M) 3*4096kB (M) = 15884kB
Jan  6 04:36:34 vm1 kernel: [238186.770991] Node 0 DMA32: 11568*4kB
(UMEH) 9135*8kB (UMEH) 13*16kB (H) 11*32kB (H) 5*64kB (H) 0*128kB
0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 120232kB
Jan  6 04:36:34 vm1 kernel: [238186.771001] Node 0 Normal: 11978*4kB
(UME) 43*8kB (UME) 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB
0*2048kB 0*4096kB = 48256kB
Jan  6 04:36:34 vm1 kernel: [238186.771010] Node 1 Normal: 5833*4kB
(UMEH) 3694*8kB (UMEH) 6*16kB (H) 4*32kB (H) 5*64kB (H) 1*128kB (H)
1*256kB (H) 2*512kB (H) 0*1024kB 0*2048kB 0*4096kB = 54836kB
Jan  6 04:36:34 vm1 kernel: [238186.771022] Node 0 hugepages_total=0
hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB
Jan  6 04:36:34 vm1 kernel: [238186.771024] Node 0 hugepages_total=0
hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
Jan  6 04:36:34 vm1 kernel: [238186.771025] Node 1 hugepages_total=0
hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB
Jan  6 04:36:34 vm1 kernel: [238186.771027] Node 1 hugepages_total=0
hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
Jan  6 04:36:34 vm1 kernel: [238186.771028] 7524772 total pagecache pages
Jan  6 04:36:34 vm1 kernel: [238186.771030] 4 pages in swap cache
Jan  6 04:36:34 vm1 kernel: [238186.771032] Swap cache stats: add 16,
delete 12, find 0/0
Jan  6 04:36:34 vm1 kernel: [238186.771033] Free swap  = 7333500kB
Jan  6 04:36:34 vm1 kernel: [238186.771034] Total swap = 7333564kB
Jan  6 04:36:34 vm1 kernel: [238186.771036] 15726316 pages RAM
Jan  6 04:36:34 vm1 kernel: [238186.771037] 0 pages HighMem/MovableOnly
Jan  6 04:36:34 vm1 kernel: [238186.771038] 267042 pages reserved
Jan  6 04:36:34 vm1 kernel: [238186.771039] 0 pages cma reserved
Jan  6 04:36:34 vm1 kernel: [238186.771040] 0 pages hwpoisoned
Jan  6 04:36:34 vm1 kernel: [238186.771041] [ pid ]   uid  tgid total_vm
     rss nr_ptes nr_pmds swapents oom_score_adj name
Jan  6 04:36:34 vm1 kernel: [238186.771050] [  504]     0   504    12338
    5805      29       3        0             0 systemd-journal
Jan  6 04:36:34 vm1 kernel: [238186.771053] [  512]     0   512    10415
     877      23       3        0         -1000 systemd-udevd
Jan  6 04:36:34 vm1 kernel: [238186.771056] [  987]     0   987     3378
     613      12       3        0             0 mdadm
Jan  6 04:36:34 vm1 kernel: [238186.771059] [ 1546]   111  1546    25011
     588      21       3        0             0 systemd-timesyn
Jan  6 04:36:34 vm1 kernel: [238186.771062] [ 1817]     0  1817     9270
     662      23       3        0             0 rpcbind
Jan  6 04:36:34 vm1 kernel: [238186.771065] [ 1831]     0  1831     1272
     392       8       3        0             0 iscsid
Jan  6 04:36:34 vm1 kernel: [238186.771067] [ 1832]     0  1832     1397
     876       8       3        0           -17 iscsid
Jan  6 04:36:34 vm1 kernel: [238186.771070] [ 1840]   102  1840     9320
     724      23       3        0             0 rpc.statd
Jan  6 04:36:34 vm1 kernel: [238186.771073] [ 1854]     0  1854     5839
      49      16       3        0             0 rpc.idmapd
Jan  6 04:36:34 vm1 kernel: [238186.771075] [ 1916]   115  1916     1840
     435       8       3        0             0 vnstatd
Jan  6 04:36:34 vm1 kernel: [238186.771078] [ 1917]     0  1917     6286
    1042      18       3        0             0 smartd
Jan  6 04:36:34 vm1 kernel: [238186.771080] [ 1919]     0  1919     3790
     481      13       3        0             0 cgmanager
Jan  6 04:36:34 vm1 kernel: [238186.771083] [ 1923]     0  1923    13796
    1291      30       3        0         -1000 sshd
Jan  6 04:36:34 vm1 kernel: [238186.771085] [ 1924]     0  1924     4756
     444      16       3        0             0 atd
Jan  6 04:36:34 vm1 kernel: [238186.771088] [ 1928]     0  1928    58730
     506      18       3        0             0 lxcfs
Jan  6 04:36:34 vm1 kernel: [238186.771090] [ 1933]     0  1933     1022
     174       7       3        0         -1000 watchdog-mux
Jan  6 04:36:34 vm1 kernel: [238186.771093] [ 1935]     0  1935    64668
     950      29       3        0             0 rsyslogd
Jan  6 04:36:34 vm1 kernel: [238186.771096] [ 1945]   109  1945    10562
     873      24       3        0          -900 dbus-daemon
Jan  6 04:36:34 vm1 kernel: [238186.771098] [ 1960]     0  1960     5054
     640      15       3        0             0 ksmtuned
Jan  6 04:36:34 vm1 kernel: [238186.771101] [ 1965]     0  1965     7088
     712      19       3        0             0 systemd-logind
Jan  6 04:36:34 vm1 kernel: [238186.771104] [ 2006]     0  2006     1064
     388       8       3        0             0 acpid
Jan  6 04:36:34 vm1 kernel: [238186.771106] [ 2027]   103  2027     8346
     994      21       3        0             0 ntpd
Jan  6 04:36:34 vm1 kernel: [238186.771108] [ 2030]     0  2030     3180
     294      10       4        0             0 mcelog
Jan  6 04:36:34 vm1 kernel: [238186.771111] [ 2037]     0  2037   132815
     908      55       4        0             0 rrdcached
Jan  6 04:36:34 vm1 kernel: [238186.771114] [ 2142]     0  2142    13160
    3409      31       3        0             0 munin-node
Jan  6 04:36:34 vm1 kernel: [238186.771116] [ 2157]     0  2157   181910
   38674     178       4        0             0 pmxcfs
Jan  6 04:36:34 vm1 kernel: [238186.771119] [ 2223]     0  2223     9042
     956      23       3        0             0 master
Jan  6 04:36:34 vm1 kernel: [238186.771121] [ 2225]   104  2225     9599
     998      24       3        0             0 qmgr
Jan  6 04:36:34 vm1 kernel: [238186.771124] [ 2246]     0  2246    61342
    1860     116       3        0             0 winbindd
Jan  6 04:36:34 vm1 kernel: [238186.771126] [ 2247]     0  2247    49150
    1342      93       3        0             0 nmbd
Jan  6 04:36:34 vm1 kernel: [238186.771129] [ 2249]     0  2249    62904
    2340     122       3        0             0 winbindd
Jan  6 04:36:34 vm1 kernel: [238186.771131] [ 2263]     0  2263    70393
    2993     136       3        0             0 smbd
Jan  6 04:36:34 vm1 kernel: [238186.771133] [ 2267]     0  2267    61342
    1543     117       3        0             0 winbindd
Jan  6 04:36:34 vm1 kernel: [238186.771136] [ 2269]     0  2269    61342
    1199     118       3        0             0 winbindd
Jan  6 04:36:34 vm1 kernel: [238186.771138] [ 2270]     0  2270    70393
    1530     131       3        0             0 smbd
Jan  6 04:36:34 vm1 kernel: [238186.771141] [ 2322]     0  2322     6476
     602      17       3        0             0 cron
Jan  6 04:36:34 vm1 kernel: [238186.771143] [ 2667]     0  2667     5011
     698      15       3        0             0 bash
Jan  6 04:36:34 vm1 kernel: [238186.771146] [ 2729]     0  2729    99893
   33778     150       4        0             0 ceph-mon
Jan  6 04:36:34 vm1 kernel: [238186.771148] [ 3495]     0  3495    63912
   16854     120       3        0             0 pve-firewall
Jan  6 04:36:34 vm1 kernel: [238186.771151] [ 3516]     0  3516    63500
   16626     121       3        0             0 pvestatd
Jan  6 04:36:34 vm1 kernel: [238186.771153] [ 3555]     0  3555    44711
    7724      53       3        0             0 corosync
Jan  6 04:36:34 vm1 kernel: [238186.771156] [ 4092]     0  4092     5011
     692      15       3        0             0 bash
Jan  6 04:36:34 vm1 kernel: [238186.771159] [ 4104]     0  4104   289927
   52074     368       4        0             0 ceph-osd
Jan  6 04:36:34 vm1 kernel: [238186.771162] [ 4356]     0  4356     5011
     690      15       3        0             0 bash
Jan  6 04:36:34 vm1 kernel: [238186.771164] [ 4367]     0  4367   438594
  165825     659       4        0             0 ceph-osd
Jan  6 04:36:34 vm1 kernel: [238186.771167] [ 4559]     0  4559     5011
     690      15       3        0             0 bash
Jan  6 04:36:34 vm1 kernel: [238186.771170] [ 4568]     0  4568   265170
   59113     316       3        0             0 ceph-osd
Jan  6 04:36:34 vm1 kernel: [238186.771172] [ 4818]     0  4818     5011
     690      15       3        0             0 bash
Jan  6 04:36:34 vm1 kernel: [238186.771175] [ 4827]     0  4827   276392
   68121     337       4        0             0 ceph-osd
Jan  6 04:36:34 vm1 kernel: [238186.771178] [ 5066]     0  5066     5011
     689      14       3        0             0 bash
Jan  6 04:36:34 vm1 kernel: [238186.771180] [ 5070]     0  5070   297925
   63233     383       4        0             0 ceph-osd
Jan  6 04:36:34 vm1 kernel: [238186.771183] [ 5145]     0  5145    89494
   22004     152       3        0             0 pvedaemon
Jan  6 04:36:34 vm1 kernel: [238186.771186] [ 5377]     0  5377     5011
     689      15       3        0             0 bash
Jan  6 04:36:34 vm1 kernel: [238186.771188] [ 5384]     0  5384   278575
   50903     343       4        0             0 ceph-osd
Jan  6 04:36:34 vm1 kernel: [238186.771191] [ 5718]     0  5718     5011
     711      15       3        0             0 bash
Jan  6 04:36:34 vm1 kernel: [238186.771193] [ 5723]     0  5723   285856
   52842     357       4        0             0 ceph-osd
Jan  6 04:36:34 vm1 kernel: [238186.771196] [ 5901]     0  5901    65792
   18867     125       3        0             0 pve-ha-crm
Jan  6 04:36:34 vm1 kernel: [238186.771199] [ 5994]     0  5994     5011
     688      16       3        0             0 bash
Jan  6 04:36:34 vm1 kernel: [238186.771201] [ 5998]     0  5998   445181
  170112     677       4        0             0 ceph-osd
Jan  6 04:36:34 vm1 kernel: [238186.771204] [ 6321]   104  6321    10615
    1350      25       3        0             0 tlsmgr
Jan  6 04:36:34 vm1 kernel: [238186.771207] [ 6424]     0  6424    65699
   18720     123       3        0             0 pve-ha-lrm
Jan  6 04:36:34 vm1 kernel: [238186.771209] [ 7166]     0  7166    28072
    1725      58       3        0             0 sshd
Jan  6 04:36:34 vm1 kernel: [238186.771212] [ 7168]     0  7168     6936
    1015      19       3        0             0 systemd
Jan  6 04:36:34 vm1 kernel: [238186.771214] [ 7171]     0  7171    17662
     768      38       3        0             0 (sd-pam)
Jan  6 04:36:34 vm1 kernel: [238186.771217] [ 7173]     0  7173  1066570
    1359      96       7        0             0 console-kit-dae
Jan  6 04:36:34 vm1 kernel: [238186.771219] [ 7243]     0  7243     6339
    1789      17       3        0             0 bash
Jan  6 04:36:34 vm1 kernel: [238186.771222] [31975]     0 31975  1364974
 1068319    2553       8        0             0 kvm
Jan  6 04:36:34 vm1 kernel: [238186.771224] [32375]     0 32375  2432979
 2088834    4576      12        0             0 kvm
Jan  6 04:36:34 vm1 kernel: [238186.771227] [  655]     0   655   809393
  520014    1453       6        0             0 kvm
Jan  6 04:36:34 vm1 kernel: [238186.771229] [  949]     0   949   854762
  535517    1493       6        0             0 kvm
Jan  6 04:36:34 vm1 kernel: [238186.771232] [ 3329]     0  3329  1077725
  811812    1950       8        0             0 kvm
Jan  6 04:36:34 vm1 kernel: [238186.771234] [ 9616]     0  9616     3021
     274       7       3        0             0 3dm2
Jan  6 04:36:34 vm1 kernel: [238186.771236] [ 9694]     0  9694     3021
     274       7       3        0             0 3dm2
Jan  6 04:36:34 vm1 kernel: [238186.771239] [ 9695]     0  9695     3021
     274       7       3        0             0 3dm2
Jan  6 04:36:34 vm1 kernel: [238186.771241] [28700]     0 28700  1308128
 1065894    2453       8        0             0 kvm
Jan  6 04:36:34 vm1 kernel: [238186.771243] [29161]     0 29161   728046
  442902    1310       6        0             0 kvm
Jan  6 04:36:34 vm1 kernel: [238186.771246] [14609]     0 14609     3604
     469      12       3        0             0 agetty
Jan  6 04:36:34 vm1 kernel: [238186.771249] [26803]    33 26803    91263
   22203     156       3        0             0 pveproxy
Jan  6 04:36:34 vm1 kernel: [238186.771251] [26832]    33 26832    90961
   21909     154       3        0             0 spiceproxy
Jan  6 04:36:34 vm1 kernel: [238186.771254] [26833]    33 26833    91581
   22842     157       3        0             0 spiceproxy work
Jan  6 04:36:34 vm1 kernel: [238186.771257] [26862]     0 26862    22988
     429      14       3        0             0 pvefw-logger
Jan  6 04:36:34 vm1 kernel: [238186.771261] [16389]     0 16389    92120
   24033     160       3        0             0 pvedaemon worke
Jan  6 04:36:34 vm1 kernel: [238186.771264] [22049]     0 22049    92120
   24032     160       3        0             0 pvedaemon worke
Jan  6 04:36:34 vm1 kernel: [238186.771267] [25675]     0 25675    92120
   24083     160       3        0             0 pvedaemon worke
Jan  6 04:36:34 vm1 kernel: [238186.771270] [27588]    33 27588    94012
   24641     163       3        0             0 pveproxy worker
Jan  6 04:36:34 vm1 kernel: [238186.771273] [ 6338]   104  6338     9558
     972      24       3        0             0 pickup
Jan  6 04:36:34 vm1 kernel: [238186.771276] [ 7267]    33  7267    93969
   24499     163       3        0             0 pveproxy worker
Jan  6 04:36:34 vm1 kernel: [238186.771279] [11753]     0 11753     1059
     168       7       3        0             0 sleep
Jan  6 04:36:34 vm1 kernel: [238186.771282] [11756]    33 11756    91850
   23657     160       3        0             0 pveproxy worker
Jan  6 04:36:34 vm1 kernel: [238186.771284] [11757]     0 11757    41311
    1924      41       3        0             0 rados
Jan  6 04:36:34 vm1 kernel: [238186.771286] Out of memory: Kill process
32375 (kvm) score 117 or sacrifice child
Jan  6 04:36:34 vm1 kernel: [238186.771515] Killed process 32375 (kvm)
total-vm:9731916kB, anon-rss:8335120kB, file-rss:20216kB
Jan  6 04:36:35 vm1 kernel: [238187.412710] vmbr0: port 3(tap153i0)
entered disabled state
Jan  6 04:36:35 vm1 kernel: [238187.413482] vmbr0: port 3(tap153i0)
entered disabled state



Short list of all OOMs so far:



vm1:
Jan  6 04:36:34 vm1 kernel: [238186.770831] rados invoked oom-killer:
gfp_mask=0x26000c0, order=2, oom_score_adj=0
Jan  6 04:36:34 vm1 kernel: [238186.770837] rados cpuset=/ mems_allowed=0-1
...
Jan  6 04:36:34 vm1 kernel: [238186.771286] Out of memory: Kill process
32375 (kvm) score 117 or sacrifice child
Jan  6 04:36:34 vm1 kernel: [238186.771515] Killed process 32375 (kvm)
total-vm:9731916kB, anon-rss:8335120kB, file-rss:20216kB

vm1:
Jan  6 06:44:29 vm1 kernel: [245861.452966] rados invoked oom-killer:
gfp_mask=0x26000c0, order=2, oom_score_adj=0
Jan  6 06:44:29 vm1 kernel: [245861.452971] rados cpuset=/ mems_allowed=0-1
...
Jan  6 06:44:29 vm1 kernel: [245861.453452] Out of memory: Kill process
31975 (kvm) score 59 or sacrifice child
Jan  6 06:44:29 vm1 kernel: [245861.453627] Killed process 31975 (kvm)
total-vm:5459896kB, anon-rss:4245424kB, file-rss:22740kB

vm4:
Jan  7 01:06:01 vm4 kernel: [312533.458674] rados invoked oom-killer:
gfp_mask=0x26000c0, order=2, oom_score_adj=0
Jan  7 01:06:01 vm4 kernel: [312533.458681] rados cpuset=/ mems_allowed=0
...
Jan  7 01:06:01 vm4 kernel: [312533.459281] Out of memory: Kill process
2469 (ceph-osd) score 33 or sacrifice child
Jan  7 01:06:01 vm4 kernel: [312533.459718] Killed process 2469
(ceph-osd) total-vm:1991344kB, anon-rss:812984kB, file-rss:19644kB
Jan  7 01:06:02 vm4 bash[2458]: /bin/bash: Zeile 1:  2469 Getötet
         /usr/bin/ceph-osd -i 1 --pid-file /var/run/ceph/osd.1.pid -c
/etc/ceph/ceph.conf --cluster ceph -f
Jan  7 01:06:02 vm4 systemd[1]: ceph-osd.1.1483435007.558822720.service:
main process exited, code=exited, status=137/n/a

vm2:
Jan 10 02:04:20 vm2 kernel: [559245.365789] rados invoked oom-killer:
gfp_mask=0x26000c0, order=2, oom_score_adj=0
Jan 10 02:04:20 vm2 kernel: [559245.365794] rados cpuset=/ mems_allowed=0-1
...
Jan 10 02:04:20 vm2 kernel: [559245.366165] Out of memory: Kill process
13864 (kvm) score 54 or sacrifice child
Jan 10 02:04:20 vm2 kernel: [559245.366346] Killed process 13864 (kvm)
total-vm:5297888kB, anon-rss:4111816kB, file-rss:21584kB

vm5:
Jan 10 02:26:56 vm5 kernel: [553973.070463] rados invoked oom-killer:
gfp_mask=0x26000c0, order=2, oom_score_adj=0
Jan 10 02:26:56 vm5 kernel: [553973.070466] rados cpuset=/ mems_allowed=0
...
Jan 10 02:26:56 vm5 kernel: [553973.070732] Out of memory: Kill process
24324 (kvm) score 29 or sacrifice child
Jan 10 02:26:56 vm5 kernel: [553973.070845] Killed process 24324 (kvm)
total-vm:1998440kB, anon-rss:1090016kB, file-rss:17268kB

vm5:
Jan 10 02:35:56 vm5 kernel: [554512.290864] rados invoked oom-killer:
gfp_mask=0x26000c0, order=2, oom_score_adj=0
Jan 10 02:35:56 vm5 kernel: [554512.290868] rados cpuset=/ mems_allowed=0
...
Jan 10 02:35:56 vm5 kernel: [554512.291167] Out of memory: Kill process
3779 (ceph-osd) score 25 or sacrifice child
Jan 10 02:35:56 vm5 kernel: [554512.291485] Killed process 3779
(ceph-osd) total-vm:2039244kB, anon-rss:924820kB, file-rss:17924kB
Jan 10 02:35:56 vm5 bash[3762]: /bin/bash: Zeile 1:  3779 Getötet
         /usr/bin/ceph-osd -i 3 --pid-file /var/run/ceph/osd.3.pid -c
/etc/ceph/ceph.conf --cluster ceph -f
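For reference, the excerpts above were pulled from each node's logs with a grep along these lines (the log paths are an assumption; adjust for your syslog/logrotate setup):

```shell
# Extract oom-killer invocations and the resulting kill decisions
# from syslog (paths assumed; check rotated files as well).
grep -hE 'invoked oom-killer|Out of memory: Kill process|Killed process' \
    /var/log/syslog /var/log/syslog.1
```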



Some config:

# ceph df
GLOBAL:
    SIZE       AVAIL      RAW USED     %RAW USED
    26192G     16768G        9423G         35.98
POOLS:
    NAME        ID     USED      %USED     MAX AVAIL     OBJECTS
    sataRBD     1      2812G     21.47         5873G      731583
    ssdpool     2      1836G     14.02         1362G      475776

# rados df
pool name                 KB      objects       clones     degraded
unfound           rd        rd KB           wr        wr KB
sataRBD           2948940764       731583            0           18
     0    107126632   5066939656     77093760   8107947980
ssdpool           1925252908       475776            0            0
     0    255856168  13891414666    507302782   4787138165
  total used      9878935752      1207359
  total avail    17585444468
  total space    27464380220


# ceph -s
    cluster 2b1be149-adc0-4b6f-a2bc-80d19c613c71
     health HEALTH_OK
     monmap e5: 5 mons at
{0=192.168.10.3:6789/0,1=192.168.10.4:6789/0,2=192.168.10.5:6789/0,3=192.168.10.10:6789/0,4=192.168.10.200:6789/0}
            election epoch 1434, quorum 0,1,2,3,4 0,1,2,3,4
     osdmap e11494: 29 osds: 26 up, 25 in
      pgmap v4737983: 1280 pgs, 2 pools, 4648 GB data, 1179 kobjects
            9310 GB used, 16881 GB / 26192 GB avail
                1280 active+clean
  client io 50647 B/s rd, 604 kB/s wr, 113 op/s


# pveversion -v
proxmox-ve: 4.4-77 (running kernel: 4.4.35-1-pve)
pve-manager: 4.4-5 (running version: 4.4-5/c43015a5)
pve-kernel-4.4.35-1-pve: 4.4.35-77
lvm2: 2.02.116-pve3
corosync-pve: 2.4.0-1
libqb0: 1.0-1
pve-cluster: 4.0-48
qemu-server: 4.0-102
pve-firmware: 1.1-10
libpve-common-perl: 4.0-85
libpve-access-control: 4.0-19
libpve-storage-perl: 4.0-71
pve-libspice-server1: 0.12.8-1
vncterm: 1.2-1
pve-docs: 4.4-1
pve-qemu-kvm: 2.7.0-10
pve-container: 1.0-90
pve-firewall: 2.0-33
pve-ha-manager: 1.0-38
ksm-control-daemon: 1.2-1
glusterfs-client: 3.5.2-2+deb8u2
lxc-pve: 2.0.6-5
lxcfs: 2.0.5-pve2
criu: 1.6.0-1
novnc-pve: 0.5-8
smartmontools: 6.5+svn4324-1~pve80
ceph: 0.94.9-1~bpo80+1
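Not an answer, but one knob often suggested for exactly this pattern (order>=2 allocation failures while free memory looks fine) is vm.min_free_kbytes: raising it keeps a larger emergency reserve of free pages, which makes higher-order allocations less fragile under fragmentation. This is only a sketch under that assumption; the value is illustrative and untested on this cluster:

```shell
# Inspect the current free-page reserve
sysctl vm.min_free_kbytes

# Illustrative value only: persist a larger reserve (must be tuned per
# host; too high wastes RAM, too low leaves order>=2 allocations fragile).
echo 'vm.min_free_kbytes = 262144' > /etc/sysctl.d/90-min-free.conf
sysctl -p /etc/sysctl.d/90-min-free.conf
```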





