[PVE-User] A less aggressive OOM?

Mon Jul 7 23:39:34 CEST 2025

Hi,

I would start by analyzing the memory status at the time of the OOM. 
There should be some lines in the journal/syslog where the kernel writes 
what the memory looked like, so you can figure out why it had to kill a 
process.
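
To pull those lines out, a minimal Python sketch follows. It assumes the 
usual kernel wording ("... invoked oom-killer: ..." and "Out of memory: 
Killed process PID (name)"); the exact fields vary between kernel 
versions, and the function names here are just illustrative.

```python
import re
import subprocess

# Patterns follow the wording the Linux kernel normally uses for OOM events.
# Assumption: your kernel logs the same phrases; older kernels may differ.
INVOKED = re.compile(r"(?P<trigger>\S+) invoked oom-killer: (?P<details>.*)")
KILLED = re.compile(
    r"Out of memory: Killed process (?P<pid>\d+) \((?P<name>[^)]+)\)"
)


def find_oom_events(log_text):
    """Return a list of OOM-related events found in kernel log text."""
    events = []
    for line in log_text.splitlines():
        m = INVOKED.search(line)
        if m:
            # Which task asked for memory, plus gfp_mask/order details.
            events.append(("invoked", m.group("trigger"), m.group("details")))
            continue
        m = KILLED.search(line)
        if m:
            # Which process the kernel chose as the victim.
            events.append(("killed", m.group("name"), int(m.group("pid"))))
    return events


def read_kernel_log():
    """Read kernel messages from the systemd journal (needs journalctl)."""
    result = subprocess.run(
        ["journalctl", "-k", "--no-pager"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout
```

Calling find_oom_events(read_kernel_log()) on the affected node lists the 
events for the current boot. The full memory dump the kernel prints 
between the "invoked" and "Killed" lines is still worth reading by hand. 
Note that a task name containing spaces is only partially captured as the 
trigger.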

It makes little sense for the OOM killer to trigger on 64GB hosts with 
just 24GB configured in VMs and, probably, less real usage. IMHO it's 
not the VMs that fill your memory up to the point of OOM, but something 
else: some other process, the ZFS ARC, maybe even a memory leak. Some 
process may also be causing severe memory fragmentation.
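
To check the ZFS ARC suspicion, a rough Python sketch is below. It 
assumes the node runs OpenZFS on Linux, which exposes ARC counters in 
/proc/spl/kstat/zfs/arcstats, and compares the "size" counter with 
MemTotal from /proc/meminfo. The helper names are hypothetical.

```python
def parse_meminfo(text):
    """Return /proc/meminfo values in bytes, keyed by field name."""
    values = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        parts = rest.split()
        if parts and parts[0].isdigit():
            # Most fields are in kB; a few (e.g. HugePages_Total) are counts.
            scale = 1024 if len(parts) > 1 and parts[1] == "kB" else 1
            values[name.strip()] = int(parts[0]) * scale
    return values


def parse_arcstats(text):
    """Return the numeric 'name type data' rows of arcstats as a dict."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        # Data rows have exactly three columns; header rows are skipped.
        if len(parts) == 3 and parts[2].isdigit():
            stats[parts[0]] = int(parts[2])
    return stats


def arc_share(meminfo_text, arcstats_text):
    """Fraction of total RAM currently held by the ARC."""
    total = parse_meminfo(meminfo_text)["MemTotal"]
    arc_size = parse_arcstats(arcstats_text)["size"]
    return arc_size / total


def read_file(path):
    """Read a /proc file as text."""
    with open(path) as handle:
        return handle.read()
```

On a node, arc_share(read_file("/proc/meminfo"), 
read_file("/proc/spl/kstat/zfs/arcstats")) gives the current share. If 
the ARC holds a large part of RAM when the OOM happens, capping it would 
be the first thing I'd try.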

Regards,

On 7/7/25 11:26, Marco Gaiarin wrote:
> We have upgraded a set of clusters from PVE6 to PVE8, and we have found that
> in newer kernels the OOM killer is a bit more 'aggressive' and sometimes
> kills a VM.
>
> The nodes have plenty of RAM (64GB; 2-3 VMs per node, each with 8GB of RAM),
> and the VMs have the qemu agent installed and ballooning enabled, but the OOM
> killer still strikes sometimes. Clearly, if it kills the main VM that hosts
> the local DNS, we get some trouble.
>
>
> I've looked in the PVE wiki, but found nothing. Is there some way to relax
> the OOM killer, or to control its behaviour?
>
> The nodes have no swap, so probably the best thing to do (but the hardest
> one ;-) is to set up some swap with a lower swappiness, but I'm seeking
> feedback.
>
>
> Thanks.
>
--