[PVE-User] Memory Hotplug Woes
Matt Corallo
lpxdfsfs at mattcorallo.com
Sat Apr 12 19:43:49 CEST 2025
The Memory Hot Unplug section on the wiki at
https://pve.proxmox.com/wiki/Hotplug_(qemu_disk,nic,cpu,memory)#Memory_Hot_Unplug is somewhat
outdated and subtly dangerous.
It still refers to the now-removed CONFIG_MOVABLE_NODE, even though movable_node has been available
at least on Debian stable for two years now. More importantly, just blindly setting movable_node
leads to system instability if the VM has a nontrivial amount of memory.
E.g. I had a stock Debian VM running PostgreSQL with 16GB RAM and "memhp_default_state=online
movable_node" added to the kernel command line. The result was that the OOM killer got invoked
regularly when Postgres pulled lots of disk into filesystem caches all at once for large queries,
despite there being a GB or two of available memory. After I chatted with the #mm folks on OFTC,
they pointed out that, yes, this is generally expected behavior: the result is zero memory in
the Normal zone, forcing all kernel allocations aside from pagecache into the bottom 1GiB of RAM,
which can easily run out and lead to OOM kills.
In fact, movable_node's documentation even says "This means that the memory of such nodes will be
usable only for movable allocations which rules out almost all kernel allocations. Use with caution!"
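
To check whether a guest has ended up in this state, the per-zone sizes can be read from
/proc/zoneinfo. The following is only a rough sketch: the function names are made up for
illustration, and it assumes the usual "Node N, zone NAME" headers followed by an indented
"managed" line, which is how recent kernels print that file.

```python
import os
import re


def managed_pages_by_zone(zoneinfo_text):
    """Map (node, zone) -> managed page count from /proc/zoneinfo text."""
    zones = {}
    current = None
    for line in zoneinfo_text.splitlines():
        header = re.match(r"Node\s+(\d+),\s+zone\s+(\S+)", line)
        if header:
            current = (int(header.group(1)), header.group(2))
            zones[current] = 0
            continue
        managed = re.match(r"\s+managed\s+(\d+)", line)
        if managed and current is not None:
            zones[current] = int(managed.group(1))
    return zones


def print_zone_summary(path="/proc/zoneinfo"):
    """Print managed memory per zone in MiB (hypothetical helper, Linux only)."""
    page_size = os.sysconf("SC_PAGE_SIZE")
    with open(path) as f:
        zones = managed_pages_by_zone(f.read())
    for (node, zone), pages in sorted(zones.items()):
        print(f"node {node} zone {zone}: {pages * page_size // 2**20} MiB managed")
```

Calling print_zone_summary() inside the guest should show the Normal zone at (or near) 0 MiB and
almost everything in Movable if the VM is affected as described above.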
The hotplug guide at https://www.kernel.org/doc/html/latest/admin-guide/mm/memory-hotplug.html
suggests a better option, memory_hotplug.online_policy=auto-movable, which keeps a target ratio
between movable and normal zones, fixing the issue, but sadly Proxmox doesn't handle it with the
automagic hotplug. Linux (at least 6.12.21) doesn't pick the last DIMMs to make movable when
using auto-movable, but rather (sometimes?) picks ones in the middle of the range (e.g. the VM I'm
looking at is making memory zones 32-67/183 movable, with the rest Normal/DMA32). If I go into the
QEMU monitor and device_del DIMMs in the lower range, they get removed fine, however.
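
For anyone wanting to see which ranges ended up movable on their own guest, something like the
following sketch can summarize it from sysfs. It assumes the per-block valid_zones files under
/sys/devices/system/memory report the zone each online block belongs to (as the kernel hotplug
guide describes); the function names are hypothetical. Note that these are memory block indices,
which do not necessarily map one-to-one onto QEMU DIMMs.

```python
import os
import re


def block_zones(sysfs_root="/sys/devices/system/memory"):
    """Map memory block index -> contents of its valid_zones file."""
    zones = {}
    for name in os.listdir(sysfs_root):
        match = re.fullmatch(r"memory(\d+)", name)
        if not match:
            continue  # skip non-block entries such as block_size_bytes
        try:
            with open(os.path.join(sysfs_root, name, "valid_zones")) as f:
                # Offline blocks may list several candidate zones here.
                zones[int(match.group(1))] = f.read().strip()
        except OSError:
            continue  # attribute missing or unreadable
    return zones


def zone_ranges(zones):
    """Collapse consecutive block indices with the same zone into (first, last, zone)."""
    ranges = []
    for idx in sorted(zones):
        zone = zones[idx]
        if ranges and ranges[-1][2] == zone and ranges[-1][1] == idx - 1:
            ranges[-1] = (ranges[-1][0], idx, zone)
        else:
            ranges.append((idx, idx, zone))
    return ranges
```

Running zone_ranges(block_zones()) in the guest gives a short list of index ranges per zone,
which makes it easy to spot a Movable range sitting in the middle rather than at the end.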
It seems like the wiki should be updated to mention the drawbacks of `movable_node` and to mention
`online_policy`, and ideally the auto-hotunplug logic should try more than just the highest DIMM.
Thanks,
Matt