[pve-devel] 3 numa topology issues

Wolfgang Bumiller w.bumiller at proxmox.com
Thu Jul 28 09:24:58 CEST 2016

On Thu, Jul 28, 2016 at 08:44:47AM +0200, Alexandre DERUMIER wrote:
> I'm looking at the OpenStack implementation
> https://specs.openstack.org/openstack/nova-specs/specs/juno/implemented/virt-driver-numa-placement.html
> and it seems that they check whether the host NUMA nodes exist too
> "hw:numa_nodes=NN - numa of NUMA nodes to expose to the guest.
> The most common case will be that the admin only sets ‘hw:numa_nodes’ and then the flavor vCPUs and RAM will be divided equally across the NUMA nodes.
> "
> This is what we are doing with numa:1 (we use sockets to know how many NUMA nodes we need).
> " So, given an example config:
> vcpus=8
> mem=4
> hw:numa_nodes=2 - numa of NUMA nodes to expose to the guest.
> hw:numa_cpus.0=0,1,2,3,4,5
> hw:numa_cpus.1=6,7
> hw:numa_mem.0=3072
> hw:numa_mem.1=1024
> The scheduler will look for a host with 2 NUMA nodes with the ability to run 6 CPUs + 3 GB of RAM on one node, and 2 CPUS + 1 GB of RAM on another node. If a host has a single NUMA node with capability to run 8 CPUs and 4 GB of RAM it will not be considered a valid match.
> "
> So, if the host doesn't have enough NUMA nodes, it's invalid.

This is the equivalent of a custom topology. There it's perfectly fine
to throw an error, and that's a different `die` statement from the one
I want to remove in our code.
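
To illustrate the custom-topology case, here is a minimal sketch in Python
(not the Perl used in qemu-server) of a host-node existence check. The
function name `check_host_nodes` and the `sysfs_root` parameter are
hypothetical; only the sysfs path and the error text are taken from the
patch below.

```python
import os


def check_host_nodes(host_nodes, sysfs_root="/sys/devices/system/node"):
    """Raise if a host NUMA node named by a custom topology is missing.

    host_nodes: iterable of host node indices requested by the guest config
                (assumed input format, for illustration only).
    sysfs_root: directory holding the nodeN entries; a parameter here only
                so the check can be exercised against a fake tree.
    """
    for n in host_nodes:
        # Same test as the Perl `-d "/sys/devices/system/node/node$i/"`.
        if not os.path.isdir(os.path.join(sysfs_root, f"node{n}")):
            raise RuntimeError(f"host NUMA node{n} doesn't exist")
```

When the guest explicitly asks for specific host nodes, failing like this
is reasonable, since the request cannot be satisfied as written.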

In our configuration `numa` is just a boolean, not a count like in the
above example, so IMO if no topology is defined but numa is enabled we
should just let qemu do its thing, which is the behavior we used to have
before hugepages were introduced.
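
As a rough sketch of that default case, the following Python (again, not
the actual Perl) divides memory and CPUs equally across the virtual
sockets without consulting the host at all. The function name
`default_numa_layout` and the returned dict layout are assumptions for
illustration; the arithmetic mirrors `$numa_memory`, `$cpustart` and
`$cpuend` in the patch below.

```python
def default_numa_layout(sockets, cores, static_memory):
    """Split static memory and vCPUs evenly, one guest NUMA node per socket.

    No host NUMA node lookup happens here: the guest may have more
    virtual sockets than the host has NUMA nodes.
    """
    numa_memory = static_memory / sockets
    nodes = []
    for i in range(sockets):
        cpustart = cores * i
        cpus = str(cpustart)
        # A range is only needed when a socket has more than one core.
        if cores > 1:
            cpus += f"-{cpustart + cores - 1}"
        nodes.append({"nodeid": i, "cpus": cpus, "memory": numa_memory})
    return nodes
```

For example, `default_numa_layout(2, 4, 4096)` describes two guest nodes
with CPUs `0-3` and `4-7` and 2048 MB each, regardless of how many NUMA
nodes the host has.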

So in order to restore the old behavior I'd like to apply the following
patch. Note that the very same check still exists in the `numaX` entry
loop further up in the code.

From da9b76607c5dbb12477976117c6f91cbc127f992 Mon Sep 17 00:00:00 2001
From: Wolfgang Bumiller <w.bumiller at proxmox.com>
Date: Wed, 27 Jul 2016 09:05:57 +0200
Subject: [PATCH qemu-server] memory: don't restrict sockets to the number of
 host numa nodes

Removes an error for when there is no custom numa topology
defined and there are more virtual sockets defined than
host numa nodes available.
---
 PVE/QemuServer/Memory.pm | 2 --
 1 file changed, 2 deletions(-)

diff --git a/PVE/QemuServer/Memory.pm b/PVE/QemuServer/Memory.pm
index 047ddad..fec447a 100644
--- a/PVE/QemuServer/Memory.pm
+++ b/PVE/QemuServer/Memory.pm
@@ -263,8 +263,6 @@ sub config {
 	    my $numa_memory = ($static_memory / $sockets);
 	    for (my $i = 0; $i < $sockets; $i++)  {
-		die "host NUMA node$i doesn't exist\n" if ! -d "/sys/devices/system/node/node$i/";
 		my $cpustart = ($cores * $i);
 		my $cpuend = ($cpustart + $cores - 1) if $cores && $cores > 1;
 		my $cpus = $cpustart;
