[pve-devel] 3 numa topology issues

Thu Jul 28 12:45:56 CEST 2016

>>In our configuration `numa` is just a boolean, not a count like in the
>>above example, so IMO if no topology is defined but numa enabled we
>>should just let qemu do its thing, which is the behavior we used to have
>>before hugepages.
>>
>>So in order to restore the old behavior I'd like to apply the following
>>patch, note that the very same check still exists in the `numaX` entry
>>loop further up in the code.

OK, I thought we already checked that before hugepages.
(I really don't know how automatic NUMA balancing on the host behaves, performance-wise, when the guest has more NUMA nodes than the host.)

But I think we still need the check when hugepages are enabled.


Something like:

- die "host NUMA node$i doesn't exist\n" if ! -d "/sys/devices/system/node/node$i/"; 
+ die "host NUMA node$i doesn't exist\n" if ! -d "/sys/devices/system/node/node$i/" && $conf->{hugepages};
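
To make the intent explicit, here is a minimal standalone sketch of the same gated check. The helper name `assert_host_numa_node` and the `$node_root` parameter are hypothetical (they are not in qemu-server); `$node_root` only exists so the check can be tried against a scratch directory instead of the real sysfs tree.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of qemu-server: dies only when hugepages
# are enabled and the matching host node directory is missing.
sub assert_host_numa_node {
    my ($conf, $i, $node_root) = @_;
    $node_root //= "/sys/devices/system/node";

    # Without hugepages, leave placement to qemu and the host kernel.
    return if !$conf->{hugepages};

    die "host NUMA node$i doesn't exist\n" if ! -d "$node_root/node$i/";
    return;
}

# Example: 4 sockets, hugepages not set, so the check never dies,
# whatever the host topology looks like.
my $conf = { numa => 1, sockets => 4 };
assert_host_numa_node($conf, $_) for 0 .. $conf->{sockets} - 1;
```

With `hugepages` set in `$conf`, the same loop would die on the first socket index that has no corresponding host node directory.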

----- Original Message -----
From: "Wolfgang Bumiller" <w.bumiller at proxmox.com>
To: "aderumier" <aderumier at odiso.com>
Cc: "pve-devel" <pve-devel at pve.proxmox.com>
Sent: Thursday, 28 July 2016 09:24:58
Subject: Re: [pve-devel] 3 numa topology issues

On Thu, Jul 28, 2016 at 08:44:47AM +0200, Alexandre DERUMIER wrote: 
> I'm looking at openstack implementation 
> 
> https://specs.openstack.org/openstack/nova-specs/specs/juno/implemented/virt-driver-numa-placement.html 
> 
> and it seems that they check whether the host NUMA nodes exist too
> 
> 
> "hw:numa_nodes=NN - numa of NUMA nodes to expose to the guest. 
> The most common case will be that the admin only sets ‘hw:numa_nodes’ and then the flavor vCPUs and RAM will be divided equally across the NUMA nodes. 
> " 
> 
> This is what we are doing with numa:1. (we use sockets to know how many NUMA nodes we need)
> 
> 
> " So, given an example config: 
> 
> vcpus=8 
> mem=4 
> hw:numa_nodes=2 - numa of NUMA nodes to expose to the guest. 
> hw:numa_cpus.0=0,1,2,3,4,5 
> hw:numa_cpus.1=6,7 
> hw:numa_mem.0=3072 
> hw:numa_mem.1=1024 
> The scheduler will look for a host with 2 NUMA nodes with the ability to run 6 CPUs + 3 GB of RAM on one node, and 2 CPUS + 1 GB of RAM on another node. If a host has a single NUMA node with capability to run 8 CPUs and 4 GB of RAM it will not be considered a valid match. 
> " 
> 
> So, if the host doesn't have enough NUMA nodes, it's invalid

This is the equivalent for a custom topology, there it's perfectly fine 
to throw an error, and that's a different `die` statement from the one 
I want to remove in our code, too. 

In our configuration `numa` is just a boolean, not a count like in the 
above example, so IMO if no topology is defined but numa enabled we 
should just let qemu do its thing, which is the behavior we used to have 
before hugepages. 

So in order to restore the old behavior I'd like to apply the following 
patch, note that the very same check still exists in the `numaX` entry 
loop further up in the code. 

From da9b76607c5dbb12477976117c6f91cbc127f992 Mon Sep 17 00:00:00 2001
From: Wolfgang Bumiller <w.bumiller at proxmox.com> 
Date: Wed, 27 Jul 2016 09:05:57 +0200 
Subject: [PATCH qemu-server] memory: don't restrict sockets to the number of 
host numa nodes 

Removes an error for when there is no custom numa topology 
defined and there are more virtual sockets defined than 
host numa nodes available. 
--- 
PVE/QemuServer/Memory.pm | 2 -- 
1 file changed, 2 deletions(-)

diff --git a/PVE/QemuServer/Memory.pm b/PVE/QemuServer/Memory.pm 
index 047ddad..fec447a 100644 
--- a/PVE/QemuServer/Memory.pm 
+++ b/PVE/QemuServer/Memory.pm 
@@ -263,8 +263,6 @@ sub config { 
my $numa_memory = ($static_memory / $sockets); 

for (my $i = 0; $i < $sockets; $i++) { 
- die "host NUMA node$i doesn't exist\n" if ! -d "/sys/devices/system/node/node$i/"; 
- 
my $cpustart = ($cores * $i); 
my $cpuend = ($cpustart + $cores - 1) if $cores && $cores > 1; 
my $cpus = $cpustart; 
-- 
2.1.4