[pve-devel] 3 numa topology issues

Alexandre DERUMIER aderumier at odiso.com
Wed Jul 27 11:38:04 CEST 2016


>>I believe we can simply remove this line since qemu allows it and just
>>applies its default policy. Alternatively we can keep a counter and
>>apply host-nodes manually, starting over at 0 when we run out of nodes,
>>but that's no better than letting qemu do this.

Well, I don't know how automatic NUMA balancing works on the host when, for example, a guest defines 2 NUMA nodes and the host has only 1 NUMA node.

I'll have more time next week to run a lot of tests.
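
For such tests it helps to know how many NUMA nodes the host actually exposes. The following is only a minimal sketch, assuming the nodes show up as nodeN directories under /sys/devices/system/node (the same path the existing existence check uses). Both helper names are hypothetical and not part of Memory.pm.

```perl
use strict;
use warnings;

# Hypothetical helper (not in Memory.pm): count the nodeN directories
# below the given base path, defaulting to the sysfs NUMA node directory.
sub count_host_numa_nodes {
    my ($base) = @_;
    $base //= '/sys/devices/system/node';

    opendir(my $dh, $base) or return 0;
    # grep in scalar context returns the number of matching entries
    my $count = grep { /^node\d+$/ && -d "$base/$_" } readdir($dh);
    closedir($dh);

    return $count;
}

# Hypothetical helper: 1 if the guest defines more NUMA nodes than the
# host has, 0 otherwise.
sub guest_exceeds_host {
    my ($guest_nodes, $host_nodes) = @_;
    return $guest_nodes > $host_nodes ? 1 : 0;
}
```

With a guest defining 2 nodes on a host with 1 node, guest_exceeds_host(2, count_host_numa_nodes()) would flag exactly the case described above.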

----- Original Message -----
From: "Wolfgang Bumiller" <w.bumiller at proxmox.com>
To: "aderumier" <aderumier at odiso.com>
Cc: "pve-devel" <pve-devel at pve.proxmox.com>
Sent: Wednesday, July 27, 2016 09:16:07
Subject: Re: 3 numa topology issues

> On July 26, 2016 at 2:18 PM Alexandre DERUMIER <aderumier at odiso.com> wrote: 
> 
> 
> > >>Issue #1: The above code currently does not honor our 'hostnodes' option 
> > >>and breaks when trying to use them together. 
> 
> Also I need to check how to allocate hugepages when hostnodes is defined with a range like "hostnodes:0-1".
> 
> 
> 
> 
> 
> >>Useless, yes, which is why I'm wondering whether this should be 
> >>supported/warned about/error... 
> 
> I think we could force users to define "hostnodes".
> I don't know whether many people already use the numaX option, but since we never exposed it in the GUI, I don't think it would break many setups.
> 
> 
> 
> 
> 
> ----- Original Message -----
> From: "Wolfgang Bumiller" <w.bumiller at proxmox.com>
> To: "aderumier" <aderumier at odiso.com>
> Cc: "pve-devel" <pve-devel at pve.proxmox.com>
> Sent: Tuesday, July 26, 2016 13:59:42
> Subject: Re: 3 numa topology issues
> 
> On Tue, Jul 26, 2016 at 01:35:50PM +0200, Alexandre DERUMIER wrote: 
> > Hi Wolfgang, 
> > 
> > I just came back from holiday.
> 
> Hope you had a good time :-) 
> 
> > 
> > 
> > 
> > >>Issue #1: The above code currently does not honor our 'hostnodes' option 
> > >>and breaks when trying to use them together. 
> > 
> > mmm indeed. I think this can be improved. I'll try to check that next week. 
> > 
> > 
> > 
> > >>Issue #2: We create one node per *virtual* socket, which means enabling 
> > >>hugepages with more virtual sockets than physical numa nodes will die 
> > >>with the error that the numa node doesn't exist. This should be fixable 
> > >>as far as I can tell, as nothing really prevents us from putting them on 
> > >>the same node? At least this used to work and I've already asked this 
> > >>question at some point. You said the host kernel will try to map them, 
> > >>yet it worked without issues before, so I'm still not sure about this. 
> > >>Here's the conversation snippet: 
> > 
> > You can create more virtual NUMA nodes than physical ones only if you don't define the "hostnodes" option.
> > 
> > (From my point of view, that's totally useless, as the whole point of the numa option is to map virtual nodes to physical nodes, to avoid memory access bottlenecks.)
> 
> Useless, yes, which is why I'm wondering whether this should be 
> supported/warned about/error... 
> 
> > 
> > If hostnodes is defined, the physical NUMA nodes need to be available (a VM with 2 NUMA nodes needs a host with 2 NUMA nodes).
> > 
> > With hugepages enabled, I have added a restriction requiring hostnodes to be defined, because you want to be sure that the memory is on the same node.
> > 
> > 
> > # hostnodes
> > my $hostnodelists = $numa->{hostnodes};
> > if (defined($hostnodelists)) {
> >     my $hostnodes;
> >     foreach my $hostnoderange (@$hostnodelists) {
> >         my ($start, $end) = @$hostnoderange;
> >         $hostnodes .= ',' if $hostnodes;
> >         $hostnodes .= $start;
> >         $hostnodes .= "-$end" if defined($end);
> >         $end //= $start;
> >         for (my $i = $start; $i <= $end; ++$i ) {
> >             die "host NUMA node$i doesn't exist\n" if ! -d "/sys/devices/system/node/node$i/";
> >         }
> >     }
> > 
> >     # policy
> >     my $policy = $numa->{policy};
> >     die "you need to define a policy for hostnode $hostnodes\n" if !$policy;
> >     $mem_object .= ",host-nodes=$hostnodes,policy=$policy";
> > } else {
> >     die "numa hostnodes need to be defined to use hugepages" if $conf->{hugepages};
> > }
> > 
> > 
> > >>Issue #3: Actually just an extension to #2: we currently cannot enable 
> > >>NUMA at all (even without hugepages) when there are more virtual sockets 
> > >>than physical numa nodes, and this used to work. The big question is 
> > >>now: does this even make sense? Or should we tell users not to do this? 
> > 
> > That's strange, it should work if you don't define the hugepages and hostnodes options (in numaX).
> 
> Actually this one was my own faulty configuration, sorry. 

Gotta take that back, here's the problem: 
sockets: 2 
numa: 1 
(no numaX defined) 

will go through Memory.pm's sub config: 

| if ($conf->{numa}) {
| 
|     my $numa_totalmemory = undef;
|     for (my $i = 0; $i < $MAX_NUMA; $i++) {
|         next if !$conf->{"numa$i"};
(...)
|     }
| 
|     # if no custom topology, we split memory and cores across numa nodes
|     if (!$numa_totalmemory) {
| 
|         my $numa_memory = ($static_memory / $sockets);
| 
|         for (my $i = 0; $i < $sockets; $i++) {
|             die "host NUMA node$i doesn't exist\n" if ! -d "/sys/devices/system/node/node$i/";

and dies there if the host NUMA node with that index doesn't exist (for example, 2 sockets on a host with a single NUMA node).

I believe we can simply remove this line since qemu allows it and just 
applies its default policy. Alternatively we can keep a counter and 
apply host-nodes manually, starting over at 0 when we run out of nodes, 
but that's no better than letting qemu do this. 
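
For comparison, the counter alternative could look roughly like the sketch below. It is only an illustration, not existing Memory.pm code: the helper name assign_host_nodes and its parameters are hypothetical, and the host node count is assumed to be determined elsewhere.

```perl
use strict;
use warnings;

# Hypothetical helper: map each virtual NUMA node (one per socket) to a
# host node index, starting over at 0 when the host nodes run out.
sub assign_host_nodes {
    my ($sockets, $host_node_count) = @_;
    die "host has no NUMA nodes\n" if $host_node_count < 1;

    return [ map { $_ % $host_node_count } 0 .. $sockets - 1 ];
}

# 4 virtual sockets on a host with 2 NUMA nodes
my $map = assign_host_nodes(4, 2);
print join(',', @$map), "\n";    # prints 0,1,0,1
```

Each entry would then become the host-nodes value of the corresponding memory object, which is the part that buys nothing over qemu's default policy.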


