[pve-devel] 3 numa topology issues

Wolfgang Bumiller w.bumiller at proxmox.com
Wed Jul 27 09:16:07 CEST 2016


> On July 26, 2016 at 2:18 PM Alexandre DERUMIER <aderumier at odiso.com> wrote:
> 
> 
> > >>Issue #1: The above code currently does not honor our 'hostnodes' option 
> > >>and breaks when trying to use them together. 
> 
> Also I need to check how to allocate hugepages when "hostnodes" is defined with a range like "hostnodes:0-1".
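> 
> Something like this might work for expanding such a range and checking the
> free hugepages per node via sysfs (an untested sketch; the helper name and
> the fixed 2MB page size are my assumptions):
> 
>     # expand a "0-1" style range and make sure each host node has
>     # enough free 2MB hugepages before trying to allocate from it
>     sub check_hostnode_hugepages {
>         my ($range, $pages_needed) = @_;
>         my ($start, $end) = split /-/, $range;
>         $end //= $start;
>         for my $i ($start .. $end) {
>             my $path = "/sys/devices/system/node/node$i/hugepages/hugepages-2048kB/free_hugepages";
>             open(my $fh, '<', $path) or die "no hugepage info for host node $i\n";
>             chomp(my $free = <$fh>);
>             close($fh);
>             die "not enough free 2MB hugepages on host node $i\n" if $free < $pages_needed;
>         }
>     }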
> 
> >>Useless, yes, which is why I'm wondering whether this should be 
> >>supported/warned about/error... 
> 
> I think we could force "hostnodes" to be defined.
> I don't know if a lot of people already use the numaX option, but as we never exposed it in the GUI, I don't think it would break the setup of too many people.
> 
> ----- Original Message -----
> From: "Wolfgang Bumiller" <w.bumiller at proxmox.com>
> To: "aderumier" <aderumier at odiso.com>
> Cc: "pve-devel" <pve-devel at pve.proxmox.com>
> Sent: Tuesday, July 26, 2016 13:59:42
> Subject: Re: 3 numa topology issues
> 
> On Tue, Jul 26, 2016 at 01:35:50PM +0200, Alexandre DERUMIER wrote: 
> > Hi Wolfgang, 
> > 
> > I just came back from holiday. 
> 
> Hope you had a good time :-) 
> 
> > 
> > >>Issue #1: The above code currently does not honor our 'hostnodes' option 
> > >>and breaks when trying to use them together. 
> > 
> > mmm indeed. I think this can be improved. I'll try to check that next week. 
> > 
> > >>Issue #2: We create one node per *virtual* socket, which means enabling 
> > >>hugepages with more virtual sockets than physical numa nodes will die 
> > >>with the error that the numa node doesn't exist. This should be fixable 
> > >>as far as I can tell, as nothing really prevents us from putting them on 
> > >>the same node? At least this used to work and I've already asked this 
> > >>question at some point. You said the host kernel will try to map them, 
> > >>yet it worked without issues before, so I'm still not sure about this. 
> > >>Here's the conversation snippet: 
> > 
> > You can create more virtual numa nodes than physical ones, but only if you don't define the "hostnodes" option. 
> > 
> > (From my point of view, it's totally useless, as the whole point of the numa option is to map virtual nodes to physical nodes, to avoid memory access bottlenecks.) 
> 
> Useless, yes, which is why I'm wondering whether this should be 
> supported/warned about/error... 
> 
> > 
> > If hostnodes is defined, you need to have the physical numa nodes available (a VM with 2 numa nodes needs a host with 2 numa nodes). 
> > 
> > With hugepages enabled, I have added a restriction that hostnodes must be defined, because you want to be sure that memory is on the same node. 
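> > 
> > For example, a numaX entry pinned to a host node looks something like this (values are illustrative): 
> > 
> > numa0: cpus=0-1,memory=1024,hostnodes=0,policy=bind 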
> > 
> > # hostnodes 
> > my $hostnodelists = $numa->{hostnodes}; 
> > if (defined($hostnodelists)) { 
> >     my $hostnodes; 
> >     foreach my $hostnoderange (@$hostnodelists) { 
> >         my ($start, $end) = @$hostnoderange; 
> >         $hostnodes .= ',' if $hostnodes; 
> >         $hostnodes .= $start; 
> >         $hostnodes .= "-$end" if defined($end); 
> >         $end //= $start; 
> >         for (my $i = $start; $i <= $end; ++$i) { 
> >             die "host NUMA node$i doesn't exist\n" if ! -d "/sys/devices/system/node/node$i/"; 
> >         } 
> >     } 
> > 
> >     # policy 
> >     my $policy = $numa->{policy}; 
> >     die "you need to define a policy for hostnode $hostnodes\n" if !$policy; 
> >     $mem_object .= ",host-nodes=$hostnodes,policy=$policy"; 
> > } else { 
> >     die "numa hostnodes need to be defined to use hugepages" if $conf->{hugepages}; 
> > } 
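> > 
> > For reference, with hostnodes "0-1" and policy "bind" this ends up building a memory backend roughly like the following (id and size are illustrative): 
> > 
> > -object memory-backend-ram,id=ram-node0,size=1024M,host-nodes=0-1,policy=bind 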
> > 
> > 
> > >>Issue #3: Actually just an extension to #2: we currently cannot enable 
> > >>NUMA at all (even without hugepages) when there are more virtual sockets 
> > >>than physical numa nodes, and this used to work. The big question is 
> > >>now: does this even make sense? Or should we tell users not to do this? 
> > 
> > That's strange, it should work if you don't define the hugepages and hostnodes options (in numaX). 
> 
> Actually this one was my own faulty configuration, sorry. 

Gotta take that back, here's the problem:
  sockets: 2
  numa: 1
  (no numaX defined)

will go through Memory.pm's sub config:

|    if ($conf->{numa}) {
|
|        my $numa_totalmemory = undef;
|        for (my $i = 0; $i < $MAX_NUMA; $i++) {
|            next if !$conf->{"numa$i"};
|            (...)
|        }
|
|        # if no custom topology, we split memory and cores across numa nodes
|        if (!$numa_totalmemory) {
|
|            my $numa_memory = ($static_memory / $sockets);
|
|            for (my $i = 0; $i < $sockets; $i++) {
|                die "host NUMA node$i doesn't exist\n" if ! -d "/sys/devices/system/node/node$i/";

and dies there if no numa node exists.

I believe we can simply remove this line, since qemu allows this and just
applies its default policy. Alternatively we could keep a counter and
apply host-nodes manually, starting over at 0 when we run out of nodes,
but that's no better than letting qemu do it.
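
For illustration, the counter variant would look roughly like this (untested,
reusing $sockets and $mem_object from the code above; policy=bind is just an
assumption):

|            # count the host's NUMA nodes once
|            my $host_nodes = 0;
|            $host_nodes++ while -d "/sys/devices/system/node/node$host_nodes/";
|            die "host has no NUMA nodes\n" if !$host_nodes;
|
|            for (my $i = 0; $i < $sockets; $i++) {
|                # start over at node 0 when we run out of host nodes
|                my $hostnode = $i % $host_nodes;
|                # append to this virtual node's backend object
|                $mem_object .= ",host-nodes=$hostnode,policy=bind";
|            }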
