[pve-devel] [PATCH] add numa options

Alexandre DERUMIER aderumier at odiso.com
Tue Jan 6 22:47:56 CET 2015


>>Since mssql runs inside the VM, the DBA showed me that mssql can see the numa
>>nodes, and mssql has its own way of distributing its processes across the numa
>>nodes to get better performance. That is why I think it would be better to
>>have an option in the PVE GUI to enable or disable cpu pinning for each VM,
>>and obviously I would like to run some tests to compare which of the two
>>options is better.

The mssql process will see the virtual numa nodes of the VM, so indeed it will manage virtual memory and virtual cpus.

But that doesn't mean that, physically, the virtual cpus and numa nodes will be mapped to the correct physical cpus.

For this, you need either:

- manual pinning
or
- automatic numa balancing


Maybe read this presentation, page 45:
http://www.linux-kvm.org/wiki/images/7/75/01x07b-NumaAutobalancing.pdf


>>Maybe it would be better to run one more test with autonuma disabled in the 3.10 kernel.
>>Question:
>>How can I disable autonuma in the /etc/default/grub file?

I don't know; maybe put a simple "echo 0 >..." in rc.local.

(But I'm pretty sure you'll get lower performance without pinning or without numa balancing)
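
For the rc.local idea, a minimal sketch could look like the following. It assumes the knob is /proc/sys/kernel/numa_balancing (the path quoted further down in this thread); the function name disable_numa_balancing is only an illustration and is not something shipped with Proxmox.

```shell
#!/bin/sh
# Sketch only: write 0 to the auto numa balancing knob, if it exists.
# The function name is hypothetical; the default path is the one quoted
# elsewhere in this thread and is assumed to exist on the 3.10 host kernel.

disable_numa_balancing() {
    knob="${1:-/proc/sys/kernel/numa_balancing}"
    if [ ! -e "$knob" ]; then
        echo "numa balancing knob not found: $knob" >&2
        return 1
    fi
    # Writing to the real knob needs root.
    echo 0 > "$knob"
}

# In rc.local one would simply call:
#   disable_numa_balancing
```

Note that this only takes effect once rc.local runs, so it is not the same thing as a kernel command line option set in /etc/default/grub.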



----- Original Message -----
From: "Cesar Peschiera" <brain at click.com.py>
To: "aderumier" <aderumier at odiso.com>
Cc: "dietmar" <dietmar at proxmox.com>, "pve-devel" <pve-devel at pve.proxmox.com>
Sent: Tuesday, January 6, 2015 20:52:50
Subject: Re: [pve-devel] [PATCH] add numa options

>(Note that I don't see how mssql can pin vcpus on real host cpu). 
Since mssql runs inside the VM, the DBA showed me that mssql can see the numa
nodes, and mssql has its own way of distributing its processes across the numa
nodes to get better performance. That is why I think it would be better to
have an option in the PVE GUI to enable or disable cpu pinning for each VM,
and obviously I would like to run some tests to compare which of the two
options is better.

>host kernel 3.10 autonuma is doing autopinning, so you can try to disable 
>it. 
If autonuma can't be configured per VM, I guess it would be better to leave it
as is, but I am not sure, because we would then have two systems doing the
balancing: the 3.10 kernel and mssql inside the VM...?

Maybe it would be better to run one more test with autonuma disabled in the 3.10 kernel.
Question:
How can I disable autonuma in the /etc/default/grub file?

Note:
The tests I ran in the past were done with and without your patches, always
with the 3.10 kernel (without changing its configuration). With your patches,
performance was much better: two to three times faster in several tests, and
never less than two times, in database terms.


----- Original Message ----- 
From: "Alexandre DERUMIER" <aderumier at odiso.com> 
To: "Cesar Peschiera" <brain at click.com.py> 
Cc: "dietmar" <dietmar at proxmox.com>; "pve-devel" <pve-devel at pve.proxmox.com> 
Sent: Tuesday, January 06, 2015 2:31 PM 
Subject: Re: [pve-devel] [PATCH] add numa options 


>>Please excuse me if I didn't express myself properly. I meant the cpu pinning
>>that pve-manager and QEMU will have in the next release. That is, I would like
>>to have an option in the PVE GUI to enable or disable the cpu pinning that QEMU
>>can apply to each VM. That way I can choose whether QEMU or the application
>>inside the VM manages cpu pinning with the numa nodes. The DBA says that MS-SQL
>>Server will manage cpu pinning better than QEMU, and I would like to run some
>>tests to confirm it.

Oh, ok.

So numa:1 should do the trick: it creates numa nodes but doesn't pin cpus.

(Note that I don't see how mssql can pin vcpus on real host cpus).

Host kernel 3.10 autonuma is doing autopinning, so you can try to disable it.



About qemu-server in pve-no-subscription, I don't know whether Dietmar plans to
release it before the next Proxmox release,
because big changes are coming in this package this week.



----- Original Message -----
From: "Cesar Peschiera" <brain at click.com.py>
To: "aderumier" <aderumier at odiso.com>
Cc: "dietmar" <dietmar at proxmox.com>, "pve-devel" <pve-devel at pve.proxmox.com>
Sent: Tuesday, January 6, 2015 17:33:42
Subject: Re: [pve-devel] [PATCH] add numa options

Hi Alexandre 

Please excuse me if I didn't express myself properly. I meant the cpu pinning
that pve-manager and QEMU will have in the next release. That is, I would like
to have an option in the PVE GUI to enable or disable the cpu pinning that QEMU
can apply to each VM. That way I can choose whether QEMU or the application
inside the VM manages cpu pinning with the numa nodes. The DBA says that MS-SQL
Server will manage cpu pinning better than QEMU, and I would like to run some
tests to confirm it.

Moreover, as I have 2 servers with identical hardware, on which this single VM
runs, I would also like to keep live migration available.

>I'm interested to see results between both method 
I will gladly report the results.

Moreover, about downloading the qemu-server deb from git: as this server will
go into production very soon, I would prefer to wait until this package is in
the "pve-no-subscription" repository before upgrading. That way I run less risk
of downtime, unless you tell me you have already tested it and it is very
stable.


----- Original Message ----- 
From: "Alexandre DERUMIER" <aderumier at odiso.com> 
To: "Cesar Peschiera" <brain at click.com.py> 
Cc: "dietmar" <dietmar at proxmox.com>; "pve-devel" <pve-devel at pve.proxmox.com> 
Sent: Tuesday, January 06, 2015 5:02 AM 
Subject: Re: [pve-devel] [PATCH] add numa options 


Hi, 

>>As I am running a VM with MS-SQL Server (with 246 GB RAM exclusively for
>>MS-SQL Server), the DBA says that MS-SQL Server can manage its own
>>numa-processes better than QEMU. As I guess there are many other applications
>>that will also manage their own numa-processes better than QEMU, I would like
>>to request that the PVE GUI get an option to enable or disable the automatic
>>administration of the numa-processes, while keeping the possibility of live
>>migration.

I'm not sure I understand what you mean by
"says that MS-SQL Server can manage his own numa-processes better than
QEMU,"


Numa is not a process; it's an architecture that groups cpus with memory
banks, for fast memory access.


There are 2 parts:

1) Currently, qemu exposes the virtual numa nodes to the guest.
(each numa node = X cores with X memory)

This can be simply enabled with numa:1 with the latest patches
(it'll create 1 numa node per virtual socket, and split the ram amount between
the nodes).
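
To make the numa:1 layout concrete, here is a rough sketch that prints the QEMU arguments in the form quoted from the December message further down this thread. The helper name gen_numa_args is hypothetical (it is not part of qemu-server), and it assumes cpus are numbered socket by socket and that the ram divides evenly between the nodes.

```shell
#!/bin/sh
# Sketch only: print the QEMU numa arguments that a numa:1 setting is
# described as producing (one node per virtual socket, ram split evenly).
# gen_numa_args is a hypothetical helper, not part of qemu-server.

gen_numa_args() {
    sockets="$1"   # number of virtual sockets
    cores="$2"     # cores per socket
    mem_mb="$3"    # total guest ram in MB
    per_node=$((mem_mb / sockets))   # assumes ram divides evenly
    node=0
    while [ "$node" -lt "$sockets" ]; do
        first=$((node * cores))
        last=$((first + cores - 1))
        if [ "$first" -eq "$last" ]; then
            cpus="$first"
        else
            cpus="$first-$last"
        fi
        printf '%s\n' "-object memory-backend-ram,size=${per_node}M,id=ram-node${node}"
        printf '%s\n' "-numa node,nodeid=${node},cpus=${cpus},memdev=ram-node${node}"
        node=$((node + 1))
    done
}

# 2 sockets, 2 cores per socket, 2048 MB ram
gen_numa_args 2 2 2048
```

With 2 sockets, 2 cores and 2048 MB, this prints the same four arguments as the 2-socket example in the quoted message.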


Or, if you want to customize memory access and cores per node, or bind specific
virtual numa nodes to specific host numa nodes,
you can do it with
numa0: ....,
numa1:
"cpus=<id[-id]>,memory=<mb>[[,hostnodes=<id[-id]>][,policy=<preferred|bind|interleave>]]"
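
As an illustration of how one such numaN value could map to the QEMU arguments shown in the quoted December message, here is a small sketch. The function numa_spec_to_args is hypothetical and is not the qemu-server implementation; it only handles the four keys listed above and does no range validation.

```shell
#!/bin/sh
# Sketch only: turn one "cpus=...,memory=...[,hostnodes=...][,policy=...]"
# value into QEMU -object / -numa arguments.
# numa_spec_to_args is a hypothetical helper, not qemu-server code.

numa_spec_to_args() {
    nodeid="$1"
    spec="$2"
    cpus=""; memory=""; hostnodes=""; policy=""
    old_ifs="$IFS"
    IFS=','
    for kv in $spec; do
        case "$kv" in
            cpus=*)      cpus="${kv#cpus=}" ;;
            memory=*)    memory="${kv#memory=}" ;;
            hostnodes=*) hostnodes="${kv#hostnodes=}" ;;
            policy=*)    policy="${kv#policy=}" ;;
            *) IFS="$old_ifs"; echo "unknown option: $kv" >&2; return 1 ;;
        esac
    done
    IFS="$old_ifs"
    if [ -z "$cpus" ] || [ -z "$memory" ]; then
        echo "cpus and memory are required" >&2
        return 1
    fi
    obj="memory-backend-ram,size=${memory}M,id=ram-node${nodeid}"
    # hostnodes and policy are optional; append them only when given.
    if [ -n "$hostnodes" ]; then obj="${obj},host-nodes=${hostnodes}"; fi
    if [ -n "$policy" ]; then obj="${obj},policy=${policy}"; fi
    printf '%s\n' "-object ${obj}"
    printf '%s\n' "-numa node,nodeid=${nodeid},cpus=${cpus},memdev=ram-node${nodeid}"
}

numa_spec_to_args 1 'cpus=2-3,memory=1024,hostnodes=1,policy=bind'
```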


But it is always the application inside the guest that manages memory
access.


2) Now with kernel 3.10, we also have auto numa balancing on the host side.
It'll try to map, if possible, the virtual numa nodes to host numa nodes.

You can disable this feature with "echo 0 > /proc/sys/kernel/numa_balancing"


So from my point of view, numa:1 + auto numa balancing should already give you
good results,
and it allows live migration between hosts with different numa architectures.


Maybe with only 1 VM, you can try to manually map virtual nodes to specific
host nodes.
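
Before mapping by hand it helps to know which numa nodes the host actually has. A minimal sketch, assuming the usual Linux sysfs layout under /sys/devices/system/node (the function name list_host_numa_nodes is just an illustration):

```shell
#!/bin/sh
# Sketch only: list host numa nodes and the cpus that belong to each,
# read from sysfs. list_host_numa_nodes is a hypothetical helper; the
# optional argument lets you point it at another directory for a dry run.

list_host_numa_nodes() {
    root="${1:-/sys/devices/system/node}"
    for dir in "$root"/node[0-9]*; do
        # Skip the unexpanded pattern when the host exposes no nodes.
        [ -d "$dir" ] || continue
        printf '%s cpus=%s\n' "${dir##*/}" "$(cat "$dir/cpulist")"
    done
}

list_host_numa_nodes
```

The node numbers printed here are the values that would go into hostnodes= for a manual mapping.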

I'm interested to see results for both methods (maybe you want the latest
qemu-server deb from git?)



I plan to add a GUI for part 1.




----- Original Message -----
From: "Cesar Peschiera" <brain at click.com.py>
To: "aderumier" <aderumier at odiso.com>, "dietmar" <dietmar at proxmox.com>
Cc: "pve-devel" <pve-devel at pve.proxmox.com>
Sent: Tuesday, January 6, 2015 06:35:15
Subject: Re: [pve-devel] [PATCH] add numa options

Hi Alexandre and developers team. 

I would like to request a feature for the next release of pve-manager:

As I am running a VM with MS-SQL Server (with 246 GB RAM exclusively for
MS-SQL Server), the DBA says that MS-SQL Server can manage its own
numa-processes better than QEMU. As I guess there are many other applications
that will also manage their own numa-processes better than QEMU, I would like
to request that the PVE GUI get an option to enable or disable the automatic
administration of the numa-processes, while keeping the possibility of live
migration.

Moreover, if you can add such a feature, I will be able to run a test with
MS-SQL Server to find out which of the two options gives better results, and
publish it (with the wait times for each case).

@Alexandre:
Moreover, with your temporary patches for managing the numa-processes, MS-SQL
Server returned results two to three times faster (which is fantastic, a great
difference). But as I have not yet finished the tests (some changes to the
hardware BIOS, HugePages managed by Windows Server, etc.), I have not yet
published a detailed summary of them. I guess I will do it soon (I depend on
third parties, and the PVE host must not lose cluster communication).

And speaking of losing cluster communication: since I enabled "I/OAT DMA
engine" in the hardware BIOS, the node has never lost cluster communication
again, but I must do some extensive testing to confirm it.

Best regards 
Cesar 

----- Original Message ----- 
From: "Alexandre DERUMIER" <aderumier at odiso.com> 
To: "Dietmar Maurer" <dietmar at proxmox.com> 
Cc: <pve-devel at pve.proxmox.com> 
Sent: Tuesday, December 02, 2014 8:17 PM 
Subject: Re: [pve-devel] [PATCH] add numa options 


> Ok, 
> 
> Finally I found the last pieces of the puzzle: 
> 
> to have autonuma balancing, we just need: 
> 
> 2 sockets, 2 cores, 2 GB ram
> 
> -object memory-backend-ram,size=1024M,id=ram-node0 
> -numa node,nodeid=0,cpus=0-1,memdev=ram-node0 
> -object memory-backend-ram,size=1024M,id=ram-node1 
> -numa node,nodeid=1,cpus=2-3,memdev=ram-node1 
> 
> Like this, the host kernel will try to balance the numa nodes.
> This command line also works if the host doesn't support numa.
> 
> 
> 
> Now, if we want to bind guest numa nodes to specific host numa nodes:
> 
> -object memory-backend-ram,size=1024M,id=ram-node0,host-nodes=0,policy=preferred
> -numa node,nodeid=0,cpus=0-1,memdev=ram-node0
> -object memory-backend-ram,size=1024M,id=ram-node1,host-nodes=1,policy=bind
> -numa node,nodeid=1,cpus=2-3,memdev=ram-node1
> 
> This requires that host-nodes=X exists on the physical host,
> and also needs the qemu-kvm --enable-numa flag.
> 
> 
> 
> So, 
> I think we could add:
> 
> numa:0|1
> 
> which generates the first config, creating 1 numa node per socket and
> splitting the ram across the nodes
> 
> 
> 
> and also, for advanced users who need manual pinning:
> 
> 
> numa0:cpus=<X-X>,memory=<mb>,hostnode=<X-X>,policy=<bind|preferred|....>
> numa1:...
> 
> 
> 
> what do you think about it ? 
> 
> 
> 
> 
> BTW, about pc-dimm hotplug, it's possible to add the numa nodeid in
> "device_add pc-dimm,node=X"
> 
> 
> ----- Original Message -----
> 
> From: "Alexandre DERUMIER" <aderumier at odiso.com>
> To: "Dietmar Maurer" <dietmar at proxmox.com>
> Cc: pve-devel at pve.proxmox.com
> Sent: Tuesday, December 2, 2014 20:25:51
> Subject: Re: [pve-devel] [PATCH] add numa options
> 
>>>shared? That looks strange to me. 
> I mean split across both nodes.
> 
> 
> I have checked libvirt a little,
> and I'm not sure, but I think that memory-backend-ram is optional for
> autonuma.
> 
> It's more about cpu pinning/memory pinning on a selected host node.
> 
> Here is an example from libvirt:
> http://www.redhat.com/archives/libvir-list/2014-July/msg00715.html 
> "qemu: pass numa node binding preferences to qemu" 
> 
> +-object memory-backend-ram,size=20M,id=ram-node0,host-nodes=3,policy=preferred \
> +-numa node,nodeid=0,cpus=0,memdev=ram-node0 \
> +-object memory-backend-ram,size=645M,id=ram-node1,host-nodes=0-7,policy=bind \
> +-numa node,nodeid=1,cpus=1-27,cpus=29,memdev=ram-node1 \
> +-object memory-backend-ram,size=23440M,id=ram-node2,\
> +host-nodes=1-2,host-nodes=5,host-nodes=7,policy=bind \
> +-numa node,nodeid=2,cpus=28,cpus=30-31,memdev=ram-node2 \
> 
> ----- Original Message -----
> 
> From: "Dietmar Maurer" <dietmar at proxmox.com>
> To: "Alexandre DERUMIER" <aderumier at odiso.com>
> Cc: pve-devel at pve.proxmox.com
> Sent: Tuesday, December 2, 2014 19:42:45
> Subject: RE: [pve-devel] [PATCH] add numa options
> 
>> "When do memory hotplug, if there is numa node, we should add the memory 
>> size to the corresponding node memory size. 
>> 
>> For now, it mainly affects the result of hmp command "info numa"." 
>> 
>> 
>> So, it seems to be done automatically.
>> Not sure which node the pc-dimm is assigned to, but maybe the free slots
>> are shared at start between the numa nodes.
> 
> shared? That looks strange to me. 
> _______________________________________________ 
> pve-devel mailing list 
> pve-devel at pve.proxmox.com 
> http://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-devel 
> 

