[pve-devel] [PATCH] add numa options

Alexandre DERUMIER aderumier at odiso.com
Tue Jan 6 09:02:28 CET 2015


Hi,

>>As I have a VM running MS-SQL Server (with 246 GB of RAM exclusively for 
>>MS-SQL Server), the DBA of MS-SQL Server says that MS-SQL Server can manage 
>>its own NUMA processes better than QEMU, and as I guess many other 
>>applications will also manage their own NUMA processes better than QEMU, 
>>I would like to request that the PVE GUI have an option to enable or disable 
>>the automatic administration of the NUMA processes, while still allowing 
>>live migration.

I'm not sure I understand what you mean by
"says that MS-SQL Server can manage its own NUMA processes better than QEMU".


NUMA nodes are not processes; NUMA is an architecture that groups CPUs with memory banks for fast memory access.


There are 2 parts:

1) Currently, QEMU exposes the virtual NUMA nodes to the guest
(each NUMA node = X cores with X memory).

This can simply be enabled with numa:1 with the latest patches
(it will create 1 NUMA node per virtual socket and split the RAM amount between the nodes).
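
For example, a minimal VM config could look like this (just an illustrative sketch; the socket/core counts and memory size are only examples):

sockets: 2
cores: 2
memory: 4096
numa: 1

This would give the guest 2 virtual NUMA nodes, each with 2 cores and 2048 MB.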


Or, if you want to customize the memory access, the cores per node, or map specific virtual NUMA nodes to specific host NUMA nodes,
you can do it with
numa0: ....,
numa1: "cpus=<id[-id]>,memory=<mb>[,hostnodes=<id[-id]>][,policy=<preferred|bind|interleave>]"
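
For example, the same 2-socket / 4096 MB guest written out manually could look like this (an illustrative sketch; the host node ids must exist on your hardware):

numa0: cpus=0-1,memory=2048,hostnodes=0,policy=bind
numa1: cpus=2-3,memory=2048,hostnodes=1,policy=bind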


But it is always the application inside the guest that manages the memory access.


2) Now, with kernel 3.10, we also have automatic NUMA balancing on the host side.
It will try, if possible, to map the virtual NUMA nodes to host NUMA nodes.

You can disable this feature with "echo 0 > /proc/sys/kernel/numa_balancing".
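
For example (this is the standard Linux procfs interface, so it applies to any host running a 3.10+ kernel):

# check whether automatic NUMA balancing is active (1 = enabled, 0 = disabled)
cat /proc/sys/kernel/numa_balancing

# disable it
echo 0 > /proc/sys/kernel/numa_balancing

# enable it again
echo 1 > /proc/sys/kernel/numa_balancing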


So, from my point of view, numa:1 + auto NUMA balancing should already give you good results,
and it allows live migration between hosts with different NUMA architectures.


Maybe, with only 1 VM, you could try to manually map the virtual nodes to specific host nodes.
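
To see what you would be mapping to, you can inspect the host topology and check where the guest memory actually lands (numactl and numastat come from the numactl package; the pid placeholder below is just an example):

# show the host NUMA nodes, their cpus and their memory
numactl --hardware

# show the per-node memory usage of a running kvm process
numastat -p <pid of the kvm process>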

I'd be interested to see the results of both methods (maybe you'd like the latest qemu-server deb from git?).



I plan to add a GUI for part 1.




----- Original Message -----
From: "Cesar Peschiera" <brain at click.com.py>
To: "aderumier" <aderumier at odiso.com>, "dietmar" <dietmar at proxmox.com>
Cc: "pve-devel" <pve-devel at pve.proxmox.com>
Sent: Tuesday, January 6, 2015 06:35:15
Subject: Re: [pve-devel] [PATCH] add numa options

Hi Alexandre and developers team. 

I would like to request a feature for the next release of pve-manager:

As I have a VM running MS-SQL Server (with 246 GB of RAM exclusively for 
MS-SQL Server), the DBA of MS-SQL Server says that MS-SQL Server can manage 
its own NUMA processes better than QEMU, and as I guess many other 
applications will also manage their own NUMA processes better than QEMU, 
I would like to request that the PVE GUI have an option to enable or disable 
the automatic administration of the NUMA processes, while still allowing 
live migration. 

Moreover, if you can add such a feature, I will be able to run a test with 
MS-SQL Server to find out which of the two options gives me better results, and 
publish it (with the wait times for each case). 

@Alexandre: 
Moreover, with your temporary patches for managing the NUMA processes, in 
MS-SQL Server I saw results come back two to three times faster (which is 
fantastic, a great difference), but as I have not yet finished the tests 
(there are still some changes to make in the hardware BIOS, HugePages managed 
by Windows Server, etc.), I have not yet published a very detailed summary of 
the tests. I guess I will do it soon (I depend on third parties, and the PVE 
host must not lose cluster communication). 

And speaking of losing cluster communication: since I enabled the "I/OAT 
DMA engine" in the hardware BIOS, the node has never lost cluster 
communication again, but I must do some extensive testing to confirm it. 

Best regards 
Cesar 

----- Original Message ----- 
From: "Alexandre DERUMIER" <aderumier at odiso.com> 
To: "Dietmar Maurer" <dietmar at proxmox.com> 
Cc: <pve-devel at pve.proxmox.com> 
Sent: Tuesday, December 02, 2014 8:17 PM 
Subject: Re: [pve-devel] [PATCH] add numa options 


> Ok, 
> 
> Finally I found the last pieces of the puzzle: 
> 
> to have auto NUMA balancing, we just need: 
> 
> 2 sockets - 2 cores - 2 GB RAM 
> 
> -object memory-backend-ram,size=1024M,id=ram-node0 
> -numa node,nodeid=0,cpus=0-1,memdev=ram-node0 
> -object memory-backend-ram,size=1024M,id=ram-node1 
> -numa node,nodeid=1,cpus=2-3,memdev=ram-node1 
> 
> Like this, the host kernel will try to balance the NUMA nodes. 
> This command line works even if the host doesn't support NUMA. 
> 
> 
> 
> now if we want to bind guest numa node to specific host numa node, 
> 
> -object memory-backend-ram,size=1024M,id=ram-node0,host-nodes=0,policy=preferred \ 
> -numa node,nodeid=0,cpus=0-1,memdev=ram-node0 \ 
> -object memory-backend-ram,size=1024M,id=ram-node1,host-nodes=1,policy=bind \ 
> -numa node,nodeid=1,cpus=2-3,memdev=ram-node1 
> 
> This requires that host-nodes=X exists on the physical host 
> and also needs the qemu-kvm --enable-numa flag 
> 
> 
> 
> So, 
> I think we could add: 
> 
> numa:0|1. 
> 
> which generates the first config, creates 1 NUMA node per socket, and shares 
> the RAM across the nodes 
> 
> 
> 
> and also, for advanced users who need manual pinning: 
> 
> 
> numa0: cpus=<X-X>,memory=<mb>,hostnode=<X-X>,policy=<bind|preferred|...> 
> numa1:... 
> 
> 
> 
> What do you think about it? 
> 
> 
> 
> 
> BTW, about pc-dimm hotplug, it's possible to add the NUMA nodeid in 
> "device_add pc-dimm,node=X" 
> 
> 
> ----- Original Message ----- 
> 
> From: "Alexandre DERUMIER" <aderumier at odiso.com> 
> To: "Dietmar Maurer" <dietmar at proxmox.com> 
> Cc: pve-devel at pve.proxmox.com 
> Sent: Tuesday, December 2, 2014 20:25:51 
> Subject: Re: [pve-devel] [PATCH] add numa options 
> 
>>>shared? That looks strange to me. 
> I mean split across both nodes. 
> 
> 
> I have checked libvirt a little, 
> and I'm not sure, but I think that memory-backend-ram is optional for 
> auto NUMA. 
> 
> It's more about CPU pinning/memory pinning on selected host nodes. 
> 
> Here is an example from libvirt: 
> http://www.redhat.com/archives/libvir-list/2014-July/msg00715.html 
> "qemu: pass numa node binding preferences to qemu" 
> 
> +-object memory-backend-ram,size=20M,id=ram-node0,host-nodes=3,policy=preferred \ 
> +-numa node,nodeid=0,cpus=0,memdev=ram-node0 \ 
> +-object memory-backend-ram,size=645M,id=ram-node1,host-nodes=0-7,policy=bind \ 
> +-numa node,nodeid=1,cpus=1-27,cpus=29,memdev=ram-node1 \ 
> +-object memory-backend-ram,size=23440M,id=ram-node2,\ 
> +host-nodes=1-2,host-nodes=5,host-nodes=7,policy=bind \ 
> +-numa node,nodeid=2,cpus=28,cpus=30-31,memdev=ram-node2 \ 
> 
> ----- Original Message ----- 
> 
> From: "Dietmar Maurer" <dietmar at proxmox.com> 
> To: "Alexandre DERUMIER" <aderumier at odiso.com> 
> Cc: pve-devel at pve.proxmox.com 
> Sent: Tuesday, December 2, 2014 19:42:45 
> Subject: RE: [pve-devel] [PATCH] add numa options 
> 
>> "When do memory hotplug, if there is numa node, we should add the memory 
>> size to the corresponding node memory size. 
>> 
>> For now, it mainly affects the result of hmp command "info numa"." 
>> 
>> 
>> So, it seems to be done automatically. 
>> Not sure on which node the pc-dimm is assigned, but maybe the free slots are 
>> shared at start between the NUMA nodes. 
> 
> shared? That looks strange to me. 
> _______________________________________________ 
> pve-devel mailing list 
> pve-devel at pve.proxmox.com 
> http://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-devel 


