[pve-devel] Error in PVE with win2008r2 and 256GB RAM

Alexandre DERUMIER aderumier at odiso.com
Thu Dec 18 12:09:46 CET 2014


Hi, here is the new version with hugepages support.

http://odisoweb1.odiso.net/qemu-server_3.3-6_amd64.deb

vmid.conf
---------
hugepages: 2 for 2M hugepages
hugepages: 1024 for 1G hugepages

For 1G hugepages, you need to add hugepagesz=1G to your grub kernel options, and this works only with recent processors (Intel > Westmere).
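One way to check a host before trying 1G hugepages is to look for the `pdpe1gb` CPU flag, which Linux reports in /proc/cpuinfo on x86 processors that support 1 GB pages. The helper below is only a sketch: the function name is made up for illustration, and it takes the file contents as a string so it can be tried without touching the host.

```python
def supports_1g_hugepages(cpuinfo_text: str) -> bool:
    """Return True if any 'flags' line of /proc/cpuinfo-style text lists pdpe1gb."""
    for line in cpuinfo_text.splitlines():
        if line.startswith("flags"):
            # A flags line looks like "flags : fpu vme ... pdpe1gb ..."
            _, _, flags = line.partition(":")
            if "pdpe1gb" in flags.split():
                return True
    return False
```

On a real host you would pass it the contents of /proc/cpuinfo, for example `supports_1g_hugepages(open("/proc/cpuinfo").read())`.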




About the numa config: my patches were applied a few weeks ago,
and the result is a little different from the test patch I sent you before.


To enable numa:

numa: 1

This creates one numa node per socket of your VM and splits the memory across the nodes.
The kernel's autonuma balancing will then try to map the VM numa nodes to the best host numa nodes.
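To illustrate the idea of one node per socket with the memory split across nodes, here is a small sketch. It assumes the split is as even as possible; the actual qemu-server code may divide memory differently, and the function name is invented for this example.

```python
from typing import List


def split_memory_across_nodes(memory_mb: int, sockets: int) -> List[int]:
    """Split memory_mb into one share per socket (assumed: as evenly as possible)."""
    if sockets < 1:
        raise ValueError("sockets must be >= 1")
    base, remainder = divmod(memory_mb, sockets)
    # Hand the leftover megabytes to the first nodes so the shares sum to memory_mb.
    return [base + (1 if node < remainder else 0) for node in range(sockets)]


print(split_memory_across_nodes(262144, 2))  # [131072, 131072]
```

With sockets: 2 and memory: 262144 this gives two nodes of 131072 MB each.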

I think this should already be enough to give you good performance.




Now, if you want to pin host numa nodes to qemu numa nodes manually,
you need to add:

numa0:cpus=0-1,memory=1024,hostnodes=0,policy=bind
numa1:cpus=2-3,memory=1024,hostnodes=1,policy=bind

The syntax is:
"cpus=<id[-id]>,memory=<mb>[[,hostnodes=<id[-id]>][,policy=<preferred|bind|interleave>]]"
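As a rough illustration of that syntax, the sketch below parses the value part of a numaN line into a dictionary and rejects anything the syntax string does not allow. It is not the parser used by qemu-server; the function name and the error messages are assumptions made for this example.

```python
import re

_ID_RANGE = re.compile(r"^\d+(-\d+)?$")          # <id[-id]>
_POLICIES = {"preferred", "bind", "interleave"}
_ALLOWED = {"cpus", "memory", "hostnodes", "policy"}


def parse_numa(value: str) -> dict:
    """Parse e.g. 'cpus=0-1,memory=1024,hostnodes=0,policy=bind'."""
    opts = {}
    for part in value.split(","):
        key, sep, val = part.partition("=")
        if not sep:
            raise ValueError(f"missing '=' in {part!r}")
        opts[key.strip()] = val.strip()
    unknown = set(opts) - _ALLOWED
    if unknown:
        raise ValueError(f"unknown options: {sorted(unknown)}")
    for required in ("cpus", "memory"):
        if required not in opts:
            raise ValueError(f"missing required option {required!r}")
    for key in ("cpus", "hostnodes"):
        if key in opts and not _ID_RANGE.match(opts[key]):
            raise ValueError(f"{key} must be <id> or <id-id>")
    if "policy" in opts and opts["policy"] not in _POLICIES:
        raise ValueError("policy must be preferred, bind or interleave")
    opts["memory"] = int(opts["memory"])  # memory is given in MB
    return opts


print(parse_numa("cpus=0-1,memory=1024,hostnodes=0,policy=bind"))
```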




>>Thank you very much for the offer, but so far I don't understand how
>>my VM can gain speed. As I understand it:
>>a) My VM (Win 2008R2) cannot use hugetlbfs.
>>b) 1 GB hugepages are recommended for nodes with several terabytes of RAM,
>>and my VM only has 251 GB of RAM assigned.
>>
>>Can you explain, in theory, why it would be better?


Well, AFAIK, the guest OS doesn't know about hugepages, so it should work with Windows too.
With 1GB hugepages, you need 251 pages for 251 GB, vs 128512 pages with 2M hugepages,

and without hugepages (4K pages), you need 65798144 pages.


The more pages you have, the more CPU is used to manage them.
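The page counts above are just the VM memory divided by the page size. A minimal sketch of the arithmetic, assuming 251 GB means 251 * 1024^3 bytes (the helper name is made up for this example):

```python
GIB = 1024 ** 3


def page_count(memory_bytes: int, page_bytes: int) -> int:
    """Number of pages needed to cover memory_bytes (rounded up)."""
    return -(-memory_bytes // page_bytes)


vm_memory = 251 * GIB
for label, size in (("4K", 4 * 1024), ("2M", 2 * 1024 ** 2), ("1G", GIB)):
    print(label, page_count(vm_memory, size))
# 4K 65798144
# 2M 128512
# 1G 251
```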

Here is an explanation found on the net:

"
Each time a process needs to read or write data in memory, a translation happens from virtual to physical memory. To speed up this process, a modern CPU usually has a so-called TLB (translation lookaside buffer), where the most recently used page translations are saved. If the needed address is in the TLB, you have a TLB hit: the physical address is returned and the process goes on. If the page is not in the TLB (which, by the way, isn't that big), you have a miss, and a so-called page walk occurs.

A page walk is a really expensive activity in terms of CPU time, so it can impact performance.

In the example before, where the guest has 16 GB of RAM, the CPU has to deal with about 4 million 4 kB pages.

How can we reduce TLB misses? Clearly by using a bigger memory page size.

Linux, as an enterprise operating system, has so-called hugepages, which, when activated, provide 2 MB memory pages instead of 4 kB ones.

So if we use 2 MB pages, we have about 8000 pages instead of 4 million for the 16 GB guest, which greatly reduces TLB misses and can improve performance for memory-intensive guests.
"




----- Original Message -----
From: "Cesar Peschiera" <brain at click.com.py>
To: "aderumier" <aderumier at odiso.com>
Cc: "pve-devel" <pve-devel at pve.proxmox.com>
Sent: Thursday, December 18, 2014 08:59:48
Subject: Re: [pve-devel] Error in PVE with win2008r2 and 256GB RAM

Thank you very much for the offer, but so far I don't understand how
my VM can gain speed. As I understand it:
a) My VM (Win 2008R2) cannot use hugetlbfs.
b) 1 GB hugepages are recommended for nodes with several terabytes of RAM,
and my VM only has 251 GB of RAM assigned.

Can you explain, in theory, why it would be better?


----- Original Message ----- 
From: "Alexandre DERUMIER" <aderumier at odiso.com> 
To: "Cesar Peschiera" <brain at click.com.py> 
Cc: "pve-devel" <pve-devel at pve.proxmox.com> 
Sent: Thursday, December 18, 2014 4:23 AM 
Subject: Re: [pve-devel] Error in PVE with win2008r2 and 256GB RAM 


Also,
hugetlbfs can use 1GB pages, vs 2M pages with transparent hugepages.

I'll send a patch today; it's really easy to implement.


----- Original Message -----
From: "aderumier" <aderumier at odiso.com>
To: "Cesar Peschiera" <brain at click.com.py>
Cc: "pve-devel" <pve-devel at pve.proxmox.com>
Sent: Thursday, December 18, 2014 08:11:47
Subject: Re: [pve-devel] Error in PVE with win2008r2 and 256GB RAM

>>Many thanks for your answer, but I am not sure it is a good idea, because
>>I don't understand the advantage, given that I can disable hugepages
>>this way:

There are 2 modes for hugepages:

the transparent hugepage mode, managed by the kernel,

but also the old way, manual hugepages (aka hugetlbfs, mounted in
/dev/hugepage..)

from: 
https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/6/html/Performance_Tuning_Guide/s-memory-transhuge.html 

" However, transparent hugepage mode is not recommended for database 
workloads." 


So hugepages are always useful for big-memory VMs, because they reduce CPU
usage on memory access.
But transparent hugepages sometimes don't work well with some workloads, such as
databases.

I think that manually defined hugepages can give a good extra boost compared
with just disabling THP (transparent hugepages).



>>Moreover, I have a serious problem with PVE cluster communication on two
>>of eight PVE nodes, and with a VM. If you can help me, I will be extremely
>>grateful. Please see this link:
>>http://forum.proxmox.com/threads/20523-Quorum-problems-with-PVE-2-3-and-3-3?p=104995#post104995 

Do you mix 2.6.32 and 3.10 kernels in your cluster?
(I have had some strange problems with mixed kernels, and never found the
cause.)

Also, this could be a multicast snooping problem.

What are your hardware switches?
Myself, I disable snooping on the Linux vmbr bridges, and enable snooping plus
an IGMP querier on the physical switches.



----- Original Message -----
From: "Cesar Peschiera" <brain at click.com.py>
To: "pve-devel" <pve-devel at pve.proxmox.com>, "aderumier" <aderumier at odiso.com>
Sent: Thursday, December 18, 2014 07:09:11
Subject: Fw: [pve-devel] Error in PVE with win2008r2 and 256GB RAM

Hi Alexandre 

Many thanks for your answer, but I am not sure it is a good idea, because
I don't understand the advantage, given that I can disable hugepages
this way:

shell> vim /etc/default/grub: 
GRUB_CMDLINE_LINUX_DEFAULT="...transparent_hugepage=never" 

shell> update-grub 
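
After a reboot, the active transparent hugepage mode can be read from /sys/kernel/mm/transparent_hugepage/enabled, where the kernel marks the current mode with square brackets (for example "always madvise [never]"). The helper below is only a sketch for picking out that mode; its name is invented for this example and it takes the file contents as a string.

```python
import re


def active_thp_mode(enabled_text: str) -> str:
    """Return the bracketed mode from the THP 'enabled' file contents."""
    match = re.search(r"\[(\w+)\]", enabled_text)
    if match is None:
        raise ValueError("no active mode found in: %r" % enabled_text)
    return match.group(1)


print(active_thp_mode("always madvise [never]\n"))  # never
```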

Moreover, I have a serious problem with PVE cluster communication on two
of eight PVE nodes, and with a VM. If you can help me, I will be extremely
grateful. Please see this link:
http://forum.proxmox.com/threads/20523-Quorum-problems-with-PVE-2-3-and-3-3?p=104995#post104995 

Best regards 
Cesar 

> ----- Original Message ----- 
> From: "Alexandre DERUMIER" <aderumier at odiso.com> 
> To: "Cesar Peschiera" <brain at click.com.py> 
> Cc: "pve-devel" <pve-devel at pve.proxmox.com> 
> Sent: Thursday, December 18, 2014 2:24 AM 
> Subject: Re: [pve-devel] Error in PVE with win2008r2 and 256GB RAM 
> 
> 
> Note that currently,
> 
> hugepages are managed by the transparent hugepage mechanism.
> 
> But it seems that we can define hugepages manually per numa node:
> 
> -object memory-backend-file,prealloc=yes,mem-path=/dev/hugepages/libvirt/qemu,size=1024M,id=ram-node0
> -numa node,nodeid=0,cpus=0,memdev=ram-node0
> -object memory-backend-file,prealloc=yes,mem-path=/dev/hugepages/libvirt/qemu,size=1024M,id=ram-node1
> -numa node,nodeid=1,cpus=1,memdev=ram-node1
> 
> 
> I'll try to make a patch if you want to test.
> 
> 
> ----- Original Message -----
> From: "aderumier" <aderumier at odiso.com>
> To: "Cesar Peschiera" <brain at click.com.py>
> Cc: "pve-devel" <pve-devel at pve.proxmox.com>
> Sent: Thursday, December 18, 2014 06:15:44
> Subject: Re: [pve-devel] Error in PVE with win2008r2 and 256GB RAM
> 
>>>Moreover, I guess that I have problems with hugepages.
> 
> I have found an interesting blog post:
> 
> http://developerblog.redhat.com/2014/03/10/examining-huge-pages-or-transparent-huge-pages-performance/
> 
> It explains how to see whether hugepages impact performance or not.
> 
> 
> 
> ----- Original Message -----
> From: "Cesar Peschiera" <brain at click.com.py>
> To: "aderumier" <aderumier at odiso.com>, "pve-devel" <pve-devel at pve.proxmox.com>
> Sent: Thursday, December 18, 2014 03:44:33
> Subject: Re: [pve-devel] Error in PVE with win2008r2 and 256GB RAM
> 
> Hi Alexandre 
> 
> I have installed your patches, and in some tests with MS-SQL-Server I see
> better behavior in terms of speed (I will send the comparisons soon).
> 
> Moreover, I guess that I have problems with hugepages.
> Please see this link, and answer me if you can:
> http://forum.proxmox.com/threads/20449-Win2008R2-exaggeratedly-slow-with-256GB-RAM-and-strange-behaviours-in-PVE?p=104996#post104996 
> 
> 
> ----- Original Message ----- 
> From: "Alexandre DERUMIER" <aderumier at odiso.com> 
> To: "Cesar Peschiera" <brain at click.com.py> 
> Cc: <pve-devel at pve.proxmox.com> 
> Sent: Tuesday, December 02, 2014 9:50 AM 
> Subject: Re: [pve-devel] Error in PVE with win2008r2 and 256GB RAM 
> 
> 
> Hi, 
> can you test this: 
> 
> http://odisoweb1.odiso.net/pve-qemu-kvm_2.2-2_amd64.deb 
> http://odisoweb1.odiso.net/qemu-server_3.3-5_amd64.deb 
> 
> 
> then edit your vm config file: 
> 
> 
> sockets: 2 
> cores: 4 
> memory: 262144 
> numa0: memory=131072,policy=bind 
> numa1: memory=131072,policy=bind 
> 
> 
> (you need 1 numa node per socket, and the total numa memory must be equal to the vm memory).
> 
> You can change the number of cores if you want.
> 
> 
> and then start the vm?
> 
> 
> ----- Original Message -----
> 
> From: "Alexandre DERUMIER" <aderumier at odiso.com>
> To: "Cesar Peschiera" <brain at click.com.py>
> Cc: pve-devel at pve.proxmox.com
> Sent: Tuesday, December 2, 2014 12:40:29
> Subject: Re: [pve-devel] Error in PVE with win2008r2 and 256GB RAM
> 
> Hi, 
> 
> some news.
> 
> It seems that the current proxmox qemu build doesn't have numa support enabled.
> 
> So the previous command line doesn't work.
> 
> 
> I'll send a patch for pve-qemu-kvm, and also one to add numa options to the
> vm config file.
> 
> 
> 
> ----- Original Message -----
> 
> From: "Alexandre DERUMIER" <aderumier at odiso.com>
> To: "Cesar Peschiera" <brain at click.com.py>
> Cc: pve-devel at pve.proxmox.com
> Sent: Tuesday, December 2, 2014 07:05:47
> Subject: Re: [pve-devel] Error in PVE with win2008r2 and 256GB RAM
> 
>>>at i would like to ask you if you can give me your suggestions in
>>>practical terms, besides the brief theoretical explanation, because
>>>I am not a developer and I don't understand how to apply it in my PVE.
> 
> About the command line: each VM is a kvm process.
> 
> So start your VM with the current config, run "ps -aux", and copy the big
> "kvm -id ..." command line for your VM.
> 
> Stop the VM.
> 
> Then add my special numa lines
> 
> and paste the command line to start the VM!
> 
> 
> (kvm is so simple ;)
> 
> 
> _______________________________________________ 
> pve-devel mailing list 
> pve-devel at pve.proxmox.com 
> http://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-devel 
> 
_______________________________________________ 
pve-devel mailing list 
pve-devel at pve.proxmox.com 
http://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-devel 


