[pve-devel] question/idea : managing big proxmox cluster (100nodes), get rid of corosync ?

Alexandre DERUMIER aderumier at odiso.com
Wed Sep 21 13:52:41 CEST 2016


>>OK, multicast traffic may still be hindered when on the same network with 
>>heavy users (e.g. VM storage), even if the network itself is not saturated. 

>>A second totem ring through the redundant ring protocol (rrp) in passive 
>>mode could boost the performance as it almost doubles the speed of the totem 
>>protocol, plus it adds redundancy for quorum. 

Yes, but I would like to avoid wiring dedicated cables (I'm very far from saturating the link: maybe 1 or 2 Gbit/s on a 10 Gbit/s link).


> 
> and I'm seeing a lot of retransmits from time to time (bursts of around 5-10s), once or twice an hour :/ 

>>Hmm sounds a bit weird. Seemingly random? 

yep, totally random
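
To check whether the bursts really are random or line up with something periodic (backups, a cron job), the log could be bucketed per minute. A rough sketch in Python, assuming corosync logs retransmits with the text "Retransmit List" and that syslog uses the classic "Sep 21 13:52:41" timestamp prefix. Both are assumptions to verify against the actual log, and the function name is only illustrative:

```python
from collections import Counter


def retransmits_per_minute(lines):
    """Count log lines mentioning a retransmit list, grouped by minute.

    Assumes a classic syslog prefix such as "Sep 21 13:52:41 host ...",
    so the first 12 characters ("Sep 21 13:52") identify the minute.
    """
    counts = Counter()
    for line in lines:
        # Assumed marker text; adjust if the local corosync logs differ.
        if "Retransmit List" not in line:
            continue
        counts[line[:12]] += 1
    return counts
```

Feeding it the syslog file line by line and printing the sorted keys should show whether the bursts cluster around particular times of the hour.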

> 
> so I'm really scared to increase the cluster size. 
> 
> Note that I have around 1000 VMs, so I don't know the impact on the number of messages/s. 
> 
> Question: do you think streaming all VM statistics could impact the number of messages/s? 
> 


>>Do you use something which could trigger frequent writes/modifies on 
>>/etc/pve ? 
nothing special

>>You could check whether 
>># inotifywait -e attrib,modify,create,delete,move -r -m /etc/pve/ 
>>generates a lot of output. This only shows how the local node modifies 
>>the pmxcfs, not all modifications. 

>>FYI, the HA manager uses it frequently but in a 5 second cycle, so not 
>>really heavy usage. 

I only see:
/etc/pve/nodes/kvm1/ CREATE lrm_status.tmp.3291
/etc/pve/nodes/kvm1/ MODIFY lrm_status.tmp.3291
/etc/pve/nodes/kvm1/ MOVED_FROM lrm_status.tmp.3291
/etc/pve/nodes/kvm1/ MOVED_TO lrm_status
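
To compare nodes, the inotifywait output can be summarized instead of read by eye. A minimal sketch in Python, assuming the default inotifywait output format shown above ("directory EVENTS filename") and paths without spaces. The function name is hypothetical:

```python
from collections import Counter


def summarize(lines):
    """Count inotifywait events per (directory, event) pair."""
    counts = Counter()
    for line in lines:
        parts = line.split()
        if len(parts) < 2:
            continue  # skip blank or unexpected lines
        directory, events = parts[0], parts[1]
        # inotifywait joins multiple events with commas, e.g. "CREATE,ISDIR"
        for event in events.split(","):
            counts[(directory, event)] += 1
    return counts


# The four lines seen on kvm1, used here as sample input.
sample = [
    "/etc/pve/nodes/kvm1/ CREATE lrm_status.tmp.3291",
    "/etc/pve/nodes/kvm1/ MODIFY lrm_status.tmp.3291",
    "/etc/pve/nodes/kvm1/ MOVED_FROM lrm_status.tmp.3291",
    "/etc/pve/nodes/kvm1/ MOVED_TO lrm_status",
]
for (directory, event), n in sorted(summarize(sample).items()):
    print(n, event, directory)
```

Passing a captured output file (or sys.stdin) instead of the sample list gives the per-directory totals for a longer observation window.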



>>Just running VMs normally does not modify anything; there are mostly just 
>>reads, which should not cause any problems as they don't go over the wire and 
>>are also fast from the DB as it's in RAM. Only modifications have to be sent 
>>to other nodes. 

Yes, but running VMs stream RRD values to all nodes (I have around 70 VMs per node).



>>Can you also send me the output from 
>># corosync-cmapctl 

>>This is quite some data and contains IP addresses, so you may want to send 
>>it to me directly. 

I'll send you the details directly by mail.



> 
> 
> 
> 
> ----- Original Message ----- 
> From: "Thomas Lamprecht" <t.lamprecht at proxmox.com> 
> To: "pve-devel" <pve-devel at pve.proxmox.com> 
> Sent: Wednesday 21 September 2016 09:40:01 
> Subject: Re: [pve-devel] question/idea : managing big proxmox cluster (100nodes), get rid of corosync ? 
> 
> On 09/21/2016 08:50 AM, Alexandre DERUMIER wrote: 
>>>> Forgot to mention that consul supports multiple clusters and/or multi 
>>>> center clusters out of the box. 
>> Yes, I read the doc yesterday. It seems very interesting. 
>> 
>> The most work would be replacing pmxcfs with the consul KV store. I have seen some consul FUSE fs implementations, 
>> but they don't have all pmxcfs features (like symlinks, for example). 
>> 
>> Zookeeper seems to be lower level. 
>> 
>> Reading the sheepdog plugin (1500 LOC): 
>> 
>> https://github.com/sheepdog/sheepdog/blob/8772904509ce6b10c5edca4f497022686aecc18f/sheep/cluster/zookeeper.c 
>> vs 
>> https://github.com/sheepdog/sheepdog/blob/8772904509ce6b10c5edca4f497022686aecc18f/sheep/cluster/corosync.c 
> Discussing and evaluating options is good, but instantly throwing everything away 
> and switching to another - not necessarily better - cluster stack is 
> maybe a bit of an overreaction. :) I also think that our current cluster stack, 
> with corosync + pve-cluster (pmxcfs), is quite stable and a lot of things 
> depend on it. 
> 
> Also, corosync is very well tested software and works really well, at least 
> with small to mid size clusters (< 60 nodes - which I find is quite an 
> achievement for a cluster!). You also have to consider 
> that quite some overhead, and thus node limitation, may come from the 
> database used by pmxcfs: each transaction needs to be synced to disk to 
> make everything reliable, and while this is quite optimized it makes things 
> slower (placing the DB on really fast storage could help here). 
> 
> I, personally, would prefer to keep corosync and introduce a protocol which 
> allows connecting multiple clusters (easier said than done, but still less change and 
> work than adapting to another cluster stack, which is most surely not 
> better, or has other drawbacks). 
> 
> Also taking a look at the corosync satellite approach sounds interesting. 
> 
> Connecting multiple clusters is also a different approach than a small cluster 
> with a lot of satellite nodes per cluster node; I see the former as better, as 
> it's more decentralized and seems to fit better in our current design. :) 
> 
>> Note that for scaling, zookeeper, consul, ... have some kind of master nodes for the quorum, and client nodes (same as corosync satellite). 
>> I don't think it's technically possible to scale a full mesh of master nodes to a lot of nodes. 
> No, with full mesh you won't really overcome the limits and problems corosync 
> has here; corosync utilizes the possibilities quite well with multicast 
> here. 
> 
> @Alexandre, you say that with 16 nodes the cluster is pretty much at its maximum; 
> can I get some more info from you, as I currently do not have the 
> hardware to test this :) 
> 
> Do you use IGMP snooping/queriers? 
> Which network does corosync communicate on, an independent one? And how fast 
> is it? 
> Redundant rings also? 
> 
> 
>> ----- Original Message ----- 
>> From: "datanom.net" <mir at datanom.net> 
>> To: "pve-devel" <pve-devel at pve.proxmox.com> 
>> Sent: Wednesday 21 September 2016 07:49:06 
>> Subject: Re: [pve-devel] question/idea : managing big proxmox cluster (100nodes), get rid of corosync ? 
>> 
>> On Wed, 21 Sep 2016 01:45:18 +0200 
>> Michael Rasmussen <mir at datanom.net> wrote: 
>> 
>>> https://github.com/hashicorp/consul 
>>> 
>> Forgot to mention that consul supports multiple clusters and/or multi 
>> center clusters out of the box. 
>> 
> 
> _______________________________________________ 
> pve-devel mailing list 
> pve-devel at pve.proxmox.com 
> http://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-devel 