[pve-devel] question/idea : managing big proxmox cluster (100nodes), get rid of corosync ?
Alexandre DERUMIER
aderumier at odiso.com
Wed Sep 21 13:52:41 CEST 2016
>>OK, multicast traffic may still be hindered when on the same network with
>>heavy users (e.g. VM storage), even if the network itself is not saturated.
>>A second totem ring through the redundant ring protocol (rrp) in passive
>>mode could boost the performance as it almost doubles the speed of the totem
>>protocol, plus it adds redundancy for quorum.
Yes, but I would like to avoid wiring dedicated cables (I'm very far from saturating the link: maybe 1 or 2 Gbit/s out of 10 Gbit/s).
>
> and I'm seeing a lot of retransmits from time to time (around 5-10s of retransmits), once or twice per hour :/
>>>Hmm sounds a bit weird. Seemingly random?
yep, totally random
>
> so I'm really scared to increase the cluster size.
>
> Note that I have around 1000 VMs, so I don't know the impact on the number of messages/s.
>
> Question: do you think streaming all VM statistics could impact the number of messages/s?
>
>>Do you use something which could trigger frequent writes/modifies on
>>/etc/pve ?
nothing special
>>You could look if
>># inotifywait -e attrib,modify,create,delete,move -r -m /etc/pve/
>>generates a lot of output. This only shows how the local node modifies
>>the pmxcfs, not all modifications.
>>FYI, the HA manager uses it frequently, but in a 5-second cycle, so it is
>>not really heavy usage.
I only see:
/etc/pve/nodes/kvm1/ CREATE lrm_status.tmp.3291
/etc/pve/nodes/kvm1/ MODIFY lrm_status.tmp.3291
/etc/pve/nodes/kvm1/ MOVED_FROM lrm_status.tmp.3291
/etc/pve/nodes/kvm1/ MOVED_TO lrm_status
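
To judge whether the watch "generates a lot of output", the event lines can be tallied instead of read by eye. This is only a minimal sketch: it assumes inotifywait's default line format (watched directory, event names, file name), and `tally_events` is a hypothetical helper name, not part of any Proxmox tool.

```python
from collections import Counter


def tally_events(lines):
    """Count inotifywait events per (directory, event) pair.

    Each line is expected in inotifywait's default format:
    '<watched dir> <EVENT[,EVENT...]> <file name>'.
    """
    counts = Counter()
    for line in lines:
        parts = line.split(None, 2)
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        directory, events = parts[0], parts[1]
        # One line may carry several comma-separated event names.
        for event in events.split(","):
            counts[(directory, event)] += 1
    return counts


# The four lines reported above, used as sample input.
sample = [
    "/etc/pve/nodes/kvm1/ CREATE lrm_status.tmp.3291",
    "/etc/pve/nodes/kvm1/ MODIFY lrm_status.tmp.3291",
    "/etc/pve/nodes/kvm1/ MOVED_FROM lrm_status.tmp.3291",
    "/etc/pve/nodes/kvm1/ MOVED_TO lrm_status",
]
counts = tally_events(sample)
for (directory, event), n in sorted(counts.items()):
    print(f"{n:4d} {event:<12} {directory}")
```

Feeding it the captured output of a fixed time window gives a rough events-per-window figure for the local node only.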
>>Just running VMs normally does not modify anything. There are mostly just
>>reads, which should not cause any problems as they won't go over the wire and
>>are also fast from the DB as it's in RAM. Only modifications have to be sent
>>to other nodes.
Yes, but running VMs stream RRD values to all nodes (I have around 70 VMs per node).
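
A rough back-of-the-envelope for the statistics traffic, as a sketch only. The VM counts are the ones from this thread; the 10-second interval and the "one update per VM per interval" model are assumptions made for illustration, and the real batching may well differ.

```python
def updates_per_second(vm_count, interval_s):
    """Updates/s if every VM sends one status update per interval (assumed model)."""
    return vm_count / interval_s


CLUSTER_VMS = 1000   # "around 1000 VMs" in the whole cluster
VMS_PER_NODE = 70    # "around 70 VMs per node"
INTERVAL_S = 10.0    # ASSUMPTION: status broadcast interval in seconds

cluster_rate = updates_per_second(CLUSTER_VMS, INTERVAL_S)
node_rate = updates_per_second(VMS_PER_NODE, INTERVAL_S)

print(f"cluster-wide: {cluster_rate:.1f} updates/s")
print(f"per node:     {node_rate:.1f} updates/s")
```

Under these assumptions the rate grows linearly with the number of VMs, which is why the question matters before adding more nodes.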
>>Can you also send me the output from
>># corosync-cmapctl
>>This is quite a lot of data and contains IP addresses, so you may want to send
>>it to me directly.
I'll send you the details directly by mail.
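
If such a dump ever has to be posted publicly, the addresses could be masked first. A minimal sketch, assuming only IPv4 addresses need hiding; `redact_ipv4` is a hypothetical helper and the sample lines are illustrative, not real output from this cluster.

```python
import re

# Matches dotted-quad IPv4 addresses. It does not validate octet ranges and
# will also mask other four-part dotted numbers, which is acceptable here.
IPV4 = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def redact_ipv4(text):
    """Replace each distinct IPv4 address with a stable placeholder.

    Returns the redacted text and the address-to-placeholder mapping, so the
    same node keeps the same label throughout the dump.
    """
    mapping = {}

    def repl(match):
        ip = match.group(0)
        if ip not in mapping:
            mapping[ip] = f"<ip-{len(mapping) + 1}>"
        return mapping[ip]

    return IPV4.sub(repl, text), mapping


# Illustrative input only (documentation address range).
sample = (
    "members.1.ip (str) = r(0) ip(192.0.2.11)\n"
    "members.2.ip (str) = r(0) ip(192.0.2.12)\n"
    "nodelist.node.0.ring0_addr (str) = 192.0.2.11\n"
)
redacted, mapping = redact_ipv4(sample)
print(redacted)
```

Keeping the mapping private still lets the sender translate placeholders back to real nodes when discussing the results.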
>
>
>
>
> ----- Original message -----
> From: "Thomas Lamprecht" <t.lamprecht at proxmox.com>
> To: "pve-devel" <pve-devel at pve.proxmox.com>
> Sent: Wednesday, 21 September 2016 09:40:01
> Subject: Re: [pve-devel] question/idea : managing big proxmox cluster (100nodes), get rid of corosync ?
>
> On 09/21/2016 08:50 AM, Alexandre DERUMIER wrote:
>>>> Forgot to mention that consul supports multiple clusters and/or multi
>>>> center clusters out of the box.
>> Yes, I read the doc yesterday. It seems very interesting.
>>
>> Most of the work would be replacing pmxcfs with the consul KV store. I have seen some consul FUSE filesystem implementations,
>> but they don't have all pmxcfs features (symlinks, for example).
>>
>> Zookeeper seems to be lower level.
>>
>> Reading the sheepdog plugin (1500 LOC):
>>
>> https://github.com/sheepdog/sheepdog/blob/8772904509ce6b10c5edca4f497022686aecc18f/sheep/cluster/zookeeper.c
>> vs
>> https://github.com/sheepdog/sheepdog/blob/8772904509ce6b10c5edca4f497022686aecc18f/sheep/cluster/corosync.c
> Discussing and evaluating options is good, but instantly throwing everything
> away and switching to another - not necessarily better - cluster stack is
> maybe a bit of an overreaction. :) I also think that our current cluster stack,
> with corosync + pve-cluster (pmxcfs), is quite stable and a lot of things
> depend on it.
>
> Also, corosync is very well tested software and works really well, at least
> with small to mid-size clusters (< 60 nodes - which I find is quite an
> achievement for a cluster!). You also have to consider
> that quite some overhead, and thus node limitation, may come from the
> database used by pmxcfs: transactions need to be synced to disk to
> make everything reliable, and while this is quite optimized, it makes things
> slower (placing the DB on really fast storage could help here).
>
> I, personally, would prefer to keep corosync and introduce a protocol which
> allows connecting multiple clusters (easier said than done, but still less
> change and work than adapting to another cluster stack, which is most surely
> not better, or has other drawbacks).
>
> Also taking a look at the corosync satellite approach sounds interesting.
>
> Connecting multiple clusters is also a different approach than a small cluster
> with a lot of satellite nodes per cluster node. I see the former as better, as
> it's more decentralized and seems to fit better in our current design. :)
>
>> Note that for scaling, zookeeper, consul, ... have some kind of master nodes for the quorum, plus client nodes (same as corosync satellite).
>> I don't think it's technically possible to scale full-mesh master nodes to a large number of nodes.
> No, with full mesh you won't really overcome the limits and problems corosync
> has here; corosync already uses the available possibilities quite well with
> multicast.
>
> @Alexandre, you say that with 16 nodes the cluster is pretty much at its maximum.
> Can I get some more info from you, as I currently do not have the
> hardware to test this? :)
>
> Do you use IGMP snooping/queriers?
> On which network does corosync communicate, an independent one? And how fast
> is it?
> Redundant rings also?
>
>
>> ----- Original message -----
>> From: "datanom.net" <mir at datanom.net>
>> To: "pve-devel" <pve-devel at pve.proxmox.com>
>> Sent: Wednesday, 21 September 2016 07:49:06
>> Subject: Re: [pve-devel] question/idea : managing big proxmox cluster (100nodes), get rid of corosync ?
>>
>> On Wed, 21 Sep 2016 01:45:18 +0200
>> Michael Rasmussen <mir at datanom.net> wrote:
>>
>>> https://github.com/hashicorp/consul
>>>
>> Forgot to mention that consul supports multiple clusters and/or multi
>> center clusters out of the box.
>>
>
> _______________________________________________
> pve-devel mailing list
> pve-devel at pve.proxmox.com
> http://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-devel