[pve-devel] corosync bug: cluster break after 1 node clean shutdown

Thomas Lamprecht t.lamprecht at proxmox.com
Mon Sep 14 10:51:03 CEST 2020


On 9/14/20 10:27 AM, Alexandre DERUMIER wrote:
>> I wonder if something like pacemaker sbd could be implemented in Proxmox as an extra layer of protection?
> 
>>> AFAIK Thomas already has patches to implement active fencing. 
> 
>>> But IMHO this will not solve the corosync problems.. 
> 
> Yes, sure. I'd really like to have 2 different sources of verification, with different paths/software, to avoid this kind of bug.
> (shit happens, Murphy's law ;)

You would then need at least three, and if one of them has a bug that floods
the network, then in a lot of setups (those not having beefy switches like
yours ;) the other two will be taken down as well, as either memory or the
system stack gets overloaded.
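
To illustrate the "at least three" point, here is a minimal sketch, assuming
the verification sources are combined by a strict majority vote (that is an
interpretation of the remark above, not something stated in the thread). The
function name and the verdict strings are hypothetical.

```python
from collections import Counter


def majority_verdict(votes):
    """Return the verdict held by a strict majority of sources, or None.

    `votes` is a list with one verdict per verification source
    (hypothetical labels, for illustration only).
    """
    if not votes:
        return None
    verdict, count = Counter(votes).most_common(1)[0]
    # A strict majority needs more than half of all sources to agree.
    return verdict if count > len(votes) / 2 else None


# Two sources that disagree cannot outvote each other.
print(majority_verdict(["node-alive", "node-dead"]))  # None

# A third source breaks the tie.
print(majority_verdict(["node-alive", "node-dead", "node-dead"]))  # node-dead
```

This only covers the tie-breaking aspect. It does not model the flooding case
described above, where one misbehaving source takes the other two down with it.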

> 
> as we say in French "ceinture & bretelles" -> "belt and braces"
> 
> 
> BTW,
> a user has reported a new corosync problem here:
> https://forum.proxmox.com/threads/proxmox-6-2-corosync-3-rare-and-spontaneous-disruptive-udp-5405-storm-flood.75871
> (Sounds like the bug that I had 6 months ago, with a corosync bug flooding a lot of UDP packets, but not the same bug I have here)

Did you get in contact with knet/corosync devs about this?

Because it may well be something their stack is better at handling, or maybe
there really is still a bug, or bad behaviour in some edge cases...




