[pve-devel] corosync bug: cluster break after 1 node clean shutdown
Thomas Lamprecht
t.lamprecht at proxmox.com
Mon Sep 14 10:51:03 CEST 2020
On 9/14/20 10:27 AM, Alexandre DERUMIER wrote:
>> I wonder if something like pacemaker sbd could be implemented in Proxmox as an extra layer of protection?
>
>>> AFAIK Thomas already has patches to implement active fencing.
>
>>> But IMHO this will not solve the corosync problems..
>
> Yes, sure. I'd really like to have 2 different sources of verification, with different paths/software, to avoid this kind of bug.
> (shit happens, Murphy's law ;)
You would then need at least three, and if one has a bug flooding the network,
then in a lot of setups (those without beefy switches like yours ;) the other two
will be taken down as well, as either memory or the system stack gets overloaded.
>
> as we say in French "ceinture & bretelles" -> "belt and braces"
>
>
> BTW,
> a user has reported a new corosync problem here:
> https://forum.proxmox.com/threads/proxmox-6-2-corosync-3-rare-and-spontaneous-disruptive-udp-5405-storm-flood.75871
> (Sounds like the bug that I had 6 months ago, with corosync flooding a lot of UDP packets, but not the same bug I have here)
Did you get in contact with knet/corosync devs about this?
Because it may well be something their stack is better at handling; maybe
there really is still a bug, or bad behaviour in some edge cases...