[pve-devel] corosync bug: cluster break after 1 node clean shutdown

Alexandre DERUMIER aderumier at odiso.com
Tue Sep 15 12:15:47 CEST 2020


Here are the logs from the previous restart:

node1 (corosync restarted at 10:46:15)
-----
https://gist.github.com/aderumier/0992051d20f51270ceceb5b3431d18d7


node2
-----
https://gist.github.com/aderumier/eea0c50fefc1d8561868576f417191ba



node5
------
https://gist.github.com/aderumier/f2ce1bc5a93827045a5691583bbc7a37

----- Original Message -----
From: "Thomas Lamprecht" <t.lamprecht at proxmox.com>
To: "aderumier" <aderumier at odiso.com>, "Proxmox VE development discussion" <pve-devel at lists.proxmox.com>
Cc: "dietmar" <dietmar at proxmox.com>
Sent: Tuesday, September 15, 2020 11:46:51
Subject: Re: [pve-devel] corosync bug: cluster break after 1 node clean shutdown

On 9/15/20 11:35 AM, Alexandre DERUMIER wrote: 
> Hi, 
> 
> I have finally reproduced it! 
> 
> But this is with corosync being restarted from cron every minute on node1. 
> 
> Then the lrm was stuck for too long (around 60s), and softdog was triggered on multiple other nodes. 
> 
> Here are the logs with full corosync debug output at the time of the last corosync restart. 
> 
> node1 (where corosync is restarted each minute) 
> https://gist.github.com/aderumier/c4f192fbce8e96759f91a61906db514e 
> 
> node2 
> https://gist.github.com/aderumier/2d35ea05c1fbff163652e564fc430e67 
> 
> node5 
> https://gist.github.com/aderumier/df1d91cddbb6e15bb0d0193ed8df9273 
> 
> I'll prepare logs from the previous corosync restart, as the lrm already seems to be stuck before it. 

Yeah, that would be good, as the lrm does seem to get stuck at around 10:46:21 

> Sep 15 10:47:26 m6kvm2 pve-ha-lrm[3736]: loop take too long (65 seconds) 

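The reproduction described in the quoted message (corosync restarted from cron every minute on node1) could be approximated with a cron entry like the one below. This is only a sketch: the thread does not give the actual cron entry, so the file name and command are assumptions, and it presumes a Debian-based node running systemd.

```
# /etc/cron.d/corosync-restart-test  (hypothetical file name, not from the thread)
# Assumption: restart the corosync systemd unit on this node once per minute,
# as root, to try to reproduce the cluster break.
* * * * * root /bin/systemctl restart corosync.service
```

Remove the file again after testing, since repeated restarts are exactly what triggered the softdog on the other nodes.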