[pve-devel] corosync bug: cluster break after 1 node clean shutdown
Alexandre DERUMIER
aderumier at odiso.com
Tue Sep 15 12:15:47 CEST 2020
Here are the logs from the previous restart.
node1 -> corosync restart at 10:46:15
-----
https://gist.github.com/aderumier/0992051d20f51270ceceb5b3431d18d7
node2
-----
https://gist.github.com/aderumier/eea0c50fefc1d8561868576f417191ba
node5
------
https://gist.github.com/aderumier/f2ce1bc5a93827045a5691583bbc7a37
----- Original Message -----
From: "Thomas Lamprecht" <t.lamprecht at proxmox.com>
To: "aderumier" <aderumier at odiso.com>, "Proxmox VE development discussion" <pve-devel at lists.proxmox.com>
Cc: "dietmar" <dietmar at proxmox.com>
Sent: Tuesday, 15 September 2020 11:46:51
Subject: Re: [pve-devel] corosync bug: cluster break after 1 node clean shutdown
On 9/15/20 11:35 AM, Alexandre DERUMIER wrote:
> Hi,
>
> I have finally reproduced it!
>
> But this is with a cron job restarting corosync every minute on node1.
>
> Then the lrm was stuck for around 60s, and softdog was triggered on multiple other nodes.
>
> Here are the logs, with full corosync debug, from the time of the last corosync restart.
>
> node1 (where corosync is restarted every minute)
> https://gist.github.com/aderumier/c4f192fbce8e96759f91a61906db514e
>
> node2
> https://gist.github.com/aderumier/2d35ea05c1fbff163652e564fc430e67
>
> node5
> https://gist.github.com/aderumier/df1d91cddbb6e15bb0d0193ed8df9273
>
> I'll prepare logs from the previous corosync restart, as the lrm seems to have been stuck already before that.
Yeah, that would be good, as the lrm indeed seems to get stuck at around 10:46:21:
> Sep 15 10:47:26 m6kvm2 pve-ha-lrm[3736]: loop take too long (65 seconds)
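For anyone checking their own nodes for the same symptom, here is a minimal sketch that picks out these warnings from syslog-style lines. It assumes Python 3 and the exact message format of the line quoted above; the function name find_slow_loops and the threshold parameter are hypothetical and not part of any Proxmox tool.

```python
import re

# Matches the pve-ha-lrm warning in the format of the log line quoted above.
# Assumption: classic syslog prefix "Mon DD HH:MM:SS host prog[pid]: msg".
LOOP_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d+\s[\d:]{8})\s(?P<host>\S+)\s"
    r"pve-ha-lrm\[\d+\]:\sloop take too long \((?P<secs>\d+) seconds\)"
)


def find_slow_loops(lines, threshold=0):
    """Return (timestamp, host, seconds) for each slow-loop warning.

    Only warnings with a duration >= threshold seconds are kept.
    """
    hits = []
    for line in lines:
        m = LOOP_RE.match(line.strip())
        if m and int(m.group("secs")) >= threshold:
            hits.append((m.group("ts"), m.group("host"), int(m.group("secs"))))
    return hits
```

Fed the line above, this yields one hit for m6kvm2 with a duration of 65 seconds; subtracting that from the 10:47:26 timestamp gives the 10:46:21 start of the stall.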