[pve-devel] corosync bug: cluster break after 1 node clean shutdown
Alexandre DERUMIER
aderumier at odiso.com
Mon Sep 14 06:54:40 CEST 2020
I wonder whether something like Pacemaker's sbd could be implemented in Proxmox as an extra layer of protection?
http://manpages.ubuntu.com/manpages/bionic/man8/sbd.8.html
(shared disk heartbeat).
Something like an independent daemon (not using corosync/pmxcfs/...), also connected to the watchdog muxer.
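To make the idea a bit more concrete, here is a rough sketch of the decision such a daemon would make. Everything in it is an assumption for illustration: the function names, the slot-file layout and the timeout are hypothetical, and the real sbd works on a raw shared block device rather than on files in a directory. The point is only that the node keeps feeding the watchdog while its own heartbeat writes to the shared storage succeed, independently of corosync/pmxcfs.

```python
import os


def write_heartbeat(directory, node, now):
    """Write `now` into this node's slot file; return True on success.

    Hypothetical layout: one small file per node in a shared directory.
    """
    try:
        tmp_path = os.path.join(directory, "." + node + ".tmp")
        with open(tmp_path, "w") as slot:
            slot.write(str(now))
            slot.flush()
            os.fsync(slot.fileno())  # make sure the write really reached storage
        os.replace(tmp_path, os.path.join(directory, node))
        return True
    except OSError:
        # Shared storage unreachable or failing: report it, do not raise.
        return False


def may_update_watchdog(last_ok_write, now, timeout):
    """Keep feeding the watchdog only while heartbeats are recent enough."""
    if last_ok_write is None:
        return False
    return now - last_ok_write <= timeout
```

A daemon loop would call write_heartbeat() periodically, remember the time of the last successful write, and stop sending keepalives to the watchdog muxer once may_update_watchdog() returns False.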
----- Original message -----
From: "Thomas Lamprecht" <t.lamprecht at proxmox.com>
To: "Proxmox VE development discussion" <pve-devel at lists.proxmox.com>, "aderumier" <aderumier at odiso.com>
Sent: Thursday, 10 September 2020 20:21:14
Subject: Re: [pve-devel] corosync bug: cluster break after 1 node clean shutdown
On 10.09.20 13:34, Alexandre DERUMIER wrote:
>>> as said, if the other nodes were not using HA, the watchdog-mux had no
>>> client which could expire.
>
> Sorry, maybe I explained it badly,
> but all my nodes had HA enabled.
>
> I double-checked the lrm_status JSON files from my morning backup, taken 2h before the problem;
> they were all in the "active" state ("state":"active","mode":"active").
>
OK, so all had a connection to the watchdog-mux open. This shifts the suspicion
again over to pmxcfs and/or corosync.
> I don't know why node7 didn't reboot; the only difference is that it was the CRM master.
> (I think the CRM also resets the watchdog counter? Maybe its behaviour is different from the LRM's?)
The watchdog-mux stops updating the real watchdog as soon as any client disconnects or times
out. It does not know which client (daemon) that was.
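As a rough illustration of that behaviour, here is a toy model in Python. It is not the actual watchdog-mux code: the class, its method names and the timeout value are made up for the example, and whether an orderly close is treated differently is not covered here. It only models the two statements above: one failing client is enough to stop the updates, and the mux does not care which client it was.

```python
class WatchdogMux:
    """Toy model of a watchdog multiplexer (hypothetical, for illustration)."""

    def __init__(self, client_timeout):
        self.client_timeout = client_timeout  # seconds a client may stay silent
        self.last_seen = {}                   # client id -> time of last keepalive
        self.tripped = False                  # once set, updates never resume

    def connect(self, client, now):
        self.last_seen[client] = now

    def keepalive(self, client, now):
        if client in self.last_seen:
            self.last_seen[client] = now

    def disconnect(self, client):
        # Assumption taken from the mail: any disconnect counts as a failure.
        if client in self.last_seen:
            del self.last_seen[client]
            self.tripped = True

    def tick(self, now):
        """Return True if the real watchdog should be updated at time `now`."""
        for last in self.last_seen.values():
            if now - last > self.client_timeout:
                self.tripped = True
        # With no clients at all, nothing can expire, so updates continue.
        return not self.tripped


mux = WatchdogMux(client_timeout=60)
mux.connect("lrm", now=0)
mux.connect("crm", now=0)
mux.keepalive("crm", now=100)   # the crm keeps updating, the lrm stays silent
still_updating = mux.tick(now=120)
```

In this model still_updating ends up False even though the "crm" client is healthy, because the silent "lrm" client has exceeded the timeout.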
>>> above lines also indicate very high load.
>>> Do you have some monitoring which shows the CPU/IO load before/during this event?
>
> load (1, 5, 15) was: 6 (for 48 cores), CPU usage: 23%
> no iowait on disk (VMs are on a remote Ceph, only Proxmox services are running on the local SSD)
>
> so nothing strange here :/
Hmm, the long loop times could then be the effect of a pmxcfs read or write
operation being (temporarily) stuck.