[pve-devel] corosync bug: cluster break after 1 node clean shutdown

Alexandre DERUMIER aderumier at odiso.com
Mon Sep 14 06:54:40 CEST 2020


I wonder if something like Pacemaker's sbd could be implemented in Proxmox as an extra layer of protection?

http://manpages.ubuntu.com/manpages/bionic/man8/sbd.8.html

(shared disk heartbeat).

Something like an independent daemon (not using corosync/pmxcfs/...), also connected to the watchdog muxer.

----- Original Message -----
From: "Thomas Lamprecht" <t.lamprecht at proxmox.com>
To: "Proxmox VE development discussion" <pve-devel at lists.proxmox.com>, "aderumier" <aderumier at odiso.com>
Sent: Thursday, 10 September 2020 20:21:14
Subject: Re: [pve-devel] corosync bug: cluster break after 1 node clean shutdown

On 10.09.20 13:34, Alexandre DERUMIER wrote: 
>>> as said, if the other nodes were not using HA, the watchdog-mux had no
>>> client which could expire.
> 
> Sorry, maybe I explained it badly,
> but all my nodes had HA enabled.
>
> I have double-checked the lrm_status JSON files from my morning backup, taken 2h before the problem;
> they were all in "active" state. ("state":"active","mode":"active")
> 

OK, so all had a connection to the watchdog-mux open. This shifts the suspicion 
again over to pmxcfs and/or corosync. 

> I don't know why node7 did not reboot; the only difference is that it was the CRM master.
> (I think the CRM also resets the watchdog counter? Maybe its behaviour is different from the LRM's?)

The watchdog-mux stops updating the real watchdog as soon as any client disconnects or times
out. It does not know which client (daemon) that was.
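
To make that behaviour concrete, here is a minimal toy model in Python. It is only a sketch of the statement above, not the real watchdog-mux code: the class name, method names and the 10 second timeout are hypothetical, and every other detail of the real daemon is left out.

```python
class WatchdogMuxModel:
    """Toy model of the behaviour described above (not the real watchdog-mux)."""

    def __init__(self, client_timeout=10.0):
        # Hypothetical timeout in seconds, chosen only for illustration.
        self.client_timeout = client_timeout
        self.last_seen = {}    # client name -> time of its last keep-alive
        self.tripped = False   # single flag: no record of which client caused it

    def connect(self, client, now):
        self.last_seen[client] = now

    def keepalive(self, client, now):
        if client in self.last_seen:
            self.last_seen[client] = now

    def disconnect(self, client):
        # Modelled as stated in the mail: any client disconnect stops the updates.
        if client in self.last_seen:
            del self.last_seen[client]
            self.tripped = True

    def tick(self, now):
        """Return True if the real watchdog would be updated at time `now`."""
        if any(now - seen > self.client_timeout for seen in self.last_seen.values()):
            self.tripped = True
        # Once tripped, the real watchdog is never updated again, so it will
        # eventually expire and reset the node.
        return not self.tripped
```

In this model it makes no difference whether the client that went silent is the LRM or the CRM: a single missed deadline from either one is enough to stop the updates, and a later keep-alive does not undo it.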

>>> above lines also indicate very high load. 
>>> Do you have some monitoring which shows the CPU/IO load before/during this event? 
> 
> load (1, 5, 15 min) was: 6 (for 48 cores), CPU usage: 23%
> no iowait on disk (VMs are on a remote Ceph; only Proxmox services are running on the local SSD)
> 
> so nothing strange here :/ 

Hmm, the long loop times could then be the effect of a pmxcfs read or write 
operation being (temporarily) stuck. 




