[PVE-User] Cluster disaster

Fri Nov 11 15:56:32 CET 2016

I really hope to find an explanation for all this mess,
because I'm not very confident right now.

So far, if I understand all this correctly, I'm not very fond of how the watchdog behaves with crm/lrm.
To make a comparison with PVE 3 (RedHat cluster): there, fencing happened on the corosync/cluster communication stack, but not on the resource manager stack.

On PVE 3, several times I found rgmanager was stuck.
I just had to find the culprit process (usually pve status), kill it, et voilà.
But it never caused an outage.

> > 2 - There seems to be a bug in lrm.
> >
> > Tonight I saw timeouts for qmstart tasks in /var/log/pve/tasks/active.
> > Just after the timeouts, lrm was kind of stuck doing nothing.
> 
> If it's doing nothing it would be interesting to see in which state it is.
> Because if it's already online and active the watchdog must trigger if
> it is stuck for ~60 seconds or more.

I'll try to grab some info if it happens again.

> Hmm, this means the watchdog was already running out.

Do you have a hint as to why there are no messages in the logs when the watchdog actually seems to trigger fencing?
Because when a node suddenly reboots, I can't be sure if it's the watchdog, a hardware bug, a kernel bug or whatever.

> Yeah, I looked a bit through the logs of two of your nodes; it looks like
> the system hit quite a few bottlenecks.
> CRM/LRM often run into 'loop took too long' errors, and the filesystem is
> also sometimes not writable.
> Some logs also show huge retransmit lists from corosync.

Yes, there were many retransmits on "9 Nov 14:56".
This matches the time we tried to switch the network path, because at that point the nodes did not seem to talk to each other correctly (lrm waiting for quorum).
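
To line up the retransmit bursts with the time of the network path switch, something like the following could count them per minute. This is only a rough sketch: it assumes the classic syslog timestamp format ("Nov  9 14:56:01 ...") and that the corosync retransmit messages contain the text "Retransmit List". The function name and the log file argument are my own choices, not anything provided by PVE.

```python
import sys
from collections import Counter


def count_retransmits(lines):
    """Count log lines mentioning a corosync retransmit list, per minute.

    Assumption: each line starts with a classic syslog timestamp
    ("Mmm dd HH:MM:SS"), so the first 12 characters identify the minute.
    """
    per_minute = Counter()
    for line in lines:
        if "Retransmit List" in line:
            per_minute[line[:12]] += 1
    return per_minute


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: pass the path of a syslog file as first argument.
    with open(sys.argv[1], errors="replace") as fh:
        counts = count_retransmits(fh)
    # Timestamps within one log file sort chronologically as plain strings
    # only within the same month; good enough for a quick look.
    for minute, n in sorted(counts.items()):
        print(minute, n)
```

Run against the syslog of each node, this should show whether the retransmits are confined to the window around 14:56 or also show up at other times.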

Anyway, I need to triple check (again) IGMP snooping on all network switches.
I also need to check the HP blades' Virtual Connect and firmware.

> Where does your cluster communication happen, not on the storage
> network?

Storage is on fibre channel.
Cluster communication happens on a dedicated network VLAN (shared with VMware).
I also use another vlan for live migrations.