[PVE-User] PVE 6.2 Strange cluster node fence

Eneko Lacunza elacunza at binovo.es
Wed Apr 14 12:12:09 CEST 2021


Hi Michael,

On 14/4/21 at 11:21, Michael Rasmussen via pve-user wrote:
> On Wed, 14 Apr 2021 11:04:10 +0200
> Eneko Lacunza via pve-user<pve-user at lists.proxmox.com>  wrote:
>
>> Hi all,
>>
>> Yesterday we had a strange fence happen in a PVE 6.2 cluster.
>>
>> Cluster has 3 nodes (proxmox1, proxmox2, proxmox3) and has been
>> operating normally for a year. Last update was on January 21st 2021.
>> Storage is Ceph and nodes are connected to the same network switch
>> with active-passive bonds.
>>
>> proxmox1 was fenced and automatically rebooted, then everything
>> recovered. HA restarted VMs in other nodes too.
>>
>> proxmox1 syslog: (no network link issues reported at device level)
> I have seen this occasionally, and every time the cause was high network
> load/network congestion, which caused a token timeout. The default token
> timeout in corosync is IMHO very optimistically configured to 1000 ms,
> so I have changed this setting to 5000 ms. Since doing this I have
> never again seen fencing caused by network load/network congestion.
> You could try this and see if it helps you.
>
> PS. my cluster communication is on a dedicated gb bonded vlan.
Thanks for the info. In this case the network is 10 Gbit (I see I didn't
include this info earlier), but only for the Proxmox nodes:

- We have 2 Dell N1124T 24x1Gbit 4xSFP+ switches
- Both switches are interconnected with a SFP+ DAC
- Active-passive bonds in each Proxmox node go to one SFP+ interface on
each switch. Primary interfaces are configured to be on the same switch.
- Connectivity to the LAN is done with a 1 Gbit link
- The Proxmox 2x10G bond is used for VM networking and the Ceph
public/private networks.

I wouldn't expect high network load/congestion because it's on an
internal LAN, with 1 Gbit clients. No Ceph issues/backfilling were
occurring during the fence.

Network cards are Broadcom.
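
For reference, the token timeout change suggested above would look roughly
like the fragment below in the totem section of corosync.conf. This is only
a sketch: the 5000 ms value is the one Michael mentions, not something
tested on this cluster, and all other totem settings are assumed to stay as
they are. On PVE the file would normally be edited as /etc/pve/corosync.conf,
with config_version incremented so the change propagates to all nodes.

```
totem {
  # existing settings (cluster_name, config_version, interface, ...)
  # are left unchanged and omitted here

  # token timeout in milliseconds (value suggested in this thread)
  token: 5000
}
```

Note that with a nodelist of 3 or more nodes, corosync computes the real
token timeout as token + (number_of_nodes - 2) * token_coefficient, so the
effective value on a 3-node cluster is somewhat higher than the configured
one.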

Thanks

Eneko Lacunza
Zuzendari teknikoa | Director técnico
Binovo IT Human Project

Tel. +34 943 569 206 | https://www.binovo.es
Astigarragako Bidea, 2 - 2º izda. Oficina 10-11, 20180 Oiartzun

https://www.youtube.com/user/CANALBINOVO
https://www.linkedin.com/company/37269706/


