[PVE-User] PowerEdge R440 & watchdog timer

Eneko Lacunza elacunza at binovo.es
Tue Apr 19 16:54:38 CEST 2022


Hi,

On 15/4/22 at 18:04, Michael Rasmussen via pve-user wrote:
>> In the 10 years I have been using Proxmox I have not lost the
>> connection to a server for over 1 sec unless it was intentional,
>> but if your circumstances are a different use case I would go for
>> stackable switches. I have a port on each switch connected to my
>> servers, and UPS control for all my servers.
>>
>> Losing connection to a server for more than 1 sec can only mean
>> hardware failure or loss of power.
>>
> Forgot to mention that all my infrastructure and hardware is UPS
> controlled, so the only planned downtime has been when replacing the
> UPS/battery in a UPS (3 times), plus one time when there was a longer
> period without power from the power grid (1 time and not planned ;-).
>
Unfortunately, starting with PVE 7.x we're seeing cluster issues (nodes 
going out of quorum only to rejoin instantly) "too often".

This is why we create multiple links for corosync after upgrading 
clusters to v7, so that a momentary network issue on a single link 
doesn't end with a node being rebooted.
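
As a rough illustration of what "multiple links" means in practice: in 
corosync 3 each node entry in corosync.conf can carry several ringN_addr 
lines, one per link. The Python sketch below checks a config for nodes 
that still have only one link. The node names, addresses and the helper 
function names are made up for the example, and the parser is 
deliberately simplistic (it assumes node blocks without nested braces), 
so take it as a sketch rather than a tool.

```python
import re

# Hypothetical corosync.conf excerpt: node names and addresses are invented.
SAMPLE_CONF = """
nodelist {
  node {
    name: pve1
    nodeid: 1
    quorum_votes: 1
    ring0_addr: 10.10.10.1
    ring1_addr: 10.10.20.1
  }
  node {
    name: pve2
    nodeid: 2
    quorum_votes: 1
    ring0_addr: 10.10.10.2
  }
}
"""


def links_per_node(conf_text):
    """Map each node name to the sorted list of link numbers it defines."""
    result = {}
    # Assumes node blocks contain no nested braces, so a non-greedy match works.
    for block in re.findall(r"\bnode\s*\{(.*?)\}", conf_text, flags=re.S):
        name = re.search(r"^\s*name:\s*(\S+)", block, flags=re.M)
        links = sorted(
            int(n) for n in re.findall(r"^\s*ring(\d+)_addr:", block, flags=re.M)
        )
        if name:
            result[name.group(1)] = links
    return result


def nodes_with_single_link(conf_text):
    """Return the names of nodes that define fewer than two links."""
    return sorted(
        name for name, links in links_per_node(conf_text).items() if len(links) < 2
    )


if __name__ == "__main__":
    for node in nodes_with_single_link(SAMPLE_CONF):
        print(f"{node}: only one corosync link configured")
```

On a real cluster you would feed it the contents of the cluster's 
corosync.conf instead of the sample string.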

So far it has worked well. Unfortunately, we haven't been able to find a 
common pattern or cause across the clusters where we see the issue.

Cheers

Eneko Lacunza
Zuzendari teknikoa | Director técnico
Binovo IT Human Project

Tel. +34 943 569 206 |https://www.binovo.es
Astigarragako Bidea, 2 - 2º izda. Oficina 10-11, 20180 Oiartzun

https://www.youtube.com/user/CANALBINOVO
https://www.linkedin.com/company/37269706/


