[PVE-User] PowerEdge R440 & watchdog timer
Eneko Lacunza
elacunza at binovo.es
Tue Apr 19 16:54:38 CEST 2022
Hi,
On 15/4/22 at 18:04, Michael Rasmussen via pve-user wrote:
>> In the 10 years I have been using Proxmox, I have not lost the
>> connection to a server for over 1 sec unless it was intentional.
>> If your circumstances are a different use case, I would go for stackable
>> switches. I have a port on each switch connected to my servers, and
>> UPS control for all my servers.
>>
>> Losing the connection to a server for more than 1 sec can only mean
>> hardware failure or loss of power.
>>
> Forgot to mention that all my infrastructure and hardware is UPS
> controlled, so the only planned downtime has been when replacing the
> UPS or its battery (3 times), plus one time when there was a longer
> period without power from the power grid (1 time and not planned ;-).
>
Unfortunately, starting with PVE 7.x we're seeing cluster issues (nodes
going out of quorum only to rejoin instantly) "too often".
This is why we create multiple links for corosync after upgrading
clusters to v7, so that one of these momentary network issues
doesn't reboot a node.
So far it has worked well. Unfortunately, we haven't been able to find a
common pattern/cause across the several clusters where we see the issue.
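
To illustrate the multiple-links idea, here is a minimal sketch (not our actual configuration) that scans a corosync.conf-style nodelist and reports nodes that define only one link address. The node names, addresses and the helper name nodes_with_single_link are made up for the example; it assumes the usual ring0_addr/ring1_addr keys inside each node block and does no real parsing beyond that.

```python
import re

# Hypothetical corosync.conf excerpt; node names and addresses are made up.
SAMPLE_CONF = """
nodelist {
  node {
    name: pve1
    nodeid: 1
    quorum_votes: 1
    ring0_addr: 10.0.0.1
    ring1_addr: 10.0.1.1
  }
  node {
    name: pve2
    nodeid: 2
    quorum_votes: 1
    ring0_addr: 10.0.0.2
  }
}
"""


def nodes_with_single_link(conf_text):
    """Return names of nodes that define fewer than two ringX_addr entries."""
    single = []
    # Node blocks contain no nested braces, so a non-greedy match is enough.
    for block in re.findall(r"node\s*\{(.*?)\}", conf_text, flags=re.S):
        name = re.search(r"^\s*name:\s*(\S+)", block, flags=re.M)
        links = re.findall(r"^\s*ring\d+_addr:\s*\S+", block, flags=re.M)
        if len(links) < 2:
            single.append(name.group(1) if name else "<unnamed>")
    return single


print(nodes_with_single_link(SAMPLE_CONF))
```

In this sample, pve2 is the node that would still depend on a single network for quorum. On a real cluster the text would come from the cluster's corosync.conf rather than an embedded string.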
Cheers
Eneko Lacunza
Technical Director
Binovo IT Human Project
Tel. +34 943 569 206 |https://www.binovo.es
Astigarragako Bidea, 2 - 2º izda. Oficina 10-11, 20180 Oiartzun
https://www.youtube.com/user/CANALBINOVO
https://www.linkedin.com/company/37269706/