[PVE-User] I lost the cluster communication in a 10 nodes cluster

Alwin Antreich aa at ipnerd.net
Fri Oct 19 09:36:11 CEST 2018


Hi,

On Thu, Oct 18, 2018, 17:24 Denis Morejon <denis.morejon at etecsa.cu> wrote:

> I lost the cluster communication again.
>
> I have been using Proxmox since version 1, and this is the first time it
> has bothered me so much!
>
> - All 10 nodes run the same version
>
> (pve-manager/5.2-9/4b30e8f9 (running kernel: 4.13.13-2-pve))
>
Is there a reason why you use an old kernel? 4.15.x is now the main kernel.


> - They all have the same date / time (a time mismatch is one of the things
> that could cause a loss of communication)
>
> - The environment is unchanged (no new switch, no new server)
>
>
> And why did all these nodes lose communication at the same time? With
> 10 nodes, at least 5 would have to have problems for the cluster to lose
> quorum and then the connection. Is that true?
>
Actually, it is (10/2)-1 = 4 nodes that can have trouble without losing quorum;
one partition needs to be bigger, i.e. hold more than half of the votes.
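
To make the arithmetic concrete, here is a minimal sketch of the majority rule.
It assumes every node carries exactly one vote and that no extra quorum device
is configured; the function names are made up for illustration and are not part
of corosync or Proxmox.

```python
def quorum_threshold(total_votes: int) -> int:
    """Votes a partition needs for a strict majority (assumes one vote per node)."""
    return total_votes // 2 + 1


def max_failed_nodes(total_votes: int) -> int:
    """How many nodes can drop out while the remaining ones still hold quorum."""
    return total_votes - quorum_threshold(total_votes)


def has_quorum(total_votes: int, reachable_votes: int) -> bool:
    """True if a partition that sees `reachable_votes` nodes is quorate."""
    return reachable_votes >= quorum_threshold(total_votes)


if __name__ == "__main__":
    nodes = 10
    print("votes needed:", quorum_threshold(nodes))
    print("nodes that may fail:", max_failed_nodes(nodes))
    # An even 5/5 split leaves neither half with a majority.
    print("5 of 10 quorate:", has_quorum(nodes, 5))
```

With 10 nodes this gives a threshold of 6 votes, so 4 nodes may fail, and an
even 5/5 split leaves both halves without quorum.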


> I think it is something related to this proxmox version.
>
> What to do ?
>
As Thomas stated, check your multicast traffic. Corosync uses multicast for
its cluster communication, and the cluster filesystem sits on top of
corosync. So if corosync is not working, neither is pmxcfs.
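
One way to check multicast is the omping test described in the Proxmox
documentation, started on all nodes at roughly the same time. The sketch below
only assembles that command line for a given node list; it does not run
anything. The helper name and the addresses are placeholders I made up, and the
count/interval defaults are the values from the documented short test.

```python
def build_omping_command(nodes, count=10000, interval=0.001):
    """Build the argument list for an omping multicast test.

    `nodes` are the cluster node addresses (placeholders here); the same
    command is meant to be started on every node at about the same time.
    """
    if len(nodes) < 2:
        raise ValueError("need at least two node addresses to test multicast")
    return ["omping", "-c", str(count), "-i", str(interval), "-F", "-q", *nodes]


if __name__ == "__main__":
    # Hypothetical addresses from the documentation range, not real nodes.
    cmd = build_omping_command(["192.0.2.11", "192.0.2.12", "192.0.2.13"])
    print(" ".join(cmd))
```

Heavy loss in the multicast lines of the omping summary would point at the
network (for example IGMP snooping without a querier on the switch) rather than
at the Proxmox version.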

--
Cheers,
Alwin

