[PVE-User] Strange cluster/graphics problem in 3-node cluster

Gianni Milo gianni.milo22 at gmail.com
Thu May 16 19:38:39 CEST 2019


Are you using LACP or Linux bonding on node2 and node3 for the VM + cluster
traffic?

Are you using VLANs to separate VM and cluster traffic?

Have you checked the multicast notes in the PVE wiki? Have you tried UDPU
instead of multicast as a last resort?
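
In case it helps, here is a minimal sketch of what the UDPU change looks
like in the totem section of /etc/pve/corosync.conf on corosync 2.x. The
cluster name and config_version values are placeholders, not taken from
your setup: keep your own cluster name and raise config_version above its
current value so the edit is picked up. Any other totem settings you
already have stay as they are, and the nodelist section should list a
ring0_addr for every node, since with unicast the members come from that
list.

```
totem {
  # placeholder: keep your existing cluster name
  cluster_name: mycluster
  # placeholder: must be higher than the current value
  config_version: 4
  # unicast UDP instead of the default multicast transport
  transport: udpu
  version: 2
}
```

As far as I know, corosync has to be restarted on all nodes before a
transport change takes effect, so plan for a short window without quorum.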

No idea about the missing RRD graphs...


On Thu, 16 May 2019 at 16:41, Eneko Lacunza <elacunza at binovo.es> wrote:

> Hi all,
>
> In a 3-node cluster, we're experiencing a strange clustering problem.
>
> Sometimes, the first node drops out of quorum, usually for some hours,
> only to return to quorum later.
>
> During the last 2 weeks, this has happened 7 times.
>
> Additionally, one time the second and third node dropped out of quorum,
> and soon after first and third node reached quorum. Second node rejoined
> after a manual restart of pve-cluster.
>
> The strange thing (at least for me) is that the 2nd and 3rd nodes have lost
> rrd data around the times the 1st node was out (no graphs in the GUI for
> those hours). The 1st node has all its rrd data; its graphs are complete.
>
> I understand that we could have a network problem (we're trying to catch
> the problem live again for additional tests...), but why is rrd data
> missing on cluster-joined nodes? Any idea?
>
>
> Servers:
> node1 - 1xE3-1240v6 4c8t - 64GB RAM - 1x10G for VM + cluster, 2x1G for storage
> node2 - 2xE5507 4c       - 96GB RAM - 2x1G for VM + cluster, 2x1G for storage
> node3 - 2xE5507 4c       - 96GB RAM - 2x1G for VM + cluster, 2x1G for storage
>
> VM storage is EMC VNXe3200
> Switch is HP 5406zl with 5 switch-modules.
> - Node1 is connected to module E (8x10G),
> - node2 and node3 are connected to module A (24x1G).
> Storage switches (2) are Cisco Catalyst 2960G.
>
> Nodes have plenty of free RAM (usage below 50%), network use is at most
> 10-20%, and mean CPU use is below 20%.
>
> (for all three nodes)
> # pveversion -v
> proxmox-ve: 5.3-1 (running kernel: 4.15.18-9-pve)
> pve-manager: 5.3-5 (running version: 5.3-5/97ae681d)
> pve-kernel-4.15: 5.2-12
> pve-kernel-4.15.18-9-pve: 4.15.18-30
> corosync: 2.4.4-pve1
> criu: 2.11.1-1~bpo90
> glusterfs-client: 3.8.8-1
> ksm-control-daemon: 1.2-2
> libjs-extjs: 6.0.1-2
> libpve-access-control: 5.1-3
> libpve-apiclient-perl: 2.0-5
> libpve-common-perl: 5.0-43
> libpve-guest-common-perl: 2.0-18
> libpve-http-server-perl: 2.0-11
> libpve-storage-perl: 5.0-33
> libqb0: 1.0.3-1~bpo9
> lvm2: 2.02.168-pve6
> lxc-pve: 3.0.2+pve1-5
> lxcfs: 3.0.2-2
> novnc-pve: 1.0.0-2
> proxmox-widget-toolkit: 1.0-22
> pve-cluster: 5.0-31
> pve-container: 2.0-31
> pve-docs: 5.3-1
> pve-edk2-firmware: 1.20181023-1
> pve-firewall: 3.0-16
> pve-firmware: 2.0-6
> pve-ha-manager: 2.0-5
> pve-i18n: 1.0-9
> pve-libspice-server1: 0.14.1-1
> pve-qemu-kvm: 2.12.1-1
> pve-xtermjs: 1.0-5
> qemu-server: 5.0-43
> smartmontools: 6.5+svn4324-1
> spiceterm: 3.0-5
> vncterm: 1.5-3
> zfsutils-linux: 0.7.12-pve1~bpo1
>
>
> Thanks a lot
> Eneko
>
> --
> Zuzendari Teknikoa / Director Técnico
> Binovo IT Human Project, S.L.
> Telf. 943569206
> Astigarraga bidea 2, 2º izq. oficina 11; 20180 Oiartzun (Gipuzkoa)
> www.binovo.es
>
> _______________________________________________
> pve-user mailing list
> pve-user at pve.proxmox.com
> https://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user
>


