[PVE-User] Strange cluster/graphics problem in 3-node cluster

Eneko Lacunza elacunza at binovo.es
Thu May 16 17:40:09 CEST 2019


Hi all,

In a 3-node cluster, we're experiencing a strange clustering problem.

Sometimes the first node drops out of quorum, usually for several hours, 
and later returns to quorum.

Over the last 2 weeks, this has happened 7 times.

Additionally, on one occasion the second and third nodes dropped out of 
quorum, and soon afterwards the first and third nodes formed a quorum. 
The second node rejoined after a manual restart of pve-cluster.
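For reference, the quorum arithmetic behind these events is a simple majority. Assuming corosync votequorum's default behaviour (one vote per node, no qdevice, no two_node or wait_for_all options), a partition needs expected_votes // 2 + 1 votes. With 3 nodes that is 2, so an isolated node1 loses quorum, while any two nodes that can reach each other (node2+node3, or node1+node3) keep or regain it. A minimal sketch of that rule:

```python
def quorum_threshold(expected_votes: int) -> int:
    """Votes a partition needs for quorum under a simple-majority rule
    (assumed: one vote per node, no qdevice, default votequorum options)."""
    return expected_votes // 2 + 1


def is_quorate(partition_votes: int, expected_votes: int) -> bool:
    """True if a partition holding `partition_votes` votes is quorate."""
    return partition_votes >= quorum_threshold(expected_votes)


if __name__ == "__main__":
    expected = 3  # three nodes, one vote each
    print("threshold:", quorum_threshold(expected))  # threshold: 2
    print("node1 alone:", is_quorate(1, expected))    # node1 alone: False
    print("node1+node3:", is_quorate(2, expected))    # node1+node3: True
```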

The strange thing (at least to me) is that the 2nd and 3rd nodes are 
missing rrd data for the periods when the 1st node was out (no graphs in 
the GUI for those hours). The 1st node has all its rrd data, and its 
graphs are complete.
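To line up the missing graph periods with the times node1 was out of quorum, the gaps can be extracted from the RRD data itself instead of being read off the GUI. A minimal sketch, assuming the data has been dumped as text with `rrdtool fetch <file> AVERAGE` on each node and that unknown samples are printed as `nan` or `-nan`; the function names are hypothetical:

```python
import math


def parse_fetch(text):
    """Parse `rrdtool fetch` style text into (timestamp, [values]) rows.

    Assumes data lines look like "1558000000: 1.0e-01 -nan"; the header
    line with data-source names and blank lines are skipped.
    """
    rows = []
    for line in text.splitlines():
        stamp, sep, rest = line.partition(":")
        if not sep or not stamp.strip().isdigit():
            continue  # header or blank line
        # float() accepts "nan" and "-nan", so unknown samples become NaN
        values = [float(v) for v in rest.split()]
        rows.append((int(stamp), values))
    return rows


def find_gaps(rows):
    """Return (first, last) timestamp pairs of consecutive rows in which
    every value is NaN. Rows with only some values unknown count as data."""
    gaps, start, prev = [], None, None
    for stamp, values in rows:
        missing = all(math.isnan(v) for v in values)
        if missing and start is None:
            start = stamp  # a gap begins
        elif not missing and start is not None:
            gaps.append((start, prev))  # gap ended on the previous row
            start = None
        prev = stamp
    if start is not None:
        gaps.append((start, prev))  # gap runs to the end of the data
    return gaps


if __name__ == "__main__":
    sample = (
        "      loadavg  cpu\n\n"
        "1558000000: 1.0e-01 2.0e-02\n"
        "1558000060: -nan -nan\n"
        "1558000120: -nan -nan\n"
        "1558000180: 3.0e-01 1.0e-02\n"
    )
    print(find_gaps(parse_fetch(sample)))  # [(1558000060, 1558000120)]
```

Running it on the same RRD from each node makes it easy to see whether the gaps on node2 and node3 match node1's outages exactly.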

I understand that we could have a network problem (we're trying to catch 
the problem live again for additional tests...), but why is rrd data 
missing on the nodes that remained in the cluster? Any ideas?
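To catch the problem live, one option is to record each node's quorum state periodically and keep only the state changes, which gives exact drop and rejoin times to compare with the switch logs. A minimal sketch, assuming `corosync-quorumtool -s` is available on the node and prints a `Quorate: Yes|No` line (as corosync 2.x does, and as `pvecm status` also shows); the helper names are hypothetical:

```python
import subprocess
import time


def parse_quorate(text):
    """Return True/False from a "Quorate: Yes|No" line, or None if absent.

    Assumes the output layout of `corosync-quorumtool -s` (corosync 2.x).
    """
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "Quorate":
            return value.strip().lower() == "yes"
    return None


def poll_once():
    """Run corosync-quorumtool once and return (unix_time, quorate)."""
    result = subprocess.run(
        ["corosync-quorumtool", "-s"], capture_output=True, text=True
    )
    # The exit code is ignored on purpose; only the printed state is used.
    return int(time.time()), parse_quorate(result.stdout)


def transitions(samples):
    """Keep only the samples whose quorate state differs from the previous one."""
    changes, last = [], object()  # sentinel so the first sample is always kept
    for stamp, state in samples:
        if state != last:
            changes.append((stamp, state))
            last = state
    return changes
```

For example, calling `poll_once()` every minute on each node (from a loop or cron) and appending the results to a file, then running `transitions()` over the collected samples, leaves one entry per quorum change.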


Servers:
node1 - 1x E3-1240v6 (4c/8t) - 64GB RAM - 1x10G for VM+cluster, 2x1G for storage
node2 - 2x E5507 (4c) - 96GB RAM - 2x1G for VM+cluster, 2x1G for storage
node3 - 2x E5507 (4c) - 96GB RAM - 2x1G for VM+cluster, 2x1G for storage

VM storage is an EMC VNXe3200.
The switch is an HP 5406zl with 5 switch modules:
- node1 is connected to module E (8x10G),
- node2 and node3 are connected to module A (24x1G).
The 2 storage switches are Cisco Catalyst 2960G.

All nodes have plenty of free RAM (usage below 50%), network usage stays 
below 10-20% of capacity at peak, and mean CPU usage is below 20%.

(same output on all three nodes)
# pveversion -v
proxmox-ve: 5.3-1 (running kernel: 4.15.18-9-pve)
pve-manager: 5.3-5 (running version: 5.3-5/97ae681d)
pve-kernel-4.15: 5.2-12
pve-kernel-4.15.18-9-pve: 4.15.18-30
corosync: 2.4.4-pve1
criu: 2.11.1-1~bpo90
glusterfs-client: 3.8.8-1
ksm-control-daemon: 1.2-2
libjs-extjs: 6.0.1-2
libpve-access-control: 5.1-3
libpve-apiclient-perl: 2.0-5
libpve-common-perl: 5.0-43
libpve-guest-common-perl: 2.0-18
libpve-http-server-perl: 2.0-11
libpve-storage-perl: 5.0-33
libqb0: 1.0.3-1~bpo9
lvm2: 2.02.168-pve6
lxc-pve: 3.0.2+pve1-5
lxcfs: 3.0.2-2
novnc-pve: 1.0.0-2
proxmox-widget-toolkit: 1.0-22
pve-cluster: 5.0-31
pve-container: 2.0-31
pve-docs: 5.3-1
pve-edk2-firmware: 1.20181023-1
pve-firewall: 3.0-16
pve-firmware: 2.0-6
pve-ha-manager: 2.0-5
pve-i18n: 1.0-9
pve-libspice-server1: 0.14.1-1
pve-qemu-kvm: 2.12.1-1
pve-xtermjs: 1.0-5
qemu-server: 5.0-43
smartmontools: 6.5+svn4324-1
spiceterm: 3.0-5
vncterm: 1.5-3
zfsutils-linux: 0.7.12-pve1~bpo1
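Since the listing above is stated to be the same on all three nodes, that can be double-checked mechanically; a clean diff would rule out a version mismatch in packages such as corosync, libqb or pve-cluster. A minimal sketch, assuming each node's `pveversion -v` output has been saved as text; the helper names are hypothetical:

```python
def parse_versions(text):
    """Map package name -> version from `pveversion -v` style output.

    Assumes one "name: version" entry per line. Only the first ": " splits,
    so "5.3-5 (running version: 5.3-5/97ae681d)" is kept whole.
    """
    versions = {}
    for line in text.splitlines():
        name, sep, version = line.partition(": ")
        if sep:
            versions[name.strip()] = version.strip()
    return versions


def version_diff(outputs):
    """Return {package: {node: version}} for packages that differ between nodes.

    `outputs` maps node name -> raw `pveversion -v` text. A package missing
    on a node is reported as None for that node.
    """
    parsed = {node: parse_versions(text) for node, text in outputs.items()}
    packages = set().union(*(p.keys() for p in parsed.values()))
    diff = {}
    for pkg in sorted(packages):
        per_node = {node: p.get(pkg) for node, p in parsed.items()}
        if len(set(per_node.values())) > 1:
            diff[pkg] = per_node
    return diff
```

An empty result from `version_diff({"node1": out1, "node2": out2, "node3": out3})` means the three listings match, including the running kernel shown on the `proxmox-ve` line.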


Thanks a lot
Eneko

-- 
Technical Director
Binovo IT Human Project, S.L.
Telf. 943569206
Astigarraga bidea 2, 2º izq. oficina 11; 20180 Oiartzun (Gipuzkoa)
www.binovo.es


