[PVE-User] Strange cluster/graphics problem in 3-node cluster
Eneko Lacunza
elacunza at binovo.es
Thu May 16 17:40:09 CEST 2019
Hi all,
In a 3-node cluster, we're experiencing a strange clustering problem.
Sometimes the first node drops out of quorum, usually for some hours,
and then rejoins on its own.
During the last 2 weeks, this has happened 7 times.
Additionally, on one occasion the second and third nodes dropped out of
quorum, and soon after the first and third nodes reached quorum. The
second node rejoined only after a manual restart of pve-cluster.
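For reference, the quorum arithmetic behind these observations can be sketched as follows. This is only an illustration, assuming the default of one vote per node and a simple majority rule; the function names are hypothetical and not part of any Proxmox or corosync tool.

```python
def quorum_threshold(total_votes: int) -> int:
    """Smallest number of votes that forms a strict majority."""
    return total_votes // 2 + 1


def is_quorate(partition_votes: int, total_votes: int) -> bool:
    """True if a partition holding partition_votes has a majority."""
    return partition_votes >= quorum_threshold(total_votes)


# Assumption: 3 nodes, one vote each (not confirmed for this cluster).
TOTAL_VOTES = 3

# Partitions seen in the incidents described above.
for members in (["node1"], ["node2", "node3"], ["node1", "node3"], ["node2"]):
    state = "quorate" if is_quorate(len(members), TOTAL_VOTES) else "not quorate"
    print(", ".join(members), "->", state)
```

Under that assumption, any two nodes that can see each other keep quorum, and a single isolated node never does.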
The strange thing (at least to me) is that the 2nd and 3rd nodes have
lost rrd data around the times the 1st node was out (no graphs in the GUI
for those hours). The 1st node has all its rrd data; its graphs are complete.
I understand that we could have a network problem (we're trying to catch
the problem live again for additional tests...), but why is rrd data
missing on cluster-joined nodes? Any idea?
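To line up the missing rrd data with the quorum-loss windows, something like the following could be used. It is a rough sketch, assuming the samples have already been exported as (timestamp, value) pairs with missing values as None or NaN; find_gaps is a hypothetical helper and the sample data is made up for illustration.

```python
import math


def find_gaps(samples):
    """Return (first_ts, last_ts) for each run of consecutive missing samples.

    samples: iterable of (timestamp, value) pairs in time order, where a
    missing value is None or NaN.
    """
    gaps = []
    run_start = None
    prev_ts = None
    for ts, value in samples:
        missing = value is None or (isinstance(value, float) and math.isnan(value))
        if missing and run_start is None:
            run_start = ts                      # a new gap begins here
        elif not missing and run_start is not None:
            gaps.append((run_start, prev_ts))   # gap ended at previous sample
            run_start = None
        prev_ts = ts
    if run_start is not None:                   # gap runs to the end of the data
        gaps.append((run_start, prev_ts))
    return gaps


# Made-up example data: one sample per 60 seconds.
example = [(0, 1.0), (60, float("nan")), (120, float("nan")), (180, 2.0), (240, None)]
print(find_gaps(example))
```

The resulting intervals can then be compared against the times the first node was out of quorum.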
Servers:
node1 - 1xE3-1240v6 4c8t - 64GB RAM - 1x10G for VM + cluster, 2x1G for storage
node2 - 2xE5507 4c - 96GB RAM - 2x1G for VM + cluster, 2x1G for storage
node3 - 2xE5507 4c - 96GB RAM - 2x1G for VM + cluster, 2x1G for storage
VM storage is EMC VNXe3200
Switch is HP 5406zl with 5 switch-modules.
- Node1 is connected to module E (8x10G),
- node2 and node3 are connected to module A (24x1G).
The two storage switches are Cisco Catalyst 2960G.
Nodes have plenty of free RAM (usage below 50%), network usage peaks at
less than 10-20%, and mean CPU usage is below 20%.
(for all three nodes)
# pveversion -v
proxmox-ve: 5.3-1 (running kernel: 4.15.18-9-pve)
pve-manager: 5.3-5 (running version: 5.3-5/97ae681d)
pve-kernel-4.15: 5.2-12
pve-kernel-4.15.18-9-pve: 4.15.18-30
corosync: 2.4.4-pve1
criu: 2.11.1-1~bpo90
glusterfs-client: 3.8.8-1
ksm-control-daemon: 1.2-2
libjs-extjs: 6.0.1-2
libpve-access-control: 5.1-3
libpve-apiclient-perl: 2.0-5
libpve-common-perl: 5.0-43
libpve-guest-common-perl: 2.0-18
libpve-http-server-perl: 2.0-11
libpve-storage-perl: 5.0-33
libqb0: 1.0.3-1~bpo9
lvm2: 2.02.168-pve6
lxc-pve: 3.0.2+pve1-5
lxcfs: 3.0.2-2
novnc-pve: 1.0.0-2
proxmox-widget-toolkit: 1.0-22
pve-cluster: 5.0-31
pve-container: 2.0-31
pve-docs: 5.3-1
pve-edk2-firmware: 1.20181023-1
pve-firewall: 3.0-16
pve-firmware: 2.0-6
pve-ha-manager: 2.0-5
pve-i18n: 1.0-9
pve-libspice-server1: 0.14.1-1
pve-qemu-kvm: 2.12.1-1
pve-xtermjs: 1.0-5
qemu-server: 5.0-43
smartmontools: 6.5+svn4324-1
spiceterm: 3.0-5
vncterm: 1.5-3
zfsutils-linux: 0.7.12-pve1~bpo1
Thanks a lot
Eneko
--
Zuzendari Teknikoa / Director Técnico
Binovo IT Human Project, S.L.
Telf. 943569206
Astigarraga bidea 2, 2º izq. oficina 11; 20180 Oiartzun (Gipuzkoa)
www.binovo.es