[PVE-User] Multicast problems with Intel X540 - 10Gtek network card?
Eneko Lacunza
elacunza at binovo.es
Tue Dec 4 15:57:10 CET 2018
Hi all,
We have just updated a 3-node Proxmox cluster from 3.4 to 5.2, Ceph
from Hammer to Luminous, and the network from 1 Gbit to 10 Gbit... one
of the three Proxmox nodes is new, too :)
Generally everything went well and the VMs are running fine. :-)
BUT, we have a problem with the cluster: the proxmox1 node joins and
then drops out of the cluster after about 4 minutes.
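For reference, we watch the membership drop with the standard
Proxmox/corosync tools (commands only, output omitted; flags from memory):

pvecm status
corosync-quorumtool -s
journalctl -u corosync -u pve-cluster -f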
All the multicast tests from
https://pve.proxmox.com/wiki/Multicast_notes#Using_omping_to_test_multicast
run fine except the last one:
*** proxmox1:
root@proxmox1:~# omping -c 600 -i 1 -F -q proxmox1 proxmox3 proxmox4
proxmox3 : waiting for response msg
proxmox4 : waiting for response msg
proxmox3 : joined (S,G) = (*, 232.43.211.234), pinging
proxmox4 : joined (S,G) = (*, 232.43.211.234), pinging
proxmox3 : given amount of query messages was sent
proxmox4 : given amount of query messages was sent
proxmox3 : unicast, xmt/rcv/%loss = 600/600/0%, min/avg/max/std-dev = 0.073/0.184/0.390/0.061
proxmox3 : multicast, xmt/rcv/%loss = 600/262/56%, min/avg/max/std-dev = 0.092/0.207/0.421/0.068
proxmox4 : unicast, xmt/rcv/%loss = 600/600/0%, min/avg/max/std-dev = 0.049/0.167/0.369/0.059
proxmox4 : multicast, xmt/rcv/%loss = 600/262/56%, min/avg/max/std-dev = 0.063/0.185/0.386/0.064
*** proxmox3:
root@proxmox3:/etc# omping -c 600 -i 1 -F -q proxmox1 proxmox3 proxmox4
proxmox1 : waiting for response msg
proxmox4 : waiting for response msg
proxmox4 : joined (S,G) = (*, 232.43.211.234), pinging
proxmox1 : waiting for response msg
proxmox1 : joined (S,G) = (*, 232.43.211.234), pinging
proxmox4 : given amount of query messages was sent
proxmox1 : given amount of query messages was sent
proxmox1 : unicast, xmt/rcv/%loss = 600/600/0%, min/avg/max/std-dev = 0.083/0.193/1.030/0.055
proxmox1 : multicast, xmt/rcv/%loss = 600/600/0%, min/avg/max/std-dev = 0.102/0.209/1.050/0.054
proxmox4 : unicast, xmt/rcv/%loss = 600/600/0%, min/avg/max/std-dev = 0.041/0.108/0.172/0.026
proxmox4 : multicast, xmt/rcv/%loss = 600/600/0%, min/avg/max/std-dev = 0.048/0.123/0.190/0.030
*** proxmox4:
root@proxmox4:~# omping -c 600 -i 1 -F -q proxmox1 proxmox3 proxmox4
proxmox1 : waiting for response msg
proxmox3 : waiting for response msg
proxmox1 : waiting for response msg
proxmox3 : waiting for response msg
proxmox3 : joined (S,G) = (*, 232.43.211.234), pinging
proxmox1 : joined (S,G) = (*, 232.43.211.234), pinging
proxmox1 : given amount of query messages was sent
proxmox3 : given amount of query messages was sent
proxmox1 : unicast, xmt/rcv/%loss = 600/600/0%, min/avg/max/std-dev = 0.085/0.188/0.356/0.040
proxmox1 : multicast, xmt/rcv/%loss = 600/600/0%, min/avg/max/std-dev = 0.114/0.208/0.377/0.041
proxmox3 : unicast, xmt/rcv/%loss = 600/600/0%, min/avg/max/std-dev = 0.048/0.117/0.289/0.023
proxmox3 : multicast, xmt/rcv/%loss = 600/600/0%, min/avg/max/std-dev = 0.064/0.134/0.290/0.026
OK, so it seems we have a network problem on the proxmox1 node. The
network cards are as follows:
- proxmox1: Intel X540 (10Gtek)
- proxmox3: Intel X710 (Intel)
- proxmox4: Intel X710 (Intel)
Switch is Dell N1224T-ON.
Does anyone have experience with Intel X540-based network cards, the
Linux ixgbe driver, or the 10Gtek brand?
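For completeness, this is how the driver/firmware details can be pulled
from proxmox1 (the interface name eth2 is just a placeholder for the
real 10G port):

ethtool -i eth2               # driver, version and firmware-version of the ixgbe port
lspci -nn | grep -i ethernet  # PCI IDs, to compare the X540 card against the X710 ones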
If we move corosync communication to the 1 Gbit network cards (Broadcom)
connected to an old HPE 1800-24G switch, the cluster is stable...
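For that test we only repointed corosync at the 1 Gbit subnet; roughly,
the relevant totem section of /etc/pve/corosync.conf looks like the
sketch below (cluster name, config_version and subnet are placeholders,
and the ring0_addr entries in the nodelist must resolve to addresses on
that subnet as well):

totem {
  version: 2
  secauth: on
  cluster_name: ourcluster    # placeholder
  config_version: 8           # placeholder - must be increased on every edit
  ip_version: ipv4
  interface {
    ringnumber: 0
    bindnetaddr: 192.168.10.0 # placeholder 1 Gbit subnet
  }
}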
We also have another cluster running with a Dell N1224T-ON switch and
X710 network cards, without any issues.
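In case it is relevant, the host-side bridge multicast snooping/querier
state can be checked like this (vmbr0 is an assumption, and corosync may
not even go through a bridge on our 10G setup):

cat /sys/class/net/vmbr0/bridge/multicast_snooping      # 1 = snooping enabled
cat /sys/class/net/vmbr0/bridge/multicast_querier       # 0 = no querier on the bridge
echo 1 > /sys/class/net/vmbr0/bridge/multicast_querier  # enable a querier as a test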
Thanks a lot
Eneko
--
Technical Director
Binovo IT Human Project, S.L.
Telf. 943569206
Astigarraga bidea 2, 2º izq. oficina 11; 20180 Oiartzun (Gipuzkoa)
www.binovo.es