[PVE-User] Multicast problems with Intel X540 - 10Gtek network card?
Stefan M. Radman
smr at kmi.com
Tue Dec 4 23:50:41 CET 2018
Don't put your corosync traffic on bridges.
Dedicate an untagged interface on each node for corosync.
All you need for your cluster network is this:
auto eth3
iface eth3 inet static
address 192.168.10.201
netmask 255.255.255.0
#corosync ring0
Put that interface into an isolated VLAN with IGMP snooping enabled.
Prune that VLAN from all trunks to limit its extent and your troubles.
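The matching ring definition in /etc/pve/corosync.conf would look roughly like this (a minimal sketch for the corosync 2.x that ships with PVE 5.x; node name, nodeid and addresses are examples taken from the stanza above, and remember to bump config_version whenever you edit that file):

totem {
    version: 2
    # cluster_name, config_version etc. stay as they are
    interface {
        ringnumber: 0
        bindnetaddr: 192.168.10.0
    }
}

nodelist {
    node {
        name: proxmox1
        nodeid: 1
        quorum_votes: 1
        ring0_addr: 192.168.10.201
    }
    # one node entry per cluster member
}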
Stefan
On Dec 4, 2018, at 8:03 PM, Ronny Aasen <ronny+pve-user at aasen.cx> wrote:
vmbr10 is a bridge (or a switch by another name).
If you want that bridge to work reliably with multicast, you probably need to enable the multicast querier:
echo 1 > /sys/devices/virtual/net/vmbr0/bridge/multicast_querier
Or you can disable snooping, so that the bridge treats multicast as broadcast:
echo 0 > /sys/devices/virtual/net/vmbr0/bridge/multicast_snooping
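Note that those echo commands do not survive a reboot; to make the setting persistent you could hang it off the bridge stanza in /etc/network/interfaces, e.g. a post-up line like this (just a sketch, adjust the bridge name to yours):

iface vmbr0 inet static
        # ...existing address/bridge_ports/etc. lines...
        post-up echo 1 > /sys/devices/virtual/net/vmbr0/bridge/multicast_querier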
This problem with multicast traffic may also lead to unreliable IPv6 ND and RA behaviour.
https://pve.proxmox.com/wiki/Multicast_notes has some more notes and examples around multicast_querier.
kind regards
Ronny Aasen
On 04.12.2018 17:54, Eneko Lacunza wrote:
Hi all,
Seems I found the solution.
eth3 on proxmox1 is a Broadcom 1 Gbit card connected to the HPE switch; it is VLAN 10 untagged on the switch end.
I changed the vmbr10 bridge to use eth4.10 on the X540 card, and after ifdown/ifup and a restart of corosync and pve-cluster everything seems good; the cluster is stable and omping is happy too after 10 minutes :)
It is strange, because multicast is on the VLAN 1 network...
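For reference, the vmbr10 stanza now looks roughly like this (only the bridge port changed from eth3 to eth4.10 compared to the config quoted below):

auto vmbr10
iface vmbr10 inet static
        address 192.168.10.201
        netmask 255.255.255.0
        bridge_ports eth4.10
        bridge_stp off
        bridge_fd 0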
Cheers and thanks a lot
Eneko
El 4/12/18 a las 16:18, Eneko Lacunza escribió:
hi Marcus,
El 4/12/18 a las 16:09, Marcus Haarmann escribió:
Hi,
you did not provide details about your configuration.
How is the network card set up? Bonding?
Send your /etc/network/interfaces details.
If bonding is active, check if the mode is correct in /proc/net/bonding.
We have encountered differences between the /etc/network/interfaces setup and the resulting mode.
Also, check your switch configuration, VLAN setup, MTU etc.
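For example, something like this shows the mode that is actually in effect (assuming a bond named bond0):

grep "Bonding Mode" /proc/net/bonding/bond0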
Yes, sorry about that. I have double-checked the switch and all 3 nodes' SFP+ ports have the same configuration.
/etc/network/interfaces in proxmox1 node:
auto lo
iface lo inet loopback
iface eth0 inet manual
iface eth1 inet manual
iface eth2 inet manual
iface eth3 inet manual
iface eth4 inet manual
iface eth5 inet manual
auto vmbr10
iface vmbr10 inet static
address 192.168.10.201
netmask 255.255.255.0
bridge_ports eth3
bridge_stp off
bridge_fd 0
auto vmbr0
iface vmbr0 inet static
address 192.168.0.201
netmask 255.255.255.0
gateway 192.168.0.100
bridge_ports eth4
bridge_stp off
bridge_fd 0
auto eth4.100
iface eth4.100 inet static
address 10.0.2.1
netmask 255.255.255.0
up ip addr add 10.0.3.1/24 dev eth4.100
The cluster is running on the vmbr0 network (192.168.0.0/24).
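You can also double-check which address corosync is actually bound to on each node, e.g.:

corosync-cfgtool -s

Ring 0 should report the expected address and "no faults".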
Cheers
Marcus Haarmann
Von: "Eneko Lacunza" <elacunza at binovo.es<mailto:elacunza at binovo.es>>
An: "pve-user" <pve-user at pve.proxmox.com<mailto:pve-user at pve.proxmox.com>>
Gesendet: Dienstag, 4. Dezember 2018 15:57:10
Betreff: [PVE-User] Multicast problems with Intel X540 - 10Gtek network card?
Hi all,
We have just updated a 3-node Proxmox cluster from 3.4 to 5.2, Ceph
Hammer to Luminous and the network from 1 Gbit to 10 Gbit... one of the
three Proxmox nodes is new too :)
Generally all was good and VMs are working well. :-)
BUT, we have some problems with the cluster; the proxmox1 node joins and
then drops from the cluster after about 4 minutes.
All multicast tests
https://pve.proxmox.com/wiki/Multicast_notes#Using_omping_to_test_multicast
run fine except the last one:
*** proxmox1:
root at proxmox1:~# omping -c 600 -i 1 -F -q proxmox1 proxmox3 proxmox4
proxmox3 : waiting for response msg
proxmox4 : waiting for response msg
proxmox3 : joined (S,G) = (*, 232.43.211.234), pinging
proxmox4 : joined (S,G) = (*, 232.43.211.234), pinging
proxmox3 : given amount of query messages was sent
proxmox4 : given amount of query messages was sent
proxmox3 : unicast, xmt/rcv/%loss = 600/600/0%, min/avg/max/std-dev = 0.073/0.184/0.390/0.061
proxmox3 : multicast, xmt/rcv/%loss = 600/262/56%, min/avg/max/std-dev = 0.092/0.207/0.421/0.068
proxmox4 : unicast, xmt/rcv/%loss = 600/600/0%, min/avg/max/std-dev = 0.049/0.167/0.369/0.059
proxmox4 : multicast, xmt/rcv/%loss = 600/262/56%, min/avg/max/std-dev = 0.063/0.185/0.386/0.064
*** proxmox3:
root at proxmox3:/etc# omping -c 600 -i 1 -F -q proxmox1 proxmox3 proxmox4
proxmox1 : waiting for response msg
proxmox4 : waiting for response msg
proxmox4 : joined (S,G) = (*, 232.43.211.234), pinging
proxmox1 : waiting for response msg
proxmox1 : joined (S,G) = (*, 232.43.211.234), pinging
proxmox4 : given amount of query messages was sent
proxmox1 : given amount of query messages was sent
proxmox1 : unicast, xmt/rcv/%loss = 600/600/0%, min/avg/max/std-dev = 0.083/0.193/1.030/0.055
proxmox1 : multicast, xmt/rcv/%loss = 600/600/0%, min/avg/max/std-dev = 0.102/0.209/1.050/0.054
proxmox4 : unicast, xmt/rcv/%loss = 600/600/0%, min/avg/max/std-dev = 0.041/0.108/0.172/0.026
proxmox4 : multicast, xmt/rcv/%loss = 600/600/0%, min/avg/max/std-dev = 0.048/0.123/0.190/0.030
*** proxmox4:
root at proxmox4:~# omping -c 600 -i 1 -F -q proxmox1 proxmox3 proxmox4
proxmox1 : waiting for response msg
proxmox3 : waiting for response msg
proxmox1 : waiting for response msg
proxmox3 : waiting for response msg
proxmox3 : joined (S,G) = (*, 232.43.211.234), pinging
proxmox1 : joined (S,G) = (*, 232.43.211.234), pinging
proxmox1 : given amount of query messages was sent
proxmox3 : given amount of query messages was sent
proxmox1 : unicast, xmt/rcv/%loss = 600/600/0%, min/avg/max/std-dev = 0.085/0.188/0.356/0.040
proxmox1 : multicast, xmt/rcv/%loss = 600/600/0%, min/avg/max/std-dev = 0.114/0.208/0.377/0.041
proxmox3 : unicast, xmt/rcv/%loss = 600/600/0%, min/avg/max/std-dev = 0.048/0.117/0.289/0.023
proxmox3 : multicast, xmt/rcv/%loss = 600/600/0%, min/avg/max/std-dev = 0.064/0.134/0.290/0.026
OK, so it seems we have a network problem on the proxmox1 node. The network
cards are as follows:
- proxmox1: Intel X540 (10Gtek)
- proxmox3: Intel X710 (Intel)
- proxmox4: Intel X710 (Intel)
Switch is Dell N1224T-ON.
Does anyone have experience with Intel X540-based network cards, the Linux
ixgbe driver, or the 10Gtek brand?
If we move corosync communication to the 1 Gbit network cards (Broadcom)
connected to an old HPE 1800-24G switch, the cluster is stable...
We also have a running cluster with a Dell N1224T-ON switch and X710
network cards, without issues.
Thanks a lot
Eneko
_______________________________________________
pve-user mailing list
pve-user at pve.proxmox.com
https://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user