[pve-devel] corosync, multicast problem because of vmbr multicast_snooping enabled
Alexandre DERUMIER
aderumier at odiso.com
Sun Mar 10 10:41:53 CET 2013
Hi again (damn, I shouldn't work on Sunday ;)
I did some tcpdump captures on this setup:
   ---sw-core----
   |            |
switch1      switch2
   |            |
   |            |
     linux host
sw-core: 10.0.0.1
switch1: 10.0.0.2
switch2: 10.0.0.3
Each switch has the IGMP querier enabled.
The Linux host uses bonding (active-backup), and the querier and multicast_snooping are enabled on its bridge.
Now, running tcpdump on vmbr0 of the Linux host, I see only queries coming from sw-core, which has the lowest IP:
09:44:38.692597 IP sw-core.odiso.net > all-systems.mcast.net: igmp query v2
09:44:38.692597 IP sw-core.odiso.net > all-systems.mcast.net: igmp query v2
...
Now I disable the IGMP querier on sw-core; after around 1 min, I see IGMP queries coming from switch1:
...
09:45:38.698272 IP switch1.odiso.net > all-systems.mcast.net: igmp query v2
09:46:38.703388 IP switch1.odiso.net > all-systems.mcast.net: igmp query v2
If I also disable the querier on switch1, switch2 becomes the querier.
So the election works fine (the IGMP querier with the lowest IP address is the master).
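The election behaviour observed above can be modelled in a few lines. This is a simplified sketch assuming plain IGMPv2 rules (RFC 2236: a querier that hears a query from a lower IP address stops querying, so the lowest address wins); `elect_querier` and the `linux-bridge` candidate name are hypothetical, and real switches add timers and vendor-specific handling on top of this.

```python
import ipaddress
from typing import Dict


def elect_querier(queriers: Dict[str, str]) -> str:
    """Return the name of the querier a simplified IGMPv2 election picks.

    Model (RFC 2236): a querier that hears a query from a lower source IP
    stops querying, so the candidate with the numerically lowest IPv4
    source address ends up as the only querier.
    """
    if not queriers:
        raise ValueError("no querier candidates")
    return min(queriers, key=lambda name: ipaddress.IPv4Address(queriers[name]))


topology = {"sw-core": "10.0.0.1", "switch1": "10.0.0.2", "switch2": "10.0.0.3"}
print(elect_querier(topology))  # sw-core

# Querier disabled on sw-core, then on switch1:
print(elect_querier({"switch1": "10.0.0.2", "switch2": "10.0.0.3"}))  # switch1
print(elect_querier({"switch2": "10.0.0.3"}))  # switch2

# The Linux bridge sends its queries from 0.0.0.0 (hypothetical candidate
# name); under this plain numeric comparison that source sorts lowest.
print(elect_querier({**topology, "linux-bridge": "0.0.0.0"}))  # linux-bridge
```

Whether a given switch actually applies this comparison to a query sourced from 0.0.0.0 is implementation-specific; the last line only shows why such a source is awkward for the election.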
Then I restore the initial setup with 3 queriers, and sw-core becomes the master again.
I launch a tcpdump, and after some time (30 min to 1 h) I see IGMP queries coming from the Linux bridge and the Cisco switch at the same time!
10:43:26.703388 IP 0.0.0.0 > all-systems.mcast.net: igmp query v2
10:43:58.69233E IP sw-core.odiso.net > all-systems.mcast.net: igmp query v2
Note that the Linux bridge's IGMP queries use 0.0.0.0 as the source address.
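To spot this situation in a longer capture, a small sketch can count IGMP queries per source in tcpdump's text output. It assumes the one-line format shown in these captures; `query_sources` and `QUERY_RE` are hypothetical names, and the regex only recognises lines of the form `<time> IP <src> > <dst>: igmp query ...`.

```python
import re
from collections import Counter
from typing import Iterable

# Matches tcpdump lines of the form
#   "<time> IP <source> > <destination>: igmp query ..."
QUERY_RE = re.compile(r"^\S+ IP (?P<src>\S+) > \S+: igmp query\b")


def query_sources(lines: Iterable[str]) -> Counter:
    """Count IGMP queries per source address or hostname."""
    sources: Counter = Counter()
    for line in lines:
        match = QUERY_RE.match(line.strip())
        if match:
            sources[match.group("src")] += 1
    return sources


capture = [
    "10:43:26.703388 IP 0.0.0.0 > all-systems.mcast.net: igmp query v2",
    "10:43:58.69233E IP sw-core.odiso.net > all-systems.mcast.net: igmp query v2",
]
sources = query_sources(capture)
if len(sources) > 1:
    print("several queriers active:", sorted(sources))
```

Two sources also show up briefly during a legitimate handover (as when switch1 took over from sw-core), so the telling sign is both sources continuing to send queries side by side.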
I found the original cover letter of the patch series that disables the IGMP querier on the Linux bridge by default:
http://en.usenet.digipedia.org/thread/18960/28749/
"
[0/3] bridge: Do not send multicast queries by default 2012-04-13 14:36
This series of patches is aimed to change the default multicast snooping behaviour to one that is safer to deploy in the wild.
There have been numerous reports of switches misbehaving with our current behaviour of sending general queries,
presumably because we're using a zero source IP address which is unavoidable as using anything else would interfere with multicast querier elections
"
So I don't know about HP switches, but on Cisco switches it seems to break the IGMP querier election.
Maybe my problem was that my Proxmox host was the IGMP querier, and when I shut it down, no other IGMP querier took over, so snooping blocked all multicast addresses.
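Why losing the only querier can silence multicast: hosts mostly send membership reports in answer to queries, so once nobody queries, snooping entries stop being refreshed and age out. The toy model below assumes the default IGMPv2 timers from RFC 2236 (robustness 2, query interval 125 s, query response interval 10 s); `SnoopingTable`, the `bond0` port and the `239.192.1.1` group are hypothetical. Real switches and the Linux bridge may use different values (the roughly 1 min handover seen above is already shorter than the default other-querier-present interval) and may handle unregistered groups differently.

```python
from dataclasses import dataclass, field
from typing import Dict, Set

# Default IGMPv2 timer values from RFC 2236, section 8 (seconds).
ROBUSTNESS = 2
QUERY_INTERVAL = 125.0
QUERY_RESPONSE_INTERVAL = 10.0

# How long a group entry survives without a fresh membership report.
GROUP_MEMBERSHIP_INTERVAL = ROBUSTNESS * QUERY_INTERVAL + QUERY_RESPONSE_INTERVAL
# How long a non-querier waits without hearing queries before taking over.
OTHER_QUERIER_PRESENT_INTERVAL = ROBUSTNESS * QUERY_INTERVAL + QUERY_RESPONSE_INTERVAL / 2


@dataclass
class SnoopingTable:
    """Toy snooping table: group -> {port: time of last membership report}."""

    last_report: Dict[str, Dict[str, float]] = field(default_factory=dict)

    def report(self, group: str, port: str, now: float) -> None:
        # A membership report (normally sent in answer to a query) refreshes the entry.
        self.last_report.setdefault(group, {})[port] = now

    def ports(self, group: str, now: float) -> Set[str]:
        # Only ports with a report newer than the membership interval still get the group.
        return {
            port
            for port, seen in self.last_report.get(group, {}).items()
            if now - seen < GROUP_MEMBERSHIP_INTERVAL
        }


table = SnoopingTable()
group = "239.192.1.1"  # hypothetical cluster multicast group
table.report(group, "bond0", now=0.0)  # last report before the querier disappears

print(GROUP_MEMBERSHIP_INTERVAL, OTHER_QUERIER_PRESENT_INTERVAL)  # 260.0 255.0
print(table.ports(group, now=120.0))  # {'bond0'}
print(table.ports(group, now=300.0))  # set()
```

With a querier present, a report roughly every 125 s keeps the entry alive; without one, nothing refreshes it and the group stops being forwarded to that port.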
----- Original Message -----
From: "Alexandre DERUMIER" <aderumier at odiso.com>
To: "Michael Rasmussen" <mir at datanom.net>
Cc: pve-devel at pve.proxmox.com
Sent: Sunday, 10 March 2013 08:29:59
Subject: Re: [pve-devel] corosync, multicast problem because of vmbr multicast_snooping enabled
>>@alexandre: What precise Cisco switch do you see the problems with?
>>What IOS version? Are there any firmware upgrade available?
Cisco 2960G and Cisco 6500 (I can't remember the IOS version, but both run a 2012 release).
My biggest problem was 2 weeks ago: I shut down 1 of my nodes, and after 2 min, all nodes on the same VLAN
(including different clusters with different multicast addresses) couldn't see each other.
Disabling IGMP on the Linux bridge resolved the problem.
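For reference, the bridge settings discussed here are exposed through sysfs as `/sys/class/net/<bridge>/bridge/multicast_snooping` and, on kernels that include the querier patch series quoted above, `/sys/class/net/<bridge>/bridge/multicast_querier`. The sketch below reads and writes them; the helper names are hypothetical, the `root` parameter exists only so it can be tried against a scratch directory, and writing requires root on a real host. Changes made this way do not persist across a reboot or re-creation of the bridge.

```python
from pathlib import Path
from typing import Dict, Optional

SYSFS_NET = Path("/sys/class/net")
SETTINGS = ("multicast_snooping", "multicast_querier")


def read_bridge_multicast(bridge: str, root: Path = SYSFS_NET) -> Dict[str, Optional[int]]:
    """Read the bridge's snooping/querier flags; None if the kernel lacks the file."""
    result: Dict[str, Optional[int]] = {}
    for name in SETTINGS:
        path = root / bridge / "bridge" / name
        result[name] = int(path.read_text().strip()) if path.exists() else None
    return result


def set_bridge_multicast(bridge: str, name: str, enabled: bool, root: Path = SYSFS_NET) -> None:
    """Write 1 or 0 to one of the flags (hypothetical helper, needs root on a real system)."""
    if name not in SETTINGS:
        raise ValueError(f"unsupported setting: {name}")
    (root / bridge / "bridge" / name).write_text("1" if enabled else "0")
```

On a real host, `set_bridge_multicast("vmbr0", "multicast_snooping", False)` would turn snooping off for vmbr0 until the bridge is recreated.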
So it should be related to snooping and IGMP queries, but I don't know whether the problem is in the physical switch or the Linux bridge.
I'll try to reproduce the problem this week and capture some tcpdump traces to track it down.
Also, I see a lot of bug reports on the net about snooping on the Linux bridge (I don't know whether they concern snooping or IGMP queries).
And I trust my good old Cisco switches more than a 2-year-old implementation in the Linux bridge.
Here is another bug with IGMP reports from a bridge on top of bonding: if a bonding failover occurs, IGMP reports are no longer sent :/
http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=1c3ac4289a0e4d60cbd4787b4a91de4a0c785df1
----- Original Message -----
From: "Michael Rasmussen" <mir at datanom.net>
To: pve-devel at pve.proxmox.com
Sent: Saturday, 9 March 2013 21:22:27
Subject: Re: [pve-devel] corosync, multicast problem because of vmbr multicast_snooping enabled
On Sat, 9 Mar 2013 18:33:58 +0000
Dietmar Maurer <dietmar at proxmox.com> wrote:
>
> So I think we talk about switch bugs here, not normal behavior.
>
I am leaning towards the same conclusion since I have never seen
those queries cause any problems here.
@alexandre: What precise Cisco switch do you see the problems with?
What IOS version? Are there any firmware upgrade available?
According to Cisco, the queries should not cause any problems, but maybe
this is what causes your problems:
"Multicast routers send host-query messages periodically to refresh
their knowledge of memberships present on their networks. If, after
some number of queries, the Cisco IOS software discovers that no local
hosts are members of a multicast group, the software stops forwarding
onto the local network multicast packets from remote origins for that
group and sends a prune message upstream toward the source."
http://www.cisco.com/en/US/docs/ios/12_2/ip/configuration/guide/1cfmulti.html#wp1067822
--
Hilsen/Regards
Michael Rasmussen
Get my public GnuPG keys:
michael <at> rasmussen <dot> cc
http://pgp.mit.edu:11371/pks/lookup?op=get&search=0xD3C9A00E
mir <at> datanom <dot> net
http://pgp.mit.edu:11371/pks/lookup?op=get&search=0xE501F51C
mir <at> miras <dot> org
http://pgp.mit.edu:11371/pks/lookup?op=get&search=0xE3E80917
--------------------------------------------------------------
The moving cursor writes, and having written, blinks on.
_______________________________________________
pve-devel mailing list
pve-devel at pve.proxmox.com
http://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-devel