[pve-devel] corosync, multicast problem because of vmbr multicast_snooping enabled

Alexandre DERUMIER aderumier at odiso.com
Sun Mar 10 10:41:53 CET 2013


Hi again (damn, I shouldn't work on Sundays ;)

I did some tcpdump captures on this setup:

    ---sw-core----
    |             |
  switch1         switch2
        |         |
        |         |
        host linux

sw-core: 10.0.0.1
switch1: 10.0.0.2
switch2: 10.0.0.3

Each switch has the IGMP querier enabled.

The Linux host uses bonding (active-backup), with the querier and multicast_snooping enabled.



Now, running tcpdump on vmbr0 of the Linux host, I only see queries coming from sw-core, which has the lowest IP:

09:44:38.692597 IP sw-core.odiso.net > all-systems.mcast.net: igmp query v2
09:44:38.692597 IP sw-core.odiso.net > all-systems.mcast.net: igmp query v2
...

Now I disable the IGMP querier on sw-core; after about 1 minute, I see IGMP queries coming from switch1:
...
09:45:38.698272 IP switch1.odiso.net > all-systems.mcast.net: igmp query v2
09:46:38.703388 IP switch1.odiso.net > all-systems.mcast.net: igmp query v2

If I disable the querier on switch1, switch2 becomes the querier.

So the election works fine (the IGMP querier with the lowest IP address is the master).
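The election rule observed here can be sketched in a few lines of Python. This is only an illustration of "lowest IP address wins" using the three addresses from the setup above; the function name `elect_querier` is made up for this example, and real switches also apply timers before taking over.

```python
import ipaddress

def elect_querier(candidates):
    """Return the name of the candidate with the numerically lowest IPv4 address.

    `candidates` maps a device name to its IPv4 address (as a string).
    """
    return min(candidates, key=lambda name: ipaddress.IPv4Address(candidates[name]))

# Addresses taken from the setup described in this mail.
queriers = {"sw-core": "10.0.0.1", "switch1": "10.0.0.2", "switch2": "10.0.0.3"}
print(elect_querier(queriers))  # sw-core

# Disabling the querier on sw-core leaves switch1 with the lowest address.
del queriers["sw-core"]
print(elect_querier(queriers))  # switch1
```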

So I restore the initial setup with 3 queriers, and sw-core becomes the master again.


I launch a tcpdump. Now, after some time (30 min to 1 h), I see IGMP queries coming from the Linux bridge and the Cisco switch at the same time!

10:43:26.703388 IP 0.0.0.0 > all-systems.mcast.net: igmp query v2
10:43:58.69233E IP sw-core.odiso.net > all-systems.mcast.net: igmp query v2

Note that the Linux bridge's IGMP queries use 0.0.0.0 as the source address.
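To spot this situation in a longer capture, something like the following Python sketch could count IGMP queries per source address. It assumes the tcpdump text format shown above; the regular expression and the helper names (`query_sources`, `has_competing_queriers`) are illustrative only, not part of any tool.

```python
import re
from collections import Counter

# Matches lines such as:
# "10:43:26.703388 IP 0.0.0.0 > all-systems.mcast.net: igmp query v2"
QUERY_RE = re.compile(r"^(?P<ts>\S+) IP (?P<src>\S+) > (?P<dst>\S+): igmp query v(?P<ver>\d+)")

def query_sources(lines):
    """Count IGMP query lines per source address/hostname."""
    counts = Counter()
    for line in lines:
        match = QUERY_RE.match(line.strip())
        if match:
            counts[match.group("src")] += 1
    return counts

def has_competing_queriers(lines):
    """True if queries from more than one source appear in the capture."""
    return len(query_sources(lines)) > 1

# The two lines copied as shown in the capture above.
sample = [
    "10:43:26.703388 IP 0.0.0.0 > all-systems.mcast.net: igmp query v2",
    "10:43:58.69233E IP sw-core.odiso.net > all-systems.mcast.net: igmp query v2",
]
print(sorted(query_sources(sample)))
print(has_competing_queriers(sample))
```

A source of 0.0.0.0 in the result would point at the Linux bridge, as noted above.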



I found the original cover letter of the patch series that disables the IGMP querier by default on the Linux bridge:


http://en.usenet.digipedia.org/thread/18960/28749/
"
[0/3] bridge: Do not send multicast queries by default 2012-04-13 14:36

This series of patches is aimed to change the default multicast snooping behaviour to one that is safer to deploy in the wild. 


 There have been numerous reports of switches misbehaving with our current behaviour of sending general queries, 
presumably because we're using a zero source IP address which is unavoidable as using anything else would interfere with multicast querier elections 


"


So, I don't know about HP switches, but for Cisco switches it seems to break the IGMP querier election.


Maybe my problem was that my Proxmox host was the IGMP querier, and when I shut it down, no other IGMP querier took over, and snooping blocked all multicast addresses.
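For what it's worth, how long memberships survive without a working querier can be estimated with the Group Membership Interval formula from RFC 2236 (robustness x query interval + query response interval). The Python sketch below is only a back-of-envelope check: the robustness value of 2 and the 10 s response interval are the RFC defaults, the 60 s query interval is read off the query timestamps in the capture above, and the switches' real snooping timers are not known from this thread.

```python
def group_membership_interval(robustness, query_interval_s, query_response_interval_s):
    """RFC 2236 Group Membership Interval, in seconds."""
    return robustness * query_interval_s + query_response_interval_s

# RFC 2236 default timers: robustness 2, query interval 125 s, response interval 10 s.
print(group_membership_interval(2, 125, 10))  # 260

# Assumed values: 60 s query interval as seen in the capture, other timers at RFC defaults.
print(group_membership_interval(2, 60, 10))   # 130
```

With these assumed values the second result is 130 s, which is in the same range as the roughly 2 minutes mentioned in the quoted message below. That is a rough consistency check, not a diagnosis.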


----- Original message -----

From: "Alexandre DERUMIER" <aderumier at odiso.com>
To: "Michael Rasmussen" <mir at datanom.net>
Cc: pve-devel at pve.proxmox.com
Sent: Sunday, 10 March 2013 08:29:59
Subject: Re: [pve-devel] corosync, multicast problem because of vmbr multicast_snooping enabled

>>@alexandre: What precise Cisco switch do you see the problems with? 
>>What IOS version? Are there any firmware upgrade available? 

Cisco 2960G and Cisco 6500 (I can't remember the IOS versions, but both are 2012 releases).

My biggest problem was 2 weeks ago: I shut down one of my nodes, and after 2 minutes, all nodes on the same VLAN
(including different clusters with different multicast addresses) couldn't see each other.
Disabling IGMP on the Linux bridge resolved the problem.
So it should be related to snooping and IGMP queries, but I don't know whether the problem is in the physical switch or in the Linux bridge.
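To check what a given bridge is currently doing, the kernel exposes the snooping and querier switches under sysfs (`multicast_snooping` and `multicast_querier` in `/sys/class/net/<bridge>/bridge/`). Here is a minimal Python sketch that reads them; the bridge name `vmbr0` is just the one used in this thread, and the helper name `read_bridge_multicast` is invented for the example.

```python
from pathlib import Path

def read_bridge_multicast(bridge, sysfs_root="/sys/class/net"):
    """Return the bridge's multicast_snooping and multicast_querier values.

    Each value is an int (0 = disabled, 1 = enabled), or None if the sysfs
    file does not exist (for example, the bridge is missing on this host).
    """
    base = Path(sysfs_root) / bridge / "bridge"
    settings = {}
    for name in ("multicast_snooping", "multicast_querier"):
        path = base / name
        settings[name] = int(path.read_text().strip()) if path.exists() else None
    return settings

if __name__ == "__main__":
    # "vmbr0" is the bridge name used in this thread; adjust as needed.
    print(read_bridge_multicast("vmbr0"))
```

Writing 0 to the `multicast_snooping` file as root disables snooping on that bridge; the sketch only reads, so it is safe to run anywhere.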

I'll try to reproduce the problem this week and do some tcpdump captures to find the cause.


Now, I see a lot of bug reports on the net about snooping on the Linux bridge (I don't know whether they're about snooping or IGMP queries).

And I trust my good old Cisco switches more than a 2-year-old implementation in the Linux bridge.

Here is another bug with IGMP reports from bridge and bonding: if a failover occurs in the bond, IGMP reports are no longer sent :/
http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=1c3ac4289a0e4d60cbd4787b4a91de4a0c785df1 




----- Original message -----

From: "Michael Rasmussen" <mir at datanom.net>
To: pve-devel at pve.proxmox.com
Sent: Saturday, 9 March 2013 21:22:27
Subject: Re: [pve-devel] corosync, multicast problem because of vmbr multicast_snooping enabled

On Sat, 9 Mar 2013 18:33:58 +0000 
Dietmar Maurer <dietmar at proxmox.com> wrote: 

> 
> So I think we talk about switch bugs here, not normal behavior. 
> 
I am leaning towards the same conclusion since I have never seen 
those queries cause any problems here. 

@alexandre: What precise Cisco switch do you see the problems with? 
What IOS version? Are there any firmware upgrade available? 

According to Cisco the queries should not cause any problems but maybe 
this is what causes your problems: 

"Multicast routers send host-query messages periodically to refresh 
their knowledge of memberships present on their networks. If, after 
some number of queries, the Cisco IOS software discovers that no local 
hosts are members of a multicast group, the software stops forwarding 
onto the local network multicast packets from remote origins for that 
group and sends a prune message upstream toward the source." 

http://www.cisco.com/en/US/docs/ios/12_2/ip/configuration/guide/1cfmulti.html#wp1067822 

-- 
Hilsen/Regards 
Michael Rasmussen 

Get my public GnuPG keys: 
michael <at> rasmussen <dot> cc 
http://pgp.mit.edu:11371/pks/lookup?op=get&search=0xD3C9A00E 
mir <at> datanom <dot> net 
http://pgp.mit.edu:11371/pks/lookup?op=get&search=0xE501F51C 
mir <at> miras <dot> org 
http://pgp.mit.edu:11371/pks/lookup?op=get&search=0xE3E80917 
-------------------------------------------------------------- 
The moving cursor writes, and having written, blinks on. 

_______________________________________________ 
pve-devel mailing list 
pve-devel at pve.proxmox.com 
http://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-devel 