<html>
<head>
<meta content="text/html; charset=windows-1252"
http-equiv="Content-Type">
</head>
<body text="#000000" bgcolor="#FFFFFF">
Is anyone else having issues with multicast/quorum after upgrading
to 3.4?<br>
<br>
We have not been able to get our cluster back into a healthy state
since the upgrade last weekend.<br>
<br>
I found this post here:<br>
<br>
<a class="moz-txt-link-freetext" href="http://pve.proxmox.com/pipermail/pve-devel/2015-February/014356.html">http://pve.proxmox.com/pipermail/pve-devel/2015-February/014356.html</a><br>
<br>
which suggests there might be an issue with the 2.6.32-37 kernel.<br>
<br>
We have downgraded the kernel on 9 of our 19 servers to 2.6.32-34;
however, those 9 servers still cannot see each other according to
'pvecm nodes'.<br>
<br>
With the 2.6.32-37 kernel the nodes appeared to see each other,
however /etc/pve remained in a read-only state even after a quorum
was formed (according to 'pvecm status').<br>
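<br>
In case it is useful, here is roughly how we are comparing kernel
and cluster state on each node (a sketch, using the standard PVE
3.x tools):<br>
<br>
<pre># run on each node
uname -r          # confirm which kernel actually booted
pveversion -v     # package versions, including pve-kernel
pvecm status      # quorum state as this node sees it
pvecm nodes       # membership as this node sees it</pre>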
<br>
At this point we are unsure how to proceed, and we cannot keep
rebooting hosts over and over again.<br>
<br>
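If there is a safe way to restart the cluster stack in place
instead of rebooting, we are open to trying it; a minimal sketch of
what we have in mind, assuming the stock PVE 3.x init scripts (cman
and pve-cluster):<br>
<br>
<pre># sketch only -- assumes the standard PVE 3.x services
service pve-cluster stop     # stop pmxcfs (/etc/pve)
service cman restart         # restart the corosync/cman membership layer
service pve-cluster start    # bring pmxcfs back
pvecm status                 # re-check quorum afterwards</pre>
<br>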
Does anyone have any suggestions?<br>
<br>
Shain<br>
<br>
<div class="moz-cite-prefix">On 03/09/2015 03:04 PM, Shain Miley
wrote:<br>
</div>
<blockquote cite="mid:54FDEEA6.8070305@npr.org" type="cite">
<meta http-equiv="Content-Type" content="text/html;
charset=windows-1252">
<div class="moz-cite-prefix">Ok...after some testing it seems like
the new 3.4 servers are dropping (or at least not getting)
multicast packets:<br>
<br>
Here is a test between two 3.4 proxmox servers:<br>
<br>
<pre>root@proxmox3:~# asmping 224.0.2.1 proxmox1.npr.org
asmping joined (S,G) = (*,224.0.2.234)
pinging 172.31.2.141 from 172.31.2.33
unicast from 172.31.2.141, seq=1 dist=0 time=1.592 ms
unicast from 172.31.2.141, seq=2 dist=0 time=0.163 ms
unicast from 172.31.2.141, seq=3 dist=0 time=0.136 ms
unicast from 172.31.2.141, seq=4 dist=0 time=0.117 ms
........

--- 172.31.2.141 statistics ---
11 packets transmitted, time 10702 ms
unicast:
  11 packets received, 0% packet loss
  rtt min/avg/max/std-dev = 0.107/0.261/1.592/0.421 ms
multicast:
  0 packets received, 100% packet loss</pre>
<br>
<br>
<br>
And here is the same test between two other servers (one Ubuntu,
one Debian) connected to the same set of switches as the servers above:<br>
<br>
<pre>root@test2:~# asmping 224.0.2.1 testserver1.npr.org
asmping joined (S,G) = (*,224.0.2.234)
pinging 172.31.2.125 from 172.31.2.131
multicast from 172.31.2.125, seq=1 dist=0 time=0.203 ms
unicast from 172.31.2.125, seq=1 dist=0 time=0.322 ms
unicast from 172.31.2.125, seq=2 dist=0 time=0.143 ms
multicast from 172.31.2.125, seq=2 dist=0 time=0.150 ms
unicast from 172.31.2.125, seq=3 dist=0 time=0.138 ms
multicast from 172.31.2.125, seq=3 dist=0 time=0.146 ms
unicast from 172.31.2.125, seq=4 dist=0 time=0.122 ms
.........

--- 172.31.2.125 statistics ---
9 packets transmitted, time 8115 ms
unicast:
  9 packets received, 0% packet loss
  rtt min/avg/max/std-dev = 0.114/0.150/0.322/0.061 ms
multicast:
  9 packets received, 0% packet loss since first mc packet (seq 1) recvd
  rtt min/avg/max/std-dev = 0.118/0.142/0.203/0.026 ms</pre>
<br>
As you can see, multicast works fine there.<br>
<br>
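We also plan to re-run the test with omping across several Proxmox
nodes at once; a sketch, assuming omping is installed and started
on each listed node (node selection here is just an example):<br>
<br>
<pre># run the same command on every node being tested
omping -c 600 -i 1 -q proxmox1 proxmox3 proxmox13</pre>
<br>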
<br>
All servers are running 2.6.32 kernels, but not all the same
version (ranging from 2.6.32-23-pve to 2.6.32-37-pve).<br>
<br>
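One thing we have not yet ruled out is multicast snooping on the
Linux bridge itself; a sketch of the check, assuming the default
PVE bridge name vmbr0 and a kernel that exposes the tunable:<br>
<br>
<pre># sketch -- assumes the bridge is named vmbr0
cat /sys/class/net/vmbr0/bridge/multicast_snooping    # 1 = snooping on
echo 0 &gt; /sys/class/net/vmbr0/bridge/multicast_snooping   # temporarily disable to test</pre>
<br>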
Anyone have any suggestions as to why the Proxmox servers are
not seeing the multicast traffic?<br>
<br>
Thanks,<br>
<br>
Shain<br>
<br>
On 3/9/15 12:33 PM, Shain Miley wrote:<br>
</div>
<blockquote cite="mid:54FDCB77.4040104@npr.org" type="cite">
<meta http-equiv="Context-Type" content="text/html;
charset=windows-1252">
<div class="moz-cite-prefix">I am looking into the possibility
that there is a multicast issue here as I am unable to ping
any of the multicast ip address on any of the nodes.<br>
<br>
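(For that test we pulled the cluster's multicast address out of
'cman_tool status' before pinging it; a rough sketch, assuming the
cman tooling that ships with PVE 3.x:)<br>
<br>
<pre>cman_tool status | grep -i multicast   # shows the cluster's multicast address
ping -c 3 MCAST_ADDR                   # MCAST_ADDR = address reported above (placeholder)</pre>
<br>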
I have reached out to Cisco support for some additional help.<br>
<br>
I will let you know what I find out.<br>
<br>
Thanks again,<br>
<br>
Shain<br>
<br>
<br>
On 3/9/15 11:54 AM, Eneko Lacunza wrote:<br>
</div>
<blockquote cite="mid:54FDC226.9080000@binovo.es" type="cite">
<div class="moz-cite-prefix">It seems yesterday something
happened at 20:40:53:<br>
<br>
<pre>Mar 08 20:40:53 corosync [TOTEM ] FAILED TO RECEIVE
Mar 08 20:41:05 corosync [CLM   ] CLM CONFIGURATION CHANGE
Mar 08 20:41:05 corosync [CLM   ] New Configuration:
Mar 08 20:41:05 corosync [CLM   ]     r(0) ip(172.31.2.48)
Mar 08 20:41:05 corosync [CLM   ] Members Left:
Mar 08 20:41:05 corosync [CLM   ]     r(0) ip(172.31.2.16)
Mar 08 20:41:05 corosync [CLM   ]     r(0) ip(172.31.2.33)
Mar 08 20:41:05 corosync [CLM   ]     r(0) ip(172.31.2.49)
Mar 08 20:41:05 corosync [CLM   ]     r(0) ip(172.31.2.50)
Mar 08 20:41:05 corosync [CLM   ]     r(0) ip(172.31.2.69)
Mar 08 20:41:05 corosync [CLM   ]     r(0) ip(172.31.2.75)
Mar 08 20:41:05 corosync [CLM   ]     r(0) ip(172.31.2.77)
Mar 08 20:41:05 corosync [CLM   ]     r(0) ip(172.31.2.87)
Mar 08 20:41:05 corosync [CLM   ]     r(0) ip(172.31.2.141)
Mar 08 20:41:05 corosync [CLM   ]     r(0) ip(172.31.2.142)
Mar 08 20:41:05 corosync [CLM   ]     r(0) ip(172.31.2.161)
Mar 08 20:41:05 corosync [CLM   ]     r(0) ip(172.31.2.163)
Mar 08 20:41:05 corosync [CLM   ]     r(0) ip(172.31.2.165)
Mar 08 20:41:05 corosync [CLM   ]     r(0) ip(172.31.2.215)
Mar 08 20:41:05 corosync [CLM   ]     r(0) ip(172.31.2.216)
Mar 08 20:41:05 corosync [CLM   ]     r(0) ip(172.31.2.219)
Mar 08 20:41:05 corosync [CLM   ] Members Joined:
Mar 08 20:41:05 corosync [QUORUM] Members[16]: 1 2 4 5 6 7 8 10 11 12 13 14 15 16 17 19
Mar 08 20:41:05 corosync [QUORUM] Members[15]: 1 2 4 5 6 7 8 11 12 13 14 15 16 17 19
Mar 08 20:41:05 corosync [QUORUM] Members[14]: 1 2 4 5 6 7 8 11 12 14 15 16 17 19
Mar 08 20:41:05 corosync [QUORUM] Members[13]: 1 2 4 5 6 7 8 11 12 15 16 17 19
Mar 08 20:41:05 corosync [QUORUM] Members[12]: 1 2 4 5 6 7 8 11 12 15 17 19
Mar 08 20:41:05 corosync [QUORUM] Members[11]: 1 2 4 5 6 7 8 11 12 15 17
Mar 08 20:41:05 corosync [QUORUM] Members[10]: 1 2 4 5 6 7 8 11 12 17
Mar 08 20:41:05 corosync [CMAN  ] quorum lost, blocking activity
Mar 08 20:41:05 corosync [QUORUM] This node is within the non-primary component and will NOT provide any services.
Mar 08 20:41:05 corosync [QUORUM] Members[9]: 1 2 5 6 7 8 11 12 17
Mar 08 20:41:05 corosync [QUORUM] Members[8]: 1 2 5 6 7 11 12 17
Mar 08 20:41:05 corosync [QUORUM] Members[7]: 1 2 5 6 7 12 17
Mar 08 20:41:05 corosync [QUORUM] Members[6]: 1 2 6 7 12 17
Mar 08 20:41:05 corosync [QUORUM] Members[5]: 1 2 7 12 17
Mar 08 20:41:05 corosync [QUORUM] Members[4]: 1 2 12 17
Mar 08 20:41:05 corosync [QUORUM] Members[3]: 1 12 17
Mar 08 20:41:05 corosync [QUORUM] Members[2]: 1 12
Mar 08 20:41:05 corosync [QUORUM] Members[1]: 12
Mar 08 20:41:05 corosync [CLM   ] CLM CONFIGURATION CHANGE
Mar 08 20:41:05 corosync [CLM   ] New Configuration:
Mar 08 20:41:05 corosync [CLM   ]     r(0) ip(172.31.2.48)
Mar 08 20:41:05 corosync [CLM   ] Members Left:
Mar 08 20:41:05 corosync [CLM   ] Members Joined:
Mar 08 20:41:05 corosync [TOTEM ] A processor joined or left the membership and a new membership was formed.
Mar 08 20:41:05 corosync [CPG   ] chosen downlist: sender r(0) ip(172.31.2.48) ; members(old:17 left:16)
Mar 08 20:41:05 corosync [MAIN  ] Completed service synchronization, ready to provide service</pre>
<br>
Is the "pvecm nodes" similar in all nodes?<br>
<br>
I don't have experience troubleshooting corosync but it
seems you have to re-estrablish the corosync cluster and
quorum.<br>
<br>
Check "corosync-quorumtool -l -i" . Also check cman_tool
command for diagnosing the cluster.<br>
<br>
Is corosync service loaded and running? Does restarting it
change something (service cman restart) ?<br>
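<br>
For example, something like this on one of the affected nodes (a
sketch, assuming the cman/corosync 1.x tooling on PVE 3.x):<br>
<br>
<pre>corosync-quorumtool -l -i   # node list, per the suggestion above
cman_tool status            # cluster state, votes, multicast address
cman_tool nodes             # membership as cman sees it
service cman restart        # then re-run the checks above</pre>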
<br>
<br>
<br>
On 09/03/15 16:13, Shain Miley wrote:<br>
</div>
<blockquote cite="mid:54FDB89B.1080107@npr.org" type="cite">
<div class="moz-cite-prefix">Oddly enough...there is nothing
in the latest corosync logfile...however the one from last
night (when we started seeing the problem) has a lot of
info in it.<br>
<br>
Here is the link to entire file:<br>
<br>
<a moz-do-not-send="true" class="moz-txt-link-freetext"
href="http://717b5bb5f6a032ce28eb-fa7f03050c118691fd4b41bf00a93863.r71.cf1.rackcdn.com/corosync.log.1">http://717b5bb5f6a032ce28eb-fa7f03050c118691fd4b41bf00a93863.r71.cf1.rackcdn.com/corosync.log.1</a><br>
<br>
Thanks again for your help so far.<br>
<br>
Shain <br>
<br>
On 3/9/15 10:53 AM, Eneko Lacunza wrote:<br>
</div>
<blockquote cite="mid:54FDB404.8020903@binovo.es"
type="cite">
<div class="moz-cite-prefix">What about
/var/log/cluster/corosync.log ?<br>
<br>
On 09/03/15 15:34, Shain Miley wrote:<br>
</div>
<blockquote cite="mid:54FDAF5B.8040700@npr.org"
type="cite">
<div class="moz-cite-prefix">Yes,<br>
<br>
All the nodes are pingable and resolvable via their
hostname. <br>
<br>
Here is the ouput of 'pvecm nodes'<br>
<br>
<br>
<pre>root@proxmox13:~# pvecm nodes
Node  Sts   Inc   Joined                Name
   1   X    964                         proxmox22
   2   X    964                         proxmox23
   3   X    756                         proxmox24
   4   X    808                         proxmox18
   5   X    964                         proxmox19
   6   X    964                         proxmox20
   7   X    964                         proxmox21
   8   X    964                         proxmox1
   9   X      0                         proxmox2
  10   X    756                         proxmox3
  11   X    964                         proxmox4
  12   M    696   2014-10-20 01:10:09   proxmox13
  13   X    904                         proxmox14
  14   X    848                         proxmox15
  15   X    856                         proxmox16
  16   X    836                         proxmox17
  17   X    964                         proxmox25
  18   X    960                         proxmox26
  19   X    868                         proxmox28</pre>
<br>
Thanks,<br>
<br>
Shain<br>
<br>
On 3/9/15 10:23 AM, Eneko Lacunza wrote:<br>
</div>
<blockquote cite="mid:54FDACD3.5090800@binovo.es"
type="cite">pvecm nodes</blockquote>
<br>
<br>
<div class="moz-signature">-- <br>
<title></title>
<span class="Apple-style-span"><span
class="Apple-style-span"><sub><span><img alt="NPR"
moz-do-not-send="true" height="13"
width="40"> </span></sub><span>| Shain
Miley| Manager of Systems and Infrastructure,
Digital Media | <a moz-do-not-send="true"
class="moz-txt-link-abbreviated"
href="mailto:smiley@npr.org">smiley@npr.org</a>
| p: 202-513-3649</span></span></span> </div>
</blockquote>
<br>
<br>
<pre class="moz-signature" cols="72">--
Zuzendari Teknikoa / Director Técnico
Binovo IT Human Project, S.L.
Telf. 943575997
943493611
Astigarraga bidea 2, planta 6 dcha., ofi. 3-2; 20180 Oiartzun (Gipuzkoa)
<a moz-do-not-send="true" class="moz-txt-link-abbreviated" href="http://www.binovo.es">www.binovo.es</a></pre>
</blockquote>
<br>
<br>
<div class="moz-signature">-- <br>
<title></title>
<span class="Apple-style-span"><span
class="Apple-style-span"><sub><span><img alt="NPR"
moz-do-not-send="true" height="13" width="40"> </span></sub><span>|
Shain Miley| Manager of Systems and Infrastructure,
Digital Media | <a moz-do-not-send="true"
class="moz-txt-link-abbreviated"
href="mailto:smiley@npr.org">smiley@npr.org</a> |
p: 202-513-3649</span></span></span> </div>
</blockquote>
<br>
<br>
<pre class="moz-signature" cols="72">--
Zuzendari Teknikoa / Director Técnico
Binovo IT Human Project, S.L.
Telf. 943575997
943493611
Astigarraga bidea 2, planta 6 dcha., ofi. 3-2; 20180 Oiartzun (Gipuzkoa)
<a moz-do-not-send="true" class="moz-txt-link-abbreviated" href="http://www.binovo.es">www.binovo.es</a></pre>
</blockquote>
<br>
<br>
<div class="moz-signature">-- <br>
<title></title>
<span class="Apple-style-span"><span class="Apple-style-span"><sub><span><img
alt="NPR" moz-do-not-send="true" height="13"
width="40"> </span></sub><span>| Shain Miley|
Manager of Systems and Infrastructure, Digital Media | <a
moz-do-not-send="true"
class="moz-txt-link-abbreviated"
href="mailto:smiley@npr.org">smiley@npr.org</a> | p:
202-513-3649</span></span></span> </div>
<br>
<fieldset class="mimeAttachmentHeader"></fieldset>
<br>
<pre wrap="">_______________________________________________
pve-user mailing list
<a moz-do-not-send="true" class="moz-txt-link-abbreviated" href="mailto:pve-user@pve.proxmox.com">pve-user@pve.proxmox.com</a>
<a moz-do-not-send="true" class="moz-txt-link-freetext" href="http://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user">http://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user</a>
</pre>
</blockquote>
<br>
<br>
<div class="moz-signature">-- <br>
<title></title>
<span class="Apple-style-span" style="border-collapse: separate;
color: rgb(0, 0, 0); font-family: Times; font-style: normal;
font-variant: normal; font-weight: normal; letter-spacing:
normal; line-height: normal; orphans: 2; text-indent: 0px;
text-transform: none; white-space: normal; widows: 2;
word-spacing: 0px; font-size: medium;"><span
class="Apple-style-span" style="font-family: 'Times New
Roman'; font-size: 16px;"><sub><font face="Arial" size="3"><span
style="font-size: 12pt; font-family: Arial;"><img
alt="NPR" moz-do-not-send="true"
src="file:///Users/smiley/Pictures/npr_log_small.jpg"
height="13" width="40" align="top"> </span></font></sub><font
face="Arial" size="1" color="gray"><span style="font-size:
9pt; font-family: Arial; color: gray;">| Shain Miley|
Manager of Systems and Infrastructure, Digital Media | <a
moz-do-not-send="true"
class="moz-txt-link-abbreviated"
href="mailto:smiley@npr.org">smiley@npr.org</a> | p:
202-513-3649</span></font></span></span> </div>
<br>
<fieldset class="mimeAttachmentHeader"></fieldset>
<br>
<pre wrap="">_______________________________________________
pve-user mailing list
<a class="moz-txt-link-abbreviated" href="mailto:pve-user@pve.proxmox.com">pve-user@pve.proxmox.com</a>
<a class="moz-txt-link-freetext" href="http://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user">http://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user</a>
</pre>
</blockquote>
<br>
</body>
</html>