[pve-devel] need help, lost quorum on all nodes

Alexandre DERUMIER aderumier at odiso.com
Tue Jan 15 07:08:59 CET 2013


>>Yes, AFAIK it looks like a known issue with non-Proxmox VE kernels. But we already have a new corosync package.
>>Dietmar will send the details and can explain the reason and the fix.

Thanks for the report Martin!


I don't like using a custom kernel, but I need a pNFS feature that will be available in the RHEL 6.4 kernel (so only in a not-yet-released OpenVZ kernel).


----- Original Message -----

From: "Martin Maurer" <martin at proxmox.com>
To: pve-devel at pve.proxmox.com
Sent: Monday 14 January 2013 21:20:59
Subject: Re: [pve-devel] need help, lost quorum on all nodes

Yes, AFAIK it looks like a known issue with non-Proxmox VE kernels. But we already have a new corosync package.
Dietmar will send the details and can explain the reason and the fix.

Martin 

> -----Original Message----- 
> From: pve-devel-bounces at pve.proxmox.com [mailto:pve-devel-bounces at pve.proxmox.com] On Behalf Of Alexandre DERUMIER
> Sent: Monday, 14 January 2013 20:47
> To: pve-devel at pve.proxmox.com 
> Subject: Re: [pve-devel] need help, lost quorum on all nodes 
> 
> Ok, I found the problem. 
> 
> I had installed a custom 3.7 kernel on the upgraded node, and it seems to cause
> problems for the corosync cluster (I don't know why, I'll investigate tomorrow).
>
> Maybe it's related to dlm?
> 
> ----- Original Message -----
>
> From: "Alexandre DERUMIER" <aderumier at odiso.com>
> To: pve-devel at pve.proxmox.com
> Sent: Monday 14 January 2013 18:10:35
> Subject: [pve-devel] need help, lost quorum on all nodes
> 
> Hi, 
> 
> I have lost quorum on my 8-node cluster while trying to upgrade one node
> to the latest stable version.
>
> The log when the problem occurred:
> 
> Jan 14 17:25:34 corosync [CLM ] CLM CONFIGURATION CHANGE
> Jan 14 17:25:34 corosync [CLM ] New Configuration:
> Jan 14 17:25:34 corosync [CLM ] r(0) ip(10.3.94.38)
> Jan 14 17:25:34 corosync [CLM ] r(0) ip(10.3.94.40)
> Jan 14 17:25:34 corosync [CLM ] r(0) ip(10.3.94.49)
> Jan 14 17:25:34 corosync [CLM ] r(0) ip(10.3.94.50)
> Jan 14 17:25:34 corosync [CLM ] r(0) ip(10.3.94.51)
> Jan 14 17:25:34 corosync [CLM ] r(0) ip(10.3.94.52)
> Jan 14 17:25:34 corosync [CLM ] r(0) ip(10.3.94.53)
> Jan 14 17:25:34 corosync [CLM ] Members Left:
> Jan 14 17:25:34 corosync [CLM ] r(0) ip(10.3.94.39)
> Jan 14 17:25:34 corosync [CLM ] Members Joined:
> Jan 14 17:25:34 corosync [QUORUM] Members[7]: 1 2 3 4 5 6 8
> Jan 14 17:25:34 corosync [CLM ] CLM CONFIGURATION CHANGE
> Jan 14 17:25:34 corosync [CLM ] New Configuration:
> Jan 14 17:25:34 corosync [CLM ] r(0) ip(10.3.94.38)
> Jan 14 17:25:34 corosync [CLM ] r(0) ip(10.3.94.40)
> Jan 14 17:25:34 corosync [CLM ] r(0) ip(10.3.94.49)
> Jan 14 17:25:34 corosync [CLM ] r(0) ip(10.3.94.50)
> Jan 14 17:25:34 corosync [CLM ] r(0) ip(10.3.94.51)
> Jan 14 17:25:34 corosync [CLM ] r(0) ip(10.3.94.52)
> Jan 14 17:25:34 corosync [CLM ] r(0) ip(10.3.94.53)
> Jan 14 17:25:34 corosync [CLM ] Members Left:
> Jan 14 17:25:34 corosync [CLM ] Members Joined:
> Jan 14 17:25:34 corosync [TOTEM ] A processor joined or left the membership and a new membership was formed.
> Jan 14 17:25:35 corosync [CPG ] chosen downlist: sender r(0) ip(10.3.94.53) ; members(old:8 left:1)
> Jan 14 17:25:35 corosync [MAIN ] Completed service synchronization, ready to provide service.
> Jan 14 17:27:32 corosync [TOTEM ] Retransmit List: 7c9 7ca 7cb 7cc 7cd 7ce 7cf 7d0 7d1 7d2 7d3 7d4 7d5 7d6 7d7 7d8
> Jan 14 17:27:32 corosync [TOTEM ] Retransmit List: 7ca 7cb 7cc 7cd 7ce 7cf 7d0 7d1 7d2 7d3 7d4 7d5 7d6 7d7 7d8 7c9
> Jan 14 17:27:32 corosync [TOTEM ] Retransmit List: 7c9 7ca 7cb 7cc 7cd 7ce 7cf 7d0 7d1 7d2 7d3 7d4 7d5 7d6 7d7 7d8
> Jan 14 17:27:32 corosync [TOTEM ] Retransmit List: 7c9 7ca 7cb 7cc 7cd 7ce 7cf 7d0 7d1 7d2 7d3 7d4 7d5 7d6 7d7 7d8
> Jan 14 17:27:32 corosync [TOTEM ] Retransmit List: 7cb 7cc 7cd 7ce 7cf 7d0 7d1 7d2 7d3 7d4 7d5 7d6 7d7 7d8 7c9 7ca
> Jan 14 17:27:32 corosync [TOTEM ] Retransmit List: 7c9 7ca 7cb 7cc 7cd 7ce 7cf 7d0 7d1 7d2 7d3 7d4 7d5 7d6 7d7 7d8
> Jan 14 17:27:32 corosync [TOTEM ] Retransmit List: 7c9 7ca 7cb 7cc 7cd 7ce 7cf 7d0 7d1 7d2 7d3 7d4 7d5 7d6 7d7 7d8
> Jan 14 17:27:32 corosync [TOTEM ] Retransmit List: 7cb 7cc 7cd 7ce 7cf 7d0 7d1 7d2 7d3 7d4 7d5 7d6 7d7 7d8 7c9 7ca
> Jan 14 17:27:32 corosync [TOTEM ] Retransmit List: 7c9 7ca 7cb 7cc 7cd 7ce 7cf 7d0 7d1 7d2 7d3 7d4 7d5 7d6 7d7 7d8
> Jan 14 17:27:32 corosync [TOTEM ] Retransmit List: 7c9 7ca 7cb 7cc 7cd 7ce 7cf 7d0 7d1 7d2 7d3 7d4 7d5 7d6 7d7 7d8
> Jan 14 17:27:32 corosync [TOTEM ] Retransmit List: 7cb 7cc 7cd 7ce 7cf 7d0 7d1 7d2 7d3 7d4 7d5 7d6 7d7 7d8 7c9 7ca
> Jan 14 17:27:32 corosync [TOTEM ] Retransmit List: 7c9 7ca 7cb 7cc 7cd 7ce 7cf 7d0 7d1 7d2 7d3 7d4 7d5 7d6 7d7 7d8
> Jan 14 17:27:32 corosync [TOTEM ] Retransmit List: 7c9 7ca 7cb 7cc 7cd 7ce 7cf 7d0 7d1 7d2 7d3 7d4 7d5 7d6 7d7 7d8
> Jan 14 17:27:32 corosync [TOTEM ] Retransmit List: 7cb 7cc 7cd 7ce 7cf 7d0 7d1 7d2 7d3 7d4 7d5 7d6 7d7 7d8 7c9 7ca
> Jan 14 17:27:32 corosync [TOTEM ] Retransmit List: 7c9 7ca 7cb 7cc 7cd 7ce 7cf 7d0 7d1 7d2 7d3 7d4 7d5 7d6 7d7 7d8
> Jan 14 17:27:32 corosync [TOTEM ] Retransmit List: 7c9 7ca 7cb 7cc 7cd 7ce 7cf 7d0 7d1 7d2 7d3 7d4 7d5 7d6 7d7 7d8
> Jan 14 17:27:32 corosync [TOTEM ] Retransmit List: 7cb 7cc 7cd 7ce 7cf 7d0 7d1 7d2 7d3 7d4 7d5 7d6 7d7 7d8 7c9 7ca
> Jan 14 17:27:32 corosync [TOTEM ] Retransmit List: 7c9 7ca 7cb 7cc 7cd 7ce 7cf 7d0 7d1 7d2 7d3 7d4 7d5 7d6 7d7 7d8
> Jan 14 17:27:32 corosync [TOTEM ] Retransmit List: 7c9 7ca 7cb 7cc 7cd 7ce 7cf 7d0 7d1 7d2 7d3 7d4 7d5 7d6 7d7 7d8
> Jan 14 17:27:32 corosync [TOTEM ] Retransmit List: 7cb 7cc 7cd 7ce 7cf 7d0 7d1 7d2 7d3 7d4 7d5 7d6 7d7 7d8 7c9 7ca
> ....
> Jan 14 17:29:36 corosync [CLM ] CLM CONFIGURATION CHANGE
> Jan 14 17:29:36 corosync [CLM ] New Configuration:
> Jan 14 17:29:36 corosync [CLM ] r(0) ip(10.3.94.40)
> Jan 14 17:29:36 corosync [CLM ] r(0) ip(10.3.94.50)
> Jan 14 17:29:36 corosync [CLM ] r(0) ip(10.3.94.51)
> Jan 14 17:29:36 corosync [CLM ] r(0) ip(10.3.94.53)
> Jan 14 17:29:36 corosync [CLM ] Members Left:
> Jan 14 17:29:36 corosync [CLM ] r(0) ip(10.3.94.38)
> Jan 14 17:29:36 corosync [CLM ] r(0) ip(10.3.94.49)
> Jan 14 17:29:36 corosync [CLM ] r(0) ip(10.3.94.52)
> Jan 14 17:29:36 corosync [CLM ] Members Joined:
> Jan 14 17:29:36 corosync [QUORUM] Members[6]: 1 2 4 5 6 8
> Jan 14 17:29:36 corosync [QUORUM] Members[5]: 1 2 4 5 8
> Jan 14 17:29:36 corosync [CMAN ] quorum lost, blocking activity
> Jan 14 17:29:36 corosync [QUORUM] This node is within the non-primary component and will NOT provide any services.
> Jan 14 17:29:36 corosync [QUORUM] Members[4]: 1 2 4 8
> Jan 14 17:29:36 corosync [CLM ] CLM CONFIGURATION CHANGE
> Jan 14 17:29:36 corosync [CLM ] New Configuration:
> Jan 14 17:29:36 corosync [CLM ] r(0) ip(10.3.94.40)
> Jan 14 17:29:36 corosync [CLM ] r(0) ip(10.3.94.50)
> Jan 14 17:29:36 corosync [CLM ] r(0) ip(10.3.94.51)
> Jan 14 17:29:36 corosync [CLM ] r(0) ip(10.3.94.53)
> Jan 14 17:29:36 corosync [CLM ] Members Left:
> Jan 14 17:29:36 corosync [CLM ] Members Joined:
> Jan 14 17:29:36 corosync [TOTEM ] A processor joined or left the membership and a new membership was formed.
> Jan 14 17:29:36 corosync [CPG ] chosen downlist: sender r(0) ip(10.3.94.53) ; members(old:7 left:3)
> Jan 14 17:29:36 corosync [MAIN ] Completed service synchronization, ready to provide service.
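
As a rough cross-check of the membership entries in the log above, here is a minimal Python sketch that extracts the "[QUORUM] Members[N]:" entries and compares each count against a simple majority. It assumes 8 expected votes with one vote per node (cluster.conf is not shown in this thread) and the usual floor(votes / 2) + 1 majority rule. The names majority and membership_history are hypothetical helpers for illustration, not part of corosync or cman.

```python
import re
from typing import List

# Matches quorum membership entries such as
# "Jan 14 17:29:36 corosync [QUORUM] Members[4]: 1 2 4 8".
MEMBERS_RE = re.compile(r"\[QUORUM\s*\]\s+Members\[(\d+)\]:\s*((?:\d+\s*)+)")


def majority(expected_votes: int) -> int:
    """Smallest vote count that is a strict majority (assumes one vote per node)."""
    return expected_votes // 2 + 1


def membership_history(log_text: str) -> List[List[int]]:
    """Return the node-ID list of every '[QUORUM] Members[N]:' entry, in log order."""
    history = []
    for match in MEMBERS_RE.finditer(log_text):
        count = int(match.group(1))
        ids = [int(tok) for tok in match.group(2).split()]
        # Trust the announced count so trailing digits are never picked up.
        history.append(ids[:count])
    return history


# Entries copied from the log above; 8 expected votes is an assumption.
sample = """\
Jan 14 17:25:34 corosync [QUORUM] Members[7]: 1 2 3 4 5 6 8
Jan 14 17:29:36 corosync [QUORUM] Members[6]: 1 2 4 5 6 8
Jan 14 17:29:36 corosync [QUORUM] Members[5]: 1 2 4 5 8
Jan 14 17:29:36 corosync [QUORUM] Members[4]: 1 2 4 8
"""

needed = majority(8)
for members in membership_history(sample):
    state = "quorate" if len(members) >= needed else "NOT quorate"
    print(len(members), members, state)
```

Under those assumptions the threshold is 5 votes, so the final Members[4] entry is below quorum. The sketch only compares counts; it does not model the order in which cman logs its messages within the same second.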
> 
> 
> But I can't get the cluster back up.
>
> I tried running this on each node:
>
> /etc/init.d/cman restart
> Starting cluster:
> Checking if cluster has been disabled at boot... [ OK ]
> Checking Network Manager... [ OK ]
> Global setup... [ OK ]
> Loading kernel modules... [ OK ]
> Mounting configfs... [ OK ]
> Starting cman... [ OK ]
> Waiting for quorum... Timed-out waiting for cluster
> 
> 
> 
> corosync log of node1 when restarting cman:
> 
> Jan 14 18:04:10 corosync [SERV ] Unloading all Corosync service engines.
> Jan 14 18:04:10 corosync [SERV ] Service engine unloaded: corosync extended virtual synchrony service
> Jan 14 18:04:10 corosync [SERV ] Service engine unloaded: corosync configuration service
> Jan 14 18:04:10 corosync [SERV ] Service engine unloaded: corosync cluster closed process group service v1.01
> Jan 14 18:04:10 corosync [SERV ] Service engine unloaded: corosync cluster config database access v1.01
> Jan 14 18:04:10 corosync [SERV ] Service engine unloaded: corosync profile loading service
> Jan 14 18:04:10 corosync [SERV ] Service engine unloaded: openais cluster membership service B.01.01
> Jan 14 18:04:10 corosync [SERV ] Service engine unloaded: openais checkpoint service B.01.01
> Jan 14 18:04:10 corosync [SERV ] Service engine unloaded: openais event service B.01.01
> Jan 14 18:04:10 corosync [SERV ] Service engine unloaded: openais distributed locking service B.03.01
> Jan 14 18:04:10 corosync [SERV ] Service engine unloaded: openais message service B.03.01
> Jan 14 18:04:10 corosync [SERV ] Service engine unloaded: corosync CMAN membership service 2.90
> Jan 14 18:04:10 corosync [SERV ] Service engine unloaded: corosync cluster quorum service v0.1
> Jan 14 18:04:10 corosync [SERV ] Service engine unloaded: openais timer service A.01.01
> Jan 14 18:04:10 corosync [MAIN ] Corosync Cluster Engine exiting with status 0 at main.c:1856.
> Jan 14 18:04:11 corosync [MAIN ] Corosync Cluster Engine ('1.4.4'): started and ready to provide service.
> Jan 14 18:04:11 corosync [MAIN ] Corosync built-in features: nss
> Jan 14 18:04:11 corosync [MAIN ] Successfully read config from /etc/cluster/cluster.conf
> Jan 14 18:04:11 corosync [MAIN ] Successfully parsed cman config
> Jan 14 18:04:11 corosync [MAIN ] Successfully configured openais services to load
> Jan 14 18:04:11 corosync [TOTEM ] Initializing transport (UDP/IP Multicast).
> Jan 14 18:04:11 corosync [TOTEM ] Initializing transmit/receive security: libtomcrypt SOBER128/SHA1HMAC (mode 0).
> Jan 14 18:04:11 corosync [TOTEM ] The network interface [10.3.94.49] is now up.
> Jan 14 18:04:11 corosync [QUORUM] Using quorum provider quorum_cman
> Jan 14 18:04:11 corosync [SERV ] Service engine loaded: corosync cluster quorum service v0.1
> Jan 14 18:04:11 corosync [CMAN ] CMAN 1352871249 (built Nov 14 2012 06:34:12) started
> Jan 14 18:04:11 corosync [SERV ] Service engine loaded: corosync CMAN membership service 2.90
> Jan 14 18:04:11 corosync [SERV ] Service engine loaded: openais cluster membership service B.01.01
> Jan 14 18:04:11 corosync [SERV ] Service engine loaded: openais event service B.01.01
> Jan 14 18:04:11 corosync [SERV ] Service engine loaded: openais checkpoint service B.01.01
> Jan 14 18:04:11 corosync [SERV ] Service engine loaded: openais message service B.03.01
> Jan 14 18:04:11 corosync [SERV ] Service engine loaded: openais distributed locking service B.03.01
> Jan 14 18:04:11 corosync [SERV ] Service engine loaded: openais timer service A.01.01
> Jan 14 18:04:11 corosync [SERV ] Service engine loaded: corosync extended virtual synchrony service
> Jan 14 18:04:11 corosync [SERV ] Service engine loaded: corosync configuration service
> Jan 14 18:04:11 corosync [SERV ] Service engine loaded: corosync cluster closed process group service v1.01
> Jan 14 18:04:11 corosync [SERV ] Service engine loaded: corosync cluster config database access v1.01
> Jan 14 18:04:11 corosync [SERV ] Service engine loaded: corosync profile loading service
> Jan 14 18:04:11 corosync [QUORUM] Using quorum provider quorum_cman
> Jan 14 18:04:11 corosync [SERV ] Service engine loaded: corosync cluster quorum service v0.1
> Jan 14 18:04:11 corosync [MAIN ] Compatibility mode set to whitetank. Using V1 and V2 of the synchronization engine.
> Jan 14 18:04:11 corosync [CLM ] CLM CONFIGURATION CHANGE
> Jan 14 18:04:11 corosync [CLM ] New Configuration:
> Jan 14 18:04:11 corosync [CLM ] Members Left:
> Jan 14 18:04:11 corosync [CLM ] Members Joined:
> Jan 14 18:04:11 corosync [CLM ] CLM CONFIGURATION CHANGE
> Jan 14 18:04:11 corosync [CLM ] New Configuration:
> Jan 14 18:04:11 corosync [CLM ] r(0) ip(10.3.94.49)
> Jan 14 18:04:11 corosync [CLM ] Members Left:
> Jan 14 18:04:11 corosync [CLM ] Members Joined:
> Jan 14 18:04:11 corosync [CLM ] r(0) ip(10.3.94.49)
> Jan 14 18:04:11 corosync [TOTEM ] A processor joined or left the membership and a new membership was formed.
> Jan 14 18:04:11 corosync [QUORUM] Members[1]: 6
> Jan 14 18:04:11 corosync [QUORUM] Members[1]: 6
> Jan 14 18:04:11 corosync [CPG ] chosen downlist: sender r(0) ip(10.3.94.49) ; members(old:0 left:0)
> Jan 14 18:04:11 corosync [MAIN ] Completed service synchronization, ready to provide service.
> 
> 
> corosync log of node2 when restarting cman:
> 
> Jan 14 18:05:30 corosync [SERV ] Service engine unloaded: corosync extended virtual synchrony service
> Jan 14 18:05:30 corosync [SERV ] Service engine unloaded: corosync configuration service
> Jan 14 18:05:30 corosync [SERV ] Service engine unloaded: corosync cluster closed process group service v1.01
> Jan 14 18:05:30 corosync [SERV ] Service engine unloaded: corosync cluster config database access v1.01
> Jan 14 18:05:30 corosync [SERV ] Service engine unloaded: corosync profile loading service
> Jan 14 18:05:30 corosync [SERV ] Service engine unloaded: openais cluster membership service B.01.01
> Jan 14 18:05:30 corosync [SERV ] Service engine unloaded: openais checkpoint service B.01.01
> Jan 14 18:05:30 corosync [SERV ] Service engine unloaded: openais event service B.01.01
> Jan 14 18:05:30 corosync [SERV ] Service engine unloaded: openais distributed locking service B.03.01
> Jan 14 18:05:30 corosync [SERV ] Service engine unloaded: openais message service B.03.01
> Jan 14 18:05:30 corosync [SERV ] Service engine unloaded: corosync CMAN membership service 2.90
> Jan 14 18:05:30 corosync [SERV ] Service engine unloaded: corosync cluster quorum service v0.1
> Jan 14 18:05:30 corosync [SERV ] Service engine unloaded: openais timer service A.01.01
> Jan 14 18:05:30 corosync [MAIN ] Corosync Cluster Engine exiting with status 0 at main.c:1856.
> Jan 14 18:05:31 corosync [MAIN ] Corosync Cluster Engine ('1.4.4'): started and ready to provide service.
> Jan 14 18:05:31 corosync [MAIN ] Corosync built-in features: nss
> Jan 14 18:05:31 corosync [MAIN ] Successfully read config from /etc/cluster/cluster.conf
> Jan 14 18:05:31 corosync [MAIN ] Successfully parsed cman config
> Jan 14 18:05:31 corosync [MAIN ] Successfully configured openais services to load
> Jan 14 18:05:31 corosync [TOTEM ] Initializing transport (UDP/IP Multicast).
> Jan 14 18:05:31 corosync [TOTEM ] Initializing transmit/receive security: libtomcrypt SOBER128/SHA1HMAC (mode 0).
> Jan 14 18:05:31 corosync [TOTEM ] The network interface [10.3.94.50] is now up.
> Jan 14 18:05:31 corosync [QUORUM] Using quorum provider quorum_cman
> Jan 14 18:05:31 corosync [SERV ] Service engine loaded: corosync cluster quorum service v0.1
> Jan 14 18:05:31 corosync [CMAN ] CMAN 1352871249 (built Nov 14 2012 06:34:12) started
> Jan 14 18:05:31 corosync [SERV ] Service engine loaded: corosync CMAN membership service 2.90
> Jan 14 18:05:31 corosync [SERV ] Service engine loaded: openais cluster membership service B.01.01
> Jan 14 18:05:31 corosync [SERV ] Service engine loaded: openais event service B.01.01
> Jan 14 18:05:31 corosync [SERV ] Service engine loaded: openais checkpoint service B.01.01
> Jan 14 18:05:31 corosync [SERV ] Service engine loaded: openais message service B.03.01
> Jan 14 18:05:31 corosync [SERV ] Service engine loaded: openais distributed locking service B.03.01
> Jan 14 18:05:31 corosync [SERV ] Service engine loaded: openais timer service A.01.01
> Jan 14 18:05:31 corosync [SERV ] Service engine loaded: corosync extended virtual synchrony service
> Jan 14 18:05:31 corosync [SERV ] Service engine loaded: corosync configuration service
> Jan 14 18:05:31 corosync [SERV ] Service engine loaded: corosync cluster closed process group service v1.01
> Jan 14 18:05:31 corosync [SERV ] Service engine loaded: corosync cluster config database access v1.01
> Jan 14 18:05:31 corosync [SERV ] Service engine loaded: corosync profile loading service
> Jan 14 18:05:31 corosync [QUORUM] Using quorum provider quorum_cman
> Jan 14 18:05:31 corosync [SERV ] Service engine loaded: corosync cluster quorum service v0.1
> Jan 14 18:05:31 corosync [MAIN ] Compatibility mode set to whitetank. Using V1 and V2 of the synchronization engine.
> Jan 14 18:05:31 corosync [CLM ] CLM CONFIGURATION CHANGE
> Jan 14 18:05:31 corosync [CLM ] New Configuration:
> Jan 14 18:05:31 corosync [CLM ] Members Left:
> Jan 14 18:05:31 corosync [CLM ] Members Joined:
> Jan 14 18:05:31 corosync [CLM ] CLM CONFIGURATION CHANGE
> Jan 14 18:05:31 corosync [CLM ] New Configuration:
> Jan 14 18:05:31 corosync [CLM ] r(0) ip(10.3.94.50)
> Jan 14 18:05:31 corosync [CLM ] Members Left:
> Jan 14 18:05:31 corosync [CLM ] Members Joined:
> Jan 14 18:05:31 corosync [CLM ] r(0) ip(10.3.94.50)
> Jan 14 18:05:31 corosync [TOTEM ] A processor joined or left the membership and a new membership was formed.
> Jan 14 18:05:31 corosync [QUORUM] Members[1]: 4
> Jan 14 18:05:31 corosync [QUORUM] Members[1]: 4
> Jan 14 18:05:31 corosync [CPG ] chosen downlist: sender r(0) ip(10.3.94.50) ; members(old:0 left:0)
> Jan 14 18:05:31 corosync [MAIN ] Completed service synchronization, ready to provide service.
> 
> 
> Any ideas?
> 
> 
> 
> 
> _______________________________________________ 
> pve-devel mailing list 
> pve-devel at pve.proxmox.com 
> http://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-devel 