[pve-devel] corosync problems - need help

Alexandre DERUMIER aderumier at odiso.com
Sun Sep 14 08:18:09 CEST 2014


Hi, 

I have a corosync problem on my production cluster,
and I don't know how to debug it.



The cluster has 12 nodes,
and multicast is working fine.


On this cluster, 2 nodes show "corosync [TOTEM ] Retransmit List" messages.


All nodes are shown
-------------------
# cman_tool nodes
Node  Sts   Inc   Joined               Name
   1   M  76636   2014-09-08 12:23:04  kvm6
   2   M  76636   2014-09-08 12:23:04  kvm4
   3   M  76636   2014-09-08 12:23:04  kvm3
   4   M  76636   2014-09-08 12:23:04  kvm2
   5   M  76636   2014-09-08 12:23:04  kvm5
   6   M  76672   2014-09-12 16:52:08  kvm1
   7   M  76636   2014-09-08 12:23:04  kvm8
   8   M  76636   2014-09-08 12:23:04  kvm7
   9   M  76636   2014-09-08 12:23:04  kvm9
  10   M  76636   2014-09-08 12:23:04  kvm10
  11   M  76944   2014-09-14 08:08:18  kvm11
  12   M      4   2014-09-03 06:57:27  kvm12
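
To make the table above easier to scan, here is a minimal Python sketch that parses `cman_tool nodes` output and lists the nodes whose Inc value differs from the most common one. The function names (parse_nodes, odd_ones_out) are my own and are not part of cman, the sample is only a few rows copied from the table, and I am not certain that a differing Inc value by itself indicates a problem.

```python
from collections import Counter

# A few rows copied from the `cman_tool nodes` output above.
SAMPLE = """\
Node  Sts   Inc   Joined               Name
   1   M  76636   2014-09-08 12:23:04  kvm6
   2   M  76636   2014-09-08 12:23:04  kvm4
   6   M  76672   2014-09-12 16:52:08  kvm1
  11   M  76944   2014-09-14 08:08:18  kvm11
  12   M      4   2014-09-03 06:57:27  kvm12
"""


def parse_nodes(text):
    """Return one dict per node row; the header line is skipped."""
    rows = []
    for line in text.splitlines()[1:]:
        parts = line.split()
        if len(parts) != 6:
            continue  # ignore blank or malformed lines
        node_id, status, inc, date, time, name = parts
        rows.append({
            "id": int(node_id),
            "status": status,
            "inc": int(inc),
            "joined": f"{date} {time}",
            "name": name,
        })
    return rows


def odd_ones_out(rows):
    """Names of nodes whose Inc differs from the most common Inc."""
    common_inc, _ = Counter(r["inc"] for r in rows).most_common(1)[0]
    return [r["name"] for r in rows if r["inc"] != common_inc]


nodes = parse_nodes(SAMPLE)
print(odd_ones_out(nodes))
```

On the sample rows this singles out kvm1, kvm11 and kvm12, which are the nodes that joined at a different time from the rest.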


I have quorum
--------------
# cman_tool status
Version: 6.2.0
Config Version: 12
Cluster Name: odiso
Cluster Id: 3337
Cluster Member: Yes
Cluster Generation: 76944
Membership state: Cluster-Member
Nodes: 12
Expected votes: 12
Total votes: 12
Node votes: 1
Quorum: 7  
Active subsystems: 5
Flags: 
Ports Bound: 0  
Node name: kvm12
Node ID: 12
Multicast addresses: 239.192.13.22 
Node addresses: 10.3.94.59 
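
The "Quorum: 7" value above is consistent with a simple majority of the expected votes. A minimal Python sketch of that arithmetic, assuming quorum is computed as more than half of the expected votes (the quorum function is my own helper, not a cman API):

```python
def quorum(expected_votes: int) -> int:
    """Simple majority: strictly more than half of the expected votes."""
    return expected_votes // 2 + 1


expected_votes = 12  # "Expected votes" from the output above
total_votes = 12     # "Total votes" from the output above

needed = quorum(expected_votes)
print(needed)                 # 7
print(total_votes >= needed)  # True
```

So with all 12 votes present, the cluster is well above the 7 votes it needs, and loss of quorum alone does not seem to explain the problem.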




But I can't write anything to pmxcfs on any node (reading is OK),

and I see a lot of errors like this:
kvm1 pmxcfs[65403]: [dcdb] notice: cpg_join retry 32310
kvm1 pmxcfs[65403]: [dcdb] notice: cpg_join retry 32320
kvm1 pmxcfs[65403]: [dcdb] notice: cpg_join retry 32330



Any ideas?

I would like to try changing the corosync window_size, but how can I do that online?

(Also, /etc/init.d/cman stop hangs.)


