[pve-devel] corosync problems - need help
Alexandre DERUMIER
aderumier at odiso.com
Wed Sep 17 08:11:06 CEST 2014
One last thing I haven't tested
is updating libqb, which is really old on wheezy (0.11).
The latest version is 0.17,
and I have seen bugs related to corosync hanging because of libqb:
https://bugs.launchpad.net/ubuntu/+source/libqb/+bug/1341496
I'll try to backport the package from Debian sid.
----- Original Message -----
From: "Alexandre DERUMIER" <aderumier at odiso.com>
To: "Dietmar Maurer" <dietmar at proxmox.com>
Cc: pve-devel at pve.proxmox.com
Sent: Tuesday, 16 September 2014 23:56:09
Subject: Re: [pve-devel] corosync problems - need help
Some news:
I finally stopped and restarted the node (and had to shut down the VM too :( ),
and it finally joined the cluster correctly.
So I really don't know what could have been hanging... Damn...
BTW, have you already had a look at corosync2 + pacemaker? (It seems this is the supported model in RHEL 7.)
I know that pacemaker replaces rgmanager; I don't know whether corosync2 would require a lot of changes in pmxcfs.
----- Original Message -----
From: "Alexandre DERUMIER" <aderumier at odiso.com>
To: "Dietmar Maurer" <dietmar at proxmox.com>
Cc: pve-devel at pve.proxmox.com
Sent: Tuesday, 16 September 2014 08:33:56
Subject: Re: [pve-devel] corosync problems - need help
>>First, int is 32 bits. Second, integer overflow does not raise an exception in C.
>>So that cannot be the reason.
OK, sorry. (I thought of this because in the log I saw the retry counter increment up to around 65000, then no more log entries.)
What I did yesterday:
- updated all nodes to the 3.10 kernel
- upgraded openvswitch to 2.3.0 (I had seen a high-CPU bug, and 2.3 fixes it).
But it didn't help.
I was able to bring this node back into the cluster for around 5 minutes, then it began to hang again.
Today, I'll try shutting down corosync on all servers,
then starting corosync on this node and joining the other nodes.
(I want to be sure that it's not because I have 2 more nodes in my cluster.)
I'll keep you posted.
----- Original Message -----
From: "Dietmar Maurer" <dietmar at proxmox.com>
To: "Alexandre DERUMIER" <aderumier at odiso.com>
Cc: pve-devel at pve.proxmox.com
Sent: Tuesday, 16 September 2014 07:51:07
Subject: RE: [pve-devel] corosync problems - need help
> with retry around 65000 (16bits)
>
>
>
> and
> int retries = 0;
> result = cpg_join(dfsm->cpg_handle, &dfsm->cpg_group_name);
> if (result == CPG_ERR_TRY_AGAIN) {
>     nanosleep(&tvreq, NULL);
>     ++retries;
>     if ((retries % 10) == 0)
>         cfs_dom_message(dfsm->log_domain, "cpg_join retry %d",
>                         retries);
>     goto loop;
> }
>
>
> Could it be related to the integer type of retries?
First, int is 32 bits. Second, integer overflow does not raise an exception in C.
So that cannot be the reason.
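To illustrate the point about the counter type, here is a minimal sketch, assuming a platform where int is at least 32 bits wide (as on the usual Linux x86 targets). The function names are hypothetical and are not part of pmxcfs. It shows that an int counter keeps counting past 65535, while a hypothetical 16-bit counter would silently wrap around instead of stopping the program:

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Width of int in bits on this platform (assumed to be 32 in the thread). */
int int_width_bits(void)
{
    return (int)(sizeof(int) * CHAR_BIT);
}

/* Count n retries with a plain int, like the "retries" variable in the
 * quoted loop. For n well below INT_MAX this never overflows. */
int count_retries_int(int n)
{
    int retries = 0;
    for (int i = 0; i < n; i++)
        ++retries;
    return retries;
}

/* Same count with a hypothetical 16-bit unsigned counter. It wraps
 * modulo 65536; nothing is raised and execution simply continues. */
uint16_t count_retries_u16(int n)
{
    uint16_t retries = 0;
    for (int i = 0; i < n; i++)
        retries = (uint16_t)(retries + 1);
    return retries;
}

/* Optional self-check that can be called from a main() function. */
void check_counter_behaviour(void)
{
    assert(int_width_bits() >= 32);
    assert(count_retries_int(70000) == 70000);
    assert(count_retries_u16(70000) == 70000 - 65536);
}
```

Note that overflowing a signed int past INT_MAX is undefined behavior in C rather than an exception; the sketch deliberately stays far below that limit. Either way, a log that stops at around 65000 retries is not explained by the counter type.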
_______________________________________________
pve-devel mailing list
pve-devel at pve.proxmox.com
http://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-devel