[pve-devel] corosync problems - need help

Alexandre DERUMIER aderumier at odiso.com
Tue Sep 16 08:33:56 CEST 2014


>>First, int is 32bit. Second, integer overflow does not raise an exception in C.
>>So that cannot be the reason.

OK, sorry. (I thought of this because in the log I saw the retry counter increment up to around 65000, then no more log entries.)


What I did yesterday:

- updated all nodes to the 3.10 kernel
- upgraded openvswitch to 2.3.0 (I had seen a high-CPU bug, and 2.3 fixes it).


But it didn't help.

I was able to bring this node back into the cluster for around 5 minutes, then it began to hang again.


Today, I'll try to shut down corosync on all servers, then start corosync on this node and join the other nodes to it.

(I want to be sure that it's not because I have 2 more nodes in my cluster.)


I'll keep you posted.

----- Original Message -----

From: "Dietmar Maurer" <dietmar at proxmox.com>
To: "Alexandre DERUMIER" <aderumier at odiso.com>
Cc: pve-devel at pve.proxmox.com
Sent: Tuesday, 16 September 2014 07:51:07
Subject: RE: [pve-devel] corosync problems - need help

> with retries around 65000 (16 bits)
>
>
>
> and
> int retries = 0;
> result = cpg_join(dfsm->cpg_handle, &dfsm->cpg_group_name);
> if (result == CPG_ERR_TRY_AGAIN) {
>     nanosleep(&tvreq, NULL);
>     ++retries;
>     if ((retries % 10) == 0)
>         cfs_dom_message(dfsm->log_domain, "cpg_join retry %d",
>                         retries);
>     goto loop;
> }
>
>
> could it be related to the integer type of retries?

First, int is 32bit. Second, integer overflow does not raise an exception in C.
So that cannot be the reason.
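
To illustrate the point, here is a minimal sketch. It is not pmxcfs code: the helper names count_retries_int and count_retries_u16 are hypothetical, and it assumes a platform where int is 32 bits wide, as on the usual Linux targets. A plain int counter like retries passes 65535 without any special behaviour; only a counter explicitly declared as 16 bits wide would wrap around there, and even that wrap is silent.

```c
#include <stdint.h>

/* Hypothetical helper (not from pmxcfs): increment a plain int once per
 * attempt, the way the quoted loop increments `retries`. */
int count_retries_int(int attempts)
{
    int retries = 0;
    for (int i = 0; i < attempts; i++)
        ++retries;
    return retries;
}

/* Same loop with an explicitly 16-bit unsigned counter. Unsigned
 * arithmetic wraps modulo 65536; no error or exception is raised. */
uint16_t count_retries_u16(int attempts)
{
    uint16_t retries = 0;
    for (int i = 0; i < attempts; i++)
        ++retries;
    return retries;
}
```

Under that assumption, count_retries_int(70000) returns 70000, while count_retries_u16(70000) returns 4464 (70000 modulo 65536). Neither call fails, so a counter reaching about 65000 does not by itself explain why the log messages stop.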


