[PVE-User] Recurring crashes after cluster upgrade from 5 to 6

Alexandre DERUMIER aderumier at odiso.com
Fri Nov 22 07:35:31 CET 2019


>>Anyway, just for my education, can someone here briefly explain
>>what the problem was (unicast management?), or point to a good
>>link about this "behavior"? Thanks!

There were multiple bugs in corosync 3
(hard to explain, and very difficult to debug).

More details here:

https://github.com/kronosnet/kronosnet/commit/f1a5de2141a73716c09566f294e3873add5c3ff3

https://github.com/kronosnet/kronosnet/commit/1338058fa634b08eee7099c0614e8076267501ff
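
To check a node against the package versions quoted further down in this thread (corosync 3.0.2-pve4, libknet1 1.13-pve1), something like the following could be used. It is only a sketch, assuming a Debian-based PVE node where dpkg and dpkg-query are available; the helper names version_ok and check_pkg are made up for illustration.

```shell
#!/bin/sh
# Minimum package versions named in this thread.
MIN_COROSYNC="3.0.2-pve4"
MIN_LIBKNET="1.13-pve1"

# version_ok INSTALLED MINIMUM
# Succeeds if INSTALLED >= MINIMUM under Debian version ordering.
version_ok() {
    dpkg --compare-versions "$1" ge "$2"
}

# check_pkg NAME MINIMUM
# Prints whether the installed package NAME is at least MINIMUM.
check_pkg() {
    installed=$(dpkg-query -W -f='${Version}' "$1" 2>/dev/null) || installed=""
    if [ -z "$installed" ]; then
        echo "$1: not installed"
    elif version_ok "$installed" "$2"; then
        echo "$1: $installed (ok, >= $2)"
    else
        echo "$1: $installed (older than $2)"
    fi
}

check_pkg corosync "$MIN_COROSYNC"
check_pkg libknet1 "$MIN_LIBKNET"
```

Run it on every node. If a package is reported as older, upgrade it and then restart corosync (systemctl restart corosync), as was noted for the earlier test package further down.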


----- Original Message -----
From: "Hervé Ballans" <herve.ballans at ias.u-psud.fr>
To: "proxmoxve" <pve-user at pve.proxmox.com>
Sent: Wednesday, 20 November 2019 16:12:53
Subject: Re: [PVE-User] Recurring crashes after cluster upgrade from 5 to 6

Dear all, 

Since we upgraded to these versions (15 days ago), we no longer
encounter the problem :)

We are still waiting a few more days to be sure of stability, but it
looks good!

Anyway, just for my education, can someone here briefly explain
what the problem was (unicast management?), or point to a good
link about this "behavior"? Thanks!

Cheers, 
rv 

On 08/11/2019 at 11:18, Alexandre DERUMIER wrote:
> Hi, 
> 
> have you upgraded all your nodes to
> 
> corosync 3.0.2-pve4 
> libknet1:amd64 1.13-pve1 
> 
> 
> ? 
> 
> (available in the pve-no-subscription and pve-enterprise repos)
> 
> ----- Original Message -----
> From: "Eneko Lacunza" <elacunza at binovo.es>
> To: "proxmoxve" <pve-user at pve.proxmox.com>
> Sent: Thursday, 7 November 2019 15:35:38
> Subject: Re: [PVE-User] Recurring crashes after cluster upgrade from 5 to 6
> 
> Hi all, 
> 
> We updated our office cluster to get the patch, but had a node reboot on
> 31 October. The node was fenced and rebooted, and everything continued
> working OK.
>
> Is anyone still experiencing this problem?
> 
> Cheers 
> Eneko 
> 
> On 2/10/19 at 18:09, Hervé Ballans wrote:
>> Hi Alexandre, 
>> 
>> We encounter exactly the same problem as Laurent Caron (after upgrading
>> from 5 to 6).
>>
>> So I tried your patch 3 days ago, but unfortunately the problem still
>> occurs...
>>
>> This is a really annoying problem, since sometimes all the PVE nodes
>> of our cluster reboot almost simultaneously!
>> At the same time, we don't encounter this problem with our other
>> PVE cluster, which is on version 5.
>> (And obviously we are waiting for a solution and a stable situation
>> before upgrading it!)
>>
>> It seems to be a unicast or corosync 3 problem, but the logs are not
>> very verbose at the time of the reboot...
>>
>> Is there anything else to test?
>> 
>> Regards, 
>> Hervé 
>> 
>> On 20/09/2019 at 17:00, Alexandre DERUMIER wrote:
>>> Hi, 
>>> 
>>> a patch is available in pvetest 
>>> 
>>> http://download.proxmox.com/debian/pve/dists/buster/pvetest/binary-amd64/libknet1_1.11-pve2_amd64.deb 
>>> 
>>> 
>>> can you test it?
>>>
>>> (you need to restart corosync after installing the deb)
>>> 
>>> 
>>> ----- Original Message -----
>>> From: "Laurent CARON" <lcaron at unix-scripts.info>
>>> To: "proxmoxve" <pve-user at pve.proxmox.com>
>>> Sent: Monday, 16 September 2019 09:55:34
>>> Subject: [PVE-User] Recurring crashes after cluster upgrade from 5 to 6
>>> 
>>> Hi, 
>>> 
>>> 
>>> After upgrading our 4-node cluster from PVE 5 to 6, we are experiencing
>>> recurring crashes (once every 2 days).
>>>
>>> Those crashes seem related to corosync.
>>>
>>> Since numerous users are reporting such issues (broken cluster after
>>> upgrade, instabilities, ...), I wonder whether it is possible to downgrade
>>> corosync to version 2.4.4 without impacting functionality?
>>> 
>>> Basic steps would be: 
>>> 
>>> On all nodes 
>>> 
>>> # systemctl stop pve-ha-lrm 
>>> 
>>> Once done, on all nodes: 
>>> 
>>> # systemctl stop pve-ha-crm 
>>> 
>>> Once done, on all nodes: 
>>> 
>>> # apt-get install corosync=2.4.4-pve1 libcorosync-common4=2.4.4-pve1 \
>>>     libcmap4=2.4.4-pve1 libcpg4=2.4.4-pve1 libqb0=1.0.3-1~bpo9 \
>>>     libquorum5=2.4.4-pve1 libvotequorum8=2.4.4-pve1
>>> 
>>> Then, once corosync has been downgraded, on all nodes 
>>> 
>>> # systemctl start pve-ha-lrm 
>>> # systemctl start pve-ha-crm 
>>> 
>>> Would that work?
>>> 
>>> Thanks 
>>> 
>>> _______________________________________________ 
>>> pve-user mailing list 
>>> pve-user at pve.proxmox.com 
>>> https://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user 



