[PVE-User] After update Ceph monitor shows wrong version in UI and is down and out of quorum

Uwe Sauter uwe.sauter.de at gmail.com
Tue Jan 5 20:29:59 CET 2021


Frank,

On 05.01.21 at 20:24, Frank Thommen wrote:
> Hi Uwe,
> 
>> did you look into the log of MON and OSD?
> 
> I can't see any specific MON and OSD logs. However, the log available in the UI (Ceph -> Log) has lots of messages 
> regarding scrubbing but no messages regarding issues with starting the monitor.
> 

On each host the logs should be in /var/log/ceph. These should be rotated (see /etc/logrotate.d/ceph-common for details).
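A minimal sketch for reading those logs from a shell. It assumes the default Ceph log naming, /var/log/ceph/ceph-<type>.<id>.log, and that the monitor id is the node's hostname (the pveceph default). The helper name show_ceph_log and the CEPH_LOG_DIR override are hypothetical and not part of Ceph or PVE:

```shell
#!/bin/sh
# Hypothetical helper (not part of Ceph/PVE): print the tail of a Ceph daemon log.
# Assumes the default naming /var/log/ceph/ceph-<type>.<id>.log, e.g.
# ceph-mon.<hostname>.log for a PVE-created monitor, ceph-osd.<N>.log for OSD N.
show_ceph_log() {
    daemon="$1"                              # daemon type: mon, osd, mgr, ...
    id="$2"                                  # daemon id: hostname for mon/mgr, number for OSDs
    count="${3:-50}"                         # number of trailing lines to print
    logdir="${CEPH_LOG_DIR:-/var/log/ceph}"  # hypothetical override, handy for testing
    log="$logdir/ceph-$daemon.$id.log"
    if [ ! -r "$log" ]; then
        echo "no readable log at $log" >&2
        return 1
    fi
    tail -n "$count" "$log"
}
```

For example, run `show_ceph_log mon "$(hostname -s)"` on the affected node, or `show_ceph_log osd 3 100` for OSD 3. Older, rotated files sit next to the current log with a numeric suffix (possibly compressed, depending on /etc/logrotate.d/ceph-common). If a daemon fails before it writes to its own log, `journalctl -u ceph-mon@<id>.service` shows what systemd recorded for that unit.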

Regards,

	Uwe



> 
>> Can you provide the list of installed packages of the affected host and the rest of the cluster?
> 
> Let me compile the lists and post them somewhere. They are quite long.
> 
>>
>> Is the output of "ceph status" the same for all hosts?
> 
> Yes.
> 
> Frank
> 
>>
>>
>> Regards,
>>
>>      Uwe
>>
>> On 05.01.21 at 20:01, Frank Thommen wrote:
>>>
>>> On 04.01.21 12:44, Frank Thommen wrote:
>>>>
>>>> Dear all,
>>>>
>>>> One of our three PVE hypervisors in the cluster crashed (it was fenced successfully) and rebooted automatically. I 
>>>> took the chance to do a complete dist-upgrade and rebooted again.
>>>>
>>>> The PVE Ceph dashboard now reports that
>>>>
>>>>    * the monitor on the host is down (out of quorum), and
>>>>    * "A newer version was installed but old version still running, please restart"
>>>>
>>>> The Ceph UI reports monitor version 14.2.11, while in fact 14.2.16 is installed. The hypervisor has been rebooted 
>>>> twice since the upgrade, so it should be practically impossible for the old version to still be running.
>>>>
>>>> `systemctl restart ceph.target` and restarting the monitor through the PVE Ceph UI didn't help. The hypervisor is 
>>>> running PVE 6.3-3 (the other two are running 6.3-2 with monitor 14.2.15).
>>>>
>>>> What to do in this situation?
>>>>
>>>> I am happy with either UI or command-line instructions, but I have no Ceph experience besides setting it up 
>>>> following the PVE instructions.
>>>>
>>>> Any help or hint is appreciated.
>>>> Cheers, Frank
>>>
>>> In an attempt to fix the issue, I destroyed the monitor through the UI and recreated it. Unfortunately, it still 
>>> cannot be started. A popup tells me that the monitor has been started, but the overview still shows "stopped" and there 
>>> is no version number anymore.
>>>
>>> Then I stopped and started Ceph on the node (`pveceph stop; pveceph start`), which resulted in a degraded cluster (1 
>>> host down, 7 of 21 OSDs down). OSDs cannot be started through the UI either.
>>>
>>> I feel extremely uncomfortable with this situation and would appreciate any hint as to how I should proceed with the 
>>> problem.
>>>
>>> Cheers, Frank
>>>
>>> _______________________________________________
>>> pve-user mailing list
>>> pve-user at lists.proxmox.com
>>> https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-user
>>
> 


