[PVE-User] After update Ceph monitor shows wrong version in UI and is down and out of quorum

Frank Thommen f.thommen at dkfz-heidelberg.de
Tue Jan 5 20:24:56 CET 2021

Hi Uwe,

> did you look into the log of MON and OSD?

I can't find any specific MON or OSD log files. However, the log 
available in the UI (Ceph -> Log) contains many messages about 
scrubbing, but none about problems starting the monitor.
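As far as I understand, the per-daemon logs live on the node itself rather than in the UI. A sketch of where I would look, assuming default paths; `pve1` is a placeholder for the affected node's hostname:

```shell
# Per-daemon Ceph logs live under /var/log/ceph/ by default
ls -l /var/log/ceph/
tail -n 100 /var/log/ceph/ceph-mon.pve1.log   # 'pve1' is a placeholder hostname

# The systemd journal of the monitor unit often shows why startup fails
journalctl -u ceph-mon@pve1 --since "1 hour ago"
```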

> Can you provide the list of 
> installed packages of the affected host and the rest of the cluster?

Let me compile the lists and post them somewhere.  They are quite long.
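For reference, this is roughly how I would gather them on each node (a sketch, assuming standard PVE/Ceph packaging):

```shell
# PVE's own summary of the relevant package versions
pveversion -v

# Full list of installed Ceph-related packages on this node
dpkg -l | grep -i ceph
```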

> Is the output of "ceph status" the same for all hosts?


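To check that, I would run the following on each node and compare (a sketch; it assumes the ceph CLI is usable on every node):

```shell
# Cluster health and quorum as seen from this node
ceph status

# Versions of all *running* daemons -- this should show whether a
# monitor is really still running the old 14.2.11 binary
ceph versions
```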

> Regards,
>      Uwe
> Am 05.01.21 um 20:01 schrieb Frank Thommen:
>> On 04.01.21 12:44, Frank Thommen wrote:
>>> Dear all,
>>> one of our three PVE hypervisors in the cluster crashed (it was 
>>> fenced successfully) and rebooted automatically.  I took the chance 
>>> to do a complete dist-upgrade and rebooted again.
>>> The PVE Ceph dashboard now reports that
>>>    * the monitor on the host is down (out of quorum), and
>>>    * "A newer version was installed but old version still running, 
>>> please restart"
>>> The Ceph UI reports monitor version 14.2.11 while in fact 14.2.16 is 
>>> installed. The hypervisor has been rebooted twice since the upgrade, 
>>> so it should be basically impossible that the old version is still 
>>> running.
>>> `systemctl restart ceph.target` and restarting the monitor through 
>>> the PVE Ceph UI didn't help. The hypervisor is running PVE 6.3-3 (the 
>>> other two are running 6.3-2 with monitor 14.2.15).
>>> What to do in this situation?
>>> I am happy with either UI or command-line instructions, but I have no 
>>> Ceph experience beyond setting it up following the PVE instructions.
>>> Any help or hint is appreciated.
>>> Cheers, Frank
>> In an attempt to fix the issue I destroyed the monitor through the UI 
>> and recreated it.  Unfortunately, it still cannot be started.  A popup 
>> tells me that the monitor has been started, but the overview still 
>> shows "stopped" and there is no version number any more.
>> Then I stopped and started Ceph on the node (`pveceph stop; pveceph 
>> start`) which resulted in a degraded cluster (1 host down, 7 of 21 
>> OSDs down). OSDs cannot be started through the UI either.
>> I feel extremely uncomfortable with this situation and would 
>> appreciate any hint as to how I should proceed with the problem.
>> Cheers, Frank
>> _______________________________________________
>> pve-user mailing list
>> pve-user at lists.proxmox.com
>> https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-user