[PVE-User] After update Ceph monitor shows wrong version in UI and is down and out of quorum

Frank Thommen f.thommen at dkfz-heidelberg.de
Tue Jan 5 20:01:31 CET 2021


On 04.01.21 12:44, Frank Thommen wrote:
> 
> Dear all,
> 
> one of our three PVE hypervisors in the cluster crashed (it was fenced 
> successfully) and rebooted automatically.  I took the chance to do a 
> complete dist-upgrade and rebooted again.
> 
> The PVE Ceph dashboard now reports that
> 
>    * the monitor on the host is down (out of quorum), and
>    * "A newer version was installed but old version still running, 
> please restart"
> 
> The Ceph UI reports monitor version 14.2.11 while in fact 14.2.16 is 
> installed. The hypervisor has been rebooted twice since the upgrade, so 
> it should be basically impossible that the old version is still running.
> 
> `systemctl restart ceph.target` and restarting the monitor through the 
> PVE Ceph UI didn't help. The hypervisor is running PVE 6.3-3 (the other 
> two are running 6.3-2 with monitor 14.2.15).
> 
> What to do in this situation?
> 
> I am happy with either UI or commandline instructions, but I have no 
> Ceph experience besides setting it up following the PVE instructions.
> 
> Any help or hint is appreciated.
> Cheers, Frank

In an attempt to fix the issue I destroyed the monitor through the UI 
and recreated it.  Unfortunately it still cannot be started.  A popup 
tells me that the monitor has been started, but the overview still shows 
"stopped" and there is no longer a version number.

Then I stopped and started Ceph on the node (`pveceph stop; pveceph 
start`) which resulted in a degraded cluster (1 host down, 7 of 21 OSDs 
down). OSDs cannot be started through the UI either.
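
The state described above can also be inspected from the command line. 
The following is only a minimal sketch of read-only checks, not a fix. 
It assumes that the monitor ID equals the node's short hostname (the 
usual PVE setup), so that the systemd unit is `ceph-mon@<hostname>.service`. 
`mon_unit` is a hypothetical helper introduced here for illustration.

```shell
#!/bin/sh
# Sketch only: read-only status checks, nothing here changes the cluster.
# Assumption: the monitor ID is the node's short hostname, so the unit
# is named ceph-mon@<hostname>.service.

# Hypothetical helper: build the systemd unit name for a monitor ID.
mon_unit() {
    printf 'ceph-mon@%s.service\n' "$1"
}

# Only run the checks on a host where the Ceph CLI is installed.
if command -v ceph >/dev/null 2>&1; then
    unit=$(mon_unit "$(hostname -s)")

    ceph-mon --version    # version of the installed monitor binary
    ceph versions         # versions reported by the running daemons
    ceph mon stat         # which monitors are currently in quorum
    ceph osd tree         # which OSDs are down, grouped by host

    # Service state and the monitor's log output since the last boot.
    systemctl status "$unit" --no-pager
    journalctl -u "$unit" -b --no-pager | tail -n 50
fi
```

Comparing `ceph-mon --version` with the output of `ceph versions` shows 
whether the installed and the running versions really differ, and the 
journal output usually shows why a monitor refuses to start.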

I feel extremely uncomfortable with this situation and would appreciate 
any hint as to how I should proceed with the problem.

Cheers, Frank


