[PVE-User] Ceph: Monitors not running but cannot be destroyed or recreated

Frank Thommen f.thommen at dkfz-heidelberg.de
Sun Jan 26 14:14:44 CET 2020


Dear all,

I am trying to destroy "old" Ceph monitors, but they can neither be 
deleted nor recreated:

I am currently configuring Ceph on our PVE cluster (3 nodes running PVE 
6.1-3).  There were some "remnants" of a previous Ceph configuration, 
which I had attempted while the nodes were not yet in a cluster (and for 
which I had used the wrong network).  However, I had purged these 
configurations with `pveceph purge`.  I have redone the basic Ceph 
configuration through the GUI on the first node and have deleted the 
still-existing managers through the GUI (to have a fresh start).

A new monitor has been created on the first node automatically, but I am 
unable to delete the monitors on nodes 2 and 3.  They show up as 
Status=stopped and Address=Unknown in the GUI and they cannot be started 
(no error message).  In the syslog window I see (after rebooting node 
odcf-pve02):

------------
Jan 26 13:51:53 odcf-pve02 systemd[1]: Started Ceph cluster monitor daemon.
Jan 26 13:51:55 odcf-pve02 ceph-mon[1372]: 2020-01-26 13:51:55.450 7faa98ab9280 -1 mon.odcf-pve02@0(electing) e1 failed to get devid for : fallback method has serial ''but no model
------------

On the other hand I see the same message on the first node, and there 
the monitor seems to work fine.

Trying to destroy them results in a message that there is no such 
monitor, and trying to create a new monitor on these nodes results in 
a message that the monitor already exists.  I am stuck in this 
existence loop.  Destroying or creating them on the command line does 
not work either.
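
One way to narrow down a loop like this is to compare the places where 
monitor state can be left behind.  The following is a minimal Python 
sketch, not a confirmed diagnosis.  It assumes that leftover monitors 
may show up as `[mon.<id>]` sections in the cluster-wide ceph.conf (on 
PVE usually /etc/pve/ceph.conf) and as data directories named 
`ceph-<id>` under /var/lib/ceph/mon (Ceph's default monitor data 
location for a cluster named "ceph").  The function names are 
hypothetical, and the code only compares text and names; it changes 
nothing on the node.

```python
import re

# Assumption: monitor data directories follow Ceph's default naming,
# "<cluster>-<id>", with the default cluster name "ceph".
DIR_PREFIX = "ceph-"


def monitors_in_conf(conf_text):
    """Return the monitor IDs that have a [mon.<id>] section in ceph.conf text."""
    return set(re.findall(r"^\s*\[mon\.([^\]\s]+)\]", conf_text, re.MULTILINE))


def monitors_on_disk(dir_names):
    """Return the monitor IDs implied by directory names such as 'ceph-<id>'."""
    return {
        name[len(DIR_PREFIX):]
        for name in dir_names
        if name.startswith(DIR_PREFIX)
    }


def compare_monitor_state(conf_text, dir_names):
    """Report monitor IDs that are known in only one of the two places.

    'conf_only': section in ceph.conf, but no data directory on this node.
    'disk_only': data directory on this node, but no section in ceph.conf.
    """
    in_conf = monitors_in_conf(conf_text)
    on_disk = monitors_on_disk(dir_names)
    return {
        "conf_only": sorted(in_conf - on_disk),
        "disk_only": sorted(on_disk - in_conf),
    }
```

On a node, `conf_text` would be the contents of /etc/pve/ceph.conf and 
`dir_names` the output of `os.listdir("/var/lib/ceph/mon")`.  An ID 
listed under `disk_only` would be a candidate for a monitor that the 
cluster no longer knows about but whose local data still exists; 
whether that is what blocks creation here is an assumption.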

Any idea on how to fix this?  I'd rather not completely reinstall the 
nodes :-)

Cheers
frank

