[PVE-User] Ceph: Monitors not running but cannot be destroyed or recreated
Frank Thommen
f.thommen at dkfz-heidelberg.de
Sun Jan 26 16:46:08 CET 2020
On 26/01/2020 14:14, Frank Thommen wrote:
> Dear all,
>
> I am trying to destroy "old" Ceph monitors, but they can neither be
> deleted nor recreated:
>
> I am currently configuring Ceph on our PVE cluster (3 nodes running PVE
> 6.1-3). There were some remnants of a previous Ceph configuration
> which I had tried to set up while the nodes were not yet joined in a
> cluster (and I had used the wrong network). However, I had purged those
> configurations with `pveceph purge`. I have redone the basic Ceph
> configuration through the GUI on the first node and have deleted the
> still-existing managers through the GUI (to have a fresh start).
>
> A new monitor has been created on the first node automatically, but I am
> unable to delete the monitors on nodes 2 and 3. They show up as
> Status=stopped and Address=Unknown in the GUI and they cannot be started
> (no error message). In the syslog window I see (after rebooting node
> odcf-pve02):
>
> ------------
> Jan 26 13:51:53 odcf-pve02 systemd[1]: Started Ceph cluster monitor daemon.
> Jan 26 13:51:55 odcf-pve02 ceph-mon[1372]: 2020-01-26 13:51:55.450
> 7faa98ab9280 -1 mon.odcf-pve02 at 0(electing) e1 failed to get devid for :
> fallback method has serial ''but no model
> ------------
>
> On the other hand, I see the same message on the first node, where the
> monitor nevertheless seems to work fine.
>
> Trying to destroy them results in the message that there is no such
> monitor, while trying to create a new monitor on these nodes results in
> the message that the monitor already exists... I am stuck in this
> existence loop. Destroying or creating them doesn't work on the
> command line either.
>
> Any idea how to fix this? I'd rather not reinstall the nodes from
> scratch :-)
>
> Cheers
> frank
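For reference, the command-line attempts were along these lines (a sketch: I am assuming the monitor IDs match the node hostnames, and the block only reports what the mail above describes, guarded so it degrades gracefully on a host without pveceph):

```shell
# Sketch of the CLI attempts described above (monitor ID assumed to be
# the node hostname, e.g. odcf-pve02; adjust to your node names).
if command -v pveceph >/dev/null 2>&1; then
    HAVE_PVECEPH=yes
    pveceph mon destroy odcf-pve02   # reportedly fails: no such monitor
    pveceph mon create               # reportedly fails: already exists
else
    HAVE_PVECEPH=no
    echo "pveceph not available on this host"
fi
```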
In an attempt to clean up the Ceph setup again, I ran

    pveceph stop ceph.target
    pveceph purge
on the first node. Now I get
rados_connect failed - No such file or directory (500)
when I select Ceph in the GUI of any of the three nodes. A reboot of
all nodes didn't help.
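A quick sanity check may narrow this down (a sketch using the stock Proxmox/Ceph paths, not verified against this particular cluster): the 500 error looks like what the GUI shows when librados cannot find a ceph.conf, which `pveceph purge` removes.

```shell
# Sketch of post-purge checks (stock Proxmox/Ceph paths assumed):
CONF=/etc/pve/ceph.conf
if [ -e "$CONF" ]; then
    STATE=present
else
    # Without this file librados has nothing to connect with, which would
    # plausibly explain "rados_connect failed - No such file or directory".
    STATE=missing
fi
echo "ceph.conf is $STATE"
# Leftover per-node monitor data dirs would explain monitors that
# "already exist" but cannot be started or destroyed:
ls -d /var/lib/ceph/mon/* 2>/dev/null || echo "no monitor data dirs"
```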
frank