[PVE-User] Ceph: Monitors not running but cannot be destroyed or recreated
Frank Thommen
f.thommen at dkfz-heidelberg.de
Sun Jan 26 23:51:54 CET 2020
On 26/01/2020 16:46, Frank Thommen wrote:
> On 26/01/2020 14:14, Frank Thommen wrote:
>> Dear all,
>>
>> I am trying to destroy "old" Ceph monitors but they can't be deleted
>> and also cannot be recreated:
>>
>> I am currently configuring Ceph on our PVE cluster (3 nodes running
>> PVE 6.1-3). There have been some "remainders" of a previous Ceph
>> configuration which I had tried to configure while the nodes were not
>> in a cluster configuration yet (and I had used the wrong network).
>> However I had purged these configurations with `pveceph purge`. I
>> have redone the basic Ceph configuration through the GUI on the first
>> node and I have deleted the still existing managers through the GUI
>> (to have a fresh start).
>>
>> A new monitor has been created on the first node automatically, but I
>> am unable to delete the monitors on nodes 2 and 3. They show up as
>> Status=stopped and Address=Unknown in the GUI and they cannot be
>> started (no error message). In the syslog window I see (after
>> rebooting node odcf-pve02):
>>
>> ------------
>> Jan 26 13:51:53 odcf-pve02 systemd[1]: Started Ceph cluster monitor
>> daemon.
>> Jan 26 13:51:55 odcf-pve02 ceph-mon[1372]: 2020-01-26 13:51:55.450
>> 7faa98ab9280 -1 mon.odcf-pve02 at 0(electing) e1 failed to get devid for
>> : fallback method has serial ''but no model
>> ------------
>>
>> On the other hand I see the same message on the first node, and there
>> the monitor seems to work fine.
>>
>> Trying to destroy them results in the message, that there is no such
>> monitor, and trying to create a new monitor on these nodes results in
>> the message, that the monitor already exists.... I am stuck in this
>> existence loop. Destroying or creating them also doesn't work on the
>> commandline.
>>
>> Any idea on how to fix this? I'd rather not completely reinstall the
>> nodes :-)
>>
>> Cheers
>> frank
>
>
> In an attempt to clean up the Ceph setup again, I ran
>
> pveceph stop ceph.target
> pveceph purge
>
> on the first node. Now I get an
>
> rados_connect failed - No such file or directory (500)
>
> when I select Ceph in the GUI of any of the three nodes. A reboot of
> all nodes didn't help.
>
> frank
I was finally able to completely purge the old settings and reconfigure
Ceph with the various instructions from this
(https://forum.proxmox.com/threads/not-able-to-use-pveceph-purge-to-completely-remove-ceph.59606/)
post.
Maybe this information could be added to the official documentation
(unless there is a nicer way of completely resetting Ceph in a PROXMOX
cluster)?
frank
More information about the pve-user
mailing list