mon_cmd_maybe_osd_create fail

Stefan Radman stefan.radman at me.com
Fri Sep 15 17:09:29 CEST 2023


We recently upgraded a 3-node HCI cluster to Proxmox VE 8.0 and Ceph Quincy.
Everything went as expected (thanks to pve7to8 and the great instructions on the wiki).

During a thorough check of the logs after the upgrade we found the following message:

Sep 08 12:38:31 pve3 ceph-osd[3462]: 2023-09-08T12:38:31.579+0200 7fd2815b73c0 -1 osd.0 8469 mon_cmd_maybe_osd_create fail: 'osd.0 has already bound to class 'nvme', can not reset class to 'ssd'; use 'ceph osd crush rm-device-class <id>' to remove old class first': (16) Device or resource busy
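
So far we have not touched the device class assignment. The message itself hints at clearing and re-setting the class, i.e. (assuming one actually wanted to rebind osd.0) something along the lines of

  ceph osd crush rm-device-class osd.0
  ceph osd crush set-device-class nvme osd.0

but we are hesitant to run anything before understanding why the OSD tries to register as ssd in the first place.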

We have two device classes, with three OSDs in the nvme class (one on each node) and no ssd class.
The nvme crush rule is exactly the same as the original replicated_rule.

root@pve3:~# smartctl -i /dev/nvme1n1 | grep -E '^(Model|Firmware|NVM)'
Model Number:                       SAMSUNG MZPLJ1T6HBJR-00007
Firmware Version:                   EPK9AB5Q
NVMe Version:                       1.3
root@pve3:~# ceph-volume lvm list /dev/nvme1n1 | grep -E '==|devices|crush'
====== osd.0 =======
      crush device class        nvme
      devices                   /dev/nvme1n1
root@pve3:~# ceph osd crush class ls
[
    "nvme",
    "hdd"
]
root@pve3:~# ceph osd crush class ls-osd nvme
0
1
2
root@pve3:~# ceph osd crush rule ls
replicated_rule
nvme
hdd
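
If the full rule definitions are useful for diagnosis, we can post the output of

  ceph osd crush rule dump replicated_rule
  ceph osd crush rule dump nvme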

The mon_cmd_maybe_osd_create failure has been logged for the local NVMe OSD after every single node reboot since we installed the cluster back in 2021 (PVE 7.0, Ceph Pacific).
Until now we had not noticed it, and we have not experienced any negative impact.
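
From what we can tell (please correct us if this is wrong), the OSD only attempts to (re)register its device class at startup when osd_class_update_on_start is enabled, which is the default; that would explain why the message shows up after every reboot. We assume the current value can be checked with

  ceph config get osd osd_class_update_on_start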

Can someone tell us why we are seeing this message (despite not having an ssd class)?

Can/should we do something about it, as suggested in the forum post below?

Ceph trying to reset class
https://forum.proxmox.com/threads/ceph-trying-to-reset-class.101841/

Thanks 

Stefan


