[PVE-User] PVE 7 Ceph 16.2.6 osd crash
Eneko Lacunza
elacunza at binovo.es
Thu Oct 28 11:34:15 CEST 2021
Hi Nada,
On 27/10/21 at 20:41, nada wrote:
> are all your ceph nodes time synced ?
> PLS check chrony and drift time
> # ceph time-sync-status
Yes, it seems they are:
# ceph time-sync-status
{
    "time_skew_status": {
        "amaiur": {
            "skew": 0,
            "latency": 0,
            "health": "HEALTH_OK"
        },
        "2": {
            "skew": -0.0052040435139160159,
            "latency": 0.00021316702537780806,
            "health": "HEALTH_OK"
        },
        "3": {
            "skew": -0.0077342363594970704,
            "latency": 0.00020703031996249557,
            "health": "HEALTH_OK"
        }
    },
    "timechecks": {
        "epoch": 4648,
        "round": 1730,
        "round_status": "finished"
    }
}
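As a sanity check, the skews above can be compared against Ceph's default mon_clock_drift_allowed of 0.05 s (the threshold monitors use before warning about clock skew). A minimal sketch; the file name and the guard for a missing ceph CLI are my additions, not from the thread:

```shell
# Flag any monitor whose reported skew exceeds the allowed drift.
ceph time-sync-status > time-sync.json 2>/dev/null || true
if [ -s time-sync.json ]; then
    grep '"skew"' time-sync.json | tr -d ' ",' | cut -d: -f2 |
    awk '{ s = ($1 < 0 ? -$1 : $1)
           if (s > 0.05) { print "skew " $1 " exceeds 0.05 s"; bad = 1 } }
         END { if (!bad) print "all skews within mon_clock_drift_allowed" }'
else
    echo "no time-sync-status output (is the ceph CLI available?)"
fi
```

With the values shown above (worst case ~0.0077 s), this would report all skews as within the allowed drift.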
If this were the issue, I'd expect all OSDs on that node to crash (not
just the one that did)?
>
> in case you have new spare disk and free slot, try to add new OSD and
> stabilize ceph cluster
> in case you do not have free slot (and you are sure that it failed)
> you have to replace it
> https://docs.ceph.com/en/latest/rados/operations/add-or-rm-osds/#replacing-an-osd
>
>
> before any changes identify disk and OSD status
> list disk bay at server # pvs
> # ceph osd tree
> # ceph device ls
> # ceph-volume lvm list > osd-lvm-list-202110XX
>
> BTW we started with ceph cluster here in June, so sorry i am beginner
> with ceph
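The pre-change inventory quoted above could be captured in one pass. A sketch; the dated output directory and the skip-if-missing guards are my additions, not from the thread:

```shell
#!/bin/sh
# Save disk/OSD state to dated files before touching anything.
# Tools not installed on this host are recorded as skipped.
out="osd-diag-$(date +%Y%m%d)"
mkdir -p "$out"
for cmd in "pvs" "ceph osd tree" "ceph device ls" "ceph-volume lvm list"; do
    tool=${cmd%% *}
    file="$out/$(echo "$cmd" | tr ' ' '-').txt"
    if command -v "$tool" >/dev/null 2>&1; then
        $cmd > "$file" 2>&1 || true
    else
        echo "skipped: $tool not installed" > "$file"
    fi
done
ls "$out"
```

Keeping these snapshots around makes it easy to compare the OSD/LVM layout before and after a disk replacement.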
Thanks for your comments. Ceph (via systemd) automatically restarted the
crashed OSD; it is working and the cluster is healthy, so there are no
pressing worries here. I'm just trying to understand what happened and
see whether there's something we can do so the crash doesn't happen again :)
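For the post-mortem itself, one avenue (not mentioned in the thread) is Ceph's built-in crash archive, available since Nautilus, together with the OSD's systemd journal. A hedged sketch; the placeholder IDs are to be filled in from the actual output:

```shell
# List recorded crash reports; guarded so it is harmless on a
# host without the ceph CLI.
if command -v ceph >/dev/null 2>&1; then
    ceph crash ls                  # crash IDs with timestamps and daemons
    # ceph crash info <crash-id>   # full backtrace for one report
else
    echo "ceph CLI not available on this host"
fi
# The systemd journal for the affected OSD covers the same window:
# journalctl -u ceph-osd@<osd-id> --since "2021-10-27"
```

The backtrace from `ceph crash info` is usually the most useful piece when searching the Ceph tracker for a matching bug.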
Cheers
Eneko Lacunza
Zuzendari teknikoa | Director técnico
Binovo IT Human Project
Tel. +34 943 569 206 | https://www.binovo.es
Astigarragako Bidea, 2 - 2º izda. Oficina 10-11, 20180 Oiartzun
https://www.youtube.com/user/CANALBINOVO
https://www.linkedin.com/company/37269706/