[PVE-User] PVE 7 Ceph 16.2.6 osd crash
Eneko Lacunza
elacunza at binovo.es
Thu Oct 28 11:34:15 CEST 2021
Hi Nada,
On 27/10/21 at 20:41, nada wrote:
> are all your ceph nodes time synced ?
> PLS check chrony and drift time
> # ceph time-sync-status
Yes, it seems they are:
# ceph time-sync-status
{
    "time_skew_status": {
        "amaiur": {
            "skew": 0,
            "latency": 0,
            "health": "HEALTH_OK"
        },
        "2": {
            "skew": -0.0052040435139160159,
            "latency": 0.00021316702537780806,
            "health": "HEALTH_OK"
        },
        "3": {
            "skew": -0.0077342363594970704,
            "latency": 0.00020703031996249557,
            "health": "HEALTH_OK"
        }
    },
    "timechecks": {
        "epoch": 4648,
        "round": 1730,
        "round_status": "finished"
    }
}
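As a sanity check, the skews above can be compared against Ceph's default mon_clock_drift_allowed of 0.05 s (the threshold monitors use before warning about clock skew). A minimal sketch; the file name and the guard for a missing ceph CLI are my additions, not from the thread:

```shell
# Flag any monitor whose reported skew exceeds the allowed drift.
ceph time-sync-status > time-sync.json 2>/dev/null || true
if [ -s time-sync.json ]; then
    grep '"skew"' time-sync.json | tr -d ' ",' | cut -d: -f2 |
    awk '{ s = ($1 < 0 ? -$1 : $1)
           if (s > 0.05) { print "skew " $1 " exceeds 0.05 s"; bad = 1 } }
         END { if (!bad) print "all skews within mon_clock_drift_allowed" }'
else
    echo "no time-sync-status output (is the ceph CLI available?)"
fi
```

With the values shown above (worst case ~0.0077 s), this would report all skews as within the allowed drift.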
If this were the issue, I'd expect all OSDs on that node to crash (not
just the one that did)?
>
> in case you have new spare disk and free slot, try to add new OSD and
> stabilize ceph cluster
> in case you do not have free slot (and you are sure that it failed)
> you have to replace it
> https://docs.ceph.com/en/latest/rados/operations/add-or-rm-osds/#replacing-an-osd
>
>
> before any changes identify disk and OSD status
> list disk bay at server # pvs
> # ceph osd tree
> # ceph device ls
> # ceph-volume lvm list > osd-lvm-list-202110XX
>
> BTW we started with ceph cluster here in June, so sorry i am beginner
> with ceph
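The pre-change inventory quoted above could be captured in one pass. A sketch; the dated output directory and the skip-if-missing guards are my additions, not from the thread:

```shell
#!/bin/sh
# Save disk/OSD state to dated files before touching anything.
# Tools not installed on this host are recorded as skipped.
out="osd-diag-$(date +%Y%m%d)"
mkdir -p "$out"
for cmd in "pvs" "ceph osd tree" "ceph device ls" "ceph-volume lvm list"; do
    tool=${cmd%% *}
    file="$out/$(echo "$cmd" | tr ' ' '-').txt"
    if command -v "$tool" >/dev/null 2>&1; then
        $cmd > "$file" 2>&1 || true
    else
        echo "skipped: $tool not installed" > "$file"
    fi
done
ls "$out"
```

Keeping these snapshots around makes it easy to compare the OSD/LVM layout before and after a disk replacement.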
Thanks for your comments. Ceph (via systemd) automatically restarted the
crashed OSD; it is working and the cluster is healthy, so there are no
pressing worries here. I'm just trying to understand what happened and
see whether there's something we can do so the crash doesn't happen again :)
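For the post-mortem itself, one avenue (not mentioned in the thread) is Ceph's built-in crash archive, available since Nautilus, together with the OSD's systemd journal. A hedged sketch; the placeholder IDs are to be filled in from the actual output:

```shell
# List recorded crash reports; guarded so it is harmless on a
# host without the ceph CLI.
if command -v ceph >/dev/null 2>&1; then
    ceph crash ls                  # crash IDs with timestamps and daemons
    # ceph crash info <crash-id>   # full backtrace for one report
else
    echo "ceph CLI not available on this host"
fi
# The systemd journal for the affected OSD covers the same window:
# journalctl -u ceph-osd@<osd-id> --since "2021-10-27"
```

The backtrace from `ceph crash info` is usually the most useful piece when searching the Ceph tracker for a matching bug.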
Cheers
Eneko Lacunza
Zuzendari teknikoa | Director técnico
Binovo IT Human Project
Tel. +34 943 569 206 | https://www.binovo.es
Astigarragako Bidea, 2 - 2º izda. Oficina 10-11, 20180 Oiartzun
https://www.youtube.com/user/CANALBINOVO
https://www.linkedin.com/company/37269706/