[PVE-User] ceph-osd not starting after network related issues

Wed Jul 3 08:35:01 CEST 2019

Hi All,

Some feedback from my end. I managed to recover the "lost data" from one of
the other OSDs. It seems my initial summary was a bit off: the PGs were
replicated, and Ceph just wanted to confirm that the objects were still
relevant.

For future reference, I basically marked the OSD as lost:

> ceph osd lost <id>

Then the PGs went into an incomplete state.

After that I temporarily set an option on the OSDs to ignore the history
(osd_find_best_info_ignore_history_les). Got the info from
http://lists.ceph.com/pipermail/ceph-users-ceph.com/2017-March/017270.html
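
The steps above could be sketched roughly as follows. This is only a
sketch under assumptions: the OSD ids are placeholders, and the message
does not say how the option was set, so using "injectargs" and marking
the surviving OSD down to make it re-peer are my guesses at one way of
doing it (setting the option in ceph.conf and restarting the OSD is
another). The function names are made up for illustration.

```shell
#!/bin/sh
# Sketch only: run one step at a time and check "ceph -s" in between.
# All OSD ids passed to these functions are placeholders.

# Step 1: declare the dead OSD permanently lost.
mark_osd_lost() {            # usage: mark_osd_lost <osd-id>
    ceph osd lost "$1" --yes-i-really-mean-it
}

# Step 2: list stuck PGs; incomplete PGs are reported as inactive.
show_stuck_pgs() {
    ceph pg dump_stuck inactive
}

# Step 3: toggle the option on an OSD that still holds a copy of the PG.
# Assumption: injected at runtime rather than via ceph.conf.
set_ignore_history_les() {   # usage: set_ignore_history_les <osd-id> <true|false>
    ceph tell "osd.$1" injectargs "--osd_find_best_info_ignore_history_les=$2"
}

# Step 4: mark that OSD down so it rejoins and the PG peers again.
# Assumption: the option only takes effect when the PG re-peers.
force_repeer() {             # usage: force_repeer <osd-id>
    ceph osd down "$1"
}
```

For example: "mark_osd_lost 12", then "set_ignore_history_les 7 true"
and "force_repeer 7". Once the PGs are active+clean again, the option
should go back to false ("set_ignore_history_les 7 false"), since it
is meant as a temporary override.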

After that Ceph was happy and started to rebalance the cluster. Phew,
crisis averted.

This failure did, however, convince me to increase our pool replication
(size:min_size) from 2:1 to 3:2, sacrificing usable space for reliability.
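
As a rough illustration of that trade-off, assuming "2:1" and "3:2" mean
the pool's size and min_size: usable space is approximately raw capacity
divided by size. The pool name and the 12000 GiB figure below are made
up, and the function names are only for illustration.

```shell
#!/bin/sh
# Usable capacity is roughly raw capacity divided by the replica count
# (integer arithmetic; ignores overhead and the full-ratio headroom).
usable_gib() {               # usage: usable_gib <raw-GiB> <size>
    echo $(( $1 / $2 ))
}

# Move one pool to three replicas, with at least two required for I/O.
set_replication() {          # usage: set_replication <pool>
    ceph osd pool set "$1" size 3
    ceph osd pool set "$1" min_size 2
}

# Example with a hypothetical 12000 GiB of raw capacity:
echo "size=2: $(usable_gib 12000 2) GiB usable"
echo "size=3: $(usable_gib 12000 3) GiB usable"
```

Raising size triggers backfill while the third copies are created, so
it is probably best done when the cluster is otherwise healthy.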

Now I need to give feedback on what happened. This is what I am still not
sure about, as SMART does not show any sector errors. I might as well start
a badblocks run and see if it detects anything.
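
A minimal sketch of such a check, assuming smartmontools and e2fsprogs
are installed. The device path is a placeholder and the function name
is made up. badblocks is used in its default read-only mode here, so it
does not write to the disk.

```shell
#!/bin/sh
# Read-only disk checks; pass the device that backs the failed OSD.
check_disk() {               # usage: check_disk /dev/sdX
    dev="$1"
    smartctl -H "$dev"         # overall health verdict
    smartctl -A "$dev"         # attributes, e.g. reallocated or pending sectors
    smartctl -l error "$dev"   # the drive's own error log
    badblocks -sv "$dev"       # read-only surface scan with progress output
}
```

For example: "check_disk /dev/sdX". On a large disk the badblocks pass
can take many hours.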

As always, I am open to other suggestions as to where to look for
clues on what went wrong.

Kind regards

On Mon, 1 Jul 2019 at 09:10, Ian Coetzee <proxmox at iancoetzee.za.net> wrote:

> Hi All,
>
> This morning I have a bit of a big boo-boo on our production system.
>
> After a very sudden network outage somewhere during the night, one of my
> ceph-osds is no longer starting up.
>
> If I try and start it manually, I get a very spectacular failure, see link.
>
> https://www.jacklin.co.za/zerobin/?04e2dcd13ab8dfc8#zKCISUvAm4o/6mnLmyu+8fSS1VumC65XaETt/dD7rn0=
>
> As near as I can tell, it seems to be asserting whether a file exists; I
> have yet to determine which file that would be. Any pointers are welcome,
> as well as any other ideas to get the OSD back. For some reason there is
> data on the OSD that was not replicated to my other OSDs; as such I
> cannot just re-init this OSD as some of the posts I could find suggest.
>
> I am also going to head to the ceph ML in a bit (after I have registered)
>
> Kind regards
>
>