[PVE-User] Again, Ceph: default timeout for osd?

Fri Dec 16 12:47:31 CET 2016

Hi, Alexandre DERUMIER!
  On that day, you wrote...

> >>mon osd down out interval
> This is the time between when a monitor marks an OSD "down" (not
> currently serving data) and "out" (not considered *responsible* for
> data by the cluster). IO will resume once the OSD is down (assuming
> the PG has its minimum number of live replicas); it's just that data
> will be re-replicated to other nodes once an OSD is marked "out".

That seems clear to me. Let me try an example, to be sure.

If I set:
	mon osd report timeout = 15
	mon osd down out interval = 300

then:

a) after 15 seconds, the unresponsive OSD gets marked 'down', so IO
 resumes;

b) after 5 more minutes, the OSD gets marked 'out', and so rebalancing
 starts.
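The two-step timeline in the example can be written down as a toy calculation. This is a minimal sketch, not Ceph code: it assumes the OSD is marked 'down' exactly 'mon osd report timeout' seconds after it stops responding, and 'out' a further 'mon osd down out interval' seconds later (as in the quoted explanation). It ignores the other ways Ceph can detect a failed OSD, such as heartbeats between OSD peers. The function name `osd_timeline` is hypothetical.

```python
# Hypothetical toy model (not Ceph code): when would a monitor mark an
# unresponsive OSD "down" and then "out", given the two settings above?


def osd_timeline(report_timeout, down_out_interval, failed_at=0):
    """Return the assumed times (seconds) at which the OSD is marked down/out.

    Assumption: "down" happens report_timeout seconds after the failure, and
    "out" happens down_out_interval seconds after the OSD was marked down.
    """
    down_at = failed_at + report_timeout
    out_at = down_at + down_out_interval
    return {"down": down_at, "out": out_at}


if __name__ == "__main__":
    t = osd_timeline(report_timeout=15, down_out_interval=300)
    print(f"marked down at t={t['down']}s (IO resumes)")
    print(f"marked out at t={t['out']}s (rebalancing starts)")
```

With the values above, the OSD would be down at t=15s and out at t=315s, i.e. the 5 minutes are counted from the moment it is marked down, not from the failure.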

I still have one doubt. If I run 'ceph osd set nodown', do I simply set
the first timeout to 'never'? Explained as above, that could be it... and
so it was my fault for having set 'nodown'...

Wait...
	http://lists.ceph.com/pipermail/ceph-users-ceph.com/2015-June/002438.html

OK, it seems that the 'noout' flag is the right thing to use; 'nodown'
should be used only when OSDs are 'bouncing' (flapping up and down).

If I simply need to stop rebalancing, setting 'noout' is enough.
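To see why 'nodown' hurts while 'noout' only prevents rebalancing, here is a toy state model of an OSD that stopped responding at t=0. It is a hedged sketch, not Ceph's logic: it reuses the same simplified timing assumptions as the example (down after the report timeout, out after a further down-out interval), and the function name `osd_state` and its parameters are hypothetical. It only encodes the flag semantics discussed in the thread: 'nodown' keeps the failed OSD marked up (so IO keeps waiting on it), and 'noout' lets it go down (IO resumes) but never out (no rebalancing).

```python
# Hypothetical toy model (not Ceph code) of how the 'nodown' and 'noout'
# cluster flags affect an OSD that stopped responding at t=0.


def osd_state(elapsed, report_timeout, down_out_interval,
              nodown=False, noout=False):
    """Return the assumed (up/down, in/out) state after `elapsed` seconds."""
    if nodown or elapsed < report_timeout:
        # Still marked up: IO directed to this OSD keeps waiting on it.
        return ("up", "in")
    if noout or elapsed < report_timeout + down_out_interval:
        # Marked down but still in: IO resumes, no rebalancing yet.
        return ("down", "in")
    # Marked out: data gets re-replicated to other OSDs.
    return ("down", "out")


if __name__ == "__main__":
    for flags in ({}, {"nodown": True}, {"noout": True}):
        print(flags, osd_state(600, 15, 300, **flags))
```

Ten minutes after the failure, the sketch reports ('down', 'out') with no flags, ('up', 'in') with 'nodown' (IO stuck), and ('down', 'in') with 'noout' (IO resumed, no rebalancing), which matches the conclusion above.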

> An OSD should go down in around 30s max (during this time, the cluster
> will be stale), but not 5 min.

My experience says otherwise. And if the relevant parameter is 'mon osd
report timeout', the docs also say '300'.
That seems a totally unreasonable value to me...

> (in Ceph Kraken, they have optimised this detection:
>  https://github.com/ceph/ceph/pull/8558)

Interesting. This does not apply to my case anyway, because I rebooted
the whole server.

-- 
dott. Marco Gaiarin				        GNUPG Key ID: 240A3D66
  Associazione ``La Nostra Famiglia''          http://www.lanostrafamiglia.it/
  Polo FVG   -   Via della Bontà, 7 - 33078   -   San Vito al Tagliamento (PN)
  marco.gaiarin(at)lanostrafamiglia.it   t +39-0434-842711   f +39-0434-842797

		Donate your 5 PER MILLE to LA NOSTRA FAMIGLIA!
    http://www.lanostrafamiglia.it/25/index.php/component/k2/item/123
	(tax code 00307430132, category ONLUS or RICERCA SANITARIA / health research)