[PVE-User] Ceph and cold bootstrap...
Fabian Grünbichler
f.gruenbichler at proxmox.com
Fri Sep 16 11:00:29 CEST 2016
On Fri, Sep 16, 2016 at 10:53:21AM +0200, Marco Gaiarin wrote:
>
> I'm testing some error conditions on my test Ceph storage cluster.
>
> Today I booted it (cold boot; it had been off since yesterday).
>
> The log says:
>
> 2016-09-16 09:38:38.015517 mon.0 10.27.251.7:6789/0 59 : cluster [INF] mon.0 calling new monitor election
> 2016-09-16 09:38:38.078034 mon.1 10.27.251.11:6789/0 59 : cluster [INF] mon.1 calling new monitor election
> 2016-09-16 09:38:38.551916 mon.1 10.27.251.11:6789/0 62 : cluster [WRN] message from mon.0 was stamped 0.125730s in the future, clocks not synchronized
> 2016-09-16 09:38:38.555582 mon.0 10.27.251.7:6789/0 62 : cluster [INF] mon.0 at 0 won leader election with quorum 0,1
> 2016-09-16 09:38:38.583642 mon.0 10.27.251.7:6789/0 63 : cluster [INF] HEALTH_OK
> 2016-09-16 09:38:38.641007 mon.0 10.27.251.7:6789/0 64 : cluster [WRN] mon.1 10.27.251.11:6789/0 clock skew 0.14422s > max 0.05s
> 2016-09-16 09:38:38.677682 mon.0 10.27.251.7:6789/0 65 : cluster [INF] monmap e2: 2 mons at {0=10.27.251.7:6789/0,1=10.27.251.11:6789/0}
> 2016-09-16 09:38:38.679335 mon.0 10.27.251.7:6789/0 66 : cluster [INF] pgmap v4383: 192 pgs: 192 active+clean; 7582 MB data, 14938 MB used, 1837 GB / 1852 GB avail
> 2016-09-16 09:38:38.679402 mon.0 10.27.251.7:6789/0 67 : cluster [INF] mdsmap e1: 0/0/0 up
> 2016-09-16 09:38:38.679475 mon.0 10.27.251.7:6789/0 68 : cluster [INF] osdmap e20: 2 osds: 1 up, 2 in
> 2016-09-16 09:38:38.797378 mon.0 10.27.251.7:6789/0 69 : cluster [INF] pgmap v4384: 192 pgs: 102 stale+active+clean, 90 active+clean; 7582 MB data, 14938 MB used, 1837 GB / 1852 GB avail
> 2016-09-16 09:38:44.976103 mon.1 10.27.251.11:6789/0 65 : cluster [WRN] message from mon.0 was stamped 0.074651s in the future, clocks not synchronized
> 2016-09-16 09:39:38.583757 mon.0 10.27.251.7:6789/0 72 : cluster [INF] HEALTH_WARN; 102 pgs stale; 1/2 in osds are down
> 2016-09-16 09:43:42.025626 mon.0 10.27.251.7:6789/0 73 : cluster [INF] osd.1 out (down for 303.348204)
> 2016-09-16 09:43:42.064945 mon.0 10.27.251.7:6789/0 74 : cluster [INF] osdmap e21: 2 osds: 1 up, 1 in
> 2016-09-16 09:43:42.164823 mon.0 10.27.251.7:6789/0 75 : cluster [INF] pgmap v4385: 192 pgs: 102 stale+active+clean, 90 active+clean; 7582 MB data, 7484 MB used, 918 GB / 926 GB avail
> 2016-09-16 09:44:38.584772 mon.0 10.27.251.7:6789/0 83 : cluster [INF] HEALTH_WARN; 102 pgs stale; 102 pgs stuck stale; too many PGs per OSD (384 > max 300)
> 2016-09-16 09:45:32.454684 mon.0 10.27.251.7:6789/0 105 : cluster [WRN] message from mon.1 was stamped 0.050064s in the future, clocks not synchronized
>
> My test VM, set for autostart, seemed to be in 'limbo' (Proxmox said it
> was started, but it was not).
> One of the two OSDs was in the down/out state.
>
>
> I waited some time for the clocks to synchronize, then started the OSD
> that was down; after that I was able to start the VM.
>
>
> Considering that a little clock skew at boot could be normal, is this a
> consequence of having only two OSDs (and/or monitors), or could it be a
> common error condition?
>
> Is there some way to prevent this, or to fix it automatically (with a
> script?) at boot?
>
>
> Thanks.
>
Two Ceph nodes, two mons and two OSDs are all far too few for a
(production) Ceph setup. At least three nodes/mons (for quorum reasons)
and multiple OSDs per storage node (for performance and failure-domain
reasons) are required. For the clock skew and the down OSD specifically,
see the sketches below.
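
To make the quorum point concrete: monitors need a strict majority,
floor(n/2)+1, to elect a leader. With two mons that majority is 2 of 2,
so losing either mon stalls the cluster; with three mons you can lose
one and still have 2 of 3. You can inspect the current state with:

    ceph quorum_status --format json-pretty

which lists the mons currently in quorum.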
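
For the clock skew itself: the real fix is to run NTP on every node and
let it sync before Ceph starts; the warnings then clear on their own. If
some skew at cold boot is unavoidable on your hardware, you can also
raise the monitors' tolerance (default 0.05s, as your log shows). A
minimal sketch, assuming the setting goes in the [mon] section of
/etc/ceph/ceph.conf:

    [mon]
    # default is 0.05s; your log shows ~0.14s right after boot
    mon clock drift allowed = 0.2

The mons pick this up after a restart. Raising it only hides the
symptom, so keep the value small and fix time sync first.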
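
As for a script at bootup: something along these lines could wait for
the skew warning to clear and then start any OSD the cluster still
reports as down. A rough, untested sketch -- it assumes systemd-managed
OSDs (ceph-osd@<id> units) and parses the stable first two columns of
'ceph osd dump':

    #!/bin/sh
    # Wait until the monitors stop complaining about clock skew...
    while ceph health | grep -qi 'clock skew'; do
        sleep 10
    done

    # ...then start every OSD still reported as down.
    # 'ceph osd dump' prints lines like "osd.1 down out weight 0 ...".
    for id in $(ceph osd dump | awk '$2 == "down" { sub(/^osd\./, "", $1); print $1 }'); do
        systemctl start ceph-osd@"$id"
    done

OSDs that were marked out automatically (as yours was, after ~300s)
should be marked back in when they boot, so starting the daemon ought to
be enough.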