[PVE-User] Ceph: PANIC or DON'T PANIC? ;-)

Alwin Antreich sysadmin-pve at cognitec.com
Mon Nov 28 13:45:24 CET 2016


Hi Marco,

On 11/28/2016 01:05 PM, Marco Gaiarin wrote:
> 
> A very strange Saturday evening. Hardware tooling, hacking, caffeine,
> ...
> 
> I'm still completing my Ceph storage cluster (now 2 storage nodes,
> waiting to add the third), but it is mostly ''in production''.
> So, after playing with the servers for some months, on Saturday I shut
> down the whole cluster and set up all the cables, switches, UPS, ...
> in a more decent and stable way.
> 
> To simulate a hard power outage, I did not set the noout and nodown
> flags.
> 
> 
> After that, I powered up the whole cluster (first the 2 Ceph storage
> nodes, then the 2 PVE host nodes) and hit the first trouble:
> 
> 	2016-11-26 18:17:29.901353 mon.0 10.27.251.7:6789/0 1218 : cluster [INF] HEALTH_WARN; clock skew detected on mon.1, mon.2; 1 mons down, quorum 0,1,2 0,1,2; Monitor clock skew detected 
> 
> The trouble came from the fact that... my NTP server was on a VM, and
> despite the fact that the status was only 'HEALTH_WARN', I could no
> longer access the storage.

What did the full ceph status show?
Did you add all the monitors to your storage config in Proxmox?
A client speaks to a monitor first to get the proper maps and then connects to the OSDs. The storage would not be
available if you have only one monitor configured on the storage tab in Proxmox and that mon is unavailable (e.g.
1 mons down).
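
As a minimal sketch, the RBD entry in /etc/pve/storage.cfg should list
every monitor (only 10.27.251.7 appears in your logs, the mon.1 and
mon.2 addresses below are made up):

	rbd: ceph-vm
		monhost 10.27.251.7;10.27.251.8;10.27.251.9
		pool rbd
		content images
		username admin

That way the client can fall back to another mon when one is down.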

Did you configure timesyncd properly?
On reboot, the time has to be synced by the host, so all Ceph hosts share the same time. The Ceph map updates require
proper time, so every host knows which map is the current one.
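
For example, a sketch of /etc/systemd/timesyncd.conf pointing at
several NTP servers (the pool names here are just the Debian defaults,
adjust them to your own servers):

	[Time]
	NTP=0.debian.pool.ntp.org 1.debian.pool.ntp.org
	FallbackNTP=2.debian.pool.ntp.org 3.debian.pool.ntp.org

Then restart the service and verify:

	systemctl restart systemd-timesyncd
	timedatectl status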

> 
> I solved it by adding more NTP servers from other sites, and after
> some time the cluster went OK:
> 
> 	2016-11-26 19:11:33.343818 mon.0 10.27.251.7:6789/0 1581 : cluster [INF] HEALTH_OK
> 
> and here the panic started.
> 
> 
> The PVE interface reported the Ceph cluster as OK and correctly showed
> all the details (mon, osd, pools, pool usage, ...) but the data
> cluster was not accessible:
> 
>  a) if I tried to move a disk, it replied with something like 'not available'.
> 
>  b) if I tried to start VMs, they stalled...
> 
> The only strange thing in the logs was that there were NO more pgmap
> updates, like there had been before:
> 
> 	2016-11-26 16:59:31.588695 mon.0 10.27.251.7:6789/0 2317560 : cluster [INF] pgmap v2410540: 768 pgs: 768 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail; 13569 kB/s rd, 2731 kB/s wr, 565 op/s
> 
> but really, in the panic, I had not noticed that.
> 
> 
> After some tests, I finally did the right thing:
> 
>  1) I set the noout and nodown flags.
> 
>  2) I rebooted the Ceph nodes, one by one.
> 
> After that, the whole cluster came up. The VMs that were stalled
> started immediately.
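
For reference, this is how those flags are set and cleared on any node
with a monitor (a minimal sketch):

	ceph osd set noout
	ceph osd set nodown
	# ... reboot the nodes one by one ...
	ceph osd unset nodown
	ceph osd unset noout

With noout set, Ceph does not mark stopped OSDs out, so it does not
start rebalancing data while the nodes reboot.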
> 
> 
> After that, I understood that NTP is a crucial service for Ceph, so it
> is necessary to have a pool of servers. Still, I'm not sure this was
> the culprit.
> 
> 
> The second thing I understood is that Ceph reacts badly to a total
> shutdown. In a datacenter this is probably acceptable.
> 
> I don't know if it is my fault, or if at least there is THE RIGHT WAY
> to start a Ceph cluster from cold metal...
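
There is no special magic to a cold start, but the order matters: make
sure the time is synced first, then let the monitors form a quorum,
then bring up the OSDs, and only then start the PVE nodes and the VMs.
A rough checklist on each Ceph node after boot (a sketch, not an
official procedure):

	timedatectl status    # verify the clock is synced
	ceph -s               # wait for quorum and HEALTH_OK
	ceph osd unset noout  # if you set flags before the shutdown

If the monitors cannot agree on the time, they will not hold a stable
quorum, and clients (your VMs) will block exactly as you saw.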
> 
> 
> Thanks.
> 

-- 
Cheers,
Alwin


