[PVE-User] Ceph: PANIC or DON'T PANIC? ;-)

Alwin Antreich sysadmin-pve at cognitec.com
Tue Nov 29 14:40:44 CET 2016


Hi Marco,

On 11/29/2016 12:17 PM, Marco Gaiarin wrote:
> Greetings, Alwin Antreich!
>   In that day's message you wrote...
> 
>> May you please show us the logs?
> 
> OK, here I am, with the logs.
> 
> A bit of legend: 10.27.251.7 and 10.27.251.8 are the 'ceph' nodes
> (mon+osd); 10.27.251.11 and 10.27.251.12 are the 'cpu' nodes (mon
> only). In order they are mon.0, mon.1, mon.2 and mon.3.
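> (Side note: the mapping above can be cross-checked on any node with
> the standard CLI, output omitted here:
> 
>   ceph mon dump   # mon IDs and their addresses
>   ceph osd tree   # which OSDs sit on which host
> )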
> 
> These are the logs from 10.27.251.7 (mon.0); the ceph logs look much
> the same on every node, so I hope these suffice.
> 
> 
> I started my maintenance at 15:00, but before taking all the stuff
> down I P2V'd my last server, the Asterisk PBX box. Clearly, the
> cluster was working:
> 
> [...]
> 2016-11-26 16:45:51.900445 osd.4 10.27.251.8:6804/3442 5016 : cluster [INF] 3.68 scrub starts
> 2016-11-26 16:45:52.047932 osd.4 10.27.251.8:6804/3442 5017 : cluster [INF] 3.68 scrub ok
> 2016-11-26 16:45:52.741334 mon.0 10.27.251.7:6789/0 2317313 : cluster [INF] pgmap v2410312: 768 pgs: 768 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail; 20533 B/s rd, 945 kB/s wr, 127 op/s
> 2016-11-26 16:45:54.825603 mon.0 10.27.251.7:6789/0 2317314 : cluster [INF] pgmap v2410313: 768 pgs: 768 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail; 261 kB/s wr, 7 op/s
> [...]
> 2016-11-26 16:47:52.741749 mon.0 10.27.251.7:6789/0 2317382 : cluster [INF] pgmap v2410381: 768 pgs: 768 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail; 11563 B/s rd, 687 kB/s wr, 124 op/s
> 2016-11-26 16:47:55.002485 mon.0 10.27.251.7:6789/0 2317383 : cluster [INF] pgmap v2410382: 768 pgs: 768 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail; 922 kB/s rd, 292 kB/s wr, 28 op/s
> 
> 
> Having finished the P2V, I started to power off the cluster, beginning
> with the CPU nodes. After powering down one node I realized I still
> needed it for another task, so I powered it back on. ;-)
> 
> 2016-11-26 16:48:05.018514 mon.1 10.27.251.8:6789/0 129 : cluster [INF] mon.1 calling new monitor election
> 2016-11-26 16:48:05.031761 mon.2 10.27.251.11:6789/0 120 : cluster [INF] mon.2 calling new monitor election
> 2016-11-26 16:48:05.053262 mon.0 10.27.251.7:6789/0 2317384 : cluster [INF] mon.0 calling new monitor election
> 2016-11-26 16:48:10.091773 mon.0 10.27.251.7:6789/0 2317385 : cluster [INF] mon.0 at 0 won leader election with quorum 0,1,2
> 2016-11-26 16:48:10.104535 mon.0 10.27.251.7:6789/0 2317386 : cluster [INF] HEALTH_WARN; 1 mons down, quorum 0,1,2 0,1,2
> 2016-11-26 16:48:10.143625 mon.0 10.27.251.7:6789/0 2317387 : cluster [INF] monmap e4: 4 mons at {0=10.27.251.7:6789/0,1=10.27.251.8:6789/0,2=10.27.251.11:6789/0,3=10.27.251.12:6789/0}
> 2016-11-26 16:48:10.143731 mon.0 10.27.251.7:6789/0 2317388 : cluster [INF] pgmap v2410382: 768 pgs: 768 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail; 922 kB/s rd, 292 kB/s wr, 28 op/s
> 2016-11-26 16:48:10.144828 mon.0 10.27.251.7:6789/0 2317389 : cluster [INF] mdsmap e1: 0/0/0 up
> 2016-11-26 16:48:10.148407 mon.0 10.27.251.7:6789/0 2317390 : cluster [INF] osdmap e98: 6 osds: 6 up, 6 in
> 2016-11-26 16:48:11.208968 mon.0 10.27.251.7:6789/0 2317391 : cluster [INF] pgmap v2410383: 768 pgs: 768 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail; 2174 kB/s rd, 646 kB/s wr, 130 op/s
> 2016-11-26 16:48:13.309644 mon.0 10.27.251.7:6789/0 2317392 : cluster [INF] pgmap v2410384: 768 pgs: 768 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail; 2210 kB/s rd, 652 kB/s wr, 135 op/s
> [...]
> 2016-11-26 16:50:04.665220 mon.0 10.27.251.7:6789/0 2317466 : cluster [INF] pgmap v2410458: 768 pgs: 768 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail; 2579 B/s rd, 23217 B/s wr, 5 op/s
> 2016-11-26 16:50:05.707271 mon.0 10.27.251.7:6789/0 2317467 : cluster [INF] pgmap v2410459: 768 pgs: 768 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail; 157 kB/s rd, 445 kB/s wr, 82 op/s
> 2016-11-26 16:50:16.786716 mon.1 10.27.251.8:6789/0 130 : cluster [INF] mon.1 calling new monitor election
> 2016-11-26 16:50:16.815156 mon.0 10.27.251.7:6789/0 2317468 : cluster [INF] mon.0 calling new monitor election
> 2016-11-26 16:52:51.536024 osd.0 10.27.251.7:6800/3166 7755 : cluster [INF] 1.e8 scrub starts
> 2016-11-26 16:52:53.771169 osd.0 10.27.251.7:6800/3166 7756 : cluster [INF] 1.e8 scrub ok
> 2016-11-26 16:54:34.558607 osd.0 10.27.251.7:6800/3166 7757 : cluster [INF] 1.ed scrub starts
> 2016-11-26 16:54:36.682207 osd.0 10.27.251.7:6800/3166 7758 : cluster [INF] 1.ed scrub ok
> 2016-11-26 16:57:07.816187 mon.3 10.27.251.12:6789/0 1 : cluster [INF] mon.3 calling new monitor election
> 2016-11-26 16:57:13.242951 mon.0 10.27.251.7:6789/0 2317469 : cluster [INF] mon.0 at 0 won leader election with quorum 0,1,3
> 2016-11-26 16:57:13.252424 mon.0 10.27.251.7:6789/0 2317470 : cluster [INF] HEALTH_WARN; 1 mons down, quorum 0,1,3 0,1,3
> 2016-11-26 16:57:13.253143 mon.0 10.27.251.7:6789/0 2317471 : cluster [WRN] mon.3 10.27.251.12:6789/0 clock skew 0.155786s > max 0.05s
> 2016-11-26 16:57:13.302934 mon.0 10.27.251.7:6789/0 2317472 : cluster [INF] monmap e4: 4 mons at {0=10.27.251.7:6789/0,1=10.27.251.8:6789/0,2=10.27.251.11:6789/0,3=10.27.251.12:6789/0}
> 2016-11-26 16:57:13.302998 mon.0 10.27.251.7:6789/0 2317473 : cluster [INF] pgmap v2410460: 768 pgs: 768 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail; 77940 B/s rd, 208 kB/s wr, 38 op/s
> 2016-11-26 16:57:13.303055 mon.0 10.27.251.7:6789/0 2317474 : cluster [INF] mdsmap e1: 0/0/0 up
> 2016-11-26 16:57:13.303141 mon.0 10.27.251.7:6789/0 2317475 : cluster [INF] osdmap e98: 6 osds: 6 up, 6 in
> 2016-11-26 16:57:13.304000 mon.0 10.27.251.7:6789/0 2317476 : cluster [WRN] message from mon.3 was stamped 0.156822s in the future, clocks not synchronized
> 2016-11-26 16:57:14.350452 mon.0 10.27.251.7:6789/0 2317477 : cluster [INF] pgmap v2410461: 768 pgs: 768 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail; 43651 B/s rd, 15067 B/s wr, 2 op/s
> [...]
> 2016-11-26 16:57:30.901532 mon.0 10.27.251.7:6789/0 2317483 : cluster [INF] pgmap v2410467: 768 pgs: 768 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail; 1539 kB/s rd, 316 kB/s wr, 172 op/s
> 2016-11-26 16:51:13.939571 osd.4 10.27.251.8:6804/3442 5018 : cluster [INF] 4.91 deep-scrub starts
> 2016-11-26 16:52:03.663961 osd.4 10.27.251.8:6804/3442 5019 : cluster [INF] 4.91 deep-scrub ok
> 2016-11-26 16:57:33.003398 mon.0 10.27.251.7:6789/0 2317484 : cluster [INF] pgmap v2410468: 768 pgs: 768 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail; 20384 kB/s rd, 2424 kB/s wr, 1163 op/s
> [...]
> 2016-11-26 16:57:41.523421 mon.0 10.27.251.7:6789/0 2317489 : cluster [INF] pgmap v2410473: 768 pgs: 768 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail; 3654 kB/s rd, 732 kB/s wr, 385 op/s
> 2016-11-26 16:57:43.284475 mon.0 10.27.251.7:6789/0 2317490 : cluster [WRN] mon.3 10.27.251.12:6789/0 clock skew 0.155191s > max 0.05s
> 2016-11-26 16:57:43.624090 mon.0 10.27.251.7:6789/0 2317491 : cluster [INF] pgmap v2410474: 768 pgs: 768 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail; 2140 kB/s rd, 391 kB/s wr, 233 op/s
> [...]
> 2016-11-26 16:58:02.688789 mon.0 10.27.251.7:6789/0 2317503 : cluster [INF] pgmap v2410486: 768 pgs: 768 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail; 4675 kB/s rd, 184 kB/s wr, 281 op/s
> 2016-11-26 16:52:48.308292 osd.3 10.27.251.8:6812/4377 8761 : cluster [INF] 1.55 scrub starts
> 2016-11-26 16:52:50.718814 osd.3 10.27.251.8:6812/4377 8762 : cluster [INF] 1.55 scrub ok
> 2016-11-26 16:52:59.309398 osd.3 10.27.251.8:6812/4377 8763 : cluster [INF] 4.c7 scrub starts
> 2016-11-26 16:53:10.848883 osd.3 10.27.251.8:6812/4377 8764 : cluster [INF] 4.c7 scrub ok
> 2016-11-26 16:58:03.759643 mon.0 10.27.251.7:6789/0 2317504 : cluster [INF] pgmap v2410487: 768 pgs: 768 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail; 8311 kB/s rd, 65182 B/s wr, 334 op/s
> [...]
> 2016-11-26 16:58:11.183400 mon.0 10.27.251.7:6789/0 2317510 : cluster [INF] pgmap v2410493: 768 pgs: 768 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail; 11880 kB/s rd, 507 kB/s wr, 1006 op/s
> 2016-11-26 16:58:13.265908 mon.0 10.27.251.7:6789/0 2317511 : cluster [INF] HEALTH_WARN; clock skew detected on mon.3; 1 mons down, quorum 0,1,3 0,1,3; Monitor clock skew detected 
> 2016-11-26 16:58:13.290893 mon.0 10.27.251.7:6789/0 2317512 : cluster [INF] pgmap v2410494: 768 pgs: 768 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail; 9111 kB/s rd, 523 kB/s wr, 718 op/s
> [...]
> 2016-11-26 16:58:42.309990 mon.0 10.27.251.7:6789/0 2317529 : cluster [INF] pgmap v2410511: 768 pgs: 768 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail; 22701 kB/s rd, 4773 kB/s wr, 834 op/s
> 2016-11-26 16:58:43.285715 mon.0 10.27.251.7:6789/0 2317530 : cluster [WRN] mon.3 10.27.251.12:6789/0 clock skew 0.154781s > max 0.05s
> 2016-11-26 16:58:43.358508 mon.0 10.27.251.7:6789/0 2317531 : cluster [INF] pgmap v2410512: 768 pgs: 768 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail; 19916 kB/s rd, 4439 kB/s wr, 741 op/s
> [...]
> 2016-11-26 16:59:17.933355 mon.0 10.27.251.7:6789/0 2317552 : cluster [INF] pgmap v2410533: 768 pgs: 768 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail; 4400 kB/s rd, 2144 kB/s wr, 276 op/s
> 2016-11-26 16:59:18.981605 mon.0 10.27.251.7:6789/0 2317553 : cluster [WRN] message from mon.3 was stamped 0.155111s in the future, clocks not synchronized
> 2016-11-26 16:59:21.064651 mon.0 10.27.251.7:6789/0 2317554 : cluster [INF] pgmap v2410534: 768 pgs: 768 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail; 3909 kB/s rd, 1707 kB/s wr, 232 op/s
> [...]
> 2016-11-26 16:59:58.729775 mon.0 10.27.251.7:6789/0 2317576 : cluster [INF] pgmap v2410556: 768 pgs: 768 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail; 4067 kB/s rd, 1372 kB/s wr, 125 op/s
> 2016-11-26 17:00:00.000396 mon.0 10.27.251.7:6789/0 2317577 : cluster [INF] HEALTH_WARN; clock skew detected on mon.3; 1 mons down, quorum 0,1,3 0,1,3; Monitor clock skew detected 
> 2016-11-26 17:00:00.807659 mon.0 10.27.251.7:6789/0 2317578 : cluster [INF] pgmap v2410557: 768 pgs: 768 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail; 7894 kB/s rd, 1245 kB/s wr, 552 op/s
> [...]
> 2016-11-26 17:00:11.359226 mon.0 10.27.251.7:6789/0 2317585 : cluster [INF] pgmap v2410564: 768 pgs: 768 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail; 2416 kB/s rd, 376 kB/s wr, 191 op/s
> 2016-11-26 17:00:13.286867 mon.0 10.27.251.7:6789/0 2317586 : cluster [WRN] mon.3 10.27.251.12:6789/0 clock skew 0.153666s > max 0.05s
> 2016-11-26 17:00:13.481830 mon.0 10.27.251.7:6789/0 2317587 : cluster [INF] pgmap v2410565: 768 pgs: 768 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail; 6266 kB/s rd, 492 kB/s wr, 265 op/s
> [...]
> 2016-11-26 17:00:15.559867 mon.0 10.27.251.7:6789/0 2317588 : cluster [INF] pgmap v2410566: 768 pgs: 768 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail; 5107 kB/s rd, 176 kB/s wr, 133 op/s
> 
> OK, at this point the server was shut down, so the logs stop.
> 
> 
> At power-up I ran into the clock skew troubles mentioned above, so the
> status was HEALTH_WARN:
> 
> 2016-11-26 18:16:19.623440 mon.1 10.27.251.8:6789/0 1311 : cluster [INF] mon.1 calling new monitor election
> 2016-11-26 18:16:19.729689 mon.2 10.27.251.11:6789/0 1 : cluster [INF] mon.2 calling new monitor election
> 2016-11-26 18:16:19.848291 mon.0 10.27.251.7:6789/0 1183 : cluster [INF] mon.0 calling new monitor election
> 2016-11-26 18:16:29.613075 mon.2 10.27.251.11:6789/0 20 : cluster [WRN] message from mon.0 was stamped 0.341880s in the future, clocks not synchronized
> 2016-11-26 18:16:29.742328 mon.1 10.27.251.8:6789/0 1332 : cluster [WRN] message from mon.0 was stamped 0.212611s in the future, clocks not synchronized
> 2016-11-26 18:16:29.894351 mon.0 10.27.251.7:6789/0 1202 : cluster [INF] mon.0 at 0 won leader election with quorum 0,1,2
> 2016-11-26 18:16:29.901079 mon.0 10.27.251.7:6789/0 1203 : cluster [INF] HEALTH_WARN; 1 mons down, quorum 0,1,2 0,1,2
> 2016-11-26 18:16:29.902069 mon.0 10.27.251.7:6789/0 1204 : cluster [WRN] mon.2 10.27.251.11:6789/0 clock skew 0.347176s > max 0.05s
> 2016-11-26 18:16:29.928249 mon.0 10.27.251.7:6789/0 1205 : cluster [WRN] mon.1 10.27.251.8:6789/0 clock skew 0.203948s > max 0.05s
> 2016-11-26 18:16:29.955001 mon.0 10.27.251.7:6789/0 1206 : cluster [INF] monmap e4: 4 mons at {0=10.27.251.7:6789/0,1=10.27.251.8:6789/0,2=10.27.251.11:6789/0,3=10.27.251.12:6789/0}
> 2016-11-26 18:16:29.955115 mon.0 10.27.251.7:6789/0 1207 : cluster [INF] pgmap v2410577: 768 pgs: 768 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail
> 2016-11-26 18:16:29.955195 mon.0 10.27.251.7:6789/0 1208 : cluster [INF] mdsmap e1: 0/0/0 up
> 2016-11-26 18:16:29.955297 mon.0 10.27.251.7:6789/0 1209 : cluster [INF] osdmap e98: 6 osds: 6 up, 6 in
> 2016-11-26 18:16:36.965739 mon.2 10.27.251.11:6789/0 23 : cluster [WRN] message from mon.0 was stamped 0.347450s in the future, clocks not synchronized
> 2016-11-26 18:16:37.091476 mon.1 10.27.251.8:6789/0 1335 : cluster [WRN] message from mon.0 was stamped 0.221680s in the future, clocks not synchronized
> 2016-11-26 18:16:59.929488 mon.0 10.27.251.7:6789/0 1212 : cluster [WRN] mon.2 10.27.251.11:6789/0 clock skew 0.347736s > max 0.05s
> 2016-11-26 18:16:59.929541 mon.0 10.27.251.7:6789/0 1213 : cluster [WRN] mon.1 10.27.251.8:6789/0 clock skew 0.222216s > max 0.05s
> 2016-11-26 18:17:02.770378 mon.2 10.27.251.11:6789/0 24 : cluster [WRN] message from mon.0 was stamped 0.345763s in the future, clocks not synchronized
> 2016-11-26 18:17:02.902756 mon.1 10.27.251.8:6789/0 1336 : cluster [WRN] message from mon.0 was stamped 0.213372s in the future, clocks not synchronized
> 2016-11-26 18:17:29.901353 mon.0 10.27.251.7:6789/0 1218 : cluster [INF] HEALTH_WARN; clock skew detected on mon.1, mon.2; 1 mons down, quorum 0,1,2 0,1,2; Monitor clock skew detected 
> 2016-11-26 18:17:59.930852 mon.0 10.27.251.7:6789/0 1219 : cluster [WRN] mon.2 10.27.251.11:6789/0 clock skew 0.348437s > max 0.05s
> 2016-11-26 18:17:59.930923 mon.0 10.27.251.7:6789/0 1220 : cluster [WRN] mon.1 10.27.251.8:6789/0 clock skew 0.223381s > max 0.05s
> 2016-11-26 18:18:24.383970 mon.2 10.27.251.11:6789/0 25 : cluster [INF] mon.2 calling new monitor election
> 2016-11-26 18:18:24.459941 mon.3 10.27.251.12:6789/0 1 : cluster [INF] mon.3 calling new monitor election
> 2016-11-26 18:18:24.506084 mon.3 10.27.251.12:6789/0 2 : cluster [WRN] message from mon.0 was stamped 0.271532s in the future, clocks not synchronized
> 2016-11-26 18:18:24.508845 mon.1 10.27.251.8:6789/0 1337 : cluster [INF] mon.1 calling new monitor election
> 2016-11-26 18:18:24.733137 mon.0 10.27.251.7:6789/0 1221 : cluster [INF] mon.0 calling new monitor election
> 2016-11-26 18:18:24.764445 mon.0 10.27.251.7:6789/0 1222 : cluster [INF] mon.0 at 0 won leader election with quorum 0,1,2,3
> 2016-11-26 18:18:24.770743 mon.0 10.27.251.7:6789/0 1223 : cluster [INF] HEALTH_OK
> 2016-11-26 18:18:24.771644 mon.0 10.27.251.7:6789/0 1224 : cluster [WRN] mon.2 10.27.251.11:6789/0 clock skew 0.34865s > max 0.05s
> 2016-11-26 18:18:24.771763 mon.0 10.27.251.7:6789/0 1225 : cluster [WRN] mon.3 10.27.251.12:6789/0 clock skew 0.272024s > max 0.05s
> 2016-11-26 18:18:24.778105 mon.0 10.27.251.7:6789/0 1226 : cluster [INF] monmap e4: 4 mons at {0=10.27.251.7:6789/0,1=10.27.251.8:6789/0,2=10.27.251.11:6789/0,3=10.27.251.12:6789/0}
> 2016-11-26 18:18:24.778168 mon.0 10.27.251.7:6789/0 1227 : cluster [INF] pgmap v2410577: 768 pgs: 768 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail
> 2016-11-26 18:18:24.778217 mon.0 10.27.251.7:6789/0 1228 : cluster [INF] mdsmap e1: 0/0/0 up
> 2016-11-26 18:18:24.778309 mon.0 10.27.251.7:6789/0 1229 : cluster [INF] osdmap e98: 6 osds: 6 up, 6 in
> 2016-11-26 18:18:24.778495 mon.0 10.27.251.7:6789/0 1230 : cluster [WRN] mon.1 10.27.251.8:6789/0 clock skew 0.217754s > max 0.05s
> 2016-11-26 18:18:31.609426 mon.3 10.27.251.12:6789/0 5 : cluster [WRN] message from mon.0 was stamped 0.272441s in the future, clocks not synchronized
> 2016-11-26 18:18:54.779742 mon.0 10.27.251.7:6789/0 1231 : cluster [WRN] mon.3 10.27.251.12:6789/0 clock skew 0.272617s > max 0.05s
> 2016-11-26 18:18:54.779795 mon.0 10.27.251.7:6789/0 1232 : cluster [WRN] mon.1 10.27.251.8:6789/0 clock skew 0.224392s > max 0.05s
> 2016-11-26 18:18:54.779834 mon.0 10.27.251.7:6789/0 1233 : cluster [WRN] mon.2 10.27.251.11:6789/0 clock skew 0.349151s > max 0.05s
> 2016-11-26 18:18:57.598098 mon.3 10.27.251.12:6789/0 6 : cluster [WRN] message from mon.0 was stamped 0.272729s in the future, clocks not synchronized
> 2016-11-26 18:19:09.612371 mon.2 10.27.251.11:6789/0 26 : cluster [WRN] message from mon.0 was stamped 0.349322s in the future, clocks not synchronized
> 2016-11-26 18:19:09.736830 mon.1 10.27.251.8:6789/0 1338 : cluster [WRN] message from mon.0 was stamped 0.224812s in the future, clocks not synchronized
> 2016-11-26 18:19:24.770966 mon.0 10.27.251.7:6789/0 1234 : cluster [INF] HEALTH_WARN; clock skew detected on mon.1, mon.2, mon.3; Monitor clock skew detected 
> 2016-11-26 18:19:54.781002 mon.0 10.27.251.7:6789/0 1235 : cluster [WRN] mon.3 10.27.251.12:6789/0 clock skew 0.273372s > max 0.05s
> 2016-11-26 18:19:54.781078 mon.0 10.27.251.7:6789/0 1236 : cluster [WRN] mon.1 10.27.251.8:6789/0 clock skew 0.225574s > max 0.05s
> 2016-11-26 18:19:54.781120 mon.0 10.27.251.7:6789/0 1237 : cluster [WRN] mon.2 10.27.251.11:6789/0 clock skew 0.349896s > max 0.05s
> 2016-11-26 18:21:03.602890 mon.3 10.27.251.12:6789/0 7 : cluster [WRN] message from mon.0 was stamped 0.274203s in the future, clocks not synchronized
> 2016-11-26 18:21:24.782299 mon.0 10.27.251.7:6789/0 1238 : cluster [WRN] mon.3 10.27.251.12:6789/0 clock skew 0.27444s > max 0.05s
> 2016-11-26 18:21:24.782359 mon.0 10.27.251.7:6789/0 1239 : cluster [WRN] mon.2 10.27.251.11:6789/0 clock skew 0.351099s > max 0.05s
> 2016-11-26 18:21:24.782397 mon.0 10.27.251.7:6789/0 1240 : cluster [WRN] mon.1 10.27.251.8:6789/0 clock skew 0.227465s > max 0.05s
> 2016-11-26 18:23:24.783511 mon.0 10.27.251.7:6789/0 1241 : cluster [WRN] mon.3 10.27.251.12:6789/0 clock skew 0.275852s > max 0.05s
> 2016-11-26 18:23:24.783572 mon.0 10.27.251.7:6789/0 1242 : cluster [WRN] mon.2 10.27.251.11:6789/0 clock skew 0.352701s > max 0.05s
> 2016-11-26 18:23:24.783614 mon.0 10.27.251.7:6789/0 1243 : cluster [WRN] mon.1 10.27.251.8:6789/0 clock skew 0.229936s > max 0.05s
> 2016-11-26 18:25:54.784800 mon.0 10.27.251.7:6789/0 1244 : cluster [WRN] mon.3 10.27.251.12:6789/0 clock skew 0.277662s > max 0.05s
> 2016-11-26 18:25:54.784861 mon.0 10.27.251.7:6789/0 1245 : cluster [WRN] mon.2 10.27.251.11:6789/0 clock skew 0.354716s > max 0.05s
> 2016-11-26 18:25:54.785102 mon.0 10.27.251.7:6789/0 1246 : cluster [WRN] mon.1 10.27.251.8:6789/0 clock skew 0.232739s > max 0.05s
> 2016-11-26 18:28:54.786183 mon.0 10.27.251.7:6789/0 1248 : cluster [WRN] mon.3 10.27.251.12:6789/0 clock skew 0.27989s > max 0.05s
> 2016-11-26 18:28:54.786243 mon.0 10.27.251.7:6789/0 1249 : cluster [WRN] mon.1 10.27.251.8:6789/0 clock skew 0.23634s > max 0.05s
> 2016-11-26 18:28:54.786284 mon.0 10.27.251.7:6789/0 1250 : cluster [WRN] mon.2 10.27.251.11:6789/0 clock skew 0.35715s > max 0.05s
> 2016-11-26 18:29:36.721250 mon.2 10.27.251.11:6789/0 27 : cluster [WRN] message from mon.0 was stamped 0.357750s in the future, clocks not synchronized
> 2016-11-26 18:29:36.841757 mon.1 10.27.251.8:6789/0 1339 : cluster [WRN] message from mon.0 was stamped 0.237207s in the future, clocks not synchronized
> 2016-11-26 18:31:30.725507 mon.3 10.27.251.12:6789/0 8 : cluster [WRN] message from mon.0 was stamped 0.281799s in the future, clocks not synchronized
> 2016-11-26 18:32:24.787410 mon.0 10.27.251.7:6789/0 1264 : cluster [WRN] mon.3 10.27.251.12:6789/0 clock skew 0.282481s > max 0.05s
> 2016-11-26 18:32:24.787462 mon.0 10.27.251.7:6789/0 1265 : cluster [WRN] mon.2 10.27.251.11:6789/0 clock skew 0.360058s > max 0.05s
> 2016-11-26 18:32:24.787500 mon.0 10.27.251.7:6789/0 1266 : cluster [WRN] mon.1 10.27.251.8:6789/0 clock skew 0.240569s > max 0.05s
> 2016-11-26 18:33:20.594196 mon.3 10.27.251.12:6789/0 9 : cluster [INF] mon.3 calling new monitor election
> 2016-11-26 18:33:20.635816 mon.1 10.27.251.8:6789/0 1340 : cluster [INF] mon.1 calling new monitor election
> 2016-11-26 18:33:20.894625 mon.0 10.27.251.7:6789/0 1273 : cluster [INF] mon.0 calling new monitor election
> 2016-11-26 18:33:25.919955 mon.0 10.27.251.7:6789/0 1274 : cluster [INF] mon.0 at 0 won leader election with quorum 0,1,3
> 2016-11-26 18:33:25.929393 mon.0 10.27.251.7:6789/0 1275 : cluster [INF] HEALTH_WARN; 1 mons down, quorum 0,1,3 0,1,3
> 2016-11-26 18:33:25.930715 mon.0 10.27.251.7:6789/0 1276 : cluster [WRN] mon.3 10.27.251.12:6789/0 clock skew 0.282884s > max 0.05s
> 2016-11-26 18:33:25.947280 mon.0 10.27.251.7:6789/0 1277 : cluster [WRN] mon.1 10.27.251.8:6789/0 clock skew 0.234203s > max 0.05s
> 2016-11-26 18:33:25.964223 mon.0 10.27.251.7:6789/0 1278 : cluster [INF] monmap e4: 4 mons at {0=10.27.251.7:6789/0,1=10.27.251.8:6789/0,2=10.27.251.11:6789/0,3=10.27.251.12:6789/0}
> 2016-11-26 18:33:25.964283 mon.0 10.27.251.7:6789/0 1279 : cluster [INF] pgmap v2410577: 768 pgs: 768 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail
> 2016-11-26 18:33:25.964326 mon.0 10.27.251.7:6789/0 1280 : cluster [INF] mdsmap e1: 0/0/0 up
> 2016-11-26 18:33:25.964418 mon.0 10.27.251.7:6789/0 1281 : cluster [INF] osdmap e98: 6 osds: 6 up, 6 in
> 2016-11-26 18:33:55.948613 mon.0 10.27.251.7:6789/0 1283 : cluster [WRN] mon.3 10.27.251.12:6789/0 clock skew 0.28349s > max 0.05s
> 2016-11-26 18:33:55.948680 mon.0 10.27.251.7:6789/0 1284 : cluster [WRN] mon.1 10.27.251.8:6789/0 clock skew 0.242253s > max 0.05s
> 2016-11-26 18:34:25.929710 mon.0 10.27.251.7:6789/0 1287 : cluster [INF] HEALTH_WARN; clock skew detected on mon.1, mon.3; 1 mons down, quorum 0,1,3 0,1,3; Monitor clock skew detected 
> 2016-11-26 18:34:55.950050 mon.0 10.27.251.7:6789/0 1288 : cluster [WRN] mon.3 10.27.251.12:6789/0 clock skew 0.284225s > max 0.05s
> 2016-11-26 18:34:55.950117 mon.0 10.27.251.7:6789/0 1289 : cluster [WRN] mon.1 10.27.251.8:6789/0 clock skew 0.243421s > max 0.05s
> 2016-11-26 18:36:25.951267 mon.0 10.27.251.7:6789/0 1290 : cluster [WRN] mon.3 10.27.251.12:6789/0 clock skew 0.285389s > max 0.05s
> 2016-11-26 18:36:25.951393 mon.0 10.27.251.7:6789/0 1291 : cluster [WRN] mon.1 10.27.251.8:6789/0 clock skew 0.245253s > max 0.05s
> 2016-11-26 18:38:25.952573 mon.0 10.27.251.7:6789/0 1294 : cluster [WRN] mon.3 10.27.251.12:6789/0 clock skew 0.286907s > max 0.05s
> 2016-11-26 18:38:25.952836 mon.0 10.27.251.7:6789/0 1295 : cluster [WRN] mon.1 10.27.251.8:6789/0 clock skew 0.247648s > max 0.05s
> 2016-11-26 18:40:55.954179 mon.0 10.27.251.7:6789/0 1296 : cluster [WRN] mon.3 10.27.251.12:6789/0 clock skew 0.288735s > max 0.05s
> 2016-11-26 18:40:55.954233 mon.0 10.27.251.7:6789/0 1297 : cluster [WRN] mon.1 10.27.251.8:6789/0 clock skew 0.2506s > max 0.05s
> 2016-11-26 18:43:32.915408 mon.2 10.27.251.11:6789/0 1 : cluster [INF] mon.2 calling new monitor election
> 2016-11-26 18:43:32.916835 mon.2 10.27.251.11:6789/0 2 : cluster [INF] mon.2 calling new monitor election
> 2016-11-26 18:43:32.951384 mon.2 10.27.251.11:6789/0 3 : cluster [WRN] message from mon.0 was stamped 0.388792s in the future, clocks not synchronized
> 2016-11-26 18:43:33.014026 mon.3 10.27.251.12:6789/0 10 : cluster [INF] mon.3 calling new monitor election
> 2016-11-26 18:43:33.050896 mon.1 10.27.251.8:6789/0 1341 : cluster [INF] mon.1 calling new monitor election
> 2016-11-26 18:43:33.305330 mon.0 10.27.251.7:6789/0 1298 : cluster [INF] mon.0 calling new monitor election
> 2016-11-26 18:43:33.324492 mon.0 10.27.251.7:6789/0 1299 : cluster [INF] mon.0 at 0 won leader election with quorum 0,1,2,3
> 2016-11-26 18:43:33.333626 mon.0 10.27.251.7:6789/0 1300 : cluster [INF] HEALTH_OK
> 2016-11-26 18:43:33.334234 mon.0 10.27.251.7:6789/0 1301 : cluster [WRN] mon.3 10.27.251.12:6789/0 clock skew 0.290845s > max 0.05s
> 2016-11-26 18:43:33.334321 mon.0 10.27.251.7:6789/0 1302 : cluster [WRN] mon.2 10.27.251.11:6789/0 clock skew 0.388745s > max 0.05s
> 2016-11-26 18:43:33.340638 mon.0 10.27.251.7:6789/0 1303 : cluster [INF] monmap e4: 4 mons at {0=10.27.251.7:6789/0,1=10.27.251.8:6789/0,2=10.27.251.11:6789/0,3=10.27.251.12:6789/0}
> 2016-11-26 18:43:33.340703 mon.0 10.27.251.7:6789/0 1304 : cluster [INF] pgmap v2410577: 768 pgs: 768 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail
> 2016-11-26 18:43:33.340763 mon.0 10.27.251.7:6789/0 1305 : cluster [INF] mdsmap e1: 0/0/0 up
> 2016-11-26 18:43:33.340858 mon.0 10.27.251.7:6789/0 1306 : cluster [INF] osdmap e98: 6 osds: 6 up, 6 in
> 2016-11-26 18:43:33.341044 mon.0 10.27.251.7:6789/0 1307 : cluster [WRN] mon.1 10.27.251.8:6789/0 clock skew 0.247914s > max 0.05s
> 2016-11-26 18:43:40.064299 mon.2 10.27.251.11:6789/0 6 : cluster [WRN] message from mon.0 was stamped 0.388889s in the future, clocks not synchronized
> 2016-11-26 18:44:03.342137 mon.0 10.27.251.7:6789/0 1308 : cluster [WRN] mon.3 10.27.251.12:6789/0 clock skew 0.291226s > max 0.05s
> 2016-11-26 18:44:03.342225 mon.0 10.27.251.7:6789/0 1309 : cluster [WRN] mon.1 10.27.251.8:6789/0 clock skew 0.254342s > max 0.05s
> 2016-11-26 18:44:03.342281 mon.0 10.27.251.7:6789/0 1310 : cluster [WRN] mon.2 10.27.251.11:6789/0 clock skew 0.389057s > max 0.05s
> 2016-11-26 18:44:06.047499 mon.2 10.27.251.11:6789/0 7 : cluster [WRN] message from mon.0 was stamped 0.389102s in the future, clocks not synchronized
> 2016-11-26 18:44:33.333908 mon.0 10.27.251.7:6789/0 1311 : cluster [INF] HEALTH_WARN; clock skew detected on mon.1, mon.2, mon.3; Monitor clock skew detected 
> 2016-11-26 18:45:03.343358 mon.0 10.27.251.7:6789/0 1313 : cluster [WRN] mon.3 10.27.251.12:6789/0 clock skew 0.291989s > max 0.05s
> 2016-11-26 18:45:03.343435 mon.0 10.27.251.7:6789/0 1314 : cluster [WRN] mon.1 10.27.251.8:6789/0 clock skew 0.255536s > max 0.05s
> 2016-11-26 18:45:03.343540 mon.0 10.27.251.7:6789/0 1315 : cluster [WRN] mon.2 10.27.251.11:6789/0 clock skew 0.38983s > max 0.05s
> 2016-11-26 18:46:11.549947 mon.2 10.27.251.11:6789/0 8 : cluster [WRN] message from mon.0 was stamped 0.390678s in the future, clocks not synchronized
> 2016-11-26 18:46:33.344570 mon.0 10.27.251.7:6789/0 1329 : cluster [WRN] mon.3 10.27.251.12:6789/0 clock skew 0.29311s > max 0.05s
> 2016-11-26 18:46:33.344642 mon.0 10.27.251.7:6789/0 1330 : cluster [WRN] mon.1 10.27.251.8:6789/0 clock skew 0.257389s > max 0.05s
> 2016-11-26 18:46:33.344707 mon.0 10.27.251.7:6789/0 1331 : cluster [WRN] mon.2 10.27.251.11:6789/0 clock skew 0.391036s > max 0.05s
> 2016-11-26 18:48:33.345909 mon.0 10.27.251.7:6789/0 1354 : cluster [WRN] mon.3 10.27.251.12:6789/0 clock skew 0.294607s > max 0.05s
> 2016-11-26 18:48:33.345973 mon.0 10.27.251.7:6789/0 1355 : cluster [WRN] mon.2 10.27.251.11:6789/0 clock skew 0.392611s > max 0.05s
> 2016-11-26 18:48:33.346016 mon.0 10.27.251.7:6789/0 1356 : cluster [WRN] mon.1 10.27.251.8:6789/0 clock skew 0.259781s > max 0.05s
> 2016-11-26 18:51:03.347074 mon.0 10.27.251.7:6789/0 1357 : cluster [WRN] mon.3 10.27.251.12:6789/0 clock skew 0.296507s > max 0.05s
> 2016-11-26 18:51:03.347259 mon.0 10.27.251.7:6789/0 1358 : cluster [WRN] mon.2 10.27.251.11:6789/0 clock skew 0.394627s > max 0.05s
> 2016-11-26 18:51:03.347311 mon.0 10.27.251.7:6789/0 1359 : cluster [WRN] mon.1 10.27.251.8:6789/0 clock skew 0.262662s > max 0.05s
> 2016-11-26 18:54:03.348471 mon.0 10.27.251.7:6789/0 1360 : cluster [WRN] mon.3 10.27.251.12:6789/0 clock skew 0.298756s > max 0.05s
> 2016-11-26 18:54:03.348533 mon.0 10.27.251.7:6789/0 1361 : cluster [WRN] mon.2 10.27.251.11:6789/0 clock skew 0.397086s > max 0.05s
> 2016-11-26 18:54:03.348580 mon.0 10.27.251.7:6789/0 1362 : cluster [WRN] mon.1 10.27.251.8:6789/0 clock skew 0.266196s > max 0.05s
> 2016-11-26 18:56:39.053369 mon.2 10.27.251.11:6789/0 9 : cluster [WRN] message from mon.0 was stamped 0.399300s in the future, clocks not synchronized
> 2016-11-26 18:57:33.349690 mon.0 10.27.251.7:6789/0 1363 : cluster [WRN] mon.3 10.27.251.12:6789/0 clock skew 0.192948s > max 0.05s
> 2016-11-26 18:57:33.349743 mon.0 10.27.251.7:6789/0 1364 : cluster [WRN] mon.1 10.27.251.8:6789/0 clock skew 0.270457s > max 0.05s
> 2016-11-26 18:57:33.349788 mon.0 10.27.251.7:6789/0 1365 : cluster [WRN] mon.2 10.27.251.11:6789/0 clock skew 0.400016s > max 0.05s
> 2016-11-26 19:00:00.000400 mon.0 10.27.251.7:6789/0 1370 : cluster [INF] HEALTH_WARN; clock skew detected on mon.1, mon.2, mon.3; Monitor clock skew detected 
> 2016-11-26 19:01:33.350738 mon.0 10.27.251.7:6789/0 1389 : cluster [WRN] mon.3 10.27.251.12:6789/0 clock skew 0.192183s > max 0.05s
> 2016-11-26 19:01:33.350800 mon.0 10.27.251.7:6789/0 1390 : cluster [WRN] mon.1 10.27.251.8:6789/0 clock skew 0.275208s > max 0.05s
> 2016-11-26 19:01:33.350856 mon.0 10.27.251.7:6789/0 1391 : cluster [WRN] mon.2 10.27.251.11:6789/0 clock skew 0.40334s > max 0.05s
> 2016-11-26 19:06:03.351908 mon.0 10.27.251.7:6789/0 1478 : cluster [WRN] mon.3 10.27.251.12:6789/0 clock skew 0.192207s > max 0.05s
> 2016-11-26 19:06:03.351997 mon.0 10.27.251.7:6789/0 1479 : cluster [WRN] mon.1 10.27.251.8:6789/0 clock skew 0.280431s > max 0.05s
> 2016-11-26 19:06:03.352110 mon.0 10.27.251.7:6789/0 1480 : cluster [WRN] mon.2 10.27.251.11:6789/0 clock skew 0.251491s > max 0.05s
> 
> But after adding the new NTP server and waiting some time, the clocks
> finally got in sync and the status went back to OK.
> But (this is the PANIC moment) despite the fact that 'ceph status' and
> the PVE interface said 'all OK', the cluster did not work.
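> (To make "does not work" concrete: a quick way to test beyond 'ceph
> status' is an NTP check plus a plain client operation -- just the
> standard CLI; the pool name below is only an example:
> 
>   ntpq -p              # is each mon really syncing to the new NTP server?
>   ceph health detail   # lists the mons still reporting clock skew
>   rbd -p rbd ls        # a simple client op; if this hangs, clients
>                        # effectively see a dead cluster
> )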
> 
> So I started rebooting the CPU nodes (mon.2 and mon.3):
> 
> 2016-11-26 19:11:33.343818 mon.0 10.27.251.7:6789/0 1581 : cluster [INF] HEALTH_OK
> 2016-11-26 19:12:43.854404 mon.1 10.27.251.8:6789/0 1342 : cluster [INF] mon.1 calling new monitor election
> 2016-11-26 19:12:43.856032 mon.3 10.27.251.12:6789/0 11 : cluster [INF] mon.3 calling new monitor election
> 2016-11-26 19:12:43.870922 mon.0 10.27.251.7:6789/0 1590 : cluster [INF] mon.0 calling new monitor election
> 2016-11-26 19:12:48.895683 mon.0 10.27.251.7:6789/0 1591 : cluster [INF] mon.0 at 0 won leader election with quorum 0,1,3
> 2016-11-26 19:12:48.905245 mon.0 10.27.251.7:6789/0 1592 : cluster [INF] HEALTH_WARN; 1 mons down, quorum 0,1,3 0,1,3
> 2016-11-26 19:12:48.951654 mon.0 10.27.251.7:6789/0 1593 : cluster [INF] monmap e4: 4 mons at {0=10.27.251.7:6789/0,1=10.27.251.8:6789/0,2=10.27.251.11:6789/0,3=10.27.251.12:6789/0}
> 2016-11-26 19:12:48.951715 mon.0 10.27.251.7:6789/0 1594 : cluster [INF] pgmap v2410577: 768 pgs: 768 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail
> 2016-11-26 19:12:48.951766 mon.0 10.27.251.7:6789/0 1595 : cluster [INF] mdsmap e1: 0/0/0 up
> 2016-11-26 19:12:48.951848 mon.0 10.27.251.7:6789/0 1596 : cluster [INF] osdmap e98: 6 osds: 6 up, 6 in
> 2016-11-26 19:15:48.583382 mon.2 10.27.251.11:6789/0 1 : cluster [INF] mon.2 calling new monitor election
> 2016-11-26 19:15:48.584865 mon.2 10.27.251.11:6789/0 2 : cluster [INF] mon.2 calling new monitor election
> 2016-11-26 19:15:48.589714 mon.0 10.27.251.7:6789/0 1616 : cluster [INF] mon.0 calling new monitor election
> 2016-11-26 19:15:48.589965 mon.1 10.27.251.8:6789/0 1343 : cluster [INF] mon.1 calling new monitor election
> 2016-11-26 19:15:48.591671 mon.3 10.27.251.12:6789/0 12 : cluster [INF] mon.3 calling new monitor election
> 2016-11-26 19:15:48.614007 mon.0 10.27.251.7:6789/0 1617 : cluster [INF] mon.0 at 0 won leader election with quorum 0,1,2,3
> 2016-11-26 19:15:48.620602 mon.0 10.27.251.7:6789/0 1618 : cluster [INF] HEALTH_OK
> 2016-11-26 19:15:48.633199 mon.0 10.27.251.7:6789/0 1619 : cluster [INF] monmap e4: 4 mons at {0=10.27.251.7:6789/0,1=10.27.251.8:6789/0,2=10.27.251.11:6789/0,3=10.27.251.12:6789/0}
> 2016-11-26 19:15:48.633258 mon.0 10.27.251.7:6789/0 1620 : cluster [INF] pgmap v2410577: 768 pgs: 768 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail
> 2016-11-26 19:15:48.633322 mon.0 10.27.251.7:6789/0 1621 : cluster [INF] mdsmap e1: 0/0/0 up
> 2016-11-26 19:15:48.633416 mon.0 10.27.251.7:6789/0 1622 : cluster [INF] osdmap e98: 6 osds: 6 up, 6 in
> 2016-11-26 19:18:12.415679 mon.0 10.27.251.7:6789/0 1639 : cluster [INF] mon.0 calling new monitor election
> 2016-11-26 19:18:17.444444 mon.0 10.27.251.7:6789/0 1640 : cluster [INF] mon.0 at 0 won leader election with quorum 0,1,2
> 2016-11-26 19:18:17.453618 mon.0 10.27.251.7:6789/0 1641 : cluster [INF] HEALTH_WARN; 1 mons down, quorum 0,1,2 0,1,2
> 2016-11-26 19:18:17.468577 mon.0 10.27.251.7:6789/0 1642 : cluster [INF] monmap e4: 4 mons at {0=10.27.251.7:6789/0,1=10.27.251.8:6789/0,2=10.27.251.11:6789/0,3=10.27.251.12:6789/0}
> 2016-11-26 19:18:17.468636 mon.0 10.27.251.7:6789/0 1643 : cluster [INF] pgmap v2410577: 768 pgs: 768 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail
> 2016-11-26 19:18:17.468679 mon.0 10.27.251.7:6789/0 1644 : cluster [INF] mdsmap e1: 0/0/0 up
> 2016-11-26 19:18:17.468755 mon.0 10.27.251.7:6789/0 1645 : cluster [INF] osdmap e98: 6 osds: 6 up, 6 in
> 2016-11-26 19:21:25.457997 mon.2 10.27.251.11:6789/0 5 : cluster [INF] mon.2 calling new monitor election
> 2016-11-26 19:21:25.458923 mon.0 10.27.251.7:6789/0 1648 : cluster [INF] mon.0 calling new monitor election
> 2016-11-26 19:21:25.459240 mon.1 10.27.251.8:6789/0 1344 : cluster [INF] mon.1 calling new monitor election
> 2016-11-26 19:21:25.489206 mon.0 10.27.251.7:6789/0 1649 : cluster [INF] mon.0 at 0 won leader election with quorum 0,1,2,3
> 2016-11-26 19:21:25.498421 mon.0 10.27.251.7:6789/0 1650 : cluster [INF] HEALTH_OK
> 2016-11-26 19:21:25.505645 mon.3 10.27.251.12:6789/0 1 : cluster [INF] mon.3 calling new monitor election
> 2016-11-26 19:21:25.508232 mon.0 10.27.251.7:6789/0 1651 : cluster [INF] monmap e4: 4 mons at {0=10.27.251.7:6789/0,1=10.27.251.8:6789/0,2=10.27.251.11:6789/0,3=10.27.251.12:6789/0}
> 2016-11-26 19:21:25.508377 mon.0 10.27.251.7:6789/0 1652 : cluster [INF] pgmap v2410577: 768 pgs: 768 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail
> 2016-11-26 19:21:25.508466 mon.0 10.27.251.7:6789/0 1653 : cluster [INF] mdsmap e1: 0/0/0 up
> 2016-11-26 19:21:25.508556 mon.0 10.27.251.7:6789/0 1654 : cluster [INF] osdmap e98: 6 osds: 6 up, 6 in
> 2016-11-26 19:44:00.306113 mon.0 10.27.251.7:6789/0 1672 : cluster [INF] mon.0 calling new monitor election
> 2016-11-26 19:44:05.343631 mon.0 10.27.251.7:6789/0 1673 : cluster [INF] mon.0 at 0 won leader election with quorum 0,1,3
> 2016-11-26 19:44:05.353082 mon.0 10.27.251.7:6789/0 1674 : cluster [INF] HEALTH_WARN; 1 mons down, quorum 0,1,3 0,1,3
> 2016-11-26 19:44:05.373799 mon.0 10.27.251.7:6789/0 1675 : cluster [INF] monmap e4: 4 mons at {0=10.27.251.7:6789/0,1=10.27.251.8:6789/0,2=10.27.251.11:6789/0,3=10.27.251.12:6789/0}
> 2016-11-26 19:44:05.373860 mon.0 10.27.251.7:6789/0 1676 : cluster [INF] pgmap v2410577: 768 pgs: 768 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail
> 2016-11-26 19:44:05.373904 mon.0 10.27.251.7:6789/0 1677 : cluster [INF] mdsmap e1: 0/0/0 up
> 2016-11-26 19:44:05.373983 mon.0 10.27.251.7:6789/0 1678 : cluster [INF] osdmap e98: 6 osds: 6 up, 6 in
> 2016-11-26 19:47:20.297661 mon.2 10.27.251.11:6789/0 1 : cluster [INF] mon.2 calling new monitor election
> 2016-11-26 19:47:20.299406 mon.2 10.27.251.11:6789/0 2 : cluster [INF] mon.2 calling new monitor election
> 2016-11-26 19:47:20.357274 mon.2 10.27.251.11:6789/0 3 : cluster [WRN] message from mon.0 was stamped 0.404381s in the future, clocks not synchronized
> 2016-11-26 19:47:20.716116 mon.3 10.27.251.12:6789/0 4 : cluster [INF] mon.3 calling new monitor election
> 2016-11-26 19:47:20.719435 mon.0 10.27.251.7:6789/0 1679 : cluster [INF] mon.0 calling new monitor election
> 2016-11-26 19:47:20.719853 mon.1 10.27.251.8:6789/0 1345 : cluster [INF] mon.1 calling new monitor election
> 2016-11-26 19:47:20.747017 mon.0 10.27.251.7:6789/0 1680 : cluster [INF] mon.0 at 0 won leader election with quorum 0,1,2,3
> 2016-11-26 19:47:20.755302 mon.0 10.27.251.7:6789/0 1681 : cluster [INF] HEALTH_OK
> 2016-11-26 19:47:20.755943 mon.0 10.27.251.7:6789/0 1682 : cluster [WRN] mon.2 10.27.251.11:6789/0 clock skew 0.420346s > max 0.05s
> 2016-11-26 19:47:20.762042 mon.0 10.27.251.7:6789/0 1683 : cluster [INF] monmap e4: 4 mons at {0=10.27.251.7:6789/0,1=10.27.251.8:6789/0,2=10.27.251.11:6789/0,3=10.27.251.12:6789/0}
> 2016-11-26 19:47:20.762100 mon.0 10.27.251.7:6789/0 1684 : cluster [INF] pgmap v2410577: 768 pgs: 768 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail
> 2016-11-26 19:47:20.762146 mon.0 10.27.251.7:6789/0 1685 : cluster [INF] mdsmap e1: 0/0/0 up
> 2016-11-26 19:47:20.762226 mon.0 10.27.251.7:6789/0 1686 : cluster [INF] osdmap e98: 6 osds: 6 up, 6 in
> 2016-11-26 19:47:27.462603 mon.2 10.27.251.11:6789/0 6 : cluster [WRN] message from mon.0 was stamped 0.420329s in the future, clocks not synchronized
> 2016-11-26 19:47:50.763598 mon.0 10.27.251.7:6789/0 1687 : cluster [WRN] mon.2 10.27.251.11:6789/0 clock skew 0.420661s > max 0.05s
> 2016-11-26 19:47:53.438750 mon.2 10.27.251.11:6789/0 7 : cluster [WRN] message from mon.0 was stamped 0.420684s in the future, clocks not synchronized
> 2016-11-26 19:48:20.755382 mon.0 10.27.251.7:6789/0 1688 : cluster [INF] HEALTH_WARN; clock skew detected on mon.2; Monitor clock skew detected 
> 2016-11-26 19:49:20.755732 mon.0 10.27.251.7:6789/0 1697 : cluster [INF] HEALTH_OK
> 
> 
> With no luck. So finally I set the 'nodown' and 'noout' flags and
> rebooted the storage nodes (mon.0 and mon.1). And suddenly everything
> went back to normal; logs below.
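> (Setting the flags is just the standard CLI, e.g.:
> 
>   ceph osd set nodown
>   ceph osd set noout
> 
> and they are removed again afterwards with 'ceph osd unset nodown' and
> 'ceph osd unset noout' once the cluster is healthy.)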
> 
> 2016-11-26 19:57:20.090836 mon.0 10.27.251.7:6789/0 1722 : cluster [INF] osdmap e99: 6 osds: 6 up, 6 in
> 2016-11-26 19:57:20.110743 mon.0 10.27.251.7:6789/0 1723 : cluster [INF] pgmap v2410578: 768 pgs: 768 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail
> 2016-11-26 19:57:20.758100 mon.0 10.27.251.7:6789/0 1724 : cluster [INF] HEALTH_WARN; noout flag(s) set
> 2016-11-26 19:57:24.617480 mon.0 10.27.251.7:6789/0 1727 : cluster [INF] osdmap e100: 6 osds: 6 up, 6 in
> 2016-11-26 19:57:24.641974 mon.0 10.27.251.7:6789/0 1728 : cluster [INF] pgmap v2410579: 768 pgs: 768 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail
> 2016-11-26 20:00:00.000180 mon.1 10.27.251.8:6789/0 1353 : cluster [INF] HEALTH_WARN; nodown,noout flag(s) set; 1 mons down, quorum 1,2,3 1,2,3
> 2016-11-26 20:01:49.705122 mon.0 10.27.251.7:6789/0 1 : cluster [INF] mon.0 calling new monitor election
> 2016-11-26 20:01:49.731728 mon.0 10.27.251.7:6789/0 4 : cluster [INF] mon.0 calling new monitor election
> 2016-11-26 20:01:49.751119 mon.0 10.27.251.7:6789/0 5 : cluster [INF] mon.0 at 0 won leader election with quorum 0,1,2,3
> 2016-11-26 20:01:49.762503 mon.0 10.27.251.7:6789/0 6 : cluster [INF] HEALTH_WARN; nodown,noout flag(s) set
> 2016-11-26 20:01:49.788619 mon.0 10.27.251.7:6789/0 7 : cluster [WRN] mon.2 10.27.251.11:6789/0 clock skew 0.243513s > max 0.05s
> 2016-11-26 20:01:49.788699 mon.0 10.27.251.7:6789/0 8 : cluster [WRN] mon.3 10.27.251.12:6789/0 clock skew 0.240216s > max 0.05s
> 2016-11-26 20:01:49.788796 mon.0 10.27.251.7:6789/0 9 : cluster [WRN] mon.1 10.27.251.8:6789/0 clock skew 0.243912s > max 0.05s
> 2016-11-26 20:01:49.797382 mon.0 10.27.251.7:6789/0 10 : cluster [INF] monmap e4: 4 mons at {0=10.27.251.7:6789/0,1=10.27.251.8:6789/0,2=10.27.251.11:6789/0,3=10.27.251.12:6789/0}
> 2016-11-26 20:01:49.797669 mon.0 10.27.251.7:6789/0 11 : cluster [INF] pgmap v2410579: 768 pgs: 768 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail
> 2016-11-26 20:01:49.797850 mon.0 10.27.251.7:6789/0 12 : cluster [INF] mdsmap e1: 0/0/0 up
> 2016-11-26 20:01:49.797960 mon.0 10.27.251.7:6789/0 13 : cluster [INF] osdmap e100: 6 osds: 6 up, 6 in
> 2016-11-26 20:01:49.798248 mon.0 10.27.251.7:6789/0 14 : cluster [WRN] message from mon.1 was stamped 0.294517s in the future, clocks not synchronized
> 2016-11-26 20:01:50.014131 mon.3 10.27.251.12:6789/0 6 : cluster [INF] mon.3 calling new monitor election
> 2016-11-26 20:01:50.016998 mon.2 10.27.251.11:6789/0 9 : cluster [INF] mon.2 calling new monitor election
> 2016-11-26 20:01:50.017895 mon.1 10.27.251.8:6789/0 1354 : cluster [INF] mon.1 calling new monitor election
> 2016-11-26 20:01:57.737260 mon.0 10.27.251.7:6789/0 19 : cluster [WRN] message from mon.3 was stamped 0.291444s in the future, clocks not synchronized
> 2016-11-26 20:02:19.789732 mon.0 10.27.251.7:6789/0 20 : cluster [WRN] mon.1 10.27.251.8:6789/0 clock skew 0.294864s > max 0.05s
> 2016-11-26 20:02:19.789786 mon.0 10.27.251.7:6789/0 21 : cluster [WRN] mon.3 10.27.251.12:6789/0 clock skew 0.290951s > max 0.05s
> 2016-11-26 20:02:19.789824 mon.0 10.27.251.7:6789/0 22 : cluster [WRN] mon.2 10.27.251.11:6789/0 clock skew 0.29396s > max 0.05s
> 2016-11-26 20:02:20.949515 mon.0 10.27.251.7:6789/0 23 : cluster [INF] osdmap e101: 6 osds: 4 up, 6 in
> 2016-11-26 20:02:20.985891 mon.0 10.27.251.7:6789/0 24 : cluster [INF] pgmap v2410580: 768 pgs: 312 stale+active+clean, 456 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail
> 2016-11-26 20:02:21.965798 mon.0 10.27.251.7:6789/0 25 : cluster [INF] osd.0 10.27.251.7:6804/3291 boot
> 2016-11-26 20:02:21.965879 mon.0 10.27.251.7:6789/0 26 : cluster [INF] osd.1 10.27.251.7:6800/2793 boot
> 2016-11-26 20:02:21.975031 mon.0 10.27.251.7:6789/0 27 : cluster [INF] osdmap e102: 6 osds: 6 up, 6 in
> 2016-11-26 20:02:22.022415 mon.0 10.27.251.7:6789/0 28 : cluster [INF] pgmap v2410581: 768 pgs: 312 stale+active+clean, 456 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail
> 2016-11-26 20:02:23.026342 mon.0 10.27.251.7:6789/0 29 : cluster [INF] osdmap e103: 6 osds: 6 up, 6 in
> 2016-11-26 20:02:23.026417 mon.0 10.27.251.7:6789/0 30 : cluster [WRN] message from mon.2 was stamped 0.275306s in the future, clocks not synchronized
> 2016-11-26 20:02:23.046210 mon.0 10.27.251.7:6789/0 31 : cluster [INF] pgmap v2410582: 768 pgs: 312 stale+active+clean, 456 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail
> 2016-11-26 20:02:25.819773 mon.0 10.27.251.7:6789/0 32 : cluster [INF] pgmap v2410583: 768 pgs: 169 stale+active+clean, 143 peering, 456 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail; 1467 kB/s wr, 276 op/s
> 2016-11-26 20:02:26.896658 mon.0 10.27.251.7:6789/0 33 : cluster [INF] pgmap v2410584: 768 pgs: 312 peering, 456 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail; 3337 kB/s wr, 630 op/s
> 2016-11-26 20:02:49.763887 mon.0 10.27.251.7:6789/0 34 : cluster [INF] HEALTH_WARN; clock skew detected on mon.1, mon.2, mon.3; 312 pgs peering; 312 pgs stuck inactive; 312 pgs stuck unclean; nodown,noout flag(s) set; Monitor clock skew detected 
> 2016-11-26 20:02:55.636643 osd.1 10.27.251.7:6800/2793 1 : cluster [WRN] 1 slow requests, 1 included below; oldest blocked for > 30.511571 secs
> 2016-11-26 20:02:55.636653 osd.1 10.27.251.7:6800/2793 2 : cluster [WRN] slow request 30.511571 seconds old, received at 2016-11-26 20:02:25.124993: osd_op(client.7854102.1:1 rbd_id.vm-102-disk-1 [call rbd.get_id] 3.197c044b RETRY=1 retry+read e103) currently reached_pg
> 2016-11-26 20:03:04.727273 osd.0 10.27.251.7:6804/3291 1 : cluster [WRN] 1 slow requests, 1 included below; oldest blocked for > 30.147061 secs
> 2016-11-26 20:03:04.727281 osd.0 10.27.251.7:6804/3291 2 : cluster [WRN] slow request 30.147061 seconds old, received at 2016-11-26 20:02:34.580154: osd_op(client.7849892.0:1 vm-108-disk-1.rbd [stat] 1.e494866a RETRY=2 ack+retry+read+known_if_redirected e103) currently reached_pg
> 2016-11-26 20:03:25.648743 osd.1 10.27.251.7:6800/2793 3 : cluster [WRN] 1 slow requests, 1 included below; oldest blocked for > 60.523708 secs
> 2016-11-26 20:03:25.648758 osd.1 10.27.251.7:6800/2793 4 : cluster [WRN] slow request 60.523708 seconds old, received at 2016-11-26 20:02:25.124993: osd_op(client.7854102.1:1 rbd_id.vm-102-disk-1 [call rbd.get_id] 3.197c044b RETRY=1 retry+read e103) currently reached_pg
> 2016-11-26 20:03:34.737588 osd.0 10.27.251.7:6804/3291 3 : cluster [WRN] 1 slow requests, 1 included below; oldest blocked for > 60.157392 secs
> 2016-11-26 20:03:34.737597 osd.0 10.27.251.7:6804/3291 4 : cluster [WRN] slow request 60.157392 seconds old, received at 2016-11-26 20:02:34.580154: osd_op(client.7849892.0:1 vm-108-disk-1.rbd [stat] 1.e494866a RETRY=2 ack+retry+read+known_if_redirected e103) currently reached_pg
> 2016-11-26 20:03:49.765365 mon.0 10.27.251.7:6789/0 35 : cluster [INF] HEALTH_WARN; 312 pgs peering; 312 pgs stuck inactive; 312 pgs stuck unclean; nodown,noout flag(s) set
> 2016-11-26 20:04:25.850414 mon.0 10.27.251.7:6789/0 36 : cluster [INF] pgmap v2410585: 768 pgs: 312 peering, 456 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail
> 2016-11-26 20:04:26.890251 mon.0 10.27.251.7:6789/0 37 : cluster [INF] pgmap v2410586: 768 pgs: 312 peering, 456 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail
> 2016-11-26 20:04:25.668335 osd.1 10.27.251.7:6800/2793 5 : cluster [WRN] 1 slow requests, 1 included below; oldest blocked for > 120.543296 secs
> 2016-11-26 20:04:25.668343 osd.1 10.27.251.7:6800/2793 6 : cluster [WRN] slow request 120.543296 seconds old, received at 2016-11-26 20:02:25.124993: osd_op(client.7854102.1:1 rbd_id.vm-102-disk-1 [call rbd.get_id] 3.197c044b RETRY=1 retry+read e103) currently reached_pg
> 2016-11-26 20:04:34.757570 osd.0 10.27.251.7:6804/3291 5 : cluster [WRN] 1 slow requests, 1 included below; oldest blocked for > 120.177368 secs
> 2016-11-26 20:04:34.757595 osd.0 10.27.251.7:6804/3291 6 : cluster [WRN] slow request 120.177368 seconds old, received at 2016-11-26 20:02:34.580154: osd_op(client.7849892.0:1 vm-108-disk-1.rbd [stat] 1.e494866a RETRY=2 ack+retry+read+known_if_redirected e103) currently reached_pg
> 2016-11-26 20:04:49.766694 mon.0 10.27.251.7:6789/0 38 : cluster [INF] HEALTH_WARN; 312 pgs peering; 312 pgs stuck inactive; 312 pgs stuck unclean; 2 requests are blocked > 32 sec; nodown,noout flag(s) set
> 2016-11-26 20:05:41.864203 mon.0 10.27.251.7:6789/0 39 : cluster [INF] mon.0 calling new monitor election
> 2016-11-26 20:05:46.887853 mon.0 10.27.251.7:6789/0 40 : cluster [INF] mon.0 at 0 won leader election with quorum 0,2,3
> 2016-11-26 20:05:46.897914 mon.0 10.27.251.7:6789/0 41 : cluster [INF] HEALTH_WARN; 312 pgs peering; 312 pgs stuck inactive; 312 pgs stuck unclean; 2 requests are blocked > 32 sec; nodown,noout flag(s) set; 1 mons down, quorum 0,2,3 0,2,3
> 2016-11-26 20:05:46.898803 mon.0 10.27.251.7:6789/0 42 : cluster [INF] monmap e4: 4 mons at {0=10.27.251.7:6789/0,1=10.27.251.8:6789/0,2=10.27.251.11:6789/0,3=10.27.251.12:6789/0}
> 2016-11-26 20:05:46.898873 mon.0 10.27.251.7:6789/0 43 : cluster [INF] pgmap v2410586: 768 pgs: 312 peering, 456 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail
> 2016-11-26 20:05:46.898930 mon.0 10.27.251.7:6789/0 44 : cluster [INF] mdsmap e1: 0/0/0 up
> 2016-11-26 20:05:46.899022 mon.0 10.27.251.7:6789/0 45 : cluster [INF] osdmap e103: 6 osds: 6 up, 6 in
> 2016-11-26 20:06:25.875860 mon.0 10.27.251.7:6789/0 46 : cluster [INF] pgmap v2410587: 768 pgs: 312 peering, 456 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail
> 2016-11-26 20:06:26.902246 mon.0 10.27.251.7:6789/0 47 : cluster [INF] pgmap v2410588: 768 pgs: 312 peering, 456 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail
> 2016-11-26 20:06:25.708241 osd.1 10.27.251.7:6800/2793 7 : cluster [WRN] 1 slow requests, 1 included below; oldest blocked for > 240.583204 secs
> 2016-11-26 20:06:25.708251 osd.1 10.27.251.7:6800/2793 8 : cluster [WRN] slow request 240.583204 seconds old, received at 2016-11-26 20:02:25.124993: osd_op(client.7854102.1:1 rbd_id.vm-102-disk-1 [call rbd.get_id] 3.197c044b RETRY=1 retry+read e103) currently reached_pg
> 2016-11-26 20:06:34.798235 osd.0 10.27.251.7:6804/3291 7 : cluster [WRN] 1 slow requests, 1 included below; oldest blocked for > 240.218041 secs
> 2016-11-26 20:06:34.798247 osd.0 10.27.251.7:6804/3291 8 : cluster [WRN] slow request 240.218041 seconds old, received at 2016-11-26 20:02:34.580154: osd_op(client.7849892.0:1 vm-108-disk-1.rbd [stat] 1.e494866a RETRY=2 ack+retry+read+known_if_redirected e103) currently reached_pg
> 2016-11-26 20:07:20.410986 mon.3 10.27.251.12:6789/0 7 : cluster [INF] mon.3 calling new monitor election
> 2016-11-26 20:07:20.414159 mon.2 10.27.251.11:6789/0 10 : cluster [INF] mon.2 calling new monitor election
> 2016-11-26 20:07:20.421808 mon.0 10.27.251.7:6789/0 48 : cluster [INF] mon.0 calling new monitor election
> 2016-11-26 20:07:20.448582 mon.0 10.27.251.7:6789/0 49 : cluster [INF] mon.0 at 0 won leader election with quorum 0,1,2,3
> 2016-11-26 20:07:20.459304 mon.0 10.27.251.7:6789/0 50 : cluster [INF] HEALTH_WARN; 312 pgs peering; 312 pgs stuck inactive; 312 pgs stuck unclean; 2 requests are blocked > 32 sec; nodown,noout flag(s) set
> 2016-11-26 20:07:20.465502 mon.0 10.27.251.7:6789/0 51 : cluster [INF] monmap e4: 4 mons at {0=10.27.251.7:6789/0,1=10.27.251.8:6789/0,2=10.27.251.11:6789/0,3=10.27.251.12:6789/0}
> 2016-11-26 20:07:20.465571 mon.0 10.27.251.7:6789/0 52 : cluster [INF] pgmap v2410588: 768 pgs: 312 peering, 456 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail
> 2016-11-26 20:07:20.465650 mon.0 10.27.251.7:6789/0 53 : cluster [INF] mdsmap e1: 0/0/0 up
> 2016-11-26 20:07:20.465750 mon.0 10.27.251.7:6789/0 54 : cluster [INF] osdmap e103: 6 osds: 6 up, 6 in
> 2016-11-26 20:07:20.465934 mon.0 10.27.251.7:6789/0 55 : cluster [WRN] mon.1 10.27.251.8:6789/0 clock skew 0.10054s > max 0.05s
> 2016-11-26 20:07:20.478961 mon.0 10.27.251.7:6789/0 56 : cluster [WRN] message from mon.1 was stamped 0.109909s in the future, clocks not synchronized
> 2016-11-26 20:07:20.522400 mon.1 10.27.251.8:6789/0 1 : cluster [INF] mon.1 calling new monitor election
> 2016-11-26 20:07:20.541271 mon.1 10.27.251.8:6789/0 2 : cluster [INF] mon.1 calling new monitor election
> 2016-11-26 20:07:32.641565 mon.0 10.27.251.7:6789/0 61 : cluster [INF] osdmap e104: 6 osds: 5 up, 6 in
> 2016-11-26 20:07:32.665552 mon.0 10.27.251.7:6789/0 62 : cluster [INF] pgmap v2410589: 768 pgs: 72 stale+active+clean, 312 peering, 384 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail
> 2016-11-26 20:07:33.658567 mon.0 10.27.251.7:6789/0 63 : cluster [INF] osd.5 10.27.251.8:6812/4116 boot
> 2016-11-26 20:07:33.676112 mon.0 10.27.251.7:6789/0 64 : cluster [INF] osdmap e105: 6 osds: 6 up, 6 in
> 2016-11-26 20:07:33.726565 mon.0 10.27.251.7:6789/0 65 : cluster [INF] pgmap v2410590: 768 pgs: 72 stale+active+clean, 312 peering, 384 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail
> 2016-11-26 20:07:34.722585 mon.0 10.27.251.7:6789/0 66 : cluster [INF] osdmap e106: 6 osds: 5 up, 6 in
> 2016-11-26 20:07:34.785966 mon.0 10.27.251.7:6789/0 67 : cluster [INF] pgmap v2410591: 768 pgs: 160 stale+active+clean, 312 peering, 296 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail
> 2016-11-26 20:07:35.737328 mon.0 10.27.251.7:6789/0 68 : cluster [INF] osd.4 10.27.251.8:6804/3430 boot
> 2016-11-26 20:07:35.757111 mon.0 10.27.251.7:6789/0 69 : cluster [INF] osdmap e107: 6 osds: 6 up, 6 in
> 2016-11-26 20:07:35.794812 mon.0 10.27.251.7:6789/0 70 : cluster [INF] pgmap v2410592: 768 pgs: 160 stale+active+clean, 312 peering, 296 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail
> 2016-11-26 20:07:36.797846 mon.0 10.27.251.7:6789/0 71 : cluster [INF] osdmap e108: 6 osds: 6 up, 6 in
> 2016-11-26 20:07:36.842861 mon.0 10.27.251.7:6789/0 72 : cluster [INF] pgmap v2410593: 768 pgs: 160 stale+active+clean, 312 peering, 296 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail
> 2016-11-26 20:07:38.854149 mon.0 10.27.251.7:6789/0 73 : cluster [INF] pgmap v2410594: 768 pgs: 88 stale+active+clean, 312 peering, 368 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail; 1992 kB/s rd, 683 kB/s wr, 117 op/s
> 2016-11-26 20:07:39.923063 mon.0 10.27.251.7:6789/0 74 : cluster [INF] pgmap v2410595: 768 pgs: 312 peering, 456 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail; 1466 kB/s wr, 257 op/s
> 2016-11-26 20:07:41.012515 mon.0 10.27.251.7:6789/0 75 : cluster [INF] osdmap e109: 6 osds: 5 up, 6 in
> 2016-11-26 20:07:41.039741 mon.0 10.27.251.7:6789/0 76 : cluster [INF] pgmap v2410596: 768 pgs: 142 stale+active+clean, 312 peering, 314 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail; 1110 kB/s wr, 211 op/s
> 2016-11-26 20:07:38.817104 osd.0 10.27.251.7:6804/3291 9 : cluster [INF] 1.b7 scrub starts
> 2016-11-26 20:07:41.429461 osd.0 10.27.251.7:6804/3291 10 : cluster [INF] 1.b7 scrub ok
> 2016-11-26 20:07:42.043092 mon.0 10.27.251.7:6789/0 77 : cluster [INF] osd.2 10.27.251.8:6800/3073 boot
> 2016-11-26 20:07:42.074005 mon.0 10.27.251.7:6789/0 78 : cluster [INF] osdmap e110: 6 osds: 5 up, 6 in
> 2016-11-26 20:07:42.150211 mon.0 10.27.251.7:6789/0 79 : cluster [INF] pgmap v2410597: 768 pgs: 296 stale+active+clean, 223 peering, 248 active+clean, 1 active+clean+scrubbing; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail; 940 B/s rd, 1 op/s
> 2016-11-26 20:07:43.084122 mon.0 10.27.251.7:6789/0 80 : cluster [INF] osd.3 10.27.251.8:6808/3714 boot
> 2016-11-26 20:07:43.104296 mon.0 10.27.251.7:6789/0 81 : cluster [INF] osdmap e111: 6 osds: 6 up, 6 in
> 2016-11-26 20:07:35.733073 osd.1 10.27.251.7:6800/2793 9 : cluster [INF] 3.37 scrub starts
> 2016-11-26 20:07:35.841829 osd.1 10.27.251.7:6800/2793 10 : cluster [INF] 3.37 scrub ok
> 2016-11-26 20:07:36.733564 osd.1 10.27.251.7:6800/2793 11 : cluster [INF] 3.7c scrub starts
> 2016-11-26 20:07:36.852120 osd.1 10.27.251.7:6800/2793 12 : cluster [INF] 3.7c scrub ok
> 2016-11-26 20:07:41.764388 osd.1 10.27.251.7:6800/2793 13 : cluster [INF] 3.fc scrub starts
> 2016-11-26 20:07:41.830597 osd.1 10.27.251.7:6800/2793 14 : cluster [INF] 3.fc scrub ok
> 2016-11-26 20:07:42.736376 osd.1 10.27.251.7:6800/2793 15 : cluster [INF] 4.9 scrub starts
> 2016-11-26 20:07:43.149808 mon.0 10.27.251.7:6789/0 82 : cluster [INF] pgmap v2410598: 768 pgs: 296 stale+active+clean, 223 peering, 248 active+clean, 1 active+clean+scrubbing; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail; 982 B/s rd, 1 op/s
> 2016-11-26 20:07:44.135066 mon.0 10.27.251.7:6789/0 83 : cluster [INF] osdmap e112: 6 osds: 6 up, 6 in
> 2016-11-26 20:07:44.178743 mon.0 10.27.251.7:6789/0 84 : cluster [INF] pgmap v2410599: 768 pgs: 296 stale+active+clean, 223 peering, 248 active+clean, 1 active+clean+scrubbing; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail
> 2016-11-26 20:07:46.774607 mon.0 10.27.251.7:6789/0 85 : cluster [INF] pgmap v2410600: 768 pgs: 154 stale+active+clean, 223 peering, 390 active+clean, 1 active+clean+scrubbing; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail; 2157 kB/s wr, 466 op/s
> 2016-11-26 20:07:47.846499 mon.0 10.27.251.7:6789/0 86 : cluster [INF] pgmap v2410601: 768 pgs: 223 peering, 544 active+clean, 1 active+clean+scrubbing; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail; 4603 kB/s wr, 748 op/s
> 2016-11-26 20:07:48.919366 mon.0 10.27.251.7:6789/0 87 : cluster [INF] pgmap v2410602: 768 pgs: 99 peering, 667 active+clean, 2 active+clean+scrubbing; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail; 4235 kB/s wr, 495 op/s
> 2016-11-26 20:07:49.986068 mon.0 10.27.251.7:6789/0 88 : cluster [INF] pgmap v2410603: 768 pgs: 767 active+clean, 1 active+clean+scrubbing; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail; 1607 kB/s rd, 30552 B/s wr, 127 op/s
> 2016-11-26 20:07:50.468852 mon.0 10.27.251.7:6789/0 89 : cluster [WRN] mon.1 10.27.251.8:6789/0 clock skew 0.105319s > max 0.05s
> 2016-11-26 20:07:43.076810 osd.0 10.27.251.7:6804/3291 11 : cluster [INF] 1.17 scrub starts
> 2016-11-26 20:07:45.709439 osd.0 10.27.251.7:6804/3291 12 : cluster [INF] 1.17 scrub ok
> 2016-11-26 20:07:52.746601 mon.0 10.27.251.7:6789/0 90 : cluster [INF] pgmap v2410604: 768 pgs: 767 active+clean, 1 active+clean+scrubbing; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail; 628 kB/s rd, 25525 B/s wr, 139 op/s
> [...]
> 2016-11-26 20:08:03.325584 mon.0 10.27.251.7:6789/0 98 : cluster [INF] pgmap v2410612: 768 pgs: 767 active+clean, 1 active+clean+scrubbing; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail; 387 kB/s rd, 61530 B/s wr, 90 op/s
> 2016-11-26 20:08:03.523958 osd.1 10.27.251.7:6800/2793 16 : cluster [INF] 4.9 scrub ok
> 2016-11-26 20:08:04.398784 mon.0 10.27.251.7:6789/0 99 : cluster [INF] pgmap v2410613: 768 pgs: 767 active+clean, 1 active+clean+scrubbing; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail; 2975 kB/s rd, 401 kB/s wr, 419 op/s
> [...]
> 2016-11-26 20:08:20.340826 mon.0 10.27.251.7:6789/0 112 : cluster [INF] pgmap v2410626: 768 pgs: 768 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail; 384 kB/s rd, 95507 B/s wr, 31 op/s
> 2016-11-26 20:08:20.458392 mon.0 10.27.251.7:6789/0 113 : cluster [INF] HEALTH_WARN; clock skew detected on mon.1; nodown,noout flag(s) set; Monitor clock skew detected 
> 2016-11-26 20:08:22.429360 mon.0 10.27.251.7:6789/0 114 : cluster [INF] pgmap v2410627: 768 pgs: 768 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail; 256 kB/s rd, 65682 B/s wr, 18 op/s
> [...]
> 2016-11-26 20:09:19.885573 mon.0 10.27.251.7:6789/0 160 : cluster [INF] pgmap v2410671: 768 pgs: 768 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail; 33496 kB/s rd, 3219 kB/s wr, 317 op/s
> 2016-11-26 20:09:20.458837 mon.0 10.27.251.7:6789/0 161 : cluster [INF] HEALTH_WARN; nodown,noout flag(s) set
> 2016-11-26 20:09:20.921396 mon.0 10.27.251.7:6789/0 162 : cluster [INF] pgmap v2410672: 768 pgs: 768 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail; 10498 kB/s rd, 970 kB/s wr, 46 op/s
> [...]
> 2016-11-26 20:09:40.156783 mon.0 10.27.251.7:6789/0 178 : cluster [INF] pgmap v2410688: 768 pgs: 768 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail; 16202 kB/s rd, 586 kB/s wr, 64 op/s
> 2016-11-26 20:09:41.231992 mon.0 10.27.251.7:6789/0 181 : cluster [INF] osdmap e113: 6 osds: 6 up, 6 in
> 2016-11-26 20:09:41.260099 mon.0 10.27.251.7:6789/0 182 : cluster [INF] pgmap v2410689: 768 pgs: 768 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail; 13734 kB/s rd, 561 kB/s wr, 58 op/s
> [...]
> 2016-11-26 20:09:46.764432 mon.0 10.27.251.7:6789/0 187 : cluster [INF] pgmap v2410693: 768 pgs: 768 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail; 4388 kB/s rd, 97979 B/s wr, 18 op/s
> 2016-11-26 20:09:46.764614 mon.0 10.27.251.7:6789/0 189 : cluster [INF] osdmap e114: 6 osds: 6 up, 6 in
> 2016-11-26 20:09:46.793173 mon.0 10.27.251.7:6789/0 190 : cluster [INF] pgmap v2410694: 768 pgs: 768 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail; 1709 kB/s rd, 75202 B/s wr, 4 op/s
> [...]
> 2016-11-26 20:10:19.919396 mon.0 10.27.251.7:6789/0 216 : cluster [INF] pgmap v2410719: 768 pgs: 768 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail; 404 kB/s wr, 4 op/s
> 2016-11-26 20:10:20.459279 mon.0 10.27.251.7:6789/0 217 : cluster [INF] HEALTH_OK
> 
> 
> Other things to note. In the syslog (not the ceph log) of mon.0 i've found, for
> the first (failed) boot:
> 
> Nov 26 18:05:43 capitanamerica ceph[1714]: === mon.0 ===
> Nov 26 18:05:43 capitanamerica ceph[1714]: Starting Ceph mon.0 on capitanamerica...
> Nov 26 18:05:43 capitanamerica systemd[1]: Starting /bin/bash -c ulimit -n 32768;  /usr/bin/ceph-mon -i 0 --pid-file /var/run/ceph/mon.0.pid -c /etc/ceph/ceph.conf --cluster ceph -f...
> Nov 26 18:05:43 capitanamerica systemd[1]: Started /bin/bash -c ulimit -n 32768;  /usr/bin/ceph-mon -i 0 --pid-file /var/run/ceph/mon.0.pid -c /etc/ceph/ceph.conf --cluster ceph -f.
> Nov 26 18:05:43 capitanamerica ceph[1714]: Running as unit ceph-mon.0.1480179943.905192147.service.
> Nov 26 18:05:43 capitanamerica ceph[1714]: Starting ceph-create-keys on capitanamerica...
> Nov 26 18:05:44 capitanamerica ceph[1714]: === osd.1 ===
> Nov 26 18:05:44 capitanamerica ceph[1714]: 2016-11-26 18:05:44.939844 7f7f2478c700  0 -- :/2046852810 >> 10.27.251.7:6789/0 pipe(0x7f7f20061550 sd=4 :0 s=1 pgs=0 cs=0 l=1 c=0x7f7f2005a990).fault
> Nov 26 18:05:46 capitanamerica bash[1874]: starting mon.0 rank 0 at 10.27.251.7:6789/0 mon_data /var/lib/ceph/mon/ceph-0 fsid 8794c124-c2ec-4e81-8631-742992159bd6
> Nov 26 18:05:52 capitanamerica ceph[1714]: 2016-11-26 18:05:52.234086 7f7f2478c700  0 -- 10.27.251.7:0/2046852810 >> 10.27.251.11:6789/0 pipe(0x7f7f14006e20 sd=4 :0 s=1 pgs=0 cs=0 l=1 c=0x7f7f1400b0c0).fault
> Nov 26 18:05:58 capitanamerica ceph[1714]: 2016-11-26 18:05:58.234163 7f7f2458a700  0 -- 10.27.251.7:0/2046852810 >> 10.27.251.12:6789/0 pipe(0x7f7f14006e20 sd=4 :0 s=1 pgs=0 cs=0 l=1 c=0x7f7f1400d240).fault
> Nov 26 18:06:04 capitanamerica ceph[1714]: 2016-11-26 18:06:04.234037 7f7f2468b700  0 -- 10.27.251.7:0/2046852810 >> 10.27.251.11:6789/0 pipe(0x7f7f14006e20 sd=4 :0 s=1 pgs=0 cs=0 l=1 c=0x7f7f1400d310).fault
> Nov 26 18:06:14 capitanamerica ceph[1714]: failed: 'timeout 30 /usr/bin/ceph -c /etc/ceph/ceph.conf --name=osd.1 --keyring=/var/lib/ceph/osd/ceph-1/keyring osd crush create-or-move -- 1 1.82 host=capitanamerica root=default'
> Nov 26 18:06:14 capitanamerica ceph[1714]: ceph-disk: Error: ceph osd start failed: Command '['/usr/sbin/service', 'ceph', '--cluster', 'ceph', 'start', 'osd.1']' returned non-zero exit status 1
> Nov 26 18:06:15 capitanamerica ceph[1714]: === osd.0 ===
> Nov 26 18:06:22 capitanamerica ceph[1714]: 2016-11-26 18:06:22.238039 7f8bb46b2700  0 -- 10.27.251.7:0/3291965862 >> 10.27.251.11:6789/0 pipe(0x7f8ba0006e20 sd=4 :0 s=1 pgs=0 cs=0 l=1 c=0x7f8ba000b0c0).fault
> Nov 26 18:06:28 capitanamerica ceph[1714]: 2016-11-26 18:06:28.241918 7f8bb44b0700  0 -- 10.27.251.7:0/3291965862 >> 10.27.251.12:6789/0 pipe(0x7f8ba0006e20 sd=4 :0 s=1 pgs=0 cs=0 l=1 c=0x7f8ba000d240).fault
> Nov 26 18:06:34 capitanamerica ceph[1714]: 2016-11-26 18:06:34.242060 7f8bb45b1700  0 -- 10.27.251.7:0/3291965862 >> 10.27.251.12:6789/0 pipe(0x7f8ba0006e20 sd=4 :0 s=1 pgs=0 cs=0 l=1 c=0x7f8ba000d310).fault
> Nov 26 18:06:38 capitanamerica ceph[1714]: 2016-11-26 18:06:38.242035 7f8bb44b0700  0 -- 10.27.251.7:0/3291965862 >> 10.27.251.11:6789/0 pipe(0x7f8ba0000c00 sd=6 :0 s=1 pgs=0 cs=0 l=1 c=0x7f8ba000de50).fault
> Nov 26 18:06:44 capitanamerica ceph[1714]: 2016-11-26 18:06:44.242157 7f8bb46b2700  0 -- 10.27.251.7:0/3291965862 >> 10.27.251.11:6789/0 pipe(0x7f8ba0000c00 sd=4 :0 s=1 pgs=0 cs=0 l=1 c=0x7f8ba000e0d0).fault
> Nov 26 18:06:45 capitanamerica ceph[1714]: failed: 'timeout 30 /usr/bin/ceph -c /etc/ceph/ceph.conf --name=osd.0 --keyring=/var/lib/ceph/osd/ceph-0/keyring osd crush create-or-move -- 0 1.82 host=capitanamerica root=default'
> Nov 26 18:06:45 capitanamerica ceph[1714]: ceph-disk: Error: ceph osd start failed: Command '['/usr/sbin/service', 'ceph', '--cluster', 'ceph', 'start', 'osd.0']' returned non-zero exit status 1
> Nov 26 18:06:45 capitanamerica ceph[1714]: ceph-disk: Error: One or more partitions failed to activate
> 
> And for the second (working):
> 
> Nov 26 20:01:49 capitanamerica ceph[1716]: === mon.0 ===
> Nov 26 20:01:49 capitanamerica ceph[1716]: Starting Ceph mon.0 on capitanamerica...
> Nov 26 20:01:49 capitanamerica systemd[1]: Starting /bin/bash -c ulimit -n 32768;  /usr/bin/ceph-mon -i 0 --pid-file /var/run/ceph/mon.0.pid -c /etc/ceph/ceph.conf --cluster ceph -f...
> Nov 26 20:01:49 capitanamerica systemd[1]: Started /bin/bash -c ulimit -n 32768;  /usr/bin/ceph-mon -i 0 --pid-file /var/run/ceph/mon.0.pid -c /etc/ceph/ceph.conf --cluster ceph -f.
> Nov 26 20:01:49 capitanamerica ceph[1716]: Running as unit ceph-mon.0.1480186909.457328760.service.
> Nov 26 20:01:49 capitanamerica ceph[1716]: Starting ceph-create-keys on capitanamerica...
> Nov 26 20:01:49 capitanamerica bash[1900]: starting mon.0 rank 0 at 10.27.251.7:6789/0 mon_data /var/lib/ceph/mon/ceph-0 fsid 8794c124-c2ec-4e81-8631-742992159bd6
> Nov 26 20:01:50 capitanamerica ceph[1716]: === osd.1 ===
> Nov 26 20:01:50 capitanamerica ceph[1716]: create-or-move updated item name 'osd.1' weight 1.82 at location {host=capitanamerica,root=default} to crush map
> Nov 26 20:01:50 capitanamerica ceph[1716]: Starting Ceph osd.1 on capitanamerica...
> Nov 26 20:01:50 capitanamerica systemd[1]: Starting /bin/bash -c ulimit -n 32768;  /usr/bin/ceph-osd -i 1 --pid-file /var/run/ceph/osd.1.pid -c /etc/ceph/ceph.conf --cluster ceph -f...
> Nov 26 20:01:50 capitanamerica systemd[1]: Started /bin/bash -c ulimit -n 32768;  /usr/bin/ceph-osd -i 1 --pid-file /var/run/ceph/osd.1.pid -c /etc/ceph/ceph.conf --cluster ceph -f.
> Nov 26 20:01:50 capitanamerica ceph[1716]: Running as unit ceph-osd.1.1480186910.254183695.service.
> Nov 26 20:01:50 capitanamerica bash[2765]: starting osd.1 at :/0 osd_data /var/lib/ceph/osd/ceph-1 /var/lib/ceph/osd/ceph-1/journal
> Nov 26 20:01:50 capitanamerica ceph[1716]: === osd.0 ===
> Nov 26 20:01:51 capitanamerica ceph[1716]: create-or-move updated item name 'osd.0' weight 1.82 at location {host=capitanamerica,root=default} to crush map
> Nov 26 20:01:51 capitanamerica ceph[1716]: Starting Ceph osd.0 on capitanamerica...
> Nov 26 20:01:51 capitanamerica systemd[1]: Starting /bin/bash -c ulimit -n 32768;  /usr/bin/ceph-osd -i 0 --pid-file /var/run/ceph/osd.0.pid -c /etc/ceph/ceph.conf --cluster ceph -f...
> Nov 26 20:01:51 capitanamerica systemd[1]: Started /bin/bash -c ulimit -n 32768;  /usr/bin/ceph-osd -i 0 --pid-file /var/run/ceph/osd.0.pid -c /etc/ceph/ceph.conf --cluster ceph -f.
> Nov 26 20:01:51 capitanamerica ceph[1716]: Running as unit ceph-osd.0.1480186910.957564523.service.
> Nov 26 20:01:51 capitanamerica bash[3281]: starting osd.0 at :/0 osd_data /var/lib/ceph/osd/ceph-0 /var/lib/ceph/osd/ceph-0/journal
> 
> 
> So it seems to me that at the first start (some) OSDs fail to start. But,
> again, PVE and 'ceph status' report all OSDs as up & in.

What does the following command give you?

ceph osd pool get <POOLNAME> min_size
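
If you want to see it for all pools at once, something along these lines should work as well (the grep pattern is just an example):

ceph osd lspools
ceph osd dump | grep "min_size"

For background: with min_size 2, a PG stops serving I/O as soon as fewer than two copies are available.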

> 
> 
> Thanks.
> 

As a general thought, an HA cluster is meant to be running at all times, so shutting down all nodes at once is an
unlikely event. And if you do decide to shut down all nodes, then a couple of minutes should be enough to get everything
running again; Ceph needs its time until all services are up.
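
Since your logs show the nodown/noout flags set and HEALTH_OK only after they were cleared again, a rough sketch of such a full shutdown/startup would be (assuming you set the flags yourself before powering off):

ceph osd set noout
ceph osd set nodown
# ... shut everything down, do the maintenance, power the nodes back on ...
ceph osd unset nodown
ceph osd unset noout
ceph -s   # wait for HEALTH_OK before starting the VMs again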

-- 
Cheers,
Alwin


