[PVE-User] Ceph and cold bootstrap...

Adam Thompson athompso at athompso.net
Fri Sep 16 14:02:56 CEST 2016

We've observed that if any node boots much faster or slower than the others, both Ceph and PVE run into serious trouble, particularly around quorum.
I've just finished switching a 9-node cluster to NFS because Ceph proved too unreliable after repeated crashes caused by power failures.
It turns out that powering off the last few nodes is hard: by that point they have lost quorum, and they hang during shutdown for longer than the UPSes last.
Unless you have redundant power (i.e. a generator), I'm not sure I would ever recommend a large PVE+Ceph cluster again.
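
To put rough numbers on the quorum problem, here is a minimal sketch. It assumes plain majority voting with one vote per node, which may not match how a given cluster is actually configured. The function names are made up for illustration and are not part of Ceph or PVE.

```python
def quorum_size(total_voters: int) -> int:
    """Smallest strict majority of total_voters (assumes one vote each)."""
    return total_voters // 2 + 1


def has_quorum(alive: int, total_voters: int) -> bool:
    """True if the voters still alive form a strict majority."""
    return alive >= quorum_size(total_voters)


def tolerated_failures(total_voters: int) -> int:
    """How many voters can be lost before quorum is gone."""
    return total_voters - quorum_size(total_voters)


if __name__ == "__main__":
    # Walk through an orderly shutdown of a 9-voter cluster, one node at a time.
    total = 9
    for alive in range(total, 0, -1):
        state = "quorate" if has_quorum(alive, total) else "NO quorum"
        print(f"{alive} of {total} nodes up: {state}")
```

Under these assumptions a 9-voter cluster needs 5 votes, so the last 4 nodes to go down are always without quorum. The same arithmetic shows why 2 monitors tolerate no failures at all, while 3 tolerate one.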

On September 16, 2016 4:36:41 AM CDT, Marco Gaiarin <gaio at sv.lnf.it> wrote:
>Hello! Fabian Grünbichler
>  wrote the other day...
>> two ceph nodes, two mons and two osds are all way too few for a
>> (production) ceph setup.
>I know, this is my 'test' ceph cluster as stated... ;-)
>> at least three nodes/mons (for quorum reasons),
>> and multiple osds per storage node (for performance and failure
>> reasons) are required.
>Production, as planned, will have 3 nodes/mons and 2 OSDs per node.
>I'm simply curious whether starting a ceph cluster from cold iron is a
>common failure condition, or whether it is a consequence of my small setup...
>dott. Marco Gaiarin				        GNUPG Key ID: 240A3D66
>Associazione ``La Nostra Famiglia''         
>Polo FVG   -   Via della Bontà, 7 - 33078   -   San Vito al Tagliamento
>marco.gaiarin(at)lanostrafamiglia.it   t +39-0434-842711   f
>    http://www.lanostrafamiglia.it/25/index.php/component/k2/item/123
>	(cf 00307430132, categoria ONLUS oppure RICERCA SANITARIA)

