[PVE-User] Three node Hyperconverged PVE+Ceph and failure domains...

Eneko Lacunza elacunza at binovo.es
Wed Mar 10 13:35:16 CET 2021

Hi Storm,

On 10/3/21 at 13:28, storm wrote:
> when operating a 3-node cluster, you have to ensure that at least 2 
> nodes are up and operational.
> If you want the possibility for 2 nodes failing, you need to move to 
> the next odd number: 5 - you need at least a 5 node cluster if you 
> want to survive the loss of two nodes without problems.
> We have a 7 node cluster, so 3 nodes can fail, but we also have to 
> raise the Ceph pool 'size' to 4, because if three nodes fail there is 
> a high probability that placement groups will be unavailable because 
> they were replicated only to the three nodes which are down.
> btw - I think you should look at this hyperconverged solution as if it 
> were two different clusters, the proxmox cluster and the ceph cluster; 
> although it is "all in one node", you are operating two clusters with 
> different preconditions.
Whoa, I see you are really paranoid. What is the real chance of 3 nodes 
failing before Ceph has automatically recovered the replicas?
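
To put rough numbers on the two claims quoted above (majority quorum, and 
placement groups losing all their replicas), here is a minimal Python 
sketch. It assumes one replica per host and treats every combination of 
hosts as an equally likely placement, which is a simplification of what 
CRUSH really does. The function names are made up for illustration; 
nothing here talks to Ceph or corosync.

```python
from itertools import combinations


def max_node_failures_with_quorum(nodes):
    """Largest number of nodes that can fail while a strict majority stays up."""
    quorum = nodes // 2 + 1
    return nodes - quorum


def pg_exposure(hosts, size, min_size, failed):
    """Return (lost, inactive) as fractions of all possible replica placements.

    lost     -> no replica of the placement survives
    inactive -> fewer than min_size replicas survive
    Assumption: one replica per host, all host combinations equally likely.
    """
    down = set(range(failed))  # by symmetry it does not matter which hosts fail
    total = lost = inactive = 0
    for placement in combinations(range(hosts), size):
        surviving = sum(1 for host in placement if host not in down)
        total += 1
        if surviving == 0:
            lost += 1
        if surviving < min_size:
            inactive += 1
    return lost / total, inactive / total


if __name__ == "__main__":
    for n in (3, 5, 7):
        print(n, "nodes tolerate", max_node_failures_with_quorum(n), "failures")
    for size in (3, 4):
        print("size", size, pg_exposure(hosts=7, size=size, min_size=2, failed=3))
```

Under these assumptions, 3, 5 and 7 nodes tolerate 1, 2 and 3 failures 
respectively. With 7 hosts and 3 of them down, size=3 leaves 1 in 35 
possible placements without any surviving replica and 13 in 35 below 
min_size=2; size=4 leaves none without a replica and 4 in 35 below 
min_size=2. This only counts placements. It says nothing about how likely 
three nodes are to fail inside the recovery window, which is the question 
asked above.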


> best regards
> On 10/03/2021 at 11:47, Marco Gaiarin wrote:
>> One of the most interesting configurations of PVE is the three node,
>> switchless (full mesh) configuration, depicted in some PVE docs, most
>> notably:
>>     https://pve.proxmox.com/wiki/Full_Mesh_Network_for_Ceph_Server
>>     https://www.proxmox.com/en/downloads/item/proxmox-ve-ceph-benchmark-2020-09 
>> But lurking on the 'ceph-user' mailing list some weeks ago, I came
>> across an interesting discussion about 'failure domains', in which many
>> users described the three node cluster as 'insecure'.
>> The reasoning goes like this:
>> a) 'min_size = 2' is a must if you need to keep your data safe; you can
>>   set 'min_size = 1', but clearly there's no scrub/checksumming, so no
>> real guarantee against data corruption.
>> b) but in a three node setup, with 'min_size = 2', if a node goes down,
>>   the cluster switches to 'readonly' at the very next failure,
>> i.e. the cluster does not handle more than one failure.
>> c) you can change the failure domain, eg:
>>     mon osd down out subtree limit = osd
>>   but this way you have to guarantee (in the worst case) room for
>> double the data held on a single node (e.g., in a three node cluster
>> with 2TB of space each, to guarantee 'min_size = 2' you cannot use more
>> than 1TB of space on the overall cluster; so, 6TB of total disk space
>> for 1TB of usable space).
>> Am I wrong? If not, is the 3-node hyperconverged cluster suitable only
>> for testing?
>> Thanks.
> _______________________________________________
> pve-user mailing list
> pve-user at lists.proxmox.com
> https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-user
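
On Marco's points b) and c), the arithmetic can be sketched in a few lines 
of Python. This is only a back-of-the-envelope model: it assumes data 
spreads perfectly evenly over the surviving nodes once replicas may share 
a host, and the function names and the max_fill parameter are invented 
for the example.

```python
def pg_serves_io(size, min_size, replicas_lost):
    """True while at least min_size replicas of a placement group remain."""
    return size - replicas_lost >= min_size


def usable_healthy_tb(nodes, raw_per_node_tb, size, max_fill=1.0):
    """Logical capacity with all nodes up and `size` copies of everything."""
    return nodes * raw_per_node_tb * max_fill / size


def usable_after_node_loss_tb(nodes, raw_per_node_tb, size, max_fill=1.0):
    """Logical data that still fits with `size` copies after one node is lost.

    Assumption: replicas are allowed to share a host (failure domain = OSD)
    and the data is spread evenly over the surviving nodes.
    """
    surviving_raw = (nodes - 1) * raw_per_node_tb
    return surviving_raw * max_fill / size


if __name__ == "__main__":
    # Point b): size=3, min_size=2, one replica per node.
    print(pg_serves_io(size=3, min_size=2, replicas_lost=1))  # one node down
    print(pg_serves_io(size=3, min_size=2, replicas_lost=2))  # two nodes down
    # Point c): three nodes with 2 TB each.
    print(usable_healthy_tb(3, 2.0, 3))
    print(usable_after_node_loss_tb(3, 2.0, 3))
```

For three 2TB nodes and size=3 this gives 2TB of logical capacity while 
healthy, and about 1.33TB if all three copies must fit on the two 
remaining nodes. The 1TB figure in the quoted message is a more 
conservative worst-case bound; passing a max_fill below 1.0 (to stay 
clear of the full ratio) moves the estimate towards it. As far as I know, 
a PG that drops below min_size stops serving I/O altogether rather than 
going read-only.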



Technical Director

Binovo IT Human Project

Tel. 943 569 206
elacunza at binovo.es
binovo.es
Astigarragako Bidea, 2 - 2 izda. Oficina 10-11, 20180 Oiartzun

YouTube: https://www.youtube.com/user/CANALBINOVO/
LinkedIn: https://www.linkedin.com/company/37269706/
