[PVE-User] Ceph OSD failure questions

Stefan Radman stefan.radman at me.com
Mon Jun 5 10:42:19 CEST 2023


Hi Eneko

Thank you for the answers.

> use replicated pools with size=3 min=2 as recommended

Yes, I am planning to use the safe defaults indeed :)

I am not too concerned about HDD pool performance yet, but I will keep your WAL/DB suggestion in mind for the last remaining slot.

So it looks like I have a working design that will keep all VMs running even if the only SSD in a host fails, and that is what matters most.

Thanks 

Stefan

> On Jun 5, 2023, at 09:25, Eneko Lacunza via pve-user <pve-user at lists.proxmox.com> wrote:
> 
> From: Eneko Lacunza <elacunza at binovo.es>
> Subject: Re: [PVE-User] Ceph OSD failure questions
> Date: June 5, 2023 at 09:25:36 GMT+2
> To: pve-user at lists.proxmox.com
> 
> 
> Hi Stefan,
> 
> On 3 June 2023 at 13:47, Stefan Radman via pve-user wrote:
>> I want to create a Proxmox VE HCI cluster on 3 old but identical DL380 Gen9 hosts (128GB, Dual CPU, 4x1GbE, 2x10GbE, 6x1.2T SFF 10K 12Gb SAS HDD on P440ar controller).
>> 
>> Corosync will run over 2 x 1GbE, connected to separate VLANs on different switches.
>> Ceph storage network will be a 10GbE routed mesh.
>> 
>> The P440ar controller will be switched to HBA mode.
>> 
>> I am planning to use 2 HDDs as redundant boot disks with ZFS (a waste, I know).
>> 
>> The other 4 HDDs will be used as Ceph OSDs in a single HDD pool.
>> Allowing for a single OSD failure, the HDD pool should provide ~3TB of usable capacity.
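
The ~3TB figure can be reproduced with a rough back-of-the-envelope sketch. It assumes a replicated pool with size=3 and a per-host failure domain, so that each of the three hosts stores one full copy and the host with the fewest surviving OSDs bounds the pool. It also assumes the pool is kept below Ceph's default nearfull ratio of 0.85. The function name and the decimal-TB accounting are illustrative choices, not something stated in the thread.

```python
# Rough usable-capacity estimate for a size=3 replicated pool on three hosts.
# Assumption: failure domain is "host", so every host holds one full copy
# and the smallest host limits how much data the pool can store.
# Assumption: stay below the default nearfull ratio (0.85).

def usable_capacity_tb(surviving_osds_per_host, osd_size_tb, fill_ratio=0.85):
    """Return an estimate of usable pool capacity in TB."""
    smallest_host_tb = min(surviving_osds_per_host) * osd_size_tb
    return smallest_host_tb * fill_ratio

# Four 1.2 TB HDD OSDs per host, all healthy.
healthy = usable_capacity_tb([4, 4, 4], 1.2)
# One OSD has failed on the first host.
degraded = usable_capacity_tb([3, 4, 4], 1.2)

print(f"all OSDs healthy:    {healthy:.2f} TB")
print(f"one OSD failed:      {degraded:.2f} TB")
```

Under these assumptions the degraded case comes out at roughly 3 TB, in line with the estimate above.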
>> 
>> With 2 SFF slots still available I am considering adding one or two SSDs to each host for a Ceph SSD pool to improve performance for some virtual disks.
>> 
>> I am thinking of installing a single SSD in each host, because with two SSDs per host the failure of one of them would limit the usable capacity to 50% of the SSD pool: Ceph would immediately try to re-create the 3rd replica on the still-working SSD in the same node (from what I have read up to now).
>> A second SSD would thus not buy me any further usable capacity (I cannot create a pool of 4 SSDs because there are no more slots available).
>> Is that correct?
> 
> Yes, you got it right if you're planning to use replicated pools with size=3 min=2 as recommended. You can consider using a second SSD in each node to provide fast wal/db space for HDD OSDs.
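
The reasoning confirmed here can be sketched numerically. The example below assumes a size=3 pool with one replica per host, and a hypothetical SSD size of 1.0 TB (the thread does not name one). With two SSDs in a host, the surviving SSD must be able to absorb the data of the failed one, so only one SSD's worth of data per host can be stored safely. The function name is illustrative.

```python
# How much data per host can be stored so that one SSD failure is survivable?
# Assumption: size=3 across three hosts, so each host holds one full copy.
# Assumption: ssd_size_tb is a hypothetical value, not taken from the thread.

def safe_usable_tb(ssds_per_host, ssd_size_tb):
    """Capacity per host that still fits after losing one SSD in that host."""
    if ssds_per_host < 1:
        raise ValueError("need at least one SSD per host")
    if ssds_per_host == 1:
        # No local recovery target: the pool simply runs degraded (2 of 3
        # copies) until the SSD is replaced, so the full SSD can be used.
        return ssd_size_tb
    # Ceph re-creates the lost copy on the surviving SSDs of the same host,
    # so the data must fit on one device fewer.
    return (ssds_per_host - 1) * ssd_size_tb

for n in (1, 2):
    raw = n * 1.0
    safe = safe_usable_tb(n, 1.0)
    print(f"{n} SSD(s) per host: raw {raw:.1f} TB, safely usable {safe:.1f} TB")
```

In this simplified model one and two SSDs per host give the same safely usable capacity, which is why the second slot may be better spent on WAL/DB space for the HDD OSDs.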
> 
>> 
>> With a single SSD in each host, if that SSD fails, how would the VMs on that host behave?
> 
> VMs don't know what happens to the local OSDs in their host; to them, these are like any other (remote) Ceph OSD.
> 
>> Are they going to continue to run happily or is I/O to their virtual disks going to stop until the SSD OSD is replaced?
> 
> This depends on the min_size value of the replicated pool. The recommended value of 2 will keep all your VMs working. If it were 3, I/O would stop for all VMs in the cluster.
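
A minimal sketch of the rule described here: a placement group keeps serving I/O as long as the number of replicas still available is at least the pool's min_size. The function name is hypothetical, and the model ignores everything else that can affect PG state.

```python
# Simplified model of the min_size rule for a replicated pool.
# Assumption: one replica per host, and exactly one host has lost its
# only SSD OSD, so every PG in the SSD pool has size - 1 replicas left.

def pg_accepts_io(replicas_up, min_size):
    """A PG serves I/O only while enough replicas are available."""
    return replicas_up >= min_size

size = 3
replicas_up = size - 1  # one SSD OSD has failed

for min_size in (2, 3):
    state = "continues" if pg_accepts_io(replicas_up, min_size) else "blocks"
    print(f"size={size} min_size={min_size}: I/O {state}")
```

On a real cluster the current setting can be checked with `ceph osd pool get <pool> min_size`, where `<pool>` is the name of the pool in question.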
> 
>> If I/O to the SSD pool stops for all VMs running on the affected host, would HA fail them over to another host? (considering that 2 copies of the data exist on the other 2 hosts)
> 
> That won't happen (and wouldn't help at all).
> 
> Cheers
> 
> Eneko Lacunza
> Technical Director
> Binovo IT Human Project
> 
> Tel. +34 943 569 206 | https://www.binovo.es
> Astigarragako Bidea, 2 - 2º izda. Oficina 10-11, 20180 Oiartzun
> 
> https://www.youtube.com/user/CANALBINOVO
> https://www.linkedin.com/company/37269706/
> 