[PVE-User] Ceph OSD failure questions

Eneko Lacunza elacunza at binovo.es
Mon Jun 5 09:25:36 CEST 2023


Hi Stefan,

On 3/6/23 at 13:47, Stefan Radman via pve-user wrote:
> I want to create a Proxmox VE HCI cluster on 3 old but identical DL380 Gen9 hosts (128GB RAM, dual CPU, 4x1GbE, 2x10GbE, 6x 1.2TB SFF 10K 12Gb SAS HDDs on a P440ar controller).
>
> Corosync will run over 2 x 1GbE, connected to separate VLANs on different switches.
> Ceph storage network will be a 10GbE routed mesh.
>
> The P440ar controller will be switched to HBA mode.
>
> I am planning to use 2 HDDs as redundant boot disks with ZFS (a waste, I know).
>
> The other 4 HDDs will be used as Ceph OSDs in a single HDD pool.
> Allowing for a single OSD failure, the HDD pool should provide ~3TB of usable capacity.
>
> With 2 SFF slots still available I am considering adding one or two SSDs to each host for a Ceph SSD pool to improve performance for some virtual disks.
>
> I am thinking of installing a single SSD in each host, since with two SSDs per host the failure of one of them would limit the usable capacity to 50% of the SSD pool: Ceph would immediately try to re-create the 3rd replica on the still-working SSD of the same node (from what I have read so far).
> A second SSD would thus not buy me any further usable capacity (I cannot create a pool of 4 SSDs because there are no more slots available).
> Is that correct?

Yes, you got it right, if you're planning to use replicated pools with 
size=3/min_size=2 as recommended. Alternatively, you can consider using 
a second SSD in each node to provide fast WAL/DB space for the HDD OSDs.
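
A minimal sketch of both options; /dev/sdX, /dev/sdY and the pool/rule 
names are placeholders, and the pveceph option names may vary per 
version, so check the manpage:

  # Use part of the SSD as fast RocksDB/WAL space for an HDD OSD:
  pveceph osd create /dev/sdX --db_dev /dev/sdY --db_size 60

  # Or keep the SSDs as their own pool: a CRUSH rule restricted to
  # the "ssd" device class, then a pool that uses it:
  ceph osd crush rule create-replicated ssd-only default host ssd
  ceph osd pool create ssd-pool 32 32 replicated ssd-only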

>
> With a single SSD in each host, if that SSD fails, how would VMs on that same host behave?

VMs don't know anything about what happens to the OSDs local to their 
host; to them, a local OSD is just like any other (remote) Ceph OSD.
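
You can see this yourself: CRUSH places the replicas of any RBD object 
on OSDs spread across hosts, regardless of where the VM runs (pool and 
object names below are just examples):

  # Show which OSDs hold the replicas of a given RBD object:
  ceph osd map ssd-pool rbd_data.abc123.0000000000000000
  # -> ... up ([2,5,8], p2) acting ([2,5,8], p2)
  # With failure domain "host", OSDs 2, 5 and 8 sit on three
  # different hosts.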

> Are they going to continue to run happily or is I/O to their virtual disks going to stop until the SSD OSD is replaced?

This depends on the min_size value of the replicated pool. The 
recommended value of 2 will keep all your VMs working with one SSD OSD 
down. If it were 3, then all VMs in the cluster would stop their I/O 
to that pool.
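
You can check (and, if needed, fix) those values per pool; the pool 
name is a placeholder:

  ceph osd pool get ssd-pool size      # replica count, should be 3
  ceph osd pool get ssd-pool min_size  # replicas needed for I/O, 2
  ceph osd pool set ssd-pool min_size 2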

> If I/O to the SSD pool stops for all VMs running on the affected host, would HA fail them over to another host? (considering that 2 copies of the data exist on the other 2 hosts)

That won't happen (and wouldn't help at all: the pool is cluster-wide, 
so a VM would see the same blocked I/O from any host).

Cheers

Eneko Lacunza
Zuzendari teknikoa | Director técnico
Binovo IT Human Project

Tel. +34 943 569 206 | https://www.binovo.es
Astigarragako Bidea, 2 - 2º izda. Oficina 10-11, 20180 Oiartzun

https://www.youtube.com/user/CANALBINOVO
https://www.linkedin.com/company/37269706/



