Advice request for new cluster

Eneko Lacunza elacunza at binovo.es
Thu Mar 27 10:22:27 CET 2025


Hi all,

We're in the process of designing and procuring a new PVE HCI cluster.

We will serve about 6,000 tenants, each running two Docker containers, 
one with PostgreSQL and the other with a Python web application.

Containers will be hosted in VMs, 250 tenants per VM. Each VM will have 
512 GB RAM and 16-32 vcores, with 4 VMs per server.
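
As a rough sanity check of the figures above, the sizing can be worked 
out with a few lines of Python. This is only a sketch: the constant 
names are made up for illustration, and it assumes tenants are spread 
evenly across VMs and VMs evenly across servers.

```python
import math

# Figures taken from the message; names are illustrative only.
TENANTS = 6000
CONTAINERS_PER_TENANT = 2
TENANTS_PER_VM = 250
VM_RAM_GB = 512
VMS_PER_SERVER = 4

# Assumes an even spread of tenants over VMs and of VMs over servers.
vms = math.ceil(TENANTS / TENANTS_PER_VM)
servers = math.ceil(vms / VMS_PER_SERVER)
containers_per_server = VMS_PER_SERVER * TENANTS_PER_VM * CONTAINERS_PER_TENANT
ram_per_tenant_gb = VM_RAM_GB / TENANTS_PER_VM
vm_ram_per_server_gb = VMS_PER_SERVER * VM_RAM_GB

print(f"VMs needed:            {vms}")
print(f"Servers needed:        {servers}")
print(f"Containers per server: {containers_per_server}")
print(f"RAM per tenant:        {ram_per_tenant_gb:.2f} GB")
print(f"VM RAM per server:     {vm_ram_per_server_gb} GB")
```

The RAM-per-tenant figure covers both containers of a tenant plus that 
tenant's share of the guest OS, so the memory actually available to 
PostgreSQL and the web application is somewhat lower.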

Based on current data, we expect to be memory constrained; CPU, 
networking and disk (Ceph) not being our main concern.

We're looking to optimize our cost per GB of RAM, so our current plan is 
to deploy 6 servers, each with 2.3 TB RAM, 2 EPYC CPU sockets (48 cores 
per socket), 4x25 Gbps network and 3x 7.68 TB disks for Ceph. The more 
expensive alternative would be to deploy 12 single-socket EPYC servers 
with 1.15 TB RAM each (+30% acquisition and about +40% running costs).
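
One way to compare the two options beyond price is to count how many 
512 GB VMs still fit after losing a single server. The sketch below is 
only a back-of-the-envelope check under loud assumptions: 1 TB is 
treated as 1000 GB, VMs keep their full RAM allocation, and the RAM 
needed by the host and by Ceph is ignored. The function and dictionary 
names are hypothetical.

```python
# Figures from the message; 1 TB is treated as 1000 GB (an assumption).
VM_COUNT = 24      # 6,000 tenants / 250 tenants per VM
VM_RAM_GB = 512

OPTIONS = {
    "6x dual-socket, 2.3 TB": {"servers": 6, "ram_gb": 2300},
    "12x single-socket, 1.15 TB": {"servers": 12, "ram_gb": 1150},
}

def vm_slots_after_failure(servers, ram_gb):
    """Whole 512 GB VMs that fit on the surviving nodes after one failure.

    Ignores host and Ceph memory overhead (an assumption).
    """
    slots_per_server = ram_gb // VM_RAM_GB
    return (servers - 1) * slots_per_server

for name, opt in OPTIONS.items():
    total_gb = opt["servers"] * opt["ram_gb"]
    slots = vm_slots_after_failure(**opt)
    print(f"{name}: total RAM {total_gb} GB, "
          f"VM slots after one failure: {slots} of {VM_COUNT} needed")
```

Both options provide the same total RAM; what differs is how much of it 
is lost with each failed node, which matters if VMs are expected to be 
restarted elsewhere at full size.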

I have been reading reports about NUMA performance issues on the Proxmox 
mailing lists and elsewhere, memory bandwidth issues for example.

Based on what I have understood, and seeing that we'll have about 2,000 
not very demanding containers in each server (4 VMs x 250 tenants x 2 
containers), I think those NUMA issues shouldn't be a problem in our use 
case.
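
To make the NUMA question more concrete, the sketch below checks 
whether the planned VMs could each stay within one NUMA node on the 
dual-socket option. It rests on assumptions that are not stated in the 
plan: one NUMA node per socket, RAM split evenly between sockets, two 
VMs placed per socket, SMT threads not counted, and 1 TB treated as 
1000 GB. The function name is hypothetical.

```python
# Dual-socket option from the message; names are illustrative only.
SOCKETS = 2
CORES_PER_SOCKET = 48
SERVER_RAM_GB = 2300   # 2.3 TB, treating 1 TB as 1000 GB (assumption)
VMS_PER_SERVER = 4
VM_RAM_GB = 512

# Assumption: one NUMA node per socket, VMs and RAM split evenly.
vms_per_node = VMS_PER_SERVER // SOCKETS
ram_per_node_gb = SERVER_RAM_GB // SOCKETS

def fits_in_node(vcores_per_vm):
    """Return (ram_ok, cpu_ok) for the VMs sharing one NUMA node."""
    ram_ok = vms_per_node * VM_RAM_GB <= ram_per_node_gb
    # Physical cores only; SMT threads are deliberately not counted.
    cpu_ok = vms_per_node * vcores_per_vm <= CORES_PER_SOCKET
    return ram_ok, cpu_ok

for vcores in (16, 32):
    ram_ok, cpu_ok = fits_in_node(vcores)
    print(f"{vcores} vcores per VM: RAM fits in node: {ram_ok}, "
          f"vcores fit in physical cores: {cpu_ok}")
```

Under these assumptions the RAM of two VMs fits in one node, while two 
32-vcore VMs would need more vcores than the 48 physical cores of a 
socket, so that size would rely on SMT or on CPU overcommit.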

I'd be very grateful for any suggestions or comments about this NUMA 
issue and our cluster design.

Cheers

Eneko Lacunza
Zuzendari teknikoa | Director técnico
Binovo IT Human Project

Tel. +34 943 569 206 | https://www.binovo.es
Astigarragako Bidea, 2 - 2º izda. Oficina 10-11, 20180 Oiartzun

https://www.youtube.com/user/CANALBINOVO
https://www.linkedin.com/company/37269706/



