Advice request for new cluster
Eneko Lacunza
elacunza at binovo.es
Thu Mar 27 10:22:27 CET 2025
Hi all,
We're in the process of designing and procuring a new PVE HCI cluster.
We will serve about 6,000 tenants, each running two Docker containers,
one with PostgreSQL and the other with a Python web application.
Containers will be hosted in VMs, 250 tenants per VM. Each VM will have
512 GB RAM and 16-32 vCores, with 4 VMs per server.
Based on current data, we expect to be memory constrained; CPU,
networking and disk (Ceph) are not our main concern.
We're looking to optimize our cost per GB of RAM, so our current plan is
to deploy 6 servers with 2.3 TB RAM and 2 EPYC CPU sockets, each with 48
cores, 4x 25 Gbps network and 3x 7.68 TB disks for Ceph. The more expensive
alternative would be to deploy 12 single-socket EPYC servers with 1.15 TB
RAM each (+30% acquisition and about +40% running costs).
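
The sizing above can be checked with a short Python sketch. It only
restates the figures given in this message. The function name
plan_cluster, the dictionary keys, and the reading of 2.3 TB as 2300 GB
are assumptions made for illustration, and hypervisor and Ceph memory
overhead are not modelled.

```python
import math

# Figures taken from the message above.
TENANTS = 6000
TENANTS_PER_VM = 250
CONTAINERS_PER_TENANT = 2
VM_RAM_GB = 512


def plan_cluster(server_ram_gb, vms_per_server):
    """Hypothetical helper: derive VM/server counts and RAM headroom."""
    vms = math.ceil(TENANTS / TENANTS_PER_VM)
    servers = math.ceil(vms / vms_per_server)
    vm_ram_per_server = vms_per_server * VM_RAM_GB
    return {
        "vms": vms,
        "servers": servers,
        "containers_per_server": vms_per_server * TENANTS_PER_VM * CONTAINERS_PER_TENANT,
        "ram_per_tenant_gb": VM_RAM_GB / TENANTS_PER_VM,
        "vm_ram_per_server_gb": vm_ram_per_server,
        # RAM left for the host, Ceph OSDs and everything else (not modelled).
        "headroom_gb": server_ram_gb - vm_ram_per_server,
    }


# Assumption: 2.3 TB is treated as 2300 GB and 1.15 TB as 1150 GB.
dual_socket = plan_cluster(server_ram_gb=2300, vms_per_server=4)
single_socket = plan_cluster(server_ram_gb=1150, vms_per_server=2)

for name, plan in (("6x dual-socket", dual_socket), ("12x single-socket", single_socket)):
    print(name, plan)
```

Under these assumptions both layouts need 24 VMs, and the per-server
headroom figure is what would have to cover the host OS and Ceph.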
I have been reading reports about NUMA performance issues on the Proxmox
mailing lists and elsewhere, memory bandwidth issues for example.
Based on what I understood, and seeing that we'll have more than 2,000
not very demanding containers in each server, I think those NUMA issues
shouldn't be a problem in our use case.
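
As a rough cross-check of the NUMA question, the following Python sketch
tests whether the planned VMs would fit inside a single NUMA node. It
assumes one NUMA node per socket with RAM split evenly between sockets,
which depends on the BIOS NUMA settings and is not stated in this
message. The function name numa_fit and its parameters are hypothetical.

```python
import math


def numa_fit(server_ram_gb, sockets, cores_per_socket, vms, vm_ram_gb, vm_vcores):
    """Hypothetical check: can each VM stay local to one NUMA node?

    Assumes one NUMA node per socket and RAM split evenly across sockets.
    """
    node_ram_gb = server_ram_gb / sockets
    vms_per_node = math.ceil(vms / sockets)
    return {
        "node_ram_gb": node_ram_gb,
        "vms_per_node": vms_per_node,
        # True if the VMs placed on one node fit in that node's local RAM.
        "ram_fits": vms_per_node * vm_ram_gb <= node_ram_gb,
        # True if their vCores fit on the node's physical cores without overcommit.
        "vcores_fit": vms_per_node * vm_vcores <= cores_per_socket,
    }


# Dual-socket plan from the message: 2.3 TB (taken as 2300 GB), 2x 48 cores, 4 VMs.
low = numa_fit(2300, sockets=2, cores_per_socket=48, vms=4, vm_ram_gb=512, vm_vcores=16)
high = numa_fit(2300, sockets=2, cores_per_socket=48, vms=4, vm_ram_gb=512, vm_vcores=32)

print("16 vCores per VM:", low)
print("32 vCores per VM:", high)
```

With these numbers two 512 GB VMs fit in the local RAM of one socket, but
at 32 vCores per VM two VMs would exceed the 48 physical cores of that
socket, so CPU overcommit or cross-node scheduling would come into play.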
I'd be very grateful for any suggestions or comments about this NUMA issue
and our cluster design.
Cheers
Eneko Lacunza
Zuzendari teknikoa | Director técnico
Binovo IT Human Project
Tel. +34 943 569 206 | https://www.binovo.es
Astigarragako Bidea, 2 - 2º izda. Oficina 10-11, 20180 Oiartzun
https://www.youtube.com/user/CANALBINOVO
https://www.linkedin.com/company/37269706/