Advice request for new cluster
Eneko Lacunza
elacunza at
Thu Mar 27 10:22:27 CET 2025
Hi all,
We're in the process of designing and procuring a new PVE HCI cluster.
We will serve about 6,000 tenants, each running two Docker containers:
one with PostgreSQL and the other with a Python web application.
Containers will be hosted in VMs, 250 tenants per VM. VMs will have
512GB RAM and 16-32 vcores. 4 VMs per server.
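
For reference, a rough sketch of the sizing arithmetic implied by the
figures above. The variable names are only illustrative, and it assumes
tenants are spread evenly, with no headroom for host failure or
hypervisor overhead:

```python
import math

# Figures taken from the description above.
TENANTS = 6000
TENANTS_PER_VM = 250
CONTAINERS_PER_TENANT = 2
VM_RAM_GB = 512
VMS_PER_SERVER = 4

# Derived values (assumption: perfectly even distribution of tenants).
vms = math.ceil(TENANTS / TENANTS_PER_VM)
servers = math.ceil(vms / VMS_PER_SERVER)
vm_ram_per_server_gb = VMS_PER_SERVER * VM_RAM_GB
containers_per_server = VMS_PER_SERVER * TENANTS_PER_VM * CONTAINERS_PER_TENANT
ram_per_tenant_gb = VM_RAM_GB / TENANTS_PER_VM

print(f"VMs needed:              {vms}")
print(f"Servers needed:          {servers}")
print(f"VM RAM per server (GB):  {vm_ram_per_server_gb}")
print(f"Containers per server:   {containers_per_server}")
print(f"RAM per tenant (GB):     {ram_per_tenant_gb:.2f}")
```

This gives 24 VMs and about 2 GB of VM RAM per tenant (both containers
included).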
Based on current data, we expect to be memory-constrained; CPU,
networking and disk (Ceph) are not our main concerns.
We're looking to optimize our cost per GB of RAM, so our current plan is
to deploy 6 servers, each with 2.3 TB RAM, 2 EPYC CPU sockets (48 cores
per socket), 4x 25 Gbps network and 3x 7.68 TB disks for Ceph. The more
expensive alternative would be to deploy 12 single-socket EPYC servers
with 1.15 TB RAM each (+30% acquisition and about +40% running costs).
I have been reading reports about NUMA performance issues on the Proxmox
mailing lists and elsewhere, memory bandwidth issues for example.
Based on what I understood, and seeing that we'll have more than 2,000
not very demanding containers in each server, I think those NUMA issues
shouldn't be a problem in our use case.
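
As a rough sanity check on NUMA placement, here is a minimal sketch. It
assumes one NUMA node per socket with RAM split evenly between the two
sockets, 2.3 TB taken as 2300 GB, and the 4 VMs placed two per socket.
None of this is confirmed for the actual hardware, and the function name
is only illustrative:

```python
def fits_in_node(vm_ram_gb, vm_vcores, vms_per_node, node_ram_gb, node_cores):
    """Return (ram_ok, cores_ok) for VMs confined to a single NUMA node."""
    ram_ok = vm_ram_gb * vms_per_node <= node_ram_gb
    cores_ok = vm_vcores * vms_per_node <= node_cores
    return ram_ok, cores_ok

# Assumptions: 2 NUMA nodes per server, RAM and VMs split evenly.
SERVER_RAM_GB = 2300      # "2.3 TB" read as decimal GB (assumption)
SOCKETS = 2
CORES_PER_SOCKET = 48
VMS_PER_SERVER = 4
VM_RAM_GB = 512

node_ram_gb = SERVER_RAM_GB / SOCKETS
vms_per_node = VMS_PER_SERVER // SOCKETS

for vcores in (16, 24, 32):
    ram_ok, cores_ok = fits_in_node(
        VM_RAM_GB, vcores, vms_per_node, node_ram_gb, CORES_PER_SOCKET
    )
    print(f"{vcores} vcores/VM: RAM fits={ram_ok}, cores fit={cores_ok}")
```

Under these assumptions, two 512 GB VMs fit in one socket's memory, and
the vcores fit within one socket's physical cores up to 24 vcores per
VM; at 32 vcores per VM, the two VMs would exceed the socket's 48 cores.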
I'd be very grateful for any suggestions or comments about this NUMA
issue and our cluster design.
Eneko Lacunza
Zuzendari teknikoa | Director técnico
Binovo IT Human Project
Tel. +34 943 569 206
Astigarragako Bidea, 2 - 2º izda. Oficina 10-11, 20180 Oiartzun