[PVE-User] (Very) basic question regarding PVE Ceph integration
Frank Thommen
frank.thommen at uni-heidelberg.de
Mon Dec 17 13:20:50 CET 2018
On 12/17/18 9:23 AM, Eneko Lacunza wrote:
> Hi,
>
> On 16/12/18 at 17:16, Frank Thommen wrote:
>>>> I understand that with the new PVE release PVE hosts (hypervisors)
>>>> can be
>>>> used as Ceph servers. But it's not clear to me if (or when) that makes
>>>> sense. Do I really want to have Ceph MDS/OSD on the same hardware
>>>> as my
>>>> hypervisors? Doesn't that a) accumulate multiple POFs on the same
>>>> hardware
>>>> and b) occupy computing resources (CPU, RAM) that I'd rather use
>>>> for my VMs
>>>> and containers? Wouldn't I rather want to have a separate Ceph
>>>> cluster?
>>> The integration of Ceph services in PVE started with Proxmox VE 3.0.
>>> With PVE 5.3 (current) we added CephFS services to PVE. So you can
>>> run a hyper-converged Ceph with RBD/CephFS on the same servers as your
>>> VM/CT.
>>>
>>> a) can you please be more specific about what you see as multiple
>>> points of failure?
>>
>> Not only do I run the hypervisor that controls containers and virtual
>> machines on that server, but also the file service used to store the
>> VM and container images.
> I think you have fewer points of failure :-) because you'll have 3 points
> (nodes) of failure in a hyperconverged scenario and 6 in a separate
> virtualization/storage cluster scenario... it depends on how you look at it.
Right, but I look at it from the service side: one hardware failure ->
one service affected vs. one hardware failure -> two services affected.
>>> b) depends on the workload of your nodes. Modern server hardware has
>>> enough power to run multiple services. It all comes down to having
>>> enough resources for each domain (e.g. Ceph, KVM, CT, host).
>>>
>>> I recommend using a simple calculation to start with, just to get a
>>> rough direction.
>>>
>>> In principle:
>>>
>>> ==CPU==
>>> core='CPU with HT on'
>>>
>>> * reserve a core for each Ceph daemon
>>>   (preferably on the same NUMA node as the network card; higher
>>>   frequency is better)
>>> * one core for the network card (higher frequency = lower latency)
>>> * rest of the cores for OS (incl. monitoring, backup, ...), KVM/CT usage
>>> * don't overcommit
>>>
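To make sure I read the CPU rule of thumb correctly, here is a quick
back-of-envelope sketch (Python, with purely hypothetical numbers: a
16-core host running 4 OSDs, one MON and one MDS):

   # hypothetical per-node core budget, following the rules quoted above
   cores_total  = 16          # "cores" counted with HT on, as defined above
   ceph_daemons = 4 + 1 + 1   # 4 OSDs + 1 MON + 1 MDS -> one core each
   nic          = 1           # one core reserved for the network card
   os_overhead  = 1           # OS, monitoring, backup, ...

   cores_for_guests = cores_total - ceph_daemons - nic - os_overhead
   print(cores_for_guests)    # -> 8 cores left for KVM/CT, no overcommit
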
>>> ==Memory==
>>> * 1 GB per TB of used disk space on an OSD (more during recovery)
> Note this is not quite true anymore with Bluestore, because you have to
> take cache space into account (1 GB for HDD and 3 GB for SSD OSDs, if I
> recall correctly), and also OSD processes currently aren't that good at
> accounting for their RAM use... :)
>>> * enough memory for KVM/CT
>>> * free memory for OS, backup, monitoring, live migration
>>> * don't overcommit
>>>
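The same back-of-envelope idea for memory, again with assumed numbers
(a 64 GB node, 4 Bluestore OSDs on fully used 4 TB HDDs) and the cache
figures from Eneko's note above:

   # hypothetical per-node memory budget
   ram_total_gb       = 64
   osd_count          = 4
   used_tb_per_osd    = 4    # fully used 4 TB HDD per OSD
   gb_per_used_tb     = 1    # "1 GB per TB of used space" rule of thumb
   bluestore_cache_gb = 1    # ~1 GB per HDD OSD (~3 GB per SSD OSD)
   os_reserve_gb      = 4    # OS, backup, monitoring, live migration

   ceph_ram_gb    = osd_count * (used_tb_per_osd * gb_per_used_tb
                                 + bluestore_cache_gb)
   ram_for_guests = ram_total_gb - ceph_ram_gb - os_reserve_gb
   print(ram_for_guests)     # -> 40 GB left for KVM/CT, no overcommit
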
>>> ==Disk==
>>> * one OSD daemon per disk, even disk sizes throughout the cluster
>>> * more disks, more hosts, better distribution
>>>
>>> ==Network==
>>> * at least 10 GbE for storage traffic (the more the better),
>>> see our benchmark paper
>>> https://forum.proxmox.com/threads/proxmox-ve-ceph-benchmark-2018-02.41761/
>>>
> 10Gbit helps a lot with latency; small clusters can work perfectly with
> 2x1Gbit if they aren't latency-sensitive (we have been running a
> handful of those for some years now).
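To get a feeling for the latency argument, a naive calculation of the
pure wire serialization time for a single 4 MB RBD object (ignoring
protocol overhead and replication, so an illustration, not a benchmark):

   # wire serialization time for one 4 MB object at line rate
   obj_bits = 4 * 1024**2 * 8
   for name, gbit_per_s in (("1 GbE", 1), ("10 GbE", 10)):
       ms = obj_bits / (gbit_per_s * 1e9) * 1000
       print(name, round(ms, 1), "ms")   # ~33.6 ms vs ~3.4 ms

which already hints at why faster links help storage traffic, even
before replication traffic is taken into account.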
I will keep the two points in mind. Thank you.
frank