[PVE-User] (Very) basic question regarding PVE Ceph integration

Mon Dec 17 11:52:26 CET 2018

Hi Alwin,

On 12/16/18 7:47 PM, Alwin Antreich wrote:
> On Sun, Dec 16, 2018 at 05:16:50PM +0100, Frank Thommen wrote:
>> Hi Alwin,
>>
>> On 16/12/18 15:39, Alwin Antreich wrote:
>>> Hello Frank,
>>>
>>> On Sun, Dec 16, 2018 at 02:28:19PM +0100, Frank Thommen wrote:
>>>> Hi,
>>>>
>>>> I understand that with the new PVE release PVE hosts (hypervisors) can be
>>>> used as Ceph servers.  But it's not clear to me if (or when) that makes
>>>> sense.  Do I really want to have Ceph MDS/OSD on the same hardware as my
>>>> hypervisors?  Doesn't that a) accumulate multiple points of failure on
>>>> the same hardware and b) occupy computing resources (CPU, RAM) that I'd
>>>> rather use for my VMs and containers?  Wouldn't I rather want to have a
>>>> separate Ceph cluster?
>>> The integration of Ceph services in PVE started with Proxmox VE 3.0.
>>> With PVE 5.3 (current) we added CephFS services to PVE. So you can
>>> run a hyper-converged Ceph with RBD/CephFS on the same servers as your
>>> VM/CT.
>>>
>>> a) Can you please be more specific about what you see as multiple points
>>> of failure?
>>
>> Not only do I run the hypervisor, which controls containers and virtual
>> machines, on the server, but also the file service that is used to store
>> the VM and container images.
> Sorry, I am still not quite sure what your question/concern is.
> Failure tolerance needs to be planned into the system design, irrespective
> of service distribution.
> 
> Proxmox VE has an HA stack that restarts all services from a failed node
> (if configured) on another node.
> https://pve.proxmox.com/pve-docs/chapter-ha-manager.html
> 
> Ceph does self-healing (if enough nodes are available) or otherwise
> still works in a degraded state.
> http://docs.ceph.com/docs/luminous/start/intro/

Yes, I am aware of the PVE and Ceph failover/healing capabilities.  But I
have always preferred to separate basic, central services at the hardware
level.  That way, if one server "explodes", only one service is affected.
With PVE+Ceph on one node, such an outage would affect two basic
services at once.  I am not saying they wouldn't continue to run
productively, but they would run in a degraded and non-failure-safe mode
(assuming we had three such nodes in the cluster) until the broken node
can be restored.
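
To make the three-node case concrete, here is a rough back-of-the-envelope
sketch in Python.  It is not taken from any PVE or Ceph tool; the function
names are made up for illustration.  It assumes one monitor and at most one
replica per node (host as failure domain), a replicated pool with size=3 and
min_size=2, and that quorum needs a strict majority of the configured nodes.
The replica count it reports is the state after any possible recovery has
finished.

```python
def majority(n: int) -> int:
    """Smallest strict majority of n voters."""
    return n // 2 + 1


def after_failures(nodes: int, failed: int,
                   pool_size: int = 3, min_size: int = 2) -> dict:
    """Rough cluster state once `failed` of `nodes` hosts are down.

    Assumptions (illustrative only): one monitor per node, one replica
    per host at most, and recovery onto surviving hosts has completed.
    """
    alive = nodes - failed
    # With host as failure domain there can be at most one replica per host.
    replicas = min(pool_size, alive)
    return {
        "quorum": alive >= majority(nodes),
        "io_possible": replicas >= min_size,
        "degraded": replicas < pool_size,
        # How many further node losses are survivable for both
        # quorum and I/O at the same time.
        "spare_failures": max(0, min(alive - majority(nodes),
                                     replicas - min_size)),
    }


if __name__ == "__main__":
    for nodes, failed in [(3, 0), (3, 1), (5, 1)]:
        print(nodes, failed, after_failures(nodes, failed))
```

Under these assumptions, three nodes with one down still have quorum and
I/O, but stay degraded with no room for a further failure, which is the
situation I describe above.  With five nodes, one failure leaves enough
hosts to heal back to three replicas.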

But that's probably just my old-fashioned conservative approach.  That's 
why I wanted to ask the list members for their assessment ;-)

> [...]

Cheers
frank