[PVE-User] DRS proxmox

Wed Apr 6 09:18:43 CEST 2016


On 04/06/2016 08:49 AM, Mohamed Sadok Ben Jazia wrote:
> Thank you Thomas,
> As I'm deploying a large number of CTs, this is the best algorithm, and as I
> mentioned above, I can't wait for the next version of PVE for live migration; also
Understandable.
> doing a backup/restore is not friendly since it takes time.
I would also be strongly _against_ backup/restore in this case.

cheers,
Thomas
> I'm going to commit this small method to GitHub and improve it later.
> Best regards
> On 6 Apr 2016 at 7:28 AM, "Thomas Lamprecht" <t.lamprecht at proxmox.com>
> wrote:
>
>> Hi,
>>
>> On 04/05/2016 12:03 PM, Mohamed Sadok Ben Jazia wrote:
>>> Thank you Thomas
>>> I'm going to describe my thoughts about the DRS based on the project I'm
>>> working on, where I was stuck at this step.
>>> Starting from many clusters in different subnets and locations, I want to
>>> create a large number of LXC containers for my clients.
>>> So for one cluster with many nodes and shared storage, it's a greedy
>>> best-match algorithm, and considering that LXC live migration is
>>> not yet available, this is what I'm doing:
>>>
>>> For each new container, or when resizing an old one, I loop over all
>>> available nodes in the cluster and pick the one that uses the most
>>> resources without reaching the maximum possible hardware resources, in
>>> order to fill nodes up.
>>> An optimization of this method is a silent migration when a container
>>> is rebooted or restarted, based on the same logic.
>> Ah okay, now I understand. This would be a "CT deployment tool" and
>> should definitely be more stable, as it avoids the problems I mentioned in
>> my email. It does not really change the cluster dynamically but rather only
>> at the checkpoints (create, stop CT); sounds quite cool.
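
[Editor's sketch] The placement rule described above (pick the fullest node that still has room, evaluated only at create/resize/restart checkpoints) could look roughly like the following. This is a minimal illustration, not code from the proxmox-DRS project: the `Node` class, its fields, and `pick_node` are hypothetical names, and it assumes only static values (CPU cores and max RAM) are compared.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    # Hypothetical static view of a cluster node (not a Proxmox API object).
    name: str
    total_cores: int
    total_ram_mb: int
    used_cores: int = 0
    used_ram_mb: int = 0

    def fits(self, cores: int, ram_mb: int) -> bool:
        # The container fits only if neither static limit is exceeded.
        return (self.used_cores + cores <= self.total_cores
                and self.used_ram_mb + ram_mb <= self.total_ram_mb)

    def fill_ratio(self) -> float:
        # Assumption: the tighter of the two resources defines how full a node is.
        return max(self.used_cores / self.total_cores,
                   self.used_ram_mb / self.total_ram_mb)


def pick_node(nodes: List[Node], cores: int, ram_mb: int) -> Optional[Node]:
    """Return the fullest node that can still hold the container, or None."""
    best = None
    for node in nodes:  # one pass over the nodes, so O(n) in the node count
        if not node.fits(cores, ram_mb):
            continue
        if best is None or node.fill_ratio() > best.fill_ratio():
            best = node
    return best
```

Calling `pick_node(nodes, cores, ram_mb)` at each checkpoint and getting `None` back would mean that no node in the cluster can take the container.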
>>
>>> What do you think of my logic (if it's clear so far)?
>> The summary above seems good to me. If you really plan to
>> create/start/stop a lot of containers in the cluster, try to keep the
>> evaluation algorithm rather simple so that it runs in O(n) time, else
>> you could run into performance problems.
>>
>>> Also, this point is not clear to me: "wait for the cluster to become
>>> stable (e.g. a few minutes no cluster action)". Can you explain the
>>> reason?
>>
>> I was thinking of (live) migrations here: they put some load on the
>> network and the nodes, while besides your algorithm other, user-triggered
>> actions may also be running which need resources too.
>> Further, I want to wait a bit to let the cluster stabilize, else
>> it could trigger unnecessary migrations or an out-of-control feedback
>> loop, but this affects you less as you use static resource values (CPU
>> cores, max RAM), as far as I've understood.
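
[Editor's sketch] The "wait for the cluster to become stable" condition can be expressed as a simple quiet-period check. This is only an illustration under assumptions: the 300-second default stands in for "a few minutes", and `cluster_is_stable` and `last_action_ts` are hypothetical names; how the timestamp of the last cluster action is obtained is not covered here.

```python
import time
from typing import Optional

# Assumed value standing in for "a few minutes" without cluster actions.
QUIET_PERIOD_S = 300.0


def cluster_is_stable(last_action_ts: float,
                      now: Optional[float] = None,
                      quiet_period_s: float = QUIET_PERIOD_S) -> bool:
    """True once no cluster action has been seen for the whole quiet period."""
    if now is None:
        now = time.time()
    return now - last_action_ts >= quiet_period_s
```

A balancer would skip its evaluation round while this returns `False`, so a migration it just triggered is not immediately counted as a new imbalance.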
>>
>> best regards,
>> Thomas
>>
>>> On 5 April 2016 at 10:42, Thomas Lamprecht <t.lamprecht at proxmox.com>
>>> wrote:
>>>> Hi,
>>>>
>>>> this idea was proposed quite some time ago and we planned to implement
>>>> it in the pve-ha-manager stack,
>>>> as it provides a lot of functionality needed for that.
>>>>
>>>> The general idea from our side is:
>>>> * wait for the cluster to become stable (e.g. a few minutes no cluster
>>>> action),
>>>> * evaluate the load
>>>> * see if there is a configuration which makes the load more equal; here
>>>> migrate "lighter" VMs first, else we may get too big system time delays,
>>>> which are bad for such systems and can cause instability.
>>>> * if there is any such configuration try to achieve it (migrating one VM
>>>> at a time).
>>>> * start at the beginning.
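
[Editor's sketch] One round of the loop listed above (evaluate the load, look for a more equal configuration, try "lighter" VMs first, migrate one VM at a time) might be planned as follows. This is a simplified illustration, not the pve-ha-manager design: `plan_one_migration` and the `loads` mapping are hypothetical, load is a single abstract number per VM, and only a move from the busiest to the least busy node is considered.

```python
from typing import Dict, Optional, Tuple


def plan_one_migration(
    loads: Dict[str, Dict[str, float]],
) -> Optional[Tuple[str, str, str]]:
    """Plan at most one migration that narrows the load gap.

    `loads` maps node name -> {VM id -> load}. Returns (vm, source, target),
    or None when no single move makes the load more equal.
    """
    if len(loads) < 2:
        return None
    totals = {node: sum(vms.values()) for node, vms in loads.items()}
    source = max(totals, key=totals.get)  # busiest node
    target = min(totals, key=totals.get)  # least busy node
    gap = totals[source] - totals[target]
    # Try the "lighter" VMs on the busiest node first.
    for vm, load in sorted(loads[source].items(), key=lambda kv: kv[1]):
        # After the move the gap becomes |gap - 2 * load|, which is
        # smaller than the current gap only if 0 < load < gap.
        if 0 < load < gap:
            return vm, source, target
    return None
```

Because it returns a single move, the caller can perform that migration, wait for the cluster to settle again, and then start over from the evaluation step.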
>>>>
>>>> There are a few open questions, e.g. how to determine load _correctly_, as
>>>> there are various setups and indicators from memory, CPU, network and
>>>> IO, which may have different effects on different setups.
>>>> What happens in edge cases (fencing, ...)?
>>>>
>>>> Also a static value which can be assigned to VMs would be nice, as just
>>>> because a VM is lightweight
>>>>
>>>> Thus we want to start simple, i.e. use static load balancing, then a
>>>> simple dynamic one (e.g. CPU only), and at best with a simulation which
>>>> can evaluate how often migrations happened and so on (wishlist).
>>>> And AFAIK, we want to "limit" it to HA groups, meaning a group should
>>>> be balanced over the nodes assigned to that group.
>>>>
>>>> The point of this message is to summarize our (or better, my) thoughts on
>>>> that topic and to let you know that there is already something planned and
>>>> also that there is a project by us which someone who wants to implement
>>>> that could make use of, namely the Proxmox VE HA Manager.
>>>>
>>>> I appreciate the fact that you want to make something for PVE and wish
>>>> you the best.
>>>> It could be worth a thought for you to use some of the HA manager stack;
>>>> using Perl would help here, as this way it could also land upstream.
>>>>
>>>> best regards,
>>>> Thomas
>>>>
>>>>
>>>> On 04/05/2016 11:04 AM, Mohamed Sadok Ben Jazia wrote:
>>>>> Hi list,
>>>>> For my Proxmox infrastructure, I set up a cluster with a number of nodes.
>>>>> I'm looking for a load balancer to perform these tasks:
>>>>> - Choose the best node for a newly created or resized CT/VM.
>>>>> - Live migration to free up resources on nodes, or for optimisation.
>>>>> My idea is to create a dynamic resource scheduler that is integrated
>>>>> into my server-side script to perform this function.
>>>>>
>>>>> Here is the link to the project:
>>>>>
>>>>> https://github.com/BenJaziaSadok/proxmox-DRS
>>>>>
>>>>> Any help with the algorithm or in the development is welcome
>>>>>
>>>>> Thank you