[pve-devel] [PATCH pve-ha-manager 1/3] add ressource awareness manager

Thomas Lamprecht t.lamprecht at proxmox.com
Mon Dec 13 12:29:12 CET 2021

Hi Alexandre,

On 13.12.21 11:58, DERUMIER, Alexandre wrote:
> Thanks for the fast review !
> I'll rework the patches with your comments.
> Do you prefer some another commits patches for each of your remark for easier review ?  or can I send a v2 directly the changes already merged ?

A v2 with the fixes squashed in works good for me, if you also think the commit-split
I suggested is Ok and do that one great, else it's OK too.

>> semi-related, Dietmar and I talked about adapting the RRD format and higher resolution
>> from PBS also for PVE, dropping the dependency on rrdcached, but PVE with the cluster
>> and broadcasting is quite a bit more complex compared to PBS, so the effort stalled.
>> If we make more use of those metrics here it would add further coupling to the existing
>> interface so maybe we'd rethink that through - but nothing that needs to stall this
>> completely, changing RRD would be better done at a major release anyway where we
>> can handle this somewhat cleanly.
> I think we had talked last year about single rrd file by metric, but I'm not sure it's the best way
> (with the number of files it could be generate).
> I would like to have another metrics in rrd, like pressure of cpu/mem, or true vm cgroup mem/cpu.
> I think we already stream them, but rrd part is not yet done.

I'd like to benchmark and evaluate the in-memory key-value store of pmxcfs
to see if it's ok to further expand usage of that one, as for PSI info it could be
OK to only have the last one, as they are already averaged by the kernel,
so we could start out by just broadcasting the last one outside of RRD - but
as said, I do not know if that's really (better) scalable.

> +    $stats->{cpu} = percentile($percentile, \@cpu) || 0;
> +    $stats->{mem} = percentile($percentile, \@mem) || 0;
> +    $stats->{maxmem} = percentile($percentile, \@maxmem) || 0;
> +    $stats->{maxcpu} = percentile($percentile, \@maxcpu) || 0;
> +    $stats->{totalcpu} = $stats->{cpu} * $stats->{maxcpu} * 100;
>> what is cpu/maxcpu exactly, the multiplication of both strikes me as a bit weird?
> maxcpu is the number of core,so totalcpu is the number of cores * current percent usage * 100.
> (This is to be able to compare vm/host with differents number of cores)
> But indeed, maybe I'm wrong with my maths ;) I need to check a little bit the ordering and compare code.

Ok, I was more confused because maxcpu and cpu had different units (count vs. load-percentage).

> +        #### FILTERING NODES WITH HARD CONSTRAINTS (vm can't be started)
> +       next if !check_hard_constraints($haenv, $vmid, $cd, $nodename, $node_stats, $vm_stats, $storecfg, $group_members_prio);
>> I mean, most of those should be covered by the grouping already?
> do you mean: manually grouping by user ? if yes, the point is here to get it done auto (for storage avalability, number de host core,maybe vmbr availability,....), because this is a pain to setup.

Hmm, OK, groups can still make sense if the admin wants to parition the cluster,
e.g., if its not homogenous (e.g., hyper-converged with storage and compute nodes)
or the like.

> For example, currently, if you forgot it, a vm with a local storage could be migrated to another node by HA (with error). Then user don't known how to migrate it back to original node.

I mean, it's definitively a pain point that users new to our HA stack run into,
so IMO good to improve.

Maybe the suggested `check_service_constraints` should then rather be a
`get_runnable_nodes($sid)` method in Env, not to sure yet though, they're both
roughly the same.

> But it's also checking dynamic values like available cpu/ram on host.

hmm, sounds like a third patch we could split out

1. commit: add basic constrained checking with storage and such automatically
2. commit: add resource/node usage tracking code-infrastructure
3. commit integrate that tracking into the constraints checker for dynamic
   target decisions
4. tests and such

As said, the split is not a must for me, but I think it could be a bit easier
to digest that way on review.


More information about the pve-devel mailing list