[pve-devel] [PATCH qemu-server 02/10] add memory parser

Fiona Ebner f.ebner at proxmox.com
Mon Jan 9 15:35:06 CET 2023


Am 05.01.23 um 15:21 schrieb DERUMIER, Alexandre:
> Le jeudi 05 janvier 2023 à 13:48 +0100, Fiona Ebner a écrit :
>> Am 02.01.23 um 12:23 schrieb DERUMIER, Alexandre:
>>> Le vendredi 16 décembre 2022 à 14:38 +0100, Fiona Ebner a écrit :
>>>> From a modularity standpoint, it would be nice to move the format
>>>> description, parsing+printing to PVE/QemuServer/Memory.pm,
>>>> similar to
>>>> how it is done in PVE/QemuServer/PCI.pm
>>>>
>>>> Since $confdesc->{memory}->{default} is now gone, load_defaults()
>>>> will not return a default for 'memory' anymore. And $conf->{memory}
>>>> needs to be parsed everywhere. These two things will break getting
>>>> static usage in HA manager, which uses load_defaults() and relies on
>>>> $conf->{memory} to be a single value right now. We can switch there
>>>> to use get_current_memory() too of course, but it'd require a
>>>> versioned Breaks+Depends.
>>>>
>>>> Alternatively, we could add a function in qemu-server for calculating
>>>> the static stats and call that from HA. Requires a versioned
>>>> Breaks+Depends too, but then we'd be safe in the future and also catch
>>>> such changes more easily. OTOH, it'd make the coupling go in the other
>>>> direction: if HA manager wants to change what it uses for static
>>>> consideration, then qemu-server would need to adapt. So not sure.
>>>
>>> I was thinking about this:
>>> when the dynamic scheduler is implemented, you'll need to use values
>>> streamed from pvestatd.
>>> So why can't we do the same for static values? (maxmem && maxcpus are
>>> already sent by pvestatd).
>>>
>>> This should avoid the need to parse the VM config,
>>> and maybe avoid using load_config()?
>>> (from your initial commit,
>>> https://git.proxmox.com/?p=pve-ha-manager.git;a=commit;h=561e7f4bfb235fcdca5b0bbb8422ce742a5da75f
>>> it seems to be slow)
>>>
>>>
>>
>> The information is already readily available on the cluster file system,
>> so sending it around via pvestatd additionally isn't ideal IMHO. maxmem
>> and maxcpus are only one value per node and were not available before.
>>
>> The load_config() call is not really problematic, because the result
>> from cfs_read_file() is cached. The real issue is that
>> recompute_online_node_usage(), and thus getting the static info, is
>> called very often currently. There was an RFC [0] to get the information
>> using PVE::Cluster::get_guest_config_properties(), but it's only twice
>> as fast. Optimizing how often we call recompute_online_node_usage() can
>> give us much more.
>>
> Ok thanks for the details. I'll try to work on this after virtiomem,
> as I have a big cluster (1000 VMs minimum), and I really need it.

I can also do it if you want. It requires remove_service_usage(_to_node)
functions in the Rust backend and replacing recompute_online_node_usage()
calls with more fine-grained add/remove usage calls.
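
Roughly something like this on the Perl side (just a sketch of the idea;
the helper and the backend call below are made up and would need the new
remove_service_usage() in the Rust scheduler first):

    # PVE/HA/Usage/Static.pm -- hypothetical inverse of
    # add_service_usage_to_node(); names are illustrative only
    sub remove_service_usage_to_node {
        my ($self, $nodename, $sid, $service_conf) = @_;

        # recompute the same static stats that were added for this service
        my $stats = static_service_stats($sid, $service_conf); # hypothetical helper

        # new fine-grained call the Rust scheduler backend would need to provide
        $self->{scheduler}->remove_service_usage($nodename, $stats);
    }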

> Maybe only recompute incrementally, adding/removing usage from/to the
> nodes where the VMs are migrated. (At least when we iterate through the
> main loop with services).

Yes, the idea is to introduce a remove_service_usage_to_node() as an
inverse to add_service_usage_to_node(). Adding or removing usage in
PVE/HA/Manager.pm's change_service_state() depending on the old and new
state seems to be a natural way to do it. Currently, we do a full
recompute every time in change_service_state(). During migration, we
count usage on both source and target; since the VM impacts both, that
should be fine already.
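
Very roughly (illustrative only; the set of states that count towards
usage and the argument lists are simplified here):

    # PVE/HA/Manager.pm, inside change_service_state() -- sketch only
    my $usage = $self->{online_node_usage};

    # states whose services should count towards node usage (approximation)
    my $counted = { started => 1, migrate => 1, relocate => 1, recovery => 1 };

    $usage->remove_service_usage_to_node($sd->{node}, $sid)
        if $counted->{$old_state};
    $usage->add_service_usage_to_node($sd->{node}, $sid)
        if $counted->{$new_state};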

> 
> And do a full recompute on each new CRM run, every 10s.

Yes. We could also do a full recompute after iterating every N services
just to be sure.
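
E.g. something along these lines in the main service loop (the counter
and the value of N are of course just placeholders):

    # sketch -- full recompute every N services to guard against drift
    my $count = 0;
    for my $sid (sort keys %$ss) {
        $self->recompute_online_node_usage() if ++$count % 50 == 0;
        # ... existing per-service handling ...
    }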

> 
>> In any case, HA manager needs to be adapted before the memory setting
>> can be turned into a property string.
>>
> 
> I was thinking about pvestatd, because it's already parsed on the
> pvestatd side, and we only have to use raw metrics.
> But no problem, I can use QemuServer::Memory::get_current_memory() in
> pve-ha-manager. Still not sure it's the best way, or a special
> QemuServer::get_ha_stats() we could reuse later to add other stats?

In both cases, a versioned Breaks+Depends is needed. The second approach
would likely avoid the need for versioned Breaks for similar changes in
the future, but it tightens the coupling with the qemu-server repo:
qemu-server then needs to know what information HA manager wants and
needs to adapt when that changes. IMHO both approaches can be fine.
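
Just to make the second option a bit more concrete (get_ha_stats() does
not exist, the exact fields and units would need discussion, and this
assumes get_current_memory() takes the raw 'memory' value and returns MiB):

    # hypothetical helper in qemu-server, roughly mirroring what the HA
    # manager's static usage currently derives itself
    sub get_ha_stats {
        my ($conf) = @_;

        my $defaults = load_defaults();
        my $cpus = ($conf->{sockets} || $defaults->{sockets})
            * ($conf->{cores} || $defaults->{cores});

        return {
            maxcpu => $conf->{vcpus} || $cpus,
            maxmem => get_current_memory($conf->{memory}) * 1024 * 1024, # bytes
        };
    }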

Yes, we could then add the dynamic stats to such a method, but again,
it's a bit more coupling. Not sure if it's not just better to handle
those directly in HA manager.




