[pve-devel] [PATCH ha-manager 8/9] manager: make online node usage computation granular

Fri Oct 17 18:07:29 CEST 2025

On Fri Oct 17, 2025 at 2:32 PM CEST, Fiona Ebner wrote:
> On 30.09.25 at 4:20 PM, Daniel Kral wrote:
>> The HA Manager builds $online_node_usage in every FSM iteration in
>> manage(...) and at every HA resource state change in
>> change_service_state(...). This becomes quite costly with a high HA
>> resource count and a lot of state changes happening at once, e.g.
>> starting up multiple nodes with rebalance_on_request_start set or a
>> failover of a node with many configured HA resources.
>> 
>> To improve this situation, make the changes to the $online_node_usage
>> more granular by building $online_node_usage only once per call to
>> manage(...) and changing the nodes an HA resource uses individually on
>> every HA resource state transition.
>> 
>> The change in service usage "freshness" should be negligible here as the
>> static service usage data is cached anyway (except if the cache fails
>> for some reason).
>
> But the cache is refreshed on every recompute_online_node_usage(), which
> happened much more frequently before, so the fact that it's cached
> doesn't seem like a strong argument here?
>
> I /do/ think there is a real tradeoff being made, namely "the ability to
> manage much larger fleets of guests" versus "immediately incorporating
> every guest config change in decisions". Config changes that would lead
> to wildly different decisions would need to be timed very badly to cause
> actual issues and should be rare to begin with. Also, with PSI-based
> information, things are less "instant" anyway, so I don't see an issue
> with moving in the same direction.

Right, I'll change that to better reflect the tradeoff here!
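
For illustration, the granular update described in the commit message
could be sketched roughly as below. This is a simplified Python model
under stated assumptions, not the actual Perl code in ha-manager: the
class OnlineNodeUsage, the helpers build_usage() and
change_service_state(), and the plain per-node counter are all
hypothetical names and simplifications chosen for this example.

```python
class OnlineNodeUsage:
    """Toy per-node usage counter (hypothetical, not the ha-manager API)."""

    def __init__(self, online_nodes):
        self.usage = {node: 0 for node in online_nodes}
        self.service_nodes = {}  # sid -> nodes currently counted for it

    def remove_service(self, sid):
        # Undo whatever this service contributed before.
        for node in self.service_nodes.pop(sid, []):
            self.usage[node] -= 1

    def add_service(self, sid, nodes):
        # Replace the old contribution of this service with the new one.
        self.remove_service(sid)
        counted = [n for n in nodes if n in self.usage]  # skip offline nodes
        for node in counted:
            self.usage[node] += 1
        self.service_nodes[sid] = counted


def build_usage(online_nodes, services):
    """Full rebuild over all services, done once per manage() round."""
    usage = OnlineNodeUsage(online_nodes)
    for sid, nodes in services.items():
        usage.add_service(sid, nodes)
    return usage


def change_service_state(usage, services, sid, new_nodes):
    """Granular update: only touch the nodes of the one changed service."""
    services[sid] = new_nodes
    usage.add_service(sid, new_nodes)


nodes = ["node1", "node2", "node3"]
services = {"vm:100": ["node1"], "vm:101": ["node1"], "vm:102": ["node2"]}
usage = build_usage(nodes, services)

# Assumption for this sketch: a resource in transit uses two nodes.
change_service_state(usage, services, "vm:100", ["node1", "node3"])
change_service_state(usage, services, "vm:100", ["node3"])
```

The point of the sketch is that each state transition costs work
proportional to the nodes of one resource instead of a rebuild over all
resources, while the end result matches what a full rebuild would give.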