[pve-devel] [PATCH ha-manager 09/11] manager: use static resource scheduler when configured

Fri Nov 11 10:28:24 CET 2022

On 10.11.22 at 15:37, Fiona Ebner wrote:
> @@ -206,11 +207,30 @@ my $valid_service_states = {
>  sub recompute_online_node_usage {
So I was a bit worried that recompute_online_node_usage() would become
too inefficient with the new add_service_usage_to_node() overhead from
needing to read the guest configs. I now tested it with ~300 HA services
(minimal containers) running on my virtual test cluster.

Timings with 'basic' mode were between 0.0004 and 0.001 seconds
Timings with 'static' mode were between 0.007 and 0.012 seconds

While that's roughly a 10- to 20-fold increase, it's not too dramatic at
least. I guess that's what the caching of cfs files is for :)

Still, the function is currently not only called in the main loop in
manage(), but also in next_state_recovery() and change_service_state().

With, say, 400 HA services on each of 5 nodes, if a node fails there are
400 calls from changing to freeze
400 calls from changing to recovery
400 calls in next_state_recovery()
400 calls from changing to started
If we take a generous estimate that each call takes 0.1 seconds (there
are 2000 services in total), that's 40+80+40 seconds in 3 bursts during
the fencing and recovery period.
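
For reference, the estimate above can be reproduced with a few lines of
Python. This is only a back-of-envelope sketch: the 0.1 seconds per call
is the generous guess from above, not a measurement, and the grouping
into bursts (freeze, then recovery together with next_state_recovery(),
then started) is my reading of where the calls cluster. The names
CALLS_PER_BURST and burst_seconds are made up for illustration.

```python
# Back-of-envelope estimate; every number here is an assumption from the mail.
SERVICES_ON_FAILED_NODE = 400
SECONDS_PER_CALL = 0.1  # generous guess, not a measured value

# Calls to recompute_online_node_usage() per service, grouped by burst.
CALLS_PER_BURST = {
    "freeze": 1,    # state change to freeze
    "recovery": 2,  # state change to recovery + next_state_recovery()
    "started": 1,   # state change to started
}


def burst_seconds(services=SERVICES_ON_FAILED_NODE, per_call=SECONDS_PER_CALL):
    """Return the estimated seconds spent recomputing usage in each burst."""
    return {
        name: services * calls * per_call
        for name, calls in CALLS_PER_BURST.items()
    }


if __name__ == "__main__":
    estimate = burst_seconds()
    for name, seconds in estimate.items():
        print(f"{name}: {seconds:.0f} s")
    print(f"total: {sum(estimate.values()):.0f} s")
```

Plugging in the upper measured 'static' timing from above instead, e.g.
burst_seconds(per_call=0.012), gives an idea of how pessimistic the 0.1
seconds guess is for a cluster the size of my test setup.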

Is that acceptable? Should I try to optimize how often the function is
called?