[pve-devel] [PATCH ha-manager 1/9] implement static service stats cache

Fiona Ebner f.ebner at proxmox.com
Thu Oct 16 13:12:56 CEST 2025


Am 30.09.25 um 4:21 PM schrieb Daniel Kral:
> As the HA Manager builds the static load scheduler, it queries the
> services' static usage by reading and parsing the static guest configs
> individually, which can take significant time with respect to how many
> times recompute_online_node_usage(...) is called.
> 
> PVE::Cluster exposes an efficient interface to gather a set of
> properties from one or all guest configs [0]. This is used here to build
> a rather short-lived cache on every (re)initialization of the static
> load scheduler.
> 
> The downside to this approach is if there are way more non-HA managed
> guests in the cluster than HA-managed guests, which causes this to load
> much more information than necessary. It also doesn't cache the default
> values for the environment-specific static service usage, which causes
> quite a bottleneck as well.
> 
> [0] pve-cluster cf1b19d (add get_guest_config_property IPCC method)
> 
> Suggested-by: Fiona Ebner <f.ebner at proxmox.com>
> Signed-off-by: Daniel Kral <d.kral at proxmox.com>
> ---
> If the above mentioned downside is too large for some setups, we could
> also just add a switch to enable/disable the static stats cache or
> automatically figure out if it brings any benefits.. But I think this
> will be good enough, especially with the latter patches making way fewer
> calls to get_service_usage(...).

Parsing the configs is much more costly than
PVE::Cluster::get_guest_config_properties(), so improving the situation
for the majority of use cases of the static scheduler still makes sense,
even if it hurts performance for some more exotic setups. And we would
like to consider usage of non-HA guests in the future too to make the
static scheduler more accurate even in such exotic setups. But with
PSI-based information that is already included in the node usage, no
extra preparation necessary there.

> 
>  src/PVE/HA/Env.pm             | 12 ++++++++++++
>  src/PVE/HA/Env/PVE2.pm        | 21 +++++++++++++++++++++
>  src/PVE/HA/Manager.pm         |  1 +

Needs a rebase because of changes in Manager.pm

> @@ -497,6 +499,25 @@ sub get_datacenter_settings {
>      };
>  }
>  
> +sub get_static_service_stats {
> +    my ($self, $id) = @_;
> +
> +    # undef if update_static_service_stats(...) failed before
> +    return undef if !defined($self->{static_service_stats});
> +
> +    return $self->{static_service_stats}->{$id} // {};

Can't returning '{}' when nothing is there lead to issues down the line?
If we return undef instead, it's consistent with not having anything
cached and the caller will fall back to loading the config.

> @@ -477,6 +467,8 @@ sub new {
>  
>      $self->{service_config} = $self->read_service_config();
>  
> +    $self->{static_service_stats} = undef;
> +
>      return $self;
>  }
>  
> @@ -943,6 +935,25 @@ sub watchdog_update {
>      return &$modify_watchog($self, $code);
>  }
>  
> +sub get_static_service_stats {
> +    my ($self, $id) = @_;
> +
> +    # undef if update_static_service_stats(...) failed before
> +    return undef if !defined($self->{static_service_stats});
> +
> +    return $self->{static_service_stats}->{$id} // {};

Same here.

> +}
> +
> +sub update_static_service_stats {
> +    my ($self) = @_;
> +
> +    my $filename = "$self->{statusdir}/static_service_stats";
> +    my $stats = eval { PVE::HA::Tools::read_json_from_file($filename) };
> +    $self->log('warning', "unable to update static service stats cache - $@") if $@;
> +
> +    $self->{static_service_stats} = $stats;
> +}
> +
>  sub get_static_node_stats {
>      my ($self) = @_;
>  
> diff --git a/src/PVE/HA/Sim/Resources.pm b/src/PVE/HA/Sim/Resources.pm
> index 72623ee1..7641b1a9 100644
> --- a/src/PVE/HA/Sim/Resources.pm
> +++ b/src/PVE/HA/Sim/Resources.pm
> @@ -143,8 +143,8 @@ sub get_static_stats {
>      my $sid = $class->type() . ":$id";
>      my $hardware = $haenv->hardware();
>  
> -    my $stats = $hardware->read_static_service_stats();
> -    if (my $service_stats = $stats->{$sid}) {
> +    my $service_stats = $hardware->get_static_service_stats($sid);
> +    if (%$service_stats) {

Style nit: with the above change, this could be just $service_stats or a
definedness check for it. If really checking if there are any keys, I
prefer being explicit with scalar(keys ...)

>          return $service_stats;
>      } elsif ($id =~ /^(\d)(\d\d)/) {
>          # auto assign usage calculated from ID for convenience





More information about the pve-devel mailing list