[pve-devel] [PATCH ha-manager v2 7/8] manager: make online node usage computation granular

Daniel Kral d.kral at proxmox.com
Mon Oct 20 18:45:37 CEST 2025


The HA Manager rebuilds $online_node_usage in every FSM iteration in
manage(...) and at every HA resource state change in
change_service_state(...). This becomes costly with a high HA resource
count and many state changes happening at once, e.g. when starting up
multiple nodes with rebalance_on_request_start set or when failing over
a node with many configured HA resources.

To improve this situation, make the changes to $online_node_usage more
granular: build it only once per call to manage(...) and, on every HA
resource state transition, update only the usage of the nodes that the
affected HA resource uses. This allows the HA Manager to handle many
more HA resources with the static load scheduler.

Signed-off-by: Daniel Kral <d.kral at proxmox.com>
---
changes since v1:
 - remove FIXME
 - remove argument about cache from patch message
 - use add_service_usage(...) helper from $online_node_usage now
 - did not add R-b from Fiona as add_service_usage(...) was moved
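
For reviewers, the idea can be sketched as a standalone toy script. This
is only an illustration under simplifying assumptions, not the real
usage implementation: ToyUsage, rebuild_usage() and change_service() are
hypothetical names, usage is a plain per-node service count, and the
service state argument that the real add_service_usage(...) takes is
left out.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the usage object; NOT the real HA usage code.
package ToyUsage;

sub new {
    my ($class, @nodes) = @_;
    my $self = {
        count => { map { $_ => 0 } @nodes }, # services counted per online node
        used_by => {}, # sid -> list of nodes the service is counted on
    };
    return bless $self, $class;
}

# Count $sid on its current node and, if set, on its migration target.
sub add_service_usage {
    my ($self, $sid, $node, $target) = @_;
    my @nodes = grep { defined($_) && exists($self->{count}->{$_}) } ($node, $target);
    $self->{count}->{$_}++ for @nodes;
    $self->{used_by}->{$sid} = \@nodes;
}

# Undo whatever add_service_usage() recorded for $sid; no-op for unknown sids.
sub remove_service_usage {
    my ($self, $sid) = @_;
    my $nodes = delete($self->{used_by}->{$sid}) // [];
    $self->{count}->{$_}-- for @$nodes;
}

package main;

# Full rebuild: walks all services, so do it once per round only.
sub rebuild_usage {
    my ($nodes, $services) = @_;
    my $usage = ToyUsage->new(@$nodes);
    for my $sid (sort keys %$services) {
        my $sd = $services->{$sid};
        $usage->add_service_usage($sid, $sd->{node}, $sd->{target});
    }
    return $usage;
}

# State change: touch only the nodes of the one service that changed.
sub change_service {
    my ($usage, $services, $sid, %new) = @_;
    my $sd = $services->{$sid};
    $sd->{$_} = $new{$_} for keys %new;
    $usage->remove_service_usage($sid);
    $usage->add_service_usage($sid, $sd->{node}, $sd->{target});
}

my $services = {
    'vm:100' => { node => 'node1' },
    'vm:101' => { node => 'node1' },
    'vm:102' => { node => 'node2' },
};
my $usage = rebuild_usage(['node1', 'node2', 'node3'], $services);

# vm:100 starts migrating to node3, then finishes there.
change_service($usage, $services, 'vm:100', target => 'node3');
change_service($usage, $services, 'vm:100', node => 'node3', target => undef);
```

After both transitions the incrementally updated counts should match
what a fresh rebuild_usage() over the same services would produce, which
is the invariant the granular update has to preserve.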

 src/PVE/HA/Manager.pm | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/src/PVE/HA/Manager.pm b/src/PVE/HA/Manager.pm
index bf6895ad..3bd6e1a6 100644
--- a/src/PVE/HA/Manager.pm
+++ b/src/PVE/HA/Manager.pm
@@ -238,8 +238,6 @@ my $valid_service_states = {
     error => 1,
 };
 
-# FIXME with 'static' mode and thousands of services, the overhead can be noticable and the fact
-# that this function is called for each state change and upon recovery doesn't help.
 sub recompute_online_node_usage {
     my ($self) = @_;
 
@@ -317,7 +315,9 @@ my $change_service_state = sub {
         $sd->{$k} = $v;
     }
 
-    $self->recompute_online_node_usage();
+    $self->{online_node_usage}->remove_service_usage($sid);
+    $self->{online_node_usage}
+        ->add_service_usage($sid, $sd->{state}, $sd->{node}, $sd->{target});
 
     $sd->{uid} = compute_new_uuid($new_state);
 
@@ -709,6 +709,8 @@ sub manage {
         delete $ss->{$sid};
     }
 
+    $self->recompute_online_node_usage();
+
     my $new_rules = $haenv->read_rules_config();
 
     # TODO PVE 10: Remove group migration when HA groups have been fully migrated to rules
@@ -738,8 +740,6 @@ sub manage {
     for (;;) {
         my $repeat = 0;
 
-        $self->recompute_online_node_usage();
-
         foreach my $sid (sort keys %$ss) {
             my $sd = $ss->{$sid};
             my $cd = $sc->{$sid} || { state => 'disabled' };
-- 
2.47.3