[pve-devel] [PATCH ha-manager v2 08/12] rules: make positive affinity resources migrate on single resource fail
Daniel Kral
d.kral at proxmox.com
Fri Aug 1 18:22:23 CEST 2025
In the context of the HA Manager, the downtime of resources is expected
to be kept as low as possible. Therefore, if one or more HA resources of
a positive affinity rule fail to start, it is more reasonable to try
other possible node placements than to put the failed HA resources in
recovery.
This can be improved later to allow temporarily separated positive
affinity "groups", where a failed HA resource first tries to find a node
where it can start and the other HA resources are migrated there
afterwards; but this simpler heuristic is enough for the current feature.
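For illustration, a minimal sketch of the heuristic in plain Perl, with
made-up resource and node names and independent of the HA Manager's actual
code paths: intersect the node sets on which each resource of the rule may
still run, and move the whole group to the best remaining candidate.

#!/usr/bin/perl
use strict;
use warnings;

# Made-up example: fa:120002 failed to start on node2, so node2 is no
# longer in its allowed node set, while its affinity partners could
# still run there on their own.
my %allowed_nodes_for = (
    'vm:101'    => { node1 => 1, node2 => 1 },
    'vm:102'    => { node1 => 1, node2 => 1 },
    'fa:120002' => { node1 => 1 },
);

# Count in how many per-resource sets each node appears; a node usable
# by the whole positive affinity group must appear in all of them.
my %seen;
$seen{$_}++ for map { keys %$_ } values %allowed_nodes_for;
my @common = grep { $seen{$_} == keys %allowed_nodes_for } sort keys %seen;

print @common
    ? "migrate whole group to: $common[0]\n" # prints "node1"
    : "no common node, resources must stay in recovery\n";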
Reported-by: Hannes Dürr <h.duerr at proxmox.com>
Reported-by: Michael Köppl <m.koeppl at proxmox.com>
Signed-off-by: Daniel Kral <d.kral at proxmox.com>
---
src/PVE/HA/Rules/ResourceAffinity.pm | 4 +++
.../README | 9 ++++---
.../log.expect | 26 ++++++++++++++++---
3 files changed, 31 insertions(+), 8 deletions(-)
diff --git a/src/PVE/HA/Rules/ResourceAffinity.pm b/src/PVE/HA/Rules/ResourceAffinity.pm
index 7327ee08..9bc039ba 100644
--- a/src/PVE/HA/Rules/ResourceAffinity.pm
+++ b/src/PVE/HA/Rules/ResourceAffinity.pm
@@ -596,6 +596,10 @@ resource has not failed running there yet.
 sub apply_positive_resource_affinity : prototype($$) {
     my ($together, $allowed_nodes) = @_;
 
+    for my $node (keys %$together) {
+        delete $together->{$node} if !$allowed_nodes->{$node};
+    }
+
     my @possible_nodes = sort keys $together->%*
         or return; # nothing to do if there is no positive resource affinity
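(For clarity, a standalone sketch of what the added loop does, with
hypothetical example data rather than the HA Manager's real state:
$together holds the nodes the other resources of the rule currently
occupy, $allowed_nodes the nodes the failed resource may still be started
on. Dropping disallowed nodes from $together means the group is no longer
pinned to the node the resource just failed on.)

#!/usr/bin/perl
use strict;
use warnings;

my $together = { node2 => 1 };                  # partners run on node2
my $allowed_nodes = { node1 => 1, node3 => 1 }; # failed on node2

# The added filter: keep only nodes the failed resource is allowed on.
for my $node (keys %$together) {
    delete $together->{$node} if !$allowed_nodes->{$node};
}

# With $together now empty, the early "or return" above is taken and the
# resource is free to pick a fresh node; the partners follow afterwards.
my @possible_nodes = sort keys %$together;
print @possible_nodes
    ? "group stays on: $possible_nodes[0]\n"
    : "no shared node left - pick a new node for the whole group\n";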
diff --git a/src/test/test-resource-affinity-strict-positive3/README b/src/test/test-resource-affinity-strict-positive3/README
index a270277b..804d1312 100644
--- a/src/test/test-resource-affinity-strict-positive3/README
+++ b/src/test/test-resource-affinity-strict-positive3/README
@@ -1,7 +1,8 @@
 Test whether a strict positive resource affinity rule makes three resources
 migrate to the same recovery node in case of a failover of their previously
 assigned node. If one of those fail to start on the recovery node (e.g.
-insufficient resources), the failing resource will be kept on the recovery node.
+insufficient resources), all resources in the positive resource affinity rule
+will be migrated to another available recovery node.
 
 The test scenario is:
 - vm:101, vm:102, and fa:120002 must be kept together
@@ -12,6 +13,6 @@ The test scenario is:
 
 The expected outcome is:
 - As node3 fails, all resources are migrated to node2
-- Two of those resources will start successfully, but fa:120002 will stay in
-  recovery, since it cannot be started on this node, but cannot be relocated to
-  another one either due to the strict resource affinity rule
+- Two of those resources will start successfully, but fa:120002 will fail; as
+  there are other available nodes left where it can run, all resources in the
+  positive resource affinity rule are migrated to the next-best fitting node
diff --git a/src/test/test-resource-affinity-strict-positive3/log.expect b/src/test/test-resource-affinity-strict-positive3/log.expect
index 4a54cb3b..b5d7018f 100644
--- a/src/test/test-resource-affinity-strict-positive3/log.expect
+++ b/src/test/test-resource-affinity-strict-positive3/log.expect
@@ -82,8 +82,26 @@ info 263 node2/lrm: starting service fa:120002
 warn 263 node2/lrm: unable to start service fa:120002
 err 263 node2/lrm: unable to start service fa:120002 on local node after 1 retries
 warn 280 node1/crm: starting service fa:120002 on node 'node2' failed, relocating service.
-warn 280 node1/crm: Start Error Recovery: Tried all available nodes for service 'fa:120002', retry start on current node. Tried nodes: node2
-info 283 node2/lrm: starting service fa:120002
-info 283 node2/lrm: service status fa:120002 started
-info 300 node1/crm: relocation policy successful for 'fa:120002' on node 'node2', failed nodes: node2
+info 280 node1/crm: relocate service 'fa:120002' to node 'node1'
+info 280 node1/crm: service 'fa:120002': state changed from 'started' to 'relocate' (node = node2, target = node1)
+info 283 node2/lrm: service fa:120002 - start relocate to node 'node1'
+info 283 node2/lrm: service fa:120002 - end relocate to node 'node1'
+info 300 node1/crm: service 'fa:120002': state changed from 'relocate' to 'started' (node = node1)
+info 300 node1/crm: migrate service 'vm:101' to node 'node1' (running)
+info 300 node1/crm: service 'vm:101': state changed from 'started' to 'migrate' (node = node2, target = node1)
+info 300 node1/crm: migrate service 'vm:102' to node 'node1' (running)
+info 300 node1/crm: service 'vm:102': state changed from 'started' to 'migrate' (node = node2, target = node1)
+info 301 node1/lrm: starting service fa:120002
+info 301 node1/lrm: service status fa:120002 started
+info 303 node2/lrm: service vm:101 - start migrate to node 'node1'
+info 303 node2/lrm: service vm:101 - end migrate to node 'node1'
+info 303 node2/lrm: service vm:102 - start migrate to node 'node1'
+info 303 node2/lrm: service vm:102 - end migrate to node 'node1'
+info 320 node1/crm: relocation policy successful for 'fa:120002' on node 'node1', failed nodes: node2
+info 320 node1/crm: service 'vm:101': state changed from 'migrate' to 'started' (node = node1)
+info 320 node1/crm: service 'vm:102': state changed from 'migrate' to 'started' (node = node1)
+info 321 node1/lrm: starting service vm:101
+info 321 node1/lrm: service status vm:101 started
+info 321 node1/lrm: starting service vm:102
+info 321 node1/lrm: service status vm:102 started
 info 720 hardware: exit simulation - done
--
2.47.2