[pve-devel] [PATCH ha-manager 2/2] fix #6801: only consider target node during positive resource affinity migration
Daniel Kral
d.kral at proxmox.com
Mon Nov 3 16:17:12 CET 2025
When a HA resource with positive affinity to other HA resources is moved
to another node, the other HA resources in positive affinity are
automatically moved to the same target node as well.
If the HA resources have significant differences in migration time
(more than the average HA Manager round of ~10 seconds), the already
migrated HA resources in 'started' state will check for better node
placements while the other(s) are still migrating.
This search includes checking whether the positive resource affinity
rules are satisfied, which queries where the other HA resources are.
While HA resources are still migrating, this will report that they are
on both the source and the target node, which is correct from an
accounting standpoint, but adds equal weight to both nodes and might
result in the already started HA resource being migrated back to the
source node.
Therefore, only consider the target node for positive affinity during
migration or relocation to prevent this from happening.
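To illustrate the weighting change, here is a minimal sketch in Python
(the actual code is the Perl in get_resource_affinity() below; the
function and variable names here are simplified stand-ins):

```python
def together_weights(positive_services, used_nodes):
    """Count, per node, how many positively-affine services pull toward it.

    used_nodes maps a service id to a (current_node, target_node) pair,
    where target_node is set only while the service is migrating.
    """
    together = {}
    for sid in positive_services:
        current_node, target_node = used_nodes[sid]
        # Old behavior: count BOTH nodes of a migrating service, which gave
        # the source node the same weight as the target and could pull an
        # already-migrated service back (bug #6801).
        # New behavior: prefer the target node; fall back to the current
        # node only when the service is not migrating.
        node = target_node if target_node is not None else current_node
        if node is not None:
            together[node] = together.get(node, 0) + 1
    return together

# vm:102 is still migrating from node3 to node1; vm:103 already runs on node1.
weights = together_weights(
    ["vm:102", "vm:103"],
    {"vm:102": ("node3", "node1"), "vm:103": ("node1", None)},
)
# Both services now pull toward node1 only: {"node1": 2}
```

With the old both-nodes accounting, the same input would have produced
equal weights for node3 and node1, leaving the placement tie-broken by
other criteria and sometimes pulling vm:103 back to the source node.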
As a side-effect, two test cases for positive resource affinity rules
will result in a slightly quicker convergence to a steady state as these
now will get the information about the common target node sooner.
Signed-off-by: Daniel Kral <d.kral at proxmox.com>
---
src/PVE/HA/Rules/ResourceAffinity.pm | 6 ++--
.../log.expect | 25 +++--------------
.../log.expect | 28 +++++++++----------
.../README | 3 --
.../log.expect | 28 +++----------------
5 files changed, 26 insertions(+), 64 deletions(-)
diff --git a/src/PVE/HA/Rules/ResourceAffinity.pm b/src/PVE/HA/Rules/ResourceAffinity.pm
index 4f5ffca5..9303bafd 100644
--- a/src/PVE/HA/Rules/ResourceAffinity.pm
+++ b/src/PVE/HA/Rules/ResourceAffinity.pm
@@ -517,8 +517,10 @@ sub get_resource_affinity {
for my $csid (keys $positive->%*) {
my ($current_node, $target_node) = $get_used_service_nodes->($csid);
- $together->{$current_node}++ if defined($current_node);
- $together->{$target_node}++ if defined($target_node);
+ # consider only the target node for positive affinity to prevent already
+ # moved HA resources from moving back to the source node (see #6801)
+ my $node = $target_node // $current_node;
+ $together->{$node}++ if defined($node);
}
for my $csid (keys $negative->%*) {
diff --git a/src/test/test-resource-affinity-strict-mixed3/log.expect b/src/test/test-resource-affinity-strict-mixed3/log.expect
index b3de104f..ee6412a1 100644
--- a/src/test/test-resource-affinity-strict-mixed3/log.expect
+++ b/src/test/test-resource-affinity-strict-mixed3/log.expect
@@ -58,17 +58,11 @@ info 40 node1/crm: service 'vm:102': state changed from 'migrate' to 'sta
info 40 node1/crm: service 'vm:103': state changed from 'migrate' to 'started' (node = node3)
info 40 node1/crm: migrate service 'vm:201' to node 'node2' (running)
info 40 node1/crm: service 'vm:201': state changed from 'started' to 'migrate' (node = node1, target = node2)
-info 40 node1/crm: migrate service 'vm:202' to node 'node1' (running)
-info 40 node1/crm: service 'vm:202': state changed from 'started' to 'migrate' (node = node2, target = node1)
info 40 node1/crm: service 'vm:203': state changed from 'migrate' to 'started' (node = node2)
-info 40 node1/crm: migrate service 'vm:203' to node 'node1' (running)
-info 40 node1/crm: service 'vm:203': state changed from 'started' to 'migrate' (node = node2, target = node1)
info 41 node1/lrm: service vm:201 - start migrate to node 'node2'
info 41 node1/lrm: service vm:201 - end migrate to node 'node2'
-info 43 node2/lrm: service vm:202 - start migrate to node 'node1'
-info 43 node2/lrm: service vm:202 - end migrate to node 'node1'
-info 43 node2/lrm: service vm:203 - start migrate to node 'node1'
-info 43 node2/lrm: service vm:203 - end migrate to node 'node1'
+info 43 node2/lrm: starting service vm:203
+info 43 node2/lrm: service status vm:203 started
info 45 node3/lrm: starting service vm:101
info 45 node3/lrm: service status vm:101 started
info 45 node3/lrm: starting service vm:102
@@ -76,17 +70,6 @@ info 45 node3/lrm: service status vm:102 started
info 45 node3/lrm: starting service vm:103
info 45 node3/lrm: service status vm:103 started
info 60 node1/crm: service 'vm:201': state changed from 'migrate' to 'started' (node = node2)
-info 60 node1/crm: service 'vm:202': state changed from 'migrate' to 'started' (node = node1)
-info 60 node1/crm: service 'vm:203': state changed from 'migrate' to 'started' (node = node1)
-info 60 node1/crm: migrate service 'vm:201' to node 'node1' (running)
-info 60 node1/crm: service 'vm:201': state changed from 'started' to 'migrate' (node = node2, target = node1)
-info 61 node1/lrm: starting service vm:202
-info 61 node1/lrm: service status vm:202 started
-info 61 node1/lrm: starting service vm:203
-info 61 node1/lrm: service status vm:203 started
-info 63 node2/lrm: service vm:201 - start migrate to node 'node1'
-info 63 node2/lrm: service vm:201 - end migrate to node 'node1'
-info 80 node1/crm: service 'vm:201': state changed from 'migrate' to 'started' (node = node1)
-info 81 node1/lrm: starting service vm:201
-info 81 node1/lrm: service status vm:201 started
+info 63 node2/lrm: starting service vm:201
+info 63 node2/lrm: service status vm:201 started
info 620 hardware: exit simulation - done
diff --git a/src/test/test-resource-affinity-strict-positive3/log.expect b/src/test/test-resource-affinity-strict-positive3/log.expect
index b5d7018f..5f4e6531 100644
--- a/src/test/test-resource-affinity-strict-positive3/log.expect
+++ b/src/test/test-resource-affinity-strict-positive3/log.expect
@@ -84,24 +84,24 @@ err 263 node2/lrm: unable to start service fa:120002 on local node after
warn 280 node1/crm: starting service fa:120002 on node 'node2' failed, relocating service.
info 280 node1/crm: relocate service 'fa:120002' to node 'node1'
info 280 node1/crm: service 'fa:120002': state changed from 'started' to 'relocate' (node = node2, target = node1)
+info 280 node1/crm: migrate service 'vm:101' to node 'node1' (running)
+info 280 node1/crm: service 'vm:101': state changed from 'started' to 'migrate' (node = node2, target = node1)
+info 280 node1/crm: migrate service 'vm:102' to node 'node1' (running)
+info 280 node1/crm: service 'vm:102': state changed from 'started' to 'migrate' (node = node2, target = node1)
info 283 node2/lrm: service fa:120002 - start relocate to node 'node1'
info 283 node2/lrm: service fa:120002 - end relocate to node 'node1'
+info 283 node2/lrm: service vm:101 - start migrate to node 'node1'
+info 283 node2/lrm: service vm:101 - end migrate to node 'node1'
+info 283 node2/lrm: service vm:102 - start migrate to node 'node1'
+info 283 node2/lrm: service vm:102 - end migrate to node 'node1'
info 300 node1/crm: service 'fa:120002': state changed from 'relocate' to 'started' (node = node1)
-info 300 node1/crm: migrate service 'vm:101' to node 'node1' (running)
-info 300 node1/crm: service 'vm:101': state changed from 'started' to 'migrate' (node = node2, target = node1)
-info 300 node1/crm: migrate service 'vm:102' to node 'node1' (running)
-info 300 node1/crm: service 'vm:102': state changed from 'started' to 'migrate' (node = node2, target = node1)
+info 300 node1/crm: service 'vm:101': state changed from 'migrate' to 'started' (node = node1)
+info 300 node1/crm: service 'vm:102': state changed from 'migrate' to 'started' (node = node1)
info 301 node1/lrm: starting service fa:120002
info 301 node1/lrm: service status fa:120002 started
-info 303 node2/lrm: service vm:101 - start migrate to node 'node1'
-info 303 node2/lrm: service vm:101 - end migrate to node 'node1'
-info 303 node2/lrm: service vm:102 - start migrate to node 'node1'
-info 303 node2/lrm: service vm:102 - end migrate to node 'node1'
+info 301 node1/lrm: starting service vm:101
+info 301 node1/lrm: service status vm:101 started
+info 301 node1/lrm: starting service vm:102
+info 301 node1/lrm: service status vm:102 started
info 320 node1/crm: relocation policy successful for 'fa:120002' on node 'node1', failed nodes: node2
-info 320 node1/crm: service 'vm:101': state changed from 'migrate' to 'started' (node = node1)
-info 320 node1/crm: service 'vm:102': state changed from 'migrate' to 'started' (node = node1)
-info 321 node1/lrm: starting service vm:101
-info 321 node1/lrm: service status vm:101 started
-info 321 node1/lrm: starting service vm:102
-info 321 node1/lrm: service status vm:102 started
info 720 hardware: exit simulation - done
diff --git a/src/test/test-resource-affinity-strict-positive6/README b/src/test/test-resource-affinity-strict-positive6/README
index a6affda3..e174e458 100644
--- a/src/test/test-resource-affinity-strict-positive6/README
+++ b/src/test/test-resource-affinity-strict-positive6/README
@@ -1,5 +1,2 @@
Test whether two HA resources in positive resource affinity will migrate to the
same target node when one of them finishes earlier than the other.
-
-The current behavior is not correct, because the already migrated HA resource
-will be migrated back to the source node.
diff --git a/src/test/test-resource-affinity-strict-positive6/log.expect b/src/test/test-resource-affinity-strict-positive6/log.expect
index 69f8d867..cbc63a1e 100644
--- a/src/test/test-resource-affinity-strict-positive6/log.expect
+++ b/src/test/test-resource-affinity-strict-positive6/log.expect
@@ -10,8 +10,6 @@ info 20 node3/crm: status change startup => wait_for_quorum
info 20 node3/lrm: status change startup => wait_for_agent_lock
info 20 node1/crm: got lock 'ha_manager_lock'
info 20 node1/crm: status change wait_for_quorum => master
-info 20 node1/crm: migrate service 'vm:101' to node 'node1' (running)
-info 20 node1/crm: service 'vm:101': state changed from 'started' to 'migrate' (node = node3, target = node1)
info 21 node1/lrm: got lock 'ha_agent_node1_lock'
info 21 node1/lrm: status change wait_for_agent_lock => active
info 21 node1/lrm: service vm:102 - start migrate to node 'node3'
@@ -20,27 +18,9 @@ info 22 node2/crm: status change wait_for_quorum => slave
info 24 node3/crm: status change wait_for_quorum => slave
info 25 node3/lrm: got lock 'ha_agent_node3_lock'
info 25 node3/lrm: status change wait_for_agent_lock => active
-info 25 node3/lrm: service vm:101 - start migrate to node 'node1'
-info 25 node3/lrm: service vm:101 - end migrate to node 'node1'
-info 40 node1/crm: service 'vm:101': state changed from 'migrate' to 'started' (node = node1)
+info 25 node3/lrm: starting service vm:101
+info 25 node3/lrm: service status vm:101 started
info 40 node1/crm: service 'vm:102': state changed from 'migrate' to 'started' (node = node3)
-info 40 node1/crm: migrate service 'vm:101' to node 'node3' (running)
-info 40 node1/crm: service 'vm:101': state changed from 'started' to 'migrate' (node = node1, target = node3)
-info 40 node1/crm: migrate service 'vm:102' to node 'node1' (running)
-info 40 node1/crm: service 'vm:102': state changed from 'started' to 'migrate' (node = node3, target = node1)
-info 41 node1/lrm: service vm:101 - start migrate to node 'node3'
-info 41 node1/lrm: service vm:101 - end migrate to node 'node3'
-info 45 node3/lrm: service vm:102 - start migrate to node 'node1'
-info 45 node3/lrm: service vm:102 - end migrate to node 'node1'
-info 60 node1/crm: service 'vm:101': state changed from 'migrate' to 'started' (node = node3)
-info 60 node1/crm: service 'vm:102': state changed from 'migrate' to 'started' (node = node1)
-info 60 node1/crm: migrate service 'vm:101' to node 'node1' (running)
-info 60 node1/crm: service 'vm:101': state changed from 'started' to 'migrate' (node = node3, target = node1)
-info 61 node1/lrm: starting service vm:102
-info 61 node1/lrm: service status vm:102 started
-info 65 node3/lrm: service vm:101 - start migrate to node 'node1'
-info 65 node3/lrm: service vm:101 - end migrate to node 'node1'
-info 80 node1/crm: service 'vm:101': state changed from 'migrate' to 'started' (node = node1)
-info 81 node1/lrm: starting service vm:101
-info 81 node1/lrm: service status vm:101 started
+info 45 node3/lrm: starting service vm:102
+info 45 node3/lrm: service status vm:102 started
info 620 hardware: exit simulation - done
--
2.47.3