[pve-devel] [PATCH ha-manager 4/4] lrm: do not migrate if service already running upon rebalance on start

Fiona Ebner f.ebner at proxmox.com
Fri Apr 14 14:38:30 CEST 2023


As reported in the community forum[0], currently, a newly added
service that's already running is shut down, offline migrated and
started again if rebalance selects a new node for it. This is
unexpected.

An improvement would be online migrating the service, but rebalance
is only supposed to happen for a stopped->start transition[1], so the
service should not be migrated at all.

The cleanest solution would be for the CRM to use the state 'started'
instead of 'request_start' for newly added services that are already
running, i.e. restore the behavior from before commit c2f2b9c
("manager: set new request_start state for services freshly added to
HA") for such services. But currently, there is no mechanism for the
CRM to check if the service is already running, because it could be on
a different node. For now, avoiding the migration has to be handled in
the LRM instead. If the CRM ever has access to the necessary
information in the future, the solution mentioned above can be
re-considered.

Note that the CRM log message relies on the fact that the LRM only
returns the IGNORED status in this case, but it's more user-friendly
than using a generic message like "migration ignored (check LRM
log)".

[0]: https://forum.proxmox.com/threads/125597/
[1]: https://pve.proxmox.com/pve-docs/chapter-ha-manager.html#_crs_scheduling_points

Suggested-by: Thomas Lamprecht <t.lamprecht at proxmox.com>
Signed-off-by: Fiona Ebner <f.ebner at proxmox.com>
---
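Illustration only, not part of the commit: a minimal standalone sketch
of the control flow described above. The sub names lrm_handle_command
and crm_next_node as well as the numeric exit code values are
hypothetical and chosen for this sketch; they are not the ha-manager
API. The LRM side ignores a rebalance-on-start request for a running
service, and the CRM side then keeps the service on its current node.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Placeholder exit codes for this sketch only (assumption); the real
# constants are defined in the ha-manager code base.
use constant {
    SUCCESS => 0,
    IGNORED => 8,
};

# Hypothetical stand-in for the LRM side: decide how to handle a command.
sub lrm_handle_command {
    my ($cmd, $running) = @_;

    # Rebalance-on-start is only meant for a stopped->start transition,
    # so an already running service is left alone.
    return IGNORED if $cmd eq 'request_start_balance' && $running;

    return SUCCESS; # pretend the relocation itself succeeded
}

# Hypothetical stand-in for the CRM side: pick the node for the next state.
sub crm_next_node {
    my ($exit_code, $current_node, $target_node) = @_;

    return $target_node if $exit_code == SUCCESS;
    return $current_node; # IGNORED: the service stays where it is
}

for my $case (['vm:100', 1], ['vm:101', 0]) {
    my ($sid, $running) = @$case;
    my $code = lrm_handle_command('request_start_balance', $running);
    my $node = crm_next_node($code, 'node2', 'node1');
    print "$sid -> $node\n";
}
```

With these inputs the running vm:100 stays on node2 while the stopped
vm:101 is moved to node1, which mirrors the first two cases in the
log.expect file added by this patch.
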
 src/PVE/HA/LRM.pm                             |  5 ++
 src/PVE/HA/Manager.pm                         |  6 ++
 src/test/test-crs-static-rebalance2/README    |  3 +
 src/test/test-crs-static-rebalance2/cmdlist   |  9 +++
 .../test-crs-static-rebalance2/datacenter.cfg |  7 +++
 .../hardware_status                           |  5 ++
 .../test-crs-static-rebalance2/log.expect     | 63 +++++++++++++++++++
 .../test-crs-static-rebalance2/manager_status |  1 +
 .../test-crs-static-rebalance2/service_config |  1 +
 .../static_service_stats                      |  1 +
 10 files changed, 101 insertions(+)
 create mode 100644 src/test/test-crs-static-rebalance2/README
 create mode 100644 src/test/test-crs-static-rebalance2/cmdlist
 create mode 100644 src/test/test-crs-static-rebalance2/datacenter.cfg
 create mode 100644 src/test/test-crs-static-rebalance2/hardware_status
 create mode 100644 src/test/test-crs-static-rebalance2/log.expect
 create mode 100644 src/test/test-crs-static-rebalance2/manager_status
 create mode 100644 src/test/test-crs-static-rebalance2/service_config
 create mode 100644 src/test/test-crs-static-rebalance2/static_service_stats

diff --git a/src/PVE/HA/LRM.pm b/src/PVE/HA/LRM.pm
index e3f44f7..c7642e5 100644
--- a/src/PVE/HA/LRM.pm
+++ b/src/PVE/HA/LRM.pm
@@ -947,6 +947,11 @@ sub exec_resource_agent {
 	    return SUCCESS;
 	}
 
+	if ($cmd eq 'request_start_balance' && $running) {
+	    $haenv->log("info", "ignoring rebalance-on-start for service $sid - already running");
+	    return IGNORED;
+	}
+
 	my $online = ($cmd eq 'migrate') ? 1 : 0;
 
 	my $res = $plugin->migrate($haenv, $id, $target, $online);
diff --git a/src/PVE/HA/Manager.pm b/src/PVE/HA/Manager.pm
index 0d0cad2..761ffa1 100644
--- a/src/PVE/HA/Manager.pm
+++ b/src/PVE/HA/Manager.pm
@@ -644,6 +644,12 @@ sub next_state_migrate_relocate {
 	    $haenv->log('err', "service '$sid' - migration failed: service" .
 			" registered on wrong node!");
 	    &$change_service_state($self, $sid, 'error');
+	} elsif ($exit_code == IGNORED) {
+	    $haenv->log(
+		"info",
+		"service '$sid' - rebalance-on-start request ignored - service already running",
+	    );
+	    $change_service_state->($self, $sid, $req_state, node => $sd->{node});
 	} else {
 	    $haenv->log('err', "service '$sid' - migration failed (exit code $exit_code)");
 	    &$change_service_state($self, $sid, $req_state, node => $sd->{node});
diff --git a/src/test/test-crs-static-rebalance2/README b/src/test/test-crs-static-rebalance2/README
new file mode 100644
index 0000000..bf32c26
--- /dev/null
+++ b/src/test/test-crs-static-rebalance2/README
@@ -0,0 +1,3 @@
+Test how adding new services behaves with ha-rebalance-on-start.
+
+Expect that already running services are not affected, but others are.
diff --git a/src/test/test-crs-static-rebalance2/cmdlist b/src/test/test-crs-static-rebalance2/cmdlist
new file mode 100644
index 0000000..72bec8c
--- /dev/null
+++ b/src/test/test-crs-static-rebalance2/cmdlist
@@ -0,0 +1,9 @@
+[
+    [ "power node1 on", "power node2 on", "power node3 on"],
+    [ "service vm:100 add node2 started 1" ],
+    [ "service vm:101 add node2 started 0" ],
+    [ "service vm:102 add node2 started 1" ],
+    [ "service vm:103 add node2 started 0" ],
+    [ "service vm:104 add node2 stopped 0" ],
+    [ "service vm:105 add node2 stopped 0" ]
+]
diff --git a/src/test/test-crs-static-rebalance2/datacenter.cfg b/src/test/test-crs-static-rebalance2/datacenter.cfg
new file mode 100644
index 0000000..9f5137b
--- /dev/null
+++ b/src/test/test-crs-static-rebalance2/datacenter.cfg
@@ -0,0 +1,7 @@
+{
+    "crs": {
+        "ha": "static",
+        "ha-rebalance-on-start": 1
+    }
+}
+
diff --git a/src/test/test-crs-static-rebalance2/hardware_status b/src/test/test-crs-static-rebalance2/hardware_status
new file mode 100644
index 0000000..9be70a4
--- /dev/null
+++ b/src/test/test-crs-static-rebalance2/hardware_status
@@ -0,0 +1,5 @@
+{
+  "node1": { "power": "off", "network": "off", "cpus": 40, "memory": 384000000000 },
+  "node2": { "power": "off", "network": "off", "cpus": 32, "memory": 256000000000 },
+  "node3": { "power": "off", "network": "off", "cpus": 32, "memory": 256000000000 }
+}
diff --git a/src/test/test-crs-static-rebalance2/log.expect b/src/test/test-crs-static-rebalance2/log.expect
new file mode 100644
index 0000000..286514d
--- /dev/null
+++ b/src/test/test-crs-static-rebalance2/log.expect
@@ -0,0 +1,63 @@
+info      0     hardware: starting simulation
+info     20      cmdlist: execute power node1 on
+info     20    node1/crm: status change startup => wait_for_quorum
+info     20    node1/lrm: status change startup => wait_for_agent_lock
+info     20      cmdlist: execute power node2 on
+info     20    node2/crm: status change startup => wait_for_quorum
+info     20    node2/lrm: status change startup => wait_for_agent_lock
+info     20      cmdlist: execute power node3 on
+info     20    node3/crm: status change startup => wait_for_quorum
+info     20    node3/lrm: status change startup => wait_for_agent_lock
+info    120      cmdlist: execute service vm:100 add node2 started 1
+info    120    node1/crm: got lock 'ha_manager_lock'
+info    120    node1/crm: status change wait_for_quorum => master
+info    120    node1/crm: using scheduler mode 'static'
+info    120    node1/crm: node 'node1': state changed from 'unknown' => 'online'
+info    120    node1/crm: node 'node2': state changed from 'unknown' => 'online'
+info    120    node1/crm: node 'node3': state changed from 'unknown' => 'online'
+info    120    node1/crm: adding new service 'vm:100' on node 'node2'
+info    120    node1/crm: service vm:100: re-balance selected new node node1 for startup
+info    120    node1/crm: service 'vm:100': state changed from 'request_start' to 'request_start_balance'  (node = node2, target = node1)
+info    122    node2/crm: status change wait_for_quorum => slave
+info    123    node2/lrm: got lock 'ha_agent_node2_lock'
+info    123    node2/lrm: status change wait_for_agent_lock => active
+info    123    node2/lrm: ignoring rebalance-on-start for service vm:100 - already running
+info    124    node3/crm: status change wait_for_quorum => slave
+info    140    node1/crm: service 'vm:100' - rebalance-on-start request ignored - service already running
+info    140    node1/crm: service 'vm:100': state changed from 'request_start_balance' to 'started'  (node = node2)
+info    220      cmdlist: execute service vm:101 add node2 started 0
+info    220    node1/crm: adding new service 'vm:101' on node 'node2'
+info    220    node1/crm: service vm:101: re-balance selected new node node1 for startup
+info    220    node1/crm: service 'vm:101': state changed from 'request_start' to 'request_start_balance'  (node = node2, target = node1)
+info    223    node2/lrm: service vm:101 - start relocate to node 'node1'
+info    223    node2/lrm: service vm:101 - end relocate to node 'node1'
+info    240    node1/crm: service 'vm:101': state changed from 'request_start_balance' to 'started'  (node = node1)
+info    241    node1/lrm: got lock 'ha_agent_node1_lock'
+info    241    node1/lrm: status change wait_for_agent_lock => active
+info    241    node1/lrm: starting service vm:101
+info    241    node1/lrm: service status vm:101 started
+info    320      cmdlist: execute service vm:102 add node2 started 1
+info    320    node1/crm: adding new service 'vm:102' on node 'node2'
+info    320    node1/crm: service vm:102: re-balance selected new node node3 for startup
+info    320    node1/crm: service 'vm:102': state changed from 'request_start' to 'request_start_balance'  (node = node2, target = node3)
+info    323    node2/lrm: ignoring rebalance-on-start for service vm:102 - already running
+info    340    node1/crm: service 'vm:102' - rebalance-on-start request ignored - service already running
+info    340    node1/crm: service 'vm:102': state changed from 'request_start_balance' to 'started'  (node = node2)
+info    420      cmdlist: execute service vm:103 add node2 started 0
+info    420    node1/crm: adding new service 'vm:103' on node 'node2'
+info    420    node1/crm: service vm:103: re-balance selected new node node3 for startup
+info    420    node1/crm: service 'vm:103': state changed from 'request_start' to 'request_start_balance'  (node = node2, target = node3)
+info    423    node2/lrm: service vm:103 - start relocate to node 'node3'
+info    423    node2/lrm: service vm:103 - end relocate to node 'node3'
+info    440    node1/crm: service 'vm:103': state changed from 'request_start_balance' to 'started'  (node = node3)
+info    445    node3/lrm: got lock 'ha_agent_node3_lock'
+info    445    node3/lrm: status change wait_for_agent_lock => active
+info    445    node3/lrm: starting service vm:103
+info    445    node3/lrm: service status vm:103 started
+info    520      cmdlist: execute service vm:104 add node2 stopped 0
+info    520    node1/crm: adding new service 'vm:104' on node 'node2'
+info    540    node1/crm: service 'vm:104': state changed from 'request_stop' to 'stopped'
+info    620      cmdlist: execute service vm:105 add node2 stopped 0
+info    620    node1/crm: adding new service 'vm:105' on node 'node2'
+info    640    node1/crm: service 'vm:105': state changed from 'request_stop' to 'stopped'
+info   1220     hardware: exit simulation - done
diff --git a/src/test/test-crs-static-rebalance2/manager_status b/src/test/test-crs-static-rebalance2/manager_status
new file mode 100644
index 0000000..9e26dfe
--- /dev/null
+++ b/src/test/test-crs-static-rebalance2/manager_status
@@ -0,0 +1 @@
+{}
\ No newline at end of file
diff --git a/src/test/test-crs-static-rebalance2/service_config b/src/test/test-crs-static-rebalance2/service_config
new file mode 100644
index 0000000..0967ef4
--- /dev/null
+++ b/src/test/test-crs-static-rebalance2/service_config
@@ -0,0 +1 @@
+{}
diff --git a/src/test/test-crs-static-rebalance2/static_service_stats b/src/test/test-crs-static-rebalance2/static_service_stats
new file mode 100644
index 0000000..0967ef4
--- /dev/null
+++ b/src/test/test-crs-static-rebalance2/static_service_stats
@@ -0,0 +1 @@
+{}
-- 
2.30.2





