[pve-devel] [PATCH ha-manager v3 3/3] relocate policy: try to avoid already failed nodes

Thomas Lamprecht t.lamprecht at proxmox.com
Fri Jun 17 17:11:16 CEST 2016


If the failure policy triggered more than twice, we reused an already
tried node even if there were other, untried nodes. If the active
service count did not change, we then cycled between those two nodes.

This makes little sense: if the service failed to start on a node a
short time ago, it will probably fail there again now (e.g., because
the storage is offline), whereas an untried node may well be able to
start the service, which is our goal.

Fix that by excluding the already tried nodes from the top priority
node list in 'select_service_node' if other possible nodes are left to
try. If none are left, we delete only the current (last tried) node,
so that the service moves to another node even if that one was already
tried (we want to fulfill the relocation policy, after all).
This is not ideal, but as our default relocation try value is 1, it
only happens if a user explicitly sets a higher value.
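The rule above can be illustrated with a small, self-contained Perl
sketch. It assumes the top priority group is a hash ref keyed by node
name and that the tried nodes are an array ref in the order they were
tried. `filter_candidates` and `pick_node` are hypothetical helpers, not
functions of PVE::HA::Manager; `pick_node` simply takes the least used
node (ties broken by name), a simplification of the real selection, and
the sketch counts each tried node only once.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the candidate nodes left after applying the
# "avoid already tried nodes" rule. $candidates is a hash ref of the top
# priority nodes, $tried_nodes an array ref of the nodes tried so far.
sub filter_candidates {
    my ($candidates, $tried_nodes, $current_node, $try_next) = @_;

    my %nodes = %$candidates;     # copy, leave the caller's group untouched
    return \%nodes if !$try_next; # e.g. when recovering a fenced service

    # count how many distinct candidates were already tried
    my %seen;
    my $tried = grep { $nodes{$_} && !$seen{$_}++ } @$tried_nodes;

    if ($tried < scalar(keys %nodes)) {
        # at least one untried node is left, so drop every tried one
        delete $nodes{$_} for @$tried_nodes;
    } else {
        # all candidates were tried, only drop the node that just failed
        delete $nodes{$current_node};
    }

    return \%nodes;
}

# Hypothetical, simplified selection: least used node first, ties by name.
sub pick_node {
    my ($candidates, $usage) = @_;
    my @nodes = sort {
        ($usage->{$a} // 0) <=> ($usage->{$b} // 0) || $a cmp $b
    } keys %$candidates;
    return $nodes[0];
}

# Replay the scenario of test-relocate-policy1: node1 runs three services,
# node2 two and node3 one; the service first fails on node3.
my $group = { node1 => 1, node2 => 1, node3 => 1 };
my $usage = { node1 => 3, node2 => 2, node3 => 1 };
my @tried = ('node3');
my @picked;

for (1 .. 3) {
    my $node = pick_node(filter_candidates($group, \@tried, $tried[-1], 1), $usage);
    push @picked, $node;
    push @tried, $node; # assume the start fails on that node as well
}

print join(', ', @picked), "\n"; # node2, node1, node3
```

Replayed with the numbers from test-relocate-policy1, the sketch picks
the same relocation targets as that test's expected log.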

select_service_node gets called in two places:
* next_state_started: here we want to use this behaviour
* recover_fenced_service: here try_next is always false, as we just
  want to select a node to recover to; if a relocation policy is then
  needed, it is the duty of next_state_started to apply it.
So it is safe to do this without changing any behaviour except the
one described above.

If all tries fail, we place the service in the error state. The tried
nodes entry is only cleaned up after a user triggers an error recovery
by disabling the service, so the information about the tried nodes
stays in the manager status until then.
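The bookkeeping described above can be modelled roughly as follows.
This is a hypothetical sketch, not the manager's actual code: it assumes
a per-service array of tried nodes kept in a hash named after the
'relocate_tried_nodes' key seen in the test manager_status, that a
service enters the error state once more than 'max_relocate' nodes were
tried without success, and that the entry is only dropped when the
service is disabled. The threshold is inferred from the expected logs:
with 'max_relocate' 2 the service fails on three nodes and ends in the
error state, while with 'max_relocate' 3 a fourth start is attempted.

```perl
use strict;
use warnings;

# Hypothetical model of the per-service "tried nodes" bookkeeping.
my %relocate_tried_nodes; # service id => [ nodes tried, in order ]

# Record a failed start; returns 'relocate' or 'error'.
sub record_failed_start {
    my ($sid, $node, $max_relocate) = @_;

    my $tried = $relocate_tried_nodes{$sid} //= [];
    push @$tried, $node;

    # one initial start plus $max_relocate relocations are allowed
    return scalar(@$tried) > $max_relocate ? 'error' : 'relocate';
}

# Error recovery: disabling the service forgets its tried nodes.
sub disable_service {
    my ($sid) = @_;
    delete $relocate_tried_nodes{$sid};
}

# Replay test-relocate-policy-default-group: max_relocate = 2 and the
# service fails on node2, node1 and node3 in turn.
my @results = map { record_failed_start('fa:130', $_, 2) } qw(node2 node1 node3);
print "@results\n"; # relocate relocate error
print join(', ', @{ $relocate_tried_nodes{'fa:130'} }), "\n"; # node2, node1, node3

disable_service('fa:130');
my $state = exists $relocate_tried_nodes{'fa:130'} ? 'kept' : 'cleared';
print "$state\n"; # cleared
```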

Signed-off-by: Thomas Lamprecht <t.lamprecht at proxmox.com>
---

changes since v2:
* add and adapt some comments and commit message

 src/PVE/HA/Manager.pm                              | 26 ++++++++-
 src/test/test-relocate-policy-default-group/README |  7 +++
 .../test-relocate-policy-default-group/cmdlist     |  4 ++
 .../hardware_status                                |  5 ++
 .../test-relocate-policy-default-group/log.expect  | 53 +++++++++++++++++
 .../manager_status                                 |  1 +
 .../service_config                                 |  3 +
 src/test/test-relocate-policy1/README              |  4 ++
 src/test/test-relocate-policy1/cmdlist             |  4 ++
 src/test/test-relocate-policy1/hardware_status     |  5 ++
 src/test/test-relocate-policy1/log.expect          | 68 ++++++++++++++++++++++
 src/test/test-relocate-policy1/manager_status      | 42 +++++++++++++
 src/test/test-relocate-policy1/service_config      |  9 +++
 src/test/test-resource-failure6/log.expect         | 59 +++++++++++++++++++
 14 files changed, 287 insertions(+), 3 deletions(-)
 create mode 100644 src/test/test-relocate-policy-default-group/README
 create mode 100644 src/test/test-relocate-policy-default-group/cmdlist
 create mode 100644 src/test/test-relocate-policy-default-group/hardware_status
 create mode 100644 src/test/test-relocate-policy-default-group/log.expect
 create mode 100644 src/test/test-relocate-policy-default-group/manager_status
 create mode 100644 src/test/test-relocate-policy-default-group/service_config
 create mode 100644 src/test/test-relocate-policy1/README
 create mode 100644 src/test/test-relocate-policy1/cmdlist
 create mode 100644 src/test/test-relocate-policy1/hardware_status
 create mode 100644 src/test/test-relocate-policy1/log.expect
 create mode 100644 src/test/test-relocate-policy1/manager_status
 create mode 100644 src/test/test-relocate-policy1/service_config
 create mode 100644 src/test/test-resource-failure6/log.expect

diff --git a/src/PVE/HA/Manager.pm b/src/PVE/HA/Manager.pm
index c9e53a0..7107680 100644
--- a/src/PVE/HA/Manager.pm
+++ b/src/PVE/HA/Manager.pm
@@ -71,7 +71,7 @@ sub flush_master_status {
 } 
 
 sub select_service_node {
-    my ($groups, $online_node_usage, $service_conf, $current_node, $try_next) = @_;
+    my ($groups, $online_node_usage, $service_conf, $current_node, $try_next, $tried_nodes) = @_;
 
     my $group = {};
     # add all online nodes to default group to allow try_next when no group set
@@ -119,6 +119,26 @@ sub select_service_node {
 
     my $top_pri = $pri_list[0];
 
+    # try to avoid nodes where the service failed already if we want to relocate
+    if ($try_next) {
+	# first check if we have any untried node left
+	my $i = 0;
+	foreach my $node (@$tried_nodes) {
+	    $i++ if $pri_groups->{$top_pri}->{$node};
+	}
+
+	if ($i < scalar(keys %{$pri_groups->{$top_pri}})){
+	    # we have another one left so delete the tried ones
+	    foreach my $node (@$tried_nodes) {
+		delete $pri_groups->{$top_pri}->{$node};
+	    }
+	} else {
+	    # no untried (and active) node left, so delete the current node,
+	    # we want to try another one after all
+	    delete $pri_groups->{$top_pri}->{$current_node};
+	}
+    }
+
     my @nodes = sort { 
 	$online_node_usage->{$a} <=> $online_node_usage->{$b} || $a cmp $b
     } keys %{$pri_groups->{$top_pri}};
@@ -652,8 +672,8 @@ sub next_state_started {
 		}
 	    }
 
-	    my $node = select_service_node($self->{groups}, $self->{online_node_usage}, 
-					   $cd, $sd->{node}, $try_next);
+	    my $node = select_service_node($self->{groups}, $self->{online_node_usage},
+					   $cd, $sd->{node}, $try_next, $tried_nodes);
 
 	    if ($node && ($sd->{node} ne $node)) {
 		if ($cd->{type} eq 'vm') {
diff --git a/src/test/test-relocate-policy-default-group/README b/src/test/test-relocate-policy-default-group/README
new file mode 100644
index 0000000..18ee13a
--- /dev/null
+++ b/src/test/test-relocate-policy-default-group/README
@@ -0,0 +1,7 @@
+Test relocate policy on services with no group.
+Service 'fa:130' fails to start three times and has a 'max_restart' policy
+of 0, thus it will be relocated after each failed start.
+As it has no group configured, all available nodes should be considered
+when relocating.
+As we allow two relocations but the service fails three times, we place
+it in the error state after all tries were used and all nodes were visited.
diff --git a/src/test/test-relocate-policy-default-group/cmdlist b/src/test/test-relocate-policy-default-group/cmdlist
new file mode 100644
index 0000000..8f06508
--- /dev/null
+++ b/src/test/test-relocate-policy-default-group/cmdlist
@@ -0,0 +1,4 @@
+[
+    [ "power node1 on", "power node2 on", "power node3 on"],
+    [ "service fa:130 enabled" ]
+]
diff --git a/src/test/test-relocate-policy-default-group/hardware_status b/src/test/test-relocate-policy-default-group/hardware_status
new file mode 100644
index 0000000..451beb1
--- /dev/null
+++ b/src/test/test-relocate-policy-default-group/hardware_status
@@ -0,0 +1,5 @@
+{
+  "node1": { "power": "off", "network": "off" },
+  "node2": { "power": "off", "network": "off" },
+  "node3": { "power": "off", "network": "off" }
+}
diff --git a/src/test/test-relocate-policy-default-group/log.expect b/src/test/test-relocate-policy-default-group/log.expect
new file mode 100644
index 0000000..a1e6795
--- /dev/null
+++ b/src/test/test-relocate-policy-default-group/log.expect
@@ -0,0 +1,53 @@
+info      0     hardware: starting simulation
+info     20      cmdlist: execute power node1 on
+info     20    node1/crm: status change startup => wait_for_quorum
+info     20    node1/lrm: status change startup => wait_for_agent_lock
+info     20      cmdlist: execute power node2 on
+info     20    node2/crm: status change startup => wait_for_quorum
+info     20    node2/lrm: status change startup => wait_for_agent_lock
+info     20      cmdlist: execute power node3 on
+info     20    node3/crm: status change startup => wait_for_quorum
+info     20    node3/lrm: status change startup => wait_for_agent_lock
+info     20    node1/crm: got lock 'ha_manager_lock'
+info     20    node1/crm: status change wait_for_quorum => master
+info     20    node1/crm: node 'node1': state changed from 'unknown' => 'online'
+info     20    node1/crm: node 'node2': state changed from 'unknown' => 'online'
+info     20    node1/crm: node 'node3': state changed from 'unknown' => 'online'
+info     20    node1/crm: adding new service 'fa:130' on node 'node2'
+info     20    node1/crm: service 'fa:130': state changed from 'started' to 'request_stop'
+info     22    node2/crm: status change wait_for_quorum => slave
+info     23    node2/lrm: got lock 'ha_agent_node2_lock'
+info     23    node2/lrm: status change wait_for_agent_lock => active
+info     24    node3/crm: status change wait_for_quorum => slave
+info     40    node1/crm: service 'fa:130': state changed from 'request_stop' to 'stopped'
+info    120      cmdlist: execute service fa:130 enabled
+info    120    node1/crm: service 'fa:130': state changed from 'stopped' to 'started'  (node = node2)
+info    123    node2/lrm: starting service fa:130
+warn    123    node2/lrm: unable to start service fa:130
+err     123    node2/lrm: unable to start service fa:130 on local node after 0 retries
+warn    140    node1/crm: starting service fa:130 on node 'node2' failed, relocating service.
+info    140    node1/crm: relocate service 'fa:130' to node 'node1'
+info    140    node1/crm: service 'fa:130': state changed from 'started' to 'relocate'  (node = node2, target = node1)
+info    143    node2/lrm: service fa:130 - start relocate to node 'node1'
+info    143    node2/lrm: service fa:130 - end relocate to node 'node1'
+info    160    node1/crm: service 'fa:130': state changed from 'relocate' to 'started'  (node = node1)
+info    161    node1/lrm: got lock 'ha_agent_node1_lock'
+info    161    node1/lrm: status change wait_for_agent_lock => active
+info    161    node1/lrm: starting service fa:130
+warn    161    node1/lrm: unable to start service fa:130
+err     161    node1/lrm: unable to start service fa:130 on local node after 0 retries
+warn    180    node1/crm: starting service fa:130 on node 'node1' failed, relocating service.
+info    180    node1/crm: relocate service 'fa:130' to node 'node3'
+info    180    node1/crm: service 'fa:130': state changed from 'started' to 'relocate'  (node = node1, target = node3)
+info    181    node1/lrm: service fa:130 - start relocate to node 'node3'
+info    181    node1/lrm: service fa:130 - end relocate to node 'node3'
+info    200    node1/crm: service 'fa:130': state changed from 'relocate' to 'started'  (node = node3)
+info    205    node3/lrm: got lock 'ha_agent_node3_lock'
+info    205    node3/lrm: status change wait_for_agent_lock => active
+info    205    node3/lrm: starting service fa:130
+warn    205    node3/lrm: unable to start service fa:130
+err     205    node3/lrm: unable to start service fa:130 on local node after 0 retries
+err     220    node1/crm: recovery policy for service fa:130 failed, entering error state. Tried nodes: node2, node1, node3
+info    220    node1/crm: service 'fa:130': state changed from 'started' to 'error'
+err     225    node3/lrm: service fa:130 is in an error state and needs manual intervention. Look up 'ERROR RECOVERY' in the documentation.
+info    720     hardware: exit simulation - done
diff --git a/src/test/test-relocate-policy-default-group/manager_status b/src/test/test-relocate-policy-default-group/manager_status
new file mode 100644
index 0000000..0967ef4
--- /dev/null
+++ b/src/test/test-relocate-policy-default-group/manager_status
@@ -0,0 +1 @@
+{}
diff --git a/src/test/test-relocate-policy-default-group/service_config b/src/test/test-relocate-policy-default-group/service_config
new file mode 100644
index 0000000..c3cc873
--- /dev/null
+++ b/src/test/test-relocate-policy-default-group/service_config
@@ -0,0 +1,3 @@
+{
+    "fa:130": { "node": "node2", "max_restart": "0", "max_relocate": "2"  }
+}
diff --git a/src/test/test-relocate-policy1/README b/src/test/test-relocate-policy1/README
new file mode 100644
index 0000000..f0f12fd
--- /dev/null
+++ b/src/test/test-relocate-policy1/README
@@ -0,0 +1,4 @@
+Test that the relocate policy selects the least populated node, in addition
+to only considering nodes which weren't tried yet.
+As node1 has the most services, it should be selected last even though its
+name sorts before the others.
diff --git a/src/test/test-relocate-policy1/cmdlist b/src/test/test-relocate-policy1/cmdlist
new file mode 100644
index 0000000..d253427
--- /dev/null
+++ b/src/test/test-relocate-policy1/cmdlist
@@ -0,0 +1,4 @@
+[
+    [ "power node1 on", "power node2 on", "power node3 on" ],
+    [ "service fa:130 enabled" ]
+]
diff --git a/src/test/test-relocate-policy1/hardware_status b/src/test/test-relocate-policy1/hardware_status
new file mode 100644
index 0000000..451beb1
--- /dev/null
+++ b/src/test/test-relocate-policy1/hardware_status
@@ -0,0 +1,5 @@
+{
+  "node1": { "power": "off", "network": "off" },
+  "node2": { "power": "off", "network": "off" },
+  "node3": { "power": "off", "network": "off" }
+}
diff --git a/src/test/test-relocate-policy1/log.expect b/src/test/test-relocate-policy1/log.expect
new file mode 100644
index 0000000..d53b4f4
--- /dev/null
+++ b/src/test/test-relocate-policy1/log.expect
@@ -0,0 +1,68 @@
+info      0     hardware: starting simulation
+info     20      cmdlist: execute power node1 on
+info     20    node1/crm: status change startup => wait_for_quorum
+info     20    node1/lrm: status change startup => wait_for_agent_lock
+info     20      cmdlist: execute power node2 on
+info     20    node2/crm: status change startup => wait_for_quorum
+info     20    node2/lrm: status change startup => wait_for_agent_lock
+info     20      cmdlist: execute power node3 on
+info     20    node3/crm: status change startup => wait_for_quorum
+info     20    node3/lrm: status change startup => wait_for_agent_lock
+info     20    node1/crm: got lock 'ha_manager_lock'
+info     20    node1/crm: status change wait_for_quorum => master
+info     20    node1/crm: adding new service 'fa:130' on node 'node3'
+info     20    node1/crm: service 'fa:130': state changed from 'started' to 'request_stop'
+info     21    node1/lrm: got lock 'ha_agent_node1_lock'
+info     21    node1/lrm: status change wait_for_agent_lock => active
+info     21    node1/lrm: starting service vm:100
+info     21    node1/lrm: service status vm:100 started
+info     21    node1/lrm: starting service vm:101
+info     21    node1/lrm: service status vm:101 started
+info     21    node1/lrm: starting service vm:102
+info     21    node1/lrm: service status vm:102 started
+info     22    node2/crm: status change wait_for_quorum => slave
+info     23    node2/lrm: got lock 'ha_agent_node2_lock'
+info     23    node2/lrm: status change wait_for_agent_lock => active
+info     23    node2/lrm: starting service vm:103
+info     23    node2/lrm: service status vm:103 started
+info     23    node2/lrm: starting service vm:104
+info     23    node2/lrm: service status vm:104 started
+info     24    node3/crm: status change wait_for_quorum => slave
+info     25    node3/lrm: got lock 'ha_agent_node3_lock'
+info     25    node3/lrm: status change wait_for_agent_lock => active
+info     25    node3/lrm: starting service vm:105
+info     25    node3/lrm: service status vm:105 started
+info     40    node1/crm: service 'fa:130': state changed from 'request_stop' to 'stopped'
+info    120      cmdlist: execute service fa:130 enabled
+info    120    node1/crm: service 'fa:130': state changed from 'stopped' to 'started'  (node = node3)
+info    125    node3/lrm: starting service fa:130
+warn    125    node3/lrm: unable to start service fa:130
+err     125    node3/lrm: unable to start service fa:130 on local node after 0 retries
+warn    140    node1/crm: starting service fa:130 on node 'node3' failed, relocating service.
+info    140    node1/crm: relocate service 'fa:130' to node 'node2'
+info    140    node1/crm: service 'fa:130': state changed from 'started' to 'relocate'  (node = node3, target = node2)
+info    145    node3/lrm: service fa:130 - start relocate to node 'node2'
+info    145    node3/lrm: service fa:130 - end relocate to node 'node2'
+info    160    node1/crm: service 'fa:130': state changed from 'relocate' to 'started'  (node = node2)
+info    163    node2/lrm: starting service fa:130
+warn    163    node2/lrm: unable to start service fa:130
+err     163    node2/lrm: unable to start service fa:130 on local node after 0 retries
+warn    180    node1/crm: starting service fa:130 on node 'node2' failed, relocating service.
+info    180    node1/crm: relocate service 'fa:130' to node 'node1'
+info    180    node1/crm: service 'fa:130': state changed from 'started' to 'relocate'  (node = node2, target = node1)
+info    183    node2/lrm: service fa:130 - start relocate to node 'node1'
+info    183    node2/lrm: service fa:130 - end relocate to node 'node1'
+info    200    node1/crm: service 'fa:130': state changed from 'relocate' to 'started'  (node = node1)
+info    201    node1/lrm: starting service fa:130
+warn    201    node1/lrm: unable to start service fa:130
+err     201    node1/lrm: unable to start service fa:130 on local node after 0 retries
+warn    220    node1/crm: starting service fa:130 on node 'node1' failed, relocating service.
+info    220    node1/crm: relocate service 'fa:130' to node 'node3'
+info    220    node1/crm: service 'fa:130': state changed from 'started' to 'relocate'  (node = node1, target = node3)
+info    221    node1/lrm: service fa:130 - start relocate to node 'node3'
+info    221    node1/lrm: service fa:130 - end relocate to node 'node3'
+info    240    node1/crm: service 'fa:130': state changed from 'relocate' to 'started'  (node = node3)
+info    245    node3/lrm: starting service fa:130
+info    245    node3/lrm: service status fa:130 started
+info    260    node1/crm: relocation policy successful for 'fa:130', tried nodes: node3, node2, node1, node3
+info    720     hardware: exit simulation - done
diff --git a/src/test/test-relocate-policy1/manager_status b/src/test/test-relocate-policy1/manager_status
new file mode 100644
index 0000000..8cce913
--- /dev/null
+++ b/src/test/test-relocate-policy1/manager_status
@@ -0,0 +1,42 @@
+{
+    "master_node": "node1",
+    "node_status": {
+        "node1": "online",
+        "node2": "online",
+        "node3": "online"
+    },
+    "relocate_tried_nodes": {},
+    "service_status": {
+        "vm:100": {
+            "node": "node1",
+            "state": "started",
+            "uid": "hSIUPNL/lBjgyU4svobXlg"
+        },
+        "vm:101": {
+            "node": "node1",
+            "state": "started",
+            "uid": "vLuiMIZ5KBKzDZv2bkYLvA"
+        },
+        "vm:102": {
+            "node": "node1",
+            "state": "started",
+            "uid": "COPzO9cc+8Z3lUbWn8zCHA"
+        },
+        "vm:103": {
+            "node": "node2",
+            "state": "started",
+            "uid": "iktXhI6tCi8X6h8wQS9Uyw"
+        },
+        "vm:104": {
+            "node": "node2",
+            "state": "started",
+            "uid": "ySWup2on+tY88hdfzS1ymg"
+        },
+        "vm:105": {
+            "node": "node3",
+            "state": "started",
+            "uid": "RGRR9EOAzALG5cVMeWiKWA"
+        }
+    },
+    "timestamp": 10
+}
diff --git a/src/test/test-relocate-policy1/service_config b/src/test/test-relocate-policy1/service_config
new file mode 100644
index 0000000..d9f1823
--- /dev/null
+++ b/src/test/test-relocate-policy1/service_config
@@ -0,0 +1,9 @@
+{
+    "vm:100": { "node": "node1", "state": "enabled" },
+    "vm:101": { "node": "node1", "state": "enabled" },
+    "vm:102": { "node": "node1", "state": "enabled" },
+    "vm:103": { "node": "node2", "state": "enabled" },
+    "vm:104": { "node": "node2", "state": "enabled" },
+    "vm:105": { "node": "node3", "state": "enabled" },
+    "fa:130": { "node": "node3", "max_restart": "0", "max_relocate": "3"  }
+}
diff --git a/src/test/test-resource-failure6/log.expect b/src/test/test-resource-failure6/log.expect
new file mode 100644
index 0000000..8e06ec7
--- /dev/null
+++ b/src/test/test-resource-failure6/log.expect
@@ -0,0 +1,59 @@
+info      0     hardware: starting simulation
+info     20      cmdlist: execute power node1 on
+info     20    node1/crm: status change startup => wait_for_quorum
+info     20    node1/lrm: status change startup => wait_for_agent_lock
+info     20      cmdlist: execute power node2 on
+info     20    node2/crm: status change startup => wait_for_quorum
+info     20    node2/lrm: status change startup => wait_for_agent_lock
+info     20      cmdlist: execute power node3 on
+info     20    node3/crm: status change startup => wait_for_quorum
+info     20    node3/lrm: status change startup => wait_for_agent_lock
+info     20    node1/crm: got lock 'ha_manager_lock'
+info     20    node1/crm: status change wait_for_quorum => master
+info     20    node1/crm: node 'node1': state changed from 'unknown' => 'online'
+info     20    node1/crm: node 'node2': state changed from 'unknown' => 'online'
+info     20    node1/crm: node 'node3': state changed from 'unknown' => 'online'
+info     20    node1/crm: adding new service 'fa:130' on node 'node2'
+info     20    node1/crm: service 'fa:130': state changed from 'started' to 'request_stop'
+info     22    node2/crm: status change wait_for_quorum => slave
+info     23    node2/lrm: got lock 'ha_agent_node2_lock'
+info     23    node2/lrm: status change wait_for_agent_lock => active
+info     24    node3/crm: status change wait_for_quorum => slave
+info     40    node1/crm: service 'fa:130': state changed from 'request_stop' to 'stopped'
+info    120      cmdlist: execute service fa:130 enabled
+info    120    node1/crm: service 'fa:130': state changed from 'stopped' to 'started'  (node = node2)
+info    123    node2/lrm: starting service fa:130
+warn    123    node2/lrm: unable to start service fa:130
+err     123    node2/lrm: unable to start service fa:130 on local node after 0 retries
+warn    140    node1/crm: starting service fa:130 on node 'node2' failed, relocating service.
+info    140    node1/crm: relocate service 'fa:130' to node 'node1'
+info    140    node1/crm: service 'fa:130': state changed from 'started' to 'relocate'  (node = node2, target = node1)
+info    143    node2/lrm: service fa:130 - start relocate to node 'node1'
+info    143    node2/lrm: service fa:130 - end relocate to node 'node1'
+info    160    node1/crm: service 'fa:130': state changed from 'relocate' to 'started'  (node = node1)
+info    161    node1/lrm: got lock 'ha_agent_node1_lock'
+info    161    node1/lrm: status change wait_for_agent_lock => active
+info    161    node1/lrm: starting service fa:130
+warn    161    node1/lrm: unable to start service fa:130
+err     161    node1/lrm: unable to start service fa:130 on local node after 0 retries
+warn    180    node1/crm: starting service fa:130 on node 'node1' failed, relocating service.
+info    180    node1/crm: relocate service 'fa:130' to node 'node3'
+info    180    node1/crm: service 'fa:130': state changed from 'started' to 'relocate'  (node = node1, target = node3)
+info    181    node1/lrm: service fa:130 - start relocate to node 'node3'
+info    181    node1/lrm: service fa:130 - end relocate to node 'node3'
+info    200    node1/crm: service 'fa:130': state changed from 'relocate' to 'started'  (node = node3)
+info    205    node3/lrm: got lock 'ha_agent_node3_lock'
+info    205    node3/lrm: status change wait_for_agent_lock => active
+info    205    node3/lrm: starting service fa:130
+warn    205    node3/lrm: unable to start service fa:130
+err     205    node3/lrm: unable to start service fa:130 on local node after 0 retries
+warn    220    node1/crm: starting service fa:130 on node 'node3' failed, relocating service.
+info    220    node1/crm: relocate service 'fa:130' to node 'node1'
+info    220    node1/crm: service 'fa:130': state changed from 'started' to 'relocate'  (node = node3, target = node1)
+info    225    node3/lrm: service fa:130 - start relocate to node 'node1'
+info    225    node3/lrm: service fa:130 - end relocate to node 'node1'
+info    240    node1/crm: service 'fa:130': state changed from 'relocate' to 'started'  (node = node1)
+info    241    node1/lrm: starting service fa:130
+info    241    node1/lrm: service status fa:130 started
+info    260    node1/crm: relocation policy successful for 'fa:130', tried nodes: node2, node1, node3, node1
+info    720     hardware: exit simulation - done
-- 
2.1.4