[pve-devel] [PATCH ha-manager 11/15] test: ha tester: add test cases for strict negative colocation rules

Daniel Kral d.kral at proxmox.com
Tue Mar 25 16:12:50 CET 2025


Add test cases for strict negative colocation rules, i.e. where services
must be kept on separate nodes. These verify the behavior of the
services in strict negative colocation rules in case of a failover of
the node of one or more of these services in the following scenarios:

- 2 neg. colocated services in a 3 node cluster; 1 node failing
- 3 neg. colocated services in a 5 node cluster; 1 node failing
- 3 neg. colocated services in a 5 node cluster; 2 nodes failing
- 2 neg. colocated services in a 3 node cluster; 1 node failing, but the
  recovery node cannot start the service
- 2 pairs of neg. colocated services (with one service common to both
  pairs) in a 3 node cluster; 1 node failing

Signed-off-by: Daniel Kral <d.kral at proxmox.com>
---
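Note for reviewers: the node selection that the first three test cases
expect can be modeled roughly as below. This is only an illustrative Python
sketch, not the ha-manager code (which is written in Perl). The helper name
`pick_recovery_node`, its parameters, and the tie-break by node name are
assumptions made for this sketch. It assumes that nodes already running
another service of the same strict negative colocation rule are excluded
first, and that the node with the fewest services is picked from the rest.

```python
from collections import Counter


def pick_recovery_node(service, placement, online_nodes, separate_rules):
    """Pick a recovery node for `service` (hypothetical helper).

    placement:      dict of service id -> node it currently runs on;
                    services still awaiting recovery are left out
    online_nodes:   candidate node names
    separate_rules: list of sets of services that must not share a node
    Returns the chosen node, or None if every node is ruled out.
    """
    # Collect nodes that already host another service of a rule that
    # also contains `service`.
    blocked = set()
    for rule in separate_rules:
        if service in rule:
            blocked.update(
                placement[s] for s in rule if s != service and s in placement
            )

    candidates = [n for n in online_nodes if n not in blocked]
    if not candidates:
        return None

    # Assumption: prefer the fewest running services, then the lowest name.
    load = Counter(placement.values())
    return min(candidates, key=lambda n: (load[n], n))


if __name__ == "__main__":
    # test-colocation-strict-separate1 after node3 was fenced
    placement = {"vm:101": "node2", "vm:103": "node1", "vm:104": "node1"}
    rules = [{"vm:101", "vm:102"}]
    print(pick_recovery_node("vm:102", placement, ["node1", "node2"], rules))
    # prints "node1"
```

In the first scenario node2 is less utilized, but it is excluded because
vm:101 runs there, so vm:102 ends up on node1 as log.expect requires.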
 .../test-colocation-strict-separate1/README   |  13 +++
 .../test-colocation-strict-separate1/cmdlist  |   4 +
 .../hardware_status                           |   5 +
 .../log.expect                                |  60 ++++++++++
 .../manager_status                            |   1 +
 .../rules_config                              |   4 +
 .../service_config                            |   6 +
 .../test-colocation-strict-separate2/README   |  15 +++
 .../test-colocation-strict-separate2/cmdlist  |   4 +
 .../hardware_status                           |   7 ++
 .../log.expect                                |  90 ++++++++++++++
 .../manager_status                            |   1 +
 .../rules_config                              |   4 +
 .../service_config                            |  10 ++
 .../test-colocation-strict-separate3/README   |  16 +++
 .../test-colocation-strict-separate3/cmdlist  |   4 +
 .../hardware_status                           |   7 ++
 .../log.expect                                | 110 ++++++++++++++++++
 .../manager_status                            |   1 +
 .../rules_config                              |   4 +
 .../service_config                            |  10 ++
 .../test-colocation-strict-separate4/README   |  17 +++
 .../test-colocation-strict-separate4/cmdlist  |   4 +
 .../hardware_status                           |   5 +
 .../log.expect                                |  69 +++++++++++
 .../manager_status                            |   1 +
 .../rules_config                              |   4 +
 .../service_config                            |   6 +
 .../test-colocation-strict-separate5/README   |  11 ++
 .../test-colocation-strict-separate5/cmdlist  |   4 +
 .../hardware_status                           |   5 +
 .../log.expect                                |  56 +++++++++
 .../manager_status                            |   1 +
 .../rules_config                              |   9 ++
 .../service_config                            |   5 +
 35 files changed, 573 insertions(+)
 create mode 100644 src/test/test-colocation-strict-separate1/README
 create mode 100644 src/test/test-colocation-strict-separate1/cmdlist
 create mode 100644 src/test/test-colocation-strict-separate1/hardware_status
 create mode 100644 src/test/test-colocation-strict-separate1/log.expect
 create mode 100644 src/test/test-colocation-strict-separate1/manager_status
 create mode 100644 src/test/test-colocation-strict-separate1/rules_config
 create mode 100644 src/test/test-colocation-strict-separate1/service_config
 create mode 100644 src/test/test-colocation-strict-separate2/README
 create mode 100644 src/test/test-colocation-strict-separate2/cmdlist
 create mode 100644 src/test/test-colocation-strict-separate2/hardware_status
 create mode 100644 src/test/test-colocation-strict-separate2/log.expect
 create mode 100644 src/test/test-colocation-strict-separate2/manager_status
 create mode 100644 src/test/test-colocation-strict-separate2/rules_config
 create mode 100644 src/test/test-colocation-strict-separate2/service_config
 create mode 100644 src/test/test-colocation-strict-separate3/README
 create mode 100644 src/test/test-colocation-strict-separate3/cmdlist
 create mode 100644 src/test/test-colocation-strict-separate3/hardware_status
 create mode 100644 src/test/test-colocation-strict-separate3/log.expect
 create mode 100644 src/test/test-colocation-strict-separate3/manager_status
 create mode 100644 src/test/test-colocation-strict-separate3/rules_config
 create mode 100644 src/test/test-colocation-strict-separate3/service_config
 create mode 100644 src/test/test-colocation-strict-separate4/README
 create mode 100644 src/test/test-colocation-strict-separate4/cmdlist
 create mode 100644 src/test/test-colocation-strict-separate4/hardware_status
 create mode 100644 src/test/test-colocation-strict-separate4/log.expect
 create mode 100644 src/test/test-colocation-strict-separate4/manager_status
 create mode 100644 src/test/test-colocation-strict-separate4/rules_config
 create mode 100644 src/test/test-colocation-strict-separate4/service_config
 create mode 100644 src/test/test-colocation-strict-separate5/README
 create mode 100644 src/test/test-colocation-strict-separate5/cmdlist
 create mode 100644 src/test/test-colocation-strict-separate5/hardware_status
 create mode 100644 src/test/test-colocation-strict-separate5/log.expect
 create mode 100644 src/test/test-colocation-strict-separate5/manager_status
 create mode 100644 src/test/test-colocation-strict-separate5/rules_config
 create mode 100644 src/test/test-colocation-strict-separate5/service_config

diff --git a/src/test/test-colocation-strict-separate1/README b/src/test/test-colocation-strict-separate1/README
new file mode 100644
index 0000000..5a03d99
--- /dev/null
+++ b/src/test/test-colocation-strict-separate1/README
@@ -0,0 +1,13 @@
+Test whether a strict negative colocation rule among two services makes one of
+the services migrate to a different recovery node than the other in case of a
+failover of their previously assigned node.
+
+The test scenario is:
+- vm:101 and vm:102 must be kept separate
+- vm:101 and vm:102 are currently running on node2 and node3 respectively
+- node1 has a higher service count than node2 to test that the colocation rule
+  is applied even though the scheduler would prefer the less utilized node
+
+Therefore, the expected outcome is:
+- As node3 fails, vm:102 is migrated to node1; even though the utilization of
+  node1 is high already, the services must be kept separate
diff --git a/src/test/test-colocation-strict-separate1/cmdlist b/src/test/test-colocation-strict-separate1/cmdlist
new file mode 100644
index 0000000..c0a4daa
--- /dev/null
+++ b/src/test/test-colocation-strict-separate1/cmdlist
@@ -0,0 +1,4 @@
+[
+    [ "power node1 on", "power node2 on", "power node3 on" ],
+    [ "network node3 off" ]
+]
diff --git a/src/test/test-colocation-strict-separate1/hardware_status b/src/test/test-colocation-strict-separate1/hardware_status
new file mode 100644
index 0000000..451beb1
--- /dev/null
+++ b/src/test/test-colocation-strict-separate1/hardware_status
@@ -0,0 +1,5 @@
+{
+  "node1": { "power": "off", "network": "off" },
+  "node2": { "power": "off", "network": "off" },
+  "node3": { "power": "off", "network": "off" }
+}
diff --git a/src/test/test-colocation-strict-separate1/log.expect b/src/test/test-colocation-strict-separate1/log.expect
new file mode 100644
index 0000000..475db39
--- /dev/null
+++ b/src/test/test-colocation-strict-separate1/log.expect
@@ -0,0 +1,60 @@
+info      0     hardware: starting simulation
+info     20      cmdlist: execute power node1 on
+info     20    node1/crm: status change startup => wait_for_quorum
+info     20    node1/lrm: status change startup => wait_for_agent_lock
+info     20      cmdlist: execute power node2 on
+info     20    node2/crm: status change startup => wait_for_quorum
+info     20    node2/lrm: status change startup => wait_for_agent_lock
+info     20      cmdlist: execute power node3 on
+info     20    node3/crm: status change startup => wait_for_quorum
+info     20    node3/lrm: status change startup => wait_for_agent_lock
+info     20    node1/crm: got lock 'ha_manager_lock'
+info     20    node1/crm: status change wait_for_quorum => master
+info     20    node1/crm: node 'node1': state changed from 'unknown' => 'online'
+info     20    node1/crm: node 'node2': state changed from 'unknown' => 'online'
+info     20    node1/crm: node 'node3': state changed from 'unknown' => 'online'
+info     20    node1/crm: adding new service 'vm:101' on node 'node2'
+info     20    node1/crm: adding new service 'vm:102' on node 'node3'
+info     20    node1/crm: adding new service 'vm:103' on node 'node1'
+info     20    node1/crm: adding new service 'vm:104' on node 'node1'
+info     20    node1/crm: service 'vm:101': state changed from 'request_start' to 'started'  (node = node2)
+info     20    node1/crm: service 'vm:102': state changed from 'request_start' to 'started'  (node = node3)
+info     20    node1/crm: service 'vm:103': state changed from 'request_start' to 'started'  (node = node1)
+info     20    node1/crm: service 'vm:104': state changed from 'request_start' to 'started'  (node = node1)
+info     21    node1/lrm: got lock 'ha_agent_node1_lock'
+info     21    node1/lrm: status change wait_for_agent_lock => active
+info     21    node1/lrm: starting service vm:103
+info     21    node1/lrm: service status vm:103 started
+info     21    node1/lrm: starting service vm:104
+info     21    node1/lrm: service status vm:104 started
+info     22    node2/crm: status change wait_for_quorum => slave
+info     23    node2/lrm: got lock 'ha_agent_node2_lock'
+info     23    node2/lrm: status change wait_for_agent_lock => active
+info     23    node2/lrm: starting service vm:101
+info     23    node2/lrm: service status vm:101 started
+info     24    node3/crm: status change wait_for_quorum => slave
+info     25    node3/lrm: got lock 'ha_agent_node3_lock'
+info     25    node3/lrm: status change wait_for_agent_lock => active
+info     25    node3/lrm: starting service vm:102
+info     25    node3/lrm: service status vm:102 started
+info    120      cmdlist: execute network node3 off
+info    120    node1/crm: node 'node3': state changed from 'online' => 'unknown'
+info    124    node3/crm: status change slave => wait_for_quorum
+info    125    node3/lrm: status change active => lost_agent_lock
+info    160    node1/crm: service 'vm:102': state changed from 'started' to 'fence'
+info    160    node1/crm: node 'node3': state changed from 'unknown' => 'fence'
+emai    160    node1/crm: FENCE: Try to fence node 'node3'
+info    166     watchdog: execute power node3 off
+info    165    node3/crm: killed by poweroff
+info    166    node3/lrm: killed by poweroff
+info    166     hardware: server 'node3' stopped by poweroff (watchdog)
+info    240    node1/crm: got lock 'ha_agent_node3_lock'
+info    240    node1/crm: fencing: acknowledged - got agent lock for node 'node3'
+info    240    node1/crm: node 'node3': state changed from 'fence' => 'unknown'
+emai    240    node1/crm: SUCCEED: fencing: acknowledged - got agent lock for node 'node3'
+info    240    node1/crm: service 'vm:102': state changed from 'fence' to 'recovery'
+info    240    node1/crm: recover service 'vm:102' from fenced node 'node3' to node 'node1'
+info    240    node1/crm: service 'vm:102': state changed from 'recovery' to 'started'  (node = node1)
+info    241    node1/lrm: starting service vm:102
+info    241    node1/lrm: service status vm:102 started
+info    720     hardware: exit simulation - done
diff --git a/src/test/test-colocation-strict-separate1/manager_status b/src/test/test-colocation-strict-separate1/manager_status
new file mode 100644
index 0000000..0967ef4
--- /dev/null
+++ b/src/test/test-colocation-strict-separate1/manager_status
@@ -0,0 +1 @@
+{}
diff --git a/src/test/test-colocation-strict-separate1/rules_config b/src/test/test-colocation-strict-separate1/rules_config
new file mode 100644
index 0000000..21c5608
--- /dev/null
+++ b/src/test/test-colocation-strict-separate1/rules_config
@@ -0,0 +1,4 @@
+colocation: lonely-must-vms-be
+    services vm:101,vm:102
+    affinity separate
+    strict 1
diff --git a/src/test/test-colocation-strict-separate1/service_config b/src/test/test-colocation-strict-separate1/service_config
new file mode 100644
index 0000000..6582e8c
--- /dev/null
+++ b/src/test/test-colocation-strict-separate1/service_config
@@ -0,0 +1,6 @@
+{
+    "vm:101": { "node": "node2", "state": "started" },
+    "vm:102": { "node": "node3", "state": "started" },
+    "vm:103": { "node": "node1", "state": "started" },
+    "vm:104": { "node": "node1", "state": "started" }
+}
diff --git a/src/test/test-colocation-strict-separate2/README b/src/test/test-colocation-strict-separate2/README
new file mode 100644
index 0000000..f494d2b
--- /dev/null
+++ b/src/test/test-colocation-strict-separate2/README
@@ -0,0 +1,15 @@
+Test whether a strict negative colocation rule among three services makes one
+of the services migrate to a different node than the other services in case of
+a failover of the service's previously assigned node.
+
+The test scenario is:
+- vm:101, vm:102, and vm:103 must be kept separate
+- vm:101, vm:102, and vm:103 are on node3, node4, and node5 respectively
+- node1 and node2 each have higher service counts than node3, node4, and
+  node5 to test that the rule is applied even though the scheduler would
+  prefer the less utilized nodes node3, node4, or node5
+
+Therefore, the expected outcome is:
+- As node5 fails, vm:103 is migrated to node2; even though the utilization of
+  node2 is high already, the services must be kept separate; node2 is chosen
+  since node1 has one more service running on it
diff --git a/src/test/test-colocation-strict-separate2/cmdlist b/src/test/test-colocation-strict-separate2/cmdlist
new file mode 100644
index 0000000..89d09c9
--- /dev/null
+++ b/src/test/test-colocation-strict-separate2/cmdlist
@@ -0,0 +1,4 @@
+[
+    [ "power node1 on", "power node2 on", "power node3 on", "power node4 on", "power node5 on" ],
+    [ "network node5 off" ]
+]
diff --git a/src/test/test-colocation-strict-separate2/hardware_status b/src/test/test-colocation-strict-separate2/hardware_status
new file mode 100644
index 0000000..7b8e961
--- /dev/null
+++ b/src/test/test-colocation-strict-separate2/hardware_status
@@ -0,0 +1,7 @@
+{
+  "node1": { "power": "off", "network": "off" },
+  "node2": { "power": "off", "network": "off" },
+  "node3": { "power": "off", "network": "off" },
+  "node4": { "power": "off", "network": "off" },
+  "node5": { "power": "off", "network": "off" }
+}
diff --git a/src/test/test-colocation-strict-separate2/log.expect b/src/test/test-colocation-strict-separate2/log.expect
new file mode 100644
index 0000000..858d3c9
--- /dev/null
+++ b/src/test/test-colocation-strict-separate2/log.expect
@@ -0,0 +1,90 @@
+info      0     hardware: starting simulation
+info     20      cmdlist: execute power node1 on
+info     20    node1/crm: status change startup => wait_for_quorum
+info     20    node1/lrm: status change startup => wait_for_agent_lock
+info     20      cmdlist: execute power node2 on
+info     20    node2/crm: status change startup => wait_for_quorum
+info     20    node2/lrm: status change startup => wait_for_agent_lock
+info     20      cmdlist: execute power node3 on
+info     20    node3/crm: status change startup => wait_for_quorum
+info     20    node3/lrm: status change startup => wait_for_agent_lock
+info     20      cmdlist: execute power node4 on
+info     20    node4/crm: status change startup => wait_for_quorum
+info     20    node4/lrm: status change startup => wait_for_agent_lock
+info     20      cmdlist: execute power node5 on
+info     20    node5/crm: status change startup => wait_for_quorum
+info     20    node5/lrm: status change startup => wait_for_agent_lock
+info     20    node1/crm: got lock 'ha_manager_lock'
+info     20    node1/crm: status change wait_for_quorum => master
+info     20    node1/crm: node 'node1': state changed from 'unknown' => 'online'
+info     20    node1/crm: node 'node2': state changed from 'unknown' => 'online'
+info     20    node1/crm: node 'node3': state changed from 'unknown' => 'online'
+info     20    node1/crm: node 'node4': state changed from 'unknown' => 'online'
+info     20    node1/crm: node 'node5': state changed from 'unknown' => 'online'
+info     20    node1/crm: adding new service 'vm:101' on node 'node3'
+info     20    node1/crm: adding new service 'vm:102' on node 'node4'
+info     20    node1/crm: adding new service 'vm:103' on node 'node5'
+info     20    node1/crm: adding new service 'vm:104' on node 'node1'
+info     20    node1/crm: adding new service 'vm:105' on node 'node1'
+info     20    node1/crm: adding new service 'vm:106' on node 'node1'
+info     20    node1/crm: adding new service 'vm:107' on node 'node2'
+info     20    node1/crm: adding new service 'vm:108' on node 'node2'
+info     20    node1/crm: service 'vm:101': state changed from 'request_start' to 'started'  (node = node3)
+info     20    node1/crm: service 'vm:102': state changed from 'request_start' to 'started'  (node = node4)
+info     20    node1/crm: service 'vm:103': state changed from 'request_start' to 'started'  (node = node5)
+info     20    node1/crm: service 'vm:104': state changed from 'request_start' to 'started'  (node = node1)
+info     20    node1/crm: service 'vm:105': state changed from 'request_start' to 'started'  (node = node1)
+info     20    node1/crm: service 'vm:106': state changed from 'request_start' to 'started'  (node = node1)
+info     20    node1/crm: service 'vm:107': state changed from 'request_start' to 'started'  (node = node2)
+info     20    node1/crm: service 'vm:108': state changed from 'request_start' to 'started'  (node = node2)
+info     21    node1/lrm: got lock 'ha_agent_node1_lock'
+info     21    node1/lrm: status change wait_for_agent_lock => active
+info     21    node1/lrm: starting service vm:104
+info     21    node1/lrm: service status vm:104 started
+info     21    node1/lrm: starting service vm:105
+info     21    node1/lrm: service status vm:105 started
+info     21    node1/lrm: starting service vm:106
+info     21    node1/lrm: service status vm:106 started
+info     22    node2/crm: status change wait_for_quorum => slave
+info     23    node2/lrm: got lock 'ha_agent_node2_lock'
+info     23    node2/lrm: status change wait_for_agent_lock => active
+info     23    node2/lrm: starting service vm:107
+info     23    node2/lrm: service status vm:107 started
+info     23    node2/lrm: starting service vm:108
+info     23    node2/lrm: service status vm:108 started
+info     24    node3/crm: status change wait_for_quorum => slave
+info     25    node3/lrm: got lock 'ha_agent_node3_lock'
+info     25    node3/lrm: status change wait_for_agent_lock => active
+info     25    node3/lrm: starting service vm:101
+info     25    node3/lrm: service status vm:101 started
+info     26    node4/crm: status change wait_for_quorum => slave
+info     27    node4/lrm: got lock 'ha_agent_node4_lock'
+info     27    node4/lrm: status change wait_for_agent_lock => active
+info     27    node4/lrm: starting service vm:102
+info     27    node4/lrm: service status vm:102 started
+info     28    node5/crm: status change wait_for_quorum => slave
+info     29    node5/lrm: got lock 'ha_agent_node5_lock'
+info     29    node5/lrm: status change wait_for_agent_lock => active
+info     29    node5/lrm: starting service vm:103
+info     29    node5/lrm: service status vm:103 started
+info    120      cmdlist: execute network node5 off
+info    120    node1/crm: node 'node5': state changed from 'online' => 'unknown'
+info    128    node5/crm: status change slave => wait_for_quorum
+info    129    node5/lrm: status change active => lost_agent_lock
+info    160    node1/crm: service 'vm:103': state changed from 'started' to 'fence'
+info    160    node1/crm: node 'node5': state changed from 'unknown' => 'fence'
+emai    160    node1/crm: FENCE: Try to fence node 'node5'
+info    170     watchdog: execute power node5 off
+info    169    node5/crm: killed by poweroff
+info    170    node5/lrm: killed by poweroff
+info    170     hardware: server 'node5' stopped by poweroff (watchdog)
+info    240    node1/crm: got lock 'ha_agent_node5_lock'
+info    240    node1/crm: fencing: acknowledged - got agent lock for node 'node5'
+info    240    node1/crm: node 'node5': state changed from 'fence' => 'unknown'
+emai    240    node1/crm: SUCCEED: fencing: acknowledged - got agent lock for node 'node5'
+info    240    node1/crm: service 'vm:103': state changed from 'fence' to 'recovery'
+info    240    node1/crm: recover service 'vm:103' from fenced node 'node5' to node 'node2'
+info    240    node1/crm: service 'vm:103': state changed from 'recovery' to 'started'  (node = node2)
+info    243    node2/lrm: starting service vm:103
+info    243    node2/lrm: service status vm:103 started
+info    720     hardware: exit simulation - done
diff --git a/src/test/test-colocation-strict-separate2/manager_status b/src/test/test-colocation-strict-separate2/manager_status
new file mode 100644
index 0000000..9e26dfe
--- /dev/null
+++ b/src/test/test-colocation-strict-separate2/manager_status
@@ -0,0 +1 @@
+{}
\ No newline at end of file
diff --git a/src/test/test-colocation-strict-separate2/rules_config b/src/test/test-colocation-strict-separate2/rules_config
new file mode 100644
index 0000000..4167bab
--- /dev/null
+++ b/src/test/test-colocation-strict-separate2/rules_config
@@ -0,0 +1,4 @@
+colocation: lonely-must-vms-be
+    services vm:101,vm:102,vm:103
+    affinity separate
+    strict 1
diff --git a/src/test/test-colocation-strict-separate2/service_config b/src/test/test-colocation-strict-separate2/service_config
new file mode 100644
index 0000000..2c27816
--- /dev/null
+++ b/src/test/test-colocation-strict-separate2/service_config
@@ -0,0 +1,10 @@
+{
+    "vm:101": { "node": "node3", "state": "started" },
+    "vm:102": { "node": "node4", "state": "started" },
+    "vm:103": { "node": "node5", "state": "started" },
+    "vm:104": { "node": "node1", "state": "started" },
+    "vm:105": { "node": "node1", "state": "started" },
+    "vm:106": { "node": "node1", "state": "started" },
+    "vm:107": { "node": "node2", "state": "started" },
+    "vm:108": { "node": "node2", "state": "started" }
+}
diff --git a/src/test/test-colocation-strict-separate3/README b/src/test/test-colocation-strict-separate3/README
new file mode 100644
index 0000000..44d88ef
--- /dev/null
+++ b/src/test/test-colocation-strict-separate3/README
@@ -0,0 +1,16 @@
+Test whether a strict negative colocation rule among three services makes two
+of the services migrate to two different recovery nodes than the node of the
+third service in case of a failover of their two previously assigned nodes.
+
+The test scenario is:
+- vm:101, vm:102, and vm:103 must be kept separate
+- vm:101, vm:102, and vm:103 are respectively on node3, node4, and node5
+- node1 and node2 both have higher service counts than node3, node4, and node5
+  to test that the colocation rule is enforced even though the scheduler would
+  prefer the less utilized node3, node4, and node5
+
+Therefore, the expected outcome is:
+- As node4 and node5 fail, vm:102 and vm:103 are migrated to node2 and node1
+  respectively; even though the utilization of node1 and node2 is high
+  already, the services must be kept separate; node2 is chosen first since
+  node1 has one more service running on it
diff --git a/src/test/test-colocation-strict-separate3/cmdlist b/src/test/test-colocation-strict-separate3/cmdlist
new file mode 100644
index 0000000..1934596
--- /dev/null
+++ b/src/test/test-colocation-strict-separate3/cmdlist
@@ -0,0 +1,4 @@
+[
+    [ "power node1 on", "power node2 on", "power node3 on", "power node4 on", "power node5 on" ],
+    [ "network node4 off", "network node5 off" ]
+]
diff --git a/src/test/test-colocation-strict-separate3/hardware_status b/src/test/test-colocation-strict-separate3/hardware_status
new file mode 100644
index 0000000..7b8e961
--- /dev/null
+++ b/src/test/test-colocation-strict-separate3/hardware_status
@@ -0,0 +1,7 @@
+{
+  "node1": { "power": "off", "network": "off" },
+  "node2": { "power": "off", "network": "off" },
+  "node3": { "power": "off", "network": "off" },
+  "node4": { "power": "off", "network": "off" },
+  "node5": { "power": "off", "network": "off" }
+}
diff --git a/src/test/test-colocation-strict-separate3/log.expect b/src/test/test-colocation-strict-separate3/log.expect
new file mode 100644
index 0000000..4acdcec
--- /dev/null
+++ b/src/test/test-colocation-strict-separate3/log.expect
@@ -0,0 +1,110 @@
+info      0     hardware: starting simulation
+info     20      cmdlist: execute power node1 on
+info     20    node1/crm: status change startup => wait_for_quorum
+info     20    node1/lrm: status change startup => wait_for_agent_lock
+info     20      cmdlist: execute power node2 on
+info     20    node2/crm: status change startup => wait_for_quorum
+info     20    node2/lrm: status change startup => wait_for_agent_lock
+info     20      cmdlist: execute power node3 on
+info     20    node3/crm: status change startup => wait_for_quorum
+info     20    node3/lrm: status change startup => wait_for_agent_lock
+info     20      cmdlist: execute power node4 on
+info     20    node4/crm: status change startup => wait_for_quorum
+info     20    node4/lrm: status change startup => wait_for_agent_lock
+info     20      cmdlist: execute power node5 on
+info     20    node5/crm: status change startup => wait_for_quorum
+info     20    node5/lrm: status change startup => wait_for_agent_lock
+info     20    node1/crm: got lock 'ha_manager_lock'
+info     20    node1/crm: status change wait_for_quorum => master
+info     20    node1/crm: node 'node1': state changed from 'unknown' => 'online'
+info     20    node1/crm: node 'node2': state changed from 'unknown' => 'online'
+info     20    node1/crm: node 'node3': state changed from 'unknown' => 'online'
+info     20    node1/crm: node 'node4': state changed from 'unknown' => 'online'
+info     20    node1/crm: node 'node5': state changed from 'unknown' => 'online'
+info     20    node1/crm: adding new service 'vm:101' on node 'node3'
+info     20    node1/crm: adding new service 'vm:102' on node 'node4'
+info     20    node1/crm: adding new service 'vm:103' on node 'node5'
+info     20    node1/crm: adding new service 'vm:104' on node 'node1'
+info     20    node1/crm: adding new service 'vm:105' on node 'node1'
+info     20    node1/crm: adding new service 'vm:106' on node 'node1'
+info     20    node1/crm: adding new service 'vm:107' on node 'node2'
+info     20    node1/crm: adding new service 'vm:108' on node 'node2'
+info     20    node1/crm: service 'vm:101': state changed from 'request_start' to 'started'  (node = node3)
+info     20    node1/crm: service 'vm:102': state changed from 'request_start' to 'started'  (node = node4)
+info     20    node1/crm: service 'vm:103': state changed from 'request_start' to 'started'  (node = node5)
+info     20    node1/crm: service 'vm:104': state changed from 'request_start' to 'started'  (node = node1)
+info     20    node1/crm: service 'vm:105': state changed from 'request_start' to 'started'  (node = node1)
+info     20    node1/crm: service 'vm:106': state changed from 'request_start' to 'started'  (node = node1)
+info     20    node1/crm: service 'vm:107': state changed from 'request_start' to 'started'  (node = node2)
+info     20    node1/crm: service 'vm:108': state changed from 'request_start' to 'started'  (node = node2)
+info     21    node1/lrm: got lock 'ha_agent_node1_lock'
+info     21    node1/lrm: status change wait_for_agent_lock => active
+info     21    node1/lrm: starting service vm:104
+info     21    node1/lrm: service status vm:104 started
+info     21    node1/lrm: starting service vm:105
+info     21    node1/lrm: service status vm:105 started
+info     21    node1/lrm: starting service vm:106
+info     21    node1/lrm: service status vm:106 started
+info     22    node2/crm: status change wait_for_quorum => slave
+info     23    node2/lrm: got lock 'ha_agent_node2_lock'
+info     23    node2/lrm: status change wait_for_agent_lock => active
+info     23    node2/lrm: starting service vm:107
+info     23    node2/lrm: service status vm:107 started
+info     23    node2/lrm: starting service vm:108
+info     23    node2/lrm: service status vm:108 started
+info     24    node3/crm: status change wait_for_quorum => slave
+info     25    node3/lrm: got lock 'ha_agent_node3_lock'
+info     25    node3/lrm: status change wait_for_agent_lock => active
+info     25    node3/lrm: starting service vm:101
+info     25    node3/lrm: service status vm:101 started
+info     26    node4/crm: status change wait_for_quorum => slave
+info     27    node4/lrm: got lock 'ha_agent_node4_lock'
+info     27    node4/lrm: status change wait_for_agent_lock => active
+info     27    node4/lrm: starting service vm:102
+info     27    node4/lrm: service status vm:102 started
+info     28    node5/crm: status change wait_for_quorum => slave
+info     29    node5/lrm: got lock 'ha_agent_node5_lock'
+info     29    node5/lrm: status change wait_for_agent_lock => active
+info     29    node5/lrm: starting service vm:103
+info     29    node5/lrm: service status vm:103 started
+info    120      cmdlist: execute network node4 off
+info    120      cmdlist: execute network node5 off
+info    120    node1/crm: node 'node4': state changed from 'online' => 'unknown'
+info    120    node1/crm: node 'node5': state changed from 'online' => 'unknown'
+info    126    node4/crm: status change slave => wait_for_quorum
+info    127    node4/lrm: status change active => lost_agent_lock
+info    128    node5/crm: status change slave => wait_for_quorum
+info    129    node5/lrm: status change active => lost_agent_lock
+info    160    node1/crm: service 'vm:102': state changed from 'started' to 'fence'
+info    160    node1/crm: service 'vm:103': state changed from 'started' to 'fence'
+info    160    node1/crm: node 'node4': state changed from 'unknown' => 'fence'
+emai    160    node1/crm: FENCE: Try to fence node 'node4'
+info    160    node1/crm: node 'node5': state changed from 'unknown' => 'fence'
+emai    160    node1/crm: FENCE: Try to fence node 'node5'
+info    168     watchdog: execute power node4 off
+info    167    node4/crm: killed by poweroff
+info    168    node4/lrm: killed by poweroff
+info    168     hardware: server 'node4' stopped by poweroff (watchdog)
+info    170     watchdog: execute power node5 off
+info    169    node5/crm: killed by poweroff
+info    170    node5/lrm: killed by poweroff
+info    170     hardware: server 'node5' stopped by poweroff (watchdog)
+info    240    node1/crm: got lock 'ha_agent_node4_lock'
+info    240    node1/crm: fencing: acknowledged - got agent lock for node 'node4'
+info    240    node1/crm: node 'node4': state changed from 'fence' => 'unknown'
+emai    240    node1/crm: SUCCEED: fencing: acknowledged - got agent lock for node 'node4'
+info    240    node1/crm: service 'vm:102': state changed from 'fence' to 'recovery'
+info    240    node1/crm: got lock 'ha_agent_node5_lock'
+info    240    node1/crm: fencing: acknowledged - got agent lock for node 'node5'
+info    240    node1/crm: node 'node5': state changed from 'fence' => 'unknown'
+emai    240    node1/crm: SUCCEED: fencing: acknowledged - got agent lock for node 'node5'
+info    240    node1/crm: service 'vm:103': state changed from 'fence' to 'recovery'
+info    240    node1/crm: recover service 'vm:102' from fenced node 'node4' to node 'node2'
+info    240    node1/crm: service 'vm:102': state changed from 'recovery' to 'started'  (node = node2)
+info    240    node1/crm: recover service 'vm:103' from fenced node 'node5' to node 'node1'
+info    240    node1/crm: service 'vm:103': state changed from 'recovery' to 'started'  (node = node1)
+info    241    node1/lrm: starting service vm:103
+info    241    node1/lrm: service status vm:103 started
+info    243    node2/lrm: starting service vm:102
+info    243    node2/lrm: service status vm:102 started
+info    720     hardware: exit simulation - done
diff --git a/src/test/test-colocation-strict-separate3/manager_status b/src/test/test-colocation-strict-separate3/manager_status
new file mode 100644
index 0000000..9e26dfe
--- /dev/null
+++ b/src/test/test-colocation-strict-separate3/manager_status
@@ -0,0 +1 @@
+{}
\ No newline at end of file
diff --git a/src/test/test-colocation-strict-separate3/rules_config b/src/test/test-colocation-strict-separate3/rules_config
new file mode 100644
index 0000000..4167bab
--- /dev/null
+++ b/src/test/test-colocation-strict-separate3/rules_config
@@ -0,0 +1,4 @@
+colocation: lonely-must-vms-be
+    services vm:101,vm:102,vm:103
+    affinity separate
+    strict 1
diff --git a/src/test/test-colocation-strict-separate3/service_config b/src/test/test-colocation-strict-separate3/service_config
new file mode 100644
index 0000000..2c27816
--- /dev/null
+++ b/src/test/test-colocation-strict-separate3/service_config
@@ -0,0 +1,10 @@
+{
+    "vm:101": { "node": "node3", "state": "started" },
+    "vm:102": { "node": "node4", "state": "started" },
+    "vm:103": { "node": "node5", "state": "started" },
+    "vm:104": { "node": "node1", "state": "started" },
+    "vm:105": { "node": "node1", "state": "started" },
+    "vm:106": { "node": "node1", "state": "started" },
+    "vm:107": { "node": "node2", "state": "started" },
+    "vm:108": { "node": "node2", "state": "started" }
+}
diff --git a/src/test/test-colocation-strict-separate4/README b/src/test/test-colocation-strict-separate4/README
new file mode 100644
index 0000000..31f127d
--- /dev/null
+++ b/src/test/test-colocation-strict-separate4/README
@@ -0,0 +1,17 @@
+Test whether a strict negative colocation rule among two services makes one of
+the services migrate to a different recovery node than the other service in
+case of a failover of the service's previously assigned node. As the service
+fails to start on the recovery node (e.g. insufficient resources), the failing
+service is kept on the recovery node.
+
+The test scenario is:
+- vm:101 and fa:120001 must be kept separate
+- vm:101 and fa:120001 are on node2 and node3 respectively
+- fa:120001 will fail to start on node1
+- node1 has a higher service count than node2 to test that the colocation rule
+  is applied even though the scheduler would prefer the less utilized node
+
+Therefore, the expected outcome is:
+- As node3 fails, fa:120001 is migrated to node1
+- fa:120001 initially fails to start on node1, but is not relocated to another
+  node due to the strict colocation rule; its start is retried on node1
diff --git a/src/test/test-colocation-strict-separate4/cmdlist b/src/test/test-colocation-strict-separate4/cmdlist
new file mode 100644
index 0000000..c0a4daa
--- /dev/null
+++ b/src/test/test-colocation-strict-separate4/cmdlist
@@ -0,0 +1,4 @@
+[
+    [ "power node1 on", "power node2 on", "power node3 on" ],
+    [ "network node3 off" ]
+]
diff --git a/src/test/test-colocation-strict-separate4/hardware_status b/src/test/test-colocation-strict-separate4/hardware_status
new file mode 100644
index 0000000..451beb1
--- /dev/null
+++ b/src/test/test-colocation-strict-separate4/hardware_status
@@ -0,0 +1,5 @@
+{
+  "node1": { "power": "off", "network": "off" },
+  "node2": { "power": "off", "network": "off" },
+  "node3": { "power": "off", "network": "off" }
+}
diff --git a/src/test/test-colocation-strict-separate4/log.expect b/src/test/test-colocation-strict-separate4/log.expect
new file mode 100644
index 0000000..f772ea8
--- /dev/null
+++ b/src/test/test-colocation-strict-separate4/log.expect
@@ -0,0 +1,69 @@
+info      0     hardware: starting simulation
+info     20      cmdlist: execute power node1 on
+info     20    node1/crm: status change startup => wait_for_quorum
+info     20    node1/lrm: status change startup => wait_for_agent_lock
+info     20      cmdlist: execute power node2 on
+info     20    node2/crm: status change startup => wait_for_quorum
+info     20    node2/lrm: status change startup => wait_for_agent_lock
+info     20      cmdlist: execute power node3 on
+info     20    node3/crm: status change startup => wait_for_quorum
+info     20    node3/lrm: status change startup => wait_for_agent_lock
+info     20    node1/crm: got lock 'ha_manager_lock'
+info     20    node1/crm: status change wait_for_quorum => master
+info     20    node1/crm: node 'node1': state changed from 'unknown' => 'online'
+info     20    node1/crm: node 'node2': state changed from 'unknown' => 'online'
+info     20    node1/crm: node 'node3': state changed from 'unknown' => 'online'
+info     20    node1/crm: adding new service 'fa:120001' on node 'node3'
+info     20    node1/crm: adding new service 'vm:101' on node 'node2'
+info     20    node1/crm: adding new service 'vm:103' on node 'node1'
+info     20    node1/crm: adding new service 'vm:104' on node 'node1'
+info     20    node1/crm: service 'fa:120001': state changed from 'request_start' to 'started'  (node = node3)
+info     20    node1/crm: service 'vm:101': state changed from 'request_start' to 'started'  (node = node2)
+info     20    node1/crm: service 'vm:103': state changed from 'request_start' to 'started'  (node = node1)
+info     20    node1/crm: service 'vm:104': state changed from 'request_start' to 'started'  (node = node1)
+info     21    node1/lrm: got lock 'ha_agent_node1_lock'
+info     21    node1/lrm: status change wait_for_agent_lock => active
+info     21    node1/lrm: starting service vm:103
+info     21    node1/lrm: service status vm:103 started
+info     21    node1/lrm: starting service vm:104
+info     21    node1/lrm: service status vm:104 started
+info     22    node2/crm: status change wait_for_quorum => slave
+info     23    node2/lrm: got lock 'ha_agent_node2_lock'
+info     23    node2/lrm: status change wait_for_agent_lock => active
+info     23    node2/lrm: starting service vm:101
+info     23    node2/lrm: service status vm:101 started
+info     24    node3/crm: status change wait_for_quorum => slave
+info     25    node3/lrm: got lock 'ha_agent_node3_lock'
+info     25    node3/lrm: status change wait_for_agent_lock => active
+info     25    node3/lrm: starting service fa:120001
+info     25    node3/lrm: service status fa:120001 started
+info    120      cmdlist: execute network node3 off
+info    120    node1/crm: node 'node3': state changed from 'online' => 'unknown'
+info    124    node3/crm: status change slave => wait_for_quorum
+info    125    node3/lrm: status change active => lost_agent_lock
+info    160    node1/crm: service 'fa:120001': state changed from 'started' to 'fence'
+info    160    node1/crm: node 'node3': state changed from 'unknown' => 'fence'
+emai    160    node1/crm: FENCE: Try to fence node 'node3'
+info    166     watchdog: execute power node3 off
+info    165    node3/crm: killed by poweroff
+info    166    node3/lrm: killed by poweroff
+info    166     hardware: server 'node3' stopped by poweroff (watchdog)
+info    240    node1/crm: got lock 'ha_agent_node3_lock'
+info    240    node1/crm: fencing: acknowledged - got agent lock for node 'node3'
+info    240    node1/crm: node 'node3': state changed from 'fence' => 'unknown'
+emai    240    node1/crm: SUCCEED: fencing: acknowledged - got agent lock for node 'node3'
+info    240    node1/crm: service 'fa:120001': state changed from 'fence' to 'recovery'
+info    240    node1/crm: recover service 'fa:120001' from fenced node 'node3' to node 'node1'
+info    240    node1/crm: service 'fa:120001': state changed from 'recovery' to 'started'  (node = node1)
+info    241    node1/lrm: starting service fa:120001
+warn    241    node1/lrm: unable to start service fa:120001
+warn    241    node1/lrm: restart policy: retry number 1 for service 'fa:120001'
+info    261    node1/lrm: starting service fa:120001
+warn    261    node1/lrm: unable to start service fa:120001
+err     261    node1/lrm: unable to start service fa:120001 on local node after 1 retries
+warn    280    node1/crm: starting service fa:120001 on node 'node1' failed, relocating service.
+warn    280    node1/crm: Start Error Recovery: Tried all available nodes for service 'fa:120001', retry start on current node. Tried nodes: node1
+info    281    node1/lrm: starting service fa:120001
+info    281    node1/lrm: service status fa:120001 started
+info    300    node1/crm: relocation policy successful for 'fa:120001' on node 'node1', failed nodes: node1
+info    720     hardware: exit simulation - done
diff --git a/src/test/test-colocation-strict-separate4/manager_status b/src/test/test-colocation-strict-separate4/manager_status
new file mode 100644
index 0000000..9e26dfe
--- /dev/null
+++ b/src/test/test-colocation-strict-separate4/manager_status
@@ -0,0 +1 @@
+{}
\ No newline at end of file
diff --git a/src/test/test-colocation-strict-separate4/rules_config b/src/test/test-colocation-strict-separate4/rules_config
new file mode 100644
index 0000000..3db0056
--- /dev/null
+++ b/src/test/test-colocation-strict-separate4/rules_config
@@ -0,0 +1,4 @@
+colocation: lonely-must-vms-be
+    services vm:101,fa:120001
+    affinity separate
+    strict 1
diff --git a/src/test/test-colocation-strict-separate4/service_config b/src/test/test-colocation-strict-separate4/service_config
new file mode 100644
index 0000000..f53c2bc
--- /dev/null
+++ b/src/test/test-colocation-strict-separate4/service_config
@@ -0,0 +1,6 @@
+{
+    "vm:101": { "node": "node2", "state": "started" },
+    "fa:120001": { "node": "node3", "state": "started" },
+    "vm:103": { "node": "node1", "state": "started" },
+    "vm:104": { "node": "node1", "state": "started" }
+}
diff --git a/src/test/test-colocation-strict-separate5/README b/src/test/test-colocation-strict-separate5/README
new file mode 100644
index 0000000..4cdcbf5
--- /dev/null
+++ b/src/test/test-colocation-strict-separate5/README
@@ -0,0 +1,11 @@
+Test whether two pair-wise strict negative colocation rules, i.e. where one
+service is in two separate non-colocation relationships with two other
+services, make one of the outer services migrate to the same node as the other
+outer service in case of a failover of their previously assigned node.
+
+The test scenario is:
+- vm:101 and vm:102, and vm:101 and vm:103 must each be kept separate
+- vm:101, vm:102, and vm:103 are respectively on node1, node2, and node3
+
+Therefore, the expected outcome is:
+- As node3 fails, vm:103 is migrated to node2, the same node as vm:102
diff --git a/src/test/test-colocation-strict-separate5/cmdlist b/src/test/test-colocation-strict-separate5/cmdlist
new file mode 100644
index 0000000..c0a4daa
--- /dev/null
+++ b/src/test/test-colocation-strict-separate5/cmdlist
@@ -0,0 +1,4 @@
+[
+    [ "power node1 on", "power node2 on", "power node3 on" ],
+    [ "network node3 off" ]
+]
diff --git a/src/test/test-colocation-strict-separate5/hardware_status b/src/test/test-colocation-strict-separate5/hardware_status
new file mode 100644
index 0000000..451beb1
--- /dev/null
+++ b/src/test/test-colocation-strict-separate5/hardware_status
@@ -0,0 +1,5 @@
+{
+  "node1": { "power": "off", "network": "off" },
+  "node2": { "power": "off", "network": "off" },
+  "node3": { "power": "off", "network": "off" }
+}
diff --git a/src/test/test-colocation-strict-separate5/log.expect b/src/test/test-colocation-strict-separate5/log.expect
new file mode 100644
index 0000000..16156ad
--- /dev/null
+++ b/src/test/test-colocation-strict-separate5/log.expect
@@ -0,0 +1,56 @@
+info      0     hardware: starting simulation
+info     20      cmdlist: execute power node1 on
+info     20    node1/crm: status change startup => wait_for_quorum
+info     20    node1/lrm: status change startup => wait_for_agent_lock
+info     20      cmdlist: execute power node2 on
+info     20    node2/crm: status change startup => wait_for_quorum
+info     20    node2/lrm: status change startup => wait_for_agent_lock
+info     20      cmdlist: execute power node3 on
+info     20    node3/crm: status change startup => wait_for_quorum
+info     20    node3/lrm: status change startup => wait_for_agent_lock
+info     20    node1/crm: got lock 'ha_manager_lock'
+info     20    node1/crm: status change wait_for_quorum => master
+info     20    node1/crm: node 'node1': state changed from 'unknown' => 'online'
+info     20    node1/crm: node 'node2': state changed from 'unknown' => 'online'
+info     20    node1/crm: node 'node3': state changed from 'unknown' => 'online'
+info     20    node1/crm: adding new service 'vm:101' on node 'node1'
+info     20    node1/crm: adding new service 'vm:102' on node 'node2'
+info     20    node1/crm: adding new service 'vm:103' on node 'node3'
+info     20    node1/crm: service 'vm:101': state changed from 'request_start' to 'started'  (node = node1)
+info     20    node1/crm: service 'vm:102': state changed from 'request_start' to 'started'  (node = node2)
+info     20    node1/crm: service 'vm:103': state changed from 'request_start' to 'started'  (node = node3)
+info     21    node1/lrm: got lock 'ha_agent_node1_lock'
+info     21    node1/lrm: status change wait_for_agent_lock => active
+info     21    node1/lrm: starting service vm:101
+info     21    node1/lrm: service status vm:101 started
+info     22    node2/crm: status change wait_for_quorum => slave
+info     23    node2/lrm: got lock 'ha_agent_node2_lock'
+info     23    node2/lrm: status change wait_for_agent_lock => active
+info     23    node2/lrm: starting service vm:102
+info     23    node2/lrm: service status vm:102 started
+info     24    node3/crm: status change wait_for_quorum => slave
+info     25    node3/lrm: got lock 'ha_agent_node3_lock'
+info     25    node3/lrm: status change wait_for_agent_lock => active
+info     25    node3/lrm: starting service vm:103
+info     25    node3/lrm: service status vm:103 started
+info    120      cmdlist: execute network node3 off
+info    120    node1/crm: node 'node3': state changed from 'online' => 'unknown'
+info    124    node3/crm: status change slave => wait_for_quorum
+info    125    node3/lrm: status change active => lost_agent_lock
+info    160    node1/crm: service 'vm:103': state changed from 'started' to 'fence'
+info    160    node1/crm: node 'node3': state changed from 'unknown' => 'fence'
+emai    160    node1/crm: FENCE: Try to fence node 'node3'
+info    166     watchdog: execute power node3 off
+info    165    node3/crm: killed by poweroff
+info    166    node3/lrm: killed by poweroff
+info    166     hardware: server 'node3' stopped by poweroff (watchdog)
+info    240    node1/crm: got lock 'ha_agent_node3_lock'
+info    240    node1/crm: fencing: acknowledged - got agent lock for node 'node3'
+info    240    node1/crm: node 'node3': state changed from 'fence' => 'unknown'
+emai    240    node1/crm: SUCCEED: fencing: acknowledged - got agent lock for node 'node3'
+info    240    node1/crm: service 'vm:103': state changed from 'fence' to 'recovery'
+info    240    node1/crm: recover service 'vm:103' from fenced node 'node3' to node 'node2'
+info    240    node1/crm: service 'vm:103': state changed from 'recovery' to 'started'  (node = node2)
+info    243    node2/lrm: starting service vm:103
+info    243    node2/lrm: service status vm:103 started
+info    720     hardware: exit simulation - done
diff --git a/src/test/test-colocation-strict-separate5/manager_status b/src/test/test-colocation-strict-separate5/manager_status
new file mode 100644
index 0000000..9e26dfe
--- /dev/null
+++ b/src/test/test-colocation-strict-separate5/manager_status
@@ -0,0 +1 @@
+{}
\ No newline at end of file
diff --git a/src/test/test-colocation-strict-separate5/rules_config b/src/test/test-colocation-strict-separate5/rules_config
new file mode 100644
index 0000000..f72fc66
--- /dev/null
+++ b/src/test/test-colocation-strict-separate5/rules_config
@@ -0,0 +1,9 @@
+colocation: lonely-must-some-vms-be1
+    services vm:101,vm:102
+    affinity separate
+    strict 1
+
+colocation: lonely-must-some-vms-be2
+    services vm:101,vm:103
+    affinity separate
+    strict 1
diff --git a/src/test/test-colocation-strict-separate5/service_config b/src/test/test-colocation-strict-separate5/service_config
new file mode 100644
index 0000000..4b26f6b
--- /dev/null
+++ b/src/test/test-colocation-strict-separate5/service_config
@@ -0,0 +1,5 @@
+{
+    "vm:101": { "node": "node1", "state": "started" },
+    "vm:102": { "node": "node2", "state": "started" },
+    "vm:103": { "node": "node3", "state": "started" }
+}
-- 
2.39.5
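
Note on the expected recovery nodes: the node choices recorded in the
log.expect files above can be approximated with a small model. The sketch
below is not the ha-manager implementation (which is written in Perl and
considers more factors, such as priorities and usage statistics). It is a
simplified Python illustration under these assumptions: the helper name
select_recovery_node is hypothetical, each strict negative colocation rule is
a plain set of service IDs, and ties are broken by lowest service count and
then by node name.

```python
from collections import Counter


def select_recovery_node(service, placement, online_nodes, separate_rules):
    """Pick a recovery node for `service`, or None if every node is excluded.

    Hypothetical helper for illustration only; not part of ha-manager.
    placement:      dict mapping service ID -> node it is currently assigned to
    online_nodes:   iterable of node names that may receive the service
    separate_rules: list of sets of service IDs that must not share a node
    """
    # Nodes hosting a service that must be kept separate from `service`.
    blocked = set()
    for group in separate_rules:
        if service in group:
            for other in group:
                if other != service and other in placement:
                    blocked.add(placement[other])

    candidates = [node for node in online_nodes if node not in blocked]
    if not candidates:
        return None

    # Assumed tie-break: fewest assigned services first, then node name.
    load = Counter(node for sid, node in placement.items() if sid != service)
    return min(candidates, key=lambda node: (load[node], node))


# Scenario of test-colocation-strict-separate5: node3 has been fenced.
placement = {"vm:101": "node1", "vm:102": "node2", "vm:103": "node3"}
rules = [{"vm:101", "vm:102"}, {"vm:101", "vm:103"}]
print(select_recovery_node("vm:103", placement, ["node1", "node2"], rules))
```

With the separate5 data this selects node2: vm:103 is only barred from
vm:101's node, so it may share a node with vm:102. With the separate4 data,
node2 is excluded for fa:120001 because it hosts vm:101, which leaves node1
even though it carries more services.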




