[pve-devel] [PATCH ha-manager 13/15] test: ha tester: add test cases for loose colocation rules

Daniel Kral d.kral at proxmox.com
Tue Mar 25 16:12:52 CET 2025


Add test cases for loose positive and negative colocation rules, i.e.
where services should be kept on the same node or on separate nodes,
respectively. These are copies of their strict counterpart tests, but
verify the behavior when the colocation rule cannot be met, i.e. that
the rule is then simply not adhered to. The test scenarios are:

- 2 neg. colocated services in a 3 node cluster; 1 node failing
- 2 neg. colocated services in a 3 node cluster; 1 node failing, but the
  recovery node cannot start the service
- 2 pos. colocated services in a 3 node cluster; 1 node failing
- 3 pos. colocated services in a 3 node cluster; 1 node failing, but the
  recovery node cannot start one of the services
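
For reference, in these test cases a loose colocation rule is declared
with 'strict 0'; for example, the negative colocation rule used by the
first test case (test-colocation-loose-separate1) below reads:

    colocation: lonely-should-vms-be
        services vm:101,vm:102
        affinity separate
        strict 0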

Signed-off-by: Daniel Kral <d.kral at proxmox.com>
---
 .../test-colocation-loose-separate1/README    | 13 +++
 .../test-colocation-loose-separate1/cmdlist   |  4 +
 .../hardware_status                           |  5 +
 .../log.expect                                | 60 ++++++++++++
 .../manager_status                            |  1 +
 .../rules_config                              |  4 +
 .../service_config                            |  6 ++
 .../test-colocation-loose-separate4/README    | 17 ++++
 .../test-colocation-loose-separate4/cmdlist   |  4 +
 .../hardware_status                           |  5 +
 .../log.expect                                | 73 +++++++++++++++
 .../manager_status                            |  1 +
 .../rules_config                              |  4 +
 .../service_config                            |  6 ++
 .../test-colocation-loose-together1/README    | 11 +++
 .../test-colocation-loose-together1/cmdlist   |  4 +
 .../hardware_status                           |  5 +
 .../log.expect                                | 66 +++++++++++++
 .../manager_status                            |  1 +
 .../rules_config                              |  4 +
 .../service_config                            |  6 ++
 .../test-colocation-loose-together3/README    | 16 ++++
 .../test-colocation-loose-together3/cmdlist   |  4 +
 .../hardware_status                           |  5 +
 .../log.expect                                | 93 +++++++++++++++++++
 .../manager_status                            |  1 +
 .../rules_config                              |  4 +
 .../service_config                            |  8 ++
 28 files changed, 431 insertions(+)
 create mode 100644 src/test/test-colocation-loose-separate1/README
 create mode 100644 src/test/test-colocation-loose-separate1/cmdlist
 create mode 100644 src/test/test-colocation-loose-separate1/hardware_status
 create mode 100644 src/test/test-colocation-loose-separate1/log.expect
 create mode 100644 src/test/test-colocation-loose-separate1/manager_status
 create mode 100644 src/test/test-colocation-loose-separate1/rules_config
 create mode 100644 src/test/test-colocation-loose-separate1/service_config
 create mode 100644 src/test/test-colocation-loose-separate4/README
 create mode 100644 src/test/test-colocation-loose-separate4/cmdlist
 create mode 100644 src/test/test-colocation-loose-separate4/hardware_status
 create mode 100644 src/test/test-colocation-loose-separate4/log.expect
 create mode 100644 src/test/test-colocation-loose-separate4/manager_status
 create mode 100644 src/test/test-colocation-loose-separate4/rules_config
 create mode 100644 src/test/test-colocation-loose-separate4/service_config
 create mode 100644 src/test/test-colocation-loose-together1/README
 create mode 100644 src/test/test-colocation-loose-together1/cmdlist
 create mode 100644 src/test/test-colocation-loose-together1/hardware_status
 create mode 100644 src/test/test-colocation-loose-together1/log.expect
 create mode 100644 src/test/test-colocation-loose-together1/manager_status
 create mode 100644 src/test/test-colocation-loose-together1/rules_config
 create mode 100644 src/test/test-colocation-loose-together1/service_config
 create mode 100644 src/test/test-colocation-loose-together3/README
 create mode 100644 src/test/test-colocation-loose-together3/cmdlist
 create mode 100644 src/test/test-colocation-loose-together3/hardware_status
 create mode 100644 src/test/test-colocation-loose-together3/log.expect
 create mode 100644 src/test/test-colocation-loose-together3/manager_status
 create mode 100644 src/test/test-colocation-loose-together3/rules_config
 create mode 100644 src/test/test-colocation-loose-together3/service_config

diff --git a/src/test/test-colocation-loose-separate1/README b/src/test/test-colocation-loose-separate1/README
new file mode 100644
index 0000000..ac7c395
--- /dev/null
+++ b/src/test/test-colocation-loose-separate1/README
@@ -0,0 +1,13 @@
+Test whether a loose negative colocation rule among two services makes one of
+the services migrate to a different recovery node than the other service in
+case of a failover of its previously assigned node.
+
+The test scenario is:
+- vm:101 and vm:102 should be kept separate
+- vm:101 and vm:102 are currently running on node2 and node3 respectively
+- node1 has a higher service count than node2 to test that the rule is applied
+  even though the scheduler would prefer the less utilized node
+
+Therefore, the expected outcome is:
+- As node3 fails, vm:102 is migrated to node1; even though the utilization of
+  node1 is already high, the services should be kept separate
diff --git a/src/test/test-colocation-loose-separate1/cmdlist b/src/test/test-colocation-loose-separate1/cmdlist
new file mode 100644
index 0000000..eee0e40
--- /dev/null
+++ b/src/test/test-colocation-loose-separate1/cmdlist
@@ -0,0 +1,4 @@
+[
+    [ "power node1 on", "power node2 on", "power node3 on"],
+    [ "network node3 off" ]
+]
diff --git a/src/test/test-colocation-loose-separate1/hardware_status b/src/test/test-colocation-loose-separate1/hardware_status
new file mode 100644
index 0000000..451beb1
--- /dev/null
+++ b/src/test/test-colocation-loose-separate1/hardware_status
@@ -0,0 +1,5 @@
+{
+  "node1": { "power": "off", "network": "off" },
+  "node2": { "power": "off", "network": "off" },
+  "node3": { "power": "off", "network": "off" }
+}
diff --git a/src/test/test-colocation-loose-separate1/log.expect b/src/test/test-colocation-loose-separate1/log.expect
new file mode 100644
index 0000000..475db39
--- /dev/null
+++ b/src/test/test-colocation-loose-separate1/log.expect
@@ -0,0 +1,60 @@
+info      0     hardware: starting simulation
+info     20      cmdlist: execute power node1 on
+info     20    node1/crm: status change startup => wait_for_quorum
+info     20    node1/lrm: status change startup => wait_for_agent_lock
+info     20      cmdlist: execute power node2 on
+info     20    node2/crm: status change startup => wait_for_quorum
+info     20    node2/lrm: status change startup => wait_for_agent_lock
+info     20      cmdlist: execute power node3 on
+info     20    node3/crm: status change startup => wait_for_quorum
+info     20    node3/lrm: status change startup => wait_for_agent_lock
+info     20    node1/crm: got lock 'ha_manager_lock'
+info     20    node1/crm: status change wait_for_quorum => master
+info     20    node1/crm: node 'node1': state changed from 'unknown' => 'online'
+info     20    node1/crm: node 'node2': state changed from 'unknown' => 'online'
+info     20    node1/crm: node 'node3': state changed from 'unknown' => 'online'
+info     20    node1/crm: adding new service 'vm:101' on node 'node2'
+info     20    node1/crm: adding new service 'vm:102' on node 'node3'
+info     20    node1/crm: adding new service 'vm:103' on node 'node1'
+info     20    node1/crm: adding new service 'vm:104' on node 'node1'
+info     20    node1/crm: service 'vm:101': state changed from 'request_start' to 'started'  (node = node2)
+info     20    node1/crm: service 'vm:102': state changed from 'request_start' to 'started'  (node = node3)
+info     20    node1/crm: service 'vm:103': state changed from 'request_start' to 'started'  (node = node1)
+info     20    node1/crm: service 'vm:104': state changed from 'request_start' to 'started'  (node = node1)
+info     21    node1/lrm: got lock 'ha_agent_node1_lock'
+info     21    node1/lrm: status change wait_for_agent_lock => active
+info     21    node1/lrm: starting service vm:103
+info     21    node1/lrm: service status vm:103 started
+info     21    node1/lrm: starting service vm:104
+info     21    node1/lrm: service status vm:104 started
+info     22    node2/crm: status change wait_for_quorum => slave
+info     23    node2/lrm: got lock 'ha_agent_node2_lock'
+info     23    node2/lrm: status change wait_for_agent_lock => active
+info     23    node2/lrm: starting service vm:101
+info     23    node2/lrm: service status vm:101 started
+info     24    node3/crm: status change wait_for_quorum => slave
+info     25    node3/lrm: got lock 'ha_agent_node3_lock'
+info     25    node3/lrm: status change wait_for_agent_lock => active
+info     25    node3/lrm: starting service vm:102
+info     25    node3/lrm: service status vm:102 started
+info    120      cmdlist: execute network node3 off
+info    120    node1/crm: node 'node3': state changed from 'online' => 'unknown'
+info    124    node3/crm: status change slave => wait_for_quorum
+info    125    node3/lrm: status change active => lost_agent_lock
+info    160    node1/crm: service 'vm:102': state changed from 'started' to 'fence'
+info    160    node1/crm: node 'node3': state changed from 'unknown' => 'fence'
+emai    160    node1/crm: FENCE: Try to fence node 'node3'
+info    166     watchdog: execute power node3 off
+info    165    node3/crm: killed by poweroff
+info    166    node3/lrm: killed by poweroff
+info    166     hardware: server 'node3' stopped by poweroff (watchdog)
+info    240    node1/crm: got lock 'ha_agent_node3_lock'
+info    240    node1/crm: fencing: acknowledged - got agent lock for node 'node3'
+info    240    node1/crm: node 'node3': state changed from 'fence' => 'unknown'
+emai    240    node1/crm: SUCCEED: fencing: acknowledged - got agent lock for node 'node3'
+info    240    node1/crm: service 'vm:102': state changed from 'fence' to 'recovery'
+info    240    node1/crm: recover service 'vm:102' from fenced node 'node3' to node 'node1'
+info    240    node1/crm: service 'vm:102': state changed from 'recovery' to 'started'  (node = node1)
+info    241    node1/lrm: starting service vm:102
+info    241    node1/lrm: service status vm:102 started
+info    720     hardware: exit simulation - done
diff --git a/src/test/test-colocation-loose-separate1/manager_status b/src/test/test-colocation-loose-separate1/manager_status
new file mode 100644
index 0000000..9e26dfe
--- /dev/null
+++ b/src/test/test-colocation-loose-separate1/manager_status
@@ -0,0 +1 @@
+{}
\ No newline at end of file
diff --git a/src/test/test-colocation-loose-separate1/rules_config b/src/test/test-colocation-loose-separate1/rules_config
new file mode 100644
index 0000000..5227309
--- /dev/null
+++ b/src/test/test-colocation-loose-separate1/rules_config
@@ -0,0 +1,4 @@
+colocation: lonely-should-vms-be
+    services vm:101,vm:102
+    affinity separate
+    strict 0
diff --git a/src/test/test-colocation-loose-separate1/service_config b/src/test/test-colocation-loose-separate1/service_config
new file mode 100644
index 0000000..6582e8c
--- /dev/null
+++ b/src/test/test-colocation-loose-separate1/service_config
@@ -0,0 +1,6 @@
+{
+    "vm:101": { "node": "node2", "state": "started" },
+    "vm:102": { "node": "node3", "state": "started" },
+    "vm:103": { "node": "node1", "state": "started" },
+    "vm:104": { "node": "node1", "state": "started" }
+}
diff --git a/src/test/test-colocation-loose-separate4/README b/src/test/test-colocation-loose-separate4/README
new file mode 100644
index 0000000..5b68cde
--- /dev/null
+++ b/src/test/test-colocation-loose-separate4/README
@@ -0,0 +1,17 @@
+Test whether a loose negative colocation rule among two services makes one of
+the services migrate to a different recovery node than the other service in
+case of a failover of the service's previously assigned node. As the service
+fails to start on the recovery node (e.g. insufficient resources), the failing
+service is relocated to another node.
+
+The test scenario is:
+- vm:101 and fa:120001 should be kept separate
+- vm:101 and fa:120001 are on node2 and node3 respectively
+- fa:120001 will fail to start on node1
+- node1 has a higher service count than node2 to test that the colocation rule
+  is applied even though the scheduler would prefer the less utilized node
+
+Therefore, the expected outcome is:
+- As node3 fails, fa:120001 is migrated to node1
+- fa:120001 will be relocated to node2, i.e. the node of vm:101, since it
+  couldn't start on its initial recovery node and the rule is only loose
diff --git a/src/test/test-colocation-loose-separate4/cmdlist b/src/test/test-colocation-loose-separate4/cmdlist
new file mode 100644
index 0000000..c0a4daa
--- /dev/null
+++ b/src/test/test-colocation-loose-separate4/cmdlist
@@ -0,0 +1,4 @@
+[
+    [ "power node1 on", "power node2 on", "power node3 on" ],
+    [ "network node3 off" ]
+]
diff --git a/src/test/test-colocation-loose-separate4/hardware_status b/src/test/test-colocation-loose-separate4/hardware_status
new file mode 100644
index 0000000..451beb1
--- /dev/null
+++ b/src/test/test-colocation-loose-separate4/hardware_status
@@ -0,0 +1,5 @@
+{
+  "node1": { "power": "off", "network": "off" },
+  "node2": { "power": "off", "network": "off" },
+  "node3": { "power": "off", "network": "off" }
+}
diff --git a/src/test/test-colocation-loose-separate4/log.expect b/src/test/test-colocation-loose-separate4/log.expect
new file mode 100644
index 0000000..bf70aca
--- /dev/null
+++ b/src/test/test-colocation-loose-separate4/log.expect
@@ -0,0 +1,73 @@
+info      0     hardware: starting simulation
+info     20      cmdlist: execute power node1 on
+info     20    node1/crm: status change startup => wait_for_quorum
+info     20    node1/lrm: status change startup => wait_for_agent_lock
+info     20      cmdlist: execute power node2 on
+info     20    node2/crm: status change startup => wait_for_quorum
+info     20    node2/lrm: status change startup => wait_for_agent_lock
+info     20      cmdlist: execute power node3 on
+info     20    node3/crm: status change startup => wait_for_quorum
+info     20    node3/lrm: status change startup => wait_for_agent_lock
+info     20    node1/crm: got lock 'ha_manager_lock'
+info     20    node1/crm: status change wait_for_quorum => master
+info     20    node1/crm: node 'node1': state changed from 'unknown' => 'online'
+info     20    node1/crm: node 'node2': state changed from 'unknown' => 'online'
+info     20    node1/crm: node 'node3': state changed from 'unknown' => 'online'
+info     20    node1/crm: adding new service 'fa:120001' on node 'node3'
+info     20    node1/crm: adding new service 'vm:101' on node 'node2'
+info     20    node1/crm: adding new service 'vm:103' on node 'node1'
+info     20    node1/crm: adding new service 'vm:104' on node 'node1'
+info     20    node1/crm: service 'fa:120001': state changed from 'request_start' to 'started'  (node = node3)
+info     20    node1/crm: service 'vm:101': state changed from 'request_start' to 'started'  (node = node2)
+info     20    node1/crm: service 'vm:103': state changed from 'request_start' to 'started'  (node = node1)
+info     20    node1/crm: service 'vm:104': state changed from 'request_start' to 'started'  (node = node1)
+info     21    node1/lrm: got lock 'ha_agent_node1_lock'
+info     21    node1/lrm: status change wait_for_agent_lock => active
+info     21    node1/lrm: starting service vm:103
+info     21    node1/lrm: service status vm:103 started
+info     21    node1/lrm: starting service vm:104
+info     21    node1/lrm: service status vm:104 started
+info     22    node2/crm: status change wait_for_quorum => slave
+info     23    node2/lrm: got lock 'ha_agent_node2_lock'
+info     23    node2/lrm: status change wait_for_agent_lock => active
+info     23    node2/lrm: starting service vm:101
+info     23    node2/lrm: service status vm:101 started
+info     24    node3/crm: status change wait_for_quorum => slave
+info     25    node3/lrm: got lock 'ha_agent_node3_lock'
+info     25    node3/lrm: status change wait_for_agent_lock => active
+info     25    node3/lrm: starting service fa:120001
+info     25    node3/lrm: service status fa:120001 started
+info    120      cmdlist: execute network node3 off
+info    120    node1/crm: node 'node3': state changed from 'online' => 'unknown'
+info    124    node3/crm: status change slave => wait_for_quorum
+info    125    node3/lrm: status change active => lost_agent_lock
+info    160    node1/crm: service 'fa:120001': state changed from 'started' to 'fence'
+info    160    node1/crm: node 'node3': state changed from 'unknown' => 'fence'
+emai    160    node1/crm: FENCE: Try to fence node 'node3'
+info    166     watchdog: execute power node3 off
+info    165    node3/crm: killed by poweroff
+info    166    node3/lrm: killed by poweroff
+info    166     hardware: server 'node3' stopped by poweroff (watchdog)
+info    240    node1/crm: got lock 'ha_agent_node3_lock'
+info    240    node1/crm: fencing: acknowledged - got agent lock for node 'node3'
+info    240    node1/crm: node 'node3': state changed from 'fence' => 'unknown'
+emai    240    node1/crm: SUCCEED: fencing: acknowledged - got agent lock for node 'node3'
+info    240    node1/crm: service 'fa:120001': state changed from 'fence' to 'recovery'
+info    240    node1/crm: recover service 'fa:120001' from fenced node 'node3' to node 'node1'
+info    240    node1/crm: service 'fa:120001': state changed from 'recovery' to 'started'  (node = node1)
+info    241    node1/lrm: starting service fa:120001
+warn    241    node1/lrm: unable to start service fa:120001
+warn    241    node1/lrm: restart policy: retry number 1 for service 'fa:120001'
+info    261    node1/lrm: starting service fa:120001
+warn    261    node1/lrm: unable to start service fa:120001
+err     261    node1/lrm: unable to start service fa:120001 on local node after 1 retries
+warn    280    node1/crm: starting service fa:120001 on node 'node1' failed, relocating service.
+info    280    node1/crm: relocate service 'fa:120001' to node 'node2'
+info    280    node1/crm: service 'fa:120001': state changed from 'started' to 'relocate'  (node = node1, target = node2)
+info    281    node1/lrm: service fa:120001 - start relocate to node 'node2'
+info    281    node1/lrm: service fa:120001 - end relocate to node 'node2'
+info    300    node1/crm: service 'fa:120001': state changed from 'relocate' to 'started'  (node = node2)
+info    303    node2/lrm: starting service fa:120001
+info    303    node2/lrm: service status fa:120001 started
+info    320    node1/crm: relocation policy successful for 'fa:120001' on node 'node2', failed nodes: node1
+info    720     hardware: exit simulation - done
diff --git a/src/test/test-colocation-loose-separate4/manager_status b/src/test/test-colocation-loose-separate4/manager_status
new file mode 100644
index 0000000..9e26dfe
--- /dev/null
+++ b/src/test/test-colocation-loose-separate4/manager_status
@@ -0,0 +1 @@
+{}
\ No newline at end of file
diff --git a/src/test/test-colocation-loose-separate4/rules_config b/src/test/test-colocation-loose-separate4/rules_config
new file mode 100644
index 0000000..8a4b869
--- /dev/null
+++ b/src/test/test-colocation-loose-separate4/rules_config
@@ -0,0 +1,4 @@
+colocation: lonely-should-vms-be
+    services vm:101,fa:120001
+    affinity separate
+    strict 0
diff --git a/src/test/test-colocation-loose-separate4/service_config b/src/test/test-colocation-loose-separate4/service_config
new file mode 100644
index 0000000..f53c2bc
--- /dev/null
+++ b/src/test/test-colocation-loose-separate4/service_config
@@ -0,0 +1,6 @@
+{
+    "vm:101": { "node": "node2", "state": "started" },
+    "fa:120001": { "node": "node3", "state": "started" },
+    "vm:103": { "node": "node1", "state": "started" },
+    "vm:104": { "node": "node1", "state": "started" }
+}
diff --git a/src/test/test-colocation-loose-together1/README b/src/test/test-colocation-loose-together1/README
new file mode 100644
index 0000000..2f5aeec
--- /dev/null
+++ b/src/test/test-colocation-loose-together1/README
@@ -0,0 +1,11 @@
+Test whether a loose positive colocation rule makes two services migrate to
+the same recovery node in case of a failover of their previously assigned node.
+
+The test scenario is:
+- vm:101 and vm:102 should be kept together
+- vm:101 and vm:102 are both currently running on node3
+- node1 and node2 have the same service count to test that the rule is applied
+  even though they would usually be balanced between both remaining nodes
+
+Therefore, the expected outcome is:
+- As node3 fails, both services are migrated to node1
diff --git a/src/test/test-colocation-loose-together1/cmdlist b/src/test/test-colocation-loose-together1/cmdlist
new file mode 100644
index 0000000..c0a4daa
--- /dev/null
+++ b/src/test/test-colocation-loose-together1/cmdlist
@@ -0,0 +1,4 @@
+[
+    [ "power node1 on", "power node2 on", "power node3 on" ],
+    [ "network node3 off" ]
+]
diff --git a/src/test/test-colocation-loose-together1/hardware_status b/src/test/test-colocation-loose-together1/hardware_status
new file mode 100644
index 0000000..451beb1
--- /dev/null
+++ b/src/test/test-colocation-loose-together1/hardware_status
@@ -0,0 +1,5 @@
+{
+  "node1": { "power": "off", "network": "off" },
+  "node2": { "power": "off", "network": "off" },
+  "node3": { "power": "off", "network": "off" }
+}
diff --git a/src/test/test-colocation-loose-together1/log.expect b/src/test/test-colocation-loose-together1/log.expect
new file mode 100644
index 0000000..7d43314
--- /dev/null
+++ b/src/test/test-colocation-loose-together1/log.expect
@@ -0,0 +1,66 @@
+info      0     hardware: starting simulation
+info     20      cmdlist: execute power node1 on
+info     20    node1/crm: status change startup => wait_for_quorum
+info     20    node1/lrm: status change startup => wait_for_agent_lock
+info     20      cmdlist: execute power node2 on
+info     20    node2/crm: status change startup => wait_for_quorum
+info     20    node2/lrm: status change startup => wait_for_agent_lock
+info     20      cmdlist: execute power node3 on
+info     20    node3/crm: status change startup => wait_for_quorum
+info     20    node3/lrm: status change startup => wait_for_agent_lock
+info     20    node1/crm: got lock 'ha_manager_lock'
+info     20    node1/crm: status change wait_for_quorum => master
+info     20    node1/crm: node 'node1': state changed from 'unknown' => 'online'
+info     20    node1/crm: node 'node2': state changed from 'unknown' => 'online'
+info     20    node1/crm: node 'node3': state changed from 'unknown' => 'online'
+info     20    node1/crm: adding new service 'vm:101' on node 'node3'
+info     20    node1/crm: adding new service 'vm:102' on node 'node3'
+info     20    node1/crm: adding new service 'vm:103' on node 'node1'
+info     20    node1/crm: adding new service 'vm:104' on node 'node2'
+info     20    node1/crm: service 'vm:101': state changed from 'request_start' to 'started'  (node = node3)
+info     20    node1/crm: service 'vm:102': state changed from 'request_start' to 'started'  (node = node3)
+info     20    node1/crm: service 'vm:103': state changed from 'request_start' to 'started'  (node = node1)
+info     20    node1/crm: service 'vm:104': state changed from 'request_start' to 'started'  (node = node2)
+info     21    node1/lrm: got lock 'ha_agent_node1_lock'
+info     21    node1/lrm: status change wait_for_agent_lock => active
+info     21    node1/lrm: starting service vm:103
+info     21    node1/lrm: service status vm:103 started
+info     22    node2/crm: status change wait_for_quorum => slave
+info     23    node2/lrm: got lock 'ha_agent_node2_lock'
+info     23    node2/lrm: status change wait_for_agent_lock => active
+info     23    node2/lrm: starting service vm:104
+info     23    node2/lrm: service status vm:104 started
+info     24    node3/crm: status change wait_for_quorum => slave
+info     25    node3/lrm: got lock 'ha_agent_node3_lock'
+info     25    node3/lrm: status change wait_for_agent_lock => active
+info     25    node3/lrm: starting service vm:101
+info     25    node3/lrm: service status vm:101 started
+info     25    node3/lrm: starting service vm:102
+info     25    node3/lrm: service status vm:102 started
+info    120      cmdlist: execute network node3 off
+info    120    node1/crm: node 'node3': state changed from 'online' => 'unknown'
+info    124    node3/crm: status change slave => wait_for_quorum
+info    125    node3/lrm: status change active => lost_agent_lock
+info    160    node1/crm: service 'vm:101': state changed from 'started' to 'fence'
+info    160    node1/crm: service 'vm:102': state changed from 'started' to 'fence'
+info    160    node1/crm: node 'node3': state changed from 'unknown' => 'fence'
+emai    160    node1/crm: FENCE: Try to fence node 'node3'
+info    166     watchdog: execute power node3 off
+info    165    node3/crm: killed by poweroff
+info    166    node3/lrm: killed by poweroff
+info    166     hardware: server 'node3' stopped by poweroff (watchdog)
+info    240    node1/crm: got lock 'ha_agent_node3_lock'
+info    240    node1/crm: fencing: acknowledged - got agent lock for node 'node3'
+info    240    node1/crm: node 'node3': state changed from 'fence' => 'unknown'
+emai    240    node1/crm: SUCCEED: fencing: acknowledged - got agent lock for node 'node3'
+info    240    node1/crm: service 'vm:101': state changed from 'fence' to 'recovery'
+info    240    node1/crm: service 'vm:102': state changed from 'fence' to 'recovery'
+info    240    node1/crm: recover service 'vm:101' from fenced node 'node3' to node 'node1'
+info    240    node1/crm: service 'vm:101': state changed from 'recovery' to 'started'  (node = node1)
+info    240    node1/crm: recover service 'vm:102' from fenced node 'node3' to node 'node1'
+info    240    node1/crm: service 'vm:102': state changed from 'recovery' to 'started'  (node = node1)
+info    241    node1/lrm: starting service vm:101
+info    241    node1/lrm: service status vm:101 started
+info    241    node1/lrm: starting service vm:102
+info    241    node1/lrm: service status vm:102 started
+info    720     hardware: exit simulation - done
diff --git a/src/test/test-colocation-loose-together1/manager_status b/src/test/test-colocation-loose-together1/manager_status
new file mode 100644
index 0000000..9e26dfe
--- /dev/null
+++ b/src/test/test-colocation-loose-together1/manager_status
@@ -0,0 +1 @@
+{}
\ No newline at end of file
diff --git a/src/test/test-colocation-loose-together1/rules_config b/src/test/test-colocation-loose-together1/rules_config
new file mode 100644
index 0000000..37f6aab
--- /dev/null
+++ b/src/test/test-colocation-loose-together1/rules_config
@@ -0,0 +1,4 @@
+colocation: vms-might-stick-together
+    services vm:101,vm:102
+    affinity together
+    strict 0
diff --git a/src/test/test-colocation-loose-together1/service_config b/src/test/test-colocation-loose-together1/service_config
new file mode 100644
index 0000000..9fb091d
--- /dev/null
+++ b/src/test/test-colocation-loose-together1/service_config
@@ -0,0 +1,6 @@
+{
+    "vm:101": { "node": "node3", "state": "started" },
+    "vm:102": { "node": "node3", "state": "started" },
+    "vm:103": { "node": "node1", "state": "started" },
+    "vm:104": { "node": "node2", "state": "started" }
+}
diff --git a/src/test/test-colocation-loose-together3/README b/src/test/test-colocation-loose-together3/README
new file mode 100644
index 0000000..c2aebcf
--- /dev/null
+++ b/src/test/test-colocation-loose-together3/README
@@ -0,0 +1,16 @@
+Test whether a loose positive colocation rule makes three services migrate to
+the same recovery node in case of a failover of their previously assigned node.
+If one of those fails to start on the recovery node (e.g. insufficient
+resources), the failed service will be relocated to another node.
+
+The test scenario is:
+- vm:101, vm:102, and fa:120002 should be kept together
+- vm:101, vm:102, and fa:120002 are all currently running on node3
+- fa:120002 will fail to start on node2
+- node1 has a higher service count than node2 to test that the rule is applied
+  even though the services would usually be spread across both remaining nodes
+
+Therefore, the expected outcome is:
+- As node3 fails, all services are migrated to node2
+- Two of those services will start successfully, but fa:120002 will be
+  relocated to node1, since it couldn't start on the shared recovery node
diff --git a/src/test/test-colocation-loose-together3/cmdlist b/src/test/test-colocation-loose-together3/cmdlist
new file mode 100644
index 0000000..c0a4daa
--- /dev/null
+++ b/src/test/test-colocation-loose-together3/cmdlist
@@ -0,0 +1,4 @@
+[
+    [ "power node1 on", "power node2 on", "power node3 on" ],
+    [ "network node3 off" ]
+]
diff --git a/src/test/test-colocation-loose-together3/hardware_status b/src/test/test-colocation-loose-together3/hardware_status
new file mode 100644
index 0000000..451beb1
--- /dev/null
+++ b/src/test/test-colocation-loose-together3/hardware_status
@@ -0,0 +1,5 @@
+{
+  "node1": { "power": "off", "network": "off" },
+  "node2": { "power": "off", "network": "off" },
+  "node3": { "power": "off", "network": "off" }
+}
diff --git a/src/test/test-colocation-loose-together3/log.expect b/src/test/test-colocation-loose-together3/log.expect
new file mode 100644
index 0000000..6ca8053
--- /dev/null
+++ b/src/test/test-colocation-loose-together3/log.expect
@@ -0,0 +1,93 @@
+info      0     hardware: starting simulation
+info     20      cmdlist: execute power node1 on
+info     20    node1/crm: status change startup => wait_for_quorum
+info     20    node1/lrm: status change startup => wait_for_agent_lock
+info     20      cmdlist: execute power node2 on
+info     20    node2/crm: status change startup => wait_for_quorum
+info     20    node2/lrm: status change startup => wait_for_agent_lock
+info     20      cmdlist: execute power node3 on
+info     20    node3/crm: status change startup => wait_for_quorum
+info     20    node3/lrm: status change startup => wait_for_agent_lock
+info     20    node1/crm: got lock 'ha_manager_lock'
+info     20    node1/crm: status change wait_for_quorum => master
+info     20    node1/crm: node 'node1': state changed from 'unknown' => 'online'
+info     20    node1/crm: node 'node2': state changed from 'unknown' => 'online'
+info     20    node1/crm: node 'node3': state changed from 'unknown' => 'online'
+info     20    node1/crm: adding new service 'fa:120002' on node 'node3'
+info     20    node1/crm: adding new service 'vm:101' on node 'node3'
+info     20    node1/crm: adding new service 'vm:102' on node 'node3'
+info     20    node1/crm: adding new service 'vm:104' on node 'node1'
+info     20    node1/crm: adding new service 'vm:105' on node 'node1'
+info     20    node1/crm: adding new service 'vm:106' on node 'node2'
+info     20    node1/crm: service 'fa:120002': state changed from 'request_start' to 'started'  (node = node3)
+info     20    node1/crm: service 'vm:101': state changed from 'request_start' to 'started'  (node = node3)
+info     20    node1/crm: service 'vm:102': state changed from 'request_start' to 'started'  (node = node3)
+info     20    node1/crm: service 'vm:104': state changed from 'request_start' to 'started'  (node = node1)
+info     20    node1/crm: service 'vm:105': state changed from 'request_start' to 'started'  (node = node1)
+info     20    node1/crm: service 'vm:106': state changed from 'request_start' to 'started'  (node = node2)
+info     21    node1/lrm: got lock 'ha_agent_node1_lock'
+info     21    node1/lrm: status change wait_for_agent_lock => active
+info     21    node1/lrm: starting service vm:104
+info     21    node1/lrm: service status vm:104 started
+info     21    node1/lrm: starting service vm:105
+info     21    node1/lrm: service status vm:105 started
+info     22    node2/crm: status change wait_for_quorum => slave
+info     23    node2/lrm: got lock 'ha_agent_node2_lock'
+info     23    node2/lrm: status change wait_for_agent_lock => active
+info     23    node2/lrm: starting service vm:106
+info     23    node2/lrm: service status vm:106 started
+info     24    node3/crm: status change wait_for_quorum => slave
+info     25    node3/lrm: got lock 'ha_agent_node3_lock'
+info     25    node3/lrm: status change wait_for_agent_lock => active
+info     25    node3/lrm: starting service fa:120002
+info     25    node3/lrm: service status fa:120002 started
+info     25    node3/lrm: starting service vm:101
+info     25    node3/lrm: service status vm:101 started
+info     25    node3/lrm: starting service vm:102
+info     25    node3/lrm: service status vm:102 started
+info    120      cmdlist: execute network node3 off
+info    120    node1/crm: node 'node3': state changed from 'online' => 'unknown'
+info    124    node3/crm: status change slave => wait_for_quorum
+info    125    node3/lrm: status change active => lost_agent_lock
+info    160    node1/crm: service 'fa:120002': state changed from 'started' to 'fence'
+info    160    node1/crm: service 'vm:101': state changed from 'started' to 'fence'
+info    160    node1/crm: service 'vm:102': state changed from 'started' to 'fence'
+info    160    node1/crm: node 'node3': state changed from 'unknown' => 'fence'
+emai    160    node1/crm: FENCE: Try to fence node 'node3'
+info    166     watchdog: execute power node3 off
+info    165    node3/crm: killed by poweroff
+info    166    node3/lrm: killed by poweroff
+info    166     hardware: server 'node3' stopped by poweroff (watchdog)
+info    240    node1/crm: got lock 'ha_agent_node3_lock'
+info    240    node1/crm: fencing: acknowledged - got agent lock for node 'node3'
+info    240    node1/crm: node 'node3': state changed from 'fence' => 'unknown'
+emai    240    node1/crm: SUCCEED: fencing: acknowledged - got agent lock for node 'node3'
+info    240    node1/crm: service 'fa:120002': state changed from 'fence' to 'recovery'
+info    240    node1/crm: service 'vm:101': state changed from 'fence' to 'recovery'
+info    240    node1/crm: service 'vm:102': state changed from 'fence' to 'recovery'
+info    240    node1/crm: recover service 'fa:120002' from fenced node 'node3' to node 'node2'
+info    240    node1/crm: service 'fa:120002': state changed from 'recovery' to 'started'  (node = node2)
+info    240    node1/crm: recover service 'vm:101' from fenced node 'node3' to node 'node2'
+info    240    node1/crm: service 'vm:101': state changed from 'recovery' to 'started'  (node = node2)
+info    240    node1/crm: recover service 'vm:102' from fenced node 'node3' to node 'node2'
+info    240    node1/crm: service 'vm:102': state changed from 'recovery' to 'started'  (node = node2)
+info    243    node2/lrm: starting service fa:120002
+warn    243    node2/lrm: unable to start service fa:120002
+warn    243    node2/lrm: restart policy: retry number 1 for service 'fa:120002'
+info    243    node2/lrm: starting service vm:101
+info    243    node2/lrm: service status vm:101 started
+info    243    node2/lrm: starting service vm:102
+info    243    node2/lrm: service status vm:102 started
+info    263    node2/lrm: starting service fa:120002
+warn    263    node2/lrm: unable to start service fa:120002
+err     263    node2/lrm: unable to start service fa:120002 on local node after 1 retries
+warn    280    node1/crm: starting service fa:120002 on node 'node2' failed, relocating service.
+info    280    node1/crm: relocate service 'fa:120002' to node 'node1'
+info    280    node1/crm: service 'fa:120002': state changed from 'started' to 'relocate'  (node = node2, target = node1)
+info    283    node2/lrm: service fa:120002 - start relocate to node 'node1'
+info    283    node2/lrm: service fa:120002 - end relocate to node 'node1'
+info    300    node1/crm: service 'fa:120002': state changed from 'relocate' to 'started'  (node = node1)
+info    301    node1/lrm: starting service fa:120002
+info    301    node1/lrm: service status fa:120002 started
+info    320    node1/crm: relocation policy successful for 'fa:120002' on node 'node1', failed nodes: node2
+info    720     hardware: exit simulation - done
diff --git a/src/test/test-colocation-loose-together3/manager_status b/src/test/test-colocation-loose-together3/manager_status
new file mode 100644
index 0000000..0967ef4
--- /dev/null
+++ b/src/test/test-colocation-loose-together3/manager_status
@@ -0,0 +1 @@
+{}
diff --git a/src/test/test-colocation-loose-together3/rules_config b/src/test/test-colocation-loose-together3/rules_config
new file mode 100644
index 0000000..b43c087
--- /dev/null
+++ b/src/test/test-colocation-loose-together3/rules_config
@@ -0,0 +1,4 @@
+colocation: vms-might-stick-together
+    services vm:101,vm:102,fa:120002
+    affinity together
+    strict 0
diff --git a/src/test/test-colocation-loose-together3/service_config b/src/test/test-colocation-loose-together3/service_config
new file mode 100644
index 0000000..3ce5f27
--- /dev/null
+++ b/src/test/test-colocation-loose-together3/service_config
@@ -0,0 +1,8 @@
+{
+    "vm:101": { "node": "node3", "state": "started" },
+    "vm:102": { "node": "node3", "state": "started" },
+    "fa:120002": { "node": "node3", "state": "started" },
+    "vm:104": { "node": "node1", "state": "started" },
+    "vm:105": { "node": "node1", "state": "started" },
+    "vm:106": { "node": "node2", "state": "started" }
+}
-- 
2.39.5