[pve-devel] [PATCH ha-manager 13/15] test: ha tester: add test cases for loose colocation rules

Daniel Kral d.kral at proxmox.com
Fri May 9 13:20:50 CEST 2025


On 4/28/25 16:44, Fiona Ebner wrote:
> Am 25.03.25 um 16:12 schrieb Daniel Kral:
>> Add test cases for loose positive and negative colocation rules, i.e.
>> where services should be kept on the same node together or kept on
>> separate nodes. These are copies of their strict counterpart tests, but verify
>> the behavior if the colocation rule cannot be met, i.e. not adhering to
>> the colocation rule. The test scenarios are:
>>
>> - 2 neg. colocated services in a 3 node cluster; 1 node failing
>> - 2 neg. colocated services in a 3 node cluster; 1 node failing, but the
>>    recovery node cannot start the service
>> - 2 pos. colocated services in a 3 node cluster; 1 node failing
>> - 3 pos. colocated services in a 3 node cluster; 1 node failing, but the
>>    recovery node cannot start one of the services
>>
>> Signed-off-by: Daniel Kral <d.kral at proxmox.com>
> 
> With the errors in the descriptions fixed:
> 
> Reviewed-by: Fiona Ebner <f.ebner at proxmox.com>

ACK

> 
>> diff --git a/src/test/test-colocation-loose-separate4/README b/src/test/test-colocation-loose-separate4/README
> 
> Not sure it should be named the same number as the strict test just
> because it's adapted from that.

Me neither... I'll just number them consecutively in the next revision.
If we wanted to be exhaustive, we could run all/most of the test cases
for the strict colocation rules against the loose colocation rules as
well, but I figure that would be a waste of resources when running the
test suite / building the package, as it's mostly duplicate code.

In general, it'd sure be great to have a better overview of what the
current test cases already cover, as the directory names can only get so
long and give only so much of a description of what's tested, and going
through every README is also a hassle. But that's a whole other topic.
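
For example, something along these lines (just a rough sketch, assuming
the src/test/test-*/README layout used in this series) could dump a
one-line summary per test case instead of opening each README by hand:

#!/usr/bin/perl
# Print the test directory name and the first line of its README as a
# quick coverage overview. Paths are an assumption based on the diff
# paths in this series; adjust as needed.
use strict;
use warnings;

for my $readme (sort glob('src/test/test-*/README')) {
    my ($dir) = $readme =~ m!src/test/([^/]+)/README!;
    open(my $fh, '<', $readme) or next;
    my $first = <$fh> // '';
    chomp $first;
    close($fh);
    printf "%-45s %s\n", $dir, $first;
}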

> 
>> new file mode 100644
>> index 0000000..5b68cde
>> --- /dev/null
>> +++ b/src/test/test-colocation-loose-separate4/README
>> @@ -0,0 +1,17 @@
>> +Test whether a loose negative colocation rule among two services makes one of
>> +the services migrate to a different recovery node than the other service in
>> +case of a failover of service's previously assigned node. As the service fails
>> +to start on the recovery node (e.g. insufficient resources), the failing
>> +service is kept on the recovery node.
> 
> The description here is wrong. It will be started on a different node
> after the start failure.

ACK

> 
>> +
>> +The test scenario is:
>> +- vm:101 and fa:120001 should be kept separate
>> +- vm:101 and fa:120001 are on node2 and node3 respectively
>> +- fa:120001 will fail to start on node1
>> +- node1 has a higher service count than node2 to test the colocation rule is
>> +  applied even though the scheduler would prefer the less utilized node
>> +
>> +Therefore, the expected outcome is:
>> +- As node3 fails, fa:120001 is migrated to node1
>> +- fa:120001 will be relocated to another node, since it couldn't start on its
>> +  initial recovery node

I'll also mention the node it is relocated to here, so that it is clear
that loose colocation rules are free to ignore the rule if adhering to
it would otherwise mean that the service is kept in recovery state.

> 
> ---snip 8<---
> 
>> diff --git a/src/test/test-colocation-loose-together1/README b/src/test/test-colocation-loose-together1/README
>> new file mode 100644
>> index 0000000..2f5aeec
>> --- /dev/null
>> +++ b/src/test/test-colocation-loose-together1/README
>> @@ -0,0 +1,11 @@
>> +Test whether a loose positive colocation rule makes two services migrate to
>> +the same recovery node in case of a failover of their previously assigned node.
>> +
>> +The test scenario is:
>> +- vm:101 and vm:102 should be kept together
>> +- vm:101 and vm:102 are both currently running on node3
>> +- node1 and node2 have the same service count to test that the rule is applied
>> +  even though it would be usually balanced between both remaining nodes
>> +
>> +Therefore, the expected outcome is:
>> +- As node3 fails, both services are migrated to node2
> 
> It's actually node1

ACK



