[pve-devel] [PATCH ha-manager v2 21/26] manager: handle negative colocations with too many services

Daniel Kral d.kral at proxmox.com
Tue Jul 1 14:23:29 CEST 2025


On 7/1/25 14:11, Michael Köppl wrote:
> On 6/20/25 16:31, Daniel Kral wrote:
>> select_service_node(...) in 'none' mode will usually only return no
>> node, if negative colocations specify more services than nodes
>> available. In these cases, these cannot be separated as there are no
>> more nodes left, so these are put in error state for now.
>>
>> Signed-off-by: Daniel Kral <d.kral at proxmox.com>
>> ---
>> This is not ideal and I'd rather make this be dropped in the
>> check_feasibility(...) part, but then we'd need to introduce more state
>> to the check helpers or make a direct call to
> 
> This also affects cases where it's not entirely clear why a service is
> put into error state. One such case is having a "together" colocation
> rule for VMs 100 and 101 and also defining a location rule that says
> that VM 100 has to be on a specific node A. VM 100 will then go into an
> error state. From the user's perspective, it is not really transparent
> why this happens. Could be that I just made a wrong assumption about
> this, but I would've expected VM 100 to be migrated and, due to the
> colocation rule, 101 also being migrated to the specified node A, which
> is what would happen if migrated manually.
> 
> As discussed off-list, one approach to solve this could be to ask users
> to create a location rule for each service involved in the "together"
> colocation rule upon its creation. As an example:
> 
> - 100 has a location rule defined for node A
> - User tries to create a colocation rule for 100 and 101
> - Dialog asks user to first create a location rule for 101 and node A
> 
> With a large number of services this could become tedious, but it would
> make combining location and colocation rules for the scenario described
> above more explicit and reduce complexity in resolving and applying the
> rules.

Right, as already anticipated and discussed off-list, putting the service
into the error state creates more trouble than necessary and is both
confusing to and unwanted by end users. I'll remove that in v3 as well.
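
For illustration, the case from the commit message (a negative
colocation naming more services than there are nodes) could be caught
before any node selection happens. This is only a minimal sketch in
Python, not the ha-manager code: the function name and the rule layout
(a mapping from rule id to a set of service ids) are made up for the
example.

```python
# Hypothetical data layout: rule id -> set of service ids that must be
# kept on separate nodes. This is not the real ha-manager structure.
def infeasible_negative_rules(negative_rules, online_nodes):
    """Return the ids of negative colocation rules that cannot be satisfied.

    A rule that separates N services needs at least N nodes, so any rule
    naming more services than there are online nodes is infeasible.
    """
    node_count = len(set(online_nodes))
    return sorted(
        rule_id
        for rule_id, services in negative_rules.items()
        if len(services) > node_count
    )


negative_rules = {
    "spread-db": {"vm:100", "vm:101", "vm:102", "vm:103"},
    "spread-web": {"vm:200", "vm:201"},
}
online_nodes = ["node1", "node2", "node3"]

# "spread-db" names four services but only three nodes are online.
print(infeasible_negative_rules(negative_rules, online_nodes))
```

Such a check only reports the rule; what to do with it afterwards
(drop it, warn, ...) is a separate decision.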

I'd also rather restrict these combinations up front (i.e., in a rule
checker), so that users need to specify the node affinity for _all_
services that are in a positive service affinity rule, as it is
otherwise rather ambiguous what should be done. More on that in my
self-reply for ha-manager patch #15.
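
To make the idea concrete, here is a rough sketch of such a check in
Python. It is not the ha-manager implementation: the function name and
the data layout are hypothetical, and it assumes that a positive rule
is only ambiguous when some, but not all, of its services have a node
affinity.

```python
# Hypothetical data layout, not the real ha-manager structures:
#   positive_rules: rule id -> set of service ids that must stay together
#   node_affinity:  service id -> set of nodes the service is bound to
def find_ambiguous_positive_rules(positive_rules, node_affinity):
    """Map each ambiguous rule id to its services lacking a node affinity."""
    ambiguous = {}
    for rule_id, services in positive_rules.items():
        pinned = [s for s in services if s in node_affinity]
        unpinned = sorted(s for s in services if s not in node_affinity)
        # Assumption: only rules mixing pinned and unpinned services are
        # rejected; rules without any node affinity are left alone.
        if pinned and unpinned:
            ambiguous[rule_id] = unpinned
    return ambiguous


# The scenario from above: 100 is bound to node A, 101 is not.
positive_rules = {"keep-together": {"vm:100", "vm:101"}}
node_affinity = {"vm:100": {"nodeA"}}

print(find_ambiguous_positive_rules(positive_rules, node_affinity))
# {'keep-together': ['vm:101']}
```

The returned list could then be used to tell the user which services
still need a node affinity before the rule is accepted.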

We can still remove that restriction later and do some inference, but as
already discussed off-list, I think that gets rather confusing with an
increasing number of rules. Removing ambiguity from the start for both
the user and the HA Manager is a benefit IMO.



