[pve-devel] [PATCH ha-manager v2 21/26] manager: handle negative colocations with too many services

Michael Köppl m.koeppl at proxmox.com
Tue Jul 1 14:11:54 CEST 2025


On 6/20/25 16:31, Daniel Kral wrote:
> select_service_node(...) in 'none' mode will usually only return no
> node, if negative colocations specify more services than nodes
> available. In these cases, these cannot be separated as there are no
> more nodes left, so these are put in error state for now.
> 
> Signed-off-by: Daniel Kral <d.kral at proxmox.com>
> ---
> This is not ideal and I'd rather make this be dropped in the
> check_feasibility(...) part, but then we'd need to introduce more state
> to the check helpers or make a direct call to

This also affects cases where it's not entirely clear why a service is
put into error state. One such case is having a "together" colocation
rule for VMs 100 and 101 while also defining a location rule that pins
VM 100 to a specific node A. VM 100 then goes into an error state, and
from the user's perspective it is not really transparent why this
happens. It could be that I made a wrong assumption here, but I would
have expected VM 100 to be migrated to node A and, due to the
colocation rule, VM 101 to be migrated along with it, which is what
would happen if both were migrated manually.

As discussed off-list, one approach to solving this could be to ask
users to create a location rule for each service involved in the
"together" colocation rule when the colocation rule is created. As an
example:

- 100 has a location rule defined for node A
- User tries to create a colocation rule for 100 and 101
- Dialog asks user to first create a location rule for 101 and node A

With a large number of services this could become tedious, but it would
make combining location and colocation rules for the scenario described
above more explicit and reduce complexity in resolving and applying the
rules.
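
As a rough sketch of the check such a dialog (or the API) could run
before accepting a new "together" colocation rule, something along
these lines; the data structures are made up for illustration and do
not match the actual rules plugin:

    use strict;
    use warnings;

    # Check that every service of a prospective 'together' colocation rule
    # already has a location rule, and that all of them point to the same
    # node. Returns 1 on success, dies with the list of missing services
    # otherwise, so the GUI could offer to create those rules first.
    sub check_together_rule_has_location_rules {
        my ($colocation_services, $location_rules) = @_;
        # $colocation_services: array ref of sids, e.g. ['vm:100', 'vm:101']
        # $location_rules: hash ref mapping sid => node, e.g. { 'vm:100' => 'nodeA' }

        my %nodes = ();
        my @missing = ();
        for my $sid (@$colocation_services) {
            my $node = $location_rules->{$sid};
            if (defined($node)) {
                $nodes{$node} = 1;
            } else {
                push @missing, $sid;
            }
        }

        die "no location rule for service(s): @missing\n" if @missing;
        die "location rules pin the services to different nodes\n"
            if scalar(keys %nodes) > 1;

        return 1;
    }

In the example above, the check would fail with vm:101 in the missing
list, and the dialog could then ask the user to create a location rule
for vm:101 and node A before accepting the colocation rule.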

> PVE::Cluster::get_nodelist(...).
> 
> changes since v1:
>     - NEW!
> 
>  src/PVE/HA/Manager.pm                         | 13 +++++
>  .../test-colocation-strict-separate9/README   | 14 +++++
>  .../test-colocation-strict-separate9/cmdlist  |  3 +
>  .../hardware_status                           |  5 ++
>  .../log.expect                                | 57 +++++++++++++++++++
>  .../manager_status                            |  1 +
>  .../rules_config                              |  3 +
>  .../service_config                            |  7 +++
>  8 files changed, 103 insertions(+)
>  create mode 100644 src/test/test-colocation-strict-separate9/README
>  create mode 100644 src/test/test-colocation-strict-separate9/cmdlist
>  create mode 100644 src/test/test-colocation-strict-separate9/hardware_status
>  create mode 100644 src/test/test-colocation-strict-separate9/log.expect
>  create mode 100644 src/test/test-colocation-strict-separate9/manager_status
>  create mode 100644 src/test/test-colocation-strict-separate9/rules_config
>  create mode 100644 src/test/test-colocation-strict-separate9/service_config
> 
> diff --git a/src/PVE/HA/Manager.pm b/src/PVE/HA/Manager.pm
> index 66e5710..59b2998 100644
> --- a/src/PVE/HA/Manager.pm
> +++ b/src/PVE/HA/Manager.pm
> @@ -1092,6 +1092,19 @@ sub next_state_started {
>                          );
>                          delete $sd->{maintenance_node};
>                      }
> +                } elsif ($select_mode eq 'none' && !defined($node)) {
> +                    # Having no node here means that the service is started but cannot find any
> +                    # node it is allowed to run on, e.g. added negative colocation rule, while the
> +                    # nodes aren't separated yet.
> +                    # TODO Could be made impossible by a dynamic check to drop negative colocation
> +                    #      rules which have defined more services than available nodes
> +                    $haenv->log(
> +                        'err',
> +                        "service '$sid' cannot run on '$sd->{node}', but no recovery node found",
> +                    );
> +
> +                    # TODO Should this really move the service to the error state?
> +                    $change_service_state->($self, $sid, 'error');
>                  }
>  
>                  # ensure service get started again if it went unexpected down
> diff --git a/src/test/test-colocation-strict-separate9/README b/src/test/test-colocation-strict-separate9/README
> new file mode 100644
> index 0000000..85494dd
> --- /dev/null
> +++ b/src/test/test-colocation-strict-separate9/README
> @@ -0,0 +1,14 @@
> +Test whether a strict negative colocation rule among five services on a three
> +node cluster, makes the services which are on the same node be put in error
> +state as there are not enough nodes to separate all of them and it's also not
> +clear which of the three is more important to run.
> +
> +The test scenario is:
> +- vm:101 through vm:105 must be kept separate
> +- vm:101 through vm:105 are all running on node1
> +
> +The expected outcome is:
> +- As the cluster comes up, vm:102 and vm:103 are migrated to node2 and node3
> +- vm:101, vm:104, and vm:105 will be put in error state as there are not enough
> +  nodes left to separate them but it is also not clear which service is more
> +  important to be run on the only node left.
> diff --git a/src/test/test-colocation-strict-separate9/cmdlist b/src/test/test-colocation-strict-separate9/cmdlist
> new file mode 100644
> index 0000000..3bfad44
> --- /dev/null
> +++ b/src/test/test-colocation-strict-separate9/cmdlist
> @@ -0,0 +1,3 @@
> +[
> +    [ "power node1 on", "power node2 on", "power node3 on"]
> +]
> diff --git a/src/test/test-colocation-strict-separate9/hardware_status b/src/test/test-colocation-strict-separate9/hardware_status
> new file mode 100644
> index 0000000..451beb1
> --- /dev/null
> +++ b/src/test/test-colocation-strict-separate9/hardware_status
> @@ -0,0 +1,5 @@
> +{
> +  "node1": { "power": "off", "network": "off" },
> +  "node2": { "power": "off", "network": "off" },
> +  "node3": { "power": "off", "network": "off" }
> +}
> diff --git a/src/test/test-colocation-strict-separate9/log.expect b/src/test/test-colocation-strict-separate9/log.expect
> new file mode 100644
> index 0000000..efe85a2
> --- /dev/null
> +++ b/src/test/test-colocation-strict-separate9/log.expect
> @@ -0,0 +1,57 @@
> +info      0     hardware: starting simulation
> +info     20      cmdlist: execute power node1 on
> +info     20    node1/crm: status change startup => wait_for_quorum
> +info     20    node1/lrm: status change startup => wait_for_agent_lock
> +info     20      cmdlist: execute power node2 on
> +info     20    node2/crm: status change startup => wait_for_quorum
> +info     20    node2/lrm: status change startup => wait_for_agent_lock
> +info     20      cmdlist: execute power node3 on
> +info     20    node3/crm: status change startup => wait_for_quorum
> +info     20    node3/lrm: status change startup => wait_for_agent_lock
> +info     20    node1/crm: got lock 'ha_manager_lock'
> +info     20    node1/crm: status change wait_for_quorum => master
> +info     20    node1/crm: node 'node1': state changed from 'unknown' => 'online'
> +info     20    node1/crm: node 'node2': state changed from 'unknown' => 'online'
> +info     20    node1/crm: node 'node3': state changed from 'unknown' => 'online'
> +info     20    node1/crm: adding new service 'vm:101' on node 'node1'
> +info     20    node1/crm: adding new service 'vm:102' on node 'node1'
> +info     20    node1/crm: adding new service 'vm:103' on node 'node1'
> +info     20    node1/crm: adding new service 'vm:104' on node 'node1'
> +info     20    node1/crm: adding new service 'vm:105' on node 'node1'
> +info     20    node1/crm: service 'vm:101': state changed from 'request_start' to 'started'  (node = node1)
> +info     20    node1/crm: service 'vm:102': state changed from 'request_start' to 'started'  (node = node1)
> +info     20    node1/crm: service 'vm:103': state changed from 'request_start' to 'started'  (node = node1)
> +info     20    node1/crm: service 'vm:104': state changed from 'request_start' to 'started'  (node = node1)
> +info     20    node1/crm: service 'vm:105': state changed from 'request_start' to 'started'  (node = node1)
> +info     20    node1/crm: migrate service 'vm:101' to node 'node2' (running)
> +info     20    node1/crm: service 'vm:101': state changed from 'started' to 'migrate'  (node = node1, target = node2)
> +info     20    node1/crm: migrate service 'vm:102' to node 'node3' (running)
> +info     20    node1/crm: service 'vm:102': state changed from 'started' to 'migrate'  (node = node1, target = node3)
> +err      20    node1/crm: service 'vm:103' cannot run on 'node1', but no recovery node found
> +info     20    node1/crm: service 'vm:103': state changed from 'started' to 'error'
> +err      20    node1/crm: service 'vm:104' cannot run on 'node1', but no recovery node found
> +info     20    node1/crm: service 'vm:104': state changed from 'started' to 'error'
> +err      20    node1/crm: service 'vm:105' cannot run on 'node1', but no recovery node found
> +info     20    node1/crm: service 'vm:105': state changed from 'started' to 'error'
> +info     21    node1/lrm: got lock 'ha_agent_node1_lock'
> +info     21    node1/lrm: status change wait_for_agent_lock => active
> +info     21    node1/lrm: service vm:101 - start migrate to node 'node2'
> +info     21    node1/lrm: service vm:101 - end migrate to node 'node2'
> +info     21    node1/lrm: service vm:102 - start migrate to node 'node3'
> +info     21    node1/lrm: service vm:102 - end migrate to node 'node3'
> +err      21    node1/lrm: service vm:103 is in an error state and needs manual intervention. Look up 'ERROR RECOVERY' in the documentation.
> +err      21    node1/lrm: service vm:104 is in an error state and needs manual intervention. Look up 'ERROR RECOVERY' in the documentation.
> +err      21    node1/lrm: service vm:105 is in an error state and needs manual intervention. Look up 'ERROR RECOVERY' in the documentation.
> +info     22    node2/crm: status change wait_for_quorum => slave
> +info     24    node3/crm: status change wait_for_quorum => slave
> +info     40    node1/crm: service 'vm:101': state changed from 'migrate' to 'started'  (node = node2)
> +info     40    node1/crm: service 'vm:102': state changed from 'migrate' to 'started'  (node = node3)
> +info     43    node2/lrm: got lock 'ha_agent_node2_lock'
> +info     43    node2/lrm: status change wait_for_agent_lock => active
> +info     43    node2/lrm: starting service vm:101
> +info     43    node2/lrm: service status vm:101 started
> +info     45    node3/lrm: got lock 'ha_agent_node3_lock'
> +info     45    node3/lrm: status change wait_for_agent_lock => active
> +info     45    node3/lrm: starting service vm:102
> +info     45    node3/lrm: service status vm:102 started
> +info    620     hardware: exit simulation - done
> diff --git a/src/test/test-colocation-strict-separate9/manager_status b/src/test/test-colocation-strict-separate9/manager_status
> new file mode 100644
> index 0000000..9e26dfe
> --- /dev/null
> +++ b/src/test/test-colocation-strict-separate9/manager_status
> @@ -0,0 +1 @@
> +{}
> \ No newline at end of file
> diff --git a/src/test/test-colocation-strict-separate9/rules_config b/src/test/test-colocation-strict-separate9/rules_config
> new file mode 100644
> index 0000000..478d70b
> --- /dev/null
> +++ b/src/test/test-colocation-strict-separate9/rules_config
> @@ -0,0 +1,3 @@
> +colocation: lonely-must-too-many-vms-be
> +	services vm:101,vm:102,vm:103,vm:104,vm:105
> +	affinity separate
> diff --git a/src/test/test-colocation-strict-separate9/service_config b/src/test/test-colocation-strict-separate9/service_config
> new file mode 100644
> index 0000000..a1d61f5
> --- /dev/null
> +++ b/src/test/test-colocation-strict-separate9/service_config
> @@ -0,0 +1,7 @@
> +{
> +    "vm:101": { "node": "node1", "state": "started" },
> +    "vm:102": { "node": "node1", "state": "started" },
> +    "vm:103": { "node": "node1", "state": "started" },
> +    "vm:104": { "node": "node1", "state": "started" },
> +    "vm:105": { "node": "node1", "state": "started" }
> +}




