[pve-devel] [RFC cluster/ha-manager 00/16] HA colocation rules
Daniel Kral
d.kral at proxmox.com
Tue Mar 25 16:12:38 CET 2025
This RFC patch series is a draft of an implementation that allows users
to specify colocation rules (or affinity/anti-affinity) for the HA
Manager, so that two or more services are either kept together or kept
apart with respect to each other in case of service recovery or if
auto-rebalancing on service start is enabled.
I chose the name "colocation" in favor of affinity/anti-affinity, since
it conveys more concisely that this is about co-locating services with
respect to each other, in contrast to locating services on nodes, but I
have no hard feelings about changing it (same for any other names in
this series).
Many thanks to @Thomas, @Fiona, @Friedrich, and @Hannes Duerr for the
discussions about this feature off-list!
Recap: HA groups
----------------
The HA Manager currently allows a service to be assigned to one HA
group, which essentially implements an affinity to a set of nodes. This
affinity can either be unrestricted or restricted, where only the former
allows recovery to nodes outside of the HA group's node set if all of
the group's nodes are currently unavailable.
This allows users to constrain the set of nodes that can be selected as
the starting and/or recovery node. Furthermore, each node in a HA group
can have an individual priority. This further constrains the set of
possible recovery nodes to the subset of online nodes in the highest
priority group.
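For reference, such a node affinity is currently expressed in
`groups.cfg` roughly like the following (node names, priorities, and
flag values are made up for illustration):

  group: prefer-fast-nodes
	nodes node1:2,node2:2,node3:1
	restricted 0
	nofailback 0

Here node1 and node2 form the highest priority group, and since the
group is unrestricted, recovery to other nodes is still possible if all
three become unavailable.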
Introduction
------------
Colocation is the concept of an inter-service affinity relationship,
which can either be positive (keep services together) or negative (keep
services apart). This is in contrast with the service-nodes affinity
relationship implemented by HA groups.
In addition to the positive-negative dimension, there's also the
mandatory-optional axis. Currently, this is a binary setting: failing to
meet the colocation relationship results in a service
- (1) being kept in recovery for a mandatory colocation rule, or
- (2) being migrated anyway, ignoring an optional colocation rule.
Motivation
----------
There are many different use cases to support colocation, but two simple
examples that come to mind are:
- Two or more services need to communicate with each other very
frequently. To reduce the communication path length and therefore
hopefully the latency, keep them together on one node.
- Two or more services need a lot of computational resources and would
therefore consume much of the assigned node's resource capacity. To
reduce resource starvation and memory stalls, keep them apart on
multiple nodes, so that each has enough resources for itself.
And some more concrete use cases from current HA Manager users:
- "For example: let's say we have three DB VMs (DB nodes in a cluster)
which we want to run on ANY PVE host, but we don't want them to be on
the same host." [0]
- "An example is: When Server from the DMZ zone start on the same host
like the firewall or another example the application servers start on
the same host like the sql server. Services they depend on each other
have short latency to each other." [1]
HA Rules
--------
To implement colocation, this patch series introduces HA rules, which
allow users to specify colocation requirements on services. These are
implemented with the widely used section config, where each type of rule
is an individual plugin (for now only 'colocation').
This introduces some small initial complexity for testing the
satisfiability of the rules, but allows the constraint interface to be
extensible and will hopefully make it easier to reason about the node
selection process with the added constraint rules in the future.
Colocation Rules
----------------
The two properties of colocation rules, as described in the
introduction, are rather straightforward. A typical colocation rule
inside of the config would look like the following:
  colocation: some-lonely-services
	services vm:101,vm:103,ct:909
	affinity separate
	strict 1
This means that the three services vm:101, vm:103 and ct:909 must be
kept separate on different nodes. I'm very keen on naming suggestions,
since I think there could be a better word than 'affinity' here. I
played around with 'keep-services', since then it would always read
something like 'keep-services separate', which is very declarative, but
this might suggest to too many users that this is a binary option (I
mean it is, but not with the values 0 and 1).
Satisfiability and Inference
----------------------------
Since rules allow more complexity, it is necessary to check whether
rules can be (1) satisfied, (2) simplified, and (3) used to infer other
constraints. There's a static part (i.e. the configuration file) and a
dynamic part (i.e. when deciding the next node) to this.
| Satisfiability
----------
Statically, colocation rules currently must satisfy:
- Two or more services must not be in both a positive and negative
colocation rule.
- Two or more services in a positive colocation rule must not be in
restricted HA groups with disjoint node sets.
- Two or more services in a negative colocation rule, which are in
restricted HA groups, must have at least as many statically available
nodes as node-restricted services.
The first is obvious. The second asserts that there is at least one
common node that can be recovered to. The third asserts that there are
enough nodes to select from for the recovery of the services which are
restricted to a set of nodes.
Of course, it doesn't make sense to have three services in a negative
colocation relation if there are only three cluster nodes, since a
failover would then leave no node to recover to, but the static part is
only a best effort to reduce obvious misconfigurations.
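As a minimal example of the first check, a rules config like the
following would be flagged, since vm:101 and vm:102 are both kept
together and kept apart (service IDs are made up, and I'm assuming
'together' as the positive affinity value here):

  colocation: keep-together
	services vm:101,vm:102
	affinity together
	strict 1

  colocation: keep-apart
	services vm:101,vm:102
	affinity separate
	strict 1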
| Canonicalization
----------
Additionally, colocation rules are currently simplified as follows:
- If there are multiple positive colocation rules with common services
and the same strictness, these are merged to a single positive
colocation rule.
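For illustration, two strict positive rules sharing a common service,
like the two below, would be merged into a single positive rule with
`services vm:101,vm:102,vm:103` (again with made-up IDs and assuming
'together' as the positive affinity value):

  colocation: together1
	services vm:101,vm:102
	affinity together
	strict 1

  colocation: together2
	services vm:102,vm:103
	affinity together
	strict 1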
| Inference rules
----------
There are currently no inference rules implemented for the RFC, but
there could be potential to further simplify some code paths in the
future, e.g. for a positive colocation rule where one service is part of
a restricted HA group, the other services in the rule effectively become
part of this HA group as well.
I leave this open for discussion here.
Special negative colocation scenarios
-------------------------------------
Just to be aware of these cases, there's a distinction between the
following two sets of negative colocation rules:
  colocation: separate-vms
	services vm:101,vm:102,vm:103
	affinity separate
	strict 1
and
  colocation: separate-vms1
	services vm:101,vm:102
	affinity separate
	strict 1

  colocation: separate-vms2
	services vm:102,vm:103
	affinity separate
	strict 1
The first keeps all three services separate from each other, while the
second only keeps pair-wise services separate from each other, but
vm:101 and vm:103 might be migrated to the same node.
Test cases
----------
The test cases are quite straightforward and I designed them so they
would fail without the colocation rules applied. This can be verified by
removing the `apply_colocation_rules(...)` call from the
`select_service_node()` body.
They are not completely exhaustive and I didn't implement test cases
with HA groups yet (both for the ha-tester and the rules config tests),
but these would be implemented in a post-RFC version.
Also, the loose tests are exact copies of their strict counterparts,
where only the expected log differs and the rules are changed from
'strict 1' to 'strict 0'.
TODO
----
- WebGUI Integration
- User Documentation
- Add test cases with HA groups and more complex scenarios
- CLI / API endpoints for CRUD and maybe verification
- Cleanup the `select_service_node` signature into two structs as
suggested by @Thomas in [3]
Additional and/or future ideas
------------------------------
- Transforming HA groups to location rules (see comment below).
- Make recomputing the online node usage more granular.
- Add information of overall free node resources to improve decision
heuristic when recovering services to nodes.
- Improve recovery node selection for optional positive colocation.
Correlated with the idea about free node resources above.
- When deciding the recovery node for positively colocated services,
account for the needed resources of all to-be-migrated services rather
than just the first one. This is a non-trivial problem, as we currently
solve it as an online bin covering problem, i.e. selecting a node for
each service alone instead of for all services together.
- When migrating a service manually, migrate the colocated services too.
But this would also mean that we need to check whether a migration is
legal according to the colocation rules, which we do not do yet for HA
groups.
- Dynamic colocation rule health statistics (e.g. warn if a colocation
rule is currently not satisfiable), e.g. in the WebGUI and/or API.
- Property for mandatory colocation rules to specify whether all
services should be stopped if the rule cannot be satisfied.
Comment about HA groups -> Location Rules
-----------------------------------------
This part is not really part of the patch series, but still worth an
on-list discussion.
I'd like to suggest also transforming the existing HA groups into
location rules, if the rule concept turns out to be a good fit for the
colocation feature in the HA Manager, as HA groups seem to integrate
quite easily into this concept.
This would make service-node relationships a little more flexible for
users and we'd be able to have both configurable / visible in the same
WebUI view, API endpoint, and configuration file. Also, some code paths
could be a little more concise, e.g. checking changes to constraints and
canonicalizing the rules config.
The how should be rather straightforward for the obvious use cases:
- Services in unrestricted HA groups -> Location rules with the nodes of
the HA group; We could either split each node priority group into
separate location rules (with each having their score / weight) or
keep the input format of HA groups with a list of
`<node>(:<priority>)` in each rule
- Services in restricted HA groups -> Same as above, but also using
either `+inf` for a mandatory location rule or `strict` property
depending on how we decide on the colocation rule properties
This would allow most of the use cases of HA groups to be easily
migratable to location rules. We could also keep the inference of the
'default group' for unrestricted HA groups (any node that is available
is added as a group member with priority -1).
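To sketch what such a location rule could look like in the proposed
section config (purely hypothetical syntax, the rule type and property
names are up for discussion):

  location: prefer-fast-nodes
	services vm:101
	nodes node1:2,node2:1
	strict 0

This would roughly correspond to an unrestricted HA group with node1
and node2 at priorities 2 and 1, assigned to vm:101.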
The only thing I'm unsure about is how we would migrate the
`nofailback` option, since this operates on the group level. If we keep
the `<node>(:<priority>)` syntax and restrict each service to being part
of only one location rule, it'd be easy to have the same flag. If we go
with multiple location rules per service, each having a score or weight
(for the priority), then we wouldn't be able to have this flag anymore.
I think we could keep the semantics if we move this flag to the service
config, but I'm thankful for any comments on this.
[0] https://clusterlabs.org/projects/pacemaker/doc/3.0/Pacemaker_Explained/html/constraints.html#colocation-properties
[1] https://bugzilla.proxmox.com/show_bug.cgi?id=5260
[2] https://bugzilla.proxmox.com/show_bug.cgi?id=5332
[3] https://lore.proxmox.com/pve-devel/c8fa7b8c-fb37-5389-1302-2002780d4ee2@proxmox.com/
Diffstat
--------
pve-cluster:
Daniel Kral (1):
cfs: add 'ha/rules.cfg' to observed files
src/PVE/Cluster.pm | 1 +
src/pmxcfs/status.c | 1 +
2 files changed, 2 insertions(+)
pve-ha-manager:
Daniel Kral (15):
ignore output of fence config tests in tree
tools: add hash set helper subroutines
usage: add get_service_node and pin_service_node methods
add rules section config base plugin
rules: add colocation rule plugin
config, env, hw: add rules read and parse methods
manager: read and update rules config
manager: factor out prioritized nodes in select_service_node
manager: apply colocation rules when selecting service nodes
sim: resources: add option to limit start and migrate tries to node
test: ha tester: add test cases for strict negative colocation rules
test: ha tester: add test cases for strict positive colocation rules
test: ha tester: add test cases for loose colocation rules
test: ha tester: add test cases in more complex scenarios
test: add test cases for rules config
.gitignore | 3 +
debian/pve-ha-manager.install | 2 +
src/PVE/HA/Config.pm | 12 +
src/PVE/HA/Env.pm | 6 +
src/PVE/HA/Env/PVE2.pm | 13 +
src/PVE/HA/Makefile | 3 +-
src/PVE/HA/Manager.pm | 235 ++++++++++-
src/PVE/HA/Rules.pm | 118 ++++++
src/PVE/HA/Rules/Colocation.pm | 391 ++++++++++++++++++
src/PVE/HA/Rules/Makefile | 6 +
src/PVE/HA/Sim/Env.pm | 15 +
src/PVE/HA/Sim/Hardware.pm | 15 +
src/PVE/HA/Sim/Resources/VirtFail.pm | 37 +-
src/PVE/HA/Tools.pm | 53 +++
src/PVE/HA/Usage.pm | 12 +
src/PVE/HA/Usage/Basic.pm | 15 +
src/PVE/HA/Usage/Static.pm | 14 +
src/test/Makefile | 4 +-
.../connected-positive-colocations.cfg | 34 ++
.../connected-positive-colocations.cfg.expect | 54 +++
.../rules_cfgs/illdefined-colocations.cfg | 9 +
.../illdefined-colocations.cfg.expect | 12 +
.../inner-inconsistent-colocations.cfg | 14 +
.../inner-inconsistent-colocations.cfg.expect | 13 +
.../test-colocation-loose-separate1/README | 13 +
.../test-colocation-loose-separate1/cmdlist | 4 +
.../hardware_status | 5 +
.../log.expect | 60 +++
.../manager_status | 1 +
.../rules_config | 4 +
.../service_config | 6 +
.../test-colocation-loose-separate4/README | 17 +
.../test-colocation-loose-separate4/cmdlist | 4 +
.../hardware_status | 5 +
.../log.expect | 73 ++++
.../manager_status | 1 +
.../rules_config | 4 +
.../service_config | 6 +
.../test-colocation-loose-together1/README | 11 +
.../test-colocation-loose-together1/cmdlist | 4 +
.../hardware_status | 5 +
.../log.expect | 66 +++
.../manager_status | 1 +
.../rules_config | 4 +
.../service_config | 6 +
.../test-colocation-loose-together3/README | 16 +
.../test-colocation-loose-together3/cmdlist | 4 +
.../hardware_status | 5 +
.../log.expect | 93 +++++
.../manager_status | 1 +
.../rules_config | 4 +
.../service_config | 8 +
.../test-colocation-strict-separate1/README | 13 +
.../test-colocation-strict-separate1/cmdlist | 4 +
.../hardware_status | 5 +
.../log.expect | 60 +++
.../manager_status | 1 +
.../rules_config | 4 +
.../service_config | 6 +
.../test-colocation-strict-separate2/README | 15 +
.../test-colocation-strict-separate2/cmdlist | 4 +
.../hardware_status | 7 +
.../log.expect | 90 ++++
.../manager_status | 1 +
.../rules_config | 4 +
.../service_config | 10 +
.../test-colocation-strict-separate3/README | 16 +
.../test-colocation-strict-separate3/cmdlist | 4 +
.../hardware_status | 7 +
.../log.expect | 110 +++++
.../manager_status | 1 +
.../rules_config | 4 +
.../service_config | 10 +
.../test-colocation-strict-separate4/README | 17 +
.../test-colocation-strict-separate4/cmdlist | 4 +
.../hardware_status | 5 +
.../log.expect | 69 ++++
.../manager_status | 1 +
.../rules_config | 4 +
.../service_config | 6 +
.../test-colocation-strict-separate5/README | 11 +
.../test-colocation-strict-separate5/cmdlist | 4 +
.../hardware_status | 5 +
.../log.expect | 56 +++
.../manager_status | 1 +
.../rules_config | 9 +
.../service_config | 5 +
.../test-colocation-strict-together1/README | 11 +
.../test-colocation-strict-together1/cmdlist | 4 +
.../hardware_status | 5 +
.../log.expect | 66 +++
.../manager_status | 1 +
.../rules_config | 4 +
.../service_config | 6 +
.../test-colocation-strict-together2/README | 11 +
.../test-colocation-strict-together2/cmdlist | 4 +
.../hardware_status | 5 +
.../log.expect | 80 ++++
.../manager_status | 1 +
.../rules_config | 4 +
.../service_config | 8 +
.../test-colocation-strict-together3/README | 17 +
.../test-colocation-strict-together3/cmdlist | 4 +
.../hardware_status | 5 +
.../log.expect | 89 ++++
.../manager_status | 1 +
.../rules_config | 4 +
.../service_config | 8 +
.../test-crs-static-rebalance-coloc1/README | 26 ++
.../test-crs-static-rebalance-coloc1/cmdlist | 4 +
.../datacenter.cfg | 6 +
.../hardware_status | 5 +
.../log.expect | 120 ++++++
.../manager_status | 1 +
.../rules_config | 24 ++
.../service_config | 10 +
.../static_service_stats | 10 +
.../test-crs-static-rebalance-coloc2/README | 16 +
.../test-crs-static-rebalance-coloc2/cmdlist | 4 +
.../datacenter.cfg | 6 +
.../hardware_status | 5 +
.../log.expect | 86 ++++
.../manager_status | 1 +
.../rules_config | 14 +
.../service_config | 5 +
.../static_service_stats | 5 +
.../test-crs-static-rebalance-coloc3/README | 14 +
.../test-crs-static-rebalance-coloc3/cmdlist | 4 +
.../datacenter.cfg | 6 +
.../hardware_status | 7 +
.../log.expect | 156 +++++++
.../manager_status | 1 +
.../rules_config | 49 +++
.../service_config | 7 +
.../static_service_stats | 5 +
src/test/test_failover1.pl | 4 +-
src/test/test_rules_config.pl | 100 +++++
137 files changed, 3113 insertions(+), 20 deletions(-)
create mode 100644 src/PVE/HA/Rules.pm
create mode 100644 src/PVE/HA/Rules/Colocation.pm
create mode 100644 src/PVE/HA/Rules/Makefile
create mode 100644 src/test/rules_cfgs/connected-positive-colocations.cfg
create mode 100644 src/test/rules_cfgs/connected-positive-colocations.cfg.expect
create mode 100644 src/test/rules_cfgs/illdefined-colocations.cfg
create mode 100644 src/test/rules_cfgs/illdefined-colocations.cfg.expect
create mode 100644 src/test/rules_cfgs/inner-inconsistent-colocations.cfg
create mode 100644 src/test/rules_cfgs/inner-inconsistent-colocations.cfg.expect
create mode 100644 src/test/test-colocation-loose-separate1/README
create mode 100644 src/test/test-colocation-loose-separate1/cmdlist
create mode 100644 src/test/test-colocation-loose-separate1/hardware_status
create mode 100644 src/test/test-colocation-loose-separate1/log.expect
create mode 100644 src/test/test-colocation-loose-separate1/manager_status
create mode 100644 src/test/test-colocation-loose-separate1/rules_config
create mode 100644 src/test/test-colocation-loose-separate1/service_config
create mode 100644 src/test/test-colocation-loose-separate4/README
create mode 100644 src/test/test-colocation-loose-separate4/cmdlist
create mode 100644 src/test/test-colocation-loose-separate4/hardware_status
create mode 100644 src/test/test-colocation-loose-separate4/log.expect
create mode 100644 src/test/test-colocation-loose-separate4/manager_status
create mode 100644 src/test/test-colocation-loose-separate4/rules_config
create mode 100644 src/test/test-colocation-loose-separate4/service_config
create mode 100644 src/test/test-colocation-loose-together1/README
create mode 100644 src/test/test-colocation-loose-together1/cmdlist
create mode 100644 src/test/test-colocation-loose-together1/hardware_status
create mode 100644 src/test/test-colocation-loose-together1/log.expect
create mode 100644 src/test/test-colocation-loose-together1/manager_status
create mode 100644 src/test/test-colocation-loose-together1/rules_config
create mode 100644 src/test/test-colocation-loose-together1/service_config
create mode 100644 src/test/test-colocation-loose-together3/README
create mode 100644 src/test/test-colocation-loose-together3/cmdlist
create mode 100644 src/test/test-colocation-loose-together3/hardware_status
create mode 100644 src/test/test-colocation-loose-together3/log.expect
create mode 100644 src/test/test-colocation-loose-together3/manager_status
create mode 100644 src/test/test-colocation-loose-together3/rules_config
create mode 100644 src/test/test-colocation-loose-together3/service_config
create mode 100644 src/test/test-colocation-strict-separate1/README
create mode 100644 src/test/test-colocation-strict-separate1/cmdlist
create mode 100644 src/test/test-colocation-strict-separate1/hardware_status
create mode 100644 src/test/test-colocation-strict-separate1/log.expect
create mode 100644 src/test/test-colocation-strict-separate1/manager_status
create mode 100644 src/test/test-colocation-strict-separate1/rules_config
create mode 100644 src/test/test-colocation-strict-separate1/service_config
create mode 100644 src/test/test-colocation-strict-separate2/README
create mode 100644 src/test/test-colocation-strict-separate2/cmdlist
create mode 100644 src/test/test-colocation-strict-separate2/hardware_status
create mode 100644 src/test/test-colocation-strict-separate2/log.expect
create mode 100644 src/test/test-colocation-strict-separate2/manager_status
create mode 100644 src/test/test-colocation-strict-separate2/rules_config
create mode 100644 src/test/test-colocation-strict-separate2/service_config
create mode 100644 src/test/test-colocation-strict-separate3/README
create mode 100644 src/test/test-colocation-strict-separate3/cmdlist
create mode 100644 src/test/test-colocation-strict-separate3/hardware_status
create mode 100644 src/test/test-colocation-strict-separate3/log.expect
create mode 100644 src/test/test-colocation-strict-separate3/manager_status
create mode 100644 src/test/test-colocation-strict-separate3/rules_config
create mode 100644 src/test/test-colocation-strict-separate3/service_config
create mode 100644 src/test/test-colocation-strict-separate4/README
create mode 100644 src/test/test-colocation-strict-separate4/cmdlist
create mode 100644 src/test/test-colocation-strict-separate4/hardware_status
create mode 100644 src/test/test-colocation-strict-separate4/log.expect
create mode 100644 src/test/test-colocation-strict-separate4/manager_status
create mode 100644 src/test/test-colocation-strict-separate4/rules_config
create mode 100644 src/test/test-colocation-strict-separate4/service_config
create mode 100644 src/test/test-colocation-strict-separate5/README
create mode 100644 src/test/test-colocation-strict-separate5/cmdlist
create mode 100644 src/test/test-colocation-strict-separate5/hardware_status
create mode 100644 src/test/test-colocation-strict-separate5/log.expect
create mode 100644 src/test/test-colocation-strict-separate5/manager_status
create mode 100644 src/test/test-colocation-strict-separate5/rules_config
create mode 100644 src/test/test-colocation-strict-separate5/service_config
create mode 100644 src/test/test-colocation-strict-together1/README
create mode 100644 src/test/test-colocation-strict-together1/cmdlist
create mode 100644 src/test/test-colocation-strict-together1/hardware_status
create mode 100644 src/test/test-colocation-strict-together1/log.expect
create mode 100644 src/test/test-colocation-strict-together1/manager_status
create mode 100644 src/test/test-colocation-strict-together1/rules_config
create mode 100644 src/test/test-colocation-strict-together1/service_config
create mode 100644 src/test/test-colocation-strict-together2/README
create mode 100644 src/test/test-colocation-strict-together2/cmdlist
create mode 100644 src/test/test-colocation-strict-together2/hardware_status
create mode 100644 src/test/test-colocation-strict-together2/log.expect
create mode 100644 src/test/test-colocation-strict-together2/manager_status
create mode 100644 src/test/test-colocation-strict-together2/rules_config
create mode 100644 src/test/test-colocation-strict-together2/service_config
create mode 100644 src/test/test-colocation-strict-together3/README
create mode 100644 src/test/test-colocation-strict-together3/cmdlist
create mode 100644 src/test/test-colocation-strict-together3/hardware_status
create mode 100644 src/test/test-colocation-strict-together3/log.expect
create mode 100644 src/test/test-colocation-strict-together3/manager_status
create mode 100644 src/test/test-colocation-strict-together3/rules_config
create mode 100644 src/test/test-colocation-strict-together3/service_config
create mode 100644 src/test/test-crs-static-rebalance-coloc1/README
create mode 100644 src/test/test-crs-static-rebalance-coloc1/cmdlist
create mode 100644 src/test/test-crs-static-rebalance-coloc1/datacenter.cfg
create mode 100644 src/test/test-crs-static-rebalance-coloc1/hardware_status
create mode 100644 src/test/test-crs-static-rebalance-coloc1/log.expect
create mode 100644 src/test/test-crs-static-rebalance-coloc1/manager_status
create mode 100644 src/test/test-crs-static-rebalance-coloc1/rules_config
create mode 100644 src/test/test-crs-static-rebalance-coloc1/service_config
create mode 100644 src/test/test-crs-static-rebalance-coloc1/static_service_stats
create mode 100644 src/test/test-crs-static-rebalance-coloc2/README
create mode 100644 src/test/test-crs-static-rebalance-coloc2/cmdlist
create mode 100644 src/test/test-crs-static-rebalance-coloc2/datacenter.cfg
create mode 100644 src/test/test-crs-static-rebalance-coloc2/hardware_status
create mode 100644 src/test/test-crs-static-rebalance-coloc2/log.expect
create mode 100644 src/test/test-crs-static-rebalance-coloc2/manager_status
create mode 100644 src/test/test-crs-static-rebalance-coloc2/rules_config
create mode 100644 src/test/test-crs-static-rebalance-coloc2/service_config
create mode 100644 src/test/test-crs-static-rebalance-coloc2/static_service_stats
create mode 100644 src/test/test-crs-static-rebalance-coloc3/README
create mode 100644 src/test/test-crs-static-rebalance-coloc3/cmdlist
create mode 100644 src/test/test-crs-static-rebalance-coloc3/datacenter.cfg
create mode 100644 src/test/test-crs-static-rebalance-coloc3/hardware_status
create mode 100644 src/test/test-crs-static-rebalance-coloc3/log.expect
create mode 100644 src/test/test-crs-static-rebalance-coloc3/manager_status
create mode 100644 src/test/test-crs-static-rebalance-coloc3/rules_config
create mode 100644 src/test/test-crs-static-rebalance-coloc3/service_config
create mode 100644 src/test/test-crs-static-rebalance-coloc3/static_service_stats
create mode 100755 src/test/test_rules_config.pl
Summary over all repositories:
139 files changed, 3115 insertions(+), 20 deletions(-)
--
Generated by git-murpp 0.8.0