[pve-devel] [RFC common/cluster/ha-manager/docs/manager v2 00/40] HA colocation rules
Jillian Morgan
jillian.morgan at primordial.ca
Fri Jun 20 19:11:11 CEST 2025
Daniel,
Firstly I want to say thank you very, very, very much! This extensive work
obviously took a lot of time and effort. I feel like one of my Top-5 gripes
with Proxmox (after moving from oVirt) will finally be resolved by this new
feature.
Next, however, I would like to add my two cents to the discussion over the
nomenclature being chosen, since it seems to be not-quite set in stone yet.
Here are my thoughts:
1) Having "location" and "colocation" rules is, I think, going to be
unnecessarily confusing for people. While it isn't too complicated to glean
the distinction after reading their descriptions (and I had to go
read the descriptions), the names don't immediately convey how the two
differ from each other. I think the concepts are better
described by something like "host-service affinity" (for positive or
negative affinity between service(s) and specific host(s)/Resource Pools)
and "service-service affinity" (for positive or negative affinity between
multiple services, where any relationship to specific hosts is
inconsequential or specifically undesirable).
2) Your own discussion seems to refer to "affinity" quite regularly, so
calling the rules by some other names in the documentation/CLI/UI seems to
be a choice made to try to 'simplify' the concept for an audience that
doesn't need the concept to be simplified, and in fact probably just
confuses things.
3) Despite your feeling otherwise, I believe that naming them "affinity"
rules will be very well understood by anyone who has worked with other
cluster systems, basic system administration (CPU pinning, for example), or
any other sort of computer science or engineering background. I think the
number of people coming into the world of Proxmox with zero prior
experience in the field is probably very low, and they would be well-served
to learn the word "affinity", since that is what's most commonly used in
the industry.
4) Similarly, your own discussion refers to "positive" and "negative"
affinity, yet a choice was made to identify these in the rule configuration
as "together" and "separate", which while relatively clear, feels entirely
contrived (as well as upsetting the language part of my brain by being
adverbs when adjectives are warranted) since affinity (in computing /
resource scheduling contexts at least) is very commonly described as
positive and negative.
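To make the naming concrete, here is a minimal sketch in Python of the
vocabulary I have in mind. It is purely illustrative: every class, field and
function name below is my own invention and does not correspond to anything
in the patch series or in the HA Manager code.

```python
from dataclasses import dataclass
from enum import Enum
from typing import FrozenSet


class Affinity(Enum):
    POSITIVE = "positive"  # attract: keep together / prefer these hosts
    NEGATIVE = "negative"  # repel: keep apart / avoid these hosts


@dataclass(frozen=True)
class HostServiceAffinity:
    """Hypothetical rule tying services to (or away from) specific hosts."""
    services: FrozenSet[str]
    hosts: FrozenSet[str]
    affinity: Affinity


@dataclass(frozen=True)
class ServiceServiceAffinity:
    """Hypothetical rule tying services to (or away from) each other."""
    services: FrozenSet[str]
    affinity: Affinity


def describe(rule) -> str:
    """Render a rule in the proposed vocabulary."""
    if isinstance(rule, HostServiceAffinity):
        kind = "host-service"
    else:
        kind = "service-service"
    return f"{rule.affinity.value} {kind} affinity"
```

With this vocabulary, what the series calls a "separate" colocation rule
would simply read as "negative service-service affinity".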
Happy to discuss further, or be pointed to prior debates over this that
I've likely missed.
And, of course, I'd happily suffer the cringe-worthy nomenclature to have
the feature sooner rather than later! Just saying: Thumbs Up!
--
Jillian Morgan (she/her)
Systems & Networking Specialist
Primordial Software Group & I.T. Consultancy
https://www.primordial.ca
On Fri, Jun 20, 2025 at 11:43 AM Daniel Kral <d.kral at proxmox.com> wrote:
> On 6/20/25 16:31, Daniel Kral wrote:
> > Changelog
> > ---------
>
> Just noticed that I missed one detail that might be beneficial to know,
> so following the patch changes is easier:
>
> - migrate ha groups internally in the HA Manager to ha location rules,
> so that internally these can already be replaced; the test cases in
> ha-manager patch #09 (stuck in moderator review because it became
> quite large) are there to ensure that the migration produces the same
> result for the migrated location rules
>
> On 6/20/25 16:31, Daniel Kral wrote:
> > TODO
> > ----
> >
> > There are some things left to be done or discussed for a proper patch
> > series:
>
> Also other small things to point out:
>
> - Add missing comment field in rule edit dialog
>
> - Since ha location rules were designed so that these will never be
> dropped by the rule checks (this is because there was no notion of
> dropping ha groups), location rules are the only rules that can
> introduce conflicts, e.g. introducing another priority group in the
> location rule or restricting colocated services too much.
>
> What should we do here? Allow dropping location rules when these are
> not automatically migrated from groups? Or show a confirmation dialog
> when creating these, so that users are warned? Both of them would
> introduce some more complexity/state in how rules are checked. For
> now, these conflicts are created silently.
>
> - Reload both ha location and ha colocation rules if one of them gets
> changed (e.g. when a location rule is added that creates a conflict in
> ha colocation rules, then it will only show the conflict on the next
> reload).
>
> >
> > - Implement check which does not allow negative colocation rules with
> > more services than nodes, because these cannot be applied. Or should
> > we just fail the remaining services which cannot be separated to any
> > node since these do not have anywhere to go?
> >
> > - How can the migration process from HA groups to HA location rules be
> > improved? Add a 'Migrate' button to the HA Groups page and then
> > auto-toggle the use-location-rules feature flag? Should the
> > use-location-rules feature flag even be user-toggleable?
>
> Another point here for the migration of HA groups to HA location rules
> is how we would name these new location rules. In the auto-migration the
> code currently prefixes the group name with `_group_` so that they
> cannot conflict with config keys, as these cannot start with an
> underscore. If we introduce a manual "Migrate" button, then we'd need to
> handle name collisions with both already existing HA location rules
> (especially if we allow switching back and forth) and existing HA
> colocation rules.
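
On the naming question, a rough Python sketch of the two cases might help the
discussion. The `_group_` prefix is taken from the description above;
everything else is an assumption on my part, including the ID pattern (the
real config-key rules may differ) and the numeric-suffix scheme for manual
migration, which is only one possible option and not something proposed in
the series.

```python
import re

# Assumption: user-chosen rule IDs must start with a letter. The actual
# config-key pattern used by Proxmox may differ.
USER_ID_RE = re.compile(r"^[A-Za-z][A-Za-z0-9_-]*$")
AUTO_PREFIX = "_group_"


def auto_migrated_name(group: str) -> str:
    """Name for an auto-migrated rule; the leading underscore keeps it
    outside the namespace of user-chosen IDs."""
    return AUTO_PREFIX + group


def manual_migrated_name(group: str, existing: set) -> str:
    """Hypothetical manual migration: keep the group name and append a
    numeric suffix if a location or colocation rule already uses it."""
    if group not in existing:
        return group
    n = 1
    while f"{group}-{n}" in existing:
        n += 1
    return f"{group}-{n}"
```

Here `existing` would have to contain the IDs of both rule types, since the
collision could come from either one.
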
>
> >
> > - Add web interface and/or CLI facing messages about the HA service
> > migration blockers and side-effects. The rough idea would be like the
> > following (feedback highly appreciated!):
> >
> > - For the web interface, I'd make these visible through the already
> > existing precondition checks (which need to also be added for
> > containers, as there is no existing API endpoint there).
> > Side-effects would be 'warning' items, which just state that some
> > positively colocated service is migrated with them (the 'Migrate'
> > button then is the confirmation for that). Blockers would be
> > 'error' items, which state that a negatively colocated service is
> > on the requested target node and therefore the migration is
> > blocked because of that.
> >
> > - For bulk migrations in the web interface, these are still visible
> > through the console that is popped up afterwards, which should
> > print the messages from the migrate/relocate crm-command API
> > endpoints.
> >
> > - For the CLI, I'd add another 'force' flag or something similar. If
> > there are side-effects and the force flag is not set, then no
> > migration happens at all, but the user gets a list of the
> > migrations that will be done and should confirm by making another
> > call to 'migrate'/'relocate' with the force flag set to confirm
> > these choices.
> >
> > - Add more user documentation (especially about conflicts, migrations,
> > restrictions and failover scenario handling)
> >
> > - Add mixed test cases with HA location and HA colocation rules
>