[pve-devel] [PATCH docs/ha-manager/manager v4 00/25] HA Rules
Michael Köppl
m.koeppl at proxmox.com
Wed Jul 30 19:29:09 CEST 2025
Gave this version another spin today, focusing on the migration from
groups to rules. I tested this on 3-node and 5-node clusters. Went through
the following scenarios:
1) At least one of the nodes in the cluster not at the minimum version
required for migration to rules
2) At least one node offline during the attempt to migrate to rules
In both of the above cases, only the in-memory mapping of groups to
rules will happen. Groups continue to work on the PVE 8 nodes and rules
continue to work on the PVE 9 nodes. It should be noted that the nofailback
flag is not inverted for the resources while the rules are still
in-memory. This "switch" from nofailback to failback only occurs once
the migration is persisted.
3) Updating the remaining PVE 8 nodes one after another
Persistent migration started soon after all nodes were upgraded to PVE 9
(there is a slight delay since the check if groups need to be migrated does
not happen every round). Worked smoothly and I did not notice any
discrepancies in the rules.cfg generated from the groups.cfg.
4) Migration with non-existent groups in resources.cfg
5) Invalid properties in resources.cfg or groups.cfg
6) Partially upgrading the cluster, editing a rule on a PVE 9 node
This will not persist. It is not unexpected, since the rules exist only
in-memory at this point, but users should probably be warned about
making any changes to rules mid-upgrade.
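
For anyone skimming the thread, the behavior observed in the scenarios
above can be summarized as a rough Python sketch. This is only an
illustration of what I saw while testing, not the actual pve-ha-manager
(Perl) code: the names Node, migration_mode and migrate_resource as well as
the MIN_VERSION value are made up for this example. The "every 10 rounds"
interval is taken from the v4 changelog.

```python
from dataclasses import dataclass

# Hypothetical names and values, for illustration only.
MIN_VERSION = (9, 0)        # assumed placeholder for the required minimum version
CHECK_EVERY_N_ROUNDS = 10   # v4 changelog: try a persistent upgrade every 10 rounds


@dataclass
class Node:
    name: str
    online: bool
    version: tuple


def migration_mode(nodes, manager_round):
    """Decide whether this manager round may persist the migration."""
    # The persistent migration is not attempted in every round.
    if manager_round % CHECK_EVERY_N_ROUNDS != 0:
        return "in-memory"
    # Scenarios 1 and 2: any outdated or offline node blocks persistence.
    if all(n.online and n.version >= MIN_VERSION for n in nodes):
        return "persistent"
    return "in-memory"


def migrate_resource(resource, mode):
    """Return the resource config as it looks after the given migration mode."""
    migrated = dict(resource)
    if mode == "persistent":
        # Only the persisted migration switches nofailback to failback.
        nofailback = migrated.pop("nofailback", 0)
        migrated["failback"] = 0 if nofailback else 1
    # In-memory mode leaves the nofailback flag untouched.
    return migrated
```

With one node still on PVE 8, migration_mode() stays at "in-memory" in every
round, which matches scenario 6: edits to rules made at that point have
nothing persistent to land in.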
Dano already incorporated feedback from Hannes' and my tests and we also
tested updated versions that fix the problems that we noticed, just
documenting it here for the sake of completeness. The migration from
groups to rules overall worked very well in the cases where migration
was already possible and did not proceed (and provided informative
errors or warnings) if it was not.
On 7/29/25 20:03, Daniel Kral wrote:
> Here's a quick update on the core HA rules series. This cleans up the
> series so that all tests are running again and includes the missing ui
> patch that I didn't see missing last time.
>
> The persistent migration path has been tested for at least four full
> upgrade runs now, always with one node being behind and checking that
> the group config is only removed as soon as all nodes are on the right
> version.
>
> I'll wait for tomorrow if something comes up and will do some testing
> myself, so I'm anticipating to follow up on this tomorrow. I'll also
> want to get a more mature version of the HA resource affinity series
> ready for tomorrow on the mailing list.
>
> For maintainers: ha-manager patch #19 should be updated to the correct
> pve-manager version that is dependent on the pve-ha-manager package
> which can interpret the HA rules config.
>
> Changelog since v3
> ------------------
>
> - rebased on newest available master
>
> - included missing ui patch for web interface
>
> - correction in failback property description (does not influence the ha
> node affinity rules)
>
> - migrated the groups configs in the test cases to node affinity rules
> in rules configs (except two test cases for the persistent migration)
>
> - improved persistent ha group migration process
>
> - try a persistent upgrade only every 10 HA manager rounds
>
> - various other minor touches
>
> TODO
> ----
>
> - More testing on edge cases for the HA Manager migration path
>
> - Some more testing of the ha-manager CLI and adding a deprecation
> warning on the HA Groups API and disallowing requests as soon as the
> groups config is fully migrated