[pve-devel] [PATCH-SERIES v2 ha-manager/docs] add static usage scheduler for HA manager
Fiona Ebner
f.ebner at proxmox.com
Thu Nov 17 15:00:01 CET 2022
Right now, the online node usage calculation for the HA manager only
considers the number of active services on each node. This patch
series allows switching to a 'static' scheduler mode, where static
usage information from the nodes and guest configurations is used
instead.
With this version, the effect is limited to choosing nodes during
recovery or for migrations triggered by a shutdown policy, but the
plan is to extend this in the future.
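To illustrate the difference between the two modes, here is a minimal Python sketch. It assumes a simplified model where each node reports its CPU count and total memory and each service has static CPU and memory requirements from its guest configuration. The class and function names (`Node`, `Service`, `select_node_basic`, `select_node_static`) and the scoring formula (highest projected relative CPU or memory commitment) are hypothetical; this is not the scoring used by the actual backend in proxmox-perl-rs.

```python
from dataclasses import dataclass, field


@dataclass
class Service:
    sid: str
    maxcpu: float  # configured CPU cores (hypothetical field name)
    maxmem: int    # configured memory in bytes (hypothetical field name)


@dataclass
class Node:
    name: str
    cpus: int
    memory: int  # total memory in bytes
    services: list = field(default_factory=list)


def select_node_basic(nodes):
    # 'basic' mode: only the number of active services matters
    return min(nodes, key=lambda n: (len(n.services), n.name))


def projected_load(node, svc):
    # Relative CPU and memory commitment if svc were placed on node
    cpu = (sum(s.maxcpu for s in node.services) + svc.maxcpu) / node.cpus
    mem = (sum(s.maxmem for s in node.services) + svc.maxmem) / node.memory
    return max(cpu, mem)


def select_node_static(nodes, svc):
    # Simplified 'static' mode: the lowest projected load wins
    return min(nodes, key=lambda n: (projected_load(n, svc), n.name))


GiB = 1024 ** 3
node1 = Node("node1", cpus=16, memory=64 * GiB,
             services=[Service("vm:100", 8, 32 * GiB)])
node2 = Node("node2", cpus=8, memory=16 * GiB,
             services=[Service("ct:101", 1, 1 * GiB), Service("ct:102", 1, 1 * GiB)])
new = Service("vm:103", maxcpu=2, maxmem=4 * GiB)

print(select_node_basic([node1, node2]).name)        # node1: fewer services
print(select_node_static([node1, node2], new).name)  # node2: lower projected load
```

With the basic mode, node1 wins because it runs only one service, even though that service is large. With the static sketch, node2 is preferred because placing the new VM there keeps its projected load lower (0.5 versus 0.625).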
As a next step, it would be nice to also use it for startup, but AFAICT
the issue is that node selection only happens after the state has
already been set to 'started', and I think select_service_node()
doesn't currently know whether a service has just been started. I
haven't looked into it in much detail, though.
An idea for getting a balancer out of it is to:
1. (optionally) sort all services by badness (needs new backend function)
2. iterate over the services, scoring the nodes for each one and adding
the service's usage to the chosen node after each iteration. The current
node can be kept if its score doesn't differ too much from the best node's.
3. record the chosen nodes and migrate the services accordingly.
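The three balancer steps above can be sketched in Python as follows. This is only an illustration under stated assumptions: the node score is a hypothetical "highest projected relative CPU or memory load", "badness" is approximated by the configured memory (a placeholder for the new backend function mentioned in step 1), and `plan_rebalance` and its `tolerance` parameter are invented names.

```python
from dataclasses import dataclass


@dataclass
class Service:
    sid: str
    node: str      # node the service currently runs on
    maxcpu: float  # configured CPU cores
    maxmem: float  # configured memory (GiB here)


def plan_rebalance(services, capacity, tolerance=0.1):
    """Return {sid: target_node} for every service that should migrate.

    capacity maps node name -> (cpus, memory).
    """
    # Usage [cpu, mem] accumulated from services already placed in this pass
    usage = {node: [0.0, 0.0] for node in capacity}

    def score(node, svc):
        # Projected relative load of node after adding svc; lower is better
        cpus, mem = capacity[node]
        cpu_used, mem_used = usage[node]
        return max((cpu_used + svc.maxcpu) / cpus, (mem_used + svc.maxmem) / mem)

    # 1. (optional) handle the "worst" services first; badness is simply the
    #    configured memory here, a placeholder for a real backend function
    ordered = sorted(services, key=lambda s: s.maxmem, reverse=True)

    migrations = {}
    for svc in ordered:
        # 2. score all nodes against the usage accumulated so far
        best = min(capacity, key=lambda n: (score(n, svc), n))
        chosen = best
        # Keep the current node if its score is close enough to the best one
        if svc.node in capacity and score(svc.node, svc) - score(best, svc) <= tolerance:
            chosen = svc.node
        usage[chosen][0] += svc.maxcpu
        usage[chosen][1] += svc.maxmem
        # 3. record services whose chosen node differs from the current one
        if chosen != svc.node:
            migrations[svc.sid] = chosen
    return migrations


services = [
    Service("vm:100", "node1", 4, 16),
    Service("vm:101", "node1", 2, 8),
    Service("vm:102", "node1", 2, 8),
]
capacity = {"node1": (8, 32), "node2": (8, 32)}
print(plan_rebalance(services, capacity))  # {'vm:101': 'node2', 'vm:102': 'node2'}
```

The tolerance check is what keeps the balancer from shuffling services around for marginal gains: vm:100 stays put because both nodes score equally for it, while the two smaller VMs move once node1's accumulated usage makes node2 clearly better.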
The online node usage calculation is factored out into a 'Usage'
plugin system to ease adding the new static mode without too much
clutter. If not all nodes provide static service information, we
fall back to the 'basic' mode. If only the scoring fails, the service
count is used as a fallback.
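The plugin structure and both fallback levels can be sketched in Python roughly like this. The class and method names are hypothetical stand-ins for the Perl plugins (PVE::HA::Usage, Usage::Basic, Usage::Static), and the static score here only looks at memory, a deliberate simplification.

```python
class UsageBasic:
    """'basic' mode: a node's score is its number of services."""

    def __init__(self):
        self.count = {}

    def add_node(self, node, stats=None):
        self.count[node] = 0

    def add_service_usage_to_node(self, node, sid, service_stats=None):
        self.count[node] += 1

    def score_nodes_to_start_service(self, sid, service_stats=None):
        # Lower score is better
        return dict(self.count)


class UsageStatic(UsageBasic):
    """Simplified 'static' mode: score by configured vs. total memory."""

    def __init__(self):
        super().__init__()
        self.memory = {}  # node -> total memory
        self.used = {}    # node -> sum of configured service memory

    def add_node(self, node, stats=None):
        super().add_node(node)  # also track the count for the fallback
        self.memory[node] = stats["memory"]
        self.used[node] = 0

    def add_service_usage_to_node(self, node, sid, service_stats=None):
        super().add_service_usage_to_node(node, sid)
        self.used[node] += (service_stats or {}).get("maxmem", 0)

    def score_nodes_to_start_service(self, sid, service_stats=None):
        try:
            need = service_stats["maxmem"]
            return {n: (self.used[n] + need) / self.memory[n] for n in self.memory}
        except (KeyError, TypeError, ZeroDivisionError):
            # Scoring failed: fall back to the service count
            return super().score_nodes_to_start_service(sid)


def make_usage(mode, node_stats):
    """Use 'static' only if every node provides static stats, else 'basic'."""
    if mode == "static" and all(s and "memory" in s for s in node_stats.values()):
        usage = UsageStatic()
    else:
        usage = UsageBasic()
    for node, stats in node_stats.items():
        usage.add_node(node, stats)
    return usage


usage = make_usage("static", {"node1": {"memory": 64}, "node2": None})
print(type(usage).__name__)  # UsageBasic: node2 has no static stats
```

Keeping the service count inside the static plugin means the second fallback needs no extra bookkeeping: when scoring fails, the count is already up to date.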
Dependency bumps needed:
proxmox-ha-manager (build-)depends on proxmox-perl-rs
Of course, the new feature is only usable with updated pve-manager and
pve-cluster, but there is no hard dependency on them.
Changes from v1:
* Drop already applied patches.
* Add tests for the HA manager, which also required properly adding
the relevant methods to the simulation environment.
* Implement fallback for scoring in Usage/Static.pm.
* Improve documentation and mention current limitation with many
services.
ha-manager:
Fiona Ebner (15):
env: add get_static_node_stats() method
resources: add get_static_stats() method
add Usage base plugin and Usage::Basic plugin
manager: select service node: add $sid to parameters
manager: online node usage: switch to Usage::Basic plugin
usage: add Usage::Static plugin
env: rename get_ha_settings to get_datacenter_settings
env: datacenter config: include crs (cluster-resource-scheduling) setting
manager: set resource scheduler mode upon init
manager: use static resource scheduler when configured
manager: avoid scoring nodes if maintenance fallback node is valid
manager: avoid scoring nodes when not trying next and current node is valid
usage: static: use service count on nodes as a fallback
test: add tests for static resource scheduling
resources: add missing PVE::Cluster use statements
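Two of the commits above skip the potentially expensive scoring step when its result would not be used anyway. The following Python sketch illustrates that short-circuit; it is based only on the commit subjects, the function name and parameters are hypothetical, and the real selection in Manager.pm also handles group priorities and other details not modeled here.

```python
def select_service_node(current_node, online_nodes, score_nodes,
                        try_next=False, maintenance_fallback=None):
    """Pick a node for a service, calling score_nodes only when needed.

    score_nodes(nodes) stands in for the usage plugin and returns
    {node: score}, where a lower score is better.
    """
    # A service returning from maintenance goes back to its fallback node
    if maintenance_fallback is not None and maintenance_fallback in online_nodes:
        return maintenance_fallback
    # Not asked to move on and the current node is still usable: keep it
    if not try_next and current_node in online_nodes:
        return current_node
    # Only now pay for the (possibly expensive) scoring
    scores = score_nodes(online_nodes)
    # When trying the next node, skip the current one
    candidates = [n for n in online_nodes if not (try_next and n == current_node)]
    if not candidates:
        return None
    return min(candidates, key=lambda n: (scores[n], n))


calls = []


def recording_scorer(nodes):
    # Hypothetical scorer that records each call it receives
    calls.append(list(nodes))
    return {n: i for i, n in enumerate(nodes)}


print(select_service_node("node1", ["node1", "node2"], recording_scorer))  # node1
print(len(calls))  # 0: no scoring was needed
```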
debian/pve-ha-manager.install | 3 +
src/PVE/HA/Env.pm | 10 +-
src/PVE/HA/Env/PVE2.pm | 27 ++-
src/PVE/HA/LRM.pm | 4 +-
src/PVE/HA/Makefile | 3 +-
src/PVE/HA/Manager.pm | 79 +++++---
src/PVE/HA/Resources.pm | 5 +
src/PVE/HA/Resources/PVECT.pm | 13 ++
src/PVE/HA/Resources/PVEVM.pm | 16 ++
src/PVE/HA/Sim/Env.pm | 13 +-
src/PVE/HA/Sim/Hardware.pm | 28 +++
src/PVE/HA/Sim/Resources.pm | 10 +
src/PVE/HA/Usage.pm | 50 +++++
src/PVE/HA/Usage/Basic.pm | 52 ++++++
src/PVE/HA/Usage/Makefile | 6 +
src/PVE/HA/Usage/Static.pm | 120 ++++++++++++
src/test/test-crs-static1/README | 4 +
src/test/test-crs-static1/cmdlist | 4 +
src/test/test-crs-static1/datacenter.cfg | 6 +
src/test/test-crs-static1/hardware_status | 5 +
src/test/test-crs-static1/log.expect | 50 +++++
src/test/test-crs-static1/manager_status | 1 +
src/test/test-crs-static1/service_config | 3 +
.../test-crs-static1/static_service_stats | 3 +
src/test/test-crs-static2/README | 4 +
src/test/test-crs-static2/cmdlist | 20 ++
src/test/test-crs-static2/datacenter.cfg | 6 +
src/test/test-crs-static2/groups | 2 +
src/test/test-crs-static2/hardware_status | 7 +
src/test/test-crs-static2/log.expect | 171 ++++++++++++++++++
src/test/test-crs-static2/manager_status | 1 +
src/test/test-crs-static2/service_config | 3 +
.../test-crs-static2/static_service_stats | 3 +
src/test/test-crs-static3/README | 5 +
src/test/test-crs-static3/cmdlist | 4 +
src/test/test-crs-static3/datacenter.cfg | 9 +
src/test/test-crs-static3/hardware_status | 5 +
src/test/test-crs-static3/log.expect | 131 ++++++++++++++
src/test/test-crs-static3/manager_status | 1 +
src/test/test-crs-static3/service_config | 12 ++
.../test-crs-static3/static_service_stats | 12 ++
src/test/test-crs-static4/README | 6 +
src/test/test-crs-static4/cmdlist | 4 +
src/test/test-crs-static4/datacenter.cfg | 9 +
src/test/test-crs-static4/hardware_status | 5 +
src/test/test-crs-static4/log.expect | 149 +++++++++++++++
src/test/test-crs-static4/manager_status | 1 +
src/test/test-crs-static4/service_config | 12 ++
.../test-crs-static4/static_service_stats | 12 ++
src/test/test-crs-static5/README | 5 +
src/test/test-crs-static5/cmdlist | 4 +
src/test/test-crs-static5/datacenter.cfg | 9 +
src/test/test-crs-static5/hardware_status | 5 +
src/test/test-crs-static5/log.expect | 117 ++++++++++++
src/test/test-crs-static5/manager_status | 1 +
src/test/test-crs-static5/service_config | 10 +
.../test-crs-static5/static_service_stats | 11 ++
src/test/test_failover1.pl | 21 ++-
58 files changed, 1242 insertions(+), 50 deletions(-)
create mode 100644 src/PVE/HA/Usage.pm
create mode 100644 src/PVE/HA/Usage/Basic.pm
create mode 100644 src/PVE/HA/Usage/Makefile
create mode 100644 src/PVE/HA/Usage/Static.pm
create mode 100644 src/test/test-crs-static1/README
create mode 100644 src/test/test-crs-static1/cmdlist
create mode 100644 src/test/test-crs-static1/datacenter.cfg
create mode 100644 src/test/test-crs-static1/hardware_status
create mode 100644 src/test/test-crs-static1/log.expect
create mode 100644 src/test/test-crs-static1/manager_status
create mode 100644 src/test/test-crs-static1/service_config
create mode 100644 src/test/test-crs-static1/static_service_stats
create mode 100644 src/test/test-crs-static2/README
create mode 100644 src/test/test-crs-static2/cmdlist
create mode 100644 src/test/test-crs-static2/datacenter.cfg
create mode 100644 src/test/test-crs-static2/groups
create mode 100644 src/test/test-crs-static2/hardware_status
create mode 100644 src/test/test-crs-static2/log.expect
create mode 100644 src/test/test-crs-static2/manager_status
create mode 100644 src/test/test-crs-static2/service_config
create mode 100644 src/test/test-crs-static2/static_service_stats
create mode 100644 src/test/test-crs-static3/README
create mode 100644 src/test/test-crs-static3/cmdlist
create mode 100644 src/test/test-crs-static3/datacenter.cfg
create mode 100644 src/test/test-crs-static3/hardware_status
create mode 100644 src/test/test-crs-static3/log.expect
create mode 100644 src/test/test-crs-static3/manager_status
create mode 100644 src/test/test-crs-static3/service_config
create mode 100644 src/test/test-crs-static3/static_service_stats
create mode 100644 src/test/test-crs-static4/README
create mode 100644 src/test/test-crs-static4/cmdlist
create mode 100644 src/test/test-crs-static4/datacenter.cfg
create mode 100644 src/test/test-crs-static4/hardware_status
create mode 100644 src/test/test-crs-static4/log.expect
create mode 100644 src/test/test-crs-static4/manager_status
create mode 100644 src/test/test-crs-static4/service_config
create mode 100644 src/test/test-crs-static4/static_service_stats
create mode 100644 src/test/test-crs-static5/README
create mode 100644 src/test/test-crs-static5/cmdlist
create mode 100644 src/test/test-crs-static5/datacenter.cfg
create mode 100644 src/test/test-crs-static5/hardware_status
create mode 100644 src/test/test-crs-static5/log.expect
create mode 100644 src/test/test-crs-static5/manager_status
create mode 100644 src/test/test-crs-static5/service_config
create mode 100644 src/test/test-crs-static5/static_service_stats
docs:
Fiona Ebner (2):
ha: add section about scheduler modes
ha: add warning against using 'static' mode with many services
ha-manager.adoc | 49 +++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 49 insertions(+)