[pve-devel] [PATCH-SERIES proxmox-resource-scheduling/pve-ha-manager/etc] add static usage scheduler for HA manager

Fiona Ebner f.ebner at proxmox.com
Thu Nov 10 15:37:39 CET 2022


Right now, the online node usage calculation for the HA manager only
considers the number of active services on each node. This patch
series allows switching to a 'static' scheduler mode, where static
usage information from the nodes and guest configurations is used
instead.

This also includes the remaining cgroup/cpuunits-related patches,
because the broadcasting of static information was done to include the
cgroup mode of the node.

With this version, the effect is limited to choosing nodes during
recovery, but the plan is to extend this.

As a next step, it would be nice to also have this for startup, but
AFAICT the issue is that the node selection only happens after the
state is already set to started, and I think select_service_node()
doesn't currently know if a service has been newly started. I haven't
looked into it in too much detail though.

One idea for getting a balancer out of it is to:
1. (optionally) sort all services by badness (needs new backend function)
2. iterate scoring the nodes for each service, adding the usage to the
   chosen node after each iteration. The current node can be kept if the
   score compared to the best node doesn't differ too much.
3. record the chosen nodes and migrate the services accordingly.
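
The three steps above could be sketched roughly as follows. This is
only an illustration, not code from the series: the function name, the
memory-only score, the 'largest first' badness ordering and the
keep_threshold parameter are all assumptions made for the example.

```python
def plan_balancing(node_capacity, services, keep_threshold=0.1):
    """Return {service id: target node} for services that should migrate.

    node_capacity: {node: total memory}
    services: list of {"id": ..., "node": current node, "mem": memory}
    All names and the scoring rule are hypothetical.
    """
    usage = {node: 0.0 for node in node_capacity}
    plan = {}

    # Step 1: sort by a stand-in 'badness' (here: largest memory first).
    for svc in sorted(services, key=lambda s: s["mem"], reverse=True):
        # Step 2: score every node as if the service were started there.
        # The score is the resulting memory fraction; lower is better.
        scores = {
            node: (usage[node] + svc["mem"]) / node_capacity[node]
            for node in usage
        }
        best = min(scores, key=lambda node: (scores[node], node))

        # Keep the current node if it is not much worse than the best one.
        current = svc["node"]
        if current in scores and scores[current] - scores[best] <= keep_threshold:
            chosen = current
        else:
            chosen = best

        # Add the usage to the chosen node before scoring the next service.
        usage[chosen] += svc["mem"]

        # Step 3: record only the services that actually need to move.
        if chosen != current:
            plan[svc["id"]] = chosen

    return plan


services = [
    {"id": "vm1", "node": "a", "mem": 16},
    {"id": "vm2", "node": "a", "mem": 16},
    {"id": "vm3", "node": "b", "mem": 2},
]
print(plan_balancing({"a": 32, "b": 32}, services))
```

The threshold is what avoids migrating services back and forth for
marginal gains; a real implementation would use the scheduler's score
rather than the memory fraction used here.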

Still missing are also unit tests for ha-manager itself.


Almost all of the series is preparatory infrastructure, but the hope
is that much of it can be re-used for balancers and dynamic
scheduling in the future.

The proxmox-resource-scheduling Rust crate implements the TOPSIS
algorithm first suggested by Alexandre. It also models the static node
and service usages in PVE and allows scoring nodes to decide where to
start a new or recovered service. This is done by simulating starting
it on each node and comparing the alternatives with average and
highest CPU and memory as criteria, with memory weighted much more
heavily, as it is a more limited resource than CPU.
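
To illustrate the idea, here is a minimal Python sketch of textbook
TOPSIS applied to the "simulate a start on each node" approach. It is
not the crate's code: the function names, the data layout and the
weight values are assumptions chosen for the example, and only the
choice of criteria (average and highest CPU and memory) is taken from
the description above.

```python
import math


def topsis(matrix, weights, benefit):
    """Return one closeness score per alternative (row); higher is better."""
    n = len(weights)
    # Vector-normalize every column, then apply the criterion weight.
    norms = [math.sqrt(sum(row[j] ** 2 for row in matrix)) or 1.0 for j in range(n)]
    weighted = [[weights[j] * row[j] / norms[j] for j in range(n)] for row in matrix]

    # Ideal best and worst value per criterion (min is best for cost criteria).
    columns = list(zip(*weighted))
    best = [max(c) if benefit[j] else min(c) for j, c in enumerate(columns)]
    worst = [min(c) if benefit[j] else max(c) for j, c in enumerate(columns)]

    scores = []
    for row in weighted:
        d_best = math.dist(row, best)
        d_worst = math.dist(row, worst)
        total = d_best + d_worst
        # total == 0 means all alternatives are identical on every criterion.
        scores.append(d_worst / total if total > 0 else 0.5)
    return scores


def score_nodes(nodes, service, weights=(1.0, 2.0, 5.0, 20.0)):
    """Score each node as a start target for `service`; higher is better.

    nodes: {name: {"cpu": used, "maxcpu": total, "mem": used, "maxmem": total}}
    weights: avg CPU, highest CPU, avg memory, highest memory. The values
    are made up; they only reflect that memory is weighted more than CPU.
    """
    names = sorted(nodes)
    matrix = []
    for target in names:
        cpu, mem = [], []
        for name in names:
            node = nodes[name]
            # Simulate starting the service on `target` only.
            extra_cpu = service["cpu"] if name == target else 0.0
            extra_mem = service["mem"] if name == target else 0.0
            cpu.append((node["cpu"] + extra_cpu) / node["maxcpu"])
            mem.append((node["mem"] + extra_mem) / node["maxmem"])
        matrix.append([sum(cpu) / len(cpu), max(cpu), sum(mem) / len(mem), max(mem)])
    # All four criteria are costs: lower usage is better.
    return dict(zip(names, topsis(matrix, weights, benefit=[False] * 4)))


nodes = {
    "A": {"cpu": 6, "maxcpu": 8, "mem": 28, "maxmem": 32},
    "B": {"cpu": 1, "maxcpu": 8, "mem": 4, "maxmem": 32},
}
print(score_nodes(nodes, {"cpu": 2, "mem": 4}))
```

In this example the lightly loaded node "B" gets the higher score,
because starting the service there yields lower highest-CPU and
highest-memory values than starting it on "A".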

I did not implement the criteria weighting process from AHP (yet) (also
suggested by Alexandre), which computes averaged weights and a bias
score from a table of pairwise weights between criteria. The downside
is that one needs to guess n(n-1)/2 weights instead of n, and the
upside is that each guess only compares two criteria rather than one
criterion relative to all others. But this can still be done in the
future if we want.
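
For reference, a minimal sketch of how AHP derives weights from such a
pairwise table, using the common column-normalization approximation.
The function name is made up, and treating the "bias score" as AHP's
consistency index is my assumption; nothing like this is part of the
series.

```python
def ahp_weights(pairwise):
    """Derive criterion weights from a pairwise comparison table.

    pairwise[i][j] says how much more important criterion i is than
    criterion j, with pairwise[j][i] == 1 / pairwise[i][j] and a diagonal
    of 1. Returns (weights, consistency_index); an index of 0 means the
    pairwise guesses do not contradict each other.
    """
    n = len(pairwise)

    # Normalize each column to sum to 1, then average across each row.
    col_sums = [sum(pairwise[i][j] for i in range(n)) for j in range(n)]
    weights = [
        sum(pairwise[i][j] / col_sums[j] for j in range(n)) / n
        for i in range(n)
    ]

    # Estimate the principal eigenvalue as the mean of (A w)_i / w_i.
    aw = [sum(pairwise[i][j] * weights[j] for j in range(n)) for i in range(n)]
    lambda_max = sum(aw[i] / weights[i] for i in range(n)) / n

    consistency_index = (lambda_max - n) / (n - 1) if n > 1 else 0.0
    return weights, consistency_index


# Two criteria (CPU, memory) with memory guessed as 4x as important.
weights, ci = ahp_weights([[1.0, 0.25], [4.0, 1.0]])
print(weights, ci)
```

With n criteria only the n(n-1)/2 entries above the diagonal have to be
guessed; the rest of the table follows from the reciprocal rule.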

In proxmox-perl-rs, a class is provided for interfacing from Perl.

In pve-manager, the static node information is broadcast whenever it
is outdated. There are also the unrelated (but touching the same code)
cgroup/cpuunits patches.

In pve-cluster, a new crs (=cluster-resource-scheduler) option is
added, initially with a mode for HA.

In pve-ha-manager, the online node usage calculation is factored out
into a 'Usage' plugin system to ease adding the new static mode
without much cluttering. If not all nodes provide static service
information, we fall back to the 'basic' mode. If only the scoring
fails (but that /should/ be rather unlikely), there is no real
fallback implemented currently (the '|| $a cmp $b' in
select_service_node() destroys the random hash keys order again ;)).
We could change it to stay random or, better, track the service count
in Usage::Static too and use that.
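
The mode fallback described above amounts to something like the
following sketch. The function name and arguments are hypothetical and
the real logic lives in the Perl code of pve-ha-manager; this only
shows the decision, not the plugin interface.

```python
def choose_usage_mode(configured_mode, online_nodes, static_stats):
    """Pick the usage mode, falling back to 'basic' when data is missing.

    configured_mode: mode requested in the configuration
    online_nodes: iterable of node names
    static_stats: {node: static usage info} for nodes that provided it
    """
    if configured_mode != "static":
        return "basic"
    # Static scoring needs information from every online node.
    if all(node in static_stats for node in online_nodes):
        return "static"
    return "basic"


print(choose_usage_mode("static", ["n1", "n2"], {"n1": {"cpus": 8}}))
```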


Dependency bumps needed:
proxmox-perl-rs depends on proxmox-resource-scheduling
pve-ha-manager (build)depends on proxmox-perl-rs
The new feature is only usable with updated pve-manager and
pve-cluster of course, but there is no hard dependency.


proxmox-resource-scheduling:

Fiona Ebner (3):
  initial commit
  add pve_static module
  add Debian packaging


proxmox-perl-rs:

Fiona Ebner (2):
  pve-rs: add resource scheduling module
  add basic test for resource scheduling

 Makefile                                 |   1 +
 pve-rs/Cargo.toml                        |   1 +
 pve-rs/src/lib.rs                        |   1 +
 pve-rs/src/resource_scheduling/mod.rs    |   1 +
 pve-rs/src/resource_scheduling/static.rs | 116 +++++++++++++++++++++++
 pve-rs/test/Makefile                     |   4 +
 pve-rs/test/README                       |   2 +
 pve-rs/test/resource_scheduling.pl       |  70 ++++++++++++++
 8 files changed, 196 insertions(+)
 create mode 100644 pve-rs/src/resource_scheduling/mod.rs
 create mode 100644 pve-rs/src/resource_scheduling/static.rs
 create mode 100644 pve-rs/test/Makefile
 create mode 100644 pve-rs/test/README
 create mode 100755 pve-rs/test/resource_scheduling.pl


pve-manager:

Fiona Ebner (3):
  pvestatd: broadcast static node information
  cluster resources: add cgroup-mode to node properties
  ui: lxc/qemu: cpu edit: make cpuunits depend on node's cgroup version

 PVE/API2/Cluster.pm                | 13 +++++++++++++
 PVE/Service/pvestatd.pm            | 25 ++++++++++++++++++++++++
 www/manager6/lxc/CreateWizard.js   |  8 ++++++++
 www/manager6/lxc/ResourceEdit.js   | 31 +++++++++++++++++++++++++-----
 www/manager6/lxc/Resources.js      |  8 +++++++-
 www/manager6/qemu/CreateWizard.js  |  8 ++++++++
 www/manager6/qemu/HardwareView.js  |  8 +++++++-
 www/manager6/qemu/ProcessorEdit.js | 31 +++++++++++++++++++++++-------
 8 files changed, 118 insertions(+), 14 deletions(-)


pve-cluster:

Fiona Ebner (1):
  datacenter config: add cluster resource scheduling (crs) options

 data/PVE/DataCenterConfig.pm | 25 +++++++++++++++++++++++++
 1 file changed, 25 insertions(+)


pve-ha-manager:

Fiona Ebner (11):
  env: add get_static_node_stats() method
  resources: add get_static_stats() method
  add Usage base plugin and Usage::Basic plugin
  manager: select service node: add $sid to parameters
  manager: online node usage: switch to Usage::Basic plugin
  usage: add Usage::Static plugin
  env: add get_crs_settings() method
  manager: set resource scheduler mode upon init
  manager: use static resource scheduler when configured
  manager: avoid scoring nodes if maintenance fallback node is valid
  manager: avoid scoring nodes when not trying next and current node is
    valid

 debian/pve-ha-manager.install |   3 +
 src/PVE/HA/Env.pm             |  13 ++++
 src/PVE/HA/Env/PVE2.pm        |  29 +++++++++
 src/PVE/HA/Makefile           |   3 +-
 src/PVE/HA/Manager.pm         |  77 ++++++++++++++---------
 src/PVE/HA/Resources.pm       |   5 ++
 src/PVE/HA/Resources/PVECT.pm |  11 ++++
 src/PVE/HA/Resources/PVEVM.pm |  14 +++++
 src/PVE/HA/Sim/Env.pm         |   9 +++
 src/PVE/HA/Sim/TestEnv.pm     |   6 ++
 src/PVE/HA/Usage.pm           |  50 +++++++++++++++
 src/PVE/HA/Usage/Basic.pm     |  52 ++++++++++++++++
 src/PVE/HA/Usage/Makefile     |   6 ++
 src/PVE/HA/Usage/Static.pm    | 114 ++++++++++++++++++++++++++++++++++
 src/test/test_failover1.pl    |  21 ++++---
 15 files changed, 374 insertions(+), 39 deletions(-)
 create mode 100644 src/PVE/HA/Usage.pm
 create mode 100644 src/PVE/HA/Usage/Basic.pm
 create mode 100644 src/PVE/HA/Usage/Makefile
 create mode 100644 src/PVE/HA/Usage/Static.pm


pve-docs:

Fiona Ebner (1):
  ha: add section about scheduler modes

 ha-manager.adoc | 27 +++++++++++++++++++++++++++
 1 file changed, 27 insertions(+)

-- 
2.30.2





