[pve-devel] [PATCH many v4 00/31] Expand and migrate RRD data and add/change summary graphs

Aaron Lauterer a.lauterer at proxmox.com
Sat Jul 26 03:05:55 CEST 2025


This patch series does a few things. It expands the RRD format for nodes and
VMs. For all types (nodes, VMs, storage) we adjust the aggregation to align
it with the way it is done on the Backup Server. Therefore, we have new
RRD definitions for all 3 types.

New values are added for nodes and VMs. In particular:

Nodes:
* memfree
* arcsize
* pressures:
  * cpu some
  * io some
  * io full
  * mem some
  * mem full

VMs:
* memhost (memory consumption of all processes in the guests cgroup, host view)
* pressures:
  * cpu some
  * cpu full
  * io some
  * io full
  * mem some
  * mem full
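
For reference, a minimal sketch of how "some"/"full" pressure lines could be
parsed. It assumes the values come from the kernel's PSI files (for example
/proc/pressure/cpu or a cgroup's cpu.pressure); the cover letter does not say
which field ends up in the RRD, so the helper name `parse_psi` and the choice
to keep all fields are illustrative only.

```python
def parse_psi(text):
    """Parse PSI-style text into {"some": {...}, "full": {...}}.

    Each input line looks like:
        some avg10=0.00 avg60=0.00 avg300=0.00 total=0
    """
    result = {}
    for line in text.strip().splitlines():
        kind, *fields = line.split()
        # Every field is a key=value pair; values are stored as floats.
        result[kind] = {k: float(v) for k, v in (f.split("=", 1) for f in fields)}
    return result


# Hypothetical sample input, not taken from a real system.
sample = (
    "some avg10=1.50 avg60=0.75 avg300=0.10 total=12345\n"
    "full avg10=0.00 avg60=0.00 avg300=0.00 total=0\n"
)
pressure = parse_psi(sample)
```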

The change in RRD columns and aggregation means that we need new RRD files. To
not lose old RRD data, we need to migrate the old RRD files to ones with
the new schema. Some initial performance tests showed that migrating 10k VM
RRD files took ~2m40s single-threaded. This is way too long to do within
pmxcfs itself. Therefore, this will be a dedicated step:
The new `proxmox-rrd-migration-tool` migrates the RRD files to the new location
and aggregation schemas. It is run automatically by the postinst script of
pve-manager.

This also means that we need to handle the situation of new and old RRD
files and formats. Therefore, we introduce new keys by which the metrics
are broadcast in a cluster. Up until now (pre PVE9), the keys have had the format
'pve2-{type}/{resource id}'.
Having the version number this early in the string makes it tough to match
against newer ones, especially in the C code of the pmxcfs. To make it easier
in the future, we change the key format to 'pve-{type}-{version}/{resource id}'.
This way, we can fuzzy match against unknown 'pve-{type}-{version}' in the C
code too and handle those situations better.
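
To illustrate why the new layout is easier to match, here is a rough sketch
(in Python rather than the C of pmxcfs) that splits both key formats. The
regular expressions, the assumed `major.minor` version shape and the function
name `parse_key` are assumptions for illustration, not the actual
implementation in status.c.

```python
import re

# New format: 'pve-{type}-{version}/{resource id}', e.g. 'pve-vm-9.0/100'.
# The type list and the version shape (digits.digits) are assumptions.
NEW_KEY = re.compile(
    r"^pve-(?P<type>node|vm|storage)-(?P<version>\d+\.\d+)/(?P<id>.+)$"
)
# Old format as described above: 'pve2-{type}/{resource id}'.
OLD_KEY = re.compile(r"^pve2-(?P<type>[a-z]+)/(?P<id>.+)$")


def parse_key(key):
    """Return a dict describing the key, or None if it matches neither format."""
    m = NEW_KEY.match(key)
    if m:
        return {"format": "new", "type": m["type"],
                "version": m["version"], "id": m["id"]}
    m = OLD_KEY.match(key)
    if m:
        return {"format": "old", "type": m["type"],
                "version": None, "id": m["id"]}
    return None
```

With the version at the end of the prefix, a key from a not yet known version
such as 'pve-node-10.1/host1' still yields its type and resource id, so a
receiver can decide how to handle it instead of failing on the prefix.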

The result is that, to avoid breaking changes, we are only allowed to add new
columns, but not modify or remove existing columns!


To avoid missing data and key errors in the journal, we already rolled out
changes to PVE 8 so that it can handle the new format sent out by pvestatd in the
latest versions.

On the GUI side, we switch memory graphs to stacked area graphs, and for VMs
we also have a dedicated line for the memory consumption as the host sees it.
This is needed because the current memory view of a VM switches to the internal
guest view if we get detailed info via the ballooning device.
To make those slightly more complicated graphs possible, we need to adapt
RRDChart.js in the widget-toolkit to allow for detailed overrides.

While we are at it, we can also fix bug #6068 (Node Search tab incorrect Host
memory usage %) by switching to memhost if available and fixing one wrong if check.


As a side note, now that we have pressure graphs, we could start thinking about
dropping the server load and IO wait graphs. Those are not very specific and
mash many different metrics into a single one.


Release notes:
We should probably mention in the release notes that, due to the changed
aggregation settings, it is expected that the resulting RRD files might have
some data points that the originals didn't have. We observed that in some
situations we could get a data point one time step earlier than before.
This is most likely due to how RRD recalculates the aggregated data with the
different resolution.

In the pve8to9 checks, we now have a check that makes sure we have enough
free space, as the new RRD files, with the new columns and more detailed
aggregation steps, are quite a bit larger. We also check after install whether any
RRD files have not yet been migrated, which would warrant another manual run of
the migration tool.
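
The two checks can be sketched roughly as follows. This is not the Perl code
from pve8to9; the function names, the directory arguments and especially the
`growth_factor` default are hypothetical, since the series only states that
the new files are "quite a bit larger". The sketch relies on processed files
being renamed to FILE.old.

```python
import os
import shutil


def pending_files(old_dir):
    """List RRD files below old_dir that were not yet renamed to FILE.old."""
    found = []
    for root, _dirs, files in os.walk(old_dir):
        for name in files:
            if not name.endswith(".old"):
                found.append(os.path.join(root, name))
    return sorted(found)


def required_bytes(old_dir, growth_factor=2.0):
    """Estimate the space needed for migrating all pending files.

    growth_factor is an assumed placeholder, not a measured value.
    """
    total = sum(os.path.getsize(path) for path in pending_files(old_dir))
    return total * growth_factor


def enough_space(old_dir, target_dir, growth_factor=2.0):
    """Pre-migration check: is there enough free space on the target?"""
    free = shutil.disk_usage(target_dir).free
    return free >= required_bytes(old_dir, growth_factor)
```

After the upgrade, a non-empty result from `pending_files()` would correspond
to the situation that warrants another manual run of the migration tool.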

Plans:
* add doc patches for the summary pages that explain the different graphs and
make the help button point to those sections

KNOWN ISSUES:
* on a live system, renaming the source RRD files to FILE.old doesn't seem to
work as expected: besides the renamed ones, new ones without the .old suffix
show up again. I suspect some interaction with rrdcached and/or pmxcfs receiving
new data.

How to test:
1. have PVE8 nodes on the latest version (>= 8.4.4)
2. Upgrade the first node to PVE9/trixie and install all the other patches.
    To see the automatic upgrade, pve-manager might need to be temporarily
    bumped to 9.0.0~12!
    Build all the other repositories, copy the .deb files over and then ideally
    use something like the following to make sure that any dependency will be
    taken from the deb files, and not from the apt repositories.
    ```
    apt install ./*.deb --reinstall --allow-downgrades -y
    ```
3. you should see the pve-manager package calling the
proxmox-rrd-migration-tool


High level changes since:
v3:
* added check for pve8to9 for both situations, pre and post migration
* rebase and only send not yet applied patches
* incorporate suggested changes and improvements
* improve proxmox-rrd-migration-tool
  * code style and refactoring of repetitive parts
  * rename processed files to FILE.old
  * tests
  * initial packaging
* drop info button tooltip patch. The concept was interesting, but would
introduce a new way to interact in just one place and doesn't work well on touch
devices.

v2:
* several bugfixes that I found, especially regarding pressure and memory
  collection for CTs and VMs
* add missing return property descriptions for pressures
* added all the GUI changes

v1:
* refactored the patches as they were a bit of a mess in v1, sorry for that;
  now we have distinct patches for pve8 for both affected repos (cluster & manager)

RFC:
* drop membuffer and memcached in favor of already present memused and memavailable
* switch from pve9-{type} to pve-{type}-9.0 schema in all places
* add patch for PVE8 & 9 that handles different keys in live status to avoid
  question marks in the UI

proxmox-rrd-migration-tool:

Aaron Lauterer (3):
  create proxmox-rrd-migration-tool
  add first tests
  add debian packaging


cluster:

Aaron Lauterer (2):
  status: introduce new pve-{type}- rrd and metric format
  rrd: adapt to new RRD format with different aggregation windows

 src/PVE/RRD.pm      |  52 +++++++--
 src/pmxcfs/status.c | 261 +++++++++++++++++++++++++++++++++++++++-----
 2 files changed, 278 insertions(+), 35 deletions(-)


widget-toolkit:

Aaron Lauterer (4):
  rrdchart: allow to override the series object
  rrdchart: use reference for undo button
  rrdchard: set cursor pointer for legend
  rrdchart: add dummy listener for legend clicks

 src/panel/RRDChart.js | 61 ++++++++++++++++++++++++++++++++++---------
 1 file changed, 48 insertions(+), 13 deletions(-)


manager:

Aaron Lauterer (14):
  pvestatd: collect and distribute new pve-{type}-9.0 metrics
  api: nodes: rrd and rrddata add decade option and use new pve-node-9.0
    rrd files
  api2tools: extract_vm_status add new vm memhost column
  ui: rrdmodels: add new columns and update existing
  ui: node summary: use stacked memory graph with zfs arc
  ui: GuestStatusView: add memhost for VM guests
  ui: GuestSummary: memory switch to stacked and add hostmem
  ui: GuestSummary: remember visibility of host memory view
  ui: nodesummary: guestsummary: add tooltip info buttons
  ui: summaries: use titles for disk and network series
  fix #6068: ui: utils: calculate and render host memory usage correctly
  d/control: require proxmox-rrd-migration-tool >= 1.0.0
  d/postinst: run promox-rrd-migration-tool
  pve8to9: add checkfs for RRD migration

Folke Gleumes (1):
  ui: add pressure graphs to node and guest summary

 PVE/API2/Cluster.pm                   |   7 +
 PVE/API2/Nodes.pm                     |  16 +-
 PVE/API2Tools.pm                      |   3 +
 PVE/CLI/pve8to9.pm                    |  62 +++++
 PVE/Service/pvestatd.pm               | 342 +++++++++++++++++++-------
 debian/control                        |   1 +
 debian/postinst                       |   5 +
 www/manager6/Utils.js                 |   6 +
 www/manager6/data/ResourceStore.js    |   8 +
 www/manager6/data/model/RRDModels.js  |  44 +++-
 www/manager6/node/Summary.js          |  79 +++++-
 www/manager6/panel/GuestStatusView.js |  18 +-
 www/manager6/panel/GuestSummary.js    | 103 +++++++-
 13 files changed, 587 insertions(+), 107 deletions(-)


storage:

Aaron Lauterer (1):
  status: rrddata: use new pve-storage-9.0 rrd location if file is
    present

 src/PVE/API2/Storage/Status.pm | 9 ++++-----
 1 file changed, 4 insertions(+), 5 deletions(-)


qemu-server:

Aaron Lauterer (3):
  vmstatus: add memhost for host view of vm mem consumption
  vmstatus: switch mem stat to PSS of VM cgroup
  rrddata: use new pve-vm-9.0 rrd location if file is present

Folke Gleumes (1):
  metrics: add pressure to metrics

 src/PVE/API2/Qemu.pm  | 11 ++++----
 src/PVE/QemuServer.pm | 65 +++++++++++++++++++++++++++++++++++++++++--
 2 files changed, 69 insertions(+), 7 deletions(-)


container:

Aaron Lauterer (1):
  rrddata: use new pve-vm-9.0 rrd location if file is present

Folke Gleumes (1):
  metrics: add pressures to metrics

 src/PVE/API2/LXC.pm | 11 ++++++-----
 src/PVE/LXC.pm      | 34 ++++++++++++++++++++++++++++++++++
 2 files changed, 40 insertions(+), 5 deletions(-)


Summary over all repositories:
  21 files changed, 1026 insertions(+), 172 deletions(-)

-- 
Generated by git-murpp 0.8.1



