[pdm-devel] [PATCH proxmox-datacenter-manager v3 00/26] metric collection improvements (concurrency, config, API, CLI)

Lukas Wagner l.wagner at proxmox.com
Wed Apr 16 14:56:16 CEST 2025


Key points:
- fetch metrics concurrently
- configuration for metric collection
  - new config /etc/proxmox-datacenter-manager/metric-collection.cfg,
    allowing configuration of the collection interval

- Add some tests for the core logic in the metric collection system
- Allow triggering metric collection via the API
- Record metric collection statistics in the RRD
  - overall collection time for all remotes
  - per remote response time when fetching metrics
- Persist metric collection state to disk:
  /var/lib/proxmox-datacenter-manager/metric-collection-state.json
  (timestamps of last collection, errors)
- Trigger metric collection for any new remotes added via the API

- Add new API endpoints
	POST     /metric-collection/trigger with optional 'remote' param
	GET      /metric-collection/status
	GET/PUT  /config/metric-collection/default
	GET      /remotes/<remote>/metric-collection-rrddata
	GET      /metric-collection/rrddata

- Add CLI tooling
	proxmox-datacenter-client metric-collection settings show
	proxmox-datacenter-client metric-collection settings update
	proxmox-datacenter-client metric-collection trigger [--remote <remote>]
	proxmox-datacenter-client metric-collection status
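
A rough sketch of what "fetch metrics concurrently" plus the two recorded statistics (per-remote response time, overall collection time) could look like. This is not the code from the series: the shortlog below mentions tokio and JoinSet, whereas this stand-in uses scoped threads from the standard library so it runs without extra crates. `fetch_metrics`, `FetchOutcome` and `collect_all` are hypothetical names made up for illustration.

```rust
use std::collections::HashMap;
use std::thread;
use std::time::{Duration, Instant};

/// Hypothetical stand-in for a per-remote metric fetch. The real code
/// would query the remote's API; here we only simulate latency.
fn fetch_metrics(remote: &str) -> Result<usize, String> {
    thread::sleep(Duration::from_millis(20));
    if remote.is_empty() {
        Err("empty remote name".to_string())
    } else {
        // Pretend the name length is the number of fetched data points.
        Ok(remote.len())
    }
}

/// Result of fetching one remote, together with its response time.
struct FetchOutcome {
    result: Result<usize, String>,
    response_time: Duration,
}

/// Fetch all remotes concurrently. Returns the per-remote outcomes and
/// the overall time needed for the whole collection run.
fn collect_all(remotes: &[&str]) -> (HashMap<String, FetchOutcome>, Duration) {
    let start = Instant::now();
    let outcomes = thread::scope(|scope| {
        // Spawn one thread per remote before joining any of them,
        // so that all fetches are in flight at the same time.
        let handles: Vec<_> = remotes
            .iter()
            .map(|&remote| {
                scope.spawn(move || {
                    let fetch_start = Instant::now();
                    let result = fetch_metrics(remote);
                    let outcome = FetchOutcome {
                        result,
                        response_time: fetch_start.elapsed(),
                    };
                    (remote.to_string(), outcome)
                })
            })
            .collect();

        handles
            .into_iter()
            .map(|handle| handle.join().expect("fetch thread panicked"))
            .collect::<HashMap<_, _>>()
    });
    (outcomes, start.elapsed())
}

fn main() {
    let (outcomes, total) = collect_all(&["pve-a", "pve-b", "pbs-c"]);
    for (remote, outcome) in &outcomes {
        println!("{remote}: {:?} in {:?}", outcome.result, outcome.response_time);
    }
    println!("overall collection time: {total:?}");
}
```

A failing remote only affects its own entry in the map, so one unreachable remote does not prevent the others from being collected.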


## To reviewers / open questions:
- Please review the defaults I've chosen for the settings, e.g.
  the default metric collection interval (10 minutes).
  I also kindly ask you to double-check the naming of the properties.
  See "pdm-api-types: add CollectionSettings type" for details.

- Please review path and params for new API endpoints (anything public
  facing that is hard to change later)

- I've chosen a section-config config, even though we only have a
  single section for now. This was done for future-proofing: we might
  want to add support for different setting 'groups', e.g. to have
  different settings for distinct sets of remotes. Does this make sense?
  Or should I just stick to a simple config for now? (At moments like
  these I wish for TOML configs where we could be a bit more flexible...)

	collection-settings: default
	    collection-interval 180
	    # These have been removed in v3, but might be re-added
	    # in some other form in the future:
	    # max-concurrent-connections 10
	    # min-interval-offset 0
	    # max-interval-offset 20
	    # min-connection-delay 10
	    # max-connection-delay 100


- Should `GET /remotes/<remote>/metric-collection-rrddata` be
  just `rrddata`?
  I'm not sure whether we are going to add any other PDM-native
  per-remote metrics and whether we would want to return them from
  the same API call...
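
To make the section-config question above more concrete, here is a minimal sketch of how a file shaped like the `collection-settings: default` example could be read: an unindented `<type>: <id>` header followed by indented `<key> <value>` lines, with `#` comments ignored. This is a simplified stand-in and not the actual section-config implementation; `Section` and `parse_sections` are hypothetical names, and no schema validation is done.

```rust
use std::collections::HashMap;

/// One parsed section: its type, its ID and its key/value properties.
#[derive(Debug, PartialEq)]
struct Section {
    section_type: String,
    id: String,
    properties: HashMap<String, String>,
}

/// Parse section-config-like text (simplified, no schema validation).
fn parse_sections(input: &str) -> Result<Vec<Section>, String> {
    let mut sections: Vec<Section> = Vec::new();

    for (idx, raw) in input.lines().enumerate() {
        let line = raw.trim_end();
        let trimmed = line.trim_start();

        // Skip blank lines and comments.
        if trimmed.is_empty() || trimmed.starts_with('#') {
            continue;
        }

        let indented = line.len() != trimmed.len();
        if !indented {
            // Section header: "<type>: <id>"
            let (ty, id) = trimmed
                .split_once(':')
                .ok_or_else(|| format!("line {}: expected '<type>: <id>'", idx + 1))?;
            sections.push(Section {
                section_type: ty.trim().to_string(),
                id: id.trim().to_string(),
                properties: HashMap::new(),
            });
        } else {
            // Property line: "<key> <value>", belongs to the last header seen.
            let section = sections
                .last_mut()
                .ok_or_else(|| format!("line {}: property outside of a section", idx + 1))?;
            let (key, value) = trimmed
                .split_once(char::is_whitespace)
                .ok_or_else(|| format!("line {}: expected '<key> <value>'", idx + 1))?;
            section
                .properties
                .insert(key.to_string(), value.trim().to_string());
        }
    }

    Ok(sections)
}

fn main() {
    let config = "collection-settings: default\n    collection-interval 180\n    # max-concurrent-connections 10\n";
    let sections = parse_sections(config).expect("valid config");
    let interval: u64 = sections[0].properties["collection-interval"]
        .parse()
        .expect("numeric interval");
    println!(
        "{} '{}': collection-interval = {}",
        sections[0].section_type, sections[0].id, interval
    );
}
```

With this shape, adding a second settings group later would only mean a second header with a different ID, which is the future-proofing argument made above.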

## Potential future work
- UI button for triggering metric collection
- UI for metric collection settings
- Show RRD graphs for metric collection stats somewhere
- Have some global concurrency control knob for background
  requests [request scheduling].


Changes since [v2]:
  - For now, drop settings that might change anyway with a
    global background request scheduling system [request scheduling]:
       - max-concurrency
       - {min,max}-interval-offset
       - {min,max}-connection-delay

Changes since [v1]:
  - add missing dependency to librust-rand-dev to d/control
  - Fix a couple of minor spelling/punctuation issues (thx maximiliano)
  - Some minor code style improvements, e.g. using unwrap_or_else instead
    of doing a manual match
  - Document return values of 'setup_timer' function
  - Factor out handle_tick/handle_control_message
  - Minor refactoring/code style improvements
  - CLI: Change 'update-settings' to 'settings update'
  - CLI: Change 'show-settings' to 'settings show'
  - Change missed tick behavior for tokio::time::Interval to 'skip'
    instead of 'burst'.

The last three commits are new in v2.
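
For readers unfamiliar with the missed-tick change listed above, the difference between 'burst' and 'skip' can be modeled with a small pure function. This is only a model of the scheduling semantics as I understand them from the tokio documentation, not tokio's code; `MissedTick` and `next_deadline` are hypothetical names, and times are plain milliseconds.

```rust
/// How to schedule the next tick after one or more ticks were missed.
#[derive(Clone, Copy)]
enum MissedTick {
    /// Fire the missed ticks back to back until caught up.
    Burst,
    /// Drop the missed ticks and realign to the original schedule.
    Skip,
}

/// Compute the next deadline in milliseconds.
///
/// `scheduled` is the deadline of the tick being handled, `now` is the
/// time at which it was actually handled, `period` is the tick interval.
fn next_deadline(behavior: MissedTick, scheduled: u64, now: u64, period: u64) -> u64 {
    assert!(period > 0 && now >= scheduled);
    match behavior {
        // The next deadline may already lie in the past, in which
        // case the timer fires again immediately.
        MissedTick::Burst => scheduled + period,
        // Next multiple of `period` (counted from `scheduled`) that is
        // strictly after `now`.
        MissedTick::Skip => now + period - (now - scheduled) % period,
    }
}

fn main() {
    // A tick due at t=1000 is handled late at t=3500, with a period of 1000.
    let burst = next_deadline(MissedTick::Burst, 1000, 3500, 1000);
    let skip = next_deadline(MissedTick::Skip, 1000, 3500, 1000);
    println!("burst: next deadline at {burst}");
    println!("skip:  next deadline at {skip}");
}
```

With 'burst', a long stall would be followed by several collection runs in quick succession; with 'skip', the collection simply resumes at the next regular slot.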

[v1]: https://lore.proxmox.com/pdm-devel/20250211120541.163621-1-l.wagner@proxmox.com/T/#t
[v2]: https://lore.proxmox.com/pdm-devel/20250214130653.283012-1-l.wagner@proxmox.com/
[request scheduling]: https://lore.proxmox.com/pdm-devel/7b3e90c8-6ebb-400f-acf9-cac084cc39fe@proxmox.com/

proxmox-datacenter-manager:

Lukas Wagner (26):
  pdm-api-types: add CollectionSettings type
  pdm-config: add functions for reading/writing metric collection
    settings
  metric collection: split top_entities split into separate module
  metric collection: save metric data to RRD in separate task
  metric collection: rework metric poll task
  metric collection: persist state after metric collection
  metric collection: skip if last_collection < MIN_COLLECTION_INTERVAL
  metric collection: collect overdue metrics on startup/timer change
  metric collection: add tests for the fetch_remotes function
  metric collection: add test for fetch_overdue
  metric collection: pass rrd cache instance as function parameter
  metric collection: add test for rrd task
  metric collection: wrap rrd_cache::Cache in a struct
  metric collection: record remote response time in metric database
  metric collection: save time needed for collection run to RRD
  metric collection: periodically clean removed remotes from statefile
  api: add endpoint for updating metric collection settings
  api: add endpoint to trigger metric collection
  api: remotes: trigger immediate metric collection for newly added
    nodes
  api: add api for querying metric collection RRD data
  api: metric-collection: add status endpoint
  pdm-client: add metric collection API methods
  cli: add commands for metric-collection settings, trigger, status
  metric collection: factor out handle_tick and handle_control_message
    fns
  metric collection: skip missed timer ticks
  metric collection: use JoinSet instead of joining from handles in a
    Vec

 cli/client/Cargo.toml                         |   1 +
 cli/client/src/main.rs                        |   2 +
 cli/client/src/metric_collection.rs           | 145 ++++
 lib/pdm-api-types/src/lib.rs                  |   3 +
 lib/pdm-api-types/src/metric_collection.rs    |  83 +++
 lib/pdm-api-types/src/rrddata.rs              |  26 +
 lib/pdm-client/src/lib.rs                     |  87 +++
 lib/pdm-config/src/lib.rs                     |   1 +
 lib/pdm-config/src/metric_collection.rs       |  69 ++
 server/src/api/config/metric_collection.rs    |  97 +++
 server/src/api/config/mod.rs                  |   2 +
 server/src/api/metric_collection.rs           |  99 +++
 server/src/api/mod.rs                         |   2 +
 server/src/api/remotes.rs                     |  59 ++
 server/src/api/resources.rs                   |   3 +-
 server/src/api/rrd_common.rs                  |  11 +-
 server/src/bin/proxmox-datacenter-api/main.rs |   2 +-
 .../src/metric_collection/collection_task.rs  | 702 ++++++++++++++++++
 server/src/metric_collection/mod.rs           | 346 ++-------
 server/src/metric_collection/rrd_cache.rs     | 204 ++---
 server/src/metric_collection/rrd_task.rs      | 289 +++++++
 server/src/metric_collection/state.rs         | 150 ++++
 server/src/metric_collection/top_entities.rs  | 150 ++++
 23 files changed, 2165 insertions(+), 368 deletions(-)
 create mode 100644 cli/client/src/metric_collection.rs
 create mode 100644 lib/pdm-api-types/src/metric_collection.rs
 create mode 100644 lib/pdm-config/src/metric_collection.rs
 create mode 100644 server/src/api/config/metric_collection.rs
 create mode 100644 server/src/api/metric_collection.rs
 create mode 100644 server/src/metric_collection/collection_task.rs
 create mode 100644 server/src/metric_collection/rrd_task.rs
 create mode 100644 server/src/metric_collection/state.rs
 create mode 100644 server/src/metric_collection/top_entities.rs


Summary over all repositories:
  23 files changed, 2165 insertions(+), 368 deletions(-)

-- 
Generated by git-murpp 0.8.1



