[pdm-devel] [PATCH proxmox-datacenter-manager v2 00/28] metric collection improvements (concurrency, config, API, CLI)
Lukas Wagner
l.wagner at proxmox.com
Fri Feb 14 14:06:25 CET 2025
Key points:
- fetch metrics concurrently
- configuration for metric collection
- new config /etc/proxmox-datacenter-manager/metric-collection.json
- max-concurrency (number of allowed parallel connections)
- collection-interval
- randomized offset for collection start
(min-interval-offset..max-interval-offset)
- randomized per-connection delay
(max-connection-delay..max-connection-delay)
- Add some tests for the core logic in the metric collection system
- Allow to trigger metric collection via the API
- Record metric collection statistics in the RRD
- overall collection time for all remotes
- per remote response time when fetching metrics
- Persist metric collection state to disk:
/var/lib/proxmox-datacenter-manager/metric-collection-state.json
(timestamps of last collection, errors)
- Trigger metric collection for any new remotes added via the API
- Add new API endpoints
POST /metric-collection/trigger with optional 'remote' param
GET /metric-collection/status
GET/PUT /config/metric-collection/default
GET /remotes/<remote>/metric-collection-rrddata
GET /metric-collection/rrddata
- Add CLI tooling
proxmox-datacenter-client metric-collection settings show
proxmox-datacenter-client metric-collection settings update
proxmox-datacenter-client metric-collection trigger [--remote <remote>]
proxmox-datacenter-client metric-collection status
## To reviewers / open questions:
- Please review the defaults I've chosen for the settings, especially
the ones for the default metric collection interval (10 minutes) as
well as max-concurrency (10).
I also kindly ask to double-check the naming of the properties.
See "pdm-api-types: add CollectionSettings type" for details
- Please review path and params for new API endpoints (anything public
facing that is hard to change later)
- I've chosen a section-config config now, even though we only have a
single section for now. This was done for future-proofing reasons,
maybe we want to add support for different setting 'groups' or
something, e.g. to have different settings for distinct sets of
remotes. Does this make sense?
Or should I just stick to a simple config for now? (At moments like
these I wish for TOML configs where we could be a bit more flexible...)
collection-settings: default
max-concurrency 10
collection-interval 180
min-interval-offset 0
max-interval-offset 20
min-connection-delay 10
max-connection-delay 100
- Should `GET /remotes/<remote>/metric-collection-rrddata` be
just `rrddata`?
not sure if we are going to add any other PDM-native per-remote
metrics and whether we want to return that from the same API call
as this...
## Potential future work
- UI button for triggering metric collection
- UI for metric collection settings
- Show RRD graphs for metric collection stats somewhere
## Random offset/delay examples
Example with 'max-concurrency' = 3 and 6 remotes.
X ... timer triggered
[ A ] .... fetching remote 'A'
**** .... interval-offset (usually a couple of seconds)
#### .... random worker delay (usually in millisecond range)
/--########[ B ] ### [ C ]--\
/---####[ A ] ###### [ D ]--------\
----X ************* ---/ ---###### [ E ] #########[ F ]--\----
Changes since [v1]:
- add missing dependency to librust-rand-dev to d/control
- Fix a couple of minor spelling/punctuation issues (thx maximiliano)
- Some minor code style improvments, e.g. using unwrap_or_else instead of doing
a manual match
- Document return values of 'setup_timer' function
- Factor out handle_tick/handle_control_message
- Minor refatoring/code style improvments
- CLI: Change 'update-settings' to 'settings update'
- CLI: Change 'show-settings' to 'settings show'
- change missed tick behavior for tokio::time::Interval to 'skip' instead
of burst.
The last three commits are new in v2.
[v1]: https://lore.proxmox.com/pdm-devel/20250211120541.163621-1-l.wagner@proxmox.com/T/#t
proxmox-datacenter-manager:
Lukas Wagner (28):
test support: add NamedTempFile helper
test support: add NamedTempDir helper
pdm-api-types: add CollectionSettings type
pdm-config: add functions for reading/writing metric collection
settings
metric collection: split top_entities split into separate module
metric collection: save metric data to RRD in separate task
metric collection: rework metric poll task
metric collection: persist state after metric collection
metric collection: skip if last_collection < MIN_COLLECTION_INTERVAL
metric collection: collect overdue metrics on startup/timer change
metric collection: add tests for the fetch_remotes function
metric collection: add test for fetch_overdue
metric collection: pass rrd cache instance as function parameter
metric collection: add test for rrd task
metric collection: wrap rrd_cache::Cache in a struct
metric collection: record remote response time in metric database
metric collection: save time needed for collection run to RRD
metric collection: periodically clean removed remotes from statefile
api: add endpoint for updating metric collection settings
api: add endpoint to trigger metric collection
api: remotes: trigger immediate metric collection for newly added
nodes
api: add api for querying metric collection RRD data
api: metric-collection: add status endpoint
pdm-client: add metric collection API methods
cli: add commands for metric-collection settings, trigger, status
metric collection: factor out handle_tick and handle_control_message
fns
metric collection: skip missed timer ticks
metric collection: use JoinSet instead of joining from handles in a
Vec
Cargo.toml | 1 +
cli/client/Cargo.toml | 1 +
cli/client/src/main.rs | 2 +
cli/client/src/metric_collection.rs | 170 ++++
debian/control | 1 +
lib/pdm-api-types/src/lib.rs | 3 +
lib/pdm-api-types/src/metric_collection.rs | 203 +++++
lib/pdm-api-types/src/rrddata.rs | 26 +
lib/pdm-client/src/lib.rs | 87 +++
lib/pdm-config/src/lib.rs | 1 +
lib/pdm-config/src/metric_collection.rs | 69 ++
server/Cargo.toml | 1 +
server/src/api/config/metric_collection.rs | 156 ++++
server/src/api/config/mod.rs | 2 +
server/src/api/metric_collection.rs | 99 +++
server/src/api/mod.rs | 2 +
server/src/api/remotes.rs | 59 ++
server/src/api/resources.rs | 3 +-
server/src/api/rrd_common.rs | 11 +-
server/src/bin/proxmox-datacenter-api.rs | 2 +-
server/src/lib.rs | 2 +-
.../src/metric_collection/collection_task.rs | 739 ++++++++++++++++++
server/src/metric_collection/mod.rs | 332 ++------
server/src/metric_collection/rrd_cache.rs | 204 ++---
server/src/metric_collection/rrd_task.rs | 289 +++++++
server/src/metric_collection/state.rs | 150 ++++
server/src/metric_collection/top_entities.rs | 150 ++++
server/src/test_support/mod.rs | 4 +
server/src/test_support/temp.rs | 60 ++
29 files changed, 2467 insertions(+), 362 deletions(-)
create mode 100644 cli/client/src/metric_collection.rs
create mode 100644 lib/pdm-api-types/src/metric_collection.rs
create mode 100644 lib/pdm-config/src/metric_collection.rs
create mode 100644 server/src/api/config/metric_collection.rs
create mode 100644 server/src/api/metric_collection.rs
create mode 100644 server/src/metric_collection/collection_task.rs
create mode 100644 server/src/metric_collection/rrd_task.rs
create mode 100644 server/src/metric_collection/state.rs
create mode 100644 server/src/metric_collection/top_entities.rs
create mode 100644 server/src/test_support/temp.rs
Summary over all repositories:
29 files changed, 2467 insertions(+), 362 deletions(-)
--
Generated by git-murpp 0.8.0
More information about the pdm-devel
mailing list