[pdm-devel] [PATCH proxmox-datacenter-manager v6 0/6] remote task cache fetching task / better cache backend
Lukas Wagner
l.wagner at proxmox.com
Thu Aug 14 09:56:16 CEST 2025
The aim of this patch series is to greatly improve the performance of the
remote task cache for big PDM setups.
The initial cache implementation had the following problems:
1.) cache was populated as part of the `get_tasks` API, leading to hanging
API calls while fetching task data from remotes
2.) all tasks were stored in a single file, which was completely rewritten
for any change to the cache's contents
3.) the caching mechanism was pretty simple, relying only on a max-age check
    and re-requesting all task data once max-age was exceeded
Now, these characteristics are not really problematic for *small* PDM setups
with only a couple of remotes. However, for big setups (e.g. 100 remotes, each
remote being a PVE cluster with 10 nodes), this completely falls apart:
1.) fetching remote tasks takes a considerable amount of time, especially
    on connections with high latency. Since the data is requested
    from *within* the `get_tasks` function, which is called by the
    `remote-tasks/list` API handler, the API call is blocked until
    *all* task data has been requested.
2.) the single-file approach leads to significant disk writes
3.) the max-age mechanism leads to unnecessary network IO, as we
    re-request data that we already have locally.
To rectify the situation, this series performs the following changes:
- `get_tasks` never does any fetching, it only reads the most recent
data from the cache
- There is a new background task which periodically fetches tasks
  from all remotes (every 10 minutes at the moment). Only the latest
  missing tasks are requested, not the full task history as before
  (see the sketch after this list).
- The new background task also takes over the 'tracked task' polling
duty, where we fetch the status for any task started by PDM on
a remote (short polling interval, 10s at the moment).
- The task cache storage implementation has been completely overhauled
and is now optimized for the most common accesses to the cache.
It is also more storage efficient, occupying roughly 50% of the disk
space for the same number of tasks (achieved by avoiding duplicate
information in the files)
- The size of the task cache is 'limited' by doing file rotation.
We keep 7 days of task history.
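To illustrate the rough shape of the new fetching task, here is a minimal
sketch assuming a tokio runtime; the function names and stubs are
placeholders, the actual task lives in
bin/proxmox-datacenter-api/tasks/remote_tasks.rs and differs in detail:

    use std::time::{Duration, Instant};

    // Stubs standing in for the real logic (placeholders, not actual PDM code):
    async fn poll_tracked_tasks() { /* poll the status of tasks started by PDM */ }
    async fn fetch_missing_tasks() { /* request only tasks newer than the per-node cutoff */ }

    async fn remote_task_fetching_task() {
        // Tick at the short tracked-task interval; do a full fetch cycle
        // whenever the longer interval has elapsed.
        let mut tick = tokio::time::interval(Duration::from_secs(10));
        let fetch_every = Duration::from_secs(10 * 60);
        let mut last_fetch: Option<Instant> = None;

        loop {
            tick.tick().await;
            poll_tracked_tasks().await;

            if last_fetch.map_or(true, |t| t.elapsed() >= fetch_every) {
                fetch_missing_tasks().await;
                last_fetch = Some(Instant::now());
            }
        }
    }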
For details on *how* the cache itself works, please refer to the full
commit message of "remote tasks: implement improved cache for remote tasks".
# Benchmarks
Finally, some concrete data to back up the claimed performance improvements. The
times were measured *inside* the `get_tasks` function and not at the API level,
so the times do not include JSON serialization and data transfer.
Benchmarking was done using the 'fake-remote' feature. There were 100 remotes,
10 PVE nodes per remote. The task cache contained about 1.5 million tasks.
                                            before    v5        v6 (journal, zstd)
list of active tasks (*):                   ~1.3s     ~300µs    ~300µs
list of 500 tasks, offset 0 (**):           ~1.3s     ~1.45ms   ~1.5ms
list of 500 tasks, offset 1 million (***):  ~1.3s     ~175ms    ~200ms
list of 500 tasks, offset 0,
  2000 tasks in journal (****):             -         -         ~4.5ms
Size on disk:                               ~500MB    ~200MB    ~40MB
(*): Requested by the UI every 3s
(**): Requested by the UI when visiting Remotes > Tasks
(***): E.g. when scrolling towards the bottom of 'Remotes > Tasks'
(****): e.g. when the journal has not been applied for a while. Reading tasks
from the journal is a bit less efficient than reading from the task archive,
since we have to load it fully into memory so that we can sort the tasks and
remove potential duplicates (see the sketch below).
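To illustrate footnote (****), here is a minimal sketch of such a journal
read path; the line layout ("<starttime> <upid>" per entry) and all names
are assumptions, not the actual on-disk format:

    use std::collections::HashSet;
    use std::fs::File;
    use std::io::{BufRead, BufReader};

    /// Read the journal, assuming a hypothetical "<starttime> <upid>" line format.
    fn read_journal(path: &str) -> std::io::Result<Vec<(i64, String)>> {
        let reader = BufReader::new(File::open(path)?);

        // The journal is unsorted and may contain duplicates, so unlike the
        // archive files it has to be loaded completely into memory ...
        let mut tasks: Vec<(i64, String)> = reader
            .lines()
            .filter_map(|line| {
                let line = line.ok()?;
                let (start, upid) = line.split_once(' ')?;
                Some((start.parse().ok()?, upid.to_string()))
            })
            .collect();

        // ... sorted descending by starttime ...
        tasks.sort_by(|a, b| b.0.cmp(&a.0));

        // ... and deduplicated by UPID.
        let mut seen = HashSet::new();
        tasks.retain(|(_, upid)| seen.insert(upid.clone()));

        Ok(tasks)
    }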
In the old implementation, the archive file was *always* fully deserialized
and loaded into RAM, which is why the time needed is nearly identical across
all scenarios.
The new implementation reads the archive files line by line, and only 500
tasks are loaded into RAM at a time. The higher the offset, the more
archive lines/files we have to scan, which increases the time needed to
access the data. The tasks are sorted descending by starttime; as a result,
requests get slower the further you go back in history (see the sketch
below).
The 'before' times do NOT include the time needed for actually fetching the
task data.
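For illustration, the pagination boils down to something like the following
sketch (a hypothetical helper; the real code additionally deals with the
zstd-compressed older files and with filtering):

    use std::fs::File;
    use std::io::{BufRead, BufReader};

    /// Return up to `limit` task lines, starting at `offset`. `files` must be
    /// ordered newest-first, each sorted descending by starttime.
    fn task_lines(files: &[&str], offset: usize, limit: usize) -> std::io::Result<Vec<String>> {
        let mut out = Vec::with_capacity(limit);
        let mut skipped = 0;

        for path in files {
            for line in BufReader::new(File::open(path)?).lines() {
                let line = line?;
                if skipped < offset {
                    skipped += 1; // scanned, but never kept in RAM
                    continue;
                }
                out.push(line);
                if out.len() == limit {
                    return Ok(out); // at most `limit` lines are held at once
                }
            }
        }
        Ok(out)
    }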
This series was preceded by [1]; however, almost all of the code has changed,
which is why I am sending this as a new series.
[1] https://lore.proxmox.com/pdm-devel/20250128122520.167796-1-l.wagner@proxmox.com/
Changes since v5:
- Incorporate review feedback from @Dominik:
- Poll tracked tasks individually instead of doing a full task refresh with the
oldest running task as cutoff. This should be much more efficient
for long-running tasks.
- Change state-file representation
- Improved some doc comments
- Use timestamps instead of a cycle counter for the fetching task
- Make total connection semaphore allocation more efficient
- Use dedicated types for the (read/write)-locked task cache, encoding
  the locking requirements in Rust's type system. Neat! (see the sketch
  after this changelog section)
- Keep track of cut-off timestamps per node, not per remote.
- This makes sure that we don't refetch tasks that we already have
if one node in a cluster is offline for a longer period of time
- Instead of writing new tasks directly into the archive files, append
  them to a journal/write-ahead-log file, which is then applied at regular
  intervals. This should reduce disk writes, since every time an archive
  file is changed it has to be completely rewritten (tasks might arrive out
  of order, and the contents of the archive are sorted by the tasks'
  starttime). The journal allows us to write many tasks at once.
- Compress older archive files using zstd - this greatly reduces disk usage
of task data
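To illustrate the typestate idea mentioned above, here is a minimal sketch;
the names are illustrative, not the actual API in task_cache.rs:

    use std::fs::File;
    use std::marker::PhantomData;

    struct ReadLocked;
    struct WriteLocked;

    /// A cache handle that carries its lock mode in the type.
    struct TaskCache<L> {
        _lock: File, // the real code would hold an flock on a lock file
        _marker: PhantomData<L>,
    }

    impl TaskCache<ReadLocked> {
        fn lock_read(path: &str) -> std::io::Result<Self> {
            // real code: open + flock(LOCK_SH)
            Ok(Self { _lock: File::open(path)?, _marker: PhantomData })
        }
    }

    impl TaskCache<WriteLocked> {
        fn lock_write(path: &str) -> std::io::Result<Self> {
            // real code: open + flock(LOCK_EX)
            Ok(Self { _lock: File::open(path)?, _marker: PhantomData })
        }

        /// Mutation is only reachable on a write-locked cache; calling this
        /// on a TaskCache<ReadLocked> is a compile-time error.
        fn add_tasks(&mut self, _lines: &[String]) { /* stub */ }
    }

    impl<L> TaskCache<L> {
        /// Reading works with either lock mode.
        fn get_tasks(&self) -> Vec<String> {
            Vec::new() // stub
        }
    }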
Changes since v4:
- Rebased onto latest master, adapting to Gabriel's section config changes
Changes since v3:
- Include benchmark results in commit message
- Remove unneeded and potentially unsafe `pub` (thx Wolfgang)
Changes since v2:
- Change locking approach as suggested by Wolfgang
- Incorporated feedback from Wolfgang
- see patch notes for details
- Added some .context/.with_context for better error messages
Changes since v1:
- Drop already applied patches
- Some code style improvements, see individual patch changelogs
- Move the task fetching task to bin/proxmox-datacenter-api/tasks/remote_task.rs
- Make sure that remote_tasks::get_tasks does not block the async executor
proxmox-datacenter-manager:
Lukas Wagner (6):
remote tasks: implement improved cache for remote tasks
remote tasks: add background task for task polling, use new task cache
pdm-api-types: remote tasks: add new_from_str constructor for
TaskStateType
fake remote: make the fake_remote feature compile again
fake remote: clippy fixes
fixup! fake remote: make the fake_remote feature compile again
Cargo.toml | 2 +-
lib/pdm-api-types/src/lib.rs | 15 +
server/Cargo.toml | 1 +
server/src/api/pve/lxc.rs | 10 +-
server/src/api/pve/mod.rs | 4 +-
server/src/api/pve/qemu.rs | 6 +-
server/src/api/remote_tasks.rs | 11 +-
server/src/bin/proxmox-datacenter-api/main.rs | 1 +
.../bin/proxmox-datacenter-api/tasks/mod.rs | 1 +
.../tasks/remote_tasks.rs | 559 +++++++
server/src/remote_tasks/mod.rs | 632 ++-----
server/src/remote_tasks/task_cache.rs | 1486 +++++++++++++++++
server/src/test_support/fake_remote.rs | 39 +-
13 files changed, 2228 insertions(+), 539 deletions(-)
create mode 100644 server/src/bin/proxmox-datacenter-api/tasks/remote_tasks.rs
create mode 100644 server/src/remote_tasks/task_cache.rs
Summary over all repositories:
13 files changed, 2228 insertions(+), 539 deletions(-)
--
Generated by murpp 0.9.0