[pdm-devel] [PATCH proxmox-datacenter-manager 0/8] remote task cache fetching task / better cache backend

Lukas Wagner l.wagner at proxmox.com
Fri Mar 14 15:12:17 CET 2025


The aim of this patch series is to greatly improve the performance of the
remote task cache for big PDM setups.

The initial, 'dumb' cache implementation had the following problems:
  1.) cache was populated as part of the `get_tasks` API, leading to hanging
    API calls while fetching task data from remotes
  2.) all tasks were stored in a single file, which was completely rewritten
    for any change to the cache's contents
  3.) The caching mechanism was pretty simple, using only a max-age mechanism,
    re-requesting all task data if max-age was exceeded

Now, these characteristics are not really problematic for *small* PDM setups
with only a couple of remotes. However, for big setups (e.g. 100 remotes,
each remote being a PVE cluster with 10 nodes), this completely falls apart:
  1.) Fetching remote tasks takes a considerable amount of time, especially
      on connections with high latency. Since the data is requested
      from *within* the `get_tasks` function, which is called by the
      `remote-tasks/list` API handler, the API call is blocked until
      *all* task data has been fetched.
  2.) The single-file approach leads to significant writes to the disk,
      since the whole file is rewritten for every change.
  3.) The max-age mechanism leads to unnecessary network IO, as we
      re-request data that we already have locally.

To rectify the situation, this series performs the following changes:

  - `get_tasks` never does any fetching, it only reads the most recent
    data from the cache
  - There is a new background task which periodically fetches tasks
    from all remotes (every 10 minutes at the moment). Only the latest
    missing tasks are requested, not the full task history as before
  - The new background task also takes over the 'tracked task' polling
    duty, where we fetch the status for any task started by PDM on
    a remote (short polling interval, 10s at the moment).
  - The task cache storage implementation has been completely overhauled
    and is now optimized for the most common accesses to the cache.
    It is also more storage efficient, occupying roughly 50% of the disk
    space for the same number of tasks (achieved by avoiding duplicate
    information in the files)
  - The size of the task cache is 'limited' by doing file rotation.
    We keep 7 days of task history.
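
To illustrate the 'only the latest missing tasks' idea from the list above,
here is a minimal sketch. It is NOT the code from this series: the `Task`
and `FetchState` types and the per-remote cutoff keyed on the start time are
assumptions made purely for illustration. The idea is to remember the newest
start time seen per remote and to keep only tasks that are newer than that.

```rust
use std::collections::HashMap;

/// Hypothetical, simplified task record (the real type has more fields).
#[derive(Debug, Clone, PartialEq)]
pub struct Task {
    pub remote: String,
    pub upid: String,
    pub starttime: i64,
}

/// Hypothetical per-remote bookkeeping: start time of the newest cached task.
#[derive(Default)]
pub struct FetchState {
    newest_seen: HashMap<String, i64>,
}

impl FetchState {
    /// Lower bound to use when asking `remote` for tasks.
    /// Returns 0 for a remote we have never fetched from.
    pub fn cutoff(&self, remote: &str) -> i64 {
        self.newest_seen.get(remote).copied().unwrap_or(0)
    }

    /// Keep only tasks that are newer than the cutoff and advance the cutoff.
    /// Simplification: tasks starting in the same second as the cutoff are
    /// dropped; a real implementation would have to deduplicate by UPID.
    pub fn merge(&mut self, remote: &str, fetched: Vec<Task>) -> Vec<Task> {
        let cutoff = self.cutoff(remote);
        let new: Vec<Task> = fetched
            .into_iter()
            .filter(|task| task.starttime > cutoff)
            .collect();
        if let Some(max) = new.iter().map(|task| task.starttime).max() {
            self.newest_seen.insert(remote.to_string(), max);
        }
        new
    }
}
```

With such a cutoff, a periodic fetch only has to transfer and store tasks
that started since the previous run, instead of the full task history.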

For details on *how* the cache itself works, please refer to the full
commit message of
    remote tasks: implement improved cache for remote tasks

# Benchmarks

Finally, some concrete data to back up the claimed performance improvements.
The times were measured *inside* the `get_tasks` function and not at
the API level, so the times do not include JSON serialization and
data transfer.

Benchmarking was done using the 'fake-remote' feature. There were 100
remotes, 10 PVE nodes per remote. The task cache contained
about 1.5 million tasks.
                                               before        after
list of active tasks (*):                     ~1.3s          ~30µs
list of 500 tasks, offset 0 (**):             ~1.3s         ~500µs
list of 500 tasks, offset 1 million (***):    ~1.3s         ~200ms
Size on disk:                                 ~500MB        ~200MB

(*):  Requested by the UI every 3s
(**): Requested by the UI when visiting Remotes > Tasks
(***): E.g. when scrolling towards the bottom of 'Remotes > Tasks'

In the old implementation, the archive file was *always* fully deserialized
and loaded into RAM, which is why the time needed is pretty much
identical for all scenarios.
The new implementation reads the archive files line by line, so only the
500 requested tasks are held in RAM at the same time. The higher the offset,
the more archive lines/files we have to scan, which increases the
time needed to access the data. The tasks are sorted by start time in
descending order; as a result, requests get slower the further you
go back in history.
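
The access pattern described above can be sketched roughly as follows.
This is a simplified illustration and not the actual implementation:
the `read_page` helper and the 'one task per line, newest first' layout
are assumptions. It shows why the cost grows with the offset, since every
skipped line still has to be read.

```rust
use std::io::{self, BufRead};

/// Return up to `limit` lines after skipping the first `offset` lines,
/// scanning the given archives in order (newest archive first).
/// Only the returned page is kept in memory.
pub fn read_page<R: BufRead>(
    archives: Vec<R>,
    offset: usize,
    limit: usize,
) -> io::Result<Vec<String>> {
    let mut page = Vec::new();
    if limit == 0 {
        return Ok(page);
    }

    let mut skipped = 0;
    for archive in archives {
        for line in archive.lines() {
            let line = line?;
            if skipped < offset {
                // Skipped lines are read and discarded; this is the part
                // that gets more expensive for large offsets.
                skipped += 1;
                continue;
            }
            page.push(line);
            if page.len() == limit {
                // Stop early, the remaining lines/files are never touched.
                return Ok(page);
            }
        }
    }
    Ok(page)
}
```

For offset 0, the function returns after reading only `limit` lines, which
matches the observation that the first page is much cheaper than a page
deep in the history.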

The 'before' times do NOT include the time needed for actually fetching
the task data.

This series was preceded by [1]; however, almost all of the code has changed,
which is the reason why I am sending this as a new series.

Note: 
I asked Max for feedback on this while it was still only available on my staff
repo. He kindly pointed out some smaller issues which are already fixed in this
first version on the list. He was okay with me adding his 'R-b' tags
right away.

[1] https://lore.proxmox.com/pdm-devel/20250128122520.167796-1-l.wagner@proxmox.com/

proxmox-datacenter-manager:

Lukas Wagner (8):
  test support: add NamedTempFile helper
  test support: add NamedTempDir helper
  move task_cache.rs to remote_tasks/mod.rs
  remote tasks: implement improved cache for remote tasks
  remote tasks: add background task for task polling, use new task cache
  pdm-api-types: remote tasks: implement From<&str> for TaskStateType
  fake remote: add missing fields to make the debug feature compile
    again
  fake remote: generate fake task data

 lib/pdm-api-types/src/lib.rs             |  15 +
 server/src/api/pve/lxc.rs                |  10 +-
 server/src/api/pve/mod.rs                |   6 +-
 server/src/api/pve/qemu.rs               |   6 +-
 server/src/api/remote_tasks.rs           |  13 +-
 server/src/bin/proxmox-datacenter-api.rs |   3 +-
 server/src/lib.rs                        |   4 +-
 server/src/remote_tasks/mod.rs           | 473 +++++++++++
 server/src/remote_tasks/task_cache.rs    | 964 +++++++++++++++++++++++
 server/src/task_cache.rs                 | 524 ------------
 server/src/test_support/fake_remote.rs   |  87 +-
 server/src/test_support/mod.rs           |   4 +
 server/src/test_support/temp.rs          |  60 ++
 13 files changed, 1619 insertions(+), 550 deletions(-)
 create mode 100644 server/src/remote_tasks/mod.rs
 create mode 100644 server/src/remote_tasks/task_cache.rs
 delete mode 100644 server/src/task_cache.rs
 create mode 100644 server/src/test_support/temp.rs


Summary over all repositories:
  13 files changed, 1619 insertions(+), 550 deletions(-)

-- 
Generated by git-murpp 0.8.0



