[pbs-devel] [PATCH proxmox{-backup, , -datacenter-manager} v3 00/10] token-shadow: reduce api token verification overhead

Samuel Rufinatscha s.rufinatscha at proxmox.com
Fri Jan 2 17:07:39 CET 2026


Hi,

this series improves the performance of token-based API authentication
in PBS (pbs-config) and in PDM (underlying proxmox-access-control
crate), addressing the API token verification hotspot reported in our
bugtracker #7017 [1].

When profiling the PBS /status endpoint with cargo flamegraph [2],
token-based authentication showed up as a dominant hotspot via
proxmox_sys::crypt::verify_crypt_pw. Applying this series removes that
path from the hot section of the flamegraph. The same performance issue
was measured [3] for PDM. PDM uses the shared proxmox-access-control
library for token handling, which is a factored-out version of the
token.shadow handling code from PBS.

While this series fixes the immediate performance issue both in PBS
(pbs-config) and in the shared proxmox-access-control crate used by
PDM, PBS should ideally be refactored in a separate effort to use
proxmox-access-control for token handling instead of its local
implementation.

Problem

For token-based API requests, both the token.shadow handling in PBS
(pbs-config) and the one in PDM (proxmox-access-control) currently:

1. read the token.shadow file on each request
2. deserialize it into a HashMap<Authid, String>
3. run password hash verification via
   proxmox_sys::crypt::verify_crypt_pw for the provided token secret

Under load, this results in significant CPU usage spent in repeated
password hashing for the same token+secret pairs. The attached
flamegraphs for PBS [2] and PDM [3] show
proxmox_sys::crypt::verify_crypt_pw dominating the hot path.

Approach

The goal is to reduce the cost of token-based authentication while
preserving the existing token handling semantics (including detecting
manual edits to token.shadow) and keeping the behavior consistent
between PBS (pbs-config) and PDM (proxmox-access-control). In both
places, this series proposes to:

1. Introduce an in-memory cache for verified token secrets and
   invalidate it through a shared ConfigVersionCache generation. A
   shared generation is required to keep the privileged and
   unprivileged daemons in sync and avoid caching inconsistencies
   across processes.
2. Invalidate on token.shadow changes made through the API
   (set_secret, delete_secret)
3. Invalidate on direct/manual token.shadow file changes (mtime +
   length)
4. Avoid per-request file stat calls using a TTL window
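
A minimal sketch of how points 1 and 2 could fit together, for
illustration only and not the code from the patches. SharedGeneration
is a plain in-process atomic standing in for the ConfigVersionCache
generation, and TokenSecretCache and its methods are hypothetical
names. The comparison uses plain == for brevity:

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicUsize, Ordering};

/// Stand-in for the shared token.shadow generation (assumption: the real
/// one is shared across processes, this one is only an in-process atomic).
pub struct SharedGeneration(AtomicUsize);

impl SharedGeneration {
    pub const fn new() -> Self {
        Self(AtomicUsize::new(0))
    }

    pub fn current(&self) -> usize {
        self.0.load(Ordering::Acquire)
    }

    /// Would be called after set_secret/delete_secret style changes.
    pub fn bump(&self) -> usize {
        self.0.fetch_add(1, Ordering::AcqRel) + 1
    }
}

/// Hypothetical cache mapping a token id to its already verified secret.
pub struct TokenSecretCache {
    secrets: HashMap<String, String>,
    generation: usize,
}

impl TokenSecretCache {
    pub fn new(shared: &SharedGeneration) -> Self {
        Self {
            secrets: HashMap::new(),
            generation: shared.current(),
        }
    }

    /// Drop all entries if the shared generation moved since the last access.
    fn sync(&mut self, shared: &SharedGeneration) {
        let current = shared.current();
        if current != self.generation {
            self.secrets.clear();
            self.generation = current;
        }
    }

    /// Remember a secret after the expensive hash verification succeeded.
    pub fn insert_verified(&mut self, shared: &SharedGeneration, tokenid: &str, secret: &str) {
        self.sync(shared);
        self.secrets.insert(tokenid.to_owned(), secret.to_owned());
    }

    /// Fast path: true only if this exact secret was verified before and
    /// the generation has not changed in the meantime.
    pub fn is_cached(&mut self, shared: &SharedGeneration, tokenid: &str, secret: &str) -> bool {
        self.sync(shared);
        self.secrets.get(tokenid).is_some_and(|s| s == secret)
    }
}
```

On a cache miss the caller would fall back to the existing
verify_crypt_pw path and only then call insert_verified.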

Testing

*PBS (pbs-config)*

To verify the effect in PBS, I:
1. Set up a test environment based on the latest PBS ISO, installed the
   Rust toolchain, and cloned the proxmox-backup repository to use with
   cargo flamegraph. Reproduced bug #7017 [1] by profiling the /status
   endpoint with token-based authentication using cargo flamegraph [2].
2. Built PBS with the pbs-config patches and re-ran the same workload
   and profiling setup. Confirmed that the
   proxmox_sys::crypt::verify_crypt_pw path no longer appears in the
   hot section of the flamegraph. CPU usage is now dominated by TLS
   overhead.
3. Functionally, I verified that:
   * valid tokens authenticate correctly when used in API requests
   * invalid secrets are rejected as before
   * generating a new token secret via the dashboard (create token for
     user, regenerate existing secret) works and authenticates correctly

*PDM (proxmox-access-control)*

To verify the effect in PDM, I followed a similar testing approach.
Instead of PBS' /status, I profiled the /version endpoint with cargo
flamegraph [3] and verified that the expensive hashing path disappears
from the hot section after introducing caching.

Functionally, I verified that:
   * valid tokens authenticate correctly when used in API requests
   * invalid secrets are rejected as before
   * generating a new token secret via the dashboard (create token for
     user, regenerate existing secret) works and authenticates correctly

Benchmarks:

Two different benchmarks have been run to measure caching effects
and RwLock contention:

(1) Requests per second for PBS /status endpoint (E2E)

Benchmarked parallel token auth requests for
/status?verbose=0 on top of the datastore lookup cache series [4]
to check throughput impact. With datastores=1, repeat=5000, parallel=16
this series gives ~172 req/s compared to ~65 req/s without it.
This is a ~2.6x improvement (and aligns with the ~179 req/s from the
previous series, which used per-process cache invalidation).

(2) RwLock contention for token create/delete under heavy load of
token-authenticated requests

The previous version of the series compared std::sync::RwLock and
parking_lot::RwLock contention for token create/delete under heavy
parallel token-authenticated readers. parking_lot::RwLock has been
chosen for the added fairness guarantees.

Patch summary

pbs-config:

0001 – pbs-config: add token.shadow generation to ConfigVersionCache
Extends ConfigVersionCache to provide a process-shared generation
number for token.shadow changes.

0002 – pbs-config: cache verified API token secrets
Adds an in-memory cache for verified, plain-text API token secrets.
The cache is invalidated through the process-shared ConfigVersionCache
generation number. Uses OpenSSL's constant-time memcmp for comparing
secrets.
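
To illustrate why the comparison is constant-time, here is a
std-only sketch of the idea. It is not the OpenSSL routine the patch
uses, and constant_time_eq is a hypothetical name. Real code should
rely on a vetted primitive, since a compiler may optimize a
hand-written loop like this one:

```rust
/// Compare two byte slices without stopping at the first mismatch, so the
/// running time does not reveal how many leading bytes were correct.
/// Note: the length check itself is not constant-time.
pub fn constant_time_eq(a: &[u8], b: &[u8]) -> bool {
    if a.len() != b.len() {
        return false;
    }
    // Accumulate all byte differences; the result is zero only if every
    // pair of bytes matched.
    let mut diff = 0u8;
    for (x, y) in a.iter().zip(b.iter()) {
        diff |= x ^ y;
    }
    diff == 0
}
```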

0003 – pbs-config: invalidate token-secret cache on token.shadow
changes
On each token verification request, stats token.shadow (mtime and
length) and clears the cache when the file has changed.
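
A rough sketch of such a change check using only std. FileStamp,
stamp and file_changed are hypothetical names and the patch may
structure this differently:

```rust
use std::fs;
use std::io;
use std::path::Path;
use std::time::SystemTime;

/// The two properties used to detect edits: modification time and length.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct FileStamp {
    mtime: Option<SystemTime>,
    len: u64,
}

/// Read the current stamp; a missing file is reported as None, not an error.
pub fn stamp(path: &Path) -> io::Result<Option<FileStamp>> {
    match fs::metadata(path) {
        Ok(meta) => Ok(Some(FileStamp {
            mtime: meta.modified().ok(),
            len: meta.len(),
        })),
        Err(err) if err.kind() == io::ErrorKind::NotFound => Ok(None),
        Err(err) => Err(err),
    }
}

/// Returns true if the file differs from the last recorded stamp and
/// records the new stamp. The caller would clear its cache on true.
pub fn file_changed(last: &mut Option<FileStamp>, path: &Path) -> io::Result<bool> {
    let current = stamp(path)?;
    let changed = current != *last;
    *last = current;
    Ok(changed)
}
```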

0004 – pbs-config: add TTL window to token-secret cache
Introduces a TTL (TOKEN_SECRET_CACHE_TTL_SECS, default 60) for metadata
checks so that fs::metadata calls are not performed on each request.
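
The TTL gating can be pictured roughly like this. TtlGate and
check_due are hypothetical names, and the type of the constant is an
assumption. Only the constant's name and default come from the patch
description:

```rust
use std::time::{Duration, Instant};

/// Default from the patch description; the u64 type is assumed here.
pub const TOKEN_SECRET_CACHE_TTL_SECS: u64 = 60;

/// Tracks when the file metadata was last checked.
pub struct TtlGate {
    ttl: Duration,
    last_check: Option<Instant>,
}

impl TtlGate {
    pub fn new(ttl: Duration) -> Self {
        Self { ttl, last_check: None }
    }

    /// Returns true if a metadata check is due at `now` and, if so,
    /// records `now` as the time of the last check.
    pub fn check_due(&mut self, now: Instant) -> bool {
        let due = match self.last_check {
            None => true,
            Some(last) => now.duration_since(last) >= self.ttl,
        };
        if due {
            self.last_check = Some(now);
        }
        due
    }
}
```

The trade-off is that a manual edit to token.shadow may take up to one
TTL window to be noticed.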

proxmox-access-control:

0005 – access-control: extend AccessControlConfig for token.shadow invalidation
Extends the AccessControlConfig trait with
token_shadow_cache_generation() and
increment_token_shadow_cache_generation() for
proxmox-access-control to get the shared token.shadow generation number
and bump it on token shadow changes.
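
As an illustration of this decoupling, a sketch with a stand-in
trait. GenerationHooks is a hypothetical trait, not the real
AccessControlConfig. Only the two method names come from the patch
description. Their signatures, the Option<usize> return type and the
InProcessGeneration and cache_is_stale helpers are assumptions:

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

/// Hypothetical stand-in for the two hooks added to AccessControlConfig.
pub trait GenerationHooks {
    /// Current generation, or None if the product provides none (assumed).
    fn token_shadow_cache_generation(&self) -> Option<usize>;
    /// Bump the generation and return the new value (assumed semantics).
    fn increment_token_shadow_cache_generation(&self) -> Option<usize>;
}

/// Example provider backed by an in-process atomic. A product would back
/// this with its own shared ConfigVersionCache instead.
pub struct InProcessGeneration(AtomicUsize);

impl InProcessGeneration {
    pub fn new() -> Self {
        Self(AtomicUsize::new(0))
    }
}

impl GenerationHooks for InProcessGeneration {
    fn token_shadow_cache_generation(&self) -> Option<usize> {
        Some(self.0.load(Ordering::Acquire))
    }

    fn increment_token_shadow_cache_generation(&self) -> Option<usize> {
        Some(self.0.fetch_add(1, Ordering::AcqRel) + 1)
    }
}

/// Library-side check that only depends on the hooks, not on the product.
pub fn cache_is_stale(hooks: &dyn GenerationHooks, cached_generation: usize) -> bool {
    hooks.token_shadow_cache_generation() != Some(cached_generation)
}
```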

0006 – access-control: cache verified API token secrets
Mirrors PBS PATCH 0002.

0007 – access-control: invalidate token-secret cache on token.shadow changes
Mirrors PBS PATCH 0003.

0008 – access-control: add TTL window to token-secret cache
Mirrors PBS PATCH 0004.

proxmox-datacenter-manager:

0009 – pdm-config: add token.shadow generation to ConfigVersionCache
Extends PDM ConfigVersionCache and implements
token_shadow_cache_generation() and
increment_token_shadow_cache_generation() from AccessControlConfig for
PDM.

0010 – docs: document API token-cache TTL effects
Documents the effects of the TTL window on token.shadow edits.

Changes from v1 to v2:

* (refactor) Switched cache initialization to LazyLock
* (perf) Use parking_lot::RwLock and best-effort cache access on the
  read/refresh path (try_read/try_write) to avoid lock contention
* (doc) Document TTL-delayed effect of manual token.shadow edits
* (fix) Add generation guards (API_MUTATION_GENERATION +
  FILE_GENERATION) to prevent caching across concurrent set/delete and
  external edits

Changes from v2 to v3:

* (refactor) Replace PBS per-process cache invalidation with a
  cross-process token.shadow generation based on PBS
  ConfigVersionCache, ensuring cache consistency between privileged
  and unprivileged daemons.
* (refactor) Decouple the generation source from the
  proxmox/proxmox-access-control cache implementation: extend
  AccessControlConfig hooks so that products can provide the shared
  token.shadow generation source.
* (refactor) Extend PDM's ConfigVersionCache with
  token_shadow_generation and introduce a
  pdm_config::AccessControlConfig wrapper implementing the new
  proxmox-access-control trait hooks. Switch server and CLI
  initialization to use pdm_config::AccessControlConfig instead of
  pdm_api_types::AccessControlConfig.
* (refactor) Adapt generation checks around cached-secret comparison to
  use the new shared generation source.
* (fix/logic) cache_try_insert_secret: Update the local cache
  generation if stale, allowing the new secret to be inserted
  immediately
* (refactor) Extract cache invalidation logic into an
  invalidate_cache_state helper to reduce duplication and ensure
  consistent state resets
* (refactor) Simplify refresh_cache_if_file_changed: handle the
  un-initialized/reset state and adjust the generation mismatch
  path to ensure file metadata is always re-read.
* (doc) Clarify TTL-delayed effects of manual token.shadow edits.

Please see the patch-specific changelogs for more details.

Thanks for considering this patch series, I look forward to your
feedback.

Best,
Samuel Rufinatscha

[1] https://bugzilla.proxmox.com/show_bug.cgi?id=7017
[2] attachment 1767 [1]: Flamegraph showing the proxmox_sys::crypt::verify_crypt_pw stack
[3] attachment 1794 [1]: Flamegraph PDM baseline
[4] https://bugzilla.proxmox.com/show_bug.cgi?id=6049

proxmox-backup:

Samuel Rufinatscha (4):
  pbs-config: add token.shadow generation to ConfigVersionCache
  pbs-config: cache verified API token secrets
  pbs-config: invalidate token-secret cache on token.shadow changes
  pbs-config: add TTL window to token secret cache

 Cargo.toml                             |   1 +
 docs/user-management.rst               |   4 +
 pbs-config/Cargo.toml                  |   1 +
 pbs-config/src/config_version_cache.rs |  18 ++
 pbs-config/src/token_shadow.rs         | 298 ++++++++++++++++++++++++-
 5 files changed, 321 insertions(+), 1 deletion(-)


proxmox:

Samuel Rufinatscha (4):
  proxmox-access-control: extend AccessControlConfig for token.shadow
    invalidation
  proxmox-access-control: cache verified API token secrets
  proxmox-access-control: invalidate token-secret cache on token.shadow
    changes
  proxmox-access-control: add TTL window to token secret cache

 Cargo.toml                                 |   1 +
 proxmox-access-control/Cargo.toml          |   1 +
 proxmox-access-control/src/init.rs         |  17 ++
 proxmox-access-control/src/token_shadow.rs | 299 ++++++++++++++++++++-
 4 files changed, 317 insertions(+), 1 deletion(-)


proxmox-datacenter-manager:

Samuel Rufinatscha (2):
  pdm-config: implement token.shadow generation
  docs: document API token-cache TTL effects

 cli/admin/src/main.rs                       |  2 +-
 docs/access-control.rst                     |  4 ++
 lib/pdm-config/Cargo.toml                   |  1 +
 lib/pdm-config/src/access_control_config.rs | 73 +++++++++++++++++++++
 lib/pdm-config/src/config_version_cache.rs  | 18 +++++
 lib/pdm-config/src/lib.rs                   |  2 +
 server/src/acl.rs                           |  3 +-
 7 files changed, 100 insertions(+), 3 deletions(-)
 create mode 100644 lib/pdm-config/src/access_control_config.rs


Summary over all repositories:
  16 files changed, 738 insertions(+), 5 deletions(-)

-- 
Generated by git-murpp 0.8.1



