[pbs-devel] [PATCH proxmox{-backup, , -datacenter-manager} v3 00/10] token-shadow: reduce api token verification overhead
Samuel Rufinatscha
s.rufinatscha at proxmox.com
Fri Jan 2 17:07:39 CET 2026
Hi,
this series improves the performance of token-based API authentication
in PBS (pbs-config) and in PDM (underlying proxmox-access-control
crate), addressing the API token verification hotspot reported in our
bugtracker #7017 [1].
When profiling the PBS /status endpoint with cargo flamegraph [2],
token-based authentication showed up as a dominant hotspot via
proxmox_sys::crypt::verify_crypt_pw. Applying this series removes that
path from the hot section of the flamegraph. The same performance issue
was measured [3] for PDM. PDM uses the shared
proxmox-access-control library for token handling, which is a
factored-out version of the token.shadow handling code from PBS.
While this series fixes the immediate performance issue both in PBS
(pbs-config) and in the shared proxmox-access-control crate used by
PDM, PBS should ideally be refactored in a separate effort to use
proxmox-access-control for token handling instead of its local
implementation.
Problem
For token-based API requests, the token.shadow handling in both PBS
(pbs-config) and PDM (proxmox-access-control) currently performs these
steps:
1. read the token.shadow file on each request
2. deserialize it into a HashMap<Authid, String>
3. run password hash verification via
proxmox_sys::crypt::verify_crypt_pw for the provided token secret
Under load, this results in significant CPU usage spent in repeated
password hashing for the same token+secret pairs. The attached
flamegraphs for PBS [2] and PDM [3] show
proxmox_sys::crypt::verify_crypt_pw dominating the hot path.
Approach
The goal is to reduce the cost of token-based authentication while
preserving the existing token handling semantics (including detecting
manual edits to token.shadow) and staying consistent between PBS
(pbs-config) and PDM (proxmox-access-control). For both, this series
proposes to:
1. Introduce an in-memory cache for verified token secrets and
invalidate it through a shared ConfigVersionCache generation. Note that a
shared generation is required to keep the privileged and unprivileged
daemons in sync and avoid caching inconsistencies across processes.
2. Invalidate on token.shadow changes made through the API (set_secret,
delete_secret)
3. Invalidate on direct/manual token.shadow file changes (mtime +
length)
4. Avoid per-request file stat calls using a TTL window
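To illustrate points 1 and 2, here is a minimal sketch of a generation-invalidated secret cache. It is not the code from the patches: the names SecretCache, verify_secret, slow_verify, bump_generation and constant_time_eq are hypothetical, a process-local AtomicU64 stands in for the shared ConfigVersionCache generation, and a plain string comparison stands in for the expensive verify_crypt_pw call.

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::RwLock;

/// Stand-in for the shared ConfigVersionCache generation (assumption: the
/// real counter is shared between the privileged and unprivileged daemon).
static TOKEN_SHADOW_GENERATION: AtomicU64 = AtomicU64::new(0);

/// Hypothetical cache state: verified secrets plus the generation they
/// were verified under.
#[derive(Default)]
struct SecretCache {
    generation: u64,
    secrets: HashMap<String, String>,
}

/// Compare two byte strings without returning early on the first mismatch
/// (simple stand-in for a constant-time memcmp).
fn constant_time_eq(a: &[u8], b: &[u8]) -> bool {
    if a.len() != b.len() {
        return false;
    }
    a.iter().zip(b).fold(0u8, |acc, (x, y)| acc | (x ^ y)) == 0
}

/// Placeholder for the expensive password hash verification.
fn slow_verify(stored: &HashMap<String, String>, tokenid: &str, secret: &str) -> bool {
    stored.get(tokenid).map_or(false, |s| s.as_str() == secret)
}

/// Verify a token secret, consulting the cache first.
/// Returns (verified, served_from_cache).
fn verify_secret(
    cache: &RwLock<SecretCache>,
    stored: &HashMap<String, String>,
    tokenid: &str,
    secret: &str,
) -> (bool, bool) {
    let current = TOKEN_SHADOW_GENERATION.load(Ordering::Acquire);
    {
        let guard = cache.read().unwrap();
        // Only trust entries verified under the current generation.
        if guard.generation == current {
            if let Some(cached) = guard.secrets.get(tokenid) {
                if constant_time_eq(cached.as_bytes(), secret.as_bytes()) {
                    return (true, true);
                }
            }
        }
    }
    if !slow_verify(stored, tokenid, secret) {
        return (false, false);
    }
    let mut guard = cache.write().unwrap();
    if guard.generation != current {
        // Stale cache: drop everything verified under an older generation.
        guard.secrets.clear();
        guard.generation = current;
    }
    guard.secrets.insert(tokenid.to_string(), secret.to_string());
    (true, false)
}

/// What an API-side change (set_secret/delete_secret) would trigger.
fn bump_generation() {
    TOKEN_SHADOW_GENERATION.fetch_add(1, Ordering::AcqRel);
}
```

In this sketch, a repeated request with the same token and secret skips slow_verify until bump_generation is called, after which the next request takes the slow path once and repopulates the cache.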
Testing
*PBS (pbs-config)*
To verify the effect in PBS, I:
1. Set up a test environment based on the latest PBS ISO, installed the
Rust toolchain, and cloned the proxmox-backup repository to use with
cargo flamegraph. Reproduced bug #7017 [1] by profiling the /status
endpoint with token-based authentication using cargo flamegraph [2].
2. Built PBS with the pbs-config patches and re-ran the same workload and
profiling setup. Confirmed that the
proxmox_sys::crypt::verify_crypt_pw path no longer appears in the
hot section of the flamegraph. CPU usage is now dominated by TLS
overhead.
3. Functionally, I verified that:
* valid tokens authenticate correctly when used in API requests
* invalid secrets are rejected as before
* generating a new token secret via the dashboard (create token for
user, regenerate existing secret) works and authenticates correctly
*PDM (proxmox-access-control)*
To verify the effect in PDM, I followed a similar testing approach.
Instead of the PBS /status endpoint, I profiled the /version endpoint
with cargo flamegraph [3] and verified that the expensive hashing path
disappears from the hot section after introducing caching.
Functionally, I verified that:
* valid tokens authenticate correctly when used in API requests
* invalid secrets are rejected as before
* generating a new token secret via the dashboard (create token for
user, regenerate existing secret) works and authenticates correctly
Benchmarks:
Two different benchmarks have been run to measure caching effects
and RwLock contention:
(1) Requests per second for PBS /status endpoint (E2E)
Benchmarked parallel token auth requests for
/status?verbose=0 on top of the datastore lookup cache series [4]
to check throughput impact. With datastores=1, repeat=5000, parallel=16
this series gives ~172 req/s compared to ~65 req/s without it.
This is a ~2.6x improvement (and aligns with the ~179 req/s from the
previous version of this series, which used per-process cache
invalidation).
(2) RwLock contention for token create/delete under heavy load of
token-authenticated requests
The previous version of the series compared std::sync::RwLock and
parking_lot::RwLock contention for token create/delete under heavy
parallel token-authenticated readers. parking_lot::RwLock was chosen
for its added fairness guarantees.
Patch summary
pbs-config:
0001 – pbs-config: add token.shadow generation to ConfigVersionCache
Extends ConfigVersionCache to provide a process-shared generation
number for token.shadow changes.
0002 – pbs-config: cache verified API token secrets
Adds an in-memory cache for verified, plain-text API token secrets.
The cache is invalidated through the process-shared ConfigVersionCache
generation number. Uses openssl’s constant-time memcmp for matching
secrets.
0003 – pbs-config: invalidate token-secret cache on token.shadow
changes
On each token verification request, stats token.shadow (mtime and
length) and clears the cache when the file has changed.
0004 – pbs-config: add TTL window to token-secret cache
Introduces a TTL (TOKEN_SECRET_CACHE_TTL_SECS, default 60) for metadata
checks so that fs::metadata calls are not performed on each request.
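The interplay of patches 0003 and 0004 can be sketched as follows. This is a minimal illustration under stated assumptions, not the patch code: FileState, Freshness and check_file are hypothetical names, and the current time is passed in as a parameter so the TTL behaviour can be exercised without sleeping. Only TOKEN_SECRET_CACHE_TTL_SECS and its default of 60 are taken from the series description.

```rust
use std::fs;
use std::io;
use std::path::Path;
use std::time::{Duration, Instant, SystemTime};

/// TTL for metadata checks, using the default named in the series.
const TOKEN_SECRET_CACHE_TTL_SECS: u64 = 60;

/// Hypothetical record of what the file looked like at the last stat.
struct FileState {
    mtime: SystemTime,
    len: u64,
    last_checked: Instant,
}

/// Outcome of a freshness check.
#[derive(Debug, PartialEq)]
enum Freshness {
    /// Still inside the TTL window, no stat performed.
    SkippedWithinTtl,
    /// Stat performed, mtime and length unchanged.
    Unchanged,
    /// Stat performed, file differs: the caller should clear its cache.
    Changed,
}

/// Stat `path` at most once per TTL window and report whether its
/// mtime or length differs from the previously recorded state.
fn check_file(
    path: &Path,
    state: &mut Option<FileState>,
    now: Instant,
) -> io::Result<Freshness> {
    let ttl = Duration::from_secs(TOKEN_SECRET_CACHE_TTL_SECS);
    if let Some(s) = state.as_ref() {
        if now.duration_since(s.last_checked) < ttl {
            return Ok(Freshness::SkippedWithinTtl);
        }
    }
    let meta = fs::metadata(path)?;
    let (mtime, len) = (meta.modified()?, meta.len());
    let changed = match state.as_ref() {
        Some(s) => s.mtime != mtime || s.len != len,
        None => true, // first observation: treat as changed
    };
    *state = Some(FileState { mtime, len, last_checked: now });
    Ok(if changed { Freshness::Changed } else { Freshness::Unchanged })
}
```

The trade-off visible here is that a manual edit made inside the TTL window is only noticed by the first check after the window expires, which is the delayed effect the documentation patch describes.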
proxmox-access-control:
0005 – access-control: extend AccessControlConfig for token.shadow invalidation
Extends the AccessControlConfig trait with
token_shadow_cache_generation() and
increment_token_shadow_cache_generation() for
proxmox-access-control to get the shared token.shadow generation number
and bump it on token shadow changes.
0006 – access-control: cache verified API token secrets
Mirrors PBS PATCH 0002.
0007 – access-control: invalidate token-secret cache on token.shadow changes
Mirrors PBS PATCH 0003.
0008 – access-control: add TTL window to token-secret cache
Mirrors PBS PATCH 0004.
proxmox-datacenter-manager:
0009 – pdm-config: add token.shadow generation to ConfigVersionCache
Extends PDM ConfigVersionCache and implements
token_shadow_cache_generation() and
increment_token_shadow_cache_generation() from AccessControlConfig for
PDM.
0010 – docs: document API token-cache TTL effects
Documents the effects of the TTL window on token.shadow edits.
Changes from v1 to v2:
* (refactor) Switched cache initialization to LazyLock
* (perf) Use parking_lot::RwLock and best-effort cache access on the
read/refresh path (try_read/try_write) to avoid lock contention
* (doc) Document TTL-delayed effect of manual token.shadow edits
* (fix) Add generation guards (API_MUTATION_GENERATION +
FILE_GENERATION) to prevent caching across concurrent set/delete and
external edits
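The best-effort cache access mentioned in the v2 changes can be illustrated roughly like this. The sketch uses std::sync::RwLock as a stand-in so that it has no external dependencies (the series itself uses parking_lot::RwLock, whose try_read/try_write return an Option rather than a Result), and the helper names cached_secret and try_cache_secret are hypothetical.

```rust
use std::collections::HashMap;
use std::sync::RwLock;

/// Best-effort lookup: returns None both on a cache miss and when the
/// lock cannot be taken right now, so the caller falls back to the slow
/// verification path instead of blocking.
fn cached_secret(
    cache: &RwLock<HashMap<String, String>>,
    tokenid: &str,
) -> Option<String> {
    let guard = cache.try_read().ok()?;
    guard.get(tokenid).cloned()
}

/// Best-effort insert: skipped when the lock is contended.
/// Returns whether the secret was actually cached.
fn try_cache_secret(
    cache: &RwLock<HashMap<String, String>>,
    tokenid: &str,
    secret: &str,
) -> bool {
    match cache.try_write() {
        Ok(mut guard) => {
            guard.insert(tokenid.to_string(), secret.to_string());
            true
        }
        Err(_) => false,
    }
}
```

With this shape, a request that loses the race for the lock simply behaves like a cache miss: it is slower for that one request, but correctness does not depend on the cache being reachable.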
Changes from v2 to v3:
* (refactor) Replace PBS per-process cache invalidation with a
cross-process token.shadow generation based on PBS
ConfigVersionCache, ensuring cache consistency between privileged
and unprivileged daemons.
* (refactor) Decouple the generation source from the
proxmox/proxmox-access-control cache implementation: extend
AccessControlConfig hooks so that products can provide the shared
token.shadow generation source.
* (refactor) Extend PDM's ConfigVersionCache with
token_shadow_generation and introduce a
pdm_config::AccessControlConfig wrapper implementing the new
proxmox-access-control trait hooks. Switch server and CLI
initialization to use pdm_config::AccessControlConfig instead of
pdm_api_types::AccessControlConfig.
* (refactor) Adapt generation checks around cached-secret comparison to
use the new shared generation source.
* (fix/logic) cache_try_insert_secret: Update the local cache
generation if stale, allowing the new secret to be inserted
immediately
* (refactor) Extract cache invalidation logic into an
invalidate_cache_state helper to reduce duplication and ensure
consistent state resets
* (refactor) Simplify refresh_cache_if_file_changed: handle the
un-initialized/reset state and adjust the generation mismatch
path to ensure file metadata is always re-read.
* (doc) Clarify TTL-delayed effects of manual token.shadow edits.
Please see the patch-specific changelogs for more details.
Thanks for considering this patch series, I look forward to your
feedback.
Best,
Samuel Rufinatscha
[1] https://bugzilla.proxmox.com/show_bug.cgi?id=7017
[2] attachment 1767 [1]: Flamegraph showing the proxmox_sys::crypt::verify_crypt_pw stack
[3] attachment 1794 [1]: Flamegraph PDM baseline
[4] https://bugzilla.proxmox.com/show_bug.cgi?id=6049
proxmox-backup:
Samuel Rufinatscha (4):
pbs-config: add token.shadow generation to ConfigVersionCache
pbs-config: cache verified API token secrets
pbs-config: invalidate token-secret cache on token.shadow changes
pbs-config: add TTL window to token secret cache
Cargo.toml | 1 +
docs/user-management.rst | 4 +
pbs-config/Cargo.toml | 1 +
pbs-config/src/config_version_cache.rs | 18 ++
pbs-config/src/token_shadow.rs | 298 ++++++++++++++++++++++++-
5 files changed, 321 insertions(+), 1 deletion(-)
proxmox:
Samuel Rufinatscha (4):
proxmox-access-control: extend AccessControlConfig for token.shadow
invalidation
proxmox-access-control: cache verified API token secrets
proxmox-access-control: invalidate token-secret cache on token.shadow
changes
proxmox-access-control: add TTL window to token secret cache
Cargo.toml | 1 +
proxmox-access-control/Cargo.toml | 1 +
proxmox-access-control/src/init.rs | 17 ++
proxmox-access-control/src/token_shadow.rs | 299 ++++++++++++++++++++-
4 files changed, 317 insertions(+), 1 deletion(-)
proxmox-datacenter-manager:
Samuel Rufinatscha (2):
pdm-config: implement token.shadow generation
docs: document API token-cache TTL effects
cli/admin/src/main.rs | 2 +-
docs/access-control.rst | 4 ++
lib/pdm-config/Cargo.toml | 1 +
lib/pdm-config/src/access_control_config.rs | 73 +++++++++++++++++++++
lib/pdm-config/src/config_version_cache.rs | 18 +++++
lib/pdm-config/src/lib.rs | 2 +
server/src/acl.rs | 3 +-
7 files changed, 100 insertions(+), 3 deletions(-)
create mode 100644 lib/pdm-config/src/access_control_config.rs
Summary over all repositories:
16 files changed, 738 insertions(+), 5 deletions(-)
--
Generated by git-murpp 0.8.1