[pbs-devel] [PATCH proxmox-backup 2/3] partial fix #6049: datastore: use config fast-path in Drop

Samuel Rufinatscha s.rufinatscha at proxmox.com
Wed Nov 12 16:20:07 CET 2025


On 11/12/25 12:25 PM, Fabian Grünbichler wrote:
> On November 11, 2025 1:29 pm, Samuel Rufinatscha wrote:
>> The Drop impl of DataStore re-read datastore.cfg to decide whether
>> the entry should be evicted from the in-process cache (based on
>> maintenance mode’s clear_from_cache). During the investigation of
>> issue #6049 [1], a flamegraph [2] showed that the config reload in Drop
>> accounted for a measurable share of CPU time under load.
>>
>> This patch makes Drop O(1) on the fast path by reusing the maintenance-
> 
> I am not sure what the O(1) is referring to? This patch implements a
> faster cache lookup in front of the (slow) config parsing variant, but
> that doesn't really align well with what the "Big O" notation tries to
> express ;)
> 
> The parsing below still scales with the number of datastores in the
> config, after all. It can just be skipped sometimes :)
>

Good point, the O(1) reference is rather misleading. I’ll rephrase it 
in v2 :)
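
To make the distinction concrete, here is a minimal sketch of the pattern
as I understand it. All names (`CachedTag`, `Config`, `should_evict`) are
hypothetical stand-ins and not the actual proxmox-backup types. The fast
path only compares a generation number; the slow path is modelled as a
linear scan so that its cost still grows with the number of datastores:

```rust
/// Hypothetical stand-in for the tag stored with a cached datastore entry.
struct CachedTag {
    generation: usize,
    clear_from_cache: bool,
}

/// Hypothetical stand-in for a parsed datastore.cfg:
/// a list of (datastore name, maintenance mode mandates eviction).
struct Config {
    entries: Vec<(String, bool)>,
}

impl Config {
    /// Slow path: a linear scan, so the cost scales with the number of
    /// configured datastores (the real code additionally has to read and
    /// parse the file, which this sketch does not model).
    fn lookup(&self, name: &str) -> Option<bool> {
        self.entries
            .iter()
            .find(|(n, _)| n.as_str() == name)
            .map(|(_, clear)| *clear)
    }
}

/// Returns `(should_evict, used_fast_path)`.
fn should_evict(
    name: &str,
    tag: Option<&CachedTag>,
    current_gen: Option<usize>,
    config: &Config,
) -> (bool, bool) {
    match (tag, current_gen) {
        // Fast path: the tag is present and matches the current generation,
        // so the decision captured at lookup time can be reused.
        (Some(t), Some(g)) if t.generation == g => (t.clear_from_cache, true),
        // Slow path: tag missing, stale, or generation unavailable.
        // An unknown datastore is treated as "do not evict".
        _ => (config.lookup(name).unwrap_or(false), false),
    }
}
```

So the change skips the slow path when the tag is fresh, but does not
change how the slow path itself scales.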

>> mode decision captured at lookup time and stored with the cached
>> datastore entry. When the last reference goes away we:
>> - decrement active-operation counters, and
>> - evict only if the cached decision mandates eviction.
>>
>> If the cache tag is absent or not fresh, a subsequent slow-path lookup
>> will be performed.
>>
>> Testing
>>
>> Compared flamegraphs before and after: prior to this change
>> (on top of patch 1), stacks originating from Drop included
>> pbs_config::datastore::config(). After the change, those vanish from
>> the drop path.
>>
>> An end-to-end benchmark using `/status?verbose=0` with 1000 datastores,
>> 5 requests per store, and 16-way parallelism shows a further
>> improvement:
>>
>> | Metric                  | After commit 1 | After commit 2 | Δ (abs) | Δ (%)   |
>> |-------------------------|:--------------:|:--------------:|:-------:|:-------:|
>> | Total time              | 11s            | 10s            | −1s     | −9.09%  |
>> | Throughput (all rounds) | 454.55         | 500.00         | +45.45  | +10.00% |
>> | Cold RPS (round #1)     | 90.91          | 100.00         | +9.09   | +10.00% |
>> | Warm RPS (rounds 2..N)  | 363.64         | 400.00         | +36.36  | +10.00% |
>>
>> Optimizing Drop improves overall throughput by ~10%. The gain appears
>> in both cold and warm rounds, and the flamegraph confirms the config
>> reload no longer sits on the hot path.
>>
>> Links
>>
>> [1] Bugzilla: https://bugzilla.proxmox.com/show_bug.cgi?id=6049
>> [2] cargo-flamegraph: https://github.com/flamegraph-rs/flamegraph
>>
>> Fixes: #6049
>> Signed-off-by: Samuel Rufinatscha <s.rufinatscha at proxmox.com>
>> ---
>>   pbs-datastore/src/datastore.rs | 31 +++++++++++++++++++++++++++----
>>   1 file changed, 27 insertions(+), 4 deletions(-)
>>
>> diff --git a/pbs-datastore/src/datastore.rs b/pbs-datastore/src/datastore.rs
>> index 18eebb58..da80416a 100644
>> --- a/pbs-datastore/src/datastore.rs
>> +++ b/pbs-datastore/src/datastore.rs
>> @@ -200,15 +200,38 @@ impl Drop for DataStore {
>>               // remove datastore from cache iff
>>               //  - last task finished, and
>>               //  - datastore is in a maintenance mode that mandates it
>> -            let remove_from_cache = last_task
>> -                && pbs_config::datastore::config()
>> +
>> +            // first check: check if last task finished
>> +            if !last_task {
>> +                return;
>> +            }
>> +
>> +            let cached_tag = self.inner.cached_config_tag.as_ref();
>> +            let last_gen_num = cached_tag.and_then(|c| c.last_generation);
>> +            let gen_num = ConfigVersionCache::new()
>> +                .ok()
>> +                .map(|c| c.datastore_generation());
>> +
>> +            let cache_is_fresh = match (last_gen_num, gen_num) {
>> +                (Some(a), Some(b)) => a == b,
>> +                _ => false,
>> +            };
> 
> this is just last_gen_num == gen_num and checking that either is Some.
> if we make the tag always contain a generation instead of an option, we
> can simplify this code ;)
>

Good point, will adjust this. I think we could keep 
`ConfigVersionCache::new().ok()` and create the optional cache tag only 
if the generation number is `Some`. This way, the lookup would still be 
able to perform a slow path read if the cache isn’t available for any 
reason.
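
A rough sketch of what I have in mind for v2, assuming the tag holds a
plain generation number. The struct and function names below are
hypothetical and only for illustration, the real tag will carry more
fields:

```rust
/// Hypothetical tag with a non-optional generation.
struct CachedDatastoreConfigTag {
    generation: usize,
}

/// Create the tag only if a generation number could be read, i.e. if the
/// version cache was available. Otherwise no tag is stored at all.
fn make_tag(current_gen: Option<usize>) -> Option<CachedDatastoreConfigTag> {
    current_gen.map(|generation| CachedDatastoreConfigTag { generation })
}

/// The freshness check then collapses to a single comparison.
/// A missing tag or a missing current generation never counts as fresh,
/// which mirrors the `_ => false` arm of the original match.
fn cache_is_fresh(
    tag: Option<&CachedDatastoreConfigTag>,
    current_gen: Option<usize>,
) -> bool {
    tag.is_some_and(|t| Some(t.generation) == current_gen)
}
```

With this, a `None` on either side still falls through to the slow path
read of the config.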

>> +
>> +            let mm_mandate = if cache_is_fresh {
>> +                cached_tag
>> +                    .and_then(|c| c.last_maintenance_mode.as_ref())
>> +                    .is_some_and(|m| m.clear_from_cache())
>> +            } else {
>> +                pbs_config::datastore::config()
>>                       .and_then(|(s, _)| s.lookup::<DataStoreConfig>("datastore", self.name()))
>>                       .is_ok_and(|c| {
>>                           c.get_maintenance_mode()
>>                               .is_some_and(|m| m.clear_from_cache())
>> -                    });
>> +                    })
>> +            };
>>   
>> -            if remove_from_cache {
>> +            // second check: check maintenance mode mandate
>> +            if mm_mandate {
>>                   DATASTORE_MAP.lock().unwrap().remove(self.name());
>>               }
>>           }
>> -- 
>> 2.47.3
>>
>>
>>
>> _______________________________________________
>> pbs-devel mailing list
>> pbs-devel at lists.proxmox.com
>> https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel
>>
> 
> 




