[pbs-devel] [PATCH proxmox-backup 2/2] datastore: reinsert unused chunks into cache during instantiation

Tue Aug 5 08:15:55 CEST 2025

On 8/4/25 10:41 PM, Thomas Lamprecht wrote:
> Am 01.08.25 um 16:10 schrieb Christian Ebner:
>> The local datastore chunk cache stores the currently cached chunk
>> digests in-memory, the chunk's data is stored however on the
>> filesystem. The in-memory cache might however be lost when:
>> - the datastore is removed from the lookup cache when a corresponding
>>    maintenance mode is set.
>> - the services are restarted.
>> - the system is rebooted.
>>
>> After the above actions, the cache is reinstantiated together with
>> the datastore on the next datastore lookup, calculating a cache
>> capacity based on the currently available storage space. This however
>> leaves the previously cached chunks out.
>> Therefore, reinsert them in an asynchronous task, by iterating over
>> them and inserting the chunk digests again. For these previously used
>> chunks, also increase the cache size, as this is now usable storage
>> for the cache as well.
> 
> I really would like some basic numbers for patches doing things with
> caches, especially if they iterate over all chunks present on disk, IIUC.
> AFAICT it at least happens in the background, so doesn't delay the
> one instantiating the new datastore struct directly, but without some
> pacing going through all chunks as fast as possible might introduce
> significant IO pressure I think.

Yes, although given that the cache should be limited in most cases, the
actual chunk iteration should not be that bad. It took a couple of
seconds with a cache store containing 8190 chunks, although on an NVMe
SSD. But I will do a more in-depth runtime and I/O pressure analysis in
the coming days.
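
To make the pacing idea concrete, here is a minimal sketch of a batched
iteration that sleeps between batches. The helper name `for_each_paced`
and its parameters are hypothetical and not part of the patches; sensible
batch sizes and pause durations would have to come out of the analysis
mentioned above.

```rust
use std::thread;
use std::time::Duration;

/// Calls `handle` for every item and sleeps for `pause` after each full
/// batch of `batch_size` items, so the iteration does not run at full speed.
/// Returns the number of items processed.
///
/// Hypothetical helper for illustration only, not proxmox-backup code.
pub fn for_each_paced<I, F>(items: I, batch_size: usize, pause: Duration, mut handle: F) -> usize
where
    I: IntoIterator,
    F: FnMut(I::Item),
{
    // Guard against a zero batch size, which would otherwise divide by zero.
    let batch_size = batch_size.max(1);
    let mut processed = 0;

    for item in items {
        handle(item);
        processed += 1;

        // Pause once per completed batch to give other I/O a chance.
        if processed % batch_size == 0 {
            thread::sleep(pause);
        }
    }

    processed
}
```

In the re-warming task, `items` would be the iterator over the chunk
files and `handle` the digest insert; a fixed sleep is the simplest
possible form of pacing and only a starting point.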

> Also, what happens if the datastore instance is already dropped again
> during this cache re-warming? AFAICT that can only realistically happen
> with maintenance mode, as with restarts/reboots it naturally should not
> matter. Also, just to be sure, this might also block any shutdown
> future from resolving, just like any other spawn_blocking, or am I
> mistaken?

Right, these are edge cases I did not consider, given that these patches
were created rather in a hurry with the primary intention to have a
stopgap. After all, the users will run out of storage space sooner or
later, given that nothing will ever reclaim the space occupied by the
chunks that are no longer in the cache.
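
One possible way to handle both edge cases, sketched under assumptions:
the re-warming task only holds a `Weak` reference to the cache and checks
an abort flag on every iteration, so it stops once the datastore instance
is dropped or a shutdown is requested. `DigestCache` and `rewarm` are
hypothetical stand-ins, not the actual types or functions used in the
patches.

```rust
use std::collections::HashSet;
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::{Mutex, Weak};

/// Stand-in for the in-memory digest cache (assumption; the real type differs).
pub type DigestCache = Mutex<HashSet<[u8; 32]>>;

/// Reinserts `digests` into the cache and returns how many were newly added.
/// Stops early if `abort` is set or if the cache has been dropped.
pub fn rewarm<I>(cache: &Weak<DigestCache>, digests: I, abort: &AtomicBool) -> usize
where
    I: IntoIterator<Item = [u8; 32]>,
{
    let mut inserted = 0;

    for digest in digests {
        // A shutdown request (or similar) ends the task at the next item.
        if abort.load(Ordering::Relaxed) {
            break;
        }

        // Upgrade per item, so the task never keeps a dropped cache alive
        // for longer than a single insert.
        let cache = match cache.upgrade() {
            Some(cache) => cache,
            None => break, // owner is gone, nothing left to warm
        };

        if cache.lock().unwrap().insert(digest) {
            inserted += 1;
        }
    }

    inserted
}
```

Since a blocking task cannot be cancelled from the outside once it runs,
a cooperative check like this would be needed for the shutdown case as
well.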

For now a
```
find /<datastore-path>/.chunks/ -type f -exec truncate --size 0 {} \;
```
run while the datastore is in the offline maintenance mode is a valid
workaround to recover from these cases, although it clears the chunk
contents instead of reclaiming them for the cache.

But all in all, given your concerns I will invest more time into these 
patches so we can (hopefully) roll them out and fix this issue for the 
users.