[pbs-devel] Slow overview of existing backups

Wed Jan 25 17:08:44 CET 2023

Hi,

Am 25/01/2023 um 11:26 schrieb Mark Schouten:
> Requesting the available backups from a PBS takes quite a long time.
> Are there any plans to start implementing caching or an overall index-file for a datastore?

There's already the host system's page cache, which helps a lot as long as
there's enough memory to avoid displacing its content frequently.

> PBS knows when something changed in terms of backups, and thus when it’s time to update that index.
> 

PBS is built such that the file system is the source of truth: one can,
e.g., remove stuff there or use the manager CLI, and multiple PBS instances
can also run in parallel, e.g., during an upgrade.

So having a guaranteed in-sync cache is not as trivial as it might sound.

> I have the feeling that when you request an overview now, all individual backups are checked, which seems suboptimal.

We mostly walk the directory structure and read the (quite small) manifest
files for some info like last verification, but we do not check the backup
(data) itself.

Note that using namespaces to separate many backups into multiple folders
can help, as a listing then only needs to check the indices from that namespace.

But how much data and how many backups are we talking about here?
How many groups, how many snapshots (per group), how many disks per backup?
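
If it helps to gather those numbers, here is a rough sketch in Python. It
assumes the usual on-disk layout of <datastore>/<type>/<id>/<snapshot>/ with
the types vm, ct and host; the function name is made up for illustration, and
it does not descend into namespaces below ns/ (point it at a namespace
directory to count that one):

```python
import os
from collections import Counter

# Assumed top-level backup type directories inside a datastore or namespace.
BACKUP_TYPES = ("vm", "ct", "host")


def count_snapshots(root):
    """Return a Counter mapping (type, backup_id) -> number of snapshot dirs."""
    counts = Counter()
    for btype in BACKUP_TYPES:
        type_dir = os.path.join(root, btype)
        if not os.path.isdir(type_dir):
            continue
        for group in os.scandir(type_dir):
            if not group.is_dir():
                continue
            # Only directories count as snapshots; plain files are skipped.
            counts[(btype, group.name)] = sum(
                1 for entry in os.scandir(group.path) if entry.is_dir()
            )
    return counts
```

len(counts) is then the number of groups and sum(counts.values()) the total
number of snapshots at that level.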

And what hardware is hosting that data (CPU, disk, memory)?

How's PSI looking during listing? head /proc/pressure/*
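
If you would rather sample PSI repeatedly while the listing runs, a small
Python sketch along these lines could do it. It assumes the kernel's usual
PSI line format ("some avg10=... avg60=... avg300=... total=..."); the
function names are just for illustration:

```python
import glob


def parse_psi(text):
    """Parse the contents of one /proc/pressure/<resource> file.

    Returns e.g. {"some": {"avg10": 0.0, ...}, "full": {...}}.
    """
    result = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        kind, *fields = line.split()
        result[kind] = {
            key: float(value)
            for key, value in (field.split("=", 1) for field in fields)
        }
    return result


def read_all_psi(pattern="/proc/pressure/*"):
    """Read every PSI file matching pattern, keyed by resource name."""
    out = {}
    for path in sorted(glob.glob(pattern)):
        with open(path) as fh:
            out[path.rsplit("/", 1)[-1]] = parse_psi(fh.read())
    return out
```

High "some" or "full" avg10 values for io during the listing would point at
the storage rather than at PBS itself.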