[pbs-devel] Slow overview of existing backups

Mark Schouten mark at tuxis.nl
Thu Jan 26 09:03:24 CET 2023


Hi,

>>  PBS knows when something changed in terms of backups, and thus when it’s time to update that index.
>>
>
>PBS is built such that the file system is the source of truth, one can,
>e.g., remove stuff there or use the manager CLI, multiple PBS instances
>can also run parallel, e.g., during upgrade.
>
>So having a guaranteed in-sync cache is not as trivial as it might sound.
>

You can also remove stuff from /var/lib/mysql/, but then you break it.
There is nothing wrong with requiring your users not to touch any
files except via the tooling you provide. And the tooling you provide
can hint the service to rebuild the index. The same goes for upgrades:
you are in charge of them.

We also need to run garbage collection regularly, which is a good moment
to update the index I have in mind and check whether it is actually
correct. On every backup run, delete, or verify, you can update and
check the index as well. Those are all moments when no user is waiting
on the result and running into timeouts, refreshing screens, and other
annoyances.
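
As a rough illustration of that idea (not how PBS is implemented), here
is a minimal Python sketch. The two-level <group>/<snapshot> layout, the
"manifest.json" name, the ".snapshot-index.json" cache file and all
function names are assumptions made up for this example. The file system
stays the source of truth; the cache is only refreshed on events and
re-checked during something like a GC run.

```python
import json
import os
import tempfile
from pathlib import Path

INDEX_NAME = ".snapshot-index.json"  # hypothetical cache file name


def scan_datastore(root: Path) -> dict:
    """Slow path: walk <root>/<group>/<snapshot>/ and read every manifest."""
    index = {}
    for group in sorted(p for p in root.iterdir() if p.is_dir()):
        for snap in sorted(p for p in group.iterdir() if p.is_dir()):
            manifest = snap / "manifest.json"  # hypothetical manifest name
            info = json.loads(manifest.read_text()) if manifest.exists() else {}
            index[f"{group.name}/{snap.name}"] = info
    return index


def write_index(root: Path, index: dict) -> None:
    """Write the cache to a temp file, then rename it into place atomically."""
    fd, tmp = tempfile.mkstemp(dir=root, suffix=".tmp")
    with os.fdopen(fd, "w") as fh:
        json.dump(index, fh, sort_keys=True)
    os.replace(tmp, root / INDEX_NAME)


def read_index(root: Path):
    """Return the cached index, or None if no cache file exists yet."""
    path = root / INDEX_NAME
    return json.loads(path.read_text()) if path.exists() else None


def list_snapshots(root: Path) -> dict:
    """Fast path for the overview: use the cache, scan only if it is missing."""
    index = read_index(root)
    if index is None:
        index = scan_datastore(root)
        write_index(root, index)
    return index


def on_snapshot_changed(root: Path, key: str) -> None:
    """Call after a backup, verify or delete to refresh a single entry."""
    index = list_snapshots(root)
    snap = root / key
    if snap.is_dir():
        manifest = snap / "manifest.json"
        index[key] = json.loads(manifest.read_text()) if manifest.exists() else {}
    else:
        index.pop(key, None)  # snapshot was removed
    write_index(root, index)


def check_index(root: Path) -> bool:
    """Call during GC: compare cache with disk and repair it if they differ.

    Returns True if the cache was already in sync.
    """
    fresh = scan_datastore(root)
    in_sync = read_index(root) == fresh
    if not in_sync:
        write_index(root, fresh)
    return in_sync
```

With something like this, the overview only calls list_snapshots(), and
an out-of-band change on disk is picked up at the latest by the next
check_index() run.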

>
>>  I have the feeling that when you request an overview now, all individual backups are checked, which seems suboptimal.
>
>We mostly walk the directory structure and read the (quite small) manifest
>files for some info like last verification, but we do not check the backup
>(data) itself.
>
>Note that using namespaces for separating many backups into multiple folders
>can help, as a listing then only needs to check the indices from the namespace.
>
>But what amount of data and what backup counts/sizes are we talking about here?

Server:
2x Intel Silver 4114 (10 cores, 20 threads each)
256GB RAM
A zpool consisting of:
- 17 three-way mirrors of 18TB Western Digital HC550s, SAS
- 2 three-way mirrors of 960GB Samsung PM9A3 NVMe drives as special devices

Datastores:
- 73 datastores
- Total of 240T Allocated data

Datastore that triggered my question:
- 263 Groups
- 2325 Snapshots
- 60TB In use
- Dedup factor of 19.3

>How many groups, how many snapshots (per group), how many disks per backup?
>
>And what hardware is hosting that data (cpu, disk, memory).
>
>How's PSI looking during listing? head /proc/pressure/*

root@pbs003:/proc/pressure# head *
==> cpu <==
some avg10=0.74 avg60=0.58 avg300=0.21 total=8570917611
full avg10=0.00 avg60=0.00 avg300=0.00 total=0

==> io <==
some avg10=20.45 avg60=23.93 avg300=27.69 total=176562636690
full avg10=19.25 avg60=22.69 avg300=26.82 total=165397148422

==> memory <==
some avg10=0.00 avg60=0.00 avg300=0.00 total=67894436
full avg10=0.00 avg60=0.00 avg300=0.00 total=66761631

Currently running 9 tasks:
- 3 Verifys
- 1 Backup
- 2 Syncjobs
- 2 GC Runs
- 1 Reader
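
For anyone who wants to track these pressure numbers over time instead of
eyeballing them, here is a small Python sketch that parses the text
format shown above. The function name parse_psi is made up for this
example; the sample input is the io block from this mail.

```python
def parse_psi(text: str) -> dict:
    """Parse the contents of one /proc/pressure/<resource> file.

    Returns e.g. {"some": {"avg10": 20.45, ..., "total": 123}, "full": {...}}.
    """
    result = {}
    for line in text.strip().splitlines():
        kind, *fields = line.split()  # kind is "some" or "full"
        values = dict(field.split("=", 1) for field in fields)
        result[kind] = {
            # "total" is a cumulative stall time in microseconds, the rest are percentages
            key: int(val) if key == "total" else float(val)
            for key, val in values.items()
        }
    return result


io_sample = """\
some avg10=20.45 avg60=23.93 avg300=27.69 total=176562636690
full avg10=19.25 avg60=22.69 avg300=26.82 total=165397148422
"""

io = parse_psi(io_sample)
print(io["full"]["avg300"])
```

The "full" line is the interesting one here: an avg300 of 26.82 means
that for roughly a quarter of the last five minutes, all non-idle tasks
were stalled on I/O at the same time.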

--
Mark Schouten, CTO
Tuxis B.V.
mark at tuxis.nl / +31 318 200208
