[pbs-devel] Slow overview of existing backups
Mark Schouten
mark at tuxis.nl
Thu Jan 26 09:03:24 CET 2023
Hi,
>> PBS knows when something changed in terms of backups, and thus when it’s time to update that index.
>>
>
>PBS is built such that the file system is the source of truth; one can,
>e.g., remove stuff there or use the manager CLI, and multiple PBS instances
>can also run in parallel, e.g., during an upgrade.
>
>So having a guaranteed in-sync cache is not as trivial as it might sound.
>
You can also remove files from /var/lib/mysql/, but then you break the
database. There is nothing wrong with requiring your users not to touch
any files except via the tooling you provide. And the tooling you provide
can hint the service to rebuild the index. The same goes for upgrades: you
are in charge of them.
We also need to run garbage collection regularly, which is a good moment
to update the index I have in mind and check that it is actually correct.
The index can also be updated and checked on every backup run, delete,
and verify. Those are all moments when a user is not waiting for it and
running into timeouts, refreshing screens, and other annoyances.
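To make the idea concrete, here is a minimal sketch in Python. It is not how PBS works internally. The on-disk layout (datastore/<group>/<snapshot>/manifest.json), the class name SnapshotIndex, and the hook names on_change and reconcile are all hypothetical, chosen only to illustrate an index that is refreshed on backup/delete/verify and re-checked against the file system during garbage collection.

```python
import json
from pathlib import Path


class SnapshotIndex:
    """Hypothetical cached listing; the file system stays the source of truth."""

    def __init__(self, datastore: Path):
        self.datastore = datastore
        self.entries = {}  # (group, snapshot) -> parsed manifest

    def _scan(self):
        # Full walk of the (assumed) layout: <datastore>/<group>/<snapshot>/manifest.json
        found = {}
        for group in (p for p in self.datastore.iterdir() if p.is_dir()):
            for snap in (p for p in group.iterdir() if p.is_dir()):
                manifest = snap / "manifest.json"
                if manifest.is_file():
                    found[(group.name, snap.name)] = json.loads(manifest.read_text())
        return found

    def rebuild(self):
        """Build the index from scratch, e.g. after an upgrade."""
        self.entries = self._scan()

    def on_change(self, group: str, snapshot: str):
        """Hook for backup, delete, or verify: refresh a single entry."""
        manifest = self.datastore / group / snapshot / "manifest.json"
        if manifest.is_file():
            self.entries[(group, snapshot)] = json.loads(manifest.read_text())
        else:
            self.entries.pop((group, snapshot), None)

    def reconcile(self):
        """Hook for garbage collection: repair the index and return drifted keys."""
        on_disk = self._scan()
        drift = sorted(
            key
            for key in on_disk.keys() | self.entries.keys()
            if on_disk.get(key) != self.entries.get(key)
        )
        self.entries = on_disk
        return drift
```

A listing request would then read self.entries instead of walking the datastore, and a non-empty result from reconcile() would show that something changed the files behind the tooling's back.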
>
>> I have the feeling that when you request an overview now, all individual backups are checked, which seems suboptimal.
>
>We mostly walk the directory structure and read the (quite small) manifest
>files for some info like last verification, but we do not check the backup
>(data) itself.
>
>Note that using namespaces to separate many backups into multiple folders
>can help, as a listing then only needs to check the indices from that namespace.
>
>But what amount of data and how many backups (count/sizes) are we talking about here?
Server:
2x Intel Silver 4114 (10 cores, 20 threads each)
256GB RAM
A zpool consisting of:
- 17 three-way mirrors of 18TB Western Digital HC550 SAS drives
- 2 three-way mirrors of 960GB Samsung PM9A3 NVMe drives as special devices
Datastores:
- 73 datastores
- Total of 240T Allocated data
Datastore that triggered my question:
- 263 Groups
- 2325 Snapshots
- 60TB In use
- Dedup factor of 19.3
>How many groups, how many snapshots (per group), how many disks per backup?
>
>And what hardware is hosting that data (cpu, disk, memory).
>
>How's PSI looking during listing? head /proc/pressure/*
root@pbs003:/proc/pressure# head *
==> cpu <==
some avg10=0.74 avg60=0.58 avg300=0.21 total=8570917611
full avg10=0.00 avg60=0.00 avg300=0.00 total=0
==> io <==
some avg10=20.45 avg60=23.93 avg300=27.69 total=176562636690
full avg10=19.25 avg60=22.69 avg300=26.82 total=165397148422
==> memory <==
some avg10=0.00 avg60=0.00 avg300=0.00 total=67894436
full avg10=0.00 avg60=0.00 avg300=0.00 total=66761631
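For anyone who wants to track these numbers over time, the lines above can be parsed with a few lines of Python. This is only a sketch: the function names parse_psi_line and read_psi are made up for illustration, and it assumes a Linux kernel that exposes /proc/pressure. The avg fields are percentages of time stalled over the last 10, 60, and 300 seconds, and total is the cumulative stall time in microseconds.

```python
from pathlib import Path


def parse_psi_line(line: str):
    """Parse one PSI line, e.g. 'some avg10=0.74 avg60=0.58 avg300=0.21 total=123'."""
    kind, *fields = line.split()
    values = dict(field.split("=", 1) for field in fields)
    return kind, {
        "avg10": float(values["avg10"]),
        "avg60": float(values["avg60"]),
        "avg300": float(values["avg300"]),
        "total_us": int(values["total"]),
    }


def read_psi(resource: str):
    """Read /proc/pressure/<resource> ('cpu', 'io' or 'memory') into a dict."""
    text = Path("/proc/pressure", resource).read_text()
    return dict(parse_psi_line(line) for line in text.splitlines() if line.strip())
```

With the output above, read_psi("io")["full"]["avg10"] would have returned 19.25, i.e. all non-idle tasks were stalled on I/O at the same time for about 19% of the last ten seconds.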
Currently running 9 tasks:
- 3 Verify jobs
- 1 Backup
- 2 Sync jobs
- 2 GC Runs
- 1 Reader
--
Mark Schouten, CTO
Tuxis B.V.
mark at tuxis.nl / +31 318 200208