[pbs-devel] Slow overview of existing backups

Fri Mar 10 10:09:42 CET 2023

Hi all,

Any thoughts on this?

—
Mark Schouten, CTO
Tuxis B.V.
mark at tuxis.nl / +31 318 200208

------ Original Message ------
From "Mark Schouten" <mark at tuxis.nl>
To "Thomas Lamprecht" <t.lamprecht at proxmox.com>; "Proxmox Backup Server 
development discussion" <pbs-devel at lists.proxmox.com>
Date 1/26/2023 9:03:24 AM
Subject Re[2]: [pbs-devel] Slow overview of existing backups

>Hi,
>
>>>  PBS knows when something changed in terms of backups, and thus when it’s time to update that index.
>>>
>>
>>PBS is built such that the file system is the source of truth; one can,
>>e.g., remove stuff there or use the manager CLI, and multiple PBS instances
>>can also run in parallel, e.g., during an upgrade.
>>
>>So having a guaranteed in-sync cache is not as trivial as it might sound.
>>
>
>You can also remove stuff from /var/lib/mysql/, but then you break it. There is nothing wrong with requiring users not to touch any files except via the tooling you provide. And the tooling you provide can hint the service to rebuild the index. The same goes for upgrades: you are in charge of them.
>
>We also need to regularly run garbage collection, which is a nice moment to update my desired index and check if it’s actually correct. On every backup run, delete, or verify, you can update and check the index. Those are all moments when no user is waiting for the listing and running into timeouts, refreshing screens, and other annoyances.
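
A minimal sketch of the idea above, in Python and not based on how PBS is actually implemented: a listing index that is updated by the operations that change snapshots (backup, delete, verify) and rebuilt from the file system during a periodic pass such as garbage collection. All names here (SnapshotIndex, record_backup, rebuild_from, ...) are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, List, Optional, Tuple

Key = Tuple[str, str]  # (group, snapshot timestamp)


@dataclass
class SnapshotInfo:
    group: str
    timestamp: str
    verified: Optional[bool] = None  # None means "never verified"


@dataclass
class SnapshotIndex:
    """Hypothetical listing cache; the file system stays the source of truth."""
    entries: Dict[Key, SnapshotInfo] = field(default_factory=dict)

    def record_backup(self, group: str, timestamp: str) -> None:
        # Called when a backup finishes: add the new snapshot.
        self.entries[(group, timestamp)] = SnapshotInfo(group, timestamp)

    def record_delete(self, group: str, timestamp: str) -> None:
        # Called on prune/forget: drop the entry if it is present.
        self.entries.pop((group, timestamp), None)

    def record_verify(self, group: str, timestamp: str, ok: bool) -> None:
        # Called when a verify job finishes for a known snapshot.
        info = self.entries.get((group, timestamp))
        if info is not None:
            info.verified = ok

    def rebuild_from(self, scan: Callable[[], Iterable[SnapshotInfo]]) -> int:
        """Replace the index with a fresh scan (e.g. during GC).

        Returns the number of snapshots that were added or removed
        compared to the cached state, as a cheap drift indicator.
        """
        fresh = {(s.group, s.timestamp): s for s in scan()}
        drift = len(set(fresh) ^ set(self.entries))
        self.entries = fresh
        return drift

    def list_group(self, group: str) -> List[SnapshotInfo]:
        # Served from memory, so no directory walk on the request path.
        return sorted(
            (s for s in self.entries.values() if s.group == group),
            key=lambda s: s.timestamp,
        )
```

A non-zero result from rebuild_from would mean something changed behind the index's back (manual removal, a second instance), which is exactly the case raised above, so it would need to be logged or handled rather than ignored.
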
>
>>
>>>  I have the feeling that when you request an overview now, all individual backups are checked, which seems suboptimal.
>>
>>We mostly walk the directory structure and read the (quite small) manifest
>>files for some info like last verification, but we do not check the backup
>>(data) itself.
>>
>>Note that using namespaces for separating many backups into multiple folders
>>can help, as a listing then only needs to check the indices from the namespace.
>>
>>But what amount of data and how many backups (count and sizes) are we talking about here?
>
>Server:
>2x Intel Silver 4114 (10 cores, 20 threads each)
>256GB RAM
>A zpool consisting of:
>- 17 three-way mirrors of 18TB Western Digital HC550s, SAS
>- 2 three-way mirrors of 960GB Samsung PM9A3 NVMe drives as special devices
>
>Datastores:
>- 73 datastores
>- Total of 240T Allocated data
>
>Datastore that triggered my question:
>- 263 Groups
>- 2325 Snapshots
>- 60TB In use
>- Dedup factor of 19.3
>
>>How many groups, how many snapshots (per group), many disks on backups?
>>
>>And what hardware is hosting that data (cpu, disk, memory).
>>
>>How's PSI looking during listing? head /proc/pressure/*
>
>root@pbs003:/proc/pressure# head *
>==> cpu <==
>some avg10=0.74 avg60=0.58 avg300=0.21 total=8570917611
>full avg10=0.00 avg60=0.00 avg300=0.00 total=0
>
>==> io <==
>some avg10=20.45 avg60=23.93 avg300=27.69 total=176562636690
>full avg10=19.25 avg60=22.69 avg300=26.82 total=165397148422
>
>==> memory <==
>some avg10=0.00 avg60=0.00 avg300=0.00 total=67894436
>full avg10=0.00 avg60=0.00 avg300=0.00 total=66761631
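
For reading the numbers above: in PSI output, "some" is the share of time in which at least one task was stalled on the resource, "full" is the share in which all non-idle tasks were stalled at once, the avgN fields are percentages over the last N seconds, and total is the accumulated stall time in microseconds. A small Python sketch for pulling these values out so they can be logged while a listing runs (the helper names parse_psi and read_psi are made up for this example):

```python
from typing import Dict


def parse_psi(text: str) -> Dict[str, Dict[str, float]]:
    """Parse the contents of one /proc/pressure/<resource> file."""
    result: Dict[str, Dict[str, float]] = {}
    for line in text.strip().splitlines():
        # Each line looks like: "some avg10=0.74 avg60=0.58 avg300=0.21 total=123"
        kind, *fields = line.split()
        result[kind] = {
            key: float(value)
            for key, value in (f.split("=", 1) for f in fields)
        }
    return result


def read_psi(resource: str) -> Dict[str, Dict[str, float]]:
    # Linux only; resource is "cpu", "io" or "memory".
    with open(f"/proc/pressure/{resource}") as fh:
        return parse_psi(fh.read())


# The io values from the output above.
sample = """some avg10=20.45 avg60=23.93 avg300=27.69 total=176562636690
full avg10=19.25 avg60=22.69 avg300=26.82 total=165397148422"""

io = parse_psi(sample)
print(io["full"]["avg10"])  # share of the last 10 s with all non-idle tasks stalled on IO
```

With the values shown here, all non-idle tasks were stalled on IO for roughly a fifth of the time, while CPU and memory pressure are negligible.
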
>
>Currently running 9 tasks:
>- 3 Verify jobs
>- 1 Backup
>- 2 Sync jobs
>- 2 GC Runs
>- 1 Reader
>
>—
>Mark Schouten, CTO
>Tuxis B.V.
>mark at tuxis.nl / +31 318 200208