[pbs-devel] Slow overview of existing backups
devzero at web.de
Fri Mar 10 11:16:05 CET 2023
>> Requesting the available backups from a PBS takes quite a long time.
>> Are there any plans to start implementing caching or an overall
>> index file for a datastore?
> There's already the host system's page cache, which helps a lot as long
> as there's enough memory to avoid its contents being displaced frequently.
> - 2 three-way mirrors of 960GB Samsung PM9A3 NVMes as special devices
Ah ok, I see - you should have fast metadata access because of the special device.
what about freshly booting your backup server and issuing
zpool iostat -rv
after listing the backups and observing the slowness?
With this we can get more insight into where the time is spent, whether
it's really all about metadata access, and whether things are working
well from a filesystem/performance/metadata point of view.
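In case it helps, a rough sketch of that measurement as a script. The repository name "store1" is just a placeholder, the `snapshot list` subcommand assumes a reasonably recent proxmox-backup-client, and the fallback path only exists so it doesn't explode on machines without ZFS:

```python
import shutil
import subprocess
import time

def measure_cold_listing(repo: str = "localhost:store1"):
    """Time a snapshot listing, then dump per-vdev latency histograms.

    Run after a fresh boot so the page cache is cold and metadata reads
    actually have to hit the special vdevs. 'store1' is a placeholder
    repository name.
    """
    if not (shutil.which("proxmox-backup-client") and shutil.which("zpool")):
        print("proxmox-backup-client/zpool not available on this host")
        return None
    start = time.monotonic()
    subprocess.run(
        ["proxmox-backup-client", "snapshot", "list", "--repository", repo],
        check=True,
    )
    elapsed = time.monotonic() - start
    # -r: request-size histograms, -v: per-vdev breakdown. If the special
    # vdevs show most of the read activity, the time is metadata-bound.
    subprocess.run(["zpool", "iostat", "-rv"], check=True)
    return elapsed

if __name__ == "__main__":
    print(measure_cold_listing())
```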
I don't expect issues there anymore, as the special vdev has matured in
the meantime, but you never know - remembering
https://github.com/openzfs/zfs/issues/8130 for example...
If that looks sane from a performance perspective, taking a closer look
at the PBS/indexer level would be good.
On 10.03.23 at 10:09, Mark Schouten wrote:
> Hi all,
> any thought on this?
> Mark Schouten, CTO
> Tuxis B.V.
> mark at tuxis.nl / +31 318 200208
> ------ Original Message ------
> From "Mark Schouten" <mark at tuxis.nl>
> To "Thomas Lamprecht" <t.lamprecht at proxmox.com>; "Proxmox Backup
> Server development discussion" <pbs-devel at lists.proxmox.com>
> Date 1/26/2023 9:03:24 AM
> Subject Re: [pbs-devel] Slow overview of existing backups
>>>> PBS knows when something changed in terms of backups, and thus
>>>> when it’s time to update that index.
>>> PBS is built such that the file system is the source of truth; one can,
>>> e.g., remove stuff there or use the manager CLI, and multiple PBS instances
>>> can also run in parallel, e.g., during an upgrade.
>>> So having a guaranteed in-sync cache is not as trivial as it might seem.
>> You can also remove stuff from /var/lib/mysql/, but then you break
>> it. There is nothing wrong with demanding that your users not touch
>> any files except via the tooling you provide. And the tooling you
>> provide can hint the service to rebuild the index. The same goes for
>> upgrades: you are in charge of them.
>> We also need to regularly run garbage collection, which is a nice
>> moment to update my desired index and check whether it's actually correct.
>> On every backup run, delete, and verify, you can update and check the
>> index. Those are all moments when a user is not actually waiting for it
>> and getting timeouts, refreshing screens, and other annoyances.
>>>> I have the feeling that when you request an overview now, all
>>>> individual backups are checked, which seems suboptimal.
>>> We mostly walk the directory structure and read the (quite small)
>>> files for some info like last verification, but we do not check the
>>> data itself.
>>> Note that using namespaces for separating many backups into multiple
>>> namespaces can help, as a listing then only needs to check the indices
>>> from the requested namespace.
>>> But what backup amounts and sizes are we talking about here?
>> 2x Intel Silver 4114 (10 cores, 20 threads each)
>> 256GB RAM
>> A zpool consisting of:
>> - 17 three-way mirrors of 18TB Western Digital HC550’s, SAS
>> - 2 three-way mirrors of 960GB Samsung PM9A3 NVMes as special devices
>> - 73 datastores
>> - Total of 240T Allocated data
>> Datastore that triggered my question:
>> - 263 Groups
>> - 2325 Snapshots
>> - 60TB In use
>> - Dedup factor of 19.3
>>> How many groups, how many snapshots (per group), how many disks per backup?
>>> And what hardware is hosting that data (CPU, disk, memory)?
>>> How's PSI looking during listing? head /proc/pressure/*
>> root at pbs003:/proc/pressure# head *
>> ==> cpu <==
>> some avg10=0.74 avg60=0.58 avg300=0.21 total=8570917611
>> full avg10=0.00 avg60=0.00 avg300=0.00 total=0
>> ==> io <==
>> some avg10=20.45 avg60=23.93 avg300=27.69 total=176562636690
>> full avg10=19.25 avg60=22.69 avg300=26.82 total=165397148422
>> ==> memory <==
>> some avg10=0.00 avg60=0.00 avg300=0.00 total=67894436
>> full avg10=0.00 avg60=0.00 avg300=0.00 total=66761631
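(Side note on those numbers: the io lines mean tasks were fully stalled on IO for roughly 20-27% of wall time, so the box is clearly IO-bound while those tasks run. A quick helper to turn such lines into numbers, in case someone wants to graph them over time - the function name is made up:)

```python
def parse_psi(text: str) -> dict:
    """Parse 'head /proc/pressure/*'-style output into nested dicts.

    avg10/avg60/avg300 are the share of wall time (in percent) during
    which some/all non-idle tasks were stalled on the resource.
    """
    result, resource = {}, None
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("==>"):            # e.g. "==> io <==" header
            resource = line.split()[1]
            result[resource] = {}
        elif line and resource:
            kind, *fields = line.split()      # e.g. "some avg10=20.45 ..."
            result[resource][kind] = {
                k: float(v) for k, v in (f.split("=") for f in fields)
            }
    return result

sample = """==> io <==
some avg10=20.45 avg60=23.93 avg300=27.69 total=176562636690
full avg10=19.25 avg60=22.69 avg300=26.82 total=165397148422"""
print(parse_psi(sample)["io"]["full"]["avg10"])  # -> 19.25
```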
>> Currently running 9 tasks:
>> - 3 Verify jobs
>> - 1 Backup
>> - 2 Sync jobs
>> - 2 GC Runs
>> - 1 Reader
>> Mark Schouten, CTO
>> Tuxis B.V.
>> mark at tuxis.nl / +31 318 200208
> pbs-devel mailing list
> pbs-devel at lists.proxmox.com