[pve-devel] Discussion of major PBS restore speedup in proxmox-backup-qemu

Fabian Grünbichler f.gruenbichler at proxmox.com
Tue Jun 24 09:28:18 CEST 2025


> Adam Kalisz via pve-devel <pve-devel at lists.proxmox.com> wrote on 23.06.2025 18:10 CEST:
> Hi list,

Hi!

> before I go through all the hoops to submit a patch I wanted to discuss
> the current form of the patch that can be found here:
> 
> https://github.com/NOT-NULL-Makers/proxmox-backup-qemu/commit/e91f09cfd1654010d6205d8330d9cca71358e030
> 
> The speedup process was discussed here:
> 
> https://forum.proxmox.com/threads/abysmally-slow-restore-from-backup.133602/
> 
> The current numbers are:
> 
> With the most recent snapshot of a VM with a 10 GiB system disk and 2x
> 100 GiB disks filled with random data:
> 
> Original as of 1.5.1:
> 10 GiB system:    duration=11.78s,  speed=869.34MB/s
> 100 GiB random 1: duration=412.85s, speed=248.03MB/s
> 100 GiB random 2: duration=422.42s, speed=242.41MB/s
> 
> With the 12-way concurrent fetching:
> 
> 10 GiB system:    duration=2.05s,   speed=4991.99MB/s
> 100 GiB random 1: duration=100.54s, speed=1018.48MB/s
> 100 GiB random 2: duration=100.10s, speed=1022.97MB/s

Those numbers do look good - do you also have CPU usage stats before
and after?
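
For readers following along: the core idea, as I read the linked commit, is
to keep several chunk downloads in flight instead of fetching them one by
one. A minimal sketch of that pattern follows; the fetch_chunk/write_chunk
helpers and the fixed constant are placeholders for illustration, not the
actual code from the patch:

    use anyhow::Result;
    use futures::stream::{self, StreamExt, TryStreamExt};

    // How many chunk downloads are kept in flight at once.
    const CONCURRENT_FETCHES: usize = 12;

    // Placeholder helpers standing in for the real PBS client and image I/O.
    async fn fetch_chunk(_digest: [u8; 32]) -> Result<Vec<u8>> { unimplemented!() }
    async fn write_chunk(_offset: u64, _data: Vec<u8>) -> Result<()> { unimplemented!() }

    async fn restore_chunks(chunks: Vec<(u64, [u8; 32])>) -> Result<()> {
        stream::iter(chunks)
            .map(|(offset, digest)| async move {
                let data = fetch_chunk(digest).await?;
                Ok::<_, anyhow::Error>((offset, data))
            })
            // Up to CONCURRENT_FETCHES downloads run concurrently; each chunk
            // is written out as soon as its download completes.
            .buffer_unordered(CONCURRENT_FETCHES)
            .try_for_each(|(offset, data)| write_chunk(offset, data))
            .await
    }

Whether the sweet spot is 8, 12 or 16 in-flight requests then just becomes
a question of what that constant (or a configurable value) is set to.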

> The hardware on the PVE side:
> 2x Intel Xeon Gold 6244, 1 TB RAM, 2x 100 Gbps Mellanox, 14x Samsung
> 3.8 TB NVMe drives in RAID10 using mdadm/LVM-thin.
> 
> On the PBS side:
> 2x Intel Xeon Gold 6334, 1 TB RAM, 2x 100 Gbps Mellanox, 8x Samsung
> NVMe in a pool of 4 ZFS mirrors with recordsize 1M and lz4 compression.
> 
> Similar or slightly better speeds were achieved on a Hetzner AX52
> (AMD Ryzen 7 7700, 64 GB RAM, 2x 1 TB NVMe in a stripe with
> recordsize 16k) as the PVE host, connected to another Hetzner AX52
> over a 10 Gbps link. The PBS side again uses a plain NVMe ZFS mirror
> with recordsize 1M.
> 
> On bigger servers 16-way concurrency was even better; on smaller
> servers with high-frequency CPUs 8-way concurrency performed better.
> The 12-way concurrency is a compromise. We seem to hit a bottleneck
> somewhere in the realm of the TLS connection and shallow buffers. The
> network on the 100 Gbps servers can sustain up to about 3 GB/s
> (almost 20 Gbps) of traffic in a single TCP connection using mbuffer.
> The storage can keep up with such a speed.

This sounds like it might make sense to make the number of threads
configurable (the second lower count can probably be derived from it?)
to allow high-end systems to make the most of it, without overloading
smaller setups. Or maybe deriving it from the host CPU count would
also work?
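
Something along these lines, purely as a sketch (the function name and the
clamp bounds are made up here, not taken from the patch):

    use std::thread;

    // Derive the default restore fetch concurrency from the host CPU count,
    // clamped so small machines are not overloaded and big machines do not
    // open an excessive number of parallel requests.
    fn default_fetch_concurrency() -> usize {
        thread::available_parallelism()
            .map(|n| n.get())
            .unwrap_or(4)
            .clamp(4, 16)
    }

An explicit override (for example via a restore parameter or environment
variable) could then still take precedence over the derived default.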

> Before I submit the patch, I would also like to do the most up to date
> build but I have trouble updating my build environment to reflect the
> latest commits. What do I have to put in my /etc/apt/sources.list to be
> able to install e.g. librust-cbindgen-0.27+default-dev librust-http-
> body-util-0.1+default-dev librust-hyper-1+default-dev and all the rest?

We are currently in the process of rebasing all our repositories on top
of the upcoming Debian Trixie release. The built packages are not yet
available for public testing, so you'd either need to wait a bit (in the
order of a few weeks at most), or submit the patches for the current
stable Bookworm-based version and let us forward port them.
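
For reference, the Bookworm-based build dependencies normally come from the
devel repository, i.e. a sources.list entry along these lines (from memory,
please double-check against the developer documentation):

    deb http://download.proxmox.com/debian/devel/ bookworm main

The librust-hyper-1 / librust-http-body-util packages you list belong to
the Trixie rebase, so they presumably won't be installable until those
repositories are public.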

> This work was sponsored by ČMIS s.r.o. and consulted with the General
> Manager Václav Svátek (ČMIS), Daniel Škarda (NOT NULL Makers s.r.o.)
> and Linux team leader Roman Müller (ČMIS).

Nice! Looking forward to the "official" patch submission!
Fabian



