[pbs-devel] [PATCH v2 proxmox-backup] partial fix #5560: client: periodically show backup progress

Thomas Lamprecht t.lamprecht at proxmox.com
Thu Oct 10 16:45:43 CEST 2024


Am 09/10/2024 um 11:20 schrieb Christian Ebner:
> Spawn a new tokio task which about every minute displays the
> cumulative progress of the backup for pxar, ppxar or img archive
> streams. Catalog and metadata archive streams are excluded from the
> output for better readability, and because the catalog upload lives
> for the whole upload time, leading to possible temporal
> misalignments in the output. The actual payload data is written via
> the other streams anyway.
> 
> Add accounting for uploaded chunks, to distinguish from chunks queued
> for upload, but not actually uploaded yet.
> 
> Example output in the backup task log:
> ```
> ...
> INFO:  root.pxar: elapsed 60.00 s, new: 191.446 MiB, reused: 0 B, total: 191.446 MiB, uploaded: 13.021 MiB (compressed 5.327 MiB, average: 222.221 KiB/s)
> INFO:  root.pxar: elapsed 120.00 s, new: 191.446 MiB, reused: 0 B, total: 191.446 MiB, uploaded: 27.068 MiB (compressed 11.583 MiB, average: 230.977 KiB/s)
> INFO:  root.pxar: elapsed 180.00 s, new: 191.446 MiB, reused: 0 B, total: 191.446 MiB, uploaded: 36.138 MiB (compressed 14.987 MiB, average: 205.58 KiB/s)

Thx for tackling this, but I'm rather nitpicky about the formatting of
progress reports, so quite a bit of commentary w.r.t. that:

I'm not a total fan of those averaged bandwidth indicators, as they often
suggest a slow tool (or uplink) if not much new data has to be sent.
If anything, it might make a bit more sense to print the bandwidth of the
total processed data?

Printing the elapsed time just in seconds can be rather unwieldy for longer
running operations, e.g. "elapsed 32280 s, ..." for "8 h 58 m" is not so
easy to parse. A HumanDuration which renders to something like, for example,
"1w 2d 3h 4m 5.67s" could be nicer here (parts that are 0 simply omitted),
but even just a local fn that handles this up to hour range would be a lot
better.
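
A minimal sketch of such a local fn, assuming whole-second precision is
enough for a once-per-minute report. The name `format_elapsed` is
hypothetical, and this is not an existing HumanDuration type, just an
illustration of omitting zero parts up to the hour range:

```rust
use std::time::Duration;

/// Render a duration as e.g. "8h 58m" or "1h 1m 1s", omitting parts that
/// are zero. Hypothetical helper: handles only up to the hour range and
/// drops sub-second precision.
fn format_elapsed(elapsed: Duration) -> String {
    let total = elapsed.as_secs();
    let hours = total / 3600;
    let minutes = (total % 3600) / 60;
    let seconds = total % 60;

    let mut parts = Vec::new();
    if hours > 0 {
        parts.push(format!("{hours}h"));
    }
    if minutes > 0 {
        parts.push(format!("{minutes}m"));
    }
    // Always print the seconds if nothing else would be shown.
    if seconds > 0 || parts.is_empty() {
        parts.push(format!("{seconds}s"));
    }
    parts.join(" ")
}
```

With this, the 32280 s from the example above would render as "8h 58m".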

And I see some confusion potential with "new", as in: is it new since the
last status report output, or total new data compared to the previous
snapshot?

Is "total" the amount of read data here? As that might be one of the better
indicators, i.e. if I (roughly) know that the directory I back up holds
10 GB of data and the client reports that it read 8.7 GB, that would be
helpful for me, even if it's naturally not guaranteed to progress linearly.

Potentially also just report the compressed amount for "uploaded", as
that's what really got uploaded?

As is, the format seems to benefit devs and technical users the most;
for the ordinary user it might be a bit much.

Maybe reduce this to something like:

processed X data in T (optionally: processing-rate) uploaded Y

where X is the total processed data, T is the elapsed time and Y is the
amount of data that actually had to be sent over the network link.
Just as a more actionable idea, there might be better variants.
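
To make the reduced format a bit more concrete, here is a rough sketch
of how such a line could be assembled, with the rate computed over the
processed data rather than the uploaded data. Both `human_bytes` and
`progress_line` are hypothetical names; `human_bytes` is only a stand-in
for whatever byte formatting the client already uses, and the elapsed
time is kept to plain minutes and seconds for brevity:

```rust
use std::time::Duration;

/// Format a byte count with binary units (hypothetical stand-in, not the
/// project's own byte formatting).
fn human_bytes(bytes: u64) -> String {
    const UNITS: [&str; 5] = ["B", "KiB", "MiB", "GiB", "TiB"];
    let mut value = bytes as f64;
    let mut unit = 0;
    while value >= 1024.0 && unit < UNITS.len() - 1 {
        value /= 1024.0;
        unit += 1;
    }
    if unit == 0 {
        format!("{bytes} B")
    } else {
        format!("{value:.3} {}", UNITS[unit])
    }
}

/// Build "processed X in T (rate), uploaded Y".
/// `processed` is all data handled so far, `uploaded` is what actually
/// went over the network link (assumed to be the compressed size).
fn progress_line(processed: u64, uploaded: u64, elapsed: Duration) -> String {
    let secs = elapsed.as_secs();
    // Processing rate, so a mostly deduplicated backup does not look slow.
    let rate = if secs > 0 { processed / secs } else { 0 };
    format!(
        "processed {} in {}m {}s ({}/s), uploaded {}",
        human_bytes(processed),
        secs / 60,
        secs % 60,
        human_bytes(rate),
        human_bytes(uploaded)
    )
}
```

For 4 GiB processed and 512 MiB uploaded after 64 seconds this yields
"processed 4.000 GiB in 1m 4s (64.000 MiB/s), uploaded 512.000 MiB".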

FWIW: We could still add a more detailed reporting mode enabled through
some CLI option later.



