[pve-devel] [PATCH pve-qemu] fix: 3865: backup restore human readable output

Fri Mar 25 12:05:28 CET 2022

(re-sending, seems you forgot to hit Reply-All and only sent it to me directly).

On 25.03.22 10:41, Daniel Tschlatscher wrote:
> On 3/25/22 09:29, Thomas Lamprecht wrote:
>> Thanks for the patch, some comments inline.
>> On 24.03.22 16:44, Daniel Tschlatscher wrote:

>> please format the commit message according our submission rules:
>> https://pve.proxmox.com/wiki/Developer_Documentation#Commits_and_Commit_Messages
>>
>> an output example (maybe even an before/after) excerpt would be nice to have here.
>
> Just leave out the ":" after "fix", right? Or is there something else I should change?

I actually meant the per-line character length, quoting:
> Make sure the line length of the commit's message is not longer than 70 characters.
> Note, HTTPS links are an exception and should not be split.

but yeah, now that you say it, it really should be `fix #3865: ...`

>>>   +typedef struct VmaRestoreState {
>>>   +    BlockBackend *target;
>>>   +    bool write_zeroes;
>>> @@ -649,13 +653,31 @@ index 0000000000..2b1d1cdab3
>>>   +
>>>   +        if (verbose) {
>>>   +            time_t duration = time(NULL) - vmar->start_time;
>> The resulting accuracy for time in seconds could be not that good, did you make any
>> thoughts or comparison regarding that.
> I "built" upon what was here. This is the way that seconds passed were measured in the original code and works well enough in this case. time_t has some functionality for tracking milliseconds though but I don't think that much fine control over the time elapsed has any real benefits here.

sorry but what? Example we have a byte delta of 100 MiB since last report, and nothing
guarantees that we're currently very close to a second boundary, so lets see
how the error compares comparing second resolution to some millisecond ones.

Using second granularity anything between 1 <= t < 2 gets you 100 MiB/s
On the other hand:

                    "actual"     "1s cut off"   "error"
100 MiB / 1.000s = 100.0 MiB/s   100.0 MiB/s      0.0
100 MiB / 1.050s =  95.2 MiB/s   100.0 MiB/s      4.8
100 MiB / 1.123s =  89.0 MiB/s   100.0 MiB/s     11.0
100 MiB / 1.130s =  88.5 MiB/s   100.0 MiB/s     11.5
100 MiB / 1.500s =  66.7 MiB/s   100.0 MiB/s     33.3
100 MiB / 1.750s =  57.1 MiB/s   100.0 MiB/s     42.9
100 MiB / 1.999s =  50.0 MiB/s   100.0 MiB/s     50.0

So even just 7 ms (1.123s vs 1.13s) make a 0.5 MiB/s difference, IMO already relevant.

Wouldn't dare to call that "working well enough", or are we talking about
different things?

>>> ++
>>> ++                if (delta != 0)
>>> ++                    bps /= delta;
>> currently, if delta would be null you'd print a size unit as throughput unit?
>>
> This is due to the underlying code. The function that is wrapping this part of the code is called multiple times every second (something like ~20 times / sec) and in the original code a print statement would only occur after the percentage of bytes transferred increased by at least 1%.> 
> The time elapsed between an increase in percent could be less than a second. In this case the output simply prints how much bytes were transferred since this second began. This means the value for throughput would at the time of the print be always lower than the real world B/s and only the last print in that second would represent the actual B/s.

This just doesn't makes sense to me, the if clauses enforces that its called at
max every 2s (or 100% edge case) already, and using wrong units is using wrong
units, you cannot really argue basic math/physics away..

>
> Nonetheless, I later adapted this code to behave like the backup create function does, which updates every 3 seconds, eliminating this problem entirely. This is therefore dead code anyway.
>

later where?

Actual improvement, track duration in milliseconds or less and still catch the edge
case for when delta is zero and just set throughput to zero then (as no data can
transfer in zero time).

>> would IMO be worth it to factor above lines out in a static local helper to avoid
>> crowding this function to much, e.g. (types may be adapted if reasonable):
>>
>> print_restore_progress(uint64_t total_byte, uint64_t restored_byte, uint64_t duration_ms);
>>
>> I would handle the last_X statics in there and also re-calculate the percentage as float
>> in there, so that we can print a similar amount of digits after the decimal place like
>> we do in backup.
>
> A function sounds good, though I am not quite sure what you mean by "print a similar amount of digits after the decimal place like we do in backup"

Hmm, I thought percentage isn't a integer there, so meant something like 5.1%,
but apparently it is a plain integer (may just have been in an experiment done
once I rewrote that whole thing a bit ago), so for now you can ignore that.