[pbs-devel] [PATCH proxmox-backup v2 1/2] partial fix #3701: sync/pull: add transfer-last parameter

Fabian Grünbichler f.gruenbichler at proxmox.com
Tue Jan 17 13:16:20 CET 2023


On January 11, 2023 3:52 pm, Stefan Hanreich wrote:
> Specifying the transfer-last parameter limits the number of backups
> that get synced via the pull command/sync job. The parameter specifies
> how many of the N latest backups should get pulled/synced. All other
> backups get skipped.
> 
> This is particularly useful in situations where the sync target has
> less disk space than the source. Syncing all backups from the source
> is not possible if there is not enough disk space on the target.
> Additionally, this can be used to limit the amount of data
> transferred, reducing load on the network.
> 
> Signed-off-by: Stefan Hanreich <s.hanreich at proxmox.com>
> ---
> I had to make slight adjustments to Wolfgang's proposed condition because it
> wouldn't work in cases where transfer-last was greater than the total number of
> backups available.
> Nevertheless, the condition should now be a lot less obtuse and easier to read.

see below for a suggestion for further improvement

one high-level comment: I am not sure about the interaction between re-syncing
the last synced snapshot and transfer-last. I'd prefer the last synced snapshot
to always be re-synced (the main purpose is to pick up the backup log/notes in
case backup and pull/sync align just right, so not much data should be
transferred), which is not happening at the moment - but that is trivially
implemented, it just needs an additional condition in the transfer-last check.
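
e.g. something along these lines on top of this patch (rough, untested sketch -
I am assuming `last_sync` still holds the Option with the timestamp of the last
locally synced snapshot at that point in pull_group, adapt if it is named
differently):

// skip because of transfer-last, but never skip the snapshot we'd want to
// re-sync anyway (to pick up log/notes)
if pos < (total_amount - transfer_amount) && last_sync != Some(snapshot.time) {
    skip_info.update(snapshot.time);
    continue;
}

(or with the `cutoff` variant suggested further down, which avoids the
subtraction here)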

> 
>  pbs-api-types/src/jobs.rs         | 11 +++++++++++
>  src/api2/config/sync.rs           |  9 +++++++++
>  src/api2/pull.rs                  | 10 +++++++++-
>  src/bin/proxmox-backup-manager.rs | 11 ++++++++++-
>  src/server/pull.rs                | 17 ++++++++++++++++-
>  5 files changed, 55 insertions(+), 3 deletions(-)
> 

[..]

> diff --git a/src/server/pull.rs b/src/server/pull.rs
> index 65eedf2c..81f4faf3 100644
> --- a/src/server/pull.rs
> +++ b/src/server/pull.rs
> @@ -1,5 +1,6 @@
>  //! Sync datastore from remote server
>  
> +use std::cmp::min;
>  use std::collections::{HashMap, HashSet};
>  use std::io::{Seek, SeekFrom};
>  use std::sync::atomic::{AtomicUsize, Ordering};
> @@ -59,6 +60,8 @@ pub(crate) struct PullParameters {
>      group_filter: Option<Vec<GroupFilter>>,
>      /// Rate limits for all transfers from `remote`
>      limit: RateLimitConfig,
> +    /// How many snapshots should be transferred at most (taking the newest N snapshots)
> +    transfer_last: Option<usize>,
>  }
>  
>  impl PullParameters {
> @@ -78,6 +81,7 @@ impl PullParameters {
>          max_depth: Option<usize>,
>          group_filter: Option<Vec<GroupFilter>>,
>          limit: RateLimitConfig,
> +        transfer_last: Option<usize>,
>      ) -> Result<Self, Error> {
>          let store = DataStore::lookup_datastore(store, Some(Operation::Write))?;
>  
> @@ -109,6 +113,7 @@ impl PullParameters {
>              max_depth,
>              group_filter,
>              limit,
> +            transfer_last,
>          })
>      }
>  
> @@ -573,7 +578,7 @@ impl std::fmt::Display for SkipInfo {
>      fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
>          write!(
>              f,
> -            "skipped: {} snapshot(s) ({}) older than the newest local snapshot",
> +            "skipped: {} snapshot(s) ({}) - older than the newest local snapshot or excluded by transfer-last",

it would possibly be nicer to store a "reason" in SkipInfo, and count the
"already synced" and "excluded by transfer-last" snapshots separately (if only
so that an admin can verify that transfer-last has the desired effect).
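
just as a rough sketch of what I mean (not compile-tested, SkipReason is a
made-up name and I am only guessing at the existing SkipInfo fields):

enum SkipReason {
    AlreadySynced,
    TransferLast,
}

struct SkipInfo {
    // existing fields (guessed from the diff context)
    oldest: i64,
    newest: i64,
    count: u64,
    // new field
    skip_reason: SkipReason,
}

impl std::fmt::Display for SkipInfo {
    fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
        let reason = match self.skip_reason {
            SkipReason::AlreadySynced => "older than the newest local snapshot",
            SkipReason::TransferLast => "excluded by transfer-last",
        };
        write!(
            f,
            "skipped: {} snapshot(s) ({}) - {}",
            self.count,
            self.affected().map_err(|_| std::fmt::Error)?,
            reason
        )
    }
}

pull_group could then keep two SkipInfo instances (one per reason) and only log
the ones with a non-zero count.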

>              self.count,
>              self.affected().map_err(|_| std::fmt::Error)?
>          )
> @@ -646,6 +651,11 @@ async fn pull_group(
>          count: 0,
>      };
>  
> +    let total_amount = list.len();
> +
> +    let mut transfer_amount = params.transfer_last.unwrap_or(total_amount);
> +    transfer_amount = min(transfer_amount, total_amount);

I'd prefer this to just calculate the cutoff for the check below, e.g.

let XXX = params
        .transfer_last
        .map(|count| total_amount.saturating_sub(count))
        .unwrap_or_default();

(or an equivalent match construct, or some variant of your unwrap+min construct
- doesn't really matter as long as it treats underflows correctly)
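
for reference, the match variant would look something like this:

let cutoff = match params.transfer_last {
    Some(count) => total_amount.saturating_sub(count),
    None => 0,
};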

> +
>      for (pos, item) in list.into_iter().enumerate() {
>          let snapshot = item.backup;
>  
> @@ -668,6 +678,11 @@ async fn pull_group(
>              }
>          }
>  
> +        if pos < (total_amount - transfer_amount) {

because this screams "underflow" to me (even though your variant does check for
it above, more code might get added in-between, and once the check is no longer
on the same screen this adds cognitive overhead ;)), whereas

if pos < cutoff {
    ..
}

makes it clear that there is no chance of a bad underflow occurring *here*

> +            skip_info.update(snapshot.time);
> +            continue;
> +        }
> +
>          // get updated auth_info (new tickets)
>          let auth_info = client.login().await?;


one more related thing that might have room for improvement:

the progress info counts snapshots skipped because of transfer-last as "done", but
the order of logging is:
- print re-sync (if applicable)
- print pulled snapshot progress
- print info about skipped snapshots (if applicable)

it might be better (with SkipInfo split) to use this order:
- print info about snapshots skipped because they are < last_sync
- print re-sync (see below though)
- print info about snapshots skipped because of transfer-last
- print pulled snapshot progress ("done" can now include skipped snapshots
  without confusion)

e.g., here is the status quo (one snapshot already synced, transfer-last = 2, a
total of 8 remote snapshots):

2023-01-17T12:49:46+01:00: percentage done: 33.33% (1/3 groups)
2023-01-17T12:49:46+01:00: skipped: 2 snapshot(s) (2022-12-13T11:09:43Z .. 2022-12-16T09:44:34Z) - older than the newest local snapshot or excluded by transfer-last
2023-01-17T12:49:46+01:00: sync snapshot vm/104/2023-01-16T13:09:55Z
2023-01-17T12:49:46+01:00: sync archive qemu-server.conf.blob
2023-01-17T12:49:46+01:00: sync archive drive-scsi0.img.fidx
2023-01-17T12:49:46+01:00: downloaded 0 bytes (0.00 MiB/s)
2023-01-17T12:49:46+01:00: got backup log file "client.log.blob"
2023-01-17T12:49:46+01:00: sync snapshot vm/104/2023-01-16T13:09:55Z done
2023-01-17T12:49:46+01:00: percentage done: 62.50% (1/3 groups, 7/8 snapshots in group #2)
2023-01-17T12:49:46+01:00: sync snapshot vm/104/2023-01-16T13:24:58Z
2023-01-17T12:49:46+01:00: sync archive qemu-server.conf.blob
2023-01-17T12:49:46+01:00: sync archive drive-scsi0.img.fidx
2023-01-17T12:49:46+01:00: downloaded 0 bytes (0.00 MiB/s)
2023-01-17T12:49:46+01:00: got backup log file "client.log.blob"
2023-01-17T12:49:46+01:00: sync snapshot vm/104/2023-01-16T13:24:58Z done
2023-01-17T12:49:46+01:00: percentage done: 66.67% (2/3 groups)
2023-01-17T12:49:46+01:00: skipped: 6 snapshot(s) (2022-12-16T09:25:03Z .. 2023-01-12T09:43:15Z) - older than the newest local snapshot or excluded by transfer-last


if we revamp the progress display, it might also make sense to improve the
output for groups that are already 100% synced:

2023-01-17T12:49:46+01:00: re-sync snapshot vm/800/2023-01-16T12:28:10Z
2023-01-17T12:49:46+01:00: no data changes
2023-01-17T12:49:46+01:00: re-sync snapshot vm/800/2023-01-16T12:28:10Z done
2023-01-17T12:49:46+01:00: percentage done: 100.00% (3/3 groups)

IMHO we could remove the "re-sync .. done" line (the next line will be a
progress line anyway, either for this snapshot or the whole group!), and instead
add an opening line ("sync group XXX..") that helps when scanning the log.




