[pbs-devel] applied: [PATCH v4 proxmox-backup] fix #5710: api: backup: stat known chunks on backup finish

Fabian Grünbichler f.gruenbichler at proxmox.com
Fri Nov 22 10:36:19 CET 2024


Quoting Christian Ebner (2024-10-08 11:46:17)
> Known chunks are expected to be present on the datastore a priori,
> allowing clients to only re-index these chunks without uploading the
> raw chunk data. The list of reusable known chunks is sent to the
> client by the server, deduced from the indexed chunks of the previous
> backup snapshot of the group.
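> 
> For illustration only (not the actual client code), the per-chunk
> decision on the client side boils down to something like the
> following sketch, assuming the server-provided digests are collected
> in a HashSet:
> 
>     use std::collections::HashSet;
> 
>     enum ChunkAction {
>         /// chunk is expected to be on the datastore already, only
>         /// reference it in the new index
>         Reindex([u8; 32]),
>         /// new chunk, the raw data has to be uploaded
>         Upload([u8; 32], Vec<u8>),
>     }
> 
>     // hypothetical helper, names do not match the real client code
>     fn handle_chunk(
>         known_digests: &HashSet<[u8; 32]>,
>         digest: [u8; 32],
>         data: &[u8],
>     ) -> ChunkAction {
>         if known_digests.contains(&digest) {
>             ChunkAction::Reindex(digest)
>         } else {
>             ChunkAction::Upload(digest, data.to_vec())
>         }
>     }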
> 
> If, however, such a known chunk has disappeared (the previous backup
> snapshot having been verified before that happened, or not verified
> just yet), the backup will finish just fine, leading to a seemingly
> successful backup. Only a subsequent verification job will detect the
> backup snapshot as corrupt.
> 
> In order to reduce the impact, stat the list of previously known
> chunks when finishing the backup. If a missing chunk is detected, the
> backup run itself will fail and the previous backup snapshot's verify
> state is set to failed.
> This prevents the same snapshot from being reused by a subsequent
> backup job.
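> 
> A simplified, hypothetical sketch of that prevention (the real server
> code differs in detail): when the next backup of the group starts,
> the previous snapshot only qualifies as a base for known chunks if
> its manifest does not carry a failed verify state, assuming
> VerifyState::Failed serializes as "failed" in the unprotected
> manifest section:
> 
>     fn may_reuse_previous_snapshot(manifest: &serde_json::Value) -> bool {
>         // verify_state lives in the unprotected part of the manifest
>         match manifest["unprotected"].get("verify_state") {
>             // snapshots marked as failed must not provide known chunks
>             Some(state) => state["state"].as_str() != Some("failed"),
>             // never verified snapshots are still considered usable
>             None => true,
>         }
>     }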
> 
> Note:
> The current backup run might actually be fine, if the now missing
> known chunk is not indexed by it. But since there is no
> straightforward way to detect which known chunks have not been reused
> in the fast incremental mode for fixed index backups, the backup run
> is considered failed.
> 
> link to issue in bugtracker:
> https://bugzilla.proxmox.com/show_bug.cgi?id=5710
> 
> Signed-off-by: Christian Ebner <c.ebner at proxmox.com>
> Tested-by: Gabriel Goller <g.goller at proxmox.com>
> Reviewed-by: Gabriel Goller <g.goller at proxmox.com>
> ---
> Changes since version 3, thanks to Gabriel for additional comments:
> - Use anyhow error context also for manifest update error
> - Use `with_context` over mapping the error, which is more concise
> 
> Changes since version 2, thanks to Gabriel for testing and review:
> - Use and display anyhow error context
> - s/backp/backup/
> 
> Changes since version 1, thanks to Dietmar and Gabriel for feedback:
> - Only stat on backup finish
> - Distinguish newly uploaded from previously known chunks, to be able
>   to only stat the latter.
> 
> New tests on my side show a performance degradation of ~2% for the VM
> backup and about ~10% for the LXC backup as compared to an unpatched
> server.
> In contrast to version 1 of the patches, the PBS datastore this time
> was located on an NFS share backed by an NVMe SSD.
> 
> I performed vzdump backups of a VM with a 32G disk attached and an
> LXC container with a Debian install and a rootfs of ca. 400M (both
> powered off, no changes in data between backup runs).
> Again, 5 runs each were performed after an initial run to ensure full
> chunk presence on the server and a valid previous snapshot.
> 
> Here the updated figures:
> 
> -----------------------------------------------------------
> patched                    | unpatched
> -----------------------------------------------------------
> VM           | LXC         | VM           | LXC
> -----------------------------------------------------------
> 14.0s ± 0.8s | 2.2s ± 0.1s | 13.7s ± 0.5s | 2.0s ± 0.03s
> -----------------------------------------------------------
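> 
> (The ± values are read here as the sample standard deviation over the
> five measured run times; a minimal sketch of that computation, purely
> for clarity:)
> 
>     fn mean_and_std(samples: &[f64]) -> (f64, f64) {
>         let n = samples.len() as f64;
>         let mean = samples.iter().sum::<f64>() / n;
>         // sample variance (n - 1 in the denominator) over the runs
>         let var = samples.iter().map(|s| (s - mean).powi(2)).sum::<f64>() / (n - 1.0);
>         (mean, var.sqrt())
>     }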
> 
>  src/api2/backup/environment.rs | 54 +++++++++++++++++++++++++++++-----
>  src/api2/backup/mod.rs         | 22 +++++++++++++-
>  2 files changed, 68 insertions(+), 8 deletions(-)
> 
> diff --git a/src/api2/backup/environment.rs b/src/api2/backup/environment.rs
> index 99d885e2e..19624fae3 100644
> --- a/src/api2/backup/environment.rs
> +++ b/src/api2/backup/environment.rs
> @@ -1,4 +1,4 @@
> -use anyhow::{bail, format_err, Error};
> +use anyhow::{bail, format_err, Context, Error};
>  use nix::dir::Dir;
>  use std::collections::HashMap;
>  use std::sync::{Arc, Mutex};
> @@ -72,8 +72,14 @@ struct FixedWriterState {
>      incremental: bool,
>  }
>  
> -// key=digest, value=length
> -type KnownChunksMap = HashMap<[u8; 32], u32>;
> +#[derive(Copy, Clone)]
> +struct KnownChunkInfo {
> +    uploaded: bool,
> +    length: u32,
> +}
> +
> +// key=digest, value=KnownChunkInfo
> +type KnownChunksMap = HashMap<[u8; 32], KnownChunkInfo>;
>  
>  struct SharedBackupState {
>      finished: bool,
> @@ -159,7 +165,13 @@ impl BackupEnvironment {
>  
>          state.ensure_unfinished()?;
>  
> -        state.known_chunks.insert(digest, length);
> +        state.known_chunks.insert(
> +            digest,
> +            KnownChunkInfo {
> +                uploaded: false,
> +                length,
> +            },
> +        );
>  
>          Ok(())
>      }
> @@ -213,7 +225,13 @@ impl BackupEnvironment {
>          }
>  
>          // register chunk
> -        state.known_chunks.insert(digest, size);
> +        state.known_chunks.insert(
> +            digest,
> +            KnownChunkInfo {
> +                uploaded: true,
> +                length: size,
> +            },
> +        );
>  
>          Ok(())
>      }
> @@ -248,7 +266,13 @@ impl BackupEnvironment {
>          }
>  
>          // register chunk
> -        state.known_chunks.insert(digest, size);
> +        state.known_chunks.insert(
> +            digest,
> +            KnownChunkInfo {
> +                uploaded: true,
> +                length: size,
> +            },
> +        );
>  
>          Ok(())
>      }
> @@ -256,7 +280,23 @@ impl BackupEnvironment {
>      pub fn lookup_chunk(&self, digest: &[u8; 32]) -> Option<u32> {
>          let state = self.state.lock().unwrap();
>  
> -        state.known_chunks.get(digest).copied()
> +        state
> +            .known_chunks
> +            .get(digest)
> +            .map(|known_chunk_info| known_chunk_info.length)
> +    }
> +
> +    /// stat known chunks from previous backup, so excluding newly uploaded ones
> +    pub fn stat_prev_known_chunks(&self) -> Result<(), Error> {
> +        let state = self.state.lock().unwrap();
> +        for (digest, known_chunk_info) in &state.known_chunks {
> +            if !known_chunk_info.uploaded {
> +                self.datastore
> +                    .stat_chunk(digest)
> +                    .with_context(|| format!("stat failed on {}", hex::encode(digest)))?;
> +            }
> +        }
> +        Ok(())
>      }
>  
>      /// Store the writer with an unique ID
> diff --git a/src/api2/backup/mod.rs b/src/api2/backup/mod.rs
> index ea0d0292e..63c49f653 100644
> --- a/src/api2/backup/mod.rs
> +++ b/src/api2/backup/mod.rs
> @@ -1,6 +1,6 @@
>  //! Backup protocol (HTTP2 upgrade)
>  
> -use anyhow::{bail, format_err, Error};
> +use anyhow::{bail, format_err, Context, Error};
>  use futures::*;
>  use hex::FromHex;
>  use hyper::header::{HeaderValue, CONNECTION, UPGRADE};
> @@ -785,6 +785,26 @@ fn finish_backup(
>  ) -> Result<Value, Error> {
>      let env: &BackupEnvironment = rpcenv.as_ref();
>  
> +    if let Err(err) = env.stat_prev_known_chunks() {
> +        env.debug(format!("stat registered chunks failed - {err:?}"));
> +
> +        if let Some(last) = env.last_backup.as_ref() {
> +            // No need to acquire snapshot lock, already locked when starting the backup
> +            let verify_state = SnapshotVerifyState {
> +                state: VerifyState::Failed,
> +                upid: env.worker.upid().clone(), // backup writer UPID
> +            };
> +            let verify_state = serde_json::to_value(verify_state)?;
> +            last.backup_dir
> +                .update_manifest(|manifest| {
> +                    manifest.unprotected["verify_state"] = verify_state;
> +                })
> +                .with_context(|| "manifest update failed")?;
> +        }
> +
> +        bail!("stat known chunks failed - {err:?}");
> +    }
> +
>      env.finish_backup()?;
>      env.log("successfully finished backup");
>  
> -- 
> 2.39.5
> 
> 
> 



