[pbs-devel] applied: [PATCH v4 proxmox-backup] fix #5710: api: backup: stat known chunks on backup finish
Fabian Grünbichler
f.gruenbichler at proxmox.com
Fri Nov 22 10:36:19 CET 2024
Quoting Christian Ebner (2024-10-08 11:46:17)
> Known chunks are expected to be present on the datastore a priori,
> allowing clients to only re-index these chunks without uploading the
> raw chunk data. The list of reusable known chunks is sent to the
> client by the server, deduced from the chunks indexed by the previous
> backup snapshot of the group.
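>
> For illustration, here is a minimal sketch of the per-chunk decision a
> client makes based on that list (hypothetical helper names, not the
> actual client code):
>
>     use std::collections::HashSet;
>
>     // Hypothetical stand-ins for the real writer calls:
>     fn reference_chunk(_digest: [u8; 32]) { /* append digest to the new index */ }
>     fn upload_chunk(_digest: [u8; 32], _data: &[u8]) { /* send the raw chunk data */ }
>
>     /// Reuse the chunk if the server listed its digest as known,
>     /// otherwise upload it.
>     fn write_chunk(known: &HashSet<[u8; 32]>, digest: [u8; 32], data: &[u8]) {
>         if known.contains(&digest) {
>             reference_chunk(digest); // no data transfer, chunk assumed present
>         } else {
>             upload_chunk(digest, data);
>         }
>     }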
>
> If, however, such a known chunk has disappeared (the previous backup
> snapshot having been verified before the chunk went missing, or not
> verified just yet), the backup will finish just fine and appear
> successful. Only a subsequent verification job will detect the backup
> snapshot as being corrupt.
>
> In order to reduce the impact, stat the list of previously known
> chunks when finishing the backup. If a missing chunk is detected, the
> backup run itself will fail and the previous backup snapshot's verify
> state is set to failed.
> This prevents the same snapshot from being reused by a subsequent
> backup job.
>
> Note:
> The current backup run might have been just fine if the now-missing
> known chunk was not indexed by it. But since there is no
> straightforward way to detect which known chunks have not been reused
> in the fast incremental mode for fixed index backups, the backup run
> is considered failed.
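>
> To make that concrete, the stat essentially boils down to the
> following (an illustrative sketch, assuming the usual
> .chunks/<first 4 hex digits>/<full hex digest> on-disk layout; only
> the name stat_chunk mirrors the actual code in the diff below):
>
>     use std::{fs, io, path::Path};
>
>     /// Stat a single chunk file below the datastore base directory.
>     /// Returns Err (e.g. ENOENT) if the chunk has vanished.
>     fn stat_chunk(base: &Path, digest: &[u8; 32]) -> io::Result<fs::Metadata> {
>         let hex = hex::encode(digest);
>         let path = base.join(".chunks").join(&hex[..4]).join(&hex);
>         fs::metadata(path)
>     }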
>
> link to issue in bugtracker:
> https://bugzilla.proxmox.com/show_bug.cgi?id=5710
>
> Signed-off-by: Christian Ebner <c.ebner at proxmox.com>
> Tested-by: Gabriel Goller <g.goller at proxmox.com>
> Reviewed-by: Gabriel Goller <g.goller at proxmox.com>
> ---
> Changes since version 3, thanks to Gabriel for additional comments:
> - Use anyhow error context also for manifest update error
> - Use `with_context` over mapping the error, which is more concise (see the sketch below)
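>
> For reference, the difference amounts to the following (illustrative
> example, not taken from the patch):
>
>     use anyhow::{Context, Result};
>
>     fn load(path: &str) -> Result<String> {
>         // mapping: .map_err(|err| format_err!("reading {path} failed - {err}"))
>         // with_context is shorter and keeps the source error in anyhow's chain:
>         std::fs::read_to_string(path)
>             .with_context(|| format!("reading {path} failed"))
>     }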
>
> Changes since version 2, thanks to Gabriel for testing and review:
> - Use and display anyhow error context
> - s/backp/backup/
>
> Changes since version 1, thanks to Dietmar and Gabriel for feedback:
> - Only stat on backup finish
> - Distinguish newly uploaded from previously known chunks, to be able
> to only stat the latter.
>
> New tests on my side show a performance degradation of ~2% for the VM
> backup and ~10% for the LXC backup as compared to an unpatched
> server.
> In contrast to version 1 of the patches, the PBS datastore this time
> was located on an NFS share backed by an NVMe SSD.
>
> I performed vzdump backups of a VM with a 32G disk attached and an
> LXC container with a Debian install and a rootfs of ca. 400M (both
> powered off, no changes to the data between backup runs).
> Again, I performed 5 runs each after an initial run to ensure full
> chunk presence on the server and a valid previous snapshot.
>
> Here are the updated figures:
>
> -----------------------------------------------------------
>            patched            |          unpatched
> -----------------------------------------------------------
>       VM      |     LXC      |      VM      |     LXC
> -----------------------------------------------------------
>  14.0s ± 0.8s | 2.2s ± 0.1s  | 13.7s ± 0.5s | 2.0s ± 0.03s
> -----------------------------------------------------------
>
> src/api2/backup/environment.rs | 54 +++++++++++++++++++++++++++++-----
> src/api2/backup/mod.rs | 22 +++++++++++++-
> 2 files changed, 68 insertions(+), 8 deletions(-)
>
> diff --git a/src/api2/backup/environment.rs b/src/api2/backup/environment.rs
> index 99d885e2e..19624fae3 100644
> --- a/src/api2/backup/environment.rs
> +++ b/src/api2/backup/environment.rs
> @@ -1,4 +1,4 @@
> -use anyhow::{bail, format_err, Error};
> +use anyhow::{bail, format_err, Context, Error};
> use nix::dir::Dir;
> use std::collections::HashMap;
> use std::sync::{Arc, Mutex};
> @@ -72,8 +72,14 @@ struct FixedWriterState {
> incremental: bool,
> }
>
> -// key=digest, value=length
> -type KnownChunksMap = HashMap<[u8; 32], u32>;
> +#[derive(Copy, Clone)]
> +struct KnownChunkInfo {
> + uploaded: bool,
> + length: u32,
> +}
> +
> +// key=digest, value=KnownChunkInfo
> +type KnownChunksMap = HashMap<[u8; 32], KnownChunkInfo>;
>
> struct SharedBackupState {
> finished: bool,
> @@ -159,7 +165,13 @@ impl BackupEnvironment {
>
> state.ensure_unfinished()?;
>
> - state.known_chunks.insert(digest, length);
> + state.known_chunks.insert(
> + digest,
> + KnownChunkInfo {
> + uploaded: false,
> + length,
> + },
> + );
>
> Ok(())
> }
> @@ -213,7 +225,13 @@ impl BackupEnvironment {
> }
>
> // register chunk
> - state.known_chunks.insert(digest, size);
> + state.known_chunks.insert(
> + digest,
> + KnownChunkInfo {
> + uploaded: true,
> + length: size,
> + },
> + );
>
> Ok(())
> }
> @@ -248,7 +266,13 @@ impl BackupEnvironment {
> }
>
> // register chunk
> - state.known_chunks.insert(digest, size);
> + state.known_chunks.insert(
> + digest,
> + KnownChunkInfo {
> + uploaded: true,
> + length: size,
> + },
> + );
>
> Ok(())
> }
> @@ -256,7 +280,23 @@ impl BackupEnvironment {
> pub fn lookup_chunk(&self, digest: &[u8; 32]) -> Option<u32> {
> let state = self.state.lock().unwrap();
>
> - state.known_chunks.get(digest).copied()
> + state
> + .known_chunks
> + .get(digest)
> + .map(|known_chunk_info| known_chunk_info.length)
> + }
> +
> + /// stat known chunks from previous backup, so excluding newly uploaded ones
> + pub fn stat_prev_known_chunks(&self) -> Result<(), Error> {
> + let state = self.state.lock().unwrap();
> + for (digest, known_chunk_info) in &state.known_chunks {
> + if !known_chunk_info.uploaded {
> + self.datastore
> + .stat_chunk(digest)
> + .with_context(|| format!("stat failed on {}", hex::encode(digest)))?;
> + }
> + }
> + Ok(())
> }
>
> /// Store the writer with an unique ID
> diff --git a/src/api2/backup/mod.rs b/src/api2/backup/mod.rs
> index ea0d0292e..63c49f653 100644
> --- a/src/api2/backup/mod.rs
> +++ b/src/api2/backup/mod.rs
> @@ -1,6 +1,6 @@
> //! Backup protocol (HTTP2 upgrade)
>
> -use anyhow::{bail, format_err, Error};
> +use anyhow::{bail, format_err, Context, Error};
> use futures::*;
> use hex::FromHex;
> use hyper::header::{HeaderValue, CONNECTION, UPGRADE};
> @@ -785,6 +785,26 @@ fn finish_backup(
> ) -> Result<Value, Error> {
> let env: &BackupEnvironment = rpcenv.as_ref();
>
> + if let Err(err) = env.stat_prev_known_chunks() {
> + env.debug(format!("stat registered chunks failed - {err:?}"));
> +
> + if let Some(last) = env.last_backup.as_ref() {
> + // No need to acquire snapshot lock, already locked when starting the backup
> + let verify_state = SnapshotVerifyState {
> + state: VerifyState::Failed,
> + upid: env.worker.upid().clone(), // backup writer UPID
> + };
> + let verify_state = serde_json::to_value(verify_state)?;
> + last.backup_dir
> + .update_manifest(|manifest| {
> + manifest.unprotected["verify_state"] = verify_state;
> + })
> + .with_context(|| "manifest update failed")?;
> + }
> +
> + bail!("stat known chunks failed - {err:?}");
> + }
> +
> env.finish_backup()?;
> env.log("successfully finished backup");
>
> --
> 2.39.5
>
>
>
> _______________________________________________
> pbs-devel mailing list
> pbs-devel at lists.proxmox.com
> https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel