[pbs-devel] [PATCH proxmox-backup v3 1/3] fix #3786: api: add resync-corrupt option to sync jobs

Fabian Grünbichler f.gruenbichler at proxmox.com
Thu Nov 21 11:09:53 CET 2024


> Gabriel Goller <g.goller at proxmox.com> hat am 21.11.2024 11:04 CET geschrieben:
> 
>  
> On 20.11.2024 14:11, Fabian Grünbichler wrote:
> >a few small nits inline, looks good to me otherwise, but given the size of this
> >and the size of the push series, I'd rather this be rebased on top of the other
> >one ;)
> 
> Sure, shouldn't be a lot of work. Should I send a rebased version on top
> of the current push series as a v4?

yes, but please wait until it's applied (there have been a few changes queued on-top where I am not sure whether they might cause more conflicts ;))

> 
> >Quoting Gabriel Goller (2024-11-05 11:40:13)
> >> diff --git a/pbs-datastore/src/backup_info.rs b/pbs-datastore/src/backup_info.rs
> >> index 414ec878d01a..e6174322dad6 100644
> >> --- a/pbs-datastore/src/backup_info.rs
> >> +++ b/pbs-datastore/src/backup_info.rs
> >> @@ -8,7 +8,8 @@ use anyhow::{bail, format_err, Error};
> >>  use proxmox_sys::fs::{lock_dir_noblock, replace_file, CreateOptions};
> >>
> >>  use pbs_api_types::{
> >> -    Authid, BackupNamespace, BackupType, GroupFilter, BACKUP_DATE_REGEX, BACKUP_FILE_REGEX,
> >> +    Authid, BackupNamespace, BackupType, GroupFilter, VerifyState, BACKUP_DATE_REGEX,
> >> +    BACKUP_FILE_REGEX,
> >>  };
> >>  use pbs_config::{open_backup_lockfile, BackupLockGuard};
> >>
> >> @@ -583,6 +584,15 @@ impl BackupDir {
> >>
> >>          Ok(())
> >>      }
> >> +
> >> +    /// Load the verify state from the manifest.
> >> +    pub fn verify_state(&self) -> Option<VerifyState> {
> >
> >should this be a Result<Option<..>> to allow differentiation between no
> >verification state, and failure to parse or load the manifest? that would allow
> >us to resync totally corrupted snapshots as well (although that might be
> >considered out of scope based on the parameter description ;))
> 
> Yep it was already like this in the first version, no idea why I changed
> it. Like this we can return the load_manifest error with the Result and
> swallow the inner error with a `ok()` as it doesn't matter anymore.
> 
>      pub fn verify_state(&self) -> Result<Option<VerifyState>, anyhow::Error> {
>          let manifest = self.load_manifest()?;
>          Ok(manifest.0.verify_state().ok().flatten().map(|svs| svs.state))
>      }
> 
> 
> I think we also want to resync on errors when reading the manifest, I'll
> include that in the next version! Something like this maybe:
> 
>      match local_dir.verify_state() {
>          Ok(Some(state)) => {
>              if state == VerifyState::Failed {
>                  return Some((dir, true));
>              }
>          },
>          Ok(None) => {
>              // This means there either was an error parsing the manifest, or the 
>              // verify_state item was not found. This could be a new backup.

IMHO this should only be reached if no verification state is in the manifest (because no verification has happened yet), but the manifest was otherwise completely parseable. this can be treated the same as an okay verify state, since we can't know any better.

>          }
>          Err(_) => {
>              // There was an error loading the manifest, probably better if we
>              // resync.
>              return Some((dir, true));
>          }
>      }
> 
> >> +        if let Ok(manifest) = self.load_manifest() {
> >> +            manifest.0.verify_state()
> >> +        } else {
> >> +            None
> >> +        }
> >> +    }
> >>  }
> >>
> >>  impl AsRef<pbs_api_types::BackupNamespace> for BackupDir {
> >> diff --git a/pbs-datastore/src/manifest.rs b/pbs-datastore/src/manifest.rs
> >> index c3df014272a0..623c1499c0bb 100644
> >> --- a/pbs-datastore/src/manifest.rs
> >> +++ b/pbs-datastore/src/manifest.rs
> >> @@ -5,7 +5,7 @@ use anyhow::{bail, format_err, Error};
> >>  use serde::{Deserialize, Serialize};
> >>  use serde_json::{json, Value};
> >>
> >> -use pbs_api_types::{BackupType, CryptMode, Fingerprint};
> >> +use pbs_api_types::{BackupType, CryptMode, Fingerprint, SnapshotVerifyState, VerifyState};
> >>  use pbs_tools::crypt_config::CryptConfig;
> >>
> >>  pub const MANIFEST_BLOB_NAME: &str = "index.json.blob";
> >> @@ -242,6 +242,17 @@ impl BackupManifest {
> >>          let manifest: BackupManifest = serde_json::from_value(json)?;
> >>          Ok(manifest)
> >>      }
> >> +
> >> +    /// Get the verify state of the snapshot
> >> +    ///
> >> +    /// Note: New snapshots, which have not been verified yet, do not have a status and this
> >> +    /// function will return `None`.
> >> +    pub fn verify_state(&self) -> Option<VerifyState> {
> >
> >should this be a Result<Option<..>> to allow differentiation between no
> >verification state, and failure to parse?
> 
> Hmm so I could return a Result<Option<..>> by checking the error of the
> serde_json::from_value call. I could check if the "verify_state" value
> wasn't found in the manifest by calling `is_eof` [0] and if not, return
> a Ok(None), otherwise return an Error. This will make it more
> complicated for all the callers though – also 99% of the callers will
> treat Err the same as Ok(None) anyways. LTMK what you think!
> 
>      /// Get the verify state of the snapshot
>      ///
>      /// Note: New snapshots, which have not been verified yet, do not have a status and this
>      /// function will return `Ok(None)`.
>      pub fn verify_state(&self) -> Result<Option<SnapshotVerifyState>, anyhow::Error> {
>          let verify = self.unprotected["verify_state"].clone();

can't you just check here whether we have a value and return None otherwise?

>          match serde_json::from_value::<SnapshotVerifyState>(verify) {
>              Err(err) => {

then this can just bubble up the error?

>                  // `verify_state` item has not been found
>                  if err.is_eof() {
>                      Ok(None)
>                  }else {
>                      Err(err.into())
>                  }
>              },
>              Ok(svs) => {
>                  Ok(Some(svs))
>              }
>          }
>      }
> 
> 
> Else I could just return a Result<SnapshotVerifyState>.

I think differentiating between Ok(Some(state)), Ok(None) and Err(err) is important here, so I'd rather not do that ;)

> 
> [0]: https://docs.rs/serde_json/latest/serde_json/struct.Error.html#method.is_eof
> 
> >also, it would be great if existing code retrieving this could be adapted to
> >use these new helpers, which would require having the Result there as well..
> 
> Yep, overlooked those, my bad.
> 
> >> @@ -381,7 +388,7 @@ async fn pull_snapshot<'a>(
> >>          let mut path = snapshot.full_path();
> >>          path.push(&item.filename);
> >>
> >> -        if path.exists() {
> >> +        if !corrupt && path.exists() {
> >>              match ArchiveType::from_path(&item.filename)? {
> >>                  ArchiveType::DynamicIndex => {
> >>                      let index = DynamicIndexReader::open(&path)?;
> >> @@ -443,6 +450,7 @@ async fn pull_snapshot_from<'a>(
> >>      reader: Arc<dyn SyncSourceReader + 'a>,
> >>      snapshot: &'a pbs_datastore::BackupDir,
> >>      downloaded_chunks: Arc<Mutex<HashSet<[u8; 32]>>>,
> >> +    corrupt: bool,
> >>  ) -> Result<SyncStats, Error> {
> >>      let (_path, is_new, _snap_lock) = snapshot
> >>          .datastore()
> >> @@ -451,7 +459,7 @@ async fn pull_snapshot_from<'a>(
> >>      let sync_stats = if is_new {
> >
> >is_new and corrupt are never both true..
> >
> >>          info!("sync snapshot {}", snapshot.dir());
> >>
> >> -        match pull_snapshot(reader, snapshot, downloaded_chunks).await {
> >> +        match pull_snapshot(reader, snapshot, downloaded_chunks, corrupt).await {
> >
> >so this should be always false ;)
> 
> Agree, wrote a comment and passed directly `false`.
> 
> >>              Err(err) => {
> >>                  if let Err(cleanup_err) = snapshot.datastore().remove_backup_dir(
> >>                      snapshot.backup_ns(),
> >> @@ -468,8 +476,15 @@ async fn pull_snapshot_from<'a>(
> >>              }
> >>          }
> >>      } else {
> >> -        info!("re-sync snapshot {}", snapshot.dir());
> >> -        pull_snapshot(reader, snapshot, downloaded_chunks).await?
> >> +        if corrupt {
> >> +            info!(
> >> +                "re-sync snapshot {} due to bad verification result",
> >
> >nit: why not call it "corrupt", since that is what the parameter is called?
> 
> ack
> 
> >> +                snapshot.dir()
> >> +            );
> >> +        } else {
> >> +            info!("re-sync snapshot {}", snapshot.dir());
> >> +        }
> >> +        pull_snapshot(reader, snapshot, downloaded_chunks, corrupt).await?
> >>      };
> >>
> >>      Ok(sync_stats)




More information about the pbs-devel mailing list