[pbs-devel] [PATCH proxmox-backup v3 1/3] fix #3786: api: add resync-corrupt option to sync jobs

Gabriel Goller g.goller at proxmox.com
Thu Nov 21 11:04:07 CET 2024


On 20.11.2024 14:11, Fabian Grünbichler wrote:
>a few small nits inline, looks good to me otherwise, but given the size of this
>and the size of the push series, I'd rather this be rebased on top of the other
>one ;)

Sure, shouldn't be a lot of work. Should I send a rebased version on top
of the current push series as a v4?

>Quoting Gabriel Goller (2024-11-05 11:40:13)
>> diff --git a/pbs-datastore/src/backup_info.rs b/pbs-datastore/src/backup_info.rs
>> index 414ec878d01a..e6174322dad6 100644
>> --- a/pbs-datastore/src/backup_info.rs
>> +++ b/pbs-datastore/src/backup_info.rs
>> @@ -8,7 +8,8 @@ use anyhow::{bail, format_err, Error};
>>  use proxmox_sys::fs::{lock_dir_noblock, replace_file, CreateOptions};
>>
>>  use pbs_api_types::{
>> -    Authid, BackupNamespace, BackupType, GroupFilter, BACKUP_DATE_REGEX, BACKUP_FILE_REGEX,
>> +    Authid, BackupNamespace, BackupType, GroupFilter, VerifyState, BACKUP_DATE_REGEX,
>> +    BACKUP_FILE_REGEX,
>>  };
>>  use pbs_config::{open_backup_lockfile, BackupLockGuard};
>>
>> @@ -583,6 +584,15 @@ impl BackupDir {
>>
>>          Ok(())
>>      }
>> +
>> +    /// Load the verify state from the manifest.
>> +    pub fn verify_state(&self) -> Option<VerifyState> {
>
>should this be a Result<Option<..>> to allow differentiation between no
>verification state, and failure to parse or load the manifest? that would allow
>us to resync totally corrupted snapshots as well (although that might be
>considered out of scope based on the parameter description ;))

Yep it was already like this in the first version, no idea why I changed
it. Like this we can return the load_manifest error with the Result and
swallow the inner error with a `ok()` as it doesn't matter anymore.

     pub fn verify_state(&self) -> Result<Option<VerifyState>, anyhow::Error> {
         let manifest = self.load_manifest()?;
         Ok(manifest.0.verify_state().ok().flatten().map(|svs| svs.state))
     }


I think we also want to resync on errors when reading the manifest, I'll
include that in the next version! Something like this maybe:

     match local_dir.verify_state() {
         Ok(Some(state)) => {
             if state == VerifyState::Failed {
                 return Some((dir, true));
             }
         },
         Ok(None) => {
             // This means there either was an error parsing the manifest, or the 
             // verify_state item was not found. This could be a new backup.
         }
         Err(_) => {
             // There was an error loading the manifest, probably better if we
             // resync.
             return Some((dir, true));
         }
     }

>> +        if let Ok(manifest) = self.load_manifest() {
>> +            manifest.0.verify_state()
>> +        } else {
>> +            None
>> +        }
>> +    }
>>  }
>>
>>  impl AsRef<pbs_api_types::BackupNamespace> for BackupDir {
>> diff --git a/pbs-datastore/src/manifest.rs b/pbs-datastore/src/manifest.rs
>> index c3df014272a0..623c1499c0bb 100644
>> --- a/pbs-datastore/src/manifest.rs
>> +++ b/pbs-datastore/src/manifest.rs
>> @@ -5,7 +5,7 @@ use anyhow::{bail, format_err, Error};
>>  use serde::{Deserialize, Serialize};
>>  use serde_json::{json, Value};
>>
>> -use pbs_api_types::{BackupType, CryptMode, Fingerprint};
>> +use pbs_api_types::{BackupType, CryptMode, Fingerprint, SnapshotVerifyState, VerifyState};
>>  use pbs_tools::crypt_config::CryptConfig;
>>
>>  pub const MANIFEST_BLOB_NAME: &str = "index.json.blob";
>> @@ -242,6 +242,17 @@ impl BackupManifest {
>>          let manifest: BackupManifest = serde_json::from_value(json)?;
>>          Ok(manifest)
>>      }
>> +
>> +    /// Get the verify state of the snapshot
>> +    ///
>> +    /// Note: New snapshots, which have not been verified yet, do not have a status and this
>> +    /// function will return `None`.
>> +    pub fn verify_state(&self) -> Option<VerifyState> {
>
>should this be a Result<Option<..>> to allow differentiation between no
>verification state, and failure to parse?

Hmm so I could return a Result<Option<..>> by checking the error of the
serde_json::from_value call. I could check if the "verify_state" value
wasn't found in the manifest by calling `is_eof` [0] and if not, return
a Ok(None), otherwise return an Error. This will make it more
complicated for all the callers though – also 99% of the callers will
treat Err the same as Ok(None) anyways. LTMK what you think!

     /// Get the verify state of the snapshot
     ///
     /// Note: New snapshots, which have not been verified yet, do not have a status and this
     /// function will return `Ok(None)`.
     pub fn verify_state(&self) -> Result<Option<SnapshotVerifyState>, anyhow::Error> {
         let verify = self.unprotected["verify_state"].clone();
         match serde_json::from_value::<SnapshotVerifyState>(verify) {
             Err(err) => {
                 // `verify_state` item has not been found
                 if err.is_eof() {
                     Ok(None)
                 }else {
                     Err(err.into())
                 }
             },
             Ok(svs) => {
                 Ok(Some(svs))
             }
         }
     }


Else I could just return a Result<SnapshotVerifyState>.

[0]: https://docs.rs/serde_json/latest/serde_json/struct.Error.html#method.is_eof

>also, it would be great if existing code retrieving this could be adapted to
>use these new helpers, which would require having the Result there as well..

Yep, overlooked those, my bad.

>> @@ -381,7 +388,7 @@ async fn pull_snapshot<'a>(
>>          let mut path = snapshot.full_path();
>>          path.push(&item.filename);
>>
>> -        if path.exists() {
>> +        if !corrupt && path.exists() {
>>              match ArchiveType::from_path(&item.filename)? {
>>                  ArchiveType::DynamicIndex => {
>>                      let index = DynamicIndexReader::open(&path)?;
>> @@ -443,6 +450,7 @@ async fn pull_snapshot_from<'a>(
>>      reader: Arc<dyn SyncSourceReader + 'a>,
>>      snapshot: &'a pbs_datastore::BackupDir,
>>      downloaded_chunks: Arc<Mutex<HashSet<[u8; 32]>>>,
>> +    corrupt: bool,
>>  ) -> Result<SyncStats, Error> {
>>      let (_path, is_new, _snap_lock) = snapshot
>>          .datastore()
>> @@ -451,7 +459,7 @@ async fn pull_snapshot_from<'a>(
>>      let sync_stats = if is_new {
>
>is_new and corrupt are never both true..
>
>>          info!("sync snapshot {}", snapshot.dir());
>>
>> -        match pull_snapshot(reader, snapshot, downloaded_chunks).await {
>> +        match pull_snapshot(reader, snapshot, downloaded_chunks, corrupt).await {
>
>so this should be always false ;)

Agree, wrote a comment and passed directly `false`.

>>              Err(err) => {
>>                  if let Err(cleanup_err) = snapshot.datastore().remove_backup_dir(
>>                      snapshot.backup_ns(),
>> @@ -468,8 +476,15 @@ async fn pull_snapshot_from<'a>(
>>              }
>>          }
>>      } else {
>> -        info!("re-sync snapshot {}", snapshot.dir());
>> -        pull_snapshot(reader, snapshot, downloaded_chunks).await?
>> +        if corrupt {
>> +            info!(
>> +                "re-sync snapshot {} due to bad verification result",
>
>nit: why not call it "corrupt", since that is what the parameter is called?

ack

>> +                snapshot.dir()
>> +            );
>> +        } else {
>> +            info!("re-sync snapshot {}", snapshot.dir());
>> +        }
>> +        pull_snapshot(reader, snapshot, downloaded_chunks, corrupt).await?
>>      };
>>
>>      Ok(sync_stats)




More information about the pbs-devel mailing list