[pbs-devel] [PATCH proxmox-backup v3 1/3] fix #3786: api: add resync-corrupt option to sync jobs
Gabriel Goller
g.goller at proxmox.com
Thu Nov 21 11:04:07 CET 2024
On 20.11.2024 14:11, Fabian Grünbichler wrote:
>a few small nits inline, looks good to me otherwise, but given the size of this
>and the size of the push series, I'd rather this be rebased on top of the other
>one ;)
Sure, shouldn't be a lot of work. Should I send a rebased version on top
of the current push series as a v4?
>Quoting Gabriel Goller (2024-11-05 11:40:13)
>> diff --git a/pbs-datastore/src/backup_info.rs b/pbs-datastore/src/backup_info.rs
>> index 414ec878d01a..e6174322dad6 100644
>> --- a/pbs-datastore/src/backup_info.rs
>> +++ b/pbs-datastore/src/backup_info.rs
>> @@ -8,7 +8,8 @@ use anyhow::{bail, format_err, Error};
>> use proxmox_sys::fs::{lock_dir_noblock, replace_file, CreateOptions};
>>
>> use pbs_api_types::{
>> - Authid, BackupNamespace, BackupType, GroupFilter, BACKUP_DATE_REGEX, BACKUP_FILE_REGEX,
>> + Authid, BackupNamespace, BackupType, GroupFilter, VerifyState, BACKUP_DATE_REGEX,
>> + BACKUP_FILE_REGEX,
>> };
>> use pbs_config::{open_backup_lockfile, BackupLockGuard};
>>
>> @@ -583,6 +584,15 @@ impl BackupDir {
>>
>> Ok(())
>> }
>> +
>> + /// Load the verify state from the manifest.
>> + pub fn verify_state(&self) -> Option<VerifyState> {
>
>should this be a Result<Option<..>> to allow differentiation between no
>verification state, and failure to parse or load the manifest? that would allow
>us to resync totally corrupted snapshots as well (although that might be
>considered out of scope based on the parameter description ;))
Yep it was already like this in the first version, no idea why I changed
it. Like this we can return the load_manifest error with the Result and
swallow the inner error with a `ok()` as it doesn't matter anymore.
pub fn verify_state(&self) -> Result<Option<VerifyState>, anyhow::Error> {
let manifest = self.load_manifest()?;
Ok(manifest.0.verify_state().ok().flatten().map(|svs| svs.state))
}
I think we also want to resync on errors when reading the manifest, I'll
include that in the next version! Something like this maybe:
match local_dir.verify_state() {
Ok(Some(state)) => {
if state == VerifyState::Failed {
return Some((dir, true));
}
},
Ok(None) => {
// This means there either was an error parsing the manifest, or the
// verify_state item was not found. This could be a new backup.
}
Err(_) => {
// There was an error loading the manifest, probably better if we
// resync.
return Some((dir, true));
}
}
>> + if let Ok(manifest) = self.load_manifest() {
>> + manifest.0.verify_state()
>> + } else {
>> + None
>> + }
>> + }
>> }
>>
>> impl AsRef<pbs_api_types::BackupNamespace> for BackupDir {
>> diff --git a/pbs-datastore/src/manifest.rs b/pbs-datastore/src/manifest.rs
>> index c3df014272a0..623c1499c0bb 100644
>> --- a/pbs-datastore/src/manifest.rs
>> +++ b/pbs-datastore/src/manifest.rs
>> @@ -5,7 +5,7 @@ use anyhow::{bail, format_err, Error};
>> use serde::{Deserialize, Serialize};
>> use serde_json::{json, Value};
>>
>> -use pbs_api_types::{BackupType, CryptMode, Fingerprint};
>> +use pbs_api_types::{BackupType, CryptMode, Fingerprint, SnapshotVerifyState, VerifyState};
>> use pbs_tools::crypt_config::CryptConfig;
>>
>> pub const MANIFEST_BLOB_NAME: &str = "index.json.blob";
>> @@ -242,6 +242,17 @@ impl BackupManifest {
>> let manifest: BackupManifest = serde_json::from_value(json)?;
>> Ok(manifest)
>> }
>> +
>> + /// Get the verify state of the snapshot
>> + ///
>> + /// Note: New snapshots, which have not been verified yet, do not have a status and this
>> + /// function will return `None`.
>> + pub fn verify_state(&self) -> Option<VerifyState> {
>
>should this be a Result<Option<..>> to allow differentiation between no
>verification state, and failure to parse?
Hmm so I could return a Result<Option<..>> by checking the error of the
serde_json::from_value call. I could check if the "verify_state" value
wasn't found in the manifest by calling `is_eof` [0] and if not, return
a Ok(None), otherwise return an Error. This will make it more
complicated for all the callers though – also 99% of the callers will
treat Err the same as Ok(None) anyways. LTMK what you think!
/// Get the verify state of the snapshot
///
/// Note: New snapshots, which have not been verified yet, do not have a status and this
/// function will return `Ok(None)`.
pub fn verify_state(&self) -> Result<Option<SnapshotVerifyState>, anyhow::Error> {
let verify = self.unprotected["verify_state"].clone();
match serde_json::from_value::<SnapshotVerifyState>(verify) {
Err(err) => {
// `verify_state` item has not been found
if err.is_eof() {
Ok(None)
}else {
Err(err.into())
}
},
Ok(svs) => {
Ok(Some(svs))
}
}
}
Else I could just return a Result<SnapshotVerifyState>.
[0]: https://docs.rs/serde_json/latest/serde_json/struct.Error.html#method.is_eof
>also, it would be great if existing code retrieving this could be adapted to
>use these new helpers, which would require having the Result there as well..
Yep, overlooked those, my bad.
>> @@ -381,7 +388,7 @@ async fn pull_snapshot<'a>(
>> let mut path = snapshot.full_path();
>> path.push(&item.filename);
>>
>> - if path.exists() {
>> + if !corrupt && path.exists() {
>> match ArchiveType::from_path(&item.filename)? {
>> ArchiveType::DynamicIndex => {
>> let index = DynamicIndexReader::open(&path)?;
>> @@ -443,6 +450,7 @@ async fn pull_snapshot_from<'a>(
>> reader: Arc<dyn SyncSourceReader + 'a>,
>> snapshot: &'a pbs_datastore::BackupDir,
>> downloaded_chunks: Arc<Mutex<HashSet<[u8; 32]>>>,
>> + corrupt: bool,
>> ) -> Result<SyncStats, Error> {
>> let (_path, is_new, _snap_lock) = snapshot
>> .datastore()
>> @@ -451,7 +459,7 @@ async fn pull_snapshot_from<'a>(
>> let sync_stats = if is_new {
>
>is_new and corrupt are never both true..
>
>> info!("sync snapshot {}", snapshot.dir());
>>
>> - match pull_snapshot(reader, snapshot, downloaded_chunks).await {
>> + match pull_snapshot(reader, snapshot, downloaded_chunks, corrupt).await {
>
>so this should be always false ;)
Agree, wrote a comment and passed directly `false`.
>> Err(err) => {
>> if let Err(cleanup_err) = snapshot.datastore().remove_backup_dir(
>> snapshot.backup_ns(),
>> @@ -468,8 +476,15 @@ async fn pull_snapshot_from<'a>(
>> }
>> }
>> } else {
>> - info!("re-sync snapshot {}", snapshot.dir());
>> - pull_snapshot(reader, snapshot, downloaded_chunks).await?
>> + if corrupt {
>> + info!(
>> + "re-sync snapshot {} due to bad verification result",
>
>nit: why not call it "corrupt", since that is what the parameter is called?
ack
>> + snapshot.dir()
>> + );
>> + } else {
>> + info!("re-sync snapshot {}", snapshot.dir());
>> + }
>> + pull_snapshot(reader, snapshot, downloaded_chunks, corrupt).await?
>> };
>>
>> Ok(sync_stats)
More information about the pbs-devel
mailing list