[pbs-devel] [PATCH proxmox-backup] verify: handle manifest update errors as non-fatal
Christian Ebner
c.ebner at proxmox.com
Wed Jan 29 09:15:21 CET 2025
On 1/28/25 13:43, Gabriel Goller wrote:
> On 28.01.2025 12:47, Christian Ebner wrote:
>> Since commit 8ea00f6e ("allow to abort verify jobs") errors
>> propagated up to the verify jobs worker call side are interpreted as
>> job aborts.
>>
>> The manifest update did not honor this, leading to the verify job
>> being aborted with the misleading log entry:
>> `verification failed - job aborted`
>>
>> Instead, handle the manifest update error non-fatal just like any
>> other verification related error, log it including the error message
>> and continue verification with the next item.
>>
>> Signed-off-by: Christian Ebner <c.ebner at proxmox.com>
>> ---
>> src/backup/verify.rs | 18 +++++++++++++-----
>> 1 file changed, 13 insertions(+), 5 deletions(-)
>>
>> diff --git a/src/backup/verify.rs b/src/backup/verify.rs
>> index 840a37859..02478b165 100644
>> --- a/src/backup/verify.rs
>> +++ b/src/backup/verify.rs
>> @@ -3,7 +3,7 @@ use std::sync::atomic::{AtomicUsize, Ordering};
>> use std::sync::{Arc, Mutex};
>> use std::time::Instant;
>>
>> -use anyhow::{bail, format_err, Error};
>> +use anyhow::{bail, Error};
>> use nix::dir::Dir;
>> use tracing::{error, info, warn};
>>
>> @@ -399,12 +399,20 @@ pub fn verify_backup_dir_with_lock(
>> state: verify_result,
>> upid,
>> };
>> - let verify_state = serde_json::to_value(verify_state)?;
>> - backup_dir
>> - .update_manifest(|manifest| {
>> +
>> + if let Err(err) = {
>> + let verify_state = serde_json::to_value(verify_state)?;
>> + backup_dir.update_manifest(|manifest| {
>> manifest.unprotected["verify_state"] = verify_state;
>> })
>> - .map_err(|err| format_err!("unable to update manifest blob - {}", err))?;
>> + } {
>> + info!(
>> + "verify {}:{} - manifest update error: {err}",
>> + verify_worker.datastore.name(),
>> + backup_dir.dir(),
>> + );
>
> Is there any reason for not using tracing::error? This would be nice to
> find in the syslog as well. Also using "{err:#}" would print the whole
> error chain/context.
I used `info` rather than `error` here because all other verification
errors are logged the same way, so logging only this one case at error
level would be inconsistent. If we decide to switch from `info` to
`error`, I think that should be done for all verification errors being
logged.
Regarding error formatting: since the errors returned by update_manifest
do not carry additional error context, "{err:#}" makes no difference at
the moment as far as I can see, but you are right that it would be more
future-proof.
Should I send a v2 for that?
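
To illustrate the formatting point above, here is a minimal std-only
sketch of how "{err}" and "{err:#}" differ once an error carries
context. `ContextError` and `update_manifest_stub` are hypothetical
stand-ins made up for this example. They are not the anyhow types or
the real update_manifest, they only mimic the documented behaviour of
the alternate flag, which appends the chain of causes.

```rust
use std::error::Error;
use std::fmt;

/// Hypothetical stand-in for an error that wraps a cause with context.
#[derive(Debug)]
struct ContextError {
    context: String,
    source: Box<dyn Error + 'static>,
}

impl fmt::Display for ContextError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // Plain "{}" prints only the outermost message.
        write!(f, "{}", self.context)?;
        // "{:#}" sets the alternate flag: also print every cause.
        if f.alternate() {
            let mut cause: Option<&(dyn Error + 'static)> = Some(&*self.source);
            while let Some(err) = cause {
                write!(f, ": {}", err)?;
                cause = err.source();
            }
        }
        Ok(())
    }
}

impl Error for ContextError {
    fn source(&self) -> Option<&(dyn Error + 'static)> {
        Some(&*self.source)
    }
}

/// Hypothetical stub that always fails, standing in for a manifest update.
fn update_manifest_stub() -> Result<(), ContextError> {
    let io_err = std::io::Error::new(std::io::ErrorKind::Other, "disk full");
    Err(ContextError {
        context: "unable to update manifest blob".to_string(),
        source: Box::new(io_err),
    })
}

fn main() {
    if let Err(err) = update_manifest_stub() {
        // Non-fatal handling: log the error and carry on.
        println!("manifest update error: {err}");
        println!("manifest update error: {err:#}");
    }
}
```

As long as the error has no context attached, both format strings give
the same output, which matches what the patch currently sees.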