[pbs-devel] [RFC proxmox-backup] drop exclusive lock for verify-after-complete

Fabian Grünbichler f.gruenbichler at proxmox.com
Mon Feb 27 10:50:12 CET 2023


the backup is finished at that point, so the only lock clashes that are possible
when dropping the exclusive lock and attempting to obtain a shared one are

- the snapshot is pruned/removed
- the backup is in a pre-upgrade process, and the post-upgrade process opens a reader

the first case is OK: if the other invocation wins the race and removes the
snapshot, verification is pointless anyway.

the second case means the snapshot is not verified directly after completion
(this fact would be logged in the backup task log), but usable immediately for
pulling/restoring/..

this should decrease the chances of triggering the issues described in #4523

Signed-off-by: Fabian Grünbichler <f.gruenbichler at proxmox.com>
---

Notes:
    right now our locking helpers don't support a direct downgrade (or attempt to
    upgrade, for that matter). given that we don't have many use cases that require
    either, I am not sure whether it's worth it to include that option in the
    planned revamp by Stefan (Sterz).
    
    for fully fixing #4523, we'd also need to improve our "snapshot is still being
    created" heuristics as described in the comment there. this would entail
    writing and removing some sort of marker in the backup session (and all other
    code paths that create snapshot dirs, like pull/sync, tape restore, ..) and
    checking that when listing snapshots, similar to protection status. this is
    mainly relevant for systems that use syncfs for ensuring datastore
    consistency.
    
    Tested by doing a backup with a big delta and verify-after-complete set while
    doing pulls in a loop. The window where the snapshot was no longer considered
    in-progress but still exclusively locked still exists, but got much smaller.
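
Since the locking helpers have no direct downgrade, the patch drops the exclusive lock and then makes a non-blocking attempt at a shared one. A minimal sketch of that pattern follows, using an in-process `std::sync::RwLock` as a stand-in for the directory lock. `Snapshot` and `downgrade_noblock` are hypothetical names for illustration only and are not part of proxmox-backup; the real code uses `lock_dir_noblock_shared` on the snapshot directory.

```rust
use std::sync::{RwLock, RwLockReadGuard, RwLockWriteGuard};

/// Hypothetical stand-in for a snapshot directory and its lock.
pub struct Snapshot {
    pub lock: RwLock<()>,
}

/// Emulate a lock downgrade without native support: release the
/// exclusive guard, then try (without blocking) to take a shared one.
///
/// This is not atomic. Another party may grab the exclusive lock in
/// the gap between the two steps, in which case an error is returned
/// instead of waiting.
pub fn downgrade_noblock<'a>(
    snap: &'a Snapshot,
    excl: RwLockWriteGuard<'a, ()>,
) -> Result<RwLockReadGuard<'a, ()>, String> {
    // Step 1: give up exclusive access; readers may enter from here on.
    drop(excl);

    // Step 2: non-blocking shared acquisition. This fails if someone
    // else took the exclusive lock in the meantime (for example a prune).
    snap.lock
        .try_read()
        .map_err(|_| "snapshot is already locked by another operation".to_string())
}
```

When the reacquisition fails, the caller simply skips verification, which matches the two clash cases described in the commit message. While the shared guard is held, other readers can still enter, but exclusive operations cannot.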

 src/api2/backup/environment.rs | 12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/src/api2/backup/environment.rs b/src/api2/backup/environment.rs
index 4f07f9b4..5291bce8 100644
--- a/src/api2/backup/environment.rs
+++ b/src/api2/backup/environment.rs
@@ -7,7 +7,7 @@ use ::serde::Serialize;
 use serde_json::{json, Value};
 
 use proxmox_router::{RpcEnvironment, RpcEnvironmentType};
-use proxmox_sys::fs::{replace_file, CreateOptions};
+use proxmox_sys::fs::{lock_dir_noblock_shared, replace_file, CreateOptions};
 
 use pbs_api_types::Authid;
 use pbs_datastore::backup_info::{BackupDir, BackupInfo};
@@ -634,7 +634,7 @@ impl BackupEnvironment {
     /// If verify-new is set on the datastore, this will run a new verify task
     /// for the backup. If not, this will return and also drop the passed lock
     /// immediately.
-    pub fn verify_after_complete(&self, snap_lock: Dir) -> Result<(), Error> {
+    pub fn verify_after_complete(&self, excl_snap_lock: Dir) -> Result<(), Error> {
         self.ensure_finished()?;
 
         if !self.datastore.verify_new() {
@@ -642,6 +642,14 @@ impl BackupEnvironment {
             return Ok(());
         }
 
+        // Downgrade to shared lock, the backup itself is finished
+        drop(excl_snap_lock);
+        let snap_lock = lock_dir_noblock_shared(
+            &self.backup_dir.full_path(),
+            "snapshot",
+            "snapshot is already locked by another operation",
+        )?;
+
         let worker_id = format!(
             "{}:{}/{}/{:08X}",
             self.datastore.name(),
-- 
2.30.2





