[pbs-devel] [PATCH proxmox-backup v5 2/5] fix #3935: datastore/api/backup: add lock helpers to backup dir/group

Stefan Sterz s.sterz at proxmox.com
Tue Sep 13 14:28:01 CEST 2022


On 9/12/22 10:03, Thomas Lamprecht wrote:
> this one doesn't fix #3935, it's rather preparation to make that simpler to
> fix, so there we normally just write something like "preparation for/related to #3935"
> 

Noted!

> On 24/08/2022 14:48, Stefan Sterz wrote:
>> to avoid duplicate code, add helpers for locking groups and snapshots
>> to the BackupGroup and BackupDir types respectively and refactor
>> existing code to use them.
> 
> we also lose quite a few specific error messages, fwict? That can be classified
> as a semantic change if being strict, in any case a loss of information.
> 

Looking through the code I found the following distinct error messages
that existed prior to this commit:

1. 3x "locked by another operation"
2. 2x "another backup is already running"
3. 2x "possibly running or in use"
4. 1x "base snapshot is already locked by another operation"
5. 1x "internal error - tried creating snapshot that's already in use"
6. 1x "possible running backup"

Imo most of them are probably just as helpful as the new error messages (see
the next paragraph or the next patch in the series), as they basically just
state that something else is already locking the group/snapshot. I can see 4
and 5 being useful sometimes, and some may also be helpful simply because they
are unique, searchable strings. If you'd prefer me to add a parameter that
allows setting a custom error message, I'd be happy to do so.
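
To make that concrete, a variant taking the hint as a parameter could look
roughly like this. This is only a sketch and the name "lock_with_hint" is
made up, it is not part of this series; everything else it uses is already
imported in backup_info.rs:

    // in pbs-datastore/src/backup_info.rs, next to BackupGroup::lock()
    impl BackupGroup {
        /// Lock a group exclusively, letting the caller pick the error hint.
        pub fn lock_with_hint(&self, hint: &str) -> Result<DirLockGuard, Error> {
            lock_dir_noblock(&self.full_group_path(), "backup group", hint)
        }
    }

    // call sites could then keep their old wording, e.g.:
    // let _guard = backup_group.lock_with_hint("another backup is already running")?;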

Note that the error messages are re-worded in the next patch of the
series to "unable to acquire shared snapshot lock" (or equivalent for
group locks).

> I'd appreciate listing the intended semantic changes in the commit message.
> 

I'll add "this also changes the error semantics of the refactored
functions as locking related error messages are now unified and, thus,
less specific." If we decide to not add a new parameter. Is that alright?

Please tell me which route you'd prefer and also whether the commit message
re-wording requires me to re-send the entire series. Thanks!

>>
>> Signed-off-by: Stefan Sterz <s.sterz at proxmox.com>
>> ---
>>  pbs-datastore/src/backup_info.rs     | 31 ++++++++++---
>>  pbs-datastore/src/datastore.rs       | 68 +++++++++-------------------
>>  pbs-datastore/src/snapshot_reader.rs |  8 +---
>>  src/api2/backup/mod.rs               |  8 +---
>>  src/api2/reader/mod.rs               |  7 +--
>>  src/backup/verify.rs                 |  7 +--
>>  6 files changed, 52 insertions(+), 77 deletions(-)
>>
>> diff --git a/pbs-datastore/src/backup_info.rs b/pbs-datastore/src/backup_info.rs
>> index c3ce7a1d..52d927ed 100644
>> --- a/pbs-datastore/src/backup_info.rs
>> +++ b/pbs-datastore/src/backup_info.rs
>> @@ -6,7 +6,9 @@ use std::sync::Arc;
>>  
>>  use anyhow::{bail, format_err, Error};
>>  
>> -use proxmox_sys::fs::{lock_dir_noblock, replace_file, CreateOptions};
>> +use proxmox_sys::fs::{
>> +    lock_dir_noblock, lock_dir_noblock_shared, replace_file, CreateOptions, DirLockGuard,
>> +};
>>  
>>  use pbs_api_types::{
>>      Authid, BackupNamespace, BackupType, GroupFilter, BACKUP_DATE_REGEX, BACKUP_FILE_REGEX,
>> @@ -200,9 +202,8 @@ impl BackupGroup {
>>      ///
>>      /// Returns true if all snapshots were removed, and false if some were protected
>>      pub fn destroy(&self) -> Result<bool, Error> {
>> +        let _guard = self.lock()?;
>>          let path = self.full_group_path();
>> -        let _guard =
>> -            proxmox_sys::fs::lock_dir_noblock(&path, "backup group", "possible running backup")?;
>>  
>>          log::info!("removing backup group {:?}", path);
>>          let mut removed_all_snaps = true;
>> @@ -236,6 +237,15 @@ impl BackupGroup {
>>          self.store
>>              .set_owner(&self.ns, self.as_ref(), auth_id, force)
>>      }
>> +
>> +    /// Lock a group exclusively
>> +    pub fn lock(&self) -> Result<DirLockGuard, Error> {
>> +        lock_dir_noblock(
>> +            &self.full_group_path(),
>> +            "backup group",
>> +            "possible running backup",
>> +        )
>> +    }
>>  }
>>  
>>  impl AsRef<pbs_api_types::BackupNamespace> for BackupGroup {
>> @@ -439,15 +449,23 @@ impl BackupDir {
>>              .map_err(|err| format_err!("unable to acquire manifest lock {:?} - {}", &path, err))
>>      }
>>  
>> +    /// Lock this snapshot exclusively
>> +    pub fn lock(&self) -> Result<DirLockGuard, Error> {
>> +        lock_dir_noblock(&self.full_path(), "snapshot", "possibly running or in use")
>> +    }
>> +
>> +    /// Acquire a shared lock on this snapshot
>> +    pub fn lock_shared(&self) -> Result<DirLockGuard, Error> {
>> +        lock_dir_noblock_shared(&self.full_path(), "snapshot", "possibly running or in use")
>> +    }
>> +
>>      /// Destroy the whole snapshot, bails if it's protected
>>      ///
>>      /// Setting `force` to true skips locking and thus ignores if the backup is currently in use.
>>      pub fn destroy(&self, force: bool) -> Result<(), Error> {
>> -        let full_path = self.full_path();
>> -
>>          let (_guard, _manifest_guard);
>>          if !force {
>> -            _guard = lock_dir_noblock(&full_path, "snapshot", "possibly running or in use")?;
>> +            _guard = self.lock()?;
>>              _manifest_guard = self.lock_manifest()?;
>>          }
>>  
>> @@ -455,6 +473,7 @@ impl BackupDir {
>>              bail!("cannot remove protected snapshot"); // use special error type?
>>          }
>>  
>> +        let full_path = self.full_path();
>>          log::info!("removing backup snapshot {:?}", full_path);
>>          std::fs::remove_dir_all(&full_path).map_err(|err| {
>>              format_err!("removing backup snapshot {:?} failed - {}", full_path, err,)
>> diff --git a/pbs-datastore/src/datastore.rs b/pbs-datastore/src/datastore.rs
>> index a539ddc5..52a5f079 100644
>> --- a/pbs-datastore/src/datastore.rs
>> +++ b/pbs-datastore/src/datastore.rs
>> @@ -11,8 +11,8 @@ use nix::unistd::{unlinkat, UnlinkatFlags};
>>  
>>  use proxmox_schema::ApiType;
>>  
>> +use proxmox_sys::fs::DirLockGuard;
>>  use proxmox_sys::fs::{file_read_optional_string, replace_file, CreateOptions};
>> -use proxmox_sys::fs::{lock_dir_noblock, DirLockGuard};
>>  use proxmox_sys::process_locker::ProcessLockSharedGuard;
>>  use proxmox_sys::WorkerTaskContext;
>>  use proxmox_sys::{task_log, task_warn};
>> @@ -630,41 +630,31 @@ impl DataStore {
>>      ///
>>      /// This also acquires an exclusive lock on the directory and returns the lock guard.
>>      pub fn create_locked_backup_group(
>> -        &self,
>> +        self: &Arc<Self>,
>>          ns: &BackupNamespace,
>>          backup_group: &pbs_api_types::BackupGroup,
>>          auth_id: &Authid,
>>      ) -> Result<(Authid, DirLockGuard), Error> {
>> -        // create intermediate path first:
>> -        let mut full_path = self.base_path();
>> -        for ns in ns.components() {
>> -            full_path.push("ns");
>> -            full_path.push(ns);
>> -        }
>> -        full_path.push(backup_group.ty.as_str());
>> -        std::fs::create_dir_all(&full_path)?;
>> +        let backup_group = self.backup_group(ns.clone(), backup_group.clone());
>>  
>> -        full_path.push(&backup_group.id);
>> +        // create intermediate path first
>> +        let full_path = backup_group.full_group_path();
>>  
>> -        // create the last component now
>> +        std::fs::create_dir_all(&full_path.parent().ok_or(format_err!(
>> +            "could not construct parent path for group {backup_group:?}"
>> +        ))?)?;
>> +
>> +        // now create the group, this allows us to check whether it existed before
>>          match std::fs::create_dir(&full_path) {
>>              Ok(_) => {
>> -                let guard = lock_dir_noblock(
>> -                    &full_path,
>> -                    "backup group",
>> -                    "another backup is already running",
>> -                )?;
>> -                self.set_owner(ns, backup_group, auth_id, false)?;
>> -                let owner = self.get_owner(ns, backup_group)?; // just to be sure
>> +                let guard = backup_group.lock()?;
>> +                self.set_owner(ns, backup_group.group(), auth_id, false)?;
>> +                let owner = self.get_owner(ns, backup_group.group())?; // just to be sure
>>                  Ok((owner, guard))
>>              }
>>              Err(ref err) if err.kind() == io::ErrorKind::AlreadyExists => {
>> -                let guard = lock_dir_noblock(
>> -                    &full_path,
>> -                    "backup group",
>> -                    "another backup is already running",
>> -                )?;
>> -                let owner = self.get_owner(ns, backup_group)?; // just to be sure
>> +                let guard = backup_group.lock()?;
>> +                let owner = self.get_owner(ns, backup_group.group())?; // just to be sure
>>                  Ok((owner, guard))
>>              }
>>              Err(err) => bail!("unable to create backup group {:?} - {}", full_path, err),
>> @@ -675,29 +665,17 @@ impl DataStore {
>>      ///
>>      /// The BackupGroup directory needs to exist.
>>      pub fn create_locked_backup_dir(
>> -        &self,
>> +        self: &Arc<Self>,
>>          ns: &BackupNamespace,
>>          backup_dir: &pbs_api_types::BackupDir,
>>      ) -> Result<(PathBuf, bool, DirLockGuard), Error> {
>> -        let full_path = self.snapshot_path(ns, backup_dir);
>> -        let relative_path = full_path.strip_prefix(self.base_path()).map_err(|err| {
>> -            format_err!(
>> -                "failed to produce correct path for backup {backup_dir} in namespace {ns}: {err}"
>> -            )
>> -        })?;
>> -
>> -        let lock = || {
>> -            lock_dir_noblock(
>> -                &full_path,
>> -                "snapshot",
>> -                "internal error - tried creating snapshot that's already in use",
>> -            )
>> -        };
>> +        let backup_dir = self.backup_dir(ns.clone(), backup_dir.clone())?;
>> +        let relative_path = backup_dir.relative_path();
>>  
>> -        match std::fs::create_dir(&full_path) {
>> -            Ok(_) => Ok((relative_path.to_owned(), true, lock()?)),
>> +        match std::fs::create_dir(&backup_dir.full_path()) {
>> +            Ok(_) => Ok((relative_path, true, backup_dir.lock()?)),
>>              Err(ref e) if e.kind() == io::ErrorKind::AlreadyExists => {
>> -                Ok((relative_path.to_owned(), false, lock()?))
>> +                Ok((relative_path, false, backup_dir.lock()?))
>>              }
>>              Err(e) => Err(e.into()),
>>          }
>> @@ -1155,9 +1133,7 @@ impl DataStore {
>>  
>>      /// Updates the protection status of the specified snapshot.
>>      pub fn update_protection(&self, backup_dir: &BackupDir, protection: bool) -> Result<(), Error> {
>> -        let full_path = backup_dir.full_path();
>> -
>> -        let _guard = lock_dir_noblock(&full_path, "snapshot", "possibly running or in use")?;
> 
> "running" what?
>

You tell me :D that message has been in the code base longer than I've been
here. Supposedly it was meant to say something like "possibly another backup
is running or the snapshot is already in use". I agree it's not ideal though,
which is why I rephrased it in the next patch of this series. There I
completely refactor the locking mechanism anyway; this patch, as you said, is
just preparation for that. Hence, I wanted to change as little as possible
about the actual code here.
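
Just to illustrate where that rewording ends up (this is only a rough sketch,
the next patch also changes the locking mechanism itself), the unified message
would essentially be attached where the helper acquires the lock:

    // rough sketch only, not the exact code of the next patch:
    pub fn lock_shared(&self) -> Result<DirLockGuard, Error> {
        lock_dir_noblock_shared(&self.full_path(), "snapshot", "possibly running or in use")
            .map_err(|err| format_err!("unable to acquire shared snapshot lock - {}", err))
    }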

>> +        let _guard = backup_dir.lock()?;
>>  
>>          let protected_path = backup_dir.protected_file();
>>          if protection {
>> diff --git a/pbs-datastore/src/snapshot_reader.rs b/pbs-datastore/src/snapshot_reader.rs
>> index 899c3bce..08b2b66e 100644
>> --- a/pbs-datastore/src/snapshot_reader.rs
>> +++ b/pbs-datastore/src/snapshot_reader.rs
>> @@ -6,8 +6,6 @@ use std::sync::Arc;
>>  use anyhow::{bail, Error};
>>  use nix::dir::Dir;
>>  
>> -use proxmox_sys::fs::lock_dir_noblock_shared;
>> -
>>  use pbs_api_types::{print_store_and_ns, BackupNamespace, Operation};
>>  
>>  use crate::backup_info::BackupDir;
>> @@ -39,10 +37,8 @@ impl SnapshotReader {
>>  
>>      pub(crate) fn new_do(snapshot: BackupDir) -> Result<Self, Error> {
>>          let datastore = snapshot.datastore();
>> -        let snapshot_path = snapshot.full_path();
>>  
>> -        let locked_dir =
>> -            lock_dir_noblock_shared(&snapshot_path, "snapshot", "locked by another operation")?;
>> +        let locked_dir = snapshot.lock_shared()?;
>>  
>>          let datastore_name = datastore.name().to_string();
>>          let manifest = match snapshot.load_manifest() {
>> @@ -57,7 +53,7 @@ impl SnapshotReader {
>>              }
>>          };
>>  
>> -        let mut client_log_path = snapshot_path;
>> +        let mut client_log_path = snapshot.full_path();
>>          client_log_path.push(CLIENT_LOG_BLOB_NAME);
>>  
>>          let mut file_list = vec![MANIFEST_BLOB_NAME.to_string()];
>> diff --git a/src/api2/backup/mod.rs b/src/api2/backup/mod.rs
>> index e519a200..077d70f9 100644
>> --- a/src/api2/backup/mod.rs
>> +++ b/src/api2/backup/mod.rs
>> @@ -27,7 +27,6 @@ use pbs_datastore::manifest::{archive_type, ArchiveType};
>>  use pbs_datastore::{DataStore, PROXMOX_BACKUP_PROTOCOL_ID_V1};
>>  use pbs_tools::json::{required_array_param, required_integer_param, required_string_param};
>>  use proxmox_rest_server::{H2Service, WorkerTask};
>> -use proxmox_sys::fs::lock_dir_noblock_shared;
>>  
>>  mod environment;
>>  use environment::*;
>> @@ -181,12 +180,7 @@ fn upgrade_to_backup_protocol(
>>              }
>>  
>>              // lock last snapshot to prevent forgetting/pruning it during backup
>> -            let full_path = last.backup_dir.full_path();
>> -            Some(lock_dir_noblock_shared(
>> -                &full_path,
>> -                "snapshot",
>> -                "base snapshot is already locked by another operation",
>> -            )?)
>> +            Some(last.backup_dir.lock_shared()?)
>>          } else {
>>              None
>>          };
>> diff --git a/src/api2/reader/mod.rs b/src/api2/reader/mod.rs
>> index e2a10da3..ea7c4aa7 100644
>> --- a/src/api2/reader/mod.rs
>> +++ b/src/api2/reader/mod.rs
>> @@ -27,7 +27,6 @@ use pbs_datastore::manifest::{archive_type, ArchiveType};
>>  use pbs_datastore::{DataStore, PROXMOX_BACKUP_READER_PROTOCOL_ID_V1};
>>  use pbs_tools::json::required_string_param;
>>  use proxmox_rest_server::{H2Service, WorkerTask};
>> -use proxmox_sys::fs::lock_dir_noblock_shared;
>>  
>>  use crate::api2::backup::optional_ns_param;
>>  use crate::api2::helpers;
>> @@ -125,11 +124,7 @@ fn upgrade_to_backup_reader_protocol(
>>              }
>>          }
>>  
>> -        let _guard = lock_dir_noblock_shared(
>> -            &backup_dir.full_path(),
>> -            "snapshot",
>> -            "locked by another operation",
>> -        )?;
>> +        let _guard = backup_dir.lock_shared()?;
>>  
>>          let path = datastore.base_path();
>>  
>> diff --git a/src/backup/verify.rs b/src/backup/verify.rs
>> index 3984e28d..6c4acf3a 100644
>> --- a/src/backup/verify.rs
>> +++ b/src/backup/verify.rs
>> @@ -16,7 +16,6 @@ use pbs_datastore::backup_info::{BackupDir, BackupGroup, BackupInfo};
>>  use pbs_datastore::index::IndexFile;
>>  use pbs_datastore::manifest::{archive_type, ArchiveType, BackupManifest, FileInfo};
>>  use pbs_datastore::{DataBlob, DataStore, StoreProgress};
>> -use proxmox_sys::fs::lock_dir_noblock_shared;
>>  
>>  use crate::tools::parallel_handler::ParallelHandler;
>>  
>> @@ -328,11 +327,7 @@ pub fn verify_backup_dir(
>>      upid: UPID,
>>      filter: Option<&dyn Fn(&BackupManifest) -> bool>,
>>  ) -> Result<bool, Error> {
>> -    let snap_lock = lock_dir_noblock_shared(
>> -        &backup_dir.full_path(),
>> -        "snapshot",
>> -        "locked by another operation",
>> -    );
>> +    let snap_lock = backup_dir.lock_shared();
>>      match snap_lock {
>>          Ok(snap_lock) => {
>>              verify_backup_dir_with_lock(verify_worker, backup_dir, upid, filter, snap_lock)
>




