[pbs-devel] [PATCH proxmox-backup v8 1/4] datastore/api/backup: prepare for fix of #3935 by adding lock helpers

Christian Ebner c.ebner at proxmox.com
Tue Mar 25 11:12:33 CET 2025


On 3/25/25 10:57, Shannon Sterz wrote:
> On Tue Mar 25, 2025 at 10:35 AM CET, Christian Ebner wrote:
>> some small nits inline
>>
>> On 3/24/25 13:51, Shannon Sterz wrote:
>>> to avoid duplicate code, add helpers for locking groups and snapshots
>>> to the BackupGroup and BackupDir types respectively and refactor
>>> existing code to use them.
>>>
>>> this also adapts error handling by adding relevant context to each
>>> locking helper call site. otherwise, we might lose valuable
>>> information useful for debugging. note, however, that users that
>>> relied on specific error messages will break.
>>>
>>> Signed-off-by: Shannon Sterz <s.sterz at proxmox.com>
>>> ---
> 
> -->8 snip 8<--
> 
>>> +
>>> +    /// Lock a group exclusively
>>> +    pub fn lock(&self) -> Result<DirLockGuard, Error> {
>>> +        lock_dir_noblock(
>>> +            &self.full_group_path(),
>>
>> this allocates the group path...
>>
>>> +            "backup group",
>>> +            "possible running backup",
>>> +        )
>>> +    }
>>>    }
>>>
>>>    impl AsRef<pbs_api_types::BackupNamespace> for BackupGroup {
>>> @@ -442,15 +454,33 @@ impl BackupDir {
>>>                .map_err(|err| format_err!("unable to acquire manifest lock {:?} - {}", &path, err))
>>>        }
>>>
>>> +    /// Lock this snapshot exclusively
>>> +    pub fn lock(&self) -> Result<DirLockGuard, Error> {
>>> +        lock_dir_noblock(
>>> +            &self.full_path(),
>>
>> ... this allocates the snapshot path ...
>>
>>> +            "snapshot",
>>> +            "backup is running or snapshot is in use",
>>> +        )
>>> +    }
>>> +
>>> +    /// Acquire a shared lock on this snapshot
>>> +    pub fn lock_shared(&self) -> Result<DirLockGuard, Error> {
>>> +        lock_dir_noblock_shared(
>>> +            &self.full_path(),
>>> +            "snapshot",
>>> +            "backup is running or snapshot is in use, could not acquire shared lock",
>>> +        )
>>> +    }
>>> +
>>>        /// Destroy the whole snapshot, bails if it's protected
>>>        ///
>>>        /// Setting `force` to true skips locking and thus ignores if the backup is currently in use.
>>>        pub fn destroy(&self, force: bool) -> Result<(), Error> {
>>> -        let full_path = self.full_path();
>>> -
>>>            let (_guard, _manifest_guard);
>>>            if !force {
>>> -            _guard = lock_dir_noblock(&full_path, "snapshot", "possibly running or in use")?;
>>> +            _guard = self
>>> +                .lock()
>>> +                .with_context(|| format!("while destroying snapshot '{self:?}'"))?;
>>>                _manifest_guard = self.lock_manifest()?;
>>>            }
>>>
>>> @@ -458,6 +488,7 @@ impl BackupDir {
>>>                bail!("cannot remove protected snapshot"); // use special error type?
>>>            }
>>>
>>> +        let full_path = self.full_path();
>>
>> ... and here it reallocates the snapshot path again.
>>
>>>            log::info!("removing backup snapshot {:?}", full_path);
>>>            std::fs::remove_dir_all(&full_path).map_err(|err| {
>>>                format_err!("removing backup snapshot {:?} failed - {}", full_path, err,)
>>> diff --git a/pbs-datastore/src/datastore.rs b/pbs-datastore/src/datastore.rs
>>> index a6a91ca7..1cbd0038 100644
>>> --- a/pbs-datastore/src/datastore.rs
>>> +++ b/pbs-datastore/src/datastore.rs
>>> @@ -5,7 +5,7 @@ use std::os::unix::io::AsRawFd;
>>>    use std::path::{Path, PathBuf};
>>>    use std::sync::{Arc, LazyLock, Mutex};
>>>
>>> -use anyhow::{bail, format_err, Error};
>>> +use anyhow::{bail, format_err, Context, Error};
>>>    use nix::unistd::{unlinkat, UnlinkatFlags};
>>>    use tracing::{info, warn};
>>>
>>> @@ -13,8 +13,8 @@ use proxmox_human_byte::HumanByte;
>>>    use proxmox_schema::ApiType;
>>>
>>>    use proxmox_sys::error::SysError;
>>> +use proxmox_sys::fs::DirLockGuard;
>>>    use proxmox_sys::fs::{file_read_optional_string, replace_file, CreateOptions};
>>> -use proxmox_sys::fs::{lock_dir_noblock, DirLockGuard};
>>>    use proxmox_sys::linux::procfs::MountInfo;
>>>    use proxmox_sys::process_locker::ProcessLockSharedGuard;
>>>    use proxmox_worker_task::WorkerTaskContext;
>>> @@ -774,41 +774,35 @@ impl DataStore {
>>>        ///
>>>        /// This also acquires an exclusive lock on the directory and returns the lock guard.
>>>        pub fn create_locked_backup_group(
>>> -        &self,
>>> +        self: &Arc<Self>,
>>>            ns: &BackupNamespace,
>>>            backup_group: &pbs_api_types::BackupGroup,
>>>            auth_id: &Authid,
>>>        ) -> Result<(Authid, DirLockGuard), Error> {
>>> -        // create intermediate path first:
>>> -        let mut full_path = self.base_path();
>>> -        for ns in ns.components() {
>>> -            full_path.push("ns");
>>> -            full_path.push(ns);
>>> -        }
>>> -        full_path.push(backup_group.ty.as_str());
>>> -        std::fs::create_dir_all(&full_path)?;
>>> +        let backup_group = self.backup_group(ns.clone(), backup_group.clone());
>>>
>>> -        full_path.push(&backup_group.id);
>>> +        // create intermediate path first
>>> +        let full_path = backup_group.full_group_path();
>>
>> ... same here, this reallocates the group path.
>>
>> I see that this will change with the new locking mechanism, in the
>> following patches, but might it make sense to also generate the
>> respective paths together with the locking in order to avoid double
>> allocation?
>>
>> However, this is not that relevant, since it will get phased out once we
>> have fully switched to the new locking mechanism anyway.
> 
> if i understand you correctly then we have basically two choices. a)
> make the locking handlers take a parameter that will eventually be
> ignored, which is the actual path of the group or b) let the locking
> helper return the path.
> 
> both would mean that there is now more churn, as we are messing with the
> signatures of the locking helpers now and when we remove the old locking
> mechanism completely.
> 
> also i don't think having the path allocated twice is that much of a
> performance hit, especially considering that both options mean that
> during the transition period we would always create both the locking path
> and the full path, even when already using the new locking mechanism
> where we don't technically need the latter at all call sites anymore.
> 
> there is also option c) which could add a field to these structs like
> `full_path: Option<PathBuf>` (or similar) that caches these allocations,
> but that seems a bit like over-engineering to me tbh. at least i
> wouldn't want to add that unless it has actual performance impact.
> 
> -->8 snip 8<--

Agreed, I just wanted to bring this to your attention in case it had not 
been considered. Nothing in the path-creation call stack seems expensive 
enough to justify any of your suggested approaches, especially given that 
the locking path will change anyway.
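
For the record, a minimal sketch of what option c) could look like, purely 
for illustration (the names `BackupDirSketch`, `base` and `id` are 
hypothetical and not the actual pbs-datastore types; the real `BackupDir` 
path layout is more involved). Using `OnceLock` keeps the struct's methods 
taking `&self` while still computing the path only once:

```rust
use std::path::{Path, PathBuf};
use std::sync::OnceLock;

// Hypothetical stand-in for BackupDir, illustrating the "cache the
// allocated path in the struct" idea (option c). Not the real type.
struct BackupDirSketch {
    base: PathBuf,
    id: String,
    // Lazily initialized cache; OnceLock allows interior mutability
    // behind &self without a Mutex.
    full_path: OnceLock<PathBuf>,
}

impl BackupDirSketch {
    fn new(base: impl Into<PathBuf>, id: impl Into<String>) -> Self {
        Self {
            base: base.into(),
            id: id.into(),
            full_path: OnceLock::new(),
        }
    }

    /// Compute the snapshot path on first use and reuse the same
    /// allocation on every later call (e.g. lock() and destroy()).
    fn full_path(&self) -> &Path {
        self.full_path.get_or_init(|| self.base.join(&self.id))
    }
}

fn main() {
    let dir = BackupDirSketch::new("/datastore/ns1", "vm-100");
    // Both calls return a reference to the same cached PathBuf.
    let first = dir.full_path() as *const Path;
    let second = dir.full_path() as *const Path;
    assert_eq!(first, second);
    assert_eq!(dir.full_path(), Path::new("/datastore/ns1/vm-100"));
    println!("cached path: {:?}", dir.full_path());
}
```

As noted above, this only pays off if the repeated allocation actually 
shows up in profiles, which seems unlikely here, so skipping it is the 
right call.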
