[pbs-devel] [PATCH proxmox-backup v8 1/4] datastore/api/backup: prepare for fix of #3935 by adding lock helpers
Shannon Sterz
s.sterz at proxmox.com
Tue Mar 25 10:57:31 CET 2025
On Tue Mar 25, 2025 at 10:35 AM CET, Christian Ebner wrote:
> some small nits inline
>
> On 3/24/25 13:51, Shannon Sterz wrote:
>> to avoid duplicate code, add helpers for locking groups and snapshots
>> to the BackupGroup and BackupDir traits respectively and refactor
>> existing code to use them.
>>
>> this also adapts error handling by adding relevant context to each
>> locking helper call site. otherwise, we might loose valuable
>> information useful for debugging. note, however, that users that
>> relied on specific error messages will break.
>>
>> Signed-off-by: Shannon Sterz <s.sterz at proxmox.com>
>> ---
-->8 snip 8<--
>> +
>> + /// Lock a group exclusively
>> + pub fn lock(&self) -> Result<DirLockGuard, Error> {
>> + lock_dir_noblock(
>> + &self.full_group_path(),
>
> this allocates the group path...
>
>> + "backup group",
>> + "possible running backup",
>> + )
>> + }
>> }
>>
>> impl AsRef<pbs_api_types::BackupNamespace> for BackupGroup {
>> @@ -442,15 +454,33 @@ impl BackupDir {
>> .map_err(|err| format_err!("unable to acquire manifest lock {:?} - {}", &path, err))
>> }
>>
>> + /// Lock this snapshot exclusively
>> + pub fn lock(&self) -> Result<DirLockGuard, Error> {
>> + lock_dir_noblock(
>> + &self.full_path(),
>
> ... this allocates the snapshot path ...
>
>> + "snapshot",
>> + "backup is running or snapshot is in use",
>> + )
>> + }
>> +
>> + /// Acquire a shared lock on this snapshot
>> + pub fn lock_shared(&self) -> Result<DirLockGuard, Error> {
>> + lock_dir_noblock_shared(
>> + &self.full_path(),
>> + "snapshot",
>> + "backup is running or snapshot is in use, could not acquire shared lock",
>> + )
>> + }
>> +
>> /// Destroy the whole snapshot, bails if it's protected
>> ///
>> /// Setting `force` to true skips locking and thus ignores if the backup is currently in use.
>> pub fn destroy(&self, force: bool) -> Result<(), Error> {
>> - let full_path = self.full_path();
>> -
>> let (_guard, _manifest_guard);
>> if !force {
>> - _guard = lock_dir_noblock(&full_path, "snapshot", "possibly running or in use")?;
>> + _guard = self
>> + .lock()
>> + .with_context(|| format!("while destroying snapshot '{self:?}'"))?;
>> _manifest_guard = self.lock_manifest()?;
>> }
>>
>> @@ -458,6 +488,7 @@ impl BackupDir {
>> bail!("cannot remove protected snapshot"); // use special error type?
>> }
>>
>> + let full_path = self.full_path();
>
> ... and here it reallocates the snapshot path again.
>
>> log::info!("removing backup snapshot {:?}", full_path);
>> std::fs::remove_dir_all(&full_path).map_err(|err| {
>> format_err!("removing backup snapshot {:?} failed - {}", full_path, err,)
>> diff --git a/pbs-datastore/src/datastore.rs b/pbs-datastore/src/datastore.rs
>> index a6a91ca7..1cbd0038 100644
>> --- a/pbs-datastore/src/datastore.rs
>> +++ b/pbs-datastore/src/datastore.rs
>> @@ -5,7 +5,7 @@ use std::os::unix::io::AsRawFd;
>> use std::path::{Path, PathBuf};
>> use std::sync::{Arc, LazyLock, Mutex};
>>
>> -use anyhow::{bail, format_err, Error};
>> +use anyhow::{bail, format_err, Context, Error};
>> use nix::unistd::{unlinkat, UnlinkatFlags};
>> use tracing::{info, warn};
>>
>> @@ -13,8 +13,8 @@ use proxmox_human_byte::HumanByte;
>> use proxmox_schema::ApiType;
>>
>> use proxmox_sys::error::SysError;
>> +use proxmox_sys::fs::DirLockGuard;
>> use proxmox_sys::fs::{file_read_optional_string, replace_file, CreateOptions};
>> -use proxmox_sys::fs::{lock_dir_noblock, DirLockGuard};
>> use proxmox_sys::linux::procfs::MountInfo;
>> use proxmox_sys::process_locker::ProcessLockSharedGuard;
>> use proxmox_worker_task::WorkerTaskContext;
>> @@ -774,41 +774,35 @@ impl DataStore {
>> ///
>> /// This also acquires an exclusive lock on the directory and returns the lock guard.
>> pub fn create_locked_backup_group(
>> - &self,
>> + self: &Arc<Self>,
>> ns: &BackupNamespace,
>> backup_group: &pbs_api_types::BackupGroup,
>> auth_id: &Authid,
>> ) -> Result<(Authid, DirLockGuard), Error> {
>> - // create intermediate path first:
>> - let mut full_path = self.base_path();
>> - for ns in ns.components() {
>> - full_path.push("ns");
>> - full_path.push(ns);
>> - }
>> - full_path.push(backup_group.ty.as_str());
>> - std::fs::create_dir_all(&full_path)?;
>> + let backup_group = self.backup_group(ns.clone(), backup_group.clone());
>>
>> - full_path.push(&backup_group.id);
>> + // create intermediate path first
>> + let full_path = backup_group.full_group_path();
>
> ... same here, this reallocates the group path.
>
> I see that this will change with the new locking mechanism, in the
> following patches, but might it make sense to also generate the
> respective paths together with the locking in order to avoid double
> allocation?
>
> However not that relevant since it will get phased out once fully
> switched to the new locking mechanism anyways.
if i understand you correctly then we have basically two choices. a)
make the locking handlers take a parameter that will eventually be
ignored, which is the actual path of the group or b) let the locking
helper return the path.
both would mean that there is now more churn, as we are messing with the
signatures of the locking helpers now and when we remove the old locking
mechanism completely.
also i don't think having the path be allocated twice is that much of a
performance hit. especially considering that both options mean that
during the transition period we would always create the locking path and
the full path. even when already using the new locking mechanism where
we don't technically need the latter at all call sites anymore.
there is also option c) which could add a field to these structs like
`full_path: Option<PathBuf>` (or similar) that caches these allocations,
but that seems a bit like over-engineering to me tbh. at least i
wouldn't want to add that unless it has actual performance impact.
-->8 snip 8<--
More information about the pbs-devel
mailing list