[pbs-devel] [PATCH proxmox-backup 3/5] datastore: implement sync-level tuning for datastores

Fabian Grünbichler f.gruenbichler at proxmox.com
Mon May 23 09:13:19 CEST 2022


On May 20, 2022 2:42 pm, Dominik Csapak wrote:
> currently, we don't (f)sync on chunk insertion (or at any point after
> that), which can lead to broken chunks in case of e.g. an unexpected
> powerloss. To fix that, offer a tuning option for datastores that
> controls the level of syncs it does:
> 
> * None (default): same as current state, no (f)syncs done at any point
> * Filesystem: at the end of a backup, the datastore issues
>   a syncfs(2) to the filesystem of the datastore
> * File: issues an fsync on each chunk as they get inserted
>   (using our 'replace_file' helper) and a fsync on the directory handle
> 
> a small benchmark showed the following (times in mm:ss):
> setup: virtual pbs, 4 cores, 8GiB memory, ext4 on spinner
> 
> size                none    filesystem  file
> 2GiB (fits in ram)   00:13   0:41        01:00
> 33GiB                05:21   05:31       13:45
> 
> so if the backup fits in memory, there is a large difference between all
> of the modes (expected), but as soon as it exceeds the memory size,
> the difference between not syncing and syncing the fs at the end becomes
> much smaller.
> 
> i also tested on an nvme, but there the syncs basically made no difference
> 
> Signed-off-by: Dominik Csapak <d.csapak at proxmox.com>
> ---
> i retested with the 'fsync on dirhandle' part, but the result
> was not significantly different than before, neither on nvme nor on
> a spinning disk...
> 
>  pbs-api-types/src/datastore.rs   | 32 +++++++++++++++++
>  pbs-datastore/src/chunk_store.rs | 60 ++++++++++++++++++++++++++------
>  pbs-datastore/src/datastore.rs   | 37 +++++++++++++++++---
>  src/api2/backup/environment.rs   |  2 ++
>  src/api2/config/datastore.rs     |  9 +++--
>  5 files changed, 124 insertions(+), 16 deletions(-)
> 
> diff --git a/pbs-api-types/src/datastore.rs b/pbs-api-types/src/datastore.rs
> index e2bf70aa..6563aa73 100644
> --- a/pbs-api-types/src/datastore.rs
> +++ b/pbs-api-types/src/datastore.rs
> @@ -214,6 +214,37 @@ pub enum ChunkOrder {
>      Inode,
>  }
>  
> +#[api]
> +#[derive(PartialEq, Serialize, Deserialize)]
> +#[serde(rename_all = "lowercase")]
> +/// The level of syncing that is done when writing into a datastore.
> +pub enum DatastoreFSyncLevel {
> +    /// No special fsync or syncfs calls are triggered. The system default dirty write back
> +    /// mechanism ensures that data gets is flushed eventually via the `dirty_writeback_centisecs`
> +    /// and `dirty_expire_centisecs` kernel sysctls, defaulting to ~ 30s.
> +    ///
> +    /// This mode provides generally the best performance, as all write back can happen async,
> +    /// which reduces IO pressure.
> +    /// But it may cause losing data on powerloss or system crash without any uninterruptible power supply.
> +    None,
> +    /// Triggers a fsync after writing any chunk on the datastore. While this can slow down
> +    /// backups significantly, depending on the underlying file system and storage used, it
> +    /// will ensure fine-grained consistency. But in practice there are no benefits over the
> +    /// file system level sync, so you should prefer that one, as on most systems the file level
> +    /// one is slower and causes more IO pressure compared to the file system level one.

I am not sure whether that "in practice" statement really holds
- we only tested the exact failure case on a few filesystems, there 
  might be ones in use out there where a powerloss can also lead to a 
  truncated chunk, not only an empty/missing one. granted, both will be 
  detected on the next verification, but only the latter will be 
  automatically cleaned up by a subsequent backup task that uploads this 
  chunk..
- the FS underlying the datastore might be used for many datastores, or 
  even other, busy, non-datastore usage. not an ideal setup, but there 
  might be $reasons. in this case, syncfs might have a much bigger 
  negative effect (because of syncing out other, unrelated I/O) than 
  fsync.
- not sure what effect syncfs has if a datastore is really busy (as in, 
  has tons of basically no-op backups over a short period of time)

I'd rather mark 'Filesystem' as a good compromise, and the 'File' on as 
most consistent.

> +    File,
> +    /// Trigger a filesystem wide sync after all backup data got written but before finishing the
> +    /// task. This allows that every finished backup is fully written back to storage
> +    /// while reducing the impact most file systems in contrast to the file level sync.

missing 'on'? (reducing the impact *on* most file systems)?

> +    Filesystem,
> +}
> +
> +impl Default for DatastoreFSyncLevel {
> +    fn default() -> Self {
> +        DatastoreFSyncLevel::None
> +    }
> +}
> +
>  #[api(
>      properties: {
>          "chunk-order": {
> @@ -228,6 +259,7 @@ pub enum ChunkOrder {
>  pub struct DatastoreTuning {
>      /// Iterate chunks in this order
>      pub chunk_order: Option<ChunkOrder>,
> +    pub sync_level: Option<DatastoreFSyncLevel>,
>  }
>  
>  pub const DATASTORE_TUNING_STRING_SCHEMA: Schema = StringSchema::new("Datastore tuning options")
> diff --git a/pbs-datastore/src/chunk_store.rs b/pbs-datastore/src/chunk_store.rs
> index 1f169dee..86dd1c17 100644
> --- a/pbs-datastore/src/chunk_store.rs
> +++ b/pbs-datastore/src/chunk_store.rs
> @@ -1,11 +1,10 @@
> -use std::io::Write;
>  use std::os::unix::io::AsRawFd;
>  use std::path::{Path, PathBuf};
>  use std::sync::{Arc, Mutex};
>  
>  use anyhow::{bail, format_err, Error};
>  
> -use pbs_api_types::GarbageCollectionStatus;
> +use pbs_api_types::{DatastoreFSyncLevel, GarbageCollectionStatus};
>  use proxmox_sys::fs::{create_dir, create_path, CreateOptions};
>  use proxmox_sys::process_locker::{
>      ProcessLockExclusiveGuard, ProcessLockSharedGuard, ProcessLocker,
> @@ -22,6 +21,7 @@ pub struct ChunkStore {
>      chunk_dir: PathBuf,
>      mutex: Mutex<()>,
>      locker: Option<Arc<Mutex<ProcessLocker>>>,
> +    sync_level: DatastoreFSyncLevel,
>  }
>  
>  // TODO: what about sysctl setting vm.vfs_cache_pressure (0 - 100) ?
> @@ -68,6 +68,7 @@ impl ChunkStore {
>              chunk_dir: PathBuf::new(),
>              mutex: Mutex::new(()),
>              locker: None,
> +            sync_level: Default::default(),
>          }
>      }
>  
> @@ -88,6 +89,7 @@ impl ChunkStore {
>          uid: nix::unistd::Uid,
>          gid: nix::unistd::Gid,
>          worker: Option<&dyn WorkerTaskContext>,
> +        sync_level: DatastoreFSyncLevel,
>      ) -> Result<Self, Error>
>      where
>          P: Into<PathBuf>,
> @@ -144,7 +146,7 @@ impl ChunkStore {
>              }
>          }
>  
> -        Self::open(name, base)
> +        Self::open(name, base, sync_level)
>      }
>  
>      fn lockfile_path<P: Into<PathBuf>>(base: P) -> PathBuf {
> @@ -153,7 +155,11 @@ impl ChunkStore {
>          lockfile_path
>      }
>  
> -    pub fn open<P: Into<PathBuf>>(name: &str, base: P) -> Result<Self, Error> {
> +    pub fn open<P: Into<PathBuf>>(
> +        name: &str,
> +        base: P,
> +        sync_level: DatastoreFSyncLevel,
> +    ) -> Result<Self, Error> {
>          let base: PathBuf = base.into();
>  
>          if !base.is_absolute() {
> @@ -176,6 +182,7 @@ impl ChunkStore {
>              chunk_dir,
>              locker: Some(locker),
>              mutex: Mutex::new(()),
> +            sync_level,
>          })
>      }
>  
> @@ -461,9 +468,27 @@ impl ChunkStore {
>              }
>          }
>  
> -        proxmox_sys::fs::replace_file(chunk_path, raw_data, CreateOptions::new(), false).map_err(
> -            |err| format_err!("inserting chunk on store '{name}' failed for {digest_str} - {err}"),
> -        )?;
> +        let chunk_dir_path = chunk_path
> +            .parent()
> +            .ok_or_else(|| format_err!("unable to get chunk dir"))?
> +            .to_owned();
> +
> +        proxmox_sys::fs::replace_file(
> +            chunk_path,
> +            raw_data,
> +            CreateOptions::new(),
> +            self.sync_level == DatastoreFSyncLevel::File,
> +        )
> +        .map_err(|err| {
> +            format_err!("inserting chunk on store '{name}' failed for {digest_str} - {err}")
> +        })?;
> +
> +        if self.sync_level == DatastoreFSyncLevel::File {
> +            // fsync dir handle to persist the tmp rename
> +            let dir = std::fs::File::open(chunk_dir_path)?;
> +            nix::unistd::fsync(dir.as_raw_fd())
> +                .map_err(|err| format_err!("fsync failed: {err}"))?;
> +        }
>  
>          drop(lock);
>  
> @@ -520,13 +545,21 @@ fn test_chunk_store1() {
>  
>      if let Err(_e) = std::fs::remove_dir_all(".testdir") { /* ignore */ }
>  
> -    let chunk_store = ChunkStore::open("test", &path);
> +    let chunk_store = ChunkStore::open("test", &path, DatastoreFSyncLevel::None);
>      assert!(chunk_store.is_err());
>  
>      let user = nix::unistd::User::from_uid(nix::unistd::Uid::current())
>          .unwrap()
>          .unwrap();
> -    let chunk_store = ChunkStore::create("test", &path, user.uid, user.gid, None).unwrap();
> +    let chunk_store = ChunkStore::create(
> +        "test",
> +        &path,
> +        user.uid,
> +        user.gid,
> +        None,
> +        DatastoreFSyncLevel::None,
> +    )
> +    .unwrap();
>  
>      let (chunk, digest) = crate::data_blob::DataChunkBuilder::new(&[0u8, 1u8])
>          .build()
> @@ -538,7 +571,14 @@ fn test_chunk_store1() {
>      let (exists, _) = chunk_store.insert_chunk(&chunk, &digest).unwrap();
>      assert!(exists);
>  
> -    let chunk_store = ChunkStore::create("test", &path, user.uid, user.gid, None);
> +    let chunk_store = ChunkStore::create(
> +        "test",
> +        &path,
> +        user.uid,
> +        user.gid,
> +        None,
> +        DatastoreFSyncLevel::None,
> +    );
>      assert!(chunk_store.is_err());
>  
>      if let Err(_e) = std::fs::remove_dir_all(".testdir") { /* ignore */ }
> diff --git a/pbs-datastore/src/datastore.rs b/pbs-datastore/src/datastore.rs
> index 5af8a295..613f3196 100644
> --- a/pbs-datastore/src/datastore.rs
> +++ b/pbs-datastore/src/datastore.rs
> @@ -18,8 +18,8 @@ use proxmox_sys::WorkerTaskContext;
>  use proxmox_sys::{task_log, task_warn};
>  
>  use pbs_api_types::{
> -    Authid, BackupNamespace, BackupType, ChunkOrder, DataStoreConfig, DatastoreTuning,
> -    GarbageCollectionStatus, HumanByte, Operation, UPID,
> +    Authid, BackupNamespace, BackupType, ChunkOrder, DataStoreConfig, DatastoreFSyncLevel,
> +    DatastoreTuning, GarbageCollectionStatus, HumanByte, Operation, UPID,
>  };
>  use pbs_config::ConfigVersionCache;
>  
> @@ -61,6 +61,7 @@ pub struct DataStoreImpl {
>      chunk_order: ChunkOrder,
>      last_generation: usize,
>      last_update: i64,
> +    sync_level: DatastoreFSyncLevel,
>  }
>  
>  impl DataStoreImpl {
> @@ -75,6 +76,7 @@ impl DataStoreImpl {
>              chunk_order: ChunkOrder::None,
>              last_generation: 0,
>              last_update: 0,
> +            sync_level: Default::default(),
>          })
>      }
>  }
> @@ -154,7 +156,13 @@ impl DataStore {
>              }
>          }
>  
> -        let chunk_store = ChunkStore::open(name, &config.path)?;
> +        let tuning: DatastoreTuning = serde_json::from_value(
> +            DatastoreTuning::API_SCHEMA
> +                .parse_property_string(config.tuning.as_deref().unwrap_or(""))?,
> +        )?;
> +
> +        let chunk_store =
> +            ChunkStore::open(name, &config.path, tuning.sync_level.unwrap_or_default())?;
>          let datastore = DataStore::with_store_and_config(chunk_store, config, generation, now)?;
>  
>          let datastore = Arc::new(datastore);
> @@ -197,7 +205,12 @@ impl DataStore {
>      ) -> Result<Arc<Self>, Error> {
>          let name = config.name.clone();
>  
> -        let chunk_store = ChunkStore::open(&name, &config.path)?;
> +        let tuning: DatastoreTuning = serde_json::from_value(
> +            DatastoreTuning::API_SCHEMA
> +                .parse_property_string(config.tuning.as_deref().unwrap_or(""))?,
> +        )?;
> +        let chunk_store =
> +            ChunkStore::open(&name, &config.path, tuning.sync_level.unwrap_or_default())?;
>          let inner = Arc::new(Self::with_store_and_config(chunk_store, config, 0, 0)?);
>  
>          if let Some(operation) = operation {
> @@ -242,6 +255,7 @@ impl DataStore {
>              chunk_order,
>              last_generation,
>              last_update,
> +            sync_level: tuning.sync_level.unwrap_or_default(),
>          })
>      }
>  
> @@ -1257,4 +1271,19 @@ impl DataStore {
>          todo!("split out the namespace");
>      }
>      */
> +
> +    /// Syncs the filesystem of the datastore if 'sync_level' is set to
> +    /// [`DatastoreFSyncLevel::Filesystem`]. Uses syncfs(2).
> +    pub fn try_ensure_sync_level(&self) -> Result<(), Error> {
> +        if self.inner.sync_level != DatastoreFSyncLevel::Filesystem {
> +            return Ok(());
> +        }
> +        let file = std::fs::File::open(self.base_path())?;
> +        let fd = file.as_raw_fd();
> +        log::info!("syncinc filesystem");
> +        if unsafe { libc::syncfs(fd) } < 0 {
> +            bail!("error during syncfs: {}", std::io::Error::last_os_error());
> +        }
> +        Ok(())
> +    }
>  }
> diff --git a/src/api2/backup/environment.rs b/src/api2/backup/environment.rs
> index 8c1c42db..98a13c6e 100644
> --- a/src/api2/backup/environment.rs
> +++ b/src/api2/backup/environment.rs
> @@ -623,6 +623,8 @@ impl BackupEnvironment {
>              }
>          }
>  
> +        self.datastore.try_ensure_sync_level()?;
> +
>          // marks the backup as successful
>          state.finished = true;
>  
> diff --git a/src/api2/config/datastore.rs b/src/api2/config/datastore.rs
> index 28342c2c..9fdd73dc 100644
> --- a/src/api2/config/datastore.rs
> +++ b/src/api2/config/datastore.rs
> @@ -11,8 +11,8 @@ use proxmox_section_config::SectionConfigData;
>  use proxmox_sys::WorkerTaskContext;
>  
>  use pbs_api_types::{
> -    Authid, DataStoreConfig, DataStoreConfigUpdater, DatastoreNotify, DATASTORE_SCHEMA,
> -    PRIV_DATASTORE_ALLOCATE, PRIV_DATASTORE_AUDIT, PRIV_DATASTORE_MODIFY,
> +    Authid, DataStoreConfig, DataStoreConfigUpdater, DatastoreNotify, DatastoreTuning,
> +    DATASTORE_SCHEMA, PRIV_DATASTORE_ALLOCATE, PRIV_DATASTORE_AUDIT, PRIV_DATASTORE_MODIFY,
>      PROXMOX_CONFIG_DIGEST_SCHEMA,
>  };
>  use pbs_config::BackupLockGuard;
> @@ -70,6 +70,10 @@ pub(crate) fn do_create_datastore(
>  ) -> Result<(), Error> {
>      let path: PathBuf = datastore.path.clone().into();
>  
> +    let tuning: DatastoreTuning = serde_json::from_value(
> +        DatastoreTuning::API_SCHEMA
> +            .parse_property_string(datastore.tuning.as_deref().unwrap_or(""))?,
> +    )?;
>      let backup_user = pbs_config::backup_user()?;
>      let _store = ChunkStore::create(
>          &datastore.name,
> @@ -77,6 +81,7 @@ pub(crate) fn do_create_datastore(
>          backup_user.uid,
>          backup_user.gid,
>          worker,
> +        tuning.sync_level.unwrap_or_default(),
>      )?;
>  
>      config.set_data(&datastore.name, "datastore", &datastore)?;
> -- 
> 2.30.2
> 
> 
> 
> _______________________________________________
> pbs-devel mailing list
> pbs-devel at lists.proxmox.com
> https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel
> 
> 
> 





More information about the pbs-devel mailing list