[pbs-devel] [PATCH proxmox-backup v3 3/5] datastore: implement sync-level tuning for datastores

Wolfgang Bumiller w.bumiller at proxmox.com
Thu Oct 20 15:09:18 CEST 2022


Looks mostly good to me, minor nits inline:

On Thu, Oct 20, 2022 at 09:40:56AM +0200, Dominik Csapak wrote:
> currently, we don't (f)sync on chunk insertion (or at any point after
> that), which can lead to broken chunks in case of, e.g., an unexpected
> power loss. To fix that, offer a tuning option for datastores that
> controls the level of syncs it does:
> 
> * None (default): same as current state, no (f)syncs done at any point
> * Filesystem: at the end of a backup, the datastore issues
>   a syncfs(2) to the filesystem of the datastore
> * File: issues an fsync on each chunk as they get inserted
>   (using our 'replace_file' helper) and a fsync on the directory handle
> 
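(To spell out what that means at the syscall level, mostly for archive readers: the `filesystem` level should end up as a single syncfs(2) on the datastore's filesystem once the backup finishes, conceptually something like the sketch below. The helper name and the way the fd is obtained are made up for illustration and are not taken from the patch.)

    use std::os::unix::io::AsRawFd;

    /// Illustration only: flush all dirty data of the filesystem that `base` lives on.
    /// nix::unistd::syncfs maps directly to the syncfs(2) syscall.
    fn sync_datastore_fs(base: &std::fs::File) -> nix::Result<()> {
        nix::unistd::syncfs(base.as_raw_fd())
    }
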
> a small benchmark showed the following (times in mm:ss):
> setup: virtual pbs, 4 cores, 8GiB memory, ext4 on spinner
> 
> size                 none    filesystem  file
> 2GiB (fits in RAM)   00:13   00:41       01:00
> 33GiB                05:21   05:31       13:45
> 
> so if the backup fits in memory, there is a large difference between all
> of the modes (expected), but as soon as it exceeds the memory size,
> the difference between not syncing and syncing the fs at the end becomes
> much smaller.
> 
> I also tested on an NVMe, but there the syncs basically made no difference.
> 
> Signed-off-by: Dominik Csapak <d.csapak at proxmox.com>
> ---
>  pbs-api-types/src/datastore.rs   | 37 ++++++++++++++++++++
>  pbs-datastore/src/chunk_store.rs | 59 +++++++++++++++++++++++++++-----
>  pbs-datastore/src/datastore.rs   | 39 ++++++++++++++++++---
>  src/api2/backup/environment.rs   |  2 ++
>  src/api2/config/datastore.rs     |  9 +++--
>  5 files changed, 131 insertions(+), 15 deletions(-)
> 
> diff --git a/pbs-api-types/src/datastore.rs b/pbs-api-types/src/datastore.rs
> index 0af11b33..15ea80cd 100644
> --- a/pbs-api-types/src/datastore.rs
> +++ b/pbs-api-types/src/datastore.rs
> @@ -168,6 +168,42 @@ pub enum ChunkOrder {
>      Inode,
>  }
>  
> +#[api]
> +#[derive(PartialEq, Serialize, Deserialize)]

+ Clone, Copy, Debug, Eq
also: you can derive Default now...

> +#[serde(rename_all = "lowercase")]
> +/// The level of syncing that is done when writing into a datastore.
> +pub enum DatastoreFSyncLevel {
> +    /// No special fsync or syncfs calls are triggered. The system default dirty write back
> +    /// mechanism ensures that data gets flushed eventually via the `dirty_writeback_centisecs`
> +    /// and `dirty_expire_centisecs` kernel sysctls, defaulting to ~ 30s.
> +    ///
> +    /// This mode generally provides the best performance, as all write back can happen
> +    /// asynchronously, which reduces IO pressure.
> +    /// But it may cause data loss on power loss or a system crash if there is no uninterruptible
> +    /// power supply.

... if you add #[default] here.

> +    None,
> +    /// Triggers an fsync after writing any chunk on the datastore. While this can slow down
> +    /// backups significantly, depending on the underlying file system and storage used, it
> +    /// will ensure fine-grained consistency. Depending on the exact setup, there might be no
> +    /// benefits over the file system level sync, so if the setup allows it, you should prefer
> +    /// that one. Despite the possible negative impact on performance, it's the most consistent
> +    /// mode.
> +    File,
> +    /// Triggers a filesystem-wide sync after all backup data got written but before finishing the
> +    /// task. This ensures that every finished backup is fully written back to storage
> +    /// while reducing the impact on many file systems compared to the file level sync.
> +    /// Depending on the setup, it might have a negative impact on unrelated write operations
> +    /// of the underlying filesystem, but it is generally a good compromise between performance
> +    /// and consistency.
> +    Filesystem,
> +}
> +
> +impl Default for DatastoreFSyncLevel {

Then you can drop this.

> +    fn default() -> Self {
> +        DatastoreFSyncLevel::None
> +    }
> +}
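IOW, with the derives from above plus `#[default]`, the whole thing collapses to roughly this (variant doc comments elided here, keep them as in the patch; untested sketch):

    #[api]
    #[derive(Clone, Copy, Debug, Default, Eq, PartialEq, Serialize, Deserialize)]
    #[serde(rename_all = "lowercase")]
    /// The level of syncing that is done when writing into a datastore.
    pub enum DatastoreFSyncLevel {
        #[default]
        None,
        File,
        Filesystem,
    }

and the manual `impl Default` block is then simply dropped.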
> +
>  #[api(
>      properties: {
>          "chunk-order": {
> @@ -182,6 +218,7 @@ pub enum ChunkOrder {
>  pub struct DatastoreTuning {
>      /// Iterate chunks in this order
>      pub chunk_order: Option<ChunkOrder>,
> +    pub sync_level: Option<DatastoreFSyncLevel>,
>  }
>  
>  pub const DATASTORE_TUNING_STRING_SCHEMA: Schema = StringSchema::new("Datastore tuning options")
> diff --git a/pbs-datastore/src/chunk_store.rs b/pbs-datastore/src/chunk_store.rs
> index 145d2c7e..a8582485 100644
> --- a/pbs-datastore/src/chunk_store.rs
> +++ b/pbs-datastore/src/chunk_store.rs
(...)
> @@ -460,9 +468,27 @@ impl ChunkStore {
>              }
>          }
>  
> -        proxmox_sys::fs::replace_file(chunk_path, raw_data, CreateOptions::new(), false).map_err(
> -            |err| format_err!("inserting chunk on store '{name}' failed for {digest_str} - {err}"),
> -        )?;
> +        let chunk_dir_path = chunk_path
> +            .parent()
> +            .ok_or_else(|| format_err!("unable to get chunk dir"))?
> +            .to_owned();

No need to clone this slice, just drop the `.to_owned()`...

> +
> +        proxmox_sys::fs::replace_file(
> +            chunk_path,

...and use `&chunk_path` here, since `replace_file` only takes an `AsRef<Path>` anyway; that way `chunk_path` isn't moved and the borrowed parent stays valid

> +            raw_data,
> +            CreateOptions::new(),
> +            self.sync_level == DatastoreFSyncLevel::File,
> +        )
> +        .map_err(|err| {
> +            format_err!("inserting chunk on store '{name}' failed for {digest_str} - {err}")
> +        })?;
> +
> +        if self.sync_level == DatastoreFSyncLevel::File {
> +            // fsync dir handle to persist the tmp rename
> +            let dir = std::fs::File::open(chunk_dir_path)?;
> +            nix::unistd::fsync(dir.as_raw_fd())
> +                .map_err(|err| format_err!("fsync failed: {err}"))?;
> +        }
>  
>          drop(lock);
>  
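With both nits applied, the hunk would look roughly like this (untested, just to illustrate; apart from the two changes this is the patch's code verbatim):

    let chunk_dir_path = chunk_path
        .parent()
        .ok_or_else(|| format_err!("unable to get chunk dir"))?;

    proxmox_sys::fs::replace_file(
        &chunk_path,
        raw_data,
        CreateOptions::new(),
        self.sync_level == DatastoreFSyncLevel::File,
    )
    .map_err(|err| {
        format_err!("inserting chunk on store '{name}' failed for {digest_str} - {err}")
    })?;

    if self.sync_level == DatastoreFSyncLevel::File {
        // fsync the directory handle as well so the rename of the tmp file is persisted
        let dir = std::fs::File::open(chunk_dir_path)?;
        nix::unistd::fsync(dir.as_raw_fd())
            .map_err(|err| format_err!("fsync failed: {err}"))?;
    }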