[pbs-devel] [PATCH proxmox-backup v2] datastore: remove datastore from internal cache based on maintenance mode

Hannes Laimer h.laimer at proxmox.com
Mon Mar 4 12:12:09 CET 2024


On Mon Mar 4, 2024 at 11:42 AM CET, Thomas Lamprecht wrote:
> On 01/03/2024 at 16:03, Hannes Laimer wrote:
> > We keep a DataStore cache, so ChunkStores and lock files are kept by
> > the proxy process and don't have to be reopened every time. However, for
> > specific maintenance modes, e.g. 'offline', our process should not keep
> > files in that datastore open. This clears the cache entry of a datastore
> > if it is in a specific maintenance mode and the last task finished, which
> > also drops any files still held open by the process.
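The mechanism described in the commit message (one cache entry reused across tasks, evicted when the last task finishes while the store is in an offline-like maintenance mode) can be sketched in plain Rust roughly as follows. This is a minimal, std-only illustration under stated assumptions, not the patch's code: `CachedStore`, `TaskHandle`, `start_task`, `cache()` and `offline_stores()` are hypothetical stand-ins for `DataStore`, `DATASTORE_MAP` and the datastore config, and an in-process counter replaces the task-tracking file.

```rust
use std::collections::{HashMap, HashSet};
use std::sync::{Mutex, MutexGuard, OnceLock};

/// Hypothetical stand-in for a cached DataStore; in the real code the
/// entry would own the ChunkStore and any open lock files.
struct CachedStore {
    active_tasks: u32,
}

/// Hypothetical global cache, playing the role of DATASTORE_MAP.
fn cache() -> MutexGuard<'static, HashMap<String, CachedStore>> {
    static CACHE: OnceLock<Mutex<HashMap<String, CachedStore>>> = OnceLock::new();
    CACHE.get_or_init(|| Mutex::new(HashMap::new())).lock().unwrap()
}

/// Hypothetical stand-in for the datastore config: names of stores whose
/// maintenance mode is 'offline'.
fn offline_stores() -> MutexGuard<'static, HashSet<String>> {
    static OFFLINE: OnceLock<Mutex<HashSet<String>>> = OnceLock::new();
    OFFLINE.get_or_init(|| Mutex::new(HashSet::new())).lock().unwrap()
}

/// Hypothetical handle held for the duration of one task.
struct TaskHandle {
    name: String,
}

fn start_task(name: &str) -> TaskHandle {
    let mut map = cache();
    // Reuse the cached entry (and its open files) or create it once.
    map.entry(name.to_string())
        .or_insert(CachedStore { active_tasks: 0 })
        .active_tasks += 1;
    TaskHandle { name: name.to_string() }
}

impl Drop for TaskHandle {
    fn drop(&mut self) {
        // Counter update and eviction happen under the same lock.
        let mut map = cache();
        let last_task = match map.get_mut(&self.name) {
            Some(entry) => {
                entry.active_tasks -= 1;
                entry.active_tasks == 0
            }
            None => false,
        };
        // Evict iff the last task finished and the maintenance mode
        // mandates it; dropping the entry is what would close its files.
        if last_task && offline_stores().contains(&self.name) {
            map.remove(&self.name);
        }
    }
}
```

Holding one lock across the decrement and the removal is a choice of this sketch: it prevents a concurrently starting task from picking up the entry between the last task's counter update and the eviction.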
>
> One always asks themselves if command sockets are the right approach, but
> for this it seems alright.
>
> Some code style comments inline.
>
> > Signed-off-by: Hannes Laimer <h.laimer at proxmox.com>
> > Tested-by: Gabriel Goller <g.goller at proxmox.com>
> > Reviewed-by: Gabriel Goller <g.goller at proxmox.com>
> > ---
> > 
> > v2, thanks @Gabriel:
> >  - improve comments
> >  - remove not needed &'s and .clone()'s
> > 
> >  pbs-api-types/src/maintenance.rs   |  6 +++++
> >  pbs-datastore/src/datastore.rs     | 41 ++++++++++++++++++++++++++++--
> >  pbs-datastore/src/task_tracking.rs | 23 ++++++++++-------
> >  src/api2/config/datastore.rs       | 18 +++++++++++++
> >  src/bin/proxmox-backup-proxy.rs    |  8 ++++++
> >  5 files changed, 85 insertions(+), 11 deletions(-)
> > 
> > diff --git a/pbs-api-types/src/maintenance.rs b/pbs-api-types/src/maintenance.rs
> > index 1b03ca94..a1564031 100644
> > --- a/pbs-api-types/src/maintenance.rs
> > +++ b/pbs-api-types/src/maintenance.rs
> > @@ -77,6 +77,12 @@ pub struct MaintenanceMode {
> >  }
> >  
> >  impl MaintenanceMode {
> > +    /// Used for deciding whether the datastore is cleared from the internal cache after the last
> > +    /// task finishes, so all open files are closed.
> > +    pub fn clear_from_cache(&self) -> bool {
>
> that function name makes it sound like calling it actively clears it,
> but this is only for checking if a required condition for clearing is met.
>
> So maybe use a name that better conveys that and maybe even avoid coupling
> this to an action that a user of ours executes, as this might have some use
> for other call sites too.
>
> Off the top of my head, one could use `is_offline` as the name; adding a note to
> the doc-comment that this is e.g. used to check if a datastore can be
> removed from the cache would still be fine though.
>

I agree, the name is somewhat misleading. The idea was to make it easy
to potentially add more modes here in the future, so maybe something
a little more general like `is_accessible` would make sense?
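A rough sketch of the more general `is_accessible` variant as a pure query, assuming a simplified, hypothetical two-variant `MaintenanceType` in place of the real one in `pbs-api-types`. The body mirrors the patch in treating only 'offline' as inaccessible.

```rust
/// Simplified, hypothetical stand-in for pbs-api-types' MaintenanceType.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum MaintenanceType {
    ReadOnly,
    Offline,
}

struct MaintenanceMode {
    ty: MaintenanceType,
}

impl MaintenanceMode {
    /// Query only: reports whether the datastore may be accessed at all.
    /// Callers decide what to do with the answer, e.g. the cache code may
    /// evict the store once its last task has finished.
    fn is_accessible(&self) -> bool {
        // No `_` arm: adding a new maintenance type becomes a compile
        // error here until someone decides whether it blocks access.
        match self.ty {
            MaintenanceType::ReadOnly => true,
            MaintenanceType::Offline => false,
        }
    }
}
```

With this, the check in the Drop handler would read `.map_or(false, |m| !m.is_accessible())`, and other call sites can reuse the predicate without it being tied to cache clearing.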

> > +        self.ty == MaintenanceType::Offline
> > +    }
> > +
> >      pub fn check(&self, operation: Option<Operation>) -> Result<(), Error> {
> >          if self.ty == MaintenanceType::Delete {
> >              bail!("datastore is being deleted");
> > diff --git a/pbs-datastore/src/datastore.rs b/pbs-datastore/src/datastore.rs
> > index 2f0e5279..f26dff83 100644
> > --- a/pbs-datastore/src/datastore.rs
> > +++ b/pbs-datastore/src/datastore.rs
> > @@ -104,8 +104,27 @@ impl Clone for DataStore {
> >  impl Drop for DataStore {
> >      fn drop(&mut self) {
> >          if let Some(operation) = self.operation {
> > -            if let Err(e) = update_active_operations(self.name(), operation, -1) {
> > -                log::error!("could not update active operations - {}", e);
> > +            let mut last_task = false;
> > +            match update_active_operations(self.name(), operation, -1) {
> > +                Err(e) => log::error!("could not update active operations - {}", e),
> > +                Ok(updated_operations) => {
> > +                    last_task = updated_operations.read + updated_operations.write == 0;
> > +                }
> > +            }
> > +
> > +            // remove datastore from cache iff 
> > +            //  - last task finished, and
> > +            //  - datastore is in a maintenance mode that mandates it
> > +            let remove_from_cache = last_task
> > +                && pbs_config::datastore::config()
> > +                    .and_then(|(s, _)| s.lookup::<DataStoreConfig>("datastore", self.name()))
> > +                    .map_or(false, |c| {
> > +                        c.get_maintenance_mode()
> > +                            .map_or(false, |m| m.clear_from_cache())
> > +                    });
> > +
> > +            if remove_from_cache {
> > +                DATASTORE_MAP.lock().unwrap().remove(self.name());
> >              }
> >          }
> >      }
> > @@ -193,6 +212,24 @@ impl DataStore {
> >          Ok(())
> >      }
> >  
> > +    /// trigger clearing cache entries based on maintenance mode. Entries will only
> > +    /// be cleared iff there is no other task running, if there is, the end of the
> > +    /// last running task will trigger the clearing of the cache entry.
> > +    pub fn update_datastore_cache() -> Result<(), Error> {
>
> why does this work on all datastores rather than a single one? After all, we always
> want to remove a specific one.
>

Actually, I just missed that our command_socket also supports arguments; I will
update this in v3.

> > +        let (config, _digest) = pbs_config::datastore::config()?;
> > +        for (store, (_, _)) in &config.sections {
> > +            let datastore: DataStoreConfig = config.lookup("datastore", store)?;
> > +            if datastore
> > +                .get_maintenance_mode()
> > +                .map_or(false, |m| m.clear_from_cache())
> > +            {
> > +                let _ = DataStore::lookup_datastore(store, Some(Operation::Lookup));
>
> A comment that the actual removal from the cache happens through the drop handler
> would be good, as this is a bit too subtle for my taste. If one stumbles over this
> a few months down the line, it might cause some easily avoidable head
> scratching...
>
> Alternatively, factor the actual check-maintenance-mode-and-remove-from-cache logic
> out of the drop handler and call it explicitly here; the only outside information
> you need there is the name anyway.

I think that would entail having to open the file twice in the drop
handler, once for updating it and once for reading it. But just
reading it here and explicitly clearing the entry from the cache seems
reasonable; it makes it much clearer what's happening. I'll change that
in v3.
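A rough sketch of how such a split could look: one shared eviction check, plus a per-datastore command handler that takes the store name as its argument and removes the entry explicitly. `StoreConfig`, `ActiveOperations`, `CachedStore`, `must_evict` and `handle_update_cache_command` are all hypothetical stand-ins, not names from the patch; real code would read the config and the task-tracking counts from their files.

```rust
use std::collections::HashMap;

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum MaintenanceType {
    ReadOnly,
    Offline,
}

// Hypothetical stand-ins for the datastore config entry and the
// task-tracking counts of a single store.
struct StoreConfig {
    maintenance: Option<MaintenanceType>,
}

struct ActiveOperations {
    read: i64,
    write: i64,
}

/// Hypothetical cached state; dropping it is what closes the store's files.
struct CachedStore;

/// Shared check for both call sites: evict iff no task is running and the
/// maintenance mode requires all files to be closed.
fn must_evict(ops: &ActiveOperations, config: &StoreConfig) -> bool {
    ops.read + ops.write == 0 && config.maintenance == Some(MaintenanceType::Offline)
}

/// Hypothetical command-socket handler for the datastore `name`. The
/// removal is explicit instead of a side effect of dropping a throwaway
/// lookup. Returns whether an entry was removed.
fn handle_update_cache_command(
    cache: &mut HashMap<String, CachedStore>,
    name: &str,
    config: &StoreConfig,
    ops: &ActiveOperations,
) -> bool {
    if must_evict(ops, config) {
        cache.remove(name).is_some()
    } else {
        // Tasks are still running: the last one's Drop does the removal.
        false
    }
}
```

In the Drop handler, the same `must_evict` check could be fed with the counts that `update_active_operations` already returns, which would sidestep having to open the file a second time there.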

Thanks for the review!
