[pbs-devel] [PATCH proxmox-backup v9 36/46] api/datastore: implement refresh endpoint for stores with s3 backend

Mon Jul 21 16:48:17 CEST 2025

On Mon Jul 21, 2025 at 4:42 PM CEST, Christian Ebner wrote:
> On 7/21/25 4:31 PM, Hannes Laimer wrote:
>> On Mon Jul 21, 2025 at 4:26 PM CEST, Christian Ebner wrote:
>>> On 7/21/25 4:16 PM, Hannes Laimer wrote:
>>>> On Sat Jul 19, 2025 at 2:50 PM CEST, Christian Ebner wrote:
>>>>> Allows to easily refresh the contents on the local cache store for
>>>>> datastores backed by an S3 object store.
>>>>>
>>>>> In order to guarantee that no read or write operations are ongoing,
>>>>> the store is first set into the maintenance mode `S3Refresh`. Objects
>>>>> are then fetched into a temporary directory to avoid loosing contents
>>>>> and consistency in case of an error. Once all objects have been
>>>>> fetched, clears out existing contents and moves the newly fetched
>>>>> contents in place.
>>>>>
>>>>> Signed-off-by: Christian Ebner <c.ebner at proxmox.com>
>>>>> ---
>>>>> changes since version 8:
>>>>> - refactor s3 refresh into more compact methods
>>>>> - drop un-necessary drop(_lock)
>>>>> - use missing tokio::task::spawn_blocking context for blocking
>>>>>     maintenance mode setting
>>>>>
>>>>>    pbs-datastore/src/datastore.rs | 175 +++++++++++++++++++++++++++++++++
>>>>>    src/api2/admin/datastore.rs    |  34 +++++++
>>>>>    2 files changed, 209 insertions(+)
>>>>>
>>>>> diff --git a/pbs-datastore/src/datastore.rs b/pbs-datastore/src/datastore.rs
>>>>> index a524d7b32..b2af05eac 100644
>>>>> --- a/pbs-datastore/src/datastore.rs
>>>>> +++ b/pbs-datastore/src/datastore.rs
>>>>> @@ -10,6 +10,7 @@ use anyhow::{bail, format_err, Context, Error};
>>>>>    use http_body_util::BodyExt;
>>>>>    use nix::unistd::{unlinkat, UnlinkatFlags};
>>>>>    use pbs_tools::lru_cache::LruCache;
>>>>> +use tokio::io::AsyncWriteExt;
>>>>>    use tracing::{info, warn};
>>>>>    
>>>>>    use proxmox_human_byte::HumanByte;
>>>>> @@ -2200,4 +2201,178 @@ impl DataStore {
>>>>>        pub fn old_locking(&self) -> bool {
>>>>>            *OLD_LOCKING
>>>>>        }
>>>>> +
>>>>> +    /// Set the datastore's maintenance mode to `S3Refresh`, fetch from S3 object store, clear and
>>>>> +    /// replace the local cache store contents. Once finished disable the maintenance mode again.
>>>>> +    /// Returns with error for other datastore backends without setting the maintenance mode.
>>>>> +    pub async fn s3_refresh(self: &Arc<Self>) -> Result<(), Error> {
>>>>> +        match self.backend()? {
>>>>> +            DatastoreBackend::Filesystem => bail!("store '{}' not backed by S3", self.name()),
>>>>> +            DatastoreBackend::S3(s3_client) => {
>>>>> +                let self_clone = Arc::clone(self);
>>>>> +                tokio::task::spawn_blocking(move || {
>>>>> +                    self_clone.maintenance_mode(Some(MaintenanceMode {
>>>>> +                        ty: MaintenanceType::S3Refresh,
>>>>> +                        message: None,
>>>>> +                    }))
>>>>> +                })
>>>>> +                .await?
>>>>> +                .context("failed to set maintenance mode")?;
>>>>
>>>> I think we should hold the config lock, so it can't be changed while we
>>>> refresh, no?
>>>
>>> Yes, but that is handled by the method itself, also to limit lock scope.
>>>
>>> See further below...
>>>
>> 
>> maybe I'm missing something, but the limited scope is what I mean. I
>> think we should try to prevent changing the maintenance mode away from
>> `S3Refresh` before we're done, so basically holding the lock while we
>> refresh.
>
> No, the intention here is to allow the user to manually clear the 
> maintenance mode again, as the s3 refresh might fail. So the refresh 
> manually sets it to guarantee consistency and clears it if finished.
>
> If the lock on the config is kept for the whole refresh, than the user 
> will never be able to recover from an error.

ACK, you is right!