[pve-devel] [RFC proxmox 4/7] cache: add new crate 'proxmox-cache'

Lukas Wagner l.wagner at proxmox.com
Tue Aug 22 13:33:44 CEST 2023


Thanks for the review! Comments inline.

On 8/22/23 12:08, Max Carrara wrote:
> On 8/21/23 15:44, Lukas Wagner wrote:
>> For now, it contains a file-backed cache with expiration logic.
>> The cache should be safe to be accessed from multiple processes at
>> once.
>>
> 
> This seems pretty neat! The cache implementation seems straightforward
> enough. I'll see if I can test it more thoroughly later.
> 
> However, in my opinion we should have a crate like
> "proxmox-collections" (or something of the sort) with modules for each
> data structure / collection similar to the standard library; I'm
> curious what others think about that. imo it would be a great
> opportunity to introduce that crate in this series, since you're
> already introducing one for the cache anyway.
> 
> So, proxmox-collections would look something like this:
> 
>    proxmox-collections
>    └── src
>        ├── cache
>        │   ├── mod.rs
>        │   └── shared_cache.rs
>        └── lib.rs
> 
> Let me know what you think!
> 

I guess this would make sense. I'm not sure whether the crate will gain 
any other data structures soon, but going in that direction seems reasonable.
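In lib.rs that would then roughly boil down to something like this (just 
spelling out the suggested layout, names not final):

    // proxmox-collections/src/lib.rs (sketch)
    pub mod cache;

    // re-export so users can write proxmox_collections::SharedCache
    pub use cache::shared_cache::SharedCache;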


(...)

>> +    ///
>> +    /// Expired entries will *not* be returned.
>> +    fn get<S: AsRef<str>>(&self, key: S) -> Result<Option<Value>, Error>;
>> +}
> 
> I don't necessarily think that a trait would be necessary in this
> case, as there's not really any other structure (that can be used as
> caching mechanism) that you're abstracting over. (more below)
> 

Yes, you are right. Clear case of premature optimi... refactoring ;)

>> diff --git a/proxmox-cache/src/shared_cache.rs b/proxmox-cache/src/shared_cache.rs
>> new file mode 100644
>> index 0000000..be6212c
>> --- /dev/null
>> +++ b/proxmox-cache/src/shared_cache.rs
>> @@ -0,0 +1,263 @@
>> +use std::path::{Path, PathBuf};
>> +
>> +use anyhow::{bail, Error};
>> +use serde::{Deserialize, Serialize};
>> +use serde_json::Value;
>> +
>> +use proxmox_schema::api_types::SAFE_ID_FORMAT;
>> +use proxmox_sys::fs::CreateOptions;
>> +
>> +use crate::{Cache, DefaultTimeProvider, TimeProvider};
>> +
>> +/// A simple, file-backed cache that can be used from multiple processes concurrently.
>> +///
>> +/// Cache entries are stored as individual files inside a base directory. For instance,
>> +/// a cache entry with the key 'disk_stats' will result in a file 'disk_stats.json' inside
>> +/// the base directory. As the extension implies, the cached data will be stored as a JSON
>> +/// string.
>> +///
>> +/// For optimal performance, `SharedCache` should have its base directory in a `tmpfs`.
>> +///
>> +/// ## Key Space
>> +/// Due to the fact that cache keys are being directly used as filenames, they have to match the
>> +/// following regular expression: `[A-Za-z0-9_][A-Za-z0-9._\-]*`
>> +///
>> +/// ## Concurrency
>> +/// All cache operations are based on atomic file operations, thus accessing/updating the cache from
>> +/// multiple processes at the same time is safe.
>> +///
>> +/// ## Performance
>> +/// On a tmpfs:
>> +/// ```sh
>> +///   $ cargo run --release --example=performance
>> +///   inserting 100000 keys took 896.609758ms (8.966µs per key)
>> +///   getting 100000 keys took 584.874842ms (5.848µs per key)
>> +///   deleting 100000 keys took 247.742702ms (2.477µs per key)
>> +/// ```
>> +///
>> +/// Inserting/getting large objects might of course result in lower performance due to the cost
>> +/// of serialization.
>> +///
>> +pub struct SharedCache {
>> +    base_path: PathBuf,
>> +    time_provider: Box<dyn TimeProvider>,
>> +    create_options: CreateOptions,
>> +}
> 
> Instead, this should be generic:
> 
> pub struct SharedCache<K, V> { ... }

True, I could use `K: AsRef<str>` and `V: Serialize + Deserialize` there.

But yeah, since this is just an RFC to get feedback on the overall concept, 
some of the implementation details are not completely fleshed out - partly 
on purpose and partly due to oversight.
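Roughly, I'd imagine something like this (just a sketch to illustrate the 
direction - bounds and signatures are not final, e.g. `Deserialize` would 
probably end up as `DeserializeOwned`, and the `expires_in` parameter is 
just how I picture it right now):

    use std::marker::PhantomData;
    use std::path::PathBuf;

    use anyhow::Error;
    use serde::{de::DeserializeOwned, Serialize};

    use proxmox_sys::fs::CreateOptions;

    pub struct SharedCache<K, V> {
        base_path: PathBuf,
        create_options: CreateOptions,
        _marker: PhantomData<fn(K) -> V>,
    }

    impl<K: AsRef<str>, V: Serialize + DeserializeOwned> SharedCache<K, V> {
        /// Serialize `value` to JSON and atomically replace the cache file.
        pub fn set(&self, key: K, value: &V, expires_in: Option<i64>) -> Result<(), Error> {
            todo!()
        }

        /// Read and deserialize the cache file for `key`, honoring expiration.
        pub fn get(&self, key: K) -> Result<Option<V>, Error> {
            todo!()
        }
    }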

> 
> .. and maybe rename it to SharedFileCache to make it explicit that this
> operates on a file. (but that's more dependent on one's taste tbh)
> 
Actually, I originally named it `SharedFileCache`, but then changed it to 
`SharedCache`, because the former sounds a bit like it caches *files* 
rather than values - at least in my head.

(...)
> ... can be replaced as follows, in order to make it similar to
> std::collections::{HashMap, BTreeMap}:
> 
> impl<K: AsRef<str>> for SharedCache<K, Value> {
>      // Returns old value on successful insert, if given
>      fn insert(&self, k: K, v: Value) -> Result<Option<Value>, Error> {
>          // ...
>      }
> 
>      fn get(&self, k: K) -> Result<Option<Value>, Error> {
>          // ...
>      }
> 
>      fn remove(&self, k: K) -> Result<Option<Value>, Error> {
>          // ...
>      }
> }
> 
> If necessary / sensible, other methods (inspired by {HashMap, BTreeMap} can
> be added as well, such as remove_entry, retain, clear, etc.
> 

I don't have any hard feelings regarding the naming, but not returning a 
Value from `delete` was a conscious decision - we simply don't need it 
right now, and I don't want to deserialize a value just to throw it away.
Also, reading *and* deleting at the same time *might* introduce the need 
for file locking - although I'm not completely sure about that yet.

If we ever need a `remove` that also returns the value, we could just 
introduce a second method, e.g. `take`.
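To make the distinction concrete, it could look roughly like this (sketch 
only, building on the existing helpers in shared_cache.rs - `take` does 
not exist in the patch, and error handling/locking details are omitted):

    impl SharedCache {
        /// Remove a cached item without looking at its contents -
        /// just unlink the file, no deserialization needed.
        pub fn delete(&self, key: &str) -> Result<(), Error> {
            let path = self.get_path_for_key(key)?;
            std::fs::remove_file(path)?;
            Ok(())
        }

        /// Hypothetical later addition: return the value *and* remove it.
        /// This is where reading and deleting at the same time might
        /// require file locking to stay race-free.
        pub fn take(&self, key: &str) -> Result<Option<Value>, Error> {
            let value = self.get(key)?;
            self.delete(key)?;
            Ok(value)
        }
    }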
> 
>> +
>> +impl SharedCache {
>> +    pub fn new<P: AsRef<Path>>(base_path: P, options: CreateOptions) -> Result<Self, Error> {
>> +        proxmox_sys::fs::create_path(
>> +            base_path.as_ref(),
>> +            Some(options.clone()),
>> +            Some(options.clone()),
>> +        )?;
>> +
>> +        Ok(SharedCache {
>> +            base_path: base_path.as_ref().to_owned(),
>> +            time_provider: Box::new(DefaultTimeProvider),
>> +            create_options: options,
>> +        })
>> +    }
>> +
>> +    fn enforce_safe_key(key: &str) -> Result<(), Error> {
>> +        let safe_id_regex = SAFE_ID_FORMAT.unwrap_pattern_format();
>> +        if safe_id_regex.is_match(key) {
>> +            Ok(())
>> +        } else {
>> +            bail!("invalid key format")
>> +        }
>> +    }
>> +
>> +    fn get_path_for_key(&self, key: &str) -> Result<PathBuf, Error> {
>> +        Self::enforce_safe_key(key)?;
>> +        let mut path = self.base_path.join(key);
>> +        path.set_extension("json");
>> +        Ok(path)
>> +    }
>> +}
>> +
>> +#[derive(Debug, Clone, Serialize, Deserialize)]
>> +struct CachedItem {
>> +    value: Value,
>> +    added_at: i64,
>> +    expires_in: Option<i64>,
>> +}
>> +
> 
> ... and for completion's sake: This can stay, as it's specific to the
> alternative implementation I've written above.
> 
> All in all, I think this would make your implementation more flexible.
> Let me know what you think!
> 


-- 
- Lukas




