[pbs-devel] [PATCH proxmox-backup RFC 00/10] introduce typestate for datastore/chunkstore

Wolfgang Bumiller w.bumiller at proxmox.com
Wed Sep 4 09:34:41 CEST 2024


On Tue, Sep 03, 2024 at 02:33:51PM GMT, Hannes Laimer wrote:
> This patch series introduces two traits, CanRead and CanWrite, to define whether
> a datastore reference is readable, writable, or neither. Functions that read
> or write are now implemented in `impl<T: CanRead>` or `impl<T: CanWrite>` blocks, ensuring
> that they are only available to references that are supposed to read/write.
> 
> Motivation:
> Currently, we track the number of read/write references of a datastore, but we don't
> track Lookup operations, as they don't read or write. They still need a chunkstore, so
> even though they don't necessarily do IO directly, they hold an open file handle.
> This is a problem for things like unmounting. Currently, lookup operations are only
> short-lived, so you'd need really unlucky timing to actually run into problems, but still,
> if a datastore is in "offline" maintenance mode, we shouldn't open file handles on it.
> 
> By encoding state in the type:
> 1. We can assign non-readable/writable references for lookup operations.
> 2. The compiler ensures correct usage of references. Since it is easy to miss
>     what might happen a few function calls down the line, having the compiler
>     yell at you for easily missed things like this is a really good thing,
>     I think.
> 
> Changes:
> * Added CanRead and CanWrite traits.
> * Separated functions into impl<T: CanRead> or impl<T: CanWrite>.
> * Introduced three new datastore lookup functions that return concrete types implementing
>    CanRead, CanWrite, or neither.
> * Renamed lookup_datastore() to open_datastore() and made it private.
> 
> The main downside is needing separate datastore caches for read and write references due to
> concrete type requirements in the cache HashMap.
> 
> Almost all changes are either adding generics or moving functions into the appropriate
> trait implementations. The logic itself is only touched twice, once in datastore_lookup()
> and once in check_privs_and_load_store() in /api/admin/datastore. That function now only
> checks the privs; the datastore is opened in the endpoint function directly.

So apart from some details (like sealing the marker traits and some
whitespace issues between the patches etc.), I'd like to get some
generic feedback from the others here.
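
To illustrate what I mean by sealing, here is a minimal sketch. The type and
method names (`DataStore`, `has_chunk`, `insert_chunk`, the `Lookup`/`Read`/`Write`
markers) are made up for the example and not taken from the series, and
having `CanWrite` imply `CanRead` is an assumption on my side:

```rust
use std::marker::PhantomData;

mod sealed {
    // The trait is `pub`, but the module is private: other crates cannot
    // name it, so they cannot implement CanRead/CanWrite for their own types.
    pub trait Sealed {}
}

pub trait CanRead: sealed::Sealed {}
// Assumption for this sketch: a writable reference may also read.
pub trait CanWrite: CanRead {}

// Marker types encoding the state. `Lookup` implements neither trait.
pub struct Lookup;
pub struct Read;
pub struct Write;

impl sealed::Sealed for Read {}
impl sealed::Sealed for Write {}
impl CanRead for Read {}
impl CanRead for Write {}
impl CanWrite for Write {}

// Hypothetical stand-in for a datastore handle.
pub struct DataStore<T> {
    name: String,
    chunks: Vec<String>,
    _marker: PhantomData<T>,
}

// Available in every state, including Lookup.
impl<T> DataStore<T> {
    pub fn new(name: &str) -> Self {
        Self {
            name: name.to_string(),
            chunks: Vec::new(),
            _marker: PhantomData,
        }
    }

    pub fn name(&self) -> &str {
        &self.name
    }
}

// Only available to readable references.
impl<T: CanRead> DataStore<T> {
    pub fn has_chunk(&self, digest: &str) -> bool {
        self.chunks.iter().any(|c| c == digest)
    }
}

// Only available to writable references.
impl<T: CanWrite> DataStore<T> {
    pub fn insert_chunk(&mut self, digest: &str) {
        self.chunks.push(digest.to_string());
    }
}
```

With this, calling `has_chunk()` on a `DataStore<Lookup>` or `insert_chunk()`
on a `DataStore<Read>` is rejected at compile time, and nobody outside the
crate can add further "readable" or "writable" marker types.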

The main fear I'm having here is that it might increase codegen time,
but IMO there are a bunch of methods where it should be easy to manually
monomorphise by just moving the actual logic into a simple non-generic
`fn` right inside the method body.
While this is definitely additional work, keep in mind that most of
*this* patch set is just moving code between different impl<> blocks, so
the changes aren't as huge as they appear.
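
A rough sketch of the inner-`fn` approach, assuming a hypothetical
`ChunkStore<T>` with a `chunk_path()` method (the names and the path layout
are only for illustration, not the actual code):

```rust
use std::marker::PhantomData;
use std::path::{Path, PathBuf};

pub trait CanRead {}
pub struct Read;
impl CanRead for Read {}

// Hypothetical stand-in for a chunk store handle.
pub struct ChunkStore<T> {
    base: PathBuf,
    _marker: PhantomData<T>,
}

impl<T> ChunkStore<T> {
    pub fn new(base: impl Into<PathBuf>) -> Self {
        Self {
            base: base.into(),
            _marker: PhantomData,
        }
    }
}

impl<T: CanRead> ChunkStore<T> {
    pub fn chunk_path(&self, digest: &[u8; 32]) -> PathBuf {
        // The inner fn does not depend on `T`, so its body is compiled
        // once. Only the thin outer wrapper is instantiated per marker type.
        fn inner(base: &Path, digest: &[u8; 32]) -> PathBuf {
            let hex: String = digest.iter().map(|b| format!("{b:02x}")).collect();
            let prefix = &hex[..4];
            base.join(".chunks").join(prefix).join(&hex)
        }
        inner(&self.base, digest)
    }
}
```

The generic method stays as the public entry point, so callers don't change,
while the part that costs codegen time lives in the non-generic helper.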

Additionally, I wouldn't consider having to separate the cache into read
and write caches a downside.
Note on the locking: the process locker - which is one current source of
"dangerous uses" - has a FIXME comment about switching to `OFD` locks
once they are available in the `nix` crate - which they already are
(also, could've just used libc back then?). This might give us a chance
to make the locking generally less error-prone as well, but will need
more detailed analysis. If we *can* make that switch, then e.g. creating
another `ChunkStore` instance would cease to be dangerous.

All this to say: I do like this change. I was at first a bit shocked
that the marker traits ended up seeping through into even the
`BackupDir` & friends, but since they are *handles* to the dirs, and not
just the mere names, I think this is actually fine, and we get
additional safety from the compiler.

Some compile-time benchmarks before & after would be nice, though ;-)



