[pve-devel] [PATCH container v2 03/11] alloc_disk: fail fast if storage does not support content type rootdir

Wed Apr 16 10:19:58 CEST 2025

On Tue, Apr 15, 2025 at 04:19:10PM +0200, Fabian Grünbichler wrote:
> On April 15, 2025 3:31 pm, Wolfgang Bumiller wrote:
> > On Tue, Apr 15, 2025 at 02:27:24PM +0200, Daniel Kral wrote:
> >> On 2/20/25 13:15, Fiona Ebner wrote:
> >> > I also noticed that we have no check against starting a container with
> >> > volumes on a storage that does not support 'rootdir'. We have such a
> >> > check for VMs IIRC. Prohibiting that would also be good, but maybe
> >> > something for PVE 9 where we can also check for misconfigured
> >> > containers/storages via the pve8to9 script up front so users can adapt.
> >> 
> >> I'm preparing the v3 for this now, but I just noticed there actually is a
> >> assertion for this since e6da5357cc ("fix #3421: allow custom storage
> >> plugins to support rootfs") if I'm not missing something here in
> >> __mountpoint_mount(...).
> >> 
> >> What I don't yet understand is why there is no similar check for this in
> >> __mountpoint_mount for subvolumes, e.g. I can't start the container if I
> >> have a mountpoint on a directory storage without 'rootdir' support, but I
> >> can do so if the mountpoint is on a zfs pool without 'rootdir' support.
> >> 
> >> Since starting the container results in
> >> 
> >> run_buffer: 571 Script exited with status 25
> >> lxc_init: 845 Failed to run lxc.hook.pre-start for container "101"
> >> __lxc_start: 2034 Failed to initialize container "101"
> >> TASK ERROR: startup for container '101' failed
> >> 
> >> for the WebGUI, I'll try to squeeze in a patch to make the error message a
> >> little more readable if there's something going wrong when mounting.
> >> 
> >> ---
> >> 
> >> On another note, I've also noticed that if the root disk / mountpoint is
> >> already on a storage which does not support 'rootdir', the user is unable to
> >> move it to another storage... Shouldn't we allow users to do that so they
> >> can easily move out error states? Either way, this can be a follow-up anway,
> >> so no need to make this patch series any longer.
> > 
> > The `rootdir` content type is generally a bit wonky currently.
> > The problem is we're mixing content types and "allowed contents"
> > together:
> > 
> > For ZFS and btrfs we use subvolumes which are their own *content type*:
> > "subvol".
> 
> no? that's just the format, not the volume type.
> 
> > 
> > For *other* directory storages, we have a special case for `size=0`
> > where we have a directory of an "unlimited" size, which is also
> > considered to be of *content type* "subvol".
> 
> same here
> 
> > In the other case we actually allocate the content type *image*,
> > *BUT*(!!!) the Plugin.pm's default `list_volumes` implementation will
> > artificially *name* it "rootdir" *if* the *associated VMID* is from a
> > container by querying the VM list via `PVE::Cluster::get_vmlist()`.
> > This is not something we can ask external storage plugin devs to do,
> > IMO.
> 
> yes
> 
> > *rootdir* as an actual *content type* does not *really* exist in
> > pve-storage, other than as a remnant from openvz times for directories
> > under the `rootdir/` directory on a directory storage, which I'm fairly
> > certain we don't support in pve-container...
> 
> it was actually `private`, and yes, this is not used anymore other than
> if you happen to have a system upgraded from PVE < 4 that still has
> images that are not usable in practice lying around..
> 
> > 
> > This means that currently what is referred to as the "rootdir" content
> > type is actually just the *permission* to put containers on the storage.
> > 
> > This is something we really need to fix up with PVE9 one way or another.
> > Personally I'd argue the content type should disappear entirely.
> > `list_volumes` should call use the correct type (images or subvol), and
> > the rootdir content permission should (and always / indepenent from the
> > storage and what *content type* we create) in pve-container's disk
> > allocation code.
> 
> yes, there are basically three levels involved
> 
> - content type on the storage itself - this defines what you can put on
>   the storage, and it properly differentiates between rootdir and images
> - volume type as returned by parse_volname and consumed by other storage
>   helpers (often abbreviated vtype) - this very often cannot
>   differentiate whether a volume of type image is an image or a rootdir
>   (this is something we want to change, but it's a tricky and long
>   migration path)
> - image format (raw, qcow2, subvol, ..)
> 
> also see Max' pve-storage series thread, where some of this came up
> during review as well. but the TL;DR (for now) is: when looking at what
> a storage can store, check for rootdir for containers. when looking
> whether a volume has the expected type, check for rootdir or image. when
> actually handling it, also check the format to decide what to do with
> it.
> 
> the issue is that for properly differentiating between 'images' and
> 'rootdir' on the storage level, we need to split them both on the volid
> level (so we can differentiate without asking the storage) and on the
> storage level (so we can map back from the things we find on the storage
> to a volid). and somehow handle transitioning from the current mess to a
> cleanly separated state, and coordinate this with external plugins as
> well. Fiona and me had some chats about that already in the past, maybe
> we should sit down next week and make a concrete plan?

Yes we should. It would be good to clean this up for the trixie release.