[pve-devel] [PATCH container v2 03/11] alloc_disk: fail fast if storage does not support content type rootdir

Tue Apr 15 16:19:10 CEST 2025

On April 15, 2025 3:31 pm, Wolfgang Bumiller wrote:
> On Tue, Apr 15, 2025 at 02:27:24PM +0200, Daniel Kral wrote:
>> On 2/20/25 13:15, Fiona Ebner wrote:
>> > I also noticed that we have no check against starting a container with
>> > volumes on a storage that does not support 'rootdir'. We have such a
>> > check for VMs IIRC. Prohibiting that would also be good, but maybe
>> > something for PVE 9 where we can also check for misconfigured
>> > containers/storages via the pve8to9 script up front so users can adapt.
>> 
>> I'm preparing the v3 for this now, but I just noticed there actually is a
>> assertion for this since e6da5357cc ("fix #3421: allow custom storage
>> plugins to support rootfs") if I'm not missing something here in
>> __mountpoint_mount(...).
>> 
>> What I don't yet understand is why there is no similar check for this in
>> __mountpoint_mount for subvolumes, e.g. I can't start the container if I
>> have a mountpoint on a directory storage without 'rootdir' support, but I
>> can do so if the mountpoint is on a zfs pool without 'rootdir' support.
>> 
>> Since starting the container results in
>> 
>> run_buffer: 571 Script exited with status 25
>> lxc_init: 845 Failed to run lxc.hook.pre-start for container "101"
>> __lxc_start: 2034 Failed to initialize container "101"
>> TASK ERROR: startup for container '101' failed
>> 
>> for the WebGUI, I'll try to squeeze in a patch to make the error message a
>> little more readable if there's something going wrong when mounting.
>> 
>> ---
>> 
>> On another note, I've also noticed that if the root disk / mountpoint is
>> already on a storage which does not support 'rootdir', the user is unable to
>> move it to another storage... Shouldn't we allow users to do that so they
>> can easily move out error states? Either way, this can be a follow-up anway,
>> so no need to make this patch series any longer.
> 
> The `rootdir` content type is generally a bit wonky currently.
> The problem is we're mixing content types and "allowed contents"
> together:
> 
> For ZFS and btrfs we use subvolumes which are their own *content type*:
> "subvol".

no? that's just the format, not the volume type.

> 
> For *other* directory storages, we have a special case for `size=0`
> where we have a directory of an "unlimited" size, which is also
> considered to be of *content type* "subvol".

same here

> In the other case we actually allocate the content type *image*,
> *BUT*(!!!) the Plugin.pm's default `list_volumes` implementation will
> artificially *name* it "rootdir" *if* the *associated VMID* is from a
> container by querying the VM list via `PVE::Cluster::get_vmlist()`.
> This is not something we can ask external storage plugin devs to do,
> IMO.

yes

> *rootdir* as an actual *content type* does not *really* exist in
> pve-storage, other than as a remnant from openvz times for directories
> under the `rootdir/` directory on a directory storage, which I'm fairly
> certain we don't support in pve-container...

it was actually `private`, and yes, this is not used anymore other than
if you happen to have a system upgraded from PVE < 4 that still has
images that are not usable in practice lying around..

> 
> This means that currently what is referred to as the "rootdir" content
> type is actually just the *permission* to put containers on the storage.
> 
> This is something we really need to fix up with PVE9 one way or another.
> Personally I'd argue the content type should disappear entirely.
> `list_volumes` should call use the correct type (images or subvol), and
> the rootdir content permission should (and always / indepenent from the
> storage and what *content type* we create) in pve-container's disk
> allocation code.

yes, there are basically three levels involved

- content type on the storage itself - this defines what you can put on
  the storage, and it properly differentiates between rootdir and images
- volume type as returned by parse_volname and consumed by other storage
  helpers (often abbreviated vtype) - this very often cannot
  differentiate whether a volume of type image is an image or a rootdir
  (this is something we want to change, but it's a tricky and long
  migration path)
- image format (raw, qcow2, subvol, ..)

also see Max' pve-storage series thread, where some of this came up
during review as well. but the TL;DR (for now) is: when looking at what
a storage can store, check for rootdir for containers. when looking
whether a volume has the expected type, check for rootdir or image. when
actually handling it, also check the format to decide what to do with
it.

the issue is that for properly differentiating between 'images' and
'rootdir' on the storage level, we need to split them both on the volid
level (so we can differentiate without asking the storage) and on the
storage level (so we can map back from the things we find on the storage
to a volid). and somehow handle transitioning from the current mess to a
cleanly separated state, and coordinate this with external plugins as
well. Fiona and me had some chats about that already in the past, maybe
we should sit down next week and make a concrete plan?