[pve-devel] [PATCH storage/zfsonlinux v2 0/3] fix #4997: lvm: avoid autoactivating (new) LVs after boot
Friedrich Weber
f.weber at proxmox.com
Mon Mar 10 15:01:05 CET 2025
On 07/03/2025 13:14, Fabian Grünbichler wrote:
> [...]
>> # Mixed 7/8 cluster
>>
>> Unfortunately, we need to consider the mixed-version cluster between PVE 7 and
>> PVE 8 because PVE 7/Bullseye's LVM does not know `--setautoactivation`. A user
>> upgrading from PVE 7 will temporarily have a mixed 7/8 cluster. Once this
>> series is applied, the PVE 8 nodes will create new LVs with
>> `--setautoactivation n`, which the PVE 7 nodes do not know. In my tests, the
>> PVE 7 nodes can read/interact with such LVs just fine, *but*: As soon as a PVE
>> 7 node creates a new (unrelated) LV, the `--setautoactivation n` flag is reset
>> to default `y` on *all* LVs of the VG. I presume this is because creating a new
>> LV rewrites metadata, and the PVE 7 LVM doesn't write out the
>> `--setautoactivation n` flag. I imagine (have not tested) this will cause
>> problems on a mixed cluster.
>>
>> Hence, it may be safer to hold off on applying patches #2/#3 until PVE 9,
>> because then we can be sure all nodes run at least PVE 8.
>
> wow - I wonder what other similar timebombs are lurking there??
Huh, good question, ...
>
> e.g., for the changelog for bullseye -> bookworm:
>
> 2.03.15: "Increase some hash table size to better support large device sets." (hopefully those are in memory?)
> 2.03.14: various VDO things (not used by us at least)
> 2.03.12: "Allow attaching cache to thin data volume." and other thin-related changes (not relevant for shared thankfully), auto activation, more VDO things
>
> bookworm -> trixie:
>
> 2.03.28: various things using radix_trees now (hopefully in memory?)
> 2.03.25: same
> 2.03.24: "Support external origin between different thin-pool." & "Support creation of thin-pool with VDO use for its data volume" (local only thankfully),
> 2.03.23: "Set the first lv_attr flag for raid integrity images to i or I." (not used by us, maybe in memory only)
> 2.03.22: "Support parsing of vdo geometry format version 4.", "Fix parsing of VDO metadata.", "Allow snapshots of raid+integrity LV." (all not applicable to regular PVE setups)
> 2.03.18: "Add support for writecache metadata_only and pause_writeback settings." (?)
This is probably referring to the new lvmcache options [1]; it shouldn't
be relevant for our default setup, I guess.
>
> so at least nothing that jumps out as definitely problematic for our common setups..
... thanks for taking a look. I didn't see anything problematic either.
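(Side note on the flag reset described above: it can be observed by
watching the autoactivation report field on the PVE 8 node's LVs while a
PVE 7 node creates a new LV, e.g. -- field name from memory, IIRC it
prints "enabled" or nothing:

    lvs -o lv_name,autoactivation <vg>

)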
>> # Interaction with zfs-initramfs
>>
>> One complication is that zfs-initramfs ships an initramfs-tools script that
>> unconditionally activates *all* VGs that are visible at boot time, see [2].
>> This is the case for local VGs, but also for shared VGs on FC/SAS. Patch #2 of
>> this series fixes this by making the script perform autoactivation instead,
>> which honors the `--setautoactivation` flag. The patch is necessary to fix
>> #4997 on FC/SAS-attached LUNs too.
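(For context: the difference between plain activation and autoactivation
is essentially `vgchange -ay` vs `vgchange -aay` -- only the latter
honors the `--setautoactivation` flag. So conceptually the change is
along the lines of

    # before: activate all visible VGs unconditionally
    vgchange -ay
    # after: autoactivate only, which honors --setautoactivation n
    vgchange -aay

though the actual invocation in the initramfs script looks a bit
different.)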
>>
>> # Bonus fix for FC/SAS multipath+LVM issue
>>
>> As it turns out, this series seems to additionally fix an issue on hosts with
>> LVM on FC/SAS-attached LUNs *with multipath* where LVM would report "Device
>> mismatch detected" warnings because the LVs are activated too early in the boot
>> process before multipath is available. Our current suggested workaround is to
>> install multipath-tools-boot [2]. With this series applied, this shouldn't be
>> necessary anymore, as (newly created) LVs are not auto-activated after boot.
>>
>> # LVM-thick/LVM-thin
>>
>> Note that this change affects all LVs on LVM-thick, not just ones on shared
>> storage. As a result, also on single-node hosts, local guest disk LVs on
>> LVM-thick will not be automatically active after boot anymore (after applying
>> all patches of this series). Guest disk LVs on LVM-thin will still be
>> auto-activated, but since LVM-thin storage is necessarily local, we don't run
>> into #4997 here.
>
> we could check the shared property, but I don't think having them not
> auto-activated hurts as long as it is documented..
This is referring to LVs on *local* LVM-*thick* storage, right? In that
case, I'd agree that not having them autoactivated either is okay
(cleaner even).
The patch series currently doesn't touch the LvmThinPlugin at all, so
all LVM-*thin* LVs will still be auto-activated at boot. We could also
patch LvmThinPlugin to create new thin LVs with `--setautoactivation n`
-- though it wouldn't give us much, except consistency with LVM-thick.
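(That would presumably just mean passing the flag at creation time as
well, e.g. something like -- untested, names made up:

    lvcreate --setautoactivation n -V 32G --thinpool data -n vm-100-disk-0 <vg>

)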
>> # Transition to LVs with `--setautoactivation n`
>>
>> Both v1 and v2 approaches only take effect for new LVs, so we should probably
>> also have pve8to9 check for guest disks on (shared?) LVM that have
>> autoactivation enabled, and suggest to the user to manually disable
>> autoactivation on the LVs, or even the entire VG if it holds only PVE-managed
>> LVs.
>
> if we want to wait for PVE 9 anyway to start enabling (disabling? ;)) it, then
> the upgrade script would be a nice place to tell users to fix up their volumes?
The upgrade script being pve8to9, right? I'm just not sure yet what to
suggest: `lvchange --setautoactivation n` on each LV, or simply
`vgchange --setautoactivation n` on the whole shared VG (provided it
only contains PVE-managed LVs).
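Either way, the pve8to9 check itself could probably be based on the
autoactivation report field, roughly like this (sketch, assuming the
usual vm-/base-<vmid>-disk-* guest volume naming):

    # list guest disk LVs that still have autoactivation enabled
    lvs --noheadings -o vg_name,lv_name,autoactivation \
      | awk '$2 ~ /^(vm|base)-[0-9]+-disk-/ && $3 == "enabled"'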
> OTOH, setting the flag automatically starting with PVE 9 also for existing
> volumes should have no downsides, [...]
Hmm, but how would we do that automatically?
> we need to document anyway that the behaviour
> there changed (so that people that rely on them becoming auto-activated on boot
> can adapt whatever is relying on that).. or we could provide a script that does
> it post-upgrade..
Yes, an extra script to run after the upgrade might be an option. Though
we'd also need to decide whether to disable autoactivation on each
individual LV, or on the complete VG (in which case we'd just assume
there are no other, non-PVE-managed LVs in the VG that the user wants
autoactivated).
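A minimal sketch of the per-LV variant such a script could do (again
assuming the vm-/base-<vmid>-disk-* naming):

    lvs --noheadings -o vg_name,lv_name \
      | awk '$2 ~ /^(vm|base)-[0-9]+-disk-/ {print $1 "/" $2}' \
      | xargs -r -n1 lvchange --setautoactivation n

whereas the VG variant would just be a single
`vgchange --setautoactivation n <vg>`.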
>
>> We could implement something on top to make the transition smoother, some ideas:
>>
>> - When activating existing LVs, check the auto activation flag, and if auto
>> activation is still enabled, disable it.
>
> the only question is whether we want to "pay" for that on each activate_volume?
Good question. It does seem a little extreme, also considering that new
LVs are created with autoactivation disabled anyway, so once all existing
LVs have been taken care of, the check becomes obsolete.
It just occurred to me that we could also pass `--setautoactivation n`
to `lvchange -ay` in `activate_volume`, but a test shows that this
triggers a metadata update on *each activate_volume*, which sounds like
a bad idea.
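(A cheaper middle ground might be to only rewrite metadata if the flag is
actually still enabled, e.g. roughly -- field name from memory:

    # in activate_volume, alongside the lvchange -ay:
    if lvs --noheadings -o autoactivation "$vg/$lv" | grep -qw enabled; then
        lvchange --setautoactivation n "$vg/$lv"
    fi

so the extra metadata write happens at most once per LV. Still not sure
it's worth the extra lvs call on every activation, though.)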
>
>> - When creating a new VG via the LVM storage plugin or the /disks/lvm API
>> endpoints, disable autoactivation for the whole VG. But this may become
>> confusing as then we'd deal with the flag on both LVs and VGs. Also, especially
>> for FC/SAS-attached LUNs, users may not use the GUI/API and instead create the
>> VG via the command line. We could adjust our guides to use `vgcreate
>> --setautoactivation n` [4], but not all users may follow these guides.
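(For a manually created VG on top of a multipath device, that would be
something like

    vgcreate --setautoactivation n <vgname> /dev/mapper/<mpath-device>

)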
[1] https://manpages.debian.org/trixie/lvm2/lvmcache.7.en.html#metadata_only