[pve-devel] [RFC qemu-server] clone disk: fix handling of snapshot-as-volume-chain for EFI disks
Friedrich Weber
f.weber at proxmox.com
Mon Nov 10 18:06:30 CET 2025
On 07/11/2025 18:19, Friedrich Weber wrote:
> Currently, cloning an EFI disk off a snapshot-as-volume-chain snapshot
> fails with "qemu-img: Failed to load snapshot: Can't find snapshot".
> The reason is that the special case for EFI disks calls `qemu-img dd`
> with the additional `-l snapname` option, which is only valid for
> qcow2-internal snapshots. For snapshot-as-volume-chain snapshots, the
> source volume is already the volume corresponding to the snapshot.
>
> Fix this by checking whether the snapshot in question is an external
> snapshot, and if it is, omitting the `-l` option.
>
> Reported in enterprise support.
Turns out this only fixes the issue in case of snapshot-as-volume-chain
on file-level storages, not on LVM. The reason is that
LVMPlugin::volume_snapshot_info does not set the `ext` key to 1 [1], so
the code doesn't recognize the external snapshot.
LVMPlugin::volume_snapshot_info should probably set `ext` (assuming that
"ext" means "external")?
>
> Signed-off-by: Friedrich Weber <f.weber at proxmox.com>
> ---
>
> Notes:
> I noticed that two oddities about volume_snapshot_info:
>
> - it doesn't seem to work for qcow2-internal snapshots: It assumes
> that if `qemu img info` returns a JSON object, the snapshots are
> internal, and if it returns a JSON array, they are "external", but
> this doesn't seem correct -- I think because we pass
> --backing-chain, qemu-img will return an one-element JSON array even
> for internal snapshots. Hence, volume_snapshot_info incorrectly
> returns a hash ref with only a single member "current" if the
> snapshots are internal. But since I only found callsites guarded
> with snapshot-as-volume-chain, I guess this didn't hurt until now?
>
> It doesn't hurt for my patch either because it only checks for
> existence of $info->{$snapshot}->{ext} (which also works for
> qcow2-internal snapshots, because alreay $info->{$snapshot} will not
> exist then), but that's admittedly not nice and it's probably a good
> idea to fix volume_snapshot_info in addition.
>
> - but in which way should we fix it? Right now, for volume-chain
> snapshots, it seems like volume_snapshot_info will always have a
> member "current" pointing to the "actual" volume, even if the volume
> doesn't have any snapshots. Is this intended and do we want this
> also in case of qcow2-internal snapshots? From reading
> ZFSPoolPlugin::volume_snapshot_info, it looks like that one will
> return an empty hash (without "current") if the volume has no
> snapshots.
Looks like
- ZFSPoolPlugin::volume_snapshot_info indeed doesn't return "current"
- but there are several callsites of {LVM,}Plugin::volume_snapshot_info
that rely on 'current' being present for snapshot-as-volume chain [2]
So the implementations differ w.r.t. returning a "current" entry or not,
which is IMO a bit awkward API-wise. But fixing this in either way
(removing it for {LVM,}Plugin or adding it for ZFSPoolPlugin) would
probably require bigger changes in either the snapshot-as-volume-chain
or the ZFS replication code?
For fixing this particular bug, I guess we could instead call
`volume_qemu_snapshot_method` and only add the `-l snapname` parameter
if that returns 'qemu'. But I think this is only safe because we don't
allow enabling/disabling snapshot-as-volume-chain for file-level
storages on-the-fly.
What do you think?
[1]
https://git.proxmox.com/?p=pve-storage.git;a=blob;f=src/PVE/Storage/LVMPlugin.pm;h=3badfef2c;hb=a85ebe3#l849
[2]
https://git.proxmox.com/?p=qemu-server.git;a=blob;f=src/PVE/QemuServer/Blockdev.pm;h=8fa5eb51e;hb=b41c1e1440ff822#l420
More information about the pve-devel
mailing list