[pve-devel] [PATCH-SERIES v3 pve-storage/qemu-server/pve-qemu] add external qcow2 snapshot support

DERUMIER, Alexandre alexandre.derumier at groupe-cyllene.com
Fri Jan 10 08:44:34 CET 2025


-------- Message initial --------
De: Fabian Grünbichler <f.gruenbichler at proxmox.com>
À: Proxmox VE development discussion <pve-devel at lists.proxmox.com>
Cc: Alexandre Derumier <alexandre.derumier at groupe-cyllene.com>
Objet: Re: [pve-devel] [PATCH-SERIES v3 pve-storage/qemu-server/pve-
qemu] add external qcow2 snapshot support
Date: 09/01/2025 15:13:14

> Alexandre Derumier via pve-devel <pve-devel at lists.proxmox.com> hat am
> 16.12.2024 10:12 CET geschrieben:

> This patch series implement qcow2 external snapshot support for files
> && lvm volumes
> 
> The current internal qcow2 snapshots have bad write performance
> because no metadatas can be preallocated.
> 
> This is particulary visible on a shared filesystem like ocfs2 or
> gfs2.
> 
> Also other bugs are freeze/lock reported by users since years on
> snapshots delete on nfs
> (The disk access seem to be frozen during all the delete duration)
> 
> This also open doors for remote snapshot export-import for storage
> replication.
>>
>>a few high level remarks:
>>- I am not sure whether we want to/can switch over to blockdev on the
>>fly (i.e., without some sort of opt-in phase to iron out kinks). what
>>about upgrades with running VMs? I guess some sort of flag and per-VM
>>switching would be a better plan..

I have tested live migration, and it's works for the small tests I have
done. (I was surprised myself). I'll try to do more longer test to be
100% sure that they are not corruption of datas.

 on the guest side, it's transparent. on qemu side, the devices and pci
plumbing is still the same, and qemu already use blockdev behind.

If needed, we could make a switch based on qemu version, or or manual
option.


>>- I am also not sure whether we want to mix and match internal and
>>external snapshots, we probably should require a storage to use it or
>>not at creation time and fix it, to allow us to keep the handling
>>code from exploding complexity-wise..

for qcow2, I have a "snapext" option on the storage, so you can't mix
internal/external at the same time.  
(We just need to allow to define the snapext option at storage create
only)

For lvm, only external snapshot are possible. (if format is qcow2)



>>- if you see a way to name the block graph nodes in a deterministic
>>fashion (i.e., have a 1:1 mapping between snapshot and block graph
>>node name) that would be wonderful, else we'd have to find another
>>way to improve the lookup there..

1:1 mapping with snapshot is not possible (I have tried it a lot),
because:
  - snapshot name can be too long (blockdev name is 31 characters max,
hash based on filename is difficult)
  - with external snapshot file renaming, this don't work  (snap-->
current). We can't rename a blockdev, so the mapping will drift.

  So, I don't think that it's possible to avoid lookup. (I really don't
have idea how to manage it).  
I'm not sure it's really a problem ?  it's just an extra qmp call, but
it's super fast.


>>- the relevant commits and code would really benefit from a summary
>>of the design/semantics, it's not easy to reconstruct this from the
>>code alone

ok will do

>>- some tests would be great as well, both to verify that the code
>>actually behaves like we expect, and also to catch regressions when
>>it is touched again in the future

yes, I was planning to add test, as we don't have too much tests on
qemu command line currently, this is a good time to add them.


>>- some performance numbers would also be great :)
>>-- internal vs external snapshots using the same underlying dir/fs
>>-- qcow2 vs qcow2 with snapshots vs raw on LVM using the same
>>underlying disk

I'll prepare different hardware san to compare.  (Note that it's not
only performance, but disk lock/freeze on snapshot rollback for example
with internal snapshot).

I'll try to rebench again ocfs2 && gfs2 too with external snapshot (as
metadatas preallocation is really important with them, and it's not
possible with internal snap)



>>- did you verify the space consumption on the SAN side with the LVM
>>plugin? would be interesting to know how it holds up in practice :)
Well, currently, it's reserving a full lvm volume for each snapshot.

If your storage support thin provisioning, its not too much a problem,
you can create a lun bigger than your physical storage. I have tried
with a netapp san, it's works.

but for low cost san, we thin provisioning not exist, I'm planning to
add dynamic extend fo small lvm.

I have send some test patch in september 2024,  qcow2 size can be
bigger than lvm size. (like qcow2=100G && lvm=1G), then it's possible
to send event with qemu on threshold usage and extend (through a custom
daemon) the lvm volume.
But I was planning to add this in a separated after that the external
snapshot code is commited.

>>I haven't done any real world testing yet since I expect that the
>>delta for the next version won't be small either ;)

Yes, the biggest part is done. (I Hope ^_^ )

I'll try to finish the blockdev convertion of some missing part (iscsi
path, uefi ,...).
The main blocking point for me is the proxmox backup code, I really
don't known how it's works.


> 
> Changelog v3:
>   storage:
>     - snapshots files now have the name of the snapshot, and
> "current" snapshot is the vm volname
>     - allow only qcow2 format for base image for simplication
>     - merge snapshot code in lvmplugin (qcow2 format auto enable
> snapshot)
>     - the code is a lot more simple now
> 
>   qemu-server:
>     - convertion -drive to modern -blockdev
>       This is needed for blockdev-reopen, where we need to create &&
> switch to snapshot files with same cache,aio,...
>       This is also needed to live rename snapshot files (to keep
> volname for current snapshot)
>       I have implemented && tested:
>          - disk create,delete,resize,convert,drive_mirror,
> hotplug,unplug,nbd mirror,cdrom insert/eject
>          - block protocol: file,block_device,rbd,nbd,glusterfs
>       Note that it's currently incomplete:
>          - proxmox backup/restore code need to be converted to
> blockdev, help is needed
>          - iscsi:// path is not yet implemented (I'll look for the
> v4)
>          - efi still in drive format (not blocking, but I'll look for
> v4)
> 
>       The live migration between -drive ---> --blockdev seem to work
> without breaking, and seem to be 
>       transparent for the guest ok, so I think we could try to target
> pve9 ?
> 
> 
> 
> storage.cfg example:
> 
> dir: local2
>         path /var/liv/vz
>         content snippets,vztmpl,backup,images,iso,rootdir
>         snapext 1
> 
> lvm:test
>         vgname test
>         content images
> 
> 
> 
> 
> 
> pve-storage:
> 
> Alexandre Derumier (3):
>   qcow2: add external snapshot support
>   lvmplugin: add qcow2 snapshot
>   storage: vdisk_free: remove external snapshots
> 
>  src/PVE/Storage.pm                 |  18 ++-
>  src/PVE/Storage/DirPlugin.pm       |   1 +
>  src/PVE/Storage/LVMPlugin.pm       | 231 ++++++++++++++++++++++++++-
> --
>  src/PVE/Storage/Plugin.pm          | 207 ++++++++++++++++++++++----
>  src/test/run_test_zfspoolplugin.pl |  18 +++
>  5 files changed, 424 insertions(+), 51 deletions(-)
> 
> 
> Alexandre Derumier (11):
>   blockdev: cmdline: convert drive to blockdev syntax
>   blockdev: fix cfg2cmd tests
>   blockdev : convert qemu_driveadd && qemu_drivedel
>   blockdev: vm_devices_list : fix block-query
>   blockdev: convert cdrom media eject/insert
>   blockdev: block_resize: convert to blockdev
>   blockdev: nbd_export: block-export-add : use drive-$id for nodename
>   blockdev: convert drive_mirror to blockdev_mirror
>   blockdev: mirror: change aio on target if io_uring is not default.
>   blockdev: add backing_chain support
>   qcow2: add external snapshot support
> 
>  PVE/QemuConfig.pm                             |   4 +-
>  PVE/QemuMigrate.pm                            |   2 +-
>  PVE/QemuServer.pm                             | 946 ++++++++++++++--
> --
>  test/cfg2cmd/bootorder-empty.conf.cmd         |  12 +-
>  test/cfg2cmd/bootorder-legacy.conf.cmd        |  12 +-
>  test/cfg2cmd/bootorder.conf.cmd               |  12 +-
>  ...putype-icelake-client-deprecation.conf.cmd |   6 +-
>  test/cfg2cmd/ide.conf.cmd                     |  23 +-
>  test/cfg2cmd/pinned-version-pxe-pve.conf.cmd  |   6 +-
>  test/cfg2cmd/pinned-version-pxe.conf.cmd      |   6 +-
>  test/cfg2cmd/pinned-version.conf.cmd          |   6 +-
>  test/cfg2cmd/q35-ide.conf.cmd                 |  23 +-
>  .../q35-linux-hostpci-template.conf.cmd       |   3 +-
>  test/cfg2cmd/seabios_serial.conf.cmd          |   6 +-
>  ...imple-balloon-free-page-reporting.conf.cmd |   6 +-
>  test/cfg2cmd/simple-btrfs.conf.cmd            |   6 +-
>  test/cfg2cmd/simple-virtio-blk.conf.cmd       |   6 +-
>  test/cfg2cmd/simple1-template.conf.cmd        |  11 +-
>  test/cfg2cmd/simple1.conf.cmd                 |   6 +-
>  19 files changed, 830 insertions(+), 272 deletions(-)
> 
> 
> pve-qemu:
> 
>   add block-commit-replaces option patch
> 
>  ...052-block-commit-add-replaces-option.patch | 137
> ++++++++++++++++++
>  debian/patches/series                         |   1 +
>  2 files changed, 138 insertions(+)
>  create mode 100644 debian/patches/pve/0052-block-commit-add-
> replaces-option.patch
> 
> -- 
> 
> 2.39.5




More information about the pve-devel mailing list