[pve-devel] Proposal: support for atomic snapshot of all VM disks at once
Max Carrara
m.carrara at proxmox.com
Tue Oct 8 12:50:15 CEST 2024
On Fri Oct 4, 2024 at 3:54 PM CEST, Ivaylo Markov wrote:
> Greetings,
>
> I am the maintainer of StorPool’s external storage plugin for PVE[0]
> which integrates our storage solution as a backend for VM disks. Our
> software has the ability to create atomic (crash-consistent) snapshots
> of a group of storage volumes. I’d like to use this feature in our
> plugin so that customers can perform whole VM snapshots, but that does
> not seem possible currently - the snapshot creation method is called
> individually for every disk.
Hello!
This does sound quite interesting.
>
> I was directed here to discuss this proposal and my implementation idea
> after an initial post in Bugzilla[1]. The goal is to give storage
> plugins the option to perform atomic crash-consistent snapshots of the
> virtual disks associated with a virtual machine where the backend
> supports it (e.g. Ceph, StorPool, and ZFS) without affecting those
> without such a feature.
Since you mentioned that the backends without such a feature won't be
affected, will the disks of the storage types that *do* support it still
be addressable individually? As in: Would it be possible to run both
group snapshots and individual snapshots on a VM's disks? (I'm assuming
that they will, but still wanted to ask.)
>
> I would add a `can_snapshot_volume_group` method to the base
> `PVE::Storage::Plugin` class, which would accept an array of the VM’s
> disks, and return a binary result whether an atomic snapshot is
> possible. The default implementation would return 0, but plugins with
> support can override it based on backend capabilities. For example, ZFS
> supports atomic snapshot of volume groups, but requires all volumes to
> be in the same pool.
> The actual snapshot can be performed by a `snapshot_volume_group
> method`, which is not expected to be called unless the driver supports
> this operation.
For both of these methods you'll have to keep in mind that the
`...::Plugin` class is designed to only handle a single volume at once.
I'm not against implementing methods that support addressing multiple
volumes (disks) at once, though, but the `PVE::Storage` API must handle
this gracefully. See more down below.
>
> In `PVE::AbstractConfig::snapshot_create` these two methods can be used
> to check and perform the atomic snapshots where possible, otherwise it
> would keep the current behavior of creating a snapshot for each disk
> separately. If the VM has drives with completely different storage
> backends (e.g. ZFS and LVM for whatever reason), the driver check can be
> skipped, and the current behavior used again.
This sounds good to me, though one thing that should be noted here is
that everything should go through the `PVE::Storage` API, *not*
`PVE::Storage::Plugin`.
This means that there need to be API subroutines for the `PVE::Storage`
module that are able to handle all the cases you described above. These
subs then work with `PVE::Storage::Plugin::can_snapshot_volume_group`
and `...::snapshot_volume_group`.
In general, this looks something like this:
* `PVE::Storage::Plugin` defines interface for concrete storage plugin
implementations
* `PVE::Storage` isn't aware of the concrete implementations and
strictly uses only the `...::Plugin` interface
* `PVE::AbstractConfig` and others only use the API methods provided
by `PVE::Storage`, allowing us to abstract over many different types
of storages
Since you already got your own storage plugin, I assume you're aware of
all of this, but I wanted to mention it regardless to avoid any
misunderstandings.
Sooo, all of this means that this will also require subroutines in
`PVE::Storage` that expose this kind of functionality (checking if group
snapshots are possible, performing group snapshots, ...).
Overall, you'd have to handle the following cases in `PVE::Storage`,
from what I can think of at the moment:
* Whether there is more than one underlying storage type
(if there is, don't support group snapshots)
* Whether the underlying storage type supports group snapshots *in
general*
* Whether a group snapshot may actually be performed on the passed
constellation of volumes (disks)
* ...
You'll probably need to introduce more than the two methods in
`PVE::Storage::Plugin` you mentioned above in order to really support
group snapshots for each different storage type and handle all the
(edge) cases.
Though, I really like the idea overall; I don't see why we shouldn't
add this.
If you haven't already, you should have a look at this regarding
contributions: https://pve.proxmox.com/wiki/Developer_Documentation#Software_License_and_Copyright
>
>
> [0] https://github.com/storpool/pve-storpool
> [1] https://bugzilla.proxmox.com/show_bug.cgi?id=5752
>
> Best regards,
More information about the pve-devel
mailing list