[pve-devel] Volume live migration concurrency

Fabian Grünbichler f.gruenbichler at proxmox.com
Wed May 28 09:06:53 CEST 2025


> Andrei Perapiolkin <andrei.perepiolkin at open-e.com> wrote on 27.05.2025 18:08 CEST:
> 
>  
> > 3. In the context of live migration: Will Proxmox skip calling
> > /deactivate_volume/ for snapshots that have already been activated?
> > Should the storage plugin explicitly deactivate all snapshots of a
> > volume during migration?
> 
> > a live migration is not concerned with snapshots of shared volumes, and local
> > volumes are removed on the source node after the migration has finished..
> >
> > but maybe you could expand this part?
> 
> My original idea was that since both 'activate_volume' and 
> 'deactivate_volume' methods have a 'snapname' argument, they would both 
> be used to activate and deactivate snapshots respectively.
> And for each snapshot activation, there would be a corresponding 
> deactivation.

deactivating volumes (and snapshots) is a lot trickier than activating
them, because you might have multiple readers in parallel that we don't
know about.

so if you have the following pattern

activate
do something
deactivate

and two instances of that are interleaved:

A: activate
B: activate
A: do something
A: deactivate
B: do something -> FAILURE, volume not active

you have a problem.

that's why we only deactivate in special circumstances:
- as part of error handling for freshly activated volumes
- as part of migration when finally stopping the source VM or before
  freeing local source volumes
- ..

where we can be reasonably sure that no other user exists, or it is
required for safety purposes.

otherwise, we'd need to do refcounting on volume activations and have
some way to hook that for external users, to avoid premature deactivation.
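
to illustrate what that could look like - a minimal sketch of refcounted
activation, purely hypothetical: the counter file location and the
really_activate/really_deactivate helpers are made up, and a real
implementation would also need to handle crashes, stale counts and volume
names containing '/':

    use strict;
    use warnings;
    use Fcntl qw(:flock);

    # hypothetical per-volume refcount files kept under /run
    my $refcount_dir = "/run/mystorage-refcount";

    # atomically adjust the counter for one volume and return the new value
    sub adjust_refcount {
        my ($storeid, $volname, $delta) = @_;
        mkdir $refcount_dir;
        my $file = "$refcount_dir/${storeid}_${volname}";
        open(my $fh, '+>>', $file) or die "cannot open $file: $!";
        flock($fh, LOCK_EX) or die "cannot lock $file: $!";
        seek($fh, 0, 0);
        my $count = <$fh> // 0;
        $count += $delta;
        $count = 0 if $count < 0;
        seek($fh, 0, 0);
        truncate($fh, 0);
        print $fh $count;
        close($fh); # releases the lock
        return $count;
    }

    # only activate the backing device on the 0 -> 1 transition
    sub activate_refcounted {
        my ($storeid, $volname) = @_;
        really_activate($storeid, $volname) # hypothetical low-level helper
            if adjust_refcount($storeid, $volname, 1) == 1;
    }

    # only tear the backing device down on the 1 -> 0 transition
    sub deactivate_refcounted {
        my ($storeid, $volname) = @_;
        really_deactivate($storeid, $volname) # hypothetical low-level helper
            if adjust_refcount($storeid, $volname, -1) == 0;
    }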

> However, from observing the behavior during migration, I found that 
> 'deactivate_volume' is not called for snapshots that were previously 
> activated with 'activate_volume'.

were they activated for the migration? or for cloning from a snapshot?
or ..?

maybe there is a call path that should deactivate that snapshot after using
it..

> Therefore, I assumed that 'deactivate_volume' is responsible for 
> deactivating all snapshots related to the volume that was previously 
> activated.
> The purpose of this question was to confirm this.
> 
>  From your response I conclude the following:
> 1. Migration does not manage (i.e. it does not activate or deactivate) 
> volume snapshots.

that really depends. a storage migration might activate a snapshot if
that is required for transferring the volume. this mostly applies to
offline migration or unused volumes though, and only for some storages.

> 2. All volumes are expected to be present across all nodes in the cluster 
> for the 'path' function to work.

if at all possible, path should just do a "logical" conversion of the volume ID
to a stable/deterministic path, or to the information required for Qemu to
access the volume if no path exists. ideally, this means it works without
activating the volume, but it might require querying the storage.
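
for illustration, a minimal sketch of such a "logical" path() for a
hypothetical plugin that exposes volumes as stable device-mapper nodes -
the naming scheme is an assumption, not how any existing plugin works:

    sub path {
        my ($class, $scfg, $volname, $storeid, $snapname) = @_;

        my ($vtype, $name, $vmid) = $class->parse_volname($volname);

        # derive a deterministic path from the volume (and snapshot) name
        # only, without talking to the storage and without the volume
        # being active
        my $device = $snapname ? "$name-snap-$snapname" : $name;
        my $path = "/dev/mapper/$device";

        return wantarray ? ($path, $vmid, $vtype) : $path;
    }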

> 3. For migration to work, the volume should be simultaneously present on both 
> nodes.

for a live migration and shared storage, yes. for an offline migration with
shared storage, the VM is never started on the target node, so no volume
activation is required until that happens later. for local storages, volumes
only exist on one node anyway (they are copied during the migration).

> However, I couldn't find explicit instructions or guides on when and by 
> whom volume snapshot deactivation should be triggered.

yes, this is a bit under-specified unfortunately. we are currently working
on improving the documentation (and the storage plugin API).

> Is it possible for a volume snapshot to remain active after the 
> volume itself was deactivated?

I'd have to check all the code paths to give an answer to that.
snapshots are rarely activated in general - IIRC mostly for
- cloning from a snapshot
- replication (limited to ZFS at the moment)
- storage migration

so I just did that:
- cloning from a snapshot only deactivates if the clone is to a different
  node, for both VM and CT -> see below
- CT backup in snapshot mode deletes the snapshot which implies deactivation
- storage_migrate (move_disk or offline migration) if a snapshot is passed,
  IIRC this only affects ZFS, which doesn't do activation anyway

> During testing of Proxmox 8.2 I've encountered situations where cloning a 
> volume from a snapshot did not result in snapshot deactivation.
> This leads to the creation of 'dangling' snapshots if the volume is 
> later migrated.

ah, that probably answers my question above.

I think this might be one of those cases where deactivation is hard - you
can have multiple clones from the same source VM running in parallel, and
only the last one would be allowed to deactivate the snapshot/volume..

> My current understanding is that all assets related to snapshots should 
> be removed when the volume is deactivated, is that correct?
> Or are all volumes and snapshots expected to be present across the entire 
> cluster until they are explicitly deleted?

I am not quite sure what you mean by "present" - do you mean "exist in an
activated state"?

> The second option requires additional recommendations on artifact management.
> Maybe this should be sent as a separate email, but I'll draft it here.
> 
> If all volumes and snapshots are consistently present across the entire 
> cluster and their creation/operation results in the creation of additional 
> artifacts (such as iSCSI targets, multipath sessions, etc.), then these 
> artifacts should be removed on deletion of the associated volume or snapshot.
> Currently, it is unclear how all nodes in the cluster are notified of 
> such a deletion, as only one node in the cluster receives the 'free_image' or 
> 'volume_snapshot_delete' request.
> What is the proper way to instruct the plugin on other nodes in the cluster 
> that a given volume/snapshot is requested for deletion and all artifacts 
> related to it have to be removed?

I now get where you are coming from I think! a volume should only be active
on a single node, except during a live migration, where the source node
will always get a deactivation call at the end.

deactivating a volume should also tear down related, volume-specific
resources, if applicable.
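
for an iSCSI/multipath based plugin that could look roughly like the sketch
below - the helpers that map a volume to its target and multipath map name
are hypothetical, only the iscsiadm/multipath invocations are standard:

    use PVE::Tools qw(run_command);

    sub deactivate_volume {
        my ($class, $storeid, $scfg, $volname, $snapname, $cache) = @_;

        # hypothetical helpers resolving the node-local artifacts of this volume
        my $mpath_map = mpath_map_for($scfg, $volname, $snapname);
        my $target = iscsi_target_for($scfg, $volname, $snapname);

        # flush the multipath map first, then log out of the iSCSI target
        run_command(['multipath', '-f', $mpath_map],
            errmsg => "flushing multipath map $mpath_map failed");
        run_command(['iscsiadm', '-m', 'node', '-T', $target, '--logout'],
            errmsg => "iSCSI logout from $target failed");

        return 1;
    }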

> How should the cleanup tasks be triggered across the remaining nodes?

it should not be needed, but I think you've found an edge case where we
need to improve.

I think our RBD plugin is also affected by this, all the other plugins
either:
- don't support snapshots (or cloning from them)
- are local only
- don't need any special activation/deactivation

I think the safe approach is likely to deactivate all snapshots when
deactivating the volume itself, for now.
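
a rough sketch of what that could look like in a plugin's deactivate_volume()
- the snapshot listing helper and the low-level teardown are assumptions:

    sub deactivate_volume {
        my ($class, $storeid, $scfg, $volname, $snapname, $cache) = @_;

        if (!$snapname) {
            # hypothetical helper returning the snapshot names of this volume
            for my $snap (list_snapshots($scfg, $volname)) {
                # recurse with the snapshot name; assumed to be a no-op if
                # that snapshot is not active on this node
                $class->deactivate_volume($storeid, $scfg, $volname, $snap, $cache);
            }
        }

        # hypothetical low-level teardown of the volume or snapshot itself
        teardown_device($scfg, $volname, $snapname);

        return 1;
    }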



