[pve-devel] VM locked after failed Snapshot

Mon Sep 8 17:01:14 CEST 2014

> I see multiple problems here.
> 
> 1.) lock state should be removed by PVE in case of a failure.
> 
> Currently snapshot_create calls snapshot_prepare to set the lock. And at the end
> snapshot_commit deletes the lock.
> 
> But currently in case of $err
> 
>     if ($err) {
>         warn "snapshot create failed: starting cleanup\n";
>         eval { snapshot_delete($vmid, $snapname, 0, $drivehash); };
>         warn $@ if $@;
>         die $err;
>     }
> 
> The lock isn't removed.

That is the intention of the lock. We cannot commit, and rollback also fails, so we
keep the lock to indicate the need for operator intervention.

> What is the correct way to remove a lock in this case?

Manually edit the config. Not sure if there is a better way. 
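
To illustrate what such a manual edit amounts to, here is a minimal sketch.
It assumes the lock is stored as a "lock: ..." line in the top section of
the VM config, and it works on the config text as a string. The helper
remove_lock_line is hypothetical, not an existing PVE function.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of PVE): drop a "lock: ..." line from the
# top section of a VM config given as a string. Sections that start with
# a "[name]" header (assumed here to be snapshot sections) are untouched.
sub remove_lock_line {
    my ($conf) = @_;

    my $in_top_section = 1;
    my @out;
    for my $line (split /^/, $conf) {
        # the first "[...]" header ends the top section
        $in_top_section = 0 if $line =~ /^\[.*\]\s*$/;
        next if $in_top_section && $line =~ /^lock:\s/;
        push @out, $line;
    }
    return join('', @out);
}
```

This only removes the lock line. Any half-created snapshot state still has
to be checked and cleaned up by the operator.
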
>
> 2.) In case of an unexpected failure or signal, ceph/rbd does not remove its
> watcher from the image, so snapshot_delete fails in this case.
> 
> Output:
> 
> image has watchers - not removing
> Removing image: 0% complete...failed.
> rbd: error: image still has watchers
> 
> rbd has an automatic timeout after 30s. Should PVE handle this by waiting 30s
> and trying again?

What are 'unexpected failures'? I think such things should be handled inside librbd?
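
For reference, the wait-and-retry idea proposed above could be sketched
roughly as follows. This is only an illustration under assumptions:
retry_with_delay is a hypothetical helper, not an existing PVE function, and
the 30s delay is simply the timeout quoted above.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of PVE): run $code, and if it dies,
# wait $delay seconds and try again, up to $attempts times in total.
# Returns the result of the first successful call, or rethrows the
# last error once all attempts are used up.
sub retry_with_delay {
    my ($code, $attempts, $delay) = @_;

    my $err;
    for my $try (1 .. $attempts) {
        my $res;
        my $ok = eval { $res = $code->(); 1 };
        return $res if $ok;

        $err = $@ || "unknown error\n";
        warn "attempt $try failed: $err";
        sleep($delay) if $try < $attempts;
    }
    die $err;
}
```

In the cleanup path this could wrap the existing call, for example
retry_with_delay(sub { snapshot_delete($vmid, $snapname, 0, $drivehash) }, 2, 30).
Whether blocking the task for 30s is acceptable there is a separate question.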