[pve-devel] VM locked after failed Snapshot

Mon Sep 8 21:30:27 CEST 2014

On 08.09.2014 17:01, Dietmar Maurer wrote:
>> I see multiple problems here.
>>
>> 1.) lock state should be removed by PVE in case of a failure.
>>
>> Currently snapshot_create calls snapshot_prepare to set the lock, and at the end
>> snapshot_commit deletes the lock.
>>
>> But currently in case of $err
>>
>>      if ($err) {
>>          warn "snapshot create failed: starting cleanup\n";
>>          eval { snapshot_delete($vmid, $snapname, 0, $drivehash); };
>>          warn $@ if $@;
>>          die $err;
>>      }
>>
>> The lock isn't removed.
>
> That is the intention of the lock. We cannot commit, and rollback also fails. So we
> keep the lock to indicate the need for operator intervention.

Hmm, but if a snapshot fails, it fails, and there is an error message. What
is the reason to keep the lock?
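
To make the question concrete, here is a rough sketch of an error branch that keeps the lock only when the cleanup itself fails. This is not existing PVE code: handle_snapshot_error, $cleanup and $unlock are hypothetical placeholders for illustration.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the error branch of snapshot_create.
# $cleanup and $unlock are placeholder callbacks, not existing PVE functions.
sub handle_snapshot_error {
    my ($err, $cleanup, $unlock) = @_;

    warn "snapshot create failed: starting cleanup\n";
    eval { $cleanup->(); };
    if (my $cleanup_err = $@) {
        # Cleanup failed too: keep the lock so the operator notices.
        warn $cleanup_err;
    } else {
        # Cleanup worked, nothing is left half-done: release the lock.
        $unlock->();
    }
    die $err;    # always report the original snapshot error
}
```

In the $err branch quoted above, $cleanup would wrap the snapshot_delete call, and $unlock would be whatever removes the lock entry from the VM config.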

>> What is the correct way to remove a lock in this case?
>
> Manually edit the config. Not sure if there is a better way.
>>
>> 2.) In case of an unexpected failure or signal, ceph/rbd does not remove its
>> watcher from the image. So snapshot_delete fails in this case.
>>
>> Output:
>>
>> image has watchers - not removing
>> Removing image: 0% complete...failed.
>> rbd: error: image still has watchers
>>
>> rbd has an automatic timeout after 30s. Should PVE handle this by waiting 30s
>> and trying again?
>
> What are 'unexpected failures'? I think such things should be handled inside librbd?

In this case the rbd command had "received interrupt"? Maybe a restart
of the pveproxy or API daemon? It is handled inside librbd ;-) It's the
30s timeout ;-)
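
What I mean by waiting and trying again could look roughly like the sketch below. retry_on_watchers is a hypothetical helper, not something that exists in PVE, and matching on the word "watchers" in the error text is an assumption based on the rbd output quoted above.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of PVE: run $code, and if it dies with a
# message that mentions watchers, wait $delay seconds and try again.
sub retry_on_watchers {
    my ($code, $max_tries, $delay) = @_;

    my $err;
    for my $try (1 .. $max_tries) {
        eval { $code->(); };
        $err = $@;
        return $try if !$err;                # success: report tries used
        last if $err !~ m/watchers/;         # different error: give up now
        sleep($delay) if $try < $max_tries;  # let the stale watcher expire
    }
    die $err;                                # rethrow the last error
}
```

In the cleanup path this would wrap the existing call, e.g. retry_on_watchers(sub { snapshot_delete($vmid, $snapname, 0, $drivehash) }, 2, 30), so a stale watcher gets one chance to time out before we give up.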

Stefan