[pve-devel] [PATCH qemu-server v2] Fix #2412: Missing VMs in pools

Tue Oct 15 11:23:11 CEST 2019

On Mon, Oct 14, 2019 at 02:02:26PM +0200, Thomas Lamprecht wrote:
> On 10/14/19 1:10 PM, Dominic Jäger wrote:
> > Between calling vm_destroy and removing the ID from user.cfg (remove_vm_access)
> > creating a new VM with this ID was possible. VMs could go missing from pools as
> > a consequence.
> > 
> > Adding a lock solves this for clones from the same node. Additionally,
> > unlinking must happen at the very end of the deletion process to avoid that
> > other nodes use the ID in the meanwhile.
> > 
> > Co-developed-by: Fabian Grünbichler <f.gruenbichler at proxmox.com>
> > Signed-off-by: Dominic Jäger <d.jaeger at proxmox.com>
> > ---
> > v1->v2: Move unlink to the end of the deletion process
> > 
> >  PVE/API2/Qemu.pm | 11 ++++++++---
> >  1 file changed, 8 insertions(+), 3 deletions(-)
> > 
> > diff --git a/PVE/API2/Qemu.pm b/PVE/API2/Qemu.pm
> > index 267a08e..f84aca7 100644
> > --- a/PVE/API2/Qemu.pm
> > +++ b/PVE/API2/Qemu.pm
> > @@ -1492,9 +1492,14 @@ __PACKAGE__->register_method({
> >  	    my $upid = shift;
> >  
> >  	    syslog('info', "destroy VM $vmid: $upid\n");
> > -	    PVE::QemuServer::vm_destroy($storecfg, $vmid, $skiplock);
> 
> so skiplock is now a dead parameter? I mean that /could/ be OK, but the
> commit never mentions anywhere why this is the case. No users, just kept
> for backward compat? If so I'd at least adapt the params description, so
> people reading man pages or CLI help output can have an idea that this
> does nothing.
> Or is it in use and still required to be handled? If so, address that.
> Either way, just "silently" removing it is not OK.
> 
Removing the parameter was an accident. I'll think about it and send a v3.

> > -	    PVE::AccessControl::remove_vm_access($vmid);
> > -            PVE::Firewall::remove_vmfw_conf($vmid);
> > +	    PVE::QemuConfig->lock_config($vmid, sub {
> > +		die "VM $vmid is running - destroy failed\n"
> > +		    if (PVE::QemuServer::check_running($vmid));
> > +		PVE::QemuServer::destroy_vm($storecfg, $vmid, 1, 0);
> > +		PVE::AccessControl::remove_vm_access($vmid);
> > +		PVE::Firewall::remove_vmfw_conf($vmid);
> > +		unlink PVE::QemuConfig->config_file($vmid);
> 
> unlink does not errors itself if the removal failed (e.g., pmxcfs went just
> inquorate, so please do a
> unlink ... or die "final removal of $vmid config failed: $!\n";
> 
> to tell an user in such a case.

Ok. Is there a reason why we don't do this in PVE::QemuServer::destroy_vm or
should we actually add the error handling there, too?
> 
> Looks OK, besides those two things.
> 
> > +	    });
> >  	};
> >  
> >  	return $rpcenv->fork_worker('qmdestroy', $vmid, $authuser, $realcmd);
> > 
> 
>