[pve-devel] [RFC qemu-server] vm_resume: correctly honor $nocheck

Thomas Lamprecht t.lamprecht at proxmox.com
Sat May 25 14:32:27 CEST 2019


On 5/23/19 9:22 PM, Fabian Grünbichler wrote:
> for both vm_mon_cmd calls. under certain circumstances, the following
> sequence of events can otherwise fail when live-migrating under load:
> 
> S...source node
> T...target node
> 
> 0: migration is complete, handover from S to T starts
> 1: S: logically move VM config file from S to T via rename()
> 2: S: rename returns, config file is (visibly) moved on S
> 3: S: trigger resume on T via mtunnel
> 4a: T: call vm_resume while config file move is not yet visible on T
> 4b: T: call vm_resume while config file move is already visible on T
> 
> 4a instead of 4b means vm_mon_cmd will die in check_running unless
> vm_mon_cmd_nocheck is used.
> 
> under heavy pmxcfs load and a slow cluster/corosync network, there can
> be a few seconds of delay between 1 and 2, and the config file move can
> take just as long to become visible on T, so the subsequent race can end
> in 4a instead of 4b.
> 
> this issue was reported to occur on bulk migrations.
> 
> Signed-off-by: Fabian Grünbichler <f.gruenbichler at proxmox.com>
> ---
> RFC mainly since I am not entirely sure whether we should also take a closer
> look at whether there is a bug in pmxcfs somewhere here...
> 
> See the link below for user reports:
> https://forum.proxmox.com/threads/random-vm-ends-up-in-paused-state-after-bulk-migrate.54403/
> 
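
The reason 4a fails is the config-file existence check: without $nocheck, the
monitor helpers first call check_running, which dies when the VM's config file
is not (yet) visible on the local node. Roughly like this (a from-memory
sketch of PVE::QemuServer::check_running, not the verbatim code):

sub check_running {
    my ($vmid, $nocheck, $node) = @_;

    my $filename = PVE::QemuConfig->config_file($vmid, $node);

    # in case 4a the rename() from step 1 is not yet visible on T,
    # so this dies unless the caller passed $nocheck
    die "unable to find configuration file for VM $vmid - no such machine\n"
        if !$nocheck && ! -f $filename;

    # ... pid file and /proc checks follow ...
}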

applied, thanks a lot!
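
For reference, the change boils down to something like dispatching both
monitor calls in vm_resume through the $nocheck-aware variant. A sketch using
the helper names from the commit message; the surrounding lock handling is
from memory and not the verbatim applied diff:

sub vm_resume {
    my ($vmid, $skiplock, $nocheck) = @_;

    PVE::QemuConfig->lock_config($vmid, sub {
        # pick the monitor helper that skips the local config file check
        # when the caller (e.g. the mtunnel resume handler) passed $nocheck
        my $mon_cmd = $nocheck ? \&vm_mon_cmd_nocheck : \&vm_mon_cmd;

        my $res = $mon_cmd->($vmid, 'query-status');
        my $resume_cmd = 'cont';

        if ($res->{status} && $res->{status} eq 'suspended') {
            $resume_cmd = 'system_wakeup';
        }

        if (!$nocheck) {
            my $conf = PVE::QemuConfig->load_config($vmid);
            PVE::QemuConfig->check_lock($conf)
                if !($skiplock || PVE::QemuConfig->has_lock($conf, 'backup'));
        }

        $mon_cmd->($vmid, $resume_cmd);
    });
}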

We now just need to find someone we can force^W motivate to work on the
proposed changes of pulling stuff into mtunnel commands :-)
While we could do this anytime, thanks to your versioning of the mtunnel
command line interface, it could be nice to do it with 6.0: then we would
only need to keep the "receiver" part, as we want to allow migrating from
old -> new, but could already drop the "sender" parts, as we do not and
cannot really care about new -> old migrations, at least if I am not
missing anything.




