[pve-devel] [RFC qemu-server] vm_resume: correctly honor $nocheck

Fabian Grünbichler f.gruenbichler at proxmox.com
Fri May 24 09:36:14 CEST 2019


On Fri, May 24, 2019 at 08:24:17AM +0200, Dominik Csapak wrote:
> LGTM, i introduced this seemingly sometime last year... (oops)
> 
> the question remains why it sometimes takes so long for a rename to
> be propagated

the race window is actually very small - I guess it gets a bit bigger
and thus triggers more easily with the additional load.

the delay that the rename takes to return can go up into the seconds
range (which is okay - if pmxcfs is very busy, write operations can
block for a bit; I tested with two nodes writing non-stop ;)).

the delay between visibility on source and target is so small that it is
within the margin of error (we are talking about comparing timestamps
across node boundaries after all).

> in my opinion this violates the assumptions we make regarding ownership
> of files/VMs, since it seems that nobody owns the VM when this happens
> (the source node believes the target is the owner and vice versa)

maybe we can take a closer look at the pmxcfs debug output next week. we
don't have many instances of moving ownership from one node to the other
though, and migration happens under a config lock anyway, so apart from
this missing nocheck I don't see a way for this to be problematic.

it's probably an issue of node T having received and acked the change,
but not yet fully processed it. if you ack the change after making it
visible, you have the reverse problem (T getting updated before S).
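
to make the two orderings concrete, here is a toy model in Python. it is
only a sketch of the reasoning above, not pmxcfs code: the Node class, the
pending queue and both function names are made up for illustration, and the
real protocol is certainly more involved. each node has a local view of who
owns the VM config; a change is first received and only later applied.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical cluster node with a local view of the VM's owner."""
    name: str
    visible_owner: str
    pending: list = field(default_factory=list)

    def receive(self, new_owner: str) -> None:
        # Change has arrived (and may already be acked) but is not visible yet.
        self.pending.append(new_owner)

    def apply(self) -> None:
        # Change becomes visible in this node's local view.
        if self.pending:
            self.visible_owner = self.pending.pop(0)


def claims_ownership(node: Node) -> bool:
    return node.visible_owner == node.name


def ack_before_apply():
    """T acks on receipt; S applies and returns before T has applied."""
    s, t = Node("S", "S"), Node("T", "S")
    s.receive("T")
    t.receive("T")
    s.apply()  # rename returns on S, T has only acked
    during = (claims_ownership(s), claims_ownership(t))
    t.apply()
    after = (claims_ownership(s), claims_ownership(t))
    return during, after


def ack_after_apply():
    """T makes the change visible before acking; S applies afterwards."""
    s, t = Node("S", "S"), Node("T", "S")
    s.receive("T")
    t.receive("T")
    t.apply()  # T already sees itself as owner, S has not applied yet
    during = (claims_ownership(s), claims_ownership(t))
    s.apply()
    after = (claims_ownership(s), claims_ownership(t))
    return during, after


if __name__ == "__main__":
    print("ack before apply:", ack_before_apply())
    print("ack after apply: ", ack_after_apply())
```

in the first ordering there is a short window where neither node claims the
VM (S thinks it is on T, T still thinks it is on S), in the second a short
window where both do. either way both views converge once the pending change
has been applied everywhere.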



