[pve-devel] [RFC qemu-server] vm_resume: correctly honor $nocheck

Thomas Lamprecht t.lamprecht at proxmox.com
Fri May 24 09:52:53 CEST 2019


On 5/24/19 9:36 AM, Fabian Grünbichler wrote:
> On Fri, May 24, 2019 at 08:24:17AM +0200, Dominik Csapak wrote:
>> LGTM, I introduced this seemingly sometime last year... (oops)
>>
>> the question remains why it sometimes takes so long for a rename to
>> be propagated
> 
> the race window is actually very small - I guess it gets a bit bigger
> and thus triggers more easily with the additional load.
> 
> the delay that the rename takes to return can go up into the seconds
> range (which is okay - if the pmxcfs is very very busy, write operations
> can block a bit, I tested with two nodes writing non-stop ;)).
> 
> the delay between visibility on source and target is so small that it is
> within the margin of errors (we are talking about measuring timestamps
> across node boundaries after all).
> 
>> in my opinion this violates the assumptions we make regarding ownership
>> of files/VMs, since it seems here that nobody owns the VM when this happens
>> (the source node believes the target is the owner and vice versa)
> 
> maybe we can take a closer look next week at pmxcfs debug output.. we
> don't have many instances of moving ownership from one node to the other
> though, and migrating happens under a config lock anyway so modulo this
> missing nocheck I don't see a way that this is problematic..
> 
> it's probably an issue of node T having received and acked the change,
> but not yet fully processed it. if you ack the change after making it
> visible, you have the reverse problem (T getting updated before S).

Virtual Synchrony / TOTEM [0] just says that if one node sees events happen
in the order A -> B -> C, then all nodes will see them in that order.

But it's _not_ guaranteed that they see it at the same instant, that's not
really possible.
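
To make that distinction concrete, here is a purely illustrative sketch
(not corosync code; node names and timings are made up): two nodes apply
the same totally ordered event sequence, but each at its own pace, so the
*instants* at which an event becomes visible differ between nodes:

```python
import random

# Globally agreed (totally ordered) event sequence, as TOTEM would deliver it.
EVENTS = ["A", "B", "C"]

class Node:
    def __init__(self, name):
        self.name = name
        self.applied = []      # events applied so far, in delivery order
        self.apply_times = []  # hypothetical local timestamps of visibility
        self.clock = 0.0

    def deliver(self, event):
        # Each node spends a different (random) amount of local time before
        # the event actually becomes visible, e.g. because of load.
        self.clock += random.uniform(0.01, 2.0)
        self.applied.append(event)
        self.apply_times.append(self.clock)

random.seed(7)
nodes = [Node("S"), Node("T")]
for ev in EVENTS:
    for node in nodes:
        node.deliver(ev)

# Same order everywhere ...
assert all(n.applied == EVENTS for n in nodes)
# ... but the moments of visibility differ between the nodes.
assert nodes[0].apply_times != nodes[1].apply_times
```

So comparing "when did node S see the rename" against "when did node T see
it" is exactly the cross-node timestamp comparison mentioned above: the
order is guaranteed, the instant is not.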

Corosync uses Extended Virtual Synchrony [1], which in addition to the above
also ensures that group membership changes are ordered, but we undo this in
our distributed finite state machine (dfsm) in pmxcfs, falling back to plain
virtual synchrony. I won't claim that there's no bug, but at least this
behavior is not one.

You cannot really mix pmxcfs / totem / cpg operations with an SSH connection
and assume any ordering guarantees between them; there are none.

One would need to also send an "event" over pmxcfs to signal the target
node to continue (resume); that event then _would_ be ordered correctly
relative to the rename. Otherwise, yes, there's a bug.
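
A toy model of that difference (purely illustrative; the class and message
names are hypothetical, not pmxcfs APIs): ordered messages sit in a per-node
queue until the node processes them, while an out-of-band request, like the
resume call over SSH, bypasses that queue entirely and can overtake a
rename that is delivered but not yet applied:

```python
from collections import deque

class Target:
    """Toy stand-in for the migration target node."""
    def __init__(self):
        self.queue = deque()      # ordered pmxcfs-style messages, not yet applied
        self.config_here = False  # does this node own the VM config yet?

    def broadcast(self, msg):
        # Ordered delivery: the message arrives, but is only *queued*.
        self.queue.append(msg)

    def process_one(self):
        # The node applies the next queued message at its own pace.
        if self.queue and self.queue.popleft() == "rename-config":
            self.config_here = True

    def resume(self):
        # Out-of-band request (e.g. over SSH): sees whatever state the
        # node happens to have right now, ignoring the queue.
        return "ok" if self.config_here else "error: config not owned here"

t = Target()
t.broadcast("rename-config")  # source node renamed the VM config
r1 = t.resume()               # SSH resume overtakes the queued rename
t.process_one()               # now the ordered message is applied
r2 = t.resume()
print(r1, r2)
```

An in-band "continue" event would instead be queued *behind* the rename,
so by the time the target processes it, the rename is guaranteed to have
been applied first.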

[0]: https://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.37.767
[1]: https://pdfs.semanticscholar.org/99a2/89cf0c97804cec4bd6ad70459d21267d525a.pdf
