[pve-devel] [PATCH qemu-server 0/2] migration: fix sporadic nbd-server-stop timeout

Alexandre Derumier aderumier at odiso.com
Fri Sep 29 10:28:57 CEST 2023


Hi,

We had some sporadic nbd-stop errors when migrating VMs with rbd storage + writeback between 2 different clusters:
(This is without my other targetcpu patch.)


2023-09-28 16:20:39 ERROR: error - tunnel command '{"cmd":"nbdstop"}' failed - failed to handle 'nbdstop' command - VM 140 qmp command 'nbd-server-stop' failed - got timeout
2023-09-28 16:20:39 ERROR: migration finished with problems (duration 00:01:42)


I'm not sure of the cause. It may be related to writeback, because it never happened with a freshly started VM, but VMs that have been running for some time can trigger it.
(Maybe nbd needs to flush pending data in the cache?)


Currently, the tunnel command has a 30s timeout, but the qmp command only has 5s.
Also, the tunnel v2 command doesn't have any eval, so the migration aborts, leaving both the source and target VMs locked.
Unlocking the target VM and resuming it manually works, so it really seems to be a too-low timeout.
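
To illustrate the two changes, here is a rough Python sketch (not the actual Perl code from the patches). The function and class names are hypothetical, the "flush duration" models my unconfirmed guess about pending writeback data, and recording the error instead of aborting is an assumption about what the eval wrapper should achieve. The timeout values are the ones mentioned above.

```python
# Illustrative model only; names are hypothetical, not qemu-server APIs.
TUNNEL_TIMEOUT_S = 30    # timeout of the tunnel command
QMP_TIMEOUT_OLD_S = 5    # current timeout of the qmp command
QMP_TIMEOUT_NEW_S = 25   # proposed timeout, still below the tunnel timeout


class QmpTimeout(Exception):
    """Stand-in for the 'got timeout' error seen in the log."""


def nbd_server_stop(flush_duration_s, qmp_timeout_s):
    # Assumption: stopping the NBD server may have to wait for pending data.
    # The simulated call fails when that wait exceeds the qmp timeout.
    if flush_duration_s > qmp_timeout_s:
        raise QmpTimeout("qmp command 'nbd-server-stop' failed - got timeout")
    return "stopped"


def finish_migration(flush_duration_s, qmp_timeout_s):
    """Return the list of errors collected; an empty list means a clean finish."""
    errors = []
    try:
        nbd_server_stop(flush_duration_s, qmp_timeout_s)
    except QmpTimeout as err:
        # Equivalent of wrapping the call in eval: note the error and
        # carry on, so the remaining cleanup steps can still run.
        errors.append(str(err))
    return errors
```

With a simulated 12s wait, finish_migration(12, QMP_TIMEOUT_OLD_S) reports the timeout, while finish_migration(12, QMP_TIMEOUT_NEW_S) finishes cleanly and stays within the 30s tunnel timeout.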


Alexandre Derumier (2):
  nbd_stop: increase timeout to 25s
  migration: add missing eval on nbdstop with tunnel v2.

 PVE/QemuMigrate.pm | 8 +++++++-
 PVE/QemuServer.pm  | 2 +-
 2 files changed, 8 insertions(+), 2 deletions(-)

-- 
2.39.2




