[pve-devel] migrate local -> drbd fails with vanished job
Roland Kammerer
roland.kammerer at linbit.com
Fri Nov 26 14:03:57 CET 2021
Dear PVE devs,
While most of our users start with fresh VMs on DRBD storage, from time
to time people try to migrate a local VM to DRBD storage. This currently fails.
Migrating VMs from DRBD to DRBD works.
I added some debug code to PVE/QemuServer.pm, which looks like the place where
things go wrong, or at least where I saw them go wrong:
root@pve:/usr/share/perl5/PVE# diff -Nur QemuServer.pm{.orig,}
--- QemuServer.pm.orig 2021-11-26 11:27:28.879989894 +0100
+++ QemuServer.pm 2021-11-26 11:26:30.490988789 +0100
@@ -7390,6 +7390,8 @@
     $completion //= 'complete';
     $op //= "mirror";
+    print "$vmid, $vmiddst, $jobs, $completion, $qga, $op \n";
+    { use Data::Dumper; print Dumper($jobs); };
     eval {
         my $err_complete = 0;
@@ -7419,6 +7421,7 @@
         next;
     }
+    print "vanished: $vanished\n"; # same as !defined($job)
     die "$job_id: '$op' has been cancelled\n" if !defined($job);
     my $busy = $job->{busy};
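A side note on the debug print itself: interpolating an undefined scalar makes Perl emit an "uninitialized value" warning, and interpolating a hash reference only prints its address. A minimal sketch of a quieter variant follows. The helpers fmt_arg and debug_line are hypothetical and not part of PVE; the sample values are the ones from the run reported in this mail.

```perl
use strict;
use warnings;
use Data::Dumper;

# Hypothetical helper (not part of PVE): render a possibly undefined scalar.
sub fmt_arg {
    my ($value) = @_;
    return defined($value) ? "$value" : '<undef>';
}

# Hypothetical helper: format name/value pairs as "name=value, ...".
sub debug_line {
    my (@pairs) = @_;
    my @parts;
    while (my ($name, $value) = splice(@pairs, 0, 2)) {
        push @parts, "$name=" . fmt_arg($value);
    }
    return join(', ', @parts);
}

# Values taken from the run reported in this mail; $qga was undefined there.
my ($vmid, $vmiddst, $completion, $qga, $op) = (100, 100, 'skip', undef, 'mirror');
my $jobs = { 'drive-scsi0' => {} };

print debug_line(
    vmid => $vmid, vmiddst => $vmiddst,
    completion => $completion, qga => $qga, op => $op,
), "\n";

# Dump the structure instead of interpolating the reference.
$Data::Dumper::Sortkeys = 1;
print Dumper($jobs);
```

Naming each argument makes it obvious which one is undefined, and the hash reference is only dumped, never interpolated.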
With that in place, I try to live migrate the running VM from node "pve" to
"pvf":
2021-11-26 11:29:10 starting migration of VM 100 to node 'pvf' (xx.xx.xx.xx)
2021-11-26 11:29:10 found local disk 'local-lvm:vm-100-disk-0' (in current VM config)
2021-11-26 11:29:10 starting VM 100 on remote node 'pvf'
2021-11-26 11:29:18 volume 'local-lvm:vm-100-disk-0' is 'drbdstorage:vm-100-disk-1' on the target
2021-11-26 11:29:18 start remote tunnel
2021-11-26 11:29:19 ssh tunnel ver 1
2021-11-26 11:29:19 starting storage migration
2021-11-26 11:29:19 scsi0: start migration to nbd:unix:/run/qemu-server/100_nbd.migrate:exportname=drive-scsi0
drive mirror is starting for drive-scsi0
Use of uninitialized value $qga in concatenation (.) or string at /usr/share/perl5/PVE/QemuServer.pm line 7393.
100, 100, HASH(0x557b44474a80), skip, , mirror
$VAR1 = {
'drive-scsi0' => {}
};
vanished: 1
drive-scsi0: Cancelling block job
drive-scsi0: Done.
2021-11-26 11:29:19 ERROR: online migrate failure - block job (mirror) error: drive-scsi0: 'mirror' has been cancelled
2021-11-26 11:29:19 aborting phase 2 - cleanup resources
2021-11-26 11:29:19 migrate_cancel
2021-11-26 11:29:22 ERROR: migration finished with problems (duration 00:00:12)
TASK ERROR: migration problems
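The "vanished: 1" line followed by the "has been cancelled" error suggests that the monitor loop did not find a running 'mirror' job for drive-scsi0 on its first poll. The following is a minimal sketch of such a check, not the actual QemuServer.pm code. It assumes that $job is looked up by device name in a list of running block jobs, and it replaces the query with a hard-coded list. The names index_running_jobs and classify_job, the returned labels, and the 'ready' field are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the monitor's view of running block jobs.
# Assumption: each entry carries 'device' and 'type'; index them by device.
sub index_running_jobs {
    my ($stats, $op) = @_;
    my %running;
    for my $stat (@$stats) {
        next if $stat->{type} ne $op;
        $running{ $stat->{device} } = $stat;
    }
    return \%running;
}

# Hypothetical helper: classify one tracked job against the running set.
sub classify_job {
    my ($running, $job_id) = @_;
    my $job = $running->{$job_id};
    my $vanished = !defined($job);    # the condition behind "vanished: 1"
    return 'vanished' if $vanished;
    return $job->{ready} ? 'ready' : 'busy';
}

my $op   = 'mirror';
my $jobs = { 'drive-scsi0' => {} };   # same shape as the Dumper output above

# An empty list models the failing run: the job is already gone when polled.
my $running = index_running_jobs([], $op);
for my $job_id (sort keys %$jobs) {
    print "$job_id: ", classify_job($running, $job_id), "\n";
}
```

Under these assumptions, a mirror job that fails immediately after being started would look exactly like a cancelled one from the loop's point of view.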
What I also see on "pvf" is that the plugin actually creates the DRBD block
device, and "something" even tries to write data to it, as the DRBD device
auto-promotes to Primary.
Any hints on how I can debug this further? The block device should be ready at
that point. What is going on in the background here?
FWIW the plugin can be found here:
https://github.com/linbit/linstor-proxmox
Regards, rck