[pve-devel] migrate local -> drbd fails with vanished job

Roland Kammerer roland.kammerer at linbit.com
Mon Nov 29 09:48:41 CET 2021


Hi Fabian,

thanks for the walk through, that clears things up a bit.

On Fri, Nov 26, 2021 at 05:06:05PM +0100, Fabian Grünbichler wrote:
> are there any messages on either end in the journal?

I'm afraid they are not too useful, but maybe you see something that
rings any bells.

Some notes:
- The first DRBD Primary is us forcing one node to be Primary (and
  switching it back to Secondary immediately), just so that we kick off
  the initial sync. The second Primary then might very well be starting
  the VM as you explained.
- The destination disk is about 500 MB larger than the source, which was
  just me adding some extra space in my debug code in the plugin to
  avoid any rounding errors/disk slightly too small errors.

What could be a problem here? Different sizes? Different block size? If
it does not fail somewhere else/earlier, what should make nbd writes
special? Strange.

Regards, rck

source node (.119/pve):
Nov 29 09:06:01 pve Controller[1213]: 09:06:01.628 [MainWorkerPool-1] INFO  LINSTOR/Controller - SYSTEM - Drbd-auto-verify-Algo for vm-100-disk-1 automatically set to crct10dif-pclmul
Nov 29 09:06:12 pve pmxcfs[982]: [status] notice: received log
Nov 29 09:06:12 pve pvedaemon[3660]: Use of uninitialized value $qga in concatenation (.) or string at /usr/share/perl5/PVE/QemuServer.pm line 7393.
Nov 29 09:06:12 pve pvedaemon[3660]: VM 100 qmp command failed - VM 100 qmp command 'block-job-cancel' failed - Block job 'drive-scsi0' not found
Nov 29 09:06:13 pve pmxcfs[982]: [status] notice: received log
Nov 29 09:06:13 pve pmxcfs[982]: [status] notice: received log
Nov 29 09:06:14 pve pmxcfs[982]: [status] notice: received log
Nov 29 09:06:15 pve pmxcfs[982]: [status] notice: received log
Nov 29 09:06:16 pve pvedaemon[3660]: migration problems
Nov 29 09:06:16 pve pvedaemon[1198]: <root at pam> end task UPID:pve:00000E4C:00008799:61A489E7:qmigrate:100:root at pam: migration problems
Nov 29 09:10:14 pve pvedaemon[1198]: <root at pam> starting task UPID:pve:000013D6:0000EAFA:61A48AE6:qmigrate:100:root at pam:
Nov 29 09:10:15 pve pmxcfs[982]: [status] notice: received log
Nov 29 09:10:15 pve Controller[1213]: 09:10:15.579 [MainWorkerPool-1] INFO  LINSTOR/Controller - SYSTEM - Volume definition with number '0' successfully  created in resource definition 'vm-100-disk-1'.
Nov 29 09:10:15 pve Controller[1213]: 09:10:15.651 [MainWorkerPool-1] INFO  LINSTOR/Controller - SYSTEM - Drbd-auto-verify-Algo for vm-100-disk-1 automatically set to crct10dif-pclmul
Nov 29 09:10:25 pve pmxcfs[982]: [status] notice: received log
Nov 29 09:10:26 pve pvedaemon[5078]: Use of uninitialized value $qga in concatenation (.) or string at /usr/share/perl5/PVE/QemuServer.pm line 7393.
Nov 29 09:10:26 pve pvedaemon[5078]: VM 100 qmp command failed - VM 100 qmp command 'block-job-cancel' failed - Block job 'drive-scsi0' not found
Nov 29 09:10:27 pve pmxcfs[982]: [status] notice: received log
Nov 29 09:10:27 pve pmxcfs[982]: [status] notice: received log
Nov 29 09:10:28 pve pmxcfs[982]: [status] notice: received log
Nov 29 09:10:28 pve pmxcfs[982]: [status] notice: received log
Nov 29 09:10:29 pve pvedaemon[5078]: migration problems
Nov 29 09:10:29 pve pvedaemon[1198]: <root at pam> end task UPID:pve:000013D6:0000EAFA:61A48AE6:qmigrate:100:root at pam: migration problems

target node (.120/pvf):
Nov 29 09:06:16 pvf sshd[2645]: pam_unix(sshd:session): session opened for user root(uid=0) by (uid=0)
Nov 29 09:06:16 pvf systemd[1]: Started Session 8 of user root.
Nov 29 09:06:16 pvf systemd-logind[711]: New session 8 of user root.
Nov 29 09:06:16 pvf sshd[2645]: Received disconnect from 10.43.8.119 port 60698:11: disconnected by user
Nov 29 09:06:16 pvf sshd[2645]: Disconnected from user root 10.43.8.119 port 60698
Nov 29 09:06:16 pvf sshd[2645]: pam_unix(sshd:session): session closed for user root
Nov 29 09:06:16 pvf systemd-logind[711]: Session 8 logged out. Waiting for processes to exit.
Nov 29 09:06:16 pvf systemd[1]: session-8.scope: Succeeded.
Nov 29 09:06:16 pvf systemd-logind[711]: Removed session 8.
Nov 29 09:06:16 pvf pmxcfs[897]: [status] notice: received log
Nov 29 09:10:14 pvf pmxcfs[897]: [status] notice: received log
Nov 29 09:10:14 pvf sshd[3188]: Accepted publickey for root from 10.43.8.119 port 60700 ssh2: RSA SHA256:3YNHg8CC3M6IPFUA9PlP5B7wnm2nbFLpnXd09j6k5yU
Nov 29 09:10:14 pvf sshd[3188]: pam_unix(sshd:session): session opened for user root(uid=0) by (uid=0)
Nov 29 09:10:14 pvf systemd[1]: Started Session 9 of user root.
Nov 29 09:10:14 pvf systemd-logind[711]: New session 9 of user root.
Nov 29 09:10:14 pvf sshd[3188]: Received disconnect from 10.43.8.119 port 60700:11: disconnected by user
Nov 29 09:10:14 pvf sshd[3188]: Disconnected from user root 10.43.8.119 port 60700
Nov 29 09:10:14 pvf sshd[3188]: pam_unix(sshd:session): session closed for user root
Nov 29 09:10:14 pvf systemd-logind[711]: Session 9 logged out. Waiting for processes to exit.
Nov 29 09:10:14 pvf systemd[1]: session-9.scope: Succeeded.
Nov 29 09:10:14 pvf systemd-logind[711]: Removed session 9.
Nov 29 09:10:14 pvf sshd[3195]: Accepted publickey for root from 10.43.8.119 port 60702 ssh2: RSA SHA256:3YNHg8CC3M6IPFUA9PlP5B7wnm2nbFLpnXd09j6k5yU
Nov 29 09:10:14 pvf sshd[3195]: pam_unix(sshd:session): session opened for user root(uid=0) by (uid=0)
Nov 29 09:10:14 pvf systemd[1]: Started Session 10 of user root.
Nov 29 09:10:14 pvf systemd-logind[711]: New session 10 of user root.
Nov 29 09:10:15 pvf qm[3201]: <root at pam> starting task UPID:pvf:00000C82:0000EA8E:61A48AE7:qmstart:100:root at pam:
Nov 29 09:10:15 pvf qm[3202]: start VM 100: UPID:pvf:00000C82:0000EA8E:61A48AE7:qmstart:100:root at pam:
Nov 29 09:10:15 pvf Satellite[812]: 09:10:15.686 [MainWorkerPool-1] INFO  LINSTOR/Satellite - SYSTEM - Resource 'vm-100-disk-1' created for node 'pvf'.
Nov 29 09:10:19 pvf kernel: drbd vm-100-disk-1: Starting worker thread (from drbdsetup [3254])
Nov 29 09:10:19 pvf kernel: drbd vm-100-disk-1/0 drbd1001: meta-data IO uses: blk-bio
Nov 29 09:10:19 pvf kernel: drbd vm-100-disk-1/0 drbd1001: rs_discard_granularity feature disabled
Nov 29 09:10:19 pvf kernel: drbd vm-100-disk-1/0 drbd1001: disk( Diskless -> Attaching )
Nov 29 09:10:19 pvf kernel: drbd vm-100-disk-1/0 drbd1001: Maximum number of peer devices = 7
Nov 29 09:10:19 pvf kernel: drbd vm-100-disk-1: Method to ensure write ordering: flush
Nov 29 09:10:19 pvf kernel: drbd vm-100-disk-1/0 drbd1001: drbd_bm_resize called with capacity == 3128600
Nov 29 09:10:19 pvf kernel: drbd vm-100-disk-1/0 drbd1001: resync bitmap: bits=391075 words=42777 pages=84
Nov 29 09:10:19 pvf kernel: drbd1001: detected capacity change from 0 to 3128600
Nov 29 09:10:19 pvf kernel: drbd vm-100-disk-1/0 drbd1001: size = 1528 MB (1564300 KB)
Nov 29 09:10:19 pvf kernel: drbd vm-100-disk-1/0 drbd1001: recounting of set bits took additional 0ms
Nov 29 09:10:19 pvf kernel: drbd vm-100-disk-1/0 drbd1001: disk( Attaching -> Inconsistent )
Nov 29 09:10:19 pvf kernel: drbd vm-100-disk-1/0 drbd1001: attached to current UUID: 0000000000000004
Nov 29 09:10:19 pvf Satellite[812]: 09:10:19.270 [MainWorkerPool-1] INFO  LINSTOR/Satellite - SYSTEM - Primary Resource vm-100-disk-1
Nov 29 09:10:19 pvf Satellite[812]: 09:10:19.271 [MainWorkerPool-1] INFO  LINSTOR/Satellite - SYSTEM - Primary bool set on Resource vm-100-disk-1
Nov 29 09:10:19 pvf Satellite[812]: 09:10:19.293 [MainWorkerPool-1] INFO  LINSTOR/Satellite - SYSTEM - Resource 'vm-100-disk-1' updated for node 'pvf'.
Nov 29 09:10:19 pvf kernel: drbd vm-100-disk-1/0 drbd1001: rs_discard_granularity feature disabled
Nov 29 09:10:19 pvf kernel: drbd vm-100-disk-1: Preparing cluster-wide state change 126926591 (0->-1 7683/4609)
Nov 29 09:10:19 pvf kernel: drbd vm-100-disk-1: Committing cluster-wide state change 126926591 (0ms)
Nov 29 09:10:19 pvf kernel: drbd vm-100-disk-1: role( Secondary -> Primary )
Nov 29 09:10:19 pvf kernel: drbd vm-100-disk-1/0 drbd1001: disk( Inconsistent -> UpToDate )
Nov 29 09:10:19 pvf kernel: drbd vm-100-disk-1/0 drbd1001: size = 1528 MB (1564300 KB)
Nov 29 09:10:19 pvf kernel: drbd vm-100-disk-1: Forced to consider local data as UpToDate!
Nov 29 09:10:19 pvf kernel: drbd vm-100-disk-1/0 drbd1001: new current UUID: E1541A8E1BAC0761 weak: FFFFFFFFFFFFFFFE
Nov 29 09:10:19 pvf kernel: drbd vm-100-disk-1: role( Primary -> Secondary )
Nov 29 09:10:19 pvf kernel: drbd vm-100-disk-1: Preparing cluster-wide state change 594297135 (0->-1 3/1)
Nov 29 09:10:19 pvf kernel: drbd vm-100-disk-1: Committing cluster-wide state change 594297135 (0ms)
Nov 29 09:10:19 pvf kernel: drbd vm-100-disk-1: role( Secondary -> Primary )
Nov 29 09:10:19 pvf kernel: drbd vm-100-disk-1: role( Primary -> Secondary )
Nov 29 09:10:19 pvf Satellite[812]: 09:10:19.589 [MainWorkerPool-1] INFO  LINSTOR/Satellite - SYSTEM - Resource 'vm-100-disk-1' updated for node 'pvf'.
Nov 29 09:10:19 pvf kernel: drbd vm-100-disk-1/0 drbd1001: rs_discard_granularity feature disabled
Nov 29 09:10:24 pvf systemd[1]: Started 100.scope.
Nov 29 09:10:24 pvf systemd-udevd[3410]: Using default interface naming scheme 'v247'.
Nov 29 09:10:24 pvf systemd-udevd[3410]: ethtool: autonegotiation is unset or enabled, the speed and duplex are not writable.
Nov 29 09:10:25 pvf kernel: device tap100i0 entered promiscuous mode
Nov 29 09:10:25 pvf systemd-udevd[3419]: Using default interface naming scheme 'v247'.
Nov 29 09:10:25 pvf systemd-udevd[3419]: ethtool: autonegotiation is unset or enabled, the speed and duplex are not writable.
Nov 29 09:10:25 pvf systemd-udevd[3410]: ethtool: autonegotiation is unset or enabled, the speed and duplex are not writable.
Nov 29 09:10:25 pvf systemd-udevd[3434]: ethtool: autonegotiation is unset or enabled, the speed and duplex are not writable.
Nov 29 09:10:25 pvf systemd-udevd[3434]: Using default interface naming scheme 'v247'.
Nov 29 09:10:25 pvf kernel: fwbr100i0: port 1(fwln100i0) entered blocking state
Nov 29 09:10:25 pvf kernel: fwbr100i0: port 1(fwln100i0) entered disabled state
Nov 29 09:10:25 pvf kernel: device fwln100i0 entered promiscuous mode
Nov 29 09:10:25 pvf kernel: fwbr100i0: port 1(fwln100i0) entered blocking state
Nov 29 09:10:25 pvf kernel: fwbr100i0: port 1(fwln100i0) entered forwarding state
Nov 29 09:10:25 pvf kernel: vmbr0: port 2(fwpr100p0) entered blocking state
Nov 29 09:10:25 pvf kernel: vmbr0: port 2(fwpr100p0) entered disabled state
Nov 29 09:10:25 pvf kernel: device fwpr100p0 entered promiscuous mode
Nov 29 09:10:25 pvf kernel: vmbr0: port 2(fwpr100p0) entered blocking state
Nov 29 09:10:25 pvf kernel: vmbr0: port 2(fwpr100p0) entered forwarding state
Nov 29 09:10:25 pvf kernel: fwbr100i0: port 2(tap100i0) entered blocking state
Nov 29 09:10:25 pvf kernel: fwbr100i0: port 2(tap100i0) entered disabled state
Nov 29 09:10:25 pvf kernel: fwbr100i0: port 2(tap100i0) entered blocking state
Nov 29 09:10:25 pvf kernel: fwbr100i0: port 2(tap100i0) entered forwarding state
Nov 29 09:10:25 pvf kernel: drbd vm-100-disk-1: Preparing cluster-wide state change 523588126 (0->-1 3/1)
Nov 29 09:10:25 pvf kernel: drbd vm-100-disk-1: Committing cluster-wide state change 523588126 (0ms)
Nov 29 09:10:25 pvf kernel: drbd vm-100-disk-1: role( Secondary -> Primary )
Nov 29 09:10:25 pvf qm[3201]: <root at pam> end task UPID:pvf:00000C82:0000EA8E:61A48AE7:qmstart:100:root at pam: OK
Nov 29 09:10:25 pvf sshd[3195]: Received disconnect from 10.43.8.119 port 60702:11: disconnected by user
Nov 29 09:10:25 pvf sshd[3195]: Disconnected from user root 10.43.8.119 port 60702
Nov 29 09:10:25 pvf sshd[3195]: pam_unix(sshd:session): session closed for user root
Nov 29 09:10:25 pvf systemd-logind[711]: Session 10 logged out. Waiting for processes to exit.
Nov 29 09:10:25 pvf systemd[1]: session-10.scope: Succeeded.
Nov 29 09:10:25 pvf systemd-logind[711]: Removed session 10.
Nov 29 09:10:25 pvf sshd[3463]: Accepted publickey for root from 10.43.8.119 port 60706 ssh2: RSA SHA256:3YNHg8CC3M6IPFUA9PlP5B7wnm2nbFLpnXd09j6k5yU
Nov 29 09:10:25 pvf sshd[3463]: pam_unix(sshd:session): session opened for user root(uid=0) by (uid=0)
Nov 29 09:10:25 pvf systemd[1]: Started Session 11 of user root.
Nov 29 09:10:25 pvf systemd-logind[711]: New session 11 of user root.
Nov 29 09:10:26 pvf sshd[3470]: Accepted publickey for root from 10.43.8.119 port 60708 ssh2: RSA SHA256:3YNHg8CC3M6IPFUA9PlP5B7wnm2nbFLpnXd09j6k5yU
Nov 29 09:10:26 pvf sshd[3470]: pam_unix(sshd:session): session opened for user root(uid=0) by (uid=0)
Nov 29 09:10:26 pvf systemd[1]: Started Session 12 of user root.
Nov 29 09:10:26 pvf systemd-logind[711]: New session 12 of user root.
Nov 29 09:10:27 pvf qm[3476]: <root at pam> starting task UPID:pvf:00000D95:0000EF22:61A48AF3:qmstop:100:root at pam:
Nov 29 09:10:27 pvf qm[3477]: stop VM 100: UPID:pvf:00000D95:0000EF22:61A48AF3:qmstop:100:root at pam:
Nov 29 09:10:27 pvf QEMU[3406]: kvm: terminating on signal 15 from pid 3477 (task UPID:pvf:00000D95:0000EF22:61A48AF3:qmstop:100:root at pam:)
Nov 29 09:10:27 pvf kernel: drbd vm-100-disk-1: role( Primary -> Secondary )
Nov 29 09:10:27 pvf qm[3476]: <root at pam> end task UPID:pvf:00000D95:0000EF22:61A48AF3:qmstop:100:root at pam: OK
Nov 29 09:10:27 pvf sshd[3470]: Received disconnect from 10.43.8.119 port 60708:11: disconnected by user
Nov 29 09:10:27 pvf sshd[3470]: Disconnected from user root 10.43.8.119 port 60708
Nov 29 09:10:27 pvf sshd[3470]: pam_unix(sshd:session): session closed for user root
Nov 29 09:10:27 pvf systemd[1]: session-12.scope: Succeeded.
Nov 29 09:10:27 pvf systemd-logind[711]: Session 12 logged out. Waiting for processes to exit.
Nov 29 09:10:27 pvf systemd-logind[711]: Removed session 12.
Nov 29 09:10:27 pvf sshd[3479]: Accepted publickey for root from 10.43.8.119 port 60710 ssh2: RSA SHA256:3YNHg8CC3M6IPFUA9PlP5B7wnm2nbFLpnXd09j6k5yU
Nov 29 09:10:27 pvf sshd[3479]: pam_unix(sshd:session): session opened for user root(uid=0) by (uid=0)
Nov 29 09:10:27 pvf systemd[1]: Started Session 13 of user root.
Nov 29 09:10:27 pvf systemd-logind[711]: New session 13 of user root.
Nov 29 09:10:27 pvf kernel: fwbr100i0: port 2(tap100i0) entered disabled state
Nov 29 09:10:27 pvf kernel: fwbr100i0: port 1(fwln100i0) entered disabled state
Nov 29 09:10:27 pvf kernel: vmbr0: port 2(fwpr100p0) entered disabled state
Nov 29 09:10:27 pvf kernel: device fwln100i0 left promiscuous mode
Nov 29 09:10:27 pvf kernel: fwbr100i0: port 1(fwln100i0) entered disabled state
Nov 29 09:10:27 pvf kernel: device fwpr100p0 left promiscuous mode
Nov 29 09:10:27 pvf kernel: vmbr0: port 2(fwpr100p0) entered disabled state
Nov 29 09:10:27 pvf qmeventd[719]: read: Connection reset by peer
Nov 29 09:10:27 pvf systemd[1]: 100.scope: Succeeded.
Nov 29 09:10:28 pvf pvesm[3485]: <root at pam> starting task UPID:pvf:00000DA8:0000EF87:61A48AF4:imgdel:100 at drbdstorage:root at pam:
Nov 29 09:10:28 pvf Satellite[812]: 09:10:28.144 [MainWorkerPool-1] INFO  LINSTOR/Satellite - SYSTEM - Resource 'vm-100-disk-1' updated for node 'pvf'.
Nov 29 09:10:28 pvf kernel: drbd vm-100-disk-1/0 drbd1001: disk( UpToDate -> Detaching )
Nov 29 09:10:28 pvf kernel: drbd vm-100-disk-1/0 drbd1001: disk( Detaching -> Diskless )
Nov 29 09:10:28 pvf kernel: drbd vm-100-disk-1/0 drbd1001: drbd_bm_resize called with capacity == 0
Nov 29 09:10:28 pvf kernel: drbd vm-100-disk-1: Terminating worker thread
Nov 29 09:10:28 pvf pvesm[3485]: <root at pam> end task UPID:pvf:00000DA8:0000EF87:61A48AF4:imgdel:100 at drbdstorage:root at pam: OK
Nov 29 09:10:28 pvf sshd[3479]: Received disconnect from 10.43.8.119 port 60710:11: disconnected by user
Nov 29 09:10:28 pvf sshd[3479]: Disconnected from user root 10.43.8.119 port 60710
Nov 29 09:10:28 pvf sshd[3479]: pam_unix(sshd:session): session closed for user root
Nov 29 09:10:28 pvf systemd-logind[711]: Session 13 logged out. Waiting for processes to exit.
Nov 29 09:10:28 pvf systemd[1]: session-13.scope: Succeeded.
Nov 29 09:10:28 pvf systemd-logind[711]: Removed session 13.
Nov 29 09:10:28 pvf sshd[3463]: Received disconnect from 10.43.8.119 port 60706:11: disconnected by user
Nov 29 09:10:28 pvf sshd[3463]: Disconnected from user root 10.43.8.119 port 60706
Nov 29 09:10:28 pvf sshd[3463]: pam_unix(sshd:session): session closed for user root
Nov 29 09:10:28 pvf systemd-logind[711]: Session 11 logged out. Waiting for processes to exit.
Nov 29 09:10:28 pvf systemd[1]: session-11.scope: Succeeded.
Nov 29 09:10:28 pvf systemd-logind[711]: Removed session 11.
Nov 29 09:10:29 pvf sshd[3534]: Accepted publickey for root from 10.43.8.119 port 60712 ssh2: RSA SHA256:3YNHg8CC3M6IPFUA9PlP5B7wnm2nbFLpnXd09j6k5yU
Nov 29 09:10:29 pvf sshd[3534]: pam_unix(sshd:session): session opened for user root(uid=0) by (uid=0)
Nov 29 09:10:29 pvf systemd[1]: Started Session 14 of user root.
Nov 29 09:10:29 pvf systemd-logind[711]: New session 14 of user root.
Nov 29 09:10:29 pvf sshd[3534]: Received disconnect from 10.43.8.119 port 60712:11: disconnected by user
Nov 29 09:10:29 pvf sshd[3534]: Disconnected from user root 10.43.8.119 port 60712
Nov 29 09:10:29 pvf sshd[3534]: pam_unix(sshd:session): session closed for user root
Nov 29 09:10:29 pvf systemd-logind[711]: Session 14 logged out. Waiting for processes to exit.
Nov 29 09:10:29 pvf systemd[1]: session-14.scope: Succeeded.
Nov 29 09:10:29 pvf systemd-logind[711]: Removed session 14.
Nov 29 09:10:29 pvf pmxcfs[897]: [status] notice: received log





More information about the pve-devel mailing list