[PVE-User] Problem with QEMU drive-mirror after cancelling VM disk move

Fabian Grünbichler f.gruenbichler at proxmox.com
Wed Apr 1 08:38:21 CEST 2020


On March 31, 2020 5:07 pm, Mikhail wrote:
> On 3/31/20 2:53 PM, Fabian Grünbichler wrote:
>> You should be able to manually clean up the mess using the QMP/monitor
>> interface:
>> 
>> `man qemu-qmp-ref` gives a detailed tour; you probably want
>> `query-block-jobs` and `query-block`, and then, depending on the output,
>> `block-job-cancel` or `block-job-complete`.
>> 
>> The HMP interface, accessible via 'qm monitor <VMID>', has slightly
>> different commands: `info block -v`, `info block-jobs` and
>> `block_job_cancel`/`block_job_complete` ('_' instead of '-').
> 
> Thanks for your prompt response.
> I've tried the following under the VM's "Monitor" section in the Proxmox
> web GUI:
> 
> # info block-jobs
> Type mirror, device drive-scsi0: Completed 6571425792 of 10725883904
> bytes, speed limit 0 bytes/s
> 
> and after that I tried to cancel this block job using:
> 
> # block_job_cancel -f drive-scsi0
> 
> However, the block job is still there even after 3 attempts to cancel it:
> 
> # info block-jobs
> Type mirror, device drive-scsi0: Completed 6571425792 of 10725883904
> bytes, speed limit 0 bytes/s
> 
> The same happens when I connect to it via the root console using "qm monitor".
> 
> I guess this is now completely stuck and the only way out is to power the
> VM off and on?

Well, you could investigate further with the QMP interface (it gives a lot
more information), but yes, a shutdown/boot cycle should get rid of the
block job.
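
For reference, if you want to poke at QMP directly, something like the
following should work (a rough sketch; it assumes the usual Proxmox QMP
socket path /var/run/qemu-server/<VMID>.qmp and that socat is installed):

    # connect to the VM's QMP socket (adjust the VMID)
    socat - UNIX-CONNECT:/var/run/qemu-server/123.qmp

    # QMP needs a capabilities handshake before it accepts commands
    {"execute": "qmp_capabilities"}

    # inspect the stuck job and the block devices
    {"execute": "query-block-jobs"}
    {"execute": "query-block"}

    # depending on the state, try a forced cancel
    {"execute": "block-job-cancel", "arguments": {"device": "drive-scsi0", "force": true}}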

>> Feel free to post the output of the query/info commands before deciding
>> how to proceed. The complete task log of the failed 'move disk'
>> operation would also be interesting, if it is still available.
> 
> I just asked my colleague who cancelled this disk move operation. He said
> he had to cancel it because it was stuck at 61.27%. The disk move task log
> is below; I truncated the repeating lines:
> 
> deprecated setting 'migration_unsecure' and new 'migration: type' set at
> same time! Ignore 'migration_unsecure'
> create full clone of drive scsi0 (nvme-local-vm:123/vm-123-disk-0.qcow2)
> drive mirror is starting for drive-scsi0
> drive-scsi0: transferred: 24117248 bytes remaining: 10713300992 bytes
> total: 10737418240 bytes progression: 0.22 % busy: 1 ready: 0
> drive-scsi0: transferred: 2452619264 bytes remaining: 6635388928 bytes
> total: 9088008192 bytes progression: 26.99 % busy: 1 ready: 0
> drive-scsi0: transferred: 3203399680 bytes remaining: 6643777536 bytes
> total: 9847177216 bytes progression: 32.53 % busy: 1 ready: 0
> drive-scsi0: transferred: 4001366016 bytes remaining: 6632243200 bytes
> total: 10633609216 bytes progression: 37.63 % busy: 1 ready: 0
> drive-scsi0: transferred: 4881121280 bytes remaining: 5856296960 bytes
> total: 10737418240 bytes progression: 45.46 % busy: 1 ready: 0
> drive-scsi0: transferred: 6554648576 bytes remaining: 4171235328 bytes
> total: 10725883904 bytes progression: 61.11 % busy: 1 ready: 0
> drive-scsi0: transferred: 6571425792 bytes remaining: 4154458112 bytes
> total: 10725883904 bytes progression: 61.27 % busy: 1 ready: 0
> [ same line repeats like 250+ times ]
> drive-scsi0: transferred: 6571425792 bytes remaining: 4154458112 bytes
> total: 10725883904 bytes progression: 61.27 % busy: 1 ready: 0
> drive-scsi0: transferred: 6571425792 bytes remaining: 4154458112 bytes
> total: 10725883904 bytes progression: 61.27 % busy: 1 ready: 0
> drive-scsi0: transferred: 6571425792 bytes remaining: 4154458112 bytes
> total: 10725883904 bytes progression: 61.27 % busy: 1 ready: 0
> drive-scsi0: Cancelling block job

Was the target some sort of network storage that started hanging? This
looks rather unusual...
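
If it was, one quick way to check for a hung storage backend on the node is
to look for kernel hung-task warnings and for processes stuck in
uninterruptible sleep (a generic sketch, nothing Proxmox-specific about it):

    # kernel messages about tasks blocked on I/O for too long
    dmesg | grep -i 'blocked for more than'

    # processes in uninterruptible sleep (state 'D'), usually waiting on storage
    ps -eo pid,stat,wchan:32,cmd | awk '$2 ~ /D/'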



