[pve-devel] [PATCH qemu-server v2] Catch qmp socket connections errors, so we can output a more specific error message

Wed Aug 2 13:48:16 CEST 2017

On 07/31/2017 03:00 PM, Fabian Grünbichler wrote:
> On Thu, Jul 27, 2017 at 11:25:41AM +0200, Emmanuel Kasper wrote:
>> It can happen that the qmp connection gets lost while mirroring a disk.
>> In that case the current block job gets cancelled, but the real cause of the failure
>> is lost, because we die() at a later step with the generic message
>> "$job: mirroring has been cancelled\n"
> 
> I am not quite sure I can follow.. see below
> 
>>
>> example:
>> ...
>> drive-scsi0: transferred: 5524946944 bytes remaining: 918355968 bytes total: 6443302912 bytes progression: 85.75 % busy: 1 ready: 0
>> drive-scsi0: Cancelling block job
>> drive-scsi0: Done.
>> 2017-07-26 15:39:56 ERROR: online migrate failure - mirroring error: drive-scsi0: mirroring has been cancelled
>> 2017-07-26 15:39:56 aborting phase 2 - cleanup resources
>> 2017-07-26 15:39:56 migrate_cancel
>> ...
> 
> but this must be from dying in line 6054 (caught by the eval in 6030),
> not from dying in line 6036? which means that query-block-jobs maybe
> returned an empty array (or undef?)..

Yes, you're right, this is caused by the die in line 6054.
Since the die in 6054 has a misleading message, I sent a new patch for that.

>>
>> after patch applied:
>> 2017-07-27 09:43:37 ERROR: online migrate failure - mirroring error: lost connection to qemu machine protocol: VM 600 not running
>> 2017-07-27 09:43:37 aborting phase 2 - cleanup resources
> 
> but this would mean vm_qmp_command (called by vm_mon_cmd) died in line
> 4798, because check_running returned false??
> 
> I'd rather fix check_running returning false then, because obviously the
> VM IS running, isn't it? ;)

In that case the VM was NOT running, so check_running was right :)
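
To make the two failure paths discussed above easier to tell apart, here is a minimal sketch in Python. It is not the qemu-server Perl code: `QmpConnectionError`, `monitor_job` and the `query_block_jobs` callable are hypothetical names used only for illustration. It assumes that a lost QMP connection surfaces as an exception, while a vanished job surfaces as an empty job list.

```python
class QmpConnectionError(Exception):
    """Hypothetical error: the QMP socket could not be reached."""


def monitor_job(job, query_block_jobs):
    """Poll the block jobs once and report why mirroring cannot continue.

    query_block_jobs is a hypothetical callable that returns a list of
    dicts with a 'device' key, or raises QmpConnectionError.
    """
    try:
        jobs = query_block_jobs()
    except QmpConnectionError as err:
        # Surface the real cause instead of a generic cancellation message.
        raise RuntimeError(
            f"lost connection to qemu machine protocol: {err}"
        ) from err

    if not any(j.get("device") == job for j in jobs):
        # The query succeeded, but the job is no longer listed.
        raise RuntimeError(f"{job}: mirroring has been cancelled")

    return "running"
```

With this split, a VM that has stopped is reported as a connection problem, and the "cancelled" message is reserved for the case where the query worked but the job is gone.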