[pve-devel] [PATCH v2 qemu-server 1/4] migration: avoid crash with heavy IO on local VM disk

Fiona Ebner f.ebner at proxmox.com
Wed Jul 3 15:49:01 CEST 2024


On 03.07.24 at 15:44, Fiona Ebner wrote:
> On 03.07.24 at 15:15, Fabian Grünbichler wrote:
>> On May 28, 2024 10:50 am, Fiona Ebner wrote:
>>> +	eval {
>>> +	    mon_cmd(
>>> +		$vmid,
>>> +		"block-job-change",
>>> +		id => $job,
>>> +		type => 'mirror',
>>> +		'copy-mode' => 'write-blocking',
>>> +	    );
>>> +	    $switching->{$job} = 1;
>>> +	};
>>> +	die "could not switch mirror job $job to active mode - $@\n" if $@;
>>> +    }
>>> +
>>> +    while (1) {
>>> +	my $stats = mon_cmd($vmid, "query-block-jobs");
>>> +
>>> +	my $running_jobs = {};
>>> +	$running_jobs->{$_->{device}} = $_ for $stats->@*;
>>> +
>>> +	for my $job (sort keys $switching->%*) {
>>> +	    if ($running_jobs->{$job}->{'actively-synced'}) {
>>> +		print "$job: successfully switched to actively synced mode\n";
>>> +		delete $switching->{$job};
>>> +	    }
>>> +	}
>>> +
>>> +	last if scalar(keys $switching->%*) == 0;
>>> +
>>> +	sleep 1;
>>> +    }
>>
>> so what could be the cause here for a job not switching? and do we
>> really want to loop forever if it happens?
>>
> 
> That should never happen. The 'block-job-change' QMP command already
> succeeded. That means further writes will be done synchronously to the
> target. Once the remaining dirty parts have been mirrored by the
> background iteration, the actively-synced flag will be set and we break
> out of the loop.
> 
> We got to the ready condition already before doing the switch, getting
> there again is even easier after the switch:
> https://gitlab.com/qemu-project/qemu/-/blob/stable-9.0/block/mirror.c?ref_type=heads#L1078
> 

Well, "should". If a job fails after switching, then we'd actually be
stuck. Will write a v2 that is robust against that.
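
A rough sketch of what a more robust wait loop could look like, only to
illustrate the failure mode and not the actual follow-up patch. It assumes
that a failed job simply stops being listed by query-block-jobs, in which
case the lookup in the quoted loop never sees 'actively-synced' and the loop
never ends. The helper name wait_for_active_sync, the $query_jobs callback
(standing in for mon_cmd($vmid, "query-block-jobs")) and the $max_polls and
$interval parameters are hypothetical.

```perl
use v5.24;
use warnings;

# Hypothetical helper: $query_jobs stands in for
# sub { mon_cmd($vmid, "query-block-jobs") }; $max_polls and $interval are
# illustrative parameters, not part of the patch.
sub wait_for_active_sync {
    my ($switching, $query_jobs, $max_polls, $interval) = @_;

    for my $attempt (1 .. $max_polls) {
        my $stats = $query_jobs->();

        my $running_jobs = {};
        $running_jobs->{$_->{device}} = $_ for $stats->@*;

        for my $job (sort keys $switching->%*) {
            # A job that is no longer listed cannot become actively synced,
            # so fail instead of polling forever.
            die "$job: vanished while switching to active mode\n"
                if !$running_jobs->{$job};

            if ($running_jobs->{$job}->{'actively-synced'}) {
                print "$job: successfully switched to actively synced mode\n";
                delete $switching->{$job};
            }
        }

        return if scalar(keys $switching->%*) == 0;

        sleep $interval if $interval;
    }

    # Upper bound on the wait, as a second safety net.
    die "timeout waiting for jobs to become actively synced: "
        . join(', ', sort keys $switching->%*) . "\n";
}

# Mock monitor replies: the job is reported as synced on the second poll.
my @replies = (
    [{ device => 'drive-scsi0', 'actively-synced' => 0 }],
    [{ device => 'drive-scsi0', 'actively-synced' => 1 }],
);
my $switching = { 'drive-scsi0' => 1 };
wait_for_active_sync($switching, sub { shift @replies }, 5, 0);
```

Whether a bounded number of polls is wanted at all is a separate question;
the check for a vanished job is the part that addresses the stuck case.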



