[pve-devel] [PATCH v2 qemu-server 1/4] migration: avoid crash with heavy IO on local VM disk
Fiona Ebner
f.ebner at proxmox.com
Wed Jul 3 15:49:01 CEST 2024
On 03.07.24 at 15:44, Fiona Ebner wrote:
> On 03.07.24 at 15:15, Fabian Grünbichler wrote:
>> On May 28, 2024 10:50 am, Fiona Ebner wrote:
>>> +     eval {
>>> +         mon_cmd(
>>> +             $vmid,
>>> +             "block-job-change",
>>> +             id => $job,
>>> +             type => 'mirror',
>>> +             'copy-mode' => 'write-blocking',
>>> +         );
>>> +         $switching->{$job} = 1;
>>> +     };
>>> +     die "could not switch mirror job $job to active mode - $@\n" if $@;
>>> + }
>>> +
>>> + while (1) {
>>> +     my $stats = mon_cmd($vmid, "query-block-jobs");
>>> +
>>> +     my $running_jobs = {};
>>> +     $running_jobs->{$_->{device}} = $_ for $stats->@*;
>>> +
>>> +     for my $job (sort keys $switching->%*) {
>>> +         if ($running_jobs->{$job}->{'actively-synced'}) {
>>> +             print "$job: successfully switched to actively synced mode\n";
>>> +             delete $switching->{$job};
>>> +         }
>>> +     }
>>> +
>>> +     last if scalar(keys $switching->%*) == 0;
>>> +
>>> +     sleep 1;
>>> + }
>>
>> so what could be the cause here for a job not switching? and do we
>> really want to loop forever if it happens?
>>
>
> That should never happen. The 'block-job-change' QMP command already
> succeeded. That means further writes will be done synchronously to the
> target. Once the remaining dirty parts have been mirrored by the
> background iteration, the actively-synced flag will be set and we break
> out of the loop.
>
> We got to the ready condition already before doing the switch, getting
> there again is even easier after the switch:
> https://gitlab.com/qemu-project/qemu/-/blob/stable-9.0/block/mirror.c?ref_type=heads#L1078
>
Well, "should". If a job fails after switching, then we'd actually be
stuck. Will write a v2 that is robust against that.
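
For illustration only, here is a minimal sketch of how the wait loop could
avoid getting stuck. It is not the actual follow-up patch. The helper name
wait_for_actively_synced, the timeout, and the $query_jobs code ref (standing
in for mon_cmd($vmid, "query-block-jobs")) are hypothetical. It also assumes
that a failed job either disappears from the query result or is reported with
status 'concluded' and an 'error' field.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of qemu-server. $query_jobs is a code ref
# standing in for mon_cmd($vmid, "query-block-jobs"); it must return an
# array ref of job hashes, as in the quoted patch.
sub wait_for_actively_synced {
    my ($query_jobs, $switching, $timeout, $interval) = @_;
    $timeout //= 60;  # assumed upper bound in seconds, not from the patch
    $interval //= 1;

    my %pending = map { $_ => 1 } keys %$switching;
    my $deadline = time() + $timeout;

    while (%pending) {
        my $stats = $query_jobs->();
        my %running = map { $_->{device} => $_ } @$stats;

        for my $job (sort keys %pending) {
            # A job that is gone can never set 'actively-synced'.
            my $info = $running{$job}
                or die "$job: vanished while switching to actively synced mode\n";

            # Assumption: a failed job shows up as 'concluded' with an error.
            if (($info->{status} // '') eq 'concluded') {
                my $err = $info->{error} // 'unknown error';
                die "$job: mirror job failed while switching - $err\n";
            }

            if ($info->{'actively-synced'}) {
                print "$job: successfully switched to actively synced mode\n";
                delete $pending{$job};
            }
        }

        last if !%pending;
        die "timeout waiting for mirror jobs to become actively synced\n"
            if time() >= $deadline;
        sleep $interval;
    }
    return 1;
}

# Usage with a stubbed monitor: the job becomes synced on the second poll.
my @replies = (
    [ { device => 'drive-scsi0', status => 'ready', 'actively-synced' => 0 } ],
    [ { device => 'drive-scsi0', status => 'ready', 'actively-synced' => 1 } ],
);
wait_for_actively_synced(sub { shift @replies }, { 'drive-scsi0' => 1 }, 10, 0);
```

The caller's $switching hash is left untouched, so it can still be used for
cleanup if the helper dies.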