[pve-devel] [PATCH qemu-server 1/2] migration: avoid migrating disk images multiple times

Aaron Lauterer a.lauterer at proxmox.com
Tue May 9 14:55:20 CEST 2023



On 5/9/23 09:34, Fiona Ebner wrote:
> On 02.05.23 at 15:17, Aaron Lauterer wrote:
>> Scan the VM config and store the volid and full path for each storage.
>> Do the same when we scan each storage.  Then we can have these
>> scenarios:
>> * multiple storage configurations might point to the same storage
>> The result is that when scanning the storages, we find the disk image
>> multiple times.
>> -> we ignore them
>>
> 
> Having the same storage with overlapping content types is a
> configuration error (except if you use different 'content-dirs' I
> guess). We can't make guarantees for locking, which e.g. leads to races
> for allocation, and it can lead to left-over references, etc. Rather than
> trying to handle this here, I'd prefer a warning about the
> misconfiguration somewhere (maybe during storage.cfg parsing?) and/or
> error out when attempting to create such a configuration. Adding
> something in the docs also makes sense if there isn't yet.

Summing up an offline discussion with @Fabian (I hope I don't forget to mention 
anything):

Yes, having two storage configurations pointing to the same location should not 
happen as far as we know. For most situations where one might want to do that, 
there are other, better options to achieve the separation on the storage level.
For example (see the storage.cfg sketch below):
* ZFS with different volblocksizes -> use a different base dataset for each storage
* RBD with and without KRBD -> use RBD namespaces to separate them
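
A minimal sketch of what such storage.cfg entries could look like (all storage 
IDs, datasets, pools and namespaces below are made up for illustration):

zfspool: zfs-vols-8k
        pool rpool/data-8k
        blocksize 8k
        content images

zfspool: zfs-vols-64k
        pool rpool/data-64k
        blocksize 64k
        content images

rbd: ceph-krbd
        pool vmpool
        namespace ns-krbd
        krbd 1
        content images

rbd: ceph-librbd
        pool vmpool
        namespace ns-librbd
        content images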

But it is hard to detect that reliably on the storage layer. For example, two 
RBD storage entries might list different monitors; do they point to the same 
cluster? There is no way to tell unless we open a connection and query the Ceph 
FSID of that cluster.
For other storage types it is also possible to run into similar problems where 
we cannot really tell, from the storage definition alone, whether they point to 
the same location or not.
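
Just to illustrate what such a check would involve (a rough sketch, not existing 
pve-storage code; the helper name and the way the ceph CLI is called here are 
assumptions):

    # hypothetical helper: get the FSID of the cluster an RBD storage points to
    sub rbd_cluster_fsid {
        my ($scfg) = @_;    # parsed storage section from storage.cfg
        my $mon = $scfg->{monhost};    # monitor address(es), if external
        my $id = $scfg->{username} // 'admin';
        # 'ceph fsid' prints the cluster FSID, but it needs a live connection
        # to a monitor - exactly the step we would like to avoid for a check
        my $fsid = `ceph -m '$mon' --id '$id' fsid 2>/dev/null`;
        chomp $fsid;
        return $fsid;
    }

    # two RBD storages alias each other only if the FSIDs (plus pool and
    # namespace) match - so detecting this means connecting to each cluster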


Another approach that could make a migration handle such situations better, but 
should only target PVE 8:

* Don't scan all storages; only look at disk images that are referenced in the 
config. With this, most situations in which aliases would show up are gone, and 
a migration is less likely to fail just because some unrelated storage is not 
online.
* If we detect an image that is both aliased and referenced, fail the migration 
with a hint that this setup should be fixed (a rough sketch of such a check is 
below).

But since we would fail the migration instead of potentially creating duplicate 
images on the target node, this is a rather breaking change -> PVE 8.
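
Very roughly, the aliasing check during migration could look something like this 
(only a sketch of the idea, not the actual patch; whether foreach_volume() and 
PVE::Storage::path() are the right entry points here is an assumption):

    my $seen = {};
    PVE::QemuConfig->foreach_volume($conf, sub {
        my ($key, $drive) = @_;
        my $volid = $drive->{file};
        return if $volid eq 'none' || $volid eq 'cdrom';
        # resolve the volid to the actual path on the underlying storage
        my $path = PVE::Storage::path($storecfg, $volid);
        if (my $other = $seen->{$path}) {
            die "aliased volumes detected: '$volid' and '$other' both resolve"
                ." to '$path' - please fix the storage configuration\n";
        }
        $seen->{$path} = $volid;
    });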

I hope I summed it up correctly.




