[pve-devel] [PATCH v2 qemu-server 1/1] implement external snapshot

Fabian Grünbichler f.gruenbichler at proxmox.com
Wed Oct 23 12:14:28 CEST 2024


> Alexandre Derumier via pve-devel <pve-devel at lists.proxmox.com> hat am 30.09.2024 13:31 CEST geschrieben:
> Signed-off-by: Alexandre Derumier <alexandre.derumier at groupe-cyllene.com>
> ---
>  PVE/QemuServer.pm | 108 ++++++++++++++++++++++++++++++++++++++++------
>  1 file changed, 95 insertions(+), 13 deletions(-)
> 
> diff --git a/PVE/QemuServer.pm b/PVE/QemuServer.pm
> index b26da505..1523df15 100644
> --- a/PVE/QemuServer.pm
> +++ b/PVE/QemuServer.pm
> @@ -1549,7 +1549,11 @@ sub print_drive_commandline_full {
>      } else {
>  	if ($storeid) {
>  	    $path = PVE::Storage::path($storecfg, $volid);
> -	    $format //= qemu_img_format($scfg, $volname);
> +	    if ($scfg->{snapext}) {
> +		$format //= qemu_img_format($scfg, $path);
> +	    } else {
> +		$format //= qemu_img_format($scfg, $volname);
> +	    }

another reason to forbid raw-based snapshotting? ;)

>  	} else {
>  	    $path = $volid;
>  	    $format //= "raw";
> @@ -4713,9 +4717,31 @@ sub qemu_volume_snapshot {
>      my ($vmid, $deviceid, $storecfg, $volid, $snap) = @_;
>  
>      my $running = check_running($vmid);
> +    my $do_snapshots_with_qemu = do_snapshots_with_qemu($storecfg, $volid, $deviceid) if $running;

declaring a variable and assigning it via a post-if in a single statement is forbidden, see https://pve.proxmox.com/wiki/Perl_Style_Guide
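
e.g. just split declaration and assignment (same names as in the patch):

my $do_snapshots_with_qemu;
$do_snapshots_with_qemu = do_snapshots_with_qemu($storecfg, $volid, $deviceid) if $running;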

> +    if ($do_snapshots_with_qemu) {
> +	if($do_snapshots_with_qemu == 2) {

wrong nesting - this should be

if ($do_snapshots_with_qemu == 1) {
..
} elsif ($do_snapshots_with_qemu == 2) {
..
} else {
..
}

> +	    my $snapshot_file = PVE::Storage::path($storecfg, $volid, $snap);
> +	    #allocate volume is external snapshot is a block device
> +	    my $snap_volid = undef;
> +	    if ($snapshot_file =~ m|^/dev/.+|) {
> + 		my ($storeid, $volname) = PVE::Storage::parse_volume_id($volid, 1);
> +		my $size = PVE::Storage::volume_size_info($storecfg, $volid, 5);
> +		#add 100M for qcow2 headers
> +		$size = int($size/1024) + (100*1024);
> +		my $snap_volname = $volname."-snap-$snap";
> +		$snap_volid = PVE::Storage::vdisk_alloc($storecfg, $storeid, $vmid, 'raw', $snap_volname, $size);
> +		PVE::Storage::activate_volumes($storecfg, [$snap_volid]);
> +	    }

haven't tested this part

> +
> +	    eval { mon_cmd($vmid, 'blockdev-snapshot-sync', device => $deviceid, 'snapshot-file' => $snapshot_file, format => 'qcow2') };

if we want the current volume to keep its name, and the snapshot volume to actually contain *that* snapshot's data, we need some sort of rename dance here as well: rename the current volume to the snapshot volume name, then snapshot it back into the "current" name. not sure what the proper QMP runes would be to achieve that?

maybe (untested!):

let's say "vm-100-disk-1.qcow2" is the current volume. it might or might not have snapshots/backing files already.

1. snapshot into snapshot volume "vm-100-disk-1-snap-foobar.qcow2"

"vm-100-disk-1.qcow2" is the backing file of the new "vm-100-disk-1-snap-foobar.qcow2" volume, and now contains the delta for the snapshot "foobar"
2. block-stream "vm-100-disk-1.qcow2", potentially with its backing file as base, into "vm-100-disk-1-snap-foobar.qcow2"
now "vm-100-disk-1-snap-foobar.qcow2" should contain the delta of snapshot "foobar" to the previous snapshot (if one exists, or the complete data otherwise)

3. delete "vm-100-disk-1.qcow2" on the storage layer now (it's no longer part of the backing chain)

4. snapshot "vm-100-disk-1-snap-foobar.qcow2" into the now free "vm-100-disk-1.qcow2" volume

then we end up with a snapshot volume representing the snapshot delta, and a current volume on top that gets the new writes?

steps 1-3 are just preparation/renaming of the "live" top overlay, step 4 is the actual snapshotting part. but of course, this causes I/O, so it would require further adaptations to work in a consistent fashion without a huge downtime.
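
in (completely untested) QMP/perl terms this would be roughly the following - $snap_path, $old_backing_path, $old_current_volid and $current_path are just placeholders here, and job handling is hand-waved:

# 1. snapshot the current top into the snapshot volume
mon_cmd($vmid, 'blockdev-snapshot-sync',
    device => $deviceid,
    'snapshot-file' => $snap_path, # .../vm-100-disk-1-snap-foobar.qcow2
    format => 'qcow2',
);

# 2. stream the old "current" image into the new top, stopping at its
# former backing file (if any), so the old file drops out of the chain
mon_cmd($vmid, 'block-stream',
    'job-id' => "stream-$deviceid",
    device => $deviceid,
    base => $old_backing_path, # leave out 'base' if there is no backing file
);
# ... wait for the block job to finish (e.g. via query-block-jobs) ...

# 3. the old current file is no longer referenced, drop it on the storage layer
PVE::Storage::vdisk_free($storecfg, $old_current_volid);

# 4. snapshot again into the now free "current" name
mon_cmd($vmid, 'blockdev-snapshot-sync',
    device => $deviceid,
    'snapshot-file' => $current_path, # .../vm-100-disk-1.qcow2
    format => 'qcow2',
);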

alternatively, something like this could also work (also completely untested):

1. snapshot into temp name

"vm-100-disk-1.qcow2" is now the backing file of this new volume, and contains the state for snapshot "foobar"

2. hardlink "vm-100-disk-1.qcow2" into "vm-100-disk-1-snap-foobar.qcow2"
3. QMP change-backing-file of temp volume to "vm-100-disk-1-snap-foobar.qcow2"

"vm-100-disk-1.qcow2" is now no longer part of the backing chain

4. remove "vm-100-disk-1.qcow2"
5. snapshot into "vm-100-disk-1.qcow2"
6. block-stream temp name into "vm-100-disk-1.qcow2", with "vm-100-disk-1-snap-foobar.qcow2" as base

since the temp volume is empty (the VM doesn't do any I/O while multiple disks are being snapshotted), block-stream should be fast in this case, I think.
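
the hardlink + backing file switch (steps 2/3) would be something like the following (again untested, $current_path/$snap_path/$temp_node are placeholders, and the node name of the temp overlay would need to be looked up via query-named-block-nodes or set explicitly when creating it):

# 2. hardlink the old current file to the snapshot volume name
link($current_path, $snap_path)
    or die "hardlink '$current_path' -> '$snap_path' failed: $!\n";

# 3. rewrite the backing file reference of the temp overlay so it points
# at the snapshot volume instead of the old current file
mon_cmd($vmid, 'change-backing-file',
    device => $deviceid,
    'image-node-name' => $temp_node,
    'backing-file' => $snap_path,
);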

>  
> -    if ($running && do_snapshots_with_qemu($storecfg, $volid, $deviceid)) {
> -	mon_cmd($vmid, 'blockdev-snapshot-internal-sync', device => $deviceid, name => $snap);
> +	    if ($@) {
> +		PVE::Storage::vdisk_free($storecfg, $snap_volid) if $snapshot_file =~ m|^/dev/.+|;

this should check definedness of $snap_volid, instead of $snapshot_file?
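
i.e., something like:

PVE::Storage::vdisk_free($storecfg, $snap_volid) if defined($snap_volid);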

> +		die $@;
> +	    }
> +	} else {
> +	    mon_cmd($vmid, 'blockdev-snapshot-internal-sync', device => $deviceid, name => $snap);
> +	}
>      } else {
>  	PVE::Storage::volume_snapshot($storecfg, $volid, $snap);
>      }
> @@ -4735,13 +4761,52 @@ sub qemu_volume_snapshot_delete {
>  	});
>      }
>  
> -    if ($attached_deviceid && do_snapshots_with_qemu($storecfg, $volid, $attached_deviceid)) {
> -	mon_cmd(
> -	    $vmid,
> -	    'blockdev-snapshot-delete-internal-sync',
> -	    device => $attached_deviceid,
> -	    name => $snap,
> -	);
> +    my $do_snapshots_with_qemu = do_snapshots_with_qemu($storecfg, $volid, $attached_deviceid) if $running;

same post-if/declaration issue here

> +    if ($attached_deviceid && $do_snapshots_with_qemu) {
> +
> +	if ($do_snapshots_with_qemu == 2) {

and same nesting if comment here ;)

> +
> +	    my $snapshots = PVE::Storage::volume_snapshot_info($storecfg, $volid);
> +
> +	    my $currentpath = $snapshots->{current}->{file};
> +	    my $snappath = $snapshots->{$snap}->{file};
> +	    my $snapvolid = $snapshots->{$snap}->{volid};
> +	    return if !$snappath;  #already delete

how can this be? if the device is attached and snapshotted, the snapshot must be part of the backing chain?

> +	    my $parentsnap = $snapshots->{$snap}->{parent};
> +	    die "error: we can't find a parent for this snapshot" if !$parentsnap;
> +
> +	    my $parentpath = $snapshots->{$parentsnap}->{file};
> +	    my $parentformat = $snapshots->{$parentsnap}->{'format'} if $parentsnap;
> +
> +	    print "block-commit  top:$snappath base:$parentpath\n";
> +
> +            my $job_id = "commit-$attached_deviceid";
> +	    my $jobs = {};
> +	    mon_cmd(
> +	        $vmid,
> +		'block-commit',
> +		'job-id' => $job_id,
> +		device => $attached_deviceid,
> +		top => $snappath,
> +		base => $parentpath,
> +	    );
> +	    $jobs->{$job_id} = {};
> +
> +	    #if we delete the current, block-job-complete to finish
> +	    my $completion = $currentpath eq $snappath ? 'complete' : 'auto';
> +	    qemu_drive_mirror_monitor($vmid, undef, $jobs, $completion, 0, 'commit');
> +	    #fixme. delete the disks when all jobs are ok ?
> +	    #delete the lvm volume
> +	    PVE::Storage::vdisk_free($storecfg, $snapvolid);
> +	} else {
> +	    mon_cmd(
> +	        $vmid,
> +		'blockdev-snapshot-delete-internal-sync',
> +		device => $attached_deviceid,
> +		name => $snap,
> +	    );
> +	}
>      } else {
>  	PVE::Storage::volume_snapshot_delete(
>  	    $storecfg, $volid, $snap, $attached_deviceid ? 1 : undef);
> @@ -7776,6 +7841,8 @@ sub do_snapshots_with_qemu {
>  	return 1;
>      }
>  
> +    return 2 if $scfg->{snapext};
> +

that would definitely warrant a comment and/or an exhaustive check of existing call sites ;)
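
e.g. something along these lines (exact wording up for debate, the meaning of the return values is taken from this patch):

# 1 == let qemu take an internal snapshot via QMP
# 2 == let qemu take the snapshot via an external overlay file ('snapext' storages)
return 2 if $scfg->{snapext};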

>      if ($volid =~ m/\.(qcow2|qed)$/){
>  	return 1;
>      }
> @@ -7849,8 +7916,23 @@ sub qemu_img_convert {
>      if ($src_storeid) {
>  	PVE::Storage::activate_volumes($storecfg, [$src_volid], $snapname);
>  	my $src_scfg = PVE::Storage::storage_config($storecfg, $src_storeid);
> -	$src_format = qemu_img_format($src_scfg, $src_volname);
> -	$src_path = PVE::Storage::path($storecfg, $src_volid, $snapname);
> +	if($src_scfg->{snapext}) {

this whole thing here is very confusing..

> +	    my $snapshots = PVE::Storage::volume_snapshot_info($storecfg, $src_volid);
> +	    $snapname = 'current' if !$snapname;
> +	    #if we don't clone the current image
> +	    #need to use the parent if available, as it's the readonly image view
> +	    #at the time of the snapshot
> +	    my $parentsnap = $snapshots->{$snapname}->{parent};
> +	    $snapname = $parentsnap if($parentsnap && $snapname ne 'current');
> +	    $src_format = $snapshots->{$snapname}->{format};
> +	    $src_path = $snapshots->{$snapname}->{file};
> +	    $src_volid = $snapshots->{$snapname}->{volid};
> +	    $snapname = undef;
> +	    PVE::Storage::activate_volumes($storecfg, [$src_volid], $snapname);

$snapname is always undef for this activate_volumes invocation.. 

but this whole if seems kind of strange. wouldn't it be enough to just call PVE::Storage::path with $snapname (to get the path to read from for cloning this snapshot or the volume itself) and then unset $snapname, or skip passing that to convert if snapshots are external?
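
i.e., roughly (untested, assuming PVE::Storage::path returns the right file for external snapshots):

$src_format = qemu_img_format($src_scfg, $src_volname);
$src_path = PVE::Storage::path($storecfg, $src_volid, $snapname);
# for external snapshots the path already points at the (readonly) snapshot
# file, so don't pass the snapshot name on to the actual convert invocation
$snapname = undef if $src_scfg->{snapext};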

> +	} else {
> +	    $src_format = qemu_img_format($src_scfg, $src_volname);
> +	    $src_path = PVE::Storage::path($storecfg, $src_volid, $snapname);

i.e., basically what is already done here (and if we don't support raw original volumes, then it's exactly this code?)

> +	}
>  	$src_is_iscsi = ($src_path =~ m|^iscsi://|);
>  	$cachemode = 'none' if $src_scfg->{type} eq 'zfspool';
>      } elsif (-f $src_volid || -b $src_volid) {
> @@ -7920,7 +8002,7 @@ sub qemu_img_format {
>  
>      # FIXME: this entire function is kind of weird given that `parse_volname`
>      # also already gives us a format?
> -    my $is_path_storage = $scfg->{path} || $scfg->{type} eq 'esxi';
> +    my $is_path_storage = $scfg->{path} || $scfg->{type} eq 'esxi' || $scfg->{snapext};
>  
>      if ($is_path_storage && $volname =~ m/\.($PVE::QemuServer::Drive::QEMU_FORMAT_RE)$/) {
>  	return $1;
> -- 
> 2.39.2



