[pve-devel] [PATCH v4 qemu-server 11/11] qcow2: add external snapshot support

Fabian Grünbichler f.gruenbichler at proxmox.com
Wed Apr 2 10:10:33 CEST 2025


the commit description is missing here as well..

I haven't tested this (or the first patches doing the blockdev conversion) yet, but I see a few bigger design/architecture issues left (besides FIXMEs for missing pieces that previously worked ;)):

- we should probably move the decision whether a snapshot is done on the storage layer or by qemu into the control of the storage plugin, especially since we are currently cleaning that API up to allow easier implementation of external plugins
- if we do that, we can also make "uses external qcow2 snapshots" a property of the storage plugin+config to replace hard-coded checks for the snapext property or lvm+qcow2
- there are a few operations here that should not call directly into the storage plugin code or do equivalent actions, but should rather get a proper interface in that storage plugin API

the first one is the renaming of a blockdev while it is used, which is currently done like this:
-- "link" snapshot path to make it available under old and new name
-- handle blockdev additions/reopening/backing-file updates/deletions on the qemu layer
-- remove old snapshot path link
-- if LVM, rename actual volume (for non-LVM, linking followed by unlinking the source is effectively a rename already)

I wonder whether that couldn't be made more straightforward by doing
-- rename snapshot volume/image (qemu must already have the old name open anyway and should be able to continue using it)
-- do blockdev additions/reopening/backing-file updates/deletions on the qemu layer

or is there an issue/check in qemu somewhere that prevents this approach? if not, we could just introduce a "volume_snapshot_rename" or extend rename_volume with a snapshot parameter..
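
to make the proposed ordering concrete, here is a rough standalone sketch. "volume_snapshot_rename" is only the name floated above and does not exist in the storage API, qmp() is a stand-in for mon_cmd, and the volid is made up. whether qemu is actually fine with the image being renamed underneath it is exactly the open question, so treat this as an illustration of the call order only:

```perl
use strict;
use warnings;

my @steps; # records the order of operations instead of doing real work

# hypothetical storage API entry point (name from the suggestion above)
sub volume_snapshot_rename {
    my ($volid, $old, $new) = @_;
    push @steps, "storage: rename $volid snapshot '$old' -> '$new'";
    return;
}

# stand-in for the QMP monitor calls, it only logs the command name
sub qmp {
    my ($cmd) = @_;
    push @steps, "qmp: $cmd";
    return;
}

sub rename_snapshot_while_running {
    my ($volid, $old, $new) = @_;

    # 1. the storage layer renames the volume/image in one step
    volume_snapshot_rename($volid, $old, $new);

    # 2. qemu layer: add the new node, reopen the user, drop the old node
    qmp('blockdev-add');
    qmp('blockdev-reopen');
    qmp('blockdev-del');
    return;
}

rename_snapshot_while_running('local:100/vm-100-disk-0.qcow2', 'current', 'snap1');
print "$_\n" for @steps;
```

compared to the current flow there is no link/unlink pair and no LVM special case on the qemu-server side.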

the second thing that happens is deleting a snapshot volume/path without deleting the whole snapshot.. that one we could easily support in volume_snapshot_delete, either by extending the $running parameter (e.g., passing "2") or by adding a new parameter, to signify that all the housekeeping was already done and only the actual snapshot volume should be deleted. this shouldn't be an issue provided all such calls are guarded by first checking that we are using external snapshots..
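
a rough standalone sketch of what such an extended $running could look like on the plugin side. everything here is a mock: the package name, the simplified argument list, the "external_snapshots" config key and the logged actions are assumptions for illustration, not the real plugin code:

```perl
use strict;
use warnings;

package MockExternalSnapPlugin;

my @actions; # records what a real plugin would do

sub actions { return @actions }

# $running: undef/0 = VM stopped, 1 = VM running,
# 2 = (assumed new value) caller already did the block-layer housekeeping
sub volume_snapshot_delete {
    my ($class, $scfg, $volname, $snap, $running) = @_;

    $running //= 0;

    if ($running == 2) {
        # guard: only valid for storages using external snapshots
        die "storage does not use external snapshots\n"
            if !$scfg->{external_snapshots};
        push @actions, "remove snapshot volume $volname\@$snap";
        return;
    }

    # regular path: the plugin handles merging the data itself
    push @actions, "commit/stream $volname\@$snap into neighbour";
    push @actions, "remove snapshot volume $volname\@$snap";
    return;
}

package main;

my $scfg = { external_snapshots => 1 };
MockExternalSnapPlugin->volume_snapshot_delete($scfg, 'vm-100-disk-0.qcow2', 'snap1', 2);
print "$_\n" for MockExternalSnapPlugin::actions();
```

with $running == 2 only the removal is logged, and a storage without external snapshots dies instead of silently deleting something.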

> Alexandre Derumier via pve-devel <pve-devel at lists.proxmox.com> wrote on 11.03.2025 11:29 CET:
> Signed-off-by: Alexandre Derumier <alexandre.derumier at groupe-cyllene.com>
> ---
>  PVE/QemuConfig.pm       |   4 +-
>  PVE/QemuServer.pm       | 226 +++++++++++++++++++++++++++++++++++++---
>  PVE/QemuServer/Drive.pm |   4 +
>  3 files changed, 220 insertions(+), 14 deletions(-)
> 
> diff --git a/PVE/QemuConfig.pm b/PVE/QemuConfig.pm
> index b60cc398..2b3acb15 100644
> --- a/PVE/QemuConfig.pm
> +++ b/PVE/QemuConfig.pm
> @@ -377,7 +377,7 @@ sub __snapshot_create_vol_snapshot {
>  
>      print "snapshotting '$device' ($drive->{file})\n";
>  
> -    PVE::QemuServer::qemu_volume_snapshot($vmid, $device, $storecfg, $volid, $snapname);
> +    PVE::QemuServer::qemu_volume_snapshot($vmid, $device, $storecfg, $drive, $snapname);
>  }
>  
>  sub __snapshot_delete_remove_drive {
> @@ -414,7 +414,7 @@ sub __snapshot_delete_vol_snapshot {
>      my $storecfg = PVE::Storage::config();
>      my $volid = $drive->{file};
>  
> -    PVE::QemuServer::qemu_volume_snapshot_delete($vmid, $storecfg, $volid, $snapname);
> +    PVE::QemuServer::qemu_volume_snapshot_delete($vmid, $storecfg, $drive, $snapname);
>  
>      push @$unused, $volid;
>  }
> diff --git a/PVE/QemuServer.pm b/PVE/QemuServer.pm
> index 60481acc..6ce3e9c6 100644
> --- a/PVE/QemuServer.pm
> +++ b/PVE/QemuServer.pm
> @@ -4449,20 +4449,200 @@ sub qemu_block_resize {
>  }
>  
>  sub qemu_volume_snapshot {
> -    my ($vmid, $deviceid, $storecfg, $volid, $snap) = @_;
> +    my ($vmid, $deviceid, $storecfg, $drive, $snap) = @_;
>  
> +    my $volid = $drive->{file};
>      my $running = check_running($vmid);
> -
> -    if ($running && do_snapshots_with_qemu($storecfg, $volid, $deviceid)) {
> -	mon_cmd($vmid, 'blockdev-snapshot-internal-sync', device => $deviceid, name => $snap);
> +    my $do_snapshots_with_qemu = do_snapshots_with_qemu($storecfg, $volid, $deviceid) if $running;

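forbidden syntax: "my $x = ... if $cond;" combines a declaration with a statement modifier, which has undefined behaviour in Perl (see perlsyn). the same construct appears again in qemu_volume_snapshot_delete further down.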
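
a minimal standalone sketch of the usual way around the "my ... if ..." construct: declare unconditionally, then assign conditionally. the two helper subs are stand-ins for check_running/do_snapshots_with_qemu and only exist for this example:

```perl
use strict;
use warnings;

# stand-ins for check_running() / do_snapshots_with_qemu(), illustration only
sub fake_check_running { return $_[0] }
sub fake_do_snapshots_with_qemu { return 2 }

sub snapshot_mode {
    my ($is_running) = @_;

    my $running = fake_check_running($is_running);

    # declare unconditionally, assign conditionally
    my $do_snapshots_with_qemu;
    $do_snapshots_with_qemu = fake_do_snapshots_with_qemu() if $running;

    return $do_snapshots_with_qemu;
}

print defined(snapshot_mode(0)) ? "defined\n" : "undef\n";
print snapshot_mode(1), "\n";
```

a ternary ("my $x = $running ? ... : undef;") would work just as well.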

> +    if ($do_snapshots_with_qemu) {
> +	if($do_snapshots_with_qemu == 2) {
> +	    my $snapshots = PVE::Storage::volume_snapshot_info($storecfg, $volid);
> +	    my $parent_snap = $snapshots->{'current'}->{parent};
> +	    my $size = PVE::Storage::volume_size_info($storecfg, $volid, 5);
> +	    blockdev_rename($storecfg, $vmid, $deviceid, $drive, 'current', $snap, $parent_snap);
> +	    blockdev_external_snapshot($storecfg, $vmid, $deviceid, $drive, $snap, $size);
> +	} else {
> +	    mon_cmd($vmid, 'blockdev-snapshot-internal-sync', device => $deviceid, name => $snap);
> +	}
>      } else {
>  	PVE::Storage::volume_snapshot($storecfg, $volid, $snap);
>      }
>  }
>  
> +sub blockdev_external_snapshot {
> +    my ($storecfg, $vmid, $deviceid, $drive, $snap, $size) = @_;
> +
> +    my $volid = $drive->{file};
> +
> +    #be sure to add drive in write mode
> +    delete($drive->{ro});

why?

> +
> +    my $new_file_blockdev = generate_file_blockdev($storecfg, $drive);
> +    my $new_fmt_blockdev = generate_format_blockdev($storecfg, $drive, $new_file_blockdev);
> +
> +    my $snap_file_blockdev = generate_file_blockdev($storecfg, $drive, $snap);
> +    my $snap_fmt_blockdev = generate_format_blockdev($storecfg, $drive, $snap_file_blockdev, $snap);
> +
> +    #preallocate add a new current file with reference to backing-file
> +    my ($storeid, $volname) = PVE::Storage::parse_volume_id($volid);
> +    my $name = (PVE::Storage::parse_volname($storecfg, $volid))[1];
> +    PVE::Storage::vdisk_alloc($storecfg, $storeid, $vmid, 'qcow2', $name, $size/1024, $snap_file_blockdev->{filename});

if we instead extend volume_snapshot similarly to what I describe up top (adding a parameter that renaming was already done), we don't need to extend vdisk_alloc's interface like this.. or maybe we could even combine blockdev_rename and blockdev_external_snapshot, to just call PVE::Storage::volume_snapshot to do rename+alloc, and then do the blockdev dance? in any case, this here would be the *only* external caller of vdisk_alloc with a backing file, so I don't think this is the right interface..

> +
> +    #backing need to be forced to undef in blockdev, to avoid reopen of backing-file on blockdev-add
> +    $new_fmt_blockdev->{backing} = undef;
> +
> +    PVE::QemuServer::Monitor::mon_cmd($vmid, 'blockdev-add', %$new_fmt_blockdev);
> +
> +    mon_cmd($vmid, 'blockdev-snapshot', node => $snap_fmt_blockdev->{'node-name'}, overlay => $new_fmt_blockdev->{'node-name'});
> +}
> +
> +sub blockdev_delete {
> +    my ($storecfg, $vmid, $drive, $file_blockdev, $fmt_blockdev) = @_;
> +
> +    #add eval as reopen is auto removing the old nodename automatically only if it was created at vm start in command line argument
> +    eval { mon_cmd($vmid, 'blockdev-del', 'node-name' => $file_blockdev->{'node-name'}) };
> +    eval { mon_cmd($vmid, 'blockdev-del', 'node-name' => $fmt_blockdev->{'node-name'}) };
> +
> +    #delete the file (don't use vdisk_free as we don't want to delete all snapshot chain)
> +    print"delete old $file_blockdev->{filename}\n";
> +
> +    my $storage_name = PVE::Storage::parse_volume_id($drive->{file});
> +    my $scfg = $storecfg->{ids}->{$storage_name};
> +    if ($scfg->{type} eq 'lvm') {
> +	PVE::Storage::LVMPlugin::lvremove($file_blockdev->{filename});
> +    } else {
> +	unlink($file_blockdev->{filename});
> +    }

this really needs to be handled in the storage layer
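
as an illustration of what "handled in the storage layer" could mean here: the caller asks the storage code to remove the snapshot volume and the plugin decides how, instead of qemu-server checking $scfg->{type}. the package names, the "delete_snapshot_volume" method and the example path below are hypothetical mocks, not existing API:

```perl
use strict;
use warnings;

package MockDirPlugin;

# file-based storage: removing the snapshot volume is an unlink
sub delete_snapshot_volume {
    my ($class, $path) = @_;
    return "unlink $path";
}

package MockLVMPlugin;

# LVM storage: removing the snapshot volume is an lvremove
sub delete_snapshot_volume {
    my ($class, $path) = @_;
    return "lvremove $path";
}

package main;

my $plugins = { dir => 'MockDirPlugin', lvm => 'MockLVMPlugin' };

# single entry point for callers, no type checks outside the storage layer
sub storage_delete_snapshot_volume {
    my ($scfg, $path) = @_;

    my $plugin = $plugins->{ $scfg->{type} }
        or die "unknown storage type '$scfg->{type}'\n";

    return $plugin->delete_snapshot_volume($path);
}

print storage_delete_snapshot_volume({ type => 'lvm' }, '/dev/vg0/vm-100-disk-0'), "\n";
print storage_delete_snapshot_volume({ type => 'dir' }, '/var/lib/vz/images/100/vm-100-disk-0.qcow2'), "\n";
```

external plugins would then only need to implement that one method to support this code path.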

> +}
> +
> +sub blockdev_rename {
> +    my ($storecfg, $vmid, $deviceid, $drive, $src_snap, $target_snap, $parent_snap) = @_;
> +
> +    print "rename $src_snap to $target_snap\n";
> +
> +    my $volid = $drive->{file};
> +
> +    my $src_file_blockdev = generate_file_blockdev($storecfg, $drive, $src_snap);
> +    my $src_fmt_blockdev = generate_format_blockdev($storecfg, $drive, $src_file_blockdev, $src_snap);
> +    my $target_file_blockdev = generate_file_blockdev($storecfg, $drive, $target_snap);
> +    my $target_fmt_blockdev = generate_format_blockdev($storecfg, $drive, $target_file_blockdev, $target_snap);
> +
> +    #create a hardlink
> +    link($src_file_blockdev->{filename}, $target_file_blockdev->{filename});

this really needs to be handled in the storage layer

> +
> +    if($target_snap eq 'current' || $src_snap eq 'current') {
> +	#rename from|to current
> +
> +	#add backing to target
> +	if ($parent_snap) {
> +	    my $parent_fmt_nodename = encode_nodename('fmt', $volid, $parent_snap);
> +	    $target_fmt_blockdev->{backing} = $parent_fmt_nodename;
> +	}
> +	PVE::QemuServer::Monitor::mon_cmd($vmid, 'blockdev-add', %$target_fmt_blockdev);
> +
> +	#reopen the current throttlefilter nodename with the target fmt nodename
> +	my $drive_blockdev = generate_drive_blockdev($storecfg, $vmid, $drive);
> +	delete $drive_blockdev->{file};
> +	$drive_blockdev->{file} = $target_fmt_blockdev->{'node-name'};

these two lines can be a single line
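
i.e., assigning to an existing hash key already overwrites it, so the delete is redundant. a tiny standalone example with made-up values (not the real output of generate_drive_blockdev):

```perl
use strict;
use warnings;

# illustrative data only
my $drive_blockdev = { 'node-name' => 'drive-scsi0', file => 'fmt-old' };
my $target_fmt_blockdev = { 'node-name' => 'fmt-new' };

# the assignment replaces the old value, no delete needed beforehand
$drive_blockdev->{file} = $target_fmt_blockdev->{'node-name'};

print "$drive_blockdev->{file}\n";
```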

> +	PVE::QemuServer::Monitor::mon_cmd($vmid, 'blockdev-reopen', options => [$drive_blockdev]);
> +    } else {
> +	#intermediate snapshot
> +	PVE::QemuServer::Monitor::mon_cmd($vmid, 'blockdev-add', %$target_fmt_blockdev);
> +
> +	#reopen the parent node with the new target fmt backing node
> +	my $parent_file_blockdev = generate_file_blockdev($storecfg, $drive, $parent_snap);
> +	my $parent_fmt_blockdev = generate_format_blockdev($storecfg, $drive, $parent_file_blockdev, $parent_snap);
> +	$parent_fmt_blockdev->{backing} = $target_fmt_blockdev->{'node-name'};
> +	PVE::QemuServer::Monitor::mon_cmd($vmid, 'blockdev-reopen', options => [$parent_fmt_blockdev]);
> +
> +	#change backing-file in qcow2 metadatas
> +	PVE::QemuServer::Monitor::mon_cmd($vmid, 'change-backing-file', device => $deviceid, 'image-node-name' => $parent_fmt_blockdev->{'node-name'}, 'backing-file' => $target_file_blockdev->{filename});
> +    }
> +
> +    # delete old file|fmt nodes
> +    # add eval as reopen is auto removing the old nodename automatically only if it was created at vm start in command line argument

ugh..

> +    eval { PVE::QemuServer::Monitor::mon_cmd($vmid, 'blockdev-del', 'node-name' => $src_file_blockdev->{'node-name'})};
> +    eval { PVE::QemuServer::Monitor::mon_cmd($vmid, 'blockdev-del', 'node-name' => $src_fmt_blockdev->{'node-name'})};
> +
> +    unlink($src_file_blockdev->{filename});

same as above

> +
> +    #rename underlay
> +    my $storage_name = PVE::Storage::parse_volume_id($volid);
> +    my $scfg = $storecfg->{ids}->{$storage_name};
> +    return if $scfg->{type} ne 'lvm';
> +
> +    print "rename underlay lvm volume $src_file_blockdev->{filename} to $target_file_blockdev->{filename}\n";
> +    PVE::Storage::LVMPlugin::lvrename(undef, $src_file_blockdev->{filename}, $target_file_blockdev->{filename});

absolute no-go, this needs to be handled in the storage layer

> +}
> +
> +sub blockdev_commit {
> +    my ($storecfg, $vmid, $deviceid, $drive, $src_snap, $target_snap) = @_;
> +
> +    my $volid = $drive->{file};
> +
> +    print "block-commit $src_snap to base:$target_snap\n";
> +    $src_snap = undef if $src_snap && $src_snap eq 'current';
> +
> +    my $target_file_blockdev = generate_file_blockdev($storecfg, $drive, $target_snap);
> +    my $target_fmt_blockdev = generate_format_blockdev($storecfg, $drive, $target_file_blockdev, $target_snap);
> +
> +    my $src_file_blockdev = generate_file_blockdev($storecfg, $drive, $src_snap);
> +    my $src_fmt_blockdev = generate_format_blockdev($storecfg, $drive, $src_file_blockdev, $src_snap);
> +
> +    my $job_id = "commit-$deviceid";
> +    my $jobs = {};
> +    my $opts = { 'job-id' => $job_id, device => $deviceid };
> +
> +    my $complete = undef;
> +    if ($src_snap) {
> +	$complete = 'auto';
> +	$opts->{'top-node'} = $src_fmt_blockdev->{'node-name'};
> +	$opts->{'base-node'} = $target_fmt_blockdev->{'node-name'};
> +    } else {
> +	$complete = 'complete';
> +	$opts->{'base-node'} = $target_fmt_blockdev->{'node-name'};
> +	$opts->{replaces} = $src_fmt_blockdev->{'node-name'};
> +    }
> +
> +    mon_cmd($vmid, "block-commit", %$opts);
> +    $jobs->{$job_id} = {};
> +    qemu_drive_mirror_monitor ($vmid, undef, $jobs, $complete, 0, 'commit');
> +
> +    blockdev_delete($storecfg, $vmid, $drive, $src_file_blockdev, $src_fmt_blockdev);
> +}
> +
> +sub blockdev_stream {
> +    my ($storecfg, $vmid, $deviceid, $drive, $snap, $parent_snap, $target_snap) = @_;
> +
> +    my $volid = $drive->{file};
> +    $target_snap = undef if $target_snap eq 'current';
> +
> +    my $parent_file_blockdev = generate_file_blockdev($storecfg, $drive, $parent_snap);
> +    my $parent_fmt_blockdev = generate_format_blockdev($storecfg, $drive, $parent_file_blockdev, $parent_snap);
> +
> +    my $target_file_blockdev = generate_file_blockdev($storecfg, $drive, $target_snap);
> +    my $target_fmt_blockdev = generate_format_blockdev($storecfg, $drive, $target_file_blockdev, $target_snap);
> +
> +    my $snap_file_blockdev = generate_file_blockdev($storecfg, $drive, $snap);
> +    my $snap_fmt_blockdev = generate_format_blockdev($storecfg, $drive, $snap_file_blockdev, $snap);
> +
> +    my $job_id = "stream-$deviceid";
> +    my $jobs = {};
> +    my $options = { 'job-id' => $job_id, device => $target_fmt_blockdev->{'node-name'} };
> +    $options->{'base-node'} = $parent_fmt_blockdev->{'node-name'};
> +    $options->{'backing-file'} = $parent_file_blockdev->{filename};
> +
> +    mon_cmd($vmid, 'block-stream', %$options);
> +    $jobs->{$job_id} = {};
> +    qemu_drive_mirror_monitor($vmid, undef, $jobs, 'auto', 0, 'stream');
> +
> +    blockdev_delete($storecfg, $vmid, $drive, $snap_file_blockdev, $snap_fmt_blockdev);
> +}
> +
>  sub qemu_volume_snapshot_delete {
> -    my ($vmid, $storecfg, $volid, $snap) = @_;
> +    my ($vmid, $storecfg, $drive, $snap) = @_;
>  
> +    my $volid = $drive->{file};
>      my $running = check_running($vmid);
>      my $attached_deviceid;
>  
> @@ -4474,13 +4654,35 @@ sub qemu_volume_snapshot_delete {
>  	});
>      }
>  
> -    if ($attached_deviceid && do_snapshots_with_qemu($storecfg, $volid, $attached_deviceid)) {
> -	mon_cmd(
> -	    $vmid,
> -	    'blockdev-snapshot-delete-internal-sync',
> -	    device => $attached_deviceid,
> -	    name => $snap,
> -	);
> +    my $do_snapshots_with_qemu = do_snapshots_with_qemu($storecfg, $volid, $attached_deviceid) if $running;
> +    if ($attached_deviceid && $do_snapshots_with_qemu) {
> +
> +	if ($do_snapshots_with_qemu == 2) {
> +
> +	    my $path = PVE::Storage::path($storecfg, $volid);
> +	    my $snapshots = PVE::Storage::volume_snapshot_info($storecfg, $volid);
> +	    my $parentsnap = $snapshots->{$snap}->{parent};
> +	    my $childsnap = $snapshots->{$snap}->{child};
> +
> +	    # if we delete the first snasphot, we commit because the first snapshot original base image, it should be big.
> +            # improve-me: if firstsnap > child : commit, if firstsnap < child do a stream.
> +	    if(!$parentsnap) {
> +		print"delete first snapshot $snap\n";
> +		blockdev_commit($storecfg, $vmid, $attached_deviceid, $drive, $childsnap, $snap);
> +		blockdev_rename($storecfg, $vmid, $attached_deviceid, $drive, $snap, $childsnap, $snapshots->{$childsnap}->{child});
> +	    } else {
> +		#intermediate snapshot, we always stream the snapshot to child snapshot
> +		print"stream intermediate snapshot $snap to $childsnap\n";
> +		blockdev_stream($storecfg, $vmid, $attached_deviceid, $drive, $snap, $parentsnap, $childsnap);
> +	    }
> +	} else {
> +	    mon_cmd(
> +	        $vmid,
> +		'blockdev-snapshot-delete-internal-sync',
> +		device => $attached_deviceid,
> +		name => $snap,
> +	    );
> +	}
>      } else {
>  	PVE::Storage::volume_snapshot_delete(
>  	    $storecfg, $volid, $snap, $attached_deviceid ? 1 : undef);
> diff --git a/PVE/QemuServer/Drive.pm b/PVE/QemuServer/Drive.pm
> index 51513546..7ba401bd 100644
> --- a/PVE/QemuServer/Drive.pm
> +++ b/PVE/QemuServer/Drive.pm
> @@ -1117,6 +1117,8 @@ sub print_drive_throttle_group {
>  sub generate_file_blockdev {
>      my ($storecfg, $drive, $snap, $nodename) = @_;
>  
> +    $snap = undef if $snap && $snap eq 'current';
> +
>      my $volid = $drive->{file};
>      my $blockdev = {};
>  
> @@ -1260,6 +1262,8 @@ sub do_snapshots_with_qemu {
>  sub generate_format_blockdev {
>      my ($storecfg, $drive, $file, $snap, $nodename) = @_;
>  
> +    $snap = undef if $snap && $snap eq 'current';
> +
>      my $volid = $drive->{file};
>      die "format_blockdev can't be used for nbd" if $volid =~ /^nbd:/;
>  
> -- 
> 2.39.5



