[pve-devel] [PATCH v4 qemu-server 11/11] qcow2: add external snapshot support

DERUMIER, Alexandre alexandre.derumier at groupe-cyllene.com
Fri Apr 4 13:31:57 CEST 2025


Hi Fabian,

>>the first one is the renaming of a blockdev while it is used, which
>>is currently done like this:
>>-- "link" snapshot path to make it available under old and new name
>>-- handle blockdev additions/reopening/backing-file updates/deletions
>>on the qemu layer
>>-- remove old snapshot path link
>>-- if LVM, rename actual volume (for non-LVM, linking followed by
>>unlinking the source is effectively a rename already)

>>I wonder whether that couldn't be made more straight-forward by doing
>>-- rename snapshot volume/image (qemu must already have the old name
>>open anyway and should be able to continue using it)
>>-- do blockdev additions/reopening/backing-file updates/deletions on
>>the qemu layer

>>or is there an issue/check in qemu somewhere that prevents this
>>approach? if not, we could just introduce a "volume_snapshot_rename"
>>or extend rename_volume with a snapshot parameter..

I have been testing this for the last 2 days, and it is indeed working
fine. (I tested with fio running during the snapshot rename/reopen,
without any problem.)

So I'm now using Storage::rename_volume with a snapshot parameter.
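
To illustrate why the rename can safely come first, here is a minimal
sketch in plain Perl (not PVE code; the file names are made up, and the
file handle only stands in for the descriptor QEMU holds on the image).
On a POSIX filesystem an open handle follows the inode, so writes keep
working after the rename:

```perl
use strict;
use warnings;
use IO::Handle;
use File::Temp qw(tempdir);

# Hypothetical image names, only for illustration.
my $dir = tempdir(CLEANUP => 1);
my $old = "$dir/vm-100-disk-0.qcow2";
my $new = "$dir/snap-snap1-vm-100-disk-0.qcow2";

# This handle plays the role of the descriptor QEMU already has open.
open(my $fh, '>', $old) or die "open failed: $!";
$fh->autoflush(1);
print {$fh} "written before rename\n";

# Rename on the storage side first; the open handle is not affected.
rename($old, $new) or die "rename failed: $!";
print {$fh} "written after rename\n";
close($fh) or die "close failed: $!";

# Both writes ended up in the file under its new name.
open(my $in, '<', $new) or die "reopen failed: $!";
my @lines = <$in>;
close($in);
print "lines found under new name: ", scalar(@lines), "\n";
```

This only models the file-based case.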


>>the second thing that happens is deleting a snapshot volume/path,
>>without deleting the whole snapshot.. that one we could easily
>>support by extending volume_snapshot_delete by extending the $running
>>parameter (e.g., passing "2") or adding a new one to signify that all
>>the housekeeping was already done, and just the actual snapshot
>>volume should be deleted. this shouldn't be an issue provided all
>>such calls are guarded by first checking that we are using external
>>snapshots..

I have reused vdisk_free for this one, since there is a comment in
Storage.pm about deprecating the $running parameter:

# FIXME PVE 8.x remove $running parameter (needs APIAGE reset)
sub volume_snapshot_delete {
    my ($cfg, $volid, $snap, $running) = @_;


vdisk_free also takes a cluster_lock_storage lock, so I think it is the
better fit for LVM.

(I have introduced a $snap parameter to vdisk_free, so that only the
specific snapshot is deleted and not the whole chain.)




> Alexandre Derumier via pve-devel <pve-devel at lists.proxmox.com> wrote on
> 11.03.2025 11:29 CET:
> Signed-off-by: Alexandre Derumier <alexandre.derumier at groupe-
> cyllene.com>
> ---
>  PVE/QemuConfig.pm       |   4 +-
>  PVE/QemuServer.pm       | 226 +++++++++++++++++++++++++++++++++++++-
> --
>  PVE/QemuServer/Drive.pm |   4 +
>  3 files changed, 220 insertions(+), 14 deletions(-)
> 
> diff --git a/PVE/QemuConfig.pm b/PVE/QemuConfig.pm
> index b60cc398..2b3acb15 100644
> --- a/PVE/QemuConfig.pm
> +++ b/PVE/QemuConfig.pm
> @@ -377,7 +377,7 @@ sub __snapshot_create_vol_snapshot {
>  
>      print "snapshotting '$device' ($drive->{file})\n";
>  
> -    PVE::QemuServer::qemu_volume_snapshot($vmid, $device, $storecfg,
> $volid, $snapname);
> +    PVE::QemuServer::qemu_volume_snapshot($vmid, $device, $storecfg,
> $drive, $snapname);
>  }
>  
>  sub __snapshot_delete_remove_drive {
> @@ -414,7 +414,7 @@ sub __snapshot_delete_vol_snapshot {
>      my $storecfg = PVE::Storage::config();
>      my $volid = $drive->{file};
>  
> -    PVE::QemuServer::qemu_volume_snapshot_delete($vmid, $storecfg,
> $volid, $snapname);
> +    PVE::QemuServer::qemu_volume_snapshot_delete($vmid, $storecfg,
> $drive, $snapname);
>  
>      push @$unused, $volid;
>  }
> diff --git a/PVE/QemuServer.pm b/PVE/QemuServer.pm
> index 60481acc..6ce3e9c6 100644
> --- a/PVE/QemuServer.pm
> +++ b/PVE/QemuServer.pm
> @@ -4449,20 +4449,200 @@ sub qemu_block_resize {
>  }
>  
>  sub qemu_volume_snapshot {
> -    my ($vmid, $deviceid, $storecfg, $volid, $snap) = @_;
> +    my ($vmid, $deviceid, $storecfg, $drive, $snap) = @_;
>  
> +    my $volid = $drive->{file};
>      my $running = check_running($vmid);
> -
> -    if ($running && do_snapshots_with_qemu($storecfg, $volid,
> $deviceid)) {
> - mon_cmd($vmid, 'blockdev-snapshot-internal-sync', device =>
> $deviceid, name => $snap);
> +    my $do_snapshots_with_qemu = do_snapshots_with_qemu($storecfg,
> $volid, $deviceid) if $running;

forbidden syntax (a `my` declaration combined with a trailing `if`)
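
A minimal sketch of an equivalent form without the conditional
declaration (perlsyn documents the behaviour of `my $x = ... if ...;`
as undefined). snapshot_mode() is a hypothetical stand-in for
do_snapshots_with_qemu(), and its return values are only illustrative:

```perl
use strict;
use warnings;

# Hypothetical stand-in for do_snapshots_with_qemu(); values are illustrative.
sub snapshot_mode {
    my ($external) = @_;
    return $external ? 2 : 1;
}

sub pick_mode {
    my ($running, $external) = @_;

    # The declaration is unconditional, so $mode is always initialised:
    # undef when the VM is not running, the helper's result otherwise.
    my $mode = $running ? snapshot_mode($external) : undef;

    return $mode;
}

print "running, external: ", pick_mode(1, 1), "\n";
print "stopped: ", defined(pick_mode(0, 1)) ? "defined" : "undef", "\n";
```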

> +    if ($do_snapshots_with_qemu) {
> + if($do_snapshots_with_qemu == 2) {
> +     my $snapshots = PVE::Storage::volume_snapshot_info($storecfg,
> $volid);
> +     my $parent_snap = $snapshots->{'current'}->{parent};
> +     my $size = PVE::Storage::volume_size_info($storecfg, $volid,
> 5);
> +     blockdev_rename($storecfg, $vmid, $deviceid, $drive, 'current',
> $snap, $parent_snap);
> +     blockdev_external_snapshot($storecfg, $vmid, $deviceid, $drive,
> $snap, $size);
> + } else {
> +     mon_cmd($vmid, 'blockdev-snapshot-internal-sync', device =>
> $deviceid, name => $snap);
> + }
>      } else {
>   PVE::Storage::volume_snapshot($storecfg, $volid, $snap);
>      }
>  }
>  
> +sub blockdev_external_snapshot {
> +    my ($storecfg, $vmid, $deviceid, $drive, $snap, $size) = @_;
> +
> +    my $volid = $drive->{file};
> +
> +    #be sure to add drive in write mode
> +    delete($drive->{ro});

why?

> +
> +    my $new_file_blockdev = generate_file_blockdev($storecfg,
> $drive);
> +    my $new_fmt_blockdev = generate_format_blockdev($storecfg,
> $drive, $new_file_blockdev);
> +
> +    my $snap_file_blockdev = generate_file_blockdev($storecfg,
> $drive, $snap);
> +    my $snap_fmt_blockdev = generate_format_blockdev($storecfg,
> $drive, $snap_file_blockdev, $snap);
> +
> +    #preallocate add a new current file with reference to backing-
> file
> +    my ($storeid, $volname) = PVE::Storage::parse_volume_id($volid);
> +    my $name = (PVE::Storage::parse_volname($storecfg, $volid))[1];
> +    PVE::Storage::vdisk_alloc($storecfg, $storeid, $vmid, 'qcow2',
> $name, $size/1024, $snap_file_blockdev->{filename});

if we instead extend volume_snapshot similarly to what I describe up
top (adding a parameter that renaming was already done), we don't need
to extend vdisk_alloc's interface like this.. or maybe we could even
combine blockdev_rename and blockdev_external_snapshot, to just call
PVE::Storage::volume_snapshot to do rename+alloc, and then do the
blockdev dance? in any case, this here would be the *only* external
caller of vdisk_alloc with a backing file, so I don't think this is the
right interface..

> +
> +    #backing need to be forced to undef in blockdev, to avoid reopen
> of backing-file on blockdev-add
> +    $new_fmt_blockdev->{backing} = undef;
> +
> +    PVE::QemuServer::Monitor::mon_cmd($vmid, 'blockdev-add',
> %$new_fmt_blockdev);
> +
> +    mon_cmd($vmid, 'blockdev-snapshot', node => $snap_fmt_blockdev-
> >{'node-name'}, overlay => $new_fmt_blockdev->{'node-name'});
> +}
> +
> +sub blockdev_delete {
> +    my ($storecfg, $vmid, $drive, $file_blockdev, $fmt_blockdev) =
> @_;
> +
> +    #add eval as reopen is auto removing the old nodename
> automatically only if it was created at vm start in command line
> argument
> +    eval { mon_cmd($vmid, 'blockdev-del', 'node-name' =>
> $file_blockdev->{'node-name'}) };
> +    eval { mon_cmd($vmid, 'blockdev-del', 'node-name' =>
> $fmt_blockdev->{'node-name'}) };
> +
> +    #delete the file (don't use vdisk_free as we don't want to
> delete all snapshot chain)
> +    print"delete old $file_blockdev->{filename}\n";
> +
> +    my $storage_name = PVE::Storage::parse_volume_id($drive-
> >{file});
> +    my $scfg = $storecfg->{ids}->{$storage_name};
> +    if ($scfg->{type} eq 'lvm') {
> + PVE::Storage::LVMPlugin::lvremove($file_blockdev->{filename});
> +    } else {
> + unlink($file_blockdev->{filename});
> +    }

this really needs to be handled in the storage layer
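
One possible shape for that, as a rough sketch only: the caller asks the
storage layer to free the snapshot volume, and the plugin for that
storage type decides how. All Demo::* packages and
free_snapshot_volume() are hypothetical names, not existing PVE::Storage
API, and the methods only return a description of what they would do:

```perl
use strict;
use warnings;

# Hypothetical plugin for file-based storages.
package Demo::Plugin::Dir;
sub new { my ($class) = @_; return bless {}, $class; }
sub free_snapshot_volume {
    my ($self, $path) = @_;
    return "unlink $path";    # a real plugin would remove the file here
}

# Hypothetical plugin for LVM storages.
package Demo::Plugin::LVM;
sub new { my ($class) = @_; return bless {}, $class; }
sub free_snapshot_volume {
    my ($self, $path) = @_;
    return "lvremove $path";    # a real plugin would remove the LV here
}

# Hypothetical storage-layer entry point: the only place that knows
# which plugin belongs to which storage type.
package Demo::Storage;
my %plugins = (
    dir => Demo::Plugin::Dir->new(),
    lvm => Demo::Plugin::LVM->new(),
);

sub free_snapshot_volume {
    my ($scfg, $path) = @_;
    my $plugin = $plugins{ $scfg->{type} }
        or die "unknown storage type '$scfg->{type}'\n";
    return $plugin->free_snapshot_volume($path);
}

package main;

# The caller no longer checks the storage type itself.
print Demo::Storage::free_snapshot_volume({ type => 'lvm' }, '/dev/vg/snap'), "\n";
```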

> +}
> +
> +sub blockdev_rename {
> +    my ($storecfg, $vmid, $deviceid, $drive, $src_snap,
> $target_snap, $parent_snap) = @_;
> +
> +    print "rename $src_snap to $target_snap\n";
> +
> +    my $volid = $drive->{file};
> +
> +    my $src_file_blockdev = generate_file_blockdev($storecfg,
> $drive, $src_snap);
> +    my $src_fmt_blockdev = generate_format_blockdev($storecfg,
> $drive, $src_file_blockdev, $src_snap);
> +    my $target_file_blockdev = generate_file_blockdev($storecfg,
> $drive, $target_snap);
> +    my $target_fmt_blockdev = generate_format_blockdev($storecfg,
> $drive, $target_file_blockdev, $target_snap);
> +
> +    #create a hardlink
> +    link($src_file_blockdev->{filename}, $target_file_blockdev-
> >{filename});

this really needs to be handled in the storage layer

> +
> +    if($target_snap eq 'current' || $src_snap eq 'current') {
> + #rename from|to current
> +
> + #add backing to target
> + if ($parent_snap) {
> +     my $parent_fmt_nodename = encode_nodename('fmt', $volid,
> $parent_snap);
> +     $target_fmt_blockdev->{backing} = $parent_fmt_nodename;
> + }
> + PVE::QemuServer::Monitor::mon_cmd($vmid, 'blockdev-add',
> %$target_fmt_blockdev);
> +
> + #reopen the current throttlefilter nodename with the target fmt
> nodename
> + my $drive_blockdev = generate_drive_blockdev($storecfg, $vmid,
> $drive);
> + delete $drive_blockdev->{file};
> + $drive_blockdev->{file} = $target_fmt_blockdev->{'node-name'};

these two lines can be a single line

> + PVE::QemuServer::Monitor::mon_cmd($vmid, 'blockdev-reopen', options
> => [$drive_blockdev]);
> +    } else {
> + #intermediate snapshot
> + PVE::QemuServer::Monitor::mon_cmd($vmid, 'blockdev-add',
> %$target_fmt_blockdev);
> +
> + #reopen the parent node with the new target fmt backing node
> + my $parent_file_blockdev = generate_file_blockdev($storecfg,
> $drive, $parent_snap);
> + my $parent_fmt_blockdev = generate_format_blockdev($storecfg,
> $drive, $parent_file_blockdev, $parent_snap);
> + $parent_fmt_blockdev->{backing} = $target_fmt_blockdev->{'node-
> name'};
> + PVE::QemuServer::Monitor::mon_cmd($vmid, 'blockdev-reopen', options
> => [$parent_fmt_blockdev]);
> +
> + #change backing-file in qcow2 metadatas
> + PVE::QemuServer::Monitor::mon_cmd($vmid, 'change-backing-file',
> device => $deviceid, 'image-node-name' => $parent_fmt_blockdev-
> >{'node-name'}, 'backing-file' => $target_file_blockdev->{filename});
> +    }
> +
> +    # delete old file|fmt nodes
> +    # add eval as reopen is auto removing the old nodename
> automatically only if it was created at vm start in command line
> argument

ugh..

> +    eval { PVE::QemuServer::Monitor::mon_cmd($vmid, 'blockdev-del',
> 'node-name' => $src_file_blockdev->{'node-name'})};
> +    eval { PVE::QemuServer::Monitor::mon_cmd($vmid, 'blockdev-del',
> 'node-name' => $src_fmt_blockdev->{'node-name'})};
> +
> +    unlink($src_file_blockdev->{filename});

same as above

> +
> +    #rename underlay
> +    my $storage_name = PVE::Storage::parse_volume_id($volid);
> +    my $scfg = $storecfg->{ids}->{$storage_name};
> +    return if $scfg->{type} ne 'lvm';
> +
> +    print "rename underlay lvm volume $src_file_blockdev->{filename}
> to $target_file_blockdev->{filename}\n";
> +    PVE::Storage::LVMPlugin::lvrename(undef, $src_file_blockdev-
> >{filename}, $target_file_blockdev->{filename});

absolute no-go, this needs to be handled in the storage layer

> +}
> +
> +sub blockdev_commit {
> +    my ($storecfg, $vmid, $deviceid, $drive, $src_snap,
> $target_snap) = @_;
> +
> +    my $volid = $drive->{file};
> +
> +    print "block-commit $src_snap to base:$target_snap\n";
> +    $src_snap = undef if $src_snap && $src_snap eq 'current';
> +
> +    my $target_file_blockdev = generate_file_blockdev($storecfg,
> $drive, $target_snap);
> +    my $target_fmt_blockdev = generate_format_blockdev($storecfg,
> $drive, $target_file_blockdev, $target_snap);
> +
> +    my $src_file_blockdev = generate_file_blockdev($storecfg,
> $drive, $src_snap);
> +    my $src_fmt_blockdev = generate_format_blockdev($storecfg,
> $drive, $src_file_blockdev, $src_snap);
> +
> +    my $job_id = "commit-$deviceid";
> +    my $jobs = {};
> +    my $opts = { 'job-id' => $job_id, device => $deviceid };
> +
> +    my $complete = undef;
> +    if ($src_snap) {
> + $complete = 'auto';
> + $opts->{'top-node'} = $src_fmt_blockdev->{'node-name'};
> + $opts->{'base-node'} = $target_fmt_blockdev->{'node-name'};
> +    } else {
> + $complete = 'complete';
> + $opts->{'base-node'} = $target_fmt_blockdev->{'node-name'};
> + $opts->{replaces} = $src_fmt_blockdev->{'node-name'};
> +    }
> +
> +    mon_cmd($vmid, "block-commit", %$opts);
> +    $jobs->{$job_id} = {};
> +    qemu_drive_mirror_monitor ($vmid, undef, $jobs, $complete, 0,
> 'commit');
> +
> +    blockdev_delete($storecfg, $vmid, $drive, $src_file_blockdev,
> $src_fmt_blockdev);
> +}
> +
> +sub blockdev_stream {
> +    my ($storecfg, $vmid, $deviceid, $drive, $snap, $parent_snap,
> $target_snap) = @_;
> +
> +    my $volid = $drive->{file};
> +    $target_snap = undef if $target_snap eq 'current';
> +
> +    my $parent_file_blockdev = generate_file_blockdev($storecfg,
> $drive, $parent_snap);
> +    my $parent_fmt_blockdev = generate_format_blockdev($storecfg,
> $drive, $parent_file_blockdev, $parent_snap);
> +
> +    my $target_file_blockdev = generate_file_blockdev($storecfg,
> $drive, $target_snap);
> +    my $target_fmt_blockdev = generate_format_blockdev($storecfg,
> $drive, $target_file_blockdev, $target_snap);
> +
> +    my $snap_file_blockdev = generate_file_blockdev($storecfg,
> $drive, $snap);
> +    my $snap_fmt_blockdev = generate_format_blockdev($storecfg,
> $drive, $snap_file_blockdev, $snap);
> +
> +    my $job_id = "stream-$deviceid";
> +    my $jobs = {};
> +    my $options = { 'job-id' => $job_id, device =>
> $target_fmt_blockdev->{'node-name'} };
> +    $options->{'base-node'} = $parent_fmt_blockdev->{'node-name'};
> +    $options->{'backing-file'} = $parent_file_blockdev->{filename};
> +
> +    mon_cmd($vmid, 'block-stream', %$options);
> +    $jobs->{$job_id} = {};
> +    qemu_drive_mirror_monitor($vmid, undef, $jobs, 'auto', 0,
> 'stream');
> +
> +    blockdev_delete($storecfg, $vmid, $drive, $snap_file_blockdev,
> $snap_fmt_blockdev);
> +}
> +
>  sub qemu_volume_snapshot_delete {
> -    my ($vmid, $storecfg, $volid, $snap) = @_;
> +    my ($vmid, $storecfg, $drive, $snap) = @_;
>  
> +    my $volid = $drive->{file};
>      my $running = check_running($vmid);
>      my $attached_deviceid;
>  
> @@ -4474,13 +4654,35 @@ sub qemu_volume_snapshot_delete {
>   });
>      }
>  
> -    if ($attached_deviceid && do_snapshots_with_qemu($storecfg,
> $volid, $attached_deviceid)) {
> - mon_cmd(
> -     $vmid,
> -     'blockdev-snapshot-delete-internal-sync',
> -     device => $attached_deviceid,
> -     name => $snap,
> - );
> +    my $do_snapshots_with_qemu = do_snapshots_with_qemu($storecfg,
> $volid, $attached_deviceid) if $running;
> +    if ($attached_deviceid && $do_snapshots_with_qemu) {
> +
> + if ($do_snapshots_with_qemu == 2) {
> +
> +     my $path = PVE::Storage::path($storecfg, $volid);
> +     my $snapshots = PVE::Storage::volume_snapshot_info($storecfg,
> $volid);
> +     my $parentsnap = $snapshots->{$snap}->{parent};
> +     my $childsnap = $snapshots->{$snap}->{child};
> +
> +     # if we delete the first snasphot, we commit because the first
> snapshot original base image, it should be big.
> +            # improve-me: if firstsnap > child : commit, if
> firstsnap < child do a stream.
> +     if(!$parentsnap) {
> + print"delete first snapshot $snap\n";
> + blockdev_commit($storecfg, $vmid, $attached_deviceid, $drive,
> $childsnap, $snap);
> + blockdev_rename($storecfg, $vmid, $attached_deviceid, $drive,
> $snap, $childsnap, $snapshots->{$childsnap}->{child});
> +     } else {
> + #intermediate snapshot, we always stream the snapshot to child
> snapshot
> + print"stream intermediate snapshot $snap to $childsnap\n";
> + blockdev_stream($storecfg, $vmid, $attached_deviceid, $drive,
> $snap, $parentsnap, $childsnap);
> +     }
> + } else {
> +     mon_cmd(
> +         $vmid,
> + 'blockdev-snapshot-delete-internal-sync',
> + device => $attached_deviceid,
> + name => $snap,
> +     );
> + }
>      } else {
>   PVE::Storage::volume_snapshot_delete(
>       $storecfg, $volid, $snap, $attached_deviceid ? 1 : undef);
> diff --git a/PVE/QemuServer/Drive.pm b/PVE/QemuServer/Drive.pm
> index 51513546..7ba401bd 100644
> --- a/PVE/QemuServer/Drive.pm
> +++ b/PVE/QemuServer/Drive.pm
> @@ -1117,6 +1117,8 @@ sub print_drive_throttle_group {
>  sub generate_file_blockdev {
>      my ($storecfg, $drive, $snap, $nodename) = @_;
>  
> +    $snap = undef if $snap && $snap eq 'current';
> +
>      my $volid = $drive->{file};
>      my $blockdev = {};
>  
> @@ -1260,6 +1262,8 @@ sub do_snapshots_with_qemu {
>  sub generate_format_blockdev {
>      my ($storecfg, $drive, $file, $snap, $nodename) = @_;
>  
> +    $snap = undef if $snap && $snap eq 'current';
> +
>      my $volid = $drive->{file};
>      die "format_blockdev can't be used for nbd" if $volid =~
> /^nbd:/;
>  
> -- 
> 2.39.5



