[pve-devel] [PATCH v3 qemu-server 11/11] qcow2: add external snapshot support
Fabian Grünbichler
f.gruenbichler at proxmox.com
Thu Jan 9 12:57:30 CET 2025
> Alexandre Derumier via pve-devel <pve-devel at lists.proxmox.com> hat am 16.12.2024 10:12 CET geschrieben:
it would be great if there'd be a summary of the design choices and a high level summary of what happens to the files and block-node-graph here. it's a bit hard to judge from the code below whether it would be possible to eliminate the dynamically named block nodes, for example ;)
a few more comments documenting the behaviour and ideally also some tests (mocking the QMP interactions?) would be nice
> Signed-off-by: Alexandre Derumier <alexandre.derumier at groupe-cyllene.com>
> ---
> PVE/QemuConfig.pm | 4 +-
> PVE/QemuServer.pm | 345 ++++++++++++++++++++++++++++++++++++++++++++--
> 2 files changed, 335 insertions(+), 14 deletions(-)
>
> diff --git a/PVE/QemuConfig.pm b/PVE/QemuConfig.pm
> index ffdf9f03..c17edb46 100644
> --- a/PVE/QemuConfig.pm
> +++ b/PVE/QemuConfig.pm
> @@ -375,7 +375,7 @@ sub __snapshot_create_vol_snapshot {
>
> print "snapshotting '$device' ($drive->{file})\n";
>
> - PVE::QemuServer::qemu_volume_snapshot($vmid, $device, $storecfg, $volid, $snapname);
> + PVE::QemuServer::qemu_volume_snapshot($vmid, $device, $storecfg, $drive, $snapname);
> }
>
> sub __snapshot_delete_remove_drive {
> @@ -412,7 +412,7 @@ sub __snapshot_delete_vol_snapshot {
> my $storecfg = PVE::Storage::config();
> my $volid = $drive->{file};
>
> - PVE::QemuServer::qemu_volume_snapshot_delete($vmid, $storecfg, $volid, $snapname);
> + PVE::QemuServer::qemu_volume_snapshot_delete($vmid, $storecfg, $drive, $snapname);
>
> push @$unused, $volid;
> }
> diff --git a/PVE/QemuServer.pm b/PVE/QemuServer.pm
> index 3a3feadf..f29a8449 100644
> --- a/PVE/QemuServer.pm
> +++ b/PVE/QemuServer.pm
> @@ -4959,20 +4959,269 @@ sub qemu_block_resize {
> }
>
> sub qemu_volume_snapshot {
> - my ($vmid, $deviceid, $storecfg, $volid, $snap) = @_;
> + my ($vmid, $deviceid, $storecfg, $drive, $snap) = @_;
>
> + my $volid = $drive->{file};
> my $running = check_running($vmid);
> -
> - if ($running && do_snapshots_with_qemu($storecfg, $volid, $deviceid)) {
> - mon_cmd($vmid, 'blockdev-snapshot-internal-sync', device => $deviceid, name => $snap);
> + my $do_snapshots_with_qemu = do_snapshots_with_qemu($storecfg, $volid, $deviceid) if $running;
> + if ($do_snapshots_with_qemu) {
> + if($do_snapshots_with_qemu == 2) {
this could do without the additional nesting:
if ($do_snapshots_with_qemu == 1) {
...
} elsif ($do_snapshots_with_qemu == 2) {
...
} else {
...
}
> + my $snap_path = PVE::Storage::path($storecfg, $volid, $snap);
> + my $path = PVE::Storage::path($storecfg, $volid);
> + blockdev_current_rename($storecfg, $vmid, $deviceid, $drive, $path, $snap_path, 1);
> + blockdev_external_snapshot($storecfg, $vmid, $deviceid, $drive, $snap);
what about error handling?
> + } else {
> + mon_cmd($vmid, 'blockdev-snapshot-internal-sync', device => $deviceid, name => $snap);
> + }
> } else {
> PVE::Storage::volume_snapshot($storecfg, $volid, $snap);
this invocation here (continued below)
> }
> }
>
> +sub blockdev_external_snapshot {
> + my ($storecfg, $vmid, $deviceid, $drive, $snap) = @_;
> +
> + my $nodes = get_blockdev_nodes($vmid);
> + my $volid = $drive->{file};
> + my $path = PVE::Storage::path($storecfg, $volid, $snap);
> + my $format_node = find_blockdev_node($nodes, $path, 'fmt');
> + my $format_nodename = $format_node->{'node-name'};
> +
> + #preallocate add a new current file
> + my $new_current_fmt_nodename = get_blockdev_nextid("fmt-$deviceid", $nodes);
> + my $new_current_file_nodename = get_blockdev_nextid("file-$deviceid", $nodes);
okay, so here we have a dynamic node name because the desired target name is still occupied. could we rename the old block node first?
> + PVE::Storage::volume_snapshot($storecfg, $volid, $snap);
(continued from above) and this invocation here are the same?? wouldn't this already create the snapshot on the storage layer? and didn't we just hardlink + reopen + unlink to transform the previous current volume into the snap volume?
should this maybe have been vdisk_alloc and it just works by accident?
> + my $new_file_blockdev = generate_file_blockdev($storecfg, $drive, $new_current_file_nodename);
> + my $new_fmt_blockdev = generate_format_blockdev($storecfg, $drive, $new_current_fmt_nodename, $new_file_blockdev);
> +
> + $new_fmt_blockdev->{backing} = undef;
generate_format_blockdev doesn't set backing? maybe this should be converted into an assertion?
> + PVE::QemuServer::Monitor::mon_cmd($vmid, 'blockdev-add', %$new_fmt_blockdev);
> + mon_cmd($vmid, 'blockdev-snapshot', node => $format_nodename, overlay => $new_current_fmt_nodename);
> +}
> +
> +sub blockdev_snap_rename {
> + my ($storecfg, $vmid, $deviceid, $drive, $src_path, $target_path) = @_;
I think this whole thing needs more error handling and thought about how to recover from various points failing.. there's also quite some overlap with blockdev_current_rename, I wonder whether it would be possible to simplify the code further by merging the two? but see below, I think we can even get away with dropping this altogether if we switch from block-commit to block-stream..
> +
> + my $nodes = get_blockdev_nodes($vmid);
> + my $volid = $drive->{file};
> +
> + #copy the original drive param and change target file
> + my $target_fmt_nodename = get_blockdev_nextid("fmt-$deviceid", $nodes);
> + my $target_file_nodename = get_blockdev_nextid("file-$deviceid", $nodes);
> +
> + my $src_fmt_node = find_blockdev_node($nodes, $src_path, 'fmt');
> + my $src_fmt_nodename = $src_fmt_node->{'node-name'};
> + my $src_file_node = find_blockdev_node($nodes, $src_path, 'file');
> + my $src_file_nodename = $src_file_node->{'node-name'};
> +
> + #untaint
> + if ($src_path =~ m/^(\S+)$/) {
> + $src_path = $1;
> + }
> + if ($target_path =~ m/^(\S+)$/) {
> + $target_path = $1;
> + }
shouldn't that have happened in the storage plugin?
> +
> + #create a hardlink
> + link($src_path, $target_path);
should this maybe be done by the storage plugin?
> +
> + #add new format blockdev
> + my $read_only = 1;
> + my $target_file_blockdev = generate_file_blockdev($storecfg, $drive, $target_file_nodename);
> + $target_file_blockdev->{filename} = $target_path;
> + my $target_fmt_blockdev = generate_format_blockdev($storecfg, $drive, $target_fmt_nodename, $target_file_blockdev, $read_only);
> +
> + PVE::QemuServer::Monitor::mon_cmd($vmid, 'blockdev-add', %$target_fmt_blockdev);
> +
> + #reopen the parent node with different backing file
> + my $parent_fmt_node = find_parent_node($nodes, $src_path);
> + my $parent_fmt_nodename = $parent_fmt_node->{'node-name'};
> + my $parent_path = $parent_fmt_node->{file};
> + my $parent_file_node = find_blockdev_node($nodes, $parent_path, 'file');
> + my $parent_file_nodename = $parent_file_node->{'node-name'};
> + my $filenode_exist = 1;
> + $read_only = $parent_fmt_node->{ro};
> + my $parent_fmt_blockdev = generate_format_blockdev($storecfg, $drive, $parent_fmt_nodename, $parent_file_nodename, $read_only);
> + $parent_fmt_blockdev->{backing} = $target_fmt_nodename;
> + PVE::QemuServer::Monitor::mon_cmd($vmid, 'blockdev-reopen', options => [$parent_fmt_blockdev]);
> +
> + #change backing-file in qcow2 metadatas
> + PVE::QemuServer::Monitor::mon_cmd($vmid, 'change-backing-file', device => $deviceid, 'image-node-name' => $parent_fmt_nodename, 'backing-file' => $target_path);
> +
> + # fileblockdev seem to be autoremoved, if it have been created online, but not if they are created at start with command line
> + eval { PVE::QemuServer::Monitor::mon_cmd($vmid, 'blockdev-del', 'node-name' => $src_file_nodename) };
> + eval { PVE::QemuServer::Monitor::mon_cmd($vmid, 'blockdev-del', 'node-name' => $src_fmt_nodename) };
> +
> + #delete old $path link
> + unlink($src_path);
and this
> +
> + #rename underlay
> + my $storage_name = PVE::Storage::parse_volume_id($volid);
> + my $scfg = $storecfg->{ids}->{$storage_name};
> + if ($scfg->{type} eq 'lvm') {
> + print"lvrename $src_path to $target_path\n";
> + run_command(
> + ['/sbin/lvrename', $src_path, $target_path],
> + errmsg => "lvrename $src_path to $target_path error",
> + );
> + }
and this as well?
> +}
> +
> +sub blockdev_current_rename {
> + my ($storecfg, $vmid, $deviceid, $drive, $path, $target_path, $skip_underlay) = @_;
> + ## rename current running image
> +
> + my $nodes = get_blockdev_nodes($vmid);
> + my $volid = $drive->{file};
> + my $target_file_nodename = get_blockdev_nextid("file-$deviceid", $nodes);
here we could already incorporate the snapshot name, since we know it?
> +
> + my $file_blockdev = generate_file_blockdev($storecfg, $drive, $target_file_nodename);
> + $file_blockdev->{filename} = $target_path;
> +
> + my $format_node = find_blockdev_node($nodes, $path, 'fmt');
then we'd know this is always the "current" node, however we deterministically name it?
> + my $format_nodename = $format_node->{'node-name'};
> +
> + my $file_node = find_blockdev_node($nodes, $path, 'file');
same here
> + my $file_nodename = $file_node->{'node-name'};
> +
> + my $backingfile = $format_node->{image}->{'backing-filename'};
> + my $backing_node = $backingfile ? find_blockdev_node($nodes, $backingfile, 'fmt') : undef;
> +
> + #create a hardlink
> + link($path, $target_path);
this
> + #add new file blockdev
> + PVE::QemuServer::Monitor::mon_cmd($vmid, 'blockdev-add', %$file_blockdev);
> +
> + #reopen the current fmt nodename with a new file nodename
> + my $reopen_blockdev = generate_format_blockdev($storecfg, $drive, $format_nodename, $target_file_nodename);
> + $reopen_blockdev->{backing} = $backing_node->{'node-name'};
> + PVE::QemuServer::Monitor::mon_cmd($vmid, 'blockdev-reopen', options => [$reopen_blockdev]);
> +
> + # delete old file blockdev
> + # seem that the old file block is autoremoved after reopen if the file nodename is autogenerad with #block ?
> + eval { PVE::QemuServer::Monitor::mon_cmd($vmid, 'blockdev-del', 'node-name' => $file_nodename) };
> +
> + unlink($path);
> +
and this should be done by the storage layer I think? how does this interact with LVM? would we maybe want to mknod instead of hardlinking the device node? did you try whether a plain rename would also work (not sure - qemu already has an open FD to the file/blockdev, but I am not sure how LVM handles this ;))?
> + #skip_underlay: lvm will be renamed later in Storage::volume_snaphot
> + return if $skip_underlay;
> +
> + #rename underlay
> + my $storage_name = PVE::Storage::parse_volume_id($volid);
> + my $scfg = $storecfg->{ids}->{$storage_name};
> + if ($scfg->{type} eq 'lvm') {
> + print"lvrename $path to $target_path\n";
> + run_command(
> + ['/sbin/lvrename', $path, $target_path],
> + errmsg => "lvrename $path to $target_path error",
> + );
> + }
> +}
> +
> +sub blockdev_commit {
see comments below for qemu_volume_snapshot_delete, I think this..
> + my ($storecfg, $vmid, $deviceid, $drive, $top_path, $base_path) = @_;
> +
> + my $nodes = get_blockdev_nodes($vmid);
> + my $volid = $drive->{file};
> +
> + #untaint
> + if ($top_path =~ m/^(\S+)$/) {
> + $top_path = $1;
> + }
> +
> + print "block-commit top:$top_path to base:$base_path\n";
> + my $job_id = "commit-$deviceid";
> + my $jobs = {};
> +
> + my $base_node = find_blockdev_node($nodes, $base_path, 'fmt');
> + my $top_node = find_blockdev_node($nodes, $top_path, 'fmt');
> +
> + my $options = { 'job-id' => $job_id, device => $deviceid };
> + $options->{'top-node'} = $top_node->{'node-name'};
> + $options->{'base-node'} = $base_node->{'node-name'};
> +
> +
> + mon_cmd($vmid, 'block-commit', %$options);
> + $jobs->{$job_id} = {};
> +
> + qemu_drive_mirror_monitor($vmid, undef, $jobs, 'auto', 0, 'commit');
> +
> + #remove fmt-blockdev, file-blockdev && file
> + my $fmt_node = find_blockdev_node($nodes, $top_path, 'fmt');
> + my $fmt_nodename = $fmt_node->{'node-name'};
> + eval { PVE::QemuServer::Monitor::mon_cmd($vmid, 'blockdev-del', 'node-name' => $fmt_nodename) };
> +
> + my $file_node = find_blockdev_node($nodes, $top_path, 'file');
> + my $file_nodename = $file_node->{'node-name'};
> + eval { PVE::QemuServer::Monitor::mon_cmd($vmid, 'blockdev-del', 'node-name' => $file_nodename) };
> +
> +
> +
> + my $storage_name = PVE::Storage::parse_volume_id($volid);
> + my $scfg = $storecfg->{ids}->{$storage_name};
> + if ($scfg->{type} eq 'lvm') {
> + print"lvremove $top_path\n";
> + run_command(
> + ['/sbin/lvremove', '-f', $top_path],
> + errmsg => "lvremove $top_path",
> + );
> + } else {
> + unlink($top_path);
> + }
> +
> +}
> +
> +sub blockdev_live_commit {
and this can be replaced altogether with blockdev_stream..
> + my ($storecfg, $vmid, $deviceid, $drive, $current_path, $snapshot_path) = @_;
> +
> + my $nodes = get_blockdev_nodes($vmid);
> + my $volid = $drive->{file};
> +
> + #untaint
> + if ($current_path =~ m/^(\S+)$/) {
> + $current_path = $1;
> + }
> +
> + print "live block-commit top:$current_path to base:$snapshot_path\n";
> + my $job_id = "commit-$deviceid";
> + my $jobs = {};
> +
> + my $snapshot_node = find_blockdev_node($nodes, $snapshot_path, 'fmt');
> + my $snapshot_file_node = find_blockdev_node($nodes, $current_path, 'file');
> + my $current_node = find_blockdev_node($nodes, $current_path, 'fmt');
> +
> + my $opts = { 'job-id' => $job_id,
> + device => $deviceid,
> + 'base-node' => $snapshot_node->{'node-name'},
> + replaces => $current_node->{'node-name'}
> + };
> + mon_cmd($vmid, "block-commit", %$opts);
> + $jobs->{$job_id} = {};
> +
> + qemu_drive_mirror_monitor ($vmid, undef, $jobs, 'complete', 0, 'commit');
> +
> + eval { mon_cmd($vmid, 'blockdev-del', 'node-name' => $current_node->{'node-name'}) };
> +
> + my $storage_name = PVE::Storage::parse_volume_id($volid);
> + my $scfg = $storecfg->{ids}->{$storage_name};
> + if ($scfg->{type} eq 'lvm') {
> + print"lvremove $current_path\n";
> + run_command(
> + ['/sbin/lvremove', '-f', $current_path],
> + errmsg => "lvremove $current_path",
> + );
> + } else {
> + unlink($current_path);
> + }
> +
> + return;
> +
> +}
> +
> sub qemu_volume_snapshot_delete {
> - my ($vmid, $storecfg, $volid, $snap) = @_;
> + my ($vmid, $storecfg, $drive, $snap) = @_;
>
> + my $volid = $drive->{file};
> my $running = check_running($vmid);
> my $attached_deviceid;
>
> @@ -4984,13 +5233,51 @@ sub qemu_volume_snapshot_delete {
> });
> }
>
> - if ($attached_deviceid && do_snapshots_with_qemu($storecfg, $volid, $attached_deviceid)) {
> - mon_cmd(
> - $vmid,
> - 'blockdev-snapshot-delete-internal-sync',
> - device => $attached_deviceid,
> - name => $snap,
> - );
> + my $do_snapshots_with_qemu = do_snapshots_with_qemu($storecfg, $volid, $attached_deviceid) if $running;
my + post-if is forbidden, but otherwise, the check for $attached_deviceid could move into the $running condition above.
> + if ($attached_deviceid && $do_snapshots_with_qemu) {
> +
> + if ($do_snapshots_with_qemu == 2) {
these ifs could be collapsed as well..
> +
> + my $path = PVE::Storage::path($storecfg, $volid);
> + my $snapshots = PVE::Storage::volume_snapshot_info($storecfg, $volid);
> +
> + my $snappath = $snapshots->{$snap}->{file};
> + return if !-e $snappath; #already deleted ?
> +
> + my $parentsnap = $snapshots->{$snap}->{parent};
> + my $childsnap = $snapshots->{$snap}->{child};
> +
> + my $parentpath = $snapshots->{$parentsnap}->{file} if $parentsnap;
> + my $childpath = $snapshots->{$childsnap}->{file} if $childsnap;
> +
> + #if first snapshot
> + if(!$parentsnap) {
> + print"delete first snapshot $childpath\n";
> + if($childpath eq $path) {
> + #if child is the current (last snapshot), we need to a live active-commit
wouldn't it make more sense to use block-stream to merge the contents of the to-be-deleted snapshot into the current overlay? that way we wouldn't need to rename anything, AFAICT..
see https://www.qemu.org/docs/master/interop/live-block-operations.html#brief-overview-of-live-block-qmp-primitives
> + print"commit first snapshot $snappath to current $path\n";
> + blockdev_live_commit($storecfg, $vmid, $attached_deviceid, $drive, $childpath, $snappath);
> + print" rename $snappath to $path\n";
> + blockdev_current_rename($storecfg, $vmid, $attached_deviceid, $drive, $snappath, $path);
> + } else {
> + print"commit first snapshot $snappath to $childpath path\n";
> + blockdev_commit($storecfg, $vmid, $attached_deviceid, $drive, $childpath, $snappath);
> + print" rename $snappath to $childpath\n";
> + blockdev_snap_rename($storecfg, $vmid, $attached_deviceid, $drive, $snappath, $childpath);
same here, instead of commiting from the child into the to-be-deleted snapshot, and then renaming, why not just block-stream from the to-be-deleted snapshot into the child, and then discard the snapshot that is no longer needed?
> + }
> + } else {
> + #intermediate snapshot, we just need to commit the snapshot
> + print"commit intermediate snapshot $snappath to $parentpath\n";
> + blockdev_commit($storecfg, $vmid, $attached_deviceid, $drive, $snappath, $parentpath, 'auto');
commit is the wrong direction though?
if we have A -> B -> C, and B is deleted, the delta previously contained in B should be merged into C, not into A?
so IMHO a simple block-stream + removal of the to-be-deleted snapshot should be the right choice here as well?
that would effectively make all the paths identical AFAICT (stream from to-be-deleted snapshot to child, followed by deletion of the no longer used volume corresponding to the deleted/streamed snapshot) and no longer require any renaming..
> + }
> + } else {
> + mon_cmd(
> + $vmid,
> + 'blockdev-snapshot-delete-internal-sync',
> + device => $attached_deviceid,
> + name => $snap,
> + );
> + }
> } else {
> PVE::Storage::volume_snapshot_delete(
> $storecfg, $volid, $snap, $attached_deviceid ? 1 : undef);
> @@ -8066,6 +8353,8 @@ sub do_snapshots_with_qemu {
> return 1;
> }
>
> + return 2 if $scfg->{snapext} || $scfg->{type} eq 'lvm' && $volid =~ m/\.(qcow2)/;
> +
> if ($volid =~ m/\.(qcow2|qed)$/){
> return 1;
> }
> @@ -9169,6 +9458,38 @@ sub delete_ifaces_ipams_ips {
> }
> }
>
> +sub find_blockdev_node {
like I mentioned in another patch comment, this is already used by earlier patches. but if at all possible, it would be good to avoid the need for this in the first place..
> + my ($nodes, $path, $type) = @_;
> +
> + my $found_nodeid = undef;
> + my $found_node = undef;
> + for my $nodeid (keys %$nodes) {
> + my $node = $nodes->{$nodeid};
> + if ($nodeid =~ m/^$type-(\S+)$/ && $node->{file} eq $path ) {
because $path encoding might change over time/versions..
> + $found_node = $node;
> + last;
> + }
> + }
> + die "can't found nodeid for file $path\n" if !$found_node;
> + return $found_node;
> +}
> +
> +sub find_parent_node {
> + my ($nodes, $backing_path) = @_;
> +
> + my $found_nodeid = undef;
> + my $found_node = undef;
> + for my $nodeid (keys %$nodes) {
> + my $node = $nodes->{$nodeid};
> + if ($nodeid =~ m/^fmt-(\S+)$/ && $node->{backing_file} && $node->{backing_file} eq $backing_path) {
same applies here, but if we switch to block-stream, the only call site for this goes away anyway..
> + $found_node = $node;
> + last;
> + }
> + }
> + die "can't found nodeid for file $backing_path\n" if !$found_node;
> + return $found_node;
> +}
> +
> sub find_fmt_nodename_drive {
> my ($storecfg, $vmid, $drive, $nodes) = @_;
>
> --
> 2.39.5
More information about the pve-devel
mailing list