[pve-devel] [PATCH storage 3/9] plugin: dir: handle ova files for import

Fabian Grünbichler f.gruenbichler at proxmox.com
Wed Apr 17 14:45:26 CEST 2024


On April 16, 2024 3:18 pm, Dominik Csapak wrote:
> since we want to handle ova files (which are only ovf+vmdks bundled in a
> tar file) for import, add code that handles that.
> 
> we introduce a valid volname for files contained in ovas like this:
> 
>  storage:import/archive.ova/disk-1.vmdk
> 
> by basically treating the last part of the path as the name for the
> contained disk we want.
> 
> we then provide 3 functions to use for that:
> 
> * copy_needs_extraction: determines from the given volid (like above) if
>   that needs extraction to copy it, currently only 'import' vtype +
>   defined format returns true here (if we have more options in the
>   future, we can of course easily extend that)
> 
> * extract_disk_from_import_file: this actually extracts the file from
>   the archive. Currently only ova is supported, so the extraction with
>   'tar' is hardcoded, but again we can easily extend/modify that should
>   we need to.
> 
>   we currently extract into the import storage in a directory named:
>   `.tmp_<pid>_<targetvmid>` which should not clash with concurrent
>   operations (though we do extract it multiple times then)
> 
>   alternatively we could implement either a 'tmpstorage' parameter,
>   or use e.g. '/var/tmp/' or similar, but re-using the current storage
>   seemed ok.
> 
> * cleanup_extracted_image: intended to cleanup the extracted images from
>   above, including the surrounding temporary directory

the helpers could also all live in qemu-server for now, which would also
make extending it to use a different storage, or direct importing via a
pipe easier? see below ;)

> 
> we have to modify the `parse_ovf` a bit to handle the missing disk
> images, and we parse the size out of the ovf part (since this is
> informal only, it should be no problem if we cannot parse it sometimes)
> 
> Signed-off-by: Dominik Csapak <d.csapak at proxmox.com>
> ---
>  src/PVE/API2/Storage/Status.pm |  1 +
>  src/PVE/Storage.pm             | 59 ++++++++++++++++++++++++++++++++++
>  src/PVE/Storage/DirPlugin.pm   | 13 +++++++-
>  src/PVE/Storage/OVF.pm         | 53 ++++++++++++++++++++++++++----
>  src/PVE/Storage/Plugin.pm      |  5 +++
>  5 files changed, 123 insertions(+), 8 deletions(-)
> 
> diff --git a/src/PVE/API2/Storage/Status.pm b/src/PVE/API2/Storage/Status.pm
> index f7e324f..77ed57c 100644
> --- a/src/PVE/API2/Storage/Status.pm
> +++ b/src/PVE/API2/Storage/Status.pm
> @@ -749,6 +749,7 @@ __PACKAGE__->register_method({
>  				'efi-state-lost',
>  				'guest-is-running',
>  				'nvme-unsupported',
> +				'ova-needs-extracting',
>  				'ovmf-with-lsi-unsupported',
>  				'serial-port-socket-only',
>  			    ],
> diff --git a/src/PVE/Storage.pm b/src/PVE/Storage.pm
> index f8ea93d..bc073ef 100755
> --- a/src/PVE/Storage.pm
> +++ b/src/PVE/Storage.pm
> @@ -2189,4 +2189,63 @@ sub get_import_metadata {
>      return $plugin->get_import_metadata($scfg, $volname, $storeid);
>  }
>  
> +sub copy_needs_extraction {
> +    my ($volid) = @_;
> +    my ($storeid, $volname) = parse_volume_id($volid);
> +    my $cfg = config();
> +    my $scfg = storage_config($cfg, $storeid);
> +    my $plugin = PVE::Storage::Plugin->lookup($scfg->{type});
> +
> +    my ($vtype, $name, $vmid, $basename, $basevmid, $isBase, $file_format) =
> +	$plugin->parse_volname($volname);
> +
> +    return $vtype eq 'import' && defined($file_format);
> +}

not sure this one is needed? it could also just be a call to
PVE::Storage::parse_volname in qemu-server?

> +
> +sub extract_disk_from_import_file {

similarly, this is basically PVE::Storage::get_import_dir + the
run_command call, and could live in qemu-server?

> +    my ($volid, $vmid) = @_;
> +
> +    my ($storeid, $volname) = parse_volume_id($volid);
> +    my $cfg = config();
> +    my $scfg = storage_config($cfg, $storeid);
> +    my $plugin = PVE::Storage::Plugin->lookup($scfg->{type});
> +
> +    my ($vtype, $name, undef, undef, undef, undef, $file_format) =
> +	$plugin->parse_volname($volname);
> +
> +    die "only files with content type 'import' can be extracted\n"
> +	if $vtype ne 'import' || !defined($file_format);
> +
> +    # extract the inner file from the name
> +    if ($volid =~ m!${name}/([^/]+)$!) {
> +	$name = $1;

we should probably be very conservative here and only allow [-_a-z0-9]
as a start - or something similar rather restrictive..

> +    }
> +
> +    my ($source_file) = $plugin->path($scfg, $volname, $storeid);
> +
> +    my $destdir = $plugin->get_subdir($scfg, 'import');
> +    my $pid = $$;
> +    $destdir .= "/.tmp_${pid}_${vmid}";
> +    mkdir $destdir;
> +
> +    ($source_file) = $source_file =~ m|^(/.*)|; # untaint

again a rather interesting untaint ;)

> +
> +    run_command(['tar', '-x', '-C', $destdir, '-f', $source_file, $name]);

if $name was a symlink in the archive, you've now created a symlink
pointing wherever..

> +
> +    return "$destdir/$name";

and this returns an absolute path to it, and now we are in trouble land
;) we should check that the file is a real file here..

> +}
> +
> +sub cleanup_extracted_image {

same for this?

> +    my ($source) = @_;
> +
> +    if ($source =~ m|^(/.+/\.tmp_[0-9]+_[0-9]+)/[^/]+$|) {
> +	my $tmpdir = $1;
> +
> +	unlink $source or $! == ENOENT or die "removing image $source failed: $!\n";
> +	rmdir $tmpdir or $! == ENOENT or die "removing tmpdir $tmpdir failed: $!\n";
> +    } else {
> +	die "invalid extraced image path '$source'\n";

nit: typo

these are also not discoverable if the error handling in qemu-server
failed for some reason.. might be a source of unwanted space
consumption..

> +    }
> +}
> +
>  1;
> diff --git a/src/PVE/Storage/DirPlugin.pm b/src/PVE/Storage/DirPlugin.pm
> index 4dc7708..50ceab7 100644
> --- a/src/PVE/Storage/DirPlugin.pm
> +++ b/src/PVE/Storage/DirPlugin.pm
> @@ -260,14 +260,25 @@ sub get_import_metadata {
>      # NOTE: all types must be added to the return schema of the import-metadata API endpoint
>      my $warnings = [];
>  
> +    my $isOva = 0;
> +    if ($path =~ m!\.ova!) {
> +	$isOva = 1;
> +	push @$warnings, { type => 'ova-needs-extracting' };
> +    }
>      my $res = PVE::Storage::OVF::parse_ovf($path, $isOva);
>      my $disks = {};
>      for my $disk ($res->{disks}->@*) {
>  	my $id = $disk->{disk_address};
>  	my $size = $disk->{virtual_size};
>  	my $path = $disk->{relative_path};
> +	my $volid;
> +	if ($isOva) {
> +	    $volid = "$storeid:$volname/$path";
> +	} else {
> +	    $volid = "$storeid:import/$path",
> +	}
>  	$disks->{$id} = {
> -	    volid => "$storeid:import/$path",
> +	    volid => $volid,
>  	    defined($size) ? (size => $size) : (),
>  	};
>      }
> diff --git a/src/PVE/Storage/OVF.pm b/src/PVE/Storage/OVF.pm
> index 4a322b9..fb850a8 100644
> --- a/src/PVE/Storage/OVF.pm
> +++ b/src/PVE/Storage/OVF.pm
> @@ -85,11 +85,37 @@ sub id_to_pve {
>      }
>  }
>  
> +# technically defined in DSP0004 (https://www.dmtf.org/dsp/DSP0004) as an ABNF
> +# but realistically this always takes the form of 'bytes * base^exponent'
> +sub try_parse_capacity_unit {
> +    my ($unit_text) = @_;
> +
> +    if ($unit_text =~ m/^\s*byte\s*\*\s*([0-9]+)\s*\^\s*([0-9]+)\s*$/) {
> +	my $base = $1;
> +	my $exp = $2;
> +	return $base ** $exp;
> +    }
> +
> +    return undef;
> +}
> +
>  # returns two references, $qm which holds qm.conf style key/values, and \@disks
>  sub parse_ovf {
> -    my ($ovf, $debug) = @_;
> +    my ($ovf, $isOva, $debug) = @_;
> +
> +    # we have to ignore missing disk images for ova
> +    my $dom;
> +    if ($isOva) {
> +	my $raw = "";
> +	PVE::Tools::run_command(['tar', '-xO', '--wildcards', '--occurrence=1', '-f', $ovf, '*.ovf'], outfunc => sub {
> +	    my $line = shift;
> +	    $raw .= $line;
> +	});
> +	$dom = XML::LibXML->load_xml(string => $raw, no_blanks => 1);
> +    } else {
> +	$dom = XML::LibXML->load_xml(location => $ovf, no_blanks => 1);
> +    }
>  
> -    my $dom = XML::LibXML->load_xml(location => $ovf, no_blanks => 1);
>  
>      # register the xml namespaces in a xpath context object
>      # 'ovf' is the default namespace so it will prepended to each xml element
> @@ -177,7 +203,17 @@ sub parse_ovf {
>  	# @ needs to be escaped to prevent Perl double quote interpolation
>  	my $xpath_find_fileref = sprintf("/ovf:Envelope/ovf:DiskSection/\
>  ovf:Disk[\@ovf:diskId='%s']/\@ovf:fileRef", $disk_id);
> +	my $xpath_find_capacity = sprintf("/ovf:Envelope/ovf:DiskSection/\
> +ovf:Disk[\@ovf:diskId='%s']/\@ovf:capacity", $disk_id);
> +	my $xpath_find_capacity_unit = sprintf("/ovf:Envelope/ovf:DiskSection/\
> +ovf:Disk[\@ovf:diskId='%s']/\@ovf:capacityAllocationUnits", $disk_id);
>  	my $fileref = $xpc->findvalue($xpath_find_fileref);
> +	my $capacity = $xpc->findvalue($xpath_find_capacity);
> +	my $capacity_unit = $xpc->findvalue($xpath_find_capacity_unit);
> +	my $virtual_size;
> +	if (my $factor = try_parse_capacity_unit($capacity_unit)) {
> +	    $virtual_size = $capacity * $factor;
> +	}
>  
>  	my $valid_url_chars = qr@${valid_uripath_chars}|/@;
>  	if (!$fileref || $fileref !~ m/^${valid_url_chars}+$/) {
> @@ -217,23 +253,26 @@ ovf:Item[rasd:InstanceID='%s']/rasd:ResourceType", $controller_id);
>  	    die "error parsing $filepath, are you using a symlink ?\n";
>  	}
>  
> -	if (!-e $backing_file_path) {
> +	if (!-e $backing_file_path && !$isOva) {

this is actually not enough, the realpath call above can already fail if
$filepath points to a file in a subdir (note that realpath will only
check the path components, not the file itself).

e.g.:

error parsing foo/bar/chr-6.49.13-disk1.vmdk, are you using a symlink ? (500)

we could also tighten what we allow as filepath here, in addition to the
extraction code.

>  	    die "error parsing $filepath, file seems not to exist at $backing_file_path\n";
>  	}
>  
>  	($backing_file_path) = $backing_file_path =~ m|^(/.*)|; # untaint
>  	($filepath) = $filepath =~ m|^(.*)|; # untaint
>  
> -	my $virtual_size = PVE::Storage::file_size_info($backing_file_path);
> -	die "error parsing $backing_file_path, cannot determine file size\n"
> -	    if !$virtual_size;
> +	if (!$isOva) {
> +	    my $size = PVE::Storage::file_size_info($backing_file_path);
> +	    die "error parsing $backing_file_path, cannot determine file size\n"
> +		if !$size;
>  
> +	    $virtual_size = $size;
> +	}
>  	$pve_disk = {
>  	    disk_address => $pve_disk_address,
>  	    backing_file => $backing_file_path,
> -	    virtual_size => $virtual_size
>  	    relative_path => $filepath,
>  	};
> +	$pve_disk->{virtual_size} = $virtual_size if defined($virtual_size);
>  	push @disks, $pve_disk;
>  
>      }
> diff --git a/src/PVE/Storage/Plugin.pm b/src/PVE/Storage/Plugin.pm
> index deaf8b2..ea069ab 100644
> --- a/src/PVE/Storage/Plugin.pm
> +++ b/src/PVE/Storage/Plugin.pm
> @@ -654,6 +654,11 @@ sub parse_volname {
>  	return ('backup', $fn);
>      } elsif ($volname =~ m!^snippets/([^/]+)$!) {
>  	return ('snippets', $1);
> +    } elsif ($volname =~ m!^import/([^/]+\.ova)\/([^/]+)$!) {
> +	my $archive = $1;
> +	my $file = $2;
> +	my (undef, $format, undef) = parse_name_dir($file);
> +	return ('import', $archive, 0, undef, undef, undef, $format);
>      } elsif ($volname =~ m!^import/([^/]+$PVE::Storage::IMPORT_EXT_RE_1)$!) {
>  	return ('import', $1);
>      } elsif ($volname =~ m!^import/([^/]+\.(raw|vmdk|qcow2))$!) {
> -- 
> 2.39.2
> 
> 
> 
> _______________________________________________
> pve-devel mailing list
> pve-devel at lists.proxmox.com
> https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
> 
> 
> 




More information about the pve-devel mailing list