[pve-devel] [PATCH storage v3 03/10] plugin: dir: handle ova files for import
Fabian Grünbichler
f.gruenbichler at proxmox.com
Thu May 23 14:25:35 CEST 2024
On May 23, 2024 12:40 pm, Dominik Csapak wrote:
> On 5/22/24 12:08, Fabian Grünbichler wrote:
>> On April 29, 2024 1:21 pm, Dominik Csapak wrote:
>>> since we want to handle ova files (which are only ovf+images bundled in
>>> a tar file) for import, add code that handles that.
>>>
>>> we introduce a valid volname for files contained in ovas like this:
>>>
>>> storage:import/archive.ova/disk-1.vmdk
>>>
>>> by basically treating the last part of the path as the name for the
>>> contained disk we want.
>>>
>>> in that case we return 'import' as type with 'vmdk/qcow2/raw' as format
>>> (we cannot use something like 'ova+vmdk' without extending the 'format'
>>> parsing to that for all storages/formats. This is because it runs
>>> though a verify format check at least once)
>>>
>>> we then provide 3 functions to use for that:
>>>
>>> * copy_needs_extraction: determines from the given volid (like above) if
>>> that needs extraction to copy it, currently only 'import' vtype + a
>>> volid with the above format returns true
>>>
>>> * extract_disk_from_import_file: this actually extracts the file from
>>> the archive. Currently only ova is supported, so the extraction with
>>> 'tar' is hardcoded, but again we can easily extend/modify that should
>>> we need to.
>>>
>>> we currently extract into the either the import storage or a given
>>> target storage in the images directory so if the cleanup does not
>>> happen, the user can still see and interact with the image via
>>> api/cli/gui
>>>
>>> * cleanup_extracted_image: intended to cleanup the extracted images from
>>> above
>>>
>>> we have to modify the `parse_ovf` a bit to handle the missing disk
>>> images, and we parse the size out of the ovf part (since this is
>>> informal only, it should be no problem if we cannot parse it sometimes)
>>>
>>> Signed-off-by: Dominik Csapak <d.csapak at proxmox.com>
>>> ---
>>> src/PVE/API2/Storage/Status.pm | 1 +
>>> src/PVE/GuestImport.pm | 100 +++++++++++++++++++++++++++++++++
>>> src/PVE/GuestImport/OVF.pm | 53 ++++++++++++++---
>>> src/PVE/Makefile | 1 +
>>> src/PVE/Storage.pm | 2 +-
>>> src/PVE/Storage/DirPlugin.pm | 15 ++++-
>>> src/PVE/Storage/Plugin.pm | 5 ++
>>> src/test/parse_volname_test.pm | 15 +++++
>>> 8 files changed, 182 insertions(+), 10 deletions(-)
>>> create mode 100644 src/PVE/GuestImport.pm
>>>
>>> diff --git a/src/PVE/API2/Storage/Status.pm b/src/PVE/API2/Storage/Status.pm
>>> index dc6cc69..acde730 100644
>>> --- a/src/PVE/API2/Storage/Status.pm
>>> +++ b/src/PVE/API2/Storage/Status.pm
>>> @@ -749,6 +749,7 @@ __PACKAGE__->register_method({
>>> 'efi-state-lost',
>>> 'guest-is-running',
>>> 'nvme-unsupported',
>>> + 'ova-needs-extracting',
>>> 'ovmf-with-lsi-unsupported',
>>> 'serial-port-socket-only',
>>> ],
>>> diff --git a/src/PVE/GuestImport.pm b/src/PVE/GuestImport.pm
>>> new file mode 100644
>>> index 0000000..d405e30
>>> --- /dev/null
>>> +++ b/src/PVE/GuestImport.pm
>>> @@ -0,0 +1,100 @@
>>> +package PVE::GuestImport;
>>> +
>>> +use strict;
>>> +use warnings;
>>> +
>>> +use File::Path;
>>> +
>>> +use PVE::Storage;
>>
>> another circular module dependency..
>>
true, sorry for the noise! :)
> why do you think so? nothing in storage is using PVE::GuestImport only PVE::GuestImport::OVF ?
>
>>> +use PVE::Tools qw(run_command);
>>> +
>>> +sub copy_needs_extraction {
>>> + my ($volid) = @_;
>>> + my $cfg = PVE::Storage::config();
>>> + my ($vtype, $name, undef, undef, undef, undef, $fmt) = PVE::Storage::parse_volname($cfg, $volid);
>>> +
>>> + # only volumes inside ovas need extraction
>>> + return $vtype eq 'import' && $fmt =~ m/^ova\+(.*)$/;
>>> +}
>>
>> this could just as well live in qemu-server, there's only two call sites
>> in one module there.. one of them even already has the parsed volname ;)
>
> true
>
>>
>>> +
>>> +sub extract_disk_from_import_file {
>>
>> I don't really like that this is using lots of plugin stuff..
>>
>>> + my ($volid, $vmid, $target_storeid) = @_;
>>> +
>>> + my ($source_storeid, $volname) = PVE::Storage::parse_volume_id($volid);
>>> + $target_storeid //= $source_storeid;
>>> + my $cfg = PVE::Storage::config();
>>> + my $source_scfg = PVE::Storage::storage_config($cfg, $source_storeid);
>>> + my $source_plugin = PVE::Storage::Plugin->lookup($source_scfg->{type});
>>> +
>>> + my ($vtype, $name, undef, undef, undef, undef, $fmt) =
>>> + $source_plugin->parse_volname($volname);
>>
>> could be PVE::Storage::parse_volname
>>
>>> +
>>> + die "only files with content type 'import' can be extracted\n"
>>> + if $vtype ne 'import' || $fmt !~ m/^ova\+/;
>>> +
>>> + # extract the inner file from the name
>>> + my $archive;
>>> + my $inner_file;
>>> + if ($name =~ m!^(.*\.ova)/(${PVE::Storage::SAFE_CHAR_CLASS_RE}+)$!) {
>>> + $archive = "import/$1";
>>> + $inner_file = $2;
>>> + ($fmt) = $fmt =~ /^ova\+(.*)$/;
>>> + } else {
>>> + die "cannot extract $volid - invalid volname $volname\n";
>>> + }
>>> +
>>> + my ($ova_path) = $source_plugin->path($source_scfg, $archive, $source_storeid);
>>
>> could be PVE::Storage::path
>>
>>> +
>>> + my $target_scfg = PVE::Storage::storage_config($cfg, $target_storeid);
>>> + my $target_plugin = PVE::Storage::Plugin->lookup($target_scfg->{type});
>>> +
>>> + my $destdir = $target_plugin->get_subdir($target_scfg, 'images');
>>
>> could be PVE::Storage::get_image_dir
>>
>>> +
>>> + my $pid = $$;
>>> + $destdir .= "/tmp_${pid}_${vmid}";
>>> + mkpath $destdir;
>>> +
>>> + ($ova_path) = $ova_path =~ m|^(.*)$|; # untaint
>>> +
>>> + my $source_path = "$destdir/$inner_file";
>>> + my $target_path;
>>> + my $target_volname;
>>> + eval {
>>> + run_command(['tar', '-x', '--force-local', '-C', $destdir, '-f', $ova_path, $inner_file]);
>>> +
>>> + # check for symlinks and other non regular files
>>> + if (-l $source_path || ! -f $source_path) {
>>> + die "only regular files are allowed\n";
>>> + }
>>> +
>>> + my $target_diskname
>>> + = $target_plugin->find_free_diskname($target_storeid, $target_scfg, $vmid, $fmt, 1);
>>
>> these here requires holding a lock until the diskname is actually used
>> (the rename below), else it's racey..
>
> we do have a lock over vm creation in the only path this is called and it is vmid specific...
> so is this really a problem?
yes, every disk allocation needs to hold the storage lock to avoid two
actions in parallel thinking they own a "new" disk name that hasn't yet
been allocated properly.
it's possible to allocate new volumes without going over a guest
specific API after all, and there are automation use cases doing just
that (pre-allocating the volumes, then creating/updating the VM).
>>> + $target_volname = "$vmid/" . $target_diskname;
>>
>> this encodes a fact about volname semantics that might not be a given
>> for external, dir-based plugins (not sure if we want to worry about that
>> though, or how to avoid it ;)).
>
> i mean we could call 'alloc' with a very small size instead
> and simply "overwrite" it? then we'd also get around things like
> mkpath and imagedir etc.
that might actually be nice(r) than the current approach since it avoids
the volname format issue entirely. the only downside is that we then
briefly have a "wrong" disk visible, but since the VM has to be locked
at that point there shouldn't be too much harm in that?
>>> + $target_path = $target_plugin->filesystem_path($target_scfg, $target_volname);
>>
>> this should be equivalent to PVE::Storage::path for DirPlugin based
>> storages?
>>
>>> +
>>> + print "renaming $source_path to $target_path\n";
>>> + my $imagedir = $target_plugin->get_subdir($target_scfg, 'images');
>>
>> we already did this above, but see comment there ;)
>
> true ;)
>
>>
>>> + mkpath "$imagedir/$vmid";
>>> +
>>> + rename($source_path, $target_path) or die "unable to move - $!\n";
>>> + };
>>> + if (my $err = $@) {
>>> + unlink $source_path;
>>> + unlink $target_path if defined($target_path);
>>
>> isn't this pretty much impossible to happen? the last thing we do in the
>> eval block is the rename - if that failed, $target_path can't exist yet.
>> if it didn't fail, we can't end up here?
>
> that probably depends on the underlying filesystem no? not
> every fs has posix rename semantics i guess?
I think we can assume an intra-FS rename to either work and have an
effect, or not work and not have an effect on anything we want to
support as dir storage? :)
> in that case we'd cleanup the file, and if it does not exists, it doesn't hurt
> but
sure, but error handling tends to get more complicated over time, so not
having nops in there reduces the complexity somewhat IMHO.
>>> + rmdir $destdir;
>>
>> this and unlink $source_path could just be a remove_tree on $destdir
>> instead, with less chances of leaving stuff around?
>
> this is true of course and removes the issue above
>
>>
>>> + die "error during extraction: $err\n";
>>> + }
>>> +
>>> + rmdir $destdir;
>>
>> could also be a remove_tree, just to be on the safe side?
>
> yup
>
>>
>>> +
>>> + return "$target_storeid:$target_volname";
>>> +}
>>> +
>>> +sub cleanup_extracted_image {
>>> + my ($source) = @_;
>>> +
>>> + my $cfg = PVE::Storage::config();
>>> + PVE::Storage::vdisk_free($cfg, $source);
>>> +}
>>
>> why do we need this helper, and not just call vdisk_free directly in
>> qemu-server (we do that in tons of places there as part of error
>> handling for freshly allocated volumes)?
>
> ok makes sense
>
>>
>>> +
>>> +1;
>>> diff --git a/src/PVE/GuestImport/OVF.pm b/src/PVE/GuestImport/OVF.pm
>>> index 0eb5e9c..6b79078 100644
>>> --- a/src/PVE/GuestImport/OVF.pm
>>> +++ b/src/PVE/GuestImport/OVF.pm
>>> @@ -85,11 +85,37 @@ sub id_to_pve {
>>> }
>>> }
>>>
>>> +# technically defined in DSP0004 (https://www.dmtf.org/dsp/DSP0004) as an ABNF
>>> +# but realistically this always takes the form of 'byte * base^exponent'
>>> +sub try_parse_capacity_unit {
>>> + my ($unit_text) = @_;
>>> +
>>> + if ($unit_text =~ m/^\s*byte\s*\*\s*([0-9]+)\s*\^\s*([0-9]+)\s*$/) {
>>> + my $base = $1;
>>> + my $exp = $2;
>>> + return $base ** $exp;
>>> + }
>>> +
>>> + return undef;
>>> +}
>>> +
>>> # returns two references, $qm which holds qm.conf style key/values, and \@disks
>>> sub parse_ovf {
>>> - my ($ovf, $debug) = @_;
>>> + my ($ovf, $isOva, $debug) = @_;
>>> +
>>> + # we have to ignore missing disk images for ova
>>> + my $dom;
>>> + if ($isOva) {
>>> + my $raw = "";
>>> + PVE::Tools::run_command(['tar', '-xO', '--wildcards', '--occurrence=1', '-f', $ovf, '*.ovf'], outfunc => sub {
>>> + my $line = shift;
>>> + $raw .= $line;
>>> + });
>>> + $dom = XML::LibXML->load_xml(string => $raw, no_blanks => 1);
>>> + } else {
>>> + $dom = XML::LibXML->load_xml(location => $ovf, no_blanks => 1);
>>> + }
>>>
>>> - my $dom = XML::LibXML->load_xml(location => $ovf, no_blanks => 1);
>>>
>>> # register the xml namespaces in a xpath context object
>>> # 'ovf' is the default namespace so it will prepended to each xml element
>>> @@ -177,7 +203,17 @@ sub parse_ovf {
>>> # @ needs to be escaped to prevent Perl double quote interpolation
>>> my $xpath_find_fileref = sprintf("/ovf:Envelope/ovf:DiskSection/\
>>> ovf:Disk[\@ovf:diskId='%s']/\@ovf:fileRef", $disk_id);
>>> + my $xpath_find_capacity = sprintf("/ovf:Envelope/ovf:DiskSection/\
>>> +ovf:Disk[\@ovf:diskId='%s']/\@ovf:capacity", $disk_id);
>>> + my $xpath_find_capacity_unit = sprintf("/ovf:Envelope/ovf:DiskSection/\
>>> +ovf:Disk[\@ovf:diskId='%s']/\@ovf:capacityAllocationUnits", $disk_id);
>>> my $fileref = $xpc->findvalue($xpath_find_fileref);
>>> + my $capacity = $xpc->findvalue($xpath_find_capacity);
>>> + my $capacity_unit = $xpc->findvalue($xpath_find_capacity_unit);
>>> + my $virtual_size;
>>> + if (my $factor = try_parse_capacity_unit($capacity_unit)) {
>>> + $virtual_size = $capacity * $factor;
>>> + }
>>>
>>> my $valid_url_chars = qr@${valid_uripath_chars}|/@;
>>> if (!$fileref || $fileref !~ m/^${valid_url_chars}+$/) {
>>> @@ -217,7 +253,7 @@ ovf:Item[rasd:InstanceID='%s']/rasd:ResourceType", $controller_id);
>>> die "error parsing $filepath, are you using a symlink ?\n";
>>> }
>>>
>>> - if (!-e $backing_file_path) {
>>> + if (!-e $backing_file_path && !$isOva) {
>>> die "error parsing $filepath, file seems not to exist at $backing_file_path\n";
>>> }
>>>
>>> @@ -225,16 +261,19 @@ ovf:Item[rasd:InstanceID='%s']/rasd:ResourceType", $controller_id);
>>> ($filepath) = $filepath =~ m|^(${PVE::Storage::SAFE_CHAR_CLASS_RE}+)$|; # untaint & check no sub/parent dirs
>>> die "invalid path\n" if !$filepath;
>>>
>>> - my $virtual_size = PVE::Storage::file_size_info($backing_file_path);
>>> - die "error parsing $backing_file_path, cannot determine file size\n"
>>> - if !$virtual_size;
>>> + if (!$isOva) {
>>> + my $size = PVE::Storage::file_size_info($backing_file_path);
>>> + die "error parsing $backing_file_path, cannot determine file size\n"
>>> + if !$size;
>>>
>>> + $virtual_size = $size;
>>> + }
>>> $pve_disk = {
>>> disk_address => $pve_disk_address,
>>> backing_file => $backing_file_path,
>>> - virtual_size => $virtual_size
>>> relative_path => $filepath,
>>> };
>>> + $pve_disk->{virtual_size} = $virtual_size if defined($virtual_size);
>>> push @disks, $pve_disk;
>>>
>>> }
>>> diff --git a/src/PVE/Makefile b/src/PVE/Makefile
>>> index e15a275..0af3081 100644
>>> --- a/src/PVE/Makefile
>>> +++ b/src/PVE/Makefile
>>> @@ -5,6 +5,7 @@ install:
>>> install -D -m 0644 Storage.pm ${DESTDIR}${PERLDIR}/PVE/Storage.pm
>>> install -D -m 0644 Diskmanage.pm ${DESTDIR}${PERLDIR}/PVE/Diskmanage.pm
>>> install -D -m 0644 CephConfig.pm ${DESTDIR}${PERLDIR}/PVE/CephConfig.pm
>>> + install -D -m 0644 GuestImport.pm ${DESTDIR}${PERLDIR}/PVE/GuestImport.pm
>>> make -C Storage install
>>> make -C GuestImport install
>>> make -C API2 install
>>> diff --git a/src/PVE/Storage.pm b/src/PVE/Storage.pm
>>> index 1ed91c2..adc1b45 100755
>>> --- a/src/PVE/Storage.pm
>>> +++ b/src/PVE/Storage.pm
>>> @@ -114,7 +114,7 @@ our $VZTMPL_EXT_RE_1 = qr/\.tar\.(gz|xz|zst)/i;
>>>
>>> our $BACKUP_EXT_RE_2 = qr/\.(tgz|(?:tar|vma)(?:\.(${\PVE::Storage::Plugin::COMPRESSOR_RE}))?)/;
>>>
>>> -our $IMPORT_EXT_RE_1 = qr/\.(ovf|qcow2|raw|vmdk)/;
>>> +our $IMPORT_EXT_RE_1 = qr/\.(ova|ovf|qcow2|raw|vmdk)/;
>>>
>>> our $SAFE_CHAR_CLASS_RE = qr/[a-zA-Z0-9\-\.\+\=\_]/;
>>>
>>> diff --git a/src/PVE/Storage/DirPlugin.pm b/src/PVE/Storage/DirPlugin.pm
>>> index 3e3b1e7..ea89464 100644
>>> --- a/src/PVE/Storage/DirPlugin.pm
>>> +++ b/src/PVE/Storage/DirPlugin.pm
>>> @@ -258,15 +258,26 @@ sub get_import_metadata {
>>> # NOTE: all types of warnings must be added to the return schema of the import-metadata API endpoint
>>> my $warnings = [];
>>>
>>> + my $isOva = 0;
>>> + if ($name =~ m/\.ova$/) {
>>> + $isOva = 1;
>>> + push @$warnings, { type => 'ova-needs-extracting' };
>>> + }
>>> my $path = $class->path($scfg, $volname, $storeid, undef);
>>> - my $res = PVE::GuestImport::OVF::parse_ovf($path);
>>> + my $res = PVE::GuestImport::OVF::parse_ovf($path, $isOva);
>>> my $disks = {};
>>> for my $disk ($res->{disks}->@*) {
>>> my $id = $disk->{disk_address};
>>> my $size = $disk->{virtual_size};
>>> my $path = $disk->{relative_path};
>>> + my $volid;
>>> + if ($isOva) {
>>> + $volid = "$storeid:$volname/$path";
>>> + } else {
>>> + $volid = "$storeid:import/$path",
>>> + }
>>> $disks->{$id} = {
>>> - volid => "$storeid:import/$path",
>>> + volid => $volid,
>>> defined($size) ? (size => $size) : (),
>>> };
>>> }
>>> diff --git a/src/PVE/Storage/Plugin.pm b/src/PVE/Storage/Plugin.pm
>>> index 33f0f3a..640d156 100644
>>> --- a/src/PVE/Storage/Plugin.pm
>>> +++ b/src/PVE/Storage/Plugin.pm
>>> @@ -654,6 +654,11 @@ sub parse_volname {
>>> return ('backup', $fn);
>>> } elsif ($volname =~ m!^snippets/([^/]+)$!) {
>>> return ('snippets', $1);
>>> + } elsif ($volname =~ m!^import/(${PVE::Storage::SAFE_CHAR_CLASS_RE}+\.ova\/(${PVE::Storage::SAFE_CHAR_CLASS_RE}+))$!) {
>>> + my $archive = $1;
>>> + my $file = $2;
>>> + my (undef, $format, undef) = parse_name_dir($file);
>>> + return ('import', $archive, undef, undef, undef, undef, "ova+$format");
>>
>> these could be improved if the format was already checked in the elsif
>> condition I think, since the error message of parse_name_dir is a bit
>> opaque/lacking context.. also, parse_name_dir allows subvol, which we
>> don't want to allow here I think?
>
> ok yeah that makes sense
>
>>
>>> } elsif ($volname =~ m!^import/(${PVE::Storage::SAFE_CHAR_CLASS_RE}+$PVE::Storage::IMPORT_EXT_RE_1)$!) {
>>> return ('import', $1, undef, undef, undef, undef, $2);
>>> }
>>> diff --git a/src/test/parse_volname_test.pm b/src/test/parse_volname_test.pm
>>> index a8c746f..bc7b4e8 100644
>>> --- a/src/test/parse_volname_test.pm
>>> +++ b/src/test/parse_volname_test.pm
>>> @@ -93,6 +93,21 @@ my $tests = [
>>> volname => 'import/import.ovf',
>>> expected => ['import', 'import.ovf', undef, undef, undef ,undef, 'ovf'],
>>> },
>>> + {
>>> + description => "Import, innner file of ova",
>>> + volname => 'import/import.ova/disk.qcow2',
>>> + expected => ['import', 'import.ova/disk.qcow2', undef, undef, undef, undef, 'ova+qcow2'],
>>> + },
>>> + {
>>> + description => "Import, innner file of ova",
>>> + volname => 'import/import.ova/disk.vmdk',
>>> + expected => ['import', 'import.ova/disk.vmdk', undef, undef, undef, undef, 'ova+vmdk'],
>>> + },
>>> + {
>>> + description => "Import, innner file of ova",
>>> + volname => 'import/import.ova/disk.raw',
>>> + expected => ['import', 'import.ova/disk.raw', undef, undef, undef, undef, 'ova+raw'],
>>> + },
>>> #
>>> # failed matches
>>> #
>>> --
>>> 2.39.2
>>>
>>>
>>>
>>> _______________________________________________
>>> pve-devel mailing list
>>> pve-devel at lists.proxmox.com
>>> https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
>>>
>>>
>>>
>>
>>
>> _______________________________________________
>> pve-devel mailing list
>> pve-devel at lists.proxmox.com
>> https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
>>
>>
>
>
>
More information about the pve-devel
mailing list