[pve-devel] [PATCH pve-storage 1/2] fix #4849: download-url: allow download and decompression of compressed ISOs

Wed Jul 26 14:31:45 CEST 2023

On July 25, 2023 4:37 pm, Philipp Hufnagl wrote:
> Signed-off-by: Philipp Hufnagl <p.hufnagl at proxmox.com>
> ---
>  src/PVE/API2/Storage/Status.pm | 20 ++++++++++++++++++--
>  src/PVE/Storage.pm             | 22 ++++++++++++++++++++++
>  2 files changed, 40 insertions(+), 2 deletions(-)
> 
> diff --git a/src/PVE/API2/Storage/Status.pm b/src/PVE/API2/Storage/Status.pm
> index e4ce698..9ac4660 100644
> --- a/src/PVE/API2/Storage/Status.pm
> +++ b/src/PVE/API2/Storage/Status.pm
> @@ -578,6 +578,11 @@ __PACKAGE__->register_method({
>  		requires => 'checksum-algorithm',
>  		optional => 1,
>  	    },
> +	    compression => {
> +		description => "The compression algorithm used to compress",
> +		type => 'string',

should probably be restricted via an enum schema (containing 'zstd',
'gz' and 'lzop' for now).
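
roughly like this, as a minimal sketch. the enum values are the ones
named above and the description wording is only a suggestion; both are
assumptions and would need to match whatever keys decompressor_info
ends up using:

```perl
use strict;
use warnings;

# Sketch of the parameter schema with an enum restriction.
# The enum values and the description text are assumptions for illustration.
my $compression_schema = {
    description => "Decompress the downloaded file using this compression algorithm.",
    type => 'string',
    enum => ['zstd', 'gz', 'lzop'],
    optional => 1,
};
```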

> +		optional => 1,
> +	    },

nit: I would at least change the description to indicate that
setting this means the downloaded file will be decompressed using this
algorithm

>  	    'checksum-algorithm' => {
>  		description => "The algorithm to calculate the checksum of the file.",
>  		type => 'string',
> @@ -642,14 +647,25 @@ __PACKAGE__->register_method({
>  	    http_proxy => $dccfg->{http_proxy},
>  	};
>  
> -	my ($checksum, $checksum_algorithm) = $param->@{'checksum', 'checksum-algorithm'};
> +	my ($checksum, $checksum_algorithm, $compression) = $param->@{'checksum', 'checksum-algorithm', 'compression'};
>  	if ($checksum) {
>  	    $opts->{"${checksum_algorithm}sum"} = $checksum;
>  	    $opts->{hash_required} = 1;
>  	}

compression should only be allowed for isos (for now), since templates
are *always* compressed at the moment, and uncompressing them while
downloading would actually make them unusable AFAICT?
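
something along these lines, as a sketch only. the helper name is
hypothetical (in the API handler this would more likely be an inline
check), and it assumes the content type is available as a plain string
such as 'iso' or 'vztmpl':

```perl
use strict;
use warnings;

# Hypothetical helper: reject the 'compression' parameter for anything
# other than ISO images. Assumes $content is 'iso' or 'vztmpl'.
sub assert_compression_allowed {
    my ($content, $compression) = @_;

    # nothing to check if no decompression was requested
    return if !defined($compression);

    die "decompression is only supported for content type 'iso'\n"
        if $content ne 'iso';

    return;
}
```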

>  
>  	my $worker = sub {
> -	    PVE::Tools::download_file_from_url("$path/$filename", $url, $opts);
> +	    my $save_to = "$path/$filename";
> +	    die "refusing to override existing file $save_to \n" if -e $save_to ;
> +	    $save_to .= ".$compression" if $compression;
> +	    PVE::Tools::download_file_from_url($save_to, $url, $opts);
> +	    if($compression)
> +	    {
> +		my $decrypton_error = PVE::Storage::decompress_iso($compression, $save_to);
> +		print $decrypton_error if $decrypton_error;
> +		unlink $save_to;
> +
> +
> +	    }

the decompression here could (or probably should) be moved into
download_file_from_url (passing in the decompression command via $opts).
this would also make the handling of the tmpfile easier, since now we
have
- a tempfile for the download
- renamed to a tempfile for the decompression
- uncompressed to final destination
- intermediate tempfile needs manual cleanup

if the decompression is moved into the helper (including any required
cleanups), we are not concerned at all about the tmpfiles here, which
would be nice(r) IMHO.

other than that, there are a few things that could be improved:
- code style (positioning of {, blank lines)
- $decrypton -> there is no encryption happening here :)
- error handling:
-- errors are normally not returned, but passed up the stack via "die"
-- the error should be propagated up so that the task fails and the user
knows the download failed and why, see below for more details
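
to illustrate the error handling point, a minimal sketch. the sub name
and the $decompress coderef are hypothetical stand-ins for the real
decompression step, which is assumed to die on failure:

```perl
use strict;
use warnings;

# $decompress is a stand-in coderef for the real decompression step and is
# expected to die on failure (hypothetical, for illustration only).
sub decompress_and_cleanup {
    my ($compressed_file, $decompress) = @_;

    eval { $decompress->($compressed_file) };
    my $err = $@;

    # the intermediate compressed file is not needed in either case
    unlink($compressed_file)
        or warn "could not remove '$compressed_file' - $!\n";

    # re-raise so the worker task fails and the user sees why
    die "decompression failed - $err" if $err;

    return;
}
```

the point is just that the cleanup runs in both cases, and the error
still reaches the task log instead of being printed and ignored.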

>  	};
>  
>  	my $worker_id = PVE::Tools::encode_text($filename); # must not pass : or the like as w-ID
> diff --git a/src/PVE/Storage.pm b/src/PVE/Storage.pm
> index b99ed35..0c62cc8 100755
> --- a/src/PVE/Storage.pm
> +++ b/src/PVE/Storage.pm
> @@ -1532,6 +1532,11 @@ sub decompressor_info {
>  	    lzo => ['lzop', '-d', '-c'],
>  	    zst => ['zstd', '-q', '-d', '-c'],
>  	},
> +	iso => {
> +	    gz =>  ['zstd', '-q', '-d'],

this might warrant a comment ;)

> +	    zst => ['zstd', '-q', '-d'],
> +	    lzo => ['lzop', '-q', '-d'],
> +	},

we did discuss this a bit already, but also posting here on list in case
someone else has an opinion:

while we possibly lose a bit of performance (although I doubt it really
matters much for regular iso files), aligning the iso and vma commands
here would simplify the filename handling - if the returned command
simply decompresses the next pushed argument to stdout, we don't have to
account for the peculiarities of each command with regards to automatic
extension removal. we could also save writing the data to disk twice
(compressed, and then uncompressed) if we add calculating the digest to
the pipe, or at least do that for the non-verifying case. isos are
commonly stored on NFS/CIFS shares, where this is even more expensive
because of the network round trips.
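
a rough sketch of what such a pipe could look like, using only core
Perl instead of run_command so it stands on its own. the sub name is
hypothetical, and hashing the *decompressed* stream with SHA-256 is an
assumption; whether the digest should cover the compressed or the
decompressed data is left open here:

```perl
use strict;
use warnings;
use Digest::SHA;

# Stream the stdout of "@$cmd $source" into $target while calculating the
# SHA-256 digest of the decompressed data. $cmd must write to stdout,
# e.g. ['zstd', '-q', '-d', '-c'].
# On failure a partial $target may remain; the caller has to remove it.
sub decompress_with_digest {
    my ($cmd, $source, $target) = @_;

    open(my $in, '-|', @$cmd, $source)
        or die "could not start '$cmd->[0]' - $!\n";
    binmode($in);

    open(my $out, '>', $target)
        or die "could not open '$target' for writing - $!\n";
    binmode($out);

    my $sha = Digest::SHA->new(256);
    while (my $read = read($in, my $buf, 1024 * 1024)) {
        $sha->add($buf);
        print {$out} $buf or die "write to '$target' failed - $!\n";
    }

    close($out) or die "closing '$target' failed - $!\n";
    # close() on a pipe returns false if the command exited non-zero
    close($in) or die "'$cmd->[0]' failed (exit status " . ($? >> 8) . ")\n";

    return $sha->hexdigest();
}
```

with all decompressors writing to stdout, the target filename is fully
under our control and the data is written to the storage only once.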

>      };
>  
>      die "ERROR: archive format not defined\n"
> @@ -1611,6 +1616,23 @@ sub archive_auxiliaries_remove {
>      }
>  }
>  
> +sub decompress_iso
> +{

nit: style again

> +    my ($compression, $file) = @_;
> +
> +    my $raw = '';
> +    my $out = sub {
> +	my $output = shift;
> +	$raw .= "$output\n";
> +    };
> +
> +    my $info = decompressor_info('iso', $compression);
> +    my $decompressor = $info->{decompressor};
> +    
> +     run_command([@$decompressor, $file], outfunc => $out);

outfunc only captures STDOUT, not STDERR, so I don't think this part
does what you want it to do (based on the naming of the
'$decrypton_error' variable above).. there also is a 'logfunc'
(capturing both) and an 'errfunc' (capturing only STDERR). also, not all
output is necessarily an error, even with `-q`.

I think it should be safe to assume that any decompression tool we call
here will make run_command fail if the decompression fails for whatever
reason. in that case decompress_iso here will die, but that is *not*
handled at its call site, so no cleanup will happen and the compressed,
downloaded file will remain...

note that by default, run_command will just forward the command's output
(on both FDs) to whatever those point to outside, so a plain run_command
should do the right thing in most circumstances. outfunc and friends are
only needed if you actually need to do something (parse, filter, ..)
with the output.

> +     return wantarray ? ($raw, undef) : $raw;

why? we sometimes use wantarray, but only if it actually makes sense to
either return one or multiple values, depending on call-site context.. I
don't think it serves any purpose here ;)
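
for comparison, a made-up example of a case where wantarray does make
sense (the helper is hypothetical and not part of our code base): the
caller either wants just the base name, or the base name plus the
extension:

```perl
use strict;
use warnings;

# Hypothetical helper: returns (base, extension) in list context,
# and only the base name in scalar context.
sub split_extension {
    my ($filename) = @_;

    my ($base, $ext) = $filename =~ m/^(.+)\.([^.]+)$/
        or die "no extension in '$filename'\n";

    return wantarray ? ($base, $ext) : $base;
}
```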

> +}
> +
>  sub extract_vzdump_config_tar {
>      my ($archive, $conf_re) = @_;
>  
> -- 
> 2.39.2