[pve-devel] [PATCH v3 storage 1/2] fix #1710: add retrieve method for storage

Thomas Lamprecht t.lamprecht at proxmox.com
Tue May 4 11:31:36 CEST 2021


On 04.05.21 10:56, Lorenz Stechauner wrote:
> Users are now able to download/retrieve any .iso/... file onto their
> storages and verify file integrity with checksums.
> 
> Signed-off-by: Lorenz Stechauner <l.stechauner at proxmox.com>
> ---
>  PVE/API2/Storage/Status.pm | 158 +++++++++++++++++++++++++++++++++++--
>  PVE/Storage.pm             |  10 +++
>  2 files changed, 163 insertions(+), 5 deletions(-)
> 
> diff --git a/PVE/API2/Storage/Status.pm b/PVE/API2/Storage/Status.pm
> index 897b4a7..8388714 100644
> --- a/PVE/API2/Storage/Status.pm
> +++ b/PVE/API2/Storage/Status.pm
> @@ -5,6 +5,8 @@ use warnings;
>  
>  use File::Path;
>  use File::Basename;
> +use HTTP::Request;
> +use LWP::UserAgent;
>  use PVE::Tools;
>  use PVE::INotify;
>  use PVE::Cluster;
> @@ -412,11 +414,7 @@ __PACKAGE__->register_method ({
>  	my $size = -s $tmpfilename;
>  	die "temporary file '$tmpfilename' does not exist\n" if !defined($size);
>  
> -	my $filename = $param->{filename};
> -
> -	chomp $filename;
> -	$filename =~ s/^.*[\/\\]//;
> -	$filename =~ s/[^-a-zA-Z0-9_.]/_/g;
> +	my $filename = PVE::Storage::normalize_content_filename($param->{filename});
>  
>  	my $path;
>  
> @@ -497,4 +495,154 @@ __PACKAGE__->register_method ({
>  	return $upid;
>     }});
>  
> +__PACKAGE__->register_method({
> +    name => 'retrieve',
> +    path => '{storage}/retrieve',

over general path/method name, I'd either use "retrieve-url" or use "download-url"

> +    method => 'POST',
> +    description => "Download templates and ISO images by using an URL.",
> +    proxyto => 'node',
> +    permissions => {
> +	check => [ 'and',
> +	    ['perm', '/storage/{storage}', [ 'Datastore.AllocateTemplate' ]],
> +	    ['perm', '/', [ 'Sys.Audit', 'Sys.Modify' ]],
> +	],
> +    },
> +    protected => 1,
> +    parameters => {
> +	additionalProperties => 0,
> +	properties => {
> +	    node => get_standard_option('pve-node'),
> +	    storage => get_standard_option('pve-storage-id'),
> +	    url => {
> +		description => "The URL to retrieve the file from.",
> +		type => 'string',
> +	    },
> +	    content => {
> +		description => "Content type.",

auto-detect could be done in general? At least for a few whitelisted types
(e.g., .iso), but no hard feelings here, we can always make this more flexible
in the future

> +		type => 'string', format => 'pve-storage-content',
> +	    },
> +	    filename => {
> +		description => "The name of the file to create. Alternatively the file name given by the server will be used.",

You mention "alternatively", but this parameter is not optional?

> +		type => 'string',
> +	    },
> +	    checksum => {
> +		description => "The expected checksum of the file.",
> +		type => 'string',
> +		requires => 'checksumalg',
> +		optional => 1,
> +	    },
> +	    checksumalg => {

weird name? I really do not get the need to abbreviate everything in some custom way.
APIs parameter should speak for them self, at least somewhat, just use
"checksum-algorithm"

> +		description => "The algorithm to claculate the checksum of the file.",

typo s/claculate/calculate/

> +		type => 'string', enum => ['md5', 'sha1', 'sha224', 'sha256', 'sha384', 'sha512'],

The enum goes in its own line

> +		requires => 'checksum',
> +		optional => 1,
> +	    },
> +	    insecure => {
> +		description => "Allow TLS certificates to be invalid.",

Rather use "verify-certificates"  and default to 1 (true).

> +		type => 'boolean',
> +		optional => 1,
> +	    }
> +	},
> +    },
> +    returns => { type => "string" },

use
returns => {
   type => "string",
},


> +    code => sub {
> +	my ($param) = @_;
> +
> +	my $rpcenv = PVE::RPCEnvironment::get();
> +

unnecessary extra newline

> +	my $user = $rpcenv->get_user();
> +
> +	my $cfg = PVE::Storage::config();
> +
> +	my $node = $param->{node};
> +	my $storage = $param->{storage};
> +	my $scfg = PVE::Storage::storage_check_enabled($cfg, $storage, $node);
> +
> +	die "can't upload to storage type '$scfg->{type}'"

1. Without a \n at then perl will add the line + module to that error message,
not nice for stuff which is not internal (i.e., assert-like).

2. normally its considered good to add some reason/explanation to an error, so
that user can know what needs to actually change to make it work.

Maybe:
"cannot upload to storage '$storage' (type '$scfg->{type}'), not a file based storage!\n"

> +	    if !defined($scfg->{path});
> +
> +	my $content = $param->{content};
> +
> +	my $url = $param->{url};

optimizer nit, above three lines could be:

my ($content, $url) = $param->@{'content', 'url'};

> +
> +	die "invalid https or http url"
> +	    if $url !~ qr!^https?://!;

we normally place short post-if's into the same line as the statement it guards

> +
> +	my $checksum = $param->{checksum};
> +	my $hash_alg = $param->{checksumalg};

consistency please, if the param is named checksum*something then the variable should be
named similar (hash vs. checksum here).

Also, in general I'd like to see some locality between variable declaration and use

> +
> +	my $filename = PVE::Storage::normalize_content_filename($param->{filename});
> +
> +	my $path;
> +
> +	# MIME type is checked in front end only
> +	# this check is omitted here intentionally and replaced by file extension check
> +	if ($content eq 'iso') {
> +	    if ($filename !~ m![^/]+$PVE::Storage::iso_extension_re$!) {
> +		raise_param_exc({ filename => "missing '.iso' or '.img' extension" });
> +	    }
> +	    $path = PVE::Storage::get_iso_dir($cfg, $storage);
> +	} elsif ($content eq 'vztmpl') {
> +	    if ($filename !~ m![^/]+\.tar\.[gx]z$!) {
> +		raise_param_exc({ filename => "missing '.tar.gz' or '.tar.xz' extension" });
> +	    }
> +	    $path = PVE::Storage::get_vztmpl_dir($cfg, $storage);
> +	} else {
> +	    raise_param_exc({ content => "upload content type '$content' not allowed" });
> +	}
> +
> +	die "storage '$storage' does not support '$content' content"
> +	    if !$scfg->{content}->{$content};
> +
> +	my $dest = "$path/$filename";
> +
> +	PVE::Storage::activate_storage($cfg, $storage);
> +	File::Path::make_path($path);
> +
> +	# -L follows redirects
> +	# -f silent fail on error

why silently failing? wouldn't the error be interesting, should not have to big implications
now that this call requires elevated privs, or?

> +	my @curlcmd = ('curl', '-L', '-o', $dest, '-f');

we use wget with --progress=dot:mega for the appliances, so why curl here?

In general I ask myself if there would be some common helper possible for both,
appliances and this here, to avoid code duplication, did you looked into that.

> +	push @curlcmd, '--insecure'
> +	    if $param->{insecure};
> +
> +	my $cmd = [@curlcmd, $url];

$cmd is overly general

> +
> +	my $cmd_check = [ [ 'echo', $checksum, $dest ], [ "${hash_alg}sum", '-c' ] ];
> +	my $cmd_check_flat = [ 'echo', $checksum, $dest, '|', "${hash_alg}sum", '-c'  ];  # only used for logging

yeah, no, we don't do that, the log command is at risk to get outdated without noticing.
Either auto generated it from the $cmd_check or omit in in general (i.e., by just logging the
checksum algorithm used.

Also why do we rely on the external tools if we have usage of the perl
Digest::{MD5,SHA256,..} quite some times already?

> +
> +	my $worker = sub {
> +	    my $upid = shift;
> +
> +	    print "starting file download from: $url\n";
> +	    print "target node: $node\n";

omit above node info, it's already in the task UPID

> +	    print "target storage: $storage\n";

isn't that to in the task UPID?

> +	    print "target file: $dest\n";

> +	    print "command: " . join(' ', @$cmd) . "\n";

if use PVE::Tools shellquote or the like, but actually I'd omit that...

The five prints could be

print "download '$url' to $storage as $filename\n";

further advantage, that does not leaks that much info, e.g., the full path
may not be known even for those with sys.audit.

> +
> +	    eval { PVE::Tools::run_command($cmd, errmsg => 'download failed'); };
> +	    if (my $err = $@) {
> +		unlink $dest;

check the unlink, for a single file you can use a simple or:

unlink $dest or warn "could not cleanup '$dest' - $!\n"

> +		die $err;
> +	    }
> +	    print "finished file download successfully\n";
> +
> +	    if ($checksum) {
> +		print "validating checksum...\n";
> +		print "expected $hash_alg checksum is: $checksum\n";

quite chatty, above could go into one line. Information is good to have in a task log,
but it should be concise to avoid drowning the relevant in noise.

> +		print cmd_check print "checksum validation command: " . join(' ', @$cmd_check_flat) . "\n";
> +
> +		eval { PVE::Tools::run_command($cmd_check, errmsg => 'checksum mismatch'); };

this all is racy, i.e., if something aborts/kills the task or above hangs we have a time
window where the downloaded file already shows up as valid (but not yet checked!) one
at the storage. Use a temporary file ending (we have some use of those in various places,
basically something like .tmp.$$) and do a rename only at the end.


> +		if (my $err = $@) {
> +		    unlink $dest;
> +		    die $err;
> +		}
> +		print "validated checksum successfully\n";
> +	    }
> +	};
> +
> +	my $upid = $rpcenv->fork_worker('imgdownload', $storage, $user, $worker);
> +
> +	return $upid;
> +    }});
> +
>  1;
> diff --git a/PVE/Storage.pm b/PVE/Storage.pm
> index 122c3e9..d57fd43 100755
> --- a/PVE/Storage.pm
> +++ b/PVE/Storage.pm
> @@ -1931,4 +1931,14 @@ sub assert_sid_unused {
>      return undef;
>  }
>  
> +sub normalize_content_filename {
> +    my ($filename) = @_;
> +
> +    chomp $filename;
> +    $filename =~ s/^.*[\/\\]//;
> +    $filename =~ s/[^-a-zA-Z0-9_.]/_/g;
> +
> +    return $filename;
> +}
> +
>  1;
> 





More information about the pve-devel mailing list