[pve-devel] applied: [PATCH v7 common 1/1] tools: add download_file_from_url

Thomas Lamprecht t.lamprecht at proxmox.com
Tue Jun 15 14:25:40 CEST 2021


On 14.06.21 11:05, Lorenz Stechauner wrote:
> code is based on
> manager:PVE/API2/Nodes.pm:aplinfo
> 

applied, with a slightly adapted commit message you send afterwards and some followups.

I'm a bit sorry to not check on this more closely again earlier as I found quite some
issues when finally wanting to apply this, I fixed up most of them in a followup but the
remaining part of the series needs to adapt.

> Signed-off-by: Lorenz Stechauner <l.stechauner at proxmox.com>
> ---
>  src/PVE/Tools.pm | 124 +++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 124 insertions(+)
> 
> diff --git a/src/PVE/Tools.pm b/src/PVE/Tools.pm
> index 16ae3d2..7b82e00 100644
> --- a/src/PVE/Tools.pm
> +++ b/src/PVE/Tools.pm
> @@ -1829,4 +1829,128 @@ sub safe_compare {
>      return $cmp->($left, $right);
>  }
>  
> +
> +# opts
> +#  -> hash_required
> +#       if 1, at least one checksum has to be specified otherwise an error will be thrown
> +#  -> http_proxy
> +#  -> https_proxy
> +#  -> verify_certificates
> +#  -> sha(1|224|256|384|512)sum
> +#  -> md5sum
> +sub download_file_from_url {
> +    my ($dest, $url, $opts) = @_;
> +
> +    my $tmpdest = "$dest.tmp.$$";

If we'd kept the fork_worker here (see below) this wouldn't be ideal, as now it's using
the pveproxy worker pid, which can be shared by multiple in-flight requests, I'd have moved
it down into the worker sub which is actually it's own process and has a more unique pid..

> +
> +    my $algorithm;
> +    my $expected;
> +

prefixed above two variables with "checksum_" to avoid over general names in longer methods.

> +    for ('sha512', 'sha384', 'sha256', 'sha224', 'sha1', 'md5') {
> +	if (defined($opts->{"${_}sum"})) {
> +	    $algorithm = $_;
> +	    $expected = $opts->{"${_}sum"};
> +	    last;
> +	}
> +    }
> +
> +    die "checksum required but not specified\n" if ($opts->{hash_required} && !$algorithm);
> +
> +    my $worker = sub  {
> +	my $upid = shift;
> +
> +	print "downloading $url to $dest\n";
> +
> +	eval {
> +	    if (-f $dest && $algorithm) {
> +		print "calculating checksum of existing file...\n";
> +		my $correct = check_file_hash($algorithm, $expected, $dest);
> +
> +		if ($correct) {
> +		    print "file already exists, no need to download\n";
> +		    return;
> +		} else {
> +		    print "mismatch, downloading\n";

made a die "abort" out of above, IMO this is not really safe, admin should delete
the existing file first explicitly (we may relax this in the future if actually
requested by users, but as default I'd like to start out with safer behavior).

> +		}
> +	    }
> +
> +	    my @cmd = ('/usr/bin/wget', '--progress=dot:mega', '-O', $tmpdest, $url);

Changed to a array ref and dropped the /usr/bin/, which may be wrong (in theory) on some
systems.

> +
> +	    local %ENV;
> +	    if ($opts->{http_proxy}) {
> +		$ENV{http_proxy} = $opts->{http_proxy};
> +	    }
> +	    if ($opts->{https_proxy}) {
> +		$ENV{https_proxy} = $opts->{https_proxy};
> +	    }
> +
> +	    my $verify = $opts->{verify_certificates} // 1;
> +	    if (!$verify) {
> +		push @cmd, '--no-check-certificate';
> +	    }
> +
> +	    if (run_command([[@cmd]]) != 0) {
> +		die "download failed: $!\n";
> +	    }

why double array refs? I changed above three lines to:

run_command($cmd, errmsg => "download failed");

> +
> +	    if ($algorithm) {
> +		print "calculating checksum...\n";
> +
> +		my $correct = check_file_hash($algorithm, $expected, $tmpdest);
> +
> +		if ($correct) {
> +		    print "checksum verified\n";

not telling what was wrong is never good, i.e., the calculated checksum should be printed here.
I made 'check_file_hash' to 'get_file_hash' and let it just return the computed digest, that
way we do not need to pass expected either and just check ourself.

> +		} else {
> +		    die "checksum mismatch\n";
> +		}
> +	    } else {
> +		print "no checksum for verification specified\n";

nit: just noise, makes the task log unnecessarily longer 

> +	    }
> +
> +	    if (!rename($tmpdest, $dest)) {
> +		die "unable to save file: $!\n";

nit: it's already saved, so this error is misleading, I changed it to "unable to rename temporary file: $!"
to better reflect where the actual issues happened.

> +	    }
> +	};
> +	my $err = $@;
> +
> +	unlink $tmpdest;

this can fail too, and we should warn the admin about that 

> +
> +	if ($err) {
> +	    print "\n";
> +	    die $err;
> +	}
> +
> +	print "download finished\n";
> +    };
> +
> +    my $rpcenv = PVE::RPCEnvironment::get();

you use PVE::RPCEnvironment here but are missing an use statement, it works only as other
modules in pveproxy/pvedaemon include it and perl has no usage-hygiene.

But it cannot be "fixed" by just using this here, there's a reason that we have no other
fork_worker/RPCEnv use in PVE::Tools, PVE::RPCEnvironment is in pve-access-control, which
is a consumer of pve-common, so adding this here would create a circular dependency which
we want to avoid at all costs (makes bootstrapping a huge PITA without any benefit).

So I dropped the whole worker stuff, that needs to happen at the caller side.

> +    my $user = $rpcenv->get_user();
> +
> +    (my $filename = $dest) =~ s!.*/([^/]*)$!$1!;

not really safe, would break quite a bit if the filename contains a colon : or white-space,
this needs to be encoded

> +
> +    return $rpcenv->fork_worker('download', $filename, $user, $worker);
> +}
> +
> +sub check_file_hash {
> +    my ($algorithm, $expected, $filename) = @_;
> +
> +    my $algorithm_map = {
> +	'md5' => sub { Digest::MD5->new },
> +	'sha1' => sub { Digest::SHA->new(1) },
> +	'sha224' => sub { Digest::SHA->new(224) },
> +	'sha256' => sub { Digest::SHA->new(256) },
> +	'sha384' => sub { Digest::SHA->new(384) },
> +	'sha512' => sub { Digest::SHA->new(512) },

You use the Digest:: modules but never add an import use statement for that, meaning again
that it's up to luck (i.e., some other module imports it) if this works..

E.g., the following minimal test does not work even if the worker stuff is dropped:

perl -we 'use PVE::Tools; PVE::Tools::download_file_from_url("/tmp/foo",
  "http://download.proxmox.com/iso/proxmox-ve_3.4-102d4547-6.iso", {md5sum => "37efacd45b70d5d720b11b468f26136b"});'

> +    };
> +
> +    my $digester = $algorithm_map->{$algorithm}->() or die "unknown algorithm '$algorithm'\n";
> +
> +    open(my $fh, '<', $filename) or die "unable to open '$filename': $!\n";
> +    binmode($fh);
> +
> +    my $digest = $digester->addfile($fh)->hexdigest;
> +
> +    return lc($digest) eq lc($expected);
> +}
> +
>  1;
> 






More information about the pve-devel mailing list