[pve-devel] [PATCH v3 pve-common 01/12] introduce PVE::Path

Mon Feb 3 16:42:49 CET 2025

Am 03.02.25 um 15:21 schrieb Max Carrara:
> On Fri Jan 31, 2025 at 1:23 PM CET, Fiona Ebner wrote:
>> Am 09.01.25 um 15:48 schrieb Max Carrara:
>>
>> Why this kind of behavior with absolute paths? Seems surprising to me.
>> Wouldn't failing the call be better?
> 
> I decided to stick with Rust's behaviour [pathbuf] here and in most
> other places, simply in order to keep the behaviour consistent.
> 
> I'm not *really* a fan of it either, but I think it's better than
> failing the call -- IMO having to wrap path_join / path_push into an
> eval-block would be quite awkward for what's a rather "small" operation
> :s
> 
> Perhaps I should provide some additional context here as well so as to
> make my "design philosophy" more clear: The reason why I modeled
> PVE::Path (mostly) after Rust's std::path::{Path, PathBuf} is because:
> 
>   1. Most (newer) developers will be much more likely to be familiar
>      with Rust instead of Perl, so if they should see this library,
>      they'll have a much easier time using it (or at least that's what
>      I *hope* would happen). Sort of as in: "Oh, this is just like the
>      path stuff in Rust, except for a few Perl-esque exceptions!"
> 
>   2. I wanted the library to be essentially exception-free, except where
>      obvious invariants aren't upheld by the caller (e.g. passing undef).
>      Unless you match the error message with a regex, there's no
>      structurally typed way to differ between errors in Perl, so I'd
>      like the "types" of errors to be mostly the same wherever possible.
> 
>   3. I didn't want to reinvent the wheel and instead stick to what other
>      popular libraries do as much as possible, while also avoiding any
>      additional abstractions (e.g. introducing an object for paths).
> 
> With all that being said, since that behaviour is surprising to you, I
> have no quarrels changing it! :P After all I'd like to have something
> that doesn't cause any friction when using it.
> 
> [pathbuf]: https://doc.rust-lang.org/std/path/struct.PathBuf.html#method.push
> 

You don't need to change it if there was good rationale behind it, which
you provided :) A short comment in the function documentation regarding
the inspiration would still be nice. Puts the responsibility on the
caller to not accidentally use an absolute path though. Maybe we can at
least warn instead? I suspect almost all calls with an absolute path in
the middle will be accidental.

>>
>>> +
>>> +    my $joined = path_join("foo/bar", "/etc", "resolv.conf");
>>> +    # /etc/resolv.conf
>>> +
>>> +    my $joined = path_join("foo", "/etc/resolv.conf", "/etc/hosts");
>>> +    # /etc/hosts
>>> +
>>> +Throws an exception if any of the passed C<@paths> is C<undef>.
>>> +
>>> +=cut
>>> +
>>> +sub path_join : prototype(@) {
>>> +    my (@paths) = @_;
>>> +
>>> +    if (!scalar(@paths)) {
>>> +	return '';
>>> +    }
>>> +
>>> +    croak "one of the provided paths is undef" if any { !defined($_) } @paths;
>>> +
>>
>> I think the rest could be written more efficiently like (untested):
>>
>> my $resulting_path = shift @paths;
>> for my $path (@paths) {
>>   if ($path =~ m#^/#) {
>>     $resulting_path = $path;
>>   } else {
>>     $resulting_path = path_push($resulting_path, $path);
>>   }
>> }
> 
> Note that I'm doing a reverse iteration below; I think doing it either
> way should be fine, because I doubt we'll ever need to join that many
> path components where it's actually going to make a difference ;P
> 

I also meant "efficiently" in terms of lines/readability ;) Shorter,
less loops and avoids index arithmetic.

>>
>>> +    # Find the last occurrence of a root directory and start conjoining the
>>> +    # components from there onwards
>>> +    my $index = scalar(@paths) - 1;
>>> +    while ($index > 0) {
>>> +	last if $paths[$index] =~ m#^/#;
>>> +	$index--;
>>> +    }
>>> +
>>> +    @paths = @paths[$index .. (scalar(@paths) - 1)];
>>> +
>>> +    my $resulting_path = shift @paths;
>>> +
>>> +    for my $path (@paths) {
>>> +	$resulting_path = path_push($resulting_path, $path);
>>> +    }
>>> +
>>> +    return $resulting_path;
>>> +}
>>> +
>>> +=head3 path_normalize($path)
>>> +
>>> +Wrapper for L<C<File::Spec/canonpath>>. Performs a logical cleanup of the given
>>> +C<$path>.
>>> +
>>> +This removes unnecessary components of a path that can be safely
>>> +removed, such as C<.>, trailing C</> or repeated occurrences of C</>.
>>> +
>>> +For example, C<foo/./bar/baz/.> and C<foo////bar//baz//> will both become
>>> +C<foo/bar/baz>.
>>> +
>>> +B<Note:> This will I<not> remove components referencing the parent directory,
>>> +i.e. C<..>. For example, C<foo/bar/../baz> and C<foo/bar/baz/..> will therefore
>>> +remain as they are. However, the parent directory of C</> is C</>, so
>>> +C</../../foo> will be normalized to C</foo>.
>>> +
>>> +Throws an exception if C<$path> is C<undef> or the call to C<canonpath> failed.
>>> +
>>> +=cut
>>> +
>>> +sub path_normalize : prototype($) {
>>> +    my ($path) = @_;
>>> +
>>> +    croak "\$path is undef" if !defined($path);
>>> +
>>> +    my $cleaned_path = eval {
>>> +	File::Spec->canonpath($path);
>>> +    };
>>> +
>>
>> Style nit: blank lines between eval and using $@ are better avoided
>> IMHO. I'd also have the eval expression be a single line, but no strong
>> feelings.
> 
> Could you perhaps elaborate why, please?
> 

Because new code gets added more easily in between. I've seen that
happen before.

---snip 8<---

>>> +Throws an exception if any of the arguments is C<undef>.
>>> +
>>> +=cut
>>> +
>>> +sub path_push : prototype($$) {
>>> +    my ($path, $other) = @_;
>>> +
>>> +    croak "\$path is undef" if !defined($path);
>>> +    croak "\$other is undef" if !defined($other);
>>> +
>>> +    return $path if $other eq '';
>>> +    return $other if path_is_absolute($other);
>>> +
>>> +    my $need_sep = $path ne '' && $path !~ m#/$#;
>>> +
>>> +    $path .= "/" if $need_sep;
>>> +    $path .= $other;
>>> +
>>> +    return $path;
>>> +}
>>> +
>>> +=head3 path_pop($path)
>>> +
>>> +Alias for C<L<< path_parent()|/"path_parent($path)" >>>.
>>> +
>>
>> Why have this alias? The name suggests it would behave like an inverse
>> to path_push(), but it does not, because of the normalization of
>> path_parent(). So I'd rather not have it or have it be a non-normalizing
>> version (but maybe not worth it) to avoid surprises.
> 
> Could you please elaborate what you mean regarding the normalization of
> path_parent()?

I just mean push "." followed by pop is not the same as you start with.

> 
> path_parent() (or path_pop()) will only normalize as much as it needs to
> in order to remove the last path component and get to the one that
> precedes that:
> 
>   my $p = "foo/././bar/././baz/./.";
> 
>   $p = PVE::Path::path_parent($p):  # or path_pop()
>   # $p is now "foo/././bar"
> 
> path_push() does actually behave like an inverse if you include the
> redundant stuff:
> 
>   $p = PVE::Path::path_push($p, "././baz/./.")
>   # $p is now "foo/././bar/././baz/./." again
> 
> It's not a complete inverse, that's true (as in, it's not bijective ;P),
> because you can't replicate the example above with a path like
> "foo///bar///baz" since you can't push "///baz" onto "foo///bar"
> ("///baz" is treated as an absolute path).
> 
> I think a non-normalizing version wouldn't really be that nice to use;
> one would have to constantly remember to call path_normalize()
> beforehand to actually get the desired result (i.e. getting the parent
> directory).

Agreed.

> 
> I can just remove the alias if you prefer; it's not equivalent to Rust's
> PathBuf::pop() [pop] and if it's a source of confusion, we can probably
> do without.
> 
> [pop]: https://doc.rust-lang.org/std/path/struct.PathBuf.html#method.pop
> 

I see where the inspiration comes from then :) I get why Rust has both,
because one is for Path and the other for PathBuf. And since they are
essentially the same operation, fine by me to keep this alias.

>>
>>> +=cut
>>> +
>>> +sub path_pop : prototype($) {
>>> +    my ($path) = @_;
>>> +    return path_parent($path);
>>> +}
>>> +
>>> +=head3 path_file_name($path)
>>> +
>>> +Returns the last component of the given C<$path>, if it is a legal file name,
>>> +or C<undef> otherwise.
>>> +
>>> +If C<$path> is an empty string, C</>, C<.> or ends with a C<..> component,
>>> +there is no valid file name.
>>> +
>>> +B<Note:> This does not check whether the given C<$path> actually points to a
>>> +file or a directory etc.
>>> +
>>> +Throws an exception if C<$path> is C<undef>.
>>> +
>>> +=cut
>>> +
>>> +sub path_file_name : prototype($) {
>>> +    my ($path) = @_;
>>> +
>>> +    croak "\$path is undef" if !defined($path);
>>> +
>>> +    my @components = path_components($path);
>>> +
>>> +    if (!scalar(@components)) {
>>> +	return;
>>> +    }
>>> +
>>> +    if (
>>> +	scalar(@components) == 1
>>> +	&& ($components[0] eq '/' || $components[0] eq '.')
>>> +    ) {
>>
>> Style nit: this conditional fits well into 100 characters, so can be one
>> line instead of four
> 
> Fair point, but I prefer to have it a little more readable here, and
> it's still in line with our style guide. :P

Personally, I find this version "more noisy" to read, because of the
split, but no hard feelings :)

---snip 8<---

>>> +=cut
>>> +
>>> +sub path_file_prefix : prototype($) {
>>> +    my ($path) = @_;
>>> +
>>> +    croak "\$path is undef" if !defined($path);
>>> +
>>> +    my $file_name = path_file_name($path);
>>> +    return undef if !defined($file_name);
>>> +
>>> +    my ($prefix, $suffix_str) = _path_file_prefix_suffix_str($file_name);
>>> +    return $prefix;
>>> +}
>>> +
>>> +=head3 path_with_file_prefix($path, $prefix)
>>> +
>>
>> Hmm, looking at this and path_with_file_suffix{,es}(), would it maybe be
>> nicer to have a single
>> path_with_file_name_from_parts($path, $prefix, @suffixes)
>> function instead of these? Would seem more natural/straightforward to
>> me. The implementations are rather involved IMHO compared to how useful
>> the functions are. It's very easy to get the behavior for the (I suspect
>> rather uncommon) case of where you want to replace only prefix or
>> suffix(es) without already knowing the other. Just need to call
>> path_file_parts() first. Or did you take inspiration from somewhere else
>> for these?
> 
> I'll have to disagree here precisely because having to call
> path_file_parts() first is IMO tedious from the caller's perspective. In
> many cases I rarely care about all of the individual parts; instead I
> just e.g. want to rename a file while keeping the suffix(es), or I want to
> trim off the last suffix of something like "archive.tar.gz" because I'm
> decompressing it to "archive.tar" or similar.
> 
> In other words, in a lot of cases I just want to perform one individual
> operation, and the path_file_* and path_with_file_* functions represent
> all those individual operations as concisely as possible. The burden of
> "figuring out what to do" shouldn't be given to the caller in such
> cases.
> 
> Regarding inspiration: These functions were inspired after having read a
> related Rust GitHub issue [issue] where a lot of this was discussed as
> well.
> 
> That issue also suggests having some kind of iterator for file parts,
> that other methods of std::path::Path can be based on; this is actually
> what led me to adding path_file_parts() below. You have access to the
> individual parts there and can just join() them to the file name. That
> would be very similar to the path_with_file_name_from_parts() function
> you suggested. :P
> 
> [issue]: https://github.com/rust-lang/rust/issues/86319
> 

Okay, thank you for elaborating! Maybe such calls are more common than I
think.