[pve-devel] [PATCH v3 pve-common 01/12] introduce PVE::Path

Mon Feb 3 17:45:32 CET 2025

On Mon Feb 3, 2025 at 4:42 PM CET, Fiona Ebner wrote:
> Am 03.02.25 um 15:21 schrieb Max Carrara:
> > On Fri Jan 31, 2025 at 1:23 PM CET, Fiona Ebner wrote:
> >> Am 09.01.25 um 15:48 schrieb Max Carrara:
> >>
> >> Why this kind of behavior with absolute paths? Seems surprising to me.
> >> Wouldn't failing the call be better?
> > 
> > I decided to stick with Rust's behaviour [pathbuf] here and in most
> > other places, simply in order to keep the behaviour consistent.
> > 
> > I'm not *really* a fan of it either, but I think it's better than
> > failing the call -- IMO having to wrap path_join / path_push into an
> > eval-block would be quite awkward for what's a rather "small" operation
> > :s
> > 
> > Perhaps I should provide some additional context here as well so as to
> > make my "design philosophy" more clear: The reason why I modeled
> > PVE::Path (mostly) after Rust's std::path::{Path, PathBuf} is because:
> > 
> >   1. Most (newer) developers will be much more likely to be familiar
> >      with Rust instead of Perl, so if they should see this library,
> >      they'll have a much easier time using it (or at least that's what
> >      I *hope* would happen). Sort of as in: "Oh, this is just like the
> >      path stuff in Rust, except for a few Perl-esque exceptions!"
> > 
> >   2. I wanted the library to be essentially exception-free, except where
> >      obvious invariants aren't upheld by the caller (e.g. passing undef).
> >      Unless you match the error message with a regex, there's no
> >      structurally typed way to differ between errors in Perl, so I'd
> >      like the "types" of errors to be mostly the same wherever possible.
> > 
> >   3. I didn't want to reinvent the wheel and instead stick to what other
> >      popular libraries do as much as possible, while also avoiding any
> >      additional abstractions (e.g. introducing an object for paths).
> > 
> > With all that being said, since that behaviour is surprising to you, I
> > have no quarrels changing it! :P After all I'd like to have something
> > that doesn't cause any friction when using it.
> > 
> > [pathbuf]: https://doc.rust-lang.org/std/path/struct.PathBuf.html#method.push
> > 
>
> You don't need to change it if there was good rationale behind it, which
> you provided :) A short comment in the function documentation regarding
> the inspiration would still be nice. Puts the responsibility on the
> caller to not accidentally use an absolute path though. Maybe we can at
> least warn instead? I suspect almost all calls with an absolute path in
> the middle will be accidental.

Sure, that sounds fine by me! :)

>
> >>
> >>> +
> >>> +    my $joined = path_join("foo/bar", "/etc", "resolv.conf");
> >>> +    # /etc/resolv.conf
> >>> +
> >>> +    my $joined = path_join("foo", "/etc/resolv.conf", "/etc/hosts");
> >>> +    # /etc/hosts
> >>> +
> >>> +Throws an exception if any of the passed C<@paths> is C<undef>.
> >>> +
> >>> +=cut
> >>> +
> >>> +sub path_join : prototype(@) {
> >>> +    my (@paths) = @_;
> >>> +
> >>> +    if (!scalar(@paths)) {
> >>> +	return '';
> >>> +    }
> >>> +
> >>> +    croak "one of the provided paths is undef" if any { !defined($_) } @paths;
> >>> +
> >>
> >> I think the rest could be written more efficiently like (untested):
> >>
> >> my $resulting_path = shift @paths;
> >> for my $path (@paths) {
> >>   if ($path =~ m#^/#) {
> >>     $resulting_path = $path;
> >>   } else {
> >>     $resulting_path = path_push($resulting_path, $path);
> >>   }
> >> }
> > 
> > Note that I'm doing a reverse iteration below; I think doing it either
> > way should be fine, because I doubt we'll ever need to join that many
> > path components where it's actually going to make a difference ;P
> > 
>
> I also meant "efficiently" in terms of lines/readability ;) Shorter,
> less loops and avoids index arithmetic.

Oooh, that makes sense then! Sure, I'll give it a try :)

>
> >>
> >>> +    # Find the last occurrence of a root directory and start conjoining the
> >>> +    # components from there onwards
> >>> +    my $index = scalar(@paths) - 1;
> >>> +    while ($index > 0) {
> >>> +	last if $paths[$index] =~ m#^/#;
> >>> +	$index--;
> >>> +    }
> >>> +
> >>> +    @paths = @paths[$index .. (scalar(@paths) - 1)];
> >>> +
> >>> +    my $resulting_path = shift @paths;
> >>> +
> >>> +    for my $path (@paths) {
> >>> +	$resulting_path = path_push($resulting_path, $path);
> >>> +    }
> >>> +
> >>> +    return $resulting_path;
> >>> +}
> >>> +
> >>> +=head3 path_normalize($path)
> >>> +
> >>> +Wrapper for L<C<File::Spec/canonpath>>. Performs a logical cleanup of the given
> >>> +C<$path>.
> >>> +
> >>> +This removes unnecessary components of a path that can be safely
> >>> +removed, such as C<.>, trailing C</> or repeated occurrences of C</>.
> >>> +
> >>> +For example, C<foo/./bar/baz/.> and C<foo////bar//baz//> will both become
> >>> +C<foo/bar/baz>.
> >>> +
> >>> +B<Note:> This will I<not> remove components referencing the parent directory,
> >>> +i.e. C<..>. For example, C<foo/bar/../baz> and C<foo/bar/baz/..> will therefore
> >>> +remain as they are. However, the parent directory of C</> is C</>, so
> >>> +C</../../foo> will be normalized to C</foo>.
> >>> +
> >>> +Throws an exception if C<$path> is C<undef> or the call to C<canonpath> failed.
> >>> +
> >>> +=cut
> >>> +
> >>> +sub path_normalize : prototype($) {
> >>> +    my ($path) = @_;
> >>> +
> >>> +    croak "\$path is undef" if !defined($path);
> >>> +
> >>> +    my $cleaned_path = eval {
> >>> +	File::Spec->canonpath($path);
> >>> +    };
> >>> +
> >>
> >> Style nit: blank lines between eval and using $@ are better avoided
> >> IMHO. I'd also have the eval expression be a single line, but no strong
> >> feelings.
> > 
> > Could you perhaps elaborate why, please?
> > 
>
> Because new code gets added more easily in between. I've seen that
> happen before.

Ah, okay! Sure, then I'll change it. At first I feared that there was
some (--> yet another) obscure Perl edge case I didn't know about. :P

>
> ---snip 8<---
>
> >>> +Throws an exception if any of the arguments is C<undef>.
> >>> +
> >>> +=cut
> >>> +
> >>> +sub path_push : prototype($$) {
> >>> +    my ($path, $other) = @_;
> >>> +
> >>> +    croak "\$path is undef" if !defined($path);
> >>> +    croak "\$other is undef" if !defined($other);
> >>> +
> >>> +    return $path if $other eq '';
> >>> +    return $other if path_is_absolute($other);
> >>> +
> >>> +    my $need_sep = $path ne '' && $path !~ m#/$#;
> >>> +
> >>> +    $path .= "/" if $need_sep;
> >>> +    $path .= $other;
> >>> +
> >>> +    return $path;
> >>> +}
> >>> +
> >>> +=head3 path_pop($path)
> >>> +
> >>> +Alias for C<L<< path_parent()|/"path_parent($path)" >>>.
> >>> +
> >>
> >> Why have this alias? The name suggests it would behave like an inverse
> >> to path_push(), but it does not, because of the normalization of
> >> path_parent(). So I'd rather not have it or have it be a non-normalizing
> >> version (but maybe not worth it) to avoid surprises.
> > 
> > Could you please elaborate what you mean regarding the normalization of
> > path_parent()?
>
> I just mean push "." followed by pop is not the same as you start with.
>
> > 
> > path_parent() (or path_pop()) will only normalize as much as it needs to
> > in order to remove the last path component and get to the one that
> > precedes that:
> > 
> >   my $p = "foo/././bar/././baz/./.";
> > 
> >   $p = PVE::Path::path_parent($p):  # or path_pop()
> >   # $p is now "foo/././bar"
> > 
> > path_push() does actually behave like an inverse if you include the
> > redundant stuff:
> > 
> >   $p = PVE::Path::path_push($p, "././baz/./.")
> >   # $p is now "foo/././bar/././baz/./." again
> > 
> > It's not a complete inverse, that's true (as in, it's not bijective ;P),
> > because you can't replicate the example above with a path like
> > "foo///bar///baz" since you can't push "///baz" onto "foo///bar"
> > ("///baz" is treated as an absolute path).
> > 
> > I think a non-normalizing version wouldn't really be that nice to use;
> > one would have to constantly remember to call path_normalize()
> > beforehand to actually get the desired result (i.e. getting the parent
> > directory).
>
> Agreed.
>
> > 
> > I can just remove the alias if you prefer; it's not equivalent to Rust's
> > PathBuf::pop() [pop] and if it's a source of confusion, we can probably
> > do without.
> > 
> > [pop]: https://doc.rust-lang.org/std/path/struct.PathBuf.html#method.pop
> > 
>
> I see where the inspiration comes from then :) I get why Rust has both,
> because one is for Path and the other for PathBuf. And since they are
> essentially the same operation, fine by me to keep this alias.
>
> >>
> >>> +=cut
> >>> +
> >>> +sub path_pop : prototype($) {
> >>> +    my ($path) = @_;
> >>> +    return path_parent($path);
> >>> +}
> >>> +
> >>> +=head3 path_file_name($path)
> >>> +
> >>> +Returns the last component of the given C<$path>, if it is a legal file name,
> >>> +or C<undef> otherwise.
> >>> +
> >>> +If C<$path> is an empty string, C</>, C<.> or ends with a C<..> component,
> >>> +there is no valid file name.
> >>> +
> >>> +B<Note:> This does not check whether the given C<$path> actually points to a
> >>> +file or a directory etc.
> >>> +
> >>> +Throws an exception if C<$path> is C<undef>.
> >>> +
> >>> +=cut
> >>> +
> >>> +sub path_file_name : prototype($) {
> >>> +    my ($path) = @_;
> >>> +
> >>> +    croak "\$path is undef" if !defined($path);
> >>> +
> >>> +    my @components = path_components($path);
> >>> +
> >>> +    if (!scalar(@components)) {
> >>> +	return;
> >>> +    }
> >>> +
> >>> +    if (
> >>> +	scalar(@components) == 1
> >>> +	&& ($components[0] eq '/' || $components[0] eq '.')
> >>> +    ) {
> >>
> >> Style nit: this conditional fits well into 100 characters, so can be one
> >> line instead of four
> > 
> > Fair point, but I prefer to have it a little more readable here, and
> > it's still in line with our style guide. :P
>
> Personally, I find this version "more noisy" to read, because of the
> split, but no hard feelings :)
>
> ---snip 8<---
>
> >>> +=cut
> >>> +
> >>> +sub path_file_prefix : prototype($) {
> >>> +    my ($path) = @_;
> >>> +
> >>> +    croak "\$path is undef" if !defined($path);
> >>> +
> >>> +    my $file_name = path_file_name($path);
> >>> +    return undef if !defined($file_name);
> >>> +
> >>> +    my ($prefix, $suffix_str) = _path_file_prefix_suffix_str($file_name);
> >>> +    return $prefix;
> >>> +}
> >>> +
> >>> +=head3 path_with_file_prefix($path, $prefix)
> >>> +
> >>
> >> Hmm, looking at this and path_with_file_suffix{,es}(), would it maybe be
> >> nicer to have a single
> >> path_with_file_name_from_parts($path, $prefix, @suffixes)
> >> function instead of these? Would seem more natural/straightforward to
> >> me. The implementations are rather involved IMHO compared to how useful
> >> the functions are. It's very easy to get the behavior for the (I suspect
> >> rather uncommon) case of where you want to replace only prefix or
> >> suffix(es) without already knowing the other. Just need to call
> >> path_file_parts() first. Or did you take inspiration from somewhere else
> >> for these?
> > 
> > I'll have to disagree here precisely because having to call
> > path_file_parts() first is IMO tedious from the caller's perspective. In
> > many cases I rarely care about all of the individual parts; instead I
> > just e.g. want to rename a file while keeping the suffix(es), or I want to
> > trim off the last suffix of something like "archive.tar.gz" because I'm
> > decompressing it to "archive.tar" or similar.
> > 
> > In other words, in a lot of cases I just want to perform one individual
> > operation, and the path_file_* and path_with_file_* functions represent
> > all those individual operations as concisely as possible. The burden of
> > "figuring out what to do" shouldn't be given to the caller in such
> > cases.
> > 
> > Regarding inspiration: These functions were inspired after having read a
> > related Rust GitHub issue [issue] where a lot of this was discussed as
> > well.
> > 
> > That issue also suggests having some kind of iterator for file parts,
> > that other methods of std::path::Path can be based on; this is actually
> > what led me to adding path_file_parts() below. You have access to the
> > individual parts there and can just join() them to the file name. That
> > would be very similar to the path_with_file_name_from_parts() function
> > you suggested. :P
> > 
> > [issue]: https://github.com/rust-lang/rust/issues/86319
> > 
>
> Okay, thank you for elaborating! Maybe such calls are more common than I
> think.

You're welcome!

Also thanks for the thorough review, it's much appreciated :) I'll see
if I can send out a v4 soon enough.