[pve-devel] [PATCH v3 pve-common 01/12] introduce PVE::Path

Fri Jan 31 13:23:52 CET 2025

Am 09.01.25 um 15:48 schrieb Max Carrara:
> +=head3 path_join(@paths)
> +
> +Joins multiple paths together. All kinds of paths are supported.
> +
> +Does not perform any C<L<< normalization|/"path_normalize($path)" >>>.
> +
> +    my $joined = path_join("foo", "bar/baz", "qux.txt");
> +    # foo/bar/baz/qux.txt
> +
> +    my $joined = path_join("/", "etc/pve/", "storage.cfg");
> +    # /etc/pve/storage.cfg
> +
> +Similar to C<L<< path_push()|/"path_push($path, $other)">>>, should any of the
> +C<@paths> be an absolute path, it I<replaces> all preceding paths.

Seems like the docs are somehow broken here, looking at it with perldoc,
it just states 'Similar to "path_push()"' and the rest of the sentence
is missing.

Why this kind of behavior with absolute paths? Seems surprising to me.
Wouldn't failing the call be better?

> +
> +    my $joined = path_join("foo/bar", "/etc", "resolv.conf");
> +    # /etc/resolv.conf
> +
> +    my $joined = path_join("foo", "/etc/resolv.conf", "/etc/hosts");
> +    # /etc/hosts
> +
> +Throws an exception if any of the passed C<@paths> is C<undef>.
> +
> +=cut
> +
> +sub path_join : prototype(@) {
> +    my (@paths) = @_;
> +
> +    if (!scalar(@paths)) {
> +	return '';
> +    }
> +
> +    croak "one of the provided paths is undef" if any { !defined($_) } @paths;
> +

I think the rest could be written more efficiently like (untested):

my $resulting_path = shift @paths;
for my $path (@paths) {
  if ($path =~ m#^/#) {
    $resulting_path = $path;
  } else {
    $resulting_path = path_push($resulting_path, $path);
  }
}

> +    # Find the last occurrence of a root directory and start conjoining the
> +    # components from there onwards
> +    my $index = scalar(@paths) - 1;
> +    while ($index > 0) {
> +	last if $paths[$index] =~ m#^/#;
> +	$index--;
> +    }
> +
> +    @paths = @paths[$index .. (scalar(@paths) - 1)];
> +
> +    my $resulting_path = shift @paths;
> +
> +    for my $path (@paths) {
> +	$resulting_path = path_push($resulting_path, $path);
> +    }
> +
> +    return $resulting_path;
> +}
> +
> +=head3 path_normalize($path)
> +
> +Wrapper for L<C<File::Spec/canonpath>>. Performs a logical cleanup of the given
> +C<$path>.
> +
> +This removes unnecessary components of a path that can be safely
> +removed, such as C<.>, trailing C</> or repeated occurrences of C</>.
> +
> +For example, C<foo/./bar/baz/.> and C<foo////bar//baz//> will both become
> +C<foo/bar/baz>.
> +
> +B<Note:> This will I<not> remove components referencing the parent directory,
> +i.e. C<..>. For example, C<foo/bar/../baz> and C<foo/bar/baz/..> will therefore
> +remain as they are. However, the parent directory of C</> is C</>, so
> +C</../../foo> will be normalized to C</foo>.
> +
> +Throws an exception if C<$path> is C<undef> or the call to C<canonpath> failed.
> +
> +=cut
> +
> +sub path_normalize : prototype($) {
> +    my ($path) = @_;
> +
> +    croak "\$path is undef" if !defined($path);
> +
> +    my $cleaned_path = eval {
> +	File::Spec->canonpath($path);
> +    };
> +

Style nit: blank lines between eval and using $@ are better avoided
IMHO. I'd also have the eval expression be a single line, but no strong
feelings.

> +    croak "failed to clean up path: $@" if $@;
> +
> +    return $cleaned_path;
> +}
> +
> +=head3 path_parent($path)
> +
> +Returns the given C<$path> without its final component, if there is one.
> +
> +Trailing and repeated occurrences of C</> and C<.> are normalized on the fly
> +when needed. This means that e.g. C<foo////bar///.//> becomes C<foo>, but
> +C<foo/.//bar//./baz> becomes C<foo/.//bar>.
> +
> +This function's behaviour is almost identical to Rust's
> +L<< Path::parent|https://doc.rust-lang.org/std/path/struct.Path.html#method.parent >>,
> +with a few adaptations made wherever Perl treats things differently:
> +
> +=over
> +
> +=item * C</foo/bar> becomes C</foo>, C<foo/bar> becomes C<foo>.
> +
> +=item * C</foo> becomes C</>.
> +
> +=item * C<foo/bar/..> becomes C<foo/bar>. Note that C<foo/bar> is not
> +necessarily the real parent in the filesystem in the case of e.g. symlinks.
> +
> +=item * C<foo/../bar> becomes C<foo/..>.
> +
> +=item * C<foo> is interpreted as C<./foo> and becomes C<.>. This is because Perl's
> +C<L<File::Spec/canonpath>> interprets C<./foo> and C<foo> as the same thing.

I fail to see the "because" here. Why is this special handling important
in practice, i.e. why exactly care about the behavior of canonpath here?
It seems really surprising to get "." when asking for the parent of
"foo". With the normalization explained above, I'd argue it'd be much
more natural to return the empty string for both "./foo" and "foo". Or
if you want to follow Rust's parent() more closely (which I guess is the
reason for "/" and "." having different results), have "./foo" return
"." and "foo" return the empty string.

> +=head3 path_push($path, $other)
> +
> +Extends C<$path> with C<$other>, returning a new path.
> +
> +If C<$other> is absolute, it will be returned instead.
> +

Again, seems rather surprising to me to auto-magically do this.

> +Throws an exception if any of the arguments is C<undef>.
> +
> +=cut
> +
> +sub path_push : prototype($$) {
> +    my ($path, $other) = @_;
> +
> +    croak "\$path is undef" if !defined($path);
> +    croak "\$other is undef" if !defined($other);
> +
> +    return $path if $other eq '';
> +    return $other if path_is_absolute($other);
> +
> +    my $need_sep = $path ne '' && $path !~ m#/$#;
> +
> +    $path .= "/" if $need_sep;
> +    $path .= $other;
> +
> +    return $path;
> +}
> +
> +=head3 path_pop($path)
> +
> +Alias for C<L<< path_parent()|/"path_parent($path)" >>>.
> +

Why have this alias? The name suggests it would behave like an inverse
to path_push(), but it does not, because of the normalization of
path_parent(). So I'd rather not have it or have it be a non-normalizing
version (but maybe not worth it) to avoid surprises.

> +=cut
> +
> +sub path_pop : prototype($) {
> +    my ($path) = @_;
> +    return path_parent($path);
> +}
> +
> +=head3 path_file_name($path)
> +
> +Returns the last component of the given C<$path>, if it is a legal file name,
> +or C<undef> otherwise.
> +
> +If C<$path> is an empty string, C</>, C<.> or ends with a C<..> component,
> +there is no valid file name.
> +
> +B<Note:> This does not check whether the given C<$path> actually points to a
> +file or a directory etc.
> +
> +Throws an exception if C<$path> is C<undef>.
> +
> +=cut
> +
> +sub path_file_name : prototype($) {
> +    my ($path) = @_;
> +
> +    croak "\$path is undef" if !defined($path);
> +
> +    my @components = path_components($path);
> +
> +    if (!scalar(@components)) {
> +	return;
> +    }
> +
> +    if (
> +	scalar(@components) == 1
> +	&& ($components[0] eq '/' || $components[0] eq '.')
> +    ) {

Style nit: this conditional fits well into 100 characters, so can be one
line instead of four

> +	return;
> +    }
> +
> +    if ($components[-1] eq '..') {
> +	return;
> +    }
> +
> +    return $components[-1];
> +}
> +
> +=head3 path_with_file_name($path, $file_name)
> +
> +Returns C<$path> with C<$file_name> as the new last component.
> +
> +This is essentially like calling C<L<< path_parent()|/"path_parent($path)" >>>
> +and using C<L<< path_push()|/"path_push($path, $other)" >>> to append the new
> +file name, but handles a few extra cases:
> +
> +=over
> +
> +=item * If C<$path> is C</>, appends C<$file_name>.
> +
> +=item * If C<$path> is an empty string, appends C<$file_name>.
> +
> +=item * If C<$path> ends with a parent directory reference (C<..>), replaces it
> +with C<$file_name>.

Hmm, why is that an "extra case" compared to calling path_parent() and
path_push()? I.e. your implementation doesn't handle ".." specifically.

> +
> +=back
> +
> +Throws an exception if any of the arguments is C<undef> or if C<$file_name>
> +contains a path separator (C</>).
> +
> +=cut
> +
> +sub path_with_file_name : prototype($$) {
> +    my ($path, $file_name) = @_;
> +
> +    croak "\$path is undef" if !defined($path);
> +    croak "\$file_name is undef" if !defined($file_name);
> +    croak "\$file_name contains a path separator: $file_name" if $file_name =~ m|/|;
> +
> +    my $parent = path_parent($path);
> +
> +    # undef means that $path was either "" or "/", so we can just append to it
> +    return ($path . $file_name) if !defined($parent);
> +
> +    # Because the parent of "foo" is ".", return $file_name to stay consistent.
> +    # Otherwise, we'd end up with a current path ref prepended ("./$file_name")
> +    if ($parent eq '.' && $path !~ m|/|) {
> +	return $file_name;
> +    }

See, you already need special handling because of that surprising
behavior ;)

> +
> +    return path_push($parent, $file_name);
> +}
> +
> +my sub _path_file_prefix_suffix_str : prototype($) {
> +    my ($file_name) = @_;
> +
> +    confess "\$file_name is undef" if !defined($file_name);
> +
> +    confess "\$prefix not matched" if $file_name !~ m|^(\.*[^\.]*)(.*)|;
> +    my ($prefix, $suffix_str) = ($1, $2);
> +
> +    return ($prefix, $suffix_str);
> +}
> +
> +my sub _path_file_suffixes_from_str : prototype($) {
> +    my ($suffix_str) = @_;
> +
> +    confess "\$suffix_str is undef" if !defined($suffix_str);
> +
> +    # Suffixes are extracted "manually" because join()ing the result of split()
> +    # results in a different file name than the original. Let's say you have a
> +    # file named "foo.bar.". The correct suffixes would be ("bar", "").
> +    # With split, you get the following:
> +    #     split(/\./, ".bar.")     --> ("", "bar")     --> join()ed to "foo..bar"
> +    #     split(/\./, ".bar.", -1) --> ("", "bar", "") --> join()ed to "foo..bar."

Couldn't you just shift away the first entry from the result?

> +    my @suffixes = ();
> +    while ($suffix_str =~ s|^(\.[^\.]*)||) {
> +	my $suffix = $1;
> +	$suffix =~ s|^\.||;

Nit: Could also use substr() rather then regex replace.

> +	push(@suffixes, $suffix);
> +    }
> +
> +    return @suffixes;
> +}
> +
> +=head3 path_file_prefix($path)
> +
> +Returns the prefix of the file name of the given C<$path>. If the C<$path> does
> +not have a valid file name and thus no prefix, C<undef> is returned instead.
> +
> +The prefix of a file name is the part before any extensions (suffixes).
> +
> +    my $prefix = path_file_prefix("/etc/resolv.conf");
> +    # resolv
> +
> +    my $prefix = path_file_prefix("/tmp/archive.tar.zst");
> +    # archive
> +
> +Throws an exception if C<$path> is C<undef>.
> +

Documentation should mention how dot files (e.g. .foo.txt) are treated.

> +=cut
> +
> +sub path_file_prefix : prototype($) {
> +    my ($path) = @_;
> +
> +    croak "\$path is undef" if !defined($path);
> +
> +    my $file_name = path_file_name($path);
> +    return undef if !defined($file_name);
> +
> +    my ($prefix, $suffix_str) = _path_file_prefix_suffix_str($file_name);
> +    return $prefix;
> +}
> +
> +=head3 path_with_file_prefix($path, $prefix)
> +

Hmm, looking at this and path_with_file_suffix{,es}(), would it maybe be
nicer to have a single
path_with_file_name_from_parts($path, $prefix, @suffixes)
function instead of these? Would seem more natural/straightforward to
me. The implementations are rather involved IMHO compared to how useful
the functions are. It's very easy to get the behavior for the (I suspect
rather uncommon) case of where you want to replace only prefix or
suffix(es) without already knowing the other. Just need to call
path_file_parts() first. Or did you take inspiration from somewhere else
for these?

---snip 8<---

> +=head3 path_file_parts($path)
> +
> +Returns the parts that constitute the file name (prefix and suffixes) of a
> +C<$path> as a list. If the C<$path> does not have a valid file name, an empty
> +list is returned instead.
> +
> +These parts are split in such a way that allows them to be C<join>ed together,
> +resulting in the original file name of the given C<$path> again.
> +
> +    my @file_parts = path_file_parts("/etc/pve/firewall/cluster.fw");
> +    # ("cluster", "fw")
> +    my $file_name = join(".", @file_parts);
> +
> +    my @file_parts = path_file_parts("/tmp/archive.tar.gz");
> +    # ("archive", "tar", "gz")
> +    my $file_name = join(".", @file_parts);
> +
> +Throws an exception if C<$path> is C<undef>.
> +

Should mention how dot files are treated.

> +=cut
> +
> +sub path_file_parts : prototype($) {
> +    my ($path) = @_;
> +
> +    croak "\$path is undef" if !defined($path);
> +
> +    my $file_name = path_file_name($path);
> +    if (!defined($file_name)) {
> +	return ();
> +    }
> +
> +    my ($prefix, $suffix_str) = _path_file_prefix_suffix_str($file_name);
> +    my @suffixes = _path_file_suffixes_from_str($suffix_str);
> +
> +    return ($prefix, @suffixes);
> +}
> +
> +=head3 path_starts_with($path, $other_path)
> +
> +Checks whether a C<$path> starts with the components of C<$other_path>.
> +
> +    my $starts_with = path_starts_with("/etc/pve/firewall/cluster.fw", "/etc/pve");
> +    # 1
> +
> +Throws an exception if any of the arguments is C<undef>.
> +

Should document the behavior for empty paths as well as the normalization.

> +=cut
> +
> +sub path_starts_with : prototype($$) {
> +    my ($path, $other_path) = @_;
> +
> +    croak "\$path is undef" if !defined($path);
> +    croak "\$other_path if undef" if !defined($other_path);
> +
> +    # Nothing starts with nothing
> +    return 1 if ($path eq '' && $other_path eq '');
> +
> +    # Nothing cannot start with something
> +    # Something cannot start with nothing
> +    return if ($path eq '' || $other_path eq '');
> +
> +    my @components = path_components($path);
> +    my @other_components = path_components($other_path);
> +
> +    my @pairs = zip_shortest(\@components, \@other_components);

Why not just shift off from both in a loop?

> +
> +    # for my ($comp, $other_comp) (@pairs) is experimental
> +    for my $pair (@pairs) {
> +	my ($comp, $other_comp) = $pair->@*;
> +
> +	if ($comp ne $other_comp) {
> +	    return;
> +	}
> +    }
> +
> +    return 1;
> +}
> +
> +=head3 path_ends_with($path, $other_path)
> +
> +Checks whether a C<$path> ends with the components of C<$other_path>.
> +
> +    my $ends_with = path_ends_with("/etc/pve/firewall/cluster.fw", "firewall/cluster.fw");
> +    # 1
> +
> +Throws an exception if any of the arguments is C<undef>.
> +

Should document the behavior for empty paths as well as the normalization.

> +=cut
> +
> +sub path_ends_with : prototype($$) {
> +    my ($path, $other_path) = @_;
> +
> +    croak "\$path is undef" if !defined($path);
> +    croak "\$other_path if undef" if !defined($other_path);
> +
> +    # Nothing ends with nothing
> +    return 1 if ($path eq '' && $other_path eq '');
> +
> +    # Nothing cannot end with something
> +    # Something cannot end with nothing
> +    return if ($path eq '' || $other_path eq '');
> +
> +    my @components_rev = reverse(path_components($path));
> +    my @other_components_rev = reverse(path_components($other_path));
> +
> +    my @pairs_rev = zip_shortest(\@components_rev, \@other_components_rev);

Why not just shift off from both in a loop?

> +
> +    # for my ($comp, $other_comp) (@pairs_rev) is experimental
> +    for my $pair (@pairs_rev) {
> +	my ($comp, $other_comp) = $pair->@*;
> +
> +	if ($comp ne $other_comp) {
> +	    return;
> +	}
> +    }
> +
> +    return 1;
> +}
> +
> +=head3 path_equals($path, $other_path)
> +
> +Checks whether C<$path> equals C<$other_path>. The paths are compared
> +by their components, meaning that it's not necessary to
> +L<< normalize|/"path_normalize($path)" >> them beforehand.
> +
> +=cut
> +
> +sub path_equals : prototype($$) {
> +    my ($path, $other_path) = @_;
> +
> +    croak "\$path is undef" if !defined($path);
> +    croak "\$other_path if undef" if !defined($other_path);
> +
> +    # Nothing is nothing
> +    return 1 if ($path eq '' && $other_path eq '');
> +
> +    # Nothing is not something
> +    # Something is not nothing
> +    return if ($path eq '' || $other_path eq '');
> +
> +    my @components = path_components($path);
> +    my @other_components = path_components($other_path);
> +
> +    return if scalar(@components) != scalar(@other_components);
> +
> +    my @pairs = zip_longest(\@components, \@other_components);

Why not just shift off from both in a loop?

> +
> +    # for my ($comp, $other_comp) (@pairs_rev) is experimental
> +    for my $pair (@pairs) {
> +	my ($comp, $other_comp) = $pair->@*;
> +
> +	return if !defined($comp) || !defined($other_comp);
> +
> +	if ($comp ne $other_comp) {
> +	    return;
> +	}
> +    }
> +
> +    return 1;
> +}
> +
> +1;