[pve-devel] [PATCH v4 pve-common 01/12] introduce PVE::Path

Max Carrara m.carrara at proxmox.com
Fri Feb 7 15:03:44 CET 2025


The PVE::Path module concerns itself with file / directory path
operations, like getting the parent directory of a path, extracting
the file name of a path, splitting a path into its individual
components, joining path components together, comparing paths, and so
on.

This module is added here in order to address the shortcomings of the
Perl core modules (such as lacking a lot the aforementioned
functionalities) without relying on some kind of object-based
abstraction.

PVE::Path strictly only contains functions that manipulate file paths
without meddling with the filesystem. The reasoning here is to be able
to just import any function when its necessary without having to
change the surrounding code in order to adapt to some kind of
abstraction.

The other motivation for this module is that path manipulation
operations have been getting more and more common recently, especially
in regards to storage.

Signed-off-by: Max Carrara <m.carrara at proxmox.com>
---
Changes v3 --> v4:
  * Fix typo in docstring of `path_components` that caused building the
    docs to fail.
    - Thanks to Fiona for spotting that!

  * Emit a warning on call sites of `path_join` in case an absolute path
    is passed -- this should prevent any future accidents since passing
    an abs path here is almost always an accident, but still keeps
    the behaviour of `path_join` equivalent to Rust's `Path::join`
    / `PathBuf::push`.
    - Thanks to Fiona for the suggestion!

  * Emit a warning on call sites of `path_push`.
    - The same reasons of the changes to `path_join` apply here also.
    - Thanks to Fiona for the suggestion!

  * Simplify the logic around abs paths in `path_join` since that
    function isn't performance-critical anyway.
    - Thanks to Fiona for the suggestion!

  * Make eval-block in `path_normalize` more narrow and remove newline
    between the block and the first usage of `$@`.
    - Thanks to Fiona for the suggestion!

  * Make `path_normalize` treat paths "foo" and "./foo" differently.
    - I decided to deviate from Perl's `File::Spec->canonpath` here
      because frankly, its behaviour there doesn't make that much sense
      to me, after giving it some more thought. Functions like
      `path_components` already take the leading curr. dir ref into
      account and retain it (as do other libraries).
    - While it might be true that "foo" and "./foo" technically mean the
      same thing, sometimes you want to make it explicit that the path
      is relative to the current directory.
    - In other words, since this won't make an actual *logical*
      difference, it *can* make a semantic difference, so let's deviate
      from `canonpath` and make things simpler overall.
    - tbh, `canonpath` doesn't care either way, and the rest of Perl
      probably doesn't either.

  * Make behaviour of `path_parent` identical to Rust's `Path::parent`.
    - Previously, the parent of "./foo" and "foo" was both ".".
      Now the parent of "./foo" is still ".", but the parent of "foo" is
      now an empty string.
    - This was the next logical step after making `path_normalize`
      retain leading current dir references -- now it's actually
      possible to have the same behaviour as `Path::parent`.
    - This actually makes things a little more simple here and there.
    - Thanks to Fiona for the suggestion!

  * Remove handling of previous behaviour of `path_parent` where
    applicable.
    - This mostly just boils down to removing an extra case for relative
      paths that begin with a current dir reference, like e.g. "./foo".

  * Make `path_with_file_name` always append the new file name if the
    original path didn't have one.
    - This not only makes the behaviour similar to Rust's
      `PathBuf::set_file_name`, but also makes the function behave more
      consistently overall.

  * Change behaviour of `path_file_prefix` to be more in line with
    Rust's upcoming `Path::file_prefix` [prefix].
    - This pretty much only effects paths like "/foo/bar/..baz.quo",
      where previously "..baz" was considered the prefix, but now it's
      just ".".
    - In other words, everything before the second non-leading dot is
      now the prefix.
    - Docstrings across the other file op functions are updated in
      accordance with this change.

  * Check for non-leading dots in `path_with_file_prefix` instead of
    checking for dots only at the end of prefixes.

  * Mention how file names with a leading dot are treated in most file
    op functions like `path_file_prefix`, `path_file_with_prefix`, etc. and
    also provide an example for such cases.
    - This just makes it more obvious for consumers of this module how
      these functions work, leaving much less room for any ambiguities.
    - Not all docstrings have been adapted here, only those of immediate
      relevancy; some of the docstrings would otherwise get too large.
    - Thanks to Fiona for the suggestion!

  * Mention treatment of empty paths (literally just empty strings) in
    the docstrings of `path_starts_with`, `path_ends_with` and
    `path_equals`.
    - Thanks to Fiona for the suggestion!

  * Improve style here and there in minor parts of the code.

  * Update docstrings for all changes above where applicable / relevant.
    - Also fix some cases where docstrings were using a function than
      the one being documented in examples (copy-paste errors)

  * (Being extra detailed with the changes here because I want to make
    it a little easier to merge this, so I hope I haven't forgotten
    anything.)

[prefix]: https://doc.rust-lang.org/std/path/struct.Path.html#method.file_prefix

Changes v2 --> v3:
  * Don't return a reference to a list anymore when path_components,
    path_file_suffixes, path_file_parts are called in scalar context
  * Mention '/' being added at the start of the components being
    returned by path_components in its docstring
  * Mention special case of how absolute paths are handled and refer to
    path_push in path_join's docstring
  * Check whether path is absolute after checking whether it's empty
    instead of the other way around in path_push
  * Rework private helper functions and make them a little more
    efficient

Changes v1 --> v2:
  * Improve some wording in the docstring of path_components
  * Simplify some logic in path_parent and remove an unnecessary sanity
    check
  * Actually treat "foo" as "./foo" in path_parent as mentioned in the
    docstring -- This means that path_parent("foo") now returns "."
    instead of "".
  * Adapt the path_with_file_* functions to the above accordingly, so
    that e.g. path_with_file_name("foo", "bar") returns "bar" instead of
    "./bar".
  * Improve the "boolean" behaviour of path_is_absolute and
    path_is_absolute and return 1 when true, but use an empty return
    when false.
    - An empty return means "undef" in scalar context and an empty list
      in list context, so those functions will always return something
      that's correctly truthy or falsy for Perl, regardless of context


 src/Makefile    |    1 +
 src/PVE/Path.pm | 1027 +++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 1028 insertions(+)
 create mode 100644 src/PVE/Path.pm

diff --git a/src/Makefile b/src/Makefile
index 2d8bdc4..25bc490 100644
--- a/src/Makefile
+++ b/src/Makefile
@@ -23,6 +23,7 @@ LIB_SOURCES = \
 	LDAP.pm \
 	Network.pm \
 	OTP.pm \
+	Path.pm \
 	PBSClient.pm \
 	PTY.pm \
 	ProcFSTools.pm \
diff --git a/src/PVE/Path.pm b/src/PVE/Path.pm
new file mode 100644
index 0000000..fa10375
--- /dev/null
+++ b/src/PVE/Path.pm
@@ -0,0 +1,1027 @@
+=head1 NAME
+
+C<PVE::Path> - Utilities related to handling file and directory paths
+
+=head1 DESCRIPTION
+
+This module provides functions concerned with file and directory path
+manipulation.
+
+None of the functions provided alter the filesystem in any way.
+
+The reason for this module's existence is to address a couple shortcomings:
+
+=over
+
+=item 1. The Perl core modules lack most of what is required for manipulating
+paths, for example getting the parent directory of a path, extracting the
+prefix of a file name (the "stem"), extracting the suffixes of a file name (the
+"endings" or "extensions"), checking whether two paths are the same, and so on.
+
+=item 2. If the Perl core modules provide something in that regard, it's usually
+provided in a not very ergonomic manner (L<C<File::Basename>>).
+
+=item 3. Additionally, the path utilities of the core modules are scattered
+across multiple modules, making them hard to discover.
+
+=item 4. Third-party libraries on CPAN mostly provide objects representing
+paths. Using any of these would require fundamental changes on how file paths
+are handled throughout our code, for almost no benefit.
+
+=back
+
+C<L<PVE::Path>> instead does without objects and strictly provides functions
+for path manipulation only. Any operation that is needed can simply be
+performed ad hoc by importing the corresponding function and doesn't require
+the surrounding code to conform to an abstraction like a path object.
+
+Additionally, some of the core modules' functionality is re-exported or
+re-implemented for ergonomic or logical purposes. The goal is to provide
+functions that don't come with any surprises and just behave like one assumes
+they would.
+
+This module takes inspiration from Rust's C<std::path> and Python's C<pathlib>,
+which are more modern path manipulation libraries.
+
+=head1 LIMITATIONS
+
+This module is limited to manipulating Unix-like / Linux file paths.
+
+=cut
+
+package PVE::Path;
+
+use strict;
+use warnings;
+
+use Carp qw(carp croak confess);
+use File::Spec ();
+use List::Util qw(any zip_shortest zip_longest);
+
+use Exporter qw(import);
+
+our @EXPORT_OK = qw(
+    path_is_absolute
+    path_is_relative
+
+    path_components
+    path_join
+
+    path_normalize
+
+    path_parent
+    path_push
+    path_pop
+
+    path_file_name
+    path_with_file_name
+
+    path_file_prefix
+    path_with_file_prefix
+
+    path_file_suffixes
+    path_with_file_suffixes
+
+    path_file_suffix
+    path_with_file_suffix
+
+    path_file_parts
+
+    path_starts_with
+    path_ends_with
+    path_equals
+);
+
+=head2 FUNCTIONS
+
+=cut
+
+=head3 path_is_absolute($path)
+
+Returns C<1> if C<$path> is absolute (starts with a C</>).
+
+Throws an exception if C<$path> is C<undef>.
+
+=cut
+
+sub path_is_absolute : prototype($) {
+    my ($path) = @_;
+
+    croak "\$path is undef" if !defined($path);
+
+    if ($path =~ m#^/#) {
+	return 1;
+    }
+
+    return;
+}
+
+=head3 path_is_relative($path)
+
+Returns C<1> if C<$path> is relative (doesn't start with a C</>).
+
+The opposite of C<L<< path_is_absolute()|/"path_is_absolute($path)" >>>.
+
+Throws an exception if C<$path> is C<undef>.
+
+=cut
+
+sub path_is_relative : prototype($) {
+    my ($path) = @_;
+
+    croak "\$path is undef" if !defined($path);
+
+    if ($path !~ m#^/#) {
+	return 1;
+    }
+
+    return;
+}
+
+=head3 path_components($path)
+
+Returns a list of the given C<$path>'s individual components.
+
+The C<$path> is normalized a little during the parse:
+
+=over
+
+=item Repeated occurrences of C</> are removed, so C<foo/bar> and C<foo//bar>
+both have C<foo> and C<bar> as components.
+
+=item Trailing slashes C</> are removed.
+
+=item Occurrences of C<.> are normalized away, except the first C<.> at
+beginning of a path. This means that C<foo/bar>, C<foo/./bar>, C<foo/bar/.>,
+C<foo/././bar/./.>, etc. all have C<foo> and C<bar> as components, while
+C<./foo/bar>, C<./././foo/bar>, C<./foo/./bar/.> have C<.>, C<foo> and C<bar>
+as components.
+
+=item Absolute paths will retain a C</> at the beginning. This means that
+C</foo/bar> has C</>, C<foo> and C<bar> as components.
+
+=back
+
+No other normalization is performed to account for the possibility of symlinks
+existing. This means that C<foo/baz> and C<foo/bar/../baz> are distinct (because
+C<bar> could be a symlink and thus C<foo> isn't its parent).
+
+Throws an exception if C<$path> is C<undef>.
+
+=cut
+
+sub path_components : prototype($) {
+    my ($path) = @_;
+
+    croak "\$path is undef" if !defined($path);
+
+    my $is_abs = path_is_absolute($path);
+    my $has_cur_dir = $path =~ m#^\.$|^\./#;
+
+    my @components = split('/', $path);
+    my @normalized_components = ();
+
+    for my $component (@components) {
+	next if $component eq '' || $component eq '.';
+
+	push(@normalized_components, $component);
+    }
+
+    unshift(@normalized_components, '/') if $is_abs;
+    unshift(@normalized_components, '.') if $has_cur_dir;
+
+    return @normalized_components;
+}
+
+
+=head3 path_join(@paths)
+
+Joins multiple paths together. All kinds of paths are supported.
+
+Does not perform any C<L<< normalization|/"path_normalize($path)" >>>.
+
+    my $joined = path_join("foo", "bar/baz", "qux.txt");
+    # foo/bar/baz/qux.txt
+
+    my $joined = path_join("/", "etc/pve/", "storage.cfg");
+    # /etc/pve/storage.cfg
+
+Similar to C<L<< path_push()|/"path_push($path, $other)">>>, should any of the
+C<@paths> be an absolute path, it I<replaces> all preceding paths while emitting
+a warning.
+
+    my $joined = path_join("foo/bar", "/etc", "resolv.conf");
+    # /etc/resolv.conf
+
+    my $joined = path_join("foo", "/etc/resolv.conf", "/etc/hosts");
+    # /etc/hosts
+
+The reason for this behaviour is to stay consistent with Rust's
+C<L<< PathBuf::push()|https://doc.rust-lang.org/std/path/struct.PathBuf.html#method.push >>>.
+
+Throws an exception if any of the passed C<@paths> is C<undef>.
+
+=cut
+
+sub path_join : prototype(@) {
+    my (@paths) = @_;
+
+    if (!scalar(@paths)) {
+	return '';
+    }
+
+    croak "one of the provided paths is undef" if any { !defined($_) } @paths;
+
+    my $resulting_path = shift @paths;
+
+    for my $path (@paths) {
+	if ($path =~ m#^/#) {
+	    carp "passed absolute path to path_join";
+	    $resulting_path = $path;
+	} else {
+	    $resulting_path = path_push($resulting_path, $path);
+	}
+    }
+
+    return $resulting_path;
+}
+
+=head3 path_normalize($path)
+
+Performs a logical cleanup of the given C<$path>.
+
+This removes unnecessary components of a path that can be safely removed, such
+as references to the current directory C<.>, trailing or repeated occurrences
+of path separators C</>.
+
+For example, C<foo/./bar/baz/.> and C<foo////bar//baz//> will both become
+C<foo/bar/baz>.
+
+B<Difference to C<L<File::Spec/canonpath>>:> If the C<$path> starts by
+referencing the current directory C<.>, this reference is preserved, unless
+either C<.> or C<..> is the only component left after normalizing. This means
+that C<././foo///bar> will become C<./foo/bar>, and C<././.> will become C<.>.
+
+B<Note:> This will I<not> remove components referencing the parent directory,
+i.e. C<..>. For example, C<foo/bar/../baz> and C<foo/bar/baz/..> will therefore
+remain as they are. However, the parent directory of C</> is C</>, so
+C</../../foo> will be normalized to C</foo>.
+
+Throws an exception if C<$path> is C<undef> or the wrapped call to C<canonpath>
+failed.
+
+=cut
+
+sub path_normalize : prototype($) {
+    my ($path) = @_;
+
+    croak "\$path is undef" if !defined($path);
+
+    my $cleaned_path = eval { File::Spec->canonpath($path); };
+    croak "failed to clean up path: $@" if $@;
+
+    if ($cleaned_path =~ m#^[^\./]# && $path =~ m#^\./#) {
+	$cleaned_path = "./" . $cleaned_path;
+    }
+
+    return $cleaned_path;
+}
+
+=head3 path_parent($path)
+
+Returns the given C<$path> without its final component, if there is one.
+
+Trailing and repeated occurrences of C</> and C<.> are normalized on the fly
+when needed. This means that e.g. C<foo////bar///.//> becomes C<foo>, but
+C<foo/.//bar//./baz> becomes C<foo/.//bar>.
+
+This function's behaviour is identical to Rust's
+L<< Path::parent|https://doc.rust-lang.org/std/path/struct.Path.html#method.parent >>.
+
+=over
+
+=item * C</foo/bar> becomes C</foo>, C<foo/bar> becomes C<foo>.
+
+=item * C</foo> becomes C</>.
+
+=item * C<foo/bar/..> becomes C<foo/bar>. Note that C<foo/bar> is not
+necessarily the real parent in the filesystem in the case of e.g. symlinks.
+
+=item * C<foo/../bar> becomes C<foo/..>.
+
+=item * C</> and an I<empty string> result in C<undef> being returned.
+
+=item * Paths consisting of a single component, like C<foo>, C<..> or C<.> result
+in an empty string.
+
+=back
+
+Will raise an exception if C<$path> is C<undef>.
+
+=cut
+
+sub path_parent : prototype($) {
+    my ($path) = @_;
+
+    croak "\$path is undef" if !defined($path);
+
+    if ($path eq '') {
+	return;
+    }
+
+    # A limit of -1 retains empty components at the end
+    my @components = split('/', $path, -1);
+
+    # Trim off needless extra components until actual final component is encountered, e.g.
+    # foo////bar////baz//// -> foo////bar////baz
+    # /// -> /
+    # ././//.///./ -> .
+    while (scalar(@components) > 1 && ($components[-1] eq '' || $components[-1] eq '.')) {
+	pop(@components);
+    }
+
+    my $final_component = pop(@components);
+
+    # We had a root dir with needless extra components, e.g. "//" or "////" or "//.///./" etc.
+    return if $final_component eq '';
+
+    # We had a current dir reference with needless extra components, e.g.
+    # "././" or ".///////" or "./././//./././//" etc.
+    return '' if $final_component eq '.';
+
+    # We had some other kind of single component like "foo", "bar" or "..",
+    # so return an empty string
+    return '' if !scalar(@components);
+
+    # Trim off needless extra components until actual parent component is encountered, like above
+    while (scalar(@components) > 1 && ($components[-1] eq '' || $components[-1] eq '.')) {
+	pop(@components);
+    }
+
+    # Handle lone root dir (@components with only one empty string)
+    if (scalar(@components) == 1 && $components[0] eq '') {
+	return '/';
+    }
+
+    return join('/', @components);
+}
+
+=head3 path_push($path, $other)
+
+Extends C<$path> with C<$other>, returning a new path.
+
+If C<$other> is absolute, it will be returned instead while also emitting a
+warning. The reason for this behaviour is to stay consistent with Rust's
+C<L<< PathBuf::push()|https://doc.rust-lang.org/std/path/struct.PathBuf.html#method.push >>>.
+
+Throws an exception if any of the arguments is C<undef>.
+
+=cut
+
+sub path_push : prototype($$) {
+    my ($path, $other) = @_;
+
+    croak "\$path is undef" if !defined($path);
+    croak "\$other is undef" if !defined($other);
+
+    return $path if $other eq '';
+
+    if (path_is_absolute($other)) {
+	carp "passed absolute path to path_push";
+	return $other;
+    }
+
+    my $need_sep = $path ne '' && $path !~ m#/$#;
+
+    $path .= "/" if $need_sep;
+    $path .= $other;
+
+    return $path;
+}
+
+=head3 path_pop($path)
+
+Alias for C<L<< path_parent()|/"path_parent($path)" >>>.
+
+=cut
+
+sub path_pop : prototype($) {
+    my ($path) = @_;
+    return path_parent($path);
+}
+
+=head3 path_file_name($path)
+
+Returns the last component of the given C<$path>, if it is a legal file name,
+or C<undef> otherwise.
+
+If C<$path> is an empty string, C</>, C<.> or ends with a C<..> component,
+there is no valid file name.
+
+B<Note:> This does not check whether the given C<$path> actually points to a
+file or a directory etc.
+
+Throws an exception if C<$path> is C<undef>.
+
+=cut
+
+sub path_file_name : prototype($) {
+    my ($path) = @_;
+
+    croak "\$path is undef" if !defined($path);
+
+    my @components = path_components($path);
+
+    if (!scalar(@components)) {
+	return;
+    }
+
+    if (
+	scalar(@components) == 1
+	&& ($components[0] eq '/' || $components[0] eq '.')
+    ) {
+	return;
+    }
+
+    if ($components[-1] eq '..') {
+	return;
+    }
+
+    return $components[-1];
+}
+
+=head3 path_with_file_name($path, $file_name)
+
+Returns C<$path> with C<$file_name> as the new last component.
+
+If C<L<< path_file_name()|/"path_file_name($path)" >>> returns C<undef>, this is
+equivalent to calling C<L<< path_push($path, $file_name)|/"path_push($path, $other)" >>>.
+
+Otherwise, this is equivalent to calling C<L<< path_parent()|/"path_parent($path)" >>>
+and using C<L<< path_push()|/"path_push($path, $other)" >>> to append the new
+file name. In other words, the new path will have the same parent as the old one.
+
+Throws an exception if any of the arguments is C<undef> or if C<$file_name>
+contains a path separator (C</>).
+
+=cut
+
+sub path_with_file_name : prototype($$) {
+    my ($path, $file_name) = @_;
+
+    croak "\$path is undef" if !defined($path);
+    croak "\$file_name is undef" if !defined($file_name);
+    croak "\$file_name contains a path separator: $file_name" if $file_name =~ m|/|;
+
+    my $old_file_name = path_file_name($path);
+
+    if (!defined($old_file_name)) {
+	return path_push($path, $file_name);
+    }
+
+    my $parent = path_parent($path);
+
+    # undef means that $path was either "" or "/", so we can just append to it
+    # Should never be hitting this case, but remaining defensive here nevertheless
+    return ($path . $file_name) if !defined($parent);
+
+    return path_push($parent, $file_name);
+}
+
+# Note: This assumes that $file_name is in fact a valid file name, as returned
+# by path_file_name
+my sub _path_file_prefix_suffix_str : prototype($) {
+    my ($file_name) = @_;
+
+    confess "failed to match for \$prefix and \$suffir_str"
+	if $file_name !~ m|^(\.?[^\.]*)(.*)|;
+
+    my ($prefix, $suffix_str) = ($1, $2);
+
+    return ($prefix, $suffix_str);
+}
+
+# Note: This assumes that $suffix_str isn't undef
+my sub _path_file_suffixes_from_str : prototype($) {
+    my ($suffix_str) = @_;
+
+    my @suffixes = split(/\./, $suffix_str, -1);
+
+    # Let's say you have a file named "foo.bar.". $suffix_str would be ".bar.";
+    # so with the call to split() above, you get the following:
+    #     split(/\./, ".bar.", -1) --> ("", "bar", "") --> join()ed to "foo..bar."
+    # Hence, shift() the first element away to get only the actual suffixes,
+    # allowing prefix and suffixes to be join()ed to restore the original file name.
+    shift @suffixes;
+
+    return @suffixes;
+}
+
+=head3 path_file_prefix($path)
+
+Returns the prefix of the file name of the given C<$path>. If the C<$path> does
+not have a valid file name and thus no prefix, C<undef> is returned instead.
+
+The prefix of a file name is the part before any extensions (suffixes).
+
+    my $prefix = path_file_prefix("/etc/resolv.conf");
+    # resolv
+
+    my $prefix = path_file_prefix("/tmp/archive.tar.zst");
+    # archive
+
+    my $prefix = path_file_prefix("/home/alice/.zshrc.bak");
+    # .zshrc
+
+In detail, this means that the prefix is:
+
+=over
+
+=item * C<undef>, if there is no file name
+
+=item * The entire file name if there is no embedded C<.>
+
+=item * The part of the file name before the first non-beginning C<.>
+
+=item * The entire file name if the file begins with C<.> and has no other C<.>s within
+
+=item * The part of the file name before the second C<.> if the file begins with C<.>
+
+=back
+
+This is equivalent to Rust's C<L<< Path::file_prefix()|https://doc.rust-lang.org/std/path/struct.Path.html#method.file_prefix >>>.
+
+Throws an exception if C<$path> is C<undef>.
+
+=cut
+
+sub path_file_prefix : prototype($) {
+    my ($path) = @_;
+
+    croak "\$path is undef" if !defined($path);
+
+    my $file_name = path_file_name($path);
+    return undef if !defined($file_name);
+
+    my ($prefix, $_suffix_str) = _path_file_prefix_suffix_str($file_name);
+    return $prefix;
+}
+
+=head3 path_with_file_prefix($path, $prefix)
+
+Returns C<$path> with a new C<$prefix>. This is similar to
+C<L<< path_with_file_name()|/"path_with_file_name($path, $file_name)" >>>,
+except that the file prefix is replaced.
+
+If C<$path> does not have a file name, C<undef> is returned instead.
+
+    my $new_path = path_with_file_prefix("/tmp/archive.tar.zst", "backup");
+    # /tmp/backup.tar.zst
+
+    my $new_path = path_with_file_prefix("/etc/pve", "ceph");
+    # /etc/ceph
+
+    my $new_path = path_with_file_prefix("/home/alice/.zshrc.bak", ".bashrc");
+    # /home/alice/.bashrc.bak
+
+Throws an exception if any of the arguments is C<undef>, or if C<$prefix>
+contains a path separator (C</>) or a non-leading C<.>.
+
+=cut
+
+sub path_with_file_prefix : prototype($$) {
+    my ($path, $prefix) = @_;
+
+    croak "\$path is undef" if !defined($path);
+    croak "\$prefix is undef" if !defined($prefix);
+    croak "\$prefix contains a path separator" if $prefix =~ m|/|;
+    croak "\$prefix contains non-leading dot" if $prefix !~ m|^\.?[^\.]*$|;
+
+    my $file_name = path_file_name($path);
+    return undef if !defined($file_name);
+
+    my $parent = path_parent($path);
+
+    # sanity check -- should not happen because we checked for file name,
+    # and the existence of a file name implies there's a parent
+    confess "parent of \$path is undef" if !defined($parent);
+
+    my ($_old_prefix, $suffix_str) = _path_file_prefix_suffix_str($file_name);
+
+    my $new_file_name = $prefix . $suffix_str;
+
+    return path_push($parent, $new_file_name);
+}
+
+=head3 path_file_suffixes($path)
+
+Returns the suffixes of the C<$path>'s file name as a list. If the C<$path> does
+not have a valid file name, an empty list is returned instead.
+
+The suffixes of a path are essentially the file name's extensions, the parts
+that come after the L<< prefix|/"path_file_prefix($path)" >>.
+
+    my @suffixes = path_file_suffixes("/etc/resolv.conf");
+    # ("conf")
+
+    my @suffixes = path_file_suffixes("/tmp/archive.tar.zst");
+    # ("tar", "zst")
+
+    my @suffixes = path_file_suffixes("/home/alice/.zshrc");
+    # ()
+
+Throws an exception if C<$path> is C<undef>.
+
+=cut
+
+sub path_file_suffixes : prototype($) {
+    my ($path) = @_;
+
+    croak "\$path is undef" if !defined($path);
+
+    my $file_name = path_file_name($path);
+    if (!defined($file_name)) {
+	return ();
+    }
+
+    my ($_prefix, $suffix_str) = _path_file_prefix_suffix_str($file_name);
+
+    return _path_file_suffixes_from_str($suffix_str);
+}
+
+=head3 path_with_file_suffixes($path, @suffixes)
+
+Returns C<$path> with new C<@suffixes>. This is similar to
+C<L<< path_with_file_name()|/"path_with_file_name($path, $file_name)" >>>,
+except that the suffixes of the file name are replaced.
+
+If the C<$path> does not have a file name, C<undef> is returned.
+
+    my $new_path = path_with_file_suffixes("/tmp/archive.tar.zst", "pxar", "gz");
+    # /tmp/archive.pxar.gz
+
+    my $new_path = path_with_file_suffixes("/tmp/archive.tar.zst", "gz");
+    # /tmp/archive.gz
+
+If the file name has no suffixes, the C<@suffixes> are appended instead:
+
+    my $new_path = path_with_file_suffixes("/etc/resolv", "conf");
+    # /etc/resolv.conf
+
+    my $new_path = path_with_file_suffixes("/etc/resolv", "conf", "zst");
+    # /etc/resolv.conf.zst
+
+If there are no C<@suffixes> provided, the file name's suffixes will
+be removed (if there are any):
+
+    my $new_path = path_with_file_suffixes("/tmp/archive.tar.zst");
+    # /tmp/archive
+
+Note that an empty string is still a valid suffix (an "empty" file ending):
+
+    my $new_path = path_with_file_suffixes("/tmp/archive.tar.zst", "", "", "", "zst");
+    # /tmp/archive....zst
+
+Throws an exception if C<$path> or any of the C<@suffixes> is C<undef>, or
+if any suffix contains a path separator (C</>) or a C<.>.
+
+=cut
+
+sub path_with_file_suffixes : prototype($@) {
+    my ($path, @suffixes) = @_;
+
+    croak "\$path is undef" if !defined($path);
+    croak "one of the provided suffixes is undef"
+	if any { !defined($_) } @suffixes;
+    croak "one of the provided suffixes contains a path separator"
+	if any { $_ =~ m|/| } @suffixes;
+    croak "one of the provided suffixes contains a dot"
+	if any { $_ =~ m|\.| } @suffixes;
+
+    my $file_name = path_file_name($path);
+    return undef if !defined($file_name);
+
+    my $parent = path_parent($path);
+
+    # sanity check -- should not happen because we checked for file name,
+    # and the existence of a file name implies there's a parent
+    confess "parent of \$path is undef" if !defined($parent);
+
+    my ($prefix, $suffix_str) = _path_file_prefix_suffix_str($file_name);
+
+    # Don't modify $path if there are no suffixes to be removed
+    return $path if !scalar(@suffixes) && $suffix_str eq '';
+
+    # sanity check
+    confess "\$prefix is undef" if !defined($prefix);
+
+    my $new_file_name = join(".", $prefix, @suffixes);
+
+    return path_push($parent, $new_file_name);
+}
+
+=head3 path_file_suffix($path)
+
+Returns the suffix of the C<$path>'s file name. If the C<$path> does not have a
+valid file name or if the file name has no suffix, C<undef> is returned
+instead.
+
+The suffix of a file name is essentially its extension, e.g.
+C</etc/resolv.conf> has the suffix C<conf>. If there are multiple suffixes,
+only the last will be returned; e.g. C</tmp/archive.tar.gz> has the suffix C<gz>.
+
+B<Note:> Files like e.g. C</tmp/foo.> have an empty string C<""> as suffix.
+
+    my $suffix = path_file_suffix("/etc/resolv.conf");
+    # "conf"
+
+    my $suffix = path_file_suffix("/tmp/archive.tar.zst");
+    # "zst"
+
+    my $suffix = path_file_suffix("/home/alice/.zshrc");
+    # undef
+
+For getting all suffixes of a path, see C<L<< path_file_suffixes()|/"path_file_suffixes($path)" >>>.
+
+Throws an exception if C<$path> is C<undef>.
+
+=cut
+
+sub path_file_suffix : prototype($) {
+    my ($path) = @_;
+
+    croak "\$path is undef" if !defined($path);
+
+    my $file_name = path_file_name($path);
+    return undef if !defined($file_name);
+
+    my ($_prefix, $suffix_str) = _path_file_prefix_suffix_str($file_name);
+    my @suffixes = _path_file_suffixes_from_str($suffix_str);
+
+    return pop(@suffixes);
+}
+
+=head3 path_with_file_suffix($path, $suffix)
+
+Returns C<$path> with a new C<$suffix>. This is similar to
+C<L<< path_with_file_suffixes()|/"path_with_file_suffixes($path, @suffixes)" >>>,
+except that only the last suffix of the file name is replaced.
+
+If the C<$path> does not have a file name, C<undef> is returned.
+
+    my $new_path = path_with_file_suffix("/tmp/archive.tar.zst", "gz");
+    # /tmp/archive.tar.gz
+
+If the file name has no suffixes, the C<$suffix> is appended instead:
+
+    my $new_path = path_with_file_suffix("/etc/resolv", "conf");
+    # /etc/resolv.conf
+
+If C<$suffix> is C<undef>, the file name's (last) suffix will be removed (if
+there is one):
+
+    my $new_path = path_with_file_suffix("/tmp/archive.tar.zst", undef);
+    # /tmp/archive.tar
+
+    my $new_path = path_with_file_suffix("/etc/resolv", undef);
+    # /etc/resolv
+
+Note that an empty string is still a valid suffix (an "empty" file ending):
+
+    my $new_path = path_with_file_suffix("/tmp/archive.tar.zst", "");
+    # /tmp/archive.tar.
+
+    my $new_path = path_with_file_suffix("/etc/resolv", "");
+    # /etc/resolv.
+
+Throws an exception if C<$path> is C<undef>, or if C<$suffix> contains a path
+separator (C</>) or a C<.>.
+
+=cut
+
+sub path_with_file_suffix : prototype($$) {
+    my ($path, $suffix) = @_;
+
+    croak "\$path is undef" if !defined($path);
+
+    if (defined($suffix)) {
+	croak "\$suffix contains a path separator" if $suffix =~ m|/|;
+	croak "\$suffix contains a dot" if $suffix =~ m|\.|;
+    }
+
+    my $file_name = path_file_name($path);
+    return undef if !defined($file_name);
+
+    my $parent = path_parent($path);
+
+    # sanity check -- should not happen because we checked for file name,
+    # and the existence of a file name implies there's a parent
+    confess "parent of \$path is undef" if !defined($parent);
+
+    my ($prefix, $suffix_str) = _path_file_prefix_suffix_str($file_name);
+    my @suffixes = _path_file_suffixes_from_str($suffix_str);
+
+    # Don't modify $path if there is no suffix to be removed
+    return $path if !scalar(@suffixes) && !defined($suffix);
+
+    pop(@suffixes);
+    push(@suffixes, $suffix) if defined($suffix);
+
+    # sanity check
+    confess "\$prefix is undef" if !defined($prefix);
+
+    my $new_file_name = join(".", $prefix, @suffixes);
+
+    # Because the parent of "foo" is ".", return $new_file_name to stay consistent.
+    # Otherwise, we'd end up with a current path ref prepended ("./$new_file_name")
+    # (Done also in path_with_new_file_name)
+    if ($parent eq '.' && $path !~ m|/|) {
+	return $new_file_name;
+    }
+
+    return path_push($parent, $new_file_name);
+}
+
+=head3 path_file_parts($path)
+
+Returns the parts that constitute the file name (prefix and suffixes) of a
+C<$path> as a list. If the C<$path> does not have a valid file name, an empty
+list is returned instead.
+
+These parts are split in such a way that allows them to be C<join>ed together,
+resulting in the original file name of the given C<$path> again.
+
+    my @file_parts = path_file_parts("/etc/pve/firewall/cluster.fw");
+    # ("cluster", "fw")
+
+    # Parts can be joined to acquire the original file name again
+    my $file_name = join(".", @file_parts);
+
+    my @file_parts = path_file_parts("/tmp/archive.tar.gz");
+    # ("archive", "tar", "gz")
+
+    my @file_parts = path_file_parts("/home/alice/.zshrc.bak");
+    # (".zshrc", "bak")
+
+Throws an exception if C<$path> is C<undef>.
+
+=cut
+
+sub path_file_parts : prototype($) {
+    my ($path) = @_;
+
+    croak "\$path is undef" if !defined($path);
+
+    my $file_name = path_file_name($path);
+    if (!defined($file_name)) {
+	return ();
+    }
+
+    my ($prefix, $suffix_str) = _path_file_prefix_suffix_str($file_name);
+    my @suffixes = _path_file_suffixes_from_str($suffix_str);
+
+    return ($prefix, @suffixes);
+}
+
+=head3 path_starts_with($path, $other_path)
+
+Checks whether a C<$path> starts with the components of C<$other_path>.
+
+    my $starts_with = path_starts_with("/etc/pve/firewall/cluster.fw", "/etc/pve");
+    # 1
+
+Since the paths are compared by their components, it's not necessary to
+L<< normalize|"path_normalize($path)" >> them beforehand.
+
+Additionally, if both paths are empty paths (C<"">), C<$path> is considered to
+start with C<$other_path> and vice versa. If instead only one path is empty,
+neither starts with the other.
+
+Throws an exception if any of the arguments is C<undef>.
+
+=cut
+
+sub path_starts_with : prototype($$) {
+    my ($path, $other_path) = @_;
+
+    croak "\$path is undef" if !defined($path);
+    croak "\$other_path if undef" if !defined($other_path);
+
+    # Nothing starts with nothing
+    return 1 if ($path eq '' && $other_path eq '');
+
+    # Nothing cannot start with something
+    # Something cannot start with nothing
+    return if ($path eq '' || $other_path eq '');
+
+    my @components = path_components($path);
+    my @other_components = path_components($other_path);
+
+    my @pairs = zip_shortest(\@components, \@other_components);
+
+    # for my ($comp, $other_comp) (@pairs) is experimental
+    for my $pair (@pairs) {
+	my ($comp, $other_comp) = $pair->@*;
+
+	if ($comp ne $other_comp) {
+	    return;
+	}
+    }
+
+    return 1;
+}
+
+=head3 path_ends_with($path, $other_path)
+
+Checks whether a C<$path> ends with the components of C<$other_path>.
+
+    my $ends_with = path_ends_with("/etc/pve/firewall/cluster.fw", "firewall/cluster.fw");
+    # 1
+
+Since the paths are compared by their components, it's not necessary to
+L<< normalize|"path_normalize($path)" >> them beforehand.
+
+Additionally, if both paths are empty paths (C<"">), C<$path> is considered to
+end with C<$other_path> and vice versa. If instead only one path is empty,
+neither ends with the other.
+
+Throws an exception if any of the arguments is C<undef>.
+
+=cut
+
+sub path_ends_with : prototype($$) {
+    my ($path, $other_path) = @_;
+
+    croak "\$path is undef" if !defined($path);
+    croak "\$other_path if undef" if !defined($other_path);
+
+    # Nothing ends with nothing
+    return 1 if ($path eq '' && $other_path eq '');
+
+    # Nothing cannot end with something
+    # Something cannot end with nothing
+    return if ($path eq '' || $other_path eq '');
+
+    my @components_rev = reverse(path_components($path));
+    my @other_components_rev = reverse(path_components($other_path));
+
+    my @pairs_rev = zip_shortest(\@components_rev, \@other_components_rev);
+
+    # for my ($comp, $other_comp) (@pairs_rev) is experimental
+    for my $pair (@pairs_rev) {
+	my ($comp, $other_comp) = $pair->@*;
+
+	if ($comp ne $other_comp) {
+	    return;
+	}
+    }
+
+    return 1;
+}
+
+=head3 path_equals($path, $other_path)
+
+Checks whether C<$path> equals C<$other_path>. The paths are compared
+by their components, meaning that it's not necessary to
+L<< normalize|/"path_normalize($path)" >> them beforehand.
+
+If both of the paths are empty (C<"">), they're considered equal.
+If only one of the two paths is empty, they're not considered equal.
+
+=cut
+
+sub path_equals : prototype($$) {
+    my ($path, $other_path) = @_;
+
+    croak "\$path is undef" if !defined($path);
+    croak "\$other_path if undef" if !defined($other_path);
+
+    # Nothing is nothing
+    return 1 if ($path eq '' && $other_path eq '');
+
+    # Nothing is not something
+    # Something is not nothing
+    return if ($path eq '' || $other_path eq '');
+
+    my @components = path_components($path);
+    my @other_components = path_components($other_path);
+
+    return if scalar(@components) != scalar(@other_components);
+
+    my @pairs = zip_longest(\@components, \@other_components);
+
+    # for my ($comp, $other_comp) (@pairs_rev) is experimental
+    for my $pair (@pairs) {
+	my ($comp, $other_comp) = $pair->@*;
+
+	return if !defined($comp) || !defined($other_comp);
+
+	if ($comp ne $other_comp) {
+	    return;
+	}
+    }
+
+    return 1;
+}
+
+1;
--
2.39.5





More information about the pve-devel mailing list