[pbs-devel] [PATCH proxmox-backup] fix #4823: datastore: ignore vanished files when walking directory

Thomas Lamprecht t.lamprecht at proxmox.com
Fri Sep 8 11:36:17 CEST 2023

On 08/09/2023 09:41, Gabriel Goller wrote:
>> that's some deep indentation level.. not a must, but maybe you find
>> some good/simple way to refactor some of this to make it a bit less
>> crowded here (if, then in a separate patch please)
> I could check if `err` is an `io:Error`, thus returning early. Then calling `.unwrap()`
> to get the actual `io::Error` later on.

I re-checked the whole code and pushed a clean-up that reduces the
indentation level and also code lines. Can you please re-check on that
if to ensure I did not missed something by mistake?

Oh, and while checking this out I noticed something odd, i.e., that we
silence any error that isn't a io::Error one,  as of Walkdir::Error
rustdocs [0] that can only happen if there's a (symlink) loop, but
still. So I checked the history and it seems that I caused this way
back in 2020 by the original addition of the "ignore lost+found" EPERM
commit [1].
But just making that an error again now might be a breaking change,
depending on what effect symlink loops currently actually have on GC.
Can you test this please? If it does makes GC endlessly loop then we
can just introduce the error again, as it's now broken already anyway.
But if walkdir breaks the loop, or the like, then we might just make
that an log with warning level. As we do not enable WalkDir's opt-in
follow_symlinks it might not matter after all as the Loop error
shouldn't be triggered if symlinks aren't followed, but still good
to test and be certain.

[0]: https://rust.velas.com/walkdir/struct.Error.html#method.io_error
[1]: c3b090ac ("backup: list images: handle walkdir error, catch "lost+found"")

>>> +                            // only allow to skip ext4 fsck directory, avoid GC if, for example,
>>> +                            // a user got file permissions wrong on datastore rsync to new server
>>> +                            if err.depth() > 1 || !path.ends_with("lost+found") {
>>> +                                bail!("cannot continue garbage-collection safely, permission denied on: {:?}", path)
>>> +                            }
>>> +                        }
>>> +                        io::ErrorKind::NotFound => {
>>> +                            // ignore vanished file
>> would be still good to log that here, at least at debug level
>> if it can be noisy; but as there wasn't many that run into this
>> in the four years of PBS existing I'd guess a always visible
>> level is fine as long as the log message doesn't sounds scary.
> How about a "ignoring/skipping vanished file: {path}" on the info log level?

Yeah, something like that, but I'd use just one of "ignoring" or

Oh, and I would also mention in the commit message that backups that
are just being created (where a atomic rename from e.g. a tmpdir to
the actual one might also cause a ENOENT) are not problematic here,
as GC is handling those explicitly anyway.

More information about the pbs-devel mailing list