[pbs-devel] [PATCH proxmox-backup 4/4] fix #5853: client: pxar: exclude stale files on metadata read

Christian Ebner c.ebner at proxmox.com
Wed Nov 13 15:04:47 CET 2024


On 11/13/24 14:55, Fabian Grünbichler wrote:
> On November 13, 2024 2:45 pm, Christian Ebner wrote:
>> On 11/11/24 14:37, Fabian Grünbichler wrote:
>>> behaviour wise this seems okay to me, but if possible, I'd avoid all the
>>> return value tuples, see detailed comments below..
>>
>> Agreed, I am not a fan of passing the stale file handle error info along
>> a the boolean in the return value as well.
>>
>> But unfortunately passing along the error without loosing pre-existing
>> error context and switching all the `get_metadata` related functions to
>> return an `Errno` is not possible as is.
>>
>> E.g. `process_acl` returns an `anyhow::Error` (could be defined to
>> return an e.g. `Errno::EINVALID` instead?), special handling of
>> `Errno::E2BIG` for the xattr case only, ...
>>
>> The current approach was choosen to keep the current anyhow error
>> context close to the actual errors when they occur.
> 
> I think that should be solvable with some refactoring/defining of a
> proper error type..
> 
> but you can also attempt to downcast the anyhow error to get the ESTALE?

Ah true, thanks for the pointer!

I will send a new version of the patches incorporating your feedback.

> 
>>    > On November 5, 2024 3:01 pm, Christian Ebner wrote:
>>>> Skip and warn the user for files which returned a stale file handle
>>>> error while reading the metadata associated to that file.
>>>>
>>>> Instead of returning with an error when getting the metadata, return
>>>> a boolean flag signaling if a stale file handle has been encountered.
>>>>
>>>> Link to issue in bugtracker:
>>>> https://bugzilla.proxmox.com/show_bug.cgi?id=5853
>>>>
>>>> Link to thread in community forum:
>>>> https://forum.proxmox.com/threads/156822/
>>>>
>>>> Signed-off-by: Christian Ebner <c.ebner at proxmox.com>
>>>> ---
>>>>    pbs-client/src/pxar/create.rs | 100 ++++++++++++++++++++++------------
>>>>    1 file changed, 66 insertions(+), 34 deletions(-)
>>>>
>>>> diff --git a/pbs-client/src/pxar/create.rs b/pbs-client/src/pxar/create.rs
>>>> index 2a844922c..85be00db4 100644
>>>> --- a/pbs-client/src/pxar/create.rs
>>>> +++ b/pbs-client/src/pxar/create.rs
>>>> @@ -228,7 +228,7 @@ where
>>>>        let mut fs_feature_flags = Flags::from_magic(fs_magic);
>>>>    
>>>>        let stat = nix::sys::stat::fstat(source_dir.as_raw_fd())?;
>>>> -    let metadata = get_metadata(
>>>> +    let (metadata, stale_fd) = get_metadata(
>>>
>>> stale_fd here is not used at all..
>>
>> Yes, that one should lead to a hard error as you mentioned, so should be
>> handled accordingly. I will adapt this to be a hard error.
>>
>>>
>>>>            source_dir.as_raw_fd(),
>>>>            &stat,
>>>>            feature_flags & fs_feature_flags,
>>>> @@ -744,7 +744,7 @@ impl Archiver {
>>>>                return Ok(());
>>>>            }
>>>>    
>>>> -        let metadata = get_metadata(
>>>> +        let (metadata, stale_fd) = get_metadata(
>>>
>>> this one is used
>>>
>>>>                fd.as_raw_fd(),
>>>>                stat,
>>>>                self.flags(),
>>>> @@ -753,6 +753,11 @@ impl Archiver {
>>>>                self.skip_e2big_xattr,
>>>>            )?;
>>>>    
>>>> +        if stale_fd {
>>>> +            log::warn!("Stale filehandle encountered, skip {:?}", self.path);
>>>> +            return Ok(());
>>>> +        }
>>>
>>> for this warning.. but get_metadata already logs (potentially multiple
>>> times ;)) that things are incomplete cause of the stale filehandle, this
>>> only adds the path context..
>>
>> but here there is also an early return, not just the log... this skips
>> over adding this entry, and any sub entries if the entry is a directory.
>>
>> The logging could however be moved to the get_metadata call and only be
>> logged once, agreed.
>>
>>>
>>>> +
>>>>            if self.previous_payload_index.is_none() {
>>>>                return self
>>>>                    .add_entry_to_archive(encoder, &mut None, c_file_name, stat, fd, &metadata, None)
>>>> @@ -1301,7 +1306,14 @@ impl Archiver {
>>>>            file_name: &Path,
>>>>            metadata: &Metadata,
>>>>        ) -> Result<(), Error> {
>>>> -        let dest = nix::fcntl::readlinkat(fd.as_raw_fd(), &b""[..])?;
>>>> +        let dest = match nix::fcntl::readlinkat(fd.as_raw_fd(), &b""[..]) {
>>>> +            Ok(dest) => dest,
>>>> +            Err(Errno::ESTALE) => {
>>>> +                log::warn!("Stale file handle encountered, skip {file_name:?}");
>>>> +                return Ok(());
>>>> +            }
>>>> +            Err(err) => return Err(err.into()),
>>>> +        };
>>>>            encoder.add_symlink(metadata, file_name, dest).await?;
>>>>            Ok(())
>>>>        }
>>>> @@ -1397,9 +1409,10 @@ fn get_metadata(
>>>>        fs_magic: i64,
>>>>        fs_feature_flags: &mut Flags,
>>>>        skip_e2big_xattr: bool,
>>>> -) -> Result<Metadata, Error> {
>>>> +) -> Result<(Metadata, bool), Error> {
>>>>        // required for some of these
>>>>        let proc_path = Path::new("/proc/self/fd/").join(fd.to_string());
>>>> +    let mut stale_fd = false;
>>>>    
>>>>        let mut meta = Metadata {
>>>>            stat: pxar::Stat {
>>>> @@ -1412,18 +1425,27 @@ fn get_metadata(
>>>>            ..Default::default()
>>>>        };
>>>>    
>>>> -    get_xattr_fcaps_acl(
>>>> +    if get_xattr_fcaps_acl(
>>>
>>> only call site, could just bubble up ESTALE
>>
>> As mentioned, this has 2 issues: Loss of anyhow error context for which
>> sub-function the Errno occurred and sub-functions like `process_acl`
>> which do not rely on ffi calls at all, returning plain `anyhow::Error`,
>> which granted could be redefined to return an Errno.
>>
>>>
>>>>            &mut meta,
>>>>            fd,
>>>>            &proc_path,
>>>>            flags,
>>>>            fs_feature_flags,
>>>>            skip_e2big_xattr,
>>>> -    )?;
>>>> -    get_chattr(&mut meta, fd)?;
>>>> +    )? {
>>>> +        stale_fd = true;
>>>> +        log::warn!("Stale filehandle, xattrs incomplete");
>>>> +    }
>>>> +    if get_chattr(&mut meta, fd)? {
>>>
>>> same
>>>
>>>> +        stale_fd = true;
>>>> +        log::warn!("Stale filehandle, chattr incomplete");
>>>> +    }
>>>>        get_fat_attr(&mut meta, fd, fs_magic)?;
>>>> -    get_quota_project_id(&mut meta, fd, flags, fs_magic)?;
>>>> -    Ok(meta)
>>>> +    if get_quota_project_id(&mut meta, fd, flags, fs_magic)? {
>>>
>>> same
>>>
>>>> +        stale_fd = true;
>>>> +        log::warn!("Stale filehandle, quota  project id incomplete");
>>>> +    }
>>>
>>> see above and way down below, IMHO all of these could just bubble up the error..
>>>
>>>> +    Ok((meta, stale_fd))
>>>>    }
>>>>    
>>>>    fn get_fcaps(
>>>> @@ -1431,22 +1453,23 @@ fn get_fcaps(
>>>>        fd: RawFd,
>>>>        flags: Flags,
>>>>        fs_feature_flags: &mut Flags,
>>>> -) -> Result<(), Error> {
>>>> +) -> Result<bool, Error> {
>>>
>>> this is only called by get_xattr_fcaps_acl, so could just bubble up
>>> ESTALE as well..
>>>
>>>>        if !flags.contains(Flags::WITH_FCAPS) {
>>>> -        return Ok(());
>>>> +        return Ok(false);
>>>>        }
>>>>    
>>>>        match xattr::fgetxattr(fd, xattr::XATTR_NAME_FCAPS) {
>>>>            Ok(data) => {
>>>>                meta.fcaps = Some(pxar::format::FCaps { data });
>>>> -            Ok(())
>>>> +            Ok(false)
>>>>            }
>>>> -        Err(Errno::ENODATA) => Ok(()),
>>>> +        Err(Errno::ENODATA) => Ok(false),
>>>>            Err(Errno::EOPNOTSUPP) => {
>>>>                fs_feature_flags.remove(Flags::WITH_FCAPS);
>>>> -            Ok(())
>>>> +            Ok(false)
>>>>            }
>>>> -        Err(Errno::EBADF) => Ok(()), // symlinks
>>>> +        Err(Errno::EBADF) => Ok(false), // symlinks
>>>> +        Err(Errno::ESTALE) => Ok(true),
>>>>            Err(err) => Err(err).context("failed to read file capabilities"),
>>>>        }
>>>>    }
>>>> @@ -1458,32 +1481,35 @@ fn get_xattr_fcaps_acl(
>>>>        flags: Flags,
>>>>        fs_feature_flags: &mut Flags,
>>>>        skip_e2big_xattr: bool,
>>>> -) -> Result<(), Error> {
>>>> +) -> Result<bool, Error> {
>>>>        if !flags.contains(Flags::WITH_XATTRS) {
>>>> -        return Ok(());
>>>> +        return Ok(false);
>>>>        }
>>>>    
>>>>        let xattrs = match xattr::flistxattr(fd) {
>>>>            Ok(names) => names,
>>>>            Err(Errno::EOPNOTSUPP) => {
>>>>                fs_feature_flags.remove(Flags::WITH_XATTRS);
>>>> -            return Ok(());
>>>> +            return Ok(false);
>>>>            }
>>>>            Err(Errno::E2BIG) => {
>>>>                match skip_e2big_xattr {
>>>> -                true => return Ok(()),
>>>> +                true => return Ok(false),
>>>>                    false => {
>>>>                        bail!("{} (try --skip-e2big-xattr)", Errno::E2BIG.to_string());
>>>>                    }
>>>>                };
>>>>            }
>>>> -        Err(Errno::EBADF) => return Ok(()), // symlinks
>>>> +        Err(Errno::EBADF) => return Ok(false), // symlinks
>>>> +        Err(Errno::ESTALE) => return Ok(true),
>>>
>>> see above
>>>
>>>>            Err(err) => return Err(err).context("failed to read xattrs"),
>>>>        };
>>>>    
>>>>        for attr in &xattrs {
>>>>            if xattr::is_security_capability(attr) {
>>>> -            get_fcaps(meta, fd, flags, fs_feature_flags)?;
>>>> +            if get_fcaps(meta, fd, flags, fs_feature_flags)? {
>>>> +                return Ok(true);
>>>
>>> see above
>>>
>>>> +            }
>>>>                continue;
>>>>            }
>>>>    
>>>> @@ -1505,35 +1531,37 @@ fn get_xattr_fcaps_acl(
>>>>                Err(Errno::EBADF) => (),   // symlinks, shouldn't be able to reach this either
>>>>                Err(Errno::E2BIG) => {
>>>>                    match skip_e2big_xattr {
>>>> -                    true => return Ok(()),
>>>> +                    true => return Ok(false),
>>>>                        false => {
>>>>                            bail!("{} (try --skip-e2big-xattr)", Errno::E2BIG.to_string());
>>>>                        }
>>>>                    };
>>>>                }
>>>> +            Err(Errno::ESTALE) => return Ok(true), // symlinks
>>>
>>> same here (and stray copy-paste comment I guess?)
>>>
>>>>                Err(err) => {
>>>>                    return Err(err).context(format!("error reading extended attribute {attr:?}"))
>>>>                }
>>>>            }
>>>>        }
>>>>    
>>>> -    Ok(())
>>>> +    Ok(false)
>>>>    }
>>>>    
>>>> -fn get_chattr(metadata: &mut Metadata, fd: RawFd) -> Result<(), Error> {
>>>> +fn get_chattr(metadata: &mut Metadata, fd: RawFd) -> Result<bool, Error> {
>>>>        let mut attr: libc::c_long = 0;
>>>>    
>>>>        match unsafe { fs::read_attr_fd(fd, &mut attr) } {
>>>>            Ok(_) => (),
>>>> +        Err(Errno::ESTALE) => return Ok(true),
>>>>            Err(errno) if errno_is_unsupported(errno) => {
>>>> -            return Ok(());
>>>> +            return Ok(false);
>>>>            }
>>>>            Err(err) => return Err(err).context("failed to read file attributes"),
>>>>        }
>>>>    
>>>>        metadata.stat.flags |= Flags::from_chattr(attr).bits();
>>>>    
>>>> -    Ok(())
>>>> +    Ok(false)
>>>>    }
>>>>    
>>>>    fn get_fat_attr(metadata: &mut Metadata, fd: RawFd, fs_magic: i64) -> Result<(), Error> {
>>>> @@ -1564,30 +1592,34 @@ fn get_quota_project_id(
>>>>        fd: RawFd,
>>>>        flags: Flags,
>>>>        magic: i64,
>>>> -) -> Result<(), Error> {
>>>> +) -> Result<bool, Error> {
>>>
>>> see above
>>>
>>>>        if !(metadata.is_dir() || metadata.is_regular_file()) {
>>>> -        return Ok(());
>>>> +        return Ok(false);
>>>>        }
>>>>    
>>>>        if !flags.contains(Flags::WITH_QUOTA_PROJID) {
>>>> -        return Ok(());
>>>> +        return Ok(false);
>>>>        }
>>>>    
>>>>        use proxmox_sys::linux::magic::*;
>>>>    
>>>>        match magic {
>>>>            EXT4_SUPER_MAGIC | XFS_SUPER_MAGIC | FUSE_SUPER_MAGIC | ZFS_SUPER_MAGIC => (),
>>>> -        _ => return Ok(()),
>>>> +        _ => return Ok(false),
>>>>        }
>>>>    
>>>>        let mut fsxattr = fs::FSXAttr::default();
>>>>        let res = unsafe { fs::fs_ioc_fsgetxattr(fd, &mut fsxattr) };
>>>>    
>>>> +    if let Err(Errno::ESTALE) = res {
>>>> +        return Ok(true);
>>>> +    }
>>>> +
>>>>        // On some FUSE filesystems it can happen that ioctl is not supported.
>>>>        // For these cases projid is set to 0 while the error is ignored.
>>>>        if let Err(errno) = res {
>>>>            if errno_is_unsupported(errno) {
>>>> -            return Ok(());
>>>> +            return Ok(false);
>>>>            } else {
>>>>                return Err(errno).context("error while reading quota project id");
>>>>            }
>>>> @@ -1597,7 +1629,7 @@ fn get_quota_project_id(
>>>>        if projid != 0 {
>>>>            metadata.quota_project_id = Some(pxar::format::QuotaProjectId { projid });
>>>>        }
>>>> -    Ok(())
>>>> +    Ok(false)
>>>>    }
>>>>    
>>>>    fn get_acl(
>>>> @@ -1840,7 +1872,7 @@ mod tests {
>>>>            let fs_magic = detect_fs_type(dir.as_raw_fd()).unwrap();
>>>>            let stat = nix::sys::stat::fstat(dir.as_raw_fd()).unwrap();
>>>>            let mut fs_feature_flags = Flags::from_magic(fs_magic);
>>>> -        let metadata = get_metadata(
>>>> +        let (metadata, _) = get_metadata(
>>>
>>> no use of the new return value
>>>
>>>>                dir.as_raw_fd(),
>>>>                &stat,
>>>>                fs_feature_flags,
>>>> @@ -1937,7 +1969,7 @@ mod tests {
>>>>                let stat = nix::sys::stat::fstat(source_dir.as_raw_fd()).unwrap();
>>>>                let mut fs_feature_flags = Flags::from_magic(fs_magic);
>>>>    
>>>> -            let metadata = get_metadata(
>>>> +            let (metadata, _) = get_metadata(
>>>
>>> no use either.. so wouldn't it make more sense to pass in a path and log
>>> the context right in get_metadata? or treat the stale FD as an error,
>>> and add the context/path as part of error handling?
>>
>> The first approach seems better, will however not help to differentiate
>> the (hard) errors from the soft error ESTALE, which requires to skip
>> over entries at the `get_metadata` call side conditionally.
>>
>> Returning the stale file handle error as `Anyhow::Error` also does not
>> allow to distinguish from other (hard) errors, so again it cannot be
>> handled as soft error at the call site.
>>
>> And returning all errors as `Errno` has the loss of error context issue
>> as described above.
>>
>> I will see if I can cover this better by refactoring the code, as most
>> of the helpers have a single call side, so it should be possible to
>> reorganize without much side effects.
>>
>>>
>>> the four call sites are:
>>> - two related to tests, we can probably treat ESTALE as hard error there
>>> - the one for obtaining the metadata of the source dir of the archive,
>>>     if that is stale we can't create an archive -> hard error as well
>>> - adding an entry: for the stale case, we already log a warning and
>>>     proceed with the next entry, so we don't benefit from the fact that
>>>     (incomplete) metadata and the staleness is returned, as opposed to
>>>     just treating ESTALE as an error that we can "catch" and handle..
>>>
>>>>                    source_dir.as_raw_fd(),
>>>>                    &stat,
>>>>                    fs_feature_flags,
>>>> -- 
>>>> 2.39.5
>>>>
>>>>
>>>>
>>>> _______________________________________________
>>>> pbs-devel mailing list
>>>> pbs-devel at lists.proxmox.com
>>>> https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel
>>>>
>>>>
>>>>
>>>
>>>
>>> _______________________________________________
>>> pbs-devel mailing list
>>> pbs-devel at lists.proxmox.com
>>> https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel
>>>
>>>
>>
>>





More information about the pbs-devel mailing list