[pbs-devel] [PATCH proxmox-backup 4/4] fix #5853: client: pxar: exclude stale files on metadata read

Fabian Grünbichler f.gruenbichler at proxmox.com
Wed Nov 13 14:55:46 CET 2024


On November 13, 2024 2:45 pm, Christian Ebner wrote:
> On 11/11/24 14:37, Fabian Grünbichler wrote:
>> behaviour wise this seems okay to me, but if possible, I'd avoid all the
>> return value tuples, see detailed comments below..
> 
> Agreed, I am not a fan of passing the stale file handle error info along 
> as a boolean in the return value either.
> 
> But unfortunately, passing along the error without losing pre-existing 
> error context and switching all the `get_metadata` related functions to 
> return an `Errno` is not possible as is.
> 
> E.g. `process_acl` returns an `anyhow::Error` (could be defined to 
> return e.g. an `Errno::EINVAL` instead?), special handling of 
> `Errno::E2BIG` for the xattr case only, ...
> 
> The current approach was chosen to keep the current anyhow error 
> context close to the actual errors when they occur.

I think that should be solvable with some refactoring/defining of a
proper error type..
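
To make that a bit more concrete, here is a rough sketch of what I mean.
It is std-only and not taken from the tree: `MetadataError`, `classify`
and `handle_entry` are hypothetical names, and the ESTALE value is the
Linux one, hardcoded to keep the example free of dependencies. The idea
is that the helpers keep their context for hard errors, while the stale
case stays distinguishable at the call site:

```rust
use std::fmt;

/// Linux errno value for ESTALE (assumption: hardcoded to avoid a libc/nix dependency).
const ESTALE: i32 = 116;

/// Hypothetical error type for the metadata helpers.
#[derive(Debug)]
enum MetadataError {
    /// Soft error: the file handle went stale, the entry can be skipped.
    Stale,
    /// Hard error, carrying the context message of the failing helper.
    Other(String),
}

impl fmt::Display for MetadataError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            MetadataError::Stale => write!(f, "stale file handle"),
            MetadataError::Other(msg) => write!(f, "{msg}"),
        }
    }
}

impl std::error::Error for MetadataError {}

/// Map a raw errno to the error type, attaching context only to hard errors.
fn classify(raw_errno: i32, context: &str) -> MetadataError {
    if raw_errno == ESTALE {
        MetadataError::Stale
    } else {
        MetadataError::Other(format!("{context}: errno {raw_errno}"))
    }
}

/// Call-site handling: returns Ok(true) if the entry should be archived,
/// Ok(false) if it is skipped because of a stale handle, Err for hard errors.
fn handle_entry(result: Result<(), MetadataError>) -> Result<bool, MetadataError> {
    match result {
        Ok(()) => Ok(true),
        Err(MetadataError::Stale) => Ok(false),
        Err(err) => Err(err),
    }
}
```

The source dir and test call sites would then simply propagate every
variant with `?`, which gives the hard error behaviour for free.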

but you can also attempt to downcast the anyhow error to get the ESTALE?
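
For the downcast variant, a minimal std-only sketch of the idea follows.
It uses `dyn Error` and `std::io::Error` as stand-ins, `WithContext` and
`is_stale` are hypothetical names, and the ESTALE value is the Linux one,
hardcoded. With anyhow and nix, `err.downcast_ref::<Errno>()` should be
able to play the same role, but I have not tested that against the
current code:

```rust
use std::error::Error;
use std::fmt;
use std::io;

/// Linux errno value for ESTALE (assumption: hardcoded to avoid a libc/nix dependency).
const ESTALE: i32 = 116;

/// Hypothetical wrapper that adds a context message around an OS error.
#[derive(Debug)]
struct WithContext {
    context: &'static str,
    source: io::Error,
}

impl fmt::Display for WithContext {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{}", self.context)
    }
}

impl Error for WithContext {
    fn source(&self) -> Option<&(dyn Error + 'static)> {
        Some(&self.source)
    }
}

/// Walk the error chain and report whether any link is an ESTALE OS error.
fn is_stale(err: &(dyn Error + 'static)) -> bool {
    let mut current = Some(err);
    while let Some(e) = current {
        if let Some(io_err) = e.downcast_ref::<io::Error>() {
            if io_err.raw_os_error() == Some(ESTALE) {
                return true;
            }
        }
        current = e.source();
    }
    false
}
```

That way the context stays attached to the error, and only the call site
that adds an entry needs to check for staleness before deciding whether
to warn and skip or to bail out.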

>> On November 5, 2024 3:01 pm, Christian Ebner wrote:
>>> Skip and warn the user for files which returned a stale file handle
>>> error while reading the metadata associated to that file.
>>>
>>> Instead of returning with an error when getting the metadata, return
>>> a boolean flag signaling if a stale file handle has been encountered.
>>>
>>> Link to issue in bugtracker:
>>> https://bugzilla.proxmox.com/show_bug.cgi?id=5853
>>>
>>> Link to thread in community forum:
>>> https://forum.proxmox.com/threads/156822/
>>>
>>> Signed-off-by: Christian Ebner <c.ebner at proxmox.com>
>>> ---
>>>   pbs-client/src/pxar/create.rs | 100 ++++++++++++++++++++++------------
>>>   1 file changed, 66 insertions(+), 34 deletions(-)
>>>
>>> diff --git a/pbs-client/src/pxar/create.rs b/pbs-client/src/pxar/create.rs
>>> index 2a844922c..85be00db4 100644
>>> --- a/pbs-client/src/pxar/create.rs
>>> +++ b/pbs-client/src/pxar/create.rs
>>> @@ -228,7 +228,7 @@ where
>>>       let mut fs_feature_flags = Flags::from_magic(fs_magic);
>>>   
>>>       let stat = nix::sys::stat::fstat(source_dir.as_raw_fd())?;
>>> -    let metadata = get_metadata(
>>> +    let (metadata, stale_fd) = get_metadata(
>> 
>> stale_fd here is not used at all..
> 
> Yes, that one should lead to a hard error as you mentioned, so should be 
> handled accordingly. I will adapt this to be a hard error.
> 
>> 
>>>           source_dir.as_raw_fd(),
>>>           &stat,
>>>           feature_flags & fs_feature_flags,
>>> @@ -744,7 +744,7 @@ impl Archiver {
>>>               return Ok(());
>>>           }
>>>   
>>> -        let metadata = get_metadata(
>>> +        let (metadata, stale_fd) = get_metadata(
>> 
>> this one is used
>> 
>>>               fd.as_raw_fd(),
>>>               stat,
>>>               self.flags(),
>>> @@ -753,6 +753,11 @@ impl Archiver {
>>>               self.skip_e2big_xattr,
>>>           )?;
>>>   
>>> +        if stale_fd {
>>> +            log::warn!("Stale filehandle encountered, skip {:?}", self.path);
>>> +            return Ok(());
>>> +        }
>> 
>> for this warning.. but get_metadata already logs (potentially multiple
>> times ;)) that things are incomplete cause of the stale filehandle, this
>> only adds the path context..
> 
> but here there is also an early return, not just the log... this skips 
> over adding this entry, and any sub entries if the entry is a directory.
> 
> The logging could however be moved to the get_metadata call and only be 
> logged once, agreed.
> 
>> 
>>> +
>>>           if self.previous_payload_index.is_none() {
>>>               return self
>>>                   .add_entry_to_archive(encoder, &mut None, c_file_name, stat, fd, &metadata, None)
>>> @@ -1301,7 +1306,14 @@ impl Archiver {
>>>           file_name: &Path,
>>>           metadata: &Metadata,
>>>       ) -> Result<(), Error> {
>>> -        let dest = nix::fcntl::readlinkat(fd.as_raw_fd(), &b""[..])?;
>>> +        let dest = match nix::fcntl::readlinkat(fd.as_raw_fd(), &b""[..]) {
>>> +            Ok(dest) => dest,
>>> +            Err(Errno::ESTALE) => {
>>> +                log::warn!("Stale file handle encountered, skip {file_name:?}");
>>> +                return Ok(());
>>> +            }
>>> +            Err(err) => return Err(err.into()),
>>> +        };
>>>           encoder.add_symlink(metadata, file_name, dest).await?;
>>>           Ok(())
>>>       }
>>> @@ -1397,9 +1409,10 @@ fn get_metadata(
>>>       fs_magic: i64,
>>>       fs_feature_flags: &mut Flags,
>>>       skip_e2big_xattr: bool,
>>> -) -> Result<Metadata, Error> {
>>> +) -> Result<(Metadata, bool), Error> {
>>>       // required for some of these
>>>       let proc_path = Path::new("/proc/self/fd/").join(fd.to_string());
>>> +    let mut stale_fd = false;
>>>   
>>>       let mut meta = Metadata {
>>>           stat: pxar::Stat {
>>> @@ -1412,18 +1425,27 @@ fn get_metadata(
>>>           ..Default::default()
>>>       };
>>>   
>>> -    get_xattr_fcaps_acl(
>>> +    if get_xattr_fcaps_acl(
>> 
>> only call site, could just bubble up ESTALE
> 
> As mentioned, this has two issues: loss of the anyhow error context 
> telling in which sub-function the Errno occurred, and sub-functions like 
> `process_acl`, which do not rely on FFI calls at all and return a plain 
> `anyhow::Error` (which, granted, could be redefined to return an Errno).
> 
>> 
>>>           &mut meta,
>>>           fd,
>>>           &proc_path,
>>>           flags,
>>>           fs_feature_flags,
>>>           skip_e2big_xattr,
>>> -    )?;
>>> -    get_chattr(&mut meta, fd)?;
>>> +    )? {
>>> +        stale_fd = true;
>>> +        log::warn!("Stale filehandle, xattrs incomplete");
>>> +    }
>>> +    if get_chattr(&mut meta, fd)? {
>> 
>> same
>> 
>>> +        stale_fd = true;
>>> +        log::warn!("Stale filehandle, chattr incomplete");
>>> +    }
>>>       get_fat_attr(&mut meta, fd, fs_magic)?;
>>> -    get_quota_project_id(&mut meta, fd, flags, fs_magic)?;
>>> -    Ok(meta)
>>> +    if get_quota_project_id(&mut meta, fd, flags, fs_magic)? {
>> 
>> same
>> 
>>> +        stale_fd = true;
>>> +        log::warn!("Stale filehandle, quota  project id incomplete");
>>> +    }
>> 
>> see above and way down below, IMHO all of these could just bubble up the error..
>> 
>>> +    Ok((meta, stale_fd))
>>>   }
>>>   
>>>   fn get_fcaps(
>>> @@ -1431,22 +1453,23 @@ fn get_fcaps(
>>>       fd: RawFd,
>>>       flags: Flags,
>>>       fs_feature_flags: &mut Flags,
>>> -) -> Result<(), Error> {
>>> +) -> Result<bool, Error> {
>> 
>> this is only called by get_xattr_fcaps_acl, so could just bubble up
>> ESTALE as well..
>> 
>>>       if !flags.contains(Flags::WITH_FCAPS) {
>>> -        return Ok(());
>>> +        return Ok(false);
>>>       }
>>>   
>>>       match xattr::fgetxattr(fd, xattr::XATTR_NAME_FCAPS) {
>>>           Ok(data) => {
>>>               meta.fcaps = Some(pxar::format::FCaps { data });
>>> -            Ok(())
>>> +            Ok(false)
>>>           }
>>> -        Err(Errno::ENODATA) => Ok(()),
>>> +        Err(Errno::ENODATA) => Ok(false),
>>>           Err(Errno::EOPNOTSUPP) => {
>>>               fs_feature_flags.remove(Flags::WITH_FCAPS);
>>> -            Ok(())
>>> +            Ok(false)
>>>           }
>>> -        Err(Errno::EBADF) => Ok(()), // symlinks
>>> +        Err(Errno::EBADF) => Ok(false), // symlinks
>>> +        Err(Errno::ESTALE) => Ok(true),
>>>           Err(err) => Err(err).context("failed to read file capabilities"),
>>>       }
>>>   }
>>> @@ -1458,32 +1481,35 @@ fn get_xattr_fcaps_acl(
>>>       flags: Flags,
>>>       fs_feature_flags: &mut Flags,
>>>       skip_e2big_xattr: bool,
>>> -) -> Result<(), Error> {
>>> +) -> Result<bool, Error> {
>>>       if !flags.contains(Flags::WITH_XATTRS) {
>>> -        return Ok(());
>>> +        return Ok(false);
>>>       }
>>>   
>>>       let xattrs = match xattr::flistxattr(fd) {
>>>           Ok(names) => names,
>>>           Err(Errno::EOPNOTSUPP) => {
>>>               fs_feature_flags.remove(Flags::WITH_XATTRS);
>>> -            return Ok(());
>>> +            return Ok(false);
>>>           }
>>>           Err(Errno::E2BIG) => {
>>>               match skip_e2big_xattr {
>>> -                true => return Ok(()),
>>> +                true => return Ok(false),
>>>                   false => {
>>>                       bail!("{} (try --skip-e2big-xattr)", Errno::E2BIG.to_string());
>>>                   }
>>>               };
>>>           }
>>> -        Err(Errno::EBADF) => return Ok(()), // symlinks
>>> +        Err(Errno::EBADF) => return Ok(false), // symlinks
>>> +        Err(Errno::ESTALE) => return Ok(true),
>> 
>> see above
>> 
>>>           Err(err) => return Err(err).context("failed to read xattrs"),
>>>       };
>>>   
>>>       for attr in &xattrs {
>>>           if xattr::is_security_capability(attr) {
>>> -            get_fcaps(meta, fd, flags, fs_feature_flags)?;
>>> +            if get_fcaps(meta, fd, flags, fs_feature_flags)? {
>>> +                return Ok(true);
>> 
>> see above
>> 
>>> +            }
>>>               continue;
>>>           }
>>>   
>>> @@ -1505,35 +1531,37 @@ fn get_xattr_fcaps_acl(
>>>               Err(Errno::EBADF) => (),   // symlinks, shouldn't be able to reach this either
>>>               Err(Errno::E2BIG) => {
>>>                   match skip_e2big_xattr {
>>> -                    true => return Ok(()),
>>> +                    true => return Ok(false),
>>>                       false => {
>>>                           bail!("{} (try --skip-e2big-xattr)", Errno::E2BIG.to_string());
>>>                       }
>>>                   };
>>>               }
>>> +            Err(Errno::ESTALE) => return Ok(true), // symlinks
>> 
>> same here (and stray copy-paste comment I guess?)
>> 
>>>               Err(err) => {
>>>                   return Err(err).context(format!("error reading extended attribute {attr:?}"))
>>>               }
>>>           }
>>>       }
>>>   
>>> -    Ok(())
>>> +    Ok(false)
>>>   }
>>>   
>>> -fn get_chattr(metadata: &mut Metadata, fd: RawFd) -> Result<(), Error> {
>>> +fn get_chattr(metadata: &mut Metadata, fd: RawFd) -> Result<bool, Error> {
>>>       let mut attr: libc::c_long = 0;
>>>   
>>>       match unsafe { fs::read_attr_fd(fd, &mut attr) } {
>>>           Ok(_) => (),
>>> +        Err(Errno::ESTALE) => return Ok(true),
>>>           Err(errno) if errno_is_unsupported(errno) => {
>>> -            return Ok(());
>>> +            return Ok(false);
>>>           }
>>>           Err(err) => return Err(err).context("failed to read file attributes"),
>>>       }
>>>   
>>>       metadata.stat.flags |= Flags::from_chattr(attr).bits();
>>>   
>>> -    Ok(())
>>> +    Ok(false)
>>>   }
>>>   
>>>   fn get_fat_attr(metadata: &mut Metadata, fd: RawFd, fs_magic: i64) -> Result<(), Error> {
>>> @@ -1564,30 +1592,34 @@ fn get_quota_project_id(
>>>       fd: RawFd,
>>>       flags: Flags,
>>>       magic: i64,
>>> -) -> Result<(), Error> {
>>> +) -> Result<bool, Error> {
>> 
>> see above
>> 
>>>       if !(metadata.is_dir() || metadata.is_regular_file()) {
>>> -        return Ok(());
>>> +        return Ok(false);
>>>       }
>>>   
>>>       if !flags.contains(Flags::WITH_QUOTA_PROJID) {
>>> -        return Ok(());
>>> +        return Ok(false);
>>>       }
>>>   
>>>       use proxmox_sys::linux::magic::*;
>>>   
>>>       match magic {
>>>           EXT4_SUPER_MAGIC | XFS_SUPER_MAGIC | FUSE_SUPER_MAGIC | ZFS_SUPER_MAGIC => (),
>>> -        _ => return Ok(()),
>>> +        _ => return Ok(false),
>>>       }
>>>   
>>>       let mut fsxattr = fs::FSXAttr::default();
>>>       let res = unsafe { fs::fs_ioc_fsgetxattr(fd, &mut fsxattr) };
>>>   
>>> +    if let Err(Errno::ESTALE) = res {
>>> +        return Ok(true);
>>> +    }
>>> +
>>>       // On some FUSE filesystems it can happen that ioctl is not supported.
>>>       // For these cases projid is set to 0 while the error is ignored.
>>>       if let Err(errno) = res {
>>>           if errno_is_unsupported(errno) {
>>> -            return Ok(());
>>> +            return Ok(false);
>>>           } else {
>>>               return Err(errno).context("error while reading quota project id");
>>>           }
>>> @@ -1597,7 +1629,7 @@ fn get_quota_project_id(
>>>       if projid != 0 {
>>>           metadata.quota_project_id = Some(pxar::format::QuotaProjectId { projid });
>>>       }
>>> -    Ok(())
>>> +    Ok(false)
>>>   }
>>>   
>>>   fn get_acl(
>>> @@ -1840,7 +1872,7 @@ mod tests {
>>>           let fs_magic = detect_fs_type(dir.as_raw_fd()).unwrap();
>>>           let stat = nix::sys::stat::fstat(dir.as_raw_fd()).unwrap();
>>>           let mut fs_feature_flags = Flags::from_magic(fs_magic);
>>> -        let metadata = get_metadata(
>>> +        let (metadata, _) = get_metadata(
>> 
>> no use of the new return value
>> 
>>>               dir.as_raw_fd(),
>>>               &stat,
>>>               fs_feature_flags,
>>> @@ -1937,7 +1969,7 @@ mod tests {
>>>               let stat = nix::sys::stat::fstat(source_dir.as_raw_fd()).unwrap();
>>>               let mut fs_feature_flags = Flags::from_magic(fs_magic);
>>>   
>>> -            let metadata = get_metadata(
>>> +            let (metadata, _) = get_metadata(
>> 
>> no use either.. so wouldn't it make more sense to pass in a path and log
>> the context right in get_metadata? or treat the stale FD as an error,
>> and add the context/path as part of error handling?
> 
> The first approach seems better, but it will not help to differentiate 
> the (hard) errors from the soft error ESTALE, which requires 
> conditionally skipping over entries at the `get_metadata` call site.
> 
> Returning the stale file handle error as `anyhow::Error` also does not 
> allow distinguishing it from other (hard) errors, so again it cannot be 
> handled as a soft error at the call site.
> 
> And returning all errors as `Errno` has the loss of error context issue 
> as described above.
> 
> I will see if I can cover this better by refactoring the code, as most 
> of the helpers have a single call site, so it should be possible to 
> reorganize without many side effects.
> 
>> 
>> the four call sites are:
>> - two related to tests, we can probably treat ESTALE as hard error there
>> - the one for obtaining the metadata of the source dir of the archive,
>>    if that is stale we can't create an archive -> hard error as well
>> - adding an entry: for the stale case, we already log a warning and
>>    proceed with the next entry, so we don't benefit from the fact that
>>    (incomplete) metadata and the staleness is returned, as opposed to
>>    just treating ESTALE as an error that we can "catch" and handle..
>> 
>>>                   source_dir.as_raw_fd(),
>>>                   &stat,
>>>                   fs_feature_flags,
>>> -- 
>>> 2.39.5
>>>
>>>
>>>
>>> _______________________________________________
>>> pbs-devel mailing list
>>> pbs-devel at lists.proxmox.com
>>> https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel