[pbs-devel] [PATCH proxmox-backup 4/4] fix #5853: client: pxar: exclude stale files on metadata read
Fabian Grünbichler
f.gruenbichler at proxmox.com
Wed Nov 13 14:55:46 CET 2024
On November 13, 2024 2:45 pm, Christian Ebner wrote:
> On 11/11/24 14:37, Fabian Grünbichler wrote:
>> behaviour wise this seems okay to me, but if possible, I'd avoid all the
>> return value tuples, see detailed comments below..
>
> Agreed, I am not a fan of passing the stale file handle error info along
> a the boolean in the return value as well.
>
> But unfortunately passing along the error without loosing pre-existing
> error context and switching all the `get_metadata` related functions to
> return an `Errno` is not possible as is.
>
> E.g. `process_acl` returns an `anyhow::Error` (could be defined to
> return an e.g. `Errno::EINVALID` instead?), special handling of
> `Errno::E2BIG` for the xattr case only, ...
>
> The current approach was choosen to keep the current anyhow error
> context close to the actual errors when they occur.
I think that should be solvable with some refactoring/defining of a
proper error type..
but you can also attempt to downcast the anyhow error to get the ESTALE?
> > On November 5, 2024 3:01 pm, Christian Ebner wrote:
>>> Skip and warn the user for files which returned a stale file handle
>>> error while reading the metadata associated to that file.
>>>
>>> Instead of returning with an error when getting the metadata, return
>>> a boolean flag signaling if a stale file handle has been encountered.
>>>
>>> Link to issue in bugtracker:
>>> https://bugzilla.proxmox.com/show_bug.cgi?id=5853
>>>
>>> Link to thread in community forum:
>>> https://forum.proxmox.com/threads/156822/
>>>
>>> Signed-off-by: Christian Ebner <c.ebner at proxmox.com>
>>> ---
>>> pbs-client/src/pxar/create.rs | 100 ++++++++++++++++++++++------------
>>> 1 file changed, 66 insertions(+), 34 deletions(-)
>>>
>>> diff --git a/pbs-client/src/pxar/create.rs b/pbs-client/src/pxar/create.rs
>>> index 2a844922c..85be00db4 100644
>>> --- a/pbs-client/src/pxar/create.rs
>>> +++ b/pbs-client/src/pxar/create.rs
>>> @@ -228,7 +228,7 @@ where
>>> let mut fs_feature_flags = Flags::from_magic(fs_magic);
>>>
>>> let stat = nix::sys::stat::fstat(source_dir.as_raw_fd())?;
>>> - let metadata = get_metadata(
>>> + let (metadata, stale_fd) = get_metadata(
>>
>> stale_fd here is not used at all..
>
> Yes, that one should lead to a hard error as you mentioned, so should be
> handled accordingly. I will adapt this to be a hard error.
>
>>
>>> source_dir.as_raw_fd(),
>>> &stat,
>>> feature_flags & fs_feature_flags,
>>> @@ -744,7 +744,7 @@ impl Archiver {
>>> return Ok(());
>>> }
>>>
>>> - let metadata = get_metadata(
>>> + let (metadata, stale_fd) = get_metadata(
>>
>> this one is used
>>
>>> fd.as_raw_fd(),
>>> stat,
>>> self.flags(),
>>> @@ -753,6 +753,11 @@ impl Archiver {
>>> self.skip_e2big_xattr,
>>> )?;
>>>
>>> + if stale_fd {
>>> + log::warn!("Stale filehandle encountered, skip {:?}", self.path);
>>> + return Ok(());
>>> + }
>>
>> for this warning.. but get_metadata already logs (potentially multiple
>> times ;)) that things are incomplete cause of the stale filehandle, this
>> only adds the path context..
>
> but here there is also an early return, not just the log... this skips
> over adding this entry, and any sub entries if the entry is a directory.
>
> The logging could however be moved to the get_metadata call and only be
> logged once, agreed.
>
>>
>>> +
>>> if self.previous_payload_index.is_none() {
>>> return self
>>> .add_entry_to_archive(encoder, &mut None, c_file_name, stat, fd, &metadata, None)
>>> @@ -1301,7 +1306,14 @@ impl Archiver {
>>> file_name: &Path,
>>> metadata: &Metadata,
>>> ) -> Result<(), Error> {
>>> - let dest = nix::fcntl::readlinkat(fd.as_raw_fd(), &b""[..])?;
>>> + let dest = match nix::fcntl::readlinkat(fd.as_raw_fd(), &b""[..]) {
>>> + Ok(dest) => dest,
>>> + Err(Errno::ESTALE) => {
>>> + log::warn!("Stale file handle encountered, skip {file_name:?}");
>>> + return Ok(());
>>> + }
>>> + Err(err) => return Err(err.into()),
>>> + };
>>> encoder.add_symlink(metadata, file_name, dest).await?;
>>> Ok(())
>>> }
>>> @@ -1397,9 +1409,10 @@ fn get_metadata(
>>> fs_magic: i64,
>>> fs_feature_flags: &mut Flags,
>>> skip_e2big_xattr: bool,
>>> -) -> Result<Metadata, Error> {
>>> +) -> Result<(Metadata, bool), Error> {
>>> // required for some of these
>>> let proc_path = Path::new("/proc/self/fd/").join(fd.to_string());
>>> + let mut stale_fd = false;
>>>
>>> let mut meta = Metadata {
>>> stat: pxar::Stat {
>>> @@ -1412,18 +1425,27 @@ fn get_metadata(
>>> ..Default::default()
>>> };
>>>
>>> - get_xattr_fcaps_acl(
>>> + if get_xattr_fcaps_acl(
>>
>> only call site, could just bubble up ESTALE
>
> As mentioned, this has 2 issues: Loss of anyhow error context for which
> sub-function the Errno occurred and sub-functions like `process_acl`
> which do not rely on ffi calls at all, returning plain `anyhow::Error`,
> which granted could be redefined to return an Errno.
>
>>
>>> &mut meta,
>>> fd,
>>> &proc_path,
>>> flags,
>>> fs_feature_flags,
>>> skip_e2big_xattr,
>>> - )?;
>>> - get_chattr(&mut meta, fd)?;
>>> + )? {
>>> + stale_fd = true;
>>> + log::warn!("Stale filehandle, xattrs incomplete");
>>> + }
>>> + if get_chattr(&mut meta, fd)? {
>>
>> same
>>
>>> + stale_fd = true;
>>> + log::warn!("Stale filehandle, chattr incomplete");
>>> + }
>>> get_fat_attr(&mut meta, fd, fs_magic)?;
>>> - get_quota_project_id(&mut meta, fd, flags, fs_magic)?;
>>> - Ok(meta)
>>> + if get_quota_project_id(&mut meta, fd, flags, fs_magic)? {
>>
>> same
>>
>>> + stale_fd = true;
>>> + log::warn!("Stale filehandle, quota project id incomplete");
>>> + }
>>
>> see above and way down below, IMHO all of these could just bubble up the error..
>>
>>> + Ok((meta, stale_fd))
>>> }
>>>
>>> fn get_fcaps(
>>> @@ -1431,22 +1453,23 @@ fn get_fcaps(
>>> fd: RawFd,
>>> flags: Flags,
>>> fs_feature_flags: &mut Flags,
>>> -) -> Result<(), Error> {
>>> +) -> Result<bool, Error> {
>>
>> this is only called by get_xattr_fcaps_acl, so could just bubble up
>> ESTALE as well..
>>
>>> if !flags.contains(Flags::WITH_FCAPS) {
>>> - return Ok(());
>>> + return Ok(false);
>>> }
>>>
>>> match xattr::fgetxattr(fd, xattr::XATTR_NAME_FCAPS) {
>>> Ok(data) => {
>>> meta.fcaps = Some(pxar::format::FCaps { data });
>>> - Ok(())
>>> + Ok(false)
>>> }
>>> - Err(Errno::ENODATA) => Ok(()),
>>> + Err(Errno::ENODATA) => Ok(false),
>>> Err(Errno::EOPNOTSUPP) => {
>>> fs_feature_flags.remove(Flags::WITH_FCAPS);
>>> - Ok(())
>>> + Ok(false)
>>> }
>>> - Err(Errno::EBADF) => Ok(()), // symlinks
>>> + Err(Errno::EBADF) => Ok(false), // symlinks
>>> + Err(Errno::ESTALE) => Ok(true),
>>> Err(err) => Err(err).context("failed to read file capabilities"),
>>> }
>>> }
>>> @@ -1458,32 +1481,35 @@ fn get_xattr_fcaps_acl(
>>> flags: Flags,
>>> fs_feature_flags: &mut Flags,
>>> skip_e2big_xattr: bool,
>>> -) -> Result<(), Error> {
>>> +) -> Result<bool, Error> {
>>> if !flags.contains(Flags::WITH_XATTRS) {
>>> - return Ok(());
>>> + return Ok(false);
>>> }
>>>
>>> let xattrs = match xattr::flistxattr(fd) {
>>> Ok(names) => names,
>>> Err(Errno::EOPNOTSUPP) => {
>>> fs_feature_flags.remove(Flags::WITH_XATTRS);
>>> - return Ok(());
>>> + return Ok(false);
>>> }
>>> Err(Errno::E2BIG) => {
>>> match skip_e2big_xattr {
>>> - true => return Ok(()),
>>> + true => return Ok(false),
>>> false => {
>>> bail!("{} (try --skip-e2big-xattr)", Errno::E2BIG.to_string());
>>> }
>>> };
>>> }
>>> - Err(Errno::EBADF) => return Ok(()), // symlinks
>>> + Err(Errno::EBADF) => return Ok(false), // symlinks
>>> + Err(Errno::ESTALE) => return Ok(true),
>>
>> see above
>>
>>> Err(err) => return Err(err).context("failed to read xattrs"),
>>> };
>>>
>>> for attr in &xattrs {
>>> if xattr::is_security_capability(attr) {
>>> - get_fcaps(meta, fd, flags, fs_feature_flags)?;
>>> + if get_fcaps(meta, fd, flags, fs_feature_flags)? {
>>> + return Ok(true);
>>
>> see above
>>
>>> + }
>>> continue;
>>> }
>>>
>>> @@ -1505,35 +1531,37 @@ fn get_xattr_fcaps_acl(
>>> Err(Errno::EBADF) => (), // symlinks, shouldn't be able to reach this either
>>> Err(Errno::E2BIG) => {
>>> match skip_e2big_xattr {
>>> - true => return Ok(()),
>>> + true => return Ok(false),
>>> false => {
>>> bail!("{} (try --skip-e2big-xattr)", Errno::E2BIG.to_string());
>>> }
>>> };
>>> }
>>> + Err(Errno::ESTALE) => return Ok(true), // symlinks
>>
>> same here (and stray copy-paste comment I guess?)
>>
>>> Err(err) => {
>>> return Err(err).context(format!("error reading extended attribute {attr:?}"))
>>> }
>>> }
>>> }
>>>
>>> - Ok(())
>>> + Ok(false)
>>> }
>>>
>>> -fn get_chattr(metadata: &mut Metadata, fd: RawFd) -> Result<(), Error> {
>>> +fn get_chattr(metadata: &mut Metadata, fd: RawFd) -> Result<bool, Error> {
>>> let mut attr: libc::c_long = 0;
>>>
>>> match unsafe { fs::read_attr_fd(fd, &mut attr) } {
>>> Ok(_) => (),
>>> + Err(Errno::ESTALE) => return Ok(true),
>>> Err(errno) if errno_is_unsupported(errno) => {
>>> - return Ok(());
>>> + return Ok(false);
>>> }
>>> Err(err) => return Err(err).context("failed to read file attributes"),
>>> }
>>>
>>> metadata.stat.flags |= Flags::from_chattr(attr).bits();
>>>
>>> - Ok(())
>>> + Ok(false)
>>> }
>>>
>>> fn get_fat_attr(metadata: &mut Metadata, fd: RawFd, fs_magic: i64) -> Result<(), Error> {
>>> @@ -1564,30 +1592,34 @@ fn get_quota_project_id(
>>> fd: RawFd,
>>> flags: Flags,
>>> magic: i64,
>>> -) -> Result<(), Error> {
>>> +) -> Result<bool, Error> {
>>
>> see above
>>
>>> if !(metadata.is_dir() || metadata.is_regular_file()) {
>>> - return Ok(());
>>> + return Ok(false);
>>> }
>>>
>>> if !flags.contains(Flags::WITH_QUOTA_PROJID) {
>>> - return Ok(());
>>> + return Ok(false);
>>> }
>>>
>>> use proxmox_sys::linux::magic::*;
>>>
>>> match magic {
>>> EXT4_SUPER_MAGIC | XFS_SUPER_MAGIC | FUSE_SUPER_MAGIC | ZFS_SUPER_MAGIC => (),
>>> - _ => return Ok(()),
>>> + _ => return Ok(false),
>>> }
>>>
>>> let mut fsxattr = fs::FSXAttr::default();
>>> let res = unsafe { fs::fs_ioc_fsgetxattr(fd, &mut fsxattr) };
>>>
>>> + if let Err(Errno::ESTALE) = res {
>>> + return Ok(true);
>>> + }
>>> +
>>> // On some FUSE filesystems it can happen that ioctl is not supported.
>>> // For these cases projid is set to 0 while the error is ignored.
>>> if let Err(errno) = res {
>>> if errno_is_unsupported(errno) {
>>> - return Ok(());
>>> + return Ok(false);
>>> } else {
>>> return Err(errno).context("error while reading quota project id");
>>> }
>>> @@ -1597,7 +1629,7 @@ fn get_quota_project_id(
>>> if projid != 0 {
>>> metadata.quota_project_id = Some(pxar::format::QuotaProjectId { projid });
>>> }
>>> - Ok(())
>>> + Ok(false)
>>> }
>>>
>>> fn get_acl(
>>> @@ -1840,7 +1872,7 @@ mod tests {
>>> let fs_magic = detect_fs_type(dir.as_raw_fd()).unwrap();
>>> let stat = nix::sys::stat::fstat(dir.as_raw_fd()).unwrap();
>>> let mut fs_feature_flags = Flags::from_magic(fs_magic);
>>> - let metadata = get_metadata(
>>> + let (metadata, _) = get_metadata(
>>
>> no use of the new return value
>>
>>> dir.as_raw_fd(),
>>> &stat,
>>> fs_feature_flags,
>>> @@ -1937,7 +1969,7 @@ mod tests {
>>> let stat = nix::sys::stat::fstat(source_dir.as_raw_fd()).unwrap();
>>> let mut fs_feature_flags = Flags::from_magic(fs_magic);
>>>
>>> - let metadata = get_metadata(
>>> + let (metadata, _) = get_metadata(
>>
>> no use either.. so wouldn't it make more sense to pass in a path and log
>> the context right in get_metadata? or treat the stale FD as an error,
>> and add the context/path as part of error handling?
>
> The first approach seems better, will however not help to differentiate
> the (hard) errors from the soft error ESTALE, which requires to skip
> over entries at the `get_metadata` call side conditionally.
>
> Returning the stale file handle error as `Anyhow::Error` also does not
> allow to distinguish from other (hard) errors, so again it cannot be
> handled as soft error at the call site.
>
> And returning all errors as `Errno` has the loss of error context issue
> as described above.
>
> I will see if I can cover this better by refactoring the code, as most
> of the helpers have a single call side, so it should be possible to
> reorganize without much side effects.
>
>>
>> the four call sites are:
>> - two related to tests, we can probably treat ESTALE as hard error there
>> - the one for obtaining the metadata of the source dir of the archive,
>> if that is stale we can't create an archive -> hard error as well
>> - adding an entry: for the stale case, we already log a warning and
>> proceed with the next entry, so we don't benefit from the fact that
>> (incomplete) metadata and the staleness is returned, as opposed to
>> just treating ESTALE as an error that we can "catch" and handle..
>>
>>> source_dir.as_raw_fd(),
>>> &stat,
>>> fs_feature_flags,
>>> --
>>> 2.39.5
>>>
>>>
>>>
>>> _______________________________________________
>>> pbs-devel mailing list
>>> pbs-devel at lists.proxmox.com
>>> https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel
>>>
>>>
>>>
>>
>>
>> _______________________________________________
>> pbs-devel mailing list
>> pbs-devel at lists.proxmox.com
>> https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel
>>
>>
>
>
More information about the pbs-devel
mailing list