[pbs-devel] [PATCH proxmox-backup 4/4] fix #5853: client: pxar: exclude stale files on metadata read
Christian Ebner
c.ebner at proxmox.com
Wed Nov 13 15:04:47 CET 2024
On 11/13/24 14:55, Fabian Grünbichler wrote:
> On November 13, 2024 2:45 pm, Christian Ebner wrote:
>> On 11/11/24 14:37, Fabian Grünbichler wrote:
>>> behaviour wise this seems okay to me, but if possible, I'd avoid all the
>>> return value tuples, see detailed comments below..
>>
>> Agreed, I am not a fan of passing the stale file handle error info along
>> as a boolean in the return value either.
>>
>> But unfortunately passing along the error without losing pre-existing
>> error context and switching all the `get_metadata` related functions to
>> return an `Errno` is not possible as is.
>>
>> E.g. `process_acl` returns an `anyhow::Error` (could it be redefined to
>> return e.g. `Errno::EINVAL` instead?), `Errno::E2BIG` gets special
>> handling for the xattr case only, ...
>>
>> The current approach was chosen to keep the current anyhow error
>> context close to the actual errors when they occur.
>
> I think that should be solvable with some refactoring/defining of a
> proper error type..
>
> but you can also attempt to downcast the anyhow error to get the ESTALE?
Ah true, thanks for the pointer!
I will send a new version of the patches incorporating your feedback.
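
For reference, a minimal sketch of how such a downcast-based check might
look. It uses only std so that it stands alone: `WithContext`,
`read_metadata`, `add_entry` and `is_stale_file_handle` are hypothetical
names, not code from the patch, and the hard-coded `ESTALE` value of 116 is
an assumption that holds for Linux on common architectures (real code would
use `Errno::ESTALE` from nix). With anyhow, `Error::downcast_ref::<Errno>()`
should be able to take the place of the manual chain walk shown here.

```rust
use std::error::Error;
use std::fmt;
use std::io;

/// Assumption: ESTALE is 116 on Linux (x86_64, aarch64). Real code would
/// use `nix::errno::Errno::ESTALE` or `libc::ESTALE` instead.
const ESTALE: i32 = 116;

/// Hypothetical stand-in for an error that carries a context message.
#[derive(Debug)]
struct WithContext {
    context: &'static str,
    source: io::Error,
}

impl fmt::Display for WithContext {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{}", self.context)
    }
}

impl Error for WithContext {
    fn source(&self) -> Option<&(dyn Error + 'static)> {
        Some(&self.source)
    }
}

/// Walk the source chain and report whether any link is an I/O error
/// whose raw OS error code is ESTALE.
fn is_stale_file_handle(err: &(dyn Error + 'static)) -> bool {
    let mut current = Some(err);
    while let Some(e) = current {
        if let Some(io_err) = e.downcast_ref::<io::Error>() {
            if io_err.raw_os_error() == Some(ESTALE) {
                return true;
            }
        }
        current = e.source();
    }
    false
}

/// Hypothetical metadata helper: fails with the given errno (plus context),
/// or succeeds when no errno is given.
fn read_metadata(errno: Option<i32>) -> Result<(), Box<dyn Error>> {
    match errno {
        None => Ok(()),
        Some(code) => Err(Box::new(WithContext {
            context: "failed to read xattrs",
            source: io::Error::from_raw_os_error(code),
        })),
    }
}

/// Call site: returns Ok(true) if the entry was added, Ok(false) if it was
/// skipped because of a stale file handle, and propagates all other errors.
fn add_entry(errno: Option<i32>) -> Result<bool, Box<dyn Error>> {
    match read_metadata(errno) {
        Ok(()) => Ok(true),
        Err(err) if is_stale_file_handle(&*err) => Ok(false), // soft error
        Err(err) => Err(err), // hard error, context preserved
    }
}
```

Hard errors keep their context string this way, while the call site can
still single out ESTALE and skip the entry.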
>
>>> On November 5, 2024 3:01 pm, Christian Ebner wrote:
>>>> Skip and warn the user for files which returned a stale file handle
>>>> error while reading the metadata associated to that file.
>>>>
>>>> Instead of returning with an error when getting the metadata, return
>>>> a boolean flag signaling if a stale file handle has been encountered.
>>>>
>>>> Link to issue in bugtracker:
>>>> https://bugzilla.proxmox.com/show_bug.cgi?id=5853
>>>>
>>>> Link to thread in community forum:
>>>> https://forum.proxmox.com/threads/156822/
>>>>
>>>> Signed-off-by: Christian Ebner <c.ebner at proxmox.com>
>>>> ---
>>>> pbs-client/src/pxar/create.rs | 100 ++++++++++++++++++++++------------
>>>> 1 file changed, 66 insertions(+), 34 deletions(-)
>>>>
>>>> diff --git a/pbs-client/src/pxar/create.rs b/pbs-client/src/pxar/create.rs
>>>> index 2a844922c..85be00db4 100644
>>>> --- a/pbs-client/src/pxar/create.rs
>>>> +++ b/pbs-client/src/pxar/create.rs
>>>> @@ -228,7 +228,7 @@ where
>>>> let mut fs_feature_flags = Flags::from_magic(fs_magic);
>>>>
>>>> let stat = nix::sys::stat::fstat(source_dir.as_raw_fd())?;
>>>> - let metadata = get_metadata(
>>>> + let (metadata, stale_fd) = get_metadata(
>>>
>>> stale_fd here is not used at all..
>>
>> Yes, that one should lead to a hard error as you mentioned, so should be
>> handled accordingly. I will adapt this to be a hard error.
>>
>>>
>>>> source_dir.as_raw_fd(),
>>>> &stat,
>>>> feature_flags & fs_feature_flags,
>>>> @@ -744,7 +744,7 @@ impl Archiver {
>>>> return Ok(());
>>>> }
>>>>
>>>> - let metadata = get_metadata(
>>>> + let (metadata, stale_fd) = get_metadata(
>>>
>>> this one is used
>>>
>>>> fd.as_raw_fd(),
>>>> stat,
>>>> self.flags(),
>>>> @@ -753,6 +753,11 @@ impl Archiver {
>>>> self.skip_e2big_xattr,
>>>> )?;
>>>>
>>>> + if stale_fd {
>>>> + log::warn!("Stale filehandle encountered, skip {:?}", self.path);
>>>> + return Ok(());
>>>> + }
>>>
>>> for this warning.. but get_metadata already logs (potentially multiple
>>> times ;)) that things are incomplete cause of the stale filehandle, this
>>> only adds the path context..
>>
>> but here there is also an early return, not just the log... this skips
>> over adding this entry, and any sub entries if the entry is a directory.
>>
>> The logging could however be moved to the get_metadata call and only be
>> logged once, agreed.
>>
>>>
>>>> +
>>>> if self.previous_payload_index.is_none() {
>>>> return self
>>>> .add_entry_to_archive(encoder, &mut None, c_file_name, stat, fd, &metadata, None)
>>>> @@ -1301,7 +1306,14 @@ impl Archiver {
>>>> file_name: &Path,
>>>> metadata: &Metadata,
>>>> ) -> Result<(), Error> {
>>>> - let dest = nix::fcntl::readlinkat(fd.as_raw_fd(), &b""[..])?;
>>>> + let dest = match nix::fcntl::readlinkat(fd.as_raw_fd(), &b""[..]) {
>>>> + Ok(dest) => dest,
>>>> + Err(Errno::ESTALE) => {
>>>> + log::warn!("Stale file handle encountered, skip {file_name:?}");
>>>> + return Ok(());
>>>> + }
>>>> + Err(err) => return Err(err.into()),
>>>> + };
>>>> encoder.add_symlink(metadata, file_name, dest).await?;
>>>> Ok(())
>>>> }
>>>> @@ -1397,9 +1409,10 @@ fn get_metadata(
>>>> fs_magic: i64,
>>>> fs_feature_flags: &mut Flags,
>>>> skip_e2big_xattr: bool,
>>>> -) -> Result<Metadata, Error> {
>>>> +) -> Result<(Metadata, bool), Error> {
>>>> // required for some of these
>>>> let proc_path = Path::new("/proc/self/fd/").join(fd.to_string());
>>>> + let mut stale_fd = false;
>>>>
>>>> let mut meta = Metadata {
>>>> stat: pxar::Stat {
>>>> @@ -1412,18 +1425,27 @@ fn get_metadata(
>>>> ..Default::default()
>>>> };
>>>>
>>>> - get_xattr_fcaps_acl(
>>>> + if get_xattr_fcaps_acl(
>>>
>>> only call site, could just bubble up ESTALE
>>
>> As mentioned, this has 2 issues: loss of the anyhow error context telling
>> in which sub-function the Errno occurred, and sub-functions like
>> `process_acl`, which do not rely on ffi calls at all and return a plain
>> `anyhow::Error` (which, granted, could be redefined to return an Errno).
>>
>>>
>>>> &mut meta,
>>>> fd,
>>>> &proc_path,
>>>> flags,
>>>> fs_feature_flags,
>>>> skip_e2big_xattr,
>>>> - )?;
>>>> - get_chattr(&mut meta, fd)?;
>>>> + )? {
>>>> + stale_fd = true;
>>>> + log::warn!("Stale filehandle, xattrs incomplete");
>>>> + }
>>>> + if get_chattr(&mut meta, fd)? {
>>>
>>> same
>>>
>>>> + stale_fd = true;
>>>> + log::warn!("Stale filehandle, chattr incomplete");
>>>> + }
>>>> get_fat_attr(&mut meta, fd, fs_magic)?;
>>>> - get_quota_project_id(&mut meta, fd, flags, fs_magic)?;
>>>> - Ok(meta)
>>>> + if get_quota_project_id(&mut meta, fd, flags, fs_magic)? {
>>>
>>> same
>>>
>>>> + stale_fd = true;
>>>> + log::warn!("Stale filehandle, quota project id incomplete");
>>>> + }
>>>
>>> see above and way down below, IMHO all of these could just bubble up the error..
>>>
>>>> + Ok((meta, stale_fd))
>>>> }
>>>>
>>>> fn get_fcaps(
>>>> @@ -1431,22 +1453,23 @@ fn get_fcaps(
>>>> fd: RawFd,
>>>> flags: Flags,
>>>> fs_feature_flags: &mut Flags,
>>>> -) -> Result<(), Error> {
>>>> +) -> Result<bool, Error> {
>>>
>>> this is only called by get_xattr_fcaps_acl, so could just bubble up
>>> ESTALE as well..
>>>
>>>> if !flags.contains(Flags::WITH_FCAPS) {
>>>> - return Ok(());
>>>> + return Ok(false);
>>>> }
>>>>
>>>> match xattr::fgetxattr(fd, xattr::XATTR_NAME_FCAPS) {
>>>> Ok(data) => {
>>>> meta.fcaps = Some(pxar::format::FCaps { data });
>>>> - Ok(())
>>>> + Ok(false)
>>>> }
>>>> - Err(Errno::ENODATA) => Ok(()),
>>>> + Err(Errno::ENODATA) => Ok(false),
>>>> Err(Errno::EOPNOTSUPP) => {
>>>> fs_feature_flags.remove(Flags::WITH_FCAPS);
>>>> - Ok(())
>>>> + Ok(false)
>>>> }
>>>> - Err(Errno::EBADF) => Ok(()), // symlinks
>>>> + Err(Errno::EBADF) => Ok(false), // symlinks
>>>> + Err(Errno::ESTALE) => Ok(true),
>>>> Err(err) => Err(err).context("failed to read file capabilities"),
>>>> }
>>>> }
>>>> @@ -1458,32 +1481,35 @@ fn get_xattr_fcaps_acl(
>>>> flags: Flags,
>>>> fs_feature_flags: &mut Flags,
>>>> skip_e2big_xattr: bool,
>>>> -) -> Result<(), Error> {
>>>> +) -> Result<bool, Error> {
>>>> if !flags.contains(Flags::WITH_XATTRS) {
>>>> - return Ok(());
>>>> + return Ok(false);
>>>> }
>>>>
>>>> let xattrs = match xattr::flistxattr(fd) {
>>>> Ok(names) => names,
>>>> Err(Errno::EOPNOTSUPP) => {
>>>> fs_feature_flags.remove(Flags::WITH_XATTRS);
>>>> - return Ok(());
>>>> + return Ok(false);
>>>> }
>>>> Err(Errno::E2BIG) => {
>>>> match skip_e2big_xattr {
>>>> - true => return Ok(()),
>>>> + true => return Ok(false),
>>>> false => {
>>>> bail!("{} (try --skip-e2big-xattr)", Errno::E2BIG.to_string());
>>>> }
>>>> };
>>>> }
>>>> - Err(Errno::EBADF) => return Ok(()), // symlinks
>>>> + Err(Errno::EBADF) => return Ok(false), // symlinks
>>>> + Err(Errno::ESTALE) => return Ok(true),
>>>
>>> see above
>>>
>>>> Err(err) => return Err(err).context("failed to read xattrs"),
>>>> };
>>>>
>>>> for attr in &xattrs {
>>>> if xattr::is_security_capability(attr) {
>>>> - get_fcaps(meta, fd, flags, fs_feature_flags)?;
>>>> + if get_fcaps(meta, fd, flags, fs_feature_flags)? {
>>>> + return Ok(true);
>>>
>>> see above
>>>
>>>> + }
>>>> continue;
>>>> }
>>>>
>>>> @@ -1505,35 +1531,37 @@ fn get_xattr_fcaps_acl(
>>>> Err(Errno::EBADF) => (), // symlinks, shouldn't be able to reach this either
>>>> Err(Errno::E2BIG) => {
>>>> match skip_e2big_xattr {
>>>> - true => return Ok(()),
>>>> + true => return Ok(false),
>>>> false => {
>>>> bail!("{} (try --skip-e2big-xattr)", Errno::E2BIG.to_string());
>>>> }
>>>> };
>>>> }
>>>> + Err(Errno::ESTALE) => return Ok(true), // symlinks
>>>
>>> same here (and stray copy-paste comment I guess?)
>>>
>>>> Err(err) => {
>>>> return Err(err).context(format!("error reading extended attribute {attr:?}"))
>>>> }
>>>> }
>>>> }
>>>>
>>>> - Ok(())
>>>> + Ok(false)
>>>> }
>>>>
>>>> -fn get_chattr(metadata: &mut Metadata, fd: RawFd) -> Result<(), Error> {
>>>> +fn get_chattr(metadata: &mut Metadata, fd: RawFd) -> Result<bool, Error> {
>>>> let mut attr: libc::c_long = 0;
>>>>
>>>> match unsafe { fs::read_attr_fd(fd, &mut attr) } {
>>>> Ok(_) => (),
>>>> + Err(Errno::ESTALE) => return Ok(true),
>>>> Err(errno) if errno_is_unsupported(errno) => {
>>>> - return Ok(());
>>>> + return Ok(false);
>>>> }
>>>> Err(err) => return Err(err).context("failed to read file attributes"),
>>>> }
>>>>
>>>> metadata.stat.flags |= Flags::from_chattr(attr).bits();
>>>>
>>>> - Ok(())
>>>> + Ok(false)
>>>> }
>>>>
>>>> fn get_fat_attr(metadata: &mut Metadata, fd: RawFd, fs_magic: i64) -> Result<(), Error> {
>>>> @@ -1564,30 +1592,34 @@ fn get_quota_project_id(
>>>> fd: RawFd,
>>>> flags: Flags,
>>>> magic: i64,
>>>> -) -> Result<(), Error> {
>>>> +) -> Result<bool, Error> {
>>>
>>> see above
>>>
>>>> if !(metadata.is_dir() || metadata.is_regular_file()) {
>>>> - return Ok(());
>>>> + return Ok(false);
>>>> }
>>>>
>>>> if !flags.contains(Flags::WITH_QUOTA_PROJID) {
>>>> - return Ok(());
>>>> + return Ok(false);
>>>> }
>>>>
>>>> use proxmox_sys::linux::magic::*;
>>>>
>>>> match magic {
>>>> EXT4_SUPER_MAGIC | XFS_SUPER_MAGIC | FUSE_SUPER_MAGIC | ZFS_SUPER_MAGIC => (),
>>>> - _ => return Ok(()),
>>>> + _ => return Ok(false),
>>>> }
>>>>
>>>> let mut fsxattr = fs::FSXAttr::default();
>>>> let res = unsafe { fs::fs_ioc_fsgetxattr(fd, &mut fsxattr) };
>>>>
>>>> + if let Err(Errno::ESTALE) = res {
>>>> + return Ok(true);
>>>> + }
>>>> +
>>>> // On some FUSE filesystems it can happen that ioctl is not supported.
>>>> // For these cases projid is set to 0 while the error is ignored.
>>>> if let Err(errno) = res {
>>>> if errno_is_unsupported(errno) {
>>>> - return Ok(());
>>>> + return Ok(false);
>>>> } else {
>>>> return Err(errno).context("error while reading quota project id");
>>>> }
>>>> @@ -1597,7 +1629,7 @@ fn get_quota_project_id(
>>>> if projid != 0 {
>>>> metadata.quota_project_id = Some(pxar::format::QuotaProjectId { projid });
>>>> }
>>>> - Ok(())
>>>> + Ok(false)
>>>> }
>>>>
>>>> fn get_acl(
>>>> @@ -1840,7 +1872,7 @@ mod tests {
>>>> let fs_magic = detect_fs_type(dir.as_raw_fd()).unwrap();
>>>> let stat = nix::sys::stat::fstat(dir.as_raw_fd()).unwrap();
>>>> let mut fs_feature_flags = Flags::from_magic(fs_magic);
>>>> - let metadata = get_metadata(
>>>> + let (metadata, _) = get_metadata(
>>>
>>> no use of the new return value
>>>
>>>> dir.as_raw_fd(),
>>>> &stat,
>>>> fs_feature_flags,
>>>> @@ -1937,7 +1969,7 @@ mod tests {
>>>> let stat = nix::sys::stat::fstat(source_dir.as_raw_fd()).unwrap();
>>>> let mut fs_feature_flags = Flags::from_magic(fs_magic);
>>>>
>>>> - let metadata = get_metadata(
>>>> + let (metadata, _) = get_metadata(
>>>
>>> no use either.. so wouldn't it make more sense to pass in a path and log
>>> the context right in get_metadata? or treat the stale FD as an error,
>>> and add the context/path as part of error handling?
>>
>> The first approach seems better, but it will not help to differentiate
>> the (hard) errors from the soft error ESTALE, which requires skipping
>> over entries conditionally at the `get_metadata` call site.
>>
>> Returning the stale file handle error as `anyhow::Error` also does not
>> allow distinguishing it from other (hard) errors, so again it cannot be
>> handled as a soft error at the call site.
>>
>> And returning all errors as `Errno` has the loss of error context issue
>> as described above.
>>
>> I will see if I can cover this better by refactoring the code, as most
>> of the helpers have a single call site, so it should be possible to
>> reorganize without many side effects.
>>
>>>
>>> the four call sites are:
>>> - two related to tests, we can probably treat ESTALE as hard error there
>>> - the one for obtaining the metadata of the source dir of the archive,
>>> if that is stale we can't create an archive -> hard error as well
>>> - adding an entry: for the stale case, we already log a warning and
>>> proceed with the next entry, so we don't benefit from the fact that
>>> (incomplete) metadata and the staleness is returned, as opposed to
>>> just treating ESTALE as an error that we can "catch" and handle..
>>>
>>>> source_dir.as_raw_fd(),
>>>> &stat,
>>>> fs_feature_flags,
>>>> --
>>>> 2.39.5
>>>>
>>>>
>>>>
>>>> _______________________________________________
>>>> pbs-devel mailing list
>>>> pbs-devel at lists.proxmox.com
>>>> https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel