[pbs-devel] [PATCH v8 proxmox-backup 4/9] fix #5982: garbage collection: check atime updates are honored

Max Carrara m.carrara at proxmox.com
Wed Apr 9 12:37:13 CEST 2025


On Sat Apr 5, 2025 at 11:16 AM CEST, Christian Ebner wrote:
> On 4/4/25 17:41, Max Carrara wrote:
> > Regarding my comment in the cover letter: The call to
> > `self.cond_touch_path` above will fail on a directory-based datastore
> > backed by ext4 mounted with -o rw,relatime:
> > 
> >      2025-04-04T16:25:44+02:00: TASK ERROR: atime safety check failed: update atime failed for chunk/file "/mnt/datastore/test-noatime/.chunks/bb9f/bb9f8df61474d25e71fa00722318cd387396ca1736605e1248821cc0de3d3af8" - EPERM: Operation not permitted
> > 
> >  From `man 2 utimensat` (formatted for your reading pleasure):
> > 
> > EPERM
> >      The caller attempted to change one or both timestamps to a value
> >      other than the current time, or to change one of the timestamps to
> >      the current time while leaving the other timestamp unchanged, (i.e.,
> >      times  is  not NULL, neither tv_nsec field is UTIME_NOW, and neither
> >      tv_nsec field is UTIME_OMIT) and either:
> > 
> >      •  the caller's effective user ID does not match the owner of file,
> >         and the caller is not privileged (Linux: does not have the
> >         CAP_FOWNER capability); or,
> > 
> >      •  the file is marked append-only or immutable (see chattr(1)).
> > 
> > Sooo, I investigated a little more, since the EPERM here initially
> > didn't make any sense to me, because the same atime check actually
> > passes when creating the datastore.
> > 
> > Furthermore, the file (chunk) used for this check isn't append-only or
> > immutable (otherwise you'd see an `a` or `i` down there):
> > 
> >      $ lsattr .chunks/bb9f/bb9f8df61474d25e71fa00722318cd387396ca1736605e1248821cc0de3d3af8
> >      --------------e------- .chunks/bb9f/bb9f8df61474d25e71fa00722318cd387396ca1736605e1248821cc0de3d3af8
> > 
> > 
> > And finally, just for completeness' sake, we're using both UTIME_NOW and
> > UTIME_OMIT in `cond_touch_path`:
> > 
> >      let times: [libc::timespec; 2] = [
> >          // access time -> update to now
> >          libc::timespec {
> >              tv_sec: 0,
> >              tv_nsec: libc::UTIME_NOW,
> >          },
> >          // modification time -> keep as is
> >          libc::timespec {
> >              tv_sec: 0,
> >              tv_nsec: libc::UTIME_OMIT,
> >          },
> >      ];
> > 
> > 
> > According to the docs of `man 2 utimensat`, this would suggest that
> > we're not running under the expected effective UID and don't have
> > the CAP_FOWNER capability.
> > 
> > This is the actual crux of the issue, since the chunk is owned by
> > `root`:
> > 
> >      $ ls -alh .chunks/bb9f
> >      total 1.1M
> >      drwxr-x--- 2 backup backup 4.0K Apr  4 17:11 .
> >      drwxr-x--- 1 backup backup 1.1M Apr  4 17:11 ..
> >      -rw-r--r-- 1 root   root    159 Apr  4 17:11 bb9f8df61474d25e71fa00722318cd387396ca1736605e1248821cc0de3d3af8
> > 
> > 
> > Just to add some additional findings -- when the *check is turned off*,
> > GC on that single chunk (no backups made yet!) runs just fine:
> > 
> >      2025-04-04T17:00:52+02:00: starting garbage collection on store test-noatime
> >      2025-04-04T17:00:52+02:00: Access time update check disabled by datastore tuning options.
> >      2025-04-04T17:00:52+02:00: Using access time cutoff 1d 5m, minimum access time is 2025-04-03T14:55:52Z
> >      2025-04-04T17:00:52+02:00: Start GC phase1 (mark used chunks)
> >      2025-04-04T17:00:52+02:00: Start GC phase2 (sweep unused chunks)
> >      2025-04-04T17:00:52+02:00: processed 73% (0 chunks)
> >      2025-04-04T17:00:52+02:00: Removed garbage: 0 B
> >      2025-04-04T17:00:52+02:00: Removed chunks: 0
> >      2025-04-04T17:00:52+02:00: Pending removals: 159 B (in 1 chunks)
> >      2025-04-04T17:00:52+02:00: Original data usage: 0 B
> >      2025-04-04T17:00:52+02:00: On-Disk chunks: 0
> >      2025-04-04T17:00:52+02:00: Deduplication factor: 1.00
> >      2025-04-04T17:00:52+02:00: queued notification (id=4bab0755-3f24-4b1a-99b0-de8e9613db9b)
> >      2025-04-04T17:00:52+02:00: TASK OK
> > 
> > *BUT*, when creating a backup via PVE, the check actually runs, despite
> > being turned off -- this appears to be something separate that you might
> > want to check as well:
>
> That is expected if the chunk has the wrong ownership, but not because 
> the fs_atime_check is executed here. Rather this happens because your VM 
> backup includes an all zero chunk, does not have it as known chunk 
> however (since the is no previous index file). This means the chunk will 
> be uploaded to the PBS, but the chunk store detects it as already 
> present. In that case the chunks atime will be updated too, but that 
> then fails because of the permission issue. See the corresponding code 
> paths here:
>   https://git.proxmox.com/?p=proxmox-backup.git;a=blob;f=pbs-datastore/src/chunk_store.rs;h=dc267d752980a988d22e6d410855d2ceb3c8db46;hb=HEAD#l460

Ah yes, of course! I don't know why I didn't think of that. Thanks for
the clarification!

>
> > 
> >      INFO: starting new backup job: vzdump 100 --mode snapshot --notes-template '{{guestname}}' --remove 0 --node roxy --notification-mode auto --storage pbs-dev-test-atime
> >      INFO: Starting Backup of VM 100 (qemu)
> >      INFO: Backup started at 2025-04-04 17:02:49
> >      INFO: status = running
> >      INFO: VM Name: dns
> >      INFO: include disk 'scsi0' 'local-zfs-main:vm-100-disk-0' 16G
> >      INFO: backup mode: snapshot
> >      INFO: ionice priority: 7
> >      INFO: snapshots found (not included into backup)
> >      INFO: creating Proxmox Backup Server archive 'vm/100/2025-04-04T15:02:49Z'
> >      INFO: issuing guest-agent 'fs-freeze' command
> >      INFO: issuing guest-agent 'fs-thaw' command
> >      ERROR: VM 100 qmp command 'backup' failed - backup register image failed: command error: update atime failed for chunk/file "/mnt/datastore/test-noatime/.chunks/bb9f/bb9f8df61474d25e71fa00722318cd387396ca1736605e1248821cc0de3d3af8" - EPERM: Operation not permitted
> >      INFO: aborting backup job
> >      INFO: resuming VM again
> >      ERROR: Backup of VM 100 failed - VM 100 qmp command 'backup' failed - backup register image failed: command error: update atime failed for chunk/file "/mnt/datastore/test-noatime/.chunks/bb9f/bb9f8df61474d25e71fa00722318cd387396ca1736605e1248821cc0de3d3af8" - EPERM: Operation not permitted
> >      INFO: Failed at 2025-04-04 17:02:49
> >      INFO: Backup job finished with errors
> >      TASK ERROR: job errors
> > 
> > Just to be sure:
> > 
> >      $ cat /etc/proxmox-backup/datastore.cfg
> >      # [...]
> >      datastore: test-noatime
> >              comment
> >              gc-schedule daily
> >              notification-mode notification-system
> >              path /mnt/datastore/test-noatime
> >              tuning gc-atime-safety-check=0
> > 
> > 
> > 
> > So, fixing the permissions with:
> > 
> >      # chown backup: .chunks/bb9f/bb9f8df61474d25e71fa00722318cd387396ca1736605e1248821cc0de3d3af8
> > 
> > .. allows the check to pass:
> > 
> >      2025-04-04T17:14:34+02:00: starting garbage collection on store test-noatime
> >      2025-04-04T17:14:35+02:00: Access time update check successful, proceeding with GC.
> >      2025-04-04T17:14:35+02:00: Using access time cutoff 1d 5m, minimum access time is 2025-04-03T15:09:34Z
> >      2025-04-04T17:14:35+02:00: Start GC phase1 (mark used chunks)
> >      2025-04-04T17:14:35+02:00: Start GC phase2 (sweep unused chunks)
> >      2025-04-04T17:14:35+02:00: processed 73% (0 chunks)
> >      2025-04-04T17:14:35+02:00: Removed garbage: 0 B
> >      2025-04-04T17:14:35+02:00: Removed chunks: 0
> >      2025-04-04T17:14:35+02:00: Original data usage: 0 B
> >      2025-04-04T17:14:35+02:00: On-Disk chunks: 1
> >      2025-04-04T17:14:35+02:00: Deduplication factor: 0.00
> >      2025-04-04T17:14:35+02:00: Average chunk size: 159 B
> >      2025-04-04T17:14:35+02:00: queued notification (id=06d76349-26af-43da-9ddb-5b0ee2fabfa8)
> >      2025-04-04T17:14:35+02:00: TASK OK
> > 
> > So, guess you just have to change the UID and GID when creating the
> > test chunk ;P
>
> Yes, in v9 of the series, the chunk files ownership will be set if an 
> `root`s UID is detected as effective UID. This fixes the issue here and 
> makes the chunk insertion more robust in general IMHO.
>
> Thanks again for catching this early! If you manage to give it another 
> spin would be great.

Yeah I saw that--and it's already applied, too. So, guess another spin's
not needed ;)





More information about the pbs-devel mailing list