[pbs-devel] [PATCH v6 proxmox-backup 47/65] fix #3174: client: pxar: enable caching and meta comparison
Christian Ebner
c.ebner at proxmox.com
Fri May 24 10:50:21 CEST 2024
On 5/22/24 16:45, Dominik Csapak wrote:
> hight level comment
>
> this patch does many things and feels like it could be broken up more
> clearly
Okay, yes I will split this up into more consumable patches.
>
> could this be better if those were in an optional struct?
>
> e.g.
>
> cache: Option<PxarCreateCache>,
>
> not sure if it makes the code easier to read or not...
I will check if I can move some of the logic and create a struct for
this to the lookahead cache introduced by the patch before this one.
>
> IIUC we don't encode a '.pxarexclude-cli' when there is a previous ref ?
> does that really make sense
As already discussed offlist, this makes it impossible to reuse and even
worse, it will encode the file entry at some point breaking with the
logic of the cache, leading to a corrupt archive.
I did hava a dedicated cache entry type just for this at some point, but
this edge case was really a problem.
Logically it makes more sense to put this into the prelude section,
instead of encoding this as a file in the archive.
>
>> self.encode_pxarexclude_cli(encoder,
>> &file_entry.name, old_patterns_count)
>> .await?;
>> continue;
>> @@ -336,6 +363,7 @@ impl Archiver {
>> .await
>> .map_err(|err| self.wrap_err(err))?;
>> }
>> +
>> self.path = old_path;
>> self.entry_counter = entry_counter;
>> self.patterns.truncate(old_patterns_count);
>> @@ -618,8 +646,6 @@ impl Archiver {
>> c_file_name: &CStr,
>> stat: &FileStat,
>> ) -> Result<(), Error> {
>> - use pxar::format::mode;
>> -
>> let file_mode = stat.st_mode & libc::S_IFMT;
>> let open_mode = if file_mode == libc::S_IFREG || file_mode
>> == libc::S_IFDIR {
>> OFlag::empty()
>> @@ -657,6 +683,96 @@ impl Archiver {
>> self.skip_e2big_xattr,
>> )?;
>> + if self.previous_payload_index.is_none() {
>> + return self
>> + .add_entry_to_archive(
>> + encoder,
>> + previous_metadata,
>> + c_file_name,
>> + stat,
>> + fd,
>> + &metadata,
>> + None,
>> + )
>> + .await;
>> + }
>> +
>> + // Avoid having to many open file handles in cached entries
>> + if self.cached_entries.len() > MAX_CACHE_SIZE {
>> + log::debug!("Max cache size reached, reuse cached entries");
>> + self.flush_cached_reusing_if_below_threshold(encoder, true)
>> + .await?;
>> + }
>> +
>> + if metadata.is_regular_file() {
>> + self.cache_or_flush_entries(
>> + encoder,
>> + previous_metadata,
>> + c_file_name,
>> + stat,
>> + fd,
>> + &metadata,
>> + )
>> + .await
>> + } else if self.caching_enabled {
>> + if stat.st_mode & libc::S_IFMT == libc::S_IFDIR {
>> + let fd_clone = fd.try_clone()?;
>> + let cache_entry =
>> CacheEntry::DirEntry(CacheEntryData::new(
>> + fd,
>> + c_file_name.into(),
>> + *stat,
>> + metadata.clone(),
>> + PayloadOffset::default(),
>> + ));
>> + self.cached_entries.push(cache_entry);
>> +
>> + let dir = Dir::from_fd(fd_clone.into_raw_fd())?;
>> + self.add_directory(
>> + encoder,
>> + previous_metadata,
>> + dir,
>> + c_file_name,
>> + &metadata,
>> + stat,
>> + )
>> + .await?;
>> + } else {
>> + let cache_entry =
>> CacheEntry::RegEntry(CacheEntryData::new(
>> + fd,
>> + c_file_name.into(),
>> + *stat,
>> + metadata,
>> + PayloadOffset::default(),
>> + ));
>> + self.cached_entries.push(cache_entry);
>> + }
>> + Ok(())
>> + } else {
>> + self.add_entry_to_archive(
>> + encoder,
>> + previous_metadata,
>> + c_file_name,
>> + stat,
>> + fd,
>> + &metadata,
>> + None,
>> + )
>> + .await
>> + }
>
> a few questions here:
>
> 'caching_enabled' is IMHO a bit misleading as this is only used for non
> file entries?
No, this is used for all types of entries!
The only distinction made for cache entries is whether they are
directories or not. For directories, entry and exit point have to be
considered, in order to generate a consistent archive.
Maybe above code path is not so clear. For non file entries, when
caching is enabled, they are cached, otherwise encoded. For file entries
one needs to first check if they break with the caching, as they might
be not reusable. Therefore the distiction above.
If not in caching mode, simply reencode the entry.
>
> personally i find the split handling of caching vs flushing here and in
> cache_or_flush_entries
> a bit confusing:
>
> hardlinks/files are handled there, while dirs and all other filetypes
> are handled here?
> why ?
>
> i'd rather have all types handled there, wihch would make the code here
> much more readable
> and having the code for all types there should also not be that ugly
I agree that this is not the most readable code, it was a bit hard to
cover all the possible edge cases one might encounter during caching and
therefore grew rather organically instead of having a clear vision on
where to go. Let me see if the factoring out of some of the logic to a
dedicated `PxarLookaheadCache` might help to improves this.
More information about the pbs-devel
mailing list