[pbs-devel] [PATCH v6 proxmox-backup 47/65] fix #3174: client: pxar: enable caching and meta comparison

Fri May 24 10:50:21 CEST 2024

On 5/22/24 16:45, Dominik Csapak wrote:
> hight level comment
> 
> this patch does many things and feels like it could be broken up more 
> clearly

Okay, yes I will split this up into more consumable patches.

> 
> could this be better if those were in an optional struct?
> 
> e.g.
> 
> cache: Option<PxarCreateCache>,
> 
> not sure if it makes the code easier to read or not...

I will check if I can move some of the logic and create a struct for 
this to the lookahead cache introduced by the patch before this one.

> 
> IIUC we don't encode a '.pxarexclude-cli' when there is a previous ref ?
> does that really make sense

As already discussed offlist, this makes it impossible to reuse and even 
worse, it will encode the file entry at some point breaking with the 
logic of the cache, leading to a corrupt archive.
I did hava a dedicated cache entry type just for this at some point, but 
this edge case was really a problem.
Logically it makes more sense to put this into the prelude section, 
instead of encoding this as a file in the archive.

> 
>>                       self.encode_pxarexclude_cli(encoder, 
>> &file_entry.name, old_patterns_count)
>>                           .await?;
>>                       continue;
>> @@ -336,6 +363,7 @@ impl Archiver {
>>                   .await
>>                   .map_err(|err| self.wrap_err(err))?;
>>               }
>> +
>>               self.path = old_path;
>>               self.entry_counter = entry_counter;
>>               self.patterns.truncate(old_patterns_count);
>> @@ -618,8 +646,6 @@ impl Archiver {
>>           c_file_name: &CStr,
>>           stat: &FileStat,
>>       ) -> Result<(), Error> {
>> -        use pxar::format::mode;
>> -
>>           let file_mode = stat.st_mode & libc::S_IFMT;
>>           let open_mode = if file_mode == libc::S_IFREG || file_mode 
>> == libc::S_IFDIR {
>>               OFlag::empty()
>> @@ -657,6 +683,96 @@ impl Archiver {
>>               self.skip_e2big_xattr,
>>           )?;
>> +        if self.previous_payload_index.is_none() {
>> +            return self
>> +                .add_entry_to_archive(
>> +                    encoder,
>> +                    previous_metadata,
>> +                    c_file_name,
>> +                    stat,
>> +                    fd,
>> +                    &metadata,
>> +                    None,
>> +                )
>> +                .await;
>> +        }
>> +
>> +        // Avoid having to many open file handles in cached entries
>> +        if self.cached_entries.len() > MAX_CACHE_SIZE {
>> +            log::debug!("Max cache size reached, reuse cached entries");
>> +            self.flush_cached_reusing_if_below_threshold(encoder, true)
>> +                .await?;
>> +        }
>> +
>> +        if metadata.is_regular_file() {
>> +            self.cache_or_flush_entries(
>> +                encoder,
>> +                previous_metadata,
>> +                c_file_name,
>> +                stat,
>> +                fd,
>> +                &metadata,
>> +            )
>> +            .await
>> +        } else if self.caching_enabled {
>> +            if stat.st_mode & libc::S_IFMT == libc::S_IFDIR {
>> +                let fd_clone = fd.try_clone()?;
>> +                let cache_entry = 
>> CacheEntry::DirEntry(CacheEntryData::new(
>> +                    fd,
>> +                    c_file_name.into(),
>> +                    *stat,
>> +                    metadata.clone(),
>> +                    PayloadOffset::default(),
>> +                ));
>> +                self.cached_entries.push(cache_entry);
>> +
>> +                let dir = Dir::from_fd(fd_clone.into_raw_fd())?;
>> +                self.add_directory(
>> +                    encoder,
>> +                    previous_metadata,
>> +                    dir,
>> +                    c_file_name,
>> +                    &metadata,
>> +                    stat,
>> +                )
>> +                .await?;
>> +            } else {
>> +                let cache_entry = 
>> CacheEntry::RegEntry(CacheEntryData::new(
>> +                    fd,
>> +                    c_file_name.into(),
>> +                    *stat,
>> +                    metadata,
>> +                    PayloadOffset::default(),
>> +                ));
>> +                self.cached_entries.push(cache_entry);
>> +            }
>> +            Ok(())
>> +        } else {
>> +            self.add_entry_to_archive(
>> +                encoder,
>> +                previous_metadata,
>> +                c_file_name,
>> +                stat,
>> +                fd,
>> +                &metadata,
>> +                None,
>> +            )
>> +            .await
>> +        }
> 
> a few questions here:
> 
> 'caching_enabled' is IMHO a bit misleading as this is only used for non 
> file entries?

No, this is used for all types of entries!
The only distinction made for cache entries is whether they are 
directories or not. For directories, entry and exit point have to be 
considered, in order to generate a consistent archive.

Maybe above code path is not so clear. For non file entries, when 
caching is enabled, they are cached, otherwise encoded. For file entries 
one needs to first check if they break with the caching, as they might 
be not reusable. Therefore the distiction above.

If not in caching mode, simply reencode the entry.
> 
> personally i find the split handling of caching vs flushing here and in 
> cache_or_flush_entries
> a bit confusing:
> 
> hardlinks/files are handled there, while dirs and all other filetypes 
> are handled here?
> why ?
> 
> i'd rather have all types handled there, wihch would make the code here 
> much more readable
> and having the code for all types there should also not be that ugly

I agree that this is not the most readable code, it was a bit hard to 
cover all the possible edge cases one might encounter during caching and 
therefore grew rather organically instead of having a clear vision on 
where to go. Let me see if the factoring out of some of the logic to a 
dedicated `PxarLookaheadCache` might help to improves this.