[pbs-devel] [PATCH proxmox-backup] chunk store: handle insertion edge cases

Fabian Grünbichler f.gruenbichler at proxmox.com
Fri Mar 31 10:43:45 CEST 2023

these were previously called out in a comment, but should now be handled (as
much as they can be).

the performance impact shouldn't be too bad, since we only look at the magic 8
bytes at the start of the existing chunk (we already did a stat on it, so that
might even be prefetched already by storage), and only if there is a size
mismatch and encryption is enabled.

Signed-off-by: Fabian Grünbichler <f.gruenbichler at proxmox.com>

    we could verify the CRC when deciding between existing and incoming encrypted
    chunk, but that would require reading the full chunk, and that would be quite
    the load if we ever upgrade ZSTD or change its parameters and the new version
    is substantially better or worse at compressing.. the CRC is verified on
    every regular load (such as verify, sync, restore) anyway.
    we cannot decide which of the two potentially invalid encrypted chunks to keep
    based on the size and compression status: uncompressed chunks should always
    be bigger than compressed ones, but both the size and the compression bit is
    100% under a potential attacker's control anyhow, so we cannot prevent them
    from sending us rather small chunks that we still need to discard out of
    caution, even if they are smaller than the existing one.

 pbs-datastore/src/chunk_store.rs | 36 ++++++++++++++++++++++++++------
 pbs-datastore/src/data_blob.rs   |  6 ++++++
 2 files changed, 36 insertions(+), 6 deletions(-)

diff --git a/pbs-datastore/src/chunk_store.rs b/pbs-datastore/src/chunk_store.rs
index 1944ae00..ff1229bc 100644
--- a/pbs-datastore/src/chunk_store.rs
+++ b/pbs-datastore/src/chunk_store.rs
@@ -5,6 +5,7 @@ use std::sync::{Arc, Mutex};
 use anyhow::{bail, format_err, Error};
 use pbs_api_types::{DatastoreFSyncLevel, GarbageCollectionStatus};
+use proxmox_io::ReadExt;
 use proxmox_sys::fs::{create_dir, create_path, file_type_from_file_stat, CreateOptions};
 use proxmox_sys::process_locker::{
     ProcessLockExclusiveGuard, ProcessLockSharedGuard, ProcessLocker,
@@ -12,6 +13,9 @@ use proxmox_sys::process_locker::{
 use proxmox_sys::task_log;
 use proxmox_sys::WorkerTaskContext;
+use crate::file_formats::{
 use crate::DataBlob;
 /// File system based chunk store
@@ -460,13 +464,33 @@ impl ChunkStore {
                 return Ok((true, old_size));
             } else if old_size == 0 {
                 log::warn!("found empty chunk '{digest_str}' in store {name}, overwriting");
+            } else if chunk.is_encrypted() {
+                // incoming chunk is encrypted, possible attack or hash collision!
+                let mut existing_file = std::fs::File::open(&chunk_path)?;
+                let magic = existing_file.read_exact_allocated(8)?;
+                // going from unencrypted to encrypted can never be right, since the digest
+                // includes data derived from the encryption key
+                if magic == UNCOMPRESSED_BLOB_MAGIC_1_0 || magic == COMPRESSED_BLOB_MAGIC_1_0 {
+                    bail!("Overwriting unencrypted chunk '{digest_str}' on store '{name}' with encrypted chunk with same digest not allowed!");
+                }
+                // if both chunks are uncompressed and encrypted and have the same digest, but
+                // their sizes are different, one of them *must* be invalid
+                if magic == ENCRYPTED_BLOB_MAGIC_1_0 && !chunk.is_compressed() {
+                    bail!("Overwriting existing (encrypted) chunk '{digest_str}' on store '{name}' is not allowed!")
+                }
+                // we can't tell if either chunk is invalid if either or both of them are
+                // compressed, the size mismatch could be caused by different zstd versions
+                // so let's keep the one that was uploaded first, bit-rot is hopefully detected by
+                // verification at some point..
+                return Ok((true, old_size));
+            } else if old_size < encoded_size {
+                log::debug!("Got another copy of chunk with digest '{digest_str}', existing chunk is smaller, discarding uploaded one.");
+                return Ok((true, old_size));
             } else {
-                // other sizes can happen in legitimate and illegitimate ways:
-                //  - illegitimate: encryped chunks and bad actor client
-                //  - legitimate: same chunk but newer zstd version (or compression level) can
-                //    compress it better (or worse) so the
-                //  Ideally we could take the actual smaller chunk so that improved zstd tech gets
-                //  leveraged, but we could only allow to do that for un-encrypted ones.
+                log::debug!("Got another copy of chunk with digest '{digest_str}', existing chunk is bigger, replacing with uploaded one.");
diff --git a/pbs-datastore/src/data_blob.rs b/pbs-datastore/src/data_blob.rs
index 9c47bd45..f37c7a34 100644
--- a/pbs-datastore/src/data_blob.rs
+++ b/pbs-datastore/src/data_blob.rs
@@ -295,6 +295,12 @@ impl DataBlob {
         magic == &ENCR_COMPR_BLOB_MAGIC_1_0 || magic == &ENCRYPTED_BLOB_MAGIC_1_0
+    /// Returns if chunk is compressed
+    pub fn is_compressed(&self) -> bool {
+        let magic = self.magic();
+        magic == &ENCR_COMPR_BLOB_MAGIC_1_0 || magic == &COMPRESSED_BLOB_MAGIC_1_0
+    }
     /// Verify digest and data length for unencrypted chunks.
     /// To do that, we need to decompress data first. Please note that

More information about the pbs-devel mailing list