[PVE-User] pbs incremental backups
lists at merit.unu.edu
Wed Mar 3 08:48:20 CET 2021
Hi Arjen, Andreas, and especially Fabian, for your elaborate reply,
Thanks! It all makes much more sense now.
As said before, PBS is a great addition to the proxmox line of products.
On 3/2/21 11:48 AM, Fabian Grünbichler wrote:
> On March 2, 2021 11:22 am, mj wrote:
>> Testing PBS backups taken from PVE VMs on ceph rbd now. Very nice, very
>> quick, very cool. :-)
>> We have a question. Something we wonder about.
>> In our current backup software, we make weekly full_system backups, and
>> daily incremental_system backups, each incremental based on the same
>> full_system backup. So: each daily incremental backup becomes bigger,
>> until the weekend. Then we make a new full_system backup to base the
>> next set of incrementals on.
>> In PBS I cannot specify if a backup is full or incremental. We assume
>> this means that automatically the first backup is a full_system backup,
>> and subsequent backups are incremental. The PBS backup logs confirm this
>> assumption, saying: "scsi0: dirty-bitmap status: created new" vs "scsi0:
>> dirty-bitmap status: OK (7.3 GiB of 501.0 GiB dirty)"
>> And now the question: At what point in time is a new full_system backup
>> created, to rebase incremental backups on?
>> Or is each incremental backup based on the previous incremental? And if
>> that is the case, how will we ever be able to delete one of the
>> in-between incrementals, because that would then break the whole chain of
>> incrementals?
>> We have read the page
>> https://qemu.readthedocs.io/en/latest/interop/bitmaps.html but it does
>> not seem to answer this.
>> Anyone care to share some insight on this logic and how PBS works?
> you might want to take a look at
> but, the short summary:
> PBS does not do full or incremental backups in the classical sense, it
> uses a chunk-based deduplicated approach. the backup content is split
> into chunks, those chunks are then hashed to get a chunk ID. the
> incremental part happens on different levels:
> - if the backup is of a VM that has been backed up before, and that
> previous backup still exists on the server, and the VM has not been
> stopped in the meantime, only chunks which contain changed blocks
> (tracked by Qemu with a dirty bitmap) are read, hashed and uploaded,
> the rest is re-used. (FAST incremental)
> - for all backups, if a previous backup exists, its index is
> downloaded, all local data is read and hashed, but only chunks which
> are missing on the server are actually uploaded (incremental)
> - if no previous backup exists, all local data is read and hashed and
> uploaded ("full" backup)
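
The three cases in the list above can be sketched as a toy model. This is only an illustration of the logic as described, not the actual PBS client: the 4-byte chunk size, the use of SHA-256, and the names `chunk_id`, `backup`, `previous_index` and `dirty` are assumptions made for the example.

```python
import hashlib

CHUNK_SIZE = 4  # toy value so the example stays readable (assumption)


def chunk_id(chunk: bytes) -> str:
    """Hash a chunk's content to obtain its ID."""
    return hashlib.sha256(chunk).hexdigest()


def backup(disk: bytes, previous_index=None, dirty=None):
    """Return (index, stats) for one backup run.

    previous_index: chunk IDs of the previous snapshot, or None if there is none.
    dirty: set of chunk slots changed since the previous backup, or None if
           no valid dirty bitmap is available (e.g. the VM was stopped).
    """
    stats = {"read": 0, "uploaded": 0}
    known = set(previous_index or [])
    fast = previous_index is not None and dirty is not None
    index = []
    for pos in range(0, len(disk), CHUNK_SIZE):
        slot = pos // CHUNK_SIZE
        if fast and slot not in dirty and slot < len(previous_index):
            # Fast incremental: clean chunk, re-use the old ID without reading.
            index.append(previous_index[slot])
            continue
        cid = chunk_id(disk[pos:pos + CHUNK_SIZE])
        stats["read"] += 1
        if cid not in known:
            # Not referenced by the previous index, so it has to be uploaded.
            stats["uploaded"] += 1
        index.append(cid)
    return index, stats


# "Full" backup: no previous snapshot, everything is read and uploaded.
idx1, full = backup(b"AAAABBBBCCCCDDDD")
# Incremental: everything is read and hashed, only the changed chunk is uploaded.
idx2, incr = backup(b"AAAAXXXXCCCCDDDD", previous_index=idx1)
# Fast incremental: only the chunk marked dirty is read at all.
idx3, fast = backup(b"AAAAXXXXYYYYDDDD", previous_index=idx2, dirty={2})
```

Note that every run returns a complete index covering the whole disk, regardless of how much data was read or uploaded to produce it.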
> additionally, an uploaded chunk might still exist on the server (e.g.,
> from backups in another backup group), in which case the server will
> still re-use the existing chunk.
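
The server-side re-use can be pictured as a content-addressed store that skips the write when a chunk with the same ID is already present. A minimal sketch, assuming SHA-256 as the chunk ID; the `ChunkStore` class and its `insert` method are hypothetical names for this example, not the PBS API.

```python
import hashlib


class ChunkStore:
    """Toy content-addressed chunk store (hypothetical, for illustration)."""

    def __init__(self):
        self.chunks = {}  # chunk ID -> chunk data
        self.writes = 0   # how many chunks were actually written

    def insert(self, data: bytes) -> str:
        digest = hashlib.sha256(data).hexdigest()
        if digest not in self.chunks:
            # Only unknown chunks cause a write on the server.
            self.chunks[digest] = data
            self.writes += 1
        return digest


store = ChunkStore()
# Two different backup groups that happen to share one identical chunk.
index_vm_a = [store.insert(c) for c in (b"os-image", b"data-a")]
index_vm_b = [store.insert(c) for c in (b"os-image", b"data-b")]
```

Four chunks were uploaded in total, but the shared one is stored only once, and both indexes point at the same chunk ID for it.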
> so, fast incremental does the least work, incremental does full reading
> and hashing but less uploading, and we try to avoid unnecessary writes
> on the server side in all cases. all of the above is also true when you
> add in encryption, although obviously changing encryption mode or keys
> will invalidate previous backups for purposes of reusing chunks (and
> thus lead to a single full backup even if previous ones exist).
> on the server side, a backup snapshot does not consist of a base and a
> series of incremental diffs, it always references all the chunks that
> represent this full snapshot. the magic is in the chunking and
> deduplication, which allows us to store all those snapshots efficiently.
> ALL snapshots are equivalent, whether it was the first one or not has no
> bearing on how the snapshot or its referenced data is stored, just on
> how much work it was to create and transfer it in the first place.
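
This also answers the question about deleting an in-between backup: since every snapshot references all of its chunks, removing one snapshot cannot break the others. A minimal sketch under simplifying assumptions: the dictionaries, `put`, `restore` and `garbage_collect` are hypothetical names for this example, and the cleanup shown is a naive "drop unreferenced chunks" pass rather than the real server implementation.

```python
import hashlib


def put(store: dict, data: bytes) -> str:
    """Store a chunk under its content hash and return the chunk ID."""
    digest = hashlib.sha256(data).hexdigest()
    store[digest] = data
    return digest


def restore(store: dict, index: list) -> bytes:
    """Rebuild a full image from one snapshot's index alone."""
    return b"".join(store[d] for d in index)


def garbage_collect(store: dict, snapshots: dict) -> None:
    """Drop chunks that no remaining snapshot references."""
    referenced = {d for index in snapshots.values() for d in index}
    for digest in list(store):
        if digest not in referenced:
            del store[digest]


store = {}
snapshots = {
    "day1": [put(store, c) for c in (b"AAAA", b"BBBB")],
    "day2": [put(store, c) for c in (b"AAAA", b"XXXX")],
    "day3": [put(store, c) for c in (b"AAAA", b"YYYY")],
}

# Remove the in-between snapshot, then clean up unreferenced chunks.
del snapshots["day2"]
garbage_collect(store, snapshots)

day1 = restore(store, snapshots["day1"])
day3 = restore(store, snapshots["day3"])
```

Only the chunk that was unique to the deleted snapshot disappears; the first and last snapshots still restore completely.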