[pbs-devel] [PATCH proxmox-backup v2 1/3] docs: centralise and update garbage collection description

Wed Apr 10 15:38:23 CEST 2024

On 4/8/24 11:20, Fabian Grünbichler wrote:
> On April 5, 2024 3:05 pm, Hannes Duerr wrote:
>> The "backup client usage" chapter describes a grace period that is 24
>> hours and 5 minutes long, and unconnected to this a cut-off time is
>> mentioned under "maintenance tasks", which leads to confusion. Therefore
>> we summarise the entire description of garbage collection under
>> "maintenance tasks" and link to it in the "backup client usage" chapter
>>
>> Signed-off-by: Hannes Duerr <h.duerr at proxmox.com>
>> ---
>>   docs/backup-client.rst | 16 +++---------
>>   docs/maintenance.rst   | 57 ++++++++++++++++++++++++++++++------------
>>   2 files changed, 44 insertions(+), 29 deletions(-)
>>
>> diff --git a/docs/backup-client.rst b/docs/backup-client.rst
>> index 00a1abbb..d015b844 100644
>> --- a/docs/backup-client.rst
>> +++ b/docs/backup-client.rst
>> @@ -735,25 +735,15 @@ command. It is recommended to carry out garbage collection on a regular basis.
>>   
>>   The garbage collection works in two phases. In the first phase, all
>>   data blocks that are still in use are marked. In the second phase,
>> -unused data blocks are removed.
>> +unused data blocks are removed. A more detailed description of the GC
>> +can be found :ref:`here <maintenance_gc>`.
>> +
>>   
>>   .. note:: This command needs to read all existing backup index files
>>     and touches the complete chunk-store. This can take a long time
>>     depending on the number of chunks and the speed of the underlying
>>     disks.
>>   
>> -.. note:: The garbage collection will only remove chunks that haven't been used
>> -   for at least one day (exactly 24h 5m). This grace period is necessary because
>> -   chunks in use are marked by touching the chunk which updates the ``atime``
>> -   (access time) property. Filesystems are mounted with the ``relatime`` option
>> -   by default. This results in a better performance by only updating the
>> -   ``atime`` property if the last access has been at least 24 hours ago. The
>> -   downside is that touching a chunk within these 24 hours will not always
>> -   update its ``atime`` property.
>> -
>> -   Chunks in the grace period will be logged at the end of the garbage
>> -   collection task as *Pending removals*.
>> -
>>   .. code-block:: console
>>   
>>     # proxmox-backup-client garbage-collect
>> diff --git a/docs/maintenance.rst b/docs/maintenance.rst
>> index 6dbb6941..e25c8f19 100644
>> --- a/docs/maintenance.rst
>> +++ b/docs/maintenance.rst
>> @@ -171,8 +171,8 @@ It's recommended to setup a schedule to ensure that unused space is cleaned up
>>   periodically. For most setups a weekly schedule provides a good interval to
>>   start.
>>   
>> -GC Background
>> -^^^^^^^^^^^^^
>> +Overview
>> +^^^^^^^^
>>   
>>   In `Proxmox Backup`_ Server, backup data is not saved directly, but rather as
>>   chunks that are referred to by the indexes of each backup snapshot. This
>> @@ -187,26 +187,51 @@ references to the same chunks on every snapshot deletion. Moreover, locking the
>>   entire datastore is not feasible because new backups would be blocked until the deletion
>>   process was complete.
>>   
>> -Therefore, Proxmox Backup Server uses a garbage collection (GC) process to
>> +Therefore, Proxmox Backup Server uses a `tracing garbage collection
>> +<https://en.wikipedia.org/wiki/Tracing_garbage_collection>`_ algorithm to
>>   identify and remove the unused backup chunks that are no longer needed by any
>> -snapshot in the datastore. The GC process is designed to efficiently reclaim
>> +snapshot in the datastore. The GC algorithm is designed to efficiently reclaim
>>   the space occupied by these chunks with low impact on the performance of the
>>   datastore or interfering with other backups.
>>   
>> -The garbage collection (GC) process is performed per datastore and is split
>> -into two phases:
>> +The GC is performed per datastore and is split into two phases:
>>   
>> -- Phase one: Mark
>> -  All index files are read, and the access time of the referred chunk files is
>> -  updated.
>> +- Phase one - Mark:
>> +
>> +  Read all index files and update the ``atime`` (access time) of the relevant
>> +  chunk files.
> I'd replace "relevant" with "referenced" here, it is more concrete and
> matches the terminology below
>
>> +
>> +- Phase two - Sweep:
>> +
>> +  Iterate over all chunks and check the ``atime`` of the files. If
>> +  the ``atime`` is older than the cut-off time, the chunk was neither
>> +  referenced in a backup index nor is it part of a running backup that
>> +  does not yet have an index to search. As such, safely remove the chunk.
> nor was it recently created as part of a running backup task, but is not
> referenced yet by any finished index file. Such chunks can be safely
> removed since they are no longer needed.
>
> (Safely remove implies that we do some special removing that is safe ;))
>
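To check my understanding of the two phases, here is a minimal Python
sketch. It is not how the server implements GC: the function names
(`mark`, `sweep`) and the flat chunk directory are assumptions made up
for illustration, and the cut-off is simply passed in as a timestamp.

```python
import os
import time


def mark(chunk_dir, referenced_digests):
    """Phase one: update the atime of every chunk referenced by an index."""
    now = time.time()
    for digest in referenced_digests:
        path = os.path.join(chunk_dir, digest)
        mtime = os.stat(path).st_mtime
        # Set the atime explicitly and keep the mtime unchanged.
        os.utime(path, (now, mtime))


def sweep(chunk_dir, cutoff):
    """Phase two: remove chunks whose atime is older than the cut-off."""
    removed = []
    for name in sorted(os.listdir(chunk_dir)):
        path = os.path.join(chunk_dir, name)
        if os.stat(path).st_atime < cutoff:
            os.remove(path)
            removed.append(name)
    return removed
```

A chunk that was not touched in phase one keeps its old atime and is
therefore the only kind of chunk that phase two can remove.
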
>> +
>> +
>> +Cut-off Time
>> +^^^^^^^^^^^^
>> +
>> +The GC only clears the chunks that were last accessed before the
> s/clears/removes/
>
>> +cut-off time. The cut-off time is determined by whichever is earlier:
> is determined *at the start of the GC task*
>
> this is an important detail that helps understanding for more
> technically inclined readers
>
>> +
>> +- 24 hours and 5 minutes before the start of the garbage collection
>> +  due to the mounting of the data storage with ``relatime``, or
> "before the start of .. due to" is a bit confusing. maybe:
>
> - 24 hours before the start of the garbage collection (to
>    account for the datastore potentially being mounted with ``relatime``).
>
>> +
>> +- the start time of the oldest active backup job that has been running
>> +  for longer than 24 hours and 5 minutes at the beginning of the
>> +  garbage collection. This is necessary because the newly created
>> +  backup could refer to blocks, but the GC would not notice this as
>> +  there is no index of the backup that could be searched.
> the whole "that has been" can be dropped. the cut off is determined by
> whichever is earlier:
> - now - 24h
> - start time of oldest backup writer
*
>
> with an extra 5m of safety margin added in any case - not just the 24h
> one!
>
> - the start time of the oldest active backup job (to account for newly
>    written chunks that are not yet referenced by any finished snapshot)
>
> is a bit shorter and IMHO conveys the same information
>
>> +
>> +Chunks accessed after the cut-off time are marked as *Pending removals*
>> +by the GC as it cannot be certain whether they are still needed.
> this is rather incomplete and a bit hard to parse as well. I'd replace
> "accessed after" with "with an atime after".
>
> pending is actually:
> - chunks with atime between the cut-off and the oldest writer (if one
>    exists)
At this point I am slightly confused, because we defined earlier that
the cut-off is the start of the oldest backup writer* (if one exists).

That would lead to the following:

- chunks with an atime between the cut-off (which is the start of the
  oldest existing writer) and the oldest writer (if one exists)

which does not make sense to me. Where is my mistake?

> - chunks with atime between the cut-off and the start of GC (if no
>    writer exists at the start)
>
> this normally means chunks of snapshots which have been recently
> forgotten/pruned. it can also mean freshly uploaded chunks of recently
> aborted backup tasks.
>
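For reference, this is how I read the two definitions when written out.
It is a minimal Python sketch based only on the comments above, not on
the actual implementation. All names are hypothetical, timestamps are
plain epoch seconds, and the 5-minute safety margin is applied in both
cases, as suggested.

```python
DAY = 24 * 3600          # relatime window mentioned above
SAFETY_MARGIN = 5 * 60   # extra margin, applied in every case


def cutoff_time(gc_start, oldest_writer_start=None):
    """Cut-off: the earlier of (GC start - 24h) and the oldest writer's start,
    minus the safety margin. Determined once, at the start of the GC task."""
    candidates = [gc_start - DAY]
    if oldest_writer_start is not None:
        candidates.append(oldest_writer_start)
    return min(candidates) - SAFETY_MARGIN


def classify(atime, gc_start, oldest_writer_start=None):
    """Return 'remove', 'pending' or 'keep' for a chunk with the given atime."""
    cutoff = cutoff_time(gc_start, oldest_writer_start)
    # Upper bound of the pending window: oldest writer if one exists,
    # otherwise the start of the GC.
    upper = gc_start if oldest_writer_start is None else oldest_writer_start
    if atime < cutoff:
        return "remove"
    if atime < upper:
        return "pending"
    return "keep"
```

If this reading is right, then in the case where the oldest writer
started more than 24 hours before the GC, the pending window shrinks to
just the 5-minute margin before that writer's start time.
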
>> +
>> +.. Note:: Mounting a volume with ``relatime`` means that the ``atime``
>> +   of the chunk files is not updated every time, but only when the
>> +   data has changed or the ``atime`` was before a certain time,
>> +   which is 24 hours by default.
>>   
>> -- Phase two: Sweep
>> -  The task iterates over all chunks, checks their file access time, and if it
>> -  is older than the cutoff time (i.e., the time when GC started, plus some
>> -  headroom for safety and Linux file system behavior), the task knows that the
>> -  chunk was neither referred to in any backup index nor part of any currently
>> -  running backup that has no index to scan for. As such, the chunk can be
>> -  safely deleted.
>>   
>>   Manually Starting GC
>>   ^^^^^^^^^^^^^^^^^^^^
>> -- 
>> 2.39.2
>>
>>
>>
>> _______________________________________________
>> pbs-devel mailing list
>> pbs-devel at lists.proxmox.com
>> https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel
>>
>>
>>