[pbs-devel] Some problems with fullstorage and cleanup

Mon Aug 31 15:10:57 CEST 2020

On August 31, 2020 2:32 pm, Harald Leithner wrote:
> 
> 
> On 2020-08-31 13:39, Fabian Grünbichler wrote:
>> On August 31, 2020 1:14 pm, Harald Leithner wrote:
>>> Hi,
>>>
>>> my test storage ran out of disk space.
>>>
>>> This happens on version 0.8.11.
>>>
>>> I tried to forget the older snapshots but this didn't change the disk
>>> usage. After this I started a manual GC (I assume it's garbage
>>> collection). It failed after phase one with an error message. I didn't
>>> copy the error message... (I miss the log panel in the GUI).
>>>
>>> I also configured GC and Prune Schedules at the same time and told the
>>> VM that backs up to the PBS to keep only 20 copies.
>>>
>>> After a while I came back to the GUI and had only 3 snapshots left and
>>> one backup in progress (that's maybe correct because of 1 yearly, 1
>>> monthly, 1 daily and 2 last).
>>> The Statistics Tab still says 100% usage and "zfs list" lists the same
>>> usage.
>>>
>>> Starting now a manual GC ends in the error message:
>>>
>>> unable to parse active worker status
>>> 'UPID:backup-erlach3-test:00000400:0000036A:00000000:5F4CD753:termproxy::root:'
>>> - not a valid user id
>> 
>> this was a known issue that should be fixed on upgrade to 0.8.11-1. can 
>> you run 'grep termproxy /var/log/apt/term.log' on the PBS server?
>> 
> 
> the only entry is "Fixing up termproxy user id in task log..."
> btw. my first version was 0.8.9-1, not 0.8.11; I upgraded to that
> version later
> 
>> you can fixup the task index by running the sed command from 
>> 
>> /var/lib/dpkg/info/proxmox-backup-server.postinst
>> 
>> which replaces the invalid user 'root' with the correct 'root@pam'
>> 
> 
> ok after running the sed command manually the GC works again.
> 
> but it complains about no disk space:
> 
> 2020-08-31T14:29:16+02:00: WARN: warning: unable to access chunk
> 135e565dc79f80d3a9980688bfe161409bf229fb4d11ab7290b5b1e58b27bc63,
> required by "/test3/vm/3011/2020-08-30T22:00:02Z/drive-scsi1.img.fidx" -
> update atime failed for chunk
> "/test3/.chunks/135e/135e565dc79f80d3a9980688bfe161409bf229fb4d11ab7290b5b1e58b27bc63"
> - ENOSPC: No space left on device

if you don't have enough space to touch a chunk, that is rather bad.. 
you can attempt to free up some more space by deleting backup metadata 
of snapshots you no longer need, either by 'rm'-ing the directory that 
represents them, or by using 'forget' in the GUI if that works..
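
to illustrate what "touching" a chunk means here, below is a minimal 
Python sketch. it is not the PBS implementation; the helper name 
touch_chunk is made up for this example, and it only assumes that the 
touch updates the access time, as the "update atime failed" message in 
the log suggests. it reports ENOSPC instead of raising it:

```python
import errno
import os
import time


def touch_chunk(path):
    """Update only the access time of a file; return True on success.

    Hypothetical helper for illustration, not taken from PBS.
    """
    st = os.stat(path)
    try:
        # Set atime to "now" and write back the existing mtime unchanged.
        os.utime(path, ns=(time.time_ns(), st.st_mtime_ns))
    except OSError as err:
        if err.errno == errno.ENOSPC:
            # The storage is so full that even this metadata update
            # cannot be written (the situation shown in the log above).
            return False
        raise
    return True
```

a return value of False for a chunk path would correspond to the 
warning in the GC log quoted above.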

what does 'df -m /test3' report? and/or the equivalent command for 
whatever storage the datastore is on (e.g., zfs list -o space 
path/to/dataset).
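
if it is easier to script, roughly the same numbers as 'df -m' can be 
read with Python's standard library. this is only a sketch: the 
function name space_report_mib is an assumption for this example, and 
it does not replace 'zfs list -o space', which also breaks usage down 
by snapshots and child datasets.

```python
import shutil

MIB = 1024 * 1024


def space_report_mib(path):
    """Return total, used and free space of the filesystem holding path, in MiB."""
    usage = shutil.disk_usage(path)
    return {
        "total": usage.total // MIB,
        "used": usage.used // MIB,
        "free": usage.free // MIB,
    }
```

calling space_report_mib("/test3") on the PBS server would give the 
values for the datastore from this thread.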