[pve-devel] [PATCH storage] Use bigger timeouts for zfs operations

Fabian Grünbichler f.gruenbichler at proxmox.com
Thu Oct 10 10:29:46 CEST 2019


On October 10, 2019 8:55 am, Fabian Ebner wrote:
> On 10/1/19 12:28 PM, Fabian Grünbichler wrote:
>> On October 1, 2019 12:17 pm, Fabian Ebner wrote:
>>> Seems like 'zfs destroy' can take longer than 5 seconds, see [0].
>>> I changed the timeout to 15 seconds and also changed the default
>>> timeout to 10 instead of 5 seconds, to be on the safe side
>>> for other commands like 'zfs create'.
>>>
>>> [0]: https://forum.proxmox.com/threads/timeout-beim-l%C3%B6schen-des-entfernen-replikats-bei-entfernung-der-replikation.58467/
>> NAK, we have a 30s timeout for synchronous API requests that call this,
>> and they might do more than 2 zfs_requests. we'd rather hit the zfs
>> timeout early and still have time to do error handling, than finish
>> successfully sometimes and die without cleanup other times.
>>
>> the real solution for this is to convert the remaining synchronous API
>> calls that trigger storage operations to async ones, and switch the GUI
>> over to use the new variants. we had previous discussions about this
>> issue already. to implement it really nicely, we'd need two things:
>> - tasks with structured return value (to implement async content listing
>>    and similar operations, and make conversion of sync to async API calls
>>    easier)
>> - light-weight / ephemeral tasks (not included in regular task
>>    lists/log, cleaned up after final poll / 1 day after finishing / ...)
>>
>> in case of the mentioned report, please investigate whether this call in
>> 'pvesr' context is wrongly treated as a sync API call (i.e., is_worker
>> should maybe return true for pvesr calls?)
> I looked at setting the worker flag for the (whole) pvesr environment,
> but it seems a bit much. E.g. in prepare we need to delete stale
> snapshots before starting the replication, and we probably don't want
> to wait too long for that to complete, i.e. we shouldn't have the
> worker flag set when calling volume_snapshot_delete.
> 
> The situation is different with the vdisk_free call, since there we
> don't need to sit and wait for the result, so it would make sense to
> treat that one as a worker. We could set the worker flag locally before
> that call and use our own timeout, but it feels like a bad workaround.
> Also, we can't use run_with_timeout to set our own timeout, since
> run_command is called inside vdisk_free and resets the alarm.

but we could use run_fork_with_timeout ;)
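
something along these lines (rough, untested sketch - variable names are
just placeholders for whatever pvesr has in scope there):

    use PVE::Tools;

    # run vdisk_free in a forked child with our own timeout. the alarm
    # that run_command sets up inside vdisk_free then only affects the
    # child, so it cannot clobber the timeout we use in the parent.
    my $res = PVE::Tools::run_fork_with_timeout(30, sub {
        PVE::Storage::vdisk_free($storecfg, $volid);
    });

(exact return value / error reporting of run_fork_with_timeout is from
memory and would need double-checking)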

> Since we already have a mechanism to spawn a worker in vdisk_free
> (currently only used by LVM with saferemove), wouldn't it make sense to
> use that for ZFS as well? I.e. have free_image in the ZFS plugin return
> a subroutine instead of executing 'zfs destroy' directly. Other callers
> of vdisk_free should also be fine with such a change, since it can
> already happen that vdisk_free spawns a worker (but I haven't looked in
> detail).
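
for reference, that mechanism would boil down to something like the
following in ZFSPoolPlugin.pm (sketch only - signature from memory, and
it ignores the base volume handling free_image currently does):

    sub free_image {
        my ($class, $storeid, $scfg, $volname, $isBase) = @_;

        my (undef, $name) = $class->parse_volname($volname);

        # returning a code ref instead of removing the volume here makes
        # vdisk_free run the removal in an 'imgdel' worker task
        return sub {
            $class->zfs_delete_zvol($scfg, $name);
        };
    }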

but that worker path is the exception today, not the default. I'd rather
not switch all ZFS vdisk removals to forking a task if there are other
solutions.

AFAICT, this vdisk_free only happens for volumes that were previously
replicated but no longer are? a simple solution might be to re-factor
139ff of pvesr.pm to fork a worker if any vdisks need to be removed, and
remove them all asynchronously in a single task?
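
very rough sketch of what I mean (placeholder variable names, and
assuming PVE::Storage and PVE::RPCEnvironment are already use'd there):

    # collect the volids of no-longer-replicated volumes first, then
    # remove them all in one background task instead of one by one
    if (scalar(@volids_to_remove)) {
        my $rpcenv = PVE::RPCEnvironment::get();
        my $authuser = $rpcenv->get_user();

        $rpcenv->fork_worker('imgdel', undef, $authuser, sub {
            PVE::Storage::vdisk_free($storecfg, $_) for @volids_to_remove;
        });
    }

inside the worker, is_worker() is true, so zfs_request should pick the
long 60*60 timeout, and pvesr itself no longer blocks on 'zfs destroy'.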

OTOH, if vdisk_free runs into this problem, it's only a question of load
until other zfs_requests (listing volumes or snapshots, creating
snapshots, removing snapshots) also run into the low timeouts.. so the
big question still remains - do we want to increase those timeouts in
general for pvesr, and if we do, how do we do it?

> It would introduce a bit of noise, since it would create a task on
> every vdisk_free for a ZFS volume. The light-weight tasks you mentioned
> would be ideal there, but for now we don't have those. And maybe the
> timeout is hit so rarely that it's not worth the change - what do you
> think?
> 
>>> Signed-off-by: Fabian Ebner <f.ebner at proxmox.com>
>>> ---
>>>   PVE/Storage/ZFSPoolPlugin.pm | 4 ++--
>>>   1 file changed, 2 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/PVE/Storage/ZFSPoolPlugin.pm b/PVE/Storage/ZFSPoolPlugin.pm
>>> index f66b277..3ce06be 100644
>>> --- a/PVE/Storage/ZFSPoolPlugin.pm
>>> +++ b/PVE/Storage/ZFSPoolPlugin.pm
>>> @@ -182,7 +182,7 @@ sub zfs_request {
>>>       my $msg = '';
>>>       my $output = sub { $msg .= "$_[0]\n" };
>>>   
>>> -    $timeout = PVE::RPCEnvironment->is_worker() ? 60*60 : 5 if !$timeout;
>>> +    $timeout = PVE::RPCEnvironment->is_worker() ? 60*60 : 10 if !$timeout;
>>>   
>>>       run_command($cmd, errmsg => "zfs error", outfunc => $output, timeout => $timeout);
>>>   
>>> @@ -346,7 +346,7 @@ sub zfs_delete_zvol {
>>>   
>>>       for (my $i = 0; $i < 6; $i++) {
>>>   
>>> -	eval { $class->zfs_request($scfg, undef, 'destroy', '-r', "$scfg->{pool}/$zvol"); };
>>> +	eval { $class->zfs_request($scfg, 15, 'destroy', '-r', "$scfg->{pool}/$zvol"); };
>>>   	if ($err = $@) {
>>>   	    if ($err =~ m/^zfs error:(.*): dataset is busy.*/) {
>>>   		sleep(1);
>>> -- 
>>> 2.20.1