[pve-devel] [PATCH storage] Use bigger timeouts for zfs operations

Fabian Ebner f.ebner at proxmox.com
Thu Oct 10 08:55:26 CEST 2019


On 10/1/19 12:28 PM, Fabian Grünbichler wrote:
> On October 1, 2019 12:17 pm, Fabian Ebner wrote:
>> Seems like 'zfs destroy' can take longer than 5 seconds, see [0].
>> I changed the timeout to 15 seconds and also changed the default
>> timeout to 10 instead of 5 seconds, to be on the safe side
>> for other commands like 'zfs create'.
>>
>> [0]: https://forum.proxmox.com/threads/timeout-beim-l%C3%B6schen-des-entfernen-replikats-bei-entfernung-der-replikation.58467/
> NAK, we have a 30s timeout for synchronous API requests that call this,
> and they might do more than 2 zfs_requests. we'd rather timeout and have
> time to do error handling, than finish successfully sometimes, and die
> without cleanup other times.
>
> the real solution for this is to convert the remaining synchronous API
> calls that trigger storage operations to async ones, and switch the GUI
> over to use the new variants. we had previous discussions about this
> issue already. to implement it really nice, we'd need to things:
> - tasks with structured return value (to implement async content listing
>    and similar operations, and make conversion of sync to async API calls
>    easier)
> - light-weight / ephemeral tasks (not included in regular task
>    lists/log, cleaned up after final poll / 1 day after finishing / ...)
>
> in case of the mentioned report, please investigate whether this call in
> 'pvesr' context is wrongly treated as sync API call (i.e., is_worker
> should maybe return true for pvesr calls?)
I looked at setting the worker flag for the (whole) pvesr environment, 
but it seems a
bit much. E.g. in prepare we need to delete stale snapshots before 
starting the replication
and we probably don't want to wait all too long for that to complete, 
i.e. not
have the worker flag when calling volume_snapshot_delete.

The situation is different with the vdisk_free call, since there we 
don't need to
sit and wait for the result, so there it would make sense to be treated 
as a worker.
So we could set the worker flag locally before that call and use our own 
timeout,
but it feels like a bad workaround. Also we can't use run_with_timeout 
to set our own
timeout, since run_command is called inside vdisk_free, which resets the 
alarm.

Since we have a mechanism to spawn a worker in vdisk_free already 
(currently only
used by LVM with saferemove), wouldn't it make sense to use that for zfs 
as well? So
having free_image in the zfs plugin return a subroutine instead of 
executing zfs destroy
directly. Also other callers of vdisk_free should be fine with such a 
change, since it already
can happen that vdisk_free spawns a worker (but I haven't looked in detail).

It would introduce a bit of noise since it would create a task on every 
vdisk_free for a zfs volume.
There the light-weight tasks you mentioned would be ideal, but for now 
we don't have those.
And maybe the timeout is hit very rarely and so it's not worth the 
change, what do you think?

>> Signed-off-by: Fabian Ebner <f.ebner at proxmox.com>
>> ---
>>   PVE/Storage/ZFSPoolPlugin.pm | 4 ++--
>>   1 file changed, 2 insertions(+), 2 deletions(-)
>>
>> diff --git a/PVE/Storage/ZFSPoolPlugin.pm b/PVE/Storage/ZFSPoolPlugin.pm
>> index f66b277..3ce06be 100644
>> --- a/PVE/Storage/ZFSPoolPlugin.pm
>> +++ b/PVE/Storage/ZFSPoolPlugin.pm
>> @@ -182,7 +182,7 @@ sub zfs_request {
>>       my $msg = '';
>>       my $output = sub { $msg .= "$_[0]\n" };
>>   
>> -    $timeout = PVE::RPCEnvironment->is_worker() ? 60*60 : 5 if !$timeout;
>> +    $timeout = PVE::RPCEnvironment->is_worker() ? 60*60 : 10 if !$timeout;
>>   
>>       run_command($cmd, errmsg => "zfs error", outfunc => $output, timeout => $timeout);
>>   
>> @@ -346,7 +346,7 @@ sub zfs_delete_zvol {
>>   
>>       for (my $i = 0; $i < 6; $i++) {
>>   
>> -	eval { $class->zfs_request($scfg, undef, 'destroy', '-r', "$scfg->{pool}/$zvol"); };
>> +	eval { $class->zfs_request($scfg, 15, 'destroy', '-r', "$scfg->{pool}/$zvol"); };
>>   	if ($err = $@) {
>>   	    if ($err =~ m/^zfs error:(.*): dataset is busy.*/) {
>>   		sleep(1);
>> -- 
>> 2.20.1
>>
>>
>> _______________________________________________
>> pve-devel mailing list
>> pve-devel at pve.proxmox.com
>> https://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
>>
>>
> _______________________________________________
> pve-devel mailing list
> pve-devel at pve.proxmox.com
> https://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
>





More information about the pve-devel mailing list