[pve-devel] [RFC storage] zfspoolplugin: check if mounted instead of imported

Thomas Lamprecht t.lamprecht at proxmox.com
Thu Feb 18 14:34:38 CET 2021


On 18.02.21 14:15, Stoiko Ivanov wrote:
> Thanks for looking at the patch!
> 
> On Thu, 18 Feb 2021 13:20:55 +0100
> Thomas Lamprecht <t.lamprecht at proxmox.com> wrote:
> 
>> On 18.02.21 12:33, Stoiko Ivanov wrote:
>>> This patch addresses an issue we recently saw on a production machine:
>>> * after booting a ZFS pool failed to get imported (due to an empty
>>>   /etc/zfs/zpool.cache)
>>> * pvestatd/guest-startall eventually tried to import the pool
>>> * the pool was imported, yet the datasets of the pool remained
>>>   not mounted
>>>
>>> A bit of debugging showed that `zpool import <poolname>` is not
>>> atomic, in fact it does fork+exec `mount` with appropriate parameters.
>>> If an import ran longer than the hardcoded timeout of 15s, it could
>>> happen that the pool got imported, but the zpool command (and its
>>> forks) got terminated due to timing out.
>>>
>>> reproducing this is straight-forward by setting (drastic) bw+iops
>>> limits on a guests' disk (which contains a zpool) - e.g.:
>>> `qm set 100 -scsi1 wd:vm-100-disk-1,iops_rd=10,iops_rd_max=20,\
>>> iops_wr=15,iops_wr_max=20,mbps_rd=10,mbps_rd_max=15,mbps_wr=10,mbps_wr_max=15`
>>> afterwards running `timeout 15 zpool import <poolname>` resulted in
>>> that situation in the guest on my machine
>>>
>>> The patch changes the check in activate_storage for the ZFSPoolPlugin,
>>> to check if the configured 'pool' (which can also be a sub-dataset) is
>>> mounted (via `zfs get mounted <dataset>`).
>>>
>>> * In case this errors out we try to import the pool (and run `zfs
>>>   mount -a` if the dataset is not mounted after importing)
>>> * In case it succeeds but reports the dataset as not mounted,
>>>   we run `zfs mount -a`
>>>
>>> The choice of running `zfs mount -a` has potential for regression, in
>>> case someone manually umounts some dataset after booting the system
>>> (but does not set the canmount option)  
>>
>> Why not
>> # zfs mount <dataset>
>> ?
> That would only mount the dataset and not anything beneath it, which
> is different from what `zpool import` does (without the -N option).
> AFAICT there is no way to recursively mount everything below a dataset
> (short of manually walking through the tree)
> 

seems like an actual valuable feature ZFS could/should gain someday (IMO).

> While we mitigated those issues with container-datasets recently (by
> explicitly mounting them before starting the container) - this does not
> hold true for manually created datasets (e.g. for setting different
> properties for certain guests, or for storing backups....)
> 
> (in case of the original reproducer, there was a PBS-datastore defined on
> a sub-dataset)

OK, then let's keep going for `zfs mount -a` now.

>>
>>>
>>> Signed-off-by: Stoiko Ivanov <s.ivanov at proxmox.com>
>>> ---
>>> * sending as RFC since I have the feeling that I miss quite a few corner-cases.
>>> * an alternative to getting the mounted property via `zfs get` might be
>>>   parsing /proc/mounts (but zfs get mounted does mostly just that +stating
>>>   files in /sys, and 2 (potentially blocking) ioctls
>>> * could reproduce the issue with ZFS 2.0.1 and 0.8.6 - so my current guess
>>>   is the issue on the production box might be related to some other
>>>   performance regression in its disk-subsystem (maybe introduced with the
>>>   new kernel)
>>> * huge thanks to Dominik for many suggestions (including the idea of
>>>   limiting the disk-speed via our stack instead of dm-delay :)
>>>
>>>  PVE/Storage/ZFSPoolPlugin.pm | 25 +++++++++++++++++++++----
>>>  1 file changed, 21 insertions(+), 4 deletions(-)
>>>
>>> diff --git a/PVE/Storage/ZFSPoolPlugin.pm b/PVE/Storage/ZFSPoolPlugin.pm
>>> index 105d802..32ad1c9 100644
>>> --- a/PVE/Storage/ZFSPoolPlugin.pm
>>> +++ b/PVE/Storage/ZFSPoolPlugin.pm
>>> @@ -525,8 +525,17 @@ sub activate_storage {
>>>      my ($class, $storeid, $scfg, $cache) = @_;
>>>  
>>>      # Note: $scfg->{pool} can include dataset <pool>/<dataset>
>>> -    my $pool = $scfg->{pool};
>>> -    $pool =~ s!/.*$!!;
>>> +    my $dataset = $scfg->{pool};
>>> +    my $pool = ($dataset =~ s!/.*$!!r);
>>> +
>>> +    my $dataset_mounted = sub {
>>> +	my $mounted = eval { $class->zfs_get_properties($scfg, 'mounted', "$dataset") };  
>>
>> quotes don't do anything here, just gets you a "undefined value in string ..." warning
>> if undefined (and underlying run_command does not change behavior either).
>>
>> I.e.,
>>
>> # perl -we 'use PVE::Tools qw(run_command); my $arg; run_command(["touch", "$arg"])'
>> Use of uninitialized value $arg in string at -e line 1.
>> touch: cannot touch '': No such file or directory
>> command 'touch ''' failed: exit code 1
>>
>> vs.
>>
>> # perl -we 'use PVE::Tools qw(run_command); my $arg; run_command(["touch", $arg])'
>> Use of uninitialized value $_[1] in exec at /usr/share/perl/5.28/IPC/Open3.pm line 178.
>> touch: cannot touch '': No such file or directory
>> command 'touch ''' failed: exit code 1
>>
>> I like to avoid using quotes in such cases to avoid suggesting that they have
>> any use.
> sorry - that was a bit shoddy copying - will fix that in a v2
> 
>>
>>> +	if ($@) {
>>> +	    warn "$@\n";
>>> +	    return undef;
>>> +	}  
>>
>> I know this is just copied, but both should just use a simple
>> warn "$@\n" if ($@);
>>
>> as the return is correctly using a defined check anyway
> sounds good
> 
>>
>>
>>> +	return defined($mounted) && $mounted =~ m/^yes$/;
>>> +    };
>>>  
>>>      my $pool_imported = sub {
>>>  	my @param = ('-o', 'name', '-H', "$pool");
>>> @@ -538,13 +547,21 @@ sub activate_storage {
>>>  	return defined($res) && $res =~ m/$pool/;
>>>      };
>>>  
>>> -    if (!$pool_imported->()) {
>>> +    if (!defined($dataset_mounted->())) {  
>>
>> don't complicated boolean, drop the defined.
> in that case the return value of $dataset_mounted->() is ternary:

Then the more NAK'd for my taste, that method being ternary is IMO
not nice and sensible.

Please, make it boolean (in the perverse sense perl defines bool)

> * undefined - error during `zfs get` -> we most likely need to import the
>   pool
> * false - pool is imported, but dataset is not mounted -> we need to mount
>   only (the elsif branch)

But if we get false here, we go straight to `return 1;`, i mean we go directly
to that return 1 in any case the return value from "is mounted" is defined??

Or am I overlooking something?

> * true - everything as it should be
> 
> coming to think about it:
> this would be problematic, if the storage has a 'pool', which points to a 
> dataset configured with 'canmount=off' - see the description of 'canmount'
> in zfsprops(8) for when this might make sense):
> with the current check (if the pool is imported) this would work fine, with
> the patch in place this would result in each activate_storage() to call zfs
> mount -a. 
> My expectation is that this does not happen in the wild (manually setting
> 'canmount' to off for the root-dataset of a storage) - but it is
> technically possible - but maybe I'm missing the use-case for this?

I mean the use case is the same as for a non-root dataset, having it for
property inheriting only, surely niche though.

If you really want to care for that you could also get that property?
But I'd rather ignore it for now..

> 
>>
>>> +	# error from zfs get mounted - pool import necessary
>>>  	# import can only be done if not yet imported!
>>>  	my @param = ('-d', '/dev/disk/by-id/', '-o', 'cachefile=none', "$pool");
>>>  	eval { $class->zfs_request($scfg, undef, 'zpool_import', @param) };
>>>  	if (my $err = $@) {
>>>  	    # just could've raced with another import, so recheck if it is imported
>>> -	    die "could not activate storage '$storeid', $@\n" if !$pool_imported->();
>>> +	    die "could not activate storage '$storeid', $err\n" if !$pool_imported->();
>>> +	}
>>> +	if (!$dataset_mounted->()) {
>>> +	    eval { $class->zfs_request($scfg, undef, 'mount', '-a') };
>>> +	    if (my $err = $@) {
>>> +		# just could've raced with another import, so recheck if it is imported  
>>
>> but you do not (re-)check anything?
> will refactor that a bit in the v2
> 
>>
>>> +		die "could not activate storage '$storeid', $err\n";
>>> +	    }
>>>  	}
>>>      }
>>>      return 1;
>>>   
>>
> 





More information about the pve-devel mailing list