[pve-devel] [PATCH storage] zfspoolplugin: check if mounted instead of imported

Thomas Lamprecht t.lamprecht at proxmox.com
Fri Feb 19 09:39:16 CET 2021


On 18.02.21 17:36, Stoiko Ivanov wrote:
> This patch addresses an issue we recently saw on a production machine:
> * after booting, a ZFS pool failed to get imported (due to an empty
>   /etc/zfs/zpool.cache)
> * pvestatd/guest-startall eventually tried to import the pool
> * the pool was imported, yet the datasets of the pool remained
>   not mounted
> 
> A bit of debugging showed that `zpool import <poolname>` is not
> atomic, in fact it does fork+exec `mount` with appropriate parameters.
> If an import ran longer than the hardcoded timeout of 15s, it could
> happen that the pool got imported, but the zpool command (and its
> forks) got terminated due to timing out.
> 
> reproducing this is straightforward by setting (drastic) bw+iops
> limits on a guest's disk (which contains a zpool) - e.g.:
> `qm set 100 -scsi1 wd:vm-100-disk-1,iops_rd=10,iops_rd_max=20,\
> iops_wr=15,iops_wr_max=20,mbps_rd=10,mbps_rd_max=15,mbps_wr=10,\
> mbps_wr_max=15`
> afterwards running `timeout 15 zpool import <poolname>` resulted in
> that situation in the guest on my machine
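
To make the failure mode concrete: the race is not ZFS-specific, it is the generic pattern of a timeout killing a multi-step operation between its steps. A toy simulation in Python (the marker files and the sleep are stand-ins for the import and mount steps, not real ZFS behaviour):

```python
import os
import subprocess
import tempfile

# Simulate a non-atomic "import, then mount" operation that a timeout
# kills between the two steps: step 1 completes, step 2 never runs.
tmp = tempfile.mkdtemp()
imported = os.path.join(tmp, "imported")  # stand-in for "pool imported"
mounted = os.path.join(tmp, "mounted")    # stand-in for "datasets mounted"

script = f"touch {imported}; sleep 5; touch {mounted}"
try:
    subprocess.run(["sh", "-c", script], timeout=1)
except subprocess.TimeoutExpired:
    pass  # like `timeout 15 zpool import <poolname>` expiring mid-operation

print(os.path.exists(imported), os.path.exists(mounted))  # True False
```

The observable state afterwards matches what was seen on the production machine: the first step's effect persists, the second never happened.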
> 
> The patch changes the check in activate_storage for the ZFSPoolPlugin,
> to check the 'mounted' and 'canmount' properties of the configured
> 'pool' (which can also be a sub-dataset):
> * if the zfs get fails, we try to import the pool (and run `zfs
>   mount -a` if the dataset is not mounted after importing)
> * if canmount != 'yes' we assume everything is ok
> * if it reports the dataset as not mounted, we run `zfs mount -a`
> 
> The choice of running `zfs mount -a` has potential for regression, in
> case someone manually umounts some dataset after booting the system
> (but does not set the canmount option).
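
The decision logic described above can be restated as a small sketch (Python used purely for illustration; the helper name is made up, and the property values are the strings `zfs get` reports):

```python
def needs_mount_all(mounted, canmount):
    """Decide whether `zfs mount -a` is needed, given the values of the
    'mounted' and 'canmount' properties of the configured dataset.
    Hypothetical helper mirroring the patch's logic, not PVE code."""
    if mounted is None:
        # `zfs get` failed: pool not imported; import it first, then re-check
        return True
    if canmount in ("off", "noauto"):
        # dataset is intentionally not auto-mounted - nothing to do
        return False
    # canmount=yes but dataset reports mounted=no -> run `zfs mount -a`
    return mounted != "yes"

print(needs_mount_all("no", "yes"))   # True
print(needs_mount_all("yes", "yes"))  # False
print(needs_mount_all("no", "off"))   # False
```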
> 
> Signed-off-by: Stoiko Ivanov <s.ivanov at proxmox.com>
> ---
> rfc -> v1:
> * incorporated Thomas' feedback (Thanks):
> ** we now check both properties mounted,canmount and only run mount -a
>    if the dataset is not mounted, but should be by config
> ** this prevents the code from constantly running zfs mount -a in case
>    the system has canmount=off on the root-dataset
> ** however in that case, if the (child) datasets were not mounted due to
>    a timeout, they will not get mounted (as before)
> 
>  PVE/Storage/ZFSPoolPlugin.pm | 38 ++++++++++++++++++++++++++++--------
>  1 file changed, 30 insertions(+), 8 deletions(-)
> 
> diff --git a/PVE/Storage/ZFSPoolPlugin.pm b/PVE/Storage/ZFSPoolPlugin.pm
> index 105d802..3d25e6a 100644
> --- a/PVE/Storage/ZFSPoolPlugin.pm
> +++ b/PVE/Storage/ZFSPoolPlugin.pm
> @@ -525,27 +525,49 @@ sub activate_storage {
>      my ($class, $storeid, $scfg, $cache) = @_;
>  
>      # Note: $scfg->{pool} can include dataset <pool>/<dataset>
> -    my $pool = $scfg->{pool};
> -    $pool =~ s!/.*$!!;
> +    my $dataset = $scfg->{pool};
> +    my $pool = ($dataset =~ s!/.*$!!r);
>  
> -    my $pool_imported = sub {
> -	my @param = ('-o', 'name', '-H', "$pool");
> -	my $res = eval { $class->zfs_request($scfg, undef, 'zpool_list', @param) };
> +    my $dataset_mounted = sub {
> +	my ($mounted, $canmount) = eval {
> +	    $class->zfs_get_properties($scfg, 'mounted,canmount', $dataset)
> +	};
>  	if ($@) {
>  	    warn "$@\n";
>  	    return undef;
>  	}
> +	#canmount is off|noauto, zfs get worked - assume pool is correctly imported

but didn't we learn that imported does not mean that it is necessarily mounted???
Shouldn't a method named "dataset_mounted" observe that difference?

> +	return 'yes' if $canmount =~ /^(?:off|noauto)$/;
> +	return $mounted;

why not return 1 and 0, or 1 and undef, and just stick to `if ($mounted)` below?

For such a simple thing this seems a bit overcomplicated, IMO

> +    };
> +
> +    my $mount_all = sub {
> +	eval { $class->zfs_request($scfg, undef, 'mount', '-a') };
> +	die "could not activate storage '$storeid', $@\n" if $@;
> +    };
> +
> +    my $pool_imported = sub {
> +	my @param = ('-o', 'name', '-H', "$pool");
> +	my $res = eval { $class->zfs_request($scfg, undef, 'zpool_list', @param) };
> +	warn "$@\n" if $@;
> +

that pool_imported refactoring is unrelated and should be in its own patch

>  	return defined($res) && $res =~ m/$pool/;
>      };
>  
> -    if (!$pool_imported->()) {
> +    my $mounted = $dataset_mounted->();
> +    if (!$mounted) {
> +	# error from zfs get mounted - pool import necessary
>  	# import can only be done if not yet imported!
> -	my @param = ('-d', '/dev/disk/by-id/', '-o', 'cachefile=none', "$pool");
> +	my @param = ('-d', '/dev/disk/by-id/', '-o', 'cachefile=none', $pool);

unrelated to this patch, so it should be in its own

>  	eval { $class->zfs_request($scfg, undef, 'zpool_import', @param) };
>  	if (my $err = $@) {
>  	    # just could've raced with another import, so recheck if it is imported
> -	    die "could not activate storage '$storeid', $@\n" if !$pool_imported->();
> +	    die "could not activate storage '$storeid', $err\n" if !$pool_imported->();

forgot to mention in v1, but that should be done in a preparatory patch

You can bundle a few cleanups together, but cleanups and the actual meat of a rather
important patch, fixing an issue we had off and on for years, should be separated.

>  	}
> +	$mount_all->() if !$dataset_mounted->();
> +    } elsif ($mounted =~ /^no$/) {

if we go for this, then use `$mounted eq 'no'`

> +	# best effort mount everything
> +	$mount_all->();
>      }
>      return 1;
>  }
> 
