[pve-devel] [PATCH common 1/2] JSONSchema: add format validator support and cleanup check_format

Tue Apr 21 17:07:09 CEST 2020

sorry, took a bit longer than I wanted, but here it is. thanks for 
looking into this! :) at first glance it seemed like it's more 
straight-forward than I expected - but as always with JSONSchema.pm, 
some rabbit-hole chasing later I am not quite so sure anymore. I hope my 
walls of text below are not too confusing ;)

one sentence summary:
to make this workable with the current approach, we need to switch all 
calls of check_format and parse_property_string that use a registered 
format to use the format's ID instead of the format description hash.

On April 16, 2020 4:39 pm, Stefan Reiter wrote:
> Adds a third, optional parameter to register_format that allows specifying
> a function that will be called after parsing and can validate the parsed
> data. A validator should die on failed validation, and can also change the
> parsed object by returning a modified version of it.
> 
> This is useful so one can register a format with its hash, thus allowing
> documentation to be generated automatically, while still enforcing certain
> validation rules.
> 
> And since I found it impossible to extend the current check_format code,
> clean that function up as well (which pretty much amounts to a rewrite).
> Also use get_format consistently to avoid future breakages.
> 
> No existing functionality is intentionally changed.
> 
> Signed-off-by: Stefan Reiter <s.reiter at proxmox.com>
> ---
> 
> @Fabian G.: Since we discussed this off-list, is this good? At least there
> shouldn't be a possibility for a parser/hash format drift the way I've
> implemented it.
> 
>  src/PVE/JSONSchema.pm | 60 +++++++++++++++++++++++++++----------------
>  1 file changed, 38 insertions(+), 22 deletions(-)
> 
> diff --git a/src/PVE/JSONSchema.pm b/src/PVE/JSONSchema.pm
> index 01a3cce..106a812 100644
> --- a/src/PVE/JSONSchema.pm
> +++ b/src/PVE/JSONSchema.pm
> @@ -120,19 +120,26 @@ register_standard_option('pve-snapshot-name', {
>  });
>  
>  my $format_list = {};
> +my $format_validators = {};
>  
>  sub register_format {
> -    my ($format, $code) = @_;
> +    my ($name, $format, $validator) = @_;
>  
> -    die "JSON schema format '$format' already registered\n"
> -	if $format_list->{$format};
> +    die "JSON schema format '$name' already registered\n"
> +	if $format_list->{$name};
>  
> -    $format_list->{$format} = $code;
> +    if ($validator) {
> +	die "A \$validator function can only be specified for hash-based formats\n"
> +	    if ref($format) ne 'HASH';
> +	$format_validators->{$name} = $validator;
> +    }
> +
> +    $format_list->{$name} = $format;
>  }
>  
>  sub get_format {
> -    my ($format) = @_;
> -    return $format_list->{$format};
> +    my ($name) = @_;
> +    return $format_list->{$name};
>  }
>  
>  my $renderer_hash = {};
> @@ -646,13 +653,16 @@ sub pve_verify_tfa_secret {
>  sub check_format {
>      my ($format, $value, $path) = @_;
>  
> -    return parse_property_string($format, $value, $path) if ref($format) eq 'HASH';

>      return if $format eq 'regex';
>  
> -    if ($format =~ m/^(.*)-a?list$/) {
> +    my $parsed;

>  
> -	my $code = $format_list->{$1};
> +    if (ref($format) eq 'HASH') {
> +	# in case it's a hash we can't have a validator registered, so return

it took me several tries to full get this comment and the 
implications. if we pass a format hash to check_format, then 
check_format can't know whether there is an associated validator 
since it doesn't know the name. so in effect this means we should 
avoid calling check_format with a hash directly, when that hash is 
also registered as format with an ID. this should probably be 
documented somewhere, as it also extends to parse_property_string 
(see comments there), and needs careful analysis of the code paths 
ending up in check_format so that no caller already resolves format ID 
to HASH prematurely.

I think we could switch this around and start check_format with:

if (ref($format) eq 'HASH') {
  # don't know whether a validator exists
  return parse_property_string($format, ...);
}

if (ref($format) eq 'CODE') {
  # we are the (sole, old-style) validator
  return $format->(...);
}

return if $format eq 'regex';
# AFAICT regex is not registered as actual format, so this should not be able to clash with the rest

my ($name, $registered_format, $parsed);

big existing if, but adapted to set those three variables as 
appropriate

if (defined($name) && ref($registered_format) eq 'HASH') {
  my $validator = $format_validators->{$name};
  $parsed = $validator->($parsed) if $validator;
  return $parsed;
}

if we want to support -opt and -a?list with hash based formats that's 
fine as well (and I guess it's not unlikely we'll run into that if 
we start converting some existing parse_/verify_foo into hash + 
validator). those branches then just need to check whether 
$registered_format is a CODE-ref (existing behaviour) or not (it's a 
HASH, so parse it accordingly). at that point we could probably combine 
the -opt and else branches, since they just differ in whether the 
empty value is allowed.

for -a?list it is indeed tricky what the semantics of the validator 
should be.. does it get called on a list of verified values? what 
should the return value be? a list? or is the validator responsible 
for converting it back to the string representation before returning?

note: I only found a single instance of check_format being called 
with the return value checked (in QemuServer.pm, in a format 
validator sub ;), so that can be safely dropped to just return the 
original verified value if check_format didn't die).

if PMG does not use that functionality either, we could also just say 
we drop check_format returning anything, and it's just die on error 
(either parse_property_string, or direct validator sub, or 
post-parse_property_string extra validator, or undefined format) or 
return on success? some comments below become irrelevant if we go 
down that route..

that would make -a?list implementation trivial (validator gets the 
full verified-according-to-format-hash list, and can then do both checks 
on the list as a whole and each element depending on use case.

> +	return parse_property_string($format, $value, $path);
>  
> +    } elsif ($format =~ m/^(.*)-a?list$/) {
> +	my $code = get_format($1);
>  	die "undefined format '$format'\n" if !$code;
>  
>  	# Note: we allow empty lists
> @@ -660,25 +670,31 @@ sub check_format {
>  	    &$code($v);

nit: this should be switched to $code->($v) as well

>  	}
>  
> -    } elsif ($format =~ m/^(.*)-opt$/) {
> -
> -	my $code = $format_list->{$1};
> +	# since the list might contain multiple values, we don't allow running
> +	# validator functions either
> +	return undef;

but not because of multiple values, but because these lists here 
already only work with format=validator sub, and not format=hash ? 
see above as well..

>  
> +    } elsif ($format =~ m/^(.*)-opt$/) {
> +	my $code = get_format($1);
>  	die "undefined format '$format'\n" if !$code;
>  
> -	return if !$value; # allow empty string
> -
> - 	&$code($value);
> +	# empty string is allowed
> +	$parsed = $code->($value) if $value;

so this is a bit tricky - the &$code($value) was previously the last 
statement of this branch, with no code after the conditional, 
meaning that its return value was returned by check_format.

also, since we can't have a validator for a format that is a 
validator sub itself, shouldn't this be just

return if !$value;
return $code->{$value);

for now?

>  
>     } else {
> +	my $registered_format = get_format($format);
> +	die "undefined format '$format'\n" if !$registered_format;
>  
> -	my $code = $format_list->{$format};
> -
> -	die "undefined format '$format'\n" if !$code;
> -
> -	return parse_property_string($code, $value, $path) if ref($code) eq 'HASH';
> -	&$code($value);
> +	if (ref($registered_format) eq 'HASH') {
> +	    $parsed = parse_property_string($registered_format, $value, $path);
> +	} else {
> +	    $parsed = $registered_format->($value);

same as above for -opt applies here as well (we can't have a 
validator, so we might as well make this explicit here and return 
early.

> +	}
>      }
> +
> +    my $validator = $format_validators->{$format};
> +    $parsed = $validator->($parsed) if $validator;
> +    return $parsed;
>  }
>  
>  sub parse_size {
> @@ -735,7 +751,7 @@ sub parse_property_string {
>  
>      # Support named formats here, too:
>      if (!ref($format)) {
> -	if (my $desc = $format_list->{$format}) {
> +	if (my $desc = get_format($format)) {
>  	    $format = $desc;

parse_property_string called directly with a format ID is rarely used 
(only found two hits in PVE code), but this was already missing a 
check for $desc being a hash ref, and is now missing support for 
calling the validator before returning at the end of the sub..

it would be quite unexpected that check_format('foo', 'bar', 
'/path') and parse_property_string('foo', 'bar', '/path') where 'foo' 
is a hash-ref format with new-fangled validator function behave 
differently. funnily enough, parse_property_string does then 
transitively call check_format for the individual properties, so I 
guess at least that stays consistent ;)

note that a format that has a validator NEEDs to call 
parse_property_string with the format name, not the format hash, 
otherwise the validator does not take effect. not a very nice 
interface, especially given that that variant is basically unused :-/

maybe we do need to back to the drawing board? or do we take this one as 
a tree-wide, post 6.2 release overhaul of our JSONSchema declaration? go 
through all the registered formats, convert those that are currently 
using a CODE-ref definition into a HASH + validator, and convert all 
call sites of the corresponding parse_property_string to use the format 
ID instead of the format HASH.

print_property_string also assumes that any named format is a 
hash-based format btw.

>  	} else {
>  	    die "unknown format: $format\n";
> -- 
> 2.26.1
> 
> 
> _______________________________________________
> pve-devel mailing list
> pve-devel at pve.proxmox.com
> https://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
> 
>