[pve-devel] [PATCH common 1/2] JSONSchema: add format validator support and cleanup check_format
Fabian Grünbichler
f.gruenbichler at proxmox.com
Tue Apr 21 17:07:09 CEST 2020
sorry, took a bit longer than I wanted, but here it is. thanks for
looking into this! :) at first glance it seemed like it's more
straight-forward than I expected - but as always with JSONSchema.pm,
some rabbit-hole chasing later I am not quite so sure anymore. I hope my
walls of text below are not too confusing ;)
one sentence summary:
to make this workable with the current approach, we need to switch all
calls of check_format and parse_property_string that use a registered
format to use the format's ID instead of the format description hash.
On April 16, 2020 4:39 pm, Stefan Reiter wrote:
> Adds a third, optional parameter to register_format that allows specifying
> a function that will be called after parsing and can validate the parsed
> data. A validator should die on failed validation, and can also change the
> parsed object by returning a modified version of it.
>
> This is useful so one can register a format with its hash, thus allowing
> documentation to be generated automatically, while still enforcing certain
> validation rules.
>
> And since I found it impossible to extend the current check_format code,
> clean that function up as well (which pretty much amounts to a rewrite).
> Also use get_format consistently to avoid future breakages.
>
> No existing functionality is intentionally changed.
>
> Signed-off-by: Stefan Reiter <s.reiter at proxmox.com>
> ---
>
> @Fabian G.: Since we discussed this off-list, is this good? At least there
> shouldn't be a possibility for a parser/hash format drift the way I've
> implemented it.
>
> src/PVE/JSONSchema.pm | 60 +++++++++++++++++++++++++++----------------
> 1 file changed, 38 insertions(+), 22 deletions(-)
>
> diff --git a/src/PVE/JSONSchema.pm b/src/PVE/JSONSchema.pm
> index 01a3cce..106a812 100644
> --- a/src/PVE/JSONSchema.pm
> +++ b/src/PVE/JSONSchema.pm
> @@ -120,19 +120,26 @@ register_standard_option('pve-snapshot-name', {
> });
>
> my $format_list = {};
> +my $format_validators = {};
>
> sub register_format {
> - my ($format, $code) = @_;
> + my ($name, $format, $validator) = @_;
>
> - die "JSON schema format '$format' already registered\n"
> - if $format_list->{$format};
> + die "JSON schema format '$name' already registered\n"
> + if $format_list->{$name};
>
> - $format_list->{$format} = $code;
> + if ($validator) {
> + die "A \$validator function can only be specified for hash-based formats\n"
> + if ref($format) ne 'HASH';
> + $format_validators->{$name} = $validator;
> + }
> +
> + $format_list->{$name} = $format;
> }
>
> sub get_format {
> - my ($format) = @_;
> - return $format_list->{$format};
> + my ($name) = @_;
> + return $format_list->{$name};
> }
>
> my $renderer_hash = {};
> @@ -646,13 +653,16 @@ sub pve_verify_tfa_secret {
> sub check_format {
> my ($format, $value, $path) = @_;
>
> - return parse_property_string($format, $value, $path) if ref($format) eq 'HASH';
> return if $format eq 'regex';
>
> - if ($format =~ m/^(.*)-a?list$/) {
> + my $parsed;
>
> - my $code = $format_list->{$1};
> + if (ref($format) eq 'HASH') {
> + # in case it's a hash we can't have a validator registered, so return
it took me several tries to full get this comment and the
implications. if we pass a format hash to check_format, then
check_format can't know whether there is an associated validator
since it doesn't know the name. so in effect this means we should
avoid calling check_format with a hash directly, when that hash is
also registered as format with an ID. this should probably be
documented somewhere, as it also extends to parse_property_string
(see comments there), and needs careful analysis of the code paths
ending up in check_format so that no caller already resolves format ID
to HASH prematurely.
I think we could switch this around and start check_format with:
if (ref($format) eq 'HASH') {
# don't know whether a validator exists
return parse_property_string($format, ...);
}
if (ref($format) eq 'CODE') {
# we are the (sole, old-style) validator
return $format->(...);
}
return if $format eq 'regex';
# AFAICT regex is not registered as actual format, so this should not be able to clash with the rest
my ($name, $registered_format, $parsed);
big existing if, but adapted to set those three variables as
appropriate
if (defined($name) && ref($registered_format) eq 'HASH') {
my $validator = $format_validators->{$name};
$parsed = $validator->($parsed) if $validator;
return $parsed;
}
if we want to support -opt and -a?list with hash based formats that's
fine as well (and I guess it's not unlikely we'll run into that if
we start converting some existing parse_/verify_foo into hash +
validator). those branches then just need to check whether
$registered_format is a CODE-ref (existing behaviour) or not (it's a
HASH, so parse it accordingly). at that point we could probably combine
the -opt and else branches, since they just differ in whether the
empty value is allowed.
for -a?list it is indeed tricky what the semantics of the validator
should be.. does it get called on a list of verified values? what
should the return value be? a list? or is the validator responsible
for converting it back to the string representation before returning?
note: I only found a single instance of check_format being called
with the return value checked (in QemuServer.pm, in a format
validator sub ;), so that can be safely dropped to just return the
original verified value if check_format didn't die).
if PMG does not use that functionality either, we could also just say
we drop check_format returning anything, and it's just die on error
(either parse_property_string, or direct validator sub, or
post-parse_property_string extra validator, or undefined format) or
return on success? some comments below become irrelevant if we go
down that route..
that would make -a?list implementation trivial (validator gets the
full verified-according-to-format-hash list, and can then do both checks
on the list as a whole and each element depending on use case.
> + return parse_property_string($format, $value, $path);
>
> + } elsif ($format =~ m/^(.*)-a?list$/) {
> + my $code = get_format($1);
> die "undefined format '$format'\n" if !$code;
>
> # Note: we allow empty lists
> @@ -660,25 +670,31 @@ sub check_format {
> &$code($v);
nit: this should be switched to $code->($v) as well
> }
>
> - } elsif ($format =~ m/^(.*)-opt$/) {
> -
> - my $code = $format_list->{$1};
> + # since the list might contain multiple values, we don't allow running
> + # validator functions either
> + return undef;
but not because of multiple values, but because these lists here
already only work with format=validator sub, and not format=hash ?
see above as well..
>
> + } elsif ($format =~ m/^(.*)-opt$/) {
> + my $code = get_format($1);
> die "undefined format '$format'\n" if !$code;
>
> - return if !$value; # allow empty string
> -
> - &$code($value);
> + # empty string is allowed
> + $parsed = $code->($value) if $value;
so this is a bit tricky - the &$code($value) was previously the last
statement of this branch, with no code after the conditional,
meaning that its return value was returned by check_format.
also, since we can't have a validator for a format that is a
validator sub itself, shouldn't this be just
return if !$value;
return $code->{$value);
for now?
>
> } else {
> + my $registered_format = get_format($format);
> + die "undefined format '$format'\n" if !$registered_format;
>
> - my $code = $format_list->{$format};
> -
> - die "undefined format '$format'\n" if !$code;
> -
> - return parse_property_string($code, $value, $path) if ref($code) eq 'HASH';
> - &$code($value);
> + if (ref($registered_format) eq 'HASH') {
> + $parsed = parse_property_string($registered_format, $value, $path);
> + } else {
> + $parsed = $registered_format->($value);
same as above for -opt applies here as well (we can't have a
validator, so we might as well make this explicit here and return
early.
> + }
> }
> +
> + my $validator = $format_validators->{$format};
> + $parsed = $validator->($parsed) if $validator;
> + return $parsed;
> }
>
> sub parse_size {
> @@ -735,7 +751,7 @@ sub parse_property_string {
>
> # Support named formats here, too:
> if (!ref($format)) {
> - if (my $desc = $format_list->{$format}) {
> + if (my $desc = get_format($format)) {
> $format = $desc;
parse_property_string called directly with a format ID is rarely used
(only found two hits in PVE code), but this was already missing a
check for $desc being a hash ref, and is now missing support for
calling the validator before returning at the end of the sub..
it would be quite unexpected that check_format('foo', 'bar',
'/path') and parse_property_string('foo', 'bar', '/path') where 'foo'
is a hash-ref format with new-fangled validator function behave
differently. funnily enough, parse_property_string does then
transitively call check_format for the individual properties, so I
guess at least that stays consistent ;)
note that a format that has a validator NEEDs to call
parse_property_string with the format name, not the format hash,
otherwise the validator does not take effect. not a very nice
interface, especially given that that variant is basically unused :-/
maybe we do need to back to the drawing board? or do we take this one as
a tree-wide, post 6.2 release overhaul of our JSONSchema declaration? go
through all the registered formats, convert those that are currently
using a CODE-ref definition into a HASH + validator, and convert all
call sites of the corresponding parse_property_string to use the format
ID instead of the format HASH.
print_property_string also assumes that any named format is a
hash-based format btw.
> } else {
> die "unknown format: $format\n";
> --
> 2.26.1
>
>
> _______________________________________________
> pve-devel mailing list
> pve-devel at pve.proxmox.com
> https://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
>
>
More information about the pve-devel
mailing list