[pve-devel] [RFC common] schema_get_type_text: summarize enumeration for docs

Wed Nov 7 07:48:47 CET 2018

On 11/6/18 8:00 PM, Fabian Grünbichler wrote:
> On Tue, Nov 06, 2018 at 04:03:16PM +0100, Thomas Lamprecht wrote:
>> We use the changed method mainly in pve-docs to generate synopsis
>> from a schema. For now enums where always printed fully, which is not
>> a big problem for those with a relative small set of possibilities.
>>
>> But we allow to use up to 256 mounpoints in pve-container now, and a
>> patch is pending to sync this number for (unused ?) disks in VMs,
>> so manpages which have an option referring to such a enum get pretty
>> long and not really nice, e.g. fsck command in man pct (ensure to
>> look in a recent version).
>>
>> Initially I played with the idea to add a new type, 'range' which
>> would declare a basic key and a min (default to 0) to max range,
>> e.g. for the mountpoint case we'd had something alike:
>>
>> mp => {
>>     description => 'A Mountpoint...',
>>     range => {
>>         base => 'mp',
>>         max => 255,
>>     }
>> }
>>
>> but as in the containercase we often have the set of ( mp0, mp1, ...,
>> mp255, rootfs) this is not really practicable I guess.
>>
>> So I opted to summarize enum entries which consist of a word plus a
>> number, e.g., in the pct fsck man page case I output:
>>
>>> pct fsck <vmid> [OPTIONS]
>>> ...
>>> --device <mp[0..255] | rootfs>
>>
>> This happens only on build time for packages using pve-docs to
>> generate manpages, so even if it's a bit expensive that doesn't
>> matters much - and it should be pretty fast.
>>
>> Signed-off-by: Thomas Lamprecht <t.lamprecht at proxmox.com>
>> ---
>>
>> RFC for obvious reasons, it worked at the first try which worries me a
>> bit, so some subtle breakage may be there ^^
> 
> at first glance (i.e., not tested, just looking), it seems to handle
> 'base base2 base3'[1] or enums with holes like 'base1 base4 base6'[2]
> incorrectly.

ah yeah, sure, but the principal design looks OK for you?

> 
> also, I think the RE needs to be anchored, because in the 'w2k' example
> from [2] the w2k ones would be matched as $base = 'k', $id = 3 (or 8)?
> 
> I'd suggest a two-pass approach:
> - keep the existing scan, but record all candidate bases/ids separately
>   from the non-candidate values

I had that initially, just optimized to record the min/max directly

> - scan those candidate values again, and collapse only those sequences
>   without holes which contain more than X elements (where x is something
>   like 5 or 10?)

or 3, like we do in generate_typetext to decide if we print the full
enumeration values out or the verbatim text 'enum'?
But 5 sound good too.

We could also ouput another format, like:

( mp1 | mp2 | ... | mp255 )

makes it a bit more useable as some values can be copied and used immediately,
and it's, personally observed, a common notation.

> - combine and sort non-candidate values and reduced values
> 
> since this is only done at build time, the extra pass does not hurt that
> much, but makes the end result much less error prone for all those weird
> corner cases.

sounds good in general, if there are no objections to the base design
I'll get on it.

> 
> 1: QemuServer.pm has 'qcow' and 'qcow2' in the same enum, as well as
> 'pentium', 'pentium2', 'pentium3' in another

although pentium does not count here ^^

> 2: again, QemuServer.pm has 'w2k', 'w2k3', 'w2k8', as well as 'win7',
> 'win8', 'win10', as well as 'l24' and 'l26' all in one enum ;)

yes, holes mustn't be printed like this, that's for sure.

Thanks for your comments, appreciated!

> 
>>  src/PVE/JSONSchema.pm | 28 +++++++++++++++++++++++++++-
>>  1 file changed, 27 insertions(+), 1 deletion(-)
>>
>> diff --git a/src/PVE/JSONSchema.pm b/src/PVE/JSONSchema.pm
>> index fb58ad3..92a5b04 100644
>> --- a/src/PVE/JSONSchema.pm
>> +++ b/src/PVE/JSONSchema.pm
>> @@ -1807,7 +1807,33 @@ sub schema_get_type_text {
>>      } elsif ($phash->{format_description}) {
>>  	return "<$phash->{format_description}>";
>>      } elsif ($phash->{enum}) {
>> -	return "<" . join(' | ', sort @{$phash->{enum}}) . ">";
>> +	my $items = $phash->{enum};
>> +	my $reduced = {};
>> +	foreach my $v (@$items) {
>> +	    if ($v =~ /([^\s\d]+)(\d+)/) {
>> +		my ($base, $id) = ($1, $2);
>> +		if (!exists($reduced->{$base})) {
>> +		    $reduced->{$base}->{max} = $reduced->{$base}->{min} = $id;
>> +		} elsif ($id > $reduced->{$base}->{max}) {
>> +		    $reduced->{$base}->{max} = $id;
>> +		} elsif ($id < $reduced->{$base}->{min}) {
>> +		    $reduced->{$base}->{min} = $id;
>> +		}
>> +	    } else {
>> +		$reduced->{$v} = undef;
>> +	    }
>> +	}
>> +	$items = [];
>> +	foreach my $k (sort keys %$reduced) {
>> +	    if (!defined($reduced->{$k})) {
>> +		push @$items, $k;
>> +	    } else {
>> +		my ($min, $max) = $reduced->{$k}->@{qw(min max)};
>> +		push @$items, "${k}[$min..$max]";
>> +	    }
>> +	}
>> +
>> +	return "<" . join(' | ', @$items) . ">";
>>      } elsif ($phash->{pattern}) {
>>  	return $phash->{pattern};
>>      } elsif ($type eq 'integer' || $type eq 'number') {
>> -- 
>> 2.19.1