[pve-devel] [PATCH ha-manager 02/15] tools: add hash set helper subroutines

Fri Apr 11 13:24:37 CEST 2025

Thanks here for the feedback from both of you.

I agree with all the comments and will make the helpers more reusable so 
that they can be moved to a new data structure/hash module in PVE::Tools.

On 4/3/25 14:16, Fabian Grünbichler wrote:
> On March 25, 2025 6:53 pm, Thomas Lamprecht wrote:
>> Am 25.03.25 um 16:12 schrieb Daniel Kral:
>>> Implement helper subroutines, which implement basic set operations done
>>> on hash sets, i.e. hashes with elements set to a true value, e.g. 1.
>>>
>>> These will be used for various tasks in the HA Manager colocation rules,
>>> e.g. for verifying the satisfiability of the rules or applying the
>>> colocation rules on the allowed set of nodes.
>>>
>>> Signed-off-by: Daniel Kral <d.kral at proxmox.com>
>>> ---
>>> If they're useful somewhere else, I can move them to PVE::Tools
>>> post-RFC, but it'd be probably useful to prefix them with `hash_` there.
>>
>> meh, not a big fan of growing the overly generic PVE::Tools more, if, this
>> should go into a dedicated module for hash/data structure helpers ...
>>
>>> AFAICS there weren't any other helpers for this with a quick grep over
>>> all projects and `PVE::Tools::array_intersect()` wasn't what I needed.
>>
>> ... which those existing one should then also move into, but out of scope
>> of this series.
>>
>>>
>>>   src/PVE/HA/Tools.pm | 42 ++++++++++++++++++++++++++++++++++++++++++
>>>   1 file changed, 42 insertions(+)
>>>
>>> diff --git a/src/PVE/HA/Tools.pm b/src/PVE/HA/Tools.pm
>>> index 0f9e9a5..fc3282c 100644
>>> --- a/src/PVE/HA/Tools.pm
>>> +++ b/src/PVE/HA/Tools.pm
>>> @@ -115,6 +115,48 @@ sub write_json_to_file {
>>>       PVE::Tools::file_set_contents($filename, $raw);
>>>   }
>>>   
>>> +sub is_disjoint {
>>
>> IMO a bit too generic name for being in a Tools named module, maybe
>> prefix them all with hash_ or hashes_ ?

Yes, good call, I think I'll go for what Fabian mentioned below to 
prefix them with hash_set_ / set_ or something similar.

And as we're working towards making those helpers more accessible for 
other use cases, I'll also move them to a separate PVE::Tools::* module 
as suggested above :)

> 
> is_disjoint also only really makes sense as a name if you see it as an
> operation *on* $hash1, rather than an operation involving both hashes..
> 
> i.e., in Rust
> 
> set1.is_disjoint(&set2);
> 
> makes sense..
> 
> in Perl
> 
> is_disjoint($set1, $set2)
> 
> reads weird, and should maybe be
> 
> check_disjoint($set1, $set2)
> 
> or something like that?

Yes makes sense, I was going for `are_disjoint`, but both are fine for me.

> 
>>
>>> +    my ($hash1, $hash2) = @_;
>>> +
>>> +    for my $key (keys %$hash1) {
>>> +	return 0 if exists($hash2->{$key});
>>> +    }
>>> +
>>> +    return 1;
>>> +};
>>> +
>>> +sub intersect {
>>> +    my ($hash1, $hash2) = @_;
>>> +
>>> +    my $result = { map { $_ => $hash2->{$_} } keys %$hash1 };
> 
> this is a bit dangerous if $hash2->{$key} is itself a reference? if I
> later modify $result I'll modify $hash2.. I know the commit message says
> that the hashes are all just of the form key => 1, but nothing here
> tells me that a year later when I am looking for a generic hash
> intersection helper ;) I think this should also be clearly mentioned in
> the module, and ideally, also in the helper names (i.e., have "set"
> there everywhere and a comment above each that it only works for
> hashes-as-sets and not generic hashes).
> 
> wouldn't it be faster/simpler to iterate over either hash once?
> 
> my $result = {};
> for my $key (keys %$hash1) {
>      $result->{$key} = 1 if $hash1->{$key} && $hash2->{$key};
> }
> return $result;

I haven't thought too much about what { map {} } would cost here for the 
RFC, but the above is both easier to read and also safer, so I'll adapt 
the subroutine to the above, thanks :).

> 
> 
>>> +
>>> +    for my $key (keys %$result) {
>>> +	delete $result->{$key} if !defined($result->{$key});
>>> +    }
>>> +
>>> +    return $result;
>>> +};
>>> +
>>> +sub set_difference {
>>> +    my ($hash1, $hash2) = @_;
>>> +
>>> +    my $result = { map { $_ => 1 } keys %$hash1 };
> 
> if $hash1 is only of the form key => 1, then this is just
> 
> my $result = { %$hash1 };

But $result would then be a copy instead of a reference to %$hash1 here 
right? But only if there's no other references in there?

> 
>>> +
>>> +    for my $key (keys %$result) {
>>> +	delete $result->{$key} if defined($hash2->{$key});
>>> +    }
>>> +
> 
> but the whole thing can be
> 
> return { map { $hash2->{$_} ? ($_ => 1) : () } keys %$hash1 };
> 
> this transforms hash1 into its keys, and then returns either ($key => 1)
> if the key is true in $hash2, or the empty tuple if not. the outer {}
> then turn this sequence of tuples into a hash again, which skips empty
> tuples ;) can of course also be adapted to use the value from either
> hash, check for definedness instead of truthiness, ..

I'll have to check out more of the perldoc of the more common functions, 
didn't know that map will skip empty lists here, thanks :)

> 
>>> +    return $result;
>>> +};
>>> +
>>> +sub union {
>>> +    my ($hash1, $hash2) = @_;
>>> +
>>> +    my $result = { map { $_ => 1 } keys %$hash1, keys %$hash2 };
>>> +
>>> +    return $result;
>>> +};
>>> +
>>>   sub count_fenced_services {
>>>       my ($ss, $node) = @_;
>>>   
>>
>>
>>
>> _______________________________________________
>> pve-devel mailing list
>> pve-devel at lists.proxmox.com
>> https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
>>
>>
>>