[pmg-devel] [PATCH pmg-api 03/12] RuleCache: reorganize how we gather marks and spaminfo

Stoiko Ivanov s.ivanov at proxmox.com
Tue Feb 20 12:10:35 CET 2024


On Fri,  9 Feb 2024 13:54:27 +0100
Dominik Csapak <d.csapak at proxmox.com> wrote:

> instead of collecting the spaminfo (+match) separately, collect this
> per target together with the regular marks. With this, we can omit the
> 'global' marks list, since each target has their own anyway.
> 
> We want this, since when we'll implement and/invert for matches, the marks
> can differ between targets, since the spamlevel can diverge for them and
> that can be and-combined with objects that add marks. For that to be
> possible we have to save each match + info per target instead of
> globally.
> 
> Since we don't change the actual matching behaviour with this patch,
> for the remove action, we can simply use the marks from the first target
> (as they currently have to be identical).
I don't think this premise holds - or rather the reasoning seems a bit off?

* marks are generated by the objects' what_match methods
* global (not per-part) matches are virus and spam - these just mark with an
  empty array-ref [], indicating that they affect the whole mail
* per-part what_match implementations are MatchField and the
  content-type/filename matches - they add a list of all parts they match
* the only what_match that might differ per user/target is the spam match,
  which marks the complete mail

marks are identical per rule across all targets, because the only place
where they could differ just pushes the contents of an empty array to the
list. 

(sorry if this sounds a bit pedantic - but it sadly took me 30 minutes
with Data::Dumper to get my head around this)
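A minimal Perl sketch of the point above. It does not use the real PMG objects; the targets and match results are hardcoded, hypothetical values: a MatchField-like per-part match returning the part ids [2, 5] for every target, and a spam match that only hits one target and, like the real one, marks with an empty array-ref. Since pushing the contents of [] adds nothing, both targets end up with the same list of marks.

```perl
use v5.24;
use strict;
use warnings;

# Hypothetical targets and match results, for illustration only.
my @targets = ('alice@example.com', 'bob@example.com');

# per-part match (MatchField-like): same part ids for every target
my $part_match = [2, 5];

# spam match (what_match_targets-like): only alice is hit, and the
# marks are an empty array-ref because the whole mail is affected
my $spam_targets = { 'alice@example.com' => { marks => [] } };

my $res = {};
for my $target (@targets) {
    push $res->{$target}->{marks}->@*, $part_match->@*;
}
for my $target (keys $spam_targets->%*) {
    # pushing the contents of [] adds nothing to the list
    push $res->{$target}->{marks}->@*, $spam_targets->{$target}->{marks}->@*;
}

# compare the marks of all targets against the first one
my @lists = map { join(',', $res->{$_}->{marks}->@*) } @targets;
my $identical = (grep { $_ ne $lists[0] } @lists) ? 0 : 1;

print "marks per target: @lists\n";
print "identical: $identical\n";
```

Only the presence of an entry can differ (a target hit solely by the spam match gets []), not the parts that are marked.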

> 
> Conversely, we currently save the spaminfo per target, but later in
> pmg-smtp-filter we only ever use the first one we encounter, so instead
> save it only the first time and use that.
We currently get the spaminfo as part of the hashref returned by
RuleCache::what_match, next to its only other member, 'targets'.
Maybe we could return it as a second value from what_match instead and
save ourselves the extra level of nesting (see inline).
Please disregard this if it becomes obsolete with one of the later patches.
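A rough sketch of that suggestion. `what_match_sketch`, its input and the `score` key inside the spaminfo are hypothetical simplifications (the real what_match signature and spaminfo contents are not part of this patch); the point is only that the per-target marks stay one level deep and the spaminfo comes back as a second return value.

```perl
use v5.24;
use strict;
use warnings;

# Hypothetical, simplified stand-in for RuleCache::what_match.
# $results maps each target to { marks => [...], spaminfo => {...} },
# roughly what a what_match_targets object hands back.
sub what_match_sketch {
    my ($results) = @_;

    my $res = {};    # flat: $res->{$target}->{marks}, no {targets} level
    my $spaminfo;    # returned separately instead of $res->{spaminfo}

    # keys are sorted only to make "first" deterministic in this sketch
    for my $target (sort keys $results->%*) {
        push $res->{$target}->{marks}->@*, $results->{$target}->{marks}->@*;
        # only keep the first defined spaminfo we encounter
        $spaminfo //= $results->{$target}->{spaminfo};
    }

    return ($res, $spaminfo);
}

my ($marks, $spaminfo) = what_match_sketch({
    'alice@example.com' => { marks => [], spaminfo => { score => 7.5 } },
    'bob@example.com'   => { marks => [], spaminfo => { score => 7.5 } },
});

for my $target (sort keys $marks->%*) {
    print "$target matched\n" if defined($marks->{$target}->{marks});
}
print "score: $spaminfo->{score}\n";
```

In pmg-smtp-filter the second value would then be stored on its own (e.g. in a hypothetical %rule_spaminfo hash) instead of being read from $rule_marks{$rule->{id}}->{spaminfo}.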

> 
> Signed-off-by: Dominik Csapak <d.csapak at proxmox.com>
> ---
>  src/PMG/RuleCache.pm     | 32 ++++++++++----------------------
>  src/PMG/RuleDB/Remove.pm | 19 +++++++++++++++----
>  src/bin/pmg-smtp-filter  | 18 +++++-------------
>  3 files changed, 30 insertions(+), 39 deletions(-)
> 
> diff --git a/src/PMG/RuleCache.pm b/src/PMG/RuleCache.pm
> index fd22a16..4f7ebe7 100644
> --- a/src/PMG/RuleCache.pm
> +++ b/src/PMG/RuleCache.pm
> @@ -304,37 +304,25 @@ sub what_match {
>      if (scalar($what->{groups}->@*) == 0) {
>  	# match all targets
>  	foreach my $target (@{$msginfo->{targets}}) {
> -	    $res->{$target}->{marks} = [];
> +	    $res->{targets}->{$target}->{marks} = [];
here this could become $res->{$target}->{marks}
>  	}
> -
> -	$res->{marks} = [];
>  	return $res;
>      }
>  
> -    my $marks;
> -
>      for my $group ($what->{groups}->@*) {
>  	for my $obj ($group->{objects}->@*) {
>  	    if (!$obj->can('what_match_targets')) {
>  		if (my $match = $obj->what_match($queue, $element, $msginfo, $dbh)) {
> -		    push @$marks, @$match;
> +		    for my $target ($msginfo->{targets}->@*) {
> +			push $res->{targets}->{$target}->{marks}->@*, $match->@*;
here as well

> +		    }
>  		}
> -	    }
> -	}
> -    }
> -
> -    foreach my $target (@{$msginfo->{targets}}) {
> -	$res->{$target}->{marks} = $marks;
> -	$res->{marks} = $marks;
> -    }
> -
> -    for my $group ($what->{groups}->@*) {
> -	for my $obj ($group->{objects}->@*) {
> -	    if ($obj->can ("what_match_targets")) {
> -		my $target_info;
> -		if ($target_info = $obj->what_match_targets($queue, $element, $msginfo, $dbh)) {
> -		    foreach my $k (keys %$target_info) {
> -			$res->{$k} = $target_info->{$k};
> +	    } else {
> +		if (my $target_info = $obj->what_match_targets($queue, $element, $msginfo, $dbh)) {
> +		    foreach my $k (keys $target_info->%*) {
> +			push $res->{targets}->{$k}->{marks}->@*, $target_info->{$k}->{marks}->@*;
and here
> +			# only save spaminfo once
> +			$res->{spaminfo} = $target_info->{$k}->{spaminfo} if !defined($res->{spaminfo});
this would need to be changed (and returned as the second value below)

>  		    }
>  		}
>  	    }
> diff --git a/src/PMG/RuleDB/Remove.pm b/src/PMG/RuleDB/Remove.pm
> index e7c353c..5812602 100644
> --- a/src/PMG/RuleDB/Remove.pm
> +++ b/src/PMG/RuleDB/Remove.pm
> @@ -198,9 +198,15 @@ sub execute {
>  
>      my $rulename = encode('UTF-8', $vars->{RULE} // 'unknown');
>  
> -    if (!$self->{all} && ($#$marks == -1)) {
> -	# no marks
> -	return;
> +    if (!$self->{all}) {
> +	my $found_mark = 0;
> +	for my $target (keys $marks->{targets}->%*) {
> +	    if (scalar($marks->{targets}->{$target}->{marks}->@*) > 0) {
> +		$found_mark = 1;
> +		last;
> +	    }
> +	}
> +	return if !$found_mark;
>      }
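For what it's worth, the found_mark loop above only asks whether at least one target has a non-empty marks list, which could also be expressed with List::Util's `any`. A minimal sketch against a hardcoded, hypothetical marks hash with the same shape as in this patch ($marks->{targets}->{$target}->{marks}):

```perl
use v5.24;
use strict;
use warnings;
use List::Util qw(any);

# hypothetical per-rule marks, same shape as in the patch
my $marks = {
    targets => {
        'alice@example.com' => { marks => [] },
        'bob@example.com'   => { marks => [3] },
    },
};

# true as soon as one target has at least one mark
# (any short-circuits, like the 'last' in the loop)
my $found_mark = any { scalar($_->{marks}->@*) > 0 } values $marks->{targets}->%*;

print $found_mark ? "remove marked parts\n" : "no marks, return early\n";
```

In execute() this would collapse to a single `return if !any { ... } values $marks->{targets}->%*;`.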
>  
>      my $subgroups = $mod_group->subgroups ($targets);
> @@ -256,7 +262,12 @@ sub execute {
>  	}
>  
>  	$self->{message_seen} = 0;
> -	$self->delete_marked_parts($queue, $entity, $html, $rtype, $marks, $rulename);
> +
> +	# since all matches are or combinded, marks for all targets must be the same if they exist
> +	# so simply use the first one here
maybe "since currently all marks are equal for all targets, use the first
one"?
> +	my $match_marks = $marks->{targets}->{$tg->[0]}->{marks};
> +
> +	$self->delete_marked_parts($queue, $entity, $html, $rtype, $match_marks, $rulename);
>  	delete $self->{message_seen};
>  
>  	if ($msginfo->{testmode}) {
> diff --git a/src/bin/pmg-smtp-filter b/src/bin/pmg-smtp-filter
> index 7da3de8..71043b0 100755
> --- a/src/bin/pmg-smtp-filter
> +++ b/src/bin/pmg-smtp-filter
> @@ -276,8 +276,9 @@ sub apply_rules {
>  	foreach my $target (@{$msginfo->{targets}}) {
>  	    next if $final->{$target};
>  	    next if !defined ($rule_marks{$rule->{id}});
> -	    next if !defined ($rule_marks{$rule->{id}}->{$target});
> -	    next if !defined ($rule_marks{$rule->{id}}->{$target}->{marks});
> +	    next if !defined ($rule_marks{$rule->{id}}->{targets});
here you could drop this line if what_match returns the spaminfo as a second value.

> +	    next if !defined ($rule_marks{$rule->{id}}->{targets}->{$target});
> +	    next if !defined ($rule_marks{$rule->{id}}->{targets}->{$target}->{marks});
and here get rid of {targets}->
>  	    next if !$rulecache->to_match ($rule->{id}, $target, $ldap);
>  
>  	    $final->{$target} = $fin;
> @@ -320,24 +321,15 @@ sub apply_rules {
>  	my $targets = $rule_targets{$rule->{id}};
>  	next if !$targets;
>  
> -	my $spaminfo;
> -	foreach my $t (@$targets) {
> -	    if ($rule_marks{$rule->{id}}->{$t} && $rule_marks{$rule->{id}}->{$t}->{spaminfo}) {
> -		$spaminfo = $rule_marks{$rule->{id}}->{$t}->{spaminfo};
> -		# we assume spam info is the same for all matching targets
> -		last;
> -	    }
> -	}
> -
>  	my $vars = $self->get_prox_vars (
> -	    $queue, $entity, $msginfo, $rule, $rule_targets{$rule->{id}}, $spaminfo);
> +	    $queue, $entity, $msginfo, $rule, $rule_targets{$rule->{id}}, $rule_marks{$rule->{id}}->{spaminfo});
>  
>  	my @sorted_actions = sort {$a->priority <=> $b->priority} @{$rule_actions{$rule->{id}}};
>  
>  	foreach my $action (@sorted_actions) {
>  	    $action->execute(
>  		$queue, $self->{ruledb}, $mod_group, $rule_targets{$rule->{id}}, $msginfo, $vars,
> -		$rule_marks{$rule->{id}}->{marks}, $ldap
> +		$rule_marks{$rule->{id}}, $ldap
>  	    );
>  	    last if $action->final;
>  	}