[pmg-devel] [RFC pmg-api/docs] minimal before queue filtering support

Tue Nov 12 15:35:09 CET 2019

On Tue, 12 Nov 2019 15:16:09 +0100
Stoiko Ivanov <s.ivanov at proxmox.com> wrote:

> This patchset should eventually address a frequently requested feature for
> PMG: before-queue filtering (apart from patch 1/4 for pmg-api, which fixes a
> tiny nit I caught by accident).
> 
> Technically, the work follows the postfix before-queue content filter howto
> [0], which makes it possible to scan mail before it is queued without having
> to implement the milter protocol.
> 
> pmg-smtp-filter (more precisely PMG::SMTP) already had a currently unused
> code path for handling SMTP connections (the current after-queue solution
> uses LMTP), which only needed slight adaptation.
> 
> Handling mails with a single recipient should now work without any problems:
> 
> * if the result of the rule system is 'Block', pmg-smtp-filter rejects the
>   mail with '554 5.7.1 Rejected for policy reasons' (the same response code
>   postscreen uses for RBL hits)
> 
> * if the result is either 'Accept' or 'Quarantine', pmg-smtp-filter accepts
>   the mail
> 
> * if there is a problem handing the mail back to postfix (port 10025), the
>   response is a temporary failure
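
The single-recipient rules above amount to a small mapping from the rule-system result to an SMTP reply. Below is a minimal Python sketch of that mapping, not the actual PMG::SMTP code (which is Perl): the enum, the function name reply_for_single_recipient and the handback_ok parameter are hypothetical, and only the 554 text is taken from the patch description; the 250 and 451 reply texts are assumptions.

```python
from enum import Enum


class RuleResult(Enum):
    """Hypothetical labels for the rule-system outcomes described above."""
    ACCEPT = "Accept"
    QUARANTINE = "Quarantine"
    BLOCK = "Block"


# Reply text quoted from the patch description.
REJECT_REPLY = "554 5.7.1 Rejected for policy reasons"
# Assumed wording: the mail only says "accepts" / "temporary failure".
ACCEPT_REPLY = "250 2.0.0 OK"
TEMPFAIL_REPLY = "451 4.3.0 Temporary failure, please try again later"


def reply_for_single_recipient(result: RuleResult, handback_ok: bool = True) -> str:
    """Map the rule-system result for a single-recipient mail to an SMTP reply.

    handback_ok is a hypothetical flag: False means handing the mail back
    to postfix (port 10025) failed.
    """
    if result is RuleResult.BLOCK:
        # blocked mail is rejected outright, nothing is handed back
        return REJECT_REPLY
    # 'Accept' and 'Quarantine' both accept the mail at the SMTP level
    if not handback_ok:
        # a 4xx reply makes the sending client keep the mail and retry later
        return TEMPFAIL_REPLY
    return ACCEPT_REPLY


if __name__ == "__main__":
    for res in RuleResult:
        print(res.value, "->", reply_for_single_recipient(res))
    print("hand-back failed ->", reply_for_single_recipient(RuleResult.ACCEPT, False))
```

Answering with a temporary failure instead of 250 when the hand-back fails leaves the mail in the sending MTA's queue, so nothing is lost even though PMG no longer queues the mail itself.
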
> 
> The situation is slightly more complicated (a general issue with SMTP, I'd
> say) if one mail is to be delivered to multiple recipients:
> 
> * if the rule system 'Blocks' the mail for all recipients, the mail gets
>   rejected (with 554)
> 
> * if at least one recipient accepts the mail, pmg-smtp-filter returns 250.
>   Additionally, to comply with the requirement some users have of never
>   dropping mail without notification, a bounce message (NDR) is generated
>   for all recipients (if any) for which the message was 'Blocked'.
>   Sending the NDR can be configured with a flag in pmg.conf (as can the
>   activation of before-queue filtering).
>   Different results for multiple recipients can probably occur with the
>   default ruleset of PMG (through the user black/whitelist), or with
>   (probably too) complicated rulesets.
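
For the multi-recipient case, the rules above boil down to aggregating the per-recipient results into a single reply for the whole mail, plus an optional list of blocked recipients to report in an NDR. Again a minimal Python sketch under assumptions: decide, Decision and the send_ndr argument (standing in for the pmg.conf flag, whose real name is not given here) are hypothetical, and the 250 reply text is assumed.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Hypothetical labels for the per-recipient rule-system results.
ACCEPT, QUARANTINE, BLOCK = "Accept", "Quarantine", "Block"


@dataclass
class Decision:
    smtp_reply: str
    # recipients to report as 'Blocked' in a bounce (NDR) to the sender
    blocked_in_ndr: List[str] = field(default_factory=list)


def decide(results: Dict[str, str], send_ndr: bool) -> Decision:
    """Aggregate per-recipient results into one reply for the whole mail.

    results maps each envelope recipient to 'Accept', 'Quarantine' or 'Block';
    send_ndr stands in for the NDR flag in pmg.conf (name assumed).
    """
    if not results:
        raise ValueError("an SMTP transaction has at least one recipient")
    blocked = [rcpt for rcpt, res in results.items() if res == BLOCK]
    if len(blocked) == len(results):
        # blocked for every recipient: reject the mail as a whole
        return Decision("554 5.7.1 Rejected for policy reasons")
    # at least one recipient accepts it: 250 for the whole transaction,
    # the blocked recipients can only be reported via an NDR
    return Decision("250 2.0.0 OK", blocked if send_ndr else [])


if __name__ == "__main__":
    mixed = {"alice@example.com": ACCEPT, "bob@example.com": BLOCK}
    print(decide(mixed, send_ndr=True))
```

The NDR is needed because, in plain SMTP, the server sends a single reply after DATA that covers all recipients, so a per-recipient reject is no longer possible at that point.
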
> 
> Since the smtpd_proxy_filter is invoked quite late in the postfix pipeline,
> PMG still benefits from the protection provided by postscreen and the
> pmgpolicy service (greylisting, hard SPF evaluation).
> 
> Things still missing in the RFC:
> * the generated bounces are not yet adapted to RFC 6533 (internationalized
>   bounces when the SMTPUTF8 extension is announced)
I forgot to mention that the log tracker (and thus the tracking center) would
also need some adaptation: the match between postfix/qmgr and the initial
pmg-smtp-filter message does not seem to work with the different logs from
postfix (there is no line with the message-id from qmgr).
However, I would suggest tackling that once the logtracker series from Mira is
accepted.


> 
> Preliminary Tests:
> Since replacing the postfix smtpd (and its queue) on the front line with
> pmg-smtp-filter (which is not the fastest, since it does quite a lot, mostly
> running the SpamAssassin analysis) will have some effect on the behavior of
> the system, I ran 2 test scenarios:
> * use 'postal' [1] for benchmarking:
> ** setup: `timeout 2m postal -M 25 -m 500 -t 10 -c 50 -f senders <pmg-ip> recepients`
>    (run 10 threads, each sending 50 mails before opening a new connection;
>    each mail is random text (apart from the minimal set of headers) between
>    25k and 500k in size)
> ** the random data seems to be (probably not much of a surprise) a rather
>    bad case for SpamAssassin (many complicated regex matches on the mail
>    text): the processing time per mail was on average between 10s and 120s
>    (99.99% of it due to SpamAssassin)
> ** the throughput (mails actually going out of PMG) is roughly the same for
>    before- and after-queue filtering
> ** with after-queue filtering, 'postal' was (of course) able to deliver far
>    more mails to PMG (2.5k in 120 seconds); they were queued by postfix and
>    would eventually have been delivered, but the output of PMG was the same
>    (around 30-45)
> 
> * queue up 3x500 mails (2 rather small test mails and 1 350k mail) in a
>   postfix queue (on a separate host, while postfix is not running) and start
>   postfix (the runtimes for analyzing these mails are far closer to what we
>   see in production: from <1s up to 3s for the large mail)
> ** with this test set, the queue in the original postfix was emptied within
>    3 minutes (yielding about 8.3 mails/second on my test installation)
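
The 8.3 mails/second figure follows directly from the test set: 3 x 500 = 1500 mails in at most 3 minutes (180 s). Since the queue was emptied within 3 minutes, this is a lower bound on the average rate. A trivial Python check of the arithmetic (the helper name mails_per_second is hypothetical):

```python
def mails_per_second(mails: int, seconds: float) -> float:
    """Average throughput: mails processed divided by elapsed time."""
    return mails / seconds


queued = 3 * 500    # 500 copies each of the 3 test mails
elapsed = 3 * 60    # queue emptied within 3 minutes
rate = mails_per_second(queued, elapsed)
print(f"{rate:.1f} mails/second")  # prints "8.3 mails/second"
```
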
> 
> 
> While looking into the extremely long time SpamAssassin took for the random
> data mails, I tried precompiling the SpamAssassin rules with sa-compile [2].
> Result:
> * on the random data, quite a speedup was achieved (115 vs 45)
> * on the 3x500 set, the processing time did not change much (if noticeably
>   at all)
> 
> 
> I would be very grateful for feedback (especially suggestions for further
> tests that would make sense).
> 
> [0] http://www.postfix.org/SMTPD_PROXY_README.html
> [1] https://esmtp.email/blog/2017/11/04/postal-benchmark/
> [2] https://cwiki.apache.org/confluence/display/spamassassin/FasterPerformance
> 
> pmg-api:
> Stoiko Ivanov (4):
>   add missing use MIME::Entity in PMG::Utils
>   PMG::Config: refactor dns info collection
>   add generate_ndr to PMG::SMTP
>   add support for before queue filtering
> 
>  src/PMG/Config.pm          | 42 +++++++++++++++----
>  src/PMG/SMTP.pm            | 85 ++++++++++++++++++++++++++++++++++----
>  src/PMG/Utils.pm           |  1 +
>  src/templates/master.cf.in | 14 +++++++
>  4 files changed, 124 insertions(+), 18 deletions(-)
> 
> pmg-docs:
> Stoiko Ivanov (1):
>   add before_queue params to gen-pmg.conf.5.-opts.pl
> 
>  gen-pmg.conf.5-opts.pl | 2 ++
>  1 file changed, 2 insertions(+)
>