[pmg-devel] [PATCH pmg-api] fix #3734: scrub 'url' from style tags/attributes

Thomas Lamprecht t.lamprecht at proxmox.com
Thu Nov 25 14:28:14 CET 2021


On 25.11.21 12:22, Dominik Csapak wrote:
> if 'view images' for the quarantine is disabled, it is expected that
> *no* images will be loaded. but in addition to img (src/href/etc.)
> also css can load external images via the 'url' directive
> 
> since html scrubber does not parse/iterate over css, we simply remove
> the url+protocol part of those tags/attributes. this technically leaves behind
> invalid css, but the browsers should cope with that.
> (we cannot 'cleanly' remove without much more effort because of quoting)
> 
> also we have to scrub the style tags in 'dump_html' since HTML::Scrubber
> does not have a way to modify the *content* of a tag, only the
> attributes...
> 
> Signed-off-by: Dominik Csapak <d.csapak at proxmox.com>
> ---
>  src/PMG/HTMLMail.pm | 25 ++++++++++++++++++++++---
>  1 file changed, 22 insertions(+), 3 deletions(-)
> 
> diff --git a/src/PMG/HTMLMail.pm b/src/PMG/HTMLMail.pm
> index b69a596..987dc39 100644
> --- a/src/PMG/HTMLMail.pm
> +++ b/src/PMG/HTMLMail.pm
> @@ -15,8 +15,16 @@ use HTML::Scrubber;
>  use PMG::Utils;
>  use PMG::MIMEUtils;
>  
> +# $value is a ref to a string scalar
> +my sub remove_urls {
> +    my ($value) = @_;
> +    # remove all urls with a protocol, this leaves partially invalid
> +    # css, but prevents the browser from loading them
> +    $$value =~ s|url\s*\(\s*(['"]?)[a-z]+://|($1|gi;

isn't the image() css function also problematic?
https://developer.mozilla.org/en-US/docs/Web/CSS/image/image()

And isn't this dangerous as an attacker could use this to craft the actual url,
for example:

urlurl(https://https:/my.super.evil.site.com/foo.png)

perl -we 'my $a = "urlurl(http://http:/my.super.evil.site.com/foo.png)"; $a =~ s|url\s*\(\s*(["]?)[a-z]+://|($1|gi; print "$a"'

prints:

url(https:/my.super.evil.site.com/foo.png)

So maybe rather just prefix the bad function names with __ or so, that would be safe
against such things, removing partial things won't ever be, at least not at the regex
level.

> +}
> +
>  sub dump_html {
> -    my ($tree, $cid_hash) = @_;
> +    my ($tree, $cid_hash, $viewimages) = @_;

I know this is the same name we use at caller side, but I'd still rather keep
local naming style: $view_images

>  
>      my @html = ();
>  
> @@ -37,6 +45,11 @@ sub dump_html {
>  			    $node->{src} = $datauri;
>  			}
>  		    }
> +		} elsif ($tag eq 'style' && !$viewimages) {
> +		    for my $el ($node->content_refs_list()) {
> +			next if ref $$el;
> +			remove_urls($el);
> +		    }

could be (and really just to let my line-bloat-reduction policy get out of hand):

		    remove_urls($_) for grep { !ref $$el } $node->content_refs_list();
>  		}
>  
>  		if($start) { # on the way in
> @@ -137,7 +150,13 @@ sub getscrubber {
>  	    span => 1,
>  	    src => $viewimages ? qr{^(?!(?:java)?script)}i : 0,
>  	    start => 1,
> -	    style => 1,
> +	    style => $viewimages ? 1 : sub {
> +		my ($obj, $tag_name, $attr_name, $value) = @_;
> +
> +		remove_urls(\$value);
> +
> +		return $value;
> +	    },

may be nicer do declare that separately and pass the code-ref directly by variable
or sub-ref-name here (not sure from top of my head if `my sub foo` can be passed along)

>  	    summary => 1,
>  	    tabindex => 1,
>  	    target => 1,
> @@ -267,7 +286,7 @@ sub entity_to_html {
>  	$tree->parse($raw);
>  	$tree->eof();
>  
> -	my $whtml = dump_html($tree, $viewimages ? $cid_hash : {});
> +	my $whtml = dump_html($tree, $viewimages ? $cid_hash : {}, $viewimages); #scrubs style tags

the comment is confusing, dump_html does more than just scrubbing style tags,
doesn't it?

Seems also like we split "show images" logic now a bit more, maybe always pass $cid_hash
and decide in dump_html if it's used or not, as it's just a ref it wouldn't be any
real overhead. But, no to hard feeling here and I did not looked at the full picture
to closely so I may miss something that would make that a bad idea.


>  	$tree->delete;
>  
>  	# remove dangerous/unneeded elements
> 





More information about the pmg-devel mailing list