[pmg-devel] [PATCH pmg-api] fix #3734: scrub 'url' from style tags/attributes
Thomas Lamprecht
t.lamprecht at proxmox.com
Thu Nov 25 14:28:14 CET 2021
On 25.11.21 12:22, Dominik Csapak wrote:
> if 'view images' for the quarantine is disabled, it is expected that
> *no* images will be loaded. but in addition to img (src/href/etc.)
> also css can load external images via the 'url' directive
>
> since html scrubber does not parse/iterate over css, we simply remove
> the url+protocol part of those tags/attributes. this technically leaves behind
> invalid css, but the browsers should cope with that.
> (we cannot 'cleanly' remove without much more effort because of quoting)
>
> also we have to scrub the style tags in 'dump_html' since HTML::Scrubber
> does not have a way to modify the *content* of a tag, only the
> attributes...
>
> Signed-off-by: Dominik Csapak <d.csapak at proxmox.com>
> ---
> src/PMG/HTMLMail.pm | 25 ++++++++++++++++++++++---
> 1 file changed, 22 insertions(+), 3 deletions(-)
>
> diff --git a/src/PMG/HTMLMail.pm b/src/PMG/HTMLMail.pm
> index b69a596..987dc39 100644
> --- a/src/PMG/HTMLMail.pm
> +++ b/src/PMG/HTMLMail.pm
> @@ -15,8 +15,16 @@ use HTML::Scrubber;
> use PMG::Utils;
> use PMG::MIMEUtils;
>
> +# $value is a ref to a string scalar
> +my sub remove_urls {
> + my ($value) = @_;
> + # remove all urls with a protocol, this leaves partially invalid
> + # css, but prevents the browser from loading them
> + $$value =~ s|url\s*\(\s*(['"]?)[a-z]+://|($1|gi;
isn't the image() css function also problematic?
https://developer.mozilla.org/en-US/docs/Web/CSS/image/image()
And isn't this dangerous as an attacker could use this to craft the actual url,
for example:
urlurl(https://https:/my.super.evil.site.com/foo.png)
perl -we 'my $a = "urlurl(http://http:/my.super.evil.site.com/foo.png)"; $a =~ s|url\s*\(\s*(["]?)[a-z]+://|($1|gi; print "$a"'
prints:
url(https:/my.super.evil.site.com/foo.png)
So maybe rather just prefix the bad function names with __ or so, that would be safe
against such things, removing partial things won't ever be, at least not at the regex
level.
> +}
> +
> sub dump_html {
> - my ($tree, $cid_hash) = @_;
> + my ($tree, $cid_hash, $viewimages) = @_;
I know this is the same name we use at caller side, but I'd still rather keep
local naming style: $view_images
>
> my @html = ();
>
> @@ -37,6 +45,11 @@ sub dump_html {
> $node->{src} = $datauri;
> }
> }
> + } elsif ($tag eq 'style' && !$viewimages) {
> + for my $el ($node->content_refs_list()) {
> + next if ref $$el;
> + remove_urls($el);
> + }
could be (and really just to let my line-bloat-reduction policy get out of hand):
remove_urls($_) for grep { !ref $$el } $node->content_refs_list();
> }
>
> if($start) { # on the way in
> @@ -137,7 +150,13 @@ sub getscrubber {
> span => 1,
> src => $viewimages ? qr{^(?!(?:java)?script)}i : 0,
> start => 1,
> - style => 1,
> + style => $viewimages ? 1 : sub {
> + my ($obj, $tag_name, $attr_name, $value) = @_;
> +
> + remove_urls(\$value);
> +
> + return $value;
> + },
may be nicer do declare that separately and pass the code-ref directly by variable
or sub-ref-name here (not sure from top of my head if `my sub foo` can be passed along)
> summary => 1,
> tabindex => 1,
> target => 1,
> @@ -267,7 +286,7 @@ sub entity_to_html {
> $tree->parse($raw);
> $tree->eof();
>
> - my $whtml = dump_html($tree, $viewimages ? $cid_hash : {});
> + my $whtml = dump_html($tree, $viewimages ? $cid_hash : {}, $viewimages); #scrubs style tags
the comment is confusing, dump_html does more than just scrubbing style tags,
doesn't it?
Seems also like we split "show images" logic now a bit more, maybe always pass $cid_hash
and decide in dump_html if it's used or not, as it's just a ref it wouldn't be any
real overhead. But, no to hard feeling here and I did not looked at the full picture
to closely so I may miss something that would make that a bad idea.
> $tree->delete;
>
> # remove dangerous/unneeded elements
>
More information about the pmg-devel
mailing list