[pve-devel] [PATCH http-server 1/3] accept-phase: fix conn_count "leak"

Fri Dec 4 08:41:35 CET 2020

On 03.12.20 19:43, Stoiko Ivanov wrote:
> When handling new connections in 'accept_connections' the number of
> active connections got increased before the AnyEvent::Handle
> registered the callback which would decrement it on error.
> 
> Any error/die beforehand would skip the decrement, and leave the
> process in an endless loop upon exiting in wait_end_loop.
> 
> This can happen e.g. when the call to getpeername fails, or if the
> connection is denied by the ALLOW_FROM/DENY_FROM settings in
> '/etc/default/pveproxy' (which is also the simplest reproducer for
> that).
> 
> Additionally it can cause a denial of service, by attempting to
> connect from a denied ip until the connection count exeeds the maximum
> connections of all child-processes.
> 
> Reported via our community-forum:
> https://forum.proxmox.com/threads/pveproxy-eats-available-ram.79617/
> 
> Co-Authored-by: Dominik Csapak <d.csapak at proxmox.com>
> Signed-off-by: Stoiko Ivanov <s.ivanov at proxmox.com>
> ---
>  PVE/APIServer/AnyEvent.pm | 7 ++++---
>  1 file changed, 4 insertions(+), 3 deletions(-)
> 
> diff --git a/PVE/APIServer/AnyEvent.pm b/PVE/APIServer/AnyEvent.pm
> index c55da7f..c5f5fdc 100644
> --- a/PVE/APIServer/AnyEvent.pm
> +++ b/PVE/APIServer/AnyEvent.pm
> @@ -1479,8 +1479,6 @@ sub accept {
>  
>      fh_nonblocking $clientfh, 1;
>  
> -    $self->{conn_count}++;
> -
>      return $clientfh;
>  }
>  
> @@ -1561,7 +1559,7 @@ sub accept_connections {
>  	    my $reqstate = { keep_alive => $self->{keep_alive} };
>  
>  	    # stop keep-alive when there are many open connections
> -	    if ($self->{conn_count} >= $self->{max_conn_soft_limit}) {
> +	    if ($self->{conn_count}+1 >= $self->{max_conn_soft_limit}) {

style nit: don't glue operators together `self->{conn_count} + 1`

>  		$reqstate->{keep_alive} = 0;
>  	    }
>  
> @@ -1600,6 +1598,9 @@ sub accept_connections {
>  		},
>  		($self->{tls_ctx} ? (tls => "accept", tls_ctx => $self->{tls_ctx}) : ()));
>  
> +	    $self->{conn_count}++;
> +

But isn't this wrong too? The FH could already get a EOF here, and thus get reduced
before increased - one could maybe argue "well it should get increased again after,
here, so brought in sync again", i.e.:

1. Get's registered
                                            2. clientfh EOF
                                                -> $self->client_do_disconnect($reqstate);
                                                -> $self->{conn_count}--;

! Wrong counter here, could lead to possible wrong decisions now already (not checked
for sure) or when adding/changing something (as this is non-obvious, not even a comment
hinting it!)

3. resume here, brought in sync again, reminds me of a short comic strip I recently
   run into [0].

So between 2. and 3. we are in limbo, while short it still matters, every race triggers
sooner or later, computers are just to fast and scheduling to nondeterministic for that
to not happen.

Why not move the $self->{conn_count}++; before AnyEvent Handle instance is created,
i.e., where we do $early_err = 0; as this effectively is the barrier for the connection
being valid or not. We could also add handling for when the handle creation itself fails,
setting a flag afterwards and checking both, that flag and $early_err in the existing
error handling branch outside of the eval, and decrement in that case.

Or, do you have some documented behavior, not stated here in the commit, that this all
just cannot happen at all?

[0]: https://i.redd.it/m4zbw3u7rbk21.jpg

> +
>  	    print "$$: ACCEPT FH" .  $clientfh->fileno() . " CONN$self->{conn_count}\n" if $self->{debug};
>  
>  	    $self->push_request_header($reqstate);
>