[pve-devel] [PATCH http-server v2] fix #5391: proxy request: avoid HTTP 599 Too many redirections

Fri Oct 4 11:43:11 CEST 2024

The API server proxies HTTP requests in two cases:

- between cluster nodes (pveproxy->pveproxy)
- between daemons on one node for protected API endpoints
  (pveproxy->pvedaemon)

The API server uses AnyEvent::HTTP for proxying, with unfortunate
settings for connection reuse (details below). With these settings,
long-running synchronous API requests on the proxy destination's side
can cause unrelated proxied requests to fail with a misleading HTTP
599 "Too many redirections" error response. In order to avoid these
errors, improve the connection reuse settings.

In more detail:

By default, AnyEvent::HTTP reuses previously-opened connections for
requests with idempotent HTTP verbs, e.g. GET/PUT/DELETE [1]. However,
when trying to reuse a previously-opened connection, it can happen
that the destination unexpectedly closes the connection. In case of
idempotent requests, AnyEvent::HTTP's http_request will retry by
recursively calling itself. Since the API server disallows recursion
by passing `recurse => 0` to http_request initially, the recursive
call fails with "HTTP 599 Too many redirections".

This can happen both for pveproxy->pveproxy and pveproxy->pvedaemon,
as connection reuse is enabled in both cases. Connection reuse being
enabled in the pveproxy->pvedaemon case was likely not intended: A
comment mentions that "keep alive for localhost is not worth it", but
only sets `keepalive => 0` and not `persistent => 0`. This setting
switches from HTTP/1.1 persistent connections to HTTP/1.0-style
keep-alive connections, but still allows connection reuse.

The destination unexpectedly closing the connection can be due to
unfortunate timing, but it becomes much more likely in case of
long-running synchronous requests. An example sequence:

1) A pveproxy worker P1 handles a protected request R1 and proxies it
   to a pvedaemon worker D1, opening a pveproxy worker->pvedaemon
   worker connection C1. The pvedaemon worker D1 is relatively fast
   (<1s) in handling R1. P1 saves connection C1 for later reuse.
2) A different pveproxy worker P2 handles a protected request R2 and
   proxies it to the same pvedaemon worker D1, opening a new pveproxy
   worker->pvedaemon connection C2. Handling this request takes a long
   time (>5s), for example because it queries a slow storage. While
   the request is being handled, the pvedaemon worker D1 cannot do
   anything else.
3) Since pvedaemon worker D1 sets a timeout of 5s when accepting
   connections and it did not see anything on connection C1 for >5s
   (because it was busy handling R2), it closes the connection C1.
4) pveproxy worker P1 handles a protected idempotent request R3. Since
   the request is idempotent, it tries to reuse connection C1. But C1
   was just closed by D1, so P1 fails request R3 with HTTP 599 as
   described above.
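
The sequence above can be condensed into a toy model. This is only a
sketch: the sub names and the timestamps are made up for illustration,
nothing here is taken from AnyEvent::HTTP or pve-http-server, and the
destination is simplified to "closes any connection that was idle for
more than 5s".

```perl
use strict;
use warnings;

# Toy model of steps 1) to 4). All names are hypothetical.
my $IDLE_TIMEOUT = 5; # seconds, timeout of the pvedaemon worker

# True if the destination has closed an idle connection by the time the
# proxy wants to reuse it.
sub closed_by_destination {
    my ($last_activity, $reuse_time) = @_;
    return ($reuse_time - $last_activity) > $IDLE_TIMEOUT ? 1 : 0;
}

# Models one proxied idempotent request on a cached connection.
# $recurse is the number of retries the caller allows.
sub proxy_idempotent_request {
    my ($last_activity, $reuse_time, $recurse) = @_;
    return 200 if !closed_by_destination($last_activity, $reuse_time);
    # the cached connection is gone; a retry would open a fresh one
    return $recurse > 0 ? 200 : 599;
}

# C1 was last used at t=0, D1 is busy with R2 until t=7, R3 arrives at t=8
print proxy_idempotent_request(0, 8, 0), "\n"; # recurse => 0, prints 599
print proxy_idempotent_request(0, 8, 1), "\n"; # recurse => 1, prints 200
```

In this model, allowing a single retry is enough to turn the failure
into a success, because the retry does not depend on the stale
connection.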

In addition, AnyEvent::HTTP's default of reusing connections for all
idempotent HTTP verbs is problematic in our case, as not all PUT
requests of the PVE API are actually idempotent, e.g. /sendkey [2].

To fix the issues above, improve the connection reuse settings:

a) Actually disable connection reuse for pveproxy->pvedaemon requests,
   by passing `persistent => 0`.
b) For pveproxy->pveproxy requests, enable connection reuse for GET
   requests only, as these should be actually idempotent.
c) If connection reuse is enabled, allow one retry by passing `recurse
   => 1`, to avoid the HTTP 599 errors.
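
As a rough summary of a) to c), the choice of options could be written
as one small function. The helper name `connection_reuse_options` and
the example address are assumptions made for this sketch; the actual
change sets the same three options inline in proxy_request.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of pve-http-server: returns the
# connection reuse options for a proxied request, following a) to c).
sub connection_reuse_options {
    my ($host, $method) = @_;

    # b) only GET requests are assumed to be idempotent
    my $persistent = $method eq 'GET' ? 1 : 0;
    # a) never reuse connections to the local pvedaemon
    $persistent = 0 if $host eq 'localhost';

    return (
        persistent => $persistent,
        # c) allow one retry, but only if a connection may be reused
        recurse => $persistent ? 1 : 0,
        keepalive => 1,
    );
}

my %opts = connection_reuse_options('192.0.2.10', 'GET');
print join(', ', map { "$_ => $opts{$_}" } sort keys %opts), "\n";
# prints: keepalive => 1, persistent => 1, recurse => 1
```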

With a) and b), the API server will reuse connections less often,
which can theoretically result in a performance drop. To gain
confidence that the performance impact is tolerable, here are the
results of a simple benchmark.

The benchmark runs hey [3] against a virtual 3-node PVE cluster, with
or without the patch applied. It performs 10000 requests in 2 worker
threads to `PUT $HTTP_NODE:8006/api2/json/nodes/$PROXY_NODE/config`
with a JSON payload that sets a 32KiB ASCII `description`. The
shortened hey invocation:

    hey -H "$TOKEN" -m PUT -T application/json -D payload.json \
        --disable-keepalive -n 10000 -c 2 "$URL"

The endpoint was chosen because it performs little work (locks and
writes a config file), it is protected (to test behavior change a)),
and it is a PUT endpoint (to test behavior change b)).

The command is run twice:

- With $HTTP_NODE == $PROXY_NODE for pveproxy->pvedaemon proxying
- With $HTTP_NODE != $PROXY_NODE for pveproxy->pveproxy->pvedaemon
  proxying

For each invocation, we record the response times.

Without this patch:

  $HTTP_NODE == $PROXY_NODE

  Slowest:      0.0215 secs
  Fastest:      0.0061 secs
  Average:      0.0090 secs
  0.006 [1]     |
  0.008 [2409]  |■■■■■■■■■■■■■■■■■■■■■■■■
  0.009 [4065]  |■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
  0.011 [1781]  |■■■■■■■■■■■■■■■■■■
  0.012 [1024]  |■■■■■■■■■■
  0.014 [414]   |■■■■
  0.015 [196]   |■■
  0.017 [85]    |■
  0.018 [21]    |
  0.020 [2]     |
  0.022 [2]     |

  $HTTP_NODE != $PROXY_NODE

  Slowest:      0.0584 secs
  Fastest:      0.0075 secs
  Average:      0.0105 secs
  0.007 [1]     |
  0.013 [8445]  |■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
  0.018 [1482]  |■■■■■■■
  0.023 [56]    |
  0.028 [5]     |
  0.033 [1]     |
  0.038 [0]     |
  0.043 [0]     |
  0.048 [0]     |
  0.053 [5]     |
  0.058 [5]     |

With this patch:

  $HTTP_NODE == $PROXY_NODE

  Slowest:      0.0194 secs
  Fastest:      0.0062 secs
  Average:      0.0088 secs
  0.006 [1]     |
  0.007 [1980]  |■■■■■■■■■■■■■■■■■■■
  0.009 [4134]  |■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
  0.010 [1874]  |■■■■■■■■■■■■■■■■■■
  0.011 [1406]  |■■■■■■■■■■■■■■
  0.013 [482]   |■■■■■
  0.014 [93]    |■
  0.015 [16]    |
  0.017 [5]     |
  0.018 [4]     |
  0.019 [5]     |

  $HTTP_NODE != $PROXY_NODE

  Slowest:      0.0369 secs
  Fastest:      0.0091 secs
  Average:      0.0121 secs
  0.009 [1]     |
  0.012 [5711]  |■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
  0.015 [3392]  |■■■■■■■■■■■■■■■■■■■■■■■■
  0.017 [794]   |■■■■■■
  0.020 [79]    |■
  0.023 [16]    |
  0.026 [3]     |
  0.029 [2]     |
  0.031 [0]     |
  0.034 [1]     |
  0.037 [1]     |

Comparing the averages, there is

- little difference when $HTTP_NODE == $PROXY_NODE (0.009s vs
  0.0088s). So for pveproxy->pvedaemon proxying, the effect of
  disabling connection reuse seems negligible.
- ~15% overhead when $HTTP_NODE != $PROXY_NODE (0.0105s vs 0.0121s).
  Such an increase for pveproxy->pveproxy->pvedaemon proxying is not
  nothing, but in real-world workloads I'd expect the response time
  for non-idempotent requests to be dominated by other factors.
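
For reference, the relative changes can be recomputed from the
averages reported above. The sub name is made up for this sketch, and
the inputs are the rounded averages printed by hey, so the percentages
are approximate.

```perl
use strict;
use warnings;

# Relative change of the average response time, in percent.
sub relative_change_percent {
    my ($before, $after) = @_;
    return ($after - $before) / $before * 100;
}

# averages in seconds, without vs with the patch
printf "HTTP_NODE == PROXY_NODE: %+.1f%%\n",
    relative_change_percent(0.0090, 0.0088); # prints -2.2%
printf "HTTP_NODE != PROXY_NODE: %+.1f%%\n",
    relative_change_percent(0.0105, 0.0121); # prints +15.2%
```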

[1] https://metacpan.org/pod/AnyEvent::HTTP#persistent-=%3E-$boolean
[2] https://pve.proxmox.com/pve-docs/api-viewer/index.html#/nodes/{node}/qemu/{vmid}/sendkey
[3] https://github.com/rakyll/hey

Suggested-by: Fabian Grünbichler <f.gruenbichler at proxmox.com>
Signed-off-by: Friedrich Weber <f.weber at proxmox.com>
---

Notes:
    Not sure if this particular benchmark is the best way to measure the
    impact of this patch. If you have suggestions, please let me know.
    
    When applied, it might make sense to have this patch in its own
    pve-http-server bump, so that users who notice significant
    performance drops can go back to an earlier version to see if
    this patch is responsible.
    
    v2:
    - no code changes
    - add benchmark to commit message
    - fix typos in commit message

 src/PVE/APIServer/AnyEvent.pm | 19 ++++++++++++++-----
 1 file changed, 14 insertions(+), 5 deletions(-)

diff --git a/src/PVE/APIServer/AnyEvent.pm b/src/PVE/APIServer/AnyEvent.pm
index a87a696..24209a1 100644
--- a/src/PVE/APIServer/AnyEvent.pm
+++ b/src/PVE/APIServer/AnyEvent.pm
@@ -708,7 +708,12 @@ sub proxy_request {
 
     eval {
 	my $target;
-	my $keep_alive = 1;
+
+	# By default, AnyEvent::HTTP reuses connections for the idempotent
+	# request methods GET/HEAD/PUT/DELETE. But not all of our PUT requests
+	# are idempotent, hence, reuse connections for GET requests only, as
+	# these should in fact be idempotent.
+	my $persistent = $method eq 'GET';
 
 	# stringify URI object and verify it starts with a slash
 	$uri = "$uri";
@@ -720,8 +725,8 @@ sub proxy_request {
 	my $may_stream_file;
 	if ($host eq 'localhost') {
 	    $target = "http://$host:85$uri";
-	    # keep alive for localhost is not worth (connection setup is about 0.2ms)
-	    $keep_alive = 0;
+	    # connection reuse for localhost is not worth (connection setup is about 0.2ms)
+	    $persistent = 0;
 	    $may_stream_file = 1;
 	} elsif (Net::IP::ip_is_ipv6($host)) {
 	    $target = "https://[$host]:8006$uri";
@@ -796,9 +801,13 @@ sub proxy_request {
 	    $method => $target,
 	    headers => $headers,
 	    timeout => 30,
-	    recurse => 0,
 	    proxy => undef, # avoid use of $ENV{HTTP_PROXY}
-	    keepalive => $keep_alive,
+	    persistent => $persistent,
+	    # if connection reuse is enabled ($persistent is 1), allow one retry, to avoid returning
+	    # HTTP 599 Too many redirections if the server happens to close the connection
+	    recurse => $persistent ? 1 : 0,
+	    # when reusing a connection, send keep-alive headers
+	    keepalive => 1,
 	    body => $content,
 	    tls_ctx => AnyEvent::TLS->new(%{$tls}),
 	    sub {
-- 
2.39.5