[pve-devel] [PATCH http-server 1/2] http-server: anyevent: add new subroutine for processing requests

Tue Feb 28 15:05:48 CET 2023

On February 17, 2023 4:25 pm, Max Carrara wrote:
> The `begin_process_next_request` subroutine and its auxiliary
> subroutines are added to serve as a replacement for
> `push_request_header` and `unshift_read_header` in order to simplify
> the parsing logic. The entire header is read by a single callback
> in `begin_process_next_request` and then processed step-by-step
> instead of reading each line and handling it separately.
> This is done in order to separate reading the incoming header from
> the processing algorithm's logic while simultaneously making the
> steps used to process the header more transparent and explicit.
> 
> This comes with one noteworthy change in behaviour:
> Because the header is now read all at once from the input buffer,
> the header field count limit is only checked for after the entire
> header was already read. In `unshift_read_header` the limit is
> checked for each line that is read.
> 
> The behaviour otherwise remains fully identical.
> 
> Signed-off-by: Max Carrara <m.carrara at proxmox.com>

Like Thomas already remarked - splitting these two patches doesn't make much
sense, since it's still needed to look at the total diff across both to see what
has changed..

I started going through the changes below and noticed some stuff, but quickly
came to the conclusion that we actually gain very little from switching to
parsing the full read buffer in one go instead of by line
- in the end, the actual parsing still happens by line
- unshift_read_header already contains the state machine handling all of the
steps except for parsing the very first line (and the state is rather
trivial/serial anyway)
- adding 'no TLS, respond with redirect' logic there should be a few lines of
code

OTOH, switching to non-line-based parsing here requires careful analysis of
interactions with AnyEvent internals and buffer sizes with potential for subtle
breakage (e.g., how does this fare under load).

it would still be possible to factor out the post-processing and
authentication/handling part, but that doesn't require turning the header
parsing onto its head.. unshift_read_header could simply call a post_processing
and a authenticate_and_handle_request sub instead of in-lining all that..

> ---
>  src/PVE/APIServer/AnyEvent.pm | 492 ++++++++++++++++++++++++++++++++++
>  1 file changed, 492 insertions(+)
> 
> Note: The subroutines in `@process_steps` of
> `begin_process_next_request` mimic the steps the original algorithm
> takes. If necessary, these can of course be split up into smaller
> steps.
> 
> diff --git a/src/PVE/APIServer/AnyEvent.pm b/src/PVE/APIServer/AnyEvent.pm
> index 3cd77fa..bf7957a 100644
> --- a/src/PVE/APIServer/AnyEvent.pm
> +++ b/src/PVE/APIServer/AnyEvent.pm
> @@ -1570,6 +1570,498 @@ sub push_request_header {
>      warn $@ if $@;
>  }
>  
> +sub begin_process_next_request {
> +    my ($self, $reqstate) = @_;
> +
> +    # Match for *end* of HTTP header fields
> +    my $re_accept = qr<\r?\n\r?\n>;
> +
> +    # Don't reject anything - rejection causes EBADMSG otherwise, aborting the callback
> +    # and might match for unexpected things (such as TLS handshake data!)
> +    my $re_reject = undef;
> +
> +    # Skip over all header fields, but don't match for CRs and NLs
> +    # as those are required for re_accept above
> +    my $re_skip = qr<^.*[^\r\n]>;
> +
> +    eval {
> +	$reqstate->{hdl}->push_read(regex => $re_accept, $re_reject, $re_skip, sub {
> +	    my ($handle, $data) = @_;
> +
> +	    # $data now contains the entire (raw) HTTP header
> +	    $self->dprint("parse_request_header: handle: $handle [$data]");
> +
> +	    eval {
> +		$reqstate->{keep_alive}--;
> +
> +		my $http_header = $data =~ s/^\s+|\s+$//gr ;

why the 'r' modifier? is this the replacement for removing empty lines at the
start of the request - if so, it lost the comment that states this reason and
the RE is unnecessarily opaque.

> +		my ($http_method_line, $http_header_field_lines) = split(/\r?\n/, $http_header, 2);

so now we are splitting by line again

> +
> +		$reqstate->{http_method_line} = $http_method_line;
> +		$reqstate->{http_header_field_lines} = $http_header_field_lines;
> +
> +		# Array of references to the subroutines used to process the request
> +		my @process_steps = \(
> +		    &parse_request_header_method,

and this one here will parse just the first line

> +		    &parse_request_header_fields,

and this one will parse by line again..

> +		    &postprocess_header_fields,
> +		    &authenticate_and_handle_request,

I can actually see the benefit of having these two split out since they actually
only operate on the parsed data - see top-level comment above.

> +		);
> +
> +		for my $step (@process_steps) {
> +		    my $result = $step->($self, $reqstate);
> +
> +		    my $success = $result->{success} || 0;
> +		    if (!$success) {
> +			$self->make_response_from_unsuccessful_result($reqstate, $result);
> +			return;
> +		    }
> +		}
> +
> +	    };
> +
> +	    $self->dprint("parse_request_header: handle $handle DONE");
> +	    warn $@ if $@;
> +
> +	});
> +    };
> +    warn $@ if $@;
> +}
> +
> +sub make_response_from_unsuccessful_result {
> +    my ($self, $reqstate, $result) = @_;
> +
> +    die "No HTTP response status code provided\n"
> +	if !$result->{http_code};
> +
> +    eval {
> +	my $response_object = HTTP::Response->new(
> +	    $result->{http_code},
> +	    $result->{http_message},
> +	    $result->{http_header},
> +	    $result->{http_content}
> +	);
> +
> +	$self->dprint("Sending response: " . $result->{http_code} . " " . $result->{http_message});
> +
> +	$self->response(
> +	    $reqstate,
> +	    $response_object,
> +	    $result->{response_mtime},
> +	    $result->{response_nocomp},
> +	    $result->{response_delay},
> +	    $result->{response_stream_fh}
> +	);
> +
> +	$self->client_do_disconnect($reqstate);
> +    };
> +
> +    warn $@ if $@;
> +}
> +
> +sub parse_request_header_method {
> +    my ($self, $reqstate) = @_;
> +
> +    my $http_method_line = $reqstate->{http_method_line};
> +    die "http_method_line not within reqstate\n"
> +	if !$http_method_line;
> +
> +    my $re_http_method = qr<^(\S+)\040(\S+)\040HTTP\/(\d+)\.(\d+)>;

this RE is different from the original one: qr<> vs =~ //o
- I don't think the 'o' does anything meaningful here, so dropping is probably okay
- the '/' that is escaped in the expression doesn't make sense now, since '/' is
not the delimiter anymore

> +
> +    $self->dprint("Processing initial header line: $http_method_line");

additional debug print, so it's not behaving identical as described by the
commit message ;)

> +
> +    if ($http_method_line =~ $re_http_method) {
> +	my ($method, $uri, $major, $minor) = ($1, $2, $3, $4);
> +
> +	if ($major != 1) {
> +	    return {
> +		success => 0,
> +		http_code => 506,
> +		http_message => "HTTP protocol version $major.$minor not supported",
> +	    };
> +	}
> +
> +	# if an '@' comes before the first slash, proxy forwarding might consider
> +	# the frist part of the uri to be part of an authority
> +	if ($uri =~ m|^[^/]*@|) {
> +	    return {
> +		success => 0,
> +		http_code => 400,
> +		http_message => 'invalid uri',
> +	    };
> +	}
> +

this used to also save the request line (the line being parsed here) for logging
purposes. so /var/log/pveproxy/access.log now looks like this:

::ffff:192.168.X.Y - user at realm [28/02/2023:13:49:46 +0100] "-" 200 1024
::ffff:192.168.X.Y - user at realm [28/02/2023:13:49:47 +0100] "-" 200 1352
::ffff:192.168.X.Y - user at realm [28/02/2023:13:49:48 +0100] "-" 200 73
::ffff:192.168.X.Y - user at realm [28/02/2023:13:49:50 +0100] "-" 200 1028
::ffff:192.168.X.Y - user at realm [28/02/2023:13:49:51 +0100] "-" 200 1360
::ffff:192.168.X.Y - user at realm [28/02/2023:13:49:52 +0100] "-" 200 73

which is not very helpful ;)

> +	$reqstate->{proto}->{str} = "HTTP/$major.$minor";
> +	$reqstate->{proto}->{maj} = $major;
> +	$reqstate->{proto}->{min} = $minor;
> +	$reqstate->{proto}->{ver} = $major*1000 + $minor;
> +
> +	$reqstate->{request} = HTTP::Request->new($method, $uri);
> +	$reqstate->{starttime} = [gettimeofday];
> +
> +	# Only requests with valid HTTP methods are counted
> +	$self->{request_count}++;
> +
> +	if ($self->{request_count} >= $self->{max_requests}) {
> +	    $self->{end_loop} = 1;
> +	}
> +
> +	return { success => 1 };
> +    }
> +
> +    $self->dprint("Could not match for HTTP method / proto: $http_method_line");

same as above

> +    return {
> +	success => 0,
> +	http_code => 400,
> +	http_message => 'bad request',
> +    };
> +}
> +
> +sub parse_request_header_fields {
> +    my ($self, $reqstate) = @_;
> +
> +    my $http_header_field_lines = $reqstate->{http_header_field_lines};
> +
> +    die "http_header_field_lines not within reqstate\n"
> +	if !$reqstate->{http_header_field_lines};
> +
> +    die "request object not within reqstate\n"
> +	if !$reqstate->{request};
> +
> +    my $request = $reqstate->{request};
> +    my $state = { size => 0 };
> +
> +    # Matches a header field definition, e.g.
> +    # 'Foo: Bar'
> +    my $re_field = qr<^([^:\s]+)\s*:\s*(.*)>;
> +
> +    # Matches a line beginning with at least one whitespace character,
> +    # followed by some other data; used to match multiline headers,
> +    # e.g. if 'Foo: Bar' was matched previously, and the next line
> +    # contains '   Qux', the resulting field will be 'Foo: Bar Qux'
> +    my $re_field_multiline = qr<^\s+(.*)>;
> +
> +    my @lines = split(/\r?\n/, $http_header_field_lines);
> +
> +    die "too many http header lines (> $limit_max_headers)\n"
> +        if scalar(@lines) >= $limit_max_headers;
> +
> +
> +    for my $line (@lines) {
> +	$self->dprint("Processing header line: $line");
> +
> +	die "http header too large\n"
> +	    if ($state->{size} += length($line)) >= $limit_max_header_size;
> +
> +	if ($line =~ $re_field) {
> +	    $request->push_header($state->{key}, $state->{value})
> +		if $state->{key};
> +	    ($state->{key}, $state->{value}) = ($1, $2);
> +
> +	} elsif ($line =~ $re_field_multiline) {
> +
> +	    if (!$state->{key}) {  # sanity check for invalid header
> +		return {
> +		    success => 0,
> +		    http_code => 400,
> +		    http_message => 'bad request',
> +		};
> +	    }
> +
> +	    $state->{value} .= " $1";
> +
> +	} else {
> +	    return {
> +		success => 0,
> +		http_code => 506,
> +		http_message => 'unable to parse request header',
> +	    };
> +	}
> +
> +    }
> +
> +    $request->push_header($state->{key}, $state->{value})
> +	if $state->{key};
> +
> +    return { success => 1 };
> +}
> +
> +sub postprocess_header_fields {
> +    my ($self, $reqstate) = @_;
> +
> +    die "request object not within reqstate\n"
> +	if !$reqstate->{request};
> +
> +    my $request = $reqstate->{request};
> +    my $method = $request->method();
> +
> +    if (!$known_methods->{$method}) {
> +	return {
> +	    success => 0,
> +	    http_code => HTTP_NOT_IMPLEMENTED,
> +	    http_message => "method '$method' not available",
> +	};
> +    }
> +
> +    my $h_connection = $request->header('Connection');
> +    my $h_accept_encoding = $request->header('Accept-Encoding');
> +
> +    $reqstate->{accept_gzip} = ($h_accept_encoding && $h_accept_encoding =~ m/gzip/) ? 1 : 0;
> +
> +    if ($h_connection) {
> +	$reqstate->{keep_alive} = 0 if $h_connection =~ m/close/oi;
> +    } else {
> +	if ($reqstate->{proto}->{ver} < 1001) {
> +	    $reqstate->{keep_alive} = 0;
> +	}
> +    }
> +
> +    my $h_transfer_encoding  = $request->header('Transfer-Encoding');
> +    if ($h_transfer_encoding && lc($h_transfer_encoding) eq 'chunked') {
> +	# Handle chunked transfer encoding
> +	return {
> +	    success => 0,
> +	    http_code => 501,
> +	    http_message => 'chunked transfer encoding not supported',
> +	};
> +
> +    } elsif ($h_transfer_encoding) {
> +	return {
> +	    success => 0,
> +	    http_code => 501,
> +	    http_message => "Unknown transfer encoding '$h_transfer_encoding'",
> +	};
> +    }
> +
> +    my $h_pveclientip = $request->header('PVEClientIP');
> +
> +    # FIXME: how can we make PVEClientIP header trusted?
> +    if ($self->{trusted_env} && $h_pveclientip) {
> +	$reqstate->{peer_host} = $h_pveclientip;
> +    } else {
> +	$request->header('PVEClientIP', $reqstate->{peer_host});
> +    }
> +
> +    if (my $rpcenv = $self->{rpcenv}) {
> +	$rpcenv->set_request_host($request->header('Host'));
> +    }
> +
> +    return { success => 1 };
> +}
> +
> +sub authenticate_and_handle_request {
> +    my ($self, $reqstate) = @_;
> +
> +    die "request not within reqstate\n"
> +	if !$reqstate->{request};
> +
> +    my $request = $reqstate->{request};
> +    my $method = $request->method();
> +    my $path = uri_unescape($request->uri->path());
> +    my $base_uri = $self->{base_uri};
> +
> +    # Begin authentication
> +    $self->dprint("Begin authentication");
> +    my $auth = {};
> +
> +    if ($self->{spiceproxy}) {
> +	$self->dprint("Authenticating via spiceproxy");
> +	my $connect_str = $request->header('Host');
> +	my ($vmid, $node, $port) = $self->verify_spice_connect_url($connect_str);
> +
> +	if (!(defined($vmid) && $node && $port)) {
> +	    return {
> +		success => 0,
> +		http_code => HTTP_UNAUTHORIZED,
> +		http_message => 'invalid ticket',
> +	    };
> +	}
> +
> +	$self->handle_spice_proxy_request($reqstate, $connect_str, $vmid, $node, $port);
> +	$self->dprint("spiceproxy request handled");
> +	return { success => 1 };
> +
> +    } elsif ($path =~ m/^\Q$base_uri\E/) {
> +	$self->dprint("Authenticating via base_uri regex");
> +	my $token = $request->header('CSRFPreventionToken');
> +	my $cookie = $request->header('Cookie');
> +	my $auth_header = $request->header('Authorization');
> +
> +	# prefer actual cookie
> +	my $ticket = PVE::APIServer::Formatter::extract_auth_value($cookie, $self->{cookie_name});
> +
> +	# fallback to cookie in 'Authorization' header
> +	$ticket = PVE::APIServer::Formatter::extract_auth_value($auth_header, $self->{cookie_name})
> +	    if !$ticket;
> +
> +	# finally, fallback to API token if no ticket has been provided so far
> +	my $api_token;
> +	$api_token = PVE::APIServer::Formatter::extract_auth_value($auth_header, $self->{apitoken_name})
> +	    if !$ticket;
> +
> +	my ($rel_uri, $format) = $split_abs_uri->($path, $base_uri);
> +
> +	if (!$format) {
> +	    return {
> +		success => 0,
> +		http_code => HTTP_NOT_IMPLEMENTED,
> +		http_message => 'no such uri',
> +	    };
> +	}
> +
> +	$self->dprint("Calling auth_handler");
> +	eval {
> +	    $auth = $self->auth_handler(
> +		$method,
> +		$rel_uri,
> +		$ticket,
> +		$token,
> +		$api_token,
> +		$reqstate->{peer_host}
> +	    );
> +	};
> +	$self->dprint("auth_handler done");
> +
> +	if (my $err = $@) {
> +	    $self->dprint("auth_handler threw error $@");
> +	    # HACK: see Note 1
> +	    Net::SSLeay::ERR_clear_error();
> +
> +	    my $result = {
> +		success => 0,
> +		response_delay => 3, # always delay unauthorized calls by 3 seconds
> +		response_nocomp => 0,
> +	    };
> +
> +	    if (ref($err) eq "PVE::Exception") {
> +		$result->{http_code} = $err->{code} || HTTP_INTERNAL_SERVER_ERROR;
> +		$result->{http_message} = $err->{msg};
> +
> +	    } elsif (my $formatter = PVE::APIServer::Formatter::get_login_formatter($format)) {
> +		my ($raw, $ct, $nocomp) =
> +		    $formatter->($path, $auth, $self->{formatter_config});
> +
> +		if (ref($raw) && (ref($raw) eq 'HTTP::Response')) {
> +		    $result->{http_code} = $raw->code;
> +		    $result->{http_message} = $raw->message;
> +		    $result->{http_header} = $raw->headers;
> +		    $result->{http_content} = $raw->content;
> +
> +		} else {
> +		    $result->{http_code} = HTTP_UNAUTHORIZED;
> +		    $result->{http_message} = 'Login Required';
> +		    $result->{http_header} = HTTP::Headers->new('Content-Type' => $ct);
> +		    $result->{http_content} = $raw;
> +
> +		}
> +
> +		$result->{response_nocomp} = $nocomp;
> +
> +	    } else {
> +		$result->{http_code} = HTTP_UNAUTHORIZED;
> +		$result->{http_message} = $err;
> +	    }
> +
> +	    return $result;
> +	}
> +    }
> +
> +    $reqstate->{log}->{userid} = $auth->{userid};
> +
> +    # Continue handling request
> +    $self->dprint("Continuing to handle request");
> +
> +    my $h_content_length = $request->header('Content-Length');
> +    my $h_content_type = $request->header('Content-Type');
> +
> +    if (!$h_content_length) {
> +	$self->dprint("No Content-Length provided; handling request the usual way");
> +	$self->handle_request($reqstate, $auth, $method, $path);
> +	return { success => 1 };
> +    }
> +
> +    if (!($method eq 'PUT' || $method eq 'POST')) {
> +	return {
> +	    success => 0,
> +	    http_code => 501,
> +	    http_message => "Unexpected content for method '$method'",
> +	};
> +    }
> +
> +    my ($content, $boundary) = $h_content_type ? parse_content_type($h_content_type) : ();
> +
> +    if ($auth->{isUpload} && !$self->{trusted_env}) {
> +	die "upload 'Content-Type '$h_content_type' not implemented\n"
> +	    if !($boundary && $content && ($content eq 'multipart/form-data'));
> +
> +	die "upload without content length header not supported"
> +	    if !$h_content_length;
> +
> +	$self->dprint("start upload $path $content $boundary");
> +
> +	my $tmpfilename = get_upload_filename();
> +	my $outfh = IO::File->new($tmpfilename, O_RDWR|O_CREAT|O_EXCL, 0600) ||
> +	    die "unable to create temporary upload file '$tmpfilename'";
> +
> +	$reqstate->{keep_alive} = 0;
> +
> +	my $boundlen = length($boundary) + 8; # \015?\012--$boundary--\015?\012
> +
> +	my $state = {
> +	    size => $h_content_length,
> +	    boundary => $boundary,
> +	    ctx => Digest::MD5->new,
> +	    boundlen =>  $boundlen,
> +	    maxheader => 2048 + $boundlen, # should be large enough
> +	    params => decode_urlencoded($request->url->query()),
> +	    phase => 0,
> +	    read => 0,
> +	    post_size => 0,
> +	    starttime => [gettimeofday],
> +	    outfh => $outfh,
> +	};
> +
> +	$reqstate->{tmpfilename} = $tmpfilename;
> +	$reqstate->{hdl}->on_read(sub {
> +	    $self->file_upload_multipart($reqstate, $auth, $method, $path, $state);
> +	});
> +
> +	return { success => 1 };
> +    }
> +
> +    if ($h_content_length > $limit_max_post) {
> +	return {
> +	    success => 0,
> +	    http_code => 501,
> +	    'for data too large',
> +	};
> +    }
> +
> +    if (
> +	!$content
> +	|| $content eq 'application/x-www-form-urlencoded'
> +	|| $content eq 'application/json'
> +    ) {
> +	$self->dprint("Received content '$content', reading chunk length of '$h_content_length'");
> +
> +	$reqstate->{hdl}->unshift_read(chunk => $h_content_length, sub {
> +	    my ($hdl, $data) = @_;
> +	    $self->dprint("Read data for content '$content'");
> +	    $self->dprint("Continuing to handle request after reading content");
> +	    $request->content($data);
> +	    $self->handle_request($reqstate, $auth, $method, $path);
> +	});
> +	return { success => 1 };
> +
> +    } else {
> +	return {
> +	    success => 0,
> +	    http_code => 506,
> +	    http_message => "upload 'Content-Type '$h_content_type' not implemented",
> +	};
> +    }
> +}
> +
>  sub accept {
>      my ($self) = @_;
>  
> -- 
> 2.30.2
> 
> 
> 
> _______________________________________________
> pve-devel mailing list
> pve-devel at lists.proxmox.com
> https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
> 
> 
>