[pbs-devel] [PATCH proxmox master v1] fix #6738: rest-server: improve TLS handshake error logging
Hannes Laimer
h.laimer at proxmox.com
Fri Nov 7 15:24:42 CET 2025
comment + small nit inline
On 9/10/25 17:40, Max R. Carrara wrote:
> Right now, when a client connects to the server, we continuously peek
> into the TCP stream's buffer until we can perform the TLS handshake
> check. If the client closes the connection after sending < 5 bytes, we
> exit the peek loop. If the client instead keeps the connection open,
> the future times out if the handshake check isn't made.
>
> Instead of exiting with `ErrorKind::ConnectionAborted`, exit with
> `ErrorKind::Other` instead, as that variant isn't ever returned by
> stdlib functions. That way we can be sure we don't accidentally ignore
> an error.
>
I don't think this is accurate, stdlib maps things it doesn't have an
ErrorKind for to ::Other [1]
Could we maybe have something like
```
#[derive(Debug, Error)]
#[error("client stopped sending data while waiting for TLS handshake
bytes")]
struct HandshakeNoProgress;
...
if peek_len == previous_peek_len {
Err(HandshakeNoProgress.into())
} else {
...
if err.downcast_ref::<HandshakeNoProgress>().is_some() {
return;
}
log::error!("[{peer}] TLS handshake error: {err:?}");
```
note: did not test this. Like that we could avoid unnecessary logging,
but don't lose out on potentially actual errors.
[1]
https://github.com/rust-lang/rust/blob/96064126a086a8428d66e07fb3b91421bb86a512/library/std/src/sys/pal/unix/mod.rs#L235
> Reduce the amount of log noise by ignoring our `ErrorKind::Other`
> errors, as those are usually only returned when some monitoring
> solution opens and closes a TCP connection for its healthchecks [0].
>
> Log timeouts without their error chain (like before), as
> there isn't any additional context for them.
>
> Because any other kinds of errors should be sufficiently rare, log
> them with their error chain. The error chain is formatted to stay in
> one line, as the default formatting looks quite awkward otherwise.
>
> Make the context added to the `async_io()` coroutine call more
> general and informative, as not only the stream buffer peeking can
> fail. Shorten the general log message as well.
>
> [0]: https://forum.proxmox.com/threads/subject-intermittent-backup-errors-and-repeated-tls-handshake-failures.170089/post-795100
>
> Fixes: #6738
> Signed-off-by: Max R. Carrara <m.carrara at proxmox.com>
> ---
>
> Tested this interactively in my Python console by opening a raw TCP
> stream using the `socket` module.
>
> proxmox-rest-server/src/connection.rs | 29 ++++++++++++++++++++++++---
> 1 file changed, 26 insertions(+), 3 deletions(-)
>
> diff --git a/proxmox-rest-server/src/connection.rs b/proxmox-rest-server/src/connection.rs
> index 9511b7cb..81c0e05e 100644
> --- a/proxmox-rest-server/src/connection.rs
> +++ b/proxmox-rest-server/src/connection.rs
> @@ -464,7 +464,28 @@ impl AcceptBuilder {
> }
> }
> Err(err) => {
> - log::error!("[{peer}] failed to check for TLS handshake: {err}");
> + // Ignore io::Errors of kind "Other", as that means the client didn't send any
> + // additional data between stream buffer peeks in our case
> + if let Some(io::ErrorKind::Other) =
> + err.downcast_ref::<io::Error>().map(io::Error::kind)
> + {
> + return;
> + }
> +
> + // Timeouts are printed without their cause
> + if err.downcast_ref::<tokio::time::error::Elapsed>().is_some() {
> + log::error!("[{peer}] TLS handshake error: {err}");
> + return;
> + }
> +
> + // As other kinds of errors are relatively rare, print them with their error chain
> + let mut err_chain = err.chain().map(ToString::to_string);
> + let err_text = err_chain.by_ref().take(1).collect::<Vec<_>>().join("");
> + let cause = err_chain.collect::<Vec<_>>().join(" - ");
> +
> + log::error!(
> + "[{peer}] TLS handshake error: {err_text} - Caused by: {cause}"
`cause` could be empty, no? we should probably skip the ' - Caused by: '
in that case
> + );
> }
> }
> }
> @@ -505,7 +526,7 @@ impl AcceptBuilder {
> // if we detect the same peek len again but still got a readable stream,
> // the connection was probably closed, so abort here
> if peek_len == previous_peek_len {
> - Err(io::ErrorKind::ConnectionAborted.into())
> + Err(io::ErrorKind::Other.into())
> } else {
> previous_peek_len = peek_len;
> Err(io::ErrorKind::WouldBlock.into())
> @@ -516,7 +537,9 @@ impl AcceptBuilder {
> }
> })
> .await
> - .context("couldn't peek into incoming TCP stream")
> + .context(
> + "failed to determine whether the peer is initiating a TLS handshake or not",
> + )
> };
>
> tokio::time::timeout(timeout, future)
More information about the pbs-devel
mailing list