[pbs-devel] [PATCH proxmox v2] daemon: clean up middle process of double fork
Wolfgang Bumiller
w.bumiller at proxmox.com
Fri Feb 7 15:35:59 CET 2025
On Tue, Dec 03, 2024 at 12:23:49PM +0100, Dominik Csapak wrote:
> so we don't leave around a zombie process when the old daemon still
> needs to run, because of e.g. a running task.
>
> Since this is mostly a cosmetic issue though, only try a clean up with a
> 10 second timeout, so we don't block forever. (It could happen that it
> didn't exit at that point, but it's very unlikely.)
>
> In case we do run into the timeout here, the process will not be
> collected until the parent process exits and the middle process is
> collected by pid 1.
>
> Signed-off-by: Dominik Csapak <d.csapak at proxmox.com>
> ---
> changes from v1:
> * use a timeout
> * log the error
>
> proxmox-daemon/src/server.rs | 33 +++++++++++++++++++++++++++++++--
> 1 file changed, 31 insertions(+), 2 deletions(-)
>
> diff --git a/proxmox-daemon/src/server.rs b/proxmox-daemon/src/server.rs
> index efea9078..27153fc2 100644
> --- a/proxmox-daemon/src/server.rs
> +++ b/proxmox-daemon/src/server.rs
> @@ -8,6 +8,7 @@ use std::os::unix::io::{AsRawFd, FromRawFd, IntoRawFd, OwnedFd, RawFd};
> use std::panic::UnwindSafe;
> use std::path::PathBuf;
> use std::pin::{pin, Pin};
> +use std::time::Duration;
>
> use anyhow::{bail, format_err, Error};
> use futures::future::{self, Either};
> @@ -15,6 +16,7 @@ use nix::unistd::{fork, ForkResult};
>
> use proxmox_sys::fd::fd_change_cloexec;
> use proxmox_sys::fs::CreateOptions;
> +use proxmox_sys::linux::timer;
>
> type BoxedStoreFunc = Box<dyn FnOnce() -> Result<String, Error> + UnwindSafe + Send>;
>
> @@ -165,10 +167,12 @@ impl Reloader {
The error case above here does a `log::error!()` - that is literally the
only thing which could possibly hang.
We could have a 2nd socketpair for stringified errors (or turn the
existing one into a datagram based socket with a type prefix for
messages) and just use a blocking wait().
But if we really want to use a timeout, I'd rather add a pidfd wrapper,
use `pidfd_open()` (it's our fork, so since we don't `wait()` for it, we
should get the correct process, and we could - in the future - replace
`fork()` with an equivalent `clone()` returning a pidfd).
The pidfd can be used with poll/epoll (so it would even work for async
code via `AsyncFd`).
This is also a case where we could consider switching to a vfork() type
of fork. That might even play nice with whatever logger we might be
using, making it even less likely for anything to block...
> // No matter how we managed to get here, this is the time where we bail out quickly:
> unsafe { libc::_exit(-1) }
> }
> - Ok(ForkResult::Parent { child }) => {
> + Ok(ForkResult::Parent {
> + child: middle_child,
> + }) => {
> log::debug!(
> "forked off a new server (first pid: {}), waiting for 2nd pid",
> - child
> + middle_child
> );
> std::mem::drop(pnew);
> let mut pold = std::fs::File::from(pold);
> @@ -211,6 +215,10 @@ impl Reloader {
> log::error!("child vanished during reload: {}", e);
> }
>
> + if let Err(e) = waitpid_with_timeout(middle_child, Duration::from_secs(10)) {
> + log::error!("waitpid for middle process failed: {e}");
> + }
> +
> Ok(())
> }
> Err(e) => {
> @@ -230,6 +238,27 @@ impl Reloader {
> }
> }
>
> +fn waitpid_with_timeout(pid: nix::unistd::Pid, timeout: std::time::Duration) -> Result<(), Error> {
> + // unblock the timeout signal temporarily
> + let _sigblock_guard = timer::unblock_timeout_signal();
> +
> + // setup a timeout timer
> + let mut timer = timer::Timer::create(
> + timer::Clock::Realtime,
> + timer::TimerEvent::ThisThreadSignal(timer::SIGTIMEOUT),
> + )?;
> +
> + timer.arm(
> + timer::TimerSpec::new()
> + .value(Some(timeout))
> + .interval(Some(Duration::from_millis(10))),
> + )?;
> +
> + nix::sys::wait::waitpid(pid, None)?;
> +
> + Ok(())
> +}
> +
> fn fd_store_func(fd: RawFd) -> Result<BoxedStoreFunc, Error> {
> let fd = unsafe {
> OwnedFd::from_raw_fd(nix::fcntl::fcntl(
> --
> 2.39.5
More information about the pbs-devel
mailing list