[pve-devel] [PATCH ha-manager 3/3] watchdog: sync journal after sending expiration related messages
Thomas Lamprecht
t.lamprecht at proxmox.com
Tue Jun 17 08:21:56 CEST 2025
On 19.05.25 at 15:09, Maximiliano Sandoval wrote:
> One sync comes after warning that the watchdog is about to expire, and a
> second right after the watchdog expires.
>
> This maximizes the chances that the log will contain entries relevant
> to a fence event, which would be extremely useful for detecting whether
> a node was fenced.
>
> Signed-off-by: Maximiliano Sandoval <m.sandoval at proxmox.com>
> ---
> src/watchdog-mux.c | 2 ++
> 1 file changed, 2 insertions(+)
>
> diff --git a/src/watchdog-mux.c b/src/watchdog-mux.c
> index e14c768..8669b10 100644
> --- a/src/watchdog-mux.c
> +++ b/src/watchdog-mux.c
> @@ -268,11 +268,13 @@ main(void)
> ) {
> client_list[i].warning_state = WARNING_ISSUED;
> fprintf(stderr, "client watchdog is about to expire\n");
> + sync_journal_unsafe();
The "unsafe" in the name is there for a reason: on a loaded machine the
warning branch above might trigger a few times, and each call would
leave a zombie process behind, as the forked child is never reaped.
The simplest fix might be a double fork there, so that the child's
parent process no longer exists; systemd then collects the child
process exit status. That would not be the most efficient solution,
though.
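To illustrate the double-fork idea, here is a minimal sketch, not a
tested patch. The names spawn_detached() and sync_journal_detached() are
hypothetical, and the journalctl path is an assumption; the real code
may use a different constant. The intermediate child exits immediately
and is reaped by the caller, so the grandchild running
`journalctl --sync` gets reparented and never becomes our zombie.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Assumed path, for illustration only. */
#define JOURNALCTL_BIN "/bin/journalctl"

/*
 * Hypothetical helper: run `path` with `argv` in a grandchild process.
 * The intermediate child exits right after the second fork(), so the
 * grandchild is reparented (to systemd as PID 1, or to a subreaper) and
 * its exit status is collected there instead of by the caller.
 * Returns 0 once the intermediate child was reaped, -1 on error.
 */
int spawn_detached(const char *path, char *const argv[])
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        /* Intermediate child: fork once more, then exit immediately. */
        pid_t grandchild = fork();
        if (grandchild == 0) {
            execv(path, argv);
            _exit(127); /* only reached if execv() failed */
        }
        _exit(grandchild < 0 ? 1 : 0);
    }

    /* Caller: reap the short-lived intermediate child, retry on EINTR. */
    while (waitpid(pid, NULL, 0) < 0) {
        if (errno != EINTR)
            return -1;
    }
    return 0;
}

/* Hypothetical stand-in for the call in the warning branch. */
int sync_journal_detached(void)
{
    char *const argv[] = { JOURNALCTL_BIN, "--sync", NULL };
    return spawn_detached(JOURNALCTL_BIN, argv);
}
```

The cost is a second fork() plus a short blocking waitpid() in the main
loop for every warning, which is why this is the simple option rather
than the efficient one.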
> }
>
> if ((ctime - client_list[i].time) > client_watchdog_timeout) {
> update_watchdog = 0;
> fprintf(stderr, "client watchdog expired - disable watchdog updates\n");
> + sync_journal_unsafe();
This is basically useless compared to the status quo: such a call
already happens a few (compiled) instructions after this branch is hit,
since we break out of the main loop then.
> }
> }
> }