[pve-devel] applied: Re: [RFC ha-manager] explicitly sync journal when disabling watchdog updates
Thomas Lamprecht
t.lamprecht at proxmox.com
Wed Jun 21 05:55:02 CEST 2017
Just for the record: this got applied to master by Dietmar
On 05/23/2017 02:35 PM, Thomas Lamprecht wrote:
> Without syncing, the journal could lose logs for a short interval (ca.
> 10-60 seconds), but these last seconds are really interesting for
> analyzing the cause of a triggered watchdog.
>
> Also, without this the
>> "client did not stop watchdog - disable watchdog updates"
> message often wasn't flushed to persistent storage, so some users had a
> hard time figuring out why the machine reset.
>
> Use the '--sync' switch of journalctl which - to quote its man page -
> "guarantees that any log messages written before its invocation are
> safely stored on disk at the time it returns."
>
> Use execl to call `journalctl --sync` in a child process. We do not
> bother with error checks or recovery, as we will be reset anyway. This
> is just a best-effort attempt to log the situation more consistently;
> if it fails, we cannot really do anything anyhow.
>
> We call the function at two points:
> a) if we exit with active connections: here the watchdog will be
> triggered soon and we want to ensure that this is logged.
> b) if a client closes the connection without sending the magic close
> byte: here the watchdog would trigger while we hang in epoll at
> the beginning of the loop, so sync the log here as well.
>
> Signed-off-by: Thomas Lamprecht <t.lamprecht at proxmox.com>
> ---
> src/watchdog-mux.c | 20 +++++++++++++++++++-
> 1 file changed, 19 insertions(+), 1 deletion(-)
>
> diff --git a/src/watchdog-mux.c b/src/watchdog-mux.c
> index 7367077..a10187e 100644
> --- a/src/watchdog-mux.c
> +++ b/src/watchdog-mux.c
> @@ -27,6 +27,8 @@
>
> #define WATCHDOG_DEV "/dev/watchdog"
>
> +#define JOURNALCTL_BIN "/bin/journalctl"
> +
> int watchdog_fd = -1;
> int watchdog_timeout = 10;
> int client_watchdog_timeout = 60;
> @@ -98,7 +100,21 @@ watchdog_close(void)
>
> watchdog_fd = -1;
> }
> -
> +
> +static void
> +sync_journal_unsafe(void)
> +{
> +
> + pid_t child = fork();
> +
> + // do not care about fork error or collecting the child's exit status,
> + // we are resetting soon anyway and just want to sync out the journal
> + if (child == 0) {
> + execl(JOURNALCTL_BIN, JOURNALCTL_BIN, "--sync", NULL);
> + exit(-1);
> + }
> +}
> +
> int
> main(void)
> {
> @@ -327,6 +343,7 @@ main(void)
>
> if (!wd_client->magic_close) {
> fprintf(stderr, "client did not stop watchdog - disable watchdog updates\n");
> + sync_journal_unsafe();
> update_watchdog = 0;
> } else {
> free_client(wd_client);
> @@ -346,6 +363,7 @@ main(void)
> int active_count = active_client_count();
> if (active_count > 0) {
> fprintf(stderr, "exit watchdog-mux with active connections\n");
> + sync_journal_unsafe();
> } else {
> fprintf(stderr, "clean exit\n");
> watchdog_close();
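
For readers who want to try the fork-and-exec pattern from the patch outside of watchdog-mux, here is a minimal standalone sketch. It is not the applied code: the helper names `spawn_with_arg` and `wait_exit_status` are hypothetical, the child calls `_exit(127)` instead of `exit(-1)` so that stdio buffers inherited from the parent are not flushed a second time if the exec fails, and the optional reaping helper exists only so the result can be inspected. The patch itself deliberately never waits for the child, since the machine is about to be reset.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper, not part of watchdog-mux: fork a child that
 * replaces itself with `bin`, passing one argument.
 * Returns the child's pid in the parent, or -1 if fork() failed. */
static pid_t
spawn_with_arg(const char *bin, const char *arg)
{
    pid_t child = fork();

    if (child == 0) {
        /* Only reached in the child. execl() returns only on failure. */
        execl(bin, bin, arg, (char *)NULL);
        /* _exit() skips atexit handlers and stdio flushing, which would
         * otherwise duplicate output buffered by the parent. */
        _exit(127);
    }

    return child;
}

/* Hypothetical helper: reap the child and return its exit status,
 * or -1 if it could not be waited for or did not exit normally.
 * The original patch skips this step on purpose. */
static int
wait_exit_status(pid_t child)
{
    int status;

    if (child < 0 || waitpid(child, &status, 0) < 0)
        return -1;

    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Calling `spawn_with_arg("/bin/journalctl", "--sync")` and ignoring the return value corresponds to the fire-and-forget behaviour of `sync_journal_unsafe()` in the patch; the binary path is the one the patch defines as `JOURNALCTL_BIN`.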