[pve-devel] [PATCH qemu-server v2 1/3] qmeventd: rework 'forced_cleanup' handling and set timeout to 60s

Wolfgang Bumiller w.bumiller at proxmox.com
Fri Sep 23 10:16:06 CEST 2022


On Thu, Sep 22, 2022 at 04:19:33PM +0200, Dominik Csapak wrote:
> currently, the 'forced_cleanup' (sending SIGKILL to the qemu process),
> is intended to be triggered 5 seconds after sending the initial shutdown
> signal (SIGTERM) which is sadly not enough for some setups.
> 
> Accidentally, it could be triggered earlier than 5 seconds, if a
> SIGALRM triggers in the timespan directly before setting it again.
> 
> Also, this approach means that depending on when machines are shutdown
> their forced cleanup may happen after 5 seconds, or any time after, if
> new vms are shut off in the meantime.
> 
> Improve this situation by reworking the way we deal with this cleanup.
> We save the time incl. timeout in the CleanupData, and set a timeout
> to 'epoll_wait' of 10 seconds, which will then trigger a forced_cleanup.
> Remove entries from the forced_cleanup list when that entry is killed,
> or when the normal cleanup took place.
> 
> To improve the shutdown behaviour, increase the default timeout to 60
> seconds, which should be enough, but add a commandline toggle where
> users can set it to a different value.
> 
> Signed-off-by: Dominik Csapak <d.csapak at proxmox.com>
> ---
>  qmeventd/qmeventd.c | 73 +++++++++++++++++++++++----------------------
>  qmeventd/qmeventd.h |  2 ++
>  2 files changed, 39 insertions(+), 36 deletions(-)
> 
> diff --git a/qmeventd/qmeventd.c b/qmeventd/qmeventd.c
> index 8d32827..46bc7eb 100644
> --- a/qmeventd/qmeventd.c
> +++ b/qmeventd/qmeventd.c
> @@ -551,27 +558,16 @@ handle_client(struct Client *client)
>      json_tokener_free(tok);
>  }
>  
> -
> -/*
> - * SIGALRM and cleanup handling
> - *
> - * terminate_client will set an alarm for 5 seconds and add its client's PID to
> - * the forced_cleanups list - when the timer expires, we iterate the list and
> - * attempt to issue SIGKILL to all processes which haven't yet stopped.
> - */
> -
> -static void
> -alarm_handler(__attribute__((unused)) int signum)
> -{
> -    alarm_triggered = 1;
> -}
> -
>  static void
> -sigkill(void *ptr, __attribute__((unused)) void *unused)

If you change the style here... (which I'm not a fan of btw.)

> +sigkill(struct CleanupData *ptr, time_t *cur_time)
>  {
> -    struct CleanupData data = *((struct CleanupData *)ptr);
> +    struct CleanupData data = *ptr;

...at least get rid of this line completely ^
and just use `ptr->` instea of `data.`, I see no reason to keep copying
the data onto the stack?
(or with the old style, make `data` a pointer and skip the cast)

>      int err;
>  
> +    if (data.timeout > *cur_time) {
> +	return;
> +    }
> +
>      if (data.pidfd > 0) {
>  	err = pidfd_send_signal(data.pidfd, SIGKILL, NULL, 0);
>  	(void)close(data.pidfd);





More information about the pve-devel mailing list