[pve-devel] [PATCH qemu-server 1/3] qmeventd: rework 'forced_cleanup' handling and set timeout to 60s

Thu Sep 22 14:46:08 CEST 2022

On Thu, Sep 22, 2022 at 02:22:45PM +0200, Dominik Csapak wrote:
> On 9/22/22 14:01, Wolfgang Bumiller wrote:
> > On Thu, Sep 22, 2022 at 01:31:49PM +0200, Dominik Csapak wrote:
> > > [snip]
> > > > > -/*
> > > > > - * SIGALRM and cleanup handling
> > > > > - *
> > > > > - * terminate_client will set an alarm for 5 seconds and add its client's PID to
> > > > > - * the forced_cleanups list - when the timer expires, we iterate the list and
> > > > > - * attempt to issue SIGKILL to all processes which haven't yet stopped.
> > > > > - */
> > > > > -
> > > > > -static void
> > > > > -alarm_handler(__attribute__((unused)) int signum)
> > > > > -{
> > > > > -    alarm_triggered = 1;
> > > > > -}
> > > > > -
> > > > 
> > > > wasn't this intentionally decoupled like this?
> > > > 
> > > > alarm_handler just sets the flag
> > > > actual force cleanup is conditionalized on the alarm having triggered,
> > > > but the cleanup happens outside of the signal handler..
> > > > 
> > > > is there a reason from switching away from these scheme? we don't need
> > > > to do the cleanup in the signal handler (timing is already plenty fuzzy
> > > > anyway ;))
> > > 
> > > no real reason, i found the code somewhat cleaner, but you're right,
> > > we probably want to keep that, and just trigger it regularly
> > 
> >  From what I can tell the only point of this signal is to interrupt
> > `epoll()` after a while to call the cleanup/kill handler since we only
> > have a single worker here that needs to do some work after a timeout.
> > 
> > Why not either:
> >    - set a bool instead of calling `alarm()` which causes the next
> >      `epoll()` call to use a timeout and call the cleanups if epoll turns
> >      up empty >    - or create a timerfd (timerfd_create(2)) in the beginning which we
> >      add to the epoll context and use `timerfd_settime(2)` in place of
> >      `alarm()`, which will also wake up the epoll call without having to add
> >      timeouts to it
> > 
> > `alarm()` is just such a gross interface...
> > In theory we'd also be able to ditch all of those `EINTR` loops as we
> > wouldn't be expecting any interrupts anymore... (and if we did expect
> > them, we could add a `signalfd(2)` to `epoll()` as well ;-)
> 
> first one sounds much simpler but the second one sounds much more elegant ;)
> i'll see what works/feels better
> 
> couldn't we also directly add a new timerfd for each client that
> needs such a timeout instead of managing some list ?

I'm not really a fan of abusing kernel-side resources for a user-space
scheduling problem ;-).

If you want a per-client timeout, I'd much rather just have a list of
points in time that epoll() should target.

That list may well be `struct Client` itself if you want to merge the
data as you described below. The `struct Client` could just receive a
`STAILQ_ENTRY(Client) cleanup_entries;` member (queue(7), stailq(3)),
and a cleanup time which is used to generate the timeout for `epoll()`,
and on every wakeup, we can march through the already expired queue
entries.

> the cleanupdata could go into the even.data.ptr and we wouldn't
> have to do anything periodically, just handle the timeout
> when epoll wakes up?
> 
> we probably would have to merge the client and clenaupdata structs
> so that we can see which is which, but that should not be
> that of a problem?