PVE child process behavior question

Wed May 21 15:13:01 CEST 2025

Hello,

We had an issue with a customer migrating a VM between nodes using our 
shared storage solution.

On the target host the OOM killer killed the main migration process, but 
the child process (which actually performs the migration) kept on 
working, which we did not expect, and that caused some issues.
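
The effect can be reproduced outside of PVE. What follows is only a minimal stand-alone sketch and not the PVE code path: the function name demo_orphaned_worker and the pipe-based "started"/"done" protocol are made up for illustration, and the only thing borrowed from fork_worker is that the worker calls setsid().

```perl
#!/usr/bin/perl
use strict;
use warnings;
use POSIX qw(setsid);

# Hypothetical demo (not PVE code): fork a "parent" that forks a worker,
# SIGKILL the parent, and observe that the worker still finishes.
sub demo_orphaned_worker {
    pipe(my $rd, my $wr) or die "pipe failed: $!\n";

    my $parent = fork() // die "fork failed: $!\n";
    if ($parent == 0) {
        close($rd);
        my $worker = fork() // POSIX::_exit(1);
        if ($worker == 0) {
            setsid();                     # detach from the parent's session
            syswrite($wr, "started\n");
            sleep(1);                     # stand-in for the long-running task
            syswrite($wr, "done\n");      # still runs after the parent is gone
            POSIX::_exit(0);
        }
        close($wr);
        sleep(30);                        # wait here until we get killed
        POSIX::_exit(0);
    }

    close($wr);
    chomp(my $first = <$rd> // '');       # wait until the worker is running
    kill('KILL', $parent);                # simulate the OOM killer
    waitpid($parent, 0);
    chomp(my $second = <$rd> // 'eof');   # 'eof' would mean the worker died too
    close($rd);
    return [$first, $second];
}

my $result = demo_orphaned_worker();
print join(', ', @$result), "\n";
```

Run as a script, this prints "started, done": the worker is re-parented and completes its work even though nobody is left to report the result.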

This leads us to a broader question: after a request is submitted, the 
parent can be terminated without returning a response to the client 
while the child is still doing the work, so the request can be wrongly 
retried or considered unfinished.

Should the child processes terminate together with the parent to guard 
against this, or is this expected behavior?


Here is an example patch to do this:

diff --git a/src/PVE/RESTEnvironment.pm b/src/PVE/RESTEnvironment.pm
index bfde7e6..744fffc 100644
--- a/src/PVE/RESTEnvironment.pm
+++ b/src/PVE/RESTEnvironment.pm
@@ -13,8 +13,9 @@ use Fcntl qw(:flock);
 use IO::File;
 use IO::Handle;
 use IO::Select;
-use POSIX qw(:sys_wait_h EINTR);
+use POSIX qw(:sys_wait_h EINTR SIGKILL);
 use AnyEvent;
+use Linux::Prctl qw(set_pdeathsig);
 
 use PVE::Exception qw(raise raise_perm_exc);
 use PVE::INotify;
@@ -549,6 +550,9 @@ sub fork_worker {
         POSIX::setsid();
     }
 
+    # The signal that the calling process will get when its parent dies
+    set_pdeathsig(SIGKILL);
+
     POSIX::close ($psync[0]);
     POSIX::close ($ctrlfd[0]) if $sync;
     POSIX::close ($csync[1]);
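
To try the idea outside of PVE, here is a hedged stand-alone sketch of the same mechanism. It is not part of the patch: demo_pdeathsig_worker, the pipe protocol and the getppid() check are illustrative additions. The check is there because, as far as we understand prctl(2), the signal is only delivered if the parent dies after the call, so a parent that is already gone has to be detected by hand. The sketch also assumes that Linux::Prctl provides set_pdeathsig under its fully qualified name, and it skips itself when the module is not installed.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use POSIX qw(SIGKILL);

# Hypothetical demo (not PVE code). Linux::Prctl is a CPAN module and may
# not be installed, so it is loaded at run time and the demo skips itself.
sub demo_pdeathsig_worker {
    return 'skipped' unless eval { require Linux::Prctl; 1 };

    pipe(my $rd, my $wr) or die "pipe failed: $!\n";

    my $parent = fork() // die "fork failed: $!\n";
    if ($parent == 0) {
        close($rd);
        my $expected_ppid = $$;           # pid the worker expects as its parent
        my $worker = fork() // POSIX::_exit(1);
        if ($worker == 0) {
            Linux::Prctl::set_pdeathsig(SIGKILL);
            # Close the race: the parent may have died before the call above,
            # in which case no signal would ever be delivered.
            POSIX::_exit(1) if getppid() != $expected_ppid;
            syswrite($wr, "started\n");
            sleep(5);                     # stand-in for the long-running task
            syswrite($wr, "done\n");      # not expected to be reached
            POSIX::_exit(0);
        }
        close($wr);
        sleep(30);                        # wait here until we get killed
        POSIX::_exit(0);
    }

    close($wr);
    chomp(my $first = <$rd> // '');       # wait until the worker is running
    kill('KILL', $parent);                # simulate the OOM killer
    waitpid($parent, 0);
    my $second = <$rd>;                   # undef (EOF) if the worker was killed
    close($rd);
    return defined($second) ? 'survived' : 'killed';
}

print demo_pdeathsig_worker(), "\n";
```

With the module installed, the worker should be reported as killed shortly after the parent. Note that SIGKILL gives the worker no chance to clean up, so a catchable signal such as SIGTERM might be worth considering instead.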