[pve-devel] [PATCH common] allow longer timeout for cancelling 'vzdump' jobs
Stefan Reiter
s.reiter at proxmox.com
Thu Jan 14 16:39:21 CET 2021
On slow network storages, aborting a backup job (which may wait for
buffers to flush) can take longer than 5 seconds. The task is then
killed with SIGKILL before it can remove the backup lock. This patch
attempts to fix that by allowing a longer timeout for 'vzdump' tasks.

Make the implementation future-proof by using a map from task type to a
timeout value in seconds. 'vzdump' gets 60 seconds; the default stays at
5, so tasks other than 'vzdump' are not affected.
Signed-off-by: Stefan Reiter <s.reiter at proxmox.com>
---
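Note, not part of the commit: a minimal standalone sketch of the lookup
described above, assuming only what the diff shows. The helper name
grace_period and the task type 'qmstart' are hypothetical and used purely
for illustration; the patch itself passes the map value straight to
$kill_process_group and applies the default there.

```perl
use strict;
use warnings;

# Same shape as the map added by the patch: task type => seconds to wait
# after SIGTERM before the process group is killed.
my $timeout_map = {
    vzdump => 60,
};

# Hypothetical helper (not in the patch): resolve the grace period for a
# task type, falling back to the previous fixed value of 5 seconds.
sub grace_period {
    my ($type) = @_;
    my $timeout = $timeout_map->{$type};
    $timeout //= 5;    # types without a map entry keep the old behaviour
    return $timeout;
}

printf "%s => %d\n", $_, grace_period($_) for qw(vzdump qmstart);
```

Adding another slow task type would then only need one more entry in the
map, with no change to the kill loop.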
src/PVE/RESTEnvironment.pm | 16 ++++++++++++----
1 file changed, 12 insertions(+), 4 deletions(-)
diff --git a/src/PVE/RESTEnvironment.pm b/src/PVE/RESTEnvironment.pm
index d5b84d0..8a0cb9a 100644
--- a/src/PVE/RESTEnvironment.pm
+++ b/src/PVE/RESTEnvironment.pm
@@ -365,8 +365,16 @@ sub active_workers {
     return $res;
 }
 
+my $timeout_map = {
+    # backup cancellation on slow target storages might take a while, avoid
+    # leaving the VM in locked state
+    "vzdump" => 60,
+};
+
 my $kill_process_group = sub {
-    my ($pid, $pstart) = @_;
+    my ($pid, $pstart, $timeout) = @_;
+
+    $timeout //= 5;
 
     # send kill to process group (negative pid)
     my $kpid = -$pid;
@@ -374,8 +382,7 @@ my $kill_process_group = sub {
     # always send signal to all pgrp members
     kill(15, $kpid); # send TERM signal
 
-    # give max 5 seconds to shut down
-    for (my $i = 0; $i < 5; $i++) {
+    for (my $i = 0; $i < $timeout; $i++) {
         return if !PVE::ProcFSTools::check_process_running($pid, $pstart);
         sleep (1);
     }
@@ -394,7 +401,8 @@ sub check_worker {
     return 0 if !$running;
 
     if ($killit) {
-        &$kill_process_group($task->{pid});
+        my $type = $task->{type};
+        &$kill_process_group($task->{pid}, undef, $timeout_map->{$type});
         return 0;
     }
 
--
2.20.1