[pve-devel] [RFC container] fix: shutdown: if lxc-stop fails, wait for socket closing with timeout

Friedrich Weber f.weber at proxmox.com
Thu Jan 19 13:39:02 CET 2023


When trying to shut down a hung container with `forceStop=0` (e.g. via
the Web UI), the shutdown task may run indefinitely while holding a
lock on the container config. The reason is that the shutdown
subroutine waits for the LXC command socket to close, even if the
`lxc-stop` command has failed due to a timeout. This prevents other
tasks (such as a stop task) from acquiring the lock. In order to stop
the container, the shutdown task has to be explicitly killed first,
which is inconvenient. This occurs e.g. when trying to shut down a hung
CentOS 7 container (with systemd <v232) in a cgroupv2 environment.

This fix imposes a timeout on the socket read operation if the
`lxc-stop` command has failed. Behavior in case `lxc-stop` succeeds is
unchanged. This reintroduces some code from b1bad293. The timeout
duration is the given shutdown timeout, so in the scenario above the
task can take up to twice the shutdown timeout in total: once for
`lxc-stop` and once for the socket read.

Signed-off-by: Friedrich Weber <f.weber at proxmox.com>
---

I stumbled upon the hanging CentOS 7 container shutdown task while
looking into #4474. However, it is quite the edge case and only
slightly inconvenient, so I'm not sure whether it needs to be
addressed -- and if it needs to be addressed, I'm not sure whether the
attached fix is the way to go. :) So I'm submitting it as an RFC. Let
me know what you think.
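
For illustration, the timed read can be sketched outside of PVE roughly
as follows. This is a minimal sketch under assumptions, not the patch
itself: `with_timeout` is a hypothetical stand-in for
`PVE::Tools::run_with_timeout` built on core `alarm`, and the socket
pair only simulates the LXC command socket.

```perl
use strict;
use warnings;
use Socket;

# Hypothetical stand-in for PVE::Tools::run_with_timeout (assumption):
# run $code, but die with "got timeout\n" after $timeout seconds.
sub with_timeout {
    my ($timeout, $code) = @_;
    my $res = eval {
        local $SIG{ALRM} = sub { die "got timeout\n" };
        alarm($timeout);
        my $r = $code->();
        alarm(0);
        $r;
    };
    my $err = $@;
    alarm(0); # make sure no alarm is left pending
    die $err if $err;
    return $res;
}

# Simulated command socket: $ct is our end, $monitor is the peer.
socketpair(my $ct, my $monitor, AF_UNIX, SOCK_STREAM, 0)
    or die "socketpair failed: $!\n";

# Case 1: the peer stays open and silent, so a plain read would block
# forever. With the timeout, the read is aborted and the result keeps
# its initial value of 1.
my $hung_result = 1;
eval { with_timeout(1, sub { $hung_result = <$ct>; }) };
my $timed_out = ($@ eq "got timeout\n") ? 1 : 0;

# Case 2: the peer closes the socket, so the read returns undef (EOF).
close($monitor);
my $closed_result = 1;
with_timeout(1, sub { $closed_result = <$ct>; });

print "hung peer: timed out = $timed_out\n";
print "closed peer: result is ",
    (defined($closed_result) ? "defined" : "undef"), "\n";
```

Initializing the result to 1 before the read means that an aborted read
is treated the same as a monitor that is still alive, i.e. it ends in
"container did not stop".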

 src/PVE/LXC.pm | 16 +++++++++++++---
 1 file changed, 13 insertions(+), 3 deletions(-)

diff --git a/src/PVE/LXC.pm b/src/PVE/LXC.pm
index ce6d5a5..9b3cd64 100644
--- a/src/PVE/LXC.pm
+++ b/src/PVE/LXC.pm
@@ -2473,11 +2473,21 @@ sub vm_stop {
     }
 
     eval { run_command($cmd, timeout => $shutdown_timeout) };
+
+    my $result = 1;
+    my $wait = sub { $result = <$sock>; };
+
+    # Wait until the command socket is closed.
+    # In case the lxc-stop call failed, reading from the command socket may block forever,
+    # so read with another timeout to avoid freezing the shutdown task.
     if (my $err = $@) {
-	warn $@ if $@;
-    }
+	warn $err if $err;
 
-    my $result = <$sock>;
+	eval { PVE::Tools::run_with_timeout($shutdown_timeout, $wait); };
+	warn "read from command socket failed: $@" if $@;
+    } else {
+	$wait->();
+    }
 
     return if !defined $result; # monitor is gone and the ct has stopped.
     die "container did not stop\n";
-- 
2.30.2





