[pve-devel] [RFC container] Improve feedback for startup

Thu Aug 27 10:44:04 CEST 2020

On Thu, Aug 20, 2020 at 11:36:39AM +0200, Thomas Lamprecht wrote:
> On 19.08.20 12:30, Fabian Ebner wrote:
> > Since it was necessary to switch to 'Type=Simple' in the systemd
> > service (see 545d6f0a13ac2bf3a8d3f224c19c0e0def12116d ),
> > 'systemctl start pve-container at ID' would not wait for the 'lxc-start'
> > command anymore. Thus every container start was reported as a success
> > and the 'post-start' hook would trigger immediately after the
> > 'systemctl start' command.
> > 
> > Use 'lxc-monitor' to get the necessary information and detect
> > startup failure and only run the 'post-start' hookscript after
> > the container is effectively running. If something goes wrong
> > with the monitor, fall back to the old behavior.
> > 
> > Signed-off-by: Fabian Ebner <f.ebner at proxmox.com>
> > ---
> >  src/PVE/LXC.pm | 36 +++++++++++++++++++++++++++++++++++-
> >  1 file changed, 35 insertions(+), 1 deletion(-)
> > 
> 
> appreciate the effort!
> We could also directly connect to /run/lxc/var/lib/lxc/monitor-fifo (or the abstract
> unix socket, but not much gained/difference here) of the lxc-monitord which publishes
> all state changes and unpack the new state [0] directly.
> 
> [0] https://github.com/lxc/lxc/blob/8bdacc22a48f9c09902a1d2febd71439cb38c082/src/lxc/state.h#L10
> 
> @Wolfgang, what do you think?

Just tested adding a state client to our Command.pm directly, seems to
work, so we would depend neither on lxc-monitor nor lxc-monitord.

Example & code follow below. The only issue with it is that we'd need to
retry connecting to the command socket a few times since we don't know
when it becomes available, but that shouldn't be too bad IMO.

But first some inline comments to the original patch.

> > diff --git a/src/PVE/LXC.pm b/src/PVE/LXC.pm
> > index db5b8ca..35dc54c 100644
> > --- a/src/PVE/LXC.pm
> > +++ b/src/PVE/LXC.pm
> > @@ -2191,10 +2191,44 @@ sub vm_start {
> >  
> >      PVE::Storage::activate_volumes($storage_cfg, $vollist);
> >  
> > +    my $monitor_pid = open(my $monitor_fh, '-|', "/usr/bin/lxc-monitor -n $vmid")
> > +	or warn "could not open pipe to lxc-monitor\n";
> > +
> >      my $cmd = ['systemctl', 'start', "pve-container\@$vmid"];
> >  
> >      PVE::GuestHelpers::exec_hookscript($conf, $vmid, 'pre-start', 1);
> > -    eval { PVE::Tools::run_command($cmd); };
> > +    eval {
> > +	PVE::Tools::run_command($cmd);
> > +
> > +	my $success;
> > +	if ($monitor_pid) {
> > +	    eval {
> > +		local $SIG{ALRM} = sub { die "got timeout\n" };
> > +		alarm(10); # 'STARTING' should appear quickly

We have a `run_with_timeout` helper in Tools which is more robust than
this. There can be all sorts of interactions between alarms, `eval` and
the rest of the code. For one, this overrides a previous alarm without
restoring it.

> > +
> > +		while (my $line = <$monitor_fh>) {
> > +		    if ($line =~ m/^'$vmid' changed state to \[([A-Z]*)\]$/) {
> > +			my $status = $1;
> > +			alarm(0);
> > +			$success = 1 if $status eq 'RUNNING';
> > +			$success = 0 if $status eq 'ABORTING'
> > +				     || $status eq 'STOPPING'
> > +				     || $status eq 'STOPPED';
> > +			if (defined($success)) {
> > +			    kill('KILL', $monitor_pid);
> > +			    waitpid($monitor_pid, 0);
> > +			}
> > +		    } else {
> > +			die "unexpected output from lxc-monitor: $line\n";
> > +		    }
> > +		}
> > +	    };
> > +	    warn "Problem with lxc-monitor: $@" if $@;

Secondly, if we reach this part successfully, the alarm can still
trigger *HERE* which is already outside the `eval`, where we'd throw
back the `got timeout` message instead of the `lxc-start failed ...`
message, which could get potentially confusing. We do have a 2nd eval
around in this code, but that's a little "far". Eg. the 'unexpected
output from lxc-monitor' error might get swallowed if it triggers and
the alarm goes off before the above `warn` line, replacing the current
contents of '$@'... interrupts are pure pain.

> > +	    alarm(0);
> > +	}
> > +	die "'lxc-start' failed for container '$vmid'\n"
> > +	    if defined($success) && !$success;
> > +    };
> >      if (my $err = $@) {
> >  	unlink $skiplock_flag_fn;
> >  	die $err;
> > 

Usage example:

    use PVE::LXC::Command;

    my $sock = PVE::LXC::Command::get_state_client(404);
    die "not running\n" if !defined($sock);

    while (1) {
        my ($type, $name, $value) = PVE::LXC::Command::read_lxc_message($sock);
        last if !defined($type);
        print("$name: $type => $value\n");
    }

Patch for Command.pm:

---8<---
>From 6ac578ef889a3a9c8aefc4f05215b4ec66049546 Mon Sep 17 00:00:00 2001
From: Wolfgang Bumiller <w.bumiller at proxmox.com>
Date: Thu, 27 Aug 2020 10:31:06 +0200
Subject: [PATCH container] command: add state client functions

Signed-off-by: Wolfgang Bumiller <w.bumiller at proxmox.com>
---
 src/PVE/LXC/Command.pm | 91 ++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 91 insertions(+)

diff --git a/src/PVE/LXC/Command.pm b/src/PVE/LXC/Command.pm
index beed890..6df767d 100644
--- a/src/PVE/LXC/Command.pm
+++ b/src/PVE/LXC/Command.pm
@@ -11,20 +11,36 @@ use warnings;
 
 use IO::Socket::UNIX;
 use Socket qw(SOCK_STREAM SOL_SOCKET SO_PASSCRED);
+use POSIX qw(NAME_MAX);
 
 use base 'Exporter';
 
 use constant {
+    LXC_CMD_GET_STATE => 3,
     LXC_CMD_GET_CGROUP => 6,
+    LXC_CMD_ADD_STATE_CLIENT => 10,
     LXC_CMD_FREEZE => 15,
     LXC_CMD_UNFREEZE => 16,
     LXC_CMD_GET_LIMITING_CGROUP => 19,
 };
 
+use constant {
+    STATE_STOPPED => 0,
+    STATE_STARTING => 1,
+    STATE_RUNNING => 2,
+    STATE_STOPPING => 3,
+    STATE_ABORTING => 4,
+    STATE_FREEZING => 5,
+    STATE_FROZEN => 6,
+    STATE_THAWED => 7,
+    MAX_STATE => 8,
+};
+
 our @EXPORT_OK = qw(
     raw_command_transaction
     simple_command
     get_cgroup_path
+    get_state_client
 );
 
 # Get the command socket for a container.
@@ -81,6 +97,33 @@ my sub _unpack_lxc_cmd_rsp($) {
     return ($ret, $len);
 }
 
+my $LXC_MSG_SIZE = length(pack('I! Z'.(NAME_MAX+1).' x![I] I', 0, "", 0));
+# Unpack an lxc_msg struct.
+my sub _unpack_lxc_msg($) {
+    my ($packet) = @_;
+
+    # struct lxc_msg {
+    #     lxc_msg_type_t type;
+    #     char name[NAME_MAX+1];
+    #     int value;
+    # };
+
+    my ($type, $name, $value) = unpack('I!Z'.(NAME_MAX+1).'I!', $packet);
+
+    if ($type == 0) {
+	$type = 'STATE';
+    } elsif ($type == 1) {
+	$type = 'PRIORITY';
+    } elsif ($type == 2) {
+	$type = 'EXITCODE';
+    } else {
+	warn "unsupported lxc message type $type received\n";
+	$type = undef;
+    }
+
+    return ($type, $name, $value);
+}
+
 # Send a complete packet:
 my sub _do_send($$) {
     my ($sock, $data) = @_;
@@ -206,4 +249,52 @@ sub unfreeze($$) {
     return $res;
 }
 
+# Add this command socket as a state client.
+#
+# Currently all states are observed.
+#
+# Returns undef if the container is not running, dies on errors.
+sub get_state_client($) {
+    my ($vmid) = @_;
+
+    my $socket = _get_command_socket($vmid)
+	or return undef;
+
+    # For now we want all states (except 'reboots', since we would never see those, reboots would
+    # use a value of '2' for STATE_RUNNING)
+    my $states = pack('I!', 2) x MAX_STATE;
+
+    my ($res, undef) = raw_command_transaction($socket, LXC_CMD_ADD_STATE_CLIENT, $states);
+    if ($res != MAX_STATE) {
+	die "container is currently in unexpected state $res\n";
+    }
+
+    return $socket;
+}
+
+# Read an lxc message from a socket.
+#
+# Returns undef on EOF (if lxc exits).
+# Otherwise returns a (type, vmid, value) tuple.
+#
+# The returned 'type' currently can be 'STATE', 'PRIORITY' or 'EXITSTATUS'.
+sub read_lxc_message($) {
+    my ($socket) = @_;
+
+    my $msg;
+    my $got = recv($socket, $msg, $LXC_MSG_SIZE, 0)
+	// die "failed to read from state socket: $!\n";
+
+    if (length($msg) == 0) {
+	return undef;
+    }
+
+    die "short read on state socket ($LXC_MSG_SIZE != ".length($msg).")\n"
+	if length($msg) != $LXC_MSG_SIZE;
+
+    my ($type, $name, $value) = _unpack_lxc_msg($msg);
+
+    return ($type, $name, $value);
+}
+
 1;
-- 
2.20.1