[pve-devel] [RFC container] Improve feedback for startup
Thomas Lamprecht
t.lamprecht at proxmox.com
Mon Sep 7 18:32:32 CEST 2020
On 27.08.20 10:44, Wolfgang Bumiller wrote:
> On Thu, Aug 20, 2020 at 11:36:39AM +0200, Thomas Lamprecht wrote:
>> On 19.08.20 12:30, Fabian Ebner wrote:
>>> Since it was necessary to switch to 'Type=simple' in the systemd
>>> service (see 545d6f0a13ac2bf3a8d3f224c19c0e0def12116d ),
>>> 'systemctl start pve-container@ID' would not wait for the 'lxc-start'
>>> command anymore. Thus every container start was reported as a success
>>> and the 'post-start' hook would trigger immediately after the
>>> 'systemctl start' command.
>>>
>>> Use 'lxc-monitor' to get the necessary information and detect
>>> startup failure and only run the 'post-start' hookscript after
>>> the container is effectively running. If something goes wrong
>>> with the monitor, fall back to the old behavior.
>>>
>>> Signed-off-by: Fabian Ebner <f.ebner at proxmox.com>
>>> ---
>>> src/PVE/LXC.pm | 36 +++++++++++++++++++++++++++++++++++-
>>> 1 file changed, 35 insertions(+), 1 deletion(-)
>>>
>>
>> appreciate the effort!
>> We could also directly connect to /run/lxc/var/lib/lxc/monitor-fifo (or the abstract
>> unix socket, but not much gained/difference here) of the lxc-monitord which publishes
>> all state changes and unpack the new state [0] directly.
>>
>> [0] https://github.com/lxc/lxc/blob/8bdacc22a48f9c09902a1d2febd71439cb38c082/src/lxc/state.h#L10
>>
>> @Wolfgang, what do you think?
>
> Just tested adding a state client to our Command.pm directly, seems to
> work, so we would depend neither on lxc-monitor nor lxc-monitord.
>
> Example & code follow below. The only issue with it is that we'd need to
> retry connecting to the command socket a few times since we don't know
> when it becomes available, but that shouldn't be too bad IMO.
>
> [..snip..]
With the code below I never get the initial stopped -> running edge, though.
I can monitor the CT getting stopped, but not the other way around.
Adding extra code that checks whether the CT is running in order to abort
the recv feels like it would miss the point a bit...
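
For reference, the "retry connecting to the command socket a few times" idea
quoted above could be sketched roughly as follows. This is only an
illustration: retry_connect, the attempt count and the delay are made-up
names and values, not part of the patch, and it only covers waiting for the
socket to show up.

```perl
use strict;
use warnings;
use Time::HiRes qw(sleep);

# Hypothetical helper (not part of the patch): call $connect until it returns
# a defined value, at most $tries times, sleeping $delay seconds in between.
sub retry_connect {
    my ($connect, $tries, $delay) = @_;

    for my $attempt (1 .. $tries) {
        my $sock = $connect->();
        return $sock if defined($sock);
        sleep($delay) if $attempt < $tries;
    }

    return undef; # socket never became available
}

# Intended use (assumes get_state_client() from the quoted patch):
# my $sock = retry_connect(sub { PVE::LXC::Command::get_state_client(404) }, 10, 0.1)
#     // die "command socket did not show up\n";
```

Passing the connect step in as a code reference keeps the sketch independent
of Command.pm, so the loop itself can be tried without a container.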
>
> Usage example:
>
> use PVE::LXC::Command;
>
> my $sock = PVE::LXC::Command::get_state_client(404);
> die "not running\n" if !defined($sock);
>
> while (1) {
> my ($type, $name, $value) = PVE::LXC::Command::read_lxc_message($sock);
> last if !defined($type);
> print("$name: $type => $value\n");
> }
>
> Patch for Command.pm:
>
> ---8<---
> From 6ac578ef889a3a9c8aefc4f05215b4ec66049546 Mon Sep 17 00:00:00 2001
> From: Wolfgang Bumiller <w.bumiller at proxmox.com>
> Date: Thu, 27 Aug 2020 10:31:06 +0200
> Subject: [PATCH container] command: add state client functions
>
> Signed-off-by: Wolfgang Bumiller <w.bumiller at proxmox.com>
> ---
> src/PVE/LXC/Command.pm | 91 ++++++++++++++++++++++++++++++++++++++++++
> 1 file changed, 91 insertions(+)
>
> diff --git a/src/PVE/LXC/Command.pm b/src/PVE/LXC/Command.pm
> index beed890..6df767d 100644
> --- a/src/PVE/LXC/Command.pm
> +++ b/src/PVE/LXC/Command.pm
> @@ -11,20 +11,36 @@ use warnings;
>
> use IO::Socket::UNIX;
> use Socket qw(SOCK_STREAM SOL_SOCKET SO_PASSCRED);
> +use POSIX qw(NAME_MAX);
>
> use base 'Exporter';
>
> use constant {
> + LXC_CMD_GET_STATE => 3,
> LXC_CMD_GET_CGROUP => 6,
> + LXC_CMD_ADD_STATE_CLIENT => 10,
> LXC_CMD_FREEZE => 15,
> LXC_CMD_UNFREEZE => 16,
> LXC_CMD_GET_LIMITING_CGROUP => 19,
> };
>
> +use constant {
> + STATE_STOPPED => 0,
> + STATE_STARTING => 1,
> + STATE_RUNNING => 2,
> + STATE_STOPPING => 3,
> + STATE_ABORTING => 4,
> + STATE_FREEZING => 5,
> + STATE_FROZEN => 6,
> + STATE_THAWED => 7,
> + MAX_STATE => 8,
> +};
> +
> our @EXPORT_OK = qw(
> raw_command_transaction
> simple_command
> get_cgroup_path
> + get_state_client
> );
>
> # Get the command socket for a container.
> @@ -81,6 +97,33 @@ my sub _unpack_lxc_cmd_rsp($) {
> return ($ret, $len);
> }
>
> +my $LXC_MSG_SIZE = length(pack('I! Z'.(NAME_MAX+1).' x![I] I', 0, "", 0));
> +# Unpack an lxc_msg struct.
> +my sub _unpack_lxc_msg($) {
> + my ($packet) = @_;
> +
> + # struct lxc_msg {
> + # lxc_msg_type_t type;
> + # char name[NAME_MAX+1];
> + # int value;
> + # };
> +
> + my ($type, $name, $value) = unpack('I! Z'.(NAME_MAX+1).' x![I] I!', $packet);
> +
> + if ($type == 0) {
> + $type = 'STATE';
> + } elsif ($type == 1) {
> + $type = 'PRIORITY';
> + } elsif ($type == 2) {
> + $type = 'EXITCODE';
> + } else {
> + warn "unsupported lxc message type $type received\n";
> + $type = undef;
> + }
> +
> + return ($type, $name, $value);
> +}
> +
> # Send a complete packet:
> my sub _do_send($$) {
> my ($sock, $data) = @_;
> @@ -206,4 +249,52 @@ sub unfreeze($$) {
> return $res;
> }
>
> +# Add this command socket as a state client.
> +#
> +# Currently all states are observed.
> +#
> +# Returns undef if the container is not running, dies on errors.
> +sub get_state_client($) {
> + my ($vmid) = @_;
> +
> + my $socket = _get_command_socket($vmid)
> + or return undef;
> +
> + # For now we want all states (except 'reboots', since we would never see those, reboots would
> + # use a value of '2' for STATE_RUNNING)
> + my $states = pack('I!', 2) x MAX_STATE;
> +
> + my ($res, undef) = raw_command_transaction($socket, LXC_CMD_ADD_STATE_CLIENT, $states);
> + if ($res != MAX_STATE) {
> + die "container is currently in unexpected state $res\n";
> + }
> +
> + return $socket;
> +}
> +
> +# Read an lxc message from a socket.
> +#
> +# Returns undef on EOF (if lxc exits).
> +# Otherwise returns a (type, vmid, value) tuple.
> +#
> +# The returned 'type' currently can be 'STATE', 'PRIORITY' or 'EXITCODE'.
> +sub read_lxc_message($) {
> + my ($socket) = @_;
> +
> + my $msg;
> + my $got = recv($socket, $msg, $LXC_MSG_SIZE, 0)
> + // die "failed to read from state socket: $!\n";
> +
> + if (length($msg) == 0) {
> + return undef;
> + }
> +
> + die "short read on state socket ($LXC_MSG_SIZE != ".length($msg).")\n"
> + if length($msg) != $LXC_MSG_SIZE;
> +
> + my ($type, $name, $value) = _unpack_lxc_msg($msg);
> +
> + return ($type, $name, $value);
> +}
> +
> 1;
>
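
To make the message layout from the patch easier to poke at without a
container, here is a small standalone sketch that packs a fake lxc_msg and
decodes it again. describe_lxc_msg and the fake message are assumptions for
illustration only; the template and the state order are taken from the
quoted patch.

```perl
use strict;
use warnings;
use POSIX qw(NAME_MAX);

# Same layout as struct lxc_msg in the patch: type, name[NAME_MAX+1], value.
my $LXC_MSG_FMT = 'I! Z' . (NAME_MAX + 1) . ' x![I] I!';

# State names in the order of the STATE_* constants from the patch.
my @STATE_NAMES = qw(STOPPED STARTING RUNNING STOPPING ABORTING FREEZING FROZEN THAWED);

# Hypothetical helper: turn a raw message into a human readable line.
sub describe_lxc_msg {
    my ($packet) = @_;

    my ($type, $name, $value) = unpack($LXC_MSG_FMT, $packet);
    return "$name: type $type, value $value" if $type != 0; # not a STATE message

    my $state = $STATE_NAMES[$value] // "UNKNOWN($value)";
    return "$name: $state";
}

# Build a fake STATE message for container "404" entering RUNNING (2).
my $fake = pack($LXC_MSG_FMT, 0, "404", 2);
print describe_lxc_msg($fake), "\n"; # prints "404: RUNNING"
```

Using one shared template for pack and unpack avoids the size calculation and
the decoding drifting apart.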