[pve-devel] NAK: [PATCH v2 container] vmstatus: include detected IP address of running containers

Thu Jul 29 16:37:43 CEST 2021

On 29/07/2021 14:26, Oguz Bektas wrote:
> add a helper 'find_lxc_ip_address' to fetch IP address of container from
> its network namespace using lxc-info.
> 
> for the moment it can be queried with the pct tool:
> $ pct status 1000 --verbose
> cpu: 0
> cpus: 1
> disk: 6422528
> diskread: 368640
> diskwrite: 0
> ipaddress: 192.168.31.83        <----
> maxdisk: 4294967296
> maxmem: 536870912
> maxswap: 536870912
> mem: 864256
> name: CT1000
> netin: 3281265
> netout: 15794
> pid: 34897
> status: running
> swap: 94208
> type: lxc
> uptime: 11088
> vmid: 1000
> 
> Signed-off-by: Oguz Bektas <o.bektas at proxmox.com>
> ---
> v1->v2:
> * add to vmstatus_return_properties
> * add a global variable for caching results
> * handle multiple addresses
> * pct also needs to handle array when printing verbose status
> 

You forgot to include the simple and proved "full" approach I asked for but added
some per-worker caching (so data is eventually 4x in memory) with no rational for
that in the comment message (NAK!) some magic time-limit values with again no
rationale in commit message or the code? That's not really ideal IMO.

Besides that, looks like it would actually work now, but I'm NAK'ing this approach
in general for now (@other maintainers, please do not apply for now). Just way to
much overhead (thought at scale) so run_command open3 + lxc-info + library loading
unshare + clone and that every 6'th statd vmstatus call with a memory cost in statd
and api proxy? meh at best, ideally we'd reduce overhead not add new one.

We could argue it away with the `full` flag, as then it'd be only set for CLI, but
as that's not included, the pve-rs approach wasn't explored at all (there we'D never
exec anything, the lib would be already loaded and a single clone would be required),
and it's just not critical I do not want to rush this through.

Still, some comments inline.

@wobu why the heck do I need to set ns+clone anyway, couldn't I just tell netlink
which NS I want the info from (from priv. process) the kernel knows all that
anyway.
I have a slight feeling that using bpftrace or consorty to hook into adding/deleting
IP addresses and just save the state directly in /run or the like would be more
efficient, but that *really* just feels wrong ;P

>  src/PVE/CLI/pct.pm |  7 ++++++-
>  src/PVE/LXC.pm     | 35 +++++++++++++++++++++++++++++++++++
>  2 files changed, 41 insertions(+), 1 deletion(-)
> 
> diff --git a/src/PVE/CLI/pct.pm b/src/PVE/CLI/pct.pm
> index 254b3b3..827a68c 100755
> --- a/src/PVE/CLI/pct.pm
> +++ b/src/PVE/CLI/pct.pm
> @@ -66,7 +66,12 @@ __PACKAGE__->register_method ({
>  	    foreach my $k (sort (keys %$stat)) {
>  		my $v = $stat->{$k};
>  		next if !defined($v);
> -		print "$k: $v\n";
> +		if (ref($v) eq 'ARRAY') {
> +		    my $str = join(",", @$v);
> +		    print "$k: $str\n";
> +		} else {
> +		    print "$k: $v\n";
> +		}
>  	    }
>  	} else {
>  	    my $status = $stat->{status} || 'unknown';
> diff --git a/src/PVE/LXC.pm b/src/PVE/LXC.pm
> index 139f901..3354290 100644
> --- a/src/PVE/LXC.pm
> +++ b/src/PVE/LXC.pm
> @@ -118,6 +118,9 @@ sub get_container_disk_usage {
>  
>  my $last_proc_vmid_stat;
>  
> +
> +my $container_ip_cache = {};
> +
>  our $vmstatus_return_properties = {
>      vmid => get_standard_option('pve-vmid'),
>      status => {
> @@ -148,6 +151,11 @@ our $vmstatus_return_properties = {
>  	type => 'string',
>  	optional => 1,
>      },
> +    ipaddress => {

ip-address

> +	description => "IP addresses.",
> +	type => 'array', items => { type => 'string' },

new line for "items"

> +	optional => 1,
> +    },
>      uptime => {
>  	description => "Uptime.",
>  	type => 'integer',
> @@ -245,6 +253,11 @@ sub vmstatus {
>  	my $d = $list->{$vmid};
>  	my $pid = $d->{pid};
>  
> +	$container_ip_cache->{$vmid} = {
> +	    ipaddress => [],
> +	    timestamp => $cdtime,
> +	};
> +
>  	next if !$pid; # skip stopped CTs
>  
>  	my $proc_pid_stat = PVE::ProcFSTools::read_proc_pid_stat($pid);
> @@ -254,6 +267,13 @@ sub vmstatus {
>  
>  	my $cgroups = PVE::LXC::CGroup->new($vmid);
>  
> +	my $cache_period = 60; # seconds

as IPs change every 60s or what?

> +	if ($cdtime - $container_ip_cache->{$vmid}->{timestamp} < $cache_period) {
> +	    $container_ip_cache->{$vmid}->{timestamp} = gettimeofday;

so, you added yet another call per CT and time-stamp saved in every CT's
hash which could just be there once at the top level of the cache?

Don't mind to change for v3 as this approach is just NAK'd for now, but
please, one doesn't have to always bent completely backwards and micro optimizations
are a time sink, but just a bit of general efficiency awareness for the design as a
whole would be good here. Even if it may not matter to much for a small test setup,
but it adds up and our perl processes aren't already exactly lightweight; and even
if it'd not matter at all this feels just wrong from principle.

And just to be clear, IF (and we for now won't) do this then it'd be something like:

$ct_ips {
  last => $ts,
  cts => {
    100 => [ ... ],
    101 => [ ... ],
    ...
  }
}

my $ct_ips_outdated = $ct_ips-{last} + 60 < $now;

for $vmid (...) {

   ...

   if (!exists $ct_ips-{cts}->{$vmid} || $ct_ips_outdated) {
        $ct_ips->{cts}->{$vmid} = find_lxc_ip_address($vmid);
   }
}

> +	    $container_ip_cache->{$vmid}->{ipaddress} = find_lxc_ip_address($vmid);
> +	    $d->{ipaddress} = $container_ip_cache->{$vmid}->{ipaddress};
> +	}
> +
>  	if (defined(my $mem = $cgroups->get_memory_stat())) {
>  	    $d->{mem} = int($mem->{mem});
>  	    $d->{swap} = int($mem->{swap});
> @@ -397,6 +417,21 @@ sub open_ppid {
>      return ($ppid, $fd);
>  }
>  

no comment about the method, something like 
# uses XYZ to get the addresses returned as array ref

can help
> +sub find_lxc_ip_address {

s/address/addresses/; it's always an array or?

> +    my ($vmid) = @_;
> +
> +    my $ip_list = [];
> +
> +    my $parser = sub {
> +	my $line = shift;
> +	my $ip = $1 if $line =~ m/^IP:\s+(.*)$/;
> +	push @$ip_list, $ip;
> +    };
> +
> +    PVE::Tools::run_command(['lxc-info', '-n', $vmid, '--ips'], outfunc => $parser, errfunc => sub {});

silencing the error can be ok sometimes, but it also can be confusing?

> +    return $ip_list;
> +}
> +
>  # Note: we cannot use Net:IP, because that only allows strict
>  # CIDR networks
>  sub parse_ipv4_cidr {
>