[pve-devel] [PATCH pve-container] pct: fix cpu load calculation on command line

Thomas Lamprecht t.lamprecht at proxmox.com
Tue Jul 18 15:02:29 CEST 2023


Am 18/07/2023 um 14:00 schrieb Philipp Hufnagl:
> Sorry forgott to tag as v2

and also forgot to document the patch changelog like asked yesterday..

> 
> On 7/18/23 13:58, Philipp Hufnagl wrote:
>>   When called from the command line, it was not possible to calculate
>>   cpu load because there was no 2nd data point available for the
>>   calculation. Now (when called) from the command line, cpu stats will
>>   be fetched twice with a minimum delta of 20ms. That way the load can
>>   be calculated

@Maximiliano, didn't we decide to just drop it instead? This isn't really
useful, once can get much better data from the pressure stall information
(PSI) which is tracked per cgroup and tells a user much more than a 20 ms
sample interval..

https://docs.kernel.org/accounting/psi.html#cgroup2-interface

Still a few comments inline.

>>
>> fixes #4765

Add this to the start of the commit subject: fix #4765: 

>>
>> Signed-off-by: Philipp Hufnagl <p.hufnagl at proxmox.com>
>> ---
>>   src/PVE/CLI/pct.pm |  4 ++--
>>   src/PVE/LXC.pm     | 32 +++++++++++++++++++++++++++++---
>>   2 files changed, 31 insertions(+), 5 deletions(-)
>>

> diff --git a/src/PVE/CLI/pct.pm b/src/PVE/CLI/pct.pm
> index ff75d33..e531b27 100755
> --- a/src/PVE/CLI/pct.pm
> +++ b/src/PVE/CLI/pct.pm
> @@ -60,8 +60,8 @@ __PACKAGE__->register_method ({
>  
>  	# test if CT exists
>  	my $conf = PVE::LXC::Config->load_config ($param->{vmid});
> -
> -	my $vmstatus = PVE::LXC::vmstatus($param->{vmid});
> +	# workaround to get cpu usage is to fetch cpu stats twice with a delay
> +	my $vmstatus = PVE::LXC::vmstatus($param->{vmid}, 20);
>  	my $stat = $vmstatus->{$param->{vmid}};
>  	if ($param->{verbose}) {
>  	    foreach my $k (sort (keys %$stat)) {
> diff --git a/src/PVE/LXC.pm b/src/PVE/LXC.pm
> index a531ea5..9fc171f 100644
> --- a/src/PVE/LXC.pm
> +++ b/src/PVE/LXC.pm
> @@ -12,7 +12,7 @@ use IO::Poll qw(POLLIN POLLHUP);
>  use IO::Socket::UNIX;
>  use POSIX qw(EINTR);
>  use Socket;
> -use Time::HiRes qw (gettimeofday);
> +use Time::HiRes qw (gettimeofday usleep);
>  
>  use PVE::AccessControl;
>  use PVE::CGroup;
> @@ -171,11 +171,37 @@ our $vmstatus_return_properties = {
>      }
>  };
>  
> +sub get_first_cpu {

would expect that this actually returns something, i.e., the ID of the first CPU
or something like that, so method name should be telling more about what this does,
e.g.:

prime_vmstatus_cpu_sampling

> +    my ($list, $measure_timespan_ms) = @_;
> +    my $cdtime = gettimeofday;
> +
> +    foreach my $vmid (keys %$list) {
> +	my $cgroups = PVE::LXC::CGroup->new($vmid);
> +	if (defined(my $cpu = $cgroups->get_cpu_stat())) {
> +	    # Total time (in milliseconds) used up by the cpu.
> +	    my $used_ms = $cpu->{utime} + $cpu->{stime};
> +	    $last_proc_vmid_stat->{$vmid} = {
> +		time => $cdtime,
> +		used => $used_ms,
> +		cpu => 0,
> +	    };
> +	}
> +    }
> +    usleep($measure_timespan_ms * 1000);

this is rather ugly, the reading the call site one definitively does not expect
that a innocent named get_first_cpu unconditionally sleeps even though that only
the caller would require this for their sampling..

If we don't just drop this CPU load stuff in pct status I'd rather do one of four
options:

1) rename this to prime_vmstatus_cpu_sampling and just do it for a single vmid,
then call this new method in PVE::CLI::pct->status and do the sleep there, as
that's actually the one call sites that cares about it, the existing vmstatus
method then just needs one change:

-           if ($delta_ms > 1000.0) {
+           if ($delta_ms > 1000.0 || $old->{cpu} == 0) {

2) The same as 1) but instead of adding the prime_vmstatus_cpu_sampling helper
just call vmstatus twice with sleeping in-between (and the same change to the if
condition as for 1).

3) get the data where it's already available, i.e., pvestatd, might need more
rework though

4) switch over to reporting the PSI from /sys/fs/cgroup/lxc/VMID/cpu.pressure
this is pretty simple as in PSI ~ 0 -> no overload 0 >> PSI > 1 -> some overload
and PSI >> 1 a lot of overload.

Option 4 sounds niceish, but needs more work and has not that high of a benefit
(users can already query this easily themselves), option 1 or 2 would be OK-ish,
but IMO not ideal, as we'd use a 20ms avg here compared to a >> 1s average elswhere,
which can be confusing as it can be quite, well spikey. option 3 would be better here
but as mentioned also more rework and possible more intrusive one, so IMO just
dropping it sounds almost the nicest and def. most simple one.

> +}
> +
>  sub vmstatus {
> -    my ($opt_vmid) = @_;
> +    my ($opt_vmid, $measure_timespan_ms) = @_;

nit: in control theory, signal processing and acquiring stats in general, using
"sampling period" or "sampling interval" is a bit more common for describing what
you do here with "$measure_timespan_ms".

>  
>      my $list = $opt_vmid ? { $opt_vmid => { type => 'lxc', vmid => int($opt_vmid) }} : config_list();
>  
> +    if (defined($measure_timespan_ms))
> +    {

Doesn't follows our coding style guide:
https://pve.proxmox.com/wiki/Perl_Style_Guide#Spacing_and_syntax_usage

> +	get_first_cpu($list, $measure_timespan_ms);
> +    }
> +
> +    $measure_timespan_ms //= 1000;

could just put that in the else block.

> +
>      my $active_hash = list_active_containers();
>  
>      my $cpucount = $cpuinfo->{cpus} || 1;
> @@ -285,7 +311,7 @@ sub vmstatus {
>  	    }
>  
>  	    my $delta_ms = ($cdtime - $old->{time}) * $cpucount * 1000.0;
> -	    if ($delta_ms > 1000.0) {
> +	    if ($delta_ms > $measure_timespan_ms) {
>  		my $delta_used_ms = $used_ms - $old->{used};
>  		$d->{cpu} = (($delta_used_ms / $delta_ms) * $cpucount) / $d->{cpus};
>  		$last_proc_vmid_stat->{$vmid} = {





More information about the pve-devel mailing list