[pve-devel] [PATCH qemu-server] fix #6935: vmstatus: use CGroup for host memory usage

Fabian Grünbichler f.gruenbichler at proxmox.com
Fri Jan 9 13:04:54 CET 2026


ping once more:

https://forum.proxmox.com/threads/pvesh-get-nodes-node-qemu-qm-list-very-slow-since-proxmox-9.176789/#post-828981

Quoting Fabian Grünbichler (2025-12-17 10:36:34)
> ping - we get semi-regular reports of people running into this since
> upgrading..
> 
> On November 28, 2025 11:36 am, Fabian Grünbichler wrote:
> > after a certain amount of KSM sharing, PSS lookups become prohibitively
> > expensive. instead of reverting to the old broken method, simply use the
> > cgroup's memory usage as the `memhost` value.
> > 
> > this no longer accounts for pages merged via KSM.
> > 
> > I benchmarked this with 4 VMs running with different levels of KSM sharing. in
> > the output below, "merged pages" refers to the contents of
> > /proc/$pid/ksm_merging_pages, the extract_* benchmark runs refer to four
> > different variants of extracting memory usage, with the actual extraction part
> > running 1000x in a loop for each run to amortize perl/process setup costs,
> > qm_status_stock is `qm status $vmid --verbose`, and qm_status_patched is `perl
> > -I./src/PVE ./src/bin/qm status $vmid --verbose` with this patch applied.
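> > 
> > (the "25G worth of pages" style figures below are just the merged page count
> > multiplied by the page size - e.g. 6696434 * 4 KiB is roughly 25.5 GiB; a
> > minimal sketch, with a made-up helper name and assuming 4 KiB pages:)
> > 
> >     #!/usr/bin/perl
> >     use strict;
> >     use warnings;
> > 
> >     # convert a process' KSM merged page count to bytes; 4 KiB pages and
> >     # the helper name are assumptions, this is not part of the patch
> >     sub ksm_merged_bytes {
> >         my ($pid) = @_;
> >         open(my $fh, '<', "/proc/$pid/ksm_merging_pages") or return 0;
> >         chomp(my $pages = <$fh>);
> >         return int($pages) * 4096;
> >     }
> > 
> >     my $pid = $ARGV[0] // die "usage: $0 <pid>\n";
> >     printf("%.1f GiB merged via KSM\n", ksm_merged_bytes($pid) / (1024 ** 3));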
> > 
> > the variants:
> > - extract_pss: status before this patch, query smaps_rollup for each process
> >   that is part of the qemu.slice of the VM
> > - extract_rss: extract VmRSS from the `/proc/$pid/status` file of the main
> >   process
> > - extract_rss_cgroup: like _rss, but for each process of the slice
> > - extract_cgroup: use PVE::QemuServer::CGroup get_memory_stat (this patch)
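> > 
> > the benchmark drivers are not included here; as a rough, stand-alone sketch of
> > what the two fastest variants boil down to (sub names are made up, and
> > extract_cgroup reads memory.current directly instead of going through
> > PVE::QemuServer::CGroup like the patch does):
> > 
> >     #!/usr/bin/perl
> >     use strict;
> >     use warnings;
> > 
> >     # _rss style: VmRSS of a single (main) process, in bytes
> >     sub extract_rss {
> >         my ($pid) = @_;
> >         open(my $fh, '<', "/proc/$pid/status") or die "open status: $!\n";
> >         while (my $line = <$fh>) {
> >             return int($1) * 1024 if $line =~ m/^VmRSS:\s+(\d+)\s+kB/;
> >         }
> >         return 0;
> >     }
> > 
> >     # _cgroup style: memory charged to the whole qemu.slice scope
> >     # (cgroup v2); reading memory.current directly is an approximation
> >     # of what get_memory_stat() returns as 'mem'
> >     sub extract_cgroup {
> >         my ($vmid) = @_;
> >         my $path = "/sys/fs/cgroup/qemu.slice/$vmid.scope/memory.current";
> >         open(my $fh, '<', $path) or die "open $path: $!\n";
> >         chomp(my $bytes = <$fh>);
> >         return int($bytes);
> >     }
> > 
> >     # each extract_* call is what runs 1000x per benchmark run above
> >     my ($vmid, $pid) = @ARGV;
> >     die "usage: $0 <vmid> <pid>\n" if !$vmid || !$pid;
> >     printf("rss: %d\ncgroup: %d\n", extract_rss($pid), extract_cgroup($vmid));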
> > 
> > first, with no KSM active
> > 
> > VMID: 113
> > 
> > pss:        724971520
> > rss:        733282304
> > cgroup:     727617536
> > rss_cgroup: 733282304
> > 
> > Benchmark 1: extract_pss        271.2 ms ±   6.0 ms    [User: 226.3 ms, System: 44.7 ms]
> > Benchmark 2: extract_rss        267.8 ms ±   3.6 ms    [User: 223.9 ms, System: 43.7 ms]
> > Benchmark 3: extract_cgroup     273.5 ms ±   6.2 ms    [User: 227.2 ms, System: 46.2 ms]
> > Benchmark 4: extract_rss_cgroup 270.5 ms ±   3.7 ms    [User: 225.0 ms, System: 45.3 ms]
> > 
> > both reported usage and runtime are in the same ballpark
> > 
> > VMID: 838383 (with 48G of memory):
> > 
> > pss:        40561564672
> > rss:        40566108160
> > cgroup:     40961339392
> > rss_cgroup: 40572141568
> > 
> > reported usage is in the same ballpark
> > 
> > Benchmark 1: extract_pss        732.0 ms ±   4.4 ms    [User: 224.8 ms, System: 506.8 ms]
> > Benchmark 2: extract_rss        272.1 ms ±   5.2 ms    [User: 227.8 ms, System: 44.0 ms]
> > Benchmark 3: extract_cgroup     274.2 ms ±   2.2 ms    [User: 227.8 ms, System: 46.2 ms]
> > Benchmark 4: extract_rss_cgroup 270.9 ms ±   3.9 ms    [User: 224.9 ms, System: 45.8 ms]
> > 
> > but PSS is already a lot slower..
> > 
> > Benchmark 1: qm_status_stock   820.9 ms ±   7.5 ms    [User: 293.1 ms, System: 523.3 ms]
> > Benchmark 2: qm_status_patched 356.2 ms ±   5.6 ms    [User: 290.2 ms, System: 61.5 ms]
> > 
> > which is also visible in the stock vs. patched `qm status` runs
> > 
> > the other two VMs behaved like 113
> > 
> > and now with KSM active
> > 
> > VMID: 113
> > merged pages: 10747 (very little)
> > 
> > pss:        559815680
> > rss:        594853888
> > cgroup:     568197120
> > rss_cgroup: 594853888
> > 
> > Benchmark 1: extract_pss        280.0 ms ±   2.4 ms    [User: 229.5 ms, System: 50.2 ms]
> > Benchmark 2: extract_rss        274.8 ms ±   3.7 ms    [User: 225.9 ms, System: 48.7 ms]
> > Benchmark 3: extract_cgroup     279.0 ms ±   4.6 ms    [User: 228.0 ms, System: 50.7 ms]
> > Benchmark 4: extract_rss_cgroup 274.7 ms ±   6.7 ms    [User: 228.0 ms, System: 46.4 ms]
> > 
> > still in the same ballpark
> > 
> > VMID: 838383 (with 48G of memory)
> > merged pages: 6696434 (a lot - this is 25G worth of pages!)
> > 
> > pss:        12411169792
> > rss:        38772117504
> > cgroup:     12799062016
> > rss_cgroup: 38778150912
> > 
> > the RSS-based numbers are roughly the same, but cgroup gives us almost the
> > same numbers as PSS despite KSM being active!
> > 
> > Benchmark 1: extract_pss        691.7 ms ±   3.4 ms    [User: 225.5 ms, System: 465.8 ms]
> > Benchmark 2: extract_rss        276.3 ms ±   7.1 ms    [User: 227.4 ms, System: 48.6 ms]
> > Benchmark 3: extract_cgroup     277.8 ms ±   4.4 ms    [User: 228.5 ms, System: 49.1 ms]
> > Benchmark 4: extract_rss_cgroup 274.7 ms ±   3.5 ms    [User: 226.6 ms, System: 47.8 ms]
> > 
> > but it is still fast!
> > 
> > Benchmark 1: qm_status_stock   771.8 ms ±   7.2 ms    [User: 296.0 ms, System: 471.0 ms]
> > Benchmark 2: qm_status_patched 360.2 ms ±   5.1 ms    [User: 287.1 ms, System: 68.5 ms]
> > 
> > confirmed by `qm status` as well
> > 
> > VMID: 838384
> > merged pages: 165540 (little, this is about 645MB worth of pages)
> > 
> > pss:        2522527744
> > rss:        2927058944
> > cgroup:     2500329472
> > rss_cgroup: 2932944896
> > 
> > Benchmark 1: extract_pss        318.4 ms ±   3.6 ms    [User: 227.3 ms, System: 90.8 ms]
> > Benchmark 2: extract_rss        273.9 ms ±   5.8 ms    [User: 226.5 ms, System: 47.2 ms]
> > Benchmark 3: extract_cgroup     276.3 ms ±   4.1 ms    [User: 225.4 ms, System: 50.7 ms]
> > Benchmark 4: extract_rss_cgroup 276.5 ms ±   8.6 ms    [User: 226.1 ms, System: 50.1 ms]
> > 
> > Benchmark 1: qm_status_stock   400.2 ms ±   6.6 ms    [User: 292.1 ms, System: 103.5 ms]
> > Benchmark 2: qm_status_patched 357.0 ms ±   4.1 ms    [User: 288.7 ms, System: 63.7 ms]
> > 
> > results match those of 838383, just with a smaller effect
> > 
> > the fourth VM matches this as well.
> > 
> > Fixes/Reverts: d426de6c7d81a4d04950f2eaa9afe96845d73f7e ("vmstatus: add memhost for host view of vm mem consumption")
> > 
> > Signed-off-by: Fabian Grünbichler <f.gruenbichler at proxmox.com>
> > ---
> > 
> > Notes:
> >     given the numbers, going with the CGroup-based approach seems best - it gives
> >     us accurate numbers without the slowdown, and gives users an insight into how
> >     KSM affects their guests' host memory usage without flip-flopping.
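> > 
> >     stand-alone, this is what the new code path amounts to (same calls as
> >     in the hunk below, assuming PVE::QemuServer::CGroup is loadable on
> >     its own):
> > 
> >         use strict;
> >         use warnings;
> >         use PVE::QemuServer::CGroup;
> > 
> >         my $vmid = 113; # one of the test VMs from above
> >         my $cgroup = PVE::QemuServer::CGroup->new($vmid);
> >         my $cgroup_mem = $cgroup->get_memory_stat();
> >         my $memhost = $cgroup_mem->{mem} // 0;
> >         print "memhost: $memhost\n";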
> > 
> >  src/PVE/QemuServer.pm | 35 ++++-------------------------------
> >  1 file changed, 4 insertions(+), 31 deletions(-)
> > 
> > diff --git a/src/PVE/QemuServer.pm b/src/PVE/QemuServer.pm
> > index a7fbec14..62d835a5 100644
> > --- a/src/PVE/QemuServer.pm
> > +++ b/src/PVE/QemuServer.pm
> > @@ -2324,35 +2324,6 @@ sub vzlist {
> >      return $vzlist;
> >  }
> >  
> > -# Iterate over all PIDs inside a VMID's cgroup slice and accumulate their PSS (proportional set
> > -# size) to get a relatively telling effective memory usage of all processes involved with a VM.
> > -my sub get_vmid_total_cgroup_memory_usage {
> > -    my ($vmid) = @_;
> > -
> > -    my $memory_usage = 0;
> > -    if (my $procs_fh = IO::File->new("/sys/fs/cgroup/qemu.slice/${vmid}.scope/cgroup.procs", "r")) {
> > -        while (my $pid = <$procs_fh>) {
> > -            chomp($pid);
> > -
> > -            open(my $smaps_fh, '<', "/proc/${pid}/smaps_rollup")
> > -                or $!{ENOENT}
> > -                or die "failed to open PSS memory-stat from process - $!\n";
> > -            next if !defined($smaps_fh);
> > -
> > -            while (my $line = <$smaps_fh>) {
> > -                if ($line =~ m/^Pss:\s+([0-9]+) kB$/) {
> > -                    $memory_usage += int($1) * 1024;
> > -                    last; # end inner while loop, go to next $pid
> > -                }
> > -            }
> > -            close $smaps_fh;
> > -        }
> > -        close($procs_fh);
> > -    }
> > -
> > -    return $memory_usage;
> > -}
> > -
> >  our $vmstatus_return_properties = {
> >      vmid => get_standard_option('pve-vmid'),
> >      status => {
> > @@ -2614,9 +2585,11 @@ sub vmstatus {
> >  
> >          $d->{uptime} = int(($uptime - $pstat->{starttime}) / $cpuinfo->{user_hz});
> >  
> > -        $d->{memhost} = get_vmid_total_cgroup_memory_usage($vmid);
> > +        my $cgroup = PVE::QemuServer::CGroup->new($vmid);
> > +        my $cgroup_mem = $cgroup->get_memory_stat();
> > +        $d->{memhost} = $cgroup_mem->{mem} // 0;
> >  
> > -        $d->{mem} = $d->{memhost}; # default to cgroup PSS sum, balloon info can override this below
> > +        $d->{mem} = $d->{memhost}; # default to cgroup, balloon info can override this below
> >  
> >          my $pressures = PVE::ProcFSTools::read_cgroup_pressure("qemu.slice/${vmid}.scope");
> >          $d->{pressurecpusome} = $pressures->{cpu}->{some}->{avg10} * 1;
> > -- 
> > 2.47.3