[PVE-User] Kernel Memory Leak on PVE6?

Chris Hofstaedtler | Deduktiva chris.hofstaedtler at deduktiva.com
Fri Sep 20 14:31:17 CEST 2019


Hi,

I'm seeing a very interesting problem on PVE6: one of our machines
appears to leak kernel memory over time, up to the point where only
a reboot helps. Shutting down all KVM VMs does not release this
memory.

I'll attach some information below, captured once before shutting
down the VMs and once after, because I just couldn't figure out what
this memory is used for. I had to reboot the PVE host now, but I
guess in a few days the leak will be at least noticeable again.

This machine has the same hardware (except for the CPU) as the box
next to it; however, this one was freshly installed with PVE6, while
the other one is an upgrade from PVE5 and doesn't exhibit this
problem. It's quite puzzling because I haven't seen this symptom at
any of the customer installations.

Here are some graphs showing the memory consumption over time:
  http://zeha.at/~ch/T/20190920-pve6_meminfo_0.png
  http://zeha.at/~ch/T/20190920-pve6_meminfo_1.png
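
For anyone who wants to collect comparable numbers on their own host,
here is a minimal sketch. It is not the tooling behind the graphs
above, and the helper names (parse_meminfo, slab_fields,
read_slab_sample) are made up for illustration. It only assumes the
usual Slab, SReclaimable and SUnreclaim fields in /proc/meminfo:

```python
# Sketch only: helper names are hypothetical, not part of any PVE tooling.
SLAB_FIELDS = ("MemTotal", "MemAvailable", "Slab", "SReclaimable", "SUnreclaim")


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a {field: value} dict.

    Values are the integers printed by the kernel (kB for most fields).
    """
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        # Skip anything that is not "Name: <integer> [unit]".
        if sep and parts and parts[0].isdigit():
            values[name.strip()] = int(parts[0])
    return values


def slab_fields(meminfo):
    """Pick the fields relevant to slab growth; missing ones become None."""
    return {field: meminfo.get(field) for field in SLAB_FIELDS}


def read_slab_sample(path="/proc/meminfo"):
    """Read one sample from a live system."""
    with open(path) as handle:
        return slab_fields(parse_meminfo(handle.read()))
```

Calling read_slab_sample() every few minutes and logging the result
with a timestamp should be enough to see whether SUnreclaim keeps
growing independently of the VMs.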

Looking forward to any debug help, suggestions, ...

Chris


*** Almost out of memory, before VM shutdown: ***

top - 10:24:19 up 22 days, 22:29,  1 user,  load average: 1.85, 1.57, 1.32
Tasks: 530 total,   1 running, 529 sleeping,   0 stopped,   0 zombie
%Cpu(s):  1.8 us,  0.4 sy,  0.0 ni, 97.7 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
MiB Mem :  80413.1 total,    509.9 free,  70879.7 used,   9023.5 buff/cache
MiB Swap:  20480.0 total,   6516.6 free,  13963.4 used.   8699.0 avail Mem 

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND                                                                                                                                                             
   3183 root      20   0   10.6g   6.0g   2960 S   8.7   7.6   5861:52 /usr/bin/kvm -id 103 -name puppet -chardev socket,id=qmp,path=/var/run/qemu-server/103.qmp,server,nowait -mon chardev=qmp,mode=control -chardev socket,id=qmp-event+
   3349 root      20   0 9266032   4.3g   2972 S   6.8   5.4   3834:41 /usr/bin/kvm -id 2017 -name go-test-srv01 -chardev socket,id=qmp,path=/var/run/qemu-server/2017.qmp,server,nowait -mon chardev=qmp,mode=control -chardev socket,id=+
   3068 root      20   0 5060928   3.7g   2900 S   6.8   4.7   3110:01 /usr/bin/kvm -id 101 -name backup -chardev socket,id=qmp,path=/var/run/qemu-server/101.qmp,server,nowait -mon chardev=qmp,mode=control -chardev socket,id=qmp-event+
   3399 root      20   0 5094772   2.3g   2944 S  50.5   2.9  10780:07 /usr/bin/kvm -id 3002 -name monitor01 -chardev socket,id=qmp,path=/var/run/qemu-server/3002.qmp,server,nowait -mon chardev=qmp,mode=control -chardev socket,id=qmp-+
   3254 root      20   0   32.8g   1.9g   3040 S   1.0   2.4 490:39.29 /usr/bin/kvm -id 2005 -name debbuild -chardev socket,id=qmp,path=/var/run/qemu-server/2005.qmp,server,nowait -mon chardev=qmp,mode=control -chardev socket,id=qmp-e+
   2994 root      20   0 2656268 658428   2980 S   9.7   0.8   2895:15 /usr/bin/kvm -id 100 -name pbx -chardev socket,id=qmp,path=/var/run/qemu-server/100.qmp,server,nowait -mon chardev=qmp,mode=control -chardev socket,id=qmp-event,pa+
   2927 root      20   0 2664232 479372   2944 S   6.8   0.6   2343:43 /usr/bin/kvm -id 102 -name ns1 -chardev socket,id=qmp,path=/var/run/qemu-server/102.qmp,server,nowait -mon chardev=qmp,mode=control -chardev socket,id=qmp-event,pa+
   2417 root      rt   0  606912 211336  51444 S   1.9   0.3 613:27.87 /usr/sbin/corosync -f                                                                                                                                               
2023020 root      20   0  246556  98020  97044 S   0.0   0.1  15:47.80 /lib/systemd/systemd-journald                                                                                                                                       
   1806 root      20   0  967944  32724  23612 S   0.0   0.0  53:49.62 /usr/bin/pmxcfs                                                                                                                                                     
   2801 root      20   0  314488  32428   6464 S   0.0   0.0 322:58.23 pvestatd                                                                                                                                                           +
3771741 root      20   0  150776  31728   3700 S   0.0   0.0   0:12.81 /opt/puppetlabs/puppet/bin/ruby /opt/puppetlabs/puppet/bin/puppet agent --no-daemonize                                                                              
   2799 root      20   0  316056  27452   5656 S   0.0   0.0  95:49.25 pve-firewall                                                                                                                                                       +
   2909 root      20   0  325248  12684   5268 S   1.0   0.0   7:03.91 pve-ha-lrm                                                                                                                                                         +
 868033 ch        20   0   21660   9104   7280 S   0.0   0.0   0:00.12 /lib/systemd/systemd --user                                                                                                                                         
 868009 root      20   0   16912   7988   6856 S   0.0   0.0   0:00.03 sshd: ch [priv]                                                                                                                                                     
      1 root      20   0  171820   7640   5032 S   0.0   0.0  19:58.80 /lib/systemd/systemd --system --deserialize 37                                                                                                                      
   2876 root      20   0  325544   7124   4988 S   0.0   0.0   4:18.16 pve-ha-crm                                                                                                                                                         +
   1654 Debian-+  20   0   40488   7096   2864 S   0.0   0.0  77:37.18 /usr/sbin/snmpd -Lsd -Lf /dev/null -u Debian-snmp -g Debian-snmp -I -smux mteTrigger mteTriggerConf -f -p /run/snmpd.pid                                            
 868045 ch        20   0   10240   5404   3996 S   0.0   0.0   0:00.11 -zsh                                                                                                                                                                
 868044 ch        20   0   16912   4636   3492 S   0.0   0.0   0:00.02 sshd: ch@pts/0
   1644 root      20   0   29608   4520   3496 S   0.0   0.0   4:59.62 /usr/bin/python3 /usr/share/unattended-upgrades/unattended-upgrade-shutdown --wait-for-signal                                                                       
 868336 root      20   0    7716   4372   3092 S   0.0   0.0   0:00.03 -bash                                                                                                                                                               
1761096 root      20   0  351564   4180   3336 S   0.0   0.0   1:12.83 pvedaemon worker                                                                                                                                                   +
1776171 root      20   0  351696   4076   3352 S   0.0   0.0   1:18.27 pvedaemon worker                                                                                                                                                   +
 868370 root      20   0   11680   4016   2964 R   2.9   0.0   0:00.68 top                                                                                                                                                                 
1780591 root      20   0  351696   4008   3248 S   0.0   0.0   1:11.73 pvedaemon worker                                                                                                                                                   +
   1086 root      20   0   19540   3984   3720 S   0.0   0.0   3:11.21 /lib/systemd/systemd-logind                                                                                                                                         
 868335 root      20   0   10156   3788   3364 S   0.0   0.0   0:00.01 sudo -i                                                                                                                                                             
   2899 www-data  20   0  121256   3412   3080 S   0.0   0.0   0:33.99 spiceproxy                                                                                                                                                         +
2000791 www-data  20   0  344932   3412   2604 S   0.0   0.0   1:16.39 pveproxy worker                                                                                                                                                    +
2000792 www-data  20   0  344932   3348   2604 S   0.0   0.0   1:07.07 pveproxy worker                                                                                                                                                    +
   1251 root      20   0  225816   3296   2424 S   0.0   0.0   9:47.44 /usr/sbin/rsyslogd -n -iNONE                                                                                                                                        
   1258 message+  20   0    9212   3268   2820 S   0.0   0.0   6:41.36 /usr/bin/dbus-daemon --system --address=systemd: --nofork --nopidfile --systemd-activation --syslog-only                                                            

root@vn03:~# uname -a
Linux vn03 5.0.21-1-pve #1 SMP PVE 5.0.21-1 (Tue, 20 Aug 2019 17:16:32 +0200) x86_64 GNU/Linux
root@vn03:~# free -m
              total        used        free      shared  buff/cache   available
Mem:          80413       70877         515         101        9019        8708
Swap:         20479       13963        6516
root@vn03:~# dpkg -l pve\*
Desired=Unknown/Install/Remove/Purge/Hold
| Status=Not/Inst/Conf-files/Unpacked/halF-conf/Half-inst/trig-aWait/Trig-pend
|/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad)
||/ Name                    Version      Architecture Description
+++-=======================-============-============-======================================================
ii  pve-cluster             6.0-5        amd64        Cluster Infrastructure for Proxmox Virtual Environment
ii  pve-container           3.0-5        all          Proxmox VE Container management tool
ii  pve-docs                6.0-4        all          Proxmox VE Documentation
ii  pve-edk2-firmware       2.20190614-1 all          edk2 based firmware modules for virtual machines
ii  pve-firewall            4.0-7        amd64        Proxmox VE Firewall
ii  pve-firmware            3.0-2        all          Binary firmware code for the pve-kernel
ii  pve-ha-manager          3.0-2        amd64        Proxmox VE HA Manager
ii  pve-i18n                2.0-2        all          Internationalization support for Proxmox VE
un  pve-kernel              <none>       <none>       (no description available)
ii  pve-kernel-5.0          6.0-7        all          Latest Proxmox VE Kernel Image
ii  pve-kernel-5.0.15-1-pve 5.0.15-1     amd64        The Proxmox PVE Kernel Image
ii  pve-kernel-5.0.18-1-pve 5.0.18-3     amd64        The Proxmox PVE Kernel Image
ii  pve-kernel-5.0.21-1-pve 5.0.21-1     amd64        The Proxmox PVE Kernel Image
ii  pve-kernel-helper       6.0-7        all          Function for various kernel maintenance tasks.
un  pve-kvm                 <none>       <none>       (no description available)
ii  pve-manager             6.0-6        amd64        Proxmox Virtual Environment Management Tools
ii  pve-qemu-kvm            4.0.0-5      amd64        Full virtualization on x86 hardware
un  pve-qemu-kvm-2.6.18     <none>       <none>       (no description available)
ii  pve-xtermjs             3.13.2-1     all          HTML/JS Shell client
root@vn03:~# slabtop -o | head -50
 Active / Total Objects (% used)    : 205425461 / 212231433 (96.8%)
 Active / Total Slabs (% used)      : 4949759 / 4949759 (100.0%)
 Active / Total Caches (% used)     : 114 / 161 (70.8%)
 Active / Total Size (% used)       : 60112896.56K / 60714678.54K (99.0%)
 Minimum / Average / Maximum Object : 0.01K / 0.29K / 16.62K

  OBJS ACTIVE  USE OBJ SIZE  SLABS OBJ/SLAB CACHE SIZE NAME                   
43583592 43542487  99%    0.20K 1117528       39   8940224K vm_area_struct         
26520256 26518592  99%    0.06K 414379       64   1657516K anon_vma_chain         
16788000 16434450  97%    0.25K 524625       32   4197000K filp                   
13079680 13078464  99%    0.03K 102185      128    408740K kmalloc-32             
11544320 5261058  45%    0.06K 180380       64    721520K dmaengine-unmap-2      
10128740 10127452  99%    0.09K 220190       46    880760K anon_vma               
9602484 9602484 100%    0.04K  94142      102    376568K pde_opener             
7442736 7442572  99%    0.19K 177208       42   1417664K cred_jar               
7213200 7209695  99%    0.13K 240440       30    961760K kernfs_node_cache      
6023850 5992341  99%    0.19K 143425       42   1147400K dentry                 
5704350 5704350 100%    0.08K 111850       51    447400K task_delay_info        
5054066 5054066 100%    0.69K 109871       46   3515872K files_cache            
4664512 4664481  99%    0.12K 145766       32    583064K pid                    
4591440 4591440 100%    1.06K 153048       30   4897536K mm_struct              
4207445 4203908  99%    0.58K  76499       55   2447968K inode_cache            
4104480 4104291  99%    0.62K  80480       51   2575360K sock_inode_cache       
3901440 3900588  99%    0.06K  60960       64    243840K kmalloc-64             
3856230 3856160  99%    1.06K 128541       30   4113312K signal_cache           
3423826 3417982  99%    0.65K  69874       49   2235968K proc_inode_cache       
3139584 3138382  99%    0.01K   6132      512     24528K kmalloc-8              
2983344 2983255  99%    0.19K  71032       42    568256K kmalloc-192            
2426976 2426413  99%    1.00K  75843       32   2426976K kmalloc-1k             
1939854 1931355  99%    0.09K  46187       42    184748K kmalloc-96             
1649895 1649895 100%    2.06K 109993       15   3519776K sighand_cache          
1280544 1280544 100%    1.00K  40017       32   1280544K UNIX                   
1052928 1050819  99%    0.50K  32904       32    526464K kmalloc-512            
1029792 1029312  99%    0.25K  32181       32    257448K skbuff_head_cache      
940624 940559  99%    4.00K 117578        8   3762496K kmalloc-4k             
799895 787069  98%    5.75K 159979        5   5119328K task_struct            
735696 724643  98%    0.10K  18864       39     75456K buffer_head            
525504 525378  99%    2.00K  32844       16   1051008K kmalloc-2k             
433024 426780  98%    0.06K   6766       64     27064K kmem_cache_node        
310710 301758  97%    1.05K  10357       30    331424K ext4_inode_cache       
292340 290078  99%    0.68K   6220       47    199040K shmem_inode_cache      
215250 214814  99%    0.38K   5125       42     82000K kmem_cache             
212296 196761  92%    0.57K   7582       28    121312K radix_tree_node        
158464 158464 100%    0.02K    619      256      2476K kmalloc-16             
149925 149925 100%    1.25K   5997       25    191904K UDPv6                  
 71424  71140  99%    0.12K   2232       32      8928K kmalloc-128            
 70020  70020 100%    0.16K   1376       51     11008K kvm_mmu_page_header    
 40032  40009  99%    0.25K   1251       32     10008K kmalloc-256            
 34944  33823  96%    0.09K    832       42      3328K kmalloc-rcl-96         
 34816  32567  93%    0.06K    544       64      2176K kmalloc-rcl-64         
root@vn03:~# pct list
root@vn03:~# qm list
      VMID NAME                 STATUS     MEM(MB)    BOOTDISK(GB) PID       
       100 pbx                  running    2048              16.00 2994      
       101 backup               running    4096              32.00 3068      
       102 ns1                  running    2048              32.00 2927      
       103 puppet               running    10240             16.00 3183      
      2005 debbuild             running    32768             40.00 3254      
      2017 go-test-srv01        running    8192              20.00 3349      
      3002 monitor01            running    4096              32.00 3399      
      5001 salsa-runner-01      stopped    16384             32.00 0         
      6001 deduktiva-runner-01  stopped    32768             32.00 0         
      6901 mac                  stopped    4096               0.25 0         
root@vn03:~# sysctl -a | grep hugepages
vm.nr_hugepages = 0
vm.nr_hugepages_mempolicy = 0
vm.nr_overcommit_hugepages = 0


*** After shutdown of all VMs: ***

top - 10:39:56 up 22 days, 22:44,  2 users,  load average: 0.83, 1.84, 1.88
Tasks: 491 total,   1 running, 490 sleeping,   0 stopped,   0 zombie
%Cpu(s):  0.1 us,  0.0 sy,  0.0 ni, 99.9 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
MiB Mem :  80413.1 total,  18276.4 free,  52704.9 used,   9431.8 buff/cache
MiB Swap:  20480.0 total,  19393.6 free,   1086.4 used.  26801.1 avail Mem 

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND                                                                                                                                                             
   2417 root      rt   0  606908 211332  51444 S   1.0   0.3 613:46.50 /usr/sbin/corosync -f                                                                                                                                               
   2878 www-data  20   0  344800 133424  21784 S   0.0   0.2   0:36.09 pveproxy                                                                                                                                                           +
 883317 www-data  20   0  361776 133084  11056 S   0.0   0.2   0:01.04 pveproxy worker                                                                                                                                                    +
   2836 root      20   0  343228 132060  21764 S   0.0   0.2   0:38.88 pvedaemon                                                                                                                                                          +
 883319 www-data  20   0  360688 130992  11148 S   1.0   0.2   0:01.26 pveproxy worker                                                                                                                                                    +
 883318 www-data  20   0  358056 128864  11148 S   0.0   0.2   0:01.75 pveproxy worker                                                                                                                                                    +
 883166 root      20   0  351912 121884  10220 S   0.0   0.1   0:00.96 pvedaemon worker                                                                                                                                                   +
 883165 root      20   0  351848 121584   9952 S   0.0   0.1   0:00.40 pvedaemon worker                                                                                                                                                   +
 883164 root      20   0  351712 121560  10060 S   0.0   0.1   0:00.65 pvedaemon worker                                                                                                                                                   +
   2801 root      20   0  307252  92952  20996 S   0.0   0.1 323:07.31 pvestatd                                                                                                                                                           +
2023020 root      20   0  267408  90508  89344 S   0.0   0.1  15:48.85 /lib/systemd/systemd-journald                                                                                                                                       
   2899 www-data  20   0  121260  59804  12212 S   0.0   0.1   0:34.77 spiceproxy                                                                                                                                                         +
 883544 www-data  20   0  121500  51260   3448 S   0.0   0.1   0:00.05 spiceproxy worker                                                                                                                                                  +
 876236 root      20   0  524564  50188  37612 S   0.0   0.1   0:01.90 /usr/bin/pmxcfs                                                                                                                                                     
3771741 root      20   0  150776  30880   3264 S   0.0   0.0   0:12.86 /opt/puppetlabs/puppet/bin/ruby /opt/puppetlabs/puppet/bin/puppet agent --no-daemonize                                                                              
   2799 root      20   0  316112  28352   5840 S   0.0   0.0  95:51.91 pve-firewall                                                                                                                                                       +
   2909 root      20   0  325212  14196   5404 S   0.0   0.0   7:04.14 pve-ha-lrm                                                                                                                                                         +
   2876 root      20   0  325564   9600   5224 S   0.0   0.0   4:18.33 pve-ha-crm                                                                                                                                                         +
 868033 ch        20   0   21660   8844   7020 S   0.0   0.0   0:00.14 /lib/systemd/systemd --user                                                                                                                                         

root@vn03:~# free -m
              total        used        free      shared  buff/cache   available
Mem:          80413       52700       18281         115        9431       26805
Swap:         20479        1086       19393
root@vn03:~# slabtop -o | head -50
 Active / Total Objects (% used)    : 199865696 / 200976971 (99.4%)
 Active / Total Slabs (% used)      : 4771440 / 4771440 (100.0%)
 Active / Total Caches (% used)     : 114 / 161 (70.8%)
 Active / Total Size (% used)       : 59688763.91K / 59945034.02K (99.6%)
 Minimum / Average / Maximum Object : 0.01K / 0.30K / 16.62K

  OBJS ACTIVE  USE OBJ SIZE  SLABS OBJ/SLAB CACHE SIZE NAME                   
43540380 43499279  99%    0.20K 1116420       39   8931360K vm_area_struct         
26459776 26457217  99%    0.06K 413434       64   1653736K anon_vma_chain         
16782720 16429406  97%    0.25K 524460       32   4195680K filp                   
13075712 13074728  99%    0.03K 102154      128    408616K kmalloc-32             
10104728 10103625  99%    0.09K 219668       46    878672K anon_vma               
9599628 9599628 100%    0.04K  94114      102    376456K pde_opener             
7442106 7442024  99%    0.19K 177193       42   1417544K cred_jar               
7211280 7207550  99%    0.13K 240376       30    961504K kernfs_node_cache      
5999322 5970370  99%    0.19K 142841       42   1142728K dentry                 
5691447 5691447 100%    0.08K 111597       51    446388K task_delay_info        
5052594 5052594 100%    0.69K 109839       46   3514848K files_cache            
4657408 4657315  99%    0.12K 145544       32    582176K pid                    
4590750 4590721  99%    1.06K 153025       30   4896800K mm_struct              
4206400 4202839  99%    0.58K  76480       55   2447360K inode_cache            
4091424 4091235  99%    0.62K  80224       51   2567168K sock_inode_cache       
3903104 3901440  99%    0.06K  60986       64    243944K kmalloc-64             
3855600 3855530  99%    1.06K 128520       30   4112640K signal_cache           
3416133 3410170  99%    0.65K  69717       49   2230944K proc_inode_cache       
3124224 3123017  99%    0.01K   6102      512     24408K kmalloc-8              
2982840 2982826  99%    0.19K  71020       42    568160K kmalloc-192            
2425760 2424977  99%    1.00K  75805       32   2425760K kmalloc-1k             
1940694 1932266  99%    0.09K  46207       42    184828K kmalloc-96             
1649415 1649346  99%    2.06K 109961       15   3518752K sighand_cache          
1279520 1279520 100%    1.00K  39985       32   1279520K UNIX                   
1043392 1040142  99%    0.50K  32606       32    521696K kmalloc-512            
1021152 1020672  99%    0.25K  31911       32    255288K skbuff_head_cache      
938880 938777  99%    4.00K 117360        8   3755520K kmalloc-4k             
797715 784886  98%    5.75K 159543        5   5105376K task_struct            
713388 699031  97%    0.10K  18292       39     73168K buffer_head            
643008  73139  11%    0.06K  10047       64     40188K dmaengine-unmap-2      
525520 525326  99%    2.00K  32845       16   1051040K kmalloc-2k             
432768 426806  98%    0.06K   6762       64     27048K kmem_cache_node        
308100 298326  96%    1.05K  10270       30    328640K ext4_inode_cache       
292387 289915  99%    0.68K   6221       47    199072K shmem_inode_cache      
215250 214971  99%    0.38K   5125       42     82000K kmem_cache             
212380 180327  84%    0.57K   7585       28    121360K radix_tree_node        
157952 157952 100%    0.02K    617      256      2468K kmalloc-16             
150150 150150 100%    1.25K   6006       25    192192K UDPv6                  
 71008  70660  99%    0.12K   2219       32      8876K kmalloc-128            
 40064  40056  99%    0.25K   1252       32     10016K kmalloc-256            
 34986  34259  97%    0.09K    833       42      3332K kmalloc-rcl-96         
 34368  32733  95%    0.06K    537       64      2148K kmalloc-rcl-64         
 33660  33300  98%    0.05K    396       85      1584K ftrace_event_field     
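
The two slabtop listings are hard to compare by eye, so here is a
rough sketch for diffing them. The function names are hypothetical,
and the parser only assumes the column layout shown above. Caches
that appear in just one listing are skipped, because "head -50" cuts
the output off and a missing row does not mean the cache is empty:

```python
import re

# One data row of `slabtop -o`:
# OBJS ACTIVE USE OBJ-SIZE SLABS OBJ/SLAB CACHE-SIZE NAME
ROW = re.compile(
    r"^\s*(\d+)\s+(\d+)\s+(\d+)%\s+([\d.]+)K"
    r"\s+(\d+)\s+(\d+)\s+(\d+)K\s+(\S+)\s*$"
)


def parse_slabtop(text):
    """Return {cache name: cache size in K} for every data row in text."""
    sizes = {}
    for line in text.splitlines():
        match = ROW.match(line)
        if match:  # summary and header lines do not match and are ignored
            sizes[match.group(8)] = int(match.group(7))
    return sizes


def diff_slabtop(before_text, after_text):
    """List (name, before_K, after_K, delta_K), largest change first."""
    before = parse_slabtop(before_text)
    after = parse_slabtop(after_text)
    # Only caches present in both snapshots can be compared safely.
    common = before.keys() & after.keys()
    rows = [
        (name, before[name], after[name], after[name] - before[name])
        for name in common
    ]
    rows.sort(key=lambda row: (-abs(row[3]), row[0]))
    return rows
```

Fed with the two listings from this mail, the biggest change is
dmaengine-unmap-2 (721520K before, 40188K after), while the large
caches such as vm_area_struct barely move (8940224K before, 8931360K
after).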



typical VM config:

balloon: 0
bootdisk: virtio0
cores: 2
cpu: Haswell-noTSX
ide2: none,media=cdrom
memory: 4096
name: backup
net0: virtio=52:54:00:b7:e0:ba,bridge=vmbr100
numa: 0
onboot: 1
ostype: l26
scsihw: virtio-scsi-pci
serial0: socket
smbios1: uuid=39d362a5-6bae-41b7-9803-b76279e2280f
sockets: 1
virtio0: datastore:vm-101-disk-1,cache=writeback,size=32G
virtio1: datastore:vm-101-disk-2,cache=writeback,size=100G


-- 
Chris Hofstaedtler / Deduktiva GmbH (FN 418592 b, HG Wien)
www.deduktiva.com / +43 1 353 1707


