[PVE-User] Kernel Memory Leak on PVE6?

Aaron Lauterer a.lauterer at proxmox.com
Fri Sep 20 14:58:38 CEST 2019


Curious: I have a very similar case at the moment, with slab usage of
~155GB out of ~190GB of installed RAM.

I am not sure yet what causes it, but the things I plan to investigate are:

* a hanging NFS mount
* a (PVE) service possibly starting too many threads -> restarting each
service in turn and checking the memory / slab usage afterwards.
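
For the second point, one rough way to compare slab usage before and
after a service restart is to read the Slab, SReclaimable and SUnreclaim
lines from /proc/meminfo. A minimal Python sketch of that idea; the
function names are made up for illustration and are not part of any PVE
tooling:

```python
def slab_usage_kib(meminfo_text):
    """Extract the slab counters (in KiB) from the text of /proc/meminfo."""
    wanted = ("Slab", "SReclaimable", "SUnreclaim")
    usage = {}
    for line in meminfo_text.splitlines():
        # Lines look like "Slab:            60714676 kB"
        key, _, rest = line.partition(":")
        if key in wanted:
            usage[key] = int(rest.split()[0])
    return usage


def read_slab_usage(path="/proc/meminfo"):
    """Take one reading from the running kernel."""
    with open(path) as f:
        return slab_usage_kib(f.read())


def slab_delta(before, after):
    """Per-counter difference in KiB; negative means memory was released."""
    return {key: after[key] - before[key] for key in before if key in after}
```

The intended use would be: take a reading with read_slab_usage(),
restart a single service (for example pvestatd), wait a little, take a
second reading and pass both to slab_delta(). A clearly negative
SUnreclaim delta would hint at that service, though this is only a
heuristic.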



On 9/20/19 2:31 PM, Chris Hofstaedtler | Deduktiva wrote:
> Hi,
> 
> I'm seeing a very interesting problem on PVE6: one of our machines
> appears to leak kernel memory over time, up to the point where only
> a reboot helps. Shutting down all KVM VMs does not release this
> memory.
> 
> I'll attach some information below, because I just couldn't figure
> out what this memory is used for: once from before shutting down the
> VMs, and once from after. I had to reboot the PVE host now, but I guess
> in a few days it will be at least noticeable again.
> 
> This machine has the same (except CPU) hardware as the box next to
> it; however this one was freshly installed with PVE6, the other one
> is an upgrade from PVE5 and doesn't exhibit this problem. It's quite
> puzzling because I haven't seen this symptom at any of the
> customer installations.
> 
> Here are some graphs showing the memory consumption over time:
>    http://zeha.at/~ch/T/20190920-pve6_meminfo_0.png
>    http://zeha.at/~ch/T/20190920-pve6_meminfo_1.png
> 
> Looking forward to any debug help, suggestions, ...
> 
> Chris
> 
> 
> ** Almost out of memory, before VM shutdown: **
> 
> top - 10:24:19 up 22 days, 22:29,  1 user,  load average: 1.85, 1.57, 1.32
> Tasks: 530 total,   1 running, 529 sleeping,   0 stopped,   0 zombie
> %Cpu(s):  1.8 us,  0.4 sy,  0.0 ni, 97.7 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
> MiB Mem :  80413.1 total,    509.9 free,  70879.7 used,   9023.5 buff/cache
> MiB Swap:  20480.0 total,   6516.6 free,  13963.4 used.   8699.0 avail Mem
> 
>      PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
>     3183 root      20   0   10.6g   6.0g   2960 S   8.7   7.6   5861:52 /usr/bin/kvm -id 103 -name puppet -chardev socket,id=qmp,path=/var/run/qemu-server/103.qmp,server,nowait -mon chardev=qmp,mode=control -chardev socket,id=qmp-event+
>     3349 root      20   0 9266032   4.3g   2972 S   6.8   5.4   3834:41 /usr/bin/kvm -id 2017 -name go-test-srv01 -chardev socket,id=qmp,path=/var/run/qemu-server/2017.qmp,server,nowait -mon chardev=qmp,mode=control -chardev socket,id=+
>     3068 root      20   0 5060928   3.7g   2900 S   6.8   4.7   3110:01 /usr/bin/kvm -id 101 -name backup -chardev socket,id=qmp,path=/var/run/qemu-server/101.qmp,server,nowait -mon chardev=qmp,mode=control -chardev socket,id=qmp-event+
>     3399 root      20   0 5094772   2.3g   2944 S  50.5   2.9  10780:07 /usr/bin/kvm -id 3002 -name monitor01 -chardev socket,id=qmp,path=/var/run/qemu-server/3002.qmp,server,nowait -mon chardev=qmp,mode=control -chardev socket,id=qmp-+
>     3254 root      20   0   32.8g   1.9g   3040 S   1.0   2.4 490:39.29 /usr/bin/kvm -id 2005 -name debbuild -chardev socket,id=qmp,path=/var/run/qemu-server/2005.qmp,server,nowait -mon chardev=qmp,mode=control -chardev socket,id=qmp-e+
>     2994 root      20   0 2656268 658428   2980 S   9.7   0.8   2895:15 /usr/bin/kvm -id 100 -name pbx -chardev socket,id=qmp,path=/var/run/qemu-server/100.qmp,server,nowait -mon chardev=qmp,mode=control -chardev socket,id=qmp-event,pa+
>     2927 root      20   0 2664232 479372   2944 S   6.8   0.6   2343:43 /usr/bin/kvm -id 102 -name ns1 -chardev socket,id=qmp,path=/var/run/qemu-server/102.qmp,server,nowait -mon chardev=qmp,mode=control -chardev socket,id=qmp-event,pa+
>     2417 root      rt   0  606912 211336  51444 S   1.9   0.3 613:27.87 /usr/sbin/corosync -f
> 2023020 root      20   0  246556  98020  97044 S   0.0   0.1  15:47.80 /lib/systemd/systemd-journald
>     1806 root      20   0  967944  32724  23612 S   0.0   0.0  53:49.62 /usr/bin/pmxcfs
>     2801 root      20   0  314488  32428   6464 S   0.0   0.0 322:58.23 pvestatd                                                                                                                                                           +
> 3771741 root      20   0  150776  31728   3700 S   0.0   0.0   0:12.81 /opt/puppetlabs/puppet/bin/ruby /opt/puppetlabs/puppet/bin/puppet agent --no-daemonize
>     2799 root      20   0  316056  27452   5656 S   0.0   0.0  95:49.25 pve-firewall                                                                                                                                                       +
>     2909 root      20   0  325248  12684   5268 S   1.0   0.0   7:03.91 pve-ha-lrm                                                                                                                                                         +
>   868033 ch        20   0   21660   9104   7280 S   0.0   0.0   0:00.12 /lib/systemd/systemd --user
>   868009 root      20   0   16912   7988   6856 S   0.0   0.0   0:00.03 sshd: ch [priv]
>        1 root      20   0  171820   7640   5032 S   0.0   0.0  19:58.80 /lib/systemd/systemd --system --deserialize 37
>     2876 root      20   0  325544   7124   4988 S   0.0   0.0   4:18.16 pve-ha-crm                                                                                                                                                         +
>     1654 Debian-+  20   0   40488   7096   2864 S   0.0   0.0  77:37.18 /usr/sbin/snmpd -Lsd -Lf /dev/null -u Debian-snmp -g Debian-snmp -I -smux mteTrigger mteTriggerConf -f -p /run/snmpd.pid
>   868045 ch        20   0   10240   5404   3996 S   0.0   0.0   0:00.11 -zsh
>   868044 ch        20   0   16912   4636   3492 S   0.0   0.0   0:00.02 sshd: ch@pts/0
>     1644 root      20   0   29608   4520   3496 S   0.0   0.0   4:59.62 /usr/bin/python3 /usr/share/unattended-upgrades/unattended-upgrade-shutdown --wait-for-signal
>   868336 root      20   0    7716   4372   3092 S   0.0   0.0   0:00.03 -bash
> 1761096 root      20   0  351564   4180   3336 S   0.0   0.0   1:12.83 pvedaemon worker                                                                                                                                                   +
> 1776171 root      20   0  351696   4076   3352 S   0.0   0.0   1:18.27 pvedaemon worker                                                                                                                                                   +
>   868370 root      20   0   11680   4016   2964 R   2.9   0.0   0:00.68 top
> 1780591 root      20   0  351696   4008   3248 S   0.0   0.0   1:11.73 pvedaemon worker                                                                                                                                                   +
>     1086 root      20   0   19540   3984   3720 S   0.0   0.0   3:11.21 /lib/systemd/systemd-logind
>   868335 root      20   0   10156   3788   3364 S   0.0   0.0   0:00.01 sudo -i
>     2899 www-data  20   0  121256   3412   3080 S   0.0   0.0   0:33.99 spiceproxy                                                                                                                                                         +
> 2000791 www-data  20   0  344932   3412   2604 S   0.0   0.0   1:16.39 pveproxy worker                                                                                                                                                    +
> 2000792 www-data  20   0  344932   3348   2604 S   0.0   0.0   1:07.07 pveproxy worker                                                                                                                                                    +
>     1251 root      20   0  225816   3296   2424 S   0.0   0.0   9:47.44 /usr/sbin/rsyslogd -n -iNONE
>     1258 message+  20   0    9212   3268   2820 S   0.0   0.0   6:41.36 /usr/bin/dbus-daemon --system --address=systemd: --nofork --nopidfile --systemd-activation --syslog-only
> 
> root@vn03:~# uname -a
> Linux vn03 5.0.21-1-pve #1 SMP PVE 5.0.21-1 (Tue, 20 Aug 2019 17:16:32 +0200) x86_64 GNU/Linux
> root@vn03:~# free -m
>                total        used        free      shared  buff/cache   available
> Mem:          80413       70877         515         101        9019        8708
> Swap:         20479       13963        6516
> root@vn03:~# dpkg -l pve\*
> Desired=Unknown/Install/Remove/Purge/Hold
> | Status=Not/Inst/Conf-files/Unpacked/halF-conf/Half-inst/trig-aWait/Trig-pend
> |/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad)
> ||/ Name                    Version      Architecture Description
> +++-=======================-============-============-======================================================
> ii  pve-cluster             6.0-5        amd64        Cluster Infrastructure for Proxmox Virtual Environment
> ii  pve-container           3.0-5        all          Proxmox VE Container management tool
> ii  pve-docs                6.0-4        all          Proxmox VE Documentation
> ii  pve-edk2-firmware       2.20190614-1 all          edk2 based firmware modules for virtual machines
> ii  pve-firewall            4.0-7        amd64        Proxmox VE Firewall
> ii  pve-firmware            3.0-2        all          Binary firmware code for the pve-kernel
> ii  pve-ha-manager          3.0-2        amd64        Proxmox VE HA Manager
> ii  pve-i18n                2.0-2        all          Internationalization support for Proxmox VE
> un  pve-kernel              <none>       <none>       (no description available)
> ii  pve-kernel-5.0          6.0-7        all          Latest Proxmox VE Kernel Image
> ii  pve-kernel-5.0.15-1-pve 5.0.15-1     amd64        The Proxmox PVE Kernel Image
> ii  pve-kernel-5.0.18-1-pve 5.0.18-3     amd64        The Proxmox PVE Kernel Image
> ii  pve-kernel-5.0.21-1-pve 5.0.21-1     amd64        The Proxmox PVE Kernel Image
> ii  pve-kernel-helper       6.0-7        all          Function for various kernel maintenance tasks.
> un  pve-kvm                 <none>       <none>       (no description available)
> ii  pve-manager             6.0-6        amd64        Proxmox Virtual Environment Management Tools
> ii  pve-qemu-kvm            4.0.0-5      amd64        Full virtualization on x86 hardware
> un  pve-qemu-kvm-2.6.18     <none>       <none>       (no description available)
> ii  pve-xtermjs             3.13.2-1     all          HTML/JS Shell client
> root@vn03:~# slabtop -o | head -50
>   Active / Total Objects (% used)    : 205425461 / 212231433 (96.8%)
>   Active / Total Slabs (% used)      : 4949759 / 4949759 (100.0%)
>   Active / Total Caches (% used)     : 114 / 161 (70.8%)
>   Active / Total Size (% used)       : 60112896.56K / 60714678.54K (99.0%)
>   Minimum / Average / Maximum Object : 0.01K / 0.29K / 16.62K
> 
>    OBJS ACTIVE  USE OBJ SIZE  SLABS OBJ/SLAB CACHE SIZE NAME
> 43583592 43542487  99%    0.20K 1117528       39   8940224K vm_area_struct
> 26520256 26518592  99%    0.06K 414379       64   1657516K anon_vma_chain
> 16788000 16434450  97%    0.25K 524625       32   4197000K filp
> 13079680 13078464  99%    0.03K 102185      128    408740K kmalloc-32
> 11544320 5261058  45%    0.06K 180380       64    721520K dmaengine-unmap-2
> 10128740 10127452  99%    0.09K 220190       46    880760K anon_vma
> 9602484 9602484 100%    0.04K  94142      102    376568K pde_opener
> 7442736 7442572  99%    0.19K 177208       42   1417664K cred_jar
> 7213200 7209695  99%    0.13K 240440       30    961760K kernfs_node_cache
> 6023850 5992341  99%    0.19K 143425       42   1147400K dentry
> 5704350 5704350 100%    0.08K 111850       51    447400K task_delay_info
> 5054066 5054066 100%    0.69K 109871       46   3515872K files_cache
> 4664512 4664481  99%    0.12K 145766       32    583064K pid
> 4591440 4591440 100%    1.06K 153048       30   4897536K mm_struct
> 4207445 4203908  99%    0.58K  76499       55   2447968K inode_cache
> 4104480 4104291  99%    0.62K  80480       51   2575360K sock_inode_cache
> 3901440 3900588  99%    0.06K  60960       64    243840K kmalloc-64
> 3856230 3856160  99%    1.06K 128541       30   4113312K signal_cache
> 3423826 3417982  99%    0.65K  69874       49   2235968K proc_inode_cache
> 3139584 3138382  99%    0.01K   6132      512     24528K kmalloc-8
> 2983344 2983255  99%    0.19K  71032       42    568256K kmalloc-192
> 2426976 2426413  99%    1.00K  75843       32   2426976K kmalloc-1k
> 1939854 1931355  99%    0.09K  46187       42    184748K kmalloc-96
> 1649895 1649895 100%    2.06K 109993       15   3519776K sighand_cache
> 1280544 1280544 100%    1.00K  40017       32   1280544K UNIX
> 1052928 1050819  99%    0.50K  32904       32    526464K kmalloc-512
> 1029792 1029312  99%    0.25K  32181       32    257448K skbuff_head_cache
> 940624 940559  99%    4.00K 117578        8   3762496K kmalloc-4k
> 799895 787069  98%    5.75K 159979        5   5119328K task_struct
> 735696 724643  98%    0.10K  18864       39     75456K buffer_head
> 525504 525378  99%    2.00K  32844       16   1051008K kmalloc-2k
> 433024 426780  98%    0.06K   6766       64     27064K kmem_cache_node
> 310710 301758  97%    1.05K  10357       30    331424K ext4_inode_cache
> 292340 290078  99%    0.68K   6220       47    199040K shmem_inode_cache
> 215250 214814  99%    0.38K   5125       42     82000K kmem_cache
> 212296 196761  92%    0.57K   7582       28    121312K radix_tree_node
> 158464 158464 100%    0.02K    619      256      2476K kmalloc-16
> 149925 149925 100%    1.25K   5997       25    191904K UDPv6
>   71424  71140  99%    0.12K   2232       32      8928K kmalloc-128
>   70020  70020 100%    0.16K   1376       51     11008K kvm_mmu_page_header
>   40032  40009  99%    0.25K   1251       32     10008K kmalloc-256
>   34944  33823  96%    0.09K    832       42      3328K kmalloc-rcl-96
>   34816  32567  93%    0.06K    544       64      2176K kmalloc-rcl-64
> root@vn03:~# pct list
> root@vn03:~# qm list
>        VMID NAME                 STATUS     MEM(MB)    BOOTDISK(GB) PID
>         100 pbx                  running    2048              16.00 2994
>         101 backup               running    4096              32.00 3068
>         102 ns1                  running    2048              32.00 2927
>         103 puppet               running    10240             16.00 3183
>        2005 debbuild             running    32768             40.00 3254
>        2017 go-test-srv01        running    8192              20.00 3349
>        3002 monitor01            running    4096              32.00 3399
>        5001 salsa-runner-01      stopped    16384             32.00 0
>        6001 deduktiva-runner-01  stopped    32768             32.00 0
>        6901 mac                  stopped    4096               0.25 0
> root@vn03:~# sysctl -a | grep hugepages
> vm.nr_hugepages = 0
> vm.nr_hugepages_mempolicy = 0
> vm.nr_overcommit_hugepages = 0
> 
> 
> *** After shutdown of all VMs: ***
> 
> top - 10:39:56 up 22 days, 22:44,  2 users,  load average: 0.83, 1.84, 1.88
> Tasks: 491 total,   1 running, 490 sleeping,   0 stopped,   0 zombie
> %Cpu(s):  0.1 us,  0.0 sy,  0.0 ni, 99.9 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
> MiB Mem :  80413.1 total,  18276.4 free,  52704.9 used,   9431.8 buff/cache
> MiB Swap:  20480.0 total,  19393.6 free,   1086.4 used.  26801.1 avail Mem
> 
>      PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
>     2417 root      rt   0  606908 211332  51444 S   1.0   0.3 613:46.50 /usr/sbin/corosync -f
>     2878 www-data  20   0  344800 133424  21784 S   0.0   0.2   0:36.09 pveproxy                                                                                                                                                           +
>   883317 www-data  20   0  361776 133084  11056 S   0.0   0.2   0:01.04 pveproxy worker                                                                                                                                                    +
>     2836 root      20   0  343228 132060  21764 S   0.0   0.2   0:38.88 pvedaemon                                                                                                                                                          +
>   883319 www-data  20   0  360688 130992  11148 S   1.0   0.2   0:01.26 pveproxy worker                                                                                                                                                    +
>   883318 www-data  20   0  358056 128864  11148 S   0.0   0.2   0:01.75 pveproxy worker                                                                                                                                                    +
>   883166 root      20   0  351912 121884  10220 S   0.0   0.1   0:00.96 pvedaemon worker                                                                                                                                                   +
>   883165 root      20   0  351848 121584   9952 S   0.0   0.1   0:00.40 pvedaemon worker                                                                                                                                                   +
>   883164 root      20   0  351712 121560  10060 S   0.0   0.1   0:00.65 pvedaemon worker                                                                                                                                                   +
>     2801 root      20   0  307252  92952  20996 S   0.0   0.1 323:07.31 pvestatd                                                                                                                                                           +
> 2023020 root      20   0  267408  90508  89344 S   0.0   0.1  15:48.85 /lib/systemd/systemd-journald
>     2899 www-data  20   0  121260  59804  12212 S   0.0   0.1   0:34.77 spiceproxy                                                                                                                                                         +
>   883544 www-data  20   0  121500  51260   3448 S   0.0   0.1   0:00.05 spiceproxy worker                                                                                                                                                  +
>   876236 root      20   0  524564  50188  37612 S   0.0   0.1   0:01.90 /usr/bin/pmxcfs
> 3771741 root      20   0  150776  30880   3264 S   0.0   0.0   0:12.86 /opt/puppetlabs/puppet/bin/ruby /opt/puppetlabs/puppet/bin/puppet agent --no-daemonize
>     2799 root      20   0  316112  28352   5840 S   0.0   0.0  95:51.91 pve-firewall                                                                                                                                                       +
>     2909 root      20   0  325212  14196   5404 S   0.0   0.0   7:04.14 pve-ha-lrm                                                                                                                                                         +
>     2876 root      20   0  325564   9600   5224 S   0.0   0.0   4:18.33 pve-ha-crm                                                                                                                                                         +
>   868033 ch        20   0   21660   8844   7020 S   0.0   0.0   0:00.14 /lib/systemd/systemd --user
> 
> root@vn03:~# free -m
>                total        used        free      shared  buff/cache   available
> Mem:          80413       52700       18281         115        9431       26805
> Swap:         20479        1086       19393
> root@vn03:~# slabtop -o | head -50
>   Active / Total Objects (% used)    : 199865696 / 200976971 (99.4%)
>   Active / Total Slabs (% used)      : 4771440 / 4771440 (100.0%)
>   Active / Total Caches (% used)     : 114 / 161 (70.8%)
>   Active / Total Size (% used)       : 59688763.91K / 59945034.02K (99.6%)
>   Minimum / Average / Maximum Object : 0.01K / 0.30K / 16.62K
> 
>    OBJS ACTIVE  USE OBJ SIZE  SLABS OBJ/SLAB CACHE SIZE NAME
> 43540380 43499279  99%    0.20K 1116420       39   8931360K vm_area_struct
> 26459776 26457217  99%    0.06K 413434       64   1653736K anon_vma_chain
> 16782720 16429406  97%    0.25K 524460       32   4195680K filp
> 13075712 13074728  99%    0.03K 102154      128    408616K kmalloc-32
> 10104728 10103625  99%    0.09K 219668       46    878672K anon_vma
> 9599628 9599628 100%    0.04K  94114      102    376456K pde_opener
> 7442106 7442024  99%    0.19K 177193       42   1417544K cred_jar
> 7211280 7207550  99%    0.13K 240376       30    961504K kernfs_node_cache
> 5999322 5970370  99%    0.19K 142841       42   1142728K dentry
> 5691447 5691447 100%    0.08K 111597       51    446388K task_delay_info
> 5052594 5052594 100%    0.69K 109839       46   3514848K files_cache
> 4657408 4657315  99%    0.12K 145544       32    582176K pid
> 4590750 4590721  99%    1.06K 153025       30   4896800K mm_struct
> 4206400 4202839  99%    0.58K  76480       55   2447360K inode_cache
> 4091424 4091235  99%    0.62K  80224       51   2567168K sock_inode_cache
> 3903104 3901440  99%    0.06K  60986       64    243944K kmalloc-64
> 3855600 3855530  99%    1.06K 128520       30   4112640K signal_cache
> 3416133 3410170  99%    0.65K  69717       49   2230944K proc_inode_cache
> 3124224 3123017  99%    0.01K   6102      512     24408K kmalloc-8
> 2982840 2982826  99%    0.19K  71020       42    568160K kmalloc-192
> 2425760 2424977  99%    1.00K  75805       32   2425760K kmalloc-1k
> 1940694 1932266  99%    0.09K  46207       42    184828K kmalloc-96
> 1649415 1649346  99%    2.06K 109961       15   3518752K sighand_cache
> 1279520 1279520 100%    1.00K  39985       32   1279520K UNIX
> 1043392 1040142  99%    0.50K  32606       32    521696K kmalloc-512
> 1021152 1020672  99%    0.25K  31911       32    255288K skbuff_head_cache
> 938880 938777  99%    4.00K 117360        8   3755520K kmalloc-4k
> 797715 784886  98%    5.75K 159543        5   5105376K task_struct
> 713388 699031  97%    0.10K  18292       39     73168K buffer_head
> 643008  73139  11%    0.06K  10047       64     40188K dmaengine-unmap-2
> 525520 525326  99%    2.00K  32845       16   1051040K kmalloc-2k
> 432768 426806  98%    0.06K   6762       64     27048K kmem_cache_node
> 308100 298326  96%    1.05K  10270       30    328640K ext4_inode_cache
> 292387 289915  99%    0.68K   6221       47    199072K shmem_inode_cache
> 215250 214971  99%    0.38K   5125       42     82000K kmem_cache
> 212380 180327  84%    0.57K   7585       28    121360K radix_tree_node
> 157952 157952 100%    0.02K    617      256      2468K kmalloc-16
> 150150 150150 100%    1.25K   6006       25    192192K UDPv6
>   71008  70660  99%    0.12K   2219       32      8876K kmalloc-128
>   40064  40056  99%    0.25K   1252       32     10016K kmalloc-256
>   34986  34259  97%    0.09K    833       42      3332K kmalloc-rcl-96
>   34368  32733  95%    0.06K    537       64      2148K kmalloc-rcl-64
>   33660  33300  98%    0.05K    396       85      1584K ftrace_event_field
> 
> 
> 
> typical VM config:
> 
> balloon: 0
> bootdisk: virtio0
> cores: 2
> cpu: Haswell-noTSX
> ide2: none,media=cdrom
> memory: 4096
> name: backup
> net0: virtio=52:54:00:b7:e0:ba,bridge=vmbr100
> numa: 0
> onboot: 1
> ostype: l26
> scsihw: virtio-scsi-pci
> serial0: socket
> smbios1: uuid=39d362a5-6bae-41b7-9803-b76279e2280f
> sockets: 1
> virtio0: datastore:vm-101-disk-1,cache=writeback,size=32G
> virtio1: datastore:vm-101-disk-2,cache=writeback,size=100G
> 
> 
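
The two slabtop listings quoted above are easier to compare per cache
than by eye. The following Python sketch parses `slabtop -o` output
(optionally still carrying the "> " quote prefix) and ranks caches by
how much their CACHE SIZE changed. It assumes the eight-column row
layout shown in the listings; the function names are hypothetical.

```python
def parse_slabtop(text):
    """Map cache name -> CACHE SIZE in KiB from `slabtop -o` output."""
    sizes = {}
    for line in text.splitlines():
        # Drop a mail quote prefix and indentation, then split the columns:
        # OBJS ACTIVE USE OBJ_SIZE SLABS OBJ/SLAB CACHE_SIZE NAME
        fields = line.lstrip("> ").split()
        if len(fields) == 8 and fields[0].isdigit() and fields[6].endswith("K"):
            sizes[fields[7]] = int(fields[6][:-1])
    return sizes


def largest_changes(before, after, top=5):
    """Return (name, delta_kib) pairs, largest absolute change first."""
    common = before.keys() & after.keys()
    deltas = {name: after[name] - before[name] for name in common}
    return sorted(deltas.items(), key=lambda item: abs(item[1]), reverse=True)[:top]
```

Fed with the rows quoted above, dmaengine-unmap-2 appears to be the only
cache that shrinks substantially after the VM shutdown (721520K to
40188K), while the large caches such as vm_area_struct and task_struct
stay almost unchanged. The task_struct cache also holds roughly 800k
objects while top reports only about 500 tasks.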
