[PVE-User] Kernel Memory Leak on PVE6?
Aaron Lauterer
a.lauterer at proxmox.com
Fri Sep 20 14:58:38 CEST 2019
Curious. I have a very similar case at the moment: slab usage of
~155GB out of ~190GB of installed RAM.
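
To put a slab figure like that next to total RAM, a minimal Python sketch could look like the following. The helper names parse_meminfo and slab_summary are made up for illustration, and the sketch assumes the usual Linux /proc/meminfo fields MemTotal, Slab and SUnreclaim, reported in kB.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field_name: integer value}.

    Most fields are reported in kB; counters such as HugePages_Total
    carry no unit and are returned as plain integers.
    """
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # not a "Name: value" line
        parts = rest.split()
        if parts and parts[0].isdigit():
            values[name.strip()] = int(parts[0])
    return values


def slab_summary(meminfo):
    """Summarise slab usage relative to total RAM (hypothetical helper)."""
    total_kib = meminfo["MemTotal"]
    slab_kib = meminfo["Slab"]
    return {
        "slab_gib": slab_kib / 1024**2,
        "slab_pct": 100.0 * slab_kib / total_kib,
        # SUnreclaim is the part the kernel cannot drop under memory pressure.
        "unreclaimable_gib": meminfo.get("SUnreclaim", 0) / 1024**2,
    }
```

Calling slab_summary(parse_meminfo(open("/proc/meminfo").read())) on the affected host gives the slab size in GiB and as a share of MemTotal. A large unreclaimable part would hint at a leak rather than at ordinary caching.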
I am not sure yet what causes it, but the things I plan to investigate are:
* a hanging NFS mount
* a (PVE) service possibly starting too many threads -> restarting each
service in turn and checking the memory / slab usage afterwards.
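
For the restart-and-compare step, a rough Python sketch that diffs two `slabtop -o` snapshots per cache might help. The names parse_slabtop and slab_growth are hypothetical, and the parser assumes the eight-column row layout seen in the slabtop output quoted below (cache size in column 7 with a K suffix, cache name in column 8).

```python
def parse_slabtop(text):
    """Return {cache_name: cache_size_kib} from `slabtop -o` output.

    Summary and header lines are skipped because they do not consist of
    exactly eight fields starting with an object count.
    """
    sizes = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 8 or not fields[0].isdigit():
            continue
        cache_size = fields[6]
        if cache_size.endswith("K") and cache_size[:-1].isdigit():
            sizes[fields[7]] = int(cache_size[:-1])
    return sizes


def slab_growth(before, after, top=10):
    """List the caches whose size changed most between two snapshots.

    Returns (name, delta_kib) pairs, largest absolute change first;
    ties are broken by cache name so the order is stable.
    """
    names = set(before) | set(after)
    deltas = [(n, after.get(n, 0) - before.get(n, 0)) for n in names]
    deltas.sort(key=lambda item: (-abs(item[1]), item[0]))
    return deltas[:top]
```

Taking one snapshot before and one after restarting a single service, then printing slab_growth(before, after), shows which caches that restart actually released. Caches that barely move are the more likely leak candidates.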
On 9/20/19 2:31 PM, Chris Hofstaedtler | Deduktiva wrote:
> Hi,
>
> I'm seeing a very interesting problem on PVE6: one of our machines
> appears to leak kernel memory over time, up to the point where only
> a reboot helps. Shutting down all KVM VMs does not release this
> memory.
>
> I'll attach some information below, captured once before shutting
> down the VMs and once after, because I just couldn't figure out what
> this memory is used for. I had to reboot the PVE host now, but I guess
> the problem will be noticeable again in a few days.
>
> This machine has the same (except CPU) hardware as the box next to
> it; however this one was freshly installed with PVE6, the other one
> is an upgrade from PVE5 and doesn't exhibit this problem. It's quite
> puzzling because I haven't seen this symptom at any of the
> customer installations.
>
> Here are some graphs showing the memory consumption over time:
> http://zeha.at/~ch/T/20190920-pve6_meminfo_0.png
> http://zeha.at/~ch/T/20190920-pve6_meminfo_1.png
>
> Looking forward to any debug help, suggestions, ...
>
> Chris
>
>
> *** Almost out of memory, before VM shutdown: ***
>
> top - 10:24:19 up 22 days, 22:29, 1 user, load average: 1.85, 1.57, 1.32
> Tasks: 530 total, 1 running, 529 sleeping, 0 stopped, 0 zombie
> %Cpu(s): 1.8 us, 0.4 sy, 0.0 ni, 97.7 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
> MiB Mem : 80413.1 total, 509.9 free, 70879.7 used, 9023.5 buff/cache
> MiB Swap: 20480.0 total, 6516.6 free, 13963.4 used. 8699.0 avail Mem
>
> PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
> 3183 root 20 0 10.6g 6.0g 2960 S 8.7 7.6 5861:52 /usr/bin/kvm -id 103 -name puppet -chardev socket,id=qmp,path=/var/run/qemu-server/103.qmp,server,nowait -mon chardev=qmp,mode=control -chardev socket,id=qmp-event+
> 3349 root 20 0 9266032 4.3g 2972 S 6.8 5.4 3834:41 /usr/bin/kvm -id 2017 -name go-test-srv01 -chardev socket,id=qmp,path=/var/run/qemu-server/2017.qmp,server,nowait -mon chardev=qmp,mode=control -chardev socket,id=+
> 3068 root 20 0 5060928 3.7g 2900 S 6.8 4.7 3110:01 /usr/bin/kvm -id 101 -name backup -chardev socket,id=qmp,path=/var/run/qemu-server/101.qmp,server,nowait -mon chardev=qmp,mode=control -chardev socket,id=qmp-event+
> 3399 root 20 0 5094772 2.3g 2944 S 50.5 2.9 10780:07 /usr/bin/kvm -id 3002 -name monitor01 -chardev socket,id=qmp,path=/var/run/qemu-server/3002.qmp,server,nowait -mon chardev=qmp,mode=control -chardev socket,id=qmp-+
> 3254 root 20 0 32.8g 1.9g 3040 S 1.0 2.4 490:39.29 /usr/bin/kvm -id 2005 -name debbuild -chardev socket,id=qmp,path=/var/run/qemu-server/2005.qmp,server,nowait -mon chardev=qmp,mode=control -chardev socket,id=qmp-e+
> 2994 root 20 0 2656268 658428 2980 S 9.7 0.8 2895:15 /usr/bin/kvm -id 100 -name pbx -chardev socket,id=qmp,path=/var/run/qemu-server/100.qmp,server,nowait -mon chardev=qmp,mode=control -chardev socket,id=qmp-event,pa+
> 2927 root 20 0 2664232 479372 2944 S 6.8 0.6 2343:43 /usr/bin/kvm -id 102 -name ns1 -chardev socket,id=qmp,path=/var/run/qemu-server/102.qmp,server,nowait -mon chardev=qmp,mode=control -chardev socket,id=qmp-event,pa+
> 2417 root rt 0 606912 211336 51444 S 1.9 0.3 613:27.87 /usr/sbin/corosync -f
> 2023020 root 20 0 246556 98020 97044 S 0.0 0.1 15:47.80 /lib/systemd/systemd-journald
> 1806 root 20 0 967944 32724 23612 S 0.0 0.0 53:49.62 /usr/bin/pmxcfs
> 2801 root 20 0 314488 32428 6464 S 0.0 0.0 322:58.23 pvestatd +
> 3771741 root 20 0 150776 31728 3700 S 0.0 0.0 0:12.81 /opt/puppetlabs/puppet/bin/ruby /opt/puppetlabs/puppet/bin/puppet agent --no-daemonize
> 2799 root 20 0 316056 27452 5656 S 0.0 0.0 95:49.25 pve-firewall +
> 2909 root 20 0 325248 12684 5268 S 1.0 0.0 7:03.91 pve-ha-lrm +
> 868033 ch 20 0 21660 9104 7280 S 0.0 0.0 0:00.12 /lib/systemd/systemd --user
> 868009 root 20 0 16912 7988 6856 S 0.0 0.0 0:00.03 sshd: ch [priv]
> 1 root 20 0 171820 7640 5032 S 0.0 0.0 19:58.80 /lib/systemd/systemd --system --deserialize 37
> 2876 root 20 0 325544 7124 4988 S 0.0 0.0 4:18.16 pve-ha-crm +
> 1654 Debian-+ 20 0 40488 7096 2864 S 0.0 0.0 77:37.18 /usr/sbin/snmpd -Lsd -Lf /dev/null -u Debian-snmp -g Debian-snmp -I -smux mteTrigger mteTriggerConf -f -p /run/snmpd.pid
> 868045 ch 20 0 10240 5404 3996 S 0.0 0.0 0:00.11 -zsh
> 868044 ch 20 0 16912 4636 3492 S 0.0 0.0 0:00.02 sshd: ch@pts/0
> 1644 root 20 0 29608 4520 3496 S 0.0 0.0 4:59.62 /usr/bin/python3 /usr/share/unattended-upgrades/unattended-upgrade-shutdown --wait-for-signal
> 868336 root 20 0 7716 4372 3092 S 0.0 0.0 0:00.03 -bash
> 1761096 root 20 0 351564 4180 3336 S 0.0 0.0 1:12.83 pvedaemon worker +
> 1776171 root 20 0 351696 4076 3352 S 0.0 0.0 1:18.27 pvedaemon worker +
> 868370 root 20 0 11680 4016 2964 R 2.9 0.0 0:00.68 top
> 1780591 root 20 0 351696 4008 3248 S 0.0 0.0 1:11.73 pvedaemon worker +
> 1086 root 20 0 19540 3984 3720 S 0.0 0.0 3:11.21 /lib/systemd/systemd-logind
> 868335 root 20 0 10156 3788 3364 S 0.0 0.0 0:00.01 sudo -i
> 2899 www-data 20 0 121256 3412 3080 S 0.0 0.0 0:33.99 spiceproxy +
> 2000791 www-data 20 0 344932 3412 2604 S 0.0 0.0 1:16.39 pveproxy worker +
> 2000792 www-data 20 0 344932 3348 2604 S 0.0 0.0 1:07.07 pveproxy worker +
> 1251 root 20 0 225816 3296 2424 S 0.0 0.0 9:47.44 /usr/sbin/rsyslogd -n -iNONE
> 1258 message+ 20 0 9212 3268 2820 S 0.0 0.0 6:41.36 /usr/bin/dbus-daemon --system --address=systemd: --nofork --nopidfile --systemd-activation --syslog-only
>
> root@vn03:~# uname -a
> Linux vn03 5.0.21-1-pve #1 SMP PVE 5.0.21-1 (Tue, 20 Aug 2019 17:16:32 +0200) x86_64 GNU/Linux
> root@vn03:~# free -m
>               total        used        free      shared  buff/cache   available
> Mem:          80413       70877         515         101        9019        8708
> Swap:         20479       13963        6516
> root@vn03:~# dpkg -l pve\*
> Desired=Unknown/Install/Remove/Purge/Hold
> | Status=Not/Inst/Conf-files/Unpacked/halF-conf/Half-inst/trig-aWait/Trig-pend
> |/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad)
> ||/ Name Version Architecture Description
> +++-=======================-============-============-======================================================
> ii pve-cluster 6.0-5 amd64 Cluster Infrastructure for Proxmox Virtual Environment
> ii pve-container 3.0-5 all Proxmox VE Container management tool
> ii pve-docs 6.0-4 all Proxmox VE Documentation
> ii pve-edk2-firmware 2.20190614-1 all edk2 based firmware modules for virtual machines
> ii pve-firewall 4.0-7 amd64 Proxmox VE Firewall
> ii pve-firmware 3.0-2 all Binary firmware code for the pve-kernel
> ii pve-ha-manager 3.0-2 amd64 Proxmox VE HA Manager
> ii pve-i18n 2.0-2 all Internationalization support for Proxmox VE
> un pve-kernel <none> <none> (no description available)
> ii pve-kernel-5.0 6.0-7 all Latest Proxmox VE Kernel Image
> ii pve-kernel-5.0.15-1-pve 5.0.15-1 amd64 The Proxmox PVE Kernel Image
> ii pve-kernel-5.0.18-1-pve 5.0.18-3 amd64 The Proxmox PVE Kernel Image
> ii pve-kernel-5.0.21-1-pve 5.0.21-1 amd64 The Proxmox PVE Kernel Image
> ii pve-kernel-helper 6.0-7 all Function for various kernel maintenance tasks.
> un pve-kvm <none> <none> (no description available)
> ii pve-manager 6.0-6 amd64 Proxmox Virtual Environment Management Tools
> ii pve-qemu-kvm 4.0.0-5 amd64 Full virtualization on x86 hardware
> un pve-qemu-kvm-2.6.18 <none> <none> (no description available)
> ii pve-xtermjs 3.13.2-1 all HTML/JS Shell client
> root@vn03:~# slabtop -o | head -50
> Active / Total Objects (% used) : 205425461 / 212231433 (96.8%)
> Active / Total Slabs (% used) : 4949759 / 4949759 (100.0%)
> Active / Total Caches (% used) : 114 / 161 (70.8%)
> Active / Total Size (% used) : 60112896.56K / 60714678.54K (99.0%)
> Minimum / Average / Maximum Object : 0.01K / 0.29K / 16.62K
>
> OBJS ACTIVE USE OBJ SIZE SLABS OBJ/SLAB CACHE SIZE NAME
> 43583592 43542487 99% 0.20K 1117528 39 8940224K vm_area_struct
> 26520256 26518592 99% 0.06K 414379 64 1657516K anon_vma_chain
> 16788000 16434450 97% 0.25K 524625 32 4197000K filp
> 13079680 13078464 99% 0.03K 102185 128 408740K kmalloc-32
> 11544320 5261058 45% 0.06K 180380 64 721520K dmaengine-unmap-2
> 10128740 10127452 99% 0.09K 220190 46 880760K anon_vma
> 9602484 9602484 100% 0.04K 94142 102 376568K pde_opener
> 7442736 7442572 99% 0.19K 177208 42 1417664K cred_jar
> 7213200 7209695 99% 0.13K 240440 30 961760K kernfs_node_cache
> 6023850 5992341 99% 0.19K 143425 42 1147400K dentry
> 5704350 5704350 100% 0.08K 111850 51 447400K task_delay_info
> 5054066 5054066 100% 0.69K 109871 46 3515872K files_cache
> 4664512 4664481 99% 0.12K 145766 32 583064K pid
> 4591440 4591440 100% 1.06K 153048 30 4897536K mm_struct
> 4207445 4203908 99% 0.58K 76499 55 2447968K inode_cache
> 4104480 4104291 99% 0.62K 80480 51 2575360K sock_inode_cache
> 3901440 3900588 99% 0.06K 60960 64 243840K kmalloc-64
> 3856230 3856160 99% 1.06K 128541 30 4113312K signal_cache
> 3423826 3417982 99% 0.65K 69874 49 2235968K proc_inode_cache
> 3139584 3138382 99% 0.01K 6132 512 24528K kmalloc-8
> 2983344 2983255 99% 0.19K 71032 42 568256K kmalloc-192
> 2426976 2426413 99% 1.00K 75843 32 2426976K kmalloc-1k
> 1939854 1931355 99% 0.09K 46187 42 184748K kmalloc-96
> 1649895 1649895 100% 2.06K 109993 15 3519776K sighand_cache
> 1280544 1280544 100% 1.00K 40017 32 1280544K UNIX
> 1052928 1050819 99% 0.50K 32904 32 526464K kmalloc-512
> 1029792 1029312 99% 0.25K 32181 32 257448K skbuff_head_cache
> 940624 940559 99% 4.00K 117578 8 3762496K kmalloc-4k
> 799895 787069 98% 5.75K 159979 5 5119328K task_struct
> 735696 724643 98% 0.10K 18864 39 75456K buffer_head
> 525504 525378 99% 2.00K 32844 16 1051008K kmalloc-2k
> 433024 426780 98% 0.06K 6766 64 27064K kmem_cache_node
> 310710 301758 97% 1.05K 10357 30 331424K ext4_inode_cache
> 292340 290078 99% 0.68K 6220 47 199040K shmem_inode_cache
> 215250 214814 99% 0.38K 5125 42 82000K kmem_cache
> 212296 196761 92% 0.57K 7582 28 121312K radix_tree_node
> 158464 158464 100% 0.02K 619 256 2476K kmalloc-16
> 149925 149925 100% 1.25K 5997 25 191904K UDPv6
> 71424 71140 99% 0.12K 2232 32 8928K kmalloc-128
> 70020 70020 100% 0.16K 1376 51 11008K kvm_mmu_page_header
> 40032 40009 99% 0.25K 1251 32 10008K kmalloc-256
> 34944 33823 96% 0.09K 832 42 3328K kmalloc-rcl-96
> 34816 32567 93% 0.06K 544 64 2176K kmalloc-rcl-64
> root@vn03:~# pct list
> root@vn03:~# qm list
> VMID NAME                 STATUS  MEM(MB) BOOTDISK(GB)   PID
>  100 pbx                  running    2048        16.00  2994
>  101 backup               running    4096        32.00  3068
>  102 ns1                  running    2048        32.00  2927
>  103 puppet               running   10240        16.00  3183
> 2005 debbuild             running   32768        40.00  3254
> 2017 go-test-srv01        running    8192        20.00  3349
> 3002 monitor01            running    4096        32.00  3399
> 5001 salsa-runner-01      stopped   16384        32.00     0
> 6001 deduktiva-runner-01  stopped   32768        32.00     0
> 6901 mac                  stopped    4096         0.25     0
> root@vn03:~# sysctl -a | grep hugepages
> vm.nr_hugepages = 0
> vm.nr_hugepages_mempolicy = 0
> vm.nr_overcommit_hugepages = 0
>
>
> *** After shutdown of all VMs: ***
>
> top - 10:39:56 up 22 days, 22:44, 2 users, load average: 0.83, 1.84, 1.88
> Tasks: 491 total, 1 running, 490 sleeping, 0 stopped, 0 zombie
> %Cpu(s): 0.1 us, 0.0 sy, 0.0 ni, 99.9 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
> MiB Mem : 80413.1 total, 18276.4 free, 52704.9 used, 9431.8 buff/cache
> MiB Swap: 20480.0 total, 19393.6 free, 1086.4 used. 26801.1 avail Mem
>
> PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
> 2417 root rt 0 606908 211332 51444 S 1.0 0.3 613:46.50 /usr/sbin/corosync -f
> 2878 www-data 20 0 344800 133424 21784 S 0.0 0.2 0:36.09 pveproxy +
> 883317 www-data 20 0 361776 133084 11056 S 0.0 0.2 0:01.04 pveproxy worker +
> 2836 root 20 0 343228 132060 21764 S 0.0 0.2 0:38.88 pvedaemon +
> 883319 www-data 20 0 360688 130992 11148 S 1.0 0.2 0:01.26 pveproxy worker +
> 883318 www-data 20 0 358056 128864 11148 S 0.0 0.2 0:01.75 pveproxy worker +
> 883166 root 20 0 351912 121884 10220 S 0.0 0.1 0:00.96 pvedaemon worker +
> 883165 root 20 0 351848 121584 9952 S 0.0 0.1 0:00.40 pvedaemon worker +
> 883164 root 20 0 351712 121560 10060 S 0.0 0.1 0:00.65 pvedaemon worker +
> 2801 root 20 0 307252 92952 20996 S 0.0 0.1 323:07.31 pvestatd +
> 2023020 root 20 0 267408 90508 89344 S 0.0 0.1 15:48.85 /lib/systemd/systemd-journald
> 2899 www-data 20 0 121260 59804 12212 S 0.0 0.1 0:34.77 spiceproxy +
> 883544 www-data 20 0 121500 51260 3448 S 0.0 0.1 0:00.05 spiceproxy worker +
> 876236 root 20 0 524564 50188 37612 S 0.0 0.1 0:01.90 /usr/bin/pmxcfs
> 3771741 root 20 0 150776 30880 3264 S 0.0 0.0 0:12.86 /opt/puppetlabs/puppet/bin/ruby /opt/puppetlabs/puppet/bin/puppet agent --no-daemonize
> 2799 root 20 0 316112 28352 5840 S 0.0 0.0 95:51.91 pve-firewall +
> 2909 root 20 0 325212 14196 5404 S 0.0 0.0 7:04.14 pve-ha-lrm +
> 2876 root 20 0 325564 9600 5224 S 0.0 0.0 4:18.33 pve-ha-crm +
> 868033 ch 20 0 21660 8844 7020 S 0.0 0.0 0:00.14 /lib/systemd/systemd --user
>
> root@vn03:~# free -m
>               total        used        free      shared  buff/cache   available
> Mem:          80413       52700       18281         115        9431       26805
> Swap:         20479        1086       19393
> root@vn03:~# slabtop -o | head -50
> Active / Total Objects (% used) : 199865696 / 200976971 (99.4%)
> Active / Total Slabs (% used) : 4771440 / 4771440 (100.0%)
> Active / Total Caches (% used) : 114 / 161 (70.8%)
> Active / Total Size (% used) : 59688763.91K / 59945034.02K (99.6%)
> Minimum / Average / Maximum Object : 0.01K / 0.30K / 16.62K
>
> OBJS ACTIVE USE OBJ SIZE SLABS OBJ/SLAB CACHE SIZE NAME
> 43540380 43499279 99% 0.20K 1116420 39 8931360K vm_area_struct
> 26459776 26457217 99% 0.06K 413434 64 1653736K anon_vma_chain
> 16782720 16429406 97% 0.25K 524460 32 4195680K filp
> 13075712 13074728 99% 0.03K 102154 128 408616K kmalloc-32
> 10104728 10103625 99% 0.09K 219668 46 878672K anon_vma
> 9599628 9599628 100% 0.04K 94114 102 376456K pde_opener
> 7442106 7442024 99% 0.19K 177193 42 1417544K cred_jar
> 7211280 7207550 99% 0.13K 240376 30 961504K kernfs_node_cache
> 5999322 5970370 99% 0.19K 142841 42 1142728K dentry
> 5691447 5691447 100% 0.08K 111597 51 446388K task_delay_info
> 5052594 5052594 100% 0.69K 109839 46 3514848K files_cache
> 4657408 4657315 99% 0.12K 145544 32 582176K pid
> 4590750 4590721 99% 1.06K 153025 30 4896800K mm_struct
> 4206400 4202839 99% 0.58K 76480 55 2447360K inode_cache
> 4091424 4091235 99% 0.62K 80224 51 2567168K sock_inode_cache
> 3903104 3901440 99% 0.06K 60986 64 243944K kmalloc-64
> 3855600 3855530 99% 1.06K 128520 30 4112640K signal_cache
> 3416133 3410170 99% 0.65K 69717 49 2230944K proc_inode_cache
> 3124224 3123017 99% 0.01K 6102 512 24408K kmalloc-8
> 2982840 2982826 99% 0.19K 71020 42 568160K kmalloc-192
> 2425760 2424977 99% 1.00K 75805 32 2425760K kmalloc-1k
> 1940694 1932266 99% 0.09K 46207 42 184828K kmalloc-96
> 1649415 1649346 99% 2.06K 109961 15 3518752K sighand_cache
> 1279520 1279520 100% 1.00K 39985 32 1279520K UNIX
> 1043392 1040142 99% 0.50K 32606 32 521696K kmalloc-512
> 1021152 1020672 99% 0.25K 31911 32 255288K skbuff_head_cache
> 938880 938777 99% 4.00K 117360 8 3755520K kmalloc-4k
> 797715 784886 98% 5.75K 159543 5 5105376K task_struct
> 713388 699031 97% 0.10K 18292 39 73168K buffer_head
> 643008 73139 11% 0.06K 10047 64 40188K dmaengine-unmap-2
> 525520 525326 99% 2.00K 32845 16 1051040K kmalloc-2k
> 432768 426806 98% 0.06K 6762 64 27048K kmem_cache_node
> 308100 298326 96% 1.05K 10270 30 328640K ext4_inode_cache
> 292387 289915 99% 0.68K 6221 47 199072K shmem_inode_cache
> 215250 214971 99% 0.38K 5125 42 82000K kmem_cache
> 212380 180327 84% 0.57K 7585 28 121360K radix_tree_node
> 157952 157952 100% 0.02K 617 256 2468K kmalloc-16
> 150150 150150 100% 1.25K 6006 25 192192K UDPv6
> 71008 70660 99% 0.12K 2219 32 8876K kmalloc-128
> 40064 40056 99% 0.25K 1252 32 10016K kmalloc-256
> 34986 34259 97% 0.09K 833 42 3332K kmalloc-rcl-96
> 34368 32733 95% 0.06K 537 64 2148K kmalloc-rcl-64
> 33660 33300 98% 0.05K 396 85 1584K ftrace_event_field
>
>
>
> typical VM config:
>
> balloon: 0
> bootdisk: virtio0
> cores: 2
> cpu: Haswell-noTSX
> ide2: none,media=cdrom
> memory: 4096
> name: backup
> net0: virtio=52:54:00:b7:e0:ba,bridge=vmbr100
> numa: 0
> onboot: 1
> ostype: l26
> scsihw: virtio-scsi-pci
> serial0: socket
> smbios1: uuid=39d362a5-6bae-41b7-9803-b76279e2280f
> sockets: 1
> virtio0: datastore:vm-101-disk-1,cache=writeback,size=32G
> virtio1: datastore:vm-101-disk-2,cache=writeback,size=100G
>
>