[pve-devel] applied: [PATCH container] fix #1042: inotify: increase watches, instances & queue default limits

Thomas Lamprecht t.lamprecht at proxmox.com
Wed Jul 17 18:43:42 CEST 2019


Some recent distributions running as LXC containers eat up the
relatively low default limits very fast. Thus increase all those
(semi-related) limits by a factor of 512. This factor was chosen by
looking at one of our bigger known CT setups (~1500 CTs per host) and
the fact that I can run only a very low count (circa 5 - 7) of
"inotify watch hungry" CTs (e.g., ones with a recent systemd > 240).

So, as 5 * 512 = 2560 is well above 1500, we can assume with
confidence that this allows most reasonable existing setups by default.
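
As a rough illustration of the arithmetic above, the following Python
sketch scales the inotify limits by 512. The base values (16384, 128
and 8192) are assumed to be the kernel's built-in defaults at the time;
they are not stated in this mail, but they are consistent with the
powers of two in the shipped config file. The names KERNEL_DEFAULTS and
scaled_limits are made up for this example.

```python
# Assumed kernel built-in defaults (not stated in this mail).
KERNEL_DEFAULTS = {
    "fs.inotify.max_queued_events": 16384,
    "fs.inotify.max_user_instances": 128,
    "fs.inotify.max_user_watches": 8192,
}

FACTOR = 512          # scaling factor chosen in this patch
HUNGRY_CTS_BEFORE = 5  # lower bound of the "circa 5 - 7" observation
CTS_PER_HOST = 1500    # size of the bigger known setup


def scaled_limits(defaults, factor=FACTOR):
    """Return a new dict with every limit multiplied by factor."""
    return {key: value * factor for key, value in defaults.items()}


limits = scaled_limits(KERNEL_DEFAULTS)

# Print the result in sysctl.d syntax, with the power of two as a comment.
for key, value in limits.items():
    print(f"# 2^{value.bit_length() - 1}")
    print(f"{key} = {value}")

# Headroom check: watch-hungry CTs that should fit after scaling.
hungry_cts_after = HUNGRY_CTS_BEFORE * FACTOR
print(hungry_cts_after, ">", CTS_PER_HOST, "->", hungry_cts_after > CTS_PER_HOST)
```

The headroom check assumes that the number of watch-hungry CTs a host
can run scales linearly with the limits, which is a simplification.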

Since kernel commit d46eb14b735b11927d4bdc2d1854c311af19de6d
"fs: fsnotify: account fsnotify metadata to kmemcg" [0], the memory
used by the watch and queue overhead is accounted to the user's
respective memory CGroup (i.e., for LXC containers, their memory
limit), so we can do this without too much fear of negative
implications.

[0]: https://git.kernel.org/torvalds/c/d46eb14b735b11927d4bdc2d1854c311af19de6d

Don't change the hardcoded kernel default values directly though;
instead ship a sysctl.d configuration file, which is a bit more
transparent about what happens and can be shipped by the component
needing this (i.e., pve-container).
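
To check which values are actually in effect on a host, one can read
them back from procfs. A minimal Python sketch, assuming the usual
mapping of these sysctl keys to paths under /proc/sys (dots become
slashes); the helper names are hypothetical and not part of this patch:

```python
from pathlib import Path

INOTIFY_KEYS = (
    "fs.inotify.max_queued_events",
    "fs.inotify.max_user_instances",
    "fs.inotify.max_user_watches",
)


def sysctl_path(key, root="/proc/sys"):
    """Map a dotted sysctl key to its file below root."""
    return Path(root, *key.split("."))


def read_sysctl(key, root="/proc/sys"):
    """Read a single integer sysctl value."""
    return int(sysctl_path(key, root).read_text().strip())


def current_inotify_limits(root="/proc/sys"):
    """Return the current inotify limits, or None where unreadable."""
    limits = {}
    for key in INOTIFY_KEYS:
        try:
            limits[key] = read_sysctl(key, root)
        except OSError:
            limits[key] = None
    return limits


if __name__ == "__main__":
    for key, value in current_inotify_limits().items():
        print(f"{key} = {value}")
```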

Follow the considerations of `man 5 sysctl.d` for shipping:
> Packages should install their configuration files in /lib/. Files
> in /etc/ are reserved for the local administrator, who may use this
> logic to override the configuration files installed by vendor
> packages. All configuration files are sorted by their filename in
> lexicographic order, regardless of which of the directories they
> reside in. If multiple files specify the same option, the entry in
> the file with the lexicographically latest name will take
> precedence. It is recommended to prefix all filenames with a
> two-digit number and a dash, to simplify the ordering of the files.
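
The ordering rule quoted above can be sketched in a few lines of
Python. This is only a model of the "lexicographically latest name
wins" logic, not how the real sysctl.d loader is implemented, and it
does not model any other rule from the man page. The override file
99-local-inotify.conf and its value are hypothetical examples of what a
local administrator might add.

```python
from pathlib import PurePosixPath


def effective_settings(files):
    """Merge sysctl.d files given as {path: {key: value}}.

    Files are applied in order of their filename (not their directory),
    so the entry from the lexicographically latest name wins.
    """
    result = {}
    for path in sorted(files, key=lambda p: PurePosixPath(p).name):
        result.update(files[path])
    return result


files = {
    # Vendor file as shipped by this patch.
    "/lib/sysctl.d/10-pve-ct-inotify-limits.conf": {
        "fs.inotify.max_queued_events": 8388608,
        "fs.inotify.max_user_instances": 65536,
        "fs.inotify.max_user_watches": 4194304,
    },
    # Hypothetical local override by the administrator.
    "/etc/sysctl.d/99-local-inotify.conf": {
        "fs.inotify.max_user_watches": 1048576,
    },
}

effective = effective_settings(files)
for key, value in sorted(effective.items()):
    print(f"{key} = {value}")
```

Because the vendor file uses the low prefix 10-, almost any
administrator file in /etc/sysctl.d/ sorts after it and therefore takes
precedence for the keys it sets.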

Signed-off-by: Thomas Lamprecht <t.lamprecht at proxmox.com>
---

supersedes my earlier patch for the kernel with the same commit subject [1]
[1]: https://pve.proxmox.com/pipermail/pve-devel/2019-July/038254.html

 src/10-pve-ct-inotify-limits.conf | 12 ++++++++++++
 src/Makefile                      |  1 +
 2 files changed, 13 insertions(+)
 create mode 100644 src/10-pve-ct-inotify-limits.conf

diff --git a/src/10-pve-ct-inotify-limits.conf b/src/10-pve-ct-inotify-limits.conf
new file mode 100644
index 0000000..1194231
--- /dev/null
+++ b/src/10-pve-ct-inotify-limits.conf
@@ -0,0 +1,12 @@
+# increase kernel hardcoded defaults by a factor of 512 to allow running more
+# than a very limited count of inotify hungry CTs (i.e., those with newer
+# systemd >= 240). This can be done as the memory used by the queued events and
+# watches is accounted to the respective memory CGroup.
+# One can override this by using a /etc/sysctl.d/*.conf file
+
+# 2^23
+fs.inotify.max_queued_events = 8388608
+# 2^16
+fs.inotify.max_user_instances = 65536
+# 2^22
+fs.inotify.max_user_watches = 4194304
diff --git a/src/Makefile b/src/Makefile
index b0c30de..e930a77 100644
--- a/src/Makefile
+++ b/src/Makefile
@@ -67,6 +67,7 @@ install: pct lxc-pve.conf lxc-pve-prestart-hook lxc-pve-autodev-hook lxc-pve-pos
 	install -m 0644 pct.conf.5 ${MAN5DIR}
 	gzip -9 ${MAN5DIR}/pct.conf.5
 	cd ${MAN5DIR}; ln -s pct.conf.5.gz ct.conf.5.gz
+	install -D -m 0644 10-pve-ct-inotify-limits.conf ${LIBDIR}/sysctl.d/10-pve-ct-inotify-limits.conf
 	
 	# Note: for backwards compatibility only
 	# lxc@.service.d, snippet and reboot script can be removed in PVE 6.0
-- 
2.20.1




