[pve-devel] [PATCH ct/common] mount point hotplugging & new mount api
Thomas Lamprecht
t.lamprecht at proxmox.com
Mon Nov 11 17:13:27 CET 2019
On 11/8/19 11:06 AM, Wolfgang Bumiller wrote:
> The pve-common part of this patch set should be straightforward:
> minor additions to ProcFSTools and Tools, as well as the new mount api
> constants added to Syscall.pm.
>
It is, so I already applied that one.
> The container part then makes use of the new mount api in case the
> currently running kernel supports it. The hope for the future would be
> to simplify the code a bit once we can stop supporting kernels older
> than 5.2.
So 7.X material right there ^^
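
For anyone following along, the "does the running kernel support it" decision
could be sketched roughly as below. This is only an illustration: the 5.2
cut-off comes from the cover letter, while the function names and the
version-string parsing are assumptions of this sketch and not what the patch
itself does.

```python
import os
import re

# Cut-off taken from the cover letter; everything else here is illustrative.
MIN_KERNEL = (5, 2)


def parse_kernel_release(release):
    """Return (major, minor) from a release string such as '5.3.10-1-pve'."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    return int(match.group(1)), int(match.group(2))


def supports_new_mount_api(release=None):
    """Hypothetical helper: True if the kernel is at least MIN_KERNEL."""
    if release is None:
        release = os.uname().release  # release of the running kernel
    return parse_kernel_release(release) >= MIN_KERNEL
```

Calling one of the new syscalls and checking for ENOSYS would likely be a more
robust probe than comparing version strings, since features can be backported.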
> For now, it starts with the ability to stage a mount point, and then
> moves on to changing the startup process to use this.
> Previously, the startup went through the mount points in order and
> mounted them directly at the target location. This is prone to symlink
> attacks (especially when using nested shared bind mounts).
> When staging a mount in a fixed directory first, we can pick it up
> afterwards with the new `open_tree()` syscall, and move it in place with
> the new `move_mount()` syscall, which can work relative to directory
> file descriptors and has flags for whether or not the paths are allowed
> to follow symlinks. (In the future this can be hardened even more using
> `openat2()` using the container's root directory as "implicit chroot"
> while looking up the target directory and then issuing a `move_mount()`
> right onto the resulting path file descriptor via
> `MOVE_MOUNT_T_EMPTY_PATH`.)
>
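
To make the stage-then-move idea above concrete, here is a minimal sketch of
the two syscalls driven from Python via ctypes. It is not the code from the
patch: the helper names are made up, the syscall numbers are the x86_64 ones,
and using OPEN_TREE_CLONE (a detached copy of the staged mount) is an
assumption of this sketch. Running it needs CAP_SYS_ADMIN and a kernel >= 5.2.

```python
import ctypes
import os

# Syscall numbers as on x86_64 (assumption: check your architecture).
SYS_open_tree = 428
SYS_move_mount = 429

AT_FDCWD = -100
OPEN_TREE_CLONE = 1
OPEN_TREE_CLOEXEC = os.O_CLOEXEC
MOVE_MOUNT_F_EMPTY_PATH = 0x04  # source path is empty, use the fd itself
MOVE_MOUNT_T_SYMLINKS = 0x10    # follow a symlink at the target path


def _syscall(number, *args):
    """Invoke a raw syscall through libc; raise OSError on failure."""
    libc = ctypes.CDLL(None, use_errno=True)
    libc.syscall.restype = ctypes.c_long
    converted = [
        ctypes.c_char_p(a) if isinstance(a, bytes) else ctypes.c_long(a)
        for a in args
    ]
    ret = libc.syscall(ctypes.c_long(number), *converted)
    if ret < 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))
    return ret


def move_flags(follow_target_symlinks=False):
    """Build move_mount() flags; target symlinks are not followed by default."""
    flags = MOVE_MOUNT_F_EMPTY_PATH
    if follow_target_symlinks:
        flags |= MOVE_MOUNT_T_SYMLINKS
    return flags


def move_staged_mount(staged_path, target_dirfd, target_name):
    """Pick up the mount staged at staged_path and attach it at
    target_name, resolved relative to the directory fd target_dirfd."""
    tree_fd = _syscall(SYS_open_tree, AT_FDCWD, os.fsencode(staged_path),
                       OPEN_TREE_CLONE | OPEN_TREE_CLOEXEC)
    try:
        _syscall(SYS_move_mount, tree_fd, b"", target_dirfd,
                 os.fsencode(target_name), move_flags())
    finally:
        os.close(tree_fd)
```

Leaving out MOVE_MOUNT_T_SYMLINKS only covers a symlink in the final path
component, as far as I understand it, which is where the `openat2()` idea for
the intermediate components would come in.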
> The main advantage of the new API, however, is that we can pick up the
> mounts as file descriptors, then switch into the running container's
> mount namespace and `move_mount()` the mount point in place, without
> having to rely on an existing MS_SHARED mount point "hack". Hence the
> final patch adds support for mount point hotplugging - but only hotplug,
> not un-plug, since unmounting has a lot of issues (open file
> descriptors, unshared MS_PRIVATE mount namespaces referencing the mount
> (as well as those namespaces opened as file descriptors...), mounts
> having been moved (if they were previously hotplugged at least), ...).
Sounds all good, I need to take a closer look at the meat though :)
Oguz, can you please also give this a test spin?
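
As a rough mental model for testing, the hotplug flow described above (grab
the mount as a file descriptor, switch into the container's mount namespace,
then `move_mount()` it in place) might look like the sketch below. The
function names, the fork-based structure and the x86_64 syscall numbers are
assumptions of this sketch, not the implementation in the patch. It needs
root and a kernel >= 5.2 to actually run.

```python
import ctypes
import os

SYS_open_tree = 428   # x86_64 numbers (assumption: check your architecture)
SYS_move_mount = 429
AT_FDCWD = -100
OPEN_TREE_CLONE = 1
OPEN_TREE_CLOEXEC = os.O_CLOEXEC
MOVE_MOUNT_F_EMPTY_PATH = 0x04
CLONE_NEWNS = 0x00020000


def mount_ns_path(pid):
    """Path of the mount namespace handle of a running process."""
    return f"/proc/{pid}/ns/mnt"


def _syscall(libc, number, *args):
    """Invoke a raw syscall through libc; raise OSError on failure."""
    converted = [
        ctypes.c_char_p(a) if isinstance(a, bytes) else ctypes.c_long(a)
        for a in args
    ]
    ret = libc.syscall(ctypes.c_long(number), *converted)
    if ret < 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))
    return ret


def hotplug_mount(staged_path, container_pid, target_path):
    """Attach the mount staged at staged_path (host view) at target_path
    (container view) of the container whose init has PID container_pid."""
    libc = ctypes.CDLL(None, use_errno=True)
    libc.syscall.restype = ctypes.c_long
    libc.setns.argtypes = [ctypes.c_int, ctypes.c_int]

    # Pick up the mount as a file descriptor while still on the host side.
    tree_fd = _syscall(libc, SYS_open_tree, AT_FDCWD,
                       os.fsencode(staged_path),
                       OPEN_TREE_CLONE | OPEN_TREE_CLOEXEC)
    ns_fd = os.open(mount_ns_path(container_pid), os.O_RDONLY | os.O_CLOEXEC)
    child = os.fork()
    if child == 0:
        # Child: only this process changes its mount namespace.
        status = 1
        try:
            if libc.setns(ns_fd, CLONE_NEWNS) == 0:
                _syscall(libc, SYS_move_mount, tree_fd, b"", AT_FDCWD,
                         os.fsencode(target_path), MOVE_MOUNT_F_EMPTY_PATH)
                status = 0
        except OSError:
            status = 1
        os._exit(status)
    os.close(ns_fd)
    os.close(tree_fd)
    _, wstatus = os.waitpid(child, 0)
    if not (os.WIFEXITED(wstatus) and os.WEXITSTATUS(wstatus) == 0):
        raise RuntimeError("moving the mount into the container failed")
```

The fork is there because `setns()` into a mount namespace changes the calling
process itself, so doing it in a short-lived child keeps the parent in the
host namespace.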
>
> Wolfgang Bumiller (8):
> implement "staged mountpoints"
> add open_pid_fd, open_lxc_pid, open_ppid helpers
> split open_namespace out of enter_namespace
> add get_container_namespace helper
> add mount stage directory helpers
> prestart-hook: use staged mountpoints on newer kernels
> config: vmconfig_apply_pending_mountpoint helper
> implement mountpoint hotplugging
>
> src/PVE/LXC.pm | 183 ++++++++++++++++++++++++++++++++++++--
> src/PVE/LXC/Config.pm | 87 ++++++++++++------
> src/lxc-pve-prestart-hook | 79 +++++++++++++---
> 3 files changed, 304 insertions(+), 45 deletions(-)
>