[pve-devel] [PATCH ct/common] mount point hotplugging & new mount api

Thomas Lamprecht t.lamprecht at proxmox.com
Mon Nov 11 17:13:27 CET 2019


On 11/8/19 11:06 AM, Wolfgang Bumiller wrote:
> The pve-common part of this patch set should be straightforward:
> minor additions to ProcFSTools and Tools, as well as the new mount api
> constants added to Syscall.pm.
> 

It is, so I applied that part already.

> The container part then makes use of the new mount api in case the
> currently running kernel supports it. The hope for the future would be
> to simplify the code a bit once we can stop supporting kernels older
> than 5.2.

So 7.X material right there ^^

> For now, it starts with the ability to stage a mount point, and then
> moves on to changing the startup process to use this.
> Previously, the startup went through the mount points in order and
> mounted them directly at the target location. This is prone to symlink
> attacks (especially when using nested shared bind mounts).
> When staging a mount in a fixed directory first, we can pick it up
> afterwards with the new `open_tree()` syscall, and move it in place with
> the new `move_mount()` syscall, which can work relative to directory
> file descriptors and has flags for whether or not the paths are allowed
> to follow symlinks. (In the future this can be hardened even more by
> using `openat2()` with the container's root directory as "implicit
> chroot" while looking up the target directory and then issuing a
> `move_mount()` right onto the resulting path file descriptor via
> `MOVE_MOUNT_T_EMPTY_PATH`.)
> 
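
To illustrate why lookups relative to a directory file descriptor help
against symlink attacks, here is a minimal userspace sketch in Python. It is
not the code from the patch set and it does not call `move_mount()`; it only
shows the same idea with plain `open()`: walk the target path one component
at a time below the container's root and refuse to follow symlinks. The
helper name `open_beneath` is hypothetical.

```python
import os


def open_beneath(root_fd, relpath):
    """Open the directory `relpath` below `root_fd` without following symlinks.

    Hypothetical helper for illustration: each path component is opened
    relative to the previous directory fd with O_NOFOLLOW, so a symlink
    planted inside the tree cannot redirect the lookup elsewhere.
    """
    flags = os.O_RDONLY | os.O_DIRECTORY | os.O_NOFOLLOW | os.O_CLOEXEC
    fd = os.dup(root_fd)  # keep the caller's fd untouched
    try:
        for part in relpath.split("/"):
            if part in ("", "."):
                continue
            if part == "..":
                raise ValueError("'..' is not allowed in the target path")
            next_fd = os.open(part, flags, dir_fd=fd)
            os.close(fd)
            fd = next_fd
    except BaseException:
        os.close(fd)
        raise
    return fd
```

The returned file descriptor refers to a directory that is guaranteed to
have been reached without crossing a symlink, which is the property the
directory-fd-relative `move_mount()` flags provide for the real mount.
`openat2()` would let the kernel enforce this in a single call instead of a
manual walk.
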
> The main advantage of the new API however, is that we can pick up the
> mounts as file descriptors, then switch into the running container's
> mount namespace and `move_mount()` the mount point in place, without
> having to rely on an existing MS_SHARED mount point "hack". Hence the
> final patch adds support for mount point hotplugging - but only hotplug,
> not un-plug, since unmounting has a lot of issues (open file
> descriptors, unshared MS_PRIVATE mount namespaces referencing the mount
> (as well as those namespaces opened as file descriptors...), mounts
> having been moved (if they were previously hotplugged at least), ...).
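
The hotplug flow described above could be sketched roughly as follows, in
Python via `ctypes` rather than the Perl used in the patches. Everything here
is an assumption for illustration: the function name `hotplug_mount`, the use
of `OPEN_TREE_CLONE` to obtain a detached copy of the staged mount, and the
syscall numbers (428/429 are the values on architectures with the unified
syscall table such as x86_64 and arm64). It needs a 5.2+ kernel and root
privileges, and the calling process must be single-threaded for `setns()`
into a mount namespace to succeed.

```python
import ctypes
import os

# Assumed values taken from the Linux UAPI headers; verify against your own.
SYS_OPEN_TREE = 428
SYS_MOVE_MOUNT = 429
AT_FDCWD = -100
OPEN_TREE_CLONE = 0x1
OPEN_TREE_CLOEXEC = os.O_CLOEXEC
MOVE_MOUNT_F_EMPTY_PATH = 0x4
CLONE_NEWNS = 0x00020000

_libc = ctypes.CDLL(None, use_errno=True)
_libc.syscall.restype = ctypes.c_long


def _check(ret):
    """Turn a negative libc return value into an OSError."""
    if ret < 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))
    return ret


def open_tree(dirfd, path, flags):
    return _check(_libc.syscall(
        ctypes.c_long(SYS_OPEN_TREE), ctypes.c_int(dirfd),
        ctypes.c_char_p(os.fsencode(path)), ctypes.c_uint(flags)))


def move_mount(from_fd, from_path, to_fd, to_path, flags):
    _check(_libc.syscall(
        ctypes.c_long(SYS_MOVE_MOUNT),
        ctypes.c_int(from_fd), ctypes.c_char_p(os.fsencode(from_path)),
        ctypes.c_int(to_fd), ctypes.c_char_p(os.fsencode(to_path)),
        ctypes.c_uint(flags)))


def hotplug_mount(staged_path, container_pid, target_path):
    """Pick up a staged mount as an fd and attach it inside the container."""
    # 1. Grab the staged mount as a (detached) file descriptor on the host.
    tree_fd = open_tree(AT_FDCWD, staged_path,
                        OPEN_TREE_CLONE | OPEN_TREE_CLOEXEC)
    try:
        # 2. Switch this process into the container's mount namespace.
        #    This is permanent for the process, so do it in a forked child.
        ns_fd = os.open(f"/proc/{container_pid}/ns/mnt",
                        os.O_RDONLY | os.O_CLOEXEC)
        try:
            _check(_libc.setns(ns_fd, CLONE_NEWNS))
        finally:
            os.close(ns_fd)
        # 3. Attach the mount at the target path as seen from the container.
        move_mount(tree_fd, "", AT_FDCWD, target_path,
                   MOVE_MOUNT_F_EMPTY_PATH)
    finally:
        os.close(tree_fd)
```

Because the mount travels across the namespace switch as a file descriptor,
no shared (MS_SHARED) directory between host and container is needed to
propagate it.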

Sounds all good, I need to take a closer look at the meat though :)
Oguz, can you please also give this a test spin?

> 
> Wolfgang Bumiller (8):
>   implement "staged mountpoints"
>   add open_pid_fd, open_lxc_pid, open_ppid helpers
>   split open_namespace out of enter_namespace
>   add get_container_namespace helper
>   add mount stage directory helpers
>   prestart-hook: use staged mountpoints on newer kernels
>   config: vmconfig_apply_pending_mountpoint helper
>   implement mountpoint hotplugging
> 
>  src/PVE/LXC.pm            | 183 ++++++++++++++++++++++++++++++++++++--
>  src/PVE/LXC/Config.pm     |  87 ++++++++++++------
>  src/lxc-pve-prestart-hook |  79 +++++++++++++---
>  3 files changed, 304 insertions(+), 45 deletions(-)
> 



