[PVE-User] Debian buster, systemd, container and nesting=1

Stoiko Ivanov s.ivanov at proxmox.com
Thu Feb 27 16:26:08 CET 2020

On Wed, 26 Feb 2020 12:01:56 +0100
Marco Gaiarin <gaio at sv.lnf.it> wrote:

> Mandi! Stoiko Ivanov
>   In chel di` si favelave...
> > > i can convert this container to an unprivileged one, but not others;
> > > for example, some containers are samba domain controllers, which need
> > > a privileged container.
> > not sure - but why would a samba need to be privileged?  
> 	https://lists.samba.org/archive/samba/2019-December/227626.html
> samba, as an AD Domain Controller (not as a general 'share service'),
> needs to use the 'SYSTEM' namespace, which in containers is reserved for
> root. Indeed, if there are some 'caps' that can be relaxed to permit using
> the system namespace in unprivileged containers, they are welcome!
AFAICU one robust (although not very performant) way to run an AD DC with
NTACLs in an unprivileged container would be to use the xattr_tdb module
(not actively tested though):

> > > There's another/better way to make systemd work on containers?  
> > I guess my preferred actions in order:
> > * setup new unprivileged container and migrate the workload/services from
> >   the old one (optionally enabling nesting if needed)
> > * try backup/restore to get a privileged container to an unprivileged one
> > * keep the privileged container with nesting off
> > * migrate the setup into a qemu-guest
> > * edit the unit files of the affected services (e.g. apache) - usually
> >   it's the PrivateTmp option which causes this (it wants to mount --rbind
> >   -o rw /) - and drop the PrivateTmp option (see [0])
> > * consider making an apparmor override for this particular mount
> >   combination+container (which also can potentially be a security hole
> >   (some apparmor rules are bound to absolute paths and using rbind you can
> >   change the path)
> > * turn on nesting for a privileged container (keep in mind that you then
> >   open it up quite a bit for breakouts)
> > of course probably not all of those options can be applied in your
> > environment.
> > [0]https://forum.proxmox.com/threads/apache2-service-failed-to-set-up-mount-namespacing-permission-denied.56871/  
> Mmmh... i'm a bit confused.
> Firstly, it is not clear to me if nesting is needed because the
> container is privileged, or if privileged/unprivileged and
> nesting/non-nesting are totally independent properties.
They are independent - a good explanation of what nesting does can be
found in our source:
(among other things, it allows mounting /proc and /sys, which is
problematic for privileged containers)
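
To see that the two settings are separate, here is a minimal sketch that
reads both from a container config. It assumes the config uses
"unprivileged: 1" and "features: ...nesting=1..." lines, as in
/etc/pve/lxc/<vmid>.conf; the function names are hypothetical and this is
not tested on a real PVE host:

```shell
#!/bin/sh
# Print only the main section of the config, i.e. everything before the
# first "[snapshot]" style header (assumption: snapshots follow the main part).
main_section() {
    sed '/^\[/,$d' "$1"
}

# Report the privilege mode and the nesting flag as two separate values.
ct_flags() {
    conf="$1"
    if main_section "$conf" | grep -Eq '^unprivileged:[[:space:]]*1'; then
        priv="unprivileged"
    else
        priv="privileged"
    fi
    if main_section "$conf" | grep -Eq '^features:.*nesting=1'; then
        nest="nesting=on"
    else
        nest="nesting=off"
    fi
    echo "$priv $nest"
}

# Usage (101 is a placeholder vmid):
#   ct_flags /etc/pve/lxc/101.conf
```

All four combinations are possible. Nesting itself should be switchable
with something like "pct set <vmid> --features nesting=1" on the host.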

The issue with apache's systemd unit in the privileged container is
that the mount is denied by apparmor (the apparmor rules are stricter for
privileged containers than for unprivileged ones, because someone who breaks
out of an unprivileged container is only a regular user on the host).
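
For the PrivateTmp workaround quoted above, a drop-in is usually cleaner
than editing the packaged unit file. A rough sketch follows; the function
name and the optional second argument (an alternative base directory, so
it can be tried outside /etc) are my own and hypothetical:

```shell
#!/bin/sh
# Write a systemd drop-in that turns PrivateTmp off for a single unit.
# $1: unit name, e.g. apache2.service
# $2: base directory for drop-ins (defaults to /etc/systemd/system)
write_privatetmp_override() {
    unit="$1"
    base="${2:-/etc/systemd/system}"
    dir="$base/$unit.d"
    mkdir -p "$dir" || return 1
    # A drop-in only needs the section and the option being overridden.
    printf '[Service]\nPrivateTmp=false\n' > "$dir/override.conf"
}

# Usage inside the container (as root):
#   write_privatetmp_override apache2.service
#   systemctl daemon-reload
#   systemctl restart apache2
```

Keep in mind that this gives up the /tmp isolation for that service, so
it is a trade-off and not a free fix.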

I hope this explains it.

> Second, in a PVE6 installation i've created a debian buster container
> (unprivileged, without nesting), installed apache and it runs correctly,
> without touching the systemd units:
>  root@vbaculalpb:~# systemctl status apache2
>  ● apache2.service - The Apache HTTP Server
>    Loaded: loaded (/lib/systemd/system/apache2.service; enabled; vendor preset: enabled)
>    Active: active (running) since Wed 2020-02-26 11:35:29 CET; 15min ago
>      Docs: https://httpd.apache.org/docs/2.4/
>   Main PID: 1992 (apache2)
>     Tasks: 54 (limit: 4915)
>    Memory: 6.7M
>    CGroup: /system.slice/apache2.service
>            ├─1992 /usr/sbin/apache2 -k start
>            ├─1994 /usr/sbin/apache2 -k start
>            └─1995 /usr/sbin/apache2 -k start
>  feb 26 11:35:29 vbaculalpb systemd[1]: Starting The Apache HTTP Server...
>  feb 26 11:35:29 vbaculalpb systemd[1]: Started The Apache HTTP Server.
>  root@vbaculalpb:~# systemctl show apache2 | grep PrivateTmp
>  PrivateTmp=yes
> This could lead to the answer to the first question (nesting is needed
> only for privileged containers), but it could also mean that container
> management differs between PVE5 (the original request) and PVE6 (this
> test).
> So, thanks for the answer, but i hope for some more clues.
