[pve-devel] proxmox training week : error starting lxc with network interface

Tue Mar 12 08:55:30 CET 2019

On Tue, Mar 12, 2019 at 08:40:44AM +0100, Alexandre DERUMIER wrote:
> Also,
> 
> not related, but I have noticed that when a container doesn't start,
> 
> the cgroup is not removed 
> 
> /sys/fs/cgroup/*/lxc/<ctid>
> 
> and when you start the ct again, a new cgroup is created with a suffix
> 
> /sys/fs/cgroup/*/lxc/<ctid>-1
> 
> 
> That means that if you fix the mac address, then start the CT again, it's working,
> 
> but dynamic cpu/memory changes don't work anymore, because we use the wrong cgroup (we always use /sys/fs/cgroup/*/lxc/<ctid>)
> 
> I don't know if there is a clean way to remove the cgroup if the ct crashes ? (systemd ?)

lxc is supposed to do the cleanup. There are some cases where that
seems to fail, so if you can narrow it down to a simple config
where this happens please report a bug. Either on our bugzilla, or if
you can narrow it down to an even smaller /var/lib/lxc/*/config directly
on the lxc issue tracker.
I've been meaning to track these cases down and take a closer look, but
usually when I run into that I'm busy tracking something else down and
then forget to reproduce the cleanup bug afterwards... :-\

If you don't have this in your shell-history yet, here's a cleanup to
copy-paste (the `-depth` part is the key since cgroups need to be
removed starting from the inner-most directory):
# find /sys/fs/cgroup/*/lxc/<insert vmid here>* -depth -type d -print -delete
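If you'd rather keep that around as a small helper, here's a rough
sketch of the same idea. The function name `cleanup_ct_cgroups` and the
optional second argument (an alternative cgroup root, handy for trying
it on a scratch directory) are made up for illustration. Unlike the
one-liner above it only matches `<vmid>` and `<vmid>-N` instead of
`<vmid>*`, so cleaning up 101 does not touch 1010, and it refuses an
empty or non-numeric vmid. Only run it while the container is stopped.

```shell
#!/bin/sh
# Sketch only: remove leftover lxc cgroup directories of a stopped container.
# cleanup_ct_cgroups is a hypothetical helper, not part of pve-container or lxc.
cleanup_ct_cgroups() {
    vmid="$1"
    # Assumption: cgroup hierarchies are mounted below /sys/fs/cgroup;
    # the second argument overrides that (e.g. for a dry run on a temp dir).
    root="${2:-/sys/fs/cgroup}"

    # Refuse empty or non-numeric ids, otherwise the glob below could
    # match the whole lxc/ directory.
    case "$vmid" in
        ''|*[!0-9]*)
            echo "usage: cleanup_ct_cgroups <vmid> [cgroup-root]" >&2
            return 2
            ;;
    esac

    # Match the plain cgroup and the suffixed ones (<vmid>-1, <vmid>-2, ...).
    for dir in "$root"/*/lxc/"$vmid" "$root"/*/lxc/"$vmid"-*; do
        # An unmatched glob stays literal in sh, so skip non-directories.
        [ -d "$dir" ] || continue
        # -depth: visit the inner-most directories first, as cgroups must
        # be removed bottom-up.
        find "$dir" -depth -type d -print -delete
    done
}
```

Usage would then be e.g. `cleanup_ct_cgroups 101` after sourcing the
file.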

As for systemd - the cgroup's not really part of one of its services, so
it doesn't consider that its job - although... it's possible that it
works with the new lxc.monitor/lxc.payload layout - but for that we'd
have to adapt quite a bit of our code.

Another option might be doing this in our ExecStopPost hook, actually.
(But ideally we first figure out why the cleanup fails at all and
fix the root cause.)