[pve-devel] proxmox training week : error starting lxc with network interface

Tue Mar 12 09:04:19 CET 2019

>>lxc is supposed to do the cleanup.

OK, great. I didn't know whether it could happen in case of a crash.

>>There are some cases where that 
>>seems to fail, so if you can narrow it down to a simple config 
>>where this happens please report a bug.

Just did it.

https://bugzilla.proxmox.com/show_bug.cgi?id=2130

It's a simple config with an invalid (multicast) MAC address.

----- Original Message -----
From: "Wolfgang Bumiller" <w.bumiller at proxmox.com>
To: "Alexandre Derumier" <aderumier at odiso.com>
Cc: "pve-devel" <pve-devel at pve.proxmox.com>
Sent: Tuesday, March 12, 2019 08:55:30
Subject: Re: [pve-devel] proxmox training week : error starting lxc with network interface

On Tue, Mar 12, 2019 at 08:40:44AM +0100, Alexandre DERUMIER wrote: 
> Also, 
> 
> unrelated, but I have noticed that when a container doesn't start, 
> 
> the cgroup is not removed 
> 
> /sys/fs/cgroup/*/lxc/<ctid> 
> 
> and when you start the ct again, a new cgroup is created with a suffix 
> 
> /sys/fs/cgroup/*/lxc/<ctid>-1 
> 
> 
> That means that if you fix the MAC address and then start the CT again, it works, 
> 
> but dynamic CPU and memory changes don't work anymore, because we use the wrong cgroup (we always use /sys/fs/cgroup/*/lxc/<ctid>) 
> 
> I don't know if there is a clean way to remove the cgroup if the CT crashes? (systemd?) 

lxc is supposed to do the cleanup. There are some cases where that 
seems to fail, so if you can narrow it down to a simple config 
where this happens please report a bug. Either on our bugzilla, or if 
you can narrow it down to an even smaller /var/lib/lxc/*/config directly 
on the lxc issue tracker. 
I've been meaning to track these cases down and take a closer look, but 
usually when I run into that I'm busy tracking something else down and 
then forget to reproduce the cleanup bug afterwards... :-\ 

If you don't have this in your shell-history yet, here's a cleanup to 
copy-paste (the `-depth` part is the key since cgroups need to be 
removed starting from the inner-most directory): 
# find /sys/fs/cgroup/*/lxc/<insert vmid here>* -depth -type d -print -delete 
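
If you'd rather keep this around as a helper than as a one-liner, it could 
be wrapped roughly like this. This is only a sketch: the function name 
`cleanup_ct_cgroups` and the `CGROOT` variable are made up for illustration 
and are not part of PVE or lxc. It matches only `<vmid>` and `<vmid>-*`, 
which is narrower than the `<vmid>*` glob above, so that cleaning up 100 
does not also touch 1000.

```shell
#!/bin/sh
# Sketch only: remove leftover LXC cgroup directories for one container.
# CGROOT and cleanup_ct_cgroups are hypothetical names used for illustration.
CGROOT="${CGROOT:-/sys/fs/cgroup}"

cleanup_ct_cgroups() {
    vmid="$1"
    # Accept only a purely numeric vmid so the glob below cannot run wild.
    case "$vmid" in
        ''|*[!0-9]*)
            echo "usage: cleanup_ct_cgroups <vmid>" >&2
            return 2
            ;;
    esac
    # Match the plain cgroup and the suffixed ones (<vmid>-1, <vmid>-2, ...).
    for dir in "$CGROOT"/*/lxc/"$vmid" "$CGROOT"/*/lxc/"$vmid"-*; do
        [ -d "$dir" ] || continue   # an unmatched glob stays literal; skip it
        # -depth: remove the inner-most directories first.
        find "$dir" -depth -type d -print -delete
    done
}
```

Usage would be `cleanup_ct_cgroups <vmid>` as root, and only while the 
container is stopped, since a cgroup that still has processes in it cannot 
be removed.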

As for systemd - the cgroup's not really part of one of its services, so 
it doesn't consider that its job - although... it's possible that it 
works with the new lxc.monitor/lxc.payload layout - but for that we'd 
have to adapt quite a bit of our code. 

Another option might be doing this in our ExecStopPost hook, actually. 
(But ideally we do first figure out why it fails in the first place and 
try to fix the root cause.)