[pve-devel] [PATCH manager 1/1] pve6to7: check for containers not supporting pure cgroupv2
Thomas Lamprecht
t.lamprecht at proxmox.com
Sat Jul 3 00:32:42 CEST 2021
On 02.07.21 20:21, Stoiko Ivanov wrote:
> Ordered as much as possible to exit early, still might take quite some
> time on systems with many containers (which do support cgroupv2).
The early abort once one is found seems like a good idea in general, but
I still do not really like that happening unconditionally, this could get hidden
behind and opt-in CLI option flag - with a single skip log if not taken.
An admin with only bleeding-edge Arch Linux container then could then just
snicker over software from the stone age and just continue ;)
Also, you're currently missing some cheap optimizations like skipping devuan/alpine
config ostypes early, doing needless work for them.
>
> needs a versioned bump on pve-container
I'd rather prefer copying the required helpers over, as this is mainly required
for stable-6, and it would make it way easier than having versioned dependency
handling for just this in two releases.
>
> Signed-off-by: Stoiko Ivanov <s.ivanov at proxmox.com>
> ---
> PVE/CLI/pve6to7.pm | 68 ++++++++++++++++++++++++++++++++++++++++++++++
> 1 file changed, 68 insertions(+)
>
> diff --git a/PVE/CLI/pve6to7.pm b/PVE/CLI/pve6to7.pm
> index 60edac11..3d7c67bd 100644
> --- a/PVE/CLI/pve6to7.pm
> +++ b/PVE/CLI/pve6to7.pm
> @@ -23,6 +23,9 @@ use PVE::Tools qw(run_command split_list);
> use PVE::QemuConfig;
> use PVE::QemuServer;
> use PVE::VZDump::Common;
> +use PVE::LXC;
> +use PVE::LXC::Config;
> +use PVE::LXC::Setup;
>
> use Term::ANSIColor;
>
> @@ -890,6 +893,70 @@ sub check_storage_content {
> log_pass("no problems found");
> }
> }
> +sub check_containers_cgroup_compat {
> +
> + my $kernel_cli = PVE::Tools::file_get_contents('/proc/cmdline');
> + if ($kernel_cli =~ /systemd.unified_cgroup_hierarchy=0/){
> + log_skip("System explicitly configured for legacy hybrid cgroup hierarchy.");
> + return;
> + }
> +
> + my $cts = eval { PVE::API2::LXC->vmlist({ node => $nodename }) };
> + if ($@) {
> + log_warn("Failed to retrieve information about this node's CTs - $@");
> + return;
> + }
> +
> + if (!defined($cts) || !scalar(@$cts)) {
> + log_skip("No containers on node detected.");
> + return;
> + }
> + my @running_vmids = map { $_->{status} eq 'running' ? $_->{vmid} : () } @$cts;
> + my @offline_vmids = map { $_->{status} ne 'running' ? $_->{vmid} : () } @$cts;
nit, but why not grep? Would make it a bit more explicit here, avoiding that any
innocent reader thinks map makes this not work and then spent time getting proved
otherwise ;-)
> +
> + my $legacy_container=0;
> +
> + for my $ctid (@running_vmids) {
> + my $pid = eval { PVE::LXC::find_lxc_pid($ctid) };
> + if (my $err = $@) {
> + log_warn("Failed to get PID for running CT $ctid - $err");
> + next;
> + }
> + my $rootdir = "/proc/$pid/root";
> + my $conf = PVE::LXC::Config->load_config($ctid);
> + my $lxc_setup = PVE::LXC::Setup->new($conf, $rootdir);
> + if (!$lxc_setup->unified_cgroupv2_support()) {
> + log_warn("CT $ctid does not support running in a unified cgroup v2 layout - either " .
Maybe start with "Found at least one CT ($ctid) which does not supp...", makes the
nature of the check slightly less subtle IMO.
> + "upgrade it or set systemd.unified_cgroup_hierarchy=0 in the kernel cmdline - " .
> + "skipping further checks");
> + return;
> + }
> + }
> +
> + my $storage_cfg = PVE::Storage::config();
> + for my $ctid (@offline_vmids) {
> + my ($conf, $rootdir, $lxc_setup);
> + eval {
> + $conf = PVE::LXC::Config->load_config($ctid);
> + $rootdir = PVE::LXC::mount_all($ctid, $storage_cfg, $conf);
> + $lxc_setup = PVE::LXC::Setup->new($conf, $rootdir);
> + };
> + if (my $err = $@) {
> + log_warn("Failed to load config and mount CT $ctid - $err");
> + eval { PVE::LXC::umount_all($ctid, $storage_cfg, $conf) };
> + next;
> + }
> + if (!$lxc_setup->unified_cgroupv2_support()) {
> + log_warn("CT $ctid does not support running in a unified cgroup v2 layout - either " .
> + "upgrade it or set systemd.unified_cgroup_hierarchy=0 in the kernel cmdline - " .
> + "skipping further checks");
maybe factor out the common part of that specific log message
> + eval { PVE::LXC::umount_all($ctid, $storage_cfg, $conf) };
> + last;
> + }
> +
> + eval { PVE::LXC::umount_all($ctid, $storage_cfg, $conf) };
> + }
> +};
>
> sub check_misc {
> print_header("MISCELLANEOUS CHECKS");
> @@ -986,6 +1053,7 @@ sub check_misc {
> check_custom_pool_roles();
> check_description_lengths();
> check_storage_content();
> + check_containers_cgroup_compat();
> }
>
> __PACKAGE__->register_method ({
>
More information about the pve-devel
mailing list