[pve-devel] [RFC common v3 10/34] env: add module with helpers to run a Perl subroutine in a user namespace

Wed Nov 13 11:08:47 CET 2024

On 12.11.24 3:20 PM, Fabian Grünbichler wrote:
> On November 7, 2024 5:51 pm, Fiona Ebner wrote:
>> The first use case is running the container backup subroutine for
>> +package PVE::Env;
> 
> I agree with Thomas that this name might be a bit too generic ;)
> 
> I also wonder - since this seems to be only used in pve-container, and
> it really mostly makes sense in that context, wouldn't it be better off
> there? or do we expect other areas where we need userns handling?
> (granted, some of the comments below would require other changes to
> pve-common anyway ;))
> 

The only other use-case I'm aware of is where I got the code from
originally, i.e. pve-builpkg. Sure, I can move it to pve-container to
start out.

>> +
>> +use strict;
>> +use warnings;
>> +
>> +use Fcntl qw(O_WRONLY);
>> +use POSIX qw(EINTR);
>> +use Socket;
>> +
>> +require qw(syscall.ph);
> 
> PVE::Syscall already does this, and has the following:
> 
> BEGIN {
>     die "syscall.ph can only be required once!\n" if $INC{'syscall.ph'};
>     require("syscall.ph");
> 
> don't those two clash? I think those syscall related parts should
> probably move there?
> 

Hm, never experienced this error, but sure, will move the relevant parts.

>> +
>> +use constant {CLONE_NEWNS   => 0x00020000,
>> +              CLONE_NEWUSER => 0x10000000};
>> +
>> +sub unshare($) {
>> +    my ($flags) = @_;
>> +    return 0 == syscall(272, $flags);
>> +}
> 
> this is PVE::Tools::unshare, maybe the latter should move here?
> 

I'll just re-use the one from Tools then when moving to pve-container.

>> +
>> +sub __set_id_map($$$) {
>> +    my ($pid, $what, $value) = @_;
>> +    sysopen(my $fd, "/proc/$pid/${what}_map", O_WRONLY)
>> +	or die "failed to open child process' ${what}_map\n";
>> +    my $rc = syswrite($fd, $value);
>> +    if (!$rc || $rc != length($value)) {
>> +	die "failed to set sub$what: $!\n";
>> +    }
>> +    close($fd);
>> +}
>> +
>> +sub set_id_map($$) {
>> +    my ($pid, $id_map) = @_;
>> +
>> +    my $gid_map = '';
>> +    my $uid_map = '';
>> +
>> +    for my $map ($id_map->@*) {
>> +	my ($type, $ct, $host, $length) = $map->@*;
>> +
>> +	$gid_map .= "$ct $host $length\n" if $type eq 'g';
>> +	$uid_map .= "$ct $host $length\n" if $type eq 'u';
>> +    }
>> +
>> +    __set_id_map($pid, 'gid', $gid_map) if $gid_map;
>> +    __set_id_map($pid, 'uid', $uid_map) if $uid_map;
>> +}
> 
> do we gain a lot here from not just using newuidmap/newgidmap?
> 

I didn't know those commands existed :P Running commands seems more
wasteful then just writing a file, but will change if you insist.

>> +sub forked(&%) {
> 
> this seems very similar to the already existing PVE::Tools::run_fork /
> run_fork_with_timeout helpers.. any reason we can't extend those with
> `afterfork` support and use them?
> 

Haven't looked into it, but will do!

>> +
>> +sub run_in_userns(&;$) {
>> +    my ($code, $id_map) = @_;
>> +    socketpair(my $sp, my $sc, AF_UNIX, SOCK_STREAM, PF_UNSPEC)
>> +	or die "socketpair: $!\n";
>> +    forked(sub {
>> +	close($sp);
>> +	unshare(CLONE_NEWUSER|CLONE_NEWNS) or die "unshare(NEWUSER|NEWNS): $!\n";
> 
> I guess we can't set our "own" maps here for lack of capabilities and
> avoid the whole afterfork thing entirely? at least I couldn't get it to
> work ;)
> 

AFAIU, yes.