[pve-devel] [PATCH qemu-server 08/10] memory: add virtio-mem support

Fiona Ebner f.ebner at proxmox.com
Tue Dec 20 11:26:32 CET 2022


Am 20.12.22 um 10:50 schrieb DERUMIER, Alexandre:
>>
>>> +    my $blocksize = ($MAX_MEM - $static_memory) / 32000;>>> +    #round next power of 2
>>> +    $blocksize = 2**(int(log($blocksize)/log(2))+1);
>>
>> What if log($blocksize)/log(2) is exactly an integer? Do we still
>> want
>> to add 1 then? If not, please use ceil(...) instead of int(...)+1.
>> Well,
>> I guess it can't happen in practice by what values are possible for
>> $MAX_MEM and $static_memory, but still.
>>
> The math seem to be good to have power of 2 as blocksize
> 
> log(1)/log(2):0  blocksize:2
> log(2)/log(2):1  blocksize:4

Yes, but here you'd use 4, when 2 would also be good. If that doesn't
matter, feel free to keep it as-is; I was just wondering. It's likely
not even relevant in practice, because with the values $MAX_MEM and
$static_memory can take, I'm not sure we can even get integers for the
argument to the first log()?

> with a ceil, we don't have block begin to 2
> 
> log(1)/log(2):0  blocksize:1
> log(2)/log(2):1  blocksize:2

Isn't ($MAX_MEM - $static_memory) / 32000 always strictly greater than
1? And if it could get smaller than 1, we also might have issues with
the int()+1 approach, because the result of the first log() will become
negative.

To be on the safe side we could just move the minimum check up:

my $blocksize = ($MAX_MEM - $static_memory) / 32000;
$blocksize = 2 if $blocksize < 2;
$blocksize = 2**(ceil(log($blocksize)/log(2)));

>>> +    #2MB is the minimum to be aligned with THP
>>> +    $blocksize = 2 if $blocksize < 2;
>>
>> Nit: $blocksize is at least 2**1 after the previous caluclation, so
>> this
>> isn't really needed.
> yes,indeed.
> 
>>
>>> +           my $mem_device = "virtio-mem-pci,block-
>>> size=${blocksize}M,requested-size=${node_mem}M,id=$id,memdev=mem-
>>> $id,node=$i$pciaddr";
>>> +           $mem_device .= ",prealloc=on" if $conf->{hugepages};
>>
>> So prealloc=on for the device, but not prealloc=yes for the object
>> below[0]. Would you mind explaining it to me? I just found the part
>> mentioned for v7.0 here
>>
> oh, yes, sorry.
> Just look here for details:
> https://lists.gnu.org/archive/html/qemu-devel/2022-01/msg00902.html
> "A common use case for hugetlb will be using "reserve=off,prealloc=off"
> for
> the memory backend and "prealloc=on" for the virtio-mem device. This
> way, no huge pages will be reserved for the process, but we can recover
> if there are no actual huge pages when plugging memory. Libvirt is
> already prepared for this.
> "
> 

Thanks! Could you please add it to the commit message?

>>> diff --git a/PVE/QemuServer/PCI.pm b/PVE/QemuServer/PCI.pm
>>> index a18b974..0187c74 100644
>>> --- a/PVE/QemuServer/PCI.pm
>>> +++ b/PVE/QemuServer/PCI.pm
>>> @@ -249,6 +249,14 @@ sub get_pci_addr_map {
>>>         'scsihw2' => { bus => 4, addr => 1 },
>>>         'scsihw3' => { bus => 4, addr => 2 },
>>>         'scsihw4' => { bus => 4, addr => 3 },
>>> +       'virtiomem0' => { bus => 4, addr => 4 },
>>> +       'virtiomem1' => { bus => 4, addr => 5 },
>>> +       'virtiomem2' => { bus => 4, addr => 6 },
>>> +       'virtiomem3' => { bus => 4, addr => 7 },
>>> +       'virtiomem4' => { bus => 4, addr => 8 },
>>> +       'virtiomem5' => { bus => 4, addr => 9 },
>>> +       'virtiomem6' => { bus => 4, addr => 10 },
>>> +       'virtiomem7' => { bus => 4, addr => 11 },
>>
>> What if $conf->{sockets} > 8? Maybe mention the limitation in the
>> description of the 'virtio' property in the 'memory' string. Is the
>> plan
>> to just add on more virtiomem PCI devices in the future?
>>
> Well, do we really support more than 8 sockets ? 
> I'm not sure than current numa implementation is working with more than
> 8 socket, and in real world, I never have seen servers with more than 8
> sockets. (I think super-expensive hardware with 16 - 32 sockets is the
> max)
> AFAIK, The main usecase of sockets/numa node, is for auto
> numabalancing, to try to map the qemu numa nodes to physical numa
> nodes.
> 
> If needed, we could a dedicated pci bridge (like for virtioscsi), with
> 32 devices.
> and use something like virtiomem0 = addr=1 , virtiomem1=addr=2
> 
> 

Okay, makes sense. You never know what the future brings, but if it'll
take some time until we need to add more (and not that much more) I
guess it's fine.





More information about the pve-devel mailing list