[pve-devel] [PATCH qemu-server 08/10] memory: add virtio-mem support
Fiona Ebner
f.ebner at proxmox.com
Tue Dec 20 11:26:32 CET 2022
Am 20.12.22 um 10:50 schrieb DERUMIER, Alexandre:
>>
>>> + my $blocksize = ($MAX_MEM - $static_memory) / 32000;>>> + #round next power of 2
>>> + $blocksize = 2**(int(log($blocksize)/log(2))+1);
>>
>> What if log($blocksize)/log(2) is exactly an integer? Do we still
>> want
>> to add 1 then? If not, please use ceil(...) instead of int(...)+1.
>> Well,
>> I guess it can't happen in practice by what values are possible for
>> $MAX_MEM and $static_memory, but still.
>>
> The math seem to be good to have power of 2 as blocksize
>
> log(1)/log(2):0 blocksize:2
> log(2)/log(2):1 blocksize:4
Yes, but here you'd use 4, when 2 would also be good. If that doesn't
matter, feel free to keep it as-is; I was just wondering. It's likely
not even relevant in practice, because with the values $MAX_MEM and
$static_memory can take, I'm not sure we can even get integers for the
argument to the first log()?
> with a ceil, we don't have block begin to 2
>
> log(1)/log(2):0 blocksize:1
> log(2)/log(2):1 blocksize:2
Isn't ($MAX_MEM - $static_memory) / 32000 always strictly greater than
1? And if it could get smaller than 1, we also might have issues with
the int()+1 approach, because the result of the first log() will become
negative.
To be on the safe side we could just move the minimum check up:
my $blocksize = ($MAX_MEM - $static_memory) / 32000;
$blocksize = 2 if $blocksize < 2;
$blocksize = 2**(ceil(log($blocksize)/log(2)));
>>> + #2MB is the minimum to be aligned with THP
>>> + $blocksize = 2 if $blocksize < 2;
>>
>> Nit: $blocksize is at least 2**1 after the previous caluclation, so
>> this
>> isn't really needed.
> yes,indeed.
>
>>
>>> + my $mem_device = "virtio-mem-pci,block-
>>> size=${blocksize}M,requested-size=${node_mem}M,id=$id,memdev=mem-
>>> $id,node=$i$pciaddr";
>>> + $mem_device .= ",prealloc=on" if $conf->{hugepages};
>>
>> So prealloc=on for the device, but not prealloc=yes for the object
>> below[0]. Would you mind explaining it to me? I just found the part
>> mentioned for v7.0 here
>>
> oh, yes, sorry.
> Just look here for details:
> https://lists.gnu.org/archive/html/qemu-devel/2022-01/msg00902.html
> "A common use case for hugetlb will be using "reserve=off,prealloc=off"
> for
> the memory backend and "prealloc=on" for the virtio-mem device. This
> way, no huge pages will be reserved for the process, but we can recover
> if there are no actual huge pages when plugging memory. Libvirt is
> already prepared for this.
> "
>
Thanks! Could you please add it to the commit message?
>>> diff --git a/PVE/QemuServer/PCI.pm b/PVE/QemuServer/PCI.pm
>>> index a18b974..0187c74 100644
>>> --- a/PVE/QemuServer/PCI.pm
>>> +++ b/PVE/QemuServer/PCI.pm
>>> @@ -249,6 +249,14 @@ sub get_pci_addr_map {
>>> 'scsihw2' => { bus => 4, addr => 1 },
>>> 'scsihw3' => { bus => 4, addr => 2 },
>>> 'scsihw4' => { bus => 4, addr => 3 },
>>> + 'virtiomem0' => { bus => 4, addr => 4 },
>>> + 'virtiomem1' => { bus => 4, addr => 5 },
>>> + 'virtiomem2' => { bus => 4, addr => 6 },
>>> + 'virtiomem3' => { bus => 4, addr => 7 },
>>> + 'virtiomem4' => { bus => 4, addr => 8 },
>>> + 'virtiomem5' => { bus => 4, addr => 9 },
>>> + 'virtiomem6' => { bus => 4, addr => 10 },
>>> + 'virtiomem7' => { bus => 4, addr => 11 },
>>
>> What if $conf->{sockets} > 8? Maybe mention the limitation in the
>> description of the 'virtio' property in the 'memory' string. Is the
>> plan
>> to just add on more virtiomem PCI devices in the future?
>>
> Well, do we really support more than 8 sockets ?
> I'm not sure than current numa implementation is working with more than
> 8 socket, and in real world, I never have seen servers with more than 8
> sockets. (I think super-expensive hardware with 16 - 32 sockets is the
> max)
> AFAIK, The main usecase of sockets/numa node, is for auto
> numabalancing, to try to map the qemu numa nodes to physical numa
> nodes.
>
> If needed, we could a dedicated pci bridge (like for virtioscsi), with
> 32 devices.
> and use something like virtiomem0 = addr=1 , virtiomem1=addr=2
>
>
Okay, makes sense. You never know what the future brings, but if it'll
take some time until we need to add more (and not that much more) I
guess it's fine.
More information about the pve-devel
mailing list