[pve-devel] [PATCH v2 qemu-server 8/9] memory: add virtio-mem support

Fiona Ebner f.ebner at proxmox.com
Wed Jan 25 11:52:09 CET 2023


On 25.01.23 at 11:28, DERUMIER, Alexandre wrote:
>> Sure, doing it in parallel is perfectly fine. I'm just thinking that
>> switching gears (too early) and redirecting the request might not be
>> ideal. You also issue an additional qom-set to go back to
>> $virtiomem->{current} * 1024 *1024 if a request didn't make progress
>> in
>> time. But to be sure that request worked, we'd also need to monitor
>> it
>> ;) I think issuing that request is fine as-is, but if there is a
>> "hanging" device, we really can't do much. And I do think the user
>> should be informed via an appropriate error if there is a problematic
>> device.
>>
>> Maybe we can use 10 seconds instead of 5 (2-3 seconds already sounds
>> too
>> close to 5 IMHO), so that we have a good margin, and just die instead
>> of
>> trying to redirect the request to another device. After issuing the
>> reset request and writing our best guess of what the memory is to the
>> config of course.
>>
> I forgot to say that it doesn't time out 5s after the qom-set,
> but times out after 5s if no memory change is detected by qom-get.
> I'm resetting the retry counter if a change is detected.
> (so 5s is really generous; in practice, when it's blocked for 1s, it's
> really blocked)

I saw that and thought the "It can take 2-3second on unplug on bigger
setup" meant that it can take that long for a change to happen. Since
that's not the case, 5 seconds can of course be fine :)
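
Just to make sure I'm reading the retry logic correctly, roughly this is
what I'd picture (only a sketch, not your patch; the helper name and the
device path are placeholders, and error handling is omitted):

use PVE::QemuServer::Monitor qw(mon_cmd);

# hypothetical helper, just to illustrate the poll/retry idea
sub wait_for_virtiomem_resize {
    my ($vmid, $path, $requested) = @_;

    mon_cmd($vmid, 'qom-set',
        path => $path, property => 'requested-size', value => $requested);

    my $last = mon_cmd($vmid, 'qom-get', path => $path, property => 'size');
    my $retries = 0;
    while (1) {
        sleep 1;
        my $size = mon_cmd($vmid, 'qom-get', path => $path, property => 'size');
        return if $size == $requested;    # target reached
        if ($size != $last) {
            $retries = 0;                 # progress detected -> reset counter
            $last = $size;
        } elsif (++$retries >= 5) {       # ~5s without any change -> give up
            die "virtio-mem device '$path' did not make progress\n";
        }
    }
}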

>> If it really is an issue in practice that certain devices often take
>> too
>> long for whatever reason, we can still add the redirect logic. But
>> starting out, I feel like it's not worth the additional complexity.
>>
> The only real reason will be on unplug, if memory blocks are unmovable
> (kernel reserved) or if, with heavy fragmentation, no block is
> available. (With 4MB granularity it's difficult to hit, but with bigger
> maxmem and bigger blocks we have more chance to trigger it; with 1GB
> hugepages it's easier to trigger too, I think.)
>
> But if you want something simpler,
> 
> I can do something like before: split the memory by the number of
> sockets, and if we have an error on one socket, don't try to redispatch
> the remaining blocks of this socket on other nodes.
Hope it's not too much work. I do think we should wait for some user
demand before adding the redispatch feature (and having it as a separate
patch helps with review and the git history). But feel free to include it
if you've already experienced the above-mentioned issue a few times ;)
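
For reference, the simpler per-socket variant you describe would then be
roughly (again only a sketch, reusing the hypothetical helper from above;
the virtiomemN device ids are a guess, the actual patch may differ):

# hypothetical per-socket dispatch, without redispatching on error
sub virtiomem_set_per_socket {
    my ($vmid, $sockets, $target_total) = @_;

    my $per_socket = int($target_total / $sockets);    # split evenly
    for my $socket (0 .. $sockets - 1) {
        my $path = "/machine/peripheral/virtiomem$socket";
        eval { wait_for_virtiomem_resize($vmid, $path, $per_socket) };
        if (my $err = $@) {
            # leave this socket where it got stuck instead of redispatching
            # its remaining blocks to other sockets
            warn "virtio-mem on socket $socket: $err";
        }
    }
}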




