[pve-devel] [PATCH qemu] add pve-api-updates trigger

Mon Aug 12 14:27:55 CEST 2019

Am 8/12/19 um 12:06 PM schrieb Dominik Csapak:
> On 8/12/19 11:43 AM, Thomas Lamprecht wrote:
>> Am 8/12/19 um 10:37 AM schrieb Dominik Csapak:
>>> we want to notify the api that there is a new qemu-binary, since
>>> the version will be cached in qemu-server and instead of
>>> checking the version every time, just restart pveproxy/pvedaemon
>>> whenever there is a qemu update
>>>
>>> this fixes a rare theoretical issue when only updating qemu, that
>>> the pvedaemon starts a vm with a new version but with the
>>> defaults of an old version because of the version cache,
>>> breaking live migration
>>>
>>> Signed-off-by: Dominik Csapak <d.csapak at proxmox.com>
>>> ---
>>> i do not know if that issue was ever triggered, but it seems
>>> very unlikely, so this is just to be safe
>>>
>>> the other alternative, either dont cache the version, or caching
>>> and checking the file timestamp would also work, but this is the 'cheaper'
>>> solution overall, since we do not update pve-qemu-kvm that often
>>
>> This also triggers a full HA LRM/CRM restart, not really cheap either.
>>
>> Is the LRM with the direct access to the API module (in a always newly
>> forked worker) also affected by the caching "issue" from a call to the
>> "QemuServer::kvm_user_version" method?
> 
> afaics yes, since we do a 'require and import' of the QemuServer package
> in the BEGIN block which does not get rexecuted after forking there,
> am i right?

Yeah, you're right.

>>
>> Else I'd either just:
>>
>> * restart pve- proxy/daemon "manually" in the configure step
> 
> is not solving the problem for lrm (if affected)
> 
>> * improve the caching detection by doing your proposed "/usr/sbin/kvm"
>>    stat call
> 
> possible and ok for me
> 
>> * removing caching entirely
> 
> i think we should not do that

As said, measure and then decide with some facts as a base..

> 
>>
>> Maybe you could benchmark if the removal of the caching is really a big
>> performance "hit", I'd guess so as fork+exec is not too cheap..
>> If it's not just a few percent points I'd to the stat thing, I really
>> want to avoid LRM/CRM restarts if possible.
> 
> i proposed this change since the frequency of which we update qemu is
> rather low (~once per month) which mostly is shipped together with other
> updates which trigger the api-update anyway (e.g. pve-manager,qemu-server,pve-storage,etc.)
> instead of having a penalty on every vm start...

It would be great to first find out how big that penality is in the
first place ;) While I'd also guess that it may be in the uncofortable
range it's just easier to decide from actual values. If it adds < 1ms
I'd see it as just fine.

> 
> but i am ok with the stat approach
> 

IMO it's more correct, caching normally needs do be done in a way where it
returns the correct value, sooner or later, this can only be omitted if one
knows for sure that the cache get invalidate periodically anyway. As this
is not the case here we either need to fix caching or omit it, as it's a
bug else, and one that should not be solved by restarting the whole services.

Ah alternative could also be to add a cache invalidation method which sends
a signal (e.g., USR1 or the like) to the child processes calling a (new)
method which sets the kvm_user_version (and possible others?) to undef/0,
or whatever the value for "I'm not cached" is.. But that needs probably some
more thought about general design (to include other users, if any) and may
be over-engineered. So I'd go for the stat approach.

Another alternative could be to have a version file somewhere in /run (tmpfs,
so memory, so fast) which gets updated by:
* pve-qemu packages postinst
* the kvm_user_version call if it does not exists as fallback

just as an idea, but please no restart of >= 5 services just for this..