[pve-devel] [PATCH manager] pvestatd: improve broadcast of node version-info

Fiona Ebner f.ebner at proxmox.com
Thu Feb 27 16:00:35 CET 2025


On 27.02.25 at 15:52, Fabian Grünbichler wrote:
> On February 27, 2025 9:59 am, Fiona Ebner wrote:
>> On 26.02.25 at 17:02, Aaron Lauterer wrote:
>>>
>>>
>>> On 2025-01-17 13:18, Fiona Ebner wrote:
>>>> On 16.01.25 at 17:30, Aaron Lauterer wrote:
>>>>> Until now, pvestatd broadcast the pve-manager version only once,
>>>>> after startup of the service. But there are situations where the
>>>>> local pmxcfs (pve-cluster) restarts and loses that information:
>>>>> basically every time we restart pmxcfs without restarting pvestatd
>>>>> too.
>>>>>
>>>>> For example, on a cluster join, or if the pmxcfs has been restarted
>>>>> manually.
>>>>>
>>>>> By additionally checking if the local kv-store of the pmxcfs has any
>>>>> version info for the node, we can decide if another broadcast is
>>>>> necessary.
>>>>> Therefore, after the next run of pvestatd, we should have the full
>>>>> version info available again.
>>>>>
>>>>> Signed-off-by: Aaron Lauterer <a.lauterer at proxmox.com>
>>>>> ---
>>>>> This patch is preparation for getting reliable version info, as I am
>>>>> picking up Folke's patch series to include more metrics in the RRD
>>>>> data and summary graphs. [0]
>>>>> This was a big blocker, and now with the major version change coming
>>>>> up, we can at least assume that the latest 8.x is installed as part
>>>>> of the update to PVE 9.
>>>>> Therefore, we should get this in with PVE 8. Additional patches for PVE
>>>>> 8 will follow to make the transition smoother. But as mentioned, this
>>>>> here is one of the things that needs to work reliably, which is why I
>>>>> submit the patch already now.
>>>>
>>>> If we start relying more on this, we likely also want:
>>>> https://lore.proxmox.com/pve-devel/20221006125414.58279-1-f.ebner@proxmox.com/
>>>
>>> Hmm, honestly, I might prefer having the last known version info still
>>> present. That would make it easier to determine if all cluster nodes are
>>> on at least a required version ;).
>>
>> That is an edge case where it might be useful, but I'd argue that in
>> general, it can be problematic to rely on stale information, especially
>> if you can't detect if it's stale or not. And IMHO, it's worth doing
>> properly here too, i.e. wait for the node to send its current version.
>> You already need to wait for nodes that were not online before.
> 
> we could make it detectable by including a timestamp? that way, if using
> stale information is (not) okay, that decision can be made by the
> consumer of the information, instead of only allowing either variant?

If it's broadcast only once then the timestamp doesn't help much? Or do
you mean also keeping track/checking when the node last joined the
quorum to decide?
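
To make the second option concrete, here is a minimal sketch in Python of
what such a consumer-side check might look like. It is not pvestatd or
pmxcfs code: `VersionInfo`, `broadcast_at`, `last_quorum_join` and
`allow_stale` are hypothetical names, and it assumes the kv-store entry
carries a broadcast timestamp and that the consumer can learn when the
node last joined the quorum.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class VersionInfo:
    """Hypothetical kv-store entry: version string plus broadcast time."""
    version: str
    broadcast_at: float  # Unix time at which the node broadcast this entry


def is_stale(info: Optional[VersionInfo], last_quorum_join: float) -> bool:
    """A missing entry, or one broadcast before the node's most recent
    quorum join, is treated as stale (the node may have changed since)."""
    if info is None:
        return True
    return info.broadcast_at < last_quorum_join


def usable_version(
    info: Optional[VersionInfo],
    last_quorum_join: float,
    allow_stale: bool = False,
) -> Optional[str]:
    """Consumer-side policy: return the version string, or None if there
    is no entry or the entry is stale and stale data is not acceptable."""
    if info is None:
        return None
    if is_stale(info, last_quorum_join) and not allow_stale:
        return None
    return info.version
```

With this shape, a consumer that only needs "was at least version X at
some point" could pass `allow_stale=True`, while one that needs current
information would keep the default and wait for a fresh broadcast.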



