[pve-devel] [PATCH pve-cluster 1/1] notify: add common_template_data

Fri Mar 28 11:04:09 CET 2025

On 2025-03-28 10:38, Thomas Lamprecht wrote:
> On 28.03.25 at 09:28, Lukas Wagner wrote:
>> We can of course cache the FQDN, but realistically speaking, this is only called once per
>> notification being sent, so any real-world performance impact is absolutely tiny.
> 
> Not so sure about that in general. For example, sending out notifications could
> correlate with an overloaded system, and on really overloaded systems
> things that are normally cheap suddenly aren't. In low-memory
> situations, even doing a plain fork+exec of a tiny binary can hang for
> a long time. Socket operations like the ones our helper does are definitely
> less problematic (I think, as I did not evaluate that under load [0], and it's
> something one can less easily experience directly compared to the former, where
> even starting a new basic dash shell on such an overloaded system can take minutes).

Yes, my 'impact is tiny' assessment was mostly based on already using PVE::Tools::get_fqdn.
For the old fork+exec version you are absolutely right; it could take a very long time on a
system at its limits.

> 
> And as the notification system now also handles things like HA events, it's
> definitely part of the more critical systems, which _can_ justify some extra
> scrutiny. That said, switching to the get_fqdn method indeed makes this quite
> cheap [0], so I'm fine with not doing any caching here for now. But
> let's not underestimate the impact of such things too much, especially for
> anything in critical chains that can matter in critical (load) situations
> (meant as a general strategy for everyone, I'm really not thinking of you
> specifically here, and it certainly is a balancing act).

That makes a lot of sense. Thanks for the detailed explanation, highly appreciated.
Actually, I will add caching right away; it's a pretty trivial change anyway.
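
A minimal sketch of what such a cache could look like, assuming a per-process cache is acceptable (i.e. the node's FQDN does not change while the daemon runs). cached_fqdn and clear_fqdn_cache are hypothetical names, not existing pve-cluster code. The resolver is passed in as a code ref so the sketch runs standalone; in practice it would presumably wrap PVE::Tools::get_fqdn. The example.com stub is purely illustrative:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Sketch only: assumes a node's FQDN stays stable for the process lifetime.
{
    # Lexical hash, visible only to the two subs in this block.
    my %fqdn_cache;

    # Hypothetical helper (not part of pve-cluster): return the FQDN for
    # $nodename, calling $resolver only on a cache miss. $resolver is assumed
    # to be something like sub { PVE::Tools::get_fqdn($_[0]) }.
    sub cached_fqdn {
        my ($nodename, $resolver) = @_;
        # '//=' only keeps defined values, so a failed (undef) lookup is
        # retried on the next call instead of being cached.
        $fqdn_cache{$nodename} //= $resolver->($nodename);
        return $fqdn_cache{$nodename};
    }

    # Hypothetical helper: drop all cached entries, e.g. after a hostname change.
    sub clear_fqdn_cache {
        %fqdn_cache = ();
    }
}

# Usage with a stub resolver that counts how often it is invoked.
my $calls = 0;
my $stub = sub { $calls++; return "$_[0].example.com" };

cached_fqdn('nina', $stub) for 1 .. 3;
print "fqdn: ", cached_fqdn('nina', $stub), ", resolver calls: $calls\n";
# prints "fqdn: nina.example.com, resolver calls: 1"
```

Not caching undef seems preferable here, as pinning a failed lookup for the whole lifetime of the process would be worse than retrying it.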

> 
> [0]: FWIW, I just did a quick evaluation of querying the FQDN 100,000 times
> with the socket variant and the hostname one. This was done on a very healthy
> system though; I'd expect the fork+exec one to degrade much worse under
> higher CPU/memory pressure. Test and result:
> 
> # perl -wE 'use PVE::Tools; use Time::HiRes qw(gettimeofday tv_interval); my $t0 = [gettimeofday]; for(my $i = 0; $i <= 100_000; $i++) { my $fqdn = PVE::Tools::get_fqdn("nina"); } my $elapsed = tv_interval ( $t0, [gettimeofday]); say "elapsed (s): ". $elapsed;' 
> elapsed (s): 0.436712
> 
> Same with 1 million runs gets me 4.368217 s, so it seems to scale quite linearly.
> 
> # perl -wE 'use Time::HiRes qw(gettimeofday tv_interval); my $t0 = [gettimeofday]; for(my $i = 0; $i <= 100_000; $i++) { my $fqdn = `hostname -f`; } my $elapsed = tv_interval ( $t0, [gettimeofday]); say "elapsed (s): ". $elapsed;'
> elapsed (s): 82.484177
> 
> Same with 1 million runs gets me 577.653117 s, so not fully linear, but
> either way about 188x and 132x slower, respectively.

Good to know, thanks for backing that up with some concrete data! :)
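
For reference, a quick back-of-the-envelope check of those numbers. The timings are copied from above; the iteration count assumes the `$i <= 100_000` loop condition, i.e. 100_001 iterations per 100k run:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Timings from the benchmark above, in seconds.
my $iterations    = 100_001;     # loop runs while $i <= 100_000
my $socket_100k   = 0.436712;    # PVE::Tools::get_fqdn
my $hostname_100k = 82.484177;   # `hostname -f` via fork+exec
my $socket_1m     = 4.368217;
my $hostname_1m   = 577.653117;

# Average cost of a single lookup, in microseconds.
my $socket_us   = $socket_100k   / $iterations * 1e6;
my $hostname_us = $hostname_100k / $iterations * 1e6;

# Slowdown of the fork+exec variant relative to the socket variant.
my $ratio_100k = $hostname_100k / $socket_100k;
my $ratio_1m   = $hostname_1m   / $socket_1m;

printf "per call: %.2f us (get_fqdn) vs %.0f us (hostname -f)\n",
    $socket_us, $hostname_us;
# prints "per call: 4.37 us (get_fqdn) vs 825 us (hostname -f)"
printf "slowdown: %.1fx (100k runs), %.1fx (1M runs)\n", $ratio_100k, $ratio_1m;
# prints "slowdown: 188.9x (100k runs), 132.2x (1M runs)"
```

So, on a healthy system, a single get_fqdn call costs a few microseconds versus close to a millisecond for the fork+exec variant.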

-- 
- Lukas