[pve-devel] applied: [RFC pve-qemu] disable jemalloc

DERUMIER, Alexandre Alexandre.DERUMIER at groupe-cyllene.com
Mon Mar 13 08:17:54 CET 2023


I have done tests with a small C program calling malloc_trim(0),
and it does not break or segfault with tcmalloc loaded via LD_PRELOAD.

I don't think that tcmalloc overrides this specific glibc function, but
maybe malloc_trim is just trimming empty glibc malloc memory.
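
For reference, a minimal sketch of such a test program (reconstructed
here, not the exact code that was run):

    /* trimtest.c - build: gcc -o trimtest trimtest.c
     * run plain, then with:
     *   LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libtcmalloc.so.4 ./trimtest
     */
    #include <stdio.h>
    #include <stdlib.h>
    #include <malloc.h>   /* glibc extension: declares malloc_trim() */

    int main(void)
    {
        /* allocate and free some memory so there is something to trim */
        for (int i = 0; i < 1000; i++)
            free(malloc(64 * 1024));

        /* returns 1 if memory was released to the system, 0 otherwise */
        printf("malloc_trim(0) returned %d\n", malloc_trim(0));
        return 0;
    }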


I have run 2 days of continuous fio benchmarking in a VM with tcmalloc
preloaded, and I don't have any problem.

But the speed difference is night and day: with iodepth=64 4k randread,
it's something like 85-90k iops on average (with some spikes at 120k)
with the preload vs 50k iops (with spikes to 60k) without it.
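
For reference, the fio job was along these lines (a sketch, not the
exact job file; the target device is a placeholder):

    # run inside the VM; /dev/vdb is a placeholder test disk
    fio --name=randread --ioengine=libaio --direct=1 --rw=randread \
        --bs=4k --iodepth=64 --numjobs=1 --time_based --runtime=60 \
        --filename=/dev/vdb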


If that's OK with you, I'll send a patch with something like:

vmid.conf
---------
memory_allocator: glibc|tcmalloc


and simply add the LD_PRELOAD to the systemd unit when the VM is
starting?
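
The effect would just be to start the QEMU process with the allocator
preloaded, roughly like this (illustrative only; in practice the
variable is set in the environment before entering the systemd scope,
as in the run_fork snippet quoted below):

    LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libtcmalloc.so.4 \
        qemu-system-x86_64 ...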


On Saturday, 11 March 2023 at 13:14 +0000, DERUMIER, Alexandre wrote:
> On Saturday, 11 March 2023 at 10:01 +0100, Thomas Lamprecht wrote:
> > Hi,
> > 
> > On 10/03/2023 at 19:05, DERUMIER, Alexandre wrote:
> > > I'm currently benchmarking QEMU again with librbd and memory
> > > allocators.
> > > 
> > > 
> > > It seems there are still performance problems with the default
> > > glibc allocator: around 20-25% fewer iops and higher latency.
> > 
> > Are those numbers compared to jemalloc or tcmalloc?
> > 
> Oh sorry,
> 
> tcmalloc. (I'm getting almost the same results with jemalloc, maybe
> a little worse/more unstable.)
> 
> 
> > Also, a key problem with allocator tuning is that it's heavily
> > dependent on the workload of each specific library (i.e., not only
> > QEMU itself but also the specific block backend library).
> > 
> > > 
> Yes, it should help librbd mainly. I don't think it helps other
> storage types.
> 
> 
> 
> > > From my benchmark, I'm at around 60k iops vs 80-90k iops with 4k
> > > randread.
> > > 
> > > Red Hat have also noticed it.
> > > 
> > > 
> > > I know that jemalloc was buggy with the Rust lib && the PBS block
> > > driver, but have you evaluated tcmalloc?
> > 
> > Yes, for PBS once - it was way worse in how it generally worked
> > than either jemalloc or the default glibc IIRC, but I don't think I
> > checked for latency. Back then we tracked down freed memory that
> > the allocator did not give back to the system to how they
> > internally try to keep a pool of available memory around.
> > 
> I know that jemalloc can have strange effects on memory. (Ceph was
> using jemalloc some years ago with this kind of side effect, and they
> migrated to tcmalloc later.)
> 
> 
> > So for latency it might be a win, but IMO I'm not too sure if the
> > other effects it has are worth that.
> > 
> > > 
> Yes, latency is my main objective, mainly for Ceph synchronous
> writes with low iodepth; they are pretty slow, so a 20% improvement
> is really big.
> 
> > > Note that it's possible to load it dynamically with LD_PRELOAD,
> > > so maybe we could add an option in the VM config to enable it?
> > > 
> 
> > I'm not 100% sure if QEMU copes well with preloading it via the
> > dynamic linker as is, or if we need to hard-disable malloc_trim
> > support for it then. Currently, with the "system" allocator (glibc),
> > malloc_trim is called (semi-)periodically via call_rcu_thread - and
> > at least QEMU's meson build system config disables malloc_trim for
> > tcmalloc or jemalloc.
> > 
> > 
> > Or did you already test this directly on QEMU, not just rbd bench?
> > As then I'd be open to adding some tuning config with an allocator
> > sub-property to our CFGs.
> > 
> 
> I have tried directly in qemu, with
> 
> "
>     my $run_qemu = sub {
>         PVE::Tools::run_fork sub {
> 
>             $ENV{LD_PRELOAD} = "/usr/lib/x86_64-linux-gnu/libtcmalloc.so.4";
> 
>             PVE::Systemd::enter_systemd_scope($vmid, "Proxmox VE VM $vmid", %systemd_properties);
> "
> 
> I really don't know about malloc_trim; the initial discussion about
> it is here:
> https://patchwork.ozlabs.org/project/qemu-devel/patch/1510899814-19372-1-git-send-email-yang.zhong@intel.com/
> Indeed, it's disabled when building with tcmalloc/jemalloc, but I
> don't know about dynamic loading.
> 
> But I don't have any crash or segfault.


