[pve-devel] applied: [RFC pve-qemu] disable jemalloc

DERUMIER, Alexandre Alexandre.DERUMIER at groupe-cyllene.com
Sat Mar 11 14:14:30 CET 2023


On Saturday, 11 March 2023 at 10:01 +0100, Thomas Lamprecht wrote:
> Hi,
> 
> On 10/03/2023 at 19:05, DERUMIER, Alexandre wrote:
> > I'm currently benchmarking QEMU with librbd and different memory allocators again.
> > 
> > 
> > It seems there is still a performance problem with the default glibc
> > allocator: around 20-25% fewer IOPS and higher latency.
> 
> Are those numbers compared to jemalloc or tcmalloc?
> 
Oh sorry,

tcmalloc. (I'm getting almost the same results with jemalloc, maybe a
little lower and less stable.)


> Also, a key problem with allocator tuning is that it's heavily
> dependent on the workload of each specific library (i.e., not only
> QEMU itself but also the specific block backend library).
> 
> > 
Yes, it should mainly help librbd. I don't think it helps other storage backends.



> > From my benchmark, I'm at around 60k IOPS vs 80-90k IOPS with 4k randread.
> > 
> > Red Hat has also noticed it.
> > 
> > 
> > I know that jemalloc was buggy with the Rust lib && the PBS block
> > driver, but have you evaluated tcmalloc?
> 
> Yes, for PBS once - it was way worse in how it generally worked than
> either jemalloc or the default glibc IIRC, but I don't think I checked
> for latency, as back then we tracked down freed memory that the
> allocator did not give back to the system to how they internally try
> to keep a pool of available memory around.
> 
I know that jemalloc can have strange effects on memory. (Ceph was
using jemalloc some years ago with this kind of side effect, and they
later migrated to tcmalloc.)


> So for latency it might be a win, but IMO I'm not too sure if the
> other effects it has are worth that.
> 
> > 
Yes, latency is my main objective, mainly for Ceph synchronous writes
with low iodepth; they are pretty slow, so a 20% improvement is really
big.

> > Note that it's possible to load it dynamically with LD_PRELOAD,
> > so maybe we could add an option in the VM config to enable it?
> > 

> I'm not 100% sure if QEMU copes well with preloading it via the
> dynlinker as is, or if we need to hard-disable malloc_trim support for
> it then. Currently, with the "system" allocator (glibc), malloc_trim
> is called (semi-)periodically via call_rcu_thread - and at least
> QEMU's meson build system config disables malloc_trim for tcmalloc or
> jemalloc.
> 
> 
> Or did you already test this directly on QEMU, not just an rbd bench?
> If so, I'd be open to adding some tuning config with an allocator
> sub-property in there to our CFGs.
> 

I have tried it directly in QEMU, with:

"
    my $run_qemu = sub {
        PVE::Tools::run_fork sub {

            $ENV{LD_PRELOAD} = "/usr/lib/x86_64-linux-gnu/libtcmalloc.so.4";

            PVE::Systemd::enter_systemd_scope($vmid, "Proxmox VE VM $vmid", %systemd_properties);
"

I really don't know about malloc_trim; the initial discussion about it
is here:
https://patchwork.ozlabs.org/project/qemu-devel/patch/1510899814-19372-1-git-send-email-yang.zhong@intel.com/
and indeed, it's disabled when building with tcmalloc/jemalloc, but I
don't know how it behaves with dynamic loading.

But I don't get any crashes or segfaults.
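
As a quick sanity check that the preload actually took effect, one could
grep the QEMU process maps for the library - a minimal sketch, assuming
$qemu_pid holds the VM's QEMU pid:

"
    # sanity-check sketch: $qemu_pid is assumed to hold the VM's QEMU pid
    open(my $fh, '<', "/proc/$qemu_pid/maps") or die "open maps: $!\n";
    my $mapped = grep { /libtcmalloc/ } <$fh>;
    close($fh);
    print $mapped ? "tcmalloc is mapped\n" : "tcmalloc is NOT mapped\n";
"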



