[pve-devel] [RFC qemu 1/1] block/rbd: add @keyring-file option to BlockdevOptionsRbd

Fiona Ebner f.ebner at proxmox.com
Mon May 12 16:36:18 CEST 2025


On 12.05.25 at 15:39, DERUMIER, Alexandre wrote:
> On 12.05.25 at 12:57, DERUMIER, Alexandre wrote:
>> for blockdev, do we still use a ceph config file in /var/run for
>> potential others rbd client options ?
> 
>>> Not currently, but we can add that later if we consider it worth it.
>>> We would need to merge with the storage's already existing ceph.conf
>>> and not only write the new options. For now, users can adapt their
>>> storage's ceph.conf as desired.
> 
> There is still this rbd_cache_policy workaround for the EFI disk to fix:
> https://bugzilla.proxmox.com/show_bug.cgi?id=3329
> 
> 
> # SPI flash does lots of read-modify-write OPs, without writeback this
> # gets really slow #3329
>       if ($path =~ m/^rbd:/) {
>           $var_drive_str .= ',cache=writeback';
>           # avoid write-around, we *need* to cache writes too
>           $path .= ':rbd_cache_policy=writeback';
>       }
> 
> 
> 
> I'm not sure, but maybe it's fixed in QEMU. The biggest problem was
> that every single byte written was pushed to the storage without any
> buffering (so it was pretty slow with RBD/CRUSH). But maybe it's OK
> now with:
> https://github.com/qemu/qemu/commit/284a7ee2e290e0c9b8cd3ea6164d92386933054f
> 
> (I haven't tested it.)

Good point!

Unfortunately, it's still very slow without the additional options in
current QEMU 9.2 (i.e. even after that commit).

I suppose this means we already do need per-drive configuration.

It's not ideal that qemu-server would know about storage-internal
details and need to re-write the Ceph config, though. I might abstract
that away by passing an additional $hints parameter or something similar
(e.g. 'writeback-cache' => 1 for the EFI disk).
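To illustrate the idea, a rough sketch of such a hints interface (entirely hypothetical: the function name and hint key are made up for illustration and are not existing qemu-server API; the caller states *what* it needs, and the storage layer decides *how* to achieve it for the given storage type):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sketch of the $hints idea.
sub qemu_blockdev_options {
    my ($path, $hints) = @_;
    $hints //= {};

    my $opts = { cache => 'none' };
    if ($hints->{'writeback-cache'}) {
        # For RBD, the plugin could additionally set
        # rbd_cache_policy=writeback in the generated Ceph config
        # instead of appending it to the path string.
        $opts->{cache} = 'writeback';
    }
    return $opts;
}

# EFI disk: SPI flash does many read-modify-write OPs, so the caller
# requests writeback caching without knowing any RBD specifics.
my $opts = qemu_blockdev_options(
    'rbd:pool/vm-100-disk-0', { 'writeback-cache' => 1 });
```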

We do have a similar situation (but with KRBD):
https://lore.proxmox.com/pve-devel/20241025111304.99680-1-f.weber@proxmox.com/

Replying to stuff from your other mail here too:

> There are interesting RBD client options that we could add later:
> https://bugzilla.proxmox.com/show_bug.cgi?id=6290
> crush_location=host:myhost|datacenter:mydc
> read_from_replica=localize

Those can/should simply be set in the storage's ceph.conf, or do they
need to be different per-volume or per-VM?
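For the global case, a sketch of what that could look like in the storage's ceph.conf (option names and values as written in the linked bug report, not verified here; "myhost" and "mydc" are placeholders):

```
[client]
crush_location = host:myhost|datacenter:mydc
read_from_replica = localize
```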



