[pve-devel] [RFC PATCH pve-cluster] fix #5728: pmxcfs: allow bigger writes than 4k for fuse
Dominik Csapak
d.csapak at proxmox.com
Thu Sep 19 14:45:08 CEST 2024
On 9/19/24 14:01, Thomas Lamprecht wrote:
> Am 19/09/2024 um 11:52 schrieb Dominik Csapak:
>> by default libfuse2 limits writes to 4k size, which means that on writes
>> bigger than that, we do a whole write cycle for each 4k block that comes
>> in. To avoid that, add the option 'big_writes' to allow writes bigger
>> than 4k at once.
>>
>> This should improve pmxcfs performance for situations where we often
>> write large files (e.g. big ha status) and maybe reduce writes to disk.
>
> Should? Something like before/after for benchmark numbers, flamegraphs
> would be really good to have, without those it's rather hard to discuss
> this, and I'd like to avoid having to do those, or check the inner workings
> of the affected fuse userspace/kernel code paths here myself.
well I mean the code change is relatively small and the result is rather clear:
in the current case we have the following calls from pmxcfs (shortened for e-mail)
when writing a single 128k block:
(dd if=... of=/etc/pve/test bs=128k count=1)
---
[main] debug: enter cfs_fuse_write /test 4096 0 (pmxcfs.c:355:cfs_fuse_write)
[main] debug: find_plug start test (pmxcfs.c:103:find_plug)
[main] debug: cfs_plug_base_lookup_plug test (cfs-plug.c:52:cfs_plug_base_lookup_plug)
[main] debug: cfs_plug_base_lookup_plug name = test new path = (null)
(cfs-plug.c:59:cfs_plug_base_lookup_plug)
[main] debug: find_plug end test = 0x64319c3e0bc0 (test) (pmxcfs.c:110:find_plug)
[main] debug: enter cfs_plug_base_write test 4096 0 (cfs-plug.c:294:cfs_plug_base_write)
[main] debug: memdb_pwrite . test 0000000000DE88EC 0000000000DE88ED (memdb.c:850:memdb_pwrite)
[database] debug: enter backend_write_inode 0000000000DE88EC 'test', size 4096
(database.c:157:backend_write_inode)
[database] debug: enter backend_write_inode 0000000000000000 '__version__', size 0
(database.c:157:backend_write_inode)
[main] debug: leave cfs_fuse_write /test (4096) (pmxcfs.c:368:cfs_fuse_write)
[main] debug: enter cfs_fuse_write /test 4096 4096 (pmxcfs.c:355:cfs_fuse_write)
[main] debug: find_plug start test (pmxcfs.c:103:find_plug)
[main] debug: cfs_plug_base_lookup_plug test (cfs-plug.c:52:cfs_plug_base_lookup_plug)
[main] debug: cfs_plug_base_lookup_plug name = test new path = (null)
(cfs-plug.c:59:cfs_plug_base_lookup_plug)
[main] debug: find_plug end test = 0x64319c3e0bc0 (test) (pmxcfs.c:110:find_plug)
[main] debug: enter cfs_plug_base_write test 4096 4096 (cfs-plug.c:294:cfs_plug_base_write)
[main] debug: memdb_pwrite . test 0000000000DE88EC 0000000000DE88EE (memdb.c:850:memdb_pwrite)
[database] debug: enter backend_write_inode 0000000000DE88EC 'test', size 8192
(database.c:157:backend_write_inode)
[database] debug: enter backend_write_inode 0000000000000000 '__version__', size 0
(database.c:157:backend_write_inode)
[main] debug: leave cfs_fuse_write /test (4096) (pmxcfs.c:368:cfs_fuse_write)
[main] debug: enter cfs_fuse_write /test 4096 8192 (pmxcfs.c:355:cfs_fuse_write)
---
and so on until we reach
---
[main] debug: enter cfs_fuse_write /test 4096 126976 (pmxcfs.c:355:cfs_fuse_write)
[main] debug: find_plug start test (pmxcfs.c:103:find_plug)
[main] debug: cfs_plug_base_lookup_plug test (cfs-plug.c:52:cfs_plug_base_lookup_plug)
[main] debug: cfs_plug_base_lookup_plug name = test new path = (null)
(cfs-plug.c:59:cfs_plug_base_lookup_plug)
[main] debug: find_plug end test = 0x64319c3e0bc0 (test) (pmxcfs.c:110:find_plug)
[main] debug: enter cfs_plug_base_write test 4096 126976 (cfs-plug.c:294:cfs_plug_base_write)
[main] debug: memdb_pwrite . test 0000000000DE88EC 0000000000DE890C (memdb.c:850:memdb_pwrite)
[database] debug: enter backend_write_inode 0000000000DE88EC 'test', size 131072
(database.c:157:backend_write_inode)
[database] debug: enter backend_write_inode 0000000000000000 '__version__', size 0
(database.c:157:backend_write_inode)
[main] debug: leave cfs_fuse_write /test (4096) (pmxcfs.c:368:cfs_fuse_write)
---
with my changes it results in:
---
[main] debug: enter cfs_fuse_write /test 131072 0 (pmxcfs.c:355:cfs_fuse_write)
[main] debug: find_plug start test (pmxcfs.c:103:find_plug)
[main] debug: cfs_plug_base_lookup_plug test (cfs-plug.c:52:cfs_plug_base_lookup_plug)
[main] debug: cfs_plug_base_lookup_plug name = test new path = (null)
(cfs-plug.c:59:cfs_plug_base_lookup_plug)
[main] debug: find_plug end test = 0x62f47945f360 (test) (pmxcfs.c:110:find_plug)
[main] debug: enter cfs_plug_base_write test 131072 0 (cfs-plug.c:294:cfs_plug_base_write)
[main] debug: memdb_pwrite . test 0000000000DE88EC 0000000000DE8AB8 (memdb.c:850:memdb_pwrite)
[database] debug: enter backend_write_inode 0000000000DE88EC 'test', size 131072
(database.c:157:backend_write_inode)
[database] debug: enter backend_write_inode 0000000000000000 '__version__', size 0
(database.c:157:backend_write_inode)
[main] debug: leave cfs_fuse_write /test (131072) (pmxcfs.c:368:cfs_fuse_write)
---
so a factor of 32 less calls to cfs_fuse_write (including memdb_pwrite)
>
>>
>> If we'd change to libfuse3, this would be a non-issue, since that option
>> got removed and is the default there.
>
> I'd prefer that. At least if done with the future PVE 9.0, as I do not think
> it's a good idea in the middle of a stable release cycle.
why not this change now, and the rewrite to libfuse3 later? that way we can
have some improvements now too...
>
> FWIW, some Debian Devs also want to push the fuse 2 to 3 migration forward
> [0]. So, sooner or later we have to switch over pmxcfs anyway, and there's
> already a POC from Fabian [1].
>
> [0]: https://lists.debian.org/debian-devel/2024/08/msg00177.html
> [1]: https://lore.proxmox.com/pve-devel/20220715074742.3387301-1-f.gruenbichler@proxmox.com/
>
>
sure I can look at that when we're nearing 9.0
More information about the pve-devel
mailing list