[pbs-devel] Authentication performance
Shannon Sterz
s.sterz at proxmox.com
Fri Dec 20 14:22:18 CET 2024
On Thu Dec 19, 2024 at 10:56 AM CET, Mark Schouten wrote:
> Hi,
>
> We upgraded to 3.3 yesterday, not much gain to notice with regards to
> the new version or the change in keying. It’s still (obvioulsy) pretty
> busy.
just be aware that the patch i linked to in my last mail has not been
packaged yet, so you wouldn't see the impact of that patch yet.
> However, I also tried to remove some datastores, which failed with
> timeouts. PBS even stopped authenticating (so probably just working) all
> together for about 10 seconds, which was an unpleasant surprise.
>
> So looking into that further, I noticed the following logging:
> Dec 18 16:14:32 pbs005 proxmox-backup-proxy[39143]: GET
> /api2/json/admin/datastore/XXXXXX/status: 400 Bad Request: [client
> [::ffff]:42104] Unable to acquire lock
> "/etc/proxmox-backup/.datastore.lck" - Interrupted system call (os error
> 4)
> Dec 18 16:14:32 pbs005 proxmox-backup-proxy[39143]: GET
> /api2/json/admin/datastore/XXXXXX/status: 400 Bad Request: [client
> [::ffff]:42144] Unable to acquire lock
> "/etc/proxmox-backup/.datastore.lck" - Interrupted system call (os error
> 4)
> Dec 18 16:14:32 pbs005 proxmox-backup-proxy[39143]: GET
> /api2/json/admin/datastore/XXXXXX/status: 400 Bad Request: [client
> [::ffff]:47286] Unable to acquire lock
> "/etc/proxmox-backup/.datastore.lck" - Interrupted system call (os error
> 4)
> Dec 18 16:14:32 pbs005 proxmox-backup-proxy[39143]: GET
> /api2/json/admin/datastore/XXXXXX/status: 400 Bad Request: [client
> [::ffff:]:45994] Unable to acquire lock
> "/etc/proxmox-backup/.datastore.lck" - Interrupted system call (os error
> 4)
>
> Which surprised me, since this is a ’status’ call, which should not need
> locking of the datastore-config.
>
> https://git.proxmox.com/?p=proxmox-backup.git;a=blob;f=src/api2/admin/datastore.rs;h=c611f593624977defc49d6e4de2ab8185cfe09e9;hb=HEAD#l687
> does not lock the config, but
>
> https://git.proxmox.com/?p=proxmox-backup.git;a=blob;f=pbs-datastore/src/datastore.rs;h=0801b4bf6b25eaa6f306c7b39ae2cfe81b4782e1;hb=HEAD#l204
> does.
>
> So if I understand this correctly, every ’status’ call (30 per second in
> our case) locks the datastore-config exclusively. And also, every time
> ’status’ get called, the whole datastore-config gets loaded?
probably, there are some comments about that there already, it might
make sense to open a bugzilla issue to discuss this further [1].
[1]: https://bugzilla.proxmox.com/
> Is that something that could use some performance tuning?
>
> —
> Mark Schouten
> CTO, Tuxis B.V.
> +31 318 200208 / mark at tuxis.nl
>
>
> ------ Original Message ------
> From "Shannon Sterz" <s.sterz at proxmox.com>
> To "Mark Schouten" <mark at tuxis.nl>
> Cc "Proxmox Backup Server development discussion"
> <pbs-devel at lists.proxmox.com>
> Date 16/12/2024 12:51:47
> Subject Re: Re[2]: [pbs-devel] Authentication performance
>
> >On Mon Dec 16, 2024 at 12:23 PM CET, Mark Schouten wrote:
> >> Hi,
> >>
> >> >
> >> >would you mind sharing either `authkey.pub` or the output of the
> >> >following commands:
> >> >
> >> >head --lines=1 /etc/proxmox-backup/authkey.key
> >> >cat /etc/proxmox-backup/authkey.key | wc -l
> >>
> >> -----BEGIN RSA PRIVATE KEY-----
> >> 51
> >>
> >> So that is indeed the legacy method. We are going to upgrade our PBS’es
> >> on wednesday.
> >>
> >> >
> >> >The first should give the PEM header of the authkey whereas the second
> >> >provides the amount of lines that the key takes up in the file. Both
> >> >give an indication whether you are using the legacy RSA keys or newer
> >> >Ed25519 keys. The later should provide more performance, security should
> >> >not be affected much by this change. If the output of the commands look
> >> >like this:
> >> >
> >> >-----BEGIN PRIVATE KEY-----
> >> >3
> >> >
> >> >Then you are using the newer keys. There currently isn't a recommended
> >> >way to upgrade the keys. However, in theory you should be able to remove
> >> >the old keys, re-start PBS and it should just generate keys in the new
> >> >format. Note that this will logout anyone that is currently
> >> >authenticated and they'll have to re-authenticate.
> >>
> >> Seems like a good moment to update those keys as well.
> >
> >Sure, just be aware that you have to manually delete the key before
> >restarting the PBS. Upgrading alone won't affect the key. Ideally you'd
> >test this before rolling it out, if you can
> >
> >> >In general, tokens should still be fater to authenticate so we'd
> >> >recommend that you try to get your users to switch to token-based
> >> >authentication where possible. Improving performance there is a bit
> >> >trickier though, as it often comes with a security trade-off (in the
> >> >background we use yescrypt fo the authentication there, that
> >> >delibaretely adds a work factor). However, we may be able to improve
> >> >performance a bit via caching methods or similar.
> >>
> >> Yes, that might help. I’m also not sure if it actually is
> >> authentication, or if it is the datastore-call that the PVE-environments
> >> call. As you can see in your support issue 3153557, it looks like some
> >> requests loop through all datastores, before responding with a limited
> >> set of datastores.
> >
> >I looked at that ticket and yes, that is probably unrelated to
> >authentication.
> >
> >> For instance (and I’m a complete noob wrt Rust) but if I understand
> >>https://git.proxmox.com/?p=proxmox-backup.git;a=blob;f=src/api2/admin/datastore.rs;h=11d2641b9ca2d2c92da1a85e4cb16d780368abd3;hb=HEAD#l1315
> >> correcly, PBS loops through all the datastores, checks mount-status and
> >> config, and only starts filtering at line 1347. If I understand that
> >> correctly, in our case with over 1100 datastores, that might cause quite
> >> some load?
> >
> >Possible, yes, that would depend on your configuration. Are all of these
> >datastores defined with a backing device? Because if not, than this
> >should be fairly fast (as in, this should not actually touch the disks).
> >If they are, then yes this could be slow as each store would trigger at
> >least 2 stat calls afaict.
> >
> >In any case, it should be fine to move the `mount_status` check after
> >the `if allowed || allow_id` check from what i can tell. Not sure why
> >we'd need to check the mount_status for a datastore we won't include in
> >the resulsts anyway. Same goes for parsing the store config imo. Send a
> >patch for that [1].
> >
> >[1]: https://lore.proxmox.com/pbs-devel/20241216115044.208595-1-s.sterz@proxmox.com/T/#u
> >
> >>
> >>
> >> Thanks,
> >>
> >> —
> >> Mark Schouten
> >> CTO, Tuxis B.V.
> >> +31 318 200208 / mark at tuxis.nl
> >
> >
More information about the pbs-devel
mailing list