[pve-devel] [PATCH access-control] ldap: fix ldap distinguished names regex

Tue May 23 12:12:09 CEST 2023

On Tue, May 23, 2023 at 10:56:24AM +0200, Stefan Sterz wrote:
> On 23.05.23 08:58, Christoph Heiss wrote:
> > On Wed, May 17, 2023 at 03:39:31PM +0200, Stefan Sterz wrote:
> >> [..]
> > While reviewing that, I had a look at the `Net::LDAP` perl library
> > again, if it provides a way to _somehow_ validate DNs properly without
> > having to resort to very fragile regexes.
> >
> > => `Net::LDAP` provides a canonical_dn() function in `Net::LDAP::Util`
> >    https://metacpan.org/pod/Net::LDAP::Util#canonical_dn
> >
>
> handling this via Net::LDAP is probably the better way to go. however,
> just to through this suggestion out there: do we maybe want to use a
> more lax regex in general instead of actually checking whether a dn
> *could* be valid? maybe this check should just be a sanity check instead
> of making the dn conform to the spec.
I guess so. In the end, it will get rejected by the LDAP server anyway
if it isn't valid. Offloading some of that logic might be indeed a good
idea.

When saving, we also don't currently check whether the server actually
accepts it, is allowed to login, etc. So a sync might fail for other
reason too, so it could fail due to invalid DN syntax as well at this
stage IMO.

Or - just a quick idea - maybe directly check the DN with the server on
changes via the API (given that there is a valid server configuration as
well)?

>
> > I quickly hacked up your test script to use canonical_dn() instead of
> > the regex. Works pretty much the same, except that it considers empty
> > strings and attributes with empty values as valid DNs and allows
> > whitespaces at both the beginning and the end of attribute values. But
> > these edge-cases can be much more easily checked than the whole thing
> > (esp. with escaping and such), and should be more robust.
>
> yes, but imo that might end up being somewhat confusing. especially if
> we end up using the canonical version of the dn returned by
> `canonical_dn()`. in my testing it also escaped already validly escaped
> values such as `CN=James \"Jim\" Smith\, III` and turned them into
> `CN=James \22Jim\22 Smith\2c III`. this may be confusing to some users.
If the entered DN is suddenly replaced with the canonicalized version,
definitely. But we could make that clear in the UI and documentation.
Although I would favor keeping it as-is, after thinking about it more.

>
> it's handling of spaces is imo especially strange. according to the docs:
>
> > Converts all leading and trailing spaces in values to be \20.
>
> but `ou= test ,dc=net` yields `OU=test,DC=net` for me. so it doesn't
> escape them, it just drops them
That's weird. Guess a bug on their side?

> (also note that capitalization)?
Yes, in it's canonical form the attribute names are always uppercase,
but any sane server accepts lowercase just fine too. (Maybe the spec
says something about that, but not digging in there right now.)

> if you quote the value it does what it says though. so `ou=" test
> ",dc=net` becomes `OU=\20test\20,DC=net` as expected, although
> somewhat pointlessly so in this case as spaces in quoted values are
> in-spec.
>
>
> imo if we go this route we either need additional logic to handle spaces
> and empty values/dns before passing the dn to `canonical_dn()` for
> validation only, or we need to, at least, document the transformations
> that `canonical_dn()` does. otherwise, these substitutions may end up
> confusing users.

I don't have a strong opinion on what overall approach to take here.

To quickly summarize, I guess
- keep storing the DN as-is from user
- only do a lax sanity check ourselves
- let it fail later on sync if it is invalid
- maybe contact the server on changes via the API and check DN/login?

would be a pretty good way and save us the trouble from tweaking the
regex every few weeks for all eternity.