[pve-devel] [RFC PATCH pve-storage/common] fix #3256: allow special characters in storage-related config files
Fiona Ebner
f.ebner at proxmox.com
Mon Feb 17 11:15:29 CET 2025
On 14.02.25 at 16:40, Laurențiu Leahu-Vlăducu wrote:
>
> This patch series fixes bug #3256:
>
> 1. It ensures that general config files (e.g. storage.cfg) are decoded
> from UTF-8 when deserialized. Previously, no decoding happened,
> meaning that Perl interpreted the string as single bytes instead of
> Unicode code points. Note: while I would have preferred to decode
> the text right after reading from the file, there are some Perl
> functions like Digest::SHA::sha1_hex that expect bytes
> instead of decoded character strings.
What about pre-existing configs that are not UTF-8? Not breaking those
is very important here.
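
To make the ordering in point 1 concrete, here is a minimal sketch in Python (not the Perl code from the patch). The function name `load_config` and the sample content are hypothetical; the only point is that the digest is computed over the raw bytes and decoding happens afterwards.

```python
import hashlib
from typing import Tuple


def load_config(raw: bytes) -> Tuple[str, str]:
    """Return (decoded text, SHA-1 hex digest of the raw bytes).

    Hypothetical helper: the digest is taken over the bytes exactly as
    read from disk, and only then are the bytes decoded as UTF-8.
    """
    digest = hashlib.sha1(raw).hexdigest()  # hash functions operate on bytes
    text = raw.decode("utf-8")              # strict decode, raises on invalid input
    return text, digest


# Made-up sample content with a non-ASCII character in a path.
raw = "dir: local\n\tpath /mnt/ö\n".encode("utf-8")
text, digest = load_config(raw)
```

Note that the strict decode in this sketch raises on input that is not valid UTF-8, which is exactly the situation the question above is about.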
>
> 2. It ensures that general config files are explicitly encoded
> as UTF-8 before serialization to prevent similar issues the other
> way around.
>
> 3. It adds a unit test to prevent similar issues from happening in
> the future.
>
> 4. It fixes the PBS storage plugin for serializing/deserializing the
> password, similar to points 1 and 2, but for the case where the
> password itself contains Unicode characters.
>
> For more information on this topic, please read:
> https://perldoc.perl.org/perlunifaq#When-should-I-decode-or-encode?
>
> I'm sending this patch series to begin a discussion on how to handle
> encodings in our config files, and eventually also other relevant
> files. In my opinion, we should handle them consistently as UTF-8,
> across both Perl and Rust code.
Yes, that is the long-term plan AFAIK, but right now existing config
files might be encoded differently.
>
> Since Linux has used UTF-8 encoding by default for a long time,
> as have browsers* and other software, I doubt that
> we have to worry too much about other encodings
> like Latin-1 (ISO-8859-1). However, according to the
> Perl documentation, Perl could have deserialized such a string
> in the past (since it's the default in Perl when not decoding
> explicitly), and it is no longer able to after the fixes included
> in this patch series.
Unfortunately, we do have to worry about that. E.g.:
> [I] root@pve8a1 ~# pct set 112 --mp1 /root/ö,mp=/o
> [I] root@pve8a1 ~# file /etc/pve/lxc/112.conf
> /etc/pve/lxc/112.conf: ISO-8859 text
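
Why such a file matters can be sketched in a few lines of Python. This assumes the `ö` was stored as the single Latin-1 byte 0xF6, which is what the `file` output suggests but which was not checked byte by byte here; the helper name `is_valid_utf8` is hypothetical.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes as strict UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# The same character in the two encodings under discussion.
latin1_bytes = "ö".encode("latin-1")  # one byte
utf8_bytes = "ö".encode("utf-8")      # two bytes

# A line resembling the mount point entry, assuming Latin-1 storage.
latin1_line = b"mp1: /root/" + latin1_bytes + b",mp=/o"
```

Under that assumption, `is_valid_utf8(latin1_line)` is false, so a parser that decodes strictly as UTF-8 would reject a config that is readable today.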
>
> We have to ask ourselves:
>
> a. Do we want to define, in general, that configuration files should
> always be serialized and deserialized as UTF-8? If yes, should we
> consider this a breaking change?
Yes, see above.
>
> b. Do we want to introduce any backward-compatibility for existing
> config files? In other words, assume that older files might have
> used other encodings in the past. To be honest, I didn't test
> Latin-1 encoded files yet, so I'm not sure how (or if) our
> current code would handle it.
Yes, we certainly need to.
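
One possible shape for such backward compatibility, as a sketch only and not something decided in this thread: try strict UTF-8 first and fall back to Latin-1 when that fails. The function name `decode_config` is hypothetical, and the example is in Python rather than Perl.

```python
def decode_config(raw: bytes) -> str:
    """Decode config bytes, preferring UTF-8 and falling back to Latin-1.

    Hypothetical helper. Latin-1 maps every byte to a code point, so the
    fallback itself can never fail.
    """
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return raw.decode("latin-1")


from_utf8 = decode_config("/root/ö".encode("utf-8"))
from_latin1 = decode_config("/root/ö".encode("latin-1"))
```

The heuristic is not airtight: a Latin-1 file whose bytes happen to form valid UTF-8 (for example the two characters `Ã¶`) would be read as UTF-8, so any real implementation would have to weigh that ambiguity.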
>
> There are further parsers and plugins that I still need to modify,
> but I first wanted to get your feedback on this subject.
>
>
> * By browsers I mean the encoding used in HTML, not the JavaScript
> internals with their UTF-16 encoding.
>
>