[pve-devel] [RFC PATCH pve-storage/common] fix #3256: allow special characters in storage-related config files
Laurențiu Leahu-Vlăducu
l.leahu-vladucu at proxmox.com
Fri Feb 14 16:40:37 CET 2025
This patch series fixes bug #3256:
1. It ensures that general config files (e.g. storage.cfg) are decoded
from UTF-8 when deserialized. Previously, no decoding happened,
meaning that Perl interpreted the string as single bytes instead of
Unicode code points. Note: while I would have preferred to decode
the text right after reading it from the file, some Perl functions,
such as Digest::SHA::sha1_hex, expect bytes rather than decoded
character strings.
2. It ensures that general config files are explicitly encoded
as UTF-8 before serialization to prevent similar issues the other
way around.
3. It adds a unit test to prevent similar issues from happening in
the future.
4. It fixes the PBS storage plugin for serializing/deserializing the
password, similar to points 1 and 2, but for the case where the
password itself contains Unicode characters.
For more information on this topic, please read:
https://perldoc.perl.org/perlunifaq#When-should-I-decode-or-encode?
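The ordering constraint behind points 1 and 2 can be sketched as
follows. This is a minimal illustration in Python, not the Perl code
from the patches, and read_config/write_config are hypothetical names
that do not exist in SectionConfig. The digest is computed over the
raw bytes, decoding to code points happens afterwards, and writing
encodes explicitly.

```python
import hashlib

def read_config(raw: bytes):
    """Return (digest, text) for raw file content (hypothetical helper)."""
    # Hash functions operate on bytes, so the digest is taken before decoding.
    digest = hashlib.sha1(raw).hexdigest()
    # Decode explicitly so a parser sees Unicode code points, not single bytes.
    text = raw.decode("utf-8")
    return digest, text

def write_config(text: str) -> bytes:
    """Encode explicitly as UTF-8 before writing (hypothetical helper)."""
    return text.encode("utf-8")

raw = "comment Büro\n".encode("utf-8")
digest, text = read_config(raw)
# "ü" occupies two bytes in UTF-8 but is one code point after decoding,
# so the byte length and the character length differ by one.
print(len(raw), len(text))
```

Keeping the raw bytes around for the digest and decoding only the copy
handed to the parser is one way to satisfy both requirements.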
I'm sending this patch series to begin a discussion on how to handle
encodings in our config files, and eventually also other relevant
files. In my opinion, we should handle them consistently as UTF-8
across both Perl and Rust code.
Since Linux has used UTF-8 by default for a long time, as have
browsers* and other software, I doubt that we have to worry too much
about other encodings like Latin-1 (ISO-8859-1). However, according
to the Perl documentation, Perl could have deserialized such a string
in the past (Latin-1 is what Perl assumes when a string is not decoded
explicitly), and it is no longer able to do so after the fixes
included in this patch series.
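To make the difference concrete, here is a small sketch in Python
(used only as a stand-in for the Perl behaviour). It shows why a
byte-per-character reading works for Latin-1 content, garbles UTF-8
content, and why strict UTF-8 decoding rejects Latin-1 bytes. How the
patched Perl code reacts to invalid input (an error or replacement
characters) is not covered here and would need to be checked.

```python
latin1_bytes = "é".encode("latin-1")  # single byte 0xE9
utf8_bytes = "é".encode("utf-8")      # two bytes 0xC3 0xA9

# Mapping each byte to one character gives the right result for
# Latin-1 content.
as_latin1 = latin1_bytes.decode("latin-1")

# The same byte-per-character reading of UTF-8 content yields two
# wrong characters instead of one correct one.
mojibake = utf8_bytes.decode("latin-1")

# A strict UTF-8 decode rejects the lone 0xE9 byte.
try:
    latin1_bytes.decode("utf-8")
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False

print(repr(as_latin1), repr(mojibake), strict_ok)
```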
We have to ask ourselves:
a. Do we want to define, in general, that configuration files should
always be serialized and deserialized as UTF-8? If yes, should we
consider this a breaking change?
b. Do we want to provide backward compatibility for existing
config files? In other words, should we assume that older files
might have used other encodings in the past? To be honest, I haven't
tested Latin-1 encoded files yet, so I'm not sure how (or if) our
current code would handle them.
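If backward compatibility (question b) were wanted, one possible
approach is to try UTF-8 first and fall back to Latin-1. The sketch
below is in Python and only illustrates the idea. decode_config is a
hypothetical name, and nothing like it is part of this series.

```python
def decode_config(raw: bytes) -> str:
    """Decode as UTF-8, falling back to Latin-1 (hypothetical helper)."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Latin-1 maps every byte to a code point, so this cannot fail.
        return raw.decode("latin-1")

new_style = decode_config("café".encode("utf-8"))
old_style = decode_config(b"caf\xe9")
print(new_style == old_style)
```

The trade-off is that a Latin-1 file whose bytes happen to form valid
UTF-8 would be silently read as UTF-8, and that files rewritten as
UTF-8 afterwards change their on-disk encoding.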
There are further parsers and plugins that I still need to modify,
but I first wanted to get your feedback on this subject.
* By browsers I mean the encoding used in HTML, not JavaScript's
internal UTF-16 string encoding.
pve-common:
Laurențiu Leahu-Vlăducu (2):
fix #3256: SectionConfig: ensure UTF-8 encoding for general configs
SectionConfig: add unit test for UTF-8 configs
src/PVE/SectionConfig.pm | 10 +++++++---
test/section_config_test.pl | 25 +++++++++++++++++++++++++
2 files changed, 32 insertions(+), 3 deletions(-)
pve-storage:
Laurențiu Leahu-Vlăducu (1):
fix #3256: Storage: PBS: ensure passwords are saved and loaded as
UTF-8
src/PVE/Storage/PBSPlugin.pm | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
--
2.39.5