[pve-devel] [PATCH common 1/3] INotify.pm: add binmode utf8 to read/update_file

Wolfgang Bumiller w.bumiller at proxmox.com
Mon Aug 27 11:20:11 CEST 2018


On Fri, Aug 24, 2018 at 05:14:47PM +0200, Thomas Lamprecht wrote:
> On 8/24/18 4:56 PM, Dietmar Maurer wrote:
> > BTW, why do you think /etc/hosts may contain utf8 characters? 
> > Is that defined/documented somewhere?
> > 
> 
> With his patch the whole file including comments gets returned as "raw",
> and comments can contain utf-8 (not defending the binmode)

I'm not *completely* opposed to patches that enforce utf-8 on *certain*
files. Particularly ones of which the non-comment content needs to be
ASCII compatible anyway.

Here's the issue with utf-8:
If the file contains non-ascii characters, our code currently reads it
as-is, that is, a byte '234' will become the code point '234' in perl's
internal string. This means a single utf-8 encoded letter is treated as
2 or more letters in the range 128..255 (as is to be expected).
When serializing this to JSON in the API's output code, we tell to_json
to produce utf-8, which emits the utf-8 representation of *each* of
the bytes that initially made up the character, separately.
The GUI then decodes that to produce the same string perl was using
internally: something containing 2 or more code points for what was
initially a single utf-8 encoded letter. This of course shows up as
garbage in the GUI.
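
To make the chain above concrete, here is a rough sketch in Python rather
than our Perl code. It assumes that decoding as latin-1 is a fair stand-in
for "read the bytes as-is", and that json.dumps with ensure_ascii=False
followed by a utf-8 encode behaves like to_json producing utf-8. The
variable names are made up for illustration.

```python
import json

# One letter, U+00E9, as it is stored on disk in utf-8: two bytes.
on_disk = "\u00e9".encode("utf-8")            # b'\xc3\xa9'

# Reading "as-is": every byte becomes one code point of the same value.
# (latin-1 decoding is the assumed Python equivalent of that behaviour.)
as_read = on_disk.decode("latin-1")           # U+00C3 followed by U+00A9

# Serializing to utf-8 JSON encodes each of those code points on its own.
wire = json.dumps(as_read, ensure_ascii=False).encode("utf-8")

# The client decodes the utf-8 JSON and gets the two-character string back,
# not the single letter that was in the file.
shown = json.loads(wire.decode("utf-8"))

print(len(as_read), wire, shown)
```

The two bytes of the original letter end up as four bytes on the wire, and
the client sees two characters where the file had one.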

One could argue we're currently using a file-encoding of latin-1 /
iso-8859-1, as its code points 0..255 AFAIK map directly to unicode...
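
A quick way to check the latin-1 claim, again sketched in Python and not
taken from our code: decode every possible byte value as latin-1 and
compare the resulting code points with the byte values.

```python
# Every byte value 0..255, decoded as latin-1 / iso-8859-1.
all_bytes = bytes(range(256))
decoded = all_bytes.decode("latin-1")

# Each byte maps to the Unicode code point with the same number.
code_points = [ord(ch) for ch in decoded]
identity = code_points == list(range(256))
print(identity)
```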



