That's correct. I'm trying to work out the route the data takes from the form to the repository and see where the encoding gets mixed up.

Pete Bleackley
The Fantastical Devices of Pete The Mad Scientist - http://fantasticaldevices.blogspot.com

-----Original Message-----
From: "Mark J. Reed" <[log in to unmask]>
To: [log in to unmask]
Sent: Fri, 06 Jan 2017 22:39
Subject: Re: Character encoding problem

What POST request?  You mean one that publishes the content to the site?

On Fri, Jan 6, 2017 at 2:51 AM, Pete Bleackley <
[log in to unmask]> wrote:

> Looking through the codebase, I've discovered that the payload of the POST
> request is written to a BytesIO object at one point. I assume that the
> error occurs when the data is retrieved from that object.
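To illustrate how a round trip through a BytesIO object could go wrong, here is a minimal Python sketch. It is not the wiki engine's actual code, which is not shown in this thread; the variable names and the choice of Latin-1 as the wrong codec are assumptions. The buffer itself only holds bytes, so any damage would come from the codec used when those bytes are turned back into text.

```python
import io

# Hypothetical form payload: the text as UTF-8 bytes, as a browser would send it.
payload = "ámman îar".encode("utf-8")

# Write the payload to an in-memory byte buffer and read it back.
buf = io.BytesIO()
buf.write(payload)
raw = buf.getvalue()

# The bytes survive the buffer unchanged.
bytes_unchanged = raw == payload

# Decoding with the codec the bytes were written in recovers the text.
decoded_ok = raw.decode("utf-8")

# Decoding with the wrong codec (assumed here to be Latin-1) garbles it.
decoded_wrong = raw.decode("latin-1")

print(bytes_unchanged)
print(decoded_ok)
print(decoded_wrong)
```

If the retrieval step in the real codebase decodes with anything other than UTF-8, or relies on a platform default, that would be a likely place for the garbling to start.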
>
> Pete Bleackley
> The Fantastical Devices of Pete The Mad Scientist -
> http://fantasticaldevices.blogspot.com
>
> -----Original Message-----
> From: "Mark J. Reed" <[log in to unmask]>
> To: [log in to unmask]
> Sent: Thu, 05 Jan 2017 16:22
> Subject: Re: Character encoding problem
>
> Hm. It appears that somewhere along the way, something interpreted UTF-8
> text as Latin-1 and wrote it back out as the UTF-8 version of that
> interpretation. Which means the bytes on the wire are actually wrong and
> need to be changed; it's not just a metadata/labelling problem.
>
> For example, in UTF-8, the 'á' in "ámman îar" is encoded with two bytes.
> If you interpret those two bytes as individual Latin-1 characters, you get
> an 'Ã' followed by a '¡'. What you are serving is the two characters 'Ã'
> and '¡' in UTF-8, in which they take up two bytes each. You could repeat
> this mangling by interpreting that sequence as 4 Latin-1 characters and
> re-encoding them into UTF-8, but one pass is enough to muck things up.
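The mangling described above, and one way to undo it, can be sketched in Python. This is a minimal sketch that assumes the misreading was exactly Latin-1 (ISO-8859-1) and happened exactly once; if the software actually used Windows-1252, or the text was mangled more than once, the repair would need adjusting. The function names `mangle` and `repair` are made up for illustration.

```python
def mangle(text: str) -> str:
    """Misread UTF-8 bytes as Latin-1, as the message above describes."""
    return text.encode("utf-8").decode("latin-1")


def repair(garbled: str) -> str:
    """Reverse one pass of mangling: back to the raw bytes, then decode as UTF-8."""
    return garbled.encode("latin-1").decode("utf-8")


original = "ámman îar"
garbled = mangle(original)

# The bytes actually served: each garbled character re-encoded as UTF-8.
served = garbled.encode("utf-8")

# 'á' alone: two bytes when correct, four bytes after one pass of mangling.
a_correct = "á".encode("utf-8")
a_served = mangle("á").encode("utf-8")

repaired = repair(served.decode("utf-8"))

print(garbled)
print(len(a_correct), len(a_served))
print(repaired == original)
```

Note that `repair` raises an error on text that was never mangled and contains characters outside Latin-1, so any automated fix should be applied only to files known to be affected.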
>
> If you grant me read access to your hg repo, I may be able to do an
> automated transmogrification to fix things up.
>
>
> On Thu, Jan 5, 2017 at 10:27 AM, Pete Bleackley <
> [log in to unmask]> wrote:
>
> > I'm in the process of setting up a wiki about conlang scholarship and
> > criticism, and I'm finding that the wiki engine I'm using doesn't encode
> > non-ASCII characters properly. See
> > http://sources.conlang.org/The_Art_of_Language_Invention for an example.
> >
> > The engine I'm using stores its pages as plain text files in a Mercurial
> > repository. I've looked at the server and they're garbled there.
> >
> > Anyone have any idea what I need to do to fix this?
> >
> > Pete Bleackley
> > The Fantastical Devices of Pete The Mad Scientist -
> > http://fantasticaldevices.blogspot.com
> >
>
>
>
> --
> Mark J. Reed <[log in to unmask]>
>



-- 
Mark J. Reed <[log in to unmask]>