On 12-10-23 09:01 AM, Sebastian Rahtz wrote:
> the development site at http://tei.oucs.ox.ac.uk/Roma/startroma.php?mode=changeModule&module=dictionaries
> does have the accept attribute on the form, but still fails, I think?
It does have the right accept-charset attribute, as well as the correct
encoding in the HTTP header. So the problem must be with how the
response is processed by the PHP back on the server.
That creates an instance of the roma class defined in roma/roma.php.
There are a couple of thousand lines of code in there. One thing that
might be worth trying before diving into that is:
at the top of startroma.php.
But forging on:
The roma class has a private instance of RomaDom, which is called to
process the change in tag name, through its changeElementNameInModule
method. This method uses preg_match, which, I think, may mishandle UTF-8
unless PHP was compiled with PCRE UTF-8 support.
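A minimal sketch of the kind of failure I have in mind, assuming a PHP build whose PCRE supports UTF-8. The element name and the pattern are made up for illustration and are not taken from changeElementNameInModule. Without the u modifier, preg_match treats the subject as a sequence of bytes, so a single "." cannot span a two-byte character:

```php
<?php
// Hypothetical element name "définition", written with explicit UTF-8
// bytes so the example does not depend on this file's own encoding.
$name = "d\xC3\xA9finition";

// Byte mode: "é" is two bytes (C3 A9), so one "." cannot cover it
// and the pattern fails to match.
$byteMatch = preg_match('/^d(.)f/', $name, $byteGroups);

// UTF-8 mode (u modifier): "." matches one whole code point.
$utf8Match = preg_match('/^d(.)f/u', $name, $utf8Groups);

var_dump($byteMatch);   // int(0)
var_dump($utf8Match);   // int(1)
var_dump($utf8Groups[1] === "\xC3\xA9");   // bool(true)
```

If the patterns in RomaDom behave like the first call, adding the u modifier would be a cheap thing to try.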
The roma class appears to be an extension of the domDocument class,
which I think must be related to PHP DOMDocument. I see this comment on
the relevant PHP doc page:
"DOMDocument notoriously doesn't handle encoding (at least UTF-8)
correctly and garbles the output"
although the documentation itself seems to suggest that DOMDocuments are
UTF-8 by default. One way of testing this would be to pass the character
encoding explicitly in the constructors of all descendants of DOMDocument.
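Roughly what I mean, as a sketch only: ExampleDom is a hypothetical stand-in for whichever Roma classes extend DOMDocument, and the gloss element is invented for the test. The second constructor argument of DOMDocument is the encoding:

```php
<?php
// Hypothetical descendant of DOMDocument; the real Roma classes
// will have their own names and constructor arguments.
class ExampleDom extends DOMDocument
{
    public function __construct()
    {
        // Pass the XML version and the encoding explicitly.
        parent::__construct('1.0', 'UTF-8');
    }
}

$doc = new ExampleDom();
$el = $doc->createElement('gloss');
// "définition" as explicit UTF-8 bytes.
$el->appendChild($doc->createTextNode("d\xC3\xA9finition"));
$doc->appendChild($el);

// With the encoding declared, the serialization carries a UTF-8
// declaration and the character is written as raw UTF-8 bytes.
$xml = $doc->saveXML();
echo $xml;
```

If the output still comes back garbled with the encoding set this way, the problem is presumably earlier in the chain than the DOM serialization.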
Finally, in the source tree, there's a file called xml/roma.xml, which
looks like this:
<?xml version="1.0" encoding="iso-8859-1"?>
<title>My TEI Extension</title>
<author>generated by Roma</author>
<p>for use by whoever wants it</p>
I don't know what use is made of this file, but I see no reason not to
change that to UTF-8 too.
> Sebastian Rahtz
> Director (Research Support) of Academic IT Services
> University of Oxford IT Services
> 13 Banbury Road, Oxford OX2 6NN. Phone +44 1865 283431
University of Victoria Humanities Computing and Media Centre