This question and answer picks up a problem I've been suddenly having
with Windows and directory names with spaces: from the command prompt,
Windows is suddenly no longer *writing* to the correct alias for
something like c:/Documents and Settings/User/My Documents. When you
give a command-line for saxon, it is able to understand the aliasing for
the input files, even when they are in the same directory; but the
output is being written to
c:/Documents%20and%20Settings/User/My%20Documents/. If that directory
doesn't exist, it makes a new one.
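
One way a literal "%20" directory can come about is when a file: URI is
turned into a filesystem path without undoing the percent-escapes first.
The following is only a minimal sketch of that effect in Python, not a
diagnosis of what Saxon does; the URI and the file name out.xml are
hypothetical.

```python
from urllib.parse import unquote, urlsplit


def raw_path(file_uri):
    """Return the path part of a file: URI exactly as written, escapes included."""
    return urlsplit(file_uri).path


def decoded_path(file_uri):
    """Return the path part with %XX escapes turned back into characters."""
    return unquote(urlsplit(file_uri).path)


# Hypothetical output location, written as a file: URI.
uri = "file:///c:/Documents%20and%20Settings/User/My%20Documents/out.xml"

# Using this string as a path would create "Documents%20and%20Settings".
print(raw_path(uri))

# Decoding first gives the directory name with real spaces.
print(decoded_path(uri))
```

If a tool writes to the first form, the symptom described above (a new
directory whose name contains "%20") is what one would expect to see.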

Anybody seen this error before? It arose suddenly on my XP Pro machine:
I've fixed registries, virus checked, spybot checked, firewalled, and
checked help, but have no idea what the problem is.

I realise this is getting farther and farther from our main business.
Nobody I've asked has had an answer thus far, however.

-dan

Michael Beddow wrote:
> Dieter Köhler wrote:
>
>
>>I tried to process some TEI documents with the Xalan XSLT
>>processor under Windows XP Professional.  The documents
>>are located in my personal directory whose name contains the
>>special character 'ö'.
>
> [...]
>
>>Usually I escape an 'ö' character in such URLs with its Latin-1/Unicode
>>scalar value, which is '%F6'.  This works fine for example with Firefox,
>
>
> [...]
>
> With scant confidence, I can only suggest you try escaping the o umlaut in
> the URI as
> %C3%B6
> If that fails, unless someone else has any ideas, it's time to take evasive
> action.
>
> Escaping o umlaut in a uri to the ISO-8859-1 single-byte value %F6 will
> indeed work with applications that are attuned to the vagaries of Windows
> (as Firefox is). Such applications realise that despite its support for
> Unicode, Windows when installed with a Western default codepage will use a
> single-byte representation of characters whose codepoint is in the decimal
> 128-255 range when accessing the filesystem. With Xalan (and I assume this
> is the Java, not the C++ version), you are using an application that is
> isolated from Windows by the Java VM (and what precisely that means about
> the encoding of characters in this problematic range depends on which
> flavour of Java you are using and what its default encoding is set to) and
> so it may well assume that codepoints will be represented in utf-8 unless
> otherwise specified. You can correct that assumption if accessing it via the
> API, but I'm not sure it can be done from the command line. It's hard to win
> here if you need to use one of these characters: applications expecting
> utf-8 will balk at 0xf6, which is not a legal character in utf-8, and the
> filesystem when Windows is running under a European codepage will
> misinterpret the utf-8 sequence 0xc3 0xb6 as two distinct codepoints, thus
> failing to match the filename concerned.
>
> Assuming the utf-8 sequence escaping trick fails, then short of renaming
> your directories (and while you are about it, it would be a good idea to
> take out any spaces, last time I tried it on W32, admittedly some 18 months
> ago, Xalan didn't like those much either, though for different reasons) your
> other possibility, provided you don't need either XPath2 or Xalan-specific
> XSLT extension functions, is to use libxml2, which on Win 32 runs as a
> native Windows library. The general excellence of that package is why it's
> so long since I revisited Xalan. I will indeed shortly be toying with
> Xerces/Xalan, to play about with the bleeding-edge DOM3 support, but that
> will be most definitely under Linux with a utf-8 locale setting. If you
> wanted Xalan for XPath2 support, you might consider switching to Saxon. Of
> course, that too is a Java app, but if you encountered similar problems
> there I'm sure Michael Kay would sort them out in a trice, whereas a trawl
> through the Xalan or Xerces Bugzillas looking at the treatment encoding
> problems receive is not an encouraging experience.
>
> Michael Beddow
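
The two escapings discussed in the quoted message can be compared side by
side. This is a minimal sketch in Python rather than Java, purely to show
the byte-level difference; the directory name "Köhler" is a hypothetical
stand-in for a name containing 'ö'.

```python
from urllib.parse import quote, unquote

# Hypothetical directory name containing o umlaut.
name = "Köhler"

# Single-byte ISO-8859-1 escape, the form that works with Firefox.
latin1_escaped = quote(name, encoding="latin-1")

# Two-byte UTF-8 escape, the form suggested for Xalan.
utf8_escaped = quote(name, encoding="utf-8")

print(latin1_escaped)
print(utf8_escaped)

# Reading the UTF-8 escape as Latin-1 yields two separate characters,
# so the result no longer matches the original name.
misread_as_latin1 = unquote(utf8_escaped, encoding="latin-1")

# Reading the Latin-1 escape as UTF-8 fails to decode byte 0xF6; with the
# default errors="replace" it becomes U+FFFD (the replacement character).
misread_as_utf8 = unquote(latin1_escaped, encoding="utf-8")

print(misread_as_latin1 == name)
print("\ufffd" in misread_as_utf8)
```

Each escape only round-trips when the reader assumes the same encoding as
the writer, which is the mismatch described in the quoted message.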

--
Daniel Paul O'Donnell, PhD
Associate Professor of English
University of Lethbridge
Lethbridge AB T1K 3M4
Tel. (403) 329-2377
Fax. (403) 382-7191
Home Page <http://people.uleth.ca/~daniel.odonnell/>
The Digital Medievalist Project: <http://www.digitalmedievalist.org/>