On Mon, May 4, 2009 at 5:44 AM, taliesin the storyteller <
[log in to unmask]> wrote:

> * Edgard Bikelis said on 2009-05-04 06:40:30 +0200
> > I know PHP fairly well, but, as I use Debian, it would be nice to know a
> > scripting language.
>
> PHP *is* a scripting language.


I meant "an...other scripting language canonically used alongside bash, like
Perl, but hopefully easier to read ages after the code has been written"...
OK, it seems I did forget to write something ; ).

>
>
> > I really hope this won't start a Python-versus-Perl
> > discussion, but still... I tried making a script to do the Sanskrit
> > a-declension, and all my Unicode came out garbled. Does anyone here have
> > experience with this, especially with keeping Python from printing
> > escape codes like "\xc3\xa1"?
> >
> > (python 2.6)
> >
> > [
> > #! /usr/bin/env python
> > # coding=UTF-8
> >
> > lalala = "á"
> > print lalala
>
> OK: When using python 2.x, always use unicode-strings, that is: prefix
> with u:
>
> lalala = u"á"
> print lalala
>
> Your example prints á for me, with length 2, since len() counts bytes. My
> example also prints á for me, but has length 1, since it counts
> characters. In Unicode, bytes != characters.
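
For comparison, a minimal Python 3 sketch of the same bytes-versus-characters distinction. It is not the code from this thread; the character á is written as the escape \u00e1 only so that the example itself stays plain ASCII.

```python
# In Python 3, str holds characters (code points) and bytes holds raw bytes.
word = "\u00e1"                 # one character: U+00E1, a with acute accent
encoded = word.encode("utf-8")  # the same text as UTF-8 bytes

print(len(word))      # 1 -> counts characters
print(len(encoded))   # 2 -> counts bytes
print(encoded)        # b'\xc3\xa1'
```
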
>
> > lalala = repr(lalala)
> > print lalala
>
> Prints '\xc3\xa1' for me, with length 10
>
> repr() is supposed to be ASCII-safe, so that's no surprise. Furthermore,
> you've now overwritten what was in "lalala" with the string '\xc3\xa1'.
>
> Save it to something else so that you can compare the various versions
> instead.
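
One caveat for anyone moving to Python 3: there, repr() no longer escapes printable non-ASCII characters, and the escaped form comes from the ascii() built-in instead. A minimal Python 3 sketch that keeps each stage in its own variable, as suggested above (the variable names are illustrative only):

```python
original = "\u00e1"                        # U+00E1, a with acute accent
as_repr = repr(original)                   # quote, U+00E1, quote: 3 characters, not escaped
as_ascii = ascii(original)                 # '\xe1' (6 ASCII characters, quotes included)
as_bytes = repr(original.encode("utf-8"))  # b'\xc3\xa1' (11 ASCII characters)

# Every stage is still available for side-by-side comparison.
for label, value in [("repr", as_repr), ("ascii", as_ascii), ("bytes", as_bytes)]:
    print(label, len(value), ascii(value))
```
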
>
> > lalala = unicode(lalala, "utf8")
> > print lalala
>
> You claim that the string lalala, which is now '\xc3\xa1' (ten
> characters, all ASCII), is really UTF-8. Result: you get a unicode string
> containing those same ten ASCII characters, backslashes included, whose
> repr looks like
>
> u"'\\xc3\\xa1'"
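
The same trap exists in Python 3, where the repr of a bytes object is also plain ASCII text. A minimal sketch (illustrative names only) contrasting decoding that text with decoding the bytes themselves:

```python
raw = b"\xc3\xa1"  # UTF-8 bytes for U+00E1, a with acute accent

# Mistake: take the repr() text and "decode" it again.
escaped_text = repr(raw)                              # b'\xc3\xa1' as 11 ASCII characters
wrong = escaped_text.encode("ascii").decode("utf-8")  # still those same 11 characters

# Fix: decode the bytes themselves.
right = raw.decode("utf-8")                           # the single character U+00E1

print(len(wrong), ascii(wrong))
print(len(right), ascii(right))
```
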
>
>
> > print "but", unicode('\xc3\xa1', "utf8")
> > ]
>
> The same, but starting with lalala = u'á':
>
>    >>> lalala = u"á"
>    >>> print lalala
>    á
>    >>>
>    >>> lalala = repr(lalala)
>    >>> print lalala
>    u'\xe1'
>    >>>
>    >>> lalala = unicode(lalala, "utf8")
>    >>> print lalala
>    u'\xe1'
>    >>>
>    >>> print "but", unicode('\xc3\xa1', "utf8")
>    but á
>
> When you read from a file or from the keyboard, you get bytes, so you
> need to decode them to unicode. Likewise, when you write, encode from
> unicode back to bytes: lalala.encode('utf8')
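
A minimal Python 3 sketch of that decode-on-input, encode-on-output pattern, using a throwaway temporary file (the file and its contents are assumptions made only for illustration):

```python
import os
import tempfile

# Create a scratch file and write UTF-8 bytes to it: encode on the way out.
fd, path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "wb") as f:
    f.write("\u00e1\n".encode("utf-8"))

# Binary mode returns bytes, so decode them once, on the way in.
with open(path, "rb") as f:
    text = f.read().decode("utf-8")

# Or let open() decode for you by naming the encoding explicitly.
with open(path, encoding="utf-8") as f:
    text2 = f.read()

os.remove(path)

# From here on the program only handles str objects.
print(ascii(text), text == text2)
```

Decoding once at the boundary keeps everything inside the program as str, which is the Python 3 counterpart of the all-unicode approach described here.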


I meant to show how frustrating it is to have the same string treated
differently...


> That way, all strings inside the python-program are unicode.


My actual script has tons of strings, and prefixing each one with 'u' would
be really tedious; there should be some way to say 'make every string
Unicode by default'... but...


>
>
> In Python 3.x, unicode strings are the default, so there is no u-prefix;
> instead, byte strings get a b-prefix.
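
A minimal Python 3 sketch of the two literal types. It assumes Python 3.3 or later, where the u prefix is accepted again (PEP 414) but changes nothing:

```python
# Plain literals are str (Unicode text); b-prefixed literals are bytes.
word = "\u00e1"          # same as writing the letter itself in a UTF-8 source file
raw = b"\xc3\xa1"        # the UTF-8 encoding of that character
legacy = u"\u00e1"       # allowed since Python 3.3, identical to word

print(type(word).__name__, type(raw).__name__)  # str bytes
print(raw.decode("utf-8") == word)              # True
print(legacy == word)                           # True

# Unlike Python 2, str and bytes are never mixed implicitly.
try:
    word + raw
except TypeError:
    print("cannot concatenate str and bytes")
```
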


... aha. Then Python 3.x it is : ). Thanks!


Edgard