135 lines
3.6 KiB
Plaintext
135 lines
3.6 KiB
Plaintext
|
|
=========================================
|
|
NLTK Python 2.x - 3.x Compatibility Layer
|
|
=========================================
|
|
|
|
NLTK comes with a Python 2.x/3.x compatibility layer, nltk.compat
|
|
(which is loosely based on `six <http://packages.python.org/six/>`_)::
|
|
|
|
>>> from nltk import compat
|
|
>>> compat.PY3
|
|
False
|
|
>>> # and so on
|
|
|
|
@python_2_unicode_compatible
|
|
----------------------------
|
|
|
|
Under Python 2.x ``__str__`` and ``__repr__`` methods must
|
|
return bytestrings.
|
|
|
|
``@python_2_unicode_compatible`` decorator allows writing these methods
|
|
in a way compatible with Python 3.x:
|
|
|
|
1) wrap a class with this decorator,
|
|
2) define ``__str__`` and ``__repr__`` methods returning unicode text
|
|
(that's what they must return under Python 3.x),
|
|
|
|
and they would be fixed under Python 2.x to return byte strings::
|
|
|
|
>>> from nltk.compat import python_2_unicode_compatible
|
|
|
|
>>> @python_2_unicode_compatible
|
|
... class Foo(object):
|
|
... def __str__(self):
|
|
... return u'__str__ is called'
|
|
... def __repr__(self):
|
|
... return u'__repr__ is called'
|
|
|
|
>>> foo = Foo()
|
|
>>> foo.__str__().__class__
|
|
<type 'str'>
|
|
>>> foo.__repr__().__class__
|
|
<type 'str'>
|
|
>>> print(foo)
|
|
__str__ is called
|
|
>>> foo
|
|
__repr__ is called
|
|
|
|
Original versions of ``__str__`` and ``__repr__`` are available as
|
|
``__unicode__`` and ``unicode_repr``::
|
|
|
|
>>> foo.__unicode__().__class__
|
|
<type 'unicode'>
|
|
>>> foo.unicode_repr().__class__
|
|
<type 'unicode'>
|
|
>>> unicode(foo)
|
|
u'__str__ is called'
|
|
>>> foo.unicode_repr()
|
|
u'__repr__ is called'
|
|
|
|
There is no need to wrap a subclass with ``@python_2_unicode_compatible``
|
|
if it doesn't override ``__str__`` and ``__repr__``::
|
|
|
|
>>> class Bar(Foo):
|
|
... pass
|
|
>>> bar = Bar()
|
|
>>> bar.__str__().__class__
|
|
<type 'str'>
|
|
|
|
However, if a subclass overrides ``__str__`` or ``__repr__``,
|
|
wrap it again::
|
|
|
|
>>> class BadBaz(Foo):
|
|
... def __str__(self):
|
|
... return u'Baz.__str__'
|
|
>>> baz = BadBaz()
|
|
>>> baz.__str__().__class__ # this is incorrect!
|
|
<type 'unicode'>
|
|
|
|
>>> @python_2_unicode_compatible
|
|
... class GoodBaz(Foo):
|
|
... def __str__(self):
|
|
... return u'Baz.__str__'
|
|
>>> baz = GoodBaz()
|
|
>>> baz.__str__().__class__
|
|
<type 'str'>
|
|
>>> baz.__unicode__().__class__
|
|
<type 'unicode'>
|
|
|
|
Applying ``@python_2_unicode_compatible`` to a subclass
|
|
shouldn't break methods that was not overridden::
|
|
|
|
>>> baz.__repr__().__class__
|
|
<type 'str'>
|
|
>>> baz.unicode_repr().__class__
|
|
<type 'unicode'>
|
|
|
|
unicode_repr
|
|
------------
|
|
|
|
Under Python 3.x ``repr(unicode_string)`` doesn't have a leading "u" letter.
|
|
|
|
``nltk.compat.unicode_repr`` function may be used instead of ``repr`` and
|
|
``"%r" % obj`` to make the output more consistent under Python 2.x and 3.x::
|
|
|
|
>>> from nltk.compat import unicode_repr
|
|
>>> print(repr(u"test"))
|
|
u'test'
|
|
>>> print(unicode_repr(u"test"))
|
|
'test'
|
|
|
|
It may be also used to get an original unescaped repr (as unicode)
|
|
of objects which class was fixed by ``@python_2_unicode_compatible``
|
|
decorator::
|
|
|
|
>>> @python_2_unicode_compatible
|
|
... class Foo(object):
|
|
... def __repr__(self):
|
|
... return u'<Foo: foo>'
|
|
|
|
>>> foo = Foo()
|
|
>>> repr(foo)
|
|
'<Foo: foo>'
|
|
>>> unicode_repr(foo)
|
|
u'<Foo: foo>'
|
|
|
|
For other objects it returns the same value as ``repr``::
|
|
|
|
>>> unicode_repr(5)
|
|
'5'
|
|
|
|
It may be a good idea to use ``unicode_repr`` instead of ``%r``
|
|
string formatting specifier inside ``__repr__`` or ``__str__``
|
|
methods of classes fixed by ``@python_2_unicode_compatible``
|
|
to make the output consistent between Python 2.x and 3.x.
|