=========================================
NLTK Python 2.x - 3.x Compatibility Layer
=========================================

NLTK comes with a Python 2.x/3.x compatibility layer, ``nltk.compat``
(which is loosely based on `six <http://packages.python.org/six/>`_)::

    >>> from nltk import compat
    >>> compat.PY3
    False
    >>> # and so on

@python_2_unicode_compatible
----------------------------

Under Python 2.x, the ``__str__`` and ``__repr__`` methods must
return byte strings.

The ``@python_2_unicode_compatible`` decorator allows writing these methods
in a way that is compatible with Python 3.x:

1) wrap a class with this decorator,
2) define ``__str__`` and ``__repr__`` methods returning unicode text
   (that's what they must return under Python 3.x),

and they will be fixed under Python 2.x to return byte strings::

    >>> from nltk.compat import python_2_unicode_compatible

    >>> @python_2_unicode_compatible
    ... class Foo(object):
    ...     def __str__(self):
    ...         return u'__str__ is called'
    ...     def __repr__(self):
    ...         return u'__repr__ is called'

    >>> foo = Foo()
    >>> foo.__str__().__class__
    <type 'str'>
    >>> foo.__repr__().__class__
    <type 'str'>
    >>> print(foo)
    __str__ is called
    >>> foo
    __repr__ is called

The original versions of ``__str__`` and ``__repr__`` are available as
``__unicode__`` and ``unicode_repr``::

    >>> foo.__unicode__().__class__
    <type 'unicode'>
    >>> foo.unicode_repr().__class__
    <type 'unicode'>
    >>> unicode(foo)
    u'__str__ is called'
    >>> foo.unicode_repr()
    u'__repr__ is called'
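
The behaviour shown so far can be approximated in a few lines. The following
is a minimal sketch, not NLTK's actual implementation: the name
``unicode_compatible_sketch`` is hypothetical, UTF-8 encoding is an assumption
made for illustration, and the sketch omits any check that would keep an
already fixed, inherited method from being wrapped twice.

```python
import sys

PY3 = sys.version_info[0] >= 3


def unicode_compatible_sketch(cls):
    """Simplified, hypothetical stand-in for python_2_unicode_compatible."""
    # Keep the text-returning originals reachable under the documented names.
    cls.__unicode__ = cls.__str__
    cls.unicode_repr = cls.__repr__
    if not PY3:
        # Python 2 only: __str__ and __repr__ must return byte strings.
        # UTF-8 is an assumption of this sketch.
        cls.__str__ = lambda self: self.__unicode__().encode("utf-8")
        cls.__repr__ = lambda self: self.unicode_repr().encode("utf-8")
    return cls


@unicode_compatible_sketch
class Foo(object):
    def __str__(self):
        return u"__str__ is called"

    def __repr__(self):
        return u"__repr__ is called"


foo = Foo()
print(str(foo))
print(repr(foo))
print(foo.__unicode__())
print(foo.unicode_repr())
```

Under Python 3.x the decorator in this sketch only adds the two aliases, so
``str(foo)`` and ``foo.__unicode__()`` return the same text.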

There is no need to wrap a subclass with ``@python_2_unicode_compatible``
if it overrides neither ``__str__`` nor ``__repr__``::

    >>> class Bar(Foo):
    ...     pass
    >>> bar = Bar()
    >>> bar.__str__().__class__
    <type 'str'>

However, if a subclass overrides ``__str__`` or ``__repr__``,
it must be wrapped again::

    >>> class BadBaz(Foo):
    ...     def __str__(self):
    ...         return u'Baz.__str__'
    >>> baz = BadBaz()
    >>> baz.__str__().__class__  # this is incorrect!
    <type 'unicode'>

    >>> @python_2_unicode_compatible
    ... class GoodBaz(Foo):
    ...     def __str__(self):
    ...         return u'Baz.__str__'
    >>> baz = GoodBaz()
    >>> baz.__str__().__class__
    <type 'str'>
    >>> baz.__unicode__().__class__
    <type 'unicode'>

Applying ``@python_2_unicode_compatible`` to a subclass
shouldn't break methods that were not overridden::

    >>> baz.__repr__().__class__
    <type 'str'>
    >>> baz.unicode_repr().__class__
    <type 'unicode'>
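
The reason a subclass has to be wrapped again can be illustrated with a small
analogy. The sketch below is not NLTK's code: ``ascii_safe_str`` is a
hypothetical decorator that rewrites only a ``__str__`` defined directly on
the decorated class, and ASCII escaping stands in for the Python 2.x
byte-string conversion so that the example runs on Python 3.

```python
def ascii_safe_str(cls):
    """Hypothetical stand-in for python_2_unicode_compatible.

    Wraps only a __str__ defined directly on cls so that the result
    contains ASCII characters only.
    """
    if "__str__" in cls.__dict__:
        original = cls.__dict__["__str__"]

        def __str__(self):
            text = original(self)
            return text.encode("ascii", "backslashreplace").decode("ascii")

        cls.__str__ = __str__
    return cls


@ascii_safe_str
class Foo(object):
    def __str__(self):
        return u"caf\u00e9"


class BadBaz(Foo):  # overrides __str__ without the decorator
    def __str__(self):
        return u"na\u00efve"


@ascii_safe_str
class GoodBaz(Foo):  # overrides __str__ and is wrapped again
    def __str__(self):
        return u"na\u00efve"


print(str(Foo()))
print(str(BadBaz()) == u"na\u00efve")  # True: the override bypasses the wrapper
print(str(GoodBaz()))
```

An override in an undecorated subclass shadows the wrapped method inherited
from the parent, so the conversion is skipped until the subclass itself is
decorated.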

unicode_repr
------------

Under Python 3.x, the output of ``repr(unicode_string)`` doesn't have a leading "u" letter.

The ``nltk.compat.unicode_repr`` function may be used instead of ``repr`` and
``"%r" % obj`` to make the output more consistent under Python 2.x and 3.x::

    >>> from nltk.compat import unicode_repr
    >>> print(repr(u"test"))
    u'test'
    >>> print(unicode_repr(u"test"))
    'test'

It may also be used to get the original unescaped repr (as unicode)
of objects whose class was fixed by the ``@python_2_unicode_compatible``
decorator::

    >>> @python_2_unicode_compatible
    ... class Foo(object):
    ...     def __repr__(self):
    ...         return u'<Foo: foo>'

    >>> foo = Foo()
    >>> repr(foo)
    '<Foo: foo>'
    >>> unicode_repr(foo)
    u'<Foo: foo>'

For other objects it returns the same value as ``repr``::

    >>> unicode_repr(5)
    '5'
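
The three cases above (unicode strings, objects of fixed classes, everything
else) suggest a simple dispatch. The following is a minimal sketch under that
reading, with the hypothetical name ``unicode_repr_sketch``; it should not be
taken as the library's exact code.

```python
import sys

PY3 = sys.version_info[0] >= 3


def unicode_repr_sketch(obj):
    """Hypothetical stand-in for nltk.compat.unicode_repr."""
    if PY3:
        # Python 3: repr() already returns text without a "u" prefix.
        return repr(obj)
    # Python 2: prefer the text-returning repr kept by the decorator.
    if hasattr(obj, "unicode_repr"):
        return obj.unicode_repr()
    text = repr(obj)
    # Drop the leading "u" from unicode literals such as u'test'.
    if isinstance(obj, unicode) and text.startswith("u"):  # noqa: F821
        return text[1:]
    return text


print(unicode_repr_sketch(u"test"))
print(unicode_repr_sketch(5))
```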

It may be a good idea to use ``unicode_repr`` instead of the ``%r``
string formatting specifier inside the ``__repr__`` or ``__str__``
methods of classes fixed by ``@python_2_unicode_compatible``
to make the output consistent between Python 2.x and 3.x.
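
As an illustration of this advice, here is a small sketch that runs on
Python 3. The ``Token`` class is hypothetical, and ``unicode_repr`` is replaced
by a local stand-in equal to ``repr``, which is an assumption that holds only
under Python 3.x; under Python 2.x the class would also need the
``@python_2_unicode_compatible`` decorator and the real helper.

```python
# Assumption: on Python 3 the helper behaves like repr(), so a local
# stand-in is enough for this sketch.
unicode_repr = repr


class Token(object):
    """Hypothetical class used only for illustration."""

    def __init__(self, text):
        self.text = text

    def __repr__(self):
        # With "%r" the nested string would be shown as u'dog' on Python 2.
        return u"<Token: %s>" % unicode_repr(self.text)


print(repr(Token(u"dog")))
```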