Metadata-Version: 2.1
Name: fuzzysearch
Version: 0.6.2
Summary: fuzzysearch is useful for finding approximate subsequence matches
Home-page: https://github.com/taleinat/fuzzysearch
Author: Tal Einat
Author-email: taleinat@gmail.com
License: MIT
Keywords: fuzzysearch
Platform: UNKNOWN
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Natural Language :: English
Classifier: Operating System :: MacOS :: MacOS X
Classifier: Programming Language :: Python :: 2
Classifier: Programming Language :: Python :: 2.7
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.4
Classifier: Programming Language :: Python :: 3.5
Classifier: Programming Language :: Python :: 3.6
Classifier: Programming Language :: Python :: 3.7
Classifier: Programming Language :: Python :: Implementation :: CPython
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Requires-Dist: six

===========
fuzzysearch
===========

.. image:: https://img.shields.io/pypi/v/fuzzysearch.svg?style=flat
    :target: https://pypi.python.org/pypi/fuzzysearch
    :alt: Latest Version

.. image:: https://img.shields.io/travis/taleinat/fuzzysearch.svg?branch=master
    :target: https://travis-ci.org/taleinat/fuzzysearch/branches
    :alt: Build & Tests Status

.. image:: https://img.shields.io/coveralls/taleinat/fuzzysearch.svg?branch=master
    :target: https://coveralls.io/r/taleinat/fuzzysearch?branch=master
    :alt: Test Coverage

.. image:: https://img.shields.io/pypi/dm/fuzzysearch.svg?style=flat
    :target: https://pypi.python.org/pypi/fuzzysearch
    :alt: Downloads

.. image:: https://img.shields.io/pypi/wheel/fuzzysearch.svg?style=flat
    :target: https://pypi.python.org/pypi/fuzzysearch
    :alt: Wheels

.. image:: https://img.shields.io/pypi/pyversions/fuzzysearch.svg?style=flat
    :target: https://pypi.python.org/pypi/fuzzysearch
    :alt: Supported Python versions

.. image:: https://img.shields.io/pypi/implementation/fuzzysearch.svg?style=flat
    :target: https://pypi.python.org/pypi/fuzzysearch
    :alt: Supported Python implementations

.. image:: https://img.shields.io/pypi/l/fuzzysearch.svg?style=flat
    :target: https://pypi.python.org/pypi/fuzzysearch/
    :alt: License

**Easy fuzzy search that just works, fast!**

.. code:: python

    >>> find_near_matches('PATTERN', '---PATERN---', max_l_dist=1)
    [Match(start=3, end=9, dist=1)]

* Approximate sub-string searches

* A single, simple function to use

* Chooses the fastest available search mechanism based on the given input

* Uses the Levenshtein Distance metric with configurable parameters

* Separately configure the max. allowed distance, substitutions, deletions
  and insertions

* Advanced algorithms with optional C and Cython optimizations

* Extensively tested

* Free software: `MIT license <LICENSE>`_

For more info, see the `documentation <http://fuzzysearch.rtfd.org>`_.

Installation
------------

.. code::

    $ pip install fuzzysearch

This will work even if installing the C and Cython extensions fails, using
pure-Python fallbacks.

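The fallback behaviour can be pictured with the usual optional-import pattern.
This is only a minimal sketch, not fuzzysearch's actual code: the module name
``_hypothetical_fast_ext`` and the function ``find_exact_starts()`` are made up
for illustration.

.. code:: python

    def _find_exact_starts_pure_python(subsequence, sequence):
        """Return the start index of every exact occurrence in a string."""
        indexes = []
        start = sequence.find(subsequence)
        while start != -1:
            indexes.append(start)
            # continue one character further so overlapping hits are found too
            start = sequence.find(subsequence, start + 1)
        return indexes

    try:
        # Hypothetical compiled extension; this name is an assumption.
        from _hypothetical_fast_ext import find_exact_starts
    except ImportError:
        # The extension is unavailable, so use the pure-Python version.
        find_exact_starts = _find_exact_starts_pure_python

    print(find_exact_starts('AT', 'PATTERN-PATERN'))  # prints [1, 9]

Callers only ever use ``find_exact_starts()``, so they behave the same whether
or not the compiled module could be imported.
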
Usage
-----
Just call ``find_near_matches()`` with the sub-sequence you're looking for,
the sequence to search, and the matching parameters:

.. code:: python

    >>> from fuzzysearch import find_near_matches
    # search for 'PATTERN' with a maximum Levenshtein Distance of 1
    >>> find_near_matches('PATTERN', '---PATERN---', max_l_dist=1)
    [Match(start=3, end=9, dist=1)]

.. code:: python

    >>> sequence = '''\
    GACTAGCACTGTAGGGATAACAATTTCACACAGGTGGACAATTACATTGAAAATCACAGATTGGTCACACACACA
    TTGGACATACATAGAAACACACACACATACATTAGATACGAACATAGAAACACACATTAGACGCGTACATAGACA
    CAAACACATTGACAGGCAGTTCAGATGATGACGCCCGACTGATACTCGCGTAGTCGTGGGAGGCAAGGCACACAG
    GGGATAGG'''
    >>> subsequence = 'TGCACTGTAGGGATAACAAT' # distance = 1
    >>> find_near_matches(subsequence, sequence, max_l_dist=2)
    [Match(start=3, end=24, dist=1)]

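To make "approximate sub-string search" concrete, here is a naive
dynamic-programming sketch that reports where near matches end. It is a
textbook approach, not the way fuzzysearch is implemented, and the function
name ``near_match_ends()`` is an assumption made for this example.

.. code:: python

    def near_match_ends(pattern, text, max_dist):
        """Return (end, dist) for each text position where a near match ends."""
        # column[i] is the smallest Levenshtein distance between pattern[:i]
        # and any substring of the text ending at the current position.
        column = list(range(len(pattern) + 1))
        results = []
        for pos, char in enumerate(text):
            # A match may start anywhere, so the empty prefix always costs 0.
            new_column = [0]
            for i, pattern_char in enumerate(pattern, start=1):
                new_column.append(min(
                    column[i - 1] + (pattern_char != char),  # match or substitution
                    column[i] + 1,          # insertion: skip a character in the text
                    new_column[i - 1] + 1,  # deletion: skip a character in the pattern
                ))
            column = new_column
            if column[-1] <= max_dist:
                results.append((pos + 1, column[-1]))
        return results

    print(near_match_ends('PATTERN', '---PATERN---', 1))  # prints [(9, 1)]

The reported end index and distance agree with the ``end=9, dist=1`` result of
the first example above. The sketch does not recover start indexes or group
overlapping matches.
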
Matching Criteria
-----------------
The search function supports four possible match criteria, which may be
supplied in any combination:

* maximum Levenshtein distance (*max_l_dist*)

* maximum number of substitutions (*max_substitutions*)

* maximum number of deletions (*max_deletions*; "delete" = skip a character
  in the sub-sequence)

* maximum number of insertions (*max_insertions*; "insert" = skip a character
  in the sequence)

Not supplying a criterion means that there is no limit for it. For this reason,
one must always supply *max_l_dist* and/or all other criteria.

.. code:: python

    >>> find_near_matches('PATTERN', '---PATERN---', max_l_dist=1)
    [Match(start=3, end=9, dist=1)]

    # this will not match since max_deletions is set to zero
    >>> find_near_matches('PATTERN', '---PATERN---', max_l_dist=1, max_deletions=0)
    []

    # note that a deletion + insertion may be combined to match a substitution
    >>> find_near_matches('PATTERN', '---PAT-ERN---', max_deletions=1, max_insertions=1, max_substitutions=0)
    [Match(start=3, end=10, dist=1)]  # the Levenshtein distance is still 1

    # ... but deletion + insertion may also match other, non-substitution differences
    >>> find_near_matches('PATTERN', '---PATERRN---', max_deletions=1, max_insertions=1, max_substitutions=0)
    [Match(start=3, end=10, dist=2)]

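The ``dist`` values reported above can be checked by hand with a plain
Levenshtein distance function. The one below is a standard textbook sketch
written for this purpose; ``levenshtein()`` is not a documented fuzzysearch
function.

.. code:: python

    def levenshtein(a, b):
        """Return the Levenshtein distance between two sequences."""
        # previous[j] is the distance between the processed prefix of a and b[:j]
        previous = list(range(len(b) + 1))
        for i, char_a in enumerate(a, start=1):
            current = [i]
            for j, char_b in enumerate(b, start=1):
                current.append(min(
                    previous[j - 1] + (char_a != char_b),  # match or substitution
                    previous[j] + 1,     # drop a character of a
                    current[j - 1] + 1,  # add a character of b
                ))
            previous = current
        return previous[-1]

    print(levenshtein('PATTERN', 'PATERN'))   # prints 1
    print(levenshtein('PATTERN', 'PAT-ERN'))  # prints 1
    print(levenshtein('PATTERN', 'PATERRN'))  # prints 2

``'PAT-ERN'`` is one substitution away from ``'PATTERN'``, which is why its
distance stays 1 even when the match is found through a deletion plus an
insertion.
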
History
-------

0.6.1 (2018-12-08)
++++++++++++++++++

* Fixed some C compiler warnings for the C and Cython modules

0.6.0 (2018-12-07)
++++++++++++++++++

* Dropped support for Python versions 2.6, 3.2 and 3.3
* Added support and testing for Python 3.7
* Optimized the n-grams Levenshtein search for long sub-sequences
* Further optimized the n-grams Levenshtein search
* Cython versions of the optimized parts of the n-grams Levenshtein search

0.5.0 (2017-09-05)
++++++++++++++++++

* Fixed ``search_exact_byteslike()`` to support supplying start and end indexes
* Added support for lists, tuples and other Sequence types to ``search_exact()``
* Fixed a bug where ``find_near_matches()`` could return a wrong ``Match.end``
  with ``max_l_dist=0``
* Added more tests and improved some existing ones.

0.4.0 (2017-07-06)
++++++++++++++++++

* Added support and testing for Python 3.5 and 3.6
* Many small improvements to README, setup.py and CI testing

0.3.0 (2015-02-12)
++++++++++++++++++

* Added C extensions for several search functions as well as internal functions
* Use C extensions if available, or pure-Python implementations otherwise
* setup.py attempts to build C extensions, but installs without if build fails
* Added ``--noexts`` setup.py option to avoid trying to build the C extensions
* Greatly improved testing and coverage

0.2.2 (2014-03-27)
++++++++++++++++++

* Added support for searching through BioPython Seq objects
* Added specialized search function allowing only substitutions and insertions
* Fixed several bugs

0.2.1 (2014-03-14)
++++++++++++++++++

* Fixed major match grouping bug

0.2.0 (2014-03-13)
++++++++++++++++++

* New utility function ``find_near_matches()`` for easier use
* Additional documentation

0.1.0 (2013-11-12)
++++++++++++++++++

* Two working implementations
* Extensive test suite; all tests passing
* Full support for Python 2.6-2.7 and 3.1-3.3
* Bumped status from Pre-Alpha to Alpha

0.0.1 (2013-11-01)
++++++++++++++++++

* First release on PyPI.