Metadata-Version: 2.1
Name: fuzzysearch
Version: 0.6.2
Summary: fuzzysearch is useful for finding approximate subsequence matches
Home-page: https://github.com/taleinat/fuzzysearch
Author: Tal Einat
Author-email: taleinat@gmail.com
License: MIT
Keywords: fuzzysearch
Platform: UNKNOWN
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Natural Language :: English
Classifier: Operating System :: MacOS :: MacOS X
Classifier: Programming Language :: Python :: 2
Classifier: Programming Language :: Python :: 2.7
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.4
Classifier: Programming Language :: Python :: 3.5
Classifier: Programming Language :: Python :: 3.6
Classifier: Programming Language :: Python :: 3.7
Classifier: Programming Language :: Python :: Implementation :: CPython
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Requires-Dist: six

===========
fuzzysearch
===========

.. image:: https://img.shields.io/pypi/v/fuzzysearch.svg?style=flat
    :target: https://pypi.python.org/pypi/fuzzysearch
    :alt: Latest Version

.. image:: https://img.shields.io/travis/taleinat/fuzzysearch.svg?branch=master
    :target: https://travis-ci.org/taleinat/fuzzysearch/branches
    :alt: Build & Tests Status

.. image:: https://img.shields.io/coveralls/taleinat/fuzzysearch.svg?branch=master
    :target: https://coveralls.io/r/taleinat/fuzzysearch?branch=master
    :alt: Test Coverage

.. image:: https://img.shields.io/pypi/dm/fuzzysearch.svg?style=flat
    :target: https://pypi.python.org/pypi/fuzzysearch
    :alt: Downloads

.. image:: https://img.shields.io/pypi/wheel/fuzzysearch.svg?style=flat
    :target: https://pypi.python.org/pypi/fuzzysearch
    :alt: Wheels

.. image:: https://img.shields.io/pypi/pyversions/fuzzysearch.svg?style=flat
    :target: https://pypi.python.org/pypi/fuzzysearch
    :alt: Supported Python versions

.. image:: https://img.shields.io/pypi/implementation/fuzzysearch.svg?style=flat
    :target: https://pypi.python.org/pypi/fuzzysearch
    :alt: Supported Python implementations

.. image:: https://img.shields.io/pypi/l/fuzzysearch.svg?style=flat
    :target: https://pypi.python.org/pypi/fuzzysearch/
    :alt: License

**Easy fuzzy search that just works, fast!**

.. code:: python

    >>> find_near_matches('PATTERN', '---PATERN---', max_l_dist=1)
    [Match(start=3, end=9, dist=1)]

* Approximate sub-string searches
* A single, simple function to use
* Chooses the fastest available search mechanism based on the given input
* Uses the Levenshtein Distance metric with configurable parameters
* Separately configure the maximum allowed distance, substitutions, deletions
  and insertions
* Advanced algorithms with optional C and Cython optimizations
* Extensively tested
* Free software: `MIT license `_

For more info, see the `documentation `_.

Installation
------------

.. code::

    $ pip install fuzzysearch

Installation succeeds even if building the C and Cython extensions fails;
pure-Python fallbacks are used instead.

Usage
-----

Just call ``find_near_matches()`` with the sub-sequence you're looking for,
the sequence to search, and the matching parameters:

.. code:: python

    >>> from fuzzysearch import find_near_matches
    # search for 'PATTERN' with a maximum Levenshtein Distance of 1
    >>> find_near_matches('PATTERN', '---PATERN---', max_l_dist=1)
    [Match(start=3, end=9, dist=1)]

.. code:: python

    >>> sequence = '''\
    GACTAGCACTGTAGGGATAACAATTTCACACAGGTGGACAATTACATTGAAAATCACAGATTGGTCACACACACA
    TTGGACATACATAGAAACACACACACATACATTAGATACGAACATAGAAACACACATTAGACGCGTACATAGACA
    CAAACACATTGACAGGCAGTTCAGATGATGACGCCCGACTGATACTCGCGTAGTCGTGGGAGGCAAGGCACACAG
    GGGATAGG'''
    >>> subsequence = 'TGCACTGTAGGGATAACAAT' # distance = 1
    >>> find_near_matches(subsequence, sequence, max_l_dist=2)
    [Match(start=3, end=24, dist=1)]

Matching Criteria
-----------------

The search function supports four possible match criteria, which may be
supplied in any combination:

* maximum Levenshtein distance (*max_l_dist*)
* maximum number of substitutions (*max_substitutions*)
* maximum number of deletions (*max_deletions*; "delete" = skip a character
  in the sub-sequence)
* maximum number of insertions (*max_insertions*; "insert" = skip a character
  in the sequence)

Not supplying a criterion means that there is no limit for it. For this
reason, one must always supply *max_l_dist* and/or all other criteria.

.. code:: python

    >>> find_near_matches('PATTERN', '---PATERN---', max_l_dist=1)
    [Match(start=3, end=9, dist=1)]

    # this will not match since max-deletions is set to zero
    >>> find_near_matches('PATTERN', '---PATERN---', max_l_dist=1, max_deletions=0)
    []

    # note that a deletion + insertion may be combined to match a substitution
    >>> find_near_matches('PATTERN', '---PAT-ERN---', max_deletions=1, max_insertions=1, max_substitutions=0)
    [Match(start=3, end=10, dist=1)] # the Levenshtein distance is still 1

    # ... but deletion + insertion may also match other, non-substitution differences
    >>> find_near_matches('PATTERN', '---PATERRN---', max_deletions=1, max_insertions=1, max_substitutions=0)
    [Match(start=3, end=10, dist=2)]

History
-------

0.6.1 (2018-12-08)
++++++++++++++++++

* Fixed some C compiler warnings for the C and Cython modules

0.6.0 (2018-12-07)
++++++++++++++++++

* Dropped support for Python versions 2.6, 3.2 and 3.3
* Added support and testing for Python 3.7
* Optimized the n-grams Levenshtein search for long sub-sequences
* Further optimized the n-grams Levenshtein search
* Cython versions of the optimized parts of the n-grams Levenshtein search

0.5.0 (2017-09-05)
++++++++++++++++++

* Fixed ``search_exact_byteslike()`` to support supplying start and end indexes
* Added support for lists, tuples and other Sequence types to ``search_exact()``
* Fixed a bug where ``find_near_matches()`` could return a wrong ``Match.end``
  with ``max_l_dist=0``
* Added more tests and improved some existing ones.

0.4.0 (2017-07-06)
++++++++++++++++++

* Added support and testing for Python 3.5 and 3.6
* Many small improvements to README, setup.py and CI testing

0.3.0 (2015-02-12)
++++++++++++++++++

* Added C extensions for several search functions as well as internal functions
* Use C extensions if available, or pure-Python implementations otherwise
* setup.py attempts to build C extensions, but installs without if build fails
* Added ``--noexts`` setup.py option to avoid trying to build the C extensions
* Greatly improved testing and coverage

0.2.2 (2014-03-27)
++++++++++++++++++

* Added support for searching through BioPython Seq objects
* Added specialized search function allowing only substitutions and insertions
* Fixed several bugs

0.2.1 (2014-03-14)
++++++++++++++++++

* Fixed major match grouping bug

0.2.0 (2013-03-13)
++++++++++++++++++

* New utility function ``find_near_matches()`` for easier use
* Additional documentation

0.1.0 (2013-11-12)
++++++++++++++++++

* Two working implementations
* Extensive test suite; all tests passing
* Full support for Python 2.6-2.7 and 3.1-3.3
* Bumped status from Pre-Alpha to Alpha

0.0.1 (2013-11-01)
++++++++++++++++++

* First release on PyPI.
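
Illustration: checking max_l_dist by hand
-----------------------------------------

To make the *max_l_dist* criterion from the Matching Criteria section more
concrete, here is a minimal sketch of a naive dynamic-programming search that
reports where a near match can end. It is not how fuzzysearch is implemented
(the library selects among optimized algorithms), and the function name
``near_match_ends`` is hypothetical, introduced only for this illustration.

```python
def near_match_ends(pattern, text, max_l_dist):
    """Return (end, dist) pairs for near matches of `pattern` in `text`.

    Naive sketch for illustration only, NOT the fuzzysearch implementation.
    `dist` is the smallest Levenshtein distance between `pattern` and any
    substring of `text` ending just before index `end`.
    """
    m = len(pattern)
    # col[i]: best distance between pattern[:i] and a substring of the text
    # ending at the current position. Before reading any text, matching
    # pattern[:i] costs i edits.
    col = list(range(m + 1))
    hits = []
    for end, ch in enumerate(text, start=1):
        # new[0] stays 0 because a match may start at any position in the text
        new = [0] * (m + 1)
        for i in range(1, m + 1):
            cost = 0 if pattern[i - 1] == ch else 1
            new[i] = min(
                col[i - 1] + cost,  # match or substitution
                col[i] + 1,         # extra character in the text
                new[i - 1] + 1,     # pattern character missing from the text
            )
        col = new
        if col[m] <= max_l_dist:
            hits.append((end, col[m]))
    return hits


print(near_match_ends('PATTERN', '---PATERN---', max_l_dist=1))
```

For the README's running example this reports a single candidate ending at
index 9 with distance 1, which agrees with the ``end`` and ``dist`` values in
the ``Match`` shown earlier. Unlike ``find_near_matches()``, the sketch does
not recover start positions or limit individual edit types.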