#!/usr/bin/env python
# encoding: utf-8
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements.  See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License.  You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#

# Module documentation
'''
The Tika Python module provides a Python API client for the Apache Tika Server.

**Example usage**::

    import tika
    from tika import parser
    parsed = parser.from_file('/path/to/file')
    print(parsed["metadata"])
    print(parsed["content"])

Visit https://github.com/chrismattmann/tika-python to learn more about it.

**Detect IANA MIME Type**::

    from tika import detector
    print(detector.from_file('/path/to/file'))

**Detect Language**::

    from tika import language
    print(language.from_file('/path/to/file'))

**Use Tika Translate**::

    from tika import translate
    print(translate.from_file('/path/to/file', 'srcLang', 'destLang'))
    # Use auto Language detection feature
    print(translate.from_file('/path/to/file', 'destLang'))

***Tika-Python Configuration***
You can now use custom configuration files. See https://tika.apache.org/1.18/configuring.html
for details on writing configuration files. Configuration is set the first time the server is started.
To use a configuration file with a parser or detector:
    parsed = parser.from_file('/path/to/file', config_path='/path/to/configfile')
or:
    detected = detector.from_file('/path/to/file', config_path='/path/to/configfile')
or:
    detected = detector.from_buffer('some buffered content', config_path='/path/to/configfile')

'''

USAGE = """
tika.py [-v] [-e] [-o ] [--server ] [--install ] [--port ]