Metadata-Version: 2.1
Name: python-mnist
Version: 0.7
Summary: Simple MNIST and EMNIST data parser written in pure Python
Home-page: https://github.com/sorki/python-mnist
Author: Richard Marko
Author-email: srk@48.io
License: BSD
Platform: UNKNOWN
Classifier: Development Status :: 5 - Production/Stable
Classifier: Intended Audience :: Developers
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python

python-mnist
============

Simple MNIST and EMNIST data parser written in pure Python.

MNIST is a database of handwritten digits available on http://yann.lecun.com/exdb/mnist/.

EMNIST is an extended MNIST database https://www.nist.gov/itl/iad/image-group/emnist-dataset.

Requirements
------------

- Python 2 or Python 3

Usage
-----

- ``git clone https://github.com/sorki/python-mnist``
- ``cd python-mnist``
- Get MNIST data:

::

    ./bin/mnist_get_data.sh

- Check preview with:

::

    PYTHONPATH=. ./bin/mnist_preview

Installation
------------

Get the package from PyPI:

::

    pip install python-mnist

or install with ``setup.py``:

::

    python setup.py install

Code sample:

::

    from mnist import MNIST
    mndata = MNIST('./dir_with_mnist_data_files')
    images, labels = mndata.load_training()

To enable loading of gzipped files use:

::

    mndata.gz = True

The library tries to load files named ``t10k-images-idx3-ubyte``, ``train-labels-idx1-ubyte``, ``train-images-idx3-ubyte`` and ``t10k-labels-idx1-ubyte``. If loading throws an exception, check that your file names match these.

EMNIST
------

- Get EMNIST data:

::

    ./bin/emnist_get_data.sh

- Check preview with:

::

    PYTHONPATH=. ./bin/emnist_preview

To use EMNIST datasets you need to call:

::

    mndata.select_emnist('digits')

where ``digits`` is one of the available EMNIST datasets. You can choose from

- balanced
- byclass
- bymerge
- digits
- letters
- mnist

The EMNIST loader uses gzipped files by default; this can be disabled by setting:

::

    mndata.gz = False

You also need to unpack the EMNIST files, as the ``bin/emnist_get_data.sh`` script won't do it for you.
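
Unpacking the gzipped files can be done with any tool. As one possible approach, the sketch below uses only the Python standard library. The helper name ``gunzip_all`` is hypothetical and not part of this package, and it assumes the compressed files carry a ``.gz`` suffix and sit directly in one directory.

```python
import gzip
import os
import shutil


def gunzip_all(directory):
    """Decompress every ``*.gz`` file in ``directory`` next to the original.

    Hypothetical helper, not part of python-mnist. Returns the sorted
    list of paths that were written; the ``.gz`` files are left in place.
    """
    written = []
    for name in sorted(os.listdir(directory)):
        if not name.endswith('.gz'):
            continue  # skip anything that is not gzip-compressed
        src = os.path.join(directory, name)
        dst = src[:-len('.gz')]  # same path without the .gz suffix
        # Stream the data so large files are never held fully in memory.
        with gzip.open(src, 'rb') as fin, open(dst, 'wb') as fout:
            shutil.copyfileobj(fin, fout)
        written.append(dst)
    return written
```

Calling ``gunzip_all`` on the directory that holds the downloaded files leaves the unpacked copies beside the originals, so switching ``mndata.gz`` back to ``True`` later should still work.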
The EMNIST loader also needs to mirror and rotate images, so it is a bit slower. If this is an issue for you, repack the data to avoid mirroring and rotation on each load.

Notes
-----

This package avoids numpy by design: when I tried to find a working implementation, all of them were based on some archaic version of numpy and none of them worked. It loads data files with ``struct.unpack`` instead.

Example
-------

::

    $ PYTHONPATH=. ./bin/mnist_preview
    Showing num: 3

    ............................
    ............................
    ............................
    ............................
    ............................
    ............................
    .............@@@@@..........
    ..........@@@@@@@@@@........
    .......@@@@@@......@@.......
    .......@@@........@@@.......
    .................@@.........
    ................@@@.........
    ...............@@@@@........
    .............@@@............
    .............@.......@......
    .....................@......
    .....................@@.....
    ....................@@......
    ...................@@@......
    .................@@@@.......
    ................@@@@........
    ....@........@@@@@..........
    ....@@@@@@@@@@@@............
    ......@@@@@@................
    ............................
    ............................
    ............................
    ............................
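
To illustrate the ``struct.unpack`` approach mentioned in the Notes, here is a minimal sketch of how an IDX image file could be parsed and previewed in pure Python. It is not the package's actual implementation: the function names ``parse_idx_images`` and ``render`` are hypothetical, and the threshold of 200 used to choose between ``@`` and ``.`` is an assumption. The header layout (a big-endian magic number 2051 followed by image count, rows and columns) follows the IDX format used by the MNIST files.

```python
import struct


def parse_idx_images(data):
    """Parse the raw bytes of an IDX image file.

    Hypothetical helper, not part of python-mnist. Returns
    (images, rows, cols), where each image is a flat list of pixel
    values in the range 0-255.
    """
    # Header: four big-endian unsigned 32-bit integers.
    magic, count, rows, cols = struct.unpack('>IIII', data[:16])
    if magic != 2051:
        raise ValueError('unexpected magic number: %d' % magic)
    size = rows * cols
    fmt = '>%dB' % size  # one unsigned byte per pixel
    images = []
    for i in range(count):
        start = 16 + i * size
        images.append(list(struct.unpack(fmt, data[start:start + size])))
    return images, rows, cols


def render(image, cols, threshold=200):
    """Render a flat pixel list as text, one line per image row.

    The threshold is an assumed value: pixels above it become '@',
    all others become '.'.
    """
    lines = []
    for start in range(0, len(image), cols):
        row = image[start:start + cols]
        lines.append(''.join('@' if p > threshold else '.' for p in row))
    return '\n'.join(lines)
```

Reading a file in binary mode and passing its contents to ``parse_idx_images`` yields pixel lists that ``render`` can print in the same style as the preview above. Label files use a different magic number and a shorter header, so they would need a separate parser.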