Metadata-Version: 2.1
Name: tensorflow-io-gcs-filesystem
Version: 0.31.0
Summary: TensorFlow IO
Home-page: https://github.com/tensorflow/io
Download-URL: https://github.com/tensorflow/io/tags
Author: Google Inc.
Author-email: opensource@google.com
Project-URL: Source, https://github.com/tensorflow/io
Project-URL: Bug Reports, https://github.com/tensorflow/io/issues
Project-URL: Documentation, https://tensorflow.org/io
Keywords: tensorflow io machine learning
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Education
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.7
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3 :: Only
Classifier: Topic :: Scientific/Engineering
Classifier: Topic :: Scientific/Engineering :: Mathematics
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Software Development
Classifier: Topic :: Software Development :: Libraries
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Requires-Python: >=3.7, <3.12
Description-Content-Type: text/markdown
License-File: LICENSE
Provides-Extra: tensorflow
Requires-Dist: tensorflow (<2.12.0,>=2.11.0) ; extra == 'tensorflow'
Provides-Extra: tensorflow-aarch64
Requires-Dist: tensorflow-aarch64 (<2.12.0,>=2.11.0) ; extra == 'tensorflow-aarch64'
Provides-Extra: tensorflow-cpu
Requires-Dist: tensorflow-cpu (<2.12.0,>=2.11.0) ; extra == 'tensorflow-cpu'
Provides-Extra: tensorflow-gpu
Requires-Dist: tensorflow-gpu (<2.12.0,>=2.11.0) ; extra == 'tensorflow-gpu'
Provides-Extra: tensorflow-rocm
Requires-Dist: tensorflow-rocm (<2.12.0,>=2.11.0) ; extra == 'tensorflow-rocm'

<div align="center">
<img src="https://github.com/tensorflow/community/blob/master/sigs/logos/SIGIO.png" width="60%"><br><br>
</div>

-----------------

# TensorFlow I/O

[![GitHub CI](https://github.com/tensorflow/io/workflows/GitHub%20CI/badge.svg?branch=master)](https://github.com/tensorflow/io/actions?query=branch%3Amaster)
[![PyPI](https://badge.fury.io/py/tensorflow-io.svg)](https://pypi.org/project/tensorflow-io/)
[![License](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](https://github.com/tensorflow/io/blob/master/LICENSE)
[![Documentation](https://img.shields.io/badge/api-reference-blue.svg)](https://www.tensorflow.org/io)

TensorFlow I/O is a collection of file systems and file formats that are not
available in TensorFlow's built-in support. A full list of the file systems
and file formats supported by TensorFlow I/O can be found [here](https://www.tensorflow.org/io/api_docs/python/tfio).

Using tensorflow-io with Keras is straightforward. Below is the
[Get Started with TensorFlow](https://www.tensorflow.org/tutorials/quickstart/beginner) example with
the data processing step replaced by tensorflow-io:

```python
import tensorflow as tf
import tensorflow_io as tfio

# Read the MNIST data into the IODataset.
dataset_url = "https://storage.googleapis.com/cvdf-datasets/mnist/"
d_train = tfio.IODataset.from_mnist(
    dataset_url + "train-images-idx3-ubyte.gz",
    dataset_url + "train-labels-idx1-ubyte.gz",
)

# Shuffle the elements of the dataset.
d_train = d_train.shuffle(buffer_size=1024)

# By default image data is uint8, so convert to float32 using map().
d_train = d_train.map(lambda x, y: (tf.image.convert_image_dtype(x, tf.float32), y))

# Batch the data just like any other tf.data.Dataset.
d_train = d_train.batch(32)

# Build the model.
model = tf.keras.models.Sequential(
    [
        tf.keras.layers.Flatten(input_shape=(28, 28)),
        tf.keras.layers.Dense(512, activation=tf.nn.relu),
        tf.keras.layers.Dropout(0.2),
        tf.keras.layers.Dense(10, activation=tf.nn.softmax),
    ]
)

# Compile the model.
model.compile(
    optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"]
)

# Fit the model.
model.fit(d_train, epochs=5, steps_per_epoch=200)
```

In the above [MNIST](http://yann.lecun.com/exdb/mnist/) example, the URLs
to access the dataset files are passed directly to the `tfio.IODataset.from_mnist` API call.
This works because `tensorflow-io` has built-in support for the `HTTP`/`HTTPS` file system,
which eliminates the need to download and save datasets to a local directory.

NOTE: Since `tensorflow-io` detects and uncompresses the MNIST dataset automatically if needed,
the URLs for the compressed (gzip) files can be passed to the API call as is.

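As an illustration of the kind of detection the note describes, the sketch below checks for the gzip magic bytes (`0x1f 0x8b`) and decompresses only when they are present. It uses only the Python standard library. The helper name `maybe_gunzip` is hypothetical, and this is not a description of how `tensorflow-io` implements the feature internally.

```python
import gzip

# Every gzip stream starts with these two bytes (RFC 1952).
GZIP_MAGIC = b"\x1f\x8b"


def maybe_gunzip(data: bytes) -> bytes:
    """Return decompressed bytes if `data` looks like gzip, else return it unchanged."""
    if data[:2] == GZIP_MAGIC:
        return gzip.decompress(data)
    return data


raw = b"example payload"
compressed = gzip.compress(raw)

# Both inputs yield the same uncompressed bytes.
print(maybe_gunzip(compressed) == raw)
print(maybe_gunzip(raw) == raw)
```

Checking the leading bytes rather than the `.gz` file extension means the same call works whether or not the file was renamed.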
Please check the official [documentation](https://www.tensorflow.org/io) for more
detailed and interesting uses of the package.

## Installation

### Python Package

The `tensorflow-io` Python package can be installed with pip directly using:
```sh
$ pip install tensorflow-io
```

People who are a little more adventurous can also try our nightly binaries:
```sh
$ pip install tensorflow-io-nightly
```

To ensure you have a version of TensorFlow that is compatible with TensorFlow I/O,
you can specify the `tensorflow` extra requirement during install:

```sh
$ pip install "tensorflow-io[tensorflow]"
```

Similar extras exist for the `tensorflow-gpu`, `tensorflow-cpu` and `tensorflow-rocm`
packages.

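To confirm which versions ended up installed, a small check with the standard-library `importlib.metadata` module can help. This is a minimal sketch that assumes Python 3.8 or later, where that module is available. The helper name `installed_version` is hypothetical; the distribution names are the ones used above.

```python
from importlib import metadata
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


# Report what is installed in the current environment.
for name in ("tensorflow", "tensorflow-io"):
    print(name, installed_version(name) or "not installed")
```

The reported pair can then be compared against the version compatibility table in this document.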
### Docker Images

In addition to the pip packages, Docker images can be used to get started quickly.

For stable builds:
```sh
$ docker pull tfsigio/tfio:latest
$ docker run -it --rm --name tfio-latest tfsigio/tfio:latest
```

For nightly builds:
```sh
$ docker pull tfsigio/tfio:nightly
$ docker run -it --rm --name tfio-nightly tfsigio/tfio:nightly
```

### R Package

Once the `tensorflow-io` Python package has been successfully installed, you
can install the development version of the R package from GitHub via the following:
```r
if (!require("remotes")) install.packages("remotes")
remotes::install_github("tensorflow/io", subdir = "R-package")
```

### TensorFlow Version Compatibility

To ensure compatibility with TensorFlow, it is recommended to install a matching
version of TensorFlow I/O according to the table below. You can find the list
of releases [here](https://github.com/tensorflow/io/releases).

| TensorFlow I/O Version | TensorFlow Compatibility | Release Date |
| --- | --- | --- |
| 0.31.0 | 2.11.x | Feb 25, 2023 |
| 0.30.0 | 2.11.x | Jan 20, 2023 |
| 0.29.0 | 2.11.x | Dec 18, 2022 |
| 0.28.0 | 2.11.x | Nov 21, 2022 |
| 0.27.0 | 2.10.x | Sep 08, 2022 |
| 0.26.0 | 2.9.x | May 17, 2022 |
| 0.25.0 | 2.8.x | Apr 19, 2022 |
| 0.24.0 | 2.8.x | Feb 04, 2022 |
| 0.23.1 | 2.7.x | Dec 15, 2021 |
| 0.23.0 | 2.7.x | Dec 14, 2021 |
| 0.22.0 | 2.7.x | Nov 10, 2021 |
| 0.21.0 | 2.6.x | Sep 12, 2021 |
| 0.20.0 | 2.6.x | Aug 11, 2021 |
| 0.19.1 | 2.5.x | Jul 25, 2021 |
| 0.19.0 | 2.5.x | Jun 25, 2021 |
| 0.18.0 | 2.5.x | May 13, 2021 |
| 0.17.1 | 2.4.x | Apr 16, 2021 |
| 0.17.0 | 2.4.x | Dec 14, 2020 |
| 0.16.0 | 2.3.x | Oct 23, 2020 |
| 0.15.0 | 2.3.x | Aug 03, 2020 |
| 0.14.0 | 2.2.x | Jul 08, 2020 |
| 0.13.0 | 2.2.x | May 10, 2020 |
| 0.12.0 | 2.1.x | Feb 28, 2020 |
| 0.11.0 | 2.1.x | Jan 10, 2020 |
| 0.10.0 | 2.0.x | Dec 05, 2019 |
| 0.9.1 | 2.0.x | Nov 15, 2019 |
| 0.9.0 | 2.0.x | Oct 18, 2019 |
| 0.8.1 | 1.15.x | Nov 15, 2019 |
| 0.8.0 | 1.15.x | Oct 17, 2019 |
| 0.7.2 | 1.14.x | Nov 15, 2019 |
| 0.7.1 | 1.14.x | Oct 18, 2019 |
| 0.7.0 | 1.14.x | Jul 14, 2019 |
| 0.6.0 | 1.13.x | May 29, 2019 |
| 0.5.0 | 1.13.x | Apr 12, 2019 |
| 0.4.0 | 1.13.x | Mar 01, 2019 |
| 0.3.0 | 1.12.0 | Feb 15, 2019 |
| 0.2.0 | 1.12.0 | Jan 29, 2019 |
| 0.1.0 | 1.12.0 | Dec 16, 2018 |

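The table lookup can also be scripted. The sketch below copies a few rows from the table into a dictionary and checks a TensorFlow version against the matching entry, treating `x` as a wildcard. The names `COMPATIBILITY` and `is_compatible` are hypothetical and are not part of the `tensorflow-io` API.

```python
# A few rows copied from the table above; extend as needed.
COMPATIBILITY = {
    "0.31.0": "2.11.x",
    "0.27.0": "2.10.x",
    "0.26.0": "2.9.x",
    "0.24.0": "2.8.x",
    "0.3.0": "1.12.0",
}


def is_compatible(tfio_version: str, tf_version: str) -> bool:
    """Check a TensorFlow version against the table entry for a TensorFlow I/O version."""
    pattern = COMPATIBILITY.get(tfio_version)
    if pattern is None:
        raise KeyError(f"no table entry for TensorFlow I/O {tfio_version}")
    want = pattern.split(".")
    have = tf_version.split(".")
    # An "x" component in the table matches any value in that position.
    return len(have) >= len(want) and all(
        w == "x" or w == h for w, h in zip(want, have)
    )


print(is_compatible("0.31.0", "2.11.1"))  # True
print(is_compatible("0.31.0", "2.10.0"))  # False
```

Comparing dot-separated components, rather than string prefixes, keeps `2.9.x` from accidentally matching a version such as `2.90.0`.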
## Performance Benchmarking

We use [GitHub Pages](https://tensorflow.github.io/io/dev/bench/) to publish the results of API performance benchmarks. The benchmark job is triggered on every commit to the `master` branch,
which makes it possible to track performance across commits.

## Contributing

TensorFlow I/O is a community-led open source project. As such, the project
depends on public contributions, bug fixes, and documentation. Please see:

- [contribution guidelines](CONTRIBUTING.md) for a guide on how to contribute.
- [development doc](docs/development.md) for instructions on the development environment setup.
- [tutorials](docs/tutorials) for a list of tutorial notebooks and instructions on how to write one.

### Build Status and CI

| Build | Status |
| --- | --- |
| Linux CPU Python 2 | [![Status](https://storage.googleapis.com/tensorflow-kokoro-build-badges/io/ubuntu-py2.svg)](https://storage.googleapis.com/tensorflow-kokoro-build-badges/io/ubuntu-py2.html) |
| Linux CPU Python 3 | [![Status](https://storage.googleapis.com/tensorflow-kokoro-build-badges/io/ubuntu-py3.svg)](https://storage.googleapis.com/tensorflow-kokoro-build-badges/io/ubuntu-py3.html) |
| Linux GPU Python 2 | [![Status](https://storage.googleapis.com/tensorflow-kokoro-build-badges/io/ubuntu-gpu-py2.svg)](https://storage.googleapis.com/tensorflow-kokoro-build-badges/io/ubuntu-gpu-py2.html) |
| Linux GPU Python 3 | [![Status](https://storage.googleapis.com/tensorflow-kokoro-build-badges/io/ubuntu-gpu-py3.svg)](https://storage.googleapis.com/tensorflow-kokoro-build-badges/io/ubuntu-gpu-py3.html) |

Because of the manylinux2010 requirement, TensorFlow I/O is built with
Ubuntu 16.04 + Developer Toolset 7 (GCC 7.3) on Linux. Configuring
Ubuntu 16.04 with Developer Toolset 7 is not exactly straightforward.
If Docker is installed on the system, the following commands
automatically build a manylinux2010-compatible whl package:

```sh
#!/usr/bin/env bash

ls dist/*
for f in dist/*.whl; do
  docker run -i --rm -v "$PWD":/v -w /v --net=host quay.io/pypa/manylinux2010_x86_64 bash -x -e /v/tools/build/auditwheel repair --plat manylinux2010_x86_64 "$f"
done
sudo chown -R $(id -nu):$(id -ng) .
ls wheelhouse/*
```

The build takes some time. Once it completes, whl packages compatible with
Python `3.5`, `3.6`, and `3.7` are available in the `wheelhouse`
directory.

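To see which Python versions the repaired wheels target, the tags can be read from the file names, which follow the wheel naming convention `{distribution}-{version}(-{build tag})?-{python tag}-{abi tag}-{platform tag}.whl`. The following is a minimal sketch using only the standard library; the helper name `wheel_tags` and the sample file name are hypothetical.

```python
from pathlib import Path
from typing import Tuple


def wheel_tags(filename: str) -> Tuple[str, str, str]:
    """Return (python tag, abi tag, platform tag) parsed from a wheel file name."""
    name = Path(filename).name
    if not name.endswith(".whl"):
        raise ValueError(f"not a wheel: {filename}")
    parts = name[: -len(".whl")].split("-")
    # Five fields without a build tag, six with one.
    if len(parts) not in (5, 6):
        raise ValueError(f"unexpected wheel file name: {filename}")
    # The last three dash-separated fields are always the tags.
    return parts[-3], parts[-2], parts[-1]


# Hypothetical file name, for illustration only.
print(wheel_tags("wheelhouse/example_pkg-1.0.0-cp37-cp37m-manylinux2010_x86_64.whl"))
```

Applying `wheel_tags` to each file matched by `Path("wheelhouse").glob("*.whl")` gives a quick summary of the interpreters and platforms covered by a build.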
On macOS, the same command can be used. However, the script expects `python` in the shell
and only generates a whl package that matches that version of `python`. To build
a whl package for a specific Python version, alias that version
to `python` in the shell. See the Auditwheel step in [.github/workflows/build.yml](.github/workflows/build.yml)
for instructions on how to do that.

Note that the above command is also the one we use when releasing packages for Linux and macOS.

TensorFlow I/O uses both GitHub Workflows and Google CI (Kokoro) for continuous integration.
GitHub Workflows is used for macOS build and test. Kokoro is used for Linux build and test.
Again, because of the manylinux2010 requirement, whl packages on Linux are always
built with Ubuntu 16.04 + Developer Toolset 7. Tests are run on a variety of systems
with different Python versions to ensure good coverage:

| Python | Ubuntu 18.04 | Ubuntu 20.04 | macOS + osx9 | Windows-2019 |
| ------- | ----- | ------- | ------- | --------- |
| 2.7 | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | N/A |
| 3.7 | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: |
| 3.8 | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: |

TensorFlow I/O has integrations with many systems and cloud vendors such as
Prometheus, Apache Kafka, Apache Ignite, Google Cloud PubSub, AWS Kinesis,
Microsoft Azure Storage, Alibaba Cloud OSS, etc.

We try our best to test against those systems in our continuous integration
whenever possible. Some tests, such as those for Prometheus, Kafka, and Ignite,
run against live systems, meaning we install Prometheus/Kafka/Ignite on the CI machine before
the test is run. Other tests, such as those for Kinesis, PubSub, and Azure Storage, run
through official or non-official emulators. Offline tests are also performed whenever
possible, though systems covered through offline tests may not have the same
level of coverage as live systems or emulators.

| | Live System | Emulator | CI Integration | Offline |
| ------- | ----- | ----- | ----- | ----- |
| Apache Kafka | :heavy_check_mark: | | :heavy_check_mark: | |
| Apache Ignite | :heavy_check_mark: | | :heavy_check_mark: | |
| Prometheus | :heavy_check_mark: | | :heavy_check_mark: | |
| Google PubSub | | :heavy_check_mark: | :heavy_check_mark: | |
| Azure Storage | | :heavy_check_mark: | :heavy_check_mark: | |
| AWS Kinesis | | :heavy_check_mark: | :heavy_check_mark: | |
| Alibaba Cloud OSS | | | | :heavy_check_mark: |
| Google BigTable/BigQuery | | to be added | | |
| Elasticsearch (experimental) | :heavy_check_mark: | | :heavy_check_mark: | |
| MongoDB (experimental) | :heavy_check_mark: | | :heavy_check_mark: | |

References for emulators:
- Official [PubSub Emulator](https://cloud.google.com/sdk/gcloud/reference/beta/emulators/pubsub/) by Google Cloud for Cloud PubSub.
- Official [Azurite Emulator](https://github.com/Azure/Azurite) by Azure for Azure Storage.
- Non-official [LocalStack emulator](https://github.com/localstack/localstack) by LocalStack for AWS Kinesis.

## Community

* SIG IO [Google Group](https://groups.google.com/a/tensorflow.org/forum/#!forum/io) and mailing list: [io@tensorflow.org](mailto:io@tensorflow.org)
* SIG IO [Monthly Meeting Notes](https://docs.google.com/document/d/1CB51yJxns5WA4Ylv89D-a5qReiGTC0GYum6DU-9nKGo/edit)
* Gitter room: [tensorflow/sig-io](https://gitter.im/tensorflow/sig-io)

## Additional Information

* [Streaming Machine Learning with Tiered Storage and Without a Data Lake](https://www.confluent.io/blog/streaming-machine-learning-with-tiered-storage/) - [Kai Waehner](https://github.com/kaiwaehner)
* [TensorFlow with Apache Arrow Datasets](https://medium.com/tensorflow/tensorflow-with-apache-arrow-datasets-cdbcfe80a59f) - [Bryan Cutler](https://github.com/BryanCutler)
* [How to build a custom Dataset for Tensorflow](https://towardsdatascience.com/how-to-build-a-custom-dataset-for-tensorflow-1fe3967544d8) - [Ivelin Ivanov](https://github.com/ivelin)
* [TensorFlow on Apache Ignite](https://medium.com/tensorflow/tensorflow-on-apache-ignite-99f1fc60efeb) - [Anton Dmitriev](https://github.com/dmitrievanthony)

## License

[Apache License 2.0](LICENSE)