# Copyright 2020 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""API for using the tf.data service.

This module contains:

1. tf.data server implementations for running the tf.data service.
2. APIs for registering datasets with the tf.data service and reading from
   the registered datasets.

The tf.data service provides the following benefits:

- Horizontal scaling of tf.data input pipeline processing to solve input
  bottlenecks.
- Data coordination for distributed training. Coordinated reads
  enable all replicas to train on similar-length examples across each global
  training step, improving step times in synchronous training.
- Dynamic balancing of data across training replicas.

>>> dispatcher = tf.data.experimental.service.DispatchServer()
>>> dispatcher_address = dispatcher.target.split("://")[1]
>>> worker = tf.data.experimental.service.WorkerServer(
...     tf.data.experimental.service.WorkerConfig(
...         dispatcher_address=dispatcher_address))
>>> dataset = tf.data.Dataset.range(10)
>>> dataset = dataset.apply(tf.data.experimental.service.distribute(
...     processing_mode=tf.data.experimental.service.ShardingPolicy.OFF,
...     service=dispatcher.target))
>>> print(list(dataset.as_numpy_iterator()))
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

## Setup

This section goes over how to set up the tf.data service.

### Run tf.data servers

The tf.data service consists of one dispatch server and `n` worker servers.
tf.data servers should be brought up alongside your training jobs, then brought
down when the jobs are finished.
Use `tf.data.experimental.service.DispatchServer` to start a dispatch server,
and `tf.data.experimental.service.WorkerServer` to start worker servers. Servers
can be run in the same process for testing purposes, or scaled up on separate
machines.

See https://github.com/tensorflow/ecosystem/tree/master/data_service for an
example of using Google Kubernetes Engine (GKE) to manage the tf.data service.
Note that the server implementation in
[tf_std_data_server.py](https://github.com/tensorflow/ecosystem/blob/master/data_service/tf_std_data_server.py)
is not GKE-specific, and can be used to run the tf.data service in other
contexts.

### Custom ops

If your dataset uses custom ops, these ops need to be made available to tf.data
servers by calling
[load_op_library](https://www.tensorflow.org/api_docs/python/tf/load_op_library)
from the dispatcher and worker processes at startup.

## Usage

Users interact with the tf.data service by programmatically registering their
datasets with the service, then creating datasets that read from the
registered datasets. The
[register_dataset](https://www.tensorflow.org/api_docs/python/tf/data/experimental/service/register_dataset)
function registers a dataset, then the
[from_dataset_id](https://www.tensorflow.org/api_docs/python/tf/data/experimental/service/from_dataset_id)
function creates a new dataset which reads from the registered dataset.
The
[distribute](https://www.tensorflow.org/api_docs/python/tf/data/experimental/service/distribute)
function wraps `register_dataset` and `from_dataset_id` into a single convenient
transformation which registers its input dataset and then reads from it.
`distribute` enables the tf.data service to be used with a one-line code change.
However, it assumes that the dataset is created and consumed by the same
entity, which is not always valid or desirable. In distributed training, for
example, it can be preferable to decouple the creation and consumption of the
dataset (via `register_dataset` and `from_dataset_id` respectively) to avoid
having to create the dataset on each of the training workers.

### Example

#### `distribute`

To use the `distribute` transformation, apply it after the portion of your
input pipeline that you would like to run on the tf.data service (typically at
the end of the pipeline).

```
dataset = ...  # Define your dataset here.
# Move dataset processing from the local machine to the tf.data service
dataset = dataset.apply(
    tf.data.experimental.service.distribute(
        processing_mode=tf.data.experimental.service.ShardingPolicy.OFF,
        service=FLAGS.tf_data_service_address,
        job_name="shared_job"))
# Any transformations added after `distribute` will be run on the local machine.
dataset = dataset.prefetch(1)
```

The above code will create a tf.data service "job", which iterates through the
dataset to generate data. To share the data from a job across multiple clients
(e.g. when using TPUStrategy or MultiWorkerMirroredStrategy), set a common
`job_name` across all clients.

#### `register_dataset` and `from_dataset_id`

`register_dataset` registers a dataset with the tf.data service, returning a
dataset id for the registered dataset. `from_dataset_id` creates a dataset that
reads from the registered dataset. These APIs can be used to reduce dataset
building time for distributed training. Instead of building the dataset on all
training workers, we can build the dataset just once and then register the
dataset using `register_dataset`. Then all workers can call `from_dataset_id`
without needing to build the dataset themselves.

```
dataset = ...  # Define your dataset here.
dataset_id = tf.data.experimental.service.register_dataset(
    service=FLAGS.tf_data_service_address,
    dataset=dataset)
# Use `from_dataset_id` to create per-worker datasets.
per_worker_datasets = {}
for worker in workers:
  per_worker_datasets[worker] = tf.data.experimental.service.from_dataset_id(
      processing_mode=tf.data.experimental.service.ShardingPolicy.OFF,
      service=FLAGS.tf_data_service_address,
      dataset_id=dataset_id,
      job_name="shared_job")
```

### Processing Modes

`processing_mode` specifies how to shard a dataset among tf.data service
workers. The tf.data service supports the `OFF`, `DYNAMIC`, `FILE`, `DATA`,
`FILE_OR_DATA`, and `HINT` sharding policies.

OFF: No sharding will be performed. The entire input dataset will be processed
independently by each of the tf.data service workers. For this reason, it is
important to shuffle data (e.g. filenames) non-deterministically, so that each
worker will process the elements of the dataset in a different order. This mode
can be used to distribute datasets that aren't splittable.

If a worker is added or restarted during ShardingPolicy.OFF processing, the
worker will instantiate a new copy of the dataset and begin producing data from
the beginning.

#### Dynamic Sharding

DYNAMIC: In this mode, tf.data service divides the dataset into two components:
a source component that generates "splits" such as filenames, and a processing
component that takes splits and outputs dataset elements. The source component
is executed in a centralized fashion by the tf.data service dispatcher, which
generates different splits of input data. The processing component is executed
in a parallel fashion by the tf.data service workers, each operating on a
different set of input data splits.

For example, consider the following dataset:

```
dataset = tf.data.Dataset.from_tensor_slices(filenames)
dataset = dataset.interleave(TFRecordDataset)
dataset = dataset.map(preprocess_fn)
dataset = dataset.batch(batch_size)
dataset = dataset.apply(
    tf.data.experimental.service.distribute(
        processing_mode=tf.data.experimental.service.ShardingPolicy.DYNAMIC,
        ...))
```

The `from_tensor_slices` will be run on the dispatcher, while the `interleave`,
`map`, and `batch` will be run on tf.data service workers. The workers will pull
filenames from the dispatcher for processing. To process a dataset with
dynamic sharding, the dataset must have a splittable source, and all of
its transformations must be compatible with splitting. While most sources and
transformations support splitting, there are exceptions, such as custom datasets
which may not implement the splitting API. Please file a GitHub issue if you
would like to use dynamic sharding for a currently unsupported dataset source
or transformation.

If no workers are restarted during training, dynamic sharding mode will visit
every example exactly once. If workers are restarted during training, the splits
they were processing will not be fully visited. The dispatcher maintains a
cursor through the dataset's splits. Assuming fault tolerance is enabled (see
"Fault Tolerance" below), the dispatcher will store cursor state in write-ahead
logs so that the cursor can be restored in case the dispatcher is restarted
mid-training. This provides an at-most-once visitation guarantee in the presence
of server restarts.
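
The visitation guarantees above can be illustrated with a toy model in plain
Python. This is only a sketch: it does not use the tf.data service, and
`Dispatcher`, `next_split`, and `run_worker` are hypothetical names chosen for
illustration. It assumes that a split handed to a worker is never handed out
again, which is what gives exactly-once visitation without restarts and
at-most-once visitation with them.

```python
# Toy model of dynamic sharding; not the tf.data service implementation.
# All names here (Dispatcher, next_split, run_worker) are hypothetical.


class Dispatcher:
  # Hands out each split at most once by advancing a cursor.

  def __init__(self, splits):
    self._splits = list(splits)
    self._cursor = 0

  def next_split(self):
    if self._cursor >= len(self._splits):
      return None  # The source is exhausted.
    split = self._splits[self._cursor]
    self._cursor += 1
    return split


def run_worker(dispatcher, max_splits, crash_on=None):
  # Pulls up to `max_splits` splits and returns the ones fully processed.
  # A split equal to `crash_on` is pulled but lost, modeling a worker restart.
  visited = []
  for _ in range(max_splits):
    split = dispatcher.next_split()
    if split is None:
      break
    if split == crash_on:
      break  # The worker restarts; this split is never handed out again.
    visited.append(split)
  return visited


filenames = ["file_0", "file_1", "file_2", "file_3"]

# No restarts: two workers together visit every split exactly once.
dispatcher = Dispatcher(filenames)
no_restart = run_worker(dispatcher, 2) + run_worker(dispatcher, 2)

# One restart: the split being processed is lost, so visitation is at most once.
dispatcher = Dispatcher(filenames)
with_restart = run_worker(dispatcher, 2, crash_on="file_1")
with_restart += run_worker(dispatcher, 4)
```

In this sketch, `no_restart` contains all four filenames, while `with_restart`
is missing `file_1` and contains no duplicates.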

#### Static Sharding

The following are static sharding policies. The semantics are similar to
`tf.data.experimental.AutoShardPolicy`. These policies require:

  * The tf.data service cluster is configured with a fixed list of workers
    in DispatcherConfig.
  * Each client only reads from the local tf.data service worker.

If a worker is restarted while performing static sharding, the worker will
begin processing its shard again from the beginning.

FILE: Shards by input files (i.e. each worker will get a fixed set of files to
process). When this option is selected, make sure that there are at least as
many files as workers. If there are fewer input files than workers, a runtime
error will be raised.

DATA: Shards by elements produced by the dataset. Each worker will process the
whole dataset and discard the portion that is not for itself. Note that for
this mode to correctly partition the dataset elements, the dataset needs to
produce elements in a deterministic order.

FILE_OR_DATA: Attempts FILE-based sharding, falling back to DATA-based
sharding on failure.

HINT: Looks for the presence of `shard(SHARD_HINT, ...)`, which is treated as a
placeholder to replace with `shard(num_workers, worker_index)`.
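
The difference between FILE and DATA sharding can be sketched in plain Python.
This is a toy model, not the service's implementation: `shard_by_file` and
`shard_by_data` are hypothetical helpers, the round-robin assignment is an
assumption made for illustration, and `ValueError` stands in for the runtime
error mentioned above.

```python
# Toy model of FILE and DATA sharding; not the tf.data service implementation.
# The function names and the round-robin assignment are assumptions.


def shard_by_file(files, num_workers, worker_index):
  # FILE: each worker gets a fixed subset of the input files.
  if len(files) < num_workers:
    raise ValueError("FILE sharding needs at least as many files as workers.")
  return files[worker_index::num_workers]


def shard_by_data(elements, num_workers, worker_index):
  # DATA: each worker iterates over the whole dataset and keeps only its
  # portion. This partitions the data only if every worker sees the same order.
  return [e for i, e in enumerate(elements) if i % num_workers == worker_index]


files = ["a.tfrecord", "b.tfrecord", "c.tfrecord"]
file_shards = [shard_by_file(files, 2, i) for i in range(2)]
data_shards = [shard_by_data(list(range(7)), 3, i) for i in range(3)]
```

Under these assumptions, `file_shards` splits the three files between two
workers, and `data_shards` splits the seven elements among three workers
without overlap. If the workers iterated in different orders, the DATA shards
could overlap or miss elements, which is why a deterministic order is required.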

For backwards compatibility, `processing_mode` may also be set to the strings
`"parallel_epochs"` or `"distributed_epoch"`, which are respectively equivalent
to `ShardingPolicy.OFF` and `ShardingPolicy.DYNAMIC`.

### Coordinated Data Read

By default, when multiple consumers read from the same job, they receive data on
a first-come first-served basis. In some use cases, it is advantageous to
coordinate the consumers. With coordinated reads, all consumers read data from
the same worker at each step.

For example, the tf.data service can be used to coordinate example sizes across
a cluster during synchronous training, so that during each step all replicas
train on similar-sized elements. To achieve this, define a dataset which
generates rounds of `num_consumers` consecutive similar-sized batches, then
enable coordinated reads by setting `consumer_index` and `num_consumers`.

NOTE: To keep consumers in sync, coordinated reads require that the dataset have
infinite cardinality. You can get this by adding `.repeat()` at the end of the
dataset definition.
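
The following plain-Python sketch models the idea of coordinated reads. It is a
toy model rather than the service's implementation: `make_rounds` and
`coordinated_step` are hypothetical names, and serving consecutive steps from
workers in round-robin order is an assumption made for illustration.

```python
import itertools

# Toy model of coordinated reads; not the tf.data service implementation.
# `make_rounds` and `coordinated_step` are hypothetical names, and serving
# consecutive steps from workers in round-robin order is an assumption.

NUM_CONSUMERS = 2
NUM_WORKERS = 2


def make_rounds(lengths_by_bucket, num_consumers):
  # Yields rounds of `num_consumers` consecutive batches of similar length,
  # repeating forever so that the dataset has infinite cardinality.
  for bucket in itertools.cycle(lengths_by_bucket):
    for _ in range(num_consumers):
      yield bucket


def coordinated_step(worker_iterators, step, num_consumers):
  # All consumers read from the same worker at a given step; consumer `i`
  # receives the i-th element of that worker's next round.
  worker = worker_iterators[step % len(worker_iterators)]
  return [next(worker) for _ in range(num_consumers)]


# Each worker iterates over its own copy of the dataset.
workers = [make_rounds(["short", "long"], NUM_CONSUMERS)
           for _ in range(NUM_WORKERS)]
steps = [coordinated_step(workers, s, NUM_CONSUMERS) for s in range(4)]
```

In this sketch, every entry of `steps` holds one batch per consumer, and all
batches within a step come from the same size bucket.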

### Jobs

A tf.data service "job" refers to the process of reading from a dataset managed
by the tf.data service, using one or more data consumers. Jobs are created when
iterating over datasets that read from tf.data service. The data produced by a
job is determined by (1) the dataset associated with the job and (2) the job's
processing mode. For example, if a job is created for the dataset
`Dataset.range(5)`, and the processing mode is `ShardingPolicy.OFF`, each
tf.data worker will produce the elements `{0, 1, 2, 3, 4}` for the job,
resulting in the job producing `5 * num_workers` elements. If the processing
mode is `ShardingPolicy.DYNAMIC`, the job will only produce `5` elements.
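
The element counts in this example can be checked with a small plain-Python
model. `job_output` is a hypothetical helper written for this sketch and does
not call the tf.data service.

```python
# Toy model of the element counts described above; not the real service.
# `job_output` is a hypothetical helper written for this sketch.


def job_output(elements, num_workers, processing_mode):
  if processing_mode == "OFF":
    # Every worker processes the entire dataset independently.
    return [e for _ in range(num_workers) for e in elements]
  if processing_mode == "DYNAMIC":
    # Each element is processed by only one worker.
    return list(elements)
  raise ValueError("Unsupported processing mode: " + processing_mode)


off_output = job_output(range(5), num_workers=3, processing_mode="OFF")
dynamic_output = job_output(range(5), num_workers=3, processing_mode="DYNAMIC")
```

With three workers, `off_output` has `5 * 3 = 15` elements and `dynamic_output`
has `5`.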

One or more consumers can consume data from a job. By default, jobs are
"anonymous", meaning that only the consumer which created the job can read from
it. To share the output of a job across multiple consumers, you can set a common
`job_name`.
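
The difference between anonymous and named jobs can be sketched as follows.
This is a toy model in plain Python: `JobRegistry` and `get_job` are
hypothetical names, and a shared Python iterator stands in for a job whose
output is split among its consumers.

```python
# Toy model of anonymous versus named jobs; not the real service.
# `JobRegistry` and `get_job` are hypothetical names.


class JobRegistry:

  def __init__(self, elements):
    self._elements = list(elements)
    self._named_jobs = {}

  def get_job(self, job_name=None):
    if job_name is None:
      # Anonymous job: a fresh iterator that only this consumer reads.
      return iter(self._elements)
    # Named job: every consumer using `job_name` shares one iterator.
    if job_name not in self._named_jobs:
      self._named_jobs[job_name] = iter(self._elements)
    return self._named_jobs[job_name]


registry = JobRegistry(range(4))

# Two anonymous consumers each receive the full output of their own job.
anonymous = [list(registry.get_job()), list(registry.get_job())]

# Two consumers of "shared_job" split one job's output, first come first served.
first = registry.get_job("shared_job")
second = registry.get_job("shared_job")
shared = [next(first), next(second), next(first), next(second)]
```

Here each anonymous consumer sees all four elements, while the two consumers of
`"shared_job"` see four elements in total between them.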

### Fault Tolerance

By default, the tf.data dispatch server stores its state in-memory, making it a
single point of failure during training. To avoid this, pass
`fault_tolerant_mode=True` when creating your `DispatchServer`. Dispatcher
fault tolerance requires `work_dir` to be configured and accessible from the
dispatcher both before and after restart (e.g. a GCS path). With fault tolerant
mode enabled, the dispatcher will journal its state to the work directory so
that no state is lost when the dispatcher is restarted.

WorkerServers may be freely restarted, added, or removed during training. At
startup, workers will register with the dispatcher and begin processing all
outstanding jobs from the beginning.
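
The journaling idea can be sketched in plain Python. This is a toy model, not
the dispatcher's implementation: `JournalingDispatcher`, `register`, and the
JSON-lines journal format are assumptions made for illustration. It shows why
the work directory must be readable after a restart: the restarted process
rebuilds its state by replaying the journal.

```python
import json
import os
import tempfile

# Toy model of journaling dispatcher state; not the real dispatcher.
# `JournalingDispatcher`, `register`, and the JSON-lines file format are
# assumptions made for this sketch.


class JournalingDispatcher:

  def __init__(self, work_dir):
    self._journal_path = os.path.join(work_dir, "journal")
    self.dataset_ids = []
    # On (re)start, rebuild in-memory state by replaying the journal.
    if os.path.exists(self._journal_path):
      with open(self._journal_path) as journal:
        for line in journal:
          self.dataset_ids.append(json.loads(line)["dataset_id"])

  def register(self, dataset_id):
    # Write the update to the journal before applying it in memory.
    with open(self._journal_path, "a") as journal:
      print(json.dumps({"dataset_id": dataset_id}), file=journal)
    self.dataset_ids.append(dataset_id)


with tempfile.TemporaryDirectory() as work_dir:
  dispatcher = JournalingDispatcher(work_dir)
  dispatcher.register("dataset_a")
  dispatcher.register("dataset_b")
  # Simulate a restart: a new dispatcher reads the same work directory.
  restarted = JournalingDispatcher(work_dir)
  restored_ids = restarted.dataset_ids
```

After the simulated restart, `restored_ids` contains both registered dataset
IDs in their original order.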

### Usage with tf.distribute

tf.distribute is the TensorFlow API for distributed training. There are
several ways to use tf.data with tf.distribute:
`strategy.experimental_distribute_dataset`,
`strategy.distribute_datasets_from_function`, and (for
`ParameterServerStrategy`) `coordinator.create_per_worker_dataset`. The
following sections give code examples for each.

In general we recommend using
`tf.data.experimental.service.{register_dataset,from_dataset_id}` over
`tf.data.experimental.service.distribute` for two reasons:

- The dataset only needs to be constructed and optimized once, instead of once
  per worker. This can significantly reduce startup time, because the current
  `experimental_distribute_dataset` and `distribute_datasets_from_function`
  implementations create and optimize worker datasets sequentially.
- If a dataset depends on lookup tables or variables that are only present on
  one host, the dataset needs to be registered from that host. Typically this
  only happens when resources are placed on the chief or worker 0. Registering
  the dataset from the chief will avoid issues with depending on remote
  resources.

#### strategy.experimental_distribute_dataset

Nothing special is required when using
`strategy.experimental_distribute_dataset`. Apply `register_dataset` and
`from_dataset_id` as above, making sure to specify a `job_name` so that all
workers consume from the same tf.data service job.

```
dataset = ...  # Define your dataset here.
dataset_id = tf.data.experimental.service.register_dataset(
    service=FLAGS.tf_data_service_address,
    dataset=dataset)
dataset = tf.data.experimental.service.from_dataset_id(
    processing_mode=tf.data.experimental.service.ShardingPolicy.OFF,
    service=FLAGS.tf_data_service_address,
    dataset_id=dataset_id,
    job_name="shared_job")

dataset = strategy.experimental_distribute_dataset(dataset)
```

#### strategy.distribute_datasets_from_function

First, make sure the dataset produced by the `dataset_fn` does not depend on the
`input_context` for the training worker on which it is run. Instead of each
worker building its own (sharded) dataset, one worker should register an
unsharded dataset, and the remaining workers should consume data from that
dataset.

```
dataset = dataset_fn()
dataset_id = tf.data.experimental.service.register_dataset(
    service=FLAGS.tf_data_service_address,
    dataset=dataset)

def new_dataset_fn(input_context):
  del input_context
  return tf.data.experimental.service.from_dataset_id(
      processing_mode=tf.data.experimental.service.ShardingPolicy.OFF,
      service=FLAGS.tf_data_service_address,
      dataset_id=dataset_id,
      job_name="shared_job")

dataset = strategy.distribute_datasets_from_function(new_dataset_fn)
```

#### coordinator.create_per_worker_dataset

`create_per_worker_dataset` works the same as
`distribute_datasets_from_function`.

```
dataset = dataset_fn()
dataset_id = tf.data.experimental.service.register_dataset(
    service=FLAGS.tf_data_service_address,
    dataset=dataset)

def new_dataset_fn(input_context):
  del input_context
  return tf.data.experimental.service.from_dataset_id(
      processing_mode=tf.data.experimental.service.ShardingPolicy.OFF,
      service=FLAGS.tf_data_service_address,
      dataset_id=dataset_id,
      job_name="shared_job")

dataset = coordinator.create_per_worker_dataset(new_dataset_fn)
```

### Sharing tf.data service with concurrent trainers

If you run multiple trainers concurrently using the same training data, it could
save resources to cache the data in one tf.data service cluster and share the
cluster with the trainers. For example, if you use Vizier to tune
hyperparameters, the Vizier jobs can run concurrently and share one tf.data
service cluster.

To enable this feature, each trainer needs to generate a unique trainer ID and
pass it to `tf.data.experimental.service.distribute`. Once a job has consumed
data, the data remains in the cache and is re-used by jobs with different
`trainer_id`s. Requests with the same `trainer_id` do not re-use data.
For example:

```
dataset = expensive_computation()
dataset = dataset.apply(tf.data.experimental.service.distribute(
    processing_mode=tf.data.experimental.service.ShardingPolicy.OFF,
    service=FLAGS.tf_data_service_address,
    job_name="job",
    # `trainer_id()` stands in for user code returning this trainer's unique ID.
    cross_trainer_cache=data_service_ops.CrossTrainerCache(
        trainer_id=trainer_id())))
```

tf.data service uses a sliding-window cache to store shared data. When one
trainer consumes data, the data remains in the cache. When other trainers need
data, they can get data from the cache instead of repeating the expensive
computation. The cache has a bounded size, so some trainers may not read the
full dataset. To ensure all the trainers get sufficient training data, we
require the input dataset to be infinite. This can be achieved, for example, by
repeating the dataset and performing random augmentation on the training
instances.
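
A bounded sliding-window cache can be sketched in plain Python as follows. This
is a toy model, not the service's implementation: `SlidingWindowCache`,
`get_next`, and `max_size` are hypothetical names, and the rule that a trainer
which falls behind skips to the oldest cached element is an assumption made for
illustration.

```python
import collections

# Toy model of a sliding-window cache; not the tf.data service implementation.
# `SlidingWindowCache`, `get_next`, and `max_size` are hypothetical names.


class SlidingWindowCache:

  def __init__(self, source, max_size):
    self._source = iter(source)  # The expensive, infinite input.
    self._window = collections.deque(maxlen=max_size)
    self._window_start = 0  # Global index of the oldest cached element.
    self._positions = {}  # Next global index to read, per trainer ID.

  def get_next(self, trainer_id):
    # A trainer that fell behind the window skips the evicted elements.
    pos = max(self._positions.get(trainer_id, 0), self._window_start)
    if pos == self._window_start + len(self._window):
      # Nothing cached at this position: compute one new element.
      if len(self._window) == self._window.maxlen:
        self._window_start += 1  # The oldest element is evicted.
      self._window.append(next(self._source))
    self._positions[trainer_id] = pos + 1
    return self._window[pos - self._window_start]


computed = []


def expensive_source():
  # Stands in for an infinite dataset with expensive preprocessing.
  i = 0
  while True:
    computed.append(i)
    yield i
    i += 1


cache = SlidingWindowCache(expensive_source(), max_size=2)
fast = [cache.get_next("trainer_a") for _ in range(4)]
slow = [cache.get_next("trainer_b") for _ in range(2)]
```

In this sketch, `trainer_a` triggers the computation of four elements. When
`trainer_b` starts reading, only the last two are still cached, so it reads
those without any recomputation and never sees the first two. This is why a
finite dataset would leave some trainers short of data.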

## Limitations

- Python-based data processing: Datasets which use Python-based data processing
  (e.g. `tf.py_function`, `tf.numpy_function`, or
  `tf.data.Dataset.from_generator`) are currently not supported.
- Non-Serializable Resources: Datasets may only depend on TF resources that
  support serialization. Serialization is currently supported for lookup
  tables and variables. If your dataset depends on a TF resource that cannot be
  serialized, please file a GitHub issue.
- Remote Resources: If a dataset depends on a resource, the dataset must be
  registered from the same process that created the resource (e.g. the "chief"
  job of ParameterServerStrategy).
"""

from tensorflow.python.data.experimental.ops.data_service_ops import distribute
from tensorflow.python.data.experimental.ops.data_service_ops import from_dataset_id
from tensorflow.python.data.experimental.ops.data_service_ops import register_dataset
from tensorflow.python.data.experimental.ops.data_service_ops import ShardingPolicy
from tensorflow.python.data.experimental.service.server_lib import DispatcherConfig
from tensorflow.python.data.experimental.service.server_lib import DispatchServer
from tensorflow.python.data.experimental.service.server_lib import WorkerConfig
from tensorflow.python.data.experimental.service.server_lib import WorkerServer