This doc explains how a training example flows through T2T, from data generation
to training, evaluation, and decoding. It points out the various hooks available
in the `Problem` and `T2TModel` classes and gives an overview of the T2T code
(key functions, files, hyperparameters, etc.).
Some key files and their functions:

* `bin/t2t-datagen` - the entrypoint for data generation
* `data_generators/problem.py` - the `Problem` class: `generate_data`, `feature_encoder`, `preprocess_examples`
* `utils/data_reader.py` - the data input pipeline (batching by length)
* `utils/t2t_model.py` - the `T2TModel` class and `model_fn_body`
* `utils/modality.py` - the `Modality` class: `bottom` and `top`

## The Life of an Example
A training example passes through the following stages in T2T (a short code sketch of the flow follows the list):

* raw input (text from the command line or from a file)
* encoded input after the [Problem.feature_encoder](https://github.com/tensorflow/tensor2tensor/blob/master/tensor2tensor/data_generators/problem.py#L173) function `encode`, usually a sparse tensor, e.g., a vector of `tf.int32`s
* batched input after the [data input pipeline](https://github.com/tensorflow/tensor2tensor/blob/master/tensor2tensor/utils/data_reader.py#L242), where the inputs, after [Problem.preprocess_examples](https://github.com/tensorflow/tensor2tensor/blob/master/tensor2tensor/data_generators/problem.py#L188), are grouped by length and made into batches
* dense input after being processed by the [Modality](https://github.com/tensorflow/tensor2tensor/blob/master/tensor2tensor/utils/modality.py#L30) function `bottom`
* dense output after [T2TModel.model_fn_body](https://github.com/tensorflow/tensor2tensor/blob/master/tensor2tensor/utils/t2t_model.py#L542)
* back to sparse output through the [Modality](https://github.com/tensorflow/tensor2tensor/blob/master/tensor2tensor/utils/modality.py#L30) function `top`
* if decoding, back through the [Problem.feature_encoder](https://github.com/tensorflow/tensor2tensor/blob/master/tensor2tensor/data_generators/problem.py#L173) function `decode` for display on the screen
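To make the first and last stages concrete, here is a minimal, hedged sketch using the byte-level encoder (so no vocabulary file is needed); the middle stages are only indicated in comments because they require a full model and hyperparameter setup.

```python
# Minimal sketch of the encode/decode ends of the pipeline above.
from tensor2tensor.data_generators import text_encoder

encoder = text_encoder.ByteTextEncoder()

# Raw input -> encoded (sparse) input: a list of integer ids.
ids = encoder.encode("Hello T2T")

# In a real run: the input pipeline batches such examples by length,
# Modality.bottom embeds them into dense tensors, the model body computes
# dense outputs, and Modality.top maps those back to id space.

# Decoded output -> text for display.
print(encoder.decode(ids))
```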
## Data Generation

The `t2t-datagen` binary is the entrypoint for data generation. It simply looks
up the `Problem` specified by `--problem` and calls
`Problem.generate_data(data_dir, tmp_dir)`.
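For illustration, here is a hedged sketch of that lookup-and-generate step; the problem name and directories are placeholders, not necessarily registered or existing.

```python
# Hedged sketch of what t2t-datagen does for a single problem; the problem
# name and directories are illustrative placeholders.
from tensor2tensor.utils import registry

problem = registry.problem("translate_ende_wmt32k")  # what --problem selects
problem.generate_data(data_dir="/tmp/t2t_data", tmp_dir="/tmp/t2t_tmp")
```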
All `Problem`s are expected to generate 2 sharded `TFRecords` files - 1 for
training and 1 for evaluation - with `tensorflow.Example` protocol buffers. The
expected names of the files are given by `Problem.{training, dev}_filepaths`.
Typically, the features in the `Example` will be `"inputs"` and `"targets"`;
however, some tasks have a different on-disk representation that is converted to
`"inputs"` and `"targets"` online in the input pipeline (e.g. image features are
typically stored with features `"image/encoded"` and `"image/format"` and the
decoding happens in the input pipeline).
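As a rough illustration of the on-disk format, here is a hedged sketch that writes a single `tensorflow.Example` with `"inputs"` and `"targets"` id lists; the ids and the file name are made up, and real shard names come from `Problem.{training, dev}_filepaths`.

```python
# Hedged sketch of the kind of record a text-to-text Problem writes:
# a tf.train.Example whose "inputs" and "targets" are lists of vocabulary ids.
import tensorflow as tf

def to_example(inputs, targets):
  feature = {
      "inputs": tf.train.Feature(int64_list=tf.train.Int64List(value=inputs)),
      "targets": tf.train.Feature(int64_list=tf.train.Int64List(value=targets)),
  }
  return tf.train.Example(features=tf.train.Features(feature=feature))

# Illustrative file name; in practice there are many shards per split.
writer = tf.python_io.TFRecordWriter("/tmp/t2t_data/demo-train-00000-of-00001")
writer.write(to_example([3, 7, 9, 1], [4, 8, 1]).SerializeToString())
writer.close()
```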
For tasks that require a vocabulary, this is also the point at which the
vocabulary is generated and all examples are encoded.

TODO: describe [Problem.feature_encoder](https://github.com/tensorflow/tensor2tensor/blob/master/tensor2tensor/data_generators/problem.py#L173), which is a dict of encoders that have `encode` and `decode` functions.
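A hedged sketch of vocabulary generation and encoding, assuming the `SubwordTextEncoder.build_to_target_size` helper; the token counts and target size are toy values (real problems compute counts from their training corpus).

```python
# Hedged sketch: build a small subword vocabulary and encode a sentence.
from tensor2tensor.data_generators import text_encoder

token_counts = {"the": 100, "quick": 20, "brown": 20, "fox": 10}
encoder = text_encoder.SubwordTextEncoder.build_to_target_size(
    100, token_counts, 1, 1000)  # target size, counts, min/max count bounds

ids = encoder.encode("the quick brown fox")  # what ends up in the TFRecords
print(encoder.decode(ids))
```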
## Modalities

TODO: describe [Modality](https://github.com/tensorflow/tensor2tensor/blob/master/tensor2tensor/utils/modality.py#L30), which has `bottom` and `top`, but also sharded versions and one for targets.
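As a rough illustration of that interface, here is a hedged sketch of a symbol-style modality with `bottom` and `top`; it mirrors the idea only and is not the real T2T base class.

```python
# Hedged sketch of the Modality idea: `bottom` turns sparse ids into dense
# tensors for the model body; `top` maps the body's dense output to logits.
import tensorflow as tf

class SketchSymbolModality(object):

  def __init__(self, vocab_size, hidden_size):
    self._vocab_size = vocab_size
    self._hidden_size = hidden_size

  def bottom(self, ids):
    # Sparse int ids -> dense embeddings.
    embedding = tf.get_variable(
        "embedding", [self._vocab_size, self._hidden_size])
    return tf.gather(embedding, ids)

  def top(self, body_output, unused_targets):
    # Dense model-body output -> logits over the vocabulary.
    return tf.layers.dense(body_output, self._vocab_size, name="logits")
```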