Tutorial-TensorFlowServing

Playing with TensorFlow Serving

About TensorFlow Serving (TFS)

TFS is a flexible, high-performance serving system for machine learning models, designed for production environments. TFS makes it easy to deploy new algorithms and experiments, while keeping the same server architecture and APIs. TFS provides out-of-the-box integration with TensorFlow models, but can be easily extended to serve other types of models and data.

Key Concepts

Servables

Servables are the central abstraction in TFS. They are the underlying objects that clients use to perform computation, such as a lookup or an inference. The size and granularity of a servable are flexible, and a servable can be of any type and interface. Servables do not manage their own lifecycle; Managers and Loaders do.

Typical servables include:

a TF SavedModelBundle (tensorflow::Session)
a lookup table for embedding or vocab lookups

Servable Versions

TFS can handle one or more versions of a servable over the lifetime of a single server instance, so new algorithm configurations, weights, and other data can be loaded over time. More than one version can be loaded concurrently, which supports gradual rollout and experimentation.

Servable Streams

A servable stream is the sequence of versions of a servable, sorted by increasing version number.
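As a rough illustration of the definition above, a servable stream can be modeled as a filtered, sorted list. The `ServableId` class and `servable_stream` function are hypothetical names for this sketch, not part of the TFS API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServableId:
    """Hypothetical identifier: a servable name plus a version number."""
    name: str
    version: int


def servable_stream(ids, name):
    """Return the versions of one servable, sorted by increasing version number."""
    return sorted((s for s in ids if s.name == name), key=lambda s: s.version)


# Versions may be discovered in any order and mixed with other servables.
discovered = [
    ServableId("ranker", 3),
    ServableId("ranker", 1),
    ServableId("vocab", 7),
    ServableId("ranker", 2),
]

stream = servable_stream(discovered, "ranker")
print([s.version for s in stream])  # [1, 2, 3]
```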

Models

TFS represents a model as one or more servables. An ML model may include one or more algorithms (including learned weights) and lookup or embedding tables.

A composite model can be represented as either:

multiple independent servables, or
a single composite servable

A servable may also correspond to a fraction of a model (e.g. a large lookup table sharded across multiple TFS instances).
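To make the sharded lookup table case concrete, here is a minimal sketch in which each shard stands in for the servable held by one TFS instance. The text above does not say how keys are assigned to shards; hash-based routing and the function names below are assumptions made for illustration.

```python
import zlib


def shard_for(key, num_shards):
    """Map a key to a shard index with a hash that is stable across processes."""
    return zlib.crc32(key.encode("utf-8")) % num_shards


def build_shards(table, num_shards):
    """Split one large table into num_shards smaller tables (one per servable)."""
    shards = [{} for _ in range(num_shards)]
    for key, value in table.items():
        shards[shard_for(key, num_shards)][key] = value
    return shards


def lookup(shards, key):
    """Route the lookup to the shard that owns the key."""
    return shards[shard_for(key, len(shards))].get(key)


embeddings = {"cat": [0.1, 0.2], "dog": [0.3, 0.4], "fish": [0.5, 0.6]}
shards = build_shards(embeddings, num_shards=2)
print(lookup(shards, "dog"))  # [0.3, 0.4]
```

`zlib.crc32` is used instead of the built-in `hash()` because Python randomizes string hashes per process, which would route the same key to different shards on different machines.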

Loaders

Loaders manage a servable's lifecycle. The Loader API standardizes how a servable is loaded and unloaded, independent of the specific learning algorithm, data, or product use case.
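The idea of a uniform load/unload interface can be sketched as an abstract base class. This is a Python illustration only: the real TFS Loader is a C++ class, and the method names and the `LookupTableLoader` example here are hypothetical.

```python
from abc import ABC, abstractmethod


class Loader(ABC):
    """Hypothetical interface: the same calls work for any kind of servable."""

    @abstractmethod
    def load(self):
        """Bring the servable into memory."""

    @abstractmethod
    def unload(self):
        """Release the servable's resources."""

    @abstractmethod
    def servable(self):
        """Return the loaded servable object."""


class LookupTableLoader(Loader):
    """Loads a lookup table; the caller never needs to know it is a dict."""

    def __init__(self, rows):
        self._rows = rows      # source data, e.g. (key, value) pairs
        self._table = None     # populated only while loaded

    def load(self):
        self._table = dict(self._rows)

    def unload(self):
        self._table = None

    def servable(self):
        if self._table is None:
            raise RuntimeError("servable is not loaded")
        return self._table


loader = LookupTableLoader([("hello", 0), ("world", 1)])
loader.load()
print(loader.servable()["world"])  # 1
loader.unload()
```

Because callers only see `load`, `unload`, and `servable`, the same lifecycle code could drive a model loader or a table loader without changes.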

Sources

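Sources are plugin modules that find and provide servables. Each Source provides zero or more servable streams. For each stream, a Source supplies one Loader for each version it makes available to be loaded.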

Aspired Versions

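Aspired versions represent the set of servable versions that should be loaded and ready. Sources communicate this set for a single servable stream at a time. When a Source gives the Manager a new list of aspired versions, it supersedes the previous list for that stream, and the Manager unloads any previously loaded versions that no longer appear in it.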
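One way to picture how an aspired-versions list is acted on is as a set difference against what is currently loaded. The sketch below assumes that a new aspired list fully replaces the previous one for that stream; `plan_transitions` is a hypothetical helper, not a TFS function.

```python
def plan_transitions(loaded, aspired):
    """Compare loaded versions with aspired versions for one servable stream.

    Returns (to_load, to_unload), each sorted by increasing version number.
    """
    loaded, aspired = set(loaded), set(aspired)
    to_load = sorted(aspired - loaded)     # aspired but not yet loaded
    to_unload = sorted(loaded - aspired)   # loaded but no longer aspired
    return to_load, to_unload


# Versions 1 and 2 are loaded; the Source now aspires to versions 2 and 3.
to_load, to_unload = plan_transitions(loaded={1, 2}, aspired={2, 3})
print(to_load, to_unload)  # [3] [1]
```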

Managers

Managers handle the full lifecycle of servables: loading, serving, and unloading. Managers listen to Sources and track all versions. They provide a simple, narrow interface, GetServableHandle(), for clients to access loaded servable instances.
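The narrow client-facing interface can be sketched as a single lookup method. This is a simplified Python stand-in for GetServableHandle(); the `Manager` class, its `add_loaded` helper, and the "latest version when none is requested" default are assumptions made for this sketch.

```python
class Manager:
    """Hypothetical manager that only tracks already-loaded servables."""

    def __init__(self):
        self._loaded = {}  # (name, version) -> servable object

    def add_loaded(self, name, version, servable):
        self._loaded[(name, version)] = servable

    def get_servable_handle(self, name, version=None):
        """Return a specific loaded version, or the latest one if none is given."""
        versions = [v for (n, v) in self._loaded if n == name]
        if not versions:
            raise KeyError(f"no loaded versions of servable {name!r}")
        chosen = max(versions) if version is None else version
        if (name, chosen) not in self._loaded:
            raise KeyError(f"version {chosen} of {name!r} is not loaded")
        return self._loaded[(name, chosen)]


manager = Manager()
manager.add_loaded("ranker", 1, "ranker-v1")
manager.add_loaded("ranker", 2, "ranker-v2")

print(manager.get_servable_handle("ranker"))             # ranker-v2
print(manager.get_servable_handle("ranker", version=1))  # ranker-v1
```

Keeping two versions loaded at once is what lets a client pin an older version while others move to the newest one.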

Core

The TFS Core manages the lifecycle and metrics of servables through the standard TFS APIs. It treats servables and loaders as opaque objects.

Example

For example, say a Source represents a TensorFlow graph with frequently updated model weights. The weights are stored in a file on disk.

The Source detects a new version of the model weights. It creates a Loader that contains a pointer to the model data on disk.
The Source notifies the Dynamic Manager of the Aspired Version.
The Dynamic Manager applies the Version Policy and decides to load the new version.
The Dynamic Manager tells the Loader that there is enough memory. The Loader instantiates the TensorFlow graph with the new weights.
A client requests a handle to the latest version of the model, and the Dynamic Manager returns a handle to the new version of the Servable.
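The five steps above can be traced in a small simulation. Everything here is a hypothetical Python stand-in for the C++ components: the class names, the byte-based memory check, the "load anything that fits" version policy, and the `/models/demo/...` paths are assumptions for illustration.

```python
class WeightsLoader:
    """Holds a pointer (here, a path) to model data on disk."""

    def __init__(self, path, required_bytes):
        self.path = path
        self.required_bytes = required_bytes
        self.servable = None

    def load(self):
        # Stand-in for instantiating the TensorFlow graph with new weights.
        self.servable = f"graph with weights from {self.path}"


class DynamicManager:
    def __init__(self, memory_budget_bytes):
        self.free_bytes = memory_budget_bytes
        self.loaded = {}  # version -> loader

    def set_aspired_version(self, version, loader):
        # Steps 3-4: apply the (assumed) policy, check memory, then load.
        if loader.required_bytes > self.free_bytes:
            return False
        self.free_bytes -= loader.required_bytes
        loader.load()
        self.loaded[version] = loader
        return True

    def get_servable_handle(self):
        # Step 5: hand the client the latest loaded version.
        latest = max(self.loaded)
        return latest, self.loaded[latest].servable


class FileSource:
    def __init__(self, manager):
        self.manager = manager

    def on_new_version(self, version, path, required_bytes):
        # Steps 1-2: wrap the new weights in a Loader and notify the manager.
        loader = WeightsLoader(path, required_bytes)
        return self.manager.set_aspired_version(version, loader)


manager = DynamicManager(memory_budget_bytes=100)
source = FileSource(manager)
source.on_new_version(1, "/models/demo/1", required_bytes=40)
source.on_new_version(2, "/models/demo/2", required_bytes=40)

version, servable = manager.get_servable_handle()
print(version, servable)  # 2 graph with weights from /models/demo/2
```

A version that does not fit in the remaining memory budget is rejected, and clients keep receiving the previous latest version.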
