diff --git a/review/pr-458/QAList.html b/review/pr-458/QAList.html
deleted file mode 100644
index 9c0cb8858a..0000000000
--- a/review/pr-458/QAList.html
+++ /dev/null
@@ -1,355 +0,0 @@
Questions and Answers — Merlin HugeCTR documentation

Questions and Answers

1. Who are the target users of HugeCTR?

We aim to provide a recommender-specific framework for users from various industries who need highly efficient solutions for online and offline CTR training. HugeCTR is also a reference design for framework developers who want to port their CPU solutions to GPUs or optimize their existing GPU solutions.

2. Which models can be supported in HugeCTR?

HugeCTR v2.2 supports DNN, WDL, DCN, DeepFM, DLRM, and their variants, which are widely used in industrial recommender systems. Refer to the samples directory in the HugeCTR repository on GitHub to try them with HugeCTR. HugeCTR’s expressiveness is not limited to these models: you can construct your own models by combining the layers that HugeCTR supports.

3. Does HugeCTR support TensorFlow?

HugeCTR v2.2 has no TensorFlow interface yet, but a model trained with HugeCTR is compatible with TensorFlow. We recommend that you export a trained model to TensorFlow for inference by following the instructions in the dump_to_tf tutorial in the tutorial directory of the HugeCTR repository on GitHub.

4. Does HugeCTR support multi-node CTR training?

Yes. HugeCTR supports single-GPU, multi-GPU and multi-node training. Check out samples/dcn2node for more details.

5. How do I handle a huge embedding table that cannot fit in a single GPU’s memory?

Embedding tables in HugeCTR are stored model-parallel across GPUs and nodes, so if you have a very large embedding table, use as many GPUs as you need to store it. That’s why we named it “HugeCTR”. For example, with a 1 TB embedding table and 16 × V100-32GB GPUs per server node, each node provides 16 × 32 GB = 512 GB of GPU memory, so you need at least two nodes; in practice, leave headroom for optimizer states and workspace.

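The sizing rule above reduces to a ceiling division. The helper below is a hypothetical sketch, not part of HugeCTR; it assumes binary units (1 TB = 1024 GB) and that all GPU memory is available to the table, so treat its result as a lower bound.

```python
GIB = 1024**3
TIB = 1024**4


def min_nodes_for_table(table_bytes: int, gpus_per_node: int, gpu_mem_bytes: int) -> int:
    """Smallest node count whose total GPU memory can hold the embedding table.

    Lower bound only: optimizer states, workspace, and the dense model
    also need GPU memory in a real deployment.
    """
    bytes_per_node = gpus_per_node * gpu_mem_bytes
    # Integer ceiling division avoids floating-point rounding on large sizes.
    return -(-table_bytes // bytes_per_node)


# The example from the answer: a 1 TB table on nodes with 16 x V100-32GB.
print(min_nodes_for_table(1 * TIB, gpus_per_node=16, gpu_mem_bytes=32 * GIB))  # 2
```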

6. Which GPUs are supported in HugeCTR?

HugeCTR supports GPUs with Compute Capability 7.0 or higher, such as V100, T4, and A100.

7. Must we use the DGX family such as DGX A100 to run HugeCTR?

A DGX machine is not mandatory but recommended to achieve the best performance by exploiting NVSwitch’s high inter-GPU bandwidth.

8. Can HugeCTR run without InfiniBand?

For multi-node training, InfiniBand is recommended but not required. You can use any solution with UCX support. However, InfiniBand with GPU RDMA support maximizes the performance of inter-node transactions.

9. Are there any CPU configuration requirements for running HugeCTR?

HugeCTR’s approach is to offload the computational workloads to GPUs and overlap memory operations with them. As a result, HugeCTR performance is determined mainly by the GPUs and I/O devices that you use.

10. What input file formats does HugeCTR accept?

HugeCTR supports specific input file formats. Refer to the Dataset formats section of the Python API documentation.

11. Does HugeCTR have a Python interface?

Yes. We have introduced the first version of our Python interface. Check out our example notebooks and the Python API documentation.

12. Does HugeCTR do synchronous or asynchronous training with multiple GPUs (and nodes)?

HugeCTR only supports synchronous training.

13. Does HugeCTR support stream training?

Yes. The hash-table-based embedding in HugeCTR supports dynamic insertion, which is designed for stream training: new features can be added to the embedding at runtime. HugeCTR also supports data checking, so erroneous data is skipped during training.

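To make dynamic insertion concrete, here is a toy, CPU-only sketch of a hash-table embedding that creates a vector the first time it sees a key. It is not HugeCTR’s GPU hash table; the class name, the uniform initializer, and its range are assumptions chosen for illustration.

```python
import numpy as np


class DynamicEmbedding:
    """Toy hash-table embedding: unseen keys are inserted on first lookup."""

    def __init__(self, vec_size, seed=0):
        self.vec_size = vec_size
        self.table = {}  # key -> float32 vector
        self.rng = np.random.default_rng(seed)

    def lookup(self, keys):
        out = np.empty((len(keys), self.vec_size), dtype=np.float32)
        for row, key in enumerate(keys):
            vec = self.table.get(key)
            if vec is None:
                # A new feature arrived at runtime: insert it with a fresh vector.
                vec = self.rng.uniform(-0.05, 0.05, self.vec_size).astype(np.float32)
                self.table[key] = vec
            out[row] = vec
        return out


emb = DynamicEmbedding(vec_size=4)
emb.lookup([7, 42])
emb.lookup([42, 1001])  # 1001 is inserted; 42 reuses its existing vector
print(len(emb.table))   # 3
```

A Python dict keeps the sketch short; a GPU implementation would instead use a concurrent hash table sized in advance.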

14. What is a “slot” in HugeCTR?

In HugeCTR, a slot is a feature field or table. The features in a slot can be one-hot or multi-hot, and different slots can contain different numbers of features. You can specify the number of slots (slot_num) in the data layer of your configuration file.

15. What are the differences between LocalizedSlotEmbedding and DistributedSlotEmbedding?

There are two subclasses of the Embedding layer, LocalizedSlotEmbedding and DistributedSlotEmbedding. They differ in how they distribute embedding tables across multiple GPUs for model parallelism. With LocalizedSlotEmbedding, the features in the same slot are stored on one GPU (hence the name “localized slot”), and different slots may be stored on different GPUs according to the slot index. With DistributedSlotEmbedding, all features are distributed across GPUs according to the feature index, regardless of the slot index, so the features in the same slot may be stored on different GPUs (hence the name “distributed slot”).

Thus, LocalizedSlotEmbedding is optimized for the case where each embedding is smaller than a GPU’s memory. Because LocalizedSlotEmbedding uses local reduction per slot and no global reduction between GPUs, its overall data transfer in the embedding layer is much lower than that of DistributedSlotEmbedding. DistributedSlotEmbedding is designed for the case where some embeddings are larger than a GPU’s memory. Because it requires global reduction, DistributedSlotEmbedding has many more memory transactions between GPUs.

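The two placement policies can be contrasted with a small sketch. The modulo rules below are simplifying assumptions chosen for illustration; the exact mapping HugeCTR uses internally is not described in this answer.

```python
def localized_slot_gpu(slot_id, num_gpus):
    # Every feature in a slot lands on the same GPU, chosen by the slot index.
    return slot_id % num_gpus


def distributed_slot_gpu(key, num_gpus):
    # Placement depends only on the feature key, so one slot can span GPUs.
    return key % num_gpus


# One sample: slot 0 holds keys 10 and 13, slot 1 holds key 6.
sample = {0: [10, 13], 1: [6]}
num_gpus = 4

localized = {k: localized_slot_gpu(s, num_gpus) for s, keys in sample.items() for k in keys}
distributed = {k: distributed_slot_gpu(k, num_gpus) for keys in sample.values() for k in keys}
print(localized)    # {10: 0, 13: 0, 6: 1}
print(distributed)  # {10: 2, 13: 1, 6: 2}
```

With the localized rule, keys 10 and 13 share GPU 0, so their slot can be reduced locally; with the distributed rule they land on GPUs 2 and 1, which is why a cross-GPU reduction is needed.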

16. For multi-node training, is the DataReader required to read the same batch of data on each node for each step?

Yes, each node in training will read the same data in each iteration.

17. Because the embedding layers are model-parallel, how does HugeCTR gather all the embedding lookup results from multiple nodes and GPUs?

After embedding lookup, the embedding features in one slot need to be combined (or reduced) into one embedding vector. There are two steps:

  1. Local reduction within a single GPU, in the forward kernel function.

  2. Global reduction across multiple nodes and GPUs through collective communication libraries such as NCCL.

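The two reduction steps can be simulated on the CPU with NumPy. This is a sketch under stated assumptions: a "sum" combiner, one array per GPU holding the vectors that GPU looked up for the slot, and np.sum standing in for an NCCL all-reduce; the function name is hypothetical.

```python
import numpy as np


def reduce_slot(rows_per_gpu):
    """Combine one slot's looked-up embedding vectors that are spread across GPUs.

    rows_per_gpu: one (n_i, vec_size) array per GPU; n_i may be 0 when a GPU
    holds none of the slot's features. A "sum" combiner is assumed.
    """
    # Step 1: local reduction inside each GPU (summing an empty array gives zeros).
    partial_sums = [rows.sum(axis=0) for rows in rows_per_gpu]
    # Step 2: global reduction; np.sum stands in for an NCCL all-reduce.
    return np.sum(partial_sums, axis=0)


gpu0 = np.array([[1.0, 2.0], [3.0, 4.0]])  # two features of the slot on GPU 0
gpu1 = np.array([[10.0, 20.0]])            # one feature of the slot on GPU 1
print(reduce_slot([gpu0, gpu1]))           # [14. 26.]
```

Reducing locally first means each GPU sends one partial vector per slot instead of every looked-up row, which is what keeps the collective communication small.
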
18. How do I set the data clause if two embeddings are needed?

There should be only one data source, in which “sparse” is an array. Suppose there are 26 features (slots): the first 13 belong to the first embedding and the last 13 belong to the second embedding. In that case, you can put two elements in the “sparse” array, as follows:

"sparse": [
  {
    "top": "data1",
    "type": "DistributedSlot",
    "max_feature_num_per_sample": 30,
    "slot_num": 13
  },
  {
    "top": "data2",
    "type": "DistributedSlot",
    "max_feature_num_per_sample": 30,
    "slot_num": 13
  }
]

19. How do I save and load models in HugeCTR?

In HugeCTR, the model is saved in raw binary format. To save a model, set “snapshot” in the .json file to the interval at which checkpoints are saved; checkpoint files are named with the prefix given by “snapshot_prefix”. To load a model, set “dense_model_file” and “sparse_model_file” in the solver clause of the .json file to the names of the snapshot files.

20. Can a model trained with HugeCTR be imported into other frameworks, such as TensorFlow, for inference deployment?

Yes. A model trained in HugeCTR is saved in raw format, and you can import it into other frameworks by writing some scripts. We provide a tutorial that demonstrates how to import a HugeCTR-trained model into TensorFlow. Refer to the dump_to_tf tutorial in the tutorial directory of the HugeCTR repository on GitHub.

21. Does HugeCTR support overlap between different slots?

No. Features in different slots must be unique (no overlap). If your data has overlapping features, preprocess it, for example by adding a per-slot offset to the keys or by applying a hash function.

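A per-slot offset is an exclusive prefix sum of the slot sizes; spark2json.py in this change computes the same offsets with np.roll over np.cumsum. The helper name and the (batch, num_slots) key layout below are illustrative assumptions.

```python
import numpy as np


def add_slot_offsets(keys, slot_sizes):
    """Shift per-slot keys into disjoint ranges.

    keys: (batch, num_slots) array where column k holds keys in [0, slot_sizes[k]).
    """
    slot_sizes = np.asarray(slot_sizes, dtype=np.int64)
    # Exclusive prefix sum: slot k starts right after all keys of slots 0..k-1.
    offsets = np.concatenate(([0], np.cumsum(slot_sizes)[:-1]))
    return np.asarray(keys, dtype=np.int64) + offsets


keys = [[0, 0, 1],
        [2, 1, 0]]
print(add_slot_offsets(keys, slot_sizes=[3, 2, 4]))
# [[0 3 6]
#  [2 4 5]]
```

Offsets keep keys dense and collision-free but require knowing each slot’s size; hashing avoids that requirement at the cost of possible collisions.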

22. What if there’s no value in a slot?

nnz=0 is supported in HugeCTR input, which means that no features are looked up for that slot.

23. How can I benchmark my network?

First, construct your own configuration file; you can refer to our User Guide and samples. Second, use our data_generator to generate a random dataset; see the Getting Started section of the HugeCTR repository README for an example. Third, run the following command: ./huge_ctr --train ./your_config.json

24. How do I set workspace_size_per_gpu_in_mb and slot_size_array?

Because embeddings are model-parallel in HugeCTR, workspace_size_per_gpu_in_mb is a reference value that HugeCTR uses to allocate GPU memory; it need not correspond exactly to the number of features in your dataset. The value to set depends on the vocabulary size per GPU, the embedding vector size, and the optimizer type. Refer to the embedding workspace calculator in the tools directory of the HugeCTR repository on GitHub, and use it to calculate the vocabulary size per GPU and workspace_size_per_gpu_in_mb for different embedding types, embedding vector sizes, and optimizer types.

In practice, we usually set the values larger than the real size because the keys are not uniformly distributed.

The slot_size_array argument has two uses. You can use it in place of workspace_size_per_gpu_in_mb to avoid wasting memory because of imbalanced vocabulary sizes. You can also use it as a reference for adding an offset to the keys in each slot.

When you specify slot_size_array to avoid wasting memory, set each value in the array to the maximum key value in the corresponding slot plus one, as in the following equation:

\(slot\_size\_array[k] = \max\limits_i slot^k_i + 1\)

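The equation can be evaluated directly on your data. The helper below is a hypothetical sketch, not a HugeCTR API; it assumes one-hot slots stored as a (num_samples, num_slots) array whose keys are numbered from 0 within each slot, that is, before any per-slot offset is added.

```python
import numpy as np


def compute_slot_size_array(keys):
    """slot_size_array[k] = max_i keys[i, k] + 1 for a (num_samples, num_slots) array."""
    keys = np.asarray(keys, dtype=np.int64)
    # Column-wise maximum gives the largest key seen in each slot.
    return (keys.max(axis=0) + 1).tolist()


keys = [[0, 5, 2],
        [3, 1, 2],
        [1, 4, 7]]
print(compute_slot_size_array(keys))  # [4, 6, 8]
```

For multi-hot slots, take the maximum over every key that appears in the slot rather than over a single column.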

Refer to the following list for information about the relation between embedding type, the workspace_size_per_gpu_in_mb value, and the slot_size_array value:

  • For DistributedSlotEmbedding, specify a value for workspace_size_per_gpu_in_mb. The slot_size_array argument is not needed. Each GPU allocates the same amount of memory for the embedding table.

  • For LocalizedSlotSparseEmbeddingHash, specify only one of workspace_size_per_gpu_in_mb or slot_size_array. If you can provide the exact size of each slot, we recommend that you specify slot_size_array, which helps avoid wasting memory because of imbalanced vocabulary sizes. As an alternative, you can specify workspace_size_per_gpu_in_mb so that each GPU allocates the same amount of memory for the embedding table. If you specify both slot_size_array and workspace_size_per_gpu_in_mb, HugeCTR uses slot_size_array for LocalizedSlotSparseEmbeddingHash.

  • For LocalizedSlotSparseEmbeddingOneHot, you must specify slot_size_array. The slot_size_array argument is used for allocating memory and adding an offset for each slot.

  • For HybridSparseEmbedding, specify both workspace_size_per_gpu_in_mb and slot_size_array. The workspace_size_per_gpu_in_mb argument is used for allocating memory, while slot_size_array is used for adding an offset for the embedding table.

26. Is DGX the only GPU server that HugeCTR can run on?

DGX is not required, but it is recommended because the performance of CTR training relies heavily on the performance of inter-GPU transactions. DGX has NVLink and NVSwitch inside, so you can expect 150 GB/s per direction per GPU, which is 9.3x the bandwidth of PCIe 3.0.

27. Can HugeCTR run without InfiniBand?

For multi-node training, InfiniBand is recommended but not required. You can use any solution with UCX support. InfiniBand with GPU RDMA support will maximize performance of inter-node transactions.

28. Does HugeCTR support loading pretrained embeddings in other formats?

You can convert the pretrained embeddings to the HugeCTR sparse model format and then load them to facilitate the training process. Refer to save_params_to_files to get familiar with the HugeCTR sparse model format. We demonstrate the usage in section 3.4, Load Pre-trained Embeddings, of hugectr_criteo.ipynb.

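The conversion can be sketched with NumPy. This writes the key/emb_vector layout that the convert_to_sparse_model helper in create_tf_models.py (in this change) produces for HPS: native-endian int64 keys in a file named key and float32 vectors in a file named emb_vector. Whether your HugeCTR version expects additional files (for example, slot IDs for localized embeddings) is an assumption to verify against save_params_to_files; the helper names are hypothetical.

```python
import os
import tempfile

import numpy as np


def write_sparse_table(table_dir, keys, vectors):
    """Write keys as int64 to `key` and vectors as float32 to `emb_vector`."""
    keys = np.ascontiguousarray(keys, dtype=np.int64)
    vectors = np.ascontiguousarray(vectors, dtype=np.float32)
    if vectors.ndim != 2 or vectors.shape[0] != keys.shape[0]:
        raise ValueError("need exactly one embedding vector per key")
    os.makedirs(table_dir, exist_ok=True)
    keys.tofile(os.path.join(table_dir, "key"))
    vectors.tofile(os.path.join(table_dir, "emb_vector"))


def read_sparse_table(table_dir, vec_size):
    keys = np.fromfile(os.path.join(table_dir, "key"), dtype=np.int64)
    vectors = np.fromfile(os.path.join(table_dir, "emb_vector"), dtype=np.float32)
    return keys, vectors.reshape(-1, vec_size)


# Example: a pretrained 3 x 4 table written out and read back.
pretrained = np.arange(12, dtype=np.float32).reshape(3, 4)
with tempfile.TemporaryDirectory() as tmp:
    table_dir = os.path.join(tmp, "pretrained_table")
    write_sparse_table(table_dir, np.arange(3), pretrained)
    keys, vecs = read_sparse_table(table_dir, vec_size=4)
    print(keys.tolist(), vecs.shape)  # [0, 1, 2] (3, 4)
```

Writing whole arrays with tofile produces the same bytes as the per-key struct.pack loop in create_tf_models.py, but much faster for large vocabularies.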

29. How do I construct a model graph with a branch topology in HugeCTR?

The HugeCTR model graph inherently supports branch topologies, and the extra layers that branching requires are abstracted away by the HugeCTR Python interface. Refer to the Slice Layer documentation for information about model graphs with branches and for sample code.

30. What is a good practice for configuring the embedding vector size?

The embedding vector size is tied to the size of the Cooperative Thread Array (CTA) used to launch HugeCTR kernels, so first and foremost it must not exceed the maximum number of threads per block. For better occupancy, configure it as a multiple of the warp size. Beyond that, you can set the embedding vector size freely according to your model architecture as long as it complies with this limit.

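A quick pre-flight check for the two rules above might look like the following sketch. The limits are the CUDA values for current NVIDIA GPUs (1024 threads per block, warp size 32); the function itself is hypothetical, not part of HugeCTR.

```python
MAX_THREADS_PER_BLOCK = 1024  # CUDA limit per thread block
WARP_SIZE = 32                # threads per warp


def check_embedding_vec_size(size):
    """Return a list of problems with an embedding vector size (empty means OK)."""
    problems = []
    if size <= 0 or size > MAX_THREADS_PER_BLOCK:
        # Hard limit: one CTA cannot hold more threads than this.
        problems.append(f"{size} must be in 1..{MAX_THREADS_PER_BLOCK}")
    elif size % WARP_SIZE != 0:
        # Soft rule: partially filled warps waste lanes and lower occupancy.
        problems.append(f"{size} is not a multiple of {WARP_SIZE}; occupancy may suffer")
    return problems


print(check_embedding_vec_size(128))  # []
print(check_embedding_vec_size(100))  # ['100 is not a multiple of 32; occupancy may suffer']
```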

31. How do I resolve the bus error when running HugeCTR samples and notebooks?

HugeCTR uses NCCL to share data between ranks, and NCCL may require shared memory for IPC and pinned (page-locked) system memory. The bus error is caused by limits on these resources and can be resolved by adding the following options to the docker run command:

--ipc=host --ulimit memlock=-1 --ulimit stack=67108864

32. What does the log “memory pool is empty” imply for HugeCTR inference?

HugeCTR inference leverages the Hierarchical Parameter Server (HPS), which combines a high-performance GPU embedding cache with a hierarchical storage architecture that encompasses different types of database backends. Each iteration of GPU embedding cache lookup and update requires a workspace that is pre-allocated and managed by a memory pool. The memory pool can be exhausted when asynchronous updates of the embedding cache are triggered constantly. In this case, the message “memory pool is empty” appears in the log.

To avoid this, you can do either of the following:

  • Enforce synchronous embedding cache updates by setting hit_rate_threshold to 1.0.

  • Extend the memory pool by setting number_of_worker_buffers_in_pool to a sufficiently large value.

For more information, please refer to Embedding Cache Asynchronous Insertion and HPS Configuration.
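In a JSON HPS configuration, both remedies are per-model settings. The sketch below builds a minimal entry using the key names that create_tf_models.py in this change writes: hit_rate_threshold and num_of_worker_buffer_in_pool (the answer above names the latter number_of_worker_buffers_in_pool, which appears to be the Python-side name). The helper is hypothetical, and a working configuration also needs the other per-model keys shown in that script, such as embedding_table_names and embedding_vecsize_per_table.

```python
import json


def hps_config(model_name, sparse_files, hit_rate_threshold=1.0, num_worker_buffers=3):
    """Build a minimal HPS config showing the two settings discussed above."""
    if not 0.0 <= hit_rate_threshold <= 1.0:
        raise ValueError("hit_rate_threshold must be in [0, 1]")
    return {
        "models": [
            {
                "model": model_name,
                "sparse_files": list(sparse_files),
                # 1.0 enforces synchronous embedding cache updates.
                "hit_rate_threshold": hit_rate_threshold,
                # A larger pool is harder to exhaust under asynchronous updates.
                "num_of_worker_buffer_in_pool": num_worker_buffers,
            }
        ]
    }


print(json.dumps(hps_config("dlrm", ["dlrm_embedding_table"], num_worker_buffers=8), indent=4))
```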

-
-
- - -
-
- -
-
-
-
\ No newline at end of file
diff --git a/review/pr-458/_downloads/11997758ab318be2f9b25bc08768d77f/spark2json.py b/review/pr-458/_downloads/11997758ab318be2f9b25bc08768d77f/spark2json.py
deleted file mode 100644
index f0000aee5e..0000000000
--- a/review/pr-458/_downloads/11997758ab318be2f9b25bc08768d77f/spark2json.py
+++ /dev/null
@@ -1,157 +0,0 @@
-#!/usr/bin/env python3
-
-# Copyright (c) 2021, NVIDIA CORPORATION. All rights reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-#     http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-
-import argparse
-import json
-import logging
-import os
-import pathlib
-
-import tensorflow as tf
-import numpy as np
-
-# method from PEP-366 to support relative import in executed modules
-if __package__ is None:
-    __package__ = pathlib.Path(__file__).parent.name
-
-import dataloading.feature_spec
-from dataloading.dataloader import create_input_pipelines
-
-LOGGER = logging.getLogger("run_performance_on_triton")
-
-def create_input_data_hps(batch_sizes, dataset_path, dataset_type, result_path, feature_spec,
-                          total_benchmark_samples, fused_embedding):
-
-    input_data = {}
-    for batch_size in batch_sizes:
-        filename = os.path.join(result_path, str(batch_size) + ".json")
-        print("generating input data: ", filename)
-        shapes = create_input_data_hps_batch(batch_size=batch_size, dst_path=filename, dataset_path=dataset_path,
-                                             dataset_type=dataset_type, feature_spec=feature_spec,
-                                             total_benchmark_samples=total_benchmark_samples,
-                                             fused_embedding=fused_embedding)
-        input_data[batch_size] = (filename, shapes)
-    return input_data
-
-
-def create_input_data_hps_batch(batch_size, dst_path, dataset_path, dataset_type, feature_spec,
-                                total_benchmark_samples, fused_embedding):
-
-    fspec = dataloading.feature_spec.FeatureSpec.from_yaml(
-        os.path.join(dataset_path, feature_spec)
-    )
-    num_tables = len(fspec.get_categorical_sizes())
-    table_ids = list(range(num_tables))
-
-    _, dataloader = create_input_pipelines(dataset_type=dataset_type, dataset_path=dataset_path,
-                                           train_batch_size=batch_size, test_batch_size=batch_size,
-                                           table_ids=table_ids, feature_spec=feature_spec, rank=0, world_size=1)
-
-    generated = 0
-    batches = []
-
-    categorical_cardinalities = fspec.get_categorical_sizes()
-    categorical_cardinalities = np.roll(np.cumsum(np.array(categorical_cardinalities)), 1)
-    categorical_cardinalities[0] = 0
-
-    for batch in dataloader.op():
-        features, labels = batch
-        numerical_features, cat_features = features
-        cat_features = tf.concat(cat_features, axis=1).numpy().astype(np.int32)
-        cat_features = np.add(cat_features, categorical_cardinalities).flatten()
-        numerical_features = numerical_features.numpy().astype(np.float32).flatten()
-
-        batch = {
-            "categorical_features": cat_features.tolist(),
-            "numerical_features": numerical_features.tolist(),
-        }
-        batches.append(batch)
-        generated += batch_size
-        if generated >= total_benchmark_samples:
-            break
-
-    with open(dst_path, "w") as f:
-        json.dump(obj={"data": batches}, fp=f, indent=4)
-
-    shapes = [
-        f"categorical_features:{cat_features.shape[0]}",
-        f"numerical_features:{numerical_features.shape[0]}",
-    ]
-    return shapes
-
-
-def main():
-    parser = argparse.ArgumentParser()
-    parser.add_argument(
-        "--result-path",
-        type=pathlib.Path,
-        required=True,
-        help="Path where processed data is stored.",
-    )
-    parser.add_argument(
-        "--fused-embedding",
-        action="store_true",
-        help="Use the fused embedding API for HPS",
-    )
-    parser.add_argument(
-        "--batch-sizes",
-        type=int,
-        default=[256, 512, 1024, 2048, 4096, 8192, 16384, 32768, 65536],
-        help="List of batch sizes to test.",
-        nargs="*",
-    )
-    parser.add_argument(
-        "-v",
-        "--verbose",
-        help="Verbose logs",
-        action="store_true",
-        default=False,
-    )
-    parser.add_argument(
-        "--dataset_path", default=None, required=True, help="Path to dataset directory"
-    )
-    parser.add_argument(
-        "--feature_spec",
-        default="feature_spec.yaml",
-        help="Name of the feature spec file in the dataset directory",
-    )
-    parser.add_argument(
-        "--dataset_type",
-        default="tf_raw",
-        choices=["tf_raw", "synthetic", "split_tfrecords"],
-        help="The type of the dataset to use",
-    )
-
-    parser.add_argument(
-        "--num-benchmark-samples",
-        default=2**18,
-        type=int,
-        help="The type of the dataset to use",
-    )
-
-    args = parser.parse_args()
-
-    log_level = logging.INFO if not args.verbose else logging.DEBUG
-    log_format = "%(asctime)s %(levelname)s %(name)s %(message)s"
-    logging.basicConfig(level=log_level, format=log_format)
-
-    input_data = create_input_data_hps(batch_sizes=args.batch_sizes, dataset_path=args.dataset_path, result_path=args.result_path,
-                                       dataset_type=args.dataset_type, feature_spec=args.feature_spec,
-                                       total_benchmark_samples=args.num_benchmark_samples,
-                                       fused_embedding=args.fused_embedding)
-
-if __name__ == "__main__":
-    main()
diff --git a/review/pr-458/_downloads/dbb8b96b5782fa5bb7367e4c0cc75722/create_tf_models.py b/review/pr-458/_downloads/dbb8b96b5782fa5bb7367e4c0cc75722/create_tf_models.py
deleted file mode 100644
index d1d1ef69ea..0000000000
--- a/review/pr-458/_downloads/dbb8b96b5782fa5bb7367e4c0cc75722/create_tf_models.py
+++ /dev/null
@@ -1,259 +0,0 @@
-import os
-import numpy as np
-import tensorflow as tf
-import struct
-import json
-import hierarchical_parameter_server as hps
-
-args = dict()
-args["gpu_num"] = 1
-args["slot_num"] = 26
-args["embed_vec_size"] = 128
-args["dense_dim"] = 13
-args["global_batch_size"] = 131072
-args["max_vocabulary_size"] = 32709138
-
-args["dlrm_saved_path"] = "dlrm_tf_saved_model"
-args["hps_plugin_dlrm_saved_path"] = "hps_plugin_dlrm_tf_saved_model"
-args["dlrm_dense_saved_path"] = "dlrm_dense_tf_saved_model"
-args["dlrm_embedding_table_saved_path"] = "dlrm_embedding_table"
-args["ps_config_file"] = "dlrm.json"
-args["np_vector_type"] = np.float32
-args["tf_key_type"] = tf.int32
-args["tf_vector_type"] = tf.float32
-
-os.environ["CUDA_VISIBLE_DEVICES"] = ",".join(map(str, range(args["gpu_num"])))
-gpus = tf.config.experimental.list_physical_devices("GPU")
-for gpu in gpus:
-    tf.config.experimental.set_memory_growth(gpu, True)
-
-hps_config = {
-    "supportlonglong": False,
-    "models": [
-        {
-            "model": "dlrm",
-            "sparse_files": [args["dlrm_embedding_table_saved_path"]],
-            "num_of_worker_buffer_in_pool": 3,
-            "embedding_table_names": ["sparse_embedding0"],
-            "embedding_vecsize_per_table": [128],
-            "maxnum_catfeature_query_per_table_per_sample": [26],
-            "default_value_for_each_table": [1.0],
-            "deployed_device_list": [0],
-            "max_batch_size": args["global_batch_size"],
-            "cache_refresh_percentage_per_iteration": 0.0,
-            "hit_rate_threshold": 1.0,
-            "gpucacheper": 0.2,
-            "gpucache": True,
-        }
-    ],
-}
-hps_config_json_object = json.dumps(hps_config, indent=4)
-with open(args["ps_config_file"], "w") as outfile:
-    outfile.write(hps_config_json_object)
-
-
-class MLP(tf.keras.layers.Layer):
-    def __init__(self, arch, activation="relu", out_activation=None, **kwargs):
-        super(MLP, self).__init__(**kwargs)
-        self.layers = []
-        index = 0
-        for units in arch[:-1]:
-            self.layers.append(
-                tf.keras.layers.Dense(
-                    units, activation=activation, name="{}_{}".format(kwargs["name"], index)
-                )
-            )
-            index += 1
-        self.layers.append(
-            tf.keras.layers.Dense(
-                arch[-1], activation=out_activation, name="{}_{}".format(kwargs["name"], index)
-            )
-        )
-
-    def call(self, inputs, training=True):
-        x = self.layers[0](inputs)
-        for layer in self.layers[1:]:
-            x = layer(x)
-        return x
-
-
-class SecondOrderFeatureInteraction(tf.keras.layers.Layer):
-    def __init__(self):
-        super(SecondOrderFeatureInteraction, self).__init__()
-
-    def call(self, inputs, num_feas):
-        dot_products = tf.reshape(
-            tf.matmul(inputs, inputs, transpose_b=True), (-1, num_feas * num_feas)
-        )
-        indices = tf.constant([i * num_feas + j for j in range(1, num_feas) for i in range(j)])
-        flat_interactions = tf.gather(dot_products, indices, axis=1)
-        return flat_interactions
-
-
-class DLRM(tf.keras.models.Model):
-    def __init__(
-        self, init_tensors, embed_vec_size, slot_num, dense_dim, arch_bot, arch_top, **kwargs
-    ):
-        super(DLRM, self).__init__(**kwargs)
-
-        self.init_tensors = init_tensors
-
-        with tf.device("/CPU:0"):
-            self.params = tf.Variable(
-                initial_value=tf.concat(self.init_tensors, axis=0), name="cpu_embedding"
-            )
-
-        self.embed_vec_size = embed_vec_size
-        self.slot_num = slot_num
-        self.dense_dim = dense_dim
-
-        self.bot_nn = MLP(arch_bot, name="bottom", out_activation="relu")
-        self.top_nn = MLP(arch_top, name="top", out_activation="sigmoid")
-        self.interaction_op = SecondOrderFeatureInteraction()
-
-        self.interaction_out_dim = self.slot_num * (self.slot_num + 1) // 2
-        self.reshape_layer1 = tf.keras.layers.Reshape((1, arch_bot[-1]), name="reshape1")
-        self.concat1 = tf.keras.layers.Concatenate(axis=1, name="concat1")
-        self.concat2 = tf.keras.layers.Concatenate(axis=1, name="concat2")
-
-    def call(self, inputs, training=True):
-        categorical_features = inputs["categorical_features"]
-        numerical_features = inputs["numerical_features"]
-
-        embedding_vector = tf.nn.embedding_lookup(params=self.params, ids=categorical_features)
-        dense_x = self.bot_nn(numerical_features)
-        concat_features = self.concat1([embedding_vector, self.reshape_layer1(dense_x)])
-
-        Z = self.interaction_op(concat_features, self.slot_num + 1)
-        z = self.concat2([dense_x, Z])
-        logit = self.top_nn(z)
-        return logit
-
-    def summary(self):
-        inputs = {
-            "categorical_features": tf.keras.Input(
-                shape=(self.slot_num,), dtype=args["tf_key_type"], name="categorical_features"
-            ),
-            "numerical_features": tf.keras.Input(
-                shape=(self.dense_dim,), dtype=tf.float32, name="numrical_features"
-            ),
-        }
-        model = tf.keras.models.Model(inputs=inputs, outputs=self.call(inputs))
-        return model.summary()
-
-
-class HPS_Plugin_DLRM(tf.keras.models.Model):
-    def __init__(self, slot_num, dense_dim, embed_vec_size, dense_model_path, **kwargs):
-        super(HPS_Plugin_DLRM, self).__init__(**kwargs)
-
-        self.slot_num = slot_num
-        self.dense_dim = dense_dim
-        self.embed_vec_size = embed_vec_size
-        self.lookup_layer = hps.LookupLayer(
-            model_name="dlrm",
-            table_id=0,
-            emb_vec_size=self.embed_vec_size,
-            emb_vec_dtype=args["tf_vector_type"],
-            name="lookup",
-            ps_config_file="dlrm.json",
-            global_batch_size=args["global_batch_size"],
-        )
-        self.dense_model = tf.keras.models.load_model(dense_model_path)
-
-    def call(self, inputs):
-        categorical_features = inputs["categorical_features"]
-        numerical_features = inputs["numerical_features"]
-
-        embedding_vector = self.lookup_layer(categorical_features)
-        logit = self.dense_model([embedding_vector, numerical_features])
-        return logit
-
-    def summary(self):
-        inputs = {
-            "categorical_features": tf.keras.Input(
-                shape=(self.slot_num,), dtype=args["tf_key_type"], name="categorical_features"
-            ),
-            "numerical_features": tf.keras.Input(
-                shape=(self.dense_dim,), dtype=tf.float32, name="numerical_features"
-            ),
-        }
-        model = tf.keras.models.Model(inputs=inputs, outputs=self.call(inputs))
-        return model.summary()
-
-
-def convert_to_sparse_model(embeddings_weights, embedding_table_path, embedding_vec_size):
-    os.system("mkdir -p {}".format(embedding_table_path))
-    with open("{}/key".format(embedding_table_path), "wb") as key_file, open(
-        "{}/emb_vector".format(embedding_table_path), "wb"
-    ) as vec_file:
-        for key in range(embeddings_weights.shape[0]):
-            vec = embeddings_weights[key]
-            key_struct = struct.pack("q", key)
-            vec_struct = struct.pack(str(embedding_vec_size) + "f", *vec)
-            key_file.write(key_struct)
-            vec_file.write(vec_struct)
-
-
-def native_tf(args):
-    init_tensors = np.random.random((args["max_vocabulary_size"], args["embed_vec_size"])).astype(
-        args["np_vector_type"]
-    )
-    model = DLRM(
-        init_tensors,
-        args["embed_vec_size"],
-        args["slot_num"],
-        args["dense_dim"],
-        arch_bot=[512, 256, args["embed_vec_size"]],
-        arch_top=[1024, 1024, 512, 256, 1],
-        name="dlrm",
-    )
-    model.summary()
-    inputs = {
-        "categorical_features": tf.keras.Input(
-            shape=(args["slot_num"],), dtype=args["tf_key_type"], name="categorical_features"
-        ),
-        "numerical_features": tf.keras.Input(
-            shape=(args["dense_dim"],), dtype=tf.float32, name="numerical_features"
-        ),
-    }
-    pred = model(inputs)
-    model.save(
-        args["dlrm_saved_path"],
-        options=tf.saved_model.SaveOptions(
-            experimental_variable_policy=tf.saved_model.experimental.VariablePolicy.SAVE_VARIABLE_DEVICES
-        ),
-    )
-
-    dense_model = tf.keras.models.Model(
-        [model.get_layer("concat1").input[0], model.get_layer("bottom").input],
-        model.get_layer("top").output,
-    )
-    dense_model.summary()
-    dense_model.save(args["dlrm_dense_saved_path"])
-
-    embedding_weights = model.get_weights()[-1]
-    convert_to_sparse_model(
-        embedding_weights, args["dlrm_embedding_table_saved_path"], args["embed_vec_size"]
-    )
-
-
-def hps_plugin_tf(args):
-    model = HPS_Plugin_DLRM(
-        args["slot_num"], args["dense_dim"], args["embed_vec_size"], args["dlrm_dense_saved_path"]
-    )
-    model.summary()
-    inputs = {
-        "categorical_features": tf.keras.Input(
-            shape=(args["slot_num"],), dtype=args["tf_key_type"], name="categorical_features"
-        ),
-        "numerical_features": tf.keras.Input(
-            shape=(args["dense_dim"],), dtype=tf.float32, name="numerical_features"
-        ),
-    }
-    pred = model(inputs)
-    model.save(args["hps_plugin_dlrm_saved_path"])
-
-
-if __name__ == "__main__":
-    native_tf(args)
-    hps_plugin_tf(args)
diff --git a/review/pr-458/_downloads/e858fb50f7c9701eb1852ee88cd4ab31/create_trt_engines.py b/review/pr-458/_downloads/e858fb50f7c9701eb1852ee88cd4ab31/create_trt_engines.py
deleted file mode 100644
index 4beea1c414..0000000000
--- a/review/pr-458/_downloads/e858fb50f7c9701eb1852ee88cd4ab31/create_trt_engines.py
+++ /dev/null
@@ -1,96 +0,0 @@
-import onnx_graphsurgeon as gs
-from onnx import shape_inference
-import numpy as np
-import onnx
-import tensorrt as trt
-import ctypes
-import os
-
-TRT_LOGGER = trt.Logger(trt.Logger.INFO)
-EXPLICIT_BATCH = 1 << (int)(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)
-
-args = dict()
-args["hps_trt_plugin_lib_path"] = "/usr/local/hps_trt/lib/libhps_plugin.so"
-args["dlrm_dense_saved_path"] = "dlrm_dense_tf_saved_model"
-args["dlrm_dense_onnx_path"] = "dlrm_dense.onnx"
-args["hps_plugin_dlrm_onnx_path"] = "hps_plugin_dlrm.onnx"
-args["hps_plugin_dlrm_trt_path"] = "hps_plugin_dlrm.trt"
-
-
-def onnx_surgery(args):
-    graph = gs.import_onnx(onnx.load(args["dlrm_dense_onnx_path"]))
-    saved = []
-    for i in graph.inputs:
-        if i.name == "args_0":
-            categorical_features = gs.Variable(
-                name="categorical_features", dtype=np.int32, shape=("unknown", 26)
-            )
-            node = gs.Node(
-                op="HPS_TRT",
-                attrs={
-                    "ps_config_file": "dlrm.json\0",
-                    "model_name": "dlrm\0",
-                    "table_id": 0,
-                    "emb_vec_size": 128,
-                },
-                inputs=[categorical_features],
-                outputs=[i],
-            )
-            graph.nodes.append(node)
-            saved.append(categorical_features)
-        if i.name == "args_0_1":
-            i.name = "numerical_features"
-            saved.append(i)
-    graph.inputs = saved
-    graph.cleanup().toposort()
-    onnx.save(gs.export_onnx(graph), args["hps_plugin_dlrm_onnx_path"])
-
-
-def create_hps_plugin_creator(args):
-    plugin_lib_name = args["hps_trt_plugin_lib_path"]
-    handle = ctypes.CDLL(plugin_lib_name, mode=ctypes.RTLD_GLOBAL)
-    trt.init_libnvinfer_plugins(TRT_LOGGER, "")
-    plg_registry = trt.get_plugin_registry()
-    for plugin_creator in plg_registry.plugin_creator_list:
-        if plugin_creator.name[0] == "H":
-            print(plugin_creator.name)
-    hps_plugin_creator = plg_registry.get_plugin_creator("HPS_TRT", "1", "")
-    return hps_plugin_creator
-
-
-def build_engine_from_onnx(args, fp16):
-    with trt.Builder(TRT_LOGGER) as builder, builder.create_network(
-        EXPLICIT_BATCH
-    ) as network, trt.OnnxParser(
-        network, TRT_LOGGER
-    ) as parser, builder.create_builder_config() as builder_config:
-        model = open(args["hps_plugin_dlrm_onnx_path"], "rb")
-        parser.parse(model.read())
-        print(network.num_layers)
-
-        trt_engine_saved_path = "fp32_" + args["hps_plugin_dlrm_trt_path"]
-        if fp16:
-            builder_config.set_flag(trt.BuilderFlag.FP16)
-            trt_engine_saved_path = "fp16_" + args["hps_plugin_dlrm_trt_path"]
-
-        profile = builder.create_optimization_profile()
-        profile.set_shape("categorical_features", (1, 26), (1024, 26), (131072, 26))
-        profile.set_shape("numerical_features", (1, 13), (1024, 13), (131072, 13))
-        builder_config.add_optimization_profile(profile)
-
-        engine = builder.build_serialized_network(network, builder_config)
-        with open(trt_engine_saved_path, "wb") as fout:
-            fout.write(engine)
-
-
-if __name__ == "__main__":
-    os.system(
-        "python -m tf2onnx.convert --saved-model "
-        + args["dlrm_dense_saved_path"]
-        + " --output "
-        + args["dlrm_dense_onnx_path"]
-    )
-    onnx_surgery(args)
-    create_hps_plugin_creator(args)
-    build_engine_from_onnx(args, fp16=False)
-    build_engine_from_onnx(args, fp16=True)
diff --git a/review/pr-458/_images/066b2fd5e0bb125957a08439799519110a492bebe7436d750d0721277048b473.svg b/review/pr-458/_images/066b2fd5e0bb125957a08439799519110a492bebe7436d750d0721277048b473.svg
deleted file mode 100644
index 83b7ca118e..0000000000
--- a/review/pr-458/_images/066b2fd5e0bb125957a08439799519110a492bebe7436d750d0721277048b473.svg
+++ /dev/null
@@ -1,67 +0,0 @@
[SVG graph (deleted): SelectionOp (selector ['userId', 'movieId']) -> JoinExternal (selector ['userId', 'movieId']) -> output cols=[userId, movieId]]
\ No newline at end of file
diff --git a/review/pr-458/_images/9029315af2f13b63460fd19ef84ef369a2f7327367b6e976dcf7f624d6784168.svg b/review/pr-458/_images/9029315af2f13b63460fd19ef84ef369a2f7327367b6e976dcf7f624d6784168.svg
deleted file mode 100644
index bedfa27a98..0000000000
--- a/review/pr-458/_images/9029315af2f13b63460fd19ef84ef369a2f7327367b6e976dcf7f624d6784168.svg
+++ /dev/null
@@ -1,193 +0,0 @@
[SVG graph (deleted): SelectionOp ['userId', 'movieId'] -> JoinExternal ['userId', 'movieId'] -> Categorify; Categorify -> ['m', 'o', 'v', 'i', 'e', 'I', 'd'] (selector ['movieId']) -> Rename (selector ['movieId']); SelectionOp ['rating'] -> lambda col: (col > 3).astype("int8"); Categorify, Rename, and the lambda branch feed "+" -> output cols]
\ No newline at end of file
diff --git a/review/pr-458/_images/DCN.JPG b/review/pr-458/_images/DCN.JPG
deleted file mode 100644
index 47b81d865d..0000000000
Binary files a/review/pr-458/_images/DCN.JPG and /dev/null differ
diff --git a/review/pr-458/_images/WDL.JPG b/review/pr-458/_images/WDL.JPG
deleted file mode 100644
index 818a0a7035..0000000000
Binary files a/review/pr-458/_images/WDL.JPG and /dev/null differ
diff --git a/review/pr-458/_images/dataset.png b/review/pr-458/_images/dataset.png
deleted file mode 100644
index 2424b8c543..0000000000
Binary files a/review/pr-458/_images/dataset.png and /dev/null differ
diff --git a/review/pr-458/_images/fig12_multi_gpu_performance.PNG b/review/pr-458/_images/fig12_multi_gpu_performance.PNG
deleted file mode 100644
index e1f6acb671..0000000000
Binary files a/review/pr-458/_images/fig12_multi_gpu_performance.PNG and /dev/null differ
diff --git a/review/pr-458/_images/fig1_hugectr_arch.png b/review/pr-458/_images/fig1_hugectr_arch.png
deleted file mode 100644
index 840e418c44..0000000000
Binary files a/review/pr-458/_images/fig1_hugectr_arch.png and /dev/null differ
diff --git a/review/pr-458/_images/fig2_embedding_mlp.png b/review/pr-458/_images/fig2_embedding_mlp.png
deleted file mode 100644
index 05db4c2ba4..0000000000
Binary files a/review/pr-458/_images/fig2_embedding_mlp.png and /dev/null differ
diff --git a/review/pr-458/_images/fig3_embedding_mech.png b/review/pr-458/_images/fig3_embedding_mech.png
deleted file mode 100644
index f10531369b..0000000000
Binary files a/review/pr-458/_images/fig3_embedding_mech.png and /dev/null differ
diff --git a/review/pr-458/_images/fig4_arithmetic_underflow.png b/review/pr-458/_images/fig4_arithmetic_underflow.png
deleted file mode 100644
index ff74563b58..0000000000
Binary files a/review/pr-458/_images/fig4_arithmetic_underflow.png and /dev/null differ
diff --git a/review/pr-458/_images/graph_surgeon.png b/review/pr-458/_images/graph_surgeon.png
deleted file mode 100644
index c1fed2f0cb..0000000000
Binary files a/review/pr-458/_images/graph_surgeon.png and /dev/null differ
diff --git a/review/pr-458/_images/hps_dlrm_latency.png b/review/pr-458/_images/hps_dlrm_latency.png
deleted file mode 100644
index 3aeed71187..0000000000
Binary files a/review/pr-458/_images/hps_dlrm_latency.png and /dev/null differ
diff --git a/review/pr-458/_images/hps_library.svg b/review/pr-458/_images/hps_library.svg
deleted file mode 100644
index db026b40ca..0000000000
--- a/review/pr-458/_images/hps_library.svg
+++ /dev/null
@@ -1,94 +0,0 @@
[SVG diagram (deleted), labels: HPS Plugin for TensorFlow; HPS Plugin for Torch; HPS Plugin for TensorRT; HPS Backend for Triton Inference Server; HPS Database Backend; GPU Embedding Cache; Hierarchical Parameter Server Library]
\ No newline at end of file
diff --git a/review/pr-458/_images/learning_rate_scheduling.png b/review/pr-458/_images/learning_rate_scheduling.png
deleted file mode 100644
index 320a7474e6..0000000000
Binary files a/review/pr-458/_images/learning_rate_scheduling.png and /dev/null differ
diff --git a/review/pr-458/_images/memory_hierarchy.png b/review/pr-458/_images/memory_hierarchy.png
deleted file mode 100644
index 01b60c2035..0000000000
Binary files a/review/pr-458/_images/memory_hierarchy.png and /dev/null differ
diff --git a/review/pr-458/_images/merlin_arch.png b/review/pr-458/_images/merlin_arch.png
deleted file mode 100644
index 8108829d48..0000000000
Binary files a/review/pr-458/_images/merlin_arch.png and /dev/null differ
diff --git a/review/pr-458/_images/mlperf_10.PNG b/review/pr-458/_images/mlperf_10.PNG
deleted file mode 100644
index 76c6aded9f..0000000000
Binary
files a/review/pr-458/_images/mlperf_10.PNG and /dev/null differ diff --git a/review/pr-458/_images/workflow.png b/review/pr-458/_images/workflow.png deleted file mode 100644 index d37f8bd7a8..0000000000 Binary files a/review/pr-458/_images/workflow.png and /dev/null differ diff --git a/review/pr-458/_images/workflow1.png b/review/pr-458/_images/workflow1.png deleted file mode 100644 index 3344e75504..0000000000 Binary files a/review/pr-458/_images/workflow1.png and /dev/null differ diff --git a/review/pr-458/_images/workflow_of_embeddinglayer.png b/review/pr-458/_images/workflow_of_embeddinglayer.png deleted file mode 100644 index b8915cb0b8..0000000000 Binary files a/review/pr-458/_images/workflow_of_embeddinglayer.png and /dev/null differ diff --git a/review/pr-458/_modules/hierarchical_parameter_server/core/initialize.html b/review/pr-458/_modules/hierarchical_parameter_server/core/initialize.html deleted file mode 100644 index 7706c56335..0000000000 --- a/review/pr-458/_modules/hierarchical_parameter_server/core/initialize.html +++ /dev/null @@ -1,382 +0,0 @@ - - - - - - hierarchical_parameter_server.core.initialize — Merlin HugeCTR documentation - - - - - - - - - - - - - - - - - - - - - - - -
- - -
- -
-
-
-
    -
  • - - -
  • -
  • -
-
-
-
-
- -

Source code for hierarchical_parameter_server.core.initialize

-"""
- Copyright (c) 2023, NVIDIA CORPORATION.
-
- Licensed under the Apache License, Version 2.0 (the "License");
- you may not use this file except in compliance with the License.
- You may obtain a copy of the License at
-
-     http://www.apache.org/licenses/LICENSE-2.0
-
- Unless required by applicable law or agreed to in writing, software
- distributed under the License is distributed on an "AS IS" BASIS,
- WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
- See the License for the specific language governing permissions and
- limitations under the License.
-"""
-
-from hierarchical_parameter_server import hps_lib
-import tensorflow.distribute as tf_dist
-from tensorflow import print as tf_print
-from tensorflow import function
-from tensorflow.python.framework import config
-from tensorflow.dtypes import int32, int64
-from tensorflow.python.ops import array_ops
-import tensorflow as tf
-
-MirroredStrategy = tf_dist.MirroredStrategy
-try:
-    MultiWorkerMirroredStrategy = tf_dist.MultiWorkerMirroredStrategy
-except AttributeError:
-    MultiWorkerMirroredStrategy = tf_dist.experimental.MultiWorkerMirroredStrategy
-import sys
-
-
-
    [docs]def Init(**kwargs): - """ - Abbreviated as ``hps.Init(**kwargs)``. - - This function initializes the HPS for all the deployed models. It can be used - explicitly or implicitly. When used explicitly, you must call the function - only once and you must call it before any other HPS APIs. - When used implicitly, ``ps_config_file`` and ``global_batch_size`` - should be specified in the constructor of ``hps.SparseLookupLayer`` - and ``hps.LookupLayer``. When the layer is executed for the first time, it triggers - the internal HPS initialization implicitly in a thread-safe call-once manner. The - implicit initialization is especially useful for deploying the SavedModels that - leverage the HPS layers for online inference. - - HPS uses all GPUs that are visible to the current process. Set - ``CUDA_VISIBLE_DEVICES`` or ``tf.config.set_visible_devices`` to specify which - GPUs to use in this process before you launch the TensorFlow runtime - and call this function. Additionally, ensure that the ``deployed_device_list`` - parameter in the HPS configuration JSON file matches the visible devices. - - In **TensorFlow 2.x**, HPS can be used with ``tf.distribute.Strategy`` or Horovod. - When it is used with ``tf.distribute.Strategy``, you must call it under ``strategy.scope()`` - as shown in the following code block. - - .. code-block:: python - - import hierarchical_parameter_server as hps - - with strategy.scope(): - hps.Init(**kwargs) - - To use the function with Horovod, call it once in each Horovod process, after - ``hvd.init()``, as the following code block shows. - - .. code-block:: python - - import hierarchical_parameter_server as hps - import horovod.tensorflow as hvd - - hvd.init() - - hps.Init(**kwargs) - - In **TensorFlow 1.15**, HPS can only work with Horovod. The returned status - must be evaluated with ``sess.run``, and it must be evaluated before any - other HPS APIs. - - ..
    
    code-block:: python - - import hierarchical_parameter_server as hps - - hps_init = hps.Init(**kwargs) - with tf.Session() as sess: - sess.run(hps_init) - ... - - Parameters - ---------- - kwargs: dict - Keyword arguments for this function. - The dictionary must contain ``global_batch_size`` and ``ps_config_file``. - - * `global_batch_size`: int, the global batch size for HPS that is deployed on multiple GPUs. - - * `ps_config_file`: str, the JSON configuration file for HPS initialization. - - An example of the ``ps_config_file`` contents is shown below; ``global_batch_size`` can be - set to 16384 to match ``max_batch_size``: - - .. code-block:: python - - ps_config_file = { - "supportlonglong" : True, - "models" : - [{ - "model": "demo_model", - "sparse_files": ["demo_model_sparse.model"], - "num_of_worker_buffer_in_pool": 3, - "embedding_table_names":["sparse_embedding0"], - "embedding_vecsize_per_table": [16], - "maxnum_catfeature_query_per_table_per_sample": [10], - "default_value_for_each_table": [1.0], - "deployed_device_list": [0], - "max_batch_size": 16384, - "cache_refresh_percentage_per_iteration": 0.2, - "hit_rate_threshold": 1.0, - "gpucacheper": 1.0, - "gpucache": True - }, - { - "model": "demo_model2", - "sparse_files": ["demo_model2_sparse_0.model", "demo_model2_sparse_1.model"], - "num_of_worker_buffer_in_pool": 3, - "embedding_table_names":["sparse_embedding0", "sparse_embedding1"], - "embedding_vecsize_per_table": [64, 32], - "maxnum_catfeature_query_per_table_per_sample": [3, 5], - "default_value_for_each_table": [1.0, 1.0], - "deployed_device_list": [0], - "max_batch_size": 16384, - "cache_refresh_percentage_per_iteration": 0.2, - "hit_rate_threshold": 1.0, - "gpucacheper": 1.0, - "gpucache": True}, - ] - } - - - Returns - ------- - status: str - On success, the function returns a string with the value ``OK``.
    
- """ - - def _get_visible_devices(): - gpus = config.get_visible_devices("GPU") - assert len(gpus) > 0 - visible_devices = [] - for i in range(len(gpus)): - visible_devices.append(int(gpus[i].name.split(":")[-1])) - return array_ops.constant(visible_devices, dtype=int32) - - def _single_worker_init(**kwargs): - replica_ctx = tf_dist.get_replica_context() - replica_ctx.merge_call( - lambda strategy: tf_print("You are using the plugin with MirroredStrategy.") - ) - global_id = replica_ctx.replica_id_in_sync_group - visible_devices = _get_visible_devices() - status = hps_lib.init( - global_id, - replica_ctx.num_replicas_in_sync, - visible_devices, - global_batch_size=kwargs["global_batch_size"], - ps_config_file=kwargs["ps_config_file"], - ) - return status - - def _multi_worker_init(**kwargs): - replica_ctx = tf_dist.get_replica_context() - global_id = replica_ctx.replica_id_in_sync_group - visible_devices = _get_visible_devices() - status = hps_lib.init( - global_id, - replica_ctx.num_replicas_in_sync, - visible_devices, - global_batch_size=kwargs["global_batch_size"], - ps_config_file=kwargs["ps_config_file"], - ) - return status - - def _horovod_init(**kwargs): - local_rank = hvd.local_rank() - visible_devices = _get_visible_devices() - status = hps_lib.init( - local_rank, - hvd.size(), - visible_devices, - global_batch_size=kwargs["global_batch_size"], - ps_config_file=kwargs["ps_config_file"], - ) - return status - - def _one_device_init(**kwargs): - local_rank = 0 - visible_devices = _get_visible_devices() - status = hps_lib.init( - local_rank, - 1, - visible_devices, - global_batch_size=kwargs["global_batch_size"], - ps_config_file=kwargs["ps_config_file"], - ) - return status - - if tf_dist.has_strategy(): - strategy = tf_dist.get_strategy() - - @function - def _init_wrapper(run_fn, init_fn, **kwargs): - return run_fn(init_fn, kwargs=kwargs) - - if isinstance(strategy, MirroredStrategy): - _init_fn = _single_worker_init - elif isinstance(strategy, 
MultiWorkerMirroredStrategy): - _init_fn = _multi_worker_init - else: - raise RuntimeError("This strategy type is not supported yet.") - - if not hps_lib.in_tensorflow2(): - _run_fn = strategy.experimental_run_v2 - else: - _run_fn = strategy.run - - if tf.distribute.get_replica_context() is None: - _init_results = _init_wrapper(_run_fn, _init_fn, **kwargs) - else: - _init_results = _init_fn(**kwargs) - if hasattr(_init_results, "values"): - _init_results = _init_results.values - return _init_results - - elif "horovod.tensorflow" in sys.modules: - import horovod.tensorflow as hvd - - if not hps_lib.in_tensorflow2(): - - @function - def _init_wrapper(**kwargs): - return _horovod_init(**kwargs) - - return _init_wrapper(**kwargs) - else: - return _horovod_init(**kwargs) - else: - return _one_device_init(**kwargs)
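    Because ``ps_config_file`` is a path to a JSON file, while the docstring example above shows its contents as a Python dict, a minimal sketch of turning that dict into the file may help. The helper ``write_ps_config`` and the ``PER_TABLE_KEYS`` check are hypothetical, not part of HPS: the rule that every per-table list has one entry per name in ``embedding_table_names`` is an assumption drawn from the example config, not a documented HPS validation step.

    ```python
    import json

    # Keys whose lists hold one entry per embedding table in the example config.
    # Requiring equal lengths is an assumption drawn from that example, not a
    # documented HPS validation rule.
    PER_TABLE_KEYS = (
        "sparse_files",
        "embedding_vecsize_per_table",
        "maxnum_catfeature_query_per_table_per_sample",
        "default_value_for_each_table",
    )


    def write_ps_config(models, path, support_long_long=True):
        """Sanity-check per-table lists, then write the HPS config to `path` as JSON."""
        for model in models:
            num_tables = len(model["embedding_table_names"])
            for key in PER_TABLE_KEYS:
                if key in model and len(model[key]) != num_tables:
                    raise ValueError(
                        f"{model['model']}: {key} has {len(model[key])} entries, "
                        f"expected {num_tables} (one per embedding table)"
                    )
        config = {"supportlonglong": support_long_long, "models": models}
        with open(path, "w") as f:
            # json.dump turns Python True into JSON true.
            json.dump(config, f, indent=2)
        return path
    ```

    The returned path is what would then be passed as ``ps_config_file``, for example ``hps.Init(ps_config_file=path, global_batch_size=16384)``, keeping ``global_batch_size`` consistent with ``max_batch_size`` as in the example.
    
    
    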
-
- -
-
- -
-
-
-
- - - - - - - \ No newline at end of file diff --git a/review/pr-458/_modules/hierarchical_parameter_server/core/lookup_layer.html b/review/pr-458/_modules/hierarchical_parameter_server/core/lookup_layer.html deleted file mode 100644 index ffb72ac69b..0000000000 --- a/review/pr-458/_modules/hierarchical_parameter_server/core/lookup_layer.html +++ /dev/null @@ -1,251 +0,0 @@ - - - - - - hierarchical_parameter_server.core.lookup_layer — Merlin HugeCTR documentation - - - - - - - - - - - - - - - - - - - - - - - -
- - -
- -
-
-
-
    -
  • - - -
  • -
  • -
-
-
-
-
- -

Source code for hierarchical_parameter_server.core.lookup_layer

-"""
- Copyright (c) 2023, NVIDIA CORPORATION.
-
- Licensed under the Apache License, Version 2.0 (the "License");
- you may not use this file except in compliance with the License.
- You may obtain a copy of the License at
-
-     http://www.apache.org/licenses/LICENSE-2.0
-
- Unless required by applicable law or agreed to in writing, software
- distributed under the License is distributed on an "AS IS" BASIS,
- WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
- See the License for the specific language governing permissions and
- limitations under the License.
-"""
-
-import tensorflow as tf
-from hierarchical_parameter_server.core import lookup_ops
-
-
-
[docs]class LookupLayer(tf.keras.layers.Layer): - """ - Abbreviated as ``hps.LookupLayer(*args, **kwargs)``. - - This is a wrapper class for HPS lookup layer, which basically performs - the same function as ``tf.nn.embedding_lookup``. Note that ``ps_config_file`` - and ``global_batch_size`` should be specified in the constructor if you want - to use implicit HPS initialization. - - Parameters - ---------- - model_name: str - The name of the model that has embedding tables. - table_id: int - The index of the embedding table for the model specified by - model_name. - emb_vec_size: int - The embedding vector size for the embedding table specified - by model_name and table_id. - emb_vec_dtype: - The data type of embedding vectors which must be ``tf.float32``. - ps_config_file: str - The JSON configuration file for HPS initialization. - global_batch_size: int - The global batch size for HPS that is deployed on multiple GPUs. - - Examples - -------- - .. code-block:: python - - import hierarchical_parameter_server as hps - - lookup_layer = hps.LookupLayer(model_name = args.model_name, - table_id = args.table_id, - emb_vec_size = args.embed_vec_size, - emb_vec_dtype = tf.float32, - ps_config_file = args.ps_config_file, - global_batch_size = args.global_batch_size) - - @tf.function - def _infer_step(inputs): - embedding_vector = lookup_layer(inputs) - ... - - for i, (inputs, labels) in enumerate(dataset): - _infer_step(inputs) - """ - - def __init__( - self, - model_name, - table_id, - emb_vec_size, - emb_vec_dtype, - ps_config_file="", - global_batch_size=1, - **kwargs - ): - super(LookupLayer, self).__init__(**kwargs) - self.model_name = model_name - self.table_id = table_id - self.emb_vec_size = emb_vec_size - self.emb_vec_dtype = emb_vec_dtype - self.ps_config_file = ps_config_file - self.global_batch_size = global_batch_size - -
[docs] def call(self, ids, max_norm=None): - """ - The forward logic of this wrapper class. - - Parameters - ---------- - ids: - Keys are stored in Tensor. The supported data types are ``tf.int32`` and ``tf.int64``. - max_norm: - if not ``None``, each embedding is clipped if its l2-norm is larger - than this value. - - Returns - ------- - emb_vector: ``tf.Tensor`` of float32 - the embedding vectors for the input keys. Its shape is - *ids.get_shape() + emb_vec_size*. - """ - emb_vector = lookup_ops.lookup( - ids=ids, - model_name=self.model_name, - table_id=self.table_id, - emb_vec_size=self.emb_vec_size, - emb_vec_dtype=self.emb_vec_dtype, - ps_config_file=self.ps_config_file, - global_batch_size=self.global_batch_size, - max_norm=max_norm, - ) - output_shape = ids.get_shape() + self.emb_vec_size - emb_vector.set_shape(output_shape) - return emb_vector
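    To make the documented output shape (``ids.get_shape() + emb_vec_size``) and the ``max_norm`` behavior concrete, here is a minimal pure-Python sketch. It is not the HPS implementation: ``table`` and ``default_vector`` are hypothetical in-memory stand-ins for an HPS embedding table and its configured default, and falling back to the default for a missing key mirrors what the ``SparseLookupLayer`` docstring states, assumed here to apply to the dense lookup as well.

    ```python
    import math


    def reference_lookup(ids, table, default_vector, max_norm=None):
        """Pure-Python stand-in for a dense embedding lookup.

        `ids` is an int or an arbitrarily nested list of ints. The result keeps
        the same nesting plus one innermost level of length len(default_vector),
        i.e. shape ids.shape + [emb_vec_size].
        """
        if isinstance(ids, (list, tuple)):
            return [reference_lookup(i, table, default_vector, max_norm) for i in ids]
        # A missing key falls back to the default embedding (assumed behavior).
        vector = list(table.get(ids, default_vector))
        if max_norm is not None:
            norm = math.sqrt(sum(v * v for v in vector))
            if norm > max_norm:
                # Rescale so the L2 norm equals max_norm (clip by norm).
                vector = [v * max_norm / norm for v in vector]
        return vector
    ```

    For example, ``reference_lookup([[1, 7]], {1: [3.0, 4.0]}, [1.0, 1.0])`` has shape ``[1, 2, 2]``: ID ``1`` maps to ``[3.0, 4.0]`` and the unknown ID ``7`` maps to the default ``[1.0, 1.0]``.
    
    
    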
-
- -
-
- -
-
-
-
- - - - - - - \ No newline at end of file diff --git a/review/pr-458/_modules/hierarchical_parameter_server/core/sparse_lookup_layer.html b/review/pr-458/_modules/hierarchical_parameter_server/core/sparse_lookup_layer.html deleted file mode 100644 index b5de59c536..0000000000 --- a/review/pr-458/_modules/hierarchical_parameter_server/core/sparse_lookup_layer.html +++ /dev/null @@ -1,415 +0,0 @@ - - - - - - hierarchical_parameter_server.core.sparse_lookup_layer — Merlin HugeCTR documentation - - - - - - - - - - - - - - - - - - - - - - - -
- - -
- -
-
-
-
    -
  • - - -
  • -
  • -
-
-
-
-
- -

Source code for hierarchical_parameter_server.core.sparse_lookup_layer

-"""
- Copyright (c) 2023, NVIDIA CORPORATION.
-
- Licensed under the Apache License, Version 2.0 (the "License");
- you may not use this file except in compliance with the License.
- You may obtain a copy of the License at
-
-         http://www.apache.org/licenses/LICENSE-2.0
-
- Unless required by applicable law or agreed to in writing, software
- distributed under the License is distributed on an "AS IS" BASIS,
- WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
- See the License for the specific language governing permissions and
- limitations under the License.
-"""
-
-import tensorflow as tf
-from tensorflow.python.framework import sparse_tensor
-from tensorflow.python.ops import math_ops
-from tensorflow.python.ops import array_ops
-from tensorflow.python.framework import dtypes
-from tensorflow.nn import embedding_lookup
-
-from hierarchical_parameter_server.core import lookup_ops
-
-
-
[docs]class SparseLookupLayer(tf.keras.layers.Layer): - """ - Abbreviated as ``hps.SparseLookupLayer(*args, **kwargs)``. - - This is a wrapper class for HPS sparse lookup layer, which basically performs - the same function as ``tf.nn.embedding_lookup_sparse``. Note that ``ps_config_file`` - and ``global_batch_size`` should be specified in the constructor if you want - to use implicit HPS initialization. - - Parameters - ---------- - model_name: str - The name of the model that has embedding tables. - table_id: int - The index of the embedding table for the model specified by - model_name. - emb_vec_size: int - The embedding vector size for the embedding table specified - by model_name and table_id. - emb_vec_dtype: - The data type of embedding vectors which must be ``tf.float32``. - ps_config_file: str - The JSON configuration file for HPS initialization. - global_batch_size: int - The global batch size for HPS that is deployed on multiple GPUs. - - Examples - -------- - .. code-block:: python - - import hierarchical_parameter_server as hps - - sparse_lookup_layer = hps.SparseLookupLayer(model_name = args.model_name, - table_id = args.table_id, - emb_vec_size = args.embed_vec_size, - emb_vec_dtype = tf.float32, - ps_config_file = args.ps_config_file, - global_batch_size = args.global_batch_size) - - @tf.function - def _infer_step(inputs): - embedding_vector = sparse_lookup_layer(sp_ids=inputs, - sp_weights = None, - combiner="mean") - ... - - for i, (inputs, labels) in enumerate(dataset): - _infer_step(inputs) - """ - - def __init__( - self, - model_name, - table_id, - emb_vec_size, - emb_vec_dtype, - ps_config_file="", - global_batch_size=1, - **kwargs, - ): - super(SparseLookupLayer, self).__init__(**kwargs) - self.model_name = model_name - self.table_id = table_id - self.emb_vec_size = emb_vec_size - self.emb_vec_dtype = emb_vec_dtype - self.ps_config_file = ps_config_file - self.global_batch_size = global_batch_size - -
[docs] def call(self, sp_ids, sp_weights, name=None, combiner=None, max_norm=None): - """ - Looks up embeddings for the given ids and weights from a list of tensors. - This op assumes that there is at least one ID for each row in the dense tensor - represented by ``sp_ids`` (i.e. there are no rows with empty features), and that - all the indices of ``sp_ids`` are in canonical row-major order. The ``sp_ids`` - and ``sp_weights`` (if not None) are ``SparseTensor`` with rank of 2. - Embeddings are always aggregated along the last dimension. - If an ID value cannot be found in the HPS, the default embeddings are retrieved, - which can be specified in the HPS configuration JSON file. - - Parameters - ---------- - sp_ids: - N x M ``SparseTensor`` of ``int32`` or ``int64`` IDs where N is typically batch size - and M is arbitrary. - sp_weights: - Either a ``SparseTensor`` of float or double weights, or ``None`` to - indicate all weights should be taken to be `1`. If specified, ``sp_weights`` - must have exactly the same shape and indices as ``sp_ids``. - combiner: - A string that specifies the reduction op: - - ``"sum"`` - Computes the weighted sum of the embedding results for each row. - ``"mean"`` - Computes the weighted sum divided by the total weight. - ``"sqrtn"`` - Computes the weighted sum divided by the square root of the sum of the - squares of the weights. - - The default value is ``"mean"``. - max_norm: - if not ``None``, each embedding is clipped if its l2-norm is larger - than this value, before combining. - - Returns - ------- - emb_vector: ``tf.Tensor`` of float32 - A dense tensor representing the combined embeddings for the - sparse IDs. For each row in the dense tensor represented by ``sp_ids``, the op - looks up the embeddings for all IDs in that row, multiplies them by the - corresponding weight, and combines these embeddings as specified. - In other words, if - - .. code-block:: python - - shape(sp_ids) = shape(sp_weights) = [d0, d1] - - then - - .. 
    code-block:: python - - shape(output) = [d0, self.emb_vec_size] - - For instance, if self.emb_vec_size is 16, and sp_ids / sp_weights are - - .. code-block:: python - - [0, 0]: id 1, weight 2.0 - [0, 1]: id 3, weight 0.5 - [1, 0]: id 0, weight 1.0 - [2, 3]: id 1, weight 3.0 - - with ``combiner = "mean"``, then the output is a 3x16 matrix where - - .. code-block:: python - - output[0, :] = (vector_for_id_1 * 2.0 + vector_for_id_3 * 0.5) / (2.0 + 0.5) - output[1, :] = (vector_for_id_0 * 1.0) / 1.0 - output[2, :] = (vector_for_id_1 * 3.0) / 3.0 - - Raises - ------ - TypeError: If ``sp_ids`` is not a ``SparseTensor``, or if ``sp_weights`` is - neither ``None`` nor ``SparseTensor``. - ValueError: If ``combiner`` is not one of {``"mean"``, ``"sqrtn"``, ``"sum"``}. - - """ - - # Extract unique dense ids to be looked up - if combiner is None: - combiner = "mean" - if combiner not in ("mean", "sqrtn", "sum"): - raise ValueError(f"combiner must be one of 'mean', 'sqrtn' or 'sum', got {combiner}") - - if not isinstance(sp_ids, sparse_tensor.SparseTensor): - raise TypeError(f"sp_ids must be SparseTensor, got {type(sp_ids)}") - - ignore_weights = sp_weights is None - if not ignore_weights: - if not isinstance(sp_weights, sparse_tensor.SparseTensor): - raise TypeError( - f"sp_weights must be either None or SparseTensor, " f"got {type(sp_weights)}" - ) - sp_ids.values.get_shape().assert_is_compatible_with(sp_weights.values.get_shape()) - sp_ids.indices.get_shape().assert_is_compatible_with(sp_weights.indices.get_shape()) - sp_ids.dense_shape.get_shape().assert_is_compatible_with( - sp_weights.dense_shape.get_shape() - ) - # TODO(yleon): Add enhanced node assertions to verify that sp_ids and - # sp_weights have equal indices and shapes.
    
- - segment_ids = sp_ids.indices[:, 0] - - ids = sp_ids.values - ids, idx = array_ops.unique(ids) - - # Query HPS for embeddings - embeddings = lookup_ops.lookup( - ids=ids, - model_name=self.model_name, - table_id=self.table_id, - emb_vec_size=self.emb_vec_size, - emb_vec_dtype=self.emb_vec_dtype, - ps_config_file=self.ps_config_file, - global_batch_size=self.global_batch_size, - max_norm=max_norm, - ) - - # Handle weights and combiner - if not ignore_weights: - if segment_ids.dtype != dtypes.int32: - segment_ids = math_ops.cast(segment_ids, dtypes.int32) - - weights = sp_weights.values - embeddings = array_ops.gather(embeddings, idx) - - original_dtype = embeddings.dtype - if embeddings.dtype in (dtypes.float16, dtypes.bfloat16): - # Cast low-precision embeddings to float32 during the computation to - # avoid numerical issues. - embeddings = math_ops.cast(embeddings, dtypes.float32) - if weights.dtype != embeddings.dtype: - weights = math_ops.cast(weights, embeddings.dtype) - - # Reshape weights to allow broadcast - ones_shape = array_ops.expand_dims(array_ops.rank(embeddings) - 1, 0) - ones = array_ops.ones(ones_shape, dtype=dtypes.int32) - bcast_weights_shape = array_ops.concat([array_ops.shape(weights), ones], 0) - - orig_weights_shape = weights.get_shape() - weights = array_ops.reshape(weights, bcast_weights_shape) - - # Set the weight shape, since after reshaping to bcast_weights_shape, - # the shape becomes None. 
- if embeddings.get_shape().ndims is not None: - weights.set_shape( - orig_weights_shape.concatenate( - [1 for _ in range(embeddings.get_shape().ndims - 1)] - ) - ) - - embeddings *= weights - - if combiner == "sum": - embeddings = math_ops.segment_sum(embeddings, segment_ids) - elif combiner == "mean": - embeddings = math_ops.segment_sum(embeddings, segment_ids) - weight_sum = math_ops.segment_sum(weights, segment_ids) - embeddings = math_ops.div_no_nan(embeddings, weight_sum) - elif combiner == "sqrtn": - embeddings = math_ops.segment_sum(embeddings, segment_ids) - weights_squared = math_ops.pow(weights, 2) - weight_sum = math_ops.segment_sum(weights_squared, segment_ids) - weight_sum_sqrt = math_ops.sqrt(weight_sum) - embeddings = math_ops.div_no_nan(embeddings, weight_sum_sqrt) - else: - assert False, "Unrecognized combiner" - if embeddings.dtype != original_dtype: - embeddings = math_ops.cast(embeddings, original_dtype) - else: - if segment_ids.dtype not in (dtypes.int32, dtypes.int64): - segment_ids = math_ops.cast(segment_ids, dtypes.int32) - assert idx is not None - if combiner == "sum": - embeddings = math_ops.sparse_segment_sum(embeddings, idx, segment_ids) - elif combiner == "mean": - embeddings = math_ops.sparse_segment_mean(embeddings, idx, segment_ids) - elif combiner == "sqrtn": - embeddings = math_ops.sparse_segment_sqrt_n(embeddings, idx, segment_ids) - else: - assert False, "Unrecognized combiner" - - output_shape = [sp_ids.get_shape()[0], self.emb_vec_size] - embeddings.set_shape(output_shape) - return embeddings
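    The ``"sum"``, ``"mean"`` and ``"sqrtn"`` combiners documented in ``SparseLookupLayer.call`` can be checked against a framework-free reference. The sketch below is a pure-Python approximation, not the HPS or TensorFlow code path: ``entries`` is a hypothetical list of ``(row, id, weight)`` triples standing in for ``sp_ids``/``sp_weights`` (weight ``1.0`` everywhere mimics ``sp_weights=None``), and ``table`` is a hypothetical dict standing in for an HPS embedding table. It reproduces the docstring's four-entry example with 2-dimensional vectors.

    ```python
    import math
    from collections import defaultdict


    def reference_combine(entries, table, combiner="mean"):
        """Pure-Python reference for the sum / mean / sqrtn combiners.

        entries: (row, id, weight) triples, one per entry of sp_ids.
        table:   dict mapping id -> embedding vector (list of floats).
        Returns one combined vector per row, for rows 0..max(row).
        """
        if combiner not in ("mean", "sqrtn", "sum"):
            raise ValueError(f"combiner must be one of 'mean', 'sqrtn' or 'sum', got {combiner}")
        if not entries:
            return []
        dim = len(table[entries[0][1]])
        sums = defaultdict(lambda: [0.0] * dim)  # weighted sum of vectors per row
        w_sum = defaultdict(float)  # sum of weights per row
        w_sq_sum = defaultdict(float)  # sum of squared weights per row
        for row, key, weight in entries:
            acc = sums[row]
            for j, value in enumerate(table[key]):
                acc[j] += weight * value
            w_sum[row] += weight
            w_sq_sum[row] += weight * weight
        output = []
        for row in range(max(r for r, _, _ in entries) + 1):
            if combiner == "sum":
                denom = 1.0
            elif combiner == "mean":
                denom = w_sum[row]
            else:  # "sqrtn"
                denom = math.sqrt(w_sq_sum[row])
            # Like div_no_nan: a zero denominator (an empty row) yields zeros.
            output.append([v / denom if denom else 0.0 for v in sums[row]])
        return output


    # The docstring example, with hypothetical 2-dimensional vectors.
    table = {0: [1.0, 0.0], 1: [2.0, 4.0], 3: [4.0, 8.0]}
    entries = [(0, 1, 2.0), (0, 3, 0.5), (1, 0, 1.0), (2, 1, 3.0)]
    print(reference_combine(entries, table, "mean"))  # [[2.4, 4.8], [1.0, 0.0], [2.0, 4.0]]
    ```

    Row 0 shows the weighting: ``(2.0 * [2, 4] + 0.5 * [4, 8]) / (2.0 + 0.5) = [2.4, 4.8]``, matching ``output[0, :]`` in the docstring. Switching to ``"sqrtn"`` divides the same weighted sum by ``sqrt(2.0**2 + 0.5**2)`` instead.
    
    
    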
-
- -
-
- -
-
-
-
- - - - - - - \ No newline at end of file diff --git a/review/pr-458/_modules/index.html b/review/pr-458/_modules/index.html deleted file mode 100644 index 017527cb05..0000000000 --- a/review/pr-458/_modules/index.html +++ /dev/null @@ -1,136 +0,0 @@ - - - - - - Overview: module code — Merlin HugeCTR documentation - - - - - - - - - - - - - - - - - - - - - - - -
- - -
- -
-
-
-
    -
  • - -
  • -
  • -
-
-
- - -
-
-
-
- - - - - - - \ No newline at end of file diff --git a/review/pr-458/_static/_sphinx_javascript_frameworks_compat.js b/review/pr-458/_static/_sphinx_javascript_frameworks_compat.js deleted file mode 100644 index 8549469dc2..0000000000 --- a/review/pr-458/_static/_sphinx_javascript_frameworks_compat.js +++ /dev/null @@ -1,134 +0,0 @@ -/* - * _sphinx_javascript_frameworks_compat.js - * ~~~~~~~~~~ - * - * Compatability shim for jQuery and underscores.js. - * - * WILL BE REMOVED IN Sphinx 6.0 - * xref RemovedInSphinx60Warning - * - */ - -/** - * select a different prefix for underscore - */ -$u = _.noConflict(); - - -/** - * small helper function to urldecode strings - * - * See https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/decodeURIComponent#Decoding_query_parameters_from_a_URL - */ -jQuery.urldecode = function(x) { - if (!x) { - return x - } - return decodeURIComponent(x.replace(/\+/g, ' ')); -}; - -/** - * small helper function to urlencode strings - */ -jQuery.urlencode = encodeURIComponent; - -/** - * This function returns the parsed url parameters of the - * current request. Multiple values per key are supported, - * it will always return arrays of strings for the value parts. - */ -jQuery.getQueryParameters = function(s) { - if (typeof s === 'undefined') - s = document.location.search; - var parts = s.substr(s.indexOf('?') + 1).split('&'); - var result = {}; - for (var i = 0; i < parts.length; i++) { - var tmp = parts[i].split('=', 2); - var key = jQuery.urldecode(tmp[0]); - var value = jQuery.urldecode(tmp[1]); - if (key in result) - result[key].push(value); - else - result[key] = [value]; - } - return result; -}; - -/** - * highlight a given string on a jquery object by wrapping it in - * span elements with the given class name. 
- */ -jQuery.fn.highlightText = function(text, className) { - function highlight(node, addItems) { - if (node.nodeType === 3) { - var val = node.nodeValue; - var pos = val.toLowerCase().indexOf(text); - if (pos >= 0 && - !jQuery(node.parentNode).hasClass(className) && - !jQuery(node.parentNode).hasClass("nohighlight")) { - var span; - var isInSVG = jQuery(node).closest("body, svg, foreignObject").is("svg"); - if (isInSVG) { - span = document.createElementNS("http://www.w3.org/2000/svg", "tspan"); - } else { - span = document.createElement("span"); - span.className = className; - } - span.appendChild(document.createTextNode(val.substr(pos, text.length))); - node.parentNode.insertBefore(span, node.parentNode.insertBefore( - document.createTextNode(val.substr(pos + text.length)), - node.nextSibling)); - node.nodeValue = val.substr(0, pos); - if (isInSVG) { - var rect = document.createElementNS("http://www.w3.org/2000/svg", "rect"); - var bbox = node.parentElement.getBBox(); - rect.x.baseVal.value = bbox.x; - rect.y.baseVal.value = bbox.y; - rect.width.baseVal.value = bbox.width; - rect.height.baseVal.value = bbox.height; - rect.setAttribute('class', className); - addItems.push({ - "parent": node.parentNode, - "target": rect}); - } - } - } - else if (!jQuery(node).is("button, select, textarea")) { - jQuery.each(node.childNodes, function() { - highlight(this, addItems); - }); - } - } - var addItems = []; - var result = this.each(function() { - highlight(this, addItems); - }); - for (var i = 0; i < addItems.length; ++i) { - jQuery(addItems[i].parent).before(addItems[i].target); - } - return result; -}; - -/* - * backward compatibility for jQuery.browser - * This will be supported until firefox bug is fixed. 
- */ -if (!jQuery.browser) { - jQuery.uaMatch = function(ua) { - ua = ua.toLowerCase(); - - var match = /(chrome)[ \/]([\w.]+)/.exec(ua) || - /(webkit)[ \/]([\w.]+)/.exec(ua) || - /(opera)(?:.*version|)[ \/]([\w.]+)/.exec(ua) || - /(msie) ([\w.]+)/.exec(ua) || - ua.indexOf("compatible") < 0 && /(mozilla)(?:.*? rv:([\w.]+)|)/.exec(ua) || - []; - - return { - browser: match[ 1 ] || "", - version: match[ 2 ] || "0" - }; - }; - jQuery.browser = {}; - jQuery.browser[jQuery.uaMatch(navigator.userAgent).browser] = true; -} diff --git a/review/pr-458/_static/basic.css b/review/pr-458/_static/basic.css deleted file mode 100644 index 4e9a9f1fac..0000000000 --- a/review/pr-458/_static/basic.css +++ /dev/null @@ -1,900 +0,0 @@ -/* - * basic.css - * ~~~~~~~~~ - * - * Sphinx stylesheet -- basic theme. - * - * :copyright: Copyright 2007-2022 by the Sphinx team, see AUTHORS. - * :license: BSD, see LICENSE for details. - * - */ - -/* -- main layout ----------------------------------------------------------- */ - -div.clearer { - clear: both; -} - -div.section::after { - display: block; - content: ''; - clear: left; -} - -/* -- relbar ---------------------------------------------------------------- */ - -div.related { - width: 100%; - font-size: 90%; -} - -div.related h3 { - display: none; -} - -div.related ul { - margin: 0; - padding: 0 0 0 10px; - list-style: none; -} - -div.related li { - display: inline; -} - -div.related li.right { - float: right; - margin-right: 5px; -} - -/* -- sidebar --------------------------------------------------------------- */ - -div.sphinxsidebarwrapper { - padding: 10px 5px 0 10px; -} - -div.sphinxsidebar { - float: left; - width: 230px; - margin-left: -100%; - font-size: 90%; - word-wrap: break-word; - overflow-wrap : break-word; -} - -div.sphinxsidebar ul { - list-style: none; -} - -div.sphinxsidebar ul ul, -div.sphinxsidebar ul.want-points { - margin-left: 20px; - list-style: square; -} - -div.sphinxsidebar ul ul { - margin-top: 0; - 
margin-bottom: 0; -} - -div.sphinxsidebar form { - margin-top: 10px; -} - -div.sphinxsidebar input { - border: 1px solid #98dbcc; - font-family: sans-serif; - font-size: 1em; -} - -div.sphinxsidebar #searchbox form.search { - overflow: hidden; -} - -div.sphinxsidebar #searchbox input[type="text"] { - float: left; - width: 80%; - padding: 0.25em; - box-sizing: border-box; -} - -div.sphinxsidebar #searchbox input[type="submit"] { - float: left; - width: 20%; - border-left: none; - padding: 0.25em; - box-sizing: border-box; -} - - -img { - border: 0; - max-width: 100%; -} - -/* -- search page ----------------------------------------------------------- */ - -ul.search { - margin: 10px 0 0 20px; - padding: 0; -} - -ul.search li { - padding: 5px 0 5px 20px; - background-image: url(file.png); - background-repeat: no-repeat; - background-position: 0 7px; -} - -ul.search li a { - font-weight: bold; -} - -ul.search li p.context { - color: #888; - margin: 2px 0 0 30px; - text-align: left; -} - -ul.keywordmatches li.goodmatch a { - font-weight: bold; -} - -/* -- index page ------------------------------------------------------------ */ - -table.contentstable { - width: 90%; - margin-left: auto; - margin-right: auto; -} - -table.contentstable p.biglink { - line-height: 150%; -} - -a.biglink { - font-size: 1.3em; -} - -span.linkdescr { - font-style: italic; - padding-top: 5px; - font-size: 90%; -} - -/* -- general index --------------------------------------------------------- */ - -table.indextable { - width: 100%; -} - -table.indextable td { - text-align: left; - vertical-align: top; -} - -table.indextable ul { - margin-top: 0; - margin-bottom: 0; - list-style-type: none; -} - -table.indextable > tbody > tr > td > ul { - padding-left: 0em; -} - -table.indextable tr.pcap { - height: 10px; -} - -table.indextable tr.cap { - margin-top: 10px; - background-color: #f2f2f2; -} - -img.toggler { - margin-right: 3px; - margin-top: 3px; - cursor: pointer; -} - -div.modindex-jumpbox { - 
border-top: 1px solid #ddd; - border-bottom: 1px solid #ddd; - margin: 1em 0 1em 0; - padding: 0.4em; -} - -div.genindex-jumpbox { - border-top: 1px solid #ddd; - border-bottom: 1px solid #ddd; - margin: 1em 0 1em 0; - padding: 0.4em; -} - -/* -- domain module index --------------------------------------------------- */ - -table.modindextable td { - padding: 2px; - border-collapse: collapse; -} - -/* -- general body styles --------------------------------------------------- */ - -div.body { - min-width: 360px; - max-width: 800px; -} - -div.body p, div.body dd, div.body li, div.body blockquote { - -moz-hyphens: auto; - -ms-hyphens: auto; - -webkit-hyphens: auto; - hyphens: auto; -} - -a.headerlink { - visibility: hidden; -} - -h1:hover > a.headerlink, -h2:hover > a.headerlink, -h3:hover > a.headerlink, -h4:hover > a.headerlink, -h5:hover > a.headerlink, -h6:hover > a.headerlink, -dt:hover > a.headerlink, -caption:hover > a.headerlink, -p.caption:hover > a.headerlink, -div.code-block-caption:hover > a.headerlink { - visibility: visible; -} - -div.body p.caption { - text-align: inherit; -} - -div.body td { - text-align: left; -} - -.first { - margin-top: 0 !important; -} - -p.rubric { - margin-top: 30px; - font-weight: bold; -} - -img.align-left, figure.align-left, .figure.align-left, object.align-left { - clear: left; - float: left; - margin-right: 1em; -} - -img.align-right, figure.align-right, .figure.align-right, object.align-right { - clear: right; - float: right; - margin-left: 1em; -} - -img.align-center, figure.align-center, .figure.align-center, object.align-center { - display: block; - margin-left: auto; - margin-right: auto; -} - -img.align-default, figure.align-default, .figure.align-default { - display: block; - margin-left: auto; - margin-right: auto; -} - -.align-left { - text-align: left; -} - -.align-center { - text-align: center; -} - -.align-default { - text-align: center; -} - -.align-right { - text-align: right; -} - -/* -- sidebars 
-------------------------------------------------------------- */ - -div.sidebar, -aside.sidebar { - margin: 0 0 0.5em 1em; - border: 1px solid #ddb; - padding: 7px; - background-color: #ffe; - width: 40%; - float: right; - clear: right; - overflow-x: auto; -} - -p.sidebar-title { - font-weight: bold; -} -nav.contents, -aside.topic, -div.admonition, div.topic, blockquote { - clear: left; -} - -/* -- topics ---------------------------------------------------------------- */ -nav.contents, -aside.topic, -div.topic { - border: 1px solid #ccc; - padding: 7px; - margin: 10px 0 10px 0; -} - -p.topic-title { - font-size: 1.1em; - font-weight: bold; - margin-top: 10px; -} - -/* -- admonitions ----------------------------------------------------------- */ - -div.admonition { - margin-top: 10px; - margin-bottom: 10px; - padding: 7px; -} - -div.admonition dt { - font-weight: bold; -} - -p.admonition-title { - margin: 0px 10px 5px 0px; - font-weight: bold; -} - -div.body p.centered { - text-align: center; - margin-top: 25px; -} - -/* -- content of sidebars/topics/admonitions -------------------------------- */ - -div.sidebar > :last-child, -aside.sidebar > :last-child, -nav.contents > :last-child, -aside.topic > :last-child, -div.topic > :last-child, -div.admonition > :last-child { - margin-bottom: 0; -} - -div.sidebar::after, -aside.sidebar::after, -nav.contents::after, -aside.topic::after, -div.topic::after, -div.admonition::after, -blockquote::after { - display: block; - content: ''; - clear: both; -} - -/* -- tables ---------------------------------------------------------------- */ - -table.docutils { - margin-top: 10px; - margin-bottom: 10px; - border: 0; - border-collapse: collapse; -} - -table.align-center { - margin-left: auto; - margin-right: auto; -} - -table.align-default { - margin-left: auto; - margin-right: auto; -} - -table caption span.caption-number { - font-style: italic; -} - -table caption span.caption-text { -} - -table.docutils td, table.docutils th { - 
padding: 1px 8px 1px 5px; - border-top: 0; - border-left: 0; - border-right: 0; - border-bottom: 1px solid #aaa; -} - -th { - text-align: left; - padding-right: 5px; -} - -table.citation { - border-left: solid 1px gray; - margin-left: 1px; -} - -table.citation td { - border-bottom: none; -} - -th > :first-child, -td > :first-child { - margin-top: 0px; -} - -th > :last-child, -td > :last-child { - margin-bottom: 0px; -} - -/* -- figures --------------------------------------------------------------- */ - -div.figure, figure { - margin: 0.5em; - padding: 0.5em; -} - -div.figure p.caption, figcaption { - padding: 0.3em; -} - -div.figure p.caption span.caption-number, -figcaption span.caption-number { - font-style: italic; -} - -div.figure p.caption span.caption-text, -figcaption span.caption-text { -} - -/* -- field list styles ----------------------------------------------------- */ - -table.field-list td, table.field-list th { - border: 0 !important; -} - -.field-list ul { - margin: 0; - padding-left: 1em; -} - -.field-list p { - margin: 0; -} - -.field-name { - -moz-hyphens: manual; - -ms-hyphens: manual; - -webkit-hyphens: manual; - hyphens: manual; -} - -/* -- hlist styles ---------------------------------------------------------- */ - -table.hlist { - margin: 1em 0; -} - -table.hlist td { - vertical-align: top; -} - -/* -- object description styles --------------------------------------------- */ - -.sig { - font-family: 'Consolas', 'Menlo', 'DejaVu Sans Mono', 'Bitstream Vera Sans Mono', monospace; -} - -.sig-name, code.descname { - background-color: transparent; - font-weight: bold; -} - -.sig-name { - font-size: 1.1em; -} - -code.descname { - font-size: 1.2em; -} - -.sig-prename, code.descclassname { - background-color: transparent; -} - -.optional { - font-size: 1.3em; -} - -.sig-paren { - font-size: larger; -} - -.sig-param.n { - font-style: italic; -} - -/* C++ specific styling */ - -.sig-inline.c-texpr, -.sig-inline.cpp-texpr { - font-family: unset; -} - 
-.sig.c .k, .sig.c .kt, -.sig.cpp .k, .sig.cpp .kt { - color: #0033B3; -} - -.sig.c .m, -.sig.cpp .m { - color: #1750EB; -} - -.sig.c .s, .sig.c .sc, -.sig.cpp .s, .sig.cpp .sc { - color: #067D17; -} - - -/* -- other body styles ----------------------------------------------------- */ - -ol.arabic { - list-style: decimal; -} - -ol.loweralpha { - list-style: lower-alpha; -} - -ol.upperalpha { - list-style: upper-alpha; -} - -ol.lowerroman { - list-style: lower-roman; -} - -ol.upperroman { - list-style: upper-roman; -} - -:not(li) > ol > li:first-child > :first-child, -:not(li) > ul > li:first-child > :first-child { - margin-top: 0px; -} - -:not(li) > ol > li:last-child > :last-child, -:not(li) > ul > li:last-child > :last-child { - margin-bottom: 0px; -} - -ol.simple ol p, -ol.simple ul p, -ul.simple ol p, -ul.simple ul p { - margin-top: 0; -} - -ol.simple > li:not(:first-child) > p, -ul.simple > li:not(:first-child) > p { - margin-top: 0; -} - -ol.simple p, -ul.simple p { - margin-bottom: 0; -} -aside.footnote > span, -div.citation > span { - float: left; -} -aside.footnote > span:last-of-type, -div.citation > span:last-of-type { - padding-right: 0.5em; -} -aside.footnote > p { - margin-left: 2em; -} -div.citation > p { - margin-left: 4em; -} -aside.footnote > p:last-of-type, -div.citation > p:last-of-type { - margin-bottom: 0em; -} -aside.footnote > p:last-of-type:after, -div.citation > p:last-of-type:after { - content: ""; - clear: both; -} - -dl.field-list { - display: grid; - grid-template-columns: fit-content(30%) auto; -} - -dl.field-list > dt { - font-weight: bold; - word-break: break-word; - padding-left: 0.5em; - padding-right: 5px; -} - -dl.field-list > dd { - padding-left: 0.5em; - margin-top: 0em; - margin-left: 0em; - margin-bottom: 0em; -} - -dl { - margin-bottom: 15px; -} - -dd > :first-child { - margin-top: 0px; -} - -dd ul, dd table { - margin-bottom: 10px; -} - -dd { - margin-top: 3px; - margin-bottom: 10px; - margin-left: 30px; -} - -dl > 
dd:last-child, -dl > dd:last-child > :last-child { - margin-bottom: 0; -} - -dt:target, span.highlighted { - background-color: #fbe54e; -} - -rect.highlighted { - fill: #fbe54e; -} - -dl.glossary dt { - font-weight: bold; - font-size: 1.1em; -} - -.versionmodified { - font-style: italic; -} - -.system-message { - background-color: #fda; - padding: 5px; - border: 3px solid red; -} - -.footnote:target { - background-color: #ffa; -} - -.line-block { - display: block; - margin-top: 1em; - margin-bottom: 1em; -} - -.line-block .line-block { - margin-top: 0; - margin-bottom: 0; - margin-left: 1.5em; -} - -.guilabel, .menuselection { - font-family: sans-serif; -} - -.accelerator { - text-decoration: underline; -} - -.classifier { - font-style: oblique; -} - -.classifier:before { - font-style: normal; - margin: 0 0.5em; - content: ":"; - display: inline-block; -} - -abbr, acronym { - border-bottom: dotted 1px; - cursor: help; -} - -/* -- code displays --------------------------------------------------------- */ - -pre { - overflow: auto; - overflow-y: hidden; /* fixes display issues on Chrome browsers */ -} - -pre, div[class*="highlight-"] { - clear: both; -} - -span.pre { - -moz-hyphens: none; - -ms-hyphens: none; - -webkit-hyphens: none; - hyphens: none; - white-space: nowrap; -} - -div[class*="highlight-"] { - margin: 1em 0; -} - -td.linenos pre { - border: 0; - background-color: transparent; - color: #aaa; -} - -table.highlighttable { - display: block; -} - -table.highlighttable tbody { - display: block; -} - -table.highlighttable tr { - display: flex; -} - -table.highlighttable td { - margin: 0; - padding: 0; -} - -table.highlighttable td.linenos { - padding-right: 0.5em; -} - -table.highlighttable td.code { - flex: 1; - overflow: hidden; -} - -.highlight .hll { - display: block; -} - -div.highlight pre, -table.highlighttable pre { - margin: 0; -} - -div.code-block-caption + div { - margin-top: 0; -} - -div.code-block-caption { - margin-top: 1em; - padding: 2px 5px; - 
font-size: small; -} - -div.code-block-caption code { - background-color: transparent; -} - -table.highlighttable td.linenos, -span.linenos, -div.highlight span.gp { /* gp: Generic.Prompt */ - user-select: none; - -webkit-user-select: text; /* Safari fallback only */ - -webkit-user-select: none; /* Chrome/Safari */ - -moz-user-select: none; /* Firefox */ - -ms-user-select: none; /* IE10+ */ -} - -div.code-block-caption span.caption-number { - padding: 0.1em 0.3em; - font-style: italic; -} - -div.code-block-caption span.caption-text { -} - -div.literal-block-wrapper { - margin: 1em 0; -} - -code.xref, a code { - background-color: transparent; - font-weight: bold; -} - -h1 code, h2 code, h3 code, h4 code, h5 code, h6 code { - background-color: transparent; -} - -.viewcode-link { - float: right; -} - -.viewcode-back { - float: right; - font-family: sans-serif; -} - -div.viewcode-block:target { - margin: -1px -10px; - padding: 0 10px; -} - -/* -- math display ---------------------------------------------------------- */ - -img.math { - vertical-align: middle; -} - -div.body div.math p { - text-align: center; -} - -span.eqno { - float: right; -} - -span.eqno a.headerlink { - position: absolute; - z-index: 1; -} - -div.math:hover a.headerlink { - visibility: visible; -} - -/* -- printout stylesheet --------------------------------------------------- */ - -@media print { - div.document, - div.documentwrapper, - div.bodywrapper { - margin: 0 !important; - width: 100%; - } - - div.sphinxsidebar, - div.related, - div.footer, - #top-link { - display: none; - } -} \ No newline at end of file diff --git a/review/pr-458/_static/css/badge_only.css b/review/pr-458/_static/css/badge_only.css deleted file mode 100644 index c718cee441..0000000000 --- a/review/pr-458/_static/css/badge_only.css +++ /dev/null @@ -1 +0,0 @@ 
-.clearfix{*zoom:1}.clearfix:after,.clearfix:before{display:table;content:""}.clearfix:after{clear:both}@font-face{font-family:FontAwesome;font-style:normal;font-weight:400;src:url(fonts/fontawesome-webfont.eot?674f50d287a8c48dc19ba404d20fe713?#iefix) format("embedded-opentype"),url(fonts/fontawesome-webfont.woff2?af7ae505a9eed503f8b8e6982036873e) format("woff2"),url(fonts/fontawesome-webfont.woff?fee66e712a8a08eef5805a46892932ad) format("woff"),url(fonts/fontawesome-webfont.ttf?b06871f281fee6b241d60582ae9369b9) format("truetype"),url(fonts/fontawesome-webfont.svg?912ec66d7572ff821749319396470bde#FontAwesome) format("svg")}.fa:before{font-family:FontAwesome;font-style:normal;font-weight:400;line-height:1}.fa:before,a .fa{text-decoration:inherit}.fa:before,a .fa,li .fa{display:inline-block}li .fa-large:before{width:1.875em}ul.fas{list-style-type:none;margin-left:2em;text-indent:-.8em}ul.fas li .fa{width:.8em}ul.fas li .fa-large:before{vertical-align:baseline}.fa-book:before,.icon-book:before{content:"\f02d"}.fa-caret-down:before,.icon-caret-down:before{content:"\f0d7"}.fa-caret-up:before,.icon-caret-up:before{content:"\f0d8"}.fa-caret-left:before,.icon-caret-left:before{content:"\f0d9"}.fa-caret-right:before,.icon-caret-right:before{content:"\f0da"}.rst-versions{position:fixed;bottom:0;left:0;width:300px;color:#fcfcfc;background:#1f1d1d;font-family:Lato,proxima-nova,Helvetica Neue,Arial,sans-serif;z-index:400}.rst-versions a{color:#2980b9;text-decoration:none}.rst-versions .rst-badge-small{display:none}.rst-versions .rst-current-version{padding:12px;background-color:#272525;display:block;text-align:right;font-size:90%;cursor:pointer;color:#27ae60}.rst-versions .rst-current-version:after{clear:both;content:"";display:block}.rst-versions .rst-current-version .fa{color:#fcfcfc}.rst-versions .rst-current-version .fa-book,.rst-versions .rst-current-version .icon-book{float:left}.rst-versions 
.rst-current-version.rst-out-of-date{background-color:#e74c3c;color:#fff}.rst-versions .rst-current-version.rst-active-old-version{background-color:#f1c40f;color:#000}.rst-versions.shift-up{height:auto;max-height:100%;overflow-y:scroll}.rst-versions.shift-up .rst-other-versions{display:block}.rst-versions .rst-other-versions{font-size:90%;padding:12px;color:grey;display:none}.rst-versions .rst-other-versions hr{display:block;height:1px;border:0;margin:20px 0;padding:0;border-top:1px solid #413d3d}.rst-versions .rst-other-versions dd{display:inline-block;margin:0}.rst-versions .rst-other-versions dd a{display:inline-block;padding:6px;color:#fcfcfc}.rst-versions.rst-badge{width:auto;bottom:20px;right:20px;left:auto;border:none;max-width:300px;max-height:90%}.rst-versions.rst-badge .fa-book,.rst-versions.rst-badge .icon-book{float:none;line-height:30px}.rst-versions.rst-badge.shift-up .rst-current-version{text-align:right}.rst-versions.rst-badge.shift-up .rst-current-version .fa-book,.rst-versions.rst-badge.shift-up .rst-current-version .icon-book{float:left}.rst-versions.rst-badge>.rst-current-version{width:auto;height:30px;line-height:30px;padding:0 6px;display:block;text-align:center}@media screen and (max-width:768px){.rst-versions{width:85%;display:none}.rst-versions.shift{display:block}} \ No newline at end of file diff --git a/review/pr-458/_static/css/custom.css b/review/pr-458/_static/css/custom.css deleted file mode 100644 index 319ddff89a..0000000000 --- a/review/pr-458/_static/css/custom.css +++ /dev/null @@ -1,34 +0,0 @@ -.wy-nav-content { - margin: 0; - background: #fcfcfc; - padding-top: 40px; -} - -.wy-side-nav-search { - display: block; - width: 300px; - padding: .809em; - padding-top: 0.809em; - margin-bottom: .809em; - z-index: 200; - background-color: #2980b9; - text-align: center; - color: #fcfcfc; - padding-top: 40px; -} - -div.banner { - position: fixed; - top: 10px; - left: 20px; - margin: 0; - z-index: 1000; - width: 1050px; - text-align: 
center; -} - -p.banner { - border-radius: 4px; - color: #004831; - background: #76b900; -} \ No newline at end of file diff --git a/review/pr-458/_static/css/fonts/Roboto-Slab-Bold.woff b/review/pr-458/_static/css/fonts/Roboto-Slab-Bold.woff deleted file mode 100644 index 6cb6000018..0000000000 Binary files a/review/pr-458/_static/css/fonts/Roboto-Slab-Bold.woff and /dev/null differ diff --git a/review/pr-458/_static/css/fonts/Roboto-Slab-Bold.woff2 b/review/pr-458/_static/css/fonts/Roboto-Slab-Bold.woff2 deleted file mode 100644 index 7059e23142..0000000000 Binary files a/review/pr-458/_static/css/fonts/Roboto-Slab-Bold.woff2 and /dev/null differ diff --git a/review/pr-458/_static/css/fonts/Roboto-Slab-Regular.woff b/review/pr-458/_static/css/fonts/Roboto-Slab-Regular.woff deleted file mode 100644 index f815f63f99..0000000000 Binary files a/review/pr-458/_static/css/fonts/Roboto-Slab-Regular.woff and /dev/null differ diff --git a/review/pr-458/_static/css/fonts/Roboto-Slab-Regular.woff2 b/review/pr-458/_static/css/fonts/Roboto-Slab-Regular.woff2 deleted file mode 100644 index f2c76e5bda..0000000000 Binary files a/review/pr-458/_static/css/fonts/Roboto-Slab-Regular.woff2 and /dev/null differ diff --git a/review/pr-458/_static/css/fonts/fontawesome-webfont.eot b/review/pr-458/_static/css/fonts/fontawesome-webfont.eot deleted file mode 100644 index e9f60ca953..0000000000 Binary files a/review/pr-458/_static/css/fonts/fontawesome-webfont.eot and /dev/null differ diff --git a/review/pr-458/_static/css/fonts/fontawesome-webfont.svg b/review/pr-458/_static/css/fonts/fontawesome-webfont.svg deleted file mode 100644 index 855c845e53..0000000000 --- a/review/pr-458/_static/css/fonts/fontawesome-webfont.svg +++ /dev/null @@ -1,2671 +0,0 @@ - - - - -Created by FontForge 20120731 at Mon Oct 24 17:37:40 2016 - By ,,, -Copyright Dave Gandy 2016. All rights reserved. 
diff --git a/review/pr-458/_static/css/fonts/fontawesome-webfont.ttf b/review/pr-458/_static/css/fonts/fontawesome-webfont.ttf deleted file mode 100644 index 35acda2fa1..0000000000 Binary files a/review/pr-458/_static/css/fonts/fontawesome-webfont.ttf and /dev/null differ diff --git a/review/pr-458/_static/css/fonts/fontawesome-webfont.woff b/review/pr-458/_static/css/fonts/fontawesome-webfont.woff deleted file mode 100644 index 400014a4b0..0000000000 Binary files a/review/pr-458/_static/css/fonts/fontawesome-webfont.woff and /dev/null differ diff --git
a/review/pr-458/_static/css/fonts/fontawesome-webfont.woff2 b/review/pr-458/_static/css/fonts/fontawesome-webfont.woff2 deleted file mode 100644 index 4d13fc6040..0000000000 Binary files a/review/pr-458/_static/css/fonts/fontawesome-webfont.woff2 and /dev/null differ diff --git a/review/pr-458/_static/css/fonts/lato-bold-italic.woff b/review/pr-458/_static/css/fonts/lato-bold-italic.woff deleted file mode 100644 index 88ad05b9ff..0000000000 Binary files a/review/pr-458/_static/css/fonts/lato-bold-italic.woff and /dev/null differ diff --git a/review/pr-458/_static/css/fonts/lato-bold-italic.woff2 b/review/pr-458/_static/css/fonts/lato-bold-italic.woff2 deleted file mode 100644 index c4e3d804b5..0000000000 Binary files a/review/pr-458/_static/css/fonts/lato-bold-italic.woff2 and /dev/null differ diff --git a/review/pr-458/_static/css/fonts/lato-bold.woff b/review/pr-458/_static/css/fonts/lato-bold.woff deleted file mode 100644 index c6dff51f06..0000000000 Binary files a/review/pr-458/_static/css/fonts/lato-bold.woff and /dev/null differ diff --git a/review/pr-458/_static/css/fonts/lato-bold.woff2 b/review/pr-458/_static/css/fonts/lato-bold.woff2 deleted file mode 100644 index bb195043cf..0000000000 Binary files a/review/pr-458/_static/css/fonts/lato-bold.woff2 and /dev/null differ diff --git a/review/pr-458/_static/css/fonts/lato-normal-italic.woff b/review/pr-458/_static/css/fonts/lato-normal-italic.woff deleted file mode 100644 index 76114bc033..0000000000 Binary files a/review/pr-458/_static/css/fonts/lato-normal-italic.woff and /dev/null differ diff --git a/review/pr-458/_static/css/fonts/lato-normal-italic.woff2 b/review/pr-458/_static/css/fonts/lato-normal-italic.woff2 deleted file mode 100644 index 3404f37e2e..0000000000 Binary files a/review/pr-458/_static/css/fonts/lato-normal-italic.woff2 and /dev/null differ diff --git a/review/pr-458/_static/css/fonts/lato-normal.woff b/review/pr-458/_static/css/fonts/lato-normal.woff deleted file mode 100644 index 
ae1307ff5f..0000000000 Binary files a/review/pr-458/_static/css/fonts/lato-normal.woff and /dev/null differ diff --git a/review/pr-458/_static/css/fonts/lato-normal.woff2 b/review/pr-458/_static/css/fonts/lato-normal.woff2 deleted file mode 100644 index 3bf9843328..0000000000 Binary files a/review/pr-458/_static/css/fonts/lato-normal.woff2 and /dev/null differ diff --git a/review/pr-458/_static/css/theme.css b/review/pr-458/_static/css/theme.css deleted file mode 100644 index 19a446a0e7..0000000000 --- a/review/pr-458/_static/css/theme.css +++ /dev/null @@ -1,4 +0,0 @@ -html{box-sizing:border-box}*,:after,:before{box-sizing:inherit}article,aside,details,figcaption,figure,footer,header,hgroup,nav,section{display:block}audio,canvas,video{display:inline-block;*display:inline;*zoom:1}[hidden],audio:not([controls]){display:none}*{-webkit-box-sizing:border-box;-moz-box-sizing:border-box;box-sizing:border-box}html{font-size:100%;-webkit-text-size-adjust:100%;-ms-text-size-adjust:100%}body{margin:0}a:active,a:hover{outline:0}abbr[title]{border-bottom:1px dotted}b,strong{font-weight:700}blockquote{margin:0}dfn{font-style:italic}ins{background:#ff9;text-decoration:none}ins,mark{color:#000}mark{background:#ff0;font-style:italic;font-weight:700}.rst-content code,.rst-content tt,code,kbd,pre,samp{font-family:monospace,serif;_font-family:courier 
new,monospace;font-size:1em}pre{white-space:pre}q{quotes:none}q:after,q:before{content:"";content:none}small{font-size:85%}sub,sup{font-size:75%;line-height:0;position:relative;vertical-align:baseline}sup{top:-.5em}sub{bottom:-.25em}dl,ol,ul{margin:0;padding:0;list-style:none;list-style-image:none}li{list-style:none}dd{margin:0}img{border:0;-ms-interpolation-mode:bicubic;vertical-align:middle;max-width:100%}svg:not(:root){overflow:hidden}figure,form{margin:0}label{cursor:pointer}button,input,select,textarea{font-size:100%;margin:0;vertical-align:baseline;*vertical-align:middle}button,input{line-height:normal}button,input[type=button],input[type=reset],input[type=submit]{cursor:pointer;-webkit-appearance:button;*overflow:visible}button[disabled],input[disabled]{cursor:default}input[type=search]{-webkit-appearance:textfield;-moz-box-sizing:content-box;-webkit-box-sizing:content-box;box-sizing:content-box}textarea{resize:vertical}table{border-collapse:collapse;border-spacing:0}td{vertical-align:top}.chromeframe{margin:.2em 0;background:#ccc;color:#000;padding:.2em 0}.ir{display:block;border:0;text-indent:-999em;overflow:hidden;background-color:transparent;background-repeat:no-repeat;text-align:left;direction:ltr;*line-height:0}.ir br{display:none}.hidden{display:none!important;visibility:hidden}.visuallyhidden{border:0;clip:rect(0 0 0 0);height:1px;margin:-1px;overflow:hidden;padding:0;position:absolute;width:1px}.visuallyhidden.focusable:active,.visuallyhidden.focusable:focus{clip:auto;height:auto;margin:0;overflow:visible;position:static;width:auto}.invisible{visibility:hidden}.relative{position:relative}big,small{font-size:100%}@media print{body,html,section{background:none!important}*{box-shadow:none!important;text-shadow:none!important;filter:none!important;-ms-filter:none!important}a,a:visited{text-decoration:underline}.ir 
a:after,a[href^="#"]:after,a[href^="javascript:"]:after{content:""}blockquote,pre{page-break-inside:avoid}thead{display:table-header-group}img,tr{page-break-inside:avoid}img{max-width:100%!important}@page{margin:.5cm}.rst-content .toctree-wrapper>p.caption,h2,h3,p{orphans:3;widows:3}.rst-content .toctree-wrapper>p.caption,h2,h3{page-break-after:avoid}}.btn,.fa:before,.icon:before,.rst-content .admonition,.rst-content .admonition-title:before,.rst-content .admonition-todo,.rst-content .attention,.rst-content .caution,.rst-content .code-block-caption .headerlink:before,.rst-content .danger,.rst-content .eqno .headerlink:before,.rst-content .error,.rst-content .hint,.rst-content .important,.rst-content .note,.rst-content .seealso,.rst-content .tip,.rst-content .warning,.rst-content code.download span:first-child:before,.rst-content dl dt .headerlink:before,.rst-content h1 .headerlink:before,.rst-content h2 .headerlink:before,.rst-content h3 .headerlink:before,.rst-content h4 .headerlink:before,.rst-content h5 .headerlink:before,.rst-content h6 .headerlink:before,.rst-content p.caption .headerlink:before,.rst-content p .headerlink:before,.rst-content table>caption .headerlink:before,.rst-content tt.download span:first-child:before,.wy-alert,.wy-dropdown .caret:before,.wy-inline-validate.wy-inline-validate-danger .wy-input-context:before,.wy-inline-validate.wy-inline-validate-info .wy-input-context:before,.wy-inline-validate.wy-inline-validate-success .wy-input-context:before,.wy-inline-validate.wy-inline-validate-warning .wy-input-context:before,.wy-menu-vertical li.current>a button.toctree-expand:before,.wy-menu-vertical li.on a button.toctree-expand:before,.wy-menu-vertical li 
button.toctree-expand:before,input[type=color],input[type=date],input[type=datetime-local],input[type=datetime],input[type=email],input[type=month],input[type=number],input[type=password],input[type=search],input[type=tel],input[type=text],input[type=time],input[type=url],input[type=week],select,textarea{-webkit-font-smoothing:antialiased}.clearfix{*zoom:1}.clearfix:after,.clearfix:before{display:table;content:""}.clearfix:after{clear:both}/*! - * Font Awesome 4.7.0 by @davegandy - http://fontawesome.io - @fontawesome - * License - http://fontawesome.io/license (Font: SIL OFL 1.1, CSS: MIT License) - */@font-face{font-family:FontAwesome;src:url(fonts/fontawesome-webfont.eot?674f50d287a8c48dc19ba404d20fe713);src:url(fonts/fontawesome-webfont.eot?674f50d287a8c48dc19ba404d20fe713?#iefix&v=4.7.0) format("embedded-opentype"),url(fonts/fontawesome-webfont.woff2?af7ae505a9eed503f8b8e6982036873e) format("woff2"),url(fonts/fontawesome-webfont.woff?fee66e712a8a08eef5805a46892932ad) format("woff"),url(fonts/fontawesome-webfont.ttf?b06871f281fee6b241d60582ae9369b9) format("truetype"),url(fonts/fontawesome-webfont.svg?912ec66d7572ff821749319396470bde#fontawesomeregular) format("svg");font-weight:400;font-style:normal}.fa,.icon,.rst-content .admonition-title,.rst-content .code-block-caption .headerlink,.rst-content .eqno .headerlink,.rst-content code.download span:first-child,.rst-content dl dt .headerlink,.rst-content h1 .headerlink,.rst-content h2 .headerlink,.rst-content h3 .headerlink,.rst-content h4 .headerlink,.rst-content h5 .headerlink,.rst-content h6 .headerlink,.rst-content p.caption .headerlink,.rst-content p .headerlink,.rst-content table>caption .headerlink,.rst-content tt.download span:first-child,.wy-menu-vertical li.current>a button.toctree-expand,.wy-menu-vertical li.on a button.toctree-expand,.wy-menu-vertical li button.toctree-expand{display:inline-block;font:normal normal normal 14px/1 
FontAwesome;font-size:inherit;text-rendering:auto;-webkit-font-smoothing:antialiased;-moz-osx-font-smoothing:grayscale}.fa-lg{font-size:1.33333em;line-height:.75em;vertical-align:-15%}.fa-2x{font-size:2em}.fa-3x{font-size:3em}.fa-4x{font-size:4em}.fa-5x{font-size:5em}.fa-fw{width:1.28571em;text-align:center}.fa-ul{padding-left:0;margin-left:2.14286em;list-style-type:none}.fa-ul>li{position:relative}.fa-li{position:absolute;left:-2.14286em;width:2.14286em;top:.14286em;text-align:center}.fa-li.fa-lg{left:-1.85714em}.fa-border{padding:.2em .25em .15em;border:.08em solid #eee;border-radius:.1em}.fa-pull-left{float:left}.fa-pull-right{float:right}.fa-pull-left.icon,.fa.fa-pull-left,.rst-content .code-block-caption .fa-pull-left.headerlink,.rst-content .eqno .fa-pull-left.headerlink,.rst-content .fa-pull-left.admonition-title,.rst-content code.download span.fa-pull-left:first-child,.rst-content dl dt .fa-pull-left.headerlink,.rst-content h1 .fa-pull-left.headerlink,.rst-content h2 .fa-pull-left.headerlink,.rst-content h3 .fa-pull-left.headerlink,.rst-content h4 .fa-pull-left.headerlink,.rst-content h5 .fa-pull-left.headerlink,.rst-content h6 .fa-pull-left.headerlink,.rst-content p .fa-pull-left.headerlink,.rst-content table>caption .fa-pull-left.headerlink,.rst-content tt.download span.fa-pull-left:first-child,.wy-menu-vertical li.current>a button.fa-pull-left.toctree-expand,.wy-menu-vertical li.on a button.fa-pull-left.toctree-expand,.wy-menu-vertical li button.fa-pull-left.toctree-expand{margin-right:.3em}.fa-pull-right.icon,.fa.fa-pull-right,.rst-content .code-block-caption .fa-pull-right.headerlink,.rst-content .eqno .fa-pull-right.headerlink,.rst-content .fa-pull-right.admonition-title,.rst-content code.download span.fa-pull-right:first-child,.rst-content dl dt .fa-pull-right.headerlink,.rst-content h1 .fa-pull-right.headerlink,.rst-content h2 .fa-pull-right.headerlink,.rst-content h3 .fa-pull-right.headerlink,.rst-content h4 .fa-pull-right.headerlink,.rst-content 
h5 .fa-pull-right.headerlink,.rst-content h6 .fa-pull-right.headerlink,.rst-content p .fa-pull-right.headerlink,.rst-content table>caption .fa-pull-right.headerlink,.rst-content tt.download span.fa-pull-right:first-child,.wy-menu-vertical li.current>a button.fa-pull-right.toctree-expand,.wy-menu-vertical li.on a button.fa-pull-right.toctree-expand,.wy-menu-vertical li button.fa-pull-right.toctree-expand{margin-left:.3em}.pull-right{float:right}.pull-left{float:left}.fa.pull-left,.pull-left.icon,.rst-content .code-block-caption .pull-left.headerlink,.rst-content .eqno .pull-left.headerlink,.rst-content .pull-left.admonition-title,.rst-content code.download span.pull-left:first-child,.rst-content dl dt .pull-left.headerlink,.rst-content h1 .pull-left.headerlink,.rst-content h2 .pull-left.headerlink,.rst-content h3 .pull-left.headerlink,.rst-content h4 .pull-left.headerlink,.rst-content h5 .pull-left.headerlink,.rst-content h6 .pull-left.headerlink,.rst-content p .pull-left.headerlink,.rst-content table>caption .pull-left.headerlink,.rst-content tt.download span.pull-left:first-child,.wy-menu-vertical li.current>a button.pull-left.toctree-expand,.wy-menu-vertical li.on a button.pull-left.toctree-expand,.wy-menu-vertical li button.pull-left.toctree-expand{margin-right:.3em}.fa.pull-right,.pull-right.icon,.rst-content .code-block-caption .pull-right.headerlink,.rst-content .eqno .pull-right.headerlink,.rst-content .pull-right.admonition-title,.rst-content code.download span.pull-right:first-child,.rst-content dl dt .pull-right.headerlink,.rst-content h1 .pull-right.headerlink,.rst-content h2 .pull-right.headerlink,.rst-content h3 .pull-right.headerlink,.rst-content h4 .pull-right.headerlink,.rst-content h5 .pull-right.headerlink,.rst-content h6 .pull-right.headerlink,.rst-content p .pull-right.headerlink,.rst-content table>caption .pull-right.headerlink,.rst-content tt.download span.pull-right:first-child,.wy-menu-vertical li.current>a 
button.pull-right.toctree-expand,.wy-menu-vertical li.on a button.pull-right.toctree-expand,.wy-menu-vertical li button.pull-right.toctree-expand{margin-left:.3em}.fa-spin{-webkit-animation:fa-spin 2s linear infinite;animation:fa-spin 2s linear infinite}.fa-pulse{-webkit-animation:fa-spin 1s steps(8) infinite;animation:fa-spin 1s steps(8) infinite}@-webkit-keyframes fa-spin{0%{-webkit-transform:rotate(0deg);transform:rotate(0deg)}to{-webkit-transform:rotate(359deg);transform:rotate(359deg)}}@keyframes fa-spin{0%{-webkit-transform:rotate(0deg);transform:rotate(0deg)}to{-webkit-transform:rotate(359deg);transform:rotate(359deg)}}.fa-rotate-90{-ms-filter:"progid:DXImageTransform.Microsoft.BasicImage(rotation=1)";-webkit-transform:rotate(90deg);-ms-transform:rotate(90deg);transform:rotate(90deg)}.fa-rotate-180{-ms-filter:"progid:DXImageTransform.Microsoft.BasicImage(rotation=2)";-webkit-transform:rotate(180deg);-ms-transform:rotate(180deg);transform:rotate(180deg)}.fa-rotate-270{-ms-filter:"progid:DXImageTransform.Microsoft.BasicImage(rotation=3)";-webkit-transform:rotate(270deg);-ms-transform:rotate(270deg);transform:rotate(270deg)}.fa-flip-horizontal{-ms-filter:"progid:DXImageTransform.Microsoft.BasicImage(rotation=0, mirror=1)";-webkit-transform:scaleX(-1);-ms-transform:scaleX(-1);transform:scaleX(-1)}.fa-flip-vertical{-ms-filter:"progid:DXImageTransform.Microsoft.BasicImage(rotation=2, mirror=1)";-webkit-transform:scaleY(-1);-ms-transform:scaleY(-1);transform:scaleY(-1)}:root .fa-flip-horizontal,:root .fa-flip-vertical,:root .fa-rotate-90,:root .fa-rotate-180,:root 
.fa-rotate-270{filter:none}.fa-stack{position:relative;display:inline-block;width:2em;height:2em;line-height:2em;vertical-align:middle}.fa-stack-1x,.fa-stack-2x{position:absolute;left:0;width:100%;text-align:center}.fa-stack-1x{line-height:inherit}.fa-stack-2x{font-size:2em}.fa-inverse{color:#fff}.fa-glass:before{content:""}.fa-music:before{content:""}.fa-search:before,.icon-search:before{content:""}.fa-envelope-o:before{content:""}.fa-heart:before{content:""}.fa-star:before{content:""}.fa-star-o:before{content:""}.fa-user:before{content:""}.fa-film:before{content:""}.fa-th-large:before{content:""}.fa-th:before{content:""}.fa-th-list:before{content:""}.fa-check:before{content:""}.fa-close:before,.fa-remove:before,.fa-times:before{content:""}.fa-search-plus:before{content:""}.fa-search-minus:before{content:""}.fa-power-off:before{content:""}.fa-signal:before{content:""}.fa-cog:before,.fa-gear:before{content:""}.fa-trash-o:before{content:""}.fa-home:before,.icon-home:before{content:""}.fa-file-o:before{content:""}.fa-clock-o:before{content:""}.fa-road:before{content:""}.fa-download:before,.rst-content code.download span:first-child:before,.rst-content tt.download 
span:first-child:before{content:""}.fa-arrow-circle-o-down:before{content:""}.fa-arrow-circle-o-up:before{content:""}.fa-inbox:before{content:""}.fa-play-circle-o:before{content:""}.fa-repeat:before,.fa-rotate-right:before{content:""}.fa-refresh:before{content:""}.fa-list-alt:before{content:""}.fa-lock:before{content:""}.fa-flag:before{content:""}.fa-headphones:before{content:""}.fa-volume-off:before{content:""}.fa-volume-down:before{content:""}.fa-volume-up:before{content:""}.fa-qrcode:before{content:""}.fa-barcode:before{content:""}.fa-tag:before{content:""}.fa-tags:before{content:""}.fa-book:before,.icon-book:before{content:""}.fa-bookmark:before{content:""}.fa-print:before{content:""}.fa-camera:before{content:""}.fa-font:before{content:""}.fa-bold:before{content:""}.fa-italic:before{content:""}.fa-text-height:before{content:""}.fa-text-width:before{content:""}.fa-align-left:before{content:""}.fa-align-center:before{content:""}.fa-align-right:before{content:""}.fa-align-justify:before{content:""}.fa-list:before{content:""}.fa-dedent:before,.fa-outdent:before{content:""}.fa-indent:before{content:""}.fa-video-camera:before{content:""}.fa-image:before,.fa-photo:before,.fa-picture-o:before{content:""}.fa-pencil:before{content:""}.fa-map-marker:before{content:""}.fa-adjust:before{content:""}.fa-tint:before{content:""}.fa-edit:before,.fa-pencil-square-o:before{content:""}.fa-share-square-o:before{content:""}.fa-check-square-o:before{content:""}.fa-arrows:before{content:""}.fa-step-backward:before{content:""}.fa-fast-backward:before{content:""}.fa-backward:before{content:""}.fa-play:before{content:""}.fa-pause:before{content:""}.fa-stop:before{content:""}.fa-forward:before{content:""}.fa-fast-forward:before{content:""}.fa-step-forward:before{content:""}.fa-eject:before{content:""}.fa-chevron-left:before{content:""}.fa-chevron-right:before{content:""}.fa-plus-circle:before{content:""}.fa-minus-circle:before{content
:""}.fa-times-circle:before,.wy-inline-validate.wy-inline-validate-danger .wy-input-context:before{content:""}.fa-check-circle:before,.wy-inline-validate.wy-inline-validate-success .wy-input-context:before{content:""}.fa-question-circle:before{content:""}.fa-info-circle:before{content:""}.fa-crosshairs:before{content:""}.fa-times-circle-o:before{content:""}.fa-check-circle-o:before{content:""}.fa-ban:before{content:""}.fa-arrow-left:before{content:""}.fa-arrow-right:before{content:""}.fa-arrow-up:before{content:""}.fa-arrow-down:before{content:""}.fa-mail-forward:before,.fa-share:before{content:""}.fa-expand:before{content:""}.fa-compress:before{content:""}.fa-plus:before{content:""}.fa-minus:before{content:""}.fa-asterisk:before{content:""}.fa-exclamation-circle:before,.rst-content .admonition-title:before,.wy-inline-validate.wy-inline-validate-info .wy-input-context:before,.wy-inline-validate.wy-inline-validate-warning .wy-input-context:before{content:""}.fa-gift:before{content:""}.fa-leaf:before{content:""}.fa-fire:before,.icon-fire:before{content:""}.fa-eye:before{content:""}.fa-eye-slash:before{content:""}.fa-exclamation-triangle:before,.fa-warning:before{content:""}.fa-plane:before{content:""}.fa-calendar:before{content:""}.fa-random:before{content:""}.fa-comment:before{content:""}.fa-magnet:before{content:""}.fa-chevron-up:before{content:""}.fa-chevron-down:before{content:""}.fa-retweet:before{content:""}.fa-shopping-cart:before{content:""}.fa-folder:before{content:""}.fa-folder-open:before{content:""}.fa-arrows-v:before{content:""}.fa-arrows-h:before{content:""}.fa-bar-chart-o:before,.fa-bar-chart:before{content:""}.fa-twitter-square:before{content:""}.fa-facebook-square:before{content:""}.fa-camera-retro:before{content:""}.fa-key:before{content:""}.fa-cogs:before,.fa-gears:before{content:""}.fa-comments:before{content:""}.fa-thumbs-o-up:before{content:""}.fa-thumbs-o-down:before{content:""}.fa-star-half:before
{content:""}.fa-heart-o:before{content:""}.fa-sign-out:before{content:""}.fa-linkedin-square:before{content:""}.fa-thumb-tack:before{content:""}.fa-external-link:before{content:""}.fa-sign-in:before{content:""}.fa-trophy:before{content:""}.fa-github-square:before{content:""}.fa-upload:before{content:""}.fa-lemon-o:before{content:""}.fa-phone:before{content:""}.fa-square-o:before{content:""}.fa-bookmark-o:before{content:""}.fa-phone-square:before{content:""}.fa-twitter:before{content:""}.fa-facebook-f:before,.fa-facebook:before{content:""}.fa-github:before,.icon-github:before{content:""}.fa-unlock:before{content:""}.fa-credit-card:before{content:""}.fa-feed:before,.fa-rss:before{content:""}.fa-hdd-o:before{content:""}.fa-bullhorn:before{content:""}.fa-bell:before{content:""}.fa-certificate:before{content:""}.fa-hand-o-right:before{content:""}.fa-hand-o-left:before{content:""}.fa-hand-o-up:before{content:""}.fa-hand-o-down:before{content:""}.fa-arrow-circle-left:before,.icon-circle-arrow-left:before{content:""}.fa-arrow-circle-right:before,.icon-circle-arrow-right:before{content:""}.fa-arrow-circle-up:before{content:""}.fa-arrow-circle-down:before{content:""}.fa-globe:before{content:""}.fa-wrench:before{content:""}.fa-tasks:before{content:""}.fa-filter:before{content:""}.fa-briefcase:before{content:""}.fa-arrows-alt:before{content:""}.fa-group:before,.fa-users:before{content:""}.fa-chain:before,.fa-link:before,.icon-link:before{content:""}.fa-cloud:before{content:""}.fa-flask:before{content:""}.fa-cut:before,.fa-scissors:before{content:""}.fa-copy:before,.fa-files-o:before{content:""}.fa-paperclip:before{content:""}.fa-floppy-o:before,.fa-save:before{content:""}.fa-square:before{content:""}.fa-bars:before,.fa-navicon:before,.fa-reorder:before{content:""}.fa-list-ul:before{content:""}.fa-list-ol:before{content:""}.fa-strikethrough:before{content:""}.fa-underline:before{content:""}.fa-table:before{content:""}.fa-magi
c:before{content:""}.fa-truck:before{content:""}.fa-pinterest:before{content:""}.fa-pinterest-square:before{content:""}.fa-google-plus-square:before{content:""}.fa-google-plus:before{content:""}.fa-money:before{content:""}.fa-caret-down:before,.icon-caret-down:before,.wy-dropdown .caret:before{content:""}.fa-caret-up:before{content:""}.fa-caret-left:before{content:""}.fa-caret-right:before{content:""}.fa-columns:before{content:""}.fa-sort:before,.fa-unsorted:before{content:""}.fa-sort-desc:before,.fa-sort-down:before{content:""}.fa-sort-asc:before,.fa-sort-up:before{content:""}.fa-envelope:before{content:""}.fa-linkedin:before{content:""}.fa-rotate-left:before,.fa-undo:before{content:""}.fa-gavel:before,.fa-legal:before{content:""}.fa-dashboard:before,.fa-tachometer:before{content:""}.fa-comment-o:before{content:""}.fa-comments-o:before{content:""}.fa-bolt:before,.fa-flash:before{content:""}.fa-sitemap:before{content:""}.fa-umbrella:before{content:""}.fa-clipboard:before,.fa-paste:before{content:""}.fa-lightbulb-o:before{content:""}.fa-exchange:before{content:""}.fa-cloud-download:before{content:""}.fa-cloud-upload:before{content:""}.fa-user-md:before{content:""}.fa-stethoscope:before{content:""}.fa-suitcase:before{content:""}.fa-bell-o:before{content:""}.fa-coffee:before{content:""}.fa-cutlery:before{content:""}.fa-file-text-o:before{content:""}.fa-building-o:before{content:""}.fa-hospital-o:before{content:""}.fa-ambulance:before{content:""}.fa-medkit:before{content:""}.fa-fighter-jet:before{content:""}.fa-beer:before{content:""}.fa-h-square:before{content:""}.fa-plus-square:before{content:""}.fa-angle-double-left:before{content:""}.fa-angle-double-right:before{content:""}.fa-angle-double-up:before{content:""}.fa-angle-double-down:before{content:""}.fa-angle-left:before{content:""}.fa-angle-right:before{content:""}.fa-angle-up:before{content:""}.fa-angle-down:before{content:""}.fa-desktop:before{content:""}.fa-l
aptop:before{content:""}.fa-tablet:before{content:""}.fa-mobile-phone:before,.fa-mobile:before{content:""}.fa-circle-o:before{content:""}.fa-quote-left:before{content:""}.fa-quote-right:before{content:""}.fa-spinner:before{content:""}.fa-circle:before{content:""}.fa-mail-reply:before,.fa-reply:before{content:""}.fa-github-alt:before{content:""}.fa-folder-o:before{content:""}.fa-folder-open-o:before{content:""}.fa-smile-o:before{content:""}.fa-frown-o:before{content:""}.fa-meh-o:before{content:""}.fa-gamepad:before{content:""}.fa-keyboard-o:before{content:""}.fa-flag-o:before{content:""}.fa-flag-checkered:before{content:""}.fa-terminal:before{content:""}.fa-code:before{content:""}.fa-mail-reply-all:before,.fa-reply-all:before{content:""}.fa-star-half-empty:before,.fa-star-half-full:before,.fa-star-half-o:before{content:""}.fa-location-arrow:before{content:""}.fa-crop:before{content:""}.fa-code-fork:before{content:""}.fa-chain-broken:before,.fa-unlink:before{content:""}.fa-question:before{content:""}.fa-info:before{content:""}.fa-exclamation:before{content:""}.fa-superscript:before{content:""}.fa-subscript:before{content:""}.fa-eraser:before{content:""}.fa-puzzle-piece:before{content:""}.fa-microphone:before{content:""}.fa-microphone-slash:before{content:""}.fa-shield:before{content:""}.fa-calendar-o:before{content:""}.fa-fire-extinguisher:before{content:""}.fa-rocket:before{content:""}.fa-maxcdn:before{content:""}.fa-chevron-circle-left:before{content:""}.fa-chevron-circle-right:before{content:""}.fa-chevron-circle-up:before{content:""}.fa-chevron-circle-down:before{content:""}.fa-html5:before{content:""}.fa-css3:before{content:""}.fa-anchor:before{content:""}.fa-unlock-alt:before{content:""}.fa-bullseye:before{content:""}.fa-ellipsis-h:before{content:""}.fa-ellipsis-v:before{content:""}.fa-rss-square:before{content:""}.fa-play-circle:before{content:""}.fa-ticket:before{content:""}.fa-minus-square:before{content:
""}.fa-minus-square-o:before,.wy-menu-vertical li.current>a button.toctree-expand:before,.wy-menu-vertical li.on a button.toctree-expand:before{content:""}.fa-level-up:before{content:""}.fa-level-down:before{content:""}.fa-check-square:before{content:""}.fa-pencil-square:before{content:""}.fa-external-link-square:before{content:""}.fa-share-square:before{content:""}.fa-compass:before{content:""}.fa-caret-square-o-down:before,.fa-toggle-down:before{content:""}.fa-caret-square-o-up:before,.fa-toggle-up:before{content:""}.fa-caret-square-o-right:before,.fa-toggle-right:before{content:""}.fa-eur:before,.fa-euro:before{content:""}.fa-gbp:before{content:""}.fa-dollar:before,.fa-usd:before{content:""}.fa-inr:before,.fa-rupee:before{content:""}.fa-cny:before,.fa-jpy:before,.fa-rmb:before,.fa-yen:before{content:""}.fa-rouble:before,.fa-rub:before,.fa-ruble:before{content:""}.fa-krw:before,.fa-won:before{content:""}.fa-bitcoin:before,.fa-btc:before{content:""}.fa-file:before{content:""}.fa-file-text:before{content:""}.fa-sort-alpha-asc:before{content:""}.fa-sort-alpha-desc:before{content:""}.fa-sort-amount-asc:before{content:""}.fa-sort-amount-desc:before{content:""}.fa-sort-numeric-asc:before{content:""}.fa-sort-numeric-desc:before{content:""}.fa-thumbs-up:before{content:""}.fa-thumbs-down:before{content:""}.fa-youtube-square:before{content:""}.fa-youtube:before{content:""}.fa-xing:before{content:""}.fa-xing-square:before{content:""}.fa-youtube-play:before{content:""}.fa-dropbox:before{content:""}.fa-stack-overflow:before{content:""}.fa-instagram:before{content:""}.fa-flickr:before{content:""}.fa-adn:before{content:""}.fa-bitbucket:before,.icon-bitbucket:before{content:""}.fa-bitbucket-square:before{content:""}.fa-tumblr:before{content:""}.fa-tumblr-square:before{content:""}.fa-long-arrow-down:before{content:""}.fa-long-arrow-up:before{content:""}.fa-long-arrow-left:before{content:""}.fa-long-arrow-right:before{content:""}.fa-a
pple:before{content:""}.fa-windows:before{content:""}.fa-android:before{content:""}.fa-linux:before{content:""}.fa-dribbble:before{content:""}.fa-skype:before{content:""}.fa-foursquare:before{content:""}.fa-trello:before{content:""}.fa-female:before{content:""}.fa-male:before{content:""}.fa-gittip:before,.fa-gratipay:before{content:""}.fa-sun-o:before{content:""}.fa-moon-o:before{content:""}.fa-archive:before{content:""}.fa-bug:before{content:""}.fa-vk:before{content:""}.fa-weibo:before{content:""}.fa-renren:before{content:""}.fa-pagelines:before{content:""}.fa-stack-exchange:before{content:""}.fa-arrow-circle-o-right:before{content:""}.fa-arrow-circle-o-left:before{content:""}.fa-caret-square-o-left:before,.fa-toggle-left:before{content:""}.fa-dot-circle-o:before{content:""}.fa-wheelchair:before{content:""}.fa-vimeo-square:before{content:""}.fa-try:before,.fa-turkish-lira:before{content:""}.fa-plus-square-o:before,.wy-menu-vertical li button.toctree-expand:before{content:""}.fa-space-shuttle:before{content:""}.fa-slack:before{content:""}.fa-envelope-square:before{content:""}.fa-wordpress:before{content:""}.fa-openid:before{content:""}.fa-bank:before,.fa-institution:before,.fa-university:before{content:""}.fa-graduation-cap:before,.fa-mortar-board:before{content:""}.fa-yahoo:before{content:""}.fa-google:before{content:""}.fa-reddit:before{content:""}.fa-reddit-square:before{content:""}.fa-stumbleupon-circle:before{content:""}.fa-stumbleupon:before{content:""}.fa-delicious:before{content:""}.fa-digg:before{content:""}.fa-pied-piper-pp:before{content:""}.fa-pied-piper-alt:before{content:""}.fa-drupal:before{content:""}.fa-joomla:before{content:""}.fa-language:before{content:""}.fa-fax:before{content:""}.fa-building:before{content:""}.fa-child:before{content:""}.fa-paw:before{content:""}.fa-spoon:before{content:""}.fa-cube:before{content:""}.fa-cubes:before{content:""}.fa-behance:before{content:""}.fa-behance-squa
re:before{content:""}.fa-steam:before{content:""}.fa-steam-square:before{content:""}.fa-recycle:before{content:""}.fa-automobile:before,.fa-car:before{content:""}.fa-cab:before,.fa-taxi:before{content:""}.fa-tree:before{content:""}.fa-spotify:before{content:""}.fa-deviantart:before{content:""}.fa-soundcloud:before{content:""}.fa-database:before{content:""}.fa-file-pdf-o:before{content:""}.fa-file-word-o:before{content:""}.fa-file-excel-o:before{content:""}.fa-file-powerpoint-o:before{content:""}.fa-file-image-o:before,.fa-file-photo-o:before,.fa-file-picture-o:before{content:""}.fa-file-archive-o:before,.fa-file-zip-o:before{content:""}.fa-file-audio-o:before,.fa-file-sound-o:before{content:""}.fa-file-movie-o:before,.fa-file-video-o:before{content:""}.fa-file-code-o:before{content:""}.fa-vine:before{content:""}.fa-codepen:before{content:""}.fa-jsfiddle:before{content:""}.fa-life-bouy:before,.fa-life-buoy:before,.fa-life-ring:before,.fa-life-saver:before,.fa-support:before{content:""}.fa-circle-o-notch:before{content:""}.fa-ra:before,.fa-rebel:before,.fa-resistance:before{content:""}.fa-empire:before,.fa-ge:before{content:""}.fa-git-square:before{content:""}.fa-git:before{content:""}.fa-hacker-news:before,.fa-y-combinator-square:before,.fa-yc-square:before{content:""}.fa-tencent-weibo:before{content:""}.fa-qq:before{content:""}.fa-wechat:before,.fa-weixin:before{content:""}.fa-paper-plane:before,.fa-send:before{content:""}.fa-paper-plane-o:before,.fa-send-o:before{content:""}.fa-history:before{content:""}.fa-circle-thin:before{content:""}.fa-header:before{content:""}.fa-paragraph:before{content:""}.fa-sliders:before{content:""}.fa-share-alt:before{content:""}.fa-share-alt-square:before{content:""}.fa-bomb:before{content:""}.fa-futbol-o:before,.fa-soccer-ball-o:before{content:""}.fa-tty:before{content:""}.fa-binoculars:before{content:""}.fa-plug:before{content:""}.fa-slideshare:before{content:""}.fa-twitch:before{conten
t:""}.fa-yelp:before{content:""}.fa-newspaper-o:before{content:""}.fa-wifi:before{content:""}.fa-calculator:before{content:""}.fa-paypal:before{content:""}.fa-google-wallet:before{content:""}.fa-cc-visa:before{content:""}.fa-cc-mastercard:before{content:""}.fa-cc-discover:before{content:""}.fa-cc-amex:before{content:""}.fa-cc-paypal:before{content:""}.fa-cc-stripe:before{content:""}.fa-bell-slash:before{content:""}.fa-bell-slash-o:before{content:""}.fa-trash:before{content:""}.fa-copyright:before{content:""}.fa-at:before{content:""}.fa-eyedropper:before{content:""}.fa-paint-brush:before{content:""}.fa-birthday-cake:before{content:""}.fa-area-chart:before{content:""}.fa-pie-chart:before{content:""}.fa-line-chart:before{content:""}.fa-lastfm:before{content:""}.fa-lastfm-square:before{content:""}.fa-toggle-off:before{content:""}.fa-toggle-on:before{content:""}.fa-bicycle:before{content:""}.fa-bus:before{content:""}.fa-ioxhost:before{content:""}.fa-angellist:before{content:""}.fa-cc:before{content:""}.fa-ils:before,.fa-shekel:before,.fa-sheqel:before{content:""}.fa-meanpath:before{content:""}.fa-buysellads:before{content:""}.fa-connectdevelop:before{content:""}.fa-dashcube:before{content:""}.fa-forumbee:before{content:""}.fa-leanpub:before{content:""}.fa-sellsy:before{content:""}.fa-shirtsinbulk:before{content:""}.fa-simplybuilt:before{content:""}.fa-skyatlas:before{content:""}.fa-cart-plus:before{content:""}.fa-cart-arrow-down:before{content:""}.fa-diamond:before{content:""}.fa-ship:before{content:""}.fa-user-secret:before{content:""}.fa-motorcycle:before{content:""}.fa-street-view:before{content:""}.fa-heartbeat:before{content:""}.fa-venus:before{content:""}.fa-mars:before{content:""}.fa-mercury:before{content:""}.fa-intersex:before,.fa-transgender:before{content:""}.fa-transgender-alt:before{content:""}.fa-venus-double:before{content:""}.fa-mars-double:before{content:""}.fa-venus-mars:before{content:""}.fa-m
ars-stroke:before{content:""}.fa-mars-stroke-v:before{content:""}.fa-mars-stroke-h:before{content:""}.fa-neuter:before{content:""}.fa-genderless:before{content:""}.fa-facebook-official:before{content:""}.fa-pinterest-p:before{content:""}.fa-whatsapp:before{content:""}.fa-server:before{content:""}.fa-user-plus:before{content:""}.fa-user-times:before{content:""}.fa-bed:before,.fa-hotel:before{content:""}.fa-viacoin:before{content:""}.fa-train:before{content:""}.fa-subway:before{content:""}.fa-medium:before{content:""}.fa-y-combinator:before,.fa-yc:before{content:""}.fa-optin-monster:before{content:""}.fa-opencart:before{content:""}.fa-expeditedssl:before{content:""}.fa-battery-4:before,.fa-battery-full:before,.fa-battery:before{content:""}.fa-battery-3:before,.fa-battery-three-quarters:before{content:""}.fa-battery-2:before,.fa-battery-half:before{content:""}.fa-battery-1:before,.fa-battery-quarter:before{content:""}.fa-battery-0:before,.fa-battery-empty:before{content:""}.fa-mouse-pointer:before{content:""}.fa-i-cursor:before{content:""}.fa-object-group:before{content:""}.fa-object-ungroup:before{content:""}.fa-sticky-note:before{content:""}.fa-sticky-note-o:before{content:""}.fa-cc-jcb:before{content:""}.fa-cc-diners-club:before{content:""}.fa-clone:before{content:""}.fa-balance-scale:before{content:""}.fa-hourglass-o:before{content:""}.fa-hourglass-1:before,.fa-hourglass-start:before{content:""}.fa-hourglass-2:before,.fa-hourglass-half:before{content:""}.fa-hourglass-3:before,.fa-hourglass-end:before{content:""}.fa-hourglass:before{content:""}.fa-hand-grab-o:before,.fa-hand-rock-o:before{content:""}.fa-hand-paper-o:before,.fa-hand-stop-o:before{content:""}.fa-hand-scissors-o:before{content:""}.fa-hand-lizard-o:before{content:""}.fa-hand-spock-o:before{content:""}.fa-hand-pointer-o:before{content:""}.fa-hand-peace-o:before{content:""}.fa-trademark:before{content:""}.fa-registered:before{content:""}.fa-creative-commons
:before{content:""}.fa-gg:before{content:""}.fa-gg-circle:before{content:""}.fa-tripadvisor:before{content:""}.fa-odnoklassniki:before{content:""}.fa-odnoklassniki-square:before{content:""}.fa-get-pocket:before{content:""}.fa-wikipedia-w:before{content:""}.fa-safari:before{content:""}.fa-chrome:before{content:""}.fa-firefox:before{content:""}.fa-opera:before{content:""}.fa-internet-explorer:before{content:""}.fa-television:before,.fa-tv:before{content:""}.fa-contao:before{content:""}.fa-500px:before{content:""}.fa-amazon:before{content:""}.fa-calendar-plus-o:before{content:""}.fa-calendar-minus-o:before{content:""}.fa-calendar-times-o:before{content:""}.fa-calendar-check-o:before{content:""}.fa-industry:before{content:""}.fa-map-pin:before{content:""}.fa-map-signs:before{content:""}.fa-map-o:before{content:""}.fa-map:before{content:""}.fa-commenting:before{content:""}.fa-commenting-o:before{content:""}.fa-houzz:before{content:""}.fa-vimeo:before{content:""}.fa-black-tie:before{content:""}.fa-fonticons:before{content:""}.fa-reddit-alien:before{content:""}.fa-edge:before{content:""}.fa-credit-card-alt:before{content:""}.fa-codiepie:before{content:""}.fa-modx:before{content:""}.fa-fort-awesome:before{content:""}.fa-usb:before{content:""}.fa-product-hunt:before{content:""}.fa-mixcloud:before{content:""}.fa-scribd:before{content:""}.fa-pause-circle:before{content:""}.fa-pause-circle-o:before{content:""}.fa-stop-circle:before{content:""}.fa-stop-circle-o:before{content:""}.fa-shopping-bag:before{content:""}.fa-shopping-basket:before{content:""}.fa-hashtag:before{content:""}.fa-bluetooth:before{content:""}.fa-bluetooth-b:before{content:""}.fa-percent:before{content:""}.fa-gitlab:before,.icon-gitlab:before{content:""}.fa-wpbeginner:before{content:""}.fa-wpforms:before{content:""}.fa-envira:before{content:""}.fa-universal-access:before{content:""}.fa-wheelchair-alt:before{content:""}.fa-question-circle-o:before{conten
t:""}.fa-blind:before{content:""}.fa-audio-description:before{content:""}.fa-volume-control-phone:before{content:""}.fa-braille:before{content:""}.fa-assistive-listening-systems:before{content:""}.fa-american-sign-language-interpreting:before,.fa-asl-interpreting:before{content:""}.fa-deaf:before,.fa-deafness:before,.fa-hard-of-hearing:before{content:""}.fa-glide:before{content:""}.fa-glide-g:before{content:""}.fa-sign-language:before,.fa-signing:before{content:""}.fa-low-vision:before{content:""}.fa-viadeo:before{content:""}.fa-viadeo-square:before{content:""}.fa-snapchat:before{content:""}.fa-snapchat-ghost:before{content:""}.fa-snapchat-square:before{content:""}.fa-pied-piper:before{content:""}.fa-first-order:before{content:""}.fa-yoast:before{content:""}.fa-themeisle:before{content:""}.fa-google-plus-circle:before,.fa-google-plus-official:before{content:""}.fa-fa:before,.fa-font-awesome:before{content:""}.fa-handshake-o:before{content:""}.fa-envelope-open:before{content:""}.fa-envelope-open-o:before{content:""}.fa-linode:before{content:""}.fa-address-book:before{content:""}.fa-address-book-o:before{content:""}.fa-address-card:before,.fa-vcard:before{content:""}.fa-address-card-o:before,.fa-vcard-o:before{content:""}.fa-user-circle:before{content:""}.fa-user-circle-o:before{content:""}.fa-user-o:before{content:""}.fa-id-badge:before{content:""}.fa-drivers-license:before,.fa-id-card:before{content:""}.fa-drivers-license-o:before,.fa-id-card-o:before{content:""}.fa-quora:before{content:""}.fa-free-code-camp:before{content:""}.fa-telegram:before{content:""}.fa-thermometer-4:before,.fa-thermometer-full:before,.fa-thermometer:before{content:""}.fa-thermometer-3:before,.fa-thermometer-three-quarters:before{content:""}.fa-thermometer-2:before,.fa-thermometer-half:before{content:""}.fa-thermometer-1:before,.fa-thermometer-quarter:before{content:""}.fa-thermometer-0:before,.fa-thermometer-empty:before{content:""}.fa-shower:befo
re{content:""}.fa-bath:before,.fa-bathtub:before,.fa-s15:before{content:""}.fa-podcast:before{content:""}.fa-window-maximize:before{content:""}.fa-window-minimize:before{content:""}.fa-window-restore:before{content:""}.fa-times-rectangle:before,.fa-window-close:before{content:""}.fa-times-rectangle-o:before,.fa-window-close-o:before{content:""}.fa-bandcamp:before{content:""}.fa-grav:before{content:""}.fa-etsy:before{content:""}.fa-imdb:before{content:""}.fa-ravelry:before{content:""}.fa-eercast:before{content:""}.fa-microchip:before{content:""}.fa-snowflake-o:before{content:""}.fa-superpowers:before{content:""}.fa-wpexplorer:before{content:""}.fa-meetup:before{content:""}.sr-only{position:absolute;width:1px;height:1px;padding:0;margin:-1px;overflow:hidden;clip:rect(0,0,0,0);border:0}.sr-only-focusable:active,.sr-only-focusable:focus{position:static;width:auto;height:auto;margin:0;overflow:visible;clip:auto}.fa,.icon,.rst-content .admonition-title,.rst-content .code-block-caption .headerlink,.rst-content .eqno .headerlink,.rst-content code.download span:first-child,.rst-content dl dt .headerlink,.rst-content h1 .headerlink,.rst-content h2 .headerlink,.rst-content h3 .headerlink,.rst-content h4 .headerlink,.rst-content h5 .headerlink,.rst-content h6 .headerlink,.rst-content p.caption .headerlink,.rst-content p .headerlink,.rst-content table>caption .headerlink,.rst-content tt.download span:first-child,.wy-dropdown .caret,.wy-inline-validate.wy-inline-validate-danger .wy-input-context,.wy-inline-validate.wy-inline-validate-info .wy-input-context,.wy-inline-validate.wy-inline-validate-success .wy-input-context,.wy-inline-validate.wy-inline-validate-warning .wy-input-context,.wy-menu-vertical li.current>a button.toctree-expand,.wy-menu-vertical li.on a button.toctree-expand,.wy-menu-vertical li button.toctree-expand{font-family:inherit}.fa:before,.icon:before,.rst-content .admonition-title:before,.rst-content .code-block-caption 
.headerlink:before,.rst-content .eqno .headerlink:before,.rst-content code.download span:first-child:before,.rst-content dl dt .headerlink:before,.rst-content h1 .headerlink:before,.rst-content h2 .headerlink:before,.rst-content h3 .headerlink:before,.rst-content h4 .headerlink:before,.rst-content h5 .headerlink:before,.rst-content h6 .headerlink:before,.rst-content p.caption .headerlink:before,.rst-content p .headerlink:before,.rst-content table>caption .headerlink:before,.rst-content tt.download span:first-child:before,.wy-dropdown .caret:before,.wy-inline-validate.wy-inline-validate-danger .wy-input-context:before,.wy-inline-validate.wy-inline-validate-info .wy-input-context:before,.wy-inline-validate.wy-inline-validate-success .wy-input-context:before,.wy-inline-validate.wy-inline-validate-warning .wy-input-context:before,.wy-menu-vertical li.current>a button.toctree-expand:before,.wy-menu-vertical li.on a button.toctree-expand:before,.wy-menu-vertical li button.toctree-expand:before{font-family:FontAwesome;display:inline-block;font-style:normal;font-weight:400;line-height:1;text-decoration:inherit}.rst-content .code-block-caption a .headerlink,.rst-content .eqno a .headerlink,.rst-content a .admonition-title,.rst-content code.download a span:first-child,.rst-content dl dt a .headerlink,.rst-content h1 a .headerlink,.rst-content h2 a .headerlink,.rst-content h3 a .headerlink,.rst-content h4 a .headerlink,.rst-content h5 a .headerlink,.rst-content h6 a .headerlink,.rst-content p.caption a .headerlink,.rst-content p a .headerlink,.rst-content table>caption a .headerlink,.rst-content tt.download a span:first-child,.wy-menu-vertical li.current>a button.toctree-expand,.wy-menu-vertical li.on a button.toctree-expand,.wy-menu-vertical li a button.toctree-expand,a .fa,a .icon,a .rst-content .admonition-title,a .rst-content .code-block-caption .headerlink,a .rst-content .eqno .headerlink,a .rst-content code.download span:first-child,a .rst-content dl dt .headerlink,a 
.rst-content h1 .headerlink,a .rst-content h2 .headerlink,a .rst-content h3 .headerlink,a .rst-content h4 .headerlink,a .rst-content h5 .headerlink,a .rst-content h6 .headerlink,a .rst-content p.caption .headerlink,a .rst-content p .headerlink,a .rst-content table>caption .headerlink,a .rst-content tt.download span:first-child,a .wy-menu-vertical li button.toctree-expand{display:inline-block;text-decoration:inherit}.btn .fa,.btn .icon,.btn .rst-content .admonition-title,.btn .rst-content .code-block-caption .headerlink,.btn .rst-content .eqno .headerlink,.btn .rst-content code.download span:first-child,.btn .rst-content dl dt .headerlink,.btn .rst-content h1 .headerlink,.btn .rst-content h2 .headerlink,.btn .rst-content h3 .headerlink,.btn .rst-content h4 .headerlink,.btn .rst-content h5 .headerlink,.btn .rst-content h6 .headerlink,.btn .rst-content p .headerlink,.btn .rst-content table>caption .headerlink,.btn .rst-content tt.download span:first-child,.btn .wy-menu-vertical li.current>a button.toctree-expand,.btn .wy-menu-vertical li.on a button.toctree-expand,.btn .wy-menu-vertical li button.toctree-expand,.nav .fa,.nav .icon,.nav .rst-content .admonition-title,.nav .rst-content .code-block-caption .headerlink,.nav .rst-content .eqno .headerlink,.nav .rst-content code.download span:first-child,.nav .rst-content dl dt .headerlink,.nav .rst-content h1 .headerlink,.nav .rst-content h2 .headerlink,.nav .rst-content h3 .headerlink,.nav .rst-content h4 .headerlink,.nav .rst-content h5 .headerlink,.nav .rst-content h6 .headerlink,.nav .rst-content p .headerlink,.nav .rst-content table>caption .headerlink,.nav .rst-content tt.download span:first-child,.nav .wy-menu-vertical li.current>a button.toctree-expand,.nav .wy-menu-vertical li.on a button.toctree-expand,.nav .wy-menu-vertical li button.toctree-expand,.rst-content .btn .admonition-title,.rst-content .code-block-caption .btn .headerlink,.rst-content .code-block-caption .nav .headerlink,.rst-content .eqno .btn 
.headerlink,.rst-content .eqno .nav .headerlink,.rst-content .nav .admonition-title,.rst-content code.download .btn span:first-child,.rst-content code.download .nav span:first-child,.rst-content dl dt .btn .headerlink,.rst-content dl dt .nav .headerlink,.rst-content h1 .btn .headerlink,.rst-content h1 .nav .headerlink,.rst-content h2 .btn .headerlink,.rst-content h2 .nav .headerlink,.rst-content h3 .btn .headerlink,.rst-content h3 .nav .headerlink,.rst-content h4 .btn .headerlink,.rst-content h4 .nav .headerlink,.rst-content h5 .btn .headerlink,.rst-content h5 .nav .headerlink,.rst-content h6 .btn .headerlink,.rst-content h6 .nav .headerlink,.rst-content p .btn .headerlink,.rst-content p .nav .headerlink,.rst-content table>caption .btn .headerlink,.rst-content table>caption .nav .headerlink,.rst-content tt.download .btn span:first-child,.rst-content tt.download .nav span:first-child,.wy-menu-vertical li .btn button.toctree-expand,.wy-menu-vertical li.current>a .btn button.toctree-expand,.wy-menu-vertical li.current>a .nav button.toctree-expand,.wy-menu-vertical li .nav button.toctree-expand,.wy-menu-vertical li.on a .btn button.toctree-expand,.wy-menu-vertical li.on a .nav button.toctree-expand{display:inline}.btn .fa-large.icon,.btn .fa.fa-large,.btn .rst-content .code-block-caption .fa-large.headerlink,.btn .rst-content .eqno .fa-large.headerlink,.btn .rst-content .fa-large.admonition-title,.btn .rst-content code.download span.fa-large:first-child,.btn .rst-content dl dt .fa-large.headerlink,.btn .rst-content h1 .fa-large.headerlink,.btn .rst-content h2 .fa-large.headerlink,.btn .rst-content h3 .fa-large.headerlink,.btn .rst-content h4 .fa-large.headerlink,.btn .rst-content h5 .fa-large.headerlink,.btn .rst-content h6 .fa-large.headerlink,.btn .rst-content p .fa-large.headerlink,.btn .rst-content table>caption .fa-large.headerlink,.btn .rst-content tt.download span.fa-large:first-child,.btn .wy-menu-vertical li button.fa-large.toctree-expand,.nav 
.fa-large.icon,.nav .fa.fa-large,.nav .rst-content .code-block-caption .fa-large.headerlink,.nav .rst-content .eqno .fa-large.headerlink,.nav .rst-content .fa-large.admonition-title,.nav .rst-content code.download span.fa-large:first-child,.nav .rst-content dl dt .fa-large.headerlink,.nav .rst-content h1 .fa-large.headerlink,.nav .rst-content h2 .fa-large.headerlink,.nav .rst-content h3 .fa-large.headerlink,.nav .rst-content h4 .fa-large.headerlink,.nav .rst-content h5 .fa-large.headerlink,.nav .rst-content h6 .fa-large.headerlink,.nav .rst-content p .fa-large.headerlink,.nav .rst-content table>caption .fa-large.headerlink,.nav .rst-content tt.download span.fa-large:first-child,.nav .wy-menu-vertical li button.fa-large.toctree-expand,.rst-content .btn .fa-large.admonition-title,.rst-content .code-block-caption .btn .fa-large.headerlink,.rst-content .code-block-caption .nav .fa-large.headerlink,.rst-content .eqno .btn .fa-large.headerlink,.rst-content .eqno .nav .fa-large.headerlink,.rst-content .nav .fa-large.admonition-title,.rst-content code.download .btn span.fa-large:first-child,.rst-content code.download .nav span.fa-large:first-child,.rst-content dl dt .btn .fa-large.headerlink,.rst-content dl dt .nav .fa-large.headerlink,.rst-content h1 .btn .fa-large.headerlink,.rst-content h1 .nav .fa-large.headerlink,.rst-content h2 .btn .fa-large.headerlink,.rst-content h2 .nav .fa-large.headerlink,.rst-content h3 .btn .fa-large.headerlink,.rst-content h3 .nav .fa-large.headerlink,.rst-content h4 .btn .fa-large.headerlink,.rst-content h4 .nav .fa-large.headerlink,.rst-content h5 .btn .fa-large.headerlink,.rst-content h5 .nav .fa-large.headerlink,.rst-content h6 .btn .fa-large.headerlink,.rst-content h6 .nav .fa-large.headerlink,.rst-content p .btn .fa-large.headerlink,.rst-content p .nav .fa-large.headerlink,.rst-content table>caption .btn .fa-large.headerlink,.rst-content table>caption .nav .fa-large.headerlink,.rst-content tt.download .btn 
span.fa-large:first-child,.rst-content tt.download .nav span.fa-large:first-child,.wy-menu-vertical li .btn button.fa-large.toctree-expand,.wy-menu-vertical li .nav button.fa-large.toctree-expand{line-height:.9em}.btn .fa-spin.icon,.btn .fa.fa-spin,.btn .rst-content .code-block-caption .fa-spin.headerlink,.btn .rst-content .eqno .fa-spin.headerlink,.btn .rst-content .fa-spin.admonition-title,.btn .rst-content code.download span.fa-spin:first-child,.btn .rst-content dl dt .fa-spin.headerlink,.btn .rst-content h1 .fa-spin.headerlink,.btn .rst-content h2 .fa-spin.headerlink,.btn .rst-content h3 .fa-spin.headerlink,.btn .rst-content h4 .fa-spin.headerlink,.btn .rst-content h5 .fa-spin.headerlink,.btn .rst-content h6 .fa-spin.headerlink,.btn .rst-content p .fa-spin.headerlink,.btn .rst-content table>caption .fa-spin.headerlink,.btn .rst-content tt.download span.fa-spin:first-child,.btn .wy-menu-vertical li button.fa-spin.toctree-expand,.nav .fa-spin.icon,.nav .fa.fa-spin,.nav .rst-content .code-block-caption .fa-spin.headerlink,.nav .rst-content .eqno .fa-spin.headerlink,.nav .rst-content .fa-spin.admonition-title,.nav .rst-content code.download span.fa-spin:first-child,.nav .rst-content dl dt .fa-spin.headerlink,.nav .rst-content h1 .fa-spin.headerlink,.nav .rst-content h2 .fa-spin.headerlink,.nav .rst-content h3 .fa-spin.headerlink,.nav .rst-content h4 .fa-spin.headerlink,.nav .rst-content h5 .fa-spin.headerlink,.nav .rst-content h6 .fa-spin.headerlink,.nav .rst-content p .fa-spin.headerlink,.nav .rst-content table>caption .fa-spin.headerlink,.nav .rst-content tt.download span.fa-spin:first-child,.nav .wy-menu-vertical li button.fa-spin.toctree-expand,.rst-content .btn .fa-spin.admonition-title,.rst-content .code-block-caption .btn .fa-spin.headerlink,.rst-content .code-block-caption .nav .fa-spin.headerlink,.rst-content .eqno .btn .fa-spin.headerlink,.rst-content .eqno .nav .fa-spin.headerlink,.rst-content .nav .fa-spin.admonition-title,.rst-content code.download 
.btn span.fa-spin:first-child,.rst-content code.download .nav span.fa-spin:first-child,.rst-content dl dt .btn .fa-spin.headerlink,.rst-content dl dt .nav .fa-spin.headerlink,.rst-content h1 .btn .fa-spin.headerlink,.rst-content h1 .nav .fa-spin.headerlink,.rst-content h2 .btn .fa-spin.headerlink,.rst-content h2 .nav .fa-spin.headerlink,.rst-content h3 .btn .fa-spin.headerlink,.rst-content h3 .nav .fa-spin.headerlink,.rst-content h4 .btn .fa-spin.headerlink,.rst-content h4 .nav .fa-spin.headerlink,.rst-content h5 .btn .fa-spin.headerlink,.rst-content h5 .nav .fa-spin.headerlink,.rst-content h6 .btn .fa-spin.headerlink,.rst-content h6 .nav .fa-spin.headerlink,.rst-content p .btn .fa-spin.headerlink,.rst-content p .nav .fa-spin.headerlink,.rst-content table>caption .btn .fa-spin.headerlink,.rst-content table>caption .nav .fa-spin.headerlink,.rst-content tt.download .btn span.fa-spin:first-child,.rst-content tt.download .nav span.fa-spin:first-child,.wy-menu-vertical li .btn button.fa-spin.toctree-expand,.wy-menu-vertical li .nav button.fa-spin.toctree-expand{display:inline-block}.btn.fa:before,.btn.icon:before,.rst-content .btn.admonition-title:before,.rst-content .code-block-caption .btn.headerlink:before,.rst-content .eqno .btn.headerlink:before,.rst-content code.download span.btn:first-child:before,.rst-content dl dt .btn.headerlink:before,.rst-content h1 .btn.headerlink:before,.rst-content h2 .btn.headerlink:before,.rst-content h3 .btn.headerlink:before,.rst-content h4 .btn.headerlink:before,.rst-content h5 .btn.headerlink:before,.rst-content h6 .btn.headerlink:before,.rst-content p .btn.headerlink:before,.rst-content table>caption .btn.headerlink:before,.rst-content tt.download span.btn:first-child:before,.wy-menu-vertical li button.btn.toctree-expand:before{opacity:.5;-webkit-transition:opacity .05s ease-in;-moz-transition:opacity .05s ease-in;transition:opacity .05s ease-in}.btn.fa:hover:before,.btn.icon:hover:before,.rst-content 
.btn.admonition-title:hover:before,.rst-content .code-block-caption .btn.headerlink:hover:before,.rst-content .eqno .btn.headerlink:hover:before,.rst-content code.download span.btn:first-child:hover:before,.rst-content dl dt .btn.headerlink:hover:before,.rst-content h1 .btn.headerlink:hover:before,.rst-content h2 .btn.headerlink:hover:before,.rst-content h3 .btn.headerlink:hover:before,.rst-content h4 .btn.headerlink:hover:before,.rst-content h5 .btn.headerlink:hover:before,.rst-content h6 .btn.headerlink:hover:before,.rst-content p .btn.headerlink:hover:before,.rst-content table>caption .btn.headerlink:hover:before,.rst-content tt.download span.btn:first-child:hover:before,.wy-menu-vertical li button.btn.toctree-expand:hover:before{opacity:1}.btn-mini .fa:before,.btn-mini .icon:before,.btn-mini .rst-content .admonition-title:before,.btn-mini .rst-content .code-block-caption .headerlink:before,.btn-mini .rst-content .eqno .headerlink:before,.btn-mini .rst-content code.download span:first-child:before,.btn-mini .rst-content dl dt .headerlink:before,.btn-mini .rst-content h1 .headerlink:before,.btn-mini .rst-content h2 .headerlink:before,.btn-mini .rst-content h3 .headerlink:before,.btn-mini .rst-content h4 .headerlink:before,.btn-mini .rst-content h5 .headerlink:before,.btn-mini .rst-content h6 .headerlink:before,.btn-mini .rst-content p .headerlink:before,.btn-mini .rst-content table>caption .headerlink:before,.btn-mini .rst-content tt.download span:first-child:before,.btn-mini .wy-menu-vertical li button.toctree-expand:before,.rst-content .btn-mini .admonition-title:before,.rst-content .code-block-caption .btn-mini .headerlink:before,.rst-content .eqno .btn-mini .headerlink:before,.rst-content code.download .btn-mini span:first-child:before,.rst-content dl dt .btn-mini .headerlink:before,.rst-content h1 .btn-mini .headerlink:before,.rst-content h2 .btn-mini .headerlink:before,.rst-content h3 .btn-mini .headerlink:before,.rst-content h4 .btn-mini 
.headerlink:before,.rst-content h5 .btn-mini .headerlink:before,.rst-content h6 .btn-mini .headerlink:before,.rst-content p .btn-mini .headerlink:before,.rst-content table>caption .btn-mini .headerlink:before,.rst-content tt.download .btn-mini span:first-child:before,.wy-menu-vertical li .btn-mini button.toctree-expand:before{font-size:14px;vertical-align:-15%}.rst-content .admonition,.rst-content .admonition-todo,.rst-content .attention,.rst-content .caution,.rst-content .danger,.rst-content .error,.rst-content .hint,.rst-content .important,.rst-content .note,.rst-content .seealso,.rst-content .tip,.rst-content .warning,.wy-alert{padding:12px;line-height:24px;margin-bottom:24px;background:#e7f2fa}.rst-content .admonition-title,.wy-alert-title{font-weight:700;display:block;color:#fff;background:#6ab0de;padding:6px 12px;margin:-12px -12px 12px}.rst-content .danger,.rst-content .error,.rst-content .wy-alert-danger.admonition,.rst-content .wy-alert-danger.admonition-todo,.rst-content .wy-alert-danger.attention,.rst-content .wy-alert-danger.caution,.rst-content .wy-alert-danger.hint,.rst-content .wy-alert-danger.important,.rst-content .wy-alert-danger.note,.rst-content .wy-alert-danger.seealso,.rst-content .wy-alert-danger.tip,.rst-content .wy-alert-danger.warning,.wy-alert.wy-alert-danger{background:#fdf3f2}.rst-content .danger .admonition-title,.rst-content .danger .wy-alert-title,.rst-content .error .admonition-title,.rst-content .error .wy-alert-title,.rst-content .wy-alert-danger.admonition-todo .admonition-title,.rst-content .wy-alert-danger.admonition-todo .wy-alert-title,.rst-content .wy-alert-danger.admonition .admonition-title,.rst-content .wy-alert-danger.admonition .wy-alert-title,.rst-content .wy-alert-danger.attention .admonition-title,.rst-content .wy-alert-danger.attention .wy-alert-title,.rst-content .wy-alert-danger.caution .admonition-title,.rst-content .wy-alert-danger.caution .wy-alert-title,.rst-content .wy-alert-danger.hint 
.admonition-title,.rst-content .wy-alert-danger.hint .wy-alert-title,.rst-content .wy-alert-danger.important .admonition-title,.rst-content .wy-alert-danger.important .wy-alert-title,.rst-content .wy-alert-danger.note .admonition-title,.rst-content .wy-alert-danger.note .wy-alert-title,.rst-content .wy-alert-danger.seealso .admonition-title,.rst-content .wy-alert-danger.seealso .wy-alert-title,.rst-content .wy-alert-danger.tip .admonition-title,.rst-content .wy-alert-danger.tip .wy-alert-title,.rst-content .wy-alert-danger.warning .admonition-title,.rst-content .wy-alert-danger.warning .wy-alert-title,.rst-content .wy-alert.wy-alert-danger .admonition-title,.wy-alert.wy-alert-danger .rst-content .admonition-title,.wy-alert.wy-alert-danger .wy-alert-title{background:#f29f97}.rst-content .admonition-todo,.rst-content .attention,.rst-content .caution,.rst-content .warning,.rst-content .wy-alert-warning.admonition,.rst-content .wy-alert-warning.danger,.rst-content .wy-alert-warning.error,.rst-content .wy-alert-warning.hint,.rst-content .wy-alert-warning.important,.rst-content .wy-alert-warning.note,.rst-content .wy-alert-warning.seealso,.rst-content .wy-alert-warning.tip,.wy-alert.wy-alert-warning{background:#ffedcc}.rst-content .admonition-todo .admonition-title,.rst-content .admonition-todo .wy-alert-title,.rst-content .attention .admonition-title,.rst-content .attention .wy-alert-title,.rst-content .caution .admonition-title,.rst-content .caution .wy-alert-title,.rst-content .warning .admonition-title,.rst-content .warning .wy-alert-title,.rst-content .wy-alert-warning.admonition .admonition-title,.rst-content .wy-alert-warning.admonition .wy-alert-title,.rst-content .wy-alert-warning.danger .admonition-title,.rst-content .wy-alert-warning.danger .wy-alert-title,.rst-content .wy-alert-warning.error .admonition-title,.rst-content .wy-alert-warning.error .wy-alert-title,.rst-content .wy-alert-warning.hint .admonition-title,.rst-content .wy-alert-warning.hint 
.wy-alert-title,.rst-content .wy-alert-warning.important .admonition-title,.rst-content .wy-alert-warning.important .wy-alert-title,.rst-content .wy-alert-warning.note .admonition-title,.rst-content .wy-alert-warning.note .wy-alert-title,.rst-content .wy-alert-warning.seealso .admonition-title,.rst-content .wy-alert-warning.seealso .wy-alert-title,.rst-content .wy-alert-warning.tip .admonition-title,.rst-content .wy-alert-warning.tip .wy-alert-title,.rst-content .wy-alert.wy-alert-warning .admonition-title,.wy-alert.wy-alert-warning .rst-content .admonition-title,.wy-alert.wy-alert-warning .wy-alert-title{background:#f0b37e}.rst-content .note,.rst-content .seealso,.rst-content .wy-alert-info.admonition,.rst-content .wy-alert-info.admonition-todo,.rst-content .wy-alert-info.attention,.rst-content .wy-alert-info.caution,.rst-content .wy-alert-info.danger,.rst-content .wy-alert-info.error,.rst-content .wy-alert-info.hint,.rst-content .wy-alert-info.important,.rst-content .wy-alert-info.tip,.rst-content .wy-alert-info.warning,.wy-alert.wy-alert-info{background:#e7f2fa}.rst-content .note .admonition-title,.rst-content .note .wy-alert-title,.rst-content .seealso .admonition-title,.rst-content .seealso .wy-alert-title,.rst-content .wy-alert-info.admonition-todo .admonition-title,.rst-content .wy-alert-info.admonition-todo .wy-alert-title,.rst-content .wy-alert-info.admonition .admonition-title,.rst-content .wy-alert-info.admonition .wy-alert-title,.rst-content .wy-alert-info.attention .admonition-title,.rst-content .wy-alert-info.attention .wy-alert-title,.rst-content .wy-alert-info.caution .admonition-title,.rst-content .wy-alert-info.caution .wy-alert-title,.rst-content .wy-alert-info.danger .admonition-title,.rst-content .wy-alert-info.danger .wy-alert-title,.rst-content .wy-alert-info.error .admonition-title,.rst-content .wy-alert-info.error .wy-alert-title,.rst-content .wy-alert-info.hint .admonition-title,.rst-content .wy-alert-info.hint .wy-alert-title,.rst-content 
.wy-alert-info.important .admonition-title,.rst-content .wy-alert-info.important .wy-alert-title,.rst-content .wy-alert-info.tip .admonition-title,.rst-content .wy-alert-info.tip .wy-alert-title,.rst-content .wy-alert-info.warning .admonition-title,.rst-content .wy-alert-info.warning .wy-alert-title,.rst-content .wy-alert.wy-alert-info .admonition-title,.wy-alert.wy-alert-info .rst-content .admonition-title,.wy-alert.wy-alert-info .wy-alert-title{background:#6ab0de}.rst-content .hint,.rst-content .important,.rst-content .tip,.rst-content .wy-alert-success.admonition,.rst-content .wy-alert-success.admonition-todo,.rst-content .wy-alert-success.attention,.rst-content .wy-alert-success.caution,.rst-content .wy-alert-success.danger,.rst-content .wy-alert-success.error,.rst-content .wy-alert-success.note,.rst-content .wy-alert-success.seealso,.rst-content .wy-alert-success.warning,.wy-alert.wy-alert-success{background:#dbfaf4}.rst-content .hint .admonition-title,.rst-content .hint .wy-alert-title,.rst-content .important .admonition-title,.rst-content .important .wy-alert-title,.rst-content .tip .admonition-title,.rst-content .tip .wy-alert-title,.rst-content .wy-alert-success.admonition-todo .admonition-title,.rst-content .wy-alert-success.admonition-todo .wy-alert-title,.rst-content .wy-alert-success.admonition .admonition-title,.rst-content .wy-alert-success.admonition .wy-alert-title,.rst-content .wy-alert-success.attention .admonition-title,.rst-content .wy-alert-success.attention .wy-alert-title,.rst-content .wy-alert-success.caution .admonition-title,.rst-content .wy-alert-success.caution .wy-alert-title,.rst-content .wy-alert-success.danger .admonition-title,.rst-content .wy-alert-success.danger .wy-alert-title,.rst-content .wy-alert-success.error .admonition-title,.rst-content .wy-alert-success.error .wy-alert-title,.rst-content .wy-alert-success.note .admonition-title,.rst-content .wy-alert-success.note .wy-alert-title,.rst-content .wy-alert-success.seealso 
.admonition-title,.rst-content .wy-alert-success.seealso .wy-alert-title,.rst-content .wy-alert-success.warning .admonition-title,.rst-content .wy-alert-success.warning .wy-alert-title,.rst-content .wy-alert.wy-alert-success .admonition-title,.wy-alert.wy-alert-success .rst-content .admonition-title,.wy-alert.wy-alert-success .wy-alert-title{background:#1abc9c}.rst-content .wy-alert-neutral.admonition,.rst-content .wy-alert-neutral.admonition-todo,.rst-content .wy-alert-neutral.attention,.rst-content .wy-alert-neutral.caution,.rst-content .wy-alert-neutral.danger,.rst-content .wy-alert-neutral.error,.rst-content .wy-alert-neutral.hint,.rst-content .wy-alert-neutral.important,.rst-content .wy-alert-neutral.note,.rst-content .wy-alert-neutral.seealso,.rst-content .wy-alert-neutral.tip,.rst-content .wy-alert-neutral.warning,.wy-alert.wy-alert-neutral{background:#f3f6f6}.rst-content .wy-alert-neutral.admonition-todo .admonition-title,.rst-content .wy-alert-neutral.admonition-todo .wy-alert-title,.rst-content .wy-alert-neutral.admonition .admonition-title,.rst-content .wy-alert-neutral.admonition .wy-alert-title,.rst-content .wy-alert-neutral.attention .admonition-title,.rst-content .wy-alert-neutral.attention .wy-alert-title,.rst-content .wy-alert-neutral.caution .admonition-title,.rst-content .wy-alert-neutral.caution .wy-alert-title,.rst-content .wy-alert-neutral.danger .admonition-title,.rst-content .wy-alert-neutral.danger .wy-alert-title,.rst-content .wy-alert-neutral.error .admonition-title,.rst-content .wy-alert-neutral.error .wy-alert-title,.rst-content .wy-alert-neutral.hint .admonition-title,.rst-content .wy-alert-neutral.hint .wy-alert-title,.rst-content .wy-alert-neutral.important .admonition-title,.rst-content .wy-alert-neutral.important .wy-alert-title,.rst-content .wy-alert-neutral.note .admonition-title,.rst-content .wy-alert-neutral.note .wy-alert-title,.rst-content .wy-alert-neutral.seealso .admonition-title,.rst-content .wy-alert-neutral.seealso 
.wy-alert-title,.rst-content .wy-alert-neutral.tip .admonition-title,.rst-content .wy-alert-neutral.tip .wy-alert-title,.rst-content .wy-alert-neutral.warning .admonition-title,.rst-content .wy-alert-neutral.warning .wy-alert-title,.rst-content .wy-alert.wy-alert-neutral .admonition-title,.wy-alert.wy-alert-neutral .rst-content .admonition-title,.wy-alert.wy-alert-neutral .wy-alert-title{color:#404040;background:#e1e4e5}.rst-content .wy-alert-neutral.admonition-todo a,.rst-content .wy-alert-neutral.admonition a,.rst-content .wy-alert-neutral.attention a,.rst-content .wy-alert-neutral.caution a,.rst-content .wy-alert-neutral.danger a,.rst-content .wy-alert-neutral.error a,.rst-content .wy-alert-neutral.hint a,.rst-content .wy-alert-neutral.important a,.rst-content .wy-alert-neutral.note a,.rst-content .wy-alert-neutral.seealso a,.rst-content .wy-alert-neutral.tip a,.rst-content .wy-alert-neutral.warning a,.wy-alert.wy-alert-neutral a{color:#2980b9}.rst-content .admonition-todo p:last-child,.rst-content .admonition p:last-child,.rst-content .attention p:last-child,.rst-content .caution p:last-child,.rst-content .danger p:last-child,.rst-content .error p:last-child,.rst-content .hint p:last-child,.rst-content .important p:last-child,.rst-content .note p:last-child,.rst-content .seealso p:last-child,.rst-content .tip p:last-child,.rst-content .warning p:last-child,.wy-alert p:last-child{margin-bottom:0}.wy-tray-container{position:fixed;bottom:0;left:0;z-index:600}.wy-tray-container li{display:block;width:300px;background:transparent;color:#fff;text-align:center;box-shadow:0 5px 5px 0 rgba(0,0,0,.1);padding:0 24px;min-width:20%;opacity:0;height:0;line-height:56px;overflow:hidden;-webkit-transition:all .3s ease-in;-moz-transition:all .3s ease-in;transition:all .3s ease-in}.wy-tray-container li.wy-tray-item-success{background:#27ae60}.wy-tray-container li.wy-tray-item-info{background:#2980b9}.wy-tray-container li.wy-tray-item-warning{background:#e67e22}.wy-tray-container 
li.wy-tray-item-danger{background:#e74c3c}.wy-tray-container li.on{opacity:1;height:56px}@media screen and (max-width:768px){.wy-tray-container{bottom:auto;top:0;width:100%}.wy-tray-container li{width:100%}}button{font-size:100%;margin:0;vertical-align:baseline;*vertical-align:middle;cursor:pointer;line-height:normal;-webkit-appearance:button;*overflow:visible}button::-moz-focus-inner,input::-moz-focus-inner{border:0;padding:0}button[disabled]{cursor:default}.btn{display:inline-block;border-radius:2px;line-height:normal;white-space:nowrap;text-align:center;cursor:pointer;font-size:100%;padding:6px 12px 8px;color:#fff;border:1px solid rgba(0,0,0,.1);background-color:#27ae60;text-decoration:none;font-weight:400;font-family:Lato,proxima-nova,Helvetica Neue,Arial,sans-serif;box-shadow:inset 0 1px 2px -1px hsla(0,0%,100%,.5),inset 0 -2px 0 0 rgba(0,0,0,.1);outline-none:false;vertical-align:middle;*display:inline;zoom:1;-webkit-user-drag:none;-webkit-user-select:none;-moz-user-select:none;-ms-user-select:none;user-select:none;-webkit-transition:all .1s linear;-moz-transition:all .1s linear;transition:all .1s linear}.btn-hover{background:#2e8ece;color:#fff}.btn:hover{background:#2cc36b;color:#fff}.btn:focus{background:#2cc36b;outline:0}.btn:active{box-shadow:inset 0 -1px 0 0 rgba(0,0,0,.05),inset 0 2px 0 0 rgba(0,0,0,.1);padding:8px 12px 6px}.btn:visited{color:#fff}.btn-disabled,.btn-disabled:active,.btn-disabled:focus,.btn-disabled:hover,.btn:disabled{background-image:none;filter:progid:DXImageTransform.Microsoft.gradient(enabled = 
false);filter:alpha(opacity=40);opacity:.4;cursor:not-allowed;box-shadow:none}.btn::-moz-focus-inner{padding:0;border:0}.btn-small{font-size:80%}.btn-info{background-color:#2980b9!important}.btn-info:hover{background-color:#2e8ece!important}.btn-neutral{background-color:#f3f6f6!important;color:#404040!important}.btn-neutral:hover{background-color:#e5ebeb!important;color:#404040}.btn-neutral:visited{color:#404040!important}.btn-success{background-color:#27ae60!important}.btn-success:hover{background-color:#295!important}.btn-danger{background-color:#e74c3c!important}.btn-danger:hover{background-color:#ea6153!important}.btn-warning{background-color:#e67e22!important}.btn-warning:hover{background-color:#e98b39!important}.btn-invert{background-color:#222}.btn-invert:hover{background-color:#2f2f2f!important}.btn-link{background-color:transparent!important;color:#2980b9;box-shadow:none;border-color:transparent!important}.btn-link:active,.btn-link:hover{background-color:transparent!important;color:#409ad5!important;box-shadow:none}.btn-link:visited{color:#9b59b6}.wy-btn-group .btn,.wy-control .btn{vertical-align:middle}.wy-btn-group{margin-bottom:24px;*zoom:1}.wy-btn-group:after,.wy-btn-group:before{display:table;content:""}.wy-btn-group:after{clear:both}.wy-dropdown{position:relative;display:inline-block}.wy-dropdown-active .wy-dropdown-menu{display:block}.wy-dropdown-menu{position:absolute;left:0;display:none;float:left;top:100%;min-width:100%;background:#fcfcfc;z-index:100;border:1px solid #cfd7dd;box-shadow:0 2px 2px 0 rgba(0,0,0,.1);padding:12px}.wy-dropdown-menu>dd>a{display:block;clear:both;color:#404040;white-space:nowrap;font-size:90%;padding:0 12px;cursor:pointer}.wy-dropdown-menu>dd>a:hover{background:#2980b9;color:#fff}.wy-dropdown-menu>dd.divider{border-top:1px solid #cfd7dd;margin:6px 0}.wy-dropdown-menu>dd.search{padding-bottom:12px}.wy-dropdown-menu>dd.search 
input[type=search]{width:100%}.wy-dropdown-menu>dd.call-to-action{background:#e3e3e3;text-transform:uppercase;font-weight:500;font-size:80%}.wy-dropdown-menu>dd.call-to-action:hover{background:#e3e3e3}.wy-dropdown-menu>dd.call-to-action .btn{color:#fff}.wy-dropdown.wy-dropdown-up .wy-dropdown-menu{bottom:100%;top:auto;left:auto;right:0}.wy-dropdown.wy-dropdown-bubble .wy-dropdown-menu{background:#fcfcfc;margin-top:2px}.wy-dropdown.wy-dropdown-bubble .wy-dropdown-menu a{padding:6px 12px}.wy-dropdown.wy-dropdown-bubble .wy-dropdown-menu a:hover{background:#2980b9;color:#fff}.wy-dropdown.wy-dropdown-left .wy-dropdown-menu{right:0;left:auto;text-align:right}.wy-dropdown-arrow:before{content:" ";border-bottom:5px solid #f5f5f5;border-left:5px solid transparent;border-right:5px solid transparent;position:absolute;display:block;top:-4px;left:50%;margin-left:-3px}.wy-dropdown-arrow.wy-dropdown-arrow-left:before{left:11px}.wy-form-stacked select{display:block}.wy-form-aligned .wy-help-inline,.wy-form-aligned input,.wy-form-aligned label,.wy-form-aligned select,.wy-form-aligned textarea{display:inline-block;*display:inline;*zoom:1;vertical-align:middle}.wy-form-aligned .wy-control-group>label{display:inline-block;vertical-align:middle;width:10em;margin:6px 12px 0 0;float:left}.wy-form-aligned .wy-control{float:left}.wy-form-aligned .wy-control label{display:block}.wy-form-aligned .wy-control select{margin-top:6px}fieldset{margin:0}fieldset,legend{border:0;padding:0}legend{width:100%;white-space:normal;margin-bottom:24px;font-size:150%;*margin-left:-7px}label,legend{display:block}label{margin:0 0 
.3125em;color:#333;font-size:90%}input,select,textarea{font-size:100%;margin:0;vertical-align:baseline;*vertical-align:middle}.wy-control-group{margin-bottom:24px;max-width:1200px;margin-left:auto;margin-right:auto;*zoom:1}.wy-control-group:after,.wy-control-group:before{display:table;content:""}.wy-control-group:after{clear:both}.wy-control-group.wy-control-group-required>label:after{content:" *";color:#e74c3c}.wy-control-group .wy-form-full,.wy-control-group .wy-form-halves,.wy-control-group .wy-form-thirds{padding-bottom:12px}.wy-control-group .wy-form-full input[type=color],.wy-control-group .wy-form-full input[type=date],.wy-control-group .wy-form-full input[type=datetime-local],.wy-control-group .wy-form-full input[type=datetime],.wy-control-group .wy-form-full input[type=email],.wy-control-group .wy-form-full input[type=month],.wy-control-group .wy-form-full input[type=number],.wy-control-group .wy-form-full input[type=password],.wy-control-group .wy-form-full input[type=search],.wy-control-group .wy-form-full input[type=tel],.wy-control-group .wy-form-full input[type=text],.wy-control-group .wy-form-full input[type=time],.wy-control-group .wy-form-full input[type=url],.wy-control-group .wy-form-full input[type=week],.wy-control-group .wy-form-full select,.wy-control-group .wy-form-halves input[type=color],.wy-control-group .wy-form-halves input[type=date],.wy-control-group .wy-form-halves input[type=datetime-local],.wy-control-group .wy-form-halves input[type=datetime],.wy-control-group .wy-form-halves input[type=email],.wy-control-group .wy-form-halves input[type=month],.wy-control-group .wy-form-halves input[type=number],.wy-control-group .wy-form-halves input[type=password],.wy-control-group .wy-form-halves input[type=search],.wy-control-group .wy-form-halves input[type=tel],.wy-control-group .wy-form-halves input[type=text],.wy-control-group .wy-form-halves input[type=time],.wy-control-group .wy-form-halves input[type=url],.wy-control-group 
.wy-form-halves input[type=week],.wy-control-group .wy-form-halves select,.wy-control-group .wy-form-thirds input[type=color],.wy-control-group .wy-form-thirds input[type=date],.wy-control-group .wy-form-thirds input[type=datetime-local],.wy-control-group .wy-form-thirds input[type=datetime],.wy-control-group .wy-form-thirds input[type=email],.wy-control-group .wy-form-thirds input[type=month],.wy-control-group .wy-form-thirds input[type=number],.wy-control-group .wy-form-thirds input[type=password],.wy-control-group .wy-form-thirds input[type=search],.wy-control-group .wy-form-thirds input[type=tel],.wy-control-group .wy-form-thirds input[type=text],.wy-control-group .wy-form-thirds input[type=time],.wy-control-group .wy-form-thirds input[type=url],.wy-control-group .wy-form-thirds input[type=week],.wy-control-group .wy-form-thirds select{width:100%}.wy-control-group .wy-form-full{float:left;display:block;width:100%;margin-right:0}.wy-control-group .wy-form-full:last-child{margin-right:0}.wy-control-group .wy-form-halves{float:left;display:block;margin-right:2.35765%;width:48.82117%}.wy-control-group .wy-form-halves:last-child,.wy-control-group .wy-form-halves:nth-of-type(2n){margin-right:0}.wy-control-group .wy-form-halves:nth-of-type(odd){clear:left}.wy-control-group .wy-form-thirds{float:left;display:block;margin-right:2.35765%;width:31.76157%}.wy-control-group .wy-form-thirds:last-child,.wy-control-group .wy-form-thirds:nth-of-type(3n){margin-right:0}.wy-control-group .wy-form-thirds:nth-of-type(3n+1){clear:left}.wy-control-group.wy-control-group-no-input .wy-control,.wy-control-no-input{margin:6px 0 0;font-size:90%}.wy-control-no-input{display:inline-block}.wy-control-group.fluid-input input[type=color],.wy-control-group.fluid-input input[type=date],.wy-control-group.fluid-input input[type=datetime-local],.wy-control-group.fluid-input input[type=datetime],.wy-control-group.fluid-input input[type=email],.wy-control-group.fluid-input 
input[type=month],.wy-control-group.fluid-input input[type=number],.wy-control-group.fluid-input input[type=password],.wy-control-group.fluid-input input[type=search],.wy-control-group.fluid-input input[type=tel],.wy-control-group.fluid-input input[type=text],.wy-control-group.fluid-input input[type=time],.wy-control-group.fluid-input input[type=url],.wy-control-group.fluid-input input[type=week]{width:100%}.wy-form-message-inline{padding-left:.3em;color:#666;font-size:90%}.wy-form-message{display:block;color:#999;font-size:70%;margin-top:.3125em;font-style:italic}.wy-form-message p{font-size:inherit;font-style:italic;margin-bottom:6px}.wy-form-message p:last-child{margin-bottom:0}input{line-height:normal}input[type=button],input[type=reset],input[type=submit]{-webkit-appearance:button;cursor:pointer;font-family:Lato,proxima-nova,Helvetica Neue,Arial,sans-serif;*overflow:visible}input[type=color],input[type=date],input[type=datetime-local],input[type=datetime],input[type=email],input[type=month],input[type=number],input[type=password],input[type=search],input[type=tel],input[type=text],input[type=time],input[type=url],input[type=week]{-webkit-appearance:none;padding:6px;display:inline-block;border:1px solid #ccc;font-size:80%;font-family:Lato,proxima-nova,Helvetica Neue,Arial,sans-serif;box-shadow:inset 0 1px 3px #ddd;border-radius:0;-webkit-transition:border .3s linear;-moz-transition:border .3s linear;transition:border .3s linear}input[type=datetime-local]{padding:.34375em 
.625em}input[disabled]{cursor:default}input[type=checkbox],input[type=radio]{padding:0;margin-right:.3125em;*height:13px;*width:13px}input[type=checkbox],input[type=radio],input[type=search]{-webkit-box-sizing:border-box;-moz-box-sizing:border-box;box-sizing:border-box}input[type=search]::-webkit-search-cancel-button,input[type=search]::-webkit-search-decoration{-webkit-appearance:none}input[type=color]:focus,input[type=date]:focus,input[type=datetime-local]:focus,input[type=datetime]:focus,input[type=email]:focus,input[type=month]:focus,input[type=number]:focus,input[type=password]:focus,input[type=search]:focus,input[type=tel]:focus,input[type=text]:focus,input[type=time]:focus,input[type=url]:focus,input[type=week]:focus{outline:0;outline:thin dotted\9;border-color:#333}input.no-focus:focus{border-color:#ccc!important}input[type=checkbox]:focus,input[type=file]:focus,input[type=radio]:focus{outline:thin dotted #333;outline:1px auto #129fea}input[type=color][disabled],input[type=date][disabled],input[type=datetime-local][disabled],input[type=datetime][disabled],input[type=email][disabled],input[type=month][disabled],input[type=number][disabled],input[type=password][disabled],input[type=search][disabled],input[type=tel][disabled],input[type=text][disabled],input[type=time][disabled],input[type=url][disabled],input[type=week][disabled]{cursor:not-allowed;background-color:#fafafa}input:focus:invalid,select:focus:invalid,textarea:focus:invalid{color:#e74c3c;border:1px solid #e74c3c}input:focus:invalid:focus,select:focus:invalid:focus,textarea:focus:invalid:focus{border-color:#e74c3c}input[type=checkbox]:focus:invalid:focus,input[type=file]:focus:invalid:focus,input[type=radio]:focus:invalid:focus{outline-color:#e74c3c}input.wy-input-large{padding:12px;font-size:100%}textarea{overflow:auto;vertical-align:top;width:100%;font-family:Lato,proxima-nova,Helvetica Neue,Arial,sans-serif}select,textarea{padding:.5em .625em;display:inline-block;border:1px solid 
#ccc;font-size:80%;box-shadow:inset 0 1px 3px #ddd;-webkit-transition:border .3s linear;-moz-transition:border .3s linear;transition:border .3s linear}select{border:1px solid #ccc;background-color:#fff}select[multiple]{height:auto}select:focus,textarea:focus{outline:0}input[readonly],select[disabled],select[readonly],textarea[disabled],textarea[readonly]{cursor:not-allowed;background-color:#fafafa}input[type=checkbox][disabled],input[type=radio][disabled]{cursor:not-allowed}.wy-checkbox,.wy-radio{margin:6px 0;color:#404040;display:block}.wy-checkbox input,.wy-radio input{vertical-align:baseline}.wy-form-message-inline{display:inline-block;*display:inline;*zoom:1;vertical-align:middle}.wy-input-prefix,.wy-input-suffix{white-space:nowrap;padding:6px}.wy-input-prefix .wy-input-context,.wy-input-suffix .wy-input-context{line-height:27px;padding:0 8px;display:inline-block;font-size:80%;background-color:#f3f6f6;border:1px solid #ccc;color:#999}.wy-input-suffix .wy-input-context{border-left:0}.wy-input-prefix .wy-input-context{border-right:0}.wy-switch{position:relative;display:block;height:24px;margin-top:12px;cursor:pointer}.wy-switch:before{left:0;top:0;width:36px;height:12px;background:#ccc}.wy-switch:after,.wy-switch:before{position:absolute;content:"";display:block;border-radius:4px;-webkit-transition:all .2s ease-in-out;-moz-transition:all .2s ease-in-out;transition:all .2s ease-in-out}.wy-switch:after{width:18px;height:18px;background:#999;left:-3px;top:-3px}.wy-switch span{position:absolute;left:48px;display:block;font-size:12px;color:#ccc;line-height:1}.wy-switch.active:before{background:#1e8449}.wy-switch.active:after{left:24px;background:#27ae60}.wy-switch.disabled{cursor:not-allowed;opacity:.8}.wy-control-group.wy-control-group-error .wy-form-message,.wy-control-group.wy-control-group-error>label{color:#e74c3c}.wy-control-group.wy-control-group-error input[type=color],.wy-control-group.wy-control-group-error 
input[type=date],.wy-control-group.wy-control-group-error input[type=datetime-local],.wy-control-group.wy-control-group-error input[type=datetime],.wy-control-group.wy-control-group-error input[type=email],.wy-control-group.wy-control-group-error input[type=month],.wy-control-group.wy-control-group-error input[type=number],.wy-control-group.wy-control-group-error input[type=password],.wy-control-group.wy-control-group-error input[type=search],.wy-control-group.wy-control-group-error input[type=tel],.wy-control-group.wy-control-group-error input[type=text],.wy-control-group.wy-control-group-error input[type=time],.wy-control-group.wy-control-group-error input[type=url],.wy-control-group.wy-control-group-error input[type=week],.wy-control-group.wy-control-group-error textarea{border:1px solid #e74c3c}.wy-inline-validate{white-space:nowrap}.wy-inline-validate .wy-input-context{padding:.5em .625em;display:inline-block;font-size:80%}.wy-inline-validate.wy-inline-validate-success .wy-input-context{color:#27ae60}.wy-inline-validate.wy-inline-validate-danger .wy-input-context{color:#e74c3c}.wy-inline-validate.wy-inline-validate-warning .wy-input-context{color:#e67e22}.wy-inline-validate.wy-inline-validate-info .wy-input-context{color:#2980b9}.rotate-90{-webkit-transform:rotate(90deg);-moz-transform:rotate(90deg);-ms-transform:rotate(90deg);-o-transform:rotate(90deg);transform:rotate(90deg)}.rotate-180{-webkit-transform:rotate(180deg);-moz-transform:rotate(180deg);-ms-transform:rotate(180deg);-o-transform:rotate(180deg);transform:rotate(180deg)}.rotate-270{-webkit-transform:rotate(270deg);-moz-transform:rotate(270deg);-ms-transform:rotate(270deg);-o-transform:rotate(270deg);transform:rotate(270deg)}.mirror{-webkit-transform:scaleX(-1);-moz-transform:scaleX(-1);-ms-transform:scaleX(-1);-o-transform:scaleX(-1);transform:scaleX(-1)}.mirror.rotate-90{-webkit-transform:scaleX(-1) rotate(90deg);-moz-transform:scaleX(-1) rotate(90deg);-ms-transform:scaleX(-1) 
rotate(90deg);-o-transform:scaleX(-1) rotate(90deg);transform:scaleX(-1) rotate(90deg)}.mirror.rotate-180{-webkit-transform:scaleX(-1) rotate(180deg);-moz-transform:scaleX(-1) rotate(180deg);-ms-transform:scaleX(-1) rotate(180deg);-o-transform:scaleX(-1) rotate(180deg);transform:scaleX(-1) rotate(180deg)}.mirror.rotate-270{-webkit-transform:scaleX(-1) rotate(270deg);-moz-transform:scaleX(-1) rotate(270deg);-ms-transform:scaleX(-1) rotate(270deg);-o-transform:scaleX(-1) rotate(270deg);transform:scaleX(-1) rotate(270deg)}@media only screen and (max-width:480px){.wy-form button[type=submit]{margin:.7em 0 0}.wy-form input[type=color],.wy-form input[type=date],.wy-form input[type=datetime-local],.wy-form input[type=datetime],.wy-form input[type=email],.wy-form input[type=month],.wy-form input[type=number],.wy-form input[type=password],.wy-form input[type=search],.wy-form input[type=tel],.wy-form input[type=text],.wy-form input[type=time],.wy-form input[type=url],.wy-form input[type=week],.wy-form label{margin-bottom:.3em;display:block}.wy-form input[type=color],.wy-form input[type=date],.wy-form input[type=datetime-local],.wy-form input[type=datetime],.wy-form input[type=email],.wy-form input[type=month],.wy-form input[type=number],.wy-form input[type=password],.wy-form input[type=search],.wy-form input[type=tel],.wy-form input[type=time],.wy-form input[type=url],.wy-form input[type=week]{margin-bottom:0}.wy-form-aligned .wy-control-group label{margin-bottom:.3em;text-align:left;display:block;width:100%}.wy-form-aligned .wy-control{margin:1.5em 0 0}.wy-form-message,.wy-form-message-inline,.wy-form .wy-help-inline{display:block;font-size:80%;padding:6px 0}}@media screen and (max-width:768px){.tablet-hide{display:none}}@media screen and (max-width:480px){.mobile-hide{display:none}}.float-left{float:left}.float-right{float:right}.full-width{width:100%}.rst-content table.docutils,.rst-content 
table.field-list,.wy-table{border-collapse:collapse;border-spacing:0;empty-cells:show;margin-bottom:24px}.rst-content table.docutils caption,.rst-content table.field-list caption,.wy-table caption{color:#000;font:italic 85%/1 arial,sans-serif;padding:1em 0;text-align:center}.rst-content table.docutils td,.rst-content table.docutils th,.rst-content table.field-list td,.rst-content table.field-list th,.wy-table td,.wy-table th{font-size:90%;margin:0;overflow:visible;padding:8px 16px}.rst-content table.docutils td:first-child,.rst-content table.docutils th:first-child,.rst-content table.field-list td:first-child,.rst-content table.field-list th:first-child,.wy-table td:first-child,.wy-table th:first-child{border-left-width:0}.rst-content table.docutils thead,.rst-content table.field-list thead,.wy-table thead{color:#000;text-align:left;vertical-align:bottom;white-space:nowrap}.rst-content table.docutils thead th,.rst-content table.field-list thead th,.wy-table thead th{font-weight:700;border-bottom:2px solid #e1e4e5}.rst-content table.docutils td,.rst-content table.field-list td,.wy-table td{background-color:transparent;vertical-align:middle}.rst-content table.docutils td p,.rst-content table.field-list td p,.wy-table td p{line-height:18px}.rst-content table.docutils td p:last-child,.rst-content table.field-list td p:last-child,.wy-table td p:last-child{margin-bottom:0}.rst-content table.docutils .wy-table-cell-min,.rst-content table.field-list .wy-table-cell-min,.wy-table .wy-table-cell-min{width:1%;padding-right:0}.rst-content table.docutils .wy-table-cell-min input[type=checkbox],.rst-content table.field-list .wy-table-cell-min input[type=checkbox],.wy-table .wy-table-cell-min input[type=checkbox]{margin:0}.wy-table-secondary{color:grey;font-size:90%}.wy-table-tertiary{color:grey;font-size:80%}.rst-content table.docutils:not(.field-list) tr:nth-child(2n-1) td,.wy-table-backed,.wy-table-odd td,.wy-table-striped tr:nth-child(2n-1) 
td{background-color:#f3f6f6}.rst-content table.docutils,.wy-table-bordered-all{border:1px solid #e1e4e5}.rst-content table.docutils td,.wy-table-bordered-all td{border-bottom:1px solid #e1e4e5;border-left:1px solid #e1e4e5}.rst-content table.docutils tbody>tr:last-child td,.wy-table-bordered-all tbody>tr:last-child td{border-bottom-width:0}.wy-table-bordered{border:1px solid #e1e4e5}.wy-table-bordered-rows td{border-bottom:1px solid #e1e4e5}.wy-table-bordered-rows tbody>tr:last-child td{border-bottom-width:0}.wy-table-horizontal td,.wy-table-horizontal th{border-width:0 0 1px;border-bottom:1px solid #e1e4e5}.wy-table-horizontal tbody>tr:last-child td{border-bottom-width:0}.wy-table-responsive{margin-bottom:24px;max-width:100%;overflow:auto}.wy-table-responsive table{margin-bottom:0!important}.wy-table-responsive table td,.wy-table-responsive table th{white-space:nowrap}a{color:#2980b9;text-decoration:none;cursor:pointer}a:hover{color:#3091d1}a:visited{color:#9b59b6}html{height:100%}body,html{overflow-x:hidden}body{font-family:Lato,proxima-nova,Helvetica Neue,Arial,sans-serif;font-weight:400;color:#404040;min-height:100%;background:#edf0f2}.wy-text-left{text-align:left}.wy-text-center{text-align:center}.wy-text-right{text-align:right}.wy-text-large{font-size:120%}.wy-text-normal{font-size:100%}.wy-text-small,small{font-size:80%}.wy-text-strike{text-decoration:line-through}.wy-text-warning{color:#e67e22!important}a.wy-text-warning:hover{color:#eb9950!important}.wy-text-info{color:#2980b9!important}a.wy-text-info:hover{color:#409ad5!important}.wy-text-success{color:#27ae60!important}a.wy-text-success:hover{color:#36d278!important}.wy-text-danger{color:#e74c3c!important}a.wy-text-danger:hover{color:#ed7669!important}.wy-text-neutral{color:#404040!important}a.wy-text-neutral:hover{color:#595959!important}.rst-content .toctree-wrapper>p.caption,h1,h2,h3,h4,h5,h6,legend{margin-top:0;font-weight:700;font-family:Roboto 
Slab,ff-tisa-web-pro,Georgia,Arial,sans-serif}p{line-height:24px;font-size:16px;margin:0 0 24px}h1{font-size:175%}.rst-content .toctree-wrapper>p.caption,h2{font-size:150%}h3{font-size:125%}h4{font-size:115%}h5{font-size:110%}h6{font-size:100%}hr{display:block;height:1px;border:0;border-top:1px solid #e1e4e5;margin:24px 0;padding:0}.rst-content code,.rst-content tt,code{white-space:nowrap;max-width:100%;background:#fff;border:1px solid #e1e4e5;font-size:75%;padding:0 5px;font-family:SFMono-Regular,Menlo,Monaco,Consolas,Liberation Mono,Courier New,Courier,monospace;color:#e74c3c;overflow-x:auto}.rst-content tt.code-large,code.code-large{font-size:90%}.rst-content .section ul,.rst-content .toctree-wrapper ul,.rst-content section ul,.wy-plain-list-disc,article ul{list-style:disc;line-height:24px;margin-bottom:24px}.rst-content .section ul li,.rst-content .toctree-wrapper ul li,.rst-content section ul li,.wy-plain-list-disc li,article ul li{list-style:disc;margin-left:24px}.rst-content .section ul li p:last-child,.rst-content .section ul li ul,.rst-content .toctree-wrapper ul li p:last-child,.rst-content .toctree-wrapper ul li ul,.rst-content section ul li p:last-child,.rst-content section ul li ul,.wy-plain-list-disc li p:last-child,.wy-plain-list-disc li ul,article ul li p:last-child,article ul li ul{margin-bottom:0}.rst-content .section ul li li,.rst-content .toctree-wrapper ul li li,.rst-content section ul li li,.wy-plain-list-disc li li,article ul li li{list-style:circle}.rst-content .section ul li li li,.rst-content .toctree-wrapper ul li li li,.rst-content section ul li li li,.wy-plain-list-disc li li li,article ul li li li{list-style:square}.rst-content .section ul li ol li,.rst-content .toctree-wrapper ul li ol li,.rst-content section ul li ol li,.wy-plain-list-disc li ol li,article ul li ol li{list-style:decimal}.rst-content .section ol,.rst-content .section ol.arabic,.rst-content .toctree-wrapper ol,.rst-content .toctree-wrapper ol.arabic,.rst-content 
section ol,.rst-content section ol.arabic,.wy-plain-list-decimal,article ol{list-style:decimal;line-height:24px;margin-bottom:24px}.rst-content .section ol.arabic li,.rst-content .section ol li,.rst-content .toctree-wrapper ol.arabic li,.rst-content .toctree-wrapper ol li,.rst-content section ol.arabic li,.rst-content section ol li,.wy-plain-list-decimal li,article ol li{list-style:decimal;margin-left:24px}.rst-content .section ol.arabic li ul,.rst-content .section ol li p:last-child,.rst-content .section ol li ul,.rst-content .toctree-wrapper ol.arabic li ul,.rst-content .toctree-wrapper ol li p:last-child,.rst-content .toctree-wrapper ol li ul,.rst-content section ol.arabic li ul,.rst-content section ol li p:last-child,.rst-content section ol li ul,.wy-plain-list-decimal li p:last-child,.wy-plain-list-decimal li ul,article ol li p:last-child,article ol li ul{margin-bottom:0}.rst-content .section ol.arabic li ul li,.rst-content .section ol li ul li,.rst-content .toctree-wrapper ol.arabic li ul li,.rst-content .toctree-wrapper ol li ul li,.rst-content section ol.arabic li ul li,.rst-content section ol li ul li,.wy-plain-list-decimal li ul li,article ol li ul li{list-style:disc}.wy-breadcrumbs{*zoom:1}.wy-breadcrumbs:after,.wy-breadcrumbs:before{display:table;content:""}.wy-breadcrumbs:after{clear:both}.wy-breadcrumbs>li{display:inline-block;padding-top:5px}.wy-breadcrumbs>li.wy-breadcrumbs-aside{float:right}.rst-content .wy-breadcrumbs>li code,.rst-content .wy-breadcrumbs>li tt,.wy-breadcrumbs>li .rst-content tt,.wy-breadcrumbs>li code{all:inherit;color:inherit}.breadcrumb-item:before{content:"/";color:#bbb;font-size:13px;padding:0 6px 0 3px}.wy-breadcrumbs-extra{margin-bottom:0;color:#b3b3b3;font-size:80%;display:inline-block}@media screen and (max-width:480px){.wy-breadcrumbs-extra,.wy-breadcrumbs li.wy-breadcrumbs-aside{display:none}}@media print{.wy-breadcrumbs 
li.wy-breadcrumbs-aside{display:none}}html{font-size:16px}.wy-affix{position:fixed;top:1.618em}.wy-menu a:hover{text-decoration:none}.wy-menu-horiz{*zoom:1}.wy-menu-horiz:after,.wy-menu-horiz:before{display:table;content:""}.wy-menu-horiz:after{clear:both}.wy-menu-horiz li,.wy-menu-horiz ul{display:inline-block}.wy-menu-horiz li:hover{background:hsla(0,0%,100%,.1)}.wy-menu-horiz li.divide-left{border-left:1px solid #404040}.wy-menu-horiz li.divide-right{border-right:1px solid #404040}.wy-menu-horiz a{height:32px;display:inline-block;line-height:32px;padding:0 16px}.wy-menu-vertical{width:300px}.wy-menu-vertical header,.wy-menu-vertical p.caption{color:#55a5d9;height:32px;line-height:32px;padding:0 1.618em;margin:12px 0 0;display:block;font-weight:700;text-transform:uppercase;font-size:85%;white-space:nowrap}.wy-menu-vertical ul{margin-bottom:0}.wy-menu-vertical li.divide-top{border-top:1px solid #404040}.wy-menu-vertical li.divide-bottom{border-bottom:1px solid #404040}.wy-menu-vertical li.current{background:#e3e3e3}.wy-menu-vertical li.current a{color:grey;border-right:1px solid #c9c9c9;padding:.4045em 2.427em}.wy-menu-vertical li.current a:hover{background:#d6d6d6}.rst-content .wy-menu-vertical li tt,.wy-menu-vertical li .rst-content tt,.wy-menu-vertical li code{border:none;background:inherit;color:inherit;padding-left:0;padding-right:0}.wy-menu-vertical li button.toctree-expand{display:block;float:left;margin-left:-1.2em;line-height:18px;color:#4d4d4d;border:none;background:none;padding:0}.wy-menu-vertical li.current>a,.wy-menu-vertical li.on a{color:#404040;font-weight:700;position:relative;background:#fcfcfc;border:none;padding:.4045em 1.618em}.wy-menu-vertical li.current>a:hover,.wy-menu-vertical li.on a:hover{background:#fcfcfc}.wy-menu-vertical li.current>a:hover button.toctree-expand,.wy-menu-vertical li.on a:hover button.toctree-expand{color:grey}.wy-menu-vertical li.current>a button.toctree-expand,.wy-menu-vertical li.on a 
button.toctree-expand{display:block;line-height:18px;color:#333}.wy-menu-vertical li.toctree-l1.current>a{border-bottom:1px solid #c9c9c9;border-top:1px solid #c9c9c9}.wy-menu-vertical .toctree-l1.current .toctree-l2>ul,.wy-menu-vertical .toctree-l2.current .toctree-l3>ul,.wy-menu-vertical .toctree-l3.current .toctree-l4>ul,.wy-menu-vertical .toctree-l4.current .toctree-l5>ul,.wy-menu-vertical .toctree-l5.current .toctree-l6>ul,.wy-menu-vertical .toctree-l6.current .toctree-l7>ul,.wy-menu-vertical .toctree-l7.current .toctree-l8>ul,.wy-menu-vertical .toctree-l8.current .toctree-l9>ul,.wy-menu-vertical .toctree-l9.current .toctree-l10>ul,.wy-menu-vertical .toctree-l10.current .toctree-l11>ul{display:none}.wy-menu-vertical .toctree-l1.current .current.toctree-l2>ul,.wy-menu-vertical .toctree-l2.current .current.toctree-l3>ul,.wy-menu-vertical .toctree-l3.current .current.toctree-l4>ul,.wy-menu-vertical .toctree-l4.current .current.toctree-l5>ul,.wy-menu-vertical .toctree-l5.current .current.toctree-l6>ul,.wy-menu-vertical .toctree-l6.current .current.toctree-l7>ul,.wy-menu-vertical .toctree-l7.current .current.toctree-l8>ul,.wy-menu-vertical .toctree-l8.current .current.toctree-l9>ul,.wy-menu-vertical .toctree-l9.current .current.toctree-l10>ul,.wy-menu-vertical .toctree-l10.current .current.toctree-l11>ul{display:block}.wy-menu-vertical li.toctree-l3,.wy-menu-vertical li.toctree-l4{font-size:.9em}.wy-menu-vertical li.toctree-l2 a,.wy-menu-vertical li.toctree-l3 a,.wy-menu-vertical li.toctree-l4 a,.wy-menu-vertical li.toctree-l5 a,.wy-menu-vertical li.toctree-l6 a,.wy-menu-vertical li.toctree-l7 a,.wy-menu-vertical li.toctree-l8 a,.wy-menu-vertical li.toctree-l9 a,.wy-menu-vertical li.toctree-l10 a{color:#404040}.wy-menu-vertical li.toctree-l2 a:hover button.toctree-expand,.wy-menu-vertical li.toctree-l3 a:hover button.toctree-expand,.wy-menu-vertical li.toctree-l4 a:hover button.toctree-expand,.wy-menu-vertical li.toctree-l5 a:hover 
button.toctree-expand,.wy-menu-vertical li.toctree-l6 a:hover button.toctree-expand,.wy-menu-vertical li.toctree-l7 a:hover button.toctree-expand,.wy-menu-vertical li.toctree-l8 a:hover button.toctree-expand,.wy-menu-vertical li.toctree-l9 a:hover button.toctree-expand,.wy-menu-vertical li.toctree-l10 a:hover button.toctree-expand{color:grey}.wy-menu-vertical li.toctree-l2.current li.toctree-l3>a,.wy-menu-vertical li.toctree-l3.current li.toctree-l4>a,.wy-menu-vertical li.toctree-l4.current li.toctree-l5>a,.wy-menu-vertical li.toctree-l5.current li.toctree-l6>a,.wy-menu-vertical li.toctree-l6.current li.toctree-l7>a,.wy-menu-vertical li.toctree-l7.current li.toctree-l8>a,.wy-menu-vertical li.toctree-l8.current li.toctree-l9>a,.wy-menu-vertical li.toctree-l9.current li.toctree-l10>a,.wy-menu-vertical li.toctree-l10.current li.toctree-l11>a{display:block}.wy-menu-vertical li.toctree-l2.current>a{padding:.4045em 2.427em}.wy-menu-vertical li.toctree-l2.current li.toctree-l3>a{padding:.4045em 1.618em .4045em 4.045em}.wy-menu-vertical li.toctree-l3.current>a{padding:.4045em 4.045em}.wy-menu-vertical li.toctree-l3.current li.toctree-l4>a{padding:.4045em 1.618em .4045em 5.663em}.wy-menu-vertical li.toctree-l4.current>a{padding:.4045em 5.663em}.wy-menu-vertical li.toctree-l4.current li.toctree-l5>a{padding:.4045em 1.618em .4045em 7.281em}.wy-menu-vertical li.toctree-l5.current>a{padding:.4045em 7.281em}.wy-menu-vertical li.toctree-l5.current li.toctree-l6>a{padding:.4045em 1.618em .4045em 8.899em}.wy-menu-vertical li.toctree-l6.current>a{padding:.4045em 8.899em}.wy-menu-vertical li.toctree-l6.current li.toctree-l7>a{padding:.4045em 1.618em .4045em 10.517em}.wy-menu-vertical li.toctree-l7.current>a{padding:.4045em 10.517em}.wy-menu-vertical li.toctree-l7.current li.toctree-l8>a{padding:.4045em 1.618em .4045em 12.135em}.wy-menu-vertical li.toctree-l8.current>a{padding:.4045em 12.135em}.wy-menu-vertical li.toctree-l8.current li.toctree-l9>a{padding:.4045em 1.618em .4045em 
13.753em}.wy-menu-vertical li.toctree-l9.current>a{padding:.4045em 13.753em}.wy-menu-vertical li.toctree-l9.current li.toctree-l10>a{padding:.4045em 1.618em .4045em 15.371em}.wy-menu-vertical li.toctree-l10.current>a{padding:.4045em 15.371em}.wy-menu-vertical li.toctree-l10.current li.toctree-l11>a{padding:.4045em 1.618em .4045em 16.989em}.wy-menu-vertical li.toctree-l2.current>a,.wy-menu-vertical li.toctree-l2.current li.toctree-l3>a{background:#c9c9c9}.wy-menu-vertical li.toctree-l2 button.toctree-expand{color:#a3a3a3}.wy-menu-vertical li.toctree-l3.current>a,.wy-menu-vertical li.toctree-l3.current li.toctree-l4>a{background:#bdbdbd}.wy-menu-vertical li.toctree-l3 button.toctree-expand{color:#969696}.wy-menu-vertical li.current ul{display:block}.wy-menu-vertical li ul{margin-bottom:0;display:none}.wy-menu-vertical li ul li a{margin-bottom:0;color:#d9d9d9;font-weight:400}.wy-menu-vertical a{line-height:18px;padding:.4045em 1.618em;display:block;position:relative;font-size:90%;color:#d9d9d9}.wy-menu-vertical a:hover{background-color:#4e4a4a;cursor:pointer}.wy-menu-vertical a:hover button.toctree-expand{color:#d9d9d9}.wy-menu-vertical a:active{background-color:#2980b9;cursor:pointer;color:#fff}.wy-menu-vertical a:active button.toctree-expand{color:#fff}.wy-side-nav-search{display:block;width:300px;padding:.809em;margin-bottom:.809em;z-index:200;background-color:#2980b9;text-align:center;color:#fcfcfc}.wy-side-nav-search input[type=text]{width:100%;border-radius:50px;padding:6px 12px;border-color:#2472a4}.wy-side-nav-search img{display:block;margin:auto auto .809em;height:45px;width:45px;background-color:#2980b9;padding:5px;border-radius:100%}.wy-side-nav-search .wy-dropdown>a,.wy-side-nav-search>a{color:#fcfcfc;font-size:100%;font-weight:700;display:inline-block;padding:4px 6px;margin-bottom:.809em;max-width:100%}.wy-side-nav-search .wy-dropdown>a:hover,.wy-side-nav-search>a:hover{background:hsla(0,0%,100%,.1)}.wy-side-nav-search .wy-dropdown>a 
img.logo,.wy-side-nav-search>a img.logo{display:block;margin:0 auto;height:auto;width:auto;border-radius:0;max-width:100%;background:transparent}.wy-side-nav-search .wy-dropdown>a.icon img.logo,.wy-side-nav-search>a.icon img.logo{margin-top:.85em}.wy-side-nav-search>div.version{margin-top:-.4045em;margin-bottom:.809em;font-weight:400;color:hsla(0,0%,100%,.3)}.wy-nav .wy-menu-vertical header{color:#2980b9}.wy-nav .wy-menu-vertical a{color:#b3b3b3}.wy-nav .wy-menu-vertical a:hover{background-color:#2980b9;color:#fff}[data-menu-wrap]{-webkit-transition:all .2s ease-in;-moz-transition:all .2s ease-in;transition:all .2s ease-in;position:absolute;opacity:1;width:100%;opacity:0}[data-menu-wrap].move-center{left:0;right:auto;opacity:1}[data-menu-wrap].move-left{right:auto;left:-100%;opacity:0}[data-menu-wrap].move-right{right:-100%;left:auto;opacity:0}.wy-body-for-nav{background:#fcfcfc}.wy-grid-for-nav{position:absolute;width:100%;height:100%}.wy-nav-side{position:fixed;top:0;bottom:0;left:0;padding-bottom:2em;width:300px;overflow-x:hidden;overflow-y:hidden;min-height:100%;color:#9b9b9b;background:#343131;z-index:200}.wy-side-scroll{width:320px;position:relative;overflow-x:hidden;overflow-y:scroll;height:100%}.wy-nav-top{display:none;background:#2980b9;color:#fff;padding:.4045em .809em;position:relative;line-height:50px;text-align:center;font-size:100%;*zoom:1}.wy-nav-top:after,.wy-nav-top:before{display:table;content:""}.wy-nav-top:after{clear:both}.wy-nav-top a{color:#fff;font-weight:700}.wy-nav-top img{margin-right:12px;height:45px;width:45px;background-color:#2980b9;padding:5px;border-radius:100%}.wy-nav-top i{font-size:30px;float:left;cursor:pointer;padding-top:inherit}.wy-nav-content-wrap{margin-left:300px;background:#fcfcfc;min-height:100%}.wy-nav-content{padding:1.618em 
3.236em;height:100%;max-width:800px;margin:auto}.wy-body-mask{position:fixed;width:100%;height:100%;background:rgba(0,0,0,.2);display:none;z-index:499}.wy-body-mask.on{display:block}footer{color:grey}footer p{margin-bottom:12px}.rst-content footer span.commit tt,footer span.commit .rst-content tt,footer span.commit code{padding:0;font-family:SFMono-Regular,Menlo,Monaco,Consolas,Liberation Mono,Courier New,Courier,monospace;font-size:1em;background:none;border:none;color:grey}.rst-footer-buttons{*zoom:1}.rst-footer-buttons:after,.rst-footer-buttons:before{width:100%;display:table;content:""}.rst-footer-buttons:after{clear:both}.rst-breadcrumbs-buttons{margin-top:12px;*zoom:1}.rst-breadcrumbs-buttons:after,.rst-breadcrumbs-buttons:before{display:table;content:""}.rst-breadcrumbs-buttons:after{clear:both}#search-results .search li{margin-bottom:24px;border-bottom:1px solid #e1e4e5;padding-bottom:24px}#search-results .search li:first-child{border-top:1px solid #e1e4e5;padding-top:24px}#search-results .search li a{font-size:120%;margin-bottom:12px;display:inline-block}#search-results .context{color:grey;font-size:90%}.genindextable li>ul{margin-left:24px}@media screen and (max-width:768px){.wy-body-for-nav{background:#fcfcfc}.wy-nav-top{display:block}.wy-nav-side{left:-300px}.wy-nav-side.shift{width:85%;left:0}.wy-menu.wy-menu-vertical,.wy-side-nav-search,.wy-side-scroll{width:auto}.wy-nav-content-wrap{margin-left:0}.wy-nav-content-wrap .wy-nav-content{padding:1.618em}.wy-nav-content-wrap.shift{position:fixed;min-width:100%;left:85%;top:0;height:100%;overflow:hidden}}@media screen and (min-width:1100px){.wy-nav-content-wrap{background:rgba(0,0,0,.05)}.wy-nav-content{margin:0;background:#fcfcfc}}@media print{.rst-versions,.wy-nav-side,footer{display:none}.wy-nav-content-wrap{margin-left:0}}.rst-versions{position:fixed;bottom:0;left:0;width:300px;color:#fcfcfc;background:#1f1d1d;font-family:Lato,proxima-nova,Helvetica Neue,Arial,sans-serif;z-index:400}.rst-versions 
a{color:#2980b9;text-decoration:none}.rst-versions .rst-badge-small{display:none}.rst-versions .rst-current-version{padding:12px;background-color:#272525;display:block;text-align:right;font-size:90%;cursor:pointer;color:#27ae60;*zoom:1}.rst-versions .rst-current-version:after,.rst-versions .rst-current-version:before{display:table;content:""}.rst-versions .rst-current-version:after{clear:both}.rst-content .code-block-caption .rst-versions .rst-current-version .headerlink,.rst-content .eqno .rst-versions .rst-current-version .headerlink,.rst-content .rst-versions .rst-current-version .admonition-title,.rst-content code.download .rst-versions .rst-current-version span:first-child,.rst-content dl dt .rst-versions .rst-current-version .headerlink,.rst-content h1 .rst-versions .rst-current-version .headerlink,.rst-content h2 .rst-versions .rst-current-version .headerlink,.rst-content h3 .rst-versions .rst-current-version .headerlink,.rst-content h4 .rst-versions .rst-current-version .headerlink,.rst-content h5 .rst-versions .rst-current-version .headerlink,.rst-content h6 .rst-versions .rst-current-version .headerlink,.rst-content p .rst-versions .rst-current-version .headerlink,.rst-content table>caption .rst-versions .rst-current-version .headerlink,.rst-content tt.download .rst-versions .rst-current-version span:first-child,.rst-versions .rst-current-version .fa,.rst-versions .rst-current-version .icon,.rst-versions .rst-current-version .rst-content .admonition-title,.rst-versions .rst-current-version .rst-content .code-block-caption .headerlink,.rst-versions .rst-current-version .rst-content .eqno .headerlink,.rst-versions .rst-current-version .rst-content code.download span:first-child,.rst-versions .rst-current-version .rst-content dl dt .headerlink,.rst-versions .rst-current-version .rst-content h1 .headerlink,.rst-versions .rst-current-version .rst-content h2 .headerlink,.rst-versions .rst-current-version .rst-content h3 .headerlink,.rst-versions 
.rst-current-version .rst-content h4 .headerlink,.rst-versions .rst-current-version .rst-content h5 .headerlink,.rst-versions .rst-current-version .rst-content h6 .headerlink,.rst-versions .rst-current-version .rst-content p .headerlink,.rst-versions .rst-current-version .rst-content table>caption .headerlink,.rst-versions .rst-current-version .rst-content tt.download span:first-child,.rst-versions .rst-current-version .wy-menu-vertical li button.toctree-expand,.wy-menu-vertical li .rst-versions .rst-current-version button.toctree-expand{color:#fcfcfc}.rst-versions .rst-current-version .fa-book,.rst-versions .rst-current-version .icon-book{float:left}.rst-versions .rst-current-version.rst-out-of-date{background-color:#e74c3c;color:#fff}.rst-versions .rst-current-version.rst-active-old-version{background-color:#f1c40f;color:#000}.rst-versions.shift-up{height:auto;max-height:100%;overflow-y:scroll}.rst-versions.shift-up .rst-other-versions{display:block}.rst-versions .rst-other-versions{font-size:90%;padding:12px;color:grey;display:none}.rst-versions .rst-other-versions hr{display:block;height:1px;border:0;margin:20px 0;padding:0;border-top:1px solid #413d3d}.rst-versions .rst-other-versions dd{display:inline-block;margin:0}.rst-versions .rst-other-versions dd a{display:inline-block;padding:6px;color:#fcfcfc}.rst-versions.rst-badge{width:auto;bottom:20px;right:20px;left:auto;border:none;max-width:300px;max-height:90%}.rst-versions.rst-badge .fa-book,.rst-versions.rst-badge .icon-book{float:none;line-height:30px}.rst-versions.rst-badge.shift-up .rst-current-version{text-align:right}.rst-versions.rst-badge.shift-up .rst-current-version .fa-book,.rst-versions.rst-badge.shift-up .rst-current-version .icon-book{float:left}.rst-versions.rst-badge>.rst-current-version{width:auto;height:30px;line-height:30px;padding:0 6px;display:block;text-align:center}@media screen and (max-width:768px){.rst-versions{width:85%;display:none}.rst-versions.shift{display:block}}.rst-content 
.toctree-wrapper>p.caption,.rst-content h1,.rst-content h2,.rst-content h3,.rst-content h4,.rst-content h5,.rst-content h6{margin-bottom:24px}.rst-content img{max-width:100%;height:auto}.rst-content div.figure,.rst-content figure{margin-bottom:24px}.rst-content div.figure .caption-text,.rst-content figure .caption-text{font-style:italic}.rst-content div.figure p:last-child.caption,.rst-content figure p:last-child.caption{margin-bottom:0}.rst-content div.figure.align-center,.rst-content figure.align-center{text-align:center}.rst-content .section>a>img,.rst-content .section>img,.rst-content section>a>img,.rst-content section>img{margin-bottom:24px}.rst-content abbr[title]{text-decoration:none}.rst-content.style-external-links a.reference.external:after{font-family:FontAwesome;content:"\f08e";color:#b3b3b3;vertical-align:super;font-size:60%;margin:0 .2em}.rst-content blockquote{margin-left:24px;line-height:24px;margin-bottom:24px}.rst-content pre.literal-block{white-space:pre;margin:0;padding:12px;font-family:SFMono-Regular,Menlo,Monaco,Consolas,Liberation Mono,Courier New,Courier,monospace;display:block;overflow:auto}.rst-content div[class^=highlight],.rst-content pre.literal-block{border:1px solid #e1e4e5;overflow-x:auto;margin:1px 0 24px}.rst-content div[class^=highlight] div[class^=highlight],.rst-content pre.literal-block div[class^=highlight]{padding:0;border:none;margin:0}.rst-content div[class^=highlight] td.code{width:100%}.rst-content .linenodiv pre{border-right:1px solid #e6e9ea;margin:0;padding:12px;font-family:SFMono-Regular,Menlo,Monaco,Consolas,Liberation Mono,Courier New,Courier,monospace;user-select:none;pointer-events:none}.rst-content div[class^=highlight] pre{white-space:pre;margin:0;padding:12px;display:block;overflow:auto}.rst-content div[class^=highlight] pre .hll{display:block;margin:0 -12px;padding:0 12px}.rst-content .linenodiv pre,.rst-content div[class^=highlight] pre,.rst-content 
pre.literal-block{font-family:SFMono-Regular,Menlo,Monaco,Consolas,Liberation Mono,Courier New,Courier,monospace;font-size:12px;line-height:1.4}.rst-content div.highlight .gp,.rst-content div.highlight span.linenos{user-select:none;pointer-events:none}.rst-content div.highlight span.linenos{display:inline-block;padding-left:0;padding-right:12px;margin-right:12px;border-right:1px solid #e6e9ea}.rst-content .code-block-caption{font-style:italic;font-size:85%;line-height:1;padding:1em 0;text-align:center}@media print{.rst-content .codeblock,.rst-content div[class^=highlight],.rst-content div[class^=highlight] pre{white-space:pre-wrap}}.rst-content .admonition,.rst-content .admonition-todo,.rst-content .attention,.rst-content .caution,.rst-content .danger,.rst-content .error,.rst-content .hint,.rst-content .important,.rst-content .note,.rst-content .seealso,.rst-content .tip,.rst-content .warning{clear:both}.rst-content .admonition-todo .last,.rst-content .admonition-todo>:last-child,.rst-content .admonition .last,.rst-content .admonition>:last-child,.rst-content .attention .last,.rst-content .attention>:last-child,.rst-content .caution .last,.rst-content .caution>:last-child,.rst-content .danger .last,.rst-content .danger>:last-child,.rst-content .error .last,.rst-content .error>:last-child,.rst-content .hint .last,.rst-content .hint>:last-child,.rst-content .important .last,.rst-content .important>:last-child,.rst-content .note .last,.rst-content .note>:last-child,.rst-content .seealso .last,.rst-content .seealso>:last-child,.rst-content .tip .last,.rst-content .tip>:last-child,.rst-content .warning .last,.rst-content .warning>:last-child{margin-bottom:0}.rst-content .admonition-title:before{margin-right:4px}.rst-content .admonition table{border-color:rgba(0,0,0,.1)}.rst-content .admonition table td,.rst-content .admonition table th{background:transparent!important;border-color:rgba(0,0,0,.1)!important}.rst-content .section ol.loweralpha,.rst-content .section 
ol.loweralpha>li,.rst-content .toctree-wrapper ol.loweralpha,.rst-content .toctree-wrapper ol.loweralpha>li,.rst-content section ol.loweralpha,.rst-content section ol.loweralpha>li{list-style:lower-alpha}.rst-content .section ol.upperalpha,.rst-content .section ol.upperalpha>li,.rst-content .toctree-wrapper ol.upperalpha,.rst-content .toctree-wrapper ol.upperalpha>li,.rst-content section ol.upperalpha,.rst-content section ol.upperalpha>li{list-style:upper-alpha}.rst-content .section ol li>*,.rst-content .section ul li>*,.rst-content .toctree-wrapper ol li>*,.rst-content .toctree-wrapper ul li>*,.rst-content section ol li>*,.rst-content section ul li>*{margin-top:12px;margin-bottom:12px}.rst-content .section ol li>:first-child,.rst-content .section ul li>:first-child,.rst-content .toctree-wrapper ol li>:first-child,.rst-content .toctree-wrapper ul li>:first-child,.rst-content section ol li>:first-child,.rst-content section ul li>:first-child{margin-top:0}.rst-content .section ol li>p,.rst-content .section ol li>p:last-child,.rst-content .section ul li>p,.rst-content .section ul li>p:last-child,.rst-content .toctree-wrapper ol li>p,.rst-content .toctree-wrapper ol li>p:last-child,.rst-content .toctree-wrapper ul li>p,.rst-content .toctree-wrapper ul li>p:last-child,.rst-content section ol li>p,.rst-content section ol li>p:last-child,.rst-content section ul li>p,.rst-content section ul li>p:last-child{margin-bottom:12px}.rst-content .section ol li>p:only-child,.rst-content .section ol li>p:only-child:last-child,.rst-content .section ul li>p:only-child,.rst-content .section ul li>p:only-child:last-child,.rst-content .toctree-wrapper ol li>p:only-child,.rst-content .toctree-wrapper ol li>p:only-child:last-child,.rst-content .toctree-wrapper ul li>p:only-child,.rst-content .toctree-wrapper ul li>p:only-child:last-child,.rst-content section ol li>p:only-child,.rst-content section ol li>p:only-child:last-child,.rst-content section ul li>p:only-child,.rst-content section ul 
li>p:only-child:last-child{margin-bottom:0}.rst-content .section ol li>ol,.rst-content .section ol li>ul,.rst-content .section ul li>ol,.rst-content .section ul li>ul,.rst-content .toctree-wrapper ol li>ol,.rst-content .toctree-wrapper ol li>ul,.rst-content .toctree-wrapper ul li>ol,.rst-content .toctree-wrapper ul li>ul,.rst-content section ol li>ol,.rst-content section ol li>ul,.rst-content section ul li>ol,.rst-content section ul li>ul{margin-bottom:12px}.rst-content .section ol.simple li>*,.rst-content .section ol.simple li ol,.rst-content .section ol.simple li ul,.rst-content .section ul.simple li>*,.rst-content .section ul.simple li ol,.rst-content .section ul.simple li ul,.rst-content .toctree-wrapper ol.simple li>*,.rst-content .toctree-wrapper ol.simple li ol,.rst-content .toctree-wrapper ol.simple li ul,.rst-content .toctree-wrapper ul.simple li>*,.rst-content .toctree-wrapper ul.simple li ol,.rst-content .toctree-wrapper ul.simple li ul,.rst-content section ol.simple li>*,.rst-content section ol.simple li ol,.rst-content section ol.simple li ul,.rst-content section ul.simple li>*,.rst-content section ul.simple li ol,.rst-content section ul.simple li ul{margin-top:0;margin-bottom:0}.rst-content .line-block{margin-left:0;margin-bottom:24px;line-height:24px}.rst-content .line-block .line-block{margin-left:24px;margin-bottom:0}.rst-content .topic-title{font-weight:700;margin-bottom:12px}.rst-content .toc-backref{color:#404040}.rst-content .align-right{float:right;margin:0 0 24px 24px}.rst-content .align-left{float:left;margin:0 24px 24px 0}.rst-content .align-center{margin:auto}.rst-content .align-center:not(table){display:block}.rst-content .code-block-caption .headerlink,.rst-content .eqno .headerlink,.rst-content .toctree-wrapper>p.caption .headerlink,.rst-content dl dt .headerlink,.rst-content h1 .headerlink,.rst-content h2 .headerlink,.rst-content h3 .headerlink,.rst-content h4 .headerlink,.rst-content h5 .headerlink,.rst-content h6 
.headerlink,.rst-content p.caption .headerlink,.rst-content p .headerlink,.rst-content table>caption .headerlink{opacity:0;font-size:14px;font-family:FontAwesome;margin-left:.5em}.rst-content .code-block-caption .headerlink:focus,.rst-content .code-block-caption:hover .headerlink,.rst-content .eqno .headerlink:focus,.rst-content .eqno:hover .headerlink,.rst-content .toctree-wrapper>p.caption .headerlink:focus,.rst-content .toctree-wrapper>p.caption:hover .headerlink,.rst-content dl dt .headerlink:focus,.rst-content dl dt:hover .headerlink,.rst-content h1 .headerlink:focus,.rst-content h1:hover .headerlink,.rst-content h2 .headerlink:focus,.rst-content h2:hover .headerlink,.rst-content h3 .headerlink:focus,.rst-content h3:hover .headerlink,.rst-content h4 .headerlink:focus,.rst-content h4:hover .headerlink,.rst-content h5 .headerlink:focus,.rst-content h5:hover .headerlink,.rst-content h6 .headerlink:focus,.rst-content h6:hover .headerlink,.rst-content p.caption .headerlink:focus,.rst-content p.caption:hover .headerlink,.rst-content p .headerlink:focus,.rst-content p:hover .headerlink,.rst-content table>caption .headerlink:focus,.rst-content table>caption:hover .headerlink{opacity:1}.rst-content p a{overflow-wrap:anywhere}.rst-content .wy-table td p,.rst-content .wy-table td ul,.rst-content .wy-table th p,.rst-content .wy-table th ul,.rst-content table.docutils td p,.rst-content table.docutils td ul,.rst-content table.docutils th p,.rst-content table.docutils th ul,.rst-content table.field-list td p,.rst-content table.field-list td ul,.rst-content table.field-list th p,.rst-content table.field-list th ul{font-size:inherit}.rst-content .btn:focus{outline:2px solid}.rst-content table>caption .headerlink:after{font-size:12px}.rst-content .centered{text-align:center}.rst-content .sidebar{float:right;width:40%;display:block;margin:0 0 24px 24px;padding:24px;background:#f3f6f6;border:1px solid #e1e4e5}.rst-content .sidebar dl,.rst-content .sidebar p,.rst-content .sidebar 
ul{font-size:90%}.rst-content .sidebar .last,.rst-content .sidebar>:last-child{margin-bottom:0}.rst-content .sidebar .sidebar-title{display:block;font-family:Roboto Slab,ff-tisa-web-pro,Georgia,Arial,sans-serif;font-weight:700;background:#e1e4e5;padding:6px 12px;margin:-24px -24px 24px;font-size:100%}.rst-content .highlighted{background:#f1c40f;box-shadow:0 0 0 2px #f1c40f;display:inline;font-weight:700}.rst-content .citation-reference,.rst-content .footnote-reference{vertical-align:baseline;position:relative;top:-.4em;line-height:0;font-size:90%}.rst-content .citation-reference>span.fn-bracket,.rst-content .footnote-reference>span.fn-bracket{display:none}.rst-content .hlist{width:100%}.rst-content dl dt span.classifier:before{content:" : "}.rst-content dl dt span.classifier-delimiter{display:none!important}html.writer-html4 .rst-content table.docutils.citation,html.writer-html4 .rst-content table.docutils.footnote{background:none;border:none}html.writer-html4 .rst-content table.docutils.citation td,html.writer-html4 .rst-content table.docutils.citation tr,html.writer-html4 .rst-content table.docutils.footnote td,html.writer-html4 .rst-content table.docutils.footnote tr{border:none;background-color:transparent!important;white-space:normal}html.writer-html4 .rst-content table.docutils.citation td.label,html.writer-html4 .rst-content table.docutils.footnote td.label{padding-left:0;padding-right:0;vertical-align:top}html.writer-html5 .rst-content dl.citation,html.writer-html5 .rst-content dl.field-list,html.writer-html5 .rst-content dl.footnote{display:grid;grid-template-columns:auto minmax(80%,95%)}html.writer-html5 .rst-content dl.citation>dt,html.writer-html5 .rst-content dl.field-list>dt,html.writer-html5 .rst-content dl.footnote>dt{display:inline-grid;grid-template-columns:max-content auto}html.writer-html5 .rst-content aside.citation,html.writer-html5 .rst-content aside.footnote,html.writer-html5 .rst-content div.citation{display:grid;grid-template-columns:auto 
auto minmax(.65rem,auto) minmax(40%,95%)}html.writer-html5 .rst-content aside.citation>span.label,html.writer-html5 .rst-content aside.footnote>span.label,html.writer-html5 .rst-content div.citation>span.label{grid-column-start:1;grid-column-end:2}html.writer-html5 .rst-content aside.citation>span.backrefs,html.writer-html5 .rst-content aside.footnote>span.backrefs,html.writer-html5 .rst-content div.citation>span.backrefs{grid-column-start:2;grid-column-end:3;grid-row-start:1;grid-row-end:3}html.writer-html5 .rst-content aside.citation>p,html.writer-html5 .rst-content aside.footnote>p,html.writer-html5 .rst-content div.citation>p{grid-column-start:4;grid-column-end:5}html.writer-html5 .rst-content dl.citation,html.writer-html5 .rst-content dl.field-list,html.writer-html5 .rst-content dl.footnote{margin-bottom:24px}html.writer-html5 .rst-content dl.citation>dt,html.writer-html5 .rst-content dl.field-list>dt,html.writer-html5 .rst-content dl.footnote>dt{padding-left:1rem}html.writer-html5 .rst-content dl.citation>dd,html.writer-html5 .rst-content dl.citation>dt,html.writer-html5 .rst-content dl.field-list>dd,html.writer-html5 .rst-content dl.field-list>dt,html.writer-html5 .rst-content dl.footnote>dd,html.writer-html5 .rst-content dl.footnote>dt{margin-bottom:0}html.writer-html5 .rst-content dl.citation,html.writer-html5 .rst-content dl.footnote{font-size:.9rem}html.writer-html5 .rst-content dl.citation>dt,html.writer-html5 .rst-content dl.footnote>dt{margin:0 .5rem .5rem 0;line-height:1.2rem;word-break:break-all;font-weight:400}html.writer-html5 .rst-content dl.citation>dt>span.brackets:before,html.writer-html5 .rst-content dl.footnote>dt>span.brackets:before{content:"["}html.writer-html5 .rst-content dl.citation>dt>span.brackets:after,html.writer-html5 .rst-content dl.footnote>dt>span.brackets:after{content:"]"}html.writer-html5 .rst-content dl.citation>dt>span.fn-backref,html.writer-html5 .rst-content 
dl.footnote>dt>span.fn-backref{text-align:left;font-style:italic;margin-left:.65rem;word-break:break-word;word-spacing:-.1rem;max-width:5rem}html.writer-html5 .rst-content dl.citation>dt>span.fn-backref>a,html.writer-html5 .rst-content dl.footnote>dt>span.fn-backref>a{word-break:keep-all}html.writer-html5 .rst-content dl.citation>dt>span.fn-backref>a:not(:first-child):before,html.writer-html5 .rst-content dl.footnote>dt>span.fn-backref>a:not(:first-child):before{content:" "}html.writer-html5 .rst-content dl.citation>dd,html.writer-html5 .rst-content dl.footnote>dd{margin:0 0 .5rem;line-height:1.2rem}html.writer-html5 .rst-content dl.citation>dd p,html.writer-html5 .rst-content dl.footnote>dd p{font-size:.9rem}html.writer-html5 .rst-content aside.citation,html.writer-html5 .rst-content aside.footnote,html.writer-html5 .rst-content div.citation{padding-left:1rem;padding-right:1rem;font-size:.9rem;line-height:1.2rem}html.writer-html5 .rst-content aside.citation p,html.writer-html5 .rst-content aside.footnote p,html.writer-html5 .rst-content div.citation p{font-size:.9rem;line-height:1.2rem;margin-bottom:12px}html.writer-html5 .rst-content aside.citation span.backrefs,html.writer-html5 .rst-content aside.footnote span.backrefs,html.writer-html5 .rst-content div.citation span.backrefs{text-align:left;font-style:italic;margin-left:.65rem;word-break:break-word;word-spacing:-.1rem;max-width:5rem}html.writer-html5 .rst-content aside.citation span.backrefs>a,html.writer-html5 .rst-content aside.footnote span.backrefs>a,html.writer-html5 .rst-content div.citation span.backrefs>a{word-break:keep-all}html.writer-html5 .rst-content aside.citation span.backrefs>a:not(:first-child):before,html.writer-html5 .rst-content aside.footnote span.backrefs>a:not(:first-child):before,html.writer-html5 .rst-content div.citation span.backrefs>a:not(:first-child):before{content:" "}html.writer-html5 .rst-content aside.citation span.label,html.writer-html5 .rst-content aside.footnote 
span.label,html.writer-html5 .rst-content div.citation span.label{line-height:1.2rem}html.writer-html5 .rst-content aside.citation-list,html.writer-html5 .rst-content aside.footnote-list,html.writer-html5 .rst-content div.citation-list{margin-bottom:24px}html.writer-html5 .rst-content dl.option-list kbd{font-size:.9rem}.rst-content table.docutils.footnote,html.writer-html4 .rst-content table.docutils.citation,html.writer-html5 .rst-content aside.footnote,html.writer-html5 .rst-content aside.footnote-list aside.footnote,html.writer-html5 .rst-content div.citation-list>div.citation,html.writer-html5 .rst-content dl.citation,html.writer-html5 .rst-content dl.footnote{color:grey}.rst-content table.docutils.footnote code,.rst-content table.docutils.footnote tt,html.writer-html4 .rst-content table.docutils.citation code,html.writer-html4 .rst-content table.docutils.citation tt,html.writer-html5 .rst-content aside.footnote-list aside.footnote code,html.writer-html5 .rst-content aside.footnote-list aside.footnote tt,html.writer-html5 .rst-content aside.footnote code,html.writer-html5 .rst-content aside.footnote tt,html.writer-html5 .rst-content div.citation-list>div.citation code,html.writer-html5 .rst-content div.citation-list>div.citation tt,html.writer-html5 .rst-content dl.citation code,html.writer-html5 .rst-content dl.citation tt,html.writer-html5 .rst-content dl.footnote code,html.writer-html5 .rst-content dl.footnote tt{color:#555}.rst-content .wy-table-responsive.citation,.rst-content .wy-table-responsive.footnote{margin-bottom:0}.rst-content .wy-table-responsive.citation+:not(.citation),.rst-content .wy-table-responsive.footnote+:not(.footnote){margin-top:24px}.rst-content .wy-table-responsive.citation:last-child,.rst-content .wy-table-responsive.footnote:last-child{margin-bottom:24px}.rst-content table.docutils th{border-color:#e1e4e5}html.writer-html5 .rst-content table.docutils th{border:1px solid #e1e4e5}html.writer-html5 .rst-content table.docutils 
td>p,html.writer-html5 .rst-content table.docutils th>p{line-height:1rem;margin-bottom:0;font-size:.9rem}.rst-content table.docutils td .last,.rst-content table.docutils td .last>:last-child{margin-bottom:0}.rst-content table.field-list,.rst-content table.field-list td{border:none}.rst-content table.field-list td p{line-height:inherit}.rst-content table.field-list td>strong{display:inline-block}.rst-content table.field-list .field-name{padding-right:10px;text-align:left;white-space:nowrap}.rst-content table.field-list .field-body{text-align:left}.rst-content code,.rst-content tt{color:#000;font-family:SFMono-Regular,Menlo,Monaco,Consolas,Liberation Mono,Courier New,Courier,monospace;padding:2px 5px}.rst-content code big,.rst-content code em,.rst-content tt big,.rst-content tt em{font-size:100%!important;line-height:normal}.rst-content code.literal,.rst-content tt.literal{color:#e74c3c;white-space:normal}.rst-content code.xref,.rst-content tt.xref,a .rst-content code,a .rst-content tt{font-weight:700;color:#404040;overflow-wrap:normal}.rst-content kbd,.rst-content pre,.rst-content samp{font-family:SFMono-Regular,Menlo,Monaco,Consolas,Liberation Mono,Courier New,Courier,monospace}.rst-content a code,.rst-content a tt{color:#2980b9}.rst-content dl{margin-bottom:24px}.rst-content dl dt{font-weight:700;margin-bottom:12px}.rst-content dl ol,.rst-content dl p,.rst-content dl table,.rst-content dl ul{margin-bottom:12px}.rst-content dl dd{margin:0 0 12px 24px;line-height:24px}.rst-content dl dd>ol:last-child,.rst-content dl dd>p:last-child,.rst-content dl dd>table:last-child,.rst-content dl dd>ul:last-child{margin-bottom:0}html.writer-html4 .rst-content dl:not(.docutils),html.writer-html5 .rst-content dl[class]:not(.option-list):not(.field-list):not(.footnote):not(.citation):not(.glossary):not(.simple){margin-bottom:24px}html.writer-html4 .rst-content dl:not(.docutils)>dt,html.writer-html5 .rst-content 
dl[class]:not(.option-list):not(.field-list):not(.footnote):not(.citation):not(.glossary):not(.simple)>dt{display:table;margin:6px 0;font-size:90%;line-height:normal;background:#e7f2fa;color:#2980b9;border-top:3px solid #6ab0de;padding:6px;position:relative}html.writer-html4 .rst-content dl:not(.docutils)>dt:before,html.writer-html5 .rst-content dl[class]:not(.option-list):not(.field-list):not(.footnote):not(.citation):not(.glossary):not(.simple)>dt:before{color:#6ab0de}html.writer-html4 .rst-content dl:not(.docutils)>dt .headerlink,html.writer-html5 .rst-content dl[class]:not(.option-list):not(.field-list):not(.footnote):not(.citation):not(.glossary):not(.simple)>dt .headerlink{color:#404040;font-size:100%!important}html.writer-html4 .rst-content dl:not(.docutils) dl:not(.option-list):not(.field-list):not(.footnote):not(.citation):not(.glossary):not(.simple)>dt,html.writer-html5 .rst-content dl[class]:not(.option-list):not(.field-list):not(.footnote):not(.citation):not(.glossary):not(.simple) dl:not(.option-list):not(.field-list):not(.footnote):not(.citation):not(.glossary):not(.simple)>dt{margin-bottom:6px;border:none;border-left:3px solid #ccc;background:#f0f0f0;color:#555}html.writer-html4 .rst-content dl:not(.docutils) dl:not(.option-list):not(.field-list):not(.footnote):not(.citation):not(.glossary):not(.simple)>dt .headerlink,html.writer-html5 .rst-content dl[class]:not(.option-list):not(.field-list):not(.footnote):not(.citation):not(.glossary):not(.simple) dl:not(.option-list):not(.field-list):not(.footnote):not(.citation):not(.glossary):not(.simple)>dt .headerlink{color:#404040;font-size:100%!important}html.writer-html4 .rst-content dl:not(.docutils)>dt:first-child,html.writer-html5 .rst-content dl[class]:not(.option-list):not(.field-list):not(.footnote):not(.citation):not(.glossary):not(.simple)>dt:first-child{margin-top:0}html.writer-html4 .rst-content dl:not(.docutils) code.descclassname,html.writer-html4 .rst-content dl:not(.docutils) 
code.descname,html.writer-html4 .rst-content dl:not(.docutils) tt.descclassname,html.writer-html4 .rst-content dl:not(.docutils) tt.descname,html.writer-html5 .rst-content dl[class]:not(.option-list):not(.field-list):not(.footnote):not(.citation):not(.glossary):not(.simple) code.descclassname,html.writer-html5 .rst-content dl[class]:not(.option-list):not(.field-list):not(.footnote):not(.citation):not(.glossary):not(.simple) code.descname,html.writer-html5 .rst-content dl[class]:not(.option-list):not(.field-list):not(.footnote):not(.citation):not(.glossary):not(.simple) tt.descclassname,html.writer-html5 .rst-content dl[class]:not(.option-list):not(.field-list):not(.footnote):not(.citation):not(.glossary):not(.simple) tt.descname{background-color:transparent;border:none;padding:0;font-size:100%!important}html.writer-html4 .rst-content dl:not(.docutils) code.descname,html.writer-html4 .rst-content dl:not(.docutils) tt.descname,html.writer-html5 .rst-content dl[class]:not(.option-list):not(.field-list):not(.footnote):not(.citation):not(.glossary):not(.simple) code.descname,html.writer-html5 .rst-content dl[class]:not(.option-list):not(.field-list):not(.footnote):not(.citation):not(.glossary):not(.simple) tt.descname{font-weight:700}html.writer-html4 .rst-content dl:not(.docutils) .optional,html.writer-html5 .rst-content dl[class]:not(.option-list):not(.field-list):not(.footnote):not(.citation):not(.glossary):not(.simple) .optional{display:inline-block;padding:0 4px;color:#000;font-weight:700}html.writer-html4 .rst-content dl:not(.docutils) .property,html.writer-html5 .rst-content dl[class]:not(.option-list):not(.field-list):not(.footnote):not(.citation):not(.glossary):not(.simple) .property{display:inline-block;padding-right:8px;max-width:100%}html.writer-html4 .rst-content dl:not(.docutils) .k,html.writer-html5 .rst-content dl[class]:not(.option-list):not(.field-list):not(.footnote):not(.citation):not(.glossary):not(.simple) .k{font-style:italic}html.writer-html4 
.rst-content dl:not(.docutils) .descclassname,html.writer-html4 .rst-content dl:not(.docutils) .descname,html.writer-html4 .rst-content dl:not(.docutils) .sig-name,html.writer-html5 .rst-content dl[class]:not(.option-list):not(.field-list):not(.footnote):not(.citation):not(.glossary):not(.simple) .descclassname,html.writer-html5 .rst-content dl[class]:not(.option-list):not(.field-list):not(.footnote):not(.citation):not(.glossary):not(.simple) .descname,html.writer-html5 .rst-content dl[class]:not(.option-list):not(.field-list):not(.footnote):not(.citation):not(.glossary):not(.simple) .sig-name{font-family:SFMono-Regular,Menlo,Monaco,Consolas,Liberation Mono,Courier New,Courier,monospace;color:#000}.rst-content .viewcode-back,.rst-content .viewcode-link{display:inline-block;color:#27ae60;font-size:80%;padding-left:24px}.rst-content .viewcode-back{display:block;float:right}.rst-content p.rubric{margin-bottom:12px;font-weight:700}.rst-content code.download,.rst-content tt.download{background:inherit;padding:inherit;font-weight:400;font-family:inherit;font-size:inherit;color:inherit;border:inherit;white-space:inherit}.rst-content code.download span:first-child,.rst-content tt.download span:first-child{-webkit-font-smoothing:subpixel-antialiased}.rst-content code.download span:first-child:before,.rst-content tt.download span:first-child:before{margin-right:4px}.rst-content .guilabel,.rst-content .menuselection{font-size:80%;font-weight:700;border-radius:4px;padding:2.4px 6px;margin:auto 2px}.rst-content .guilabel,.rst-content .menuselection{border:1px solid #7fbbe3;background:#e7f2fa}.rst-content :not(dl.option-list)>:not(dt):not(kbd):not(.kbd)>.kbd,.rst-content :not(dl.option-list)>:not(dt):not(kbd):not(.kbd)>kbd{color:inherit;font-size:80%;background-color:#fff;border:1px solid #a6a6a6;border-radius:4px;box-shadow:0 2px grey;padding:2.4px 6px;margin:auto 0}.rst-content .versionmodified{font-style:italic}@media screen and (max-width:480px){.rst-content 
.sidebar{width:100%}}span[id*=MathJax-Span]{color:#404040}.math{text-align:center}@font-face{font-family:Lato;src:url(fonts/lato-normal.woff2?bd03a2cc277bbbc338d464e679fe9942) format("woff2"),url(fonts/lato-normal.woff?27bd77b9162d388cb8d4c4217c7c5e2a) format("woff");font-weight:400;font-style:normal;font-display:block}@font-face{font-family:Lato;src:url(fonts/lato-bold.woff2?cccb897485813c7c256901dbca54ecf2) format("woff2"),url(fonts/lato-bold.woff?d878b6c29b10beca227e9eef4246111b) format("woff");font-weight:700;font-style:normal;font-display:block}@font-face{font-family:Lato;src:url(fonts/lato-bold-italic.woff2?0b6bb6725576b072c5d0b02ecdd1900d) format("woff2"),url(fonts/lato-bold-italic.woff?9c7e4e9eb485b4a121c760e61bc3707c) format("woff");font-weight:700;font-style:italic;font-display:block}@font-face{font-family:Lato;src:url(fonts/lato-normal-italic.woff2?4eb103b4d12be57cb1d040ed5e162e9d) format("woff2"),url(fonts/lato-normal-italic.woff?f28f2d6482446544ef1ea1ccc6dd5892) format("woff");font-weight:400;font-style:italic;font-display:block}@font-face{font-family:Roboto Slab;font-style:normal;font-weight:400;src:url(fonts/Roboto-Slab-Regular.woff2?7abf5b8d04d26a2cafea937019bca958) format("woff2"),url(fonts/Roboto-Slab-Regular.woff?c1be9284088d487c5e3ff0a10a92e58c) format("woff");font-display:block}@font-face{font-family:Roboto Slab;font-style:normal;font-weight:700;src:url(fonts/Roboto-Slab-Bold.woff2?9984f4a9bda09be08e83f2506954adbe) format("woff2"),url(fonts/Roboto-Slab-Bold.woff?bed5564a116b05148e3b3bea6fb1162a) format("woff");font-display:block} \ No newline at end of file diff --git a/review/pr-458/_static/doctools.js b/review/pr-458/_static/doctools.js deleted file mode 100644 index 527b876ca6..0000000000 --- a/review/pr-458/_static/doctools.js +++ /dev/null @@ -1,156 +0,0 @@ -/* - * doctools.js - * ~~~~~~~~~~~ - * - * Base JavaScript utilities for all Sphinx HTML documentation. - * - * :copyright: Copyright 2007-2022 by the Sphinx team, see AUTHORS. 
- * :license: BSD, see LICENSE for details. - * - */ -"use strict"; - -const BLACKLISTED_KEY_CONTROL_ELEMENTS = new Set([ - "TEXTAREA", - "INPUT", - "SELECT", - "BUTTON", -]); - -const _ready = (callback) => { - if (document.readyState !== "loading") { - callback(); - } else { - document.addEventListener("DOMContentLoaded", callback); - } -}; - -/** - * Small JavaScript module for the documentation. - */ -const Documentation = { - init: () => { - Documentation.initDomainIndexTable(); - Documentation.initOnKeyListeners(); - }, - - /** - * i18n support - */ - TRANSLATIONS: {}, - PLURAL_EXPR: (n) => (n === 1 ? 0 : 1), - LOCALE: "unknown", - - // gettext and ngettext don't access this so that the functions - // can safely bound to a different name (_ = Documentation.gettext) - gettext: (string) => { - const translated = Documentation.TRANSLATIONS[string]; - switch (typeof translated) { - case "undefined": - return string; // no translation - case "string": - return translated; // translation exists - default: - return translated[0]; // (singular, plural) translation tuple exists - } - }, - - ngettext: (singular, plural, n) => { - const translated = Documentation.TRANSLATIONS[singular]; - if (typeof translated !== "undefined") - return translated[Documentation.PLURAL_EXPR(n)]; - return n === 1 ? 
singular : plural; - }, - - addTranslations: (catalog) => { - Object.assign(Documentation.TRANSLATIONS, catalog.messages); - Documentation.PLURAL_EXPR = new Function( - "n", - `return (${catalog.plural_expr})` - ); - Documentation.LOCALE = catalog.locale; - }, - - /** - * helper function to focus on search bar - */ - focusSearchBar: () => { - document.querySelectorAll("input[name=q]")[0]?.focus(); - }, - - /** - * Initialise the domain index toggle buttons - */ - initDomainIndexTable: () => { - const toggler = (el) => { - const idNumber = el.id.substr(7); - const toggledRows = document.querySelectorAll(`tr.cg-${idNumber}`); - if (el.src.substr(-9) === "minus.png") { - el.src = `${el.src.substr(0, el.src.length - 9)}plus.png`; - toggledRows.forEach((el) => (el.style.display = "none")); - } else { - el.src = `${el.src.substr(0, el.src.length - 8)}minus.png`; - toggledRows.forEach((el) => (el.style.display = "")); - } - }; - - const togglerElements = document.querySelectorAll("img.toggler"); - togglerElements.forEach((el) => - el.addEventListener("click", (event) => toggler(event.currentTarget)) - ); - togglerElements.forEach((el) => (el.style.display = "")); - if (DOCUMENTATION_OPTIONS.COLLAPSE_INDEX) togglerElements.forEach(toggler); - }, - - initOnKeyListeners: () => { - // only install a listener if it is really needed - if ( - !DOCUMENTATION_OPTIONS.NAVIGATION_WITH_KEYS && - !DOCUMENTATION_OPTIONS.ENABLE_SEARCH_SHORTCUTS - ) - return; - - document.addEventListener("keydown", (event) => { - // bail for input elements - if (BLACKLISTED_KEY_CONTROL_ELEMENTS.has(document.activeElement.tagName)) return; - // bail with special keys - if (event.altKey || event.ctrlKey || event.metaKey) return; - - if (!event.shiftKey) { - switch (event.key) { - case "ArrowLeft": - if (!DOCUMENTATION_OPTIONS.NAVIGATION_WITH_KEYS) break; - - const prevLink = document.querySelector('link[rel="prev"]'); - if (prevLink && prevLink.href) { - window.location.href = prevLink.href; - 
event.preventDefault(); - } - break; - case "ArrowRight": - if (!DOCUMENTATION_OPTIONS.NAVIGATION_WITH_KEYS) break; - - const nextLink = document.querySelector('link[rel="next"]'); - if (nextLink && nextLink.href) { - window.location.href = nextLink.href; - event.preventDefault(); - } - break; - } - } - - // some keyboard layouts may need Shift to get / - switch (event.key) { - case "/": - if (!DOCUMENTATION_OPTIONS.ENABLE_SEARCH_SHORTCUTS) break; - Documentation.focusSearchBar(); - event.preventDefault(); - } - }); - }, -}; - -// quick alias for translations -const _ = Documentation.gettext; - -_ready(Documentation.init); diff --git a/review/pr-458/_static/documentation_options.js b/review/pr-458/_static/documentation_options.js deleted file mode 100644 index bcae2ea938..0000000000 --- a/review/pr-458/_static/documentation_options.js +++ /dev/null @@ -1,14 +0,0 @@ -var DOCUMENTATION_OPTIONS = { - URL_ROOT: document.getElementById("documentation_options").getAttribute('data-url_root'), - VERSION: '', - LANGUAGE: 'en', - COLLAPSE_INDEX: false, - BUILDER: 'html', - FILE_SUFFIX: '.html', - LINK_SUFFIX: '.html', - HAS_SOURCE: false, - SOURCELINK_SUFFIX: '.txt', - NAVIGATION_WITH_KEYS: false, - SHOW_SEARCH_SUMMARY: true, - ENABLE_SEARCH_SHORTCUTS: true, -}; \ No newline at end of file diff --git a/review/pr-458/_static/file.png b/review/pr-458/_static/file.png deleted file mode 100644 index a858a410e4..0000000000 Binary files a/review/pr-458/_static/file.png and /dev/null differ diff --git a/review/pr-458/_static/jquery-3.6.0.js b/review/pr-458/_static/jquery-3.6.0.js deleted file mode 100644 index fc6c299b73..0000000000 --- a/review/pr-458/_static/jquery-3.6.0.js +++ /dev/null @@ -1,10881 +0,0 @@ -/*! 
- * jQuery JavaScript Library v3.6.0 - * https://jquery.com/ - * - * Includes Sizzle.js - * https://sizzlejs.com/ - * - * Copyright OpenJS Foundation and other contributors - * Released under the MIT license - * https://jquery.org/license - * - * Date: 2021-03-02T17:08Z - */ -( function( global, factory ) { - - "use strict"; - - if ( typeof module === "object" && typeof module.exports === "object" ) { - - // For CommonJS and CommonJS-like environments where a proper `window` - // is present, execute the factory and get jQuery. - // For environments that do not have a `window` with a `document` - // (such as Node.js), expose a factory as module.exports. - // This accentuates the need for the creation of a real `window`. - // e.g. var jQuery = require("jquery")(window); - // See ticket #14549 for more info. - module.exports = global.document ? - factory( global, true ) : - function( w ) { - if ( !w.document ) { - throw new Error( "jQuery requires a window with a document" ); - } - return factory( w ); - }; - } else { - factory( global ); - } - -// Pass this if window is not defined yet -} )( typeof window !== "undefined" ? window : this, function( window, noGlobal ) { - -// Edge <= 12 - 13+, Firefox <=18 - 45+, IE 10 - 11, Safari 5.1 - 9+, iOS 6 - 9.1 -// throw exceptions when non-strict code (e.g., ASP.NET 4.5) accesses strict mode -// arguments.callee.caller (trac-13335). But as of jQuery 3.0 (2016), strict mode should be common -// enough that all such attempts are guarded in a try block. -"use strict"; - -var arr = []; - -var getProto = Object.getPrototypeOf; - -var slice = arr.slice; - -var flat = arr.flat ? 
function( array ) { - return arr.flat.call( array ); -} : function( array ) { - return arr.concat.apply( [], array ); -}; - - -var push = arr.push; - -var indexOf = arr.indexOf; - -var class2type = {}; - -var toString = class2type.toString; - -var hasOwn = class2type.hasOwnProperty; - -var fnToString = hasOwn.toString; - -var ObjectFunctionString = fnToString.call( Object ); - -var support = {}; - -var isFunction = function isFunction( obj ) { - - // Support: Chrome <=57, Firefox <=52 - // In some browsers, typeof returns "function" for HTML elements - // (i.e., `typeof document.createElement( "object" ) === "function"`). - // We don't want to classify *any* DOM node as a function. - // Support: QtWeb <=3.8.5, WebKit <=534.34, wkhtmltopdf tool <=0.12.5 - // Plus for old WebKit, typeof returns "function" for HTML collections - // (e.g., `typeof document.getElementsByTagName("div") === "function"`). (gh-4756) - return typeof obj === "function" && typeof obj.nodeType !== "number" && - typeof obj.item !== "function"; - }; - - -var isWindow = function isWindow( obj ) { - return obj != null && obj === obj.window; - }; - - -var document = window.document; - - - - var preservedScriptAttributes = { - type: true, - src: true, - nonce: true, - noModule: true - }; - - function DOMEval( code, node, doc ) { - doc = doc || document; - - var i, val, - script = doc.createElement( "script" ); - - script.text = code; - if ( node ) { - for ( i in preservedScriptAttributes ) { - - // Support: Firefox 64+, Edge 18+ - // Some browsers don't support the "nonce" property on scripts. - // On the other hand, just using `getAttribute` is not enough as - // the `nonce` attribute is reset to an empty string whenever it - // becomes browsing-context connected. 
- // See https://github.com/whatwg/html/issues/2369 - // See https://html.spec.whatwg.org/#nonce-attributes - // The `node.getAttribute` check was added for the sake of - // `jQuery.globalEval` so that it can fake a nonce-containing node - // via an object. - val = node[ i ] || node.getAttribute && node.getAttribute( i ); - if ( val ) { - script.setAttribute( i, val ); - } - } - } - doc.head.appendChild( script ).parentNode.removeChild( script ); - } - - -function toType( obj ) { - if ( obj == null ) { - return obj + ""; - } - - // Support: Android <=2.3 only (functionish RegExp) - return typeof obj === "object" || typeof obj === "function" ? - class2type[ toString.call( obj ) ] || "object" : - typeof obj; -} -/* global Symbol */ -// Defining this global in .eslintrc.json would create a danger of using the global -// unguarded in another place, it seems safer to define global only for this module - - - -var - version = "3.6.0", - - // Define a local copy of jQuery - jQuery = function( selector, context ) { - - // The jQuery object is actually just the init constructor 'enhanced' - // Need init if jQuery is called (just allow error to be thrown if not included) - return new jQuery.fn.init( selector, context ); - }; - -jQuery.fn = jQuery.prototype = { - - // The current version of jQuery being used - jquery: version, - - constructor: jQuery, - - // The default length of a jQuery object is 0 - length: 0, - - toArray: function() { - return slice.call( this ); - }, - - // Get the Nth element in the matched element set OR - // Get the whole matched element set as a clean array - get: function( num ) { - - // Return all the elements in a clean array - if ( num == null ) { - return slice.call( this ); - } - - // Return just the one element from the set - return num < 0 ? 
this[ num + this.length ] : this[ num ]; - }, - - // Take an array of elements and push it onto the stack - // (returning the new matched element set) - pushStack: function( elems ) { - - // Build a new jQuery matched element set - var ret = jQuery.merge( this.constructor(), elems ); - - // Add the old object onto the stack (as a reference) - ret.prevObject = this; - - // Return the newly-formed element set - return ret; - }, - - // Execute a callback for every element in the matched set. - each: function( callback ) { - return jQuery.each( this, callback ); - }, - - map: function( callback ) { - return this.pushStack( jQuery.map( this, function( elem, i ) { - return callback.call( elem, i, elem ); - } ) ); - }, - - slice: function() { - return this.pushStack( slice.apply( this, arguments ) ); - }, - - first: function() { - return this.eq( 0 ); - }, - - last: function() { - return this.eq( -1 ); - }, - - even: function() { - return this.pushStack( jQuery.grep( this, function( _elem, i ) { - return ( i + 1 ) % 2; - } ) ); - }, - - odd: function() { - return this.pushStack( jQuery.grep( this, function( _elem, i ) { - return i % 2; - } ) ); - }, - - eq: function( i ) { - var len = this.length, - j = +i + ( i < 0 ? len : 0 ); - return this.pushStack( j >= 0 && j < len ? [ this[ j ] ] : [] ); - }, - - end: function() { - return this.prevObject || this.constructor(); - }, - - // For internal use only. - // Behaves like an Array's method, not like a jQuery method. 
- push: push, - sort: arr.sort, - splice: arr.splice -}; - -jQuery.extend = jQuery.fn.extend = function() { - var options, name, src, copy, copyIsArray, clone, - target = arguments[ 0 ] || {}, - i = 1, - length = arguments.length, - deep = false; - - // Handle a deep copy situation - if ( typeof target === "boolean" ) { - deep = target; - - // Skip the boolean and the target - target = arguments[ i ] || {}; - i++; - } - - // Handle case when target is a string or something (possible in deep copy) - if ( typeof target !== "object" && !isFunction( target ) ) { - target = {}; - } - - // Extend jQuery itself if only one argument is passed - if ( i === length ) { - target = this; - i--; - } - - for ( ; i < length; i++ ) { - - // Only deal with non-null/undefined values - if ( ( options = arguments[ i ] ) != null ) { - - // Extend the base object - for ( name in options ) { - copy = options[ name ]; - - // Prevent Object.prototype pollution - // Prevent never-ending loop - if ( name === "__proto__" || target === copy ) { - continue; - } - - // Recurse if we're merging plain objects or arrays - if ( deep && copy && ( jQuery.isPlainObject( copy ) || - ( copyIsArray = Array.isArray( copy ) ) ) ) { - src = target[ name ]; - - // Ensure proper type for the source value - if ( copyIsArray && !Array.isArray( src ) ) { - clone = []; - } else if ( !copyIsArray && !jQuery.isPlainObject( src ) ) { - clone = {}; - } else { - clone = src; - } - copyIsArray = false; - - // Never move original objects, clone them - target[ name ] = jQuery.extend( deep, clone, copy ); - - // Don't bring in undefined values - } else if ( copy !== undefined ) { - target[ name ] = copy; - } - } - } - } - - // Return the modified object - return target; -}; - -jQuery.extend( { - - // Unique for each copy of jQuery on the page - expando: "jQuery" + ( version + Math.random() ).replace( /\D/g, "" ), - - // Assume jQuery is ready without the ready module - isReady: true, - - error: function( msg ) { - throw new 
Error( msg ); - }, - - noop: function() {}, - - isPlainObject: function( obj ) { - var proto, Ctor; - - // Detect obvious negatives - // Use toString instead of jQuery.type to catch host objects - if ( !obj || toString.call( obj ) !== "[object Object]" ) { - return false; - } - - proto = getProto( obj ); - - // Objects with no prototype (e.g., `Object.create( null )`) are plain - if ( !proto ) { - return true; - } - - // Objects with prototype are plain iff they were constructed by a global Object function - Ctor = hasOwn.call( proto, "constructor" ) && proto.constructor; - return typeof Ctor === "function" && fnToString.call( Ctor ) === ObjectFunctionString; - }, - - isEmptyObject: function( obj ) { - var name; - - for ( name in obj ) { - return false; - } - return true; - }, - - // Evaluates a script in a provided context; falls back to the global one - // if not specified. - globalEval: function( code, options, doc ) { - DOMEval( code, { nonce: options && options.nonce }, doc ); - }, - - each: function( obj, callback ) { - var length, i = 0; - - if ( isArrayLike( obj ) ) { - length = obj.length; - for ( ; i < length; i++ ) { - if ( callback.call( obj[ i ], i, obj[ i ] ) === false ) { - break; - } - } - } else { - for ( i in obj ) { - if ( callback.call( obj[ i ], i, obj[ i ] ) === false ) { - break; - } - } - } - - return obj; - }, - - // results is for internal usage only - makeArray: function( arr, results ) { - var ret = results || []; - - if ( arr != null ) { - if ( isArrayLike( Object( arr ) ) ) { - jQuery.merge( ret, - typeof arr === "string" ? - [ arr ] : arr - ); - } else { - push.call( ret, arr ); - } - } - - return ret; - }, - - inArray: function( elem, arr, i ) { - return arr == null ? 
-1 : indexOf.call( arr, elem, i ); - }, - - // Support: Android <=4.0 only, PhantomJS 1 only - // push.apply(_, arraylike) throws on ancient WebKit - merge: function( first, second ) { - var len = +second.length, - j = 0, - i = first.length; - - for ( ; j < len; j++ ) { - first[ i++ ] = second[ j ]; - } - - first.length = i; - - return first; - }, - - grep: function( elems, callback, invert ) { - var callbackInverse, - matches = [], - i = 0, - length = elems.length, - callbackExpect = !invert; - - // Go through the array, only saving the items - // that pass the validator function - for ( ; i < length; i++ ) { - callbackInverse = !callback( elems[ i ], i ); - if ( callbackInverse !== callbackExpect ) { - matches.push( elems[ i ] ); - } - } - - return matches; - }, - - // arg is for internal usage only - map: function( elems, callback, arg ) { - var length, value, - i = 0, - ret = []; - - // Go through the array, translating each of the items to their new values - if ( isArrayLike( elems ) ) { - length = elems.length; - for ( ; i < length; i++ ) { - value = callback( elems[ i ], i, arg ); - - if ( value != null ) { - ret.push( value ); - } - } - - // Go through every key on the object, - } else { - for ( i in elems ) { - value = callback( elems[ i ], i, arg ); - - if ( value != null ) { - ret.push( value ); - } - } - } - - // Flatten any nested arrays - return flat( ret ); - }, - - // A global GUID counter for objects - guid: 1, - - // jQuery.support is not used in Core but other projects attach their - // properties to it so it needs to exist. 
- support: support -} ); - -if ( typeof Symbol === "function" ) { - jQuery.fn[ Symbol.iterator ] = arr[ Symbol.iterator ]; -} - -// Populate the class2type map -jQuery.each( "Boolean Number String Function Array Date RegExp Object Error Symbol".split( " " ), - function( _i, name ) { - class2type[ "[object " + name + "]" ] = name.toLowerCase(); - } ); - -function isArrayLike( obj ) { - - // Support: real iOS 8.2 only (not reproducible in simulator) - // `in` check used to prevent JIT error (gh-2145) - // hasOwn isn't used here due to false negatives - // regarding Nodelist length in IE - var length = !!obj && "length" in obj && obj.length, - type = toType( obj ); - - if ( isFunction( obj ) || isWindow( obj ) ) { - return false; - } - - return type === "array" || length === 0 || - typeof length === "number" && length > 0 && ( length - 1 ) in obj; -} -var Sizzle = -/*! - * Sizzle CSS Selector Engine v2.3.6 - * https://sizzlejs.com/ - * - * Copyright JS Foundation and other contributors - * Released under the MIT license - * https://js.foundation/ - * - * Date: 2021-02-16 - */ -( function( window ) { -var i, - support, - Expr, - getText, - isXML, - tokenize, - compile, - select, - outermostContext, - sortInput, - hasDuplicate, - - // Local document vars - setDocument, - document, - docElem, - documentIsHTML, - rbuggyQSA, - rbuggyMatches, - matches, - contains, - - // Instance-specific data - expando = "sizzle" + 1 * new Date(), - preferredDoc = window.document, - dirruns = 0, - done = 0, - classCache = createCache(), - tokenCache = createCache(), - compilerCache = createCache(), - nonnativeSelectorCache = createCache(), - sortOrder = function( a, b ) { - if ( a === b ) { - hasDuplicate = true; - } - return 0; - }, - - // Instance methods - hasOwn = ( {} ).hasOwnProperty, - arr = [], - pop = arr.pop, - pushNative = arr.push, - push = arr.push, - slice = arr.slice, - - // Use a stripped-down indexOf as it's faster than native - // https://jsperf.com/thor-indexof-vs-for/5 
- indexOf = function( list, elem ) { - var i = 0, - len = list.length; - for ( ; i < len; i++ ) { - if ( list[ i ] === elem ) { - return i; - } - } - return -1; - }, - - booleans = "checked|selected|async|autofocus|autoplay|controls|defer|disabled|hidden|" + - "ismap|loop|multiple|open|readonly|required|scoped", - - // Regular expressions - - // http://www.w3.org/TR/css3-selectors/#whitespace - whitespace = "[\\x20\\t\\r\\n\\f]", - - // https://www.w3.org/TR/css-syntax-3/#ident-token-diagram - identifier = "(?:\\\\[\\da-fA-F]{1,6}" + whitespace + - "?|\\\\[^\\r\\n\\f]|[\\w-]|[^\0-\\x7f])+", - - // Attribute selectors: http://www.w3.org/TR/selectors/#attribute-selectors - attributes = "\\[" + whitespace + "*(" + identifier + ")(?:" + whitespace + - - // Operator (capture 2) - "*([*^$|!~]?=)" + whitespace + - - // "Attribute values must be CSS identifiers [capture 5] - // or strings [capture 3 or capture 4]" - "*(?:'((?:\\\\.|[^\\\\'])*)'|\"((?:\\\\.|[^\\\\\"])*)\"|(" + identifier + "))|)" + - whitespace + "*\\]", - - pseudos = ":(" + identifier + ")(?:\\((" + - - // To reduce the number of selectors needing tokenize in the preFilter, prefer arguments: - // 1. quoted (capture 3; capture 4 or capture 5) - "('((?:\\\\.|[^\\\\'])*)'|\"((?:\\\\.|[^\\\\\"])*)\")|" + - - // 2. simple (capture 6) - "((?:\\\\.|[^\\\\()[\\]]|" + attributes + ")*)|" + - - // 3. 
anything else (capture 2) - ".*" + - ")\\)|)", - - // Leading and non-escaped trailing whitespace, capturing some non-whitespace characters preceding the latter - rwhitespace = new RegExp( whitespace + "+", "g" ), - rtrim = new RegExp( "^" + whitespace + "+|((?:^|[^\\\\])(?:\\\\.)*)" + - whitespace + "+$", "g" ), - - rcomma = new RegExp( "^" + whitespace + "*," + whitespace + "*" ), - rcombinators = new RegExp( "^" + whitespace + "*([>+~]|" + whitespace + ")" + whitespace + - "*" ), - rdescend = new RegExp( whitespace + "|>" ), - - rpseudo = new RegExp( pseudos ), - ridentifier = new RegExp( "^" + identifier + "$" ), - - matchExpr = { - "ID": new RegExp( "^#(" + identifier + ")" ), - "CLASS": new RegExp( "^\\.(" + identifier + ")" ), - "TAG": new RegExp( "^(" + identifier + "|[*])" ), - "ATTR": new RegExp( "^" + attributes ), - "PSEUDO": new RegExp( "^" + pseudos ), - "CHILD": new RegExp( "^:(only|first|last|nth|nth-last)-(child|of-type)(?:\\(" + - whitespace + "*(even|odd|(([+-]|)(\\d*)n|)" + whitespace + "*(?:([+-]|)" + - whitespace + "*(\\d+)|))" + whitespace + "*\\)|)", "i" ), - "bool": new RegExp( "^(?:" + booleans + ")$", "i" ), - - // For use in libraries implementing .is() - // We use this for POS matching in `select` - "needsContext": new RegExp( "^" + whitespace + - "*[>+~]|:(even|odd|eq|gt|lt|nth|first|last)(?:\\(" + whitespace + - "*((?:-\\d)?\\d*)" + whitespace + "*\\)|)(?=[^-]|$)", "i" ) - }, - - rhtml = /HTML$/i, - rinputs = /^(?:input|select|textarea|button)$/i, - rheader = /^h\d$/i, - - rnative = /^[^{]+\{\s*\[native \w/, - - // Easily-parseable/retrievable ID or TAG or CLASS selectors - rquickExpr = /^(?:#([\w-]+)|(\w+)|\.([\w-]+))$/, - - rsibling = /[+~]/, - - // CSS escapes - // http://www.w3.org/TR/CSS21/syndata.html#escaped-characters - runescape = new RegExp( "\\\\[\\da-fA-F]{1,6}" + whitespace + "?|\\\\([^\\r\\n\\f])", "g" ), - funescape = function( escape, nonHex ) { - var high = "0x" + escape.slice( 1 ) - 0x10000; - - return nonHex ? 
- - // Strip the backslash prefix from a non-hex escape sequence - nonHex : - - // Replace a hexadecimal escape sequence with the encoded Unicode code point - // Support: IE <=11+ - // For values outside the Basic Multilingual Plane (BMP), manually construct a - // surrogate pair - high < 0 ? - String.fromCharCode( high + 0x10000 ) : - String.fromCharCode( high >> 10 | 0xD800, high & 0x3FF | 0xDC00 ); - }, - - // CSS string/identifier serialization - // https://drafts.csswg.org/cssom/#common-serializing-idioms - rcssescape = /([\0-\x1f\x7f]|^-?\d)|^-$|[^\0-\x1f\x7f-\uFFFF\w-]/g, - fcssescape = function( ch, asCodePoint ) { - if ( asCodePoint ) { - - // U+0000 NULL becomes U+FFFD REPLACEMENT CHARACTER - if ( ch === "\0" ) { - return "\uFFFD"; - } - - // Control characters and (dependent upon position) numbers get escaped as code points - return ch.slice( 0, -1 ) + "\\" + - ch.charCodeAt( ch.length - 1 ).toString( 16 ) + " "; - } - - // Other potentially-special ASCII characters get backslash-escaped - return "\\" + ch; - }, - - // Used for iframes - // See setDocument() - // Removing the function wrapper causes a "Permission Denied" - // error in IE - unloadHandler = function() { - setDocument(); - }, - - inDisabledFieldset = addCombinator( - function( elem ) { - return elem.disabled === true && elem.nodeName.toLowerCase() === "fieldset"; - }, - { dir: "parentNode", next: "legend" } - ); - -// Optimize for push.apply( _, NodeList ) -try { - push.apply( - ( arr = slice.call( preferredDoc.childNodes ) ), - preferredDoc.childNodes - ); - - // Support: Android<4.0 - // Detect silently failing push.apply - // eslint-disable-next-line no-unused-expressions - arr[ preferredDoc.childNodes.length ].nodeType; -} catch ( e ) { - push = { apply: arr.length ? 
- - // Leverage slice if possible - function( target, els ) { - pushNative.apply( target, slice.call( els ) ); - } : - - // Support: IE<9 - // Otherwise append directly - function( target, els ) { - var j = target.length, - i = 0; - - // Can't trust NodeList.length - while ( ( target[ j++ ] = els[ i++ ] ) ) {} - target.length = j - 1; - } - }; -} - -function Sizzle( selector, context, results, seed ) { - var m, i, elem, nid, match, groups, newSelector, - newContext = context && context.ownerDocument, - - // nodeType defaults to 9, since context defaults to document - nodeType = context ? context.nodeType : 9; - - results = results || []; - - // Return early from calls with invalid selector or context - if ( typeof selector !== "string" || !selector || - nodeType !== 1 && nodeType !== 9 && nodeType !== 11 ) { - - return results; - } - - // Try to shortcut find operations (as opposed to filters) in HTML documents - if ( !seed ) { - setDocument( context ); - context = context || document; - - if ( documentIsHTML ) { - - // If the selector is sufficiently simple, try using a "get*By*" DOM method - // (excepting DocumentFragment context, where the methods don't exist) - if ( nodeType !== 11 && ( match = rquickExpr.exec( selector ) ) ) { - - // ID selector - if ( ( m = match[ 1 ] ) ) { - - // Document context - if ( nodeType === 9 ) { - if ( ( elem = context.getElementById( m ) ) ) { - - // Support: IE, Opera, Webkit - // TODO: identify versions - // getElementById can match elements by name instead of ID - if ( elem.id === m ) { - results.push( elem ); - return results; - } - } else { - return results; - } - - // Element context - } else { - - // Support: IE, Opera, Webkit - // TODO: identify versions - // getElementById can match elements by name instead of ID - if ( newContext && ( elem = newContext.getElementById( m ) ) && - contains( context, elem ) && - elem.id === m ) { - - results.push( elem ); - return results; - } - } - - // Type selector - } else if ( match[ 2 
] ) { - push.apply( results, context.getElementsByTagName( selector ) ); - return results; - - // Class selector - } else if ( ( m = match[ 3 ] ) && support.getElementsByClassName && - context.getElementsByClassName ) { - - push.apply( results, context.getElementsByClassName( m ) ); - return results; - } - } - - // Take advantage of querySelectorAll - if ( support.qsa && - !nonnativeSelectorCache[ selector + " " ] && - ( !rbuggyQSA || !rbuggyQSA.test( selector ) ) && - - // Support: IE 8 only - // Exclude object elements - ( nodeType !== 1 || context.nodeName.toLowerCase() !== "object" ) ) { - - newSelector = selector; - newContext = context; - - // qSA considers elements outside a scoping root when evaluating child or - // descendant combinators, which is not what we want. - // In such cases, we work around the behavior by prefixing every selector in the - // list with an ID selector referencing the scope context. - // The technique has to be used as well when a leading combinator is used - // as such selectors are not recognized by querySelectorAll. - // Thanks to Andrew Dupont for this technique. - if ( nodeType === 1 && - ( rdescend.test( selector ) || rcombinators.test( selector ) ) ) { - - // Expand context for sibling selectors - newContext = rsibling.test( selector ) && testContext( context.parentNode ) || - context; - - // We can use :scope instead of the ID hack if the browser - // supports it & if we're not changing the context. - if ( newContext !== context || !support.scope ) { - - // Capture the context ID, setting it first if necessary - if ( ( nid = context.getAttribute( "id" ) ) ) { - nid = nid.replace( rcssescape, fcssescape ); - } else { - context.setAttribute( "id", ( nid = expando ) ); - } - } - - // Prefix every selector in the list - groups = tokenize( selector ); - i = groups.length; - while ( i-- ) { - groups[ i ] = ( nid ? 
"#" + nid : ":scope" ) + " " + - toSelector( groups[ i ] ); - } - newSelector = groups.join( "," ); - } - - try { - push.apply( results, - newContext.querySelectorAll( newSelector ) - ); - return results; - } catch ( qsaError ) { - nonnativeSelectorCache( selector, true ); - } finally { - if ( nid === expando ) { - context.removeAttribute( "id" ); - } - } - } - } - } - - // All others - return select( selector.replace( rtrim, "$1" ), context, results, seed ); -} - -/** - * Create key-value caches of limited size - * @returns {function(string, object)} Returns the Object data after storing it on itself with - * property name the (space-suffixed) string and (if the cache is larger than Expr.cacheLength) - * deleting the oldest entry - */ -function createCache() { - var keys = []; - - function cache( key, value ) { - - // Use (key + " ") to avoid collision with native prototype properties (see Issue #157) - if ( keys.push( key + " " ) > Expr.cacheLength ) { - - // Only keep the most recent entries - delete cache[ keys.shift() ]; - } - return ( cache[ key + " " ] = value ); - } - return cache; -} - -/** - * Mark a function for special use by Sizzle - * @param {Function} fn The function to mark - */ -function markFunction( fn ) { - fn[ expando ] = true; - return fn; -} - -/** - * Support testing using an element - * @param {Function} fn Passed the created element and returns a boolean result - */ -function assert( fn ) { - var el = document.createElement( "fieldset" ); - - try { - return !!fn( el ); - } catch ( e ) { - return false; - } finally { - - // Remove from its parent by default - if ( el.parentNode ) { - el.parentNode.removeChild( el ); - } - - // release memory in IE - el = null; - } -} - -/** - * Adds the same handler for all of the specified attrs - * @param {String} attrs Pipe-separated list of attributes - * @param {Function} handler The method that will be applied - */ -function addHandle( attrs, handler ) { - var arr = attrs.split( "|" ), - i = 
arr.length; - - while ( i-- ) { - Expr.attrHandle[ arr[ i ] ] = handler; - } -} - -/** - * Checks document order of two siblings - * @param {Element} a - * @param {Element} b - * @returns {Number} Returns less than 0 if a precedes b, greater than 0 if a follows b - */ -function siblingCheck( a, b ) { - var cur = b && a, - diff = cur && a.nodeType === 1 && b.nodeType === 1 && - a.sourceIndex - b.sourceIndex; - - // Use IE sourceIndex if available on both nodes - if ( diff ) { - return diff; - } - - // Check if b follows a - if ( cur ) { - while ( ( cur = cur.nextSibling ) ) { - if ( cur === b ) { - return -1; - } - } - } - - return a ? 1 : -1; -} - -/** - * Returns a function to use in pseudos for input types - * @param {String} type - */ -function createInputPseudo( type ) { - return function( elem ) { - var name = elem.nodeName.toLowerCase(); - return name === "input" && elem.type === type; - }; -} - -/** - * Returns a function to use in pseudos for buttons - * @param {String} type - */ -function createButtonPseudo( type ) { - return function( elem ) { - var name = elem.nodeName.toLowerCase(); - return ( name === "input" || name === "button" ) && elem.type === type; - }; -} - -/** - * Returns a function to use in pseudos for :enabled/:disabled - * @param {Boolean} disabled true for :disabled; false for :enabled - */ -function createDisabledPseudo( disabled ) { - - // Known :disabled false positives: fieldset[disabled] > legend:nth-of-type(n+2) :can-disable - return function( elem ) { - - // Only certain elements can match :enabled or :disabled - // https://html.spec.whatwg.org/multipage/scripting.html#selector-enabled - // https://html.spec.whatwg.org/multipage/scripting.html#selector-disabled - if ( "form" in elem ) { - - // Check for inherited disabledness on relevant non-disabled elements: - // * listed form-associated elements in a disabled fieldset - // https://html.spec.whatwg.org/multipage/forms.html#category-listed - // 
https://html.spec.whatwg.org/multipage/forms.html#concept-fe-disabled - // * option elements in a disabled optgroup - // https://html.spec.whatwg.org/multipage/forms.html#concept-option-disabled - // All such elements have a "form" property. - if ( elem.parentNode && elem.disabled === false ) { - - // Option elements defer to a parent optgroup if present - if ( "label" in elem ) { - if ( "label" in elem.parentNode ) { - return elem.parentNode.disabled === disabled; - } else { - return elem.disabled === disabled; - } - } - - // Support: IE 6 - 11 - // Use the isDisabled shortcut property to check for disabled fieldset ancestors - return elem.isDisabled === disabled || - - // Where there is no isDisabled, check manually - /* jshint -W018 */ - elem.isDisabled !== !disabled && - inDisabledFieldset( elem ) === disabled; - } - - return elem.disabled === disabled; - - // Try to winnow out elements that can't be disabled before trusting the disabled property. - // Some victims get caught in our net (label, legend, menu, track), but it shouldn't - // even exist on them, let alone have a boolean value. 
- } else if ( "label" in elem ) { - return elem.disabled === disabled; - } - - // Remaining elements are neither :enabled nor :disabled - return false; - }; -} - -/** - * Returns a function to use in pseudos for positionals - * @param {Function} fn - */ -function createPositionalPseudo( fn ) { - return markFunction( function( argument ) { - argument = +argument; - return markFunction( function( seed, matches ) { - var j, - matchIndexes = fn( [], seed.length, argument ), - i = matchIndexes.length; - - // Match elements found at the specified indexes - while ( i-- ) { - if ( seed[ ( j = matchIndexes[ i ] ) ] ) { - seed[ j ] = !( matches[ j ] = seed[ j ] ); - } - } - } ); - } ); -} - -/** - * Checks a node for validity as a Sizzle context - * @param {Element|Object=} context - * @returns {Element|Object|Boolean} The input node if acceptable, otherwise a falsy value - */ -function testContext( context ) { - return context && typeof context.getElementsByTagName !== "undefined" && context; -} - -// Expose support vars for convenience -support = Sizzle.support = {}; - -/** - * Detects XML nodes - * @param {Element|Object} elem An element or a document - * @returns {Boolean} True iff elem is a non-HTML XML node - */ -isXML = Sizzle.isXML = function( elem ) { - var namespace = elem && elem.namespaceURI, - docElem = elem && ( elem.ownerDocument || elem ).documentElement; - - // Support: IE <=8 - // Assume HTML when documentElement doesn't yet exist, such as inside loading iframes - // https://bugs.jquery.com/ticket/4833 - return !rhtml.test( namespace || docElem && docElem.nodeName || "HTML" ); -}; - -/** - * Sets document-related variables once based on the current document - * @param {Element|Object} [doc] An element or document object to use to set the document - * @returns {Object} Returns the current document - */ -setDocument = Sizzle.setDocument = function( node ) { - var hasCompare, subWindow, - doc = node ? 
node.ownerDocument || node : preferredDoc; - - // Return early if doc is invalid or already selected - // Support: IE 11+, Edge 17 - 18+ - // IE/Edge sometimes throw a "Permission denied" error when strict-comparing - // two documents; shallow comparisons work. - // eslint-disable-next-line eqeqeq - if ( doc == document || doc.nodeType !== 9 || !doc.documentElement ) { - return document; - } - - // Update global variables - document = doc; - docElem = document.documentElement; - documentIsHTML = !isXML( document ); - - // Support: IE 9 - 11+, Edge 12 - 18+ - // Accessing iframe documents after unload throws "permission denied" errors (jQuery #13936) - // Support: IE 11+, Edge 17 - 18+ - // IE/Edge sometimes throw a "Permission denied" error when strict-comparing - // two documents; shallow comparisons work. - // eslint-disable-next-line eqeqeq - if ( preferredDoc != document && - ( subWindow = document.defaultView ) && subWindow.top !== subWindow ) { - - // Support: IE 11, Edge - if ( subWindow.addEventListener ) { - subWindow.addEventListener( "unload", unloadHandler, false ); - - // Support: IE 9 - 10 only - } else if ( subWindow.attachEvent ) { - subWindow.attachEvent( "onunload", unloadHandler ); - } - } - - // Support: IE 8 - 11+, Edge 12 - 18+, Chrome <=16 - 25 only, Firefox <=3.6 - 31 only, - // Safari 4 - 5 only, Opera <=11.6 - 12.x only - // IE/Edge & older browsers don't support the :scope pseudo-class. - // Support: Safari 6.0 only - // Safari 6.0 supports :scope but it's an alias of :root there. 
- support.scope = assert( function( el ) { - docElem.appendChild( el ).appendChild( document.createElement( "div" ) ); - return typeof el.querySelectorAll !== "undefined" && - !el.querySelectorAll( ":scope fieldset div" ).length; - } ); - - /* Attributes - ---------------------------------------------------------------------- */ - - // Support: IE<8 - // Verify that getAttribute really returns attributes and not properties - // (excepting IE8 booleans) - support.attributes = assert( function( el ) { - el.className = "i"; - return !el.getAttribute( "className" ); - } ); - - /* getElement(s)By* - ---------------------------------------------------------------------- */ - - // Check if getElementsByTagName("*") returns only elements - support.getElementsByTagName = assert( function( el ) { - el.appendChild( document.createComment( "" ) ); - return !el.getElementsByTagName( "*" ).length; - } ); - - // Support: IE<9 - support.getElementsByClassName = rnative.test( document.getElementsByClassName ); - - // Support: IE<10 - // Check if getElementById returns elements by name - // The broken getElementById methods don't pick up programmatically-set names, - // so use a roundabout getElementsByName test - support.getById = assert( function( el ) { - docElem.appendChild( el ).id = expando; - return !document.getElementsByName || !document.getElementsByName( expando ).length; - } ); - - // ID filter and find - if ( support.getById ) { - Expr.filter[ "ID" ] = function( id ) { - var attrId = id.replace( runescape, funescape ); - return function( elem ) { - return elem.getAttribute( "id" ) === attrId; - }; - }; - Expr.find[ "ID" ] = function( id, context ) { - if ( typeof context.getElementById !== "undefined" && documentIsHTML ) { - var elem = context.getElementById( id ); - return elem ? 
[ elem ] : []; - } - }; - } else { - Expr.filter[ "ID" ] = function( id ) { - var attrId = id.replace( runescape, funescape ); - return function( elem ) { - var node = typeof elem.getAttributeNode !== "undefined" && - elem.getAttributeNode( "id" ); - return node && node.value === attrId; - }; - }; - - // Support: IE 6 - 7 only - // getElementById is not reliable as a find shortcut - Expr.find[ "ID" ] = function( id, context ) { - if ( typeof context.getElementById !== "undefined" && documentIsHTML ) { - var node, i, elems, - elem = context.getElementById( id ); - - if ( elem ) { - - // Verify the id attribute - node = elem.getAttributeNode( "id" ); - if ( node && node.value === id ) { - return [ elem ]; - } - - // Fall back on getElementsByName - elems = context.getElementsByName( id ); - i = 0; - while ( ( elem = elems[ i++ ] ) ) { - node = elem.getAttributeNode( "id" ); - if ( node && node.value === id ) { - return [ elem ]; - } - } - } - - return []; - } - }; - } - - // Tag - Expr.find[ "TAG" ] = support.getElementsByTagName ? 
- function( tag, context ) { - if ( typeof context.getElementsByTagName !== "undefined" ) { - return context.getElementsByTagName( tag ); - - // DocumentFragment nodes don't have gEBTN - } else if ( support.qsa ) { - return context.querySelectorAll( tag ); - } - } : - - function( tag, context ) { - var elem, - tmp = [], - i = 0, - - // By happy coincidence, a (broken) gEBTN appears on DocumentFragment nodes too - results = context.getElementsByTagName( tag ); - - // Filter out possible comments - if ( tag === "*" ) { - while ( ( elem = results[ i++ ] ) ) { - if ( elem.nodeType === 1 ) { - tmp.push( elem ); - } - } - - return tmp; - } - return results; - }; - - // Class - Expr.find[ "CLASS" ] = support.getElementsByClassName && function( className, context ) { - if ( typeof context.getElementsByClassName !== "undefined" && documentIsHTML ) { - return context.getElementsByClassName( className ); - } - }; - - /* QSA/matchesSelector - ---------------------------------------------------------------------- */ - - // QSA and matchesSelector support - - // matchesSelector(:active) reports false when true (IE9/Opera 11.5) - rbuggyMatches = []; - - // qSa(:focus) reports false when true (Chrome 21) - // We allow this because of a bug in IE8/9 that throws an error - // whenever `document.activeElement` is accessed on an iframe - // So, we allow :focus to pass through QSA all the time to avoid the IE error - // See https://bugs.jquery.com/ticket/13378 - rbuggyQSA = []; - - if ( ( support.qsa = rnative.test( document.querySelectorAll ) ) ) { - - // Build QSA regex - // Regex strategy adopted from Diego Perini - assert( function( el ) { - - var input; - - // Select is set to empty string on purpose - // This is to test IE's treatment of not explicitly - // setting a boolean content attribute, - // since its presence should be enough - // https://bugs.jquery.com/ticket/12359 - docElem.appendChild( el ).innerHTML = "" + - ""; - - // Support: IE8, Opera 11-12.16 - // Nothing should 
be selected when empty strings follow ^= or $= or *= - // The test attribute must be unknown in Opera but "safe" for WinRT - // https://msdn.microsoft.com/en-us/library/ie/hh465388.aspx#attribute_section - if ( el.querySelectorAll( "[msallowcapture^='']" ).length ) { - rbuggyQSA.push( "[*^$]=" + whitespace + "*(?:''|\"\")" ); - } - - // Support: IE8 - // Boolean attributes and "value" are not treated correctly - if ( !el.querySelectorAll( "[selected]" ).length ) { - rbuggyQSA.push( "\\[" + whitespace + "*(?:value|" + booleans + ")" ); - } - - // Support: Chrome<29, Android<4.4, Safari<7.0+, iOS<7.0+, PhantomJS<1.9.8+ - if ( !el.querySelectorAll( "[id~=" + expando + "-]" ).length ) { - rbuggyQSA.push( "~=" ); - } - - // Support: IE 11+, Edge 15 - 18+ - // IE 11/Edge don't find elements on a `[name='']` query in some cases. - // Adding a temporary attribute to the document before the selection works - // around the issue. - // Interestingly, IE 10 & older don't seem to have the issue. - input = document.createElement( "input" ); - input.setAttribute( "name", "" ); - el.appendChild( input ); - if ( !el.querySelectorAll( "[name='']" ).length ) { - rbuggyQSA.push( "\\[" + whitespace + "*name" + whitespace + "*=" + - whitespace + "*(?:''|\"\")" ); - } - - // Webkit/Opera - :checked should return selected option elements - // http://www.w3.org/TR/2011/REC-css3-selectors-20110929/#checked - // IE8 throws error here and will not see later tests - if ( !el.querySelectorAll( ":checked" ).length ) { - rbuggyQSA.push( ":checked" ); - } - - // Support: Safari 8+, iOS 8+ - // https://bugs.webkit.org/show_bug.cgi?id=136851 - // In-page `selector#id sibling-combinator selector` fails - if ( !el.querySelectorAll( "a#" + expando + "+*" ).length ) { - rbuggyQSA.push( ".#.+[+~]" ); - } - - // Support: Firefox <=3.6 - 5 only - // Old Firefox doesn't throw on a badly-escaped identifier. 
- el.querySelectorAll( "\\\f" ); - rbuggyQSA.push( "[\\r\\n\\f]" ); - } ); - - assert( function( el ) { - el.innerHTML = "" + - ""; - - // Support: Windows 8 Native Apps - // The type and name attributes are restricted during .innerHTML assignment - var input = document.createElement( "input" ); - input.setAttribute( "type", "hidden" ); - el.appendChild( input ).setAttribute( "name", "D" ); - - // Support: IE8 - // Enforce case-sensitivity of name attribute - if ( el.querySelectorAll( "[name=d]" ).length ) { - rbuggyQSA.push( "name" + whitespace + "*[*^$|!~]?=" ); - } - - // FF 3.5 - :enabled/:disabled and hidden elements (hidden elements are still enabled) - // IE8 throws error here and will not see later tests - if ( el.querySelectorAll( ":enabled" ).length !== 2 ) { - rbuggyQSA.push( ":enabled", ":disabled" ); - } - - // Support: IE9-11+ - // IE's :disabled selector does not pick up the children of disabled fieldsets - docElem.appendChild( el ).disabled = true; - if ( el.querySelectorAll( ":disabled" ).length !== 2 ) { - rbuggyQSA.push( ":enabled", ":disabled" ); - } - - // Support: Opera 10 - 11 only - // Opera 10-11 does not throw on post-comma invalid pseudos - el.querySelectorAll( "*,:x" ); - rbuggyQSA.push( ",.*:" ); - } ); - } - - if ( ( support.matchesSelector = rnative.test( ( matches = docElem.matches || - docElem.webkitMatchesSelector || - docElem.mozMatchesSelector || - docElem.oMatchesSelector || - docElem.msMatchesSelector ) ) ) ) { - - assert( function( el ) { - - // Check to see if it's possible to do matchesSelector - // on a disconnected node (IE 9) - support.disconnectedMatch = matches.call( el, "*" ); - - // This should fail with an exception - // Gecko does not error, returns false instead - matches.call( el, "[s!='']:x" ); - rbuggyMatches.push( "!=", pseudos ); - } ); - } - - rbuggyQSA = rbuggyQSA.length && new RegExp( rbuggyQSA.join( "|" ) ); - rbuggyMatches = rbuggyMatches.length && new RegExp( rbuggyMatches.join( "|" ) ); - - /* Contains 
- ---------------------------------------------------------------------- */ - hasCompare = rnative.test( docElem.compareDocumentPosition ); - - // Element contains another - // Purposefully self-exclusive - // As in, an element does not contain itself - contains = hasCompare || rnative.test( docElem.contains ) ? - function( a, b ) { - var adown = a.nodeType === 9 ? a.documentElement : a, - bup = b && b.parentNode; - return a === bup || !!( bup && bup.nodeType === 1 && ( - adown.contains ? - adown.contains( bup ) : - a.compareDocumentPosition && a.compareDocumentPosition( bup ) & 16 - ) ); - } : - function( a, b ) { - if ( b ) { - while ( ( b = b.parentNode ) ) { - if ( b === a ) { - return true; - } - } - } - return false; - }; - - /* Sorting - ---------------------------------------------------------------------- */ - - // Document order sorting - sortOrder = hasCompare ? - function( a, b ) { - - // Flag for duplicate removal - if ( a === b ) { - hasDuplicate = true; - return 0; - } - - // Sort on method existence if only one input has compareDocumentPosition - var compare = !a.compareDocumentPosition - !b.compareDocumentPosition; - if ( compare ) { - return compare; - } - - // Calculate position if both inputs belong to the same document - // Support: IE 11+, Edge 17 - 18+ - // IE/Edge sometimes throw a "Permission denied" error when strict-comparing - // two documents; shallow comparisons work. - // eslint-disable-next-line eqeqeq - compare = ( a.ownerDocument || a ) == ( b.ownerDocument || b ) ? - a.compareDocumentPosition( b ) : - - // Otherwise we know they are disconnected - 1; - - // Disconnected nodes - if ( compare & 1 || - ( !support.sortDetached && b.compareDocumentPosition( a ) === compare ) ) { - - // Choose the first element that is related to our preferred document - // Support: IE 11+, Edge 17 - 18+ - // IE/Edge sometimes throw a "Permission denied" error when strict-comparing - // two documents; shallow comparisons work. 
- // eslint-disable-next-line eqeqeq - if ( a == document || a.ownerDocument == preferredDoc && - contains( preferredDoc, a ) ) { - return -1; - } - - // Support: IE 11+, Edge 17 - 18+ - // IE/Edge sometimes throw a "Permission denied" error when strict-comparing - // two documents; shallow comparisons work. - // eslint-disable-next-line eqeqeq - if ( b == document || b.ownerDocument == preferredDoc && - contains( preferredDoc, b ) ) { - return 1; - } - - // Maintain original order - return sortInput ? - ( indexOf( sortInput, a ) - indexOf( sortInput, b ) ) : - 0; - } - - return compare & 4 ? -1 : 1; - } : - function( a, b ) { - - // Exit early if the nodes are identical - if ( a === b ) { - hasDuplicate = true; - return 0; - } - - var cur, - i = 0, - aup = a.parentNode, - bup = b.parentNode, - ap = [ a ], - bp = [ b ]; - - // Parentless nodes are either documents or disconnected - if ( !aup || !bup ) { - - // Support: IE 11+, Edge 17 - 18+ - // IE/Edge sometimes throw a "Permission denied" error when strict-comparing - // two documents; shallow comparisons work. - /* eslint-disable eqeqeq */ - return a == document ? -1 : - b == document ? 1 : - /* eslint-enable eqeqeq */ - aup ? -1 : - bup ? 1 : - sortInput ? - ( indexOf( sortInput, a ) - indexOf( sortInput, b ) ) : - 0; - - // If the nodes are siblings, we can do a quick check - } else if ( aup === bup ) { - return siblingCheck( a, b ); - } - - // Otherwise we need full lists of their ancestors for comparison - cur = a; - while ( ( cur = cur.parentNode ) ) { - ap.unshift( cur ); - } - cur = b; - while ( ( cur = cur.parentNode ) ) { - bp.unshift( cur ); - } - - // Walk down the tree looking for a discrepancy - while ( ap[ i ] === bp[ i ] ) { - i++; - } - - return i ? 
- - // Do a sibling check if the nodes have a common ancestor - siblingCheck( ap[ i ], bp[ i ] ) : - - // Otherwise nodes in our document sort first - // Support: IE 11+, Edge 17 - 18+ - // IE/Edge sometimes throw a "Permission denied" error when strict-comparing - // two documents; shallow comparisons work. - /* eslint-disable eqeqeq */ - ap[ i ] == preferredDoc ? -1 : - bp[ i ] == preferredDoc ? 1 : - /* eslint-enable eqeqeq */ - 0; - }; - - return document; -}; - -Sizzle.matches = function( expr, elements ) { - return Sizzle( expr, null, null, elements ); -}; - -Sizzle.matchesSelector = function( elem, expr ) { - setDocument( elem ); - - if ( support.matchesSelector && documentIsHTML && - !nonnativeSelectorCache[ expr + " " ] && - ( !rbuggyMatches || !rbuggyMatches.test( expr ) ) && - ( !rbuggyQSA || !rbuggyQSA.test( expr ) ) ) { - - try { - var ret = matches.call( elem, expr ); - - // IE 9's matchesSelector returns false on disconnected nodes - if ( ret || support.disconnectedMatch || - - // As well, disconnected nodes are said to be in a document - // fragment in IE 9 - elem.document && elem.document.nodeType !== 11 ) { - return ret; - } - } catch ( e ) { - nonnativeSelectorCache( expr, true ); - } - } - - return Sizzle( expr, document, null, [ elem ] ).length > 0; -}; - -Sizzle.contains = function( context, elem ) { - - // Set document vars if needed - // Support: IE 11+, Edge 17 - 18+ - // IE/Edge sometimes throw a "Permission denied" error when strict-comparing - // two documents; shallow comparisons work. - // eslint-disable-next-line eqeqeq - if ( ( context.ownerDocument || context ) != document ) { - setDocument( context ); - } - return contains( context, elem ); -}; - -Sizzle.attr = function( elem, name ) { - - // Set document vars if needed - // Support: IE 11+, Edge 17 - 18+ - // IE/Edge sometimes throw a "Permission denied" error when strict-comparing - // two documents; shallow comparisons work. 
- // eslint-disable-next-line eqeqeq - if ( ( elem.ownerDocument || elem ) != document ) { - setDocument( elem ); - } - - var fn = Expr.attrHandle[ name.toLowerCase() ], - - // Don't get fooled by Object.prototype properties (jQuery #13807) - val = fn && hasOwn.call( Expr.attrHandle, name.toLowerCase() ) ? - fn( elem, name, !documentIsHTML ) : - undefined; - - return val !== undefined ? - val : - support.attributes || !documentIsHTML ? - elem.getAttribute( name ) : - ( val = elem.getAttributeNode( name ) ) && val.specified ? - val.value : - null; -}; - -Sizzle.escape = function( sel ) { - return ( sel + "" ).replace( rcssescape, fcssescape ); -}; - -Sizzle.error = function( msg ) { - throw new Error( "Syntax error, unrecognized expression: " + msg ); -}; - -/** - * Document sorting and removing duplicates - * @param {ArrayLike} results - */ -Sizzle.uniqueSort = function( results ) { - var elem, - duplicates = [], - j = 0, - i = 0; - - // Unless we *know* we can detect duplicates, assume their presence - hasDuplicate = !support.detectDuplicates; - sortInput = !support.sortStable && results.slice( 0 ); - results.sort( sortOrder ); - - if ( hasDuplicate ) { - while ( ( elem = results[ i++ ] ) ) { - if ( elem === results[ i ] ) { - j = duplicates.push( i ); - } - } - while ( j-- ) { - results.splice( duplicates[ j ], 1 ); - } - } - - // Clear input after sorting to release objects - // See https://github.com/jquery/sizzle/pull/225 - sortInput = null; - - return results; -}; - -/** - * Utility function for retrieving the text value of an array of DOM nodes - * @param {Array|Element} elem - */ -getText = Sizzle.getText = function( elem ) { - var node, - ret = "", - i = 0, - nodeType = elem.nodeType; - - if ( !nodeType ) { - - // If no nodeType, this is expected to be an array - while ( ( node = elem[ i++ ] ) ) { - - // Do not traverse comment nodes - ret += getText( node ); - } - } else if ( nodeType === 1 || nodeType === 9 || nodeType === 11 ) { - - // Use textContent 
for elements - // innerText usage removed for consistency of new lines (jQuery #11153) - if ( typeof elem.textContent === "string" ) { - return elem.textContent; - } else { - - // Traverse its children - for ( elem = elem.firstChild; elem; elem = elem.nextSibling ) { - ret += getText( elem ); - } - } - } else if ( nodeType === 3 || nodeType === 4 ) { - return elem.nodeValue; - } - - // Do not include comment or processing instruction nodes - - return ret; -}; - -Expr = Sizzle.selectors = { - - // Can be adjusted by the user - cacheLength: 50, - - createPseudo: markFunction, - - match: matchExpr, - - attrHandle: {}, - - find: {}, - - relative: { - ">": { dir: "parentNode", first: true }, - " ": { dir: "parentNode" }, - "+": { dir: "previousSibling", first: true }, - "~": { dir: "previousSibling" } - }, - - preFilter: { - "ATTR": function( match ) { - match[ 1 ] = match[ 1 ].replace( runescape, funescape ); - - // Move the given value to match[3] whether quoted or unquoted - match[ 3 ] = ( match[ 3 ] || match[ 4 ] || - match[ 5 ] || "" ).replace( runescape, funescape ); - - if ( match[ 2 ] === "~=" ) { - match[ 3 ] = " " + match[ 3 ] + " "; - } - - return match.slice( 0, 4 ); - }, - - "CHILD": function( match ) { - - /* matches from matchExpr["CHILD"] - 1 type (only|nth|...) - 2 what (child|of-type) - 3 argument (even|odd|\d*|\d*n([+-]\d+)?|...) - 4 xn-component of xn+y argument ([+-]?\d*n|) - 5 sign of xn-component - 6 x of xn-component - 7 sign of y-component - 8 y of y-component - */ - match[ 1 ] = match[ 1 ].toLowerCase(); - - if ( match[ 1 ].slice( 0, 3 ) === "nth" ) { - - // nth-* requires argument - if ( !match[ 3 ] ) { - Sizzle.error( match[ 0 ] ); - } - - // numeric x and y parameters for Expr.filter.CHILD - // remember that false/true cast respectively to 0/1 - match[ 4 ] = +( match[ 4 ] ? 
- match[ 5 ] + ( match[ 6 ] || 1 ) : - 2 * ( match[ 3 ] === "even" || match[ 3 ] === "odd" ) ); - match[ 5 ] = +( ( match[ 7 ] + match[ 8 ] ) || match[ 3 ] === "odd" ); - - // other types prohibit arguments - } else if ( match[ 3 ] ) { - Sizzle.error( match[ 0 ] ); - } - - return match; - }, - - "PSEUDO": function( match ) { - var excess, - unquoted = !match[ 6 ] && match[ 2 ]; - - if ( matchExpr[ "CHILD" ].test( match[ 0 ] ) ) { - return null; - } - - // Accept quoted arguments as-is - if ( match[ 3 ] ) { - match[ 2 ] = match[ 4 ] || match[ 5 ] || ""; - - // Strip excess characters from unquoted arguments - } else if ( unquoted && rpseudo.test( unquoted ) && - - // Get excess from tokenize (recursively) - ( excess = tokenize( unquoted, true ) ) && - - // advance to the next closing parenthesis - ( excess = unquoted.indexOf( ")", unquoted.length - excess ) - unquoted.length ) ) { - - // excess is a negative index - match[ 0 ] = match[ 0 ].slice( 0, excess ); - match[ 2 ] = unquoted.slice( 0, excess ); - } - - // Return only captures needed by the pseudo filter method (type and argument) - return match.slice( 0, 3 ); - } - }, - - filter: { - - "TAG": function( nodeNameSelector ) { - var nodeName = nodeNameSelector.replace( runescape, funescape ).toLowerCase(); - return nodeNameSelector === "*" ? 
- function() { - return true; - } : - function( elem ) { - return elem.nodeName && elem.nodeName.toLowerCase() === nodeName; - }; - }, - - "CLASS": function( className ) { - var pattern = classCache[ className + " " ]; - - return pattern || - ( pattern = new RegExp( "(^|" + whitespace + - ")" + className + "(" + whitespace + "|$)" ) ) && classCache( - className, function( elem ) { - return pattern.test( - typeof elem.className === "string" && elem.className || - typeof elem.getAttribute !== "undefined" && - elem.getAttribute( "class" ) || - "" - ); - } ); - }, - - "ATTR": function( name, operator, check ) { - return function( elem ) { - var result = Sizzle.attr( elem, name ); - - if ( result == null ) { - return operator === "!="; - } - if ( !operator ) { - return true; - } - - result += ""; - - /* eslint-disable max-len */ - - return operator === "=" ? result === check : - operator === "!=" ? result !== check : - operator === "^=" ? check && result.indexOf( check ) === 0 : - operator === "*=" ? check && result.indexOf( check ) > -1 : - operator === "$=" ? check && result.slice( -check.length ) === check : - operator === "~=" ? ( " " + result.replace( rwhitespace, " " ) + " " ).indexOf( check ) > -1 : - operator === "|=" ? result === check || result.slice( 0, check.length + 1 ) === check + "-" : - false; - /* eslint-enable max-len */ - - }; - }, - - "CHILD": function( type, what, _argument, first, last ) { - var simple = type.slice( 0, 3 ) !== "nth", - forward = type.slice( -4 ) !== "last", - ofType = what === "of-type"; - - return first === 1 && last === 0 ? - - // Shortcut for :nth-*(n) - function( elem ) { - return !!elem.parentNode; - } : - - function( elem, _context, xml ) { - var cache, uniqueCache, outerCache, node, nodeIndex, start, - dir = simple !== forward ? 
"nextSibling" : "previousSibling", - parent = elem.parentNode, - name = ofType && elem.nodeName.toLowerCase(), - useCache = !xml && !ofType, - diff = false; - - if ( parent ) { - - // :(first|last|only)-(child|of-type) - if ( simple ) { - while ( dir ) { - node = elem; - while ( ( node = node[ dir ] ) ) { - if ( ofType ? - node.nodeName.toLowerCase() === name : - node.nodeType === 1 ) { - - return false; - } - } - - // Reverse direction for :only-* (if we haven't yet done so) - start = dir = type === "only" && !start && "nextSibling"; - } - return true; - } - - start = [ forward ? parent.firstChild : parent.lastChild ]; - - // non-xml :nth-child(...) stores cache data on `parent` - if ( forward && useCache ) { - - // Seek `elem` from a previously-cached index - - // ...in a gzip-friendly way - node = parent; - outerCache = node[ expando ] || ( node[ expando ] = {} ); - - // Support: IE <9 only - // Defend against cloned attroperties (jQuery gh-1709) - uniqueCache = outerCache[ node.uniqueID ] || - ( outerCache[ node.uniqueID ] = {} ); - - cache = uniqueCache[ type ] || []; - nodeIndex = cache[ 0 ] === dirruns && cache[ 1 ]; - diff = nodeIndex && cache[ 2 ]; - node = nodeIndex && parent.childNodes[ nodeIndex ]; - - while ( ( node = ++nodeIndex && node && node[ dir ] || - - // Fallback to seeking `elem` from the start - ( diff = nodeIndex = 0 ) || start.pop() ) ) { - - // When found, cache indexes on `parent` and break - if ( node.nodeType === 1 && ++diff && node === elem ) { - uniqueCache[ type ] = [ dirruns, nodeIndex, diff ]; - break; - } - } - - } else { - - // Use previously-cached element index if available - if ( useCache ) { - - // ...in a gzip-friendly way - node = elem; - outerCache = node[ expando ] || ( node[ expando ] = {} ); - - // Support: IE <9 only - // Defend against cloned attroperties (jQuery gh-1709) - uniqueCache = outerCache[ node.uniqueID ] || - ( outerCache[ node.uniqueID ] = {} ); - - cache = uniqueCache[ type ] || []; - nodeIndex = cache[ 0 
] === dirruns && cache[ 1 ]; - diff = nodeIndex; - } - - // xml :nth-child(...) - // or :nth-last-child(...) or :nth(-last)?-of-type(...) - if ( diff === false ) { - - // Use the same loop as above to seek `elem` from the start - while ( ( node = ++nodeIndex && node && node[ dir ] || - ( diff = nodeIndex = 0 ) || start.pop() ) ) { - - if ( ( ofType ? - node.nodeName.toLowerCase() === name : - node.nodeType === 1 ) && - ++diff ) { - - // Cache the index of each encountered element - if ( useCache ) { - outerCache = node[ expando ] || - ( node[ expando ] = {} ); - - // Support: IE <9 only - // Defend against cloned attroperties (jQuery gh-1709) - uniqueCache = outerCache[ node.uniqueID ] || - ( outerCache[ node.uniqueID ] = {} ); - - uniqueCache[ type ] = [ dirruns, diff ]; - } - - if ( node === elem ) { - break; - } - } - } - } - } - - // Incorporate the offset, then check against cycle size - diff -= last; - return diff === first || ( diff % first === 0 && diff / first >= 0 ); - } - }; - }, - - "PSEUDO": function( pseudo, argument ) { - - // pseudo-class names are case-insensitive - // http://www.w3.org/TR/selectors/#pseudo-classes - // Prioritize by case sensitivity in case custom pseudos are added with uppercase letters - // Remember that setFilters inherits from pseudos - var args, - fn = Expr.pseudos[ pseudo ] || Expr.setFilters[ pseudo.toLowerCase() ] || - Sizzle.error( "unsupported pseudo: " + pseudo ); - - // The user may use createPseudo to indicate that - // arguments are needed to create the filter function - // just as Sizzle does - if ( fn[ expando ] ) { - return fn( argument ); - } - - // But maintain support for old signatures - if ( fn.length > 1 ) { - args = [ pseudo, pseudo, "", argument ]; - return Expr.setFilters.hasOwnProperty( pseudo.toLowerCase() ) ? 
- markFunction( function( seed, matches ) { - var idx, - matched = fn( seed, argument ), - i = matched.length; - while ( i-- ) { - idx = indexOf( seed, matched[ i ] ); - seed[ idx ] = !( matches[ idx ] = matched[ i ] ); - } - } ) : - function( elem ) { - return fn( elem, 0, args ); - }; - } - - return fn; - } - }, - - pseudos: { - - // Potentially complex pseudos - "not": markFunction( function( selector ) { - - // Trim the selector passed to compile - // to avoid treating leading and trailing - // spaces as combinators - var input = [], - results = [], - matcher = compile( selector.replace( rtrim, "$1" ) ); - - return matcher[ expando ] ? - markFunction( function( seed, matches, _context, xml ) { - var elem, - unmatched = matcher( seed, null, xml, [] ), - i = seed.length; - - // Match elements unmatched by `matcher` - while ( i-- ) { - if ( ( elem = unmatched[ i ] ) ) { - seed[ i ] = !( matches[ i ] = elem ); - } - } - } ) : - function( elem, _context, xml ) { - input[ 0 ] = elem; - matcher( input, null, xml, results ); - - // Don't keep the element (issue #299) - input[ 0 ] = null; - return !results.pop(); - }; - } ), - - "has": markFunction( function( selector ) { - return function( elem ) { - return Sizzle( selector, elem ).length > 0; - }; - } ), - - "contains": markFunction( function( text ) { - text = text.replace( runescape, funescape ); - return function( elem ) { - return ( elem.textContent || getText( elem ) ).indexOf( text ) > -1; - }; - } ), - - // "Whether an element is represented by a :lang() selector - // is based solely on the element's language value - // being equal to the identifier C, - // or beginning with the identifier C immediately followed by "-". - // The matching of C against the element's language value is performed case-insensitively. - // The identifier C does not have to be a valid language name." 
- // http://www.w3.org/TR/selectors/#lang-pseudo - "lang": markFunction( function( lang ) { - - // lang value must be a valid identifier - if ( !ridentifier.test( lang || "" ) ) { - Sizzle.error( "unsupported lang: " + lang ); - } - lang = lang.replace( runescape, funescape ).toLowerCase(); - return function( elem ) { - var elemLang; - do { - if ( ( elemLang = documentIsHTML ? - elem.lang : - elem.getAttribute( "xml:lang" ) || elem.getAttribute( "lang" ) ) ) { - - elemLang = elemLang.toLowerCase(); - return elemLang === lang || elemLang.indexOf( lang + "-" ) === 0; - } - } while ( ( elem = elem.parentNode ) && elem.nodeType === 1 ); - return false; - }; - } ), - - // Miscellaneous - "target": function( elem ) { - var hash = window.location && window.location.hash; - return hash && hash.slice( 1 ) === elem.id; - }, - - "root": function( elem ) { - return elem === docElem; - }, - - "focus": function( elem ) { - return elem === document.activeElement && - ( !document.hasFocus || document.hasFocus() ) && - !!( elem.type || elem.href || ~elem.tabIndex ); - }, - - // Boolean properties - "enabled": createDisabledPseudo( false ), - "disabled": createDisabledPseudo( true ), - - "checked": function( elem ) { - - // In CSS3, :checked should return both checked and selected elements - // http://www.w3.org/TR/2011/REC-css3-selectors-20110929/#checked - var nodeName = elem.nodeName.toLowerCase(); - return ( nodeName === "input" && !!elem.checked ) || - ( nodeName === "option" && !!elem.selected ); - }, - - "selected": function( elem ) { - - // Accessing this property makes selected-by-default - // options in Safari work properly - if ( elem.parentNode ) { - // eslint-disable-next-line no-unused-expressions - elem.parentNode.selectedIndex; - } - - return elem.selected === true; - }, - - // Contents - "empty": function( elem ) { - - // http://www.w3.org/TR/selectors/#empty-pseudo - // :empty is negated by element (1) or content nodes (text: 3; cdata: 4; entity ref: 5), - // but 
not by others (comment: 8; processing instruction: 7; etc.) - // nodeType < 6 works because attributes (2) do not appear as children - for ( elem = elem.firstChild; elem; elem = elem.nextSibling ) { - if ( elem.nodeType < 6 ) { - return false; - } - } - return true; - }, - - "parent": function( elem ) { - return !Expr.pseudos[ "empty" ]( elem ); - }, - - // Element/input types - "header": function( elem ) { - return rheader.test( elem.nodeName ); - }, - - "input": function( elem ) { - return rinputs.test( elem.nodeName ); - }, - - "button": function( elem ) { - var name = elem.nodeName.toLowerCase(); - return name === "input" && elem.type === "button" || name === "button"; - }, - - "text": function( elem ) { - var attr; - return elem.nodeName.toLowerCase() === "input" && - elem.type === "text" && - - // Support: IE<8 - // New HTML5 attribute values (e.g., "search") appear with elem.type === "text" - ( ( attr = elem.getAttribute( "type" ) ) == null || - attr.toLowerCase() === "text" ); - }, - - // Position-in-collection - "first": createPositionalPseudo( function() { - return [ 0 ]; - } ), - - "last": createPositionalPseudo( function( _matchIndexes, length ) { - return [ length - 1 ]; - } ), - - "eq": createPositionalPseudo( function( _matchIndexes, length, argument ) { - return [ argument < 0 ? argument + length : argument ]; - } ), - - "even": createPositionalPseudo( function( matchIndexes, length ) { - var i = 0; - for ( ; i < length; i += 2 ) { - matchIndexes.push( i ); - } - return matchIndexes; - } ), - - "odd": createPositionalPseudo( function( matchIndexes, length ) { - var i = 1; - for ( ; i < length; i += 2 ) { - matchIndexes.push( i ); - } - return matchIndexes; - } ), - - "lt": createPositionalPseudo( function( matchIndexes, length, argument ) { - var i = argument < 0 ? - argument + length : - argument > length ? 
- length : - argument; - for ( ; --i >= 0; ) { - matchIndexes.push( i ); - } - return matchIndexes; - } ), - - "gt": createPositionalPseudo( function( matchIndexes, length, argument ) { - var i = argument < 0 ? argument + length : argument; - for ( ; ++i < length; ) { - matchIndexes.push( i ); - } - return matchIndexes; - } ) - } -}; - -Expr.pseudos[ "nth" ] = Expr.pseudos[ "eq" ]; - -// Add button/input type pseudos -for ( i in { radio: true, checkbox: true, file: true, password: true, image: true } ) { - Expr.pseudos[ i ] = createInputPseudo( i ); -} -for ( i in { submit: true, reset: true } ) { - Expr.pseudos[ i ] = createButtonPseudo( i ); -} - -// Easy API for creating new setFilters -function setFilters() {} -setFilters.prototype = Expr.filters = Expr.pseudos; -Expr.setFilters = new setFilters(); - -tokenize = Sizzle.tokenize = function( selector, parseOnly ) { - var matched, match, tokens, type, - soFar, groups, preFilters, - cached = tokenCache[ selector + " " ]; - - if ( cached ) { - return parseOnly ? 
0 : cached.slice( 0 ); - } - - soFar = selector; - groups = []; - preFilters = Expr.preFilter; - - while ( soFar ) { - - // Comma and first run - if ( !matched || ( match = rcomma.exec( soFar ) ) ) { - if ( match ) { - - // Don't consume trailing commas as valid - soFar = soFar.slice( match[ 0 ].length ) || soFar; - } - groups.push( ( tokens = [] ) ); - } - - matched = false; - - // Combinators - if ( ( match = rcombinators.exec( soFar ) ) ) { - matched = match.shift(); - tokens.push( { - value: matched, - - // Cast descendant combinators to space - type: match[ 0 ].replace( rtrim, " " ) - } ); - soFar = soFar.slice( matched.length ); - } - - // Filters - for ( type in Expr.filter ) { - if ( ( match = matchExpr[ type ].exec( soFar ) ) && ( !preFilters[ type ] || - ( match = preFilters[ type ]( match ) ) ) ) { - matched = match.shift(); - tokens.push( { - value: matched, - type: type, - matches: match - } ); - soFar = soFar.slice( matched.length ); - } - } - - if ( !matched ) { - break; - } - } - - // Return the length of the invalid excess - // if we're just parsing - // Otherwise, throw an error or return tokens - return parseOnly ? - soFar.length : - soFar ? - Sizzle.error( selector ) : - - // Cache the tokens - tokenCache( selector, groups ).slice( 0 ); -}; - -function toSelector( tokens ) { - var i = 0, - len = tokens.length, - selector = ""; - for ( ; i < len; i++ ) { - selector += tokens[ i ].value; - } - return selector; -} - -function addCombinator( matcher, combinator, base ) { - var dir = combinator.dir, - skip = combinator.next, - key = skip || dir, - checkNonElements = base && key === "parentNode", - doneName = done++; - - return combinator.first ? 
- - // Check against closest ancestor/preceding element - function( elem, context, xml ) { - while ( ( elem = elem[ dir ] ) ) { - if ( elem.nodeType === 1 || checkNonElements ) { - return matcher( elem, context, xml ); - } - } - return false; - } : - - // Check against all ancestor/preceding elements - function( elem, context, xml ) { - var oldCache, uniqueCache, outerCache, - newCache = [ dirruns, doneName ]; - - // We can't set arbitrary data on XML nodes, so they don't benefit from combinator caching - if ( xml ) { - while ( ( elem = elem[ dir ] ) ) { - if ( elem.nodeType === 1 || checkNonElements ) { - if ( matcher( elem, context, xml ) ) { - return true; - } - } - } - } else { - while ( ( elem = elem[ dir ] ) ) { - if ( elem.nodeType === 1 || checkNonElements ) { - outerCache = elem[ expando ] || ( elem[ expando ] = {} ); - - // Support: IE <9 only - // Defend against cloned attroperties (jQuery gh-1709) - uniqueCache = outerCache[ elem.uniqueID ] || - ( outerCache[ elem.uniqueID ] = {} ); - - if ( skip && skip === elem.nodeName.toLowerCase() ) { - elem = elem[ dir ] || elem; - } else if ( ( oldCache = uniqueCache[ key ] ) && - oldCache[ 0 ] === dirruns && oldCache[ 1 ] === doneName ) { - - // Assign to newCache so results back-propagate to previous elements - return ( newCache[ 2 ] = oldCache[ 2 ] ); - } else { - - // Reuse newcache so results back-propagate to previous elements - uniqueCache[ key ] = newCache; - - // A match means we're done; a fail means we have to keep checking - if ( ( newCache[ 2 ] = matcher( elem, context, xml ) ) ) { - return true; - } - } - } - } - } - return false; - }; -} - -function elementMatcher( matchers ) { - return matchers.length > 1 ? 
- function( elem, context, xml ) { - var i = matchers.length; - while ( i-- ) { - if ( !matchers[ i ]( elem, context, xml ) ) { - return false; - } - } - return true; - } : - matchers[ 0 ]; -} - -function multipleContexts( selector, contexts, results ) { - var i = 0, - len = contexts.length; - for ( ; i < len; i++ ) { - Sizzle( selector, contexts[ i ], results ); - } - return results; -} - -function condense( unmatched, map, filter, context, xml ) { - var elem, - newUnmatched = [], - i = 0, - len = unmatched.length, - mapped = map != null; - - for ( ; i < len; i++ ) { - if ( ( elem = unmatched[ i ] ) ) { - if ( !filter || filter( elem, context, xml ) ) { - newUnmatched.push( elem ); - if ( mapped ) { - map.push( i ); - } - } - } - } - - return newUnmatched; -} - -function setMatcher( preFilter, selector, matcher, postFilter, postFinder, postSelector ) { - if ( postFilter && !postFilter[ expando ] ) { - postFilter = setMatcher( postFilter ); - } - if ( postFinder && !postFinder[ expando ] ) { - postFinder = setMatcher( postFinder, postSelector ); - } - return markFunction( function( seed, results, context, xml ) { - var temp, i, elem, - preMap = [], - postMap = [], - preexisting = results.length, - - // Get initial elements from seed or context - elems = seed || multipleContexts( - selector || "*", - context.nodeType ? [ context ] : context, - [] - ), - - // Prefilter to get matcher input, preserving a map for seed-results synchronization - matcherIn = preFilter && ( seed || !selector ) ? - condense( elems, preMap, preFilter, context, xml ) : - elems, - - matcherOut = matcher ? - - // If we have a postFinder, or filtered seed, or non-seed postFilter or preexisting results, - postFinder || ( seed ? preFilter : preexisting || postFilter ) ? 
- - // ...intermediate processing is necessary - [] : - - // ...otherwise use results directly - results : - matcherIn; - - // Find primary matches - if ( matcher ) { - matcher( matcherIn, matcherOut, context, xml ); - } - - // Apply postFilter - if ( postFilter ) { - temp = condense( matcherOut, postMap ); - postFilter( temp, [], context, xml ); - - // Un-match failing elements by moving them back to matcherIn - i = temp.length; - while ( i-- ) { - if ( ( elem = temp[ i ] ) ) { - matcherOut[ postMap[ i ] ] = !( matcherIn[ postMap[ i ] ] = elem ); - } - } - } - - if ( seed ) { - if ( postFinder || preFilter ) { - if ( postFinder ) { - - // Get the final matcherOut by condensing this intermediate into postFinder contexts - temp = []; - i = matcherOut.length; - while ( i-- ) { - if ( ( elem = matcherOut[ i ] ) ) { - - // Restore matcherIn since elem is not yet a final match - temp.push( ( matcherIn[ i ] = elem ) ); - } - } - postFinder( null, ( matcherOut = [] ), temp, xml ); - } - - // Move matched elements from seed to results to keep them synchronized - i = matcherOut.length; - while ( i-- ) { - if ( ( elem = matcherOut[ i ] ) && - ( temp = postFinder ? indexOf( seed, elem ) : preMap[ i ] ) > -1 ) { - - seed[ temp ] = !( results[ temp ] = elem ); - } - } - } - - // Add elements to results, through postFinder if defined - } else { - matcherOut = condense( - matcherOut === results ? - matcherOut.splice( preexisting, matcherOut.length ) : - matcherOut - ); - if ( postFinder ) { - postFinder( null, results, matcherOut, xml ); - } else { - push.apply( results, matcherOut ); - } - } - } ); -} - -function matcherFromTokens( tokens ) { - var checkContext, matcher, j, - len = tokens.length, - leadingRelative = Expr.relative[ tokens[ 0 ].type ], - implicitRelative = leadingRelative || Expr.relative[ " " ], - i = leadingRelative ? 
1 : 0, - - // The foundational matcher ensures that elements are reachable from top-level context(s) - matchContext = addCombinator( function( elem ) { - return elem === checkContext; - }, implicitRelative, true ), - matchAnyContext = addCombinator( function( elem ) { - return indexOf( checkContext, elem ) > -1; - }, implicitRelative, true ), - matchers = [ function( elem, context, xml ) { - var ret = ( !leadingRelative && ( xml || context !== outermostContext ) ) || ( - ( checkContext = context ).nodeType ? - matchContext( elem, context, xml ) : - matchAnyContext( elem, context, xml ) ); - - // Avoid hanging onto element (issue #299) - checkContext = null; - return ret; - } ]; - - for ( ; i < len; i++ ) { - if ( ( matcher = Expr.relative[ tokens[ i ].type ] ) ) { - matchers = [ addCombinator( elementMatcher( matchers ), matcher ) ]; - } else { - matcher = Expr.filter[ tokens[ i ].type ].apply( null, tokens[ i ].matches ); - - // Return special upon seeing a positional matcher - if ( matcher[ expando ] ) { - - // Find the next relative operator (if any) for proper handling - j = ++i; - for ( ; j < len; j++ ) { - if ( Expr.relative[ tokens[ j ].type ] ) { - break; - } - } - return setMatcher( - i > 1 && elementMatcher( matchers ), - i > 1 && toSelector( - - // If the preceding token was a descendant combinator, insert an implicit any-element `*` - tokens - .slice( 0, i - 1 ) - .concat( { value: tokens[ i - 2 ].type === " " ? 
"*" : "" } ) - ).replace( rtrim, "$1" ), - matcher, - i < j && matcherFromTokens( tokens.slice( i, j ) ), - j < len && matcherFromTokens( ( tokens = tokens.slice( j ) ) ), - j < len && toSelector( tokens ) - ); - } - matchers.push( matcher ); - } - } - - return elementMatcher( matchers ); -} - -function matcherFromGroupMatchers( elementMatchers, setMatchers ) { - var bySet = setMatchers.length > 0, - byElement = elementMatchers.length > 0, - superMatcher = function( seed, context, xml, results, outermost ) { - var elem, j, matcher, - matchedCount = 0, - i = "0", - unmatched = seed && [], - setMatched = [], - contextBackup = outermostContext, - - // We must always have either seed elements or outermost context - elems = seed || byElement && Expr.find[ "TAG" ]( "*", outermost ), - - // Use integer dirruns iff this is the outermost matcher - dirrunsUnique = ( dirruns += contextBackup == null ? 1 : Math.random() || 0.1 ), - len = elems.length; - - if ( outermost ) { - - // Support: IE 11+, Edge 17 - 18+ - // IE/Edge sometimes throw a "Permission denied" error when strict-comparing - // two documents; shallow comparisons work. - // eslint-disable-next-line eqeqeq - outermostContext = context == document || context || outermost; - } - - // Add elements passing elementMatchers directly to results - // Support: IE<9, Safari - // Tolerate NodeList properties (IE: "length"; Safari: ) matching elements by id - for ( ; i !== len && ( elem = elems[ i ] ) != null; i++ ) { - if ( byElement && elem ) { - j = 0; - - // Support: IE 11+, Edge 17 - 18+ - // IE/Edge sometimes throw a "Permission denied" error when strict-comparing - // two documents; shallow comparisons work. 
- // eslint-disable-next-line eqeqeq - if ( !context && elem.ownerDocument != document ) { - setDocument( elem ); - xml = !documentIsHTML; - } - while ( ( matcher = elementMatchers[ j++ ] ) ) { - if ( matcher( elem, context || document, xml ) ) { - results.push( elem ); - break; - } - } - if ( outermost ) { - dirruns = dirrunsUnique; - } - } - - // Track unmatched elements for set filters - if ( bySet ) { - - // They will have gone through all possible matchers - if ( ( elem = !matcher && elem ) ) { - matchedCount--; - } - - // Lengthen the array for every element, matched or not - if ( seed ) { - unmatched.push( elem ); - } - } - } - - // `i` is now the count of elements visited above, and adding it to `matchedCount` - // makes the latter nonnegative. - matchedCount += i; - - // Apply set filters to unmatched elements - // NOTE: This can be skipped if there are no unmatched elements (i.e., `matchedCount` - // equals `i`), unless we didn't visit _any_ elements in the above loop because we have - // no element matchers and no seed. - // Incrementing an initially-string "0" `i` allows `i` to remain a string only in that - // case, which will result in a "00" `matchedCount` that differs from `i` but is also - // numerically zero. 
- if ( bySet && i !== matchedCount ) { - j = 0; - while ( ( matcher = setMatchers[ j++ ] ) ) { - matcher( unmatched, setMatched, context, xml ); - } - - if ( seed ) { - - // Reintegrate element matches to eliminate the need for sorting - if ( matchedCount > 0 ) { - while ( i-- ) { - if ( !( unmatched[ i ] || setMatched[ i ] ) ) { - setMatched[ i ] = pop.call( results ); - } - } - } - - // Discard index placeholder values to get only actual matches - setMatched = condense( setMatched ); - } - - // Add matches to results - push.apply( results, setMatched ); - - // Seedless set matches succeeding multiple successful matchers stipulate sorting - if ( outermost && !seed && setMatched.length > 0 && - ( matchedCount + setMatchers.length ) > 1 ) { - - Sizzle.uniqueSort( results ); - } - } - - // Override manipulation of globals by nested matchers - if ( outermost ) { - dirruns = dirrunsUnique; - outermostContext = contextBackup; - } - - return unmatched; - }; - - return bySet ? - markFunction( superMatcher ) : - superMatcher; -} - -compile = Sizzle.compile = function( selector, match /* Internal Use Only */ ) { - var i, - setMatchers = [], - elementMatchers = [], - cached = compilerCache[ selector + " " ]; - - if ( !cached ) { - - // Generate a function of recursive functions that can be used to check each element - if ( !match ) { - match = tokenize( selector ); - } - i = match.length; - while ( i-- ) { - cached = matcherFromTokens( match[ i ] ); - if ( cached[ expando ] ) { - setMatchers.push( cached ); - } else { - elementMatchers.push( cached ); - } - } - - // Cache the compiled function - cached = compilerCache( - selector, - matcherFromGroupMatchers( elementMatchers, setMatchers ) - ); - - // Save selector and tokenization - cached.selector = selector; - } - return cached; -}; - -/** - * A low-level selection function that works with Sizzle's compiled - * selector functions - * @param {String|Function} selector A selector or a pre-compiled - * selector function built 
with Sizzle.compile - * @param {Element} context - * @param {Array} [results] - * @param {Array} [seed] A set of elements to match against - */ -select = Sizzle.select = function( selector, context, results, seed ) { - var i, tokens, token, type, find, - compiled = typeof selector === "function" && selector, - match = !seed && tokenize( ( selector = compiled.selector || selector ) ); - - results = results || []; - - // Try to minimize operations if there is only one selector in the list and no seed - // (the latter of which guarantees us context) - if ( match.length === 1 ) { - - // Reduce context if the leading compound selector is an ID - tokens = match[ 0 ] = match[ 0 ].slice( 0 ); - if ( tokens.length > 2 && ( token = tokens[ 0 ] ).type === "ID" && - context.nodeType === 9 && documentIsHTML && Expr.relative[ tokens[ 1 ].type ] ) { - - context = ( Expr.find[ "ID" ]( token.matches[ 0 ] - .replace( runescape, funescape ), context ) || [] )[ 0 ]; - if ( !context ) { - return results; - - // Precompiled matchers will still verify ancestry, so step up a level - } else if ( compiled ) { - context = context.parentNode; - } - - selector = selector.slice( tokens.shift().value.length ); - } - - // Fetch a seed set for right-to-left matching - i = matchExpr[ "needsContext" ].test( selector ) ? 
0 : tokens.length; - while ( i-- ) { - token = tokens[ i ]; - - // Abort if we hit a combinator - if ( Expr.relative[ ( type = token.type ) ] ) { - break; - } - if ( ( find = Expr.find[ type ] ) ) { - - // Search, expanding context for leading sibling combinators - if ( ( seed = find( - token.matches[ 0 ].replace( runescape, funescape ), - rsibling.test( tokens[ 0 ].type ) && testContext( context.parentNode ) || - context - ) ) ) { - - // If seed is empty or no tokens remain, we can return early - tokens.splice( i, 1 ); - selector = seed.length && toSelector( tokens ); - if ( !selector ) { - push.apply( results, seed ); - return results; - } - - break; - } - } - } - } - - // Compile and execute a filtering function if one is not provided - // Provide `match` to avoid retokenization if we modified the selector above - ( compiled || compile( selector, match ) )( - seed, - context, - !documentIsHTML, - results, - !context || rsibling.test( selector ) && testContext( context.parentNode ) || context - ); - return results; -}; - -// One-time assignments - -// Sort stability -support.sortStable = expando.split( "" ).sort( sortOrder ).join( "" ) === expando; - -// Support: Chrome 14-35+ -// Always assume duplicates if they aren't passed to the comparison function -support.detectDuplicates = !!hasDuplicate; - -// Initialize against the default document -setDocument(); - -// Support: Webkit<537.32 - Safari 6.0.3/Chrome 25 (fixed in Chrome 27) -// Detached nodes confoundingly follow *each other* -support.sortDetached = assert( function( el ) { - - // Should return 1, but returns 4 (following) - return el.compareDocumentPosition( document.createElement( "fieldset" ) ) & 1; -} ); - -// Support: IE<8 -// Prevent attribute/property "interpolation" -// https://msdn.microsoft.com/en-us/library/ms536429%28VS.85%29.aspx -if ( !assert( function( el ) { - el.innerHTML = ""; - return el.firstChild.getAttribute( "href" ) === "#"; -} ) ) { - addHandle( "type|href|height|width", function( 
elem, name, isXML ) { - if ( !isXML ) { - return elem.getAttribute( name, name.toLowerCase() === "type" ? 1 : 2 ); - } - } ); -} - -// Support: IE<9 -// Use defaultValue in place of getAttribute("value") -if ( !support.attributes || !assert( function( el ) { - el.innerHTML = ""; - el.firstChild.setAttribute( "value", "" ); - return el.firstChild.getAttribute( "value" ) === ""; -} ) ) { - addHandle( "value", function( elem, _name, isXML ) { - if ( !isXML && elem.nodeName.toLowerCase() === "input" ) { - return elem.defaultValue; - } - } ); -} - -// Support: IE<9 -// Use getAttributeNode to fetch booleans when getAttribute lies -if ( !assert( function( el ) { - return el.getAttribute( "disabled" ) == null; -} ) ) { - addHandle( booleans, function( elem, name, isXML ) { - var val; - if ( !isXML ) { - return elem[ name ] === true ? name.toLowerCase() : - ( val = elem.getAttributeNode( name ) ) && val.specified ? - val.value : - null; - } - } ); -} - -return Sizzle; - -} )( window ); - - - -jQuery.find = Sizzle; -jQuery.expr = Sizzle.selectors; - -// Deprecated -jQuery.expr[ ":" ] = jQuery.expr.pseudos; -jQuery.uniqueSort = jQuery.unique = Sizzle.uniqueSort; -jQuery.text = Sizzle.getText; -jQuery.isXMLDoc = Sizzle.isXML; -jQuery.contains = Sizzle.contains; -jQuery.escapeSelector = Sizzle.escape; - - - - -var dir = function( elem, dir, until ) { - var matched = [], - truncate = until !== undefined; - - while ( ( elem = elem[ dir ] ) && elem.nodeType !== 9 ) { - if ( elem.nodeType === 1 ) { - if ( truncate && jQuery( elem ).is( until ) ) { - break; - } - matched.push( elem ); - } - } - return matched; -}; - - -var siblings = function( n, elem ) { - var matched = []; - - for ( ; n; n = n.nextSibling ) { - if ( n.nodeType === 1 && n !== elem ) { - matched.push( n ); - } - } - - return matched; -}; - - -var rneedsContext = jQuery.expr.match.needsContext; - - - -function nodeName( elem, name ) { - - return elem.nodeName && elem.nodeName.toLowerCase() === name.toLowerCase(); - 
-} -var rsingleTag = ( /^<([a-z][^\/\0>:\x20\t\r\n\f]*)[\x20\t\r\n\f]*\/?>(?:<\/\1>|)$/i ); - - - -// Implement the identical functionality for filter and not -function winnow( elements, qualifier, not ) { - if ( isFunction( qualifier ) ) { - return jQuery.grep( elements, function( elem, i ) { - return !!qualifier.call( elem, i, elem ) !== not; - } ); - } - - // Single element - if ( qualifier.nodeType ) { - return jQuery.grep( elements, function( elem ) { - return ( elem === qualifier ) !== not; - } ); - } - - // Arraylike of elements (jQuery, arguments, Array) - if ( typeof qualifier !== "string" ) { - return jQuery.grep( elements, function( elem ) { - return ( indexOf.call( qualifier, elem ) > -1 ) !== not; - } ); - } - - // Filtered directly for both simple and complex selectors - return jQuery.filter( qualifier, elements, not ); -} - -jQuery.filter = function( expr, elems, not ) { - var elem = elems[ 0 ]; - - if ( not ) { - expr = ":not(" + expr + ")"; - } - - if ( elems.length === 1 && elem.nodeType === 1 ) { - return jQuery.find.matchesSelector( elem, expr ) ? [ elem ] : []; - } - - return jQuery.find.matches( expr, jQuery.grep( elems, function( elem ) { - return elem.nodeType === 1; - } ) ); -}; - -jQuery.fn.extend( { - find: function( selector ) { - var i, ret, - len = this.length, - self = this; - - if ( typeof selector !== "string" ) { - return this.pushStack( jQuery( selector ).filter( function() { - for ( i = 0; i < len; i++ ) { - if ( jQuery.contains( self[ i ], this ) ) { - return true; - } - } - } ) ); - } - - ret = this.pushStack( [] ); - - for ( i = 0; i < len; i++ ) { - jQuery.find( selector, self[ i ], ret ); - } - - return len > 1 ? 
jQuery.uniqueSort( ret ) : ret; - }, - filter: function( selector ) { - return this.pushStack( winnow( this, selector || [], false ) ); - }, - not: function( selector ) { - return this.pushStack( winnow( this, selector || [], true ) ); - }, - is: function( selector ) { - return !!winnow( - this, - - // If this is a positional/relative selector, check membership in the returned set - // so $("p:first").is("p:last") won't return true for a doc with two "p". - typeof selector === "string" && rneedsContext.test( selector ) ? - jQuery( selector ) : - selector || [], - false - ).length; - } -} ); - - -// Initialize a jQuery object - - -// A central reference to the root jQuery(document) -var rootjQuery, - - // A simple way to check for HTML strings - // Prioritize #id over to avoid XSS via location.hash (#9521) - // Strict HTML recognition (#11290: must start with <) - // Shortcut simple #id case for speed - rquickExpr = /^(?:\s*(<[\w\W]+>)[^>]*|#([\w-]+))$/, - - init = jQuery.fn.init = function( selector, context, root ) { - var match, elem; - - // HANDLE: $(""), $(null), $(undefined), $(false) - if ( !selector ) { - return this; - } - - // Method init() accepts an alternate rootjQuery - // so migrate can support jQuery.sub (gh-2101) - root = root || rootjQuery; - - // Handle HTML strings - if ( typeof selector === "string" ) { - if ( selector[ 0 ] === "<" && - selector[ selector.length - 1 ] === ">" && - selector.length >= 3 ) { - - // Assume that strings that start and end with <> are HTML and skip the regex check - match = [ null, selector, null ]; - - } else { - match = rquickExpr.exec( selector ); - } - - // Match html or make sure no context is specified for #id - if ( match && ( match[ 1 ] || !context ) ) { - - // HANDLE: $(html) -> $(array) - if ( match[ 1 ] ) { - context = context instanceof jQuery ? 
context[ 0 ] : context; - - // Option to run scripts is true for back-compat - // Intentionally let the error be thrown if parseHTML is not present - jQuery.merge( this, jQuery.parseHTML( - match[ 1 ], - context && context.nodeType ? context.ownerDocument || context : document, - true - ) ); - - // HANDLE: $(html, props) - if ( rsingleTag.test( match[ 1 ] ) && jQuery.isPlainObject( context ) ) { - for ( match in context ) { - - // Properties of context are called as methods if possible - if ( isFunction( this[ match ] ) ) { - this[ match ]( context[ match ] ); - - // ...and otherwise set as attributes - } else { - this.attr( match, context[ match ] ); - } - } - } - - return this; - - // HANDLE: $(#id) - } else { - elem = document.getElementById( match[ 2 ] ); - - if ( elem ) { - - // Inject the element directly into the jQuery object - this[ 0 ] = elem; - this.length = 1; - } - return this; - } - - // HANDLE: $(expr, $(...)) - } else if ( !context || context.jquery ) { - return ( context || root ).find( selector ); - - // HANDLE: $(expr, context) - // (which is just equivalent to: $(context).find(expr) - } else { - return this.constructor( context ).find( selector ); - } - - // HANDLE: $(DOMElement) - } else if ( selector.nodeType ) { - this[ 0 ] = selector; - this.length = 1; - return this; - - // HANDLE: $(function) - // Shortcut for document ready - } else if ( isFunction( selector ) ) { - return root.ready !== undefined ? 
- root.ready( selector ) : - - // Execute immediately if ready is not present - selector( jQuery ); - } - - return jQuery.makeArray( selector, this ); - }; - -// Give the init function the jQuery prototype for later instantiation -init.prototype = jQuery.fn; - -// Initialize central reference -rootjQuery = jQuery( document ); - - -var rparentsprev = /^(?:parents|prev(?:Until|All))/, - - // Methods guaranteed to produce a unique set when starting from a unique set - guaranteedUnique = { - children: true, - contents: true, - next: true, - prev: true - }; - -jQuery.fn.extend( { - has: function( target ) { - var targets = jQuery( target, this ), - l = targets.length; - - return this.filter( function() { - var i = 0; - for ( ; i < l; i++ ) { - if ( jQuery.contains( this, targets[ i ] ) ) { - return true; - } - } - } ); - }, - - closest: function( selectors, context ) { - var cur, - i = 0, - l = this.length, - matched = [], - targets = typeof selectors !== "string" && jQuery( selectors ); - - // Positional selectors never match, since there's no _selection_ context - if ( !rneedsContext.test( selectors ) ) { - for ( ; i < l; i++ ) { - for ( cur = this[ i ]; cur && cur !== context; cur = cur.parentNode ) { - - // Always skip document fragments - if ( cur.nodeType < 11 && ( targets ? - targets.index( cur ) > -1 : - - // Don't pass non-elements to Sizzle - cur.nodeType === 1 && - jQuery.find.matchesSelector( cur, selectors ) ) ) { - - matched.push( cur ); - break; - } - } - } - } - - return this.pushStack( matched.length > 1 ? jQuery.uniqueSort( matched ) : matched ); - }, - - // Determine the position of an element within the set - index: function( elem ) { - - // No argument, return index in parent - if ( !elem ) { - return ( this[ 0 ] && this[ 0 ].parentNode ) ? 
this.first().prevAll().length : -1; - } - - // Index in selector - if ( typeof elem === "string" ) { - return indexOf.call( jQuery( elem ), this[ 0 ] ); - } - - // Locate the position of the desired element - return indexOf.call( this, - - // If it receives a jQuery object, the first element is used - elem.jquery ? elem[ 0 ] : elem - ); - }, - - add: function( selector, context ) { - return this.pushStack( - jQuery.uniqueSort( - jQuery.merge( this.get(), jQuery( selector, context ) ) - ) - ); - }, - - addBack: function( selector ) { - return this.add( selector == null ? - this.prevObject : this.prevObject.filter( selector ) - ); - } -} ); - -function sibling( cur, dir ) { - while ( ( cur = cur[ dir ] ) && cur.nodeType !== 1 ) {} - return cur; -} - -jQuery.each( { - parent: function( elem ) { - var parent = elem.parentNode; - return parent && parent.nodeType !== 11 ? parent : null; - }, - parents: function( elem ) { - return dir( elem, "parentNode" ); - }, - parentsUntil: function( elem, _i, until ) { - return dir( elem, "parentNode", until ); - }, - next: function( elem ) { - return sibling( elem, "nextSibling" ); - }, - prev: function( elem ) { - return sibling( elem, "previousSibling" ); - }, - nextAll: function( elem ) { - return dir( elem, "nextSibling" ); - }, - prevAll: function( elem ) { - return dir( elem, "previousSibling" ); - }, - nextUntil: function( elem, _i, until ) { - return dir( elem, "nextSibling", until ); - }, - prevUntil: function( elem, _i, until ) { - return dir( elem, "previousSibling", until ); - }, - siblings: function( elem ) { - return siblings( ( elem.parentNode || {} ).firstChild, elem ); - }, - children: function( elem ) { - return siblings( elem.firstChild ); - }, - contents: function( elem ) { - if ( elem.contentDocument != null && - - // Support: IE 11+ - // elements with no `data` attribute has an object - // `contentDocument` with a `null` prototype. 
- getProto( elem.contentDocument ) ) { - - return elem.contentDocument; - } - - // Support: IE 9 - 11 only, iOS 7 only, Android Browser <=4.3 only - // Treat the template element as a regular one in browsers that - // don't support it. - if ( nodeName( elem, "template" ) ) { - elem = elem.content || elem; - } - - return jQuery.merge( [], elem.childNodes ); - } -}, function( name, fn ) { - jQuery.fn[ name ] = function( until, selector ) { - var matched = jQuery.map( this, fn, until ); - - if ( name.slice( -5 ) !== "Until" ) { - selector = until; - } - - if ( selector && typeof selector === "string" ) { - matched = jQuery.filter( selector, matched ); - } - - if ( this.length > 1 ) { - - // Remove duplicates - if ( !guaranteedUnique[ name ] ) { - jQuery.uniqueSort( matched ); - } - - // Reverse order for parents* and prev-derivatives - if ( rparentsprev.test( name ) ) { - matched.reverse(); - } - } - - return this.pushStack( matched ); - }; -} ); -var rnothtmlwhite = ( /[^\x20\t\r\n\f]+/g ); - - - -// Convert String-formatted options into Object-formatted ones -function createOptions( options ) { - var object = {}; - jQuery.each( options.match( rnothtmlwhite ) || [], function( _, flag ) { - object[ flag ] = true; - } ); - return object; -} - -/* - * Create a callback list using the following parameters: - * - * options: an optional list of space-separated options that will change how - * the callback list behaves or a more traditional option object - * - * By default a callback list will act like an event callback list and can be - * "fired" multiple times. 
- * - * Possible options: - * - * once: will ensure the callback list can only be fired once (like a Deferred) - * - * memory: will keep track of previous values and will call any callback added - * after the list has been fired right away with the latest "memorized" - * values (like a Deferred) - * - * unique: will ensure a callback can only be added once (no duplicate in the list) - * - * stopOnFalse: interrupt callings when a callback returns false - * - */ -jQuery.Callbacks = function( options ) { - - // Convert options from String-formatted to Object-formatted if needed - // (we check in cache first) - options = typeof options === "string" ? - createOptions( options ) : - jQuery.extend( {}, options ); - - var // Flag to know if list is currently firing - firing, - - // Last fire value for non-forgettable lists - memory, - - // Flag to know if list was already fired - fired, - - // Flag to prevent firing - locked, - - // Actual callback list - list = [], - - // Queue of execution data for repeatable lists - queue = [], - - // Index of currently firing callback (modified by add/remove as needed) - firingIndex = -1, - - // Fire callbacks - fire = function() { - - // Enforce single-firing - locked = locked || options.once; - - // Execute callbacks for all pending executions, - // respecting firingIndex overrides and runtime changes - fired = firing = true; - for ( ; queue.length; firingIndex = -1 ) { - memory = queue.shift(); - while ( ++firingIndex < list.length ) { - - // Run callback and check for early termination - if ( list[ firingIndex ].apply( memory[ 0 ], memory[ 1 ] ) === false && - options.stopOnFalse ) { - - // Jump to end and forget the data so .add doesn't re-fire - firingIndex = list.length; - memory = false; - } - } - } - - // Forget the data if we're done with it - if ( !options.memory ) { - memory = false; - } - - firing = false; - - // Clean up if we're done firing for good - if ( locked ) { - - // Keep an empty list if we have data for future 
add calls - if ( memory ) { - list = []; - - // Otherwise, this object is spent - } else { - list = ""; - } - } - }, - - // Actual Callbacks object - self = { - - // Add a callback or a collection of callbacks to the list - add: function() { - if ( list ) { - - // If we have memory from a past run, we should fire after adding - if ( memory && !firing ) { - firingIndex = list.length - 1; - queue.push( memory ); - } - - ( function add( args ) { - jQuery.each( args, function( _, arg ) { - if ( isFunction( arg ) ) { - if ( !options.unique || !self.has( arg ) ) { - list.push( arg ); - } - } else if ( arg && arg.length && toType( arg ) !== "string" ) { - - // Inspect recursively - add( arg ); - } - } ); - } )( arguments ); - - if ( memory && !firing ) { - fire(); - } - } - return this; - }, - - // Remove a callback from the list - remove: function() { - jQuery.each( arguments, function( _, arg ) { - var index; - while ( ( index = jQuery.inArray( arg, list, index ) ) > -1 ) { - list.splice( index, 1 ); - - // Handle firing indexes - if ( index <= firingIndex ) { - firingIndex--; - } - } - } ); - return this; - }, - - // Check if a given callback is in the list. - // If no argument is given, return whether or not list has callbacks attached. - has: function( fn ) { - return fn ? 
- jQuery.inArray( fn, list ) > -1 : - list.length > 0; - }, - - // Remove all callbacks from the list - empty: function() { - if ( list ) { - list = []; - } - return this; - }, - - // Disable .fire and .add - // Abort any current/pending executions - // Clear all callbacks and values - disable: function() { - locked = queue = []; - list = memory = ""; - return this; - }, - disabled: function() { - return !list; - }, - - // Disable .fire - // Also disable .add unless we have memory (since it would have no effect) - // Abort any pending executions - lock: function() { - locked = queue = []; - if ( !memory && !firing ) { - list = memory = ""; - } - return this; - }, - locked: function() { - return !!locked; - }, - - // Call all callbacks with the given context and arguments - fireWith: function( context, args ) { - if ( !locked ) { - args = args || []; - args = [ context, args.slice ? args.slice() : args ]; - queue.push( args ); - if ( !firing ) { - fire(); - } - } - return this; - }, - - // Call all the callbacks with the given arguments - fire: function() { - self.fireWith( this, arguments ); - return this; - }, - - // To know if the callbacks have already been called at least once - fired: function() { - return !!fired; - } - }; - - return self; -}; - - -function Identity( v ) { - return v; -} -function Thrower( ex ) { - throw ex; -} - -function adoptValue( value, resolve, reject, noValue ) { - var method; - - try { - - // Check for promise aspect first to privilege synchronous behavior - if ( value && isFunction( ( method = value.promise ) ) ) { - method.call( value ).done( resolve ).fail( reject ); - - // Other thenables - } else if ( value && isFunction( ( method = value.then ) ) ) { - method.call( value, resolve, reject ); - - // Other non-thenables - } else { - - // Control `resolve` arguments by letting Array#slice cast boolean `noValue` to integer: - // * false: [ value ].slice( 0 ) => resolve( value ) - // * true: [ value ].slice( 1 ) => resolve() - 
resolve.apply( undefined, [ value ].slice( noValue ) ); - } - - // For Promises/A+, convert exceptions into rejections - // Since jQuery.when doesn't unwrap thenables, we can skip the extra checks appearing in - // Deferred#then to conditionally suppress rejection. - } catch ( value ) { - - // Support: Android 4.0 only - // Strict mode functions invoked without .call/.apply get global-object context - reject.apply( undefined, [ value ] ); - } -} - -jQuery.extend( { - - Deferred: function( func ) { - var tuples = [ - - // action, add listener, callbacks, - // ... .then handlers, argument index, [final state] - [ "notify", "progress", jQuery.Callbacks( "memory" ), - jQuery.Callbacks( "memory" ), 2 ], - [ "resolve", "done", jQuery.Callbacks( "once memory" ), - jQuery.Callbacks( "once memory" ), 0, "resolved" ], - [ "reject", "fail", jQuery.Callbacks( "once memory" ), - jQuery.Callbacks( "once memory" ), 1, "rejected" ] - ], - state = "pending", - promise = { - state: function() { - return state; - }, - always: function() { - deferred.done( arguments ).fail( arguments ); - return this; - }, - "catch": function( fn ) { - return promise.then( null, fn ); - }, - - // Keep pipe for back-compat - pipe: function( /* fnDone, fnFail, fnProgress */ ) { - var fns = arguments; - - return jQuery.Deferred( function( newDefer ) { - jQuery.each( tuples, function( _i, tuple ) { - - // Map tuples (progress, done, fail) to arguments (done, fail, progress) - var fn = isFunction( fns[ tuple[ 4 ] ] ) && fns[ tuple[ 4 ] ]; - - // deferred.progress(function() { bind to newDefer or newDefer.notify }) - // deferred.done(function() { bind to newDefer or newDefer.resolve }) - // deferred.fail(function() { bind to newDefer or newDefer.reject }) - deferred[ tuple[ 1 ] ]( function() { - var returned = fn && fn.apply( this, arguments ); - if ( returned && isFunction( returned.promise ) ) { - returned.promise() - .progress( newDefer.notify ) - .done( newDefer.resolve ) - .fail( newDefer.reject ); - } 
else { - newDefer[ tuple[ 0 ] + "With" ]( - this, - fn ? [ returned ] : arguments - ); - } - } ); - } ); - fns = null; - } ).promise(); - }, - then: function( onFulfilled, onRejected, onProgress ) { - var maxDepth = 0; - function resolve( depth, deferred, handler, special ) { - return function() { - var that = this, - args = arguments, - mightThrow = function() { - var returned, then; - - // Support: Promises/A+ section 2.3.3.3.3 - // https://promisesaplus.com/#point-59 - // Ignore double-resolution attempts - if ( depth < maxDepth ) { - return; - } - - returned = handler.apply( that, args ); - - // Support: Promises/A+ section 2.3.1 - // https://promisesaplus.com/#point-48 - if ( returned === deferred.promise() ) { - throw new TypeError( "Thenable self-resolution" ); - } - - // Support: Promises/A+ sections 2.3.3.1, 3.5 - // https://promisesaplus.com/#point-54 - // https://promisesaplus.com/#point-75 - // Retrieve `then` only once - then = returned && - - // Support: Promises/A+ section 2.3.4 - // https://promisesaplus.com/#point-64 - // Only check objects and functions for thenability - ( typeof returned === "object" || - typeof returned === "function" ) && - returned.then; - - // Handle a returned thenable - if ( isFunction( then ) ) { - - // Special processors (notify) just wait for resolution - if ( special ) { - then.call( - returned, - resolve( maxDepth, deferred, Identity, special ), - resolve( maxDepth, deferred, Thrower, special ) - ); - - // Normal processors (resolve) also hook into progress - } else { - - // ...and disregard older resolution values - maxDepth++; - - then.call( - returned, - resolve( maxDepth, deferred, Identity, special ), - resolve( maxDepth, deferred, Thrower, special ), - resolve( maxDepth, deferred, Identity, - deferred.notifyWith ) - ); - } - - // Handle all other returned values - } else { - - // Only substitute handlers pass on context - // and multiple values (non-spec behavior) - if ( handler !== Identity ) { - that = 
undefined; - args = [ returned ]; - } - - // Process the value(s) - // Default process is resolve - ( special || deferred.resolveWith )( that, args ); - } - }, - - // Only normal processors (resolve) catch and reject exceptions - process = special ? - mightThrow : - function() { - try { - mightThrow(); - } catch ( e ) { - - if ( jQuery.Deferred.exceptionHook ) { - jQuery.Deferred.exceptionHook( e, - process.stackTrace ); - } - - // Support: Promises/A+ section 2.3.3.3.4.1 - // https://promisesaplus.com/#point-61 - // Ignore post-resolution exceptions - if ( depth + 1 >= maxDepth ) { - - // Only substitute handlers pass on context - // and multiple values (non-spec behavior) - if ( handler !== Thrower ) { - that = undefined; - args = [ e ]; - } - - deferred.rejectWith( that, args ); - } - } - }; - - // Support: Promises/A+ section 2.3.3.3.1 - // https://promisesaplus.com/#point-57 - // Re-resolve promises immediately to dodge false rejection from - // subsequent errors - if ( depth ) { - process(); - } else { - - // Call an optional hook to record the stack, in case of exception - // since it's otherwise lost when execution goes async - if ( jQuery.Deferred.getStackHook ) { - process.stackTrace = jQuery.Deferred.getStackHook(); - } - window.setTimeout( process ); - } - }; - } - - return jQuery.Deferred( function( newDefer ) { - - // progress_handlers.add( ... ) - tuples[ 0 ][ 3 ].add( - resolve( - 0, - newDefer, - isFunction( onProgress ) ? - onProgress : - Identity, - newDefer.notifyWith - ) - ); - - // fulfilled_handlers.add( ... ) - tuples[ 1 ][ 3 ].add( - resolve( - 0, - newDefer, - isFunction( onFulfilled ) ? - onFulfilled : - Identity - ) - ); - - // rejected_handlers.add( ... ) - tuples[ 2 ][ 3 ].add( - resolve( - 0, - newDefer, - isFunction( onRejected ) ? 
- onRejected : - Thrower - ) - ); - } ).promise(); - }, - - // Get a promise for this deferred - // If obj is provided, the promise aspect is added to the object - promise: function( obj ) { - return obj != null ? jQuery.extend( obj, promise ) : promise; - } - }, - deferred = {}; - - // Add list-specific methods - jQuery.each( tuples, function( i, tuple ) { - var list = tuple[ 2 ], - stateString = tuple[ 5 ]; - - // promise.progress = list.add - // promise.done = list.add - // promise.fail = list.add - promise[ tuple[ 1 ] ] = list.add; - - // Handle state - if ( stateString ) { - list.add( - function() { - - // state = "resolved" (i.e., fulfilled) - // state = "rejected" - state = stateString; - }, - - // rejected_callbacks.disable - // fulfilled_callbacks.disable - tuples[ 3 - i ][ 2 ].disable, - - // rejected_handlers.disable - // fulfilled_handlers.disable - tuples[ 3 - i ][ 3 ].disable, - - // progress_callbacks.lock - tuples[ 0 ][ 2 ].lock, - - // progress_handlers.lock - tuples[ 0 ][ 3 ].lock - ); - } - - // progress_handlers.fire - // fulfilled_handlers.fire - // rejected_handlers.fire - list.add( tuple[ 3 ].fire ); - - // deferred.notify = function() { deferred.notifyWith(...) } - // deferred.resolve = function() { deferred.resolveWith(...) } - // deferred.reject = function() { deferred.rejectWith(...) } - deferred[ tuple[ 0 ] ] = function() { - deferred[ tuple[ 0 ] + "With" ]( this === deferred ? undefined : this, arguments ); - return this; - }; - - // deferred.notifyWith = list.fireWith - // deferred.resolveWith = list.fireWith - // deferred.rejectWith = list.fireWith - deferred[ tuple[ 0 ] + "With" ] = list.fireWith; - } ); - - // Make the deferred a promise - promise.promise( deferred ); - - // Call given func if any - if ( func ) { - func.call( deferred, deferred ); - } - - // All done! 
- return deferred; - }, - - // Deferred helper - when: function( singleValue ) { - var - - // count of uncompleted subordinates - remaining = arguments.length, - - // count of unprocessed arguments - i = remaining, - - // subordinate fulfillment data - resolveContexts = Array( i ), - resolveValues = slice.call( arguments ), - - // the primary Deferred - primary = jQuery.Deferred(), - - // subordinate callback factory - updateFunc = function( i ) { - return function( value ) { - resolveContexts[ i ] = this; - resolveValues[ i ] = arguments.length > 1 ? slice.call( arguments ) : value; - if ( !( --remaining ) ) { - primary.resolveWith( resolveContexts, resolveValues ); - } - }; - }; - - // Single- and empty arguments are adopted like Promise.resolve - if ( remaining <= 1 ) { - adoptValue( singleValue, primary.done( updateFunc( i ) ).resolve, primary.reject, - !remaining ); - - // Use .then() to unwrap secondary thenables (cf. gh-3000) - if ( primary.state() === "pending" || - isFunction( resolveValues[ i ] && resolveValues[ i ].then ) ) { - - return primary.then(); - } - } - - // Multiple arguments are aggregated like Promise.all array elements - while ( i-- ) { - adoptValue( resolveValues[ i ], updateFunc( i ), primary.reject ); - } - - return primary.promise(); - } -} ); - - -// These usually indicate a programmer mistake during development, -// warn about them ASAP rather than swallowing them by default. 
-var rerrorNames = /^(Eval|Internal|Range|Reference|Syntax|Type|URI)Error$/; - -jQuery.Deferred.exceptionHook = function( error, stack ) { - - // Support: IE 8 - 9 only - // Console exists when dev tools are open, which can happen at any time - if ( window.console && window.console.warn && error && rerrorNames.test( error.name ) ) { - window.console.warn( "jQuery.Deferred exception: " + error.message, error.stack, stack ); - } -}; - - - - -jQuery.readyException = function( error ) { - window.setTimeout( function() { - throw error; - } ); -}; - - - - -// The deferred used on DOM ready -var readyList = jQuery.Deferred(); - -jQuery.fn.ready = function( fn ) { - - readyList - .then( fn ) - - // Wrap jQuery.readyException in a function so that the lookup - // happens at the time of error handling instead of callback - // registration. - .catch( function( error ) { - jQuery.readyException( error ); - } ); - - return this; -}; - -jQuery.extend( { - - // Is the DOM ready to be used? Set to true once it occurs. - isReady: false, - - // A counter to track how many items to wait for before - // the ready event fires. See #6781 - readyWait: 1, - - // Handle when the DOM is ready - ready: function( wait ) { - - // Abort if there are pending holds or we're already ready - if ( wait === true ? 
--jQuery.readyWait : jQuery.isReady ) { - return; - } - - // Remember that the DOM is ready - jQuery.isReady = true; - - // If a normal DOM Ready event fired, decrement, and wait if need be - if ( wait !== true && --jQuery.readyWait > 0 ) { - return; - } - - // If there are functions bound, to execute - readyList.resolveWith( document, [ jQuery ] ); - } -} ); - -jQuery.ready.then = readyList.then; - -// The ready event handler and self cleanup method -function completed() { - document.removeEventListener( "DOMContentLoaded", completed ); - window.removeEventListener( "load", completed ); - jQuery.ready(); -} - -// Catch cases where $(document).ready() is called -// after the browser event has already occurred. -// Support: IE <=9 - 10 only -// Older IE sometimes signals "interactive" too soon -if ( document.readyState === "complete" || - ( document.readyState !== "loading" && !document.documentElement.doScroll ) ) { - - // Handle it asynchronously to allow scripts the opportunity to delay ready - window.setTimeout( jQuery.ready ); - -} else { - - // Use the handy event callback - document.addEventListener( "DOMContentLoaded", completed ); - - // A fallback to window.onload, that will always work - window.addEventListener( "load", completed ); -} - - - - -// Multifunctional method to get and set values of a collection -// The value/s can optionally be executed if it's a function -var access = function( elems, fn, key, value, chainable, emptyGet, raw ) { - var i = 0, - len = elems.length, - bulk = key == null; - - // Sets many values - if ( toType( key ) === "object" ) { - chainable = true; - for ( i in key ) { - access( elems, fn, i, key[ i ], true, emptyGet, raw ); - } - - // Sets one value - } else if ( value !== undefined ) { - chainable = true; - - if ( !isFunction( value ) ) { - raw = true; - } - - if ( bulk ) { - - // Bulk operations run against the entire set - if ( raw ) { - fn.call( elems, value ); - fn = null; - - // ...except when executing function 
values - } else { - bulk = fn; - fn = function( elem, _key, value ) { - return bulk.call( jQuery( elem ), value ); - }; - } - } - - if ( fn ) { - for ( ; i < len; i++ ) { - fn( - elems[ i ], key, raw ? - value : - value.call( elems[ i ], i, fn( elems[ i ], key ) ) - ); - } - } - } - - if ( chainable ) { - return elems; - } - - // Gets - if ( bulk ) { - return fn.call( elems ); - } - - return len ? fn( elems[ 0 ], key ) : emptyGet; -}; - - -// Matches dashed string for camelizing -var rmsPrefix = /^-ms-/, - rdashAlpha = /-([a-z])/g; - -// Used by camelCase as callback to replace() -function fcamelCase( _all, letter ) { - return letter.toUpperCase(); -} - -// Convert dashed to camelCase; used by the css and data modules -// Support: IE <=9 - 11, Edge 12 - 15 -// Microsoft forgot to hump their vendor prefix (#9572) -function camelCase( string ) { - return string.replace( rmsPrefix, "ms-" ).replace( rdashAlpha, fcamelCase ); -} -var acceptData = function( owner ) { - - // Accepts only: - // - Node - // - Node.ELEMENT_NODE - // - Node.DOCUMENT_NODE - // - Object - // - Any - return owner.nodeType === 1 || owner.nodeType === 9 || !( +owner.nodeType ); -}; - - - - -function Data() { - this.expando = jQuery.expando + Data.uid++; -} - -Data.uid = 1; - -Data.prototype = { - - cache: function( owner ) { - - // Check if the owner object already has a cache - var value = owner[ this.expando ]; - - // If not, create one - if ( !value ) { - value = {}; - - // We can accept data for non-element nodes in modern browsers, - // but we should not, see #8335. - // Always return an empty object. 
- if ( acceptData( owner ) ) { - - // If it is a node unlikely to be stringify-ed or looped over - // use plain assignment - if ( owner.nodeType ) { - owner[ this.expando ] = value; - - // Otherwise secure it in a non-enumerable property - // configurable must be true to allow the property to be - // deleted when data is removed - } else { - Object.defineProperty( owner, this.expando, { - value: value, - configurable: true - } ); - } - } - } - - return value; - }, - set: function( owner, data, value ) { - var prop, - cache = this.cache( owner ); - - // Handle: [ owner, key, value ] args - // Always use camelCase key (gh-2257) - if ( typeof data === "string" ) { - cache[ camelCase( data ) ] = value; - - // Handle: [ owner, { properties } ] args - } else { - - // Copy the properties one-by-one to the cache object - for ( prop in data ) { - cache[ camelCase( prop ) ] = data[ prop ]; - } - } - return cache; - }, - get: function( owner, key ) { - return key === undefined ? - this.cache( owner ) : - - // Always use camelCase key (gh-2257) - owner[ this.expando ] && owner[ this.expando ][ camelCase( key ) ]; - }, - access: function( owner, key, value ) { - - // In cases where either: - // - // 1. No key was specified - // 2. A string key was specified, but no value provided - // - // Take the "read" path and allow the get method to determine - // which value to return, respectively either: - // - // 1. The entire cache object - // 2. The data stored at the key - // - if ( key === undefined || - ( ( key && typeof key === "string" ) && value === undefined ) ) { - - return this.get( owner, key ); - } - - // When the key is not a string, or both a key and value - // are specified, set or extend (existing objects) with either: - // - // 1. An object of properties - // 2. 
A key and value - // - this.set( owner, key, value ); - - // Since the "set" path can have two possible entry points - // return the expected data based on which path was taken[*] - return value !== undefined ? value : key; - }, - remove: function( owner, key ) { - var i, - cache = owner[ this.expando ]; - - if ( cache === undefined ) { - return; - } - - if ( key !== undefined ) { - - // Support array or space separated string of keys - if ( Array.isArray( key ) ) { - - // If key is an array of keys... - // We always set camelCase keys, so remove that. - key = key.map( camelCase ); - } else { - key = camelCase( key ); - - // If a key with the spaces exists, use it. - // Otherwise, create an array by matching non-whitespace - key = key in cache ? - [ key ] : - ( key.match( rnothtmlwhite ) || [] ); - } - - i = key.length; - - while ( i-- ) { - delete cache[ key[ i ] ]; - } - } - - // Remove the expando if there's no more data - if ( key === undefined || jQuery.isEmptyObject( cache ) ) { - - // Support: Chrome <=35 - 45 - // Webkit & Blink performance suffers when deleting properties - // from DOM nodes, so set to undefined instead - // https://bugs.chromium.org/p/chromium/issues/detail?id=378607 (bug restricted) - if ( owner.nodeType ) { - owner[ this.expando ] = undefined; - } else { - delete owner[ this.expando ]; - } - } - }, - hasData: function( owner ) { - var cache = owner[ this.expando ]; - return cache !== undefined && !jQuery.isEmptyObject( cache ); - } -}; -var dataPriv = new Data(); - -var dataUser = new Data(); - - - -// Implementation Summary -// -// 1. Enforce API surface and semantic compatibility with 1.9.x branch -// 2. Improve the module's maintainability by reducing the storage -// paths to a single mechanism. -// 3. Use the same single mechanism to support "private" and "user" data. -// 4. _Never_ expose "private" data to user code (TODO: Drop _data, _removeData) -// 5. Avoid exposing implementation details on user objects (eg. 
expando properties) -// 6. Provide a clear path for implementation upgrade to WeakMap in 2014 - -var rbrace = /^(?:\{[\w\W]*\}|\[[\w\W]*\])$/, - rmultiDash = /[A-Z]/g; - -function getData( data ) { - if ( data === "true" ) { - return true; - } - - if ( data === "false" ) { - return false; - } - - if ( data === "null" ) { - return null; - } - - // Only convert to a number if it doesn't change the string - if ( data === +data + "" ) { - return +data; - } - - if ( rbrace.test( data ) ) { - return JSON.parse( data ); - } - - return data; -} - -function dataAttr( elem, key, data ) { - var name; - - // If nothing was found internally, try to fetch any - // data from the HTML5 data-* attribute - if ( data === undefined && elem.nodeType === 1 ) { - name = "data-" + key.replace( rmultiDash, "-$&" ).toLowerCase(); - data = elem.getAttribute( name ); - - if ( typeof data === "string" ) { - try { - data = getData( data ); - } catch ( e ) {} - - // Make sure we set the data so it isn't changed later - dataUser.set( elem, key, data ); - } else { - data = undefined; - } - } - return data; -} - -jQuery.extend( { - hasData: function( elem ) { - return dataUser.hasData( elem ) || dataPriv.hasData( elem ); - }, - - data: function( elem, name, data ) { - return dataUser.access( elem, name, data ); - }, - - removeData: function( elem, name ) { - dataUser.remove( elem, name ); - }, - - // TODO: Now that all calls to _data and _removeData have been replaced - // with direct calls to dataPriv methods, these can be deprecated. 
- _data: function( elem, name, data ) { - return dataPriv.access( elem, name, data ); - }, - - _removeData: function( elem, name ) { - dataPriv.remove( elem, name ); - } -} ); - -jQuery.fn.extend( { - data: function( key, value ) { - var i, name, data, - elem = this[ 0 ], - attrs = elem && elem.attributes; - - // Gets all values - if ( key === undefined ) { - if ( this.length ) { - data = dataUser.get( elem ); - - if ( elem.nodeType === 1 && !dataPriv.get( elem, "hasDataAttrs" ) ) { - i = attrs.length; - while ( i-- ) { - - // Support: IE 11 only - // The attrs elements can be null (#14894) - if ( attrs[ i ] ) { - name = attrs[ i ].name; - if ( name.indexOf( "data-" ) === 0 ) { - name = camelCase( name.slice( 5 ) ); - dataAttr( elem, name, data[ name ] ); - } - } - } - dataPriv.set( elem, "hasDataAttrs", true ); - } - } - - return data; - } - - // Sets multiple values - if ( typeof key === "object" ) { - return this.each( function() { - dataUser.set( this, key ); - } ); - } - - return access( this, function( value ) { - var data; - - // The calling jQuery object (element matches) is not empty - // (and therefore has an element appears at this[ 0 ]) and the - // `value` parameter was not undefined. An empty jQuery object - // will result in `undefined` for elem = this[ 0 ] which will - // throw an exception if an attempt to read a data cache is made. - if ( elem && value === undefined ) { - - // Attempt to get data from the cache - // The key will always be camelCased in Data - data = dataUser.get( elem, key ); - if ( data !== undefined ) { - return data; - } - - // Attempt to "discover" the data in - // HTML5 custom data-* attrs - data = dataAttr( elem, key ); - if ( data !== undefined ) { - return data; - } - - // We tried really hard, but the data doesn't exist. - return; - } - - // Set the data... 
- this.each( function() { - - // We always store the camelCased key - dataUser.set( this, key, value ); - } ); - }, null, value, arguments.length > 1, null, true ); - }, - - removeData: function( key ) { - return this.each( function() { - dataUser.remove( this, key ); - } ); - } -} ); - - -jQuery.extend( { - queue: function( elem, type, data ) { - var queue; - - if ( elem ) { - type = ( type || "fx" ) + "queue"; - queue = dataPriv.get( elem, type ); - - // Speed up dequeue by getting out quickly if this is just a lookup - if ( data ) { - if ( !queue || Array.isArray( data ) ) { - queue = dataPriv.access( elem, type, jQuery.makeArray( data ) ); - } else { - queue.push( data ); - } - } - return queue || []; - } - }, - - dequeue: function( elem, type ) { - type = type || "fx"; - - var queue = jQuery.queue( elem, type ), - startLength = queue.length, - fn = queue.shift(), - hooks = jQuery._queueHooks( elem, type ), - next = function() { - jQuery.dequeue( elem, type ); - }; - - // If the fx queue is dequeued, always remove the progress sentinel - if ( fn === "inprogress" ) { - fn = queue.shift(); - startLength--; - } - - if ( fn ) { - - // Add a progress sentinel to prevent the fx queue from being - // automatically dequeued - if ( type === "fx" ) { - queue.unshift( "inprogress" ); - } - - // Clear up the last queue stop function - delete hooks.stop; - fn.call( elem, next, hooks ); - } - - if ( !startLength && hooks ) { - hooks.empty.fire(); - } - }, - - // Not public - generate a queueHooks object, or return the current one - _queueHooks: function( elem, type ) { - var key = type + "queueHooks"; - return dataPriv.get( elem, key ) || dataPriv.access( elem, key, { - empty: jQuery.Callbacks( "once memory" ).add( function() { - dataPriv.remove( elem, [ type + "queue", key ] ); - } ) - } ); - } -} ); - -jQuery.fn.extend( { - queue: function( type, data ) { - var setter = 2; - - if ( typeof type !== "string" ) { - data = type; - type = "fx"; - setter--; - } - - if ( 
arguments.length < setter ) { - return jQuery.queue( this[ 0 ], type ); - } - - return data === undefined ? - this : - this.each( function() { - var queue = jQuery.queue( this, type, data ); - - // Ensure a hooks for this queue - jQuery._queueHooks( this, type ); - - if ( type === "fx" && queue[ 0 ] !== "inprogress" ) { - jQuery.dequeue( this, type ); - } - } ); - }, - dequeue: function( type ) { - return this.each( function() { - jQuery.dequeue( this, type ); - } ); - }, - clearQueue: function( type ) { - return this.queue( type || "fx", [] ); - }, - - // Get a promise resolved when queues of a certain type - // are emptied (fx is the type by default) - promise: function( type, obj ) { - var tmp, - count = 1, - defer = jQuery.Deferred(), - elements = this, - i = this.length, - resolve = function() { - if ( !( --count ) ) { - defer.resolveWith( elements, [ elements ] ); - } - }; - - if ( typeof type !== "string" ) { - obj = type; - type = undefined; - } - type = type || "fx"; - - while ( i-- ) { - tmp = dataPriv.get( elements[ i ], type + "queueHooks" ); - if ( tmp && tmp.empty ) { - count++; - tmp.empty.add( resolve ); - } - } - resolve(); - return defer.promise( obj ); - } -} ); -var pnum = ( /[+-]?(?:\d*\.|)\d+(?:[eE][+-]?\d+|)/ ).source; - -var rcssNum = new RegExp( "^(?:([+-])=|)(" + pnum + ")([a-z%]*)$", "i" ); - - -var cssExpand = [ "Top", "Right", "Bottom", "Left" ]; - -var documentElement = document.documentElement; - - - - var isAttached = function( elem ) { - return jQuery.contains( elem.ownerDocument, elem ); - }, - composed = { composed: true }; - - // Support: IE 9 - 11+, Edge 12 - 18+, iOS 10.0 - 10.2 only - // Check attachment across shadow DOM boundaries when possible (gh-3504) - // Support: iOS 10.0-10.2 only - // Early iOS 10 versions support `attachShadow` but not `getRootNode`, - // leading to errors. We need to check for `getRootNode`. 
- if ( documentElement.getRootNode ) { - isAttached = function( elem ) { - return jQuery.contains( elem.ownerDocument, elem ) || - elem.getRootNode( composed ) === elem.ownerDocument; - }; - } -var isHiddenWithinTree = function( elem, el ) { - - // isHiddenWithinTree might be called from jQuery#filter function; - // in that case, element will be second argument - elem = el || elem; - - // Inline style trumps all - return elem.style.display === "none" || - elem.style.display === "" && - - // Otherwise, check computed style - // Support: Firefox <=43 - 45 - // Disconnected elements can have computed display: none, so first confirm that elem is - // in the document. - isAttached( elem ) && - - jQuery.css( elem, "display" ) === "none"; - }; - - - -function adjustCSS( elem, prop, valueParts, tween ) { - var adjusted, scale, - maxIterations = 20, - currentValue = tween ? - function() { - return tween.cur(); - } : - function() { - return jQuery.css( elem, prop, "" ); - }, - initial = currentValue(), - unit = valueParts && valueParts[ 3 ] || ( jQuery.cssNumber[ prop ] ? "" : "px" ), - - // Starting value computation is required for potential unit mismatches - initialInUnit = elem.nodeType && - ( jQuery.cssNumber[ prop ] || unit !== "px" && +initial ) && - rcssNum.exec( jQuery.css( elem, prop ) ); - - if ( initialInUnit && initialInUnit[ 3 ] !== unit ) { - - // Support: Firefox <=54 - // Halve the iteration target value to prevent interference from CSS upper bounds (gh-2144) - initial = initial / 2; - - // Trust units reported by jQuery.css - unit = unit || initialInUnit[ 3 ]; - - // Iteratively approximate from a nonzero starting point - initialInUnit = +initial || 1; - - while ( maxIterations-- ) { - - // Evaluate and update our best guess (doubling guesses that zero out). - // Finish if the scale equals or crosses 1 (making the old*new product non-positive). 
- jQuery.style( elem, prop, initialInUnit + unit ); - if ( ( 1 - scale ) * ( 1 - ( scale = currentValue() / initial || 0.5 ) ) <= 0 ) { - maxIterations = 0; - } - initialInUnit = initialInUnit / scale; - - } - - initialInUnit = initialInUnit * 2; - jQuery.style( elem, prop, initialInUnit + unit ); - - // Make sure we update the tween properties later on - valueParts = valueParts || []; - } - - if ( valueParts ) { - initialInUnit = +initialInUnit || +initial || 0; - - // Apply relative offset (+=/-=) if specified - adjusted = valueParts[ 1 ] ? - initialInUnit + ( valueParts[ 1 ] + 1 ) * valueParts[ 2 ] : - +valueParts[ 2 ]; - if ( tween ) { - tween.unit = unit; - tween.start = initialInUnit; - tween.end = adjusted; - } - } - return adjusted; -} - - -var defaultDisplayMap = {}; - -function getDefaultDisplay( elem ) { - var temp, - doc = elem.ownerDocument, - nodeName = elem.nodeName, - display = defaultDisplayMap[ nodeName ]; - - if ( display ) { - return display; - } - - temp = doc.body.appendChild( doc.createElement( nodeName ) ); - display = jQuery.css( temp, "display" ); - - temp.parentNode.removeChild( temp ); - - if ( display === "none" ) { - display = "block"; - } - defaultDisplayMap[ nodeName ] = display; - - return display; -} - -function showHide( elements, show ) { - var display, elem, - values = [], - index = 0, - length = elements.length; - - // Determine new display value for elements that need to change - for ( ; index < length; index++ ) { - elem = elements[ index ]; - if ( !elem.style ) { - continue; - } - - display = elem.style.display; - if ( show ) { - - // Since we force visibility upon cascade-hidden elements, an immediate (and slow) - // check is required in this first loop unless we have a nonempty display value (either - // inline or about-to-be-restored) - if ( display === "none" ) { - values[ index ] = dataPriv.get( elem, "display" ) || null; - if ( !values[ index ] ) { - elem.style.display = ""; - } - } - if ( elem.style.display === "" && 
isHiddenWithinTree( elem ) ) { - values[ index ] = getDefaultDisplay( elem ); - } - } else { - if ( display !== "none" ) { - values[ index ] = "none"; - - // Remember what we're overwriting - dataPriv.set( elem, "display", display ); - } - } - } - - // Set the display of the elements in a second loop to avoid constant reflow - for ( index = 0; index < length; index++ ) { - if ( values[ index ] != null ) { - elements[ index ].style.display = values[ index ]; - } - } - - return elements; -} - -jQuery.fn.extend( { - show: function() { - return showHide( this, true ); - }, - hide: function() { - return showHide( this ); - }, - toggle: function( state ) { - if ( typeof state === "boolean" ) { - return state ? this.show() : this.hide(); - } - - return this.each( function() { - if ( isHiddenWithinTree( this ) ) { - jQuery( this ).show(); - } else { - jQuery( this ).hide(); - } - } ); - } -} ); -var rcheckableType = ( /^(?:checkbox|radio)$/i ); - -var rtagName = ( /<([a-z][^\/\0>\x20\t\r\n\f]*)/i ); - -var rscriptType = ( /^$|^module$|\/(?:java|ecma)script/i ); - - - -( function() { - var fragment = document.createDocumentFragment(), - div = fragment.appendChild( document.createElement( "div" ) ), - input = document.createElement( "input" ); - - // Support: Android 4.0 - 4.3 only - // Check state lost if the name is set (#11217) - // Support: Windows Web Apps (WWA) - // `name` and `type` must use .setAttribute for WWA (#14901) - input.setAttribute( "type", "radio" ); - input.setAttribute( "checked", "checked" ); - input.setAttribute( "name", "t" ); - - div.appendChild( input ); - - // Support: Android <=4.1 only - // Older WebKit doesn't clone checked state correctly in fragments - support.checkClone = div.cloneNode( true ).cloneNode( true ).lastChild.checked; - - // Support: IE <=11 only - // Make sure textarea (and checkbox) defaultValue is properly cloned - div.innerHTML = "<textarea>x</textarea>"; - support.noCloneChecked = !!div.cloneNode( true ).lastChild.defaultValue; - - // Support: IE 
<=9 only - // IE <=9 replaces <option> tags with their contents when inserted outside of - // the select element. - div.innerHTML = "<option></option>"; - support.option = !!div.lastChild; - } )(); - - - // We have to close these tags to support XHTML (#13200) -var wrapMap = { - - // XHTML parsers do not magically insert elements in the - // same way that tag soup parsers do. So we cannot shorten - // this by omitting <tbody> or other required elements. - thead: [ 1, "<table>", "</table>" ], - col: [ 2, "<table><colgroup>", "</colgroup></table>" ], - tr: [ 2, "<table><tbody>", "</tbody></table>" ], - td: [ 3, "<table><tbody><tr>", "</tr></tbody></table>" ], - - _default: [ 0, "", "" ] -}; - -wrapMap.tbody = wrapMap.tfoot = wrapMap.colgroup = wrapMap.caption = wrapMap.thead; -wrapMap.th = wrapMap.td; - -// Support: IE <=9 only -if ( !support.option ) { - wrapMap.optgroup = wrapMap.option = [ 1, "<select multiple='multiple'>", "</select>" ]; -} - - -function getAll( context, tag ) { - - // Support: IE <=9 - 11 only - // Use typeof to avoid zero-argument method invocation on host objects (#15151) - var ret; - - if ( typeof context.getElementsByTagName !== "undefined" ) { - ret = context.getElementsByTagName( tag || "*" ); - - } else if ( typeof context.querySelectorAll !== "undefined" ) { - ret = context.querySelectorAll( tag || "*" ); - - } else { - ret = []; - } - - if ( tag === undefined || tag && nodeName( context, tag ) ) { - return jQuery.merge( [ context ], ret ); - } - - return ret; -} - - -// Mark scripts as having already been evaluated -function setGlobalEval( elems, refElements ) { - var i = 0, - l = elems.length; - - for ( ; i < l; i++ ) { - dataPriv.set( - elems[ i ], - "globalEval", - !refElements || dataPriv.get( refElements[ i ], "globalEval" ) - ); - } -} - - -var rhtml = /<|&#?\w+;/; - -function buildFragment( elems, context, scripts, selection, ignored ) { - var elem, tmp, tag, wrap, attached, j, - fragment = context.createDocumentFragment(), - nodes = [], - i = 0, - l = elems.length; - - for ( ; i < l; i++ ) { - elem = elems[ i ]; - - if ( elem || elem === 0 ) { - - // Add nodes directly - if ( toType( elem ) === "object" ) { - - // Support: Android <=4.0 only, PhantomJS 1 only - // push.apply(_, arraylike) throws on ancient WebKit - jQuery.merge( nodes, elem.nodeType ? 
[ elem ] : elem ); - - // Convert non-html into a text node - } else if ( !rhtml.test( elem ) ) { - nodes.push( context.createTextNode( elem ) ); - - // Convert html into DOM nodes - } else { - tmp = tmp || fragment.appendChild( context.createElement( "div" ) ); - - // Deserialize a standard representation - tag = ( rtagName.exec( elem ) || [ "", "" ] )[ 1 ].toLowerCase(); - wrap = wrapMap[ tag ] || wrapMap._default; - tmp.innerHTML = wrap[ 1 ] + jQuery.htmlPrefilter( elem ) + wrap[ 2 ]; - - // Descend through wrappers to the right content - j = wrap[ 0 ]; - while ( j-- ) { - tmp = tmp.lastChild; - } - - // Support: Android <=4.0 only, PhantomJS 1 only - // push.apply(_, arraylike) throws on ancient WebKit - jQuery.merge( nodes, tmp.childNodes ); - - // Remember the top-level container - tmp = fragment.firstChild; - - // Ensure the created nodes are orphaned (#12392) - tmp.textContent = ""; - } - } - } - - // Remove wrapper from fragment - fragment.textContent = ""; - - i = 0; - while ( ( elem = nodes[ i++ ] ) ) { - - // Skip elements already in the context collection (trac-4087) - if ( selection && jQuery.inArray( elem, selection ) > -1 ) { - if ( ignored ) { - ignored.push( elem ); - } - continue; - } - - attached = isAttached( elem ); - - // Append to fragment - tmp = getAll( fragment.appendChild( elem ), "script" ); - - // Preserve script evaluation history - if ( attached ) { - setGlobalEval( tmp ); - } - - // Capture executables - if ( scripts ) { - j = 0; - while ( ( elem = tmp[ j++ ] ) ) { - if ( rscriptType.test( elem.type || "" ) ) { - scripts.push( elem ); - } - } - } - } - - return fragment; -} - - -var rtypenamespace = /^([^.]*)(?:\.(.+)|)/; - -function returnTrue() { - return true; -} - -function returnFalse() { - return false; -} - -// Support: IE <=9 - 11+ -// focus() and blur() are asynchronous, except when they are no-op. 
-// So expect focus to be synchronous when the element is already active, -// and blur to be synchronous when the element is not already active. -// (focus and blur are always synchronous in other supported browsers, -// this just defines when we can count on it). -function expectSync( elem, type ) { - return ( elem === safeActiveElement() ) === ( type === "focus" ); -} - -// Support: IE <=9 only -// Accessing document.activeElement can throw unexpectedly -// https://bugs.jquery.com/ticket/13393 -function safeActiveElement() { - try { - return document.activeElement; - } catch ( err ) { } -} - -function on( elem, types, selector, data, fn, one ) { - var origFn, type; - - // Types can be a map of types/handlers - if ( typeof types === "object" ) { - - // ( types-Object, selector, data ) - if ( typeof selector !== "string" ) { - - // ( types-Object, data ) - data = data || selector; - selector = undefined; - } - for ( type in types ) { - on( elem, type, selector, data, types[ type ], one ); - } - return elem; - } - - if ( data == null && fn == null ) { - - // ( types, fn ) - fn = selector; - data = selector = undefined; - } else if ( fn == null ) { - if ( typeof selector === "string" ) { - - // ( types, selector, fn ) - fn = data; - data = undefined; - } else { - - // ( types, data, fn ) - fn = data; - data = selector; - selector = undefined; - } - } - if ( fn === false ) { - fn = returnFalse; - } else if ( !fn ) { - return elem; - } - - if ( one === 1 ) { - origFn = fn; - fn = function( event ) { - - // Can use an empty set, since event contains the info - jQuery().off( event ); - return origFn.apply( this, arguments ); - }; - - // Use same guid so caller can remove using origFn - fn.guid = origFn.guid || ( origFn.guid = jQuery.guid++ ); - } - return elem.each( function() { - jQuery.event.add( this, types, fn, data, selector ); - } ); -} - -/* - * Helper functions for managing events -- not part of the public interface. 
- * Props to Dean Edwards' addEvent library for many of the ideas. - */ -jQuery.event = { - - global: {}, - - add: function( elem, types, handler, data, selector ) { - - var handleObjIn, eventHandle, tmp, - events, t, handleObj, - special, handlers, type, namespaces, origType, - elemData = dataPriv.get( elem ); - - // Only attach events to objects that accept data - if ( !acceptData( elem ) ) { - return; - } - - // Caller can pass in an object of custom data in lieu of the handler - if ( handler.handler ) { - handleObjIn = handler; - handler = handleObjIn.handler; - selector = handleObjIn.selector; - } - - // Ensure that invalid selectors throw exceptions at attach time - // Evaluate against documentElement in case elem is a non-element node (e.g., document) - if ( selector ) { - jQuery.find.matchesSelector( documentElement, selector ); - } - - // Make sure that the handler has a unique ID, used to find/remove it later - if ( !handler.guid ) { - handler.guid = jQuery.guid++; - } - - // Init the element's event structure and main handler, if this is the first - if ( !( events = elemData.events ) ) { - events = elemData.events = Object.create( null ); - } - if ( !( eventHandle = elemData.handle ) ) { - eventHandle = elemData.handle = function( e ) { - - // Discard the second event of a jQuery.event.trigger() and - // when an event is called after a page has unloaded - return typeof jQuery !== "undefined" && jQuery.event.triggered !== e.type ? - jQuery.event.dispatch.apply( elem, arguments ) : undefined; - }; - } - - // Handle multiple events separated by a space - types = ( types || "" ).match( rnothtmlwhite ) || [ "" ]; - t = types.length; - while ( t-- ) { - tmp = rtypenamespace.exec( types[ t ] ) || []; - type = origType = tmp[ 1 ]; - namespaces = ( tmp[ 2 ] || "" ).split( "." 
).sort(); - - // There *must* be a type, no attaching namespace-only handlers - if ( !type ) { - continue; - } - - // If event changes its type, use the special event handlers for the changed type - special = jQuery.event.special[ type ] || {}; - - // If selector defined, determine special event api type, otherwise given type - type = ( selector ? special.delegateType : special.bindType ) || type; - - // Update special based on newly reset type - special = jQuery.event.special[ type ] || {}; - - // handleObj is passed to all event handlers - handleObj = jQuery.extend( { - type: type, - origType: origType, - data: data, - handler: handler, - guid: handler.guid, - selector: selector, - needsContext: selector && jQuery.expr.match.needsContext.test( selector ), - namespace: namespaces.join( "." ) - }, handleObjIn ); - - // Init the event handler queue if we're the first - if ( !( handlers = events[ type ] ) ) { - handlers = events[ type ] = []; - handlers.delegateCount = 0; - - // Only use addEventListener if the special events handler returns false - if ( !special.setup || - special.setup.call( elem, data, namespaces, eventHandle ) === false ) { - - if ( elem.addEventListener ) { - elem.addEventListener( type, eventHandle ); - } - } - } - - if ( special.add ) { - special.add.call( elem, handleObj ); - - if ( !handleObj.handler.guid ) { - handleObj.handler.guid = handler.guid; - } - } - - // Add to the element's handler list, delegates in front - if ( selector ) { - handlers.splice( handlers.delegateCount++, 0, handleObj ); - } else { - handlers.push( handleObj ); - } - - // Keep track of which events have ever been used, for event optimization - jQuery.event.global[ type ] = true; - } - - }, - - // Detach an event or set of events from an element - remove: function( elem, types, handler, selector, mappedTypes ) { - - var j, origCount, tmp, - events, t, handleObj, - special, handlers, type, namespaces, origType, - elemData = dataPriv.hasData( elem ) && dataPriv.get( 
elem ); - - if ( !elemData || !( events = elemData.events ) ) { - return; - } - - // Once for each type.namespace in types; type may be omitted - types = ( types || "" ).match( rnothtmlwhite ) || [ "" ]; - t = types.length; - while ( t-- ) { - tmp = rtypenamespace.exec( types[ t ] ) || []; - type = origType = tmp[ 1 ]; - namespaces = ( tmp[ 2 ] || "" ).split( "." ).sort(); - - // Unbind all events (on this namespace, if provided) for the element - if ( !type ) { - for ( type in events ) { - jQuery.event.remove( elem, type + types[ t ], handler, selector, true ); - } - continue; - } - - special = jQuery.event.special[ type ] || {}; - type = ( selector ? special.delegateType : special.bindType ) || type; - handlers = events[ type ] || []; - tmp = tmp[ 2 ] && - new RegExp( "(^|\\.)" + namespaces.join( "\\.(?:.*\\.|)" ) + "(\\.|$)" ); - - // Remove matching events - origCount = j = handlers.length; - while ( j-- ) { - handleObj = handlers[ j ]; - - if ( ( mappedTypes || origType === handleObj.origType ) && - ( !handler || handler.guid === handleObj.guid ) && - ( !tmp || tmp.test( handleObj.namespace ) ) && - ( !selector || selector === handleObj.selector || - selector === "**" && handleObj.selector ) ) { - handlers.splice( j, 1 ); - - if ( handleObj.selector ) { - handlers.delegateCount--; - } - if ( special.remove ) { - special.remove.call( elem, handleObj ); - } - } - } - - // Remove generic event handler if we removed something and no more handlers exist - // (avoids potential for endless recursion during removal of special event handlers) - if ( origCount && !handlers.length ) { - if ( !special.teardown || - special.teardown.call( elem, namespaces, elemData.handle ) === false ) { - - jQuery.removeEvent( elem, type, elemData.handle ); - } - - delete events[ type ]; - } - } - - // Remove data and the expando if it's no longer used - if ( jQuery.isEmptyObject( events ) ) { - dataPriv.remove( elem, "handle events" ); - } - }, - - dispatch: function( nativeEvent ) { - - 
var i, j, ret, matched, handleObj, handlerQueue, - args = new Array( arguments.length ), - - // Make a writable jQuery.Event from the native event object - event = jQuery.event.fix( nativeEvent ), - - handlers = ( - dataPriv.get( this, "events" ) || Object.create( null ) - )[ event.type ] || [], - special = jQuery.event.special[ event.type ] || {}; - - // Use the fix-ed jQuery.Event rather than the (read-only) native event - args[ 0 ] = event; - - for ( i = 1; i < arguments.length; i++ ) { - args[ i ] = arguments[ i ]; - } - - event.delegateTarget = this; - - // Call the preDispatch hook for the mapped type, and let it bail if desired - if ( special.preDispatch && special.preDispatch.call( this, event ) === false ) { - return; - } - - // Determine handlers - handlerQueue = jQuery.event.handlers.call( this, event, handlers ); - - // Run delegates first; they may want to stop propagation beneath us - i = 0; - while ( ( matched = handlerQueue[ i++ ] ) && !event.isPropagationStopped() ) { - event.currentTarget = matched.elem; - - j = 0; - while ( ( handleObj = matched.handlers[ j++ ] ) && - !event.isImmediatePropagationStopped() ) { - - // If the event is namespaced, then each handler is only invoked if it is - // specially universal or its namespaces are a superset of the event's. 
- if ( !event.rnamespace || handleObj.namespace === false || - event.rnamespace.test( handleObj.namespace ) ) { - - event.handleObj = handleObj; - event.data = handleObj.data; - - ret = ( ( jQuery.event.special[ handleObj.origType ] || {} ).handle || - handleObj.handler ).apply( matched.elem, args ); - - if ( ret !== undefined ) { - if ( ( event.result = ret ) === false ) { - event.preventDefault(); - event.stopPropagation(); - } - } - } - } - } - - // Call the postDispatch hook for the mapped type - if ( special.postDispatch ) { - special.postDispatch.call( this, event ); - } - - return event.result; - }, - - handlers: function( event, handlers ) { - var i, handleObj, sel, matchedHandlers, matchedSelectors, - handlerQueue = [], - delegateCount = handlers.delegateCount, - cur = event.target; - - // Find delegate handlers - if ( delegateCount && - - // Support: IE <=9 - // Black-hole SVG instance trees (trac-13180) - cur.nodeType && - - // Support: Firefox <=42 - // Suppress spec-violating clicks indicating a non-primary pointer button (trac-3861) - // https://www.w3.org/TR/DOM-Level-3-Events/#event-type-click - // Support: IE 11 only - // ...but not arrow key "clicks" of radio inputs, which can have `button` -1 (gh-2343) - !( event.type === "click" && event.button >= 1 ) ) { - - for ( ; cur !== this; cur = cur.parentNode || this ) { - - // Don't check non-elements (#13208) - // Don't process clicks on disabled elements (#6911, #8165, #11382, #11764) - if ( cur.nodeType === 1 && !( event.type === "click" && cur.disabled === true ) ) { - matchedHandlers = []; - matchedSelectors = {}; - for ( i = 0; i < delegateCount; i++ ) { - handleObj = handlers[ i ]; - - // Don't conflict with Object.prototype properties (#13203) - sel = handleObj.selector + " "; - - if ( matchedSelectors[ sel ] === undefined ) { - matchedSelectors[ sel ] = handleObj.needsContext ? 
- jQuery( sel, this ).index( cur ) > -1 : - jQuery.find( sel, this, null, [ cur ] ).length; - } - if ( matchedSelectors[ sel ] ) { - matchedHandlers.push( handleObj ); - } - } - if ( matchedHandlers.length ) { - handlerQueue.push( { elem: cur, handlers: matchedHandlers } ); - } - } - } - } - - // Add the remaining (directly-bound) handlers - cur = this; - if ( delegateCount < handlers.length ) { - handlerQueue.push( { elem: cur, handlers: handlers.slice( delegateCount ) } ); - } - - return handlerQueue; - }, - - addProp: function( name, hook ) { - Object.defineProperty( jQuery.Event.prototype, name, { - enumerable: true, - configurable: true, - - get: isFunction( hook ) ? - function() { - if ( this.originalEvent ) { - return hook( this.originalEvent ); - } - } : - function() { - if ( this.originalEvent ) { - return this.originalEvent[ name ]; - } - }, - - set: function( value ) { - Object.defineProperty( this, name, { - enumerable: true, - configurable: true, - writable: true, - value: value - } ); - } - } ); - }, - - fix: function( originalEvent ) { - return originalEvent[ jQuery.expando ] ? - originalEvent : - new jQuery.Event( originalEvent ); - }, - - special: { - load: { - - // Prevent triggered image.load events from bubbling to window.load - noBubble: true - }, - click: { - - // Utilize native event to ensure correct state for checkable inputs - setup: function( data ) { - - // For mutual compressibility with _default, replace `this` access with a local var. - // `|| data` is dead code meant only to preserve the variable through minification. - var el = this || data; - - // Claim the first handler - if ( rcheckableType.test( el.type ) && - el.click && nodeName( el, "input" ) ) { - - // dataPriv.set( el, "click", ... 
) - leverageNative( el, "click", returnTrue ); - } - - // Return false to allow normal processing in the caller - return false; - }, - trigger: function( data ) { - - // For mutual compressibility with _default, replace `this` access with a local var. - // `|| data` is dead code meant only to preserve the variable through minification. - var el = this || data; - - // Force setup before triggering a click - if ( rcheckableType.test( el.type ) && - el.click && nodeName( el, "input" ) ) { - - leverageNative( el, "click" ); - } - - // Return non-false to allow normal event-path propagation - return true; - }, - - // For cross-browser consistency, suppress native .click() on links - // Also prevent it if we're currently inside a leveraged native-event stack - _default: function( event ) { - var target = event.target; - return rcheckableType.test( target.type ) && - target.click && nodeName( target, "input" ) && - dataPriv.get( target, "click" ) || - nodeName( target, "a" ); - } - }, - - beforeunload: { - postDispatch: function( event ) { - - // Support: Firefox 20+ - // Firefox doesn't alert if the returnValue field is not set. - if ( event.result !== undefined && event.originalEvent ) { - event.originalEvent.returnValue = event.result; - } - } - } - } -}; - -// Ensure the presence of an event listener that handles manually-triggered -// synthetic events by interrupting progress until reinvoked in response to -// *native* events that it fires directly, ensuring that state changes have -// already occurred before other listeners are invoked. 
-function leverageNative( el, type, expectSync ) { - - // Missing expectSync indicates a trigger call, which must force setup through jQuery.event.add - if ( !expectSync ) { - if ( dataPriv.get( el, type ) === undefined ) { - jQuery.event.add( el, type, returnTrue ); - } - return; - } - - // Register the controller as a special universal handler for all event namespaces - dataPriv.set( el, type, false ); - jQuery.event.add( el, type, { - namespace: false, - handler: function( event ) { - var notAsync, result, - saved = dataPriv.get( this, type ); - - if ( ( event.isTrigger & 1 ) && this[ type ] ) { - - // Interrupt processing of the outer synthetic .trigger()ed event - // Saved data should be false in such cases, but might be a leftover capture object - // from an async native handler (gh-4350) - if ( !saved.length ) { - - // Store arguments for use when handling the inner native event - // There will always be at least one argument (an event object), so this array - // will not be confused with a leftover capture object. - saved = slice.call( arguments ); - dataPriv.set( this, type, saved ); - - // Trigger the native event and capture its result - // Support: IE <=9 - 11+ - // focus() and blur() are asynchronous - notAsync = expectSync( this, type ); - this[ type ](); - result = dataPriv.get( this, type ); - if ( saved !== result || notAsync ) { - dataPriv.set( this, type, false ); - } else { - result = {}; - } - if ( saved !== result ) { - - // Cancel the outer synthetic event - event.stopImmediatePropagation(); - event.preventDefault(); - - // Support: Chrome 86+ - // In Chrome, if an element having a focusout handler is blurred by - // clicking outside of it, it invokes the handler synchronously. If - // that handler calls `.remove()` on the element, the data is cleared, - // leaving `result` undefined. We need to guard against this. 
- return result && result.value; - } - - // If this is an inner synthetic event for an event with a bubbling surrogate - // (focus or blur), assume that the surrogate already propagated from triggering the - // native event and prevent that from happening again here. - // This technically gets the ordering wrong w.r.t. to `.trigger()` (in which the - // bubbling surrogate propagates *after* the non-bubbling base), but that seems - // less bad than duplication. - } else if ( ( jQuery.event.special[ type ] || {} ).delegateType ) { - event.stopPropagation(); - } - - // If this is a native event triggered above, everything is now in order - // Fire an inner synthetic event with the original arguments - } else if ( saved.length ) { - - // ...and capture the result - dataPriv.set( this, type, { - value: jQuery.event.trigger( - - // Support: IE <=9 - 11+ - // Extend with the prototype to reset the above stopImmediatePropagation() - jQuery.extend( saved[ 0 ], jQuery.Event.prototype ), - saved.slice( 1 ), - this - ) - } ); - - // Abort handling of the native event - event.stopImmediatePropagation(); - } - } - } ); -} - -jQuery.removeEvent = function( elem, type, handle ) { - - // This "if" is needed for plain objects - if ( elem.removeEventListener ) { - elem.removeEventListener( type, handle ); - } -}; - -jQuery.Event = function( src, props ) { - - // Allow instantiation without the 'new' keyword - if ( !( this instanceof jQuery.Event ) ) { - return new jQuery.Event( src, props ); - } - - // Event object - if ( src && src.type ) { - this.originalEvent = src; - this.type = src.type; - - // Events bubbling up the document may have been marked as prevented - // by a handler lower down the tree; reflect the correct value. - this.isDefaultPrevented = src.defaultPrevented || - src.defaultPrevented === undefined && - - // Support: Android <=2.3 only - src.returnValue === false ? 
- returnTrue : - returnFalse; - - // Create target properties - // Support: Safari <=6 - 7 only - // Target should not be a text node (#504, #13143) - this.target = ( src.target && src.target.nodeType === 3 ) ? - src.target.parentNode : - src.target; - - this.currentTarget = src.currentTarget; - this.relatedTarget = src.relatedTarget; - - // Event type - } else { - this.type = src; - } - - // Put explicitly provided properties onto the event object - if ( props ) { - jQuery.extend( this, props ); - } - - // Create a timestamp if incoming event doesn't have one - this.timeStamp = src && src.timeStamp || Date.now(); - - // Mark it as fixed - this[ jQuery.expando ] = true; -}; - -// jQuery.Event is based on DOM3 Events as specified by the ECMAScript Language Binding -// https://www.w3.org/TR/2003/WD-DOM-Level-3-Events-20030331/ecma-script-binding.html -jQuery.Event.prototype = { - constructor: jQuery.Event, - isDefaultPrevented: returnFalse, - isPropagationStopped: returnFalse, - isImmediatePropagationStopped: returnFalse, - isSimulated: false, - - preventDefault: function() { - var e = this.originalEvent; - - this.isDefaultPrevented = returnTrue; - - if ( e && !this.isSimulated ) { - e.preventDefault(); - } - }, - stopPropagation: function() { - var e = this.originalEvent; - - this.isPropagationStopped = returnTrue; - - if ( e && !this.isSimulated ) { - e.stopPropagation(); - } - }, - stopImmediatePropagation: function() { - var e = this.originalEvent; - - this.isImmediatePropagationStopped = returnTrue; - - if ( e && !this.isSimulated ) { - e.stopImmediatePropagation(); - } - - this.stopPropagation(); - } -}; - -// Includes all common event props including KeyEvent and MouseEvent specific props -jQuery.each( { - altKey: true, - bubbles: true, - cancelable: true, - changedTouches: true, - ctrlKey: true, - detail: true, - eventPhase: true, - metaKey: true, - pageX: true, - pageY: true, - shiftKey: true, - view: true, - "char": true, - code: true, - charCode: true, - 
key: true, - keyCode: true, - button: true, - buttons: true, - clientX: true, - clientY: true, - offsetX: true, - offsetY: true, - pointerId: true, - pointerType: true, - screenX: true, - screenY: true, - targetTouches: true, - toElement: true, - touches: true, - which: true -}, jQuery.event.addProp ); - -jQuery.each( { focus: "focusin", blur: "focusout" }, function( type, delegateType ) { - jQuery.event.special[ type ] = { - - // Utilize native event if possible so blur/focus sequence is correct - setup: function() { - - // Claim the first handler - // dataPriv.set( this, "focus", ... ) - // dataPriv.set( this, "blur", ... ) - leverageNative( this, type, expectSync ); - - // Return false to allow normal processing in the caller - return false; - }, - trigger: function() { - - // Force setup before trigger - leverageNative( this, type ); - - // Return non-false to allow normal event-path propagation - return true; - }, - - // Suppress native focus or blur as it's already being fired - // in leverageNative. - _default: function() { - return true; - }, - - delegateType: delegateType - }; -} ); - -// Create mouseenter/leave events using mouseover/out and event-time checks -// so that event delegation works in jQuery. -// Do the same for pointerenter/pointerleave and pointerover/pointerout -// -// Support: Safari 7 only -// Safari sends mouseenter too often; see: -// https://bugs.chromium.org/p/chromium/issues/detail?id=470258 -// for the description of the bug (it existed in older Chrome versions as well). -jQuery.each( { - mouseenter: "mouseover", - mouseleave: "mouseout", - pointerenter: "pointerover", - pointerleave: "pointerout" -}, function( orig, fix ) { - jQuery.event.special[ orig ] = { - delegateType: fix, - bindType: fix, - - handle: function( event ) { - var ret, - target = this, - related = event.relatedTarget, - handleObj = event.handleObj; - - // For mouseenter/leave call the handler if related is outside the target. 
- // NB: No relatedTarget if the mouse left/entered the browser window - if ( !related || ( related !== target && !jQuery.contains( target, related ) ) ) { - event.type = handleObj.origType; - ret = handleObj.handler.apply( this, arguments ); - event.type = fix; - } - return ret; - } - }; -} ); - -jQuery.fn.extend( { - - on: function( types, selector, data, fn ) { - return on( this, types, selector, data, fn ); - }, - one: function( types, selector, data, fn ) { - return on( this, types, selector, data, fn, 1 ); - }, - off: function( types, selector, fn ) { - var handleObj, type; - if ( types && types.preventDefault && types.handleObj ) { - - // ( event ) dispatched jQuery.Event - handleObj = types.handleObj; - jQuery( types.delegateTarget ).off( - handleObj.namespace ? - handleObj.origType + "." + handleObj.namespace : - handleObj.origType, - handleObj.selector, - handleObj.handler - ); - return this; - } - if ( typeof types === "object" ) { - - // ( types-object [, selector] ) - for ( type in types ) { - this.off( type, selector, types[ type ] ); - } - return this; - } - if ( selector === false || typeof selector === "function" ) { - - // ( types [, fn] ) - fn = selector; - selector = undefined; - } - if ( fn === false ) { - fn = returnFalse; - } - return this.each( function() { - jQuery.event.remove( this, types, fn, selector ); - } ); - } -} ); - - -var - - // Support: IE <=10 - 11, Edge 12 - 13 only - // In IE/Edge using regex groups here causes severe slowdowns. - // See https://connect.microsoft.com/IE/feedback/details/1736512/ - rnoInnerhtml = /<script|<style|<link/i, - - // checked="checked" or checked - rchecked = /checked\s*(?:[^=]|=\s*.checked.)/i, - rcleanScript = /^\s*<!(?:\[CDATA\[|--)|(?:\]\]|--)>\s*$/g; - -// Prefer a tbody over its parent table for containing new rows -function manipulationTarget( elem, content ) { - if ( nodeName( elem, "table" ) && - nodeName( content.nodeType !== 11 ? 
content : content.firstChild, "tr" ) ) { - - return jQuery( elem ).children( "tbody" )[ 0 ] || elem; - } - - return elem; -} - -// Replace/restore the type attribute of script elements for safe DOM manipulation -function disableScript( elem ) { - elem.type = ( elem.getAttribute( "type" ) !== null ) + "/" + elem.type; - return elem; -} -function restoreScript( elem ) { - if ( ( elem.type || "" ).slice( 0, 5 ) === "true/" ) { - elem.type = elem.type.slice( 5 ); - } else { - elem.removeAttribute( "type" ); - } - - return elem; -} - -function cloneCopyEvent( src, dest ) { - var i, l, type, pdataOld, udataOld, udataCur, events; - - if ( dest.nodeType !== 1 ) { - return; - } - - // 1. Copy private data: events, handlers, etc. - if ( dataPriv.hasData( src ) ) { - pdataOld = dataPriv.get( src ); - events = pdataOld.events; - - if ( events ) { - dataPriv.remove( dest, "handle events" ); - - for ( type in events ) { - for ( i = 0, l = events[ type ].length; i < l; i++ ) { - jQuery.event.add( dest, type, events[ type ][ i ] ); - } - } - } - } - - // 2. Copy user data - if ( dataUser.hasData( src ) ) { - udataOld = dataUser.access( src ); - udataCur = jQuery.extend( {}, udataOld ); - - dataUser.set( dest, udataCur ); - } -} - -// Fix IE bugs, see support tests -function fixInput( src, dest ) { - var nodeName = dest.nodeName.toLowerCase(); - - // Fails to persist the checked state of a cloned checkbox or radio button. 
- if ( nodeName === "input" && rcheckableType.test( src.type ) ) { - dest.checked = src.checked; - - // Fails to return the selected option to the default selected state when cloning options - } else if ( nodeName === "input" || nodeName === "textarea" ) { - dest.defaultValue = src.defaultValue; - } -} - -function domManip( collection, args, callback, ignored ) { - - // Flatten any nested arrays - args = flat( args ); - - var fragment, first, scripts, hasScripts, node, doc, - i = 0, - l = collection.length, - iNoClone = l - 1, - value = args[ 0 ], - valueIsFunction = isFunction( value ); - - // We can't cloneNode fragments that contain checked, in WebKit - if ( valueIsFunction || - ( l > 1 && typeof value === "string" && - !support.checkClone && rchecked.test( value ) ) ) { - return collection.each( function( index ) { - var self = collection.eq( index ); - if ( valueIsFunction ) { - args[ 0 ] = value.call( this, index, self.html() ); - } - domManip( self, args, callback, ignored ); - } ); - } - - if ( l ) { - fragment = buildFragment( args, collection[ 0 ].ownerDocument, false, collection, ignored ); - first = fragment.firstChild; - - if ( fragment.childNodes.length === 1 ) { - fragment = first; - } - - // Require either new content or an interest in ignored elements to invoke the callback - if ( first || ignored ) { - scripts = jQuery.map( getAll( fragment, "script" ), disableScript ); - hasScripts = scripts.length; - - // Use the original fragment for the last item - // instead of the first because it can end up - // being emptied incorrectly in certain situations (#8070). 
- for ( ; i < l; i++ ) { - node = fragment; - - if ( i !== iNoClone ) { - node = jQuery.clone( node, true, true ); - - // Keep references to cloned scripts for later restoration - if ( hasScripts ) { - - // Support: Android <=4.0 only, PhantomJS 1 only - // push.apply(_, arraylike) throws on ancient WebKit - jQuery.merge( scripts, getAll( node, "script" ) ); - } - } - - callback.call( collection[ i ], node, i ); - } - - if ( hasScripts ) { - doc = scripts[ scripts.length - 1 ].ownerDocument; - - // Reenable scripts - jQuery.map( scripts, restoreScript ); - - // Evaluate executable scripts on first document insertion - for ( i = 0; i < hasScripts; i++ ) { - node = scripts[ i ]; - if ( rscriptType.test( node.type || "" ) && - !dataPriv.access( node, "globalEval" ) && - jQuery.contains( doc, node ) ) { - - if ( node.src && ( node.type || "" ).toLowerCase() !== "module" ) { - - // Optional AJAX dependency, but won't run scripts if not present - if ( jQuery._evalUrl && !node.noModule ) { - jQuery._evalUrl( node.src, { - nonce: node.nonce || node.getAttribute( "nonce" ) - }, doc ); - } - } else { - DOMEval( node.textContent.replace( rcleanScript, "" ), node, doc ); - } - } - } - } - } - } - - return collection; -} - -function remove( elem, selector, keepData ) { - var node, - nodes = selector ? 
jQuery.filter( selector, elem ) : elem, - i = 0; - - for ( ; ( node = nodes[ i ] ) != null; i++ ) { - if ( !keepData && node.nodeType === 1 ) { - jQuery.cleanData( getAll( node ) ); - } - - if ( node.parentNode ) { - if ( keepData && isAttached( node ) ) { - setGlobalEval( getAll( node, "script" ) ); - } - node.parentNode.removeChild( node ); - } - } - - return elem; -} - -jQuery.extend( { - htmlPrefilter: function( html ) { - return html; - }, - - clone: function( elem, dataAndEvents, deepDataAndEvents ) { - var i, l, srcElements, destElements, - clone = elem.cloneNode( true ), - inPage = isAttached( elem ); - - // Fix IE cloning issues - if ( !support.noCloneChecked && ( elem.nodeType === 1 || elem.nodeType === 11 ) && - !jQuery.isXMLDoc( elem ) ) { - - // We eschew Sizzle here for performance reasons: https://jsperf.com/getall-vs-sizzle/2 - destElements = getAll( clone ); - srcElements = getAll( elem ); - - for ( i = 0, l = srcElements.length; i < l; i++ ) { - fixInput( srcElements[ i ], destElements[ i ] ); - } - } - - // Copy the events from the original to the clone - if ( dataAndEvents ) { - if ( deepDataAndEvents ) { - srcElements = srcElements || getAll( elem ); - destElements = destElements || getAll( clone ); - - for ( i = 0, l = srcElements.length; i < l; i++ ) { - cloneCopyEvent( srcElements[ i ], destElements[ i ] ); - } - } else { - cloneCopyEvent( elem, clone ); - } - } - - // Preserve script evaluation history - destElements = getAll( clone, "script" ); - if ( destElements.length > 0 ) { - setGlobalEval( destElements, !inPage && getAll( elem, "script" ) ); - } - - // Return the cloned set - return clone; - }, - - cleanData: function( elems ) { - var data, elem, type, - special = jQuery.event.special, - i = 0; - - for ( ; ( elem = elems[ i ] ) !== undefined; i++ ) { - if ( acceptData( elem ) ) { - if ( ( data = elem[ dataPriv.expando ] ) ) { - if ( data.events ) { - for ( type in data.events ) { - if ( special[ type ] ) { - jQuery.event.remove( 
elem, type ); - - // This is a shortcut to avoid jQuery.event.remove's overhead - } else { - jQuery.removeEvent( elem, type, data.handle ); - } - } - } - - // Support: Chrome <=35 - 45+ - // Assign undefined instead of using delete, see Data#remove - elem[ dataPriv.expando ] = undefined; - } - if ( elem[ dataUser.expando ] ) { - - // Support: Chrome <=35 - 45+ - // Assign undefined instead of using delete, see Data#remove - elem[ dataUser.expando ] = undefined; - } - } - } - } -} ); - -jQuery.fn.extend( { - detach: function( selector ) { - return remove( this, selector, true ); - }, - - remove: function( selector ) { - return remove( this, selector ); - }, - - text: function( value ) { - return access( this, function( value ) { - return value === undefined ? - jQuery.text( this ) : - this.empty().each( function() { - if ( this.nodeType === 1 || this.nodeType === 11 || this.nodeType === 9 ) { - this.textContent = value; - } - } ); - }, null, value, arguments.length ); - }, - - append: function() { - return domManip( this, arguments, function( elem ) { - if ( this.nodeType === 1 || this.nodeType === 11 || this.nodeType === 9 ) { - var target = manipulationTarget( this, elem ); - target.appendChild( elem ); - } - } ); - }, - - prepend: function() { - return domManip( this, arguments, function( elem ) { - if ( this.nodeType === 1 || this.nodeType === 11 || this.nodeType === 9 ) { - var target = manipulationTarget( this, elem ); - target.insertBefore( elem, target.firstChild ); - } - } ); - }, - - before: function() { - return domManip( this, arguments, function( elem ) { - if ( this.parentNode ) { - this.parentNode.insertBefore( elem, this ); - } - } ); - }, - - after: function() { - return domManip( this, arguments, function( elem ) { - if ( this.parentNode ) { - this.parentNode.insertBefore( elem, this.nextSibling ); - } - } ); - }, - - empty: function() { - var elem, - i = 0; - - for ( ; ( elem = this[ i ] ) != null; i++ ) { - if ( elem.nodeType === 1 ) { - - // 
Prevent memory leaks - jQuery.cleanData( getAll( elem, false ) ); - - // Remove any remaining nodes - elem.textContent = ""; - } - } - - return this; - }, - - clone: function( dataAndEvents, deepDataAndEvents ) { - dataAndEvents = dataAndEvents == null ? false : dataAndEvents; - deepDataAndEvents = deepDataAndEvents == null ? dataAndEvents : deepDataAndEvents; - - return this.map( function() { - return jQuery.clone( this, dataAndEvents, deepDataAndEvents ); - } ); - }, - - html: function( value ) { - return access( this, function( value ) { - var elem = this[ 0 ] || {}, - i = 0, - l = this.length; - - if ( value === undefined && elem.nodeType === 1 ) { - return elem.innerHTML; - } - - // See if we can take a shortcut and just use innerHTML - if ( typeof value === "string" && !rnoInnerhtml.test( value ) && - !wrapMap[ ( rtagName.exec( value ) || [ "", "" ] )[ 1 ].toLowerCase() ] ) { - - value = jQuery.htmlPrefilter( value ); - - try { - for ( ; i < l; i++ ) { - elem = this[ i ] || {}; - - // Remove element nodes and prevent memory leaks - if ( elem.nodeType === 1 ) { - jQuery.cleanData( getAll( elem, false ) ); - elem.innerHTML = value; - } - } - - elem = 0; - - // If using innerHTML throws an exception, use the fallback method - } catch ( e ) {} - } - - if ( elem ) { - this.empty().append( value ); - } - }, null, value, arguments.length ); - }, - - replaceWith: function() { - var ignored = []; - - // Make the changes, replacing each non-ignored context element with the new content - return domManip( this, arguments, function( elem ) { - var parent = this.parentNode; - - if ( jQuery.inArray( this, ignored ) < 0 ) { - jQuery.cleanData( getAll( this ) ); - if ( parent ) { - parent.replaceChild( elem, this ); - } - } - - // Force callback invocation - }, ignored ); - } -} ); - -jQuery.each( { - appendTo: "append", - prependTo: "prepend", - insertBefore: "before", - insertAfter: "after", - replaceAll: "replaceWith" -}, function( name, original ) { - jQuery.fn[ name ] = 
function( selector ) { - var elems, - ret = [], - insert = jQuery( selector ), - last = insert.length - 1, - i = 0; - - for ( ; i <= last; i++ ) { - elems = i === last ? this : this.clone( true ); - jQuery( insert[ i ] )[ original ]( elems ); - - // Support: Android <=4.0 only, PhantomJS 1 only - // .get() because push.apply(_, arraylike) throws on ancient WebKit - push.apply( ret, elems.get() ); - } - - return this.pushStack( ret ); - }; -} ); -var rnumnonpx = new RegExp( "^(" + pnum + ")(?!px)[a-z%]+$", "i" ); - -var getStyles = function( elem ) { - - // Support: IE <=11 only, Firefox <=30 (#15098, #14150) - // IE throws on elements created in popups - // FF meanwhile throws on frame elements through "defaultView.getComputedStyle" - var view = elem.ownerDocument.defaultView; - - if ( !view || !view.opener ) { - view = window; - } - - return view.getComputedStyle( elem ); - }; - -var swap = function( elem, options, callback ) { - var ret, name, - old = {}; - - // Remember the old values, and insert the new ones - for ( name in options ) { - old[ name ] = elem.style[ name ]; - elem.style[ name ] = options[ name ]; - } - - ret = callback.call( elem ); - - // Revert the old values - for ( name in options ) { - elem.style[ name ] = old[ name ]; - } - - return ret; -}; - - -var rboxStyle = new RegExp( cssExpand.join( "|" ), "i" ); - - - -( function() { - - // Executing both pixelPosition & boxSizingReliable tests require only one layout - // so they're executed at the same time to save the second computation. 
- function computeStyleTests() { - - // This is a singleton, we need to execute it only once - if ( !div ) { - return; - } - - container.style.cssText = "position:absolute;left:-11111px;width:60px;" + - "margin-top:1px;padding:0;border:0"; - div.style.cssText = - "position:relative;display:block;box-sizing:border-box;overflow:scroll;" + - "margin:auto;border:1px;padding:1px;" + - "width:60%;top:1%"; - documentElement.appendChild( container ).appendChild( div ); - - var divStyle = window.getComputedStyle( div ); - pixelPositionVal = divStyle.top !== "1%"; - - // Support: Android 4.0 - 4.3 only, Firefox <=3 - 44 - reliableMarginLeftVal = roundPixelMeasures( divStyle.marginLeft ) === 12; - - // Support: Android 4.0 - 4.3 only, Safari <=9.1 - 10.1, iOS <=7.0 - 9.3 - // Some styles come back with percentage values, even though they shouldn't - div.style.right = "60%"; - pixelBoxStylesVal = roundPixelMeasures( divStyle.right ) === 36; - - // Support: IE 9 - 11 only - // Detect misreporting of content dimensions for box-sizing:border-box elements - boxSizingReliableVal = roundPixelMeasures( divStyle.width ) === 36; - - // Support: IE 9 only - // Detect overflow:scroll screwiness (gh-3699) - // Support: Chrome <=64 - // Don't get tricked when zoom affects offsetWidth (gh-4029) - div.style.position = "absolute"; - scrollboxSizeVal = roundPixelMeasures( div.offsetWidth / 3 ) === 12; - - documentElement.removeChild( container ); - - // Nullify the div so it wouldn't be stored in the memory and - // it will also be a sign that checks already performed - div = null; - } - - function roundPixelMeasures( measure ) { - return Math.round( parseFloat( measure ) ); - } - - var pixelPositionVal, boxSizingReliableVal, scrollboxSizeVal, pixelBoxStylesVal, - reliableTrDimensionsVal, reliableMarginLeftVal, - container = document.createElement( "div" ), - div = document.createElement( "div" ); - - // Finish early in limited (non-browser) environments - if ( !div.style ) { - return; - } - - 
// Support: IE <=9 - 11 only - // Style of cloned element affects source element cloned (#8908) - div.style.backgroundClip = "content-box"; - div.cloneNode( true ).style.backgroundClip = ""; - support.clearCloneStyle = div.style.backgroundClip === "content-box"; - - jQuery.extend( support, { - boxSizingReliable: function() { - computeStyleTests(); - return boxSizingReliableVal; - }, - pixelBoxStyles: function() { - computeStyleTests(); - return pixelBoxStylesVal; - }, - pixelPosition: function() { - computeStyleTests(); - return pixelPositionVal; - }, - reliableMarginLeft: function() { - computeStyleTests(); - return reliableMarginLeftVal; - }, - scrollboxSize: function() { - computeStyleTests(); - return scrollboxSizeVal; - }, - - // Support: IE 9 - 11+, Edge 15 - 18+ - // IE/Edge misreport `getComputedStyle` of table rows with width/height - // set in CSS while `offset*` properties report correct values. - // Behavior in IE 9 is more subtle than in newer versions & it passes - // some versions of this test; make sure not to make it pass there! - // - // Support: Firefox 70+ - // Only Firefox includes border widths - // in computed dimensions. (gh-4529) - reliableTrDimensions: function() { - var table, tr, trChild, trStyle; - if ( reliableTrDimensionsVal == null ) { - table = document.createElement( "table" ); - tr = document.createElement( "tr" ); - trChild = document.createElement( "div" ); - - table.style.cssText = "position:absolute;left:-11111px;border-collapse:separate"; - tr.style.cssText = "border:1px solid"; - - // Support: Chrome 86+ - // Height set through cssText does not get applied. - // Computed height then comes back as 0. - tr.style.height = "1px"; - trChild.style.height = "9px"; - - // Support: Android 8 Chrome 86+ - // In our bodyBackground.html iframe, - // display for all div elements is set to "inline", - // which causes a problem only in Android 8 Chrome 86. - // Ensuring the div is display: block - // gets around this issue. 
- trChild.style.display = "block"; - - documentElement - .appendChild( table ) - .appendChild( tr ) - .appendChild( trChild ); - - trStyle = window.getComputedStyle( tr ); - reliableTrDimensionsVal = ( parseInt( trStyle.height, 10 ) + - parseInt( trStyle.borderTopWidth, 10 ) + - parseInt( trStyle.borderBottomWidth, 10 ) ) === tr.offsetHeight; - - documentElement.removeChild( table ); - } - return reliableTrDimensionsVal; - } - } ); -} )(); - - -function curCSS( elem, name, computed ) { - var width, minWidth, maxWidth, ret, - - // Support: Firefox 51+ - // Retrieving style before computed somehow - // fixes an issue with getting wrong values - // on detached elements - style = elem.style; - - computed = computed || getStyles( elem ); - - // getPropertyValue is needed for: - // .css('filter') (IE 9 only, #12537) - // .css('--customProperty) (#3144) - if ( computed ) { - ret = computed.getPropertyValue( name ) || computed[ name ]; - - if ( ret === "" && !isAttached( elem ) ) { - ret = jQuery.style( elem, name ); - } - - // A tribute to the "awesome hack by Dean Edwards" - // Android Browser returns percentage for some values, - // but width seems to be reliably pixels. - // This is against the CSSOM draft spec: - // https://drafts.csswg.org/cssom/#resolved-values - if ( !support.pixelBoxStyles() && rnumnonpx.test( ret ) && rboxStyle.test( name ) ) { - - // Remember the original values - width = style.width; - minWidth = style.minWidth; - maxWidth = style.maxWidth; - - // Put in the new values to get a computed value out - style.minWidth = style.maxWidth = style.width = ret; - ret = computed.width; - - // Revert the changed values - style.width = width; - style.minWidth = minWidth; - style.maxWidth = maxWidth; - } - } - - return ret !== undefined ? - - // Support: IE <=9 - 11 only - // IE returns zIndex value as an integer. 
- ret + "" : - ret; -} - - -function addGetHookIf( conditionFn, hookFn ) { - - // Define the hook, we'll check on the first run if it's really needed. - return { - get: function() { - if ( conditionFn() ) { - - // Hook not needed (or it's not possible to use it due - // to missing dependency), remove it. - delete this.get; - return; - } - - // Hook needed; redefine it so that the support test is not executed again. - return ( this.get = hookFn ).apply( this, arguments ); - } - }; -} - - -var cssPrefixes = [ "Webkit", "Moz", "ms" ], - emptyStyle = document.createElement( "div" ).style, - vendorProps = {}; - -// Return a vendor-prefixed property or undefined -function vendorPropName( name ) { - - // Check for vendor prefixed names - var capName = name[ 0 ].toUpperCase() + name.slice( 1 ), - i = cssPrefixes.length; - - while ( i-- ) { - name = cssPrefixes[ i ] + capName; - if ( name in emptyStyle ) { - return name; - } - } -} - -// Return a potentially-mapped jQuery.cssProps or vendor prefixed property -function finalPropName( name ) { - var final = jQuery.cssProps[ name ] || vendorProps[ name ]; - - if ( final ) { - return final; - } - if ( name in emptyStyle ) { - return name; - } - return vendorProps[ name ] = vendorPropName( name ) || name; -} - - -var - - // Swappable if display is none or starts with table - // except "table", "table-cell", or "table-caption" - // See here for display values: https://developer.mozilla.org/en-US/docs/CSS/display - rdisplayswap = /^(none|table(?!-c[ea]).+)/, - rcustomProp = /^--/, - cssShow = { position: "absolute", visibility: "hidden", display: "block" }, - cssNormalTransform = { - letterSpacing: "0", - fontWeight: "400" - }; - -function setPositiveNumber( _elem, value, subtract ) { - - // Any relative (+/-) values have already been - // normalized at this point - var matches = rcssNum.exec( value ); - return matches ? 
- - // Guard against undefined "subtract", e.g., when used as in cssHooks - Math.max( 0, matches[ 2 ] - ( subtract || 0 ) ) + ( matches[ 3 ] || "px" ) : - value; -} - -function boxModelAdjustment( elem, dimension, box, isBorderBox, styles, computedVal ) { - var i = dimension === "width" ? 1 : 0, - extra = 0, - delta = 0; - - // Adjustment may not be necessary - if ( box === ( isBorderBox ? "border" : "content" ) ) { - return 0; - } - - for ( ; i < 4; i += 2 ) { - - // Both box models exclude margin - if ( box === "margin" ) { - delta += jQuery.css( elem, box + cssExpand[ i ], true, styles ); - } - - // If we get here with a content-box, we're seeking "padding" or "border" or "margin" - if ( !isBorderBox ) { - - // Add padding - delta += jQuery.css( elem, "padding" + cssExpand[ i ], true, styles ); - - // For "border" or "margin", add border - if ( box !== "padding" ) { - delta += jQuery.css( elem, "border" + cssExpand[ i ] + "Width", true, styles ); - - // But still keep track of it otherwise - } else { - extra += jQuery.css( elem, "border" + cssExpand[ i ] + "Width", true, styles ); - } - - // If we get here with a border-box (content + padding + border), we're seeking "content" or - // "padding" or "margin" - } else { - - // For "content", subtract padding - if ( box === "content" ) { - delta -= jQuery.css( elem, "padding" + cssExpand[ i ], true, styles ); - } - - // For "content" or "padding", subtract border - if ( box !== "margin" ) { - delta -= jQuery.css( elem, "border" + cssExpand[ i ] + "Width", true, styles ); - } - } - } - - // Account for positive content-box scroll gutter when requested by providing computedVal - if ( !isBorderBox && computedVal >= 0 ) { - - // offsetWidth/offsetHeight is a rounded sum of content, padding, scroll gutter, and border - // Assuming integer scroll gutter, subtract the rest and round down - delta += Math.max( 0, Math.ceil( - elem[ "offset" + dimension[ 0 ].toUpperCase() + dimension.slice( 1 ) ] - - computedVal - - delta - - 
extra - - 0.5 - - // If offsetWidth/offsetHeight is unknown, then we can't determine content-box scroll gutter - // Use an explicit zero to avoid NaN (gh-3964) - ) ) || 0; - } - - return delta; -} - -function getWidthOrHeight( elem, dimension, extra ) { - - // Start with computed style - var styles = getStyles( elem ), - - // To avoid forcing a reflow, only fetch boxSizing if we need it (gh-4322). - // Fake content-box until we know it's needed to know the true value. - boxSizingNeeded = !support.boxSizingReliable() || extra, - isBorderBox = boxSizingNeeded && - jQuery.css( elem, "boxSizing", false, styles ) === "border-box", - valueIsBorderBox = isBorderBox, - - val = curCSS( elem, dimension, styles ), - offsetProp = "offset" + dimension[ 0 ].toUpperCase() + dimension.slice( 1 ); - - // Support: Firefox <=54 - // Return a confounding non-pixel value or feign ignorance, as appropriate. - if ( rnumnonpx.test( val ) ) { - if ( !extra ) { - return val; - } - val = "auto"; - } - - - // Support: IE 9 - 11 only - // Use offsetWidth/offsetHeight for when box sizing is unreliable. - // In those cases, the computed value can be trusted to be border-box. - if ( ( !support.boxSizingReliable() && isBorderBox || - - // Support: IE 10 - 11+, Edge 15 - 18+ - // IE/Edge misreport `getComputedStyle` of table rows with width/height - // set in CSS while `offset*` properties report correct values. - // Interestingly, in some cases IE 9 doesn't suffer from this issue. 
- !support.reliableTrDimensions() && nodeName( elem, "tr" ) || - - // Fall back to offsetWidth/offsetHeight when value is "auto" - // This happens for inline elements with no explicit setting (gh-3571) - val === "auto" || - - // Support: Android <=4.1 - 4.3 only - // Also use offsetWidth/offsetHeight for misreported inline dimensions (gh-3602) - !parseFloat( val ) && jQuery.css( elem, "display", false, styles ) === "inline" ) && - - // Make sure the element is visible & connected - elem.getClientRects().length ) { - - isBorderBox = jQuery.css( elem, "boxSizing", false, styles ) === "border-box"; - - // Where available, offsetWidth/offsetHeight approximate border box dimensions. - // Where not available (e.g., SVG), assume unreliable box-sizing and interpret the - // retrieved value as a content box dimension. - valueIsBorderBox = offsetProp in elem; - if ( valueIsBorderBox ) { - val = elem[ offsetProp ]; - } - } - - // Normalize "" and auto - val = parseFloat( val ) || 0; - - // Adjust for the element's box model - return ( val + - boxModelAdjustment( - elem, - dimension, - extra || ( isBorderBox ? "border" : "content" ), - valueIsBorderBox, - styles, - - // Provide the current computed size to request scroll gutter calculation (gh-3589) - val - ) - ) + "px"; -} - -jQuery.extend( { - - // Add in style property hooks for overriding the default - // behavior of getting and setting a style property - cssHooks: { - opacity: { - get: function( elem, computed ) { - if ( computed ) { - - // We should always get a number back from opacity - var ret = curCSS( elem, "opacity" ); - return ret === "" ? 
"1" : ret; - } - } - } - }, - - // Don't automatically add "px" to these possibly-unitless properties - cssNumber: { - "animationIterationCount": true, - "columnCount": true, - "fillOpacity": true, - "flexGrow": true, - "flexShrink": true, - "fontWeight": true, - "gridArea": true, - "gridColumn": true, - "gridColumnEnd": true, - "gridColumnStart": true, - "gridRow": true, - "gridRowEnd": true, - "gridRowStart": true, - "lineHeight": true, - "opacity": true, - "order": true, - "orphans": true, - "widows": true, - "zIndex": true, - "zoom": true - }, - - // Add in properties whose names you wish to fix before - // setting or getting the value - cssProps: {}, - - // Get and set the style property on a DOM Node - style: function( elem, name, value, extra ) { - - // Don't set styles on text and comment nodes - if ( !elem || elem.nodeType === 3 || elem.nodeType === 8 || !elem.style ) { - return; - } - - // Make sure that we're working with the right name - var ret, type, hooks, - origName = camelCase( name ), - isCustomProp = rcustomProp.test( name ), - style = elem.style; - - // Make sure that we're working with the right name. We don't - // want to query the value if it is a CSS custom property - // since they are user-defined. 
- if ( !isCustomProp ) { - name = finalPropName( origName ); - } - - // Gets hook for the prefixed version, then unprefixed version - hooks = jQuery.cssHooks[ name ] || jQuery.cssHooks[ origName ]; - - // Check if we're setting a value - if ( value !== undefined ) { - type = typeof value; - - // Convert "+=" or "-=" to relative numbers (#7345) - if ( type === "string" && ( ret = rcssNum.exec( value ) ) && ret[ 1 ] ) { - value = adjustCSS( elem, name, ret ); - - // Fixes bug #9237 - type = "number"; - } - - // Make sure that null and NaN values aren't set (#7116) - if ( value == null || value !== value ) { - return; - } - - // If a number was passed in, add the unit (except for certain CSS properties) - // The isCustomProp check can be removed in jQuery 4.0 when we only auto-append - // "px" to a few hardcoded values. - if ( type === "number" && !isCustomProp ) { - value += ret && ret[ 3 ] || ( jQuery.cssNumber[ origName ] ? "" : "px" ); - } - - // background-* props affect original clone's values - if ( !support.clearCloneStyle && value === "" && name.indexOf( "background" ) === 0 ) { - style[ name ] = "inherit"; - } - - // If a hook was provided, use that value, otherwise just set the specified value - if ( !hooks || !( "set" in hooks ) || - ( value = hooks.set( elem, value, extra ) ) !== undefined ) { - - if ( isCustomProp ) { - style.setProperty( name, value ); - } else { - style[ name ] = value; - } - } - - } else { - - // If a hook was provided get the non-computed value from there - if ( hooks && "get" in hooks && - ( ret = hooks.get( elem, false, extra ) ) !== undefined ) { - - return ret; - } - - // Otherwise just get the value from the style object - return style[ name ]; - } - }, - - css: function( elem, name, extra, styles ) { - var val, num, hooks, - origName = camelCase( name ), - isCustomProp = rcustomProp.test( name ); - - // Make sure that we're working with the right name. 
We don't - // want to modify the value if it is a CSS custom property - // since they are user-defined. - if ( !isCustomProp ) { - name = finalPropName( origName ); - } - - // Try prefixed name followed by the unprefixed name - hooks = jQuery.cssHooks[ name ] || jQuery.cssHooks[ origName ]; - - // If a hook was provided get the computed value from there - if ( hooks && "get" in hooks ) { - val = hooks.get( elem, true, extra ); - } - - // Otherwise, if a way to get the computed value exists, use that - if ( val === undefined ) { - val = curCSS( elem, name, styles ); - } - - // Convert "normal" to computed value - if ( val === "normal" && name in cssNormalTransform ) { - val = cssNormalTransform[ name ]; - } - - // Make numeric if forced or a qualifier was provided and val looks numeric - if ( extra === "" || extra ) { - num = parseFloat( val ); - return extra === true || isFinite( num ) ? num || 0 : val; - } - - return val; - } -} ); - -jQuery.each( [ "height", "width" ], function( _i, dimension ) { - jQuery.cssHooks[ dimension ] = { - get: function( elem, computed, extra ) { - if ( computed ) { - - // Certain elements can have dimension info if we invisibly show them - // but it must have a current display style that would benefit - return rdisplayswap.test( jQuery.css( elem, "display" ) ) && - - // Support: Safari 8+ - // Table columns in Safari have non-zero offsetWidth & zero - // getBoundingClientRect().width unless display is changed. - // Support: IE <=11 only - // Running getBoundingClientRect on a disconnected node - // in IE throws an error. - ( !elem.getClientRects().length || !elem.getBoundingClientRect().width ) ? - swap( elem, cssShow, function() { - return getWidthOrHeight( elem, dimension, extra ); - } ) : - getWidthOrHeight( elem, dimension, extra ); - } - }, - - set: function( elem, value, extra ) { - var matches, - styles = getStyles( elem ), - - // Only read styles.position if the test has a chance to fail - // to avoid forcing a reflow. 
- scrollboxSizeBuggy = !support.scrollboxSize() && - styles.position === "absolute", - - // To avoid forcing a reflow, only fetch boxSizing if we need it (gh-3991) - boxSizingNeeded = scrollboxSizeBuggy || extra, - isBorderBox = boxSizingNeeded && - jQuery.css( elem, "boxSizing", false, styles ) === "border-box", - subtract = extra ? - boxModelAdjustment( - elem, - dimension, - extra, - isBorderBox, - styles - ) : - 0; - - // Account for unreliable border-box dimensions by comparing offset* to computed and - // faking a content-box to get border and padding (gh-3699) - if ( isBorderBox && scrollboxSizeBuggy ) { - subtract -= Math.ceil( - elem[ "offset" + dimension[ 0 ].toUpperCase() + dimension.slice( 1 ) ] - - parseFloat( styles[ dimension ] ) - - boxModelAdjustment( elem, dimension, "border", false, styles ) - - 0.5 - ); - } - - // Convert to pixels if value adjustment is needed - if ( subtract && ( matches = rcssNum.exec( value ) ) && - ( matches[ 3 ] || "px" ) !== "px" ) { - - elem.style[ dimension ] = value; - value = jQuery.css( elem, dimension ); - } - - return setPositiveNumber( elem, value, subtract ); - } - }; -} ); - -jQuery.cssHooks.marginLeft = addGetHookIf( support.reliableMarginLeft, - function( elem, computed ) { - if ( computed ) { - return ( parseFloat( curCSS( elem, "marginLeft" ) ) || - elem.getBoundingClientRect().left - - swap( elem, { marginLeft: 0 }, function() { - return elem.getBoundingClientRect().left; - } ) - ) + "px"; - } - } -); - -// These hooks are used by animate to expand properties -jQuery.each( { - margin: "", - padding: "", - border: "Width" -}, function( prefix, suffix ) { - jQuery.cssHooks[ prefix + suffix ] = { - expand: function( value ) { - var i = 0, - expanded = {}, - - // Assumes a single number if not a string - parts = typeof value === "string" ? 
value.split( " " ) : [ value ]; - - for ( ; i < 4; i++ ) { - expanded[ prefix + cssExpand[ i ] + suffix ] = - parts[ i ] || parts[ i - 2 ] || parts[ 0 ]; - } - - return expanded; - } - }; - - if ( prefix !== "margin" ) { - jQuery.cssHooks[ prefix + suffix ].set = setPositiveNumber; - } -} ); - -jQuery.fn.extend( { - css: function( name, value ) { - return access( this, function( elem, name, value ) { - var styles, len, - map = {}, - i = 0; - - if ( Array.isArray( name ) ) { - styles = getStyles( elem ); - len = name.length; - - for ( ; i < len; i++ ) { - map[ name[ i ] ] = jQuery.css( elem, name[ i ], false, styles ); - } - - return map; - } - - return value !== undefined ? - jQuery.style( elem, name, value ) : - jQuery.css( elem, name ); - }, name, value, arguments.length > 1 ); - } -} ); - - -function Tween( elem, options, prop, end, easing ) { - return new Tween.prototype.init( elem, options, prop, end, easing ); -} -jQuery.Tween = Tween; - -Tween.prototype = { - constructor: Tween, - init: function( elem, options, prop, end, easing, unit ) { - this.elem = elem; - this.prop = prop; - this.easing = easing || jQuery.easing._default; - this.options = options; - this.start = this.now = this.cur(); - this.end = end; - this.unit = unit || ( jQuery.cssNumber[ prop ] ? "" : "px" ); - }, - cur: function() { - var hooks = Tween.propHooks[ this.prop ]; - - return hooks && hooks.get ? 
- hooks.get( this ) : - Tween.propHooks._default.get( this ); - }, - run: function( percent ) { - var eased, - hooks = Tween.propHooks[ this.prop ]; - - if ( this.options.duration ) { - this.pos = eased = jQuery.easing[ this.easing ]( - percent, this.options.duration * percent, 0, 1, this.options.duration - ); - } else { - this.pos = eased = percent; - } - this.now = ( this.end - this.start ) * eased + this.start; - - if ( this.options.step ) { - this.options.step.call( this.elem, this.now, this ); - } - - if ( hooks && hooks.set ) { - hooks.set( this ); - } else { - Tween.propHooks._default.set( this ); - } - return this; - } -}; - -Tween.prototype.init.prototype = Tween.prototype; - -Tween.propHooks = { - _default: { - get: function( tween ) { - var result; - - // Use a property on the element directly when it is not a DOM element, - // or when there is no matching style property that exists. - if ( tween.elem.nodeType !== 1 || - tween.elem[ tween.prop ] != null && tween.elem.style[ tween.prop ] == null ) { - return tween.elem[ tween.prop ]; - } - - // Passing an empty string as a 3rd parameter to .css will automatically - // attempt a parseFloat and fallback to a string if the parse fails. - // Simple values such as "10px" are parsed to Float; - // complex values such as "rotate(1rad)" are returned as-is. - result = jQuery.css( tween.elem, tween.prop, "" ); - - // Empty strings, null, undefined and "auto" are converted to 0. - return !result || result === "auto" ? 0 : result; - }, - set: function( tween ) { - - // Use step hook for back compat. - // Use cssHook if its there. - // Use .style if available and use plain properties where available. 
- if ( jQuery.fx.step[ tween.prop ] ) { - jQuery.fx.step[ tween.prop ]( tween ); - } else if ( tween.elem.nodeType === 1 && ( - jQuery.cssHooks[ tween.prop ] || - tween.elem.style[ finalPropName( tween.prop ) ] != null ) ) { - jQuery.style( tween.elem, tween.prop, tween.now + tween.unit ); - } else { - tween.elem[ tween.prop ] = tween.now; - } - } - } -}; - -// Support: IE <=9 only -// Panic based approach to setting things on disconnected nodes -Tween.propHooks.scrollTop = Tween.propHooks.scrollLeft = { - set: function( tween ) { - if ( tween.elem.nodeType && tween.elem.parentNode ) { - tween.elem[ tween.prop ] = tween.now; - } - } -}; - -jQuery.easing = { - linear: function( p ) { - return p; - }, - swing: function( p ) { - return 0.5 - Math.cos( p * Math.PI ) / 2; - }, - _default: "swing" -}; - -jQuery.fx = Tween.prototype.init; - -// Back compat <1.8 extension point -jQuery.fx.step = {}; - - - - -var - fxNow, inProgress, - rfxtypes = /^(?:toggle|show|hide)$/, - rrun = /queueHooks$/; - -function schedule() { - if ( inProgress ) { - if ( document.hidden === false && window.requestAnimationFrame ) { - window.requestAnimationFrame( schedule ); - } else { - window.setTimeout( schedule, jQuery.fx.interval ); - } - - jQuery.fx.tick(); - } -} - -// Animations created synchronously will run synchronously -function createFxNow() { - window.setTimeout( function() { - fxNow = undefined; - } ); - return ( fxNow = Date.now() ); -} - -// Generate parameters to create a standard animation -function genFx( type, includeWidth ) { - var which, - i = 0, - attrs = { height: type }; - - // If we include width, step value is 1 to do all cssExpand values, - // otherwise step value is 2 to skip over Left and Right - includeWidth = includeWidth ? 
1 : 0; - for ( ; i < 4; i += 2 - includeWidth ) { - which = cssExpand[ i ]; - attrs[ "margin" + which ] = attrs[ "padding" + which ] = type; - } - - if ( includeWidth ) { - attrs.opacity = attrs.width = type; - } - - return attrs; -} - -function createTween( value, prop, animation ) { - var tween, - collection = ( Animation.tweeners[ prop ] || [] ).concat( Animation.tweeners[ "*" ] ), - index = 0, - length = collection.length; - for ( ; index < length; index++ ) { - if ( ( tween = collection[ index ].call( animation, prop, value ) ) ) { - - // We're done with this property - return tween; - } - } -} - -function defaultPrefilter( elem, props, opts ) { - var prop, value, toggle, hooks, oldfire, propTween, restoreDisplay, display, - isBox = "width" in props || "height" in props, - anim = this, - orig = {}, - style = elem.style, - hidden = elem.nodeType && isHiddenWithinTree( elem ), - dataShow = dataPriv.get( elem, "fxshow" ); - - // Queue-skipping animations hijack the fx hooks - if ( !opts.queue ) { - hooks = jQuery._queueHooks( elem, "fx" ); - if ( hooks.unqueued == null ) { - hooks.unqueued = 0; - oldfire = hooks.empty.fire; - hooks.empty.fire = function() { - if ( !hooks.unqueued ) { - oldfire(); - } - }; - } - hooks.unqueued++; - - anim.always( function() { - - // Ensure the complete handler is called before this completes - anim.always( function() { - hooks.unqueued--; - if ( !jQuery.queue( elem, "fx" ).length ) { - hooks.empty.fire(); - } - } ); - } ); - } - - // Detect show/hide animations - for ( prop in props ) { - value = props[ prop ]; - if ( rfxtypes.test( value ) ) { - delete props[ prop ]; - toggle = toggle || value === "toggle"; - if ( value === ( hidden ? 
"hide" : "show" ) ) { - - // Pretend to be hidden if this is a "show" and - // there is still data from a stopped show/hide - if ( value === "show" && dataShow && dataShow[ prop ] !== undefined ) { - hidden = true; - - // Ignore all other no-op show/hide data - } else { - continue; - } - } - orig[ prop ] = dataShow && dataShow[ prop ] || jQuery.style( elem, prop ); - } - } - - // Bail out if this is a no-op like .hide().hide() - propTween = !jQuery.isEmptyObject( props ); - if ( !propTween && jQuery.isEmptyObject( orig ) ) { - return; - } - - // Restrict "overflow" and "display" styles during box animations - if ( isBox && elem.nodeType === 1 ) { - - // Support: IE <=9 - 11, Edge 12 - 15 - // Record all 3 overflow attributes because IE does not infer the shorthand - // from identically-valued overflowX and overflowY and Edge just mirrors - // the overflowX value there. - opts.overflow = [ style.overflow, style.overflowX, style.overflowY ]; - - // Identify a display type, preferring old show/hide data over the CSS cascade - restoreDisplay = dataShow && dataShow.display; - if ( restoreDisplay == null ) { - restoreDisplay = dataPriv.get( elem, "display" ); - } - display = jQuery.css( elem, "display" ); - if ( display === "none" ) { - if ( restoreDisplay ) { - display = restoreDisplay; - } else { - - // Get nonempty value(s) by temporarily forcing visibility - showHide( [ elem ], true ); - restoreDisplay = elem.style.display || restoreDisplay; - display = jQuery.css( elem, "display" ); - showHide( [ elem ] ); - } - } - - // Animate inline elements as inline-block - if ( display === "inline" || display === "inline-block" && restoreDisplay != null ) { - if ( jQuery.css( elem, "float" ) === "none" ) { - - // Restore the original display value at the end of pure show/hide animations - if ( !propTween ) { - anim.done( function() { - style.display = restoreDisplay; - } ); - if ( restoreDisplay == null ) { - display = style.display; - restoreDisplay = display === "none" ? 
"" : display; - } - } - style.display = "inline-block"; - } - } - } - - if ( opts.overflow ) { - style.overflow = "hidden"; - anim.always( function() { - style.overflow = opts.overflow[ 0 ]; - style.overflowX = opts.overflow[ 1 ]; - style.overflowY = opts.overflow[ 2 ]; - } ); - } - - // Implement show/hide animations - propTween = false; - for ( prop in orig ) { - - // General show/hide setup for this element animation - if ( !propTween ) { - if ( dataShow ) { - if ( "hidden" in dataShow ) { - hidden = dataShow.hidden; - } - } else { - dataShow = dataPriv.access( elem, "fxshow", { display: restoreDisplay } ); - } - - // Store hidden/visible for toggle so `.stop().toggle()` "reverses" - if ( toggle ) { - dataShow.hidden = !hidden; - } - - // Show elements before animating them - if ( hidden ) { - showHide( [ elem ], true ); - } - - /* eslint-disable no-loop-func */ - - anim.done( function() { - - /* eslint-enable no-loop-func */ - - // The final step of a "hide" animation is actually hiding the element - if ( !hidden ) { - showHide( [ elem ] ); - } - dataPriv.remove( elem, "fxshow" ); - for ( prop in orig ) { - jQuery.style( elem, prop, orig[ prop ] ); - } - } ); - } - - // Per-property setup - propTween = createTween( hidden ? 
dataShow[ prop ] : 0, prop, anim ); - if ( !( prop in dataShow ) ) { - dataShow[ prop ] = propTween.start; - if ( hidden ) { - propTween.end = propTween.start; - propTween.start = 0; - } - } - } -} - -function propFilter( props, specialEasing ) { - var index, name, easing, value, hooks; - - // camelCase, specialEasing and expand cssHook pass - for ( index in props ) { - name = camelCase( index ); - easing = specialEasing[ name ]; - value = props[ index ]; - if ( Array.isArray( value ) ) { - easing = value[ 1 ]; - value = props[ index ] = value[ 0 ]; - } - - if ( index !== name ) { - props[ name ] = value; - delete props[ index ]; - } - - hooks = jQuery.cssHooks[ name ]; - if ( hooks && "expand" in hooks ) { - value = hooks.expand( value ); - delete props[ name ]; - - // Not quite $.extend, this won't overwrite existing keys. - // Reusing 'index' because we have the correct "name" - for ( index in value ) { - if ( !( index in props ) ) { - props[ index ] = value[ index ]; - specialEasing[ index ] = easing; - } - } - } else { - specialEasing[ name ] = easing; - } - } -} - -function Animation( elem, properties, options ) { - var result, - stopped, - index = 0, - length = Animation.prefilters.length, - deferred = jQuery.Deferred().always( function() { - - // Don't match elem in the :animated selector - delete tick.elem; - } ), - tick = function() { - if ( stopped ) { - return false; - } - var currentTime = fxNow || createFxNow(), - remaining = Math.max( 0, animation.startTime + animation.duration - currentTime ), - - // Support: Android 2.3 only - // Archaic crash bug won't allow us to use `1 - ( 0.5 || 0 )` (#12497) - temp = remaining / animation.duration || 0, - percent = 1 - temp, - index = 0, - length = animation.tweens.length; - - for ( ; index < length; index++ ) { - animation.tweens[ index ].run( percent ); - } - - deferred.notifyWith( elem, [ animation, percent, remaining ] ); - - // If there's more to do, yield - if ( percent < 1 && length ) { - return 
remaining; - } - - // If this was an empty animation, synthesize a final progress notification - if ( !length ) { - deferred.notifyWith( elem, [ animation, 1, 0 ] ); - } - - // Resolve the animation and report its conclusion - deferred.resolveWith( elem, [ animation ] ); - return false; - }, - animation = deferred.promise( { - elem: elem, - props: jQuery.extend( {}, properties ), - opts: jQuery.extend( true, { - specialEasing: {}, - easing: jQuery.easing._default - }, options ), - originalProperties: properties, - originalOptions: options, - startTime: fxNow || createFxNow(), - duration: options.duration, - tweens: [], - createTween: function( prop, end ) { - var tween = jQuery.Tween( elem, animation.opts, prop, end, - animation.opts.specialEasing[ prop ] || animation.opts.easing ); - animation.tweens.push( tween ); - return tween; - }, - stop: function( gotoEnd ) { - var index = 0, - - // If we are going to the end, we want to run all the tweens - // otherwise we skip this part - length = gotoEnd ? 
animation.tweens.length : 0; - if ( stopped ) { - return this; - } - stopped = true; - for ( ; index < length; index++ ) { - animation.tweens[ index ].run( 1 ); - } - - // Resolve when we played the last frame; otherwise, reject - if ( gotoEnd ) { - deferred.notifyWith( elem, [ animation, 1, 0 ] ); - deferred.resolveWith( elem, [ animation, gotoEnd ] ); - } else { - deferred.rejectWith( elem, [ animation, gotoEnd ] ); - } - return this; - } - } ), - props = animation.props; - - propFilter( props, animation.opts.specialEasing ); - - for ( ; index < length; index++ ) { - result = Animation.prefilters[ index ].call( animation, elem, props, animation.opts ); - if ( result ) { - if ( isFunction( result.stop ) ) { - jQuery._queueHooks( animation.elem, animation.opts.queue ).stop = - result.stop.bind( result ); - } - return result; - } - } - - jQuery.map( props, createTween, animation ); - - if ( isFunction( animation.opts.start ) ) { - animation.opts.start.call( elem, animation ); - } - - // Attach callbacks from options - animation - .progress( animation.opts.progress ) - .done( animation.opts.done, animation.opts.complete ) - .fail( animation.opts.fail ) - .always( animation.opts.always ); - - jQuery.fx.timer( - jQuery.extend( tick, { - elem: elem, - anim: animation, - queue: animation.opts.queue - } ) - ); - - return animation; -} - -jQuery.Animation = jQuery.extend( Animation, { - - tweeners: { - "*": [ function( prop, value ) { - var tween = this.createTween( prop, value ); - adjustCSS( tween.elem, prop, rcssNum.exec( value ), tween ); - return tween; - } ] - }, - - tweener: function( props, callback ) { - if ( isFunction( props ) ) { - callback = props; - props = [ "*" ]; - } else { - props = props.match( rnothtmlwhite ); - } - - var prop, - index = 0, - length = props.length; - - for ( ; index < length; index++ ) { - prop = props[ index ]; - Animation.tweeners[ prop ] = Animation.tweeners[ prop ] || []; - Animation.tweeners[ prop ].unshift( callback ); - } - }, - 
- prefilters: [ defaultPrefilter ], - - prefilter: function( callback, prepend ) { - if ( prepend ) { - Animation.prefilters.unshift( callback ); - } else { - Animation.prefilters.push( callback ); - } - } -} ); - -jQuery.speed = function( speed, easing, fn ) { - var opt = speed && typeof speed === "object" ? jQuery.extend( {}, speed ) : { - complete: fn || !fn && easing || - isFunction( speed ) && speed, - duration: speed, - easing: fn && easing || easing && !isFunction( easing ) && easing - }; - - // Go to the end state if fx are off - if ( jQuery.fx.off ) { - opt.duration = 0; - - } else { - if ( typeof opt.duration !== "number" ) { - if ( opt.duration in jQuery.fx.speeds ) { - opt.duration = jQuery.fx.speeds[ opt.duration ]; - - } else { - opt.duration = jQuery.fx.speeds._default; - } - } - } - - // Normalize opt.queue - true/undefined/null -> "fx" - if ( opt.queue == null || opt.queue === true ) { - opt.queue = "fx"; - } - - // Queueing - opt.old = opt.complete; - - opt.complete = function() { - if ( isFunction( opt.old ) ) { - opt.old.call( this ); - } - - if ( opt.queue ) { - jQuery.dequeue( this, opt.queue ); - } - }; - - return opt; -}; - -jQuery.fn.extend( { - fadeTo: function( speed, to, easing, callback ) { - - // Show any hidden elements after setting opacity to 0 - return this.filter( isHiddenWithinTree ).css( "opacity", 0 ).show() - - // Animate to the value specified - .end().animate( { opacity: to }, speed, easing, callback ); - }, - animate: function( prop, speed, easing, callback ) { - var empty = jQuery.isEmptyObject( prop ), - optall = jQuery.speed( speed, easing, callback ), - doAnimation = function() { - - // Operate on a copy of prop so per-property easing won't be lost - var anim = Animation( this, jQuery.extend( {}, prop ), optall ); - - // Empty animations, or finishing resolves immediately - if ( empty || dataPriv.get( this, "finish" ) ) { - anim.stop( true ); - } - }; - - doAnimation.finish = doAnimation; - - return empty || 
optall.queue === false ? - this.each( doAnimation ) : - this.queue( optall.queue, doAnimation ); - }, - stop: function( type, clearQueue, gotoEnd ) { - var stopQueue = function( hooks ) { - var stop = hooks.stop; - delete hooks.stop; - stop( gotoEnd ); - }; - - if ( typeof type !== "string" ) { - gotoEnd = clearQueue; - clearQueue = type; - type = undefined; - } - if ( clearQueue ) { - this.queue( type || "fx", [] ); - } - - return this.each( function() { - var dequeue = true, - index = type != null && type + "queueHooks", - timers = jQuery.timers, - data = dataPriv.get( this ); - - if ( index ) { - if ( data[ index ] && data[ index ].stop ) { - stopQueue( data[ index ] ); - } - } else { - for ( index in data ) { - if ( data[ index ] && data[ index ].stop && rrun.test( index ) ) { - stopQueue( data[ index ] ); - } - } - } - - for ( index = timers.length; index--; ) { - if ( timers[ index ].elem === this && - ( type == null || timers[ index ].queue === type ) ) { - - timers[ index ].anim.stop( gotoEnd ); - dequeue = false; - timers.splice( index, 1 ); - } - } - - // Start the next in the queue if the last step wasn't forced. - // Timers currently will call their complete callbacks, which - // will dequeue but only if they were gotoEnd. - if ( dequeue || !gotoEnd ) { - jQuery.dequeue( this, type ); - } - } ); - }, - finish: function( type ) { - if ( type !== false ) { - type = type || "fx"; - } - return this.each( function() { - var index, - data = dataPriv.get( this ), - queue = data[ type + "queue" ], - hooks = data[ type + "queueHooks" ], - timers = jQuery.timers, - length = queue ? 
queue.length : 0; - - // Enable finishing flag on private data - data.finish = true; - - // Empty the queue first - jQuery.queue( this, type, [] ); - - if ( hooks && hooks.stop ) { - hooks.stop.call( this, true ); - } - - // Look for any active animations, and finish them - for ( index = timers.length; index--; ) { - if ( timers[ index ].elem === this && timers[ index ].queue === type ) { - timers[ index ].anim.stop( true ); - timers.splice( index, 1 ); - } - } - - // Look for any animations in the old queue and finish them - for ( index = 0; index < length; index++ ) { - if ( queue[ index ] && queue[ index ].finish ) { - queue[ index ].finish.call( this ); - } - } - - // Turn off finishing flag - delete data.finish; - } ); - } -} ); - -jQuery.each( [ "toggle", "show", "hide" ], function( _i, name ) { - var cssFn = jQuery.fn[ name ]; - jQuery.fn[ name ] = function( speed, easing, callback ) { - return speed == null || typeof speed === "boolean" ? - cssFn.apply( this, arguments ) : - this.animate( genFx( name, true ), speed, easing, callback ); - }; -} ); - -// Generate shortcuts for custom animations -jQuery.each( { - slideDown: genFx( "show" ), - slideUp: genFx( "hide" ), - slideToggle: genFx( "toggle" ), - fadeIn: { opacity: "show" }, - fadeOut: { opacity: "hide" }, - fadeToggle: { opacity: "toggle" } -}, function( name, props ) { - jQuery.fn[ name ] = function( speed, easing, callback ) { - return this.animate( props, speed, easing, callback ); - }; -} ); - -jQuery.timers = []; -jQuery.fx.tick = function() { - var timer, - i = 0, - timers = jQuery.timers; - - fxNow = Date.now(); - - for ( ; i < timers.length; i++ ) { - timer = timers[ i ]; - - // Run the timer and safely remove it when done (allowing for external removal) - if ( !timer() && timers[ i ] === timer ) { - timers.splice( i--, 1 ); - } - } - - if ( !timers.length ) { - jQuery.fx.stop(); - } - fxNow = undefined; -}; - -jQuery.fx.timer = function( timer ) { - jQuery.timers.push( timer ); - 
jQuery.fx.start(); -}; - -jQuery.fx.interval = 13; -jQuery.fx.start = function() { - if ( inProgress ) { - return; - } - - inProgress = true; - schedule(); -}; - -jQuery.fx.stop = function() { - inProgress = null; -}; - -jQuery.fx.speeds = { - slow: 600, - fast: 200, - - // Default speed - _default: 400 -}; - - -// Based off of the plugin by Clint Helfers, with permission. -// https://web.archive.org/web/20100324014747/http://blindsignals.com/index.php/2009/07/jquery-delay/ -jQuery.fn.delay = function( time, type ) { - time = jQuery.fx ? jQuery.fx.speeds[ time ] || time : time; - type = type || "fx"; - - return this.queue( type, function( next, hooks ) { - var timeout = window.setTimeout( next, time ); - hooks.stop = function() { - window.clearTimeout( timeout ); - }; - } ); -}; - - -( function() { - var input = document.createElement( "input" ), - select = document.createElement( "select" ), - opt = select.appendChild( document.createElement( "option" ) ); - - input.type = "checkbox"; - - // Support: Android <=4.3 only - // Default value for a checkbox should be "on" - support.checkOn = input.value !== ""; - - // Support: IE <=11 only - // Must access selectedIndex to make default options select - support.optSelected = opt.selected; - - // Support: IE <=11 only - // An input loses its value after becoming a radio - input = document.createElement( "input" ); - input.value = "t"; - input.type = "radio"; - support.radioValue = input.value === "t"; -} )(); - - -var boolHook, - attrHandle = jQuery.expr.attrHandle; - -jQuery.fn.extend( { - attr: function( name, value ) { - return access( this, jQuery.attr, name, value, arguments.length > 1 ); - }, - - removeAttr: function( name ) { - return this.each( function() { - jQuery.removeAttr( this, name ); - } ); - } -} ); - -jQuery.extend( { - attr: function( elem, name, value ) { - var ret, hooks, - nType = elem.nodeType; - - // Don't get/set attributes on text, comment and attribute nodes - if ( nType === 3 || nType === 8 || 
nType === 2 ) { - return; - } - - // Fallback to prop when attributes are not supported - if ( typeof elem.getAttribute === "undefined" ) { - return jQuery.prop( elem, name, value ); - } - - // Attribute hooks are determined by the lowercase version - // Grab necessary hook if one is defined - if ( nType !== 1 || !jQuery.isXMLDoc( elem ) ) { - hooks = jQuery.attrHooks[ name.toLowerCase() ] || - ( jQuery.expr.match.bool.test( name ) ? boolHook : undefined ); - } - - if ( value !== undefined ) { - if ( value === null ) { - jQuery.removeAttr( elem, name ); - return; - } - - if ( hooks && "set" in hooks && - ( ret = hooks.set( elem, value, name ) ) !== undefined ) { - return ret; - } - - elem.setAttribute( name, value + "" ); - return value; - } - - if ( hooks && "get" in hooks && ( ret = hooks.get( elem, name ) ) !== null ) { - return ret; - } - - ret = jQuery.find.attr( elem, name ); - - // Non-existent attributes return null, we normalize to undefined - return ret == null ? undefined : ret; - }, - - attrHooks: { - type: { - set: function( elem, value ) { - if ( !support.radioValue && value === "radio" && - nodeName( elem, "input" ) ) { - var val = elem.value; - elem.setAttribute( "type", value ); - if ( val ) { - elem.value = val; - } - return value; - } - } - } - }, - - removeAttr: function( elem, value ) { - var name, - i = 0, - - // Attribute names can contain non-HTML whitespace characters - // https://html.spec.whatwg.org/multipage/syntax.html#attributes-2 - attrNames = value && value.match( rnothtmlwhite ); - - if ( attrNames && elem.nodeType === 1 ) { - while ( ( name = attrNames[ i++ ] ) ) { - elem.removeAttribute( name ); - } - } - } -} ); - -// Hooks for boolean attributes -boolHook = { - set: function( elem, value, name ) { - if ( value === false ) { - - // Remove boolean attributes when set to false - jQuery.removeAttr( elem, name ); - } else { - elem.setAttribute( name, name ); - } - return name; - } -}; - -jQuery.each( 
jQuery.expr.match.bool.source.match( /\w+/g ), function( _i, name ) { - var getter = attrHandle[ name ] || jQuery.find.attr; - - attrHandle[ name ] = function( elem, name, isXML ) { - var ret, handle, - lowercaseName = name.toLowerCase(); - - if ( !isXML ) { - - // Avoid an infinite loop by temporarily removing this function from the getter - handle = attrHandle[ lowercaseName ]; - attrHandle[ lowercaseName ] = ret; - ret = getter( elem, name, isXML ) != null ? - lowercaseName : - null; - attrHandle[ lowercaseName ] = handle; - } - return ret; - }; -} ); - - - - -var rfocusable = /^(?:input|select|textarea|button)$/i, - rclickable = /^(?:a|area)$/i; - -jQuery.fn.extend( { - prop: function( name, value ) { - return access( this, jQuery.prop, name, value, arguments.length > 1 ); - }, - - removeProp: function( name ) { - return this.each( function() { - delete this[ jQuery.propFix[ name ] || name ]; - } ); - } -} ); - -jQuery.extend( { - prop: function( elem, name, value ) { - var ret, hooks, - nType = elem.nodeType; - - // Don't get/set properties on text, comment and attribute nodes - if ( nType === 3 || nType === 8 || nType === 2 ) { - return; - } - - if ( nType !== 1 || !jQuery.isXMLDoc( elem ) ) { - - // Fix name and attach hooks - name = jQuery.propFix[ name ] || name; - hooks = jQuery.propHooks[ name ]; - } - - if ( value !== undefined ) { - if ( hooks && "set" in hooks && - ( ret = hooks.set( elem, value, name ) ) !== undefined ) { - return ret; - } - - return ( elem[ name ] = value ); - } - - if ( hooks && "get" in hooks && ( ret = hooks.get( elem, name ) ) !== null ) { - return ret; - } - - return elem[ name ]; - }, - - propHooks: { - tabIndex: { - get: function( elem ) { - - // Support: IE <=9 - 11 only - // elem.tabIndex doesn't always return the - // correct value when it hasn't been explicitly set - // https://web.archive.org/web/20141116233347/http://fluidproject.org/blog/2008/01/09/getting-setting-and-removing-tabindex-values-with-javascript/ - // Use 
proper attribute retrieval(#12072) - var tabindex = jQuery.find.attr( elem, "tabindex" ); - - if ( tabindex ) { - return parseInt( tabindex, 10 ); - } - - if ( - rfocusable.test( elem.nodeName ) || - rclickable.test( elem.nodeName ) && - elem.href - ) { - return 0; - } - - return -1; - } - } - }, - - propFix: { - "for": "htmlFor", - "class": "className" - } -} ); - -// Support: IE <=11 only -// Accessing the selectedIndex property -// forces the browser to respect setting selected -// on the option -// The getter ensures a default option is selected -// when in an optgroup -// eslint rule "no-unused-expressions" is disabled for this code -// since it considers such accessions noop -if ( !support.optSelected ) { - jQuery.propHooks.selected = { - get: function( elem ) { - - /* eslint no-unused-expressions: "off" */ - - var parent = elem.parentNode; - if ( parent && parent.parentNode ) { - parent.parentNode.selectedIndex; - } - return null; - }, - set: function( elem ) { - - /* eslint no-unused-expressions: "off" */ - - var parent = elem.parentNode; - if ( parent ) { - parent.selectedIndex; - - if ( parent.parentNode ) { - parent.parentNode.selectedIndex; - } - } - } - }; -} - -jQuery.each( [ - "tabIndex", - "readOnly", - "maxLength", - "cellSpacing", - "cellPadding", - "rowSpan", - "colSpan", - "useMap", - "frameBorder", - "contentEditable" -], function() { - jQuery.propFix[ this.toLowerCase() ] = this; -} ); - - - - - // Strip and collapse whitespace according to HTML spec - // https://infra.spec.whatwg.org/#strip-and-collapse-ascii-whitespace - function stripAndCollapse( value ) { - var tokens = value.match( rnothtmlwhite ) || []; - return tokens.join( " " ); - } - - -function getClass( elem ) { - return elem.getAttribute && elem.getAttribute( "class" ) || ""; -} - -function classesToArray( value ) { - if ( Array.isArray( value ) ) { - return value; - } - if ( typeof value === "string" ) { - return value.match( rnothtmlwhite ) || []; - } - return []; -} - 
-jQuery.fn.extend( { - addClass: function( value ) { - var classes, elem, cur, curValue, clazz, j, finalValue, - i = 0; - - if ( isFunction( value ) ) { - return this.each( function( j ) { - jQuery( this ).addClass( value.call( this, j, getClass( this ) ) ); - } ); - } - - classes = classesToArray( value ); - - if ( classes.length ) { - while ( ( elem = this[ i++ ] ) ) { - curValue = getClass( elem ); - cur = elem.nodeType === 1 && ( " " + stripAndCollapse( curValue ) + " " ); - - if ( cur ) { - j = 0; - while ( ( clazz = classes[ j++ ] ) ) { - if ( cur.indexOf( " " + clazz + " " ) < 0 ) { - cur += clazz + " "; - } - } - - // Only assign if different to avoid unneeded rendering. - finalValue = stripAndCollapse( cur ); - if ( curValue !== finalValue ) { - elem.setAttribute( "class", finalValue ); - } - } - } - } - - return this; - }, - - removeClass: function( value ) { - var classes, elem, cur, curValue, clazz, j, finalValue, - i = 0; - - if ( isFunction( value ) ) { - return this.each( function( j ) { - jQuery( this ).removeClass( value.call( this, j, getClass( this ) ) ); - } ); - } - - if ( !arguments.length ) { - return this.attr( "class", "" ); - } - - classes = classesToArray( value ); - - if ( classes.length ) { - while ( ( elem = this[ i++ ] ) ) { - curValue = getClass( elem ); - - // This expression is here for better compressibility (see addClass) - cur = elem.nodeType === 1 && ( " " + stripAndCollapse( curValue ) + " " ); - - if ( cur ) { - j = 0; - while ( ( clazz = classes[ j++ ] ) ) { - - // Remove *all* instances - while ( cur.indexOf( " " + clazz + " " ) > -1 ) { - cur = cur.replace( " " + clazz + " ", " " ); - } - } - - // Only assign if different to avoid unneeded rendering. 
- finalValue = stripAndCollapse( cur ); - if ( curValue !== finalValue ) { - elem.setAttribute( "class", finalValue ); - } - } - } - } - - return this; - }, - - toggleClass: function( value, stateVal ) { - var type = typeof value, - isValidValue = type === "string" || Array.isArray( value ); - - if ( typeof stateVal === "boolean" && isValidValue ) { - return stateVal ? this.addClass( value ) : this.removeClass( value ); - } - - if ( isFunction( value ) ) { - return this.each( function( i ) { - jQuery( this ).toggleClass( - value.call( this, i, getClass( this ), stateVal ), - stateVal - ); - } ); - } - - return this.each( function() { - var className, i, self, classNames; - - if ( isValidValue ) { - - // Toggle individual class names - i = 0; - self = jQuery( this ); - classNames = classesToArray( value ); - - while ( ( className = classNames[ i++ ] ) ) { - - // Check each className given, space separated list - if ( self.hasClass( className ) ) { - self.removeClass( className ); - } else { - self.addClass( className ); - } - } - - // Toggle whole class name - } else if ( value === undefined || type === "boolean" ) { - className = getClass( this ); - if ( className ) { - - // Store className if set - dataPriv.set( this, "__className__", className ); - } - - // If the element has a class name or if we're passed `false`, - // then remove the whole classname (if there was one, the above saved it). - // Otherwise bring back whatever was previously saved (if anything), - // falling back to the empty string if nothing was stored. - if ( this.setAttribute ) { - this.setAttribute( "class", - className || value === false ? 
- "" : - dataPriv.get( this, "__className__" ) || "" - ); - } - } - } ); - }, - - hasClass: function( selector ) { - var className, elem, - i = 0; - - className = " " + selector + " "; - while ( ( elem = this[ i++ ] ) ) { - if ( elem.nodeType === 1 && - ( " " + stripAndCollapse( getClass( elem ) ) + " " ).indexOf( className ) > -1 ) { - return true; - } - } - - return false; - } -} ); - - - - -var rreturn = /\r/g; - -jQuery.fn.extend( { - val: function( value ) { - var hooks, ret, valueIsFunction, - elem = this[ 0 ]; - - if ( !arguments.length ) { - if ( elem ) { - hooks = jQuery.valHooks[ elem.type ] || - jQuery.valHooks[ elem.nodeName.toLowerCase() ]; - - if ( hooks && - "get" in hooks && - ( ret = hooks.get( elem, "value" ) ) !== undefined - ) { - return ret; - } - - ret = elem.value; - - // Handle most common string cases - if ( typeof ret === "string" ) { - return ret.replace( rreturn, "" ); - } - - // Handle cases where value is null/undef or number - return ret == null ? "" : ret; - } - - return; - } - - valueIsFunction = isFunction( value ); - - return this.each( function( i ) { - var val; - - if ( this.nodeType !== 1 ) { - return; - } - - if ( valueIsFunction ) { - val = value.call( this, i, jQuery( this ).val() ); - } else { - val = value; - } - - // Treat null/undefined as ""; convert numbers to string - if ( val == null ) { - val = ""; - - } else if ( typeof val === "number" ) { - val += ""; - - } else if ( Array.isArray( val ) ) { - val = jQuery.map( val, function( value ) { - return value == null ? "" : value + ""; - } ); - } - - hooks = jQuery.valHooks[ this.type ] || jQuery.valHooks[ this.nodeName.toLowerCase() ]; - - // If set returns undefined, fall back to normal setting - if ( !hooks || !( "set" in hooks ) || hooks.set( this, val, "value" ) === undefined ) { - this.value = val; - } - } ); - } -} ); - -jQuery.extend( { - valHooks: { - option: { - get: function( elem ) { - - var val = jQuery.find.attr( elem, "value" ); - return val != null ? 
- val : - - // Support: IE <=10 - 11 only - // option.text throws exceptions (#14686, #14858) - // Strip and collapse whitespace - // https://html.spec.whatwg.org/#strip-and-collapse-whitespace - stripAndCollapse( jQuery.text( elem ) ); - } - }, - select: { - get: function( elem ) { - var value, option, i, - options = elem.options, - index = elem.selectedIndex, - one = elem.type === "select-one", - values = one ? null : [], - max = one ? index + 1 : options.length; - - if ( index < 0 ) { - i = max; - - } else { - i = one ? index : 0; - } - - // Loop through all the selected options - for ( ; i < max; i++ ) { - option = options[ i ]; - - // Support: IE <=9 only - // IE8-9 doesn't update selected after form reset (#2551) - if ( ( option.selected || i === index ) && - - // Don't return options that are disabled or in a disabled optgroup - !option.disabled && - ( !option.parentNode.disabled || - !nodeName( option.parentNode, "optgroup" ) ) ) { - - // Get the specific value for the option - value = jQuery( option ).val(); - - // We don't need an array for one selects - if ( one ) { - return value; - } - - // Multi-Selects return an array - values.push( value ); - } - } - - return values; - }, - - set: function( elem, value ) { - var optionSet, option, - options = elem.options, - values = jQuery.makeArray( value ), - i = options.length; - - while ( i-- ) { - option = options[ i ]; - - /* eslint-disable no-cond-assign */ - - if ( option.selected = - jQuery.inArray( jQuery.valHooks.option.get( option ), values ) > -1 - ) { - optionSet = true; - } - - /* eslint-enable no-cond-assign */ - } - - // Force browsers to behave consistently when non-matching value is set - if ( !optionSet ) { - elem.selectedIndex = -1; - } - return values; - } - } - } -} ); - -// Radios and checkboxes getter/setter -jQuery.each( [ "radio", "checkbox" ], function() { - jQuery.valHooks[ this ] = { - set: function( elem, value ) { - if ( Array.isArray( value ) ) { - return ( elem.checked = 
jQuery.inArray( jQuery( elem ).val(), value ) > -1 ); - } - } - }; - if ( !support.checkOn ) { - jQuery.valHooks[ this ].get = function( elem ) { - return elem.getAttribute( "value" ) === null ? "on" : elem.value; - }; - } -} ); - - - - -// Return jQuery for attributes-only inclusion - - -support.focusin = "onfocusin" in window; - - -var rfocusMorph = /^(?:focusinfocus|focusoutblur)$/, - stopPropagationCallback = function( e ) { - e.stopPropagation(); - }; - -jQuery.extend( jQuery.event, { - - trigger: function( event, data, elem, onlyHandlers ) { - - var i, cur, tmp, bubbleType, ontype, handle, special, lastElement, - eventPath = [ elem || document ], - type = hasOwn.call( event, "type" ) ? event.type : event, - namespaces = hasOwn.call( event, "namespace" ) ? event.namespace.split( "." ) : []; - - cur = lastElement = tmp = elem = elem || document; - - // Don't do events on text and comment nodes - if ( elem.nodeType === 3 || elem.nodeType === 8 ) { - return; - } - - // focus/blur morphs to focusin/out; ensure we're not firing them right now - if ( rfocusMorph.test( type + jQuery.event.triggered ) ) { - return; - } - - if ( type.indexOf( "." ) > -1 ) { - - // Namespaced trigger; create a regexp to match event type in handle() - namespaces = type.split( "." ); - type = namespaces.shift(); - namespaces.sort(); - } - ontype = type.indexOf( ":" ) < 0 && "on" + type; - - // Caller can pass in a jQuery.Event object, Object, or just an event type string - event = event[ jQuery.expando ] ? - event : - new jQuery.Event( type, typeof event === "object" && event ); - - // Trigger bitmask: & 1 for native handlers; & 2 for jQuery (always true) - event.isTrigger = onlyHandlers ? 2 : 3; - event.namespace = namespaces.join( "." ); - event.rnamespace = event.namespace ? 
- new RegExp( "(^|\\.)" + namespaces.join( "\\.(?:.*\\.|)" ) + "(\\.|$)" ) : - null; - - // Clean up the event in case it is being reused - event.result = undefined; - if ( !event.target ) { - event.target = elem; - } - - // Clone any incoming data and prepend the event, creating the handler arg list - data = data == null ? - [ event ] : - jQuery.makeArray( data, [ event ] ); - - // Allow special events to draw outside the lines - special = jQuery.event.special[ type ] || {}; - if ( !onlyHandlers && special.trigger && special.trigger.apply( elem, data ) === false ) { - return; - } - - // Determine event propagation path in advance, per W3C events spec (#9951) - // Bubble up to document, then to window; watch for a global ownerDocument var (#9724) - if ( !onlyHandlers && !special.noBubble && !isWindow( elem ) ) { - - bubbleType = special.delegateType || type; - if ( !rfocusMorph.test( bubbleType + type ) ) { - cur = cur.parentNode; - } - for ( ; cur; cur = cur.parentNode ) { - eventPath.push( cur ); - tmp = cur; - } - - // Only add window if we got to document (e.g., not plain obj or detached DOM) - if ( tmp === ( elem.ownerDocument || document ) ) { - eventPath.push( tmp.defaultView || tmp.parentWindow || window ); - } - } - - // Fire handlers on the event path - i = 0; - while ( ( cur = eventPath[ i++ ] ) && !event.isPropagationStopped() ) { - lastElement = cur; - event.type = i > 1 ? 
- bubbleType : - special.bindType || type; - - // jQuery handler - handle = ( dataPriv.get( cur, "events" ) || Object.create( null ) )[ event.type ] && - dataPriv.get( cur, "handle" ); - if ( handle ) { - handle.apply( cur, data ); - } - - // Native handler - handle = ontype && cur[ ontype ]; - if ( handle && handle.apply && acceptData( cur ) ) { - event.result = handle.apply( cur, data ); - if ( event.result === false ) { - event.preventDefault(); - } - } - } - event.type = type; - - // If nobody prevented the default action, do it now - if ( !onlyHandlers && !event.isDefaultPrevented() ) { - - if ( ( !special._default || - special._default.apply( eventPath.pop(), data ) === false ) && - acceptData( elem ) ) { - - // Call a native DOM method on the target with the same name as the event. - // Don't do default actions on window, that's where global variables be (#6170) - if ( ontype && isFunction( elem[ type ] ) && !isWindow( elem ) ) { - - // Don't re-trigger an onFOO event when we call its FOO() method - tmp = elem[ ontype ]; - - if ( tmp ) { - elem[ ontype ] = null; - } - - // Prevent re-triggering of the same event, since we already bubbled it above - jQuery.event.triggered = type; - - if ( event.isPropagationStopped() ) { - lastElement.addEventListener( type, stopPropagationCallback ); - } - - elem[ type ](); - - if ( event.isPropagationStopped() ) { - lastElement.removeEventListener( type, stopPropagationCallback ); - } - - jQuery.event.triggered = undefined; - - if ( tmp ) { - elem[ ontype ] = tmp; - } - } - } - } - - return event.result; - }, - - // Piggyback on a donor event to simulate a different one - // Used only for `focus(in | out)` events - simulate: function( type, elem, event ) { - var e = jQuery.extend( - new jQuery.Event(), - event, - { - type: type, - isSimulated: true - } - ); - - jQuery.event.trigger( e, null, elem ); - } - -} ); - -jQuery.fn.extend( { - - trigger: function( type, data ) { - return this.each( function() { - 
jQuery.event.trigger( type, data, this ); - } ); - }, - triggerHandler: function( type, data ) { - var elem = this[ 0 ]; - if ( elem ) { - return jQuery.event.trigger( type, data, elem, true ); - } - } -} ); - - -// Support: Firefox <=44 -// Firefox doesn't have focus(in | out) events -// Related ticket - https://bugzilla.mozilla.org/show_bug.cgi?id=687787 -// -// Support: Chrome <=48 - 49, Safari <=9.0 - 9.1 -// focus(in | out) events fire after focus & blur events, -// which is spec violation - http://www.w3.org/TR/DOM-Level-3-Events/#events-focusevent-event-order -// Related ticket - https://bugs.chromium.org/p/chromium/issues/detail?id=449857 -if ( !support.focusin ) { - jQuery.each( { focus: "focusin", blur: "focusout" }, function( orig, fix ) { - - // Attach a single capturing handler on the document while someone wants focusin/focusout - var handler = function( event ) { - jQuery.event.simulate( fix, event.target, jQuery.event.fix( event ) ); - }; - - jQuery.event.special[ fix ] = { - setup: function() { - - // Handle: regular nodes (via `this.ownerDocument`), window - // (via `this.document`) & document (via `this`). 
- var doc = this.ownerDocument || this.document || this, - attaches = dataPriv.access( doc, fix ); - - if ( !attaches ) { - doc.addEventListener( orig, handler, true ); - } - dataPriv.access( doc, fix, ( attaches || 0 ) + 1 ); - }, - teardown: function() { - var doc = this.ownerDocument || this.document || this, - attaches = dataPriv.access( doc, fix ) - 1; - - if ( !attaches ) { - doc.removeEventListener( orig, handler, true ); - dataPriv.remove( doc, fix ); - - } else { - dataPriv.access( doc, fix, attaches ); - } - } - }; - } ); -} -var location = window.location; - -var nonce = { guid: Date.now() }; - -var rquery = ( /\?/ ); - - - -// Cross-browser xml parsing -jQuery.parseXML = function( data ) { - var xml, parserErrorElem; - if ( !data || typeof data !== "string" ) { - return null; - } - - // Support: IE 9 - 11 only - // IE throws on parseFromString with invalid input. - try { - xml = ( new window.DOMParser() ).parseFromString( data, "text/xml" ); - } catch ( e ) {} - - parserErrorElem = xml && xml.getElementsByTagName( "parsererror" )[ 0 ]; - if ( !xml || parserErrorElem ) { - jQuery.error( "Invalid XML: " + ( - parserErrorElem ? - jQuery.map( parserErrorElem.childNodes, function( el ) { - return el.textContent; - } ).join( "\n" ) : - data - ) ); - } - return xml; -}; - - -var - rbracket = /\[\]$/, - rCRLF = /\r?\n/g, - rsubmitterTypes = /^(?:submit|button|image|reset|file)$/i, - rsubmittable = /^(?:input|select|textarea|keygen)/i; - -function buildParams( prefix, obj, traditional, add ) { - var name; - - if ( Array.isArray( obj ) ) { - - // Serialize array item. - jQuery.each( obj, function( i, v ) { - if ( traditional || rbracket.test( prefix ) ) { - - // Treat each array item as a scalar. - add( prefix, v ); - - } else { - - // Item is non-scalar (array or object), encode its numeric index. - buildParams( - prefix + "[" + ( typeof v === "object" && v != null ? 
i : "" ) + "]", - v, - traditional, - add - ); - } - } ); - - } else if ( !traditional && toType( obj ) === "object" ) { - - // Serialize object item. - for ( name in obj ) { - buildParams( prefix + "[" + name + "]", obj[ name ], traditional, add ); - } - - } else { - - // Serialize scalar item. - add( prefix, obj ); - } -} - -// Serialize an array of form elements or a set of -// key/values into a query string -jQuery.param = function( a, traditional ) { - var prefix, - s = [], - add = function( key, valueOrFunction ) { - - // If value is a function, invoke it and use its return value - var value = isFunction( valueOrFunction ) ? - valueOrFunction() : - valueOrFunction; - - s[ s.length ] = encodeURIComponent( key ) + "=" + - encodeURIComponent( value == null ? "" : value ); - }; - - if ( a == null ) { - return ""; - } - - // If an array was passed in, assume that it is an array of form elements. - if ( Array.isArray( a ) || ( a.jquery && !jQuery.isPlainObject( a ) ) ) { - - // Serialize the form elements - jQuery.each( a, function() { - add( this.name, this.value ); - } ); - - } else { - - // If traditional, encode the "old" way (the way 1.3.2 or older - // did it), otherwise encode params recursively. - for ( prefix in a ) { - buildParams( prefix, a[ prefix ], traditional, add ); - } - } - - // Return the resulting serialization - return s.join( "&" ); -}; - -jQuery.fn.extend( { - serialize: function() { - return jQuery.param( this.serializeArray() ); - }, - serializeArray: function() { - return this.map( function() { - - // Can add propHook for "elements" to filter or add form elements - var elements = jQuery.prop( this, "elements" ); - return elements ? 
jQuery.makeArray( elements ) : this; - } ).filter( function() { - var type = this.type; - - // Use .is( ":disabled" ) so that fieldset[disabled] works - return this.name && !jQuery( this ).is( ":disabled" ) && - rsubmittable.test( this.nodeName ) && !rsubmitterTypes.test( type ) && - ( this.checked || !rcheckableType.test( type ) ); - } ).map( function( _i, elem ) { - var val = jQuery( this ).val(); - - if ( val == null ) { - return null; - } - - if ( Array.isArray( val ) ) { - return jQuery.map( val, function( val ) { - return { name: elem.name, value: val.replace( rCRLF, "\r\n" ) }; - } ); - } - - return { name: elem.name, value: val.replace( rCRLF, "\r\n" ) }; - } ).get(); - } -} ); - - -var - r20 = /%20/g, - rhash = /#.*$/, - rantiCache = /([?&])_=[^&]*/, - rheaders = /^(.*?):[ \t]*([^\r\n]*)$/mg, - - // #7653, #8125, #8152: local protocol detection - rlocalProtocol = /^(?:about|app|app-storage|.+-extension|file|res|widget):$/, - rnoContent = /^(?:GET|HEAD)$/, - rprotocol = /^\/\//, - - /* Prefilters - * 1) They are useful to introduce custom dataTypes (see ajax/jsonp.js for an example) - * 2) These are called: - * - BEFORE asking for a transport - * - AFTER param serialization (s.data is a string if s.processData is true) - * 3) key is the dataType - * 4) the catchall symbol "*" can be used - * 5) execution will start with transport dataType and THEN continue down to "*" if needed - */ - prefilters = {}, - - /* Transports bindings - * 1) key is the dataType - * 2) the catchall symbol "*" can be used - * 3) selection will start with transport dataType and THEN go to "*" if needed - */ - transports = {}, - - // Avoid comment-prolog char sequence (#10098); must appease lint and evade compression - allTypes = "*/".concat( "*" ), - - // Anchor tag for parsing the document origin - originAnchor = document.createElement( "a" ); - -originAnchor.href = location.href; - -// Base "constructor" for jQuery.ajaxPrefilter and jQuery.ajaxTransport -function 
addToPrefiltersOrTransports( structure ) { - - // dataTypeExpression is optional and defaults to "*" - return function( dataTypeExpression, func ) { - - if ( typeof dataTypeExpression !== "string" ) { - func = dataTypeExpression; - dataTypeExpression = "*"; - } - - var dataType, - i = 0, - dataTypes = dataTypeExpression.toLowerCase().match( rnothtmlwhite ) || []; - - if ( isFunction( func ) ) { - - // For each dataType in the dataTypeExpression - while ( ( dataType = dataTypes[ i++ ] ) ) { - - // Prepend if requested - if ( dataType[ 0 ] === "+" ) { - dataType = dataType.slice( 1 ) || "*"; - ( structure[ dataType ] = structure[ dataType ] || [] ).unshift( func ); - - // Otherwise append - } else { - ( structure[ dataType ] = structure[ dataType ] || [] ).push( func ); - } - } - } - }; -} - -// Base inspection function for prefilters and transports -function inspectPrefiltersOrTransports( structure, options, originalOptions, jqXHR ) { - - var inspected = {}, - seekingTransport = ( structure === transports ); - - function inspect( dataType ) { - var selected; - inspected[ dataType ] = true; - jQuery.each( structure[ dataType ] || [], function( _, prefilterOrFactory ) { - var dataTypeOrTransport = prefilterOrFactory( options, originalOptions, jqXHR ); - if ( typeof dataTypeOrTransport === "string" && - !seekingTransport && !inspected[ dataTypeOrTransport ] ) { - - options.dataTypes.unshift( dataTypeOrTransport ); - inspect( dataTypeOrTransport ); - return false; - } else if ( seekingTransport ) { - return !( selected = dataTypeOrTransport ); - } - } ); - return selected; - } - - return inspect( options.dataTypes[ 0 ] ) || !inspected[ "*" ] && inspect( "*" ); -} - -// A special extend for ajax options -// that takes "flat" options (not to be deep extended) -// Fixes #9887 -function ajaxExtend( target, src ) { - var key, deep, - flatOptions = jQuery.ajaxSettings.flatOptions || {}; - - for ( key in src ) { - if ( src[ key ] !== undefined ) { - ( flatOptions[ key ] ? 
target : ( deep || ( deep = {} ) ) )[ key ] = src[ key ]; - } - } - if ( deep ) { - jQuery.extend( true, target, deep ); - } - - return target; -} - -/* Handles responses to an ajax request: - * - finds the right dataType (mediates between content-type and expected dataType) - * - returns the corresponding response - */ -function ajaxHandleResponses( s, jqXHR, responses ) { - - var ct, type, finalDataType, firstDataType, - contents = s.contents, - dataTypes = s.dataTypes; - - // Remove auto dataType and get content-type in the process - while ( dataTypes[ 0 ] === "*" ) { - dataTypes.shift(); - if ( ct === undefined ) { - ct = s.mimeType || jqXHR.getResponseHeader( "Content-Type" ); - } - } - - // Check if we're dealing with a known content-type - if ( ct ) { - for ( type in contents ) { - if ( contents[ type ] && contents[ type ].test( ct ) ) { - dataTypes.unshift( type ); - break; - } - } - } - - // Check to see if we have a response for the expected dataType - if ( dataTypes[ 0 ] in responses ) { - finalDataType = dataTypes[ 0 ]; - } else { - - // Try convertible dataTypes - for ( type in responses ) { - if ( !dataTypes[ 0 ] || s.converters[ type + " " + dataTypes[ 0 ] ] ) { - finalDataType = type; - break; - } - if ( !firstDataType ) { - firstDataType = type; - } - } - - // Or just use first one - finalDataType = finalDataType || firstDataType; - } - - // If we found a dataType - // We add the dataType to the list if needed - // and return the corresponding response - if ( finalDataType ) { - if ( finalDataType !== dataTypes[ 0 ] ) { - dataTypes.unshift( finalDataType ); - } - return responses[ finalDataType ]; - } -} - -/* Chain conversions given the request and the original response - * Also sets the responseXXX fields on the jqXHR instance - */ -function ajaxConvert( s, response, jqXHR, isSuccess ) { - var conv2, current, conv, tmp, prev, - converters = {}, - - // Work with a copy of dataTypes in case we need to modify it for conversion - dataTypes = 
s.dataTypes.slice(); - - // Create converters map with lowercased keys - if ( dataTypes[ 1 ] ) { - for ( conv in s.converters ) { - converters[ conv.toLowerCase() ] = s.converters[ conv ]; - } - } - - current = dataTypes.shift(); - - // Convert to each sequential dataType - while ( current ) { - - if ( s.responseFields[ current ] ) { - jqXHR[ s.responseFields[ current ] ] = response; - } - - // Apply the dataFilter if provided - if ( !prev && isSuccess && s.dataFilter ) { - response = s.dataFilter( response, s.dataType ); - } - - prev = current; - current = dataTypes.shift(); - - if ( current ) { - - // There's only work to do if current dataType is non-auto - if ( current === "*" ) { - - current = prev; - - // Convert response if prev dataType is non-auto and differs from current - } else if ( prev !== "*" && prev !== current ) { - - // Seek a direct converter - conv = converters[ prev + " " + current ] || converters[ "* " + current ]; - - // If none found, seek a pair - if ( !conv ) { - for ( conv2 in converters ) { - - // If conv2 outputs current - tmp = conv2.split( " " ); - if ( tmp[ 1 ] === current ) { - - // If prev can be converted to accepted input - conv = converters[ prev + " " + tmp[ 0 ] ] || - converters[ "* " + tmp[ 0 ] ]; - if ( conv ) { - - // Condense equivalence converters - if ( conv === true ) { - conv = converters[ conv2 ]; - - // Otherwise, insert the intermediate dataType - } else if ( converters[ conv2 ] !== true ) { - current = tmp[ 0 ]; - dataTypes.unshift( tmp[ 1 ] ); - } - break; - } - } - } - } - - // Apply converter (if not an equivalence) - if ( conv !== true ) { - - // Unless errors are allowed to bubble, catch and return them - if ( conv && s.throws ) { - response = conv( response ); - } else { - try { - response = conv( response ); - } catch ( e ) { - return { - state: "parsererror", - error: conv ? 
e : "No conversion from " + prev + " to " + current - }; - } - } - } - } - } - } - - return { state: "success", data: response }; -} - -jQuery.extend( { - - // Counter for holding the number of active queries - active: 0, - - // Last-Modified header cache for next request - lastModified: {}, - etag: {}, - - ajaxSettings: { - url: location.href, - type: "GET", - isLocal: rlocalProtocol.test( location.protocol ), - global: true, - processData: true, - async: true, - contentType: "application/x-www-form-urlencoded; charset=UTF-8", - - /* - timeout: 0, - data: null, - dataType: null, - username: null, - password: null, - cache: null, - throws: false, - traditional: false, - headers: {}, - */ - - accepts: { - "*": allTypes, - text: "text/plain", - html: "text/html", - xml: "application/xml, text/xml", - json: "application/json, text/javascript" - }, - - contents: { - xml: /\bxml\b/, - html: /\bhtml/, - json: /\bjson\b/ - }, - - responseFields: { - xml: "responseXML", - text: "responseText", - json: "responseJSON" - }, - - // Data converters - // Keys separate source (or catchall "*") and destination types with a single space - converters: { - - // Convert anything to text - "* text": String, - - // Text to html (true = no transformation) - "text html": true, - - // Evaluate text as a json expression - "text json": JSON.parse, - - // Parse text as xml - "text xml": jQuery.parseXML - }, - - // For options that shouldn't be deep extended: - // you can add your own custom options here if - // and when you create one that shouldn't be - // deep extended (see ajaxExtend) - flatOptions: { - url: true, - context: true - } - }, - - // Creates a full fledged settings object into target - // with both ajaxSettings and settings fields. - // If target is omitted, writes into ajaxSettings. - ajaxSetup: function( target, settings ) { - return settings ? 
- - // Building a settings object - ajaxExtend( ajaxExtend( target, jQuery.ajaxSettings ), settings ) : - - // Extending ajaxSettings - ajaxExtend( jQuery.ajaxSettings, target ); - }, - - ajaxPrefilter: addToPrefiltersOrTransports( prefilters ), - ajaxTransport: addToPrefiltersOrTransports( transports ), - - // Main method - ajax: function( url, options ) { - - // If url is an object, simulate pre-1.5 signature - if ( typeof url === "object" ) { - options = url; - url = undefined; - } - - // Force options to be an object - options = options || {}; - - var transport, - - // URL without anti-cache param - cacheURL, - - // Response headers - responseHeadersString, - responseHeaders, - - // timeout handle - timeoutTimer, - - // Url cleanup var - urlAnchor, - - // Request state (becomes false upon send and true upon completion) - completed, - - // To know if global events are to be dispatched - fireGlobals, - - // Loop variable - i, - - // uncached part of the url - uncached, - - // Create the final options object - s = jQuery.ajaxSetup( {}, options ), - - // Callbacks context - callbackContext = s.context || s, - - // Context for global events is callbackContext if it is a DOM node or jQuery collection - globalEventContext = s.context && - ( callbackContext.nodeType || callbackContext.jquery ) ? 
- jQuery( callbackContext ) : - jQuery.event, - - // Deferreds - deferred = jQuery.Deferred(), - completeDeferred = jQuery.Callbacks( "once memory" ), - - // Status-dependent callbacks - statusCode = s.statusCode || {}, - - // Headers (they are sent all at once) - requestHeaders = {}, - requestHeadersNames = {}, - - // Default abort message - strAbort = "canceled", - - // Fake xhr - jqXHR = { - readyState: 0, - - // Builds headers hashtable if needed - getResponseHeader: function( key ) { - var match; - if ( completed ) { - if ( !responseHeaders ) { - responseHeaders = {}; - while ( ( match = rheaders.exec( responseHeadersString ) ) ) { - responseHeaders[ match[ 1 ].toLowerCase() + " " ] = - ( responseHeaders[ match[ 1 ].toLowerCase() + " " ] || [] ) - .concat( match[ 2 ] ); - } - } - match = responseHeaders[ key.toLowerCase() + " " ]; - } - return match == null ? null : match.join( ", " ); - }, - - // Raw string - getAllResponseHeaders: function() { - return completed ? responseHeadersString : null; - }, - - // Caches the header - setRequestHeader: function( name, value ) { - if ( completed == null ) { - name = requestHeadersNames[ name.toLowerCase() ] = - requestHeadersNames[ name.toLowerCase() ] || name; - requestHeaders[ name ] = value; - } - return this; - }, - - // Overrides response content-type header - overrideMimeType: function( type ) { - if ( completed == null ) { - s.mimeType = type; - } - return this; - }, - - // Status-dependent callbacks - statusCode: function( map ) { - var code; - if ( map ) { - if ( completed ) { - - // Execute the appropriate callbacks - jqXHR.always( map[ jqXHR.status ] ); - } else { - - // Lazy-add the new callbacks in a way that preserves old ones - for ( code in map ) { - statusCode[ code ] = [ statusCode[ code ], map[ code ] ]; - } - } - } - return this; - }, - - // Cancel the request - abort: function( statusText ) { - var finalText = statusText || strAbort; - if ( transport ) { - transport.abort( finalText ); - } - done( 
0, finalText ); - return this; - } - }; - - // Attach deferreds - deferred.promise( jqXHR ); - - // Add protocol if not provided (prefilters might expect it) - // Handle falsy url in the settings object (#10093: consistency with old signature) - // We also use the url parameter if available - s.url = ( ( url || s.url || location.href ) + "" ) - .replace( rprotocol, location.protocol + "//" ); - - // Alias method option to type as per ticket #12004 - s.type = options.method || options.type || s.method || s.type; - - // Extract dataTypes list - s.dataTypes = ( s.dataType || "*" ).toLowerCase().match( rnothtmlwhite ) || [ "" ]; - - // A cross-domain request is in order when the origin doesn't match the current origin. - if ( s.crossDomain == null ) { - urlAnchor = document.createElement( "a" ); - - // Support: IE <=8 - 11, Edge 12 - 15 - // IE throws exception on accessing the href property if url is malformed, - // e.g. http://example.com:80x/ - try { - urlAnchor.href = s.url; - - // Support: IE <=8 - 11 only - // Anchor's host property isn't correctly set when s.url is relative - urlAnchor.href = urlAnchor.href; - s.crossDomain = originAnchor.protocol + "//" + originAnchor.host !== - urlAnchor.protocol + "//" + urlAnchor.host; - } catch ( e ) { - - // If there is an error parsing the URL, assume it is crossDomain, - // it can be rejected by the transport if it is invalid - s.crossDomain = true; - } - } - - // Convert data if not already a string - if ( s.data && s.processData && typeof s.data !== "string" ) { - s.data = jQuery.param( s.data, s.traditional ); - } - - // Apply prefilters - inspectPrefiltersOrTransports( prefilters, s, options, jqXHR ); - - // If request was aborted inside a prefilter, stop there - if ( completed ) { - return jqXHR; - } - - // We can fire global events as of now if asked to - // Don't fire events if jQuery.event is undefined in an AMD-usage scenario (#15118) - fireGlobals = jQuery.event && s.global; - - // Watch for a new set of 
requests - if ( fireGlobals && jQuery.active++ === 0 ) { - jQuery.event.trigger( "ajaxStart" ); - } - - // Uppercase the type - s.type = s.type.toUpperCase(); - - // Determine if request has content - s.hasContent = !rnoContent.test( s.type ); - - // Save the URL in case we're toying with the If-Modified-Since - // and/or If-None-Match header later on - // Remove hash to simplify url manipulation - cacheURL = s.url.replace( rhash, "" ); - - // More options handling for requests with no content - if ( !s.hasContent ) { - - // Remember the hash so we can put it back - uncached = s.url.slice( cacheURL.length ); - - // If data is available and should be processed, append data to url - if ( s.data && ( s.processData || typeof s.data === "string" ) ) { - cacheURL += ( rquery.test( cacheURL ) ? "&" : "?" ) + s.data; - - // #9682: remove data so that it's not used in an eventual retry - delete s.data; - } - - // Add or update anti-cache param if needed - if ( s.cache === false ) { - cacheURL = cacheURL.replace( rantiCache, "$1" ); - uncached = ( rquery.test( cacheURL ) ? "&" : "?" ) + "_=" + ( nonce.guid++ ) + - uncached; - } - - // Put hash and anti-cache on the URL that will be requested (gh-1732) - s.url = cacheURL + uncached; - - // Change '%20' to '+' if this is encoded form body content (gh-2658) - } else if ( s.data && s.processData && - ( s.contentType || "" ).indexOf( "application/x-www-form-urlencoded" ) === 0 ) { - s.data = s.data.replace( r20, "+" ); - } - - // Set the If-Modified-Since and/or If-None-Match header, if in ifModified mode. 
- if ( s.ifModified ) { - if ( jQuery.lastModified[ cacheURL ] ) { - jqXHR.setRequestHeader( "If-Modified-Since", jQuery.lastModified[ cacheURL ] ); - } - if ( jQuery.etag[ cacheURL ] ) { - jqXHR.setRequestHeader( "If-None-Match", jQuery.etag[ cacheURL ] ); - } - } - - // Set the correct header, if data is being sent - if ( s.data && s.hasContent && s.contentType !== false || options.contentType ) { - jqXHR.setRequestHeader( "Content-Type", s.contentType ); - } - - // Set the Accepts header for the server, depending on the dataType - jqXHR.setRequestHeader( - "Accept", - s.dataTypes[ 0 ] && s.accepts[ s.dataTypes[ 0 ] ] ? - s.accepts[ s.dataTypes[ 0 ] ] + - ( s.dataTypes[ 0 ] !== "*" ? ", " + allTypes + "; q=0.01" : "" ) : - s.accepts[ "*" ] - ); - - // Check for headers option - for ( i in s.headers ) { - jqXHR.setRequestHeader( i, s.headers[ i ] ); - } - - // Allow custom headers/mimetypes and early abort - if ( s.beforeSend && - ( s.beforeSend.call( callbackContext, jqXHR, s ) === false || completed ) ) { - - // Abort if not done already and return - return jqXHR.abort(); - } - - // Aborting is no longer a cancellation - strAbort = "abort"; - - // Install callbacks on deferreds - completeDeferred.add( s.complete ); - jqXHR.done( s.success ); - jqXHR.fail( s.error ); - - // Get transport - transport = inspectPrefiltersOrTransports( transports, s, options, jqXHR ); - - // If no transport, we auto-abort - if ( !transport ) { - done( -1, "No Transport" ); - } else { - jqXHR.readyState = 1; - - // Send global event - if ( fireGlobals ) { - globalEventContext.trigger( "ajaxSend", [ jqXHR, s ] ); - } - - // If request was aborted inside ajaxSend, stop there - if ( completed ) { - return jqXHR; - } - - // Timeout - if ( s.async && s.timeout > 0 ) { - timeoutTimer = window.setTimeout( function() { - jqXHR.abort( "timeout" ); - }, s.timeout ); - } - - try { - completed = false; - transport.send( requestHeaders, done ); - } catch ( e ) { - - // Rethrow post-completion 
exceptions - if ( completed ) { - throw e; - } - - // Propagate others as results - done( -1, e ); - } - } - - // Callback for when everything is done - function done( status, nativeStatusText, responses, headers ) { - var isSuccess, success, error, response, modified, - statusText = nativeStatusText; - - // Ignore repeat invocations - if ( completed ) { - return; - } - - completed = true; - - // Clear timeout if it exists - if ( timeoutTimer ) { - window.clearTimeout( timeoutTimer ); - } - - // Dereference transport for early garbage collection - // (no matter how long the jqXHR object will be used) - transport = undefined; - - // Cache response headers - responseHeadersString = headers || ""; - - // Set readyState - jqXHR.readyState = status > 0 ? 4 : 0; - - // Determine if successful - isSuccess = status >= 200 && status < 300 || status === 304; - - // Get response data - if ( responses ) { - response = ajaxHandleResponses( s, jqXHR, responses ); - } - - // Use a noop converter for missing script but not if jsonp - if ( !isSuccess && - jQuery.inArray( "script", s.dataTypes ) > -1 && - jQuery.inArray( "json", s.dataTypes ) < 0 ) { - s.converters[ "text script" ] = function() {}; - } - - // Convert no matter what (that way responseXXX fields are always set) - response = ajaxConvert( s, response, jqXHR, isSuccess ); - - // If successful, handle type chaining - if ( isSuccess ) { - - // Set the If-Modified-Since and/or If-None-Match header, if in ifModified mode. 
- if ( s.ifModified ) { - modified = jqXHR.getResponseHeader( "Last-Modified" ); - if ( modified ) { - jQuery.lastModified[ cacheURL ] = modified; - } - modified = jqXHR.getResponseHeader( "etag" ); - if ( modified ) { - jQuery.etag[ cacheURL ] = modified; - } - } - - // if no content - if ( status === 204 || s.type === "HEAD" ) { - statusText = "nocontent"; - - // if not modified - } else if ( status === 304 ) { - statusText = "notmodified"; - - // If we have data, let's convert it - } else { - statusText = response.state; - success = response.data; - error = response.error; - isSuccess = !error; - } - } else { - - // Extract error from statusText and normalize for non-aborts - error = statusText; - if ( status || !statusText ) { - statusText = "error"; - if ( status < 0 ) { - status = 0; - } - } - } - - // Set data for the fake xhr object - jqXHR.status = status; - jqXHR.statusText = ( nativeStatusText || statusText ) + ""; - - // Success/Error - if ( isSuccess ) { - deferred.resolveWith( callbackContext, [ success, statusText, jqXHR ] ); - } else { - deferred.rejectWith( callbackContext, [ jqXHR, statusText, error ] ); - } - - // Status-dependent callbacks - jqXHR.statusCode( statusCode ); - statusCode = undefined; - - if ( fireGlobals ) { - globalEventContext.trigger( isSuccess ? "ajaxSuccess" : "ajaxError", - [ jqXHR, s, isSuccess ? 
success : error ] ); - } - - // Complete - completeDeferred.fireWith( callbackContext, [ jqXHR, statusText ] ); - - if ( fireGlobals ) { - globalEventContext.trigger( "ajaxComplete", [ jqXHR, s ] ); - - // Handle the global AJAX counter - if ( !( --jQuery.active ) ) { - jQuery.event.trigger( "ajaxStop" ); - } - } - } - - return jqXHR; - }, - - getJSON: function( url, data, callback ) { - return jQuery.get( url, data, callback, "json" ); - }, - - getScript: function( url, callback ) { - return jQuery.get( url, undefined, callback, "script" ); - } -} ); - -jQuery.each( [ "get", "post" ], function( _i, method ) { - jQuery[ method ] = function( url, data, callback, type ) { - - // Shift arguments if data argument was omitted - if ( isFunction( data ) ) { - type = type || callback; - callback = data; - data = undefined; - } - - // The url can be an options object (which then must have .url) - return jQuery.ajax( jQuery.extend( { - url: url, - type: method, - dataType: type, - data: data, - success: callback - }, jQuery.isPlainObject( url ) && url ) ); - }; -} ); - -jQuery.ajaxPrefilter( function( s ) { - var i; - for ( i in s.headers ) { - if ( i.toLowerCase() === "content-type" ) { - s.contentType = s.headers[ i ] || ""; - } - } -} ); - - -jQuery._evalUrl = function( url, options, doc ) { - return jQuery.ajax( { - url: url, - - // Make this explicit, since user can override this through ajaxSetup (#11264) - type: "GET", - dataType: "script", - cache: true, - async: false, - global: false, - - // Only evaluate the response if it is successful (gh-4126) - // dataFilter is not invoked for failure responses, so using it instead - // of the default converter is kludgy but it works. 
- converters: { - "text script": function() {} - }, - dataFilter: function( response ) { - jQuery.globalEval( response, options, doc ); - } - } ); -}; - - -jQuery.fn.extend( { - wrapAll: function( html ) { - var wrap; - - if ( this[ 0 ] ) { - if ( isFunction( html ) ) { - html = html.call( this[ 0 ] ); - } - - // The elements to wrap the target around - wrap = jQuery( html, this[ 0 ].ownerDocument ).eq( 0 ).clone( true ); - - if ( this[ 0 ].parentNode ) { - wrap.insertBefore( this[ 0 ] ); - } - - wrap.map( function() { - var elem = this; - - while ( elem.firstElementChild ) { - elem = elem.firstElementChild; - } - - return elem; - } ).append( this ); - } - - return this; - }, - - wrapInner: function( html ) { - if ( isFunction( html ) ) { - return this.each( function( i ) { - jQuery( this ).wrapInner( html.call( this, i ) ); - } ); - } - - return this.each( function() { - var self = jQuery( this ), - contents = self.contents(); - - if ( contents.length ) { - contents.wrapAll( html ); - - } else { - self.append( html ); - } - } ); - }, - - wrap: function( html ) { - var htmlIsFunction = isFunction( html ); - - return this.each( function( i ) { - jQuery( this ).wrapAll( htmlIsFunction ? 
html.call( this, i ) : html ); - } ); - }, - - unwrap: function( selector ) { - this.parent( selector ).not( "body" ).each( function() { - jQuery( this ).replaceWith( this.childNodes ); - } ); - return this; - } -} ); - - -jQuery.expr.pseudos.hidden = function( elem ) { - return !jQuery.expr.pseudos.visible( elem ); -}; -jQuery.expr.pseudos.visible = function( elem ) { - return !!( elem.offsetWidth || elem.offsetHeight || elem.getClientRects().length ); -}; - - - - -jQuery.ajaxSettings.xhr = function() { - try { - return new window.XMLHttpRequest(); - } catch ( e ) {} -}; - -var xhrSuccessStatus = { - - // File protocol always yields status code 0, assume 200 - 0: 200, - - // Support: IE <=9 only - // #1450: sometimes IE returns 1223 when it should be 204 - 1223: 204 - }, - xhrSupported = jQuery.ajaxSettings.xhr(); - -support.cors = !!xhrSupported && ( "withCredentials" in xhrSupported ); -support.ajax = xhrSupported = !!xhrSupported; - -jQuery.ajaxTransport( function( options ) { - var callback, errorCallback; - - // Cross domain only allowed if supported through XMLHttpRequest - if ( support.cors || xhrSupported && !options.crossDomain ) { - return { - send: function( headers, complete ) { - var i, - xhr = options.xhr(); - - xhr.open( - options.type, - options.url, - options.async, - options.username, - options.password - ); - - // Apply custom fields if provided - if ( options.xhrFields ) { - for ( i in options.xhrFields ) { - xhr[ i ] = options.xhrFields[ i ]; - } - } - - // Override mime type if needed - if ( options.mimeType && xhr.overrideMimeType ) { - xhr.overrideMimeType( options.mimeType ); - } - - // X-Requested-With header - // For cross-domain requests, seeing as conditions for a preflight are - // akin to a jigsaw puzzle, we simply never set it to be sure. - // (it can always be set on a per-request basis or even using ajaxSetup) - // For same-domain requests, won't change header if already provided. 
- if ( !options.crossDomain && !headers[ "X-Requested-With" ] ) { - headers[ "X-Requested-With" ] = "XMLHttpRequest"; - } - - // Set headers - for ( i in headers ) { - xhr.setRequestHeader( i, headers[ i ] ); - } - - // Callback - callback = function( type ) { - return function() { - if ( callback ) { - callback = errorCallback = xhr.onload = - xhr.onerror = xhr.onabort = xhr.ontimeout = - xhr.onreadystatechange = null; - - if ( type === "abort" ) { - xhr.abort(); - } else if ( type === "error" ) { - - // Support: IE <=9 only - // On a manual native abort, IE9 throws - // errors on any property access that is not readyState - if ( typeof xhr.status !== "number" ) { - complete( 0, "error" ); - } else { - complete( - - // File: protocol always yields status 0; see #8605, #14207 - xhr.status, - xhr.statusText - ); - } - } else { - complete( - xhrSuccessStatus[ xhr.status ] || xhr.status, - xhr.statusText, - - // Support: IE <=9 only - // IE9 has no XHR2 but throws on binary (trac-11426) - // For XHR2 non-text, let the caller handle it (gh-2498) - ( xhr.responseType || "text" ) !== "text" || - typeof xhr.responseText !== "string" ? 
- { binary: xhr.response } : - { text: xhr.responseText }, - xhr.getAllResponseHeaders() - ); - } - } - }; - }; - - // Listen to events - xhr.onload = callback(); - errorCallback = xhr.onerror = xhr.ontimeout = callback( "error" ); - - // Support: IE 9 only - // Use onreadystatechange to replace onabort - // to handle uncaught aborts - if ( xhr.onabort !== undefined ) { - xhr.onabort = errorCallback; - } else { - xhr.onreadystatechange = function() { - - // Check readyState before timeout as it changes - if ( xhr.readyState === 4 ) { - - // Allow onerror to be called first, - // but that will not handle a native abort - // Also, save errorCallback to a variable - // as xhr.onerror cannot be accessed - window.setTimeout( function() { - if ( callback ) { - errorCallback(); - } - } ); - } - }; - } - - // Create the abort callback - callback = callback( "abort" ); - - try { - - // Do send the request (this may raise an exception) - xhr.send( options.hasContent && options.data || null ); - } catch ( e ) { - - // #14683: Only rethrow if this hasn't been notified as an error yet - if ( callback ) { - throw e; - } - } - }, - - abort: function() { - if ( callback ) { - callback(); - } - } - }; - } -} ); - - - - -// Prevent auto-execution of scripts when no explicit dataType was provided (See gh-2432) -jQuery.ajaxPrefilter( function( s ) { - if ( s.crossDomain ) { - s.contents.script = false; - } -} ); - -// Install script dataType -jQuery.ajaxSetup( { - accepts: { - script: "text/javascript, application/javascript, " + - "application/ecmascript, application/x-ecmascript" - }, - contents: { - script: /\b(?:java|ecma)script\b/ - }, - converters: { - "text script": function( text ) { - jQuery.globalEval( text ); - return text; - } - } -} ); - -// Handle cache's special case and crossDomain -jQuery.ajaxPrefilter( "script", function( s ) { - if ( s.cache === undefined ) { - s.cache = false; - } - if ( s.crossDomain ) { - s.type = "GET"; - } -} ); - -// Bind script tag hack 
transport -jQuery.ajaxTransport( "script", function( s ) { - - // This transport only deals with cross domain or forced-by-attrs requests - if ( s.crossDomain || s.scriptAttrs ) { - var script, callback; - return { - send: function( _, complete ) { - script = jQuery( " - - - - - - - - - - - - - - - -
\ No newline at end of file
diff --git a/review/pr-458/api/hugectr_layer_book.html b/review/pr-458/api/hugectr_layer_book.html
deleted file mode 100644
index f7eb0aa840..0000000000
--- a/review/pr-458/api/hugectr_layer_book.html
+++ /dev/null
@@ -1,1392 +0,0 @@
HugeCTR Layer Classes and Methods — Merlin HugeCTR documentation

HugeCTR Layer Classes and Methods

This document introduces different layer classes and corresponding methods in the Python API of HugeCTR. The description of each method includes its functionality, arguments, and examples of usage.

Input Layer

hugectr.Input()

The Input layer specifies the parameters related to the data input. It should be added to the Model instance first so that the subsequent SparseEmbedding and DenseLayer instances can access the inputs by their specified names.

Arguments

  • label_dim: Integer, the label dimension. A value of 1 indicates a binary label, for example, whether an item is clicked or not. There is NO default value and it should be specified by users.

  • label_name: String, the name of the label tensor to be referenced by the following layers. There is NO default value and it should be specified by users.

  • dense_dim: Integer, the number of dense (or continuous) features. If there is no dense feature, set it to 0. There is NO default value and it should be specified by users.

  • dense_name: String, the name of the dense input tensor to be referenced by the following layers. There is NO default value and it should be specified by users.

  • data_reader_sparse_param_array: List[hugectr.DataReaderSparseParam], the list of sparse parameters for the categorical inputs. Each DataReaderSparseParam instance should be constructed with sparse_name, nnz_per_slot, is_fixed_length, and slot_num.

    • sparse_name is the name of the sparse input tensors to be referenced by the following layers. There is NO default value and it should be specified by users.

    • nnz_per_slot is the maximum hotness of the input sparse features and is used by the data reader. It can be an int, which applies to every slot; this is convenient when all slots have the same hotness. When the hotness differs between slots, use a List[int] instead, in which case the length of nnz_per_slot must equal slot_num. Note that the RawAsync data reader supports only static hotness. This parameter has no impact on the Parquet and Raw data readers.

    • is_fixed_length indicates whether the categorical inputs have the same length for each slot across all samples. If every sample has the same number of features in each slot, you can set is_fixed_length = True, and HugeCTR uses this information to reduce data transfer time.

    • slot_num specifies the number of slots used for this sparse input in the dataset.

Example:

model.add(hugectr.Input(label_dim = 1, label_name = "label",
                        dense_dim = 13, dense_name = "dense",
                        data_reader_sparse_param_array =
                            [hugectr.DataReaderSparseParam("data1", 1, True, 26)]))

model.add(hugectr.Input(label_dim = 1, label_name = "label",
                        dense_dim = 13, dense_name = "dense",
                        data_reader_sparse_param_array =
                            [hugectr.DataReaderSparseParam("wide_data", 2, True, 2),
                             hugectr.DataReaderSparseParam("deep_data", 2, True, 26)]))

Sparse Embedding

SparseEmbedding class

hugectr.SparseEmbedding()

SparseEmbedding specifies the parameters related to the sparse embedding layer. One or several SparseEmbedding layers should be added to the Model instance after Input and before DenseLayer.

Arguments

  • embedding_type: The embedding type. Specify one of the following values:

    • hugectr.Embedding_t.DistributedSlotSparseEmbeddingHash

    • hugectr.Embedding_t.LocalizedSlotSparseEmbeddingHash

    • hugectr.Embedding_t.LocalizedSlotSparseEmbeddingOneHot

    For information about the different embedding types, see Embedding Types Detail. This argument does not have a default value. You must specify a value.

  • workspace_size_per_gpu_in_mb: Integer, the workspace memory size in megabytes per GPU. This workspace memory must be large enough to hold the entire embedding vocabulary and the corresponding optimizer states that are used during training and evaluation. To understand how to set this value, see How to set workspace_size_per_gpu_in_mb and slot_size_array. This argument does not have a default value. You must specify a value.

  • embedding_vec_size: Integer, the embedding vector size. This argument does not have a default value. You must specify a value.

  • combiner: String, the intra-slot reduction operation. Specify sum or mean. This argument does not have a default value. You must specify a value.

  • sparse_embedding_name: String, the name of the sparse embedding tensor. This name is referenced by the following layers. This argument does not have a default value. You must specify a value.

  • bottom_name: String, the name of the bottom tensor to be consumed by this sparse embedding layer. The value must be a predefined sparse input name. This argument does not have a default value. You must specify a value.

  • slot_size_array: List[int], the maximum key value of each slot. It should be consistent with that of the sparse input. This parameter is used by LocalizedSlotSparseEmbeddingHash and LocalizedSlotSparseEmbeddingOneHot. Specifying it helps avoid the memory waste caused by imbalanced vocabulary sizes across slots. For more information, see How to set workspace_size_per_gpu_in_mb and slot_size_array. This argument does not have a default value. You must specify a value.

  • optimizer: OptParamsPy, the optimizer dedicated to this sparse embedding layer. If you do not specify an optimizer for the sparse embedding, the sparse embedding layer uses the same optimizer as the dense layers.
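
The combiner argument determines how the embedding vectors of all keys that are active in one slot are reduced to a single vector. The following plain-Python sketch illustrates the sum and mean reductions for a single sample. The table contents and the combine_slot helper are hypothetical; HugeCTR performs this reduction on the GPU, not in Python.

```python
from typing import Dict, List


def combine_slot(table: Dict[int, List[float]], keys: List[int], combiner: str) -> List[float]:
    """Reduce the embedding vectors of the keys in one slot (hypothetical helper)."""
    if combiner not in ("sum", "mean"):
        raise ValueError("combiner must be 'sum' or 'mean'")
    vec_size = len(next(iter(table.values())))  # plays the role of embedding_vec_size
    out = [0.0] * vec_size
    for key in keys:
        for i, value in enumerate(table[key]):
            out[i] += value
    if combiner == "mean" and keys:
        out = [v / len(keys) for v in out]
    return out


# One slot with hotness 2: keys 3 and 7 are both active for this sample.
table = {3: [1.0, 2.0], 7: [3.0, 4.0]}
print(combine_slot(table, [3, 7], "sum"))   # [4.0, 6.0]
print(combine_slot(table, [3, 7], "mean"))  # [2.0, 3.0]
```

With a hotness of 1 per slot, sum and mean produce the same vector, so the choice only matters for multi-hot slots.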

Embedding Types Detail

DistributedSlotSparseEmbeddingHash Layer

The DistributedSlotSparseEmbeddingHash layer stores embeddings in an embedding table and retrieves them by using a set of integers or indices. The embedding table can be segmented into multiple slots, or feature fields, which span multiple GPUs and nodes. With DistributedSlotSparseEmbeddingHash, each GPU holds a portion of every slot. This type of embedding is useful when there is load imbalance among slots or when you encounter out-of-memory (OOM) issues.

Important Notes:

  • In a single embedding layer, input integers are assumed to represent unique feature IDs, which are mapped to unique embedding vectors. All the embedding vectors in a single embedding layer must have the same size. If you want some input categorical features to have different embedding vector sizes, use multiple embedding layers.

  • The data type of the input indices, input_key_type, is specified in the solver. By default, the 32-bit integer type (I32) is used, but the 64-bit integer type (I64) is also allowed, subject to the constraints of the dataset type. For additional information, see Solver.

  • The DistributedSlotSparseEmbeddingHash layer performs overflow checking in every iteration by default to verify whether the number of inserted keys exceeds the size set by workspace_size_per_gpu_in_mb. However, this can negatively impact performance when the table is large. If you are confident that there will be no overflow, you can disable overflow checking by setting the environment variable HUGECTR_DISABLE_OVERFLOW_CHECK=1.

Example:

model.add(hugectr.SparseEmbedding(
            embedding_type = hugectr.Embedding_t.DistributedSlotSparseEmbeddingHash,
            workspace_size_per_gpu_in_mb = 23,
            embedding_vec_size = 1,
            combiner = 'sum',
            sparse_embedding_name = "sparse_embedding1",
            bottom_name = "input_data",
            optimizer = optimizer))

LocalizedSlotSparseEmbeddingHash Layer

The LocalizedSlotSparseEmbeddingHash layer stores embeddings in an embedding table and retrieves them by using a set of integers or indices. The embedding table can be segmented into multiple slots, or feature fields, which span multiple GPUs and nodes. Unlike the DistributedSlotSparseEmbeddingHash layer, this type of embedding layer stores each individual slot entirely on one GPU, and slots are not shared across GPUs. This type of embedding layer provides the best scalability.

Important Notes:

  • In a single embedding layer, input integers are assumed to represent unique feature IDs, which are mapped to unique embedding vectors. All the embedding vectors in a single embedding layer must have the same size. If you want some input categorical features to have different embedding vector sizes, use multiple embedding layers.

  • The data type of the input indices, input_key_type, is specified in the solver. By default, the 32-bit integer type (I32) is used, but the 64-bit integer type (I64) is also allowed, subject to the constraints of the dataset type. For additional information, see Solver.

  • The LocalizedSlotSparseEmbeddingHash layer performs overflow checking in every iteration by default to verify whether the number of inserted keys exceeds the size set by workspace_size_per_gpu_in_mb or slot_size_array. However, this can negatively impact performance when the table is large. If you are confident that there will be no overflow, you can disable overflow checking by setting the environment variable HUGECTR_DISABLE_OVERFLOW_CHECK=1.

Example:

model.add(hugectr.SparseEmbedding(
            embedding_type = hugectr.Embedding_t.LocalizedSlotSparseEmbeddingHash,
            workspace_size_per_gpu_in_mb = 23,
            embedding_vec_size = 1,
            combiner = 'sum',
            sparse_embedding_name = "sparse_embedding1",
            bottom_name = "input_data",
            optimizer = optimizer))

LocalizedSlotSparseEmbeddingOneHot Layer

The LocalizedSlotSparseEmbeddingOneHot layer stores embeddings in an embedding table and retrieves them by using a set of integers or indices. The embedding table can be segmented into multiple slots, or feature fields, which span multiple GPUs. This is a performance-optimized version of LocalizedSlotSparseEmbeddingHash for the case where NVSwitch is available and the inputs are one-hot categorical features.

Note: LocalizedSlotSparseEmbeddingOneHot can only be used together with the Raw dataset format. Unlike other types of embeddings, LocalizedSlotSparseEmbeddingOneHot supports only single-node training and can be used only in an NVSwitch-equipped system such as DGX-2 or DGX A100. The data type of the input indices, input_key_type, is specified in the solver. By default, the 32-bit integer type (I32) is used, but the 64-bit integer type (I64) is also allowed, subject to the constraints of the dataset type. For additional information, see Solver.

Example:

model.add(hugectr.SparseEmbedding(
            embedding_type = hugectr.Embedding_t.LocalizedSlotSparseEmbeddingOneHot,
            slot_size_array = [1221, 754, 8, 4, 12, 49, 2],
            embedding_vec_size = 128,
            combiner = 'sum',
            sparse_embedding_name = "sparse_embedding1",
            bottom_name = "input_data",
            optimizer = optimizer))

Dense Layers

DenseLayer class

hugectr.DenseLayer()

DenseLayer specifies the parameters related to a dense layer or a loss function. HugeCTR currently supports multiple dense layers and loss functions. Note that the final sigmoid function is fused with the loss function to make better use of memory bandwidth.

Arguments

  • layer_type: The layer type to be used. The supported types include hugectr.Layer_t.Add, hugectr.Layer_t.BatchNorm, hugectr.Layer_t.Cast, hugectr.Layer_t.Concat, hugectr.Layer_t.Dropout, hugectr.Layer_t.ELU, hugectr.Layer_t.FmOrder2, hugectr.Layer_t.InnerProduct, hugectr.Layer_t.MLP, hugectr.Layer_t.Interaction, hugectr.Layer_t.MultiCross, hugectr.Layer_t.ReLU, hugectr.Layer_t.ReduceSum, hugectr.Layer_t.Reshape, hugectr.Layer_t.Select, hugectr.Layer_t.Sigmoid, hugectr.Layer_t.Slice, hugectr.Layer_t.WeightMultiply, hugectr.Layer_t.ElementwiseMultiply, hugectr.Layer_t.GRU, hugectr.Layer_t.Scale, hugectr.Layer_t.FusedReshapeConcat, hugectr.Layer_t.FusedReshapeConcatGeneral, hugectr.Layer_t.Softmax, hugectr.Layer_t.PReLU_Dice, hugectr.Layer_t.ReduceMean, hugectr.Layer_t.Sub, hugectr.Layer_t.Gather, hugectr.Layer_t.BinaryCrossEntropyLoss, hugectr.Layer_t.CrossEntropyLoss and hugectr.Layer_t.MultiCrossEntropyLoss. There is NO default value and it should be specified by users.

  • bottom_names: List[str], the list of bottom tensor names to be consumed by this dense layer. Each name in the list must be a predefined tensor name. There is NO default value and it should be specified by users.

  • top_names: List[str], the list of top tensor names, which specify the output tensors of this dense layer. There is NO default value and it should be specified by users.

  • For details about the usage of each layer type and its parameters, refer to Dense Layers Usage.

Dense Layers Usage

FullyConnected Layer

The FullyConnected layer is a densely connected layer (or MLP layer). It usually consists of an InnerProduct layer followed by a ReLU layer.

Parameters:

  • num_output: Integer, the number of output elements for the InnerProduct layer. The default value is 1.

  • weight_init_type: Specifies how to initialize the weight array. The supported types include hugectr.Initializer_t.Default, hugectr.Initializer_t.Uniform, hugectr.Initializer_t.XavierNorm, hugectr.Initializer_t.XavierUniform and hugectr.Initializer_t.Zero. The default value is hugectr.Initializer_t.Default.

  • bias_init_type: Specifies how to initialize the bias array for the InnerProduct or MultiCross layer. The supported types include hugectr.Initializer_t.Default, hugectr.Initializer_t.Uniform, hugectr.Initializer_t.XavierNorm, hugectr.Initializer_t.XavierUniform and hugectr.Initializer_t.Zero. The default value is hugectr.Initializer_t.Default.

Input and Output Shapes:

  • input: (batch_size, *) where * represents any number of elements

  • output: (batch_size, num_output)

Example:

model.add(hugectr.DenseLayer(layer_type = hugectr.Layer_t.InnerProduct,
                            bottom_names = ["relu1"],
                            top_names = ["fc2"],
                            num_output=1024))
model.add(hugectr.DenseLayer(layer_type = hugectr.Layer_t.ReLU,
                            bottom_names = ["fc2"],
                            top_names = ["relu2"]))
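
As a mental model of the InnerProduct and ReLU pair in the example (fc2 followed by relu2), the following plain-Python sketch computes relu(x W + b) for a tiny batch. The weights are made-up numbers, and the helper names are hypothetical; HugeCTR's real layers run as GPU kernels, not as this code.

```python
from typing import List

Matrix = List[List[float]]


def inner_product(x: Matrix, w: Matrix, b: List[float]) -> Matrix:
    """x: (batch_size, in_dim), w: (in_dim, num_output), b: (num_output,)."""
    return [
        [sum(row[k] * w[k][j] for k in range(len(row))) + b[j] for j in range(len(b))]
        for row in x
    ]


def relu(x: Matrix) -> Matrix:
    """Element-wise max(0, v)."""
    return [[max(0.0, v) for v in row] for row in x]


x = [[1.0, 2.0]]                 # batch_size = 1, two input elements
w = [[1.0, -1.0], [1.0, -1.0]]   # in_dim = 2, num_output = 2
b = [0.0, 0.5]
fc2 = inner_product(x, w, b)     # [[3.0, -2.5]]
relu2 = relu(fc2)                # [[3.0, 0.0]]
print(relu2)
```

The output has shape (batch_size, num_output), regardless of how many elements each input row contains.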

MLP Layer

The MLP layer consists of multiple fused fully connected layers. The MLP layer supports FP16, FP32, and TF32.

Arguments

  • num_outputs: List[Integer], specifies the number of output elements for each fused fully connected layer in the MLP. There is NO default value and it should be specified by users.

  • act_type: The activation type of the MLP layer. This argument is applied to all layers in the MLP. The supported types include Activation_t.Relu and Activation_t.Non. The default value is Activation_t.Relu.

  • use_bias: Boolean, whether to use bias. This argument is applied to all layers in the MLP. The default value is True.

  • activations: List[Activation_t], specifies the activation type for each layer in the MLP. This argument overrides the act_type argument.

  • biases: List[Boolean], specifies for each layer in the MLP whether to use bias. This argument overrides the use_bias argument.

  • weight_init_type: Specifies how to initialize the weight array of all layers in the MLP. The supported types include hugectr.Initializer_t.Default, hugectr.Initializer_t.Uniform, hugectr.Initializer_t.XavierNorm, hugectr.Initializer_t.XavierUniform and hugectr.Initializer_t.Zero. The default value is hugectr.Initializer_t.Default.

  • bias_init_type: Specifies how to initialize the bias array of all layers in the MLP. The supported types include hugectr.Initializer_t.Default, hugectr.Initializer_t.Uniform, hugectr.Initializer_t.XavierNorm, hugectr.Initializer_t.XavierUniform and hugectr.Initializer_t.Zero. The default value is hugectr.Initializer_t.Default.

  • compute_config: hugectr.DenseLayerComputeConfig, specifies the computation configuration of all layers in the MLP. For MLP, the valid flags in compute_config are hugectr.DenseLayerComputeConfig.async_wgrad and hugectr.DenseLayerComputeConfig.fuse_wb.

    • hugectr.DenseLayerComputeConfig.async_wgrad: Specifies whether the wgrad (weight gradient) computation is asynchronous to dgrad (data gradient). The default value is False.

    • hugectr.DenseLayerComputeConfig.fuse_wb: Specifies whether to fuse wgrad with bgrad (bias gradient). The default value is False.

Input and Output Shapes:

  • input: (batch_size, *) where * represents any number of elements

  • output: (batch_size, num_output of the last layer)

Example:

compute_config_bottom = hugectr.DenseLayerComputeConfig(
    async_wgrad=True,
    fuse_wb=False,
)

compute_config_top = hugectr.DenseLayerComputeConfig(
    async_wgrad=True,
    fuse_wb=True,
)

model.add(
    hugectr.DenseLayer(
        layer_type=hugectr.Layer_t.MLP,
        bottom_names=["dense"],
        top_names=["mlp1"],
        num_outputs=[512, 256, 128],
        act_type=hugectr.Activation_t.Relu,
        use_bias=True,
        compute_config=compute_config_bottom,
    )
)

model.add(
    hugectr.DenseLayer(
        layer_type=hugectr.Layer_t.Interaction,
        bottom_names=["mlp1", "sparse_embedding1"],
        top_names=["interaction1", "interaction_grad"],
    )
)

model.add(
    hugectr.DenseLayer(
        layer_type=hugectr.Layer_t.MLP,
        bottom_names=["interaction1", "interaction_grad"],
        top_names=["mlp2"],
        num_outputs=[1024, 1024, 512, 256, 1],
        activations=[
            hugectr.Activation_t.Relu,
            hugectr.Activation_t.Relu,
            hugectr.Activation_t.Relu,
            hugectr.Activation_t.Relu,
            hugectr.Activation_t.Non,
        ],
        biases=[True, True, True, True, True],
        compute_config=compute_config_top,
    )
)

model.add(
    hugectr.DenseLayer(
        layer_type=hugectr.Layer_t.BinaryCrossEntropyLoss,
        bottom_names=["mlp2", "label"],
        top_names=["loss"],
    )
)
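
Each fused layer consumes the output of the previous one, so the output shape and the number of learnable parameters follow directly from num_outputs and the bias settings. The hypothetical mlp_shapes helper below is a sketch of that bookkeeping. It assumes the bottom MLP reads the 13 dense features from the Input example, and the batch size of 1024 is an arbitrary choice.

```python
from typing import List, Optional, Tuple


def mlp_shapes(batch_size: int, in_dim: int, num_outputs: List[int],
               biases: Optional[List[bool]] = None) -> Tuple[List[Tuple[int, int]], int]:
    """Return each fused layer's output shape and the MLP's total parameter count."""
    if biases is None:
        biases = [True] * len(num_outputs)  # same effect as use_bias=True
    if len(biases) != len(num_outputs):
        raise ValueError("biases needs one entry per layer in num_outputs")
    shapes: List[Tuple[int, int]] = []
    params, prev = 0, in_dim
    for out_dim, has_bias in zip(num_outputs, biases):
        params += prev * out_dim + (out_dim if has_bias else 0)  # weights + optional bias
        shapes.append((batch_size, out_dim))
        prev = out_dim
    return shapes, params


shapes, params = mlp_shapes(batch_size=1024, in_dim=13, num_outputs=[512, 256, 128])
print(shapes)  # [(1024, 512), (1024, 256), (1024, 128)]
print(params)  # 171392
```

The last entry of the returned shapes matches the documented output shape, (batch_size, num_output of the last layer).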

MultiCross Layer

The MultiCross layer is a cross network in which explicit feature crossing is applied across cross layers. There are two versions of the cross network, introduced in DCN v1 and DCN v2, respectively.

Suppose the dimension of the features to be crossed is \(n\). The feature-crossing formulas for the two versions are as follows.

DCN v1

\[ x_{l+1}=x_{0}x^{T}_{l}w_{l}+b_l+x_l \]

where \( w_l, b_l \in \mathbb{R}^{n\times1}\) are learnable parameters, \(x_{l}\) and \(x_0\) are the inputs, and \(x_{l+1}\) is the output.
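
Because \(x_l^T w_l\) is a scalar, a DCN v1 cross layer needs only O(n) work. The plain-Python sketch below applies the formula to a two-feature example. The cross_v1 name is hypothetical, and the sketch reuses the same w and b for both layers only for brevity; in the real network, each cross layer has its own \(w_l\) and \(b_l\).

```python
from typing import List

Vector = List[float]


def cross_v1(x0: Vector, xl: Vector, w: Vector, b: Vector) -> Vector:
    """One DCN v1 cross layer: x_{l+1} = x_0 (x_l^T w_l) + b_l + x_l."""
    s = sum(a * c for a, c in zip(xl, w))  # x_l^T w_l is a scalar
    return [x0_i * s + b_i + xl_i for x0_i, b_i, xl_i in zip(x0, b, xl)]


x0 = [1.0, 2.0]
w, b = [0.5, 0.25], [0.0, 1.0]
x1 = cross_v1(x0, x0, w, b)  # the first cross layer uses x_l = x_0
x2 = cross_v1(x0, x1, w, b)  # stacking layers, as num_layers does
print(x1, x2)  # [2.0, 5.0] [4.25, 10.5]
```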

DCN v2

\[ x_{l+1}=x_{0}\odot (\mathbf{W}_{l} x_{l}+b_l )+x_l \]

where \( \odot \) denotes the element-wise (Hadamard) product, \(\mathbf{W}_l \in \mathbb{R}^{n\times n}\) and \(b_l \in \mathbb{R}^{n\times 1 }\) are learnable parameters, \(x_{l}\) and \(x_0\) are the inputs, and \(x_{l+1}\) is the output.

To reduce the computational complexity, \(\mathbf{W}_l\) can be approximately factorized into the product of two lower-rank matrices \(\mathbf{U}_l \in \mathbb{R}^{n \times k}\) and \(\mathbf{V}_l \in \mathbb{R}^{k \times n}\), where \(k\) is the so-called projection dimension. Correspondingly, the formula becomes:

\[ x_{l+1}=x_{0}\odot (\mathbf{U}_{l} \mathbf{V}_{l} x_{l}+b_l )+x_l \]
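
The factorized form evaluates \(\mathbf{V}_l x_l\) first (length \(k\)) and then multiplies by \(\mathbf{U}_l\), which costs O(nk) instead of the O(n^2) needed for a full \(\mathbf{W}_l\). The plain-Python sketch below checks, for one hand-picked example with \(k = 1\), that the low-rank evaluation matches the full-rank formula when \(\mathbf{W}_l = \mathbf{U}_l \mathbf{V}_l\). All helper names and values are hypothetical.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Vector) -> Vector:
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in m]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def cross_v2(x0: Vector, xl: Vector, wx: Vector, b: Vector) -> Vector:
    """x_{l+1} = x_0 ⊙ (W_l x_l + b_l) + x_l, given wx = W_l x_l."""
    return [x0_i * (wx_i + b_i) + xl_i for x0_i, wx_i, b_i, xl_i in zip(x0, wx, b, xl)]


# n = 3 features, projection dimension k = 1.
U = [[1.0], [2.0], [0.0]]   # n x k
V = [[1.0, 1.0, 1.0]]       # k x n
x0, b = [1.0, 2.0, 3.0], [0.0, 0.0, 1.0]

low_rank = cross_v2(x0, x0, matvec(U, matvec(V, x0)), b)  # U (V x_l): O(n * k)
full = cross_v2(x0, x0, matvec(matmul(U, V), x0), b)      # W x_l: O(n * n)
print(low_rank, full)  # [7.0, 26.0, 6.0] [7.0, 26.0, 6.0]
```

Setting projection_dim to 0 corresponds to falling back to the DCN v1 formula instead.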

Parameters:

  • num_layers: Integer, the number of cross layers in the cross network. Set it to a positive number if you want to use the cross network. The default value is 0.

  • projection_dim: Integer, the projection dimension for DCN v2. If you specify 0, the layer degrades to DCN v1. The default value is 0.

  • weight_init_type: Specifies how to initialize the weight array. The supported types include hugectr.Initializer_t.Default, hugectr.Initializer_t.Uniform, hugectr.Initializer_t.XavierNorm, hugectr.Initializer_t.XavierUniform and hugectr.Initializer_t.Zero. The default value is hugectr.Initializer_t.Default.

  • bias_init_type: Specifies how to initialize the bias array. The supported types include hugectr.Initializer_t.Default, hugectr.Initializer_t.Uniform, hugectr.Initializer_t.XavierNorm, hugectr.Initializer_t.XavierUniform and hugectr.Initializer_t.Zero. The default value is hugectr.Initializer_t.Default.

  • compute_config: hugectr.DenseLayerComputeConfig, specifies the computation configuration of all layers in the cross network. The only valid flag in compute_config is hugectr.DenseLayerComputeConfig.async_wgrad, and it applies only to DCN v2.

    • hugectr.DenseLayerComputeConfig.async_wgrad: Specifies whether the wgrad computation is asynchronous to dgrad. The default value is False.

Input and Output Shapes:

  • input: (batch_size, *) where * represents any number of elements

  • output: same as input

Example:

model.add(hugectr.DenseLayer(layer_type = hugectr.Layer_t.MultiCross,
                            bottom_names = ["slice11"],
                            top_names = ["multicross1"],
                            num_layers=6,
                            projection_dim=512))

FmOrder2 Layer

The FmOrder2 layer is a second-order factorization machine (FM), which models linear and pairwise interactions as dot products of latent vectors.

Parameters:

  • out_dim: Integer, the output vector size. Set it to a positive number if you want to use the factorization machine. The default value is 0.

Input and Output Shapes:

  • input: (batch_size, *) where * represents any number of elements

  • output: (batch_size, out_dim)

Example:

model.add(hugectr.DenseLayer(layer_type = hugectr.Layer_t.FmOrder2,
                            bottom_names = ["slice32"],
                            top_names = ["fmorder2"],
                            out_dim=10))
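
The pairwise FM term can be computed in linear time with the identity sum_{i<j} v_i * v_j = 0.5 * ((sum_i v_i)^2 - sum_i v_i^2), applied to each latent dimension separately, which yields a vector of length out_dim. The sketch below assumes that each input row holds the latent vectors of all slots back to back, each of length out_dim. That layout and the fm_order2 name are assumptions for illustration, not a statement about HugeCTR's internal memory layout.

```python
from itertools import combinations
from typing import List


def fm_order2(row: List[float], out_dim: int) -> List[float]:
    """Second-order FM term per latent dimension for one sample (assumed layout)."""
    if out_dim <= 0 or len(row) % out_dim:
        raise ValueError("row length must be a positive multiple of out_dim")
    vectors = [row[i:i + out_dim] for i in range(0, len(row), out_dim)]
    result = []
    for d in range(out_dim):
        column = [vec[d] for vec in vectors]
        total = sum(column)
        # 0.5 * ((sum_i v_id)^2 - sum_i v_id^2) equals sum_{i<j} v_id * v_jd
        result.append(0.5 * (total * total - sum(c * c for c in column)))
    return result


row = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]  # three latent vectors of length 2
out = fm_order2(row, out_dim=2)
vectors = [row[0:2], row[2:4], row[4:6]]
pairwise = sum(sum(a * b for a, b in zip(u, v)) for u, v in combinations(vectors, 2))
print(out, sum(out), pairwise)  # [23.0, 44.0] 67.0 67.0
```

Summing the out_dim entries gives the classic scalar FM interaction term, which the explicit pairwise loop confirms.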

WeightMultiply Layer

The WeightMultiply layer maps input elements into a latent vector space by multiplying each feature by a corresponding weight vector.

Parameters:

  • weight_dims: List[Integer], the shape of the weight matrix, (slot_dim, vec_dim), where vec_dim is the latent vector length for the WeightMultiply layer. It must be set correctly if you want to use weight multiplication. The default value is [].

  • weight_init_type: Specifies how to initialize the weight array. The supported types include hugectr.Initializer_t.Default, hugectr.Initializer_t.Uniform, hugectr.Initializer_t.XavierNorm, hugectr.Initializer_t.XavierUniform and hugectr.Initializer_t.Zero. The default value is hugectr.Initializer_t.Default.

Input and Output Shapes:

  • input: (batch_size, slot_dim) where slot_dim represents the number of input features

  • output: (batch_size, slot_dim * vec_dim)

Example:

model.add(hugectr.DenseLayer(layer_type = hugectr.Layer_t.WeightMultiply,
                            bottom_names = ["slice32"],
                            top_names = ["fmorder2"],
                            weight_dims = [13, 10],
                            weight_init_type = hugectr.Initializer_t.XavierUniform))
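
The shape rule (batch_size, slot_dim) to (batch_size, slot_dim * vec_dim) can be sketched in plain Python: each input feature scales its own row of the weight matrix, and the scaled rows are placed back to back. That ordering of the output elements is an assumption made for illustration, and weight_multiply is a hypothetical helper.

```python
from typing import List

Matrix = List[List[float]]


def weight_multiply(x: Matrix, w: Matrix) -> Matrix:
    """x: (batch_size, slot_dim), w: (slot_dim, vec_dim) -> (batch_size, slot_dim * vec_dim)."""
    slot_dim = len(w)
    if any(len(row) != slot_dim for row in x):
        raise ValueError("each input row must contain slot_dim features")
    # Feature i is multiplied by its own weight vector w[i]; results are concatenated.
    return [[x_i * w_ij for x_i, w_row in zip(row, w) for w_ij in w_row] for row in x]


x = [[1.0, 2.0]]                          # batch_size = 1, slot_dim = 2
w = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]    # vec_dim = 3
print(weight_multiply(x, w))  # [[1.0, 2.0, 3.0, 8.0, 10.0, 12.0]]
```

With weight_dims = [13, 10] as in the example, each sample would produce 13 * 10 = 130 output elements.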

ElementwiseMultiply Layer

The ElementwiseMultiply layer maps two inputs into a single resulting vector by performing an element-wise multiplication of the two inputs.

Parameters: None

Input and Output Shapes:

  • input: two tensors, each of shape (batch_size, num_elem)

  • output: (batch_size, num_elem)

Example:

model.add(hugectr.DenseLayer(layer_type = hugectr.Layer_t.ElementwiseMultiply,
                            bottom_names = ["slice1","slice2"],
                            top_names = ["eltmultiply1"]))

BatchNorm Layer

The BatchNorm layer implements cuDNN-based batch normalization.

Parameters:

  • factor: Float, the exponential average factor used to update the running statistics of the BatchNorm layer, as in runningMean = runningMean*(1-factor) + newMean*factor. The default value is 1.

  • eps: Float, the epsilon value used in the batch normalization formula for the BatchNorm layer. The default value is 1e-5.

  • gamma_init_type: Specifies how to initialize the gamma (or scale) array for the BatchNorm layer. The supported types include hugectr.Initializer_t.Default, hugectr.Initializer_t.Uniform, hugectr.Initializer_t.XavierNorm, hugectr.Initializer_t.XavierUniform and hugectr.Initializer_t.Zero. The default value is hugectr.Initializer_t.Default.

  • beta_init_type: Specifies how to initialize the beta (or offset) array for the BatchNorm layer. The supported types include hugectr.Initializer_t.Default, hugectr.Initializer_t.Uniform, hugectr.Initializer_t.XavierNorm, hugectr.Initializer_t.XavierUniform and hugectr.Initializer_t.Zero. The default value is hugectr.Initializer_t.Default.

Input and Output Shapes:

  • input: (batch_size, num_elem)

  • output: same as input

Example:

model.add(hugectr.DenseLayer(layer_type = hugectr.Layer_t.BatchNorm,
                            bottom_names = ["slice32"],
                            top_names = ["fmorder2"],
                            factor = 1.0,
                            eps = 0.00001,
                            gamma_init_type = hugectr.Initializer_t.XavierUniform,
                            beta_init_type = hugectr.Initializer_t.XavierUniform))
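
The factor formula is an exponential moving average. The plain-Python sketch below computes a per-feature batch mean for a (batch_size, num_elem) input and then applies the update. With the default factor of 1, the running mean simply becomes the latest batch mean. The helper names are hypothetical. cuDNN applies the same factor to the running variance, which is omitted here for brevity.

```python
from typing import List


def batch_mean(batch: List[List[float]]) -> List[float]:
    """Per-feature mean over the batch dimension of a (batch_size, num_elem) input."""
    return [sum(col) / len(batch) for col in zip(*batch)]


def update_running(running: List[float], new: List[float], factor: float) -> List[float]:
    """runningMean = runningMean * (1 - factor) + newMean * factor."""
    return [r * (1.0 - factor) + n * factor for r, n in zip(running, new)]


batch = [[0.0, 2.0], [2.0, 6.0]]
new_mean = batch_mean(batch)                      # [1.0, 4.0]
print(update_running([0.0, 2.0], new_mean, 0.5))  # [0.5, 3.0]
print(update_running([0.0, 2.0], new_mean, 1.0))  # [1.0, 4.0]
```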

When a model is trained, the mean and variance of each BatchNorm layer are stored in a JSON file whose name follows the format snapshot_prefix + "_dense_" + str(iter) + ".model".

Example: my_snapshot_dense_5000.model

In the JSON file, you can find the batch norm parameters as shown below:

    {
      "layers": [
        {
          "type": "BatchNorm",
          "mean": [-0.192325, 0.003050, -0.323447, -0.034817, -0.091861],
          "var": [0.738942, 0.410794, 1.370279, 1.156337, 0.638146]
        },
        {
          "type": "BatchNorm",
          "mean": [-0.759954, 0.251507, -0.648882, -0.176316, 0.515163],
          "var": [1.434012, 1.422724, 1.001451, 1.756962, 1.126412]
        },
        {
          "type": "BatchNorm",
          "mean": [0.851878, -0.837513, -0.694674, 0.791046, -0.849544],
          "var": [1.694500, 5.405566, 4.211646, 1.936811, 5.659098]
        }
      ]
    }
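
To inspect these statistics programmatically, you can parse a file with the JSON structure shown above by using Python's standard json module. This sketch assumes that the file content is exactly that structure. It uses an abbreviated copy of the first layer as sample data. The snapshot_name and batchnorm_stats helpers are hypothetical, and snapshot_name only mirrors the example file name my_snapshot_dense_5000.model.

```python
import json
from typing import Dict, List


def snapshot_name(prefix: str, iteration: int) -> str:
    """Dense snapshot file name, mirroring the my_snapshot_dense_5000.model example."""
    return f"{prefix}_dense_{iteration}.model"


def batchnorm_stats(model_json: str) -> List[Dict[str, List[float]]]:
    """Collect the mean/var entries of every BatchNorm layer, in file order."""
    layers = json.loads(model_json)["layers"]
    return [{"mean": layer["mean"], "var": layer["var"]}
            for layer in layers if layer.get("type") == "BatchNorm"]


sample = """{"layers": [
  {"type": "BatchNorm",
   "mean": [-0.192325, 0.003050, -0.323447, -0.034817, -0.091861],
   "var": [0.738942, 0.410794, 1.370279, 1.156337, 0.638146]}
]}"""
print(snapshot_name("my_snapshot", 5000))     # my_snapshot_dense_5000.model
print(batchnorm_stats(sample)[0]["var"][:2])  # [0.738942, 0.410794]
```

In practice, you would read the file with open(path).read() and pass its content to batchnorm_stats.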

LayerNorm Layer

The LayerNorm layer implements layer normalization.

Parameters:

  • eps: Float, the epsilon value used in the layer normalization formula for the LayerNorm layer. The default value is 1e-5.

  • gamma_init_type: Specifies how to initialize the gamma (or scale) array for the LayerNorm layer. The supported types include hugectr.Initializer_t.Default, hugectr.Initializer_t.Uniform, hugectr.Initializer_t.XavierNorm, hugectr.Initializer_t.XavierUniform and hugectr.Initializer_t.Zero. The default value is hugectr.Initializer_t.Default.

  • beta_init_type: Specifies how to initialize the beta (or offset) array for the LayerNorm layer. The supported types include hugectr.Initializer_t.Default, hugectr.Initializer_t.Uniform, hugectr.Initializer_t.XavierNorm, hugectr.Initializer_t.XavierUniform and hugectr.Initializer_t.Zero. The default value is hugectr.Initializer_t.Default.

Input and Output Shapes:

  • input: 2D: (batch_size, num_elem), 3D: (batch_size, seq_len, num_elem), 4D: (batch_size, num_attention_heads, seq_len, num_elem)

  • output: same as input

Example:

model.add(hugectr.DenseLayer(layer_type = hugectr.Layer_t.LayerNorm,
                            bottom_names = ["slice32"],
                            top_names = ["fmorder2"],
                            eps = 0.00001,
                            gamma_init_type = hugectr.Initializer_t.XavierUniform,
                            beta_init_type = hugectr.Initializer_t.XavierUniform))
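
The plain-Python sketch below assumes the textbook definition of layer normalization, in which each row is normalized over its innermost (num_elem) dimension with a biased variance, then scaled by gamma and shifted by beta. It is an illustration of that definition, not HugeCTR's kernel. The layer_norm name is hypothetical.

```python
import math
from typing import List


def layer_norm(row: List[float], gamma: List[float], beta: List[float],
               eps: float = 1e-5) -> List[float]:
    """Normalize one row over its last dimension, then scale by gamma and shift by beta."""
    n = len(row)
    mean = sum(row) / n
    var = sum((v - mean) ** 2 for v in row) / n  # biased variance
    inv_std = 1.0 / math.sqrt(var + eps)
    return [g * (v - mean) * inv_std + b for v, g, b in zip(row, gamma, beta)]


out = layer_norm([1.0, 2.0, 3.0, 4.0], gamma=[1.0] * 4, beta=[0.0] * 4)
print([round(v, 4) for v in out])  # [-1.3416, -0.4472, 0.4472, 1.3416]
```

For 3D or 4D inputs, the same function would be applied to every innermost row independently, which is why the output shape equals the input shape.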

Concat Layer

The Concat layer concatenates a list of inputs.

Parameters:

  • axis: Integer, the dimension along which the Concat layer concatenates the inputs. If the input is N-dimensional, 0 <= axis < N. The default value is 1.

Input and Output Shapes:

  • input: 3D: {(batch_size, num_feas_0, num_elems_0), (batch_size, num_feas_1, num_elems_1), …} or 2D: {(batch_size, num_elems_0), (batch_size, num_elems_1), …}

  • output: 3D and axis=1: (batch_size, num_feas_0+num_feas_1+…, num_elems). 3D and axis=2: (batch_size, num_feas, num_elems_0+num_elems_1+…). 2D: (batch_size, num_elems_0+num_elems_1+…)

Example:

model.add(hugectr.DenseLayer(layer_type = hugectr.Layer_t.Concat,
                            bottom_names = ["reshape3","weight_multiply2"],
                            top_names = ["concat2"],
                            axis = 2))

Reshape Layer

The Reshape layer reshapes a 3D input tensor into a 2D shape.

Parameters:

  • leading_dim: Integer, the innermost dimension of the output tensor. The total number of input elements must be a multiple of leading_dim. If it is unspecified, n_slots * num_elems (see below) is used as the default leading_dim.

  • time_step: Integer, the second dimension of the 3D output tensor. It must be specified together with leading_dim, and the total number of input elements must be a multiple of time_step * leading_dim.

  • selected: Boolean, whether to use the selected mode for the Reshape layer. The default value is False.

  • selected_slots: List[int], the selected slots for the Reshape layer. It is ignored if selected is False. The default value is [].

  • shape: List of Integer, the destination shape of the output. You can use -1 as a placeholder for a dimension that varies, such as the batch size. This parameter cannot be used together with the other parameters, which will be deprecated in the future. This parameter does not restrict the number of dimensions.

Input and Output Shapes:

  • input: (batch_size, n_slots, num_elems)

  • output: (trailing_dim, leading_dim), where trailing_dim is batch_size * n_slots * num_elems / leading_dim

Example:

-
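# Two alternative ways to define a Reshape layer named reshape1; a model uses only one of them.
-# Option 1: set leading_dim.
-model.add(hugectr.DenseLayer(layer_type = hugectr.Layer_t.Reshape,
-                             bottom_names = ["sparse_embedding1"],
-                             top_names = ["reshape1"],
-                             leading_dim=416))
-# Option 2: set the destination shape; -1 is a placeholder for a variable dimension.
-model.add(hugectr.DenseLayer(layer_type = hugectr.Layer_t.Reshape,
-                             bottom_names = ["sparse_embedding1"],
-                             top_names = ["reshape1"],
-                             shape = [-1, 32, 128]))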
-
-
-
-
-
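The shape rules above can be checked with a small pure-Python sketch that only computes output shapes and does not call HugeCTR. It relies on the documented output formula tailing_dim = batch_size * n_slots * num_elems / leading_dim, which implies that the element count must divide evenly. The helper names reshape_2d_shape and resolve_shape are hypothetical, and the input shape (1024, 26, 16) for sparse_embedding1 is an assumption chosen so that n_slots * num_elems = 416.

```python
from math import prod


def reshape_2d_shape(input_shape, leading_dim=None):
    """Output shape (tailing_dim, leading_dim) of a leading_dim-style Reshape."""
    batch_size, n_slots, num_elems = input_shape
    if leading_dim is None:
        leading_dim = n_slots * num_elems  # documented default
    total = batch_size * n_slots * num_elems
    if total % leading_dim != 0:
        raise ValueError("the total number of input elements must be a multiple of leading_dim")
    return (total // leading_dim, leading_dim)


def resolve_shape(input_shape, shape):
    """Replace a single -1 placeholder in `shape` with the inferred dimension."""
    total = prod(input_shape)
    placeholders = shape.count(-1)
    known = prod(d for d in shape if d != -1)
    if placeholders > 1 or total % known != 0 or (placeholders == 0 and total != known):
        raise ValueError("cannot reshape %s into %s" % (input_shape, shape))
    return [total // known if d == -1 else d for d in shape]


print(reshape_2d_shape((1024, 26, 16), leading_dim=416))  # (1024, 416)
print(resolve_shape((64, 32, 128), [-1, 32, 128]))        # [64, 32, 128]
```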

Select Layer

-

The Select layer selects the specified indices along one dimension of the input tensor.

-

Parameter:

-
    -
  • dim: Integer, the dimension along which to select.

  • -
  • index: List of Integer, the indices to select from the specified dimension.

  • -
-

Input and Output Shapes:

-
    -
  • input: any shape

  • -
  • output: depending on the parameter dim and index

  • -
-

Example:

-
# if the shape of "sparse_embedding1" is (batch_size, 10, 128) the shape of "select1" will be (batch_size, 2, 128).
-model.add(hugectr.DenseLayer(layer_type = hugectr.Layer_t.Select,
-                            bottom_names = ["sparse_embedding1"],
-                            top_names = ["select1"],
-                            dim = 1,
-                            index = [2, 4]))
-
-
-
-
-
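The selection described in the example comment above can be illustrated in pure Python, with nested lists standing in for tensors. This is a sketch of the documented behavior only, not HugeCTR's implementation; select and shape_of are hypothetical helpers, and a batch size of 2 and a last dimension of 3 are used instead of 128 to keep the data small.

```python
def shape_of(tensor):
    """Shape of a rectangular nested list."""
    shape = []
    while isinstance(tensor, list):
        shape.append(len(tensor))
        tensor = tensor[0] if tensor else None
    return tuple(shape)


def select(tensor, dim, index):
    """Keep only the entries listed in `index` along dimension `dim`."""
    if dim == 0:
        return [tensor[i] for i in index]
    return [select(sub, dim - 1, index) for sub in tensor]


# A (batch_size=2, 10, 3) tensor whose values encode their own coordinates.
x = [[[100 * b + 10 * s + e for e in range(3)] for s in range(10)] for b in range(2)]
y = select(x, dim=1, index=[2, 4])
print(shape_of(y))  # (2, 2, 3)
print(y[0][1])      # [40, 41, 42]
```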

Slice Layer

-

The Slice layer extracts multiple output tensors from an input tensor.

-

Parameter:

-
    -
  • ranges: List[Tuple[int, int]], the ranges used by the Slice layer. Each tuple represents a range in the input tensor that generates the corresponding output tensor. For example, (2, 8) indicates that the 6 elements starting at index 2 of the input tensor are used to create an output tensor. The start index is inclusive and the end index is exclusive. The number of tuples corresponds to the number of output tensors. Ranges may overlap, but reversed or negative ranges are not allowed. The default value is []. The input tensor is sliced along the last dimension.

  • -
-

Input and Output Shapes:

-
    -
  • input: (batch_size, num_elems)

  • -
  • output: {(batch_size, b-a), (batch_size, d-c), …} where ranges = {[a, b), [c, d), …} and len(ranges) <= 5

  • -
-

Example:

-

You can use the Slice layer to actually slice a tensor. In this case, it must be added explicitly with the Python API.

-
model.add(hugectr.DenseLayer(layer_type = hugectr.Layer_t.Slice,
-                            bottom_names = ["dense"],
-                            top_names = ["slice21", "slice22"],
-                            ranges=[(0,10),(10,13)]))
-model.add(hugectr.DenseLayer(layer_type = hugectr.Layer_t.WeightMultiply,
-                            bottom_names = ["slice21"],
-                            top_names = ["weight_multiply1"],
-                            weight_dims= [10,10]))
-model.add(hugectr.DenseLayer(layer_type = hugectr.Layer_t.WeightMultiply,
-                            bottom_names = ["slice22"],
-                            top_names = ["weight_multiply2"],
-                            weight_dims= [3,1]))
-
-
-
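The range semantics (half-open ranges, slicing along the last dimension, at most 5 outputs) can be mirrored in a short pure-Python sketch. slice_last_dim is a hypothetical helper; rejecting empty and out-of-bounds ranges is an extra assumption beyond the documented ban on reversed or negative ranges.

```python
def slice_last_dim(rows, ranges):
    """Slice a (batch_size, num_elems) tensor along its last dimension.

    Each (start, end) pair is half-open: start is inclusive, end is exclusive.
    Returns one (batch_size, end - start) tensor per range.
    """
    num_elems = len(rows[0])
    if len(ranges) > 5:
        raise ValueError("at most 5 ranges are supported")
    for start, end in ranges:
        # Overlapping ranges are fine; reversed, negative, empty, or
        # out-of-bounds ranges are rejected.
        if not 0 <= start < end <= num_elems:
            raise ValueError("invalid range (%d, %d)" % (start, end))
    return [[row[start:end] for row in rows] for start, end in ranges]


dense = [list(range(13)) for _ in range(4)]  # (batch_size=4, num_elems=13)
slice21, slice22 = slice_last_dim(dense, [(0, 10), (10, 13)])
print(len(slice21[0]), len(slice22[0]))  # 10 3
```

Passing the same full range twice, as in the branching example, simply yields two identical copies of the input.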

The Slice layer can also be employed to create copies of a tensor, which helps to express a branch topology in your model graph.

-
model.add(hugectr.DenseLayer(layer_type = hugectr.Layer_t.Slice,
-                            bottom_names = ["dense"],
-                            top_names = ["slice21", "slice22"],
-                            ranges=[(0,13),(0,13)]))
-model.add(hugectr.DenseLayer(layer_type = hugectr.Layer_t.WeightMultiply,
-                            bottom_names = ["slice21"],
-                            top_names = ["weight_multiply1"],
-                            weight_dims= [13,10]))
-model.add(hugectr.DenseLayer(layer_type = hugectr.Layer_t.WeightMultiply,
-                            bottom_names = ["slice22"],
-                            top_names = ["weight_multiply2"],
-                            weight_dims= [13,1]))
-
-
-

Starting with HugeCTR v3.3, the Slice-layer-based branching described above can be abstracted away. When the same tensor is referenced multiple times while constructing a model in Python, the HugeCTR parser can internally add a Slice layer to handle the situation. Thus, the example below behaves the same as the one above while simplifying the code.

-
model.add(hugectr.DenseLayer(layer_type = hugectr.Layer_t.WeightMultiply,
-                            bottom_names = ["dense"],
-                            top_names = ["weight_multiply1"],
-                            weight_dims= [13,10]))
-model.add(hugectr.DenseLayer(layer_type = hugectr.Layer_t.WeightMultiply,
-                            bottom_names = ["dense"],
-                            top_names = ["weight_multiply2"],
-                            weight_dims= [13,1]))
-
-
-
-
-
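The idea behind this automatic branching can be sketched as a graph pass over a list of layer descriptions. Everything here is hypothetical: the dict-based layer format, the insert_branch_slices function, and the _copyN tensor names are illustrative only, the slice ranges are left out, and this is not how the HugeCTR parser is actually implemented.

```python
from collections import defaultdict


def insert_branch_slices(layers):
    """Give every tensor that feeds more than one layer its own Slice layer.

    `layers` is a topologically ordered list of dicts with "type", "bottom",
    and "top" keys. Each consumer of a shared tensor is rewired to a
    dedicated copy produced by the inserted Slice layer.
    """
    consumers = defaultdict(list)
    for i, layer in enumerate(layers):
        for name in layer["bottom"]:
            consumers[name].append(i)

    rewritten = [dict(layer, bottom=list(layer["bottom"])) for layer in layers]
    slice_for = {}
    for name, users in consumers.items():
        if len(users) < 2:
            continue
        copies = ["%s_copy%d" % (name, k) for k in range(len(users))]
        # Ranges are omitted: each copy would cover the full last dimension.
        slice_for[name] = {"type": "Slice", "bottom": [name], "top": copies}
        for user, copy in zip(users, copies):
            bottoms = rewritten[user]["bottom"]
            bottoms[bottoms.index(name)] = copy

    produced = {top for layer in layers for top in layer["top"]}
    # Shared graph inputs are sliced first; shared intermediate tensors are
    # sliced right after the layer that produces them.
    result = [s for name, s in slice_for.items() if name not in produced]
    for layer in rewritten:
        result.append(layer)
        result.extend(slice_for[top] for top in layer["top"] if top in slice_for)
    return result


layers = [
    {"type": "WeightMultiply", "bottom": ["dense"], "top": ["weight_multiply1"]},
    {"type": "WeightMultiply", "bottom": ["dense"], "top": ["weight_multiply2"]},
]
for layer in insert_branch_slices(layers):
    print(layer["type"], layer["bottom"], layer["top"])
# Slice ['dense'] ['dense_copy0', 'dense_copy1']
# WeightMultiply ['dense_copy0'] ['weight_multiply1']
# WeightMultiply ['dense_copy1'] ['weight_multiply2']
```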

Dropout Layer

-

The Dropout layer randomly zeroes out (drops) some of the input elements.

-

Parameter:

-
    -
  • dropout_rate: Float, the dropout rate to be used for the Dropout layer. It should be between 0 and 1. Setting it to 0 means that no elements are dropped. The default value is 0.5.

  • -
-

Input and Output Shapes:

-
    -
  • input: (batch_size, num_elems)

  • -
  • output: same as input

  • -
-

Example:

-
model.add(hugectr.DenseLayer(layer_type = hugectr.Layer_t.Dropout,
-                            bottom_names = ["relu1"],
-                            top_names = ["dropout1"],
-                            dropout_rate=0.5))
-
-
-
-
-
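Dropout is commonly implemented as "inverted dropout", which rescales the kept elements during training so that the expected value is unchanged. Whether HugeCTR applies exactly this scaling is an assumption here; the plain-Python sketch below only illustrates the technique and what dropout_rate controls. It restricts dropout_rate to [0, 1) to avoid dividing by zero.

```python
import random


def dropout(values, dropout_rate, rng, training=True):
    """Inverted dropout over a flat list of floats."""
    if not 0.0 <= dropout_rate < 1.0:
        raise ValueError("this sketch requires 0 <= dropout_rate < 1")
    if not training or dropout_rate == 0.0:
        return list(values)  # identity at inference time or when nothing is dropped
    keep_scale = 1.0 / (1.0 - dropout_rate)
    return [0.0 if rng.random() < dropout_rate else v * keep_scale for v in values]


rng = random.Random(0)
print(dropout([1.0, 2.0, 3.0, 4.0], 0.5, rng))  # each entry is 0.0 or doubled
```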

ELU Layer

-

The ELU layer represents the Exponential Linear Unit.

-

Parameter:

-
    -
  • elu_alpha: Float, the scalar that determines the value to which the ELU function saturates for negative inputs. The default value is 1.

  • -
-

Input and Output Shapes:

-
    -
  • input: (batch_size, *) where * represents any number of elements

  • -
  • output: same as input

  • -
-

Example:

-
model.add(hugectr.DenseLayer(layer_type = hugectr.Layer_t.ELU,
-                            bottom_names = ["fc1"],
-                            top_names = ["elu1"],
-                            elu_alpha=1.0))
-
-
-
-
-
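The standard ELU definition makes the role of elu_alpha concrete: for large negative inputs the output approaches -elu_alpha. A minimal sketch of the formula (not HugeCTR's CUDA kernel):

```python
import math


def elu(x, elu_alpha=1.0):
    """ELU(x) = x for x > 0, elu_alpha * (exp(x) - 1) otherwise."""
    return x if x > 0 else elu_alpha * math.expm1(x)


print(elu(2.0))         # 2.0
print(elu(-50.0, 0.5))  # ≈ -0.5, the saturation value -elu_alpha
```

math.expm1 is used instead of exp(x) - 1 because it stays accurate for inputs close to zero.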

ReLU Layer

-

The ReLU layer represents the Rectified Linear Unit.

-

Input and Output Shapes:

-
    -
  • input: (batch_size, *) where * represents any number of elements

  • -
  • output: same as input

  • -
-

Example:

-
model.add(hugectr.DenseLayer(layer_type = hugectr.Layer_t.ReLU,
-                            bottom_names = ["fc1"],
-                            top_names = ["relu1"]))
-
-
-
-
-

Sigmoid Layer

-

The Sigmoid layer represents the Sigmoid Unit.

-

Input and Output Shapes:

-
    -
  • input: (batch_size, *) where * represents any number of elements

  • -
  • output: same as input

  • -
-

Example:

-
model.add(hugectr.DenseLayer(layer_type = hugectr.Layer_t.Sigmoid,
-                            bottom_names = ["fc1"],
-                            top_names = ["sigmoid1"]))
-
-
-

Note: The final sigmoid function is fused with the loss function to better utilize memory bandwidth, so do NOT add a Sigmoid layer before the loss layer.

-
-
-

Interaction Layer

-

The interaction layer is used to explicitly capture second-order interactions between features.

-

Parameters: None

-

Input and Output Shapes:

-
    -
  • input: {(batch_size, num_elems), (batch_size, num_feas, num_elems)} where the first tensor typically represents a fully connected layer and the second is an embedding.

  • -
  • output: (batch_size, output_dim) where output_dim = num_elems + (num_feas + 1) * (num_feas + 2 ) / 2 - (num_feas + 1) + 1

  • -
-

Example:

-
model.add(hugectr.DenseLayer(layer_type = hugectr.Layer_t.Interaction,
-                            bottom_names = ["layer1", "layer3"],
-                            top_names = ["interaction1"]))
-
-
-

Important Notes: There are optimizations that can be applied to the Interaction layer and the following MLP layer during fp16 training. In this case, specify two output tensor names for the Interaction layer and use them as the input tensors for the following MLP layer. Refer to the example of the MLP layer for the detailed usage.

-
-
-
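The output_dim formula above simplifies to num_elems + (num_feas + 1) * num_feas / 2 + 1: the dense vector itself, one dot product for each unordered pair of the num_feas + 1 vectors, plus one extra element. The DLRM-style sketch below shows where each term comes from. The ordering of the pairwise terms and treating the trailing +1 as a single zero padding element are assumptions, and the sketch requires the dense vector and the embeddings to share the same length num_elems so that dot products are defined.

```python
def interaction(dense, embeddings):
    """DLRM-style interaction for one sample.

    dense: list of num_elems floats (e.g. the output of a fully connected layer).
    embeddings: num_feas lists, each of num_elems floats.
    Returns dense + pairwise dot products of all (num_feas + 1) vectors + one zero pad.
    """
    vectors = [dense] + embeddings
    dots = [
        sum(a * b for a, b in zip(vectors[i], vectors[j]))
        for i in range(len(vectors))
        for j in range(i)  # strictly lower triangle: each unordered pair once
    ]
    return dense + dots + [0.0]


def interaction_output_dim(num_elems, num_feas):
    return num_elems + (num_feas + 1) * (num_feas + 2) // 2 - (num_feas + 1) + 1


print(interaction_output_dim(128, 26))  # 480
```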

Add Layer

-

The Add layer adds up an arbitrary number of tensors that have the same size in an element-wise manner.

-

Parameters: None

-

Input and Output Shapes:

-
    -
  • input: Nx(batch_size, num_elems) where N is the number of input tensors

  • -
  • output: (batch_size, num_elems)

  • -
-

Example:

-
model.add(hugectr.DenseLayer(layer_type = hugectr.Layer_t.Add,
-                            bottom_names = ["fc4", "reducesum1", "reducesum2"],
-                            top_names = ["add"]))
-
-
-
-
-

ReduceSum Layer

-

The ReduceSum Layer sums up all the elements across a specified dimension.

-

Parameter:

-
    -
  • axis: Integer, the dimension to reduce for the ReduceSum layer. If the input is N-dimensional, 0 <= axis < N. The default value is 1.

  • -
-

Input and Output Shapes:

-
    -
  • input: (batch_size, …) where … represents any number of elements with an arbitrary number of dimensions

  • -
  • output: Dimension corresponding to axis is set to 1. The others remain the same as the input.

  • -
-

Example:

-
model.add(hugectr.DenseLayer(layer_type = hugectr.Layer_t.ReduceSum,
-                            bottom_names = ["fmorder2"],
-                            top_names = ["reducesum1"],
-                            axis=1))
-
-
-
-

GRU Layer

-

The GRU layer represents the Gated Recurrent Unit.

-

Parameters:

-
    -
  • num_output: Number of output elements.

  • -
  • batchsize: The batch size.

  • -
  • SeqLength: Length of the sequence.

  • -
  • vector_size: The size of the input vector.

  • -
  • weight_init_type: Specifies how to initialize the weight array. The supported types include hugectr.Initializer_t.Default, hugectr.Initializer_t.Uniform, hugectr.Initializer_t.XavierNorm, hugectr.Initializer_t.XavierUniform and hugectr.Initializer_t.Zero. The default value is hugectr.Initializer_t.Default.

  • -
  • bias_init_type: Specifies how to initialize the bias array. The supported types include hugectr.Initializer_t.Default, hugectr.Initializer_t.Uniform, hugectr.Initializer_t.XavierNorm, hugectr.Initializer_t.XavierUniform and hugectr.Initializer_t.Zero. The default value is hugectr.Initializer_t.Default.

  • -
-

Input and Output Shapes:

-
    -
  • input: (1, batch_size * SeqLength * embedding_vec_size)

  • -
  • output: (1, batch_size * SeqLength * embedding_vec_size)

  • -
-

Example:

-
model.add(hugectr.DenseLayer(layer_type = hugectr.Layer_t.GRU,
-                            bottom_names = ["GRU1"],
-                            top_names = ["conncat1"],
-                            num_output=256,
-                            batchsize=13,
-                            SeqLength=20,
-                            vector_size=20))
-
-
-
-
-

PReLUDice Layer

-

The PReLUDice layer represents the Parametric Rectified Linear Unit, which adaptively adjusts the rectified point according to the distribution of the input data.

-

Parameters:

-
    -
  • elu_alpha: A scalar that decides the value where this activation function saturates for negative values.

  • -
  • eps: Epsilon value used in the PReLU/Dice formula.

  • -
-

Input and Output Shapes:

-
    -
  • input: (batch_size, *) where * represents any number of elements

  • -
  • output: same as input

  • -
-

Example:

-
model.add(hugectr.DenseLayer(layer_type = hugectr.Layer_t.PReLU_Dice,
-                            bottom_names = ["fc_din_i1"],
-                            top_names = ["dice_1"],
-                            elu_alpha=0.2, eps=1e-8))
-
-
-
-
-
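The Dice activation, as formulated in the Deep Interest Network (DIN) paper, gates each input with p(x), a sigmoid of the input normalized by batch statistics. The sketch below assumes that HugeCTR's elu_alpha plays the role of DIN's alpha and that the statistics are taken over the batch for each feature; both are assumptions. eps keeps the denominator away from zero.

```python
import math


def dice(column, elu_alpha, eps=1e-8):
    """Dice activation applied to one feature over a batch.

    p(x) = sigmoid((x - mean) / sqrt(var + eps))
    f(x) = p(x) * x + (1 - p(x)) * elu_alpha * x
    """
    n = len(column)
    mean = sum(column) / n
    var = sum((x - mean) ** 2 for x in column) / n
    out = []
    for x in column:
        p = 1.0 / (1.0 + math.exp(-(x - mean) / math.sqrt(var + eps)))
        out.append(p * x + (1.0 - p) * elu_alpha * x)
    return out


# Identical inputs give p = 0.5, so each output is 0.5 * 1 + 0.5 * 0.2 * 1 = 0.6.
print(dice([1.0, 1.0, 1.0], elu_alpha=0.2))
```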

Scale Layer

-

The Scale layer scales the input 2D tensor to a specific size along the designated axis.

-

Parameters:

-
    -
  • axis: The axis along which to scale the tensor. It can be 0 or 1.

  • -
  • factor: The scale factor.

  • -
-

Input and Output Shapes:

-
    -
  • input: (batch_size, num_elems)

  • -
  • output: (batch_size, num_elems * factor) if axis = 0; (batch_size * factor, num_elems) if axis = 1

  • -
-

Example:

-
model.add(hugectr.DenseLayer(layer_type = hugectr.Layer_t.Scale,
-                            bottom_names = ["item1"],
-                            top_names = ["Scale_item"],
-                            axis = 1, factor = 10))
-
-
-
-
-

FusedReshapeConcat Layer

-

The FusedReshapeConcat layer cross-combines the input tensors and outputs an item tensor and an AD tensor.

-

Parameters: None

-

Input and Output Shapes:

-
    -
  • input: {(batch_size, num_feas + 1, num_elems_0), (batch_size, num_feas + 1, num_elems_1), …}, the input tensors are embeddings.

  • -
  • output: {(batch_size x num_feas, (num_elems_0 + num_elems_1 + …)), (batch_size, (num_elems_0 + num_elems_1 + …))}.

  • -
-

Example:

-
model.add(hugectr.DenseLayer(layer_type = hugectr.Layer_t.FusedReshapeConcat,
-                            bottom_names = ["sparse_embedding_good", "sparse_embedding_cate"],
-                            top_names = ["FusedReshapeConcat_item_his_em", "FusedReshapeConcat_item"]))
-
-
-
-
-

FusedReshapeConcatGeneral Layer

-

The FusedReshapeConcatGeneral layer cross-combines the input tensors and outputs a single concatenated tensor.

-

Parameters: None

-

Input and Output Shapes:

-
    -
  • input: {(batch_size, num_feas, num_elems_0), (batch_size, num_feas, num_elems_1), …}, the input tensors are embeddings.

  • -
  • output: (batch_size x num_feas, (num_elems_0 + num_elems_1 + …)).

  • -
-

Example:

-
model.add(hugectr.DenseLayer(layer_type = hugectr.Layer_t.FusedReshapeConcatGeneral,
-                            bottom_names = ["sparse_embedding_good", "sparse_embedding_cate"],
-                            top_names = ["FusedReshapeConcat_item_his_em"]))
-
-
-
-
-

Softmax Layer

-

The Softmax layer computes softmax activations. When the Softmax layer accepts two input tensors, the first is the tensor to which softmax is applied, and the second is a mask that masks some positions of the first tensor (by setting them to -10000) before the softmax step.

-

Parameter: None

-

Input and Output Shapes:

-
    -
  • input: (batch_size, num_elems)

  • -
  • output: same as input

  • -
  • input: (batch_size, num_attention_heads, seq_len, seq_len) and (batch_size, 1, 1, seq_len) when a mask tensor is provided

  • -
  • output: same as input

  • -
-

Example:

-
model.add(hugectr.DenseLayer(layer_type = hugectr.Layer_t.Softmax,
-                            bottom_names = ["reshape1"],
-                            top_names = ["softmax_i"]))
-
-
-
-
-
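The two-input form can be sketched for a single row in plain Python: masked positions are replaced by -10000 before a numerically stable softmax, so their probabilities become effectively zero. The convention that a mask value of 0 marks a masked position and 1 a kept position is an assumption; masked_softmax is a hypothetical helper.

```python
import math


def masked_softmax(logits, mask=None):
    """Softmax over one row; positions where mask == 0 are set to -10000 first."""
    if mask is not None:
        logits = [x if m else -10000.0 for x, m in zip(logits, mask)]
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


probs = masked_softmax([1.0, 2.0, 3.0], mask=[1, 1, 0])
print([round(p, 4) for p in probs])  # [0.2689, 0.7311, 0.0]
```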

Sub Layer

-

The Sub layer takes two tensors, x and y, of the same size and computes x - y element-wise.

-

Parameters: None

-

Input and Output Shapes:

-
    -
  • input: 2x(batch_size, num_elems), that is, the x and y tensors

  • -
  • output: (batch_size, num_elems)

  • -
-

Example:

-
model.add(hugectr.DenseLayer(layer_type = hugectr.Layer_t.Sub,
-                            bottom_names = ["Scale_item1", "item_his1"],
-                            top_names = ["sub_ih"]))
-
-
-
-
-

ReduceMean Layer

-

The ReduceMean Layer computes the mean of elements across a specified dimension.

-

Parameter:

-
    -
  • axis: The dimension to reduce. If the input is N-dimensional, 0 <= axis < N.

  • -
-

Input and Output Shapes:

-
    -
  • input: (batch_size, …) where … represents any number of elements with an arbitrary number of dimensions

  • -
  • output: Dimension corresponding to axis is set to 1. The others remain the same as the input.

  • -
-

Example:

-
model.add(hugectr.DenseLayer(layer_type = hugectr.Layer_t.ReduceMean,
-                            bottom_names = ["fmorder2"],
-                            top_names = ["reducemean1"],
-                            axis=1))
-
-
-
-
-
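Both ReduceSum and ReduceMean keep the reduced dimension with size 1, as their output shapes state. A pure-Python sketch with nested lists standing in for tensors; reduce_axis is a hypothetical helper, not a HugeCTR function.

```python
def reduce_axis(tensor, axis, mean=False):
    """Sum (or average) a nested-list tensor along `axis`, keeping it with size 1."""
    if axis == 0:
        acc = tensor[0]
        for sub in tensor[1:]:
            acc = _add(acc, sub)
        return [_div(acc, len(tensor)) if mean else acc]
    return [reduce_axis(sub, axis - 1, mean) for sub in tensor]


def _add(a, b):
    return [_add(x, y) for x, y in zip(a, b)] if isinstance(a, list) else a + b


def _div(a, n):
    return [_div(x, n) for x in a] if isinstance(a, list) else a / n


x = [[[1, 2], [3, 4], [5, 6]]]                # shape (1, 3, 2)
print(reduce_axis(x, axis=1))                 # [[[9, 12]]], shape (1, 1, 2)
print(reduce_axis(x, axis=1, mean=True))      # [[[3.0, 4.0]]]
```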

MatrixMultiply Layer

-

The MatrixMultiply layer is a binary operation that produces a matrix output from two matrix inputs by performing matrix multiplication.

-

Parameters: None

-

Input and Output Shapes:

-

The following shape configurations are supported:

-
    -
  • input: 2D x 2D (m, n)x(n, k) and the output will be 2D (m,k)

  • -
  • input: 3D x 3D (batch_size, m, n)x(batch_size, n, k) and the output will be 3D (batch_size, m, k)

  • -
  • input: 2D x 3D (batch_size, m)x(m, g, h) and the output will be 3D (batch_size, g, h)

  • -
-

Example:

-
model.add(hugectr.DenseLayer(layer_type = hugectr.Layer_t.MatrixMultiply,
-                            bottom_names = ["slice1","slice2"],
-                            top_names = ["MatrixMutiply1"]))
-
-
-
-
-
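The shape rules for the three supported configurations can be encoded in a small shape-inference helper, together with a plain 2D matrix product for reference. The 2D x 3D case behaves as if the second operand were viewed as an (m, g * h) matrix; that reading is inferred from the listed shapes, not taken from HugeCTR's source. Both functions are hypothetical.

```python
def matmul_output_shape(a, b):
    """Output shape for the three input configurations listed above."""
    if len(a) == 2 and len(b) == 2 and a[1] == b[0]:
        return (a[0], b[1])               # (m, n) x (n, k) -> (m, k)
    if len(a) == 3 and len(b) == 3 and a[0] == b[0] and a[2] == b[1]:
        return (a[0], a[1], b[2])         # batched (m, n) x (n, k)
    if len(a) == 2 and len(b) == 3 and a[1] == b[0]:
        return (a[0], b[1], b[2])         # (batch_size, m) x (m, g, h)
    raise ValueError("unsupported shapes %s x %s" % (a, b))


def matmul_2d(x, y):
    """Plain 2D matrix product of nested lists."""
    return [[sum(x[i][p] * y[p][j] for p in range(len(y))) for j in range(len(y[0]))]
            for i in range(len(x))]


print(matmul_output_shape((32, 16, 8), (32, 8, 4)))   # (32, 16, 4)
print(matmul_2d([[1, 2], [3, 4]], [[5, 6], [7, 8]]))  # [[19, 22], [43, 50]]
```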

MultiHeadAttention Layer

-

The MultiHeadAttention layer produces a matrix output from three matrix inputs (plus an optional mask) by performing matrix multiplications. The formula is as follows:

$$
\mathbf{O} = \operatorname{softmax}\left(s \cdot (\mathbf{Q} \cdot \mathbf{K}^{\top}) \odot \mathbf{M}\right) \cdot \mathbf{V}
$$

where \(\mathbf{Q}\), \(\mathbf{K}\), and \(\mathbf{V}\) are 3D inputs and \(\mathbf{O}\) is the 3D output. \(\odot\) denotes the element-wise product, \(\cdot\) denotes matrix multiplication, and \(s\) is a scaling factor. \(\mathbf{M}\) masks out padded input positions that arise because sequences have different lengths. Refer to Attention Is All You Need for more details.

Parameter:

-
    -
  • num_attention_heads: The number of attention heads. Default value is 1.

  • -
-

Input and Output Shapes:

-
    -
  • input:

    -
      -
    • \(Q\): (batch_size, seq_from, hidden_dim),

    • -
    • \(K\): (batch_size, seq_to, hidden_dim),

    • -
    • \(V\): (batch_size, seq_to, hidden_dim)

    • -
    • \(M\) (optional): (batch_size, 1, seq_from, seq_to)

    • -
    -
  • -
  • output:

    -
      -
    • \(O\): (batch_size, seq_from, hidden_dim)

    • -
    -
  • -
-

Example:

-
model.add(
-    hugectr.DenseLayer(
-        layer_type=hugectr.Layer_t.MultiHeadAttention,
-        bottom_names=["query", "key", "value", "mask"],
-        top_names=["attention_out"],
-        num_attention_heads=4,
-    )
-)
-
-
-
-
-
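The formula can be traced with a single-head, single-batch-element sketch in plain Python. It applies the mask with an element-wise product exactly as the formula is written; note that this sets masked scores to 0 rather than to a large negative number, whereas the Softmax layer above uses -10000 for masking. The default s = 1/sqrt(hidden_dim) follows the Transformer convention and is an assumption, as is leaving out how num_attention_heads splits hidden_dim.

```python
import math


def attention(Q, K, V, M=None, s=None):
    """O = softmax(s * (Q @ K^T) * M) @ V for one batch element and one head.

    Q: seq_from x hidden, K and V: seq_to x hidden, M: seq_from x seq_to (0/1).
    """
    hidden = len(Q[0])
    if s is None:
        s = 1.0 / math.sqrt(hidden)  # Transformer convention; assumed here
    out = []
    for i, q in enumerate(Q):
        scores = [s * sum(a * b for a, b in zip(q, k)) for k in K]
        if M is not None:
            # Element-wise product with the mask, as in the formula above.
            scores = [x * m for x, m in zip(scores, M[i])]
        peak = max(scores)
        w = [math.exp(x - peak) for x in scores]
        total = sum(w)
        w = [x / total for x in w]
        out.append([sum(w[t] * V[t][d] for t in range(len(V))) for d in range(len(V[0]))])
    return out


Q = [[1.0, 0.0], [0.0, 1.0]]
K = [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0]]
V = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
# All-zero keys give uniform weights, so every output row is the mean of V's rows.
print([[round(v, 6) for v in row] for row in attention(Q, K, V)])  # [[3.0, 4.0], [3.0, 4.0]]
```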

SequenceMask Layer

-

The SequenceMask layer generates a binary padding mask that marks the zero-padding positions in the input with 0. The padding mask makes sure that these zero values are not processed along with the actual input values.

-

Parameter:

-
    -
  • max_sequence_len_from: The maximum length of query sequences. Default value is 1.

  • -
  • max_sequence_len_to: The maximum length of key sequences. Default value is 1.

  • -
-

Input and Output Shapes:

-
    -
  • input: 2D: (batch_size, 1), (batch_size, 1)

  • -
  • output: 4D: (batch_size, 1, max_sequence_len_from, max_sequence_len_to)

  • -
-

Example:

-
model.add(hugectr.DenseLayer(layer_type=hugectr.Layer_t.SequenceMask,
-                             bottom_names=["dense","dense"],
-                             top_names=["sequence_mask"],
-                             max_sequence_len_from=10,
-                             max_sequence_len_to=10,))
-
-
-
-
-
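One plausible reading of the two (batch_size, 1) inputs is that they hold the valid lengths of the query and key sequences for each sample. Under that assumption, the 4D mask can be sketched as follows, where position (i, j) is kept only when both i and j fall inside the valid lengths. sequence_mask is a hypothetical helper, and the AND combination is an assumption, not taken from HugeCTR's source.

```python
def sequence_mask(len_from, len_to, max_sequence_len_from, max_sequence_len_to):
    """Build a (batch_size, 1, max_from, max_to) 0/1 padding mask from per-sample lengths.

    Position (i, j) is 1 only when both i < len_from[b] and j < len_to[b].
    """
    return [
        [[[1 if i < lf and j < lt else 0 for j in range(max_sequence_len_to)]
          for i in range(max_sequence_len_from)]]
        for lf, lt in zip(len_from, len_to)
    ]


mask = sequence_mask([2], [3], max_sequence_len_from=3, max_sequence_len_to=4)
for row in mask[0][0]:
    print(row)
# [1, 1, 1, 0]
# [1, 1, 1, 0]
# [0, 0, 0, 0]
```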

Gather Layer

-

The Gather layer gathers multiple slices from an input tensor. Each slice spans the last dimension, so the output contains one row of num_elems elements per index.

-

Parameter:

-
    -
  • indices: A list of indices, each of which selects a slice of the input tensor to include in the output tensor. For example, [2, 8] selects the slices at index 2 and index 8 of the input tensor.

  • -
-

Input and Output Shapes:

-
    -
  • input: (batch_size, num_elems)

  • -
  • output: (num_indices, num_elems)

  • -
-

Example:

-
model.add(hugectr.DenseLayer(layer_type = hugectr.Layer_t.Gather,
-                            bottom_names = ["reshape1"],
-                            top_names = ["gather1"],
-                            indices=[1,3,5]))
-
-
-
-
-
-
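Following the documented shapes, (batch_size, num_elems) in and (num_indices, num_elems) out, the Gather layer can be sketched as picking whole rows. Zero-based indexing is an assumption; gather is a hypothetical helper.

```python
def gather(rows, indices):
    """Pick whole rows (slices of length num_elems) at the given indices."""
    for i in indices:
        if not 0 <= i < len(rows):
            raise IndexError("index %d out of range" % i)
    return [list(rows[i]) for i in indices]


reshape1 = [[10 * r + c for c in range(4)] for r in range(6)]  # (6, 4)
gather1 = gather(reshape1, [1, 3, 5])                           # (3, 4)
print(gather1)  # [[10, 11, 12, 13], [30, 31, 32, 33], [50, 51, 52, 53]]
```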

BinaryCrossEntropyLoss

-

BinaryCrossEntropyLoss calculates loss from labels and predictions where each label is binary. The final sigmoid function is fused with the loss function to better utilize memory bandwidth.

-

Parameter:

-
    -
  • use_regularizer: Boolean, whether to use regularizers. The default value is False.

  • -
  • regularizer_type: The regularizer type for the BinaryCrossEntropyLoss, CrossEntropyLoss or MultiCrossEntropyLoss layer. The supported types include hugectr.Regularizer_t.L1 and hugectr.Regularizer_t.L2. It will be ignored if use_regularizer is False. The default value is hugectr.Regularizer_t.L1.

  • -
  • lambda: Float, the lambda value of the regularization term. It will be ignored if use_regularizer is False. The default value is 0.

  • -
-

Input and Output Shapes:

-
    -
  • input: [(batch_size, 1), (batch_size, 1)] where the first tensor represents the predictions while the second tensor represents the labels

  • -
  • output: (batch_size, 1)

  • -
-

Example:

-
model.add(hugectr.DenseLayer(layer_type = hugectr.Layer_t.BinaryCrossEntropyLoss,
-                            bottom_names = ["add", "label"],
-                            top_names = ["loss"]))
-
-
-
-
-
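Because the sigmoid is fused into the loss, the prediction input is a raw logit rather than a probability. The standard numerically stable form of binary cross entropy on logits, max(x, 0) - x * y + log(1 + exp(-|x|)), shows why fusing helps: exp never overflows, even for large logits. This is a sketch of the well-known formula; that HugeCTR uses exactly this rearrangement, and that per-sample losses are averaged over the batch as in mean_bce, are assumptions.

```python
import math


def bce_with_logits(logit, label):
    """Binary cross entropy of sigmoid(logit) against a 0/1 label, computed stably."""
    return max(logit, 0.0) - logit * label + math.log1p(math.exp(-abs(logit)))


def mean_bce(logits, labels):
    """Average the per-sample losses over the batch."""
    return sum(bce_with_logits(x, y) for x, y in zip(logits, labels)) / len(logits)


print(bce_with_logits(0.0, 1.0))    # log(2) ≈ 0.6931
print(bce_with_logits(100.0, 0.0))  # 100.0, no overflow from exp(100)
```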

CrossEntropyLoss

-

CrossEntropyLoss calculates the loss from labels and predictions between the forward and backward propagation phases. It assumes that each label is two-dimensional.

-

Parameter:

-
    -
  • use_regularizer: Boolean, whether to use regularizers. The default value is False.

  • -
  • regularizer_type: The regularizer type for the BinaryCrossEntropyLoss, CrossEntropyLoss or MultiCrossEntropyLoss layer. The supported types include hugectr.Regularizer_t.L1 and hugectr.Regularizer_t.L2. It will be ignored if use_regularizer is False. The default value is hugectr.Regularizer_t.L1.

  • -
  • lambda: Float, the lambda value of the regularization term. It will be ignored if use_regularizer is False. The default value is 0.

  • -
-

Input and Output Shapes:

-
    -
  • input: [(batch_size, 2), (batch_size, 2)] where the first tensor represents the predictions while the second tensor represents the labels

  • -
  • output: (batch_size, 2)

  • -
-

Example:

-
model.add(hugectr.DenseLayer(layer_type = hugectr.Layer_t.CrossEntropyLoss,
-                            bottom_names = ["add", "label"],
-                            top_names = ["loss"],
-                            use_regularizer = True,
-                            regularizer_type = hugectr.Regularizer_t.L2,
-                            # lambda is a Python keyword, so it is passed through dict unpacking.
-                            **{"lambda": 0.1}))
-
-
-
-
-

MultiCrossEntropyLoss

-

MultiCrossEntropyLoss calculates the loss from labels and predictions between the forward and backward propagation phases. It allows labels of arbitrary dimension, but all labels must have the same shape.

-

Parameter:

-
    -
  • use_regularizer: Boolean, whether to use regularizers. The default value is False.

  • -
  • regularizer_type: The regularizer type for the BinaryCrossEntropyLoss, CrossEntropyLoss or MultiCrossEntropyLoss layer. The supported types include hugectr.Regularizer_t.L1 and hugectr.Regularizer_t.L2. It will be ignored if use_regularizer is False. The default value is hugectr.Regularizer_t.L1.

  • -
  • lambda: Float, the lambda value of the regularization term. It will be ignored if use_regularizer is False. The default value is 0.

  • -
-

Input and Output Shapes:

-
    -
  • input: [(batch_size, *), (batch_size, *)] where the first tensor represents the predictions while the second tensor represents the labels. * represents any even number of elements.

  • -
  • output: (batch_size, *)

  • -
-

Example:

-
model.add(hugectr.DenseLayer(layer_type = hugectr.Layer_t.MultiCrossEntropyLoss,
-                            bottom_names = ["add", "label"],
-                            top_names = ["loss"],
-                            use_regularizer = True,
-                            regularizer_type = hugectr.Regularizer_t.L1,
-                            **{"lambda": 0.1}  # lambda is a Python keyword, so pass it through dict unpacking
-                            ))
-
-
-
-
-
-

Embedding Collection

-
-

About the HugeCTR embedding collection

-

Embedding collection was introduced in the v3.7 release. The embedding collection enables you to use embeddings with different vector sizes, optimizers, and an arbitrary table placement strategy. Compared with the hugectr.SparseEmbedding class, the embedding collection has three key advantages:

-
    -
  1. The embedding collection can fuse embedding tables with different embedding vector sizes. The previous embedding could only fuse embedding tables with the same embedding vector size. This enhancement boosts both flexibility and performance.

  2. The embedding collection extends the functionality of embedding by supporting the concat combiner and different lookups on the same embedding table.

  3. The embedding collection supports arbitrary embedding table placement, such as data parallel and model parallel.

-
-
-

Overview of using the HugeCTR embedding collection

-

To use an embedding collection, you need the following items:

-
    -
  • A list of hugectr.EmbeddingTableConfig objects that represent the embedding tables. For each table, you configure the name, max_vocabulary_size, ev_size, and, optionally, the optimizer.

  • -
  • A hugectr.EmbeddingCollectionConfig object that uses the embedding table config objects to organize the lookup operations between the input data and the embedding tables. It also provides a method to configure the table placement strategy.

  • -
-

You can use the add() method from hugectr.Model to use the embedding collection for training and evaluation.

-
-
-

Known Limitations

-
    -
  1. Only embedding_vec_size values of up to 256 are currently supported in the embedding collection.

  2. If you use a dynamic hash table (by setting max_vocabulary_size to -1 in hugectr.EmbeddingTableConfig), we recommend setting the NCCL_LAUNCH_MODE=GROUP environment variable to avoid potential hangs.

  3. Mixed-precision training is not supported when using a dynamic hash table.

-
-
-

EmbeddingTableConfig

-

The hugectr.EmbeddingTableConfig class enables you to specify the attributes of an embedding table.

-

Parameter:

-
    -
  • name: String, the name used when dumping and loading the embedding table.

  • -
  • max_vocabulary_size: Integer, the vocabulary size of this table. If positive, the value indicates the number of embedding vectors that this table contains. If you specify the value incorrectly and it is exceeded during training or evaluation, an overflow occurs and you receive an error. If you do not know the exact size of the embedding table, you can specify -1 to use a dynamic hash embedding table whose size can be extended dynamically during training or evaluation.

  • -
  • ev_size: Integer, the size of each embedding vector in this table.

  • -
  • opt_params: Optional, hugectr.Optimizer, the optimizer to use for this embedding table. If not specified, the embedding table uses the optimizer specified in hugectr.Model. Currently, if max_vocabulary_size is greater than 0, the supported optimizer types are SGD and AdaGrad. If max_vocabulary_size is -1, a dynamic hash embedding table is used, and the supported optimizer types are SGD, MomentumSGD, Nesterov, AdaGrad, RMSProp, Adam, and Ftrl.

  • -
-

Example:

-
# Create the embedding table.
-slot_size_array = [203931, 18598, 14092, 7012, 18977, 4, 6385, 1245, 49,
-                   186213, 71328, 67288, 11, 2168, 7338, 61, 4, 932, 15,
-                   204515, 141526, 199433, 60919, 9137, 71, 34]
-embedding_table_list = []
-for i in range(len(slot_size_array)):
-    embedding_table_list.append(
-        hugectr.EmbeddingTableConfig(
-            name="table_" + str(i),
-            max_vocabulary_size=slot_size_array[i],
-            ev_size=128,
-        )
-    )
-
-
-
-
-
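The constraints scattered across opt_params and the Known Limitations can be collected into a pre-flight check in plain Python. check_table_config is hypothetical and not part of the HugeCTR API; it represents optimizers by name strings for illustration, whereas the real parameter takes a hugectr.Optimizer object.

```python
STATIC_TABLE_OPTIMIZERS = {"SGD", "AdaGrad"}
DYNAMIC_TABLE_OPTIMIZERS = {"SGD", "MomentumSGD", "Nesterov", "AdaGrad", "RMSProp", "Adam", "Ftrl"}


def check_table_config(max_vocabulary_size, ev_size, optimizer=None, mixed_precision=False):
    """Return a list of documented-constraint violations (empty if none)."""
    problems = []
    if ev_size > 256:
        problems.append("ev_size above 256 is not supported")
    if max_vocabulary_size == -1:  # dynamic hash table
        allowed = DYNAMIC_TABLE_OPTIMIZERS
        if mixed_precision:
            problems.append("mixed precision is not supported with a dynamic hash table")
    elif max_vocabulary_size > 0:  # fixed-size table
        allowed = STATIC_TABLE_OPTIMIZERS
    else:
        return problems + ["max_vocabulary_size must be positive or -1"]
    if optimizer is not None and optimizer not in allowed:
        problems.append("optimizer %s is not supported for this table type" % optimizer)
    return problems


print(check_table_config(203931, 128, "AdaGrad"))  # []
print(check_table_config(-1, 128, "Adam", mixed_precision=True))
# ['mixed precision is not supported with a dynamic hash table']
```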

EmbeddingCollectionConfig

-

Create a hugectr.EmbeddingCollectionConfig instance to construct the lookup operation and configure the table placement strategy.

-

Parameter:

-
    -
  • use_exclusive_keys: bool, if true, any key is exclusively owned by only one table.

  • -
  • comm_strategy: hugectr.CommunicationStrategy, can be hugectr.CommunicationStrategy.Uniform or hugectr.CommunicationStrategy.Hierarchical.

  • -
-
-

embedding_lookup method

-

The embedding_lookup method enables you to specify the lookup operations between the input data and an embedding table.

-

Parameter:

-
    -
  • table_config : hugectr.EmbeddingTableConfig, the embedding table for the lookup operation.

  • -
  • bottom_name: str, the bottom tensor name. Specify a tensor that is compatible with the data_reader_sparse_param_array parameter of hugectr.Input in hugectr.Model.

  • -
  • top_name: str, the output tensor name. The shape of the output tensor is (<batch size>, 1, <embedding vector size>).

  • -
  • combiner: str, specifies the combiner operation: mean, sum, or concat.

  • -
-

The embedding collection also supports a batch-major output, which you configure by passing lists as the arguments of embedding_lookup.

-

Parameter:

-
    -
  • table_config: list of hugectr.EmbeddingTableConfig, the embedding tables for the lookup operation.

  • -
  • bottom_name: list of str, the bottom tensor names. Specify tensors that are compatible with the data_reader_sparse_param_array parameter of hugectr.Input in hugectr.Model.

  • -
  • top_name: str, the output tensor name. The shape of the output tensor is (<batch size>, sum of all <embedding vector size>).

  • -
  • combiner: list of str, specifies the combiner operation for each lookup: mean, sum, or concat.

  • -
-
-
-

shard method

-

In a recommendation system, the embedding tables are usually so large that a single GPU cannot hold all of them. One strategy for addressing this challenge is sharding, which distributes the embedding tables across multiple GPUs. We call this sharding strategy the embedding table placement strategy (ETPS).

-

ETPS can significantly boost embedding performance because different sharding strategies lead to different communication between GPUs. The optimal strategy is highly dependent on your dataset and your lookup operations.

-

EmbeddingCollectionConfig provides the shard method so that you can configure the ETPS and adjust it to your own use case to achieve optimal performance.

-

Parameter:

-
    -
  • shard_matrix: list of list of str, a matrix with num_gpus rows, in which row i lists the names of the embedding tables to place on GPU i.

  • -
  • shard_strategy: list of tuple(str, list of str). In each tuple, the first element is the table placement strategy, which can be “mp” (model parallel) or “dp” (data parallel), and the second element is the list of table names to which that strategy applies. You can configure multiple table placement strategies, for example, [(“mp”, [“t0”, “t1”]), (“dp”, [“t2”, “t3”])]. Note that shard_strategy must be consistent with shard_matrix: a table that is “dp” sharded must be placed on every GPU. In addition, each table can be assigned only one shard strategy.

  • -
-

Example:

-
# create embedding table configs
-embedding_table_names = ["goods", "ads", "userID", "time"]
-embedding_table_list = []
-for name in embedding_table_names:
-    embedding_table_list.append(
-        hugectr.EmbeddingTableConfig(
-            name=name,
-            max_vocabulary_size=...,
-            ev_size=...,
-        )
-    )
-
-# create embedding collection config and configure one lookup per table
-NUM_TABLE = len(embedding_table_list)
-ebc_config = hugectr.EmbeddingCollectionConfig()
-ebc_config.embedding_lookup(
-    table_config=[embedding_table_list[i] for i in range(NUM_TABLE)],
-    bottom_name=["data{}".format(i) for i in range(NUM_TABLE)],
-    top_name="sparse_embedding",
-    combiner=["sum" for _ in range(NUM_TABLE)],
-)
-
-# configure the table placement strategy, suppose we have 4 GPUs
-shard_matrix = [
-    ["goods", "userID", "time"],
-    ["ads", "time"],
-    ["userID", "time"],
-    ["goods", "time"]
-]
-shard_strategy = [
-    ("mp", ["goods", "userID", "ads"]),
-    ("dp", ["time"]),
-]
-ebc_config.shard(shard_matrix=shard_matrix, shard_strategy=shard_strategy)
-
-
-
-
-
-
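Before calling shard, the consistency rules from the shard_strategy description can be checked in plain Python. check_sharding is a hypothetical helper, not part of the HugeCTR API. Besides the documented rules (a “dp” table on every GPU, at most one strategy per table), it also assumes that every table appearing in shard_matrix should be assigned a strategy.

```python
def check_sharding(shard_matrix, shard_strategy):
    """Check shard_matrix against shard_strategy; return a list of problems."""
    problems = []
    strategy_of = {}
    for strategy, tables in shard_strategy:
        if strategy not in ("mp", "dp"):
            problems.append("unknown strategy %r" % strategy)
        for table in tables:
            if table in strategy_of:
                problems.append("table %s has more than one shard strategy" % table)
            strategy_of[table] = strategy
    placed = {t for row in shard_matrix for t in row}
    for table in sorted(placed - set(strategy_of)):
        problems.append("table %s has no shard strategy" % table)
    for table, strategy in sorted(strategy_of.items()):
        if strategy == "dp" and not all(table in row for row in shard_matrix):
            problems.append("dp table %s must be placed on every GPU" % table)
    return problems


shard_matrix = [
    ["goods", "userID", "time"],
    ["ads", "time"],
    ["userID", "time"],
    ["goods", "time"],
]
shard_strategy = [("mp", ["goods", "userID", "ads"]), ("dp", ["time"])]
print(check_sharding(shard_matrix, shard_strategy))  # []
```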
\ No newline at end of file
diff --git a/review/pr-458/api/index.html b/review/pr-458/api/index.html
deleted file mode 100644
index 9b0b253173..0000000000
--- a/review/pr-458/api/index.html
+++ /dev/null
@@ -1,322 +0,0 @@

HugeCTR API Documentation

\ No newline at end of file
diff --git a/review/pr-458/api/python_interface.html b/review/pr-458/api/python_interface.html
deleted file mode 100644
index 13459a0364..0000000000
--- a/review/pr-458/api/python_interface.html
+++ /dev/null
@@ -1,1334 +0,0 @@

HugeCTR Python Interface

About the HugeCTR Python Interface

As a framework specific to the recommender-system domain, HugeCTR provides a set of high-level Python interfaces, including a training API and an inference API. Users can focus on algorithm design, while the training and inference jobs are automatically deployed on the specific hardware topology in an optimized manner. Since version 3.1, users can complete the entire training and inference workflow without manually writing JSON configuration files; all supported functionality has been wrapped into high-level Python APIs. The low-level training API is still maintained for users who want precise control over each training iteration and each evaluation step. However, the high-level training API is friendly to users who are already familiar with other deep learning frameworks such as Keras, and it is worth switching to it from the low-level training API. Refer to the HugeCTR Python Interface Notebook to get familiar with the workflow of HugeCTR training and inference. Many more demonstration samples are available in the samples directory of the HugeCTR repository.

High-level Training API

For the HugeCTR high-level training API, the core data structures are Solver, DataReaderParams, OptParamsPy, Input, SparseEmbedding, DenseLayer, and Model. You can create a Model instance with Solver, DataReaderParams, and OptParamsPy instances, and then add instances of Input, SparseEmbedding, or DenseLayer to it. After compiling the model with the Model.compile() method, you can start epoch-mode or non-epoch-mode training by calling the Model.fit() method. The Model.summary() method gives you an overview of the model structure. Other methods are also provided, such as saving the model graph to a JSON file, constructing the model graph from a saved JSON file, and loading model weights and optimizer status.

Solver

CreateSolver method

hugectr.CreateSolver()

CreateSolver returns a Solver object according to the custom argument values, which specify the training resources.

Arguments

  • model_name: String, the name of the model. The default value is an empty string. If you want to dump the model graph and save the model weights for inference, specify a unique value for each model that needs to be deployed.

  • seed: A random seed to be specified. The default value is 0.

  • lr_policy: The learning rate policy, which supports only fixed. The default value is LrPolicy_t.fixed.

  • lr: The learning rate, which is also the base learning rate for the learning rate scheduler. The default value is 0.001.

  • warmup_steps: The warmup steps for the internal learning rate scheduler within the Model instance. The default value is 1.

  • decay_start: The step at which the learning rate decay starts for the internal learning rate scheduler within the Model instance. The default value is 0.

  • decay_steps: The number of steps of the learning rate decay for the internal learning rate scheduler within the Model instance. The default value is 1.

  • decay_power: The power of the learning rate decay for the internal learning rate scheduler within the Model instance. The default value is 2.

  • end_lr: The final learning rate for the internal learning rate scheduler within the Model instance. The default value is 0. Refer to SGD Optimizer and Learning Rate Scheduling for detailed information about LearningRateScheduler.

  • max_eval_batches: The maximum number of batches used in evaluation. We recommend a value greater than or equal to the actual number of batches in the evaluation dataset. The default value is 100.

  • batchsize_eval: The minibatch size used in evaluation. The default value is 2048. Note that this is the global batch size across GPUs and nodes, not the per-worker batch size.

  • batchsize: The minibatch size used in training. The default value is 2048. Note that this is the global batch size across GPUs and nodes, not the per-worker batch size.

  • vvgpu: The GPU indices used in the training process, specified as a two-level list. For example, [[0,1],[1,2]] indicates that two physical nodes are used (each physical node can have multiple NUMA nodes): GPUs 0 and 1 are used on the first node, and GPUs 1 and 2 are used on the second node. It is also possible to specify non-contiguous GPU indices, such as [[0, 2, 4, 7]]. The default value is [[0]].

  • repeat_dataset: Whether to repeat the dataset for training. If the value is True, non-epoch mode training is employed. Otherwise, epoch mode training is adopted. The default value is True.

  • use_mixed_precision: Whether to enable mixed precision training. The default value is False.

  • enable_tf32_compute: If you want to accelerate FP32 matrix multiplications within the FullyConnectedLayer and InteractionLayer, set this value to True. The default value is False.

  • scaler: The scaler to be used when mixed precision training is enabled. Only 128, 256, 512, and 1024 scalers are supported for mixed precision training. The default value is 1.0, which corresponds to no mixed precision training.

  • metrics_spec: A map of enabled evaluation metrics. You can use AUC, AverageLoss, HitRate, or any combination of them. For AUC, you can set a threshold, such as {MetricsType.AUC: 0.8025}, so that training terminates when it reaches that threshold. The default value is {MetricsType.AUC: 1.0}. Multiple metrics can be specified in one job, for example: metrics_spec = {hugectr.MetricsType.HitRate: 0.8, hugectr.MetricsType.AverageLoss: 0.0, hugectr.MetricsType.AUC: 1.0}

  • i64_input_key: If your dataset format is Norm, you can choose the data type of each input key. For the Parquet format dataset generated by NVTabular, only I64 is allowed. For the Raw dataset format, only I32 is allowed. Set this value to True when you need to use I64 input keys. The default value is False.

  • use_algorithm_search: Whether to use algorithm search for cublasGemmEx within the FullyConnectedLayer. The default value is True.

  • use_cuda_graph: Whether to enable CUDA Graph in training. If you are using AsyncDataReader and HybridEmbedding, all GPU tasks inside each training iteration, including embeddings and the network, are packed into a single CUDA Graph. Otherwise, the CUDA Graph includes only the network. The default value is True.

  • device_layout: This option is deprecated and no longer used.

  • train_intra_iteration_overlap: Whether to enable overlap inside every training iteration. If True, HugeCTR detects the model topology and tries to overlap the DataReader, Embedding, and Network in every training iteration. The default value is False.

  • train_inter_iteration_overlap: Whether to enable overlap between training iterations. If True, HugeCTR tries to perform some data copies and computation for the next iteration during the current iteration, so that the next iteration can start earlier. The default value is False.

  • eval_intra_iteration_overlap: Whether to enable overlap inside every evaluation iteration. This knob provides functionality similar to train_intra_iteration_overlap, but applies to evaluation iterations. The default value is False.

  • eval_inter_iteration_overlap: Whether to enable overlap between evaluation iterations. This knob provides functionality similar to train_inter_iteration_overlap, but applies to evaluation iterations. The default value is False.

  • all_reduce_algo: The algorithm to be used for all-reduce. The supported options are AllReduceAlgo.OneShot and AllReduceAlgo.NCCL. The default value is AllReduceAlgo.NCCL. For multi-node training, AllReduceAlgo.OneShot requires RDMA support, while AllReduceAlgo.NCCL can run on both RDMA and non-RDMA hardware.

  • grouped_all_reduce: The default value is False. If True, the gradients for the dense network and the gradients for data-parallel embedding are grouped and all-reduced in one kernel, effectively combining two small all-reduce operations into a single larger one for higher efficiency. Requirements: Hybrid embedding is used (see HybridEmbeddingParam).

  • num_iterations_statistics: The number of batches used to perform statistics for hybrid embedding. The default value is 20. Requirement: The data reader is asynchronous (see AsyncParam).
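Because batchsize and batchsize_eval are global values, the share each GPU processes depends on the total number of GPUs listed in vvgpu. The sketch below assumes an even split and assumes the global batch size must be divisible by the GPU count; per_gpu_batch_size is a hypothetical helper for illustration, not a hugectr function.

```python
def per_gpu_batch_size(batchsize, vvgpu):
    """Per-GPU batch size for a global batch split evenly over vvgpu (hypothetical helper)."""
    num_gpus = sum(len(node_gpus) for node_gpus in vvgpu)  # GPUs across all nodes
    if num_gpus == 0:
        raise ValueError("vvgpu lists no GPUs")
    if batchsize % num_gpus != 0:
        # assumption: uneven splits are rejected rather than padded
        raise ValueError(f"batchsize {batchsize} is not divisible by {num_gpus} GPUs")
    return batchsize // num_gpus


# Two nodes with two GPUs each: a global batch of 16384 gives 4096 samples per GPU.
print(per_gpu_batch_size(16384, [[0, 1], [1, 2]]))  # 4096
```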

Example:

solver = hugectr.CreateSolver(max_eval_batches = 300,
                              batchsize_eval = 16384,
                              batchsize = 16384,
                              lr = 0.001,
                              vvgpu = [[0]],
                              repeat_dataset = True,
                              i64_input_key = True)
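The learning-rate arguments of CreateSolver (lr, warmup_steps, decay_start, decay_steps, decay_power, end_lr) describe a warmup-then-polynomial-decay schedule. The sketch below is one plausible reading, modeled on DLRM-style schedules; lr_at_step is hypothetical, and details such as treating decay_start = 0 as "no decay" and clamping the result at end_lr are assumptions rather than confirmed HugeCTR behavior.

```python
def lr_at_step(step, lr=0.001, warmup_steps=1, decay_start=0,
               decay_steps=1, decay_power=2, end_lr=0.0):
    """Learning rate at a 1-based training step (hypothetical sketch)."""
    if step <= warmup_steps:
        # linear warmup up to the base learning rate
        return lr * step / warmup_steps
    if decay_start == 0 or step <= decay_start:
        # assumption: decay_start == 0 disables decay
        return lr
    if step <= decay_start + decay_steps:
        # polynomial decay over decay_steps, never below end_lr
        remaining = (decay_start + decay_steps - step) / decay_steps
        return max(lr * remaining ** decay_power, end_lr)
    return end_lr


schedule = [lr_at_step(s, lr=1.0, warmup_steps=2, decay_start=4,
                       decay_steps=4, decay_power=2, end_lr=0.0)
            for s in range(1, 10)]
print(schedule)  # [0.5, 1.0, 1.0, 1.0, 0.5625, 0.25, 0.0625, 0.0, 0.0]
```

With the default arguments the schedule reduces to a constant lr after the first step, which matches the default lr_policy of fixed.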

AsyncParam

AsyncParam class

hugectr.AsyncParam()

A data reader can be optimized using asynchronous reading. This is done by creating the data reader with an async_param argument (see DataReaderParams) of type AsyncParam. AsyncParam specifies the parameters related to the asynchronous raw data reader. An asynchronous data reader uses the Linux asynchronous I/O library (AIO) to achieve peak I/O throughput. Requirements: The input dataset is in raw format; with multi_hot_reader=False, it must contain only one-hot feature items.

Arguments

  • num_threads: Integer, the number of data reading threads, which should be at least 1 per GPU. There is NO default value.

  • num_batches_per_thread: Integer, the number of batches each data reader thread works on simultaneously, typically 2-4. There is NO default value.

  • max_num_requests_per_thread: Integer, the maximum number of individual IO requests for each thread. It should be a multiple of num_batches_per_thread and no less than 2 * num_batches_per_thread. The value 72 should work in most cases. There is NO default value. Ignored when multi_hot_reader=True.

  • io_depth: Integer, the size of the asynchronous IO queue. The value 4 should work in most cases. There is NO default value. Ignored when multi_hot_reader=True.

  • io_alignment: Integer, the byte alignment of IO requests. The value 512 or 4096 should work in most cases. There is NO default value. Ignored when multi_hot_reader=True.

  • shuffle: Boolean. If this option is enabled, the order in which batches are fed into training is randomized. There is NO default value.

  • aligned_type: The supported types include hugectr.Alignment_t.Auto and hugectr.Alignment_t.Non. If hugectr.Alignment_t.Auto is chosen, the dimension of the dense input is padded to a multiple of 8. There is NO default value. Ignored when multi_hot_reader=True.

  • multi_hot_reader: Boolean. If this option is enabled, the multi-hot RawAsync reader is activated and static hotness for categorical features is supported. The hotness information is obtained from the Input layer. In this mode, the dense data type can be either float or unsigned int. The default value is True.

  • is_dense_float: Boolean. If this option is enabled, the data type of dense features is float; otherwise, it is unsigned int. The default value is True.

Note

When multi_hot_reader=False, is_dense_float must be False; otherwise, an exception is thrown. When multi_hot_reader=True, max_num_requests_per_thread, io_depth, io_alignment, and aligned_type are ignored. In addition, when multi_hot_reader=True, num_threads refers to the number of IO threads per GPU.

Example:

  1. One-hot data reader AsyncParam:

async_param = hugectr.AsyncParam(32, 4, 72, 2, 512, True, hugectr.Alignment_t.Non, False, False)

  2. Multi-hot data reader AsyncParam:

async_param = hugectr.AsyncParam(
        num_threads=1,
        num_batches_per_thread=16,
        shuffle=False,
        multi_hot_reader=True,
        is_dense_float=True)
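The AsyncParam rules listed in this section can be checked before the reader is constructed. validate_async_param below is a hypothetical pure-Python helper that encodes only those rules: the request-count constraints, which matter only when multi_hot_reader=False, and the requirement that is_dense_float be False in that mode.

```python
def validate_async_param(num_batches_per_thread, max_num_requests_per_thread,
                         multi_hot_reader=True, is_dense_float=True):
    """Return a list of violated AsyncParam rules (empty if none); hypothetical helper."""
    problems = []
    if multi_hot_reader:
        # The request-related knobs are ignored by the multi-hot reader.
        return problems
    if is_dense_float:
        problems.append("is_dense_float must be False when multi_hot_reader=False")
    if max_num_requests_per_thread % num_batches_per_thread != 0:
        problems.append("max_num_requests_per_thread must be a multiple of num_batches_per_thread")
    if max_num_requests_per_thread < 2 * num_batches_per_thread:
        problems.append("max_num_requests_per_thread must be >= 2 * num_batches_per_thread")
    return problems


print(validate_async_param(4, 72, multi_hot_reader=False, is_dense_float=False))  # []
```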

HybridEmbeddingParam

HybridEmbeddingParam class

hugectr.HybridEmbeddingParam()

A sparse embedding layer can be optimized using hybrid embedding. Hybrid embedding is designed to overcome the bandwidth constraint imposed by the embedding part of the training workload by algorithmically reducing the traffic over the network. Hybrid embedding can improve performance in multi-node and multi-GPU deployments with one-hot data. Conversely, hybrid embedding does not improve performance in a single-machine, single-GPU deployment or with multi-hot encoded data.

You can use hybrid embedding by creating a sparse embedding layer with a hybrid_embedding_param argument of type HybridEmbeddingParam, which specifies the parameters related to hybrid embedding.

Requirements: The input dataset has only one-hot feature items and the model uses the SGD optimizer.

For information about creating a sparse embedding layer, refer to the SparseEmbedding class documentation.

Arguments

  • max_num_frequent_categories: Integer, the maximum number of frequent categories, in units of the batch size. This argument does not have a default value.

  • max_num_infrequent_samples: Integer, the maximum number of infrequent samples, in units of the batch size. This argument does not have a default value.

  • p_dup_max: Float, the maximum probability that a category appears more than once within the GPU batch. This way of determining the number of frequent categories is used only in single-node or NVLink-connected systems. This argument does not have a default value.

  • max_all_reduce_bandwidth: Float, the algorithmic bandwidth of the all-reduce. This argument does not have a default value.

  • max_all_to_all_bandwidth: Float, the algorithmic bandwidth of the all-to-all. The unit of bandwidth is per GPU. This argument does not have a default value.

  • efficiency_bandwidth_ratio: Float. This argument is used in combination with max_all_reduce_bandwidth and max_all_to_all_bandwidth to determine the optimal threshold for the number of frequent categories. This way of determining the frequent categories is used only for multi-node systems. This argument does not have a default value.

  • communication_type: The type of communication being used. The supported types include CommunicationType.IB_NVLink, CommunicationType.IB_NVLink_Hier, and CommunicationType.NVLink_SingleNode. This argument does not have a default value.

    • CommunicationType.IB_NVLink_Hier supports two protocols: InfiniBand and RoCE v2. If you rely on a RoCE network device that has a special GID and traffic class type, set the following two environment variables:

      • HUGECTR_ROCE_GID sets the RoCE GID of your device (default 0).

      • HUGECTR_ROCE_TC sets the RoCE traffic class type of your device (default 0).

  • hybrid_embedding_type: The type of hybrid embedding, which supports only HybridEmbeddingType.Distributed for now. This argument does not have a default value.

Example:

hybrid_embedding_param = hugectr.HybridEmbeddingParam(2, -1, 0.01, 1.3e11, 1.9e11, 1.0,
                                                    hugectr.CommunicationType.IB_NVLink_Hier,
                                                    hugectr.HybridEmbeddingType.Distributed)
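To build intuition for p_dup_max, assume samples are drawn independently and a category occurs in each sample with probability p. The chance that it appears more than once in a GPU batch of B samples is then \(1 - (1-p)^B - Bp(1-p)^{B-1}\). The sketch below computes this quantity and flags categories whose duplicate probability exceeds p_dup_max. Both the independence model and the classification rule in frequent_categories are illustrative assumptions, not HugeCTR's actual selection algorithm.

```python
def p_duplicate(p, batch_size):
    """P(a category appears more than once in batch_size independent samples)."""
    p_none = (1.0 - p) ** batch_size
    p_once = batch_size * p * (1.0 - p) ** (batch_size - 1)
    return 1.0 - p_none - p_once


def frequent_categories(category_probs, batch_size, p_dup_max):
    """Illustrative rule: a category is 'frequent' if its duplicate probability exceeds p_dup_max."""
    return sorted(c for c, p in category_probs.items()
                  if p_duplicate(p, batch_size) > p_dup_max)


print(frequent_categories({"a": 0.5, "b": 1e-6}, batch_size=1024, p_dup_max=0.01))  # ['a']
```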

DataReaderParams

DataReaderParams class

hugectr.DataReaderParams()

DataReaderParams specifies the parameters related to the data reader. HugeCTR currently supports two dataset formats: Raw and Parquet. A DataReaderParams instance is required to initialize the Model instance.

Arguments

Deprecation Warning: The Norm and Raw data readers will be deprecated in a future release. Please check out Parquet and RawAsync for alternatives.

  • data_reader_type: The type of the data reader, which should be consistent with the dataset format. Specify one of the following types:

    • hugectr.DataReaderType_t.Parquet reads a Parquet format dataset.

    • hugectr.DataReaderType_t.RawAsync reads a Raw format dataset.

  • source: List[str] or String, the training dataset source. For a Norm or Parquet dataset, specify the file list of the training data, such as source = "file_list.txt". For a Raw dataset, specify a single training file, such as source = "train_data.bin". When using the embedding training cache, you can specify several file lists, such as source = ["file_list.1.txt", "file_list.2.txt"]. This argument has no default value and you must specify a value.

  • keyset: List[str] or String, the keyset files. This argument is only valid when you use the embedding training cache. The value should correspond to the value of the source argument. For example, you can specify source = ["file_list.1.txt", "file_list.2.txt"] and keyset = ["file_list.1.keyset", "file_list.2.keyset"]. The example shows the one-to-one correspondence between the source and keyset values.

  • eval_source: String, the evaluation dataset source. For a Norm or Parquet dataset, specify the file list of the evaluation data. For a Raw dataset, specify a single evaluation file. This argument has no default value and you must specify a value.

  • check_type: The data error detection mechanism. Specify hugectr.Check_t.Sum (CheckSum) or hugectr.Check_t.Non (no detection). This argument has no default value and you must specify a value.

  • cache_eval_data: Integer, the cache size of evaluation data on the device. Specify a value greater than zero to restrict memory use. The default value is 0.

  • num_samples: Integer, the number of samples in the training dataset. This argument is valid for the Raw dataset format only. The default value is 0.

  • eval_num_samples: Integer, the number of samples in the evaluation dataset. This argument is valid for the Raw dataset format only. The default value is 0.

  • float_label_dense: Boolean, valid for the Raw dataset format only. When set to True, the label and dense features of each sample are interpreted as float values. Otherwise, they are read as integer values, and the dense features are preprocessed with \(\log(\text{dense}[i] + 1)\). The default value is True.

  • num_workers: Integer, the number of data reader workers that load data concurrently. You can decide the best value empirically based on your dataset and training environment. The default value is 12.

  • slot_size_array: List[int], the size of each slot, which should be consistent with the sparse input. HugeCTR requires this argument for Parquet format data, and for RawAsync format data when you want to add an offset to the input keys. The default value is an empty list.

    Determine each value with the following equation, where \(slot^k_i\) is the i-th key of slot k:

    \(slot\_size\_array[k] = \max\limits_i slot^k_i + 1\)

  • data_source_params: DataSourceParams(), the configuration of the data sources (Local, HDFS, AWS S3, Google Cloud Storage, or others) for data reading.

  • async_param: AsyncParam, the parameters for the async raw data reader. Find more information in the AsyncParam section of this document.

Dataset formats

We support the following dataset formats in DataReaderParams.

Deprecation Warning: The Norm format will be deprecated in a future release. Please check out Parquet and Raw for alternatives.

Fig. 1: (a) Raw (b) Parquet Dataset Formats

Data Files

A data file is the minimum reading granularity for a reading thread, so it is better to have more files than reading threads to achieve the best performance. A data file consists of a header followed by the actual tabular data. A complete header always starts with a 4-byte constant 64, which is the size of the header in bytes.

Norm allows dynamic hotness for categorical features, at the cost of a 4-byte nnz value preceding each slot that indicates the number of keys in that slot (the small yellow box depicted in Fig. 1 (a)). Optionally, Norm reserves a 1-byte checksum for each sample (including the header), which is the sum of all significant bytes of the sample (excluding the nnz). Users are responsible for correctly specifying in DataReaderParams whether a Norm dataset includes checksums.

Header Definition:

typedef struct DataSetHeader_ {
  long long error_check;        // 0: no error check; 1: check_num
  long long number_of_records;  // the number of samples in this data file
  long long label_dim;          // dimension of label
  long long dense_dim;          // dimension of dense feature
  long long slot_num;           // slot_num for each embedding
  long long reserved[3];        // reserved for future use
} DataSetHeader;

Data Definition (each sample):

typedef struct Data_ {
  int length;                   // bytes in this sample (optional: only in checksum mode)
  float label[label_dim];
  float dense[dense_dim];
  Slot slots[slot_num];
  char checkbits;               // checkbit for this sample (optional: only in checksum mode)
} Data;

typedef struct Slot_ {
  int nnz;                      // number of keys in this slot
  unsigned int*  keys;          // the nnz keys; use `long long` keys with i64_input_key = True in CreateSolver
} Slot;
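To make the Norm layout concrete, the following sketch packs a header and one sample with Python's struct module. It assumes little-endian byte order, follows the DataSetHeader and Data structs field by field, and covers only the no-checksum mode (error_check = 0, so the optional length and checkbits fields are omitted); the 4-byte size constant mentioned in the prose is not modeled. pack_header and pack_sample are hypothetical helpers, not HugeCTR tools.

```python
import struct

# Assumption: little-endian, 8 x int64 = 64 bytes (5 used fields + reserved[3]).
HEADER_FMT = "<8q"


def pack_header(number_of_records, label_dim, dense_dim, slot_num, error_check=0):
    """Pack a DataSetHeader; the three reserved fields are written as zero."""
    return struct.pack(HEADER_FMT, error_check, number_of_records,
                       label_dim, dense_dim, slot_num, 0, 0, 0)


def pack_sample(label, dense, slots):
    """Pack one sample in no-checksum mode.

    label and dense are written as float32; each slot is an int32 nnz
    followed by nnz uint32 keys.
    """
    out = struct.pack(f"<{len(label)}f", *label)
    out += struct.pack(f"<{len(dense)}f", *dense)
    for keys in slots:
        out += struct.pack("<i", len(keys))
        out += struct.pack(f"<{len(keys)}I", *keys)
    return out


header = pack_header(number_of_records=1, label_dim=1, dense_dim=2, slot_num=2)
sample = pack_sample([1.0], [0.5, 3.0], [[7, 42], [100]])
print(len(header), len(sample))  # 64 32
```

The per-slot nnz prefix is what lets each sample carry a different number of keys per slot, which is the dynamic hotness described above.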

The input keys for categorical features are distributed to the slots, and no overlap between slots is allowed. For example: slot[0] = {0,10,32,45}, slot[1] = {1,2,5,67}. Any overlap causes undefined behavior. For example, given slot[0] = {0,10,32,45} and slot[1] = {1,10,5,67}, looking up key 10 in the table produces different results depending on how the slots are assigned to the GPUs.
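A quick way to catch this problem before training is to check that no key occurs in more than one slot. The sketch below treats each slot as the collection of keys seen for it across the dataset; find_overlapping_keys is a hypothetical helper.

```python
def find_overlapping_keys(keys_per_slot):
    """Return the set of keys that occur in more than one slot (hypothetical helper)."""
    first_slot = {}   # key -> index of the first slot it was seen in
    overlapping = set()
    for slot_id, keys in enumerate(keys_per_slot):
        for key in set(keys):  # duplicates within one slot are fine
            if key in first_slot:
                overlapping.add(key)
            else:
                first_slot[key] = slot_id
    return overlapping


print(find_overlapping_keys([[0, 10, 32, 45], [1, 2, 5, 67]]))   # set()
print(find_overlapping_keys([[0, 10, 32, 45], [1, 10, 5, 67]]))  # {10}
```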

File List

The first line of a file list must be the number of data files in the dataset, followed by the paths to those files, one per line, as shown here:

$ cat simple_sparse_embedding_file_list.txt
10
./simple_sparse_embedding/simple_sparse_embedding0.parquet
./simple_sparse_embedding/simple_sparse_embedding1.parquet
./simple_sparse_embedding/simple_sparse_embedding2.parquet
./simple_sparse_embedding/simple_sparse_embedding3.parquet
./simple_sparse_embedding/simple_sparse_embedding4.parquet
./simple_sparse_embedding/simple_sparse_embedding5.parquet
./simple_sparse_embedding/simple_sparse_embedding6.parquet
./simple_sparse_embedding/simple_sparse_embedding7.parquet
./simple_sparse_embedding/simple_sparse_embedding8.parquet
./simple_sparse_embedding/simple_sparse_embedding9.parquet
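The file-list format is simple enough to validate with a few lines of Python, which helps catch a stale count on the first line after files are added or removed. parse_file_list and read_file_list are hypothetical helpers that implement only the format described above.

```python
from pathlib import Path


def parse_file_list(text):
    """Return the data file paths from file-list text (hypothetical helper).

    The first non-empty line is the file count; each following
    non-empty line is one path.
    """
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    if not lines:
        raise ValueError("empty file list")
    num_files = int(lines[0])
    paths = lines[1:]
    if len(paths) != num_files:
        raise ValueError(f"file list declares {num_files} files but lists {len(paths)}")
    return paths


def read_file_list(path):
    """Read and parse a file list from disk."""
    return parse_file_list(Path(path).read_text())


example = "2\n./data/file0.parquet\n./data/file1.parquet\n"
print(parse_file_list(example))  # ['./data/file0.parquet', './data/file1.parquet']
```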

Example:

reader = hugectr.DataReaderParams(data_reader_type = hugectr.DataReaderType_t.Parquet,
                                  source = ["./wdl_norm/file_list.txt"],
                                  eval_source = "./wdl_norm/file_list_test.txt",
                                  check_type = hugectr.Check_t.Non)

Raw

The Raw dataset format differs from the Parquet dataset format in several aspects:

  1. A Raw dataset consists of only a single binary file.

  2. A Raw dataset file supports only static hotness.

  3. A Raw dataset file supports only the unsigned int data type for categorical features.

  4. The data type of dense features can be either float or unsigned int.

The Raw dataset outperforms the other formats in terms of IO throughput. HugeCTR provides the following Raw data reader configurations for loading data from disk into the model, depending on the embedding type:

| reader type                                                             | hotness          | specific embedding type | dense data type       |
|-------------------------------------------------------------------------|------------------|-------------------------|-----------------------|
| hugectr.DataReaderType_t.RawAsync + AsyncParam::multi_hot_reader=False | 1-hot            | HybridSparseEmbedding   | unsigned int          |
| hugectr.DataReaderType_t.RawAsync + AsyncParam::multi_hot_reader=True  | static multi-hot | embedding collection    | float or unsigned int |

Please refer to DataReaderParams for more details about AsyncParam.

NOTE

When the dense type of a Raw dataset is unsigned int, the data reader applies log(x+1) to each dense feature x before feeding it into the model network.
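The transformation is \(\log(x + 1)\) applied element-wise to the dense features; the sketch below mirrors it in Python with math.log1p. It illustrates the formula only: details of the reader's actual implementation (for example, whether it computes in float32) are not described here.

```python
import math


def preprocess_dense(raw_dense):
    """Apply log(x + 1) to unsigned-int dense features (illustrative sketch)."""
    return [math.log1p(x) for x in raw_dense]


print(preprocess_dense([0, 1, 9]))
```

math.log1p is used instead of math.log(x + 1) because it stays accurate for small inputs; for the non-negative integer features described here the two agree closely.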

The LocalizedSlotSparseEmbeddingOneHot and HybridSparseEmbedding are going to be incorporated into 3G embedding.

Example:

reader = hugectr.DataReaderParams(data_reader_type = hugectr.DataReaderType_t.RawAsync,
                                  source = ["./wdl_raw/train_data.bin"],
                                  eval_source = "./wdl_raw/validation_data.bin",
                                  async_param=hugectr.AsyncParam(
                                      multi_hot_reader=True,
                                      is_dense_float=True))

Parquet

Parquet is a column-oriented, open source, and free data format. It is available to any project in the Apache Hadoop ecosystem. To reduce file size, it supports compression and encoding. Fig. 1 (b) shows an example Parquet dataset. For additional information, see the Parquet documentation.

Please note the following:

  • Nested column types are not currently supported in the Parquet data loader.

  • Missing values in a column are not allowed.

  • As in the Norm dataset format, the label and dense feature columns should use the float format.

  • The slot feature columns should use the Int64 format.

  • The data columns within the Parquet file can be arranged in any order.

  • A separate _metadata.json file is required to provide the number of rows in each Parquet file and the column index mapping for each label, dense (numerical), and slot (categorical) feature.

Example _metadata.json file:

{
   "file_stats":[
      {
         "file_name":"file0.parquet",
         "num_rows":409600
      },
      {
         "file_name":"file1.parquet",
         "num_rows":409600
      }
   ],
   "cats":[
      {
         "col_name":"C1",
         "index":4
      },
      {
         "col_name":"C2",
         "index":5
      },
      {
         "col_name":"C3",
         "index":6
      },
      {
         "col_name":"C4",
         "index":7
      }
   ],
   "conts":[
      {
         "col_name":"I1",
         "index":1
      },
      {
         "col_name":"I2",
         "index":2
      },
      {
         "col_name":"I3",
         "index":3
      }
   ],
   "labels":[
      {
         "col_name":"label",
         "index":0
      }
   ]
}

reader = hugectr.DataReaderParams(data_reader_type = hugectr.DataReaderType_t.Parquet,
                                  source = ["./parquet_data/train/_file_list.txt"],
                                  eval_source = "./parquet_data/val/_file_list.txt",
                                  check_type = hugectr.Check_t.Non,
                                  slot_size_array = [10000, 50000, 20000, 300])

We provide an option to add an offset to each slot by specifying slot_size_array. slot_size_array is an array whose length is equal to the number of slots. To avoid duplicate keys after adding the offsets, ensure that the keys of the i-th slot lie in the range [0, slot_size_array[i]). The offset is applied as follows: each key in the i-th slot is increased by slot_size_array[0] + slot_size_array[1] + … + slot_size_array[i - 1]. In the configuration snippet above, offset 0 is added to the 0th slot, offset 10000 to the 1st slot, offset 60000 to the 2nd slot, and offset 80000 to the 3rd slot. The length of slot_size_array should be equal to the number of entries in "cats" in _metadata.json.
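The offset rule can be sketched in a few lines of Python. The helpers below are hypothetical and illustrate the arithmetic only; HugeCTR applies the offsets inside its data reader. They also show how slot_size_array follows from the data via the equation \(slot\_size\_array[k] = \max_i slot^k_i + 1\) given for DataReaderParams.

```python
from itertools import accumulate


def slot_size_array_from_keys(keys_per_slot):
    """slot_size_array[k] = max key of slot k + 1."""
    return [max(keys) + 1 for keys in keys_per_slot]


def slot_offsets(slot_size_array):
    """Offset for slot i = slot_size_array[0] + ... + slot_size_array[i - 1]."""
    if not slot_size_array:
        return []
    return list(accumulate([0] + list(slot_size_array[:-1])))


def add_offsets(keys_per_slot, slot_size_array):
    """Shift every key of slot i by its offset so keys from different slots cannot collide."""
    offsets = slot_offsets(slot_size_array)
    return [[key + off for key in keys] for keys, off in zip(keys_per_slot, offsets)]


print(slot_offsets([10000, 50000, 20000, 300]))  # [0, 10000, 60000, 80000]
```

Because each slot's keys stay below its slot_size_array entry, the shifted ranges are disjoint, which is exactly why the [0, slot_size_array[i]) requirement matters.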

The _metadata.json file is generated by NVTabular preprocessing and resides in the same folder as the file list. It contains four entries: "file_stats" (file statistics), "cats" (categorical columns), "conts" (continuous columns), and "labels" (label columns). The "col_name" and "index" fields in _metadata.json indicate the name and the index of a specific column in the Parquet data frame. You can also edit the generated _metadata.json to read only the desired columns for model training. For example, you can modify the above _metadata.json and change the configuration correspondingly:

Example _metadata.json file after edits:

{
   "file_stats":[
      {
         "file_name":"file0.parquet",
         "num_rows":409600
      },
      {
         "file_name":"file1.parquet",
         "num_rows":409600
      }
   ],
   "cats":[
      {
         "col_name":"C2",
         "index":5
      },
      {
         "col_name":"C4",
         "index":7
      }
   ],
   "conts":[
      {
         "col_name":"I1",
         "index":1
      },
      {
         "col_name":"I3",
         "index":3
      }
   ],
   "labels":[
      {
         "col_name":"label",
         "index":0
      }
   ]
}

reader = hugectr.DataReaderParams(data_reader_type = hugectr.DataReaderType_t.Parquet,
                                  source = ["./parquet_data/train/_file_list.txt"],
                                  eval_source = "./parquet_data/val/_file_list.txt",
                                  check_type = hugectr.Check_t.Non,
                                  slot_size_array = [50000, 300])
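Because slot_size_array must line up with the "cats" entries, it can be worth checking the two together after editing _metadata.json. The sketch below parses a _metadata.json string with the standard json module, returns the label, dense, and categorical column indices plus the total row count, and raises if the lengths disagree; columns_from_metadata is a hypothetical helper, not part of hugectr or NVTabular.

```python
import json


def columns_from_metadata(metadata_text, slot_size_array):
    """Read column indices from _metadata.json text and check slot_size_array (hypothetical helper).

    Returns (label_indices, dense_indices, categorical_indices, total_rows).
    """
    meta = json.loads(metadata_text)
    labels = [col["index"] for col in meta["labels"]]
    conts = [col["index"] for col in meta["conts"]]
    cats = [col["index"] for col in meta["cats"]]
    if len(cats) != len(slot_size_array):
        raise ValueError(
            f'_metadata.json has {len(cats)} "cats" entries, '
            f"but slot_size_array has {len(slot_size_array)} values")
    total_rows = sum(stat["num_rows"] for stat in meta["file_stats"])
    return labels, conts, cats, total_rows


edited = """{
  "file_stats": [{"file_name": "file0.parquet", "num_rows": 409600},
                 {"file_name": "file1.parquet", "num_rows": 409600}],
  "cats": [{"col_name": "C2", "index": 5}, {"col_name": "C4", "index": 7}],
  "conts": [{"col_name": "I1", "index": 1}, {"col_name": "I3", "index": 3}],
  "labels": [{"col_name": "label", "index": 0}]
}"""
print(columns_from_metadata(edited, [50000, 300]))  # ([0], [1, 3], [5, 7], 819200)
```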

OptParamsPy

CreateOptimizer method

hugectr.CreateOptimizer()

CreateOptimizer returns an OptParamsPy object according to the custom argument values, which specify the optimizer type and the corresponding hyperparameters. The OptParamsPy object is used to initialize the Model instance and applies to the weights of the dense layers. Sparse embedding layers that do not have a specified optimizer adopt this optimizer as well. Note that the hyperparameters should be configured carefully when mixed precision training is employed; for example, the epsilon value for the Adam optimizer should be set larger.

The embedding update supports three algorithms, specified with update_type:

  • Local: The optimizer updates only the hot columns of an embedding (the embedding vectors that are hit in the current training iteration) in each iteration.

  • Global: The optimizer updates all the columns. This update type takes longer than the other embedding update types.

  • LazyGlobal: The optimizer updates only the hot columns of an embedding in each iteration, while using different semantics from the Local and Global updates.
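The difference between Local and Global updates matters most for stateful optimizers such as Adam. The conceptual sketch below runs Adam on a tiny list-based embedding table: with "Local", only rows hit in the current iteration change; with "Global", every row is stepped, so a row that is not hit still moves while its first-moment estimate decays. adam_embedding_step is hypothetical and only illustrates the semantics described above; it is not HugeCTR's kernel, and LazyGlobal is not modeled.

```python
import math


def adam_embedding_step(table, m, v, grads, step, update_type="Global",
                        lr=0.001, beta1=0.9, beta2=0.999, epsilon=1e-7):
    """One Adam step on a list-of-lists embedding table (conceptual sketch).

    grads maps row index -> gradient vector for the rows hit in this iteration.
    "Local" steps only the hit rows; "Global" steps every row, using a zero
    gradient for rows that were not hit.
    """
    if update_type not in ("Local", "Global"):
        raise ValueError(f"unsupported update_type {update_type!r}")
    dim = len(table[0])
    rows = list(grads) if update_type == "Local" else range(len(table))
    for r in rows:
        g = grads.get(r, [0.0] * dim)
        for j in range(dim):
            m[r][j] = beta1 * m[r][j] + (1 - beta1) * g[j]
            v[r][j] = beta2 * v[r][j] + (1 - beta2) * g[j] * g[j]
            m_hat = m[r][j] / (1 - beta1 ** step)   # bias-corrected moments
            v_hat = v[r][j] / (1 - beta2 ** step)
            table[r][j] -= lr * m_hat / (math.sqrt(v_hat) + epsilon)


for update_type in ("Local", "Global"):
    table, m, v = [[0.0], [0.0]], [[0.0], [0.0]], [[0.0], [0.0]]
    adam_embedding_step(table, m, v, {0: [1.0], 1: [1.0]}, step=1, update_type=update_type)
    row1_after_step1 = table[1][0]
    adam_embedding_step(table, m, v, {0: [1.0]}, step=2, update_type=update_type)  # row 1 not hit
    print(update_type, table[1][0] != row1_after_step1)
# Local False
# Global True
```

Stepping every row is also why Global takes longer than the other update types: its cost grows with the table size rather than with the number of hit rows.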

Arguments

-
    -
  • optimizer_type: The optimizer type to be used. The supported types include hugectr.Optimizer_t.Adam, hugectr.Optimizer_t.MomentumSGD, hugectr.Optimizer_t.Nesterov and hugectr.Optimizer_t.SGD, hugectr.Optimizer_t.AdaGrad, hugectr.Optimizer_t.Ftrl. The default value is hugectr.Optimizer_t.Adam.

  • -
  • update_type: The update type for the embedding. The supported types include hugectr.Update_t.Global, hugectr.Update_t.Local, and hugectr.Update_t.LazyGlobal(Adam only). The default value is hugectr.Update_t.Global.

  • -
  • beta: The beta value when using Ftrl optimizer. The default value is 0.

  • -
  • beta1: The beta1 value when using Adam optimizer. The default value is 0.9.

  • -
  • beta2: The beta2 value when using Adam optimizer. The default value is 0.999.

  • -
  • lambda1: The lambda1 value when using Ftrl optimizer. The default value is 0.

  • -
  • lambda2: The lambda2 value when using Ftrl optimizer. The default value is 0.

  • -
  • epsilon: The epsilon value when using Adam optimizer. This argument should be tuned carefully when mixed precision training is employed. The default value is 1e-7.

  • -
  • momentum_factor: The momentum_factor value when using MomentumSGD or Nesterov optimizer. The default value is 0.

  • -
  • atomic_update: Whether to employ atomic update when using SGD optimizer. The default value is True.

  • -
-

Example:

-
optimizer = hugectr.CreateOptimizer(optimizer_type = hugectr.Optimizer_t.Adam,
-                                    update_type = hugectr.Update_t.Global,
-                                    beta1 = 0.9,
-                                    beta2 = 0.999,
-                                    epsilon = 0.0000001)
-
-
-
-
-
-
-
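For reference, here is the textbook Adam update that beta1, beta2, and epsilon parameterize, sketched for a single scalar weight with the default values from the argument list. It is the standard formulation. Whether HugeCTR applies bias correction in exactly this way is an assumption. Here `lr` stands in for the learning rate configured elsewhere, for example through the solver. The denominator sqrt(v_hat) + epsilon shows why epsilon matters: when the second-moment estimate is very small, epsilon bounds the step size.

```python
import math
from dataclasses import dataclass


@dataclass
class AdamState:
    m: float = 0.0  # first-moment (mean) estimate, decayed by beta1
    v: float = 0.0  # second-moment (uncentered variance) estimate, decayed by beta2
    t: int = 0      # number of steps taken so far


def adam_step(w: float, grad: float, state: AdamState, lr: float = 0.001,
              beta1: float = 0.9, beta2: float = 0.999,
              epsilon: float = 1e-7) -> float:
    """Return the updated weight after one textbook Adam step (state is mutated)."""
    state.t += 1
    state.m = beta1 * state.m + (1.0 - beta1) * grad
    state.v = beta2 * state.v + (1.0 - beta2) * grad * grad
    # Bias correction compensates for m and v starting at zero.
    m_hat = state.m / (1.0 - beta1 ** state.t)
    v_hat = state.v / (1.0 - beta2 ** state.t)
    # epsilon keeps the denominator away from zero when v_hat is tiny.
    return w - lr * m_hat / (math.sqrt(v_hat) + epsilon)


# With a tiny gradient, a larger epsilon shrinks the step considerably.
tiny_grad = 1e-6
step_default = 1.0 - adam_step(1.0, tiny_grad, AdamState())
step_large_eps = 1.0 - adam_step(1.0, tiny_grad, AdamState(), epsilon=1e-4)
```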

Layers

-

There are three major kinds of layers in HugeCTR:

- -

Please refer to hugectr_layer_book for detailed guides on how to use the different layer types.

-
-

Model

-
-

Model class

-
hugectr.Model()
-
-
-

Model groups the data input, embeddings, and dense network into an object with training features. Constructing a Model requires a Solver instance, a DataReaderParams instance, and an OptParamsPy instance.

-

Arguments

-
    -
  • solver: A hugectr.Solver object, the solver configuration for the model.

  • -
  • reader_params: A hugectr.DataReaderParams object, the data reader configuration for the model.

  • -
  • opt_params: A hugectr.OptParamsPy object, the optimizer configuration for the model.

  • -
-
-
-
-

add method

-
hugectr.Model.add()
-
-
-

The add method of Model adds an instance of Input, SparseEmbedding, DenseLayer, GroupDenseLayer, or EmbeddingCollection to the created Model object. Typically, a Model object consists of one Input, several SparseEmbedding instances, and a series of DenseLayer instances. Note that the loss function for HugeCTR model training is added as a DenseLayer instance.

-

Arguments

-
    -
  • input or sparse_embedding or dense_layer: This is an overloaded method that accepts hugectr.Input, hugectr.SparseEmbedding, hugectr.DenseLayer, hugectr.GroupDenseLayer, or hugectr.EmbeddingCollection as an argument. It allows users to construct their model flexibly without a JSON configuration file.

  • -
-

Refer to the HugeCTR Layer Classes and Methods for information about the layers and embedding collection.

-
-
-
-

compile method

-
hugectr.Model.compile()
-
-
-

For single-task models, this method requires no arguments. It allocates the internal buffers and initializes the model. For multi-task models, it can optionally take the two arguments below.

-

Arguments

-
    -
  • loss_names: List of Strings, the list of loss label names to provide weights for.

  • -
  • loss_weights: List of Floats, the weights to be assigned to each loss label. Number of elements must match the number of loss_names.

  • -
-
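One plausible reading of loss_names and loss_weights is that the total training loss is the weighted sum of the per-task losses. The helper below sketches that combination together with the length check described above. The function name and the weighted-sum semantics are assumptions for illustration and are not taken from HugeCTR's source.

```python
from typing import Dict, List


def combine_losses(losses: Dict[str, float], loss_names: List[str],
                   loss_weights: List[float]) -> float:
    """Weighted sum of per-task losses (assumed semantics of compile's arguments)."""
    if len(loss_names) != len(loss_weights):
        # Mirrors the documented rule: one weight per loss label name.
        raise ValueError("loss_weights must have one entry per loss name")
    total = 0.0
    for name, weight in zip(loss_names, loss_weights):
        total += weight * losses[name]
    return total
```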
-
-
-

fit method

-
hugectr.Model.fit()
-
-
-

It trains the model for a fixed number of epochs (epoch mode) or iterations (non-epoch mode). The mode is selected through the configuration. To use epoch mode, set repeat_dataset in CreateSolver() to False and num_epochs in Model.fit() to a positive number. To use non-epoch mode, set repeat_dataset in CreateSolver() to True and max_iter in Model.fit() to a positive number.

-

Arguments

-
    -
  • num_epochs: Integer, the number of epochs for epoch mode training. It will be ignored if repeat_dataset is True. The default value is 0.

  • -
  • max_iter: Integer, the maximum iteration of non-epoch mode training. It will be ignored if repeat_dataset is False. The default value is 2000.

  • -
  • display: Integer, the interval of iterations at which the training loss will be displayed. The default value is 200.

  • -
  • eval_interval: Integer, the interval of iterations at which the evaluation will be executed. The default value is 1000.

  • -
  • snapshot: Integer, the interval of iterations at which snapshots of the model weights and optimizer states are saved to files. This argument has no effect when the embedding training cache is used, in which case no model parameters are saved. The default value is 10000.

  • -
  • snapshot_prefix: String, the prefix of the file names for the saved model weights and optimizer states. This argument has no effect when the embedding training cache is used, in which case no model parameters are saved. The default value is ''. Remote file systems (HDFS, S3, and GCS) are also supported. For example, for HDFS, the prefix can be hdfs://localhost:9000/dir/to/model. For S3, the prefix should be either virtual-hosted-style or path-style and contain the region information; for examples, see the AWS official documentation. For GCS, both the URI (gs://bucket/object) and URL (https://storage.googleapis.com/bucket/object) forms are supported. Dumping models to a remote file system is not yet supported when MPI is enabled.

  • -
-
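The rules above for choosing between epoch mode and non-epoch mode can be summarized as a small decision function. This helper is hypothetical and exists only for illustration; HugeCTR has no such function. It mirrors the documented behavior: num_epochs is ignored when repeat_dataset is True, and max_iter is ignored when it is False.

```python
from typing import Tuple


def training_mode(repeat_dataset: bool, num_epochs: int = 0,
                  max_iter: int = 2000) -> Tuple[str, int]:
    """Return (mode, effective_limit) for a CreateSolver/Model.fit combination."""
    if repeat_dataset:
        # Non-epoch mode: num_epochs is ignored and max_iter bounds training.
        if max_iter <= 0:
            raise ValueError("non-epoch mode needs a positive max_iter")
        return ("non-epoch", max_iter)
    # Epoch mode: max_iter is ignored and num_epochs bounds training.
    if num_epochs <= 0:
        raise ValueError("epoch mode needs a positive num_epochs")
    return ("epoch", num_epochs)
```

For example, training_mode(False) raises because the default num_epochs of 0 is not positive. This matches the requirement that num_epochs be set to a positive number in epoch mode.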
-
-
-

summary method

-
hugectr.Model.summary()
-
-
-

This method takes no extra arguments and prints a string summary of the model, which gives users an overview of the model structure. Note that the first dimension of the displayed tensors is the per-GPU batch size.

-
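Because summary() reports the first tensor dimension per GPU, it helps to know how the global batchsize is divided. The sketch below assumes the batch is split evenly across every GPU listed in the solver's vvgpu, a list of per-node GPU lists as in vvgpu = [[0]] in the check_out_tensor example further down this page. It also assumes that an uneven split is an error. The helper name is hypothetical.

```python
from typing import List


def per_gpu_batchsize(batchsize: int, vvgpu: List[List[int]]) -> int:
    """Global batch size divided evenly over all GPUs on all nodes (assumed)."""
    total_gpus = sum(len(node_gpus) for node_gpus in vvgpu)
    if total_gpus == 0:
        raise ValueError("vvgpu lists no GPUs")
    if batchsize % total_gpus != 0:
        raise ValueError("batchsize is assumed to divide evenly across GPUs")
    return batchsize // total_gpus
```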
-
-
-

graph_to_json method

-
hugectr.Model.graph_to_json()
-
-
-

This method saves the model graph to a JSON file, which can be used for continuous training and inference.

-

Arguments

-
    -
  • graph_config_file: The JSON file to which the model graph will be saved. There is NO default value and it should be specified by users.

  • -
-
-
-
-

construct_from_json method

-
hugectr.Model.construct_from_json()
-
-
-

This method constructs the model graph from a saved JSON file, which is useful for continuous training and fine-tuning.

-

Arguments

-
    -
  • graph_config_file: The saved JSON file from which the model graph will be constructed. There is NO default value and it should be specified by users.

  • -
  • include_dense_network: Boolean, whether to include the dense network when constructing the model graph. If it is True, the whole model graph is constructed, and both the saved sparse model weights and dense model weights can be loaded. If it is False, only the sparse embedding layers are constructed and the corresponding sparse model weights can be loaded, which enables users to construct a new dense network on top of them. Note that HugeCTR layers are organized by name, and you can check the input name, output name, and output shape of the added layers with Model.summary(). There is NO default value and it should be specified by users.

  • -
-
-
-
-

load_dense_weights method

-
hugectr.Model.load_dense_weights()
-
-
-

This method loads the dense weights from the saved dense model file.

-

Arguments

-
    -
  • dense_model_file: String, the saved dense model file from which the dense weights will be loaded. There is NO default value and it should be specified by users. Remote file systems (HDFS, S3, and GCS) are also supported. For example, for HDFS, the prefix can be hdfs://localhost:9000/dir/to/model. For S3, the prefix should be either virtual-hosted-style or path-style and contain the region information; for examples, see the AWS official documentation. For GCS, both the URI (gs://bucket/object) and URL (https://storage.googleapis.com/bucket/object) forms are supported.

  • -
-
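To make the accepted remote path forms concrete, the hypothetical helper below classifies a path prefix as HDFS, GCS, one of the two S3 styles that carry a region, or a local path. HugeCTR parses these paths itself; this sketch only illustrates the shapes. The S3 patterns follow AWS's documented virtual-hosted form (https://bucket.s3.region.amazonaws.com/key) and path-style form (https://s3.region.amazonaws.com/bucket/key), simplified.

```python
import re
from typing import Optional, Tuple

# Simplified patterns for the two S3 URL styles that include a region.
_S3_VIRTUAL_HOSTED = re.compile(r"^https://([^/]+)\.s3\.([a-z0-9-]+)\.amazonaws\.com/(.*)$")
_S3_PATH_STYLE = re.compile(r"^https://s3\.([a-z0-9-]+)\.amazonaws\.com/([^/]+)/(.*)$")


def classify_prefix(prefix: str) -> Tuple[str, Optional[str], Optional[str]]:
    """Return (kind, bucket, region); bucket and region are None unless S3."""
    if prefix.startswith("hdfs://"):
        return ("hdfs", None, None)
    if prefix.startswith(("gs://", "https://storage.googleapis.com/")):
        return ("gcs", None, None)
    m = _S3_VIRTUAL_HOSTED.match(prefix)
    if m:
        return ("s3-virtual-hosted", m.group(1), m.group(2))
    m = _S3_PATH_STYLE.match(prefix)
    if m:
        return ("s3-path-style", m.group(2), m.group(1))
    if "://" in prefix:
        # For example, an S3 URL without the region, which the text says is required.
        return ("unsupported", None, None)
    return ("local", None, None)
```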
-
-
-

load_dense_optimizer_states method

-
hugectr.Model.load_dense_optimizer_states()
-
-
-

This method loads the dense optimizer states from the saved dense optimizer states file.

-

Arguments

-
    -
  • dense_opt_states_file: String, the saved dense optimizer states file from which the dense optimizer states will be loaded. There is NO default value and it should be specified by users. Remote file systems (HDFS, S3, and GCS) are also supported. For example, for HDFS, the prefix can be hdfs://localhost:9000/dir/to/model. For S3, the prefix should be either virtual-hosted-style or path-style and contain the region information; for examples, see the AWS official documentation. For GCS, both the URI (gs://bucket/object) and URL (https://storage.googleapis.com/bucket/object) forms are supported.

  • -
-
-
-
-

load_sparse_weights method

-
hugectr.Model.load_sparse_weights()
-
-
-

This method loads the sparse weights from the saved sparse embedding files.

-

Implementation Ⅰ

-

Arguments

-
    -
  • sparse_embedding_files: List[str], the sparse embedding files from which the sparse weights will be loaded. The number of files should equal the number of sparse embedding layers in the model. There is NO default value and it should be specified by users. Remote file systems (HDFS, S3, and GCS) are also supported. For example, for HDFS, the prefix can be hdfs://localhost:9000/dir/to/model. For S3, the prefix should be either virtual-hosted-style or path-style and contain the region information; for examples, see the AWS official documentation. For GCS, both the URI (gs://bucket/object) and URL (https://storage.googleapis.com/bucket/object) forms are supported.

  • -
-

Implementation Ⅱ

-

Arguments

-
    -
  • sparse_embedding_files_map: Dict[str, str], a mapping from sparse embedding names to sparse embedding files; each file is loaded by the embedding layer with the corresponding name. There is NO default value and it should be specified by users.

  • -
-

Example:

-
model.add(hugectr.SparseEmbedding(embedding_type = hugectr.Embedding_t.DistributedSlotSparseEmbeddingHash,
-                            workspace_size_per_gpu_in_mb = 23,
-                            embedding_vec_size = 1,
-                            combiner = "sum",
-                            sparse_embedding_name = "sparse_embedding2",
-                            bottom_name = "wide_data",
-                            optimizer = optimizer))
-model.add(hugectr.SparseEmbedding(embedding_type = hugectr.Embedding_t.DistributedSlotSparseEmbeddingHash,
-                            workspace_size_per_gpu_in_mb = 358,
-                            embedding_vec_size = 16,
-                            combiner = "sum",
-                            sparse_embedding_name = "sparse_embedding1",
-                            bottom_name = "deep_data",
-                            optimizer = optimizer))
-# ...
-model.load_sparse_weights(["wdl_0_sparse_4000.model", "wdl_1_sparse_4000.model"]) # load models for both embedding layers
-model.load_sparse_weights({"sparse_embedding1": "wdl_1_sparse_4000.model"}) # or load the model for one embedding layer
-
-
-
-
-
-
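The two implementations differ only in how files are matched to embedding layers. The list form is positional and must cover every sparse embedding layer. The dict form names the layers it targets and may cover a subset. The hypothetical resolver below normalizes both forms into one name-to-file mapping. It assumes the list is ordered the same way the embedding layers were added. The example above suggests this (wdl_1_sparse_4000.model goes with sparse_embedding1, the second layer added), but the text does not state it.

```python
from typing import Dict, List, Union


def resolve_sparse_files(embedding_names: List[str],
                         files: Union[List[str], Dict[str, str]]) -> Dict[str, str]:
    """Map embedding layer names (in the order they were added) to model files."""
    if isinstance(files, dict):
        # Dict form: any subset of layers, addressed by sparse_embedding_name.
        unknown = set(files) - set(embedding_names)
        if unknown:
            raise KeyError(f"no embedding layer named {sorted(unknown)}")
        return dict(files)
    # List form: positional, one file per sparse embedding layer.
    if len(files) != len(embedding_names):
        raise ValueError("need exactly one file per sparse embedding layer")
    return dict(zip(embedding_names, files))
```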

load_sparse_optimizer_states method

-
hugectr.Model.load_sparse_optimizer_states()
-
-
-

This method loads the sparse optimizer states from the saved sparse optimizer states files.

-

Implementation Ⅰ

-

Arguments

-
    -
  • sparse_opt_states_files: List[str], the sparse optimizer states files from which the sparse optimizer states will be loaded. The number of files should equal the number of sparse embedding layers in the model. There is NO default value and it should be specified by users. Remote file systems (HDFS, S3, and GCS) are also supported. For example, for HDFS, the prefix can be hdfs://localhost:9000/dir/to/model. For S3, the prefix should be either virtual-hosted-style or path-style and contain the region information; for examples, see the AWS official documentation. For GCS, both the URI (gs://bucket/object) and URL (https://storage.googleapis.com/bucket/object) forms are supported.

  • -
-

Implementation Ⅱ

-

Arguments

-
    -
  • sparse_opt_states_files_map: Dict[str, str], a mapping from sparse embedding names to sparse optimizer states files; each file is loaded by the embedding layer with the corresponding name. There is NO default value and it should be specified by users.

  • -
-
-
-
-

freeze_dense method

-
hugectr.Model.freeze_dense()
-
-
-

This method takes no extra arguments and freezes the dense weights of the model. Users can use this method when they want to fine-tune the sparse weights.

-
-
-
-

freeze_embedding method

-
hugectr.Model.freeze_embedding()
-
-
-

Implementation Ⅰ: freeze the weights of all the embedding layers. This method takes no extra arguments and freezes the sparse weights of the model. Users can use this method when they only want to train the dense weights.

-

Implementation Ⅱ: freeze the weights of a specific embedding layer. Please refer to Section 3.4 of HugeCTR Criteo Notebook for the usage.

-

Arguments

-
    -
  • embedding_name: String, the name of the embedding layer.

  • -
-

Example:

-
model.add(hugectr.SparseEmbedding(embedding_type = hugectr.Embedding_t.DistributedSlotSparseEmbeddingHash,
-                            workspace_size_per_gpu_in_mb = 23,
-                            embedding_vec_size = 1,
-                            combiner = "sum",
-                            sparse_embedding_name = "sparse_embedding2",
-                            bottom_name = "wide_data",
-                            optimizer = optimizer))
-model.add(hugectr.SparseEmbedding(embedding_type = hugectr.Embedding_t.DistributedSlotSparseEmbeddingHash,
-                            workspace_size_per_gpu_in_mb = 358,
-                            embedding_vec_size = 16,
-                            combiner = "sum",
-                            sparse_embedding_name = "sparse_embedding1",
-                            bottom_name = "deep_data",
-                            optimizer = optimizer))
-# ...
-model.freeze_embedding() # freeze all the embedding layers
-model.freeze_embedding("sparse_embedding1") # or freeze a specific embedding layer
-
-
-
-
-
-

unfreeze_dense method

-
hugectr.Model.unfreeze_dense()
-
-
-

This method takes no extra arguments and unfreezes the dense weights of the model.

-
-
-
-

unfreeze_embedding method

-
hugectr.Model.unfreeze_embedding()
-
-
-

Implementation Ⅰ: unfreeze the weights of all the embedding layers. This method takes no extra arguments and unfreezes the sparse weights of the model.

-

Implementation Ⅱ: unfreeze the weights of a specific embedding layer.

-

Arguments

-
    -
  • embedding_name: String, the name of the embedding layer.

  • -
-
-
-
-

reset_learning_rate_scheduler method

-
hugectr.Model.reset_learning_rate_scheduler()
-
-
-

This method resets the learning rate scheduler of the model. Users can use this method when they want to fine-tune the model weights.

-

Arguments

-
    -
  • base_lr: The base learning rate for the internal learning rate scheduler within Model instance. There is NO default value and it should be specified by users.

  • -
  • warmup_steps: The warmup steps for the internal learning rate scheduler within Model instance. The default value is 1.

  • -
  • decay_start: The step at which the learning rate decay starts for the internal learning rate scheduler within Model instance. The default value is 0.

  • -
  • decay_steps: The number of steps of the learning rate decay for the internal learning rate scheduler within Model instance. The default value is 1.

  • -
  • decay_power: The power of the learning rate decay for the internal learning rate scheduler within Model instance. The default value is 2.

  • -
  • end_lr: The final learning rate for the internal learning rate scheduler within Model instance. The default value is 0.

  • -
-
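The scheduler arguments describe a warmup-then-polynomial-decay schedule. The class below sketches one such schedule under stated assumptions:

- The learning rate warms up linearly over warmup_steps.
- It stays at base_lr until decay_start.
- It then follows base_lr * ((decay_start + decay_steps - step) / decay_steps) ** decay_power, clamped below at end_lr, and stays at end_lr once the decay ends.
- Decay is treated as disabled when decay_start is 0 (the default).

HugeCTR's internal scheduler may differ in details such as step counting. The class is hypothetical, and its get_next name only mirrors LearningRateScheduler.get_next for readability.

```python
class PolyDecaySchedule:
    """Hypothetical warmup + polynomial-decay schedule (assumed semantics)."""

    def __init__(self, base_lr: float, warmup_steps: int = 1, decay_start: int = 0,
                 decay_steps: int = 1, decay_power: float = 2.0, end_lr: float = 0.0):
        self.base_lr = base_lr
        self.warmup_steps = warmup_steps
        self.decay_start = decay_start
        self.decay_steps = decay_steps
        self.decay_power = decay_power
        self.end_lr = end_lr
        self.step = 0

    def get_next(self) -> float:
        """Advance one step and return the learning rate for that step."""
        self.step += 1
        s = self.step
        if s <= self.warmup_steps:
            # Linear warmup from base_lr / warmup_steps up to base_lr.
            return self.base_lr * s / self.warmup_steps
        if self.decay_start == 0 or s <= self.decay_start:
            return self.base_lr
        if s <= self.decay_start + self.decay_steps:
            remaining = (self.decay_start + self.decay_steps - s) / self.decay_steps
            return max(self.end_lr, self.base_lr * remaining ** self.decay_power)
        return self.end_lr
```

In a low-level loop, each value from get_next() would be passed to Model.set_learning_rate() before the next training iteration.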
-
-
-

set_source method

-
hugectr.Model.set_source()
-
-
-

The set_source method can set the data source and keyset files under epoch mode training. This overloaded method has two implementations.

-

Implementation Ⅰ: only valid when repeat_dataset is False.

-

Arguments

-
    -
  • source: List[str], the training dataset source. It can be specified with several file lists, e.g., source = ["file_list.1.txt", "file_list.2.txt"]. There is NO default value and it should be specified by users.

  • -
  • keyset: List[str], the keyset files, which should correspond to the source. For example, we can specify source = ["file_list.1.txt", "file_list.2.txt"] and keyset = ["file_list.1.keyset", "file_list.2.keyset"], which have a one-to-one correspondence. There is NO default value and it should be specified by users.

  • -
  • eval_source: String, the evaluation dataset source. There is NO default value and it should be specified by users.

  • -
-

Implementation Ⅱ: only valid when repeat_dataset is False.

-

Arguments

-
    -
  • source: String, the training dataset source. For Norm or Parquet dataset, it should be the file list of training data. For Raw dataset, it should be a single training file. There is NO default value and it should be specified by users.

  • -
  • eval_source: String, the evaluation dataset source. For Norm or Parquet dataset, it should be the file list of evaluation data. For Raw dataset, it should be a single evaluation file. There is NO default value and it should be specified by users.

  • -
-
-
-
-
-

Low-level Training API

-

For the HugeCTR low-level training API, the core data structures are basically the same as for the high-level training API. On this basis, we expose the internal LearningRateScheduler and DataReader within the Model and provide some low-level training methods as well. HugeCTR currently supports both epoch mode and non-epoch mode training for datasets in Norm and Raw formats, and supports only non-epoch mode training for datasets in Parquet format. While introducing the API usage, we describe how to employ these two modes of training.

-
-

LearningRateScheduler

-
-

get_next method

-
hugectr.LearningRateScheduler.get_next()
-
-
-

This method takes no extra arguments and returns the learning rate to be used for the next iteration.

-
-
-
-

DataReader

-
-

set_source method

-
hugectr.DataReader32.set_source()
-hugectr.DataReader64.set_source()
-
-
-

The set_source method of DataReader currently supports the dataset in Norm and Raw formats, and should be used in epoch mode training. When the data reader reaches the end of file for the current training data or evaluation data, this method can be used to re-specify the training data file or evaluation data file.

-

Arguments

-
    -
  • file_name: The file name of the new training source or evaluation source. For Norm format dataset, it takes the form of file_list.txt. For Raw format dataset, it appears as data.bin. The default value is '', which means that the data reader will reset to the beginning of the current data file.

  • -
-
-
-
-

is_eof method

-
hugectr.DataReader32.is_eof()
-hugectr.DataReader64.is_eof()
-
-
-

This method takes no extra arguments and returns whether the data reader has reached the end of the current source file.

-
-
-
-
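Putting the DataReader methods together, an epoch-mode loop resets the training source at the start of each epoch and trains until is_eof() reports the end of the file. This page does not show a complete low-level loop. The sketch below therefore uses hypothetical stand-in classes (FakeReader, FakeModel) that expose only the method names documented here, so the control flow can run without a GPU. In real code, these would be the object returned by Model.get_data_reader_train() and the Model itself. Here `lr_schedule` is any zero-argument callable, for example the get_next method of the scheduler from Model.get_learning_rate_scheduler(). Whether train() also signals the end of data is not stated in this section, so the loop checks is_eof() explicitly.

```python
class FakeReader:
    """Stand-in for a DataReader: yields a fixed number of batches per pass."""

    def __init__(self, num_batches: int):
        self.num_batches = num_batches
        self.cursor = 0
        self.source = ""

    def set_source(self, file_name: str = "") -> None:
        # '' rewinds the current file, as documented for DataReader.set_source.
        if file_name:
            self.source = file_name
        self.cursor = 0

    def is_eof(self) -> bool:
        return self.cursor >= self.num_batches


class FakeModel:
    """Stand-in exposing only set_learning_rate() and train()."""

    def __init__(self, reader: FakeReader):
        self.reader = reader
        self.lr = 0.0
        self.iterations = 0

    def set_learning_rate(self, lr: float) -> None:
        self.lr = lr

    def train(self) -> None:
        self.reader.cursor += 1  # consume one minibatch
        self.iterations += 1


def run_epochs(model, reader, num_epochs, lr_schedule):
    """Epoch-mode loop: rewind, then train until the reader hits end of file."""
    for _ in range(num_epochs):
        reader.set_source()  # back to the beginning of the training file
        while not reader.is_eof():
            model.set_learning_rate(lr_schedule())
            model.train()
    return model.iterations
```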

Model

-
-

get_learning_rate_scheduler method

-
hugectr.Model.get_learning_rate_scheduler()
-
-
-

hugectr.Model.get_learning_rate_scheduler generates and returns the LearningRateScheduler object of the model instance. When the SGD optimizer is adopted for training, the returned object can obtain the dynamically changing learning rate according to the warmup_steps, decay_start, and decay_steps configured in the hugectr.CreateSolver method. Refer to SGD Optimizer and Learning Rate Scheduling for detailed information about LearningRateScheduler.

-
-
-
-

get_data_reader_train method

-
hugectr.Model.get_data_reader_train()
-
-
-

This method takes no extra arguments and returns the DataReader object that reads the training data.

-
-
-
-

get_data_reader_eval method

-
hugectr.Model.get_data_reader_eval()
-
-
-

This method takes no extra arguments and returns the DataReader object that reads the evaluation data.

-
-
-
-

start_data_reading method

-
hugectr.Model.start_data_reading()
-
-
-

This method takes no extra arguments and should be used in non-epoch mode training, and only in that mode. It starts the train_data_reader and eval_data_reader before the training loop is entered.

-
-
-
-

set_learning_rate method

-
hugectr.Model.set_learning_rate()
-
-
-

This method is used together with the get_next method of LearningRateScheduler and sets the learning rate for the next training iteration.

-

Arguments

-
    -
  • lr: Float, the learning rate to be set.

  • -
-
-
-
-

train method

-
hugectr.Model.train()
-
-
-

This method takes no extra arguments and executes one training iteration, updating the model weights based on one minibatch of training data.

-
-
-
-

get_current_loss method

-
hugectr.Model.get_current_loss()
-
-
-

This method takes no extra arguments and returns the loss value for the current iteration.

-
-
-
-

eval method

-
hugectr.Model.eval()
-
-
-

This method takes no arguments and calculates the evaluation metrics based on one minibatch of evaluation data.

-
-
-
-

get_eval_metrics method

-
hugectr.Model.get_eval_metrics()
-
-
-

This method takes no extra arguments and returns the average evaluation metrics of several minibatches of evaluation data.

-
-
-
-
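In non-epoch mode the pattern differs. start_data_reading() is called once before the loop, and the loop runs for a fixed number of iterations. It logs get_current_loss() at the display interval and periodically runs eval() over several minibatches before reading get_eval_metrics(). The sketch below uses a hypothetical stand-in model (StubModel) so the control flow can be executed anywhere. The iteration counts, the fixed learning rate, and the shape of the returned metrics (a list of name/value pairs) are illustrative assumptions.

```python
class StubModel:
    """Stand-in exposing the low-level Model methods used in non-epoch mode."""

    def __init__(self):
        self.started = False
        self.step = 0
        self.eval_batches = 0
        self.lr = 0.0

    def start_data_reading(self) -> None:
        self.started = True

    def set_learning_rate(self, lr: float) -> None:
        self.lr = lr

    def train(self) -> None:
        if not self.started:
            raise RuntimeError("start_data_reading() must be called first")
        self.step += 1

    def get_current_loss(self) -> float:
        return 1.0 / self.step  # toy, steadily decreasing loss

    def eval(self) -> None:
        self.eval_batches += 1

    def get_eval_metrics(self):
        return [("AUC", 0.5)]  # assumed (name, value) pairs


def run_iterations(model, max_iter, display, eval_interval, eval_batches, lr=0.001):
    """Non-epoch loop: start readers once, then run a fixed number of iterations."""
    log = []
    model.start_data_reading()
    for it in range(1, max_iter + 1):
        model.set_learning_rate(lr)
        model.train()
        if it % display == 0:
            log.append(("loss", it, model.get_current_loss()))
        if it % eval_interval == 0:
            for _ in range(eval_batches):
                model.eval()
            log.append(("eval", it, model.get_eval_metrics()))
    return log
```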

save_params_to_files method

-
hugectr.Model.save_params_to_files()
-
-
-

This method saves the model parameters to files. If the Embedding Training Cache is used, this method saves the sparse weights, dense weights, and dense optimizer states. Otherwise, it saves the sparse weights, sparse optimizer states, dense weights, and dense optimizer states.

-

The stored sparse model can be used for both the later training and inference cases. Each sparse model will be dumped as a separate folder that contains two files (key, emb_vector) for the DistributedSlotEmbedding or three files (key, slot_id, emb_vector) for the LocalizedSlotEmbedding. Details of these files are:

-
    -
  • key: The unique keys that appear in the training data. All keys are stored in long long format, and HugeCTR handles the data type conversion internally when i64_input_key = False.

  • -
  • slot_id: The key distribution info internally used by the LocalizedSlotEmbedding.

  • -
  • emb_vector: The embedding vectors corresponding to keys stored in the key file.

  • -
-

Note that the key, slot id, and embedding vector are stored in the sparse model in the same sequence, so both the nth slot id in slot_id file and the nth embedding vector in the emb_vector file are mapped to the nth key in the key file.
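The snippet below sketches, with NumPy, how the key and emb_vector files line up: it writes and reads a toy sparse model folder. Only the long long (int64) key format comes from the text above. The rest is assumed and should be checked against a real dump before relying on it:

- Both files are headerless flat binary arrays in native byte order.
- emb_vector holds float32 values, with embedding_vec_size values per key.
- The files inside the folder are named exactly key and emb_vector.

The helper names are hypothetical.

```python
import os
import tempfile

import numpy as np


def write_sparse_model(folder, keys, vectors):
    """Write a toy sparse model folder (assumed headerless layout)."""
    os.makedirs(folder, exist_ok=True)
    np.asarray(keys, dtype=np.int64).tofile(os.path.join(folder, "key"))
    np.asarray(vectors, dtype=np.float32).tofile(os.path.join(folder, "emb_vector"))


def read_sparse_model(folder, embedding_vec_size):
    """Return {key: embedding vector}, pairing the n-th key with the n-th row."""
    keys = np.fromfile(os.path.join(folder, "key"), dtype=np.int64)
    vecs = np.fromfile(os.path.join(folder, "emb_vector"), dtype=np.float32)
    vecs = vecs.reshape(-1, embedding_vec_size)
    if len(keys) != len(vecs):
        raise ValueError("key and emb_vector files are out of sync")
    return {int(k): vecs[i] for i, k in enumerate(keys)}


with tempfile.TemporaryDirectory() as tmp:
    folder = os.path.join(tmp, "wdl0_sparse_1000.model")
    write_sparse_model(folder, [7, 42], [[0.1, 0.2], [0.3, 0.4]])
    table = read_sparse_model(folder, embedding_vec_size=2)
```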

-

Arguments

-
    -
  • prefix: String, the prefix of the saved files for model weights and optimizer states. There is NO default value and it should be specified by users. Remote file systems (HDFS, S3, and GCS) are also supported. For example, for HDFS, the prefix can be hdfs://localhost:9000/dir/to/model. For S3, the prefix should be either virtual-hosted-style or path-style and contain the region information; for examples, see the AWS official documentation. For GCS, both the URI (gs://bucket/object) and URL (https://storage.googleapis.com/bucket/object) forms are supported. Dumping models to a remote file system is not yet supported when MPI is enabled.

  • -
  • iter: Integer, the current number of iterations, which will be the suffix of the saved files for model weights and optimizer states. The default value is 0.

  • -
-
-
-
-

check_out_tensor method

-
hugectr.Model.check_out_tensor()
-
-
-

This method checks out the tensor values for the latest training or evaluation iteration. The tensor values are returned as a numpy array with the same dimensions as the tensor. The data type of the returned numpy array is always float32, while the data type of the tensor can be float32 or float16 depending on use_mixed_precision. Note that HugeCTR training and evaluation use separate tensors, so the flow must be specified as an argument of the method. This method can be helpful for debugging and verifying correctness, since the values of intermediate tensors can be easily checked out.

-

Arguments

-
    -
  • tensor_name: String, the name of the tensor that needs to be checked out. It should be within the names that are specified for tensors when creating the model graph using model.add.

  • -
  • tensor_type: hugectr.Tensor_t, the flow that the tensor belongs to, i.e., the training flow or the evaluation flow. The supported types are hugectr.Tensor_t.Train and hugectr.Tensor_t.Evaluate. If hugectr.Tensor_t.Train is specified, the gradients during backward propagation of the latest training iteration will be returned. If hugectr.Tensor_t.Evaluate is specified, the results during forward propagation of the latest evaluation iteration will be returned.

  • -
-

Example:

-
solver = hugectr.CreateSolver(max_eval_batches = 1280,
-                              batchsize_eval = 1024,
-                              batchsize = 4096,
-                              lr = 0.001,
-                              vvgpu = [[0]],
-                              repeat_dataset = True)
-...
-model.add(hugectr.Input(label_dim = 1, label_name = "label",
-                        dense_dim = 13, dense_name = "dense",
-                        data_reader_sparse_param_array =
-                        [hugectr.DataReaderSparseParam("data1", 1, True, 26)]))
-model.add(hugectr.SparseEmbedding(embedding_type = hugectr.Embedding_t.DistributedSlotSparseEmbeddingHash,
-                           workspace_size_per_gpu_in_mb = 75,
-                           embedding_vec_size = 16,
-                           combiner = "sum",
-                           sparse_embedding_name = "sparse_embedding1",
-                           bottom_name = "data1",
-                           optimizer = optimizer))
-...
-model.add(hugectr.DenseLayer(layer_type = hugectr.Layer_t.InnerProduct,
-                           bottom_names = ["concat1"],
-                           top_names = ["fc1"],
-                           num_output=1024))
-...
-model.fit(...)
-# Return a numpy array of (4096, 26, 16)
-sparse_embedding1_train_flow = model.check_out_tensor("sparse_embedding1", hugectr.Tensor_t.Train) 
-# Return a numpy array of (1024, 1024)
-fc1_evaluate_flow = model.check_out_tensor("fc1", hugectr.Tensor_t.Evaluate)
-
-
-
-
-
-
-

Inference API

-
-

Deprecation Warning: the offline inference based on InferenceSession and InferenceModel will be deprecated in a future release. Please check out the Hierarchical Parameter Server for alternatives based on TensorFlow and TensorRT.

-
-

For HugeCTR inference API, the core data structures are InferenceParams and InferenceModel. They are designed and implemented for the purpose of multi-GPU offline inference. Please refer to HugeCTR Backend if online inference with Triton is needed.

-

Note that the Inference API requires a configuration JSON file of the model graph, which can be derived from the Model.graph_to_json() method. In addition, model_name within CreateSolver should be specified during training in order to correctly dump the JSON file.

-
-

InferenceParams

-
-

InferenceParams class

-
hugectr.inference.InferenceParams()
-
-
-

InferenceParams specifies the parameters related to the inference. An InferenceParams instance is required to initialize the InferenceModel instance.

-

Refer to the HPS Configuration documentation for the parameters.

-
-
-
-

InferenceModel

-
-

InferenceModel class

-
hugectr.inference.InferenceModel()
-
-
-

InferenceModel is a collection of inference sessions deployed on multiple GPUs, which can leverage Hierarchical Parameter Server Database Backend and enable concurrent execution. The construction of InferenceModel requires a model configuration file and an InferenceParams instance.

-

Arguments

-
    -
  • model_config_path: String, the inference model configuration file (which can be derived from Model.graph_to_json). There is NO default value and it should be specified by users.

  • -
  • inference_params: InferenceParams, the InferenceParams object. There is NO default value and it should be specified by users.

  • -
-
-
-
-

predict method

-
hugectr.inference.InferenceModel.predict()
-
-
-

The predict method of InferenceModel makes predictions based on a dataset in Norm or Parquet format. It returns a 2-D numpy array of shape (max_batchsize*num_batches, label_dim), whose order is consistent with the sample order in the dataset. If max_batchsize*num_batches is greater than the total number of samples in the dataset, it loops over the dataset. For example, suppose the dataset contains 40000 samples in total, max_batchsize equals 4096, num_batches equals 10, and label_dim equals 2. The returned array then has shape (40960, 2): the first 40000 rows are the desired results, and the last 960 rows correspond to the first 960 samples in the dataset.

-
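The shape arithmetic in the predict example can be checked with a few lines of Python. The helper below is hypothetical. It computes the shape of the returned array and, for each row, which dataset sample that row corresponds to under the looping behavior described above.

```python
from typing import List, Tuple


def predict_layout(num_samples: int, max_batchsize: int, num_batches: int,
                   label_dim: int) -> Tuple[Tuple[int, int], List[int]]:
    """Return the predict() output shape and the sample index behind each row."""
    rows = max_batchsize * num_batches
    # Row i holds the prediction for sample i % num_samples (the reader loops).
    sample_of_row = [i % num_samples for i in range(rows)]
    return (rows, label_dim), sample_of_row


shape, sample_of_row = predict_layout(40000, 4096, 10, 2)
```

To keep only one prediction per sample when max_batchsize*num_batches exceeds the dataset size, slice the returned array with pred[:num_samples].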

Arguments

-
    -
  • num_batches: Integer, the number of prediction batches.

  • -
  • source: String, the source of prediction dataset. It should be the file list for Norm or Parquet format data.

  • -
  • data_reader_type: hugectr.DataReaderType_t, the data reader type. We support hugectr.DataReaderType_t.Parquet.

  • -
  • check_type: hugectr.Check_t, the check type for the data source. We currently support hugectr.Check_t.Sum and hugectr.Check_t.Non.

  • -
  • slot_size_array: List[int], the cardinality array of input features. It should be consistent with that of the sparse input. This argument is required for Parquet format data. The default value is an empty list, which is suitable for Norm format data.

  • -
  • data_source_params: DataSourceParams(), specifies the configurations of the data sources (Local, HDFS, or others) for data reading.

  • -
-
-
-
-

evaluate method

-
hugectr.inference.InferenceModel.evaluate()
-
-
-

The evaluate method of InferenceModel does evaluations based on the dataset of Norm or Parquet format. It requires that the dataset contains the label field. This method returns the AUC value for the specified evaluation batches.
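Because evaluate() returns an AUC value, it can help to cross-check a small sample by hand. The reference below uses the rank-based definition of AUC: the probability that a randomly chosen positive is scored above a randomly chosen negative, with ties counted as one half. HugeCTR computes AUC with its own GPU implementation, so this O(P·N) function is only meant for sanity checks on small arrays.

```python
from typing import Sequence


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Pairwise (Mann-Whitney) AUC for binary labels in {0, 1}."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a correct ordering
    return wins / (len(pos) * len(neg))
```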

-

Arguments

-
    -
  • num_batches: Integer, the number of evaluation batches.

  • -
  • source: String, the source of evaluation dataset. It should be the file list for Norm or Parquet format data.

  • -
  • data_reader_type: hugectr.DataReaderType_t, the data reader type. We support hugectr.DataReaderType_t.Parquet.

  • -
  • slot_size_array: List[int], the cardinality array of input features. It should be consistent with that of the sparse input. This argument is required for Parquet format data. The default value is an empty list, which is suitable for Norm format data.

  • -
  • data_source_params: DataSourceParams(), specifies the configurations of the data sources (Local, HDFS, or others) for data reading.

  • -
-
-
-
-

check_out_tensor method

-
hugectr.inference.InferenceModel.check_out_tensor()
-
-
-

This method checks out the tensor values for the latest inference iteration. The tensor values are returned as a numpy array with the same dimensions as the tensor. The data type of the returned numpy array is always float32, while the data type of the tensor can be float32 or float16 depending on use_mixed_precision. This method can be helpful for debugging and verifying correctness, since the values of intermediate tensors can be easily checked out.

-

Arguments

-
    -
  • tensor_name: String, the name of the tensor that needs to be checked out. It should be within the tensor names of the graph JSON file that is used to create the InferenceModel object.

  • -
-

Example:

import hugectr

# EVAL_SOURCE (the evaluation file list) and SLOT_SIZE_ARRAY (the per-slot
# cardinalities) must be defined for your dataset before running this example.
model_config = "dcn.json"
inference_params = hugectr.inference.InferenceParams(
    model_name = "dcn",
    max_batchsize = 16,
    hit_rate_threshold = 1.0,
    dense_model_file = "dcn_dense_1000.model",
    sparse_model_files = ["dcn0_sparse_1000.model"],
    deployed_devices = [0,1,2,3],
    use_gpu_embedding_cache = True,
    cache_size_percentage = 0.5,
    i64_input_key = True,
    use_mixed_precision = False,
    use_cuda_graph = True,
    number_of_worker_buffers_in_pool = 16
)
inference_model = hugectr.inference.InferenceModel(model_config, inference_params)
pred = inference_model.predict(
    1,
    EVAL_SOURCE,
    hugectr.DataReaderType_t.Parquet,
    hugectr.Check_t.Non,
    SLOT_SIZE_ARRAY
)
# Returns a numpy array of shape (16, 26, 16), assuming slot_num is 26 and embed_vec_size is 16
sparse_embedding1_inference_flow = inference_model.check_out_tensor("sparse_embedding1")
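When you debug with check_out_tensor, it can help to validate the returned array before comparing values. The sketch below assumes, as the comment in the example does, that an embedding output has shape (max_batchsize, slot_num, embed_vec_size). The embedding_output_problems helper is hypothetical and relies only on NumPy. It returns a list of detected problems, which is empty for a well-formed float32 array.

```python
import numpy as np

def embedding_output_problems(arr, batch_size, slot_num, embed_vec_size):
    """Return a list of problems found in an array returned by check_out_tensor (hypothetical helper)."""
    problems = []
    expected = (batch_size, slot_num, embed_vec_size)
    if arr.shape != expected:
        problems.append(f"shape {arr.shape} != expected {expected}")
    # check_out_tensor is documented to always return float32, even for float16 tensors.
    if arr.dtype != np.float32:
        problems.append(f"dtype {arr.dtype} != float32")
    elif not np.isfinite(arr).all():
        problems.append("contains NaN or Inf")
    return problems
```

Under these assumptions, embedding_output_problems(sparse_embedding1_inference_flow, 16, 26, 16) should return an empty list for the example above.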

Data Generator API

For the HugeCTR data generator API, the core data structures are DataGeneratorParams and DataGenerator. Refer to the data_generator directory in the HugeCTR repository on GitHub to learn how to write Python scripts that generate a synthetic dataset and start training a HugeCTR model.

DataGeneratorParams class

hugectr.tools.DataGeneratorParams()

DataGeneratorParams specifies the parameters related to data generation. A DataGeneratorParams instance is required to initialize a DataGenerator instance.

Arguments

  • format: The format of the synthetic dataset. The supported types are hugectr.DataReaderType_t.Parquet and hugectr.DataReaderType_t.Raw. There is NO default value and it should be specified by users.

  • label_dim: Integer, the label dimension of the synthetic dataset. There is NO default value and it should be specified by users.

  • dense_dim: Integer, the number of dense (or continuous) features in the synthetic dataset. There is NO default value and it should be specified by users.

  • num_slot: Integer, the number of sparse feature slots in the synthetic dataset. There is NO default value and it should be specified by users.

  • i64_input_key: Boolean, whether to use I64 input keys for the synthetic dataset. If your dataset format is Norm or Parquet, you can choose the data type of the input keys. For the Raw dataset format, only I32 is allowed. There is NO default value and it should be specified by users.

  • source: String, the synthetic training dataset source. For Norm or Parquet datasets, it should be the file list of the training data, e.g., source = “file_list.txt”. For Raw datasets, it should be a single training file, e.g., source = “train_data.bin”. There is NO default value and it should be specified by users.

  • eval_source: String, the synthetic evaluation dataset source. For Norm or Parquet datasets, it should be the file list of the evaluation data, e.g., eval_source = “file_list_test.txt”. For Raw datasets, it should be a single evaluation file, e.g., eval_source = “test_data.bin”. There is NO default value and it should be specified by users.

  • slot_size_array: List[int], the cardinality array of input features for the synthetic dataset. The list length should be equal to num_slot. There is NO default value and it should be specified by users.

  • nnz_array: List[int], the number of non-zero entries in each slot of the synthetic dataset. The list length should be equal to num_slot. This argument helps to simulate one-hot or multi-hot encodings. The default value is an empty list, in which case one-hot encoding is used.

  • dist_type: The distribution of the sparse input keys in the synthetic dataset. The supported types are hugectr.Distribution_t.PowerLaw and hugectr.Distribution_t.Uniform. The default value is hugectr.Distribution_t.PowerLaw.

  • power_law_type: The specific power law distribution to use. The supported types are hugectr.PowerLaw_t.Long (alpha=0.9), hugectr.PowerLaw_t.Medium (alpha=1.1), hugectr.PowerLaw_t.Short (alpha=1.3), and hugectr.PowerLaw_t.Specific (which requires a specific alpha value). This argument is only valid when dist_type is hugectr.Distribution_t.PowerLaw. The default value is hugectr.PowerLaw_t.Specific.

  • alpha: Float, the alpha value of the power law distribution. This argument is only valid when dist_type is hugectr.Distribution_t.PowerLaw and power_law_type is hugectr.PowerLaw_t.Specific. The alpha value should be greater than zero and not equal to 1.0. The default value is 1.2.

  • num_files: Integer, the number of training data files to generate. This argument is valid when format is hugectr.DataReaderType_t.Parquet. The default value is 128.

  • eval_num_files: Integer, the number of evaluation data files to generate. This argument is valid when format is hugectr.DataReaderType_t.Parquet. The default value is 32.

  • num_samples_per_file: Integer, the number of samples per generated data file. This argument is valid when format is hugectr.DataReaderType_t.Parquet. The default value is 40960.

  • num_samples: Integer, the number of samples in the single generated training data file (e.g., train_data.bin). This argument is only valid when format is hugectr.DataReaderType_t.Raw. The default value is 5242880.

  • eval_num_samples: Integer, the number of samples in the single generated evaluation data file (e.g., test_data.bin). This argument is only valid when format is hugectr.DataReaderType_t.Raw. The default value is 1310720.

  • float_label_dense: Boolean, only valid when format is hugectr.DataReaderType_t.Raw. If set to True, the label and dense features of each sample are interpreted as float values. Otherwise, they are treated as integer values, and the dense features are preprocessed with log(dense[i] + 1.f). The default value is False.
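The alpha constraints above (greater than zero and not equal to 1.0) match the standard inverse-CDF sampler for a truncated power law p(x) ∝ x^(-alpha), whose formula contains the exponent 1 / (1 - alpha). The following sketch draws keys that way. It illustrates power-law key skew under stated assumptions and is not the sampler that DataGenerator actually uses. The function name and the mapping from x to key IDs are hypothetical.

```python
import random

def sample_power_law_keys(cardinality, alpha, count, seed=None):
    """Draw `count` keys in [0, cardinality) whose frequency decays roughly like (key + 1) ** -alpha.

    Hypothetical illustration: inverse-CDF sampling of a continuous power law on
    [1, cardinality + 1), with floor(x) - 1 used as the key ID.
    """
    if cardinality < 1:
        raise ValueError("cardinality must be >= 1")
    if alpha <= 0 or alpha == 1.0:
        raise ValueError("alpha must be > 0 and != 1.0")  # 1 - alpha appears as a divisor below
    rng = random.Random(seed)
    e = 1.0 - alpha
    upper = (cardinality + 1) ** e  # CDF normalization term for the upper bound
    keys = []
    for _ in range(count):
        u = rng.random()  # uniform in [0, 1)
        x = (u * (upper - 1.0) + 1.0) ** (1.0 / e)  # inverse CDF, x in [1, cardinality + 1)
        key = int(x) - 1
        keys.append(min(max(key, 0), cardinality - 1))  # guard against floating-point edge cases
    return keys
```

Smaller key IDs are drawn far more often than large ones. Larger alpha values, such as the 1.3 used by hugectr.PowerLaw_t.Short, concentrate even more of the samples on the first few keys.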

DataGenerator

DataGenerator class

hugectr.tools.DataGenerator()

DataGenerator provides an API to generate synthetic datasets in Norm, Parquet, or Raw format. Constructing a DataGenerator requires a DataGeneratorParams instance.

Arguments

  • data_generator_params: The DataGeneratorParams instance that encapsulates the required parameters for data generation. There is NO default value and it should be specified by users.

generate method

hugectr.tools.DataGenerator.generate()

This method takes no extra arguments and starts to generate the synthetic dataset based on the configurations within data_generator_params.
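Before you call generate(), you can check how many samples a configuration will produce. This sketch assumes that evaluation Parquet files also contain num_samples_per_file samples each. The helper and its "parquet"/"raw" strings are hypothetical stand-ins for the hugectr.DataReaderType_t values, and the keyword defaults mirror the documented DataGeneratorParams defaults.

```python
def planned_sample_counts(fmt, num_files=128, eval_num_files=32,
                          num_samples_per_file=40960,
                          num_samples=5242880, eval_num_samples=1310720):
    """Return (train_samples, eval_samples) implied by a DataGeneratorParams-like configuration."""
    if fmt == "parquet":
        # Parquet output is split into many files of equal size (assumed for eval files too).
        return num_files * num_samples_per_file, eval_num_files * num_samples_per_file
    if fmt == "raw":
        # Raw output is one training file and one evaluation file.
        return num_samples, eval_num_samples
    raise ValueError(f"unsupported format: {fmt}")
```

With the documented defaults, both formats yield 5,242,880 training samples (128 × 40960) and 1,310,720 evaluation samples (32 × 40960).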


Data Source API


DataSourceParams class

hugectr.parquet.DataSourceParams()

DataSourceParams specifies the file system information and the paths to data and model used for training. A DataSourceParams instance is required to initialize the DataSource instance.

Arguments

  • source: hugectr.FileSystemType_t, the file system to use: Local, HDFS, S3, or GCS. The default is hugectr.FileSystemType_t.Local.

  • server: String, the IP address of your file system. For a Hadoop cluster (HDFS), it is your namenode. For AWS S3, it is the region. For GCS, it is the endpoint override (use storage.googleapis.com if you are using the default GCS endpoint). It is ignored if source is FileSystemType_t.Local. The default is ‘localhost’.

  • port: Integer, the port on which your Hadoop (HDFS) server listens. It is ignored if source is FileSystemType_t.Local, FileSystemType_t.S3, or FileSystemType_t.GCS. The default is 9000.
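The rules that decide which DataSourceParams fields take effect can be summarized in code. In this sketch, the FileSystem enum is a hypothetical stand-in for hugectr.FileSystemType_t, and effective_connection is illustrative only. It returns None for every field that the description above says is ignored.

```python
from enum import Enum

class FileSystem(Enum):
    # Hypothetical stand-in for hugectr.FileSystemType_t
    LOCAL = "Local"
    HDFS = "HDFS"
    S3 = "S3"
    GCS = "GCS"

def effective_connection(source=FileSystem.LOCAL, server="localhost", port=9000):
    """Return the (server, port) pair that actually matters for the given file system."""
    if source is FileSystem.LOCAL:
        return (None, None)  # both server and port are ignored
    if source in (FileSystem.S3, FileSystem.GCS):
        return (server, None)  # server is the region (S3) or endpoint override (GCS); port is ignored
    return (server, port)  # HDFS: namenode address and port
```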

Hierarchical Parameter Server Database Backend

Introduction to the HPS Database Backend

The Hierarchical Parameter Server database backend (HPS database backend) allows HugeCTR to use models with huge embedding tables by extending HugeCTR's storage space beyond the constraints of GPU memory, utilizing various memory resources across your cluster. Furthermore, it can store embedding tables permanently in a structured manner. For an end-to-end demo of how to use the HPS database backend, refer to samples.

Background

GPU clusters offer superior compute power compared to their CPU-only counterparts. However, although modern data-center GPUs by NVIDIA are equipped with increasing amounts of memory, new and more powerful AI algorithms keep emerging that require even more memory. Recommendation models with their huge embedding tables are spearheading these developments. The HPS database backend allows you to efficiently perform inference with models that rely on embedding tables that vastly exceed the available GPU device storage space.

This is achieved by utilizing other memory resources available within your cluster, such as CPU-accessible RAM and non-volatile memory. Aside from the general advantage of non-volatile memory that it retains stored information, storage devices such as HDDs and SSDs offer orders of magnitude more storage space than DDR memory and HBM (High Bandwidth Memory), at significantly lower cost. However, their throughput is lower and their latency is higher than those of DDR and HBM.

The HPS database backend acts as an intermediate layer between your GPU and non-volatile memory to store all embeddings of your model. This way, available local RAM and/or RAM resources across the cluster can be used as a cache to improve response times.

Architecture

As of version 3.3, the HugeCTR hierarchical parameter server database backend defines three storage layers.

  1. The CPU Memory Database layer utilizes volatile CPU-addressable RAM to cache embeddings. This database is created and maintained separately by each machine that runs HugeCTR in your cluster.

  2. The Distributed Database layer allows utilizing Redis cluster deployments to store and retrieve embeddings in and from the RAM that is available in your cluster. The HugeCTR distributed database layer is designed for compatibility with Redis persistence features such as RDB and AOF to allow seamless continued operation across device restarts. This kind of database is shared by all nodes that participate in the training or inference of a HugeCTR model.

    Note: Many products claim Redis compatibility. We cannot guarantee or make any statements regarding the suitability of these with our distributed database layer. However, we note that Redis alternatives are likely to be compatible with the Redis cluster distributed database layer as long as they are compatible with hiredis. We would love to hear about your experiences. Please let us know if you successfully or unsuccessfully deployed such Redis alternatives as storage targets with HugeCTR.

  3. The Persistent Database layer links HugeCTR with a persistent database. Each node that has such a persistent storage layer configured retains a separate copy of all embeddings in its locally available non-volatile memory. This layer is best considered a complement to the distributed database that further expands storage capabilities and provides high availability. As a result, if your model exceeds even the total RAM capacity of your entire cluster, or if the Redis cluster becomes unavailable for whatever reason, all nodes that are configured with a persistent database are still able to respond to inference requests, though likely with increased latency.

The following table provides an overview of the typical properties of the different parameter database layers and the embedding cache. We emphasize that this table provides rough guidelines. Properties for production deployments are often different.

|                                        | GPU Embedding Cache | CPU Memory Database   | Distributed Database (InfiniBand) | Distributed Database (Ethernet) | Persistent Database |
|----------------------------------------|---------------------|-----------------------|-----------------------------------|---------------------------------|---------------------|
| Mean Latency                           | ns ~ us             | us ~ ms               | us ~ ms                           | several ms                      | ms ~ s              |
| Capacity (relative)                    | ++                  | +++                   | +++++                             | +++++                           | +++++++             |
| Capacity (range in practice)           | 10 GBs ~ few TBs    | 100 GBs ~ several TBs | several TBs                       | several TBs                     | up to 100s of TBs   |
| Cost / Capacity                        | ++++                | +++                   | ++++                              | ++++                            | +                   |
| Volatile                               | yes                 | yes                   | configuration dependent           | configuration dependent         | no                  |
| Configuration / maintenance complexity | low                 | low                   | high                              | high                            | low                 |

Training and Iterative Model Updates

Models that are deployed with the HugeCTR HPS database backend allow streaming model parameter updates from external sources through Apache Kafka. This ability provides zero-downtime online model retraining.

Execution

Inference

With respect to embedding lookups from the HugeCTR GPU embedding cache and HPS database backend, the following logic applies:

  • Whenever the HugeCTR inference engine receives a batch of model input parameters for inference, it first determines the associated unique embedding keys and tries to resolve these embeddings using the embedding cache.

  • When there is a cache miss, the inference engine turns to the HPS database backend to determine the embedding representations.

  • The HPS database backend queries its configured backends in the following order to fill in the missing embeddings:

      1. Local and remote CPU memory locations.

      2. Persistent storage.

HugeCTR first tries to look up missing embeddings in either the CPU memory database or the distributed database. If, and only if, embedding representations are still missing after that, HugeCTR searches the non-volatile memory of the persistent database for the corresponding embedding representations. The persistent database contains a copy of all existing embeddings.
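The lookup order can be sketched with plain dictionaries standing in for each tier. This is a simplified model, not HugeCTR's implementation. It ignores cache insertion and concurrency, and it falls back to a default vector (compare default_value_for_each_table) when no tier holds a key. The function name is hypothetical.

```python
def lookup_embeddings(keys, gpu_cache, volatile_db, persistent_db, default_vector):
    """Resolve keys tier by tier: GPU cache, then CPU memory/distributed DB, then persistent DB."""
    result = {}
    missing = []
    # Tier 1: GPU embedding cache, queried once per unique key (order preserved).
    for k in dict.fromkeys(keys):
        if k in gpu_cache:
            result[k] = gpu_cache[k]
        else:
            missing.append(k)
    # Tier 2: local or remote CPU memory (CPU memory database or distributed database).
    still_missing = []
    for k in missing:
        if k in volatile_db:
            result[k] = volatile_db[k]
        else:
            still_missing.append(k)
    # Tier 3: persistent database, consulted only for keys that are still unresolved.
    for k in still_missing:
        result[k] = persistent_db.get(k, default_vector)
    return [result[k] for k in keys]
```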

Training

After a training iteration, the HugeCTR training process publishes the updates for modified embeddings through Kafka. The HPS database backend can be configured to automatically listen for change requests for certain models and then ingest these updates into its various database stages.

Lookup Optimization

If the volatile memory resources (the CPU memory database and the distributed database) are not sufficient to retain the entire model, HugeCTR attempts to minimize the average lookup latency by managing these resources like a cache, using a least recently used (LRU) algorithm.
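To illustrate the LRU policy, here is a minimal single-threaded LRU cache built on collections.OrderedDict. It shows only the eviction rule. HugeCTR's database layers are implemented differently, and the LRUCache class is hypothetical.

```python
from collections import OrderedDict

class LRUCache:
    """Fixed-capacity key/value cache that evicts the least recently used entry."""

    def __init__(self, capacity):
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self._data = OrderedDict()  # oldest entry first, most recently used last

    def get(self, key, default=None):
        if key not in self._data:
            return default
        self._data.move_to_end(key)  # mark as most recently used
        return self._data[key]

    def put(self, key, value):
        if key in self._data:
            self._data.move_to_end(key)
        self._data[key] = value
        if len(self._data) > self.capacity:
            self._data.popitem(last=False)  # evict the least recently used entry

    def __contains__(self, key):
        return key in self._data
```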

Configuration

The HugeCTR HPS database backend and iterative update can be configured using three separate configuration objects. The VolatileDatabaseParams and PersistentDatabaseParams objects are used to configure the database backends of each HPS database backend instance. If you want iterative or online model updating, you must also provide the UpdateSourceParams object to link the HPS database backend instance with your Kafka deployment. These objects are part of the hugectr.inference Python package.

If you deploy HugeCTR as a backend for NVIDIA Triton Inference Server, you can also provide these configuration options by extending your Triton deployment’s JSON configuration:

{
  "supportlonglong": true,
  "fuse_embedding_table": false,
  // ...
  "volatile_db": {
    // ...
  },
  "persistent_db": {
    // ...
  },
  "update_source": {
    // ...
  },
  // ...
}

Set the supportlonglong field to true when you need to use a 64-bit integer input key. You must set this field to true if you specify True for the i64_input_key parameter. The default value is true.

Set the fuse_embedding_table field to true when you want to fuse embedding tables. Tables with the same embedding vector size are fused in storage during HPS initialization. At each iteration, the original lookup queries are packed into one query via CPU multi-thread synchronization, and the packed query is forwarded to the fused embedding table. To use this feature, ensure that key values in different tables do not overlap and that the embedding lookup layers do not depend on each other in the model graph. This applies to the HPS Plugin for TensorFlow, the HPS Plugin for Torch, and the HPS Backend for Triton Inference Server. The default value is false.
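The fusion preconditions can be sketched as a planning step. The sketch groups tables by embedding vector size, checks that the tables within each group share no keys, and packs per-table queries into one query with offsets for splitting the results. The helper names are hypothetical, and this does not reproduce HPS internals. The overlap check only covers tables that would actually be fused, whereas the documentation requires no overlap across all tables.

```python
from collections import defaultdict

def plan_table_fusion(vec_sizes, table_keys):
    """Group tables by vector size and reject groups whose tables share keys.

    vec_sizes:  {table_name: embedding_vector_size}
    table_keys: {table_name: iterable of keys stored in that table}
    """
    groups = defaultdict(list)
    for name, size in vec_sizes.items():
        groups[size].append(name)
    for size, names in groups.items():
        seen = set()
        for name in names:
            keys = set(table_keys[name])
            if seen & keys:
                raise ValueError(f"tables fused for vector size {size} have overlapping keys")
            seen |= keys
    return dict(groups)

def pack_queries(queries):
    """Concatenate per-table key queries into one query and record split offsets."""
    packed, offsets = [], [0]
    for q in queries:
        packed.extend(q)
        offsets.append(len(packed))
    return packed, offsets
```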

The following sections describe the configuration parameters. Generally speaking, each node in your HugeCTR cluster should deploy the same configuration. In rare cases, it might make sense to vary some parameters. The most common reason to vary the configuration by node is a heterogeneous cluster.

Inference Parameters and Embedding Cache Configuration

Inference Params Syntax

params = hugectr.inference.InferenceParams(
  model_name = "string",
  max_batchsize = int,
  hit_rate_threshold = 0.9,
  dense_model_file = "string",
  network_file = "string",
  sparse_model_files = ["string-1", "string-2", ...],
  use_gpu_embedding_cache = True,
  cache_size_percentage = 0.2,
  i64_input_key = <True|False>,
  use_mixed_precision = False,
  scaler = 1.0,
  use_algorithm_search = True,
  use_cuda_graph = True,
  number_of_worker_buffers_in_pool = 2,
  number_of_refresh_buffers_in_pool = 1,
  thread_pool_size = 16,
  cache_refresh_percentage_per_iteration = 0.1,
  deployed_devices = [int-1, int-2, ...],
  default_value_for_each_table = [float-1, float-2, ...],
  volatile_db = <volatile-database-configuration>,
  persistent_db = <persistent-database-configuration>,
  update_source = <update-source-parameters>,
  maxnum_des_feature_per_sample = 26,
  embedding_cache_type = "dynamic",
  refresh_delay = 0.0,
  refresh_interval = 0.0,
  maxnum_catfeature_query_per_table_per_sample = [int-1, int-2, ...],
  embedding_vecsize_per_table = [int-1, int-2, ...],
  embedding_table_names = ["string-1", "string-2", ...]
)

The InferenceParams object specifies the parameters related to inference. An InferenceParams object is required to initialize the InferenceModel instance.

Inference Parameters

  • model_name: String, specifies the name of the model to use for inference. It should be consistent with the model_name that you specified during training. This parameter has no default value and you must specify a value.

  • max_batchsize: Integer, the maximum batch size for inference. The specified value is the global batch size and should be divisible by the length of deployed_devices. This parameter has no default value and you must specify a value.

  • hit_rate_threshold: Float, the hit rate threshold of the GPU embedding cache during inference. When the real hit rate of the GPU embedding cache is higher than the specified threshold, the GPU embedding cache performs an asynchronous insertion of missing embedding keys. Otherwise, the GPU embedding cache inserts the keys synchronously. Specify a value between 0 and 1. The default value is 0.9.

  • dense_model_file: String, the dense model file to load for inference. This parameter has no default value and you must specify a value.

  • network_file: String, specifies a file that includes the model network structure in JSON format. This file is exported after model training and is used for the initialization of the network structure of the dense part of the model. This parameter has no default value and you must specify a value.

  • sparse_model_files: List[str], the sparse model files to load for inference. This parameter has no default value and you must specify a value. Remote file systems (HDFS, S3, and GCS) are also supported. For example, for HDFS, the prefix can be hdfs://localhost:9000/dir/to/model. For S3, the prefix should be either virtual-hosted-style or path-style and contain the region information; for examples, see the official AWS documentation. For GCS, both the URI (gs://bucket/object) and the URL (https://storage.googleapis.com/bucket/object) forms are supported.

  • device_id: Integer, scheduled to be deprecated and replaced by devicelist.

  • use_gpu_embedding_cache: Boolean, whether to use the GPU embedding cache. When set to True, embedding vector lookups go to the GPU embedding cache. Otherwise, lookups use the CPU HPS database backend directly. The default value is True.

  • embedding_cache_type: String, specifies the type of embedding cache. Three types are supported: "dynamic", "static", and "uvm". Ranked by lookup performance from low to high, the order is "dynamic", "uvm", "static". The default value is "dynamic". The following table shows the functional differences between the three types of embedding cache:

| Type      | Support Dynamic Update | Offload Embeddings to CPU |
|-----------|------------------------|---------------------------|
| "dynamic" | Yes                    | Yes                       |
| "static"  | No                     | No                        |
| "uvm"     | No                     | Yes                       |

  • cache_size_percentage: Float, the percentage of embeddings cached on the GPU, relative to all the embedding tables on the CPU. The default value is 0.2.

  • i64_input_key: Boolean, set this value to True when you need to use an Int64 input key. This parameter has no default value and you must specify a value.

  • use_mixed_precision: Boolean, whether to enable mixed precision inference. The default value is False.

  • scaler: Float, the scaler to use when mixed precision training is enabled. Only the 128, 256, 512, and 1024 scalers are supported for mixed precision training. The default value is 1.0, which corresponds to no mixed precision training.

  • use_algorithm_search: Boolean, whether to use algorithm search for cublasGemmEx within the fully connected layer. The default value is True.

  • use_cuda_graph: Boolean, whether to enable CUDA graph for dense-network forward propagation. The default value is True.

  • number_of_worker_buffers_in_pool: Integer, specifies the number of worker buffers to allocate in the embedding cache memory pool. Specify a value such as two times the number of model instances to avoid resource exhaustion. Instead of specifying a larger value, you can also avoid resource exhaustion by disabling asynchronous updates, which you do by setting the hit_rate_threshold parameter to a value greater than 1. The default value is 2.

  • number_of_refresh_buffers_in_pool: Integer, specifies the number of refresh buffers to allocate in the embedding cache memory pool. HPS uses the refresh memory pool to support online updates of incremental models. Specify larger values if model updates occur at a high frequency or you have a large volume of incremental model updates. The default value is 1.

  • thread_pool_size: Integer, specifies the size of the thread pool. The GPU embedding cache uses the thread pool to perform asynchronous insertion of missing keys. The actual thread pool size is set to the maximum of the value that you specify and the value returned by std::thread::hardware_concurrency(). The default value is 16.

  • cache_refresh_percentage_per_iteration: Float, specifies the percentage of the embedding cache to refresh during each iteration. To avoid reducing the performance of the GPU cache during online updating, you can configure the update percentage of the GPU embedding cache. For example, if you specify cache_refresh_percentage_per_iteration = 0.2, the entire GPU embedding cache is refreshed over 5 iterations. Specify a smaller value if model updates occur at a high frequency or you have a large volume of incremental model updates. The default value is 0.0.

  • deployed_devices: List[Integer], specifies a list of the device IDs of your GPUs. The offline inference is executed concurrently on the specified GPUs. The default value is [0].

  • default_value_for_each_table: List[Float], specifies the value to return when an embedding key cannot be found. When an embedding key cannot be found in the GPU cache or in the volatile and persistent databases, the default value is returned. For models with multiple embedding tables, each embedding table has its own default value.

  • volatile_db: See the Volatile Database Configuration section.

  • persistent_db: See the Persistent Database Configuration section.

  • update_source: See the Update Source Configuration section.

  • maxnum_des_feature_per_sample: Integer, specifies the maximum number of dense features in each sample. Because each sample can contain a varying number of numeric (dense) features, use this parameter to specify the maximum number of dense features in each sample. The specified value determines the pre-allocated memory size on the host and device. The default value is 26.

  • refresh_delay: Float, specifies an initial delay, in seconds, to wait before beginning to refresh the embedding cache. The timer begins when the service launches. The default value is 0.0.

  • refresh_interval: Float, specifies the interval, in seconds, for the periodic refresh of the embedding keys in the GPU embedding cache. The embedding keys are refreshed from volatile and persistent data sources based on the specified number of seconds. The default value is 0.0.

  • maxnum_catfeature_query_per_table_per_sample: List[Int], this parameter determines the pre-allocated memory size on the host and device. We assume that for each input sample, there is a maximum number of embedding keys per sample in each embedding table that need to be looked up. Specify this parameter as [max(the number of embedding keys that need to be queried from embedding table 1 in each sample), max(the number of embedding keys that need to be queried from embedding table 2 in each sample), …]. This parameter has no default value and you must specify a value.

  • embedding_vecsize_per_table: List[Int], this parameter determines the pre-allocated memory size on the host and device. When there are multiple embedding tables, the embedding vector size may differ from table to table. Specify the maximum vector size for each embedding table. This parameter has no default value and you must specify a value.

  • embedding_table_names: List[String], specifies the name of each embedding table. The names are used to name the data partition and data table in the hierarchical database backend. The default value is ["sparse_embedding1", "sparse_embedding2", ...].

  • label_dim: Int, the maximum size of the prediction result for each sample. Some models, such as multi-task models, produce prediction results of varying size. The specified value determines the pre-allocated memory size on the host and device. The default value is 1.

  • slot_num: Int, the number of feature fields (slots). Each model contains a fixed number of feature fields. The specified value determines the pre-allocated memory size on the host and device. The default value is 10.

  • use_context_stream: Boolean, whether to use the context stream of TensorFlow or TensorRT for HPS embedding lookup. This is only valid for the HPS Plugin for TensorFlow and the HPS Plugin for TensorRT. The default value is True.
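Several of the parameters above interact through simple rules. The sketch below restates them in code: max_batchsize must be divisible by the number of deployed devices, the effective thread pool size is the larger of the configured size and the hardware concurrency, the fraction refreshed per iteration determines how many iterations a full cache refresh takes, and the hit rate decides between synchronous and asynchronous insertion. The function names are hypothetical. The per-device batch size max_batchsize / len(deployed_devices) is an assumption derived from the divisibility requirement.

```python
import math
import os

def resolve_inference_settings(max_batchsize, deployed_devices, thread_pool_size=16,
                               cache_refresh_percentage_per_iteration=0.0,
                               hardware_concurrency=None):
    """Derive settings implied by InferenceParams-style values (illustrative only)."""
    if max_batchsize % len(deployed_devices) != 0:
        raise ValueError("max_batchsize must be divisible by len(deployed_devices)")
    # Stand-in for std::thread::hardware_concurrency() when no value is given.
    hw = hardware_concurrency if hardware_concurrency is not None else (os.cpu_count() or 1)
    p = cache_refresh_percentage_per_iteration
    # round() guards against float artifacts such as 1 / 0.1 not being exactly 10.
    refresh_iters = math.ceil(round(1.0 / p, 9)) if p > 0 else None
    return {
        "per_device_batch": max_batchsize // len(deployed_devices),
        "effective_thread_pool_size": max(thread_pool_size, hw),
        "iterations_per_full_refresh": refresh_iters,
    }

def cache_insertion_mode(observed_hit_rate, hit_rate_threshold=0.9):
    """Missing keys are inserted asynchronously only when the hit rate exceeds the threshold."""
    return "async" if observed_hit_rate > hit_rate_threshold else "sync"
```

A threshold above 1 therefore always yields synchronous insertion, which matches the advice for number_of_worker_buffers_in_pool.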

Parameter Server Configuration: Models


The following JSON shows a sample configuration for the models key in a parameter server configuration file.

"supportlonglong": true,
"fuse_embedding_table": false,
"models":[
  {
    "model":"wdl",
    "sparse_files":["/wdl_infer/model/wdl/1/wdl0_sparse_20000.model", "/wdl_infer/model/wdl/1/wdl1_sparse_20000.model"],
    "dense_file":"/wdl_infer/model/wdl/1/wdl_dense_20000.model",
    "network_file":"/wdl_infer/model/wdl/1/wdl.json",
    "num_of_worker_buffer_in_pool": 4,
    "num_of_refresher_buffer_in_pool": 1,
    "deployed_device_list":[0],
    "max_batch_size":64,
    "default_value_for_each_table":[0.0,0.0],
    "maxnum_des_feature_per_sample":26,
    "maxnum_catfeature_query_per_table_per_sample":[2,26],
    "embedding_vecsize_per_table":[1,15],
    "embedding_table_names":["table1","table2"],
    "refresh_delay":0,
    "refresh_interval":0,
    "hit_rate_threshold":0.9,
    "gpucacheper":0.1,
    "embedding_cache_type": "dynamic",
    "gpucache":true,
    "cache_refresh_percentage_per_iteration": 0.2,
    "label_dim": 1,
    "slot_num":10,
    "use_context_stream": false
  }
]
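Because max_batch_size, maxnum_catfeature_query_per_table_per_sample, and embedding_vecsize_per_table determine pre-allocated buffers, a rough estimate helps when sizing them. This sketch assumes one key buffer (8 bytes per key when supportlonglong is true) and one float32 output buffer per batch. HugeCTR's real allocations may differ, and the helper is hypothetical.

```python
def estimate_lookup_buffers(max_batch_size, queries_per_table, vecsize_per_table,
                            i64_keys=True, bytes_per_value=4):
    """Rough (key_bytes, vector_bytes) estimate for one batch of embedding lookups."""
    if len(queries_per_table) != len(vecsize_per_table):
        raise ValueError("both per-table lists need one entry per embedding table")
    keys_per_sample = sum(queries_per_table)
    # Each queried key returns one vector of that table's embedding size.
    values_per_sample = sum(q * v for q, v in zip(queries_per_table, vecsize_per_table))
    key_bytes = max_batch_size * keys_per_sample * (8 if i64_keys else 4)
    vector_bytes = max_batch_size * values_per_sample * bytes_per_value
    return key_bytes, vector_bytes
```

For the wdl example above, estimate_lookup_buffers(64, [2, 26], [1, 15]) gives 14,336 bytes of keys and 100,352 bytes of embedding vectors per batch.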

Volatile Database Configuration


For HugeCTR, the volatile database implementations are grouped into two categories:

  • CPU memory databases have an instance on each machine and only use the locally available RAM as backing storage. As a result, you can individually vary their configuration parameters for each machine.

  • Distributed CPU memory databases are typically shared by all machines in your HugeCTR deployment. They enable you to use the combined memory capacity of your cluster machines. The configuration parameters for this kind of database should be identical across all machines in your deployment.

    Distributed databases are shared by all your HugeCTR nodes. These nodes collaborate to inject updates into the underlying database. The assignment of which node updates a specific partition can change at runtime.

Volatile Database Params Syntax

params = hugectr.inference.VolatileDatabaseParams(
  type = "redis_cluster",
  address = "127.0.0.1:7000",
  user_name = "default",
  password = "",
  num_partitions = int,
  allocation_rate = 268435456,  # 256 MiB
  shared_memory_size = 17179869184,  # 16 GiB
  shared_memory_name = "hctr_mp_hash_map_database",
  shared_memory_auto_remove = True,
  max_batch_size = 65536,
  enable_tls = False,
  tls_ca_certificate = "cacertbundle.crt",
  tls_client_certificate = "client_cert.pem",
  tls_client_key = "client_key.pem",
  tls_server_name_identification = "redis.localhost",
  overflow_margin = int,
  overflow_policy = hugectr.DatabaseOverflowPolicy_t.<enum_value>,
  overflow_resolution_target = 0.8,
  initialize_after_startup = True,
  initial_cache_rate = 1.0,
  cache_missed_embeddings = False,
  update_filters = ["filter-0", "filter-1", ...]
)

Parameter Server Configuration: Volatile Database


The following JSON shows a sample configuration for the volatile_db key in a parameter server configuration file.

"volatile_db": {
  "type": "redis_cluster",
  "address": "127.0.0.1:7003,127.0.0.1:7004,127.0.0.1:7005",
  "user_name": "default",
  "password": "",
  "num_partitions": 8,
  "allocation_rate": 268435456,
  "shared_memory_size": 17179869184,
  "shared_memory_name": "hctr_mp_hash_map_database",
  "shared_memory_auto_remove": true,
  "max_batch_size": 65536,
  "enable_tls": false,
  "tls_ca_certificate": "cacertbundle.crt",
  "tls_client_certificate": "client_cert.pem",
  "tls_client_key": "client_key.pem",
  "tls_server_name_identification": "redis.localhost",
  "overflow_margin": 10000000,
  "overflow_policy": "evict_random",
  "overflow_resolution_target": 0.8,
  "initialize_after_startup": true,
  "initial_cache_rate": 1.0,
  "cache_missed_embeddings": false,
  "update_filters": [".+"]
}

Volatile Database Parameters

  • type: specifies the volatile database implementation. Specify one of the following:

    • hash_map: Hash-map based CPU memory database implementation.

    • multi_process_hash_map: A hash-map that can be shared by multiple processes. This hash map lives in your operating system’s shared memory (i.e., /dev/shm).

    • parallel_hash_map: Hash-map based CPU memory database implementation with multithreading support. This is the default value.

    • redis_cluster: Connect to an existing Redis cluster deployment (Distributed CPU memory database implementation).


The following parameters apply when you set type="hash_map" or type="parallel_hash_map":

  • num_partitions: Integer, specifies the number of partitions for embedding tables and controls the degree of parallelism. Parallel hash map implementations split your embedding tables into approximately evenly sized partitions and parallelize lookup and insert operations. The default value is calculated as min(number_of_cpu_cores, 16) of the system that you used to build the HugeCTR binaries.

  • allocation_rate: Integer, specifies the maximum number of bytes to allocate for each memory allocation request. The default value is 268435456 bytes (256 MiB).

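The partitioning that num_partitions controls can be pictured with a short Python sketch. It assumes a simple key-modulo rule for assigning keys to partitions; HugeCTR's actual assignment is not documented here, and the helper names (default_num_partitions, partition_of, split_into_partitions) are hypothetical. Note also that HugeCTR evaluates the default on the build machine, whereas this sketch uses the current machine's core count.

```python
import os
from collections import defaultdict


def default_num_partitions(cpu_cores=None):
    """Documented default: min(number_of_cpu_cores, 16).

    HugeCTR evaluates this on the build machine; this sketch uses the current one.
    """
    if cpu_cores is None:
        cpu_cores = os.cpu_count() or 1
    return min(cpu_cores, 16)


def partition_of(key, num_partitions):
    # Hypothetical assignment rule (key modulo partition count), for
    # illustration only; HugeCTR's real mapping may differ.
    return key % num_partitions


def split_into_partitions(keys, num_partitions):
    """Group keys by partition so each group can be looked up or inserted in parallel."""
    partitions = defaultdict(list)
    for key in keys:
        partitions[partition_of(key, num_partitions)].append(key)
    return dict(partitions)
```

On a 64-core build machine the default caps at 16 partitions, and 1000 consecutive keys split into 16 groups whose sizes differ by at most one.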

The following parameters apply when you set type="multi_process_hash_map":

  • shared_memory_size: Integer, specifies the amount of shared memory, in bytes, to reserve in the operating system. In other words, this value determines the size of the memory-mapped file that is created in /dev/shm. The upper bound for the size of /dev/shm is determined by your hardware and operating system configuration, and the latter may need to be adjusted to share large embedding tables between processes. This is particularly true when you run HugeCTR in a Docker container. By default, Docker allocates only 64 MiB for /dev/shm, which is insufficient for most recommendation models. You can try starting your Docker deployment with --shm-size=... to reserve more of the host's shared memory for the respective container (see also docs.docker.com/engine/reference/run).

  • shared_memory_name: String, the symbolic name of the shared memory. The name must be unique within the system, and all processes that attach to the same shared memory must use the same name.

  • shared_memory_auto_remove: Boolean, when set to True (default), the shared memory is removed when the last process disconnects. If this flag is set to False, the state of the shared memory is retained across program restarts.

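Before you deploy a multi_process_hash_map, it can help to confirm that /dev/shm has room for shared_memory_size bytes. The following is a minimal sketch, assuming a Linux host where /dev/shm is a mounted tmpfs; shm_has_room is a hypothetical helper, not part of HugeCTR.

```python
import shutil

MIB = 1024 ** 2
GIB = 1024 ** 3


def shm_has_room(shared_memory_size, shm_path="/dev/shm", free_bytes=None):
    """Return True if shm_path has at least shared_memory_size bytes free.

    Pass free_bytes to skip the filesystem query (useful for testing);
    otherwise the free space is read with shutil.disk_usage.
    """
    if free_bytes is None:
        free_bytes = shutil.disk_usage(shm_path).free
    return free_bytes >= shared_memory_size
```

For example, shm_has_room(16 * GIB) returns False inside a container started with Docker's default 64 MiB /dev/shm; restarting the container with a larger --shm-size is the remedy described above.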

The following parameters apply when you set type="redis_cluster":

  • address: String, specifies the address of one or more servers of the Redis cluster. Use the pattern "host-1[:port],host-2[:port],...". The default value is "127.0.0.1:7000".

  • user_name: String, specifies the user name of the Redis cluster. The default value is "default".

  • password: String, specifies the password of your account. The default value is "" and corresponds to no password.

  • num_partitions: Integer, specifies the number of partitions for embedding tables. Each embedding table is divided into num_partitions approximately evenly sized partitions. Each partition is assigned a storage location in your Redis cluster. HugeCTR does not provide any guarantees regarding the placement of partitions. As a result, multiple partitions can be stored on the same node for some models and deployments. In most cases, to take advantage of your cluster resources, set num_partitions to at least the number of Redis nodes. For optimal performance, set num_partitions to be strictly larger than the number of machines. However, each partition incurs a small processing overhead, so do not specify a value that is too large. A typical value that retains high performance and provides good cluster utilization is 2-5x the number of machines in your Redis cluster. The default value is 8.

  • max_batch_size: Integer, an optimization parameter. Mass lookup and insert requests to distributed endpoints are chunked into max_batch_size-sized batches. For maximum performance, this parameter should be large. However, if the available memory for buffering requests in your endpoints is limited or you experience transmission stability issues, specifying smaller values can help. The default value is 65536. With high-performance networking and endpoint hardware, try setting the value to 1000000.


    Note: When the Redis backend (type = "redis_cluster") is used in conjunction with certain open source versions of Redis, setting a maximum batch size above 262143 (2^18 - 1) can lead to obscure errors and should be avoided.

  • enable_tls: Boolean, enables TLS/SSL-secured connections with Redis clusters. The default value is False (TLS/SSL disabled). Enabling encryption may slightly increase latency and decrease the overall throughput when communicating with the Redis cluster.

  • tls_ca_certificate: String, specifies the filesystem path to the certificate(s) of the CA for TLS/SSL-secured connections. If the path denotes a directory, all valid certificates in the directory are considered. The default value is cacertbundle.crt.

  • tls_client_certificate: String, specifies the filesystem path of the client certificate to use for TLS/SSL-secured connections. The default value is client_cert.pem.

  • tls_client_key: String, specifies the filesystem path of the private key to use for TLS/SSL-secured connections. The default value is client_key.pem.

  • tls_server_name_identification: String, specifies the server name that is sent via SNI (Server Name Indication). This name can differ from the actual connection address. The default value is redis.localhost.

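The Redis guidance above reduces to simple arithmetic and chunking. The following Python sketch illustrates it under stated assumptions: the helper names are hypothetical, the factor of 3 is just one point in the recommended 2-5x range, and capping batches at 262143 is a conservative choice that mirrors the note on open source Redis versions, not something HugeCTR is known to do automatically.

```python
# Larger batches can cause obscure errors with some open source Redis versions.
REDIS_SAFE_MAX_BATCH_SIZE = 2 ** 18 - 1  # 262143


def suggest_num_partitions(num_redis_machines, factor=3):
    """Return a num_partitions value in the recommended 2-5x range."""
    if not 2 <= factor <= 5:
        raise ValueError("factor should be between 2 and 5")
    return num_redis_machines * factor


def chunk(keys, max_batch_size):
    """Split a mass lookup or insert request into max_batch_size-sized batches.

    The batch size is capped at REDIS_SAFE_MAX_BATCH_SIZE as a conservative choice.
    """
    size = min(max_batch_size, REDIS_SAFE_MAX_BATCH_SIZE)
    return [keys[i:i + size] for i in range(0, len(keys), size)]
```

For a four-machine Redis cluster, suggest_num_partitions(4) yields 12 partitions, which is strictly larger than the machine count as recommended.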

Overflow Parameters


To maximize performance and avoid instabilities that can be caused by sporadic high memory usage, such as out-of-memory situations, HugeCTR provides an overflow handling mechanism. This mechanism enables you to limit the maximum number of embeddings to store for each partition. The limit acts as an upper bound for the memory consumption of your distributed database.

  • overflow_margin: Integer, specifies the maximum number of embeddings to store for each partition. Inserting more than overflow_margin embeddings into the database triggers the configured overflow_policy. This parameter sets the upper bound for the amount of memory that your CPU memory database can occupy. Larger values for this parameter result in higher hit rates but also consume more memory. The default value is 2^64 - 1 and indicates no limit.


    When you use a CPU memory database in conjunction with a persistent database, the ideal value for overflow_margin can vary. In practice, a value in the range [1000000, 100000000] provides reliable performance and throughput.

  • overflow_policy: specifies how to respond to an overflow condition (that is, which embeddings should be pruned first). Pruning is conducted per partition in max_batch_size-sized batches until the respective partition contains at most overflow_margin * overflow_resolution_target embeddings. Specify one of the following:

    • evict_random (default): Embeddings for pruning are chosen at random.

    • evict_least_used: Prune the least-frequently used (LFU) embeddings. This is a best-effort policy: for performance reasons, the backends implement it with different algorithms, so identical behavior across backends is not guaranteed.

    • evict_oldest: Prune the least-recently used (LRU) embeddings.


    Unlike evict_least_used and evict_oldest, the evict_random policy does not require complicated comparisons and can be faster. However, evict_least_used and evict_oldest are likely to deliver better performance over time because these policies evict embeddings based on the access statistics.

  • overflow_resolution_target: Double, specifies the fraction of the embeddings to keep when embeddings must be evicted. Specify a value strictly between 0 and 1. The default value is 0.8 and indicates to evict embeddings from a partition until it is shrunk to 80% of its maximum size. In other words, when the partition size surpasses overflow_margin embeddings, 20% of the embeddings are evicted according to the specified overflow_policy.

  • initialize_after_startup: Boolean, when set to True (default), the contents of the sparse model files are used to initialize this database. Setting this parameter to False is useful if multiple processes connect to the same database, or if restarted processes connect to a previously initialized database that retains its state between inference process restarts, for example, when you reconnect to an existing RocksDB or Redis deployment or to an already materialized multi-process hash map.

  • initial_cache_rate: Double, specifies the fraction of the embeddings to initially attempt to cache. Specify a value in the range [0.0, 1.0]. HugeCTR attempts to cache the specified fraction of the dataset immediately upon startup of the HPS database backend. For example, a value of 0.5 causes the HugeCTR HPS database backend to attempt to cache up to 50% of your dataset using the volatile database after initialization. The default value is 1.0.

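To make the interplay of overflow_margin, overflow_policy, overflow_resolution_target, and max_batch_size concrete, here is a simplified single-partition simulation in Python. It is a sketch, not HugeCTR's implementation: the per-embedding hits and last_access counters are assumed bookkeeping, resolve_overflow is a hypothetical name, and real backends implement evict_least_used on a best-effort basis.

```python
import random


def resolve_overflow(partition, overflow_margin, overflow_policy="evict_random",
                     overflow_resolution_target=0.8, max_batch_size=65536, rng=None):
    """Prune a single partition in place once it exceeds overflow_margin.

    partition maps key -> {"hits": int, "last_access": int} (assumed bookkeeping).
    Embeddings are evicted in batches of at most max_batch_size until at most
    overflow_margin * overflow_resolution_target remain. Returns the number evicted.
    """
    if len(partition) <= overflow_margin:
        return 0
    target = int(overflow_margin * overflow_resolution_target)
    if overflow_policy == "evict_random":
        order = list(partition)
        (rng or random.Random(0)).shuffle(order)
    elif overflow_policy == "evict_least_used":    # lowest hit count first (LFU)
        order = sorted(partition, key=lambda k: partition[k]["hits"])
    elif overflow_policy == "evict_oldest":        # earliest access first (LRU)
        order = sorted(partition, key=lambda k: partition[k]["last_access"])
    else:
        raise ValueError(f"unknown overflow_policy: {overflow_policy}")
    evicted = 0
    while len(partition) > target:
        batch_len = min(max_batch_size, len(partition) - target)
        for key in order[evicted:evicted + batch_len]:
            del partition[key]
        evicted += batch_len
    return evicted
```

With overflow_margin = 90 and the default overflow_resolution_target of 0.8, a partition holding 100 embeddings is pruned down to 72.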

Common Volatile Database Parameters


The following parameters are common to all volatile database types.

  • cache_missed_embeddings: Boolean, when set to True and an embedding could not be retrieved from the volatile database but could be retrieved from the persistent database, the embedding is inserted into the volatile database. The insert operation could replace another value. The default value is False and disables this functionality.


    This setting optimizes the volatile database in response to the queries that are received in inference mode. In training mode, updated embeddings are automatically written back to the database after each training step. As a result, setting the value to True during training is likely to increase the number of writes to the database and degrade performance without providing significant improvements.

  • update_filters: List[str], specifies regular expressions that are used to control sending model updates from Kafka to the CPU memory database backend. The default value is ["^hps_.+$"] and processes updates for all HPS models because the filter matches all HPS model names.


    The functionality of this parameter might change in future versions.

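The lookup path that cache_missed_embeddings affects can be sketched with two Python dicts standing in for the volatile and persistent databases. This is an illustration under assumptions, not the HPS code path: lookup is a hypothetical helper, and falling back to a default vector follows the documented behavior for IDs that cannot be found in HPS.

```python
def lookup(keys, volatile, persistent, default_vector, cache_missed_embeddings=False):
    """Look up keys in the volatile layer first, then in the persistent layer.

    volatile and persistent are plain dicts standing in for the two database
    layers. When cache_missed_embeddings is True, vectors found only in the
    persistent layer are copied into the volatile layer.
    """
    results = []
    for key in keys:
        if key in volatile:
            results.append(volatile[key])
        elif key in persistent:
            vector = persistent[key]
            if cache_missed_embeddings:
                volatile[key] = vector  # a real backend may evict another entry here
            results.append(vector)
        else:
            results.append(default_vector)  # not found in any layer
    return results
```

In inference, enabling the flag lets repeated misses warm the volatile layer; in training, the extra writes are why the text above advises against it.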

Persistent Database Configuration


Persistent databases have an instance on each machine and use the locally available non-volatile memory as backing storage. As a result, some configuration parameters can vary according to the specifications of the machine.


Persistent Database Params Syntax

params = hugectr.inference.PersistentDatabaseParams(
  type = hugectr.DatabaseType_t.<enum_value>,
  path = "/tmp/rocksdb",
  num_threads = 16,
  read_only = False,
  max_batch_size = 65536,
  update_filters = ["filter-0", "filter-1", ... ]
)

Parameter Server Configuration: Persistent Database


The following JSON shows a sample configuration for the persistent_db key in a parameter server configuration file.

"persistent_db": {
  "type": "rocks_db",
  "path": "/tmp/rocksdb",
  "num_threads": 16,
  "read_only": false,
  "max_batch_size": 65536,
  "update_filters": [".+"]
}

Persistent Database Parameters

  • type: specifies the persistent database implementation. Specify one of the following:

    • disabled (default): Prevents the use of a persistent database.

    • rocks_db: Create or connect to a RocksDB database.

  • path: String, specifies the directory on each machine where the RocksDB database can be found. If the directory does not contain a RocksDB database, HugeCTR creates a database for you. Be aware that this behavior can overwrite files that are stored in the directory. For best results, make sure that path specifies an existing RocksDB database or an empty directory. The default value is /tmp/rocksdb.

  • num_threads: Integer, specifies the number of threads for the RocksDB driver. The default value is 16.

  • read_only: Boolean, when set to True, the database is opened in read-only mode. Read-only mode is suitable for inference if the model is static and the database is shared by multiple machines, such as with NFS. The default value is False.

  • max_batch_size: Integer, specifies the batch size for lookup and insert requests. Mass lookup and insert requests to RocksDB are chunked into batches. For maximum performance, this parameter should be large. However, if the available memory for buffering requests in your endpoints is limited, lowering this value might improve performance. The default value is 65536. With high-performance hardware, you can try setting this parameter to 1000000.

  • update_filters: List[str], specifies regular expressions that are used to control sending model updates from Kafka to the persistent database backend. The default value is ["^hps_.+$"] and processes updates for all HPS models because the filter matches all HPS model names.


    The functionality of this parameter might change in future versions.

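The following Python sketch shows how a list of update_filters patterns could be evaluated against model names. It assumes that an update is processed when at least one pattern matches the entire model name; whether HugeCTR uses full or partial matching is not stated here, and accepted_by_filters is a hypothetical helper.

```python
import re


def accepted_by_filters(model_name, update_filters=("^hps_.+$",)):
    """Return True if any regular expression in update_filters matches model_name.

    Assumes full-string matching; the default mirrors the documented ["^hps_.+$"].
    """
    return any(re.fullmatch(pattern, model_name) for pattern in update_filters)
```

With the documented default, a model named hps_dlrm is accepted and dlrm is not, whereas the [".+"] filter used in the JSON samples accepts any non-empty name.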

Update Source Configuration


The real-time update source is the origin for model updates during online retraining. To ensure that all database layers are kept in sync, configure all the nodes in your HugeCTR deployment identically.


Update Source Params Syntax

params = hugectr.UpdateSourceParams(
  type = "kafka_message_queue",
  brokers = "host-1[:port][;host-2[:port]...]",
  metadata_refresh_interval_ms = 30000,
  poll_timeout_ms = 500,
  receive_buffer_size = 262144,
  max_batch_size = 8192,
  failure_backoff_ms = 50,
  max_commit_interval = 32
)

Parameter Server Configuration: Update Source


The following JSON shows a sample configuration for the update_source key in a parameter server configuration file.

"update_source": {
  "type": "kafka_message_queue",
  "brokers": "127.0.0.1:9092",
  "metadata_refresh_interval_ms": 30000,
  "poll_timeout_ms": 500,
  "receive_buffer_size": 262144,
  "max_batch_size": 8192,
  "failure_backoff_ms": 50,
  "max_commit_interval": 32
}

Update Source Parameters

  • type: String, specifies the update source implementation. Specify one of the following:

    • null: Prevents the use of an update source. This is the default value.

    • kafka_message_queue: Connect to an existing Apache Kafka message queue.

  • brokers: String, specifies a semicolon-delimited list of host name or IP address and port pairs. You must specify at least one host name and port of a Kafka broker node. The default value is 127.0.0.1:9092.

  • metadata_refresh_interval_ms: Int, specifies the interval, in milliseconds, at which the topic metadata is downloaded from the Kafka broker.

  • receive_buffer_size: Int, specifies the size of the buffer, in bytes, that stores data received from Kafka. The best value to specify is equal to the send_buffer_size of the KafkaMessageSink that is used to push updates to Kafka. The message.max.bytes setting of the Kafka broker must be at least receive_buffer_size + 1024 bytes. The default value is 262144 bytes.

  • poll_timeout_ms: Int, specifies the maximum time to wait, in milliseconds, for additional updates before dispatching updates to the database layers. The default value is 500 ms.

  • max_batch_size: Int, specifies the maximum number of keys and values from messages to consume before dispatching updates to the database. HugeCTR dispatches the updates in chunks. The maximum size of these chunks is set with this parameter. The default value is 8192.

  • failure_backoff_ms: Int, specifies a delay, in milliseconds, to wait after failing to dispatch updates to the database successfully. In some situations, issues such as a temporarily unreachable Redis node can prevent a successful dispatch. After the delay, HugeCTR retries dispatching a set of updates. The default value is 50 ms.

  • max_commit_interval: Int, specifies the maximum number of messages to hold before delivering and committing the messages to Kafka. This parameter is evaluated independently of any other conditions or parameters. Received data is forwarded and committed no later than after max_commit_interval messages have been processed since the previous commit. The default value is 32.

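Two of the update source constraints can be checked before deployment: the format of the brokers string and the requirement that the broker's message.max.bytes is at least receive_buffer_size + 1024 bytes. The helpers below (parse_brokers, broker_accepts) are hypothetical Python sketches, and parse_brokers does not handle IPv6 literals.

```python
def parse_brokers(brokers):
    """Split a semicolon-delimited broker list into (host, port) pairs.

    Entries without an explicit port get port=None, leaving the choice to the client.
    """
    pairs = []
    for entry in brokers.split(";"):
        entry = entry.strip()
        if not entry:
            continue
        host, sep, port = entry.rpartition(":")
        if sep:
            pairs.append((host, int(port)))
        else:
            pairs.append((entry, None))
    if not pairs:
        raise ValueError("at least one Kafka broker must be specified")
    return pairs


def broker_accepts(receive_buffer_size, message_max_bytes):
    """Kafka's message.max.bytes must be at least receive_buffer_size + 1024."""
    return message_max_bytes >= receive_buffer_size + 1024
```

For the default receive_buffer_size of 262144 bytes, the broker's message.max.bytes must therefore be at least 263168 bytes.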
\ No newline at end of file
diff --git a/review/pr-458/hierarchical_parameter_server/hps_dlrm_benchmark.html b/review/pr-458/hierarchical_parameter_server/hps_dlrm_benchmark.html
deleted file mode 100644
index 5b4830cd00..0000000000
--- a/review/pr-458/hierarchical_parameter_server/hps_dlrm_benchmark.html
+++ /dev/null
@@ -1,302 +0,0 @@

Benchmark the DLRM Model with HPS


Benchmark Setup


We create the DLRM model with native TensorFlow and its counterpart with the HPS Plugin for TensorFlow using the create_tf_models.py script. The native TF DLRM model is in the SavedModel format and is about 16 GB, nearly all of which is the embedding weights because the dense layer weights are small. The DLRM model with the plugin leverages HPS to store the embedding table and perform embedding lookup. The script also generates the JSON configuration file and the embedding table file required by HPS.


Furthermore, we build the TensorRT engines for the DLRM model in both fp32 and fp16 modes using the create_trt_engines.py script. The script configures the engines with the HPS Plugin for TensorRT. The workflow can be summarized in three steps: convert the TF SavedModel to ONNX, perform ONNX graph surgery to insert the HPS plugin layer, and build the TensorRT engines with the HPS Plugin for TensorRT.


We compare three deployment methods on the Triton Inference Server:

  • DLRM with Native TensorFlow: The experimental option VariablePolicy.SAVE_VARIABLE_DEVICES is used to enable CPU and GPU hybrid deployment of the DLRM SavedModel; that is, the embedding table is on the CPU while the MLP layers are on the GPU. This deployment method is common for native TF models with large embedding tables and serves as the baseline of this benchmark. The model is deployed with the Triton backend for TensorFlow.

  • DLRM with HPS Plugin for TensorFlow: In this DLRM SavedModel, tf.nn.embedding_lookup is replaced by hps.LookupLayer to perform embedding lookup, and the MLP layers are kept unchanged. The model is deployed with the Triton backend for TensorFlow.

  • DLRM with HPS Plugin for TensorRT: The HPS plugin layer is integrated into the built TensorRT engines, and the MLP layers are accelerated by TensorRT. The TensorRT engines are built with a minimum batch size of 1, an optimum of 1024, and a maximum of 131072. Both fp32 and fp16 modes are investigated. The engines are deployed with the Triton backend for TensorRT.


The benchmark is conducted on the A100-SXM4-80GB GPU with one Triton model instance on it. The GPU embedding cache of HPS is turned on and the cache percentage is configured as 0.2. For details about how to deploy TF models with HPS Plugin for TensorFlow and TRT engines with HPS Plugin for TensorRT on Triton, please refer to hps_tensorflow_triton_deployment_demo.ipynb and demo_for_tf_trained_model.ipynb.


After launching the Triton Inference Server, we send the same batch of inference data repeatedly using the Triton Performance Analyzer. In this case, the embedding lookup is served by the GPU embedding cache of HPS, so the best-case performance of HPS can be studied. The command and the sample data to measure the latency for a batch with one sample are as follows:

perf_analyzer -m ${MODEL_NAME} -u localhost:8000 --input-data 1.json --shape categorical_features:1,26 --shape numerical_features:1,13
{
"data":[
{
"categorical_features":[276633,7912898,7946796,7963854,7971191,7991237,7991368,7998351,7999728,8014930,13554004,14136456,14382203,14382219,14384425,14395091,14395194,14395215,14396165,14671338,22562171,25307802,32394527,32697105,32709007,32709104],
"numerical_features":[3.76171875,3.806640625,1.609375,4.04296875,1.7919921875,1.0986328125,1.0986328125,1.609375,2.9453125,1.0986328125,1.38671875,8.3984375,1.9462890625]
}
]
}

We take the forward latency at the server side as our benchmark metric, which is reported by the performance analyzer via the compute infer field:

  Server:
    Inference count: 28589
    Execution count: 28589
    Successful request count: 28589
    Avg request latency: 562 usec (overhead 9 usec + queue 9 usec + compute input 59 usec + compute infer 431 usec + compute output 53 usec)
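To collect the benchmark metric programmatically, the compute infer value can be extracted from the perf_analyzer report with a regular expression. This is a minimal sketch; compute_infer_usec is a hypothetical helper, and it assumes the report format shown above.

```python
import re

# Matches the "compute infer <N> usec" component of the server-side report.
_COMPUTE_INFER = re.compile(r"compute infer (\d+) usec")


def compute_infer_usec(report):
    """Return the server-side 'compute infer' latency in microseconds."""
    match = _COMPUTE_INFER.search(report)
    if match is None:
        raise ValueError("no 'compute infer' field found in the report")
    return int(match.group(1))
```

Applied to the sample report above, it returns 431.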

Results


The benchmark is conducted with the Merlin TensorFlow container nvcr.io/nvidia/merlin/merlin-tensorflow:23.02 on a machine with A100-SXM4-80GB + 2 x AMD EPYC 7742 64-Core Processor. The software versions are listed below:

TensorFlow version: 2.10.0
Triton version: 22.11
TensorRT version: 8.5.1-1+cuda11.8

The per-batch forward latency, in microseconds, measured at the server side is shown in the following table and in Figure 1 (the Y-axis of Figure 1 is logarithmic). The FP16 TRT engine with HPS achieves the best performance at almost all batch sizes and is about 10x faster than the native TF baseline at large batch sizes.

Batch size  Native TF  TF with HPS  FP32 TRT with HPS  FP16 TRT with HPS  Speedup (FP16 TRT with HPS vs. Native TF)
32          551        612          380                389                1.42
64          608        667          381                346                1.76
256         832        639          438                428                1.94
1024        1911       849          604                534                3.58
2048        4580       1059         927                766                5.98
4096        9872       1459         1446               1114               8.86
8192        19643      2490         2432               1767               11.12
16384       35292      4131         4355               3053               11.56
32768       54090      7795         6816               5247               10.31
65536       107742     15036        13012              10022              10.75
131072      213990     29374        25440              19340              11.06

Figure 1. The DLRM inference latency for different deployment methods.
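The speedup column is the ratio of the native TF latency to the FP16 TRT with HPS latency. The following Python snippet recomputes it from the latencies in the table above; it introduces no new measurements, and the speedup helper name is illustrative.

```python
# Server-side forward latency in microseconds, copied from the table above.
native_tf = {32: 551, 64: 608, 256: 832, 1024: 1911, 2048: 4580, 4096: 9872,
             8192: 19643, 16384: 35292, 32768: 54090, 65536: 107742, 131072: 213990}
fp16_trt_hps = {32: 389, 64: 346, 256: 428, 1024: 534, 2048: 766, 4096: 1114,
                8192: 1767, 16384: 3053, 32768: 5247, 65536: 10022, 131072: 19340}


def speedup(baseline_usec, candidate_usec):
    """Speedup = baseline latency / candidate latency, rounded to two decimals."""
    return round(baseline_usec / candidate_usec, 2)


# One speedup value per batch size, in the table's row order.
speedups = {bs: speedup(native_tf[bs], fp16_trt_hps[bs]) for bs in native_tf}
```

The recomputed values match the table, including the peak of 11.56 at batch size 16384.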

Resources

\ No newline at end of file
diff --git a/review/pr-458/hierarchical_parameter_server/hps_tf_api/index.html b/review/pr-458/hierarchical_parameter_server/hps_tf_api/index.html
deleted file mode 100644
index c8c4e1070f..0000000000
--- a/review/pr-458/hierarchical_parameter_server/hps_tf_api/index.html
+++ /dev/null
@@ -1,172 +0,0 @@
Hierarchical Parameter Server API
\ No newline at end of file
diff --git a/review/pr-458/hierarchical_parameter_server/hps_tf_api/initialize.html b/review/pr-458/hierarchical_parameter_server/hps_tf_api/initialize.html
deleted file mode 100644
index 325f50ad6c..0000000000
--- a/review/pr-458/hierarchical_parameter_server/hps_tf_api/initialize.html
+++ /dev/null
@@ -1,265 +0,0 @@

HPS Initialize

hierarchical_parameter_server.Init(**kwargs)

Abbreviated as hps.Init(**kwargs).


This function initializes the HPS for all the deployed models. It can be used explicitly or implicitly. When used explicitly, you must call the function only once, and you must call it before any other HPS APIs. When used implicitly, ps_config_file and global_batch_size should be specified in the constructor of hps.SparseLookupLayer and hps.LookupLayer. When the layer is executed for the first time, it triggers the internal HPS initialization implicitly in a thread-safe, call-once manner. The implicit initialization is especially useful for deploying SavedModels that leverage the HPS layers for online inference.
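The thread-safe, call-once behavior of implicit initialization follows a common pattern. The following generic Python sketch illustrates that pattern; it is not HPS's internal code, and CallOnce is a hypothetical name.

```python
import threading


class CallOnce:
    """Run an initializer exactly once, even if many threads trigger it."""

    def __init__(self, init_fn):
        self._init_fn = init_fn
        self._lock = threading.Lock()
        self._done = False
        self._result = None

    def __call__(self, **kwargs):
        if not self._done:              # fast path once initialized
            with self._lock:
                if not self._done:      # re-check while holding the lock
                    self._result = self._init_fn(**kwargs)
                    self._done = True
        return self._result
```

Wrapping an initializer with CallOnce and invoking the wrapper from several threads runs the initializer only once; every later call returns the cached result.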


HPS leverages all available GPUs for the current CPU process. Set CUDA_VISIBLE_DEVICES or tf.config.set_visible_devices to specify which GPUs to use in this process before you launch the TensorFlow runtime and call this function. Additionally, ensure that the deployed_device_list parameter in the HPS configuration JSON file matches the visible devices.


In TensorFlow 2.x, HPS can be used with tf.distribute.Strategy or Horovod. When it is used with tf.distribute.Strategy, you must call it under strategy.scope() as shown in the following code block.

import tensorflow as tf
import hierarchical_parameter_server as hps

strategy = tf.distribute.MirroredStrategy()  # any tf.distribute.Strategy
with strategy.scope():
    hps.Init(**kwargs)

To use the function with Horovod, call it once in each Horovod process after initializing Horovod, as the following code block shows.

import hierarchical_parameter_server as hps
import horovod.tensorflow as hvd

hvd.init()

hps.Init(**kwargs)

In TensorFlow 1.15, HPS works only with Horovod. The returned status must be evaluated with sess.run, and this must be the first step before evaluating any other HPS APIs.

import tensorflow as tf
import hierarchical_parameter_server as hps

hps_init = hps.Init(**kwargs)
with tf.Session() as sess:
    sess.run(hps_init)
    ...
Parameters:

kwargs (dict) –

Keyword arguments for this function. The dictionary must contain global_batch_size and ps_config_file.

  • global_batch_size: int, the global batch size for HPS that is deployed on multiple GPUs.

  • ps_config_file: str, the JSON configuration file for HPS initialization.


An example configuration for the JSON file passed as ps_config_file is shown below as a Python dictionary; global_batch_size can be configured as 16384 correspondingly:

ps_config_file = {
    "supportlonglong" : True,
    "models" :
    [{
        "model": "demo_model",
        "sparse_files": ["demo_model_sparse.model"],
        "num_of_worker_buffer_in_pool": 3,
        "embedding_table_names":["sparse_embedding0"],
        "embedding_vecsize_per_table": [16],
        "maxnum_catfeature_query_per_table_per_sample": [10],
        "default_value_for_each_table": [1.0],
        "deployed_device_list": [0],
        "max_batch_size": 16384,
        "cache_refresh_percentage_per_iteration": 0.2,
        "hit_rate_threshold": 1.0,
        "gpucacheper": 1.0,
        "gpucache": True
    },
    {
        "model": "demo_model2",
        "sparse_files": ["demo_model2_sparse_0.model", "demo_model2_sparse_1.model"],
        "num_of_worker_buffer_in_pool": 3,
        "embedding_table_names":["sparse_embedding0", "sparse_embedding1"],
        "embedding_vecsize_per_table": [64, 32],
        "maxnum_catfeature_query_per_table_per_sample": [3, 5],
        "default_value_for_each_table": [1.0, 1.0],
        "deployed_device_list": [0],
        "max_batch_size": 16384,
        "cache_refresh_percentage_per_iteration": 0.2,
        "hit_rate_threshold": 1.0,
        "gpucacheper": 1.0,
        "gpucache": True
    }
    ]
}
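Because each per-table list in a model entry must line up with embedding_table_names, a quick consistency check before writing the JSON file can catch mistakes. The following Python sketch assumes that every list in PER_TABLE_KEYS, including sparse_files, holds one entry per table, as both example models above do; check_model_config and write_ps_config are hypothetical helpers. json.dump converts Python's True to JSON's true.

```python
import json

# Keys assumed to hold one entry per embedding table (as in both examples above).
PER_TABLE_KEYS = (
    "sparse_files",
    "embedding_vecsize_per_table",
    "maxnum_catfeature_query_per_table_per_sample",
    "default_value_for_each_table",
)


def check_model_config(model):
    """Raise ValueError if a per-table list does not match embedding_table_names."""
    num_tables = len(model["embedding_table_names"])
    for key in PER_TABLE_KEYS:
        if len(model[key]) != num_tables:
            raise ValueError(
                f"{model['model']}: {key} has {len(model[key])} entries, expected {num_tables}")


def write_ps_config(config, path):
    """Validate every model, then write the dict as JSON (True becomes true)."""
    for model in config["models"]:
        check_model_config(model)
    with open(path, "w") as f:
        json.dump(config, f, indent=4)
    return path
```

For example, write_ps_config(ps_config_file, "demo_ps_config.json") validates both example models and writes a file whose path can then be passed as the ps_config_file argument; the file name is illustrative.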
Returns:

status – On success, the function returns a string with the value OK.

Return type:

str

\ No newline at end of file
diff --git a/review/pr-458/hierarchical_parameter_server/hps_tf_api/layers.html b/review/pr-458/hierarchical_parameter_server/hps_tf_api/layers.html
deleted file mode 100644
index 1898470fa2..0000000000
--- a/review/pr-458/hierarchical_parameter_server/hps_tf_api/layers.html
+++ /dev/null
@@ -1,353 +0,0 @@

HPS Layers


SparseLookupLayer

class hierarchical_parameter_server.SparseLookupLayer(*args, **kwargs)

Bases: Layer


Abbreviated as hps.SparseLookupLayer(*args, **kwargs).


This is a wrapper class for the HPS sparse lookup layer, which performs essentially the same function as tf.nn.embedding_lookup_sparse. Note that ps_config_file and global_batch_size should be specified in the constructor if you want to use implicit HPS initialization.

Parameters:
  • model_name (str) – The name of the model that has embedding tables.

  • table_id (int) – The index of the embedding table for the model specified by model_name.

  • emb_vec_size (int) – The embedding vector size for the embedding table specified by model_name and table_id.

  • emb_vec_dtype – The data type of embedding vectors, which must be tf.float32.

  • ps_config_file (str) – The JSON configuration file for HPS initialization.

  • global_batch_size (int) – The global batch size for HPS that is deployed on multiple GPUs.


Examples

import tensorflow as tf
import hierarchical_parameter_server as hps

sparse_lookup_layer = hps.SparseLookupLayer(model_name = args.model_name,
                                           table_id = args.table_id,
                                           emb_vec_size = args.embed_vec_size,
                                           emb_vec_dtype = tf.float32,
                                           ps_config_file = args.ps_config_file,
                                           global_batch_size = args.global_batch_size)

@tf.function
def _infer_step(inputs):
    embedding_vector = sparse_lookup_layer(sp_ids=inputs,
                                           sp_weights=None,
                                           combiner="mean")
    ...

for i, (inputs, labels) in enumerate(dataset):
    _infer_step(inputs)
call(sp_ids, sp_weights, name=None, combiner=None, max_norm=None)

Looks up embeddings for the given IDs and weights from a list of tensors. This op assumes that there is at least one ID for each row in the dense tensor represented by sp_ids (that is, there are no rows with empty features), and that all the indices of sp_ids are in canonical row-major order. sp_ids and sp_weights (if not None) are SparseTensors of rank 2. Embeddings are always aggregated along the last dimension. If an ID value cannot be found in the HPS, the default embeddings, which can be specified in the HPS configuration JSON file, are retrieved.
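The preconditions on sp_ids (rank 2, no empty rows, canonical row-major order) can be checked on plain index lists before building a SparseTensor. check_sparse_ids is a hypothetical Python helper, not part of HPS. In TensorFlow, tf.sparse.reorder restores canonical ordering, and tf.sparse.fill_empty_rows can fill empty rows, although the filled default ID then contributes an embedding of its own.

```python
def check_sparse_ids(indices, dense_shape):
    """Check the documented preconditions of call() for sp_ids.

    indices: list of (row, col) pairs of a rank-2 SparseTensor.
    dense_shape: (num_rows, num_cols).
    Returns a list of problems; an empty list means the input is acceptable.
    """
    problems = []
    if len(dense_shape) != 2:
        problems.append("sp_ids must have rank 2")
        return problems
    # Canonical row-major order is lexicographic order of (row, col) pairs.
    if list(indices) != sorted(indices):
        problems.append("indices are not in canonical row-major order")
    rows_with_ids = {row for row, _ in indices}
    empty = [r for r in range(dense_shape[0]) if r not in rows_with_ids]
    if empty:
        problems.append(f"rows without any ID: {empty}")
    return problems
```

The four-entry example used in this section, with a dense shape of (3, 4), passes all checks.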

Parameters:
  • sp_ids – N x M SparseTensor of int32 or int64 IDs, where N is typically the batch size and M is arbitrary.

  • sp_weights – Either a SparseTensor of float or double weights, or None to indicate that all weights should be taken to be 1. If specified, sp_weights must have exactly the same shape and indices as sp_ids.

  • combiner

    A string that specifies the reduction op:

    "sum"

    Computes the weighted sum of the embedding results for each row.

    "mean"

    Computes the weighted sum divided by the total weight.

    "sqrtn"

    Computes the weighted sum divided by the square root of the sum of the squares of the weights.


    The default value is "mean".

  • max_norm – If not None, each embedding is clipped if its l2-norm is larger than this value, before combining.

Returns:

emb_vector – A dense tensor representing the combined embeddings for the sparse IDs. For each row in the dense tensor represented by sp_ids, the op looks up the embeddings for all IDs in that row, multiplies them by the corresponding weight, and combines these embeddings as specified. In other words, if

-
shape(sp_ids) = shape(sp_weights) = [d0, d1]
-
-
-

then

-
shape(output) = [d0, self.emb_vec_dtype]
-
-
-

For instance, if self.emb_vec_dtype is 16, and sp_ids / sp_weights are

-
[0, 0]: id 1, weight 2.0
-[0, 1]: id 3, weight 0.5
-[1, 0]: id 0, weight 1.0
-[2, 3]: id 1, weight 3.0
-
-
-

with combiner = "mean", then the output is a 3x16 matrix where

-
output[0, :] = (vector_for_id_1 * 2.0 + vector_for_id_3 * 0.5) / (2.0 + 0.5)
-output[1, :] = (vector_for_id_0 * 1.0) / 1.0
-output[2, :] = (vector_for_id_1 * 3.0) / 3.0
-
-
-

Return type:

tf.Tensor of float32

Raises:

  • TypeError – If sp_ids is not a SparseTensor, or if sp_weights is neither None nor a SparseTensor.

  • ValueError – If combiner is not one of {"mean", "sqrtn", "sum"}.
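To check expected values by hand, the following pure-Python sketch reproduces the weighted reduction described for the three combiners. It is an illustration, not the HPS implementation: the embedding table is a hypothetical in-memory dict with 2-dimensional vectors (instead of 16) for readability, combine_row and clip_by_norm are illustrative helpers, and max_norm clipping is assumed to rescale a vector down to the given L2 norm before combining, as tf.nn.embedding_lookup_sparse does.

```python
import math


def clip_by_norm(vec, max_norm):
    """Rescale vec so that its L2 norm is at most max_norm (no-op when max_norm is None)."""
    norm = math.sqrt(sum(x * x for x in vec))
    if max_norm is None or norm <= max_norm:
        return list(vec)
    return [x * max_norm / norm for x in vec]


def combine_row(table, ids, weights, combiner="mean", max_norm=None):
    """Combine the embeddings of one row of sp_ids / sp_weights."""
    if combiner not in ("sum", "mean", "sqrtn"):
        raise ValueError(f"unsupported combiner: {combiner}")
    dim = len(next(iter(table.values())))
    total = [0.0] * dim
    for key, w in zip(ids, weights):
        vec = clip_by_norm(table[key], max_norm)  # clipping happens before combining
        total = [t + w * v for t, v in zip(total, vec)]
    if combiner == "sum":
        return total
    if combiner == "mean":
        denom = sum(weights)
    else:  # "sqrtn"
        denom = math.sqrt(sum(w * w for w in weights))
    return [t / denom for t in total]


# Hypothetical table: id -> embedding vector.
table = {0: [1.0, 0.0], 1: [0.0, 1.0], 3: [2.0, 2.0]}

# Rows from the example above: row 0 has ids 1 and 3, row 1 has id 0, row 2 has id 1.
rows = [([1, 3], [2.0, 0.5]), ([0], [1.0]), ([1], [3.0])]
output = [combine_row(table, ids, w, "mean") for ids, w in rows]
print(output)  # [[0.4, 1.2], [1.0, 0.0], [0.0, 1.0]]
```

Each output row has one entry per embedding dimension, which is why the output shape is [d0, emb_vec_size] regardless of how many IDs a row contains.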

LookupLayer

class hierarchical_parameter_server.LookupLayer(*args, **kwargs)

Bases: Layer


Abbreviated as hps.LookupLayer(*args, **kwargs).

This is a wrapper class for the HPS lookup layer, which performs the same function as tf.nn.embedding_lookup. To use implicit HPS initialization, specify ps_config_file and global_batch_size in the constructor.

Parameters:

  • model_name (str) – The name of the model that has embedding tables.

  • table_id (int) – The index of the embedding table for the model specified by model_name.

  • emb_vec_size (int) – The embedding vector size for the embedding table specified by model_name and table_id.

  • emb_vec_dtype – The data type of embedding vectors, which must be tf.float32.

  • ps_config_file (str) – The JSON configuration file for HPS initialization.

  • global_batch_size (int) – The global batch size for HPS that is deployed on multiple GPUs.

Examples

import hierarchical_parameter_server as hps
import tensorflow as tf

# `args` holds your command-line options (for example, parsed with argparse).
lookup_layer = hps.LookupLayer(model_name=args.model_name,
                               table_id=args.table_id,
                               emb_vec_size=args.embed_vec_size,
                               emb_vec_dtype=tf.float32,
                               ps_config_file=args.ps_config_file,
                               global_batch_size=args.global_batch_size)

@tf.function
def _infer_step(inputs):
    embedding_vector = lookup_layer(inputs)
    ...

# `dataset` yields (inputs, labels) batches, for example a tf.data.Dataset.
for i, (inputs, labels) in enumerate(dataset):
    _infer_step(inputs)

call(ids, max_norm=None)

The forward logic of this wrapper class.

Parameters:

  • ids – A Tensor of keys. The supported data types are tf.int32 and tf.int64.

  • max_norm – If not None, each embedding is clipped if its L2 norm is larger than this value.

Returns:

emb_vector – The embedding vectors for the input keys. Its shape is ids.get_shape() + [emb_vec_size], that is, the shape of ids with emb_vec_size appended as the last dimension.

Return type:

tf.Tensor of float32
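As a mental model of the shape rule for LookupLayer, the following pure-Python sketch emulates a dense lookup: every key in a nested list of ids is replaced by its embedding vector, so the output shape is the ids shape with emb_vec_size appended. The dict-backed table, the lookup and shape helpers, and the choice to fill missing keys with a constant default value (mirroring the default_value_for_each_table setting of the HPS configuration) are illustrative assumptions, not the HPS implementation.

```python
def lookup(table, ids, emb_vec_size, default_value=0.0):
    """Replace every key in a (nested) list of ids with its embedding vector."""
    if isinstance(ids, list):
        return [lookup(table, x, emb_vec_size, default_value) for x in ids]
    # Assumption: a missing key yields a vector filled with the table's default value.
    return list(table.get(ids, [default_value] * emb_vec_size))


def shape(nested):
    """Return the shape of a rectangular nested list."""
    dims = []
    while isinstance(nested, list):
        dims.append(len(nested))
        nested = nested[0]
    return dims


table = {10: [0.1, 0.2, 0.3], 42: [1.0, 1.0, 1.0]}  # hypothetical table, emb_vec_size = 3
ids = [[10, 42], [42, 7]]                           # shape [2, 2]; key 7 is missing
vectors = lookup(table, ids, emb_vec_size=3)
print(shape(vectors))  # [2, 2, 3]
print(vectors[1][1])   # [0.0, 0.0, 0.0]
```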

\ No newline at end of file
diff --git a/review/pr-458/hierarchical_parameter_server/hps_tf_user_guide.html b/review/pr-458/hierarchical_parameter_server/hps_tf_user_guide.html
deleted file mode 100644
index 0c084dce47..0000000000
--- a/review/pr-458/hierarchical_parameter_server/hps_tf_user_guide.html
+++ /dev/null
@@ -1,272 +0,0 @@
Hierarchical Parameter Server Plugin for TensorFlow — Merlin HugeCTR documentation

Hierarchical Parameter Server Plugin for TensorFlow


Introduction to the HPS Plugin for TensorFlow


The Hierarchical Parameter Server (HPS) is a distributed inference framework dedicated to deploying large embedding tables and retrieving embeddings with low latency. The framework combines a high-performance GPU embedding cache with a hierarchical storage architecture that encompasses different types of database backends. The plugin is provided as a Python toolkit that you can easily integrate into the TensorFlow (TF) model graph. This integration facilitates the deployment of TensorFlow models with large embedding tables.


Benefits of the Plugin for TensorFlow


When you deploy deep learning models with large embedding tables in TensorFlow, you are faced with the following challenges:

  • Large Embedding Tables: Trained embedding tables of hundreds of gigabytes cannot fit into GPU memory.

  • Low Latency Requirement: Online inference requires low-latency embedding lookup to maintain the quality of experience and user engagement.

  • Scalability on multiple GPUs: Dozens of models need to be deployed on multiple GPUs, and each model can have several embedding tables.

  • Pre-trained embeddings: Large embedding tables need to be loaded as pre-trained embeddings for tasks like transfer learning.

The HPS plugin for TensorFlow mitigates these challenges and helps in the following ways:

  • Extend the GPU memory by utilizing other memory resources available within the cluster, such as CPU-accessible RAM and non-volatile memory such as HDDs and SSDs, as shown in Fig. 1.

  • Use the GPU embedding cache to exploit the long-tail characteristics of the keys. The cache automatically stores the embeddings for hot keys as queries are received, providing a low-latency lookup service.

  • Manage the embedding tables of multiple models in a structured manner across the whole memory hierarchy of GPUs, CPUs, and SSDs.

  • Make the lookup service subscribable through custom TensorFlow layers, enabling transfer learning with large embedding tables.

Fig. 1: HPS Memory Hierarchy




Workflow


The workflow of leveraging HPS for deployment of TensorFlow models is illustrated in Fig. 2.

Fig. 2: Workflow of deploying TF models with HPS



The steps in the workflow can be summarized as follows:

  • Train: The model graph should be created with native TensorFlow embedding layers (e.g., tf.nn.embedding_lookup_sparse) or model-parallel SOK embedding layers (e.g., sok.DistributedEmbedding). There is no restriction on the usage of dense layers or the topology of the model graph, as long as the model can be trained successfully with TensorFlow.

  • Dissect the training graph: The subgraph composed of only dense layers should be extracted from the trained graph and saved separately. For native TensorFlow embedding layers, the trained embedding weights should be obtained and converted to the HPS-compatible format. For SOK embedding layers, sok.Saver.dump_to_file can be used to derive the desired format. Each embedding table should be stored in a directory with two binary files: key (int64) and emb_vector (float32). For example, if there are 1000 trained keys in total and the embedding vector size is 16, the key file is 1000*8 bytes and the emb_vector file is 1000*16*4 bytes.

  • Create and save the inference graph: The inference graph should be created with HPS layers (e.g., hps.SparseLookupLayer) and the saved subgraph of dense layers. It can then be saved as a whole and deployed in the production environment.

  • Deploy the inference graph with HPS: The configurations for the models to be deployed should be specified in a JSON file, and HPS should be started via hps.Init before any execution. The saved inference graph can then perform online inference with the benefits of HPS embedding lookup. Refer to HPS Configuration for more information.
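The two-file layout from the Dissect step can be produced with the Python standard library alone. The following is a minimal sketch, not the sok.Saver.dump_to_file implementation: the dump_table helper and the table_0 directory name are hypothetical, the vectors are random placeholders for trained weights, and the files are written in the host's native byte order (little-endian on x86 and most ARM servers), which is assumed to match what HPS expects. For 1000 keys and an embedding vector size of 16, the file sizes match the 1000*8 and 1000*16*4 bytes given above.

```python
import array
import os
import random
import tempfile


def dump_table(directory, keys, vectors):
    """Write keys as int64 to directory/key and flattened vectors as float32 to directory/emb_vector."""
    assert len(keys) == len(vectors)
    os.makedirs(directory, exist_ok=True)
    with open(os.path.join(directory, "key"), "wb") as f:
        array.array("q", keys).tofile(f)  # "q" = signed 64-bit integer
    flat = [x for vec in vectors for x in vec]
    with open(os.path.join(directory, "emb_vector"), "wb") as f:
        array.array("f", flat).tofile(f)  # "f" = 32-bit float


num_keys, emb_vec_size = 1000, 16
keys = list(range(num_keys))
vectors = [[random.uniform(-0.05, 0.05) for _ in range(emb_vec_size)] for _ in keys]

out_dir = os.path.join(tempfile.mkdtemp(), "table_0")  # hypothetical directory name
dump_table(out_dir, keys, vectors)
print(os.path.getsize(os.path.join(out_dir, "key")))         # 8000
print(os.path.getsize(os.path.join(out_dir, "emb_vector")))  # 64000
```

The i-th key corresponds to the i-th group of emb_vec_size floats in emb_vector, so both files must be written in the same key order.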

Installation


Compute Capability


We support the following compute capabilities:

Compute Capability   GPU                     SM
7.0                  NVIDIA V100 (Volta)     70
7.5                  NVIDIA T4 (Turing)      75
8.0                  NVIDIA A100 (Ampere)    80
9.0                  NVIDIA H100 (Hopper)    90

Installing HPS Using NGC Containers


All NVIDIA Merlin components are available as open source projects. However, a more convenient way to use these components is through our Merlin NGC containers. These containers allow you to package your software application, libraries, dependencies, and runtime compilers in a self-contained environment. When you install HPS using NGC containers, the application environment remains portable, consistent, reproducible, and agnostic to the underlying host system’s software configuration.


HPS is included in the Merlin Docker containers that are available from the NVIDIA container repository. To use these Docker containers, you first need to install the NVIDIA Container Toolkit to provide GPU support for Docker. For more information about how to launch and run these containers, refer to the NGC catalog at https://catalog.ngc.nvidia.com/containers.


The following sample command pulls and starts the Merlin TensorFlow container:

# Run the container in interactive mode
$ docker run --gpus=all --rm -it --cap-add SYS_NICE nvcr.io/nvidia/merlin/merlin-tensorflow:23.02

After launching the container, you can verify that the HPS Python toolkit is available:

$ python3 -c "import hierarchical_parameter_server as hps"

Example Notebooks


We provide a collection of examples as Jupyter Notebooks that cover the following topics:

  • Basic workflow of HPS deployment for TensorFlow models

  • Migrating from SOK training to HPS inference

  • Leveraging HPS to load pre-trained embeddings

Benchmark


We benchmark the DLRM TensorFlow model with HPS Plugin for TensorFlow in hps_dlrm_benchmark.md.

\ No newline at end of file
diff --git a/review/pr-458/hierarchical_parameter_server/hps_torch_api/index.html b/review/pr-458/hierarchical_parameter_server/hps_torch_api/index.html
deleted file mode 100644
index 3b740e03a1..0000000000
--- a/review/pr-458/hierarchical_parameter_server/hps_torch_api/index.html
+++ /dev/null
@@ -1,166 +0,0 @@
HPS Plugin for Torch API — Merlin HugeCTR documentation
\ No newline at end of file
diff --git a/review/pr-458/hierarchical_parameter_server/hps_torch_api/lookup_layer.html b/review/pr-458/hierarchical_parameter_server/hps_torch_api/lookup_layer.html
deleted file mode 100644
index 41d930eba8..0000000000
--- a/review/pr-458/hierarchical_parameter_server/hps_torch_api/lookup_layer.html
+++ /dev/null
@@ -1,189 +0,0 @@
HPS Plugin for Torch — Merlin HugeCTR documentation

HPS Plugin for Torch


LookupLayer class


This is a wrapper class for the HPS lookup layer, which performs the same function as torch.nn.Embedding. It inherits from torch.nn.Module.

hps_torch.LookupLayer.__init__

Arguments

  • ps_config_file: String. The JSON configuration file for HPS initialization.

  • model_name: String. The name of the model that has embedding tables.

  • table_id: Integer. The index of the embedding table for the model specified by model_name.

  • emb_vec_size: Integer. The embedding vector size for the embedding table specified by model_name and table_id.

hps_torch.LookupLayer.forward

Arguments

  • keys: Tensor of torch.int32 or torch.int64.

Returns

  • vectors: Tensor of torch.float32.
\ No newline at end of file
diff --git a/review/pr-458/hierarchical_parameter_server/hps_torch_user_guide.html b/review/pr-458/hierarchical_parameter_server/hps_torch_user_guide.html
deleted file mode 100644
index 4af9f04a5c..0000000000
--- a/review/pr-458/hierarchical_parameter_server/hps_torch_user_guide.html
+++ /dev/null
@@ -1,225 +0,0 @@
Hierarchical Parameter Server Plugin for Torch — Merlin HugeCTR documentation

Hierarchical Parameter Server Plugin for Torch


Introduction to the HPS Plugin for Torch


The Hierarchical Parameter Server (HPS) is a distributed inference framework designed to efficiently deploy large embedding tables and enable low-latency retrieval of embeddings. It achieves this through a combination of a high-performance GPU embedding cache and a hierarchical storage architecture that supports various database backends. The HPS plugin for Torch lets you use HPS by incorporating it into your Torch model as a custom layer, so that you can deploy large embedding tables within your model.


Installation


Compute Capability


The plugin supports the following compute capabilities:

Compute Capability   GPU                     SM
7.0                  NVIDIA V100 (Volta)     70
7.5                  NVIDIA T4 (Turing)      75
8.0                  NVIDIA A100 (Ampere)    80
9.0                  NVIDIA H100 (Hopper)    90

Installing HPS Using NGC Containers


While all NVIDIA Merlin components are open source, the most convenient way to leverage them is through Merlin NGC containers. These containers enable you to encapsulate your software application, libraries, dependencies, and runtime compilers within a self-contained environment. By installing HPS using NGC containers, you ensure that your application environment remains portable, consistent, reproducible, and independent of the underlying host system’s software configuration.


HPS is available within the Merlin Docker containers, which can be accessed through the NVIDIA GPU Cloud (NGC) catalog. You can explore and obtain these containers from the catalog by visiting https://catalog.ngc.nvidia.com/containers.


To utilize these Docker containers, you will need to install the NVIDIA Container Toolkit to provide GPU support for Docker.


The following sample commands pull and start the Merlin HugeCTR container:


Merlin HugeCTR

# Run the container in interactive mode
$ docker run --gpus=all --rm -it --cap-add SYS_NICE nvcr.io/nvidia/merlin/merlin-hugectr:24.06

You can check the existence of the HPS plugin for Torch after launching the container by running the following Python statements:

import hps_torch

Example Notebooks


We provide a collection of examples as Jupyter Notebooks that demonstrate how to apply HPS to the Torch model.

\ No newline at end of file
diff --git a/review/pr-458/hierarchical_parameter_server/hps_trt_api/hps_plugin.html b/review/pr-458/hierarchical_parameter_server/hps_trt_api/hps_plugin.html
deleted file mode 100644
index 2117ae2c9f..0000000000
--- a/review/pr-458/hierarchical_parameter_server/hps_trt_api/hps_plugin.html
+++ /dev/null
@@ -1,168 +0,0 @@
HPS Plugin — Merlin HugeCTR documentation

HPS Plugin


The HPS plugin is implemented by the plugin class HpsPlugin, with the registration name HPS_TRT.


The HPS plugin accepts one input. The input data type must be int32. The input shape must be (batch_size, num_keys_per_sample).


The HPS plugin generates one output. The output data type is float32. The output shape is (batch_size, num_keys_per_sample, embedding_vector_size).


This plugin is applied to network graph nodes named HPS_TRT. HPS_TRT is also the plugin name to use when getting HpsPluginCreator from the plugin registry.
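Because the plugin accepts only int32 keys with shape (batch_size, num_keys_per_sample), keys that a pipeline stores as int64 should be checked before they are cast. The following pure-Python sketch, with a hypothetical helper name, validates that a key batch is rectangular and fits in the int32 range, and returns the output shape that the plugin is documented to produce. It does not call TensorRT; embedding_vector_size is assumed to be the emb_vec_size configured for the plugin.

```python
INT32_MIN, INT32_MAX = -2**31, 2**31 - 1


def prepare_hps_trt_keys(keys, embedding_vector_size):
    """Validate a 2-D key batch for HPS_TRT and return (keys, expected_output_shape)."""
    batch_size = len(keys)
    if batch_size == 0:
        raise ValueError("empty batch")
    num_keys_per_sample = len(keys[0])
    for row in keys:
        if len(row) != num_keys_per_sample:
            raise ValueError("every sample must have the same number of keys")
        for key in row:
            if not INT32_MIN <= key <= INT32_MAX:
                raise OverflowError(f"key {key} does not fit in int32")
    output_shape = (batch_size, num_keys_per_sample, embedding_vector_size)
    return [list(row) for row in keys], output_shape


keys, out_shape = prepare_hps_trt_keys([[1, 5, 9], [2, 6, 10]], embedding_vector_size=16)
print(out_shape)  # (2, 3, 16)
```

Failing early with an explicit error is usually preferable to a silent wrap-around when a large int64 key is cast to int32.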

\ No newline at end of file
diff --git a/review/pr-458/hierarchical_parameter_server/hps_trt_api/hps_plugin_creator.html b/review/pr-458/hierarchical_parameter_server/hps_trt_api/hps_plugin_creator.html
deleted file mode 100644
index a557bc877a..0000000000
--- a/review/pr-458/hierarchical_parameter_server/hps_trt_api/hps_plugin_creator.html
+++ /dev/null
@@ -1,201 +0,0 @@
HPS Plugin Creator — Merlin HugeCTR documentation

HPS Plugin Creator


The HPS plugin creator class is HpsPluginCreator, with the registration name HPS_TRT.


The plugin creator accepts the following parameters:

Type     Parameter        Description
string   ps_config_file   The configuration JSON file for HPS.
string   model_name       The name of the model.
int32    table_id         The index of the embedding table.
int32    emb_vec_size     The embedding vector size.

Refer to the HPS configuration documentation for details about writing the ps_config_file.

Important

Add a trailing null character, '\0', when you configure ps_config_file and model_name with the TensorRT Python APIs. This requirement is due to limitations of the supported plugin field types.

Refer to the following Python code for an example of using the trailing null characters:

import numpy as np
import tensorrt as trt

# String fields are passed as CHAR buffers, so each value ends with '\0'.
# np.bytes_ is the type that the older np.string_ alias referred to; NumPy 2.0 removed np.string_.
ps_config_file = trt.PluginField("ps_config_file", np.array(["hps_conf.json\0"], dtype=np.bytes_), trt.PluginFieldType.CHAR)
model_name = trt.PluginField("model_name", np.array(["demo_model\0"], dtype=np.bytes_), trt.PluginFieldType.CHAR)
\ No newline at end of file
diff --git a/review/pr-458/hierarchical_parameter_server/hps_trt_api/index.html b/review/pr-458/hierarchical_parameter_server/hps_trt_api/index.html
deleted file mode 100644
index ae97e5523b..0000000000
--- a/review/pr-458/hierarchical_parameter_server/hps_trt_api/index.html
+++ /dev/null
@@ -1,165 +0,0 @@
HPS Plugin for TensorRT API — Merlin HugeCTR documentation
\ No newline at end of file
diff --git a/review/pr-458/hierarchical_parameter_server/hps_trt_user_guide.html b/review/pr-458/hierarchical_parameter_server/hps_trt_user_guide.html
deleted file mode 100644
index 3c012c64fd..0000000000
--- a/review/pr-458/hierarchical_parameter_server/hps_trt_user_guide.html
+++ /dev/null
@@ -1,261 +0,0 @@
Hierarchical Parameter Server Plugin for TensorRT — Merlin HugeCTR documentation

Hierarchical Parameter Server Plugin for TensorRT


Introduction to the HPS Plugin for TensorRT


The Hierarchical Parameter Server (HPS) is a distributed inference framework dedicated to deploying large embedding tables and retrieving embeddings with low latency. The framework combines a high-performance GPU embedding cache with a hierarchical storage architecture that encompasses different types of database backends. The HPS plugin for TensorRT can be integrated into a TensorRT network as a custom layer to build the engine. A TensorRT engine that includes the HPS plugin can perform low-latency embedding lookup for large tables and accelerated forward propagation for the dense network at the same time.


Workflow

Fig. 1: Workflow of using HPS plugin for TensorRT




The workflow to leverage the HPS plugin for TensorRT is shown in Fig. 1:

  • Convert trained models to ONNX: Models trained with different frameworks are converted to ONNX using popular tools such as tf2onnx, torch.onnx, and hugectr2onnx.

  • Perform ONNX graph surgery: The node for embedding lookup in the ONNX graph is replaced by the placeholder of the HPS plugin for TensorRT using ONNX GraphSurgeon, as shown in Fig. 2.

  • Build the TensorRT engine with HPS Plugin for TensorRT: Build the TensorRT engine from the modified ONNX graph, where HPS is leveraged as a custom plugin layer.

  • Deploy the engine on the Triton backend for TensorRT: The TensorRT engine with HPS Plugin for TensorRT is deployed on the Triton backend for TensorRT. Set the LD_PRELOAD=/usr/local/hps_trt/lib/libhps_plugin.so environment variable to load the plugin shared library when you start Triton Inference Server.

Fig. 2: ONNX Graph Surgery




Installation


Compute Capability


The plugin supports the following compute capabilities:

Compute Capability   GPU                     SM
7.0                  NVIDIA V100 (Volta)     70
7.5                  NVIDIA T4 (Turing)      75
8.0                  NVIDIA A100 (Ampere)    80
9.0                  NVIDIA H100 (Hopper)    90

Installing HPS Using NGC Containers


All NVIDIA Merlin components are available as open source projects. However, a more convenient way to use these components is by using our Merlin NGC containers. These containers allow you to package your software application, libraries, dependencies, and runtime compilers in a self-contained environment. When installing HPS using NGC containers, the application environment remains portable, consistent, reproducible, and agnostic to the underlying host system’s software configuration.


HPS is included in the Merlin Docker containers that are available from the NVIDIA GPU Cloud (NGC) catalog. Access the catalog of containers at https://catalog.ngc.nvidia.com/containers. To use these Docker containers, you must install the NVIDIA Container Toolkit to provide GPU support for Docker.


The following sample commands pull and start the Merlin TensorFlow container, Merlin PyTorch container, or Merlin HugeCTR container:


Merlin TensorFlow

# Run the container in interactive mode
$ docker run --gpus=all --rm -it --cap-add SYS_NICE nvcr.io/nvidia/merlin/merlin-tensorflow:23.02

Merlin PyTorch

# Run the container in interactive mode
$ docker run --gpus=all --rm -it --cap-add SYS_NICE nvcr.io/nvidia/merlin/merlin-pytorch:23.02

Merlin HugeCTR

# Run the container in interactive mode
$ docker run --gpus=all --rm -it --cap-add SYS_NICE nvcr.io/nvidia/merlin/merlin-hugectr:23.02

You can check the existence of the HPS plugin for TensorRT after launching the container by running the following Python statements:

import ctypes
# Raises OSError if the plugin library cannot be found or loaded.
handle = ctypes.CDLL("/usr/local/hps_trt/lib/libhps_plugin.so", mode=ctypes.RTLD_GLOBAL)

Example Notebooks


We provide a collection of examples as Jupyter Notebooks that demonstrate how to build the TensorRT engine with HPS Plugin for TensorRT for models trained with TensorFlow, PyTorch, or HugeCTR.


Benchmark


We benchmark the DLRM TensorRT engine with HPS Plugin for TensorRT in hps_dlrm_benchmark.md.

\ No newline at end of file
diff --git a/review/pr-458/hierarchical_parameter_server/index.html b/review/pr-458/hierarchical_parameter_server/index.html
deleted file mode 100644
index 4f0ab55651..0000000000
--- a/review/pr-458/hierarchical_parameter_server/index.html
+++ /dev/null
@@ -1,181 +0,0 @@
Hierarchical Parameter Server — Merlin HugeCTR documentation

Hierarchical Parameter Server


The Hierarchical Parameter Server (HPS) library is a native C++ library that provides caching and hierarchical storage for embeddings. The library is built from the GPU embedding cache and HPS database backend subcomponents.


HPS offers flexible deployment and configuration to meet site-specific recommender system needs and is integrated into other projects that need to work with embeddings that exceed the capacity of GPU and host memory. Two projects that include the HPS library are the HPS plugin for TensorFlow and the HPS backend for Triton Inference Server.


The following figure shows the relationships between the projects that use HPS, the HPS library, and the subcomponents of the library.

Fig. 1: HPS Library and subcomponents
HPS Database Backend

Provides a three-level storage architecture. The first and highest-performing level is GPU memory, followed by CPU memory. The third level can be high-speed local SSDs, with or without a distributed database. The key benefit of the HPS database backend is serving embedding tables that exceed GPU and CPU memory while providing the highest possible performance.

HPS plugin for TensorFlow

Provides high-performance, scalability, and low-latency access to embedding tables for deep learning models that have large embedding tables in TensorFlow.

HPS plugin for TensorRT

Provides a unified solution to build and deploy HPS-integrated TensorRT engines for models trained with different frameworks.

HPS plugin for Torch

Provides HPS as a Torch extension to deploy Torch models with large embedding tables.

HPS Backend for Triton Inference Server

The backend for Triton Inference Server is an inference deployment framework that integrates HPS for end-to-end inference on Triton. Documentation for the backend is available from the hugectr_backend repository.
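To make the three-level lookup of the HPS database backend more concrete, here is a toy Python model of a hierarchical query: check a small cache first (standing in for the GPU embedding cache), then CPU memory, then the SSD/persistent level, promote found embeddings into the cache, and return a default vector when a key exists at no level. The class name, the dict-based tiers, and the least-recently-used eviction policy are illustrative assumptions chosen for brevity; the real HPS cache and database backends are implemented in C++/CUDA and differ in detail.

```python
from collections import OrderedDict


class ToyHierarchicalStore:
    """Toy three-level store: cache -> CPU memory -> persistent storage (illustration only)."""

    def __init__(self, cpu_tier, ssd_tier, cache_capacity, emb_vec_size, default_value=0.0):
        self.cache = OrderedDict()  # stands in for the GPU embedding cache
        self.cpu_tier = cpu_tier    # dict: key -> embedding vector
        self.ssd_tier = ssd_tier    # dict: key -> embedding vector
        self.cache_capacity = cache_capacity
        self.default = [default_value] * emb_vec_size
        self.hits = 0
        self.queries = 0

    def _insert(self, key, vec):
        """Insert a key into the cache, evicting the least recently used key if it is full."""
        self.cache[key] = vec
        self.cache.move_to_end(key)
        if len(self.cache) > self.cache_capacity:
            self.cache.popitem(last=False)

    def lookup(self, keys):
        out = []
        for key in keys:
            self.queries += 1
            if key in self.cache:  # level 0: cache hit
                self.hits += 1
                self.cache.move_to_end(key)
                out.append(self.cache[key])
                continue
            vec = self.cpu_tier.get(key)  # level 1: CPU memory
            if vec is None:
                vec = self.ssd_tier.get(key)  # level 2: SSD / persistent database
            if vec is None:
                out.append(list(self.default))  # unknown at every level: default embedding
                continue
            self._insert(key, vec)  # found keys are promoted into the cache
            out.append(vec)
        return out

    def hit_rate(self):
        return self.hits / self.queries if self.queries else 0.0


store = ToyHierarchicalStore(cpu_tier={1: [1.0, 1.0]}, ssd_tier={2: [2.0, 2.0]},
                             cache_capacity=1, emb_vec_size=2)
print(store.lookup([1, 1, 2, 9]))  # [[1.0, 1.0], [1.0, 1.0], [2.0, 2.0], [0.0, 0.0]]
print(store.hit_rate())            # 0.25
```

Frequently queried (hot) keys stay in the fastest level, which is why a skewed, long-tail key distribution yields a high cache hit rate.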

\ No newline at end of file
diff --git a/review/pr-458/hierarchical_parameter_server/profiling_hps.html b/review/pr-458/hierarchical_parameter_server/profiling_hps.html
deleted file mode 100644
index baf94d969e..0000000000
--- a/review/pr-458/hierarchical_parameter_server/profiling_hps.html
+++ /dev/null
@@ -1,382 +0,0 @@
Profiling HPS — Merlin HugeCTR documentation

Profiling HPS


A critical part of optimizing the inference performance of HPS is being able to measure changes in performance as you experiment with different optimization strategies and data distributions. There are two ways to profile HPS:

  1. The HPS profiler. The hps_profiler application benchmarks the Hierarchical Parameter Server. You can build and install hps_profiler by following the instructions in Build and install the HPS Profiler.

  2. The Triton Perf Analyzer. For detailed documentation, refer to the Triton Perf Analyzer documentation. For how to use the Triton Perf Analyzer to profile HPS, see the corresponding procedure.

HPS profiler


The hps_profiler application generates inference requests to HPS and measures the throughput and latency of different components, such as the embedding cache, the database backend, and the lookup session. To get representative results, hps_profiler measures the throughput and latency over configurable iterations and repeats the measurements until it reaches the specified number of iterations. For example, if --embedding_cache is used, the results are as follows:

$ hps_profiler --iterations 1000 --num_key 2000 --distribution powerlaw --alpha 1.2 --config /hugectr/model/ps.json --table_size 630000 --warmup_iterations 100 --embedding_cache
...
*** Measurement Results ***
The Benchmark of: Apply for workspace from the memory pool for Embedding Cache Lookup
Latencies [900 iterations] min = 0.000285ms, mean = 0.000384853ms, median = 0.000365ms, 95% = 0.000428ms, 99% = 0.000465ms, max = 0.009736ms, throughput = 2.73973e+06/s
The Benchmark of: Copy the input to workspace of Embedding Cache
Latencies [900 iterations] min = 0.010842ms, mean = 0.0117076ms, median = 0.011596ms, 95% = 0.012219ms, 99% = 0.016642ms, max = 0.027379ms, throughput = 86236.6/s
The Benchmark of: Deduplicate the input embedding key for Embedding Cache
Latencies [900 iterations] min = 0.019159ms, mean = 0.0272492ms, median = 0.027262ms, 95% = 0.028104ms, 99% = 0.029548ms, max = 0.052309ms, throughput = 36681.1/s
The Benchmark of: Lookup the embedding keys from Embedding Cache
Latencies [900 iterations] min = 0.178875ms, mean = 0.231377ms, median = 0.227815ms, 95% = 0.267493ms, 99% = 0.284738ms, max = 0.47672ms, throughput = 4389.53/s
The Benchmark of: Merge output from Embedding Cache
Latencies [900 iterations] min = 0.007656ms, mean = 0.00850756ms, median = 0.008434ms, 95% = 0.009117ms, 99% = 0.011863ms, max = 0.018697ms, throughput = 118568/s
The Benchmark of: Missing key synchronization insert into Embedding Cache
Latencies [900 iterations] min = 0.105163ms, mean = 0.15741ms, median = 0.153763ms, 95% = 0.192302ms, 99% = 0.208846ms, max = 0.402043ms, throughput = 6503.52/s
The Benchmark of: Native Embedding Cache Query API
Latencies [900 iterations] min = 0.021729ms, mean = 0.0227739ms, median = 0.02253ms, 95% = 0.023695ms, 99% = 0.025035ms, max = 0.043024ms, throughput = 44385.3/s
The Benchmark of: decompress/deunique output from Embedding Cache
Latencies [900 iterations] min = 0.011247ms, mean = 0.0121274ms, median = 0.011953ms, 95% = 0.013055ms, 99% = 0.014706ms, max = 0.022186ms, throughput = 83661/s
The Benchmark of: The hit rate of Embedding Cache
Occupancy [900 iterations] min = 0.719323, mean = 0.843972, median = 0.854749, 95% = 0.894188, 99% = 0.90276, max = 0.918169
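In these results, the reported throughput appears to be the reciprocal of the median latency (for example, 1 / 0.227815 ms ≈ 4389.53 per second), and running 1000 iterations with 100 warmup iterations yields 900 measured iterations. The following sketch computes the same kind of summary from a list of per-iteration latencies in milliseconds. The summarize helper is hypothetical rather than hps_profiler's own code, and the nearest-rank percentile method is an assumption, since the profiler's percentile method is not documented here.

```python
import math
import statistics


def summarize(latencies_ms, warmup_iterations=0):
    """Summarize per-iteration latencies in milliseconds, discarding warmup iterations."""
    measured = sorted(latencies_ms[warmup_iterations:])
    n = len(measured)

    def percentile(p):
        # Nearest-rank percentile (assumed method).
        return measured[max(0, math.ceil(p / 100 * n) - 1)]

    median = statistics.median(measured)
    return {
        "iterations": n,
        "min": measured[0],
        "mean": statistics.fmean(measured),
        "median": median,
        "95%": percentile(95),
        "99%": percentile(99),
        "max": measured[-1],
        "throughput_per_s": 1000.0 / median,  # 1 / median latency, with ms converted to s
    }


# Cross-check the reciprocal relation against two of the results above.
print(round(1000.0 / 0.227815, 2))  # 4389.53
print(round(1000.0 / 0.011596, 1))  # 86236.6
```

Basing throughput on the median rather than the mean makes it less sensitive to occasional slow outliers such as the max values in the results.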

Build and install the HPS Profiler


To build the HPS profiler from source, do the following:

  1. Download the HugeCTR repository and the third-party modules that it relies on by running the following commands:

     $ git clone https://github.com/NVIDIA/HugeCTR.git
     $ cd HugeCTR
     $ git submodule update --init --recursive

  2. Pull the NGC Docker container and run it.

Pull the container using the following command:

docker pull nvcr.io/nvidia/merlin/merlin-hugectr:24.06

Launch the container in interactive mode (mount the HugeCTR root directory into the container for your convenience) by running this command:

docker run --gpus all --rm -it --cap-add SYS_NICE --ipc=host --ulimit memlock=-1 --ulimit stack=67108864 -u root -v $(pwd):/HugeCTR -w /HugeCTR -p 8888:8888 nvcr.io/nvidia/merlin/merlin-hugectr:24.06

  3. Build the HPS profiler. The following example shows the build options:

     $ mkdir -p build && cd build
     $ cmake -DCMAKE_BUILD_TYPE=Release -DSM="70;80" -DENABLE_INFERENCE=ON -DENABLE_PROFILER=ON .. # Target is NVIDIA V100 / A100 with Inference mode ON.
     $ make -j && make install

  4. The hps_profiler executable is generated under the bin folder.

Create a synthetic embedding table


The embedding generator is used to generate synthetic HugeCTR sparse model files that can be loaded into HugeCTR HPS for inference. To generate a HugeCTR embedding file, refer to the Model generator.


Use the HPS Profiler to get the measurement results

  1. Generate an HPS JSON configuration file based on the synthetic model file. For more information about the HPS configuration, refer to the HPS configuration documentation. Here is an example:
{
    "supportlonglong": true,
    "models": [{
        "model": "model_name",
        "sparse_files": ["The path of synthetic embedding files"],
        "dense_file": "",
        "network_file": "",
        "num_of_worker_buffer_in_pool": 2,
        "num_of_refresher_buffer_in_pool": 1,
        "deployed_device_list": [0],
        "max_batch_size": 1024,
        "default_value_for_each_table": [0.0],
        "cache_refresh_percentage_per_iteration": 0.1,
        "hit_rate_threshold": 1.0,
        "gpucacheper": 0.9,
        "gpucache": true,
        "maxnum_des_feature_per_sample": 0,
        "maxnum_catfeature_query_per_table_per_sample": [26],
        "embedding_vecsize_per_table": [16]
    }]
}

NOTE: The product of max_batch_size and maxnum_catfeature_query_per_table_per_sample must be greater than or equal to the value of the --num_key option of hps_profiler.
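A quick pre-flight check of this constraint can save a failed profiling run. The following sketch loads an HPS JSON configuration like the one above and compares max_batch_size * maxnum_catfeature_query_per_table_per_sample against the intended --num_key value for every model and table. The check_num_key helper is hypothetical, and checking each table separately is an assumption about how the constraint applies to multi-table models.

```python
import json


def check_num_key(config_text, num_key):
    """Return (model, table_index, capacity) entries whose capacity is smaller than num_key."""
    config = json.loads(config_text)
    problems = []
    for model in config["models"]:
        per_table = model["maxnum_catfeature_query_per_table_per_sample"]
        for table_index, per_sample in enumerate(per_table):
            capacity = model["max_batch_size"] * per_sample
            if capacity < num_key:
                problems.append((model["model"], table_index, capacity))
    return problems


config_text = """
{
  "supportlonglong": true,
  "models": [{
    "model": "model_name",
    "max_batch_size": 1024,
    "maxnum_catfeature_query_per_table_per_sample": [26],
    "embedding_vecsize_per_table": [16]
  }]
}
"""
print(check_num_key(config_text, num_key=2000))   # [] because 1024 * 26 = 26624 >= 2000
print(check_num_key(config_text, num_key=30000))  # [('model_name', 0, 26624)]
```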

  2. Add arguments to hps_profiler for the benchmark:
$ hps_profiler
--config: required.
Usage: HPS_Profiler [options]

Optional arguments:
-h --help                       shows help message and exits [default: false]
-v --version                    prints version information and exits [default: false]
--config                        The path of the HPS json configuration file [required]
--distribution                  The distribution of the generated query key in each iteration. Can be 'powerlaw', 'hotkey', or 'histogram' [default: "powerlaw"]
--table_size                    The number of keys in the embedded table [default: 100000]
--alpha                         Alpha of power distribution [default: 1.2]
--hot_key_percentage            Percentage of hot keys in embedding tables [default: 0.2]
--hot_key_coverage              The probability of the hot key in each iteration [default: 0.8]
--num_key                       The number of keys to query for each iteration [default: 1000]
--iterations                    The number of iterations of the test [default: 1000]
--warmup_iterations             Performance results in warmup stage will be discarded [default: 0]
--embedding_cache               Enable embedding cache profiler, including the performance of lookup, insert, etc. [default: false]
--database_backend              Enable database backend profiler, which is to get the lookup performance of VDB/PDB [default: false]
--refresh_embeddingcache        Enable refreshing embedding cache. If the embedding cache tool is also enabled, the refresh will be performed asynchronously [default: false]
--lookup_session                Enable lookup_session profiler, which is E2E profiler, including embedding cache and data backend query delay [default: false]

Measurement example of the HPS Lookup Session

-
$ hps_profiler --iterations 1000 --num_key 2000 --powerlaw --alpha 1.2 --config /hugectr/Model_Samples/wdl/wdl_infer/model/ps.json --table_size 630000 --warmup_iterations 100 --lookup_session
-...
-*** Measurement Results ***
-The Benchmark of: End-to-end lookup embedding keys for Lookup session
-Latencies [900 iterations] min = 0.190813ms, mean = 0.243117ms, median = 0.238085ms, 95% = 0.283761ms, 99% = 0.346377ms, max = 0.511712ms, throughput= 4200.18/s
-
-
-

Measurement example of the HPS Data Backend

-
$ hps_profiler --iterations 1000 --num_key 2000 --powerlaw --alpha 1.2 --config /hugectr/Model_Samples/wdl/wdl_infer/model/ps.json --table_size 630000 --warmup_iterations 100 --database_backend
-...
-*** Measurement Results ***
-The Benchmark of: Lookup the embedding key from default HPS database Backend
-Latencies [900 iterations] min = 0.075086ms, mean = 0.127312ms, median = 0.121235ms, 95% = 0.166826ms, 99% = 0.219295ms, max = 0.285409ms, throughput = 8248.44/s
-
-
-
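The statistics in these reports can be reproduced from raw per-iteration latencies. In both examples the warmup iterations are excluded (1000 − 100 = 900 iterations), and the reported throughput matches 1000 divided by the median latency in milliseconds (1000 / 0.238085 ≈ 4200.18 and 1000 / 0.121235 ≈ 8248.44). The sketch below assumes that convention and a nearest-rank percentile; the profiler's exact percentile method is not documented here, so summarize_latencies is a hypothetical helper.

```python
import math
import statistics


def percentile(sorted_values, q):
    """Nearest-rank percentile (q in (0, 100]) of an already sorted list."""
    rank = max(1, math.ceil(q * len(sorted_values) / 100))
    return sorted_values[rank - 1]


def summarize_latencies(latencies_ms, warmup_iterations=0):
    """Summarize per-iteration latencies in ms, discarding the warmup stage first."""
    measured = sorted(latencies_ms[warmup_iterations:])  # drop warmup in time order
    if not measured:
        raise ValueError("no iterations left after warmup")
    median = statistics.median(measured)
    return {
        "iterations": len(measured),
        "min": measured[0],
        "mean": statistics.fmean(measured),
        "median": median,
        "p95": percentile(measured, 95),
        "p99": percentile(measured, 99),
        "max": measured[-1],
        # Assumption: iterations per second derived from the median latency.
        "throughput_per_s": 1000.0 / median,
    }
```

Note that this throughput counts lookup iterations per second, not keys per second; each iteration queries --num_key keys.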

NOTE:

-
  1. If the power-law distribution is selected (--powerlaw in the examples above), the queried embedding keys are generated with the exponent given by --alpha.

  2. If the --hot_key_percentage and --hot_key_coverage options are set, the first --table_size × --hot_key_percentage keys are the hot keys, and they appear in the queries with the probability given by --hot_key_coverage. For example, with --hot_key_percentage=0.01, --hot_key_coverage=0.9, and --table_size=1000, the first 1000 × 0.01 = 10 keys appear in the requests with a probability of 90%.

  3. For the most accurate measurements, enable only one of the three components (--embedding_cache, --database_backend, and --lookup_session) per run, because the lookup session results already include the performance of the database backend and the embedding cache.

  4. If a static embedding table is enabled in the HPS JSON file, hps_profiler does not support the refresh operation (--refresh_embeddingcache).
-
-
-
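The two key distributions described in the notes can be approximated with Python's standard library to build intuition about the generated workloads. This is a hypothetical sketch, not the profiler's generator: the power-law sampler assumes P(key = k) ∝ (k + 1)^(−alpha) over keys 0 to table_size − 1, and the hot-key sampler treats the first table_size × hot_key_percentage keys as hot and draws from them with probability hot_key_coverage, matching the example in note 2.

```python
import random


def powerlaw_keys(num_key, table_size, alpha, rng):
    """Draw num_key keys with P(k) proportional to (k + 1) ** -alpha (assumed form)."""
    weights = [(k + 1) ** -alpha for k in range(table_size)]
    return rng.choices(range(table_size), weights=weights, k=num_key)


def hotkey_keys(num_key, table_size, hot_key_percentage, hot_key_coverage, rng):
    """Draw keys where the first table_size * hot_key_percentage keys are 'hot'."""
    num_hot = min(table_size, max(1, round(table_size * hot_key_percentage)))
    keys = []
    for _ in range(num_key):
        if num_hot == table_size or rng.random() < hot_key_coverage:
            keys.append(rng.randrange(num_hot))              # hot key: one of the first num_hot
        else:
            keys.append(rng.randrange(num_hot, table_size))  # cold key: any remaining key
    return keys
```

With a larger alpha or a smaller hot-key set, more queries hit the same few keys, which typically raises the embedding cache hit rate; this is why the key distribution matters when comparing results across runs.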

Profile HPS with Triton Perf Analyzer:

-

To profile HPS with Triton Perf Analyzer, make sure you know how to deploy your model with the HugeCTR backend in Triton. If you don’t, refer to the linked HugeCTR backend documentation first.

-

To profile HPS, follow the procedure below:

-
  1. Prepare your embedding table. This can be either a real model trained by HugeCTR or a synthetic model generated with the model generator.

  2. Prepare your HPS configuration file (ps.json), as in the example shown above.

  3. Prepare the request data in the JSON format that Triton Perf Analyzer accepts through --input-data. The request can be generated with the request generator.

  4. After everything is prepared, start Triton. For example:

tritonserver --model-repository=/dir/to/model/ --load-model=your_model_name --model-control-mode=explicit --backend-directory=/usr/local/hugectr/backends --backend-config=hugectr,ps=/dir/to/your/ps.json

  5. Run Triton Perf Analyzer. For example:

-
perf_analyzer -m your_model_name --collect-metrics -f perf_output.csv --verbose-csv --input-data your_generated_request.json
-
-
-
-
-

HPS Profiler vs. Triton Perf Analyzer:

Functionality                                   HPS Profiler   Triton Perf Analyzer
Profile the client-side end-to-end pipeline     NO             YES
Profile the server-side key lookup session      YES            YES
Profile the embedding cache component           YES            NO
Profile the database backend component          YES            NO
Support different key distributions             YES            YES
Concurrency support                             NO             YES
GPU/memory utilization                          NO             YES
\ No newline at end of file
diff --git a/review/pr-458/hps_tf/notebooks/hierarchical_parameter_server_demo.html b/review/pr-458/hps_tf/notebooks/hierarchical_parameter_server_demo.html
deleted file mode 100644
index 9c3821fab8..0000000000
--- a/review/pr-458/hps_tf/notebooks/hierarchical_parameter_server_demo.html
+++ /dev/null
@@ -1,644 +0,0 @@
Hierarchical Parameter Server Demo — Merlin HugeCTR documentation

# Copyright 2021 NVIDIA Corporation. All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-#     http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-# ==============================================================================
-
-# Each user is responsible for checking the content of datasets and the
-# applicable licenses and determining if suitable for the intended use.
-
-
-
-
-

Hierarchical Parameter Server Demo

-
-

Overview

-

Hierarchical Parameter Server (HPS) is a distributed recommendation inference framework that combines a high-performance GPU embedding cache with a hierarchical storage architecture to achieve low-latency retrieval of embeddings for inference tasks. It is provided as a Python toolkit and can be easily integrated into a TensorFlow (TF) model graph.

-
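Conceptually, a lookup walks the hierarchy from the fastest tier to the slowest and falls back to a per-table default value (default_value_for_each_table in the configuration) for keys that are found nowhere. The sketch below models the GPU embedding cache as a small LRU dictionary and the volatile and persistent databases as plain dictionaries, and it promotes lower-tier hits into the cache. It illustrates the idea under those assumptions only; it is not the HPS implementation, and the class and method names are hypothetical.

```python
from collections import OrderedDict


class HierarchicalLookup:
    """Toy model of a hierarchical embedding lookup (not the HPS implementation)."""

    def __init__(self, cache_capacity, volatile, persistent, default_vector):
        self.cache = OrderedDict()          # stands in for the GPU embedding cache
        self.cache_capacity = cache_capacity
        self.volatile = volatile            # stands in for the volatile database
        self.persistent = persistent        # stands in for the persistent database
        self.default_vector = default_vector

    def _insert_into_cache(self, key, vec):
        self.cache[key] = vec
        self.cache.move_to_end(key)
        if len(self.cache) > self.cache_capacity:
            self.cache.popitem(last=False)  # evict the least recently used key

    def lookup(self, keys):
        """Return (vectors, number_of_cache_hits) for a batch of keys."""
        vectors, hits = [], 0
        for key in keys:
            if key in self.cache:
                self.cache.move_to_end(key)
                vectors.append(self.cache[key])
                hits += 1
                continue
            vec = self.volatile.get(key)
            if vec is None:
                vec = self.persistent.get(key)
            if vec is None:
                vectors.append(self.default_vector)  # key unknown in every tier
                continue
            self._insert_into_cache(key, vec)
            vectors.append(vec)
        return vectors, hits
```

Repeated keys hit the fast tier on later batches, which is why skewed key distributions benefit most from the embedding cache.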

This notebook demonstrates how to apply HPS to a trained model and then use it for inference in TensorFlow. For more details about the HPS APIs, refer to HPS APIs. For more details about HPS, refer to HugeCTR Hierarchical Parameter Server (HPS).

-
-
-

Installation

-
-

Get HPS from NGC

-

The HPS Python module is preinstalled in Merlin HugeCTR containers 24.06 and later, for example nvcr.io/nvidia/merlin/merlin-hugectr:24.06.

-

You can check that the required library is available by running the following command after launching the container.

-
$ python3 -c "import hierarchical_parameter_server as hps"
-
-
-
-
-
-

Configurations

-

First, we specify the required configurations, such as the arguments for generating the dataset, the paths for saving the model, and the model parameters. In this notebook we use a naive deep neural network (DNN) model with one embedding table and several dense layers.

-
-
-
import hierarchical_parameter_server as hps
-import os
-import numpy as np
-import tensorflow as tf
-import struct
-
-args = dict()
-
-args["gpu_num"] = 1                               # the number of available GPUs
-args["iter_num"] = 10                             # the number of training iterations
-args["slot_num"] = 3                              # the number of feature fields in this embedding layer
-args["embed_vec_size"] = 16                       # the dimension of embedding vectors
-args["global_batch_size"] = 65536                 # the global batch size across all GPUs
-args["max_vocabulary_size"] = 30000
-args["vocabulary_range_per_slot"] = [[0,10000],[10000,20000],[20000,30000]]
-args["ps_config_file"] = "naive_dnn.json"
-args["dense_model_path"] = "naive_dnn_dense.model"
-args["embedding_table_path"] = "naive_dnn_sparse.model"
-args["saved_path"] = "naive_dnn_tf_saved_model"
-args["np_key_type"] = np.int64
-args["np_vector_type"] = np.float32
-args["tf_key_type"] = tf.int64
-args["tf_vector_type"] = tf.float32
-
-
-os.environ["CUDA_VISIBLE_DEVICES"] = ",".join(map(str, range(args["gpu_num"])))
-
-
-
-
-
[INFO] hierarchical_parameter_server is imported
-
-
-
-
-
-
-
def generate_random_samples(num_samples, vocabulary_range_per_slot, key_dtype = args["np_key_type"]):
-    keys = list()
-    for vocab_range in vocabulary_range_per_slot:
-        keys_per_slot = np.random.randint(low=vocab_range[0], high=vocab_range[1], size=(num_samples, 1), dtype=key_dtype)
-        keys.append(keys_per_slot)
-    keys = np.concatenate(np.array(keys), axis = 1)
-    labels = np.random.randint(low=0, high=2, size=(num_samples, 1))
-    return keys, labels
-
-def tf_dataset(keys, labels, batchsize):
-    dataset = tf.data.Dataset.from_tensor_slices((keys, labels))
-    dataset = dataset.batch(batchsize, drop_remainder=True)
-    return dataset
-
-
-
-
-
-
-

Train with native TF layers

-

We define the model graph for training with native TF layers, that is, tf.nn.embedding_lookup and tf.keras.layers.Dense, and store the embedding weights in a tf.Variable. We then train the model and extract the trained weights of the embedding table. The dense layers are saved as a separate model graph, which can be loaded directly during inference.

-
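The shape bookkeeping in call() can be traced without TensorFlow. tf.nn.embedding_lookup gathers one row of params per key, producing a (batch, slot_num, embed_vec_size) tensor, and tf.reshape flattens it to (batch, slot_num * embed_vec_size) before the dense layers. The pure-Python sketch below mirrors those two steps on nested lists; embedding_lookup_and_flatten is a hypothetical helper for illustration only.

```python
def embedding_lookup_and_flatten(params, ids):
    """Gather rows of `params` for each key in `ids` and flatten each sample.

    params: list of vectors, one per key (vocabulary_size x embed_vec_size)
    ids:    list of samples, each a list of slot_num keys
    returns: list of samples, each of length slot_num * embed_vec_size
    """
    flattened = []
    for sample in ids:
        row = []
        for key in sample:
            row.extend(params[key])  # slot-major order, like tf.reshape on (slot, dim)
        flattened.append(row)
    return flattened
```

With slot_num = 3 and embed_vec_size = 16, each sample becomes a 48-element vector, which is why fc_1 reports 48 × 256 + 256 = 12,544 parameters in the model summary below.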
-
-
class TrainModel(tf.keras.models.Model):
-    def __init__(self,
-                 init_tensors,
-                 slot_num,
-                 embed_vec_size,
-                 **kwargs):
-        super(TrainModel, self).__init__(**kwargs)
-        
-        self.slot_num = slot_num
-        self.embed_vec_size = embed_vec_size
-        self.init_tensors = init_tensors
-        self.params = tf.Variable(initial_value=tf.concat(self.init_tensors, axis=0))
-        self.fc_1 = tf.keras.layers.Dense(units=256, activation=None,
-                                                 kernel_initializer="ones",
-                                                 bias_initializer="zeros",
-                                                 name='fc_1')
-        self.fc_2 = tf.keras.layers.Dense(units=1, activation=None,
-                                                 kernel_initializer="ones",
-                                                 bias_initializer="zeros",
-                                                 name='fc_2')
-
-    def call(self, inputs):
-        embedding_vector = tf.nn.embedding_lookup(params=self.params, ids=inputs)
-        embedding_vector = tf.reshape(embedding_vector, shape=[-1, self.slot_num * self.embed_vec_size])
-        logit = self.fc_2(self.fc_1(embedding_vector))
-        return logit, embedding_vector
-
-    def summary(self):
-        inputs = tf.keras.Input(shape=(self.slot_num,), dtype=args["tf_key_type"])
-        model = tf.keras.models.Model(inputs=inputs, outputs=self.call(inputs))
-        return model.summary()    
-
-
-
-
-
-
-
def train(args):
-    init_tensors = np.ones(shape=[args["max_vocabulary_size"], args["embed_vec_size"]], dtype=args["np_vector_type"])
-    
-    model = TrainModel(init_tensors, args["slot_num"], args["embed_vec_size"])
-    model.summary()
-    optimizer = tf.keras.optimizers.Adam(learning_rate=0.1)
-    
-    loss_fn = tf.keras.losses.BinaryCrossentropy(from_logits=True)
-    
-    def _train_step(inputs, labels):
-        with tf.GradientTape() as tape:
-            logit, embedding_vector = model(inputs)
-            loss = loss_fn(labels, logit)
-        grads = tape.gradient(loss, model.trainable_variables)
-        optimizer.apply_gradients(zip(grads, model.trainable_variables))
-        return logit, embedding_vector, loss
-
-    keys, labels = generate_random_samples(args["global_batch_size"]  * args["iter_num"], args["vocabulary_range_per_slot"],  args["np_key_type"])
-    dataset = tf_dataset(keys, labels, args["global_batch_size"])
-    for i, (id_tensors, labels) in enumerate(dataset):
-        _, embedding_vector, loss = _train_step(id_tensors, labels)
-        print("-"*20, "Step {}, loss: {}".format(i, loss),  "-"*20)
-
-    return model
-
-
-
-
-
-
-
trained_model = train(args)
-weights_list = trained_model.get_weights()
-embedding_weights = weights_list[-1]
-dense_model = tf.keras.models.Model(trained_model.get_layer("fc_1").input, trained_model.get_layer("fc_2").output)
-dense_model.summary()
-dense_model.save(args["dense_model_path"])
-
-
-
-
-
2022-07-12 07:49:56.742983: I tensorflow/core/platform/cpu_feature_guard.cc:151] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 FMA
-To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
-
-
-
WARNING:tensorflow:The following Variables were used in a Lambda layer's call (tf.compat.v1.nn.embedding_lookup), but are not present in its tracked objects:   <tf.Variable 'Variable:0' shape=(30000, 16) dtype=float32>. This is a strong indication that the Lambda layer should be rewritten as a subclassed Layer.
-Model: "model"
-_________________________________________________________________
- Layer (type)                Output Shape              Param #   
-=================================================================
- input_1 (InputLayer)        [(None, 3)]               0         
-                                                                 
- tf.compat.v1.nn.embedding_l  (None, 3, 16)            0         
- ookup (TFOpLambda)                                              
-                                                                 
- tf.reshape (TFOpLambda)     (None, 48)                0         
-                                                                 
- fc_1 (Dense)                (None, 256)               12544     
-                                                                 
- fc_2 (Dense)                (None, 1)                 257       
-                                                                 
-=================================================================
-Total params: 12,801
-Trainable params: 12,801
-Non-trainable params: 0
-_________________________________________________________________
-
-
-
2022-07-12 07:49:57.326494: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1525] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 30989 MB memory:  -> device: 0, name: Tesla V100-SXM2-32GB, pci bus id: 0000:06:00.0, compute capability: 7.0
-
-
-
-------------------- Step 0, loss: 6136.6875 --------------------
--------------------- Step 1, loss: 4463.05712890625 --------------------
--------------------- Step 2, loss: 3192.029296875 --------------------
--------------------- Step 3, loss: 2180.40283203125 --------------------
--------------------- Step 4, loss: 1419.980712890625 --------------------
--------------------- Step 5, loss: 879.0396728515625 --------------------
--------------------- Step 6, loss: 513.3021240234375 --------------------
--------------------- Step 7, loss: 272.9712219238281 --------------------
--------------------- Step 8, loss: 129.147705078125 --------------------
--------------------- Step 9, loss: 48.21624755859375 --------------------
-Model: "model_1"
-_________________________________________________________________
- Layer (type)                Output Shape              Param #   
-=================================================================
- input_2 (InputLayer)        [(None, 48)]              0         
-                                                                 
- fc_1 (Dense)                (None, 256)               12544     
-                                                                 
- fc_2 (Dense)                (None, 1)                 257       
-                                                                 
-=================================================================
-Total params: 12,801
-Trainable params: 12,801
-Non-trainable params: 0
-_________________________________________________________________
-WARNING:tensorflow:Compiled the loaded model, but the compiled metrics have yet to be built. `model.compile_metrics` will be empty until you train or evaluate the model.
-
-
-
2022-07-12 07:49:59.645703: W tensorflow/python/util/util.cc:368] Sets are not currently considered sequences, but this may change in the future, so consider avoiding using them.
-
-
-
INFO:tensorflow:Assets written to: naive_dnn_dense.model/assets
-
-
-
-
-
-
-

Create the inference graph with HPS LookupLayer

-

To use HPS in the inference stage, we create an inference model graph that is almost identical to the training graph, except that tf.nn.embedding_lookup is replaced by hps.LookupLayer. The trained dense model graph can be loaded directly, while the embedding weights must be converted to the format required by HPS.

-

We can then save the inference model graph, which will be ready to be loaded for inference deployment.

-
-
-
class InferenceModel(tf.keras.models.Model):
-    def __init__(self,
-                 slot_num,
-                 embed_vec_size,
-                 dense_model_path,
-                 **kwargs):
-        super(InferenceModel, self).__init__(**kwargs)
-        
-        self.slot_num = slot_num
-        self.embed_vec_size = embed_vec_size
-        self.lookup_layer = hps.LookupLayer(model_name = "naive_dnn", 
-                                            table_id = 0,
-                                            emb_vec_size = self.embed_vec_size,
-                                            emb_vec_dtype = args["tf_vector_type"],
-                                            name = "lookup")
-        self.dense_model = tf.keras.models.load_model(dense_model_path)
-
-    def call(self, inputs):
-        embedding_vector = self.lookup_layer(inputs)
-        embedding_vector = tf.reshape(embedding_vector, shape=[-1, self.slot_num * self.embed_vec_size])
-        logit = self.dense_model(embedding_vector)
-        return logit, embedding_vector
-
-    def summary(self):
-        inputs = tf.keras.Input(shape=(self.slot_num,), dtype=args["tf_key_type"])
-        model = tf.keras.models.Model(inputs=inputs, outputs=self.call(inputs))
-        return model.summary()
-
-
-
-
-
-
-
def create_and_save_inference_graph(args): 
-    model = InferenceModel(args["slot_num"], args["embed_vec_size"], args["dense_model_path"])
-    model.summary()
-    _, _ = model(tf.keras.Input(shape=(args["slot_num"],), dtype=args["tf_key_type"]))
-    model.save(args["saved_path"])
-
-
-
-
-
-
-
def convert_to_sparse_model(embeddings_weights, embedding_table_path, embedding_vec_size):
-    os.system("mkdir -p {}".format(embedding_table_path))
-    with open("{}/key".format(embedding_table_path), 'wb') as key_file, \
-        open("{}/emb_vector".format(embedding_table_path), 'wb') as vec_file:
-      for key in range(embeddings_weights.shape[0]):
-        vec = embeddings_weights[key]
-        key_struct = struct.pack('q', key)
-        vec_struct = struct.pack(str(embedding_vec_size) + "f", *vec)
-        key_file.write(key_struct)
-        vec_file.write(vec_struct)
-
-
-
-
-
-
-
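Before deploying, it can be useful to confirm that the files written by convert_to_sparse_model round-trip. That function uses struct.pack('q', ...) and struct.pack('16f', ...) in native byte order, so the key file holds one 8-byte integer per key and emb_vector holds embed_vec_size 4-byte floats per key. read_sparse_model below is a hypothetical helper that reads them back under that assumption (native byte order, same machine).

```python
import os
import struct


def read_sparse_model(embedding_table_path, embedding_vec_size):
    """Read back the 'key' and 'emb_vector' files written by convert_to_sparse_model."""
    with open(os.path.join(embedding_table_path, "key"), "rb") as f:
        key_bytes = f.read()
    with open(os.path.join(embedding_table_path, "emb_vector"), "rb") as f:
        vec_bytes = f.read()
    num_keys = len(key_bytes) // struct.calcsize("q")
    vec_stride = struct.calcsize(f"{embedding_vec_size}f")
    if len(vec_bytes) != num_keys * vec_stride:
        raise ValueError("key and emb_vector files disagree on the number of entries")
    keys = struct.unpack(f"{num_keys}q", key_bytes)
    vectors = [
        struct.unpack_from(f"{embedding_vec_size}f", vec_bytes, i * vec_stride)
        for i in range(num_keys)
    ]
    return dict(zip(keys, vectors))
```

In this notebook, read_sparse_model(args["embedding_table_path"], args["embed_vec_size"]) should return args["max_vocabulary_size"] (30000) entries once convert_to_sparse_model has run.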
convert_to_sparse_model(embedding_weights, args["embedding_table_path"], args["embed_vec_size"])
-create_and_save_inference_graph(args)
-
-
-
-
-
WARNING:tensorflow:No training configuration found in save file, so the model was *not* compiled. Compile it manually.
-Model: "model_2"
-_________________________________________________________________
- Layer (type)                Output Shape              Param #   
-=================================================================
- input_3 (InputLayer)        [(None, 3)]               0         
-                                                                 
- lookup (LookupLayer)        (None, 3, 16)             0         
-                                                                 
- tf.reshape_1 (TFOpLambda)   (None, 48)                0         
-                                                                 
- model_1 (Functional)        (None, 1)                 12801     
-                                                                 
-=================================================================
-Total params: 12,801
-Trainable params: 12,801
-Non-trainable params: 0
-_________________________________________________________________
-INFO:tensorflow:Assets written to: naive_dnn_tf_saved_model/assets
-
-
-
-
-
-
-

Inference with saved model graph

-

To initialize the lookup service provided by HPS, we also need to create a JSON configuration file that specifies the details of the embedding tables for the models to be deployed. Here we only show how to deploy a model with one embedding table, but HPS also supports multiple models with multiple embedding tables each.

-

We first call hps.Init to perform the necessary initialization and then load the saved model graph to run inference. For the last inference batch, we peek at the keys and the embedding vectors (the vectors have been reshaped from (None, 3, 16) to (None, 48)).

-
-
-
%%writefile naive_dnn.json
-{
-    "supportlonglong": true,
-    "models": [{
-        "model": "naive_dnn",
-        "sparse_files": ["naive_dnn_sparse.model"],
-        "num_of_worker_buffer_in_pool": 3,
-        "embedding_table_names":["sparse_embedding1"],
-        "embedding_vecsize_per_table": [16],
-        "maxnum_catfeature_query_per_table_per_sample": [3],
-        "default_value_for_each_table": [1.0],
-        "deployed_device_list": [0],
-        "max_batch_size": 65536,
-        "cache_refresh_percentage_per_iteration": 0.2,
-        "hit_rate_threshold": 1.0,
-        "gpucacheper": 1.0,
-        "gpucache": true
-        }
-    ]
-}
-
-
-
-
-
Writing naive_dnn.json
-
-
-
-
-
-
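Instead of writing the file by hand, the same configuration can be produced from values already stored in args, which keeps embedding_vecsize_per_table, maxnum_catfeature_query_per_table_per_sample, and max_batch_size in sync with the model. This is a minimal sketch using Python's json module; make_ps_config and write_ps_config are hypothetical helpers, and they emit only the fields shown in naive_dnn.json above.

```python
import json


def make_ps_config(model_name, sparse_file, embed_vec_size, slot_num, max_batch_size,
                   table_name="sparse_embedding1", device_id=0):
    """Build a single-model, single-table HPS configuration dictionary."""
    return {
        "supportlonglong": True,
        "models": [{
            "model": model_name,  # must match model_name passed to hps.LookupLayer
            "sparse_files": [sparse_file],
            "num_of_worker_buffer_in_pool": 3,
            "embedding_table_names": [table_name],
            "embedding_vecsize_per_table": [embed_vec_size],
            "maxnum_catfeature_query_per_table_per_sample": [slot_num],
            "default_value_for_each_table": [1.0],
            "deployed_device_list": [device_id],
            "max_batch_size": max_batch_size,
            "cache_refresh_percentage_per_iteration": 0.2,
            "hit_rate_threshold": 1.0,
            "gpucacheper": 1.0,
            "gpucache": True,
        }],
    }


def write_ps_config(path, config):
    """Write the configuration as indented JSON."""
    with open(path, "w") as f:
        json.dump(config, f, indent=4)
```

Usage in this notebook: write_ps_config(args["ps_config_file"], make_ps_config("naive_dnn", args["embedding_table_path"], args["embed_vec_size"], args["slot_num"], args["global_batch_size"])).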
-
def inference_with_saved_model(args):
-    hps.Init(global_batch_size = args["global_batch_size"],
-             ps_config_file = args["ps_config_file"])
-    model = tf.keras.models.load_model(args["saved_path"])
-    model.summary()
-    def _infer_step(inputs, labels):
-        logit, embedding_vector = model(inputs)
-        return logit, embedding_vector
-    embedding_vectors_peek = list()
-    id_tensors_peek = list()
-    keys, labels = generate_random_samples(args["global_batch_size"]  * args["iter_num"], args["vocabulary_range_per_slot"],  args["np_key_type"])
-    dataset = tf_dataset(keys, labels, args["global_batch_size"])
-    for i, (id_tensors, labels) in enumerate(dataset):
-        print("-"*20, "Step {}".format(i),  "-"*20)
-        _, embedding_vector = _infer_step(id_tensors, labels)
-        embedding_vectors_peek.append(embedding_vector)
-        id_tensors_peek.append(id_tensors)
-    return embedding_vectors_peek, id_tensors_peek
-
-
-
-
-
-
-
embedding_vectors_peek, id_tensors_peek = inference_with_saved_model(args)
-print(embedding_vectors_peek[-1])
-print(id_tensors_peek[-1])
-
-
-
-
-
=====================================================HPS Parse====================================================
-[HCTR][07:50:25.009][INFO][RK0][main]: dense_file is not specified using default: 
-[HCTR][07:50:25.009][INFO][RK0][main]: num_of_refresher_buffer_in_pool is not specified using default: 1
-[HCTR][07:50:25.009][INFO][RK0][main]: maxnum_des_feature_per_sample is not specified using default: 26
-[HCTR][07:50:25.009][INFO][RK0][main]: refresh_delay is not specified using default: 0
-[HCTR][07:50:25.009][INFO][RK0][main]: refresh_interval is not specified using default: 0
-====================================================HPS Create====================================================
-[HCTR][07:50:25.009][INFO][RK0][main]: Creating HashMap CPU database backend...
-[HCTR][07:50:25.010][INFO][RK0][main]: Volatile DB: initial cache rate = 1
-[HCTR][07:50:25.010][INFO][RK0][main]: Volatile DB: cache missed embeddings = 0
-[HCTR][07:50:25.357][INFO][RK0][main]: Table: hps_et.naive_dnn.sparse_embedding1; cached 30000 / 30000 embeddings in volatile database (PreallocatedHashMapBackend); load: 30000 / 18446744073709551615 (0.00%).
-[HCTR][07:50:25.357][DEBUG][RK0][main]: Real-time subscribers created!
-[HCTR][07:50:25.357][INFO][RK0][main]: Creating embedding cache in device 0.
-[HCTR][07:50:25.363][INFO][RK0][main]: Model name: naive_dnn
-[HCTR][07:50:25.363][INFO][RK0][main]: Number of embedding tables: 1
-[HCTR][07:50:25.363][INFO][RK0][main]: Use GPU embedding cache: True, cache size percentage: 1.000000
-[HCTR][07:50:25.363][INFO][RK0][main]: Use I64 input key: True
-[HCTR][07:50:25.363][INFO][RK0][main]: Configured cache hit rate threshold: 1.000000
-[HCTR][07:50:25.363][INFO][RK0][main]: The size of thread pool: 80
-[HCTR][07:50:25.363][INFO][RK0][main]: The size of worker memory pool: 3
-[HCTR][07:50:25.363][INFO][RK0][main]: The size of refresh memory pool: 1
-[HCTR][07:50:25.405][INFO][RK0][main]: Creating lookup session for naive_dnn on device: 0
-WARNING:tensorflow:No training configuration found in save file, so the model was *not* compiled. Compile it manually.
-Model: "inference_model"
-_________________________________________________________________
- Layer (type)                Output Shape              Param #   
-=================================================================
- lookup (LookupLayer)        multiple                  0         
-                                                                 
- model_1 (Functional)        (None, 1)                 12801     
-                                                                 
-=================================================================
-Total params: 12,801
-Trainable params: 12,801
-Non-trainable params: 0
-_________________________________________________________________
--------------------- Step 0 --------------------
--------------------- Step 1 --------------------
--------------------- Step 2 --------------------
--------------------- Step 3 --------------------
--------------------- Step 4 --------------------
--------------------- Step 5 --------------------
--------------------- Step 6 --------------------
--------------------- Step 7 --------------------
--------------------- Step 8 --------------------
--------------------- Step 9 --------------------
-tf.Tensor(
-[[0.23265739 0.23265739 0.23265739 ... 0.11092357 0.11092357 0.11092357]
- [0.09594781 0.09594781 0.09594781 ... 0.16974597 0.16974597 0.16974597]
- [0.22555737 0.22555737 0.22555737 ... 0.20454781 0.20454781 0.20454781]
- ...
- [0.22397298 0.22397298 0.22397298 ... 0.1229516  0.1229516  0.1229516 ]
- [0.12451896 0.12451896 0.12451896 ... 0.21348731 0.21348731 0.21348731]
- [0.11943579 0.11943579 0.11943579 ... 0.2502464  0.2502464  0.2502464 ]], shape=(65536, 48), dtype=float32)
-tf.Tensor(
-[[ 5283 17773 26371]
- [ 5043 17928 22941]
- [ 5154 18816 28670]
- ...
- [ 9014 16185 22256]
- [ 9893 14515 25771]
- [ 5377 18265 28063]], shape=(65536, 3), dtype=int64)
-
-
-
-
-
-
\ No newline at end of file
diff --git a/review/pr-458/hps_tf/notebooks/hps_multi_table_sparse_input_demo.html b/review/pr-458/hps_tf/notebooks/hps_multi_table_sparse_input_demo.html
deleted file mode 100644
index 971b868ccb..0000000000
--- a/review/pr-458/hps_tf/notebooks/hps_multi_table_sparse_input_demo.html
+++ /dev/null
@@ -1,785 +0,0 @@
HPS for Multiple Tables and Sparse Inputs — Merlin HugeCTR documentation

# Copyright 2021 NVIDIA Corporation. All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-#     http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-# ==============================================================================
-
-# Each user is responsible for checking the content of datasets and the
-# applicable licenses and determining if suitable for the intended use.
-
-
-
-
-

HPS for Multiple Tables and Sparse Inputs

-
-

Overview

-

This notebook demonstrates how to use HPS with multiple embedding tables and sparse inputs. We recommend running hierarchical_parameter_server_demo.ipynb before working through this notebook.

-

For more details about HPS APIs, please refer to HPS APIs. For more details about HPS, please refer to HugeCTR Hierarchical Parameter Server (HPS).

-
-
-

Installation

-
-

Get HPS from NGC

-

The HPS Python module is preinstalled in Merlin HugeCTR containers 24.06 and later, for example nvcr.io/nvidia/merlin/merlin-hugectr:24.06.

-

You can check that the required library is available by running the following command after launching the container.

-
$ python3 -c "import hierarchical_parameter_server as hps"
-
-
-
-
-
-

Configurations

-

First, we specify the required configurations, such as the arguments for generating the dataset, the paths for saving the model, and the model parameters. In this notebook we use a deep neural network (DNN) model with two embedding tables and several dense layers. Note that there are two inputs: a sparse key tensor (multi-hot) and a dense key tensor (one-hot).

-
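For the multi-hot input, each nonzero key is addressed by an index [sample, slot, position], the same COO layout that generate_sparse_keys builds as a tf.sparse.SparseTensor in this notebook. Turning a variable number of keys per slot into one vector per slot requires a combiner. The pure-Python sketch below assumes mean pooling and uses hypothetical names; check the HPS API documentation for the combiners its sparse lookup layer actually offers.

```python
def sparse_mean_pool(indices, values, params, num_samples, slot_num):
    """Mean-pool embedding vectors per (sample, slot) from COO-style sparse keys.

    indices: list of [sample, slot, position] triples, as in tf.sparse.SparseTensor
    values:  list of keys aligned with indices
    params:  dict mapping each key to its embedding vector
    Returns nested lists of shape (num_samples, slot_num, embed_vec_size).
    Assumes every (sample, slot) pair has at least one key.
    """
    sums, counts = {}, {}
    for (sample, slot, _position), key in zip(indices, values):
        cell = (sample, slot)
        vec = params[key]
        if cell in sums:
            sums[cell] = [a + b for a, b in zip(sums[cell], vec)]
        else:
            sums[cell] = list(vec)
        counts[cell] = counts.get(cell, 0) + 1
    return [
        [[x / counts[(i, j)] for x in sums[(i, j)]] for j in range(slot_num)]
        for i in range(num_samples)
    ]
```

generate_sparse_keys draws at least one key for every slot of every sample (nnz starts at 1), so the assumption in the docstring holds for the data used here.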
-
-
import hierarchical_parameter_server as hps
-import os
-import numpy as np
-import tensorflow as tf
-import struct
-
-args = dict()
-
-args["gpu_num"] = 1                                         # the number of available GPUs
-args["iter_num"] = 10                                       # the number of training iterations
-args["global_batch_size"] = 1024                            # the global batch size across all GPUs
-
-args["slot_num_per_table"] = [3, 2]                         # the number of feature fields for two embedding tables
-args["embed_vec_size_per_table"] = [16, 32]                 # the dimension of embedding vectors for two embedding tables
-args["max_vocabulary_size_per_table"] = [30000, 2000]       # the vocabulary size for two embedding tables
-args["vocabulary_range_per_slot_per_table"] = [ [[0,10000],[10000,20000],[20000,30000]], [[0, 1000], [1000, 2000]] ]
-args["max_nnz_per_slot_per_table"] = [[4, 2, 3], [1, 1]]    # the max number of non-zeros for each slot for two embedding tables
-
-args["dense_model_path"] = "multi_table_sparse_input_dense.model"
-args["ps_config_file"] = "multi_table_sparse_input.json"
-args["embedding_table_path"] = ["multi_table_sparse_input_sparse_0.model", "multi_table_sparse_input_sparse_1.model"]
-args["saved_path"] = "multi_table_sparse_input_tf_saved_model"
-args["np_key_type"] = np.int64
-args["np_vector_type"] = np.float32
-args["tf_key_type"] = tf.int64
-args["tf_vector_type"] = tf.float32
-
-
-os.environ["CUDA_VISIBLE_DEVICES"] = ",".join(map(str, range(args["gpu_num"])))
-
-
-
-
-
[INFO] hierarchical_parameter_server is imported
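The per-table settings above have to agree with each other: each table needs one key range and one max-nnz entry per slot, and every key range has to fit inside that table's vocabulary. A minimal sanity check, assuming the list layout used in `args`, might look like the following. `check_table_config` is a hypothetical helper, not an HPS function; it also returns the per-table sum of max nnz, which reappears later in the HPS JSON configuration.

```python
def check_table_config(slot_num_per_table, max_vocabulary_size_per_table,
                       vocabulary_range_per_slot_per_table, max_nnz_per_slot_per_table):
    """Hypothetical sanity check for the per-table settings in `args`.

    Returns the sum of the per-slot max nnz for each table, i.e. the maximum
    number of keys one sample can contribute to that table.
    """
    totals = []
    for t, slot_num in enumerate(slot_num_per_table):
        ranges = vocabulary_range_per_slot_per_table[t]
        max_nnz = max_nnz_per_slot_per_table[t]
        # Every slot (feature field) needs its own key range and max-nnz entry.
        if len(ranges) != slot_num or len(max_nnz) != slot_num:
            raise ValueError(f"table {t}: expected {slot_num} slots")
        for low, high in ranges:
            # Keys are drawn from [low, high), so the range must fit inside the table.
            if not 0 <= low < high <= max_vocabulary_size_per_table[t]:
                raise ValueError(f"table {t}: key range [{low}, {high}) out of bounds")
        totals.append(sum(max_nnz))
    return totals


totals = check_table_config([3, 2], [30000, 2000],
                            [[[0, 10000], [10000, 20000], [20000, 30000]], [[0, 1000], [1000, 2000]]],
                            [[4, 2, 3], [1, 1]])
print(totals)  # [9, 2]
```

In the notebook you could pass the corresponding `args[...]` entries instead of the literal values.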
-
-
-
-
-
-
-
def generate_random_samples(num_samples, vocabulary_range_per_slot_per_table, max_nnz_per_slot_per_table):
-    def generate_sparse_keys(num_samples, vocabulary_range_per_slot, max_nnz_per_slot, key_dtype = args["np_key_type"]):
-        slot_num = len(max_nnz_per_slot)
-        max_nnz_of_all_slots = max(max_nnz_per_slot)
-        indices = []
-        values = []
-        for i in range(num_samples):
-            for j in range(slot_num):
-                vocab_range = vocabulary_range_per_slot[j]
-                max_nnz = max_nnz_per_slot[j]
-                nnz = np.random.randint(low=1, high=max_nnz+1)
-                entries = sorted(np.random.choice(max_nnz, nnz, replace=False))
-                for entry in entries:
-                    indices.append([i, j, entry])
-                values.extend(np.random.randint(low=vocab_range[0], high=vocab_range[1], size=(nnz, )))
-        values = np.array(values, dtype=key_dtype)
-        return tf.sparse.SparseTensor(indices = indices,
-                                    values = values,
-                                    dense_shape = (num_samples, slot_num, max_nnz_of_all_slots))
-
-    def generate_dense_keys(num_samples, vocabulary_range_per_slot, key_dtype = args["np_key_type"]):
-        dense_keys = list()
-        for vocab_range in vocabulary_range_per_slot:
-            keys_per_slot = np.random.randint(low=vocab_range[0], high=vocab_range[1], size=(num_samples, 1), dtype=key_dtype)
-            dense_keys.append(keys_per_slot)
-        dense_keys = np.concatenate(np.array(dense_keys), axis = 1)
-        return dense_keys
-    
-    assert len(vocabulary_range_per_slot_per_table)==2, "there should be two embedding tables"
-    assert max(max_nnz_per_slot_per_table[0])>1, "the first embedding table should take sparse key input (multi-hot)"
-    assert max(max_nnz_per_slot_per_table[1])==1, "the second embedding table should take dense key input (one-hot)"
-    
-    sparse_keys = generate_sparse_keys(num_samples, vocabulary_range_per_slot_per_table[0], max_nnz_per_slot_per_table[0])
-    dense_keys = generate_dense_keys(num_samples, vocabulary_range_per_slot_per_table[1])
-    labels = np.random.randint(low=0, high=2, size=(num_samples, 1))
-    return sparse_keys, dense_keys, labels
-
-def tf_dataset(sparse_keys, dense_keys, labels, batchsize):
-    dataset = tf.data.Dataset.from_tensor_slices((sparse_keys, dense_keys, labels))
-    dataset = dataset.batch(batchsize, drop_remainder=True)
-    return dataset
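`generate_sparse_keys` stores each key at a COO index `[sample, slot, position]` with dense shape `(num_samples, slot_num, max_nnz)`. In the training and inference loops, `tf.sparse.reshape(sparse_keys, [-1, sparse_keys.shape[-1]])` turns each batch into shape `(batch_size * slot_num, max_nnz)`. Because reshaping preserves row-major order, index `[i, j, k]` becomes `[i * slot_num + j, k]`, so every row of the reshaped tensor holds the keys of one (sample, slot) pair. The pure-Python sketch below (hypothetical helper names, no TensorFlow) spells out that index arithmetic.

```python
def flatten_slot_indices(indices, slot_num):
    """Map COO indices [sample, slot, position] to [sample * slot_num + slot, position].

    Mirrors the row-major index arithmetic of reshaping a
    (batch, slot_num, max_nnz) SparseTensor to (batch * slot_num, max_nnz).
    """
    return [[sample * slot_num + slot, pos] for sample, slot, pos in indices]


def rows_to_key_lists(flat_indices, values, num_rows):
    """Group key values by flattened row, giving one list of keys per (sample, slot)."""
    rows = [[] for _ in range(num_rows)]
    for (row, _pos), key in zip(flat_indices, values):
        rows[row].append(key)
    return rows


# Two samples, three slots: sample 0 has two keys in slot 0 and one in slot 1,
# sample 1 has one key in slot 2.
flat = flatten_slot_indices([[0, 0, 0], [0, 0, 1], [0, 1, 0], [1, 2, 2]], slot_num=3)
print(flat)  # [[0, 0], [0, 1], [1, 0], [5, 2]]
print(rows_to_key_lists(flat, [11, 12, 10005, 20007], num_rows=6))
```

The toy example leaves some rows empty for brevity; `generate_sparse_keys` itself always draws at least one key per slot.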
-
-
-
-
-
-
-

Train with native TF layers

-

We define the model graph for training with native TF ops and layers, i.e., tf.nn.embedding_lookup_sparse, tf.nn.embedding_lookup and tf.keras.layers.Dense. We then train the model and extract the trained weights of the two embedding tables. The dense layers are saved as a separate model graph, which can be loaded directly during inference.

-
-
-
class TrainModel(tf.keras.models.Model):
-    def __init__(self,
-                 init_tensors_per_table,
-                 slot_num_per_table,
-                 embed_vec_size_per_table,
-                 max_nnz_per_slot_per_table,
-                 **kwargs):
-        super(TrainModel, self).__init__(**kwargs)
-        
-        self.slot_num_per_table = slot_num_per_table
-        self.embed_vec_size_per_table = embed_vec_size_per_table
-        self.max_nnz_per_slot_per_table = max_nnz_per_slot_per_table
-        self.max_nnz_of_all_slots_per_table = [max(ele) for ele in self.max_nnz_per_slot_per_table]
-        
-        self.init_tensors_per_table = init_tensors_per_table
-        self.params0 = tf.Variable(initial_value=tf.concat(self.init_tensors_per_table[0], axis=0))
-        self.params1 = tf.Variable(initial_value=tf.concat(self.init_tensors_per_table[1], axis=0))
-        
-        self.reshape = tf.keras.layers.Reshape((self.max_nnz_of_all_slots_per_table[0],),
-                                                input_shape=(self.slot_num_per_table[0], self.max_nnz_of_all_slots_per_table[0]))
-        
-        self.fc_1 = tf.keras.layers.Dense(units=256, activation=None,
-                                                 kernel_initializer="ones",
-                                                 bias_initializer="zeros",
-                                                 name='fc_1')
-        self.fc_2 = tf.keras.layers.Dense(units=256, activation=None,
-                                                 kernel_initializer="ones",
-                                                 bias_initializer="zeros",
-                                                 name='fc_2')
-        self.fc_3 = tf.keras.layers.Dense(units=1, activation=None,
-                                                 kernel_initializer="ones",
-                                                 bias_initializer="zeros",
-                                                 name='fc_3')
-
-    def call(self, inputs):
-        # SparseTensor of keys, shape: (batch_size*slot_num, max_nnz)
-        embeddings0 = tf.reshape(tf.nn.embedding_lookup_sparse(params=self.params0, sp_ids=inputs[0], sp_weights = None, combiner="mean"),
-                                shape=[-1, self.slot_num_per_table[0] * self.embed_vec_size_per_table[0]])
-        # Tensor of keys, shape: (batch_size, slot_num)
-        embeddings1 = tf.reshape(tf.nn.embedding_lookup(params=self.params1, ids=inputs[1]), 
-                                 shape=[-1, self.slot_num_per_table[1] * self.embed_vec_size_per_table[1]])
-        
-        logit = self.fc_3(tf.math.add(self.fc_1(embeddings0), self.fc_2(embeddings1)))
-        return logit, embeddings0, embeddings1
-
-    def summary(self):
-        inputs = [tf.keras.Input(shape=(self.max_nnz_of_all_slots_per_table[0], ), sparse=True, dtype=args["tf_key_type"]),
-                  tf.keras.Input(shape=(self.slot_num_per_table[1], ), dtype=args["tf_key_type"])]
-        model = tf.keras.models.Model(inputs=inputs, outputs=self.call(inputs))
-        return model.summary()
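In `TrainModel.call`, `tf.nn.embedding_lookup_sparse(..., sp_weights=None, combiner="mean")` reduces each row of the reshaped SparseTensor (one (sample, slot) pair) to the average of its keys' embedding vectors, giving shape `(batch_size * slot_num, embed_vec_size)`. The following `tf.reshape` then concatenates the slots of each sample into `(batch_size, slot_num * embed_vec_size)`, which is 3 × 16 = 48 for the first table. The sketch below reproduces those semantics in plain Python for illustration only; it is not TensorFlow's implementation, and the helper names are hypothetical.

```python
def embedding_lookup_mean(table, key_lists):
    """Average the embedding vectors of the keys in each row (unit weights).

    table: list of embedding vectors (lists of floats), indexed by key.
    key_lists: one list of keys per row, e.g. per (sample, slot) pair.
    """
    out = []
    for keys in key_lists:
        if not keys:
            # This sketch does not handle empty rows; generate_sparse_keys
            # always emits at least one key per slot.
            raise ValueError("empty row")
        acc = [0.0] * len(table[keys[0]])
        for k in keys:
            for d, x in enumerate(table[k]):
                acc[d] += x
        out.append([x / len(keys) for x in acc])
    return out


def concat_slots(row_vectors, slot_num):
    """Concatenate every `slot_num` consecutive row vectors into one per-sample vector,
    like reshaping (batch * slot_num, dim) to (batch, slot_num * dim)."""
    return [sum(row_vectors[i:i + slot_num], []) for i in range(0, len(row_vectors), slot_num)]


table = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
rows = embedding_lookup_mean(table, [[0, 1], [2]])   # one sample, two slots
print(rows)                   # [[2.0, 3.0], [5.0, 6.0]]
print(concat_slots(rows, 2))  # [[2.0, 3.0, 5.0, 6.0]]
```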
-
-
-
-
-
-
-
def train(args):
-    def _train_step(inputs, labels):
-        with tf.GradientTape() as tape:
-            logit, _, _ = model(inputs)
-            loss = loss_fn(labels, logit)
-        grads = tape.gradient(loss, model.trainable_variables)
-        optimizer.apply_gradients(zip(grads, model.trainable_variables))
-        return logit, loss
-
-    init_tensors_per_table = [np.ones(shape=[args["max_vocabulary_size_per_table"][0], args["embed_vec_size_per_table"][0]], dtype=args["np_vector_type"]),
-                              np.ones(shape=[args["max_vocabulary_size_per_table"][1], args["embed_vec_size_per_table"][1]], dtype=args["np_vector_type"])]
-
-    model = TrainModel(init_tensors_per_table, args["slot_num_per_table"], args["embed_vec_size_per_table"], args["max_nnz_per_slot_per_table"])
-    model.summary()
-    optimizer = tf.keras.optimizers.Adam(learning_rate=0.1)
-    loss_fn = tf.keras.losses.BinaryCrossentropy(from_logits=True)
-
-    sparse_keys, dense_keys, labels = generate_random_samples(args["global_batch_size"]  * args["iter_num"], args["vocabulary_range_per_slot_per_table"], args["max_nnz_per_slot_per_table"])
-    dataset = tf_dataset(sparse_keys, dense_keys, labels, args["global_batch_size"])
-    for i, (sparse_keys, dense_keys, labels) in enumerate(dataset):
-        sparse_keys = tf.sparse.reshape(sparse_keys, [-1, sparse_keys.shape[-1]])
-        inputs = [sparse_keys, dense_keys]
-        _, loss = _train_step(inputs, labels)
-        print("-"*20, "Step {}, loss: {}".format(i, loss),  "-"*20)
-    return model
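`loss_fn = tf.keras.losses.BinaryCrossentropy(from_logits=True)` computes sigmoid cross-entropy directly from raw logits. TensorFlow documents the numerically stable form `max(x, 0) - x * z + log(1 + exp(-abs(x)))` for `tf.nn.sigmoid_cross_entropy_with_logits`; the plain-Python sketch below applies it and averages over the batch. The name `bce_with_logits` is hypothetical.

```python
import math


def bce_with_logits(labels, logits):
    """Mean binary cross-entropy computed from raw logits (numerically stable form)."""
    total = 0.0
    for z, x in zip(labels, logits):
        # Equivalent to -[z*log(sigmoid(x)) + (1-z)*log(1-sigmoid(x))],
        # but never evaluates exp() of a large positive number.
        total += max(x, 0.0) - x * z + math.log1p(math.exp(-abs(x)))
    return total / len(labels)


print(bce_with_logits([1.0], [0.0]))  # log(2), about 0.6931
```

This also explains the large losses in the training log below. Since the embedding tables and all Dense kernels start as ones and the biases as zeros, every logit at step 0 is 256 × (3 × 16 + 2 × 32) = 28672. A label of 1 then contributes roughly 0 and a label of 0 roughly 28672, so the logged 14588.0 is consistent with 521 of the 1024 labels in the first batch being 0 (521 × 28672 / 1024 = 14588).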
-
-
-
-
-
-
-
trained_model = train(args)
-weights_list = trained_model.get_weights()
-embedding_weights_per_table = weights_list[-2:]
-dense_model = tf.keras.Model([trained_model.get_layer("fc_1").input, 
-                              trained_model.get_layer("fc_2").input], 
-                             trained_model.get_layer("fc_3").output)
-dense_model.summary()
-dense_model.save(args["dense_model_path"])
-
-
-
-
-
2022-07-12 07:51:09.676041: I tensorflow/core/platform/cpu_feature_guard.cc:151] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 FMA
-To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
-2022-07-12 07:51:10.271131: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1525] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 30989 MB memory:  -> device: 0, name: Tesla V100-SXM2-32GB, pci bus id: 0000:06:00.0, compute capability: 7.0
-
-
-
WARNING:tensorflow:The following Variables were used in a Lambda layer's call (tf.compat.v1.nn.embedding_lookup_sparse), but are not present in its tracked objects:   <tf.Variable 'Variable:0' shape=(30000, 16) dtype=float32>. This is a strong indication that the Lambda layer should be rewritten as a subclassed Layer.
-WARNING:tensorflow:The following Variables were used in a Lambda layer's call (tf.compat.v1.nn.embedding_lookup), but are not present in its tracked objects:   <tf.Variable 'Variable:0' shape=(2000, 32) dtype=float32>. This is a strong indication that the Lambda layer should be rewritten as a subclassed Layer.
-Model: "model"
-__________________________________________________________________________________________________
- Layer (type)                   Output Shape         Param #     Connected to                     
-==================================================================================================
- input_1 (InputLayer)           [(None, 4)]          0           []                               
-                                                                                                  
- input_2 (InputLayer)           [(None, 2)]          0           []                               
-                                                                                                  
- tf.compat.v1.nn.embedding_look  (None, 16)          0           ['input_1[0][0]']                
- up_sparse (TFOpLambda)                                                                           
-                                                                                                  
- tf.compat.v1.nn.embedding_look  (None, 2, 32)       0           ['input_2[0][0]']                
- up (TFOpLambda)                                                                                  
-                                                                                                  
- tf.reshape (TFOpLambda)        (None, 48)           0           ['tf.compat.v1.nn.embedding_looku
-                                                                 p_sparse[0][0]']                 
-                                                                                                  
- tf.reshape_1 (TFOpLambda)      (None, 64)           0           ['tf.compat.v1.nn.embedding_looku
-                                                                 p[0][0]']                        
-                                                                                                  
- fc_1 (Dense)                   (None, 256)          12544       ['tf.reshape[0][0]']             
-                                                                                                  
- fc_2 (Dense)                   (None, 256)          16640       ['tf.reshape_1[0][0]']           
-                                                                                                  
- tf.math.add (TFOpLambda)       (None, 256)          0           ['fc_1[0][0]',                   
-                                                                  'fc_2[0][0]']                   
-                                                                                                  
- fc_3 (Dense)                   (None, 1)            257         ['tf.math.add[0][0]']            
-                                                                                                  
-==================================================================================================
-Total params: 29,441
-Trainable params: 29,441
-Non-trainable params: 0
-__________________________________________________________________________________________________
--------------------- Step 0, loss: 14588.0 --------------------
--------------------- Step 1, loss: 11693.25 --------------------
--------------------- Step 2, loss: 8232.9658203125 --------------------
--------------------- Step 3, loss: 6276.9736328125 --------------------
--------------------- Step 4, loss: 4676.82861328125 --------------------
--------------------- Step 5, loss: 2921.1875 --------------------
--------------------- Step 6, loss: 1938.2447509765625 --------------------
--------------------- Step 7, loss: 1093.598388671875 --------------------
--------------------- Step 8, loss: 616.3092651367188 --------------------
--------------------- Step 9, loss: 257.61248779296875 --------------------
-Model: "model_1"
-__________________________________________________________________________________________________
- Layer (type)                   Output Shape         Param #     Connected to                     
-==================================================================================================
- input_3 (InputLayer)           [(None, 48)]         0           []                               
-                                                                                                  
- input_4 (InputLayer)           [(None, 64)]         0           []                               
-                                                                                                  
- fc_1 (Dense)                   (None, 256)          12544       ['input_3[0][0]']                
-                                                                                                  
- fc_2 (Dense)                   (None, 256)          16640       ['input_4[0][0]']                
-                                                                                                  
- tf.math.add (TFOpLambda)       (None, 256)          0           ['fc_1[1][0]',                   
-                                                                  'fc_2[1][0]']                   
-                                                                                                  
- fc_3 (Dense)                   (None, 1)            257         ['tf.math.add[1][0]']            
-                                                                                                  
-==================================================================================================
-Total params: 29,441
-Trainable params: 29,441
-Non-trainable params: 0
-__________________________________________________________________________________________________
-WARNING:tensorflow:Compiled the loaded model, but the compiled metrics have yet to be built. `model.compile_metrics` will be empty until you train or evaluate the model.
-
-
-
2022-07-12 07:51:13.335404: W tensorflow/python/util/util.cc:368] Sets are not currently considered sequences, but this may change in the future, so consider avoiding using them.
-WARNING:absl:Function `_wrapped_model` contains input name(s) args_0 with unsupported characters which will be renamed to args_0_1 in the SavedModel.
-
-
-
INFO:tensorflow:Assets written to: multi_table_sparse_input_dense.model/assets
-
-
-
INFO:tensorflow:Assets written to: multi_table_sparse_input_dense.model/assets
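The parameter counts in the summaries above follow from the Dense layer formula `in_dim * units + units` (weights plus bias). The input widths are the reshaped embedding widths, `slot_num * embed_vec_size` for each table. The short calculation below reproduces the summary numbers; `dense_params` is an illustrative helper, not a Keras API.

```python
def dense_params(in_dim, units, use_bias=True):
    """Number of parameters of a fully connected layer: weight matrix plus optional bias."""
    return in_dim * units + (units if use_bias else 0)


emb0_width = 3 * 16  # slot_num_per_table[0] * embed_vec_size_per_table[0] = 48
emb1_width = 2 * 32  # slot_num_per_table[1] * embed_vec_size_per_table[1] = 64

fc_1 = dense_params(emb0_width, 256)  # 12544
fc_2 = dense_params(emb1_width, 256)  # 16640
fc_3 = dense_params(256, 1)           # 257
total = fc_1 + fc_2 + fc_3
print(fc_1, fc_2, fc_3, total)        # 12544 16640 257 29441
```

The embedding tables do not appear in this count; they are exported separately and served by HPS.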
-
-
-
-
-
-
-

Create the inference graph with HPS SparseLookupLayer and LookupLayer

-

To use HPS in the inference stage, we need to create an inference model graph that is almost the same as the training graph, except that tf.nn.embedding_lookup_sparse is replaced by hps.SparseLookupLayer and tf.nn.embedding_lookup is replaced by hps.LookupLayer. The trained dense model graph can be loaded directly, while the weights of the two embedding tables must be converted to the format required by HPS.

-

We then save the inference model graph so that it can be loaded for inference deployment.

-
-
-
class InferenceModel(tf.keras.models.Model):
-    def __init__(self,
-                 slot_num_per_table,
-                 embed_vec_size_per_table,
-                 max_nnz_per_slot_per_table,
-                 dense_model_path,
-                 **kwargs):
-        super(InferenceModel, self).__init__(**kwargs)
-        
-        self.slot_num_per_table = slot_num_per_table
-        self.embed_vec_size_per_table = embed_vec_size_per_table
-        self.max_nnz_per_slot_per_table = max_nnz_per_slot_per_table
-        self.max_nnz_of_all_slots_per_table = [max(ele) for ele in self.max_nnz_per_slot_per_table]
-        
-        self.sparse_lookup_layer = hps.SparseLookupLayer(model_name = "multi_table_sparse_input", 
-                                            table_id = 0,
-                                            emb_vec_size = self.embed_vec_size_per_table[0],
-                                            emb_vec_dtype = args["tf_vector_type"])
-        self.lookup_layer = hps.LookupLayer(model_name = "multi_table_sparse_input", 
-                                            table_id = 1,
-                                            emb_vec_size = self.embed_vec_size_per_table[1],
-                                            emb_vec_dtype = args["tf_vector_type"])
-        self.dense_model = tf.keras.models.load_model(dense_model_path)
-    
-    def call(self, inputs):
-        # SparseTensor of keys, shape: (batch_size*slot_num, max_nnz)
-        embeddings0 = tf.reshape(self.sparse_lookup_layer(sp_ids=inputs[0], sp_weights = None, combiner="mean"),
-                                shape=[-1, self.slot_num_per_table[0] * self.embed_vec_size_per_table[0]])
-        # Tensor of keys, shape: (batch_size, slot_num)
-        embeddings1 = tf.reshape(self.lookup_layer(inputs[1]), 
-                                 shape=[-1, self.slot_num_per_table[1] * self.embed_vec_size_per_table[1]])
-        
-        logit = self.dense_model([embeddings0, embeddings1])
-        return logit, embeddings0, embeddings1
-
-    def summary(self):
-        inputs = [tf.keras.Input(shape=(self.max_nnz_of_all_slots_per_table[0], ), sparse=True, dtype=args["tf_key_type"]),
-                  tf.keras.Input(shape=(self.slot_num_per_table[1], ), dtype=args["tf_key_type"])]
-        model = tf.keras.models.Model(inputs=inputs, outputs=self.call(inputs))
-        return model.summary()
-
-
-
-
-
-
-
def create_and_save_inference_graph(args): 
-    model = InferenceModel(args["slot_num_per_table"], args["embed_vec_size_per_table"], args["max_nnz_per_slot_per_table"], args["dense_model_path"])
-    model.summary()
-    inputs = [tf.keras.Input(shape=(max(args["max_nnz_per_slot_per_table"][0]), ), sparse=True, dtype=args["tf_key_type"]),
-             tf.keras.Input(shape=(args["slot_num_per_table"][1], ), dtype=args["tf_key_type"])]
-    # Call the model once on symbolic Keras inputs so that it is built before saving.
-    _, _, _ = model(inputs)
-    model.save(args["saved_path"])
-
-
-
-
-
-
-
def convert_to_sparse_model(embeddings_weights, embedding_table_path, embedding_vec_size):
-    # Write the table as a directory holding two binary files:
-    #   key:        one int64 key ('q') per row
-    #   emb_vector: embedding_vec_size float32 values ('f') per row, in the same order
-    os.makedirs(embedding_table_path, exist_ok=True)
-    with open("{}/key".format(embedding_table_path), 'wb') as key_file, \
-         open("{}/emb_vector".format(embedding_table_path), 'wb') as vec_file:
-        for key in range(embeddings_weights.shape[0]):
-            # The row index of the trained embedding variable is used as the key.
-            vec = embeddings_weights[key]
-            key_struct = struct.pack('q', key)
-            vec_struct = struct.pack(str(embedding_vec_size) + "f", *vec)
-            key_file.write(key_struct)
-            vec_file.write(vec_struct)
-
-
-
-
-
-
-
convert_to_sparse_model(embedding_weights_per_table[0], args["embedding_table_path"][0], args["embed_vec_size_per_table"][0])
-convert_to_sparse_model(embedding_weights_per_table[1], args["embedding_table_path"][1], args["embed_vec_size_per_table"][1])
-create_and_save_inference_graph(args)
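Before handing the converted tables to HPS, it can be worth reading them back. The sketch below assumes the layout produced by `convert_to_sparse_model` above: a `key` file of 8-byte integers packed with `'q'` and an `emb_vector` file holding `dim` float32 values per key in the same order, both in native byte order. This layout is inferred from that function, not from an official HPS specification. `write_table` and `read_table` are hypothetical helpers; `write_table` only mirrors the converter so the example runs without TensorFlow.

```python
import os
import struct
import tempfile


def write_table(path, vectors):
    """Write keys 0..N-1 and their vectors in the layout used by convert_to_sparse_model."""
    os.makedirs(path, exist_ok=True)
    dim = len(vectors[0])
    with open(os.path.join(path, "key"), "wb") as kf, open(os.path.join(path, "emb_vector"), "wb") as vf:
        for key, vec in enumerate(vectors):
            kf.write(struct.pack("q", key))
            vf.write(struct.pack(f"{dim}f", *vec))


def read_table(path, dim):
    """Read the key and emb_vector files back into a {key: vector} dict."""
    with open(os.path.join(path, "key"), "rb") as kf:
        key_bytes = kf.read()
    with open(os.path.join(path, "emb_vector"), "rb") as vf:
        vec_bytes = vf.read()
    n = len(key_bytes) // struct.calcsize("q")
    vec_size = struct.calcsize(f"{dim}f")
    # Every key must have exactly one dim-sized vector record.
    if len(vec_bytes) != n * vec_size:
        raise ValueError("emb_vector size does not match key count and dim")
    keys = struct.unpack(f"{n}q", key_bytes)
    return {k: list(struct.unpack_from(f"{dim}f", vec_bytes, i * vec_size)) for i, k in enumerate(keys)}


with tempfile.TemporaryDirectory() as tmp:
    table_dir = os.path.join(tmp, "toy_sparse_0.model")
    write_table(table_dir, [[0.5, 1.0], [2.0, -0.25]])
    print(read_table(table_dir, dim=2))  # {0: [0.5, 1.0], 1: [2.0, -0.25]}
```

In the notebook, `read_table(args["embedding_table_path"][0], args["embed_vec_size_per_table"][0])` should return 30000 entries if the conversion succeeded.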
-
-
-
-
-
WARNING:tensorflow:No training configuration found in save file, so the model was *not* compiled. Compile it manually.
-
-
-
WARNING:tensorflow:No training configuration found in save file, so the model was *not* compiled. Compile it manually.
-
-
-
Model: "model_2"
-__________________________________________________________________________________________________
- Layer (type)                   Output Shape         Param #     Connected to                     
-==================================================================================================
- input_5 (InputLayer)           [(None, 4)]          0           []                               
-                                                                                                  
- input_6 (InputLayer)           [(None, 2)]          0           []                               
-                                                                                                  
- sparse_lookup_layer (SparseLoo  (None, 16)          0           ['input_5[0][0]']                
- kupLayer)                                                                                        
-                                                                                                  
- lookup_layer (LookupLayer)     (None, 2, 32)        0           ['input_6[0][0]']                
-                                                                                                  
- tf.reshape_2 (TFOpLambda)      (None, 48)           0           ['sparse_lookup_layer[0][0]']    
-                                                                                                  
- tf.reshape_3 (TFOpLambda)      (None, 64)           0           ['lookup_layer[0][0]']           
-                                                                                                  
- model_1 (Functional)           (None, 1)            29441       ['tf.reshape_2[0][0]',           
-                                                                  'tf.reshape_3[0][0]']           
-                                                                                                  
-==================================================================================================
-Total params: 29,441
-Trainable params: 29,441
-Non-trainable params: 0
-__________________________________________________________________________________________________
-
-
-
WARNING:absl:Function `_wrapped_model` contains input name(s) args_0 with unsupported characters which will be renamed to args_0_3 in the SavedModel.
-
-
-
INFO:tensorflow:Assets written to: multi_table_sparse_input_tf_saved_model/assets
-
-
-
INFO:tensorflow:Assets written to: multi_table_sparse_input_tf_saved_model/assets
-
-
-
-
-
-
-

Inference with saved model graph

-

To initialize the lookup service provided by HPS, we also need to create a JSON configuration file that specifies the details of the embedding tables for the models to be deployed. Here we deploy a single model with two embedding tables, but HPS can serve multiple models, each with multiple embedding tables. Note how maxnum_catfeature_query_per_table_per_sample is specified for the two embedding tables: the max_nnz_per_slot of the first table is [4, 2, 3], which sums to 9, and that of the second table is [1, 1], which sums to 2.
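Rather than computing `maxnum_catfeature_query_per_table_per_sample` by hand, you could derive it from the same `max_nnz_per_slot_per_table` used for training. The sketch below builds the configuration as a Python dict and serializes it with `json.dumps`. `build_hps_config` is a hypothetical helper; it uses only the keys that appear in the JSON cell below, and the cache-related values are copied verbatim from that cell rather than derived.

```python
import json


def build_hps_config(model_name, sparse_files, table_names, embed_vec_size_per_table,
                     max_nnz_per_slot_per_table, max_batch_size, deployed_device_list):
    """Hypothetical helper that assembles the HPS JSON config shown in the cell below."""
    model = {
        "model": model_name,
        "sparse_files": list(sparse_files),
        "num_of_worker_buffer_in_pool": 3,
        "embedding_table_names": list(table_names),
        "embedding_vecsize_per_table": list(embed_vec_size_per_table),
        # At most sum(max_nnz_per_slot) keys per sample are queried from each table.
        "maxnum_catfeature_query_per_table_per_sample": [sum(n) for n in max_nnz_per_slot_per_table],
        "default_value_for_each_table": [1.0] * len(table_names),
        "deployed_device_list": list(deployed_device_list),
        "max_batch_size": max_batch_size,
        # The remaining values are copied verbatim from the JSON cell below.
        "cache_refresh_percentage_per_iteration": 0.2,
        "hit_rate_threshold": 1.0,
        "gpucacheper": 1.0,
        "gpucache": True,
    }
    return {"supportlonglong": True, "models": [model]}


config = build_hps_config(
    "multi_table_sparse_input",
    ["multi_table_sparse_input_sparse_0.model", "multi_table_sparse_input_sparse_1.model"],
    ["sparse_embedding0", "sparse_embedding1"],
    [16, 32], [[4, 2, 3], [1, 1]], 1024, [0])
print(json.dumps(config, indent=4))
```

Writing the result to `args["ps_config_file"]` with `json.dump` would replace the `%%writefile` cell, but the hand-written cell keeps the notebook easier to read.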

-

We first call hps.Init to perform the necessary initialization, and then load the saved model graph to run inference. Finally, we peek at the keys and the embedding vectors of each table for the last inference batch.

-
-
-
%%writefile multi_table_sparse_input.json
-{
-    "supportlonglong": true,
-    "models": [{
-        "model": "multi_table_sparse_input",
-        "sparse_files": ["multi_table_sparse_input_sparse_0.model", "multi_table_sparse_input_sparse_1.model"],
-        "num_of_worker_buffer_in_pool": 3,
-        "embedding_table_names":["sparse_embedding0", "sparse_embedding1"],
-        "embedding_vecsize_per_table": [16, 32],
-        "maxnum_catfeature_query_per_table_per_sample": [9, 2],
-        "default_value_for_each_table": [1.0, 1.0],
-        "deployed_device_list": [0],
-        "max_batch_size": 1024,
-        "cache_refresh_percentage_per_iteration": 0.2,
-        "hit_rate_threshold": 1.0,
-        "gpucacheper": 1.0,
-        "gpucache": true
-        }
-    ]
-}
-
-
-
-
-
Writing multi_table_sparse_input.json
-
-
-
-
-
-
-
def inference_with_saved_model(args):
-    hps.Init(global_batch_size = args["global_batch_size"],
-             ps_config_file = args["ps_config_file"])
-    model = tf.keras.models.load_model(args["saved_path"])
-    model.summary()
-    def _infer_step(inputs, labels):
-        logit, embeddings0, embeddings1 = model(inputs)
-        return logit, embeddings0, embeddings1
-    embeddings0_peek = list()
-    embeddings1_peek = list()
-    inputs_peek = list()
-    sparse_keys, dense_keys, labels = generate_random_samples(args["global_batch_size"]  * args["iter_num"], args["vocabulary_range_per_slot_per_table"], args["max_nnz_per_slot_per_table"])
-    dataset = tf_dataset(sparse_keys, dense_keys, labels, args["global_batch_size"])
-    for i, (sparse_keys, dense_keys, labels) in enumerate(dataset):
-        sparse_keys = tf.sparse.reshape(sparse_keys, [-1, sparse_keys.shape[-1]])
-        inputs = [sparse_keys, dense_keys]
-        logit, embeddings0, embeddings1 = _infer_step(inputs, labels)
-        embeddings0_peek.append(embeddings0)
-        embeddings1_peek.append(embeddings1)
-        inputs_peek.append(inputs)
-        print("-"*20, "Step {}".format(i),  "-"*20)
-    return embeddings0_peek, embeddings1_peek, inputs_peek
-
-
-
-
-
-
-
embeddings0_peek, embeddings1_peek, inputs_peek = inference_with_saved_model(args)
-
-# 1st embedding table, input keys are SparseTensor 
-print(inputs_peek[-1][0].values)
-print(embeddings0_peek[-1])
-
-# 2nd embedding table, input keys are Tensor
-print(inputs_peek[-1][1])
-print(embeddings1_peek[-1])
-
-
-
-
-
=====================================================HPS Parse====================================================
-[HCTR][07:51:32.495][INFO][RK0][main]: dense_file is not specified using default: 
-[HCTR][07:51:32.495][WARNING][RK0][main]: default_value_for_each_table.size() is not equal to the number of embedding tables
-[HCTR][07:51:32.495][INFO][RK0][main]: num_of_refresher_buffer_in_pool is not specified using default: 1
-[HCTR][07:51:32.495][INFO][RK0][main]: maxnum_des_feature_per_sample is not specified using default: 26
-[HCTR][07:51:32.495][INFO][RK0][main]: refresh_delay is not specified using default: 0
-[HCTR][07:51:32.495][INFO][RK0][main]: refresh_interval is not specified using default: 0
-====================================================HPS Create====================================================
-[HCTR][07:51:32.495][INFO][RK0][main]: Creating HashMap CPU database backend...
-[HCTR][07:51:32.495][INFO][RK0][main]: Volatile DB: initial cache rate = 1
-[HCTR][07:51:32.495][INFO][RK0][main]: Volatile DB: cache missed embeddings = 0
-[HCTR][07:51:32.855][INFO][RK0][main]: Table: hps_et.multi_table_sparse_input.sparse_embedding0; cached 30000 / 30000 embeddings in volatile database (PreallocatedHashMapBackend); load: 30000 / 18446744073709551615 (0.00%).
-[HCTR][07:51:33.195][INFO][RK0][main]: Table: hps_et.multi_table_sparse_input.sparse_embedding1; cached 2000 / 2000 embeddings in volatile database (PreallocatedHashMapBackend); load: 2000 / 18446744073709551615 (0.00%).
-[HCTR][07:51:33.195][DEBUG][RK0][main]: Real-time subscribers created!
-[HCTR][07:51:33.195][INFO][RK0][main]: Creating embedding cache in device 0.
-[HCTR][07:51:33.201][INFO][RK0][main]: Model name: multi_table_sparse_input
-[HCTR][07:51:33.201][INFO][RK0][main]: Number of embedding tables: 2
-[HCTR][07:51:33.201][INFO][RK0][main]: Use GPU embedding cache: True, cache size percentage: 1.000000
-[HCTR][07:51:33.201][INFO][RK0][main]: Use I64 input key: True
-[HCTR][07:51:33.201][INFO][RK0][main]: Configured cache hit rate threshold: 1.000000
-[HCTR][07:51:33.201][INFO][RK0][main]: The size of thread pool: 80
-[HCTR][07:51:33.201][INFO][RK0][main]: The size of worker memory pool: 3
-[HCTR][07:51:33.201][INFO][RK0][main]: The size of refresh memory pool: 1
-[HCTR][07:51:33.212][INFO][RK0][main]: Creating lookup session for multi_table_sparse_input on device: 0
-WARNING:tensorflow:No training configuration found in save file, so the model was *not* compiled. Compile it manually.
-
-
-
WARNING:tensorflow:No training configuration found in save file, so the model was *not* compiled. Compile it manually.
-
-
-
Model: "inference_model"
-_________________________________________________________________
- Layer (type)                Output Shape              Param #   
-=================================================================
- sparse_lookup_layer (Sparse  multiple                 0         
- LookupLayer)                                                    
-                                                                 
- lookup_layer (LookupLayer)  multiple                  0         
-                                                                 
- model_1 (Functional)        (None, 1)                 29441     
-                                                                 
-=================================================================
-Total params: 29,441
-Trainable params: 29,441
-Non-trainable params: 0
-_________________________________________________________________
--------------------- Step 0 --------------------
--------------------- Step 1 --------------------
--------------------- Step 2 --------------------
--------------------- Step 3 --------------------
--------------------- Step 4 --------------------
--------------------- Step 5 --------------------
--------------------- Step 6 --------------------
--------------------- Step 7 --------------------
--------------------- Step 8 --------------------
--------------------- Step 9 --------------------
-tf.Tensor([ 9905  1750  4223 ... 20477 22119 23797], shape=(6111,), dtype=int64)
-tf.Tensor(
-[[0.9122444  0.9122444  0.9122444  ... 1.         1.         1.        ]
- [0.76979905 0.76979905 0.76979905 ... 1.         1.         1.        ]
- [0.7415885  0.7415885  0.7415885  ... 1.         1.         1.        ]
- ...
- [0.66938084 0.66938084 0.66938084 ... 1.         1.         1.        ]
- [0.90488005 0.90488005 0.90488005 ... 1.         1.         1.        ]
- [0.7773342  0.7773342  0.7773342  ... 0.6368773  0.6368773  0.6368773 ]], shape=(1024, 48), dtype=float32)
-tf.Tensor(
-[[ 276 1610]
- [ 408 1884]
- [ 678 1762]
- ...
- [ 369 1794]
- [ 403 1216]
- [ 909 1427]], shape=(1024, 2), dtype=int64)
-tf.Tensor(
-[[0.28882617 0.28882617 0.28882617 ... 0.41947648 0.41947648 0.41947648]
- [0.597903   0.597903   0.597903   ... 0.37505823 0.37505823 0.37505823]
- [0.70420146 0.70420146 0.70420146 ... 0.38864705 0.38864705 0.38864705]
- ...
- [0.32224336 0.32224336 0.32224336 ... 0.31987724 0.31987724 0.31987724]
- [0.43596342 0.43596342 0.43596342 ... 0.5383081  0.5383081  0.5383081 ]
- [0.37384593 0.37384593 0.37384593 ... 0.6026224  0.6026224  0.6026224 ]], shape=(1024, 64), dtype=float32)
-
-
-
-
-
-
- - -
-
- -
-
-
-
- - - - - - - \ No newline at end of file diff --git a/review/pr-458/hps_tf/notebooks/hps_pretrained_model_training_demo.html b/review/pr-458/hps_tf/notebooks/hps_pretrained_model_training_demo.html deleted file mode 100644 index 782d7ec308..0000000000 --- a/review/pr-458/hps_tf/notebooks/hps_pretrained_model_training_demo.html +++ /dev/null @@ -1,801 +0,0 @@ - - - - - - - HPS Pretrained Model Training Demo — Merlin HugeCTR documentation - - - - - - - - - - - - - - - - - - - - - - - - - -
-
# Copyright 2021 NVIDIA Corporation. All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-#     http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-# ==============================================================================
-
-# Each user is responsible for checking the content of datasets and the
-# applicable licenses and determining if suitable for the intended use.
-

HPS Pretrained Model Training Demo

-
-

Overview

-

This notebook demonstrates how to use HPS to load pre-trained embedding tables. We recommend running hierarchical_parameter_server_demo.ipynb before working through this notebook.

-

For more details about HPS APIs, please refer to HPS APIs. For more details about HPS, please refer to HugeCTR Hierarchical Parameter Server (HPS).

-
-
-

Installation

-
-

Get HPS from NGC

-

The HPS Python module is preinstalled in Merlin HugeCTR containers 24.06 and later, for example nvcr.io/nvidia/merlin/merlin-hugectr:24.06.

-

After launching the container, you can verify that the required library is available by running the following command:

-
$ python3 -c "import hierarchical_parameter_server as hps"
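You may prefer to check from inside a running Python session without triggering an ImportError. The minimal sketch below does this with the standard-library `importlib.util.find_spec`. The helper name `check_modules` is hypothetical. It only reports whether each module can be located and does not execute it. As a result, it will not catch errors that occur during import itself, such as a missing CUDA library.

```python
import importlib.util


def check_modules(names):
    """Return {module_name: bool} telling whether each top-level module can be found.

    find_spec only locates the module on sys.path; it does not run the module,
    so failures raised while importing are not detected here.
    """
    return {name: importlib.util.find_spec(name) is not None for name in names}


status = check_modules(["hierarchical_parameter_server", "tensorflow"])
for name, found in status.items():
    print(f"{name}: {'found' if found else 'NOT found'}")
```

The `python3 -c` one-liner above remains the stronger check, because it actually imports the module.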
-

Configurations

-

First, we specify the required configurations: the arguments for generating the dataset, the model parameters, and the paths for saving the model. We use a deep neural network (DNN) model with one embedding table and several dense layers. Note that the input to the embedding layer is a sparse key tensor.
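In this configuration the key space is split into disjoint per-slot ranges. Slot j draws keys from [j*10000, (j+1)*10000), where the upper bound is exclusive because `np.random.randint` excludes `high`. The ten ranges together cover exactly `max_vocabulary_size = 100000` rows, so every key can index the embedding table directly. The plain-Python sketch below rebuilds `vocabulary_range_per_slot` and checks that property. Both helper names are hypothetical.

```python
def make_slot_ranges(slot_num, keys_per_slot):
    """Hypothetical helper: one half-open [start, end) key range per slot."""
    return [[i * keys_per_slot, (i + 1) * keys_per_slot] for i in range(slot_num)]


def ranges_tile_vocabulary(ranges, max_vocabulary_size):
    """True if the ranges are contiguous, non-overlapping and cover [0, max_vocabulary_size)."""
    expected_start = 0
    for start, end in ranges:
        if start != expected_start or end <= start:
            return False
        expected_start = end
    return expected_start == max_vocabulary_size


ranges = make_slot_ranges(slot_num=10, keys_per_slot=10000)
print(ranges[0], ranges[-1])                                     # [0, 10000] [90000, 100000]
print(ranges_tile_vocabulary(ranges, max_vocabulary_size=100000))  # True
```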

-
-
-
import hierarchical_parameter_server as hps
-import os
-import numpy as np
-import tensorflow as tf
-import struct
-
-args = dict()
-
-args["gpu_num"] = 4                               # the number of available GPUs
-args["iter_num"] = 10                             # the number of training iterations
-args["slot_num"] = 10                             # the number of feature fields in this embedding layer
-args["embed_vec_size"] = 16                       # the dimension of embedding vectors
-args["dense_dim"] = 10                            # the dimension of dense features
-args["global_batch_size"] = 1024                  # the global batch size across all GPUs
-args["max_vocabulary_size"] = 100000              # the number of rows in the embedding table
-args["vocabulary_range_per_slot"] = [[i*10000, (i+1)*10000] for i in range(10)]  # [start, end) key range of each slot
-args["max_nnz"] = 5                # the max number of non-zeros for all slots
-args["combiner"] = "mean"
-
-args["ps_config_file"] = "dnn.json"
-args["dense_model_path"] = "dnn_dense.model"
-args["embedding_table_path"] = "dnn_sparse.model"
-args["saved_path"] = "dnn_tf_saved_model"
-args["np_key_type"] = np.int64
-args["np_vector_type"] = np.float32
-args["tf_key_type"] = tf.int64
-args["tf_vector_type"] = tf.float32
-
-os.environ["CUDA_VISIBLE_DEVICES"] = ",".join(map(str, range(args["gpu_num"])))
-
[INFO] hierarchical_parameter_server is imported
-
def generate_random_samples(num_samples, vocabulary_range_per_slot, max_nnz, dense_dim):
-    def generate_sparse_keys(num_samples, vocabulary_range_per_slot, max_nnz, key_dtype = args["np_key_type"]):
-        slot_num = len(vocabulary_range_per_slot)
-        indices = []
-        values = []
-        for i in range(num_samples):
-            for j in range(slot_num):
-                vocab_range = vocabulary_range_per_slot[j]
-                nnz = np.random.randint(low=1, high=max_nnz+1)
-                entries = sorted(np.random.choice(max_nnz, nnz, replace=False))
-                for entry in entries:
-                    indices.append([i, j, entry])
-                values.extend(np.random.randint(low=vocab_range[0], high=vocab_range[1], size=(nnz, )))
-        values = np.array(values, dtype=key_dtype)
-        return tf.sparse.SparseTensor(indices = indices,
-                                    values = values,
-                                    dense_shape = (num_samples, slot_num, max_nnz))
-
-    
-    sparse_keys = generate_sparse_keys(num_samples, vocabulary_range_per_slot, max_nnz)
-    dense_features = np.random.random((num_samples, dense_dim)).astype(np.float32)
-    labels = np.random.randint(low=0, high=2, size=(num_samples, 1))
-    return sparse_keys, dense_features, labels
-
-def tf_dataset(sparse_keys, dense_features, labels, batchsize):
-    dataset = tf.data.Dataset.from_tensor_slices((sparse_keys, dense_features, labels))
-    dataset = dataset.batch(batchsize, drop_remainder=True)
-    return dataset
-

Train with native TF layers

-

We define the model graph for training with native TF layers, i.e., tf.nn.embedding_lookup_sparse and tf.keras.layers.Dense, and store the embedding weights in a tf.Variable. We then train the model and extract the trained weights of the embedding table.
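The lookup in `DNN.call()` is hard to picture from the TF calls alone. The plain-Python sketch below mirrors its three steps on a toy input:

1. The 3-D sparse indices `[sample, slot, position]` are flattened to rows `sample * slot_num + slot`, which is the effect of `tf.sparse.reshape(sparse_keys, [-1, max_nnz])` in `_reshape_input`.
2. The embeddings in each row are averaged, following the `combiner="mean"` / `sp_weights=None` semantics.
3. Each sample's per-slot vectors are concatenated, which is what the final `tf.reshape` to `slot_num * embed_vec_size` does.

The sketch assumes every (sample, slot) pair has at least one key, which `generate_random_samples` guarantees because `nnz >= 1`. The function name is hypothetical. This illustrates the semantics only and is not a replacement for `tf.nn.embedding_lookup_sparse`.

```python
def sparse_mean_lookup(params, indices, values, batch_size, slot_num, embed_vec_size):
    """Hypothetical pure-Python mirror of the lookup in DNN.call().

    params:  list of embedding rows; params[key] has embed_vec_size floats
    indices: list of [sample, slot, position] triples (3-D sparse layout)
    values:  list of keys aligned with indices
    Returns one list of slot_num * embed_vec_size floats per sample.
    """
    # 1) Flatten [sample, slot, position] -> row = sample * slot_num + slot.
    rows = {}
    for (sample, slot, _position), key in zip(indices, values):
        rows.setdefault(sample * slot_num + slot, []).append(key)

    # 2) Mean combiner without weights: average the embeddings in each row.
    pooled = []
    for row in range(batch_size * slot_num):
        keys = rows[row]  # assumes every row has at least one key
        pooled.append([sum(params[k][d] for k in keys) / len(keys)
                       for d in range(embed_vec_size)])

    # 3) (batch_size * slot_num, vec) -> (batch_size, slot_num * vec):
    #    concatenate each sample's slot vectors in slot order.
    return [[x for slot in range(slot_num) for x in pooled[s * slot_num + slot]]
            for s in range(batch_size)]


# Toy table with 4 keys and 2-dim vectors; one sample with 2 slots.
params = [[0.0, 0.0], [1.0, 2.0], [3.0, 4.0], [10.0, 20.0]]
indices = [[0, 0, 0], [0, 0, 1], [0, 1, 0]]  # slot 0 holds keys 1 and 2, slot 1 holds key 3
values = [1, 2, 3]
print(sparse_mean_lookup(params, indices, values, batch_size=1, slot_num=2, embed_vec_size=2))
# [[2.0, 3.0, 10.0, 20.0]]
```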

-
-
-
class DNN(tf.keras.models.Model):
-    def __init__(self,
-                 init_tensors,
-                 combiner,
-                 embed_vec_size,
-                 slot_num,
-                 max_nnz,
-                 dense_dim,
-                 **kwargs):
-        super(DNN, self).__init__(**kwargs)
-        
-        self.combiner = combiner
-        self.embed_vec_size = embed_vec_size
-        self.slot_num = slot_num
-        self.max_nnz = max_nnz
-        self.dense_dim = dense_dim
-        self.params = tf.Variable(initial_value=tf.concat(init_tensors, axis=0))
-        self.fc1 = tf.keras.layers.Dense(units=1024, activation="relu", name="fc1")
-        self.fc2 = tf.keras.layers.Dense(units=256, activation="relu", name="fc2")
-        self.fc3 = tf.keras.layers.Dense(units=1, activation="sigmoid", name="fc3")
-
-    def call(self, inputs, training=True):
-        input_cat = inputs[0]
-        input_dense = inputs[1]
-        
-        # SparseTensor of keys, shape: (batch_size*slot_num, max_nnz)
-        embeddings = tf.reshape(tf.nn.embedding_lookup_sparse(params=self.params, sp_ids=input_cat, sp_weights = None, combiner=self.combiner),
-                                shape=[-1, self.slot_num * self.embed_vec_size])
-        concat_feas = tf.concat([embeddings, input_dense], axis=1)
-        logit = self.fc3(self.fc2(self.fc1(concat_feas)))
-        return logit, embeddings
-
-    def summary(self):
-        inputs = [tf.keras.Input(shape=(self.max_nnz, ), sparse=True, dtype=args["tf_key_type"]), 
-                  tf.keras.Input(shape=(self.dense_dim, ), dtype=tf.float32)]
-        model = tf.keras.models.Model(inputs=inputs, outputs=self.call(inputs))
-        return model.summary()
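As a cross-check on the layer widths in `DNN`, the parameter counts printed by `model.summary()` can be derived by hand:

- The input to `fc1` is the flattened embeddings (`slot_num * embed_vec_size = 160`) concatenated with the dense features (`dense_dim = 10`), i.e. 170 features.
- A Dense layer with n inputs and m units has n * m weights plus m biases.

The embedding `tf.Variable` is not included in the summary's total, which is consistent with the TFOpLambda warning in the training log that this variable is not tracked. The helper name below is hypothetical.

```python
def dense_params(in_features, units):
    """Weights (in_features * units) plus biases (units) of a fully connected layer."""
    return in_features * units + units


slot_num, embed_vec_size, dense_dim = 10, 16, 10
fc1_in = slot_num * embed_vec_size + dense_dim  # 160 embedding features + 10 dense = 170

counts = {
    "fc1": dense_params(fc1_in, 1024),
    "fc2": dense_params(1024, 256),
    "fc3": dense_params(256, 1),
}
print(counts)               # {'fc1': 175104, 'fc2': 262400, 'fc3': 257}
print(sum(counts.values())) # 437761
```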
-
def train(args):
-    init_tensors = np.ones(shape=[args["max_vocabulary_size"], args["embed_vec_size"]], dtype=args["np_vector_type"])
-    strategy = tf.distribute.MirroredStrategy()
-    with strategy.scope():
-        model = DNN(init_tensors, args["combiner"], args["embed_vec_size"], args["slot_num"], args["max_nnz"], args["dense_dim"])
-        model.summary()
-        optimizer = tf.keras.optimizers.Adam(learning_rate=0.1)    
-
-    loss_fn = tf.keras.losses.BinaryCrossentropy(reduction=tf.keras.losses.Reduction.NONE)
-    def _replica_loss(labels, logits):
-        loss = loss_fn(labels, logits)
-        return tf.nn.compute_average_loss(loss, global_batch_size=args["global_batch_size"])
-    
-    def _reshape_input(sparse_keys):
-        sparse_keys = tf.sparse.reshape(sparse_keys, [-1, sparse_keys.shape[-1]])
-        return sparse_keys
-    
-    def _train_step(inputs, labels):
-        with tf.GradientTape() as tape:
-            logit, _ = model(inputs)
-            loss = _replica_loss(labels, logit)
-        grads = tape.gradient(loss, model.trainable_variables)
-        optimizer.apply_gradients(zip(grads, model.trainable_variables))
-        return logit, loss
-
-    def _dataset_fn(input_context):
-        replica_batch_size = input_context.get_per_replica_batch_size(args["global_batch_size"])
-        sparse_keys, dense_features, labels = generate_random_samples(args["global_batch_size"]  * args["iter_num"], args["vocabulary_range_per_slot"], args["max_nnz"], args["dense_dim"])
-        dataset = tf_dataset(sparse_keys, dense_features, labels, replica_batch_size)
-        dataset = dataset.shard(input_context.num_input_pipelines, input_context.input_pipeline_id)
-        return dataset
-
-    dataset = strategy.distribute_datasets_from_function(_dataset_fn)
-    for i, (sparse_keys, dense_features, labels) in enumerate(dataset):
-        sparse_keys = strategy.run(_reshape_input, args=(sparse_keys,))
-        inputs = [sparse_keys, dense_features]  
-        _, loss = strategy.run(_train_step, args=(inputs, labels))
-        print("-"*20, "Step {}, loss: {}".format(i, loss),  "-"*20)
-    return model
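`_replica_loss` divides each replica's summed per-example loss by the global batch size, which is what `tf.nn.compute_average_loss` does when given `global_batch_size`. It does not divide by the replica's own 256 samples. `MirroredStrategy` then sums the gradients across the 4 replicas, so the resulting update corresponds to the mean loss over all 1024 samples. It also means the four per-replica values printed at each step add up to the mean loss over the global batch.

The plain-Python restatement of that arithmetic below uses made-up per-example losses. The helper name is hypothetical.

```python
def compute_average_loss(per_example_loss, global_batch_size):
    """Same formula as tf.nn.compute_average_loss without sample weights:
    sum of this replica's per-example losses divided by the *global* batch size."""
    return sum(per_example_loss) / global_batch_size


# Toy setup: 2 replicas, a global batch of 4 examples (2 per replica).
global_batch_size = 4
replica_losses = [[0.5, 1.5], [1.0, 3.0]]  # per-example losses on each replica

scaled = [compute_average_loss(losses, global_batch_size) for losses in replica_losses]
print(scaled)       # [0.5, 1.0]
print(sum(scaled))  # 1.5, the mean over all 4 examples
```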
-
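trained_model = train(args)
-weights_list = trained_model.get_weights()
-# get_weights() lists the Dense layers' weights first; the embedding table variable
-# (self.params, shape [max_vocabulary_size, embed_vec_size]) is the last entry.
-embedding_weights = weights_list[-1]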
-
2022-07-29 06:41:55.554588: I tensorflow/core/platform/cpu_feature_guard.cc:151] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 FMA
-To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
-2022-07-29 06:41:57.606412: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1525] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 30989 MB memory:  -> device: 0, name: Tesla V100-SXM2-32GB, pci bus id: 0000:06:00.0, compute capability: 7.0
-2022-07-29 06:41:57.608128: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1525] Created device /job:localhost/replica:0/task:0/device:GPU:1 with 30989 MB memory:  -> device: 1, name: Tesla V100-SXM2-32GB, pci bus id: 0000:07:00.0, compute capability: 7.0
-2022-07-29 06:41:57.609468: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1525] Created device /job:localhost/replica:0/task:0/device:GPU:2 with 30989 MB memory:  -> device: 2, name: Tesla V100-SXM2-32GB, pci bus id: 0000:0a:00.0, compute capability: 7.0
-2022-07-29 06:41:57.610818: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1525] Created device /job:localhost/replica:0/task:0/device:GPU:3 with 30989 MB memory:  -> device: 3, name: Tesla V100-SXM2-32GB, pci bus id: 0000:0b:00.0, compute capability: 7.0
-
-
-
INFO:tensorflow:Using MirroredStrategy with devices ('/job:localhost/replica:0/task:0/device:GPU:0', '/job:localhost/replica:0/task:0/device:GPU:1', '/job:localhost/replica:0/task:0/device:GPU:2', '/job:localhost/replica:0/task:0/device:GPU:3')
-WARNING:tensorflow:The following Variables were used in a Lambda layer's call (tf.compat.v1.nn.embedding_lookup_sparse), but are not present in its tracked objects:   <tf.Variable 'Variable:0' shape=(100000, 16) dtype=float32>. This is a strong indication that the Lambda layer should be rewritten as a subclassed Layer.
-Model: "model"
-__________________________________________________________________________________________________
- Layer (type)                   Output Shape         Param #     Connected to                     
-==================================================================================================
- input_1 (InputLayer)           [(None, 5)]          0           []                               
-                                                                                                  
- tf.compat.v1.nn.embedding_look  (None, 16)          0           ['input_1[0][0]']                
- up_sparse (TFOpLambda)                                                                           
-                                                                                                  
- tf.reshape (TFOpLambda)        (None, 160)          0           ['tf.compat.v1.nn.embedding_looku
-                                                                 p_sparse[0][0]']                 
-                                                                                                  
- input_2 (InputLayer)           [(None, 10)]         0           []                               
-                                                                                                  
- tf.concat (TFOpLambda)         (None, 170)          0           ['tf.reshape[0][0]',             
-                                                                  'input_2[0][0]']                
-                                                                                                  
- fc1 (Dense)                    (None, 1024)         175104      ['tf.concat[0][0]']              
-                                                                                                  
- fc2 (Dense)                    (None, 256)          262400      ['fc1[0][0]']                    
-                                                                                                  
- fc3 (Dense)                    (None, 1)            257         ['fc2[0][0]']                    
-                                                                                                  
-==================================================================================================
-Total params: 437,761
-Trainable params: 437,761
-Non-trainable params: 0
-__________________________________________________________________________________________________
-WARNING:tensorflow:Using MirroredStrategy eagerly has significant overhead currently. We will be working on improving this in the future, but for now please wrap `call_for_each_replica` or `experimental_run` or `run` inside a tf.function to get the best performance.
-WARNING:tensorflow:Using MirroredStrategy eagerly has significant overhead currently. We will be working on improving this in the future, but for now please wrap `call_for_each_replica` or `experimental_run` or `run` inside a tf.function to get the best performance.
-
-
-
/usr/local/lib/python3.8/dist-packages/tensorflow/python/util/dispatch.py:1082: UserWarning: "`binary_crossentropy` received `from_logits=True`, but the `output` argument was produced by a sigmoid or softmax activation and thus does not represent logits. Was this intended?"
-  return dispatch_target(*args, **kwargs)
-
-
-
INFO:tensorflow:batch_all_reduce: 6 all-reduces with algorithm = nccl, num_packs = 1
-WARNING:tensorflow:Efficient allreduce is not supported for 1 IndexedSlices
-INFO:tensorflow:Reduce to /job:localhost/replica:0/task:0/device:GPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:GPU:0', '/job:localhost/replica:0/task:0/device:GPU:1', '/job:localhost/replica:0/task:0/device:GPU:2', '/job:localhost/replica:0/task:0/device:GPU:3').
--------------------- Step 0, loss: PerReplica:{
-  0: tf.Tensor(0.1950232, shape=(), dtype=float32),
-  1: tf.Tensor(0.20766959, shape=(), dtype=float32),
-  2: tf.Tensor(0.2006835, shape=(), dtype=float32),
-  3: tf.Tensor(0.21188965, shape=(), dtype=float32)
-} --------------------
-WARNING:tensorflow:Using MirroredStrategy eagerly has significant overhead currently. We will be working on improving this in the future, but for now please wrap `call_for_each_replica` or `experimental_run` or `run` inside a tf.function to get the best performance.
-WARNING:tensorflow:Using MirroredStrategy eagerly has significant overhead currently. We will be working on improving this in the future, but for now please wrap `call_for_each_replica` or `experimental_run` or `run` inside a tf.function to get the best performance.
-INFO:tensorflow:batch_all_reduce: 6 all-reduces with algorithm = nccl, num_packs = 1
-WARNING:tensorflow:Efficient allreduce is not supported for 1 IndexedSlices
-INFO:tensorflow:Reduce to /job:localhost/replica:0/task:0/device:GPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:GPU:0', '/job:localhost/replica:0/task:0/device:GPU:1', '/job:localhost/replica:0/task:0/device:GPU:2', '/job:localhost/replica:0/task:0/device:GPU:3').
--------------------- Step 1, loss: PerReplica:{
-  0: tf.Tensor(681.73474, shape=(), dtype=float32),
-  1: tf.Tensor(691.33826, shape=(), dtype=float32),
-  2: tf.Tensor(588.15265, shape=(), dtype=float32),
-  3: tf.Tensor(622.72485, shape=(), dtype=float32)
-} --------------------
-WARNING:tensorflow:Using MirroredStrategy eagerly has significant overhead currently. We will be working on improving this in the future, but for now please wrap `call_for_each_replica` or `experimental_run` or `run` inside a tf.function to get the best performance.
-INFO:tensorflow:batch_all_reduce: 6 all-reduces with algorithm = nccl, num_packs = 1
-WARNING:tensorflow:Efficient allreduce is not supported for 1 IndexedSlices
-INFO:tensorflow:Reduce to /job:localhost/replica:0/task:0/device:GPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:GPU:0', '/job:localhost/replica:0/task:0/device:GPU:1', '/job:localhost/replica:0/task:0/device:GPU:2', '/job:localhost/replica:0/task:0/device:GPU:3').
--------------------- Step 2, loss: PerReplica:{
-  0: tf.Tensor(6.9260483, shape=(), dtype=float32),
-  1: tf.Tensor(8.509967, shape=(), dtype=float32),
-  2: tf.Tensor(7.0374002, shape=(), dtype=float32),
-  3: tf.Tensor(7.1059036, shape=(), dtype=float32)
-} --------------------
-INFO:tensorflow:batch_all_reduce: 6 all-reduces with algorithm = nccl, num_packs = 1
--------------------- Step 3, loss: PerReplica:{
-  0: tf.Tensor(3.002458, shape=(), dtype=float32),
-  1: tf.Tensor(3.7079678, shape=(), dtype=float32),
-  2: tf.Tensor(3.333396, shape=(), dtype=float32),
-  3: tf.Tensor(3.6451607, shape=(), dtype=float32)
-} --------------------
-WARNING:tensorflow:5 out of the last 5 calls to <function _apply_all_reduce.<locals>._all_reduce at 0x7fba4c2dc1f0> triggered tf.function retracing. Tracing is expensive and the excessive number of tracings could be due to (1) creating @tf.function repeatedly in a loop, (2) passing tensors with different shapes, (3) passing Python objects instead of tensors. For (1), please define your @tf.function outside of the loop. For (2), @tf.function has experimental_relax_shapes=True option that relaxes argument shapes that can avoid unnecessary retracing. For (3), please refer to https://www.tensorflow.org/guide/function#controlling_retracing and https://www.tensorflow.org/api_docs/python/tf/function for  more details.
--------------------- Step 4, loss: PerReplica:{
-  0: tf.Tensor(0.8326673, shape=(), dtype=float32),
-  1: tf.Tensor(0.79405844, shape=(), dtype=float32),
-  2: tf.Tensor(0.85364443, shape=(), dtype=float32),
-  3: tf.Tensor(0.92679256, shape=(), dtype=float32)
-} --------------------
-WARNING:tensorflow:6 out of the last 6 calls to <function _apply_all_reduce.<locals>._all_reduce at 0x7fba4c2dcdc0> triggered tf.function retracing. Tracing is expensive and the excessive number of tracings could be due to (1) creating @tf.function repeatedly in a loop, (2) passing tensors with different shapes, (3) passing Python objects instead of tensors. For (1), please define your @tf.function outside of the loop. For (2), @tf.function has experimental_relax_shapes=True option that relaxes argument shapes that can avoid unnecessary retracing. For (3), please refer to https://www.tensorflow.org/guide/function#controlling_retracing and https://www.tensorflow.org/api_docs/python/tf/function for  more details.
--------------------- Step 5, loss: PerReplica:{
-  0: tf.Tensor(0.5796976, shape=(), dtype=float32),
-  1: tf.Tensor(0.54752666, shape=(), dtype=float32),
-  2: tf.Tensor(0.57471323, shape=(), dtype=float32),
-  3: tf.Tensor(0.54845804, shape=(), dtype=float32)
-} --------------------
--------------------- Step 6, loss: PerReplica:{
-  0: tf.Tensor(0.61678064, shape=(), dtype=float32),
-  1: tf.Tensor(0.647662, shape=(), dtype=float32),
-  2: tf.Tensor(0.6421599, shape=(), dtype=float32),
-  3: tf.Tensor(0.6278339, shape=(), dtype=float32)
-} --------------------
--------------------- Step 7, loss: PerReplica:{
-  0: tf.Tensor(0.28049487, shape=(), dtype=float32),
-  1: tf.Tensor(0.2768654, shape=(), dtype=float32),
-  2: tf.Tensor(0.2943622, shape=(), dtype=float32),
-  3: tf.Tensor(0.2805586, shape=(), dtype=float32)
-} --------------------
--------------------- Step 8, loss: PerReplica:{
-  0: tf.Tensor(1.2102679, shape=(), dtype=float32),
-  1: tf.Tensor(1.368755, shape=(), dtype=float32),
-  2: tf.Tensor(1.4997649, shape=(), dtype=float32),
-  3: tf.Tensor(1.5143406, shape=(), dtype=float32)
-} --------------------
--------------------- Step 9, loss: PerReplica:{
-  0: tf.Tensor(0.413176, shape=(), dtype=float32),
-  1: tf.Tensor(0.42411563, shape=(), dtype=float32),
-  2: tf.Tensor(0.38453132, shape=(), dtype=float32),
-  3: tf.Tensor(0.4314984, shape=(), dtype=float32)
-} --------------------
-

Load the pre-trained embeddings via HPS

-

To load the pre-trained embeddings with HPS, we first convert them to the format that HPS requires. We can then train a new model that uses the pre-trained embeddings and updates only the weights of the dense layers. Note that hps.SparseLookupLayer and hps.LookupLayer are not trainable.

-

To initialize the lookup service provided by HPS, we also need to create a JSON configuration file that specifies the embedding tables of the models to be deployed. Here we deploy a single model with one embedding table, although HPS can serve multiple models, each with multiple embedding tables. Note how maxnum_catfeature_query_per_table_per_sample is set for the embedding table: max_nnz is 5 for every slot and there are 10 slots, so a sample queries at most 5 × 10 = 50 keys and this entry is set to 50.
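Instead of hard-coding the query count, you could derive it from the model parameters when writing `dnn.json`. The sketch below uses a hypothetical helper, `build_ps_config`, not an HPS API. It builds the same configuration with the standard `json` module and reuses only the keys and values that appear in the `dnn.json` cell. The one derived value is `maxnum_catfeature_query_per_table_per_sample`, computed as `slot_num * max_nnz`.

```python
import json


def build_ps_config(model_name, sparse_file, slot_num, max_nnz, embed_vec_size,
                    device_list, max_batch_size):
    """Hypothetical helper: rebuild the dnn.json content from model parameters."""
    return {
        "supportlonglong": True,
        "models": [{
            "model": model_name,
            "sparse_files": [sparse_file],
            "num_of_worker_buffer_in_pool": 3,
            "embedding_table_names": ["sparse_embedding0"],
            "embedding_vecsize_per_table": [embed_vec_size],
            # At most max_nnz keys in each of slot_num slots per sample.
            "maxnum_catfeature_query_per_table_per_sample": [slot_num * max_nnz],
            "default_value_for_each_table": [1.0],
            "deployed_device_list": device_list,
            "max_batch_size": max_batch_size,
            "cache_refresh_percentage_per_iteration": 0.2,
            "hit_rate_threshold": 1.0,
            "gpucacheper": 1.0,
            "gpucache": True,
        }],
    }


config = build_ps_config("dnn", "dnn_sparse.model", slot_num=10, max_nnz=5,
                         embed_vec_size=16, device_list=[0, 1, 2, 3], max_batch_size=1024)
text = json.dumps(config, indent=4)
print(json.loads(text)["models"][0]["maxnum_catfeature_query_per_table_per_sample"])  # [50]
```

Writing `text` to `args["ps_config_file"]` would produce an equivalent file to the `%%writefile` cell below.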

-
-
-
def convert_to_sparse_model(embeddings_weights, embedding_table_path, embedding_vec_size):
-    # Write the table in the layout HPS expects: a "key" file with one int64 per row
-    # (the row index is used as the key) and an "emb_vector" file with
-    # embedding_vec_size float32 values per row, in the same row order.
-    os.makedirs(embedding_table_path, exist_ok=True)
-    with open(os.path.join(embedding_table_path, "key"), "wb") as key_file, \
-         open(os.path.join(embedding_table_path, "emb_vector"), "wb") as vec_file:
-        for key in range(embeddings_weights.shape[0]):
-            vec = embeddings_weights[key]
-            key_file.write(struct.pack("q", key))
-            vec_file.write(struct.pack(str(embedding_vec_size) + "f", *vec))
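Before handing the directory to HPS, you can read the two files back to confirm they have the expected layout. The `key` file should hold one 8-byte int64 per row, and `emb_vector` should hold `embed_vec_size` float32 values per row, in matching order.

The sketch below uses only `struct` and `tempfile`. Both helper names are hypothetical. The writer repeats the packing used by `convert_to_sparse_model`, but operates on a plain list so that it runs without NumPy.

```python
import os
import struct
import tempfile


def write_sparse_model(rows, path):
    """Hypothetical writer with the same packing as convert_to_sparse_model:
    key i is the row index i (int64); each vector is stored as float32 values."""
    os.makedirs(path, exist_ok=True)
    dim = len(rows[0])
    with open(os.path.join(path, "key"), "wb") as key_file, \
         open(os.path.join(path, "emb_vector"), "wb") as vec_file:
        for key, vec in enumerate(rows):
            key_file.write(struct.pack("q", key))
            vec_file.write(struct.pack(f"{dim}f", *vec))


def read_sparse_model(path, embed_vec_size):
    """Read the key/emb_vector pair back into a {key: [floats]} dict."""
    with open(os.path.join(path, "key"), "rb") as f:
        keys = [k for (k,) in struct.iter_unpack("q", f.read())]
    with open(os.path.join(path, "emb_vector"), "rb") as f:
        vecs = [list(v) for v in struct.iter_unpack(f"{embed_vec_size}f", f.read())]
    if len(keys) != len(vecs):
        raise ValueError(f"{len(keys)} keys but {len(vecs)} vectors")
    return dict(zip(keys, vecs))


with tempfile.TemporaryDirectory() as tmp:
    table_dir = os.path.join(tmp, "toy_sparse.model")
    write_sparse_model([[1.0, 0.5], [0.25, -2.0], [3.0, 4.0]], table_dir)
    result = read_sparse_model(table_dir, embed_vec_size=2)

print(result)  # {0: [1.0, 0.5], 1: [0.25, -2.0], 2: [3.0, 4.0]}
```

In this notebook, `read_sparse_model(args["embedding_table_path"], args["embed_vec_size"])` should return 100000 entries after `convert_to_sparse_model` has run.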
-
%%writefile dnn.json
-{
-    "supportlonglong": true,
-    "models": [{
-        "model": "dnn",
-        "sparse_files": ["dnn_sparse.model"],
-        "num_of_worker_buffer_in_pool": 3,
-        "embedding_table_names":["sparse_embedding0"],
-        "embedding_vecsize_per_table": [16],
-        "maxnum_catfeature_query_per_table_per_sample": [50],
-        "default_value_for_each_table": [1.0],
-        "deployed_device_list": [0,1,2,3],
-        "max_batch_size": 1024,
-        "cache_refresh_percentage_per_iteration": 0.2,
-        "hit_rate_threshold": 1.0,
-        "gpucacheper": 1.0,
-        "gpucache": true
-        }
-    ]
-}
-
Overwriting dnn.json
-
class PreTrainedEmbedding(tf.keras.models.Model):
-    def __init__(self,
-                 combiner,
-                 embed_vec_size,
-                 slot_num,
-                 max_nnz,
-                 dense_dim,
-                 **kwargs):
-        super(PreTrainedEmbedding, self).__init__(**kwargs)
-        
-        self.combiner = combiner
-        self.embed_vec_size = embed_vec_size
-        self.slot_num = slot_num
-        self.max_nnz = max_nnz
-        self.dense_dim = dense_dim
-        
-        self.sparse_lookup_layer = hps.SparseLookupLayer(model_name = "dnn", 
-                                                         table_id = 0,
-                                                         emb_vec_size = self.embed_vec_size,
-                                                         emb_vec_dtype = args["tf_vector_type"])
-        # Only use one FC layer when leveraging pre-trained embeddings
-        self.new_fc = tf.keras.layers.Dense(units=1, activation="sigmoid", name="new_fc")
-
-    def call(self, inputs, training=True):
-        input_cat = inputs[0]
-        input_dense = inputs[1]
-        
-        # SparseTensor of keys, shape: (batch_size*slot_num, max_nnz)
-        embeddings = tf.reshape(self.sparse_lookup_layer(sp_ids=input_cat, sp_weights = None, combiner=self.combiner),
-                                shape=[-1, self.slot_num * self.embed_vec_size])
-        concat_feas = tf.concat([embeddings, input_dense], axis=1)
-        logit = self.new_fc(concat_feas)
-        return logit, embeddings
-
-    def summary(self):
-        inputs = [tf.keras.Input(shape=(self.max_nnz, ), sparse=True, dtype=args["tf_key_type"]), 
-                  tf.keras.Input(shape=(self.dense_dim, ), dtype=tf.float32)]
-        model = tf.keras.models.Model(inputs=inputs, outputs=self.call(inputs))
-        return model.summary()
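`hps.SparseLookupLayer` is not trainable, so all of the trainable parameters of `PreTrainedEmbedding` sit in `new_fc`. That layer maps the same concatenated features to a single output unit. With this notebook's configuration, the count works out as follows:

```latex
\text{params}(\texttt{new\_fc})
  = (\text{slot\_num}\cdot\text{embed\_vec\_size} + \text{dense\_dim})\cdot 1 + 1
  = (10\cdot 16 + 10)\cdot 1 + 1
  = 171
```

This compares with 437,761 trainable parameters for the three Dense layers of `DNN`.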
-
def train_with_pretrained_embeddings(args):
-    strategy = tf.distribute.MirroredStrategy()
-    with strategy.scope():
-        hps.Init(global_batch_size = args["global_batch_size"], ps_config_file = args["ps_config_file"])
-        model = PreTrainedEmbedding(args["combiner"], args["embed_vec_size"], args["slot_num"], args["max_nnz"], args["dense_dim"])
-        model.summary()
-        optimizer = tf.keras.optimizers.Adam(learning_rate=0.1)
-        
-    loss_fn = tf.keras.losses.BinaryCrossentropy(reduction=tf.keras.losses.Reduction.NONE)
-    def _replica_loss(labels, logits):
-        loss = loss_fn(labels, logits)
-        return tf.nn.compute_average_loss(loss, global_batch_size=args["global_batch_size"])
-    
-    def _reshape_input(sparse_keys):
-        sparse_keys = tf.sparse.reshape(sparse_keys, [-1, sparse_keys.shape[-1]])
-        return sparse_keys
-    
-    def _train_step(inputs, labels):
-        with tf.GradientTape() as tape:
-            logit, _ = model(inputs)
-            loss = _replica_loss(labels, logit)
-        grads = tape.gradient(loss, model.trainable_variables)
-        optimizer.apply_gradients(zip(grads, model.trainable_variables))
-        return logit, loss
-    
-    def _dataset_fn(input_context):
-        replica_batch_size = input_context.get_per_replica_batch_size(args["global_batch_size"])
-        sparse_keys, dense_features, labels = generate_random_samples(args["global_batch_size"]  * args["iter_num"], args["vocabulary_range_per_slot"], args["max_nnz"], args["dense_dim"])
-        dataset = tf_dataset(sparse_keys, dense_features, labels, replica_batch_size)
-        dataset = dataset.shard(input_context.num_input_pipelines, input_context.input_pipeline_id)
-        return dataset
-
-    dataset = strategy.distribute_datasets_from_function(_dataset_fn)
-    for i, (sparse_keys, dense_features, labels) in enumerate(dataset):
-        sparse_keys = strategy.run(_reshape_input, args=(sparse_keys,))
-        inputs = [sparse_keys, dense_features]
-        _, loss = strategy.run(_train_step, args=(inputs, labels))
-        print("-"*20, "Step {}, loss: {}".format(i, loss),  "-"*20)
-    return model
-
convert_to_sparse_model(embedding_weights, args["embedding_table_path"], args["embed_vec_size"])
-model = train_with_pretrained_embeddings(args)
-
INFO:tensorflow:Using MirroredStrategy with devices ('/job:localhost/replica:0/task:0/device:GPU:0', '/job:localhost/replica:0/task:0/device:GPU:1', '/job:localhost/replica:0/task:0/device:GPU:2', '/job:localhost/replica:0/task:0/device:GPU:3')
-=====================================================HPS Parse====================================================
-You are using the plugin with MirroredStrategy.
-[HCTR][06:42:16.707][INFO][RK0][main]: dense_file is not specified using default: 
-[HCTR][06:42:16.707][INFO][RK0][main]: num_of_refresher_buffer_in_pool is not specified using default: 1
-[HCTR][06:42:16.707][INFO][RK0][main]: maxnum_des_feature_per_sample is not specified using default: 26
-[HCTR][06:42:16.707][INFO][RK0][main]: refresh_delay is not specified using default: 0
-[HCTR][06:42:16.707][INFO][RK0][main]: refresh_interval is not specified using default: 0
-====================================================HPS Create====================================================
-[HCTR][06:42:16.707][INFO][RK0][main]: Creating HashMap CPU database backend...
-[HCTR][06:42:16.707][INFO][RK0][main]: Volatile DB: initial cache rate = 1
-[HCTR][06:42:16.707][INFO][RK0][main]: Volatile DB: cache missed embeddings = 0
-[HCTR][06:42:17.153][INFO][RK0][main]: Table: hps_et.dnn.sparse_embedding0; cached 100000 / 100000 embeddings in volatile database (PreallocatedHashMapBackend); load: 100000 / 18446744073709551615 (0.00%).
-[HCTR][06:42:17.153][DEBUG][RK0][main]: Real-time subscribers created!
-[HCTR][06:42:17.153][INFO][RK0][main]: Creating embedding cache in device 0.
-[HCTR][06:42:17.160][INFO][RK0][main]: Model name: dnn
-[HCTR][06:42:17.160][INFO][RK0][main]: Number of embedding tables: 1
-[HCTR][06:42:17.160][INFO][RK0][main]: Use GPU embedding cache: True, cache size percentage: 1.000000
-[HCTR][06:42:17.160][INFO][RK0][main]: Use I64 input key: True
-[HCTR][06:42:17.160][INFO][RK0][main]: Configured cache hit rate threshold: 1.000000
-[HCTR][06:42:17.160][INFO][RK0][main]: The size of thread pool: 80
-[HCTR][06:42:17.160][INFO][RK0][main]: The size of worker memory pool: 3
-[HCTR][06:42:17.160][INFO][RK0][main]: The size of refresh memory pool: 1
-[HCTR][06:42:17.170][INFO][RK0][main]: Creating embedding cache in device 1.
-[HCTR][06:42:17.177][INFO][RK0][main]: Model name: dnn
-[HCTR][06:42:17.177][INFO][RK0][main]: Number of embedding tables: 1
-[HCTR][06:42:17.177][INFO][RK0][main]: Use GPU embedding cache: True, cache size percentage: 1.000000
-[HCTR][06:42:17.177][INFO][RK0][main]: Use I64 input key: True
-[HCTR][06:42:17.177][INFO][RK0][main]: Configured cache hit rate threshold: 1.000000
-[HCTR][06:42:17.177][INFO][RK0][main]: The size of thread pool: 80
-[HCTR][06:42:17.177][INFO][RK0][main]: The size of worker memory pool: 3
-[HCTR][06:42:17.177][INFO][RK0][main]: The size of refresh memory pool: 1
-[HCTR][06:42:17.180][INFO][RK0][main]: Creating embedding cache in device 2.
-[HCTR][06:42:17.188][INFO][RK0][main]: Model name: dnn
-[HCTR][06:42:17.188][INFO][RK0][main]: Number of embedding tables: 1
-[HCTR][06:42:17.188][INFO][RK0][main]: Use GPU embedding cache: True, cache size percentage: 1.000000
-[HCTR][06:42:17.188][INFO][RK0][main]: Use I64 input key: True
-[HCTR][06:42:17.188][INFO][RK0][main]: Configured cache hit rate threshold: 1.000000
-[HCTR][06:42:17.188][INFO][RK0][main]: The size of thread pool: 80
-[HCTR][06:42:17.188][INFO][RK0][main]: The size of worker memory pool: 3
-[HCTR][06:42:17.188][INFO][RK0][main]: The size of refresh memory pool: 1
-[HCTR][06:42:17.191][INFO][RK0][main]: Creating embedding cache in device 3.
-[HCTR][06:42:17.197][INFO][RK0][main]: Model name: dnn
-[HCTR][06:42:17.197][INFO][RK0][main]: Number of embedding tables: 1
-[HCTR][06:42:17.197][INFO][RK0][main]: Use GPU embedding cache: True, cache size percentage: 1.000000
-[HCTR][06:42:17.197][INFO][RK0][main]: Use I64 input key: True
-[HCTR][06:42:17.197][INFO][RK0][main]: Configured cache hit rate threshold: 1.000000
-[HCTR][06:42:17.197][INFO][RK0][main]: The size of thread pool: 80
-[HCTR][06:42:17.197][INFO][RK0][main]: The size of worker memory pool: 3
-[HCTR][06:42:17.197][INFO][RK0][main]: The size of refresh memory pool: 1
-[HCTR][06:42:17.300][INFO][RK0][main]: Creating lookup session for dnn on device: 0
-[HCTR][06:42:17.300][INFO][RK0][main]: Creating lookup session for dnn on device: 1
-[HCTR][06:42:17.300][INFO][RK0][main]: Creating lookup session for dnn on device: 2
-[HCTR][06:42:17.300][INFO][RK0][main]: Creating lookup session for dnn on device: 3
-Model: "model_1"
-__________________________________________________________________________________________________
- Layer (type)                   Output Shape         Param #     Connected to                     
-==================================================================================================
- input_3 (InputLayer)           [(None, 5)]          0           []                               
-                                                                                                  
- sparse_lookup_layer (SparseLoo  (None, 16)          0           ['input_3[0][0]']                
- kupLayer)                                                                                        
-                                                                                                  
- tf.reshape_1 (TFOpLambda)      (None, 160)          0           ['sparse_lookup_layer[0][0]']    
-                                                                                                  
- input_4 (InputLayer)           [(None, 10)]         0           []                               
-                                                                                                  
- tf.concat_1 (TFOpLambda)       (None, 170)          0           ['tf.reshape_1[0][0]',           
-                                                                  'input_4[0][0]']                
-                                                                                                  
- new_fc (Dense)                 (None, 1)            171         ['tf.concat_1[0][0]']            
-                                                                                                  
-==================================================================================================
-Total params: 171
-Trainable params: 171
-Non-trainable params: 0
-__________________________________________________________________________________________________
--------------------- Step 0, loss: PerReplica:{
-  0: tf.Tensor(0.17934436, shape=(), dtype=float32),
-  1: tf.Tensor(0.17969523, shape=(), dtype=float32),
-  2: tf.Tensor(0.18917403, shape=(), dtype=float32),
-  3: tf.Tensor(0.18102707, shape=(), dtype=float32)
-} --------------------
--------------------- Step 1, loss: PerReplica:{
-  0: tf.Tensor(1.7858478, shape=(), dtype=float32),
-  1: tf.Tensor(1.68311, shape=(), dtype=float32),
-  2: tf.Tensor(1.66279, shape=(), dtype=float32),
-  3: tf.Tensor(1.5826445, shape=(), dtype=float32)
-} --------------------
--------------------- Step 2, loss: PerReplica:{
-  0: tf.Tensor(0.7325904, shape=(), dtype=float32),
-  1: tf.Tensor(0.7331751, shape=(), dtype=float32),
-  2: tf.Tensor(0.7210605, shape=(), dtype=float32),
-  3: tf.Tensor(0.7671325, shape=(), dtype=float32)
-} --------------------
--------------------- Step 3, loss: PerReplica:{
-  0: tf.Tensor(0.62144834, shape=(), dtype=float32),
-  1: tf.Tensor(0.5696643, shape=(), dtype=float32),
-  2: tf.Tensor(0.5946336, shape=(), dtype=float32),
-  3: tf.Tensor(0.64713424, shape=(), dtype=float32)
-} --------------------
--------------------- Step 4, loss: PerReplica:{
-  0: tf.Tensor(0.88115656, shape=(), dtype=float32),
-  1: tf.Tensor(0.9079187, shape=(), dtype=float32),
-  2: tf.Tensor(0.98161024, shape=(), dtype=float32),
-  3: tf.Tensor(0.97925556, shape=(), dtype=float32)
-} --------------------
--------------------- Step 5, loss: PerReplica:{
-  0: tf.Tensor(0.6572284, shape=(), dtype=float32),
-  1: tf.Tensor(0.6304919, shape=(), dtype=float32),
-  2: tf.Tensor(0.66552734, shape=(), dtype=float32),
-  3: tf.Tensor(0.6695935, shape=(), dtype=float32)
-} --------------------
--------------------- Step 6, loss: PerReplica:{
-  0: tf.Tensor(0.2002374, shape=(), dtype=float32),
-  1: tf.Tensor(0.19162768, shape=(), dtype=float32),
-  2: tf.Tensor(0.1874283, shape=(), dtype=float32),
-  3: tf.Tensor(0.19209734, shape=(), dtype=float32)
-} --------------------
--------------------- Step 7, loss: PerReplica:{
-  0: tf.Tensor(0.5284709, shape=(), dtype=float32),
-  1: tf.Tensor(0.6028371, shape=(), dtype=float32),
-  2: tf.Tensor(0.5635803, shape=(), dtype=float32),
-  3: tf.Tensor(0.5773235, shape=(), dtype=float32)
-} --------------------
--------------------- Step 8, loss: PerReplica:{
-  0: tf.Tensor(0.74001855, shape=(), dtype=float32),
-  1: tf.Tensor(0.71915305, shape=(), dtype=float32),
-  2: tf.Tensor(0.619328, shape=(), dtype=float32),
-  3: tf.Tensor(0.7890761, shape=(), dtype=float32)
-} --------------------
--------------------- Step 9, loss: PerReplica:{
-  0: tf.Tensor(0.55197906, shape=(), dtype=float32),
-  1: tf.Tensor(0.5565746, shape=(), dtype=float32),
-  2: tf.Tensor(0.52792, shape=(), dtype=float32),
-  3: tf.Tensor(0.6230979, shape=(), dtype=float32)
-} --------------------
- - - - - - - \ No newline at end of file diff --git a/review/pr-458/hps_tf/notebooks/hps_table_fusion_demo.html b/review/pr-458/hps_tf/notebooks/hps_table_fusion_demo.html deleted file mode 100644 index 9042b6bd47..0000000000 --- a/review/pr-458/hps_tf/notebooks/hps_table_fusion_demo.html +++ /dev/null @@ -1,729 +0,0 @@ - - - - - - - HPS Table Fusion Demo — Merlin HugeCTR documentation - - - - - - - - - - - - - - - - - - - - - - - - - -
# Copyright 2021 NVIDIA Corporation. All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-#     http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-# ==============================================================================
-
-# Each user is responsible for checking the content of datasets and the
-# applicable licenses and determining if suitable for the intended use.
-
-
-
-
-http://developer.download.nvidia.com/notebooks/dlsw-notebooks/merlin_hugectr_hps-hps-table-fusion-demo/nvidia_logo.png -
-

HPS Table Fusion Demo

-
-

Overview

-

This notebook demonstrates how to use the HPS plugin for TensorFlow to fuse embedding tables that have the same embedding vector size. We recommend running hierarchical_parameter_server_demo.ipynb before working through this notebook.
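The following is a conceptual illustration only and does not show how the HPS plugin implements fusion internally. Fusing tables of the same vector size can be pictured as three steps: merge the tables into one key-to-vector map, issue a single lookup for the keys of all tables, and split the result back per table. This only works if no key appears in more than one table, which is why the demo gives each table its own key range. The functions `fuse_tables` and `fused_lookup` are hypothetical names used for this sketch.

```python
import numpy as np


def fuse_tables(tables):
    """Merge per-table {key: vector} dicts into one dict.

    Raises ValueError if any key appears in more than one table, because a
    fused table could not tell which original table such a key belongs to.
    """
    fused = {}
    for table in tables:
        overlap = fused.keys() & table.keys()
        if overlap:
            raise ValueError(f"key overlap between tables: {sorted(overlap)[:5]}")
        fused.update(table)
    return fused


def fused_lookup(fused, key_batches):
    """Look up the keys of all tables in one pass, then split the result per table."""
    sizes = [keys.size for keys in key_batches]
    all_keys = np.concatenate([keys.ravel() for keys in key_batches])
    # A single lookup covers the keys of every table.
    vectors = np.stack([fused[int(k)] for k in all_keys])
    # Split the combined result back into one block per original table.
    pieces = np.split(vectors, np.cumsum(sizes)[:-1])
    return [p.reshape(keys.shape + (-1,)) for p, keys in zip(pieces, key_batches)]


# Two toy tables with disjoint key ranges [0, 4) and [4, 8), vector size 3.
emb_vec_size = 3
tables = [
    {k: np.full(emb_vec_size, float(t), dtype=np.float32) for k in range(t * 4, (t + 1) * 4)}
    for t in range(2)
]
fused = fuse_tables(tables)
out = fused_lookup(fused, [np.array([[0, 1]]), np.array([[5, 7]])])
print(out[0].shape, out[1][0, 0])
```

In the demo, the eight 10,000-key tables merge in the same way into a single fused table of 80,000 keys. This is the `fused_embedding0` table reported in the HPS log during inference.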

-

For more details about HPS APIs, please refer to HPS APIs. For more details about HPS, please refer to HugeCTR Hierarchical Parameter Server (HPS).

-
-
-

Installation

-
-

Get HPS from NGC

-

The HPS Python module is preinstalled in version 24.06 and later of the Merlin HugeCTR container: nvcr.io/nvidia/merlin/merlin-hugectr:24.06.

-

You can verify that the required libraries are available by running the following command after launching the container.

-
$ python3 -c "import hierarchical_parameter_server as hps"
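You may want a check that reports missing modules instead of raising an ImportError. The following sketch uses the standard `importlib` machinery to do that. Note that `find_spec` only checks that a module can be located. It does not import the module, so it cannot confirm that GPU dependencies load correctly. The helper name `module_available` is hypothetical.

```python
import importlib.util


def module_available(name):
    """Return True if the top-level module `name` can be located for import."""
    return importlib.util.find_spec(name) is not None


for name in ["hierarchical_parameter_server", "tensorflow"]:
    status = "found" if module_available(name) else "NOT found"
    print(f"{name}: {status}")
```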
-
-
-
-
-
-

Create TF SavedModel

-

First, we specify the required configurations, such as the arguments for generating the embedding tables and the template HPS JSON configuration file. In this notebook, we use a simple deep neural network (DNN) model that has 8 embedding tables with the same embedding vector size and one fully connected layer.
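The sizes that appear later in the model summary and the HPS log follow directly from the constants in `create_model_for_table_fusion.py`. The following back-of-the-envelope check reproduces them:

```python
# Constants from create_model_for_table_fusion.py.
NUM_TABLES = 8
VOCAB_SIZE = 10000
EMB_VEC_SIZE = 128
NUM_QUERY_KEY = 26

# Table i owns the key range [i * VOCAB_SIZE, (i + 1) * VOCAB_SIZE).
# These ranges never overlap, and together they cover NUM_TABLES * VOCAB_SIZE keys.
key_ranges = [(i * VOCAB_SIZE, (i + 1) * VOCAB_SIZE) for i in range(NUM_TABLES)]
total_keys = NUM_TABLES * VOCAB_SIZE

# Each lookup returns (batch, NUM_QUERY_KEY, EMB_VEC_SIZE).
# The result is reshaped to NUM_QUERY_KEY * EMB_VEC_SIZE features per sample,
# and the outputs of all tables are then concatenated.
per_table_features = NUM_QUERY_KEY * EMB_VEC_SIZE
concat_features = NUM_TABLES * per_table_features

# Dense(units=1) has one weight per input feature plus a single bias.
fc_params = concat_features + 1

print(total_keys, per_table_features, concat_features, fc_params)
```

These values match the `(None, 3328)` reshape outputs, the `(None, 26624)` concatenation, the 26,625 parameters of `fc`, and the 80,000 embeddings of the fused table.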

-

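We define the model with hps.LookupLayer and some native TF layers, and then save it in the SavedModel format. Note that table fusion is turned off while the model is being saved: the script first writes the JSON configuration file with fuse_embedding_table set to False. After saving, the script rewrites the file with fuse_embedding_table set to True, so that table fusion takes effect when the SavedModel is loaded for inference.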
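The script writes the configuration file twice with identical code, changing only `fuse_embedding_table`. The following sketch shows one way to factor out that step. `write_hps_config` is a hypothetical helper, and the small `base` dict stands in for the full `hps_config`.

```python
import json
import os
import tempfile


def write_hps_config(hps_config, path, fuse):
    """Write `hps_config` to `path` as JSON with fuse_embedding_table set to `fuse`.

    Returns the dict that was written. The input dict itself is not modified.
    """
    config = dict(hps_config)
    config["fuse_embedding_table"] = fuse
    with open(path, "w") as outfile:
        json.dump(config, outfile, indent=4)
    return config


# Usage: write the unfused config before saving, then the fused one afterwards.
base = {"supportlonglong": False, "models": [{"model": "8_table"}]}
path = os.path.join(tempfile.mkdtemp(), "8_table.json")
write_hps_config(base, path, fuse=False)
# ... build, call, and save the SavedModel here ...
write_hps_config(base, path, fuse=True)
with open(path) as f:
    print(json.load(f)["fuse_embedding_table"])
```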

-
-
-
%%writefile create_model_for_table_fusion.py
-
-import hierarchical_parameter_server as hps
-import tensorflow as tf
-import os
-import numpy as np
-import struct
-import json
-
-NUM_GPUS = 1
-VOCAB_SIZE = 10000
-EMB_VEC_SIZE = 128
-NUM_QUERY_KEY = 26
-EMB_VEC_DTYPE = np.float32
-TF_KEY_TYPE = tf.int32
-MAX_BATCH_SIZE = 256
-NUM_ITERS = 100
-NUM_TABLES = 8
-USE_CONTEXT_STREAM = True
-
-os.environ["CUDA_VISIBLE_DEVICES"] = ",".join(map(str, range(NUM_GPUS)))
-
-gpus = tf.config.experimental.list_physical_devices("GPU")
-for gpu in gpus:
-    tf.config.experimental.set_memory_growth(gpu, True)
-tf.config.threading.set_inter_op_parallelism_threads(1)
-
-hps_config = {
-    "supportlonglong": False,
-    "fuse_embedding_table": True,
-    "models": [
-        {
-            "model": str(NUM_TABLES) + "_table",
-            "sparse_files": [],
-            "num_of_worker_buffer_in_pool": NUM_TABLES,
-            "embedding_table_names": [],
-            "embedding_vecsize_per_table": [],
-            "maxnum_catfeature_query_per_table_per_sample": [],
-            "default_value_for_each_table": [0.0],
-            "deployed_device_list": [0],
-            "max_batch_size": MAX_BATCH_SIZE,
-            "cache_refresh_percentage_per_iteration": 1.0,
-            "hit_rate_threshold": 1.0,
-            "gpucacheper": 1.0,
-            "gpucache": True,
-            "embedding_cache_type": "dynamic",
-            "use_context_stream": USE_CONTEXT_STREAM,
-        }
-    ],
-}
-
-def generate_embedding_tables(hugectr_sparse_model, vocab_range, embedding_vec_size):
-    os.makedirs(hugectr_sparse_model, exist_ok=True)
-    with open("{}/key".format(hugectr_sparse_model), "wb") as key_file, open(
-        "{}/emb_vector".format(hugectr_sparse_model), "wb"
-    ) as vec_file:
-        for key in range(vocab_range[0], vocab_range[1]):
-            vec = 0.00025 * np.ones((embedding_vec_size,)).astype(np.float32)
-            key_struct = struct.pack("q", key)
-            vec_struct = struct.pack(str(embedding_vec_size) + "f", *vec)
-            key_file.write(key_struct)
-            vec_file.write(vec_struct)
-
-
-def set_up_model_files():
-    for i in range(NUM_TABLES):
-        table_name = "table" + str(i)
-        model_file_name = "embeddings/" + table_name
-        generate_embedding_tables(
-            model_file_name, [i * VOCAB_SIZE, (i + 1) * VOCAB_SIZE], EMB_VEC_SIZE
-        )
-        hps_config["models"][0]["sparse_files"].append(model_file_name)
-        hps_config["models"][0]["embedding_table_names"].append(table_name)
-        hps_config["models"][0]["embedding_vecsize_per_table"].append(EMB_VEC_SIZE)
-        hps_config["models"][0]["maxnum_catfeature_query_per_table_per_sample"].append(
-            NUM_QUERY_KEY
-        )
-    return hps_config
-
-class InferenceModel(tf.keras.models.Model):
-    def __init__(self, num_tables, **kwargs):
-        super().__init__(**kwargs)
-        self.lookup_layers = []
-        for i in range(num_tables):
-            self.lookup_layers.append(
-                hps.LookupLayer(
-                    model_name=str(NUM_TABLES) + "_table",
-                    table_id=i,
-                    emb_vec_size=EMB_VEC_SIZE,
-                    emb_vec_dtype=EMB_VEC_DTYPE,
-                    ps_config_file=str(NUM_TABLES) + "_table.json",
-                    global_batch_size=MAX_BATCH_SIZE,
-                    name="embedding_lookup" + str(i),
-                )
-            )
-        self.fc = tf.keras.layers.Dense(
-            units=1,
-            activation=None,
-            kernel_initializer="ones",
-            bias_initializer="zeros",
-            name="fc",
-        )
-
-    def call(self, inputs):
-        assert len(inputs) == len(self.lookup_layers)
-        embeddings = []
-        for i in range(len(inputs)):
-            embeddings.append(
-                tf.reshape(
-                    self.lookup_layers[i](inputs[i]), shape=[-1, NUM_QUERY_KEY * EMB_VEC_SIZE]
-                )
-            )
-        concat_embeddings = tf.concat(embeddings, axis=1)
-        logit = self.fc(concat_embeddings)
-        return logit
-
-    def summary(self):
-        inputs = []
-        for _ in range(len(self.lookup_layers)):
-            inputs.append(tf.keras.Input(shape=(NUM_QUERY_KEY,), dtype=TF_KEY_TYPE))
-        model = tf.keras.models.Model(inputs=inputs, outputs=self.call(inputs))
-        return model.summary()
-
-
-def create_savedmodel(hps_config):
-    # Overwrite JSON configuration file
-    hps_config["fuse_embedding_table"] = False
-    hps_config_json_object = json.dumps(hps_config, indent=4)
-    with open(str(NUM_TABLES) + "_table.json", "w") as outfile:
-        outfile.write(hps_config_json_object)
-
-    model = InferenceModel(NUM_TABLES)
-    model.summary()
-    inputs = []
-    for i in range(NUM_TABLES):
-        inputs.append(
-            np.random.randint(
-                i * VOCAB_SIZE, (i + 1) * VOCAB_SIZE, (MAX_BATCH_SIZE, NUM_QUERY_KEY)
-            ).astype(np.int32)
-        )
-    model(inputs)
-    model.save(str(NUM_TABLES) + "_table.savedmodel")
-
-    # Overwrite JSON configuration file
-    hps_config["fuse_embedding_table"] = True
-    hps_config_json_object = json.dumps(hps_config, indent=4)
-    with open(str(NUM_TABLES) + "_table.json", "w") as outfile:
-        outfile.write(hps_config_json_object)
-
-if __name__ == "__main__":
-    hps_config = set_up_model_files()
-    create_savedmodel(hps_config)
-
-
-
-
-
Writing create_model_for_table_fusion.py
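The script stores each table as two headerless binary files in `embeddings/tableN`:

* `key` holds the keys, packed with struct format `"q"` (int64).
* `emb_vector` holds the vectors, packed as float32.

The following sketch reads such a table back for inspection and assumes exactly that layout. `read_embedding_table` is a hypothetical helper, not part of the HPS API.

```python
import os
import struct
import tempfile

import numpy as np


def read_embedding_table(table_dir, emb_vec_size):
    """Read the `key` and `emb_vector` files written by generate_embedding_tables.

    Assumes the same layout as the script: native-order int64 keys (struct "q")
    and float32 vectors (struct "f"), stored back to back without headers.
    """
    keys = np.fromfile(os.path.join(table_dir, "key"), dtype=np.int64)
    vectors = np.fromfile(os.path.join(table_dir, "emb_vector"), dtype=np.float32)
    return keys, vectors.reshape(-1, emb_vec_size)


# Self-contained check: write a tiny table the same way the script does.
table_dir = tempfile.mkdtemp()
emb_vec_size = 4
with open(os.path.join(table_dir, "key"), "wb") as key_file, open(
    os.path.join(table_dir, "emb_vector"), "wb"
) as vec_file:
    for key in range(10, 13):
        key_file.write(struct.pack("q", key))
        vec_file.write(struct.pack(str(emb_vec_size) + "f", *([0.00025] * emb_vec_size)))

keys, vectors = read_embedding_table(table_dir, emb_vec_size)
print(keys, vectors.shape)
```

After the script has run, a call such as `read_embedding_table("embeddings/table0", 128)` should return 10,000 keys in the range [0, 10000).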
-
-
-
-
-
-
-
import os
-os.system("python3 create_model_for_table_fusion.py")
-
-
-
-
-
2023-03-29 07:24:28.206281: I tensorflow/core/platform/cpu_feature_guard.cc:194] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  SSE3 SSE4.1 SSE4.2 AVX
-To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
-2023-03-29 07:24:36.420084: I tensorflow/core/platform/cpu_feature_guard.cc:194] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  SSE3 SSE4.1 SSE4.2 AVX
-To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
-2023-03-29 07:24:36.926162: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1637] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 30996 MB memory:  -> device: 0, name: Tesla V100-SXM2-32GB, pci bus id: 0000:06:00.0, compute capability: 7.0
-
-
-
[INFO] hierarchical_parameter_server is imported
-Model: "model"
-__________________________________________________________________________________________________
- Layer (type)                   Output Shape         Param #     Connected to                     
-==================================================================================================
- input_1 (InputLayer)           [(None, 26)]         0           []                               
-                                                                                                  
- input_2 (InputLayer)           [(None, 26)]         0           []                               
-                                                                                                  
- input_3 (InputLayer)           [(None, 26)]         0           []                               
-                                                                                                  
- input_4 (InputLayer)           [(None, 26)]         0           []                               
-                                                                                                  
- input_5 (InputLayer)           [(None, 26)]         0           []                               
-                                                                                                  
- input_6 (InputLayer)           [(None, 26)]         0           []                               
-                                                                                                  
- input_7 (InputLayer)           [(None, 26)]         0           []                               
-                                                                                                  
- input_8 (InputLayer)           [(None, 26)]         0           []                               
-                                                                                                  
- embedding_lookup0 (LookupLayer  (None, 26, 128)     0           ['input_1[0][0]']                
- )                                                                                                
-                                                                                                  
- embedding_lookup1 (LookupLayer  (None, 26, 128)     0           ['input_2[0][0]']                
- )                                                                                                
-                                                                                                  
- embedding_lookup2 (LookupLayer  (None, 26, 128)     0           ['input_3[0][0]']                
- )                                                                                                
-                                                                                                  
- embedding_lookup3 (LookupLayer  (None, 26, 128)     0           ['input_4[0][0]']                
- )                                                                                                
-                                                                                                  
- embedding_lookup4 (LookupLayer  (None, 26, 128)     0           ['input_5[0][0]']                
- )                                                                                                
-                                                                                                  
- embedding_lookup5 (LookupLayer  (None, 26, 128)     0           ['input_6[0][0]']                
- )                                                                                                
-                                                                                                  
- embedding_lookup6 (LookupLayer  (None, 26, 128)     0           ['input_7[0][0]']                
- )                                                                                                
-                                                                                                  
- embedding_lookup7 (LookupLayer  (None, 26, 128)     0           ['input_8[0][0]']                
- )                                                                                                
-                                                                                                  
- tf.reshape (TFOpLambda)        (None, 3328)         0           ['embedding_lookup0[0][0]']      
-                                                                                                  
- tf.reshape_1 (TFOpLambda)      (None, 3328)         0           ['embedding_lookup1[0][0]']      
-                                                                                                  
- tf.reshape_2 (TFOpLambda)      (None, 3328)         0           ['embedding_lookup2[0][0]']      
-                                                                                                  
- tf.reshape_3 (TFOpLambda)      (None, 3328)         0           ['embedding_lookup3[0][0]']      
-                                                                                                  
- tf.reshape_4 (TFOpLambda)      (None, 3328)         0           ['embedding_lookup4[0][0]']      
-                                                                                                  
- tf.reshape_5 (TFOpLambda)      (None, 3328)         0           ['embedding_lookup5[0][0]']      
-                                                                                                  
- tf.reshape_6 (TFOpLambda)      (None, 3328)         0           ['embedding_lookup6[0][0]']      
-                                                                                                  
- tf.reshape_7 (TFOpLambda)      (None, 3328)         0           ['embedding_lookup7[0][0]']      
-                                                                                                  
- tf.concat (TFOpLambda)         (None, 26624)        0           ['tf.reshape[0][0]',             
-                                                                  'tf.reshape_1[0][0]',           
-                                                                  'tf.reshape_2[0][0]',           
-                                                                  'tf.reshape_3[0][0]',           
-                                                                  'tf.reshape_4[0][0]',           
-                                                                  'tf.reshape_5[0][0]',           
-                                                                  'tf.reshape_6[0][0]',           
-                                                                  'tf.reshape_7[0][0]']           
-                                                                                                  
- fc (Dense)                     (None, 1)            26625       ['tf.concat[0][0]']              
-                                                                                                  
-==================================================================================================
-Total params: 26,625
-Trainable params: 26,625
-Non-trainable params: 0
-__________________________________________________________________________________________________
-=====================================================HPS Parse====================================================
-[HCTR][07:24:38.079][INFO][RK0][main]: dense_file is not specified using default: 
-[HCTR][07:24:38.079][WARNING][RK0][main]: default_value_for_each_table.size() is not equal to the number of embedding tables
-[HCTR][07:24:38.079][INFO][RK0][main]: num_of_refresher_buffer_in_pool is not specified using default: 1
-[HCTR][07:24:38.079][INFO][RK0][main]: maxnum_des_feature_per_sample is not specified using default: 26
-[HCTR][07:24:38.079][INFO][RK0][main]: refresh_delay is not specified using default: 0
-[HCTR][07:24:38.079][INFO][RK0][main]: refresh_interval is not specified using default: 0
-[HCTR][07:24:38.079][INFO][RK0][main]: use_static_table is not specified using default: 0
-[HCTR][07:24:38.079][INFO][RK0][main]: HPS plugin uses context stream for model 8_table: True
-====================================================HPS Create====================================================
-[HCTR][07:24:38.080][INFO][RK0][main]: Creating HashMap CPU database backend...
-[HCTR][07:24:38.080][DEBUG][RK0][main]: Created blank database backend in local memory!
-[HCTR][07:24:38.080][INFO][RK0][main]: Volatile DB: initial cache rate = 1
-[HCTR][07:24:38.080][INFO][RK0][main]: Volatile DB: cache missed embeddings = 0
-[HCTR][07:24:38.080][DEBUG][RK0][main]: Created raw model loader in local memory!
-
-
-
[HCTR][07:24:38.547][INFO][RK0][main]: Table: hps_et.8_table.table0; cached 10000 / 10000 embeddings in volatile database (HashMapBackend); load: 10000 / 18446744073709551615 (0.00%).
-[HCTR][07:24:39.379][INFO][RK0][main]: Table: hps_et.8_table.table1; cached 10000 / 10000 embeddings in volatile database (HashMapBackend); load: 10000 / 18446744073709551615 (0.00%).
-[HCTR][07:24:39.830][INFO][RK0][main]: Table: hps_et.8_table.table2; cached 10000 / 10000 embeddings in volatile database (HashMapBackend); load: 10000 / 18446744073709551615 (0.00%).
-[HCTR][07:24:40.448][INFO][RK0][main]: Table: hps_et.8_table.table3; cached 10000 / 10000 embeddings in volatile database (HashMapBackend); load: 10000 / 18446744073709551615 (0.00%).
-[HCTR][07:24:40.899][INFO][RK0][main]: Table: hps_et.8_table.table4; cached 10000 / 10000 embeddings in volatile database (HashMapBackend); load: 10000 / 18446744073709551615 (0.00%).
-[HCTR][07:24:41.934][INFO][RK0][main]: Table: hps_et.8_table.table5; cached 10000 / 10000 embeddings in volatile database (HashMapBackend); load: 10000 / 18446744073709551615 (0.00%).
-[HCTR][07:24:43.097][INFO][RK0][main]: Table: hps_et.8_table.table6; cached 10000 / 10000 embeddings in volatile database (HashMapBackend); load: 10000 / 18446744073709551615 (0.00%).
-[HCTR][07:24:45.296][INFO][RK0][main]: Table: hps_et.8_table.table7; cached 10000 / 10000 embeddings in volatile database (HashMapBackend); load: 10000 / 18446744073709551615 (0.00%).
-[HCTR][07:24:45.296][DEBUG][RK0][main]: Real-time subscribers created!
-[HCTR][07:24:45.297][INFO][RK0][main]: Creating embedding cache in device 0.
-[HCTR][07:24:45.306][INFO][RK0][main]: Model name: 8_table
-[HCTR][07:24:45.306][INFO][RK0][main]: Max batch size: 256
-[HCTR][07:24:45.306][INFO][RK0][main]: Fuse embedding tables: False
-[HCTR][07:24:45.306][INFO][RK0][main]: Number of embedding tables: 8
-[HCTR][07:24:45.306][INFO][RK0][main]: Use GPU embedding cache: True, cache size percentage: 1.000000
-[HCTR][07:24:45.306][INFO][RK0][main]: Embedding cache type: dynamic
-[HCTR][07:24:45.306][INFO][RK0][main]: Use I64 input key: False
-[HCTR][07:24:45.306][INFO][RK0][main]: Configured cache hit rate threshold: 1.000000
-[HCTR][07:24:45.306][INFO][RK0][main]: The size of thread pool: 80
-[HCTR][07:24:45.306][INFO][RK0][main]: The size of worker memory pool: 8
-[HCTR][07:24:45.306][INFO][RK0][main]: The size of refresh memory pool: 1
-[HCTR][07:24:45.306][INFO][RK0][main]: The refresh percentage : 1.000000
-[HCTR][07:24:45.469][DEBUG][RK0][main]: Created raw model loader in local memory!
-[HCTR][07:24:45.470][INFO][RK0][main]: EC initialization on device 0 for hps_et.8_table.table0
-[HCTR][07:24:45.470][INFO][RK0][main]: EC initialization on device 0 for hps_et.8_table.table1
-[HCTR][07:24:45.470][INFO][RK0][main]: EC initialization on device 0 for hps_et.8_table.table2
-[HCTR][07:24:45.470][INFO][RK0][main]: EC initialization on device 0 for hps_et.8_table.table3
-[HCTR][07:24:45.470][INFO][RK0][main]: EC initialization on device 0 for hps_et.8_table.table4
-[HCTR][07:24:45.470][INFO][RK0][main]: EC initialization on device 0 for hps_et.8_table.table5
-[HCTR][07:24:45.470][INFO][RK0][main]: EC initialization on device 0 for hps_et.8_table.table6
-[HCTR][07:24:45.470][INFO][RK0][main]: EC initialization on device 0 for hps_et.8_table.table7
-[HCTR][07:24:45.475][INFO][RK0][main]: LookupSession i64_input_key: False
-[HCTR][07:24:45.475][INFO][RK0][main]: Creating lookup session for 8_table on device: 0
-
-
-
0
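The `0` above is the exit status returned by `os.system`. If the script fails, `os.system` does not stop the notebook. The following sketch uses the standard `subprocess` module instead, which raises an exception on a non-zero exit status. `run_script` is a hypothetical helper name.

```python
import subprocess
import sys


def run_script(args):
    """Run the current Python interpreter with `args` and raise if it exits non-zero."""
    # sys.executable is the interpreter running this notebook, so the child
    # process sees the same installed packages.
    completed = subprocess.run([sys.executable, *args], check=True)
    return completed.returncode


# In this notebook you would call: run_script(["create_model_for_table_fusion.py"])
print(run_script(["-c", "print('child process ok')"]))
```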
-
-
-
-
-
-
-

Make inference with HPS table fusion

-

We load the TF SavedModel and run inference on one warm-up batch, followed by NUM_ITERS (100) timed batches. Table fusion is enabled because fuse_embedding_table is True in the HPS JSON configuration file.
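When table fusion is enabled, HPS asks you to ensure that no key value overlaps across tables. The following sketch is a lightweight sanity check that you could run on an input batch before calling the model. It can only show that a given batch does not reuse keys across tables; it cannot prove that the underlying tables are disjoint. `check_disjoint_keys` is a hypothetical helper.

```python
import numpy as np

NUM_TABLES = 8
VOCAB_SIZE = 10000
NUM_QUERY_KEY = 26
MAX_BATCH_SIZE = 256


def check_disjoint_keys(inputs):
    """Raise ValueError if any key value is shared by the inputs of two tables."""
    seen = set()
    for table_id, keys in enumerate(inputs):
        unique = set(np.unique(keys).tolist())
        shared = seen & unique
        if shared:
            raise ValueError(f"table {table_id} reuses keys of an earlier table, e.g. {min(shared)}")
        seen |= unique


# Same input generation as the inference cell: table i draws keys from
# [i * VOCAB_SIZE, (i + 1) * VOCAB_SIZE), so the check passes.
inputs = [
    np.random.randint(
        i * VOCAB_SIZE, (i + 1) * VOCAB_SIZE, (MAX_BATCH_SIZE, NUM_QUERY_KEY)
    ).astype(np.int32)
    for i in range(NUM_TABLES)
]
check_disjoint_keys(inputs)
print("no key overlap across", len(inputs), "tables")
```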

-
-
-
import hierarchical_parameter_server as hps
-import tensorflow as tf
-import os
-import numpy as np
-import time
-
-NUM_GPUS = 1
-VOCAB_SIZE = 10000
-NUM_QUERY_KEY = 26
-MAX_BATCH_SIZE = 256
-NUM_ITERS = 100
-NUM_TABLES = 8
-
-os.environ["CUDA_VISIBLE_DEVICES"] = ",".join(map(str, range(NUM_GPUS)))
-
-gpus = tf.config.experimental.list_physical_devices("GPU")
-for gpu in gpus:
-    tf.config.experimental.set_memory_growth(gpu, True)
-tf.config.threading.set_inter_op_parallelism_threads(1)
-
-
-model = tf.keras.models.load_model(str(NUM_TABLES) + "_table.savedmodel")
-inputs_seq = []
-for _ in range(NUM_ITERS + 1):
-    inputs = []
-    for i in range(NUM_TABLES):
-        inputs.append(
-            np.random.randint(
-                i * VOCAB_SIZE, (i + 1) * VOCAB_SIZE, (MAX_BATCH_SIZE, NUM_QUERY_KEY)
-            ).astype(np.int32)
-        )
-    inputs_seq.append(inputs)
-preds = model(inputs_seq[0])
-start = time.time()
-for i in range(NUM_ITERS):
-    print("-" * 20, "Step {}".format(i), "-" * 20)
-    preds = model(inputs_seq[i + 1])
-end = time.time()
-print(
-    "[INFO] Elapsed time for "
-    + str(NUM_ITERS)
-    + " iterations: "
-    + str(end - start)
-    + " seconds"
-)
-
-
-
-
-
[INFO] hierarchical_parameter_server is imported
-
-
-
2023-03-29 07:25:39.918038: I tensorflow/core/platform/cpu_feature_guard.cc:194] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  SSE3 SSE4.1 SSE4.2 AVX
-To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
-2023-03-29 07:25:42.325440: I tensorflow/core/platform/cpu_feature_guard.cc:194] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  SSE3 SSE4.1 SSE4.2 AVX
-To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
-2023-03-29 07:25:42.818316: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1637] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 30996 MB memory:  -> device: 0, name: Tesla V100-SXM2-32GB, pci bus id: 0000:06:00.0, compute capability: 7.0
-
-
-
WARNING:tensorflow:No training configuration found in save file, so the model was *not* compiled. Compile it manually.
-=====================================================HPS Parse====================================================
-[HCTR][07:25:43.756][INFO][RK0][main]: Table fusion is enabled for HPS. Please ensure that there is no key value overlap in different tables and the embedding lookup layer has no dependency in the model graph. For more information, see https://nvidia-merlin.github.io/HugeCTR/main/hierarchical_parameter_server/hps_database_backend.html#configuration
-[HCTR][07:25:43.756][INFO][RK0][main]: dense_file is not specified using default: 
-[HCTR][07:25:43.756][WARNING][RK0][main]: default_value_for_each_table.size() is not equal to the number of embedding tables
-[HCTR][07:25:43.756][INFO][RK0][main]: num_of_refresher_buffer_in_pool is not specified using default: 1
-[HCTR][07:25:43.756][INFO][RK0][main]: maxnum_des_feature_per_sample is not specified using default: 26
-[HCTR][07:25:43.756][INFO][RK0][main]: refresh_delay is not specified using default: 0
-[HCTR][07:25:43.756][INFO][RK0][main]: refresh_interval is not specified using default: 0
-[HCTR][07:25:43.756][INFO][RK0][main]: use_static_table is not specified using default: 0
-[HCTR][07:25:43.756][INFO][RK0][main]: HPS plugin uses context stream for model 8_table: True
-====================================================HPS Create====================================================
-[HCTR][07:25:43.756][INFO][RK0][main]: Creating HashMap CPU database backend...
-[HCTR][07:25:43.756][DEBUG][RK0][main]: Created blank database backend in local memory!
-[HCTR][07:25:43.756][INFO][RK0][main]: Volatile DB: initial cache rate = 1
-[HCTR][07:25:43.756][INFO][RK0][main]: Volatile DB: cache missed embeddings = 0
-[HCTR][07:25:43.756][DEBUG][RK0][main]: Created raw model loader in local memory!
-[HCTR][07:25:44.292][INFO][RK0][main]: Table: hps_et.8_table.fused_embedding0; cached 80000 / 80000 embeddings in volatile database (HashMapBackend); load: 80000 / 18446744073709551615 (0.00%).
-[HCTR][07:25:44.292][DEBUG][RK0][main]: Real-time subscribers created!
-[HCTR][07:25:44.292][INFO][RK0][main]: Creating embedding cache in device 0.
-[HCTR][07:25:44.299][INFO][RK0][main]: Model name: 8_table
-[HCTR][07:25:44.299][INFO][RK0][main]: Max batch size: 256
-[HCTR][07:25:44.299][INFO][RK0][main]: Fuse embedding tables: True
-[HCTR][07:25:44.299][INFO][RK0][main]: Number of embedding tables: 1
-[HCTR][07:25:44.299][INFO][RK0][main]: Use GPU embedding cache: True, cache size percentage: 1.000000
-[HCTR][07:25:44.299][INFO][RK0][main]: Embedding cache type: dynamic
-[HCTR][07:25:44.299][INFO][RK0][main]: Use I64 input key: False
-[HCTR][07:25:44.299][INFO][RK0][main]: Configured cache hit rate threshold: 1.000000
-[HCTR][07:25:44.299][INFO][RK0][main]: The size of thread pool: 80
-[HCTR][07:25:44.299][INFO][RK0][main]: The size of worker memory pool: 8
-[HCTR][07:25:44.299][INFO][RK0][main]: The size of refresh memory pool: 1
-[HCTR][07:25:44.299][INFO][RK0][main]: The refresh percentage : 1.000000
-[HCTR][07:25:44.406][DEBUG][RK0][main]: Created raw model loader in local memory!
-[HCTR][07:25:44.406][INFO][RK0][main]: EC initialization on device 0 for hps_et.8_table.fused_embedding0
-[HCTR][07:25:44.407][INFO][RK0][main]: LookupSession i64_input_key: False
-[HCTR][07:25:44.407][INFO][RK0][main]: Creating lookup session for 8_table on device: 0
--------------------- Step 0 --------------------
--------------------- Step 1 --------------------
--------------------- Step 2 --------------------
--------------------- Step 3 --------------------
--------------------- Step 4 --------------------
--------------------- Step 5 --------------------
--------------------- Step 6 --------------------
--------------------- Step 7 --------------------
--------------------- Step 8 --------------------
--------------------- Step 9 --------------------
--------------------- Step 10 --------------------
--------------------- Step 11 --------------------
--------------------- Step 12 --------------------
--------------------- Step 13 --------------------
--------------------- Step 14 --------------------
--------------------- Step 15 --------------------
--------------------- Step 16 --------------------
--------------------- Step 17 --------------------
--------------------- Step 18 --------------------
--------------------- Step 19 --------------------
--------------------- Step 20 --------------------
--------------------- Step 21 --------------------
--------------------- Step 22 --------------------
--------------------- Step 23 --------------------
--------------------- Step 24 --------------------
--------------------- Step 25 --------------------
--------------------- Step 26 --------------------
--------------------- Step 27 --------------------
--------------------- Step 28 --------------------
--------------------- Step 29 --------------------
--------------------- Step 30 --------------------
--------------------- Step 31 --------------------
--------------------- Step 32 --------------------
--------------------- Step 33 --------------------
--------------------- Step 34 --------------------
--------------------- Step 35 --------------------
--------------------- Step 36 --------------------
--------------------- Step 37 --------------------
--------------------- Step 38 --------------------
--------------------- Step 39 --------------------
--------------------- Step 40 --------------------
--------------------- Step 41 --------------------
--------------------- Step 42 --------------------
--------------------- Step 43 --------------------
--------------------- Step 44 --------------------
--------------------- Step 45 --------------------
--------------------- Step 46 --------------------
--------------------- Step 47 --------------------
--------------------- Step 48 --------------------
--------------------- Step 49 --------------------
--------------------- Step 50 --------------------
--------------------- Step 51 --------------------
--------------------- Step 52 --------------------
--------------------- Step 53 --------------------
--------------------- Step 54 --------------------
--------------------- Step 55 --------------------
--------------------- Step 56 --------------------
--------------------- Step 57 --------------------
--------------------- Step 58 --------------------
--------------------- Step 59 --------------------
--------------------- Step 60 --------------------
--------------------- Step 61 --------------------
--------------------- Step 62 --------------------
--------------------- Step 63 --------------------
--------------------- Step 64 --------------------
--------------------- Step 65 --------------------
--------------------- Step 66 --------------------
--------------------- Step 67 --------------------
--------------------- Step 68 --------------------
--------------------- Step 69 --------------------
--------------------- Step 70 --------------------
--------------------- Step 71 --------------------
--------------------- Step 72 --------------------
--------------------- Step 73 --------------------
--------------------- Step 74 --------------------
--------------------- Step 75 --------------------
--------------------- Step 76 --------------------
--------------------- Step 77 --------------------
--------------------- Step 78 --------------------
--------------------- Step 79 --------------------
--------------------- Step 80 --------------------
--------------------- Step 81 --------------------
--------------------- Step 82 --------------------
--------------------- Step 83 --------------------
--------------------- Step 84 --------------------
--------------------- Step 85 --------------------
--------------------- Step 86 --------------------
--------------------- Step 87 --------------------
--------------------- Step 88 --------------------
--------------------- Step 89 --------------------
--------------------- Step 90 --------------------
--------------------- Step 91 --------------------
--------------------- Step 92 --------------------
--------------------- Step 93 --------------------
--------------------- Step 94 --------------------
--------------------- Step 95 --------------------
--------------------- Step 96 --------------------
--------------------- Step 97 --------------------
--------------------- Step 98 --------------------
--------------------- Step 99 --------------------
-[INFO] Elapsed time for 100 iterations: 0.9442901611328125 seconds
\ No newline at end of file
diff --git a/review/pr-458/hps_tf/notebooks/hps_tensorflow_triton_deployment_demo.html b/review/pr-458/hps_tf/notebooks/hps_tensorflow_triton_deployment_demo.html
deleted file mode 100644
index 993ada1f4a..0000000000
--- a/review/pr-458/hps_tf/notebooks/hps_tensorflow_triton_deployment_demo.html
+++ /dev/null
@@ -1,1051 +0,0 @@
-Deploy SavedModel using HPS with Triton TensorFlow Backend — Merlin HugeCTR documentation
# Copyright 2021 NVIDIA Corporation. All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-#     http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-# ==============================================================================
-
-# Each user is responsible for checking the content of datasets and the
-# applicable licenses and determining if suitable for the intended use.
-

Deploy SavedModel using HPS with Triton TensorFlow Backend

-
-

Overview

-

This notebook demonstrates how to deploy a SavedModel that uses HPS with the Triton TensorFlow backend. It also shows how to apply TF-TRT optimization to a SavedModel whose embedding lookup is based on HPS. We recommend running hierarchical_parameter_server_demo.ipynb before working through this notebook.

-

For more details about the HPS APIs, refer to HPS APIs. For more details about HPS itself, refer to HugeCTR Hierarchical Parameter Server (HPS).


Installation

-
-

Get HPS from NGC

-

The HPS Python module is preinstalled in the Merlin HugeCTR containers from release 24.06 onward, for example nvcr.io/nvidia/merlin/merlin-hugectr:24.06.

-

After launching the container, you can verify that the required library is available by running the following Python code:

-
$ python3 -c "import hierarchical_parameter_server as hps"
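The one-liner above stops at the first missing module. If you also want to confirm that TensorFlow and the Triton HTTP client (`tritonclient`, used later in this notebook) are installed, a minimal sketch can report each module separately without importing it. `check_modules` is a hypothetical helper, not part of HPS.

```python
import importlib.util


def check_modules(names):
    """Hypothetical helper: map each top-level module name to whether it can be found.

    importlib.util.find_spec only locates a top-level module; it does not
    execute it, so TensorFlow or HPS are not initialized by this check.
    """
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    status = check_modules(["hierarchical_parameter_server", "tensorflow", "tritonclient"])
    for name, ok in status.items():
        print(f"{name}: {'found' if ok else 'MISSING'}")
```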

The Triton TensorFlow backend is also available in this container.


Configurations

-

First, we specify the required configurations: the arguments for generating the dataset, the paths for saving the model, and the model parameters. This notebook uses a deep neural network (DNN) model with one embedding table and several dense layers. Note that the model has two inputs: the key tensor (one key per slot, i.e., one-hot) and the dense feature tensor.

import hierarchical_parameter_server as hps
-import os
-import numpy as np
-import tensorflow as tf
-import struct
-
-args = dict()
-
-args["gpu_num"] = 1                               # the number of available GPUs
-args["iter_num"] = 10                             # the number of training iterations
-args["slot_num"] = 5                              # the number of feature fields in this embedding layer
-args["embed_vec_size"] = 16                       # the dimension of embedding vectors
-args["global_batch_size"] = 1024                  # the global batch size across all GPUs
-args["max_vocabulary_size"] = 50000
-args["vocabulary_range_per_slot"] = [[0,10000],[10000,20000],[20000,30000],[30000,40000],[40000,50000]]
-args["dense_dim"] = 10
-
-args["dense_model_path"] = "hps_tf_triton_dense.model"
-args["ps_config_file"] = "hps_tf_triton.json"
-args["embedding_table_path"] = "hps_tf_triton_sparse_0.model"
-args["saved_path"] = "hps_tf_triton_tf_saved_model"
-args["np_key_type"] = np.int64
-args["np_vector_type"] = np.float32
-args["tf_key_type"] = tf.int64
-args["tf_vector_type"] = tf.float32
-
-
-os.environ["CUDA_VISIBLE_DEVICES"] = ",".join(map(str, range(args["gpu_num"])))
[INFO] hierarchical_parameter_server is imported
/usr/lib/python3/dist-packages/requests/__init__.py:89: RequestsDependencyWarning: urllib3 (1.26.12) or chardet (3.0.4) doesn't match a supported version!
-  warnings.warn("urllib3 ({}) or chardet ({}) doesn't match a supported "
def generate_random_samples(num_samples, vocabulary_range_per_slot, dense_dim, key_dtype = args["np_key_type"]):
-    keys = list()
-    for vocab_range in vocabulary_range_per_slot:
-        keys_per_slot = np.random.randint(low=vocab_range[0], high=vocab_range[1], size=(num_samples, 1), dtype=key_dtype)
-        keys.append(keys_per_slot)
-    keys = np.concatenate(np.array(keys), axis = 1)
-    dense_features = np.random.random((num_samples, dense_dim)).astype(np.float32)
-    labels = np.random.randint(low=0, high=2, size=(num_samples, 1))
-    return keys, dense_features, labels
-
-def tf_dataset(keys, dense_features, labels, batchsize):
-    dataset = tf.data.Dataset.from_tensor_slices((keys, dense_features, labels))
-    dataset = dataset.batch(batchsize, drop_remainder=True)
-    return dataset
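The per-slot loop in `generate_random_samples` can also be written with NumPy broadcasting. The sketch below is a hypothetical alternative (`generate_random_samples_vectorized` and its `seed` parameter are not part of the notebook) that makes explicit the property the notebook relies on: column `j` of the key tensor always falls inside slot `j`'s half-open vocabulary range, so the slots never share keys, and the keys have shape `(num_samples, slot_num)`.

```python
import numpy as np


def generate_random_samples_vectorized(num_samples, vocabulary_range_per_slot, dense_dim,
                                       key_dtype=np.int64, seed=None):
    """Hypothetical vectorized variant of generate_random_samples.

    Draws one key per slot from that slot's range [low, high), so column j of
    `keys` always lies in vocabulary_range_per_slot[j].
    """
    rng = np.random.default_rng(seed)
    ranges = np.asarray(vocabulary_range_per_slot, dtype=np.int64)  # shape (slot_num, 2)
    lows, highs = ranges[:, 0], ranges[:, 1]
    # The (slot_num,) bounds broadcast against the (num_samples, slot_num) output shape.
    keys = rng.integers(lows, highs, size=(num_samples, len(ranges)), dtype=key_dtype)
    dense_features = rng.random((num_samples, dense_dim), dtype=np.float32)
    labels = rng.integers(0, 2, size=(num_samples, 1))
    return keys, dense_features, labels
```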

Train with native TF layers

-

We define the model graph for training with native TF layers, namely tf.nn.embedding_lookup and tf.keras.layers.Dense, and store the embedding weights in a tf.Variable. We then train the model and extract the trained weights of the embedding table. The dense layers are saved as a separate model graph, which can be loaded directly during inference.
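The lookup-then-reshape step in `TrainModel.call` is a row gather followed by a flatten. A NumPy sketch of the same shape logic (an illustration only, not how TensorFlow implements `tf.nn.embedding_lookup`): keys of shape `(batch_size, slot_num)` gather rows from a `(vocabulary_size, embed_vec_size)` table, and the result is flattened to `(batch_size, slot_num * embed_vec_size)`. With `slot_num = 5` and `embed_vec_size = 16` this gives the 80 columns that are concatenated with the 10 dense features into the 90-wide input of `fc_1`.

```python
import numpy as np


def lookup_and_flatten(params, keys):
    """NumPy sketch of tf.nn.embedding_lookup followed by the reshape in TrainModel.call.

    params: (vocabulary_size, embed_vec_size) embedding table
    keys:   (batch_size, slot_num) integer keys
    returns (batch_size, slot_num * embed_vec_size)
    """
    vectors = np.take(params, keys, axis=0)  # (batch_size, slot_num, embed_vec_size)
    batch_size, slot_num, embed_vec_size = vectors.shape
    return vectors.reshape(batch_size, slot_num * embed_vec_size)


params = np.arange(12, dtype=np.float32).reshape(6, 2)  # 6 keys, 2-dim vectors
keys = np.array([[0, 5], [3, 3]])
print(lookup_and_flatten(params, keys))  # each row is the concatenation of its keys' vectors
```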

class TrainModel(tf.keras.models.Model):
-    def __init__(self,
-                 init_tensors,
-                 slot_num,
-                 embed_vec_size,
-                 dense_dim,
-                 **kwargs):
-        super(TrainModel, self).__init__(**kwargs)
-        
-        self.slot_num = slot_num
-        self.embed_vec_size = embed_vec_size
-        self.dense_dim = dense_dim
-        self.init_tensors = init_tensors
-        self.params = tf.Variable(initial_value=tf.concat(self.init_tensors, axis=0))
-        self.concat = tf.keras.layers.Concatenate(axis=1, name="concatenate")
-        self.fc_1 = tf.keras.layers.Dense(units=256, activation=None,
-                                                 kernel_initializer="ones",
-                                                 bias_initializer="zeros",
-                                                 name='fc_1')
-        self.fc_2 = tf.keras.layers.Dense(units=1, activation=None,
-                                                 kernel_initializer="ones",
-                                                 bias_initializer="zeros",
-                                                 name='fc_2')
-
-    def call(self, inputs):
-        keys, dense_features = inputs[0], inputs[1]
-        embedding_vector = tf.nn.embedding_lookup(params=self.params, ids=keys)
-        embedding_vector = tf.reshape(embedding_vector, shape=[-1, self.slot_num * self.embed_vec_size])
-        concated_features = self.concat([embedding_vector, dense_features])
-        logit = self.fc_2(self.fc_1(concated_features))
-        return logit
-
-    def summary(self):
-        inputs = [tf.keras.Input(shape=(self.slot_num, ), dtype=args["tf_key_type"]),
-                  tf.keras.Input(shape=(self.dense_dim, ), dtype=tf.float32)]
-        model = tf.keras.models.Model(inputs=inputs, outputs=self.call(inputs))
-        return model.summary()
def train(args):
-    init_tensors = np.ones(shape=[args["max_vocabulary_size"], args["embed_vec_size"]], dtype=args["np_vector_type"])
-    
-    model = TrainModel(init_tensors, args["slot_num"], args["embed_vec_size"], args["dense_dim"])
-    model.summary()
-    optimizer = tf.keras.optimizers.Adam(learning_rate=0.1)
-    
-    loss_fn = tf.keras.losses.BinaryCrossentropy(from_logits=True)
-    
-    def _train_step(inputs, labels):
-        with tf.GradientTape() as tape:
-            logit = model(inputs)
-            loss = loss_fn(labels, logit)
-        grads = tape.gradient(loss, model.trainable_variables)
-        optimizer.apply_gradients(zip(grads, model.trainable_variables))
-        return logit, loss
-
-    keys, dense_features, labels = generate_random_samples(args["global_batch_size"]  * args["iter_num"], args["vocabulary_range_per_slot"], args["dense_dim"])
-    dataset = tf_dataset(keys, dense_features, labels, args["global_batch_size"])
-    for i, (keys, dense_features, labels) in enumerate(dataset):
-        inputs = [keys, dense_features]
-        _, loss = _train_step(inputs, labels)
-        print("-"*20, "Step {}, loss: {}".format(i, loss),  "-"*20)
-
-    return model
trained_model = train(args)
-weights_list = trained_model.get_weights()
-embedding_weights = weights_list[-1]
-dense_model = tf.keras.Model(trained_model.get_layer("concatenate").input,
-                             trained_model.get_layer("fc_2").output)
-dense_model.summary()
-dense_model.save(args["dense_model_path"])
2022-11-23 01:36:13.919938: I tensorflow/core/platform/cpu_feature_guard.cc:194] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  SSE3 SSE4.1 SSE4.2 AVX
-To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
-2022-11-23 01:36:14.444040: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1532] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 30991 MB memory:  -> device: 0, name: Tesla V100-SXM2-32GB, pci bus id: 0000:06:00.0, compute capability: 7.0
WARNING:tensorflow:The following Variables were used in a Lambda layer's call (tf.compat.v1.nn.embedding_lookup), but are not present in its tracked objects:   <tf.Variable 'Variable:0' shape=(50000, 16) dtype=float32>. This is a strong indication that the Lambda layer should be rewritten as a subclassed Layer.
-Model: "model"
-__________________________________________________________________________________________________
- Layer (type)                   Output Shape         Param #     Connected to                     
-==================================================================================================
- input_1 (InputLayer)           [(None, 5)]          0           []                               
-                                                                                                  
- tf.compat.v1.nn.embedding_look  (None, 5, 16)       0           ['input_1[0][0]']                
- up (TFOpLambda)                                                                                  
-                                                                                                  
- tf.reshape (TFOpLambda)        (None, 80)           0           ['tf.compat.v1.nn.embedding_looku
-                                                                 p[0][0]']                        
-                                                                                                  
- input_2 (InputLayer)           [(None, 10)]         0           []                               
-                                                                                                  
- concatenate (Concatenate)      (None, 90)           0           ['tf.reshape[0][0]',             
-                                                                  'input_2[0][0]']                
-                                                                                                  
- fc_1 (Dense)                   (None, 256)          23296       ['concatenate[0][0]']            
-                                                                                                  
- fc_2 (Dense)                   (None, 1)            257         ['fc_1[0][0]']                   
-                                                                                                  
-==================================================================================================
-Total params: 23,553
-Trainable params: 23,553
-Non-trainable params: 0
-__________________________________________________________________________________________________
--------------------- Step 0, loss: 10934.333984375 --------------------
--------------------- Step 1, loss: 9218.0703125 --------------------
--------------------- Step 2, loss: 7060.255859375 --------------------
--------------------- Step 3, loss: 5094.876953125 --------------------
--------------------- Step 4, loss: 3605.475830078125 --------------------
--------------------- Step 5, loss: 2593.270751953125 --------------------
--------------------- Step 6, loss: 1741.0677490234375 --------------------
--------------------- Step 7, loss: 1045.5091552734375 --------------------
--------------------- Step 8, loss: 541.4227905273438 --------------------
--------------------- Step 9, loss: 242.8596649169922 --------------------
-Model: "model_1"
-__________________________________________________________________________________________________
- Layer (type)                   Output Shape         Param #     Connected to                     
-==================================================================================================
- input_3 (InputLayer)           [(None, 80)]         0           []                               
-                                                                                                  
- input_2 (InputLayer)           [(None, 10)]         0           []                               
-                                                                                                  
- concatenate (Concatenate)      (None, 90)           0           ['input_3[0][0]',                
-                                                                  'input_2[0][0]']                
-                                                                                                  
- fc_1 (Dense)                   (None, 256)          23296       ['concatenate[1][0]']            
-                                                                                                  
- fc_2 (Dense)                   (None, 1)            257         ['fc_1[1][0]']                   
-                                                                                                  
-==================================================================================================
-Total params: 23,553
-Trainable params: 23,553
-Non-trainable params: 0
-__________________________________________________________________________________________________
-WARNING:tensorflow:Compiled the loaded model, but the compiled metrics have yet to be built. `model.compile_metrics` will be empty until you train or evaluate the model.
WARNING:absl:Function `_wrapped_model` contains input name(s) args_0 with unsupported characters which will be renamed to args_0_1 in the SavedModel.
INFO:tensorflow:Assets written to: hps_tf_triton_dense.model/assets

Create the inference graph with HPS LookupLayer

-

To use HPS at the inference stage, we need to create an inference model graph that is almost identical to the training graph, except that tf.nn.embedding_lookup is replaced by hps.LookupLayer. The trained dense model graph can be loaded directly, while the embedding weights must be converted to the format required by HPS.

-

We can then save the inference model graph so that it is ready to be loaded for deployment. Because the inference SavedModel that uses HPS will be deployed with the Triton TensorFlow backend, implicit initialization of HPS must be enabled by specifying ps_config_file and global_batch_size in the constructor of hps.LookupLayer. For more information, refer to HPS Initialize.

-

To this end, we create a JSON configuration file that specifies the details of the embedding tables for the models to be deployed. Here we deploy only one model with a single embedding table, but the configuration also supports multiple models, each with multiple embedding tables.
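When a configuration grows to several models or tables, it can be easier to generate the JSON from Python than to edit it by hand. The sketch below uses only the keys that appear in `hps_tf_triton.json` and copies its fixed values. `make_hps_model_entry` is a hypothetical helper. Treating a length mismatch among the per-table lists as an error is our own assumption: HPS itself may only warn, as the `default_value_for_each_table.size()` warning earlier in this log shows.

```python
import json


def make_hps_model_entry(model_name, sparse_files, table_names, vec_sizes,
                         max_queries_per_sample, default_values,
                         max_batch_size, device_list=(0,)):
    """Hypothetical helper that builds one entry of the "models" list.

    Assumption: every per-table list holds exactly one element per embedding table.
    """
    per_table = [sparse_files, vec_sizes, max_queries_per_sample, default_values]
    if any(len(v) != len(table_names) for v in per_table):
        raise ValueError("every per-table list needs one entry per embedding table")
    return {
        "model": model_name,
        "sparse_files": list(sparse_files),
        "num_of_worker_buffer_in_pool": 3,
        "embedding_table_names": list(table_names),
        "embedding_vecsize_per_table": list(vec_sizes),
        "maxnum_catfeature_query_per_table_per_sample": list(max_queries_per_sample),
        "default_value_for_each_table": list(default_values),
        "deployed_device_list": list(device_list),
        "max_batch_size": max_batch_size,
        "cache_refresh_percentage_per_iteration": 0.2,
        "hit_rate_threshold": 1.0,
        "gpucacheper": 1.0,
        "gpucache": True,
    }


config = {
    "supportlonglong": True,
    "models": [
        make_hps_model_entry(
            "hps_tf_triton",
            ["/hugectr/hps_tf/notebooks/model_repo/hps_tf_triton/hps_tf_triton_sparse_0.model"],
            ["sparse_embedding1"], [16], [5], [1.0], max_batch_size=1024),
    ],
}
print(json.dumps(config, indent=4))
```

To add another model, append a further `make_hps_model_entry(...)` result to `config["models"]`.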

%%writefile hps_tf_triton.json
-{
-    "supportlonglong": true,
-    "models": [{
-        "model": "hps_tf_triton",
-        "sparse_files": ["/hugectr/hps_tf/notebooks/model_repo/hps_tf_triton/hps_tf_triton_sparse_0.model"],
-        "num_of_worker_buffer_in_pool": 3,
-        "embedding_table_names":["sparse_embedding1"],
-        "embedding_vecsize_per_table": [16],
-        "maxnum_catfeature_query_per_table_per_sample": [5],
-        "default_value_for_each_table": [1.0],
-        "deployed_device_list": [0],
-        "max_batch_size": 1024,
-        "cache_refresh_percentage_per_iteration": 0.2,
-        "hit_rate_threshold": 1.0,
-        "gpucacheper": 1.0,
-        "gpucache": true
-        }
-    ]
-}
Writing hps_tf_triton.json
triton_model_repo = "/hugectr/hps_tf/notebooks/model_repo/hps_tf_triton/"
-
-class InferenceModel(tf.keras.models.Model):
-    def __init__(self,
-                 slot_num,
-                 embed_vec_size,
-                 dense_dim,
-                 dense_model_path,
-                 **kwargs):
-        super(InferenceModel, self).__init__(**kwargs)
-        
-        self.slot_num = slot_num
-        self.embed_vec_size = embed_vec_size
-        self.dense_dim = dense_dim
-        self.lookup_layer = hps.LookupLayer(model_name = "hps_tf_triton", 
-                                            table_id = 0,
-                                            emb_vec_size = self.embed_vec_size,
-                                            emb_vec_dtype = args["tf_vector_type"],
-                                            ps_config_file = triton_model_repo + args["ps_config_file"],
-                                            global_batch_size = args["global_batch_size"],
-                                            name = "lookup")
-        self.dense_model = tf.keras.models.load_model(dense_model_path)
-
-    def call(self, inputs):
-        keys, dense_features = inputs[0], inputs[1]
-        embedding_vector = self.lookup_layer(keys)
-        embedding_vector = tf.reshape(embedding_vector, shape=[-1, self.slot_num * self.embed_vec_size])
-        logit = self.dense_model([embedding_vector, dense_features])
-        return logit
-
-    def summary(self):
-        inputs = [tf.keras.Input(shape=(self.slot_num, ), dtype=args["tf_key_type"]),
-                  tf.keras.Input(shape=(self.dense_dim, ), dtype=tf.float32)]
-        model = tf.keras.models.Model(inputs=inputs, outputs=self.call(inputs))
-        return model.summary()
def create_and_save_inference_graph(args): 
-    model = InferenceModel(args["slot_num"], args["embed_vec_size"], args["dense_dim"], args["dense_model_path"])
-    model.summary()
-    _ = model([tf.keras.Input(shape=(args["slot_num"], ), dtype=args["tf_key_type"]),
-               tf.keras.Input(shape=(args["dense_dim"], ), dtype=tf.float32)])
-    model.save(args["saved_path"])
def convert_to_sparse_model(embeddings_weights, embedding_table_path, embedding_vec_size):
-    # Create the output directory (equivalent to `mkdir -p`).
-    os.makedirs(embedding_table_path, exist_ok=True)
-    # Write keys as int64 ('q') and embedding vectors as float32 ('f'), one row at a time.
-    with open("{}/key".format(embedding_table_path), 'wb') as key_file, \
-        open("{}/emb_vector".format(embedding_table_path), 'wb') as vec_file:
-        for key in range(embeddings_weights.shape[0]):
-            vec = embeddings_weights[key]
-            key_struct = struct.pack('q', key)
-            vec_struct = struct.pack(str(embedding_vec_size) + "f", *vec)
-            key_file.write(key_struct)
-            vec_file.write(vec_struct)
-
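`convert_to_sparse_model` issues one `struct.pack` call per row, which becomes slow for large tables. Assuming the same layout it produces (the row index as a native-endian int64 key, and the row as native-endian float32 values), a vectorized NumPy sketch should write the same bytes and can read them back as a sanity check before the files are moved into the model repository. `write_sparse_model` and `read_sparse_model` are hypothetical helper names, not HPS APIs.

```python
import os
import tempfile

import numpy as np


def write_sparse_model(embedding_weights, embedding_table_path):
    """Hypothetical vectorized equivalent of convert_to_sparse_model.

    Keys are the row indices 0..N-1 as int64; vectors are the rows as float32,
    both in native byte order, matching struct.pack('q') / struct.pack('f').
    """
    os.makedirs(embedding_table_path, exist_ok=True)
    weights = np.ascontiguousarray(embedding_weights, dtype=np.float32)
    np.arange(weights.shape[0], dtype=np.int64).tofile(os.path.join(embedding_table_path, "key"))
    weights.tofile(os.path.join(embedding_table_path, "emb_vector"))


def read_sparse_model(embedding_table_path, embedding_vec_size):
    """Load the key / emb_vector pair back and check that their sizes agree."""
    keys = np.fromfile(os.path.join(embedding_table_path, "key"), dtype=np.int64)
    vectors = np.fromfile(os.path.join(embedding_table_path, "emb_vector"), dtype=np.float32)
    if vectors.size != keys.size * embedding_vec_size:
        raise ValueError("emb_vector size does not match key count * embedding_vec_size")
    return keys, vectors.reshape(keys.size, embedding_vec_size)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo_sparse_0.model")
        weights = np.random.rand(4, 16).astype(np.float32)
        write_sparse_model(weights, path)
        keys, vectors = read_sparse_model(path, 16)
        print(keys, np.array_equal(vectors, weights))
```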
convert_to_sparse_model(embedding_weights, args["embedding_table_path"], args["embed_vec_size"])
-create_and_save_inference_graph(args)
WARNING:tensorflow:No training configuration found in save file, so the model was *not* compiled. Compile it manually.
Model: "model_2"
-__________________________________________________________________________________________________
- Layer (type)                   Output Shape         Param #     Connected to                     
-==================================================================================================
- input_4 (InputLayer)           [(None, 5)]          0           []                               
-                                                                                                  
- lookup (LookupLayer)           (None, 5, 16)        0           ['input_4[0][0]']                
-                                                                                                  
- tf.reshape_1 (TFOpLambda)      (None, 80)           0           ['lookup[0][0]']                 
-                                                                                                  
- input_5 (InputLayer)           [(None, 10)]         0           []                               
-                                                                                                  
- model_1 (Functional)           (None, 1)            23553       ['tf.reshape_1[0][0]',           
-                                                                  'input_5[0][0]']                
-                                                                                                  
-==================================================================================================
-Total params: 23,553
-Trainable params: 23,553
-Non-trainable params: 0
-__________________________________________________________________________________________________
-INFO:tensorflow:Assets written to: hps_tf_triton_tf_saved_model/assets

Deploy SavedModel using HPS with Triton TensorFlow Backend

-

To deploy the inference SavedModel with the Triton TensorFlow backend, we first create the model repository and define config.pbtxt. Note that some required parts of the model configuration, such as the input and output tensors, can be generated automatically by Triton (see Auto-Generated Model Configuration), so you do not need to specify them explicitly in config.pbtxt.
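Because config.pbtxt sets `max_batch_size`, Triton treats each input's `dims` as the per-sample shape, so every request tensor must have shape `[batch_size] + dims`. A minimal client-side sketch, assuming the input names, dtypes, and dims from the config below, catches mismatches before a request is sent. `check_request` is a hypothetical helper, not a `tritonclient` API.

```python
import numpy as np

# Values copied from config.pbtxt; adjust them if you change the config.
MAX_BATCH_SIZE = 1024
INPUT_SPECS = {
    "input_6": (np.int64, (5,)),     # key tensor: slot_num keys per sample
    "input_7": (np.float32, (10,)),  # dense features: dense_dim values per sample
}


def check_request(arrays):
    """Hypothetical check of a dict of NumPy arrays against the model configuration.

    Returns the shared batch size, or raises TypeError / ValueError.
    """
    batch_sizes = set()
    for name, (dtype, dims) in INPUT_SPECS.items():
        arr = arrays[name]
        if arr.dtype != dtype:
            raise TypeError(f"{name}: expected {np.dtype(dtype)}, got {arr.dtype}")
        if arr.shape[1:] != dims:
            raise ValueError(f"{name}: expected per-sample shape {dims}, got {arr.shape[1:]}")
        batch_sizes.add(arr.shape[0])
    if len(batch_sizes) != 1:
        raise ValueError("all inputs must share the same batch size")
    (batch_size,) = batch_sizes
    if not 1 <= batch_size <= MAX_BATCH_SIZE:
        raise ValueError(f"batch size {batch_size} outside [1, {MAX_BATCH_SIZE}]")
    return batch_size
```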

!mkdir -p model_repo/hps_tf_triton/1
-!mv hps_tf_triton_tf_saved_model model_repo/hps_tf_triton/1/model.savedmodel
-!mv hps_tf_triton_sparse_0.model model_repo/hps_tf_triton
-!mv hps_tf_triton.json model_repo/hps_tf_triton
%%writefile model_repo/hps_tf_triton/config.pbtxt
-name: "hps_tf_triton"
-platform: "tensorflow_savedmodel"
-max_batch_size:1024
-input [
-  {
-    name: "input_6"
-    data_type: TYPE_INT64
-    dims: [5]
-  },
-  {
-    name: "input_7"
-    data_type: TYPE_FP32
-    dims: [10]
-  }
-]
-output [
-  {
-    name: "output_1"
-    data_type: TYPE_FP32
-    dims: [1]
-  }
-]
-version_policy: {
-        specific:{versions: 1}
-},
-instance_group [
-  {
-    count: 1
-    kind : KIND_GPU
-    gpus: [0]
-  }
-]
Writing model_repo/hps_tf_triton/config.pbtxt
!tree model_repo/hps_tf_triton
model_repo/hps_tf_triton
-├── 1
-│   └── model.savedmodel
-│       ├── assets
-│       ├── keras_metadata.pb
-│       ├── saved_model.pb
-│       └── variables
-│           ├── variables.data-00000-of-00001
-│           └── variables.index
-├── config.pbtxt
-├── hps_tf_triton.json
-└── hps_tf_triton_sparse_0.model
-    ├── emb_vector
-    └── key
-
-5 directories, 8 files

We can then launch the Triton inference server with the TensorFlow backend. Note that LD_PRELOAD is used to load the custom TensorFlow operations (i.e., the HPS-related operations) into Triton. For more information, refer to TensorFlow Custom Operations in Triton.

-

Note: Because Jupyter does not support background processes, launch the Triton server separately, in the background, using the following command.

LD_PRELOAD=/usr/local/lib/python3.8/dist-packages/merlin_hps-1.0.0-py3.8-linux-x86_64.egg/hierarchical_parameter_server/lib/libhierarchical_parameter_server.so tritonserver --model-repository=/hugectr/hps_tf/notebooks/model_repo --backend-config=tensorflow,version=2 --load-model=hps_tf_triton --model-control-mode=explicit
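If you prefer to start the server from a separate Python script instead of a shell, a minimal sketch can build the same command and launch it without waiting for it to exit. `build_tritonserver_command` and `launch_in_background` are hypothetical helpers. The flags are exactly those of the command above, and the library path depends on your container. Note that this sketch replaces any `LD_PRELOAD` value already present in the environment.

```python
import os
import subprocess


def build_tritonserver_command(hps_lib, model_repository, model_name):
    """Return (argv, env) for launching Triton with the HPS custom ops preloaded."""
    argv = [
        "tritonserver",
        f"--model-repository={model_repository}",
        "--backend-config=tensorflow,version=2",
        f"--load-model={model_name}",
        "--model-control-mode=explicit",
    ]
    # LD_PRELOAD makes the HPS TensorFlow ops visible to the TensorFlow backend.
    env = dict(os.environ, LD_PRELOAD=hps_lib)
    return argv, env


def launch_in_background(argv, env, log_path="tritonserver.log"):
    """Start tritonserver without waiting for it; its output goes to log_path."""
    with open(log_path, "w") as log_file:
        # The child keeps its own copy of the file descriptor, so closing the
        # parent's handle after Popen returns does not stop the logging.
        return subprocess.Popen(argv, env=env, stdout=log_file, stderr=subprocess.STDOUT)
```

For example, `launch_in_background(*build_tritonserver_command(lib, "/hugectr/hps_tf/notebooks/model_repo", "hps_tf_triton"))` returns a `subprocess.Popen` handle that you can later `terminate()`.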

We can then send requests to the Triton inference server using the HTTP client. Note that HPS is initialized implicitly when the server processes the first request, so the first request can have noticeably higher latency than later ones.
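To observe this warm-up effect, you can time each request separately. In this minimal sketch, `time_requests` and `summarize` are hypothetical helpers, and `send_one` stands for any zero-argument callable that issues one request, for example a lambda wrapping the `client.infer(model_name, inputs, outputs=outputs)` call used in the client code.

```python
import time


def time_requests(send_one, num_requests):
    """Call send_one() num_requests times and return each call's latency in seconds."""
    latencies = []
    for _ in range(num_requests):
        start = time.perf_counter()
        send_one()
        latencies.append(time.perf_counter() - start)
    return latencies


def summarize(latencies):
    """Separate the first (warm-up) request from the mean of the remaining ones."""
    first, rest = latencies[0], latencies[1:]
    steady = sum(rest) / len(rest) if rest else float("nan")
    return {"first_s": first, "steady_mean_s": steady}
```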

-
-
-
!curl localhost:8000/v2/models/hps_tf_triton/config
{"name":"hps_tf_triton","platform":"tensorflow_savedmodel","backend":"tensorflow","version_policy":{"specific":{"versions":[1]}},"max_batch_size":1024,"input":[{"name":"input_6","data_type":"TYPE_INT64","format":"FORMAT_NONE","dims":[5],"is_shape_tensor":false,"allow_ragged_batch":false,"optional":false},{"name":"input_7","data_type":"TYPE_FP32","format":"FORMAT_NONE","dims":[10],"is_shape_tensor":false,"allow_ragged_batch":false,"optional":false}],"output":[{"name":"output_1","data_type":"TYPE_FP32","dims":[1],"label_filename":"","is_shape_tensor":false}],"batch_input":[],"batch_output":[],"optimization":{"priority":"PRIORITY_DEFAULT","input_pinned_memory":{"enable":true},"output_pinned_memory":{"enable":true},"gather_kernel_buffer_threshold":0,"eager_batching":false},"dynamic_batching":{"preferred_batch_size":[1024],"max_queue_delay_microseconds":0,"preserve_ordering":false,"priority_levels":0,"default_priority_level":0,"priority_queue_policy":{}},"instance_group":[{"name":"hps_tf_triton_0","kind":"KIND_GPU","count":1,"gpus":[0],"secondary_devices":[],"profile":[],"passive":false,"host_policy":""}],"default_model_filename":"model.savedmodel","cc_model_filenames":{},"metric_tags":{},"parameters":{},"model_warmup":[]}
-
-
-
-
-
-
-
import os
-num_gpu = 1
-os.environ["CUDA_VISIBLE_DEVICES"] = ",".join(map(str, range(num_gpu)))
-
-import numpy as np
-import tritonclient.http as httpclient
-from tritonclient.utils import np_to_triton_dtype
-
-
-def send_inference_requests(num_requests, num_samples):
-    triton_client = httpclient.InferenceServerClient(url="localhost:8000", verbose=True)
-    triton_client.is_server_live()
-    triton_client.get_model_repository_index()
-
-    for i in range(num_requests):
-        print("--------------------------Request {}--------------------------".format(i))
-        # generate_random_samples() and args are defined in earlier cells of this notebook.
-        key_tensor, dense_tensor, _ = generate_random_samples(num_samples, args["vocabulary_range_per_slot"], args["dense_dim"])
-
-        inputs = [
-            httpclient.InferInput("input_6", 
-                                  key_tensor.shape,
-                                  np_to_triton_dtype(np.int64)),
-            httpclient.InferInput("input_7", 
-                                  dense_tensor.shape,
-                                  np_to_triton_dtype(np.float32)),
-        ]
-
-        inputs[0].set_data_from_numpy(key_tensor)
-        inputs[1].set_data_from_numpy(dense_tensor)
-        outputs = [
-            httpclient.InferRequestedOutput("output_1")
-        ]
-
-        # print("Input key tensor is \n{}".format(key_tensor))
-        # print("Input dense tensor is \n{}".format(dense_tensor))
-        model_name = "hps_tf_triton"
-        with httpclient.InferenceServerClient("localhost:8000") as client:
-            response = client.infer(model_name,
-                                    inputs,
-                                    outputs=outputs)
-            result = response.get_response()
-
-            print("Response details:\n{}".format(result))
-
-
-
-
-
-
-
send_inference_requests(num_requests = 5, num_samples = 128)
-
-
-
-
-
GET /v2/health/live, headers None
-<HTTPSocketPoolResponse status=200 headers={'content-length': '0', 'content-type': 'text/plain'}>
-POST /v2/repository/index, headers None
-
-<HTTPSocketPoolResponse status=200 headers={'content-type': 'application/json', 'content-length': '56'}>
-bytearray(b'[{"name":"hps_tf_triton","version":"1","state":"READY"}]')
---------------------------Request 0--------------------------
-Response details:
-{'model_name': 'hps_tf_triton', 'model_version': '1', 'outputs': [{'name': 'output_1', 'datatype': 'FP32', 'shape': [128, 1], 'parameters': {'binary_data_size': 512}}]}
---------------------------Request 1--------------------------
-Response details:
-{'model_name': 'hps_tf_triton', 'model_version': '1', 'outputs': [{'name': 'output_1', 'datatype': 'FP32', 'shape': [128, 1], 'parameters': {'binary_data_size': 512}}]}
---------------------------Request 2--------------------------
-Response details:
-{'model_name': 'hps_tf_triton', 'model_version': '1', 'outputs': [{'name': 'output_1', 'datatype': 'FP32', 'shape': [128, 1], 'parameters': {'binary_data_size': 512}}]}
---------------------------Request 3--------------------------
-Response details:
-{'model_name': 'hps_tf_triton', 'model_version': '1', 'outputs': [{'name': 'output_1', 'datatype': 'FP32', 'shape': [128, 1], 'parameters': {'binary_data_size': 512}}]}
---------------------------Request 4--------------------------
-Response details:
-{'model_name': 'hps_tf_triton', 'model_version': '1', 'outputs': [{'name': 'output_1', 'datatype': 'FP32', 'shape': [128, 1], 'parameters': {'binary_data_size': 512}}]}
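`tritonclient` wraps Triton's HTTP/REST inference protocol (the KServe v2 protocol). For debugging without the client library, the same request can be written as plain JSON and posted to `/v2/models/hps_tf_triton/infer`. The sketch below builds that body with the standard library only; `build_infer_body` and `post_infer` are hypothetical helpers, and the random keys merely stand in for `generate_random_samples`.

```python
import json
import random
import urllib.request


def build_infer_body(keys, dense):
    """Build a KServe v2 JSON inference request for the two model inputs.

    keys:  list of rows, each with 5 int64 embedding keys  (input_6)
    dense: list of rows, each with 10 float32 dense values (input_7)
    """
    def tensor(name, rows, datatype):
        return {
            "name": name,
            "shape": [len(rows), len(rows[0])],
            "datatype": datatype,
            "data": [v for row in rows for v in row],  # row-major flattening of the batch
        }

    return {
        "inputs": [tensor("input_6", keys, "INT64"), tensor("input_7", dense, "FP32")],
        "outputs": [{"name": "output_1"}],
    }


def post_infer(body, url="http://localhost:8000/v2/models/hps_tf_triton/infer"):
    """POST the request and return the decoded JSON response (requires a running server)."""
    req = urllib.request.Request(url, data=json.dumps(body).encode(),
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


# Two illustrative samples; the key range 0..49999 is an assumption for this sketch.
keys = [[random.randrange(0, 50000) for _ in range(5)] for _ in range(2)]
dense = [[random.random() for _ in range(10)] for _ in range(2)]
body = build_infer_body(keys, dense)
```

With the server running, `post_infer(body)["outputs"][0]` would be expected to describe `output_1` with one prediction per sample, mirroring the `[128, 1]` shapes reported above.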
-
-
-
-
-
-
-

Deploy TF-TRT SavedModel using HPS with Triton TensorFlow Backend

-

We can use TF-TRT to optimize the inference TF SavedModel above. The hps.LookupLayer falls back to native TF ops, while a TensorRT engine is built to execute the dense network. The optimized TF-TRT SavedModel can still be deployed with the Triton TensorFlow backend.

-

The TF-TRT SavedModel is placed in the folder "model_repo/hps_tf_triton/2/", and the config.pbtxt file is updated accordingly to load version 2 of the inference model, i.e., the TF-TRT optimized one.
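Triton treats each numerically named subdirectory of a model directory as a model version. A small, hypothetical helper for checking which versions exist before editing config.pbtxt, assuming the `model.savedmodel` layout used in this notebook:

```python
import os
from typing import List


def list_savedmodel_versions(model_dir: str) -> List[int]:
    """Return the sorted version numbers whose folder contains a model.savedmodel directory."""
    versions = []
    for entry in os.listdir(model_dir):
        savedmodel = os.path.join(model_dir, entry, "model.savedmodel")
        # Non-numeric entries such as hps_tf_triton_sparse_0.model are not versions.
        if entry.isdecimal() and os.path.isdir(savedmodel):
            versions.append(int(entry))
    return sorted(versions)
```

After the conversion below, `list_savedmodel_versions("model_repo/hps_tf_triton")` would be expected to return `[1, 2]`, matching the directory tree printed later.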

-
-
-
# Build TF-TRT SavedModel
-from tensorflow.python.compiler.tensorrt import trt_convert as trt
-
-ORIGINAL_MODEL_PATH = "model_repo/hps_tf_triton/1/model.savedmodel"
-NEW_MODEL_PATH = "model_repo/hps_tf_triton/2/model.savedmodel"
-
-# Instantiate the TF-TRT converter
-converter = trt.TrtGraphConverterV2(
-   input_saved_model_dir=ORIGINAL_MODEL_PATH,
-   precision_mode=trt.TrtPrecisionMode.FP32
-)
-
-# Convert the model into TRT compatible segments
-trt_func = converter.convert()
-converter.summary()
-
-keys, dense_features, _ = generate_random_samples(args["global_batch_size"], args["vocabulary_range_per_slot"], args["dense_dim"])
-keys = tf.convert_to_tensor(keys)
-dense_features = tf.convert_to_tensor(dense_features)
-def input_fn():
-   yield [keys, dense_features]
-
-converter.build(input_fn=input_fn)
-converter.save(output_saved_model_dir=NEW_MODEL_PATH)
-
-
-
-
-
INFO:tensorflow:Linked TensorRT version: (8, 4, 2)
-
-
-
INFO:tensorflow:Loaded TensorRT version: (8, 4, 2)
-
-
-
INFO:tensorflow:Clearing prior device assignments in loaded saved model
-
-
-
2022-11-23 01:37:22.924379: I tensorflow/core/grappler/devices.cc:66] Number of eligible GPUs (core count >= 8, compute capability >= 0.0): 1
-2022-11-23 01:37:22.924537: I tensorflow/core/grappler/clusters/single_machine.cc:358] Starting new session
-2022-11-23 01:37:22.928272: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1532] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 30991 MB memory:  -> device: 0, name: Tesla V100-SXM2-32GB, pci bus id: 0000:06:00.0, compute capability: 7.0
-INFO:tensorflow:Clearing prior device assignments in loaded saved model
-
-
-
INFO:tensorflow:Automatic mixed precision has been deactivated.
-2022-11-23 01:37:23.028482: I tensorflow/core/grappler/devices.cc:66] Number of eligible GPUs (core count >= 8, compute capability >= 0.0): 1
-2022-11-23 01:37:23.028568: I tensorflow/core/grappler/clusters/single_machine.cc:358] Starting new session
-2022-11-23 01:37:24.061909: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1532] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 30991 MB memory:  -> device: 0, name: Tesla V100-SXM2-32GB, pci bus id: 0000:06:00.0, compute capability: 7.0
-2022-11-23 01:37:24.068593: W tensorflow/compiler/tf2tensorrt/convert/trt_optimization_pass.cc:198] Calibration with FP32 or FP16 is not implemented. Falling back to use_calibration = False.Note that the default value of use_calibration is True.
-2022-11-23 01:37:24.069761: W tensorflow/compiler/tf2tensorrt/segment/segment.cc:952] 
-
-################################################################################
-TensorRT unsupported/non-converted OP Report:
-	- NoOp -> 2x
-	- Placeholder -> 2x
-	- Identity -> 1x
-	- Init -> 1x
-	- Lookup -> 1x
-	- Reshape -> 1x
---------------------------------------------------------------------------------
-	- Total nonconverted OPs: 8
-	- Total nonconverted OP Types: 6
-For more information see https://docs.nvidia.com/deeplearning/frameworks/tf-trt-user-guide/index.html#supported-ops.
-################################################################################
-
-2022-11-23 01:37:24.069860: W tensorflow/compiler/tf2tensorrt/segment/segment.cc:1280] The environment variable TF_TRT_MAX_ALLOWED_ENGINES=20 has no effect since there are only 1 TRT Engines with  at least minimum_segment_size=3 nodes.
-2022-11-23 01:37:24.069893: I tensorflow/compiler/tf2tensorrt/convert/convert_graph.cc:799] Number of TensorRT candidate segments: 1
-2022-11-23 01:37:24.060667: I tensorflow/compiler/tf2tensorrt/convert/convert_graph.cc:916] Replaced segment 0 consisting of 9 nodes by TRTEngineOp_000_000.
-
-
-
TRTEngineOP Name                 Device        # Nodes # Inputs      # Outputs     Input DTypes       Output Dtypes      Input Shapes       Output Shapes     
-================================================================================================================================================================
-TRTEngineOp_000_000              device:GPU:0  10      2             1             ['float32', 'f ... ['float32']        [[-1, 80], [-1 ... [[-1, 1]]         
-
-	- BiasAdd: 2x
-	- ConcatV2: 1x
-	- Const: 5x
-	- MatMul: 2x
-
-================================================================================================================================================================
-[*] Total number of TensorRT engines: 1
-[*] % of OPs Converted: 50.00% [10/20]
-
-=====================================================HPS Parse====================================================
-[HCTR][01:37:23.329][INFO][RK0][main]: dense_file is not specified using default: 
-[HCTR][01:37:23.329][INFO][RK0][main]: num_of_refresher_buffer_in_pool is not specified using default: 1
-[HCTR][01:37:23.329][INFO][RK0][main]: maxnum_des_feature_per_sample is not specified using default: 26
-[HCTR][01:37:23.329][INFO][RK0][main]: refresh_delay is not specified using default: 0
-[HCTR][01:37:23.329][INFO][RK0][main]: refresh_interval is not specified using default: 0
-====================================================HPS Create====================================================
-[HCTR][01:37:23.329][INFO][RK0][main]: Creating HashMap CPU database backend...
-[HCTR][01:37:23.329][DEBUG][RK0][main]: Created blank database backend in local memory!
-[HCTR][01:37:23.329][INFO][RK0][main]: Volatile DB: initial cache rate = 1
-[HCTR][01:37:23.329][INFO][RK0][main]: Volatile DB: cache missed embeddings = 0
-[HCTR][01:37:23.329][DEBUG][RK0][main]: Created raw model loader in local memory!
-[HCTR][01:37:23.745][INFO][RK0][main]: Table: hps_et.hps_tf_triton.sparse_embedding1; cached 50000 / 50000 embeddings in volatile database (HashMapBackend); load: 50000 / 18446744073709551615 (0.00%).
-[HCTR][01:37:23.745][DEBUG][RK0][main]: Real-time subscribers created!
-[HCTR][01:37:23.745][INFO][RK0][main]: Creating embedding cache in device 0.
-[HCTR][01:37:23.753][INFO][RK0][main]: Model name: hps_tf_triton
-[HCTR][01:37:23.753][INFO][RK0][main]: Number of embedding tables: 1
-[HCTR][01:37:23.753][INFO][RK0][main]: Use GPU embedding cache: True, cache size percentage: 1.000000
-[HCTR][01:37:23.753][INFO][RK0][main]: Use I64 input key: True
-[HCTR][01:37:23.753][INFO][RK0][main]: Configured cache hit rate threshold: 1.000000
-[HCTR][01:37:23.753][INFO][RK0][main]: The size of thread pool: 80
-[HCTR][01:37:23.753][INFO][RK0][main]: The size of worker memory pool: 3
-[HCTR][01:37:23.753][INFO][RK0][main]: The size of refresh memory pool: 1
-[HCTR][01:37:23.753][INFO][RK0][main]: The refresh percentage : 0.200000
-[HCTR][01:37:23.778][DEBUG][RK0][main]: Created raw model loader in local memory!
-[HCTR][01:37:23.814][INFO][RK0][main]: EC initialization for model: "hps_tf_triton", num_tables: 1
-[HCTR][01:37:23.814][INFO][RK0][main]: EC initialization on device: 0
-[HCTR][01:37:23.815][INFO][RK0][main]: Creating lookup session for hps_tf_triton on device: 0
-
-
-
2022-11-23 01:37:23.818078: I tensorflow/compiler/tf2tensorrt/common/utils.cc:104] Linked TensorRT version: 8.4.2
-2022-11-23 01:37:23.818150: I tensorflow/compiler/tf2tensorrt/common/utils.cc:106] Loaded TensorRT version: 8.4.2
-2022-11-23 01:37:28.749149: I tensorflow/compiler/tf2tensorrt/convert/convert_nodes.cc:1275] [TF-TRT] Sparse compute capability is enabled.
-2022-11-23 01:37:28.814132: E tensorflow/compiler/tf2tensorrt/utils/trt_logger.cc:86] DefaultLogger 1: [wrapper.cpp::CublasWrapper::85] Error Code 1: Cublas (Could not initialize cublas. Please check CUDA installation.)
-2022-11-23 01:37:28.817575: W tensorflow/compiler/tf2tensorrt/kernels/trt_engine_op.cc:1061] TF-TRT Warning: Engine creation for TRTEngineOp_000_000 failed. The native segment will be used instead. Reason: INTERNAL: Failed to build TensorRT engine
-2022-11-23 01:37:28.817694: W tensorflow/compiler/tf2tensorrt/kernels/trt_engine_op.cc:894] TF-TRT Warning: Engine retrieval for input shapes: [[1024,80], [1024,10]] failed. Running native segment for TRTEngineOp_000_000
-2022-11-23 01:37:28.823806: W tensorflow/compiler/tf2tensorrt/kernels/trt_engine_op.cc:894] TF-TRT Warning: Engine retrieval for input shapes: [[1024,80], [1024,10]] failed. Running native segment for TRTEngineOp_000_000
-
-
-
INFO:tensorflow:Assets written to: model_repo/hps_tf_triton/2/model.savedmodel/assets
-
-
-
-
-
-
-
%%writefile model_repo/hps_tf_triton/config.pbtxt
-name: "hps_tf_triton"
-platform: "tensorflow_savedmodel"
-max_batch_size:1024
-input [
-  {
-    name: "input_6"
-    data_type: TYPE_INT64
-    dims: [5]
-  },
-  {
-    name: "input_7"
-    data_type: TYPE_FP32
-    dims: [10]
-  }
-]
-output [
-  {
-    name: "output_1"
-    data_type: TYPE_FP32
-    dims: [1]
-  }
-]
-version_policy: {
-        specific:{versions: 2}
-},
-instance_group [
-  {
-    count: 1
-    kind : KIND_GPU
-    gpus: [0]
-  }
-]
-
-
-
-
-
Overwriting model_repo/hps_tf_triton/config.pbtxt
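Instead of rewriting the whole file, you could switch the served version by editing only the `version_policy` entry. A regex-based sketch, assuming the `specific:{versions: N}` spelling used in the config above; `set_specific_version` is hypothetical and is not a general protobuf text-format parser.

```python
import re

# Matches "specific:{versions: N}" with optional whitespace, capturing everything before N.
_VERSION_RE = re.compile(r"(specific\s*:\s*\{\s*versions\s*:\s*)\d+")


def set_specific_version(pbtxt: str, version: int) -> str:
    """Return `pbtxt` with the single specific version number replaced by `version`."""
    new_text, count = _VERSION_RE.subn(rf"\g<1>{version}", pbtxt)
    if count != 1:
        raise ValueError(f"expected one version_policy entry, found {count}")
    return new_text
```

Usage would be to read `model_repo/hps_tf_triton/config.pbtxt`, pass its text through `set_specific_version(text, 2)`, and write the result back before relaunching the server.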
-
-
-
-
-
-
-
!tree model_repo/hps_tf_triton
-
-
-
-
-
model_repo/hps_tf_triton
-├── 1
-│   └── model.savedmodel
-│       ├── assets
-│       ├── keras_metadata.pb
-│       ├── saved_model.pb
-│       └── variables
-│           ├── variables.data-00000-of-00001
-│           └── variables.index
-├── 2
-│   └── model.savedmodel
-│       ├── assets
-│       │   └── trt-serialized-engine.TRTEngineOp_000_000
-│       ├── saved_model.pb
-│       └── variables
-│           ├── variables.data-00000-of-00001
-│           └── variables.index
-├── config.pbtxt
-├── hps_tf_triton.json
-└── hps_tf_triton_sparse_0.model
-    ├── emb_vector
-    └── key
-
-9 directories, 12 files
-
-
-
-
-
-
-
# Release the GPU memory occupied by TensorFlow and Keras
-from numba import cuda
-cuda.select_device(0)
-cuda.close()
-
-
-
-
-

We can then launch the Triton inference server with the TensorFlow backend in the background, using the same command as before. Make sure the previous tritonserver process has fully exited before launching it again; otherwise, out-of-memory errors can occur.
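If you started the server from Python with `subprocess.Popen`, you can make sure it has exited before relaunching. A minimal sketch: `stop_process` is a hypothetical helper that sends SIGTERM (which lets the server begin a graceful shutdown), waits, and falls back to SIGKILL. If the server was started from a terminal instead, `pkill tritonserver` is the shell equivalent.

```python
import subprocess


def stop_process(proc: subprocess.Popen, timeout_s: float = 30.0) -> int:
    """Terminate `proc`, escalate to kill if it does not exit in time, and return its exit code."""
    if proc.poll() is not None:  # already exited
        return proc.returncode
    proc.terminate()  # SIGTERM on POSIX
    try:
        return proc.wait(timeout=timeout_s)
    except subprocess.TimeoutExpired:
        proc.kill()  # SIGKILL as a last resort
        return proc.wait()
```

Waiting on the process, rather than only signalling it, is what guarantees its GPU memory has been released before the next launch.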

-

Once the Triton server has launched successfully, we can send requests to it using the HTTP client again.
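Because the server starts in the background, early requests can fail while models are still loading. A minimal polling sketch using only the standard library; it assumes Triton's readiness endpoint `/v2/health/ready` on port 8000, and `wait_until_ready` and its `fetch_status` parameter are hypothetical names (the parameter lets the polling logic be exercised without a server).

```python
import time
import urllib.error
import urllib.request


def http_status(url: str) -> int:
    """Return the HTTP status code for GET url, or 0 if the server is unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=2) as resp:
            return resp.status
    except urllib.error.HTTPError as err:  # server answered with an error status
        return err.code
    except (urllib.error.URLError, OSError):  # connection refused, timeout, ...
        return 0


def wait_until_ready(url="http://localhost:8000/v2/health/ready",
                     timeout_s=120.0, interval_s=1.0, fetch_status=http_status) -> bool:
    """Poll `url` until it answers 200 or `timeout_s` elapses; return True when ready."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if fetch_status(url) == 200:
            return True
        time.sleep(interval_s)
    return False
```

To wait for this specific model rather than the whole server, the same helper can poll `http://localhost:8000/v2/models/hps_tf_triton/ready`.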

-
-
-
!curl localhost:8000/v2/models/hps_tf_triton/config
-
-
-
-
-
{"name":"hps_tf_triton","platform":"tensorflow_savedmodel","backend":"tensorflow","version_policy":{"specific":{"versions":[2]}},"max_batch_size":1024,"input":[{"name":"input_6","data_type":"TYPE_INT64","format":"FORMAT_NONE","dims":[5],"is_shape_tensor":false,"allow_ragged_batch":false,"optional":false},{"name":"input_7","data_type":"TYPE_FP32","format":"FORMAT_NONE","dims":[10],"is_shape_tensor":false,"allow_ragged_batch":false,"optional":false}],"output":[{"name":"output_1","data_type":"TYPE_FP32","dims":[1],"label_filename":"","is_shape_tensor":false}],"batch_input":[],"batch_output":[],"optimization":{"priority":"PRIORITY_DEFAULT","input_pinned_memory":{"enable":true},"output_pinned_memory":{"enable":true},"gather_kernel_buffer_threshold":0,"eager_batching":false},"dynamic_batching":{"preferred_batch_size":[1024],"max_queue_delay_microseconds":0,"preserve_ordering":false,"priority_levels":0,"default_priority_level":0,"priority_queue_policy":{}},"instance_group":[{"name":"hps_tf_triton_0","kind":"KIND_GPU","count":1,"gpus":[0],"secondary_devices":[],"profile":[],"passive":false,"host_policy":""}],"default_model_filename":"model.savedmodel","cc_model_filenames":{},"metric_tags":{},"parameters":{},"model_warmup":[]}
-
-
-
-
-
-
-
send_inference_requests(num_requests = 5, num_samples = 128)
-
-
-
-
-
GET /v2/health/live, headers None
-<HTTPSocketPoolResponse status=200 headers={'content-length': '0', 'content-type': 'text/plain'}>
-POST /v2/repository/index, headers None
-
-<HTTPSocketPoolResponse status=200 headers={'content-type': 'application/json', 'content-length': '56'}>
-bytearray(b'[{"name":"hps_tf_triton","version":"2","state":"READY"}]')
---------------------------Request 0--------------------------
-Response details:
-{'model_name': 'hps_tf_triton', 'model_version': '2', 'outputs': [{'name': 'output_1', 'datatype': 'FP32', 'shape': [128, 1], 'parameters': {'binary_data_size': 512}}]}
---------------------------Request 1--------------------------
-Response details:
-{'model_name': 'hps_tf_triton', 'model_version': '2', 'outputs': [{'name': 'output_1', 'datatype': 'FP32', 'shape': [128, 1], 'parameters': {'binary_data_size': 512}}]}
---------------------------Request 2--------------------------
-Response details:
-{'model_name': 'hps_tf_triton', 'model_version': '2', 'outputs': [{'name': 'output_1', 'datatype': 'FP32', 'shape': [128, 1], 'parameters': {'binary_data_size': 512}}]}
---------------------------Request 3--------------------------
-Response details:
-{'model_name': 'hps_tf_triton', 'model_version': '2', 'outputs': [{'name': 'output_1', 'datatype': 'FP32', 'shape': [128, 1], 'parameters': {'binary_data_size': 512}}]}
---------------------------Request 4--------------------------
-Response details:
-{'model_name': 'hps_tf_triton', 'model_version': '2', 'outputs': [{'name': 'output_1', 'datatype': 'FP32', 'shape': [128, 1], 'parameters': {'binary_data_size': 512}}]}
-
-
-
-
-
-
\ No newline at end of file
diff --git a/review/pr-458/hps_tf/notebooks/index.html b/review/pr-458/hps_tf/notebooks/index.html
deleted file mode 100644
index a6dae6b30f..0000000000
--- a/review/pr-458/hps_tf/notebooks/index.html
+++ /dev/null
@@ -1,271 +0,0 @@
-Hierarchical Parameter Server Notebooks — Merlin HugeCTR documentation
- - -
- -
-
-
- -
-
-
-
- -
-

Hierarchical Parameter Server Notebooks

-

This directory contains a set of Jupyter notebooks that demonstrate how to use HPS in TensorFlow.

-
-

Quickstart

-

The simplest way to run one of our notebooks is with a Docker container. A container provides a self-contained, isolated, and reproducible environment for repetitive experiments. Docker images are available from the NVIDIA GPU Cloud (NGC). If you prefer to build the HugeCTR Docker image on your own, refer to Set Up the Development Environment With Merlin Containers.

-
-

Pull the NGC Docker

-

Pull the container using the following command:

-
docker pull nvcr.io/nvidia/merlin/merlin-tensorflow:23.02
-
-
-
-
-

Clone the HugeCTR Repository

-

Use the following command to clone the HugeCTR repository:

-
git clone https://github.com/NVIDIA/HugeCTR
-
-
-
-
-

Start the Jupyter Notebook

-
  1. Launch the container in interactive mode (mount the HugeCTR root directory into the container for your convenience) by running this command:

    docker run --runtime=nvidia --rm -it --cap-add SYS_NICE -u $(id -u):$(id -g) -v $(pwd):/hugectr -w /hugectr -p 8888:8888 nvcr.io/nvidia/merlin/merlin-tensorflow:23.02

  2. Start Jupyter using these commands:

    cd /hugectr/hps_tf/notebooks
    jupyter-notebook --allow-root --ip 0.0.0.0 --port 8888 --NotebookApp.token='hugectr'

  3. Connect to your host machine on port 8888 by accessing its IP address or name from your web browser: http://[host machine]:8888

    Log in with the token shown in the output of the command above. For example:

    http://[host machine]:8888/?token=aae96ae9387cd28151868fee318c3b3581a2d794f3b25c6b
-
-
-
-

Notebook List

-

Here’s a list of notebooks that you can run:

- -
-
-

System Specifications

-

The following table summarizes the specifications of the system on which each notebook was verified to run successfully. These specifications are not minimum requirements.

Notebook                                     CPU                                                   GPU                                 #GPUs  Author
hierarchical_parameter_server_demo.ipynb     Intel® Xeon® CPU E5-2698 v4 @ 2.20GHz, 512 GB memory  Tesla V100-SXM2-32GB, 32 GB memory  1      Kingsley Liu
hps_multi_table_sparse_input_demo.ipynb      Intel® Xeon® CPU E5-2698 v4 @ 2.20GHz, 512 GB memory  Tesla V100-SXM2-32GB, 32 GB memory  1      Kingsley Liu
hps_pretrained_model_training_demo.ipynb     Intel® Xeon® CPU E5-2698 v4 @ 2.20GHz, 512 GB memory  Tesla V100-SXM2-32GB, 32 GB memory  4      Kingsley Liu
sok_to_hps_dlrm_demo.ipynb                   Intel® Xeon® CPU E5-2698 v4 @ 2.20GHz, 512 GB memory  Tesla V100-SXM2-32GB, 32 GB memory  1      Kingsley Liu
hps_tensorflow_triton_deployment_demo.ipynb  Intel® Xeon® CPU E5-2698 v4 @ 2.20GHz, 512 GB memory  Tesla V100-SXM2-32GB, 32 GB memory  1      Kingsley Liu
hps_table_fusion_demo.ipynb                  Intel® Xeon® CPU E5-2698 v4 @ 2.20GHz, 512 GB memory  Tesla V100-SXM2-32GB, 32 GB memory  1      Kingsley Liu

-
-
-
-
\ No newline at end of file
diff --git a/review/pr-458/hps_tf/notebooks/sok_to_hps_dlrm_demo.html b/review/pr-458/hps_tf/notebooks/sok_to_hps_dlrm_demo.html
deleted file mode 100644
index ddc95b4eeb..0000000000
--- a/review/pr-458/hps_tf/notebooks/sok_to_hps_dlrm_demo.html
+++ /dev/null
@@ -1,905 +0,0 @@
-SOK to HPS DLRM Demo — Merlin HugeCTR documentation
- - -
- -
-
- -
-
- -
-
-
# Copyright 2021 NVIDIA Corporation. All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-#     http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-# ==============================================================================
-
-# Each user is responsible for checking the content of datasets and the
-# applicable licenses and determining if suitable for the intended use.
-
-
-
-
-

SOK to HPS DLRM Demo

-
-

Overview

-

This notebook demonstrates how to train a DLRM model with SparseOperationKit (SOK) and then run inference with the Hierarchical Parameter Server (HPS). We recommend running sparse_operation_kit_demo.ipynb and hierarchical_parameter_server_demo.ipynb before working through this notebook.

-

For more details about SOK, refer to the SOK Documentation. For more details about the HPS APIs, refer to HPS APIs. For more details about HPS, refer to HugeCTR Hierarchical Parameter Server (HPS).

-
-
-

Installation

-
-

Get SOK from NGC

-

Both the SOK and HPS Python modules are preinstalled in Merlin HugeCTR containers 24.06 and later, for example: nvcr.io/nvidia/merlin/merlin-hugectr:24.06.

-

After launching the container, you can verify that the required libraries are available by running the following commands:

-
$ python3 -c "import sparse_operation_kit as sok"
-$ python3 -c "import hierarchical_parameter_server as hps"
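The same check can be done from inside a notebook without stopping at the first missing module. A sketch using `importlib.util.find_spec`, which locates a top-level module without importing it; `check_modules` is a hypothetical helper.

```python
import importlib.util


def check_modules(names):
    """Map each top-level module name to True if Python can locate it, else False."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


print(check_modules(["sparse_operation_kit", "hierarchical_parameter_server"]))
```

Locating a module does not prove that its native libraries load; the `import` commands above remain the definitive test.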
-
-
-
-
-
-

Configurations

-

First, we specify the required configurations, such as the arguments for generating the dataset, the model parameters, and the paths for saving the model. We use a DLRM model with one embedding table, bottom MLP layers, an interaction layer, and top MLP layers. Note that the input to the embedding layer is a sparse key tensor.

-
-
-
import sparse_operation_kit as sok
-import sys
-sys.path.append("/hugectr/sparse_operation_kit/unit_test/test_scripts/tf2/")
-import utils
-
-import os
-import numpy as np
-import tensorflow as tf
-import struct
-
-args = dict()
-
-args["gpu_num"] = 1                               # the number of available GPUs
-args["iter_num"] = 10                             # the number of training iterations
-args["slot_num"] = 26                             # the number of feature fields in this embedding layer
-args["embed_vec_size"] = 16                       # the dimension of embedding vectors
-args["dense_dim"] = 13                            # the dimension of dense features
-args["global_batch_size"] = 1024                  # the global batch size across all GPUs
-args["max_vocabulary_size"] = 260000
-args["vocabulary_range_per_slot"] = [[i*10000, (i+1)*10000] for i in range(26)] 
-args["max_nnz"] = 10                # the max number of non-zeros for all slots
-args["combiner"] = "mean"
-
-args["ps_config_file"] = "dlrm.json"
-args["dense_model_path"] = "dlrm_dense.model"
-args["embedding_table_path"] = "dlrm_sparse.model"
-args["saved_path"] = "dlrm_tf_saved_model"
-args["np_key_type"] = np.int64
-args["np_vector_type"] = np.float32
-args["tf_key_type"] = tf.int64
-args["tf_vector_type"] = tf.float32
-args["optimizer"] = "plugin_adam"
-
-os.environ["CUDA_VISIBLE_DEVICES"] = ",".join(map(str, range(args["gpu_num"])))
-
-
-
-
-
[INFO]: sparse_operation_kit is imported
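The vocabulary settings above are linked: each of the 26 slots owns a disjoint range of 10,000 keys, so the ranges together span exactly `max_vocabulary_size` = 260,000 keys, and `train` later divides that value by `gpu_num` to size the per-GPU table. A small sanity check of these relationships, written against a plain dict that mirrors the values above (`check_vocab_config` is a hypothetical helper):

```python
def check_vocab_config(cfg):
    """Check the notebook's vocabulary settings and return the per-GPU table size used by train()."""
    ranges = cfg["vocabulary_range_per_slot"]
    if len(ranges) != cfg["slot_num"]:
        raise ValueError("expected one key range per slot")
    # The notebook gives each slot a contiguous, non-overlapping range; this is its
    # convention, not necessarily a SOK requirement.
    for (_, hi), (next_lo, _) in zip(ranges, ranges[1:]):
        if hi != next_lo:
            raise ValueError("slot key ranges are not contiguous")
    num_keys = ranges[-1][1] - ranges[0][0]
    if num_keys > cfg["max_vocabulary_size"]:
        raise ValueError(f"{num_keys} possible keys exceed max_vocabulary_size")
    return cfg["max_vocabulary_size"] // cfg["gpu_num"]


cfg = {
    "gpu_num": 1,
    "slot_num": 26,
    "max_vocabulary_size": 260000,
    "vocabulary_range_per_slot": [[i * 10000, (i + 1) * 10000] for i in range(26)],
}
print(check_vocab_config(cfg))  # 260000
```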
-
-
-
-
-
-
-
def generate_random_samples(num_samples, vocabulary_range_per_slot, max_nnz, dense_dim):
-    def generate_sparse_keys(num_samples, vocabulary_range_per_slot, max_nnz, key_dtype = args["np_key_type"]):
-        slot_num = len(vocabulary_range_per_slot)
-        indices = []
-        values = []
-        for i in range(num_samples):
-            for j in range(slot_num):
-                vocab_range = vocabulary_range_per_slot[j]
-                nnz = np.random.randint(low=1, high=max_nnz+1)
-                entries = sorted(np.random.choice(max_nnz, nnz, replace=False))
-                for entry in entries:
-                    indices.append([i, j, entry])
-                values.extend(np.random.randint(low=vocab_range[0], high=vocab_range[1], size=(nnz, )))
-        values = np.array(values, dtype=key_dtype)
-        return tf.sparse.SparseTensor(indices = indices,
-                                    values = values,
-                                    dense_shape = (num_samples, slot_num, max_nnz))
-
-    
-    sparse_keys = generate_sparse_keys(num_samples, vocabulary_range_per_slot, max_nnz)
-    dense_features = np.random.random((num_samples, dense_dim)).astype(np.float32)
-    labels = np.random.randint(low=0, high=2, size=(num_samples, 1))
-    return sparse_keys, dense_features, labels
-
-def tf_dataset(sparse_keys, dense_features, labels, batchsize):
-    dataset = tf.data.Dataset.from_tensor_slices((sparse_keys, dense_features, labels))
-    dataset = dataset.batch(batchsize, drop_remainder=True)
-    return dataset
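`generate_sparse_keys` builds a COO sparse tensor: each index is a `[sample, slot, position]` triple, and the keys for slot `j` are drawn from that slot's range. The stdlib-only sketch below reproduces the same layout without TensorFlow so it is easy to inspect; `make_coo_keys` is a hypothetical name, and its return value only mirrors the `indices`, `values`, and `dense_shape` arguments of `tf.sparse.SparseTensor`.

```python
import random


def make_coo_keys(num_samples, vocab_ranges, max_nnz, seed=0):
    """Return (indices, values, dense_shape) with the same layout as generate_sparse_keys."""
    rng = random.Random(seed)
    indices, values = [], []
    for i in range(num_samples):
        for j, (lo, hi) in enumerate(vocab_ranges):
            nnz = rng.randint(1, max_nnz)  # 1..max_nnz keys in this slot
            # Distinct, sorted positions keep the indices in row-major (canonical) order.
            for pos in sorted(rng.sample(range(max_nnz), nnz)):
                indices.append([i, j, pos])
                values.append(rng.randrange(lo, hi))  # key from the slot's range [lo, hi)
    return indices, values, (num_samples, len(vocab_ranges), max_nnz)


indices, values, shape = make_coo_keys(2, [[0, 10], [10, 20]], max_nnz=3)
```

The later `tf.sparse.reshape(sparse_keys, [-1, max_nnz])` in `train` maps an index `[i, j, pos]` to `[i * slot_num + j, pos]`, so every (sample, slot) pair becomes one row of at most `max_nnz` keys.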
-
-
-
-
-
-
-

Train with SOK embedding layers

-

We define the model graph for training with SOK embedding layers, i.e., sok.DistributedEmbedding. We can then train the model and save the trained weights of the embedding table in the formats required by HPS. The dense layers are saved as a separate model graph, which can be loaded directly during inference.

-
-
-
class MLP(tf.keras.layers.Layer):
-    def __init__(self,
-                arch,
-                activation='relu',
-                out_activation=None,
-                **kwargs):
-        super(MLP, self).__init__(**kwargs)
-        self.layers = []
-        index = 0
-        for units in arch[:-1]:
-            self.layers.append(tf.keras.layers.Dense(units, activation=activation, name="{}_{}".format(kwargs['name'], index)))
-            index+=1
-        self.layers.append(tf.keras.layers.Dense(arch[-1], activation=out_activation, name="{}_{}".format(kwargs['name'], index)))
-
-            
-    def call(self, inputs, training=True):
-        x = self.layers[0](inputs)
-        for layer in self.layers[1:]:
-            x = layer(x)
-        return x
-
-class SecondOrderFeatureInteraction(tf.keras.layers.Layer):
-    def __init__(self, self_interaction=False):
-        super(SecondOrderFeatureInteraction, self).__init__()
-        self.self_interaction = self_interaction
-
-    def call(self, inputs):
-        batch_size = tf.shape(inputs)[0]
-        num_feas = tf.shape(inputs)[1]
-
-        dot_products = tf.matmul(inputs, inputs, transpose_b=True)
-
-        ones = tf.ones_like(dot_products)
-        mask = tf.linalg.band_part(ones, 0, -1)
-        out_dim = num_feas * (num_feas + 1) // 2
-
-        if not self.self_interaction:
-            mask = mask - tf.linalg.band_part(ones, 0, 0)
-            out_dim = num_feas * (num_feas - 1) // 2
-        flat_interactions = tf.reshape(tf.boolean_mask(dot_products, mask), (batch_size, out_dim))
-        return flat_interactions
-
-class DLRM(tf.keras.models.Model):
-    def __init__(self,
-                 combiner,
-                 max_vocabulary_size_per_gpu,
-                 embed_vec_size,
-                 slot_num,
-                 max_nnz,
-                 dense_dim,
-                 arch_bot,
-                 arch_top,
-                 self_interaction,
-                 **kwargs):
-        super(DLRM, self).__init__(**kwargs)
-        
-        self.combiner = combiner
-        self.max_vocabulary_size_per_gpu = max_vocabulary_size_per_gpu
-        self.embed_vec_size = embed_vec_size
-        self.slot_num = slot_num
-        self.max_nnz = max_nnz
-        self.dense_dim = dense_dim
-        
-        self.embedding_layer = sok.DistributedEmbedding(combiner=self.combiner,
-                                                        max_vocabulary_size_per_gpu=self.max_vocabulary_size_per_gpu,
-                                                        embedding_vec_size=self.embed_vec_size,
-                                                        slot_num=self.slot_num,
-                                                        max_nnz=self.max_nnz)
-        self.bot_nn = MLP(arch_bot, name = "bottom", out_activation='relu')
-        self.top_nn = MLP(arch_top, name = "top", out_activation='sigmoid')
-        self.interaction_op = SecondOrderFeatureInteraction(self_interaction)
-        if self_interaction:
-            self.interaction_out_dim = (self.slot_num+1) * (self.slot_num+2) // 2
-        else:
-            self.interaction_out_dim = self.slot_num * (self.slot_num+1) // 2
-        self.reshape_layer1 = tf.keras.layers.Reshape((1, arch_bot[-1]), name = "reshape1")
-        self.concat1 = tf.keras.layers.Concatenate(axis=1, name = "concat1")
-        self.concat2 = tf.keras.layers.Concatenate(axis=1, name = "concat2")
-            
-    def call(self, inputs, training=True):
-        input_cat = inputs[0]
-        input_dense = inputs[1]
-        
-        embedding_vector = self.embedding_layer(input_cat, training=training)
-        dense_x = self.bot_nn(input_dense)
-        concat_features = self.concat1([embedding_vector, self.reshape_layer1(dense_x)])
-        
-        Z = self.interaction_op(concat_features)
-        z = self.concat2([dense_x, Z])
-        logit = self.top_nn(z)
-        return logit, embedding_vector
-
-    def summary(self):
-        inputs = [tf.keras.Input(shape=(self.max_nnz, ), sparse=True, dtype=args["tf_key_type"]), 
-                  tf.keras.Input(shape=(self.dense_dim, ), dtype=tf.float32)]
-        model = tf.keras.models.Model(inputs=inputs, outputs=self.call(inputs))
-        return model.summary()
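To make the interaction layer concrete: with `self_interaction=False`, the mask keeps the strictly upper triangle of the pairwise dot-product matrix, and `tf.boolean_mask` reads it in row-major order, giving n(n-1)/2 values for n stacked feature vectors. In this model n = slot_num + 1 = 27 (26 embedding vectors plus the bottom-MLP output), so the layer emits 351 values, and the top MLP receives 351 + 16 = 367 features after concatenation with `dense_x`. A stdlib-only sketch for a single sample (`pairwise_interactions` is a hypothetical name):

```python
def pairwise_interactions(features, self_interaction=False):
    """Dot products of feature-vector pairs (i, j) with i < j (i <= j if self_interaction),
    in the row-major order that tf.boolean_mask uses on the upper-triangular mask."""
    n = len(features)
    out = []
    for i in range(n):
        start = i if self_interaction else i + 1
        for j in range(start, n):
            out.append(sum(a * b for a, b in zip(features[i], features[j])))
    return out


feats = [[1.0, 0.0], [0.0, 2.0], [1.0, 1.0]]
print(pairwise_interactions(feats))  # [0.0, 1.0, 2.0]
```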
-
-
-
-
-
-
-
def train(args):
-    dlrm = DLRM(combiner = args["combiner"],
-                max_vocabulary_size_per_gpu = args["max_vocabulary_size"] // args["gpu_num"],
-                embed_vec_size = args["embed_vec_size"],
-                slot_num = args["slot_num"],
-                max_nnz = args["max_nnz"],
-                dense_dim = args["dense_dim"],
-                arch_bot = [256, 128, args["embed_vec_size"]],
-                arch_top = [256, 128, 1],
-                self_interaction = False)
-
-    emb_opt = utils.get_embedding_optimizer(args["optimizer"])(learning_rate=0.1)
-    dense_opt = utils.get_dense_optimizer(args["optimizer"])(learning_rate=0.1)
-
-    init_tensors = np.ones(shape=[args["max_vocabulary_size"], args["embed_vec_size"]], dtype=args["np_vector_type"])
-    embedding_saver = sok.Saver()
-    embedding_saver.load_embedding_values(dlrm.embedding_layer.embedding_variable, init_tensors)
-
-    loss_fn = tf.keras.losses.BinaryCrossentropy()
-
-    @tf.function
-    def _train_step(inputs, labels):
-        with tf.GradientTape() as tape:
-            logit, embedding_vector = dlrm(inputs, training=True)
-            loss = loss_fn(labels, logit)
-        embedding_variables, other_variable = sok.split_embedding_variable_from_others(dlrm.trainable_variables)
-        grads, emb_grads = tape.gradient(loss, [other_variable, embedding_variables])
-        if 'plugin' not in args["optimizer"]:
-            with sok.OptimizerScope(embedding_variables):
-                emb_opt.apply_gradients(zip(emb_grads, embedding_variables),
-                                        experimental_aggregate_gradients=False)
-        else:
-            emb_opt.apply_gradients(zip(emb_grads, embedding_variables),
-                                    experimental_aggregate_gradients=False)
-        dense_opt.apply_gradients(zip(grads, other_variable))
-        return logit, embedding_vector, loss
-
-    sparse_keys, dense_features, labels = generate_random_samples(args["global_batch_size"]  * args["iter_num"], args["vocabulary_range_per_slot"], args["max_nnz"], args["dense_dim"])
-    dataset = tf_dataset(sparse_keys, dense_features, labels, args["global_batch_size"])
-    for i, (sparse_keys, dense_features, labels) in enumerate(dataset):
-        sparse_keys = tf.sparse.reshape(sparse_keys, [-1, sparse_keys.shape[-1]])
-        inputs = [sparse_keys, dense_features]
-        logit, embedding_vector, loss = _train_step(inputs, labels)
-        print("-"*20, "Step {}, loss: {}".format(i, loss),  "-"*20)
-    return dlrm, embedding_saver
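`_train_step` separates the embedding variables from the dense ones with `sok.split_embedding_variable_from_others` and updates each group with its own optimizer. Unlike a dense weight gradient, an embedding gradient is sparse: it only touches the rows whose keys appear in the batch. The pure-Python sketch below illustrates that difference with plain SGD. The dict-based table, the helper names, and the learning rate are hypothetical and say nothing about how SOK actually stores or updates its variables.

```python
from typing import Dict, List


def dense_sgd_update(weights: List[float], grads: List[float], lr: float) -> List[float]:
    """Plain SGD on a dense parameter vector: every entry changes."""
    return [w - lr * g for w, g in zip(weights, grads)]


def sparse_sgd_update(table: Dict[int, List[float]],
                      row_grads: Dict[int, List[float]],
                      lr: float) -> None:
    """Hypothetical sparse SGD: only rows whose keys received a gradient are updated, in place."""
    for key, grad in row_grads.items():
        table[key] = [w - lr * g for w, g in zip(table[key], grad)]


table = {0: [1.0, 1.0], 1: [1.0, 1.0], 2: [1.0, 1.0]}
# Only keys 0 and 2 were looked up in this batch, so row 1 is left untouched.
sparse_sgd_update(table, {0: [0.5, 0.5], 2: [1.0, -1.0]}, lr=0.5)
print(table)  # {0: [0.75, 0.75], 1: [1.0, 1.0], 2: [0.5, 1.5]}

print(dense_sgd_update([1.0, 2.0], [1.0, 1.0], lr=0.5))  # [0.5, 1.5]
```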
-
-
-
-
-
-
-
sok.Init(global_batch_size=args["global_batch_size"])
-trained_model, embedding_saver = train(args)
-trained_model.summary()
-
-
-
-
-
2022-07-29 07:16:16.793169: I tensorflow/core/platform/cpu_feature_guard.cc:152] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  SSE3 SSE4.1 SSE4.2 AVX
-To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
-2022-07-29 07:16:17.323141: W tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:39] Overriding allow_growth setting because the TF_FORCE_GPU_ALLOW_GROWTH environment variable is set. Original config value was 0.
-2022-07-29 07:16:17.323214: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1525] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 30997 MB memory:  -> device: 0, name: Tesla V100-SXM2-32GB, pci bus id: 0000:06:00.0, compute capability: 7.0
-
-
-
2022-07-29 07:16:17.078977: I sparse_operation_kit/kit_cc/kit_cc_infra/src/resources/manager.cc:107] Mapping from local_replica_id to device_id:
-2022-07-29 07:16:17.078977: I sparse_operation_kit/kit_cc/kit_cc_infra/src/resources/manager.cc:109] 0 -> 0
-2022-07-29 07:16:17.078977: I sparse_operation_kit/kit_cc/kit_cc_infra/src/resources/manager.cc:84] Global seed is 4287744788
-2022-07-29 07:16:17.078977: I sparse_operation_kit/kit_cc/kit_cc_infra/src/resources/manager.cc:85] Local GPU Count: 1
-2022-07-29 07:16:17.078977: I sparse_operation_kit/kit_cc/kit_cc_infra/src/resources/manager.cc:86] Global GPU Count: 1
-2022-07-29 07:16:17.078977: I sparse_operation_kit/kit_cc/kit_cc_infra/src/resources/manager.cc:127] Global Replica Id: 0; Local Replica Id: 0
-2022-07-29 07:16:17.078977: I sparse_operation_kit/kit_cc/kit_cc_infra/src/parameters/raw_manager.cc:132] Created embedding variable whose name is EmbeddingVariable
-2022-07-29 07:16:17.078977: I sparse_operation_kit/kit_cc/kit_cc_infra/src/parameters/raw_param.cc:120] Variable: EmbeddingVariable on global_replica_id: 0 start initialization
-2022-07-29 07:16:17.078977: I sparse_operation_kit/kit_cc/kit_cc_infra/src/parameters/raw_param.cc:137] Variable: EmbeddingVariable on global_replica_id: 0 initialization done.
-2022-07-29 07:16:17.078977: I sparse_operation_kit/kit_cc/kit_cc_infra/src/facade.cc:257] SparseOperationKit allocated internal memory.
-2022-07-29 07:16:17.078977: I sparse_operation_kit/kit_cc/kit_cc_infra/src/parameters/raw_manager.cc:225] Loading embedding values to Variable: EmbeddingVariable...
-2022-07-29 07:16:17.078977: I sparse_operation_kit/kit_cc/kit_cc_infra/src/parameters/raw_param.cc:378] Allocated temporary buffer for loading embedding values.
-2022-07-29 07:16:17.078977: I sparse_operation_kit/kit_cc_impl/embedding/common/src/dumping_functions.cc:299] num_total_keys = 260000, while total_max_vocabulary_size = 260000
-2022-07-29 07:16:17.078977: I sparse_operation_kit/kit_cc_impl/embedding/common/src/dumping_functions.cc:350] Worker 0: Start uploading parameters. Total loop_num = 260
-2022-07-29 07:16:17.078977: I sparse_operation_kit/kit_cc/kit_cc_infra/src/parameters/raw_manager.cc:235] Loaded embedding values to Variable: EmbeddingVariable.
-
-
-
/usr/local/lib/python3.8/dist-packages/tensorflow/python/util/dispatch.py:1082: UserWarning: "`binary_crossentropy` received `from_logits=True`, but the `output` argument was produced by a sigmoid or softmax activation and thus does not represent logits. Was this intended?"
-  return dispatch_target(*args, **kwargs)
-
-
-
-------------------- Step 0, loss: 0.9379717111587524 --------------------
--------------------- Step 1, loss: 12726.013671875 --------------------
--------------------- Step 2, loss: 73.78772735595703 --------------------
--------------------- Step 3, loss: 71.33247375488281 --------------------
--------------------- Step 4, loss: 33.48320770263672 --------------------
--------------------- Step 5, loss: 234.79978942871094 --------------------
--------------------- Step 6, loss: 1.6663873195648193 --------------------
--------------------- Step 7, loss: 30.426162719726562 --------------------
--------------------- Step 8, loss: 2.430748462677002 --------------------
--------------------- Step 9, loss: 4.768443584442139 --------------------
-Model: "model"
-__________________________________________________________________________________________________
- Layer (type)                   Output Shape         Param #     Connected to                     
-==================================================================================================
- input_2 (InputLayer)           [(None, 13)]         0           []                               
-                                                                                                  
- bottom (MLP)                   (None, 16)           38544       ['input_2[0][0]']                
-                                                                                                  
- input_1 (InputLayer)           [(None, 10)]         0           []                               
-                                                                                                  
- distributed_embedding (Distrib  (None, 26, 16)      4160000     ['input_1[0][0]']                
- utedEmbedding)                                                                                   
-                                                                                                  
- reshape1 (Reshape)             (None, 1, 16)        0           ['bottom[0][0]']                 
-                                                                                                  
- concat1 (Concatenate)          (None, 27, 16)       0           ['distributed_embedding[0][0]',  
-                                                                  'reshape1[0][0]']               
-                                                                                                  
- second_order_feature_interacti  (None, None)        0           ['concat1[0][0]']                
- on (SecondOrderFeatureInteract                                                                   
- ion)                                                                                             
-                                                                                                  
- concat2 (Concatenate)          (None, None)         0           ['bottom[0][0]',                 
-                                                                  'second_order_feature_interactio
-                                                                 n[0][0]']                        
-                                                                                                  
- top (MLP)                      (None, 1)            127233      ['concat2[0][0]']                
-                                                                                                  
-==================================================================================================
-Total params: 4,325,777
-Trainable params: 4,325,777
-Non-trainable params: 0
-__________________________________________________________________________________________________
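The parameter counts in this summary can be reproduced by hand. The sketch below assumes each MLP is a stack of fully connected layers with biases (this matches the reported numbers), that the table holds 260,000 keys (as the SOK log reports), and that the top MLP receives the 16 bottom-MLP outputs plus 351 pairwise interactions.

```python
def mlp_params(in_dim, layer_sizes):
    """Weights plus biases of a stack of fully connected layers."""
    total = 0
    for out_dim in layer_sizes:
        total += in_dim * out_dim + out_dim
        in_dim = out_dim
    return total


dense_dim, slot_num, embed_vec_size, vocab = 13, 26, 16, 260_000
interaction_dim = slot_num * (slot_num + 1) // 2   # 351 with self_interaction=False

bottom = mlp_params(dense_dim, [256, 128, embed_vec_size])
top = mlp_params(embed_vec_size + interaction_dim, [256, 128, 1])
embedding = vocab * embed_vec_size

print(bottom, top, embedding)    # 38544 127233 4160000
print(bottom + top + embedding)  # 4325777, the total reported above
print(bottom + top)              # 165777, everything except the embedding table
```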
-
-
-
-
-
-
-
dense_model = tf.keras.Model([trained_model.get_layer("distributed_embedding").output,
-                             trained_model.get_layer("bottom").input],
-                             trained_model.get_layer("top").output)
-dense_model.summary()
-dense_model.save(args["dense_model_path"])
-
-
-
-
-
Model: "model_1"
-__________________________________________________________________________________________________
- Layer (type)                   Output Shape         Param #     Connected to                     
-==================================================================================================
- input_2 (InputLayer)           [(None, 13)]         0           []                               
-                                                                                                  
- bottom (MLP)                   (None, 16)           38544       ['input_2[0][0]']                
-                                                                                                  
- input_3 (InputLayer)           [(None, 26, 16)]     0           []                               
-                                                                                                  
- reshape1 (Reshape)             (None, 1, 16)        0           ['bottom[1][0]']                 
-                                                                                                  
- concat1 (Concatenate)          (None, 27, 16)       0           ['input_3[0][0]',                
-                                                                  'reshape1[1][0]']               
-                                                                                                  
- second_order_feature_interacti  (None, None)        0           ['concat1[1][0]']                
- on (SecondOrderFeatureInteract                                                                   
- ion)                                                                                             
-                                                                                                  
- concat2 (Concatenate)          (None, None)         0           ['bottom[1][0]',                 
-                                                                  'second_order_feature_interactio
-                                                                 n[1][0]']                        
-                                                                                                  
- top (MLP)                      (None, 1)            127233      ['concat2[1][0]']                
-                                                                                                  
-==================================================================================================
-Total params: 165,777
-Trainable params: 165,777
-Non-trainable params: 0
-__________________________________________________________________________________________________
-WARNING:tensorflow:Compiled the loaded model, but the compiled metrics have yet to be built. `model.compile_metrics` will be empty until you train or evaluate the model.
-
-
-
2022-07-29 07:16:56.089529: W tensorflow/python/util/util.cc:368] Sets are not currently considered sequences, but this may change in the future, so consider avoiding using them.
-WARNING:absl:Function `_wrapped_model` contains input name(s) args_0 with unsupported characters which will be renamed to args_0_1 in the SavedModel.
-WARNING:absl:Found untraced functions such as bottom_0_layer_call_fn, bottom_0_layer_call_and_return_conditional_losses, bottom_1_layer_call_fn, bottom_1_layer_call_and_return_conditional_losses, bottom_2_layer_call_fn while saving (showing 5 of 12). These functions will not be directly callable after loading.
-
-
-
INFO:tensorflow:Assets written to: dlrm_dense.model/assets
-
-
-
-
-
-
-
-
-
-
!mkdir -p dlrm_sparse.model
-embedding_saver.dump_to_file(trained_model.embedding_layer.embedding_variable, args["embedding_table_path"])
-!mv dlrm_sparse.model/EmbeddingVariable_keys.file dlrm_sparse.model/key
-!mv dlrm_sparse.model/EmbeddingVariable_values.file dlrm_sparse.model/emb_vector
-!ls -l dlrm_sparse.model
-
-
-
-
-
2022-07-29 07:17:01.079021: I sparse_operation_kit/kit_cc/kit_cc_infra/src/parameters/raw_manager.cc:192] Saving EmbeddingVariable to dlrm_sparse.model..
-2022-07-29 07:17:01.079021: I sparse_operation_kit/kit_cc_impl/embedding/common/src/dumping_functions.cc:60] Worker: 0, GPU: 0 key-index count = 260000
-2022-07-29 07:17:01.079021: I sparse_operation_kit/kit_cc_impl/embedding/common/src/dumping_functions.cc:147] Worker: 0, GPU: 0: dumping parameters from hashtable..
-2022-07-29 07:17:01.079021: I sparse_operation_kit/kit_cc/kit_cc_infra/src/parameters/raw_manager.cc:200] Saved EmbeddingVariable to dlrm_sparse.model.
-total 18360
--rw-r--r-- 1 nobody nogroup 16640000 Jul 29 07:17 emb_vector
--rw-r--r-- 1 nobody nogroup  2080000 Jul 29 07:17 key
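The two file sizes are consistent with 260,000 keys stored as 8-byte integers (260,000 × 8 = 2,080,000 bytes) and 260,000 × 16 values stored as 4-byte floats (16,640,000 bytes). The sketch below writes and reads back a tiny table in that layout with the standard library. The raw little-endian int64/float32 layout is an assumption inferred from these sizes, and `write_sparse_model`/`read_sparse_model` are hypothetical helpers, not part of SOK or HPS.

```python
import os
import struct
import tempfile


def write_sparse_model(folder, table):
    """Write keys as int64 and vectors as float32, row by row, into two raw files."""
    os.makedirs(folder, exist_ok=True)
    keys = sorted(table)
    with open(os.path.join(folder, "key"), "wb") as f:
        f.write(struct.pack(f"<{len(keys)}q", *keys))
    with open(os.path.join(folder, "emb_vector"), "wb") as f:
        for k in keys:
            f.write(struct.pack(f"<{len(table[k])}f", *table[k]))


def read_sparse_model(folder, vec_size):
    """Read the two raw files back into a {key: vector} dict."""
    with open(os.path.join(folder, "key"), "rb") as f:
        raw_keys = f.read()
    with open(os.path.join(folder, "emb_vector"), "rb") as f:
        raw_vecs = f.read()
    n = len(raw_keys) // 8
    keys = struct.unpack(f"<{n}q", raw_keys)
    values = struct.unpack(f"<{n * vec_size}f", raw_vecs)
    return {k: list(values[i * vec_size:(i + 1) * vec_size]) for i, k in enumerate(keys)}


with tempfile.TemporaryDirectory() as tmp:
    folder = os.path.join(tmp, "dlrm_sparse.model")
    write_sparse_model(folder, {3: [0.5] * 16, 7: [1.0] * 16})
    key_bytes = os.path.getsize(os.path.join(folder, "key"))
    vector_bytes = os.path.getsize(os.path.join(folder, "emb_vector"))
    restored = read_sparse_model(folder, 16)

print(key_bytes, vector_bytes)  # 16 128  (2 keys * 8 bytes, 2 * 16 * 4 bytes)
print(restored[3][:2])          # [0.5, 0.5]

# Expected sizes for the real table: 260000 keys with 16-dimensional vectors.
print(260_000 * 8, 260_000 * 16 * 4)  # 2080000 16640000
```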
-
-
-
-
-
-
-

Create the inference graph with HPS SparseLookupLayer

-

In order to use HPS in the inference stage, we need to create an inference model graph that is almost identical to the training graph, except that sok.DistributedEmbedding is replaced by hps.SparseLookupLayer. The trained dense model graph can be loaded directly, while HPS retrieves the embedding table weights from the dlrm_sparse.model folder.

-

We then save the inference model graph so that it is ready to be loaded for inference deployment.

-
-
-
import hierarchical_parameter_server as hps
-
-class InferenceModel(tf.keras.models.Model):
-    def __init__(self,
-                 slot_num,
-                 embed_vec_size,
-                 max_nnz,
-                 dense_dim,
-                 dense_model_path,
-                 **kwargs):
-        super(InferenceModel, self).__init__(**kwargs)
-        
-        self.slot_num = slot_num
-        self.embed_vec_size = embed_vec_size
-        self.max_nnz = max_nnz
-        self.dense_dim = dense_dim
-        
-        self.sparse_lookup_layer = hps.SparseLookupLayer(model_name = "dlrm", 
-                                            table_id = 0,
-                                            emb_vec_size = self.embed_vec_size,
-                                            emb_vec_dtype = args["tf_vector_type"])
-        self.dense_model = tf.keras.models.load_model(dense_model_path)
-    
-    def call(self, inputs):
-        input_cat = inputs[0]
-        input_dense = inputs[1]
-
-        embeddings = tf.reshape(self.sparse_lookup_layer(sp_ids=input_cat, sp_weights = None, combiner="mean"),
-                                shape=[-1, self.slot_num, self.embed_vec_size])
-        logit = self.dense_model([embeddings, input_dense])
-        return logit, embeddings
-
-    def summary(self):
-        inputs = [tf.keras.Input(shape=(self.max_nnz, ), sparse=True, dtype=args["tf_key_type"]), 
-                  tf.keras.Input(shape=(self.dense_dim, ), dtype=tf.float32)]
-        model = tf.keras.models.Model(inputs=inputs, outputs=self.call(inputs))
-        return model.summary()
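In `InferenceModel.call`, each row of the sparse input holds the keys of one slot of one sample; `SparseLookupLayer` averages each row's embedding vectors (`combiner="mean"`), and `tf.reshape` regroups the result into `[batch, slot_num, embed_vec_size]`. The pure-Python sketch below mimics that data flow with a dict-backed table. Falling back to a default vector for unknown keys is an assumption modeled on `default_value_for_each_table` in the JSON configuration, and none of these helpers belong to the HPS API.

```python
from typing import Dict, List, Sequence

Vector = List[float]


def mean_lookup(table: Dict[int, Vector], rows: Sequence[Sequence[int]],
                default: Vector) -> List[Vector]:
    """Look up every key of each row and average the vectors (combiner="mean").

    Each row must contain at least one key.
    """
    out = []
    for keys in rows:
        vecs = [table.get(k, default) for k in keys]
        out.append([sum(col) / len(vecs) for col in zip(*vecs)])
    return out


def group_by_sample(vectors: List[Vector], slot_num: int) -> List[List[Vector]]:
    """Regroup [batch * slot_num] vectors into [batch][slot_num], like the tf.reshape call."""
    return [vectors[i:i + slot_num] for i in range(0, len(vectors), slot_num)]


table = {1: [1.0, 2.0], 2: [3.0, 4.0], 5: [0.0, 0.0]}
default = [1.0, 1.0]
# 2 samples x 2 slots; each row holds the keys of one slot of one sample.
rows = [[1, 2], [5], [2], [9, 5]]   # key 9 is missing from the table
batch = group_by_sample(mean_lookup(table, rows, default), slot_num=2)
print(batch)  # [[[2.0, 3.0], [0.0, 0.0]], [[3.0, 4.0], [0.5, 0.5]]]
```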
-
-
-
-
-
[INFO] hierarchical_parameter_server is imported
-
-
-
-
-
-
-
def create_and_save_inference_graph(args): 
-    model = InferenceModel(args["slot_num"], args["embed_vec_size"], args["max_nnz"], args["dense_dim"], args["dense_model_path"])
-    model.summary()
-    inputs = [tf.keras.Input(shape=(args["max_nnz"], ), sparse=True, dtype=args["tf_key_type"]), 
-              tf.keras.Input(shape=(args["dense_dim"], ), dtype=tf.float32)]
-    _, _ = model(inputs)
-    model.save(args["saved_path"])
-
-
-
-
-
-
-
create_and_save_inference_graph(args)
-
-
-
-
-
2022-07-29 07:24:43.911439: I tensorflow/core/platform/cpu_feature_guard.cc:151] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 FMA
-To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
-2022-07-29 07:24:44.490542: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1525] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 30989 MB memory:  -> device: 0, name: Tesla V100-SXM2-32GB, pci bus id: 0000:06:00.0, compute capability: 7.0
-
-
-
WARNING:tensorflow:No training configuration found in save file, so the model was *not* compiled. Compile it manually.
-Model: "model"
-__________________________________________________________________________________________________
- Layer (type)                   Output Shape         Param #     Connected to                     
-==================================================================================================
- input_1 (InputLayer)           [(None, 10)]         0           []                               
-                                                                                                  
- sparse_lookup_layer (SparseLoo  (None, 16)          0           ['input_1[0][0]']                
- kupLayer)                                                                                        
-                                                                                                  
- tf.reshape (TFOpLambda)        (None, 26, 16)       0           ['sparse_lookup_layer[0][0]']    
-                                                                                                  
- input_2 (InputLayer)           [(None, 13)]         0           []                               
-                                                                                                  
- model_1 (Functional)           (None, 1)            165777      ['tf.reshape[0][0]',             
-                                                                  'input_2[0][0]']                
-                                                                                                  
-==================================================================================================
-Total params: 165,777
-Trainable params: 165,777
-Non-trainable params: 0
-__________________________________________________________________________________________________
-
-
-
2022-07-29 07:24:48.043599: W tensorflow/python/util/util.cc:368] Sets are not currently considered sequences, but this may change in the future, so consider avoiding using them.
-WARNING:absl:Function `_wrapped_model` contains input name(s) args_0 with unsupported characters which will be renamed to args_0_3 in the SavedModel.
-WARNING:absl:Found untraced functions such as bottom_0_layer_call_fn, bottom_0_layer_call_and_return_conditional_losses, bottom_1_layer_call_fn, bottom_1_layer_call_and_return_conditional_losses, bottom_2_layer_call_fn while saving (showing 5 of 12). These functions will not be directly callable after loading.
-
-
-
INFO:tensorflow:Assets written to: dlrm_tf_saved_model/assets
-
-
-
-
-
-
-
-
-
-

Inference with saved model graph

-

To initialize the lookup service provided by HPS, we also need to create a JSON configuration file that specifies the details of the embedding tables for the models to be deployed. Here we deploy the DLRM model, which has one embedding table, but HPS can also serve multiple models with multiple embedding tables. Note how maxnum_catfeature_query_per_table_per_sample is specified for the embedding table: max_nnz is 10 for every slot and there are 26 slots, so this entry is set to 26 × 10 = 260.

-

We first call hps.Init to do the necessary initialization and then load the saved model graph to run inference. Finally, we peek at the input keys and the looked-up embedding vectors of the last inference batch.

-
-
-
%%writefile dlrm.json
-{
-    "supportlonglong": true,
-    "models": [{
-        "model": "dlrm",
-        "sparse_files": ["dlrm_sparse.model"],
-        "num_of_worker_buffer_in_pool": 3,
-        "embedding_table_names":["sparse_embedding0"],
-        "embedding_vecsize_per_table": [16],
-        "maxnum_catfeature_query_per_table_per_sample": [260],
-        "default_value_for_each_table": [1.0],
-        "deployed_device_list": [0],
-        "max_batch_size": 1024,
-        "cache_refresh_percentage_per_iteration": 0.2,
-        "hit_rate_threshold": 1.0,
-        "gpucacheper": 1.0,
-        "gpucache": true
-        }
-    ]
-}
-
-
-
-
-
Overwriting dlrm.json
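Instead of hard-coding 260, the same file could be generated from the model settings. The sketch below rebuilds the configuration above with Python's `json` module, using only the keys already present in `dlrm.json`. The `build_ps_config` helper and its parameters are hypothetical; the slot count and `max_nnz` values come from the paragraph above.

```python
import json


def build_ps_config(model_name, sparse_folder, slot_num, max_nnz, embed_vec_size,
                    max_batch_size=1024):
    """Return an HPS configuration dict equivalent to dlrm.json above."""
    return {
        "supportlonglong": True,
        "models": [{
            "model": model_name,
            "sparse_files": [sparse_folder],
            "num_of_worker_buffer_in_pool": 3,
            "embedding_table_names": ["sparse_embedding0"],
            "embedding_vecsize_per_table": [embed_vec_size],
            # One sample queries at most slot_num * max_nnz keys from this table.
            "maxnum_catfeature_query_per_table_per_sample": [slot_num * max_nnz],
            "default_value_for_each_table": [1.0],
            "deployed_device_list": [0],
            "max_batch_size": max_batch_size,
            "cache_refresh_percentage_per_iteration": 0.2,
            "hit_rate_threshold": 1.0,
            "gpucacheper": 1.0,
            "gpucache": True,
        }],
    }


config = build_ps_config("dlrm", "dlrm_sparse.model", slot_num=26, max_nnz=10, embed_vec_size=16)
text = json.dumps(config, indent=4)  # write this string to dlrm.json to use it
print(config["models"][0]["maxnum_catfeature_query_per_table_per_sample"])  # [260]
```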
-
-
-
-
-
-
-
def inference_with_saved_model(args):
-    hps.Init(global_batch_size = args["global_batch_size"],
-             ps_config_file = args["ps_config_file"])
-    model = tf.keras.models.load_model(args["saved_path"])
-    model.summary()
-    def _infer_step(inputs, labels):
-        logit, embeddings = model(inputs)
-        return logit, embeddings
-    
-    embeddings_peek = list()
-    inputs_peek = list()
-    
-    sparse_keys, dense_features, labels = generate_random_samples(args["global_batch_size"]  * args["iter_num"], args["vocabulary_range_per_slot"], args["max_nnz"], args["dense_dim"])
-    dataset = tf_dataset(sparse_keys, dense_features, labels, args["global_batch_size"])
-    for i, (sparse_keys, dense_features, labels) in enumerate(dataset):
-        sparse_keys = tf.sparse.reshape(sparse_keys, [-1, sparse_keys.shape[-1]])
-        inputs = [sparse_keys, dense_features]
-        logit, embeddings = _infer_step(inputs, labels)
-        embeddings_peek.append(embeddings)
-        inputs_peek.append(inputs)
-        print("-"*20, "Step {}".format(i),  "-"*20)
-    return embeddings_peek, inputs_peek
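Both the training and inference loops call `tf.sparse.reshape(sparse_keys, [-1, sparse_keys.shape[-1]])`. Assuming the dataset yields sparse keys of shape `[batch, slot_num, max_nnz]`, this produces `[batch * slot_num, max_nnz]`, so each row holds the keys of one slot of one sample. For a row-major reshape like this, a nonzero at index `(b, s, j)` moves to `(b * slot_num + s, j)` and its value is unchanged. The pure-Python sketch below applies that mapping to COO coordinates; it only illustrates the index arithmetic and is not how TensorFlow implements the op.

```python
from typing import List, Tuple


def reshape_coo_3d_to_2d(indices: List[Tuple[int, int, int]],
                         dense_shape: Tuple[int, int, int]):
    """Map COO indices of a [batch, slot_num, max_nnz] tensor to [batch * slot_num, max_nnz]."""
    batch, slot_num, max_nnz = dense_shape
    new_indices = [(b * slot_num + s, j) for b, s, j in indices]
    return new_indices, (batch * slot_num, max_nnz)


# Two samples, three slots, up to four keys per slot.
indices = [(0, 0, 0), (0, 2, 1), (1, 0, 0), (1, 1, 3)]
new_indices, new_shape = reshape_coo_3d_to_2d(indices, (2, 3, 4))
print(new_indices)  # [(0, 0), (2, 1), (3, 0), (4, 3)]
print(new_shape)    # (6, 4)
```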
-
-
-
-
-
-
-
embeddings_peek, inputs_peek = inference_with_saved_model(args)
-
-# Peek at the last batch: the input keys (values of the SparseTensor) and their embedding vectors
-print(inputs_peek[-1][0].values)
-print(embeddings_peek[-1])
-
-
-
-
-
=====================================================HPS Parse====================================================
-[HCTR][07:24:53.183][INFO][RK0][main]: dense_file is not specified using default: 
-[HCTR][07:24:53.183][INFO][RK0][main]: num_of_refresher_buffer_in_pool is not specified using default: 1
-[HCTR][07:24:53.183][INFO][RK0][main]: maxnum_des_feature_per_sample is not specified using default: 26
-[HCTR][07:24:53.183][INFO][RK0][main]: refresh_delay is not specified using default: 0
-[HCTR][07:24:53.183][INFO][RK0][main]: refresh_interval is not specified using default: 0
-====================================================HPS Create====================================================
-[HCTR][07:24:53.184][INFO][RK0][main]: Creating HashMap CPU database backend...
-[HCTR][07:24:53.184][INFO][RK0][main]: Volatile DB: initial cache rate = 1
-[HCTR][07:24:53.184][INFO][RK0][main]: Volatile DB: cache missed embeddings = 0
-[HCTR][07:24:53.682][INFO][RK0][main]: Table: hps_et.dlrm.sparse_embedding0; cached 260000 / 260000 embeddings in volatile database (PreallocatedHashMapBackend); load: 260000 / 18446744073709551615 (0.00%).
-[HCTR][07:24:53.682][DEBUG][RK0][main]: Real-time subscribers created!
-[HCTR][07:24:53.682][INFO][RK0][main]: Creating embedding cache in device 0.
-[HCTR][07:24:53.689][INFO][RK0][main]: Model name: dlrm
-[HCTR][07:24:53.689][INFO][RK0][main]: Number of embedding tables: 1
-[HCTR][07:24:53.689][INFO][RK0][main]: Use GPU embedding cache: True, cache size percentage: 1.000000
-[HCTR][07:24:53.689][INFO][RK0][main]: Use I64 input key: True
-[HCTR][07:24:53.689][INFO][RK0][main]: Configured cache hit rate threshold: 1.000000
-[HCTR][07:24:53.689][INFO][RK0][main]: The size of thread pool: 80
-[HCTR][07:24:53.689][INFO][RK0][main]: The size of worker memory pool: 3
-[HCTR][07:24:53.689][INFO][RK0][main]: The size of refresh memory pool: 1
-[HCTR][07:24:53.736][INFO][RK0][main]: Creating lookup session for dlrm on device: 0
-WARNING:tensorflow:No training configuration found in save file, so the model was *not* compiled. Compile it manually.
-
-
-
-
-
-
Model: "inference_model"
-_________________________________________________________________
- Layer (type)                Output Shape              Param #   
-=================================================================
- sparse_lookup_layer (Sparse  multiple                 0         
- LookupLayer)                                                    
-                                                                 
- model_1 (Functional)        (None, 1)                 165777    
-                                                                 
-=================================================================
-Total params: 165,777
-Trainable params: 165,777
-Non-trainable params: 0
-_________________________________________________________________
--------------------- Step 0 --------------------
--------------------- Step 1 --------------------
--------------------- Step 2 --------------------
--------------------- Step 3 --------------------
--------------------- Step 4 --------------------
--------------------- Step 5 --------------------
--------------------- Step 6 --------------------
--------------------- Step 7 --------------------
--------------------- Step 8 --------------------
--------------------- Step 9 --------------------
-tf.Tensor([   888   4486   5745 ... 255671 252879 252045], shape=(145888,), dtype=int64)
-tf.Tensor(
-[[[0.6825647  0.6801282  0.68074    ... 0.68074226 0.6818684  0.6809397 ]
-  [1.3980061  1.3981627  1.3980061  ... 1.3980992  1.3980061  1.3980061 ]
-  [0.78289294 0.7833897  0.78293324 ... 0.78336245 0.78305507 0.78301686]
-  ...
-  [0.880705   0.88164043 0.88109225 ... 0.87982655 0.88028604 0.88119066]
-  [0.8650326  0.86442304 0.86414057 ... 0.8642554  0.8640611  0.8645548 ]
-  [0.783202   0.78315204 0.78240466 ... 0.7826805  0.78258413 0.7824805 ]]
-
- [[0.8573375  0.85796195 0.85979205 ... 0.8595341  0.85846806 0.85798156]
-  [0.7563881  0.7563928  0.7564304  ... 0.7563316  0.7563634  0.7564283 ]
-  [0.62020814 0.6213356  0.62018126 ... 0.62036    0.6201106  0.6201722 ]
-  ...
-  [0.85459447 0.85330284 0.854774   ... 0.854769   0.8547034  0.85447353]
-  [0.64481944 0.6447684  0.6449137  ... 0.64472693 0.64465916 0.64503783]
-  [0.7852191  0.78577    0.78521436 ... 0.7852911  0.78544927 0.7853453 ]]
-
- [[0.6184057  0.61849916 0.61735946 ... 0.61852926 0.61921203 0.6175788 ]
-  [0.7092892  0.7092928  0.7092843  ... 0.70928746 0.70928514 0.70928574]
-  [0.6360293  0.6360285  0.636029   ... 0.63602984 0.63602865 0.63602734]
-  ...
-  [0.69062346 0.69038725 0.690281   ... 0.6907744  0.6904431  0.6903974 ]
-  [0.6840397  0.684031   0.68404853 ... 0.6840508  0.68404937 0.68404216]
-  [0.7159784  0.71973306 0.7159706  ... 0.7161063  0.71603465 0.71592766]]
-
- ...
-
- [[0.67292804 0.67351913 0.67328465 ... 0.67328894 0.6733438  0.67301095]
-  [0.68593156 0.6859398  0.68593466 ... 0.6859294  0.6859311  0.68593705]
-  [0.72352993 0.7230278  0.72331727 ... 0.72321206 0.72359455 0.7233958 ]
-  ...
-  [0.60178    0.6017275  0.60140777 ... 0.60140765 0.60151523 0.6015818 ]
-  [0.73245263 0.73322636 0.7328412  ... 0.73278296 0.7325789  0.7329973 ]
-  [0.68950844 0.69225705 0.6898281  ... 0.6889306  0.68944615 0.69020116]]
-
- [[0.848309   0.84465414 0.84872234 ... 0.8486877  0.84938526 0.8492384 ]
-  [0.701107   0.6997489  0.70110285 ... 0.700902   0.7011098  0.70111394]
-  [0.5723409  0.5738345  0.5723305  ... 0.57233423 0.57233775 0.572342  ]
-  ...
-  [0.82768726 0.82793933 0.8282728  ... 0.8282294  0.82802093 0.8280283 ]
-  [0.6491487  0.64926434 0.64963746 ... 0.64926565 0.64935625 0.64957225]
-  [0.5615084  0.56340796 0.5635457  ... 0.5635438  0.5613529  0.56135494]]
-
- [[0.9477315  0.94783926 0.94776624 ... 0.9477597  0.9477446  0.9477345 ]
-  [0.74906373 0.7491199  0.74906075 ... 0.7490612  0.7490609  0.7490617 ]
-  [0.6141995  0.6144503  0.6139838  ... 0.6140719  0.6141932  0.61409426]
-  ...
-  [0.6773844  0.67902935 0.67736465 ... 0.6773715  0.6773739  0.67744035]
-  [0.700472   0.70258003 0.69977176 ... 0.70001334 0.69977176 0.69977176]
-  [0.75941193 0.7594471  0.75891864 ... 0.7593392  0.75900066 0.75923026]]], shape=(1024, 26, 16), dtype=float32)
-
-
-
-
-
-
\ No newline at end of file
diff --git a/review/pr-458/hps_tf/notebooks/sok_train_demo.html b/review/pr-458/hps_tf/notebooks/sok_train_demo.html
deleted file mode 100644
index 77ebfe28dd..0000000000
--- a/review/pr-458/hps_tf/notebooks/sok_train_demo.html
+++ /dev/null
@@ -1,526 +0,0 @@
-SOK Train DLRM Demo — Merlin HugeCTR documentation
-
# Copyright 2021 NVIDIA Corporation. All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-#     http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-# ==============================================================================
-
-# Each user is responsible for checking the content of datasets and the
-# applicable licenses and determining if suitable for the intended use.
-
-
-
-
-

SOK Train DLRM Demo

-
-

Overview

-

This notebook demonstrates how to train a DLRM model with SparseOperationKit (SOK) and then run inference with HierarchicalParameterServer (HPS). We recommend running sparse_operation_kit_demo.ipynb and hierarchical_parameter_server_demo.ipynb before diving into this notebook.

-

For more details about SOK, please refer to SOK Documentation. For more details about HPS APIs, please refer to HPS APIs. For more details about HPS, please refer to HugeCTR Hierarchical Parameter Server (HPS).

-
-
-

Installation

-
-

Get SOK from NGC

-

The SOK and HPS Python modules are preinstalled in Merlin HugeCTR containers 24.06 and later, for example nvcr.io/nvidia/merlin/merlin-hugectr:24.06.

-

You can check that the required libraries are available by running the following commands after launching the container.

-
$ python3 -c "import sparse_operation_kit as sok"
-$ python3 -c "import hierarchical_parameter_server as hps"
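The same check can also be run from inside a notebook without raising an error when a module is missing. A minimal sketch using only the standard library; `missing_modules` is a hypothetical helper name, not part of SOK or HPS:

```python
import importlib.util


def missing_modules(names):
    """Return the subset of `names` that cannot be found on the Python path.

    importlib.util.find_spec locates a top-level module without executing it,
    so a missing plugin is reported instead of raising ImportError.
    """
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_modules(["sparse_operation_kit", "hierarchical_parameter_server"])
if missing:
    print("Missing modules:", ", ".join(missing))
else:
    print("SOK and HPS are available")
```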
-
-
-
-
-
-

Configurations

-

First, we specify the required configurations: the arguments for generating the dataset, the model parameters, and the paths for saving the model. We use a DLRM model with one embedding table per feature field (26 tables in total), bottom MLP layers, an interaction layer, and top MLP layers. Note that the inputs to the embedding layer are sparse key tensors (one ragged tensor per table).
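Each feature field contributes a variable number of keys per sample (between 1 and its `max_nnz` value), so the keys of one field are stored as a flat value array plus per-sample row lengths; this is the layout that the data generator in this notebook passes to `tf.RaggedTensor.from_row_lengths`. A NumPy sketch of that layout with made-up keys; the helper name is hypothetical:

```python
import numpy as np


def ragged_from_row_lengths(values, row_lengths):
    """Split a flat key array into per-sample rows, CSR style.

    Mirrors the layout of tf.RaggedTensor.from_row_lengths: row i holds
    values[row_splits[i]:row_splits[i + 1]].
    """
    values = np.asarray(values)
    row_lengths = np.asarray(row_lengths)
    if row_lengths.sum() != values.size:
        raise ValueError("sum of row_lengths must equal the number of values")
    row_splits = np.concatenate([[0], np.cumsum(row_lengths)])
    rows = [values[row_splits[i]:row_splits[i + 1]] for i in range(len(row_lengths))]
    return row_splits, rows


# Three samples for one feature field, holding 2, 1 and 3 keys respectively.
row_splits, rows = ragged_from_row_lengths([7, 42, 5, 7, 9, 13], [2, 1, 3])
print(row_splits.tolist())         # [0, 2, 3, 6]
print([r.tolist() for r in rows])  # [[7, 42], [5], [7, 9, 13]]
```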

-
-
-
import sys
-
-import os
-os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3'
-import numpy as np
-import tensorflow as tf
-import horovod.tensorflow as hvd
-import sparse_operation_kit as sok
-import struct
-
-args = dict()
-
-args["gpu_num"] = 2                               # the number of available GPUs
-args["iter_num"] = 10                             # the number of training iterations
-args["slot_num"] = 26                             # the number of feature fields (one embedding table each)
-args["embed_vec_sizes"] = [16]*args["slot_num"]   # the dimension of the embedding vectors in each table
-args["dense_dim"] = 13                            # the dimension of the dense features
-args["global_batch_size"] = 1024                  # the global batch size across all GPUs
-args["local_batch_size"] = args["global_batch_size"] // args["gpu_num"]   # the local batch size per GPU
-args["table_names"] = ["table"+str(i) for i in range(args["slot_num"])]   # embedding table names
-args["max_vocabulary_sizes"] = np.random.randint(1000, 1200, size=args["slot_num"]).tolist()
-args["max_nnz"] = np.random.randint(1, 100, size=args["slot_num"])        # maximum number of keys per sample in each field
-args["combiner"] = ["mean"]*args["slot_num"]      # how the embedding vectors of a sample's keys are pooled
-args["sok_backend_type"] = "hybrid"               # SOK backend type: "hybrid" uses HKV, "hbm" uses DET
-
-args["ps_config_file"] = "dlrm.json"
-args["dense_model_path"] = "dlrm_dense.model"
-args["sparse_model_path"] = "dlrm_sparse.model"
-args["sok_embedding_table_path"] = "sok_dlrm_sparse.model"
-args["saved_path"] = "dlrm_tf_saved_model"
-args["np_key_type"] = np.int64
-args["np_vector_type"] = np.float32
-args["tf_key_type"] = tf.int64
-args["tf_vector_type"] = tf.float32
-
-hvd.init()
-gpus = tf.config.experimental.list_physical_devices("GPU")
-for gpu in gpus:
-    tf.config.experimental.set_memory_growth(gpu, True)
-if gpus:
-    tf.config.experimental.set_visible_devices(gpus[hvd.local_rank()], "GPU")
-sok.init()
-
-
-
-
-
-
-
def generate_random_samples(batch_size,iters, vocabulary_range_per_slot, max_nnz, dense_dim):
-    num_samples = batch_size*iters
-
-    def generate_ragged_tensor_samples(embedding_table_sizes,batch_size, lookup_num, hotness, iters):
-
-        if len(hotness) != lookup_num:
-            raise ValueError("Length of hotness list must be equal to lookup_num")
-        total_indices = []
-        for i in range(lookup_num):
-            row_lengths = np.random.randint(1, hotness[i] + 1, iters * batch_size)  # keys per sample
-            values = np.random.randint(0, embedding_table_sizes[i], row_lengths.sum())
-            total_indices.append(tf.RaggedTensor.from_row_lengths(
-                tf.convert_to_tensor(values, dtype=tf.int64),
-                tf.convert_to_tensor(row_lengths, dtype=tf.int64)))
-        return total_indices
-
-    sparse_keys = generate_ragged_tensor_samples(vocabulary_range_per_slot,batch_size,len(vocabulary_range_per_slot),max_nnz,iters)
-    dense_features = np.random.random((num_samples, dense_dim)).astype(np.float32)
-    labels = np.random.randint(low=0, high=2, size=(num_samples, 1))
-    return sparse_keys, dense_features, labels
-
-def tf_dataset(sparse_keys, dense_features, labels, batchsize):
-    total_data = []
-    total_data.extend(sparse_keys)
-    total_data.append(dense_features)
-    total_data.append(labels)
-    dataset = tf.data.Dataset.from_tensor_slices(tuple(total_data))
-    dataset = dataset.batch(batchsize, drop_remainder=True)
-    return dataset
-
-
-
-
-
-
-

Build model with SOK embedding layers

-

We define the model graph for training with SOK embedding variables (sok.DynamicVariable) and look up the sparse keys with sok.lookup_sparse. We can then train the model and save the trained weights of the embedding tables to the file system. The dense layers are saved as a separate model graph, which can be loaded directly during inference.
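With `combiners=["mean"]`, the output of a lookup for one sample is the average of the embedding vectors of that sample's keys. A NumPy sketch of that reduction for a single table, assuming every sample has at least one key (as the data generator guarantees); it illustrates the semantics only and is not how SOK computes the lookup on the GPU:

```python
import numpy as np


def lookup_mean(table, row_splits, values):
    """Look up each key and average the vectors within each sample (row).

    table:      (vocab_size, dim) embedding table
    row_splits: CSR offsets; row i owns values[row_splits[i]:row_splits[i + 1]]
    values:     flat array of keys (every row must be non-empty)
    """
    out = np.empty((len(row_splits) - 1, table.shape[1]), dtype=table.dtype)
    for i in range(len(row_splits) - 1):
        keys = values[row_splits[i]:row_splits[i + 1]]
        out[i] = table[keys].mean(axis=0)  # "mean" combiner; "sum" would use .sum
    return out


table = np.arange(12, dtype=np.float32).reshape(6, 2)  # 6 keys, vector size 2
pooled = lookup_mean(table, row_splits=[0, 2, 3], values=np.array([0, 4, 5]))
print(pooled.tolist())  # [[4.0, 5.0], [10.0, 11.0]]
```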

-
-
-
class MLP(tf.keras.layers.Layer):
-    def __init__(self,
-                arch,
-                activation='relu',
-                out_activation=None,
-                **kwargs):
-        super(MLP, self).__init__(**kwargs)
-        self.layers = []
-        index = 0
-        for units in arch[:-1]:
-            self.layers.append(tf.keras.layers.Dense(units, activation=activation, name="{}_{}".format(kwargs['name'], index)))
-            index+=1
-        self.layers.append(tf.keras.layers.Dense(arch[-1], activation=out_activation, name="{}_{}".format(kwargs['name'], index)))
-
-
-    def call(self, inputs, training=True):
-        x = self.layers[0](inputs)
-        for layer in self.layers[1:]:
-            x = layer(x)
-        return x
-
-class SecondOrderFeatureInteraction(tf.keras.layers.Layer):
-    def __init__(self, self_interaction=False):
-        super(SecondOrderFeatureInteraction, self).__init__()
-        self.self_interaction = self_interaction
-
-    def call(self, inputs):
-        batch_size = tf.shape(inputs)[0]
-        num_feas =  tf.shape(inputs)[1] 
-
-        dot_products = tf.matmul(inputs, inputs, transpose_b=True)
-
-        ones = tf.ones_like(dot_products)
-        mask = tf.linalg.band_part(ones, 0, -1)
-        out_dim = num_feas * (num_feas + 1) // 2
-
-        if not self.self_interaction:
-            mask = mask - tf.linalg.band_part(ones, 0, 0)
-            out_dim = num_feas * (num_feas - 1) // 2
-        flat_interactions = tf.reshape(tf.boolean_mask(dot_products, mask), (batch_size, out_dim))
-        return flat_interactions
-
-class SokEmbLayer(tf.keras.layers.Layer):
-    def __init__(self,embedding_dims,embedding_table_sizes,var_type,combiners,table_names,name):
-        super(SokEmbLayer, self).__init__(name=name)
-        self.table_num = len(embedding_dims)
-        self.combiners = combiners
-        self.initializers = ["uniform"]*self.table_num
-
-        self.sok_vars = [sok.DynamicVariable(
-            dimension=embedding_dims[i],
-            var_type=var_type,
-            initializer=self.initializers[i],
-            init_capacity=embedding_table_sizes[i],
-            max_capacity=embedding_table_sizes[i],
-            name = table_names[i]
-        )
-        for i in range(self.table_num)
-        ]
-        self.reshape_layer_list = []
-        for i in range(self.table_num):
-            self.reshape_layer_list.append(tf.keras.layers.Reshape((1, embedding_dims[i]), name = "sok_reshape"+str(i)))
-        self.concat1 = tf.keras.layers.Concatenate(axis=1, name = "sok_concat1")
-
-    def call(self, inputs):
-        embeddings = sok.lookup_sparse(self.sok_vars, inputs, combiners=self.combiners)
-        ret_embeddings = []
-        for i in range(self.table_num):
-            ret_embeddings.append(self.reshape_layer_list[i](embeddings[i]))
-        ret_embeddings = self.concat1(ret_embeddings)
-        return ret_embeddings
-
-class DLRM(tf.keras.models.Model):
-    def __init__(self,
-                 combiners,
-                 embedding_table_sizes,
-                 embed_vec_dims,
-                 sok_backend_type,
-                 slot_num,
-                 dense_dim,
-                 arch_bot,
-                 arch_top,
-                 self_interaction,
-                 table_names,
-                 **kwargs):
-        super(DLRM, self).__init__(**kwargs)
-
-        self.combiners = combiners
-        self.embed_vec_dims = embed_vec_dims
-        self.sok_backend_type = sok_backend_type
-        self.embedding_table_sizes = embedding_table_sizes
-        self.slot_num = len(combiners)
-        self.dense_dim = dense_dim
-
-        self.embedding_model = SokEmbLayer(embedding_dims=self.embed_vec_dims,
-                                         embedding_table_sizes = self.embedding_table_sizes,
-                                         var_type = self.sok_backend_type,combiners=combiners,table_names = table_names,name="sok_embedding")
-
-        self.bot_nn = MLP(arch_bot, name = "bottom", out_activation='relu')
-        self.top_nn = MLP(arch_top, name = "top", out_activation='sigmoid')
-        self.interaction_op = SecondOrderFeatureInteraction(self_interaction)
-        if self_interaction:
-            self.interaction_out_dim = (self.slot_num+1) * (self.slot_num+2) // 2
-        else:
-            self.interaction_out_dim = self.slot_num * (self.slot_num+1) // 2
-
-        self.reshape_layer1 = tf.keras.layers.Reshape((1, arch_bot[-1]), name = "dense_reshape1")
-        self.concat1 = tf.keras.layers.Concatenate(axis=1, name = "dense_concat1")
-        self.concat2 = tf.keras.layers.Concatenate(axis=1, name = "dense_concat2")
-
-    def call(self, inputs, training=True):
-        input_sparse = inputs[0]
-        input_dense = inputs[1]
-
-        embedding_vectors = self.embedding_model(input_sparse)
-        dense_x = self.bot_nn(input_dense)
-        concat_features = self.concat1([embedding_vectors, self.reshape_layer1(dense_x)])
-        Z = self.interaction_op(concat_features)  # interact the embeddings with the bottom-MLP output
-        z = self.concat2([dense_x, Z])
-        logit = self.top_nn(z)
-
-        return logit, embedding_vectors
-
-    def summary(self):
-        sparse_inputs = []
-        for i in range(self.slot_num):
-            sparse_inputs.append(tf.keras.Input(shape=(args["max_nnz"][i], ), sparse=True, dtype=args["tf_key_type"])) 
-        dense_input = tf.keras.Input(shape=(self.dense_dim, ), dtype=tf.float32)
-        inputs = [sparse_inputs,dense_input]
-        model = tf.keras.models.Model(inputs=inputs, outputs=self.call(inputs))
-        return model.summary()
-
-    def get_embedding_model(self):
-        return self.embedding_model
-
-    def get_embedding_variables(self):
-        return self.embedding_model.trainable_variables
-
-    def get_dense_variables(self):
-        tmp_var = self.trainable_variables
-        sparse_vars , dense_vars = sok.filter_variables(tmp_var)
-        return dense_vars
-
-
-    def embedding_load(self,path,opt):
-        embedding_vars = self.get_embedding_variables()
-        sok.load(path, embedding_vars, opt)
-
-    def embedding_dump(self,path,opt):
-        embedding_vars = self.get_embedding_variables()
-        sok.dump(path, embedding_vars, opt)
-
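`SecondOrderFeatureInteraction` computes every pairwise dot product between the feature vectors of a sample and keeps the upper triangle of that matrix, excluding the diagonal unless `self_interaction=True`, so n input features produce n(n-1)/2 outputs. A NumPy version of the same computation, useful for checking output widths by hand; the function name is hypothetical:

```python
import numpy as np


def pairwise_interaction(features, self_interaction=False):
    """Flatten the upper triangle of each sample's Gram matrix.

    features: (batch, num_feas, dim). Returns (batch, out_dim), where
    out_dim = n*(n-1)/2, or n*(n+1)/2 when self_interaction is True.
    """
    dots = features @ features.transpose(0, 2, 1)  # (batch, n, n) dot products
    n = features.shape[1]
    rows, cols = np.triu_indices(n, k=0 if self_interaction else 1)
    return dots[:, rows, cols]  # row-major order, like tf.boolean_mask with the band mask


x = np.array([[[1.0, 0.0], [0.0, 2.0], [1.0, 1.0]]])  # batch 1, 3 features, dim 2
print(pairwise_interaction(x).tolist())  # [[0.0, 1.0, 2.0]]
```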
-
-
-
-
-
-

Train with SOK models

-

We define a Trainer class to wrap SOK training. When training with SOK, note the following points:
-1. Two gradient tapes are needed, because the dense variables may need to be wrapped with Horovod’s hvd.DistributedGradientTape.
-2. SOK variables must be updated with SOK’s optimizer (sok.OptimizerWrapper), while the dense variables are updated with a TensorFlow optimizer.
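Embedding tables differ from dense layers in that each step only produces gradients for the rows whose keys were looked up in the batch, so embedding updates are naturally applied row by row. A conceptual NumPy sketch of a row-wise SGD update; the helper is hypothetical, plain SGD is swapped in for the Adam optimizer used in this notebook, and it does not reflect how sok.OptimizerWrapper is implemented:

```python
import numpy as np


def sparse_sgd_update(table, keys, row_grads, lr):
    """Row-wise SGD: update only the embedding rows that were looked up.

    table:     (vocab_size, dim) embedding table, modified in place
    keys:      (k,) looked-up keys, possibly with duplicates
    row_grads: (k, dim) gradient with respect to each looked-up vector
    Gradients of duplicate keys are summed first, matching what a dense
    gradient over the whole table would contain for those rows.
    """
    unique_keys, inverse = np.unique(keys, return_inverse=True)
    summed = np.zeros((unique_keys.size, table.shape[1]), dtype=table.dtype)
    np.add.at(summed, inverse, row_grads)
    table[unique_keys] -= lr * summed
    return unique_keys


table = np.ones((5, 2), dtype=np.float32)
touched = sparse_sgd_update(table, np.array([3, 1, 3]), np.ones((3, 2), dtype=np.float32), lr=0.5)
print(touched.tolist())  # [1, 3]
print(table.tolist())    # [[1.0, 1.0], [0.5, 0.5], [1.0, 1.0], [0.0, 0.0], [1.0, 1.0]]
```

Rows 0, 2 and 4 are never read or written, which is what makes this kind of update cheap for very large tables.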

-
-
-
class Trainer:
-   def __init__(self,args):
-       self.args = args
-       self.dlrm = DLRM(combiners = args["combiner"],
-                   embedding_table_sizes = args["max_vocabulary_sizes"],
-                   embed_vec_dims = args["embed_vec_sizes"],
-                   sok_backend_type = args["sok_backend_type"],
-                   slot_num = args["slot_num"],
-                   dense_dim = args["dense_dim"],
-                   arch_bot = [256, 128, args["embed_vec_sizes"][0]],
-                   arch_top = [256, 128, 1],
-                   self_interaction = False,
-                   table_names = args["table_names"])
-
-       # initialize optimizer
-       optimizer = tf.optimizers.Adam(learning_rate=1.0)
-       self.embedding_opt = sok.OptimizerWrapper(optimizer)
-       self.dense_opt = tf.optimizers.Adam(learning_rate=1.0)
-
-       self.loss_fn = tf.keras.losses.BinaryCrossentropy()
-
-   
-   def train(self):
-       embedding_vars = self.dlrm.get_embedding_variables()
-       dense_vars = self.dlrm.get_dense_variables()
-
-       @tf.function
-       def _train_step(inputs, labels):
-           with tf.GradientTape() as tape, tf.GradientTape() as emb_tape:
-               logit, embedding_vector = self.dlrm(inputs, training=True)
-               loss = self.loss_fn(labels, logit)
-
-           tape = hvd.DistributedGradientTape(tape)
-           dense_grads = tape.gradient(loss, dense_vars)
-           embedding_grads = emb_tape.gradient(loss, embedding_vars)
-
-           self.embedding_opt.apply_gradients(zip(embedding_grads, embedding_vars))
-           self.dense_opt.apply_gradients(zip(dense_grads, dense_vars))
-
-           return logit, embedding_vector, loss
-
-       sparse_keys, dense_features, labels = generate_random_samples(self.args["local_batch_size"], self.args["iter_num"], self.args["max_vocabulary_sizes"], self.args["max_nnz"], self.args["dense_dim"])
-       dataset = tf_dataset(sparse_keys, dense_features, labels, self.args["local_batch_size"])
-       for i, input_tuple in enumerate(dataset):
-           sparse_keys = input_tuple[:-2]
-           dense_features = input_tuple[-2]
-           labels = input_tuple[-1]
-           inputs = [sparse_keys, dense_features]
-           logit, embedding_vector, loss = _train_step(inputs, labels)
-           print("-"*20, "Step {}, loss: {}".format(i, loss),  "-"*20)
-       self.dlrm.summary()
-
-   def dump_model(self):
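-       # Dump the trained SOK embedding tables to disk.
-       self.dlrm.embedding_dump(self.args["sok_embedding_table_path"], self.embedding_opt)
-
-       # The dense part of DLRM is saved as a separate Keras model whose inputs are
-       # the embedding-layer output and the bottom-MLP input.
-       dense_model = tf.keras.Model([self.dlrm.get_layer("sok_embedding").output,
-                                     self.dlrm.get_layer("bottom").input],
-                                    self.dlrm.get_layer("top").output)
-       dense_model.summary()
-       dense_model.save(self.args["dense_model_path"])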
-
-trainer = Trainer(args)
-trainer.train()
-trainer.dump_model()
-
-
-
-
-
-
-
!mkdir -p dlrm_sparse.model
-embedding_saver.dump_to_file(trained_model.embedding_layer.embedding_variable, args["embedding_table_path"])
-!mv dlrm_sparse.model/EmbeddingVariable_keys.file dlrm_sparse.model/key
-!mv dlrm_sparse.model/EmbeddingVariable_values.file dlrm_sparse.model/emb_vector
-!ls -l dlrm_sparse.model
-
-
-
-
-
2022-07-29 07:17:01.079021: I sparse_operation_kit/kit_cc/kit_cc_infra/src/parameters/raw_manager.cc:192] Saving EmbeddingVariable to dlrm_sparse.model..
-2022-07-29 07:17:01.079021: I sparse_operation_kit/kit_cc_impl/embedding/common/src/dumping_functions.cc:60] Worker: 0, GPU: 0 key-index count = 260000
-2022-07-29 07:17:01.079021: I sparse_operation_kit/kit_cc_impl/embedding/common/src/dumping_functions.cc:147] Worker: 0, GPU: 0: dumping parameters from hashtable..
-2022-07-29 07:17:01.079021: I sparse_operation_kit/kit_cc/kit_cc_infra/src/parameters/raw_manager.cc:200] Saved EmbeddingVariable to dlrm_sparse.model.
-total 18360
--rw-r--r-- 1 nobody nogroup 16640000 Jul 29 07:17 emb_vector
--rw-r--r-- 1 nobody nogroup  2080000 Jul 29 07:17 key
-
-
-
-
-
-
\ No newline at end of file
diff --git a/review/pr-458/hps_torch/notebooks/hps_torch_demo.html b/review/pr-458/hps_torch/notebooks/hps_torch_demo.html
deleted file mode 100644
index 8e099b37f1..0000000000
--- a/review/pr-458/hps_torch/notebooks/hps_torch_demo.html
+++ /dev/null
@@ -1,530 +0,0 @@
-HPS Torch Demo — Merlin HugeCTR documentation
# Copyright 2021 NVIDIA Corporation. All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-#     http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-# ==============================================================================
-
-# Each user is responsible for checking the content of datasets and the
-# applicable licenses and determining if suitable for the intended use.
-
-
-
-
-

HPS Torch Demo

-
-

Overview

-

Hierarchical Parameter Server (HPS) is a distributed recommendation inference framework that combines a high-performance GPU embedding cache with a hierarchical storage architecture to provide low-latency retrieval of embeddings for inference. It is provided as a PyTorch plugin and can be used directly in Torch models.
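To make the idea of a fast cache in front of slower storage tiers concrete, here is a deliberately simplified sketch: a least-recently-used cache over a larger key-value store, with hit and miss counters. The class and its LRU eviction policy are assumptions for illustration only; the real HPS cache behavior is controlled through its configuration (for example `embedding_cache_type`, `gpucacheper` and `cache_refresh_percentage_per_iteration`) and differs from this toy.

```python
from collections import OrderedDict


class TwoLevelEmbeddingStore:
    """Conceptual sketch: a small LRU cache in front of a larger backing store.

    `cache` stands in for a GPU embedding cache and `backing` for a slower
    tier such as host memory or persistent storage.
    """

    def __init__(self, backing, capacity):
        self.backing = backing        # dict: key -> vector
        self.capacity = capacity
        self.cache = OrderedDict()    # most recently used keys at the end
        self.hits = 0
        self.misses = 0

    def lookup(self, key):
        if key in self.cache:
            self.hits += 1
            self.cache.move_to_end(key)
            return self.cache[key]
        self.misses += 1
        vector = self.backing[key]    # slow path: fetch from the lower tier
        self.cache[key] = vector
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)  # evict the least recently used key
        return vector


store = TwoLevelEmbeddingStore({k: [float(k)] for k in range(100)}, capacity=2)
for k in [1, 2, 1, 3, 2]:
    store.lookup(k)
print(store.hits, store.misses, list(store.cache))  # 1 4 [3, 2]
```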

-

This notebook demonstrates how to apply HPS to a Torch model and use it for inference. For more details about HPS APIs, please refer to HPS APIs. For more details about HPS, please refer to HugeCTR Hierarchical Parameter Server (HPS).

-
-
-

Installation

-
-

Get HPS from NGC

-

The HPS Python module is preinstalled in the 24.06 and later Merlin HugeCTR Container: nvcr.io/nvidia/merlin/merlin-hugectr:24.06.

-

After launching the container, you can verify that the required library is installed by running the following command:

-
$ python3 -c "import hps_torch"
-
-
-
-
-
-

Data Generation

-

First, we specify the configuration for data generation. We generate 8 embedding tables, each with 10,000 keys and an embedding vector size of 128. The maximum batch size is 256, and each sample has 10 keys to look up in each table.

-
-
-
import torch
-import hps_torch
-from typing import List
-import os
-import numpy as np
-import struct
-import json
-import pytest
-import time
-
-NUM_GPUS = 1
-VOCAB_SIZE = 10000
-EMB_VEC_SIZE = 128
-NUM_QUERY_KEY = 10
-MAX_BATCH_SIZE = 256
-NUM_ITERS = 100
-NUM_TABLES = 8
-USE_CONTEXT_STREAM = True
-
-os.environ["CUDA_VISIBLE_DEVICES"] = ",".join(map(str, range(NUM_GPUS)))
-
-
-
-
-
[INFO] hps_torch is imported
-
-
-
-
-
-
-
hps_config = {
-    "supportlonglong": False,
-    "fuse_embedding_table": True,
-    "models": [
-        {
-            "model": str(NUM_TABLES) + "_table",
-            "sparse_files": [],
-            "num_of_worker_buffer_in_pool": NUM_TABLES,
-            "embedding_table_names": [],
-            "embedding_vecsize_per_table": [],
-            "maxnum_catfeature_query_per_table_per_sample": [],
-            "default_value_for_each_table": [0.0],
-            "deployed_device_list": [0],
-            "max_batch_size": MAX_BATCH_SIZE,
-            "cache_refresh_percentage_per_iteration": 1.0,
-            "hit_rate_threshold": 1.0,
-            "gpucacheper": 1.0,
-            "gpucache": True,
-            "embedding_cache_type": "static",
-            "use_context_stream": USE_CONTEXT_STREAM,
-        }
-    ],
-}
-
-def generate_embedding_tables(
-    hugectr_sparse_model, vocab_range, embedding_vec_size, embedding_table
-):
-    os.makedirs(hugectr_sparse_model, exist_ok=True)
-    with open("{}/key".format(hugectr_sparse_model), "wb") as key_file, open(
-        "{}/emb_vector".format(hugectr_sparse_model), "wb"
-    ) as vec_file:
-        for key in range(vocab_range[0], vocab_range[1]):
-            vec = np.random.random((embedding_vec_size,)).astype(np.float32)
-            key_struct = struct.pack("q", key)
-            vec_struct = struct.pack(str(embedding_vec_size) + "f", *vec)
-            key_file.write(key_struct)
-            vec_file.write(vec_struct)
-            embedding_table[key] = vec
-
-
-def set_up_model_files():
-    embedding_table = np.zeros((NUM_TABLES * VOCAB_SIZE, EMB_VEC_SIZE)).astype(np.float32)
-    for i in range(NUM_TABLES):
-        table_name = "table" + str(i)
-        model_file_name = "embeddings/" + table_name
-        generate_embedding_tables(
-            model_file_name, [i * VOCAB_SIZE, (i + 1) * VOCAB_SIZE], EMB_VEC_SIZE, embedding_table
-        )
-        hps_config["models"][0]["sparse_files"].append(model_file_name)
-        hps_config["models"][0]["embedding_table_names"].append(table_name)
-        hps_config["models"][0]["embedding_vecsize_per_table"].append(EMB_VEC_SIZE)
-        hps_config["models"][0]["maxnum_catfeature_query_per_table_per_sample"].append(
-            NUM_QUERY_KEY
-        )
-    hps_config_json_object = json.dumps(hps_config, indent=4)
-    with open(str(NUM_TABLES) + "_table.json", "w") as outfile:
-        outfile.write(hps_config_json_object)
-    return embedding_table
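generate_embedding_tables writes each table in a raw two-file layout: `key` holds the keys as 8-byte integers (`struct` format "q") and `emb_vector` holds the vectors back to back as 32-bit floats. A minimal sketch for reading such a directory back into NumPy for inspection; `read_sparse_model` is a hypothetical helper, and it assumes the files are read on a machine with the same byte order as the one that wrote them:

```python
import os
import struct
import tempfile

import numpy as np


def read_sparse_model(path, vec_size):
    """Read the `key` (int64) and `emb_vector` (float32) files of one table."""
    keys = np.fromfile(os.path.join(path, "key"), dtype=np.int64)
    vectors = np.fromfile(os.path.join(path, "emb_vector"), dtype=np.float32)
    if vectors.size != keys.size * vec_size:
        raise ValueError("emb_vector size does not match number of keys * vec_size")
    return keys, vectors.reshape(keys.size, vec_size)


# Round trip with a tiny table written the same way as generate_embedding_tables.
with tempfile.TemporaryDirectory() as tmp:
    with open(os.path.join(tmp, "key"), "wb") as kf, open(os.path.join(tmp, "emb_vector"), "wb") as vf:
        for key in range(3):
            kf.write(struct.pack("q", key))
            vf.write(struct.pack("4f", *([key * 0.5] * 4)))
    keys, vectors = read_sparse_model(tmp, vec_size=4)

print(keys.tolist(), vectors.shape)  # [0, 1, 2] (3, 4)
```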
-
-
-
-
-
-
-
embedding_table = set_up_model_files()
-
-
-
-
-
-
-
!du -lh embeddings
-
-
-
-
-
5.0M	embeddings/table0
-5.0M	embeddings/table1
-5.0M	embeddings/table2
-5.0M	embeddings/table3
-5.0M	embeddings/table4
-5.0M	embeddings/table5
-5.0M	embeddings/table6
-5.0M	embeddings/table7
-40M	embeddings
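The sizes reported by `du` can be checked by hand: each table stores VOCAB_SIZE 8-byte keys plus VOCAB_SIZE × EMB_VEC_SIZE 4-byte floats. `du -h` reports disk usage rounded up in binary units, so small differences come from rounding and filesystem block allocation.

```python
VOCAB_SIZE = 10000
EMB_VEC_SIZE = 128
NUM_TABLES = 8

key_bytes = VOCAB_SIZE * 8                    # struct "q": 8-byte signed integers
vector_bytes = VOCAB_SIZE * EMB_VEC_SIZE * 4  # float32 values
per_table = key_bytes + vector_bytes
total = NUM_TABLES * per_table

print(per_table, round(per_table / 2**20, 2))  # 5200000 4.96
print(total, round(total / 2**20, 2))          # 41600000 39.67
```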
-
-
-
-
-
-
-

Lookup with Table Fusion

-

HPS supports fusing tables that have the same embedding vector size by using CPU multithreading. With the HPS plugin for Torch, this is achieved by launching the per-table lookups with torch.jit.fork and collecting the results with torch.jit.wait. For more details, please refer to HPS Configuration.

-

We conduct embedding lookup with table fusion and compare the results with the ground truth.
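Conceptually, fusing tables of the same vector size means serving them from one combined structure, which is only well defined when key values do not overlap across tables (the HPS log below warns about exactly this). The generated data satisfies the requirement because table i uses keys in [i × VOCAB_SIZE, (i + 1) × VOCAB_SIZE). A NumPy sketch of the idea, with a thread pool standing in for torch.jit.fork and torch.jit.wait; the helper names are hypothetical and this is not how HPS implements fusion internally:

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np


def fuse_tables(key_arrays, vector_arrays):
    """Merge per-table (keys, vectors) into one key -> row index and one vector array.

    Only valid when every table has the same vector size and no key appears in
    more than one table (otherwise a lookup would be ambiguous).
    """
    if len({v.shape[1] for v in vector_arrays}) != 1:
        raise ValueError("only tables with the same embedding vector size can be fused")
    all_keys = np.concatenate(key_arrays)
    if np.unique(all_keys).size != all_keys.size:
        raise ValueError("key values overlap across tables")
    index = {int(k): row for row, k in enumerate(all_keys)}
    return index, np.concatenate(vector_arrays)


def lookup(index, vectors, keys):
    rows = np.array([index[int(k)] for k in keys.ravel()], dtype=np.int64)
    return vectors[rows].reshape(keys.shape + (vectors.shape[1],))


def fused_lookup(index, vectors, keys_per_table):
    # One task per table (similar in spirit to torch.jit.fork), then collect the
    # results in table order (torch.jit.wait) and concatenate along dim 0.
    with ThreadPoolExecutor(max_workers=len(keys_per_table)) as pool:
        futures = [pool.submit(lookup, index, vectors, k) for k in keys_per_table]
        return np.concatenate([f.result() for f in futures])


# Two tables with disjoint key ranges, as in the generated data above.
index, vectors = fuse_tables(
    [np.array([0, 1]), np.array([10, 11])],
    [np.array([[0.0, 0.0], [1.0, 1.0]]), np.array([[10.0, 10.0], [11.0, 11.0]])],
)
out = fused_lookup(index, vectors, [np.array([[1], [0]]), np.array([[11], [10]])])
print(out.shape)  # (4, 1, 2)
```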

-
-
-
class Model(torch.nn.Module):
-    def __init__(self, ps_config_file: str, model_name: str, emb_vec_size: List[int]):
-        super().__init__()
-        self.layers = torch.nn.ModuleList(
-            [
-                hps_torch.LookupLayer(ps_config_file, model_name, table_id, emb_vec_size[table_id])
-                for table_id in range(len(emb_vec_size))
-            ]
-        )
-
-    def forward(self, keys_list: torch.Tensor):
-        vectors = []
-        futures = torch.jit.annotate(List[torch.jit.Future[torch.Tensor]], [])
-        for i, layer in enumerate(self.layers):
-            fut = torch.jit.fork(layer, keys_list[i])
-            futures.append(fut)
-        for i, _ in enumerate(self.layers):
-            vectors.append(torch.jit.wait(futures[i]))
-        return torch.cat(vectors)
-
-
-
-
-
-
-
model = torch.jit.script(
-    Model(
-        f"{NUM_TABLES}_table.json",
-        f"{NUM_TABLES}_table",
-        [EMB_VEC_SIZE for _ in range(NUM_TABLES)],
-    )
-)
-inputs_seq = []
-for _ in range(NUM_ITERS + 1):
-    inputs = []
-    for i in range(NUM_TABLES):
-        inputs.append(
-            torch.randint(
-                i * VOCAB_SIZE,
-                (i + 1) * VOCAB_SIZE,
-                (MAX_BATCH_SIZE, NUM_QUERY_KEY),
-                dtype=torch.int32,
-            ).cuda()
-        )
-    inputs_seq.append(torch.stack(inputs))
-
-preds = model(inputs_seq[0])
-preds_seq = []
-start = time.time()
-for i in range(NUM_ITERS):
-    preds_seq.append(model(inputs_seq[i + 1]))
-end = time.time()
-print(
-    "[INFO] Elapsed time for "
-    + str(NUM_ITERS)
-    + " iterations: "
-    + str(end - start)
-    + " seconds"
-)
-preds_seq = torch.stack(preds_seq).cpu().numpy()
-
-preds_seq_gt = []
-for i in range(NUM_ITERS):
-    preds_seq_gt.append(np.concatenate(embedding_table[inputs_seq[i + 1].cpu().numpy()]))
-preds_seq_gt = np.array(preds_seq_gt)
-
-diff = preds_seq - preds_seq_gt
-mse = np.mean(diff * diff)
-assert mse <= 1e-6
-print(f"HPS Torch Plugin embedding lookup with table fusion, MSE: {mse} ")
-
-
-
-
-
=====================================================HPS Parse====================================================
-[HCTR][05:25:11.836][INFO][RK0][main]: Table fusion is enabled for HPS. Please ensure that there is no key value overlap in different tables and the embedding lookup layer has no dependency in the model graph. For more information, see https://nvidia-merlin.github.io/HugeCTR/main/hierarchical_parameter_server/hps_database_backend.html#configuration
-[HCTR][05:25:11.836][INFO][RK0][main]: fuse_embedding_table is not specified using default: 1
-[HCTR][05:25:11.839][INFO][RK0][main]: dense_file is not specified using default: 
-[HCTR][05:25:11.839][WARNING][RK0][main]: default_value_for_each_table.size() is not equal to the number of embedding tables
-[HCTR][05:25:11.839][INFO][RK0][main]: num_of_refresher_buffer_in_pool is not specified using default: 1
-[HCTR][05:25:11.839][INFO][RK0][main]: maxnum_des_feature_per_sample is not specified using default: 26
-[HCTR][05:25:11.839][INFO][RK0][main]: refresh_delay is not specified using default: 0
-[HCTR][05:25:11.839][INFO][RK0][main]: refresh_interval is not specified using default: 0
-[HCTR][05:25:11.839][INFO][RK0][main]: fuse_embedding_table is not specified using default: 1
-[HCTR][05:25:11.839][INFO][RK0][main]: use_static_table is not specified using default: 0
-[HCTR][05:25:11.839][INFO][RK0][main]: use_hctr_cache_implementation is not specified using default: 1
-[HCTR][05:25:11.839][INFO][RK0][main]: thread_pool_size is not specified using default: 16
-[HCTR][05:25:11.839][INFO][RK0][main]: init_ec is not specified using default: 1
-[HCTR][05:25:11.839][INFO][RK0][main]: HPS plugin uses context stream for model 8_table: True
-====================================================HPS Create====================================================
-[HCTR][05:25:11.840][INFO][RK0][main]: Creating HashMap CPU database backend...
-[HCTR][05:25:11.840][DEBUG][RK0][main]: Created blank database backend in local memory!
-[HCTR][05:25:11.840][INFO][RK0][main]: Volatile DB: initial cache rate = 1
-[HCTR][05:25:11.840][INFO][RK0][main]: Volatile DB: cache missed embeddings = 0
-[HCTR][05:25:11.840][DEBUG][RK0][main]: Created raw model loader in local memory!
-[HCTR][05:25:11.880][DEBUG][RK0][main]: Real-time subscribers created!
-[HCTR][05:25:11.880][INFO][RK0][main]: Creating embedding cache in device 0.
-[HCTR][05:25:11.880][INFO][RK0][main]: Model name: 8_table
-[HCTR][05:25:11.880][INFO][RK0][main]: Max batch size: 256
-[HCTR][05:25:11.880][INFO][RK0][main]: Fuse embedding tables: True
-[HCTR][05:25:11.880][INFO][RK0][main]: Number of embedding tables: 1
-[HCTR][05:25:11.880][INFO][RK0][main]: Embedding cache type: static
-[HCTR][05:25:11.880][INFO][RK0][main]: Use I64 input key: False
-[HCTR][05:25:11.880][INFO][RK0][main]: The size of worker memory pool: 8
-[HCTR][05:25:11.880][INFO][RK0][main]: The size of refresh memory pool: 1
-[HCTR][05:25:11.880][INFO][RK0][main]: The refresh percentage : 1.000000
-[HCTR][05:25:11.936][INFO][RK0][main]: Initialize the embedding cache by by inserting the same size model file with embedding cache from beginning
-[HCTR][05:25:11.936][DEBUG][RK0][main]: Created raw model loader in local memory!
-[HCTR][05:25:11.936][INFO][RK0][main]: EC initialization on device 0 for hps_et.8_table.fused_embedding0
-[HCTR][05:25:11.936][INFO][RK0][main]: To achieve the best performance, when using static table, the pointers of keys and vectors in HPS lookup should preferably be aligned to at least 16 Bytes.
-[HCTR][05:25:11.975][INFO][RK0][main]: Initialize the embedding table 0 for iteration 0 with number of 1000 keys.
-[HCTR][05:25:12.018][INFO][RK0][main]: Initialize the embedding table 0 for iteration 1 with number of 1000 keys.
-[HCTR][05:25:12.041][INFO][RK0][main]: Initialize the embedding table 0 for iteration 2 with number of 1000 keys.
-[HCTR][05:25:12.059][INFO][RK0][main]: Initialize the embedding table 0 for iteration 3 with number of 1000 keys.
-[HCTR][05:25:12.070][INFO][RK0][main]: Initialize the embedding table 0 for iteration 4 with number of 1000 keys.
-[HCTR][05:25:12.088][INFO][RK0][main]: Initialize the embedding table 0 for iteration 5 with number of 1000 keys.
-[HCTR][05:25:12.104][INFO][RK0][main]: Initialize the embedding table 0 for iteration 6 with number of 1000 keys.
-[HCTR][05:25:12.113][INFO][RK0][main]: Initialize the embedding table 0 for iteration 7 with number of 1000 keys.
-[HCTR][05:25:12.123][INFO][RK0][main]: Initialize the embedding table 0 for iteration 8 with number of 1000 keys.
-[HCTR][05:25:12.137][INFO][RK0][main]: Initialize the embedding table 0 for iteration 9 with number of 1000 keys.
-[HCTR][05:25:12.167][INFO][RK0][main]: Initialize the embedding table 0 for iteration 0 with number of 1000 keys.
-[HCTR][05:25:12.196][INFO][RK0][main]: Initialize the embedding table 0 for iteration 1 with number of 1000 keys.
-[HCTR][05:25:12.210][INFO][RK0][main]: Initialize the embedding table 0 for iteration 2 with number of 1000 keys.
-[HCTR][05:25:12.223][INFO][RK0][main]: Initialize the embedding table 0 for iteration 3 with number of 1000 keys.
-[HCTR][05:25:12.239][INFO][RK0][main]: Initialize the embedding table 0 for iteration 4 with number of 1000 keys.
-[HCTR][05:25:12.252][INFO][RK0][main]: Initialize the embedding table 0 for iteration 5 with number of 1000 keys.
-[HCTR][05:25:12.284][INFO][RK0][main]: Initialize the embedding table 0 for iteration 6 with number of 1000 keys.
-[HCTR][05:25:12.296][INFO][RK0][main]: Initialize the embedding table 0 for iteration 7 with number of 1000 keys.
-[HCTR][05:25:12.307][INFO][RK0][main]: Initialize the embedding table 0 for iteration 8 with number of 1000 keys.
-[HCTR][05:25:12.319][INFO][RK0][main]: Initialize the embedding table 0 for iteration 9 with number of 1000 keys.
-[HCTR][05:25:12.336][INFO][RK0][main]: Initialize the embedding table 0 for iteration 0 with number of 1000 keys.
-[HCTR][05:25:12.360][INFO][RK0][main]: Initialize the embedding table 0 for iteration 1 with number of 1000 keys.
-[HCTR][05:25:12.368][INFO][RK0][main]: Initialize the embedding table 0 for iteration 2 with number of 1000 keys.
-[HCTR][05:25:12.380][INFO][RK0][main]: Initialize the embedding table 0 for iteration 3 with number of 1000 keys.
-[HCTR][05:25:12.390][INFO][RK0][main]: Initialize the embedding table 0 for iteration 4 with number of 1000 keys.
-[HCTR][05:25:12.409][INFO][RK0][main]: Initialize the embedding table 0 for iteration 5 with number of 1000 keys.
-[HCTR][05:25:12.437][INFO][RK0][main]: Initialize the embedding table 0 for iteration 6 with number of 1000 keys.
-[HCTR][05:25:12.446][INFO][RK0][main]: Initialize the embedding table 0 for iteration 7 with number of 1000 keys.
-[HCTR][05:25:12.453][INFO][RK0][main]: Initialize the embedding table 0 for iteration 8 with number of 1000 keys.
-[HCTR][05:25:12.475][INFO][RK0][main]: Initialize the embedding table 0 for iteration 9 with number of 1000 keys.
-[HCTR][05:25:12.515][INFO][RK0][main]: Initialize the embedding table 0 for iteration 0 with number of 1000 keys.
-[HCTR][05:25:12.535][INFO][RK0][main]: Initialize the embedding table 0 for iteration 1 with number of 1000 keys.
-[HCTR][05:25:12.551][INFO][RK0][main]: Initialize the embedding table 0 for iteration 2 with number of 1000 keys.
-[HCTR][05:25:12.560][INFO][RK0][main]: Initialize the embedding table 0 for iteration 3 with number of 1000 keys.
-[HCTR][05:25:12.580][INFO][RK0][main]: Initialize the embedding table 0 for iteration 4 with number of 1000 keys.
-[HCTR][05:25:12.597][INFO][RK0][main]: Initialize the embedding table 0 for iteration 5 with number of 1000 keys.
-[HCTR][05:25:12.606][INFO][RK0][main]: Initialize the embedding table 0 for iteration 6 with number of 1000 keys.
-[HCTR][05:25:12.615][INFO][RK0][main]: Initialize the embedding table 0 for iteration 7 with number of 1000 keys.
-[HCTR][05:25:12.624][INFO][RK0][main]: Initialize the embedding table 0 for iteration 8 with number of 1000 keys.
-[HCTR][05:25:12.632][INFO][RK0][main]: Initialize the embedding table 0 for iteration 9 with number of 1000 keys.
-[HCTR][05:25:12.668][INFO][RK0][main]: Initialize the embedding table 0 for iteration 0 with number of 1000 keys.
-[HCTR][05:25:12.678][INFO][RK0][main]: Initialize the embedding table 0 for iteration 1 with number of 1000 keys.
-[HCTR][05:25:12.695][INFO][RK0][main]: Initialize the embedding table 0 for iteration 2 with number of 1000 keys.
-[HCTR][05:25:12.712][INFO][RK0][main]: Initialize the embedding table 0 for iteration 3 with number of 1000 keys.
-[HCTR][05:25:12.725][INFO][RK0][main]: Initialize the embedding table 0 for iteration 4 with number of 1000 keys.
-[HCTR][05:25:12.740][INFO][RK0][main]: Initialize the embedding table 0 for iteration 5 with number of 1000 keys.
-[HCTR][05:25:12.756][INFO][RK0][main]: Initialize the embedding table 0 for iteration 6 with number of 1000 keys.
-[HCTR][05:25:12.768][INFO][RK0][main]: Initialize the embedding table 0 for iteration 7 with number of 1000 keys.
-[HCTR][05:25:12.783][INFO][RK0][main]: Initialize the embedding table 0 for iteration 8 with number of 1000 keys.
-[HCTR][05:25:12.794][INFO][RK0][main]: Initialize the embedding table 0 for iteration 9 with number of 1000 keys.
-[HCTR][05:25:12.821][INFO][RK0][main]: Initialize the embedding table 0 for iteration 0 with number of 1000 keys.
-[HCTR][05:25:12.844][INFO][RK0][main]: Initialize the embedding table 0 for iteration 1 with number of 1000 keys.
-[HCTR][05:25:12.861][INFO][RK0][main]: Initialize the embedding table 0 for iteration 2 with number of 1000 keys.
-[HCTR][05:25:12.880][INFO][RK0][main]: Initialize the embedding table 0 for iteration 3 with number of 1000 keys.
-[HCTR][05:25:12.890][INFO][RK0][main]: Initialize the embedding table 0 for iteration 4 with number of 1000 keys.
-[HCTR][05:25:12.900][INFO][RK0][main]: Initialize the embedding table 0 for iteration 5 with number of 1000 keys.
-[HCTR][05:25:12.920][INFO][RK0][main]: Initialize the embedding table 0 for iteration 6 with number of 1000 keys.
-[HCTR][05:25:12.929][INFO][RK0][main]: Initialize the embedding table 0 for iteration 7 with number of 1000 keys.
-[HCTR][05:25:12.938][INFO][RK0][main]: Initialize the embedding table 0 for iteration 8 with number of 1000 keys.
-[HCTR][05:25:12.957][INFO][RK0][main]: Initialize the embedding table 0 for iteration 9 with number of 1000 keys.
-[HCTR][05:25:12.979][INFO][RK0][main]: Initialize the embedding table 0 for iteration 0 with number of 1000 keys.
-[HCTR][05:25:13.006][INFO][RK0][main]: Initialize the embedding table 0 for iteration 1 with number of 1000 keys.
-[HCTR][05:25:13.016][INFO][RK0][main]: Initialize the embedding table 0 for iteration 2 with number of 1000 keys.
-[HCTR][05:25:13.027][INFO][RK0][main]: Initialize the embedding table 0 for iteration 3 with number of 1000 keys.
-[HCTR][05:25:13.037][INFO][RK0][main]: Initialize the embedding table 0 for iteration 4 with number of 1000 keys.
-[HCTR][05:25:13.046][INFO][RK0][main]: Initialize the embedding table 0 for iteration 5 with number of 1000 keys.
-[HCTR][05:25:13.056][INFO][RK0][main]: Initialize the embedding table 0 for iteration 6 with number of 1000 keys.
-[HCTR][05:25:13.064][INFO][RK0][main]: Initialize the embedding table 0 for iteration 7 with number of 1000 keys.
-[HCTR][05:25:13.085][INFO][RK0][main]: Initialize the embedding table 0 for iteration 8 with number of 1000 keys.
-[HCTR][05:25:13.095][INFO][RK0][main]: Initialize the embedding table 0 for iteration 9 with number of 1000 keys.
-[HCTR][05:25:13.110][INFO][RK0][main]: Initialize the embedding table 0 for iteration 0 with number of 1000 keys.
-[HCTR][05:25:13.125][INFO][RK0][main]: Initialize the embedding table 0 for iteration 1 with number of 1000 keys.
-[HCTR][05:25:13.136][INFO][RK0][main]: Initialize the embedding table 0 for iteration 2 with number of 1000 keys.
-[HCTR][05:25:13.163][INFO][RK0][main]: Initialize the embedding table 0 for iteration 3 with number of 1000 keys.
-[HCTR][05:25:13.173][INFO][RK0][main]: Initialize the embedding table 0 for iteration 4 with number of 1000 keys.
-[HCTR][05:25:13.183][INFO][RK0][main]: Initialize the embedding table 0 for iteration 5 with number of 1000 keys.
-[HCTR][05:25:13.194][INFO][RK0][main]: Initialize the embedding table 0 for iteration 6 with number of 1000 keys.
-[HCTR][05:25:13.212][INFO][RK0][main]: Initialize the embedding table 0 for iteration 7 with number of 1000 keys.
-[HCTR][05:25:13.231][INFO][RK0][main]: Initialize the embedding table 0 for iteration 8 with number of 1000 keys.
-[HCTR][05:25:13.249][INFO][RK0][main]: Initialize the embedding table 0 for iteration 9 with number of 1000 keys.
-[HCTR][05:25:13.250][INFO][RK0][main]: LookupSession i64_input_key: False
-[HCTR][05:25:13.250][INFO][RK0][main]: Creating lookup session for 8_table on device: 0
-[INFO] Elapsed time for 100 iterations: 0.10996460914611816 seconds
-HPS Torch Plugin embedding lookup with table fusion, MSE: 0.0 
\ No newline at end of file
diff --git a/review/pr-458/hps_torch/notebooks/index.html b/review/pr-458/hps_torch/notebooks/index.html
deleted file mode 100644
index a26163f154..0000000000
--- a/review/pr-458/hps_torch/notebooks/index.html
+++ /dev/null
@@ -1,231 +0,0 @@
-Hierarchical Parameter Server Notebooks — Merlin HugeCTR documentation

Hierarchical Parameter Server Notebooks


This directory contains a set of Jupyter notebooks that demonstrate how to use HPS in PyTorch.


Quickstart

The simplest way to run one of our notebooks is with a Docker container. A container provides a self-contained, isolated, and reproducible environment for repetitive experiments. Docker images are available from the NVIDIA GPU Cloud (NGC). If you prefer to build the HugeCTR Docker image on your own, refer to Set Up the Development Environment With Merlin Containers.

Pull the NGC Docker


Pull the container using the following command:

docker pull nvcr.io/nvidia/merlin/merlin-hugectr:24.06

Clone the HugeCTR Repository


Use the following command to clone the HugeCTR repository:

git clone https://github.com/NVIDIA/HugeCTR

Start the Jupyter Notebook

  1. Launch the container in interactive mode (mount the HugeCTR root directory into the container for your convenience) by running this command:

    docker run --runtime=nvidia --rm -it --cap-add SYS_NICE -u $(id -u):$(id -g) -v $(pwd):/hugectr -w /hugectr -p 8888:8888 nvcr.io/nvidia/merlin/merlin-hugectr:24.06

  2. Start Jupyter using these commands:

    cd /hugectr/hps_torch/notebooks
    jupyter-notebook --allow-root --ip 0.0.0.0 --port 8888 --NotebookApp.token='hugectr'

  3. Connect to your host machine on port 8888 by accessing its IP address or name from your web browser: http://[host machine]:8888

    Log in with the token shown in the output of the command above. Because the command sets --NotebookApp.token='hugectr', the token is hugectr. For example:

    http://[host machine]:8888/?token=hugectr

Notebook List

Here’s a list of notebooks that you can run:

  • hps_torch_demo.ipynb: Demonstrates how to use the HPS plugin for Torch to conduct embedding lookup for inference.

System Specifications

The specifications of the system on which each notebook can run successfully are summarized in the table below. The notebooks were verified on this system; these specifications are not the minimum requirements.

Notebook              CPU                                      GPU                    #GPUs  Author
hps_torch_demo.ipynb  Intel® Xeon® CPU E5-2698 v4 @ 2.20GHz    Tesla V100-SXM2-32GB   1      Kingsley Liu
                      512 GB Memory                            32 GB Memory

\ No newline at end of file
diff --git a/review/pr-458/hps_trt/notebooks/benchmark_tf_trained_large_model.html b/review/pr-458/hps_trt/notebooks/benchmark_tf_trained_large_model.html
deleted file mode 100644
index 3c133b4865..0000000000
--- a/review/pr-458/hps_trt/notebooks/benchmark_tf_trained_large_model.html
+++ /dev/null
@@ -1,1195 +0,0 @@
-HPS TensorRT Plugin Benchmark for TensorFlow Large Model — Merlin HugeCTR documentation

HPS TensorRT Plugin Benchmark for TensorFlow Large Model


Overview


This notebook demonstrates how to benchmark the HPS-integrated TensorRT engine for the TensorFlow large model.


For more details about HPS, please refer to HugeCTR Hierarchical Parameter Server (HPS).

  1. Create the TF model.

  2. Build the HPS-integrated TensorRT engine.

  3. Benchmark the HPS-integrated TensorRT engine on Triton.

  4. Benchmark the HPS-integrated TensorRT engine on Grace Hopper.

Installation


Use NGC


The HPS TensorRT plugin is preinstalled in the 23.05 and later Merlin TensorFlow Container: nvcr.io/nvidia/merlin/merlin-tensorflow:23.05.


You can check the existence of the required libraries by running the following Python code after launching this container.

import ctypes
-plugin_lib_name = "/usr/local/hps_trt/lib/libhps_plugin.so"
-plugin_handle = ctypes.CDLL(plugin_lib_name, mode=ctypes.RTLD_GLOBAL)
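If the library is missing or cannot be loaded, ctypes.CDLL raises an OSError and stops the notebook. The following is a minimal sketch of a more forgiving check that reports what went wrong. The helper name try_load_hps_plugin is hypothetical, and the default path is simply the one used in this notebook.

```python
import ctypes
import os

# Install location used in this notebook; other containers may place the plugin elsewhere.
DEFAULT_PLUGIN_PATH = "/usr/local/hps_trt/lib/libhps_plugin.so"


def try_load_hps_plugin(path=DEFAULT_PLUGIN_PATH):
    """Hypothetical helper: return a ctypes handle to the HPS plugin, or None on failure."""
    if not os.path.isfile(path):
        print(f"HPS plugin library not found at {path}")
        return None
    try:
        # RTLD_GLOBAL matches the mode used above: the library's symbols become
        # visible to libraries that are loaded afterwards.
        return ctypes.CDLL(path, mode=ctypes.RTLD_GLOBAL)
    except OSError as err:
        print(f"Found {path}, but it could not be loaded: {err}")
        return None


handle = try_load_hps_plugin()
print("HPS plugin loaded" if handle is not None else "HPS plugin unavailable")
```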

1. Create the TF model


We define the model graph with native TF layers, e.g., tf.nn.embedding_lookup and tf.keras.layers.Dense. The embedding lookup layer is only a placeholder here; it will later be replaced by the HPS plugin so that the 147GB embedding table can be looked up efficiently.
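To make clear what the placeholder computes, and therefore what the HPS plugin takes over, here is a NumPy sketch of the same two steps in LiteModel.call: gather one embedding row per categorical ID, then average over the slot axis. The function name embedding_lookup_mean is hypothetical and the tiny table is toy data.

```python
import numpy as np


def embedding_lookup_mean(params, ids):
    """NumPy equivalent of tf.nn.embedding_lookup followed by reduce_mean over axis 1.

    params: [num_rows, embed_vec_size] embedding table
    ids:    [batch_size, slot_num] integer row indices
    returns [batch_size, embed_vec_size]
    """
    gathered = params[ids]          # [batch_size, slot_num, embed_vec_size]
    return gathered.mean(axis=1)    # average the slot embeddings of each sample


table = np.arange(12, dtype=np.float32).reshape(4, 3)   # 4 rows, vector size 3
ids = np.array([[0, 1], [2, 3]])                          # batch of 2 samples, 2 slots each
print(embedding_lookup_mean(table, ids))
# [[1.5 2.5 3.5]
#  [7.5 8.5 9.5]]
```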

import numpy as np
-import tensorflow as tf
2023-06-08 02:22:18.552734: I tensorflow/core/platform/cpu_feature_guard.cc:194] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  SSE3 SSE4.1 SSE4.2 AVX
-To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
class LiteModel(tf.keras.models.Model):
-    def __init__(self,
-                 init_tensors,
-                 embed_vec_size,
-                 slot_num,
-                 dense_dim,
-                 **kwargs):
-        super(LiteModel, self).__init__(**kwargs)
-        
-        self.init_tensors = init_tensors
-        self.params = tf.Variable(initial_value=tf.concat(self.init_tensors, axis=0))
-        
-        self.embed_vec_size = embed_vec_size
-        self.slot_num = slot_num
-        self.dense_dim = dense_dim
-
-        self.concat1 = tf.keras.layers.Concatenate(axis=1, name = "concat1")
-        self.fc1 = tf.keras.layers.Dense(1024, activation=None, name="fc1")
-        self.fc2 = tf.keras.layers.Dense(256, activation=None, name="fc2")
-        self.fc3 = tf.keras.layers.Dense(1, activation=None, name="fc3")            
-    
-    def call(self, inputs, training=True):
-        categorical_features = inputs["categorical_features"]
-        numerical_features = inputs["numerical_features"]
-        
-        embedding_vector = tf.nn.embedding_lookup(params=self.params, ids=categorical_features)
-        reduced_embedding = tf.math.reduce_mean(embedding_vector, axis=1, keepdims=False)
-        concat_features = self.concat1([reduced_embedding, numerical_features])
-        
-        logit = self.fc3(self.fc2(self.fc1(concat_features)))
-        return logit
-
-    def summary(self):
-        inputs = {"categorical_features": tf.keras.Input(shape=(self.slot_num, ), dtype=tf.int32, name="categorical_features"), 
-                  "numerical_features": tf.keras.Input(shape=(self.dense_dim, ), dtype=tf.float32, name="numrical_features")}
-        model = tf.keras.models.Model(inputs=inputs, outputs=self.call(inputs))
-        return model.summary()
# This is the placeholder embedding table. The real embedding table is 147GB.
-init_tensors = np.ones(shape=[10000, 128], dtype=np.float32)
-model = LiteModel(init_tensors, 128, 26, 13, name = "dlrm")
-model.summary()
-categorical_features = np.random.randint(0,100, (4096,26))
-numerical_features = np.random.random((4096, 13))
-inputs = {"categorical_features": categorical_features, "numerical_features": numerical_features}
-model(inputs)
-model.save("3fc_light.savedmodel")
-
-# Release the occupied GPU memory by TensorFlow and Keras
-from numba import cuda
-cuda.select_device(0)
-cuda.close()
WARNING:tensorflow:The following Variables were used in a Lambda layer's call (tf.compat.v1.nn.embedding_lookup_1), but are not present in its tracked objects:   <tf.Variable 'Variable:0' shape=(10000, 128) dtype=float32>. This is a strong indication that the Lambda layer should be rewritten as a subclassed Layer.
-Model: "model_1"
-__________________________________________________________________________________________________
- Layer (type)                   Output Shape         Param #     Connected to                     
-==================================================================================================
- categorical_features (InputLay  [(None, 26)]        0           []                               
- er)                                                                                              
-                                                                                                  
- tf.compat.v1.nn.embedding_look  (None, 26, 128)     0           ['categorical_features[0][0]']   
- up_1 (TFOpLambda)                                                                                
-                                                                                                  
- tf.math.reduce_mean_1 (TFOpLam  (None, 128)         0           ['tf.compat.v1.nn.embedding_looku
- bda)                                                            p_1[0][0]']                      
-                                                                                                  
- numrical_features (InputLayer)  [(None, 13)]        0           []                               
-                                                                                                  
- concat1 (Concatenate)          (None, 141)          0           ['tf.math.reduce_mean_1[0][0]',  
-                                                                  'numrical_features[0][0]']      
-                                                                                                  
- fc1 (Dense)                    (None, 1024)         145408      ['concat1[0][0]']                
-                                                                                                  
- fc2 (Dense)                    (None, 256)          262400      ['fc1[0][0]']                    
-                                                                                                  
- fc3 (Dense)                    (None, 1)            257         ['fc2[0][0]']                    
-                                                                                                  
-==================================================================================================
-Total params: 408,065
-Trainable params: 408,065
-Non-trainable params: 0
-__________________________________________________________________________________________________
-INFO:tensorflow:Assets written to: 3fc_light.savedmodel/assets

2. Build the HPS-integrated TensorRT engine


To use HPS in the inference stage, we create a JSON configuration file and use the 147GB embedding table files in HPS format.

Then we convert the TF SavedModel to ONNX and use the ONNX GraphSurgeon tool to replace the native TF embedding lookup layer with a placeholder for the HPS TensorRT plugin layer.

After that, we can build the TensorRT engine, which consists of the HPS TensorRT plugin layer and the dense network.

Step 1: Prepare the 147GB embedding table

The instructions below assume that you need to train a 147GB DLRM model from scratch with DeepLearningExamples, using the 1TB Criteo dataset.
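To put the table size in perspective, the following back-of-the-envelope calculation estimates how many embedding vectors 147GB corresponds to. It assumes FP32 values (4 bytes each), 128-element vectors (the embedding_vecsize_per_table value used later in this notebook), and 1 GB = 10**9 bytes. It ignores the int64 key file.

```python
# Rough estimate only; all three constants are assumptions stated above.
TABLE_BYTES = 147 * 10**9
EMBED_VEC_SIZE = 128
BYTES_PER_VALUE = 4

bytes_per_row = EMBED_VEC_SIZE * BYTES_PER_VALUE   # 512 bytes per embedding vector
num_rows = TABLE_BYTES // bytes_per_row
print(f"~{num_rows:,} embedding vectors")           # ~287,109,375 embedding vectors
```

With GiB (2**30 bytes) instead of GB, the estimate rises to roughly 308 million vectors; either way, the table is far too large for the 32 GB of a single V100.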


1.1 Train a 147GB model from scratch


Please refer to the Quick Start Guide for DLRM, which contains the following important steps:

  1. Build the training Docker container.

  2. Preprocess the training dataset.
    You have two options: use the 1TB Criteo dataset to train a 147GB model (this requires Spark for data preprocessing), or use a synthetic dataset (which skips preprocessing) for fast verification.

    2.1 Preprocess the Criteo 1TB dataset
    A. Change the frequency limit to 2 in step 6, "Preprocess the data", to generate a training dataset for the 147GB embedding model file.
    B. Ensure that the final_output_dir path is consistent with the input data path in Part 3, Step 2: Prepare the benchmark input data.
    Note: The frequency limit is used to filter out the categorical values that appear fewer than n times in the whole dataset and map them to 0. Change this variable to 1 to enable it. The default frequency limit in the script is 15; you can change it by editing the line OPTS="--frequency_limit 8".

    2.2 Generate synthetic data in the same format as Criteo
    Downloading and preprocessing the Criteo 1TB dataset requires a lot of time and disk space. Because of this, we provide a synthetic dataset generator that roughly matches the Criteo 1TB characteristics, which enables you to benchmark quickly.

  3. Run the training and save a model checkpoint, if you haven't completed these steps already.
    Note: If the model is saved successfully, each categorical feature of the Criteo data is exported as a feature_*.npy file, which is the embedding table for that feature. In the next step, you will merge these .npy files into a single binary embedding table for HPS.
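The frequency-limit filtering described in step 2 can be pictured with a small in-memory sketch. This is not the DeepLearningExamples implementation, which processes the full dataset with Spark; the helper name apply_frequency_limit is hypothetical, and the sketch only shows how values that occur fewer than limit times are mapped to 0.

```python
import numpy as np


def apply_frequency_limit(column, limit):
    """Map categorical values that occur fewer than `limit` times in `column` to 0."""
    values, counts = np.unique(column, return_counts=True)
    rare = values[counts < limit]                  # values below the frequency limit
    return np.where(np.isin(column, rare), 0, column)


col = np.array([7, 7, 3, 9, 9, 9, 5])
print(apply_frequency_limit(col, 2))   # [7 7 0 9 9 9 0]
```

With a limit of 2, as suggested in step A, every value that appears only once collapses to 0. This is why a lower limit keeps more distinct categories and produces a larger embedding table.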

1.2 Get the embedding model file in HPS format

After completing the model training and obtaining the 147GB embedding model file, you need to convert the embedding table file to an HPS-format embedding file. You only need to complete the first three steps in the Quick Start Guide to obtain the HPS-format embedding file:

  1. Build the Merlin HPS Docker container.

  2. Run the training Docker container built during the training stage.

  3. Convert the model checkpoint into a Triton model repository.

You will then find a folder named sparse under the deploy_path used for the format conversion (the path provided in steps 2 and 3). It contains the two embedding table files in HPS format: emb_vector and key.

Copy these two files (emb_vector and key) to the deployment path, model_repo/hps_model/1/dlrm_sparse.model.

!tree -n model_repo/hps_model/1/dlrm_sparse.model
model_repo/hps_model/1/dlrm_sparse.model
-├── emb_vector
-└── key
-
-0 directories, 2 files
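As we understand the HPS format, the two files listed above are flat binary arrays: key holds int64 keys, and emb_vector holds one float32 vector per key, stored back to back. To show what merging the per-feature feature_*.npy tables into this layout involves, here is a hypothetical sketch. The function name merge_tables_to_hps and the key-assignment scheme (each table's row indices offset by the rows of the preceding tables) are assumptions and may differ from what the Quick Start Guide's conversion produces, so use the official conversion for deployment.

```python
import os

import numpy as np


def merge_tables_to_hps(npy_paths, out_dir):
    """Hypothetical sketch: merge per-feature tables into HPS-style `key` and `emb_vector` files.

    Assumes each .npy file holds a [rows, vec_size] table and that keys are the
    row indices offset by the number of rows in all preceding tables.
    """
    os.makedirs(out_dir, exist_ok=True)
    offset = 0
    with open(os.path.join(out_dir, "key"), "wb") as key_f:
        with open(os.path.join(out_dir, "emb_vector"), "wb") as vec_f:
            for path in npy_paths:
                table = np.load(path, mmap_mode="r")   # memory-map to avoid loading huge tables at once
                rows = table.shape[0]
                np.arange(offset, offset + rows, dtype=np.int64).tofile(key_f)  # int64 keys
                np.asarray(table, dtype=np.float32).tofile(vec_f)               # row-major float32 vectors
                offset += rows
    return offset  # total number of keys written
```

Because the tables are memory-mapped and written one at a time, peak memory stays close to the size of the largest single feature table rather than the full 147GB.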

Step 2: Prepare the JSON configuration file for HPS

Note that the keys in the dlrm_sparse.model/key file are stored as int64, while the HPS TensorRT plugin currently supports only int32 keys when loading them into memory.
Note: To facilitate benchmarking from the minimum to the maximum batch size, we set the maximum batch size to 65536. For better performance, set the batch size for each case independently. For instance, a configuration with a batch size of 32 is used only for batches of 32 samples.
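Because the plugin is configured with "supportlonglong": false, it is worth confirming that every int64 key in the key file actually fits in the int32 range before building the engine. The following is a minimal sketch; the helper name keys_fit_int32 is hypothetical. It memory-maps the file because the key file of a large model can be several GB.

```python
import os

import numpy as np


def keys_fit_int32(key_file):
    """Hypothetical helper: True if every int64 key in an HPS `key` file lies in the int32 range."""
    if os.path.getsize(key_file) == 0:
        return True                                   # no keys, nothing to check
    keys = np.memmap(key_file, dtype=np.int64, mode="r")
    info = np.iinfo(np.int32)
    return bool(keys.min() >= info.min and keys.max() <= info.max)


# Usage: keys_fit_int32("model_repo/hps_model/1/dlrm_sparse.model/key")
```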

%%writefile light.json
-{
-    "supportlonglong": false,
-    "models": [{
-        "model": "light",
-        "sparse_files": ["/hugectr/hps_trt/notebooks/model_repo/hps_model/1/dlrm_sparse.model"],
-        "num_of_worker_buffer_in_pool": 1,
-        "num_of_refresher_buffer_in_pool": 0,
-        "embedding_table_names":["sparse_embedding1"],
-        "embedding_vecsize_per_table": [128],
-        "maxnum_catfeature_query_per_table_per_sample": [26],
-        "default_value_for_each_table": [1.0],
-        "deployed_device_list": [0],
-        "max_batch_size": 65536,
-        "cache_refresh_percentage_per_iteration": 0.0,
-        "hit_rate_threshold": 1.0,
-        "gpucacheper": 0.00,
-        "gpucache": true,
-        "init_ec": false,
-        "embedding_cache_type": "static",
-        "enable_pagelock": true,
-        "use_context_stream": true
-        }
-    ]
-}
Writing light.json
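The HPS configuration must be strict JSON; a key written as "enable_pagelock" = true, or a missing comma, makes the file unparsable. The following sketch parses the file with Python's json module and checks that each per-table setting has one entry per embedding table. The helper name check_hps_config and the choice of which keys to check are assumptions for illustration, not an HPS API.

```python
import json

# Settings that hold one value per embedding table in each model entry.
PER_TABLE_KEYS = (
    "embedding_vecsize_per_table",
    "maxnum_catfeature_query_per_table_per_sample",
    "default_value_for_each_table",
)


def check_hps_config(path):
    """Hypothetical helper: parse an HPS JSON config and report per-table length mismatches."""
    with open(path) as f:
        config = json.load(f)   # raises json.JSONDecodeError if the file is not valid JSON
    problems = []
    for model in config["models"]:
        num_tables = len(model["embedding_table_names"])
        for key in PER_TABLE_KEYS:
            if len(model.get(key, [])) != num_tables:
                problems.append(f"{model['model']}: expected {num_tables} value(s) for {key}")
    return problems


# Usage: print(check_hps_config("light.json") or "light.json looks consistent")
```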

Step 3: Convert to ONNX and perform ONNX graph surgery

# convert TF SavedModel to ONNX
-!python -m tf2onnx.convert --saved-model 3fc_light.savedmodel --output 3fc_light.onnx
2023-06-08 02:28:29.577492: I tensorflow/core/platform/cpu_feature_guard.cc:194] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  SSE3 SSE4.1 SSE4.2 AVX
-To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
-/usr/lib/python3.8/runpy.py:127: RuntimeWarning: 'tf2onnx.convert' found in sys.modules after import of package 'tf2onnx', but prior to execution of 'tf2onnx.convert'; this may result in unpredictable behaviour
-  warn(RuntimeWarning(msg))
-2023-06-08 02:28:32,462 - WARNING - ***IMPORTANT*** Installed protobuf is not cpp accelerated. Conversion will be extremely slow. See https://github.com/onnx/tensorflow-onnx/issues/1557
-2023-06-08 02:28:36,132 - WARNING - '--tag' not specified for saved_model. Using --tag serve
-2023-06-08 02:28:36,928 - INFO - Signatures found in model: [serving_default].
-2023-06-08 02:28:36,928 - WARNING - '--signature_def' not specified, using first signature: serving_default
-2023-06-08 02:28:36,928 - INFO - Output names: ['output_1']
-2023-06-08 02:28:37,440 - INFO - Using tensorflow=2.11.0, onnx=1.14.0, tf2onnx=1.14.0/8f8d49
-2023-06-08 02:28:37,440 - INFO - Using opset <onnx, 15>
-2023-06-08 02:28:37,459 - INFO - Computed 0 values for constant folding
-2023-06-08 02:28:37,482 - INFO - Optimizing ONNX model
-2023-06-08 02:28:37,541 - INFO - After optimization: Const -3 (7->4), Identity -2 (2->0)
-2023-06-08 02:28:37,781 - INFO - 
-2023-06-08 02:28:37,781 - INFO - Successfully converted TensorFlow model 3fc_light.savedmodel to ONNX
-2023-06-08 02:28:37,781 - INFO - Model inputs: ['categorical_features', 'numerical_features']
-2023-06-08 02:28:37,781 - INFO - Model outputs: ['output_1']
-2023-06-08 02:28:37,781 - INFO - ONNX model is saved at 3fc_light.onnx
# ONNX graph surgery to insert the HPS TensorRT plugin placeholder
-import onnx_graphsurgeon as gs
-from onnx import  shape_inference
-import numpy as np
-import onnx
-
-graph = gs.import_onnx(onnx.load("3fc_light.onnx"))
-saved = []
-
-for node in graph.nodes:
-    if node.name == "StatefulPartitionedCall/dlrm/embedding_lookup":
-        categorical_features = gs.Variable(name="categorical_features", dtype=np.int32, shape=("unknown", 26))
-        hps_node = gs.Node(op="HPS_TRT", attrs={"ps_config_file": "light.json\0", "model_name": "light\0", "table_id": 0, "emb_vec_size": 128}, 
-                           inputs=[categorical_features], outputs=[node.outputs[0]])
-        graph.nodes.append(hps_node)
-        saved.append(categorical_features)
-        node.outputs.clear()
-for i in graph.inputs:
-    if i.name == "numerical_features":
-        saved.append(i)
-graph.inputs = saved
-
-graph.cleanup().toposort()
-onnx.save(gs.export_onnx(graph), "3fc_light_with_hps.onnx")
[W] colored module is not installed, will not use colors when logging. To enable colors, please install the colored module: python3 -m pip install colored
-[W] Found distinct tensors that share the same name:
-[id: 139822124016208] Variable (categorical_features): (shape=('unknown', 26), dtype=<class 'numpy.int32'>)
-[id: 139821990953120] Variable (categorical_features): (shape=['unk__6', 26], dtype=int64)
-Note: Producer node(s) of first tensor:
-[]
-Producer node(s) of second tensor:
-[]
-[W] colored module is not installed, will not use colors when logging. To enable colors, please install the colored module: python3 -m pip install colored
-[W] Found distinct tensors that share the same name:
-[id: 139821990953120] Variable (categorical_features): (shape=['unk__6', 26], dtype=int64)
-[id: 139822124016208] Variable (categorical_features): (shape=('unknown', 26), dtype=<class 'numpy.int32'>)
-Note: Producer node(s) of first tensor:
-[]
-Producer node(s) of second tensor:
-[]

Step 4: Build the TensorRT engine

import tensorrt as trt
-import ctypes
-
-TRT_LOGGER = trt.Logger(trt.Logger.INFO)
-EXPLICIT_BATCH = 1 << (int)(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)
-
-def create_hps_plugin_creator():
-    trt_version = [int(n) for n in trt.__version__.split('.')]
-
-    plugin_lib_name = "/usr/local/hps_trt/lib/libhps_plugin.so"
-    handle = ctypes.CDLL(plugin_lib_name, mode=ctypes.RTLD_GLOBAL)
-
-    trt.init_libnvinfer_plugins(TRT_LOGGER, "")
-    plg_registry = trt.get_plugin_registry()
-
-    for plugin_creator in plg_registry.plugin_creator_list:
-        if plugin_creator.name[0] == "H":
-            print(plugin_creator.name)
-
-    hps_plugin_creator = plg_registry.get_plugin_creator("HPS_TRT", "1", "")
-    return hps_plugin_creator
-
-def build_engine_from_onnx(onnx_model_path):
-    with trt.Builder(TRT_LOGGER) as builder, builder.create_network(EXPLICIT_BATCH) as network, trt.OnnxParser(network, TRT_LOGGER) as parser, builder.create_builder_config() as builder_config:        
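-        with open(onnx_model_path, 'rb') as model:
-            if not parser.parse(model.read()):
-                # Report why the ONNX model could not be parsed instead of failing later
-                for i in range(parser.num_errors):
-                    print(parser.get_error(i))
-                raise RuntimeError(f"Failed to parse {onnx_model_path}")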
-        print(network.num_layers)
-        
-        builder_config.set_flag(trt.BuilderFlag.FP16)
-        profile = builder.create_optimization_profile()        
-        profile.set_shape("categorical_features", (1, 26), (1024, 26), (65536, 26))    
-        profile.set_shape("numerical_features", (1, 13), (1024, 13), (65536, 13))
-        builder_config.add_optimization_profile(profile)
-
-        engine = builder.build_serialized_network(network, builder_config)
- 
-        return engine
-
-create_hps_plugin_creator()
-serialized_engine = build_engine_from_onnx("3fc_light_with_hps.onnx")
-with open("dynamic_3fc_light.trt", "wb") as fout:
-    fout.write(serialized_engine)
HPS_TRT
-[06/08/2023-02:37:03] [TRT] [I] [MemUsageChange] Init CUDA: CPU +974, GPU +0, now: CPU 2531, GPU 661 (MiB)
-[06/08/2023-02:37:09] [TRT] [I] [MemUsageChange] Init builder kernel library: CPU +336, GPU +74, now: CPU 2943, GPU 735 (MiB)
-[06/08/2023-02:37:09] [TRT] [W] CUDA lazy loading is not enabled. Enabling it can significantly reduce device memory usage and speed up TensorRT initialization. See "Lazy Loading" section of CUDA documentation https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#lazy-loading
-[06/08/2023-02:37:09] [TRT] [I] No importer registered for op: HPS_TRT. Attempting to import as plugin.
-[06/08/2023-02:37:09] [TRT] [I] Searching for plugin: HPS_TRT, plugin_version: 1, plugin_namespace: 
-=====================================================HPS Parse====================================================
-[HCTR][02:37:09.116][INFO][RK0][main]: fuse_embedding_table is not specified using default: 0
-[HCTR][02:37:09.116][INFO][RK0][main]: dense_file is not specified using default: 
-[HCTR][02:37:09.117][INFO][RK0][main]: maxnum_des_feature_per_sample is not specified using default: 26
-[HCTR][02:37:09.117][INFO][RK0][main]: refresh_delay is not specified using default: 0
-[HCTR][02:37:09.117][INFO][RK0][main]: refresh_interval is not specified using default: 0
-[HCTR][02:37:09.117][INFO][RK0][main]: use_static_table is not specified using default: 0
-[HCTR][02:37:09.117][INFO][RK0][main]: use_hctr_cache_implementation is not specified using default: 1
-[HCTR][02:37:09.117][INFO][RK0][main]: HPS plugin uses context stream for model light: True
-====================================================HPS Create====================================================
-[HCTR][02:37:09.117][INFO][RK0][main]: Creating HashMap CPU database backend...
-[HCTR][02:37:09.117][DEBUG][RK0][main]: Created blank database backend in local memory!
-[HCTR][02:37:09.117][INFO][RK0][main]: Volatile DB: initial cache rate = 1
-[HCTR][02:37:09.117][INFO][RK0][main]: Volatile DB: cache missed embeddings = 0
-[HCTR][02:37:09.117][DEBUG][RK0][main]: Created raw model loader in local memory!
-[HCTR][02:37:09.177][DEBUG][RK0][main]: Real-time subscribers created!
-[HCTR][02:37:09.177][INFO][RK0][main]: Creating embedding cache in device 0.
-[HCTR][02:37:09.177][INFO][RK0][main]: Model name: light
-[HCTR][02:37:09.177][INFO][RK0][main]: Max batch size: 65536
-[HCTR][02:37:09.177][INFO][RK0][main]: Fuse embedding tables: False
-[HCTR][02:37:09.177][INFO][RK0][main]: Number of embedding tables: 1
-[HCTR][02:37:09.177][INFO][RK0][main]: Use static table: False
-[HCTR][02:37:09.177][INFO][RK0][main]: Use I64 input key: False
-[HCTR][02:37:09.177][INFO][RK0][main]: The size of worker memory pool: 1
-[HCTR][02:37:09.177][INFO][RK0][main]: The size of refresh memory pool: 0
-[HCTR][02:37:09.177][INFO][RK0][main]: The refresh percentage : 0.000000
-[HCTR][02:38:09.140][INFO][RK0][main]: Initialize the embedding cache by by inserting the same size model file with embedding cache from beginning
-[HCTR][02:38:09.140][DEBUG][RK0][main]: Created raw model loader in local memory!
-[HCTR][02:38:09.141][INFO][RK0][main]: EC initialization on device 0 for hps_et.light.sparse_embedding1
-[HCTR][03:41:57.227][INFO][RK0][main]: LookupSession i64_input_key: False
-[HCTR][03:41:57.227][INFO][RK0][main]: Creating lookup session for light on device: 0
-[06/08/2023-03:41:57] [TRT] [I] Successfully created plugin: HPS_TRT
-9
-[06/08/2023-03:41:57] [TRT] [I] BuilderFlag::kTF32 is set but hardware does not support TF32. Disabling TF32.
-[06/08/2023-03:41:57] [TRT] [I] Graph optimization time: 0.0088975 seconds.
-[06/08/2023-03:41:57] [TRT] [I] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +1, GPU +8, now: CPU 2952, GPU 12129 (MiB)
-[06/08/2023-03:41:57] [TRT] [I] [MemUsageChange] Init cuDNN: CPU +349, GPU +190, now: CPU 3301, GPU 12319 (MiB)
-[06/08/2023-03:41:57] [TRT] [W] TensorRT was linked against cuDNN 8.9.0 but loaded cuDNN 8.7.0
-[06/08/2023-03:41:57] [TRT] [I] BuilderFlag::kTF32 is set but hardware does not support TF32. Disabling TF32.
-[06/08/2023-03:41:57] [TRT] [I] Local timing cache in use. Profiling results in this builder pass will not be stored.
-[06/08/2023-03:42:03] [TRT] [I] Detected 2 inputs and 1 output network tensors.
-[06/08/2023-03:42:03] [TRT] [I] Total Host Persistent Memory: 16672
-[06/08/2023-03:42:03] [TRT] [I] Total Device Persistent Memory: 0
-[06/08/2023-03:42:03] [TRT] [I] Total Scratch Memory: 0
-[06/08/2023-03:42:03] [TRT] [I] [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 3 MiB, GPU 1248 MiB
-[06/08/2023-03:42:03] [TRT] [I] [BlockAssignment] Started assigning block shifts. This will take 10 steps to complete.
-[06/08/2023-03:42:03] [TRT] [I] [BlockAssignment] Algorithm ShiftNTopDown took 0.040758ms to assign 3 blocks to 10 nodes requiring 905970176 bytes.
-[06/08/2023-03:42:03] [TRT] [I] Total Activation Memory: 905969664
-[06/08/2023-03:42:03] [TRT] [I] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +8, now: CPU 3302, GPU 12397 (MiB)
-[06/08/2023-03:42:03] [TRT] [I] [MemUsageChange] Init cuDNN: CPU +0, GPU +10, now: CPU 3302, GPU 12407 (MiB)
-[06/08/2023-03:42:03] [TRT] [W] TensorRT was linked against cuDNN 8.9.0 but loaded cuDNN 8.7.0
-[06/08/2023-03:42:03] [TRT] [W] TensorRT encountered issues when converting weights between types and that could affect accuracy.
-[06/08/2023-03:42:03] [TRT] [W] If this is not the desired behavior, please modify the weights or retrain with regularization to adjust the magnitude of the weights.
-[06/08/2023-03:42:03] [TRT] [W] Check verbose logs for the list of affected weights.
-[06/08/2023-03:42:03] [TRT] [W] - 2 weights are affected by this issue: Detected subnormal FP16 values.
-[06/08/2023-03:42:03] [TRT] [I] [MemUsageChange] TensorRT-managed allocation in building engine: CPU +0, GPU +4, now: CPU 0, GPU 4 (MiB)
-
-
-
-
-

-
-
-
-

3. Benchmark HPS-integrated TensorRT engine on Triton

-
-

Step 1: Create the model repository

-

To benchmark the TensorRT engine with the Triton TensorRT backend, we first need to create the model repository and define config.pbtxt.

-
-
-
!mkdir -p model_repo/dynamic_3fc_lite_hps_trt/1
-!mv dynamic_3fc_light.trt model_repo/dynamic_3fc_lite_hps_trt/1
-
-
-
-
-
-
-
%%writefile model_repo/dynamic_3fc_lite_hps_trt/config.pbtxt
-
-platform: "tensorrt_plan"
-default_model_filename: "dynamic_3fc_light.trt"
-backend: "tensorrt"
-max_batch_size: 0
-input [
-  {
-    name: "categorical_features"
-    data_type: TYPE_INT32
-    dims: [-1,26]
-  },
-  {
-    name: "numerical_features"
-    data_type: TYPE_FP32
-    dims: [-1,13]
-  }
-]
-output [
-  {
-      name: "output_1"
-      data_type: TYPE_FP32
-      dims: [-1,1]
-  }
-]
-instance_group [
-  {
-    count: 1
-    kind: KIND_GPU
-    gpus:[0]
-
-  }
-]
-
-
-
-
-
Overwriting model_repo/dynamic_3fc_lite_hps_trt/config.pbtxt
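Triton loads a model from <repository>/<model_name>/, where config.pbtxt sits at the top and each numbered subdirectory holds one version of the model file. As a minimal sketch, the layout created above can be checked before starting the server. check_model_repo is a hypothetical helper written for this notebook, not a Triton or HugeCTR API, and it checks only the two files used here.

```python
from pathlib import Path


def check_model_repo(repo: str, model: str, plan_file: str, version: str = "1") -> list:
    """Return a list of problems with a Triton model directory (empty if the layout looks right).

    Hypothetical helper: it only checks the files used in this notebook,
    not everything Triton validates when it loads a model.
    """
    model_dir = Path(repo) / model
    problems = []
    if not (model_dir / "config.pbtxt").is_file():
        problems.append(f"missing {model_dir / 'config.pbtxt'}")
    if not (model_dir / version / plan_file).is_file():
        problems.append(f"missing {model_dir / version / plan_file}")
    return problems


if __name__ == "__main__":
    issues = check_model_repo("model_repo", "dynamic_3fc_lite_hps_trt", "dynamic_3fc_light.trt")
    print("layout OK" if not issues else "\n".join(issues))
```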
-
-
-
-
-
-
-
!tree -n model_repo/dynamic_3fc_lite_hps_trt
-
-
-
-
-
model_repo/dynamic_3fc_lite_hps_trt
-├── 1
-│   └── dynamic_3fc_light.trt
-└── config.pbtxt
-
-1 directory, 2 files
-
-
-
-
-
-
-

Step 2: Prepare the benchmark input data

-

To benchmark with the Triton Performance Analyzer, we need to prepare input data in the required format, based on the Criteo dataset.

-

In Part 2, Section 1.1, we created the binary dataset in final_output_dir. We provide a script that converts this binary data into a JSON format that can be fed into the Triton Performance Analyzer. To use the script, you will again need the DeepLearningExamples repository. Make sure you have added it to your $PYTHONPATH.

-

If you have not, run export PYTHONPATH=/DeepLearningExamples/TensorFlow2/Recommendation/DLRM_and_DCNv2, replacing the path with the correct path in your workspace.

-
-
-
#Create a dir to store the JSON format data
-!mkdir -p ./perf_data
-
-
-
-
-

Please note that the following script takes several minutes to finish.

-
-
-
#Run the Python script to convert binary data to JSON format
-!python spark2json.py --result-path ./perf_data --dataset_path /path/to/your/binary_split_converted_data --num-benchmark-samples 2000000
-
-
-
-
-

Replace --dataset_path with the correct path in your workspace, and set --num-benchmark-samples to the number of samples you want to use.

-
-
-
!tree -n perf_data
-
-
-
-
-
perf_data
-├── 1024.json
-├── 16384.json
-├── 2048.json
-├── 256.json
-├── 32768.json
-├── 4096.json
-├── 512.json
-├── 65536.json
-└── 8192.json
-
-0 directories, 9 files
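The --input-data files follow perf_analyzer's JSON input format, in which each entry under "data" maps an input name to its values. The sketch below illustrates that layout under stated assumptions; it is not the output of spark2json.py. make_perf_input is hypothetical, the values are random placeholders rather than Criteo samples, and vocab_size=10000 is arbitrary. Tensors are written flat, in row-major order; their shapes come from the --shape flags passed to perf_analyzer, as in benchmark.sh.

```python
import json
import random


def make_perf_input(batch_size: int, num_cat: int = 26, num_dense: int = 13,
                    vocab_size: int = 10000, seed: int = 0) -> dict:
    """Build one perf_analyzer request in its JSON input-data layout.

    Values are random placeholders (an assumption), not Criteo samples.
    Each tensor is a flat row-major list; shapes are given separately via --shape.
    """
    rng = random.Random(seed)
    return {
        "data": [
            {
                # INT32 keys, batch_size x num_cat values
                "categorical_features": [rng.randrange(vocab_size) for _ in range(batch_size * num_cat)],
                # FP32 dense features, batch_size x num_dense values
                "numerical_features": [rng.random() for _ in range(batch_size * num_dense)],
            }
        ]
    }


if __name__ == "__main__":
    # Write a tiny 4-sample file that could be passed via --input-data.
    with open("example_4.json", "w") as f:
        json.dump(make_perf_input(4), f)
```

A file written this way could be benchmarked with, for example, perf_analyzer -m dynamic_3fc_lite_hps_trt -u localhost:8000 --input-data example_4.json --shape categorical_features:4,26 --shape numerical_features:4,13.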
-
-
-
-
-
-
-

Step 3: Launch the Triton inference server

-

We can then launch the Triton inference server with the TensorRT backend. Note that LD_PRELOAD is used to load the custom TensorRT plugin (i.e., the HPS TensorRT plugin) into Triton.

-

Note: Because Jupyter does not support background processes, launch the Triton server independently, in the background, with the following command.

-
-

LD_PRELOAD=/usr/local/hps_trt/lib/libhps_plugin.so tritonserver --model-repository=/hugectr/hps_trt/notebooks/model_repo/ --load-model=dynamic_3fc_lite_hps_trt --model-control-mode=explicit

-
-

If tritonserver started successfully, you should see a log similar to the following:

-
+--------------------------+---------+--------+
-| Model                    | Version | Status |
-+--------------------------+---------+--------+
-| dynamic_3fc_lite_hps_trt | 1       | READY  |
-+--------------------------+---------+--------+
-
-Started GRPCInferenceService at 0.0.0.0:8001
-Started HTTPService at 0.0.0.0:8000
-Started Metrics Service at 0.0.0.0:8002
-
-
-

We can then benchmark the online inference performance of this HPS-integrated engine with the 147 GB embedding table.

-
-
-

Step 4: Run the benchmark

-

The compute infer time reported on the server side is the end-to-end inference latency of the engine, covering both the HPS embedding lookup and the TensorRT forward pass of the dense layers.
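Each server-side "Avg request latency" line printed by perf_analyzer splits the average into overhead, queue, compute input, compute infer, and compute output, so compute infer is the part attributable to the engine itself. The small sketch below parses one such line, copied from the 256-sample run; latency_breakdown is a hypothetical helper, and it assumes integer microsecond values as in this log. For this line the components sum to 550 usec against a reported 551 usec, presumably because each value is rounded.

```python
import re

# One server-side line from the perf_analyzer output (batch size 256 run).
SAMPLE = ("Avg request latency: 551 usec (overhead 21 usec + queue 40 usec + "
          "compute input 38 usec + compute infer 343 usec + compute output 108 usec)")

COMPONENT = re.compile(r"(overhead|queue|compute input|compute infer|compute output) (\d+) usec")


def latency_breakdown(line: str) -> dict:
    """Map each server-side latency component to its value in microseconds."""
    return {name: int(value) for name, value in COMPONENT.findall(line)}


if __name__ == "__main__":
    parts = latency_breakdown(SAMPLE)
    total = sum(parts.values())
    for name, usec in parts.items():
        print(f"{name:15s} {usec:5d} usec ({100 * usec / total:.1f}%)")
```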

-
-
-
%%writefile benchmark.sh
-
-batch_size=(256 512 1024 2048 4096 8192 16384 32768 65536)
-
-model_name=("dynamic_3fc_lite_hps_trt")
-
-for b in ${batch_size[*]};
-do
-  for m in ${model_name[*]};
-  do
-    echo $b $m
-    perf_analyzer -m ${m} -u localhost:8000 --input-data perf_data/${b}.json --shape categorical_features:${b},26 --shape numerical_features:${b},13
-  done
-done
-
-
-
-
-
Overwriting benchmark.sh
-
-
-
-
-
-
-
!bash benchmark.sh > result.log
-
-
-
-
-
256 dynamic_3fc_lite_hps_trt
- Successfully read data for 1 stream/streams with 25600 step/steps.
-*** Measurement Settings ***
-  Batch size: 1
-  Service Kind: Triton
-  Using "time_windows" mode for stabilization
-  Measurement window: 5000 msec
-  Using synchronous calls for inference
-  Stabilizing using average latency
-
-Request concurrency: 1
-  Client: 
-    Request count: 20941
-    Throughput: 1163.14 infer/sec
-    Avg latency: 851 usec (standard deviation 1184 usec)
-    p50 latency: 799 usec
-    p90 latency: 922 usec
-    p95 latency: 977 usec
-    p99 latency: 1190 usec
-    Avg HTTP time: 846 usec (send/recv 85 usec + response wait 761 usec)
-  Server: 
-    Inference count: 20941
-    Execution count: 20941
-    Successful request count: 20941
-    Avg request latency: 551 usec (overhead 21 usec + queue 40 usec + compute input 38 usec + compute infer 343 usec + compute output 108 usec)
-
-Inferences/Second vs. Client Average Batch Latency
-Concurrency: 1, throughput: 1163.14 infer/sec, latency 851 usec
-512 dynamic_3fc_lite_hps_trt
- Successfully read data for 1 stream/streams with 12800 step/steps.
-*** Measurement Settings ***
-  Batch size: 1
-  Service Kind: Triton
-  Using "time_windows" mode for stabilization
-  Measurement window: 5000 msec
-  Using synchronous calls for inference
-  Stabilizing using average latency
-
-Request concurrency: 1
-  Client: 
-    Request count: 14135
-    Throughput: 785.143 infer/sec
-    Avg latency: 1264 usec (standard deviation 286 usec)
-    p50 latency: 1236 usec
-    p90 latency: 1340 usec
-    p95 latency: 1374 usec
-    p99 latency: 1476 usec
-    Avg HTTP time: 1258 usec (send/recv 92 usec + response wait 1166 usec)
-  Server: 
-    Inference count: 14135
-    Execution count: 14135
-    Successful request count: 14135
-    Avg request latency: 889 usec (overhead 27 usec + queue 44 usec + compute input 42 usec + compute infer 619 usec + compute output 156 usec)
-
-Inferences/Second vs. Client Average Batch Latency
-Concurrency: 1, throughput: 785.143 infer/sec, latency 1264 usec
-1024 dynamic_3fc_lite_hps_trt
- Successfully read data for 1 stream/streams with 6400 step/steps.
-*** Measurement Settings ***
-  Batch size: 1
-  Service Kind: Triton
-  Using "time_windows" mode for stabilization
-  Measurement window: 5000 msec
-  Using synchronous calls for inference
-  Stabilizing using average latency
-
-Request concurrency: 1
-  Client: 
-    Request count: 8116
-    Throughput: 450.826 infer/sec
-    Avg latency: 2206 usec (standard deviation 391 usec)
-    p50 latency: 2183 usec
-    p90 latency: 2321 usec
-    p95 latency: 2368 usec
-    p99 latency: 2486 usec
-    Avg HTTP time: 2199 usec (send/recv 118 usec + response wait 2081 usec)
-  Server: 
-    Inference count: 8116
-    Execution count: 8116
-    Successful request count: 8116
-    Avg request latency: 1632 usec (overhead 45 usec + queue 73 usec + compute input 106 usec + compute infer 1173 usec + compute output 234 usec)
-
-Inferences/Second vs. Client Average Batch Latency
-Concurrency: 1, throughput: 450.826 infer/sec, latency 2206 usec
-2048 dynamic_3fc_lite_hps_trt
- Successfully read data for 1 stream/streams with 3200 step/steps.
-*** Measurement Settings ***
-  Batch size: 1
-  Service Kind: Triton
-  Using "time_windows" mode for stabilization
-  Measurement window: 5000 msec
-  Using synchronous calls for inference
-  Stabilizing using average latency
-
-Request concurrency: 1
-  Client: 
-    Request count: 5311
-    Throughput: 295.01 infer/sec
-    Avg latency: 3377 usec (standard deviation 459 usec)
-    p50 latency: 3349 usec
-    p90 latency: 3486 usec
-    p95 latency: 3530 usec
-    p99 latency: 3820 usec
-    Avg HTTP time: 3370 usec (send/recv 155 usec + response wait 3215 usec)
-  Server: 
-    Inference count: 5311
-    Execution count: 5311
-    Successful request count: 5311
-    Avg request latency: 2591 usec (overhead 50 usec + queue 76 usec + compute input 162 usec + compute infer 2068 usec + compute output 234 usec)
-
-Inferences/Second vs. Client Average Batch Latency
-Concurrency: 1, throughput: 295.01 infer/sec, latency 3377 usec
-4096 dynamic_3fc_lite_hps_trt
- Successfully read data for 1 stream/streams with 1600 step/steps.
-*** Measurement Settings ***
-  Batch size: 1
-  Service Kind: Triton
-  Using "time_windows" mode for stabilization
-  Measurement window: 5000 msec
-  Using synchronous calls for inference
-  Stabilizing using average latency
-
-Request concurrency: 1
-  Client: 
-    Request count: 3518
-    Throughput: 195.42 infer/sec
-    Avg latency: 5109 usec (standard deviation 380 usec)
-    p50 latency: 5068 usec
-    p90 latency: 5242 usec
-    p95 latency: 5316 usec
-    p99 latency: 5741 usec
-    Avg HTTP time: 5104 usec (send/recv 171 usec + response wait 4933 usec)
-  Server: 
-    Inference count: 3518
-    Execution count: 3518
-    Successful request count: 3518
-    Avg request latency: 4134 usec (overhead 38 usec + queue 48 usec + compute input 138 usec + compute infer 3742 usec + compute output 167 usec)
-
-Inferences/Second vs. Client Average Batch Latency
-Concurrency: 1, throughput: 195.42 infer/sec, latency 5109 usec
-8192 dynamic_3fc_lite_hps_trt
- Successfully read data for 1 stream/streams with 800 step/steps.
-*** Measurement Settings ***
-  Batch size: 1
-  Service Kind: Triton
-  Using "time_windows" mode for stabilization
-  Measurement window: 5000 msec
-  Using synchronous calls for inference
-  Stabilizing using average latency
-
-Request concurrency: 1
-  Client: 
-    Request count: 1910
-    Throughput: 106.097 infer/sec
-    Avg latency: 9412 usec (standard deviation 553 usec)
-    p50 latency: 9384 usec
-    p90 latency: 9529 usec
-    p95 latency: 9581 usec
-    p99 latency: 10106 usec
-    Avg HTTP time: 9406 usec (send/recv 294 usec + response wait 9112 usec)
-  Server: 
-    Inference count: 1910
-    Execution count: 1910
-    Successful request count: 1910
-    Avg request latency: 7674 usec (overhead 42 usec + queue 54 usec + compute input 267 usec + compute infer 7179 usec + compute output 130 usec)
-
-Inferences/Second vs. Client Average Batch Latency
-Concurrency: 1, throughput: 106.097 infer/sec, latency 9412 usec
-16384 dynamic_3fc_lite_hps_trt
- Successfully read data for 1 stream/streams with 400 step/steps.
-*** Measurement Settings ***
-  Batch size: 1
-  Service Kind: Triton
-  Using "time_windows" mode for stabilization
-  Measurement window: 5000 msec
-  Using synchronous calls for inference
-  Stabilizing using average latency
-
-Request concurrency: 1
-  Client: 
-    Request count: 992
-    Throughput: 55.1033 infer/sec
-    Avg latency: 18132 usec (standard deviation 726 usec)
-    p50 latency: 18051 usec
-    p90 latency: 18257 usec
-    p95 latency: 18330 usec
-    p99 latency: 23069 usec
-    Avg HTTP time: 18125 usec (send/recv 1278 usec + response wait 16847 usec)
-  Server: 
-    Inference count: 992
-    Execution count: 992
-    Successful request count: 992
-    Avg request latency: 14999 usec (overhead 29 usec + queue 56 usec + compute input 476 usec + compute infer 14234 usec + compute output 203 usec)
-
-Inferences/Second vs. Client Average Batch Latency
-Concurrency: 1, throughput: 55.1033 infer/sec, latency 18132 usec
-32768 dynamic_3fc_lite_hps_trt
- Successfully read data for 1 stream/streams with 200 step/steps.
-*** Measurement Settings ***
-  Batch size: 1
-  Service Kind: Triton
-  Using "time_windows" mode for stabilization
-  Measurement window: 5000 msec
-  Using synchronous calls for inference
-  Stabilizing using average latency
-
-Request concurrency: 1
-  Client: 
-    Request count: 515
-    Throughput: 28.6081 infer/sec
-    Avg latency: 34878 usec (standard deviation 927 usec)
-    p50 latency: 34734 usec
-    p90 latency: 35143 usec
-    p95 latency: 35288 usec
-    p99 latency: 40804 usec
-    Avg HTTP time: 34872 usec (send/recv 2584 usec + response wait 32288 usec)
-  Server: 
-    Inference count: 516
-    Execution count: 516
-    Successful request count: 516
-    Avg request latency: 29340 usec (overhead 33 usec + queue 55 usec + compute input 870 usec + compute infer 28111 usec + compute output 270 usec)
-
-Inferences/Second vs. Client Average Batch Latency
-Concurrency: 1, throughput: 28.6081 infer/sec, latency 34878 usec
-65536 dynamic_3fc_lite_hps_trt
- Successfully read data for 1 stream/streams with 100 step/steps.
-*** Measurement Settings ***
-  Batch size: 1
-  Service Kind: Triton
-  Using "time_windows" mode for stabilization
-  Measurement window: 5000 msec
-  Using synchronous calls for inference
-  Stabilizing using average latency
-
-Request concurrency: 1
-  Client: 
-    Request count: 253
-    Throughput: 14.053 infer/sec
-    Avg latency: 71063 usec (standard deviation 1570 usec)
-    p50 latency: 70749 usec
-    p90 latency: 71666 usec
-    p95 latency: 73226 usec
-    p99 latency: 77979 usec
-    Avg HTTP time: 71058 usec (send/recv 5092 usec + response wait 65966 usec)
-  Server: 
-    Inference count: 253
-    Execution count: 253
-    Successful request count: 253
-    Avg request latency: 60716 usec (overhead 38 usec + queue 58 usec + compute input 1804 usec + compute infer 58482 usec + compute output 333 usec)
-
-Inferences/Second vs. Client Average Batch Latency
-Concurrency: 1, throughput: 14.053 infer/sec, latency 71063 usec
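perf_analyzer prints "Batch size: 1" because every request already carries a full batch whose size is set by the --shape flags, so infer/sec counts requests. Multiplying by the batch size gives samples per second, which is easier to compare across runs. A sketch of that conversion follows, with throughput values copied from the log above; samples_per_second is a hypothetical helper.

```python
def samples_per_second(batch_size: int, infer_per_sec: float) -> float:
    """Convert perf_analyzer's request throughput into samples per second.

    Each request carries batch_size samples (set via --shape), so
    samples/sec = batch_size * requests/sec.
    """
    return batch_size * infer_per_sec


# Client-side throughput (infer/sec) for three of the runs above.
THROUGHPUT = {256: 1163.14, 4096: 195.42, 65536: 14.053}

if __name__ == "__main__":
    for b, rps in THROUGHPUT.items():
        print(f"batch {b:6d}: {samples_per_second(b, rps):,.0f} samples/sec")
```

With concurrency 1 and synchronous calls, only one request is in flight at a time, so these figures likely understate what the server could sustain under concurrent load.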
-
-
-
-
-
-
-
%%writefile ./summary.py
-import re
-from argparse import ArgumentParser
-
-# Server-side "compute infer" time reported by perf_analyzer, e.g. "compute infer 343 usec".
-COMPUTE_INFER_PATTERN = re.compile(r"compute infer (\d+\.?\d*) usec")
-
-# Batch sizes in the order benchmark.sh runs them.
-BATCH_SIZES = ["256", "512", "1024", "2048", "4096", "8192", "16384", "32768", "65536"]
-
-SEPARATOR = "-----------------------------------------------------------------------------------------"
-
-
-def extract_result_from_log(log_path):
-    """Return every server-side compute infer latency (usec) in the log, in order."""
-    results = []
-    with open(log_path, "r", errors="ignore") as f:
-        for line in f:
-            match = COMPUTE_INFER_PATTERN.search(line)
-            if match is not None:
-                results.append(float(match.group(1)))
-    return results
-
-
-if __name__ == "__main__":
-    parser = ArgumentParser()
-    parser.add_argument("--log_path", required=True)
-    args = parser.parse_args()
-
-    perf_result = extract_result_from_log(args.log_path)
-    print("Inference Latency (usec)")
-    print(SEPARATOR)
-    print("batch_size\tresult \t")
-    print(SEPARATOR)
-    # zip() stops at the shorter sequence, so an incomplete log cannot raise IndexError.
-    for batch_size, latency in zip(BATCH_SIZES, perf_result):
-        print("{}\t\t{}\t\t".format(batch_size, latency))
-        print(SEPARATOR)
-
-
-
-
-
Overwriting ./summary.py
-
-
-
-
-

Summarize the result

-
-
-
!python ./summary.py --log_path result.log
-
-
-
-
-
Inference Latency (usec)
------------------------------------------------------------------------------------------
-batch_size	result 	
------------------------------------------------------------------------------------------
-256		343.0		
------------------------------------------------------------------------------------------
-512		619.0		
------------------------------------------------------------------------------------------
-1024		1173.0		
------------------------------------------------------------------------------------------
-2048		2068.0		
------------------------------------------------------------------------------------------
-4096		3742.0		
------------------------------------------------------------------------------------------
-8192		7179.0		
------------------------------------------------------------------------------------------
-16384		14234.0		
------------------------------------------------------------------------------------------
-32768		28111.0		
------------------------------------------------------------------------------------------
-65536		58482.0		
------------------------------------------------------------------------------------------
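The compute infer latency in the summary grows roughly linearly with batch size. As our own illustration rather than part of the benchmark, an ordinary least-squares line over the nine values estimates the per-sample cost. The fit is dominated by the largest batches, so the intercept should be read loosely. fit_line is a hypothetical helper.

```python
# Server-side compute infer latency (usec) per batch size, copied from the summary above.
RESULTS = {256: 343.0, 512: 619.0, 1024: 1173.0, 2048: 2068.0, 4096: 3742.0,
           8192: 7179.0, 16384: 14234.0, 32768: 28111.0, 65536: 58482.0}


def fit_line(xs, ys):
    """Ordinary least-squares fit y = intercept + slope * x; returns (intercept, slope)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


if __name__ == "__main__":
    intercept, slope = fit_line(list(RESULTS), list(RESULTS.values()))
    print(f"~{slope:.3f} usec per sample, ~{intercept:.0f} usec fixed cost")
    for b, usec in RESULTS.items():
        print(f"batch {b:6d}: {usec / b:.3f} usec/sample")
```

Dividing latency by batch size gives the same picture row by row: about 1.34 usec per sample at batch 256 versus about 0.89 usec per sample at batch 65536.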
-
-
-
-
-

-
-
-
-

4. Benchmark for ARM64 or Grace + Hopper systems

-

Our prebuilt Grace-optimized ARM64 images are currently undergoing testing and are therefore not yet available through NGC. This will change soon. If you want to benchmark on ARM, in particular on a system equipped with an NVIDIA Grace CPU, you can build a compatible Docker image yourself by following these steps.

-

Some steps offer two mutually exclusive alternatives.

-
    -
  • Option A (portable ARM64 HugeCTR): If you follow the option A instructions, you will build the standard version of HugeCTR for ARM64 platforms. This approach produces more portable binaries, but may not let you get the most out of your Grace+Hopper hardware setup.

  • -
  • Option B (G+H optimized HugeCTR): In contrast, if you follow the option B instructions, you will build and run a HugeCTR variant that maximizes DLRM throughput on Grace+Hopper systems. However, slight alterations to the system setup are necessary to achieve this, and applying them requires root access.

  • -
-
-

Step 1: Build the NVIDIA Merlin docker images

-

Use the following instructions on your ARM system to download and build the merlin-base and merlin-hugectr Docker images required for the benchmark.

-
    -
  • Option A (portable ARM64 HugeCTR):

    -
    git clone https://github.com/NVIDIA-Merlin/Merlin.git
    -cd Merlin/docker
    -docker build -t nvcr.io/nvstaging/merlin/merlin-base:24.06 -f dockerfile.merlin.ctr .
    -docker build -t nvcr.io/nvstaging/merlin/merlin-hugectr:24.06 -f dockerfile.ctr .
    -cd ../..
    -
    -
    -
  • -
  • Option B (G+H optimized HugeCTR):

    -
    git clone https://github.com/NVIDIA-Merlin/Merlin.git
    -cd Merlin/docker
    -sed -i -e 's/" -DENABLE_INFERENCE=ON/" -DUSE_HUGE_PAGES=ON -DENABLE_INFERENCE=ON/g' dockerfile.merlin
    -docker build -t nvcr.io/nvstaging/merlin/merlin-base:24.06 -f dockerfile.merlin.ctr .
    -docker build -t nvcr.io/nvstaging/merlin/merlin-hugectr:24.06 -f dockerfile.ctr .
    -cd ../..
    -
    -
    -
  • -
-
-
-

Step 2: Prepare the host system for running the Docker container

-
    -
  • Option A (portable ARM64 HugeCTR): No action required.

  • -
  • Option B (G+H optimized HugeCTR): Adjust your Grace+Hopper system configuration to increase the number of large memory pages for the benchmark.

    -
    echo '180000' | sudo tee /sys/devices/system/node/node0/hugepages/hugepages-2048kB/nr_hugepages
    -
    -
    -

    This may take a while.

    -

    In addition, you can reuse the light.json configuration file from Prepare JSON configuration file for HPS.

    -
  • -
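The option B command asks for 180000 pages of 2 MiB (hugepages-2048kB) on NUMA node 0, that is 180000 * 2048 KiB, or about 351.56 GiB. The kernel may allocate fewer pages than requested if it cannot find enough free contiguous memory, so it is worth reading the count back. The sketch below assumes the same sysfs pool path as the command; pool_gib and read_pool are hypothetical helpers.

```python
from pathlib import Path

# Per-node 2 MiB huge page pool used by the command above.
POOL = Path("/sys/devices/system/node/node0/hugepages/hugepages-2048kB")
PAGE_KIB = 2048


def pool_gib(num_pages: int, page_kib: int = PAGE_KIB) -> float:
    """Memory covered by num_pages huge pages, in GiB."""
    return num_pages * page_kib / (1024 * 1024)


def read_pool(pool: Path = POOL) -> dict:
    """Read nr_hugepages (allocated) and free_hugepages for one per-node pool."""
    return {name: int((pool / name).read_text()) for name in ("nr_hugepages", "free_hugepages")}


if __name__ == "__main__":
    print(f"requested: {pool_gib(180000):.2f} GiB")
    if POOL.is_dir():
        counts = read_pool()
        print(f"allocated: {counts['nr_hugepages']} pages ({pool_gib(counts['nr_hugepages']):.2f} GiB), "
              f"free: {counts['free_hugepages']}")
    else:
        print(f"{POOL} not found on this system")
```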
-
-
-

Step 3: Create the model

-

Follow the instructions in Create the TF model to create the model. There are many ways to accomplish this. We suggest running this Jupyter notebook with the Docker image you just built on your ARM64 / Grace+Hopper node, and forwarding the web-server port to the host system.

-

Your filesystem or system environment might impose constraints, so the following command serves only as an example. It assumes HugeCTR was cloned from GitHub into the current working directory (git clone https://github.com/NVIDIA-Merlin/HugeCTR.git). To allow writing files, we first give the root user (inside the Docker container you are root) access to the notebook folder (this folder), and then start a suitable Jupyter server.

-
export HCTR_SRC="${PWD}/HugeCTR" && chmod -R 777 "${HCTR_SRC}/hps_trt/notebooks" && docker run -it --rm --gpus all --network=host -v ${HCTR_SRC}:/hugectr nvcr.io/nvstaging/merlin/merlin-hugectr:24.06 jupyter-lab --allow-root --ip 0.0.0.0 --port 8888 --no-browser --notebook-dir=/hugectr/hps_trt/notebooks
-
-
-
-
-

Step 4: Prepare data

-

Next, follow the instructions in Build the HPS-integrated TensorRT engine to create the dataset and the preconditions for benchmarking.

-
-
-

Step 5: Run benchmark

-

Follow the steps outlined in Benchmark HPS-integrated TensorRT engine on Triton to execute the benchmark itself.

-
-
-
- - -
-
- -
-
-
-
- - - - - - - \ No newline at end of file diff --git a/review/pr-458/hps_trt/notebooks/demo_for_hugectr_trained_model.html b/review/pr-458/hps_trt/notebooks/demo_for_hugectr_trained_model.html deleted file mode 100644 index 1841694e20..0000000000 --- a/review/pr-458/hps_trt/notebooks/demo_for_hugectr_trained_model.html +++ /dev/null @@ -1,799 +0,0 @@ - - - - - - - HPS TensorRT Plugin Demo for HugeCTR Trained Model — Merlin HugeCTR documentation - - - - - - - - - - - - - - - - - - - - - - - - - -
- - -
- -
-
-
- -
-
-
-
- - http://developer.download.nvidia.com/notebooks/dlsw-notebooks/merlin_hugectr_hps-hps-tensorflow-triton-deployment/nvidia_logo.png -
-

HPS TensorRT Plugin Demo for HugeCTR Trained Model

-
-

Overview

-

This notebook demonstrates how to build and deploy the HPS-integrated TensorRT engine for the model trained with HugeCTR.

-

For more details about HPS, please refer to HugeCTR Hierarchical Parameter Server (HPS).

-
-
-

Installation

-
-

Use NGC

-

The HPS TensorRT plugin is preinstalled in Merlin HugeCTR containers 24.06 and later: nvcr.io/nvidia/merlin/merlin-hugectr:24.06.

-

You can check the existence of the required libraries by running the following Python code after launching this container.

-
-
-
import ctypes
-plugin_lib_name = "/usr/local/hps_trt/lib/libhps_plugin.so"
-plugin_handle = ctypes.CDLL(plugin_lib_name, mode=ctypes.RTLD_GLOBAL)
-
-
-
-
-
-
-
-
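The ctypes.CDLL call above raises OSError if the library or one of its dependencies cannot be loaded. Here is a slightly more defensive sketch that reports the problem instead of raising; load_plugin is a hypothetical helper, and the path is the one used above.

```python
import ctypes
import os

PLUGIN_PATH = "/usr/local/hps_trt/lib/libhps_plugin.so"


def load_plugin(path: str = PLUGIN_PATH):
    """Load a shared library with RTLD_GLOBAL so its symbols are visible to later loads.

    Returns the ctypes handle, or None (after printing why) if the file is
    missing or the dynamic loader rejects it, e.g. because a dependency is unresolved.
    """
    path = os.path.abspath(path)
    if not os.path.isfile(path):
        print(f"{path} not found; is this the Merlin HugeCTR 24.06+ container?")
        return None
    try:
        return ctypes.CDLL(path, mode=ctypes.RTLD_GLOBAL)
    except OSError as err:
        print(f"failed to load {path}: {err}")
        return None


if __name__ == "__main__":
    handle = load_plugin()
    print("HPS TensorRT plugin loaded" if handle is not None else "plugin not available")
```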

Data Generation

-

HugeCTR provides a tool to generate synthetic datasets. The Data Generator can generate datasets in different file formats and with different distributions. For this notebook, we generate one-hot Parquet datasets with a power-law distribution:

-
-
-
import hugectr
-from hugectr.tools import DataGeneratorParams, DataGenerator
-
-data_generator_params = DataGeneratorParams(
-  format = hugectr.DataReaderType_t.Parquet,
-  label_dim = 1,
-  dense_dim = 13,
-  num_slot = 26,
-  i64_input_key = True,
-  nnz_array = [1 for _ in range(26)],
-  source = "./data_parquet/file_list.txt",
-  eval_source = "./data_parquet/file_list_test.txt",
-  slot_size_array = [10000 for _ in range(26)],
-  check_type = hugectr.Check_t.Non,
-  dist_type = hugectr.Distribution_t.PowerLaw,
-  power_law_type = hugectr.PowerLaw_t.Short,
-  num_files = 16,
-  eval_num_files = 4,
-  num_samples_per_file = 40960)
-data_generator = DataGenerator(data_generator_params)
-data_generator.generate()
-
-
-
-
-
[HCTR][05:12:08.561][INFO][RK0][main]: Generate Parquet dataset
-[HCTR][05:12:08.561][INFO][RK0][main]: train data folder: ./data_parquet, eval data folder: ./data_parquet, slot_size_array: 10000, 10000, 10000, 10000, 10000, 10000, 10000, 10000, 10000, 10000, 10000, 10000, 10000, 10000, 10000, 10000, 10000, 10000, 10000, 10000, 10000, 10000, 10000, 10000, 10000, 10000, nnz array: 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, #files for train: 16, #files for eval: 4, #samples per file: 40960, Use power law distribution: 1, alpha of power law: 1.3
-[HCTR][05:12:08.564][INFO][RK0][main]: ./data_parquet exist
-[HCTR][05:12:08.568][INFO][RK0][main]: ./data_parquet/train/gen_0.parquet
-[HCTR][05:12:10.204][INFO][RK0][main]: ./data_parquet/train/gen_1.parquet
-[HCTR][05:12:10.455][INFO][RK0][main]: ./data_parquet/train/gen_2.parquet
-[HCTR][05:12:10.709][INFO][RK0][main]: ./data_parquet/train/gen_3.parquet
-[HCTR][05:12:10.957][INFO][RK0][main]: ./data_parquet/train/gen_4.parquet
-[HCTR][05:12:11.196][INFO][RK0][main]: ./data_parquet/train/gen_5.parquet
-[HCTR][05:12:11.437][INFO][RK0][main]: ./data_parquet/train/gen_6.parquet
-[HCTR][05:12:11.681][INFO][RK0][main]: ./data_parquet/train/gen_7.parquet
-[HCTR][05:12:11.920][INFO][RK0][main]: ./data_parquet/train/gen_8.parquet
-[HCTR][05:12:12.171][INFO][RK0][main]: ./data_parquet/train/gen_9.parquet
-[HCTR][05:12:12.411][INFO][RK0][main]: ./data_parquet/train/gen_10.parquet
-[HCTR][05:12:12.650][INFO][RK0][main]: ./data_parquet/train/gen_11.parquet
-[HCTR][05:12:12.885][INFO][RK0][main]: ./data_parquet/train/gen_12.parquet
-[HCTR][05:12:13.120][INFO][RK0][main]: ./data_parquet/train/gen_13.parquet
-[HCTR][05:12:13.341][INFO][RK0][main]: ./data_parquet/train/gen_14.parquet
-[HCTR][05:12:13.577][INFO][RK0][main]: ./data_parquet/train/gen_15.parquet
-[HCTR][05:12:13.818][INFO][RK0][main]: ./data_parquet/file_list.txt done!
-[HCTR][05:12:13.827][INFO][RK0][main]: ./data_parquet/val/gen_0.parquet
-[HCTR][05:12:14.066][INFO][RK0][main]: ./data_parquet/val/gen_1.parquet
-[HCTR][05:12:14.299][INFO][RK0][main]: ./data_parquet/val/gen_2.parquet
-[HCTR][05:12:14.537][INFO][RK0][main]: ./data_parquet/val/gen_3.parquet
-[HCTR][05:12:14.751][INFO][RK0][main]: ./data_parquet/file_list_test.txt done!
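The log above reports "alpha of power law: 1.3", meaning that a small number of keys in each slot receive most of the lookups. To illustrate that skew, here is a generic truncated power-law sampler with P(k) proportional to (k + 1)^-alpha. It is a sketch under that assumption, not HugeCTR's DataGenerator algorithm, and power_law_keys is hypothetical.

```python
import random
from collections import Counter


def power_law_keys(num_keys: int, vocab_size: int, alpha: float = 1.3, seed: int = 0) -> list:
    """Sample num_keys ids from [0, vocab_size) with P(k) proportional to (k + 1) ** -alpha.

    A generic truncated power-law (Zipf-like) sampler for illustration only;
    it is not HugeCTR's DataGenerator implementation.
    """
    weights = [(k + 1) ** -alpha for k in range(vocab_size)]
    rng = random.Random(seed)
    return rng.choices(range(vocab_size), weights=weights, k=num_keys)


if __name__ == "__main__":
    # One slot of one file in this notebook: 40960 samples over a 10000-id vocabulary.
    keys = power_law_keys(40960, vocab_size=10000)
    top = Counter(keys).most_common(5)
    hot_share = sum(count for _, count in top) / len(keys)
    print(f"5 hottest of 10000 ids cover {hot_share:.1%} of lookups: {top}")
```

This kind of skew is what embedding caches are designed to exploit: a small hot set serves most lookups.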
-
-
-
-
-
-
-

Train with HugeCTR

-

We can train a DLRM model with HugeCTR Python APIs. The trained sparse and dense model files will be saved separately. The model graph will be dumped into a JSON file.

-
-
-
%%writefile train.py
-import hugectr
-from mpi4py import MPI
-
-
-solver = hugectr.CreateSolver(
-    model_name="dlrm",
-    max_eval_batches=160,
-    batchsize_eval=1024,
-    batchsize=1024,
-    lr=0.001,
-    vvgpu=[[0]],
-    repeat_dataset=True,
-    use_mixed_precision=True,
-    use_cuda_graph=True,
-    scaler=1024,
-    i64_input_key=True,
-)
-reader = hugectr.DataReaderParams(
-    data_reader_type=hugectr.DataReaderType_t.Parquet,
-    source=["./data_parquet/file_list.txt"],
-    eval_source="./data_parquet/file_list_test.txt",
-    slot_size_array=[10000 for _ in range(26)],
-    check_type=hugectr.Check_t.Non,
-)
-optimizer = hugectr.CreateOptimizer(
-    optimizer_type=hugectr.Optimizer_t.Adam,
-    update_type=hugectr.Update_t.Global,
-    beta1=0.9,
-    beta2=0.999,
-    epsilon=0.0001,
-)
-
-model = hugectr.Model(solver, reader, optimizer)
-model.add(
-    hugectr.Input(
-        label_dim=1,
-        label_name="label",
-        dense_dim=13,
-        dense_name="numerical_features",
-        data_reader_sparse_param_array=[hugectr.DataReaderSparseParam("keys", 1, True, 26)],
-    )
-)
-model.add(
-    hugectr.SparseEmbedding(
-        embedding_type=hugectr.Embedding_t.DistributedSlotSparseEmbeddingHash,
-        workspace_size_per_gpu_in_mb=5000,
-        embedding_vec_size=128,
-        combiner="mean",
-        sparse_embedding_name="sparse_embedding1",
-        bottom_name="keys",
-        optimizer=optimizer,
-    )
-)
-model.add(
-    hugectr.DenseLayer(
-        layer_type=hugectr.Layer_t.MLP,
-        bottom_names=["numerical_features"],
-        top_names=["mlp1"],
-        num_outputs=[512, 256, 128],
-        act_type=hugectr.Activation_t.Relu,
-        use_bias=True,
-    )
-)
-model.add(
-    hugectr.DenseLayer(
-        layer_type=hugectr.Layer_t.Interaction,
-        bottom_names=["mlp1", "sparse_embedding1"],
-        top_names=["interaction1"],
-    )
-)
-model.add(
-    hugectr.DenseLayer(
-        layer_type=hugectr.Layer_t.MLP,
-        bottom_names=["interaction1"],
-        top_names=["mlp2"],
-        num_outputs=[1024, 1024, 512, 256, 1],
-        use_bias=True,
-        activations=[
-            hugectr.Activation_t.Relu,
-            hugectr.Activation_t.Relu,
-            hugectr.Activation_t.Relu,
-            hugectr.Activation_t.Relu,
-            hugectr.Activation_t.Non,
-        ],
-    )
-)
-model.add(
-    hugectr.DenseLayer(
-        layer_type=hugectr.Layer_t.BinaryCrossEntropyLoss,
-        bottom_names=["mlp2", "label"],
-        top_names=["loss"],
-    )
-)
-model.graph_to_json("dlrm_hugectr_graph.json")
-model.compile()
-model.summary()
-model.fit(max_iter=1200, display=200, eval_interval=1000, snapshot=1000, snapshot_prefix="dlrm_hugectr")
-
-
-
-
-
Writing train.py
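In train.py, the bottom MLP ends with 128 outputs, matching embedding_vec_size=128, so the Interaction layer can take dot products between the dense vector and the 26 embeddings. Under the standard DLRM dot-interaction formulation, the output concatenates the 128-wide dense vector with the pairwise dot products of all 27 vectors: 128 + 27 * 26 / 2 = 479. The model summary in the training log reports (1024,480); we assume the extra element is padding, which the log itself does not state. dot_interaction_width is a hypothetical helper.

```python
def dot_interaction_width(num_sparse: int, vec_size: int) -> int:
    """Width of a DLRM-style dot interaction output, before any padding.

    The dense MLP output (vec_size wide) is concatenated with the pairwise
    dot products of all num_sparse + 1 vectors (embeddings plus dense output).
    """
    num_vectors = num_sparse + 1
    pairwise = num_vectors * (num_vectors - 1) // 2
    return vec_size + pairwise


if __name__ == "__main__":
    print(dot_interaction_width(num_sparse=26, vec_size=128))  # 479
```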
-
-
-
-
-
-
-
!python3 train.py
-
-
-
-
-
--------------------------------------------------------------------------
-An error occurred while trying to map in the address of a function.
-  Function Name: cuIpcOpenMemHandle_v2
-  Error string:  /usr/lib/x86_64-linux-gnu/libcuda.so.1: undefined symbol: cuIpcOpenMemHandle_v2
-CUDA-aware support is disabled.
---------------------------------------------------------------------------
-HugeCTR Version: 4.1
-====================================================Model Init=====================================================
-[HCTR][05:12:24.539][INFO][RK0][main]: Initialize model: dlrm
-[HCTR][05:12:24.539][INFO][RK0][main]: Global seed is 2950905596
-[HCTR][05:12:24.542][INFO][RK0][main]: Device to NUMA mapping:
-  GPU 0 ->  node 0
-[HCTR][05:12:26.698][WARNING][RK0][main]: Peer-to-peer access cannot be fully enabled.
-[HCTR][05:12:26.698][INFO][RK0][main]: Start all2all warmup
-[HCTR][05:12:26.698][INFO][RK0][main]: End all2all warmup
-[HCTR][05:12:26.699][INFO][RK0][main]: Using All-reduce algorithm: NCCL
-[HCTR][05:12:26.700][INFO][RK0][main]: Device 0: Tesla V100-SXM2-32GB
-[HCTR][05:12:26.705][INFO][RK0][main]: num of DataReader workers for train: 1
-[HCTR][05:12:26.705][INFO][RK0][main]: num of DataReader workers for eval: 1
-[HCTR][05:12:26.782][INFO][RK0][main]: Vocabulary size: 260000
-[HCTR][05:12:26.782][INFO][RK0][main]: max_vocabulary_size_per_gpu_=3413333
-[HCTR][05:12:26.791][INFO][RK0][main]: Graph analysis to resolve tensor dependency
-[HCTR][05:12:26.795][INFO][RK0][main]: Save the model graph to dlrm_hugectr_graph.json successfully
-===================================================Model Compile===================================================
-[HCTR][05:12:27.772][INFO][RK0][main]: gpu0 start to init embedding
-[HCTR][05:12:27.781][INFO][RK0][main]: gpu0 init embedding done
-[HCTR][05:12:27.783][INFO][RK0][main]: Starting AUC NCCL warm-up
-[HCTR][05:12:27.785][INFO][RK0][main]: Warm-up done
-===================================================Model Summary===================================================
-[HCTR][05:12:27.785][INFO][RK0][main]: Model structure on each GPU
-Label                                   Dense                         Sparse                        
-label                                   numerical_features             keys                          
-(1024,1)                                (1024,13)                               
-——————————————————————————————————————————————————————————————————————————————————————————————————————————————————
-Layer Type                              Input Name                    Output Name                   Output Shape                  
-——————————————————————————————————————————————————————————————————————————————————————————————————————————————————
-DistributedSlotSparseEmbeddingHash      keys                          sparse_embedding1             (1024,26,128)                 
-------------------------------------------------------------------------------------------------------------------
-MLP                                     numerical_features            mlp1                          (1024,128)                    
-------------------------------------------------------------------------------------------------------------------
-Interaction                             mlp1                          interaction1                  (1024,480)                    
-                                        sparse_embedding1                                                                         
-------------------------------------------------------------------------------------------------------------------
-MLP                                     interaction1                  mlp2                          (1024,1)                      
-------------------------------------------------------------------------------------------------------------------
-BinaryCrossEntropyLoss                  mlp2                          loss                                                        
-                                        label                                                                                     
-------------------------------------------------------------------------------------------------------------------
-=====================================================Model Fit=====================================================
-[HCTR][05:12:27.785][INFO][RK0][main]: Use non-epoch mode with number of iterations: 1200
-[HCTR][05:12:27.785][INFO][RK0][main]: Training batchsize: 1024, evaluation batchsize: 1024
-[HCTR][05:12:27.785][INFO][RK0][main]: Evaluation interval: 1000, snapshot interval: 1000
-[HCTR][05:12:27.785][INFO][RK0][main]: Dense network trainable: True
-[HCTR][05:12:27.785][INFO][RK0][main]: Sparse embedding sparse_embedding1 trainable: True
-[HCTR][05:12:27.785][INFO][RK0][main]: Use mixed precision: True, scaler: 1024.000000, use cuda graph: True
-[HCTR][05:12:27.785][INFO][RK0][main]: lr: 0.001000, warmup_steps: 1, end_lr: 0.000000
-[HCTR][05:12:27.785][INFO][RK0][main]: decay_start: 0, decay_steps: 1, decay_power: 2.000000
-[HCTR][05:12:27.785][INFO][RK0][main]: Training source file: ./data_parquet/file_list.txt
-[HCTR][05:12:27.785][INFO][RK0][main]: Evaluation source file: ./data_parquet/file_list_test.txt
-[HCTR][05:12:31.522][INFO][RK0][main]: Iter: 200 Time(200 iters): 3.72017s Loss: 0.693168 lr:0.001
-[HCTR][05:12:35.188][INFO][RK0][main]: Iter: 400 Time(200 iters): 3.64947s Loss: 0.694016 lr:0.001
-[HCTR][05:12:38.814][INFO][RK0][main]: Iter: 600 Time(200 iters): 3.60927s Loss: 0.69323 lr:0.001
-[HCTR][05:12:42.432][INFO][RK0][main]: Iter: 800 Time(200 iters): 3.60078s Loss: 0.693079 lr:0.001
-[HCTR][05:12:46.050][INFO][RK0][main]: Iter: 1000 Time(200 iters): 3.60162s Loss: 0.693134 lr:0.001
-[HCTR][05:12:46.206][INFO][RK0][main]: Evaluation, AUC: 0.498656
-[HCTR][05:12:46.206][INFO][RK0][main]: Eval Time for 160 iters: 0.156138s
-[HCTR][05:12:46.206][INFO][RK0][main]: Using Local file system backend.
-[HCTR][05:12:46.272][INFO][RK0][main]: Rank0: Write hash table to file
-[HCTR][05:12:47.456][INFO][RK0][main]: Dumping sparse weights to files, successful
-[HCTR][05:12:47.958][INFO][RK0][main]: Rank0: Write optimzer state to file
-[HCTR][05:12:47.958][INFO][RK0][main]: Using Local file system backend.
-[HCTR][05:12:56.286][INFO][RK0][main]: Done
-[HCTR][05:12:56.840][INFO][RK0][main]: Rank0: Write optimzer state to file
-[HCTR][05:12:56.840][INFO][RK0][main]: Using Local file system backend.
-[HCTR][05:13:06.514][INFO][RK0][main]: Done
-[HCTR][05:13:06.555][INFO][RK0][main]: Dumping sparse optimzer states to files, successful
-[HCTR][05:13:06.561][INFO][RK0][main]: Using Local file system backend.
-[HCTR][05:13:06.693][INFO][RK0][main]: Dumping dense weights to file, successful
-[HCTR][05:13:06.694][INFO][RK0][main]: Using Local file system backend.
-[HCTR][05:13:06.823][INFO][RK0][main]: Dumping dense optimizer states to file, successful
-[HCTR][05:13:10.414][INFO][RK0][main]: Finish 1200 iterations with batchsize: 1024 in 42.63s.
-
-
-
-
-
-
-

Build the HPS-integrated TensorRT engine

-

The sparse saved model dlrm_hugectr0_sparse_1000.model is already in the format that HPS requires. To use HPS in the inference stage, we need to create a JSON configuration file for HPS.

-

Then we convert the dense saved model dlrm_hugectr_dense_1000.model to ONNX using hugectr2onnx, and use the ONNX GraphSurgeon tool to replace the input embedding vectors with the placeholder of the HPS TensorRT plugin layer.

-

After that, we can build the TensorRT engine, which consists of the HPS TensorRT plugin layer and the dense network.

-
-

Step 1: Prepare the JSON configuration file for HPS

-

Note that the keys in the dlrm_hugectr0_sparse_1000.model/key file are stored as int64, while the HPS TensorRT plugin only supports int32 keys when loading them into memory. Casting the keys down cannot overflow, because the key values lie in the range 0 to 260000.
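The no-overflow claim can be checked directly before building the engine by scanning the key file. The sketch below assumes the `key` file is a flat, native-endian array of int64 values; that layout is an assumption here, and `read_int64_keys` and `keys_fit_in_int32` are hypothetical helpers, not part of HPS.

```python
import array
import os
import tempfile

INT32_MIN, INT32_MAX = -(2**31), 2**31 - 1

def read_int64_keys(key_path):
    """Read a flat binary file of native-endian int64 keys (assumed layout of the `key` file)."""
    keys = array.array("q")  # 'q' = signed 64-bit integer
    with open(key_path, "rb") as f:
        keys.frombytes(f.read())
    return keys

def keys_fit_in_int32(keys):
    """Return True if every key can be cast to int32 without overflow."""
    return all(INT32_MIN <= k <= INT32_MAX for k in keys)

if __name__ == "__main__":
    # Demo on a tiny synthetic key file; point key_path at
    # "dlrm_hugectr0_sparse_1000.model/key" to check the real model instead.
    with tempfile.TemporaryDirectory() as tmp:
        key_path = os.path.join(tmp, "key")
        with open(key_path, "wb") as f:
            array.array("q", [0, 12345, 259999]).tofile(f)
        keys = read_int64_keys(key_path)
        print(len(keys), max(keys), keys_fit_in_int32(keys))  # 3 259999 True
```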

-
-
-
%%writefile dlrm_hugectr.json
-{
-    "supportlonglong": false,
-    "models": [{
-        "model": "dlrm",
-        "sparse_files": ["dlrm_hugectr0_sparse_1000.model"],
-        "num_of_worker_buffer_in_pool": 3,
-        "embedding_table_names":["sparse_embedding0"],
-        "embedding_vecsize_per_table": [128],
-        "maxnum_catfeature_query_per_table_per_sample": [26],
-        "default_value_for_each_table": [1.0],
-        "deployed_device_list": [0],
-        "max_batch_size": 1024,
-        "cache_refresh_percentage_per_iteration": 0.2,
-        "hit_rate_threshold": 1.0,
-        "gpucacheper": 1.0,
-        "gpucache": true
-        }
-    ]
-}
-
-
-
-
-
Writing dlrm_hugectr.json
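When the configuration has to be produced for several models or tables, generating it from Python helps keep the per-table lists in sync. A minimal sketch, assuming (from the field names alone) that `sparse_files` and every per-table list need exactly one entry per name in `embedding_table_names`; `build_hps_config` is a hypothetical helper, and the call below produces the same settings as the file written above.

```python
import json

# Assumption: these fields hold one entry per embedding table (inferred from their names).
PER_TABLE_FIELDS = (
    "sparse_files",
    "embedding_vecsize_per_table",
    "maxnum_catfeature_query_per_table_per_sample",
    "default_value_for_each_table",
)

def build_hps_config(model, embedding_table_names, support_long_long=False, **fields):
    """Return an HPS config dict for one model, checking per-table list lengths."""
    num_tables = len(embedding_table_names)
    for name in PER_TABLE_FIELDS:
        if name in fields and len(fields[name]) != num_tables:
            raise ValueError(f"{name}: expected {num_tables} entries, got {len(fields[name])}")
    model_cfg = {"model": model, "embedding_table_names": list(embedding_table_names), **fields}
    return {"supportlonglong": support_long_long, "models": [model_cfg]}

config = build_hps_config(
    "dlrm",
    ["sparse_embedding0"],
    sparse_files=["dlrm_hugectr0_sparse_1000.model"],
    num_of_worker_buffer_in_pool=3,
    embedding_vecsize_per_table=[128],
    maxnum_catfeature_query_per_table_per_sample=[26],
    default_value_for_each_table=[1.0],
    deployed_device_list=[0],
    max_batch_size=1024,
    cache_refresh_percentage_per_iteration=0.2,
    hit_rate_threshold=1.0,
    gpucacheper=1.0,
    gpucache=True,
)
print(json.dumps(config, indent=4))
```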
-
-
-
-
-
-
-

Step 2: Convert to ONNX and perform ONNX graph surgery

-
-
-
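# Convert the dense part of the trained HugeCTR model to ONNX with hugectr2onnx (convert_embedding=False skips the sparse embedding layers)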
-import hugectr2onnx
-hugectr2onnx.converter.convert(onnx_model_path = "dlrm_hugectr_dense.onnx",
-                            graph_config = "dlrm_hugectr_graph.json",
-                            dense_model = "dlrm_hugectr_dense_1000.model",
-                            convert_embedding = False)
-
-
-
-
-
[HUGECTR2ONNX][INFO]: Converting Data layer to ONNX
-Skip sparse embedding layers in converted ONNX model
-[HUGECTR2ONNX][INFO]: Converting DistributedSlotSparseEmbeddingHash layer to ONNX
-[HUGECTR2ONNX][INFO]: Converting MLP layer to ONNX
-[HUGECTR2ONNX][INFO]: Converting Interaction layer to ONNX
-[HUGECTR2ONNX][INFO]: Converting MLP layer to ONNX
-[HUGECTR2ONNX][INFO]: Converting Sigmoid layer to ONNX
-[HUGECTR2ONNX][INFO]: The model is checked!
-[HUGECTR2ONNX][INFO]: The model is saved at dlrm_hugectr_dense.onnx
-
-
-
-
-
-
-
# ONNX graph surgery to insert the HPS TensorRT plugin placeholder
-import numpy as np
-import onnx
-import onnx_graphsurgeon as gs
-
-graph = gs.import_onnx(onnx.load("dlrm_hugectr_dense.onnx"))
-saved = []
-
-for i in graph.inputs:
-    if i.name == "sparse_embedding1":
-        # Replace the embedding-vector input with an int32 key input that feeds the HPS_TRT plugin node;
-        # the node's output is the original sparse_embedding1 tensor consumed by the dense network
-        categorical_features = gs.Variable(name="categorical_features", dtype=np.int32, shape=("unknown_1", 26))
-        node = gs.Node(op="HPS_TRT", attrs={"ps_config_file": "dlrm_hugectr.json\0", "model_name": "dlrm\0", "table_id": 0, "emb_vec_size": 128}, inputs=[categorical_features], outputs=[i])
-        graph.nodes.append(node)
-        saved.append(categorical_features)
-    elif i.name == "numerical_features":
-        # Use a symbolic (dynamic) batch dimension for the dense input
-        i.shape = ("unknown_2", 13)
-        saved.append(i)
-
-# Only the key input and the dense input remain as graph inputs
-graph.inputs = saved
-
-graph.cleanup().toposort()
-onnx.save(gs.export_onnx(graph), "dlrm_hugectr_with_hps.onnx")
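The effect of this surgery on the graph interface can be modeled in plain Python. This is a toy sketch with hypothetical `Tensor` and `Node` classes, not the onnx_graphsurgeon API: the embedding-vector input `sparse_embedding1` stops being a graph input and becomes the output of an `HPS_TRT` node whose only input is the int32 key tensor.

```python
from dataclasses import dataclass, field

@dataclass
class Tensor:
    name: str
    dtype: str
    shape: tuple

@dataclass
class Node:
    op: str
    inputs: list
    outputs: list
    attrs: dict = field(default_factory=dict)

def insert_lookup_node(graph_inputs, embedding_input, num_slots, vec_size):
    """Return (new_graph_inputs, new_nodes) after swapping the embedding input for a key input."""
    new_inputs, new_nodes = [], []
    for t in graph_inputs:
        if t.name == embedding_input:
            keys = Tensor("categorical_features", "int32", ("batch", num_slots))
            # The lookup node produces the tensor that the dense network already consumes
            new_nodes.append(Node("HPS_TRT", [keys], [t], {"table_id": 0, "emb_vec_size": vec_size}))
            new_inputs.append(keys)
        else:
            new_inputs.append(t)
    return new_inputs, new_nodes

inputs = [
    Tensor("numerical_features", "float32", ("batch", 13)),
    Tensor("sparse_embedding1", "float32", ("batch", 26, 128)),
]
new_inputs, nodes = insert_lookup_node(inputs, "sparse_embedding1", num_slots=26, vec_size=128)
print([t.name for t in new_inputs])            # ['numerical_features', 'categorical_features']
print(nodes[0].op, nodes[0].outputs[0].shape)  # HPS_TRT ('batch', 26, 128)
```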
-
-
-
-
-
-
-

Step 3: Build the TensorRT engine

-
-
-
# build the TensorRT engine based on dlrm_hugectr_with_hps.onnx
-import tensorrt as trt
-import ctypes
-
-# Load the HPS TensorRT plugin so that the ONNX parser can resolve the HPS_TRT node
-plugin_lib_name = "/usr/local/hps_trt/lib/libhps_plugin.so"
-handle = ctypes.CDLL(plugin_lib_name, mode=ctypes.RTLD_GLOBAL)
-
-TRT_LOGGER = trt.Logger(trt.Logger.INFO)
-EXPLICIT_BATCH = 1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)
-
-def build_engine_from_onnx(onnx_model_path):
-    with trt.Builder(TRT_LOGGER) as builder, builder.create_network(EXPLICIT_BATCH) as network, trt.OnnxParser(network, TRT_LOGGER) as parser, builder.create_builder_config() as builder_config:
-        with open(onnx_model_path, "rb") as model:
-            if not parser.parse(model.read()):
-                for idx in range(parser.num_errors):
-                    print(parser.get_error(idx))
-                raise RuntimeError("Failed to parse {}".format(onnx_model_path))
-
-        # Dynamic batch dimension: (min, opt, max) shapes for each network input
-        profile = builder.create_optimization_profile()
-        profile.set_shape("categorical_features", (1, 26), (1024, 26), (1024, 26))
-        profile.set_shape("numerical_features", (1, 13), (1024, 13), (1024, 13))
-        builder_config.add_optimization_profile(profile)
-        engine = builder.build_serialized_network(network, builder_config)
-        if engine is None:
-            raise RuntimeError("Failed to build the TensorRT engine")
-        return engine
-
-serialized_engine = build_engine_from_onnx("dlrm_hugectr_with_hps.onnx")
-with open("dlrm_hugectr_with_hps.trt", "wb") as fout:
-    fout.write(serialized_engine)
-print("Successfully build the TensorRT engine")
-
-
-
-
-
[12/14/2022-05:13:31] [TRT] [I] [MemUsageChange] Init CUDA: CPU +262, GPU +0, now: CPU 1014, GPU 886 (MiB)
-[12/14/2022-05:13:33] [TRT] [I] [MemUsageChange] Init builder kernel library: CPU +170, GPU +46, now: CPU 1239, GPU 932 (MiB)
-[12/14/2022-05:13:33] [TRT] [W] CUDA lazy loading is not enabled. Enabling it can significantly reduce device memory usage. See `CUDA_MODULE_LOADING` in https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#env-vars
-[12/14/2022-05:13:33] [TRT] [W] onnx2trt_utils.cpp:377: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
-[12/14/2022-05:13:33] [TRT] [I] No importer registered for op: HPS_TRT. Attempting to import as plugin.
-[12/14/2022-05:13:33] [TRT] [I] Searching for plugin: HPS_TRT, plugin_version: 1, plugin_namespace: 
-=====================================================HPS Parse====================================================
-[HCTR][05:13:33.812][INFO][RK0][main]: dense_file is not specified using default: 
-[HCTR][05:13:33.812][INFO][RK0][main]: num_of_refresher_buffer_in_pool is not specified using default: 1
-[HCTR][05:13:33.812][INFO][RK0][main]: maxnum_des_feature_per_sample is not specified using default: 26
-[HCTR][05:13:33.812][INFO][RK0][main]: refresh_delay is not specified using default: 0
-[HCTR][05:13:33.812][INFO][RK0][main]: refresh_interval is not specified using default: 0
-[HCTR][05:13:33.812][INFO][RK0][main]: use_static_table is not specified using default: 0
-====================================================HPS Create====================================================
-[HCTR][05:13:33.813][INFO][RK0][main]: Creating HashMap CPU database backend...
-[HCTR][05:13:33.813][DEBUG][RK0][main]: Created blank database backend in local memory!
-[HCTR][05:13:33.813][INFO][RK0][main]: Volatile DB: initial cache rate = 1
-[HCTR][05:13:33.813][INFO][RK0][main]: Volatile DB: cache missed embeddings = 0
-[HCTR][05:13:33.813][DEBUG][RK0][main]: Created raw model loader in local memory!
-[HCTR][05:13:33.813][INFO][RK0][main]: Using Local file system backend.
-[HCTR][05:13:36.189][INFO][RK0][main]: Table: hps_et.dlrm.sparse_embedding0; cached 239950 / 239950 embeddings in volatile database (HashMapBackend); load: 239950 / 18446744073709551615 (0.00%).
-[HCTR][05:13:36.196][DEBUG][RK0][main]: Real-time subscribers created!
-[HCTR][05:13:36.196][INFO][RK0][main]: Creating embedding cache in device 0.
-[HCTR][05:13:36.205][INFO][RK0][main]: Model name: dlrm
-[HCTR][05:13:36.205][INFO][RK0][main]: Max batch size: 1024
-[HCTR][05:13:36.205][INFO][RK0][main]: Number of embedding tables: 1
-[HCTR][05:13:36.205][INFO][RK0][main]: Use GPU embedding cache: True, cache size percentage: 1.000000
-[HCTR][05:13:36.205][INFO][RK0][main]: Use static table: False
-[HCTR][05:13:36.205][INFO][RK0][main]: Use I64 input key: False
-[HCTR][05:13:36.205][INFO][RK0][main]: Configured cache hit rate threshold: 1.000000
-[HCTR][05:13:36.205][INFO][RK0][main]: The size of thread pool: 80
-[HCTR][05:13:36.205][INFO][RK0][main]: The size of worker memory pool: 3
-[HCTR][05:13:36.205][INFO][RK0][main]: The size of refresh memory pool: 1
-[HCTR][05:13:36.205][INFO][RK0][main]: The refresh percentage : 0.200000
-[HCTR][05:13:36.270][DEBUG][RK0][main]: Created raw model loader in local memory!
-[HCTR][05:13:36.270][INFO][RK0][main]: Using Local file system backend.
-[HCTR][05:13:36.419][INFO][RK0][main]: EC initialization for model: "dlrm", num_tables: 1
-[HCTR][05:13:36.419][INFO][RK0][main]: EC initialization on device: 0
-[HCTR][05:13:36.440][INFO][RK0][main]: Creating lookup session for dlrm on device: 0
-[12/14/2022-05:13:36] [TRT] [I] Successfully created plugin: HPS_TRT
-[12/14/2022-05:13:37] [TRT] [I] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +335, GPU +146, now: CPU 5763, GPU 1314 (MiB)
-[12/14/2022-05:13:37] [TRT] [I] [MemUsageChange] Init cuDNN: CPU +116, GPU +54, now: CPU 5879, GPU 1368 (MiB)
-[12/14/2022-05:13:37] [TRT] [I] Local timing cache in use. Profiling results in this builder pass will not be stored.
-[12/14/2022-05:13:37] [TRT] [W] Using kFASTER_DYNAMIC_SHAPES_0805 preview feature.
-[12/14/2022-05:13:52] [TRT] [I] Total Activation Memory: 34118830080
-[12/14/2022-05:13:52] [TRT] [I] Detected 2 inputs and 1 output network tensors.
-[12/14/2022-05:13:52] [TRT] [I] Total Host Persistent Memory: 20304
-[12/14/2022-05:13:52] [TRT] [I] Total Device Persistent Memory: 10752
-[12/14/2022-05:13:52] [TRT] [I] Total Scratch Memory: 32505856
-[12/14/2022-05:13:52] [TRT] [I] [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 16 MiB, GPU 4628 MiB
-[12/14/2022-05:13:52] [TRT] [I] [BlockAssignment] Started assigning block shifts. This will take 16 steps to complete.
-[12/14/2022-05:13:52] [TRT] [I] [BlockAssignment] Algorithm ShiftNTopDown took 0.09284ms to assign 4 blocks to 16 nodes requiring 48099840 bytes.
-[12/14/2022-05:13:52] [TRT] [I] Total Activation Memory: 48099840
-[12/14/2022-05:13:52] [TRT] [I] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +8, now: CPU 6321, GPU 1580 (MiB)
-[12/14/2022-05:13:52] [TRT] [I] [MemUsageChange] Init cuDNN: CPU +1, GPU +10, now: CPU 6322, GPU 1590 (MiB)
-[12/14/2022-05:13:52] [TRT] [I] [MemUsageChange] TensorRT-managed allocation in building engine: CPU +8, GPU +16, now: CPU 8, GPU 16 (MiB)
-Successfully build the TensorRT engine
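Because the engine is built with a single optimization profile, it accepts only batch sizes from 1 to 1024 for both inputs. A small client-side check that mirrors the profile can catch oversized requests before they reach the server. This is a sketch; `PROFILE`, `shape_in_profile`, and `out_of_profile` are hypothetical names, and the shapes are copied from `build_engine_from_onnx`.

```python
# (min, opt, max) shapes copied from the optimization profile in build_engine_from_onnx
PROFILE = {
    "categorical_features": ((1, 26), (1024, 26), (1024, 26)),
    "numerical_features": ((1, 13), (1024, 13), (1024, 13)),
}

def shape_in_profile(name, shape, profile=PROFILE):
    """True if every dimension of shape lies between the profile's min and max for that input."""
    min_shape, _opt_shape, max_shape = profile[name]
    if len(shape) != len(min_shape):
        return False
    return all(lo <= d <= hi for d, lo, hi in zip(shape, min_shape, max_shape))

def out_of_profile(shapes, profile=PROFILE):
    """Return the names of inputs whose shapes fall outside the profile."""
    return [name for name, shape in shapes.items() if not shape_in_profile(name, shape, profile)]

print(out_of_profile({"categorical_features": (512, 26), "numerical_features": (512, 13)}))    # []
print(out_of_profile({"categorical_features": (2048, 26), "numerical_features": (2048, 13)}))  # ['categorical_features', 'numerical_features']
```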
-
-
-
-
-
-
-
-

Deploy HPS-integrated TensorRT engine on Triton

-

In order to deploy the TensorRT engine with the Triton TensorRT backend, we need to create the model repository and define the config.pbtxt first.

-
-
-
!mkdir -p model_repo/dlrm_hugectr_with_hps/1
-!mv dlrm_hugectr_with_hps.trt model_repo/dlrm_hugectr_with_hps/1
-
-
-
-
-
-
-
%%writefile model_repo/dlrm_hugectr_with_hps/config.pbtxt
-
-platform: "tensorrt_plan"
-default_model_filename: "dlrm_hugectr_with_hps.trt"
-backend: "tensorrt"
-max_batch_size: 0
-input [
-  {
-    name: "categorical_features"
-    data_type: TYPE_INT32
-    dims: [-1,26]
-  },
-  {
-    name: "numerical_features"
-    data_type: TYPE_FP32
-    dims: [-1,13]
-  }
-]
-output [
-  {
-      name: "label"
-      data_type: TYPE_FP32
-      dims: [-1,1]
-  }
-]
-instance_group [
-  {
-    count: 1
-    kind: KIND_GPU
-    gpus:[0]
-
-  }
-]
-
-
-
-
-
Writing model_repo/dlrm_hugectr_with_hps/config.pbtxt
-
-
-
-
-
-
-
!tree model_repo/dlrm_hugectr_with_hps
-
-
-
-
-
model_repo/dlrm_hugectr_with_hps
-├── 1
-│   └── dlrm_hugectr_with_hps.trt
-└── config.pbtxt
-
-1 directory, 2 files
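The same layout check can be done without `tree`, which is convenient in scripts: Triton expects `<repository>/<model>/config.pbtxt` plus the model file inside a numeric version directory. A minimal sketch; `check_model_repo` is a hypothetical helper that only tests for the two files shown above.

```python
import tempfile
from pathlib import Path

def check_model_repo(repo, model_name, model_file, version="1"):
    """Return the missing files of a Triton model directory (empty list = layout looks complete)."""
    model_dir = Path(repo) / model_name
    required = [model_dir / "config.pbtxt", model_dir / version / model_file]
    return [str(p) for p in required if not p.is_file()]

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as repo:
        version_dir = Path(repo) / "dlrm_hugectr_with_hps" / "1"
        version_dir.mkdir(parents=True)
        (version_dir.parent / "config.pbtxt").write_text('platform: "tensorrt_plan"\n')
        # Engine file not written yet, so one path is reported as missing
        print(check_model_repo(repo, "dlrm_hugectr_with_hps", "dlrm_hugectr_with_hps.trt"))
        (version_dir / "dlrm_hugectr_with_hps.trt").write_bytes(b"")
        print(check_model_repo(repo, "dlrm_hugectr_with_hps", "dlrm_hugectr_with_hps.trt"))  # []
```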
-
-
-
-
-

We can then launch the Triton inference server with the TensorRT backend. Note that LD_PRELOAD is used to load the custom TensorRT plugin (i.e., the HPS TensorRT plugin) into Triton.

-

Note: Since Jupyter does not support background processes, launch the Triton server independently in the background with the following command.

-
-

LD_PRELOAD=/usr/local/hps_trt/lib/libhps_plugin.so tritonserver --model-repository=/hugectr/hps_trt/notebooks/model_repo/ --load-model=dlrm_hugectr_with_hps --model-control-mode=explicit
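If you prefer to start the server from a standalone Python script rather than a shell, the same command and environment can be assembled as below. This is a sketch; `triton_launch_command` is a hypothetical helper, and the flags are copied from the command above.

```python
import os
import shlex

def triton_launch_command(model_repo, model_name,
                          plugin_lib="/usr/local/hps_trt/lib/libhps_plugin.so"):
    """Return (argv, env) for tritonserver with the HPS plugin preloaded via LD_PRELOAD."""
    argv = [
        "tritonserver",
        f"--model-repository={model_repo}",
        f"--load-model={model_name}",
        "--model-control-mode=explicit",
    ]
    env = dict(os.environ, LD_PRELOAD=plugin_lib)  # copy of the current environment plus LD_PRELOAD
    return argv, env

argv, env = triton_launch_command("/hugectr/hps_trt/notebooks/model_repo/", "dlrm_hugectr_with_hps")
# Print the equivalent shell command for reference
print("LD_PRELOAD=" + shlex.quote(env["LD_PRELOAD"]), shlex.join(argv))
```

Passing the result to `subprocess.Popen(argv, env=env)` from that script starts the server without blocking the caller.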

-
-

If tritonserver started successfully, you should see a log similar to the following:

-
+----------+--------------------------------+--------------------------------+
-| Backend  | Path                           | Config                         |
-+----------+--------------------------------+--------------------------------+
-| tensorrt | /opt/tritonserver/backends/ten | {"cmdline":{"auto-complete-con |
-|          | sorrt/libtriton_tensorrt.so    | fig":"true","min-compute-capab |
-|          |                                | ility":"6.000000","backend-dir |
-|          |                                | ectory":"/opt/tritonserver/bac |
-|          |                                | kends","default-max-batch-size |
-|          |                                | ":"4"}}                        |
-|          |                                |                                |
-+----------+--------------------------------+--------------------------------+
-
-+-----------------------+---------+--------+
-| Model                 | Version | Status |
-+-----------------------+---------+--------+
-| dlrm_hugectr_with_hps | 1       | READY  |
-+-----------------------+---------+--------+
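Since the server runs outside the notebook, it helps to wait until the model reports READY before sending requests. Triton exposes the KServe v2 readiness endpoint `/v2/models/<model>/ready` over HTTP. The polling loop below is a sketch using only the standard library; `is_ready` and `wait_for_model` are hypothetical helpers.

```python
import http.client
import time
import urllib.error
import urllib.request

def is_ready(url, timeout=1.0):
    """Return True if an HTTP GET on url answers 200, False on any HTTP or connection error."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        return False

def wait_for_model(model_name, base_url="http://localhost:8000", max_wait=60.0, interval=1.0):
    """Poll the per-model readiness endpoint until it reports ready or max_wait seconds pass."""
    url = f"{base_url}/v2/models/{model_name}/ready"
    deadline = time.monotonic() + max_wait
    while True:
        if is_ready(url, timeout=interval):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

For example, `wait_for_model("dlrm_hugectr_with_hps")` returns True once the server lists the model as READY.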
-
-
-

We can then send the requests to the Triton inference server using the HTTP client.

-
-
-
import numpy as np
-import tritonclient.http as httpclient
-from tritonclient.utils import np_to_triton_dtype
-
-BATCH_SIZE = 1024
-
-# Random int32 keys in [0, 260000) for the 26 slots and random FP32 dense features
-categorical_feature = np.random.randint(0,260000,size=(BATCH_SIZE,26)).astype(np.int32)
-numerical_feature = np.random.random((BATCH_SIZE, 13)).astype(np.float32)
-
-inputs = [
-    httpclient.InferInput("categorical_features", 
-                          categorical_feature.shape,
-                          np_to_triton_dtype(np.int32)),
-    httpclient.InferInput("numerical_features", 
-                          numerical_feature.shape,
-                          np_to_triton_dtype(np.float32)),                          
-]
-inputs[0].set_data_from_numpy(categorical_feature)
-inputs[1].set_data_from_numpy(numerical_feature)
-
-
-outputs = [
-    httpclient.InferRequestedOutput("label")
-]
-
-model_name = "dlrm_hugectr_with_hps"
-
-with httpclient.InferenceServerClient("localhost:8000") as client:
-    response = client.infer(model_name,
-                            inputs,
-                            outputs=outputs)
-    result = response.get_response()
-    
-    print("Prediction result is \n{}".format(response.as_numpy("label")))
-    print("Response details:\n{}".format(result))
-
-
-
-
-
Prediction result is 
-[[1.        ]
- [0.49642828]
- [0.52846366]
- ...
- [0.99999994]
- [0.9999992 ]
- [0.9999905 ]]
-Response details:
-{'model_name': 'dlrm_hugectr_with_hps', 'model_version': '1', 'outputs': [{'name': 'label', 'datatype': 'FP32', 'shape': [1024, 1], 'parameters': {'binary_data_size': 4096}}]}
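Where `tritonclient` is not installed, the same request can be expressed as a plain JSON body for Triton's HTTP/REST (KServe v2) endpoint `/v2/models/<model>/infer`, with each tensor sent flattened in row-major order together with its shape. A minimal sketch; `v2_infer_body` is a hypothetical helper, and JSON is less compact than the binary encoding that `tritonclient` can use.

```python
import json
import random

def v2_infer_body(categorical, numerical, output_name="label"):
    """Build a KServe v2 JSON inference body for the two engine inputs (rows of keys / rows of floats)."""
    if len(categorical) != len(numerical):
        raise ValueError("both inputs must have the same batch size")
    batch = len(categorical)
    return {
        "inputs": [
            {"name": "categorical_features", "datatype": "INT32",
             "shape": [batch, len(categorical[0])],
             "data": [int(k) for row in categorical for k in row]},
            {"name": "numerical_features", "datatype": "FP32",
             "shape": [batch, len(numerical[0])],
             "data": [float(x) for row in numerical for x in row]},
        ],
        "outputs": [{"name": output_name}],
    }

batch_size = 4
categorical = [[random.randrange(260000) for _ in range(26)] for _ in range(batch_size)]
numerical = [[random.random() for _ in range(13)] for _ in range(batch_size)]
body = json.dumps(v2_infer_body(categorical, numerical))
```

The body can then be POSTed with `urllib.request.Request(url, data=body.encode(), headers={"Content-Type": "application/json"})` to `http://localhost:8000/v2/models/dlrm_hugectr_with_hps/infer`.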
-
-
-
-
-
-
- - -
-
- -
-
-
-
- - - - - - - \ No newline at end of file diff --git a/review/pr-458/hps_trt/notebooks/demo_for_pytorch_trained_model.html b/review/pr-458/hps_trt/notebooks/demo_for_pytorch_trained_model.html deleted file mode 100644 index e86f0d7ca1..0000000000 --- a/review/pr-458/hps_trt/notebooks/demo_for_pytorch_trained_model.html +++ /dev/null @@ -1,879 +0,0 @@ - - - - - - - HPS TensorRT Plugin Demo for PyTorch Trained Model — Merlin HugeCTR documentation - - - - - - - - - - - - - - - - - - - - - - - - - -
- - -
- -
-
-
- -
-
-
-
-

HPS TensorRT Plugin Demo for PyTorch Trained Model

-
-

Overview

-

This notebook demonstrates how to build and deploy the HPS-integrated TensorRT engine for the model trained with PyTorch.

-

For more details about HPS, please refer to HugeCTR Hierarchical Parameter Server (HPS).

-
-
-

Installation

-
-

Use NGC

-

The HPS TensorRT plugin is preinstalled in the 24.06 and later Merlin HugeCTR Container: nvcr.io/nvidia/merlin/merlin-hugectr:24.06.

-

You can check the existence of the required libraries by running the following Python code after launching this container.

-
-
-
import ctypes
-plugin_lib_name = "/usr/local/hps_trt/lib/libhps_plugin.so"
-plugin_handle = ctypes.CDLL(plugin_lib_name, mode=ctypes.RTLD_GLOBAL)
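If the library is missing or cannot be loaded, `ctypes.CDLL` raises `OSError`, and the `HPS_TRT` node cannot be resolved later when the TensorRT engine is built. A slightly more defensive variant of the check above; `load_plugin` is a hypothetical helper.

```python
import ctypes
import os

def load_plugin(path="/usr/local/hps_trt/lib/libhps_plugin.so"):
    """Load a shared library with RTLD_GLOBAL; return the handle, or None (with a message) on failure."""
    if not os.path.isfile(path):
        print(f"Plugin library not found: {path}")
        return None
    try:
        return ctypes.CDLL(path, mode=ctypes.RTLD_GLOBAL)
    except OSError as err:
        print(f"Failed to load {path}: {err}")
        return None
```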
-
-
-
-
-
-
-
-

Configurations

-

First, we specify the required configurations, e.g., the arguments needed for generating the dataset, the model parameters, and the paths to save the model. We use a DLRM model, which has one embedding table, bottom MLP layers, an interaction layer, and top MLP layers. Note that the input to the embedding layer is a dense key tensor of int32.

-
-
-
import os
-import numpy as np
-import torch
-from torch.utils.data import DataLoader
-import struct
-
-args = dict()
-
-args["gpu_num"] = 1                               # the number of available GPUs
-args["iter_num"] = 50                             # the number of training iterations
-args["slot_num"] = 26                             # the number of feature fields in this embedding layer
-args["embed_vec_size"] = 128                      # the dimension of embedding vectors
-args["dense_dim"] = 13                            # the dimension of dense features
-args["global_batch_size"] = 1024                  # the global batch size across all GPUs
-args["max_vocabulary_size"] = 260000              # the number of rows in the embedding table
-args["vocabulary_range_per_slot"] = [[i*10000, (i+1)*10000] for i in range(26)]  # [low, high) key range of each slot
-args["combiner"] = "mean"
-
-args["ps_config_file"] = "dlrm_pytorch.json"
-args["embedding_table_path"] = "dlrm_pytorch_sparse.model"
-args["onnx_path"] = "dlrm_pytorch.onnx"
-args["modified_onnx_path"] = "dlrm_pytorch_with_hps.onnx"
-args["np_key_type"] = np.int32
-args["np_vector_type"] = np.float32
-
-os.environ["CUDA_VISIBLE_DEVICES"] = ",".join(map(str, range(args["gpu_num"])))
-
-
-
-
-
/usr/local/lib/python3.8/dist-packages/tqdm/auto.py:22: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html
-  from .autonotebook import tqdm as notebook_tqdm
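Because `np.random.randint` treats `high` as exclusive, slot i draws keys from [i*10000, (i+1)*10000). The largest key is therefore 259999, and every key is a valid row index of the 260000-row embedding table. The hypothetical helper below checks that property for any list of slot ranges.

```python
def slot_ranges_cover_vocabulary(ranges, max_vocab):
    """True if the [low, high) ranges are non-empty, disjoint, contiguous and exactly cover [0, max_vocab)."""
    next_low = 0
    for low, high in sorted(ranges):
        if low != next_low or high <= low:
            return False
        next_low = high
    return next_low == max_vocab

ranges = [[i * 10000, (i + 1) * 10000] for i in range(26)]
print(slot_ranges_cover_vocabulary(ranges, 260000))  # True
```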
-
-
-
-
-
-
-
def generate_random_samples(num_samples, vocabulary_range_per_slot, dense_dim, key_dtype = args["np_key_type"]):
-    keys = list()
-    for vocab_range in vocabulary_range_per_slot:
-        keys_per_slot = np.random.randint(low=vocab_range[0], high=vocab_range[1], size=(num_samples, 1), dtype=key_dtype)
-        keys.append(keys_per_slot)
-    keys = np.concatenate(np.array(keys), axis = 1)
-    numerical_features = np.random.random((num_samples, dense_dim)).astype(np.float32)
-    labels = np.random.randint(low=0, high=2, size=(num_samples, 1))
-    return keys, numerical_features, labels
-
-
-
-
-
-
-

Train with PyTorch

-

We define the model graph for training with native PyTorch layers, e.g., torch.nn.Embedding and torch.nn.Linear. We can then train the model and extract the trained weights of the embedding table.

-
-
-
import torch
-
-class MLP(torch.nn.Module):
-    def __init__(self,
-                arch,
-                name,
-                out_activation,
-                **kwargs):
-        super(MLP, self).__init__(**kwargs)
-        self.mlp = torch.nn.Sequential()
-        
-        for idx in range(1, len(arch)-1):
-            self.mlp.add_module(name + "_linear_layer_%d" % idx, torch.nn.Linear(arch[idx-1], arch[idx]))
-            self.mlp.add_module(name + "_relu_layer_%d" % idx, torch.nn.ReLU(inplace=True))
-            
-        idx = len(arch) - 1
-        if out_activation == "relu":
-            self.mlp.add_module(name + "_linear_layer_%d" % idx, torch.nn.Linear(arch[idx-1], arch[idx]))
-            self.mlp.add_module(name + "_relu_layer_%d" % idx, torch.nn.ReLU(inplace=True))
-        elif out_activation == "sigmoid":
-            self.mlp.add_module(name + "_linear_layer_%d" % idx, torch.nn.Linear(arch[idx-1], arch[idx]))
-            self.mlp.add_module(name + "_relu_layer_%d" % idx, torch.nn.Sigmoid())
-
-    def forward(self, x):
-        y = self.mlp(x)
-        return y
-
-    
-class SecondOrderFeatureInteraction(torch.nn.Module):
-    def __init__(self):
-        super(SecondOrderFeatureInteraction, self).__init__()
-
-    def forward(self, inputs, num_feas):
-        dot_products = torch.reshape(torch.matmul(inputs, torch.transpose(inputs, 1, 2)), (-1, num_feas * num_feas))
-        indices = torch.tensor([i * num_feas + j for j in range(1, num_feas) for i in range(j)])
-        flat_interactions = torch.index_select(dot_products, 1, indices)
-        return flat_interactions    
-
-class DLRM(torch.nn.Module):
-    def __init__(self,
-                 init_tensors,
-                 embed_vec_size,
-                 slot_num,
-                 dense_dim,
-                 arch_bot,
-                 arch_top,
-                 **kwargs):
-        
-        super(DLRM, self).__init__()
-        self.embedding = torch.nn.Embedding.from_pretrained(init_tensors, freeze=False)
-        
-        self.embed_vec_size = embed_vec_size
-        self.slot_num = slot_num
-        self.dense_dim = dense_dim
-        self.arch_bot = arch_bot
-        self.arch_top = arch_top
-
-        self.bot_mlp = MLP([self.dense_dim] + arch_bot, name = "bottom", out_activation='relu')
-        self.interaction_layer = SecondOrderFeatureInteraction()
-        self.interaction_out_dim = self.slot_num * (self.slot_num+1) // 2
-        self.top_mlp = MLP([self.interaction_out_dim + self.arch_bot[-1]] + arch_top, name = "top", out_activation='sigmoid')
-    
-    def forward(self, inputs):
-        categorical_features = inputs[0]
-        numerical_features = inputs[1]
-        
-        embedding_vector = self.embedding(categorical_features)
-        dense_x = self.bot_mlp(numerical_features)
-        
-        concat_features = torch.concat([embedding_vector, torch.reshape(dense_x, (-1, 1, self.arch_bot[-1]))], 1)
-        
-        Z = self.interaction_layer(concat_features, self.slot_num+1)
-        z = torch.concat([dense_x, Z], 1)
-        logit = self.top_mlp(z)
-        return logit
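The input width of the top MLP (479 in the printed model) follows from the interaction layer. The 26 embedding vectors plus the bottom-MLP output give 27 feature vectors, whose 27 * 26 / 2 = 351 pairwise dot products are concatenated with the 128-dimensional bottom-MLP output. A pure-Python check of that arithmetic, reusing the index formula from `SecondOrderFeatureInteraction`:

```python
def interaction_indices(num_feas):
    """Flat indices of the strictly upper-triangular entries (i < j) of a num_feas x num_feas
    matrix, in the order used by SecondOrderFeatureInteraction."""
    return [i * num_feas + j for j in range(1, num_feas) for i in range(j)]

slot_num, bottom_out = 26, 128
num_feas = slot_num + 1                       # 26 embedding vectors + the bottom-MLP output
num_pairs = len(interaction_indices(num_feas))
top_mlp_in = bottom_out + num_pairs           # dense_x is concatenated with the pairwise dot products
print(num_pairs, slot_num * (slot_num + 1) // 2, top_mlp_in)  # 351 351 479
```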
-
-
-
-
-
-
-
def train(args):
-    init_tensors = torch.Tensor(np.ones(shape=[args["max_vocabulary_size"], args["embed_vec_size"]], dtype=args["np_vector_type"]))
-    
-    model = DLRM(init_tensors, args["embed_vec_size"], args["slot_num"], args["dense_dim"],
-                arch_bot = [512, 256, args["embed_vec_size"]],
-                arch_top = [1024, 1024, 512, 256, 1])
-
-    print(model)
-
-    criterion = torch.nn.BCELoss()
-    optimizer = torch.optim.Adam(model.parameters(), lr=0.0001)
-    
-    keys, numerical_features, labels = generate_random_samples(args["global_batch_size"] * args["iter_num"], args["vocabulary_range_per_slot"], args["dense_dim"], args["np_key_type"])
-    # Wrap keys, dense features and labels in one dataset so that a single shuffle keeps each sample's inputs and label aligned
-    dataset = torch.utils.data.TensorDataset(torch.from_numpy(keys), torch.from_numpy(numerical_features), torch.from_numpy(labels).float())
-    data_iterator = iter(DataLoader(dataset, batch_size=args["global_batch_size"], shuffle=True, num_workers=0, pin_memory=False, drop_last=False))
-
-    for i in range(args["iter_num"]):
-        x0, x1, labels = next(data_iterator)
-        preds = model([x0, x1])
-        loss = criterion(preds.squeeze(), labels.squeeze())
-        
-        optimizer.zero_grad()
-        loss.backward()
-        optimizer.step()
-        print("-"*20, "Step {}, loss: {}".format(i, loss),  "-"*20)
-    return model
-
-
-
-
-
-
-
trained_model = train(args)
-embedding_weights = trained_model.state_dict()["embedding.weight"]
-print(embedding_weights)
-
-
-
-
-
DLRM(
-  (embedding): Embedding(260000, 128)
-  (bot_mlp): MLP(
-    (mlp): Sequential(
-      (bottom_linear_layer_1): Linear(in_features=13, out_features=512, bias=True)
-      (bottom_relu_layer_1): ReLU(inplace=True)
-      (bottom_linear_layer_2): Linear(in_features=512, out_features=256, bias=True)
-      (bottom_relu_layer_2): ReLU(inplace=True)
-      (bottom_linear_layer_3): Linear(in_features=256, out_features=128, bias=True)
-      (bottom_relu_layer_3): ReLU(inplace=True)
-    )
-  )
-  (interaction_layer): SecondOrderFeatureInteraction()
-  (top_mlp): MLP(
-    (mlp): Sequential(
-      (top_linear_layer_1): Linear(in_features=479, out_features=1024, bias=True)
-      (top_relu_layer_1): ReLU(inplace=True)
-      (top_linear_layer_2): Linear(in_features=1024, out_features=1024, bias=True)
-      (top_relu_layer_2): ReLU(inplace=True)
-      (top_linear_layer_3): Linear(in_features=1024, out_features=512, bias=True)
-      (top_relu_layer_3): ReLU(inplace=True)
-      (top_linear_layer_4): Linear(in_features=512, out_features=256, bias=True)
-      (top_relu_layer_4): ReLU(inplace=True)
-      (top_linear_layer_5): Linear(in_features=256, out_features=1, bias=True)
-      (top_relu_layer_5): Sigmoid()
-    )
-  )
-)
--------------------- Step 0, loss: 1.1652954816818237 --------------------
--------------------- Step 1, loss: 1.7626148462295532 --------------------
--------------------- Step 2, loss: 1.1845550537109375 --------------------
--------------------- Step 3, loss: 0.7347715497016907 --------------------
--------------------- Step 4, loss: 1.0786197185516357 --------------------
--------------------- Step 5, loss: 0.9271171689033508 --------------------
--------------------- Step 6, loss: 0.7060756683349609 --------------------
--------------------- Step 7, loss: 0.7490934133529663 --------------------
--------------------- Step 8, loss: 0.8274499773979187 --------------------
--------------------- Step 9, loss: 0.7962949275970459 --------------------
--------------------- Step 10, loss: 0.6947690844535828 --------------------
--------------------- Step 11, loss: 0.7241608500480652 --------------------
--------------------- Step 12, loss: 0.7649394869804382 --------------------
--------------------- Step 13, loss: 0.7043794393539429 --------------------
--------------------- Step 14, loss: 0.6948238611221313 --------------------
--------------------- Step 15, loss: 0.7003152370452881 --------------------
--------------------- Step 16, loss: 0.7330600619316101 --------------------
--------------------- Step 17, loss: 0.711887001991272 --------------------
--------------------- Step 18, loss: 0.6917610168457031 --------------------
--------------------- Step 19, loss: 0.7227296233177185 --------------------
--------------------- Step 20, loss: 0.7232402563095093 --------------------
--------------------- Step 21, loss: 0.7025701999664307 --------------------
--------------------- Step 22, loss: 0.6962350010871887 --------------------
--------------------- Step 23, loss: 0.7100769281387329 --------------------
--------------------- Step 24, loss: 0.7159318923950195 --------------------
--------------------- Step 25, loss: 0.6963521242141724 --------------------
--------------------- Step 26, loss: 0.7058508396148682 --------------------
--------------------- Step 27, loss: 0.7144895792007446 --------------------
--------------------- Step 28, loss: 0.7082542181015015 --------------------
--------------------- Step 29, loss: 0.6955724954605103 --------------------
--------------------- Step 30, loss: 0.6997341513633728 --------------------
--------------------- Step 31, loss: 0.7167338132858276 --------------------
--------------------- Step 32, loss: 0.6962475776672363 --------------------
--------------------- Step 33, loss: 0.6955674290657043 --------------------
--------------------- Step 34, loss: 0.7098587155342102 --------------------
--------------------- Step 35, loss: 0.6992183327674866 --------------------
--------------------- Step 36, loss: 0.6928209066390991 --------------------
--------------------- Step 37, loss: 0.6933107972145081 --------------------
--------------------- Step 38, loss: 0.697549045085907 --------------------
--------------------- Step 39, loss: 0.6969214677810669 --------------------
--------------------- Step 40, loss: 0.6935250163078308 --------------------
--------------------- Step 41, loss: 0.6948344111442566 --------------------
--------------------- Step 42, loss: 0.7015650868415833 --------------------
--------------------- Step 43, loss: 0.6928752660751343 --------------------
--------------------- Step 44, loss: 0.6936203837394714 --------------------
--------------------- Step 45, loss: 0.6962599158287048 --------------------
--------------------- Step 46, loss: 0.6941655278205872 --------------------
--------------------- Step 47, loss: 0.6939643025398254 --------------------
--------------------- Step 48, loss: 0.6933950185775757 --------------------
--------------------- Step 49, loss: 0.6970551013946533 --------------------
-tensor([[1.0014, 1.0014, 1.0014,  ..., 1.0014, 1.0014, 1.0014],
-        [0.9997, 0.9997, 0.9997,  ..., 0.9997, 0.9997, 0.9997],
-        [0.9991, 0.9991, 0.9991,  ..., 0.9991, 0.9991, 0.9991],
-        ...,
-        [1.0004, 1.0004, 1.0005,  ..., 1.0004, 1.0004, 1.0004],
-        [1.0001, 1.0001, 1.0001,  ..., 1.0001, 1.0001, 1.0001],
-        [1.0002, 1.0002, 1.0002,  ..., 1.0002, 1.0002, 1.0002]])
-
-
-
-
-
-
-

Build the HPS-integrated TensorRT engine

-

To use HPS in the inference stage, we first need to convert the embedding weights to the format required by HPS and create a JSON configuration file for HPS.

-

Then we convert the PyTorch model to ONNX and use the ONNX GraphSurgeon tool to replace the native PyTorch embedding lookup layer with a placeholder for the HPS TensorRT plugin layer.

-

After that, we can build the TensorRT engine, which consists of the HPS TensorRT plugin layer and the dense network.

-
-

Step 1: Prepare the sparse model and JSON configuration file for HPS

-

Please note that keys in the dlrm_pytorch_sparse.model/key file are stored as int64, while the HPS TensorRT plugin currently only supports int32 when loading keys into memory. This does not cause overflow, because the key values range from 0 to 259999.
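Because the plugin loads keys as int32, it can be worth confirming that every stored key fits before building the engine. The following is a minimal standard-library sketch. It assumes the key file is a flat sequence of native-endian int64 values, as written by `convert_to_sparse_model` with `struct.pack('q', ...)`. `read_keys_checked_int32` is a hypothetical helper, not part of HPS.

```python
import os
import struct
import tempfile

INT32_MIN, INT32_MAX = -2**31, 2**31 - 1

def read_keys_checked_int32(key_path):
    """Read int64 keys written with struct 'q' and verify that each fits in int32.

    Returns the keys as a Python list; raises OverflowError otherwise.
    """
    with open(key_path, "rb") as f:
        data = f.read()
    keys = [k for (k,) in struct.iter_unpack("q", data)]
    bad = [k for k in keys if not INT32_MIN <= k <= INT32_MAX]
    if bad:
        raise OverflowError(f"{len(bad)} key(s) do not fit in int32, e.g. {bad[0]}")
    return keys

# Self-check on a temporary key file holding keys 0..9 and 259999 (the largest key here).
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "key")
    with open(path, "wb") as f:
        for key in list(range(10)) + [259999]:
            f.write(struct.pack("q", key))
    keys = read_keys_checked_int32(path)
    print(len(keys), max(keys))  # 11 259999
```

In the notebook, this would be pointed at `dlrm_pytorch_sparse.model/key` after the conversion step.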

-
-
-
def convert_to_sparse_model(embeddings_weights, embedding_table_path, embedding_vec_size):
-    # HPS sparse model layout: <path>/key holds int64 keys, <path>/emb_vector holds float32 vectors
-    os.makedirs(embedding_table_path, exist_ok=True)
-    with open("{}/key".format(embedding_table_path), 'wb') as key_file, \
-        open("{}/emb_vector".format(embedding_table_path), 'wb') as vec_file:
-      for key in range(embeddings_weights.shape[0]):
-        vec = embeddings_weights[key]
-        key_struct = struct.pack('q', key)  # the row index is the key, written as int64
-        vec_struct = struct.pack(str(embedding_vec_size) + "f", *vec)
-        key_file.write(key_struct)
-        vec_file.write(vec_struct)
-
-
-
-
-
-
-
convert_to_sparse_model(embedding_weights.numpy(), args["embedding_table_path"], args["embed_vec_size"])
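`convert_to_sparse_model` issues two `write` calls per key, which adds up for 260000 rows. The sketch below is a hypothetical variant that builds both files with the standard-library `array` module and writes each file once. It assumes that `array` typecodes `'q'` and `'f'` use the same native 8-byte and 4-byte layouts as `struct` formats `'q'` and `'f'`, which holds on common platforms. Under that assumption, it produces the same bytes as the loop above.

```python
import os
from array import array

def convert_to_sparse_model_fast(weights, embedding_table_path, embedding_vec_size):
    """Hypothetical variant of convert_to_sparse_model that writes each file once.

    `weights` is a sequence of rows (e.g. a list of lists); row i is the vector for key i.
    """
    os.makedirs(embedding_table_path, exist_ok=True)
    keys = array("q", range(len(weights)))   # native int64 keys, like struct 'q'
    vecs = array("f")                        # native float32 values, like struct 'f'
    for row in weights:
        if len(row) != embedding_vec_size:
            raise ValueError("row length does not match embedding_vec_size")
        vecs.extend(row)
    with open(os.path.join(embedding_table_path, "key"), "wb") as key_file:
        keys.tofile(key_file)
    with open(os.path.join(embedding_table_path, "emb_vector"), "wb") as vec_file:
        vecs.tofile(vec_file)
```

For the notebook's tensor, a call would look like `convert_to_sparse_model_fast(embedding_weights.numpy().tolist(), args["embedding_table_path"], args["embed_vec_size"])`. Note that `.tolist()` materializes the whole table as Python objects, trading memory for fewer I/O calls.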
-
-
-
-
-
-
-
%%writefile dlrm_pytorch.json
-{
-    "supportlonglong": false,
-    "models": [{
-        "model": "dlrm",
-        "sparse_files": ["dlrm_pytorch_sparse.model"],
-        "num_of_worker_buffer_in_pool": 3,
-        "embedding_table_names":["sparse_embedding0"],
-        "embedding_vecsize_per_table": [128],
-        "maxnum_catfeature_query_per_table_per_sample": [26],
-        "default_value_for_each_table": [1.0],
-        "deployed_device_list": [0],
-        "max_batch_size": 1024,
-        "cache_refresh_percentage_per_iteration": 0.2,
-        "hit_rate_threshold": 1.0,
-        "gpucacheper": 1.0,
-        "gpucache": true
-        }
-    ]
-}
-
-
-
-
-
Writing dlrm_pytorch.json
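Writing the JSON by hand means that `embedding_vecsize_per_table`, `maxnum_catfeature_query_per_table_per_sample`, and `max_batch_size` must be kept in sync with `args` manually. The sketch below builds the same configuration as a Python dict and serializes it with `json`. The helper `make_hps_config` is hypothetical, and the remaining field values are copied from the file above.

```python
import json

def make_hps_config(model_name, sparse_file, embed_vec_size, slot_num,
                    max_batch_size, device_id=0):
    """Hypothetical helper: the single-table HPS configuration above as a dict."""
    return {
        "supportlonglong": False,  # int32 keys, matching the plugin's key input
        "models": [{
            "model": model_name,
            "sparse_files": [sparse_file],
            "num_of_worker_buffer_in_pool": 3,
            "embedding_table_names": ["sparse_embedding0"],
            "embedding_vecsize_per_table": [embed_vec_size],
            "maxnum_catfeature_query_per_table_per_sample": [slot_num],
            "default_value_for_each_table": [1.0],
            "deployed_device_list": [device_id],
            "max_batch_size": max_batch_size,
            "cache_refresh_percentage_per_iteration": 0.2,
            "hit_rate_threshold": 1.0,
            "gpucacheper": 1.0,
            "gpucache": True,
        }],
    }

config = make_hps_config("dlrm", "dlrm_pytorch_sparse.model",
                         embed_vec_size=128, slot_num=26, max_batch_size=1024)
text = json.dumps(config, indent=4)
# To write it to disk instead of using %%writefile:
# with open("dlrm_pytorch.json", "w") as f:
#     f.write(text)
```

In the notebook, the numeric arguments would come from `args["embed_vec_size"]`, `args["slot_num"]`, and `args["global_batch_size"]`.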
-
-
-
-
-
-
-

Step 2: Convert to ONNX and perform ONNX graph surgery

-
-
-
dummy_keys = torch.randint(0, args["max_vocabulary_size"], (args["global_batch_size"], args["slot_num"]), dtype=torch.int32)
-dummy_numerical_features = torch.randn(args["global_batch_size"], args["dense_dim"])
-torch.onnx.export(trained_model, 
-                  [dummy_keys, dummy_numerical_features],
-                  args["onnx_path"], 
-                  verbose = True, 
-                  input_names = ["keys", "numerical_features"], 
-                  output_names = ["output"], 
-                  dynamic_axes = {'keys' : {0 : 'batch_size'}, 'numerical_features' : {0 : 'batch_size'}}
-                 )
-
-
-
-
-
/tmp/ipykernel_52545/1281679600.py:35: TracerWarning: torch.tensor results are registered as constants in the trace. You can safely ignore this warning if you use this function to create tensors out of constant variables that would be the same every time you call this function. In any other case, this might cause the trace to be incorrect.
-  indices = torch.tensor([i * num_feas + j for j in range(1, num_feas) for i in range(j)])
-
-
-
Exported graph: graph(%keys : Int(*, 26, strides=[26, 1], requires_grad=0, device=cpu),
-      %numerical_features : Float(*, 13, strides=[13, 1], requires_grad=0, device=cpu),
-      %embedding.weight : Float(260000, 128, strides=[128, 1], requires_grad=1, device=cpu),
-      %bot_mlp.mlp.bottom_linear_layer_1.weight : Float(512, 13, strides=[13, 1], requires_grad=1, device=cpu),
-      %bot_mlp.mlp.bottom_linear_layer_1.bias : Float(512, strides=[1], requires_grad=1, device=cpu),
-      %bot_mlp.mlp.bottom_linear_layer_2.weight : Float(256, 512, strides=[512, 1], requires_grad=1, device=cpu),
-      %bot_mlp.mlp.bottom_linear_layer_2.bias : Float(256, strides=[1], requires_grad=1, device=cpu),
-      %bot_mlp.mlp.bottom_linear_layer_3.weight : Float(128, 256, strides=[256, 1], requires_grad=1, device=cpu),
-      %bot_mlp.mlp.bottom_linear_layer_3.bias : Float(128, strides=[1], requires_grad=1, device=cpu),
-      %top_mlp.mlp.top_linear_layer_1.weight : Float(1024, 479, strides=[479, 1], requires_grad=1, device=cpu),
-      %top_mlp.mlp.top_linear_layer_1.bias : Float(1024, strides=[1], requires_grad=1, device=cpu),
-      %top_mlp.mlp.top_linear_layer_2.weight : Float(1024, 1024, strides=[1024, 1], requires_grad=1, device=cpu),
-      %top_mlp.mlp.top_linear_layer_2.bias : Float(1024, strides=[1], requires_grad=1, device=cpu),
-      %top_mlp.mlp.top_linear_layer_3.weight : Float(512, 1024, strides=[1024, 1], requires_grad=1, device=cpu),
-      %top_mlp.mlp.top_linear_layer_3.bias : Float(512, strides=[1], requires_grad=1, device=cpu),
-      %top_mlp.mlp.top_linear_layer_4.weight : Float(256, 512, strides=[512, 1], requires_grad=1, device=cpu),
-      %top_mlp.mlp.top_linear_layer_4.bias : Float(256, strides=[1], requires_grad=1, device=cpu),
-      %top_mlp.mlp.top_linear_layer_5.weight : Float(1, 256, strides=[256, 1], requires_grad=1, device=cpu),
-      %top_mlp.mlp.top_linear_layer_5.bias : Float(1, strides=[1], requires_grad=1, device=cpu)):
-  %/embedding/Gather_output_0 : Float(*, 26, 128, strides=[3328, 128, 1], requires_grad=1, device=cpu) = onnx::Gather[onnx_name="/embedding/Gather"](%embedding.weight, %keys), scope: __main__.DLRM::/torch.nn.modules.sparse.Embedding::embedding # /usr/local/lib/python3.8/dist-packages/torch/nn/functional.py:2206:0
-  %/bot_mlp/mlp/bottom_linear_layer_1/Gemm_output_0 : Float(*, 512, strides=[512, 1], requires_grad=1, device=cpu) = onnx::Gemm[alpha=1., beta=1., transB=1, onnx_name="/bot_mlp/mlp/bottom_linear_layer_1/Gemm"](%numerical_features, %bot_mlp.mlp.bottom_linear_layer_1.weight, %bot_mlp.mlp.bottom_linear_layer_1.bias), scope: __main__.DLRM::/__main__.MLP::bot_mlp/torch.nn.modules.container.Sequential::mlp/torch.nn.modules.linear.Linear::bottom_linear_layer_1 # /usr/local/lib/python3.8/dist-packages/torch/nn/modules/linear.py:114:0
-  %/bot_mlp/mlp/bottom_relu_layer_1/Relu_output_0 : Float(*, 512, strides=[512, 1], requires_grad=1, device=cpu) = onnx::Relu[onnx_name="/bot_mlp/mlp/bottom_relu_layer_1/Relu"](%/bot_mlp/mlp/bottom_linear_layer_1/Gemm_output_0), scope: __main__.DLRM::/__main__.MLP::bot_mlp/torch.nn.modules.container.Sequential::mlp/torch.nn.modules.activation.ReLU::bottom_relu_layer_1 # /usr/local/lib/python3.8/dist-packages/torch/nn/functional.py:1455:0
-  %/bot_mlp/mlp/bottom_linear_layer_2/Gemm_output_0 : Float(*, 256, strides=[256, 1], requires_grad=1, device=cpu) = onnx::Gemm[alpha=1., beta=1., transB=1, onnx_name="/bot_mlp/mlp/bottom_linear_layer_2/Gemm"](%/bot_mlp/mlp/bottom_relu_layer_1/Relu_output_0, %bot_mlp.mlp.bottom_linear_layer_2.weight, %bot_mlp.mlp.bottom_linear_layer_2.bias), scope: __main__.DLRM::/__main__.MLP::bot_mlp/torch.nn.modules.container.Sequential::mlp/torch.nn.modules.linear.Linear::bottom_linear_layer_2 # /usr/local/lib/python3.8/dist-packages/torch/nn/modules/linear.py:114:0
-  %/bot_mlp/mlp/bottom_relu_layer_2/Relu_output_0 : Float(*, 256, strides=[256, 1], requires_grad=1, device=cpu) = onnx::Relu[onnx_name="/bot_mlp/mlp/bottom_relu_layer_2/Relu"](%/bot_mlp/mlp/bottom_linear_layer_2/Gemm_output_0), scope: __main__.DLRM::/__main__.MLP::bot_mlp/torch.nn.modules.container.Sequential::mlp/torch.nn.modules.activation.ReLU::bottom_relu_layer_2 # /usr/local/lib/python3.8/dist-packages/torch/nn/functional.py:1455:0
-  %/bot_mlp/mlp/bottom_linear_layer_3/Gemm_output_0 : Float(*, 128, strides=[128, 1], requires_grad=1, device=cpu) = onnx::Gemm[alpha=1., beta=1., transB=1, onnx_name="/bot_mlp/mlp/bottom_linear_layer_3/Gemm"](%/bot_mlp/mlp/bottom_relu_layer_2/Relu_output_0, %bot_mlp.mlp.bottom_linear_layer_3.weight, %bot_mlp.mlp.bottom_linear_layer_3.bias), scope: __main__.DLRM::/__main__.MLP::bot_mlp/torch.nn.modules.container.Sequential::mlp/torch.nn.modules.linear.Linear::bottom_linear_layer_3 # /usr/local/lib/python3.8/dist-packages/torch/nn/modules/linear.py:114:0
-  %/bot_mlp/mlp/bottom_relu_layer_3/Relu_output_0 : Float(*, 128, strides=[128, 1], requires_grad=1, device=cpu) = onnx::Relu[onnx_name="/bot_mlp/mlp/bottom_relu_layer_3/Relu"](%/bot_mlp/mlp/bottom_linear_layer_3/Gemm_output_0), scope: __main__.DLRM::/__main__.MLP::bot_mlp/torch.nn.modules.container.Sequential::mlp/torch.nn.modules.activation.ReLU::bottom_relu_layer_3 # /usr/local/lib/python3.8/dist-packages/torch/nn/functional.py:1455:0
-  %/Constant_output_0 : Long(3, strides=[1], device=cpu) = onnx::Constant[value=  -1    1  128 [ CPULongType{3} ], onnx_name="/Constant"](), scope: __main__.DLRM:: # /tmp/ipykernel_52545/1281679600.py:70:0
-  %/Reshape_output_0 : Float(*, *, *, strides=[128, 128, 1], requires_grad=1, device=cpu) = onnx::Reshape[allowzero=0, onnx_name="/Reshape"](%/bot_mlp/mlp/bottom_relu_layer_3/Relu_output_0, %/Constant_output_0), scope: __main__.DLRM:: # /tmp/ipykernel_52545/1281679600.py:70:0
-  %/Concat_output_0 : Float(*, *, 128, strides=[3456, 128, 1], requires_grad=1, device=cpu) = onnx::Concat[axis=1, onnx_name="/Concat"](%/embedding/Gather_output_0, %/Reshape_output_0), scope: __main__.DLRM:: # /tmp/ipykernel_52545/1281679600.py:70:0
-  %/interaction_layer/Transpose_output_0 : Float(*, 128, *, strides=[3456, 1, 128], requires_grad=1, device=cpu) = onnx::Transpose[perm=[0, 2, 1], onnx_name="/interaction_layer/Transpose"](%/Concat_output_0), scope: __main__.DLRM::/__main__.SecondOrderFeatureInteraction::interaction_layer # /tmp/ipykernel_52545/1281679600.py:34:0
-  %/interaction_layer/MatMul_output_0 : Float(*, *, *, strides=[729, 27, 1], requires_grad=1, device=cpu) = onnx::MatMul[onnx_name="/interaction_layer/MatMul"](%/Concat_output_0, %/interaction_layer/Transpose_output_0), scope: __main__.DLRM::/__main__.SecondOrderFeatureInteraction::interaction_layer # /tmp/ipykernel_52545/1281679600.py:34:0
-  %/interaction_layer/Constant_output_0 : Long(2, strides=[1], device=cpu) = onnx::Constant[value=  -1  729 [ CPULongType{2} ], onnx_name="/interaction_layer/Constant"](), scope: __main__.DLRM::/__main__.SecondOrderFeatureInteraction::interaction_layer # /tmp/ipykernel_52545/1281679600.py:34:0
-  %/interaction_layer/Reshape_output_0 : Float(*, *, strides=[729, 1], requires_grad=1, device=cpu) = onnx::Reshape[allowzero=0, onnx_name="/interaction_layer/Reshape"](%/interaction_layer/MatMul_output_0, %/interaction_layer/Constant_output_0), scope: __main__.DLRM::/__main__.SecondOrderFeatureInteraction::interaction_layer # /tmp/ipykernel_52545/1281679600.py:34:0
-  %onnx::Gather_33 : Long(351, strides=[1], requires_grad=0, device=cpu) = onnx::Constant[value=<Tensor>]()
-  %/interaction_layer/Gather_output_0 : Float(*, 351, strides=[351, 1], requires_grad=1, device=cpu) = onnx::Gather[axis=1, onnx_name="/interaction_layer/Gather"](%/interaction_layer/Reshape_output_0, %onnx::Gather_33), scope: __main__.DLRM::/__main__.SecondOrderFeatureInteraction::interaction_layer # /tmp/ipykernel_52545/1281679600.py:36:0
-  %/Concat_1_output_0 : Float(*, 479, strides=[479, 1], requires_grad=1, device=cpu) = onnx::Concat[axis=1, onnx_name="/Concat_1"](%/bot_mlp/mlp/bottom_relu_layer_3/Relu_output_0, %/interaction_layer/Gather_output_0), scope: __main__.DLRM:: # /tmp/ipykernel_52545/1281679600.py:73:0
-  %/top_mlp/mlp/top_linear_layer_1/Gemm_output_0 : Float(*, 1024, strides=[1024, 1], requires_grad=1, device=cpu) = onnx::Gemm[alpha=1., beta=1., transB=1, onnx_name="/top_mlp/mlp/top_linear_layer_1/Gemm"](%/Concat_1_output_0, %top_mlp.mlp.top_linear_layer_1.weight, %top_mlp.mlp.top_linear_layer_1.bias), scope: __main__.DLRM::/__main__.MLP::top_mlp/torch.nn.modules.container.Sequential::mlp/torch.nn.modules.linear.Linear::top_linear_layer_1 # /usr/local/lib/python3.8/dist-packages/torch/nn/modules/linear.py:114:0
-  %/top_mlp/mlp/top_relu_layer_1/Relu_output_0 : Float(*, 1024, strides=[1024, 1], requires_grad=1, device=cpu) = onnx::Relu[onnx_name="/top_mlp/mlp/top_relu_layer_1/Relu"](%/top_mlp/mlp/top_linear_layer_1/Gemm_output_0), scope: __main__.DLRM::/__main__.MLP::top_mlp/torch.nn.modules.container.Sequential::mlp/torch.nn.modules.activation.ReLU::top_relu_layer_1 # /usr/local/lib/python3.8/dist-packages/torch/nn/functional.py:1455:0
-  %/top_mlp/mlp/top_linear_layer_2/Gemm_output_0 : Float(*, 1024, strides=[1024, 1], requires_grad=1, device=cpu) = onnx::Gemm[alpha=1., beta=1., transB=1, onnx_name="/top_mlp/mlp/top_linear_layer_2/Gemm"](%/top_mlp/mlp/top_relu_layer_1/Relu_output_0, %top_mlp.mlp.top_linear_layer_2.weight, %top_mlp.mlp.top_linear_layer_2.bias), scope: __main__.DLRM::/__main__.MLP::top_mlp/torch.nn.modules.container.Sequential::mlp/torch.nn.modules.linear.Linear::top_linear_layer_2 # /usr/local/lib/python3.8/dist-packages/torch/nn/modules/linear.py:114:0
-  %/top_mlp/mlp/top_relu_layer_2/Relu_output_0 : Float(*, 1024, strides=[1024, 1], requires_grad=1, device=cpu) = onnx::Relu[onnx_name="/top_mlp/mlp/top_relu_layer_2/Relu"](%/top_mlp/mlp/top_linear_layer_2/Gemm_output_0), scope: __main__.DLRM::/__main__.MLP::top_mlp/torch.nn.modules.container.Sequential::mlp/torch.nn.modules.activation.ReLU::top_relu_layer_2 # /usr/local/lib/python3.8/dist-packages/torch/nn/functional.py:1455:0
-  %/top_mlp/mlp/top_linear_layer_3/Gemm_output_0 : Float(*, 512, strides=[512, 1], requires_grad=1, device=cpu) = onnx::Gemm[alpha=1., beta=1., transB=1, onnx_name="/top_mlp/mlp/top_linear_layer_3/Gemm"](%/top_mlp/mlp/top_relu_layer_2/Relu_output_0, %top_mlp.mlp.top_linear_layer_3.weight, %top_mlp.mlp.top_linear_layer_3.bias), scope: __main__.DLRM::/__main__.MLP::top_mlp/torch.nn.modules.container.Sequential::mlp/torch.nn.modules.linear.Linear::top_linear_layer_3 # /usr/local/lib/python3.8/dist-packages/torch/nn/modules/linear.py:114:0
-  %/top_mlp/mlp/top_relu_layer_3/Relu_output_0 : Float(*, 512, strides=[512, 1], requires_grad=1, device=cpu) = onnx::Relu[onnx_name="/top_mlp/mlp/top_relu_layer_3/Relu"](%/top_mlp/mlp/top_linear_layer_3/Gemm_output_0), scope: __main__.DLRM::/__main__.MLP::top_mlp/torch.nn.modules.container.Sequential::mlp/torch.nn.modules.activation.ReLU::top_relu_layer_3 # /usr/local/lib/python3.8/dist-packages/torch/nn/functional.py:1455:0
-  %/top_mlp/mlp/top_linear_layer_4/Gemm_output_0 : Float(*, 256, strides=[256, 1], requires_grad=1, device=cpu) = onnx::Gemm[alpha=1., beta=1., transB=1, onnx_name="/top_mlp/mlp/top_linear_layer_4/Gemm"](%/top_mlp/mlp/top_relu_layer_3/Relu_output_0, %top_mlp.mlp.top_linear_layer_4.weight, %top_mlp.mlp.top_linear_layer_4.bias), scope: __main__.DLRM::/__main__.MLP::top_mlp/torch.nn.modules.container.Sequential::mlp/torch.nn.modules.linear.Linear::top_linear_layer_4 # /usr/local/lib/python3.8/dist-packages/torch/nn/modules/linear.py:114:0
-  %/top_mlp/mlp/top_relu_layer_4/Relu_output_0 : Float(*, 256, strides=[256, 1], requires_grad=1, device=cpu) = onnx::Relu[onnx_name="/top_mlp/mlp/top_relu_layer_4/Relu"](%/top_mlp/mlp/top_linear_layer_4/Gemm_output_0), scope: __main__.DLRM::/__main__.MLP::top_mlp/torch.nn.modules.container.Sequential::mlp/torch.nn.modules.activation.ReLU::top_relu_layer_4 # /usr/local/lib/python3.8/dist-packages/torch/nn/functional.py:1455:0
-  %/top_mlp/mlp/top_linear_layer_5/Gemm_output_0 : Float(*, 1, strides=[1, 1], requires_grad=1, device=cpu) = onnx::Gemm[alpha=1., beta=1., transB=1, onnx_name="/top_mlp/mlp/top_linear_layer_5/Gemm"](%/top_mlp/mlp/top_relu_layer_4/Relu_output_0, %top_mlp.mlp.top_linear_layer_5.weight, %top_mlp.mlp.top_linear_layer_5.bias), scope: __main__.DLRM::/__main__.MLP::top_mlp/torch.nn.modules.container.Sequential::mlp/torch.nn.modules.linear.Linear::top_linear_layer_5 # /usr/local/lib/python3.8/dist-packages/torch/nn/modules/linear.py:114:0
-  %output : Float(*, 1, strides=[1, 1], requires_grad=1, device=cpu) = onnx::Sigmoid[onnx_name="/top_mlp/mlp/top_relu_layer_5/Sigmoid"](%/top_mlp/mlp/top_linear_layer_5/Gemm_output_0), scope: __main__.DLRM::/__main__.MLP::top_mlp/torch.nn.modules.container.Sequential::mlp/torch.nn.modules.activation.Sigmoid::top_relu_layer_5 # /usr/local/lib/python3.8/dist-packages/torch/nn/modules/activation.py:294:0
-  return (%output)
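The shapes in the exported graph follow from the interaction layer. The 26 embedding vectors plus the bottom-MLP output form 27 feature vectors, and their pairwise dot products form a 27×27 matrix, which is reshaped to 729 columns. Only 351 entries of that matrix are gathered, and concatenating them with the 128-wide bottom-MLP output gives the 479 inputs of `top_linear_layer_1`. The small arithmetic check below reuses the index expression from the TracerWarning above.

```python
num_embedding_feas = 26              # one embedding vector per slot (slot_num)
num_feas = num_embedding_feas + 1    # plus the bottom-MLP output as one extra vector
bottom_mlp_out = 128                 # width of the last bottom-MLP layer

# Flattened row-major positions (i * num_feas + j) of the entries with i < j,
# i.e. the strictly upper triangle of the symmetric num_feas x num_feas matrix.
indices = [i * num_feas + j for j in range(1, num_feas) for i in range(j)]

num_pairs = len(indices)             # num_feas * (num_feas - 1) / 2
top_mlp_in = bottom_mlp_out + num_pairs

print(num_feas * num_feas, num_pairs, top_mlp_in)  # 729 351 479
```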
-
-
-
-
-
-
-
# ONNX graph surgery to insert the HPS TensorRT plugin placeholder
-import onnx_graphsurgeon as gs
-import numpy as np
-import onnx
-
-graph = gs.import_onnx(onnx.load("dlrm_pytorch.onnx"))
-saved = []
-
-for node in graph.nodes:
-    if node.name == "/embedding/Gather":
-        categorical_features = gs.Variable(name="categorical_features", dtype=np.int32, shape=("unknown", 26))
-        hps_node = gs.Node(op="HPS_TRT", attrs={"ps_config_file": "dlrm_pytorch.json\0", "model_name": "dlrm\0", "table_id": 0, "emb_vec_size": 128}, 
-                           inputs=[categorical_features], outputs=[node.outputs[0]])
-        graph.nodes.append(hps_node)
-        saved.append(categorical_features)
-        node.outputs.clear()
-for i in graph.inputs:
-    if i.name == "numerical_features":
-        saved.append(i)
-graph.inputs = saved
-
-graph.cleanup().toposort()
-onnx.save(gs.export_onnx(graph), "dlrm_pytorch_with_hps.onnx")
-
-
-
-
-
-
-

Step 3: Build the TensorRT engine

-
-
-
# build the TensorRT engine based on dlrm_pytorch_with_hps.onnx
-import tensorrt as trt
-import ctypes
-
-plugin_lib_name = "/usr/local/hps_trt/lib/libhps_plugin.so"
-handle = ctypes.CDLL(plugin_lib_name, mode=ctypes.RTLD_GLOBAL)
-
-TRT_LOGGER = trt.Logger(trt.Logger.INFO)
-EXPLICIT_BATCH = 1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)
-
-def build_engine_from_onnx(onnx_model_path):
-    with trt.Builder(TRT_LOGGER) as builder, builder.create_network(EXPLICIT_BATCH) as network, trt.OnnxParser(network, TRT_LOGGER) as parser, builder.create_builder_config() as builder_config:
-        with open(onnx_model_path, 'rb') as model:
-            if not parser.parse(model.read()):
-                # Surface parser errors instead of building from a partial network
-                for i in range(parser.num_errors):
-                    print(parser.get_error(i))
-                raise RuntimeError("Failed to parse {}".format(onnx_model_path))
-
-        # Dynamic batch dimension: min 1, opt 1024, max 1024 rows per request
-        profile = builder.create_optimization_profile()
-        profile.set_shape("categorical_features", (1, 26), (1024, 26), (1024, 26))
-        profile.set_shape("numerical_features", (1, 13), (1024, 13), (1024, 13))
-        builder_config.add_optimization_profile(profile)
-        engine = builder.build_serialized_network(network, builder_config)
-        if engine is None:
-            raise RuntimeError("TensorRT engine build failed")
-        return engine
-
-serialized_engine = build_engine_from_onnx("dlrm_pytorch_with_hps.onnx")
-with open("dlrm_pytorch_with_hps.trt", "wb") as fout:
-    fout.write(serialized_engine)
-print("Successfully build the TensorRT engine")
-
-
-
-
-
[01/03/2023-07:25:18] [TRT] [I] [MemUsageChange] Init CUDA: CPU +268, GPU +0, now: CPU 1035, GPU 497 (MiB)
-[01/03/2023-07:25:20] [TRT] [I] [MemUsageChange] Init builder kernel library: CPU +170, GPU +46, now: CPU 1259, GPU 543 (MiB)
-[01/03/2023-07:25:20] [TRT] [W] CUDA lazy loading is not enabled. Enabling it can significantly reduce device memory usage. See `CUDA_MODULE_LOADING` in https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#env-vars
-[01/03/2023-07:25:20] [TRT] [W] onnx2trt_utils.cpp:377: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
-[01/03/2023-07:25:20] [TRT] [I] No importer registered for op: HPS_TRT. Attempting to import as plugin.
-[01/03/2023-07:25:20] [TRT] [I] Searching for plugin: HPS_TRT, plugin_version: 1, plugin_namespace: 
-=====================================================HPS Parse====================================================
-[HCTR][07:25:20.652][INFO][RK0][main]: dense_file is not specified using default: 
-[HCTR][07:25:20.652][INFO][RK0][main]: num_of_refresher_buffer_in_pool is not specified using default: 1
-[HCTR][07:25:20.652][INFO][RK0][main]: maxnum_des_feature_per_sample is not specified using default: 26
-[HCTR][07:25:20.652][INFO][RK0][main]: refresh_delay is not specified using default: 0
-[HCTR][07:25:20.652][INFO][RK0][main]: refresh_interval is not specified using default: 0
-[HCTR][07:25:20.652][INFO][RK0][main]: use_static_table is not specified using default: 0
-====================================================HPS Create====================================================
-[HCTR][07:25:20.653][INFO][RK0][main]: Creating HashMap CPU database backend...
-[HCTR][07:25:20.653][DEBUG][RK0][main]: Created blank database backend in local memory!
-[HCTR][07:25:20.653][INFO][RK0][main]: Volatile DB: initial cache rate = 1
-[HCTR][07:25:20.653][INFO][RK0][main]: Volatile DB: cache missed embeddings = 0
-[HCTR][07:25:20.653][DEBUG][RK0][main]: Created raw model loader in local memory!
-[HCTR][07:25:20.653][INFO][RK0][main]: Using Local file system backend.
-[HCTR][07:25:22.209][INFO][RK0][main]: Table: hps_et.dlrm.sparse_embedding0; cached 260000 / 260000 embeddings in volatile database (HashMapBackend); load: 260000 / 18446744073709551615 (0.00%).
-[HCTR][07:25:22.220][DEBUG][RK0][main]: Real-time subscribers created!
-[HCTR][07:25:22.220][INFO][RK0][main]: Creating embedding cache in device 0.
-[HCTR][07:25:22.227][INFO][RK0][main]: Model name: dlrm
-[HCTR][07:25:22.227][INFO][RK0][main]: Max batch size: 1024
-[HCTR][07:25:22.227][INFO][RK0][main]: Number of embedding tables: 1
-[HCTR][07:25:22.227][INFO][RK0][main]: Use GPU embedding cache: True, cache size percentage: 1.000000
-[HCTR][07:25:22.227][INFO][RK0][main]: Use static table: False
-[HCTR][07:25:22.227][INFO][RK0][main]: Use I64 input key: False
-[HCTR][07:25:22.227][INFO][RK0][main]: Configured cache hit rate threshold: 1.000000
-[HCTR][07:25:22.227][INFO][RK0][main]: The size of thread pool: 80
-[HCTR][07:25:22.227][INFO][RK0][main]: The size of worker memory pool: 3
-[HCTR][07:25:22.227][INFO][RK0][main]: The size of refresh memory pool: 1
-[HCTR][07:25:22.227][INFO][RK0][main]: The refresh percentage : 0.200000
-[HCTR][07:25:22.280][DEBUG][RK0][main]: Created raw model loader in local memory!
-[HCTR][07:25:22.280][INFO][RK0][main]: Using Local file system backend.
-[HCTR][07:25:22.408][INFO][RK0][main]: EC initialization for model: "dlrm", num_tables: 1
-[HCTR][07:25:22.408][INFO][RK0][main]: EC initialization on device: 0
-[HCTR][07:25:22.433][INFO][RK0][main]: Creating lookup session for dlrm on device: 0
-[01/03/2023-07:25:22] [TRT] [I] Successfully created plugin: HPS_TRT
-[01/03/2023-07:25:22] [TRT] [I] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +331, GPU +144, now: CPU 5771, GPU 933 (MiB)
-[01/03/2023-07:25:23] [TRT] [I] [MemUsageChange] Init cuDNN: CPU +115, GPU +54, now: CPU 5886, GPU 987 (MiB)
-[01/03/2023-07:25:23] [TRT] [I] Local timing cache in use. Profiling results in this builder pass will not be stored.
-[01/03/2023-07:26:27] [TRT] [I] Total Activation Memory: 34103362048
-[01/03/2023-07:26:27] [TRT] [I] Detected 2 inputs and 1 output network tensors.
-[01/03/2023-07:26:27] [TRT] [I] Total Host Persistent Memory: 416
-[01/03/2023-07:26:27] [TRT] [I] Total Device Persistent Memory: 0
-[01/03/2023-07:26:27] [TRT] [I] Total Scratch Memory: 45142016
-[01/03/2023-07:26:27] [TRT] [I] [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 0 MiB, GPU 75 MiB
-[01/03/2023-07:26:27] [TRT] [I] [BlockAssignment] Started assigning block shifts. This will take 3 steps to complete.
-[01/03/2023-07:26:27] [TRT] [I] [BlockAssignment] Algorithm ShiftNTopDown took 0.011619ms to assign 3 blocks to 3 nodes requiring 58774016 bytes.
-[01/03/2023-07:26:27] [TRT] [I] Total Activation Memory: 58774016
-[01/03/2023-07:26:27] [TRT] [I] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +8, now: CPU 5933, GPU 1035 (MiB)
-[01/03/2023-07:26:27] [TRT] [I] [MemUsageChange] Init cuDNN: CPU +0, GPU +8, now: CPU 5933, GPU 1043 (MiB)
-[01/03/2023-07:26:27] [TRT] [I] [MemUsageChange] TensorRT-managed allocation in building engine: CPU +0, GPU +32, now: CPU 0, GPU 32 (MiB)
-Successfully build the TensorRT engine
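The optimization profile above accepts between 1 and 1024 rows for both inputs, and the HPS configuration also sets `max_batch_size` to 1024. A client holding more rows would therefore need to split them into several requests. The following is a minimal sketch. `split_into_batches` is a hypothetical helper, and 1024 is taken from the profile rather than queried from the engine.

```python
def split_into_batches(num_rows, max_batch_size=1024):
    """Return (start, stop) row ranges, each covering at most max_batch_size rows."""
    if max_batch_size <= 0:
        raise ValueError("max_batch_size must be positive")
    return [(start, min(start + max_batch_size, num_rows))
            for start in range(0, num_rows, max_batch_size)]

# 2500 rows -> three requests; slice both inputs with the same range.
for start, stop in split_into_batches(2500):
    print(start, stop)
    # categorical_batch = categorical_feature[start:stop]
    # numerical_batch = numerical_feature[start:stop]
```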
-
-
-
-
-
-
-
-

Deploy HPS-integrated TensorRT engine on Triton

-

To deploy the TensorRT engine with the Triton TensorRT backend, we first need to create the model repository and define config.pbtxt.

-
-
-
!mkdir -p model_repo/dlrm_pytorch_with_hps/1
-!mv dlrm_pytorch_with_hps.trt model_repo/dlrm_pytorch_with_hps/1
-
-
-
-
-
-
-
%%writefile model_repo/dlrm_pytorch_with_hps/config.pbtxt
-
-platform: "tensorrt_plan"
-default_model_filename: "dlrm_pytorch_with_hps.trt"
-backend: "tensorrt"
-max_batch_size: 0
-input [
-  {
-    name: "categorical_features"
-    data_type: TYPE_INT32
-    dims: [-1,26]
-  },
-  {
-    name: "numerical_features"
-    data_type: TYPE_FP32
-    dims: [-1,13]
-  }
-]
-output [
-  {
-      name: "output"
-      data_type: TYPE_FP32
-      dims: [-1,1]
-  }
-]
-instance_group [
-  {
-    count: 1
-    kind: KIND_GPU
-    gpus:[0]
-
-  }
-]
-
-
-
-
-
Writing model_repo/dlrm_pytorch_with_hps/config.pbtxt
-
-
-
-
-
-
-
!tree model_repo/dlrm_pytorch_with_hps
-
-
-
-
-
model_repo/dlrm_pytorch_with_hps
-├── 1
-│   └── dlrm_pytorch_with_hps.trt
-└── config.pbtxt
-
-1 directory, 2 files
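Triton expects each model in its own directory, containing a `config.pbtxt` and numbered version subdirectories. Here, `default_model_filename` names the plan file inside version `1`. The small sketch below checks that layout from Python before the server is started. `check_model_repo` is a hypothetical helper that only tests for the two files shown in the tree above.

```python
from pathlib import Path

def check_model_repo(repo_root, model_name, plan_filename, version="1"):
    """Return a list of missing files for one Triton model directory (empty if OK)."""
    model_dir = Path(repo_root) / model_name
    required = [model_dir / "config.pbtxt", model_dir / version / plan_filename]
    return [f"missing {path}" for path in required if not path.is_file()]

print(check_model_repo("model_repo", "dlrm_pytorch_with_hps", "dlrm_pytorch_with_hps.trt"))
```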
-
-
-
-
-

We can then launch the Triton inference server with the TensorRT backend. Please note that LD_PRELOAD is used to load the custom TensorRT plugin (i.e., the HPS TensorRT plugin) into Triton.

-

Note: Because Jupyter does not support background processes, please launch the Triton server independently in the background using the following command.

-
-

LD_PRELOAD=/usr/local/hps_trt/lib/libhps_plugin.so tritonserver --model-repository=/hugectr/hps_trt/notebooks/model_repo/ --load-model=dlrm_pytorch_with_hps --model-control-mode=explicit
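For reference, the same launch can be expressed in a standalone Python script that runs outside Jupyter, as the note above requires. This sketch only assembles the environment and argument list. `triton_launch_spec` is a hypothetical helper, and it overwrites any existing `LD_PRELOAD` value rather than appending to it.

```python
import os
import shlex

PLUGIN = "/usr/local/hps_trt/lib/libhps_plugin.so"

def triton_launch_spec(model_repository, model_name, plugin_path=PLUGIN):
    """Return (env, argv) equivalent to the shell command above."""
    env = dict(os.environ, LD_PRELOAD=plugin_path)  # replaces any existing LD_PRELOAD
    argv = ["tritonserver",
            f"--model-repository={model_repository}",
            f"--load-model={model_name}",
            "--model-control-mode=explicit"]
    return env, argv

env, argv = triton_launch_spec("/hugectr/hps_trt/notebooks/model_repo/", "dlrm_pytorch_with_hps")
print("LD_PRELOAD=" + env["LD_PRELOAD"], shlex.join(argv))
# From a standalone script (not this notebook) one could then run:
# subprocess.Popen(argv, env=env)
```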

-
-

If tritonserver started successfully, you should see a log similar to the following:

-
+----------+--------------------------------+--------------------------------+
-| Backend  | Path                           | Config                         |
-+----------+--------------------------------+--------------------------------+
-| tensorrt | /opt/tritonserver/backends/ten | {"cmdline":{"auto-complete-con |
-|          | sorrt/libtriton_tensorrt.so    | fig":"true","min-compute-capab |
-|          |                                | ility":"6.000000","backend-dir |
-|          |                                | ectory":"/opt/tritonserver/bac |
-|          |                                | kends","default-max-batch-size |
-|          |                                | ":"4"}}                        |
-|          |                                |                                |
-+----------+--------------------------------+--------------------------------+
-
-
-+-----------------------+---------+--------+
-| Model                 | Version | Status |
-+-----------------------+---------+--------+
-| dlrm_pytorch_with_hps | 1       | READY  |
-+-----------------------+---------+--------+
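Instead of watching the log, readiness can also be checked over HTTP. Triton implements the KServe v2 health endpoints `GET /v2/health/ready` and `GET /v2/models/<name>/ready`, which return HTTP 200 when the server or model is ready. The polling sketch below uses only the standard library. It assumes the server listens on `localhost:8000`, as in the client code. `wait_until_ready` is a hypothetical helper.

```python
import time
import urllib.request

# Talk to the server directly, ignoring any http_proxy environment settings.
_opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))

def _is_ready(url, timeout):
    try:
        with _opener.open(url, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:  # connection refused, timeout, or an HTTP error status (HTTPError)
        return False

def wait_until_ready(base_url="http://localhost:8000", model_name="dlrm_pytorch_with_hps",
                     timeout_s=60.0, poll_s=1.0):
    """Poll server and model readiness until both return 200 or timeout_s elapses."""
    urls = [f"{base_url}/v2/health/ready", f"{base_url}/v2/models/{model_name}/ready"]
    deadline = time.monotonic() + timeout_s
    while True:
        if all(_is_ready(u, poll_s) for u in urls):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_s)
```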
-
-
-

We can then send requests to the Triton inference server using the HTTP client.

-
-
-
import numpy as np
-import tritonclient.http as httpclient
-from tritonclient.utils import np_to_triton_dtype
-
-BATCH_SIZE = 1024
-
-categorical_feature = np.random.randint(0,260000,size=(BATCH_SIZE,26)).astype(np.int32)
-numerical_feature = np.random.random((BATCH_SIZE, 13)).astype(np.float32)
-
-inputs = [
-    httpclient.InferInput("categorical_features", 
-                          categorical_feature.shape,
-                          np_to_triton_dtype(np.int32)),
-    httpclient.InferInput("numerical_features", 
-                          numerical_feature.shape,
-                          np_to_triton_dtype(np.float32)),                          
-]
-inputs[0].set_data_from_numpy(categorical_feature)
-inputs[1].set_data_from_numpy(numerical_feature)
-
-
-outputs = [
-    httpclient.InferRequestedOutput("output")
-]
-
-model_name = "dlrm_pytorch_with_hps"
-
-with httpclient.InferenceServerClient("localhost:8000") as client:
-    response = client.infer(model_name,
-                            inputs,
-                            outputs=outputs)
-    result = response.get_response()
-    
-    print("Prediction result is \n{}".format(response.as_numpy("output")))
-    print("Response details:\n{}".format(result))
-
-
-
-
-
Prediction result is 
-[[0.5128022 ]
- [0.51312006]
- [0.51246136]
- ...
- [0.5129204 ]
- [0.51302147]
- [0.513144  ]]
-Response details:
-{'model_name': 'dlrm_pytorch_with_hps', 'model_version': '1', 'outputs': [{'name': 'output', 'datatype': 'FP32', 'shape': [1024, 1], 'parameters': {'binary_data_size': 4096}}]}
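The tritonclient HTTP client above speaks Triton's KServe v2 HTTP/REST protocol. By default, it sends tensor data using Triton's binary-data extension, which is why `binary_data_size` appears in the response details. For debugging without tritonclient, the same request can be written as plain JSON, which the protocol also accepts. The sketch below builds such a body. `build_v2_infer_request` is hypothetical, and tensor data is flattened in row-major order.

```python
import json

def build_v2_infer_request(categorical, numerical, output_name="output"):
    """Build a KServe-v2 JSON request body for the two inputs of dlrm_pytorch_with_hps.

    categorical: batch of rows of int32 keys (26 per row in this notebook)
    numerical:   batch of rows of floats (13 per row in this notebook)
    """
    if len(categorical) != len(numerical):
        raise ValueError("both inputs must have the same batch size")

    def tensor(name, rows, datatype):
        width = len(rows[0]) if rows else 0
        if any(len(row) != width for row in rows):
            raise ValueError(f"{name}: all rows must have the same length")
        return {"name": name, "shape": [len(rows), width], "datatype": datatype,
                "data": [v for row in rows for v in row]}  # row-major flattening

    return {
        "inputs": [tensor("categorical_features", categorical, "INT32"),
                   tensor("numerical_features", numerical, "FP32")],
        "outputs": [{"name": output_name}],
    }

body = build_v2_infer_request([[1] * 26, [2] * 26], [[0.5] * 13, [0.25] * 13])
payload = json.dumps(body).encode("utf-8")
# POST payload to http://localhost:8000/v2/models/dlrm_pytorch_with_hps/infer
# with the header Content-Type: application/json (e.g. via urllib.request.Request).
```

With the NumPy arrays from the client code, the inputs would be passed as `categorical_feature.tolist()` and `numerical_feature.tolist()`.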
-
-
-
-
-
-
- - -
-
- -
-
-
-
- - - - - - - \ No newline at end of file diff --git a/review/pr-458/hps_trt/notebooks/demo_for_tf_trained_model.html b/review/pr-458/hps_trt/notebooks/demo_for_tf_trained_model.html deleted file mode 100644 index 49ae1cdc66..0000000000 --- a/review/pr-458/hps_trt/notebooks/demo_for_tf_trained_model.html +++ /dev/null @@ -1,946 +0,0 @@ - - - - - - - HPS TensorRT Plugin Demo for TensorFlow Trained Model — Merlin HugeCTR documentation - - - - - - - - - - - - - - - - - - - - - - - - - -
- - -
- -
-
-
- -
-
-
-
-

HPS TensorRT Plugin Demo for TensorFlow Trained Model

-
-

Overview

-

This notebook demonstrates how to build and deploy the HPS-integrated TensorRT engine for a model trained with TensorFlow.

-

For more details about HPS, please refer to HugeCTR Hierarchical Parameter Server (HPS).

-
-
-

Installation

-
-

Use NGC

-

The HPS TensorRT plugin is preinstalled in the Merlin HugeCTR container, version 24.06 and later: nvcr.io/nvidia/merlin/merlin-hugectr:24.06.

-

After launching this container, you can verify that the required library exists by running the following Python code.

-
-
-
import ctypes
-plugin_lib_name = "/usr/local/hps_trt/lib/libhps_plugin.so"
-plugin_handle = ctypes.CDLL(plugin_lib_name, mode=ctypes.RTLD_GLOBAL)
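The cell above raises an `OSError` if the library is absent or cannot be loaded. The sketch below is a slightly more forgiving version that reports the problem and returns `None` instead. `try_load_plugin` is a hypothetical helper, and the default path is the one used in this notebook.

```python
import ctypes
import os

def try_load_plugin(path="/usr/local/hps_trt/lib/libhps_plugin.so"):
    """Load the HPS TensorRT plugin like the cell above, or return None with a message."""
    if not os.path.isfile(path):
        print(f"HPS TensorRT plugin not found at {path}")
        return None
    try:
        # Same flags as the notebook cell: load the library with RTLD_GLOBAL.
        return ctypes.CDLL(path, mode=ctypes.RTLD_GLOBAL)
    except OSError as err:  # e.g. a dependent shared library is missing
        print(f"Failed to load {path}: {err}")
        return None

plugin_handle = try_load_plugin()
```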
-
-
-
-
-
-
-
-

Configurations

-

First, we specify the required configurations, e.g., the arguments needed for generating the dataset, the model parameters, and the paths for saving the model. We will use a DLRM model, which has one embedding table, bottom MLP layers, an interaction layer, and top MLP layers. Please note that the input to the embedding layer will be a dense key tensor of type int32.

-
-
-
import os
-import numpy as np
-import tensorflow as tf
-import struct
-
-args = dict()
-
-args["gpu_num"] = 1                               # the number of available GPUs
-args["iter_num"] = 50                             # the number of training iterations
-args["slot_num"] = 26                             # the number of feature fields in this embedding layer
-args["embed_vec_size"] = 128                      # the dimension of embedding vectors
-args["dense_dim"] = 13                            # the dimension of dense features
-args["global_batch_size"] = 1024                  # the global batch size across all GPUs
-args["max_vocabulary_size"] = 260000
-args["vocabulary_range_per_slot"] = [[i*10000, (i+1)*10000] for i in range(26)]
-args["combiner"] = "mean"
-
-args["ps_config_file"] = "dlrm_tf.json"
-args["embedding_table_path"] = "dlrm_tf_sparse.model"
-args["saved_path"] = "dlrm_tf_saved_model"
-args["np_key_type"] = np.int32
-args["np_vector_type"] = np.float32
-args["tf_key_type"] = tf.int32
-args["tf_vector_type"] = tf.float32
-
-os.environ["CUDA_VISIBLE_DEVICES"] = ",".join(map(str, range(args["gpu_num"])))
-
-
-
-
-
2023-08-21 03:16:46.032517: I tensorflow/core/platform/cpu_feature_guard.cc:183] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
-To enable the following instructions: SSE3 SSE4.1 SSE4.2 AVX, in other operations, rebuild TensorFlow with the appropriate compiler flags.
-
-
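Because each slot draws keys from its own 10,000-key range, the 26 ranges are expected to tile [0, 260000) exactly, which is why max_vocabulary_size is 260000 and why int32 keys are sufficient. A quick standalone sanity check, using a hypothetical helper name:

```python
import numpy as np

slot_num = 26
rows_per_slot = 10000  # matches the 10000-key ranges in args["vocabulary_range_per_slot"]
vocabulary_range_per_slot = [[i * rows_per_slot, (i + 1) * rows_per_slot] for i in range(slot_num)]
max_vocabulary_size = 260000


def ranges_tile_vocabulary(ranges, vocab_size):
    """True if the half-open [low, high) ranges are contiguous, non-empty and cover [0, vocab_size)."""
    expected_low = 0
    for low, high in ranges:
        if low != expected_low or high <= low:
            return False
        expected_low = high
    return expected_low == vocab_size


print(ranges_tile_vocabulary(vocabulary_range_per_slot, max_vocabulary_size))  # True
# The largest key (259999) fits comfortably in int32
print(max_vocabulary_size - 1 <= np.iinfo(np.int32).max)  # True
```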
-
-
-
-
-
def generate_random_samples(num_samples, vocabulary_range_per_slot, dense_dim, key_dtype = args["np_key_type"]):
-    keys = list()
-    for vocab_range in vocabulary_range_per_slot:
-        keys_per_slot = np.random.randint(low=vocab_range[0], high=vocab_range[1], size=(num_samples, 1), dtype=key_dtype)
-        keys.append(keys_per_slot)
-    keys = np.concatenate(np.array(keys), axis = 1)
-    numerical_features = np.random.random((num_samples, dense_dim)).astype(np.float32)
-    labels = np.random.randint(low=0, high=2, size=(num_samples, 1))
-    return keys, numerical_features, labels
-
-def tf_dataset(keys, numerical_features, labels, batchsize):
-    dataset = tf.data.Dataset.from_tensor_slices((keys, numerical_features, labels))
-    dataset = dataset.batch(batchsize, drop_remainder=True)
-    return dataset
-
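The per-slot loop in generate_random_samples can also be expressed as one vectorized draw, because NumPy's Generator.integers broadcasts array-valued low and high bounds. This is an alternative sketch, not the notebook's code: it uses the np.random.default_rng API with a fixed seed, so the values differ from the original generator while the shapes, dtypes, and per-slot ranges are the same.

```python
import numpy as np


def generate_random_samples_vectorized(num_samples, vocabulary_range_per_slot, dense_dim,
                                       key_dtype=np.int32, seed=0):
    rng = np.random.default_rng(seed)
    ranges = np.asarray(vocabulary_range_per_slot, dtype=np.int64)  # shape (slot_num, 2)
    lows, highs = ranges[:, 0], ranges[:, 1]
    # One draw per (sample, slot); column j is uniform in [lows[j], highs[j])
    keys = rng.integers(lows, highs, size=(num_samples, len(ranges))).astype(key_dtype)
    numerical_features = rng.random((num_samples, dense_dim), dtype=np.float32)
    labels = rng.integers(0, 2, size=(num_samples, 1))
    return keys, numerical_features, labels


ranges = [[i * 10000, (i + 1) * 10000] for i in range(26)]
keys, numerical_features, labels = generate_random_samples_vectorized(8, ranges, 13)
print(keys.shape, numerical_features.shape, labels.shape)  # (8, 26) (8, 13) (8, 1)
```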
-
-
-
-
-
-

Train with native TF layers

-

We define the model graph for training with native TF layers, such as tf.nn.embedding_lookup and tf.keras.layers.Dense. We then train the model and extract the trained weights of the embedding table.

-
-
-
class MLP(tf.keras.layers.Layer):
-    def __init__(self,
-                arch,
-                activation='relu',
-                out_activation=None,
-                **kwargs):
-        super(MLP, self).__init__(**kwargs)
-        self.layers = []
-        index = 0
-        for units in arch[:-1]:
-            self.layers.append(tf.keras.layers.Dense(units, activation=activation, name="{}_{}".format(kwargs['name'], index)))
-            index+=1
-        self.layers.append(tf.keras.layers.Dense(arch[-1], activation=out_activation, name="{}_{}".format(kwargs['name'], index)))
-
-            
-    def call(self, inputs, training=True):
-        x = self.layers[0](inputs)
-        for layer in self.layers[1:]:
-            x = layer(x)
-        return x
-
-class SecondOrderFeatureInteraction(tf.keras.layers.Layer):
-    def __init__(self):
-        super(SecondOrderFeatureInteraction, self).__init__()
-
-    def call(self, inputs, num_feas):
-        dot_products = tf.reshape(tf.matmul(inputs, inputs, transpose_b=True), (-1, num_feas * num_feas))
-        indices = tf.constant([i * num_feas + j for j in range(1, num_feas) for i in range(j)])
-        flat_interactions = tf.gather(dot_products, indices, axis=1)
-        return flat_interactions
-
-class DLRM(tf.keras.models.Model):
-    def __init__(self,
-                 init_tensors,
-                 embed_vec_size,
-                 slot_num,
-                 dense_dim,
-                 arch_bot,
-                 arch_top,
-                 **kwargs):
-        super(DLRM, self).__init__(**kwargs)
-        
-
-        self.init_tensors = init_tensors
-        self.params = tf.Variable(initial_value=tf.concat(self.init_tensors, axis=0))
-        
-        self.embed_vec_size = embed_vec_size
-        self.slot_num = slot_num
-        self.dense_dim = dense_dim
-    
-        self.bot_nn = MLP(arch_bot, name = "bottom", out_activation='relu')
-        self.top_nn = MLP(arch_top, name = "top", out_activation='sigmoid')
-        self.interaction_op = SecondOrderFeatureInteraction()
-
-        self.interaction_out_dim = self.slot_num * (self.slot_num+1) // 2
-        self.reshape_layer1 = tf.keras.layers.Reshape((1, arch_bot[-1]), name = "reshape1")
-        self.concat1 = tf.keras.layers.Concatenate(axis=1, name = "concat1")
-        self.concat2 = tf.keras.layers.Concatenate(axis=1, name = "concat2")
-            
-    def call(self, inputs, training=True):
-        categorical_features = inputs["keys"]
-        numerical_features = inputs["numerical_features"]
-        
-        embedding_vector = tf.nn.embedding_lookup(params=self.params, ids=categorical_features)
-        dense_x = self.bot_nn(numerical_features)
-        concat_features = self.concat1([embedding_vector, self.reshape_layer1(dense_x)])
-        
-        Z = self.interaction_op(concat_features, self.slot_num+1)
-        z = self.concat2([dense_x, Z])
-        logit = self.top_nn(z)
-        return logit
-
-    def summary(self):
-        inputs = {"keys": tf.keras.Input(shape=(self.slot_num, ), dtype=args["tf_key_type"], name="keys"), 
-                  "numerical_features": tf.keras.Input(shape=(self.dense_dim, ), dtype=tf.float32, name="numrical_features")}
-        model = tf.keras.models.Model(inputs=inputs, outputs=self.call(inputs))
-        return model.summary()
-
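SecondOrderFeatureInteraction keeps only the dot products for pairs (i, j) with i < j, so for the 27 features here (26 embedding vectors plus the bottom-MLP output) it produces 27 * 26 / 2 = 351 values. The following NumPy re-implementation of the same indexing is a sketch for checking the layer offline, not part of the notebook:

```python
import numpy as np


def second_order_interaction(x):
    """x: (batch, num_feas, dim) -> (batch, num_feas * (num_feas - 1) // 2) pairwise dot products."""
    batch, num_feas, _ = x.shape
    dot_products = np.matmul(x, np.transpose(x, (0, 2, 1))).reshape(batch, num_feas * num_feas)
    # Flat index i * num_feas + j addresses pair (i, j); keep only i < j
    indices = [i * num_feas + j for j in range(1, num_feas) for i in range(j)]
    return dot_products[:, indices]


rng = np.random.default_rng(0)
x = rng.random((2, 27, 4))  # 26 embedding vectors + 1 bottom-MLP output, toy dimension 4
out = second_order_interaction(x)
print(out.shape)  # (2, 351)
```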
-
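The parameter counts that model.summary() prints can be reproduced by hand: each Dense layer contributes in_dim * units weights plus units biases, and the top MLP input is 128 + 351 = 479 wide. The sketch below is plain arithmetic under those assumptions. It also shows that the 260000 x 128 embedding table (33,280,000 values) is not part of the summary total, which is consistent with the Lambda-layer warning printed during training.

```python
def dense_params(arch, in_dim):
    """Weights + biases of a stack of Dense layers with the given output widths."""
    total = 0
    for units in arch:
        total += in_dim * units + units
        in_dim = units
    return total


slot_num, embed_vec_size, dense_dim = 26, 128, 13
num_feas = slot_num + 1
interaction_dim = num_feas * (num_feas - 1) // 2  # 351
top_in = embed_vec_size + interaction_dim         # 479

bottom = dense_params([512, 256, embed_vec_size], dense_dim)
top = dense_params([1024, 1024, 512, 256, 1], top_in)
print(bottom, top, bottom + top)  # 171392 2197505 2368897
print(260000 * embed_vec_size)    # 33280000 embedding values, not counted above
```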
-
-
-
-
-
 def train(args):
-    init_tensors = np.ones(shape=[args["max_vocabulary_size"], args["embed_vec_size"]], dtype=args["np_vector_type"])
-    
-    model = DLRM(init_tensors, args["embed_vec_size"], args["slot_num"], args["dense_dim"],
-                arch_bot = [512, 256, args["embed_vec_size"]],
-                arch_top = [1024, 1024, 512, 256, 1],
-                name = "dlrm")
-    model.summary()
-    optimizer = tf.keras.optimizers.Adam(learning_rate=0.1)
-    loss_fn = tf.keras.losses.BinaryCrossentropy()
-    
-    def _train_step(inputs, labels):
-        with tf.GradientTape() as tape:
-            logit = model(inputs)
-            loss = loss_fn(labels, logit)
-        grads = tape.gradient(loss, model.trainable_variables)
-        optimizer.apply_gradients(zip(grads, model.trainable_variables))
-        return loss, logit
-
-    keys, numerical_features, labels = generate_random_samples(args["global_batch_size"]  * args["iter_num"], args["vocabulary_range_per_slot"], args["dense_dim"], args["np_key_type"])
-    dataset = tf_dataset(keys, numerical_features, labels, args["global_batch_size"])
-    for i, (keys, numerical_features, labels) in enumerate(dataset):
-        inputs = {"keys": keys, "numerical_features": numerical_features}
-        loss, logit = _train_step(inputs, labels)
-        print("-"*20, "Step {}, loss: {}".format(i, loss),  "-"*20)
-
-    return model
-
-
-
-
-
-
-
trained_model = train(args)
-weights_list = trained_model.get_weights()
-embedding_weights = weights_list[-1]
-trained_model.save(args["saved_path"])
-
-
-
-
-
2023-08-21 03:16:55.963734: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1638] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 30974 MB memory:  -> device: 0, name: Tesla V100-SXM2-32GB, pci bus id: 0000:06:00.0, compute capability: 7.0
-
-
-
WARNING:tensorflow:The following Variables were used in a Lambda layer's call (tf.compat.v1.nn.embedding_lookup), but are not present in its tracked objects:   <tf.Variable 'Variable:0' shape=(260000, 128) dtype=float32>. This is a strong indication that the Lambda layer should be rewritten as a subclassed Layer.
-Model: "model"
-__________________________________________________________________________________________________
- Layer (type)                   Output Shape         Param #     Connected to                     
-==================================================================================================
- numrical_features (InputLayer)  [(None, 13)]        0           []                               
-                                                                                                  
- bottom (MLP)                   (None, 128)          171392      ['numrical_features[0][0]']      
-                                                                                                  
- keys (InputLayer)              [(None, 26)]         0           []                               
-                                                                                                  
- tf.compat.v1.nn.embedding_look  (None, 26, 128)     0           ['keys[0][0]']                   
- up (TFOpLambda)                                                                                  
-                                                                                                  
- reshape1 (Reshape)             (None, 1, 128)       0           ['bottom[0][0]']                 
-                                                                                                  
- concat1 (Concatenate)          (None, 27, 128)      0           ['tf.compat.v1.nn.embedding_looku
-                                                                 p[0][0]',                        
-                                                                  'reshape1[0][0]']               
-                                                                                                  
- second_order_feature_interacti  (None, 351)         0           ['concat1[0][0]']                
- on (SecondOrderFeatureInteract                                                                   
- ion)                                                                                             
-                                                                                                  
- concat2 (Concatenate)          (None, 479)          0           ['bottom[0][0]',                 
-                                                                  'second_order_feature_interactio
-                                                                 n[0][0]']                        
-                                                                                                  
- top (MLP)                      (None, 1)            2197505     ['concat2[0][0]']                
-                                                                                                  
-==================================================================================================
-Total params: 2,368,897
-Trainable params: 2,368,897
-Non-trainable params: 0
-__________________________________________________________________________________________________
-
-
-
2023-08-21 03:16:57.578464: I tensorflow/core/common_runtime/executor.cc:1209] [/device:CPU:0] (DEBUG INFO) Executor start aborting (this does not indicate an error and you can ignore this message): INVALID_ARGUMENT: You must feed a value for placeholder tensor 'Placeholder/_2' with dtype int64 and shape [51200,1]
-	 [[{{node Placeholder/_2}}]]
-2023-08-21 03:16:58.892396: I tensorflow/compiler/xla/service/service.cc:169] XLA service 0x55e0fdfeb330 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
-2023-08-21 03:16:58.892450: I tensorflow/compiler/xla/service/service.cc:177]   StreamExecutor device (0): Tesla V100-SXM2-32GB, Compute Capability 7.0
-2023-08-21 03:16:58.897903: I tensorflow/compiler/mlir/tensorflow/utils/dump_mlir_util.cc:269] disabling MLIR crash reproducer, set env var `MLIR_CRASH_REPRODUCER_DIRECTORY` to enable.
-2023-08-21 03:16:59.379151: I tensorflow/compiler/xla/stream_executor/cuda/cuda_dnn.cc:424] Loaded cuDNN version 8902
-2023-08-21 03:16:59.502058: I ./tensorflow/compiler/jit/device_compiler.h:180] Compiled cluster using XLA!  This line is logged at most once for the lifetime of the process.
-
-
-
WARNING:tensorflow:5 out of the last 5 calls to <function _BaseOptimizer._update_step_xla at 0x7fa9660adab0> triggered tf.function retracing. Tracing is expensive and the excessive number of tracings could be due to (1) creating @tf.function repeatedly in a loop, (2) passing tensors with different shapes, (3) passing Python objects instead of tensors. For (1), please define your @tf.function outside of the loop. For (2), @tf.function has reduce_retracing=True option that can avoid unnecessary retracing. For (3), please refer to https://www.tensorflow.org/guide/function#controlling_retracing and https://www.tensorflow.org/api_docs/python/tf/function for  more details.
-WARNING:tensorflow:6 out of the last 6 calls to <function _BaseOptimizer._update_step_xla at 0x7fa9660adab0> triggered tf.function retracing. Tracing is expensive and the excessive number of tracings could be due to (1) creating @tf.function repeatedly in a loop, (2) passing tensors with different shapes, (3) passing Python objects instead of tensors. For (1), please define your @tf.function outside of the loop. For (2), @tf.function has reduce_retracing=True option that can avoid unnecessary retracing. For (3), please refer to https://www.tensorflow.org/guide/function#controlling_retracing and https://www.tensorflow.org/api_docs/python/tf/function for  more details.
--------------------- Step 0, loss: 39.68028259277344 --------------------
--------------------- Step 1, loss: 2571352064.0 --------------------
--------------------- Step 2, loss: 639234.5 --------------------
--------------------- Step 3, loss: 4132346.75 --------------------
--------------------- Step 4, loss: 20792958.0 --------------------
--------------------- Step 5, loss: 5957.8994140625 --------------------
--------------------- Step 6, loss: 231005.96875 --------------------
--------------------- Step 7, loss: 185315.3125 --------------------
--------------------- Step 8, loss: 151740.75 --------------------
--------------------- Step 9, loss: 43695.6640625 --------------------
--------------------- Step 10, loss: 45556.24609375 --------------------
--------------------- Step 11, loss: 131654.78125 --------------------
--------------------- Step 12, loss: 1.8805829286575317 --------------------
--------------------- Step 13, loss: 49121.47265625 --------------------
--------------------- Step 14, loss: 60609.62109375 --------------------
--------------------- Step 15, loss: 676294.375 --------------------
--------------------- Step 16, loss: 31208.66015625 --------------------
--------------------- Step 17, loss: 156789.65625 --------------------
--------------------- Step 18, loss: 103213.1015625 --------------------
--------------------- Step 19, loss: 22.394046783447266 --------------------
--------------------- Step 20, loss: 10789.5703125 --------------------
--------------------- Step 21, loss: 2716.05859375 --------------------
--------------------- Step 22, loss: 139559.96875 --------------------
--------------------- Step 23, loss: 130419.9453125 --------------------
--------------------- Step 24, loss: 13583.6923828125 --------------------
--------------------- Step 25, loss: 7378.22802734375 --------------------
--------------------- Step 26, loss: 81185.40625 --------------------
--------------------- Step 27, loss: 18370.255859375 --------------------
--------------------- Step 28, loss: 3314.90478515625 --------------------
--------------------- Step 29, loss: 15871.3154296875 --------------------
--------------------- Step 30, loss: 545.2841796875 --------------------
--------------------- Step 31, loss: 1281.3038330078125 --------------------
--------------------- Step 32, loss: 52890.65625 --------------------
--------------------- Step 33, loss: 2550.232177734375 --------------------
--------------------- Step 34, loss: 4526.03759765625 --------------------
--------------------- Step 35, loss: 25.5832462310791 --------------------
--------------------- Step 36, loss: 22.22301483154297 --------------------
--------------------- Step 37, loss: 17.7525691986084 --------------------
--------------------- Step 38, loss: 9.034607887268066 --------------------
--------------------- Step 39, loss: 1.6510401964187622 --------------------
--------------------- Step 40, loss: 6.275766372680664 --------------------
--------------------- Step 41, loss: 3.707094430923462 --------------------
--------------------- Step 42, loss: 0.7623991966247559 --------------------
--------------------- Step 43, loss: 1.5783321857452393 --------------------
--------------------- Step 44, loss: 0.8166252374649048 --------------------
--------------------- Step 45, loss: 0.885994553565979 --------------------
--------------------- Step 46, loss: 0.912842869758606 --------------------
--------------------- Step 47, loss: 0.7323049902915955 --------------------
--------------------- Step 48, loss: 0.7469371557235718 --------------------
--------------------- Step 49, loss: 0.8475004434585571 --------------------
-WARNING:tensorflow:Model's `__init__()` arguments contain non-serializable objects. Please implement a `get_config()` method in the subclassed Model for proper saving and loading. Defaulting to empty config.
-WARNING:tensorflow:Model's `__init__()` arguments contain non-serializable objects. Please implement a `get_config()` method in the subclassed Model for proper saving and loading. Defaulting to empty config.
-
-
-
2023-08-21 03:17:12.248789: I tensorflow/core/common_runtime/executor.cc:1209] [/device:CPU:0] (DEBUG INFO) Executor start aborting (this does not indicate an error and you can ignore this message): INVALID_ARGUMENT: You must feed a value for placeholder tensor 'inputs' with dtype float and shape [?,128]
-	 [[{{node inputs}}]]
-2023-08-21 03:17:12.721088: I tensorflow/core/common_runtime/executor.cc:1209] [/device:CPU:0] (DEBUG INFO) Executor start aborting (this does not indicate an error and you can ignore this message): INVALID_ARGUMENT: You must feed a value for placeholder tensor 'inputs' with dtype float and shape [?,128]
-	 [[{{node inputs}}]]
-WARNING:absl:Found untraced functions such as bottom_0_layer_call_fn, bottom_0_layer_call_and_return_conditional_losses, bottom_1_layer_call_fn, bottom_1_layer_call_and_return_conditional_losses, bottom_2_layer_call_fn while saving (showing 5 of 16). These functions will not be directly callable after loading.
-
-
-
INFO:tensorflow:Assets written to: dlrm_tf_saved_model/assets
-
-
-
INFO:tensorflow:Assets written to: dlrm_tf_saved_model/assets
-
-
-
WARNING:tensorflow:Model's `__init__()` arguments contain non-serializable objects. Please implement a `get_config()` method in the subclassed Model for proper saving and loading. Defaulting to empty config.
-
-
-
WARNING:tensorflow:Model's `__init__()` arguments contain non-serializable objects. Please implement a `get_config()` method in the subclassed Model for proper saving and loading. Defaulting to empty config.
-
-
-
WARNING:tensorflow:Model's `__init__()` arguments contain non-serializable objects. Please implement a `get_config()` method in the subclassed Model for proper saving and loading. Defaulting to empty config.
-
-
-
WARNING:tensorflow:Model's `__init__()` arguments contain non-serializable objects. Please implement a `get_config()` method in the subclassed Model for proper saving and loading. Defaulting to empty config.
-
-
-
-
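The training cell takes the embedding table as weights_list[-1], which relies on the order in which Keras returns weights. If that order changes, selecting the array by its expected shape is a more defensive option. The helper below is a hypothetical sketch:

```python
import numpy as np


def find_embedding_table(weights_list, vocab_size, embed_vec_size):
    """Return the single array of shape (vocab_size, embed_vec_size) in weights_list."""
    matches = [w for w in weights_list if w.shape == (vocab_size, embed_vec_size)]
    if len(matches) != 1:
        raise ValueError(
            "expected exactly one ({}, {}) array, found {}".format(vocab_size, embed_vec_size, len(matches))
        )
    return matches[0]

# Usage with the notebook's values (assumes trained_model from the training cell):
# embedding_weights = find_embedding_table(trained_model.get_weights(), 260000, 128)
```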
-
-
-
# Release the GPU memory occupied by TensorFlow and Keras
-from numba import cuda
-cuda.select_device(0)
-cuda.close()
-
-
-
-
-
-
-

Build the HPS-integrated TensorRT engine

-

To use HPS at inference time, we first need to convert the embedding weights to the format required by HPS and create a JSON configuration file for HPS.

-

Then we convert the TF SavedModel to ONNX and use the ONNX GraphSurgeon tool to replace the native TF embedding lookup layer with a placeholder for the HPS TensorRT plugin layer.

-

After that, we build the TensorRT engine, which consists of the HPS TensorRT plugin layer and the dense network.

-
-

Step 1: Prepare sparse model and JSON configuration file for HPS

-

Note that keys are stored as int64 in the dlrm_tf_sparse.model/key file, while the HPS TensorRT plugin currently supports only int32 keys when loading them into memory. This does not cause overflow because the keys range from 0 to 259,999.

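Casting int64 keys down to int32 would silently wrap any key above 2**31 - 1. A guarded cast makes the "no overflow" assumption explicit; to_int32_keys is a hypothetical helper, not an HPS function:

```python
import numpy as np


def to_int32_keys(keys):
    """Cast int64 keys to int32, raising OverflowError if any key is out of range."""
    keys = np.asarray(keys, dtype=np.int64)
    info = np.iinfo(np.int32)
    if keys.size and (keys.min() < info.min or keys.max() > info.max):
        raise OverflowError("key outside int32 range")
    return keys.astype(np.int32)


k = to_int32_keys(np.arange(260000))
print(k.dtype, k[-1])  # int32 259999
```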
-
-
-
def convert_to_sparse_model(embeddings_weights, embedding_table_path, embedding_vec_size):
-    # Write keys as int64 ('q') to <path>/key and vectors as float32 ('f') to
-    # <path>/emb_vector, one entry per row and in the same order
-    os.makedirs(embedding_table_path, exist_ok=True)
-    with open("{}/key".format(embedding_table_path), 'wb') as key_file, \
-        open("{}/emb_vector".format(embedding_table_path), 'wb') as vec_file:
-        for key in range(embeddings_weights.shape[0]):
-            vec = embeddings_weights[key]
-            key_struct = struct.pack('q', key)
-            vec_struct = struct.pack(str(embedding_vec_size) + "f", *vec)
-            key_file.write(key_struct)
-            vec_file.write(vec_struct)
-
-
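The per-key struct.pack loop writes native-order int64 keys and row-major float32 vectors, which NumPy can also produce and read back in bulk with tofile and fromfile. The sketch below assumes that same two-file layout (key and emb_vector) and is meant for verifying the files or writing them faster; it is not an HPS loader, and the helper names are hypothetical.

```python
import os
import tempfile

import numpy as np


def write_sparse_model(embedding_weights, embedding_table_path):
    """Write <path>/key (int64 keys 0..N-1) and <path>/emb_vector (float32 rows), native byte order."""
    os.makedirs(embedding_table_path, exist_ok=True)
    num_rows = embedding_weights.shape[0]
    np.arange(num_rows, dtype=np.int64).tofile(os.path.join(embedding_table_path, "key"))
    np.ascontiguousarray(embedding_weights, dtype=np.float32).tofile(
        os.path.join(embedding_table_path, "emb_vector"))


def read_sparse_model(embedding_table_path, embedding_vec_size):
    """Read the two files back as (keys, vectors) with vectors shaped (N, embedding_vec_size)."""
    keys = np.fromfile(os.path.join(embedding_table_path, "key"), dtype=np.int64)
    vectors = np.fromfile(os.path.join(embedding_table_path, "emb_vector"), dtype=np.float32)
    return keys, vectors.reshape(-1, embedding_vec_size)


# Round trip on a toy 5 x 4 table in a temporary directory
with tempfile.TemporaryDirectory() as tmp:
    table = np.random.default_rng(0).random((5, 4), dtype=np.float32)
    write_sparse_model(table, os.path.join(tmp, "toy_sparse.model"))
    keys, vectors = read_sparse_model(os.path.join(tmp, "toy_sparse.model"), 4)
    print(keys.tolist(), np.array_equal(vectors, table))  # [0, 1, 2, 3, 4] True
```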
-
-
-
-
-
convert_to_sparse_model(embedding_weights, args["embedding_table_path"], args["embed_vec_size"])
-
-
-
-
-
-
-
%%writefile dlrm_tf.json
-{
-    "supportlonglong": false,
-    "models": [{
-        "model": "dlrm",
-        "sparse_files": ["dlrm_tf_sparse.model"],
-        "num_of_worker_buffer_in_pool": 3,
-        "embedding_table_names":["sparse_embedding0"],
-        "embedding_vecsize_per_table": [128],
-        "maxnum_catfeature_query_per_table_per_sample": [26],
-        "default_value_for_each_table": [1.0],
-        "deployed_device_list": [0],
-        "max_batch_size": 1024,
-        "cache_refresh_percentage_per_iteration": 0.2,
-        "hit_rate_threshold": 1.0,
-        "gpucacheper": 1.0,
-        "gpucache": true
-        }
-    ]
-}
-
-
-
-
-
Writing dlrm_tf.json
-
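The JSON above has to agree with the training configuration: embedding_vecsize_per_table with embed_vec_size (128), maxnum_catfeature_query_per_table_per_sample with slot_num (26), max_batch_size with the global batch size (1024), and supportlonglong set to false because the keys are int32. Generating the file from those values is one way to keep them in sync. make_hps_config is a hypothetical helper that reproduces the fields shown above:

```python
import json


def make_hps_config(model_name, sparse_file, embed_vec_size, slot_num, max_batch_size, device_ids):
    """Build a dict with the same fields as dlrm_tf.json (hypothetical helper)."""
    return {
        "supportlonglong": False,  # keys are int32, matching the int32 HPS_TRT input
        "models": [{
            "model": model_name,
            "sparse_files": [sparse_file],
            "num_of_worker_buffer_in_pool": 3,
            "embedding_table_names": ["sparse_embedding0"],
            "embedding_vecsize_per_table": [embed_vec_size],
            "maxnum_catfeature_query_per_table_per_sample": [slot_num],
            "default_value_for_each_table": [1.0],
            "deployed_device_list": list(device_ids),
            "max_batch_size": max_batch_size,
            "cache_refresh_percentage_per_iteration": 0.2,
            "hit_rate_threshold": 1.0,
            "gpucacheper": 1.0,
            "gpucache": True,
        }],
    }


config = make_hps_config("dlrm", "dlrm_tf_sparse.model", 128, 26, 1024, [0])
print(json.dumps(config, indent=4))
```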
-
-
-
-
-
-

Step 2: Convert to ONNX and do ONNX graph surgery

-
-
-
# convert TF SavedModel to ONNX
-!python -m tf2onnx.convert --saved-model dlrm_tf_saved_model --output dlrm_tf.onnx
-
-
-
-
-
/usr/lib/python3.10/runpy.py:126: RuntimeWarning: 'tf2onnx.convert' found in sys.modules after import of package 'tf2onnx', but prior to execution of 'tf2onnx.convert'; this may result in unpredictable behaviour
-  warn(RuntimeWarning(msg))
-2023-08-21 03:17:49,926 - WARNING - ***IMPORTANT*** Installed protobuf is not cpp accelerated. Conversion will be extremely slow. See https://github.com/onnx/tensorflow-onnx/issues/1557
-2023-08-21 03:17:50,868 - WARNING - '--tag' not specified for saved_model. Using --tag serve
-2023-08-21 03:17:56,302 - INFO - Signatures found in model: [serving_default].
-2023-08-21 03:17:56,302 - WARNING - '--signature_def' not specified, using first signature: serving_default
-2023-08-21 03:17:56,302 - INFO - Output names: ['output_1']
-2023-08-21 03:18:02,064 - INFO - Using tensorflow=2.12.0, onnx=1.14.0, tf2onnx=1.14.0/8f8d49
-2023-08-21 03:18:02,064 - INFO - Using opset <onnx, 15>
-2023-08-21 03:18:03,255 - INFO - Computed 0 values for constant folding
-2023-08-21 03:18:04,203 - INFO - Optimizing ONNX model
-2023-08-21 03:18:04,624 - INFO - After optimization: Cast -3 (3->0), Concat -1 (3->2), Const -15 (35->20), Identity -2 (2->0), Shape -1 (1->0), Slice -1 (1->0), Squeeze -1 (1->0), Unsqueeze -3 (3->0)
-2023-08-21 03:18:07,745 - INFO - 
-2023-08-21 03:18:07,745 - INFO - Successfully converted TensorFlow model dlrm_tf_saved_model to ONNX
-2023-08-21 03:18:07,745 - INFO - Model inputs: ['keys', 'numerical_features']
-2023-08-21 03:18:07,745 - INFO - Model outputs: ['output_1']
-2023-08-21 03:18:07,745 - INFO - ONNX model is saved at dlrm_tf.onnx
-
-
-
-
-
-
-
# ONNX graph surgery to insert the HPS TensorRT plugin placeholder
-import onnx_graphsurgeon as gs
-import numpy as np
-import onnx
-
-graph = gs.import_onnx(onnx.load("dlrm_tf.onnx"))
-saved = []
-
-for node in graph.nodes:
-    # Locate the native TF embedding lookup node exported by tf2onnx
-    if node.name == "StatefulPartitionedCall/dlrm/embedding_lookup":
-        # New int32 graph input that feeds the keys directly to the HPS plugin
-        categorical_features = gs.Variable(name="categorical_features", dtype=np.int32, shape=("unknown", 26))
-        # String attributes include an explicit trailing "\0"
-        hps_node = gs.Node(op="HPS_TRT", attrs={"ps_config_file": "dlrm_tf.json\0", "model_name": "dlrm\0", "table_id": 0, "emb_vec_size": 128},
-                           inputs=[categorical_features], outputs=[node.outputs[0]])
-        graph.nodes.append(hps_node)
-        saved.append(categorical_features)
-        # Detach the old lookup node so that cleanup() removes it and its upstream ops
-        node.outputs.clear()
-for i in graph.inputs:
-    if i.name == "numerical_features":
-        saved.append(i)
-# Keep only categorical_features and numerical_features as graph inputs (drops the old "keys" input)
-graph.inputs = saved
-
-# Remove dangling nodes and re-sort the graph topologically
-graph.cleanup().toposort()
-onnx.save(gs.export_onnx(graph), "dlrm_tf_with_hps.onnx")
-
-
-
-
-
-
-

Step 3: Build the TensorRT engine

-
-
-
# build the TensorRT engine based on dlrm_tf_with_hps.onnx
-import tensorrt as trt
-import ctypes
-
-# Load the HPS plugin library so that the parser can resolve the HPS_TRT op
-plugin_lib_name = "/usr/local/hps_trt/lib/libhps_plugin.so"
-handle = ctypes.CDLL(plugin_lib_name, mode=ctypes.RTLD_GLOBAL)
-
-TRT_LOGGER = trt.Logger(trt.Logger.INFO)
-EXPLICIT_BATCH = 1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)
-
-def build_engine_from_onnx(onnx_model_path):
-    with trt.Builder(TRT_LOGGER) as builder, builder.create_network(EXPLICIT_BATCH) as network, trt.OnnxParser(network, TRT_LOGGER) as parser, builder.create_builder_config() as builder_config:
-        with open(onnx_model_path, 'rb') as model:
-            if not parser.parse(model.read()):
-                for idx in range(parser.num_errors):
-                    print(parser.get_error(idx))
-                raise RuntimeError("Failed to parse {}".format(onnx_model_path))
-
-        # Dynamic batch dimension: min / opt / max shapes for each input
-        profile = builder.create_optimization_profile()
-        profile.set_shape("categorical_features", (1, 26), (1024, 26), (1024, 26))
-        profile.set_shape("numerical_features", (1, 13), (1024, 13), (1024, 13))
-        builder_config.add_optimization_profile(profile)
-        engine = builder.build_serialized_network(network, builder_config)
-        return engine
-
-serialized_engine = build_engine_from_onnx("dlrm_tf_with_hps.onnx")
-with open("dlrm_tf_with_hps.trt", "wb") as fout:
-    fout.write(serialized_engine)
-print("Successfully build the TensorRT engine")
-
-
-
-
-
[08/21/2023-03:18:16] [TRT] [I] [MemUsageChange] Init CUDA: CPU +2013, GPU +0, now: CPU 4018, GPU 721 (MiB)
-[08/21/2023-03:18:22] [TRT] [I] [MemUsageChange] Init builder kernel library: CPU +421, GPU +72, now: CPU 4516, GPU 793 (MiB)
-[08/21/2023-03:18:22] [TRT] [W] CUDA lazy loading is not enabled. Enabling it can significantly reduce device memory usage and speed up TensorRT initialization. See "Lazy Loading" section of CUDA documentation https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#lazy-loading
-[08/21/2023-03:18:22] [TRT] [W] onnx2trt_utils.cpp:374: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
-[08/21/2023-03:18:22] [TRT] [I] No importer registered for op: HPS_TRT. Attempting to import as plugin.
-[08/21/2023-03:18:22] [TRT] [I] Searching for plugin: HPS_TRT, plugin_version: 1, plugin_namespace: 
-=====================================================HPS Parse====================================================
-[HCTR][03:18:22.774][INFO][RK0][main]: fuse_embedding_table is not specified using default: 0
-[HCTR][03:18:22.774][INFO][RK0][main]: dense_file is not specified using default: 
-[HCTR][03:18:22.774][INFO][RK0][main]: num_of_refresher_buffer_in_pool is not specified using default: 1
-[HCTR][03:18:22.774][INFO][RK0][main]: maxnum_des_feature_per_sample is not specified using default: 26
-[HCTR][03:18:22.774][INFO][RK0][main]: refresh_delay is not specified using default: 0
-[HCTR][03:18:22.774][INFO][RK0][main]: refresh_interval is not specified using default: 0
-[HCTR][03:18:22.774][INFO][RK0][main]: use_static_table is not specified using default: 0
-[HCTR][03:18:22.774][INFO][RK0][main]: use_context_stream is not specified using default: 1
-[HCTR][03:18:22.774][INFO][RK0][main]: use_hctr_cache_implementation is not specified using default: 1
-[HCTR][03:18:22.774][INFO][RK0][main]: thread_pool_size is not specified using default: 16
-[HCTR][03:18:22.774][INFO][RK0][main]: init_ec is not specified using default: 1
-[HCTR][03:18:22.774][INFO][RK0][main]: enable_pagelock is not specified using default: 0
-[HCTR][03:18:22.774][INFO][RK0][main]: fp8_quant is not specified using default: 0
-[HCTR][03:18:22.774][INFO][RK0][main]: HPS plugin uses context stream for model dlrm: True
-====================================================HPS Create====================================================
-[HCTR][03:18:22.775][INFO][RK0][main]: Creating HashMap CPU database backend...
-[HCTR][03:18:22.775][DEBUG][RK0][main]: Created blank database backend in local memory!
-[HCTR][03:18:22.775][INFO][RK0][main]: Volatile DB: initial cache rate = 1
-[HCTR][03:18:22.775][INFO][RK0][main]: Volatile DB: cache missed embeddings = 0
-[HCTR][03:18:22.775][DEBUG][RK0][main]: Created raw model loader in local memory!
-[HCTR][03:18:24.860][INFO][RK0][main]: Table: hps_et.dlrm.sparse_embedding0; cached 260000 / 260000 embeddings in volatile database (HashMapBackend); load: 260000 / 18446744073709551615 (0.00%).
-[HCTR][03:18:24.863][DEBUG][RK0][main]: Real-time subscribers created!
-[HCTR][03:18:24.864][INFO][RK0][main]: Creating embedding cache in device 0.
-[HCTR][03:18:24.869][INFO][RK0][main]: Model name: dlrm
-[HCTR][03:18:24.869][INFO][RK0][main]: Max batch size: 1024
-[HCTR][03:18:24.869][INFO][RK0][main]: Fuse embedding tables: False
-[HCTR][03:18:24.869][INFO][RK0][main]: Number of embedding tables: 1
-[HCTR][03:18:24.869][INFO][RK0][main]: Use GPU embedding cache: True, cache size percentage: 1.000000
-[HCTR][03:18:24.869][INFO][RK0][main]: Embedding cache type: dynamic
-[HCTR][03:18:24.869][INFO][RK0][main]: Use I64 input key: False
-[HCTR][03:18:24.869][INFO][RK0][main]: Configured cache hit rate threshold: 1.000000
-[HCTR][03:18:24.869][INFO][RK0][main]: The size of thread pool: 80
-[HCTR][03:18:24.869][INFO][RK0][main]: The size of worker memory pool: 3
-[HCTR][03:18:24.869][INFO][RK0][main]: The size of refresh memory pool: 1
-[HCTR][03:18:24.869][INFO][RK0][main]: The refresh percentage : 0.200000
-[HCTR][03:18:24.902][INFO][RK0][main]: Initialize the embedding cache by by inserting the same size model file with embedding cache from beginning
-[HCTR][03:18:24.902][DEBUG][RK0][main]: Created raw model loader in local memory!
-[HCTR][03:18:24.902][INFO][RK0][main]: EC initialization on device 0 for hps_et.dlrm.sparse_embedding0
-[HCTR][03:18:24.947][INFO][RK0][main]: Initialize the embedding table 0 for iteration 0 with number of 51968 keys.
-[HCTR][03:18:24.992][INFO][RK0][main]: Initialize the embedding table 0 for iteration 1 with number of 51968 keys.
-[HCTR][03:18:25.018][INFO][RK0][main]: Initialize the embedding table 0 for iteration 2 with number of 51968 keys.
-[HCTR][03:18:25.047][INFO][RK0][main]: Initialize the embedding table 0 for iteration 3 with number of 51968 keys.
-[HCTR][03:18:25.069][INFO][RK0][main]: Initialize the embedding table 0 for iteration 4 with number of 51968 keys.
-[HCTR][03:18:25.077][INFO][RK0][main]: LookupSession i64_input_key: False
-[HCTR][03:18:25.077][INFO][RK0][main]: Creating lookup session for dlrm on device: 0
-[08/21/2023-03:18:25] [TRT] [I] Successfully created plugin: HPS_TRT
-[08/21/2023-03:18:25] [TRT] [I] BuilderFlag::kTF32 is set but hardware does not support TF32. Disabling TF32.
-[08/21/2023-03:18:25] [TRT] [I] Graph optimization time: 0.034216 seconds.
-[08/21/2023-03:18:25] [TRT] [I] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +1, GPU +8, now: CPU 8710, GPU 1051 (MiB)
-[08/21/2023-03:18:25] [TRT] [I] [MemUsageChange] Init cuDNN: CPU +1, GPU +10, now: CPU 8711, GPU 1061 (MiB)
-[08/21/2023-03:18:25] [TRT] [I] BuilderFlag::kTF32 is set but hardware does not support TF32. Disabling TF32.
-[08/21/2023-03:18:25] [TRT] [I] Local timing cache in use. Profiling results in this builder pass will not be stored.
-[08/21/2023-03:18:27] [TRT] [I] Detected 2 inputs and 1 output network tensors.
-[08/21/2023-03:18:27] [TRT] [I] Total Host Persistent Memory: 144
-[08/21/2023-03:18:27] [TRT] [I] Total Device Persistent Memory: 0
-[08/21/2023-03:18:27] [TRT] [I] Total Scratch Memory: 18350080
-[08/21/2023-03:18:27] [TRT] [I] [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 0 MiB, GPU 41 MiB
-[08/21/2023-03:18:27] [TRT] [I] [BlockAssignment] Started assigning block shifts. This will take 2 steps to complete.
-[08/21/2023-03:18:27] [TRT] [I] [BlockAssignment] Algorithm ShiftNTopDown took 0.007954ms to assign 2 blocks to 2 nodes requiring 31981568 bytes.
-[08/21/2023-03:18:27] [TRT] [I] Total Activation Memory: 31981568
-[08/21/2023-03:18:27] [TRT] [I] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +8, now: CPU 8764, GPU 1091 (MiB)
-[08/21/2023-03:18:27] [TRT] [I] [MemUsageChange] Init cuDNN: CPU +0, GPU +10, now: CPU 8764, GPU 1101 (MiB)
-[08/21/2023-03:18:27] [TRT] [I] [MemUsageChange] TensorRT-managed allocation in building engine: CPU +0, GPU +16, now: CPU 0, GPU 16 (MiB)
-Successfully build the TensorRT engine
-
-
-
-
-
-
-
-

Deploy HPS-integrated TensorRT engine with Triton on multiple GPUs

-

In order to deploy the TensorRT engine with the Triton TensorRT backend, we need to create the model repository and define the config.pbtxt first. Because we are deploying model instances on multiple GPUs, we also need to modify the "deployed_device_list" entry in dlrm_tf.json accordingly.

-
-
-
!mkdir -p model_repo/dlrm_tf_with_hps/1
-!mv dlrm_tf_with_hps.trt model_repo/dlrm_tf_with_hps/1
-
-
-
-
-
-
-
%%writefile model_repo/dlrm_tf_with_hps/config.pbtxt
-
-platform: "tensorrt_plan"
-default_model_filename: "dlrm_tf_with_hps.trt"
-backend: "tensorrt"
-max_batch_size: 0
-input [
-  {
-    name: "categorical_features"
-    data_type: TYPE_INT32
-    dims: [-1,26]
-  },
-  {
-    name: "numerical_features"
-    data_type: TYPE_FP32
-    dims: [-1,13]
-  }
-]
-output [
-  {
-      name: "output_1"
-      data_type: TYPE_FP32
-      dims: [-1,1]
-  }
-]
-instance_group [
-  {
-    count: 1
-    kind: KIND_GPU
-    gpus:[0,1,2,3]
-  }
-]
-
-
-
-
-
Overwriting model_repo/dlrm_tf_with_hps/config.pbtxt
-
-
-
-
-
-
-
%%writefile dlrm_tf.json
-{
-    "supportlonglong": false,
-    "models": [{
-        "model": "dlrm",
-        "sparse_files": ["dlrm_tf_sparse.model"],
-        "num_of_worker_buffer_in_pool": 3,
-        "embedding_table_names":["sparse_embedding0"],
-        "embedding_vecsize_per_table": [128],
-        "maxnum_catfeature_query_per_table_per_sample": [26],
-        "default_value_for_each_table": [1.0],
-        "deployed_device_list": [0,1,2,3],
-        "max_batch_size": 1024,
-        "cache_refresh_percentage_per_iteration": 0.2,
-        "hit_rate_threshold": 1.0,
-        "gpucacheper": 1.0,
-        "gpucache": true
-        }
-    ]
-}
-
-
-
-
-
Overwriting dlrm_tf.json
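The Triton instance_group and the HPS deployed_device_list have to agree, so it can help to check the two files against each other before you launch the server. The helper below, check_deployed_devices, is a hypothetical sketch and is not part of HPS or Triton. It assumes the dlrm_tf.json layout shown above and takes the GPU list from the gpus field of config.pbtxt.

```python
import json


def check_deployed_devices(ps_config: dict, model_name: str, triton_gpus: list) -> None:
    """Raise if a GPU used by Triton is missing from the model's HPS deployed_device_list."""
    for model in ps_config["models"]:
        if model["model"] != model_name:
            continue
        deployed = set(model["deployed_device_list"])
        missing = sorted(set(triton_gpus) - deployed)
        if missing:
            raise ValueError(
                f"GPUs {missing} are used by Triton but not listed in "
                f"deployed_device_list for model {model_name!r}")
        return
    raise KeyError(f"model {model_name!r} not found in the HPS configuration")


# Inline copy of the relevant fields of dlrm_tf.json, so that the sketch runs stand-alone.
ps_config = json.loads("""
{
    "supportlonglong": false,
    "models": [{"model": "dlrm", "deployed_device_list": [0, 1, 2, 3]}]
}
""")

# gpus:[0,1,2,3] comes from the instance_group in config.pbtxt.
check_deployed_devices(ps_config, "dlrm", [0, 1, 2, 3])
```

In the notebook, you could pass json.load(open("dlrm_tf.json")) instead of the inline copy. A ValueError then names any GPU that Triton uses but HPS does not.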
-
-
-
-
-
-
-
!tree model_repo/dlrm_tf_with_hps
-
-
-
-
-
model_repo/dlrm_tf_with_hps
-├── 1
-│   └── dlrm_tf_with_hps.trt
-└── config.pbtxt
-
-1 directory, 2 files
-
-
-
-
-

We can then launch the Triton inference server using the TensorRT backend. Note that LD_PRELOAD is used to load the custom TensorRT plugin (that is, the HPS TensorRT plugin) into Triton.

-

Note: Because Jupyter does not support background processes, launch the Triton server independently in the background with the following command.

-
-

LD_PRELOAD=/usr/local/hps_trt/lib/libhps_plugin.so tritonserver --model-repository=/hugectr/hps_trt/notebooks/model_repo/ --load-model=dlrm_tf_with_hps --model-control-mode=explicit

-
-

If tritonserver started successfully, you should see a log similar to the following:

-
TRITONBACKEND_ModelInstanceInitialize: dlrm_tf_with_hps_0 (GPU device 0)
-TRITONBACKEND_ModelInstanceInitialize: dlrm_tf_with_hps_0 (GPU device 1)
-TRITONBACKEND_ModelInstanceInitialize: dlrm_tf_with_hps_0 (GPU device 2)
-TRITONBACKEND_ModelInstanceInitialize: dlrm_tf_with_hps_0 (GPU device 3)
-
-
-+----------+--------------------------------+--------------------------------+
-| Backend  | Path                           | Config                         |
-+----------+--------------------------------+--------------------------------+
-| tensorrt | /opt/tritonserver/backends/ten | {"cmdline":{"auto-complete-con |
-|          | sorrt/libtriton_tensorrt.so    | fig":"true","min-compute-capab |
-|          |                                | ility":"6.000000","backend-dir |
-|          |                                | ectory":"/opt/tritonserver/bac |
-|          |                                | kends","default-max-batch-size |
-|          |                                | ":"4"}}                        |
-|          |                                |                                |
-+----------+--------------------------------+--------------------------------+
-
-+------------------+---------+--------+
-| Model            | Version | Status |
-+------------------+---------+--------+
-| dlrm_tf_with_hps | 1       | READY  |
-+------------------+---------+--------+
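Instead of watching the log, you can also poll the server until the model reports READY. Triton implements the KServe v2 HTTP protocol, which exposes /v2/health/ready and /v2/models/<model>/ready. The sketch below polls both endpoints using only the Python standard library. The function name wait_for_model and its default timeouts are illustrative assumptions. tritonclient.http.InferenceServerClient also offers is_server_ready() and is_model_ready() for the same purpose.

```python
import time
import urllib.request


def _endpoint_ok(endpoint: str, timeout_s: float) -> bool:
    """Return True if a GET on the endpoint answers with HTTP 200."""
    with urllib.request.urlopen(endpoint, timeout=timeout_s) as resp:
        return resp.status == 200


def wait_for_model(url: str, model_name: str, timeout_s: float = 60.0,
                   interval_s: float = 1.0) -> bool:
    """Poll Triton's KServe v2 readiness endpoints until the model is ready.

    Returns True once /v2/health/ready and /v2/models/<model_name>/ready both
    answer with HTTP 200, or False if timeout_s elapses first.
    """
    endpoints = [f"http://{url}/v2/health/ready",
                 f"http://{url}/v2/models/{model_name}/ready"]
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            if all(_endpoint_ok(e, interval_s) for e in endpoints):
                return True
        except OSError:
            # Connection refused, timeout, or HTTP error (urllib's URLError and
            # HTTPError subclass OSError): the server or model is not ready yet.
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)
```

For example, calling wait_for_model("localhost:8000", "dlrm_tf_with_hps") in a notebook cell blocks for up to 60 seconds. It returns True as soon as the model can accept requests.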
-
-
-

We can then send the requests to the Triton inference server using the HTTP client.

-
-
-
import numpy as np
-import tritonclient.http as httpclient
-from tritonclient.utils import np_to_triton_dtype
-
-BATCH_SIZE = 1024
-
-categorical_feature = np.random.randint(0,260000,size=(BATCH_SIZE,26)).astype(np.int32)
-numerical_feature = np.random.random((BATCH_SIZE, 13)).astype(np.float32)
-
-inputs = [
-    httpclient.InferInput("categorical_features", 
-                          categorical_feature.shape,
-                          np_to_triton_dtype(np.int32)),
-    httpclient.InferInput("numerical_features", 
-                          numerical_feature.shape,
-                          np_to_triton_dtype(np.float32)),                          
-]
-inputs[0].set_data_from_numpy(categorical_feature)
-inputs[1].set_data_from_numpy(numerical_feature)
-
-
-outputs = [
-    httpclient.InferRequestedOutput("output_1")
-]
-
-model_name = "dlrm_tf_with_hps"
-
-with httpclient.InferenceServerClient("localhost:8000") as client:
-    response = client.infer(model_name,
-                            inputs,
-                            outputs=outputs)
-    result = response.get_response()
-    
-    print("Prediction result is \n{}".format(response.as_numpy("output_1")))
-    print("Response details:\n{}".format(result))
-
-
-
-
-
Prediction result is 
-[[0.34091672]
- [0.34091672]
- [0.34091672]
- ...
- [0.34091672]
- [0.34091672]
- [0.34091672]]
-Response details:
-{'model_name': 'dlrm_tf_with_hps', 'model_version': '1', 'outputs': [{'name': 'output_1', 'datatype': 'FP32', 'shape': [1024, 1], 'parameters': {'binary_data_size': 4096}}]}
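The HPS configuration in dlrm_tf.json sets max_batch_size to 1024, which matches BATCH_SIZE above. To score more rows than that, one option is to split them on the client side, assuming that a single request should not exceed the HPS max_batch_size. The generator below is a hypothetical helper and is not part of tritonclient. Because it relies only on slicing, it works on lists as well as NumPy arrays.

```python
from typing import Iterator, Sequence, Tuple


def iter_batches(categorical: Sequence, numerical: Sequence,
                 max_batch_size: int = 1024) -> Iterator[Tuple[Sequence, Sequence]]:
    """Yield aligned (categorical, numerical) slices of at most max_batch_size rows."""
    if len(categorical) != len(numerical):
        raise ValueError("categorical and numerical inputs must have the same number of rows")
    for start in range(0, len(categorical), max_batch_size):
        stop = start + max_batch_size
        yield categorical[start:stop], numerical[start:stop]
```

You can wrap each yielded pair in InferInput objects with its own shape, just like categorical_feature and numerical_feature above, and then concatenate the per-chunk outputs with np.concatenate.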
-
-
-
-
-
-
- - -
-
- -
-
-
-
- - - - - - - \ No newline at end of file diff --git a/review/pr-458/hps_trt/notebooks/index.html b/review/pr-458/hps_trt/notebooks/index.html deleted file mode 100644 index 3157e39f31..0000000000 --- a/review/pr-458/hps_trt/notebooks/index.html +++ /dev/null @@ -1,282 +0,0 @@ - - - - - - - HPS Plugin for TensorRT Notebooks — Merlin HugeCTR documentation - - - - - - - - - - - - - - - - - - - - - - - - - -
- - -
- -
-
-
- -
-
-
-
- -
-

HPS Plugin for TensorRT Notebooks

-

This directory contains a set of Jupyter notebooks that demonstrate how to use the HPS plugin for TensorRT.

-
-

Quickstart

-

The simplest way to run one of our notebooks is with a Docker container. A container provides a self-contained, isolated, and reproducible environment for repetitive experiments. Docker images are available from the NVIDIA GPU Cloud (NGC). If you prefer to build the HugeCTR Docker image on your own, refer to Set Up the Development Environment With Merlin Containers.

-
-

Pull the Container from NGC

-

Pull a container based on the modeling framework and notebook that you want to run.

-

To run the demo_for_tf_trained_model.ipynb notebook, pull the Merlin TensorFlow container:

-
docker pull nvcr.io/nvidia/merlin/merlin-tensorflow:23.02
-
-
-

To run the demo_for_pytorch_trained_model.ipynb notebook, pull the Merlin PyTorch container:

-
docker pull nvcr.io/nvidia/merlin/merlin-pytorch:23.02
-
-
-

To run the demo_for_hugectr_trained_model.ipynb notebook, pull the Merlin HugeCTR container:

-
docker pull nvcr.io/nvidia/merlin/merlin-hugectr:23.02
-
-
-

The HPS TensorRT plugin is installed in all the containers.

-
-
-

Clone the HugeCTR Repository

-

Use the following command to clone the HugeCTR repository:

-
git clone https://github.com/NVIDIA-Merlin/HugeCTR
-
-
-
-
-

Start the Jupyter Notebook

-
    -
  1. Launch the container in interactive mode (mount the HugeCTR root directory into the container for your convenience) by running the following command.

    -
    docker run --runtime=nvidia --rm -it --cap-add SYS_NICE -u $(id -u):$(id -g) -v $(pwd):/hugectr -w /hugectr -p 8888:8888 nvcr.io/nvidia/merlin/<container-name>:23.02
    -
    -
    -
  2. Start Jupyter using these commands:

    -
    cd /hugectr/hps_trt/notebooks
    -jupyter-notebook --allow-root --ip 0.0.0.0 --port 8888 --NotebookApp.token='hugectr'
    -
    -
    -
  3. Connect to your host machine on port 8888 by entering its IP address or host name in your web browser: http://[host machine]:8888

    -

    To log in, use the token shown in the output of the command above. For example:

    -

    http://[host machine]:8888/?token=aae96ae9387cd28151868fee318c3b3581a2d794f3b25c6b

    -
-
-
-
-

Notebook List

-

Here’s a list of notebooks that you can run:

-
    -
  • benchmark_tf_trained_large_model.ipynb: Provides the steps to benchmark the inference performance of a large model. The model consists of one 147 GB embedding table and three fully connected layers, and a TensorRT engine is built for it with the HPS plugin. Here are the performance metrics at batch size 4096 on different platforms:

  • -
- - - - - - - - - - - - - - - - - - - - - -

Platform

Interconnect between GPU and CPU

Average per-batch latency (µs) at batch size 4096

A100-SXM4-80GB + 2 x AMD EPYC 7742 64-Core Processor (2TB CPU Memory)

PCIe Gen4

1396

H100-SXM5-80GB + 2 x Intel Xeon Platinum 8480C 56-Core Processor (2TB CPU Memory)

PCIe Gen5

773

H100-NVL-94GB + NVIDIA Grace 72-Core Processor (480GB CPU Memory)

NVLink-C2C

210

- -
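You can also read the latency table as throughput and relative speedup. The short calculation below uses only the numbers from the table. It assumes that one batch is processed at a time, so throughput is simply batch size divided by per-batch latency. On that basis, the H100 + PCIe Gen5 system is roughly 1.8x faster than the A100 system, and the H100 NVL + Grace system is roughly 6.6x faster.

```python
# Average per-batch latency in microseconds at batch size 4096, copied from the table above.
BATCH_SIZE = 4096
latency_us = {
    "A100-SXM4-80GB (PCIe Gen4)": 1396,
    "H100-SXM5-80GB (PCIe Gen5)": 773,
    "H100-NVL-94GB + Grace (NVLink-C2C)": 210,
}

baseline = latency_us["A100-SXM4-80GB (PCIe Gen4)"]
for platform, us in latency_us.items():
    throughput = BATCH_SIZE / (us * 1e-6)  # samples per second, one batch in flight
    speedup = baseline / us                # relative to the A100 + PCIe Gen4 system
    print(f"{platform:36s} {throughput / 1e6:6.2f} M samples/s  {speedup:4.2f}x")
```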
-
-

System Specifications

-

The following table summarizes the specifications of the systems on which each notebook has been verified to run successfully. These specifications are not minimum requirements.

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

Notebook

CPU

GPU

#GPUs

Author

demo_for_tf_trained_model.ipynb

Intel® Xeon® CPU E5-2698 v4 @ 2.20GHz
512 GB Memory

Tesla V100-SXM2-32GB
32 GB Memory

1

Kingsley Liu

demo_for_pytorch_trained_model.ipynb

Intel® Xeon® CPU E5-2698 v4 @ 2.20GHz
512 GB Memory

Tesla V100-SXM2-32GB
32 GB Memory

1

Kingsley Liu

demo_for_hugectr_trained_model.ipynb

Intel® Xeon® CPU E5-2698 v4 @ 2.20GHz
512 GB Memory

Tesla V100-SXM2-32GB
32 GB Memory

1

Kingsley Liu

-
-
-
-
- - -
-
- -
-
-
-
- - - - - - - \ No newline at end of file diff --git a/review/pr-458/hugectr_contributor_guide.html b/review/pr-458/hugectr_contributor_guide.html deleted file mode 100644 index 077d25d15e..0000000000 --- a/review/pr-458/hugectr_contributor_guide.html +++ /dev/null @@ -1,315 +0,0 @@ - - - - - - - Contributing to HugeCTR — Merlin HugeCTR documentation - - - - - - - - - - - - - - - - - - - - - - - - -
- - -
- -
-
-
-
    -
  • - -
  • -
  • -
-
-
-
-
- -
-

Contributing to HugeCTR

- -
-

Overview of Contributing to HugeCTR

-

We’re grateful for your interest in HugeCTR and value your contributions. You can contribute to HugeCTR by:

-
    -
  • submitting a feature, documentation, or bug request.

    -

    NOTE: After we review your request, we’ll assign it to a future release. If you think the issue should be prioritized over others, comment on the issue.

    -
  • -
  • proposing and implementing a new feature.

    -

    NOTE: Once we agree to the proposed design, you can go ahead and implement the new feature using the steps outlined in the Contribute New Code section.

    -
  • -
  • implementing a pending feature or fixing a bug.

    -

    NOTE: Use the steps outlined in the Contribute New Code section. If you need more information about a particular issue, add your comments on the issue.

    -
  • -
-
-
-

Contribute New Code

-
    -
  1. Build HugeCTR or Sparse Operation Kit (SOK) from source using the steps outlined in the Set Up the Development Environment With Merlin Containers section.

  2. File an issue and add a comment stating that you’ll work on it.

  3. Start coding.

    -

    NOTE: Don’t forget to add or update the unit tests.

    -
  4. Create a pull request for your work.

  5. Wait for a maintainer to review your code.

    -

    You may be asked to make additional edits to your code. Once your code is approved, a maintainer will merge your pull request.

    -
-

If you have any questions or need clarification, don’t hesitate to add comments to your issue and we’ll respond promptly.

-
-
-

How to Start your Development

-
-

Set Up the Development Environment With Merlin Containers

-

We provide options in the Merlin Dockerfiles to disable the installation of HugeCTR and the HugeCTR Triton Backend, so contributors can build a development environment (container) from them. Clone the HugeCTR source code into this environment and build it to start developing.

-

Note: If you are working in such a container, the following terminal messages are not errors.

-
groups: cannot find name for group ID 1007
-I have no name!@56a762eae3f8:/hugectr
-
-
-

The Merlin CTR Dockerfile and the Merlin TensorFlow Dockerfile provide a set of arguments to set up your HugeCTR development container:

-

The arguments and configurations in this example can be used to build all three containers:

-
docker build --pull -t ${DST_IMAGE} -f ${DOCKER_FILE} --build-arg RELEASE=false --build-arg RMM_VER=vnightly --build-arg CUDF_VER=vnightly --build-arg NVTAB_VER=vnightly --build-arg HUGECTR_DEV_MODE=true --no-cache .
-
-
-

For RMM_VER, CUDF_VER, and NVTAB_VER, specify a release tag such as v1.0, or specify vnightly to build with the head of the main branch. Specifying HUGECTR_DEV_MODE=true disables the HugeCTR installation.

-

Docker CLI Quick Reference

-
$ docker build [<opts>] <path> | <URL>
-               Build a new image from the source code at PATH
-  -f, --file path/to/Dockerfile
-               Path to the Dockerfile to use. Default: Dockerfile.
-  --build-arg <varname>=<value>
-               Name and value of a build argument defined with ARG
-               Dockerfile instruction
-  -t "<name>[:<tag>]"
-               Repository names (and optionally with tags) to be applied
-               to the resulting image
-  --label <key>=<value>
-               Set metadata for an image
-  -q, --quiet  Suppress the output generated by containers
-  --rm         Remove intermediate containers after a successful build
-
-
-
-
-

Build HugeCTR Training Container from Source

-

To build HugeCTR Training Container from source, do the following:

-
    -
  1. Build the hugectr:devel image using the steps outlined here. Remember that this instruction is only for the Merlin CTR Dockerfile.

  2. Download the HugeCTR repository and the third-party modules that it relies on by running the following commands:

    -
    $ git clone https://github.com/NVIDIA/HugeCTR.git
    -$ cd HugeCTR
    -$ git submodule update --init --recursive
    -
    -
    -
  3. Build HugeCTR from scratch using one or any combination of the following options:

    -
      -
    • SM: You can use this option to build HugeCTR with a specific compute capability (-DSM=90) or multiple compute capabilities (-DSM="80;70"). The default compute capability is 70, which corresponds to the NVIDIA V100 GPU. For more information, refer to the Compute Capability table. Compute capability 60 is not supported for inference deployments. For more information, refer to the Quick Start for the HugeCTR backend of Triton Inference Server.

    • -
    • CMAKE_BUILD_TYPE: You can use this option to build HugeCTR with Debug or Release. When built with Debug, HugeCTR prints more verbose logs and executes GPU tasks synchronously. average of eval_batches results. Only one thread and chunk will be used in the data reader. Performance will be lower when in validation mode. This option is set to OFF by default.

    • -
    • ENABLE_MULTINODES: You can use this option to build HugeCTR with multiple nodes. This option is set to OFF by default. For more information, refer to the deep and cross network samples directory on GitHub.

    • -
    • ENABLE_INFERENCE: You can use this option to build HugeCTR in inference mode, which was designed for the inference framework. In this mode, an inference shared library is built for the HugeCTR Backend. Only interfaces that support the HugeCTR Backend can be used, so you can’t train models in this mode. This option is set to OFF by default. To build the inference container, refer to Build HugeCTR Inference Container from Source.

    • -
    • ENABLE_HDFS: You can use this option to build HugeCTR together with HDFS to enable HDFS-related functions. Permissible values are ON and OFF (default). Setting this option to ON builds all the Hadoop modules that HugeCTR needs to connect to HDFS deployments.

    • -
    • ENABLE_S3: You can use this option to build HugeCTR together with the Amazon AWS S3 SDK to enable S3-related functions. Permissible values are ON and OFF (default). Setting this option to ON builds all the AWS SDKs and dependencies that are required for building and running both HugeCTR and S3.

    • -
    -

    Note that setting -DENABLE_HDFS=ON or -DENABLE_S3=ON requires root permission. Before you build with either option, make sure that you pass -u root when you run the Docker container.

    -

    Here are some examples of how you can build HugeCTR using these build options:

    -
    $ mkdir -p build && cd build
    -$ cmake -DCMAKE_BUILD_TYPE=Release -DSM=80 .. # Target is NVIDIA A100
    -$ make -j && make install
    -
    -
    -
    $ mkdir -p build && cd build
    -$ cmake -DCMAKE_BUILD_TYPE=Release -DSM="80;90" -DENABLE_MULTINODES=ON .. # Target is NVIDIA A100 / H100 with the multi-node mode on.
    -$ make -j && make install
    -
    -
    -
    $ mkdir -p build && cd build
    -$ cmake -DCMAKE_BUILD_TYPE=Release -DSM="70;80" -DENABLE_HDFS=ON .. # Target is NVIDIA V100 / A100 with HDFS components mode on.
    -$ make -j && make install
    -
    -
    -
    $ mkdir -p build && cd build
    -$ cmake -DCMAKE_BUILD_TYPE=Debug -DSM="70;80" .. # Target is NVIDIA V100 / A100 with Debug mode.
    -$ make -j && make install
    -
    -
    -

    By default, HugeCTR is installed at /usr/local. However, you can use CMAKE_INSTALL_PREFIX to install HugeCTR to a non-default location:

    -
    $ cmake -DCMAKE_INSTALL_PREFIX=/opt/HugeCTR -DSM=70 ..
    -
    -
    -
-
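To keep option spelling consistent across several builds, you can assemble the cmake invocation programmatically. The helper below is a hypothetical sketch and is not part of HugeCTR's build system. It only encodes the option names listed above, and it joins multiple compute capabilities with ';' as in -DSM="80;90".

```python
import shlex


def hugectr_cmake_args(sm=("70",), build_type="Release", multinodes=False,
                       inference=False, hdfs=False, s3=False, install_prefix=None):
    """Assemble a cmake argument list from the HugeCTR build options described above."""
    args = ["cmake", f"-DCMAKE_BUILD_TYPE={build_type}", f"-DSM={';'.join(sm)}"]
    flags = {"ENABLE_MULTINODES": multinodes, "ENABLE_INFERENCE": inference,
             "ENABLE_HDFS": hdfs, "ENABLE_S3": s3}
    # Only options switched on are passed; the rest keep their OFF default.
    args += [f"-D{name}=ON" for name, enabled in flags.items() if enabled]
    if install_prefix:
        args.append(f"-DCMAKE_INSTALL_PREFIX={install_prefix}")
    args.append("..")  # run from the build/ directory inside the HugeCTR checkout
    return args


cmd = hugectr_cmake_args(sm=["80", "90"], multinodes=True)
print(shlex.join(cmd))
```

shlex.join quotes the -DSM value. This matters because the shell would treat an unquoted ';' as the end of the command.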
-
-

Build HugeCTR Inference Container from Source

-

To build HugeCTR inference container from source, do the following:

-
    -
  1. Build the hugectr:devel_inference image using the steps outlined here. Remember that this instruction is only for the Merlin CTR Dockerfile.

  2. Download the HugeCTR repository and the third-party modules that it relies on by running the following commands:

    -
    $ git clone https://github.com/NVIDIA/HugeCTR.git
    -$ cd HugeCTR
    -$ git submodule update --init --recursive
    -
    -
    -
  3. Here is an example of how you can build the HugeCTR inference container using the build options:

    -
    $ mkdir -p build && cd build
    -$ cmake -DCMAKE_BUILD_TYPE=Release -DSM="70;80" -DENABLE_INFERENCE=ON .. # Target is NVIDIA V100 / A100 with Inference mode ON.
    -$ make -j && make install
    -
    -
    -
-
-
-

Build Sparse Operation Kit (SOK) from Source

-

To build the Sparse Operation Kit component in HugeCTR, do the following:

-
    -
  1. Build the hugectr:tf-plugin Docker image using the steps noted here. Remember that this instruction is only for the Merlin TensorFlow Dockerfile.

  2. Download the HugeCTR repository by running the following command:

    -
    $ git clone https://github.com/NVIDIA/HugeCTR.git hugectr
    -
    -
    -
  3. Build and install libraries to the system paths by running the following commands:

    -
    $ cd hugectr/sparse_operation_kit
    -$ python setup.py install
    -
    -
    -

    You can configure different environment variables for compiling SOK. For more details, refer to this section.

    -
-
-
-
- - -
-
- -
-
-
-
- - - - - - - \ No newline at end of file diff --git a/review/pr-458/hugectr_core_features.html b/review/pr-458/hugectr_core_features.html deleted file mode 100644 index 2e68523f0e..0000000000 --- a/review/pr-458/hugectr_core_features.html +++ /dev/null @@ -1,237 +0,0 @@ - - - - - - - HugeCTR Core Features — Merlin HugeCTR documentation - - - - - - - - - - - - - - - - - - - - - - - - - -
- - -
- -
-
-
-
    -
  • - -
  • -
  • -
-
-
-
-
- -
-

HugeCTR Core Features

- - -
-

Summary of Core Features

-

In addition to single-node and full-precision training, HugeCTR supports a variety of features that are described in the following topics.

-

NOTE: Multi-node training and mixed precision training can be used simultaneously.

-
-
-

Model Parallel Training

-

HugeCTR natively supports both model parallel and data parallel training, making it possible to train very large models on GPUs. Features and categories of embeddings can be distributed across multiple GPUs and nodes. For example, if you have two nodes with 8xA100 80GB GPUs, you can train models that are as large as 1TB fully on GPU. By using the embedding training cache, you can train even larger models on the same nodes.

-

To achieve the best performance for different embeddings, choose the embedding layer implementation that fits your case. Each implementation targets a different practical training case:

-
    -
  • LocalizedSlotEmbeddingHash: The features in the same slot (feature field) are stored in one GPU, which is why it’s referred to as a “localized slot”, and different slots may be stored in different GPUs according to the index number of the slot. LocalizedSlotEmbedding is optimized for cases where each embedding is smaller than the memory size of the GPU. Because the LocalizedSlotEmbedding uses a local reduction for each slot and no global reduction between GPUs, its overall data traffic is much lower than that of the DistributedSlotEmbedding.

    -

    Note: Make sure that there aren’t any duplicated keys in the input dataset.

    -
  • -
  • DistributedSlotEmbeddingHash: All the features, which are located in different feature fields / slots, are distributed to different GPUs according to the index number of the feature, regardless of the slot index number. This means that the features in the same slot may be stored in different GPUs, which is why it’s referred to as a “distributed slot”. The DistributedSlotEmbedding was developed for cases where the embeddings are larger than the memory size of the GPU. Because it requires a global reduction, the DistributedSlotEmbedding has many more memory transactions between GPUs.

    -

    Note: Make sure that there aren’t any duplicated keys in the input dataset.

    -
  • -
  • LocalizedSlotEmbeddingOneHot: A specialized LocalizedSlotEmbedding that requires a one-hot data input. Each feature field must also be indexed from zero. For example, gender values encoded as 0 and 1 are correctly indexed, but values encoded as 1 and 2 are not.

  • -
-
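The difference between the two hash-based layouts comes down to what decides a key's GPU: its slot, or the key itself. The sketch below illustrates this with a modulo mapping. The modulo function is an assumption made for illustration only, because the text above says that placement follows the slot index or the feature index but does not say which function HugeCTR applies. The function names are hypothetical.

```python
from collections import defaultdict


def localized_placement(slot_ids, num_gpus):
    """Localized slot: all keys of slot s share one GPU (assumed here: s % num_gpus)."""
    return {slot: slot % num_gpus for slot in slot_ids}


def distributed_placement(keys, num_gpus):
    """Distributed slot: each key is placed by its own value (assumed here: key % num_gpus),
    regardless of the slot that it belongs to."""
    placement = defaultdict(list)
    for key in keys:
        placement[key % num_gpus].append(key)
    return dict(placement)


# Made-up sample: slot index -> keys that appear in that slot.
sample = {0: [0, 5, 9], 1: [12, 13], 2: [20]}
num_gpus = 2
print(localized_placement(sample.keys(), num_gpus))  # every key of slot 0 goes to one GPU
print(distributed_placement([k for keys in sample.values() for k in keys], num_gpus))
```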
-
-

Multi-Node Training

-

Multi-node training makes it easy to train an embedding table of arbitrary size. In a multi-node solution, the sparse model, which is referred to as the embedding layer, is distributed across the nodes. Meanwhile, the dense model, such as a DNN, is data parallel, so each GPU holds its own copy of the dense model (see Fig. 2). HugeCTR uses NCCL for high-speed and scalable inter-node and intra-node communication.

-

To run with multiple nodes, HugeCTR should be built with OpenMPI. GPUDirect RDMA support is recommended for high performance. For more information, refer to our dcn_2node_8gpu.py file in the samples/dcn directory on GitHub.

-
-
-

Mixed Precision Training

-

Mixed precision training is supported to help improve throughput and reduce the memory footprint. In this mode, Tensor Cores are used to boost performance for matrix multiplication-based layers, such as FullyConnectedLayer and InteractionLayer, on the Volta, Turing, and Ampere architectures. For the other layers, including embeddings, the data type is changed to FP16 to save both memory bandwidth and capacity. To enable mixed precision mode, specify the mixed_precision option in the configuration file. When mixed_precision is set, the full FP16 pipeline is triggered. Loss scaling is applied to avoid arithmetic underflow (see Fig. 1).

-_images/fig4_arithmetic_underflow.png -
Fig. 1: Arithmetic Underflow
-



-
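You can reproduce the arithmetic underflow that loss scaling avoids with Python's struct module, whose 'e' format rounds a value to IEEE 754 half precision. In the sketch below, the gradient value and the static loss scale of 1024 are illustrative assumptions, not HugeCTR defaults. The value 1e-8 is below the smallest FP16 subnormal (about 6e-8) and becomes zero, while the scaled value survives and can be unscaled in FP32.

```python
import struct


def to_fp16(x: float) -> float:
    """Round a Python float to IEEE 754 half precision and back."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


LOSS_SCALE = 1024.0  # illustrative static loss scale (assumption)
grad = 1e-8          # a small gradient value

unscaled = to_fp16(grad)                # stored directly in FP16: underflows to 0.0
scaled = to_fp16(grad * LOSS_SCALE)     # scaling the loss scales the gradient by the same factor
recovered = scaled / LOSS_SCALE         # unscale in higher precision before the weight update

print(f"FP16 without scaling: {unscaled!r}")
print(f"FP16 with scaling, recovered: {recovered:.4e}")
```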
-
-

SGD Optimizer and Learning Rate Scheduling

-

Learning rate scheduling lets you configure the following hyperparameters:

-
    -
  • learning_rate: Base learning rate.

  • -
  • warmup_steps: Number of initial steps used for warm-up.

  • -
  • decay_start: Specifies when the learning rate decay starts.

  • -
  • decay_steps: Decay period (in steps).

  • -
-

Fig. 2 illustrates how these hyperparameters determine the actual learning rate.

-

For more information, refer to Python Interface.

-_images/learning_rate_scheduling.png -
Fig. 2: Learning Rate Scheduling
-



-
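The sketch below shows one way the four hyperparameters could shape the learning rate. It assumes a linear warm-up and a polynomial decay to end_lr. The decay_power and end_lr arguments are hypothetical additions that the list above does not mention. Refer to the Python Interface for HugeCTR's exact formula.

```python
def scheduled_lr(step, learning_rate, warmup_steps, decay_start, decay_steps,
                 decay_power=2.0, end_lr=0.0):
    """Learning rate at a given step (a sketch, not HugeCTR's implementation)."""
    if warmup_steps > 0 and step < warmup_steps:
        # Linear warm-up from 0 to the base learning rate.
        return learning_rate * step / warmup_steps
    if step < decay_start:
        # Constant phase between the end of warm-up and the start of decay.
        return learning_rate
    if step < decay_start + decay_steps:
        # Polynomial decay from learning_rate to end_lr over decay_steps steps.
        remaining = (decay_start + decay_steps - step) / decay_steps
        return (learning_rate - end_lr) * remaining ** decay_power + end_lr
    return end_lr


for step in (0, 50, 100, 1000, 1250, 1500):
    print(step, scheduled_lr(step, learning_rate=0.1, warmup_steps=100,
                             decay_start=1000, decay_steps=500))
```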
-
-

HugeCTR to ONNX Converter

-

The HugeCTR to Open Neural Network Exchange (ONNX) converter (hugectr2onnx) is a Python package that can convert HugeCTR models to ONNX. It can improve the compatibility of HugeCTR with other deep learning frameworks since ONNX serves as an open-source format for AI models.

-

After training with our HugeCTR Python APIs, you get the files for dense models, sparse models, and graph configurations, which are required as inputs to the hugectr2onnx.converter.convert API. Each HugeCTR layer corresponds to one or more ONNX operators, and the trained model weights are loaded as initializers in the ONNX graph. You can convert both dense and sparse models, or only dense models. For more information, refer to the onnx_converter directory of the HugeCTR repository on GitHub and the hugectr2onnx_demo.ipynb sample notebook.

-
-
-

HDFS Support

-

HugeCTR can interact with HDFS during training, for example, to load models and optimizer states from HDFS and to dump them to HDFS.

-

If you use the Merlin NGC container, you can build Hadoop by running the build-hadoop.sh script. If you want to build HugeCTR from scratch, make sure that Hadoop is correctly built on your system and specify -DENABLE_HDFS=ON when you build HugeCTR with cmake.

-

After HDFS is enabled, you can use our Python API to train with HDFS. An end-to-end demo notebook is available here.

-
-
-

Hierarchical Parameter Server

-

HugeCTR Hierarchical Parameter Server (HPS) is an industry-leading distributed recommendation inference framework that combines a high-performance GPU embedding cache with a hierarchical storage architecture to realize low-latency retrieval of embeddings for online model inference tasks. Among other things, HPS features (1) redundant hierarchical storage, (2) a novel GPU-enabled high-bandwidth cache to accelerate parallel embedding lookup, (3) online training support, and (4) lightweight APIs for integration into existing large-scale recommendation workflows.

-

Try out our hugectr_wdl_prediction.ipynb Notebook. For more information, refer to Distributed Deployment.

-

For more information about the Hierarchical Parameter Server, see the details for HPS Backend and HPS Database Backend.

-
-
-

Sparse Operation Kit

-

The Sparse Operation Kit (SOK) is a Python package that wraps GPU-accelerated operations dedicated to sparse training or inference cases. The package is designed to be compatible with common deep learning frameworks such as TensorFlow.

-

SOK provides a model-parallel GPU embedding layer. In sparse training or inference scenarios, such as click-through rate (CTR) prediction, there is a large number of parameters that do not fit into the memory of a single GPU. Common deep learning frameworks do not support model parallelism (MP). As a result, it is difficult to fully utilize all available GPUs in a cluster to accelerate the whole training process.

-

For more information, see the Sparse Operation Kit documentation.

-
-
- - -
-
- -
-
-
-
- - - - - - - \ No newline at end of file diff --git a/review/pr-458/hugectr_talks_blogs.html b/review/pr-458/hugectr_talks_blogs.html deleted file mode 100644 index 357fa0bbdd..0000000000 --- a/review/pr-458/hugectr_talks_blogs.html +++ /dev/null @@ -1,362 +0,0 @@ - - - - - - - HugeCTR Talks and Blogs — Merlin HugeCTR documentation - - - - - - - - - - - - - - - - - - - - - - - - - -
- - -
- -
-
-
- -
-
-
-
- -
-

HugeCTR Talks and Blogs

- - - - - - - - - - - -

Web pages

NVIDIA Merlin on developer.nvidia.com

NVIDIA HugeCTR on developer.nvidia.com

-
-

Talks

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

Conference / Website

Title

Date

Speaker

Language

Short Videos Episode 1

Merlin HugeCTR:GPU 加速的推荐系统框架 (Merlin HugeCTR: A GPU-Accelerated Recommender System Framework)

May 2022

Joey Wang

Chinese

Short Videos Episode 2

HugeCTR 分级参数服务器如何加速推理 (How the HugeCTR Hierarchical Parameter Server Accelerates Inference)

May 2022

Joey Wang

Chinese

Short Videos Episode 3

使用 HugeCTR SOK 加速 TensorFlow 训练 (Accelerating TensorFlow Training with HugeCTR SOK)

May 2022

Gems Guo

Chinese

GTC Spring 2022

Merlin HugeCTR: Distributed Hierarchical Inference Parameter Server Using GPU Embedding Cache

March 2022

Matthias Langer, Yingcan Wei, Yu Fan

English

APSARA 2021

GPU 推荐系统 Merlin (Merlin, a GPU Recommender System)

Oct 2021

Joey Wang

Chinese

GTC Spring 2021

Learn how Tencent Deployed an Advertising System on the Merlin GPU Recommender Framework

April 2021

Xiangting Kong, Joey Wang

English

GTC Spring 2021

Merlin HugeCTR: Deep Dive Into Performance Optimization

April 2021

Minseok Lee

English

GTC Spring 2021

Integrate HugeCTR Embedding with TensorFlow

April 2021

Jianbing Dong

English

GTC China 2020

MERLIN HUGECTR :深入研究性能优化 (Merlin HugeCTR: Deep Dive Into Performance Optimization)

Oct 2020

Minseok Lee

English

GTC China 2020

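性能提升 7 倍 + 的高性能 GPU 广告推荐加速系统的落地实现 (Deploying a High-Performance GPU-Accelerated Advertising Recommender System with a 7x+ Performance Improvement)

Oct 2020

Xiangting Kong

Chinese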

GTC China 2020

使用 GPU EMBEDDING CACHE 加速 CTR 推理过程 (Accelerating CTR Inference with the GPU Embedding Cache)

Oct 2020

Fan Yu

Chinese

GTC China 2020

将 HUGECTR EMBEDDING 集成于 TENSORFLOW (Integrating HugeCTR Embedding into TensorFlow)

Oct 2020

Jianbing Dong

Chinese

GTC Spring 2020

HugeCTR: High-Performance Click-Through Rate Estimation Training

March 2020

Minseok Lee, Joey Wang

English

GTC China 2019

HUGECTR: GPU 加速的推荐系统训练 (HugeCTR: GPU-Accelerated Recommender System Training)

Oct 2019

Joey Wang

Chinese

-
-
-

Blogs

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

Conference / Website

Title

Date

Authors

Language

Wechat Blog

Merlin HugeCTR 分级参数服务器系列之三:集成到TensorFlow (Merlin HugeCTR Hierarchical Parameter Server Series, Part 3: Integration with TensorFlow)

Nov. 2022

Kingsley Liu

Chinese

NVIDIA Devblog

Scaling Recommendation System Inference with Merlin Hierarchical Parameter Server/使用 Merlin 分层参数服务器扩展推荐系统推理

August 2022

Shashank Verma, Wenwen Gao, Yingcan Wei, Matthias Langer, Jerry Shi, Fan Yu, Kingsley Liu, Minseok Lee

English/Chinese

NVIDIA Devblog

Merlin HugeCTR Sparse Operation Kit Series, Part 2 (Merlin HugeCTR Sparse Operation Kit 系列之二)

June 2022

Kunlun Li

Chinese

NVIDIA Devblog

Merlin HugeCTR Sparse Operation Kit Series, Part 1 (Merlin HugeCTR Sparse Operation Kit 系列之一)

March 2022

Gems Guo, Jianbing Dong

Chinese

WeChat Blog

Merlin HugeCTR Hierarchical Parameter Server Series, Part 2 (Merlin HugeCTR 分级参数服务器系列之二)

March 2022

Yingcan Wei, Matthias Langer, Jerry Shi

Chinese

WeChat Blog

Merlin HugeCTR Hierarchical Parameter Server Series, Part 1 (Merlin HugeCTR 分级参数服务器系列之一)

Jan. 2022

Yingcan Wei, Jerry Shi

Chinese

NVIDIA Devblog

Accelerating Embedding with the HugeCTR TensorFlow Embedding Plugin

Sept 2021

Vinh Nguyen, Ann Spencer, Joey Wang and Jianbing Dong

English

medium.com

Optimizing Meituan’s Machine Learning Platform: An Interview with Jun Huang

Sept 2021

Sheng Luo and Benedikt Schifferer

English

medium.com

Leading Design and Development of the Advertising Recommender System at Tencent: An Interview with Xiangting Kong

Sept 2021

Xiangting Kong, Ann Spencer

English

NVIDIA Devblog

Scaling and Accelerating Large Deep Learning Recommender Systems – HugeCTR Series Part 1 (扩展和加速大型深度学习推荐系统 – HugeCTR 系列第 1 部分)

June 2021

Minseok Lee

Chinese

NVIDIA Devblog

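Training Large Deep Learning Recommender Models with Merlin HugeCTR's Python APIs – HugeCTR Series Part 2 (使用 Merlin HugeCTR 的 Python API 训练大型深度学习推荐模型 – HugeCTR 系列第 2 部分)

June 2021

Vinh Nguyen

Chinese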

medium.com

Training large Deep Learning Recommender Models with Merlin HugeCTR’s Python APIs — HugeCTR Series Part 2

May 2021

Minseok Lee, Joey Wang, Vinh Nguyen and Ashish Sardana

English

medium.com

Scaling and Accelerating large Deep Learning Recommender Systems — HugeCTR Series Part 1

May 2021

Minseok Lee

English

IRS 2020

Merlin: A GPU Accelerated Recommendation Framework

Aug 2020

Even Oldridge et al.

English

NVIDIA Devblog

Introducing NVIDIA Merlin HugeCTR: A Training Framework Dedicated to Recommender Systems

July 2020

Minseok Lee and Joey Wang

English

\ No newline at end of file
diff --git a/review/pr-458/hugectr_user_guide.html b/review/pr-458/hugectr_user_guide.html
deleted file mode 100644
index 5798e6f4b5..0000000000
--- a/review/pr-458/hugectr_user_guide.html
+++ /dev/null
@@ -1,303 +0,0 @@
Introduction to HugeCTR — Merlin HugeCTR documentation

Introduction to HugeCTR

About HugeCTR

HugeCTR is a GPU-accelerated framework designed to distribute training across multiple GPUs and nodes and estimate click-through rates (CTRs). HugeCTR supports model-parallel embedding tables and data-parallel neural networks, as well as models such as Wide and Deep Learning (WDL), Deep & Cross Network (DCN), DeepFM, and Deep Learning Recommendation Model (DLRM) and their variants. HugeCTR is a component of NVIDIA Merlin, which is used for building large-scale recommender systems. Such systems require massive datasets to train, particularly for deep-learning-based solutions.

Fig. 1: Merlin Architecture

To prevent data loading from becoming a major bottleneck during training, HugeCTR contains a dedicated data reader that is inherently asynchronous and multi-threaded. It reads batches of data records in which each record consists of high-dimensional, extremely sparse, or categorical features. Each record can also include dense numerical features, which can be fed directly to the fully connected layers. An embedding layer compresses the sparse input features into lower-dimensional, dense embedding vectors. There are three GPU-accelerated embedding stages:

  • Table lookup
  • Weight reduction within each slot
  • Weight concatenation across the slots
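For a single sample, the three stages can be illustrated with a minimal pure-Python sketch. It is not HugeCTR's GPU implementation: the toy table, its values, the function names, and the `sum`/`mean` combiner options are assumptions chosen for illustration.

```python
from typing import Dict, List, Sequence

Vector = List[float]


def table_lookup(table: Dict[int, Vector], keys: Sequence[int]) -> List[Vector]:
    """Stage 1: fetch the embedding vector of every key in one slot."""
    return [table[k] for k in keys]


def reduce_slot(vectors: List[Vector], combiner: str = "sum") -> Vector:
    """Stage 2: reduce the vectors of one slot element-wise ("sum" or "mean")."""
    dim = len(vectors[0])
    summed = [sum(v[d] for v in vectors) for d in range(dim)]
    if combiner == "mean":
        return [x / len(vectors) for x in summed]
    return summed


def concat_slots(slot_vectors: List[Vector]) -> Vector:
    """Stage 3: concatenate the per-slot results into one dense vector."""
    out: Vector = []
    for v in slot_vectors:
        out.extend(v)
    return out


def embed_sample(table: Dict[int, Vector], slots: List[List[int]], combiner: str = "sum") -> Vector:
    """Run all three stages for one sample; `slots` holds the keys of each slot."""
    return concat_slots([reduce_slot(table_lookup(table, keys), combiner) for keys in slots])


# Toy table with 2-dimensional embedding vectors (hypothetical values).
table = {10: [1.0, 0.0], 11: [0.0, 1.0], 20: [2.0, 2.0]}
# Slot 0 is multi-hot (keys 10 and 11); slot 1 is one-hot (key 20).
print(embed_sample(table, [[10, 11], [20]]))  # [1.0, 1.0, 2.0, 2.0]
```

The output length is the number of slots times the embedding size, which is why the reduction within each slot has to happen before the concatenation across slots.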

To enable training with large embeddings, the embedding table in HugeCTR is model-parallel and distributed across all GPUs in a homogeneous cluster, which consists of multiple nodes. Each GPU has its own:

  • feed-forward neural network (data parallelism) to estimate CTRs.
  • hash table to make the data preprocessing easier and enable dynamic insertion.

Embedding initialization is not required before training because the input training data are hash values (64-bit signed integers) instead of original indices. A <key, value> pair, where the value is a small random weight, is inserted at runtime only when a new key appears in the training data and the hash table cannot find it.
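The following toy sketch combines the two ideas above: a table split into one shard per GPU, and a <key, value> pair that is created only on a key's first lookup. The routing rule `key % num_gpus`, the uniform [-0.05, 0.05] initializer, and the class name `ShardedDynamicTable` are illustrative assumptions, not HugeCTR's documented behavior.

```python
import random
from typing import Dict, List


class ShardedDynamicTable:
    """Toy model-parallel embedding table with dynamic insertion (hypothetical)."""

    def __init__(self, num_gpus: int, ev_size: int, seed: int = 0) -> None:
        # One dictionary per GPU stands in for that GPU's hash table.
        self.shards: List[Dict[int, List[float]]] = [{} for _ in range(num_gpus)]
        self.ev_size = ev_size
        self.rng = random.Random(seed)

    def owner(self, key: int) -> int:
        # Assumed routing rule. Python's % is non-negative for a positive
        # divisor, so negative signed 64-bit keys are routed correctly too.
        return key % len(self.shards)

    def lookup(self, key: int) -> List[float]:
        shard = self.shards[self.owner(key)]
        if key not in shard:
            # First time this key is seen: insert a small random weight vector.
            shard[key] = [self.rng.uniform(-0.05, 0.05) for _ in range(self.ev_size)]
        return shard[key]


table = ShardedDynamicTable(num_gpus=2, ev_size=4)
first = table.lookup(-7)   # new key: inserted into shard -7 % 2 == 1
again = table.lookup(-7)   # existing key: same vector, no new insertion
print(first is again, [len(s) for s in table.shards])  # True [0, 1]
```

Because insertion happens on lookup, no table has to be pre-populated before the first training step.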

Fig. 2: HugeCTR Architecture

Fig. 3: Embedding Architecture

Fig. 4: Embedding Mechanism

Installing and Building HugeCTR

You can either install HugeCTR easily using the Merlin Docker image in NGC, or build HugeCTR from scratch using various build options if you’re an advanced user.

Compute Capability

We support the following compute capabilities:

Compute Capability

GPU

SM

6.0

NVIDIA P100 (Pascal)

60

7.0

NVIDIA V100 (Volta)

70

7.5

NVIDIA T4 (Turing)

75

8.0

NVIDIA A100 (Ampere)

80

9.0

NVIDIA H100 (Hopper)

90

Installing HugeCTR Using NGC Containers

All NVIDIA Merlin components are available as open source projects. However, a more convenient way to utilize these components is by using our Merlin NGC containers. These containers allow you to package your software application, libraries, dependencies, and runtime compilers in a self-contained environment. When installing HugeCTR using NGC containers, the application environment remains portable, consistent, reproducible, and agnostic to the underlying host system’s software configuration.

HugeCTR is included in the Merlin Docker containers that are available from the NVIDIA container repository. You can query the collection for containers that match the HugeCTR label. The following table also identifies the containers:

Container Name

Container Location

Functionality

merlin-hugectr

https://catalog.ngc.nvidia.com/orgs/nvidia/teams/merlin/containers/merlin-hugectr

NVTabular, HugeCTR, and Triton Inference

merlin-tensorflow

https://catalog.ngc.nvidia.com/orgs/nvidia/teams/merlin/containers/merlin-tensorflow

NVTabular, TensorFlow, and HugeCTR TensorFlow Embedding plugin

To use these Docker containers, you’ll first need to install the NVIDIA Container Toolkit to provide GPU support for Docker. You can use the NGC links referenced in the table above to obtain more information about how to launch and run these containers.

The following sample command pulls and starts the merlin-hugectr container:

# Run the container in interactive mode
$ docker run --gpus=all --rm -it --cap-add SYS_NICE nvcr.io/nvidia/merlin/merlin-hugectr:24.06

Building HugeCTR from Scratch

To build HugeCTR from scratch, refer to the contributor guide information.

Tools

We currently support the following tools:

  • Data Generator: A configurable data generator, which is available from the Python interface, can be used to generate a synthetic dataset for benchmarking and research purposes.
  • Preprocessing Script: We provide a set of scripts that form a template implementation to demonstrate how complex datasets, such as the original Criteo dataset, can be converted into HugeCTR-supported dataset formats such as Norm and Raw. These scripts are used in all of our samples to prepare the data and train various recommender models.

Generating Synthetic Data and Benchmarks

The Norm (with Header) and Raw (without Header) datasets can be generated with hugectr.tools.DataGenerator. For categorical features, you can configure the probability distribution to be uniform or power-law within hugectr.tools.DataGeneratorParam. The default distribution is power law with alpha = 1.2.
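To illustrate the difference between the two distributions, the sketch below draws categorical keys with Python's standard `random` module. It is not `hugectr.tools.DataGenerator`; the function name `sample_keys` and the weighting P(k) ∝ 1/(k+1)^alpha are assumptions used only to show how a power law with alpha = 1.2 concentrates samples on a few keys.

```python
import random
from collections import Counter
from typing import List


def sample_keys(vocab_size: int, n: int, dist: str = "powerlaw",
                alpha: float = 1.2, seed: int = 0) -> List[int]:
    """Draw n categorical keys from [0, vocab_size).

    "uniform":  every key is equally likely.
    "powerlaw": P(k) is proportional to 1 / (k + 1) ** alpha, so a few keys dominate.
    """
    rng = random.Random(seed)
    if dist == "uniform":
        return [rng.randrange(vocab_size) for _ in range(n)]
    if dist != "powerlaw":
        raise ValueError(f"unknown distribution: {dist}")
    weights = [1.0 / (k + 1) ** alpha for k in range(vocab_size)]
    return rng.choices(range(vocab_size), weights=weights, k=n)


power = Counter(sample_keys(vocab_size=1000, n=10_000))
uniform = Counter(sample_keys(vocab_size=1000, n=10_000, dist="uniform"))
print("power-law top keys:", power.most_common(3))
print("uniform top keys:  ", uniform.most_common(3))
```

Skewed key frequencies like these are typical of real click logs, which is presumably why power law is the more realistic default for benchmarking.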

  • Generate the Norm dataset for DCN and start training the HugeCTR model:

    python3 ../tools/data_generator/dcn_norm_generate_train.py

  • Generate the Norm dataset for WDL and start training the HugeCTR model:

    python3 ../tools/data_generator/wdl_norm_generate_train.py

  • Generate the Raw dataset for DLRM and start training the HugeCTR model:

    python3 ../tools/data_generator/dlrm_raw_generate_train.py

  • Generate the Parquet dataset for DCN and start training the HugeCTR model:

    python3 ../tools/data_generator/dcn_parquet_generate_train.py

Downloading and Preprocessing Datasets

Download the Criteo 1TB Click Logs dataset using HugeCTR/tools/preprocess.sh and preprocess it to train the DCN. The file_list.txt, file_list_test.txt, and preprocessed data files are available within the criteo_data directory. For more information, refer to the samples directory on GitHub.

For example:

$ cd tools # assume that the downloaded dataset is here
$ bash preprocess.sh 1 criteo_data pandas 1 0

\ No newline at end of file
diff --git a/review/pr-458/index.html b/review/pr-458/index.html
deleted file mode 100644
index f569b720da..0000000000
--- a/review/pr-458/index.html
+++ /dev/null
@@ -1,158 +0,0 @@
Merlin HugeCTR — Merlin HugeCTR documentation

Merlin HugeCTR

Merlin HugeCTR is an open-source library that provides a GPU-accelerated recommender framework designed to distribute training across multiple GPUs and nodes and estimate click-through rates (CTRs).

For more information, see the Introduction.

\ No newline at end of file
diff --git a/review/pr-458/notebooks/embedding_collection.html b/review/pr-458/notebooks/embedding_collection.html
deleted file mode 100644
index ef1351a0b4..0000000000
--- a/review/pr-458/notebooks/embedding_collection.html
+++ /dev/null
@@ -1,1634 +0,0 @@
HugeCTR Embedding Collection — Merlin HugeCTR documentation

# Copyright 2021 NVIDIA Corporation. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================

# Each user is responsible for checking the content of datasets and the
# applicable licenses and determining if suitable for the intended use.

HugeCTR Embedding Collection

About this Notebook

This notebook shows how to use an embedding collection in a DLRM model with the Criteo dataset for training and evaluation.

It demonstrates two key features of the embedding collection:

  1. How to configure the table placement strategy.
  2. How to use a dynamic hash table.

Concepts and API Reference

The following key classes are used in this notebook:

  • hugectr.EmbeddingTableConfig
  • hugectr.EmbeddingCollectionConfig

For the concepts and API reference information about the classes and file, see the Overview of Using the HugeCTR Embedding Collection in the HugeCTR Layer Classes and Methods information.

Setup

To set up the environment, refer to HugeCTR Example Notebooks and follow the instructions there before running the following.

Use an Embedding Collection with a DLRM Model

Data Preparation

To download and prepare the dataset, we perform the following steps. At the end of this cell, we provide the shell commands that you can run in a terminal to get the data ready for this notebook.

Note: If you have already downloaded the data, skip to the preprocessing step (2). If preprocessing is also done, skip to creating the soft link from the processed data to the notebooks/ directory (3).

  1. Download the Criteo dataset.

To preprocess the downloaded Kaggle Criteo dataset, we perform the following operations:

  • Reduce the amount of data to speed up the preprocessing
  • Fill missing values
  • Remove feature values that occur very rarely, etc.

  2. Preprocessing by Pandas:

    Meanings of the command-line arguments:

    • The 1st argument represents the dataset postfix. It is 1 here since day_1 is used.
    • The 2nd argument, wdl_data, is where the preprocessed data is stored.
    • The 3rd argument, pandas, selects the processing script to use; here we choose pandas.
    • The 4th argument, 1, means that normalization is applied to dense features.
    • The 5th argument, 1, means that feature crossing is applied.
    • The 6th argument, 100, specifies the number of data files in each file list.

    The terminal commands at the end of this cell use the nvt script type instead and write the output to deepfm_data_nvt.

    For more details about the data preprocessing, refer to the “Preprocess the Criteo Dataset” section of the README in the samples/criteo directory of the repository on GitHub.

  3. Create a soft link from the dataset folder to the path of this notebook.
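The fill-missing and rare-value operations listed above can be sketched for a single categorical column as follows. This is a hypothetical illustration, not the logic of preprocess.sh or its pandas/NVTabular scripts: the `min_count` threshold, the placeholder tokens, and the choice to fold rare values into one shared bucket (rather than dropping them) are assumptions.

```python
from collections import Counter
from typing import Dict, List, Optional


def preprocess_column(values: List[Optional[str]], min_count: int = 2,
                      missing: str = "<missing>", rare: str = "<rare>") -> List[int]:
    """Fill missing values, fold rare values into one bucket, then assign integer IDs."""
    # 1. Fill missing values (None or empty string) with a placeholder token.
    filled = [missing if v is None or v == "" else v for v in values]
    # 2. Replace values seen fewer than min_count times with a shared "rare" token.
    counts = Counter(filled)
    folded = [v if counts[v] >= min_count else rare for v in filled]
    # 3. Map each remaining token to a dense integer ID in order of first appearance.
    vocab: Dict[str, int] = {}
    return [vocab.setdefault(v, len(vocab)) for v in folded]


print(preprocess_column(["a", "b", None, "a", "", "c"]))  # [0, 1, 2, 0, 2, 1]
```

Folding rare values shrinks the vocabulary, and therefore the embedding table, without discarding the rows that contain them.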

Run the following commands in a terminal to prepare the data for this notebook:

export project_root=/home/hugectr # set this to the directory where HugeCTR is downloaded
cd ${project_root}/tools
# Step 1
wget https://storage.googleapis.com/criteo-cail-datasets/day_0.gz
# Step 2
bash preprocess.sh 0 deepfm_data_nvt nvt 1 0 0
# Step 3
ln -s ${project_root}/tools/deepfm_data_nvt ${project_root}/notebooks/deepfm_data_nvt

Prepare the Training Script

This notebook was developed on a single DGX-1 with eight V100-SXM2 GPUs to run the DLRM model. The GPU information for the DGX-1 is as follows:

! nvidia-smi

Thu Jun 23 00:14:56 2022
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.32.03    Driver Version: 460.32.03    CUDA Version: 11.6     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla V100-SXM2...  On   | 00000000:06:00.0 Off |                    0 |
| N/A   33C    P0    42W / 300W |      0MiB / 16160MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  Tesla V100-SXM2...  On   | 00000000:07:00.0 Off |                    0 |
| N/A   35C    P0    45W / 300W |      0MiB / 16160MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   2  Tesla V100-SXM2...  On   | 00000000:0A:00.0 Off |                    0 |
| N/A   36C    P0    44W / 300W |      0MiB / 16160MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   3  Tesla V100-SXM2...  On   | 00000000:0B:00.0 Off |                    0 |
| N/A   33C    P0    42W / 300W |      0MiB / 16160MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   4  Tesla V100-SXM2...  On   | 00000000:85:00.0 Off |                    0 |
| N/A   36C    P0    44W / 300W |      0MiB / 16160MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   5  Tesla V100-SXM2...  On   | 00000000:86:00.0 Off |                    0 |
| N/A   35C    P0    42W / 300W |      0MiB / 16160MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   6  Tesla V100-SXM2...  On   | 00000000:89:00.0 Off |                    0 |
| N/A   36C    P0    44W / 300W |      0MiB / 16160MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   7  Tesla V100-SXM2...  On   | 00000000:8A:00.0 Off |                    0 |
| N/A   34C    P0    41W / 300W |      0MiB / 16160MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

The training script, dlrm_train.py, uses the embedding collection API. The script accepts arguments that specify the table placement strategy (--shard_plan) and whether to use a dynamic hash table (--use_dynamic_hash_table), so we can run the script several times and evaluate different combinations of table placement strategy and use_dynamic_hash_table:

%%writefile dlrm_train.py
"""
 Copyright (c) 2023, NVIDIA CORPORATION.
 
 Licensed under the Apache License, Version 2.0 (the "License");
 you may not use this file except in compliance with the License.
 You may obtain a copy of the License at

     http://www.apache.org/licenses/LICENSE-2.0

 Unless required by applicable law or agreed to in writing, software
 distributed under the License is distributed on an "AS IS" BASIS,
 WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 See the License for the specific language governing permissions and
 limitations under the License.
"""

import hugectr
import argparse
from mpi4py import MPI

parser = argparse.ArgumentParser(description="HugeCTR Embedding Collection DLRM model training script.")
parser.add_argument(
    "--shard_plan",
    help="shard strategy",
    type=str,
    choices=["round_robin", "uniform", "hybrid"],
    required=True,
)
parser.add_argument(
    "--use_dynamic_hash_table",
    action="store_true",
)
args = parser.parse_args()


def generate_shard_plan(slot_size_array, num_gpus):
    # shard_matrix[g] lists the names of the tables placed on GPU g;
    # shard_strategy groups the tables into model-parallel ("mp") and data-parallel ("dp").
    if args.shard_plan == "round_robin":
        # Each table is placed on exactly one GPU, cycling through the GPUs.
        shard_strategy = [("mp", [str(i) for i in range(len(slot_size_array))])]
        shard_matrix = [[] for _ in range(num_gpus)]
        for i, table_id in enumerate(range(len(slot_size_array))):
            target_gpu = i % num_gpus
            shard_matrix[target_gpu].append(str(table_id))
    elif args.shard_plan == "uniform":
        # Every table is placed on every GPU.
        shard_strategy = [("mp", [str(i) for i in range(len(slot_size_array))])]
        shard_matrix = [[] for _ in range(num_gpus)]
        for table_id in range(len(slot_size_array)):
            for gpu_id in range(num_gpus):
                shard_matrix[gpu_id].append(str(table_id))
    elif args.shard_plan == "hybrid":
        # Tables with more than 6000 rows are model-parallel and placed round robin;
        # the remaining small tables are data-parallel and placed on every GPU.
        mp_table = [i for i in range(len(slot_size_array)) if slot_size_array[i] > 6000]
        dp_table = [i for i in range(len(slot_size_array)) if slot_size_array[i] <= 6000]
        shard_matrix = [[] for _ in range(num_gpus)]
        shard_strategy = [("mp", [str(i) for i in mp_table]), ("dp", [str(i) for i in dp_table])]

        for table_id in dp_table:
            for gpu_id in range(num_gpus):
                shard_matrix[gpu_id].append(str(table_id))

        for i, table_id in enumerate(mp_table):
            target_gpu = i % num_gpus
            shard_matrix[target_gpu].append(str(table_id))
    else:
        raise Exception(args.shard_plan + " is not supported")
    return shard_matrix, shard_strategy


# One node with 8 GPUs; use_embedding_collection=True enables the embedding collection API.
solver = hugectr.CreateSolver(
    max_eval_batches=70,
    batchsize_eval=65536,
    batchsize=65536,
    lr=0.5,
    warmup_steps=300,
    vvgpu=[[0, 1, 2, 3, 4, 5, 6, 7]],
    repeat_dataset=True,
    i64_input_key=True,
    metrics_spec={hugectr.MetricsType.AverageLoss: 0.0},
    use_embedding_collection=True,
)
# Vocabulary size (number of rows) of each of the 26 categorical features.
slot_size_array = [
    203931,
    18598,
    14092,
    7012,
    18977,
    4,
    6385,
    1245,
    49,
    186213,
    71328,
    67288,
    11,
    2168,
    7338,
    61,
    4,
    932,
    15,
    204515,
    141526,
    199433,
    60919,
    9137,
    71,
    34,
]
# The data paths point to the deepfm_data_nvt directory linked into notebooks/ during data preparation.
reader = hugectr.DataReaderParams(
    data_reader_type=hugectr.DataReaderType_t.Parquet,
    source=["./deepfm_data_nvt/train/_file_list.txt"],
    eval_source="./deepfm_data_nvt/val/_file_list.txt",
    check_type=hugectr.Check_t.Non,
)
optimizer = hugectr.CreateOptimizer(
    optimizer_type=hugectr.Optimizer_t.SGD, update_type=hugectr.Update_t.Local, atomic_update=True
)
model = hugectr.Model(solver, reader, optimizer)

num_embedding = 26

model.add(
    hugectr.Input(
        label_dim=1,
        label_name="label",
        dense_dim=13,
        dense_name="dense",
        data_reader_sparse_param_array=[
            hugectr.DataReaderSparseParam("data{}".format(i), 1, False, 1)
            for i in range(num_embedding)
        ],
    )
)

# Create one embedding table per categorical feature.
# max_vocabulary_size=-1 is used with the dynamic hash table; otherwise the table is sized from slot_size_array.
embedding_table_list = []
for i in range(num_embedding):
    embedding_table_list.append(
        hugectr.EmbeddingTableConfig(
            name=str(i), max_vocabulary_size=-1 if args.use_dynamic_hash_table else slot_size_array[i], ev_size=128
        )
    )
# Create the embedding collection config and register one lookup per table.
ebc_config = hugectr.EmbeddingCollectionConfig(use_exclusive_keys=True)
emb_vec_list = []
for i in range(num_embedding):
    ebc_config.embedding_lookup(
        table_config=embedding_table_list[i],
        bottom_name="data{}".format(i),
        top_name="emb_vec{}".format(i),
        combiner="sum",
    )
# Place the tables on the 8 GPUs according to --shard_plan.
shard_matrix, shard_strategy = generate_shard_plan(slot_size_array, 8)
ebc_config.shard(shard_matrix=shard_matrix, shard_strategy=shard_strategy)

model.add(ebc_config)
# Concatenate the per-table embedding vectors into a single tensor.
model.add(
    hugectr.DenseLayer(
        layer_type=hugectr.Layer_t.Concat,
        bottom_names=["emb_vec{}".format(i) for i in range(num_embedding)],
        top_names=["sparse_embedding1"],
    )
)

# Bottom MLP on the dense features: 512 -> 256 -> 128.
model.add(
    hugectr.DenseLayer(
        layer_type=hugectr.Layer_t.InnerProduct,
        bottom_names=["dense"],
        top_names=["fc1"],
        num_output=512,
    )
)

model.add(
    hugectr.DenseLayer(layer_type=hugectr.Layer_t.ReLU, bottom_names=["fc1"], top_names=["relu1"])
)

model.add(
    hugectr.DenseLayer(
        layer_type=hugectr.Layer_t.InnerProduct,
        bottom_names=["relu1"],
        top_names=["fc2"],
        num_output=256,
    )
)
model.add(
    hugectr.DenseLayer(layer_type=hugectr.Layer_t.ReLU, bottom_names=["fc2"], top_names=["relu2"])
)
model.add(
    hugectr.DenseLayer(
        layer_type=hugectr.Layer_t.InnerProduct,
        bottom_names=["relu2"],
        top_names=["fc3"],
        num_output=128,
    )
)
model.add(
    hugectr.DenseLayer(layer_type=hugectr.Layer_t.ReLU, bottom_names=["fc3"], top_names=["relu3"])
)
model.add(
    hugectr.DenseLayer(
        layer_type=hugectr.Layer_t.Interaction,  # Interaction only supports 3-D input
        bottom_names=["relu3", "sparse_embedding1"],
        top_names=["interaction1"],
    )
)
# Top MLP: 1024 -> 1024 -> 512 -> 256 -> 1, followed by the binary cross-entropy loss.
model.add(
    hugectr.DenseLayer(
        layer_type=hugectr.Layer_t.InnerProduct,
        bottom_names=["interaction1"],
        top_names=["fc4"],
        num_output=1024,
    )
)
model.add(
    hugectr.DenseLayer(layer_type=hugectr.Layer_t.ReLU, bottom_names=["fc4"], top_names=["relu4"])
)
model.add(
    hugectr.DenseLayer(
        layer_type=hugectr.Layer_t.InnerProduct,
        bottom_names=["relu4"],
        top_names=["fc5"],
        num_output=1024,
    )
)
model.add(
    hugectr.DenseLayer(layer_type=hugectr.Layer_t.ReLU, bottom_names=["fc5"], top_names=["relu5"])
)
model.add(
    hugectr.DenseLayer(
        layer_type=hugectr.Layer_t.InnerProduct,
        bottom_names=["relu5"],
        top_names=["fc6"],
        num_output=512,
    )
)
model.add(
    hugectr.DenseLayer(layer_type=hugectr.Layer_t.ReLU, bottom_names=["fc6"], top_names=["relu6"])
)
model.add(
    hugectr.DenseLayer(
        layer_type=hugectr.Layer_t.InnerProduct,
        bottom_names=["relu6"],
        top_names=["fc7"],
        num_output=256,
    )
)
model.add(
    hugectr.DenseLayer(layer_type=hugectr.Layer_t.ReLU, bottom_names=["fc7"], top_names=["relu7"])
)
model.add(
    hugectr.DenseLayer(
        layer_type=hugectr.Layer_t.InnerProduct,
        bottom_names=["relu7"],
        top_names=["fc8"],
        num_output=1,
    )
)
model.add(
    hugectr.DenseLayer(
        layer_type=hugectr.Layer_t.BinaryCrossEntropyLoss,
        bottom_names=["fc8", "label"],
        top_names=["loss"],
    )
)
model.compile()
model.summary()
model.fit(max_iter=1000, display=100, eval_interval=100, snapshot=10000000, snapshot_prefix="dlrm")

Overwriting dlrm_train.py
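Without --use_dynamic_hash_table, each table is created with max_vocabulary_size set to its slot_size_array entry and ev_size=128. A back-of-the-envelope estimate of the resulting embedding weights is sketched below. It assumes 4-byte fp32 weights and ignores hash-table overhead and optimizer state, so the allocations that HugeCTR actually reports may differ. It also tallies rows per GPU for the round_robin plan, where generate_shard_plan places table i on GPU i % 8.

```python
# slot_size_array copied from dlrm_train.py (one entry per categorical feature).
slot_size_array = [
    203931, 18598, 14092, 7012, 18977, 4, 6385, 1245, 49,
    186213, 71328, 67288, 11, 2168, 7338, 61, 4, 932, 15,
    204515, 141526, 199433, 60919, 9137, 71, 34,
]
EV_SIZE = 128          # ev_size passed to hugectr.EmbeddingTableConfig in dlrm_train.py
BYTES_PER_WEIGHT = 4   # assumption: fp32 weights, no optimizer state or hash overhead
NUM_GPUS = 8

total_rows = sum(slot_size_array)
total_bytes = total_rows * EV_SIZE * BYTES_PER_WEIGHT
print(f"{total_rows} rows, {total_bytes / 2**30:.2f} GiB")  # 1221286 rows, 0.58 GiB

# With --shard_plan round_robin, table i is placed on GPU i % NUM_GPUS.
rows_per_gpu = [0] * NUM_GPUS
for table_id, rows in enumerate(slot_size_array):
    rows_per_gpu[table_id % NUM_GPUS] += rows
print(rows_per_gpu)  # [204055, 205777, 85435, 278815, 160514, 201605, 74642, 10443]
```

The per-GPU totals show that round-robin placement balances the number of tables, not the number of rows: GPU 3 holds about 27 times as many rows as GPU 7.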

Embedding Table Placement Strategy: Round Robin

In this embedding table placement strategy, we place each table on a single GPU in a round-robin fashion.
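To see what each --shard_plan value produces without starting a training run, the logic of generate_shard_plan can be run on its own. The sketch below is a standalone copy that takes the plan as a parameter instead of reading the global args, exposes the 6000-row hybrid threshold as a hypothetical dp_threshold parameter, and raises ValueError instead of Exception.

```python
from typing import List, Tuple

ShardMatrix = List[List[str]]
ShardStrategy = List[Tuple[str, List[str]]]


def generate_shard_plan(shard_plan: str, slot_size_array: List[int], num_gpus: int,
                        dp_threshold: int = 6000) -> Tuple[ShardMatrix, ShardStrategy]:
    """Standalone version of the shard-plan logic in dlrm_train.py.

    shard_matrix[g] lists the table names placed on GPU g; shard_strategy
    groups the table names into model-parallel ("mp") and data-parallel ("dp").
    """
    table_ids = range(len(slot_size_array))
    shard_matrix: ShardMatrix = [[] for _ in range(num_gpus)]
    if shard_plan == "round_robin":
        # Each table is placed on exactly one GPU, cycling through the GPUs.
        shard_strategy = [("mp", [str(t) for t in table_ids])]
        for t in table_ids:
            shard_matrix[t % num_gpus].append(str(t))
    elif shard_plan == "uniform":
        # Every table is listed on every GPU.
        shard_strategy = [("mp", [str(t) for t in table_ids])]
        for t in table_ids:
            for g in range(num_gpus):
                shard_matrix[g].append(str(t))
    elif shard_plan == "hybrid":
        # Large tables are model-parallel (round robin); small ones go on every GPU.
        mp = [t for t in table_ids if slot_size_array[t] > dp_threshold]
        dp = [t for t in table_ids if slot_size_array[t] <= dp_threshold]
        shard_strategy = [("mp", [str(t) for t in mp]), ("dp", [str(t) for t in dp])]
        for t in dp:
            for g in range(num_gpus):
                shard_matrix[g].append(str(t))
        for i, t in enumerate(mp):
            shard_matrix[i % num_gpus].append(str(t))
    else:
        raise ValueError(f"{shard_plan} is not supported")
    return shard_matrix, shard_strategy


# Five of the tables from slot_size_array, placed on 2 GPUs.
sizes = [203931, 4, 186213, 49, 71328]
print(generate_shard_plan("round_robin", sizes, num_gpus=2)[0])  # [['0', '2', '4'], ['1', '3']]
print(generate_shard_plan("hybrid", sizes, num_gpus=2)[0])       # [['1', '3', '0', '4'], ['1', '3', '2']]
```

In the hybrid plan, the two small tables (4 and 49 rows) appear on both GPUs, while the three large tables are still distributed one GPU at a time.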

!python3 dlrm_train.py --shard_plan round_robin

HugeCTR Version: 23.2
====================================================Model Init=====================================================
[HCTR][10:25:19.539][WARNING][RK0][main]: The model name is not specified when creating the solver.
[HCTR][10:25:19.539][INFO][RK0][main]: Global seed is 3508545476
[HCTR][10:25:19.637][INFO][RK0][main]: Device to NUMA mapping:
  GPU 0 ->  node 0
  GPU 1 ->  node 0
  GPU 2 ->  node 0
  GPU 3 ->  node 0
  GPU 4 ->  node 1
  GPU 5 ->  node 1
  GPU 6 ->  node 1
  GPU 7 ->  node 1
[HCTR][10:25:30.608][WARNING][RK0][main]: Peer-to-peer access cannot be fully enabled.
[HCTR][10:25:30.608][DEBUG][RK0][main]: [device 0] allocating 0.0000 GB, available 30.4714
[HCTR][10:25:30.608][DEBUG][RK0][main]: [device 1] allocating 0.0000 GB, available 30.4441
[HCTR][10:25:30.609][DEBUG][RK0][main]: [device 2] allocating 0.0000 GB, available 30.5378
[HCTR][10:25:30.609][DEBUG][RK0][main]: [device 3] allocating 0.0000 GB, available 30.5339
[HCTR][10:25:30.609][DEBUG][RK0][main]: [device 4] allocating 0.0000 GB, available 30.4636
[HCTR][10:25:30.609][DEBUG][RK0][main]: [device 5] allocating 0.0000 GB, available 30.4480
[HCTR][10:25:30.609][DEBUG][RK0][main]: [device 6] allocating 0.0000 GB, available 30.4949
[HCTR][10:25:30.609][DEBUG][RK0][main]: [device 7] allocating 0.0000 GB, available 30.5183
[HCTR][10:25:30.609][INFO][RK0][main]: Start all2all warmup
[HCTR][10:25:30.772][INFO][RK0][main]: End all2all warmup
[HCTR][10:25:30.783][INFO][RK0][main]: Using All-reduce algorithm: NCCL
[HCTR][10:25:30.789][INFO][RK0][main]: Device 0: Tesla V100-SXM2-32GB
[HCTR][10:25:30.790][INFO][RK0][main]: Device 1: Tesla V100-SXM2-32GB
[HCTR][10:25:30.790][INFO][RK0][main]: Device 2: Tesla V100-SXM2-32GB
[HCTR][10:25:30.791][INFO][RK0][main]: Device 3: Tesla V100-SXM2-32GB
[HCTR][10:25:30.792][INFO][RK0][main]: Device 4: Tesla V100-SXM2-32GB
[HCTR][10:25:30.792][INFO][RK0][main]: Device 5: Tesla V100-SXM2-32GB
[HCTR][10:25:30.793][INFO][RK0][main]: Device 6: Tesla V100-SXM2-32GB
[HCTR][10:25:30.793][INFO][RK0][main]: Device 7: Tesla V100-SXM2-32GB
[HCTR][10:25:30.919][INFO][RK0][main]: eval source ./deepfm_data_nvt/val/_file_list.txt max_row_group_size 133678
[HCTR][10:25:31.022][INFO][RK0][main]: train source ./deepfm_data_nvt/train/_file_list.txt max_row_group_size 134102
[HCTR][10:25:31.027][INFO][RK0][main]: num of DataReader workers for train: 8
[HCTR][10:25:31.027][INFO][RK0][main]: num of DataReader workers for eval: 8
[HCTR][10:25:31.029][DEBUG][RK0][main]: [device 0] allocating 0.0804 GB, available 30.0457
[HCTR][10:25:31.030][DEBUG][RK0][main]: [device 1] allocating 0.0804 GB, available 30.0183
[HCTR][10:25:31.032][DEBUG][RK0][main]: [device 2] allocating 0.0804 GB, available 30.1121
[HCTR][10:25:31.033][DEBUG][RK0][main]: [device 3] allocating 0.0804 GB, available 30.1082
[HCTR][10:25:31.035][DEBUG][RK0][main]: [device 4] allocating 0.0804 GB, available 30.0378
[HCTR][10:25:31.037][DEBUG][RK0][main]: [device 5] allocating 0.0804 GB, available 30.0222
[HCTR][10:25:31.038][DEBUG][RK0][main]: [device 6] allocating 0.0804 GB, available 30.0691
[HCTR][10:25:31.039][DEBUG][RK0][main]: [device 7] allocating 0.0804 GB, available 30.0925
[HCTR][10:25:31.041][DEBUG][RK0][main]: [device 0] allocating 0.0804 GB, available 29.9636
[HCTR][10:25:31.043][DEBUG][RK0][main]: [device 1] allocating 0.0804 GB, available 29.9363
[HCTR][10:25:31.044][DEBUG][RK0][main]: [device 2] allocating 0.0804 GB, available 30.0300
[HCTR][10:25:31.046][DEBUG][RK0][main]: [device 3] allocating 0.0804 GB, available 30.0261
[HCTR][10:25:31.047][DEBUG][RK0][main]: [device 4] allocating 0.0804 GB, available 29.9558
[HCTR][10:25:31.049][DEBUG][RK0][main]: [device 5] allocating 0.0804 GB, available 29.9402
[HCTR][10:25:31.050][DEBUG][RK0][main]: [device 6] allocating 0.0804 GB, available 29.9871
[HCTR][10:25:31.052][DEBUG][RK0][main]: [device 7] allocating 0.0804 GB, available 30.0105
[HCTR][10:25:31.114][DEBUG][RK0][main]: [device 0] allocating 0.0000 GB, available 29.6863
[HCTR][10:25:31.224][DEBUG][RK0][main]: [device 1] allocating 0.0000 GB, available 29.6589
[HCTR][10:25:31.330][DEBUG][RK0][main]: [device 2] allocating 0.0000 GB, available 29.7527
[HCTR][10:25:31.474][DEBUG][RK0][main]: [device 3] allocating 0.0000 GB, available 29.7488
[HCTR][10:25:31.564][DEBUG][RK0][main]: [device 4] allocating 0.0000 GB, available 29.6785
[HCTR][10:25:31.646][DEBUG][RK0][main]: [device 5] allocating 0.0000 GB, available 29.6628
[HCTR][10:25:31.755][DEBUG][RK0][main]: [device 6] allocating 0.0000 GB, available 29.7097
[HCTR][10:25:31.836][DEBUG][RK0][main]: [device 7] allocating 0.0000 GB, available 29.7332
[HCTR][10:25:32.040][DEBUG][RK0][main]: [device 0] allocating 0.0000 GB, available 29.4089
[HCTR][10:25:32.175][DEBUG][RK0][main]: [device 1] allocating 0.0000 GB, available 29.3816
[HCTR][10:25:32.319][DEBUG][RK0][main]: [device 2] allocating 0.0000 GB, available 29.4753
[HCTR][10:25:32.467][DEBUG][RK0][main]: [device 3] allocating 0.0000 GB, available 29.4714
[HCTR][10:25:32.617][DEBUG][RK0][main]: [device 4] allocating 0.0000 GB, available 29.4011
[HCTR][10:25:32.768][DEBUG][RK0][main]: [device 5] allocating 0.0000 GB, available 29.3855
[HCTR][10:25:32.921][DEBUG][RK0][main]: [device 6] allocating 0.0000 GB, available 29.4324
[HCTR][10:25:33.063][DEBUG][RK0][main]: [device 7] allocating 0.0000 GB, available 29.4558
[HCTR][10:25:33.221][INFO][RK0][main]: Vocabulary size: 0
[HCTR][10:25:33.406][DEBUG][RK0][main]: [device 0] allocating 0.1016 GB, available 28.7546
[HCTR][10:25:33.408][DEBUG][RK0][main]: [device 1] allocating 0.1016 GB, available 28.7253
[HCTR][10:25:33.409][DEBUG][RK0][main]: [device 2] allocating 0.1016 GB, available 28.9402
[HCTR][10:25:33.410][DEBUG][RK0][main]: [device 3] allocating 0.1016 GB, available 28.8425
[HCTR][10:25:33.412][DEBUG][RK0][main]: [device 4] allocating 0.1016 GB, available 28.8308
[HCTR][10:25:33.413][DEBUG][RK0][main]: [device 5] allocating 0.1016 GB, available 28.7957
[HCTR][10:25:33.414][DEBUG][RK0][main]: [device 6] allocating 0.1016 GB, available 28.9031
[HCTR][10:25:33.415][DEBUG][RK0][main]: [device 7] allocating 0.1016 GB, available 28.9578
[HCTR][10:25:33.417][DEBUG][RK0][main]: [device 0] allocating 0.1016 GB, available 28.6531
[HCTR][10:25:33.418][DEBUG][RK0][main]: [device 1] allocating 0.1016 GB, available 28.6238
[HCTR][10:25:33.420][DEBUG][RK0][main]: [device 2] allocating 0.1016 GB, available 28.8386
[HCTR][10:25:33.421][DEBUG][RK0][main]: [device 3] allocating 0.1016 GB, available 28.7410
[HCTR][10:25:33.422][DEBUG][RK0][main]: [device 4] allocating 0.1016 GB, available 28.7292
[HCTR][10:25:33.424][DEBUG][RK0][main]: [device 5] allocating 0.1016 GB, available 28.6941
[HCTR][10:25:33.425][DEBUG][RK0][main]: [device 6] allocating 0.1016 GB, available 28.8015
[HCTR][10:25:33.426][DEBUG][RK0][main]: [device 7] allocating 0.1016 GB, available 28.8562
[HCTR][10:25:33.558][INFO][RK0][main]: Graph analysis to resolve tensor dependency
===================================================Model Compile===================================================
[HCTR][10:25:33.564][DEBUG][RK0][main]: [device 0] allocating 1.4051 GB, available 27.1921
[HCTR][10:25:33.567][DEBUG][RK0][main]: [device 1] allocating 1.4051 GB, available 27.1628
[HCTR][10:25:33.570][DEBUG][RK0][main]: [device 2] allocating 1.4051 GB, available 27.3777
-[HCTR][10:25:33.573][DEBUG][RK0][main]: [device 3] allocating 1.4051 GB, available 27.2800 
-[HCTR][10:25:33.576][DEBUG][RK0][main]: [device 4] allocating 1.4051 GB, available 27.2683 
-[HCTR][10:25:33.579][DEBUG][RK0][main]: [device 5] allocating 1.4051 GB, available 27.2332 
-[HCTR][10:25:33.582][DEBUG][RK0][main]: [device 6] allocating 1.4051 GB, available 27.3406 
-[HCTR][10:25:33.585][DEBUG][RK0][main]: [device 7] allocating 1.4051 GB, available 27.3953 
-[HCTR][10:25:33.587][DEBUG][RK0][main]: [device 0] allocating 0.0088 GB, available 27.1824 
-[HCTR][10:25:33.588][DEBUG][RK0][main]: [device 1] allocating 0.0088 GB, available 27.1531 
-[HCTR][10:25:33.589][DEBUG][RK0][main]: [device 2] allocating 0.0088 GB, available 27.3679 
-[HCTR][10:25:33.590][DEBUG][RK0][main]: [device 3] allocating 0.0088 GB, available 27.2703 
-[HCTR][10:25:33.591][DEBUG][RK0][main]: [device 4] allocating 0.0088 GB, available 27.2585 
-[HCTR][10:25:33.592][DEBUG][RK0][main]: [device 5] allocating 0.0088 GB, available 27.2234 
-[HCTR][10:25:33.593][DEBUG][RK0][main]: [device 6] allocating 0.0088 GB, available 27.3308 
-[HCTR][10:25:33.595][DEBUG][RK0][main]: [device 7] allocating 0.0088 GB, available 27.3855 
===================================================Model Summary===================================================
-[HCTR][10:26:11.457][INFO][RK0][main]: Model structure on each GPU
-Label                                   Dense                         Sparse                        
-label                                   dense                          data0,data1,data2,data3,data4,data5,data6,data7,data8,data9,data10,data11,data12,data13,data14,data15,data16,data17,data18,data19,data20,data21,data22,data23,data24,data25
-(8192,1)                                (8192,13)                               
-——————————————————————————————————————————————————————————————————————————————————————————————————————————————————
-Layer Type                              Input Name                    Output Name                   Output Shape                  
-——————————————————————————————————————————————————————————————————————————————————————————————————————————————————
-EmbeddingCollection0                    data0                         emb_vec0                      (8192,1,128)                  
-                                        data1                         emb_vec1                      (8192,1,128)                  
-                                        data2                         emb_vec2                      (8192,1,128)                  
-                                        data3                         emb_vec3                      (8192,1,128)                  
-                                        data4                         emb_vec4                      (8192,1,128)                  
-                                        data5                         emb_vec5                      (8192,1,128)                  
-                                        data6                         emb_vec6                      (8192,1,128)                  
-                                        data7                         emb_vec7                      (8192,1,128)                  
-                                        data8                         emb_vec8                      (8192,1,128)                  
-                                        data9                         emb_vec9                      (8192,1,128)                  
-                                        data10                        emb_vec10                     (8192,1,128)                  
-                                        data11                        emb_vec11                     (8192,1,128)                  
-                                        data12                        emb_vec12                     (8192,1,128)                  
-                                        data13                        emb_vec13                     (8192,1,128)                  
-                                        data14                        emb_vec14                     (8192,1,128)                  
-                                        data15                        emb_vec15                     (8192,1,128)                  
-                                        data16                        emb_vec16                     (8192,1,128)                  
-                                        data17                        emb_vec17                     (8192,1,128)                  
-                                        data18                        emb_vec18                     (8192,1,128)                  
-                                        data19                        emb_vec19                     (8192,1,128)                  
-                                        data20                        emb_vec20                     (8192,1,128)                  
-                                        data21                        emb_vec21                     (8192,1,128)                  
-                                        data22                        emb_vec22                     (8192,1,128)                  
-                                        data23                        emb_vec23                     (8192,1,128)                  
-                                        data24                        emb_vec24                     (8192,1,128)                  
-                                        data25                        emb_vec25                     (8192,1,128)                  
-------------------------------------------------------------------------------------------------------------------
-Concat                                  emb_vec0                      sparse_embedding1             (8192,26,128)                 
-                                        emb_vec1                                                                                  
-                                        emb_vec2                                                                                  
-                                        emb_vec3                                                                                  
-                                        emb_vec4                                                                                  
-                                        emb_vec5                                                                                  
-                                        emb_vec6                                                                                  
-                                        emb_vec7                                                                                  
-                                        emb_vec8                                                                                  
-                                        emb_vec9                                                                                  
-                                        emb_vec10                                                                                 
-                                        emb_vec11                                                                                 
-                                        emb_vec12                                                                                 
-                                        emb_vec13                                                                                 
-                                        emb_vec14                                                                                 
-                                        emb_vec15                                                                                 
-                                        emb_vec16                                                                                 
-                                        emb_vec17                                                                                 
-                                        emb_vec18                                                                                 
-                                        emb_vec19                                                                                 
-                                        emb_vec20                                                                                 
-                                        emb_vec21                                                                                 
-                                        emb_vec22                                                                                 
-                                        emb_vec23                                                                                 
-                                        emb_vec24                                                                                 
-                                        emb_vec25                                                                                 
-------------------------------------------------------------------------------------------------------------------
-InnerProduct                            dense                         fc1                           (8192,512)                    
-------------------------------------------------------------------------------------------------------------------
-ReLU                                    fc1                           relu1                         (8192,512)                    
-------------------------------------------------------------------------------------------------------------------
-InnerProduct                            relu1                         fc2                           (8192,256)                    
-------------------------------------------------------------------------------------------------------------------
-ReLU                                    fc2                           relu2                         (8192,256)                    
-------------------------------------------------------------------------------------------------------------------
-InnerProduct                            relu2                         fc3                           (8192,128)                    
-------------------------------------------------------------------------------------------------------------------
-ReLU                                    fc3                           relu3                         (8192,128)                    
-------------------------------------------------------------------------------------------------------------------
-Interaction                             relu3                         interaction1                  (8192,480)                    
-                                        sparse_embedding1                                                                         
-------------------------------------------------------------------------------------------------------------------
-InnerProduct                            interaction1                  fc4                           (8192,1024)                   
-------------------------------------------------------------------------------------------------------------------
-ReLU                                    fc4                           relu4                         (8192,1024)                   
-------------------------------------------------------------------------------------------------------------------
-InnerProduct                            relu4                         fc5                           (8192,1024)                   
-------------------------------------------------------------------------------------------------------------------
-ReLU                                    fc5                           relu5                         (8192,1024)                   
-------------------------------------------------------------------------------------------------------------------
-InnerProduct                            relu5                         fc6                           (8192,512)                    
-------------------------------------------------------------------------------------------------------------------
-ReLU                                    fc6                           relu6                         (8192,512)                    
-------------------------------------------------------------------------------------------------------------------
-InnerProduct                            relu6                         fc7                           (8192,256)                    
-------------------------------------------------------------------------------------------------------------------
-ReLU                                    fc7                           relu7                         (8192,256)                    
-------------------------------------------------------------------------------------------------------------------
-InnerProduct                            relu7                         fc8                           (8192,1)                      
-------------------------------------------------------------------------------------------------------------------
-BinaryCrossEntropyLoss                  fc8                           loss                                                        
-                                        label                                                                                     
-------------------------------------------------------------------------------------------------------------------
-=====================================================Model Fit=====================================================
-[HCTR][10:26:11.457][INFO][RK0][main]: Use non-epoch mode with number of iterations: 1000
-[HCTR][10:26:11.457][INFO][RK0][main]: Training batchsize: 65536, evaluation batchsize: 65536
-[HCTR][10:26:11.457][INFO][RK0][main]: Evaluation interval: 100, snapshot interval: 10000000
-[HCTR][10:26:11.457][INFO][RK0][main]: Dense network trainable: True
-[HCTR][10:26:11.457][INFO][RK0][main]: Use mixed precision: False, scaler: 1.000000, use cuda graph: True
-[HCTR][10:26:11.457][INFO][RK0][main]: lr: 0.500000, warmup_steps: 300, end_lr: 0.000000
-[HCTR][10:26:11.457][INFO][RK0][main]: decay_start: 0, decay_steps: 1, decay_power: 2.000000
-[HCTR][10:26:11.457][INFO][RK0][main]: Training source file: ./deepfm_data_nvt/train/_file_list.txt
-[HCTR][10:26:11.458][INFO][RK0][main]: Evaluation source file: ./deepfm_data_nvt/val/_file_list.txt
[HCTR][10:26:15.652][INFO][RK0][main]: Evaluation, AverageLoss: 0.14373
-[HCTR][10:26:15.652][INFO][RK0][main]: Eval Time for 70 iters: 1.24478s
-[HCTR][10:26:15.697][INFO][RK0][main]: Iter: 100 Time(100 iters): 4.23782s Loss: 0.142604 lr:0.168333
-[HCTR][10:26:19.865][INFO][RK0][main]: Evaluation, AverageLoss: 0.142137
-[HCTR][10:26:19.865][INFO][RK0][main]: Eval Time for 70 iters: 1.25698s
-[HCTR][10:26:19.899][INFO][RK0][main]: Iter: 200 Time(100 iters): 4.19912s Loss: 0.142685 lr:0.335
-[HCTR][10:26:24.035][INFO][RK0][main]: Evaluation, AverageLoss: 0.1404
-[HCTR][10:26:24.035][INFO][RK0][main]: Eval Time for 70 iters: 1.24589s
-[HCTR][10:26:24.080][INFO][RK0][main]: Iter: 300 Time(100 iters): 4.18021s Loss: 0.143021 lr:0.5
-[HCTR][10:26:28.211][INFO][RK0][main]: Evaluation, AverageLoss: 0.139695
-[HCTR][10:26:28.211][INFO][RK0][main]: Eval Time for 70 iters: 1.25073s
-[HCTR][10:26:28.245][INFO][RK0][main]: Iter: 400 Time(100 iters): 4.16407s Loss: 0.141111 lr:0.5
-[HCTR][10:26:32.375][INFO][RK0][main]: Evaluation, AverageLoss: 0.13893
-[HCTR][10:26:32.375][INFO][RK0][main]: Eval Time for 70 iters: 1.24958s
-[HCTR][10:26:32.419][INFO][RK0][main]: Iter: 500 Time(100 iters): 4.17112s Loss: 0.141069 lr:0.5
-[HCTR][10:26:36.558][INFO][RK0][main]: Evaluation, AverageLoss: 0.138218
-[HCTR][10:26:36.558][INFO][RK0][main]: Eval Time for 70 iters: 1.25123s
-[HCTR][10:26:36.606][INFO][RK0][main]: Iter: 600 Time(100 iters): 4.18422s Loss: 0.135439 lr:0.5
-[HCTR][10:26:40.759][INFO][RK0][main]: Evaluation, AverageLoss: 0.137244
-[HCTR][10:26:40.759][INFO][RK0][main]: Eval Time for 70 iters: 1.25471s
-[HCTR][10:26:40.803][INFO][RK0][main]: Iter: 700 Time(100 iters): 4.19334s Loss: 0.139792 lr:0.5
-[HCTR][10:26:44.933][INFO][RK0][main]: Evaluation, AverageLoss: 0.136812
-[HCTR][10:26:44.933][INFO][RK0][main]: Eval Time for 70 iters: 1.2416s
-[HCTR][10:26:44.979][INFO][RK0][main]: Iter: 800 Time(100 iters): 4.17574s Loss: 0.140519 lr:0.5
-[HCTR][10:26:49.115][INFO][RK0][main]: Evaluation, AverageLoss: 0.135968
-[HCTR][10:26:49.116][INFO][RK0][main]: Eval Time for 70 iters: 1.25386s
-[HCTR][10:26:49.163][INFO][RK0][main]: Iter: 900 Time(100 iters): 4.18238s Loss: 0.134846 lr:0.5
-[HCTR][10:26:53.291][INFO][RK0][main]: Evaluation, AverageLoss: 0.134873
-[HCTR][10:26:53.292][INFO][RK0][main]: Eval Time for 70 iters: 1.23619s
-[HCTR][10:26:53.292][INFO][RK0][main]: Finish 1000 iterations with batchsize: 65536 in 41.83s.
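The log above can be cross-checked with some arithmetic. The sketch below is a minimal Python illustration and is not part of HugeCTR or dlrm_train.py. It relies on these observations and assumptions:

- The global batch of 65536 equals 8 GPUs × 8192 samples per GPU, where 8192 is the leading dimension in "Model structure on each GPU".
- The warmup formula `base_lr * min(1, (iter + 1) / warmup_steps)` is fitted to the logged lr values (0.168333, 0.335, 0.5). It is an assumption, not HugeCTR's documented scheduler.
- The interaction width of 480 is assumed to be 27 × 26 / 2 pairwise dot products, plus the 128-wide bottom-MLP output, plus one padding column.

```python
# Arithmetic checks on numbers printed in the training log.
# ASSUMPTION: the warmup formula is inferred from the logged lr values,
# not taken from HugeCTR's scheduler source.

NUM_GPUS = 8
PER_GPU_BATCH = 8192   # leading dim in "Model structure on each GPU"
BASE_LR = 0.5          # "lr: 0.500000"
WARMUP_STEPS = 300     # "warmup_steps: 300"


def global_batch(num_gpus: int, per_gpu_batch: int) -> int:
    """Global batch = per-GPU batch summed over all data-parallel GPUs."""
    return num_gpus * per_gpu_batch


def warmup_lr(iteration: int, base_lr: float = BASE_LR,
              warmup_steps: int = WARMUP_STEPS) -> float:
    """Linear warmup; the +1 offset is what reproduces the logged values.
    No decay is modeled because the logged lr stays at base_lr after warmup."""
    return base_lr * min(1.0, (iteration + 1) / warmup_steps)


def interaction_width(num_sparse: int, dense_dim: int, pad: int = 1) -> int:
    """Pairwise dot products among the sparse embeddings and the bottom-MLP
    output, concatenated with the bottom-MLP output, plus padding (assumed)."""
    n = num_sparse + 1
    return n * (n - 1) // 2 + dense_dim + pad


def throughput(iterations: int, batch: int, seconds: float) -> float:
    """Samples per second of wall-clock time."""
    return iterations * batch / seconds


if __name__ == "__main__":
    print(global_batch(NUM_GPUS, PER_GPU_BATCH))                   # 65536
    print([round(warmup_lr(i), 6) for i in (100, 200, 300, 400)])  # [0.168333, 0.335, 0.5, 0.5]
    print(interaction_width(26, 128))                              # 480
    print(f"{throughput(1000, 65536, 41.83):.3g} samples/s")
```

The 41.83 s total is wall-clock time and covers the ten evaluation passes, so the resulting figure of about 1.57 million samples/s understates pure training throughput.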

Embedding Table Placement Strategy: Uniform


In this placement strategy, every embedding table is placed on all 8 GPUs.
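The following hypothetical Python sketch shows a uniform plan for the 26 sparse inputs (data0 to data25) and 8 GPUs used in this run. The `shard_matrix` layout (one list of table IDs per GPU) and the helper names are illustrative assumptions. They are not HugeCTR's API or the contents of dlrm_train.py. The sketch records only which GPUs hold each table, not how a table's rows are divided among them.

```python
# Hypothetical sketch of a "uniform" embedding-table placement.
# ASSUMPTION: shard_matrix[gpu] lists the table IDs held by that GPU.
# This is NOT the HugeCTR API; it only illustrates "every table on every GPU".
from typing import List


def uniform_shard_matrix(num_tables: int, num_gpus: int) -> List[List[str]]:
    """Place every table on every GPU."""
    return [[str(t) for t in range(num_tables)] for _ in range(num_gpus)]


def gpus_for_table(shard_matrix: List[List[str]], table_id: str) -> List[int]:
    """Return the indices of the GPUs that hold `table_id`."""
    return [gpu for gpu, tables in enumerate(shard_matrix) if table_id in tables]


if __name__ == "__main__":
    plan = uniform_shard_matrix(num_tables=26, num_gpus=8)
    print([len(tables) for tables in plan])  # [26, 26, 26, 26, 26, 26, 26, 26]
    print(gpus_for_table(plan, "25"))        # [0, 1, 2, 3, 4, 5, 6, 7]
```

A plan that places each table on exactly one GPU would make `gpus_for_table` return a single index per table instead.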

!python3 dlrm_train.py --shard_plan uniform
HugeCTR Version: 23.3
-====================================================Model Init=====================================================
-[HCTR][06:33:37.284][WARNING][RK0][main]: The model name is not specified when creating the solver.
-[HCTR][06:33:37.284][INFO][RK0][main]: Global seed is 3445591887
-[HCTR][06:33:37.408][INFO][RK0][main]: Device to NUMA mapping:
-  GPU 0 ->  node 0
-  GPU 1 ->  node 0
-  GPU 2 ->  node 0
-  GPU 3 ->  node 0
-  GPU 4 ->  node 1
-  GPU 5 ->  node 1
-  GPU 6 ->  node 1
-  GPU 7 ->  node 1
-[HCTR][06:33:56.383][WARNING][RK0][main]: Peer-to-peer access cannot be fully enabled.
-[HCTR][06:33:56.384][DEBUG][RK0][main]: [device 0] allocating 0.0000 GB, available 30.4714 
-[HCTR][06:33:56.385][DEBUG][RK0][main]: [device 1] allocating 0.0000 GB, available 30.4441 
-[HCTR][06:33:56.385][DEBUG][RK0][main]: [device 2] allocating 0.0000 GB, available 30.5378 
-[HCTR][06:33:56.385][DEBUG][RK0][main]: [device 3] allocating 0.0000 GB, available 30.5339 
-[HCTR][06:33:56.385][DEBUG][RK0][main]: [device 4] allocating 0.0000 GB, available 30.4636 
-[HCTR][06:33:56.386][DEBUG][RK0][main]: [device 5] allocating 0.0000 GB, available 30.4480 
-[HCTR][06:33:56.386][DEBUG][RK0][main]: [device 6] allocating 0.0000 GB, available 30.4949 
-[HCTR][06:33:56.386][DEBUG][RK0][main]: [device 7] allocating 0.0000 GB, available 30.5183 
-[HCTR][06:33:56.386][INFO][RK0][main]: Start all2all warmup
-[HCTR][06:33:56.628][INFO][RK0][main]: End all2all warmup
-[HCTR][06:33:56.643][INFO][RK0][main]: Using All-reduce algorithm: NCCL
-[HCTR][06:33:56.650][INFO][RK0][main]: Device 0: Tesla V100-SXM2-32GB
-[HCTR][06:33:56.650][INFO][RK0][main]: Device 1: Tesla V100-SXM2-32GB
-[HCTR][06:33:56.651][INFO][RK0][main]: Device 2: Tesla V100-SXM2-32GB
-[HCTR][06:33:56.652][INFO][RK0][main]: Device 3: Tesla V100-SXM2-32GB
-[HCTR][06:33:56.652][INFO][RK0][main]: Device 4: Tesla V100-SXM2-32GB
-[HCTR][06:33:56.653][INFO][RK0][main]: Device 5: Tesla V100-SXM2-32GB
-[HCTR][06:33:56.654][INFO][RK0][main]: Device 6: Tesla V100-SXM2-32GB
-[HCTR][06:33:56.654][INFO][RK0][main]: Device 7: Tesla V100-SXM2-32GB
-[HCTR][06:33:56.785][INFO][RK0][main]: eval source ./deepfm_data_nvt/val/_file_list.txt max_row_group_size 133678
-[HCTR][06:33:56.939][INFO][RK0][main]: train source ./deepfm_data_nvt/train/_file_list.txt max_row_group_size 134102
-[HCTR][06:33:56.946][INFO][RK0][main]: num of DataReader workers for train: 8
-[HCTR][06:33:56.946][INFO][RK0][main]: num of DataReader workers for eval: 8
-[HCTR][06:33:56.997][DEBUG][RK0][main]: [device 0] allocating 0.0258 GB, available 30.0417 
-[HCTR][06:33:57.001][DEBUG][RK0][main]: [device 1] allocating 0.0258 GB, available 30.0144 
-[HCTR][06:33:57.006][DEBUG][RK0][main]: [device 2] allocating 0.0258 GB, available 30.1082 
-[HCTR][06:33:57.011][DEBUG][RK0][main]: [device 3] allocating 0.0258 GB, available 30.1042 
-[HCTR][06:33:57.015][DEBUG][RK0][main]: [device 4] allocating 0.0258 GB, available 30.0339 
-[HCTR][06:33:57.020][DEBUG][RK0][main]: [device 5] allocating 0.0258 GB, available 30.0183 
-[HCTR][06:33:57.024][DEBUG][RK0][main]: [device 6] allocating 0.0258 GB, available 30.0652 
-[HCTR][06:33:57.029][DEBUG][RK0][main]: [device 7] allocating 0.0258 GB, available 30.0886 
-[HCTR][06:33:57.071][DEBUG][RK0][main]: [device 0] allocating 0.0258 GB, available 29.9558 
-[HCTR][06:33:57.075][DEBUG][RK0][main]: [device 1] allocating 0.0258 GB, available 29.9285 
-[HCTR][06:33:57.080][DEBUG][RK0][main]: [device 2] allocating 0.0258 GB, available 30.0222 
-[HCTR][06:33:57.084][DEBUG][RK0][main]: [device 3] allocating 0.0258 GB, available 30.0183 
-[HCTR][06:33:57.088][DEBUG][RK0][main]: [device 4] allocating 0.0258 GB, available 29.9480 
-[HCTR][06:33:57.092][DEBUG][RK0][main]: [device 5] allocating 0.0258 GB, available 29.9324 
-[HCTR][06:33:57.097][DEBUG][RK0][main]: [device 6] allocating 0.0258 GB, available 29.9792 
-[HCTR][06:33:57.101][DEBUG][RK0][main]: [device 7] allocating 0.0258 GB, available 30.0027 
-[HCTR][06:33:59.332][INFO][RK0][main]: Vocabulary size: 0
-[HCTR][06:33:59.745][DEBUG][RK0][main]: [device 0] allocating 0.1016 GB, available 25.9753 
-[HCTR][06:33:59.746][DEBUG][RK0][main]: [device 1] allocating 0.1016 GB, available 25.9480 
-[HCTR][06:33:59.748][DEBUG][RK0][main]: [device 2] allocating 0.1016 GB, available 26.0417 
-[HCTR][06:33:59.749][DEBUG][RK0][main]: [device 3] allocating 0.1016 GB, available 26.0378 
-[HCTR][06:33:59.751][DEBUG][RK0][main]: [device 4] allocating 0.1016 GB, available 25.9675 
-[HCTR][06:33:59.752][DEBUG][RK0][main]: [device 5] allocating 0.1016 GB, available 25.9519 
-[HCTR][06:33:59.754][DEBUG][RK0][main]: [device 6] allocating 0.1016 GB, available 25.9988 
-[HCTR][06:33:59.756][DEBUG][RK0][main]: [device 7] allocating 0.1016 GB, available 26.0222 
-[HCTR][06:33:59.757][DEBUG][RK0][main]: [device 0] allocating 0.1016 GB, available 25.8738 
-[HCTR][06:33:59.759][DEBUG][RK0][main]: [device 1] allocating 0.1016 GB, available 25.8464 
-[HCTR][06:33:59.760][DEBUG][RK0][main]: [device 2] allocating 0.1016 GB, available 25.9402 
-[HCTR][06:33:59.762][DEBUG][RK0][main]: [device 3] allocating 0.1016 GB, available 25.9363 
-[HCTR][06:33:59.763][DEBUG][RK0][main]: [device 4] allocating 0.1016 GB, available 25.8660 
-[HCTR][06:33:59.765][DEBUG][RK0][main]: [device 5] allocating 0.1016 GB, available 25.8503 
-[HCTR][06:33:59.767][DEBUG][RK0][main]: [device 6] allocating 0.1016 GB, available 25.8972 
-[HCTR][06:33:59.768][DEBUG][RK0][main]: [device 7] allocating 0.1016 GB, available 25.9207 
-[HCTR][06:33:59.911][INFO][RK0][main]: Graph analysis to resolve tensor dependency
-===================================================Model Compile===================================================
-[HCTR][06:33:59.917][DEBUG][RK0][main]: [device 0] allocating 1.4051 GB, available 24.4128 
-[HCTR][06:33:59.921][DEBUG][RK0][main]: [device 1] allocating 1.4051 GB, available 24.3855 
-[HCTR][06:33:59.924][DEBUG][RK0][main]: [device 2] allocating 1.4051 GB, available 24.4792 
-[HCTR][06:33:59.927][DEBUG][RK0][main]: [device 3] allocating 1.4051 GB, available 24.4753 
-[HCTR][06:33:59.930][DEBUG][RK0][main]: [device 4] allocating 1.4051 GB, available 24.4050 
-[HCTR][06:33:59.934][DEBUG][RK0][main]: [device 5] allocating 1.4051 GB, available 24.3894 
-[HCTR][06:33:59.937][DEBUG][RK0][main]: [device 6] allocating 1.4051 GB, available 24.4363 
-[HCTR][06:33:59.940][DEBUG][RK0][main]: [device 7] allocating 1.4051 GB, available 24.4597 
-[HCTR][06:33:59.941][DEBUG][RK0][main]: [device 0] allocating 0.0088 GB, available 24.4031 
-[HCTR][06:33:59.942][DEBUG][RK0][main]: [device 1] allocating 0.0088 GB, available 24.3757 
-[HCTR][06:33:59.944][DEBUG][RK0][main]: [device 2] allocating 0.0088 GB, available 24.4695 
-[HCTR][06:33:59.945][DEBUG][RK0][main]: [device 3] allocating 0.0088 GB, available 24.4656 
-[HCTR][06:33:59.946][DEBUG][RK0][main]: [device 4] allocating 0.0088 GB, available 24.3953 
-[HCTR][06:33:59.947][DEBUG][RK0][main]: [device 5] allocating 0.0088 GB, available 24.3796 
-[HCTR][06:33:59.948][DEBUG][RK0][main]: [device 6] allocating 0.0088 GB, available 24.4265 
-[HCTR][06:33:59.950][DEBUG][RK0][main]: [device 7] allocating 0.0088 GB, available 24.4500 
-===================================================Model Summary===================================================
-[HCTR][06:34:37.841][INFO][RK0][main]: Model structure on each GPU
-Label                                   Dense                         Sparse                        
-label                                   dense                          data0,data1,data2,data3,data4,data5,data6,data7,data8,data9,data10,data11,data12,data13,data14,data15,data16,data17,data18,data19,data20,data21,data22,data23,data24,data25
-(8192,1)                                (8192,13)                               
-——————————————————————————————————————————————————————————————————————————————————————————————————————————————————
-Layer Type                              Input Name                    Output Name                   Output Shape                  
-——————————————————————————————————————————————————————————————————————————————————————————————————————————————————
-EmbeddingCollection0                    data0                         emb_vec0                      (8192,1,128)                  
-                                        data1                         emb_vec1                      (8192,1,128)                  
-                                        data2                         emb_vec2                      (8192,1,128)                  
-                                        data3                         emb_vec3                      (8192,1,128)                  
-                                        data4                         emb_vec4                      (8192,1,128)                  
-                                        data5                         emb_vec5                      (8192,1,128)                  
-                                        data6                         emb_vec6                      (8192,1,128)                  
-                                        data7                         emb_vec7                      (8192,1,128)                  
-                                        data8                         emb_vec8                      (8192,1,128)                  
-                                        data9                         emb_vec9                      (8192,1,128)                  
-                                        data10                        emb_vec10                     (8192,1,128)                  
-                                        data11                        emb_vec11                     (8192,1,128)                  
-                                        data12                        emb_vec12                     (8192,1,128)                  
-                                        data13                        emb_vec13                     (8192,1,128)                  
-                                        data14                        emb_vec14                     (8192,1,128)                  
-                                        data15                        emb_vec15                     (8192,1,128)                  
-                                        data16                        emb_vec16                     (8192,1,128)                  
-                                        data17                        emb_vec17                     (8192,1,128)                  
-                                        data18                        emb_vec18                     (8192,1,128)                  
-                                        data19                        emb_vec19                     (8192,1,128)                  
-                                        data20                        emb_vec20                     (8192,1,128)                  
-                                        data21                        emb_vec21                     (8192,1,128)                  
-                                        data22                        emb_vec22                     (8192,1,128)                  
-                                        data23                        emb_vec23                     (8192,1,128)                  
-                                        data24                        emb_vec24                     (8192,1,128)                  
-                                        data25                        emb_vec25                     (8192,1,128)                  
-------------------------------------------------------------------------------------------------------------------
-Concat                                  emb_vec0                      sparse_embedding1             (8192,26,128)                 
-                                        emb_vec1                                                                                  
-                                        emb_vec2                                                                                  
-                                        emb_vec3                                                                                  
-                                        emb_vec4                                                                                  
-                                        emb_vec5                                                                                  
-                                        emb_vec6                                                                                  
-                                        emb_vec7                                                                                  
-                                        emb_vec8                                                                                  
-                                        emb_vec9                                                                                  
-                                        emb_vec10                                                                                 
-                                        emb_vec11                                                                                 
-                                        emb_vec12                                                                                 
-                                        emb_vec13                                                                                 
-                                        emb_vec14                                                                                 
-                                        emb_vec15                                                                                 
-                                        emb_vec16                                                                                 
-                                        emb_vec17                                                                                 
-                                        emb_vec18                                                                                 
-                                        emb_vec19                                                                                 
-                                        emb_vec20                                                                                 
-                                        emb_vec21                                                                                 
-                                        emb_vec22                                                                                 
-                                        emb_vec23                                                                                 
-                                        emb_vec24                                                                                 
-                                        emb_vec25                                                                                 
-------------------------------------------------------------------------------------------------------------------
-InnerProduct                            dense                         fc1                           (8192,512)                    
-------------------------------------------------------------------------------------------------------------------
-ReLU                                    fc1                           relu1                         (8192,512)                    
-------------------------------------------------------------------------------------------------------------------
-InnerProduct                            relu1                         fc2                           (8192,256)                    
-------------------------------------------------------------------------------------------------------------------
-ReLU                                    fc2                           relu2                         (8192,256)                    
-------------------------------------------------------------------------------------------------------------------
-InnerProduct                            relu2                         fc3                           (8192,128)                    
-------------------------------------------------------------------------------------------------------------------
-ReLU                                    fc3                           relu3                         (8192,128)                    
-------------------------------------------------------------------------------------------------------------------
-Interaction                             relu3                         interaction1                  (8192,480)                    
-                                        sparse_embedding1                                                                         
-------------------------------------------------------------------------------------------------------------------
-InnerProduct                            interaction1                  fc4                           (8192,1024)                   
-------------------------------------------------------------------------------------------------------------------
-ReLU                                    fc4                           relu4                         (8192,1024)                   
-------------------------------------------------------------------------------------------------------------------
-InnerProduct                            relu4                         fc5                           (8192,1024)                   
-------------------------------------------------------------------------------------------------------------------
-ReLU                                    fc5                           relu5                         (8192,1024)                   
-------------------------------------------------------------------------------------------------------------------
-InnerProduct                            relu5                         fc6                           (8192,512)                    
-------------------------------------------------------------------------------------------------------------------
-ReLU                                    fc6                           relu6                         (8192,512)                    
-------------------------------------------------------------------------------------------------------------------
-InnerProduct                            relu6                         fc7                           (8192,256)                    
-------------------------------------------------------------------------------------------------------------------
-ReLU                                    fc7                           relu7                         (8192,256)                    
-------------------------------------------------------------------------------------------------------------------
-InnerProduct                            relu7                         fc8                           (8192,1)                      
-------------------------------------------------------------------------------------------------------------------
-BinaryCrossEntropyLoss                  fc8                           loss                                                        
-                                        label                                                                                     
-------------------------------------------------------------------------------------------------------------------
-=====================================================Model Fit=====================================================
-[HCTR][06:34:37.842][INFO][RK0][main]: Use non-epoch mode with number of iterations: 1000
-[HCTR][06:34:37.842][INFO][RK0][main]: Training batchsize: 65536, evaluation batchsize: 65536
-[HCTR][06:34:37.842][INFO][RK0][main]: Evaluation interval: 100, snapshot interval: 10000000
-[HCTR][06:34:37.842][INFO][RK0][main]: Dense network trainable: True
-[HCTR][06:34:37.842][INFO][RK0][main]: Use mixed precision: False, scaler: 1.000000, use cuda graph: True
-[HCTR][06:34:37.842][INFO][RK0][main]: lr: 0.500000, warmup_steps: 300, end_lr: 0.000000
-[HCTR][06:34:37.842][INFO][RK0][main]: decay_start: 0, decay_steps: 1, decay_power: 2.000000
-[HCTR][06:34:37.842][INFO][RK0][main]: Training source file: ./deepfm_data_nvt/train/_file_list.txt
-[HCTR][06:34:37.842][INFO][RK0][main]: Evaluation source file: ./deepfm_data_nvt/val/_file_list.txt
-
-
-
-[HCTR][06:34:46.251][INFO][RK0][main]: Evaluation, AverageLoss: 0.143524
-[HCTR][06:34:46.251][INFO][RK0][main]: Eval Time for 70 iters: 2.34586s
-[HCTR][06:34:46.345][INFO][RK0][main]: Iter: 100 Time(100 iters): 8.48449s Loss: 0.142247 lr:0.168333
-[HCTR][06:34:54.657][INFO][RK0][main]: Evaluation, AverageLoss: 0.141641
-[HCTR][06:34:54.657][INFO][RK0][main]: Eval Time for 70 iters: 2.33134s
-[HCTR][06:34:54.751][INFO][RK0][main]: Iter: 200 Time(100 iters): 8.40384s Loss: 0.142243 lr:0.335
-[HCTR][06:35:03.069][INFO][RK0][main]: Evaluation, AverageLoss: 0.139913
-[HCTR][06:35:03.069][INFO][RK0][main]: Eval Time for 70 iters: 2.33118s
-[HCTR][06:35:03.161][INFO][RK0][main]: Iter: 300 Time(100 iters): 8.40793s Loss: 0.142713 lr:0.5
-[HCTR][06:35:11.479][INFO][RK0][main]: Evaluation, AverageLoss: 0.138901
-[HCTR][06:35:11.479][INFO][RK0][main]: Eval Time for 70 iters: 2.34956s
-[HCTR][06:35:11.568][INFO][RK0][main]: Iter: 400 Time(100 iters): 8.40618s Loss: 0.140238 lr:0.5
-[HCTR][06:35:19.883][INFO][RK0][main]: Evaluation, AverageLoss: 0.138208
-[HCTR][06:35:19.883][INFO][RK0][main]: Eval Time for 70 iters: 2.34071s
-[HCTR][06:35:19.974][INFO][RK0][main]: Iter: 500 Time(100 iters): 8.38745s Loss: 0.140117 lr:0.5
-[HCTR][06:35:28.326][INFO][RK0][main]: Evaluation, AverageLoss: 0.137638
-[HCTR][06:35:28.326][INFO][RK0][main]: Eval Time for 70 iters: 2.34076s
-[HCTR][06:35:28.415][INFO][RK0][main]: Iter: 600 Time(100 iters): 8.42352s Loss: 0.135055 lr:0.5
-[HCTR][06:35:36.727][INFO][RK0][main]: Evaluation, AverageLoss: 0.137268
-[HCTR][06:35:36.728][INFO][RK0][main]: Eval Time for 70 iters: 2.33588s
-[HCTR][06:35:36.819][INFO][RK0][main]: Iter: 700 Time(100 iters): 8.38619s Loss: 0.139783 lr:0.5
-[HCTR][06:35:45.193][INFO][RK0][main]: Evaluation, AverageLoss: 0.136816
-[HCTR][06:35:45.193][INFO][RK0][main]: Eval Time for 70 iters: 2.3762s
-[HCTR][06:35:45.253][INFO][RK0][main]: Iter: 800 Time(100 iters): 8.43341s Loss: 0.140772 lr:0.5
-[HCTR][06:35:53.581][INFO][RK0][main]: Evaluation, AverageLoss: 0.136368
-[HCTR][06:35:53.581][INFO][RK0][main]: Eval Time for 70 iters: 2.3521s
-[HCTR][06:35:53.673][INFO][RK0][main]: Iter: 900 Time(100 iters): 8.41807s Loss: 0.135264 lr:0.5
-[HCTR][06:36:01.985][INFO][RK0][main]: Evaluation, AverageLoss: 0.135726
-[HCTR][06:36:01.985][INFO][RK0][main]: Eval Time for 70 iters: 2.34242s
-[HCTR][06:36:01.985][INFO][RK0][main]: Finish 1000 iterations with batchsize: 65536 in 84.14s.
-
-
-
-
-
-
-
-
-Embedding Table Placement Strategy: Hybrid
-
-
-
-In this Embedding Table Placement Strategy, small tables (size < 6000) are placed in a data-parallel way (replicated on every GPU), and large tables (size >= 6000) are distributed across GPUs in a round-robin way.
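The hybrid rule can be illustrated with a minimal Python sketch. Please treat it as an illustration only. The function name `hybrid_shard_plan` and its per-GPU `{"dp": ..., "mp": ...}` dictionaries are hypothetical. They are not the plan format that `dlrm_train.py` or HugeCTR actually produces. The sketch also assumes that "round robin" means assigning large tables to GPUs in table-index order, starting from GPU 0. The only detail taken from the text is the 6000-row threshold.

```python
from typing import Dict, List


def hybrid_shard_plan(
    table_sizes: List[int], num_gpus: int, threshold: int = 6000
) -> List[Dict[str, List[int]]]:
    """Hypothetical helper: for each GPU, list the tables it holds.

    "dp" tables are replicated on every GPU (data parallel);
    "mp" tables are owned by exactly one GPU (model parallel).
    """
    # Small tables (size < threshold) are copied to every GPU.
    dp_tables = [t for t, size in enumerate(table_sizes) if size < threshold]
    # Large tables (size >= threshold) are dealt out one GPU at a time.
    mp_tables = [t for t, size in enumerate(table_sizes) if size >= threshold]

    plan = [{"dp": list(dp_tables), "mp": []} for _ in range(num_gpus)]
    for i, table_id in enumerate(mp_tables):
        # Round robin: the i-th large table goes to GPU i mod num_gpus.
        plan[i % num_gpus]["mp"].append(table_id)
    return plan


if __name__ == "__main__":
    # Six illustrative tables on two GPUs; table 4 sits exactly at the threshold.
    print(hybrid_shard_plan([100, 50000, 3000, 8000, 6000, 10], num_gpus=2))
    # [{'dp': [0, 2, 5], 'mp': [1, 4]}, {'dp': [0, 2, 5], 'mp': [3]}]
```

Small tables cost little memory, so replicating them avoids inter-GPU communication for their lookups. Large tables are split across GPUs so that no single device has to hold all of them.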
-
-
-
-
-!python3 dlrm_train.py --shard_plan hybrid
-
-
-
-
-
-HugeCTR Version: 23.2
-====================================================Model Init=====================================================
-[HCTR][10:35:14.415][WARNING][RK0][main]: The model name is not specified when creating the solver.
-[HCTR][10:35:14.415][INFO][RK0][main]: Global seed is 198655838
-[HCTR][10:35:14.517][INFO][RK0][main]: Device to NUMA mapping:
-  GPU 0 ->  node 0
-  GPU 1 ->  node 0
-  GPU 2 ->  node 0
-  GPU 3 ->  node 0
-  GPU 4 ->  node 1
-  GPU 5 ->  node 1
-  GPU 6 ->  node 1
-  GPU 7 ->  node 1
-[HCTR][10:35:25.730][WARNING][RK0][main]: Peer-to-peer access cannot be fully enabled.
-[HCTR][10:35:25.731][DEBUG][RK0][main]: [device 0] allocating 0.0000 GB, available 30.4714 
-[HCTR][10:35:25.731][DEBUG][RK0][main]: [device 1] allocating 0.0000 GB, available 30.4441 
-[HCTR][10:35:25.731][DEBUG][RK0][main]: [device 2] allocating 0.0000 GB, available 30.5378 
-[HCTR][10:35:25.731][DEBUG][RK0][main]: [device 3] allocating 0.0000 GB, available 30.5339 
-[HCTR][10:35:25.731][DEBUG][RK0][main]: [device 4] allocating 0.0000 GB, available 30.4636 
-[HCTR][10:35:25.731][DEBUG][RK0][main]: [device 5] allocating 0.0000 GB, available 30.4480 
-[HCTR][10:35:25.731][DEBUG][RK0][main]: [device 6] allocating 0.0000 GB, available 30.4949 
-[HCTR][10:35:25.731][DEBUG][RK0][main]: [device 7] allocating 0.0000 GB, available 30.5183 
-[HCTR][10:35:25.732][INFO][RK0][main]: Start all2all warmup
-[HCTR][10:35:25.896][INFO][RK0][main]: End all2all warmup
-[HCTR][10:35:25.907][INFO][RK0][main]: Using All-reduce algorithm: NCCL
-[HCTR][10:35:25.913][INFO][RK0][main]: Device 0: Tesla V100-SXM2-32GB
-[HCTR][10:35:25.914][INFO][RK0][main]: Device 1: Tesla V100-SXM2-32GB
-[HCTR][10:35:25.914][INFO][RK0][main]: Device 2: Tesla V100-SXM2-32GB
-[HCTR][10:35:25.915][INFO][RK0][main]: Device 3: Tesla V100-SXM2-32GB
-[HCTR][10:35:25.916][INFO][RK0][main]: Device 4: Tesla V100-SXM2-32GB
-[HCTR][10:35:25.916][INFO][RK0][main]: Device 5: Tesla V100-SXM2-32GB
-[HCTR][10:35:25.917][INFO][RK0][main]: Device 6: Tesla V100-SXM2-32GB
-[HCTR][10:35:25.917][INFO][RK0][main]: Device 7: Tesla V100-SXM2-32GB
-[HCTR][10:35:25.969][INFO][RK0][main]: eval source ./deepfm_data_nvt/val/_file_list.txt max_row_group_size 133678
-[HCTR][10:35:26.002][INFO][RK0][main]: train source ./deepfm_data_nvt/train/_file_list.txt max_row_group_size 134102
-[HCTR][10:35:26.004][INFO][RK0][main]: num of DataReader workers for train: 8
-[HCTR][10:35:26.004][INFO][RK0][main]: num of DataReader workers for eval: 8
-[HCTR][10:35:26.005][DEBUG][RK0][main]: [device 0] allocating 0.0804 GB, available 30.0457 
-[HCTR][10:35:26.007][DEBUG][RK0][main]: [device 1] allocating 0.0804 GB, available 30.0183 
-[HCTR][10:35:26.008][DEBUG][RK0][main]: [device 2] allocating 0.0804 GB, available 30.1121 
-[HCTR][10:35:26.009][DEBUG][RK0][main]: [device 3] allocating 0.0804 GB, available 30.1082 
-[HCTR][10:35:26.010][DEBUG][RK0][main]: [device 4] allocating 0.0804 GB, available 30.0378 
-[HCTR][10:35:26.012][DEBUG][RK0][main]: [device 5] allocating 0.0804 GB, available 30.0222 
-[HCTR][10:35:26.013][DEBUG][RK0][main]: [device 6] allocating 0.0804 GB, available 30.0691 
-[HCTR][10:35:26.014][DEBUG][RK0][main]: [device 7] allocating 0.0804 GB, available 30.0925 
-[HCTR][10:35:26.016][DEBUG][RK0][main]: [device 0] allocating 0.0804 GB, available 29.9636 
-[HCTR][10:35:26.017][DEBUG][RK0][main]: [device 1] allocating 0.0804 GB, available 29.9363 
-[HCTR][10:35:26.018][DEBUG][RK0][main]: [device 2] allocating 0.0804 GB, available 30.0300 
-[HCTR][10:35:26.020][DEBUG][RK0][main]: [device 3] allocating 0.0804 GB, available 30.0261 
-[HCTR][10:35:26.021][DEBUG][RK0][main]: [device 4] allocating 0.0804 GB, available 29.9558 
-[HCTR][10:35:26.022][DEBUG][RK0][main]: [device 5] allocating 0.0804 GB, available 29.9402 
-[HCTR][10:35:26.023][DEBUG][RK0][main]: [device 6] allocating 0.0804 GB, available 29.9871 
-[HCTR][10:35:26.025][DEBUG][RK0][main]: [device 7] allocating 0.0804 GB, available 30.0105 
-[HCTR][10:35:26.081][DEBUG][RK0][main]: [device 0] allocating 0.0000 GB, available 29.6863 
-[HCTR][10:35:26.121][DEBUG][RK0][main]: [device 1] allocating 0.0000 GB, available 29.6589 
-[HCTR][10:35:26.423][DEBUG][RK0][main]: [device 2] allocating 0.0000 GB, available 29.7527 
-[HCTR][10:35:26.505][DEBUG][RK0][main]: [device 3] allocating 0.0000 GB, available 29.7488 
-[HCTR][10:35:27.056][DEBUG][RK0][main]: [device 4] allocating 0.0000 GB, available 29.6785 
-[HCTR][10:35:27.145][DEBUG][RK0][main]: [device 5] allocating 0.0000 GB, available 29.6628 
-[HCTR][10:35:27.235][DEBUG][RK0][main]: [device 6] allocating 0.0000 GB, available 29.7097 
-[HCTR][10:35:27.559][DEBUG][RK0][main]: [device 7] allocating 0.0000 GB, available 29.7332 
-[HCTR][10:35:27.747][DEBUG][RK0][main]: [device 0] allocating 0.0000 GB, available 29.4089 
-[HCTR][10:35:29.286][DEBUG][RK0][main]: [device 1] allocating 0.0000 GB, available 29.3816 
-[HCTR][10:35:30.351][DEBUG][RK0][main]: [device 2] allocating 0.0000 GB, available 29.4753 
-[HCTR][10:35:31.224][DEBUG][RK0][main]: [device 3] allocating 0.0000 GB, available 29.4714 
-[HCTR][10:35:31.749][DEBUG][RK0][main]: [device 4] allocating 0.0000 GB, available 29.4011 
-[HCTR][10:35:32.275][DEBUG][RK0][main]: [device 5] allocating 0.0000 GB, available 29.3855 
-[HCTR][10:35:33.299][DEBUG][RK0][main]: [device 6] allocating 0.0000 GB, available 29.4324 
-[HCTR][10:35:34.091][DEBUG][RK0][main]: [device 7] allocating 0.0000 GB, available 29.4558 
-[HCTR][10:35:34.133][INFO][RK0][main]: Vocabulary size: 0
-[HCTR][10:35:34.361][DEBUG][RK0][main]: [device 0] allocating 0.1016 GB, available 28.9285 
-[HCTR][10:35:34.363][DEBUG][RK0][main]: [device 1] allocating 0.1016 GB, available 29.0203 
-[HCTR][10:35:34.364][DEBUG][RK0][main]: [device 2] allocating 0.1016 GB, available 29.0203 
-[HCTR][10:35:34.365][DEBUG][RK0][main]: [device 3] allocating 0.1016 GB, available 29.0515 
-[HCTR][10:35:34.367][DEBUG][RK0][main]: [device 4] allocating 0.1016 GB, available 28.9460 
-[HCTR][10:35:34.368][DEBUG][RK0][main]: [device 5] allocating 0.1016 GB, available 29.0046 
-[HCTR][10:35:34.369][DEBUG][RK0][main]: [device 6] allocating 0.1016 GB, available 28.9890 
-[HCTR][10:35:34.371][DEBUG][RK0][main]: [device 7] allocating 0.1016 GB, available 29.1355 
-[HCTR][10:35:34.372][DEBUG][RK0][main]: [device 0] allocating 0.1016 GB, available 28.8269 
-[HCTR][10:35:34.373][DEBUG][RK0][main]: [device 1] allocating 0.1016 GB, available 28.9187 
-[HCTR][10:35:34.375][DEBUG][RK0][main]: [device 2] allocating 0.1016 GB, available 28.9187 
-[HCTR][10:35:34.376][DEBUG][RK0][main]: [device 3] allocating 0.1016 GB, available 28.9500 
-[HCTR][10:35:34.377][DEBUG][RK0][main]: [device 4] allocating 0.1016 GB, available 28.8445 
-[HCTR][10:35:34.379][DEBUG][RK0][main]: [device 5] allocating 0.1016 GB, available 28.9031 
-[HCTR][10:35:34.380][DEBUG][RK0][main]: [device 6] allocating 0.1016 GB, available 28.8875 
-[HCTR][10:35:34.381][DEBUG][RK0][main]: [device 7] allocating 0.1016 GB, available 29.0339 
-[HCTR][10:35:34.516][INFO][RK0][main]: Graph analysis to resolve tensor dependency
-===================================================Model Compile===================================================
-[HCTR][10:35:34.522][DEBUG][RK0][main]: [device 0] allocating 1.4051 GB, available 27.3660 
-[HCTR][10:35:34.525][DEBUG][RK0][main]: [device 1] allocating 1.4051 GB, available 27.4578 
-[HCTR][10:35:34.528][DEBUG][RK0][main]: [device 2] allocating 1.4051 GB, available 27.4578 
-[HCTR][10:35:34.531][DEBUG][RK0][main]: [device 3] allocating 1.4051 GB, available 27.4890 
-[HCTR][10:35:34.534][DEBUG][RK0][main]: [device 4] allocating 1.4051 GB, available 27.3835 
-[HCTR][10:35:34.538][DEBUG][RK0][main]: [device 5] allocating 1.4051 GB, available 27.4421 
-[HCTR][10:35:34.541][DEBUG][RK0][main]: [device 6] allocating 1.4051 GB, available 27.4265 
-[HCTR][10:35:34.544][DEBUG][RK0][main]: [device 7] allocating 1.4051 GB, available 27.5730 
-[HCTR][10:35:34.545][DEBUG][RK0][main]: [device 0] allocating 0.0088 GB, available 27.3562 
-[HCTR][10:35:34.546][DEBUG][RK0][main]: [device 1] allocating 0.0088 GB, available 27.4480 
-[HCTR][10:35:34.547][DEBUG][RK0][main]: [device 2] allocating 0.0088 GB, available 27.4480 
-[HCTR][10:35:34.548][DEBUG][RK0][main]: [device 3] allocating 0.0088 GB, available 27.4792 
-[HCTR][10:35:34.550][DEBUG][RK0][main]: [device 4] allocating 0.0088 GB, available 27.3738 
-[HCTR][10:35:34.551][DEBUG][RK0][main]: [device 5] allocating 0.0088 GB, available 27.4324 
-[HCTR][10:35:34.552][DEBUG][RK0][main]: [device 6] allocating 0.0088 GB, available 27.4167 
-[HCTR][10:35:34.553][DEBUG][RK0][main]: [device 7] allocating 0.0088 GB, available 27.5632 
-
-
-
-===================================================Model Summary===================================================
-[HCTR][10:36:12.594][INFO][RK0][main]: Model structure on each GPU
-Label                                   Dense                         Sparse                        
-label                                   dense                          data0,data1,data2,data3,data4,data5,data6,data7,data8,data9,data10,data11,data12,data13,data14,data15,data16,data17,data18,data19,data20,data21,data22,data23,data24,data25
-(8192,1)                                (8192,13)                               
-——————————————————————————————————————————————————————————————————————————————————————————————————————————————————
-Layer Type                              Input Name                    Output Name                   Output Shape                  
-——————————————————————————————————————————————————————————————————————————————————————————————————————————————————
-EmbeddingCollection0                    data0                         emb_vec0                      (8192,1,128)                  
-                                        data1                         emb_vec1                      (8192,1,128)                  
-                                        data2                         emb_vec2                      (8192,1,128)                  
-                                        data3                         emb_vec3                      (8192,1,128)                  
-                                        data4                         emb_vec4                      (8192,1,128)                  
-                                        data5                         emb_vec5                      (8192,1,128)                  
-                                        data6                         emb_vec6                      (8192,1,128)                  
-                                        data7                         emb_vec7                      (8192,1,128)                  
-                                        data8                         emb_vec8                      (8192,1,128)                  
-                                        data9                         emb_vec9                      (8192,1,128)                  
-                                        data10                        emb_vec10                     (8192,1,128)                  
-                                        data11                        emb_vec11                     (8192,1,128)                  
-                                        data12                        emb_vec12                     (8192,1,128)                  
-                                        data13                        emb_vec13                     (8192,1,128)                  
-                                        data14                        emb_vec14                     (8192,1,128)                  
-                                        data15                        emb_vec15                     (8192,1,128)                  
-                                        data16                        emb_vec16                     (8192,1,128)                  
-                                        data17                        emb_vec17                     (8192,1,128)                  
-                                        data18                        emb_vec18                     (8192,1,128)                  
-                                        data19                        emb_vec19                     (8192,1,128)                  
-                                        data20                        emb_vec20                     (8192,1,128)                  
-                                        data21                        emb_vec21                     (8192,1,128)                  
-                                        data22                        emb_vec22                     (8192,1,128)                  
-                                        data23                        emb_vec23                     (8192,1,128)                  
-                                        data24                        emb_vec24                     (8192,1,128)                  
-                                        data25                        emb_vec25                     (8192,1,128)                  
-------------------------------------------------------------------------------------------------------------------
-Concat                                  emb_vec0                      sparse_embedding1             (8192,26,128)                 
-                                        emb_vec1                                                                                  
-                                        emb_vec2                                                                                  
-                                        emb_vec3                                                                                  
-                                        emb_vec4                                                                                  
-                                        emb_vec5                                                                                  
-                                        emb_vec6                                                                                  
-                                        emb_vec7                                                                                  
-                                        emb_vec8                                                                                  
-                                        emb_vec9                                                                                  
-                                        emb_vec10                                                                                 
-                                        emb_vec11                                                                                 
-                                        emb_vec12                                                                                 
-                                        emb_vec13                                                                                 
-                                        emb_vec14                                                                                 
-                                        emb_vec15                                                                                 
-                                        emb_vec16                                                                                 
-                                        emb_vec17                                                                                 
-                                        emb_vec18                                                                                 
-                                        emb_vec19                                                                                 
-                                        emb_vec20                                                                                 
-                                        emb_vec21                                                                                 
-                                        emb_vec22                                                                                 
-                                        emb_vec23                                                                                 
-                                        emb_vec24                                                                                 
-                                        emb_vec25                                                                                 
-------------------------------------------------------------------------------------------------------------------
-InnerProduct                            dense                         fc1                           (8192,512)                    
-------------------------------------------------------------------------------------------------------------------
-ReLU                                    fc1                           relu1                         (8192,512)                    
-------------------------------------------------------------------------------------------------------------------
-InnerProduct                            relu1                         fc2                           (8192,256)                    
-------------------------------------------------------------------------------------------------------------------
-ReLU                                    fc2                           relu2                         (8192,256)                    
-------------------------------------------------------------------------------------------------------------------
-InnerProduct                            relu2                         fc3                           (8192,128)                    
-------------------------------------------------------------------------------------------------------------------
-ReLU                                    fc3                           relu3                         (8192,128)                    
-------------------------------------------------------------------------------------------------------------------
-Interaction                             relu3                         interaction1                  (8192,480)                    
-                                        sparse_embedding1                                                                         
-------------------------------------------------------------------------------------------------------------------
-InnerProduct                            interaction1                  fc4                           (8192,1024)                   
-------------------------------------------------------------------------------------------------------------------
-ReLU                                    fc4                           relu4                         (8192,1024)                   
-------------------------------------------------------------------------------------------------------------------
-InnerProduct                            relu4                         fc5                           (8192,1024)                   
-------------------------------------------------------------------------------------------------------------------
-ReLU                                    fc5                           relu5                         (8192,1024)                   
-------------------------------------------------------------------------------------------------------------------
-InnerProduct                            relu5                         fc6                           (8192,512)                    
-------------------------------------------------------------------------------------------------------------------
-ReLU                                    fc6                           relu6                         (8192,512)                    
-------------------------------------------------------------------------------------------------------------------
-InnerProduct                            relu6                         fc7                           (8192,256)                    
-------------------------------------------------------------------------------------------------------------------
-ReLU                                    fc7                           relu7                         (8192,256)                    
-------------------------------------------------------------------------------------------------------------------
-InnerProduct                            relu7                         fc8                           (8192,1)                      
-------------------------------------------------------------------------------------------------------------------
-BinaryCrossEntropyLoss                  fc8                           loss                                                        
-                                        label                                                                                     
-------------------------------------------------------------------------------------------------------------------
-=====================================================Model Fit=====================================================
-[HCTR][10:36:12.594][INFO][RK0][main]: Use non-epoch mode with number of iterations: 1000
-[HCTR][10:36:12.594][INFO][RK0][main]: Training batchsize: 65536, evaluation batchsize: 65536
-[HCTR][10:36:12.594][INFO][RK0][main]: Evaluation interval: 100, snapshot interval: 10000000
-[HCTR][10:36:12.594][INFO][RK0][main]: Dense network trainable: True
-[HCTR][10:36:12.594][INFO][RK0][main]: Use mixed precision: False, scaler: 1.000000, use cuda graph: True
-[HCTR][10:36:12.594][INFO][RK0][main]: lr: 0.500000, warmup_steps: 300, end_lr: 0.000000
-[HCTR][10:36:12.594][INFO][RK0][main]: decay_start: 0, decay_steps: 1, decay_power: 2.000000
-[HCTR][10:36:12.594][INFO][RK0][main]: Training source file: ./deepfm_data_nvt/train/_file_list.txt
-[HCTR][10:36:12.594][INFO][RK0][main]: Evaluation source file: ./deepfm_data_nvt/val/_file_list.txt
[HCTR][10:36:16.599][INFO][RK0][main]: Evaluation, AverageLoss: 0.144991
-[HCTR][10:36:16.599][INFO][RK0][main]: Eval Time for 70 iters: 1.22035s
-[HCTR][10:36:16.633][INFO][RK0][main]: Iter: 100 Time(100 iters): 4.03885s Loss: 0.144124 lr:0.168333
-[HCTR][10:36:20.570][INFO][RK0][main]: Evaluation, AverageLoss: 0.144851
-[HCTR][10:36:20.570][INFO][RK0][main]: Eval Time for 70 iters: 1.1863s
-[HCTR][10:36:20.615][INFO][RK0][main]: Iter: 200 Time(100 iters): 3.98102s Loss: 0.145444 lr:0.335
-[HCTR][10:36:24.540][INFO][RK0][main]: Evaluation, AverageLoss: 0.141821
-[HCTR][10:36:24.540][INFO][RK0][main]: Eval Time for 70 iters: 1.18638s
-[HCTR][10:36:24.580][INFO][RK0][main]: Iter: 300 Time(100 iters): 3.96441s Loss: 0.144249 lr:0.5
-[HCTR][10:36:28.514][INFO][RK0][main]: Evaluation, AverageLoss: 0.139519
-[HCTR][10:36:28.514][INFO][RK0][main]: Eval Time for 70 iters: 1.18203s
-[HCTR][10:36:28.556][INFO][RK0][main]: Iter: 400 Time(100 iters): 3.97548s Loss: 0.140895 lr:0.5
-[HCTR][10:36:32.490][INFO][RK0][main]: Evaluation, AverageLoss: 0.13942
-[HCTR][10:36:32.491][INFO][RK0][main]: Eval Time for 70 iters: 1.19363s
-[HCTR][10:36:32.533][INFO][RK0][main]: Iter: 500 Time(100 iters): 3.97628s Loss: 0.141202 lr:0.5
-[HCTR][10:36:36.465][INFO][RK0][main]: Evaluation, AverageLoss: 0.13947
-[HCTR][10:36:36.465][INFO][RK0][main]: Eval Time for 70 iters: 1.18342s
-[HCTR][10:36:36.512][INFO][RK0][main]: Iter: 600 Time(100 iters): 3.97817s Loss: 0.136504 lr:0.5
-[HCTR][10:36:40.440][INFO][RK0][main]: Evaluation, AverageLoss: 0.138534
-[HCTR][10:36:40.440][INFO][RK0][main]: Eval Time for 70 iters: 1.19586s
-[HCTR][10:36:40.476][INFO][RK0][main]: Iter: 700 Time(100 iters): 3.96355s Loss: 0.14067 lr:0.5
-[HCTR][10:36:44.421][INFO][RK0][main]: Evaluation, AverageLoss: 0.138213
-[HCTR][10:36:44.421][INFO][RK0][main]: Eval Time for 70 iters: 1.20188s
-[HCTR][10:36:44.465][INFO][RK0][main]: Iter: 800 Time(100 iters): 3.98811s Loss: 0.142139 lr:0.5
-[HCTR][10:36:48.390][INFO][RK0][main]: Evaluation, AverageLoss: 0.138044
-[HCTR][10:36:48.390][INFO][RK0][main]: Eval Time for 70 iters: 1.19324s
-[HCTR][10:36:48.427][INFO][RK0][main]: Iter: 900 Time(100 iters): 3.96149s Loss: 0.136835 lr:0.5
-[HCTR][10:36:52.363][INFO][RK0][main]: Evaluation, AverageLoss: 0.137419
-[HCTR][10:36:52.363][INFO][RK0][main]: Eval Time for 70 iters: 1.18732s
-[HCTR][10:36:52.363][INFO][RK0][main]: Finish 1000 iterations with batchsize: 65536 in 39.77s.

Use Dynamic Hash Table with Round Robin Table Placement Strategy

-

The embedding collection lets users configure a dynamic hash table. With it, the table accepts arbitrary hashed input keys and grows automatically when it becomes full.
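The following Python sketch loosely illustrates the behaviour described above: a table that accepts arbitrary (hashed) 64-bit keys and grows when it fills up. It is not HugeCTR's GPU hash table. The class name `DynamicEmbeddingTable`, linear probing, the 0.75 load-factor threshold, capacity doubling, and zero-initialized vectors are all assumptions made for illustration.

```python
from typing import List, Optional


class DynamicEmbeddingTable:
    """Hypothetical open-addressing table mapping int keys to embedding vectors.

    Illustrative only: models "insert unseen keys, grow when full",
    not HugeCTR's actual dynamic hash table.
    """

    def __init__(self, dim: int, capacity: int = 8, max_load: float = 0.75):
        self.dim = dim
        self.max_load = max_load  # assumed growth threshold
        self.keys: List[Optional[int]] = [None] * capacity
        self.values: List[Optional[List[float]]] = [None] * capacity
        self.size = 0

    def capacity(self) -> int:
        return len(self.keys)

    def _slot(self, key: int, keys: List[Optional[int]]) -> int:
        # Linear probing: start at hash(key), walk until we find the key or an empty slot.
        i = hash(key) % len(keys)
        while keys[i] is not None and keys[i] != key:
            i = (i + 1) % len(keys)
        return i

    def _grow(self) -> None:
        # Double the capacity and rehash every existing entry.
        old = [(k, v) for k, v in zip(self.keys, self.values) if k is not None]
        new_cap = 2 * len(self.keys)
        self.keys = [None] * new_cap
        self.values = [None] * new_cap
        for k, v in old:
            i = self._slot(k, self.keys)
            self.keys[i], self.values[i] = k, v

    def lookup(self, key: int) -> List[float]:
        """Return the vector for key, inserting a zero vector if the key is new."""
        i = self._slot(key, self.keys)
        if self.keys[i] is None:
            if self.size + 1 > self.max_load * len(self.keys):
                self._grow()
                i = self._slot(key, self.keys)
            self.keys[i] = key
            self.values[i] = [0.0] * self.dim
            self.size += 1
        return self.values[i]


if __name__ == "__main__":
    table = DynamicEmbeddingTable(dim=4, capacity=4)
    for k in [3, 10**12, 7, 42, 3]:  # arbitrary keys; 3 repeats
        table.lookup(k)
    print(table.size, table.capacity())  # prints: 4 8
```

Because the keys are hashed instead of used as dense indices, the vocabulary size does not need to be known in advance. The trade-off is the occasional rehash when the table grows.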

-
-
-
!python3 dlrm_train.py --shard_plan round_robin --use_dynamic_hash_table
HugeCTR Version: 23.2
-====================================================Model Init=====================================================
-[HCTR][10:29:29.407][WARNING][RK0][main]: The model name is not specified when creating the solver.
-[HCTR][10:29:29.407][INFO][RK0][main]: Global seed is 1217153067
-[HCTR][10:29:29.506][INFO][RK0][main]: Device to NUMA mapping:
-  GPU 0 ->  node 0
-  GPU 1 ->  node 0
-  GPU 2 ->  node 0
-  GPU 3 ->  node 0
-  GPU 4 ->  node 1
-  GPU 5 ->  node 1
-  GPU 6 ->  node 1
-  GPU 7 ->  node 1
-[HCTR][10:29:40.485][WARNING][RK0][main]: Peer-to-peer access cannot be fully enabled.
-[HCTR][10:29:40.485][DEBUG][RK0][main]: [device 0] allocating 0.0000 GB, available 30.4714 
-[HCTR][10:29:40.486][DEBUG][RK0][main]: [device 1] allocating 0.0000 GB, available 30.4441 
-[HCTR][10:29:40.486][DEBUG][RK0][main]: [device 2] allocating 0.0000 GB, available 30.5378 
-[HCTR][10:29:40.486][DEBUG][RK0][main]: [device 3] allocating 0.0000 GB, available 30.5339 
-[HCTR][10:29:40.486][DEBUG][RK0][main]: [device 4] allocating 0.0000 GB, available 30.4636 
-[HCTR][10:29:40.486][DEBUG][RK0][main]: [device 5] allocating 0.0000 GB, available 30.4480 
-[HCTR][10:29:40.486][DEBUG][RK0][main]: [device 6] allocating 0.0000 GB, available 30.4949 
-[HCTR][10:29:40.486][DEBUG][RK0][main]: [device 7] allocating 0.0000 GB, available 30.5183 
-[HCTR][10:29:40.486][INFO][RK0][main]: Start all2all warmup
-[HCTR][10:29:40.651][INFO][RK0][main]: End all2all warmup
-[HCTR][10:29:40.662][INFO][RK0][main]: Using All-reduce algorithm: NCCL
-[HCTR][10:29:40.668][INFO][RK0][main]: Device 0: Tesla V100-SXM2-32GB
-[HCTR][10:29:40.668][INFO][RK0][main]: Device 1: Tesla V100-SXM2-32GB
-[HCTR][10:29:40.669][INFO][RK0][main]: Device 2: Tesla V100-SXM2-32GB
-[HCTR][10:29:40.670][INFO][RK0][main]: Device 3: Tesla V100-SXM2-32GB
-[HCTR][10:29:40.670][INFO][RK0][main]: Device 4: Tesla V100-SXM2-32GB
-[HCTR][10:29:40.671][INFO][RK0][main]: Device 5: Tesla V100-SXM2-32GB
-[HCTR][10:29:40.671][INFO][RK0][main]: Device 6: Tesla V100-SXM2-32GB
-[HCTR][10:29:40.672][INFO][RK0][main]: Device 7: Tesla V100-SXM2-32GB
-[HCTR][10:29:40.773][INFO][RK0][main]: eval source ./deepfm_data_nvt/val/_file_list.txt max_row_group_size 133678
-[HCTR][10:29:40.862][INFO][RK0][main]: train source ./deepfm_data_nvt/train/_file_list.txt max_row_group_size 134102
-[HCTR][10:29:40.866][INFO][RK0][main]: num of DataReader workers for train: 8
-[HCTR][10:29:40.866][INFO][RK0][main]: num of DataReader workers for eval: 8
-[HCTR][10:29:40.868][DEBUG][RK0][main]: [device 0] allocating 0.0804 GB, available 30.0457 
-[HCTR][10:29:40.869][DEBUG][RK0][main]: [device 1] allocating 0.0804 GB, available 30.0183 
-[HCTR][10:29:40.871][DEBUG][RK0][main]: [device 2] allocating 0.0804 GB, available 30.1121 
-[HCTR][10:29:40.872][DEBUG][RK0][main]: [device 3] allocating 0.0804 GB, available 30.1082 
-[HCTR][10:29:40.873][DEBUG][RK0][main]: [device 4] allocating 0.0804 GB, available 30.0378 
-[HCTR][10:29:40.875][DEBUG][RK0][main]: [device 5] allocating 0.0804 GB, available 30.0222 
-[HCTR][10:29:40.876][DEBUG][RK0][main]: [device 6] allocating 0.0804 GB, available 30.0691 
-[HCTR][10:29:40.878][DEBUG][RK0][main]: [device 7] allocating 0.0804 GB, available 30.0925 
-[HCTR][10:29:40.879][DEBUG][RK0][main]: [device 0] allocating 0.0804 GB, available 29.9636 
-[HCTR][10:29:40.881][DEBUG][RK0][main]: [device 1] allocating 0.0804 GB, available 29.9363 
-[HCTR][10:29:40.882][DEBUG][RK0][main]: [device 2] allocating 0.0804 GB, available 30.0300 
-[HCTR][10:29:40.884][DEBUG][RK0][main]: [device 3] allocating 0.0804 GB, available 30.0261 
-[HCTR][10:29:40.885][DEBUG][RK0][main]: [device 4] allocating 0.0804 GB, available 29.9558 
-[HCTR][10:29:40.886][DEBUG][RK0][main]: [device 5] allocating 0.0804 GB, available 29.9402 
-[HCTR][10:29:40.888][DEBUG][RK0][main]: [device 6] allocating 0.0804 GB, available 29.9871 
-[HCTR][10:29:40.889][DEBUG][RK0][main]: [device 7] allocating 0.0804 GB, available 30.0105 
-[HCTR][10:29:40.949][DEBUG][RK0][main]: [device 0] allocating 0.0000 GB, available 29.6863 
-[HCTR][10:29:41.055][DEBUG][RK0][main]: [device 1] allocating 0.0000 GB, available 29.6589 
-[HCTR][10:29:41.157][DEBUG][RK0][main]: [device 2] allocating 0.0000 GB, available 29.7527 
-[HCTR][10:29:41.250][DEBUG][RK0][main]: [device 3] allocating 0.0000 GB, available 29.7488 
-[HCTR][10:29:41.333][DEBUG][RK0][main]: [device 4] allocating 0.0000 GB, available 29.6785 
-[HCTR][10:29:41.419][DEBUG][RK0][main]: [device 5] allocating 0.0000 GB, available 29.6628 
-[HCTR][10:29:41.525][DEBUG][RK0][main]: [device 6] allocating 0.0000 GB, available 29.7097 
-[HCTR][10:29:41.619][DEBUG][RK0][main]: [device 7] allocating 0.0000 GB, available 29.7332 
-[HCTR][10:29:41.780][DEBUG][RK0][main]: [device 0] allocating 0.0000 GB, available 29.4089 
-[HCTR][10:29:41.866][DEBUG][RK0][main]: [device 1] allocating 0.0000 GB, available 29.3816 
-[HCTR][10:29:41.953][DEBUG][RK0][main]: [device 2] allocating 0.0000 GB, available 29.4753 
-[HCTR][10:29:42.059][DEBUG][RK0][main]: [device 3] allocating 0.0000 GB, available 29.4714 
-[HCTR][10:29:42.150][DEBUG][RK0][main]: [device 4] allocating 0.0000 GB, available 29.4011 
-[HCTR][10:29:42.245][DEBUG][RK0][main]: [device 5] allocating 0.0000 GB, available 29.3855 
-[HCTR][10:29:42.332][DEBUG][RK0][main]: [device 6] allocating 0.0000 GB, available 29.4324 
-[HCTR][10:29:42.434][DEBUG][RK0][main]: [device 7] allocating 0.0000 GB, available 29.4558 
-[HCTR][10:29:42.537][INFO][RK0][main]: Vocabulary size: 0
-[HCTR][10:29:42.786][DEBUG][RK0][main]: [device 0] allocating 0.1016 GB, available 28.8152 
-[HCTR][10:29:42.787][DEBUG][RK0][main]: [device 1] allocating 0.1016 GB, available 28.7878 
-[HCTR][10:29:42.789][DEBUG][RK0][main]: [device 2] allocating 0.1016 GB, available 28.9441 
-[HCTR][10:29:42.790][DEBUG][RK0][main]: [device 3] allocating 0.1016 GB, available 28.9402 
-[HCTR][10:29:42.791][DEBUG][RK0][main]: [device 4] allocating 0.1016 GB, available 28.8699 
-[HCTR][10:29:42.793][DEBUG][RK0][main]: [device 5] allocating 0.1016 GB, available 28.8542 
-[HCTR][10:29:42.794][DEBUG][RK0][main]: [device 6] allocating 0.1016 GB, available 28.9011 
-[HCTR][10:29:42.795][DEBUG][RK0][main]: [device 7] allocating 0.1016 GB, available 28.9246 
-[HCTR][10:29:42.797][DEBUG][RK0][main]: [device 0] allocating 0.1016 GB, available 28.7136 
-[HCTR][10:29:42.798][DEBUG][RK0][main]: [device 1] allocating 0.1016 GB, available 28.6863 
-[HCTR][10:29:42.799][DEBUG][RK0][main]: [device 2] allocating 0.1016 GB, available 28.8425 
-[HCTR][10:29:42.801][DEBUG][RK0][main]: [device 3] allocating 0.1016 GB, available 28.8386 
-[HCTR][10:29:42.802][DEBUG][RK0][main]: [device 4] allocating 0.1016 GB, available 28.7683 
-[HCTR][10:29:42.803][DEBUG][RK0][main]: [device 5] allocating 0.1016 GB, available 28.7527 
-[HCTR][10:29:42.805][DEBUG][RK0][main]: [device 6] allocating 0.1016 GB, available 28.7996 
-[HCTR][10:29:42.806][DEBUG][RK0][main]: [device 7] allocating 0.1016 GB, available 28.8230 
-[HCTR][10:29:42.934][INFO][RK0][main]: Graph analysis to resolve tensor dependency
-===================================================Model Compile===================================================
-[HCTR][10:29:42.940][DEBUG][RK0][main]: [device 0] allocating 1.4051 GB, available 27.2527 
-[HCTR][10:29:42.943][DEBUG][RK0][main]: [device 1] allocating 1.4051 GB, available 27.2253 
-[HCTR][10:29:42.946][DEBUG][RK0][main]: [device 2] allocating 1.4051 GB, available 27.3816 
-[HCTR][10:29:42.949][DEBUG][RK0][main]: [device 3] allocating 1.4051 GB, available 27.3777 
-[HCTR][10:29:42.952][DEBUG][RK0][main]: [device 4] allocating 1.4051 GB, available 27.3074 
-[HCTR][10:29:42.955][DEBUG][RK0][main]: [device 5] allocating 1.4051 GB, available 27.2917 
-[HCTR][10:29:42.958][DEBUG][RK0][main]: [device 6] allocating 1.4051 GB, available 27.3386 
-[HCTR][10:29:42.961][DEBUG][RK0][main]: [device 7] allocating 1.4051 GB, available 27.3621 
-[HCTR][10:29:42.962][DEBUG][RK0][main]: [device 0] allocating 0.0088 GB, available 27.2429 
-[HCTR][10:29:42.964][DEBUG][RK0][main]: [device 1] allocating 0.0088 GB, available 27.2156 
-[HCTR][10:29:42.965][DEBUG][RK0][main]: [device 2] allocating 0.0088 GB, available 27.3718 
-[HCTR][10:29:42.966][DEBUG][RK0][main]: [device 3] allocating 0.0088 GB, available 27.3679 
-[HCTR][10:29:42.967][DEBUG][RK0][main]: [device 4] allocating 0.0088 GB, available 27.2976 
-[HCTR][10:29:42.968][DEBUG][RK0][main]: [device 5] allocating 0.0088 GB, available 27.2820 
-[HCTR][10:29:42.969][DEBUG][RK0][main]: [device 6] allocating 0.0088 GB, available 27.3289 
-[HCTR][10:29:42.970][DEBUG][RK0][main]: [device 7] allocating 0.0088 GB, available 27.3523 
===================================================Model Summary===================================================
-[HCTR][10:30:20.859][INFO][RK0][main]: Model structure on each GPU
-Label                                   Dense                         Sparse                        
-label                                   dense                          data0,data1,data2,data3,data4,data5,data6,data7,data8,data9,data10,data11,data12,data13,data14,data15,data16,data17,data18,data19,data20,data21,data22,data23,data24,data25
-(8192,1)                                (8192,13)                               
-——————————————————————————————————————————————————————————————————————————————————————————————————————————————————
-Layer Type                              Input Name                    Output Name                   Output Shape                  
-——————————————————————————————————————————————————————————————————————————————————————————————————————————————————
-EmbeddingCollection0                    data0                         emb_vec0                      (8192,1,128)                  
-                                        data1                         emb_vec1                      (8192,1,128)                  
-                                        data2                         emb_vec2                      (8192,1,128)                  
-                                        data3                         emb_vec3                      (8192,1,128)                  
-                                        data4                         emb_vec4                      (8192,1,128)                  
-                                        data5                         emb_vec5                      (8192,1,128)                  
-                                        data6                         emb_vec6                      (8192,1,128)                  
-                                        data7                         emb_vec7                      (8192,1,128)                  
-                                        data8                         emb_vec8                      (8192,1,128)                  
-                                        data9                         emb_vec9                      (8192,1,128)                  
-                                        data10                        emb_vec10                     (8192,1,128)                  
-                                        data11                        emb_vec11                     (8192,1,128)                  
-                                        data12                        emb_vec12                     (8192,1,128)                  
-                                        data13                        emb_vec13                     (8192,1,128)                  
-                                        data14                        emb_vec14                     (8192,1,128)                  
-                                        data15                        emb_vec15                     (8192,1,128)                  
-                                        data16                        emb_vec16                     (8192,1,128)                  
-                                        data17                        emb_vec17                     (8192,1,128)                  
-                                        data18                        emb_vec18                     (8192,1,128)                  
-                                        data19                        emb_vec19                     (8192,1,128)                  
-                                        data20                        emb_vec20                     (8192,1,128)                  
-                                        data21                        emb_vec21                     (8192,1,128)                  
-                                        data22                        emb_vec22                     (8192,1,128)                  
-                                        data23                        emb_vec23                     (8192,1,128)                  
-                                        data24                        emb_vec24                     (8192,1,128)                  
-                                        data25                        emb_vec25                     (8192,1,128)                  
-------------------------------------------------------------------------------------------------------------------
-Concat                                  emb_vec0                      sparse_embedding1             (8192,26,128)                 
-                                        emb_vec1                                                                                  
-                                        emb_vec2                                                                                  
-                                        emb_vec3                                                                                  
-                                        emb_vec4                                                                                  
-                                        emb_vec5                                                                                  
-                                        emb_vec6                                                                                  
-                                        emb_vec7                                                                                  
-                                        emb_vec8                                                                                  
-                                        emb_vec9                                                                                  
-                                        emb_vec10                                                                                 
-                                        emb_vec11                                                                                 
-                                        emb_vec12                                                                                 
-                                        emb_vec13                                                                                 
-                                        emb_vec14                                                                                 
-                                        emb_vec15                                                                                 
-                                        emb_vec16                                                                                 
-                                        emb_vec17                                                                                 
-                                        emb_vec18                                                                                 
-                                        emb_vec19                                                                                 
-                                        emb_vec20                                                                                 
-                                        emb_vec21                                                                                 
-                                        emb_vec22                                                                                 
-                                        emb_vec23                                                                                 
-                                        emb_vec24                                                                                 
-                                        emb_vec25                                                                                 
-------------------------------------------------------------------------------------------------------------------
-InnerProduct                            dense                         fc1                           (8192,512)                    
-------------------------------------------------------------------------------------------------------------------
-ReLU                                    fc1                           relu1                         (8192,512)                    
-------------------------------------------------------------------------------------------------------------------
-InnerProduct                            relu1                         fc2                           (8192,256)                    
-------------------------------------------------------------------------------------------------------------------
-ReLU                                    fc2                           relu2                         (8192,256)                    
-------------------------------------------------------------------------------------------------------------------
-InnerProduct                            relu2                         fc3                           (8192,128)                    
-------------------------------------------------------------------------------------------------------------------
-ReLU                                    fc3                           relu3                         (8192,128)                    
-------------------------------------------------------------------------------------------------------------------
-Interaction                             relu3                         interaction1                  (8192,480)                    
-                                        sparse_embedding1                                                                         
-------------------------------------------------------------------------------------------------------------------
-InnerProduct                            interaction1                  fc4                           (8192,1024)                   
-------------------------------------------------------------------------------------------------------------------
-ReLU                                    fc4                           relu4                         (8192,1024)                   
-------------------------------------------------------------------------------------------------------------------
-InnerProduct                            relu4                         fc5                           (8192,1024)                   
-------------------------------------------------------------------------------------------------------------------
-ReLU                                    fc5                           relu5                         (8192,1024)                   
-------------------------------------------------------------------------------------------------------------------
-InnerProduct                            relu5                         fc6                           (8192,512)                    
-------------------------------------------------------------------------------------------------------------------
-ReLU                                    fc6                           relu6                         (8192,512)                    
-------------------------------------------------------------------------------------------------------------------
-InnerProduct                            relu6                         fc7                           (8192,256)                    
-------------------------------------------------------------------------------------------------------------------
-ReLU                                    fc7                           relu7                         (8192,256)                    
-------------------------------------------------------------------------------------------------------------------
-InnerProduct                            relu7                         fc8                           (8192,1)                      
-------------------------------------------------------------------------------------------------------------------
-BinaryCrossEntropyLoss                  fc8                           loss                                                        
-                                        label                                                                                     
-------------------------------------------------------------------------------------------------------------------
-=====================================================Model Fit=====================================================
-[HCTR][10:30:20.860][INFO][RK0][main]: Use non-epoch mode with number of iterations: 1000
-[HCTR][10:30:20.860][INFO][RK0][main]: Training batchsize: 65536, evaluation batchsize: 65536
-[HCTR][10:30:20.860][INFO][RK0][main]: Evaluation interval: 100, snapshot interval: 10000000
-[HCTR][10:30:20.860][INFO][RK0][main]: Dense network trainable: True
-[HCTR][10:30:20.860][INFO][RK0][main]: Use mixed precision: False, scaler: 1.000000, use cuda graph: True
-[HCTR][10:30:20.860][INFO][RK0][main]: lr: 0.500000, warmup_steps: 300, end_lr: 0.000000
-[HCTR][10:30:20.860][INFO][RK0][main]: decay_start: 0, decay_steps: 1, decay_power: 2.000000
-[HCTR][10:30:20.860][INFO][RK0][main]: Training source file: ./deepfm_data_nvt/train/_file_list.txt
-[HCTR][10:30:20.860][INFO][RK0][main]: Evaluation source file: ./deepfm_data_nvt/val/_file_list.txt
-static_map allocated, size=553648128
-static_map allocated, size=553648128
-static_map allocated, size=553648128
static_map allocated, size=553648128
-static_map allocated, size=553648128
-static_map allocated, size=553648128
-static_map allocated, size=553648128
-static_map allocated, size=553648128
-static_map allocated, size=553648128
-static_map allocated, size=553648128
-static_map allocated, size=553648128
-static_map allocated, size=553648128
-static_map allocated, size=553648128
-static_map allocated, size=553648128
-static_map allocated, size=553648128
-static_map allocated, size=553648128
-static_map allocated, size=553648128
-static_map allocated, size=553648128
-static_map allocated, size=553648128
-static_map allocated, size=553648128
-static_map allocated, size=553648128
-static_map allocated, size=553648128
-static_map allocated, size=553648128
-static_map allocated, size=553648128
-static_map allocated, size=553648128
-static_map allocated, size=553648128
-[HCTR][10:30:26.070][INFO][RK0][main]: Evaluation, AverageLoss: 0.142151
-[HCTR][10:30:26.070][INFO][RK0][main]: Eval Time for 70 iters: 1.53912s
-[HCTR][10:30:26.123][INFO][RK0][main]: Iter: 100 Time(100 iters): 5.26107s Loss: 0.141023 lr:0.168333
-[HCTR][10:30:31.183][INFO][RK0][main]: Evaluation, AverageLoss: 0.141078
-[HCTR][10:30:31.183][INFO][RK0][main]: Eval Time for 70 iters: 1.57008s
-[HCTR][10:30:31.225][INFO][RK0][main]: Iter: 200 Time(100 iters): 5.10267s Loss: 0.141925 lr:0.335
-[HCTR][10:30:36.309][INFO][RK0][main]: Evaluation, AverageLoss: 0.140561
-[HCTR][10:30:36.309][INFO][RK0][main]: Eval Time for 70 iters: 1.55499s
-[HCTR][10:30:36.362][INFO][RK0][main]: Iter: 300 Time(100 iters): 5.13614s Loss: 0.14338 lr:0.5
-[HCTR][10:30:41.415][INFO][RK0][main]: Evaluation, AverageLoss: 0.139972
-[HCTR][10:30:41.415][INFO][RK0][main]: Eval Time for 70 iters: 1.54929s
-[HCTR][10:30:41.464][INFO][RK0][main]: Iter: 400 Time(100 iters): 5.10246s Loss: 0.141379 lr:0.5
-[HCTR][10:30:46.534][INFO][RK0][main]: Evaluation, AverageLoss: 0.139553
-[HCTR][10:30:46.534][INFO][RK0][main]: Eval Time for 70 iters: 1.56729s
-[HCTR][10:30:46.582][INFO][RK0][main]: Iter: 500 Time(100 iters): 5.11698s Loss: 0.141421 lr:0.5
-[HCTR][10:30:51.642][INFO][RK0][main]: Evaluation, AverageLoss: 0.139362
-[HCTR][10:30:51.642][INFO][RK0][main]: Eval Time for 70 iters: 1.56153s
-[HCTR][10:30:51.696][INFO][RK0][main]: Iter: 600 Time(100 iters): 5.11376s Loss: 0.136499 lr:0.5
-[HCTR][10:30:56.755][INFO][RK0][main]: Evaluation, AverageLoss: 0.138972
-[HCTR][10:30:56.755][INFO][RK0][main]: Eval Time for 70 iters: 1.60721s
-[HCTR][10:30:56.811][INFO][RK0][main]: Iter: 700 Time(100 iters): 5.11548s Loss: 0.141355 lr:0.5
-[HCTR][10:31:01.873][INFO][RK0][main]: Evaluation, AverageLoss: 0.138726
-[HCTR][10:31:01.873][INFO][RK0][main]: Eval Time for 70 iters: 1.56329s
-[HCTR][10:31:01.913][INFO][RK0][main]: Iter: 800 Time(100 iters): 5.10124s Loss: 0.142614 lr:0.5
-[HCTR][10:31:07.016][INFO][RK0][main]: Evaluation, AverageLoss: 0.139617
-[HCTR][10:31:07.016][INFO][RK0][main]: Eval Time for 70 iters: 1.5483s
-[HCTR][10:31:07.063][INFO][RK0][main]: Iter: 900 Time(100 iters): 5.14957s Loss: 0.138442 lr:0.5
-[HCTR][10:31:12.147][INFO][RK0][main]: Evaluation, AverageLoss: 0.138159
-[HCTR][10:31:12.147][INFO][RK0][main]: Eval Time for 70 iters: 1.57499s
-[HCTR][10:31:12.147][INFO][RK0][main]: Finish 1000 iterations with batchsize: 65536 in 51.29s.
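The `--shard_plan round_robin` option passed to `dlrm_train.py` selects round-robin table placement. If this means that whole embedding tables are dealt out to GPUs in turn (table `i` goes to GPU `i % num_gpus`), as the name suggests, the assignment for the 26 tables and 8 GPUs in this run can be sketched as follows. This interpretation is an assumption, and `round_robin_placement` is a hypothetical helper, not the code in `dlrm_train.py`.

```python
from collections import defaultdict
from typing import Dict, List


def round_robin_placement(num_tables: int, num_gpus: int) -> Dict[int, List[int]]:
    """Hypothetical helper: place whole table i on GPU i % num_gpus."""
    placement: Dict[int, List[int]] = defaultdict(list)
    for table_id in range(num_tables):
        placement[table_id % num_gpus].append(table_id)
    return dict(placement)


if __name__ == "__main__":
    plan = round_robin_placement(num_tables=26, num_gpus=8)
    for gpu, tables in sorted(plan.items()):
        print(f"GPU {gpu}: tables {tables}")  # first line: GPU 0: tables [0, 8, 16, 24]
```

With 26 tables on 8 GPUs, this scheme gives GPUs 0 and 1 four tables each and the other six GPUs three each. Round-robin placement is simple, but it balances only the table count, not table sizes or lookup traffic.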
-
\ No newline at end of file
diff --git a/review/pr-458/notebooks/hps_demo.html b/review/pr-458/notebooks/hps_demo.html
deleted file mode 100644
index 2bd627d77f..0000000000
--- a/review/pr-458/notebooks/hps_demo.html
+++ /dev/null
@@ -1,2958 +0,0 @@
-Hierarchical Parameter Server Demo — Merlin HugeCTR documentation

Hierarchical Parameter Server Demo

-
-

Overview

-

In HugeCTR version 3.5, we provide Python APIs for embedding table lookup with the HugeCTR Hierarchical Parameter Server (HPS). HPS supports different database backends and GPU embedding caches.
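The sketch below is a minimal Python illustration of the "hierarchical" part of HPS. It assumes a two-level design: a small LRU cache stands in for the GPU embedding cache, and a larger key-value store (a plain dict here) stands in for a database backend. The class name, the LRU policy, and the zero-vector default for unknown keys are illustrative assumptions. Real HPS lookups are batched, run on the GPU, and are configured through HugeCTR rather than written by hand.

```python
from collections import OrderedDict
from typing import Dict, List, Sequence


class TwoLevelLookup:
    """Hypothetical sketch: an LRU 'device cache' in front of a 'database' dict."""

    def __init__(self, backend: Dict[int, List[float]], cache_capacity: int, dim: int):
        self.backend = backend  # stand-in for a database backend
        self.cache: "OrderedDict[int, List[float]]" = OrderedDict()  # stand-in for the GPU cache
        self.cache_capacity = cache_capacity
        self.dim = dim
        self.hits = 0
        self.misses = 0

    def lookup(self, keys: Sequence[int]) -> List[List[float]]:
        out: List[List[float]] = []
        for k in keys:
            if k in self.cache:
                self.cache.move_to_end(k)  # mark as most recently used
                self.hits += 1
                vec = self.cache[k]
            else:
                self.misses += 1
                # Fall back to the backend; unknown keys get an assumed zero vector.
                vec = self.backend.get(k, [0.0] * self.dim)
                self.cache[k] = vec
                if len(self.cache) > self.cache_capacity:
                    self.cache.popitem(last=False)  # evict least recently used
            out.append(vec)
        return out


if __name__ == "__main__":
    backend = {k: [float(k), float(k)] for k in range(10)}
    hps_like = TwoLevelLookup(backend, cache_capacity=2, dim=2)
    hps_like.lookup([1, 2, 1, 3, 2])
    print(hps_like.hits, hps_like.misses)  # prints: 1 4
```

The upper level only has to hold the frequently accessed keys. Misses fall through to the slower but larger level below, which is why skewed key distributions make such a cache effective.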

-

This notebook demonstrates how to use HPS through the HugeCTR Python APIs. Without loss of generality, the HPS APIs are used together with the ONNX Runtime APIs to build an ensemble inference model: HPS handles the embedding table lookup, while the ONNX model runs the feed-forward pass of the dense neural network.
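The division of labour in such an ensemble can be sketched in plain Python. The first stage turns categorical keys into concatenated embedding vectors, which is the role HPS plays. The second stage runs a dense model on the dense features plus those embeddings, which is the role of the ONNX model. Both stages are hypothetical stand-ins: a dict replaces HPS and a single logistic unit replaces the ONNX graph. The sketch does not call the HugeCTR or ONNX Runtime APIs.

```python
import math
from typing import Dict, List, Sequence


def embedding_stage(table: Dict[int, List[float]], keys: Sequence[int], dim: int) -> List[float]:
    """Stand-in for HPS: look up one vector per key (zeros if missing) and concatenate."""
    out: List[float] = []
    for k in keys:
        out.extend(table.get(k, [0.0] * dim))
    return out


def dense_stage(features: Sequence[float], weights: Sequence[float], bias: float) -> float:
    """Stand-in for the ONNX model: one logistic unit producing a CTR score."""
    z = sum(f * w for f, w in zip(features, weights)) + bias
    return 1.0 / (1.0 + math.exp(-z))


def ensemble_predict(
    table: Dict[int, List[float]],
    keys: Sequence[int],
    dense: Sequence[float],
    weights: Sequence[float],
    bias: float,
    dim: int,
) -> float:
    # Dense features first, then the looked-up embeddings, fed to the dense stage.
    emb = embedding_stage(table, keys, dim)
    return dense_stage(list(dense) + emb, weights, bias)


if __name__ == "__main__":
    table = {0: [0.0, 0.0], 1: [1.0, -1.0]}
    p = ensemble_predict(table, keys=[0, 1], dense=[0.5],
                         weights=[0.0, 0.0, 0.0, 1.0, 0.0], bias=0.0, dim=2)
    print(f"CTR score: {p:.4f}")  # prints: CTR score: 0.7311
```

Keeping the two stages separate means the embedding tables, which are usually far larger than the dense network, can be served and cached independently of the dense model.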

-
  1. Inference with HPS & ONNX
  2. Look Up the Embedding Vector from DLPack
  3. Multi-process inference
  4. Redis Cluster deployment (without TLS/SSL)
  5. Redis Cluster deployment (with TLS/SSL)

Setup

-

To set up the environment, refer to the HugeCTR Example Notebooks and follow the instructions there before running the cells below.

-
-
-

Data Generation

-

HugeCTR provides a tool for generating synthetic datasets. The Data Generator can produce datasets in different file formats and with different key distributions. For this notebook, we generate one-hot Parquet datasets with a power-law distribution:

-
-
-
import hugectr
-from hugectr.tools import DataGeneratorParams, DataGenerator
-
-data_generator_params = DataGeneratorParams(
-  format = hugectr.DataReaderType_t.Parquet,
-  label_dim = 1,
-  dense_dim = 10,
-  num_slot = 4,
-  i64_input_key = True,
-  nnz_array = [1, 1, 1, 1],
-  source = "./data_parquet/file_list.txt",
-  eval_source = "./data_parquet/file_list_test.txt",
-  slot_size_array = [10000, 10000, 10000, 10000],
-  check_type = hugectr.Check_t.Non,
-  dist_type = hugectr.Distribution_t.PowerLaw,
-  power_law_type = hugectr.PowerLaw_t.Short,
-  num_files = 16,
-  eval_num_files = 4,
-  num_samples_per_file = 40960)
-data_generator = DataGenerator(data_generator_params)
-data_generator.generate()
-
-
-
-
-
[HCTR][06:31:47.413][INFO][RK0][main]: Generate Parquet dataset
-[HCTR][06:31:47.413][INFO][RK0][main]: train data folder: ./data_parquet, eval data folder: ./data_parquet, slot_size_array: 10000, 10000, 10000, 10000, nnz array: 1, 1, 1, 1, #files for train: 16, #files for eval: 4, #samples per file: 40960, Use power law distribution: 1, alpha of power law: 1.3
-[HCTR][06:31:47.416][INFO][RK0][main]: ./data_parquet exist
-[HCTR][06:31:47.423][INFO][RK0][main]: ./data_parquet/train/gen_0.parquet
-[HCTR][06:31:50.739][INFO][RK0][main]: ./data_parquet/train/gen_1.parquet
-[HCTR][06:31:50.846][INFO][RK0][main]: ./data_parquet/train/gen_2.parquet
-[HCTR][06:31:50.929][INFO][RK0][main]: ./data_parquet/train/gen_3.parquet
-[HCTR][06:31:51.011][INFO][RK0][main]: ./data_parquet/train/gen_4.parquet
-[HCTR][06:31:51.092][INFO][RK0][main]: ./data_parquet/train/gen_5.parquet
-[HCTR][06:31:51.171][INFO][RK0][main]: ./data_parquet/train/gen_6.parquet
-[HCTR][06:31:51.250][INFO][RK0][main]: ./data_parquet/train/gen_7.parquet
-[HCTR][06:31:51.329][INFO][RK0][main]: ./data_parquet/train/gen_8.parquet
-[HCTR][06:31:51.407][INFO][RK0][main]: ./data_parquet/train/gen_9.parquet
-[HCTR][06:31:51.485][INFO][RK0][main]: ./data_parquet/train/gen_10.parquet
-[HCTR][06:31:51.562][INFO][RK0][main]: ./data_parquet/train/gen_11.parquet
-[HCTR][06:31:51.638][INFO][RK0][main]: ./data_parquet/train/gen_12.parquet
-[HCTR][06:31:51.715][INFO][RK0][main]: ./data_parquet/train/gen_13.parquet
-[HCTR][06:31:51.792][INFO][RK0][main]: ./data_parquet/train/gen_14.parquet
-[HCTR][06:31:51.868][INFO][RK0][main]: ./data_parquet/train/gen_15.parquet
-[HCTR][06:31:51.962][INFO][RK0][main]: ./data_parquet/file_list.txt done!
-[HCTR][06:31:51.986][INFO][RK0][main]: ./data_parquet/val/gen_0.parquet
-[HCTR][06:31:52.064][INFO][RK0][main]: ./data_parquet/val/gen_1.parquet
-[HCTR][06:31:52.142][INFO][RK0][main]: ./data_parquet/val/gen_2.parquet
-[HCTR][06:31:52.218][INFO][RK0][main]: ./data_parquet/val/gen_3.parquet
-[HCTR][06:31:52.296][INFO][RK0][main]: ./data_parquet/file_list_test.txt done!
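The log above reports a power-law distribution with alpha = 1.3. The following minimal NumPy sketch shows what such a distribution looks like for one slot of 10000 keys. It is only an illustration. It assumes P(k) is proportional to (k + 1)^(-alpha), which is not necessarily the exact formula the HugeCTR Data Generator uses. The function name sample_power_law_keys is hypothetical.

```python
import numpy as np

def sample_power_law_keys(slot_size, num_samples, alpha=1.3, seed=0):
    """Hypothetical sketch: draw keys in [0, slot_size) with P(k) proportional to (k + 1) ** -alpha."""
    ranks = np.arange(1, slot_size + 1, dtype=np.float64)
    probs = ranks ** -alpha
    probs /= probs.sum()  # normalize into a probability distribution
    rng = np.random.default_rng(seed)
    return rng.choice(slot_size, size=num_samples, p=probs)

keys = sample_power_law_keys(slot_size=10000, num_samples=40960)
counts = np.bincount(keys, minlength=10000)
# A handful of low-ranked "hot" keys are drawn far more often than the tail.
print("share of samples hitting the 100 hottest keys:", counts[:100].sum() / len(keys))
print("distinct keys seen:", np.count_nonzero(counts))
```

In skewed data like this, a small set of hot keys accounts for most lookups. This access pattern tends to favor a partial GPU embedding cache, such as the cache_size_percentage = 0.5 setting used later in this notebook.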

Train from Scratch

We can train from scratch by performing the following steps with the Python APIs:

1. Create the solver, reader, and optimizer, then initialize the model.
2. Construct the model graph by adding the input, sparse embedding, and dense layers in order.
3. Compile the model and print an overview of the model graph.
4. Dump the model graph to a JSON file.
5. Fit the model. The model weights and optimizer states are saved implicitly at each snapshot.
6. Dump one batch of evaluation results to a file.

%%writefile train.py
-import os
-import hugectr
-from mpi4py import MPI
-import numpy as np
-solver = hugectr.CreateSolver(model_name = "hps_demo",
-                              max_eval_batches = 1,
-                              batchsize_eval = 1024,
-                              batchsize = 1024,
-                              lr = 0.001,
-                              vvgpu = [[0]],
-                              i64_input_key = True,
-                              repeat_dataset = True,
-                              use_cuda_graph = True)
-reader = hugectr.DataReaderParams(data_reader_type = hugectr.DataReaderType_t.Parquet,
-                                  source = ["./data_parquet/file_list.txt"],
-                                  eval_source = "./data_parquet/file_list_test.txt",
-                                  check_type = hugectr.Check_t.Non,
-                                  slot_size_array = [10000, 10000, 10000, 10000])
-optimizer = hugectr.CreateOptimizer(optimizer_type = hugectr.Optimizer_t.Adam)
-model = hugectr.Model(solver, reader, optimizer)
-model.add(hugectr.Input(label_dim = 1, label_name = "label",
-                        dense_dim = 10, dense_name = "dense",
-                        data_reader_sparse_param_array = 
-                        [hugectr.DataReaderSparseParam("data1", [1, 1], True, 2),
-                        hugectr.DataReaderSparseParam("data2", [1, 1], True, 2)]))
-model.add(hugectr.SparseEmbedding(embedding_type = hugectr.Embedding_t.DistributedSlotSparseEmbeddingHash, 
-                            workspace_size_per_gpu_in_mb = 4,
-                            embedding_vec_size = 16,
-                            combiner = "sum",
-                            sparse_embedding_name = "sparse_embedding1",
-                            bottom_name = "data1",
-                            optimizer = optimizer))
-model.add(hugectr.SparseEmbedding(embedding_type = hugectr.Embedding_t.DistributedSlotSparseEmbeddingHash, 
-                            workspace_size_per_gpu_in_mb = 8,
-                            embedding_vec_size = 32,
-                            combiner = "sum",
-                            sparse_embedding_name = "sparse_embedding2",
-                            bottom_name = "data2",
-                            optimizer = optimizer))
-model.add(hugectr.DenseLayer(layer_type = hugectr.Layer_t.Reshape,
-                            bottom_names = ["sparse_embedding1"],
-                            top_names = ["reshape1"],
-                            leading_dim=32))                            
-model.add(hugectr.DenseLayer(layer_type = hugectr.Layer_t.Reshape,
-                            bottom_names = ["sparse_embedding2"],
-                            top_names = ["reshape2"],
-                            leading_dim=64))                            
-model.add(hugectr.DenseLayer(layer_type = hugectr.Layer_t.Concat,
-                            bottom_names = ["reshape1", "reshape2", "dense"], top_names = ["concat1"]))
-model.add(hugectr.DenseLayer(layer_type = hugectr.Layer_t.InnerProduct,
-                            bottom_names = ["concat1"],
-                            top_names = ["fc1"],
-                            num_output=1024))
-model.add(hugectr.DenseLayer(layer_type = hugectr.Layer_t.ReLU,
-                            bottom_names = ["fc1"],
-                            top_names = ["relu1"]))
-model.add(hugectr.DenseLayer(layer_type = hugectr.Layer_t.InnerProduct,
-                            bottom_names = ["relu1"],
-                            top_names = ["fc2"],
-                            num_output=1))
-model.add(hugectr.DenseLayer(layer_type = hugectr.Layer_t.BinaryCrossEntropyLoss,
-                            bottom_names = ["fc2", "label"],
-                            top_names = ["loss"]))
-model.compile()
-model.summary()
-model.graph_to_json("hps_demo.json")
-model.fit(max_iter = 1100, display = 200, eval_interval = 1000, snapshot = 1000, snapshot_prefix = "hps_demo")
-
-ground_truth = model.check_out_tensor("fc2", hugectr.Tensor_t.Evaluate)
-np.save("ground_truth.npy", ground_truth)
-
-
-
-
-
Writing train.py
-
-
-
-
-
-
-
!python3 train.py
-
-
-
-
-
HugeCTR Version: 23.8
-====================================================Model Init=====================================================
-[HCTR][06:32:11.556][INFO][RK0][main]: Initialize model: hps_demo
-[HCTR][06:32:11.556][INFO][RK0][main]: Global seed is 2598678435
-[HCTR][06:32:11.561][INFO][RK0][main]: Device to NUMA mapping:
-[HCTR][06:32:11.642][INFO][RK0][main]:   GPU 0 ->  node 0
-[HCTR][06:32:15.564][WARNING][RK0][main]: Peer-to-peer access cannot be fully enabled.
-[HCTR][06:32:15.564][DEBUG][RK0][main]: [device 0] allocating 0.0000 GB, available 30.0886 
-[HCTR][06:32:15.564][INFO][RK0][main]: Start all2all warmup
-[HCTR][06:32:15.565][INFO][RK0][main]: End all2all warmup
-[HCTR][06:32:15.566][INFO][RK0][main]: Using All-reduce algorithm: NCCL
-[HCTR][06:32:15.567][INFO][RK0][main]: Device 0: Tesla V100-SXM2-32GB
-[HCTR][06:32:15.636][INFO][RK0][main]: eval source ./data_parquet/file_list_test.txt max_row_group_size 40960
-[HCTR][06:32:15.808][INFO][RK0][main]: train source ./data_parquet/file_list.txt max_row_group_size 40960
-[HCTR][06:32:15.810][INFO][RK0][main]: num of DataReader workers for train: 1
-[HCTR][06:32:15.810][INFO][RK0][main]: num of DataReader workers for eval: 1
-[HCTR][06:32:15.937][INFO][RK0][main]: max_vocabulary_size_per_gpu_=21845
-[HCTR][06:32:15.938][DEBUG][RK0][main]: [device 0] allocating 0.0047 GB, available 29.6921 
-[HCTR][06:32:15.939][INFO][RK0][main]: max_vocabulary_size_per_gpu_=21845
-[HCTR][06:32:15.940][DEBUG][RK0][main]: [device 0] allocating 0.0092 GB, available 29.6824 
-[HCTR][06:32:15.940][INFO][RK0][main]: Graph analysis to resolve tensor dependency
-[HCTR][06:32:15.940][WARNING][RK0][main]: You are using reshape layer with parameter leading_dim. This will be deprecated in the future. Please switch to parameter shape.[HCTR][06:32:15.940][WARNING][RK0][main]: You are using reshape layer with parameter leading_dim. This will be deprecated in the future. Please switch to parameter shape.[HCTR][06:32:15.946][WARNING][RK0][main]: You are using reshape layer with parameter leading_dim. This will be deprecated in the future. Please switch to parameter shape.[HCTR][06:32:15.946][WARNING][RK0][main]: You are using reshape layer with parameter leading_dim. This will be deprecated in the future. Please switch to parameter shape.===================================================Model Compile===================================================
-[HCTR][06:32:17.205][INFO][RK0][main]: gpu0 start to init embedding
-[HCTR][06:32:17.205][INFO][RK0][main]: gpu0 init embedding done
-[HCTR][06:32:17.205][INFO][RK0][main]: gpu0 start to init embedding
-[HCTR][06:32:17.206][INFO][RK0][main]: gpu0 init embedding done
-[HCTR][06:32:17.207][INFO][RK0][main]: Starting AUC NCCL warm-up
-[HCTR][06:32:17.208][INFO][RK0][main]: Warm-up done
-===================================================Model Summary===================================================
-[HCTR][06:32:17.208][INFO][RK0][main]: Model structure on each GPU
-Label                                   Dense                         Sparse                        
-label                                   dense                          data1,data2                   
-(1024, 1)                               (1024, 10)                              
-——————————————————————————————————————————————————————————————————————————————————————————————————————————————————
-Layer Type                              Input Name                    Output Name                   Output Shape                  
-——————————————————————————————————————————————————————————————————————————————————————————————————————————————————
-DistributedSlotSparseEmbeddingHash      data1                         sparse_embedding1             (1024, 2, 16)                 
-------------------------------------------------------------------------------------------------------------------
-DistributedSlotSparseEmbeddingHash      data2                         sparse_embedding2             (1024, 2, 32)                 
-------------------------------------------------------------------------------------------------------------------
-Reshape                                 sparse_embedding1             reshape1                      (1024, 32)                    
-------------------------------------------------------------------------------------------------------------------
-Reshape                                 sparse_embedding2             reshape2                      (1024, 64)                    
-------------------------------------------------------------------------------------------------------------------
-Concat                                  reshape1                      concat1                       (1024, 106)                   
-                                        reshape2                                                                                  
-                                        dense                                                                                     
-------------------------------------------------------------------------------------------------------------------
-InnerProduct                            concat1                       fc1                           (1024, 1024)                  
-------------------------------------------------------------------------------------------------------------------
-ReLU                                    fc1                           relu1                         (1024, 1024)                  
-------------------------------------------------------------------------------------------------------------------
-InnerProduct                            relu1                         fc2                           (1024, 1)                     
-------------------------------------------------------------------------------------------------------------------
-BinaryCrossEntropyLoss                  fc2                           loss                                                        
-                                        label                                                                                     
-------------------------------------------------------------------------------------------------------------------
-[HCTR][06:32:17.212][INFO][RK0][main]: Save the model graph to hps_demo.json successfully
-=====================================================Model Fit=====================================================
-[HCTR][06:32:17.213][INFO][RK0][main]: Use non-epoch mode with number of iterations: 1100
-[HCTR][06:32:17.213][INFO][RK0][main]: Training batchsize: 1024, evaluation batchsize: 1024
-[HCTR][06:32:17.213][INFO][RK0][main]: Evaluation interval: 1000, snapshot interval: 1000
-[HCTR][06:32:17.213][INFO][RK0][main]: Dense network trainable: True
-[HCTR][06:32:17.213][INFO][RK0][main]: Sparse embedding sparse_embedding1 trainable: True
-[HCTR][06:32:17.213][INFO][RK0][main]: Sparse embedding sparse_embedding2 trainable: True
-[HCTR][06:32:17.213][INFO][RK0][main]: Use mixed precision: False, scaler: 1.000000, use cuda graph: True
-[HCTR][06:32:17.213][INFO][RK0][main]: lr: 0.001000, warmup_steps: 1, end_lr: 0.000000
-[HCTR][06:32:17.213][INFO][RK0][main]: decay_start: 0, decay_steps: 1, decay_power: 2.000000
-[HCTR][06:32:17.213][INFO][RK0][main]: Training source file: ./data_parquet/file_list.txt
-[HCTR][06:32:17.213][INFO][RK0][main]: Evaluation source file: ./data_parquet/file_list_test.txt
-[HCTR][06:32:17.658][INFO][RK0][main]: Iter: 200 Time(200 iters): 0.444961s Loss: 0.693355 lr:0.001
-[HCTR][06:32:18.167][INFO][RK0][main]: Iter: 400 Time(200 iters): 0.508793s Loss: 0.694358 lr:0.001
-[HCTR][06:32:18.589][INFO][RK0][main]: Iter: 600 Time(200 iters): 0.422282s Loss: 0.695494 lr:0.001
-[HCTR][06:32:18.764][INFO][RK0][main]: Iter: 800 Time(200 iters): 0.175263s Loss: 0.691037 lr:0.001
-[HCTR][06:32:18.939][INFO][RK0][main]: Iter: 1000 Time(200 iters): 0.174492s Loss: 0.688767 lr:0.001
-[HCTR][06:32:18.940][INFO][RK0][main]: Evaluation, AUC: 0.503806
-[HCTR][06:32:18.940][INFO][RK0][main]: Eval Time for 1 iters: 0.000913s
-[HCTR][06:32:18.941][INFO][RK0][main]: Rank0: Write hash table to file
-[HCTR][06:32:19.024][INFO][RK0][main]: Rank0: Write hash table to file
-[HCTR][06:32:19.092][INFO][RK0][main]: Dumping sparse weights to files, successful
-[HCTR][06:32:19.093][INFO][RK0][main]: Rank0: Write optimzer state to file
-[HCTR][06:32:19.123][INFO][RK0][main]: Done
-[HCTR][06:32:19.123][INFO][RK0][main]: Rank0: Write optimzer state to file
-[HCTR][06:32:19.148][INFO][RK0][main]: Done
-[HCTR][06:32:19.150][INFO][RK0][main]: Rank0: Write optimzer state to file
-[HCTR][06:32:19.203][INFO][RK0][main]: Done
-[HCTR][06:32:19.203][INFO][RK0][main]: Rank0: Write optimzer state to file
-[HCTR][06:32:19.252][INFO][RK0][main]: Done
-[HCTR][06:32:19.252][INFO][RK0][main]: Dumping sparse optimzer states to files, successful
-[HCTR][06:32:19.262][INFO][RK0][main]: Dumping dense weights to file, successful
-[HCTR][06:32:19.279][INFO][RK0][main]: Dumping dense optimizer states to file, successful
-[HCTR][06:32:19.368][INFO][RK0][main]: Finish 1100 iterations with batchsize: 1024 in 2.16s.
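The shapes in the model summary follow directly from the configuration. Each DataReaderSparseParam feeds 2 slots with one key per slot, so the two embeddings output (batch, 2, 16) and (batch, 2, 32). The Reshape layers flatten these to 32 and 64 columns, and Concat appends the 10 dense features, which gives 106. This small sketch recomputes those widths. The helper name is hypothetical.

```python
def dense_input_widths(slots_per_input, embedding_vec_sizes, dense_dim):
    """Hypothetical helper: widths after Reshape and after Concat for this model."""
    # Reshape flattens (batch, slots, vec_size) into (batch, slots * vec_size).
    reshape_widths = [s * v for s, v in zip(slots_per_input, embedding_vec_sizes)]
    # Concat joins the reshaped embeddings with the dense features along axis 1.
    return reshape_widths, sum(reshape_widths) + dense_dim

reshape_widths, concat_width = dense_input_widths([2, 2], [16, 32], dense_dim=10)
print(reshape_widths, concat_width)  # [32, 64] 106
```

The reshape widths are also the leading_dim values (32 and 64) passed to the two Reshape layers in train.py.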

Convert HugeCTR to ONNX

We will convert the saved HugeCTR models to ONNX using the HugeCTR to ONNX Converter. For more information about the converter, refer to the README in the onnx_converter directory of the repository.

To double-check correctness, we run the conversion twice. The first run converts the sparse embedding models (hps_demo_with_embedding.onnx), and the second run leaves them out (hps_demo_without_embedding.onnx). The second model expects the embedding vectors as inputs, which HPS supplies at inference time.

import hugectr2onnx
-hugectr2onnx.converter.convert(onnx_model_path = "hps_demo_with_embedding.onnx",
-                            graph_config = "hps_demo.json",
-                            dense_model = "hps_demo_dense_1000.model",
-                            convert_embedding = True,
-                            sparse_models = ["hps_demo0_sparse_1000.model", "hps_demo1_sparse_1000.model"])
-
-hugectr2onnx.converter.convert(onnx_model_path = "hps_demo_without_embedding.onnx",
-                            graph_config = "hps_demo.json",
-                            dense_model = "hps_demo_dense_1000.model",
-                            convert_embedding = False)
-
-
-
-
-
[HUGECTR2ONNX][INFO]: Converting Data layer to ONNX
-[HUGECTR2ONNX][INFO]: Converting DistributedSlotSparseEmbeddingHash layer to ONNX
-[HUGECTR2ONNX][INFO]: Converting DistributedSlotSparseEmbeddingHash layer to ONNX
-[HUGECTR2ONNX][INFO]: Converting Reshape layer to ONNX
-[HUGECTR2ONNX][INFO]: Converting Reshape layer to ONNX
-[HUGECTR2ONNX][INFO]: Converting Concat layer to ONNX
-[HUGECTR2ONNX][INFO]: Converting InnerProduct layer to ONNX
-[HUGECTR2ONNX][INFO]: Converting ReLU layer to ONNX
-[HUGECTR2ONNX][INFO]: Converting InnerProduct layer to ONNX
-[HUGECTR2ONNX][INFO]: Converting Sigmoid layer to ONNX
-[HUGECTR2ONNX][INFO]: The model is checked!
-[HUGECTR2ONNX][INFO]: The model is saved at hps_demo_with_embedding.onnx
-[HUGECTR2ONNX][INFO]: Converting Data layer to ONNX
-Skip sparse embedding layers in converted ONNX model
-[HUGECTR2ONNX][INFO]: Converting DistributedSlotSparseEmbeddingHash layer to ONNX
-Skip sparse embedding layers in converted ONNX model
-[HUGECTR2ONNX][INFO]: Converting DistributedSlotSparseEmbeddingHash layer to ONNX
-[HUGECTR2ONNX][INFO]: Converting Reshape layer to ONNX
-[HUGECTR2ONNX][INFO]: Converting Reshape layer to ONNX
-[HUGECTR2ONNX][INFO]: Converting Concat layer to ONNX
-[HUGECTR2ONNX][INFO]: Converting InnerProduct layer to ONNX
-[HUGECTR2ONNX][INFO]: Converting ReLU layer to ONNX
-[HUGECTR2ONNX][INFO]: Converting InnerProduct layer to ONNX
-[HUGECTR2ONNX][INFO]: Converting Sigmoid layer to ONNX
-[HUGECTR2ONNX][INFO]: The model is checked!
-[HUGECTR2ONNX][INFO]: The model is saved at hps_demo_without_embedding.onnx
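The converter inputs are the snapshot files that model.fit wrote at iteration 1000 with snapshot_prefix = "hps_demo". The following naming pattern is inferred only from the file names used in this notebook, not from a documented specification. The dense model is named <prefix>_dense_<iter>.model. Each sparse embedding model is named <prefix><index>_sparse_<iter>.model, where the index follows the order in which the embeddings were added. This hypothetical helper rebuilds those names:

```python
def snapshot_model_files(prefix, num_sparse_embeddings, iteration):
    """Hypothetical helper: rebuild the snapshot file names seen in this notebook."""
    dense_model = f"{prefix}_dense_{iteration}.model"
    sparse_models = [f"{prefix}{i}_sparse_{iteration}.model" for i in range(num_sparse_embeddings)]
    return dense_model, sparse_models

dense_model, sparse_models = snapshot_model_files("hps_demo", 2, 1000)
print(dense_model)    # hps_demo_dense_1000.model
print(sparse_models)  # ['hps_demo0_sparse_1000.model', 'hps_demo1_sparse_1000.model']
```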

1. Inference with HPS & ONNX

We run inference by performing the following steps with the Python APIs:

1. Configure the HPS hyperparameters. Refer to the HPS configuration documentation for details.
2. Initialize the HPS object, which is responsible for embedding table lookup.
3. Load the Parquet data.
4. Run inference with the HPS object and the ONNX inference session of hps_demo_without_embedding.onnx.
5. Check correctness by comparing the predictions with the dumped evaluation results.
6. Run inference with the ONNX inference session of hps_demo_with_embedding.onnx as a double check.

from hugectr.inference import HPS, ParameterServerConfig, InferenceParams
-
-import pandas as pd
-import numpy as np
-
-import onnxruntime as ort
-
-slot_size_array = [10000, 10000, 10000, 10000]
-key_offset = np.insert(np.cumsum(slot_size_array), 0, 0)[:-1]
-batch_size = 1024
-
-# 1. Configure the HPS hyperparameters
-ps_config = ParameterServerConfig(
-           emb_table_name = {"hps_demo": ["sparse_embedding1", "sparse_embedding2"]},
-           embedding_vec_size = {"hps_demo": [16, 32]},
-           max_feature_num_per_sample_per_emb_table = {"hps_demo": [2, 2]},
-           inference_params_array = [
-              InferenceParams(
-                model_name = "hps_demo",
-                max_batchsize = batch_size,
-                hit_rate_threshold = 1.0,
-                dense_model_file = "",
-                sparse_model_files = ["hps_demo0_sparse_1000.model", "hps_demo1_sparse_1000.model"],
-                deployed_devices = [0],
-                use_gpu_embedding_cache = True,
-                cache_size_percentage = 0.5,
-                i64_input_key = True)
-           ])
-
-# 2. Initialize the HPS object
-hps = HPS(ps_config)
-
-# 3. Load the Parquet data.
-df = pd.read_parquet("data_parquet/val/gen_0.parquet")
-dense_input_columns = df.columns[1:11]
-cat_input1_columns = df.columns[11:13]
-cat_input2_columns = df.columns[13:15]
-dense_input = df[dense_input_columns].loc[0:batch_size-1].to_numpy(dtype=np.float32)
-cat_input1 = (df[cat_input1_columns].loc[0:batch_size-1].to_numpy(dtype=np.int64) + key_offset[0:2]).reshape((batch_size, 2, 1))
-cat_input2 = (df[cat_input2_columns].loc[0:batch_size-1].to_numpy(dtype=np.int64) + key_offset[2:4]).reshape((batch_size, 2, 1))
-
-# 4. Run inference with the HPS object and the ONNX inference session of `hps_demo_without_embedding.onnx`.
-embedding1 = hps.lookup(cat_input1.flatten(), "hps_demo", 0).reshape(batch_size, 2, 16)
-embedding2 = hps.lookup(cat_input2.flatten(), "hps_demo", 1).reshape(batch_size, 2, 32)
-sess = ort.InferenceSession("hps_demo_without_embedding.onnx")
-res = sess.run(output_names=[sess.get_outputs()[0].name],
-               input_feed={sess.get_inputs()[0].name: dense_input,
-               sess.get_inputs()[1].name: embedding1,
-               sess.get_inputs()[2].name: embedding2})
-pred = res[0]
-
-# 5. Check the correctness by comparing with dumped evaluation results.
-ground_truth = np.load("ground_truth.npy").flatten()
-print("ground_truth: ", ground_truth)
-
-diff = pred.flatten()-ground_truth
-mse = np.mean(diff*diff)
-print("pred: ", pred)
-print("mse between pred and ground_truth: ", mse)
-
-# 6. Make inference with the ONNX inference session of `hps_demo_with_embedding.onnx` (double check).
-sess_ref = ort.InferenceSession("hps_demo_with_embedding.onnx")
-res_ref = sess_ref.run(output_names=[sess_ref.get_outputs()[0].name],
-                   input_feed={sess_ref.get_inputs()[0].name: dense_input,
-                   sess_ref.get_inputs()[1].name: cat_input1,
-                   sess_ref.get_inputs()[2].name: cat_input2})
-pred_ref = res_ref[0]
-diff_ref = pred_ref.flatten()-ground_truth
-mse_ref = np.mean(diff_ref*diff_ref)
-print("pred_ref: ", pred_ref)
-print("mse between pred_ref and ground_truth: ", mse_ref)
-
-
-
-
-
[HCTR][06:32:40.791][WARNING][RK0][main]: default_value_for_each_table.size() is not equal to the number of embedding tables
-====================================================HPS Create====================================================
-[HCTR][06:32:40.791][INFO][RK0][main]: Creating HashMap CPU database backend...
-[HCTR][06:32:40.791][DEBUG][RK0][main]: Created blank database backend in local memory!
-[HCTR][06:32:40.791][INFO][RK0][main]: Volatile DB: initial cache rate = 1
-[HCTR][06:32:40.791][INFO][RK0][main]: Volatile DB: cache missed embeddings = 0
-[HCTR][06:32:40.791][DEBUG][RK0][main]: Created raw model loader in local memory!
-[HCTR][06:32:41.123][INFO][RK0][main]: Table: hps_et.hps_demo.sparse_embedding1; cached 18488 / 18488 embeddings in volatile database (HashMapBackend); load: 18488 / 18446744073709551615 (0.00%).
-[HCTR][06:32:41.431][INFO][RK0][main]: Table: hps_et.hps_demo.sparse_embedding2; cached 18470 / 18470 embeddings in volatile database (HashMapBackend); load: 18470 / 18446744073709551615 (0.00%).
-[HCTR][06:32:41.431][DEBUG][RK0][main]: Real-time subscribers created!
-[HCTR][06:32:41.431][INFO][RK0][main]: Creating embedding cache in device 0.
-[HCTR][06:32:41.437][INFO][RK0][main]: Model name: hps_demo
-[HCTR][06:32:41.437][INFO][RK0][main]: Max batch size: 1024
-[HCTR][06:32:41.437][INFO][RK0][main]: Fuse embedding tables: False
-[HCTR][06:32:41.437][INFO][RK0][main]: Number of embedding tables: 2
-[HCTR][06:32:41.437][INFO][RK0][main]: Use GPU embedding cache: True, cache size percentage: 0.500000
-[HCTR][06:32:41.437][INFO][RK0][main]: Embedding cache type: dynamic
-[HCTR][06:32:41.437][INFO][RK0][main]: Use I64 input key: True
-[HCTR][06:32:41.437][INFO][RK0][main]: Configured cache hit rate threshold: 1.000000
-[HCTR][06:32:41.437][INFO][RK0][main]: The size of thread pool: 80
-[HCTR][06:32:41.437][INFO][RK0][main]: The size of worker memory pool: 2
-[HCTR][06:32:41.437][INFO][RK0][main]: The size of refresh memory pool: 1
-[HCTR][06:32:41.437][INFO][RK0][main]: The refresh percentage : 0.000000
-[HCTR][06:32:41.453][INFO][RK0][main]: LookupSession i64_input_key: True
-[HCTR][06:32:41.453][INFO][RK0][main]: Creating lookup session for hps_demo on device: 0
-ground_truth:  [0.4895492  0.509022   0.38192913 ... 0.5264926  0.50650454 0.47927693]
-pred:  [[0.48954916]
- [0.50902206]
- [0.38192907]
- ...
- [0.52649266]
- [0.5065045 ]
- [0.4792769 ]]
-mse between pred and ground_truth:  2.3887142e-15
-pred_ref:  [[0.48954916]
- [0.50902206]
- [0.38192907]
- ...
- [0.52649266]
- [0.5065045 ]
- [0.4792769 ]]
-mse between pred_ref and ground_truth:  2.3887142e-15
-
-
-
2023-09-20 06:32:41.566238532 [W:onnxruntime:, graph.cc:3543 CleanUnusedInitializersAndNodeArgs] Removing initializer 'key_to_indice_hash_all_tables'. It is not used by any node and should be removed from the model.
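The key_offset array in the inference cell converts per-slot keys into a single key space. Each slot's keys are shifted by the total size of the preceding slots, which is an exclusive prefix sum of slot_size_array. This keeps keys from different slots of the same embedding table distinct, and it presumably mirrors the offsets the Parquet data reader applied during training, since slot_size_array was passed to it. The sketch below reproduces the arithmetic on two made-up samples. The raw key values are illustrative only.

```python
import numpy as np

slot_size_array = [10000, 10000, 10000, 10000]
# Exclusive prefix sum: slot i starts where slots 0..i-1 end.
key_offset = np.insert(np.cumsum(slot_size_array), 0, 0)[:-1]
print(key_offset.tolist())  # [0, 10000, 20000, 30000]

# Two illustrative samples of raw per-slot keys for the first embedding table (slots 0 and 1).
raw_keys = np.array([[85, 28], [0, 4]], dtype=np.int64)
global_keys = raw_keys + key_offset[0:2]  # broadcast the per-slot offsets over the batch
print(global_keys.tolist())  # [[85, 10028], [0, 10004]]
```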

2. Lookup the Embedding Vector from DLPack

HPS also provides a lookup_fromdlpack interface that queries embedding keys stored on the CPU and returns the embedding vectors in a buffer on either the GPU or the CPU:

1. Create a PyTorch or TensorFlow tensor that stores the embedding keys.
2. Convert the embedding key tensor to a DLPack capsule with the corresponding framework's to_dlpack function.
3. Create an empty tensor as a buffer to store the embedding vectors.
4. Convert the buffer tensor to a DLPack capsule.
5. Look up the embedding vectors for the keys directly through the lookup_fromdlpack interface, which writes them into the buffer tensor.
6. If the output capsule is allocated on the GPU, specify a device_id in the lookup_fromdlpack interface to select the corresponding embedding cache. If it is not specified, device 0 is used by default.

Note: Make sure that TensorFlow or PyTorch is installed correctly in the merlin-hugectr container:

pip install tensorflow
-pip install torch
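Before you call lookup_fromdlpack, the output buffer must hold every looked-up vector. This notebook uses a batch size of 1024, 2 keys per sample per table, and vector sizes of 16 and 32. The required number of float32 values is therefore batch_size * keys_per_sample * embedding_vec_size, which matches the cat_input1.flatten().shape[0]*16 expression used for the PyTorch output buffer in the next cell. The helper name is hypothetical.

```python
def lookup_buffer_len(batch_size, keys_per_sample, embedding_vec_size):
    """Hypothetical helper: number of float32 values one table lookup writes."""
    return batch_size * keys_per_sample * embedding_vec_size

batch_size = 1024
keys_per_sample = 2  # 2 slots with one key each, per max_feature_num_per_sample_per_emb_table
for table_id, vec_size in enumerate([16, 32]):
    n = lookup_buffer_len(batch_size, keys_per_sample, vec_size)
    # Each float32 value takes 4 bytes.
    print(f"table {table_id}: {n} floats, {n * 4 // 1024} KiB")
```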
-
-
-
-
-
embedding1 = hps.lookup(cat_input1.flatten(), "hps_demo", 0).reshape(batch_size, 2, 16)
-embedding2 = hps.lookup(cat_input2.flatten(), "hps_demo", 1).reshape(batch_size, 2, 32)
-
-# 1. Look up from dlpack for Pytorch tensor on CPU
-print(" Look up from dlpack for Pytorch tensor")
-import torch.utils.dlpack
-print("************Look up from pytorch dlpack on CPU")
-device = torch.device("cpu")
-key = torch.tensor(cat_input1.flatten(),dtype=torch.int64, device=device)
-out = torch.empty((1,cat_input1.flatten().shape[0]*16), dtype=torch.float32, device=device)
-key_capsule = torch.utils.dlpack.to_dlpack(key)
-print("The device type of embedding keys that lookup dlpack from hps interface for embedding table 0 of hps_demo: {}, the keys: {}".format(key.device, key))
-out_capsule = torch.utils.dlpack.to_dlpack(out)
-# Lookup the embedding vectors from dlpack
-hps.lookup_fromdlpack(key_capsule, out_capsule,"hps_demo", 0)
-out_put = torch.utils.dlpack.from_dlpack(out_capsule)
-print("[The device type of embedding vectors that lookup dlpack from hps interface for embedding table 0 of hps_demo: {}, the vectors: {}\n".format(out_put.device, out_put))
-# Compare against the native lookup with a true mean squared error (the mean of the signed difference can hide errors).
-diff = out_put - torch.from_numpy(embedding1.reshape(1, cat_input1.flatten().shape[0]*16))
-mse = (diff * diff).mean().item()
-if mse > 1e-4:
-    raise RuntimeError("Too large mse between pytorch dlpack on cpu and native HPS lookup api: {}".format(mse))
-else:
-    print("Pytorch dlpack on cpu  results are consistent with native HPS lookup api, mse: {}".format(mse))
-    
-
-# 2. Look up from dlpack for Pytorch tensor on GPU
-print("************Look up from pytorch dlpack on GPU")
-cuda_device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
-key = torch.tensor(cat_input1.flatten(),dtype=torch.int64, device=device)
-key_capsule = torch.utils.dlpack.to_dlpack(key)
-out = torch.empty((cat_input1.flatten().shape[0]*16), dtype=torch.float32, device=cuda_device)
-out_capsule = torch.utils.dlpack.to_dlpack(out)
-hps.lookup_fromdlpack(key_capsule, out_capsule,"hps_demo", 0)
-out_put = torch.utils.dlpack.from_dlpack(out_capsule)
-print("The device type of embedding vectors that lookup dlpack from hps interface for embedding table 0 of hps_demo: {}, the vectors: {}\n\n".format(out_put.device, out_put))
-diff = out_put.cpu() - torch.from_numpy(embedding1.reshape(1, cat_input1.flatten().shape[0]*16))
-mse = (diff * diff).mean().item()
-if mse > 1e-3:
-    raise RuntimeError("Too large mse between pytorch dlpack on GPU and native HPS lookup api: {}".format(mse))
-else:
-    print("Pytorch dlpack on GPU results are consistent with native HPS lookup api, mse: {}".format(mse))
-
-
-
-
-
 Look up from dlpack for Pytorch tensor
-************Look up from pytorch dlpack on CPU
-The device type of embedding keys that lookup dlpack from hps interface for embedding table 0 of hps_demo: cpu, the keys: tensor([   85, 10028,     0,  ..., 10004,    10, 10000])
-[The device type of embedding vectors that lookup dlpack from hps interface for embedding table 0 of hps_demo: cpu, the vectors: tensor([[-0.0307,  0.0264, -0.0294,  ...,  0.0151, -0.0281,  0.0088]])
-
-Pytorch dlpack on cpu  results are consistent with native HPS lookup api, mse: 0.0
-************Look up from pytorch dlpack on GPU
-The device type of embedding vectors that lookup dlpack from hps interface for embedding table 0 of hps_demo: cuda:0, the vectors: tensor([-0.0307,  0.0264, -0.0294,  ...,  0.0151, -0.0281,  0.0088],
-       device='cuda:0')
-
-
-Pytorch dlpack on GPU results are consistent with native HPS lookup api, mse: 0.0
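The PyTorch cell works because DLPack is a zero-copy protocol. A capsule describes existing memory, so HPS can presumably fill the preallocated out tensor in place, and from_dlpack then returns a tensor over that same memory. The sketch below shows the same memory sharing with NumPy alone. It does not involve HPS, and it assumes NumPy 1.23 or newer for np.from_dlpack.

```python
import numpy as np

# A preallocated output buffer, analogous to `out` in the PyTorch cell above.
buffer = np.zeros(4 * 16, dtype=np.float32)

# np.from_dlpack uses the buffer's __dlpack__ export and returns an array
# that views the same memory instead of copying it.
view = np.from_dlpack(buffer)
print(np.shares_memory(buffer, view))  # True

# Anything written into the buffer is visible through the view, so filling
# the buffer in place needs no extra copy to read the results back.
buffer[:16] = 0.5
print(view[:16].sum())  # 8.0
```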
-
-
-
-
-
-
-
# 3. Look up from dlpack for tensorflow tensor on CPU
-print("Look up from dlpack for Tensorflow tensor")
-import tensorflow as tf
-print("***************Look up from tensorflow dlpack on CPU**********")
-with tf.device('/CPU:0'):
-    key_tensor = tf.constant(cat_input2.flatten(),dtype=tf.int64)
-    out_tensor = tf.zeros([1, cat_input2.flatten().shape[0]*32],dtype=tf.float32)
-    print("The device type of embedding keys that lookup dlpack from hps interface for embedding table 1 of hps_demo: {}, the keys: {}".format(key_tensor.device, key_tensor))
-    key_capsule = tf.experimental.dlpack.to_dlpack(key_tensor)
-    out_dlcapsule = tf.experimental.dlpack.to_dlpack(out_tensor)
-hps.lookup_fromdlpack(key_capsule,out_dlcapsule, "hps_demo", 1)
-out = tf.experimental.dlpack.from_dlpack(out_dlcapsule)
-print("The device type of embedding vectors that lookup dlpack from hps interface for embedding table 1 of hps_demo: {}, the vectors: {}\n".format(out.device, out))
-diff = out-embedding2.reshape(1,cat_input2.flatten().shape[0]*32)
-mse = tf.reduce_mean(tf.square(diff))
-if mse > 1e-3:
-    raise RuntimeError("Too large mse between tensorflow dlpack on cpu and native HPS lookup api: {}".format(mse))
-else:
-    print("tensorflow dlpack on CPU results are consistent with native HPS lookup api, mse: {}".format(mse))
-    
-# 4. Look up from dlpack for tensorflow tensor on GPU
-print("***************Look up from tensorflow dlpack on GPU**********")
-with tf.device('/GPU:0'):
-    key_tensor = tf.constant(cat_input2.flatten(),dtype=tf.int64)
-    out_tensor = tf.zeros([cat_input2.flatten().shape[0]*32],dtype=tf.float32)
-    key_capsule = tf.experimental.dlpack.to_dlpack(key_tensor)
-    out_dlcapsule = tf.experimental.dlpack.to_dlpack(out_tensor)
-hps.lookup_fromdlpack(key_capsule,out_dlcapsule, "hps_demo", 1)
-out= tf.experimental.dlpack.from_dlpack(out_dlcapsule)
-print("[HUGECTR][INFO] The device type of embedding vectors that lookup dlpack from hps interface for embedding table 1 of wdl: {}, the vectors: {}\n".format(out.device, out))
-diff = out-embedding2.reshape(1,cat_input2.flatten().shape[0]*32)
-mse = tf.reduce_mean(tf.square(diff))
-if mse > 1e-3:
-    raise RuntimeError("Too large mse between tensorflow dlpack on gpu and native HPS lookup api: {}".format(mse))
-else:
-    print("tensorflow dlpack on GPU results are consistent with native HPS lookup api, mse: {}".format(mse))
-
-
-
-
-
Look up from dlpack for Tensorflow tensor
-
-
-
2023-09-20 06:34:21.729218: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
-To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
-
-
-
***************Look up from tensorflow dlpack on CPU**********
-
-
-
2023-09-20 06:34:44.168630: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1639] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 30048 MB memory:  -> device: 0, name: Tesla V100-SXM2-32GB, pci bus id: 0000:06:00.0, compute capability: 7.0
-2023-09-20 06:34:44.170043: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1639] Created device /job:localhost/replica:0/task:0/device:GPU:1 with 30184 MB memory:  -> device: 1, name: Tesla V100-SXM2-32GB, pci bus id: 0000:07:00.0, compute capability: 7.0
-2023-09-20 06:34:44.171618: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1639] Created device /job:localhost/replica:0/task:0/device:GPU:2 with 30184 MB memory:  -> device: 2, name: Tesla V100-SXM2-32GB, pci bus id: 0000:0a:00.0, compute capability: 7.0
-2023-09-20 06:34:44.173095: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1639] Created device /job:localhost/replica:0/task:0/device:GPU:3 with 30184 MB memory:  -> device: 3, name: Tesla V100-SXM2-32GB, pci bus id: 0000:0b:00.0, compute capability: 7.0
-2023-09-20 06:34:44.174795: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1639] Created device /job:localhost/replica:0/task:0/device:GPU:4 with 30184 MB memory:  -> device: 4, name: Tesla V100-SXM2-32GB, pci bus id: 0000:85:00.0, compute capability: 7.0
-2023-09-20 06:34:44.176299: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1639] Created device /job:localhost/replica:0/task:0/device:GPU:5 with 30184 MB memory:  -> device: 5, name: Tesla V100-SXM2-32GB, pci bus id: 0000:86:00.0, compute capability: 7.0
-2023-09-20 06:34:44.177782: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1639] Created device /job:localhost/replica:0/task:0/device:GPU:6 with 30184 MB memory:  -> device: 6, name: Tesla V100-SXM2-32GB, pci bus id: 0000:89:00.0, compute capability: 7.0
-2023-09-20 06:34:44.179411: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1639] Created device /job:localhost/replica:0/task:0/device:GPU:7 with 30184 MB memory:  -> device: 7, name: Tesla V100-SXM2-32GB, pci bus id: 0000:8a:00.0, compute capability: 7.0
-
-
-
The device type of embedding keys that lookup dlpack from hps interface for embedding table 1 of hps_demo: /job:localhost/replica:0/task:0/device:CPU:0, the keys: [20005 30047 20004 ... 30001 20037 30001]
-The device type of embedding vectors that lookup dlpack from hps interface for embedding table 1 of hps_demo: /job:localhost/replica:0/task:0/device:CPU:0, the vectors: [[ 0.02182689  0.01806355  0.01985828 ...  0.0136845  -0.01738386
-  -0.00323257]]
-
-tensorflow dlpack on CPU results are consistent with native HPS lookup api, mse: 0.0
-***************Look up from tensorflow dlpack on GPU**********
-[HUGECTR][INFO] The device type of embedding vectors that lookup dlpack from hps interface for embedding table 1 of wdl: /job:localhost/replica:0/task:0/device:GPU:0, the vectors: [ 0.02182689  0.01806355  0.01985828 ...  0.0136845  -0.01738386
- -0.00323257]
-
-tensorflow dlpack on GPU results are consistent with native HPS lookup api, mse: 0.0
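Both cells above follow the same pattern: preallocate an output tensor, export it as a DLPack capsule, let HPS write the embedding vectors into it, and re-import it. This only works because DLPack shares the underlying buffer instead of copying it. The following is a minimal NumPy-only sketch of that idea, assuming NumPy 1.23 or later for `np.from_dlpack`; it is not HugeCTR code, and `fake_lookup` is a hypothetical stand-in for `hps.lookup_fromdlpack`. The check squares the differences so that positive and negative errors cannot cancel out.

```python
import numpy as np

EMB_DIM = 32  # embedding vector size of table 1 in this notebook


def fake_lookup(keys, out):
    # Hypothetical stand-in for hps.lookup_fromdlpack: writes one vector per key into `out` in place.
    rng = np.random.default_rng(0)
    table = rng.standard_normal((keys.max() + 1, EMB_DIM)).astype(np.float32)
    out[:] = table[keys].reshape(-1)
    return table


keys = np.array([3, 1, 4, 1, 5], dtype=np.int64)
out = np.zeros(keys.size * EMB_DIM, dtype=np.float32)  # preallocated output buffer
view = np.from_dlpack(out)  # zero-copy import of the same buffer
table = fake_lookup(keys, out)  # the writes to `out` are visible through `view`

expected = table[keys].reshape(-1)
mse = float(np.mean((view - expected) ** 2))  # squared error: signs cannot cancel
if mse > 1e-3:
    raise RuntimeError(f"lookup mismatch, mse={mse}")
print(np.shares_memory(view, out), mse)  # prints: True 0.0
```

With a plain mean of the differences, an error of +0.1 in one element and -0.1 in another would report 0.0; the squared form only reports 0.0 when every element matches.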
-
-
-
-
-

-
-
-

3. Multi-process inference

-

It is possible to share a hash map database between multiple processes. The following example launches 3 processes that do this through the operating system’s shared memory, which is located at /dev/shm on most Unix systems. The processes are split into one primary and multiple secondary processes, and only the primary process initializes the shared memory database; the secondary processes wait until the shared memory has been fully initialized. Because inter-process database access is guaranteed to be thread-safe, you can also implement more complex initialization or refresh mechanisms for your use case.

-
-
-
%%writefile multi_process_hps.py
-import os
-import time
-import multiprocessing as mp
-import pandas as pd
-import numpy as np
-import onnxruntime as ort
-from hugectr import DatabaseType_t
-from hugectr.inference import HPS, ParameterServerConfig, InferenceParams, VolatileDatabaseParams
-
-slot_size_array = [10000, 10000, 10000, 10000]
-key_offset = np.insert(np.cumsum(slot_size_array), 0, 0)[:-1]
-batch_size = 1024
-
-def create_hps(name, initialized, device_id, num_max_processes):
-    print(f'subprocess:{name}({os.getpid()})launch...')
-    
-    # 1. Let secondary processes wait until shared memory is initialized.
-    while name != 'primary' and initialized.value == 0:
-        print(f'Subprocess {name} awaiting SHM initialization...')
-        time.sleep(1)
-
-    # 2. Configure the HPS hyperparameters
-    ps_config = ParameterServerConfig(
-           emb_table_name = {"hps_demo": ["sparse_embedding1", "sparse_embedding2"]},
-           embedding_vec_size = {"hps_demo": [16, 32]},
-           max_feature_num_per_sample_per_emb_table = {"hps_demo": [2, 2]},
-           inference_params_array = [
-              InferenceParams(
-                model_name = "hps_demo",
-                max_batchsize = batch_size,
-                hit_rate_threshold = 1.0,
-                dense_model_file = "",
-                sparse_model_files = ["hps_demo0_sparse_1000.model", "hps_demo1_sparse_1000.model"],
-                device_id=device_id,
-                deployed_devices = [device_id],
-                use_gpu_embedding_cache = True,
-                cache_size_percentage = 0.5,
-                i64_input_key = True)
-           ],
-           volatile_db = VolatileDatabaseParams(
-                DatabaseType_t.multi_process_hash_map,  # Use /dev/shm instead of normal memory for storage.
-                # Skips initializing model. If we run HPS in multiple processes, only one needs to initialize.
-                initialize_after_startup = name == 'primary',
-           ))
-
-    # 3. Initialize the HPS object
-    hps = HPS(ps_config)
-    with initialized.get_lock():  # `+=` on a shared Value is not atomic across processes
-        initialized.value += 1
-    print(f'Subprocess {name} initialized')
-    
-    # 4. In (1) the secondary processes wait until the primary process has finished initializing
-    #    the shared memory. When the last process disconnects, the shared memory is erased.
-    #    Therefore, if the processes currently attached to the shared memory complete their
-    #    program before another process has attached, the contents of the shared memory are
-    #    lost and the new process will instead construct an empty shared memory. There are
-    #    several ways to avoid this situation.
-    #
-    #   a) Set `shared_memory_auto_remove = False` in the `VolatileDatabaseParams`
-    #      configuration [default: True]. This prevents the deletion of the shared memory when
-    #      the last process disconnects. In other words, disabling this flag allows you to preserve
-    #      and reuse the state of a shared memory across multiple program restarts. However, while
-    #      desirable in some situations, this is not the behavior we need here, because this
-    #      notebook cell should be executable repeatedly without relying on residual
-    #      state.
-    #
-    #   b) Alternatively, ensure that all other processes that should attach have
-    #      attached. Here we achieve this by monitoring the `initialized` cross-process
-    #      counter variable that we used in (1). Once it reaches `num_max_processes`, we can be
-    #      sure that each subprocess has properly connected.
-    while initialized.value != num_max_processes:
-        print(f'Subprocess {name} await other processes...')
-        time.sleep(1)
-    
-    # 5. Load query data.
-    df = pd.read_parquet("data_parquet/val/gen_0.parquet")
-    dense_input_columns = df.columns[1:11]
-    cat_input1_columns = df.columns[11:13]
-    cat_input2_columns = df.columns[13:15]
-    dense_input = df[dense_input_columns].loc[0:batch_size-1].to_numpy(dtype=np.float32)
-    cat_input1 = (df[cat_input1_columns].loc[0:batch_size-1].to_numpy(dtype=np.int64) + key_offset[0:2]).reshape((batch_size, 2, 1))
-    cat_input2 = (df[cat_input2_columns].loc[0:batch_size-1].to_numpy(dtype=np.int64) + key_offset[2:4]).reshape((batch_size, 2, 1))
-
-    # 6. Make inference from the HPS object and the ONNX inference session of `hps_demo_without_embedding.onnx`.
-    embedding1 = hps.lookup(cat_input1.flatten(), "hps_demo", 0,device_id).reshape(batch_size, 2, 16)
-    embedding2 = hps.lookup(cat_input2.flatten(), "hps_demo", 1,device_id).reshape(batch_size, 2, 32)
-    sess = ort.InferenceSession("hps_demo_without_embedding.onnx")
-    res = sess.run(output_names=[sess.get_outputs()[0].name],
-                   input_feed={sess.get_inputs()[0].name: dense_input,
-                   sess.get_inputs()[1].name: embedding1,
-                   sess.get_inputs()[2].name: embedding2})
-    pred = res[0]
-
-    # 7. Check the correctness by comparing with dumped evaluation results.
-    ground_truth = np.load("ground_truth.npy").flatten()
-    print(f'Subprocess {name}; ground_truth: {ground_truth}')
-    diff = pred.flatten()-ground_truth
-    mse = np.mean(diff*diff)
-    print(f'Subprocess {name}; pred: {pred}')
-    print(f'Subprocess {name}; mse between pred and ground_truth: {mse}')
-
-    # 8. Make inference with the ONNX inference session of `hps_demo_with_embedding.onnx` (double check).
-    sess_ref = ort.InferenceSession("hps_demo_with_embedding.onnx")
-    res_ref = sess_ref.run(output_names=[sess_ref.get_outputs()[0].name],
-                   input_feed={sess_ref.get_inputs()[0].name: dense_input,
-                   sess_ref.get_inputs()[1].name: cat_input1,
-                   sess_ref.get_inputs()[2].name: cat_input2})
-    pred_ref = res_ref[0]
-    diff_ref = pred_ref.flatten()-ground_truth
-    mse_ref = np.mean(diff_ref*diff_ref)
-    print(f'Subprocess {name}; pred_ref: {pred_ref}')
-    print(f'Subprocess {name}; mse between pred_ref and ground_truth: {mse_ref}')
-
-    print(f'Subprocess {name} exiting...')
-
-if __name__ == '__main__':
-    # Remove any shared memory left over from a previous run.
-    try:
-        os.remove('/dev/shm/hctr_mp_hash_map_database')
-    except FileNotFoundError:
-        pass
-    
-    initialized = mp.Value('i', 0)
-
-    # Create sub processes.
-    processes = [
-        mp.Process(target=create_hps, args=('primary', initialized, 0, 3)),
-        mp.Process(target=create_hps, args=('secondary', initialized, 1, 3)),
-        mp.Process(target=create_hps, args=('secondary', initialized, 2, 3)),
-    ]
-    for p in processes:
-        p.start()
-
-    # Go to sleep until subprocesses are initialized.
-    while initialized.value < len(processes):
-        print(f'Main process; awaiting subprocess initialization... So far {initialized.value} initialized...')
-        time.sleep(1)
-        
-    # Wait for subprocesses to exit.
-    for i, p in enumerate(processes):
-        print(f'Main process; awaiting subprocess {i} to exit...')
-        p.join()
-    print(f'Main process; exiting...')
-
-
-
-
-
Writing multi_process_hps.py
-
-
-
-
-
-
-
!python3 multi_process_hps.py
-
-
-
-
-
subprocess:primary(1394)launch...
-[HCTR][06:48:37.272][WARNING][RK0][main]: default_value_for_each_table.size() is not equal to the number of embedding tables
-====================================================HPS Create====================================================
-[HCTR][06:48:37.272][INFO][RK0][main]: Creating Multi-Process HashMap CPU database backend...
-[HCTR][06:48:37.272][INFO][RK0][main]: Connecting to shared memory 'hctr_mp_hash_map_database'...
-subprocess:secondary(1396)launch...
-Subprocess secondary awaiting SHM initialization...
-Main process; awaiting subprocess initialization... So far 0 initialized...
-subprocess:secondary(1397)launch...
-Subprocess secondary awaiting SHM initialization...
-[HCTR][06:48:37.772][INFO][RK0][main]: Connected to shared memory 'hctr_mp_hash_map_database'; OS total = 270453215232 bytes, OS available = 269706559488 bytes, HCTR allocated = 17179869184 bytes, HCTR free = 17179868672 bytes; other processes connected = 0
-[HCTR][06:48:37.773][INFO][RK0][main]: Volatile DB: initial cache rate = 1
-[HCTR][06:48:37.773][INFO][RK0][main]: Volatile DB: cache missed embeddings = 0
-[HCTR][06:48:37.773][DEBUG][RK0][main]: Created raw model loader in local memory!
-Subprocess secondary awaiting SHM initialization...
-Main process; awaiting subprocess initialization... So far 0 initialized...
-Subprocess secondary awaiting SHM initialization...
-[HCTR][06:48:38.313][INFO][RK0][main]: Table: hps_et.hps_demo.sparse_embedding1; cached 18488 / 18488 embeddings in volatile database (MultiProcessHashMapBackend); load: 18488 / 18446744073709551615 (0.00%).
-[HCTR][06:48:38.947][INFO][RK0][main]: Table: hps_et.hps_demo.sparse_embedding2; cached 18470 / 18470 embeddings in volatile database (MultiProcessHashMapBackend); load: 18470 / 18446744073709551615 (0.00%).
-Subprocess secondary awaiting SHM initialization...
-Main process; awaiting subprocess initialization... So far 0 initialized...
-Subprocess secondary awaiting SHM initialization...
-Subprocess secondary awaiting SHM initialization...
-Main process; awaiting subprocess initialization... So far 0 initialized...
-Subprocess secondary awaiting SHM initialization...
-Subprocess secondary awaiting SHM initialization...
-Main process; awaiting subprocess initialization... So far 0 initialized...
-Subprocess secondary awaiting SHM initialization...
-[HCTR][06:48:41.289][DEBUG][RK0][main]: Real-time subscribers created!
-[HCTR][06:48:41.289][INFO][RK0][main]: Creating embedding cache in device 0.
-[HCTR][06:48:41.295][INFO][RK0][main]: Model name: hps_demo
-[HCTR][06:48:41.295][INFO][RK0][main]: Max batch size: 1024
-[HCTR][06:48:41.295][INFO][RK0][main]: Fuse embedding tables: False
-[HCTR][06:48:41.295][INFO][RK0][main]: Number of embedding tables: 2
-[HCTR][06:48:41.295][INFO][RK0][main]: Use GPU embedding cache: True, cache size percentage: 0.500000
-[HCTR][06:48:41.295][INFO][RK0][main]: Embedding cache type: dynamic
-[HCTR][06:48:41.295][INFO][RK0][main]: Use I64 input key: True
-[HCTR][06:48:41.295][INFO][RK0][main]: Configured cache hit rate threshold: 1.000000
-[HCTR][06:48:41.295][INFO][RK0][main]: The size of thread pool: 80
-[HCTR][06:48:41.295][INFO][RK0][main]: The size of worker memory pool: 2
-[HCTR][06:48:41.295][INFO][RK0][main]: The size of refresh memory pool: 1
-[HCTR][06:48:41.295][INFO][RK0][main]: The refresh percentage : 0.000000
-[HCTR][06:48:41.311][INFO][RK0][main]: LookupSession i64_input_key: True
-[HCTR][06:48:41.311][INFO][RK0][main]: Creating lookup session for hps_demo on device: 0
-Subprocess primary initialized
-Subprocess primary await other processes...
-Main process; awaiting subprocess initialization... So far 1 initialized...
-[HCTR][06:48:42.279][WARNING][RK0][main]: default_value_for_each_table.size() is not equal to the number of embedding tables
-====================================================HPS Create====================================================
-[HCTR][06:48:42.280][INFO][RK0][main]: Creating Multi-Process HashMap CPU database backend...
-[HCTR][06:48:42.281][INFO][RK0][main]: Connecting to shared memory 'hctr_mp_hash_map_database'...
-[HCTR][06:48:42.281][WARNING][RK0][main]: default_value_for_each_table.size() is not equal to the number of embedding tables
-====================================================HPS Create====================================================
-[HCTR][06:48:42.282][INFO][RK0][main]: Creating Multi-Process HashMap CPU database backend...
-[HCTR][06:48:42.282][INFO][RK0][main]: Connecting to shared memory 'hctr_mp_hash_map_database'...
-Subprocess primary await other processes...
-[HCTR][06:48:42.781][INFO][RK0][main]: Connected to shared memory 'hctr_mp_hash_map_database'; OS total = 270453215232 bytes, OS available = 260310085632 bytes, HCTR allocated = 17179869184 bytes, HCTR free = 7783505728 bytes; other processes connected = 1
-[HCTR][06:48:42.781][INFO][RK0][main]: Volatile DB: initial cache rate = 1
-[HCTR][06:48:42.781][INFO][RK0][main]: Volatile DB: cache missed embeddings = 0
-[HCTR][06:48:42.781][DEBUG][RK0][main]: Created raw model loader in local memory!
-Main process; awaiting subprocess initialization... So far 1 initialized...
-[HCTR][06:48:43.281][INFO][RK0][main]: Connected to shared memory 'hctr_mp_hash_map_database'; OS total = 270453215232 bytes, OS available = 260310085632 bytes, HCTR allocated = 17179869184 bytes, HCTR free = 7783505728 bytes; other processes connected = 1
-[HCTR][06:48:43.281][INFO][RK0][main]: Volatile DB: initial cache rate = 1
-[HCTR][06:48:43.281][INFO][RK0][main]: Volatile DB: cache missed embeddings = 0
-[HCTR][06:48:43.281][DEBUG][RK0][main]: Created raw model loader in local memory!
-Subprocess primary await other processes...
-Main process; awaiting subprocess initialization... So far 1 initialized...
-Subprocess primary await other processes...
-Main process; awaiting subprocess initialization... So far 1 initialized...
-Subprocess primary await other processes...
-[HCTR][06:48:45.440][DEBUG][RK0][main]: Real-time subscribers created!
-[HCTR][06:48:45.441][INFO][RK0][main]: Creating embedding cache in device 1.
-[HCTR][06:48:45.463][INFO][RK0][main]: Model name: hps_demo
-[HCTR][06:48:45.463][INFO][RK0][main]: Max batch size: 1024
-[HCTR][06:48:45.463][INFO][RK0][main]: Fuse embedding tables: False
-[HCTR][06:48:45.463][INFO][RK0][main]: Number of embedding tables: 2
-[HCTR][06:48:45.463][INFO][RK0][main]: Use GPU embedding cache: True, cache size percentage: 0.500000
-[HCTR][06:48:45.463][INFO][RK0][main]: Embedding cache type: dynamic
-[HCTR][06:48:45.463][INFO][RK0][main]: Use I64 input key: True
-[HCTR][06:48:45.463][INFO][RK0][main]: Configured cache hit rate threshold: 1.000000
-[HCTR][06:48:45.463][INFO][RK0][main]: The size of thread pool: 80
-[HCTR][06:48:45.463][INFO][RK0][main]: The size of worker memory pool: 2
-[HCTR][06:48:45.463][INFO][RK0][main]: The size of refresh memory pool: 1
-[HCTR][06:48:45.463][INFO][RK0][main]: The refresh percentage : 0.000000
-[HCTR][06:48:45.706][DEBUG][RK0][main]: Real-time subscribers created!
-[HCTR][06:48:45.706][INFO][RK0][main]: Creating embedding cache in device 2.
-[HCTR][06:48:45.711][INFO][RK0][main]: Model name: hps_demo
-[HCTR][06:48:45.711][INFO][RK0][main]: Max batch size: 1024
-[HCTR][06:48:45.711][INFO][RK0][main]: Fuse embedding tables: False
-[HCTR][06:48:45.711][INFO][RK0][main]: Number of embedding tables: 2
-[HCTR][06:48:45.711][INFO][RK0][main]: Use GPU embedding cache: True, cache size percentage: 0.500000
-[HCTR][06:48:45.711][INFO][RK0][main]: Embedding cache type: dynamic
-[HCTR][06:48:45.711][INFO][RK0][main]: Use I64 input key: True
-[HCTR][06:48:45.711][INFO][RK0][main]: Configured cache hit rate threshold: 1.000000
-[HCTR][06:48:45.711][INFO][RK0][main]: The size of thread pool: 80
-[HCTR][06:48:45.711][INFO][RK0][main]: The size of worker memory pool: 2
-[HCTR][06:48:45.711][INFO][RK0][main]: The size of refresh memory pool: 1
-[HCTR][06:48:45.711][INFO][RK0][main]: The refresh percentage : 0.000000
-Main process; awaiting subprocess initialization... So far 1 initialized...
-Subprocess primary await other processes...
-[HCTR][06:48:46.699][INFO][RK0][main]: LookupSession i64_input_key: True
-[HCTR][06:48:46.699][INFO][RK0][main]: Creating lookup session for hps_demo on device: 1
-Subprocess secondary initialized
-Subprocess secondary await other processes...
-[HCTR][06:48:46.764][INFO][RK0][main]: LookupSession i64_input_key: True
-[HCTR][06:48:46.764][INFO][RK0][main]: Creating lookup session for hps_demo on device: 2
-Subprocess secondary initialized
-2023-09-20 06:48:46.842594773 [W:onnxruntime:, graph.cc:3543 CleanUnusedInitializersAndNodeArgs] Removing initializer 'key_to_indice_hash_all_tables'. It is not used by any node and should be removed from the model.
-Subprocess secondary; ground_truth: [0.4895492  0.509022   0.38192913 ... 0.5264926  0.50650454 0.47927693]
-Subprocess secondary; pred: [[0.48954916]
- [0.50902206]
- [0.38192907]
- ...
- [0.52649266]
- [0.5065045 ]
- [0.4792769 ]]
-Subprocess secondary; mse between pred and ground_truth: 2.3887142264200634e-15
-Subprocess secondary; pred_ref: [[0.48954916]
- [0.50902206]
- [0.38192907]
- ...
- [0.52649266]
- [0.5065045 ]
- [0.4792769 ]]
-Subprocess secondary; mse between pred_ref and ground_truth: 2.3887142264200634e-15
-Subprocess secondary exiting...
-[HCTR][06:48:46.900][INFO][RK0][main]: Disconnecting from shared memory 'hctr_mp_hash_map_database'.
-Main process; awaiting subprocess 0 to exit...
-2023-09-20 06:48:47.497305659 [W:onnxruntime:, graph.cc:3543 CleanUnusedInitializersAndNodeArgs] Removing initializer 'key_to_indice_hash_all_tables'. It is not used by any node and should be removed from the model.
-Subprocess primary; ground_truth: [0.4895492  0.509022   0.38192913 ... 0.5264926  0.50650454 0.47927693]
-Subprocess primary; pred: [[0.48954916]
- [0.50902206]
- [0.38192907]
- ...
- [0.52649266]
- [0.5065045 ]
- [0.4792769 ]]
-Subprocess primary; mse between pred and ground_truth: 2.3887142264200634e-15
-Subprocess primary; pred_ref: [[0.48954916]
- [0.50902206]
- [0.38192907]
- ...
- [0.52649266]
- [0.5065045 ]
- [0.4792769 ]]
-Subprocess primary; mse between pred_ref and ground_truth: 2.3887142264200634e-15
-Subprocess primary exiting...
-[HCTR][06:48:47.568][INFO][RK0][main]: Disconnecting from shared memory 'hctr_mp_hash_map_database'.
-2023-09-20 06:48:48.101124718 [W:onnxruntime:, graph.cc:3543 CleanUnusedInitializersAndNodeArgs] Removing initializer 'key_to_indice_hash_all_tables'. It is not used by any node and should be removed from the model.
-Subprocess secondary; ground_truth: [0.4895492  0.509022   0.38192913 ... 0.5264926  0.50650454 0.47927693]
-Subprocess secondary; pred: [[0.48954916]
- [0.50902206]
- [0.38192907]
- ...
- [0.52649266]
- [0.5065045 ]
- [0.4792769 ]]
-Subprocess secondary; mse between pred and ground_truth: 2.3887142264200634e-15
-Subprocess secondary; pred_ref: [[0.48954916]
- [0.50902206]
- [0.38192907]
- ...
- [0.52649266]
- [0.5065045 ]
- [0.4792769 ]]
-Subprocess secondary; mse between pred_ref and ground_truth: 2.3887142264200634e-15
-Subprocess secondary exiting...
-[HCTR][06:48:48.176][INFO][RK0][main]: Disconnecting from shared memory 'hctr_mp_hash_map_database'.
-Main process; awaiting subprocess 1 to exit...
-[HCTR][06:48:48.687][INFO][RK0][main]: Detached last process from shared memory 'hctr_mp_hash_map_database'. Auto remove in progress...
-Main process; awaiting subprocess 2 to exit...
-Main process; exiting...
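The script above coordinates its processes by polling the shared `initialized` counter with `time.sleep`. The same two-phase handshake ("database initialized", then "every process attached") can also be expressed with blocking primitives from Python's standard library. The following is a minimal sketch that uses `multiprocessing.Event` for the first phase and `multiprocessing.Barrier` for the second; this is a swapped-in technique, not what the notebook or HugeCTR does. The shared integer `store` is a hypothetical stand-in for the shared-memory hash map, no HPS calls are made, and the `fork` start method assumes Linux.

```python
import multiprocessing as mp


def worker(name, store, shm_ready, all_attached, reports):
    if name == "primary":
        store.value = 42  # hypothetical stand-in for filling the shared-memory database
        shm_ready.set()  # phase 1: publish "database initialized"
    elif not shm_ready.wait(timeout=30):
        raise TimeoutError(f"{name}: primary never finished initializing")
    seen = store.value  # every process now observes the initialized state
    all_attached.wait(timeout=30)  # phase 2: nobody proceeds (or exits) until all are attached
    reports.put((name, seen))  # stand-in for running lookups and inference


def run(num_processes=3):
    # Assumption: Linux, where "fork" lets the workers inherit this module's functions.
    ctx = mp.get_context("fork")
    store = ctx.Value("i", 0)
    shm_ready = ctx.Event()
    all_attached = ctx.Barrier(num_processes)
    reports = ctx.Queue()
    names = ["primary"] + ["secondary"] * (num_processes - 1)
    procs = [
        ctx.Process(target=worker, args=(n, store, shm_ready, all_attached, reports))
        for n in names
    ]
    for p in procs:
        p.start()
    results = [reports.get(timeout=60) for _ in procs]  # drain the queue before joining
    for p in procs:
        p.join()
    return results
```

The barrier addresses the failure mode described in option (b) of the script's comments. Because no process can pass `all_attached.wait()` until every process has reached it, no process can finish and detach while another process has yet to attach. Unlike the polling loop, waiting processes block instead of waking up once per second.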
-
-
-
-
-

-
-
-

4. Redis Cluster deployment (without TLS/SSL)

-

HugeCTR can use Redis clusters as backing storage. The following steps show how to set up a mock Redis / HugeCTR deployment on a single machine. We assume that you have started this notebook in a HugeCTR Docker container.
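A Redis cluster spreads keys over 16384 hash slots, and each slot is owned by one primary node. A key's slot is the CRC16 (XMODEM variant) of the key modulo 16384. If the key contains a non-empty `{...}` hash tag, only the text inside the tag is hashed. The following is a minimal Python sketch of that mapping, following the Redis Cluster specification. It describes general Redis behavior only; how HugeCTR names and partitions its keys inside the cluster is not shown here.

```python
def crc16_xmodem(data: bytes) -> int:
    # CRC16-CCITT (XMODEM): polynomial 0x1021, initial value 0, no reflection, no final XOR.
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def key_hash_slot(key: bytes) -> int:
    # Only the part between the first '{' and the next '}' is hashed, if it is non-empty.
    start = key.find(b"{")
    if start != -1:
        end = key.find(b"}", start + 1)
        if end != -1 and end != start + 1:
            key = key[start + 1:end]
    return crc16_xmodem(key) % 16384
```

On a running cluster, `redis-cli CLUSTER KEYSLOT <key>` reports the slot that Redis itself computes. This is a convenient cross-check once the nodes from the following steps are up.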

-

Step 1: Get + build Redis

-
-
-
!rm -f 7.0.8.tar.gz && wget https://github.com/redis/redis/archive/7.0.8.tar.gz
-!rm -rf redis-7.0.8 && tar -xf 7.0.8.tar.gz && ln -sf redis-7.0.8 redis
-!cd redis && make
-
-
-
-
-
--2023-09-20 06:49:01--  https://github.com/redis/redis/archive/7.0.8.tar.gz
-Resolving github.com (github.com)... 192.30.255.112
-Connecting to github.com (github.com)|192.30.255.112|:443... connected.
-HTTP request sent, awaiting response... 302 Found
-Location: https://codeload.github.com/redis/redis/tar.gz/refs/tags/7.0.8 [following]
---2023-09-20 06:49:01--  https://codeload.github.com/redis/redis/tar.gz/refs/tags/7.0.8
-Resolving codeload.github.com (codeload.github.com)... 192.30.255.120
-Connecting to codeload.github.com (codeload.github.com)|192.30.255.120|:443... connected.
-HTTP request sent, awaiting response... 200 OK
-Length: unspecified [application/x-gzip]
-Saving to: ‘7.0.8.tar.gz’
-
-7.0.8.tar.gz            [   <=>              ]   2.87M  5.50MB/s    in 0.5s    
-
-2023-09-20 06:49:02 (5.50 MB/s) - ‘7.0.8.tar.gz’ saved [3011655]
-
-cd src && make all
-make[1]: Entering directory '/hugectr/notebooks/tmr/redis-7.0.8/src'
-./mkreleasehdr.sh: line 2: echo: write error: Broken pipe
-    CC Makefile.dep
-./mkreleasehdr.sh: line 2: echo: write error: Broken pipe
-rm -rf redis-server redis-sentinel redis-cli redis-benchmark redis-check-rdb redis-check-aof *.o *.gcda *.gcno *.gcov redis.info lcov-html Makefile.dep
-rm -f adlist.d quicklist.d ae.d anet.d dict.d server.d sds.d zmalloc.d lzf_c.d lzf_d.d pqsort.d zipmap.d sha1.d ziplist.d release.d networking.d util.d object.d db.d replication.d rdb.d t_string.d t_list.d t_set.d t_zset.d t_hash.d config.d aof.d pubsub.d multi.d debug.d sort.d intset.d syncio.d cluster.d crc16.d endianconv.d slowlog.d eval.d bio.d rio.d rand.d memtest.d syscheck.d crcspeed.d crc64.d bitops.d sentinel.d notify.d setproctitle.d blocked.d hyperloglog.d latency.d sparkline.d redis-check-rdb.d redis-check-aof.d geo.d lazyfree.d module.d evict.d expire.d geohash.d geohash_helper.d childinfo.d defrag.d siphash.d rax.d t_stream.d listpack.d localtime.d lolwut.d lolwut5.d lolwut6.d acl.d tracking.d connection.d tls.d sha256.d timeout.d setcpuaffinity.d monotonic.d mt19937-64.d resp_parser.d call_reply.d script_lua.d script.d functions.d function_lua.d commands.d anet.d adlist.d dict.d redis-cli.d zmalloc.d release.d ae.d redisassert.d crcspeed.d crc64.d siphash.d crc16.d monotonic.d cli_common.d mt19937-64.d ae.d anet.d redis-benchmark.d adlist.d dict.d zmalloc.d redisassert.d release.d crcspeed.d crc64.d siphash.d crc16.d monotonic.d cli_common.d mt19937-64.d
-(cd ../deps && make distclean)
-make[2]: Entering directory '/hugectr/notebooks/tmr/redis-7.0.8/deps'
-(cd hiredis && make clean) > /dev/null || true
-(cd linenoise && make clean) > /dev/null || true
-(cd lua && make clean) > /dev/null || true
-(cd jemalloc && [ -f Makefile ] && make distclean) > /dev/null || true
-(cd hdr_histogram && make clean) > /dev/null || true
-(rm -f .make-*)
-make[2]: Leaving directory '/hugectr/notebooks/tmr/redis-7.0.8/deps'
-(cd modules && make clean)
-make[2]: Entering directory '/hugectr/notebooks/tmr/redis-7.0.8/src/modules'
-rm -rf *.xo *.so
-make[2]: Leaving directory '/hugectr/notebooks/tmr/redis-7.0.8/src/modules'
-(cd ../tests/modules && make clean)
-make[2]: Entering directory '/hugectr/notebooks/tmr/redis-7.0.8/tests/modules'
-rm -f commandfilter.so basics.so testrdb.so fork.so infotest.so propagate.so misc.so hooks.so blockonkeys.so blockonbackground.so scan.so datatype.so datatype2.so auth.so keyspace_events.so blockedclient.so getkeys.so getchannels.so test_lazyfree.so timer.so defragtest.so keyspecs.so hash.so zset.so stream.so mallocsize.so aclcheck.so list.so subcommands.so reply.so cmdintrospection.so eventloop.so moduleconfigs.so moduleconfigstwo.so publish.so usercall.so commandfilter.xo basics.xo testrdb.xo fork.xo infotest.xo propagate.xo misc.xo hooks.xo blockonkeys.xo blockonbackground.xo scan.xo datatype.xo datatype2.xo auth.xo keyspace_events.xo blockedclient.xo getkeys.xo getchannels.xo test_lazyfree.xo timer.xo defragtest.xo keyspecs.xo hash.xo zset.xo stream.xo mallocsize.xo aclcheck.xo list.xo subcommands.xo reply.xo cmdintrospection.xo eventloop.xo moduleconfigs.xo moduleconfigstwo.xo publish.xo usercall.xo
-make[2]: Leaving directory '/hugectr/notebooks/tmr/redis-7.0.8/tests/modules'
-(rm -f .make-*)
-echo STD=-pedantic -DREDIS_STATIC='' -std=c11 >> .make-settings
-echo WARN=-Wall -W -Wno-missing-field-initializers >> .make-settings
-echo OPT=-O2 >> .make-settings
-echo MALLOC=jemalloc >> .make-settings
-echo BUILD_TLS= >> .make-settings
-echo USE_SYSTEMD= >> .make-settings
-echo CFLAGS= >> .make-settings
-echo LDFLAGS= >> .make-settings
-echo REDIS_CFLAGS= >> .make-settings
-echo REDIS_LDFLAGS= >> .make-settings
-echo PREV_FINAL_CFLAGS=-pedantic -DREDIS_STATIC='' -std=c11 -Wall -W -Wno-missing-field-initializers -O2 -g -ggdb   -I../deps/hiredis -I../deps/linenoise -I../deps/lua/src -I../deps/hdr_histogram -DUSE_JEMALLOC -I../deps/jemalloc/include >> .make-settings
-echo PREV_FINAL_LDFLAGS=  -g -ggdb -rdynamic >> .make-settings
-(cd ../deps && make hiredis linenoise lua hdr_histogram jemalloc)
-make[2]: Entering directory '/hugectr/notebooks/tmr/redis-7.0.8/deps'
-(cd hiredis && make clean) > /dev/null || true
-(cd linenoise && make clean) > /dev/null || true
-(cd lua && make clean) > /dev/null || true
-(cd jemalloc && [ -f Makefile ] && make distclean) > /dev/null || true
-(cd hdr_histogram && make clean) > /dev/null || true
-(rm -f .make-*)
-(echo "" > .make-cflags)
-(echo "" > .make-ldflags)
-MAKE hiredis
-cd hiredis && make static 
-make[3]: Entering directory '/hugectr/notebooks/tmr/redis-7.0.8/deps/hiredis'
-cc -std=c99 -c -O3 -fPIC   -Wall -W -Wstrict-prototypes -Wwrite-strings -Wno-missing-field-initializers -g -ggdb -pedantic alloc.c
-cc -std=c99 -c -O3 -fPIC   -Wall -W -Wstrict-prototypes -Wwrite-strings -Wno-missing-field-initializers -g -ggdb -pedantic net.c
-cc -std=c99 -c -O3 -fPIC   -Wall -W -Wstrict-prototypes -Wwrite-strings -Wno-missing-field-initializers -g -ggdb -pedantic hiredis.c
-cc -std=c99 -c -O3 -fPIC   -Wall -W -Wstrict-prototypes -Wwrite-strings -Wno-missing-field-initializers -g -ggdb -pedantic sds.c
-cc -std=c99 -c -O3 -fPIC   -Wall -W -Wstrict-prototypes -Wwrite-strings -Wno-missing-field-initializers -g -ggdb -pedantic async.c
-cc -std=c99 -c -O3 -fPIC   -Wall -W -Wstrict-prototypes -Wwrite-strings -Wno-missing-field-initializers -g -ggdb -pedantic read.c
-cc -std=c99 -c -O3 -fPIC   -Wall -W -Wstrict-prototypes -Wwrite-strings -Wno-missing-field-initializers -g -ggdb -pedantic sockcompat.c
-ar rcs libhiredis.a alloc.o net.o hiredis.o sds.o async.o read.o sockcompat.o
-make[3]: Leaving directory '/hugectr/notebooks/tmr/redis-7.0.8/deps/hiredis'
-MAKE linenoise
-cd linenoise && make
-make[3]: Entering directory '/hugectr/notebooks/tmr/redis-7.0.8/deps/linenoise'
-cc  -Wall -Os -g  -c linenoise.c
-make[3]: Leaving directory '/hugectr/notebooks/tmr/redis-7.0.8/deps/linenoise'
-MAKE lua
-cd lua/src && make all CFLAGS="-Wall -DLUA_ANSI -DENABLE_CJSON_GLOBAL -DREDIS_STATIC='' -DLUA_USE_MKSTEMP  -O2 " MYLDFLAGS="" AR="ar rc"
-make[3]: Entering directory '/hugectr/notebooks/tmr/redis-7.0.8/deps/lua/src'
-cc -Wall -DLUA_ANSI -DENABLE_CJSON_GLOBAL -DREDIS_STATIC='' -DLUA_USE_MKSTEMP  -O2    -c -o lapi.o lapi.c
-cc -Wall -DLUA_ANSI -DENABLE_CJSON_GLOBAL -DREDIS_STATIC='' -DLUA_USE_MKSTEMP  -O2    -c -o lcode.o lcode.c
-cc -Wall -DLUA_ANSI -DENABLE_CJSON_GLOBAL -DREDIS_STATIC='' -DLUA_USE_MKSTEMP  -O2    -c -o ldebug.o ldebug.c
-cc -Wall -DLUA_ANSI -DENABLE_CJSON_GLOBAL -DREDIS_STATIC='' -DLUA_USE_MKSTEMP  -O2    -c -o ldo.o ldo.c
-cc -Wall -DLUA_ANSI -DENABLE_CJSON_GLOBAL -DREDIS_STATIC='' -DLUA_USE_MKSTEMP  -O2    -c -o ldump.o ldump.c
-cc -Wall -DLUA_ANSI -DENABLE_CJSON_GLOBAL -DREDIS_STATIC='' -DLUA_USE_MKSTEMP  -O2    -c -o lfunc.o lfunc.c
-cc -Wall -DLUA_ANSI -DENABLE_CJSON_GLOBAL -DREDIS_STATIC='' -DLUA_USE_MKSTEMP  -O2    -c -o lgc.o lgc.c
-cc -Wall -DLUA_ANSI -DENABLE_CJSON_GLOBAL -DREDIS_STATIC='' -DLUA_USE_MKSTEMP  -O2    -c -o llex.o llex.c
-cc -Wall -DLUA_ANSI -DENABLE_CJSON_GLOBAL -DREDIS_STATIC='' -DLUA_USE_MKSTEMP  -O2    -c -o lmem.o lmem.c
-cc -Wall -DLUA_ANSI -DENABLE_CJSON_GLOBAL -DREDIS_STATIC='' -DLUA_USE_MKSTEMP  -O2    -c -o lobject.o lobject.c
-cc -Wall -DLUA_ANSI -DENABLE_CJSON_GLOBAL -DREDIS_STATIC='' -DLUA_USE_MKSTEMP  -O2    -c -o lopcodes.o lopcodes.c
-cc -Wall -DLUA_ANSI -DENABLE_CJSON_GLOBAL -DREDIS_STATIC='' -DLUA_USE_MKSTEMP  -O2    -c -o lparser.o lparser.c
-cc -Wall -DLUA_ANSI -DENABLE_CJSON_GLOBAL -DREDIS_STATIC='' -DLUA_USE_MKSTEMP  -O2    -c -o lstate.o lstate.c
-cc -Wall -DLUA_ANSI -DENABLE_CJSON_GLOBAL -DREDIS_STATIC='' -DLUA_USE_MKSTEMP  -O2    -c -o lstring.o lstring.c
-cc -Wall -DLUA_ANSI -DENABLE_CJSON_GLOBAL -DREDIS_STATIC='' -DLUA_USE_MKSTEMP  -O2    -c -o ltable.o ltable.c
-cc -Wall -DLUA_ANSI -DENABLE_CJSON_GLOBAL -DREDIS_STATIC='' -DLUA_USE_MKSTEMP  -O2    -c -o ltm.o ltm.c
-cc -Wall -DLUA_ANSI -DENABLE_CJSON_GLOBAL -DREDIS_STATIC='' -DLUA_USE_MKSTEMP  -O2    -c -o lundump.o lundump.c
-cc -Wall -DLUA_ANSI -DENABLE_CJSON_GLOBAL -DREDIS_STATIC='' -DLUA_USE_MKSTEMP  -O2    -c -o lvm.o lvm.c
-cc -Wall -DLUA_ANSI -DENABLE_CJSON_GLOBAL -DREDIS_STATIC='' -DLUA_USE_MKSTEMP  -O2    -c -o lzio.o lzio.c
-cc -Wall -DLUA_ANSI -DENABLE_CJSON_GLOBAL -DREDIS_STATIC='' -DLUA_USE_MKSTEMP  -O2    -c -o strbuf.o strbuf.c
-cc -Wall -DLUA_ANSI -DENABLE_CJSON_GLOBAL -DREDIS_STATIC='' -DLUA_USE_MKSTEMP  -O2    -c -o fpconv.o fpconv.c
-cc -Wall -DLUA_ANSI -DENABLE_CJSON_GLOBAL -DREDIS_STATIC='' -DLUA_USE_MKSTEMP  -O2    -c -o lauxlib.o lauxlib.c
-cc -Wall -DLUA_ANSI -DENABLE_CJSON_GLOBAL -DREDIS_STATIC='' -DLUA_USE_MKSTEMP  -O2    -c -o lbaselib.o lbaselib.c
-cc -Wall -DLUA_ANSI -DENABLE_CJSON_GLOBAL -DREDIS_STATIC='' -DLUA_USE_MKSTEMP  -O2    -c -o ldblib.o ldblib.c
-cc -Wall -DLUA_ANSI -DENABLE_CJSON_GLOBAL -DREDIS_STATIC='' -DLUA_USE_MKSTEMP  -O2    -c -o liolib.o liolib.c
-cc -Wall -DLUA_ANSI -DENABLE_CJSON_GLOBAL -DREDIS_STATIC='' -DLUA_USE_MKSTEMP  -O2    -c -o lmathlib.o lmathlib.c
-cc -Wall -DLUA_ANSI -DENABLE_CJSON_GLOBAL -DREDIS_STATIC='' -DLUA_USE_MKSTEMP  -O2    -c -o loslib.o loslib.c
-cc -Wall -DLUA_ANSI -DENABLE_CJSON_GLOBAL -DREDIS_STATIC='' -DLUA_USE_MKSTEMP  -O2    -c -o ltablib.o ltablib.c
-cc -Wall -DLUA_ANSI -DENABLE_CJSON_GLOBAL -DREDIS_STATIC='' -DLUA_USE_MKSTEMP  -O2    -c -o lstrlib.o lstrlib.c
-cc -Wall -DLUA_ANSI -DENABLE_CJSON_GLOBAL -DREDIS_STATIC='' -DLUA_USE_MKSTEMP  -O2    -c -o loadlib.o loadlib.c
-cc -Wall -DLUA_ANSI -DENABLE_CJSON_GLOBAL -DREDIS_STATIC='' -DLUA_USE_MKSTEMP  -O2    -c -o linit.o linit.c
-cc -Wall -DLUA_ANSI -DENABLE_CJSON_GLOBAL -DREDIS_STATIC='' -DLUA_USE_MKSTEMP  -O2    -c -o lua_cjson.o lua_cjson.c
-cc -Wall -DLUA_ANSI -DENABLE_CJSON_GLOBAL -DREDIS_STATIC='' -DLUA_USE_MKSTEMP  -O2    -c -o lua_struct.o lua_struct.c
-cc -Wall -DLUA_ANSI -DENABLE_CJSON_GLOBAL -DREDIS_STATIC='' -DLUA_USE_MKSTEMP  -O2    -c -o lua_cmsgpack.o lua_cmsgpack.c
-cc -Wall -DLUA_ANSI -DENABLE_CJSON_GLOBAL -DREDIS_STATIC='' -DLUA_USE_MKSTEMP  -O2    -c -o lua_bit.o lua_bit.c
-ar rc liblua.a lapi.o lcode.o ldebug.o ldo.o ldump.o lfunc.o lgc.o llex.o lmem.o lobject.o lopcodes.o lparser.o lstate.o lstring.o ltable.o ltm.o lundump.o lvm.o lzio.o strbuf.o fpconv.o lauxlib.o lbaselib.o ldblib.o liolib.o lmathlib.o loslib.o ltablib.o lstrlib.o loadlib.o linit.o lua_cjson.o lua_struct.o lua_cmsgpack.o lua_bit.o	# DLL needs all object files
-ranlib liblua.a
-cc -Wall -DLUA_ANSI -DENABLE_CJSON_GLOBAL -DREDIS_STATIC='' -DLUA_USE_MKSTEMP  -O2    -c -o lua.o lua.c
-cc -o lua  lua.o liblua.a -lm 
-cc -Wall -DLUA_ANSI -DENABLE_CJSON_GLOBAL -DREDIS_STATIC='' -DLUA_USE_MKSTEMP  -O2    -c -o luac.o luac.c
-cc -Wall -DLUA_ANSI -DENABLE_CJSON_GLOBAL -DREDIS_STATIC='' -DLUA_USE_MKSTEMP  -O2    -c -o print.o print.c
-cc -o luac  luac.o print.o liblua.a -lm 
-make[3]: Leaving directory '/hugectr/notebooks/tmr/redis-7.0.8/deps/lua/src'
-MAKE hdr_histogram
-cd hdr_histogram && make
-make[3]: Entering directory '/hugectr/notebooks/tmr/redis-7.0.8/deps/hdr_histogram'
-cc -std=c99 -Wall -Os -g  -DHDR_MALLOC_INCLUDE=\"hdr_redis_malloc.h\" -c  hdr_histogram.c 
-ar rcs libhdrhistogram.a hdr_histogram.o
-make[3]: Leaving directory '/hugectr/notebooks/tmr/redis-7.0.8/deps/hdr_histogram'
-MAKE jemalloc
-cd jemalloc && ./configure --with-version=5.2.1-0-g0 --with-lg-quantum=3 --with-jemalloc-prefix=je_ CFLAGS="-std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops " LDFLAGS="" 
-checking for xsltproc... false
-checking for gcc... gcc
-checking whether the C compiler works... yes
-checking for C compiler default output file name... a.out
-checking for suffix of executables... 
-checking whether we are cross compiling... no
-checking for suffix of object files... o
-checking whether we are using the GNU C compiler... yes
-checking whether gcc accepts -g... yes
-checking for gcc option to accept ISO C89... none needed
-checking whether compiler is cray... no
-checking whether compiler supports -std=gnu11... yes
-checking whether compiler supports -Wall... yes
-checking whether compiler supports -Wextra... yes
-checking whether compiler supports -Wshorten-64-to-32... no
-checking whether compiler supports -Wsign-compare... yes
-checking whether compiler supports -Wundef... yes
-checking whether compiler supports -Wno-format-zero-length... yes
-checking whether compiler supports -pipe... yes
-checking whether compiler supports -g3... yes
-checking how to run the C preprocessor... gcc -E
-checking for g++... g++
-checking whether we are using the GNU C++ compiler... yes
-checking whether g++ accepts -g... yes
-checking whether g++ supports C++14 features by default... yes
-checking whether compiler supports -Wall... yes
-checking whether compiler supports -Wextra... yes
-checking whether compiler supports -g3... yes
-checking whether libstdc++ linkage is compilable... yes
-checking for grep that handles long lines and -e... /usr/bin/grep
-checking for egrep... /usr/bin/grep -E
-checking for ANSI C header files... yes
-checking for sys/types.h... yes
-checking for sys/stat.h... yes
-checking for stdlib.h... yes
-checking for string.h... yes
-checking for memory.h... yes
-checking for strings.h... yes
-checking for inttypes.h... yes
-checking for stdint.h... yes
-checking for unistd.h... yes
-checking whether byte ordering is bigendian... no
-checking size of void *... 8
-checking size of int... 4
-checking size of long... 8
-checking size of long long... 8
-checking size of intmax_t... 8
-checking build system type... x86_64-pc-linux-gnu
-checking host system type... x86_64-pc-linux-gnu
-checking whether pause instruction is compilable... yes
-checking number of significant virtual address bits... 48
-checking for ar... ar
-checking for nm... nm
-checking for gawk... no
-checking for mawk... mawk
-checking malloc.h usability... yes
-checking malloc.h presence... yes
-checking for malloc.h... yes
-checking whether malloc_usable_size definition can use const argument... no
-checking for library containing log... -lm
-checking whether __attribute__ syntax is compilable... yes
-checking whether compiler supports -fvisibility=hidden... yes
-checking whether compiler supports -fvisibility=hidden... yes
-checking whether compiler supports -Werror... yes
-checking whether compiler supports -herror_on_warning... yes
-checking whether tls_model attribute is compilable... yes
-checking whether compiler supports -Werror... yes
-checking whether compiler supports -herror_on_warning... yes
-checking whether alloc_size attribute is compilable... yes
-checking whether compiler supports -Werror... yes
-checking whether compiler supports -herror_on_warning... yes
-checking whether format(gnu_printf, ...) attribute is compilable... yes
-checking whether compiler supports -Werror... yes
-checking whether compiler supports -herror_on_warning... yes
-checking whether format(printf, ...) attribute is compilable... yes
-checking whether compiler supports -Werror... yes
-checking whether compiler supports -herror_on_warning... yes
-checking whether format(printf, ...) attribute is compilable... yes
-checking for a BSD-compatible install... /usr/bin/install -c
-checking for ranlib... ranlib
-checking for ld... /usr/bin/ld
-checking for autoconf... /usr/bin/autoconf
-checking for memalign... yes
-checking for valloc... yes
-checking whether compiler supports -O3... yes
-checking whether compiler supports -O3... yes
-checking whether compiler supports -funroll-loops... yes
-checking configured backtracing method... N/A
-checking for sbrk... yes
-checking whether utrace(2) is compilable... no
-checking whether a program using __builtin_unreachable is compilable... yes
-checking whether a program using __builtin_ffsl is compilable... yes
-checking whether a program using __builtin_popcountl is compilable... yes
-checking LG_PAGE... 12
-checking pthread.h usability... yes
-checking pthread.h presence... yes
-checking for pthread.h... yes
-checking for pthread_create in -lpthread... yes
-checking dlfcn.h usability... yes
-checking dlfcn.h presence... yes
-checking for dlfcn.h... yes
-checking for dlsym... yes
-checking whether pthread_atfork(3) is compilable... yes
-checking whether pthread_setname_np(3) is compilable... yes
-checking for library containing clock_gettime... none required
-checking whether clock_gettime(CLOCK_MONOTONIC_COARSE, ...) is compilable... yes
-checking whether clock_gettime(CLOCK_MONOTONIC, ...) is compilable... yes
-checking whether mach_absolute_time() is compilable... no
-checking whether compiler supports -Werror... yes
-checking whether syscall(2) is compilable... yes
-checking for secure_getenv... yes
-checking for sched_getcpu... yes
-checking for sched_setaffinity... yes
-checking for issetugid... no
-checking for _malloc_thread_cleanup... no
-checking for _pthread_mutex_init_calloc_cb... no
-checking for TLS... yes
-checking whether C11 atomics is compilable... no
-checking whether GCC __atomic atomics is compilable... yes
-checking whether GCC 8-bit __atomic atomics is compilable... yes
-checking whether GCC __sync atomics is compilable... yes
-checking whether GCC 8-bit __sync atomics is compilable... yes
-checking whether Darwin OSAtomic*() is compilable... no
-checking whether madvise(2) is compilable... yes
-checking whether madvise(..., MADV_FREE) is compilable... yes
-checking whether madvise(..., MADV_DONTNEED) is compilable... yes
-checking whether madvise(..., MADV_DO[NT]DUMP) is compilable... yes
-checking whether madvise(..., MADV_[NO]HUGEPAGE) is compilable... yes
-checking for __builtin_clz... yes
-checking whether Darwin os_unfair_lock_*() is compilable... no
-checking whether glibc malloc hook is compilable... no
-checking whether glibc memalign hook is compilable... no
-checking whether pthreads adaptive mutexes is compilable... yes
-checking whether compiler supports -D_GNU_SOURCE... yes
-checking whether compiler supports -Werror... yes
-checking whether compiler supports -herror_on_warning... yes
-checking whether strerror_r returns char with gnu source is compilable... yes
-checking for stdbool.h that conforms to C99... yes
-checking for _Bool... yes
-configure: creating ./config.status
-config.status: creating Makefile
-config.status: creating jemalloc.pc
-config.status: creating doc/html.xsl
-config.status: creating doc/manpages.xsl
-config.status: creating doc/jemalloc.xml
-config.status: creating include/jemalloc/jemalloc_macros.h
-config.status: creating include/jemalloc/jemalloc_protos.h
-config.status: creating include/jemalloc/jemalloc_typedefs.h
-config.status: creating include/jemalloc/internal/jemalloc_preamble.h
-config.status: creating test/test.sh
-config.status: creating test/include/test/jemalloc_test.h
-config.status: creating config.stamp
-config.status: creating bin/jemalloc-config
-config.status: creating bin/jemalloc.sh
-config.status: creating bin/jeprof
-config.status: creating include/jemalloc/jemalloc_defs.h
-config.status: creating include/jemalloc/internal/jemalloc_internal_defs.h
-config.status: creating test/include/test/jemalloc_test_defs.h
-config.status: executing include/jemalloc/internal/public_symbols.txt commands
-config.status: executing include/jemalloc/internal/private_symbols.awk commands
-config.status: executing include/jemalloc/internal/private_symbols_jet.awk commands
-config.status: executing include/jemalloc/internal/public_namespace.h commands
-config.status: executing include/jemalloc/internal/public_unnamespace.h commands
-config.status: executing include/jemalloc/jemalloc_protos_jet.h commands
-config.status: executing include/jemalloc/jemalloc_rename.h commands
-config.status: executing include/jemalloc/jemalloc_mangle.h commands
-config.status: executing include/jemalloc/jemalloc_mangle_jet.h commands
-config.status: executing include/jemalloc/jemalloc.h commands
-===============================================================================
-jemalloc version   : 5.2.1-0-g0
-library revision   : 2
-
-CONFIG             : --with-version=5.2.1-0-g0 --with-lg-quantum=3 --with-jemalloc-prefix=je_ 'CFLAGS=-std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops ' LDFLAGS=
-CC                 : gcc
-CONFIGURE_CFLAGS   : -std=gnu11 -Wall -Wextra -Wsign-compare -Wundef -Wno-format-zero-length -pipe -g3 -fvisibility=hidden -O3 -funroll-loops
-SPECIFIED_CFLAGS   : -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops 
-EXTRA_CFLAGS       : 
-CPPFLAGS           : -D_GNU_SOURCE -D_REENTRANT
-CXX                : g++
-CONFIGURE_CXXFLAGS : -Wall -Wextra -g3 -fvisibility=hidden -O3
-SPECIFIED_CXXFLAGS : 
-EXTRA_CXXFLAGS     : 
-LDFLAGS            : 
-EXTRA_LDFLAGS      : 
-DSO_LDFLAGS        : -shared -Wl,-soname,$(@F)
-LIBS               : -lm -lstdc++ -pthread
-RPATH_EXTRA        : 
-
-XSLTPROC           : false
-XSLROOT            : 
-
-PREFIX             : /usr/local
-BINDIR             : /usr/local/bin
-DATADIR            : /usr/local/share
-INCLUDEDIR         : /usr/local/include
-LIBDIR             : /usr/local/lib
-MANDIR             : /usr/local/share/man
-
-srcroot            : 
-abs_srcroot        : /hugectr/notebooks/tmr/redis/deps/jemalloc/
-objroot            : 
-abs_objroot        : /hugectr/notebooks/tmr/redis/deps/jemalloc/
-
-JEMALLOC_PREFIX    : je_
-JEMALLOC_PRIVATE_NAMESPACE
-                   : je_
-install_suffix     : 
-malloc_conf        : 
-documentation      : 1
-shared libs        : 1
-static libs        : 1
-autogen            : 0
-debug              : 0
-stats              : 1
-experimetal_smallocx : 0
-prof               : 0
-prof-libunwind     : 0
-prof-libgcc        : 0
-prof-gcc           : 0
-fill               : 1
-utrace             : 0
-xmalloc            : 0
-log                : 0
-lazy_lock          : 0
-cache-oblivious    : 1
-cxx                : 1
-===============================================================================
-cd jemalloc && make CFLAGS="-std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops " LDFLAGS="" lib/libjemalloc.a
-make[3]: Entering directory '/hugectr/notebooks/tmr/redis-7.0.8/deps/jemalloc'
-gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops  -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -DJEMALLOC_NO_PRIVATE_NAMESPACE -o src/jemalloc.sym.o src/jemalloc.c
-nm -a src/jemalloc.sym.o | mawk -f include/jemalloc/internal/private_symbols.awk > src/jemalloc.sym
-gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops  -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -DJEMALLOC_NO_PRIVATE_NAMESPACE -o src/arena.sym.o src/arena.c
-nm -a src/arena.sym.o | mawk -f include/jemalloc/internal/private_symbols.awk > src/arena.sym
-gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops  -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -DJEMALLOC_NO_PRIVATE_NAMESPACE -o src/background_thread.sym.o src/background_thread.c
-nm -a src/background_thread.sym.o | mawk -f include/jemalloc/internal/private_symbols.awk > src/background_thread.sym
-gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops  -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -DJEMALLOC_NO_PRIVATE_NAMESPACE -o src/base.sym.o src/base.c
-nm -a src/base.sym.o | mawk -f include/jemalloc/internal/private_symbols.awk > src/base.sym
-gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops  -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -DJEMALLOC_NO_PRIVATE_NAMESPACE -o src/bin.sym.o src/bin.c
-nm -a src/bin.sym.o | mawk -f include/jemalloc/internal/private_symbols.awk > src/bin.sym
-gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops  -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -DJEMALLOC_NO_PRIVATE_NAMESPACE -o src/bitmap.sym.o src/bitmap.c
-nm -a src/bitmap.sym.o | mawk -f include/jemalloc/internal/private_symbols.awk > src/bitmap.sym
-gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops  -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -DJEMALLOC_NO_PRIVATE_NAMESPACE -o src/ckh.sym.o src/ckh.c
-nm -a src/ckh.sym.o | mawk -f include/jemalloc/internal/private_symbols.awk > src/ckh.sym
-gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops  -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -DJEMALLOC_NO_PRIVATE_NAMESPACE -o src/ctl.sym.o src/ctl.c
-nm -a src/ctl.sym.o | mawk -f include/jemalloc/internal/private_symbols.awk > src/ctl.sym
-gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops  -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -DJEMALLOC_NO_PRIVATE_NAMESPACE -o src/div.sym.o src/div.c
-nm -a src/div.sym.o | mawk -f include/jemalloc/internal/private_symbols.awk > src/div.sym
-gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops  -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -DJEMALLOC_NO_PRIVATE_NAMESPACE -o src/extent.sym.o src/extent.c
-nm -a src/extent.sym.o | mawk -f include/jemalloc/internal/private_symbols.awk > src/extent.sym
-gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops  -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -DJEMALLOC_NO_PRIVATE_NAMESPACE -o src/extent_dss.sym.o src/extent_dss.c
-nm -a src/extent_dss.sym.o | mawk -f include/jemalloc/internal/private_symbols.awk > src/extent_dss.sym
-gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops  -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -DJEMALLOC_NO_PRIVATE_NAMESPACE -o src/extent_mmap.sym.o src/extent_mmap.c
-nm -a src/extent_mmap.sym.o | mawk -f include/jemalloc/internal/private_symbols.awk > src/extent_mmap.sym
-gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops  -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -DJEMALLOC_NO_PRIVATE_NAMESPACE -o src/hash.sym.o src/hash.c
-nm -a src/hash.sym.o | mawk -f include/jemalloc/internal/private_symbols.awk > src/hash.sym
-gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops  -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -DJEMALLOC_NO_PRIVATE_NAMESPACE -o src/hook.sym.o src/hook.c
-nm -a src/hook.sym.o | mawk -f include/jemalloc/internal/private_symbols.awk > src/hook.sym
-gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops  -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -DJEMALLOC_NO_PRIVATE_NAMESPACE -o src/large.sym.o src/large.c
-nm -a src/large.sym.o | mawk -f include/jemalloc/internal/private_symbols.awk > src/large.sym
-gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops  -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -DJEMALLOC_NO_PRIVATE_NAMESPACE -o src/log.sym.o src/log.c
-nm -a src/log.sym.o | mawk -f include/jemalloc/internal/private_symbols.awk > src/log.sym
-gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops  -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -DJEMALLOC_NO_PRIVATE_NAMESPACE -o src/malloc_io.sym.o src/malloc_io.c
-nm -a src/malloc_io.sym.o | mawk -f include/jemalloc/internal/private_symbols.awk > src/malloc_io.sym
-gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops  -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -DJEMALLOC_NO_PRIVATE_NAMESPACE -o src/mutex.sym.o src/mutex.c
-nm -a src/mutex.sym.o | mawk -f include/jemalloc/internal/private_symbols.awk > src/mutex.sym
-gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops  -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -DJEMALLOC_NO_PRIVATE_NAMESPACE -o src/mutex_pool.sym.o src/mutex_pool.c
-nm -a src/mutex_pool.sym.o | mawk -f include/jemalloc/internal/private_symbols.awk > src/mutex_pool.sym
-gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops  -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -DJEMALLOC_NO_PRIVATE_NAMESPACE -o src/nstime.sym.o src/nstime.c
-nm -a src/nstime.sym.o | mawk -f include/jemalloc/internal/private_symbols.awk > src/nstime.sym
-gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops  -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -DJEMALLOC_NO_PRIVATE_NAMESPACE -o src/pages.sym.o src/pages.c
-nm -a src/pages.sym.o | mawk -f include/jemalloc/internal/private_symbols.awk > src/pages.sym
-gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops  -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -DJEMALLOC_NO_PRIVATE_NAMESPACE -o src/prng.sym.o src/prng.c
-nm -a src/prng.sym.o | mawk -f include/jemalloc/internal/private_symbols.awk > src/prng.sym
-gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops  -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -DJEMALLOC_NO_PRIVATE_NAMESPACE -o src/prof.sym.o src/prof.c
-nm -a src/prof.sym.o | mawk -f include/jemalloc/internal/private_symbols.awk > src/prof.sym
-gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops  -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -DJEMALLOC_NO_PRIVATE_NAMESPACE -o src/rtree.sym.o src/rtree.c
-nm -a src/rtree.sym.o | mawk -f include/jemalloc/internal/private_symbols.awk > src/rtree.sym
-gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops  -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -DJEMALLOC_NO_PRIVATE_NAMESPACE -o src/safety_check.sym.o src/safety_check.c
-nm -a src/safety_check.sym.o | mawk -f include/jemalloc/internal/private_symbols.awk > src/safety_check.sym
-gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops  -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -DJEMALLOC_NO_PRIVATE_NAMESPACE -o src/stats.sym.o src/stats.c
-nm -a src/stats.sym.o | mawk -f include/jemalloc/internal/private_symbols.awk > src/stats.sym
-gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops  -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -DJEMALLOC_NO_PRIVATE_NAMESPACE -o src/sc.sym.o src/sc.c
-nm -a src/sc.sym.o | mawk -f include/jemalloc/internal/private_symbols.awk > src/sc.sym
-gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops  -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -DJEMALLOC_NO_PRIVATE_NAMESPACE -o src/sz.sym.o src/sz.c
-nm -a src/sz.sym.o | mawk -f include/jemalloc/internal/private_symbols.awk > src/sz.sym
-gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops  -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -DJEMALLOC_NO_PRIVATE_NAMESPACE -o src/tcache.sym.o src/tcache.c
-nm -a src/tcache.sym.o | mawk -f include/jemalloc/internal/private_symbols.awk > src/tcache.sym
-gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops  -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -DJEMALLOC_NO_PRIVATE_NAMESPACE -o src/test_hooks.sym.o src/test_hooks.c
-nm -a src/test_hooks.sym.o | mawk -f include/jemalloc/internal/private_symbols.awk > src/test_hooks.sym
-gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops  -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -DJEMALLOC_NO_PRIVATE_NAMESPACE -o src/ticker.sym.o src/ticker.c
-nm -a src/ticker.sym.o | mawk -f include/jemalloc/internal/private_symbols.awk > src/ticker.sym
-gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops  -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -DJEMALLOC_NO_PRIVATE_NAMESPACE -o src/tsd.sym.o src/tsd.c
-nm -a src/tsd.sym.o | mawk -f include/jemalloc/internal/private_symbols.awk > src/tsd.sym
-gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops  -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -DJEMALLOC_NO_PRIVATE_NAMESPACE -o src/witness.sym.o src/witness.c
-nm -a src/witness.sym.o | mawk -f include/jemalloc/internal/private_symbols.awk > src/witness.sym
-/bin/sh include/jemalloc/internal/private_namespace.sh src/jemalloc.sym src/arena.sym src/background_thread.sym src/base.sym src/bin.sym src/bitmap.sym src/ckh.sym src/ctl.sym src/div.sym src/extent.sym src/extent_dss.sym src/extent_mmap.sym src/hash.sym src/hook.sym src/large.sym src/log.sym src/malloc_io.sym src/mutex.sym src/mutex_pool.sym src/nstime.sym src/pages.sym src/prng.sym src/prof.sym src/rtree.sym src/safety_check.sym src/stats.sym src/sc.sym src/sz.sym src/tcache.sym src/test_hooks.sym src/ticker.sym src/tsd.sym src/witness.sym > include/jemalloc/internal/private_namespace.gen.h
-cp include/jemalloc/internal/private_namespace.gen.h include/jemalloc/internal/private_namespace.gen.h
-gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops  -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -o src/jemalloc.o src/jemalloc.c
-gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops  -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -o src/arena.o src/arena.c
-gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops  -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -o src/background_thread.o src/background_thread.c
-gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops  -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -o src/base.o src/base.c
-gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops  -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -o src/bin.o src/bin.c
-gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops  -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -o src/bitmap.o src/bitmap.c
-gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops  -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -o src/ckh.o src/ckh.c
-gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops  -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -o src/ctl.o src/ctl.c
-gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops  -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -o src/div.o src/div.c
-gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops  -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -o src/extent.o src/extent.c
-gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops  -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -o src/extent_dss.o src/extent_dss.c
-gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops  -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -o src/extent_mmap.o src/extent_mmap.c
-gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops  -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -o src/hash.o src/hash.c
-gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops  -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -o src/hook.o src/hook.c
-gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops  -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -o src/large.o src/large.c
-gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops  -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -o src/log.o src/log.c
-gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops  -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -o src/malloc_io.o src/malloc_io.c
-gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops  -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -o src/mutex.o src/mutex.c
-gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops  -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -o src/mutex_pool.o src/mutex_pool.c
-gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops  -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -o src/nstime.o src/nstime.c
-gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops  -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -o src/pages.o src/pages.c
-gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops  -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -o src/prng.o src/prng.c
-gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops  -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -o src/prof.o src/prof.c
-gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops  -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -o src/rtree.o src/rtree.c
-gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops  -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -o src/safety_check.o src/safety_check.c
-gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops  -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -o src/stats.o src/stats.c
-gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops  -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -o src/sc.o src/sc.c
-gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops  -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -o src/sz.o src/sz.c
-gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops  -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -o src/tcache.o src/tcache.c
-gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops  -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -o src/test_hooks.o src/test_hooks.c
-gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops  -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -o src/ticker.o src/ticker.c
-gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops  -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -o src/tsd.o src/tsd.c
-gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops  -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -o src/witness.o src/witness.c
-g++ -Wall -Wextra -g3 -fvisibility=hidden -O3 -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -o src/jemalloc_cpp.o src/jemalloc_cpp.cpp
-ar crus lib/libjemalloc.a src/jemalloc.o src/arena.o src/background_thread.o src/base.o src/bin.o src/bitmap.o src/ckh.o src/ctl.o src/div.o src/extent.o src/extent_dss.o src/extent_mmap.o src/hash.o src/hook.o src/large.o src/log.o src/malloc_io.o src/mutex.o src/mutex_pool.o src/nstime.o src/pages.o src/prng.o src/prof.o src/rtree.o src/safety_check.o src/stats.o src/sc.o src/sz.o src/tcache.o src/test_hooks.o src/ticker.o src/tsd.o src/witness.o src/jemalloc_cpp.o
-ar: `u' modifier ignored since `D' is the default (see `U')
-make[3]: Leaving directory '/hugectr/notebooks/tmr/redis-7.0.8/deps/jemalloc'
-make[2]: Leaving directory '/hugectr/notebooks/tmr/redis-7.0.8/deps'
-    CC adlist.o
-    CC quicklist.o
-    CC ae.o
-    CC anet.o
-    CC dict.o
-    CC server.o
-    CC sds.o
-    CC zmalloc.o
-    CC lzf_c.o
-    CC lzf_d.o
-    CC pqsort.o
-    CC zipmap.o
-    CC sha1.o
-    CC ziplist.o
-    CC release.o
-    CC networking.o
-    CC util.o
-    CC object.o
-    CC db.o
-    CC replication.o
-    CC rdb.o
-    CC t_string.o
-    CC t_list.o
-    CC t_set.o
-    CC t_zset.o
-    CC t_hash.o
-    CC config.o
-    CC aof.o
-    CC pubsub.o
-    CC multi.o
-    CC debug.o
-    CC sort.o
-    CC intset.o
-    CC syncio.o
-    CC cluster.o
-    CC crc16.o
-    CC endianconv.o
-    CC slowlog.o
-    CC eval.o
-    CC bio.o
-    CC rio.o
-    CC rand.o
-    CC memtest.o
-    CC syscheck.o
-    CC crcspeed.o
-    CC crc64.o
-    CC bitops.o
-    CC sentinel.o
-    CC notify.o
-    CC setproctitle.o
-    CC blocked.o
-    CC hyperloglog.o
-    CC latency.o
-    CC sparkline.o
-    CC redis-check-rdb.o
-    CC redis-check-aof.o
-    CC geo.o
-    CC lazyfree.o
-    CC module.o
-    CC evict.o
-    CC expire.o
-    CC geohash.o
-    CC geohash_helper.o
-    CC childinfo.o
-    CC defrag.o
-    CC siphash.o
-    CC rax.o
-    CC t_stream.o
-    CC listpack.o
-    CC localtime.o
-    CC lolwut.o
-    CC lolwut5.o
-    CC lolwut6.o
-    CC acl.o
-    CC tracking.o
-    CC connection.o
-    CC tls.o
-    CC sha256.o
-    CC timeout.o
-    CC setcpuaffinity.o
-    CC monotonic.o
-    CC mt19937-64.o
-    CC resp_parser.o
-    CC call_reply.o
-    CC script_lua.o
-    CC script.o
-    CC functions.o
-    CC function_lua.o
-    CC commands.o
-    LINK redis-server
-    INSTALL redis-sentinel
-    CC redis-cli.o
-    CC redisassert.o
-    CC cli_common.o
-    LINK redis-cli
-    CC redis-benchmark.o
-    LINK redis-benchmark
-    INSTALL redis-check-rdb
-    INSTALL redis-check-aof
-
-Hint: It's a good idea to run 'make test' ;)
-
-make[1]: Leaving directory '/hugectr/notebooks/tmr/redis-7.0.8/src'
-
-
-
-
-

If the output ends with the message "Hint: It's a good idea to run 'make test' ;)" followed by "make[1]: Leaving directory ...", the compilation should have completed successfully.
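If you capture the build output in a string (for example through `subprocess.run(..., capture_output=True, text=True)`, which is an assumption and not how this notebook runs `make`), you can automate the check described above. The helper `build_succeeded` below is hypothetical and heuristic: it only looks for the two marker lines and does not replace running `make test`.

```python
def build_succeeded(log: str) -> bool:
    """Heuristic: True if the Redis build log contains the 'make test' hint
    followed, somewhere later, by a 'make[1]: Leaving directory' line."""
    hint = "Hint: It's a good idea to run 'make test' ;)"
    hint_pos = log.rfind(hint)
    if hint_pos == -1:
        return False  # make stopped before printing the final hint
    # The top-level make prints the leave-directory line after the hint.
    return "make[1]: Leaving directory" in log[hint_pos + len(hint):]


# Usage with a short excerpt of a successful log:
tail = (
    "    LINK redis-benchmark\n"
    "Hint: It's a good idea to run 'make test' ;)\n"
    "make[1]: Leaving directory '/hugectr/notebooks/tmr/redis-7.0.8/src'\n"
)
print(build_succeeded(tail))  # True
```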

-

Step 2: Configure a mock Redis cluster

-

WARNING: The following commands will erase all contents of these directories: redis-server-1, redis-server-2, and redis-server-3.

-
-
-
!mkdir -p redis-server-1 redis-server-2 redis-server-3
-!rm -f redis-server-1/* redis-server-2/* redis-server-3/*
-
-!ln -sf $PWD/redis/src/redis-server redis-server-1/redis-server
-!ln -sf $PWD/redis/src/redis-server redis-server-2/redis-server
-!ln -sf $PWD/redis/src/redis-server redis-server-3/redis-server
-
-
-
-
-
-
-
%%writefile redis-server-1/redis.conf
-daemonize yes
-port 7000
-cluster-enabled yes
-cluster-config-file nodes.conf
-appendonly no
-save ""
-
-
-
-
-
Writing redis-server-1/redis.conf
-
-
-
-
-
-
-
%%writefile redis-server-2/redis.conf
-daemonize yes
-port 7001
-cluster-enabled yes
-cluster-config-file nodes.conf
-appendonly no
-save ""
-
-
-
-
-
Writing redis-server-2/redis.conf
-
-
-
-
-
-
-
%%writefile redis-server-3/redis.conf
-daemonize yes
-port 7002
-cluster-enabled yes
-cluster-config-file nodes.conf
-appendonly no
-save ""
-
-
-
-
-
Writing redis-server-3/redis.conf
-
-
-
-
-
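The three `%%writefile` cells above differ only in the port number (7000, 7001, 7002). The following sketch generates the same files in a loop, which makes it easier to change the number of nodes. The helper `write_cluster_configs` is hypothetical; its template is copied from the cells above.

```python
import tempfile
from pathlib import Path

# Ports used by the three mock cluster nodes in this notebook.
PORTS = (7000, 7001, 7002)

# Same settings as the %%writefile cells; only the port changes per node.
CONF_TEMPLATE = """daemonize yes
port {port}
cluster-enabled yes
cluster-config-file nodes.conf
appendonly no
save ""
"""


def write_cluster_configs(base_dir: Path, ports=PORTS) -> list:
    """Write redis-server-<i>/redis.conf under base_dir, one per port."""
    written = []
    for i, port in enumerate(ports, start=1):
        node_dir = base_dir / f"redis-server-{i}"
        node_dir.mkdir(parents=True, exist_ok=True)
        conf_path = node_dir / "redis.conf"
        conf_path.write_text(CONF_TEMPLATE.format(port=port))
        written.append(conf_path)
    return written


# Demo in a throwaway directory; in the notebook you would pass Path.cwd().
with tempfile.TemporaryDirectory() as tmp:
    for path in write_cluster_configs(Path(tmp)):
        print(path.relative_to(tmp), "->", path.read_text().splitlines()[1])
```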

Step 3: Form Redis cluster

-

WARNING: The following commands will shut down all processes named redis-server on the current system!

-
-
-
# Shutdown existing cluster (if any).
-!pkill redis-server
-
-# Reset configuration and start 3 Redis servers.
-!cd redis-server-1 && rm -f nodes.conf && ./redis-server redis.conf
-!cd redis-server-2 && rm -f nodes.conf && ./redis-server redis.conf
-!cd redis-server-3 && rm -f nodes.conf && ./redis-server redis.conf
-
-# Form the cluster.
-!redis/src/redis-cli \
-    --cluster create 127.0.0.1:7000 127.0.0.1:7001 127.0.0.1:7002 \
-    --cluster-yes
-
-
-
-
-
>>> Performing hash slots allocation on 3 nodes...
-Master[0] -> Slots 0 - 5460
-Master[1] -> Slots 5461 - 10922
-Master[2] -> Slots 10923 - 16383
-M: fa9bb82124685a6438a696cc1562693ccc815ff0 127.0.0.1:7000
-   slots:[0-5460] (5461 slots) master
-M: c6d7ad6353bf568d17a147e65b8198ded9d65717 127.0.0.1:7001
-   slots:[5461-10922] (5462 slots) master
-M: e26ae6cfbeea8a1e6367444445364d963ae17436 127.0.0.1:7002
-   slots:[10923-16383] (5461 slots) master
->>> Nodes configuration updated
->>> Assign a different config epoch to each node
->>> Sending CLUSTER MEET messages to join the cluster
-Waiting for the cluster to join
-.
->>> Performing Cluster Check (using node 127.0.0.1:7000)
-M: fa9bb82124685a6438a696cc1562693ccc815ff0 127.0.0.1:7000
-   slots:[0-5460] (5461 slots) master
-M: e26ae6cfbeea8a1e6367444445364d963ae17436 127.0.0.1:7002
-   slots:[10923-16383] (5461 slots) master
-M: c6d7ad6353bf568d17a147e65b8198ded9d65717 127.0.0.1:7001
-   slots:[5461-10922] (5462 slots) master
-[OK] All nodes agree about slots configuration.
->>> Check for open slots...
->>> Check slots coverage...
-[OK] All 16384 slots covered.
-
-
-
-
-
-
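The `redis-cli --cluster create` output above splits the 16384 Redis Cluster hash slots evenly across the three masters. Following the Redis Cluster specification, a key's slot is `CRC16(key) mod 16384`, where CRC16 is the XMODEM variant and only the `{...}` hash tag is hashed if one is present. The sketch below reproduces that mapping and looks up which of this notebook's nodes owns a key. The function names are illustrative, and how HugeCTR's RedisCluster backend names its keys is not covered here.

```python
REDIS_CLUSTER_SLOTS = 16384


def crc16_xmodem(data: bytes) -> int:
    """CRC-16/XMODEM (polynomial 0x1021, initial value 0), as used by Redis Cluster."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def key_hash_slot(key: bytes) -> int:
    """Map a key to its hash slot, honouring non-empty {hash tags}."""
    start = key.find(b"{")
    if start != -1:
        end = key.find(b"}", start + 1)
        if end != -1 and end != start + 1:  # an empty tag "{}" hashes the whole key
            key = key[start + 1:end]
    return crc16_xmodem(key) % REDIS_CLUSTER_SLOTS


# Slot ranges copied from the cluster creation output above.
SLOT_RANGES = [
    (0, 5460, "127.0.0.1:7000"),
    (5461, 10922, "127.0.0.1:7001"),
    (10923, 16383, "127.0.0.1:7002"),
]


def node_for_key(key: bytes) -> str:
    slot = key_hash_slot(key)
    for low, high, address in SLOT_RANGES:
        if low <= slot <= high:
            return address
    raise ValueError(f"slot {slot} is not covered")


print(key_hash_slot(b"foo"), node_for_key(b"foo"))  # 12182 127.0.0.1:7002
```

Keys that share a hash tag (for example `{user1000}.following` and `{user1000}.followers`) always land on the same node, which is what allows multi-key operations on a cluster.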

Step 4: Run HugeCTR

-
-
-
import os
-import time
-import multiprocessing as mp
-import pandas as pd
-import numpy as np
-import onnxruntime as ort
-from hugectr import DatabaseType_t
-from hugectr.inference import HPS, ParameterServerConfig, InferenceParams, VolatileDatabaseParams
-
-slot_size_array = [10000, 10000, 10000, 10000]
-key_offset = np.insert(np.cumsum(slot_size_array), 0, 0)[:-1]
-batch_size = 1024
-
-print('Launching...')
-
-# 1. Configure the HPS hyperparameters.
-ps_config = ParameterServerConfig(
-       emb_table_name = {'hps_demo': ['sparse_embedding1', 'sparse_embedding2']},
-       embedding_vec_size = {'hps_demo': [16, 32]},
-       max_feature_num_per_sample_per_emb_table = {'hps_demo': [2, 2]},
-       inference_params_array = [
-          InferenceParams(
-            model_name = 'hps_demo',
-            max_batchsize = batch_size,
-            hit_rate_threshold = 1.0,
-            dense_model_file = '',
-            sparse_model_files = ['hps_demo0_sparse_1000.model', 'hps_demo1_sparse_1000.model'],
-            deployed_devices = [0],
-            use_gpu_embedding_cache = True,
-            cache_size_percentage = 0.5,
-            i64_input_key = True)
-       ],
-       volatile_db = VolatileDatabaseParams(
-            DatabaseType_t.redis_cluster,
-            address = '127.0.0.1:7000',
-            num_partitions = 15,
-            num_node_connections = 5,
-       ))
-
-# 2. Initialize the HPS object.
-hps = HPS(ps_config)
-print('HPS initialized')
-
-# 3. Load query data.
-df = pd.read_parquet('data_parquet/val/gen_0.parquet')
-dense_input_columns = df.columns[1:11]
-cat_input1_columns = df.columns[11:13]
-cat_input2_columns = df.columns[13:15]
-dense_input = df[dense_input_columns].loc[0:batch_size-1].to_numpy(dtype=np.float32)
-cat_input1 = (df[cat_input1_columns].loc[0:batch_size-1].to_numpy(dtype=np.int64) + key_offset[0:2]).reshape((batch_size, 2, 1))
-cat_input2 = (df[cat_input2_columns].loc[0:batch_size-1].to_numpy(dtype=np.int64) + key_offset[2:4]).reshape((batch_size, 2, 1))
-
-# 4. Make inference from the HPS object and the ONNX inference session of `hps_demo_without_embedding.onnx`.
-embedding1 = hps.lookup(cat_input1.flatten(), 'hps_demo', 0).reshape(batch_size, 2, 16)
-embedding2 = hps.lookup(cat_input2.flatten(), 'hps_demo', 1).reshape(batch_size, 2, 32)
-sess = ort.InferenceSession('hps_demo_without_embedding.onnx')
-res = sess.run(output_names=[sess.get_outputs()[0].name],
-               input_feed={sess.get_inputs()[0].name: dense_input,
-               sess.get_inputs()[1].name: embedding1,
-               sess.get_inputs()[2].name: embedding2})
-pred = res[0].flatten()
-
-# 5. Check the correctness by comparing with dumped evaluation results.
-ground_truth = np.load("ground_truth.npy").flatten()
-print('-------------------------------------------------------------------------------')
-print('                         HPS demo without embedding                            ')
-print('-------------------------------------------------------------------------------')
-print(f'Ground truth: {ground_truth.shape} = {ground_truth}')
-print('-------------------------------------------------------------------------------')
-print(f'Prediction without embedding: {pred.shape} = {pred}')
-
-diff = pred - ground_truth
-mse = np.mean(diff * diff)
-print(f'MSE between prediction and ground_truth: {mse}')
-
-# 6. Make inference with the ONNX inference session of `hps_demo_with_embedding.onnx` (double check).
-sess_ref = ort.InferenceSession('hps_demo_with_embedding.onnx')
-res_ref = sess_ref.run(output_names=[sess_ref.get_outputs()[0].name],
-               input_feed={sess_ref.get_inputs()[0].name: dense_input,
-               sess_ref.get_inputs()[1].name: cat_input1,
-               sess_ref.get_inputs()[2].name: cat_input2})
-pred_ref = res_ref[0].flatten()
-
-print('-------------------------------------------------------------------------------')
-print('                           HPS demo with embedding                             ')
-print('-------------------------------------------------------------------------------')
-print(f'Ground truth: {ground_truth.shape} = {ground_truth}')
-print('-------------------------------------------------------------------------------')
-print(f'Prediction with embedding: {pred_ref.shape} = {pred_ref}')
-
-diff_ref = pred_ref.flatten() - ground_truth
-mse_ref = np.mean(diff_ref * diff_ref)
-print(f'MSE between prediction and ground_truth: {mse_ref}')
-
-
-
-
-
Launching...
-HPS initialized
-[HCTR][06:54:27.572][WARNING][RK0][main]: default_value_for_each_table.size() is not equal to the number of embedding tables
-====================================================HPS Create====================================================
-[HCTR][06:54:27.572][INFO][RK0][main]: Creating RedisCluster backend...
-[HCTR][06:54:27.577][INFO][RK0][main]: RedisCluster: Connecting via 127.0.0.1:7000...
-[HCTR][06:54:27.577][INFO][RK0][main]: Volatile DB: initial cache rate = 1
-[HCTR][06:54:27.577][INFO][RK0][main]: Volatile DB: cache missed embeddings = 0
-[HCTR][06:54:27.577][DEBUG][RK0][main]: Created raw model loader in local memory!
-[HCTR][06:54:27.753][INFO][RK0][main]: Table: hps_et.hps_demo.sparse_embedding1; cached 18488 / 18488 embeddings in volatile database (RedisCluster); load: 18488 / 18446744073709551615 (0.00%).
-[HCTR][06:54:27.873][INFO][RK0][main]: Table: hps_et.hps_demo.sparse_embedding2; cached 18470 / 18470 embeddings in volatile database (RedisCluster); load: 18470 / 18446744073709551615 (0.00%).
-[HCTR][06:54:30.134][DEBUG][RK0][main]: Real-time subscribers created!
-[HCTR][06:54:30.134][INFO][RK0][main]: Creating embedding cache in device 0.
-[HCTR][06:54:30.140][INFO][RK0][main]: Model name: hps_demo
-[HCTR][06:54:30.140][INFO][RK0][main]: Max batch size: 1024
-[HCTR][06:54:30.140][INFO][RK0][main]: Fuse embedding tables: False
-[HCTR][06:54:30.140][INFO][RK0][main]: Number of embedding tables: 2
-[HCTR][06:54:30.140][INFO][RK0][main]: Use GPU embedding cache: True, cache size percentage: 0.500000
-[HCTR][06:54:30.140][INFO][RK0][main]: Embedding cache type: dynamic
-[HCTR][06:54:30.140][INFO][RK0][main]: Use I64 input key: True
-[HCTR][06:54:30.140][INFO][RK0][main]: Configured cache hit rate threshold: 1.000000
-[HCTR][06:54:30.140][INFO][RK0][main]: The size of thread pool: 80
-[HCTR][06:54:30.140][INFO][RK0][main]: The size of worker memory pool: 2
-[HCTR][06:54:30.140][INFO][RK0][main]: The size of refresh memory pool: 1
-[HCTR][06:54:30.140][INFO][RK0][main]: The refresh percentage : 0.000000
-[HCTR][06:54:30.156][INFO][RK0][main]: LookupSession i64_input_key: True
-[HCTR][06:54:30.156][INFO][RK0][main]: Creating lookup session for hps_demo on device: 0
--------------------------------------------------------------------------------
-                         HPS demo without embedding                            
--------------------------------------------------------------------------------
-Ground truth: (1024,) = [0.4895492  0.509022   0.38192913 ... 0.5264926  0.50650454 0.47927693]
--------------------------------------------------------------------------------
-Prediction without embedding: (1024,) = [0.48954916 0.50902206 0.38192907 ... 0.52649266 0.5065045  0.4792769 ]
-MSE between prediction and ground_truth: 2.3887142264200634e-15
--------------------------------------------------------------------------------
-                           HPS demo with embedding                             
--------------------------------------------------------------------------------
-Ground truth: (1024,) = [0.4895492  0.509022   0.38192913 ... 0.5264926  0.50650454 0.47927693]
--------------------------------------------------------------------------------
-Prediction with embedding: (1024,) = [0.48954916 0.50902206 0.38192907 ... 0.52649266 0.5065045  0.4792769 ]
-MSE between prediction and ground_truth: 2.3887142264200634e-15
-
-
-
2023-09-20 06:54:30.230052244 [W:onnxruntime:, graph.cc:3543 CleanUnusedInitializersAndNodeArgs] Removing initializer 'key_to_indice_hash_all_tables'. It is not used by any node and should be removed from the model.
-
-
-
-
-
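In the Step 4 code, `key_offset` is the exclusive prefix sum of `slot_size_array`. Adding it to each categorical column shifts every slot into its own key range, so that identical raw IDs in different slots become distinct embedding keys before they are passed to `hps.lookup`. The sketch below reproduces that preprocessing on a tiny, hypothetical batch of raw IDs (not taken from the notebook's dataset) to make the resulting keys and shapes visible.

```python
import numpy as np

slot_size_array = [10000, 10000, 10000, 10000]
# Exclusive prefix sum: start of each slot's key range -> [0, 10000, 20000, 30000]
key_offset = np.insert(np.cumsum(slot_size_array), 0, 0)[:-1]

# Hypothetical raw per-slot IDs for 3 samples.
# Columns 0-1 feed the first embedding table, columns 2-3 the second.
raw = np.array([[5, 7, 5, 7],
                [0, 9999, 1, 2],
                [42, 42, 42, 42]], dtype=np.int64)
batch_size = raw.shape[0]

# Same arithmetic as the notebook: offset, then shape (batch, slots, 1).
cat_input1 = (raw[:, 0:2] + key_offset[0:2]).reshape((batch_size, 2, 1))
cat_input2 = (raw[:, 2:4] + key_offset[2:4]).reshape((batch_size, 2, 1))

# hps.lookup receives the flattened keys in row-major order, so its output
# can be reshaped back to (batch, slots, embedding_vec_size).
print(cat_input1.flatten())  # [    5 10007     0 19999    42 10042]
print(cat_input2.flatten())  # [20005 30007 20001 30002 20042 30042]
```

Note that the same raw ID `42` maps to four different keys, one per slot.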

Step 5: Shutdown Redis cluster

-
-
-
!pkill redis-server
-
-
-
-
-

-
-
-

5. Redis Cluster deployment (with TLS/SSL)

-

When using Redis as backing storage, HugeCTR can make use of TLS/SSL to encrypt data transfers. In the following steps, we set up a small Redis cluster and enable TLS/SSL for it.
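For orientation, a TLS-enabled node differs from the plain configuration used earlier mainly in a handful of standard `redis.conf` directives: `port 0` disables the unencrypted port, `tls-port` opens the encrypted one, `tls-cert-file`/`tls-key-file`/`tls-ca-cert-file` point to the certificates, and `tls-cluster`/`tls-replication` encrypt the cluster bus and replication links. The sketch below renders such a file. It is an assumption-laden illustration: the certificate paths are hypothetical placeholders, and the exact configuration used in the following steps may differ.

```python
# Base settings mirror the non-TLS cluster above; TLS directives are added.
TLS_CONF_TEMPLATE = """daemonize yes
port 0
tls-port {port}
tls-cert-file {cert}
tls-key-file {key}
tls-ca-cert-file {ca}
tls-cluster yes
tls-replication yes
cluster-enabled yes
cluster-config-file nodes.conf
appendonly no
save ""
"""


def tls_redis_conf(port: int, cert: str, key: str, ca: str) -> str:
    """Render a TLS-only redis.conf for one cluster node (hypothetical helper)."""
    return TLS_CONF_TEMPLATE.format(port=port, cert=cert, key=key, ca=ca)


# Placeholder certificate paths; generate real ones (e.g. with openssl) first.
print(tls_redis_conf(7000, "tls/redis.crt", "tls/redis.key", "tls/ca.crt"))
```

Clients then have to connect over TLS as well; for example, `redis-cli` built with `BUILD_TLS=yes` accepts `--tls` together with `--cacert`, `--cert`, and `--key`.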

-

Step 1: Build a TLS/SSL capable distribution of Redis

-
-
-
!rm -f 7.0.8.tar.gz && wget https://github.com/redis/redis/archive/7.0.8.tar.gz
-!rm -rf redis-7.0.8 && tar -xf 7.0.8.tar.gz && ln -sf redis-7.0.8 redis
-!cd redis && make BUILD_TLS=yes
-
-
-
-
-
--2023-09-20 06:55:14--  https://github.com/redis/redis/archive/7.0.8.tar.gz
-Resolving github.com (github.com)... 192.30.255.112
-Connecting to github.com (github.com)|192.30.255.112|:443... connected.
-HTTP request sent, awaiting response... 302 Found
-Location: https://codeload.github.com/redis/redis/tar.gz/refs/tags/7.0.8 [following]
---2023-09-20 06:55:14--  https://codeload.github.com/redis/redis/tar.gz/refs/tags/7.0.8
-Resolving codeload.github.com (codeload.github.com)... 192.30.255.121
-Connecting to codeload.github.com (codeload.github.com)|192.30.255.121|:443... connected.
-HTTP request sent, awaiting response... 200 OK
-Length: unspecified [application/x-gzip]
-Saving to: ‘7.0.8.tar.gz’
-
-7.0.8.tar.gz            [     <=>            ]   2.87M  3.24MB/s    in 0.9s    
-
-2023-09-20 06:55:15 (3.24 MB/s) - ‘7.0.8.tar.gz’ saved [3011655]
-
-cd src && make all
-make[1]: Entering directory '/hugectr/notebooks/tmr/redis-7.0.8/src'
-./mkreleasehdr.sh: line 2: echo: write error: Broken pipe
-    CC Makefile.dep
-./mkreleasehdr.sh: line 2: echo: write error: Broken pipe
-rm -rf redis-server redis-sentinel redis-cli redis-benchmark redis-check-rdb redis-check-aof *.o *.gcda *.gcno *.gcov redis.info lcov-html Makefile.dep
-rm -f adlist.d quicklist.d ae.d anet.d dict.d server.d sds.d zmalloc.d lzf_c.d lzf_d.d pqsort.d zipmap.d sha1.d ziplist.d release.d networking.d util.d object.d db.d replication.d rdb.d t_string.d t_list.d t_set.d t_zset.d t_hash.d config.d aof.d pubsub.d multi.d debug.d sort.d intset.d syncio.d cluster.d crc16.d endianconv.d slowlog.d eval.d bio.d rio.d rand.d memtest.d syscheck.d crcspeed.d crc64.d bitops.d sentinel.d notify.d setproctitle.d blocked.d hyperloglog.d latency.d sparkline.d redis-check-rdb.d redis-check-aof.d geo.d lazyfree.d module.d evict.d expire.d geohash.d geohash_helper.d childinfo.d defrag.d siphash.d rax.d t_stream.d listpack.d localtime.d lolwut.d lolwut5.d lolwut6.d acl.d tracking.d connection.d tls.d sha256.d timeout.d setcpuaffinity.d monotonic.d mt19937-64.d resp_parser.d call_reply.d script_lua.d script.d functions.d function_lua.d commands.d anet.d adlist.d dict.d redis-cli.d zmalloc.d release.d ae.d redisassert.d crcspeed.d crc64.d siphash.d crc16.d monotonic.d cli_common.d mt19937-64.d ae.d anet.d redis-benchmark.d adlist.d dict.d zmalloc.d redisassert.d release.d crcspeed.d crc64.d siphash.d crc16.d monotonic.d cli_common.d mt19937-64.d
-(cd ../deps && make distclean)
-make[2]: Entering directory '/hugectr/notebooks/tmr/redis-7.0.8/deps'
-(cd hiredis && make clean) > /dev/null || true
-(cd linenoise && make clean) > /dev/null || true
-(cd lua && make clean) > /dev/null || true
-(cd jemalloc && [ -f Makefile ] && make distclean) > /dev/null || true
-(cd hdr_histogram && make clean) > /dev/null || true
-(rm -f .make-*)
-make[2]: Leaving directory '/hugectr/notebooks/tmr/redis-7.0.8/deps'
-(cd modules && make clean)
-make[2]: Entering directory '/hugectr/notebooks/tmr/redis-7.0.8/src/modules'
-rm -rf *.xo *.so
-make[2]: Leaving directory '/hugectr/notebooks/tmr/redis-7.0.8/src/modules'
-(cd ../tests/modules && make clean)
-make[2]: Entering directory '/hugectr/notebooks/tmr/redis-7.0.8/tests/modules'
-rm -f commandfilter.so basics.so testrdb.so fork.so infotest.so propagate.so misc.so hooks.so blockonkeys.so blockonbackground.so scan.so datatype.so datatype2.so auth.so keyspace_events.so blockedclient.so getkeys.so getchannels.so test_lazyfree.so timer.so defragtest.so keyspecs.so hash.so zset.so stream.so mallocsize.so aclcheck.so list.so subcommands.so reply.so cmdintrospection.so eventloop.so moduleconfigs.so moduleconfigstwo.so publish.so usercall.so commandfilter.xo basics.xo testrdb.xo fork.xo infotest.xo propagate.xo misc.xo hooks.xo blockonkeys.xo blockonbackground.xo scan.xo datatype.xo datatype2.xo auth.xo keyspace_events.xo blockedclient.xo getkeys.xo getchannels.xo test_lazyfree.xo timer.xo defragtest.xo keyspecs.xo hash.xo zset.xo stream.xo mallocsize.xo aclcheck.xo list.xo subcommands.xo reply.xo cmdintrospection.xo eventloop.xo moduleconfigs.xo moduleconfigstwo.xo publish.xo usercall.xo
-make[2]: Leaving directory '/hugectr/notebooks/tmr/redis-7.0.8/tests/modules'
-(rm -f .make-*)
-echo STD=-pedantic -DREDIS_STATIC='' -std=c11 >> .make-settings
-echo WARN=-Wall -W -Wno-missing-field-initializers >> .make-settings
-echo OPT=-O2 >> .make-settings
-echo MALLOC=jemalloc >> .make-settings
-echo BUILD_TLS=yes >> .make-settings
-echo USE_SYSTEMD= >> .make-settings
-echo CFLAGS= >> .make-settings
-echo LDFLAGS= >> .make-settings
-echo REDIS_CFLAGS= >> .make-settings
-echo REDIS_LDFLAGS= >> .make-settings
-echo PREV_FINAL_CFLAGS=-pedantic -DREDIS_STATIC='' -std=c11 -Wall -W -Wno-missing-field-initializers -O2 -g -ggdb   -I../deps/hiredis -I../deps/linenoise -I../deps/lua/src -I../deps/hdr_histogram -DUSE_JEMALLOC -I../deps/jemalloc/include -DUSE_OPENSSL  >> .make-settings
-echo PREV_FINAL_LDFLAGS=  -g -ggdb -rdynamic  >> .make-settings
-(cd ../deps && make hiredis linenoise lua hdr_histogram jemalloc)
-make[2]: Entering directory '/hugectr/notebooks/tmr/redis-7.0.8/deps'
-(cd hiredis && make clean) > /dev/null || true
-(cd linenoise && make clean) > /dev/null || true
-(cd lua && make clean) > /dev/null || true
-(cd jemalloc && [ -f Makefile ] && make distclean) > /dev/null || true
-(cd hdr_histogram && make clean) > /dev/null || true
-(rm -f .make-*)
-(echo "" > .make-cflags)
-(echo "" > .make-ldflags)
-MAKE hiredis
-cd hiredis && make static USE_SSL=1
-make[3]: Entering directory '/hugectr/notebooks/tmr/redis-7.0.8/deps/hiredis'
-cc -std=c99 -c -O3 -fPIC  -DHIREDIS_TEST_SSL -Wall -W -Wstrict-prototypes -Wwrite-strings -Wno-missing-field-initializers -g -ggdb -pedantic alloc.c
-cc -std=c99 -c -O3 -fPIC  -DHIREDIS_TEST_SSL -Wall -W -Wstrict-prototypes -Wwrite-strings -Wno-missing-field-initializers -g -ggdb -pedantic net.c
-cc -std=c99 -c -O3 -fPIC  -DHIREDIS_TEST_SSL -Wall -W -Wstrict-prototypes -Wwrite-strings -Wno-missing-field-initializers -g -ggdb -pedantic hiredis.c
-cc -std=c99 -c -O3 -fPIC  -DHIREDIS_TEST_SSL -Wall -W -Wstrict-prototypes -Wwrite-strings -Wno-missing-field-initializers -g -ggdb -pedantic sds.c
-cc -std=c99 -c -O3 -fPIC  -DHIREDIS_TEST_SSL -Wall -W -Wstrict-prototypes -Wwrite-strings -Wno-missing-field-initializers -g -ggdb -pedantic async.c
-cc -std=c99 -c -O3 -fPIC  -DHIREDIS_TEST_SSL -Wall -W -Wstrict-prototypes -Wwrite-strings -Wno-missing-field-initializers -g -ggdb -pedantic read.c
-cc -std=c99 -c -O3 -fPIC  -DHIREDIS_TEST_SSL -Wall -W -Wstrict-prototypes -Wwrite-strings -Wno-missing-field-initializers -g -ggdb -pedantic sockcompat.c
-ar rcs libhiredis.a alloc.o net.o hiredis.o sds.o async.o read.o sockcompat.o
-cc -std=c99 -c -O3 -fPIC  -DHIREDIS_TEST_SSL -Wall -W -Wstrict-prototypes -Wwrite-strings -Wno-missing-field-initializers -g -ggdb -pedantic ssl.c
-ar rcs libhiredis_ssl.a ssl.o
-make[3]: Leaving directory '/hugectr/notebooks/tmr/redis-7.0.8/deps/hiredis'
-MAKE linenoise
-cd linenoise && make
-make[3]: Entering directory '/hugectr/notebooks/tmr/redis-7.0.8/deps/linenoise'
-cc  -Wall -Os -g  -c linenoise.c
-make[3]: Leaving directory '/hugectr/notebooks/tmr/redis-7.0.8/deps/linenoise'
-MAKE lua
-cd lua/src && make all CFLAGS="-Wall -DLUA_ANSI -DENABLE_CJSON_GLOBAL -DREDIS_STATIC='' -DLUA_USE_MKSTEMP  -O2 " MYLDFLAGS="" AR="ar rc"
-make[3]: Entering directory '/hugectr/notebooks/tmr/redis-7.0.8/deps/lua/src'
-cc -Wall -DLUA_ANSI -DENABLE_CJSON_GLOBAL -DREDIS_STATIC='' -DLUA_USE_MKSTEMP  -O2    -c -o lapi.o lapi.c
-cc -Wall -DLUA_ANSI -DENABLE_CJSON_GLOBAL -DREDIS_STATIC='' -DLUA_USE_MKSTEMP  -O2    -c -o lcode.o lcode.c
-cc -Wall -DLUA_ANSI -DENABLE_CJSON_GLOBAL -DREDIS_STATIC='' -DLUA_USE_MKSTEMP  -O2    -c -o ldebug.o ldebug.c
-cc -Wall -DLUA_ANSI -DENABLE_CJSON_GLOBAL -DREDIS_STATIC='' -DLUA_USE_MKSTEMP  -O2    -c -o ldo.o ldo.c
-cc -Wall -DLUA_ANSI -DENABLE_CJSON_GLOBAL -DREDIS_STATIC='' -DLUA_USE_MKSTEMP  -O2    -c -o ldump.o ldump.c
-cc -Wall -DLUA_ANSI -DENABLE_CJSON_GLOBAL -DREDIS_STATIC='' -DLUA_USE_MKSTEMP  -O2    -c -o lfunc.o lfunc.c
-cc -Wall -DLUA_ANSI -DENABLE_CJSON_GLOBAL -DREDIS_STATIC='' -DLUA_USE_MKSTEMP  -O2    -c -o lgc.o lgc.c
-cc -Wall -DLUA_ANSI -DENABLE_CJSON_GLOBAL -DREDIS_STATIC='' -DLUA_USE_MKSTEMP  -O2    -c -o llex.o llex.c
-cc -Wall -DLUA_ANSI -DENABLE_CJSON_GLOBAL -DREDIS_STATIC='' -DLUA_USE_MKSTEMP  -O2    -c -o lmem.o lmem.c
-cc -Wall -DLUA_ANSI -DENABLE_CJSON_GLOBAL -DREDIS_STATIC='' -DLUA_USE_MKSTEMP  -O2    -c -o lobject.o lobject.c
-cc -Wall -DLUA_ANSI -DENABLE_CJSON_GLOBAL -DREDIS_STATIC='' -DLUA_USE_MKSTEMP  -O2    -c -o lopcodes.o lopcodes.c
-cc -Wall -DLUA_ANSI -DENABLE_CJSON_GLOBAL -DREDIS_STATIC='' -DLUA_USE_MKSTEMP  -O2    -c -o lparser.o lparser.c
-cc -Wall -DLUA_ANSI -DENABLE_CJSON_GLOBAL -DREDIS_STATIC='' -DLUA_USE_MKSTEMP  -O2    -c -o lstate.o lstate.c
-cc -Wall -DLUA_ANSI -DENABLE_CJSON_GLOBAL -DREDIS_STATIC='' -DLUA_USE_MKSTEMP  -O2    -c -o lstring.o lstring.c
-cc -Wall -DLUA_ANSI -DENABLE_CJSON_GLOBAL -DREDIS_STATIC='' -DLUA_USE_MKSTEMP  -O2    -c -o ltable.o ltable.c
-cc -Wall -DLUA_ANSI -DENABLE_CJSON_GLOBAL -DREDIS_STATIC='' -DLUA_USE_MKSTEMP  -O2    -c -o ltm.o ltm.c
-cc -Wall -DLUA_ANSI -DENABLE_CJSON_GLOBAL -DREDIS_STATIC='' -DLUA_USE_MKSTEMP  -O2    -c -o lundump.o lundump.c
-cc -Wall -DLUA_ANSI -DENABLE_CJSON_GLOBAL -DREDIS_STATIC='' -DLUA_USE_MKSTEMP  -O2    -c -o lvm.o lvm.c
-cc -Wall -DLUA_ANSI -DENABLE_CJSON_GLOBAL -DREDIS_STATIC='' -DLUA_USE_MKSTEMP  -O2    -c -o lzio.o lzio.c
-cc -Wall -DLUA_ANSI -DENABLE_CJSON_GLOBAL -DREDIS_STATIC='' -DLUA_USE_MKSTEMP  -O2    -c -o strbuf.o strbuf.c
-cc -Wall -DLUA_ANSI -DENABLE_CJSON_GLOBAL -DREDIS_STATIC='' -DLUA_USE_MKSTEMP  -O2    -c -o fpconv.o fpconv.c
-cc -Wall -DLUA_ANSI -DENABLE_CJSON_GLOBAL -DREDIS_STATIC='' -DLUA_USE_MKSTEMP  -O2    -c -o lauxlib.o lauxlib.c
-cc -Wall -DLUA_ANSI -DENABLE_CJSON_GLOBAL -DREDIS_STATIC='' -DLUA_USE_MKSTEMP  -O2    -c -o lbaselib.o lbaselib.c
-cc -Wall -DLUA_ANSI -DENABLE_CJSON_GLOBAL -DREDIS_STATIC='' -DLUA_USE_MKSTEMP  -O2    -c -o ldblib.o ldblib.c
-cc -Wall -DLUA_ANSI -DENABLE_CJSON_GLOBAL -DREDIS_STATIC='' -DLUA_USE_MKSTEMP  -O2    -c -o liolib.o liolib.c
-cc -Wall -DLUA_ANSI -DENABLE_CJSON_GLOBAL -DREDIS_STATIC='' -DLUA_USE_MKSTEMP  -O2    -c -o lmathlib.o lmathlib.c
-cc -Wall -DLUA_ANSI -DENABLE_CJSON_GLOBAL -DREDIS_STATIC='' -DLUA_USE_MKSTEMP  -O2    -c -o loslib.o loslib.c
-cc -Wall -DLUA_ANSI -DENABLE_CJSON_GLOBAL -DREDIS_STATIC='' -DLUA_USE_MKSTEMP  -O2    -c -o ltablib.o ltablib.c
-cc -Wall -DLUA_ANSI -DENABLE_CJSON_GLOBAL -DREDIS_STATIC='' -DLUA_USE_MKSTEMP  -O2    -c -o lstrlib.o lstrlib.c
-cc -Wall -DLUA_ANSI -DENABLE_CJSON_GLOBAL -DREDIS_STATIC='' -DLUA_USE_MKSTEMP  -O2    -c -o loadlib.o loadlib.c
-cc -Wall -DLUA_ANSI -DENABLE_CJSON_GLOBAL -DREDIS_STATIC='' -DLUA_USE_MKSTEMP  -O2    -c -o linit.o linit.c
-cc -Wall -DLUA_ANSI -DENABLE_CJSON_GLOBAL -DREDIS_STATIC='' -DLUA_USE_MKSTEMP  -O2    -c -o lua_cjson.o lua_cjson.c
-cc -Wall -DLUA_ANSI -DENABLE_CJSON_GLOBAL -DREDIS_STATIC='' -DLUA_USE_MKSTEMP  -O2    -c -o lua_struct.o lua_struct.c
-cc -Wall -DLUA_ANSI -DENABLE_CJSON_GLOBAL -DREDIS_STATIC='' -DLUA_USE_MKSTEMP  -O2    -c -o lua_cmsgpack.o lua_cmsgpack.c
-cc -Wall -DLUA_ANSI -DENABLE_CJSON_GLOBAL -DREDIS_STATIC='' -DLUA_USE_MKSTEMP  -O2    -c -o lua_bit.o lua_bit.c
-ar rc liblua.a lapi.o lcode.o ldebug.o ldo.o ldump.o lfunc.o lgc.o llex.o lmem.o lobject.o lopcodes.o lparser.o lstate.o lstring.o ltable.o ltm.o lundump.o lvm.o lzio.o strbuf.o fpconv.o lauxlib.o lbaselib.o ldblib.o liolib.o lmathlib.o loslib.o ltablib.o lstrlib.o loadlib.o linit.o lua_cjson.o lua_struct.o lua_cmsgpack.o lua_bit.o	# DLL needs all object files
-ranlib liblua.a
-cc -Wall -DLUA_ANSI -DENABLE_CJSON_GLOBAL -DREDIS_STATIC='' -DLUA_USE_MKSTEMP  -O2    -c -o lua.o lua.c
-cc -o lua  lua.o liblua.a -lm 
-cc -Wall -DLUA_ANSI -DENABLE_CJSON_GLOBAL -DREDIS_STATIC='' -DLUA_USE_MKSTEMP  -O2    -c -o luac.o luac.c
-cc -Wall -DLUA_ANSI -DENABLE_CJSON_GLOBAL -DREDIS_STATIC='' -DLUA_USE_MKSTEMP  -O2    -c -o print.o print.c
-cc -o luac  luac.o print.o liblua.a -lm 
-make[3]: Leaving directory '/hugectr/notebooks/tmr/redis-7.0.8/deps/lua/src'
-MAKE hdr_histogram
-cd hdr_histogram && make
-make[3]: Entering directory '/hugectr/notebooks/tmr/redis-7.0.8/deps/hdr_histogram'
-cc -std=c99 -Wall -Os -g  -DHDR_MALLOC_INCLUDE=\"hdr_redis_malloc.h\" -c  hdr_histogram.c 
-ar rcs libhdrhistogram.a hdr_histogram.o
-make[3]: Leaving directory '/hugectr/notebooks/tmr/redis-7.0.8/deps/hdr_histogram'
-MAKE jemalloc
-cd jemalloc && ./configure --with-version=5.2.1-0-g0 --with-lg-quantum=3 --with-jemalloc-prefix=je_ CFLAGS="-std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops " LDFLAGS="" 
-checking for xsltproc... false
-checking for gcc... gcc
-checking whether the C compiler works... yes
-checking for C compiler default output file name... a.out
-checking for suffix of executables... 
-checking whether we are cross compiling... no
-checking for suffix of object files... o
-checking whether we are using the GNU C compiler... yes
-checking whether gcc accepts -g... yes
-checking for gcc option to accept ISO C89... none needed
-checking whether compiler is cray... no
-checking whether compiler supports -std=gnu11... yes
-checking whether compiler supports -Wall... yes
-checking whether compiler supports -Wextra... yes
-checking whether compiler supports -Wshorten-64-to-32... no
-checking whether compiler supports -Wsign-compare... yes
-checking whether compiler supports -Wundef... yes
-checking whether compiler supports -Wno-format-zero-length... yes
-checking whether compiler supports -pipe... yes
-checking whether compiler supports -g3... yes
-checking how to run the C preprocessor... gcc -E
-checking for g++... g++
-checking whether we are using the GNU C++ compiler... yes
-checking whether g++ accepts -g... yes
-checking whether g++ supports C++14 features by default... yes
-checking whether compiler supports -Wall... yes
-checking whether compiler supports -Wextra... yes
-checking whether compiler supports -g3... yes
-checking whether libstdc++ linkage is compilable... yes
-checking for grep that handles long lines and -e... /usr/bin/grep
-checking for egrep... /usr/bin/grep -E
-checking for ANSI C header files... yes
-checking for sys/types.h... yes
-checking for sys/stat.h... yes
-checking for stdlib.h... yes
-checking for string.h... yes
-checking for memory.h... yes
-checking for strings.h... yes
-checking for inttypes.h... yes
-checking for stdint.h... yes
-checking for unistd.h... yes
-checking whether byte ordering is bigendian... no
-checking size of void *... 8
-checking size of int... 4
-checking size of long... 8
-checking size of long long... 8
-checking size of intmax_t... 8
-checking build system type... x86_64-pc-linux-gnu
-checking host system type... x86_64-pc-linux-gnu
-checking whether pause instruction is compilable... yes
-checking number of significant virtual address bits... 48
-checking for ar... ar
-checking for nm... nm
-checking for gawk... no
-checking for mawk... mawk
-checking malloc.h usability... yes
-checking malloc.h presence... yes
-checking for malloc.h... yes
-checking whether malloc_usable_size definition can use const argument... no
-checking for library containing log... -lm
-checking whether __attribute__ syntax is compilable... yes
-checking whether compiler supports -fvisibility=hidden... yes
-checking whether compiler supports -fvisibility=hidden... yes
-checking whether compiler supports -Werror... yes
-checking whether compiler supports -herror_on_warning... yes
-checking whether tls_model attribute is compilable... yes
-checking whether compiler supports -Werror... yes
-checking whether compiler supports -herror_on_warning... yes
-checking whether alloc_size attribute is compilable... yes
-checking whether compiler supports -Werror... yes
-checking whether compiler supports -herror_on_warning... yes
-checking whether format(gnu_printf, ...) attribute is compilable... yes
-checking whether compiler supports -Werror... yes
-checking whether compiler supports -herror_on_warning... yes
-checking whether format(printf, ...) attribute is compilable... yes
-checking whether compiler supports -Werror... yes
-checking whether compiler supports -herror_on_warning... yes
-checking whether format(printf, ...) attribute is compilable... yes
-checking for a BSD-compatible install... /usr/bin/install -c
-checking for ranlib... ranlib
-checking for ld... /usr/bin/ld
-checking for autoconf... /usr/bin/autoconf
-checking for memalign... yes
-checking for valloc... yes
-checking whether compiler supports -O3... yes
-checking whether compiler supports -O3... yes
-checking whether compiler supports -funroll-loops... yes
-checking configured backtracing method... N/A
-checking for sbrk... yes
-checking whether utrace(2) is compilable... no
-checking whether a program using __builtin_unreachable is compilable... yes
-checking whether a program using __builtin_ffsl is compilable... yes
-checking whether a program using __builtin_popcountl is compilable... yes
-checking LG_PAGE... 12
-checking pthread.h usability... yes
-checking pthread.h presence... yes
-checking for pthread.h... yes
-checking for pthread_create in -lpthread... yes
-checking dlfcn.h usability... yes
-checking dlfcn.h presence... yes
-checking for dlfcn.h... yes
-checking for dlsym... yes
-checking whether pthread_atfork(3) is compilable... yes
-checking whether pthread_setname_np(3) is compilable... yes
-checking for library containing clock_gettime... none required
-checking whether clock_gettime(CLOCK_MONOTONIC_COARSE, ...) is compilable... yes
-checking whether clock_gettime(CLOCK_MONOTONIC, ...) is compilable... yes
-checking whether mach_absolute_time() is compilable... no
-checking whether compiler supports -Werror... yes
-checking whether syscall(2) is compilable... yes
-checking for secure_getenv... yes
-checking for sched_getcpu... yes
-checking for sched_setaffinity... yes
-checking for issetugid... no
-checking for _malloc_thread_cleanup... no
-checking for _pthread_mutex_init_calloc_cb... no
-checking for TLS... yes
-checking whether C11 atomics is compilable... no
-checking whether GCC __atomic atomics is compilable... yes
-checking whether GCC 8-bit __atomic atomics is compilable... yes
-checking whether GCC __sync atomics is compilable... yes
-checking whether GCC 8-bit __sync atomics is compilable... yes
-checking whether Darwin OSAtomic*() is compilable... no
-checking whether madvise(2) is compilable... yes
-checking whether madvise(..., MADV_FREE) is compilable... yes
-checking whether madvise(..., MADV_DONTNEED) is compilable... yes
-checking whether madvise(..., MADV_DO[NT]DUMP) is compilable... yes
-checking whether madvise(..., MADV_[NO]HUGEPAGE) is compilable... yes
-checking for __builtin_clz... yes
-checking whether Darwin os_unfair_lock_*() is compilable... no
-checking whether glibc malloc hook is compilable... no
-checking whether glibc memalign hook is compilable... no
-checking whether pthreads adaptive mutexes is compilable... yes
-checking whether compiler supports -D_GNU_SOURCE... yes
-checking whether compiler supports -Werror... yes
-checking whether compiler supports -herror_on_warning... yes
-checking whether strerror_r returns char with gnu source is compilable... yes
-checking for stdbool.h that conforms to C99... yes
-checking for _Bool... yes
-configure: creating ./config.status
-config.status: creating Makefile
-config.status: creating jemalloc.pc
-config.status: creating doc/html.xsl
-config.status: creating doc/manpages.xsl
-config.status: creating doc/jemalloc.xml
-config.status: creating include/jemalloc/jemalloc_macros.h
-config.status: creating include/jemalloc/jemalloc_protos.h
-config.status: creating include/jemalloc/jemalloc_typedefs.h
-config.status: creating include/jemalloc/internal/jemalloc_preamble.h
-config.status: creating test/test.sh
-config.status: creating test/include/test/jemalloc_test.h
-config.status: creating config.stamp
-config.status: creating bin/jemalloc-config
-config.status: creating bin/jemalloc.sh
-config.status: creating bin/jeprof
-config.status: creating include/jemalloc/jemalloc_defs.h
-config.status: creating include/jemalloc/internal/jemalloc_internal_defs.h
-config.status: creating test/include/test/jemalloc_test_defs.h
-config.status: executing include/jemalloc/internal/public_symbols.txt commands
-config.status: executing include/jemalloc/internal/private_symbols.awk commands
-config.status: executing include/jemalloc/internal/private_symbols_jet.awk commands
-config.status: executing include/jemalloc/internal/public_namespace.h commands
-config.status: executing include/jemalloc/internal/public_unnamespace.h commands
-config.status: executing include/jemalloc/jemalloc_protos_jet.h commands
-config.status: executing include/jemalloc/jemalloc_rename.h commands
-config.status: executing include/jemalloc/jemalloc_mangle.h commands
-config.status: executing include/jemalloc/jemalloc_mangle_jet.h commands
-config.status: executing include/jemalloc/jemalloc.h commands
-===============================================================================
-jemalloc version   : 5.2.1-0-g0
-library revision   : 2
-
-CONFIG             : --with-version=5.2.1-0-g0 --with-lg-quantum=3 --with-jemalloc-prefix=je_ 'CFLAGS=-std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops ' LDFLAGS=
-CC                 : gcc
-CONFIGURE_CFLAGS   : -std=gnu11 -Wall -Wextra -Wsign-compare -Wundef -Wno-format-zero-length -pipe -g3 -fvisibility=hidden -O3 -funroll-loops
-SPECIFIED_CFLAGS   : -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops 
-EXTRA_CFLAGS       : 
-CPPFLAGS           : -D_GNU_SOURCE -D_REENTRANT
-CXX                : g++
-CONFIGURE_CXXFLAGS : -Wall -Wextra -g3 -fvisibility=hidden -O3
-SPECIFIED_CXXFLAGS : 
-EXTRA_CXXFLAGS     : 
-LDFLAGS            : 
-EXTRA_LDFLAGS      : 
-DSO_LDFLAGS        : -shared -Wl,-soname,$(@F)
-LIBS               : -lm -lstdc++ -pthread
-RPATH_EXTRA        : 
-
-XSLTPROC           : false
-XSLROOT            : 
-
-PREFIX             : /usr/local
-BINDIR             : /usr/local/bin
-DATADIR            : /usr/local/share
-INCLUDEDIR         : /usr/local/include
-LIBDIR             : /usr/local/lib
-MANDIR             : /usr/local/share/man
-
-srcroot            : 
-abs_srcroot        : /hugectr/notebooks/tmr/redis/deps/jemalloc/
-objroot            : 
-abs_objroot        : /hugectr/notebooks/tmr/redis/deps/jemalloc/
-
-JEMALLOC_PREFIX    : je_
-JEMALLOC_PRIVATE_NAMESPACE
-                   : je_
-install_suffix     : 
-malloc_conf        : 
-documentation      : 1
-shared libs        : 1
-static libs        : 1
-autogen            : 0
-debug              : 0
-stats              : 1
-experimetal_smallocx : 0
-prof               : 0
-prof-libunwind     : 0
-prof-libgcc        : 0
-prof-gcc           : 0
-fill               : 1
-utrace             : 0
-xmalloc            : 0
-log                : 0
-lazy_lock          : 0
-cache-oblivious    : 1
-cxx                : 1
-===============================================================================
-cd jemalloc && make CFLAGS="-std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops " LDFLAGS="" lib/libjemalloc.a
-make[3]: Entering directory '/hugectr/notebooks/tmr/redis-7.0.8/deps/jemalloc'
-gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops  -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -DJEMALLOC_NO_PRIVATE_NAMESPACE -o src/jemalloc.sym.o src/jemalloc.c
-nm -a src/jemalloc.sym.o | mawk -f include/jemalloc/internal/private_symbols.awk > src/jemalloc.sym
-gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops  -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -DJEMALLOC_NO_PRIVATE_NAMESPACE -o src/arena.sym.o src/arena.c
-nm -a src/arena.sym.o | mawk -f include/jemalloc/internal/private_symbols.awk > src/arena.sym
-gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops  -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -DJEMALLOC_NO_PRIVATE_NAMESPACE -o src/background_thread.sym.o src/background_thread.c
-nm -a src/background_thread.sym.o | mawk -f include/jemalloc/internal/private_symbols.awk > src/background_thread.sym
-gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops  -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -DJEMALLOC_NO_PRIVATE_NAMESPACE -o src/base.sym.o src/base.c
-nm -a src/base.sym.o | mawk -f include/jemalloc/internal/private_symbols.awk > src/base.sym
-gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops  -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -DJEMALLOC_NO_PRIVATE_NAMESPACE -o src/bin.sym.o src/bin.c
-nm -a src/bin.sym.o | mawk -f include/jemalloc/internal/private_symbols.awk > src/bin.sym
-gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops  -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -DJEMALLOC_NO_PRIVATE_NAMESPACE -o src/bitmap.sym.o src/bitmap.c
-nm -a src/bitmap.sym.o | mawk -f include/jemalloc/internal/private_symbols.awk > src/bitmap.sym
-gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops  -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -DJEMALLOC_NO_PRIVATE_NAMESPACE -o src/ckh.sym.o src/ckh.c
-nm -a src/ckh.sym.o | mawk -f include/jemalloc/internal/private_symbols.awk > src/ckh.sym
-gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops  -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -DJEMALLOC_NO_PRIVATE_NAMESPACE -o src/ctl.sym.o src/ctl.c
-nm -a src/ctl.sym.o | mawk -f include/jemalloc/internal/private_symbols.awk > src/ctl.sym
-gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops  -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -DJEMALLOC_NO_PRIVATE_NAMESPACE -o src/div.sym.o src/div.c
-nm -a src/div.sym.o | mawk -f include/jemalloc/internal/private_symbols.awk > src/div.sym
-gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops  -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -DJEMALLOC_NO_PRIVATE_NAMESPACE -o src/extent.sym.o src/extent.c
-nm -a src/extent.sym.o | mawk -f include/jemalloc/internal/private_symbols.awk > src/extent.sym
-gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops  -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -DJEMALLOC_NO_PRIVATE_NAMESPACE -o src/extent_dss.sym.o src/extent_dss.c
-nm -a src/extent_dss.sym.o | mawk -f include/jemalloc/internal/private_symbols.awk > src/extent_dss.sym
-gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops  -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -DJEMALLOC_NO_PRIVATE_NAMESPACE -o src/extent_mmap.sym.o src/extent_mmap.c
-nm -a src/extent_mmap.sym.o | mawk -f include/jemalloc/internal/private_symbols.awk > src/extent_mmap.sym
-gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops  -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -DJEMALLOC_NO_PRIVATE_NAMESPACE -o src/hash.sym.o src/hash.c
-nm -a src/hash.sym.o | mawk -f include/jemalloc/internal/private_symbols.awk > src/hash.sym
-gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops  -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -DJEMALLOC_NO_PRIVATE_NAMESPACE -o src/hook.sym.o src/hook.c
-nm -a src/hook.sym.o | mawk -f include/jemalloc/internal/private_symbols.awk > src/hook.sym
-gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops  -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -DJEMALLOC_NO_PRIVATE_NAMESPACE -o src/large.sym.o src/large.c
-nm -a src/large.sym.o | mawk -f include/jemalloc/internal/private_symbols.awk > src/large.sym
-gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops  -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -DJEMALLOC_NO_PRIVATE_NAMESPACE -o src/log.sym.o src/log.c
-nm -a src/log.sym.o | mawk -f include/jemalloc/internal/private_symbols.awk > src/log.sym
-gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops  -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -DJEMALLOC_NO_PRIVATE_NAMESPACE -o src/malloc_io.sym.o src/malloc_io.c
-nm -a src/malloc_io.sym.o | mawk -f include/jemalloc/internal/private_symbols.awk > src/malloc_io.sym
-gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops  -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -DJEMALLOC_NO_PRIVATE_NAMESPACE -o src/mutex.sym.o src/mutex.c
-nm -a src/mutex.sym.o | mawk -f include/jemalloc/internal/private_symbols.awk > src/mutex.sym
-gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops  -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -DJEMALLOC_NO_PRIVATE_NAMESPACE -o src/mutex_pool.sym.o src/mutex_pool.c
-nm -a src/mutex_pool.sym.o | mawk -f include/jemalloc/internal/private_symbols.awk > src/mutex_pool.sym
-gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops  -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -DJEMALLOC_NO_PRIVATE_NAMESPACE -o src/nstime.sym.o src/nstime.c
-nm -a src/nstime.sym.o | mawk -f include/jemalloc/internal/private_symbols.awk > src/nstime.sym
-gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops  -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -DJEMALLOC_NO_PRIVATE_NAMESPACE -o src/pages.sym.o src/pages.c
-nm -a src/pages.sym.o | mawk -f include/jemalloc/internal/private_symbols.awk > src/pages.sym
-gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops  -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -DJEMALLOC_NO_PRIVATE_NAMESPACE -o src/prng.sym.o src/prng.c
-nm -a src/prng.sym.o | mawk -f include/jemalloc/internal/private_symbols.awk > src/prng.sym
-gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops  -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -DJEMALLOC_NO_PRIVATE_NAMESPACE -o src/prof.sym.o src/prof.c
-nm -a src/prof.sym.o | mawk -f include/jemalloc/internal/private_symbols.awk > src/prof.sym
-gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops  -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -DJEMALLOC_NO_PRIVATE_NAMESPACE -o src/rtree.sym.o src/rtree.c
-nm -a src/rtree.sym.o | mawk -f include/jemalloc/internal/private_symbols.awk > src/rtree.sym
-gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops  -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -DJEMALLOC_NO_PRIVATE_NAMESPACE -o src/safety_check.sym.o src/safety_check.c
-nm -a src/safety_check.sym.o | mawk -f include/jemalloc/internal/private_symbols.awk > src/safety_check.sym
-gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops  -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -DJEMALLOC_NO_PRIVATE_NAMESPACE -o src/stats.sym.o src/stats.c
-nm -a src/stats.sym.o | mawk -f include/jemalloc/internal/private_symbols.awk > src/stats.sym
-gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops  -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -DJEMALLOC_NO_PRIVATE_NAMESPACE -o src/sc.sym.o src/sc.c
-nm -a src/sc.sym.o | mawk -f include/jemalloc/internal/private_symbols.awk > src/sc.sym
-gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops  -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -DJEMALLOC_NO_PRIVATE_NAMESPACE -o src/sz.sym.o src/sz.c
-nm -a src/sz.sym.o | mawk -f include/jemalloc/internal/private_symbols.awk > src/sz.sym
-gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops  -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -DJEMALLOC_NO_PRIVATE_NAMESPACE -o src/tcache.sym.o src/tcache.c
-nm -a src/tcache.sym.o | mawk -f include/jemalloc/internal/private_symbols.awk > src/tcache.sym
-gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops  -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -DJEMALLOC_NO_PRIVATE_NAMESPACE -o src/test_hooks.sym.o src/test_hooks.c
-nm -a src/test_hooks.sym.o | mawk -f include/jemalloc/internal/private_symbols.awk > src/test_hooks.sym
-gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops  -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -DJEMALLOC_NO_PRIVATE_NAMESPACE -o src/ticker.sym.o src/ticker.c
-nm -a src/ticker.sym.o | mawk -f include/jemalloc/internal/private_symbols.awk > src/ticker.sym
-gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops  -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -DJEMALLOC_NO_PRIVATE_NAMESPACE -o src/tsd.sym.o src/tsd.c
-nm -a src/tsd.sym.o | mawk -f include/jemalloc/internal/private_symbols.awk > src/tsd.sym
-gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops  -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -DJEMALLOC_NO_PRIVATE_NAMESPACE -o src/witness.sym.o src/witness.c
-nm -a src/witness.sym.o | mawk -f include/jemalloc/internal/private_symbols.awk > src/witness.sym
-/bin/sh include/jemalloc/internal/private_namespace.sh src/jemalloc.sym src/arena.sym src/background_thread.sym src/base.sym src/bin.sym src/bitmap.sym src/ckh.sym src/ctl.sym src/div.sym src/extent.sym src/extent_dss.sym src/extent_mmap.sym src/hash.sym src/hook.sym src/large.sym src/log.sym src/malloc_io.sym src/mutex.sym src/mutex_pool.sym src/nstime.sym src/pages.sym src/prng.sym src/prof.sym src/rtree.sym src/safety_check.sym src/stats.sym src/sc.sym src/sz.sym src/tcache.sym src/test_hooks.sym src/ticker.sym src/tsd.sym src/witness.sym > include/jemalloc/internal/private_namespace.gen.h
-cp include/jemalloc/internal/private_namespace.gen.h include/jemalloc/internal/private_namespace.gen.h
-gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops  -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -o src/jemalloc.o src/jemalloc.c
-gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops  -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -o src/arena.o src/arena.c
-gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops  -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -o src/background_thread.o src/background_thread.c
-gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops  -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -o src/base.o src/base.c
-gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops  -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -o src/bin.o src/bin.c
-gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops  -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -o src/bitmap.o src/bitmap.c
-gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops  -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -o src/ckh.o src/ckh.c
-gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops  -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -o src/ctl.o src/ctl.c
-gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops  -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -o src/div.o src/div.c
-gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops  -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -o src/extent.o src/extent.c
-gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops  -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -o src/extent_dss.o src/extent_dss.c
-gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops  -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -o src/extent_mmap.o src/extent_mmap.c
-gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops  -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -o src/hash.o src/hash.c
-gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops  -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -o src/hook.o src/hook.c
-gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops  -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -o src/large.o src/large.c
-gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops  -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -o src/log.o src/log.c
-gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops  -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -o src/malloc_io.o src/malloc_io.c
-gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops  -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -o src/mutex.o src/mutex.c
-gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops  -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -o src/mutex_pool.o src/mutex_pool.c
-gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops  -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -o src/nstime.o src/nstime.c
-gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops  -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -o src/pages.o src/pages.c
-gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops  -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -o src/prng.o src/prng.c
-gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops  -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -o src/prof.o src/prof.c
-gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops  -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -o src/rtree.o src/rtree.c
-gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops  -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -o src/safety_check.o src/safety_check.c
-gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops  -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -o src/stats.o src/stats.c
-gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops  -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -o src/sc.o src/sc.c
-gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops  -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -o src/sz.o src/sz.c
-gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops  -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -o src/tcache.o src/tcache.c
-gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops  -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -o src/test_hooks.o src/test_hooks.c
-gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops  -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -o src/ticker.o src/ticker.c
-gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops  -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -o src/tsd.o src/tsd.c
-gcc -std=gnu99 -Wall -pipe -g3 -O3 -funroll-loops  -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -o src/witness.o src/witness.c
-g++ -Wall -Wextra -g3 -fvisibility=hidden -O3 -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -o src/jemalloc_cpp.o src/jemalloc_cpp.cpp
-ar crus lib/libjemalloc.a src/jemalloc.o src/arena.o src/background_thread.o src/base.o src/bin.o src/bitmap.o src/ckh.o src/ctl.o src/div.o src/extent.o src/extent_dss.o src/extent_mmap.o src/hash.o src/hook.o src/large.o src/log.o src/malloc_io.o src/mutex.o src/mutex_pool.o src/nstime.o src/pages.o src/prng.o src/prof.o src/rtree.o src/safety_check.o src/stats.o src/sc.o src/sz.o src/tcache.o src/test_hooks.o src/ticker.o src/tsd.o src/witness.o src/jemalloc_cpp.o
-ar: `u' modifier ignored since `D' is the default (see `U')
-make[3]: Leaving directory '/hugectr/notebooks/tmr/redis-7.0.8/deps/jemalloc'
-make[2]: Leaving directory '/hugectr/notebooks/tmr/redis-7.0.8/deps'
-    CC adlist.o
-    CC quicklist.o
-    CC ae.o
-    CC anet.o
-    CC dict.o
-    CC server.o
-    CC sds.o
-    CC zmalloc.o
-    CC lzf_c.o
-    CC lzf_d.o
-    CC pqsort.o
-    CC zipmap.o
-    CC sha1.o
-    CC ziplist.o
-    CC release.o
-    CC networking.o
-    CC util.o
-    CC object.o
-    CC db.o
-    CC replication.o
-    CC rdb.o
-    CC t_string.o
-    CC t_list.o
-    CC t_set.o
-    CC t_zset.o
-    CC t_hash.o
-    CC config.o
-    CC aof.o
-    CC pubsub.o
-    CC multi.o
-    CC debug.o
-    CC sort.o
-    CC intset.o
-    CC syncio.o
-    CC cluster.o
-    CC crc16.o
-    CC endianconv.o
-    CC slowlog.o
-    CC eval.o
-    CC bio.o
-    CC rio.o
-    CC rand.o
-    CC memtest.o
-    CC syscheck.o
-    CC crcspeed.o
-    CC crc64.o
-    CC bitops.o
-    CC sentinel.o
-    CC notify.o
-    CC setproctitle.o
-    CC blocked.o
-    CC hyperloglog.o
-    CC latency.o
-    CC sparkline.o
-    CC redis-check-rdb.o
-    CC redis-check-aof.o
-    CC geo.o
-    CC lazyfree.o
-    CC module.o
-    CC evict.o
-    CC expire.o
-    CC geohash.o
-    CC geohash_helper.o
-    CC childinfo.o
-    CC defrag.o
-    CC siphash.o
-    CC rax.o
-    CC t_stream.o
-    CC listpack.o
-    CC localtime.o
-    CC lolwut.o
-    CC lolwut5.o
-    CC lolwut6.o
-    CC acl.o
-    CC tracking.o
-    CC connection.o
-    CC tls.o
-    CC sha256.o
-    CC timeout.o
-    CC setcpuaffinity.o
-    CC monotonic.o
-    CC mt19937-64.o
-    CC resp_parser.o
-    CC call_reply.o
-    CC script_lua.o
-    CC script.o
-    CC functions.o
-    CC function_lua.o
-    CC commands.o
-    LINK redis-server
-    INSTALL redis-sentinel
-    CC redis-cli.o
-    CC redisassert.o
-    CC cli_common.o
-    LINK redis-cli
-    CC redis-benchmark.o
-    LINK redis-benchmark
-    INSTALL redis-check-rdb
-    INSTALL redis-check-aof
-
-Hint: It's a good idea to run 'make test' ;)
-
-make[1]: Leaving directory '/hugectr/notebooks/tmr/redis-7.0.8/src'

If the output ends with the message "Hint: It's a good idea to run 'make test' ;)" followed by "make[1]: Leaving directory ...", the compilation should have completed successfully.
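
The visual check described above can be automated if the output of make is captured into a string. The following is a minimal heuristic sketch, not a feature of Redis or HugeCTR (build_log_looks_successful is a hypothetical helper): it accepts the log only if the hint line is present and a "make[1]: Leaving directory" line appears after it.

```python
def build_log_looks_successful(log: str) -> bool:
    """Heuristic: True if the Redis 'make test' hint is followed by make[1] leaving its directory."""
    hint = "Hint: It's a good idea to run 'make test' ;)"
    lines = log.splitlines()
    for i, line in enumerate(lines):
        if line.strip() == hint:
            # Only accept the hint if a later line shows the sub-make exiting its directory.
            return any(l.strip().startswith("make[1]: Leaving directory") for l in lines[i + 1:])
    return False


if __name__ == "__main__":
    sample = (
        "    INSTALL redis-check-aof\n"
        "\n"
        "Hint: It's a good idea to run 'make test' ;)\n"
        "\n"
        "make[1]: Leaving directory '/tmp/redis/src'\n"
    )
    print(build_log_looks_successful(sample))  # True
```

This is only a text match; a failed build that still prints both lines would not be caught, so the error lines of the log remain the authoritative signal.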


Step 2: Configure a mock Redis cluster


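Set up TLS/SSL certificates. This step can be skipped if encryption is not needed, but the Redis configuration files and the HPS configuration below would then need their TLS settings removed.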


WARNING: The following commands will erase all contents of these directories: test_certs, redis-server-1, redis-server-2 and redis-server-3.

!mkdir -p test_certs
-!rm -f test_certs/*
-
-with open("test_certs/openssl.conf", "w") as f:
-    f.write("""[ redis_server ]
-keyUsage = digitalSignature, keyEncipherment
-
-[ hugectr_client ]
-keyUsage = digitalSignature, keyEncipherment
-nsCertType = client""")
-    
-# Create private keys for CA, Redis server and HugeCTR client.
-!openssl genrsa -out test_certs/ca-private.pem 4096
-!openssl genrsa -out test_certs/redis-private.pem 4096
-!openssl genrsa -out test_certs/hugectr-private.pem 4096
-
-# Create public keys for CA, Redis server and HugeCTR client.
-#!openssl rsa -pubout -in test_certs/ca-private.pem -out test_certs/ca-public.pem
-#!openssl rsa -pubout -in test_certs/redis-private.pem -out test_certs/redis-public.pem
-#!openssl rsa -pubout -in test_certs/hugectr-private.pem -out test_certs/hugectr-public.pem
-
-# Form dummy CA.
-!openssl req -new -nodes -sha256 -x509 -subj '/O=NVIDIA Merlin/CN=Certificate Authority' -days 365 \
-    -key test_certs/ca-private.pem \
-    -out test_certs/ca.crt
-    
-# Generate certificate for Redis server.
-!openssl req -new -sha256 -subj "/O=NVIDIA Merlin/CN=Redis Server" \
-    -key test_certs/redis-private.pem | \
-        openssl x509 -req -sha256 \
-            -CA test_certs/ca.crt \
-            -CAkey test_certs/ca-private.pem \
-            -CAserial test_certs/redis.ser \
-            -CAcreateserial \
-            -days 365 \
-            -extfile test_certs/openssl.conf -extensions redis_server \
-            -out test_certs/redis.crt
-
-# Generate certificate for HugeCTR client.
-!openssl req -new -sha256 -subj "/O=NVIDIA Merlin/CN=HugeCTR Redis Client" \
-        -key test_certs/hugectr-private.pem | \
-        openssl x509 \
-            -req -sha256 \
-            -CA test_certs/ca.crt \
-            -CAkey test_certs/ca-private.pem \
-            -CAserial test_certs/hugectr.ser \
-            -CAcreateserial \
-            -days 365 \
-            -extfile test_certs/openssl.conf -extensions hugectr_client \
-            -out test_certs/hugectr.crt
Certificate request self-signature ok
-subject=O = NVIDIA Merlin, CN = Redis Server
-Certificate request self-signature ok
-subject=O = NVIDIA Merlin, CN = HugeCTR Redis Client
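
The certificate chain generated above can be exercised from Python before HugeCTR is involved. The sketch below uses only the standard-library ssl and socket modules; make_client_context and tls_ping are hypothetical helpers, not HugeCTR APIs. It builds a client context that trusts test_certs/ca.crt and presents the HugeCTR client certificate, then sends an inline PING to one node, so it can only succeed once the servers started in Step 3 are running. Hostname checking is switched off as an assumption about this test setup: the dummy server certificate only carries the CN "Redis Server" and no subjectAltName, while the chain itself is still verified against the dummy CA.

```python
import socket
import ssl


def make_client_context(ca_file: str, cert_file: str, key_file: str) -> ssl.SSLContext:
    """Build a mutual-TLS client context from the files generated in test_certs/."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    # The dummy server certificate has no subjectAltName, so skip hostname matching
    # but keep requiring a chain that verifies against the dummy CA.
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_REQUIRED
    ctx.load_verify_locations(cafile=ca_file)
    # Redis asks TLS clients for a certificate by default, so present the HugeCTR one.
    ctx.load_cert_chain(certfile=cert_file, keyfile=key_file)
    return ctx


def tls_ping(host: str, port: int, ctx: ssl.SSLContext, timeout: float = 5.0) -> bool:
    """Send an inline PING over TLS and return True if the node answers +PONG."""
    with socket.create_connection((host, port), timeout=timeout) as raw:
        with ctx.wrap_socket(raw, server_hostname=host) as conn:
            conn.sendall(b"PING\r\n")
            reply = b""
            # Read until the RESP line terminator, or until the server closes the socket.
            while not reply.endswith(b"\r\n"):
                chunk = conn.recv(64)
                if not chunk:
                    break
                reply += chunk
    return reply == b"+PONG\r\n"
```

For example, tls_ping("127.0.0.1", 7000, make_client_context("test_certs/ca.crt", "test_certs/hugectr.crt", "test_certs/hugectr-private.pem")) would be expected to return True once the node on port 7000 is up; a handshake error at that point usually points at the certificates rather than at HugeCTR.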
!mkdir -p redis-server-1 redis-server-2 redis-server-3
-!rm -f redis-server-1/* redis-server-2/* redis-server-3/*
-
-!ln -sf $PWD/redis/src/redis-server redis-server-1/redis-server
-!ln -sf $PWD/redis/src/redis-server redis-server-2/redis-server
-!ln -sf $PWD/redis/src/redis-server redis-server-3/redis-server
-
-!ln -sf $PWD/test_certs/ca.crt redis-server-1/ca.crt
-!ln -sf $PWD/test_certs/ca.crt redis-server-2/ca.crt
-!ln -sf $PWD/test_certs/ca.crt redis-server-3/ca.crt
-
-!ln -sf $PWD/test_certs/redis-private.pem redis-server-1/private.pem
-!ln -sf $PWD/test_certs/redis-private.pem redis-server-2/private.pem
-!ln -sf $PWD/test_certs/redis-private.pem redis-server-3/private.pem
-
-!ln -sf $PWD/test_certs/redis.crt redis-server-1/redis.crt
-!ln -sf $PWD/test_certs/redis.crt redis-server-2/redis.crt
-!ln -sf $PWD/test_certs/redis.crt redis-server-3/redis.crt
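
The cell above creates one directory per node, empties it, and symlinks the Redis binary, the CA certificate, the Redis server key and the Redis server certificate into it under fixed names. A pathlib sketch of the same layout follows; prepare_node_dir is a hypothetical helper, and, like rm -f dir/*, it only deletes non-hidden entries that are not directories.

```python
from pathlib import Path

# Link name inside each node directory -> target path relative to the notebook directory.
NODE_LINKS = {
    "redis-server": "redis/src/redis-server",
    "ca.crt": "test_certs/ca.crt",
    "private.pem": "test_certs/redis-private.pem",
    "redis.crt": "test_certs/redis.crt",
}


def prepare_node_dir(base: Path, node_dir: str) -> Path:
    """Recreate base/node_dir with absolute symlinks, mirroring mkdir -p, rm -f and ln -sf."""
    base = base.resolve()
    node = base / node_dir
    node.mkdir(parents=True, exist_ok=True)
    # Like `rm -f node/*`: remove visible files and symlinks, leave directories alone.
    for entry in node.iterdir():
        if not entry.name.startswith(".") and (entry.is_symlink() or not entry.is_dir()):
            entry.unlink()
    for link_name, target in NODE_LINKS.items():
        # As with ln -sf, the link is created even if the target does not exist yet.
        (node / link_name).symlink_to(base / target)
    return node
```

Calling prepare_node_dir(Path.cwd(), f"redis-server-{i}") for i in 1, 2, 3 would reproduce the three directories; note that it also deletes a stale nodes.conf, just as the rm -f line does.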
%%writefile redis-server-1/redis.conf
-daemonize yes
-port 0
-cluster-enabled yes
-cluster-config-file nodes.conf
-tls-port 7000
-tls-ca-cert-file ca.crt
-tls-cert-file redis.crt
-tls-key-file private.pem
-tls-cluster yes
-appendonly no
-save ""
Writing redis-server-1/redis.conf
%%writefile redis-server-2/redis.conf
-daemonize yes
-port 0
-cluster-enabled yes
-cluster-config-file nodes.conf
-tls-port 7001
-tls-ca-cert-file ca.crt
-tls-cert-file redis.crt
-tls-key-file private.pem
-tls-cluster yes
-appendonly no
-save ""
Writing redis-server-2/redis.conf
%%writefile redis-server-3/redis.conf
-daemonize yes
-port 0
-cluster-enabled yes
-cluster-config-file nodes.conf
-tls-port 7002
-tls-ca-cert-file ca.crt
-tls-cert-file redis.crt
-tls-key-file private.pem
-tls-cluster yes
-appendonly no
-save ""
Writing redis-server-3/redis.conf
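
The three redis.conf files above differ only in tls-port (7000, 7001, 7002); port 0 disables the plaintext listener, so every client and the cluster bus must use TLS. A sketch that renders all three from one template instead of three %%writefile cells is shown below; render_redis_conf and write_cluster_confs are hypothetical helpers, and the directive values are copied verbatim from the cells above.

```python
from pathlib import Path

REDIS_CONF_TEMPLATE = """\
daemonize yes
port 0
cluster-enabled yes
cluster-config-file nodes.conf
tls-port {tls_port}
tls-ca-cert-file ca.crt
tls-cert-file redis.crt
tls-key-file private.pem
tls-cluster yes
appendonly no
save ""
"""


def render_redis_conf(tls_port: int) -> str:
    """Return the redis.conf for one node; only the TLS port varies between nodes."""
    return REDIS_CONF_TEMPLATE.format(tls_port=tls_port)


def write_cluster_confs(base: Path, first_port: int = 7000, num_nodes: int = 3) -> None:
    """Write redis-server-<i>/redis.conf for i = 1..num_nodes on consecutive TLS ports."""
    for i in range(num_nodes):
        node = base / f"redis-server-{i + 1}"
        node.mkdir(parents=True, exist_ok=True)
        (node / "redis.conf").write_text(render_redis_conf(first_port + i))
```

Keeping a single template makes it harder for the three files to drift apart when, for example, persistence settings are changed later.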

Step 3: Form Redis cluster


WARNING: The following command will shut down every process named redis-server on the current system!

# Shutdown existing cluster (if any).
-!pkill redis-server
-
-# Reset configuration and start 3 Redis servers.
-!cd redis-server-1 && rm -f nodes.conf && ./redis-server redis.conf
-!cd redis-server-2 && rm -f nodes.conf && ./redis-server redis.conf
-!cd redis-server-3 && rm -f nodes.conf && ./redis-server redis.conf
-
-# Form the cluster.
-!redis/src/redis-cli \
-    --cluster create 127.0.0.1:7000 127.0.0.1:7001 127.0.0.1:7002 \
-    --cluster-yes \
-    --tls \
-    --cacert test_certs/ca.crt \
-    --cert test_certs/hugectr.crt \
-    --key test_certs/hugectr-private.pem
>>> Performing hash slots allocation on 3 nodes...
-Master[0] -> Slots 0 - 5460
-Master[1] -> Slots 5461 - 10922
-Master[2] -> Slots 10923 - 16383
-M: a441806db5506b7600ee8ae794fa01dc31ac83c9 127.0.0.1:7000
-   slots:[0-5460] (5461 slots) master
-M: 6fa93392a396aa3c321736234b7eafc86bb1f979 127.0.0.1:7001
-   slots:[5461-10922] (5462 slots) master
-M: 8e9cd68cc229fcb568a84d7358011201b4246046 127.0.0.1:7002
-   slots:[10923-16383] (5461 slots) master
->>> Nodes configuration updated
->>> Assign a different config epoch to each node
->>> Sending CLUSTER MEET messages to join the cluster
-Waiting for the cluster to join
-..
->>> Performing Cluster Check (using node 127.0.0.1:7000)
-M: a441806db5506b7600ee8ae794fa01dc31ac83c9 127.0.0.1:7000
-   slots:[0-5460] (5461 slots) master
-M: 8e9cd68cc229fcb568a84d7358011201b4246046 127.0.0.1:7002
-   slots:[10923-16383] (5461 slots) master
-M: 6fa93392a396aa3c321736234b7eafc86bb1f979 127.0.0.1:7001
-   slots:[5461-10922] (5462 slots) master
-[OK] All nodes agree about slots configuration.
->>> Check for open slots...
->>> Check slots coverage...
-[OK] All 16384 slots covered.
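
The allocation printed above splits the 16384 Redis Cluster hash slots into three contiguous ranges, one per master. A key is routed to slot CRC16(key) mod 16384 using the XMODEM variant of CRC16; if the key contains a non-empty {...} hash tag, only the first such tag is hashed. The sketch below implements that rule and reproduces the three ranges. split_slots is a hypothetical helper whose rounding is an assumption chosen to match the ranges printed here, not a copy of redis-cli's source, and how HPS names the keys it stores is not covered.

```python
NUM_SLOTS = 16384


def crc16_xmodem(data: bytes) -> int:
    """CRC16-CCITT, XMODEM variant: polynomial 0x1021, initial value 0."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def key_slot(key: bytes) -> int:
    """Hash slot of a key, honoring the first non-empty {hash tag}."""
    start = key.find(b"{")
    if start != -1:
        end = key.find(b"}", start + 1)
        if end > start + 1:  # an empty tag "{}" means the whole key is hashed
            key = key[start + 1:end]
    return crc16_xmodem(key) % NUM_SLOTS


def split_slots(num_masters: int) -> list[tuple[int, int]]:
    """Contiguous, inclusive slot ranges per master (rounding assumed, see lead-in)."""
    per_node = NUM_SLOTS / num_masters
    ranges, first, cursor = [], 0, 0.0
    for i in range(num_masters):
        last = int(cursor + per_node - 1 + 0.5)  # round half up; values are non-negative
        if i == num_masters - 1 or last >= NUM_SLOTS:
            last = NUM_SLOTS - 1  # the last master always takes the remaining slots
        last = max(last, first)
        ranges.append((first, last))
        first, cursor = last + 1, cursor + per_node
    return ranges
```

With split_slots(3) returning (0, 5460), (5461, 10922) and (10923, 16383), a key whose key_slot falls in 5461 to 10922 would be served by the master on 127.0.0.1:7001 in this run.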

Step 4: Run HugeCTR

import pandas as pd
-import numpy as np
-import onnxruntime as ort
-from hugectr import DatabaseType_t
-from hugectr.inference import HPS, ParameterServerConfig, InferenceParams, VolatileDatabaseParams
-
-slot_size_array = [10000, 10000, 10000, 10000]
-key_offset = np.insert(np.cumsum(slot_size_array), 0, 0)[:-1]
-batch_size = 1024
-
-print('Launching...')
-
-# 1. Configure the HPS hyperparameters.
-ps_config = ParameterServerConfig(
-       emb_table_name = {'hps_demo': ['sparse_embedding1', 'sparse_embedding2']},
-       embedding_vec_size = {'hps_demo': [16, 32]},
-       max_feature_num_per_sample_per_emb_table = {'hps_demo': [2, 2]},
-       inference_params_array = [
-          InferenceParams(
-            model_name = 'hps_demo',
-            max_batchsize = batch_size,
-            hit_rate_threshold = 1.0,
-            dense_model_file = '',
-            sparse_model_files = ['hps_demo0_sparse_1000.model', 'hps_demo1_sparse_1000.model'],
-            deployed_devices = [0],
-            use_gpu_embedding_cache = True,
-            cache_size_percentage = 0.5,
-            i64_input_key = True)
-       ],
-       volatile_db = VolatileDatabaseParams(
-            DatabaseType_t.redis_cluster,
-            address = '127.0.0.1:7000',
-            num_partitions = 15,
-            num_node_connections = 5,
-            enable_tls = True,
-            tls_ca_certificate = 'test_certs/ca.crt',
-            tls_client_certificate = 'test_certs/hugectr.crt',
-            tls_client_key = 'test_certs/hugectr-private.pem',
-            tls_server_name_identification = 'redis.localhost',
-       ))
-
-# 2. Initialize the HPS object.
-hps = HPS(ps_config)
-print('HPS initialized')
-
-# 3. Load query data.
-df = pd.read_parquet('data_parquet/val/gen_0.parquet')
-dense_input_columns = df.columns[1:11]
-cat_input1_columns = df.columns[11:13]
-cat_input2_columns = df.columns[13:15]
-dense_input = df[dense_input_columns].loc[0:batch_size-1].to_numpy(dtype=np.float32)
-cat_input1 = (df[cat_input1_columns].loc[0:batch_size-1].to_numpy(dtype=np.int64) + key_offset[0:2]).reshape((batch_size, 2, 1))
-cat_input2 = (df[cat_input2_columns].loc[0:batch_size-1].to_numpy(dtype=np.int64) + key_offset[2:4]).reshape((batch_size, 2, 1))
-
-# 4. Make inference from the HPS object and the ONNX inference session of `hps_demo_without_embedding.onnx`.
-embedding1 = hps.lookup(cat_input1.flatten(), 'hps_demo', 0).reshape(batch_size, 2, 16)
-embedding2 = hps.lookup(cat_input2.flatten(), 'hps_demo', 1).reshape(batch_size, 2, 32)
-sess = ort.InferenceSession('hps_demo_without_embedding.onnx')
-res = sess.run(output_names=[sess.get_outputs()[0].name],
-               input_feed={sess.get_inputs()[0].name: dense_input,
-               sess.get_inputs()[1].name: embedding1,
-               sess.get_inputs()[2].name: embedding2})
-pred = res[0].flatten()
-
-# 5. Check the correctness by comparing with dumped evaluation results.
-ground_truth = np.load("ground_truth.npy").flatten()
-print('-------------------------------------------------------------------------------')
-print('                         HPS demo without embedding                            ')
-print('-------------------------------------------------------------------------------')
-print(f'Ground truth: {ground_truth.shape} = {ground_truth}')
-print('-------------------------------------------------------------------------------')
-print(f'Prediction without embedding: {pred.shape} = {pred}')
-
-diff = pred - ground_truth
-mse = np.mean(diff * diff)
-print(f'MSE between prediction and ground_truth: {mse}')
-
-# 6. Make inference with the ONNX inference session of `hps_demo_with_embedding.onnx` (double check).
-sess_ref = ort.InferenceSession('hps_demo_with_embedding.onnx')
-res_ref = sess_ref.run(output_names=[sess_ref.get_outputs()[0].name],
-               input_feed={sess_ref.get_inputs()[0].name: dense_input,
-               sess_ref.get_inputs()[1].name: cat_input1,
-               sess_ref.get_inputs()[2].name: cat_input2})
-pred_ref = res_ref[0].flatten()
-
-print('-------------------------------------------------------------------------------')
-print('                           HPS demo with embedding                             ')
-print('-------------------------------------------------------------------------------')
-print(f'Ground truth: {ground_truth.shape} = {ground_truth}')
-print('-------------------------------------------------------------------------------')
-print(f'Prediction with embedding: {pred_ref.shape} = {pred_ref}')
-
-diff_ref = pred_ref.flatten() - ground_truth
-mse_ref = np.mean(diff_ref * diff_ref)
-print(f'MSE between prediction and ground_truth: {mse_ref}')
Launching...
-[HCTR][07:00:07.643][WARNING][RK0][main]: default_value_for_each_table.size() is not equal to the number of embedding tables
-HPS initialized
-====================================================HPS Create====================================================
-[HCTR][07:00:07.643][INFO][RK0][main]: Creating RedisCluster backend...
-[HCTR][07:00:07.644][INFO][RK0][main]: RedisCluster: Connecting via 127.0.0.1:7000...
-[HCTR][07:00:07.667][INFO][RK0][main]: Volatile DB: initial cache rate = 1
-[HCTR][07:00:07.667][INFO][RK0][main]: Volatile DB: cache missed embeddings = 0
-[HCTR][07:00:07.667][DEBUG][RK0][main]: Created raw model loader in local memory!
-[HCTR][07:00:07.894][INFO][RK0][main]: Table: hps_et.hps_demo.sparse_embedding1; cached 18488 / 18488 embeddings in volatile database (RedisCluster); load: 18488 / 18446744073709551615 (0.00%).
-[HCTR][07:00:07.984][INFO][RK0][main]: Table: hps_et.hps_demo.sparse_embedding2; cached 18470 / 18470 embeddings in volatile database (RedisCluster); load: 18470 / 18446744073709551615 (0.00%).
-[HCTR][07:00:07.984][DEBUG][RK0][main]: Real-time subscribers created!
-[HCTR][07:00:07.984][INFO][RK0][main]: Creating embedding cache in device 0.
-[HCTR][07:00:07.990][INFO][RK0][main]: Model name: hps_demo
-[HCTR][07:00:07.990][INFO][RK0][main]: Max batch size: 1024
-[HCTR][07:00:07.990][INFO][RK0][main]: Fuse embedding tables: False
-[HCTR][07:00:07.990][INFO][RK0][main]: Number of embedding tables: 2
-[HCTR][07:00:07.990][INFO][RK0][main]: Use GPU embedding cache: True, cache size percentage: 0.500000
-[HCTR][07:00:07.990][INFO][RK0][main]: Embedding cache type: dynamic
-[HCTR][07:00:07.990][INFO][RK0][main]: Use I64 input key: True
-[HCTR][07:00:07.990][INFO][RK0][main]: Configured cache hit rate threshold: 1.000000
-[HCTR][07:00:07.990][INFO][RK0][main]: The size of thread pool: 80
-[HCTR][07:00:07.990][INFO][RK0][main]: The size of worker memory pool: 2
-[HCTR][07:00:07.990][INFO][RK0][main]: The size of refresh memory pool: 1
-[HCTR][07:00:07.990][INFO][RK0][main]: The refresh percentage : 0.000000
-[HCTR][07:00:07.995][INFO][RK0][main]: LookupSession i64_input_key: True
-[HCTR][07:00:07.995][INFO][RK0][main]: Creating lookup session for hps_demo on device: 0
-[HCTR][07:00:07.998][INFO][RK0][main]: RedisCluster: Awaiting background worker to conclude...
-[HCTR][07:00:07.998][INFO][RK0][main]: RedisCluster: Disconnecting...
--------------------------------------------------------------------------------
-                         HPS demo without embedding                            
--------------------------------------------------------------------------------
-Ground truth: (1024,) = [0.4895492  0.509022   0.38192913 ... 0.5264926  0.50650454 0.47927693]
--------------------------------------------------------------------------------
-Prediction without embedding: (1024,) = [0.48954916 0.50902206 0.38192907 ... 0.52649266 0.5065045  0.4792769 ]
-MSE between prediction and ground_truth: 2.3887142264200634e-15
--------------------------------------------------------------------------------
-                           HPS demo with embedding                             
--------------------------------------------------------------------------------
-Ground truth: (1024,) = [0.4895492  0.509022   0.38192913 ... 0.5264926  0.50650454 0.47927693]
--------------------------------------------------------------------------------
-Prediction with embedding: (1024,) = [0.48954916 0.50902206 0.38192907 ... 0.52649266 0.5065045  0.4792769 ]
-MSE between prediction and ground_truth: 2.3887142264200634e-15
-
-
-
2023-09-20 07:00:08.022623188 [W:onnxruntime:, graph.cc:3543 CleanUnusedInitializersAndNodeArgs] Removing initializer 'key_to_indice_hash_all_tables'. It is not used by any node and should be removed from the model.
-
-
-
-
-

Step 5: Shut down the Redis cluster

-
-
-
!pkill redis-server
-
-
-
-
-
-
\ No newline at end of file
diff --git a/review/pr-458/notebooks/hugectr_e2e_demo_with_nvtabular.html b/review/pr-458/notebooks/hugectr_e2e_demo_with_nvtabular.html
deleted file mode 100644
index b054433bff..0000000000
--- a/review/pr-458/notebooks/hugectr_e2e_demo_with_nvtabular.html
+++ /dev/null
@@ -1,1362 +0,0 @@
HugeCTR End-end Example with NVTabular — Merlin HugeCTR documentation
# Copyright 2021 NVIDIA Corporation. All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-#     http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-# ==============================================================================
-
-# Each user is responsible for checking the content of datasets and the
-# applicable licenses and determining if suitable for the intended use.
-
-
-
-
-

HugeCTR End-to-End Example with NVTabular

-
-

Overview

-

In this sample notebook, we are going to:

-
  1. Preprocess data using NVTabular.
  2. Train a model with HugeCTR.
-
-
-

Setup

-

To set up the environment, refer to HugeCTR Example Notebooks and follow the instructions there before running the following cells.

-
-
-

Data Preparation

-
-
-
import os
-import shutil
-
-
-
-
-
-
-
!mkdir -p /hugectr_e2e
-!mkdir -p /hugectr_e2e/criteo/train
-!mkdir -p /hugectr_e2e/criteo/val
-!mkdir -p /hugectr_e2e/model
-
-
-
-
-
-
-
BASE_DIR = os.environ.get("BASE_DIR", "/hugectr_e2e")
-DATA_DIR = os.environ.get("DATA_DIR", BASE_DIR + "/criteo")
-TRAIN_DIR = os.environ.get("TRAIN_DIR", DATA_DIR +"/train")
-VAL_DIR = os.environ.get("VAL_DIR", DATA_DIR +"/val")
-MODEL_DIR = os.environ.get("MODEL_DIR", BASE_DIR + "/model")
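The `!mkdir -p` cell hardcodes `/hugectr_e2e`, but the paths above can be overridden through environment variables. If `BASE_DIR` is changed, the two cells no longer agree. The following is a minimal sketch, assuming you want a single step that creates whichever layout the environment variables resolve to. The function name `prepare_dirs` is hypothetical.

```python
import os

def prepare_dirs():
    """Create the directory layout used by this notebook and return the resolved paths.

    Each path honours the same environment variable as the cell above, so
    overriding BASE_DIR (or DATA_DIR, TRAIN_DIR, ...) also changes where the
    directories are created.
    """
    base = os.environ.get("BASE_DIR", "/hugectr_e2e")
    data = os.environ.get("DATA_DIR", os.path.join(base, "criteo"))
    dirs = {
        "BASE_DIR": base,
        "DATA_DIR": data,
        "TRAIN_DIR": os.environ.get("TRAIN_DIR", os.path.join(data, "train")),
        "VAL_DIR": os.environ.get("VAL_DIR", os.path.join(data, "val")),
        "MODEL_DIR": os.environ.get("MODEL_DIR", os.path.join(base, "model")),
    }
    for path in dirs.values():
        os.makedirs(path, exist_ok=True)  # same effect as `mkdir -p`
    return dirs
```

Calling `prepare_dirs()` twice is harmless, just like re-running `mkdir -p`.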
-
-
-
-
-

Download the Criteo data for 1 day:

-
-
-
#!wget -P $DATA_DIR https://storage.googleapis.com/criteo-cail-datasets/day_0.gz  # Uncomment this line and the next to download the data; otherwise, symlink existing data.
-#!gzip -d -c $DATA_DIR/day_0.gz > $DATA_DIR/day_0
-INPUT_DATA = os.environ.get("INPUT_DATA", DATA_DIR + "/day_0")
-!ln -s $INPUT_DATA $DATA_DIR/day_0
-
-
-
-
-
ln: failed to create symbolic link '/hugectr_e2e/criteo/day_0': File exists
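The `File exists` message appears because `INPUT_DATA` defaults to `$DATA_DIR/day_0`, which is also the link target. On a re-run, the link or file is already present. Below is a minimal sketch of an idempotent alternative. It assumes you call it in place of `!ln -s`, for example as `link_input_data(INPUT_DATA, os.path.join(DATA_DIR, "day_0"))`. The helper name is hypothetical.

```python
import os

def link_input_data(input_path, link_path):
    """Create link_path -> input_path unless that would fail or be pointless.

    Returns True if a new symlink was created and False if link_path already
    exists (file, directory or link) or would point at itself.
    """
    if os.path.abspath(input_path) == os.path.abspath(link_path):
        return False  # INPUT_DATA defaults to DATA_DIR/day_0: nothing to link
    if os.path.lexists(link_path):
        return False  # a previous run (or the real file) is already in place
    os.symlink(os.path.abspath(input_path), link_path)
    return True
```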
-
-
-
-
-

Split the data into training and validation sets

-
-
-
!head -n 10000000 $DATA_DIR/day_0 > $DATA_DIR/train/train.txt
-!tail -n 2000000 $DATA_DIR/day_0 > $DATA_DIR/val/test.txt 
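The cell above uses the first 10,000,000 lines of `day_0` for training and the last 2,000,000 lines for validation. The two sets are disjoint only if the file has at least 12,000,000 lines. Here is a stdlib sketch of the same `head`/`tail` split, useful outside a shell. The function name is hypothetical.

```python
import collections
import itertools

def split_head_tail(src, train_dst, val_dst, n_train=10_000_000, n_val=2_000_000):
    """Write the first n_train lines of src to train_dst and the last n_val lines to val_dst.

    Behaves like `head -n` / `tail -n`: if src has fewer than
    n_train + n_val lines, the two outputs overlap. Returns the number of
    validation lines written.
    """
    with open(src) as f, open(train_dst, "w") as train_out:
        train_out.writelines(itertools.islice(f, n_train))
    # Second pass: a bounded deque keeps only the last n_val lines in memory
    with open(src) as f:
        tail = collections.deque(f, maxlen=n_val)
    with open(val_dst, "w") as val_out:
        val_out.writelines(tail)
    return len(tail)
```

The deque holds all validation lines in memory at once. That is acceptable for 2,000,000 rows, but for much larger tails the shell `tail` is the better tool.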
-
-
-
-
-
-
-

Data Preprocessing using NVTabular

-
-
-
import warnings
-
-warnings.filterwarnings("ignore")
-warnings.simplefilter("ignore", UserWarning)
-
-import os
-import sys
-import argparse
-import glob
-import time
-import numpy as np
-import shutil
-import numba
-
-import dask_cudf
-from dask_cuda import LocalCUDACluster
-from dask.distributed import Client
-
-import nvtabular as nvt
-from merlin.core.compat import device_mem_size, pynvml_mem_size
-from nvtabular.ops import (
-    Categorify,
-    Clip,
-    FillMissing,
-    Normalize,
-    get_embedding_sizes,
-)
-
-import logging
-
-logging.basicConfig(format="%(asctime)s %(message)s")
-logging.root.setLevel(logging.NOTSET)
-logging.getLogger("numba").setLevel(logging.WARNING)
-logging.getLogger("asyncio").setLevel(logging.WARNING)
-
-# define dataset schema
-CATEGORICAL_COLUMNS=["C" + str(x) for x in range(1, 27)]
-CONTINUOUS_COLUMNS=["I" + str(x) for x in range(1, 14)]
-LABEL_COLUMNS = ['label']
-COLUMNS =  LABEL_COLUMNS + CONTINUOUS_COLUMNS +  CATEGORICAL_COLUMNS
-# The /samples/criteo model doesn't have dense features
-criteo_COLUMN=LABEL_COLUMNS +  CATEGORICAL_COLUMNS
-# Feature-cross columns, each built by concatenating two categorical columns
-CROSS_COLUMNS = ["C1_C2", "C3_C4"]
-
-NUM_INTEGER_COLUMNS = 13
-NUM_CATEGORICAL_COLUMNS = 26
-NUM_TOTAL_COLUMNS = 1 + NUM_INTEGER_COLUMNS + NUM_CATEGORICAL_COLUMNS
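According to this schema, each raw row has 40 tab-separated fields (`NUM_TOTAL_COLUMNS`): a label, 13 integer features and 26 categorical features. This layout is what `dask_cudf.read_csv(..., sep='\t', names=...)` relies on later. Below is a stdlib sketch that parses one row. It assumes empty fields mean "missing" and that categorical values are kept as the raw hexadecimal strings found in the public Criteo logs. The function name is hypothetical.

```python
CATEGORICAL_COLUMNS = ["C" + str(x) for x in range(1, 27)]
CONTINUOUS_COLUMNS = ["I" + str(x) for x in range(1, 14)]
LABEL_COLUMNS = ["label"]
COLUMNS = LABEL_COLUMNS + CONTINUOUS_COLUMNS + CATEGORICAL_COLUMNS
_CATEGORICAL = set(CATEGORICAL_COLUMNS)

def parse_criteo_line(line):
    """Split one tab-separated Criteo row into a dict keyed by COLUMNS.

    Empty fields become None, the label and integer features become int,
    and categorical features stay as their raw hexadecimal strings.
    """
    fields = line.rstrip("\n").split("\t")
    if len(fields) != len(COLUMNS):
        raise ValueError(f"expected {len(COLUMNS)} fields, got {len(fields)}")
    row = {}
    for name, value in zip(COLUMNS, fields):
        if value == "":
            row[name] = None
        elif name in _CATEGORICAL:
            row[name] = value
        else:
            row[name] = int(value)
    return row
```

Missing values are common in both groups. That is why the preprocessing pipeline below fills them before normalizing, and clips the integer features at 0.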
-
-
-
-
-
-
-
# Dask dashboard
-dashboard_port = "8787"
-
-# Deploy a Single-Machine Multi-GPU Cluster
-protocol = "tcp"  # "tcp" or "ucx"
-if numba.cuda.is_available():
-    NUM_GPUS = list(range(len(numba.cuda.gpus)))
-else:
-    NUM_GPUS = []
-visible_devices = ",".join([str(n) for n in NUM_GPUS])  # Select devices to place workers
-device_limit_frac = 0.7  # Spill GPU-Worker memory to host at this limit.
-device_pool_frac = 0.8
-part_mem_frac = 0.15
-
-# Use the total device size to calculate device_limit, device_pool_size and part_size
-device_size = device_mem_size(kind="total")
-device_limit = int(device_limit_frac * device_size)
-device_pool_size = int(device_pool_frac * device_size)
-part_size = int(part_mem_frac * device_size)
-
-# Check if any device memory is already occupied
-for dev in visible_devices.split(","):
-    fmem = pynvml_mem_size(kind="free", index=int(dev))
-    used = (device_size - fmem) / 1e9
-    if used > 1.0:
-        warnings.warn(f"BEWARE - {used} GB is already occupied on device {int(dev)}!")
-
-cluster = None  # (Optional) Specify existing scheduler port
-if cluster is None:
-    cluster = LocalCUDACluster(
-        protocol=protocol,
-        n_workers=len(visible_devices.split(",")),
-        CUDA_VISIBLE_DEVICES=visible_devices,
-        device_memory_limit=device_limit,
-        dashboard_address=":" + dashboard_port,
-        rmm_pool_size=(device_pool_size // 256) * 256
-    )
-
-# Create the distributed client
-client = Client(cluster)
-client
-
-
-
-
-
2023-05-26 03:09:04,061 - distributed.preloading - INFO - Creating preload: dask_cuda.initialize
-2023-05-26 03:09:04,061 - distributed.preloading - INFO - Import preload module: dask_cuda.initialize
-2023-05-26 03:09:04,062 - distributed.preloading - INFO - Creating preload: dask_cuda.initialize
-2023-05-26 03:09:04,062 - distributed.preloading - INFO - Import preload module: dask_cuda.initialize
-2023-05-26 03:09:04,063 - distributed.preloading - INFO - Creating preload: dask_cuda.initialize
-2023-05-26 03:09:04,063 - distributed.preloading - INFO - Import preload module: dask_cuda.initialize
-2023-05-26 03:09:04,064 - distributed.preloading - INFO - Creating preload: dask_cuda.initialize
-2023-05-26 03:09:04,064 - distributed.preloading - INFO - Import preload module: dask_cuda.initialize
-2023-05-26 03:09:04,072 - distributed.preloading - INFO - Creating preload: dask_cuda.initialize
-2023-05-26 03:09:04,072 - distributed.preloading - INFO - Import preload module: dask_cuda.initialize
-2023-05-26 03:09:04,085 - distributed.preloading - INFO - Creating preload: dask_cuda.initialize
-2023-05-26 03:09:04,085 - distributed.preloading - INFO - Import preload module: dask_cuda.initialize
-2023-05-26 03:09:04,087 - distributed.preloading - INFO - Creating preload: dask_cuda.initialize
-2023-05-26 03:09:04,087 - distributed.preloading - INFO - Import preload module: dask_cuda.initialize
-2023-05-26 03:09:04,093 - distributed.preloading - INFO - Creating preload: dask_cuda.initialize
-2023-05-26 03:09:04,093 - distributed.preloading - INFO - Import preload module: dask_cuda.initialize
-
-
-
-
-
-

Client: Client-acc90f7f-fb72-11ed-808f-54ab3adac0a5
Connection method: Cluster object | Cluster type: dask_cuda.LocalCUDACluster
Dashboard: http://127.0.0.1:8787/status
-
-
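The cluster cell turns three fractions of the device's total memory into byte budgets. `device_memory_limit` is the point at which each dask-cuda worker starts spilling to host memory. `rmm_pool_size` is rounded down to a multiple of 256 bytes. In the code shown in this notebook, `part_size` is computed but not passed to `nvt.Dataset` later. Below is a sketch of the same arithmetic for a hypothetical device with exactly 32 GiB; the real `device_mem_size(kind="total")` value will differ slightly. The function name is hypothetical.

```python
def memory_budget(device_size, limit_frac=0.7, pool_frac=0.8, part_frac=0.15, align=256):
    """Byte budgets derived from the fractions used in the cluster cell.

    device_limit:  per-worker threshold at which device memory is spilled to host
    rmm_pool_size: RMM pool size, rounded down to a multiple of `align` bytes
    part_size:     target partition size
    """
    device_limit = int(limit_frac * device_size)
    rmm_pool_size = (int(pool_frac * device_size) // align) * align
    part_size = int(part_frac * device_size)
    return {"device_limit": device_limit, "rmm_pool_size": rmm_pool_size, "part_size": part_size}

budget = memory_budget(32 * 1024**3)
for name, value in budget.items():
    print(f"{name:>14}: {value / 1e9:6.2f} GB")
```

With these defaults the spill threshold (70%) sits below the RMM pool size (80%). Workers therefore begin spilling to host memory before the pre-allocated pool is exhausted.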
-
-
-
train_output = os.path.join(DATA_DIR, "train")
-print("Training output data: "+train_output)
-val_output = os.path.join(DATA_DIR, "val")
-print("Validation output data: "+val_output)
-train_input = os.path.join(DATA_DIR, "train/train.txt")
-print("Training dataset: "+train_input)
-val_input = os.path.join(DATA_DIR, "val/test.txt")
-PREPROCESS_DIR_temp_train = os.path.join(DATA_DIR, 'train/temp-parquet-after-conversion')  
-PREPROCESS_DIR_temp_val = os.path.join(DATA_DIR, "val/temp-parquet-after-conversion")
-if not os.path.exists(PREPROCESS_DIR_temp_train):
-    os.makedirs(PREPROCESS_DIR_temp_train)
-
-if not os.path.exists(PREPROCESS_DIR_temp_val):
-    os.makedirs(PREPROCESS_DIR_temp_val)
-
-PREPROCESS_DIR_temp = [PREPROCESS_DIR_temp_train, PREPROCESS_DIR_temp_val]
-
-# Make sure we have a clean parquet space for cudf conversion
-for one_path in PREPROCESS_DIR_temp:
-    if os.path.exists(one_path):
-        shutil.rmtree(one_path)
-    os.mkdir(one_path)
-
-#calculate the total processing time
-runtime = time.time()
-
-## train/valid txt to parquet
-train_valid_paths = [(train_input,PREPROCESS_DIR_temp_train),(val_input,PREPROCESS_DIR_temp_val)]
-
-for input, temp_output in train_valid_paths:
-
-    ddf = dask_cudf.read_csv(input,sep='\t',names=LABEL_COLUMNS + CONTINUOUS_COLUMNS + CATEGORICAL_COLUMNS)
-    
-    if CROSS_COLUMNS:
-        for pair in CROSS_COLUMNS:
-            feature_pair = pair.split("_")
-            ddf[pair] = ddf[feature_pair[0]] + ddf[feature_pair[1]]
-
-    ddf["label"] = ddf['label'].astype('float32')
-    ddf[CONTINUOUS_COLUMNS] = ddf[CONTINUOUS_COLUMNS].astype('float32')
-
-    # Save it as parquet format for better memory usage
-    ddf.to_parquet(temp_output,header=True)
-    ##-----------------------------------##
-
-COLUMNS =  LABEL_COLUMNS + CONTINUOUS_COLUMNS + CROSS_COLUMNS + CATEGORICAL_COLUMNS
-train_paths = glob.glob(os.path.join(PREPROCESS_DIR_temp_train, "*.parquet"))
-valid_paths = glob.glob(os.path.join(PREPROCESS_DIR_temp_val, "*.parquet"))
-
-categorify_op = Categorify()
-cat_features = CATEGORICAL_COLUMNS >> categorify_op
-cont_features = CONTINUOUS_COLUMNS >> FillMissing() >> Clip(min_value=0) >> Normalize()
-cross_cat_op = Categorify()
-
-features = LABEL_COLUMNS
-
-features += cont_features
-if CROSS_COLUMNS:
-    for pair in CROSS_COLUMNS:
-        features += [pair] >> cross_cat_op
-
-features += cat_features
-
-workflow = nvt.Workflow(features, client=client)
-
-logging.info("Preprocessing")
-
-output_format = 'parquet'
-
-# just for /samples/criteo model
-train_ds_iterator = nvt.Dataset(train_paths, engine='parquet')
-valid_ds_iterator = nvt.Dataset(valid_paths, engine='parquet')
-
-shuffle = nvt.io.Shuffle.PER_PARTITION
-
-logging.info('Train Datasets Preprocessing.....')
-
-dict_dtypes = {}
-for col in CATEGORICAL_COLUMNS:
-    dict_dtypes[col] = np.int64
-for col in CONTINUOUS_COLUMNS:
-    dict_dtypes[col] = np.float32
-for col in CROSS_COLUMNS:
-    dict_dtypes[col] = np.int64
-for col in LABEL_COLUMNS:
-    dict_dtypes[col] = np.float32
-
-conts = CONTINUOUS_COLUMNS
-
-workflow.fit(train_ds_iterator)
-
-if output_format == 'hugectr':
-    workflow.transform(train_ds_iterator).to_hugectr(
-            cats=CATEGORICAL_COLUMNS + CROSS_COLUMNS,
-            conts=conts,
-            labels=LABEL_COLUMNS,
-            output_path=train_output,
-            shuffle=shuffle)
-else:
-    workflow.transform(train_ds_iterator).to_parquet(
-            output_path=train_output,
-            dtypes=dict_dtypes,
-            cats=CATEGORICAL_COLUMNS + CROSS_COLUMNS,
-            conts=conts,
-            labels=LABEL_COLUMNS,
-            shuffle=shuffle)
-
-###Getting slot size###    
-#--------------------##
-embeddings_dict_cat = categorify_op.get_embedding_sizes(CATEGORICAL_COLUMNS)
-embeddings_dict_cross = cross_cat_op.get_embedding_sizes(CROSS_COLUMNS)
-embeddings = [embeddings_dict_cat[c][0] for c in CATEGORICAL_COLUMNS] + [embeddings_dict_cross[c][0] for c in CROSS_COLUMNS]
-
-print(embeddings)
-##--------------------##
-
-logging.info('Valid Datasets Preprocessing.....')
-
-if output_format == 'hugectr':
-    workflow.transform(valid_ds_iterator).to_hugectr(
-            cats=CATEGORICAL_COLUMNS + CROSS_COLUMNS,
-            conts=conts,
-            labels=LABEL_COLUMNS,
-            output_path=val_output,
-            shuffle=shuffle)
-else:
-    workflow.transform(valid_ds_iterator).to_parquet(
-            output_path=val_output,
-            dtypes=dict_dtypes,
-            cats=CATEGORICAL_COLUMNS + CROSS_COLUMNS,
-            conts=conts,
-            labels=LABEL_COLUMNS,
-            shuffle=shuffle)
-
-embeddings_dict_cat = categorify_op.get_embedding_sizes(CATEGORICAL_COLUMNS)
-embeddings_dict_cross = cross_cat_op.get_embedding_sizes(CROSS_COLUMNS)
-embeddings = [embeddings_dict_cat[c][0] for c in CATEGORICAL_COLUMNS] + [embeddings_dict_cross[c][0] for c in CROSS_COLUMNS]
-
-print(embeddings)
-##--------------------##
-
-## Shutdown clusters
-client.shutdown()
-
-runtime = time.time() - runtime
-
-print("\nDask-NVTabular Criteo Preprocessing Done!")
-print(f"Runtime[s]         | {runtime}")
-print("======================================\n")
-
-
-
-
-
Training output data: /hugectr_e2e/criteo/train
-Validation output data: /hugectr_e2e/criteo/val
-Training dataset: /hugectr_e2e/criteo/train/train.txt
-
-
-
2023-05-26 03:09:49,967 Preprocessing
-2023-05-26 03:09:50,513 Train Datasets Preprocessing.....
-2023-05-26 03:09:57,544 Valid Datasets Preprocessing.....
-
-
-
[1234907, 19683, 13780, 6867, 18490, 4, 6264, 1235, 50, 854680, 114026, 75736, 11, 2159, 7533, 61, 4, 919, 15, 1307783, 404742, 1105613, 87714, 9032, 77, 34, 1577645, 1093030]
-[1234907, 19683, 13780, 6867, 18490, 4, 6264, 1235, 50, 854680, 114026, 75736, 11, 2159, 7533, 61, 4, 919, 15, 1307783, 404742, 1105613, 87714, 9032, 77, 34, 1577645, 1093030]
-
-Dask-NVTabular Criteo Preprocessing Done!
-Runtime[s]         | 11.187256813049316
-======================================
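The preprocessing cell builds each cross column by string-concatenating two categorical columns. `Categorify` then maps every column to contiguous integer ids. The workflow is fitted on the training set only, and the validation set is only transformed. This is why both printed slot-size arrays are identical, and why validation values never seen in training fall into a reserved index. The following is a pure-Python model of those two ideas. It makes two assumptions: id 0 is reserved for null or unseen values, and a missing value on either side of a cross yields a missing cross value. NVTabular's actual reserved indices differ between versions, and all names here are hypothetical.

```python
def cross(a_values, b_values):
    """Concatenate two categorical columns element-wise, like ddf["C1"] + ddf["C2"].

    Assumption: a missing value (None) on either side gives a missing cross value.
    """
    return [a + b if a is not None and b is not None else None for a, b in zip(a_values, b_values)]

def fit_categorify(train_values):
    """Assign contiguous ids starting at 1 to distinct non-null training values."""
    vocab = {}
    for v in train_values:
        if v is not None and v not in vocab:
            vocab[v] = len(vocab) + 1
    return vocab

def apply_categorify(values, vocab):
    """Encode values with a fitted vocabulary; nulls and unseen values map to the reserved id 0."""
    return [vocab.get(v, 0) for v in values]

def cardinality(vocab):
    """Embedding rows needed for one slot: every real id plus the reserved id."""
    return len(vocab) + 1

train_c1, train_c2 = ["a", "b", None, "a"], ["x", "y", "x", "x"]
vocab = fit_categorify(cross(train_c1, train_c2))  # fit on training data only
print(apply_categorify(["ax", "zz", None], vocab), cardinality(vocab))
```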
-
-
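The continuous columns go through `FillMissing() >> Clip(min_value=0) >> Normalize()`. Missing values are filled, negative values are clipped to 0, and the result is z-scored with statistics fitted on the training set. Below is a stdlib sketch of that chain. It assumes a fill value of 0 and the population standard deviation; NVTabular's exact estimator may differ. The function names are hypothetical.

```python
import statistics

def _prep(v, fill_value, clip_min):
    # FillMissing (replace None with fill_value), then Clip(min_value=clip_min)
    return max(clip_min, fill_value if v is None else v)

def fit_continuous(train_values, fill_value=0.0, clip_min=0.0):
    """Mean and population standard deviation of the filled, clipped training column."""
    prepped = [_prep(v, fill_value, clip_min) for v in train_values]
    return statistics.fmean(prepped), statistics.pstdev(prepped)

def transform_continuous(values, mean, std, fill_value=0.0, clip_min=0.0):
    """Apply fill, clip and z-score normalization using statistics from fit_continuous."""
    return [(_prep(v, fill_value, clip_min) - mean) / std if std > 0 else 0.0 for v in values]

mean, std = fit_continuous([None, -3, 2, 4])
print(mean, std, transform_continuous([None, 4], mean, std))
```

As with `Categorify`, fitting on training data alone means the validation set is scaled with the training mean and standard deviation.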
-
-
-
-
-
### Record the slot size array
-SLOT_SIZE_ARRAY = embeddings
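`train.py` below hardcodes `SLOT_SIZE_ARRAY` instead of reading this recorded value, and the two differ at index 26: 1577645 here versus 1581605 in `train.py`. The training log's "Vocabulary size: 7946054" equals the sum of the hardcoded array. A common way to use such an array is to give each slot its own key range through a prefix sum; this sketch assumes that layout for HugeCTR's Parquet reader. It shows the offset arithmetic and a quick consistency check. Function names are hypothetical.

```python
import itertools

# Copied from this notebook: RECORDED is the array printed by the preprocessing
# cell, HARDCODED is SLOT_SIZE_ARRAY as written into train.py.
RECORDED = [1234907, 19683, 13780, 6867, 18490, 4, 6264, 1235, 50, 854680, 114026, 75736, 11, 2159,
            7533, 61, 4, 919, 15, 1307783, 404742, 1105613, 87714, 9032, 77, 34, 1577645, 1093030]
HARDCODED = [1234907, 19683, 13780, 6867, 18490, 4, 6264, 1235, 50, 854680, 114026, 75736, 11, 2159,
             7533, 61, 4, 919, 15, 1307783, 404742, 1105613, 87714, 9032, 77, 34, 1581605, 1093030]

def slot_offsets(slot_sizes):
    """Start of each slot's key range if all slots share one key space (assumed layout)."""
    return list(itertools.accumulate(slot_sizes, initial=0))[:-1]

def mismatches(a, b):
    """(index, a_value, b_value) for every position where two slot-size arrays differ."""
    if len(a) != len(b):
        raise ValueError("arrays have different numbers of slots")
    return [(i, x, y) for i, (x, y) in enumerate(zip(a, b)) if x != y]

print(sum(RECORDED), sum(HARDCODED))
print(mismatches(RECORDED, HARDCODED))
```

Generating the array in `train.py` from the preprocessing output, for example via a JSON file or a command-line argument, would keep the two in sync across re-runs. `itertools.accumulate(..., initial=...)` requires Python 3.8 or later.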
-
-
-
-
-
-
-

Training a WDL model with HugeCTR

-
-
-
%%writefile './train.py'
-import hugectr
-import os
-import argparse
-from mpi4py import MPI
-parser = argparse.ArgumentParser(description=("HugeCTR Training"))
-parser.add_argument("--data_path", type=str, required=True, help="Input dataset path (Required)")
-parser.add_argument("--model_path", type=str, required=True, help="Directory path to write output (Required)")
-args = parser.parse_args()
-SLOT_SIZE_ARRAY = [1234907, 19683, 13780, 6867, 18490, 4, 6264, 1235, 50, 854680, 114026, 75736, 11, 2159, 7533, 61, 4, 919, 15, 1307783, 404742, 1105613, 87714, 9032, 77, 34, 1581605, 1093030]
-
-solver = hugectr.CreateSolver(max_eval_batches = 4000,
-                              batchsize_eval = 2720,
-                              batchsize = 2720,
-                              lr = 0.001,
-                              vvgpu = [[0]],
-                              repeat_dataset = True,
-                              i64_input_key = True)
-
-reader = hugectr.DataReaderParams(data_reader_type = hugectr.DataReaderType_t.Parquet,
-                                  source = [os.path.join(args.data_path, "train/_file_list.txt")],
-                                  eval_source = os.path.join(args.data_path, "val/_file_list.txt"),
-                                  check_type = hugectr.Check_t.Non,
-                                  slot_size_array = SLOT_SIZE_ARRAY)
-optimizer = hugectr.CreateOptimizer(optimizer_type = hugectr.Optimizer_t.Adam,
-                                    update_type = hugectr.Update_t.Global,
-                                    beta1 = 0.9,
-                                    beta2 = 0.999,
-                                    epsilon = 0.0000001)
-model = hugectr.Model(solver, reader, optimizer)
-
-model.add(hugectr.Input(label_dim = 1, label_name = "label",
-                        dense_dim = 13, dense_name = "dense",
-                        data_reader_sparse_param_array = 
-                        [hugectr.DataReaderSparseParam("wide_data", 1, True, 2),
-                        hugectr.DataReaderSparseParam("deep_data", 2, False, 26)]))
-
-model.add(hugectr.SparseEmbedding(embedding_type = hugectr.Embedding_t.DistributedSlotSparseEmbeddingHash, 
-                            workspace_size_per_gpu_in_mb = 80,
-                            embedding_vec_size = 1,
-                            combiner = "sum",
-                            sparse_embedding_name = "sparse_embedding2",
-                            bottom_name = "wide_data",
-                            optimizer = optimizer))
-model.add(hugectr.SparseEmbedding(embedding_type = hugectr.Embedding_t.DistributedSlotSparseEmbeddingHash, 
-                            workspace_size_per_gpu_in_mb = 1350,
-                            embedding_vec_size = 16,
-                            combiner = "sum",
-                            sparse_embedding_name = "sparse_embedding1",
-                            bottom_name = "deep_data",
-                            optimizer = optimizer))
-
-model.add(hugectr.DenseLayer(layer_type = hugectr.Layer_t.Reshape,
-                            bottom_names = ["sparse_embedding1"],
-                            top_names = ["reshape1"],
-                            leading_dim=416))
-model.add(hugectr.DenseLayer(layer_type = hugectr.Layer_t.Reshape,
-                            bottom_names = ["sparse_embedding2"],
-                            top_names = ["reshape2"],
-                            leading_dim=2))
-model.add(hugectr.DenseLayer(layer_type = hugectr.Layer_t.ReduceSum,
-                            bottom_names = ["reshape2"],
-                            top_names = ["wide_redn"],
-                            axis = 1))
-model.add(hugectr.DenseLayer(layer_type = hugectr.Layer_t.Concat,
-                            bottom_names = ["reshape1", "dense"],
-                            top_names = ["concat1"]))
-model.add(hugectr.DenseLayer(layer_type = hugectr.Layer_t.InnerProduct,
-                            bottom_names = ["concat1"],
-                            top_names = ["fc1"],
-                            num_output=1024))
-model.add(hugectr.DenseLayer(layer_type = hugectr.Layer_t.ReLU,
-                            bottom_names = ["fc1"],
-                            top_names = ["relu1"]))
-model.add(hugectr.DenseLayer(layer_type = hugectr.Layer_t.Dropout,
-                            bottom_names = ["relu1"],
-                            top_names = ["dropout1"],
-                            dropout_rate=0.5))
-model.add(hugectr.DenseLayer(layer_type = hugectr.Layer_t.InnerProduct,
-                            bottom_names = ["dropout1"],
-                            top_names = ["fc2"],
-                            num_output=1024))
-model.add(hugectr.DenseLayer(layer_type = hugectr.Layer_t.ReLU,
-                            bottom_names = ["fc2"],
-                            top_names = ["relu2"]))
-model.add(hugectr.DenseLayer(layer_type = hugectr.Layer_t.Dropout,
-                            bottom_names = ["relu2"],
-                            top_names = ["dropout2"],
-                            dropout_rate=0.5))
-model.add(hugectr.DenseLayer(layer_type = hugectr.Layer_t.InnerProduct,
-                            bottom_names = ["dropout2"],
-                            top_names = ["fc3"],
-                            num_output=1))
-model.add(hugectr.DenseLayer(layer_type = hugectr.Layer_t.Add,
-                            bottom_names = ["fc3", "wide_redn"],
-                            top_names = ["add1"]))
-model.add(hugectr.DenseLayer(layer_type = hugectr.Layer_t.BinaryCrossEntropyLoss,
-                            bottom_names = ["add1", "label"],
-                            top_names = ["loss"]))
-model.compile()
-model.summary()
-model.fit(max_iter = 21000, display = 1000, eval_interval = 4000, snapshot = 20000, snapshot_prefix = os.path.join(args.model_path, "wdl/"))
-model.graph_to_json(graph_config_file = os.path.join(args.model_path, "wdl.json"))
-
-
-
-
-
Writing ./train.py
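The model summary printed during training lists each layer's output shape. This sketch recomputes those shapes from the `train.py` parameters: the deep part reshapes 26 slots of 16-dimensional embeddings to width 416 and concatenates the 13 dense features (429). The sketch also counts the weights and biases of the three `InnerProduct` layers. It models only this bookkeeping (dropout does not change shapes), not HugeCTR's implementation. The function and its parameter names are hypothetical.

```python
def wdl_shapes(batch=2720, dense_dim=13, deep_slots=26, wide_slots=2, emb_dim=16, hidden=(1024, 1024)):
    """Propagate tensor shapes through the WDL graph defined in train.py.

    Returns (shapes, dense_params); dense_params counts the weights and
    biases of the InnerProduct layers fc1..fc3.
    """
    shapes = {
        "sparse_embedding1": (batch, deep_slots, emb_dim),
        "sparse_embedding2": (batch, wide_slots, 1),
        "reshape1": (batch, deep_slots * emb_dim),  # leading_dim=416
        "reshape2": (batch, wide_slots),
        "wide_redn": (batch, 1),                    # ReduceSum over axis 1
    }
    concat_width = deep_slots * emb_dim + dense_dim
    shapes["concat1"] = (batch, concat_width)
    widths = [concat_width, *hidden, 1]
    dense_params = 0
    for i, (fan_in, fan_out) in enumerate(zip(widths, widths[1:]), start=1):
        shapes[f"fc{i}"] = (batch, fan_out)
        dense_params += fan_in * fan_out + fan_out  # weight matrix + bias
    shapes["add1"] = (batch, 1)                      # deep logit + wide logit
    return shapes, dense_params

shapes, dense_params = wdl_shapes()
print(shapes["concat1"], shapes["add1"], dense_params)
```

The wide part contributes one logit per sample: the two 1-dimensional embeddings are summed by `ReduceSum`. The `Add` layer then combines it with the deep MLP's logit before `BinaryCrossEntropyLoss`.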
-
-
-
-
-
-
-
!python train.py --data_path $DATA_DIR --model_path $MODEL_DIR
-
-
-
-
-
MpiInitService: MPI was already initialized by another (non-HugeCTR) mechanism.
-HugeCTR Version: 23.4
-====================================================Model Init=====================================================
-[HCTR][03:10:28.412][WARNING][RK0][main]: The model name is not specified when creating the solver.
-[HCTR][03:10:28.413][INFO][RK0][main]: Global seed is 4031005480
-[HCTR][03:10:29.069][INFO][RK0][main]: Device to NUMA mapping:
-  GPU 0 ->  node 0
-[HCTR][03:10:32.353][WARNING][RK0][main]: Peer-to-peer access cannot be fully enabled.
-[HCTR][03:10:32.353][DEBUG][RK0][main]: [device 0] allocating 0.0000 GB, available 29.9792 
-[HCTR][03:10:32.353][INFO][RK0][main]: Start all2all warmup
-[HCTR][03:10:32.353][INFO][RK0][main]: End all2all warmup
-[HCTR][03:10:32.355][INFO][RK0][main]: Using All-reduce algorithm: NCCL
-[HCTR][03:10:32.361][INFO][RK0][main]: Device 0: Tesla V100-SXM2-32GB
-[HCTR][03:10:32.362][INFO][RK0][main]: eval source /hugectr_e2e/criteo/val/_file_list.txt max_row_group_size 475000
-[HCTR][03:10:32.364][INFO][RK0][main]: train source /hugectr_e2e/criteo/train/_file_list.txt max_row_group_size 475000
-[HCTR][03:10:32.364][INFO][RK0][main]: num of DataReader workers for train: 1
-[HCTR][03:10:32.364][INFO][RK0][main]: num of DataReader workers for eval: 1
-[HCTR][03:10:32.365][DEBUG][RK0][main]: [device 0] allocating 0.0018 GB, available 29.7234 
-[HCTR][03:10:32.366][DEBUG][RK0][main]: [device 0] allocating 0.0018 GB, available 29.7175 
-[HCTR][03:10:32.379][INFO][RK0][main]: Vocabulary size: 7946054
-[HCTR][03:10:32.380][INFO][RK0][main]: max_vocabulary_size_per_gpu_=6990506
-[HCTR][03:10:32.391][DEBUG][RK0][main]: [device 0] allocating 0.0788 GB, available 28.3132 
-[HCTR][03:10:32.392][INFO][RK0][main]: max_vocabulary_size_per_gpu_=7372800
-[HCTR][03:10:32.396][DEBUG][RK0][main]: [device 0] allocating 1.3516 GB, available 26.5847 
-[HCTR][03:10:32.397][INFO][RK0][main]: Graph analysis to resolve tensor dependency
-===================================================Model Compile===================================================
-[HCTR][03:10:32.408][DEBUG][RK0][main]: [device 0] allocating 0.2162 GB, available 26.3523 
-[HCTR][03:10:32.409][DEBUG][RK0][main]: [device 0] allocating 0.0056 GB, available 26.3464 
-[HCTR][03:10:40.869][INFO][RK0][main]: gpu0 start to init embedding
-[HCTR][03:10:40.869][INFO][RK0][main]: gpu0 init embedding done
-[HCTR][03:10:40.869][INFO][RK0][main]: gpu0 start to init embedding
-[HCTR][03:10:40.873][INFO][RK0][main]: gpu0 init embedding done
-[HCTR][03:10:40.873][DEBUG][RK0][main]: [device 0] allocating 0.0001 GB, available 26.3464 
-[HCTR][03:10:40.874][INFO][RK0][main]: Starting AUC NCCL warm-up
-[HCTR][03:10:40.879][INFO][RK0][main]: Warm-up done
-===================================================Model Summary===================================================
-[HCTR][03:10:40.879][INFO][RK0][main]: Model structure on each GPU
-Label                                   Dense                         Sparse                        
-label                                   dense                          wide_data,deep_data           
-(2720,1)                                (2720,13)                               
-——————————————————————————————————————————————————————————————————————————————————————————————————————————————————
-Layer Type                              Input Name                    Output Name                   Output Shape                  
-——————————————————————————————————————————————————————————————————————————————————————————————————————————————————
-DistributedSlotSparseEmbeddingHash      wide_data                     sparse_embedding2             (2720,2,1)                    
-------------------------------------------------------------------------------------------------------------------
-DistributedSlotSparseEmbeddingHash      deep_data                     sparse_embedding1             (2720,26,16)                  
-------------------------------------------------------------------------------------------------------------------
-Reshape                                 sparse_embedding1             reshape1                      (2720,416)                    
-------------------------------------------------------------------------------------------------------------------
-Reshape                                 sparse_embedding2             reshape2                      (2720,2)                      
-------------------------------------------------------------------------------------------------------------------
-ReduceSum                               reshape2                      wide_redn                     (2720,1)                      
-------------------------------------------------------------------------------------------------------------------
-Concat                                  reshape1                      concat1                       (2720,429)                    
-                                        dense                                                                                     
-------------------------------------------------------------------------------------------------------------------
-InnerProduct                            concat1                       fc1                           (2720,1024)                   
-------------------------------------------------------------------------------------------------------------------
-ReLU                                    fc1                           relu1                         (2720,1024)                   
-------------------------------------------------------------------------------------------------------------------
-Dropout                                 relu1                         dropout1                      (2720,1024)                   
-------------------------------------------------------------------------------------------------------------------
-InnerProduct                            dropout1                      fc2                           (2720,1024)                   
-------------------------------------------------------------------------------------------------------------------
-ReLU                                    fc2                           relu2                         (2720,1024)                   
-------------------------------------------------------------------------------------------------------------------
-Dropout                                 relu2                         dropout2                      (2720,1024)                   
-------------------------------------------------------------------------------------------------------------------
-InnerProduct                            dropout2                      fc3                           (2720,1)                      
-------------------------------------------------------------------------------------------------------------------
-Add                                     fc3                           add1                          (2720,1)                      
-                                        wide_redn                                                                                 
-------------------------------------------------------------------------------------------------------------------
-BinaryCrossEntropyLoss                  add1                          loss                                                        
-                                        label                                                                                     
-------------------------------------------------------------------------------------------------------------------
-=====================================================Model Fit=====================================================
-[HCTR][03:10:40.879][INFO][RK0][main]: Use non-epoch mode with number of iterations: 21000
-[HCTR][03:10:40.879][INFO][RK0][main]: Training batchsize: 2720, evaluation batchsize: 2720
-[HCTR][03:10:40.879][INFO][RK0][main]: Evaluation interval: 4000, snapshot interval: 20000
-[HCTR][03:10:40.879][INFO][RK0][main]: Dense network trainable: True
-[HCTR][03:10:40.879][INFO][RK0][main]: Sparse embedding sparse_embedding1 trainable: True
-[HCTR][03:10:40.879][INFO][RK0][main]: Sparse embedding sparse_embedding2 trainable: True
-[HCTR][03:10:40.879][INFO][RK0][main]: Use mixed precision: False, scaler: 1.000000, use cuda graph: True
-[HCTR][03:10:40.879][INFO][RK0][main]: lr: 0.001000, warmup_steps: 1, end_lr: 0.000000
-[HCTR][03:10:40.879][INFO][RK0][main]: decay_start: 0, decay_steps: 1, decay_power: 2.000000
-[HCTR][03:10:40.879][INFO][RK0][main]: Training source file: /hugectr_e2e/criteo/train/_file_list.txt
-[HCTR][03:10:40.879][INFO][RK0][main]: Evaluation source file: /hugectr_e2e/criteo/val/_file_list.txt
-[HCTR][03:10:49.588][INFO][RK0][main]: Iter: 1000 Time(1000 iters): 8.70458s Loss: 0.124098 lr:0.001
-[HCTR][03:10:58.211][INFO][RK0][main]: Iter: 2000 Time(1000 iters): 8.6176s Loss: 0.130088 lr:0.001
-[HCTR][03:11:06.835][INFO][RK0][main]: Iter: 3000 Time(1000 iters): 8.61959s Loss: 0.101731 lr:0.001
-[HCTR][03:11:15.449][INFO][RK0][main]: Iter: 4000 Time(1000 iters): 8.61009s Loss: 0.110557 lr:0.001
-[HCTR][03:11:19.929][INFO][RK0][main]: Evaluation, AUC: 0.738497
-[HCTR][03:11:19.929][INFO][RK0][main]: Eval Time for 4000 iters: 4.47924s
-[HCTR][03:11:28.559][INFO][RK0][main]: Iter: 5000 Time(1000 iters): 13.1046s Loss: 0.10236 lr:0.001
-[HCTR][03:11:37.182][INFO][RK0][main]: Iter: 6000 Time(1000 iters): 8.61852s Loss: 0.102157 lr:0.001
-[HCTR][03:11:45.771][INFO][RK0][main]: Iter: 7000 Time(1000 iters): 8.58452s Loss: 0.123451 lr:0.001
-[HCTR][03:11:54.385][INFO][RK0][main]: Iter: 8000 Time(1000 iters): 8.61023s Loss: 0.122763 lr:0.001
-[HCTR][03:11:58.867][INFO][RK0][main]: Evaluation, AUC: 0.698276
-[HCTR][03:11:58.867][INFO][RK0][main]: Eval Time for 4000 iters: 4.48087s
-[HCTR][03:12:07.487][INFO][RK0][main]: Iter: 9000 Time(1000 iters): 13.097s Loss: 0.0999177 lr:0.001
-[HCTR][03:12:16.103][INFO][RK0][main]: Iter: 10000 Time(1000 iters): 8.61106s Loss: 0.0999892 lr:0.001
-[HCTR][03:12:24.722][INFO][RK0][main]: Iter: 11000 Time(1000 iters): 8.61545s Loss: 0.0883301 lr:0.001
-[HCTR][03:12:33.348][INFO][RK0][main]: Iter: 12000 Time(1000 iters): 8.62134s Loss: 0.0828304 lr:0.001
-[HCTR][03:12:37.823][INFO][RK0][main]: Evaluation, AUC: 0.688598
-[HCTR][03:12:37.823][INFO][RK0][main]: Eval Time for 4000 iters: 4.4733s
-[HCTR][03:12:46.425][INFO][RK0][main]: Iter: 13000 Time(1000 iters): 13.0717s Loss: 0.108287 lr:0.001
-[HCTR][03:12:55.059][INFO][RK0][main]: Iter: 14000 Time(1000 iters): 8.62997s Loss: 0.0745141 lr:0.001
-[HCTR][03:13:03.671][INFO][RK0][main]: Iter: 15000 Time(1000 iters): 8.60764s Loss: 0.0720452 lr:0.001
-[HCTR][03:13:12.287][INFO][RK0][main]: Iter: 16000 Time(1000 iters): 8.61101s Loss: 0.0851126 lr:0.001
-[HCTR][03:13:16.758][INFO][RK0][main]: Evaluation, AUC: 0.685426
-[HCTR][03:13:16.758][INFO][RK0][main]: Eval Time for 4000 iters: 4.47088s
-[HCTR][03:13:25.378][INFO][RK0][main]: Iter: 17000 Time(1000 iters): 13.0865s Loss: 0.0632745 lr:0.001
-[HCTR][03:13:34.011][INFO][RK0][main]: Iter: 18000 Time(1000 iters): 8.62825s Loss: 0.0742994 lr:0.001
-[HCTR][03:13:42.626][INFO][RK0][main]: Iter: 19000 Time(1000 iters): 8.61035s Loss: 0.0679226 lr:0.001
-[HCTR][03:13:51.230][INFO][RK0][main]: Iter: 20000 Time(1000 iters): 8.59954s Loss: 0.0779185 lr:0.001
-[HCTR][03:13:55.704][INFO][RK0][main]: Evaluation, AUC: 0.684045
-[HCTR][03:13:55.704][INFO][RK0][main]: Eval Time for 4000 iters: 4.4736s
-[HCTR][03:13:55.733][INFO][RK0][main]: Rank0: Write hash table to file
-[HCTR][03:13:55.902][INFO][RK0][main]: Rank0: Write hash table to file
-[HCTR][03:13:56.075][INFO][RK0][main]: Dumping sparse weights to files, successful
-[HCTR][03:13:56.091][INFO][RK0][main]: Rank0: Write optimzer state to file
-[HCTR][03:13:56.104][INFO][RK0][main]: Done
-[HCTR][03:13:56.119][INFO][RK0][main]: Rank0: Write optimzer state to file
-[HCTR][03:13:56.133][INFO][RK0][main]: Done
-[HCTR][03:13:56.398][INFO][RK0][main]: Rank0: Write optimzer state to file
-[HCTR][03:13:56.611][INFO][RK0][main]: Done
-[HCTR][03:13:56.903][INFO][RK0][main]: Rank0: Write optimzer state to file
-[HCTR][03:13:57.152][INFO][RK0][main]: Done
-[HCTR][03:13:57.169][INFO][RK0][main]: Dumping sparse optimzer states to files, successful
-[HCTR][03:13:57.176][INFO][RK0][main]: Dumping dense weights to file, successful
-[HCTR][03:13:57.188][INFO][RK0][main]: Dumping dense optimizer states to file, successful
-[HCTR][03:14:05.788][INFO][RK0][main]: Iter: 21000 Time(1000 iters): 14.5538s Loss: 0.0770708 lr:0.001
-[HCTR][03:14:05.788][INFO][RK0][main]: Finish 21000 iterations with batchsize: 2720 in 204.91s.
-[HCTR][03:14:05.788][INFO][RK0][main]: Save the model graph to /hugectr_e2e/model/wdl.json successfully
-
-
-
-
-
-
- - -
-
- -
-
-
-
- - - - - - - \ No newline at end of file diff --git a/review/pr-458/notebooks/index.html b/review/pr-458/notebooks/index.html deleted file mode 100644 index e9670a0c77..0000000000 --- a/review/pr-458/notebooks/index.html +++ /dev/null @@ -1,294 +0,0 @@ - - - - - - - HugeCTR Example Notebooks — Merlin HugeCTR documentation - - - - - - - - - - - - - - - - - - - - - - - - - -
- - -
- -
-
-
-
    -
  • - -
  • -
  • -
-
-
-
-
- -
-

HugeCTR Example Notebooks

-

This directory contains a set of Jupyter notebooks that demonstrate how to use HugeCTR.

-

The simplest way to run one of our notebooks is with a Docker container. -A container provides a self-contained, isolated, and reproducible environment for repetitive experiments. -Docker images are available from the NVIDIA GPU Cloud (NGC).

-
-

1. Clone the HugeCTR Repository

-

Use the following command to clone the HugeCTR repository:

-
git clone https://github.com/NVIDIA/HugeCTR
-
-
-
-
-

2. Pull the NGC Docker Image and Run It

-

Pull the container using the following command:

-
docker pull nvcr.io/nvidia/merlin/merlin-hugectr:24.06
-
-
-

Launch the container in interactive mode (mount the HugeCTR root directory into the container for your convenience) by running this command:

-
docker run --gpus all --rm -it --cap-add SYS_NICE --ipc=host --ulimit memlock=-1 --ulimit stack=67108864 -u root -v $(pwd):/HugeCTR -w /HugeCTR --network=host --runtime=nvidia nvcr.io/nvidia/merlin/merlin-hugectr:24.06
-
-
-
-

To run the Sparse Operation Kit notebooks, specify the nvcr.io/nvidia/merlin/merlin-hugectr:24.06 container.

-
-
-
-

3. Customized Building (Optional)

-

HugeCTR is already installed in the NGC container. However, you can also set up HugeCTR from source to further customize the build, which is useful for development purposes.

-
    -
  1. Go to the HugeCTR repository and update the third-party submodules:

  2. -
-
$ cd HugeCTR
-$ git submodule update --init --recursive
-
-
-
    -
  1. Several build parameters let you customize the build; they are detailed here. -Here are some examples of how you can build HugeCTR using these build options:

  2. -
-
# Example 1
-$ mkdir -p build && cd build
-$ cmake -DCMAKE_BUILD_TYPE=Release -DSM=70 .. # Target NVIDIA V100; all other options use their defaults
-$ make -j && make install
-
-
-
# Example 2
-$ mkdir -p build && cd build
-$ cmake -DCMAKE_BUILD_TYPE=Release -DSM="70;80" -DENABLE_MULTINODES=ON .. # Target is NVIDIA V100 / A100 with the multi-node mode on.
-$ make -j && make install
-
-
-

By default, HugeCTR is installed at /usr/local. However, you can use CMAKE_INSTALL_PREFIX to install HugeCTR to a non-default location:

-

$ cmake -DCMAKE_INSTALL_PREFIX=/opt/HugeCTR -DSM=70 ..

-

Refer to the How to Start Your Development documentation for more details on building HugeCTR from source.

-
-
-
-

4. Start the Jupyter Notebook

-
    -
  1. Start Jupyter using these commands:

    -
    cd /HugeCTR/notebooks
    -jupyter-notebook --allow-root --ip 0.0.0.0 --port 8888 --NotebookApp.token='hugectr'
    -
    -
    -
  2. -
  3. Connect to port 8888 on your host machine by entering its IP address or hostname in your web browser: http://[host machine]:8888

    -

    Log in with the token shown in the output of the command above. For example:

    -

    http://[host machine]:8888/?token=aae96ae9387cd28151868fee318c3b3581a2d794f3b25c6b

    -
  4. -
  5. Optional: Import MPI.

    -

    By default, HugeCTR initializes and finalizes MPI when you run the import hugectr statement within the NGC Merlin container. -If you build and install HugeCTR yourself, specify the ENABLE_MULTINODES=ON argument when you build. -See Build HugeCTR from Source.

    -

    If your program uses MPI for a reason other than interacting with HugeCTR, initialize MPI with the from mpi4py import MPI statement before you import HugeCTR.

    -
  6. -
  7. Important Note:

    -

    HugeCTR is written in CUDA/C++ and wrapped for Python using pybind11. Its C++ output is not displayed in notebook cells unless you run the Python script from the command line.

    -
  8. -
-
-
-
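To make the command-line workaround concrete, here is a minimal sketch. It assumes you have saved your HugeCTR training code as a standalone script. In many Jupyter setups, only output that passes through Python's sys.stdout reaches the notebook cell. Output that native code writes directly to file descriptor 1 can instead end up in the terminal that launched Jupyter. Running the script as a separate process and capturing its stdout brings that output back. The child script below is a hypothetical stand-in for HugeCTR: it calls os.write(1, ...) to imitate native logging.

```python
import os
import subprocess
import sys
import tempfile

# Hypothetical stand-in for a HugeCTR training script: os.write(1, ...) writes
# straight to file descriptor 1, bypassing Python's sys.stdout, much like C++ logging.
CHILD_SOURCE = (
    "import os\n"
    "os.write(1, b'[HCTR][INFO] native-style log line\\n')\n"
    "print('regular Python print')\n"
)


def run_script_and_capture(source: str) -> str:
    """Write `source` to a temporary .py file, run it in a new Python process, return its stdout."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(source)
        path = f.name
    try:
        result = subprocess.run(
            [sys.executable, path], capture_output=True, text=True, check=True
        )
    finally:
        os.remove(path)
    return result.stdout


captured = run_script_and_capture(CHILD_SOURCE)
print(captured)
```

Inside a notebook, the shorter equivalent is to write the cell to a file with the %%writefile train.py cell magic and then run !python train.py. Here, train.py is a hypothetical file name.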

Notebook List

-

The notebooks are located within the container and can be found in the /HugeCTR/notebooks directory.

-

Here’s a list of notebooks that you can run:

- -

The multi-modal-data series of notebooks demonstrates how to use multi-modal data, such as text and images, for movie recommendation. -The notebooks use the MovieLens-25M dataset.

-

More notebooks on the Hierarchical Parameter Server (HPS) are available with its TensorRT and TensorFlow plugins.

-

For Sparse Operation Kit notebooks, refer to the sparse_operation_kit/notebooks/ directory of the repository or the documentation.

-
-
-

System Specifications

-

The following table summarizes the system on which each notebook was verified to run successfully. These specifications are not minimum requirements.

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

Notebook | CPU | GPU | #GPUs | Author
multi-modal-data | Intel® Xeon® CPU E5-2698 v4 @ 2.20GHz, 512 GB memory | Tesla V100-SXM2-32GB, 32 GB memory | 1 | Vinh Nguyen
continuous_training.ipynb | Intel® Xeon® CPU E5-2698 v4 @ 2.20GHz, 512 GB memory | Tesla V100-SXM2-32GB, 32 GB memory | 1 | Xiaolei Shi
hps_demo.ipynb | Intel® Xeon® CPU E5-2698 v4 @ 2.20GHz, 512 GB memory | Tesla V100-SXM2-32GB, 32 GB memory | 1 | Kingsley Liu, Matthias Langer and Yingcan Wei
training_with_remote_filesystem.ipynb | Intel® Xeon® CPU E5-2698 v4 @ 2.20GHz, 512 GB memory | Tesla V100-SXM2-32GB, 32 GB memory | 1 | Jerry Shi
hugectr_e2e_demo_with_nvtabular.ipynb | Intel® Xeon® CPU E5-2698 v4 @ 2.20GHz, 512 GB memory | Tesla V100-SXM2-32GB, 32 GB memory | 1 | Jerry Shi

-
-
-
-
- - -
-
- -
-
-
-
- - - - - - - \ No newline at end of file diff --git a/review/pr-458/notebooks/multi-modal-data/00-Intro.html b/review/pr-458/notebooks/multi-modal-data/00-Intro.html deleted file mode 100644 index 4c76db8ab4..0000000000 --- a/review/pr-458/notebooks/multi-modal-data/00-Intro.html +++ /dev/null @@ -1,175 +0,0 @@ - - - - - - - Training Recommender Systems on Multi-modal Data — Merlin HugeCTR documentation - - - - - - - - - - - - - - - - - - - - - - - - - -
- - -
- -
-
-
- -
-
-
-
-

Training Recommender Systems on Multi-modal Data

-
-

Overview

-

Recommender systems are often trained on tabular data containing numeric fields (such as item price or the number of a user’s purchases) and categorical fields (such as user and item IDs).

-

Multi-modal data refers to data in other modalities, such as text, images, and video. Such data can provide rich additional inputs to recommender systems and potentially improve their effectiveness.

-

Several examples include:

-
    -
  • Movie recommendation, where movie poster, plot and synopsis can be used.

  • -
  • Music recommendation, where audio features and lyrics can be used.

  • -
  • Itinerary planning and attraction recommendation, where text (user profiles, attraction descriptions and reviews) and photos can be used.

  • -
-

Often, features from multi-modal data are extracted using domain-specific networks, such as ResNet for images and BERT for text. These pretrained features, also called pretrained embeddings, are then combined with other trainable features and embeddings for the recommendation task.
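To make the combination step concrete, here is a minimal, framework-free sketch. It assumes the pretrained vectors have already been extracted and stored per movieId. The tiny PRETRAINED table and its values are made up for illustration. The sketch concatenates each frozen vector with a trainable ID embedding. A real model would feed the result into its dense layers and update only the trainable part.

```python
import random

# Hypothetical frozen features keyed by movieId, standing in for vectors extracted
# offline by ResNet (posters) or BERT (text); real ones are much wider
# (for example, ResNet-50's pooled output has 2048 values).
PRETRAINED = {
    1: [0.12, -0.40, 0.33, 0.05],
    2: [0.08, 0.21, -0.17, 0.44],
}


class TrainableEmbedding:
    """Minimal lookup table; in a real model an optimizer would update these rows."""

    def __init__(self, num_rows: int, dim: int, seed: int = 0):
        rng = random.Random(seed)
        self.rows = [[rng.uniform(-0.05, 0.05) for _ in range(dim)] for _ in range(num_rows)]

    def lookup(self, row: int) -> list:
        return self.rows[row]


def item_features(movie_id: int, item_emb: TrainableEmbedding) -> list:
    """Concatenate the frozen pretrained vector with the trainable ID embedding."""
    return PRETRAINED[movie_id] + item_emb.lookup(movie_id)


item_emb = TrainableEmbedding(num_rows=3, dim=2)
features = item_features(1, item_emb)
print(len(features))  # 4 pretrained values + 2 trainable values
```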

-

This series of notebooks demonstrates the use of multi-modal data (text and images) for movie recommendation, using the MovieLens-25M dataset.

- -
-
- - -
-
- -
-
-
-
- - - - - - - \ No newline at end of file diff --git a/review/pr-458/notebooks/multi-modal-data/01-Download-Convert.html b/review/pr-458/notebooks/multi-modal-data/01-Download-Convert.html deleted file mode 100644 index 9f2d1d5e83..0000000000 --- a/review/pr-458/notebooks/multi-modal-data/01-Download-Convert.html +++ /dev/null @@ -1,424 +0,0 @@ - - - - - - - MovieLens-25M: Download and Convert — Merlin HugeCTR documentation - - - - - - - - - - - - - - - - - - - - - - - - - -
- - -
- -
-
-
- -
-
-
-
-

MovieLens-25M: Download and Convert

-

MovieLens-25M is a popular dataset in the recommender systems domain, containing 25 million ratings of ~62,000 movies by ~162,000 users.

-

In this notebook, we will download and convert this dataset to a suitable format for subsequent processing.

-
-

Getting Started

-
-
-
# External dependencies
-import os
-import time
-
-import pandas as pd
-from sklearn.model_selection import train_test_split
-
-from nvtabular.utils import download_file
-
-
-
-
-

We define the base input directory that contains the data.

-
-
-
INPUT_DATA_DIR = "./data"
-
-
-
-
-

We will download and unzip the data.

-
-
-
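from os.path import exists
-
-os.makedirs(INPUT_DATA_DIR, exist_ok=True)  # make sure the target directory exists
-# Download the archive only if it is not already present
-if not exists(os.path.join(INPUT_DATA_DIR, "ml-25m.zip")):
-    download_file("http://files.grouplens.org/datasets/movielens/ml-25m.zip",
-                  os.path.join(INPUT_DATA_DIR, "ml-25m.zip"))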
-
-
-
-
-
-
-

Convert the dataset

-

First, we take a look at the movie metadata.

-
-
-
movies = pd.read_csv(os.path.join(INPUT_DATA_DIR, 'ml-25m/movies.csv'))
-movies.head()
-
-
-
-
-
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
   movieId  title                               genres
0        1  Toy Story (1995)                    Adventure|Animation|Children|Comedy|Fantasy
1        2  Jumanji (1995)                      Adventure|Children|Fantasy
2        3  Grumpier Old Men (1995)             Comedy|Romance
3        4  Waiting to Exhale (1995)            Comedy|Drama|Romance
4        5  Father of the Bride Part II (1995)  Comedy
-
-
-

We can see that genres is a multi-hot categorical feature, with a different number of genres per movie. In this example, we use neither the title nor the genres, so we drop both columns and keep only movieId.
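If you want to keep the genres instead of dropping them, you can split the pipe-separated string into a list and encode it as a multi-hot vector. The plain-Python sketch below illustrates the idea on two rows from the movies.head() output above. The helper names split_genres and multi_hot are hypothetical and not part of the notebook.

```python
def split_genres(genres: str) -> list:
    """Split MovieLens' pipe-separated genre string into a list of genre names."""
    return genres.split("|")


def multi_hot(genres: list, vocabulary: list) -> list:
    """Encode a movie's genres as a 0/1 vector over a fixed genre vocabulary."""
    present = set(genres)
    return [1 if g in present else 0 for g in vocabulary]


sample = [
    "Adventure|Animation|Children|Comedy|Fantasy",  # Toy Story (1995)
    "Comedy|Romance",                               # Grumpier Old Men (1995)
]
genre_lists = [split_genres(s) for s in sample]
vocab = sorted({g for gl in genre_lists for g in gl})
encoded = [multi_hot(gl, vocab) for gl in genre_lists]
print(vocab)
print(encoded)
```

MovieLens marks movies without genres as "(no genres listed)". This sketch would treat that label as a genre category of its own.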

-
-
-
movies = movies.drop(['title', 'genres'], axis=1)
-movies.head()
-
-
-
-
-
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
   movieId
0        1
1        2
2        3
3        4
4        5
-
-
-

We save the movie IDs in Parquet format so that NVTabular can use them in a later notebook.

-
-
-
movies.to_parquet(os.path.join(INPUT_DATA_DIR, "movies_converted.parquet"))
-
-
-
-
-
-
-

Splitting into training and validation datasets

-

We load the movie ratings.

-
-
-
ratings = pd.read_csv(os.path.join(INPUT_DATA_DIR, "ml-25m", "ratings.csv"))
-ratings.head()
-
-
-
-
-
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
   userId  movieId  rating   timestamp
0       1      296     5.0  1147880044
1       1      306     3.5  1147868817
2       1      307     5.0  1147868828
3       1      665     5.0  1147878820
4       1      899     3.5  1147868510
-
-
-

We drop the timestamp column and split the ratings into training and validation datasets. We use a simple random split that holds out 20% of the ratings for validation.
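For intuition, the sketch below shows what a simple seeded random split does. It is written in plain Python and runs on toy (userId, movieId, rating) tuples taken from the ratings.head() output above. It is not how train_test_split is implemented: scikit-learn shuffles with NumPy and rounds the test size up, so the same seed will not select the same rows. The random_split helper is hypothetical.

```python
import random


def random_split(rows: list, test_size: float = 0.2, seed: int = 42):
    """Shuffle row positions with a fixed seed and hold out `test_size` of them for validation."""
    positions = list(range(len(rows)))
    random.Random(seed).shuffle(positions)
    n_valid = round(len(rows) * test_size)
    valid_pos = set(positions[:n_valid])
    train = [r for i, r in enumerate(rows) if i not in valid_pos]
    valid = [r for i, r in enumerate(rows) if i in valid_pos]
    return train, valid


# Toy (userId, movieId, rating) tuples from the ratings.head() output, repeated to 100 rows.
toy_ratings = [(1, 296, 5.0), (1, 306, 3.5), (1, 307, 5.0), (1, 665, 5.0), (1, 899, 3.5)] * 20
toy_train, toy_valid = random_split(toy_ratings)
print(len(toy_train), len(toy_valid))  # 80 20
```

Fixing the seed (random_state=42 in the notebook) makes the split reproducible across runs, which keeps validation AUC comparable between experiments.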

-
-
-
ratings = ratings.drop('timestamp', axis=1)
-train, valid = train_test_split(ratings, test_size=0.2, random_state=42)
-
-
-
-
-

We save the training and validation datasets to disk.

-
-
-
train.to_parquet(os.path.join(INPUT_DATA_DIR, "train.parquet"))
-valid.to_parquet(os.path.join(INPUT_DATA_DIR, "valid.parquet"))
-
-
-
-
-
-
-

Next steps

-

If you wish to download the real enriched data for the MovieLens-25M dataset, including movie posters and synopses, proceed through notebooks 02-04.

-

If you wish to use synthetic multi-modal data, proceed to the synthetic data section of notebook 05-Create-Feature-Store.ipynb.

-
-
- - -
-
- -
-
-
-
- - - - - - - \ No newline at end of file diff --git a/review/pr-458/notebooks/multi-modal-data/02-Data-Enrichment.html b/review/pr-458/notebooks/multi-modal-data/02-Data-Enrichment.html deleted file mode 100644 index af2f69aaa7..0000000000 --- a/review/pr-458/notebooks/multi-modal-data/02-Data-Enrichment.html +++ /dev/null @@ -1,454 +0,0 @@ - - - - - - - MovieLens Data Enrichment — Merlin HugeCTR documentation - - - - - - - - - - - - - - - - - - - - - - - - - -
- - -
- -
-
-
- -
-
-
-
-

MovieLens Data Enrichment

-

In this notebook, we enrich the MovieLens-25M dataset with movie posters and synopses scraped from IMDB. If you wish to use synthetic multi-modal data, proceed to the synthetic data section of 05-Create-Feature-Store.ipynb.

-

First, we need to install an extra package for IMDB data collection.

-
-
-
!pip install imdbpy
-
-
-
-
-

Note: restart the kernel for the new package to take effect.

-
-
-
import IPython
-
-IPython.Application.instance().kernel.do_shutdown(True)
-
-
-
-
-
-

Scraping data from IMDB

-

The IMDbPY package allows collecting a rich set of multi-modal metadata from the IMDB database, including links to posters, synopses, and plots.

-
-
-
from imdb import IMDb
-
-# create an instance of the IMDb class
-ia = IMDb()
-
-# get a movie and print its director(s); despite the variable name, '0114709' is Toy Story (1995), movieId 1 in MovieLens
-the_matrix = ia.get_movie('0114709')
-for director in the_matrix['directors']:
-    print(director['name'])
-
-# show all information that is currently available for a movie
-print(sorted(the_matrix.keys()))
-
-# show all information sets that can be fetched for a movie
-print(ia.get_movie_infoset())
-
-
-
-
-
-
-
print(the_matrix.get('plot'))
-
-
-
-
-
-
-
the_matrix.get('synopsis')
-
-
-
-
-
-
-

Collect synopsis for all movies

-

Next, we collect metadata, including the synopsis, for all movies in the dataset. Note that this process takes a while to complete.

-
-
-
from collections import defaultdict
-import pandas as pd
-
-
-
-
-
-
-
links = pd.read_csv("./data/ml-25m/links.csv")
-
-
-
-
-
-
-
links.head()
-
-
-
-
-
-
-
links.imdbId.nunique()
-
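When pandas reads links.csv, it parses the imdbId column as integers, so leading zeros are dropped. For example, Toy Story's ID 0114709, used in the get_movie call above, becomes 114709. If you need the conventional zero-padded form, for example to match IDs or build IMDb URLs by hand, a small helper like the one sketched below can restore it. The helper names are hypothetical. The 7-digit width is an assumption based on the classic IMDb ID format; newer 8-digit IDs pass through unchanged.

```python
def to_imdb_id(imdb_id: int) -> str:
    """Zero-pad a numeric IMDb ID to at least 7 digits (8-digit IDs pass through unchanged)."""
    return f"{imdb_id:07d}"


def to_imdb_url(imdb_id: int) -> str:
    """Build the public IMDb title page URL (tt-prefixed) for a numeric ID."""
    return f"https://www.imdb.com/title/tt{to_imdb_id(imdb_id)}/"


print(to_imdb_id(114709))   # 0114709, Toy Story (1995)
print(to_imdb_url(114709))  # https://www.imdb.com/title/tt0114709/
```

Applied to the whole column, links.imdbId.map(to_imdb_id) would produce string IDs.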
-
-
-
-
-
-
from tqdm import tqdm
-import pickle
-from multiprocessing import Process, cpu_count
-from multiprocessing.managers import BaseManager, DictProxy
-
-
-
-
-
-
-
movies = list(links['imdbId'])
-movies_id = list(links['movieId'])
-
-
-
-
-
-
-
movies_infos = {}
-def task(movies, movies_ids, movies_infos):
-    """Fetch IMDB metadata for each movie; store it in movies_infos keyed by MovieLens movieId."""
-    for movie, movie_id in tqdm(zip(movies, movies_ids), total=len(movies)):
-        try:
-            movies_infos[movie_id] = ia.get_movie(movie)
-        except Exception as e:
-            print("Movie %d download error: " % movie_id, e)
-
-# Sequential (slow) alternative: task(movies, movies_id, movies_infos)
-
-
-
-
-

We now collect the movie metadata from IMDB using parallel processes.

-

Please note: with higher process counts, there is a risk of being blocked by IMDB's anti-DoS protection.
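The next cell divides the movie list into contiguous chunks and starts one process per chunk. The sketch below isolates that slicing logic in a hypothetical make_chunks helper. Because chunk_size = total // num_jobs + 1 is always larger than total / num_jobs, the loop never starts more than num_jobs processes. It can start fewer (here, 4 chunks for num_jobs=5), and every movie lands in exactly one chunk. Lowering num_jobs is likely the simplest way to reduce the request rate if IMDB starts blocking you.

```python
def make_chunks(items: list, num_jobs: int) -> list:
    """Split items into contiguous slices the same way the notebook's process loop does."""
    chunk_size = len(items) // num_jobs + 1
    return [items[i:i + chunk_size] for i in range(0, len(items), chunk_size)]


demo_ids = list(range(12))
chunks = make_chunks(demo_ids, num_jobs=5)
print([len(c) for c in chunks])  # [3, 3, 3, 3]: 4 processes for num_jobs=5
```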

-
-
-
print ('Gathering movies information from IMDB...')
-BaseManager.register('dict', dict, DictProxy)
-manager = BaseManager()
-manager.start()
-
-movies_infos = manager.dict()
-
-num_jobs = 5
-total = len(movies)
-chunk_size = total // num_jobs + 1
-processes = []
-
-for i in range(0, total, chunk_size):
-    proc = Process(
-        target=task,
-        args=[
-            movies[i:i+chunk_size],
-            movies_id[i:i+chunk_size],
-            movies_infos
-        ]
-    )
-    processes.append(proc)
-for proc in processes:
-    proc.start()
-for proc in processes:
-    proc.join()
-
-
-
-
-
-
-
movies_infos = movies_infos.copy()
-
-
-
-
-
-
-
len(movies_infos)
-
-
-
-
-
-
-
with open('movies_info.pkl', 'wb') as f:
-    pickle.dump({"movies_infos": movies_infos}, f, protocol=pickle.HIGHEST_PROTOCOL)
-
-
-
-
-
-
-

Scraping movie posters

-

The movie metadata also contains links to poster images. Next, we collect these posters where available.

-

Note: this process will take some time to complete.

-
-
-
from multiprocessing import Process, cpu_count
-import pickle
-import subprocess
-from tqdm import tqdm
-import os
-
-with open('movies_info.pkl', 'rb') as f:
-    movies_infos = pickle.load(f)['movies_infos']
-
-
-
-
-
-
-
COLLECT_LARGE_POSTER = False
-
-filelist, targetlist = [], []
-largefilelist, largetargetlist = [], []
-
-for key, movie in tqdm(movies_infos.items(), total=len(movies_infos)):
-    # Queue the small poster unless it was already downloaded
-    if 'cover url' in movie.keys():
-        target_path = './poster_small/%s.jpg' % (movie['imdbID'])
-        if not os.path.exists(target_path):
-            targetlist.append(target_path)
-            filelist.append(movie['cover url'])
-
-    # Optionally, collect high-res poster images (checked independently of the small poster)
-    if COLLECT_LARGE_POSTER and 'full-size cover url' in movie.keys():
-        target_path = './poster_large/%s.jpg' % (movie['imdbID'])
-        if not os.path.exists(target_path):
-            largetargetlist.append(target_path)
-            largefilelist.append(movie['full-size cover url'])
-
-
-
-
-
-
-
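def download_task(filelist, targetlist):
-    for file, target in tqdm(zip(filelist, targetlist), total=len(targetlist)):
-        # Pass arguments as a list so URLs need no shell quoting; -q silences wget's progress output
-        result = subprocess.run(['wget', '-q', file, '-O', target])
-        if result.returncode != 0:
-            print('Download failed:', file)
-            if os.path.exists(target):
-                os.remove(target)  # wget -O leaves an empty file behind on failure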
-
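If wget is not installed in your container, you can do the same job from Python. The sketch below defines a hypothetical download_task_py helper, which is not part of the original notebook. It is built on urllib.request.urlretrieve, skips files that already exist, and returns the URLs that failed. The demo uses local file:// URLs, so it runs without network access. In the notebook, you would pass it the same filelist[i:i+chunk_size] and targetlist[i:i+chunk_size] slices as download_task.

```python
import os
import tempfile
import urllib.request
from pathlib import Path


def download_task_py(filelist, targetlist):
    """Hypothetical pure-Python stand-in for download_task (no wget needed).

    Skips targets that already exist and returns the URLs that failed.
    """
    failed = []
    for url, target in zip(filelist, targetlist):
        if os.path.exists(target):
            continue  # already downloaded in an earlier run
        try:
            urllib.request.urlretrieve(url, target)
        except OSError as e:  # URLError and HTTPError are subclasses of OSError
            print("Download failed:", url, e)
            failed.append(url)
    return failed


# Demo with local file:// URLs so it runs without network access.
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "poster.jpg"
    src.write_bytes(b"fake jpeg bytes")
    missing_url = (Path(tmp) / "missing.jpg").as_uri()
    copy_path = Path(tmp) / "copy.jpg"
    failed = download_task_py([src.as_uri(), missing_url],
                              [str(copy_path), str(Path(tmp) / "never.jpg")])
    copied = copy_path.read_bytes()

print(failed == [missing_url], copied)
```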
-
-
-
-
-
-
print ('Gathering small posters...')
-!mkdir -p ./poster_small
-
-num_jobs = 10
-total = len(filelist)
-chunk_size = total // num_jobs + 1
-processes = []
-
-for i in range(0, total, chunk_size):
-    proc = Process(
-        target=download_task,
-        args=[
-            filelist[i:i+chunk_size],
-            targetlist[i:i+chunk_size],            
-        ]
-    )
-    processes.append(proc)
-for proc in processes:
-    proc.start()
-for proc in processes:
-    proc.join()
-
-
-
-
-
-
-
if COLLECT_LARGE_POSTER:
-    print ('Gathering large posters...')
-    !mkdir -p ./poster_large
-
-    num_jobs = 32
-    total = len(largefilelist)
-    chunk_size = total // num_jobs + 1
-    processes = []
-
-    for i in range(0, total, chunk_size):
-        proc = Process(
-            target=download_task,
-            args=[
-                largefilelist[i:i+chunk_size],
-                largetargetlist[i:i+chunk_size],            
-            ]
-        )
-        processes.append(proc)
-    for proc in processes:
-        proc.start()
-    for proc in processes:
-        proc.join()
-
-
-
-
-
-
-
!ls poster_small | wc -l
-
-
-
-
-
-
- - -
-
- -
-
-
-
- - - - - - - \ No newline at end of file diff --git a/review/pr-458/notebooks/multi-modal-data/03-Feature-Extraction-Poster.html b/review/pr-458/notebooks/multi-modal-data/03-Feature-Extraction-Poster.html deleted file mode 100644 index 639eab10c6..0000000000 --- a/review/pr-458/notebooks/multi-modal-data/03-Feature-Extraction-Poster.html +++ /dev/null @@ -1,2762 +0,0 @@ - - - - - - - Movie Poster Feature Extraction with ResNet — Merlin HugeCTR documentation - - - - - - - - - - - - - - - - - - - - - - - - - -
- - -
- -
-
-
- -
-
-
-
-

Movie Poster Feature Extraction with ResNet

-

In this notebook, we will use a pretrained ResNet-50 network to extract image features from the movie poster images.

-

Note: this notebook should be executed from within the nvidia_resnet50 container, built as follows:

-
git clone https://github.com/NVIDIA/DeepLearningExamples
-cd DeepLearningExamples
-git checkout 5d6d417ff57e8824ef51573e00e5e21307b39697
-cd PyTorch/Classification/ConvNets
-docker build . -t nvidia_resnet50
-cd ../../../..  # back to the directory that contains DeepLearningExamples
-
-
-

Start the container, mounting the current directory:

-
nvidia-docker run --rm --net=host -it -v $PWD:/workspace --ipc=host nvidia_resnet50
-
-
-

Then from within the container:

-
cd /workspace
-jupyter-lab --allow-root --ip='0.0.0.0'
-
-
-
-
-

Download a pretrained ResNet-50 from NVIDIA GPU Cloud

-

First, we install extra packages and restart the kernel.

-
-
-
!pip install ipywidgets tqdm
-import IPython
-
-IPython.Application.instance().kernel.do_shutdown(True)
-
-
-
-
-
-
-
from PIL import Image
-import argparse
-import numpy as np
-import json
-import torch
-from torch.cuda.amp import autocast
-import torch.backends.cudnn as cudnn
-
-import sys
-sys.path.append('/workspace/DeepLearningExamples/PyTorch/Classification/ConvNets')
-from image_classification import models
-import torchvision.transforms as transforms
-
-
-
-
-
-
-
from image_classification.models import (
-    resnet50,
-    resnext101_32x4d,
-    se_resnext101_32x4d,
-    efficientnet_b0,
-    efficientnet_b4,
-    efficientnet_widese_b0,
-    efficientnet_widese_b4,
-    efficientnet_quant_b0,
-    efficientnet_quant_b4,
-)
-
-
-
-
-
-
-
def available_models():
-    models = {
-        m.name: m
-        for m in [
-            resnet50,
-            resnext101_32x4d,
-            se_resnext101_32x4d,
-            efficientnet_b0,
-            efficientnet_b4,
-            efficientnet_widese_b0,
-            efficientnet_widese_b4,
-            efficientnet_quant_b0,
-            efficientnet_quant_b4,
-        ]
-    }
-    return models
-
-
-
-
-
-
-
def load_jpeg_from_file(path, image_size, cuda=True):
-    img_transforms = transforms.Compose(
-        [
-            transforms.Resize(image_size + 32),
-            transforms.CenterCrop(image_size),
-            transforms.ToTensor(),
-        ]
-    )
-
-    img = img_transforms(Image.open(path))
-    with torch.no_grad():
-        # mean and std are not multiplied by 255 as they are in training script
-        # torch dataloader reads data into bytes whereas loading directly
-        # through PIL creates a tensor with floats in [0,1] range
-        mean = torch.tensor([0.485, 0.456, 0.406]).view(1, 3, 1, 1)
-        std = torch.tensor([0.229, 0.224, 0.225]).view(1, 3, 1, 1)
-
-        if cuda:
-            mean = mean.cuda()
-            std = std.cuda()
-            img = img.cuda()
-        img = img.float()
-
-        if img.shape[0] == 1: #mono image
-            #pad channels
-            img = img.repeat([3, 1, 1])
-        input = img.unsqueeze(0).sub_(mean).div_(std)
-
-    return input
-
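The comments in load_jpeg_from_file note that the mean and standard deviation apply to values in the [0, 1] range. As a worked illustration in plain Python, without PyTorch, this sketch applies the same per-channel formula, (x / 255 - mean) / std, to a single pixel. It also covers the grayscale case, which the function handles by repeating the channel. The normalize_pixel helper is hypothetical and only mirrors the arithmetic; the real function operates on whole tensors after resizing and center-cropping.

```python
# ImageNet channel statistics from load_jpeg_from_file (they apply to values in [0, 1]).
MEAN = (0.485, 0.456, 0.406)
STD = (0.229, 0.224, 0.225)


def normalize_pixel(pixel):
    """Normalize one 8-bit pixel: (value / 255 - mean) / std per RGB channel."""
    if len(pixel) == 1:  # grayscale: replicate the channel, like img.repeat([3, 1, 1])
        pixel = tuple(pixel) * 3
    scaled = [c / 255.0 for c in pixel]  # ToTensor() scales 0..255 to 0.0..1.0
    return [(x - m) / s for x, m, s in zip(scaled, MEAN, STD)]


print([round(v, 3) for v in normalize_pixel((255, 255, 255))])
print([round(v, 3) for v in normalize_pixel((128,))])
```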
-
-
-
-
-
-
def check_quant_weight_correctness(checkpoint_path, model):
-    state_dict = torch.load(checkpoint_path, map_location=torch.device('cpu'))
-    state_dict = {k[len("module."):] if k.startswith("module.") else k: v for k, v in state_dict.items()}
-    quantizers_sd_keys = {f'{n[0]}._amax' for n in model.named_modules() if 'quantizer' in n[0]}
-    sd_all_keys = quantizers_sd_keys | set(model.state_dict().keys())
-    assert set(state_dict.keys()) == sd_all_keys, (f'Passed quantized architecture, but following keys are missing in '
-                                                   f'checkpoint: {list(sd_all_keys - set(state_dict.keys()))}')
-
-
-
-
-
-
-
imgnet_classes = np.array(json.load(open("/workspace/DeepLearningExamples/PyTorch/Classification/ConvNets/LOC_synset_mapping.json", "r")))
-
-model_args = {}
-model_args["pretrained_from_file"] = './nvidia_resnet50_200821.pth.tar'
-model = available_models()['resnet50'](model_args)
-
-model = model.cuda()
-model.eval()
-
-
-
-
-
Downloading: "https://api.ngc.nvidia.com/v2/models/nvidia/resnet50_pyt_amp/versions/20.06.0/files/nvidia_resnet50_200821.pth.tar" to /root/.cache/torch/hub/checkpoints/nvidia_resnet50_200821.pth.tar
-
-
-

-
-
-
ResNet(
-  (conv1): Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
-  (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
-  (relu): ReLU(inplace=True)
-  (maxpool): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)
-  (layer1): Sequential(
-    (0): Bottleneck(
-      (conv1): Conv2d(64, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
-      (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
-      (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
-      (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
-      (conv3): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
-      (bn3): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
-      (relu): ReLU(inplace=True)
-      (downsample): Sequential(
-        (0): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
-        (1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
-      )
-    )
-    (1): Bottleneck(
-      (conv1): Conv2d(256, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
-      (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
-      (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
-      (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
-      (conv3): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
-      (bn3): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
-      (relu): ReLU(inplace=True)
-    )
-    (2): Bottleneck(
-      (conv1): Conv2d(256, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
-      (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
-      (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
-      (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
-      (conv3): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
-      (bn3): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
-      (relu): ReLU(inplace=True)
-    )
-  )
-  (layer2): Sequential(
-    (0): Bottleneck(
-      (conv1): Conv2d(256, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
-      (bn1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
-      (conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
-      (bn2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
-      (conv3): Conv2d(128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
-      (bn3): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
-      (relu): ReLU(inplace=True)
-      (downsample): Sequential(
-        (0): Conv2d(256, 512, kernel_size=(1, 1), stride=(2, 2), bias=False)
-        (1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
-      )
-    )
-    (1): Bottleneck(
-      (conv1): Conv2d(512, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
-      (bn1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
-      (conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
-      (bn2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
-      (conv3): Conv2d(128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
-      (bn3): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
-      (relu): ReLU(inplace=True)
-    )
-    (2): Bottleneck(
-      (conv1): Conv2d(512, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
-      (bn1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
-      (conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
-      (bn2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
-      (conv3): Conv2d(128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
-      (bn3): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
-      (relu): ReLU(inplace=True)
-    )
-    (3): Bottleneck(
-      (conv1): Conv2d(512, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
-      (bn1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
-      (conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
-      (bn2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
-      (conv3): Conv2d(128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
-      (bn3): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
-      (relu): ReLU(inplace=True)
-    )
-  )
-  (layer3): Sequential(
-    (0): Bottleneck(
-      (conv1): Conv2d(512, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
-      (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
-      (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
-      (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
-      (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
-      (bn3): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
-      (relu): ReLU(inplace=True)
-      (downsample): Sequential(
-        (0): Conv2d(512, 1024, kernel_size=(1, 1), stride=(2, 2), bias=False)
-        (1): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
-      )
-    )
-    (1): Bottleneck(
-      (conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
-      (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
-      (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
-      (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
-      (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
-      (bn3): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
-      (relu): ReLU(inplace=True)
-    )
-    (2): Bottleneck(
-      (conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
-      (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
-      (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
-      (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
-      (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
-      (bn3): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
-      (relu): ReLU(inplace=True)
-    )
-    (3): Bottleneck(
-      (conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
-      (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
-      (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
-      (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
-      (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
-      (bn3): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
-      (relu): ReLU(inplace=True)
-    )
-    (4): Bottleneck(
-      (conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
-      (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
-      (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
-      (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
-      (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
-      (bn3): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
-      (relu): ReLU(inplace=True)
-    )
-    (5): Bottleneck(
-      (conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
-      (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
-      (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
-      (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
-      (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
-      (bn3): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
-      (relu): ReLU(inplace=True)
-    )
-  )
-  (layer4): Sequential(
-    (0): Bottleneck(
-      (conv1): Conv2d(1024, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
-      (bn1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
-      (conv2): Conv2d(512, 512, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
-      (bn2): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
-      (conv3): Conv2d(512, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False)
-      (bn3): BatchNorm2d(2048, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
-      (relu): ReLU(inplace=True)
-      (downsample): Sequential(
-        (0): Conv2d(1024, 2048, kernel_size=(1, 1), stride=(2, 2), bias=False)
-        (1): BatchNorm2d(2048, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
-      )
-    )
-    (1): Bottleneck(
-      (conv1): Conv2d(2048, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
-      (bn1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
-      (conv2): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
-      (bn2): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
-      (conv3): Conv2d(512, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False)
-      (bn3): BatchNorm2d(2048, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
-      (relu): ReLU(inplace=True)
-    )
-    (2): Bottleneck(
-      (conv1): Conv2d(2048, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
-      (bn1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
-      (conv2): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
-      (bn2): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
-      (conv3): Conv2d(512, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False)
-      (bn3): BatchNorm2d(2048, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
-      (relu): ReLU(inplace=True)
-    )
-  )
-  (avgpool): AdaptiveAvgPool2d(output_size=1)
-  (fc): Linear(in_features=2048, out_features=1000, bias=True)
-)
-
-
-
-
-
-
-

Extract features for all movies

-

Next, we extract features for all movie posters from the output of the layer just before the classification head. This gives one vector of 2048 float values per poster.
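This note on tensor shapes is a sketch based on the module printout above. With the `fc` head removed, the extractor ends at `AdaptiveAvgPool2d(output_size=1)`, so a batch of N posters yields a tensor of shape (N, 2048, 1, 1). `squeeze()` drops every size-1 dimension, which means it also drops the batch dimension when N == 1. Flattening from dimension 1 (as `torch.flatten(x, start_dim=1)` does) always keeps the batch dimension. The helpers below are hypothetical and work on shape tuples only, so the difference is visible without PyTorch installed.

```python
from typing import Tuple

Shape = Tuple[int, ...]


def squeeze_shape(shape: Shape) -> Shape:
    """Shape after removing every size-1 dimension (what Tensor.squeeze() does)."""
    return tuple(d for d in shape if d != 1)


def flatten_shape(shape: Shape, start_dim: int = 1) -> Shape:
    """Shape after merging dimensions start_dim..end into one (like torch.flatten)."""
    merged = 1
    for d in shape[start_dim:]:
        merged *= d
    return shape[:start_dim] + (merged,)


# Pooled ResNet-50 output: (batch, channels, height, width) = (N, 2048, 1, 1)
for n in (64, 1):
    pooled = (n, 2048, 1, 1)
    print(n, squeeze_shape(pooled), flatten_shape(pooled))
# 64 (64, 2048) (64, 2048)
# 1 (2048,) (1, 2048)
```

Indexing `features[i, :]` requires the 2-D form. A batch that ends up with a single readable image would therefore fail after `squeeze()`, but it works after flattening from dimension 1.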

-
-
-
import glob
-
-filelist = glob.glob('./poster_small/*.jpg')
-len(filelist)
-
-
-
-
-
61951
-
-
-
-
-
-
-
filelist[:10]
-
-
-
-
-
['./poster_small/0055323.jpg',
- './poster_small/0114709.jpg',
- './poster_small/0274711.jpg',
- './poster_small/0055320.jpg',
- './poster_small/0054197.jpg',
- './poster_small/1791658.jpg',
- './poster_small/1288589.jpg',
- './poster_small/0365653.jpg',
- './poster_small/2324928.jpg',
- './poster_small/6000478.jpg']
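The extraction loop keys each feature vector by the poster's filename stem, computed with `f.split('/')[-1].split('.')[0]`. The sketch below produces the same mapping with the standard library's `pathlib`. The function name `poster_id` is hypothetical. The IDs are zero-padded, so keep them as strings: `int('0055323')` would drop the leading zeros.

```python
from pathlib import Path


def poster_id(path: str) -> str:
    """Return the filename without directory and extension, e.g. '0055323'."""
    return Path(path).stem


files = ['./poster_small/0055323.jpg', './poster_small/0114709.jpg']
ids = [poster_id(f) for f in files]
print(ids)  # ['0055323', '0114709']

# Same result as the string-splitting expression used in the extraction loop
assert ids == [f.split('/')[-1].split('.')[0] for f in files]
```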
-
-
-
-
-
-
-
from tqdm import tqdm
-import numpy as np
-
-batchsize = 64
-num_batches = len(filelist) // batchsize
-batches = np.array_split(filelist, num_batches)
-
-
-
-
-
-
-
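With 61951 files and `batchsize = 64`, `len(filelist) // batchsize` gives 967 batches, which matches the 967 steps in the progress bar below. `np.array_split` does not require the length to divide evenly. When it does not, the first `len % sections` chunks each get one extra element. The stdlib sketch below computes those chunk sizes. `split_sizes` is a hypothetical helper, not a NumPy function.

```python
from typing import List


def split_sizes(n_items: int, n_sections: int) -> List[int]:
    """Chunk sizes produced by numpy.array_split(range(n_items), n_sections)."""
    base, extra = divmod(n_items, n_sections)
    # The first `extra` chunks are one element larger than the rest
    return [base + 1] * extra + [base] * (n_sections - extra)


n_files, batchsize = 61951, 64
num_batches = n_files // batchsize
sizes = split_sizes(n_files, num_batches)
print(num_batches, sizes.count(65), sizes.count(64), sum(sizes))
# 967 63 904 61951
```

Each batch therefore holds 64 or 65 file names before any unreadable posters are dropped.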
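### strip the last layer (the fc classification head), keeping everything up to avgpool
-feature_extractor = torch.nn.Sequential(*list(model.children())[:-1])
-feature_extractor.eval()  # use running BatchNorm statistics for inference
-
-feature_dict = {}  # poster id (filename stem) -> 2048-d feature vector
-error = 0
-for batch in tqdm(batches):
-    inputs = []
-    imgs = []
-    for f in batch:
-        try:
-            img = load_jpeg_from_file(f, 224, cuda=True)
-            imgs.append(f.split('/')[-1].split('.')[0])
-            inputs.append(img.squeeze())
-        except Exception as e:
-            # unreadable or corrupt poster: report it, count it, and skip it
-            print(e)
-            error += 1
-    if not inputs:
-        continue  # every file in this batch failed; torch.stack would raise on an empty list
-    with torch.no_grad():  # no gradients are needed for feature extraction
-        features = feature_extractor(torch.stack(inputs, dim=0))
-    # (N, 2048, 1, 1) -> (N, 2048); flattening keeps the batch dim even when N == 1
-    features = torch.flatten(features, start_dim=1).cpu().numpy()
-    for i, f in enumerate(imgs):
-        feature_dict[f] = features[i, :]
-
-print('Unable to extract features for %d images' % error)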
-
-
-
-
-
  0%|          | 0/967 [00:00<?, ?it/s]
-
-
-
cannot identify image file './poster_small/0114709.jpg'
-cannot identify image file './poster_small/0274711.jpg'
-cannot identify image file './poster_small/0055320.jpg'
-cannot identify image file './poster_small/0054197.jpg'
-cannot identify image file './poster_small/1791658.jpg'
-cannot identify image file './poster_small/1288589.jpg'
-cannot identify image file './poster_small/0365653.jpg'
-cannot identify image file './poster_small/2324928.jpg'
-cannot identify image file './poster_small/6000478.jpg'
-cannot identify image file './poster_small/0168199.jpg'
-cannot identify image file './poster_small/0118926.jpg'
-cannot identify image file './poster_small/0415856.jpg'
-
-
-
  0%|          | 2/967 [00:02<21:58,  1.37s/it]
-
-
-
cannot identify image file './poster_small/0494260.jpg'
-
-
-
  0%|          | 3/967 [00:03<18:50,  1.17s/it]
-
-
-
cannot identify image file './poster_small/0810772.jpg'
-
-
-
  0%|          | 4/967 [00:03<15:46,  1.02it/s]
-
-
-
cannot identify image file './poster_small/0049314.jpg'
-
-
-
  1%|          | 8/967 [00:06<12:59,  1.23it/s]
-
-
-
cannot identify image file './poster_small/0066831.jpg'
-
-
-
  1%|          | 9/967 [00:07<12:22,  1.29it/s]
-
-
-
cannot identify image file './poster_small/0888693.jpg'
-
-
-
  1%|          | 12/967 [00:10<14:20,  1.11it/s]
-
-
-
cannot identify image file './poster_small/0067431.jpg'
-
-
-
  1%|▏         | 14/967 [00:11<13:07,  1.21it/s]
-
-
-
cannot identify image file './poster_small/6522546.jpg'
-cannot identify image file './poster_small/0057811.jpg'
-
-
-
  2%|▏         | 16/967 [00:13<13:05,  1.21it/s]
-
-
-
cannot identify image file './poster_small/5176252.jpg'
-cannot identify image file './poster_small/0112373.jpg'
-
-
-
  2%|▏         | 18/967 [00:14<10:45,  1.47it/s]
-
-
-
cannot identify image file './poster_small/4636254.jpg'
-
-
-
  2%|▏         | 19/967 [00:15<11:11,  1.41it/s]
-
-
-
cannot identify image file './poster_small/0365658.jpg'
-
-
-
  2%|▏         | 24/967 [00:19<13:04,  1.20it/s]
-
-
-
cannot identify image file './poster_small/2124046.jpg'
-
-
-
  4%|▎         | 34/967 [00:26<12:05,  1.29it/s]
-
-
-
cannot identify image file './poster_small/0104469.jpg'
-
-
-
  4%|▍         | 37/967 [00:29<13:35,  1.14it/s]
-
-
-
cannot identify image file './poster_small/0102493.jpg'
-
-
-
  4%|▍         | 41/967 [00:33<13:08,  1.17it/s]
-
-
-
cannot identify image file './poster_small/0051792.jpg'
-
-
-
  5%|▍         | 46/967 [00:36<09:19,  1.65it/s]
-
-
-
cannot identify image file './poster_small/0110017.jpg'
-cannot identify image file './poster_small/0139630.jpg'
-
-
-
  5%|▍         | 48/967 [00:37<08:21,  1.83it/s]
-
-
-
cannot identify image file './poster_small/0143348.jpg'
-
-
-
  5%|▌         | 51/967 [00:38<07:56,  1.92it/s]
-
-
-
cannot identify image file './poster_small/0037618.jpg'
-cannot identify image file './poster_small/0040002.jpg'
-
-
-
  5%|▌         | 53/967 [00:39<08:56,  1.70it/s]
-
-
-
cannot identify image file './poster_small/0317950.jpg'
-
-
-
  6%|▌         | 54/967 [00:40<10:00,  1.52it/s]
-
-
-
cannot identify image file './poster_small/0850669.jpg'
-cannot identify image file './poster_small/0325258.jpg'
-
-
-
  6%|▌         | 58/967 [00:44<12:21,  1.23it/s]
-
-
-
cannot identify image file './poster_small/6569888.jpg'
-cannot identify image file './poster_small/0037736.jpg'
-
-
-
  6%|▋         | 61/967 [00:46<12:33,  1.20it/s]
-
-
-
cannot identify image file './poster_small/0109303.jpg'
-
-
-
  7%|▋         | 66/967 [00:50<10:24,  1.44it/s]
-
-
-
cannot identify image file './poster_small/0103882.jpg'
-
-
-
  7%|▋         | 67/967 [00:50<09:26,  1.59it/s]
-
-
-
cannot identify image file './poster_small/0267287.jpg'
-
-
-
  7%|▋         | 71/967 [00:53<12:02,  1.24it/s]
-
-
-
cannot identify image file './poster_small/0100033.jpg'
-
-
-
  8%|▊         | 77/967 [00:57<10:21,  1.43it/s]
-
-
-
cannot identify image file './poster_small/1601215.jpg'
-
-
-
  8%|▊         | 81/967 [01:00<10:08,  1.46it/s]
-
-
-
cannot identify image file './poster_small/0092028.jpg'
-cannot identify image file './poster_small/0075963.jpg'
-cannot identify image file './poster_small/3267334.jpg'
-
-
-
  9%|▊         | 84/967 [01:02<08:43,  1.69it/s]
-
-
-
cannot identify image file './poster_small/0059398.jpg'
-
-
-
  9%|▉         | 86/967 [01:03<08:15,  1.78it/s]
-
-
-
cannot identify image file './poster_small/0122565.jpg'
-
-
-
 10%|█         | 97/967 [01:09<07:56,  1.82it/s]
-
-
-
cannot identify image file './poster_small/0052572.jpg'
-
-
-
 11%|█         | 102/967 [01:12<10:25,  1.38it/s]
-
-
-
cannot identify image file './poster_small/6404896.jpg'
-
-
-
 11%|█         | 103/967 [01:13<11:14,  1.28it/s]
-
-
-
cannot identify image file './poster_small/0027428.jpg'
-cannot identify image file './poster_small/0033883.jpg'
-
-
-
 11%|█         | 104/967 [01:14<10:13,  1.41it/s]
-
-
-
cannot identify image file './poster_small/0113270.jpg'
-
-
-
 11%|█         | 108/967 [01:18<12:33,  1.14it/s]
-
-
-
cannot identify image file './poster_small/0022286.jpg'
-
-
-
 12%|█▏        | 112/967 [01:21<11:23,  1.25it/s]
-
-
-
cannot identify image file './poster_small/0068953.jpg'
-
-
-
 12%|█▏        | 114/967 [01:23<12:35,  1.13it/s]
-
-
-
cannot identify image file './poster_small/0042949.jpg'
-cannot identify image file './poster_small/0130297.jpg'
-
-
-
 12%|█▏        | 115/967 [01:24<12:22,  1.15it/s]
-
-
-
cannot identify image file './poster_small/0028207.jpg'
-
-
-
 12%|█▏        | 117/967 [01:26<12:46,  1.11it/s]
-
-
-
cannot identify image file './poster_small/0054244.jpg'
-
-
-
 12%|█▏        | 118/967 [01:27<12:54,  1.10it/s]
-
-
-
cannot identify image file './poster_small/1275680.jpg'
-
-
-
 12%|█▏        | 120/967 [01:30<18:32,  1.31s/it]
-
-
-
cannot identify image file './poster_small/0036533.jpg'
-cannot identify image file './poster_small/0037297.jpg'
-
-
-
 13%|█▎        | 130/967 [01:37<09:14,  1.51it/s]
-
-
-
cannot identify image file './poster_small/0962736.jpg'
-cannot identify image file './poster_small/0042548.jpg'
-
-
-
 14%|█▍        | 134/967 [01:40<10:19,  1.34it/s]
-
-
-
cannot identify image file './poster_small/0038109.jpg'
-cannot identify image file './poster_small/0104009.jpg'
-
-
-
 14%|█▍        | 136/967 [01:41<11:21,  1.22it/s]
-
-
-
cannot identify image file './poster_small/0180316.jpg'
-
-
-
 14%|█▍        | 137/967 [01:42<11:00,  1.26it/s]
-
-
-
cannot identify image file './poster_small/0071925.jpg'
-
-
-
 14%|█▍        | 139/967 [01:43<10:33,  1.31it/s]
-
-
-
cannot identify image file './poster_small/0087001.jpg'
-
-
-
 15%|█▍        | 143/967 [01:47<11:02,  1.24it/s]
-
-
-
cannot identify image file './poster_small/0056910.jpg'
-
-
-
 15%|█▍        | 144/967 [01:47<09:55,  1.38it/s]
-
-
-
cannot identify image file './poster_small/0064563.jpg'
-
-
-
 15%|█▌        | 147/967 [01:49<09:19,  1.46it/s]
-
-
-
cannot identify image file './poster_small/1720040.jpg'
-
-
-
 15%|█▌        | 149/967 [01:51<12:05,  1.13it/s]
-
-
-
cannot identify image file './poster_small/0041112.jpg'
-
-
-
 16%|█▌        | 156/967 [01:58<11:37,  1.16it/s]
-
-
-
cannot identify image file './poster_small/4412528.jpg'
-
-
-
 16%|█▌        | 157/967 [01:58<10:57,  1.23it/s]
-
-
-
cannot identify image file './poster_small/0051362.jpg'
-
-
-
 16%|█▋        | 158/967 [01:59<10:15,  1.31it/s]
-
-
-
cannot identify image file './poster_small/0029992.jpg'
-
-
-
 17%|█▋        | 160/967 [02:00<08:26,  1.59it/s]
-
-
-
cannot identify image file './poster_small/0384309.jpg'
-cannot identify image file './poster_small/0028367.jpg'
-
-
-
 17%|█▋        | 162/967 [02:01<08:56,  1.50it/s]
-
-
-
cannot identify image file './poster_small/0038336.jpg'
-
-
-
 17%|█▋        | 163/967 [02:02<10:07,  1.32it/s]
-
-
-
cannot identify image file './poster_small/0058725.jpg'
-
-
-
 17%|█▋        | 164/967 [02:04<11:26,  1.17it/s]
-
-
-
cannot identify image file './poster_small/0113328.jpg'
-
-
-
 17%|█▋        | 166/967 [02:05<09:21,  1.43it/s]
-
-
-
cannot identify image file './poster_small/3878542.jpg'
-
-
-
 17%|█▋        | 167/967 [02:06<10:00,  1.33it/s]
-
-
-
cannot identify image file './poster_small/0026465.jpg'
-
-
-
 17%|█▋        | 169/967 [02:07<10:07,  1.31it/s]
-
-
-
cannot identify image file './poster_small/0040588.jpg'
-
-
-
 18%|█▊        | 175/967 [02:12<10:37,  1.24it/s]
-
-
-
cannot identify image file './poster_small/0086984.jpg'
-
-
-
 18%|█▊        | 178/967 [02:14<09:23,  1.40it/s]
-
-
-
cannot identify image file './poster_small/0309047.jpg'
-
-
-
 19%|█▊        | 181/967 [02:16<10:52,  1.21it/s]
-
-
-
cannot identify image file './poster_small/0031405.jpg'
-
-
-
 19%|█▉        | 185/967 [02:20<11:42,  1.11it/s]
-
-
-
cannot identify image file './poster_small/0097493.jpg'
-
-
-
 19%|█▉        | 186/967 [02:21<11:54,  1.09it/s]
-
-
-
cannot identify image file './poster_small/0346336.jpg'
-cannot identify image file './poster_small/0078841.jpg'
-cannot identify image file './poster_small/0018795.jpg'
-cannot identify image file './poster_small/9151704.jpg'
-
-
-
 20%|█▉        | 191/967 [02:25<10:40,  1.21it/s]
-
-
-
cannot identify image file './poster_small/1417097.jpg'
-
-
-
 20%|██        | 195/967 [02:29<11:12,  1.15it/s]
-
-
-
cannot identify image file './poster_small/0054223.jpg'
-
-
-
 20%|██        | 196/967 [02:30<11:16,  1.14it/s]
-
-
-
cannot identify image file './poster_small/0117477.jpg'
-
-
-
 21%|██        | 199/967 [02:31<07:48,  1.64it/s]
-
-
-
cannot identify image file './poster_small/0000041.jpg'
-
-
-
 21%|██        | 201/967 [02:33<09:47,  1.30it/s]
-
-
-
cannot identify image file './poster_small/0028907.jpg'
-
-
-
 21%|██▏       | 207/967 [02:38<10:09,  1.25it/s]
-
-
-
cannot identify image file './poster_small/0366179.jpg'
-
-
-
 22%|██▏       | 209/967 [02:40<11:12,  1.13it/s]
-
-
-
cannot identify image file './poster_small/0109761.jpg'
-
-
-
 22%|██▏       | 217/967 [02:45<08:58,  1.39it/s]
-
-
-
cannot identify image file './poster_small/7167686.jpg'
-
-
-
 23%|██▎       | 219/967 [02:47<08:51,  1.41it/s]
-
-
-
cannot identify image file './poster_small/0048973.jpg'
-
-
-
 23%|██▎       | 226/967 [02:53<10:37,  1.16it/s]
-
-
-
cannot identify image file './poster_small/0100112.jpg'
-cannot identify image file './poster_small/3606394.jpg'
-
-
-
 23%|██▎       | 227/967 [02:54<10:40,  1.16it/s]
-
-
-
cannot identify image file './poster_small/0021890.jpg'
-
-
-
 24%|██▎       | 228/967 [02:54<10:32,  1.17it/s]
-
-
-
cannot identify image file './poster_small/0033874.jpg'
-cannot identify image file './poster_small/0035019.jpg'
-
-
-
 24%|██▍       | 232/967 [02:57<09:59,  1.23it/s]
-
-
-
cannot identify image file './poster_small/1228953.jpg'
-
-
-
 25%|██▍       | 237/967 [03:02<09:53,  1.23it/s]
-
-
-
cannot identify image file './poster_small/7688990.jpg'
-cannot identify image file './poster_small/0052954.jpg'
-cannot identify image file './poster_small/0092159.jpg'
-
-
-
 25%|██▌       | 242/967 [03:06<09:58,  1.21it/s]
-
-
-
cannot identify image file './poster_small/0094349.jpg'
-cannot identify image file './poster_small/0065136.jpg'
-
-
-
 25%|██▌       | 246/967 [03:09<09:09,  1.31it/s]
-
-
-
cannot identify image file './poster_small/0027805.jpg'
-cannot identify image file './poster_small/0034904.jpg'
-
-
-
 26%|██▌       | 248/967 [03:10<10:07,  1.18it/s]
-
-
-
cannot identify image file './poster_small/0037522.jpg'
-
-
-
 26%|██▌       | 250/967 [03:13<11:19,  1.06it/s]
-
-
-
cannot identify image file './poster_small/0036301.jpg'
-
-
-
 26%|██▋       | 254/967 [03:16<10:26,  1.14it/s]
-
-
-
cannot identify image file './poster_small/0037324.jpg'
-
-
-
 26%|██▋       | 256/967 [03:17<08:46,  1.35it/s]
-
-
-
cannot identify image file './poster_small/0053622.jpg'
-
-
-
 27%|██▋       | 265/967 [03:23<08:14,  1.42it/s]
-
-
-
cannot identify image file './poster_small/7278178.jpg'
-
-
-
 28%|██▊       | 266/967 [03:24<08:31,  1.37it/s]
-
-
-
cannot identify image file './poster_small/0418239.jpg'
-cannot identify image file './poster_small/0040489.jpg'
-cannot identify image file './poster_small/0069280.jpg'
-
-
-
 28%|██▊       | 269/967 [03:27<10:44,  1.08it/s]
-
-
-
cannot identify image file './poster_small/0049143.jpg'
-
-
-
 29%|██▊       | 276/967 [03:33<08:21,  1.38it/s]
-
-
-
cannot identify image file './poster_small/0064840.jpg'
-
-
-
 29%|██▉       | 285/967 [03:41<10:27,  1.09it/s]
-
-
-
cannot identify image file './poster_small/0070723.jpg'
-
-
-
 30%|██▉       | 287/967 [03:43<09:32,  1.19it/s]
-
-
-
cannot identify image file './poster_small/0057997.jpg'
-
-
-
 30%|██▉       | 289/967 [03:44<08:37,  1.31it/s]
-
-
-
cannot identify image file './poster_small/0056072.jpg'
-
-
-
 31%|███       | 295/967 [03:50<10:14,  1.09it/s]
-
-
-
cannot identify image file './poster_small/7446332.jpg'
-
-
-
 31%|███       | 297/967 [03:51<09:45,  1.14it/s]
-
-
-
cannot identify image file './poster_small/0076618.jpg'
-
-
-
 31%|███       | 300/967 [03:55<10:41,  1.04it/s]
-
-
-
cannot identify image file './poster_small/0290014.jpg'
-
-
-
 31%|███       | 302/967 [03:56<08:05,  1.37it/s]
-
-
-
cannot identify image file './poster_small/0347330.jpg'
-
-
-
 31%|███▏      | 303/967 [03:56<08:07,  1.36it/s]
-
-
-
cannot identify image file './poster_small/0159620.jpg'
-
-
-
 31%|███▏      | 304/967 [03:57<08:53,  1.24it/s]
-
-
-
cannot identify image file './poster_small/0044667.jpg'
-
-
-
 32%|███▏      | 307/967 [04:00<10:17,  1.07it/s]
-
-
-
cannot identify image file './poster_small/0040190.jpg'
-cannot identify image file './poster_small/3088364.jpg'
-cannot identify image file './poster_small/0230367.jpg'
-
-
-
 32%|███▏      | 309/967 [04:02<10:05,  1.09it/s]
-
-
-
cannot identify image file './poster_small/0037147.jpg'
-
-
-
 32%|███▏      | 310/967 [04:03<09:15,  1.18it/s]
-
-
-
cannot identify image file './poster_small/0033282.jpg'
-cannot identify image file './poster_small/4028134.jpg'
-
-
-
 32%|███▏      | 312/967 [04:05<10:09,  1.07it/s]
-
-
-
cannot identify image file './poster_small/1352824.jpg'
-
-
-
 32%|███▏      | 314/967 [04:06<08:53,  1.22it/s]
-
-
-
cannot identify image file './poster_small/0079400.jpg'
-
-
-
 33%|███▎      | 318/967 [04:09<08:48,  1.23it/s]
-
-
-
cannot identify image file './poster_small/0449869.jpg'
-cannot identify image file './poster_small/0047526.jpg'
-
-
-
 33%|███▎      | 320/967 [04:11<08:39,  1.25it/s]
-
-
-
cannot identify image file './poster_small/0095593.jpg'
-
-
-
 33%|███▎      | 321/967 [04:12<08:17,  1.30it/s]
-
-
-
cannot identify image file './poster_small/2762334.jpg'
-
-
-
 33%|███▎      | 322/967 [04:12<08:30,  1.26it/s]
-
-
-
cannot identify image file './poster_small/0023293.jpg'
-
-
-
 34%|███▎      | 325/967 [04:15<08:01,  1.33it/s]
-
-
-
cannot identify image file './poster_small/0024593.jpg'
-
-
-
 34%|███▍      | 327/967 [04:16<08:30,  1.25it/s]
-
-
-
cannot identify image file './poster_small/1116182.jpg'
-
-
-
 34%|███▍      | 328/967 [04:17<08:58,  1.19it/s]
-
-
-
cannot identify image file './poster_small/0063462.jpg'
-
-
-
 34%|███▍      | 329/967 [04:18<09:14,  1.15it/s]
-
-
-
cannot identify image file './poster_small/0119577.jpg'
-
-
-
 34%|███▍      | 331/967 [04:19<07:48,  1.36it/s]
-
-
-
cannot identify image file './poster_small/0106727.jpg'
-cannot identify image file './poster_small/0053884.jpg'
-
-
-
 35%|███▍      | 337/967 [04:25<10:45,  1.03s/it]
-
-
-
cannot identify image file './poster_small/0037077.jpg'
-
-
-
 35%|███▌      | 341/967 [04:29<10:09,  1.03it/s]
-
-
-
cannot identify image file './poster_small/0040064.jpg'
-
-
-
 36%|███▌      | 345/967 [04:32<08:07,  1.28it/s]
-
-
-
cannot identify image file './poster_small/0089108.jpg'
-
-
-
 36%|███▌      | 346/967 [04:33<08:30,  1.22it/s]
-
-
-
cannot identify image file './poster_small/0023129.jpg'
-
-
-
 36%|███▌      | 347/967 [04:34<09:10,  1.13it/s]
-
-
-
cannot identify image file './poster_small/0044827.jpg'
-
-
-
 37%|███▋      | 353/967 [04:39<09:09,  1.12it/s]
-
-
-
cannot identify image file './poster_small/0067108.jpg'
-
-
-
 37%|███▋      | 359/967 [04:44<08:04,  1.25it/s]
-
-
-
cannot identify image file './poster_small/0432432.jpg'
-
-
-
 37%|███▋      | 360/967 [04:45<09:17,  1.09it/s]
-
-
-
cannot identify image file './poster_small/0202415.jpg'
-
-
-
 37%|███▋      | 362/967 [04:47<09:21,  1.08it/s]
-
-
-
cannot identify image file './poster_small/0074812.jpg'
-
-
-
 38%|███▊      | 363/967 [04:48<09:48,  1.03it/s]
-
-
-
cannot identify image file './poster_small/0059311.jpg'
-
-
-
 38%|███▊      | 371/967 [04:55<08:19,  1.19it/s]
-
-
-
cannot identify image file './poster_small/0065073.jpg'
-
-
-
 39%|███▊      | 373/967 [04:56<07:01,  1.41it/s]
-
-
-
cannot identify image file './poster_small/0052820.jpg'
-
-
-
 39%|███▉      | 375/967 [04:59<08:32,  1.15it/s]
-
-
-
cannot identify image file './poster_small/0120865.jpg'
-
-
-
 39%|███▉      | 377/967 [05:01<09:15,  1.06it/s]
-
-
-
cannot identify image file './poster_small/0064620.jpg'
-cannot identify image file './poster_small/0068505.jpg'
-cannot identify image file './poster_small/2934916.jpg'
-
-
-
 39%|███▉      | 379/967 [05:02<08:47,  1.11it/s]
-
-
-
cannot identify image file './poster_small/0040137.jpg'
-
-
-
 40%|███▉      | 384/967 [05:07<09:01,  1.08it/s]
-
-
-
cannot identify image file './poster_small/0071864.jpg'
-
-
-
 40%|███▉      | 385/967 [05:08<09:04,  1.07it/s]
-
-
-
cannot identify image file './poster_small/0072973.jpg'
-
-
-
 40%|████      | 387/967 [05:09<07:45,  1.25it/s]
-
-
-
cannot identify image file './poster_small/0449951.jpg'
-
-
-
 40%|████      | 388/967 [05:10<07:35,  1.27it/s]
-
-
-
cannot identify image file './poster_small/0074605.jpg'
-
-
-
 40%|████      | 391/967 [05:13<08:07,  1.18it/s]
-
-
-
cannot identify image file './poster_small/0328955.jpg'
-cannot identify image file './poster_small/0077294.jpg'
-
-
-
 41%|████      | 393/967 [05:14<08:19,  1.15it/s]
-
-
-
cannot identify image file './poster_small/0987918.jpg'
-
-
-
 41%|████      | 394/967 [05:15<07:44,  1.23it/s]
-
-
-
cannot identify image file './poster_small/0067520.jpg'
-
-
-
 41%|████      | 395/967 [05:16<08:02,  1.19it/s]
-
-
-
cannot identify image file './poster_small/0220016.jpg'
-
-
-
 41%|████      | 396/967 [05:17<08:38,  1.10it/s]
-
-
-
cannot identify image file './poster_small/0067236.jpg'
-
-
-
 41%|████      | 397/967 [05:18<08:30,  1.12it/s]
-
-
-
cannot identify image file './poster_small/0085838.jpg'
-
-
-
 41%|████      | 398/967 [05:19<08:39,  1.09it/s]
-
-
-
cannot identify image file './poster_small/0047561.jpg'
-cannot identify image file './poster_small/0066075.jpg'
-
-
-
 42%|████▏     | 407/967 [05:28<08:47,  1.06it/s]
-
-
-
cannot identify image file './poster_small/0123374.jpg'
-
-
-
 42%|████▏     | 410/967 [05:31<08:57,  1.04it/s]
-
-
-
cannot identify image file './poster_small/0026143.jpg'
-
-
-
 43%|████▎     | 411/967 [05:32<08:52,  1.04it/s]
-
-
-
cannot identify image file './poster_small/0064626.jpg'
-
-
-
 43%|████▎     | 413/967 [05:33<07:47,  1.19it/s]
-
-
-
cannot identify image file './poster_small/0822388.jpg'
-
-
-
 43%|████▎     | 420/967 [05:40<08:23,  1.09it/s]
-
-
-
cannot identify image file './poster_small/0101664.jpg'
-
-
-
 44%|████▎     | 423/967 [05:42<07:53,  1.15it/s]
-
-
-
cannot identify image file './poster_small/0403579.jpg'
-cannot identify image file './poster_small/0070112.jpg'
-cannot identify image file './poster_small/2323633.jpg'
-
-
-
 44%|████▍     | 427/967 [05:45<05:54,  1.52it/s]
-
-
-
cannot identify image file './poster_small/0203408.jpg'
-
-
-
 44%|████▍     | 428/967 [05:45<05:28,  1.64it/s]
-
-
-
cannot identify image file './poster_small/1167638.jpg'
-
-
-
 45%|████▍     | 432/967 [05:47<05:12,  1.71it/s]
-
-
-
cannot identify image file './poster_small/0144178.jpg'
-
-
-
 45%|████▍     | 434/967 [05:49<06:01,  1.48it/s]
-
-
-
cannot identify image file './poster_small/0295432.jpg'
-
-
-
 45%|████▍     | 435/967 [05:50<06:18,  1.41it/s]
-
-
-
cannot identify image file './poster_small/0123865.jpg'
-
-
-
 45%|████▌     | 436/967 [05:50<05:42,  1.55it/s]
-
-
-
cannot identify image file './poster_small/0110530.jpg'
-cannot identify image file './poster_small/0082817.jpg'
-
-
-
 45%|████▌     | 437/967 [05:51<06:05,  1.45it/s]
-
-
-
cannot identify image file './poster_small/0067525.jpg'
-
-
-
 45%|████▌     | 438/967 [05:52<06:34,  1.34it/s]
-
-
-
cannot identify image file './poster_small/0046333.jpg'
-
-
-
 45%|████▌     | 439/967 [05:53<06:48,  1.29it/s]
-
-
-
cannot identify image file './poster_small/0248953.jpg'
-
-
-
 46%|████▌     | 441/967 [05:55<07:57,  1.10it/s]
-
-
-
cannot identify image file './poster_small/0000033.jpg'
-cannot identify image file './poster_small/0069165.jpg'
-cannot identify image file './poster_small/0000014.jpg'
-cannot identify image file './poster_small/0000027.jpg'
-
-
-
 48%|████▊     | 460/967 [06:12<08:03,  1.05it/s]
-
-
-
cannot identify image file './poster_small/0063531.jpg'
-
-
-
 48%|████▊     | 462/967 [06:14<07:04,  1.19it/s]
-
-
-
cannot identify image file './poster_small/0041431.jpg'
-
-
-
 48%|████▊     | 463/967 [06:15<07:06,  1.18it/s]
-
-
-
cannot identify image file './poster_small/0831387.jpg'
-
-
-
 48%|████▊     | 465/967 [06:16<07:06,  1.18it/s]
-
-
-
cannot identify image file './poster_small/3908598.jpg'
-
-
-
 48%|████▊     | 467/967 [06:18<06:40,  1.25it/s]
-
-
-
cannot identify image file './poster_small/0056341.jpg'
-
-
-
 49%|████▊     | 470/967 [06:21<07:14,  1.14it/s]
-
-
-
cannot identify image file './poster_small/3833520.jpg'
-
-
-
 49%|████▉     | 472/967 [06:22<06:35,  1.25it/s]
-
-
-
cannot identify image file './poster_small/0058660.jpg'
-
-
-
 49%|████▉     | 475/967 [06:24<06:05,  1.35it/s]
-
-
-
cannot identify image file './poster_small/0086847.jpg'
-
-
-
 49%|████▉     | 476/967 [06:25<06:14,  1.31it/s]
-
-
-
cannot identify image file './poster_small/0074455.jpg'
-
-
-
 49%|████▉     | 477/967 [06:26<06:57,  1.17it/s]
-
-
-
cannot identify image file './poster_small/0037990.jpg'
-
-
-
 50%|████▉     | 481/967 [06:30<07:50,  1.03it/s]
-
-
-
cannot identify image file './poster_small/1764600.jpg'
-
-
-
 50%|████▉     | 482/967 [06:31<07:57,  1.02it/s]
-
-
-
cannot identify image file './poster_small/0372764.jpg'
-cannot identify image file './poster_small/0368576.jpg'
-cannot identify image file './poster_small/0368574.jpg'
-cannot identify image file './poster_small/0366178.jpg'
-
-
-
 50%|█████     | 484/967 [06:33<07:24,  1.09it/s]
-
-
-
cannot identify image file './poster_small/0067118.jpg'
-
-
-
 50%|█████     | 488/967 [06:36<06:22,  1.25it/s]
-
-
-
cannot identify image file './poster_small/0044954.jpg'
-
-
-
 51%|█████▏    | 496/967 [06:42<06:30,  1.21it/s]
-
-
-
cannot identify image file './poster_small/0078950.jpg'
-
-
-
 51%|█████▏    | 498/967 [06:44<06:36,  1.18it/s]
-
-
-
cannot identify image file './poster_small/0050957.jpg'
-cannot identify image file './poster_small/0058374.jpg'
-
-
-
 52%|█████▏    | 499/967 [06:45<06:23,  1.22it/s]
-
-
-
cannot identify image file './poster_small/0027963.jpg'
-
-
-
 52%|█████▏    | 507/967 [06:52<07:13,  1.06it/s]
-
-
-
cannot identify image file './poster_small/0362590.jpg'
-
100%|██████████| 967/967 [13:23<00:00,  1.20it/s]
-
-
-
Unable to extract features for 447 images
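The `cannot identify image file '...'` messages above are the error text Pillow raises when it cannot decode a file. The sketch below is a minimal, stdlib-only way to list suspicious posters before extraction. It assumes, without checking this dataset, that many failures are empty or non-image files saved under a `.jpg` name. The signature table covers only JPEG, PNG and GIF, and `find_unreadable_posters` is a hypothetical helper.

```python
import os

# Leading bytes ("magic numbers") of common image formats.
IMAGE_SIGNATURES = {
    b"\xff\xd8\xff": "jpeg",
    b"\x89PNG\r\n\x1a\n": "png",
    b"GIF87a": "gif",
    b"GIF89a": "gif",
}


def sniff_image_format(head):
    """Return the format whose signature starts `head` (a file's first bytes), or None."""
    for signature, name in IMAGE_SIGNATURES.items():
        if head.startswith(signature):
            return name
    return None


def find_unreadable_posters(poster_dir):
    """Hypothetical helper: list .jpg files in poster_dir with no recognizable image header."""
    bad = []
    for name in sorted(os.listdir(poster_dir)):
        path = os.path.join(poster_dir, name)
        if not (name.endswith(".jpg") and os.path.isfile(path)):
            continue
        with open(path, "rb") as f:
            head = f.read(8)  # the longest signature above is 8 bytes
        if sniff_image_format(head) is None:
            bad.append(name)
    return bad
```

You could call it as `find_unreadable_posters('./poster_small')`. It is only a heuristic. A file that passes can still fail to decode, for example a truncated JPEG. A file in a format outside the table is flagged even when Pillow could open it.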
-
-
-

-
-
-
-
-
-
-
import pickle
-with open('movies_poster_features.pkl', 'wb') as f:
-    pickle.dump({"feature_dict": feature_dict}, f, protocol=pickle.HIGHEST_PROTOCOL)
-
-
-
-
-
-
-
len(feature_dict)
-
-
-
-
-
61504
-
-
-
-
-
-
\ No newline at end of file
diff --git a/review/pr-458/notebooks/multi-modal-data/04-Feature-Extraction-Text.html b/review/pr-458/notebooks/multi-modal-data/04-Feature-Extraction-Text.html
deleted file mode 100644
index 58ebe56a37..0000000000
--- a/review/pr-458/notebooks/multi-modal-data/04-Feature-Extraction-Text.html
+++ /dev/null
@@ -1,266 +0,0 @@
-Movie Synopsis Feature Extraction with Bart text summarization — Merlin HugeCTR documentation

Movie Synopsis Feature Extraction with Bart text summarization

-

In this notebook, we will use the BART model to extract features from movie synopses.

-

Note: this notebook should be executed from within the following container:

-
docker pull huggingface/transformers-pytorch-gpu
-docker run --gpus=all  --rm -it --net=host -v $PWD:/workspace --ipc=host huggingface/transformers-pytorch-gpu 
-
-
-

Then from within the container:

-
cd /workspace
-pip install jupyter jupyterlab
-jupyter server extension disable nbclassic
-jupyter-lab --allow-root --ip='0.0.0.0' --NotebookApp.token='admin'
-
-
-

First, we install some extra packages.

-
-
-
!pip install imdbpy
-
-# Cuda 11 and A100 support
-!pip3 install torch==1.8.1+cu111 torchvision==0.9.1+cu111 torchaudio==0.8.1 -f https://download.pytorch.org/whl/torch_stable.html
-
-
-
-
-
-
-
import IPython
-
-IPython.Application.instance().kernel.do_shutdown(True)
-
-
-
-
-
{'status': 'ok', 'restart': True}
-
-
-
-
-
-

Download pretrained BART model

-

Next, we download the pretrained facebook/bart-large model and its tokenizer using the Hugging Face Transformers library.

-
-
-
from transformers import BartTokenizer, BartModel
-import torch
-
-tokenizer = BartTokenizer.from_pretrained('facebook/bart-large')
-model = BartModel.from_pretrained('facebook/bart-large').cuda()
-
-
-
-
-
-
-

Extracting embeddings for all movie synopses

-

We use the hidden states of the last decoder layer, averaged over all tokens, as the text feature: one vector of 1024 float values per movie.
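Averaging over tokens turns a decoder output of shape (sequence length, 1024) into a single 1024-value vector per movie. The `np.mean(..., axis=0)` cell further below does this. Here is a dependency-free sketch of the same pooling on a toy input with 3 tokens and hidden size 4. The toy sizes and values are illustrative only.

```python
def mean_pool(hidden_states):
    """Average a list of per-token vectors (seq_len x hidden) into one vector (hidden)."""
    if not hidden_states:
        raise ValueError("need at least one token")
    seq_len = len(hidden_states)
    hidden = len(hidden_states[0])
    return [sum(token[d] for token in hidden_states) / seq_len for d in range(hidden)]


# Toy "last_hidden_state" for a 3-token synopsis with hidden size 4 (bart-large uses 1024).
toy_states = [
    [1.0, 0.0, 2.0, -1.0],
    [3.0, 0.0, 2.0, 1.0],
    [2.0, 3.0, 2.0, 0.0],
]
pooled = mean_pool(toy_states)
print(pooled)  # [2.0, 1.0, 2.0, 0.0]
```

A plain mean is fine here because each synopsis is encoded on its own, so there are no padding tokens. If synopses were batched with padding, you would weight the average by the attention mask instead.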

-
-
-
import pickle
-
-with open('movies_info.pkl', 'rb') as f:
-    movies_infos = pickle.load(f)['movies_infos']
-
-
-
-
-
-
-
import torch
-import numpy as np
-from tqdm import tqdm
-
-embeddings = {}
-for movie, movie_info in tqdm(movies_infos.items()):
-    # Prefer the full synopsis; fall back to the first plot summary if there is none.
-    synopsis = movie_info.get('synopsis')
-    if synopsis is None:
-        plots = movie_info.get('plot')
-        if plots:  # skip both a missing and an empty plot list
-            synopsis = plots[0]
-    
-    if synopsis is not None:
-        inputs = tokenizer(synopsis, return_tensors="pt", truncation=True, max_length=1024).to('cuda')
-        with torch.no_grad():
-            outputs = model(**inputs, output_hidden_states=True)
-        embeddings[movie] = outputs.last_hidden_state.cpu().detach().numpy()
-
-
-
-
-
100%|██████████| 62423/62423 [43:41<00:00, 23.81it/s]  
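The loop above prefers a movie's `synopsis`, falls back to the first entry of `plot`, and skips movies that have neither. Below is that selection rule on its own, as a sketch. `pick_text` is a hypothetical helper, and the dictionaries are made-up examples rather than real IMDb records.

```python
def pick_text(movie_info):
    """Return the synopsis if present, else the first plot summary, else None."""
    synopsis = movie_info.get("synopsis")
    if synopsis is not None:
        return synopsis
    plots = movie_info.get("plot")
    if plots:  # None or an empty list means there is no usable plot text
        return plots[0]
    return None


examples = {
    1: {"synopsis": "A full synopsis.", "plot": ["Short plot."]},
    2: {"plot": ["Short plot A.", "Short plot B."]},
    3: {},
}
texts = {movie: pick_text(info) for movie, info in examples.items()}
print(texts)  # {1: 'A full synopsis.', 2: 'Short plot A.', 3: None}
```

Movies that map to `None` get no entry in `embeddings`. The feature-store notebook later zero-fills their text columns.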
-
-
-
-
-
-
-
average_embeddings = {}
-for movie in embeddings:
-    average_embeddings[movie] = np.mean(embeddings[movie].squeeze(), axis=0)
-
-
-
-
-
-
-
with open('movies_synopsis_embeddings-1024.pkl', 'wb') as f:
-    pickle.dump({"embeddings": average_embeddings}, f, protocol=pickle.HIGHEST_PROTOCOL)
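Notebook 05 reads this file back with `pickle.load(f)["embeddings"]`, so the top-level dictionary key is the contract between the two notebooks. The sketch below round-trips that contract through a temporary file. The file name and vectors are placeholders.

```python
import os
import pickle
import tempfile

# Made-up embeddings keyed by movie id; the real values are 1024-float arrays.
demo_embeddings = {1: [0.1, 0.2], 2: [0.3, 0.4]}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "movies_synopsis_embeddings-demo.pkl")
    # Writer side (this notebook): wrap the dict under the "embeddings" key.
    with open(path, "wb") as f:
        pickle.dump({"embeddings": demo_embeddings}, f, protocol=pickle.HIGHEST_PROTOCOL)
    # Reader side (the feature-store notebook): unwrap the same key.
    with open(path, "rb") as f:
        loaded = pickle.load(f)["embeddings"]

print(loaded == demo_embeddings)  # True
```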
-
-
-
-
-
-
\ No newline at end of file
diff --git a/review/pr-458/notebooks/multi-modal-data/05-Create-Feature-Store.html b/review/pr-458/notebooks/multi-modal-data/05-Create-Feature-Store.html
deleted file mode 100644
index 5c70bd4553..0000000000
--- a/review/pr-458/notebooks/multi-modal-data/05-Create-Feature-Store.html
+++ /dev/null
@@ -1,759 +0,0 @@
-Creating Multi-Modal Movie Feature Store — Merlin HugeCTR documentation

Creating Multi-Modal Movie Feature Store

-

Finally, with both the text and image features ready, we now put the multi-modal movie features into a unified feature store.
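Each row of the feature store has the layout `[movieId, poster features (2048), text features (1024)]`. A modality that is missing for a movie is filled with zeros. The real-data cells fill it with the slices `1:2049` and `2049:`. Below is a dependency-free sketch of that layout with toy sizes: 2 poster values and 3 text values instead of 2048 and 1024. `build_row` and its arguments are hypothetical.

```python
def build_row(movie_id, imdb_id, poster_feature, text_feature, poster_dim, text_dim):
    """Concatenate [movieId] + poster vector + text vector, zero-filling missing parts.

    Poster features are keyed by the zero-padded IMDb id string and text features
    by the MovieLens movieId, mirroring the lookups in the real-data cells.
    """
    poster = poster_feature.get(imdb_id, [0.0] * poster_dim)
    text = text_feature.get(movie_id, [0.0] * text_dim)
    if len(poster) != poster_dim or len(text) != text_dim:
        raise ValueError("feature vector has an unexpected length")
    return [float(movie_id)] + list(poster) + list(text)


POSTER_DIM, TEXT_DIM = 2, 3  # toy sizes; the real store uses 2048 and 1024
poster_feature = {"0114709": [0.5, 0.25]}
text_feature = {1: [1.0, 2.0, 3.0], 2: [4.0, 5.0, 6.0]}

rows = [
    build_row(1, "0114709", poster_feature, text_feature, POSTER_DIM, TEXT_DIM),
    build_row(2, "0113497", poster_feature, text_feature, POSTER_DIM, TEXT_DIM),
]
for row in rows:
    print(row)
# [1.0, 0.5, 0.25, 1.0, 2.0, 3.0]
# [2.0, 0.0, 0.0, 4.0, 5.0, 6.0]
```

With the real sizes, each row has 1 + 2048 + 1024 = 3073 values. That matches the column count reported for `feature_df`.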

-

If you have downloaded the real data and completed the feature extraction in notebooks 03 and 04, proceed to create the feature store. Otherwise, skip to the Synthetic data section below to create random features.

-
-

Real data

-
-
-
import pickle
-
-with open('movies_poster_features.pkl', 'rb') as f:
-    poster_feature = pickle.load(f)["feature_dict"]
-    
-len(poster_feature)
-
-
-
-
-
61947
-
-
-
-
-
-
-
with open('movies_synopsis_embeddings-1024.pkl', 'rb') as f:
-    text_feature = pickle.load(f)["embeddings"]
-
-
-
-
-
-
-
len(text_feature)
-
-
-
-
-
61291
-
-
-
-
-
-
-
import pandas as pd
-links = pd.read_csv("./data/ml-25m/links.csv", dtype={"imdbId": str})
-
-
-
-
-
-
-
links.shape
-
-
-
-
-
(62423, 3)
-
-
-
-
-
-
-
links.head()
-
-
-
-
-
   movieId   imdbId   tmdbId
0        1  0114709    862.0
1        2  0113497   8844.0
2        3  0113228  15602.0
3        4  0114885  31357.0
4        5  0113041  11862.0
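`imdbId` is read as `str` because the IDs are zero-padded strings, for example `0114709` for movieId 1. The poster features use the same keys, as in `poster_feature['0105812']`. If the IDs are parsed as integers, the leading zero is lost and the dictionary lookup silently misses. The stdlib sketch below uses the `csv` module on a two-row inline sample in the `links.csv` layout. The 7-digit padding in the last step is an assumption that matches the IDs shown here.

```python
import csv
import io

# Two rows in the same layout as ml-25m/links.csv.
sample = "movieId,imdbId,tmdbId\n1,0114709,862\n2,0113497,8844\n"

rows = list(csv.DictReader(io.StringIO(sample)))
as_str = [row["imdbId"] for row in rows]       # csv keeps every field as a string
as_int = [int(row["imdbId"]) for row in rows]  # what an integer dtype would produce

print(as_str)  # ['0114709', '0113497']
print(as_int)  # [114709, 113497]

# If IDs were already parsed as integers, zero-padding restores the lookup key.
restored = [str(i).zfill(7) for i in as_int]
print(restored == as_str)  # True
```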
-
-
-
-
-
poster_feature['0105812'].shape
-
-
-
-
-
(2048,)
-
-
-
-
-
-
-
import numpy as np
-feature_array = np.zeros((len(links), 1+2048+1024))
-
-for i, row in links.iterrows():
-    feature_array[i,0] = row['movieId']
-    if row['imdbId'] in poster_feature:
-        feature_array[i,1:2049] = poster_feature[row['imdbId']]
-    if row['movieId'] in text_feature:
-        feature_array[i,2049:] = text_feature[row['movieId']]
-    
-
-
-
-
-
-
-
dtype= {**{'movieId': np.int64},**{x: np.float32 for x in ['poster_feature_%d'%i for i in range(2048)]+['text_feature_%d'%i for i in range(1024)]}}
-
-
-
-
-
-
-
len(dtype)
-
-
-
-
-
3073
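The `dtype` mapping declares `float32` for the 3072 feature columns. `feature_array`, however, is built with NumPy's default `float64`. The sketch below estimates what that difference means for the 62423 × 3072 feature block. It uses the standard-library `array` module only to get item sizes. It does the arithmetic only and does not apply `dtype` to `feature_df`.

```python
from array import array

NUM_MOVIES = 62423          # rows in links.csv, per links.shape above
NUM_FEATURES = 2048 + 1024  # poster + text columns (movieId excluded)

bytes_float64 = array("d").itemsize  # C double
bytes_float32 = array("f").itemsize  # C float

values = NUM_MOVIES * NUM_FEATURES
size64 = values * bytes_float64
size32 = values * bytes_float32

print(values)                               # 191763456
print(f"float64: {size64 / 2**30:.2f} GiB")  # float64: 1.43 GiB
print(f"float32: {size32 / 2**30:.2f} GiB")  # float32: 0.71 GiB
```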
-
-
-
-
-
-
-
feature_df = pd.DataFrame(feature_array, columns=['movieId']+['poster_feature_%d'%i for i in range(2048)]+['text_feature_%d'%i for i in range(1024)])
-
-
-
-
-
-
-
feature_df.head()
-
-
-
-
-
   movieId  poster_feature_0  poster_feature_1  poster_feature_2  poster_feature_3  poster_feature_4  poster_feature_5  poster_feature_6  poster_feature_7  poster_feature_8  ...  text_feature_1014  text_feature_1015  text_feature_1016  text_feature_1017  text_feature_1018  text_feature_1019  text_feature_1020  text_feature_1021  text_feature_1022  text_feature_1023
0      1.0          0.000000          0.088281          0.036760          0.000000          0.006470          0.000000          0.023553          0.000163          0.238797  ...           0.291230          -0.197272           0.024294           1.307049          -0.789571           0.084938          -0.187339           0.061683           0.183281          -0.356245
1      2.0          0.000000          0.000000          0.000000          0.289105          0.134672          0.691380          0.045417          0.000000          0.051422  ...           0.203168          -0.617449           0.443821           1.501953          -0.736949           0.180542          -0.313696           0.274087           0.153105          -0.218745
2      3.0          0.000000          0.187553          0.000000          0.904370          0.069441          0.026665          0.817211          0.000000          0.125072  ...           0.173140          -0.209240           0.451933           1.491917          -0.743956          -0.069061          -0.900011           0.583347           0.192817           0.224088
3      4.0          0.182279          0.014646          0.004135          0.197796          0.077938          0.000000          0.215127          0.021160          0.023108  ...          -0.394012           0.679462           1.225475           1.196255          -0.169627          -0.008575          -0.172138           0.114755          -0.127861          -0.003679
4      5.0          0.000000          0.082123          0.447287          0.002375          0.135956          0.000000          0.989514          0.808180          0.317510  ...          -0.176658          -0.078992           0.726118           1.017430          -0.249834           0.183357          -0.071451           0.644567           0.090399          -1.147284
-

5 rows × 3073 columns

-
-
-
-
-
feature_df.shape
-
-
-
-
-
(62423, 3073)
-
-
-
-
-
-
-
!pip install pyarrow
-
-
-
-
-
Looking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com
-Requirement already satisfied: pyarrow in /usr/local/lib/python3.8/dist-packages (1.0.1)
-Requirement already satisfied: numpy>=1.14 in /usr/local/lib/python3.8/dist-packages (from pyarrow) (1.20.3)
-WARNING: You are using pip version 21.0.1; however, version 21.1.2 is available.
-You should consider upgrading via the '/usr/bin/python -m pip install --upgrade pip' command.
-
-
-
-
-
-
-
feature_df.to_parquet('feature_df.parquet')
-
-
-
-
-
-
-

Synthetic data

-

If you have not extracted image and text features from real data, proceed with this section to create synthetic features.
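The synthetic cells below fill every feature with `np.random.rand`, which samples uniformly from [0, 1). They then overwrite column 0 with the real `movieId`, so joins on `movieId` still work. The sketch below does the same in plain Python with a fixed seed so the values are reproducible. The seed, the toy sizes and `synthetic_feature_rows` are assumptions, not part of the notebook.

```python
import random


def synthetic_feature_rows(movie_ids, num_features, seed=0):
    """Return one row per movie: [movieId] followed by uniform [0, 1) random features."""
    rng = random.Random(seed)  # seeded generator so reruns give identical features
    return [[float(m)] + [rng.random() for _ in range(num_features)] for m in movie_ids]


rows = synthetic_feature_rows([1, 2, 3], num_features=4)  # the notebook uses 3072
for row in rows:
    print(row[0], len(row))
# 1.0 5
# 2.0 5
# 3.0 5
```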

-
-
-
import pandas as pd
-links = pd.read_csv("./data/ml-25m/links.csv", dtype={"imdbId": str})
-
-
-
-
-
-
-
import numpy as np
-
-feature_array = np.random.rand(links.shape[0], 3073)
-
-
-
-
-
-
-
feature_array[:,0] = links['movieId'].values
-
-
-
-
-
-
-
feature_df = pd.DataFrame(feature_array, columns=['movieId']+['poster_feature_%d'%i for i in range(2048)]+['text_feature_%d'%i for i in range(1024)])
-
-
-
-
-
-
-
feature_df.to_parquet('feature_df.parquet')
-
-
-
-
-
-
-
feature_df.head()
-
-
-
-
-
   movieId  poster_feature_0  poster_feature_1  poster_feature_2  poster_feature_3  poster_feature_4  poster_feature_5  poster_feature_6  poster_feature_7  poster_feature_8  ...  text_feature_1014  text_feature_1015  text_feature_1016  text_feature_1017  text_feature_1018  text_feature_1019  text_feature_1020  text_feature_1021  text_feature_1022  text_feature_1023
0      1.0          0.026260          0.857608          0.410247          0.066654          0.382803          0.899998          0.511562          0.592291          0.565434  ...           0.636716           0.578369           0.996169           0.402107           0.412318           0.859952           0.293852           0.341114           0.727113           0.085829
1      2.0          0.141265          0.721758          0.679958          0.955634          0.391091          0.324611          0.505211          0.258331          0.048264  ...           0.161505           0.431864           0.836532           0.525013           0.654566           0.823841           0.818313           0.856280           0.638048           0.685537
2      3.0          0.119418          0.911146          0.470762          0.762258          0.626335          0.768947          0.241833          0.775992          0.236340  ...           0.865548           0.387806           0.668321           0.552122           0.750238           0.863707           0.382173           0.894487           0.565142           0.164083
3      4.0          0.538184          0.980678          0.643513          0.928519          0.794906          0.201022          0.744666          0.962188          0.915320  ...           0.777534           0.904200           0.167337           0.875194           0.180481           0.815904           0.808288           0.036711           0.902779           0.580946
4      5.0          0.772951          0.239788          0.061874          0.162997          0.388310          0.236311          0.162757          0.207134          0.111078  ...           0.250022           0.335043           0.091674           0.121507           0.418124           0.150020           0.803506           0.059504           0.002342           0.932321
-

5 rows × 3073 columns

-
-
-
-
\ No newline at end of file
diff --git a/review/pr-458/notebooks/multi-modal-data/06-ETL-with-NVTabular.html b/review/pr-458/notebooks/multi-modal-data/06-ETL-with-NVTabular.html
deleted file mode 100644
index fc9a6fb7d5..0000000000
--- a/review/pr-458/notebooks/multi-modal-data/06-ETL-with-NVTabular.html
+++ /dev/null
@@ -1,556 +0,0 @@
-ETL with NVTabular — Merlin HugeCTR documentation

ETL with NVTabular

-

NVTabular is a feature engineering and preprocessing library for tabular data, designed to quickly and easily manipulate terabyte-scale datasets used to train deep learning-based recommender systems. It provides a high-level abstraction to simplify code and accelerates computation on the GPU using the RAPIDS cuDF library.

-

Deep learning models require input features in a specific format. Categorical features need to be contiguous integers (0, …, |C|-1) to be used with an embedding layer. We will use NVTabular to preprocess the categorical features.

-

This notebook prepares the MovieLens data for use with HugeCTR training.

-
-
-
# External dependencies
-!apt update && apt install -y graphviz
-
-import cudf
-import os
-import shutil
-import numpy as np
-
-import nvtabular as nvt
-
-from os import path
-
-
-
-
-
Hit:1 http://security.ubuntu.com/ubuntu focal-security InRelease
-Hit:2 http://archive.ubuntu.com/ubuntu focal InRelease
-Hit:3 http://ppa.launchpad.net/deadsnakes/ppa/ubuntu focal InRelease
-Hit:4 http://archive.ubuntu.com/ubuntu focal-updates InRelease
-Hit:5 http://archive.ubuntu.com/ubuntu focal-backports InRelease
-Reading package lists... Done
-Building dependency tree       
-Reading state information... Done
-16 packages can be upgraded. Run 'apt list --upgradable' to see them.
-Reading package lists... Done
-Building dependency tree       
-Reading state information... Done
-graphviz is already the newest version (2.42.2-3build2).
-The following packages were automatically installed and are no longer required:
-  cmake-data libarchive13 librhash0 libuv1
-Use 'apt autoremove' to remove them.
-0 upgraded, 0 newly installed, 0 to remove and 16 not upgraded.
-
-
-
-
-

We define our base input directory, containing the data.

-
-
-
INPUT_DATA_DIR = './data'
-
-
-
-
-
-
-
movies = cudf.read_parquet(os.path.join(INPUT_DATA_DIR, "movies_converted.parquet"))
-movies.head()
-
-
-
-
-
   movieId
0        1
1        2
2        3
3        4
4        5
-
-
-
-

Defining our Preprocessing Pipeline

-

The first step is to define the feature engineering and preprocessing pipeline.

-NVTabular provides a set of built-in calculations, called ops. An op is applied to a ColumnGroup with the overloaded >> operator, which returns a new ColumnGroup. A ColumnGroup is a list of column names.

-Example:

-
features = [ column_name, ...] >> op1 >> op2 >> ...
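The >> chaining relies on Python operator overloading. The toy classes below are a sketch of that mechanism only; ColumnGroup and Op here are hypothetical stand-ins, not NVTabular's classes. A plain list has no >> for ops, so Python falls back to the op's __rrshift__, and each step returns a new group instead of modifying the previous one.

```python
class ColumnGroup:
    """Hypothetical stand-in: column names plus the ops applied so far."""

    def __init__(self, columns, ops=None):
        self.columns = list(columns)
        self.ops = list(ops or [])

    def __rshift__(self, op):
        # Return a *new* group; the left-hand group is left unchanged.
        return ColumnGroup(self.columns, self.ops + [op])


class Op:
    """Hypothetical op that only records its name."""

    def __init__(self, name):
        self.name = name

    def __rrshift__(self, columns):
        # Called for `["a", "b"] >> op`, because list defines no __rshift__.
        return ColumnGroup(columns) >> self


features = ["userId", "movieId"] >> Op("op1") >> Op("op2")
print(features.columns, [op.name for op in features.ops])
# ['userId', 'movieId'] ['op1', 'op2']
```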
-
-
-

This may sound more complicated than it is. Let’s define our first pipeline for the MovieLens dataset.

-

Currently, our data consists of two separate dataframes: the ratings and the movies table. First, we use the JoinExternal operator to left-join the movies table onto our rating dataset on movieId.
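For readers unfamiliar with the term, the pure-Python sketch below shows what left-join semantics mean on a toy example. It is not how JoinExternal is implemented; the rows and the "title" column are hypothetical, and the sketch assumes the join key is unique in the right-hand table.

```python
toy_ratings = [
    {"userId": 1, "movieId": 10, "rating": 4.0},
    {"userId": 2, "movieId": 99, "rating": 2.0},  # movieId 99 has no metadata row
]
toy_movies = [{"movieId": 10, "title": "Movie A"}]  # "title" is hypothetical


def left_join(left, right, on):
    """Keep every row of `left`; attach matching `right` columns (None if no match)."""
    index = {row[on]: row for row in right}  # assumes `on` is unique in `right`
    right_cols = sorted({col for row in right for col in row if col != on})
    joined = []
    for row in left:
        match = index.get(row[on], {})
        joined.append({**row, **{col: match.get(col) for col in right_cols}})
    return joined


joined_rows = left_join(toy_ratings, toy_movies, on="movieId")
for row in joined_rows:
    print(row)
```

Every rating row survives the join; rows without a matching movie get None for the movie columns.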

-
-
-
CATEGORICAL_COLUMNS = ["userId", "movieId"]
-LABEL_COLUMNS = ["rating"]
-
-
-
-
-
-
-
joined = ["userId", "movieId"] >> nvt.ops.JoinExternal(movies, on=["movieId"])
-
-
-
-
-

Data pipelines are Directed Acyclic Graphs (DAGs). We can visualize them with graphviz.
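To illustrate why the DAG property matters, the sketch below models this notebook's pipeline steps (join, then Categorify, then the rename) as a dependency mapping with hypothetical node names and orders them with the standard library's graphlib.TopologicalSorter (Python 3.9+). A valid execution order exists only because the graph has no cycles; TopologicalSorter raises graphlib.CycleError otherwise.

```python
from graphlib import TopologicalSorter

# Each node maps to the set of nodes it depends on (hypothetical names).
pipeline = {
    "JoinExternal": {"input: userId, movieId"},
    "Categorify": {"JoinExternal"},
    "Rename(movieId_duplicate)": {"Categorify"},
}

# static_order() yields every node after all of its dependencies.
order = list(TopologicalSorter(pipeline).static_order())
print(order)
```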

-
-
-
joined.graph
-
-
-
-
-../../_images/066b2fd5e0bb125957a08439799519110a492bebe7436d750d0721277048b473.svg
-
-

Embedding layers of neural networks require categorical features to be contiguous, incremental integers: 0, 1, 2, …, |C|-1. We need to ensure that our categorical features meet this requirement.

-

We need to transform the single-hot categorical features userId and movieId.
-NVTabular provides the Categorify operator, which offers this functionality through a high-level API out of the box. Since NVTabular release v0.3, Categorify also supports list columns for multi-hot categorical features; single-hot and multi-hot columns are handled the same way, with no code changes needed.
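The core idea of a categorify step can be sketched in a few lines of pure Python: map each distinct value to a contiguous id in 0..|C|-1, and reuse the same mapping element-wise for multi-hot lists. This is a simplified illustration, not NVTabular's Categorify, which also handles nulls and other cases and may assign ids differently (for example, it may reserve low ids for special values). The genre lists are hypothetical multi-hot example data.

```python
def build_mapping(values):
    # Sort the distinct values so that id assignment is deterministic.
    return {v: i for i, v in enumerate(sorted(set(values)))}


def encode(values, mapping):
    # Single-hot: one category per row.
    return [mapping[v] for v in values]


def encode_multi_hot(lists, mapping):
    # Multi-hot: each row holds a list of categories, encoded element-wise.
    return [[mapping[v] for v in row] for row in lists]


movie_ids = [318, 1, 318, 50]
mapping = build_mapping(movie_ids)
print(encode(movie_ids, mapping))  # [2, 0, 2, 1]

genres = [["Drama"], ["Comedy", "Drama"], ["Action"]]
genre_mapping = build_mapping(g for row in genres for g in row)
print(encode_multi_hot(genres, genre_mapping))  # [[2], [1, 2], [0]]
```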

-

Next, we apply Categorify to our categorical features (single-hot: userId and movieId) and create a renamed copy of the encoded movieId column.

-
-
-
cat_features = joined >> nvt.ops.Categorify() 
-movieId_dup = cat_features["movieId"] >> nvt.ops.Rename(postfix='_duplicate')
-
-
-
-
-

The ratings are on a scale from 1 to 5. We want to predict a binary target: 1 for ratings > 3 and 0 for ratings <= 3. We use a LambdaOp for this.
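The thresholding rule is simple enough to check on a plain Python list before applying it as a LambdaOp. The rating values below are made up for illustration.

```python
rating_values = [1.0, 3.0, 3.5, 4.0, 5.0]
# 1 if the rating is strictly greater than 3, else 0 (same rule as the LambdaOp)
targets = [int(r > 3) for r in rating_values]
print(targets)  # [0, 0, 1, 1, 1]
```

Note that a rating of exactly 3 maps to 0.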

-
-
-
ratings = nvt.ColumnGroup(["rating"]) >> (lambda col: (col > 3).astype("int8"))
-
-
-
-
-

We also add the duplicate of the movieId column created above, which will be used to look up the pretrained movie embedding features.

-
-
-
output = cat_features + ratings + movieId_dup
-(output).graph
-
-
-
-
-../../_images/9029315af2f13b63460fd19ef84ef369a2f7327367b6e976dcf7f624d6784168.svg
-
-

We initialize our NVTabular workflow.

-
-
-
workflow = nvt.Workflow(output)
-
-
-
-
-
-
-

Running the pipeline

-

In general, the ops in our Workflow need statistics about the data before they can be applied. For example, the Normalize op requires the dataset mean and standard deviation, and the Categorify op requires a record of all the categories a particular feature can take. However, we frequently need to compute these statistics over datasets that are too large to fit into GPU memory (or even CPU memory) at once.
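As a minimal sketch of how a global statistic can be gathered without loading all the data at once, the function below computes the mean and population standard deviation chunk by chunk, keeping only three running totals. This illustrates the idea, not NVTabular's implementation; the sum-of-squares formula can lose precision on large values, and a more robust variant would merge per-chunk Welford/Chan statistics.

```python
import math


def chunked_mean_std(chunks):
    # Running totals: row count, sum, and sum of squares.
    n = 0
    total = 0.0
    total_sq = 0.0
    for chunk in chunks:  # each chunk could be one partition read from disk
        n += len(chunk)
        total += sum(chunk)
        total_sq += sum(x * x for x in chunk)
    mean = total / n
    var = total_sq / n - mean * mean  # population variance
    return mean, math.sqrt(max(var, 0.0))


chunks = [[1.0, 2.0], [3.0, 4.0, 5.0]]
mean, std = chunked_mean_std(chunks)
print(mean, round(std, 6))  # 3.0 1.414214
```

Categorify follows the same pattern: the set of categories can be accumulated as a union of the distinct values found in each chunk.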

-

NVTabular solves this by providing the Dataset class, which breaks a set of parquet or CSV files into a collection of cudf.DataFrame chunks that can fit in device memory. The main purpose of this class is to abstract away the raw format of the data and to allow other NVTabular classes to reliably materialize a dask_cudf.DataFrame collection (and/or a collection-based iterator) on demand. Under the hood, the data decomposition corresponds to the construction of a dask_cudf.DataFrame object. By representing our dataset as a lazily evaluated Dask collection, we can compute complex global statistics (and later, iterate over the partitions while feeding data into a neural network). The part_size parameter sets the amount of data read into GPU memory at once.
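The effect of a size budget such as part_size="100MB" can be sketched as splitting rows into partitions that each stay under a byte limit. The function below is hypothetical and assumes a fixed per-row size; NVTabular's actual partitioning may differ (the warnings printed below, for instance, concern parquet row groups that are larger than part_size).

```python
def partition_rows(num_rows, bytes_per_row, part_size_bytes):
    # Hypothetical helper: how many rows fit into one partition (at least one).
    rows_per_part = max(1, part_size_bytes // bytes_per_row)
    # Yield (start, stop) row ranges; the last partition may be smaller.
    for start in range(0, num_rows, rows_per_part):
        yield start, min(start + rows_per_part, num_rows)


parts = list(partition_rows(num_rows=10, bytes_per_row=24, part_size_bytes=100))
print(parts)  # [(0, 4), (4, 8), (8, 10)]
```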

-

Now we instantiate dataset iterators to loop through our dataset (which, in general, may not fit into GPU memory at once). HugeCTR expects the categorical input columns as int64 and the continuous/label columns as float32. To enforce these data types, we collect them in a dictionary and pass it as the dtypes argument when writing the transformed datasets to parquet.

-
-
-
dict_dtypes = {}
-
-for col in CATEGORICAL_COLUMNS:
-    dict_dtypes[col] = np.int64
-
-for col in LABEL_COLUMNS:
-    dict_dtypes[col] = np.float32
-
-
-
-
-
-
-
train_dataset = nvt.Dataset([os.path.join(INPUT_DATA_DIR, "train.parquet")], part_size="100MB")
-valid_dataset = nvt.Dataset([os.path.join(INPUT_DATA_DIR, "valid.parquet")], part_size="100MB")
-
-
-
-
-
/nvtabular/nvtabular/io/parquet.py:285: UserWarning: Row group memory size (640002432) (bytes) of parquet file is bigger than requested part_size (100000000) for the NVTabular dataset.A row group memory size of 128 MB is generally recommended. You can find info on how to set the row group size of parquet files in https://nvidia-merlin.github.io/NVTabular/main/resources/troubleshooting.html#setting-the-row-group-size-for-the-parquet-files
-  warnings.warn(
-/nvtabular/nvtabular/io/parquet.py:285: UserWarning: Row group memory size (160000608) (bytes) of parquet file is bigger than requested part_size (100000000) for the NVTabular dataset.A row group memory size of 128 MB is generally recommended. You can find info on how to set the row group size of parquet files in https://nvidia-merlin.github.io/NVTabular/main/resources/troubleshooting.html#setting-the-row-group-size-for-the-parquet-files
-  warnings.warn(
-
-
-
-
-

Now that we have our datasets, we apply our Workflow to them and save the results to parquet files for fast reading at training time. Similar to the scikit-learn API, we collect the statistics of our training dataset with .fit.
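The fit/transform split can be sketched with a toy encoder: the category vocabulary is learned on the training data only and then reused unchanged on validation data. ToyCategorify is hypothetical, and reserving id 0 for values unseen during fit is an assumption of this sketch, not a statement about NVTabular's encoding.

```python
class ToyCategorify:
    """Hypothetical encoder illustrating fit on train, transform on any split."""

    def fit(self, values):
        # Ids start at 1 so that 0 can stand for "unseen during fit".
        self.mapping = {v: i + 1 for i, v in enumerate(sorted(set(values)))}
        return self  # returning self allows chaining, as workflow.fit(...) does

    def transform(self, values):
        return [self.mapping.get(v, 0) for v in values]


enc = ToyCategorify().fit(["a", "b", "a"])  # statistics from training data only
print(enc.transform(["b", "a", "z"]))  # [2, 1, 0]
```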

-
-
-
%%time
-
-workflow.fit(train_dataset)
-
-
-
-
-
CPU times: user 554 ms, sys: 427 ms, total: 981 ms
-Wall time: 1.04 s
-
-
-
<nvtabular.workflow.workflow.Workflow at 0x7fbb086a3370>
-
-
-
-
-

We clear our output directories.

-
-
-
# Make sure we have a clean output path
-if path.exists(os.path.join(INPUT_DATA_DIR, "train-hugectr")):
-    shutil.rmtree(os.path.join(INPUT_DATA_DIR, "train-hugectr"))
-if path.exists(os.path.join(INPUT_DATA_DIR, "valid-hugectr")):
-    shutil.rmtree(os.path.join(INPUT_DATA_DIR, "valid-hugectr"))
-
-
-
-
-

We apply the fitted workflow with .transform and write the output to parquet. We list only 'userId' and 'movieId' as cats (with 'rating' as the label), which determines the columns recorded in _metadata.json; HugeCTR training needs this JSON file to obtain the required information about the rows in each parquet file.

-
-
-
%%time
-workflow.transform(train_dataset).to_parquet(
-    output_path=os.path.join(INPUT_DATA_DIR, "train-hugectr"),
-    shuffle=nvt.io.Shuffle.PER_PARTITION,
-    cats=["userId", "movieId"],
-    labels=["rating"],
-    dtypes=dict_dtypes,
-)
-
-
-
-
-
CPU times: user 1 µs, sys: 1e+03 ns, total: 2 µs
-Wall time: 5.25 µs
-
-
-
-
-
-
-
%%time
-workflow.transform(valid_dataset).to_parquet(
-    output_path=os.path.join(INPUT_DATA_DIR, "valid-hugectr"),
-    shuffle=False,
-    cats=["userId", "movieId"],
-    labels=["rating"],
-    dtypes=dict_dtypes,
-)
-
-
-
-
-
CPU times: user 2 µs, sys: 2 µs, total: 4 µs
-Wall time: 6.68 µs
-
-
-
-
-

We can take a look at the output directory.

-

In the next notebooks, we will train a deep learning model. Our training pipeline requires information about the data schema to define the neural network architecture, so we save the NVTabular workflow to disk and restore it in the next notebooks.

-
-
-
workflow.save(os.path.join(INPUT_DATA_DIR, "workflow-hugectr"))
-
-
-
-
-
-
-
from nvtabular.ops import get_embedding_sizes
-
-embeddings = get_embedding_sizes(workflow)
-print(embeddings)
-
-
-
-
-
{'userId': (162542, 512), 'movieId': (56586, 512), 'movieId_duplicate': (56586, 512)}
-
-
-
-
-
-
-

Checking the pre-processing outputs

-

We can take a look at the data.

-
-
-
import glob
-
-TRAIN_PATHS = sorted(glob.glob(os.path.join(INPUT_DATA_DIR, "train-hugectr", "*.parquet")))
-VALID_PATHS = sorted(glob.glob(os.path.join(INPUT_DATA_DIR, "valid-hugectr", "*.parquet")))
-TRAIN_PATHS, VALID_PATHS
-
-
-
-
-
(['./data/train-hugectr/part_0.parquet'],
- ['./data/valid-hugectr/part_0.parquet'])
-
-
-
-
-

We can see that userId, movieId, and movieId_duplicate are now integer-encoded and that rating is a binary label.

-
-
-
df = cudf.read_parquet(TRAIN_PATHS[0])
-df.head()
-
-
-
-
-
   userId  movieId  rating  movieId_duplicate
0   26460      874     0.0                874
1   97438     1704     0.0               1704
2  105574     3568     0.0               3568
3   39464       30     1.0                 30
4  127724       98     1.0                 98
\ No newline at end of file
diff --git a/review/pr-458/notebooks/multi-modal-data/07-Training-with-HugeCTR.html b/review/pr-458/notebooks/multi-modal-data/07-Training-with-HugeCTR.html
deleted file mode 100644
index 9377599445..0000000000
--- a/review/pr-458/notebooks/multi-modal-data/07-Training-with-HugeCTR.html
+++ /dev/null
@@ -1,894 +0,0 @@
-Training HugeCTR Model with Pre-trained Embeddings — Merlin HugeCTR documentation
- - http://developer.download.nvidia.com/notebooks/dlsw-notebooks/merlin_hugectr_multi-modal-data-7-training-with-hugectr/nvidia_logo.png -
-

Training HugeCTR Model with Pre-trained Embeddings

-

In this notebook, we will train a deep neural network to predict a user’s rating (binary target: 1 for ratings > 3 and 0 for ratings <= 3). The two categorical features are userId and movieId.

-

We will also use the pretrained movie embeddings extracted in the previous notebooks.

-
-

Loading pretrained movie features into a non-trainable embedding layer

-
-
-
# loading NVTabular movie encoding
-import pandas as pd
-import os
-
-INPUT_DATA_DIR = './data'
-movie_mapping = pd.read_parquet(os.path.join(INPUT_DATA_DIR, "workflow-hugectr/categories/unique.movieId.parquet"))
-
-
-
-
-
-
-
movie_mapping.tail()
-
-
-
-
-
       movieId  movieId_size
56581   209155             1
56582   209157             1
56583   209159             1
56584   209169             1
56585   209171             1
-
-
-
-
-
feature_df = pd.read_parquet('feature_df.parquet')
-print(feature_df.shape)
-feature_df.head()
-
-
-
-
-
(62423, 3073)
-
-
-
   movieId  poster_feature_0  poster_feature_1  poster_feature_2  poster_feature_3  poster_feature_4  poster_feature_5  poster_feature_6  poster_feature_7  poster_feature_8  ...  text_feature_1014  text_feature_1015  text_feature_1016  text_feature_1017  text_feature_1018  text_feature_1019  text_feature_1020  text_feature_1021  text_feature_1022  text_feature_1023
0      1.0          0.026260          0.857608          0.410247          0.066654          0.382803          0.899998          0.511562          0.592291          0.565434  ...           0.636716           0.578369           0.996169           0.402107           0.412318           0.859952           0.293852           0.341114           0.727113           0.085829
1      2.0          0.141265          0.721758          0.679958          0.955634          0.391091          0.324611          0.505211          0.258331          0.048264  ...           0.161505           0.431864           0.836532           0.525013           0.654566           0.823841           0.818313           0.856280           0.638048           0.685537
2      3.0          0.119418          0.911146          0.470762          0.762258          0.626335          0.768947          0.241833          0.775992          0.236340  ...           0.865548           0.387806           0.668321           0.552122           0.750238           0.863707           0.382173           0.894487           0.565142           0.164083
3      4.0          0.538184          0.980678          0.643513          0.928519          0.794906          0.201022          0.744666          0.962188          0.915320  ...           0.777534           0.904200           0.167337           0.875194           0.180481           0.815904           0.808288           0.036711           0.902779           0.580946
4      5.0          0.772951          0.239788          0.061874          0.162997          0.388310          0.236311          0.162757          0.207134          0.111078  ...           0.250022           0.335043           0.091674           0.121507           0.418124           0.150020           0.803506           0.059504           0.002342           0.932321
-

5 rows × 3073 columns

-
-
-
-
-
feature_df.set_index('movieId', inplace=True)
-
-
-
-
-
-
-
from tqdm import tqdm
-import numpy as np
-
-num_tokens = len(movie_mapping)
-embedding_dim = 2048+1024
-hits = 0
-misses = 0
-
-# Prepare embedding matrix
-embedding_matrix = np.zeros((num_tokens, embedding_dim))
-
-print("Loading pretrained embedding matrix...")
-for i, row in tqdm(movie_mapping.iterrows(), total=len(movie_mapping)):
-    movieId = row['movieId']
-    if movieId in feature_df.index: 
-        embedding_vector = feature_df.loc[movieId]
-        # embedding found
-        embedding_matrix[i] = embedding_vector
-        hits += 1
-    else:
-        misses += 1
-print("Found features for %d movies (%d misses)" % (hits, misses))
-
-
-
-
-
Loading pretrained embedding matrix...
-
-
-
100%|████████████████████████████████████| 56586/56586 [00:14<00:00, 3967.46it/s]
-
-
-
Found features for 56585 movies (1 misses)
-
-
-

-
-
-
-
-
-
-
embedding_dim
-
-
-
-
-
3072
-
-
-
-
-
-
-
embedding_matrix
-
-
-
-
-
array([[0.        , 0.        , 0.        , ..., 0.        , 0.        ,
-        0.        ],
-       [0.17294852, 0.15285189, 0.26095702, ..., 0.75369112, 0.29602144,
-        0.78917433],
-       [0.13539355, 0.84843078, 0.70951219, ..., 0.10441725, 0.72871966,
-        0.11719463],
-       ...,
-       [0.18514273, 0.72422918, 0.04273015, ..., 0.1404219 , 0.54169348,
-        0.96875489],
-       [0.08307642, 0.3673532 , 0.15777258, ..., 0.01297393, 0.36267638,
-        0.14848055],
-       [0.82188376, 0.56516905, 0.70838085, ..., 0.45119769, 0.9273439 ,
-        0.42464321]])
-
-
-
-
-

Next, we write the pretrained embedding to a raw format supported by HugeCTR.

-

Note: As of version 3.2, HugeCTR supports a maximum embedding size of 1024. Hence, we use the first 512 elements of the image (poster) embedding plus the first 512 elements of the text embedding.

-
-
-
import os
-import struct
-
-import numpy as np
-
-PRETRAINED_EMBEDDING_SIZE = 1024  # HugeCTR 3.2 does not support embedding sizes > 1024
-HALF_SIZE = PRETRAINED_EMBEDDING_SIZE // 2  # 512 poster + 512 text features
-POSTER_DIM = 2048  # poster features occupy columns [0, 2048), text features [2048, 3072)
-
-def convert_pretrained_embeddings_to_sparse_model(keys, pre_trained_sparse_embeddings, hugectr_sparse_model, embedding_vec_size):
-    # Write two raw binary files: "key" (one int64 per key) and
-    # "emb_vector" (embedding_vec_size float32 values per key, in the same order).
-    os.makedirs(hugectr_sparse_model, exist_ok=True)
-    with open(os.path.join(hugectr_sparse_model, "key"), 'wb') as key_file, \
-        open(os.path.join(hugectr_sparse_model, "emb_vector"), 'wb') as vec_file:
-        for i, key in enumerate(keys):
-            # First 512 poster (image) features followed by the first 512 text features.
-            vec = np.concatenate([
-                pre_trained_sparse_embeddings[i, :HALF_SIZE],
-                pre_trained_sparse_embeddings[i, POSTER_DIM:POSTER_DIM + HALF_SIZE],
-            ])
-            key_file.write(struct.pack('q', key))
-            vec_file.write(struct.pack(str(embedding_vec_size) + "f", *vec))
-
-keys = list(movie_mapping.index)
-convert_pretrained_embeddings_to_sparse_model(keys, embedding_matrix, 'hugectr_pretrained_embedding.model', embedding_vec_size=PRETRAINED_EMBEDDING_SIZE)
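To sanity-check the raw layout that convert_pretrained_embeddings_to_sparse_model writes (a "key" file with one packed int64 per key and an "emb_vector" file with embedding_vec_size packed float32 values per key, in the same order), one can read the files back with the struct module. The helpers write_sparse_model and read_sparse_model below are hypothetical and use tiny sizes so the round trip is easy to inspect.

```python
import os
import struct
import tempfile


def write_sparse_model(path, keys, vectors, dim):
    # Same packing as the conversion function: 'q' for keys, dim x 'f' per vector.
    os.makedirs(path, exist_ok=True)
    with open(os.path.join(path, "key"), "wb") as kf, \
         open(os.path.join(path, "emb_vector"), "wb") as vf:
        for key, vec in zip(keys, vectors):
            kf.write(struct.pack("q", key))
            vf.write(struct.pack(str(dim) + "f", *vec))


def read_sparse_model(path, dim):
    with open(os.path.join(path, "key"), "rb") as kf:
        raw_keys = kf.read()
    with open(os.path.join(path, "emb_vector"), "rb") as vf:
        raw_vecs = vf.read()
    n = len(raw_keys) // struct.calcsize("q")
    # The two files must describe the same number of keys.
    assert len(raw_vecs) == n * dim * struct.calcsize("f")
    keys = list(struct.unpack(str(n) + "q", raw_keys))
    flat = struct.unpack(str(n * dim) + "f", raw_vecs)
    vectors = [list(flat[i * dim:(i + 1) * dim]) for i in range(n)]
    return keys, vectors


with tempfile.TemporaryDirectory() as tmp:
    model_dir = os.path.join(tmp, "toy.model")
    write_sparse_model(model_dir, [0, 1], [[0.0, 0.5], [1.0, 2.0]], dim=2)
    result = read_sparse_model(model_dir, dim=2)
print(result)  # ([0, 1], [[0.0, 0.5], [1.0, 2.0]])
```

Applied to the real output directory with dim=1024, the same reader should return 56586 keys, one per row of movie_mapping.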
-
-
-
-
-
-
-

Define and train model

-

In this section, we define and train the model. The model comprises trainable embedding layers for the categorical features (userId, movieId) and a pretrained (non-trainable) embedding layer for the movie features.
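As a worked check of the architecture, the arithmetic below uses the settings from model.py: 3 trainable slots with embedding_vec_size 16, one frozen pretrained slot of size 1024, then fully connected layers of 128, 128, and 1 outputs. It assumes each InnerProduct layer has a bias term; the frozen table size assumes float32 storage.

```python
trainable_slots, trainable_dim = 3, 16
pretrained_dim = 1024

# Width of concat1: reshaped trainable embeddings plus the pretrained vector.
concat_width = trainable_slots * trainable_dim + pretrained_dim
print(concat_width)  # 1072

# Each fully connected layer: in * out weights plus out biases.
layer_sizes = [concat_width, 128, 128, 1]
dense_params = sum(i * o + o for i, o in zip(layer_sizes, layer_sizes[1:]))
print(dense_params)  # 153985

# Frozen pretrained table: 56586 movie ids x 1024 float32 values.
pretrained_rows = 56586
pretrained_bytes = pretrained_rows * pretrained_dim * 4
print(pretrained_bytes)  # 231776256 (about 221 MiB)
```

The 1072 width agrees with the concat1 output shape reported in the model summary.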

-

We will write the model to ./model.py and execute it afterwards.

-

First, we need the cardinality of each categorical feature to set slot_size_array in the model below.

-
-
-
import nvtabular as nvt
-from nvtabular.ops import get_embedding_sizes
-
-workflow = nvt.Workflow.load(os.path.join(INPUT_DATA_DIR, "workflow-hugectr"))
-
-embeddings = get_embedding_sizes(workflow)
-print(embeddings)
-
-#{'userId': (162542, 512), 'movieId': (56586, 512), 'movieId_duplicate': (56586, 512)}
-
-
-
-
-
{'userId': (162542, 512), 'movieId': (56586, 512), 'movieId_duplicate': (56586, 512)}
-
-
-
-
-

We use graph_to_json to convert the model to a JSON configuration, which is required for inference.

-
-
-
%%writefile './model.py'
-
-import hugectr
-from mpi4py import MPI  # noqa
-INPUT_DATA_DIR = './data/'
-
-solver = hugectr.CreateSolver(
-    vvgpu=[[0]],
-    batchsize=2048,
-    batchsize_eval=2048,
-    max_eval_batches=160,
-    i64_input_key=True,
-    use_mixed_precision=False,
-    repeat_dataset=True,
-)
-optimizer = hugectr.CreateOptimizer(optimizer_type=hugectr.Optimizer_t.Adam)
-reader = hugectr.DataReaderParams(
-    data_reader_type=hugectr.DataReaderType_t.Parquet,
-    source=[INPUT_DATA_DIR + "train-hugectr/_file_list.txt"],
-    eval_source=INPUT_DATA_DIR + "valid-hugectr/_file_list.txt",
-    check_type=hugectr.Check_t.Non,
-    slot_size_array=[162542, 56586, 21, 56586],
-)
-
-model = hugectr.Model(solver, reader, optimizer)
-
-model.add(
-    hugectr.Input(
-        label_dim=1,
-        label_name="label",
-        dense_dim=0,
-        dense_name="dense",
-        data_reader_sparse_param_array=[
-            hugectr.DataReaderSparseParam("data1", nnz_per_slot=[1, 1, 2], is_fixed_length=False, slot_num=3),
-            hugectr.DataReaderSparseParam("movieId", nnz_per_slot=[1], is_fixed_length=True, slot_num=1)
-        ],
-    )
-)
-model.add(
-    hugectr.SparseEmbedding(
-        embedding_type=hugectr.Embedding_t.LocalizedSlotSparseEmbeddingHash,
-        workspace_size_per_gpu_in_mb=3000,
-        embedding_vec_size=16,
-        combiner="sum",
-        sparse_embedding_name="sparse_embedding1",
-        bottom_name="data1",
-        optimizer=optimizer,
-    )
-)
-
-# pretrained embedding
-model.add(
-    hugectr.SparseEmbedding(
-        embedding_type=hugectr.Embedding_t.DistributedSlotSparseEmbeddingHash,
-        workspace_size_per_gpu_in_mb=3000,
-        embedding_vec_size=1024,
-        combiner="sum",
-        sparse_embedding_name="pretrained_embedding",
-        bottom_name="movieId",
-        optimizer=optimizer,
-    )
-)
-
-model.add(hugectr.DenseLayer(layer_type = hugectr.Layer_t.Reshape,
-                            bottom_names = ["sparse_embedding1"],
-                            top_names = ["reshape1"],
-                            leading_dim=48))
-
-model.add(hugectr.DenseLayer(layer_type = hugectr.Layer_t.Reshape,
-                            bottom_names = ["pretrained_embedding"],
-                            top_names = ["reshape2"],
-                            leading_dim=1024))
-
-model.add(hugectr.DenseLayer(layer_type = hugectr.Layer_t.Concat,
-                            bottom_names = ["reshape1", "reshape2"],
-                            top_names = ["concat1"]))
-
-model.add(
-    hugectr.DenseLayer(
-        layer_type=hugectr.Layer_t.InnerProduct,
-        bottom_names=["concat1"],
-        top_names=["fc1"],
-        num_output=128,
-    )
-)
-model.add(
-    hugectr.DenseLayer(
-        layer_type=hugectr.Layer_t.ReLU,
-        bottom_names=["fc1"],
-        top_names=["relu1"],
-    )
-)
-model.add(
-    hugectr.DenseLayer(
-        layer_type=hugectr.Layer_t.InnerProduct,
-        bottom_names=["relu1"],
-        top_names=["fc2"],
-        num_output=128,
-    )
-)
-model.add(
-    hugectr.DenseLayer(
-        layer_type=hugectr.Layer_t.ReLU,
-        bottom_names=["fc2"],
-        top_names=["relu2"],
-    )
-)
-model.add(
-    hugectr.DenseLayer(
-        layer_type=hugectr.Layer_t.InnerProduct,
-        bottom_names=["relu2"],
-        top_names=["fc3"],
-        num_output=1,
-    )
-)
-model.add(
-    hugectr.DenseLayer(
-        layer_type=hugectr.Layer_t.BinaryCrossEntropyLoss,
-        bottom_names=["fc3", "label"],
-        top_names=["loss"],
-    )
-)
-model.compile()
-model.summary()
-
-# Load the pretrained embedding layer
-model.load_sparse_weights({"pretrained_embedding": "./hugectr_pretrained_embedding.model"})
-model.freeze_embedding("pretrained_embedding")
-
-model.fit(max_iter=10001, display=100, eval_interval=200, snapshot=5000)
-model.graph_to_json(graph_config_file="hugectr-movielens.json")
-
-
-
-
-
Overwriting ./model.py
-
-
-
-
-

We train our model.

-
-
-
!python model.py
-
-
-
-
-
HugeCTR Version: 3.2
-====================================================Model Init=====================================================
-[HUGECTR][01:09:00][INFO][RANK0]: Global seed is 476440390
-[HUGECTR][01:09:00][INFO][RANK0]: Device to NUMA mapping:
-  GPU 0 ->  node 0
-
-[HUGECTR][01:09:01][WARNING][RANK0]: Peer-to-peer access cannot be fully enabled.
-[HUGECTR][01:09:01][INFO][RANK0]: Start all2all warmup
-[HUGECTR][01:09:01][INFO][RANK0]: End all2all warmup
-[HUGECTR][01:09:01][INFO][RANK0]: Using All-reduce algorithm: NCCL
-[HUGECTR][01:09:01][INFO][RANK0]: Device 0: Tesla V100-SXM2-16GB
-[HUGECTR][01:09:01][INFO][RANK0]: num of DataReader workers: 1
-[HUGECTR][01:09:01][INFO][RANK0]: Vocabulary size: 275735
-[HUGECTR][01:09:01][INFO][RANK0]: max_vocabulary_size_per_gpu_=16384000
-[HUGECTR][01:09:01][INFO][RANK0]: max_vocabulary_size_per_gpu_=256000
-[HUGECTR][01:09:01][INFO][RANK0]: Graph analysis to resolve tensor dependency
-===================================================Model Compile===================================================
-[HUGECTR][01:09:04][INFO][RANK0]: gpu0 start to init embedding
-[HUGECTR][01:09:04][INFO][RANK0]: gpu0 init embedding done
-[HUGECTR][01:09:04][INFO][RANK0]: gpu0 start to init embedding
-[HUGECTR][01:09:04][INFO][RANK0]: gpu0 init embedding done
-[HUGECTR][01:09:04][INFO][RANK0]: Starting AUC NCCL warm-up
-[HUGECTR][01:09:04][INFO][RANK0]: Warm-up done
-===================================================Model Summary===================================================
-label                                   Dense                         Sparse                        
-label                                   dense                          data1,movieId                 
-(None, 1)                               (None, 0)                               
-——————————————————————————————————————————————————————————————————————————————————————————————————————————————————
-Layer Type                              Input Name                    Output Name                   Output Shape                  
-——————————————————————————————————————————————————————————————————————————————————————————————————————————————————
-LocalizedSlotSparseEmbeddingHash        data1                         sparse_embedding1             (None, 3, 16)                 
-------------------------------------------------------------------------------------------------------------------
-DistributedSlotSparseEmbeddingHash      movieId                       pretrained_embedding          (None, 1, 1024)               
-------------------------------------------------------------------------------------------------------------------
-Reshape                                 sparse_embedding1             reshape1                      (None, 48)                    
-------------------------------------------------------------------------------------------------------------------
-Reshape                                 pretrained_embedding          reshape2                      (None, 1024)                  
-------------------------------------------------------------------------------------------------------------------
-Concat                                  reshape1                      concat1                       (None, 1072)                  
-                                        reshape2                                                                                  
-------------------------------------------------------------------------------------------------------------------
-InnerProduct                            concat1                       fc1                           (None, 128)                   
-------------------------------------------------------------------------------------------------------------------
-ReLU                                    fc1                           relu1                         (None, 128)                   
-------------------------------------------------------------------------------------------------------------------
-InnerProduct                            relu1                         fc2                           (None, 128)                   
-------------------------------------------------------------------------------------------------------------------
-ReLU                                    fc2                           relu2                         (None, 128)                   
-------------------------------------------------------------------------------------------------------------------
-InnerProduct                            relu2                         fc3                           (None, 1)                     
-------------------------------------------------------------------------------------------------------------------
-BinaryCrossEntropyLoss                  fc3                           loss                                                        
-                                        label                                                                                     
-------------------------------------------------------------------------------------------------------------------
-[HUGECTR][01:09:04][INFO][RANK0]: Loading sparse model: ./hugectr_pretrained_embedding.model
-=====================================================Model Fit=====================================================
-[HUGECTR][01:09:06][INFO][RANK0]: Use non-epoch mode with number of iterations: 10001
-[HUGECTR][01:09:06][INFO][RANK0]: Training batchsize: 2048, evaluation batchsize: 2048
-[HUGECTR][01:09:06][INFO][RANK0]: Evaluation interval: 200, snapshot interval: 5000
-[HUGECTR][01:09:06][INFO][RANK0]: Dense network trainable: True
-[HUGECTR][01:09:06][INFO][RANK0]: Sparse embedding pretrained_embedding trainable: False
-[HUGECTR][01:09:06][INFO][RANK0]: Sparse embedding sparse_embedding1 trainable: True
-[HUGECTR][01:09:06][INFO][RANK0]: Use mixed precision: False, scaler: 1.000000, use cuda graph: True
-[HUGECTR][01:09:06][INFO][RANK0]: lr: 0.001000, warmup_steps: 1, end_lr: 0.000000
-[HUGECTR][01:09:06][INFO][RANK0]: decay_start: 0, decay_steps: 1, decay_power: 2.000000
-[HUGECTR][01:09:06][INFO][RANK0]: Training source file: ./data/train-hugectr/_file_list.txt
-[HUGECTR][01:09:06][INFO][RANK0]: Evaluation source file: ./data/valid-hugectr/_file_list.txt
-[HUGECTR][01:09:08][INFO][RANK0]: Iter: 100 Time(100 iters): 2.297110s Loss: 0.581705 lr:0.001000
-[HUGECTR][01:09:11][INFO][RANK0]: Iter: 200 Time(100 iters): 2.274680s Loss: 0.574425 lr:0.001000
-[HUGECTR][01:09:11][INFO][RANK0]: Evaluation, AUC: 0.746443
-[HUGECTR][01:09:11][INFO][RANK0]: Eval Time for 160 iters: 0.054157s
-[HUGECTR][01:09:13][INFO][RANK0]: Iter: 300 Time(100 iters): 2.332273s Loss: 0.564224 lr:0.001000
-[HUGECTR][01:09:15][INFO][RANK0]: Iter: 400 Time(100 iters): 2.277900s Loss: 0.550730 lr:0.001000
-[HUGECTR][01:09:15][INFO][RANK0]: Evaluation, AUC: 0.764630
-[HUGECTR][01:09:15][INFO][RANK0]: Eval Time for 160 iters: 0.054009s
-[HUGECTR][01:09:18][INFO][RANK0]: Iter: 500 Time(100 iters): 2.434429s Loss: 0.536507 lr:0.001000
-[HUGECTR][01:09:20][INFO][RANK0]: Iter: 600 Time(100 iters): 2.279014s Loss: 0.525059 lr:0.001000
-[HUGECTR][01:09:20][INFO][RANK0]: Evaluation, AUC: 0.773702
-[HUGECTR][01:09:20][INFO][RANK0]: Eval Time for 160 iters: 0.054287s
-[HUGECTR][01:09:22][INFO][RANK0]: Iter: 700 Time(100 iters): 2.335757s Loss: 0.532503 lr:0.001000
-[HUGECTR][01:09:25][INFO][RANK0]: Iter: 800 Time(100 iters): 2.278661s Loss: 0.526352 lr:0.001000
-[HUGECTR][01:09:25][INFO][RANK0]: Evaluation, AUC: 0.779897
-[HUGECTR][01:09:25][INFO][RANK0]: Eval Time for 160 iters: 0.167787s
-[HUGECTR][01:09:27][INFO][RANK0]: Iter: 900 Time(100 iters): 2.447136s Loss: 0.547141 lr:0.001000
-[HUGECTR][01:09:29][INFO][RANK0]: Iter: 1000 Time(100 iters): 2.376035s Loss: 0.548916 lr:0.001000
-[HUGECTR][01:09:30][INFO][RANK0]: Evaluation, AUC: 0.784775
-[HUGECTR][01:09:30][INFO][RANK0]: Eval Time for 160 iters: 0.054224s
-[HUGECTR][01:09:32][INFO][RANK0]: Iter: 1100 Time(100 iters): 2.334735s Loss: 0.540766 lr:0.001000
-[HUGECTR][01:09:34][INFO][RANK0]: Iter: 1200 Time(100 iters): 2.277728s Loss: 0.515882 lr:0.001000
-[HUGECTR][01:09:34][INFO][RANK0]: Evaluation, AUC: 0.786808
-[HUGECTR][01:09:34][INFO][RANK0]: Eval Time for 160 iters: 0.054551s
-[HUGECTR][01:09:36][INFO][RANK0]: Iter: 1300 Time(100 iters): 2.336372s Loss: 0.531510 lr:0.001000
-[HUGECTR][01:09:39][INFO][RANK0]: Iter: 1400 Time(100 iters): 2.277408s Loss: 0.511901 lr:0.001000
-[HUGECTR][01:09:39][INFO][RANK0]: Evaluation, AUC: 0.791416
-[HUGECTR][01:09:39][INFO][RANK0]: Eval Time for 160 iters: 0.165986s
-[HUGECTR][01:09:41][INFO][RANK0]: Iter: 1500 Time(100 iters): 2.554217s Loss: 0.522047 lr:0.001000
-[HUGECTR][01:09:44][INFO][RANK0]: Iter: 1600 Time(100 iters): 2.279548s Loss: 0.540521 lr:0.001000
-[HUGECTR][01:09:44][INFO][RANK0]: Evaluation, AUC: 0.793460
-[HUGECTR][01:09:44][INFO][RANK0]: Eval Time for 160 iters: 0.054801s
-[HUGECTR][01:09:46][INFO][RANK0]: Iter: 1700 Time(100 iters): 2.336303s Loss: 0.525447 lr:0.001000
-[HUGECTR][01:09:48][INFO][RANK0]: Iter: 1800 Time(100 iters): 2.278906s Loss: 0.523558 lr:0.001000
-[HUGECTR][01:09:48][INFO][RANK0]: Evaluation, AUC: 0.793137
-[HUGECTR][01:09:48][INFO][RANK0]: Eval Time for 160 iters: 0.054431s
-[HUGECTR][01:09:51][INFO][RANK0]: Iter: 1900 Time(100 iters): 2.336023s Loss: 0.511348 lr:0.001000
-[HUGECTR][01:09:53][INFO][RANK0]: Iter: 2000 Time(100 iters): 2.384979s Loss: 0.515268 lr:0.001000
-[HUGECTR][01:09:53][INFO][RANK0]: Evaluation, AUC: 0.796599
-[HUGECTR][01:09:53][INFO][RANK0]: Eval Time for 160 iters: 0.172160s
-[HUGECTR][01:09:55][INFO][RANK0]: Iter: 2100 Time(100 iters): 2.453174s Loss: 0.526615 lr:0.001000
-[HUGECTR][01:09:58][INFO][RANK0]: Iter: 2200 Time(100 iters): 2.278781s Loss: 0.536789 lr:0.001000
-[HUGECTR][01:09:58][INFO][RANK0]: Evaluation, AUC: 0.798459
-[HUGECTR][01:09:58][INFO][RANK0]: Eval Time for 160 iters: 0.054509s
-[HUGECTR][01:10:00][INFO][RANK0]: Iter: 2300 Time(100 iters): 2.335596s Loss: 0.508902 lr:0.001000
-[HUGECTR][01:10:02][INFO][RANK0]: Iter: 2400 Time(100 iters): 2.277901s Loss: 0.520411 lr:0.001000
-[HUGECTR][01:10:02][INFO][RANK0]: Evaluation, AUC: 0.798726
-[HUGECTR][01:10:02][INFO][RANK0]: Eval Time for 160 iters: 0.054518s
-[HUGECTR][01:10:05][INFO][RANK0]: Iter: 2500 Time(100 iters): 2.444557s Loss: 0.490832 lr:0.001000
-[HUGECTR][01:10:07][INFO][RANK0]: Iter: 2600 Time(100 iters): 2.279310s Loss: 0.507799 lr:0.001000
-[HUGECTR][01:10:07][INFO][RANK0]: Evaluation, AUC: 0.801325
-[HUGECTR][01:10:07][INFO][RANK0]: Eval Time for 160 iters: 0.164203s
-[HUGECTR][01:10:10][INFO][RANK0]: Iter: 2700 Time(100 iters): 2.443310s Loss: 0.519460 lr:0.001000
-[HUGECTR][01:10:12][INFO][RANK0]: Iter: 2800 Time(100 iters): 2.277569s Loss: 0.512426 lr:0.001000
-[HUGECTR][01:10:12][INFO][RANK0]: Evaluation, AUC: 0.800731
-[HUGECTR][01:10:12][INFO][RANK0]: Eval Time for 160 iters: 0.054590s
-[HUGECTR][01:10:14][INFO][RANK0]: Iter: 2900 Time(100 iters): 2.336213s Loss: 0.512216 lr:0.001000
-[HUGECTR][01:10:17][INFO][RANK0]: Iter: 3000 Time(100 iters): 2.384833s Loss: 0.522102 lr:0.001000
-[HUGECTR][01:10:17][INFO][RANK0]: Evaluation, AUC: 0.803801
-[HUGECTR][01:10:17][INFO][RANK0]: Eval Time for 160 iters: 0.054133s
-[HUGECTR][01:10:19][INFO][RANK0]: Iter: 3100 Time(100 iters): 2.334245s Loss: 0.507463 lr:0.001000
-[HUGECTR][01:10:21][INFO][RANK0]: Iter: 3200 Time(100 iters): 2.279046s Loss: 0.526148 lr:0.001000
-[HUGECTR][01:10:21][INFO][RANK0]: Evaluation, AUC: 0.802950
-[HUGECTR][01:10:21][INFO][RANK0]: Eval Time for 160 iters: 0.070003s
-[HUGECTR][01:10:24][INFO][RANK0]: Iter: 3300 Time(100 iters): 2.352114s Loss: 0.504611 lr:0.001000
-[HUGECTR][01:10:26][INFO][RANK0]: Iter: 3400 Time(100 iters): 2.277292s Loss: 0.502907 lr:0.001000
-[HUGECTR][01:10:26][INFO][RANK0]: Evaluation, AUC: 0.804364
-[HUGECTR][01:10:26][INFO][RANK0]: Eval Time for 160 iters: 0.054315s
-[HUGECTR][01:10:28][INFO][RANK0]: Iter: 3500 Time(100 iters): 2.442956s Loss: 0.512927 lr:0.001000
-[HUGECTR][01:10:31][INFO][RANK0]: Iter: 3600 Time(100 iters): 2.277974s Loss: 0.519042 lr:0.001000
-[HUGECTR][01:10:31][INFO][RANK0]: Evaluation, AUC: 0.806404
-[HUGECTR][01:10:31][INFO][RANK0]: Eval Time for 160 iters: 0.054291s
-[HUGECTR][01:10:33][INFO][RANK0]: Iter: 3700 Time(100 iters): 2.335365s Loss: 0.499368 lr:0.001000
-[HUGECTR][01:10:35][INFO][RANK0]: Iter: 3800 Time(100 iters): 2.277786s Loss: 0.509683 lr:0.001000
-[HUGECTR][01:10:35][INFO][RANK0]: Evaluation, AUC: 0.805164
-[HUGECTR][01:10:35][INFO][RANK0]: Eval Time for 160 iters: 0.064908s
-[HUGECTR][01:10:38][INFO][RANK0]: Iter: 3900 Time(100 iters): 2.344106s Loss: 0.508182 lr:0.001000
-[HUGECTR][01:10:40][INFO][RANK0]: Iter: 4000 Time(100 iters): 2.387872s Loss: 0.493841 lr:0.001000
-[HUGECTR][01:10:40][INFO][RANK0]: Evaluation, AUC: 0.808367
-[HUGECTR][01:10:40][INFO][RANK0]: Eval Time for 160 iters: 0.054222s
-[HUGECTR][01:10:42][INFO][RANK0]: Iter: 4100 Time(100 iters): 2.335361s Loss: 0.508106 lr:0.001000
-[HUGECTR][01:10:45][INFO][RANK0]: Iter: 4200 Time(100 iters): 2.278802s Loss: 0.519000 lr:0.001000
-[HUGECTR][01:10:45][INFO][RANK0]: Evaluation, AUC: 0.808897
-[HUGECTR][01:10:45][INFO][RANK0]: Eval Time for 160 iters: 0.054320s
-[HUGECTR][01:10:47][INFO][RANK0]: Iter: 4300 Time(100 iters): 2.334094s Loss: 0.502797 lr:0.001000
-[HUGECTR][01:10:49][INFO][RANK0]: Iter: 4400 Time(100 iters): 2.388990s Loss: 0.508890 lr:0.001000
-[HUGECTR][01:10:49][INFO][RANK0]: Evaluation, AUC: 0.809649
-[HUGECTR][01:10:49][INFO][RANK0]: Eval Time for 160 iters: 0.074584s
-[HUGECTR][01:10:52][INFO][RANK0]: Iter: 4500 Time(100 iters): 2.355005s Loss: 0.505778 lr:0.001000
-[HUGECTR][01:10:54][INFO][RANK0]: Iter: 4600 Time(100 iters): 2.277275s Loss: 0.532776 lr:0.001000
-[HUGECTR][01:10:54][INFO][RANK0]: Evaluation, AUC: 0.810962
-[HUGECTR][01:10:54][INFO][RANK0]: Eval Time for 160 iters: 0.054498s
-[HUGECTR][01:10:56][INFO][RANK0]: Iter: 4700 Time(100 iters): 2.335553s Loss: 0.503001 lr:0.001000
-[HUGECTR][01:10:59][INFO][RANK0]: Iter: 4800 Time(100 iters): 2.279237s Loss: 0.495762 lr:0.001000
-[HUGECTR][01:10:59][INFO][RANK0]: Evaluation, AUC: 0.808618
-[HUGECTR][01:10:59][INFO][RANK0]: Eval Time for 160 iters: 0.054287s
-[HUGECTR][01:11:01][INFO][RANK0]: Iter: 4900 Time(100 iters): 2.449926s Loss: 0.503213 lr:0.001000
-[HUGECTR][01:11:03][INFO][RANK0]: Iter: 5000 Time(100 iters): 2.277141s Loss: 0.481138 lr:0.001000
-[HUGECTR][01:11:03][INFO][RANK0]: Evaluation, AUC: 0.810767
-[HUGECTR][01:11:03][INFO][RANK0]: Eval Time for 160 iters: 0.064807s
-[HUGECTR][01:11:04][INFO][RANK0]: Rank0: Dump hash table from GPU0
-[HUGECTR][01:11:04][INFO][RANK0]: Rank0: Write hash table <key,value> pairs to file
-[HUGECTR][01:11:04][INFO][RANK0]: Done
-[HUGECTR][01:11:04][INFO][RANK0]: Rank0: Write hash table to file
-[HUGECTR][01:11:13][INFO][RANK0]: Dumping sparse weights to files, successful
-[HUGECTR][01:11:13][INFO][RANK0]: Rank0: Write optimzer state to file
-[HUGECTR][01:11:14][INFO][RANK0]: Done
-[HUGECTR][01:11:14][INFO][RANK0]: Rank0: Write optimzer state to file
-[HUGECTR][01:11:15][INFO][RANK0]: Done
-[HUGECTR][01:11:34][INFO][RANK0]: Rank0: Write optimzer state to file
-[HUGECTR][01:11:35][INFO][RANK0]: Done
-[HUGECTR][01:11:35][INFO][RANK0]: Rank0: Write optimzer state to file
-[HUGECTR][01:11:36][INFO][RANK0]: Done
-[HUGECTR][01:11:55][INFO][RANK0]: Dumping sparse optimzer states to files, successful
-[HUGECTR][01:11:55][INFO][RANK0]: Dumping dense weights to file, successful
-[HUGECTR][01:11:55][INFO][RANK0]: Dumping dense optimizer states to file, successful
-[HUGECTR][01:11:55][INFO][RANK0]: Dumping untrainable weights to file, successful
-[HUGECTR][01:11:57][INFO][RANK0]: Iter: 5100 Time(100 iters): 53.630313s Loss: 0.485568 lr:0.001000
-[HUGECTR][01:11:59][INFO][RANK0]: Iter: 5200 Time(100 iters): 2.278359s Loss: 0.518924 lr:0.001000
-[HUGECTR][01:11:59][INFO][RANK0]: Evaluation, AUC: 0.811217
-[HUGECTR][01:11:59][INFO][RANK0]: Eval Time for 160 iters: 0.054624s
-[HUGECTR][01:12:02][INFO][RANK0]: Iter: 5300 Time(100 iters): 2.336246s Loss: 0.516505 lr:0.001000
-[HUGECTR][01:12:04][INFO][RANK0]: Iter: 5400 Time(100 iters): 2.384571s Loss: 0.512404 lr:0.001000
-[HUGECTR][01:12:04][INFO][RANK0]: Evaluation, AUC: 0.811464
-[HUGECTR][01:12:04][INFO][RANK0]: Eval Time for 160 iters: 0.054350s
-[HUGECTR][01:12:06][INFO][RANK0]: Iter: 5500 Time(100 iters): 2.334675s Loss: 0.500305 lr:0.001000
-[HUGECTR][01:12:09][INFO][RANK0]: Iter: 5600 Time(100 iters): 2.279563s Loss: 0.484969 lr:0.001000
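Loss and AUC in a log like this can be tracked with a small parser. The sketch below is a minimal example that assumes the line format shown above (`Iter: N Time(K iters): Ts Loss: L` and `Evaluation, AUC: A`). It is not a HugeCTR utility. It also flags slow intervals. One example is the 53.63 s interval reported at iteration 5100, which spans the snapshot dump logged just before it.

```python
import re

# Assumed line formats, copied from the log above.
ITER_RE = re.compile(r"Iter: (\d+) Time\((\d+) iters\): ([0-9.]+)s Loss: ([0-9.]+)")
AUC_RE = re.compile(r"Evaluation, AUC: ([0-9.]+)")


def parse_training_log(text):
    """Return (iterations, aucs) parsed from HugeCTR-style log text.

    iterations: list of (iteration, seconds_for_interval, loss) tuples.
    aucs: AUC values in the order they were logged.
    """
    iterations, aucs = [], []
    for line in text.splitlines():
        m = ITER_RE.search(line)
        if m:
            iterations.append((int(m.group(1)), float(m.group(3)), float(m.group(4))))
            continue
        m = AUC_RE.search(line)
        if m:
            aucs.append(float(m.group(1)))
    return iterations, aucs


sample = """[HUGECTR][01:11:03][INFO][RANK0]: Iter: 5000 Time(100 iters): 2.277141s Loss: 0.481138 lr:0.001000
[HUGECTR][01:11:03][INFO][RANK0]: Evaluation, AUC: 0.810767
[HUGECTR][01:11:57][INFO][RANK0]: Iter: 5100 Time(100 iters): 53.630313s Loss: 0.485568 lr:0.001000"""

iterations, aucs = parse_training_log(sample)
# Intervals far slower than the ~2.3 s norm; here, the one containing the snapshot dump.
slow = [it for it, secs, _ in iterations if secs > 10]
print(iterations, aucs, slow)
```

The same regular expressions also match the newer `[HCTR]...` log lines later in this page, because only the prefix differs.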
-
-
-
-
-
-
- - -
-
- -
-
-
-
- - - - - - - \ No newline at end of file diff --git a/review/pr-458/notebooks/multi-modal-data/index.html b/review/pr-458/notebooks/multi-modal-data/index.html deleted file mode 100644 index c1448374a2..0000000000 --- a/review/pr-458/notebooks/multi-modal-data/index.html +++ /dev/null @@ -1,165 +0,0 @@ - - - - - - - Multi-modal Example Notebooks — Merlin HugeCTR documentation - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - \ No newline at end of file diff --git a/review/pr-458/notebooks/training_with_remote_filesystem.html b/review/pr-458/notebooks/training_with_remote_filesystem.html deleted file mode 100644 index a20f5d25d7..0000000000 --- a/review/pr-458/notebooks/training_with_remote_filesystem.html +++ /dev/null @@ -1,1379 +0,0 @@ - - - - - - - HugeCTR Training with Remote File System Example — Merlin HugeCTR documentation - - - - - - - - - - - - - - - - - - - - - - - - - -
- - -
- -
-
-
- -
-
-
-
- -
-
-
# Copyright 2021 NVIDIA Corporation. All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-#     http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-# ==============================================================================
-
-# Each user is responsible for checking the content of datasets and the
-# applicable licenses and determining if suitable for the intended use.
-
-
-
-
-
-

HugeCTR Training with Remote File System Example

-
-

Overview

-

HugeCTR can read Parquet data from remote file systems such as HDFS, AWS S3, and GCS, and it can also load models from and save models to them. You can train on data stored in these file systems and, after training, dump the trained parameters and optimizer states back to them.

-
-
-

Set Up HugeCTR

-

To set up the environment, follow the instructions in HugeCTR Example Notebooks before running the cells below.

-
-
-

Training with HDFS Example

-

Hadoop is not pre-installed in the Merlin Training Container. To help you build and install HDFS, we provide scripts here. Build and install Hadoop with these two scripts, then confirm that Hadoop is installed in your container by running the following:

-
-
-
!hadoop version
-
-
-
-
-
Hadoop 3.3.2
-Source code repository https://github.com/apache/hadoop.git -r 0bcb014209e219273cb6fd4152df7df713cbac61
-Compiled by root on 2022-07-25T09:53Z
-Compiled with protoc 3.7.1
-From source with checksum 4b40fff8bb27201ba07b6fa5651217fb
-This command was run using /opt/hadoop/share/hadoop/common/hadoop-common-3.3.2.jar
-
-
-
-
-
-

Data Preparation

-

Use DataSourceParams to configure the file system. The supported file systems are currently Local, HDFS, S3, and GCS.
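The HDFS `server` and `port` that go into DataSourceParams are the host and port in the `hdfs://` URIs used throughout this example. The sketch below extracts them with the standard library. `hdfs_connection_params` is a hypothetical helper and is not part of HugeCTR.

```python
from urllib.parse import urlparse


def hdfs_connection_params(uri):
    """Split an hdfs:// URI into the server/port pair passed to DataSourceParams
    plus the path on the cluster. Hypothetical helper, not a HugeCTR API."""
    parsed = urlparse(uri)
    if parsed.scheme != "hdfs":
        raise ValueError(f"expected an hdfs:// URI, got {uri!r}")
    if parsed.hostname is None or parsed.port is None:
        raise ValueError(f"URI must include a host and a port: {uri!r}")
    return {"server": parsed.hostname, "port": parsed.port, "path": parsed.path}


params = hdfs_connection_params("hdfs://10.19.172.76:9000/dlrm_parquet/train")
print(params)
```

The resulting `server` and `port` values match the ones passed as `DataSourceParams(source = hugectr.DataSourceType_t.HDFS, server = '10.19.172.76', port = 9000)` in the training script below.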

-

First, make sure the training and validation datasets are ready:

-
-
-
!hdfs dfs -ls hdfs://10.19.172.76:9000/dlrm_parquet/train
-
-
-
-
-
Found 8 items
--rw-r--r--   1 root supergroup  112247365 2022-07-27 06:19 hdfs://10.19.172.76:9000/dlrm_parquet/train/gen_0.parquet
--rw-r--r--   1 root supergroup  112243637 2022-07-27 06:19 hdfs://10.19.172.76:9000/dlrm_parquet/train/gen_1.parquet
--rw-r--r--   1 root supergroup  112251207 2022-07-27 06:19 hdfs://10.19.172.76:9000/dlrm_parquet/train/gen_2.parquet
--rw-r--r--   1 root supergroup  112241764 2022-07-27 06:19 hdfs://10.19.172.76:9000/dlrm_parquet/train/gen_3.parquet
--rw-r--r--   1 root supergroup  112247838 2022-07-27 06:19 hdfs://10.19.172.76:9000/dlrm_parquet/train/gen_4.parquet
--rw-r--r--   1 root supergroup  112244076 2022-07-27 06:19 hdfs://10.19.172.76:9000/dlrm_parquet/train/gen_5.parquet
--rw-r--r--   1 root supergroup  112253553 2022-07-27 06:19 hdfs://10.19.172.76:9000/dlrm_parquet/train/gen_6.parquet
--rw-r--r--   1 root supergroup  112249557 2022-07-27 06:19 hdfs://10.19.172.76:9000/dlrm_parquet/train/gen_7.parquet
-
-
-
-
-
-
-
!hdfs dfs -ls hdfs://10.19.172.76:9000/dlrm_parquet/val
-
-
-
-
-
Found 2 items
--rw-r--r--   1 root supergroup  112239093 2022-07-27 06:19 hdfs://10.19.172.76:9000/dlrm_parquet/val/gen_0.parquet
--rw-r--r--   1 root supergroup  112249156 2022-07-27 06:19 hdfs://10.19.172.76:9000/dlrm_parquet/val/gen_1.parquet
-
-
-
-
-

Second, create file_list.txt and file_list_test.txt. Each file list starts with the number of files, followed by one Parquet file path per line:
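With many Parquet files, writing the list by hand gets error-prone. The sketch below generates the same content as the `%%writefile` cells that follow: a count line, then one path per line. `format_file_list` is a hypothetical helper, and the `gen_<i>.parquet` names are taken from the HDFS listings above.

```python
def format_file_list(paths):
    """Render a file list: the number of files on the first line,
    then one data file path per line."""
    paths = list(paths)
    return "\n".join([str(len(paths)), *paths]) + "\n"


prefix = "hdfs://10.19.172.76:9000/dlrm_parquet"
train_list = format_file_list(f"{prefix}/train/gen_{i}.parquet" for i in range(8))
val_list = format_file_list(f"{prefix}/val/gen_{i}.parquet" for i in range(2))

# To write the files (the /dlrm_parquet directory must already exist):
# from pathlib import Path
# Path("/dlrm_parquet/file_list.txt").write_text(train_list)
# Path("/dlrm_parquet/file_list_test.txt").write_text(val_list)
print(val_list)
```

Deriving the count from the path list keeps the first line consistent with the paths that follow it.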

-
-
-
!mkdir /dlrm_parquet
-!mkdir /dlrm_parquet/train
-!mkdir /dlrm_parquet/val
-
-
-
-
-
-
-
%%writefile /dlrm_parquet/file_list.txt
-8
-hdfs://10.19.172.76:9000/dlrm_parquet/train/gen_0.parquet
-hdfs://10.19.172.76:9000/dlrm_parquet/train/gen_1.parquet
-hdfs://10.19.172.76:9000/dlrm_parquet/train/gen_2.parquet
-hdfs://10.19.172.76:9000/dlrm_parquet/train/gen_3.parquet
-hdfs://10.19.172.76:9000/dlrm_parquet/train/gen_4.parquet
-hdfs://10.19.172.76:9000/dlrm_parquet/train/gen_5.parquet
-hdfs://10.19.172.76:9000/dlrm_parquet/train/gen_6.parquet
-hdfs://10.19.172.76:9000/dlrm_parquet/train/gen_7.parquet
-
-
-
-
-
Overwriting /dlrm_parquet/file_list.txt
-
-
-
-
-
-
-
%%writefile /dlrm_parquet/file_list_test.txt
-2
-hdfs://10.19.172.76:9000/dlrm_parquet/val/gen_0.parquet
-hdfs://10.19.172.76:9000/dlrm_parquet/val/gen_1.parquet
-
-
-
-
-
Overwriting /dlrm_parquet/file_list_test.txt
-
-
-
-
-

Finally, create a _metadata.json file for both the training and validation datasets to describe their features:
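The two metadata files below follow a fixed layout. Column 0 is the label `label0`, columns 1 to 13 are the continuous features `C1` to `C13`, and columns 14 to 39 are the 26 categorical features `C14` to `C39`. The sketch below builds the same structure in code, assuming that layout and 1,000,000 rows per file as in the cells below. `build_metadata` is a hypothetical helper.

```python
import json


def build_metadata(file_names, num_rows, num_conts=13, num_cats=26):
    """Build a _metadata.json structure: column 0 is the label, the next
    num_conts columns are continuous, and the remaining num_cats are categorical."""
    conts = [{"col_name": f"C{i}", "index": i} for i in range(1, num_conts + 1)]
    cats = [{"col_name": f"C{i}", "index": i}
            for i in range(num_conts + 1, num_conts + num_cats + 1)]
    return {
        "file_stats": [{"file_name": name, "num_rows": num_rows} for name in file_names],
        "labels": [{"col_name": "label0", "index": 0}],
        "conts": conts,
        "cats": cats,
    }


train_meta = build_metadata(
    [f"./dlrm_parquet/train/gen_{i}.parquet" for i in range(8)], num_rows=1000000)
val_meta = build_metadata(
    [f"./dlrm_parquet/val/gen_{i}.parquet" for i in range(2)], num_rows=1000000)

# e.g. Path("/dlrm_parquet/train/_metadata.json").write_text(json.dumps(train_meta))
print(json.dumps(val_meta["labels"]))
```

The 26 categorical columns must line up with the training script. That script declares 26 slots in `DataReaderSparseParam("data1", 1, True, 26)` and 26 entries in `slot_size_array`.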

-
-
-
%%writefile /dlrm_parquet/train/_metadata.json
-{ "file_stats": [{"file_name": "./dlrm_parquet/train/gen_0.parquet", "num_rows":1000000}, {"file_name": "./dlrm_parquet/train/gen_1.parquet", "num_rows":1000000}, 
-                 {"file_name": "./dlrm_parquet/train/gen_2.parquet", "num_rows":1000000}, {"file_name": "./dlrm_parquet/train/gen_3.parquet", "num_rows":1000000}, 
-                 {"file_name": "./dlrm_parquet/train/gen_4.parquet", "num_rows":1000000}, {"file_name": "./dlrm_parquet/train/gen_5.parquet", "num_rows":1000000}, 
-                 {"file_name": "./dlrm_parquet/train/gen_6.parquet", "num_rows":1000000}, {"file_name": "./dlrm_parquet/train/gen_7.parquet", "num_rows":1000000} ], 
-  "labels": [{"col_name": "label0", "index":0} ], 
-  "conts": [{"col_name": "C1", "index":1}, {"col_name": "C2", "index":2}, {"col_name": "C3", "index":3}, 
-            {"col_name": "C4", "index":4}, {"col_name": "C5", "index":5}, {"col_name": "C6", "index":6}, 
-            {"col_name": "C7", "index":7}, {"col_name": "C8", "index":8}, {"col_name": "C9", "index":9}, 
-            {"col_name": "C10", "index":10}, {"col_name": "C11", "index":11}, {"col_name": "C12", "index":12}, 
-            {"col_name": "C13", "index":13} ], 
-  "cats": [{"col_name": "C14", "index":14}, {"col_name": "C15", "index":15}, {"col_name": "C16", "index":16}, 
-           {"col_name": "C17", "index":17}, {"col_name": "C18", "index":18}, {"col_name": "C19", "index":19}, 
-           {"col_name": "C20", "index":20}, {"col_name": "C21", "index":21}, {"col_name": "C22", "index":22}, 
-           {"col_name": "C23", "index":23}, {"col_name": "C24", "index":24}, {"col_name": "C25", "index":25}, 
-           {"col_name": "C26", "index":26}, {"col_name": "C27", "index":27}, {"col_name": "C28", "index":28}, 
-           {"col_name": "C29", "index":29}, {"col_name": "C30", "index":30}, {"col_name": "C31", "index":31}, 
-           {"col_name": "C32", "index":32}, {"col_name": "C33", "index":33}, {"col_name": "C34", "index":34}, 
-           {"col_name": "C35", "index":35}, {"col_name": "C36", "index":36}, {"col_name": "C37", "index":37}, 
-           {"col_name": "C38", "index":38}, {"col_name": "C39", "index":39} ] }
-
-
-
-
-
Writing /dlrm_parquet/train/_metadata.json
-
-
-
-
-
-
-
%%writefile /dlrm_parquet/val/_metadata.json
-{ "file_stats": [{"file_name": "./dlrm_parquet/val/gen_0.parquet", "num_rows":1000000}, 
-                 {"file_name": "./dlrm_parquet/val/gen_1.parquet", "num_rows":1000000} ], 
-  "labels": [{"col_name": "label0", "index":0} ], 
-  "conts": [{"col_name": "C1", "index":1}, {"col_name": "C2", "index":2}, {"col_name": "C3", "index":3}, 
-            {"col_name": "C4", "index":4}, {"col_name": "C5", "index":5}, {"col_name": "C6", "index":6}, 
-            {"col_name": "C7", "index":7}, {"col_name": "C8", "index":8}, {"col_name": "C9", "index":9}, 
-            {"col_name": "C10", "index":10}, {"col_name": "C11", "index":11}, {"col_name": "C12", "index":12}, 
-            {"col_name": "C13", "index":13} ], 
-  "cats": [{"col_name": "C14", "index":14}, {"col_name": "C15", "index":15}, {"col_name": "C16", "index":16}, 
-           {"col_name": "C17", "index":17}, {"col_name": "C18", "index":18}, {"col_name": "C19", "index":19}, 
-           {"col_name": "C20", "index":20}, {"col_name": "C21", "index":21}, {"col_name": "C22", "index":22}, 
-           {"col_name": "C23", "index":23}, {"col_name": "C24", "index":24}, {"col_name": "C25", "index":25}, 
-           {"col_name": "C26", "index":26}, {"col_name": "C27", "index":27}, {"col_name": "C28", "index":28}, 
-           {"col_name": "C29", "index":29}, {"col_name": "C30", "index":30}, {"col_name": "C31", "index":31}, 
-           {"col_name": "C32", "index":32}, {"col_name": "C33", "index":33}, {"col_name": "C34", "index":34}, 
-           {"col_name": "C35", "index":35}, {"col_name": "C36", "index":36}, {"col_name": "C37", "index":37}, 
-           {"col_name": "C38", "index":38}, {"col_name": "C39", "index":39} ] }
-
-
-
-
-
Writing /dlrm_parquet/val/_metadata.json
-
-
-
-
-
-
-

Training a DLRM model

-

Important APIs used in the following script:

-
    -
  1. We use DataSourceParams to define the remote file system to read data from.
  2. In DataReaderParams, we pass the DataSourceParams through the data_source_params argument.
  3. In the fit() method, we set the snapshot_prefix parameter to an HDFS path so that the trained model is dumped to HDFS.
-
-
-
%%writefile train_with_hdfs.py
-import hugectr
-from mpi4py import MPI
-from hugectr.data import DataSourceParams
-
-# Create a file system configuration 
-data_source_params = DataSourceParams(
-    source = hugectr.DataSourceType_t.HDFS, #use HDFS
-    server = '10.19.172.76', #your HDFS namenode IP
-    port = 9000, #your HDFS namenode port
-)
-
-# DLRM train
-solver = hugectr.CreateSolver(max_eval_batches = 1280,
-                              batchsize_eval = 1024,
-                              batchsize = 1024,
-                              lr = 0.01,
-                              vvgpu = [[1]],
-                              i64_input_key = True,
-                              use_mixed_precision = False,
-                              repeat_dataset = True,
-                              use_cuda_graph = False)
-reader = hugectr.DataReaderParams(data_reader_type = hugectr.DataReaderType_t.Parquet,
-                                  source = ["/dlrm_parquet/file_list.txt"],
-                                  eval_source = "/dlrm_parquet/file_list_test.txt",
-                                  slot_size_array = [405274, 72550, 55008, 222734, 316071, 156265, 220243, 200179, 234566, 335625, 278726, 263070, 312542, 203773, 145859, 117421, 78140, 3648, 156308, 94562, 357703, 386976, 238046, 230917, 292, 156382],
-                                  data_source_params = data_source_params, #file system config for data reading
-                                  check_type = hugectr.Check_t.Non)
-optimizer = hugectr.CreateOptimizer(optimizer_type = hugectr.Optimizer_t.SGD,
-                                    update_type = hugectr.Update_t.Local,
-                                    atomic_update = True)
-model = hugectr.Model(solver, reader, optimizer)
-model.add(hugectr.Input(label_dim = 1, label_name = "label",
-                        dense_dim = 13, dense_name = "dense",
-                        data_reader_sparse_param_array = 
-                        [hugectr.DataReaderSparseParam("data1", 1, True, 26)]))
-model.add(hugectr.SparseEmbedding(embedding_type = hugectr.Embedding_t.DistributedSlotSparseEmbeddingHash,
-                            workspace_size_per_gpu_in_mb = 10720,
-                            embedding_vec_size = 128,
-                            combiner = "sum",
-                            sparse_embedding_name = "sparse_embedding1",
-                            bottom_name = "data1",
-                            optimizer = optimizer))
-model.add(hugectr.DenseLayer(layer_type = hugectr.Layer_t.InnerProduct,
-                            bottom_names = ["dense"],
-                            top_names = ["fc1"],
-                            num_output=512))
-model.add(hugectr.DenseLayer(layer_type = hugectr.Layer_t.ReLU,
-                            bottom_names = ["fc1"],
-                            top_names = ["relu1"]))                           
-model.add(hugectr.DenseLayer(layer_type = hugectr.Layer_t.InnerProduct,
-                            bottom_names = ["relu1"],
-                            top_names = ["fc2"],
-                            num_output=256))
-model.add(hugectr.DenseLayer(layer_type = hugectr.Layer_t.ReLU,
-                            bottom_names = ["fc2"],
-                            top_names = ["relu2"]))                            
-model.add(hugectr.DenseLayer(layer_type = hugectr.Layer_t.InnerProduct,
-                            bottom_names = ["relu2"],
-                            top_names = ["fc3"],
-                            num_output=128))
-model.add(hugectr.DenseLayer(layer_type = hugectr.Layer_t.ReLU,
-                            bottom_names = ["fc3"],
-                            top_names = ["relu3"]))                              
-model.add(hugectr.DenseLayer(layer_type = hugectr.Layer_t.Interaction,
-                            bottom_names = ["relu3","sparse_embedding1"],
-                            top_names = ["interaction1"]))
-model.add(hugectr.DenseLayer(layer_type = hugectr.Layer_t.InnerProduct,
-                            bottom_names = ["interaction1"],
-                            top_names = ["fc4"],
-                            num_output=1024))
-model.add(hugectr.DenseLayer(layer_type = hugectr.Layer_t.ReLU,
-                            bottom_names = ["fc4"],
-                            top_names = ["relu4"]))                              
-model.add(hugectr.DenseLayer(layer_type = hugectr.Layer_t.InnerProduct,
-                            bottom_names = ["relu4"],
-                            top_names = ["fc5"],
-                            num_output=1024))
-model.add(hugectr.DenseLayer(layer_type = hugectr.Layer_t.ReLU,
-                            bottom_names = ["fc5"],
-                            top_names = ["relu5"]))                              
-model.add(hugectr.DenseLayer(layer_type = hugectr.Layer_t.InnerProduct,
-                            bottom_names = ["relu5"],
-                            top_names = ["fc6"],
-                            num_output=512))
-model.add(hugectr.DenseLayer(layer_type = hugectr.Layer_t.ReLU,
-                            bottom_names = ["fc6"],
-                            top_names = ["relu6"]))                               
-model.add(hugectr.DenseLayer(layer_type = hugectr.Layer_t.InnerProduct,
-                            bottom_names = ["relu6"],
-                            top_names = ["fc7"],
-                            num_output=256))
-model.add(hugectr.DenseLayer(layer_type = hugectr.Layer_t.ReLU,
-                            bottom_names = ["fc7"],
-                            top_names = ["relu7"]))                                                                              
-model.add(hugectr.DenseLayer(layer_type = hugectr.Layer_t.InnerProduct,
-                            bottom_names = ["relu7"],
-                            top_names = ["fc8"],
-                            num_output=1))                                                                                           
-model.add(hugectr.DenseLayer(layer_type = hugectr.Layer_t.BinaryCrossEntropyLoss,
-                            bottom_names = ["fc8", "label"],
-                            top_names = ["loss"]))
-model.compile()
-model.summary()
-
-model.fit(max_iter = 2020, display = 200, eval_interval = 1000, snapshot = 2000, snapshot_prefix = "hdfs://10.19.172.76:9000/model/dlrm/") 
-
-
-
-
-
Overwriting train_with_hdfs.py
-
-
-
-
-
-
-
!python train_with_hdfs.py
-
-
-
-
-
HugeCTR Version: 3.8
-====================================================Model Init=====================================================
-[HCTR][07:51:52.502][WARNING][RK0][main]: The model name is not specified when creating the solver.
-[HCTR][07:51:52.502][INFO][RK0][main]: Global seed is 3218787045
-[HCTR][07:51:52.505][INFO][RK0][main]: Device to NUMA mapping:
-  GPU 1 ->  node 0
-[HCTR][07:51:55.607][WARNING][RK0][main]: Peer-to-peer access cannot be fully enabled.
-[HCTR][07:51:55.607][INFO][RK0][main]: Start all2all warmup
-[HCTR][07:51:55.609][INFO][RK0][main]: End all2all warmup
-[HCTR][07:51:56.529][INFO][RK0][main]: Using All-reduce algorithm: NCCL
-[HCTR][07:51:56.530][INFO][RK0][main]: Device 1: NVIDIA A10
-[HCTR][07:51:56.531][INFO][RK0][main]: num of DataReader workers for train: 1
-[HCTR][07:51:56.531][INFO][RK0][main]: num of DataReader workers for eval: 1
-[HCTR][07:51:57.695][INFO][RK0][main]: Using Hadoop Cluster 10.19.172.76:9000
-[HCTR][07:51:57.740][INFO][RK0][main]: Using Hadoop Cluster 10.19.172.76:9000
-[HCTR][07:51:57.740][INFO][RK0][main]: Vocabulary size: 5242880
-[HCTR][07:51:57.741][INFO][RK0][main]: max_vocabulary_size_per_gpu_=21954560
-[HCTR][07:51:57.755][INFO][RK0][main]: Graph analysis to resolve tensor dependency
-===================================================Model Compile===================================================
-[HCTR][07:52:04.336][INFO][RK0][main]: gpu0 start to init embedding
-[HCTR][07:52:04.411][INFO][RK0][main]: gpu0 init embedding done
-[HCTR][07:52:04.413][INFO][RK0][main]: Starting AUC NCCL warm-up
-[HCTR][07:52:04.415][INFO][RK0][main]: Warm-up done
-===================================================Model Summary===================================================
-[HCTR][07:52:04.415][INFO][RK0][main]: label                                   Dense                         Sparse                        
-label                                   dense                          data1                         
-(None, 1)                               (None, 13)                              
-——————————————————————————————————————————————————————————————————————————————————————————————————————————————————
-Layer Type                              Input Name                    Output Name                   Output Shape                  
-——————————————————————————————————————————————————————————————————————————————————————————————————————————————————
-DistributedSlotSparseEmbeddingHash      data1                         sparse_embedding1             (None, 26, 128)               
-------------------------------------------------------------------------------------------------------------------
-InnerProduct                            dense                         fc1                           (None, 512)                   
-------------------------------------------------------------------------------------------------------------------
-ReLU                                    fc1                           relu1                         (None, 512)                   
-------------------------------------------------------------------------------------------------------------------
-InnerProduct                            relu1                         fc2                           (None, 256)                   
-------------------------------------------------------------------------------------------------------------------
-ReLU                                    fc2                           relu2                         (None, 256)                   
-------------------------------------------------------------------------------------------------------------------
-InnerProduct                            relu2                         fc3                           (None, 128)                   
-------------------------------------------------------------------------------------------------------------------
-ReLU                                    fc3                           relu3                         (None, 128)                   
-------------------------------------------------------------------------------------------------------------------
-Interaction                             relu3                         interaction1                  (None, 480)                   
-                                        sparse_embedding1                                                                         
-------------------------------------------------------------------------------------------------------------------
-InnerProduct                            interaction1                  fc4                           (None, 1024)                  
-------------------------------------------------------------------------------------------------------------------
-ReLU                                    fc4                           relu4                         (None, 1024)                  
-------------------------------------------------------------------------------------------------------------------
-InnerProduct                            relu4                         fc5                           (None, 1024)                  
-------------------------------------------------------------------------------------------------------------------
-ReLU                                    fc5                           relu5                         (None, 1024)                  
-------------------------------------------------------------------------------------------------------------------
-InnerProduct                            relu5                         fc6                           (None, 512)                   
-------------------------------------------------------------------------------------------------------------------
-ReLU                                    fc6                           relu6                         (None, 512)                   
-------------------------------------------------------------------------------------------------------------------
-InnerProduct                            relu6                         fc7                           (None, 256)                   
-------------------------------------------------------------------------------------------------------------------
-ReLU                                    fc7                           relu7                         (None, 256)                   
-------------------------------------------------------------------------------------------------------------------
-InnerProduct                            relu7                         fc8                           (None, 1)                     
-------------------------------------------------------------------------------------------------------------------
-BinaryCrossEntropyLoss                  fc8                           loss                                                        
-                                        label                                                                                     
-------------------------------------------------------------------------------------------------------------------
-=====================================================Model Fit=====================================================
-[HCTR][07:52:04.415][INFO][RK0][main]: Use non-epoch mode with number of iterations: 2020
-[HCTR][07:52:04.415][INFO][RK0][main]: Training batchsize: 1024, evaluation batchsize: 1024
-[HCTR][07:52:04.415][INFO][RK0][main]: Evaluation interval: 1000, snapshot interval: 2000
-[HCTR][07:52:04.415][INFO][RK0][main]: Dense network trainable: True
-[HCTR][07:52:04.415][INFO][RK0][main]: Sparse embedding sparse_embedding1 trainable: True
-[HCTR][07:52:04.415][INFO][RK0][main]: Use mixed precision: False, scaler: 1.000000, use cuda graph: False
-[HCTR][07:52:04.415][INFO][RK0][main]: lr: 0.010000, warmup_steps: 1, end_lr: 0.000000
-[HCTR][07:52:04.415][INFO][RK0][main]: decay_start: 0, decay_steps: 1, decay_power: 2.000000
-[HCTR][07:52:04.415][INFO][RK0][main]: Training source file: /dlrm_parquet/file_list.txt
-[HCTR][07:52:04.415][INFO][RK0][main]: Evaluation source file: /dlrm_parquet/file_list_test.txt
-[HCTR][07:52:05.134][INFO][RK0][main]: Iter: 200 Time(200 iters): 0.716815s Loss: 0.69327 lr:0.01
-[HCTR][07:52:05.856][INFO][RK0][main]: Iter: 400 Time(200 iters): 0.719486s Loss: 0.693207 lr:0.01
-[HCTR][07:52:06.608][INFO][RK0][main]: Iter: 600 Time(200 iters): 0.750294s Loss: 0.693568 lr:0.01
-[HCTR][07:52:07.331][INFO][RK0][main]: Iter: 800 Time(200 iters): 0.721128s Loss: 0.693352 lr:0.01
-[HCTR][07:52:09.118][INFO][RK0][main]: Iter: 1000 Time(200 iters): 1.78435s Loss: 0.693352 lr:0.01
-[HCTR][07:52:11.667][INFO][RK0][main]: Evaluation, AUC: 0.499891
-[HCTR][07:52:11.668][INFO][RK0][main]: Eval Time for 1280 iters: 2.5486s
-[HCTR][07:52:12.393][INFO][RK0][main]: Iter: 1200 Time(200 iters): 3.2728s Loss: 0.693178 lr:0.01
-[HCTR][07:52:13.116][INFO][RK0][main]: Iter: 1400 Time(200 iters): 0.720984s Loss: 0.693292 lr:0.01
-[HCTR][07:52:13.875][INFO][RK0][main]: Iter: 1600 Time(200 iters): 0.756448s Loss: 0.693053 lr:0.01
-[HCTR][07:52:14.603][INFO][RK0][main]: Iter: 1800 Time(200 iters): 0.725832s Loss: 0.693433 lr:0.01
-[HCTR][07:52:16.382][INFO][RK0][main]: Iter: 2000 Time(200 iters): 1.77763s Loss: 0.693193 lr:0.01
-[HCTR][07:52:18.959][INFO][RK0][main]: Evaluation, AUC: 0.500092
-[HCTR][07:52:18.959][INFO][RK0][main]: Eval Time for 1280 iters: 2.57548s
-[HCTR][07:52:19.575][INFO][RK0][main]: Rank0: Write hash table to file
-[HDFS][INFO]: Write to HDFS /model/dlrm/0_sparse_2000.model/key successfully!
-[HDFS][INFO]: Write to HDFS /model/dlrm/0_sparse_2000.model/emb_vector successfully!
-[HCTR][07:52:31.132][INFO][RK0][main]: Dumping sparse weights to files, successful
-[HCTR][07:52:31.132][INFO][RK0][main]: Dumping sparse optimzer states to files, successful
-[HDFS][INFO]: Write to HDFS /model/dlrm/_dense_2000.model successfully!
-[HCTR][07:52:31.307][INFO][RK0][main]: Dumping dense weights to HDFS, successful
-[HDFS][INFO]: Write to HDFS /model/dlrm/_opt_dense_2000.model successfully!
-[HCTR][07:52:31.365][INFO][RK0][main]: Dumping dense optimizer states to HDFS, successful
-[HCTR][07:52:31.430][INFO][RK0][main]: Finish 2020 iterations with batchsize: 1024 in 27.02s.

Check that our model files are saved in HDFS:

!hdfs dfs -ls hdfs://10.19.172.76:9000/model/dlrm
Found 3 items
-drwxr-xr-x   - root supergroup          0 2022-07-27 07:52 hdfs://10.19.172.76:9000/model/dlrm/0_sparse_2000.model
--rw-r--r--   3 root supergroup    9479684 2022-07-27 07:52 hdfs://10.19.172.76:9000/model/dlrm/_dense_2000.model
--rw-r--r--   3 root supergroup          0 2022-07-27 07:52 hdfs://10.19.172.76:9000/model/dlrm/_opt_dense_2000.model
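To check the listing programmatically rather than by eye, the sketch below parses `hdfs dfs -ls` output and compares it with the snapshot names this run produced. The helpers `expected_snapshot_names` and `listed_names` are hypothetical (not part of HugeCTR). The naming pattern (`<rank>_sparse_<iter>.model`, `_dense_<iter>.model`, `_opt_dense_<iter>.model`, each under `snapshot_prefix`) is inferred from the log above for a single sparse embedding on rank 0, not taken from a documented contract.

```python
import re


# Hypothetical helper: snapshot names as observed in the training log above
# (one sparse embedding, snapshot taken at `iteration`). Not a HugeCTR API.
def expected_snapshot_names(iteration, ranks=(0,)):
    names = [f"{rank}_sparse_{iteration}.model" for rank in ranks]
    names += [f"_dense_{iteration}.model", f"_opt_dense_{iteration}.model"]
    return names


# Hypothetical helper: return the basenames of the entries printed by `hdfs dfs -ls`.
def listed_names(ls_output):
    names = []
    for line in ls_output.splitlines():
        # Entry lines start with a permission string such as drwxr-xr-x or -rw-r--r--;
        # the "Found N items" header line does not.
        if re.match(r"[d-][rwx-]{9}", line):
            path = line.split()[-1]
            names.append(path.rstrip("/").rsplit("/", 1)[-1])
    return names


ls_output = """Found 3 items
drwxr-xr-x   - root supergroup          0 2022-07-27 07:52 hdfs://10.19.172.76:9000/model/dlrm/0_sparse_2000.model
-rw-r--r--   3 root supergroup    9479684 2022-07-27 07:52 hdfs://10.19.172.76:9000/model/dlrm/_dense_2000.model
-rw-r--r--   3 root supergroup          0 2022-07-27 07:52 hdfs://10.19.172.76:9000/model/dlrm/_opt_dense_2000.model"""

missing = set(expected_snapshot_names(2000)) - set(listed_names(ls_output))
print(sorted(missing))  # []
```

In a notebook, `ls_output` could instead be captured with `subprocess.run(["hdfs", "dfs", "-ls", path], capture_output=True, text=True).stdout`.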

Training a DCN model with AWS S3


Before you start: Please note that the AWS S3 SDKs are NOT preinstalled in the NGC Docker container. To use S3-related functionalities, customize the HugeCTR build with the following steps:

  1. git clone https://github.com/NVIDIA/HugeCTR.git
  2. cd HugeCTR
  3. git submodule update --init --recursive
  4. mkdir -p build && cd build
  5. cmake -DCMAKE_BUILD_TYPE=Release -DSM=70 -DENABLE_S3=ON … # the ENABLE_S3 option installs the AWS S3 SDKs for you.
  6. make -j && make install

Data preparation


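Create the data directories, the file lists file_list.txt and file_list_test.txt, and the _metadata.json files for the training and validation sets: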

!mkdir -p /hugectr-io-test/data/dcn_parquet/train
-!mkdir -p /hugectr-io-test/data/dcn_parquet/val
%%writefile /hugectr-io-test/data/dcn_parquet/file_list.txt
-16
-s3://hugectr-io-test/data/dcn_parquet/train/gen_0.parquet
-s3://hugectr-io-test/data/dcn_parquet/train/gen_1.parquet
-s3://hugectr-io-test/data/dcn_parquet/train/gen_2.parquet
-s3://hugectr-io-test/data/dcn_parquet/train/gen_3.parquet
-s3://hugectr-io-test/data/dcn_parquet/train/gen_4.parquet
-s3://hugectr-io-test/data/dcn_parquet/train/gen_5.parquet
-s3://hugectr-io-test/data/dcn_parquet/train/gen_6.parquet
-s3://hugectr-io-test/data/dcn_parquet/train/gen_7.parquet
-s3://hugectr-io-test/data/dcn_parquet/train/gen_8.parquet
-s3://hugectr-io-test/data/dcn_parquet/train/gen_9.parquet
-s3://hugectr-io-test/data/dcn_parquet/train/gen_10.parquet
-s3://hugectr-io-test/data/dcn_parquet/train/gen_11.parquet
-s3://hugectr-io-test/data/dcn_parquet/train/gen_12.parquet
-s3://hugectr-io-test/data/dcn_parquet/train/gen_13.parquet
-s3://hugectr-io-test/data/dcn_parquet/train/gen_14.parquet
-s3://hugectr-io-test/data/dcn_parquet/train/gen_15.parquet
Writing /hugectr-io-test/data/dcn_parquet/file_list.txt
%%writefile /hugectr-io-test/data/dcn_parquet/file_list_test.txt
-4
-s3://hugectr-io-test/data/dcn_parquet/val/gen_0.parquet
-s3://hugectr-io-test/data/dcn_parquet/val/gen_1.parquet
-s3://hugectr-io-test/data/dcn_parquet/val/gen_2.parquet
-s3://hugectr-io-test/data/dcn_parquet/val/gen_3.parquet
Writing /hugectr-io-test/data/dcn_parquet/file_list_test.txt
%%writefile /hugectr-io-test/data/dcn_parquet/train/_metadata.json
-{ "file_stats": [{"file_name": "s3://hugectr-io-test/data/dcn_parquet/train/gen_0.parquet", "num_rows":40960}, {"file_name": "s3://hugectr-io-test/data/dcn_parquet/train/gen_1.parquet", "num_rows":40960}, 
-                 {"file_name": "s3://hugectr-io-test/data/dcn_parquet/train/gen_2.parquet", "num_rows":40960}, {"file_name": "s3://hugectr-io-test/data/dcn_parquet/train/gen_3.parquet", "num_rows":40960}, 
-                 {"file_name": "s3://hugectr-io-test/data/dcn_parquet/train/gen_4.parquet", "num_rows":40960}, {"file_name": "s3://hugectr-io-test/data/dcn_parquet/train/gen_5.parquet", "num_rows":40960}, 
-                 {"file_name": "s3://hugectr-io-test/data/dcn_parquet/train/gen_6.parquet", "num_rows":40960}, {"file_name": "s3://hugectr-io-test/data/dcn_parquet/train/gen_7.parquet", "num_rows":40960},
-                 {"file_name": "s3://hugectr-io-test/data/dcn_parquet/train/gen_8.parquet", "num_rows":40960}, {"file_name": "s3://hugectr-io-test/data/dcn_parquet/train/gen_9.parquet", "num_rows":40960}, 
-                 {"file_name": "s3://hugectr-io-test/data/dcn_parquet/train/gen_10.parquet", "num_rows":40960}, {"file_name": "s3://hugectr-io-test/data/dcn_parquet/train/gen_11.parquet", "num_rows":40960}, 
-                 {"file_name": "s3://hugectr-io-test/data/dcn_parquet/train/gen_12.parquet", "num_rows":40960}, {"file_name": "s3://hugectr-io-test/data/dcn_parquet/train/gen_13.parquet", "num_rows":40960}, 
-                 {"file_name": "s3://hugectr-io-test/data/dcn_parquet/train/gen_14.parquet", "num_rows":40960}, {"file_name": "s3://hugectr-io-test/data/dcn_parquet/train/gen_15.parquet", "num_rows":40960}], 
-  "labels": [{"col_name": "label0", "index":0} ], 
-  "conts": [{"col_name": "C1", "index":1}, {"col_name": "C2", "index":2}, {"col_name": "C3", "index":3}, {"col_name": "C4", "index":4}, {"col_name": "C5", "index":5}, {"col_name": "C6", "index":6}, 
-            {"col_name": "C7", "index":7}, {"col_name": "C8", "index":8}, {"col_name": "C9", "index":9}, {"col_name": "C10", "index":10}, {"col_name": "C11", "index":11}, {"col_name": "C12", "index":12}, 
-            {"col_name": "C13", "index":13} ], 
-  "cats": [{"col_name": "C14", "index":14}, {"col_name": "C15", "index":15}, {"col_name": "C16", "index":16}, {"col_name": "C17", "index":17}, {"col_name": "C18", "index":18}, 
-            {"col_name": "C19", "index":19}, {"col_name": "C20", "index":20}, {"col_name": "C21", "index":21}, {"col_name": "C22", "index":22}, {"col_name": "C23", "index":23}, 
-            {"col_name": "C24", "index":24}, {"col_name": "C25", "index":25}, {"col_name": "C26", "index":26}, {"col_name": "C27", "index":27}, {"col_name": "C28", "index":28}, 
-            {"col_name": "C29", "index":29}, {"col_name": "C30", "index":30}, {"col_name": "C31", "index":31}, {"col_name": "C32", "index":32}, {"col_name": "C33", "index":33}, 
-            {"col_name": "C34", "index":34}, {"col_name": "C35", "index":35}, {"col_name": "C36", "index":36}, {"col_name": "C37", "index":37}, {"col_name": "C38", "index":38}, {"col_name": "C39", "index":39} ] }
Writing /hugectr-io-test/data/dcn_parquet/train/_metadata.json
%%writefile /hugectr-io-test/data/dcn_parquet/val/_metadata.json
-{ "file_stats": [{"file_name": "s3://hugectr-io-test/data/dcn_parquet/val/gen_0.parquet", "num_rows":40960}, 
-                 {"file_name": "s3://hugectr-io-test/data/dcn_parquet/val/gen_1.parquet", "num_rows":40960},
-                 {"file_name": "s3://hugectr-io-test/data/dcn_parquet/val/gen_2.parquet", "num_rows":40960}, 
-                 {"file_name": "s3://hugectr-io-test/data/dcn_parquet/val/gen_3.parquet", "num_rows":40960}], 
-  "labels": [{"col_name": "label0", "index":0} ], 
-  "conts": [{"col_name": "C1", "index":1}, {"col_name": "C2", "index":2}, {"col_name": "C3", "index":3}, {"col_name": "C4", "index":4}, {"col_name": "C5", "index":5}, {"col_name": "C6", "index":6}, 
-            {"col_name": "C7", "index":7}, {"col_name": "C8", "index":8}, {"col_name": "C9", "index":9}, {"col_name": "C10", "index":10}, {"col_name": "C11", "index":11}, {"col_name": "C12", "index":12}, 
-            {"col_name": "C13", "index":13} ], 
-  "cats": [{"col_name": "C14", "index":14}, {"col_name": "C15", "index":15}, {"col_name": "C16", "index":16}, {"col_name": "C17", "index":17}, {"col_name": "C18", "index":18}, 
-            {"col_name": "C19", "index":19}, {"col_name": "C20", "index":20}, {"col_name": "C21", "index":21}, {"col_name": "C22", "index":22}, {"col_name": "C23", "index":23}, 
-            {"col_name": "C24", "index":24}, {"col_name": "C25", "index":25}, {"col_name": "C26", "index":26}, {"col_name": "C27", "index":27}, {"col_name": "C28", "index":28}, 
-            {"col_name": "C29", "index":29}, {"col_name": "C30", "index":30}, {"col_name": "C31", "index":31}, {"col_name": "C32", "index":32}, {"col_name": "C33", "index":33}, 
-            {"col_name": "C34", "index":34}, {"col_name": "C35", "index":35}, {"col_name": "C36", "index":36}, {"col_name": "C37", "index":37}, {"col_name": "C38", "index":38}, {"col_name": "C39", "index":39} ] }
Writing /hugectr-io-test/data/dcn_parquet/val/_metadata.json
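Writing these files by hand is error-prone. The following is a minimal sketch of generating the same content in Python, assuming the layout used above: a file list whose first line is the number of files followed by one URI per line, and a `_metadata.json` with `label0` at index 0, 13 continuous columns (C1 to C13) and 26 categorical columns (C14 to C39). The helper names `file_list_text` and `make_metadata` are hypothetical, not HugeCTR APIs.

```python
import json


# Hypothetical helper: file-list text in the format used above
# (number of files on the first line, then one path per line).
def file_list_text(paths):
    return "\n".join([str(len(paths)), *paths]) + "\n"


# Hypothetical helper: _metadata.json content matching the layout above
# (label0 at index 0, continuous columns next, categorical columns last).
def make_metadata(paths, rows_per_file, num_dense=13, num_cat=26):
    cols = [f"C{i}" for i in range(1, num_dense + num_cat + 1)]
    return {
        "file_stats": [{"file_name": p, "num_rows": rows_per_file} for p in paths],
        "labels": [{"col_name": "label0", "index": 0}],
        "conts": [{"col_name": c, "index": i} for i, c in enumerate(cols[:num_dense], start=1)],
        "cats": [
            {"col_name": c, "index": i}
            for i, c in enumerate(cols[num_dense:], start=num_dense + 1)
        ],
    }


root = "s3://hugectr-io-test/data/dcn_parquet"
train_paths = [f"{root}/train/gen_{i}.parquet" for i in range(16)]
val_paths = [f"{root}/val/gen_{i}.parquet" for i in range(4)]

print(file_list_text(val_paths).splitlines()[0])  # 4
print(json.dumps(make_metadata(train_paths, 40960)["cats"][-1]))  # {"col_name": "C39", "index": 39}
```

To write the files, pass `file_list_text(...)` to a file opened for writing and the dictionary to `json.dump`. Changing `root` to a `gs://` prefix yields the GCS variant of the same files.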

Training


Important APIs used in the following script:

  1. We use DataSourceParams to define the remote file system to read data from; in this case, S3.
  2. In DataReaderParams, we pass the DataSourceParams through the data_source_params argument.
  3. In the fit() method, we set the snapshot_prefix parameter to an S3 path so that trained models are dumped to S3.
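Because the reader takes its remote file system from DataSourceParams, it can help to check before training that every entry in a file list uses the matching URI scheme and that the count on the first line agrees with the number of entries. The following is a minimal sketch; `check_file_list` is a hypothetical helper, not a HugeCTR API, and it only inspects the text of the list (it does not contact S3).

```python
# Hypothetical pre-flight check (not a HugeCTR API): validate a file list written
# in the format above before handing it to DataReaderParams.
def check_file_list(text, scheme):
    lines = [line.strip() for line in text.strip().splitlines() if line.strip()]
    declared = int(lines[0])  # first line: number of files
    paths = lines[1:]
    problems = []
    if declared != len(paths):
        problems.append(f"first line says {declared} files, found {len(paths)}")
    for path in paths:
        if not path.startswith(scheme):
            problems.append(f"unexpected scheme: {path}")
    return problems


good = (
    "2\n"
    "s3://hugectr-io-test/data/dcn_parquet/val/gen_0.parquet\n"
    "s3://hugectr-io-test/data/dcn_parquet/val/gen_1.parquet\n"
)
bad = (
    "3\n"
    "s3://hugectr-io-test/data/dcn_parquet/val/gen_0.parquet\n"
    "gs://hugectr-io-test/data/dcn_parquet/val/gen_1.parquet\n"
)

print(check_file_list(good, "s3://"))  # []
print(check_file_list(bad, "s3://"))   # reports the count mismatch and the gs:// entry
```

For the real list, call `check_file_list(open("/hugectr-io-test/data/dcn_parquet/file_list.txt").read(), "s3://")`.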
%%writefile train_with_s3.py
-import hugectr
-from mpi4py import MPI
-from hugectr.data import DataSourceParams
-
-# Create a file system configuration for data reading
-data_source_params = DataSourceParams(
-    source = hugectr.FileSystemType_t.S3,  # use AWS S3
-    server = 'us-east-1',  # your AWS region
-    port = 9000,  # will be ignored
-)
-
-solver = hugectr.CreateSolver(
-    max_eval_batches=1280,
-    batchsize_eval=1024,
-    batchsize=1024,
-    lr=0.001,
-    vvgpu=[[0]],
-    i64_input_key=True,
-    repeat_dataset=True,
-)
-reader = hugectr.DataReaderParams(
-    data_reader_type=hugectr.DataReaderType_t.Parquet,
-    source=["/hugectr-io-test/data/dcn_parquet/file_list.txt"],
-    eval_source="/hugectr-io-test/data/dcn_parquet/file_list_test.txt",
-    slot_size_array=[39884,39043,17289,7420,20263,3,7120,1543,39884,39043,17289,7420,20263,3,7120,1543,63,63,39884,39043,17289,7420,20263,3,7120,1543],
-    data_source_params=data_source_params, # Using the S3 configurations
-    check_type=hugectr.Check_t.Non,
-)
-optimizer = hugectr.CreateOptimizer(optimizer_type=hugectr.Optimizer_t.SGD)
-model = hugectr.Model(solver, reader, optimizer)
-model.add(
-    hugectr.Input(
-        label_dim=1,
-        label_name="label",
-        dense_dim=13,
-        dense_name="dense",
-        data_reader_sparse_param_array=[
-            hugectr.DataReaderSparseParam("data1", 1, True, 26)
-        ],
-    )
-)
-model.add(
-    hugectr.SparseEmbedding(
-        embedding_type=hugectr.Embedding_t.DistributedSlotSparseEmbeddingHash,
-        workspace_size_per_gpu_in_mb=150,
-        embedding_vec_size=16,
-        combiner="sum",
-        sparse_embedding_name="sparse_embedding1",
-        bottom_name="data1",
-        optimizer=optimizer,
-    )
-)
-model.add(
-    hugectr.DenseLayer(
-        layer_type=hugectr.Layer_t.Reshape,
-        bottom_names=["sparse_embedding1"],
-        top_names=["reshape1"],
-        leading_dim=416,
-    )
-)
-model.add(
-    hugectr.DenseLayer(
-        layer_type=hugectr.Layer_t.Concat, bottom_names=["reshape1", "dense"], top_names=["concat1"]
-    )
-)
-model.add(
-    hugectr.DenseLayer(
-        layer_type=hugectr.Layer_t.Slice,
-        bottom_names=["concat1"],
-        top_names=["slice11", "slice12"],
-        ranges=[(0, 429), (0, 429)],
-    )
-)
-model.add(
-    hugectr.DenseLayer(
-        layer_type=hugectr.Layer_t.MultiCross,
-        bottom_names=["slice11"],
-        top_names=["multicross1"],
-        num_layers=6,
-    )
-)
-model.add(
-    hugectr.DenseLayer(
-        layer_type=hugectr.Layer_t.InnerProduct,
-        bottom_names=["slice12"],
-        top_names=["fc1"],
-        num_output=1024,
-    )
-)
-model.add(
-    hugectr.DenseLayer(layer_type=hugectr.Layer_t.ReLU, bottom_names=["fc1"], top_names=["relu1"])
-)
-model.add(
-    hugectr.DenseLayer(
-        layer_type=hugectr.Layer_t.Dropout,
-        bottom_names=["relu1"],
-        top_names=["dropout1"],
-        dropout_rate=0.5,
-    )
-)
-model.add(
-    hugectr.DenseLayer(
-        layer_type=hugectr.Layer_t.Concat,
-        bottom_names=["dropout1", "multicross1"],
-        top_names=["concat2"],
-    )
-)
-model.add(
-    hugectr.DenseLayer(
-        layer_type=hugectr.Layer_t.InnerProduct,
-        bottom_names=["concat2"],
-        top_names=["fc2"],
-        num_output=1,
-    )
-)
-model.add(
-    hugectr.DenseLayer(
-        layer_type=hugectr.Layer_t.BinaryCrossEntropyLoss,
-        bottom_names=["fc2", "label"],
-        top_names=["loss"],
-    )
-)
-model.compile()
-model.summary()
-
-model.fit(max_iter = 1100, display = 100, eval_interval = 500, snapshot = 1000, snapshot_prefix = "https://s3.us-east-1.amazonaws.com/hugectr-io-test/pipeline_test/dcn_model/")
-model.graph_to_json(graph_config_file = "dcn.json")
Overwriting train_with_s3.py
!python train_with_s3.py
HugeCTR Version: 4.1
-====================================================Model Init=====================================================
-[HCTR][06:54:55.819][WARNING][RK0][main]: The model name is not specified when creating the solver.
-[HCTR][06:54:55.819][INFO][RK0][main]: Global seed is 569406237
-[HCTR][06:54:55.822][INFO][RK0][main]: Device to NUMA mapping:
-  GPU 0 ->  node 0
-[HCTR][06:54:57.710][WARNING][RK0][main]: Peer-to-peer access cannot be fully enabled.
-[HCTR][06:54:57.710][INFO][RK0][main]: Start all2all warmup
-[HCTR][06:54:57.710][INFO][RK0][main]: End all2all warmup
-[HCTR][06:54:57.711][INFO][RK0][main]: Using All-reduce algorithm: NCCL
-[HCTR][06:54:57.712][INFO][RK0][main]: Device 0: Tesla V100-SXM2-32GB
-[HCTR][06:54:57.713][INFO][RK0][main]: num of DataReader workers for train: 1
-[HCTR][06:54:57.713][INFO][RK0][main]: num of DataReader workers for eval: 1
-[HCTR][06:54:57.714][INFO][RK0][main]: Using S3 file system backend.
-[HCTR][06:54:59.762][INFO][RK0][main]: Using S3 file system backend.
-[HCTR][06:55:01.777][INFO][RK0][main]: Vocabulary size: 397821
-[HCTR][06:55:01.777][INFO][RK0][main]: max_vocabulary_size_per_gpu_=2457600
-[HCTR][06:55:01.780][INFO][RK0][main]: Graph analysis to resolve tensor dependency
-===================================================Model Compile===================================================
-[HCTR][06:55:03.407][INFO][RK0][main]: gpu0 start to init embedding
-[HCTR][06:55:03.408][INFO][RK0][main]: gpu0 init embedding done
-[HCTR][06:55:03.409][INFO][RK0][main]: Starting AUC NCCL warm-up
-[HCTR][06:55:03.411][INFO][RK0][main]: Warm-up done
-===================================================Model Summary===================================================
-[HCTR][06:55:03.412][INFO][RK0][main]: Model structure on each GPU
-Label                                   Dense                         Sparse                        
-label                                   dense                          data1                         
-(1024,1)                                (1024,13)                               
-——————————————————————————————————————————————————————————————————————————————————————————————————————————————————
-Layer Type                              Input Name                    Output Name                   Output Shape                  
-——————————————————————————————————————————————————————————————————————————————————————————————————————————————————
-DistributedSlotSparseEmbeddingHash      data1                         sparse_embedding1             (1024,26,16)                  
-------------------------------------------------------------------------------------------------------------------
-Reshape                                 sparse_embedding1             reshape1                      (1024,416)                    
-------------------------------------------------------------------------------------------------------------------
-Concat                                  reshape1                      concat1                       (1024,429)                    
-                                        dense                                                                                     
-------------------------------------------------------------------------------------------------------------------
-Slice                                   concat1                       slice11                       (1024,429)                    
-                                                                      slice12                       (1024,429)                    
-------------------------------------------------------------------------------------------------------------------
-MultiCross                              slice11                       multicross1                   (1024,429)                    
-------------------------------------------------------------------------------------------------------------------
-InnerProduct                            slice12                       fc1                           (1024,1024)                   
-------------------------------------------------------------------------------------------------------------------
-ReLU                                    fc1                           relu1                         (1024,1024)                   
-------------------------------------------------------------------------------------------------------------------
-Dropout                                 relu1                         dropout1                      (1024,1024)                   
-------------------------------------------------------------------------------------------------------------------
-Concat                                  dropout1                      concat2                       (1024,1453)                   
-                                        multicross1                                                                               
-------------------------------------------------------------------------------------------------------------------
-InnerProduct                            concat2                       fc2                           (1024,1)                      
-------------------------------------------------------------------------------------------------------------------
-BinaryCrossEntropyLoss                  fc2                           loss                                                        
-                                        label                                                                                     
-------------------------------------------------------------------------------------------------------------------
-=====================================================Model Fit=====================================================
-[HCTR][06:55:03.412][INFO][RK0][main]: Use non-epoch mode with number of iterations: 1100
-[HCTR][06:55:03.412][INFO][RK0][main]: Training batchsize: 1024, evaluation batchsize: 1024
-[HCTR][06:55:03.412][INFO][RK0][main]: Evaluation interval: 500, snapshot interval: 1000
-[HCTR][06:55:03.412][INFO][RK0][main]: Dense network trainable: True
-[HCTR][06:55:03.412][INFO][RK0][main]: Sparse embedding sparse_embedding1 trainable: True
-[HCTR][06:55:03.412][INFO][RK0][main]: Use mixed precision: False, scaler: 1.000000, use cuda graph: True
-[HCTR][06:55:03.412][INFO][RK0][main]: lr: 0.001000, warmup_steps: 1, end_lr: 0.000000
-[HCTR][06:55:03.412][INFO][RK0][main]: decay_start: 0, decay_steps: 1, decay_power: 2.000000
-[HCTR][06:55:03.412][INFO][RK0][main]: Training source file: /hugectr-io-test/data/dcn_parquet/file_list.txt
-[HCTR][06:55:03.412][INFO][RK0][main]: Evaluation source file: /hugectr-io-test/data/dcn_parquet/file_list_test.txt
-[HCTR][06:55:04.668][INFO][RK0][main]: Iter: 100 Time(100 iters): 1.25574s Loss: 0.712926 lr:0.001
-[HCTR][06:55:06.839][INFO][RK0][main]: Iter: 200 Time(100 iters): 2.16987s Loss: 0.701584 lr:0.001
-[HCTR][06:55:08.066][INFO][RK0][main]: Iter: 300 Time(100 iters): 1.22653s Loss: 0.696012 lr:0.001
-[HCTR][06:55:10.229][INFO][RK0][main]: Iter: 400 Time(100 iters): 2.16121s Loss: 0.698167 lr:0.001
-[HCTR][06:55:11.653][INFO][RK0][main]: Iter: 500 Time(100 iters): 1.42367s Loss: 0.695641 lr:0.001
-[HCTR][06:55:29.727][INFO][RK0][main]: Evaluation, AUC: 0.500979
-[HCTR][06:55:29.727][INFO][RK0][main]: Eval Time for 1280 iters: 18.0735s
-[HCTR][06:55:32.311][INFO][RK0][main]: Iter: 600 Time(100 iters): 20.6575s Loss: 0.696028 lr:0.001
-[HCTR][06:55:33.349][INFO][RK0][main]: Iter: 700 Time(100 iters): 1.03696s Loss: 0.693602 lr:0.001
-[HCTR][06:55:35.089][INFO][RK0][main]: Iter: 800 Time(100 iters): 1.73903s Loss: 0.693618 lr:0.001
-[HCTR][06:55:36.191][INFO][RK0][main]: Iter: 900 Time(100 iters): 1.10101s Loss: 0.696232 lr:0.001
-[HCTR][06:55:37.789][INFO][RK0][main]: Iter: 1000 Time(100 iters): 1.59704s Loss: 0.693168 lr:0.001
-[HCTR][06:55:53.378][INFO][RK0][main]: Evaluation, AUC: 0.50103
-[HCTR][06:55:53.378][INFO][RK0][main]: Eval Time for 1280 iters: 15.5882s
-[HCTR][06:55:53.378][INFO][RK0][main]: Using S3 file system backend.
-[HCTR][06:55:55.410][INFO][RK0][main]: Rank0: Write hash table to file
-[HCTR][06:55:56.473][DEBUG][RK0][main]: Successfully write to AWS S3 location:  https://s3.us-east-1.amazonaws.com/hugectr-io-test/pipeline_test/dcn_model/0_sparse_1000.model/key
-[HCTR][06:55:57.348][DEBUG][RK0][main]: Successfully write to AWS S3 location:  https://s3.us-east-1.amazonaws.com/hugectr-io-test/pipeline_test/dcn_model/0_sparse_1000.model/emb_vector
-[HCTR][06:55:57.360][INFO][RK0][main]: Dumping sparse weights to files, successful
-[HCTR][06:55:57.360][INFO][RK0][main]: Dumping sparse optimzer states to files, successful
-[HCTR][06:55:57.361][INFO][RK0][main]: Using S3 file system backend.
-[HCTR][06:56:00.462][DEBUG][RK0][main]: Successfully write to AWS S3 location:  https://s3.us-east-1.amazonaws.com/hugectr-io-test/pipeline_test/dcn_model/_dense_1000.model
-[HCTR][06:56:00.467][INFO][RK0][main]: Dumping dense weights to file, successful
-[HCTR][06:56:00.467][INFO][RK0][main]: Using S3 file system backend.
-[HCTR][06:56:02.839][DEBUG][RK0][main]: Successfully write to AWS S3 location:  https://s3.us-east-1.amazonaws.com/hugectr-io-test/pipeline_test/dcn_model/_opt_dense_1000.model
-[HCTR][06:56:02.843][INFO][RK0][main]: Dumping dense optimizer states to file, successful
-[HCTR][06:56:06.987][INFO][RK0][main]: Finish 1100 iterations with batchsize: 1024 in 63.58s.
-[HCTR][06:56:06.988][INFO][RK0][main]: Save the model graph to dcn.json successfully
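The vocabulary size and tensor shapes printed in the log follow directly from the script's configuration. The arithmetic below reproduces them in plain Python (no HugeCTR calls), which is a quick way to see why `leading_dim=416` and the Slice `ranges` end at 429.

```python
batchsize = 1024
slot_size_array = [39884, 39043, 17289, 7420, 20263, 3, 7120, 1543,
                   39884, 39043, 17289, 7420, 20263, 3, 7120, 1543, 63, 63,
                   39884, 39043, 17289, 7420, 20263, 3, 7120, 1543]
num_slots, embedding_vec_size, dense_dim, fc1_width = 26, 16, 13, 1024

# One entry per categorical slot declared in DataReaderSparseParam("data1", 1, True, 26).
assert len(slot_size_array) == num_slots

vocabulary_size = sum(slot_size_array)          # "Vocabulary size" in the log
reshape_width = num_slots * embedding_vec_size  # Reshape leading_dim
concat1_width = reshape_width + dense_dim       # Concat(reshape1, dense); end of each Slice range
concat2_width = fc1_width + concat1_width       # Concat(dropout1, multicross1); MultiCross keeps width 429

print(vocabulary_size)              # 397821
print((batchsize, reshape_width))   # (1024, 416)
print((batchsize, concat1_width))   # (1024, 429)
print((batchsize, concat2_width))   # (1024, 1453)
```

If you change `embedding_vec_size` or the number of slots, `leading_dim` in the Reshape layer and the Slice `ranges` have to change with them.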

Training a DCN model with Google Cloud Storage


Before you start: Please note that the GCS SDKs are NOT preinstalled in the NGC Docker container. To use GCS-related functionalities, customize the HugeCTR build with the following steps:

  1. git clone https://github.com/NVIDIA/HugeCTR.git
  2. cd HugeCTR
  3. git submodule update --init --recursive
  4. mkdir -p build && cd build
  5. cmake -DCMAKE_BUILD_TYPE=Release -DSM=70 -DENABLE_GCS=ON … # the ENABLE_GCS option installs the GCS SDKs for you.
  6. make -j && make install

Data preparation


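Create the data directories, the file lists file_list.txt and file_list_test.txt, and the _metadata.json files for the training and validation sets: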

!mkdir -p /hugectr-io-test/data/dcn_parquet/train
-!mkdir -p /hugectr-io-test/data/dcn_parquet/val
%%writefile /hugectr-io-test/data/dcn_parquet/file_list.txt
-16
-gs://hugectr-io-test/data/dcn_parquet/train/gen_0.parquet
-gs://hugectr-io-test/data/dcn_parquet/train/gen_1.parquet
-gs://hugectr-io-test/data/dcn_parquet/train/gen_2.parquet
-gs://hugectr-io-test/data/dcn_parquet/train/gen_3.parquet
-gs://hugectr-io-test/data/dcn_parquet/train/gen_4.parquet
-gs://hugectr-io-test/data/dcn_parquet/train/gen_5.parquet
-gs://hugectr-io-test/data/dcn_parquet/train/gen_6.parquet
-gs://hugectr-io-test/data/dcn_parquet/train/gen_7.parquet
-gs://hugectr-io-test/data/dcn_parquet/train/gen_8.parquet
-gs://hugectr-io-test/data/dcn_parquet/train/gen_9.parquet
-gs://hugectr-io-test/data/dcn_parquet/train/gen_10.parquet
-gs://hugectr-io-test/data/dcn_parquet/train/gen_11.parquet
-gs://hugectr-io-test/data/dcn_parquet/train/gen_12.parquet
-gs://hugectr-io-test/data/dcn_parquet/train/gen_13.parquet
-gs://hugectr-io-test/data/dcn_parquet/train/gen_14.parquet
-gs://hugectr-io-test/data/dcn_parquet/train/gen_15.parquet
Overwriting /hugectr-io-test/data/dcn_parquet/file_list.txt
%%writefile /hugectr-io-test/data/dcn_parquet/file_list_test.txt
-4
-gs://hugectr-io-test/data/dcn_parquet/val/gen_0.parquet
-gs://hugectr-io-test/data/dcn_parquet/val/gen_1.parquet
-gs://hugectr-io-test/data/dcn_parquet/val/gen_2.parquet
-gs://hugectr-io-test/data/dcn_parquet/val/gen_3.parquet
Overwriting /hugectr-io-test/data/dcn_parquet/file_list_test.txt
%%writefile /hugectr-io-test/data/dcn_parquet/train/_metadata.json
-{ "file_stats": [{"file_name": "gs://hugectr-io-test/data/dcn_parquet/train/gen_0.parquet", "num_rows":40960}, {"file_name": "gs://hugectr-io-test/data/dcn_parquet/train/gen_1.parquet", "num_rows":40960}, 
-                 {"file_name": "gs://hugectr-io-test/data/dcn_parquet/train/gen_2.parquet", "num_rows":40960}, {"file_name": "gs://hugectr-io-test/data/dcn_parquet/train/gen_3.parquet", "num_rows":40960}, 
-                 {"file_name": "gs://hugectr-io-test/data/dcn_parquet/train/gen_4.parquet", "num_rows":40960}, {"file_name": "gs://hugectr-io-test/data/dcn_parquet/train/gen_5.parquet", "num_rows":40960}, 
-                 {"file_name": "gs://hugectr-io-test/data/dcn_parquet/train/gen_6.parquet", "num_rows":40960}, {"file_name": "gs://hugectr-io-test/data/dcn_parquet/train/gen_7.parquet", "num_rows":40960},
-                 {"file_name": "gs://hugectr-io-test/data/dcn_parquet/train/gen_8.parquet", "num_rows":40960}, {"file_name": "gs://hugectr-io-test/data/dcn_parquet/train/gen_9.parquet", "num_rows":40960}, 
-                 {"file_name": "gs://hugectr-io-test/data/dcn_parquet/train/gen_10.parquet", "num_rows":40960}, {"file_name": "gs://hugectr-io-test/data/dcn_parquet/train/gen_11.parquet", "num_rows":40960}, 
-                 {"file_name": "gs://hugectr-io-test/data/dcn_parquet/train/gen_12.parquet", "num_rows":40960}, {"file_name": "gs://hugectr-io-test/data/dcn_parquet/train/gen_13.parquet", "num_rows":40960}, 
-                 {"file_name": "gs://hugectr-io-test/data/dcn_parquet/train/gen_14.parquet", "num_rows":40960}, {"file_name": "gs://hugectr-io-test/data/dcn_parquet/train/gen_15.parquet", "num_rows":40960}], 
-  "labels": [{"col_name": "label0", "index":0} ], 
-  "conts": [{"col_name": "C1", "index":1}, {"col_name": "C2", "index":2}, {"col_name": "C3", "index":3}, {"col_name": "C4", "index":4}, {"col_name": "C5", "index":5}, {"col_name": "C6", "index":6}, 
-            {"col_name": "C7", "index":7}, {"col_name": "C8", "index":8}, {"col_name": "C9", "index":9}, {"col_name": "C10", "index":10}, {"col_name": "C11", "index":11}, {"col_name": "C12", "index":12}, 
-            {"col_name": "C13", "index":13} ], 
-  "cats": [{"col_name": "C14", "index":14}, {"col_name": "C15", "index":15}, {"col_name": "C16", "index":16}, {"col_name": "C17", "index":17}, {"col_name": "C18", "index":18}, 
-            {"col_name": "C19", "index":19}, {"col_name": "C20", "index":20}, {"col_name": "C21", "index":21}, {"col_name": "C22", "index":22}, {"col_name": "C23", "index":23}, 
-            {"col_name": "C24", "index":24}, {"col_name": "C25", "index":25}, {"col_name": "C26", "index":26}, {"col_name": "C27", "index":27}, {"col_name": "C28", "index":28}, 
-            {"col_name": "C29", "index":29}, {"col_name": "C30", "index":30}, {"col_name": "C31", "index":31}, {"col_name": "C32", "index":32}, {"col_name": "C33", "index":33}, 
-            {"col_name": "C34", "index":34}, {"col_name": "C35", "index":35}, {"col_name": "C36", "index":36}, {"col_name": "C37", "index":37}, {"col_name": "C38", "index":38}, {"col_name": "C39", "index":39} ] }
Overwriting /hugectr-io-test/data/dcn_parquet/train/_metadata.json
%%writefile /hugectr-io-test/data/dcn_parquet/val/_metadata.json
-{ "file_stats": [{"file_name": "gs://hugectr-io-test/data/dcn_parquet/val/gen_0.parquet", "num_rows":40960}, 
-                 {"file_name": "gs://hugectr-io-test/data/dcn_parquet/val/gen_1.parquet", "num_rows":40960},
-                 {"file_name": "gs://hugectr-io-test/data/dcn_parquet/val/gen_2.parquet", "num_rows":40960}, 
-                 {"file_name": "gs://hugectr-io-test/data/dcn_parquet/val/gen_3.parquet", "num_rows":40960}], 
-  "labels": [{"col_name": "label0", "index":0} ], 
-  "conts": [{"col_name": "C1", "index":1}, {"col_name": "C2", "index":2}, {"col_name": "C3", "index":3}, {"col_name": "C4", "index":4}, {"col_name": "C5", "index":5}, {"col_name": "C6", "index":6}, 
-            {"col_name": "C7", "index":7}, {"col_name": "C8", "index":8}, {"col_name": "C9", "index":9}, {"col_name": "C10", "index":10}, {"col_name": "C11", "index":11}, {"col_name": "C12", "index":12}, 
-            {"col_name": "C13", "index":13} ], 
-  "cats": [{"col_name": "C14", "index":14}, {"col_name": "C15", "index":15}, {"col_name": "C16", "index":16}, {"col_name": "C17", "index":17}, {"col_name": "C18", "index":18}, 
-            {"col_name": "C19", "index":19}, {"col_name": "C20", "index":20}, {"col_name": "C21", "index":21}, {"col_name": "C22", "index":22}, {"col_name": "C23", "index":23}, 
-            {"col_name": "C24", "index":24}, {"col_name": "C25", "index":25}, {"col_name": "C26", "index":26}, {"col_name": "C27", "index":27}, {"col_name": "C28", "index":28}, 
-            {"col_name": "C29", "index":29}, {"col_name": "C30", "index":30}, {"col_name": "C31", "index":31}, {"col_name": "C32", "index":32}, {"col_name": "C33", "index":33}, 
-            {"col_name": "C34", "index":34}, {"col_name": "C35", "index":35}, {"col_name": "C36", "index":36}, {"col_name": "C37", "index":37}, {"col_name": "C38", "index":38}, {"col_name": "C39", "index":39} ] }
Overwriting /hugectr-io-test/data/dcn_parquet/val/_metadata.json
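The `num_rows` values in the metadata also show how far training gets through the data. The quick calculation below assumes the same batch size (1024) and `max_iter=1100` as the S3 run earlier in this notebook; the variable names are illustrative only.

```python
rows_per_file = 40960
train_rows = 16 * rows_per_file  # 16 training files listed in file_list.txt
val_rows = 4 * rows_per_file     # 4 validation files listed in file_list_test.txt
batchsize = 1024
max_iter = 1100                  # assumed: same value as the S3 run's fit() call

iters_per_pass = train_rows // batchsize
print(train_rows, iters_per_pass)           # 655360 640
print(round(max_iter / iters_per_pass, 2))  # 1.72
print(val_rows // batchsize)                # 160 batches in one pass over the validation files
```

With 640 iterations per pass, `max_iter=1100` runs past the end of the training files, which the scripts accommodate with `repeat_dataset=True` (reported as non-epoch mode in the log).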

Training


Important APIs used in the following script:

  1. We use DataSourceParams to define the remote file system to read data from; in this case, GCS.
  2. In DataReaderParams, we pass the DataSourceParams through the data_source_params argument.
  3. In the fit() method, we set the snapshot_prefix parameter to a GCS path so that trained models are dumped to GCS.
# You need to set the GCP credentials environment variable to access GCS.
-
-%env GOOGLE_APPLICATION_CREDENTIALS=/path/to/your/gcs_key.json
env: GOOGLE_APPLICATION_CREDENTIALS=/path/to/your/gcs_key.json
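Without a check, a missing or mistyped key path may only show up once training starts reading from GCS. The following is a minimal pre-flight check, assuming only that the credentials are supplied through `GOOGLE_APPLICATION_CREDENTIALS` as above. `credentials_file_ok` is a hypothetical helper; it confirms that the file exists and parses as JSON, not that the key is valid or has access to the bucket.

```python
import json
import os


# Hypothetical helper: confirm GOOGLE_APPLICATION_CREDENTIALS points at a readable
# JSON file before launching train_with_gcs.py. Does not contact GCS.
def credentials_file_ok(env=None):
    env = os.environ if env is None else env
    path = env.get("GOOGLE_APPLICATION_CREDENTIALS")
    if not path or not os.path.isfile(path):
        return False
    try:
        with open(path) as f:
            json.load(f)
    except (OSError, ValueError):  # unreadable file, or not valid JSON/UTF-8
        return False
    return True


print(credentials_file_ok())
```

With the placeholder path shown above this prints `False`; it should print `True` once the variable points at your downloaded key file.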
%%writefile train_with_gcs.py
-import hugectr
-from mpi4py import MPI
-from hugectr.data import DataSourceParams
-
-# Create a file system configuration for data reading
-data_source_params = DataSourceParams(
-    source = hugectr.FileSystemType_t.GCS,  # use Google Cloud Storage
-    server = 'storage.googleapis.com',  # your endpoint override, usually storage.googleapis.com or storage.google.cloud.com
-    port = 9000,  # will be ignored
-)
-
-solver = hugectr.CreateSolver(
-    max_eval_batches=1280,
-    batchsize_eval=1024,
-    batchsize=1024,
-    lr=0.001,
-    vvgpu=[[0]],
-    i64_input_key=True,
-    repeat_dataset=True,
-)
-reader = hugectr.DataReaderParams(
-    data_reader_type=hugectr.DataReaderType_t.Parquet,
-    source=["/hugectr-io-test/data/dcn_parquet/file_list.txt"],
-    eval_source="/hugectr-io-test/data/dcn_parquet/file_list_test.txt",
-    slot_size_array=[39884,39043,17289,7420,20263,3,7120,1543,39884,39043,17289,7420,20263,3,7120,1543,63,63,39884,39043,17289,7420,20263,3,7120,1543],
-    data_source_params=data_source_params, # Using the GCS configurations
-    check_type=hugectr.Check_t.Non,
-)
-optimizer = hugectr.CreateOptimizer(optimizer_type=hugectr.Optimizer_t.SGD)
-model = hugectr.Model(solver, reader, optimizer)
-model.add(
-    hugectr.Input(
-        label_dim=1,
-        label_name="label",
-        dense_dim=13,
-        dense_name="dense",
-        data_reader_sparse_param_array=[
-            hugectr.DataReaderSparseParam("data1", 1, True, 26)
-        ],
-    )
-)
-model.add(
-    hugectr.SparseEmbedding(
-        embedding_type=hugectr.Embedding_t.DistributedSlotSparseEmbeddingHash,
-        workspace_size_per_gpu_in_mb=150,
-        embedding_vec_size=16,
-        combiner="sum",
-        sparse_embedding_name="sparse_embedding1",
-        bottom_name="data1",
-        optimizer=optimizer,
-    )
-)
-model.add(
-    hugectr.DenseLayer(
-        layer_type=hugectr.Layer_t.Reshape,
-        bottom_names=["sparse_embedding1"],
-        top_names=["reshape1"],
-        leading_dim=416,
-    )
-)
-model.add(
-    hugectr.DenseLayer(
-        layer_type=hugectr.Layer_t.Concat, bottom_names=["reshape1", "dense"], top_names=["concat1"]
-    )
-)
-model.add(
-    hugectr.DenseLayer(
-        layer_type=hugectr.Layer_t.Slice,
-        bottom_names=["concat1"],
-        top_names=["slice11", "slice12"],
-        ranges=[(0, 429), (0, 429)],
-    )
-)
-model.add(
-    hugectr.DenseLayer(
-        layer_type=hugectr.Layer_t.MultiCross,
-        bottom_names=["slice11"],
-        top_names=["multicross1"],
-        num_layers=6,
-    )
-)
-model.add(
-    hugectr.DenseLayer(
-        layer_type=hugectr.Layer_t.InnerProduct,
-        bottom_names=["slice12"],
-        top_names=["fc1"],
-        num_output=1024,
-    )
-)
-model.add(
-    hugectr.DenseLayer(layer_type=hugectr.Layer_t.ReLU, bottom_names=["fc1"], top_names=["relu1"])
-)
-model.add(
-    hugectr.DenseLayer(
-        layer_type=hugectr.Layer_t.Dropout,
-        bottom_names=["relu1"],
-        top_names=["dropout1"],
-        dropout_rate=0.5,
-    )
-)
-model.add(
-    hugectr.DenseLayer(
-        layer_type=hugectr.Layer_t.Concat,
-        bottom_names=["dropout1", "multicross1"],
-        top_names=["concat2"],
-    )
-)
-model.add(
-    hugectr.DenseLayer(
-        layer_type=hugectr.Layer_t.InnerProduct,
-        bottom_names=["concat2"],
-        top_names=["fc2"],
-        num_output=1,
-    )
-)
-model.add(
-    hugectr.DenseLayer(
-        layer_type=hugectr.Layer_t.BinaryCrossEntropyLoss,
-        bottom_names=["fc2", "label"],
-        top_names=["loss"],
-    )
-)
-model.compile()
-model.summary()
-
-model.fit(max_iter = 1100, display = 100, eval_interval = 500, snapshot = 1000, snapshot_prefix = "https://storage.googleapis.com/hugectr-io-test/pipeline_test/")
-model.graph_to_json(graph_config_file = "dcn.json")
-
-
-
-
-
Overwriting train_with_gcs.py
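The embedding settings in train_with_gcs.py can be cross-checked against two numbers in the training log that the next cell prints: Vocabulary size: 397821 and max_vocabulary_size_per_gpu_=2457600. The first is simply the sum of slot_size_array. The second relation is an inference from matching numbers, not a documented HugeCTR formula: 150 MB of workspace divided by the 64 bytes of one 16-element FP32 vector gives exactly that value (the FP32 size is an assumption, consistent with mixed precision being off in the log).

```python
# Values copied from train_with_gcs.py
slot_size_array = [39884, 39043, 17289, 7420, 20263, 3, 7120, 1543,
                   39884, 39043, 17289, 7420, 20263, 3, 7120, 1543,
                   63, 63,
                   39884, 39043, 17289, 7420, 20263, 3, 7120, 1543]
workspace_size_per_gpu_in_mb = 150
embedding_vec_size = 16
bytes_per_float = 4  # assumption: FP32 embedding vectors

num_slots = len(slot_size_array)        # 26 categorical slots, matching DataReaderSparseParam
vocabulary_size = sum(slot_size_array)  # total number of keys across all slots

# Assumed relation: workspace bytes / bytes of one embedding vector
max_vocab_per_gpu = (workspace_size_per_gpu_in_mb * 1024 * 1024) // (
    embedding_vec_size * bytes_per_float
)

print(num_slots, vocabulary_size, max_vocab_per_gpu)  # 26 397821 2457600
```

Because 397821 is well below 2457600, the whole table fits in the 150 MB workspace of the single GPU in vvgpu=[[0]].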
-
-
-
-
-
-
-
!python train_with_gcs.py
-
-
-
-
-
HugeCTR Version: 4.1
-====================================================Model Init=====================================================
-[HCTR][03:15:35.248][WARNING][RK0][main]: The model name is not specified when creating the solver.
-[HCTR][03:15:35.248][INFO][RK0][main]: Global seed is 1008636636
-[HCTR][03:15:35.251][INFO][RK0][main]: Device to NUMA mapping:
-  GPU 0 ->  node 0
-[HCTR][03:15:37.306][WARNING][RK0][main]: Peer-to-peer access cannot be fully enabled.
-[HCTR][03:15:37.306][INFO][RK0][main]: Start all2all warmup
-[HCTR][03:15:37.306][INFO][RK0][main]: End all2all warmup
-[HCTR][03:15:37.307][INFO][RK0][main]: Using All-reduce algorithm: NCCL
-[HCTR][03:15:37.308][INFO][RK0][main]: Device 0: Tesla V100-SXM2-32GB
-[HCTR][03:15:37.308][INFO][RK0][main]: num of DataReader workers for train: 1
-[HCTR][03:15:37.308][INFO][RK0][main]: num of DataReader workers for eval: 1
-[HCTR][03:15:37.309][INFO][RK0][main]: Using GCS file system backend.
-[HCTR][03:15:37.323][INFO][RK0][main]: Using GCS file system backend.
-[HCTR][03:15:37.328][INFO][RK0][main]: Vocabulary size: 397821
-[HCTR][03:15:37.329][INFO][RK0][main]: max_vocabulary_size_per_gpu_=2457600
-[HCTR][03:15:37.331][INFO][RK0][main]: Graph analysis to resolve tensor dependency
-[HCTR][03:15:37.331][WARNING][RK0][main]: using multi-cross v1
-[HCTR][03:15:37.331][WARNING][RK0][main]: using multi-cross v1
-===================================================Model Compile===================================================
-[HCTR][03:15:39.005][INFO][RK0][main]: gpu0 start to init embedding
-[HCTR][03:15:39.006][INFO][RK0][main]: gpu0 init embedding done
-[HCTR][03:15:39.008][INFO][RK0][main]: Starting AUC NCCL warm-up
-[HCTR][03:15:39.010][INFO][RK0][main]: Warm-up done
-===================================================Model Summary===================================================
-[HCTR][03:15:39.010][INFO][RK0][main]: Model structure on each GPU
-Label                                   Dense                         Sparse                        
-label                                   dense                          data1                         
-(1024,1)                                (1024,13)                               
-——————————————————————————————————————————————————————————————————————————————————————————————————————————————————
-Layer Type                              Input Name                    Output Name                   Output Shape                  
-——————————————————————————————————————————————————————————————————————————————————————————————————————————————————
-DistributedSlotSparseEmbeddingHash      data1                         sparse_embedding1             (1024,26,16)                  
-------------------------------------------------------------------------------------------------------------------
-Reshape                                 sparse_embedding1             reshape1                      (1024,416)                    
-------------------------------------------------------------------------------------------------------------------
-Concat                                  reshape1                      concat1                       (1024,429)                    
-                                        dense                                                                                     
-------------------------------------------------------------------------------------------------------------------
-Slice                                   concat1                       slice11                       (1024,429)                    
-                                                                      slice12                       (1024,429)                    
-------------------------------------------------------------------------------------------------------------------
-MultiCross                              slice11                       multicross1                   (1024,429)                    
-------------------------------------------------------------------------------------------------------------------
-InnerProduct                            slice12                       fc1                           (1024,1024)                   
-------------------------------------------------------------------------------------------------------------------
-ReLU                                    fc1                           relu1                         (1024,1024)                   
-------------------------------------------------------------------------------------------------------------------
-Dropout                                 relu1                         dropout1                      (1024,1024)                   
-------------------------------------------------------------------------------------------------------------------
-Concat                                  dropout1                      concat2                       (1024,1453)                   
-                                        multicross1                                                                               
-------------------------------------------------------------------------------------------------------------------
-InnerProduct                            concat2                       fc2                           (1024,1)                      
-------------------------------------------------------------------------------------------------------------------
-BinaryCrossEntropyLoss                  fc2                           loss                                                        
-                                        label                                                                                     
-------------------------------------------------------------------------------------------------------------------
-=====================================================Model Fit=====================================================
-[HCTR][03:15:39.011][INFO][RK0][main]: Use non-epoch mode with number of iterations: 1100
-[HCTR][03:15:39.011][INFO][RK0][main]: Training batchsize: 1024, evaluation batchsize: 1024
-[HCTR][03:15:39.011][INFO][RK0][main]: Evaluation interval: 500, snapshot interval: 1000
-[HCTR][03:15:39.011][INFO][RK0][main]: Dense network trainable: True
-[HCTR][03:15:39.011][INFO][RK0][main]: Sparse embedding sparse_embedding1 trainable: True
-[HCTR][03:15:39.011][INFO][RK0][main]: Use mixed precision: False, scaler: 1.000000, use cuda graph: True
-[HCTR][03:15:39.011][INFO][RK0][main]: lr: 0.001000, warmup_steps: 1, end_lr: 0.000000
-[HCTR][03:15:39.011][INFO][RK0][main]: decay_start: 0, decay_steps: 1, decay_power: 2.000000
-[HCTR][03:15:39.011][INFO][RK0][main]: Training source file: /hugectr-io-test/data/dcn_parquet/file_list.txt
-[HCTR][03:15:39.011][INFO][RK0][main]: Evaluation source file: /hugectr-io-test/data/dcn_parquet/file_list_test.txt
-[HCTR][03:15:40.236][INFO][RK0][main]: Iter: 100 Time(100 iters): 1.22452s Loss: 0.786299 lr:0.001
-[HCTR][03:15:41.872][INFO][RK0][main]: Iter: 200 Time(100 iters): 1.6347s Loss: 0.738846 lr:0.001
-[HCTR][03:15:43.102][INFO][RK0][main]: Iter: 300 Time(100 iters): 1.22938s Loss: 0.711017 lr:0.001
-[HCTR][03:15:44.736][INFO][RK0][main]: Iter: 400 Time(100 iters): 1.63355s Loss: 0.708317 lr:0.001
-[HCTR][03:15:45.850][INFO][RK0][main]: Iter: 500 Time(100 iters): 1.11226s Loss: 0.697101 lr:0.001
-[HCTR][03:15:59.880][INFO][RK0][main]: Evaluation, AUC: 0.501301
-[HCTR][03:15:59.880][INFO][RK0][main]: Eval Time for 1280 iters: 14.0298s
-[HCTR][03:16:01.456][INFO][RK0][main]: Iter: 600 Time(100 iters): 15.6054s Loss: 0.698077 lr:0.001
-[HCTR][03:16:02.201][INFO][RK0][main]: Iter: 700 Time(100 iters): 0.744573s Loss: 0.697804 lr:0.001
-[HCTR][03:16:03.244][INFO][RK0][main]: Iter: 800 Time(100 iters): 1.04207s Loss: 0.695543 lr:0.001
-[HCTR][03:16:04.007][INFO][RK0][main]: Iter: 900 Time(100 iters): 0.761465s Loss: 0.695323 lr:0.001
-[HCTR][03:16:05.289][INFO][RK0][main]: Iter: 1000 Time(100 iters): 1.28151s Loss: 0.695319 lr:0.001
-[HCTR][03:16:17.647][INFO][RK0][main]: Evaluation, AUC: 0.501347
-[HCTR][03:16:17.647][INFO][RK0][main]: Eval Time for 1280 iters: 12.3576s
-[HCTR][03:16:17.647][INFO][RK0][main]: Using GCS file system backend.
-[HCTR][03:16:17.664][INFO][RK0][main]: Rank0: Write hash table to file
-[HCTR][03:16:18.623][DEBUG][RK0][main]: Successfully write to GCS location:  https://storage.googleapis.com/hugectr-io-test/pipeline_test/0_sparse_1000.model/key
-[HCTR][03:16:20.289][DEBUG][RK0][main]: Successfully write to GCS location:  https://storage.googleapis.com/hugectr-io-test/pipeline_test/0_sparse_1000.model/emb_vector
-[HCTR][03:16:20.294][INFO][RK0][main]: Dumping sparse weights to files, successful
-[HCTR][03:16:20.294][INFO][RK0][main]: Dumping sparse optimzer states to files, successful
-[HCTR][03:16:20.294][INFO][RK0][main]: Using GCS file system backend.
-[HCTR][03:16:21.254][DEBUG][RK0][main]: Successfully write to GCS location:  https://storage.googleapis.com/hugectr-io-test/pipeline_test/_dense_1000.model
-[HCTR][03:16:21.255][INFO][RK0][main]: Dumping dense weights to file, successful
-[HCTR][03:16:21.255][INFO][RK0][main]: Using GCS file system backend.
-[HCTR][03:16:21.803][DEBUG][RK0][main]: Successfully write to GCS location:  https://storage.googleapis.com/hugectr-io-test/pipeline_test/_opt_dense_1000.model
-[HCTR][03:16:21.804][INFO][RK0][main]: Dumping dense optimizer states to file, successful
-[HCTR][03:16:22.606][INFO][RK0][main]: Finish 1100 iterations with batchsize: 1024 in 43.60s.
-[HCTR][03:16:22.607][INFO][RK0][main]: Save the model graph to dcn.json successfully
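The Output Shape column in the model summary follows directly from the layer settings. This short Python sketch reproduces that arithmetic using only values from train_with_gcs.py. Note that both Slice ranges are (0, 429), so slice11 and slice12 each carry the full concat1 tensor, one feeding the cross network and one feeding the deep network.

```python
batch_size = 1024
dense_dim = 13
num_slots = 26
embedding_vec_size = 16
fc1_units = 1024

reshape1 = num_slots * embedding_vec_size  # Reshape: (1024, 26, 16) -> (1024, 416)
concat1 = reshape1 + dense_dim             # Concat with the 13 dense features -> 429
multicross1 = concat1                      # MultiCross keeps its input width (429)
concat2 = fc1_units + multicross1          # Concat of dropout1 (1024) and multicross1 (429)

print((batch_size, reshape1), (batch_size, concat1), (batch_size, concat2))
# (1024, 416) (1024, 429) (1024, 1453)
```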
-
-
-
-
-
-
-
- - -
-
- -
-
-
-
- - - - - - - \ No newline at end of file diff --git a/review/pr-458/objects.inv b/review/pr-458/objects.inv deleted file mode 100644 index 5621641f5d..0000000000 Binary files a/review/pr-458/objects.inv and /dev/null differ diff --git a/review/pr-458/performance.html b/review/pr-458/performance.html deleted file mode 100644 index 452e8ba0ea..0000000000 --- a/review/pr-458/performance.html +++ /dev/null @@ -1,189 +0,0 @@ - - - - - - - Performance — Merlin HugeCTR documentation - - - - - - - - - - - - - - - - - - - - - - - - - -
- - -
- -
-
-
-
    -
  • - -
  • -
  • -
-
-
-
-
- -
-

Performance

-

Finding ways to enhance HugeCTR’s performance is one of our top priorities. Unlike other frameworks, we apply all the optimizations from our MLPerf submissions to each release.

-

We’ve tested HugeCTR’s performance on the following systems:

- -
-

MLPerf on DGX-2 and DGX A100

-

The DLRM benchmark was submitted to MLPerf Training v0.7 with the release of HugeCTR version 2.2 and MLPerf Training v1.0 with the release of HugeCTR version 3.1. We used the Criteo 1TB Click Logs dataset, which contains 4 billion user and item interactions over 24 days. The DGX-2 with 16 V100 GPUs and DGX A100 with eight A100 GPUs were the target machines. For more information, see this blog post.

-_images/mlperf_10.PNG -
Fig. 1: HugeCTR's MLPerf v1.0 Result
-
-
-

Evaluating HugeCTR’s Performance on the DGX-1

-

The scalability and performance of HugeCTR have been tested and compared with TensorFlow running on NVIDIA V100 GPUs within a single DGX-1 system. With only one V100 GPU, HugeCTR achieves a speedup of up to 114 times over multi-threaded TensorFlow running on the CPU, while producing almost the same loss curves for both training and evaluation (see Fig. 3 and Fig. 4).

-

Test environment:

-
    -
  • CPU Server: Dual 20-core Intel® Xeon® CPU E5-2698 v4 @ 2.20GHz

  • -
  • TensorFlow version 2.0.0

  • -
  • V100 16GB: NVIDIA DGX1 servers

  • -
-

Network:

-
    -
  • Wide Deep Learning: Nx 1024-unit FC layers with ReLU and dropout, emb_dim: 16; Optimizer: Adam for both Linear and DNN models

  • -
  • Deep Cross Network: Nx 1024-unit FC layers with ReLU and dropout, emb_dim: 16, 6x cross layers; Optimizer: Adam for both Linear and DNN models

  • -
-

Dataset: The data is provided by CriteoLabs. The original training set contains 45,840,617 examples. Each example contains a label (1 if the ad was clicked, otherwise 0) and 39 features, of which 13 are integer and 26 are categorical.

-

Preprocessing:

-
    -
  • Common: Preprocessed by using the scripts available in the tools/criteo_script directory of the GitHub repository.

  • -
  • HugeCTR: Converted to the HugeCTR data format with criteo2hugectr.

  • -
  • TF: Converted to the TFRecord format for efficient training on TensorFlow.

  • -
-

HugeCTR scales with the number of active GPUs because of its highly efficient data exchange and three-stage processing pipeline. This pipeline overlaps reading data from files, host-to-device data transfer (inter-node and intra-node), and GPU training. The following chart shows the scalability of HugeCTR with a batch size of 16384 and seven layers on DGX-1 servers.

-_images/fig12_multi_gpu_performance.PNG -
Fig. 2: HugeCTR's Multi-GPU Performance
-
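The three-stage overlap described above can be illustrated with a minimal Python sketch. It is not HugeCTR's implementation: run_pipeline, the stand-in stage functions, and the queue depth are hypothetical, and plain threads stand in for whatever concurrency the real pipeline uses. Each stage works on a different batch at the same time, so reading batch n+2 and transferring batch n+1 can overlap with training on batch n.

```python
import queue
import threading

_DONE = object()  # sentinel marking the end of the stream


def run_pipeline(batches, read, transfer, train, depth=2):
    """Overlap read -> transfer -> train with one thread per upstream stage.

    Bounded queues let the reader and the transfer stage run ahead of
    training by at most `depth` batches each. No error handling: an
    exception in a worker thread would leave the consumer waiting.
    """
    read_q = queue.Queue(maxsize=depth)
    xfer_q = queue.Queue(maxsize=depth)

    def reader():
        for batch in batches:
            read_q.put(read(batch))         # stage 1: read from file
        read_q.put(_DONE)

    def mover():
        while (item := read_q.get()) is not _DONE:
            xfer_q.put(transfer(item))      # stage 2: host-to-device copy
        xfer_q.put(_DONE)

    workers = [threading.Thread(target=reader), threading.Thread(target=mover)]
    for w in workers:
        w.start()

    results = []
    while (item := xfer_q.get()) is not _DONE:
        results.append(train(item))         # stage 3: training step
    for w in workers:
        w.join()
    return results
```

For example, run_pipeline(range(5), lambda b: b, lambda b: b * 10, lambda b: b + 1) returns [1, 11, 21, 31, 41]; the bounded queues keep memory use flat while the stages run concurrently.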
-
-

Evaluating HugeCTR’s Performance on TensorFlow

-

In the TensorFlow test case shown here, with only one V100 GPU, HugeCTR achieves a speedup of up to 114 times over a CPU server running TensorFlow, with almost the same loss curve.

-_images/WDL.JPG -
Fig. 3: WDL Performance and Loss Curve Comparison with TensorFlow Version 2.0
-



-_images/DCN.JPG -
Fig. 4: DCN Performance and Loss Curve Comparison with TensorFlow Version 2.0
-
-
- - -
-
- -
-
-
-
- - - - - - - \ No newline at end of file diff --git a/review/pr-458/release_notes.html b/review/pr-458/release_notes.html deleted file mode 100644 index 9652720ae1..0000000000 --- a/review/pr-458/release_notes.html +++ /dev/null @@ -1,1333 +0,0 @@ - - - - - - - Release Notes — Merlin HugeCTR documentation - - - - - - - - - - - - - - - - - - - - - - - - - -
- - -
- -
-
-
-
    -
  • - -
  • -
  • -
-
-
-
-
- -
-

Release Notes

-
-

What’s New in Version 24.06

-
    -
  • Sparse Operation Kit (SOK) Updates:

    -
      -
    • A new API, sok.incremental_dump, has been added, which allows users to dump newly added keys and values into a numpy array by specifying a time threshold. Currently, it supports only sok.DynamicVariable instances that use HKV as the backend.

    • -
    • Fixed the issue of wgrad using too much GPU memory.

    • -
    • Fixed an illegal memory access issue in a CUDA kernel during backward propagation.

    • -
    • The documentation and examples for SOK (Sparse Operation Kit) have been updated. For more details, refer to the SOK Documentation.

    • -
    -
  • -
-
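The idea behind sok.incremental_dump, selecting only the entries added after a time threshold, can be sketched in NumPy. This is a hypothetical illustration that assumes each key carries an insertion timestamp and that "after" means strictly newer; incremental_dump_sketch does not reproduce the signature or behavior of sok.incremental_dump, which operates on HKV-backed sok.DynamicVariable instances.

```python
import numpy as np


def incremental_dump_sketch(keys, values, timestamps, threshold):
    """Select entries inserted after `threshold` (hypothetical, not SOK's API).

    keys: (N,) integer keys; values: (N, dim) embedding vectors;
    timestamps: (N,) insertion times on the same clock as `threshold`.
    """
    newer = timestamps > threshold  # strictly newer than the threshold (assumed)
    return keys[newer], values[newer]


keys = np.array([10, 11, 12, 13], dtype=np.int64)
values = np.arange(8, dtype=np.float32).reshape(4, 2)
timestamps = np.array([100, 205, 150, 300])
new_keys, new_values = incremental_dump_sketch(keys, values, timestamps, threshold=200)
print(new_keys)    # [11 13]
print(new_values)  # the embedding rows for keys 11 and 13
```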
-
-

What’s New in Version 23.12

-
    -
  • Lock-free Inference Cache in HPS

    -
      -
    • We have added a new lock-free GPU embedding cache for the hierarchical parameter server, which can further improve the performance of embedding table lookup in inference. It does not lead to data inconsistency even when concurrent model updates or missing-key insertions are in use, because cache consistency is ensured through an asynchronous stream synchronization mechanism. To enable the lock-free GPU embedding cache, set "embedding_cache_type" to "dynamic" and "use_hctr_cache_implementation" to false.

    • -
    -
  • -
  • Official SOK Release

    -
      -
    • The SOK is no longer an experimental package and is now officially supported by HugeCTR. Use import sparse_operation_kit as sok instead of from sparse_operation_kit import experiment as sok.

    • -
    • sok.DynamicVariable supports Merlin-HKV as its backend

    • -
    • Added parallel dump and load functions.

    • -
    -
  • -
  • Code Cleaning and Deprecation

    -
      -
    • Deprecated the Model::export_predictions function. Use the Model::check_out_tensor function instead.

    • -
    • We have deprecated the Norm and legacy Raw DataReaders. Use hugectr.DataReaderType_t.RawAsync or hugectr.DataReaderType_t.Parquet as their alternatives.

    • -
    -
  • -
  • Issues Fixed:

    -
      -
    • Improved the performance of the HKV lookup via the SOK

    • -
    • Fixed an illegal memory access issue in the SOK backward pass that occurred in a corner case.

    • -
    • Fixed the mean combiner to return zeroes when the pooling factor is zero; previously, this case could make the SOK lookup return NaN.

    • -
    • Fixed some dependency-related build issues.

    • -
    • Optimized the performance of the dynamic embedding table (DET) in the SOK.

    • -
    • Fixed a crash that occurred when a user specified negative keys while using the DET via the SOK.

    • -
    • Resolved an occasional correctness issue that appears during the backward propagation phase of the SOK when handling thousands of embedding tables.

    • -
    • Fixed runtime errors that occurred with TensorFlow 2.13 and later.

    • -
    -
  • -
  • Known Issues:

    -
      -
    • If max_eval_batches and batchsize_eval are set to large values, such as 5000 and 12000 respectively, training fails with an illegal memory access error. The issue originates in CUB and is fixed in its latest version. However, that version is included only in CUDA 12.3, which our NGC container does not use yet. Until we update the NGC container to that CUDA version, rebuild HugeCTR with the newest CUB as a workaround, or avoid such large values for max_eval_batches and batchsize_eval.

    • -
    • HugeCTR can lead to a runtime error if client code calls RMM’s rmm::mr::set_current_device_resource(), because HugeCTR’s Parquet Data Reader also calls rmm::mr::set_current_device_resource() and the setting is visible to other libraries in the same process. Refer to this issue (https://github.com/NVIDIA-Merlin/HugeCTR/issues/356). As a workaround, if you know that rmm::mr::set_current_device_resource() is called outside HugeCTR, you can set the environment variable HCTR_RMM_SETTABLE to 0 to prevent HugeCTR from setting a custom RMM device resource. Be cautious, however, because this can affect Parquet reading performance.

    • -
    • HugeCTR uses NCCL to share data between ranks, and NCCL can require shared system memory for IPC and pinned (page-locked) system memory resources. If you use NCCL inside a container, increase these resources by specifying the following arguments when you start the container:

      -
        --shm-size=1g --ulimit memlock=-1
      -
      -
      -

      See also this NCCL known issue and this GitHub issue (https://github.com/NVIDIA-Merlin/HugeCTR/issues/243).

      -
    • -
    • KafkaProducers startup succeeds even if the target Kafka broker is unresponsive. To avoid data loss in conjunction with streaming-model updates from Kafka, make sure that a sufficient number of Kafka brokers are running, operating properly, and reachable from the node where you run HugeCTR.

    • -
    • The number of data files in the file list should be greater than or equal to the number of data reader workers. Otherwise, different workers are mapped to the same file and data loading does not progress as expected.

    • -
    • Joint loss training with a regularizer is not supported.

    • -
    • Dumping Adam optimizer states to AWS S3 is not supported.

    • -
    -
  • -
-
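The two lock-free cache settings from the 23.12 notes can be sketched as a Python dict written out as JSON. Only embedding_cache_type and use_hctr_cache_implementation come from the release note; the "models" list and the "model" field are assumptions about where these keys sit in an HPS configuration file, and a real configuration needs additional fields.

```python
import json

# Only the two cache settings come from the release note; the surrounding
# layout ("models" list, "model" name) is an assumed, minimal HPS config.
model_entry = {
    "model": "my_model",                     # hypothetical model name
    "embedding_cache_type": "dynamic",       # dynamic embedding cache
    "use_hctr_cache_implementation": False,  # together with "dynamic", selects the lock-free cache
}
hps_config = {"models": [model_entry]}

config_text = json.dumps(hps_config, indent=2)
print(config_text)  # Python False is serialized as JSON false
```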
-
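The file-list known issue can be caught before training with a small pre-flight check, sketched here in Python. It assumes the file-list format used by HugeCTR data readers, where the first line holds the number of files and each following line holds one path; check_file_list is a hypothetical helper, and num_workers should match your own reader configuration.

```python
def check_file_list(path, num_workers):
    """Validate a file list against the number of data reader workers.

    Assumed format: first line = number of files, then one file path per line.
    Returns the number of listed files, or raises ValueError.
    """
    with open(path) as f:
        lines = [line.strip() for line in f if line.strip()]
    declared = int(lines[0])
    files = lines[1:]
    if declared != len(files):
        raise ValueError(f"header says {declared} files but {len(files)} are listed")
    if len(files) < num_workers:
        raise ValueError(
            f"{len(files)} files < {num_workers} reader workers; workers would share files"
        )
    return len(files)
```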
-

What’s New in Version 23.11

-
    -
  • Code Cleaning and Deprecation

    -
      -
    • The offline inference has been deprecated from our documentation, notebook suite, and code. Please check out the HPS plugin for TensorFlow and TensorRT. The multi-GPU inference is now illustrated in this HPS TRT notebook.

    • -
    • We are working on deprecating the Embedding Training Cache (ETC). If you try to use that feature, it still works but emits a deprecation warning message. In a near-future release, it will be removed from the API and code. Please refer to NVIDIA HierarchicalKV as an alternative.

    • -
    • In this release, we have also cleaned up our C++ code and CMakeLists.txt to improve their maintainability and fix minor but potential issues. There will be more code cleanup in several future releases.

    • -
    -
  • -
  • General Updates:

    -
      -
    • Added support for the static CUDA runtime. You can experimentally enable it by specifying -DUSE_CUDART_STATIC=ON when configuring the code with cmake; the dynamic CUDA runtime is still used by default.

    • -
    • Added HPS as a custom extension for TorchScript. You can use the HPS embedding lookup during inference of a scripted torch module.

    • -
    -
  • -
  • Issues Fixed:

    -
      -
    • Resolved a couple of performance regressions when the SOK is used together with HKV, related to unique operation and unified memory

    • -
    • Reduced the unnecessary memory consumption of intermediate buffers in loading and dumping the SOK embedding

    • -
    • Fixed the Interaction Layer to support large num_slots

    • -
    • Resolved an occasional runtime error when using multiple H800 GPUs

    • -
    -
  • -
  • Known Issues:

    -
      -
    • If max_eval_batches and batchsize_eval are set to large values, such as 5000 and 12000 respectively, training fails with an illegal memory access error. The issue originates in CUB and is fixed in its latest version. However, that version is included only in CUDA 12.3, which our NGC container does not use yet. Until we update the NGC container to that CUDA version, rebuild HugeCTR with the newest CUB as a workaround, or avoid such large values for max_eval_batches and batchsize_eval.

    • -
    • HugeCTR can lead to a runtime error if client code calls RMM’s rmm::mr::set_current_device_resource(), because HugeCTR’s Parquet Data Reader also calls rmm::mr::set_current_device_resource() and the setting is visible to other libraries in the same process. Refer to this issue (https://github.com/NVIDIA-Merlin/HugeCTR/issues/356). As a workaround, if you know that rmm::mr::set_current_device_resource() is called outside HugeCTR, you can set the environment variable HCTR_RMM_SETTABLE to 0 to prevent HugeCTR from setting a custom RMM device resource. Be cautious, however, because this can affect Parquet reading performance.

    • -
    • HugeCTR uses NCCL to share data between ranks, and NCCL can require shared system memory for IPC and pinned (page-locked) system memory resources. If you use NCCL inside a container, increase these resources by specifying the following arguments when you start the container:

      -
        --shm-size=1g --ulimit memlock=-1
      -
      -
      -

      See also this NCCL known issue and this GitHub issue (https://github.com/NVIDIA-Merlin/HugeCTR/issues/243).

      -
    • -
    • KafkaProducers startup succeeds even if the target Kafka broker is unresponsive. To avoid data loss in conjunction with streaming-model updates from Kafka, make sure that a sufficient number of Kafka brokers are running, operating properly, and reachable from the node where you run HugeCTR.

    • -
    • The number of data files in the file list should be greater than or equal to the number of data reader workers. Otherwise, different workers are mapped to the same file and data loading does not progress as expected.

    • -
    • Joint loss training with a regularizer is not supported.

    • -
    • Dumping Adam optimizer states to AWS S3 is not supported.

    • -
    -
  • -
-
-
-

What’s New in Version 23.08

-
    -
  • Hierarchical Parameter Server:

    -
      -
    • Support for static EC fp8 quantization: We now support fp8 quantization in the static cache. When the fp8_quant configuration is enabled, HPS performs fp8 quantization on the embedding vectors when reading the embedding table, and performs fp32 dequantization on the embedding vectors corresponding to the queried keys in the static embedding cache, to preserve the accuracy of the dense-part prediction.

    • -
    • Large model deployment demo based on the HPS TensorRT plugin: This demo shows how to use the HPS TRT plugin to build a complete TRT engine for deploying a 147GB embedding table based on the 1TB Criteo dataset. We also provide a static embedding implementation that fully offloads embedding tables to host page-locked memory, for benchmarks on x86 and the Grace Hopper Superchip.

    • -
    • Issues Fixed

      -
        -
      • Resolved a Kafka update ingestion error that prevented online parameter updates coming from Kafka message queues from being handed over to Redis database backends.

      • -
      • Fixed an issue where the HPS Triton backend re-initialized the embedding cache because of an undefined null when getting the embedding cache on the corresponding device.

      • -
      -
    • -
    -
  • -
  • HugeCTR Training & SOK:

    -
      -
    • Dense Embedding Support in Embedding Collection: We added dense embedding support to the embedding collection. To use dense embedding, specify concat as the combiner. For more information, refer to dense_embedding.py.

    • -
    • Refinement of sequence mask layer and attention softmax layer to support cross-attention.

    • -
    • We introduced a more generalized reshape layer, which allows users to reshape a source tensor to a destination tensor without dimension restrictions. For more information, refer to the Reshape Layer API.

    • -
    • Issues Fixed

      -
        -
      • Fix error when using Localized Variable in Sparse Operation Kit

      • -
      • Fix bug in Sparse Operation Kit backward computing.

      • -
      • Fix some SOK performance bugs by replacing the calls to DeviceSegmentedSort with DeviceSegmentedRadixSort

      • -
      • Fix a bug from the SOK’s Python API side, which led to the duplicate calls to the model’s forward function and thus degraded the performance.

      • -
      • Reduce the CPU launch overhead

        -
          -
        • Remove dynamic vector allocation in DataDistributor

        • -
        • Remove the use of the checkout value tensor from the DataReader. The data reader generated a nested std::vector on the fly and returned the vector to the embedding collection, which incurred significant host overhead. We have made it a class member so that the overhead is eliminated.

        • -
        -
      • -
      • Align with the latest parquet update. We have fixed a bug caused by the parquet_reader_options::set_num_rows() update in cudf 23.06: PR.

      • -
      • Fix core23 assertion in debug mode. We have fixed an assertion bug that occurred when the new core library was enabled and HugeCTR was built in debug mode.

      • -
      -
    • -
    -
  • -
  • General Updates:

    -
      -
    • Cleaned up logging code. Added compile-time format-string validation. Fixed issue where HCTR_PRINT did not interpret format strings properly.

    • -
    • Added experimental support for the static CUDA runtime. To enable it, specify -DUSE_CUDART_STATIC=ON when configuring with cmake.

    • -
    • Modified the data preprocessing documentation to clarify the correct commands to use in different situations, and fixed errors in the argument descriptions.

    • -
    -
  • -
  • Known Issues:

    -
      -
    • HugeCTR can lead to a runtime error if client code calls RMM’s rmm::mr::set_current_device_resource(), because HugeCTR’s Parquet Data Reader also calls rmm::mr::set_current_device_resource() and the setting is visible to other libraries in the same process. Refer to this issue (https://github.com/NVIDIA-Merlin/HugeCTR/issues/356). As a workaround, if you know that rmm::mr::set_current_device_resource() is called outside HugeCTR, you can set the environment variable HCTR_RMM_SETTABLE to 0 to prevent HugeCTR from setting a custom RMM device resource. Be cautious, however, because this can affect Parquet reading performance.

    • -
    • HugeCTR uses NCCL to share data between ranks, and NCCL can require shared system memory for IPC and pinned (page-locked) system memory resources. If you use NCCL inside a container, increase these resources by specifying the following arguments when you start the container:

      -
        --shm-size=1g --ulimit memlock=-1
      -
      -
      -

      See also this NCCL known issue and this GitHub issue (https://github.com/NVIDIA-Merlin/HugeCTR/issues/243).

      -
    • -
    • KafkaProducers startup succeeds even if the target Kafka broker is unresponsive. To avoid data loss in conjunction with streaming-model updates from Kafka, make sure that a sufficient number of Kafka brokers are running, operating properly, and reachable from the node where you run HugeCTR.

    • The number of data files in the file list should be greater than or equal to the number of data reader workers. Otherwise, different workers are mapped to the same file and data loading does not progress as expected.

    • Joint loss training with a regularizer is not supported.

    • Dumping Adam optimizer states to AWS S3 is not supported.

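As a minimal sketch of the HCTR_RMM_SETTABLE workaround described above, a Python training script could set the variable before HugeCTR is imported. This assumes HugeCTR reads the variable from the process environment when it initializes its Parquet reader; exporting it in the launching shell (for example, `HCTR_RMM_SETTABLE=0 python train.py`, where `train.py` is a placeholder) should have the same effect. The helper name `disable_hugectr_rmm_override` is hypothetical.

```python
import os


def disable_hugectr_rmm_override() -> None:
    """Ask HugeCTR not to install its own RMM device resource (hypothetical helper).

    Use this only when other code in the process already calls
    rmm::mr::set_current_device_resource(); Parquet reading may become slower.
    """
    # Environment variables are strings; "0" disables the custom RMM resource.
    os.environ["HCTR_RMM_SETTABLE"] = "0"


if __name__ == "__main__":
    disable_hugectr_rmm_override()
    # Import HugeCTR only after the variable is set (assumed to be read at startup):
    # import hugectr
    print(os.environ["HCTR_RMM_SETTABLE"])
```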

What’s New in Version 23.06


In this release, we have fixed issues and enhanced the code.

  • 3G Embedding Updates:

    • Refactored the code related to DataDistributor.

    • New SOK load() and dump() APIs are available in TensorFlow 2. To use these APIs, specify sok_vars in addition to path.

    • sok_vars is a list of sok.variable and/or sok.dynamic_variable.

    • If you want to store optimizer states, such as the m and v states of Adam, you must also specify the optimizer.

    • The optimizer must be a tf.keras.optimizers.Optimizer or a sok.OptimizerWrapper, and its underlying type must be SGD, Adamax, Adadelta, Adagrad, or Ftrl.

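    import sparse_operation_kit as sok

    # path: directory to load the variables from or dump them to
    # sok_vars: list of sok.variable and/or sok.dynamic_variable objects
    # optimizer: needed only to load or dump optimizer states
    sok.load(path, sok_vars, optimizer=None)
    sok.dump(path, sok_vars, optimizer=None)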

    These APIs are independent of the number of GPUs in use and the sharding strategy. For instance, a distributed embedding table that was trained and dumped with 8 GPUs can be loaded to train on a 4-GPU machine.

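Why sok.load() and sok.dump() can be independent of the GPU count can be pictured with the conceptual sketch below: if a dump stores every embedding row under its global key, a loader can redistribute the rows across any number of GPUs. The key-modulo sharding and the `dump_shards`/`load_shards` helpers are hypothetical illustrations; they do not describe SOK's actual file format or placement strategy.

```python
from typing import Dict, List

# One shard per GPU: maps a global embedding key to its embedding vector.
Shard = Dict[int, List[float]]


def dump_shards(shards: List[Shard]) -> Dict[int, List[float]]:
    """Merge per-GPU shards into one table that no longer depends on the GPU count."""
    merged: Dict[int, List[float]] = {}
    for shard in shards:
        merged.update(shard)  # keys are disjoint across shards
    return merged


def load_shards(table: Dict[int, List[float]], num_gpus: int) -> List[Shard]:
    """Re-partition a merged table across num_gpus by key % num_gpus."""
    shards: List[Shard] = [{} for _ in range(num_gpus)]
    for key, vector in table.items():
        shards[key % num_gpus][key] = vector
    return shards


# "Train" on 8 GPUs, dump, then load onto 4 GPUs.
eight = load_shards({k: [float(k), 0.5 * k] for k in range(20)}, num_gpus=8)
four = load_shards(dump_shards(eight), num_gpus=4)
print([len(s) for s in four])  # → [5, 5, 5, 5]
```

Because the dumped table is keyed globally, the only step that depends on the GPU count is the final re-partitioning.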
  • Issues Fixed:

    • Fixed a segmentation fault and incorrect initialization that occurred when embedding table fusion was enabled with the HPS UVM implementation.

    • cudaDeviceSynchronize() is no longer called when HugeCTR is built in debug mode, so you can enable CUDA Graphs in debug mode as well.

    • Updated several notebooks to use the most recent version of the NGC container.

    • Fixed the EmbeddingTableCollection unit test so that it runs correctly with multiple GPUs.

  • Known Issues:

    • HugeCTR can lead to a runtime error if client code calls RMM’s rmm::mr::set_current_device_resource(), because HugeCTR’s Parquet Data Reader also calls rmm::mr::set_current_device_resource() and the resource it sets becomes visible to other libraries in the same process. Refer to this issue (https://github.com/NVIDIA-Merlin/HugeCTR/issues/356). As a workaround, if you know that rmm::mr::set_current_device_resource() is called outside HugeCTR, you can set the environment variable HCTR_RMM_SETTABLE to 0 to prevent HugeCTR from setting a custom RMM device resource. Be cautious, because this setting can reduce the performance of Parquet reading.

    • HugeCTR uses NCCL to share data between ranks, and NCCL can require shared system memory for IPC and pinned (page-locked) system memory resources. If you use NCCL inside a container, increase these resources by specifying the following arguments when you start the container:

        --shm-size=1g --ulimit memlock=-1

      See also this NCCL known issue and this GitHub issue.

    • KafkaProducers startup succeeds even if the target Kafka broker is unresponsive. To avoid data loss in conjunction with streaming-model updates from Kafka, make sure that a sufficient number of Kafka brokers are running, operating properly, and reachable from the node where you run HugeCTR.

    • The number of data files in the file list should be greater than or equal to the number of data reader workers. Otherwise, different workers are mapped to the same file and data loading does not progress as expected.

    • Joint loss training with a regularizer is not supported.

    • Dumping Adam optimizer states to AWS S3 is not supported.


What’s New in Version 23.04

  • Hierarchical Parameter Server Enhancements:

    • HPS Table Fusion: Starting with this release, you can fuse tables that have the same embedding vector size in HPS. This feature is supported in the HPS plugin for TensorFlow and in the Triton backend for HPS. To turn on table fusion, set fuse_embedding_table to true in the HPS JSON file. The feature requires that the keys in different tables do not overlap and that the embedding lookup layers do not depend on each other in the model graph. For more information, refer to the HPS configuration documentation and the HPS table fusion demo notebook. Table fusion can significantly reduce embedding lookup latency when there are multiple tables and the GPU embedding cache is used. For the case demonstrated in the notebook, the fused configuration is about 3x faster than the unfused one on V100.

    • UVM Support: We have upgraded the static embedding solution. For embedding tables whose size exceeds the device memory, HPS saves high-frequency embeddings in HBM as an embedding cache and offloads the remaining embeddings to UVM. Compared with the dynamic cache solution, which offloads the remaining embeddings to the Volatile DB, the UVM solution has higher CPU lookup throughput. We will support online updating of the UVM solution in a future release. You can switch between the embedding cache solutions by using the embedding_cache_type configuration parameter.

    • Triton Perf Analyzer’s Request Generator: We have added an inference request generator that produces the JSON request format required by Triton Perf Analyzer. By using this request generator together with the model generator, you can use Triton Perf Analyzer to profile HPS performance and perform stress testing. For API documentation and demo usage, refer to the README.

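A minimal sketch of enabling HPS table fusion from Python, assuming fuse_embedding_table is a top-level key of the HPS JSON file (check the HPS configuration reference for its exact placement). The model name, table names, and vector sizes are hypothetical, and a real configuration needs more fields, such as sparse file paths. The two helpers check the preconditions stated above: fused tables must share an embedding vector size, and their keys must not overlap.

```python
import json
from collections import defaultdict
from typing import Dict, List, Set


def fusion_groups(names: List[str], vec_sizes: List[int]) -> Dict[int, List[str]]:
    """Group table names by embedding vector size; only same-size tables can be fused."""
    groups: Dict[int, List[str]] = defaultdict(list)
    for name, size in zip(names, vec_sizes):
        groups[size].append(name)
    return dict(groups)


def keys_disjoint(key_sets: List[Set[int]]) -> bool:
    """Return True if no key appears in more than one table."""
    seen: Set[int] = set()
    for keys in key_sets:
        if seen & keys:
            return False
        seen |= keys
    return True


names = ["table_a", "table_b", "table_c"]  # hypothetical table names
vec_sizes = [16, 16, 32]
config = {
    "supportlonglong": True,
    "fuse_embedding_table": True,  # assumed top-level placement
    "models": [{
        "model": "demo_model",  # hypothetical model name
        "embedding_table_names": names,
        "embedding_vecsize_per_table": vec_sizes,
    }],
}

print(fusion_groups(names, vec_sizes))  # → {16: ['table_a', 'table_b'], 32: ['table_c']}
print(keys_disjoint([{0, 1}, {2, 3}, {4}]))  # → True
print(json.dumps(config, indent=2))
```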
  • General Updates:

    • DenseLayerComputeConfig: During training, MLP and CrossLayer support computing weight gradients asynchronously with data gradient backpropagation. We have added a new member, hugectr.DenseLayerComputeConfig, to hugectr.DenseLayer for configuring this computing behavior. The knob for enabling asynchronous weight gradient computation has been moved from hugectr.CreateSolver to hugectr.DenseLayerComputeConfig.async_wgrad. The knob for controlling the fusion mode of weight gradients and bias gradients has been moved from hugectr.DenseLayerSwitchs to hugectr.DenseLayerComputeConfig.fuse_wb.

    • Hopper Architecture Support: You can build HugeCTR from scratch with compute capability 9.0 (-DSM=90) so that it can run on the Hopper architecture. Note that the NGC container does not support this compute capability yet. If you are unfamiliar with how to build HugeCTR, refer to the HugeCTR Contribution Guide.

    • RoCE Support for Hybrid Embedding: RoCE is now supported when you set CommunicationType.IB_NVLink_Hier in HybridEmbeddingParams. We have also added two environment variables, HUGECTR_ROCE_GID and HUGECTR_ROCE_TC, so that you can control the GID and traffic class of the RoCE NIC. For more information, refer to https://nvidia-merlin.github.io/HugeCTR/main/api/python_interface.html#hybridembeddingparam-class

  • Documentation Updates:

    • Data Reader: We have enhanced the Raw data reader to read multi-hot input data so that it connects seamlessly with an embedding collection. The Raw dataset format is strengthened as well. Refer to the online documentation for more details. We have also refined the description of the Norm dataset.

    • Embedding Collection: We have added the is_exclusive_keys knob to enable potential acceleration if you have already preprocessed the embedding collection input so that the resulting tables are mutually exclusive. We have also added the comm_strategy knob to the embedding collection so that you can configure an optimized communication strategy for multi-node training.

    • HPS Plugin: We have fixed the unit of measurement for the DLRM inference benchmark results that use the HPS plugin. We have also updated the user guides for the HPS plugin for TensorFlow and the HPS plugin for TensorRT.

    • Embedding Cache: We have updated the usage information and the descriptions for the three types of embedding cache.

  • Issues Fixed:

    • We added a check for empty slots to prevent SparseParam from being misused.

    • We revised the MPI lifetime service into an MPI init service with a slightly greater scope and a clearer interface. As part of this effort, we also fixed a rare bug that could lead to access violations during the MPI shutdown procedure.

    • We fixed a segmentation fault that occurred when a GPU had no embedding wgrad to update.

    • SOK build and runtime errors related to the TensorFlow version: We made the SOK Experiment (https://github.com/NVIDIA-Merlin/HugeCTR/tree/main/sparse_operation_kit/experiment) compatible with TensorFlow v2.11.0 and later. The legacy SOK does not support these TensorFlow versions.

    • Previously, HPS required CPU memory of at least 2.5x the model size during initialization. Starting with this release, HPS parses the model embedding files in chunks, which reduces the required memory to 1.3x the model size.

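The memory reduction above comes from parsing embedding files in chunks instead of reading them whole; for a 100 GB model, 2.5x versus 1.3x is roughly 250 GB versus 130 GB of host memory. The sketch below shows the general chunked-reading technique. It assumes a sparse model directory with a `key` file of int64 keys and an `emb_vector` file of float32 vectors stored row by row in native byte order; treat that layout and the `iter_embedding_chunks` helper as assumptions, not as HPS's actual loader. Peak memory is bounded by `chunk_rows` rows instead of the full table.

```python
import array
import os
import tempfile
from typing import Iterator, List, Tuple


def iter_embedding_chunks(
    model_dir: str, vec_size: int, chunk_rows: int
) -> Iterator[Tuple[List[int], List[float]]]:
    """Yield (keys, flattened vectors) for at most chunk_rows rows at a time."""
    key_path = os.path.join(model_dir, "key")
    vec_path = os.path.join(model_dir, "emb_vector")
    with open(key_path, "rb") as kf, open(vec_path, "rb") as vf:
        while True:
            keys = array.array("q")  # signed 64-bit keys
            vecs = array.array("f")  # 32-bit floats
            key_bytes = kf.read(chunk_rows * keys.itemsize)
            if not key_bytes:
                break
            keys.frombytes(key_bytes)
            # Read exactly the vectors that belong to the keys in this chunk.
            vecs.frombytes(vf.read(len(keys) * vec_size * vecs.itemsize))
            yield keys.tolist(), vecs.tolist()


# Build a tiny hypothetical model directory and read it back two rows at a time.
with tempfile.TemporaryDirectory() as d:
    with open(os.path.join(d, "key"), "wb") as f:
        array.array("q", [10, 11, 12, 13, 14]).tofile(f)
    with open(os.path.join(d, "emb_vector"), "wb") as f:
        array.array("f", [float(i) for i in range(10)]).tofile(f)
    chunks = list(iter_embedding_chunks(d, vec_size=2, chunk_rows=2))

print([keys for keys, _ in chunks])  # → [[10, 11], [12, 13], [14]]
```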
  • Known Issues:

    • HugeCTR can lead to a runtime error if client code calls RMM’s rmm::mr::set_current_device_resource(), because HugeCTR’s Parquet Data Reader also calls rmm::mr::set_current_device_resource() and the resource it sets becomes visible to other libraries in the same process. Refer to this issue (https://github.com/NVIDIA-Merlin/HugeCTR/issues/356). As a workaround, if you know that rmm::mr::set_current_device_resource() is called outside HugeCTR, you can set the environment variable HCTR_RMM_SETTABLE to 0 to prevent HugeCTR from setting a custom RMM device resource. Be cautious, because this setting can reduce the performance of Parquet reading.

    • HugeCTR uses NCCL to share data between ranks, and NCCL can require shared system memory for IPC and pinned (page-locked) system memory resources. If you use NCCL inside a container, increase these resources by specifying the following arguments when you start the container:

        --shm-size=1g --ulimit memlock=-1

      See also this NCCL known issue and this GitHub issue (https://github.com/NVIDIA-Merlin/HugeCTR/issues/243).

    • KafkaProducers startup succeeds even if the target Kafka broker is unresponsive. To avoid data loss in conjunction with streaming-model updates from Kafka, make sure that a sufficient number of Kafka brokers are running, operating properly, and reachable from the node where you run HugeCTR.

    • The number of data files in the file list should be greater than or equal to the number of data reader workers. Otherwise, different workers are mapped to the same file and data loading does not progress as expected.

    • Joint loss training with a regularizer is not supported.

    • Dumping Adam optimizer states to AWS S3 is not supported.


What’s New in Version 23.02

  • HPS Enhancements:

  • Google Cloud Storage (GCS) Support:

  • Issues Fixed:

    • Fixed a bug in the HPS static table that led to wrong results when the batch size was larger than 256.

    • Fixed a preprocessing issue in the wdl_prediction notebook.

    • Corrected how devices are set and managed in HPS and InferenceModel.

    • Fixed the debug build error.

    • Fixed the build error related to CUDA 12.0.

    • Fixed reported issues with the Multi-Process HashMap in a notebook, along with a couple of minor issues.

  • Known Issues:

    • HugeCTR uses NCCL to share data between ranks, and NCCL can require shared system memory for IPC and pinned (page-locked) system memory resources. If you use NCCL inside a container, increase these resources by specifying the following arguments when you start the container:

        --shm-size=1g --ulimit memlock=-1

      See also the NCCL known issue and the GitHub issue.

    • KafkaProducers startup succeeds even if the target Kafka broker is unresponsive. To avoid data loss in conjunction with streaming-model updates from Kafka, make sure that a sufficient number of Kafka brokers are running, operating properly, and reachable from the node where you run HugeCTR.

    • The number of data files in the file list should be greater than or equal to the number of data reader workers. Otherwise, different workers are mapped to the same file and data loading does not progress as expected.

    • Joint loss training with a regularizer is not supported.

    • Dumping Adam optimizer states to AWS S3 is not supported.

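For example, a container launch that applies the NCCL resource settings from the Known Issues list above could be assembled in Python as follows. Only --shm-size=1g and --ulimit memlock=-1 come from the note; the image name, the script, the `docker_run_command` helper, and the --gpus=all flag (which assumes the NVIDIA Container Toolkit is installed) are placeholders for a typical Docker setup. The command is printed rather than executed.

```python
import shlex
from typing import List


def docker_run_command(image: str, script: List[str]) -> List[str]:
    """Build a `docker run` argument list with the shared-memory and memlock settings NCCL may need."""
    return [
        "docker", "run", "--rm",
        "--gpus=all",  # assumption: NVIDIA Container Toolkit is installed
        "--shm-size=1g",  # shared system memory for NCCL IPC
        "--ulimit", "memlock=-1",  # unlimited pinned (page-locked) memory
        image,
        *script,
    ]


cmd = docker_run_command("my-hugectr-image:latest", ["python", "train.py"])  # hypothetical image and script
print(shlex.join(cmd))
# → docker run --rm --gpus=all --shm-size=1g --ulimit memlock=-1 my-hugectr-image:latest python train.py
# To start the container, pass cmd to subprocess.run(cmd, check=True).
```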

What’s New in Version 4.3


Important


In January 2023, the HugeCTR team plans to deprecate semantic versioning, such as v4.3. Afterward, the library will use calendar versioning only, such as v23.01.

  • Support for BERT and Variants: This release includes support for BERT in HugeCTR. The documentation includes updates to the MultiHeadAttention layer and adds documentation for the SequenceMask layer. For more information, refer to the samples/bst directory of the repository on GitHub.

  • HPS Plugin for TensorFlow integration with TensorFlow-TensorRT (TF-TRT): This release includes plugin support for integration with TensorFlow-TensorRT. For sample code, refer to the Deploy SavedModel using HPS with Triton TensorFlow Backend notebook.

  • Deep & Cross Network Layer version 2 Support: This release includes support for Deep & Cross Network version 2. For conceptual information, refer to https://arxiv.org/abs/2008.13535. The documentation for the MultiCross Layer is updated.

  • Enhancements to Hierarchical Parameter Server:

    • RedisClusterBackend now supports TLS/SSL communication. For sample code, refer to the Hierarchical Parameter Server Demo notebook. The notebook is updated with step-by-step instructions that show you how to set up HPS to use Redis with (and without) encryption. The Volatile Database Parameters documentation for HPS is updated with the enable_tls, tls_ca_certificate, tls_client_certificate, tls_client_key, and tls_server_name_identification parameters.

    • MultiProcessHashMapBackend includes a fix for a bug that prevented configuring the shared memory size when using JSON file-based configuration.

    • On-device input keys are now supported, which removes an extra host-to-device copy and improves performance.

    • A dependency on the XX-Hash library is removed. The library is no longer used by HugeCTR.

    • Added static table support to the embedding cache. The static table is suitable when the embedding table can be placed entirely in GPU memory. In this case, the static table is more than three times faster than the embedding cache lookup. The static table does not support embedding updates.

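A hedged sketch of a volatile database section that enables TLS for RedisClusterBackend, built as a Python dictionary and serialized to JSON. The enable_tls and tls_* parameter names come from the note above; the surrounding "volatile_db", "type": "redis_cluster", and "address" keys, the node addresses, the certificate paths, and the server name are assumptions for illustration, so check the Volatile Database Parameters documentation before use.

```python
import json

volatile_db = {
    "type": "redis_cluster",  # assumed backend identifier
    "address": "10.0.0.1:7000,10.0.0.2:7000",  # hypothetical Redis nodes
    "enable_tls": True,
    "tls_ca_certificate": "/certs/ca.crt",  # hypothetical certificate paths
    "tls_client_certificate": "/certs/client.crt",
    "tls_client_key": "/certs/client.key",
    "tls_server_name_identification": "redis.example.com",  # hypothetical server name
}

# Embed the section in an HPS configuration (other required fields omitted).
ps_config = {"volatile_db": volatile_db}
print(json.dumps(ps_config, indent=2))
```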
  • Support for New Optimizers:

    • Added support for SGD, Momentum SGD, Nesterov Momentum, AdaGrad, RMS-Prop, Adam, and FTRL optimizers for the dynamic embedding table (DET). For sample code, refer to the test_embedding_table_optimizer.cpp file in the test/utest/embedding_collection/ directory of the repository on GitHub.

    • Added support for the FTRL optimizer for dense networks.

  • Data Reading from S3 for Offline Inference: In addition to reading during training, HugeCTR now supports reading data from remote file systems such as HDFS and S3 during offline inference by using the DataSourceParams API. The HugeCTR Training and Inference with Remote File System Example is updated to demonstrate the new functionality.

  • Documentation Enhancements:

  • Issues Fixed:

    • The original CUDA device and NUMA binding that were set before a call to some HugeCTR APIs are now restored correctly. This issue sometimes led to problems when you mixed calls to HugeCTR and other CUDA-enabled libraries.

    • Fixed the occasional CUDA kernel launch failure of embedding when HugeCTR was installed with the DEBUG macro.

    • Fixed an SOK build error that was related to TensorFlow v2.1.0 and higher. The issue was that the C++ API and C++ standard were updated to use C++17.

    • Fixed a CUDA 12 related compilation error.

  • Known Issues:

    • HugeCTR can lead to a runtime error if client code calls the RMM rmm::mr::set_current_device_resource() method. The error is due to the Parquet data reader in HugeCTR also calling rmm::mr::set_current_device_resource(). As a result, the device resource that HugeCTR sets becomes visible to other libraries in the same process. Refer to GitHub issue #356 for more information. As a workaround, if you know that rmm::mr::set_current_device_resource() is called by client code other than HugeCTR, you can set the environment variable HCTR_RMM_SETTABLE to 0 to prevent HugeCTR from setting a custom RMM device resource. Be cautious, because the setting can reduce the performance of Parquet reading.

    • HugeCTR uses NCCL to share data between ranks, and NCCL can require shared system memory for IPC and pinned (page-locked) system memory resources. If you use NCCL inside a container, increase these resources by specifying the following arguments when you start the container:

        --shm-size=1g --ulimit memlock=-1

      See also the NCCL known issue and the GitHub issue #243.

    • KafkaProducers startup succeeds even if the target Kafka broker is unresponsive. To avoid data loss in conjunction with streaming-model updates from Kafka, make sure that a sufficient number of Kafka brokers are running, operating properly, and reachable from the node where you run HugeCTR.

    • The number of data files in the file list should be greater than or equal to the number of data reader workers. Otherwise, different workers are mapped to the same file and data loading does not progress as expected.

    • Joint loss training with a regularizer is not supported.

    • Dumping Adam optimizer states to AWS S3 is not supported.

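For the Deep & Cross Network version 2 support listed in this release, the cross layer in the referenced paper (arXiv 2008.13535) computes x_{l+1} = x_0 ⊙ (W_l x_l + b_l) + x_l, where x_0 is the input vector, x_l is the output of the previous cross layer, and ⊙ is element-wise multiplication. The pure-Python sketch below evaluates one such layer to show the arithmetic. It is an illustration of the formula, not HugeCTR's MultiCross layer code.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def cross_layer_v2(x0: Vector, xl: Vector, w: Matrix, b: Vector) -> Vector:
    """One DCN-v2 cross layer: x0 * (W @ xl + b) + xl, with * taken element-wise."""
    out: Vector = []
    for i, row in enumerate(w):
        projected = sum(w_ij * x_j for w_ij, x_j in zip(row, xl)) + b[i]
        out.append(x0[i] * projected + xl[i])  # feature crossing plus residual term
    return out


x0 = [1.0, 2.0]
w = [[1.0, 0.0], [0.0, 1.0]]  # identity weights for an easy hand check
b = [0.0, 0.0]
x1 = cross_layer_v2(x0, x0, w, b)  # first layer: x_l = x_0
print(x1)  # → [2.0, 6.0]
```

With identity weights and zero bias, each output element is x0[i] * x0[i] + x0[i], which gives 1 + 1 = 2 and 4 + 2 = 6.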

What’s New in Version 4.2


Important


In January 2023, the HugeCTR team plans to deprecate semantic versioning, such as v4.2. Afterward, the library will use calendar versioning only, such as v23.01.

  • Change to HPS with Redis or Kafka: This release includes a change to Hierarchical Parameter Server that affects deployments that use RedisClusterBackend or model parameter streaming with Kafka. A third-party library that was used for the HPS partition selection algorithm is replaced to improve performance. The new algorithm can produce different partition assignments for volatile databases. As a result, volatile database backends that retain data between application startups, such as the RedisClusterBackend, must be reinitialized. Model streaming with Kafka is equally affected. To avoid issues with updates, reset all respective queue offsets to the end_offset before you reinitialize the RedisClusterBackend.

  • Enhancements to the Sparse Operation Kit in DeepRec: This release includes updates to the Sparse Operation Kit to improve the performance of the embedding variable lookup operation in DeepRec. The API for the lookup_sparse() function is changed to remove the hotness argument. The lookup_sparse() function is enhanced to calculate the number of non-zero elements dynamically. For more information, refer to the sparse_operation_kit directory of the DeepRec repository on GitHub.

  • Enhancements to 3G Embedding: This release includes the following enhancements to 3G embedding:

    • The API is changed. The EmbeddingPlanner class is replaced with the EmbeddingCollectionConfig class. For examples of the API, see the tests in the test/embedding_collection_test directory of the repository on GitHub.

    • The API is enhanced to support dumping and loading weights during the training process. The methods are Model.embedding_dump(path: str, table_names: list[str]) and Model.embedding_load(path: str, table_names: list[str]). The path argument is a directory in the file system that you can dump weights to or load weights from. The table_names argument is a list of embedding table names as strings.

  • New Volatile Database Type for HPS: This release adds a db_type value of multi_process_hash_map to the Hierarchical Parameter Server. This database type supports sharing embeddings across process boundaries by using shared memory and the /dev/shm device file. Multiple processes running HPS can read and write to the same hash map. For an example, refer to the Hierarchical Parameter Server Demo notebook.

  • Enhancements to the HPS Redis Backend: In this release, the Hierarchical Parameter Server can open multiple connections in parallel to each Redis node. This enhancement enables HPS to take advantage of overlapped processing optimizations in the I/O module of Redis servers. In addition, HPS can now take advantage of Redis hash tags to co-locate embedding values and metadata. This enhancement can reduce the number of accesses to Redis nodes and the number of per-node round-trip communications that are needed to complete transactions. As a result, the enhancement increases insertion performance.

  • MLPLayer is New: This release adds an MLP layer with the hugectr.Layer_t.MLP class. This layer is very flexible and makes it easier to use a group of fused fully-connected layers and enable the related optimizations. For each fused fully-connected layer in MLPLayer, the output dimension, bias, and activation function are all adjustable. MLPLayer supports the FP32, FP16, and TF32 data types. For an example of how to use the layer, refer to the dgx_a100_mlp.py file in the samples/dlrm directory of the GitHub repository.

  • Sparse Operation Kit installable from PyPI: Version 1.1.4 of the Sparse Operation Kit is installable from PyPI in the merlin-sok package.

  • Multi-task Model Support added to the ONNX Model Converter: This release adds support for multi-task models to the ONNX converter. This release also includes an enhancement to the preprocess_census.py script in the samples/mmoe directory of the GitHub repository.

  • Issues Fixed:

    • Using the HPS Plugin for TensorFlow with MirroredStrategy and running the Hierarchical Parameter Server Demo notebook triggered an issue with ReplicaContext and caused a crash. The issue is fixed, which resolves GitHub issue #362.

    • The 4_nvt_process.py sample in the samples/din/utils directory of the GitHub repository is updated to use the latest NVTabular API. This update resolves GitHub issue #364.

    • An illegal memory access related to 3G embedding and the dgx_a100_ib_nvlink.py sample in the samples/dlrm directory of the GitHub repository is fixed.

    • An error in HPS with the lookup_fromdlpack() method is fixed. The error was related to calculating the number of keys and vectors from the corresponding DLPack tensors.

    • An error in the HugeCTR backend for Triton Inference Server is fixed. A crash was triggered when the initial size of the embedding cache was smaller than the allowed minimum size.

    • An error related to using a ReLU layer with an odd input size in mixed precision mode could trigger a crash. The issue is fixed.

    • An error related to using an asynchronous reader with the AsyncParam class and specifying an io_alignment value that is smaller than the block device sector size is fixed. Now, if the specified io_alignment value is smaller than the block device sector size, io_alignment is automatically set to the block device sector size.

    • Unreported memory leaks in the GRU layer and collectives are fixed.

    • Several broken documentation links related to HPS are fixed.

  • Known Issues:

    • HugeCTR uses NCCL to share data between ranks, and NCCL can require shared system memory for IPC and pinned (page-locked) system memory resources. If you use NCCL inside a container, increase these resources by specifying the following arguments when you start the container:

        --shm-size=1g --ulimit memlock=-1

      See also the NCCL known issue and the GitHub issue.

    • KafkaProducers startup succeeds even if the target Kafka broker is unresponsive. To avoid data loss in conjunction with streaming-model updates from Kafka, make sure that a sufficient number of Kafka brokers are running, operating properly, and reachable from the node where you run HugeCTR.

    • The number of data files in the file list should be greater than or equal to the number of data reader workers. Otherwise, different workers are mapped to the same file and data loading does not progress as expected.

    • Joint loss training with a regularizer is not supported.

    • Dumping Adam optimizer states to AWS S3 is not supported.

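The Redis hash-tag enhancement in this release relies on the Redis Cluster key-to-slot rule: a key is mapped to one of 16384 slots by CRC16 (the XMODEM variant), and if the key contains a non-empty {...} section, only that section is hashed. The sketch below implements that rule as described in the Redis Cluster specification, to show why two keys that share a tag land on the same node. The key names are hypothetical and do not reflect how HPS actually names its Redis keys.

```python
def crc16_xmodem(data: bytes) -> int:
    """CRC-16/XMODEM (polynomial 0x1021, initial value 0), as used by Redis Cluster."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def key_hash_slot(key: bytes) -> int:
    """Map a key to one of 16384 Redis Cluster slots, honoring {hash tags}."""
    start = key.find(b"{")
    if start != -1:
        end = key.find(b"}", start + 1)
        # Only a non-empty tag between the first '{' and the next '}' is hashed.
        if end != -1 and end != start + 1:
            key = key[start + 1:end]
    return crc16_xmodem(key) % 16384


# Hypothetical key names: both share the tag "table0:42", so they map to the same slot.
value_key = b"{table0:42}:vector"
meta_key = b"{table0:42}:meta"
print(key_hash_slot(value_key) == key_hash_slot(meta_key))  # → True
```

Keeping related values in one slot means a single node can serve them, which is how hash tags can cut the number of per-node round trips.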

What’s New in Version 4.1

  • Simplified Interface for 3G Embedding Table Placement Strategy: 3G embedding now provides an easier way for you to configure an embedding table placement strategy. Instead of using JSON, you can configure the embedding table placement strategy by using function arguments. You only need to provide the shard_matrix, table_group_strategy, and table_placement_strategy arguments. With these arguments, 3G embedding can group different tables together and place them according to the shard_matrix argument. For an example, refer to the dlrm_train.py file in the test/embedding_collection_test directory of the repository on GitHub. For comparison, refer to the same file from the v4.0 branch of the repository.

  • New MMoE and Shared-Bottom Samples: This release includes a new shared-bottom model, an example program, preprocessing scripts, and updates to documentation. For more information, refer to the README.md, mmoe_parquet.py, and other files in the samples/mmoe directory of the repository on GitHub. This release also includes a fix to the calculation and reporting of AUC for multi-task models, such as MMoE.

  • Support for AWS S3 File System: The Parquet DataReader can now read datasets from the Amazon Web Services S3 file system. You can also load and dump models from and to S3 during training. The documentation for the DataSourceParams class is updated. To view sample code, refer to the HugeCTR Training with Remote File System Example.

  • Simplification for File System Usage: You no longer need to pass DataSourceParams for model loading and dumping. The FileSystem class automatically infers the correct file system type (local, HDFS, or S3) based on the path URI that you specified when you built the model. For example, the path hdfs://localhost:9000/ is inferred as an HDFS file system and the path https://mybucket.s3.us-east-1.amazonaws.com/ is inferred as an S3 file system.

  • Support for Loading Models from Remote File Systems to HPS: This release enables you to load models from HDFS and S3 remote file systems to HPS during inference. To use the new feature, specify an HDFS or S3 path URI in InferenceParams.

  • Support for Exporting Intermediate Tensor Values into a Numpy Array: This release adds the check_out_tensor function to Model and InferenceModel. You can use this function to check out intermediate tensor values by using the Python interface. This function is especially helpful for debugging. For more information, refer to Model.check_out_tensor and InferenceModel.check_out_tensor.

  • On-Device Input Keys for HPS Lookup: The HPS lookup supports input embedding keys that reside in GPU memory during inference. This enhancement removes a host-to-device copy when you use the DLPack lookup_fromdlpack() interface. With this interface, the input DLPack capsule of embedding keys can be a GPU tensor.

  • Documentation Enhancements:

  • Issues Fixed:

    • The InteractionLayer class is fixed so that it works correctly with num_feas > 30.

    • The cuBLASLt configuration is corrected by increasing the workspace size and adding the epilogue mask.

    • The NVTabular based preprocessing script for our samples that demonstrate feature crossing is fixed.

    • The async data reader is fixed. Previously, it could hang and cause corruption because of an improper I/O block size and I/O alignment. The AsyncParam class is changed to implement the fix. The io_block_size argument is replaced by the max_nr_request argument, and the actual I/O block size that the async reader uses is computed accordingly. For more information, refer to the AsyncParam class documentation.

  • Known Issues:

    • HugeCTR uses NCCL to share data between ranks, and NCCL can require shared system memory for IPC and pinned (page-locked) system memory resources. If you use NCCL inside a container, increase these resources by specifying the following arguments when you start the container:

        --shm-size=1g --ulimit memlock=-1

      See also the NCCL known issue and the GitHub issue.

    • KafkaProducers startup succeeds even if the target Kafka broker is unresponsive. To avoid data loss in conjunction with streaming-model updates from Kafka, make sure that a sufficient number of Kafka brokers are running, operating properly, and reachable from the node where you run HugeCTR.

    • The number of data files in the file list should be greater than or equal to the number of data reader workers. Otherwise, different workers are mapped to the same file and data loading does not progress as expected.

    • Joint loss training with a regularizer is not supported.

    • Dumping Adam optimizer states to AWS S3 is not supported.

    • Dumping to remote file systems is not supported when MPI is enabled.

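The file-system inference described in the File System Usage item of this release can be pictured as a check on the scheme and host of the path URI. The function below is a hypothetical re-implementation for illustration only: HugeCTR's FileSystem class may apply different or additional rules, and treating s3:// URIs as S3 is an assumption here.

```python
from urllib.parse import urlparse


def infer_file_system(path: str) -> str:
    """Guess the file system type ('HDFS', 'S3', or 'Local') from a model path (illustrative rules)."""
    parsed = urlparse(path)
    host = parsed.netloc.lower()
    if parsed.scheme == "hdfs":
        return "HDFS"
    if parsed.scheme == "s3":  # assumption: s3:// URIs are also treated as S3
        return "S3"
    # S3 endpoints such as mybucket.s3.us-east-1.amazonaws.com
    if parsed.scheme in ("http", "https") and host.endswith(".amazonaws.com") and ".s3." in "." + host:
        return "S3"
    return "Local"


print(infer_file_system("hdfs://localhost:9000/"))  # → HDFS
print(infer_file_system("https://mybucket.s3.us-east-1.amazonaws.com/"))  # → S3
print(infer_file_system("/models/wdl/"))  # → Local
```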

What’s New in Version 4.0

  • 3G Embedding Stabilization: Since the introduction of the next generation of HugeCTR embedding in v3.7, several updates and enhancements were made, including code refactoring to improve usability. The enhancements for this release are as follows:

    • Optimized the performance of sparse lookup with respect to inter-warp load imbalance. Sparse Operation Kit (SOK) takes advantage of the enhancement to improve performance.

    • This release includes a fix for determining the maximum embedding vector size in the GlobalEmbeddingData and LocalEmbeddingData classes.

    • Version 1.1.4 of Sparse Operation Kit can be installed with pip and includes the enhancements mentioned in the preceding bullets.

  • Embedding Cache Initialization with Configurable Ratio: In previous releases, the default value for the cache_refresh_percentage_per_iteration parameter of the InferenceParams was 0.1.


    In this release, the default value is 0.0 and the parameter serves an additional purpose. If you set the parameter to a value greater than 0.0 and also set use_gpu_embedding_cache to True for a model, then when Hierarchical Parameter Server (HPS) starts, it initializes the embedding cache for the model on the GPU by loading a subset of the embedding vectors from the sparse files for the model. When embedding cache initialization is used, HPS writes log records at the INFO level when it starts. The log records are similar to EC initialization for model: "<model-name>", num_tables: <int> and EC initialization on device: <int>. This enhancement reduces the duration of the warm-up phase.

  • Lazy Initialization of HPS Plugin for TensorFlow: In this release, when you deploy a SavedModel of TensorFlow with Triton Inference Server, HPS is implicitly initialized when the loaded model is executed for the first time. In previous releases, you needed to run hps.Init(ps_config_file, global_batch_size) explicitly. For more information, see the API documentation for hierarchical_parameter_server.Init.
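The lazy initialization pattern can be sketched as a wrapper that defers an expensive setup call until the first lookup. `LazyLookup`, `fake_init`, and `ps_config.json` are hypothetical names; `init_fn` merely stands in for an explicit initialization call such as hps.Init. The sketch is not thread-safe; a real server would guard the first call with a lock.

```python
class LazyLookup:
    """Hypothetical wrapper that initializes its backing state on first use."""

    def __init__(self, config_file, global_batch_size, init_fn):
        self._config_file = config_file
        self._global_batch_size = global_batch_size
        self._init_fn = init_fn  # stands in for an explicit initialization call
        self._state = None

    def __call__(self, keys):
        if self._state is None:
            # The first execution triggers initialization implicitly.
            self._state = self._init_fn(self._config_file, self._global_batch_size)
        table = self._state["table"]
        return [table.get(k, 0.0) for k in keys]


calls = []


def fake_init(config_file, global_batch_size):
    calls.append((config_file, global_batch_size))
    return {"table": {1: 0.5, 2: 0.25}}


lookup = LazyLookup("ps_config.json", 1024, fake_init)
print(len(calls))         # 0: nothing is initialized at construction time
print(lookup([1, 2, 3]))  # [0.5, 0.25, 0.0]
lookup([2])
print(len(calls))         # 1: initialization ran exactly once
```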

  • Enhancements to the HDFS Backend:

    • The HDFS Backend is now called IO::HadoopFileSystem.

    • This release includes fixes for memory leaks.

    • This release includes refactoring to generalize the interface for HDFS and S3 as remote filesystems.

    • For more information, see hadoop_filesystem.hpp in the include/io directory of the repository on GitHub.

  • Dependency Clarification for Protobuf and Hadoop: Hadoop and Protobuf are now true third_party modules, so developers can avoid unnecessary and frequent cloning and deletion.

  • Finer Granularity Control for Overlap Behavior: We deprecated the old overlapped_pipeline knob and introduced four new knobs, train_intra_iteration_overlap, train_inter_iteration_overlap, eval_intra_iteration_overlap, and eval_inter_iteration_overlap, to help users better control the overlap behavior. For more information, see the API documentation for Solver.CreateSolver.

  • Documentation Improvements:

    • Removed two deprecated tutorials: triton_tf_deploy and dump_to_tf.

    • Previously, the graphics in the Performance page did not appear. This issue is fixed in this release.

    • Previously, the API documentation for the HPS Plugin for TensorFlow did not show the class information. This issue is fixed in this release.

  • Issues Fixed:

    • Fixed a build error that was triggered in debug mode. The error was caused by the newly introduced 3G embedding unit tests.

    • When using the Parquet DataReader, if a parquet dataset file specified in metadata.json does not exist, HugeCTR no longer crashes. The new behavior is to skip the missing file and display a warning message. This change relates to GitHub issue 321.
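A minimal sketch of the skip-and-warn behavior, assuming the dataset file names have already been read from metadata.json into a list. `resolve_dataset_files` is a hypothetical helper, not part of the HugeCTR API.

```python
import os
import tempfile
import warnings


def resolve_dataset_files(file_names, root):
    """Keep the listed files that exist; warn about and skip the rest."""
    existing = []
    for name in file_names:
        path = os.path.join(root, name)
        if os.path.exists(path):
            existing.append(path)
        else:
            warnings.warn(f"skipping missing dataset file: {path}", stacklevel=2)
    return existing


with tempfile.TemporaryDirectory() as root:
    open(os.path.join(root, "0.parquet"), "wb").close()  # only file 0 exists
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        files = resolve_dataset_files(["0.parquet", "1.parquet"], root)
    print([os.path.basename(f) for f in files])  # ['0.parquet']
    print(len(caught))                            # 1
```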

  • Known Issues:

    • HugeCTR uses NCCL to share data between ranks and NCCL can require shared system memory for IPC and pinned (page-locked) system memory resources. If you use NCCL inside a container, increase these resources by specifying the following arguments when you start the container:

        --shm-size=1g --ulimit memlock=-1

      See also the NCCL known issue and the GitHub issue.

    • KafkaProducers startup succeeds even if the target Kafka broker is unresponsive. To avoid data loss in conjunction with streaming-model updates from Kafka, you have to make sure that a sufficient number of Kafka brokers are running, operating properly, and are reachable from the node where you run HugeCTR.

    • The number of data files in the file list should be greater than or equal to the number of data reader workers. Otherwise, different workers are mapped to the same file and data loading does not progress as expected.

    • Joint loss training with a regularizer is not supported.


What’s New in Version 3.9

  • Updates to 3G Embedding:

    • Sparse Operation Kit (SOK) is updated to use the HugeCTR 3G embedding as a developer preview feature. For more information, refer to the Python programs in the sparse_operation_kit/experiment/benchmark/dlrm directory of the repository on GitHub.

    • A dynamic embedding table mode is added. The mode is based on cuCollections with some functional enhancements. A dynamic embedding table grows when it is full, so you no longer need to configure memory usage for the embedding. For more information, refer to the embedding_storage/dynamic_embedding_storage directory of the repository on GitHub.
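To illustrate a table that grows when it is full, here is a toy open-addressing hash table with linear probing that doubles its capacity instead of failing an insert. It only illustrates the growth policy; it is not how cuCollections or HugeCTR implement dynamic embedding storage, and `GrowableTable` is a hypothetical name.

```python
class GrowableTable:
    """Toy open-addressing table (linear probing) that doubles capacity when full."""

    def __init__(self, capacity=4):
        self._keys = [None] * capacity
        self._values = [None] * capacity
        self._size = 0

    @property
    def capacity(self):
        return len(self._keys)

    def __len__(self):
        return self._size

    def _find(self, key):
        """Return the slot holding key, else the first empty slot, else None."""
        n = len(self._keys)
        start = hash(key) % n
        for step in range(n):
            i = (start + step) % n
            if self._keys[i] is None or self._keys[i] == key:
                return i
        return None  # the table is full and the key is absent

    def _grow(self):
        old = [(k, v) for k, v in zip(self._keys, self._values) if k is not None]
        self._keys = [None] * (2 * self.capacity)
        self._values = [None] * len(self._keys)
        self._size = 0
        for k, v in old:  # rehash every entry into the larger arrays
            self.insert(k, v)

    def insert(self, key, value):
        i = self._find(key)
        if i is None:  # full: grow instead of rejecting the insert
            self._grow()
            i = self._find(key)
        if self._keys[i] is None:
            self._size += 1
        self._keys[i] = key
        self._values[i] = value

    def get(self, key, default=None):
        i = self._find(key)
        if i is None or self._keys[i] is None:
            return default
        return self._values[i]


table = GrowableTable(capacity=4)
for key in range(10):
    table.insert(key, [0.1 * key] * 8)
print(table.capacity, len(table))  # 16 10
```

The capacity starts at 4 and doubles to 8 and then 16 as the ten keys arrive, so no capacity had to be sized up front.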

  • Enhancements to the HPS Plugin for TensorFlow: This release includes improvements to the interoperability of SOK and HPS. The plugin now supports the sparse lookup layer. The documentation for the HPS plugin is enhanced as follows:

  • Enhancements to the HPS Backend for Triton Inference Server: This release adds support for integrating the HPS Backend and the TensorFlow Backend through the ensemble mode with Triton Inference Server. The enhancement enables deploying a TensorFlow model with large embedding tables on Triton by leveraging HPS. For more information, refer to the sample programs in the hps-triton-ensemble directory of the HugeCTR Backend repository on GitHub.

  • New Multi-Node Tutorial: A new multi-node training tutorial shows how to use HugeCTR to train a model with multiple nodes and is based on our most recent Docker container. The tutorial should be useful to users who do not have a cluster with a job scheduler, such as Slurm Workload Manager, installed. The update addresses an issue that was first reported in GitHub issue 305.

  • Support Offline Inference for MMoE: This release includes MMoE offline inference where both per-class AUC and average AUC are provided. When the number of class AUCs is greater than one, the output includes a line like the following example:

    [HCTR][08:52:59.254][INFO][RK0][main]: Evaluation, AUC: {0.482141, 0.440781}, macro-averaging AUC: 0.46146124601364136
  • Enhancements to the API for the HPS Database Backend: This release includes several enhancements to the API for the DatabaseBackend class. For more information, see database_backend.hpp and the header files for other database backends in the HugeCTR/include/hps directory of the repository. The enhancements are as follows:

    • You can now specify a maximum time budget, in nanoseconds, for queries so that you can build an application that must operate within strict latency limits. Fetch queries return execution control to the caller if the time budget is exhausted. The unprocessed entries are indicated to the caller through a callback function.

    • The dump and load_dump methods are new. These methods support saving and loading embedding tables from disk. The methods support a custom binary format and the RocksDB SST table file format. These methods enable you to import and export embedding table data between your custom tools and HugeCTR.

    • The find_tables method is new. The method enables you to discover all table data that is currently stored for a model in a DatabaseBackend instance. A new overloaded method for evict is added that can process the results from find_tables to quickly and simply drop all the stored information that is related to a model.
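The time-budget, find_tables, and evict ideas can be sketched with a toy in-memory backend. `ToyBackend`, its method signatures, and the assumed `<model>.<table>` naming scheme are all hypothetical; consult database_backend.hpp for the real interface. The dump and load_dump methods are not modeled here.

```python
import time


class ToyBackend:
    """In-memory stand-in for a database backend (hypothetical API)."""

    def __init__(self):
        self._tables = {}  # "<model>.<table>" -> {key: embedding vector}

    def insert(self, table, key, value):
        self._tables.setdefault(table, {})[key] = value

    def fetch(self, table, keys, on_hit, on_miss, on_unprocessed, time_budget_ns):
        """Look up keys until done or until the time budget is exhausted."""
        deadline = time.monotonic_ns() + time_budget_ns
        data = self._tables.get(table, {})
        for i, key in enumerate(keys):
            if time.monotonic_ns() >= deadline:
                on_unprocessed(keys[i:])  # hand the unreached keys back to the caller
                return i
            if key in data:
                on_hit(key, data[key])
            else:
                on_miss(key)
        return len(keys)

    def find_tables(self, model):
        """List every table stored for a model."""
        return sorted(t for t in self._tables if t.startswith(model + "."))

    def evict(self, tables):
        """Drop whole tables, for example the result of find_tables()."""
        for t in tables:
            self._tables.pop(t, None)


db = ToyBackend()
db.insert("dlrm.sparse0", 1, [0.1])
db.insert("dlrm.sparse1", 2, [0.2])
db.insert("wdl.sparse0", 3, [0.3])

hits, misses, unprocessed = {}, [], []
db.fetch("dlrm.sparse0", [1, 7], hits.__setitem__, misses.append, unprocessed.extend, 10**9)
print(hits, misses)  # {1: [0.1]} [7]
db.fetch("dlrm.sparse0", [1, 7], hits.__setitem__, misses.append, unprocessed.extend, 0)
print(unprocessed)   # [1, 7]: a zero budget processes nothing
db.evict(db.find_tables("dlrm"))
print(db.find_tables("dlrm"), db.find_tables("wdl"))  # [] ['wdl.sparse0']
```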

  • Documentation Enhancements:

    • The documentation for the max_all_to_all_bandwidth parameter of the HybridEmbeddingParam class is clarified to indicate that the bandwidth unit is per-GPU. Previously, the unit was not specified.

  • Issues Fixed:

    • Hybrid embedding now works with IB_NVLINK as the communication_type of the HybridEmbeddingParam class.

    • Training performance is affected by a GPU routine that checks whether an input key falls outside the embedding table. If you can guarantee that the input keys fit within the specified workspace_size_per_gpu_in_mb, you can disable the routine as a workaround by setting the environment variable HUGECTR_DISABLE_OVERFLOW_CHECK=1. The workaround restores training performance.

    • Engineering discovered and fixed a correctness issue with the Softmax layer.

    • Engineering removed an inline profiler that was rarely used or updated. This change relates to GitHub issue 340.

  • Known Issues:

    • HugeCTR uses NCCL to share data between ranks and NCCL can require shared system memory for IPC and pinned (page-locked) system memory resources. If you use NCCL inside a container, increase these resources by specifying the following arguments when you start the container:

        --shm-size=1g --ulimit memlock=-1

      See also the NCCL known issue and the GitHub issue.

    • KafkaProducers startup succeeds even if the target Kafka broker is unresponsive. To avoid data loss in conjunction with streaming-model updates from Kafka, you have to make sure that a sufficient number of Kafka brokers are running, operating properly, and are reachable from the node where you run HugeCTR.

    • The number of data files in the file list should be greater than or equal to the number of data reader workers. Otherwise, different workers are mapped to the same file and data loading does not progress as expected.

    • Joint loss training with a regularizer is not supported.


What’s New in Version 3.8

  • Sample Notebook to Demonstrate 3G Embedding: This release includes a sample notebook that introduces the Python API of the embedding collection and the key concepts for using 3G embedding. You can view HugeCTR Embedding Collection from the documentation or access the embedding_collection.ipynb file from the notebooks directory of the repository.

  • DLPack Python API for Hierarchical Parameter Server Lookup: This release introduces support for embedding lookup from the Hierarchical Parameter Server (HPS) using the DLPack Python API. The new method is lookup_fromdlpack(). For sample usage, see the Lookup the Embedding Vector from DLPack heading in the “Hierarchical Parameter Server Demo” notebook.

  • Read Parquet Datasets from HDFS with the Python API: This release enhances the DataReaderParams class with a data_source_params argument. You can use the argument to specify the data source configuration such as the host name of the Hadoop NameNode and the NameNode port number to read from HDFS.

  • Logging Performance Improvements: This release includes a performance enhancement that reduces the performance impact of logging.

  • Enhancements to Layer Classes:

    • The FullyConnected layer now supports 3D inputs.

    • The MatrixMultiply layer now supports 4D inputs.

  • Documentation Enhancements:

    • An automatically generated table of contents is added to the top of most pages in the web documentation. The goal is to provide a better experience for navigating long pages such as the HugeCTR Layer Classes and Methods page.

    • URLs to the Criteo 1TB click logs dataset are updated. For an example, see the HugeCTR Wide and Deep Model with Criteo notebook.

  • Issues Fixed:

    • The data generator for the Parquet file type is fixed and produces consistent file names between the _metadata.json file and the actual dataset files. Previously, running the data generator to create synthetic data resulted in a core dump. This issue was first reported in GitHub issue 321.

    • Fixed a memory crash that occurred during AUC warm-up when running a large model on multiple GPUs.

    • Fixed an issue with keyset generation in the ETC notebook. Refer to GitHub issue 332 for more details.

    • Fixed the inference build error that occurred when building with debug mode.

    • Fixed an issue where multi-node training printed duplicate messages.

  • Known Issues:

    • Hybrid embedding with IB_NVLINK as the communication_type of the HybridEmbeddingParam class does not currently work. We are working on a fix. The other communication types are not affected.

    • HugeCTR uses NCCL to share data between ranks and NCCL can require shared system memory for IPC and pinned (page-locked) system memory resources. If you use NCCL inside a container, increase these resources by specifying the following arguments when you start the container:

        --shm-size=1g --ulimit memlock=-1

      See also the NCCL known issue and the GitHub issue.

    • KafkaProducers startup succeeds even if the target Kafka broker is unresponsive. To avoid data loss in conjunction with streaming-model updates from Kafka, you have to make sure that a sufficient number of Kafka brokers are running, operating properly, and are reachable from the node where you run HugeCTR.

    • The number of data files in the file list should be greater than or equal to the number of data reader workers. Otherwise, different workers are mapped to the same file and data loading does not progress as expected.

    • Joint loss training with a regularizer is not supported.


What’s New in Version 3.7

  • 3G Embedding Developer Preview: Version 3.7 introduces the next generation of embedding as a developer preview feature. We call it 3G embedding because it is the third generation of the HugeCTR embedding interface and implementation; the second generation was the unified embedding introduced in v3.1. Compared with the previous embedding, the embedding collection has three main changes:

    • First, it allows users to fuse embedding tables with different embedding vector sizes. The previous embedding could only fuse embedding tables with the same embedding vector size. The enhancement boosts both flexibility and performance.

    • Second, it extends the functionality of embedding by supporting the concat combiner and lookups from different slots on the same embedding table.

    • Finally, the embedding collection supports arbitrary embedding table placement, including data-parallel and model-parallel placement. By providing a plan JSON file, you can configure the table placement strategy. See the dlrm_train.py file in the embedding_collection_test directory of the repository for a more detailed usage example.
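The difference between a reducing combiner and the concat combiner can be shown with a small sketch on a two-hot slot. The `combine` function is hypothetical and not the embedding collection API; it only shows that sum keeps the embedding width d while concat multiplies it by the number of keys looked up in the slot, so a concat output has a fixed width only when every sample has the same number of keys in that slot.

```python
def combine(vectors, combiner):
    """Reduce the embedding vectors that one slot of one sample looks up."""
    if combiner == "sum":
        return [sum(column) for column in zip(*vectors)]  # width stays d
    if combiner == "concat":
        return [x for vec in vectors for x in vec]  # width becomes hotness * d
    raise ValueError(f"unsupported combiner: {combiner}")


table = {10: [1.0, 2.0], 11: [3.0, 4.0]}  # d = 2
vectors = [table[k] for k in (10, 11)]    # a two-hot slot
print(combine(vectors, "sum"))     # [4.0, 6.0]
print(combine(vectors, "concat"))  # [1.0, 2.0, 3.0, 4.0]
```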

  • HPS Performance Improvements:

    • Kafka: Model parameters are now stored in Kafka in a bandwidth-saving multiplexed data format. This data format vastly increases throughput. In our lab, we measured transfer speeds of up to 1.1 Gbps for each Kafka broker.

    • HashMap backend: Parallel and single-threaded hashmap implementations have been replaced by a new unified implementation. This new implementation uses a new memory-pool based allocation method that vastly increases upsert performance without diminishing recall performance. Compared with the previous implementation, you can expect a 4x speed improvement for large-batch insertion operations.

    • Suppressed and simplified log: Most log messages related to HPS have the log level changed to TRACE rather than INFO or DEBUG to reduce logging verbosity.

  • Offline Inference Usability Enhancements:

    • The thread pool size is configurable in the Python interface, which is useful for studying embedding cache performance in asynchronous update scenarios. Previously, it was fixed at the minimum of 16 and std::thread::hardware_concurrency(). For more information, refer to Hierarchical Parameter Server Configuration.
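The previous default is a simple clamp. The sketch below uses Python's os.cpu_count() as a stand-in for std::thread::hardware_concurrency(); the function name is hypothetical.

```python
import os


def default_thread_pool_size(hardware_concurrency=None):
    """Previous default: min(16, hardware concurrency)."""
    if hardware_concurrency is None:
        # os.cpu_count() can return None, so fall back to a single thread.
        hardware_concurrency = os.cpu_count() or 1
    return min(16, hardware_concurrency)


print(default_thread_pool_size(8))   # 8
print(default_thread_pool_size(64))  # 16
```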

  • DataGenerator Performance Improvements: You can specify the num_threads parameter to parallelize Norm dataset generation.

  • Evaluation Metric Improvements:

    • Average loss performance improvement in multi-node environments.

    • AUC performance optimization and safer memory management.

    • Addition of NDCG and SMAPE.
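For reference, here are textbook definitions of the two new metrics: SMAPE in the variant that divides by the mean of the absolute actual and forecast values (as a percentage), and NDCG with log2 position discounts. These are assumptions about the standard formulas; HugeCTR's exact variants, such as its handling of zero denominators or a top-k cutoff, may differ.

```python
import math


def smape(actual, forecast):
    """Symmetric mean absolute percentage error, in percent."""
    terms = []
    for a, f in zip(actual, forecast):
        denom = (abs(a) + abs(f)) / 2
        terms.append(0.0 if denom == 0 else abs(f - a) / denom)
    return 100.0 * sum(terms) / len(terms)


def ndcg(relevance, k=None):
    """relevance: graded relevance of items in the order the model ranked them."""
    ranked = relevance[:k] if k is not None else relevance
    ideal = sorted(relevance, reverse=True)
    ideal = ideal[:k] if k is not None else ideal
    dcg = sum(r / math.log2(i + 2) for i, r in enumerate(ranked))
    idcg = sum(r / math.log2(i + 2) for i, r in enumerate(ideal))
    return dcg / idcg if idcg > 0 else 0.0


print(smape([100.0, 200.0], [110.0, 180.0]))
print(ndcg([1, 0, 1]))  # below 1.0 because a relevant item is ranked third
```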

  • Embedding Training Cache Parquet Demo: Created a keyset extractor script to generate keyset files for Parquet datasets and an end-to-end demo of how to train on a Parquet dataset using the embedding cache mode. See the Embedding Training Cache Example notebook.

  • Documentation Enhancements: The documentation details for HugeCTR Hierarchical Parameter Server Database Backend are updated for consistency and clarity.

  • Issues Fixed:

    • If slot_size_array is specified, workspace_size_per_gpu_in_mb is no longer required.

    • If you build and install HugeCTR from scratch, you can specify the CMAKE_INSTALL_PREFIX CMake variable to identify the installation directory for HugeCTR.

    • Fixed an SOK hang issue when calling sok.Init() with a large number of GPUs. See GitHub issues 261 and 302 for more details.

  • Known Issues:

    • HugeCTR uses NCCL to share data between ranks and NCCL can require shared system memory for IPC and pinned (page-locked) system memory resources. If you use NCCL inside a container, increase these resources by specifying the following arguments when you start the container:

        --shm-size=1g --ulimit memlock=-1

      See also the NCCL known issue and the GitHub issue.

    • KafkaProducers startup succeeds even if the target Kafka broker is unresponsive. To avoid data loss in conjunction with streaming-model updates from Kafka, you have to make sure that a sufficient number of Kafka brokers are running, operating properly, and are reachable from the node where you run HugeCTR.

    • The number of data files in the file list should be greater than or equal to the number of data reader workers. Otherwise, different workers are mapped to the same file and data loading does not progress as expected.

    • Joint loss training with a regularizer is not supported.

    • The Criteo 1 TB click logs dataset that is used with many HugeCTR sample programs and notebooks is currently unavailable. Until the dataset becomes downloadable again, you can run those samples based on our synthetic dataset generator. For more information, see the Getting Started section of the repository README file.

    • The data generator for the Parquet file type produces inconsistent file names between the _metadata.json file and the actual dataset files, which results in a core dump when using the synthetic dataset.


What’s New in Version 3.6

  • Concat 3D Layer: In previous releases, the Concat layer could handle two-dimensional (2D) input tensors only. Now, the input can be three-dimensional (3D) and you can concatenate the inputs along axis 1 or 2. For more information, see the API documentation for the Concat Layer.
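The shape rules for the 3D Concat layer can be sketched with nested Python lists shaped (batch, slots, dim): concatenating along axis 1 adds the slot counts, and concatenating along axis 2 adds the feature widths. `concat_3d` is a hypothetical helper, not the HugeCTR layer API.

```python
def concat_3d(inputs, axis):
    """Concatenate nested-list tensors shaped (batch, slots, dim) along axis 1 or 2."""
    if axis not in (1, 2):
        raise ValueError("axis must be 1 or 2")
    out = []
    for i in range(len(inputs[0])):  # iterate over the batch dimension
        if axis == 1:
            # Stack the slot rows of every input: (B, N1 + N2, D).
            out.append([list(row) for t in inputs for row in t[i]])
        else:
            # Join feature vectors slot by slot: (B, N, D1 + D2).
            num_slots = len(inputs[0][i])
            out.append([[x for t in inputs for x in t[i][n]] for n in range(num_slots)])
    return out


def shape(t):
    return (len(t), len(t[0]), len(t[0][0]))


a = [[[1.0] * 4 for _ in range(3)] for _ in range(2)]  # (2, 3, 4)
b = [[[2.0] * 4 for _ in range(5)] for _ in range(2)]  # (2, 5, 4)
c = [[[3.0] * 6 for _ in range(3)] for _ in range(2)]  # (2, 3, 6)
print(shape(concat_3d([a, b], axis=1)))  # (2, 8, 4)
print(shape(concat_3d([a, c], axis=2)))  # (2, 3, 10)
```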

  • Dense Column List Support in Parquet DataReader: In previous releases, HugeCTR assumed that each dense feature had a single value of the scalar data type float32. Now, you can mix float32 and list[float32] dense columns, which means that each dense feature can have more than one value. For more information, see the API documentation for the Parquet dataset format.
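A sketch of how a row that mixes float32 and list[float32] dense columns can be flattened into a single dense vector. The column names and the `flatten_dense_row` helper are hypothetical; see the Parquet dataset documentation for how HugeCTR actually derives the dense dimension.

```python
def flatten_dense_row(row, dense_columns):
    """Concatenate scalar and list-valued dense columns into one vector."""
    values = []
    for name in dense_columns:
        value = row[name]
        if isinstance(value, (list, tuple)):
            values.extend(float(v) for v in value)  # list[float32]: several values
        else:
            values.append(float(value))             # float32: a single value
    return values


row = {"I1": 0.5, "I2": [1.0, 2.0, 3.0], "I3": 4.0}
dense = flatten_dense_row(row, ["I1", "I2", "I3"])
print(dense)       # [0.5, 1.0, 2.0, 3.0, 4.0]
print(len(dense))  # 5: the total dense width in this example
```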

  • Support for HDFS is Re-enabled in Merlin Containers: Support for HDFS in Merlin containers is an optional dependency now. For more information, see HDFS Support.

  • Evaluation Metric Enhancements: In previous releases, HugeCTR computed AUC for binary classification only. Now, HugeCTR supports AUC for multi-label classification. The implementation is inspired by sklearn.metrics.roc_auc_score and performs the unweighted macro-averaging strategy that is the default for scikit-learn. To enable multi-label classification, specify a value for the label_dim parameter of the input layer; HugeCTR then computes the multi-label AUC.
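The unweighted macro-averaging strategy can be sketched as computing one binary AUC per label column and taking their plain mean. This sketch uses the rank-sum (Mann-Whitney U) formulation with averaged ranks for ties. It is not HugeCTR's GPU implementation, and, like scikit-learn, it cannot score a label column that contains only one class.

```python
def binary_auc(labels, scores):
    """AUC via the rank-sum formulation, with tied scores given average ranks."""
    pairs = sorted(zip(scores, labels))
    ranks = [0.0] * len(pairs)
    i = 0
    while i < len(pairs):
        j = i
        while j + 1 < len(pairs) and pairs[j + 1][0] == pairs[i][0]:
            j += 1
        for k in range(i, j + 1):
            ranks[k] = (i + j) / 2 + 1  # average 1-based rank of the tie group
        i = j + 1
    n_pos = sum(1 for _, y in pairs if y == 1)
    n_neg = len(pairs) - n_pos
    pos_rank_sum = sum(r for r, (_, y) in zip(ranks, pairs) if y == 1)
    return (pos_rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def macro_auc(label_matrix, score_matrix):
    """Unweighted mean of per-label AUCs; rows are samples, columns are labels."""
    per_label = [
        binary_auc([row[c] for row in label_matrix], [row[c] for row in score_matrix])
        for c in range(len(label_matrix[0]))
    ]
    return per_label, sum(per_label) / len(per_label)


labels = [[1, 0], [0, 1], [1, 1], [0, 0]]
scores = [[0.9, 0.2], [0.3, 0.8], [0.6, 0.4], [0.1, 0.5]]
per_label, average = macro_auc(labels, scores)
print(per_label, average)  # [1.0, 0.75] 0.875
```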

  • Log Output Format Change: The default log format now includes milliseconds.

  • Documentation Enhancements:

    • These release notes are included in the documentation and are available at https://nvidia-merlin.github.io/HugeCTR/v3.6/release_notes.html.

    • The Configuration section of the Hierarchical Parameter Server information is updated with more information about the parameters in the configuration file.

    • The example notebooks that demonstrate how to work with multi-modal data are reorganized in the navigation. The notebooks are now available under the heading Multi-Modal Example Notebooks. This change is intended to make it easier to find the notebooks.

    • The documentation in the sparse_operation_kit directory of the repository on GitHub is updated with several clarifications about SOK.

  • Issues Fixed:

    • The dlrm_kaggle_fp32.py file in the samples/dlrm/ directory of the repository is updated to show the correct number of samples. The num_samples value is now set to 36672493. This fixes GitHub issue 301.

    • Previously, Hierarchical Parameter Server (HPS) produced a runtime error when the GPU cache was turned off. This issue is now fixed.

  • Known Issues:

    • HugeCTR uses NCCL to share data between ranks and NCCL can require shared system memory for IPC and pinned (page-locked) system memory resources. If you use NCCL inside a container, increase these resources by specifying the following arguments when you start the container:

        --shm-size=1g --ulimit memlock=-1

      See also the NCCL known issue and the GitHub issue.

    • KafkaProducers startup succeeds even if the target Kafka broker is unresponsive. To avoid data loss in conjunction with streaming-model updates from Kafka, you have to make sure that a sufficient number of Kafka brokers are running, operating properly, and are reachable from the node where you run HugeCTR.

    • The number of data files in the file list should be greater than or equal to the number of data reader workers. Otherwise, different workers are mapped to the same file and data loading does not progress as expected.

    • Joint loss training with a regularizer is not supported.

    • The Criteo 1 TB click logs dataset that is used with many HugeCTR sample programs and notebooks is currently unavailable. Until the dataset becomes downloadable again, you can run those samples based on our synthetic dataset generator. For more information, see the Getting Started section of the repository README file.


What’s New in Version 3.5

  • HPS interface encapsulation and exporting as library: We encapsulated the Hierarchical Parameter Server (HPS) interfaces and deliver them as a standalone library. We also provide HPS Python APIs and demonstrate their usage with a notebook. For more information, please refer to Hierarchical Parameter Server and HPS Demo.

  • Hierarchical Parameter Server Triton Backend: The HPS Backend is a framework for looking up embedding vectors in large-scale embedding tables. It is designed to use GPU memory effectively to accelerate lookups by decoupling the embedding tables and embedding cache from the end-to-end inference pipeline of the deep recommendation model. For more information, please refer to the samples directory of the HugeCTR backend for Triton Inference Server repository.

  • SOK pip release: SOK is now released on PyPI at https://pypi.org/project/merlin-sok/. You can install SOK with pip install merlin-sok.

  • Joint loss and multi-task training support: We support joint loss in training so that users can train with multiple labels and tasks with different weights. See the MMoE sample in the samples/mmoe directory of the repository to learn how to use it.
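Joint loss training can be summarized as a weighted sum of per-task losses, L = sum_t w_t * L_t. The sketch below assumes binary cross-entropy for each task; the function names, tasks, and weights are hypothetical and only illustrate how a task weight scales that task's contribution.

```python
import math


def bce(label, prob, eps=1e-7):
    """Binary cross-entropy for a single prediction, clipped for numerical stability."""
    prob = min(max(prob, eps), 1.0 - eps)
    return -(label * math.log(prob) + (1 - label) * math.log(1.0 - prob))


def joint_loss(labels, probs, weights):
    """Weighted sum of per-task losses: L = sum_t w_t * L_t."""
    return sum(w * bce(y, p) for y, p, w in zip(labels, probs, weights))


# Two hypothetical tasks (for example, click and conversion); the second counts half as much.
print(joint_loss([1, 0], [0.8, 0.3], weights=[1.0, 0.5]))
```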

  • HugeCTR documentation on web page: Now users can visit our web documentation.

  • ONNX converter enhancement: We enable converting MultiCrossEntropyLoss and CrossEntropyLoss layers to ONNX to support multi-label inference. For more information, please refer to the HugeCTR to ONNX Converter information in the onnx_converter directory of the repository.

  • HDFS Python API enhancement:

    • Simplified DataSourceParams so that users do not need to provide all the paths before they are really necessary. Now users only have to pass DataSourceParams once when creating a solver.

    • Paths that you specify later are automatically treated as local paths or HDFS paths, depending on the DataSourceParams setting. See the notebook for usage.

  • HPS performance optimization: We use a better method to determine the number of partitions in the HPS database backends.

  • Issues Fixed: The HugeCTR input layer can now take a dense_dim greater than 1000.


What’s New in Version 3.4.1

  • Support mixed precision inference for dataset with multiple labels: We enable FP16 for the Softmax layer and support mixed precision for multi-label inference. For more information, please refer to Inference API.

  • Support multi-GPU offline inference with Python API: We support multi-GPU offline inference with the Python interface, which can leverage Hierarchical Parameter Server and enable concurrent execution on multiple devices. For more information, please refer to Inference API and Multi-GPU Offline Inference Notebook.

  • Introduction to _metadata.json: We added an introduction to the _metadata.json file for Parquet datasets. For more information, please refer to Parquet.

  • Documents and tool for workspace size per GPU estimation: We added a tool named embedding_workspace_calculator to help calculate the value for workspace_size_per_gpu_in_mb that is required by hugectr.SparseEmbedding. For more information, please refer to the README.md file in the tools/embedding_workspace_calculator directory of the repository and QA 24 in the documentation.

  • Improved Debugging Capability: The old logging system, which was flagged as deprecated for some time, has been removed. All remaining log messages and outputs have been revised and migrated to the new logging system (core23/logging.hpp/cpp). During this revision, we also adjusted log levels for log messages throughout the entire codebase to improve visibility of relevant information.

  • Support HDFS Parameter Server in Training:

    • Decoupled HDFS in Merlin containers to make the HDFS support more flexible. Users can now compile HDFS related functionalities optionally.

    • Now supports loading and dumping models and optimizer states from HDFS.

    • Added a notebook to show how to use HugeCTR with HDFS.

  • Support Multi-hot Inference on HugeCTR Backend: We support categorical input in multi-hot format for HugeCTR Backend inference.

  • Multi-label inference with mixed precision: Mixed precision training is enabled for the Softmax layer.

  • Python Script and documentation demonstrating how to analyze model files: In this release, we provide a script to retrieve vocabulary information from model files. For more details, see the README in the tools/model_analyzer directory of the repository.

  • Issues fixed:


What’s New in Version 3.4

  • Support for Building HugeCTR with the Unified Merlin Container: HugeCTR can now be built using our unified Merlin container. For more information, refer to our Contributor Guide.

  • Hierarchical Parameter Server (HPS) Enhancements:

    • New Missing Key (Embedding Table Entries) Insertion Feature: Using a simple flag, you can now configure HugeCTR to insert missing keys (embedding table entries). During lookup, these missing keys are automatically inserted into volatile database layers such as the Redis and HashMap backends.

    • Asynchronous Timestamp Refresh: To allow time-based eviction to take place, it is now possible to enable timestamp refreshing for frequently used embeddings. Once enabled, refreshing is handled asynchronously using background threads, which won’t block your inference jobs. For most applications, the associated performance impact from enabling this feature is barely noticeable.

    • HDFS (Hadoop Distributed File System) Parameter Server Support During Training:

      • We’re introducing a new DataSourceParams API, a Python API that can be used to specify the file system and paths to data and model files.

      • We’ve added support for loading data from HDFS to the local file system for HugeCTR training.

      • We’ve added support for dumping trained models and optimizer states into HDFS.

    • New Load API Capabilities: In addition to being able to deploy new models, the HugeCTR Backend’s Load API can now be used to update the dense parameters for models and corresponding embedding inference cache online.
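The missing key insertion and timestamp refresh features can be sketched with a toy volatile layer. `VolatileLayer` and its `insert_missing` flag are hypothetical names, not the HPS configuration keys. Unlike HPS, which refreshes timestamps asynchronously on background threads, this sketch refreshes them synchronously during lookup.

```python
import time


class VolatileLayer:
    """Toy stand-in for a volatile database layer such as a hash map."""

    def __init__(self, default_vector, insert_missing=True):
        self.default_vector = list(default_vector)
        self.insert_missing = insert_missing  # hypothetical flag name
        self.values = {}
        self.last_used = {}

    def lookup(self, keys, now=None):
        now = time.monotonic() if now is None else now
        out = []
        for key in keys:
            if key not in self.values and self.insert_missing:
                # Missing key: insert it with the default vector.
                self.values[key] = list(self.default_vector)
            out.append(self.values.get(key, self.default_vector))
            if key in self.values:
                # Refresh the timestamp so time-based eviction keeps frequently used keys.
                self.last_used[key] = now
        return out

    def evict_older_than(self, cutoff):
        stale = [k for k, t in self.last_used.items() if t < cutoff]
        for k in stale:
            del self.values[k]
            del self.last_used[k]
        return stale


layer = VolatileLayer(default_vector=[0.0, 0.0])
layer.lookup([1, 2], now=10.0)
layer.lookup([2], now=20.0)
print(sorted(layer.values))          # [1, 2]: both missing keys were inserted
print(layer.evict_older_than(15.0))  # [1]: key 1 has not been used since t=10
```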

  • Sparse Operation Kit (SOK) Enhancements:

    • Mixed Precision Training: You can now enable mixed precision training by using TensorFlow’s pattern to improve training performance and reduce memory usage.

    • DLRM Benchmark: DLRM is a standard benchmark for recommendation model training, so we added a new notebook. Refer to the sparse_operation_kit/documents/tutorials/DLRM_Benchmark directory of the repository. The notebook demonstrates SOK performance on this benchmark.

    • Uint32_t / int64_t key dtype Support in SOK: Int64 or uint32 can now be used as the key data type for SOK’s embedding. Int64 is the default.

    • TensorFlow Initializers Support: We now support the TensorFlow native initializer within SOK, such as sok.All2AllDenseEmbedding(embedding_initializer=tf.keras.initializers.RandomUniform()). For more information, refer to All2All Dense Embedding.

  • Documentation Enhancements:

    • We’ve revised several of our notebooks and readme files to improve readability and accessibility.

    • We’ve revised the SOK docker setup instructions to indicate that HugeCTR setup issues can be resolved using the --shm-size setting within docker.

    • Although HugeCTR is designed for scalability, having a robust machine is not necessary for smaller workloads and testing. We’ve documented the required specifications for notebook testing environments. For more information, refer to our README for HugeCTR Jupyter Demo Notebooks.

  • Inference Enhancements: We now support HugeCTR inference for managing multiple tasks. When the label dimension is the number of binary classification tasks and MultiCrossEntropyLoss is employed during training, the shape of inference results will be (batch_size*num_batches, label_dim). For more information, refer to Inference API.
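Assuming the raw predictions arrive as one flat, row-major list, grouping them into rows of label_dim values yields the (batch_size*num_batches, label_dim) shape described above. `reshape_predictions` is a hypothetical helper, not part of the Inference API.

```python
def reshape_predictions(flat, label_dim):
    """Group a flat prediction list into rows of label_dim task outputs."""
    if len(flat) % label_dim:
        raise ValueError("prediction count is not a multiple of label_dim")
    return [flat[i:i + label_dim] for i in range(0, len(flat), label_dim)]


batch_size, num_batches, label_dim = 2, 3, 2
flat = [0.1 * i for i in range(batch_size * num_batches * label_dim)]
rows = reshape_predictions(flat, label_dim)
print(len(rows), len(rows[0]))  # 6 2, that is (batch_size*num_batches, label_dim)
```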

  • Embedding Cache Issue Resolution: The embedding cache issue for very small embedding tables has been resolved.


What’s New in Version 3.3.1

  • Hierarchical Parameter Server (HPS) Enhancements:

    • HugeCTR Backend Enhancements: The HugeCTR Backend is now fully compatible with the Triton model control protocol, so new model configurations can be simply added to the HPS configuration file. The HugeCTR Backend will continue to support online deployments of new models using the Triton Load API. However, with this enhancement, old models can be recycled online using the Triton Unload API.

    • Simplified Database Backend: Multi-nodes, single-node, and all other kinds of volatile database backends can now be configured using the same configuration object.

    • Multi-Threaded Optimization of Redis Code: With multi-threading, the Redis code in HugeCTR version 3.3.1 is 2.3 times faster than in HugeCTR version 3.3.

    • Additional HPS Enhancements and Fixes:

      • You can now build the HPS test environment and implement unit tests for each component.

      • You’ll no longer encounter the access violation issue when updating Apache Kafka online.

      • The parquet data reader no longer incorrectly parses the index of categorical features when multiple embedding tables are being used.

      • The HPS Redis backend’s overflow handling is now also invoked during single insertions.

  • New GroupDenseLayer: We’re introducing a new GroupDenseLayer. It can be used to group fused fully connected layers when constructing the model graph. A simplified Python interface is provided for adjusting the number of layers and specifying the output dimensions in each layer, which makes it easy to leverage the highly-optimized fused fully connected layers in HugeCTR. For more information, refer to GroupDenseLayer.

  • Global Fixes:

    • A warning message now appears when you attempt to launch a multi-process job before importing MPI.

    • Running with the embedding training cache no longer generates a massive log.

    • Legacy conda information has been removed from the HugeCTR documentation.


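The GroupDenseLayer is configured with a list of per-layer output dimensions. The sketch below is a rough, hypothetical illustration of the bookkeeping this implies, not HugeCTR’s implementation. It derives the weight and bias shapes of each fully connected layer in a group from the input width and the output widths. The helper name `group_dense_shapes` is invented for this example.

```python
from typing import List, Tuple

Shape = Tuple[Tuple[int, int], Tuple[int]]  # (weight shape, bias shape)


def group_dense_shapes(in_dim: int, out_dims: List[int]) -> List[Shape]:
    """Return (weight_shape, bias_shape) for each fully connected layer in a group.

    Hypothetical helper: layer i maps the previous layer's width to out_dims[i].
    """
    if in_dim <= 0 or not out_dims or any(d <= 0 for d in out_dims):
        raise ValueError("in_dim and out_dims must be positive and out_dims non-empty")
    shapes: List[Shape] = []
    prev = in_dim
    for out in out_dims:
        shapes.append(((prev, out), (out,)))
        prev = out  # the next layer consumes this layer's output
    return shapes


if __name__ == "__main__":
    for weight_shape, bias_shape in group_dense_shapes(13, [512, 256, 128]):
        print(weight_shape, bias_shape)
```

Each layer’s input width is the previous layer’s output width. This is why a grouped layer only needs the list of output dimensions.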
What’s New in Version 3.3

  • Hierarchical Parameter Server (HPS) Enhancements:

    • Support for Incremental Model Updates: HPS now supports incremental model updates via Apache Kafka (a distributed event streaming platform) message queues. With this enhancement, HugeCTR can now be connected with Apache Kafka deployments to update models in real time during training and inference. For more information, refer to the Demo Notebook.

    • Improvements to the Memory Management: The Redis cluster and CPU memory database backends are now the primary sources for memory management. When performing incremental model updates, these memory database backends will automatically evict infrequently used embeddings as training progresses. The performance of the Redis cluster and CPU memory database backends has also been improved.

    • New Asynchronous Refresh Mechanism: Support for asynchronous refreshing of incremental embedding keys into the embedding cache has been added. The Refresh operation will be triggered when completing the model version iteration or outputting incremental parameters from online training. The Distributed Database and Persistent Database will be updated by Apache Kafka. The GPU embedding cache will then refresh the values of the existing embedding keys and replace them with the latest incremental embedding vectors. For more information, refer to the HPS README.

    • Configurable Backend Implementations for Databases: Backend implementations for databases are now fully configurable.

    • Improvements to the JSON Interface Parser: The JSON interface parser can now handle inaccurate parameterization.

    • More Meaningful Jabber: As requested, we’ve revised the log levels throughout the entire API database backend of the HPS. Selected configuration options are now printed entirely and uniformly to the log. Errors provide more verbose information about pending issues.

  • Sparse Operation Kit (SOK) Enhancements:

    • TensorFlow (TF) 1.15 Support: SOK can now be used with TensorFlow 1.15. For more information, refer to README.

    • Dedicated CUDA Stream: A dedicated CUDA stream is now used for SOK’s Ops, which may help eliminate kernel interleaving.

    • New pip Installation Option: SOK can now be installed using the pip install SparseOperationKit command. For more information, refer to our instructions. With this install option, root access is no longer required to compile SOK, and Python scripts don’t need to be copied.

    • Visible Device Configuration Support: tf.config.set_visible_devices can now be used to set the visible GPUs for each process. CUDA_VISIBLE_DEVICES can also be used. When tf.distribute.Strategy is used, tf.config.set_visible_devices shouldn’t be called.

  • Hybrid-Embedding Indices Pre-Computing: The indices needed for hybrid embedding are pre-computed ahead of time and overlapped with previous iterations.

  • Cached Evaluation Indices: The hybrid-embedding indices for evaluation are cached when applicable, which eliminates recomputing the indices at every evaluation iteration.

  • MLP Weight/Data Gradient Calculation Overlap: The weight gradients of the MLP are calculated asynchronously with respect to the data gradients, enabling overlap between these two computations.

  • Better Compute-Communication Overlap: Compute and communication now overlap better, which improves training throughput.

  • Fused Weight Conversion: The FP32-to-FP16 conversion of the weights is now fused into the SGD optimizer, saving trips to memory.

  • GraphScheduler: GraphScheduler was added to control the timing of cudaGraph launches. With GraphScheduler, the gap between adjacent cudaGraphs is eliminated.

  • Multi-Node Training Support Enhancements: You can now perform multi-node training on clusters with non-RDMA hardware by setting the all_reduce_algo argument to AllReduceAlgo.NCCL. For more information, refer to the details for the all_reduce_algo argument in the CreateSolver API.

  • Support for Model Naming During Model Dumping: You can now specify a model name with the CreateSolver API during training. The name is dumped to the JSON configuration file by the Model.graph_to_json API. This facilitates the Triton deployment of saved HugeCTR models. It also helps distinguish between models when Apache Kafka sends parameters from training to inference.

  • Fine-Grained Control Accessibility Enhancements for Embedding Layers: We’ve added fine-grained control accessibility to embedding layers. Using the Model.freeze_embedding and Model.unfreeze_embedding APIs, embedding layer weights can be frozen and unfrozen. Additionally, weights for multiple embedding layers can be loaded independently, making it possible to load pre-trained embeddings for a particular layer. For more information, refer to Model API and Section 3.4 of the HugeCTR Criteo Notebook.

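In the asynchronous refresh mechanism described for HPS, the cache refreshes keys it already holds with the latest incremental vectors and does not add new ones. The sketch below is a minimal, CPU-only illustration of that rule. `ToyEmbeddingCache` is a hypothetical class, not the HugeCTR GPU embedding cache. A thread pool stands in for the background refresh, and a lock keeps lookups consistent while the refresh runs.

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List


class ToyEmbeddingCache:
    """CPU-only stand-in for a GPU embedding cache (hypothetical, for illustration)."""

    def __init__(self) -> None:
        self._entries: Dict[int, List[float]] = {}
        self._lock = threading.Lock()  # guards lookups against a concurrent refresh

    def insert(self, key: int, vector: List[float]) -> None:
        with self._lock:
            self._entries[key] = list(vector)

    def lookup(self, key: int) -> List[float]:
        with self._lock:
            return list(self._entries[key])

    def contains(self, key: int) -> bool:
        with self._lock:
            return key in self._entries

    def refresh(self, updates: Dict[int, List[float]]) -> int:
        """Replace vectors of keys that are already cached and skip all other keys.

        Returns the number of keys that were refreshed.
        """
        refreshed = 0
        with self._lock:
            for key, vector in updates.items():
                if key in self._entries:  # refresh never adds new keys
                    self._entries[key] = list(vector)
                    refreshed += 1
        return refreshed


if __name__ == "__main__":
    cache = ToyEmbeddingCache()
    cache.insert(1, [0.0, 0.0])
    incremental_update = {1: [0.5, 0.5], 2: [1.0, 1.0]}  # e.g. values read from a message queue
    with ThreadPoolExecutor(max_workers=1) as pool:
        future = pool.submit(cache.refresh, incremental_update)  # runs in the background
        print(future.result())  # 1
    print(cache.lookup(1), cache.contains(2))  # [0.5, 0.5] False
```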

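The fused weight conversion folds the FP32-to-FP16 cast into the SGD update itself. The sketch below approximates that idea on the CPU. Each FP32 master weight is updated, and its FP16 copy is emitted in the same pass instead of a second pass over memory. The function names are hypothetical. HugeCTR does this in a fused GPU kernel, which this code does not reproduce.

```python
import struct
from typing import List, Tuple


def to_fp16(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def sgd_step_fused(
    weights_fp32: List[float], grads: List[float], lr: float
) -> Tuple[List[float], List[float]]:
    """One SGD update that also produces the FP16 copy of each weight in the same loop."""
    if len(weights_fp32) != len(grads):
        raise ValueError("weights and grads must have the same length")
    new_fp32: List[float] = []
    new_fp16: List[float] = []
    for w, g in zip(weights_fp32, grads):
        w_new = w - lr * g               # FP32 master-weight update
        new_fp32.append(w_new)
        new_fp16.append(to_fp16(w_new))  # conversion done while the value is at hand
    return new_fp32, new_fp16


if __name__ == "__main__":
    print(sgd_step_fused([1.0, -2.0], [0.5, 0.25], lr=0.5))
```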
What’s New in Version 3.2.1

  • GPU Embedding Cache Optimization: The performance of the GPU embedding cache for the standalone module has been optimized. With this enhancement, the performance of small to medium batch sizes has improved significantly. We’re not introducing any changes to the interface for the GPU embedding cache, so don’t worry about making changes to any existing code that uses this standalone module. For more information, refer to the ReadMe.md file in the gpu_cache directory of the repository.

  • Model Oversubscription Enhancements: We’re introducing a new host memory cache (HMEM-Cache) component for the model oversubscription feature. When configured properly, incremental training can be efficiently performed on models with large embedding tables that exceed the host memory. For more information, refer to Host Memory Cache in MOS. Additionally, we’ve enhanced the Python interface for model oversubscription by replacing the use_host_memory_ps parameter with a ps_types parameter and adding a sparse_models parameter. For more information about these changes, refer to HugeCTR Python Interface.

  • Debugging Enhancements: We’re introducing new debugging features such as multi-level logging, as well as kernel debugging functions. We’re also making our error messages more informative so that users know exactly how to resolve issues related to their training and inference code. For more information, refer to the comments in the header files, which are available at HugeCTR/include/base/debug.

  • Enhancements to the Embedding Key Insertion Mechanism for the Embedding Cache: Missing embedding keys can now be asynchronously inserted into the embedding cache. To enable this, set the hit rate threshold within the configuration file. When the actual hit rate of the embedding cache is higher than the hit rate threshold that you set, the embedding cache inserts missing embedding keys asynchronously. Otherwise, it inserts them synchronously.

  • Parameter Server Enhancements: We’re introducing a new “in memory” database that utilizes the local CPU memory for storing and recalling embeddings and uses multi-threading to accelerate lookup and storage. You can now also use the combined CPU-accessible memory of your Redis cluster to store embeddings. We improved the performance of “persistent” storage and retrieval of embeddings in RocksDB by using structured column families. We also added support for creating hierarchical storage, such as using Redis as a distributed cache. You don’t have to update your Parameter Server configurations to take advantage of these enhancements.

  • Slice Layer Internalization Enhancements: The Slice layer for the branch topology can now be abstracted away in the Python interface. A model graph analysis is conducted to resolve the tensor dependency. If the same tensor is consumed more than once to form the branch topology, the Slice layer is inserted internally. For more information about how to construct a model graph using branches without the Slice layer, refer to the Getting Started section of the repository README and the Slice Layer information.

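The graph analysis behind the internal Slice insertion finds tensors that feed more than one layer. The sketch below shows only that detection step, under an assumed layer layout. Each layer is a dict with "bottom" (input tensor names) and "top" (output tensor names). This layout, the function names, and the `_branch{i}` naming scheme are all hypothetical and are not HugeCTR’s internal representation.

```python
from collections import Counter
from typing import Dict, List

Layer = Dict[str, List[str]]  # hypothetical layout: {"bottom": [...], "top": [...]}


def branch_points(layers: List[Layer]) -> Dict[str, int]:
    """Map each tensor consumed by more than one layer to its number of consumers."""
    consumers = Counter(name for layer in layers for name in layer["bottom"])
    return {name: n for name, n in consumers.items() if n > 1}


def plan_slices(layers: List[Layer]) -> Dict[str, List[str]]:
    """For every branch point, name one output per consumer of an implicit Slice layer."""
    return {
        name: [f"{name}_branch{i}" for i in range(n)]
        for name, n in sorted(branch_points(layers).items())
    }


if __name__ == "__main__":
    graph = [
        {"bottom": ["dense"], "top": ["fc1"]},
        {"bottom": ["dense"], "top": ["fc2"]},  # "dense" is consumed twice
        {"bottom": ["fc1", "fc2"], "top": ["concat"]},
    ]
    print(plan_slices(graph))
```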

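The embedding-cache insertion mechanism compares the measured hit rate against a configured threshold. The sketch below assumes that missing keys are inserted asynchronously when the measured hit rate exceeds the threshold, and synchronously otherwise. It expresses that decision as a small function. `choose_insert_mode` is a hypothetical name, and the real cache makes this decision internally.

```python
def choose_insert_mode(hits: int, lookups: int, hit_rate_threshold: float) -> str:
    """Return "async" or "sync" for inserting missing keys (hypothetical policy)."""
    if lookups <= 0:
        raise ValueError("lookups must be positive")
    if not 0.0 <= hit_rate_threshold <= 1.0:
        raise ValueError("hit_rate_threshold must be in [0, 1]")
    hit_rate = hits / lookups
    # Strictly above the threshold: insert in the background; otherwise insert inline.
    return "async" if hit_rate > hit_rate_threshold else "sync"


if __name__ == "__main__":
    print(choose_insert_mode(95, 100, 0.9), choose_insert_mode(80, 100, 0.9))
```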
What’s New in Version 3.2

  • New HugeCTR to ONNX Converter: We’re introducing a new HugeCTR to ONNX converter in the form of a Python package. The converter takes the graph configuration file and the model weight files as inputs, and you can specify where to save the converted ONNX model. Sparse embedding models can also be converted. For more information, refer to the HugeCTR to ONNX Converter information in the onnx_converter directory and the HugeCTR2ONNX Demo Notebook.

  • New Hierarchical Storage Mechanism on the Parameter Server (POC): We’ve implemented a hierarchical storage mechanism between local SSDs and CPU memory. As a result, embedding tables no longer have to be stored in the local CPU memory. The distributed Redis cluster is being implemented as a CPU cache to store larger embedding tables and interact with the GPU embedding cache directly. The local RocksDB serves as a query engine to back up the complete embedding table on the local SSDs and assist the Redis cluster with looking up missing embedding keys. For more information about how this works, refer to our HugeCTR Backend documentation.

  • Parquet Format Support within the Data Generator: The HugeCTR data generator now supports the parquet format, which can be configured easily using the Python API. For more information, refer to Data Generator API.

  • Python Interface Support for the Data Generator: The data generator has been enabled within the HugeCTR Python interface. The parameters associated with the data generator have been encapsulated into the DataGeneratorParams struct, which is required to initialize the DataGenerator instance. You can use the data generator’s Python APIs to easily generate the Norm, Parquet, or Raw dataset formats with the desired distribution of sparse keys. For more information, refer to Data Generator API and the data generator samples in the tools/data_generator directory of the repository.

  • Improvements to the Formula of the Power Law Simulator within the Data Generator: We’ve modified the formula of the power law simulator within the data generator so that a positive alpha value is always produced, which will be needed for most use cases. The alpha values for Long, Medium, and Short within the power law distribution are 0.9, 1.1, and 1.3 respectively. For more information, refer to Data Generator API.

  • Support for Arbitrary Input and Output Tensors in the Concat and Slice Layers: The Concat and Slice layers now support any number of input and output tensors. Previously, these layers were limited to a maximum of four tensors.

  • New Continuous Training Notebook: We’ve added a new notebook to demonstrate how to perform continuous training using the embedding training cache feature. For more information, refer to HugeCTR Continuous Training.

  • New HugeCTR Contributor Guide: We’ve added a new HugeCTR Contributor Guide that explains how to contribute to HugeCTR, which may involve reporting and fixing a bug, introducing a new feature, or implementing a new or pending feature.

  • Sparse Operation Kit (SOK) Enhancements: SOK now supports TensorFlow 2.5 and 2.6. We also added support for identity hashing, dynamic input, and Horovod within SOK. Lastly, we added a new SOK docs set to help you get started with SOK.


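The data generator produces sparse keys with a power-law distribution, using alpha values of 0.9, 1.1, and 1.3 for Long, Medium, and Short. The sketch below draws keys with probability proportional to (k + 1) ** -alpha to show the effect of alpha. This generic Zipf-like sampler is an assumption made for illustration. It is not the HugeCTR data generator’s actual formula, and `sample_power_law_keys` is a hypothetical name.

```python
import random
from typing import List

# Alpha values quoted in the release note above.
ALPHAS = {"Long": 0.9, "Medium": 1.1, "Short": 1.3}


def sample_power_law_keys(
    vocab_size: int, num_samples: int, alpha: float, seed: int = 0
) -> List[int]:
    """Draw keys in [0, vocab_size) with P(key k) proportional to (k + 1) ** -alpha."""
    if vocab_size <= 0 or num_samples < 0 or alpha <= 0:
        raise ValueError("vocab_size and alpha must be positive, num_samples non-negative")
    rng = random.Random(seed)  # seeded for reproducible datasets
    weights = [(k + 1) ** -alpha for k in range(vocab_size)]
    return rng.choices(range(vocab_size), weights=weights, k=num_samples)


if __name__ == "__main__":
    for name, alpha in ALPHAS.items():
        keys = sample_power_law_keys(1000, 20000, alpha, seed=1)
        top10_share = sum(k < 10 for k in keys) / len(keys)
        print(f"{name:6s} alpha={alpha}: share of samples hitting the 10 hottest keys = {top10_share:.2f}")
```

A larger alpha concentrates more samples on the hottest keys, which matches the "Short" tail having the largest alpha.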
What’s New in Version 3.1

  • Hybrid Embedding: Hybrid embedding is designed to overcome the bandwidth constraint imposed by the embedding part of the training workload by algorithmically reducing the traffic over the network. Requirements: The input dataset has only one-hot feature items, and the model uses the SGD optimizer.

  • FusedReluBiasFullyConnectedLayer: FusedReluBiasFullyConnectedLayer is one of the major optimizations applied to dense layers. It fuses the ReLU, bias, and fully connected layers to reduce memory accesses to HBM. Requirements: The model uses a layer with separate data/gradient tensors as the bottom layer.

  • Overlapped Pipeline: The computation in the dense input data path is overlapped with the hybrid embedding computation. Requirements: The data reader is asynchronous, hybrid embedding is used, and the model has a feature interaction layer.

  • Holistic CUDA Graph: Everything inside a training iteration is packed into a CUDA Graph. Limitations: This option works only if use_cuda_graph is turned off and use_overlapped_pipeline is turned on.

  • Python Interface Enhancements: We’ve enhanced the Python interface for HugeCTR so that you no longer have to manually create a JSON configuration file. Our Python APIs can now be used to create the computation graph. They can also be used to dump the model graph as a JSON object and save the model weights as binary files so that continuous training and inference can take place. We’ve added an Inference API that takes Norm or Parquet datasets as input to facilitate the inference process. For more information, refer to HugeCTR Python Interface and HugeCTR Criteo Notebook.

  • New Interface for Unified Embedding: We’re introducing a new interface to simplify the use of embeddings and datareaders. To help you specify the number of keys in each slot, we added nnz_per_slot and is_fixed_length. You can now directly configure how much memory usage you need by specifying workspace_size_per_gpu_in_mb instead of max_vocabulary_size_per_gpu. For convenience, mean/sum is used in combinators instead of 0 and 1. In cases where you don’t know which embedding type you should use, you can specify use_hash_table and let HugeCTR automatically select the embedding type based on your configuration. For more information, refer to HugeCTR Python Interface.

  • Multi-Node Support for Embedding Training Cache (ETC): We’ve enabled multi-node support for the embedding training cache. You can now train a model with a terabyte-size embedding table using one node or multiple nodes even if the entire embedding table can’t fit into the GPU memory. We’re also introducing the host memory (HMEM) based parameter server (PS) along with its SSD-based counterpart. If the sparse model can fit into the host memory of each training node, the optimized HMEM-based PS can provide better model loading and dumping performance with a more effective bandwidth. For more information, refer to HugeCTR Python Interface.

  • Enhancements to the Multi-Nodes TensorFlow Plugin: The Multi-Nodes TensorFlow Plugin now supports multi-node synchronized training via tf.distribute.MultiWorkerMirroredStrategy. With minimal code changes, you can now easily scale your single GPU training to multi-node multi GPU training. The Multi-Nodes TensorFlow Plugin also supports multi-node synchronized training via Horovod. The inputs for embedding plugins are now data parallel, so the datareader no longer needs to preprocess data for different GPUs based on concrete embedding algorithms. For more information, see the sparse_operation_kit_demo.ipynb notebook in the sparse_operation_kit/notebooks directory of the repository.

  • NCF Model Support: We’ve added support for the NCF model, as well as the GMF and NeuMF variant models. With this enhancement, we’re introducing a new element-wise multiplication layer and a HitRate evaluation metric. Sample code was added that demonstrates how to preprocess user-item interaction data and train an NCF model with it. New examples have also been added that demonstrate how to train NCF models using MovieLens datasets.

  • DIN and DIEN Model Support: All of our layers support the DIN model. The following layers support the DIEN model: FusedReshapeConcat, FusedReshapeConcatGeneral, Gather, GRU, PReLUDice, ReduceMean, Scale, Softmax, and Sub. We also added sample code to demonstrate how to use the Amazon dataset to train the DIN model. See our sample programs in the samples/din directory of the repository.

  • Multi-Hot Support for Parquet Datasets: We’ve added multi-hot support for parquet datasets, so you can now train models with a parquet dataset that contains both one-hot and multi-hot slots.

  • Mixed Precision (FP16) Support in More Layers: The MultiCross layer now supports mixed precision (FP16). All layers now support FP16.

  • Mixed Precision (FP16) Support in Inference: We’ve added FP16 support for the inference pipeline. Therefore, dense layers can now adopt FP16 during inference.

  • Optimizer State Enhancements for Continuous Training: You can now store optimizer states that are updated during continuous training as files, such as the Adam optimizer’s first moment (m) and second moment (v). By default, the optimizer states are initialized with zeros, but you can specify a set of optimizer state files to recover their previous values. For more information about dense_opt_states_file and sparse_opt_states_file, refer to Python Interface.

  • New Library File for GPU Embedding Cache Data: We’ve moved the header/source code of the GPU embedding cache data structure into a stand-alone folder. It has been compiled into a stand-alone library file. Similar to HugeCTR, your application programs can now link directly against this new library file. For more information, refer to the ReadMe.md file in the gpu_cache directory of the repository.

  • Embedding Plugin Enhancements: We’ve moved all the embedding plugin files into a stand-alone folder. The embedding plugin can be used as a stand-alone python module, and works with TensorFlow to accelerate the embedding training process.

  • Adagrad Support: Adagrad can now be used to optimize your embedding and network. To use it, change the optimizer type in the Optimizer layer and set the corresponding parameters.

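HitRate is commonly defined in a leave-one-out setup as the fraction of users whose held-out item appears in their top-K recommendations. The sketch below implements that common definition. It is not necessarily how HugeCTR’s HitRate metric computes or aggregates the value, and `hit_rate_at_k` is a hypothetical name.

```python
from typing import Sequence


def hit_rate_at_k(ranked_items: Sequence[Sequence[int]], held_out: Sequence[int], k: int) -> float:
    """Fraction of users whose held-out item appears in their top-k ranked items."""
    if len(ranked_items) != len(held_out):
        raise ValueError("need one ranking per held-out item")
    if k <= 0:
        raise ValueError("k must be positive")
    if not held_out:
        return 0.0
    hits = sum(1 for ranking, item in zip(ranked_items, held_out) if item in ranking[:k])
    return hits / len(held_out)


if __name__ == "__main__":
    rankings = [[3, 1, 2], [5, 4, 6], [7, 8, 9]]  # best-first item IDs per user
    targets = [1, 6, 0]                           # one held-out item per user
    print(hit_rate_at_k(rankings, targets, k=2), hit_rate_at_k(rankings, targets, k=3))
```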

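Adagrad scales each parameter’s step by the inverse square root of its accumulated squared gradients. The sketch below shows the textbook update rule. Starting the accumulator at 0 and the default `lr` and `eps` values are assumptions for the example. They are not HugeCTR’s defaults or parameter names, and `adagrad_step` is a hypothetical name.

```python
import math
from typing import List, Tuple


def adagrad_step(
    weights: List[float],
    grads: List[float],
    accum: List[float],
    lr: float = 0.01,
    eps: float = 1e-7,
) -> Tuple[List[float], List[float]]:
    """Return (new_weights, new_accumulators) after one Adagrad update."""
    if not (len(weights) == len(grads) == len(accum)):
        raise ValueError("weights, grads, and accum must have the same length")
    new_w: List[float] = []
    new_acc: List[float] = []
    for w, g, a in zip(weights, grads, accum):
        a = a + g * g  # accumulate squared gradients per parameter
        new_acc.append(a)
        new_w.append(w - lr * g / (math.sqrt(a) + eps))  # per-parameter adaptive step
    return new_w, new_acc


if __name__ == "__main__":
    w, acc = [1.0], [0.0]
    for _ in range(3):  # the same gradient produces shrinking steps
        w, acc = adagrad_step(w, [2.0], acc, lr=0.1)
        print(w, acc)
```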
What’s New in Version 3.0.1

  • New DLRM Inference Benchmark: We’ve added two detailed Jupyter notebooks to demonstrate how to train, deploy, and benchmark the performance of a deep learning recommendation model (DLRM) with HugeCTR. For more information, refer to our HugeCTR Inference Notebooks.

  • FP16 Optimization: We’ve optimized the DotProduct, ELU, and Sigmoid layers based on __half2 vectorized loads and stores, improving their device memory bandwidth utilization. MultiCross, FmOrder2, ReduceSum, and Multiply are the only layers that still need to be optimized for FP16.

  • Synthetic Data Generator Enhancements: We’ve enhanced our synthetic data generator so that it can generate uniformly distributed datasets, as well as power-law based datasets. You can now specify the vocabulary_size and max_nnz per categorical feature instead of across all categorical features. For more information, refer to our user guide.

  • Reduced Memory Allocation for Trained Model Exportation: To prevent the “Out of Memory” error message from displaying when exporting a trained model, which may include a very large embedding table, the amount of memory allocated by the related functions has been significantly reduced.

  • Dropout Layer Enhancement: The Dropout layer is now compatible with CUDA Graph because it now uses cuDNN by default.


What’s New in Version 3.0

  • Inference Support: To streamline the recommender system workflow, we’ve implemented a custom HugeCTR backend on the NVIDIA Triton Inference Server. The HugeCTR backend leverages the embedding cache and parameter server to efficiently manage embeddings of different sizes and models in a hierarchical manner. For more information, refer to our inference repository.

  • New High-Level API: You can now also construct and train your models using the Python interface with our new high-level API. To see how this new API works, refer to the preview example code in the samples/preview directory.

  • FP16 Support in More Layers: All the layers except MultiCross support mixed precision mode. We’ve also optimized some of the FP16 layer implementations based on vectorized loads and stores.

  • Enhanced TensorFlow Embedding Plugin: Our embedding plugin now supports LocalizedSlotSparseEmbeddingHash mode. With this enhancement, the DNN model no longer needs to be split into two parts since it now connects with the embedding op through MirroredStrategy within the embedding layer. For more information, see the notebooks/embedding_plugin.ipynb notebook.

  • Extended Embedding Training Cache: We’ve extended the embedding training cache feature to support LocalizedSlotSparseEmbeddingHash and LocalizedSlotSparseEmbeddingHashOneHot.

  • Epoch-Based Training Enhancements: The num_epochs option in the Solver clause can now be used with the Raw dataset format.

  • Deprecation of the eval_batches Parameter: The eval_batches parameter has been deprecated and replaced with the max_eval_batches and max_eval_samples parameters. In epoch mode, these parameters control the maximum number of evaluations. An error message will appear when attempting to use the eval_batches parameter.

  • MultiplyLayer Renamed: To clarify what the MultiplyLayer does, it was renamed to WeightMultiplyLayer.

  • Optimized Initialization Time: HugeCTR’s initialization time, which includes the GEMM algorithm search and parameter initialization, was significantly reduced.

  • Sample Enhancements: Our samples now rely upon the Criteo 1TB Click Logs dataset instead of the Kaggle Display Advertising Challenge dataset. Our preprocessing scripts (Perl, Pandas, and NVTabular) have also been unified and simplified.

  • Configurable DataReader Worker: You can now specify the number of data reader workers, which run in parallel, with the num_workers parameter. Its default value is 12. However, if you are using the Parquet data reader, you can’t configure the num_workers parameter since it always corresponds to the number of active GPUs.


What’s New in Version 2.3

  • New Python Interface: To enhance the interoperability with NVTabular and other Python-based libraries, we’re introducing a new Python interface for HugeCTR.

  • HugeCTR Embedding with TensorFlow: To help users easily integrate HugeCTR’s optimized embedding into their TensorFlow workflow, we now offer the HugeCTR embedding layer as a TensorFlow plugin. To better understand how to install, use, and verify it, see our Jupyter notebook tutorial in the notebooks/embedding_plugin.ipynb file. The notebook also demonstrates how you can create a new Keras layer, EmbeddingLayer, based on the hugectr.py file in the tools/embedding_plugin/python directory with the helper code that we provide.

  • Embedding Training Cache: To support models with large embedding tables that exceed the memory limit of a single GPU, we’ve added a new embedding training cache feature. It loads a subset of an embedding table into the GPU in a coarse-grained, on-demand manner during the training stage.

  • TF32 Support: We’ve added support for TensorFloat-32 (TF32), a new math mode of the third-generation Tensor Cores on Ampere GPUs. TF32 uses the same 10-bit mantissa as FP16 to ensure accuracy while providing the same range as FP32 by using an 8-bit exponent. Since TF32 is an internal data type that accelerates FP32 GEMM computations with Tensor Cores, you can simply turn it on with a newly added configuration option. For more information, refer to Solver.

  • Enhanced AUC Implementation: To enhance the performance of our AUC computation on multi-node environments, we’ve redesigned our AUC implementation to improve how the computational load gets distributed across nodes.

  • Epoch-Based Training: In addition to the max_iter parameter, you can now set the num_epochs parameter in the Solver clause within the configuration file. Currently, this mode can only be used with the Norm dataset format and its corresponding file lists. All dataset formats will be supported in the future.

  • New Multi-Node Training Tutorial: To better support multi-node training use cases, we’ve added a new step-by-step tutorial to the tutorial/multinode-training directory of our GitHub repository.

  • Power Law Distribution Support with Data Generator: Because of the increased need for generating a random dataset whose categorical features follow the power-law distribution, we’ve revised our data generation tool to support this use case. For additional information, refer to the --long-tail description in the Generating Synthetic Data and Benchmarks section of the docs/hugectr_user_guide.md file in the repository.

  • Multi-GPU Preprocessing Script for Criteo Samples: Multiple GPUs can now be used when preparing the dataset for the programs in the samples directory of our GitHub repository. For more information, see how the preprocess_nvt.py program in the tools/criteo_script directory of the repository is used to preprocess the Criteo dataset for DCN, DeepFM, and W&D samples.

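TF32 keeps FP32’s sign bit and 8-bit exponent but only the top 10 mantissa bits. The sketch below emulates that precision on the CPU by clearing the 13 low mantissa bits of a float32. It truncates toward zero, which is a simplifying assumption. Tensor Core hardware may round differently, so treat the results as approximate. `round_to_tf32` is a hypothetical name.

```python
import struct


def round_to_tf32(x: float) -> float:
    """Truncate a value to TF32 precision: 1 sign, 8 exponent, 10 mantissa bits."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]  # raw float32 bit pattern
    bits &= 0xFFFFE000  # keep sign, exponent, and the top 10 of 23 mantissa bits
    return struct.unpack("<f", struct.pack("<I", bits))[0]


if __name__ == "__main__":
    print(round_to_tf32(1.0 + 2**-10))  # representable: 10th mantissa bit survives
    print(round_to_tf32(1.0 + 2**-11))  # below TF32 precision: truncated to 1.0
    print(round_to_tf32(3.0e38))        # FP32-sized range, far beyond FP16's maximum
```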

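A common way to spread AUC computation across nodes is for each node to build score histograms of positives and negatives, merge the histograms, and compute AUC from the merged counts. The sketch below shows that generic technique. It is an assumption for illustration, not a description of HugeCTR’s redesigned implementation. Scores are assumed to lie in [0, 1], and pairs that fall into the same bin count as half-correct ties.

```python
from typing import List, Sequence, Tuple

Histogram = Tuple[List[int], List[int]]  # (positive counts, negative counts) per score bin


def local_histogram(scores: Sequence[float], labels: Sequence[int], num_bins: int = 1000) -> Histogram:
    """Bin one node's predictions by score, separately for positive and negative labels."""
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    pos, neg = [0] * num_bins, [0] * num_bins
    for s, y in zip(scores, labels):
        if not 0.0 <= s <= 1.0:
            raise ValueError("scores must lie in [0, 1]")
        b = min(int(s * num_bins), num_bins - 1)
        if y:
            pos[b] += 1
        else:
            neg[b] += 1
    return pos, neg


def merge_histograms(parts: Sequence[Histogram]) -> Histogram:
    """Element-wise sum of per-node histograms (what an all-reduce would compute)."""
    pos = [sum(col) for col in zip(*(p for p, _ in parts))]
    neg = [sum(col) for col in zip(*(n for _, n in parts))]
    return pos, neg


def auc_from_histogram(hist: Histogram) -> float:
    """P(score_pos > score_neg) + 0.5 * P(same bin), from merged bin counts."""
    pos, neg = hist
    total_pos, total_neg = sum(pos), sum(neg)
    if total_pos == 0 or total_neg == 0:
        raise ValueError("need at least one positive and one negative sample")
    area, neg_below = 0.0, 0
    for p, n in zip(pos, neg):  # bins in ascending score order
        area += p * (neg_below + 0.5 * n)
        neg_below += n
    return area / (total_pos * total_neg)


if __name__ == "__main__":
    node_a = local_histogram([0.9, 0.1], [1, 0], num_bins=100)
    node_b = local_histogram([0.8, 0.3], [1, 0], num_bins=100)
    print(auc_from_histogram(merge_histograms([node_a, node_b])))
```

Only fixed-size histograms cross the network instead of every prediction, which is what keeps the per-node load balanced.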
Known Issues

  • HugeCTR uses NCCL to share data between ranks, and NCCL may require shared system memory for IPC and pinned (page-locked) system memory resources. When using NCCL inside a container, we recommend increasing these resources by passing --shm-size=1g --ulimit memlock=-1 to docker run. See also NCCL’s known issue and the GitHub issue.

  • Kafka producers start up successfully even if the target Kafka broker is unresponsive. To avoid data loss when streaming model updates from Kafka, make sure that a sufficient number of Kafka brokers are up, working properly, and reachable from the node where you run HugeCTR.

  • The number of data files in the file list should be no less than the number of data reader workers. Otherwise, different workers will be mapped to the same file and data loading does not progress as expected.

  • Joint loss training is not supported with a regularizer.

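The file-list issue above can be shown with a toy worker-to-file mapping. The sketch assumes a hypothetical round-robin rule in which worker w starts at files[w % len(files)]. HugeCTR’s actual assignment may differ. The takeaway is the same: with fewer files than workers, several workers land on the same file, so a simple up-front check is worthwhile. All function names here are hypothetical.

```python
from collections import Counter
from typing import List


def first_file_per_worker(files: List[str], num_workers: int) -> List[str]:
    """Hypothetical round-robin rule: worker w starts reading files[w % len(files)]."""
    if not files or num_workers <= 0:
        raise ValueError("need at least one file and one worker")
    return [files[w % len(files)] for w in range(num_workers)]


def files_shared_by_workers(files: List[str], num_workers: int) -> List[str]:
    """Files that more than one worker would start on under the rule above."""
    counts = Counter(first_file_per_worker(files, num_workers))
    return sorted(name for name, n in counts.items() if n > 1)


def check_file_list(files: List[str], num_workers: int) -> None:
    """Fail fast when the file list is shorter than the number of reader workers."""
    if len(files) < num_workers:
        raise ValueError(
            f"{len(files)} data files for {num_workers} workers; "
            "provide at least one file per worker"
        )


if __name__ == "__main__":
    print(files_shared_by_workers(["part0.parquet", "part1.parquet"], num_workers=4))
```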
- - - - - - - \ No newline at end of file diff --git a/review/pr-458/search.html b/review/pr-458/search.html deleted file mode 100644 index 879c518e68..0000000000 --- a/review/pr-458/search.html +++ /dev/null @@ -1,153 +0,0 @@ - - - - - - Search — Merlin HugeCTR documentation - - - - - - - - - - - - - - - - - - - - - - - - - - -
- - - - - - - - - - - - \ No newline at end of file diff --git a/review/pr-458/searchindex.js b/review/pr-458/searchindex.js deleted file mode 100644 index c40c67685d..0000000000 --- a/review/pr-458/searchindex.js +++ /dev/null @@ -1 +0,0 @@ -Search.setIndex({"docnames": ["QAList", "additional_resources", "api/hugectr_layer_book", "api/index", "api/python_interface", "hierarchical_parameter_server/hps_database_backend", "hierarchical_parameter_server/hps_dlrm_benchmark", "hierarchical_parameter_server/hps_tf_api/index", "hierarchical_parameter_server/hps_tf_api/initialize", "hierarchical_parameter_server/hps_tf_api/layers", "hierarchical_parameter_server/hps_tf_user_guide", "hierarchical_parameter_server/hps_torch_api/index", "hierarchical_parameter_server/hps_torch_api/lookup_layer", "hierarchical_parameter_server/hps_torch_user_guide", "hierarchical_parameter_server/hps_trt_api/hps_plugin", "hierarchical_parameter_server/hps_trt_api/hps_plugin_creator", "hierarchical_parameter_server/hps_trt_api/index", "hierarchical_parameter_server/hps_trt_user_guide", "hierarchical_parameter_server/index", "hierarchical_parameter_server/profiling_hps", "hps_tf/notebooks/hierarchical_parameter_server_demo", "hps_tf/notebooks/hps_multi_table_sparse_input_demo", "hps_tf/notebooks/hps_pretrained_model_training_demo", "hps_tf/notebooks/hps_table_fusion_demo", "hps_tf/notebooks/hps_tensorflow_triton_deployment_demo", "hps_tf/notebooks/index", "hps_tf/notebooks/sok_to_hps_dlrm_demo", "hps_tf/notebooks/sok_train_demo", "hps_torch/notebooks/hps_torch_demo", "hps_torch/notebooks/index", "hps_trt/notebooks/benchmark_tf_trained_large_model", "hps_trt/notebooks/demo_for_hugectr_trained_model", "hps_trt/notebooks/demo_for_pytorch_trained_model", "hps_trt/notebooks/demo_for_tf_trained_model", "hps_trt/notebooks/index", "hugectr_contributor_guide", "hugectr_core_features", "hugectr_talks_blogs", "hugectr_user_guide", "index", "notebooks/embedding_collection", "notebooks/hps_demo", 
"notebooks/hugectr_e2e_demo_with_nvtabular", "notebooks/index", "notebooks/multi-modal-data/00-Intro", "notebooks/multi-modal-data/01-Download-Convert", "notebooks/multi-modal-data/02-Data-Enrichment", "notebooks/multi-modal-data/03-Feature-Extraction-Poster", "notebooks/multi-modal-data/04-Feature-Extraction-Text", "notebooks/multi-modal-data/05-Create-Feature-Store", "notebooks/multi-modal-data/06-ETL-with-NVTabular", "notebooks/multi-modal-data/07-Training-with-HugeCTR", "notebooks/multi-modal-data/index", "notebooks/training_with_remote_filesystem", "performance", "release_notes", "sparse_operation_kit"], "filenames": ["QAList.md", "additional_resources.md", "api/hugectr_layer_book.md", "api/index.rst", "api/python_interface.md", "hierarchical_parameter_server/hps_database_backend.md", "hierarchical_parameter_server/hps_dlrm_benchmark.md", "hierarchical_parameter_server/hps_tf_api/index.rst", "hierarchical_parameter_server/hps_tf_api/initialize.rst", "hierarchical_parameter_server/hps_tf_api/layers.rst", "hierarchical_parameter_server/hps_tf_user_guide.md", "hierarchical_parameter_server/hps_torch_api/index.rst", "hierarchical_parameter_server/hps_torch_api/lookup_layer.md", "hierarchical_parameter_server/hps_torch_user_guide.md", "hierarchical_parameter_server/hps_trt_api/hps_plugin.md", "hierarchical_parameter_server/hps_trt_api/hps_plugin_creator.md", "hierarchical_parameter_server/hps_trt_api/index.rst", "hierarchical_parameter_server/hps_trt_user_guide.md", "hierarchical_parameter_server/index.md", "hierarchical_parameter_server/profiling_hps.md", "hps_tf/notebooks/hierarchical_parameter_server_demo.ipynb", "hps_tf/notebooks/hps_multi_table_sparse_input_demo.ipynb", "hps_tf/notebooks/hps_pretrained_model_training_demo.ipynb", "hps_tf/notebooks/hps_table_fusion_demo.ipynb", "hps_tf/notebooks/hps_tensorflow_triton_deployment_demo.ipynb", "hps_tf/notebooks/index.md", "hps_tf/notebooks/sok_to_hps_dlrm_demo.ipynb", "hps_tf/notebooks/sok_train_demo.ipynb", 
"hps_torch/notebooks/hps_torch_demo.ipynb", "hps_torch/notebooks/index.md", "hps_trt/notebooks/benchmark_tf_trained_large_model.ipynb", "hps_trt/notebooks/demo_for_hugectr_trained_model.ipynb", "hps_trt/notebooks/demo_for_pytorch_trained_model.ipynb", "hps_trt/notebooks/demo_for_tf_trained_model.ipynb", "hps_trt/notebooks/index.md", "hugectr_contributor_guide.md", "hugectr_core_features.md", "hugectr_talks_blogs.md", "hugectr_user_guide.md", "index.rst", "notebooks/embedding_collection.ipynb", "notebooks/hps_demo.ipynb", "notebooks/hugectr_e2e_demo_with_nvtabular.ipynb", "notebooks/index.md", "notebooks/multi-modal-data/00-Intro.ipynb", "notebooks/multi-modal-data/01-Download-Convert.ipynb", "notebooks/multi-modal-data/02-Data-Enrichment.ipynb", "notebooks/multi-modal-data/03-Feature-Extraction-Poster.ipynb", "notebooks/multi-modal-data/04-Feature-Extraction-Text.ipynb", "notebooks/multi-modal-data/05-Create-Feature-Store.ipynb", "notebooks/multi-modal-data/06-ETL-with-NVTabular.ipynb", "notebooks/multi-modal-data/07-Training-with-HugeCTR.ipynb", "notebooks/multi-modal-data/index.md", "notebooks/training_with_remote_filesystem.ipynb", "performance.md", "release_notes.md", "sparse_operation_kit.md"], "titles": ["Questions and Answers", "Additional Resources", "HugeCTR Layer Classes and Methods", "HugeCTR API Documentation", "HugeCTR Python Interface", "Hierarchical Parameter Server Database Backend", "Benchmark the DLRM Model with HPS", "Hierarchical Parameter Server API", "HPS Initialize", "HPS Layers", "Hierarchical Parameter Server Plugin for TensorFlow", "HPS Plugin for Torch API", "HPS Plugin for Torch", "Hierarchical Parameter Server Plugin for Torch", "HPS Plugin", "HPS Plugin Creator", "HPS Plugin for TensorRT API", "Hierarchical Parameter Server Plugin for TensorRT", "Hierarchical Parameter Server", "Profiling HPS", "Hierarchical Parameter Server Demo", "HPS for Multiple Tables and Sparse Inputs", "HPS Pretrained Model Training Demo", "HPS Table Fusion 
Demo", "Deploy SavedModel using HPS with Triton TensorFlow Backend", "Hierarchical Parameter Server Notebooks", "SOK to HPS DLRM Demo", "SOK Train DLRM Demo", "HPS Torch Demo", "Hierarchical Parameter Server Notebooks", "HPS TensorRT Plugin Benchmark for TensorFlow Large Model", "HPS TensorRT Plugin Demo for HugeCTR Trained Model", "HPS TensorRT Plugin Demo for PyTorch Trained Model", "HPS TensorRT Plugin Demo for TensorFlow Trained Model", "HPS Plugin for TensorRT Notebooks", "Contributing to HugeCTR", "HugeCTR Core Features", "HugeCTR Talks and Blogs", "Introduction to HugeCTR", "Merlin HugeCTR", "HugeCTR Embedding Collection", "Hierarchical Parameter Server Demo", "HugeCTR End-end Example with NVTabular", "HugeCTR Example Notebooks", "Training Recommender Systems on Multi-modal Data", "MovieLens-25M: Download and Convert", "MovieLens Data Enrichment", "Movie Poster Feature Extraction with ResNet", "Movie Synopsis Feature Extraction with Bart text summarization", "Creating Multi-Modal Movie Feature Store", "ETL with NVTabular", "Training HugeCTR Model with Pre-trained Embeddings", "Multi-modal Example Notebooks", "HugeCTR Training with Remote File System Example", "Performance", "Release Notes", "Sparse Operation Kit"], "terms": {"try": [0, 5, 31, 36, 41, 46, 47, 55], "provid": [0, 2, 4, 5, 10, 13, 17, 18, 20, 21, 22, 25, 26, 28, 29, 30, 31, 34, 35, 36, 38, 39, 40, 41, 43, 44, 50, 52, 53, 54, 55, 56], "recommend": [0, 2, 4, 5, 18, 19, 20, 21, 22, 23, 24, 26, 27, 28, 30, 36, 37, 38, 39, 43, 45, 50, 52, 55], "variou": [0, 5, 13, 36, 38], "industri": [0, 36], "high": [0, 3, 5, 10, 13, 17, 18, 20, 21, 22, 24, 26, 27, 28, 32, 33, 36, 37, 38, 46, 50, 54, 55], "effici": [0, 4, 5, 13, 22, 30, 54, 55], "solut": [0, 18, 36, 38, 55], "onlin": [0, 4, 5, 8, 10, 30, 36, 55], "offlin": [0, 4, 5, 43, 55], "also": [0, 2, 4, 5, 6, 14, 19, 20, 21, 22, 24, 25, 26, 30, 36, 38, 40, 41, 43, 44, 46, 50, 51, 55], "refer": [0, 2, 4, 5, 6, 10, 15, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 
29, 30, 31, 32, 33, 34, 35, 36, 38, 41, 42, 43, 44, 53, 55], "design": [0, 2, 4, 5, 13, 35, 36, 37, 38, 39, 50, 55, 56], "develop": [0, 5, 25, 29, 34, 36, 37, 39, 40, 43, 55], "want": [0, 2, 4, 5, 9, 30, 34, 35, 36, 45, 50, 53, 55], "port": [0, 4, 5, 25, 29, 30, 34, 41, 42, 43, 53, 55], "optim": [0, 2, 4, 19, 20, 21, 22, 23, 24, 25, 26, 27, 30, 31, 32, 33, 37, 40, 41, 42, 51, 53, 54, 55], "current": [0, 2, 4, 8, 20, 21, 22, 26, 30, 32, 33, 38, 41, 45, 46, 47, 50, 53, 55], "v2": [0, 2, 4, 24, 47, 55], "dnn": [0, 20, 21, 22, 23, 24, 36, 54, 55, 56], "wdl": [0, 5, 19, 38, 41, 54], "dcn": [0, 2, 4, 36, 38, 54, 55], "deepfm": [0, 38, 55], "dlrm": [0, 10, 17, 25, 30, 31, 32, 33, 38, 54, 55], "variant": [0, 30, 38, 55], "wide": [0, 38, 54, 55], "system": [0, 2, 4, 5, 10, 13, 17, 18, 20, 21, 22, 23, 24, 27, 28, 31, 32, 33, 35, 36, 37, 38, 41, 45, 50, 51, 52, 54, 55], "directori": [0, 4, 5, 10, 19, 24, 25, 29, 30, 31, 32, 33, 34, 35, 36, 38, 40, 41, 42, 43, 45, 47, 50, 54, 55], "repositori": [0, 4, 10, 18, 19, 24, 31, 32, 33, 35, 36, 38, 39, 40, 41, 53, 54, 55], "github": [0, 4, 19, 23, 25, 28, 29, 30, 33, 34, 35, 36, 38, 39, 40, 41, 43, 47, 50, 53, 54, 55], "them": [0, 2, 4, 9, 13, 20, 21, 22, 23, 24, 26, 30, 35, 50, 55], "express": [0, 2, 5, 20, 21, 22, 23, 24, 26, 27, 28, 40, 42, 53], "confin": 0, "aforement": [0, 2], "you": [0, 2, 4, 5, 8, 9, 10, 13, 15, 17, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 38, 40, 41, 42, 43, 45, 46, 49, 50, 53, 55, 56], "your": [0, 2, 4, 5, 10, 13, 17, 19, 22, 25, 29, 30, 31, 32, 33, 34, 36, 38, 41, 43, 53, 55], "own": [0, 2, 25, 29, 34, 38], "combin": [0, 2, 4, 5, 9, 10, 13, 17, 20, 21, 22, 26, 27, 28, 31, 32, 33, 35, 36, 40, 41, 42, 44, 51, 53, 55], "ha": [0, 2, 4, 5, 6, 9, 12, 14, 15, 20, 21, 22, 23, 24, 26, 27, 28, 31, 32, 33, 36, 38, 41, 50, 54, 55], "tf": [0, 6, 8, 9, 10, 25, 26, 27, 35, 41, 54, 55], "yet": [0, 4, 20, 21, 24, 26, 30, 55], "compat": [0, 2, 5, 10, 20, 21, 22, 24, 30, 33, 36, 41, 42, 55, 56], 
"export": [0, 5, 30, 32, 40, 55], "follow": [0, 2, 4, 5, 6, 8, 10, 13, 15, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 38, 41, 42, 43, 47, 50, 53, 54, 55], "instruct": [0, 19, 20, 21, 22, 23, 24, 26, 30, 33, 35, 40, 41, 42, 53, 55], "dump_to_tf": [0, 55], "tutori": [0, 55], "ye": [0, 5, 19, 41], "check": [0, 2, 4, 10, 13, 17, 20, 21, 22, 23, 24, 26, 27, 28, 30, 31, 32, 33, 40, 41, 42, 43, 53, 55, 56], "out": [0, 2, 4, 5, 22, 24, 30, 32, 33, 36, 41, 43, 50, 55], "dcn2node": 0, "more": [0, 2, 4, 5, 10, 17, 20, 21, 22, 23, 24, 26, 27, 28, 30, 31, 32, 33, 35, 36, 38, 39, 40, 41, 43, 50, 54, 55], "detail": [0, 3, 4, 6, 15, 19, 20, 21, 22, 23, 24, 26, 27, 28, 30, 31, 32, 33, 35, 36, 40, 41, 43, 55, 56], "across": [0, 2, 4, 5, 10, 36, 38, 39, 41, 50, 55], "so": [0, 2, 4, 5, 10, 13, 17, 20, 21, 22, 23, 24, 26, 30, 31, 32, 33, 35, 36, 40, 41, 45, 50, 55], "have": [0, 2, 4, 5, 9, 10, 18, 20, 21, 24, 26, 30, 35, 36, 40, 41, 42, 49, 50, 53, 54, 55], "veri": [0, 36, 40, 55], "larg": [0, 2, 5, 6, 10, 13, 17, 18, 34, 36, 37, 38, 40, 41, 46, 48, 50, 55, 56], "just": [0, 30, 42, 47, 55], "mani": [0, 5, 30, 55, 56], "That": [0, 36, 55], "why": [0, 36], "name": [0, 2, 4, 5, 9, 12, 14, 15, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 38, 40, 41, 42, 43, 46, 47, 50, 51, 53, 55], "suppos": [0, 2, 41], "1tb": [0, 30, 36, 38, 54, 55], "16xv100": 0, "32gb": [0, 20, 21, 22, 23, 24, 25, 26, 29, 31, 33, 34, 40, 41, 42, 43, 53], "take": [0, 4, 5, 6, 19, 30, 31, 32, 33, 38, 41, 45, 46, 50, 55, 56], "case": [0, 2, 4, 5, 6, 30, 32, 36, 41, 53, 54, 55, 56], "comput": [0, 2, 4, 5, 6, 9, 20, 21, 22, 23, 24, 26, 30, 31, 32, 33, 35, 40, 41, 50, 55], "0": [0, 2, 4, 5, 6, 8, 9, 10, 13, 15, 17, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 38, 40, 41, 42, 43, 45, 46, 47, 48, 49, 50, 51, 53, 54], "v100": [0, 10, 13, 17, 19, 20, 21, 22, 23, 24, 25, 26, 29, 31, 33, 34, 35, 38, 40, 41, 42, 43, 51, 53, 54, 55], "t4": [0, 10, 13, 
17, 38], "A": [0, 2, 4, 5, 9, 19, 25, 29, 30, 34, 36, 37, 38, 40, 41, 43, 50, 55], "machin": [0, 2, 4, 5, 6, 25, 29, 34, 37, 41, 42, 43, 54, 55, 56], "mandatori": 0, "achiev": [0, 2, 4, 5, 6, 13, 28, 30, 36, 41, 54, 55], "best": [0, 2, 4, 5, 6, 22, 28, 36], "perform": [0, 2, 4, 5, 6, 9, 10, 12, 13, 17, 18, 19, 20, 21, 22, 23, 24, 26, 28, 30, 33, 34, 35, 36, 37, 41, 55], "exploit": [0, 10], "nvswitch": [0, 2], "inter": [0, 36, 41, 54, 55], "bandwidth": [0, 2, 4, 5, 36, 55], "ucx": [0, 42], "howev": [0, 2, 5, 10, 17, 30, 35, 38, 41, 43, 49, 50, 55], "rdma": [0, 4, 36, 55], "maxim": [0, 5, 30], "transact": [0, 36, 54, 55], "approach": [0, 30, 41], "offload": [0, 5, 55], "workload": [0, 4, 55, 56], "oper": [0, 2, 4, 5, 19, 20, 21, 22, 23, 24, 26, 30, 33, 37, 40, 41, 43, 50, 55], "mainli": 0, "decid": [0, 2, 4], "kind": [0, 4, 5, 20, 21, 22, 23, 24, 26, 27, 28, 30, 31, 32, 33, 40, 42, 53, 55], "o": [0, 2, 4, 5, 20, 21, 22, 23, 24, 26, 27, 28, 30, 31, 32, 33, 41, 42, 45, 46, 50, 51, 55], "devic": [0, 4, 5, 8, 20, 21, 22, 23, 24, 26, 28, 30, 31, 32, 33, 40, 41, 42, 47, 50, 51, 53, 54, 55], "dataset": [0, 2, 3, 5, 9, 20, 21, 22, 23, 24, 26, 27, 28, 30, 31, 32, 33, 36, 40, 41, 42, 43, 44, 46, 50, 52, 53, 54, 55], "section": [0, 4, 5, 30, 33, 35, 40, 45, 46, 49, 51, 55], "api": [0, 2, 8, 15, 19, 20, 21, 22, 23, 24, 26, 27, 28, 31, 36, 37, 41, 43, 46, 47, 50, 53, 55], "document": [0, 2, 4, 5, 15, 18, 19, 26, 27, 30, 33, 35, 36, 39, 41, 43, 55, 56], "introduc": [0, 2, 4, 37, 43, 55], "our": [0, 4, 5, 6, 10, 17, 25, 29, 30, 34, 35, 36, 38, 39, 43, 45, 51, 53, 54, 55], "first": [0, 2, 4, 5, 8, 10, 18, 19, 20, 21, 22, 23, 24, 26, 27, 28, 30, 31, 32, 33, 38, 45, 46, 47, 48, 50, 51, 55], "version": [0, 2, 4, 5, 6, 19, 20, 21, 22, 23, 24, 26, 27, 28, 30, 31, 32, 33, 40, 41, 42, 47, 49, 50, 51, 53, 54], "exampl": [0, 2, 4, 5, 8, 9, 15, 19, 25, 29, 30, 34, 35, 36, 38, 40, 41, 44, 50, 54, 55], "hashtabl": [0, 26, 27], "base": [0, 2, 4, 5, 9, 17, 19, 24, 25, 30, 31, 32, 33, 34, 36, 38, 
41, 45, 50, 52, 55], "dynam": [0, 2, 4, 5, 23, 33, 38, 41, 55], "insert": [0, 2, 5, 6, 19, 28, 30, 31, 32, 33, 38, 41, 55], "new": [0, 4, 5, 22, 24, 25, 38, 41, 42, 46, 50], "ad": [0, 2, 4, 41, 50, 54, 55], "runtim": [0, 5, 8, 10, 13, 17, 25, 29, 34, 38, 41, 42, 43, 55], "skip": [0, 31, 40, 41, 49, 55], "In": [0, 2, 4, 5, 6, 8, 9, 20, 21, 22, 24, 26, 30, 31, 32, 33, 35, 36, 40, 41, 42, 45, 46, 47, 48, 50, 51, 53, 54, 55, 56], "field": [0, 2, 4, 5, 6, 15, 20, 21, 22, 24, 26, 27, 32, 33, 36, 41, 44, 50], "The": [0, 2, 4, 5, 6, 8, 9, 10, 12, 13, 14, 15, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 28, 29, 30, 31, 32, 33, 34, 35, 36, 38, 40, 41, 42, 43, 45, 46, 50, 51, 52, 53, 54, 55], "one": [0, 2, 4, 5, 6, 8, 9, 14, 20, 21, 22, 23, 24, 25, 26, 27, 29, 30, 31, 32, 33, 34, 35, 36, 41, 43, 54, 55], "hot": [0, 2, 4, 10, 19, 21, 24, 27, 31, 36, 41, 45, 50, 55], "number": [0, 2, 4, 5, 19, 20, 21, 22, 23, 24, 26, 27, 28, 30, 31, 32, 33, 36, 40, 41, 42, 44, 45, 51, 53, 54, 55], "specifi": [0, 2, 4, 5, 8, 9, 10, 12, 19, 20, 21, 22, 23, 24, 26, 27, 28, 30, 31, 32, 33, 35, 36, 40, 41, 42, 43, 53, 55], "slot_num": [0, 2, 4, 5, 20, 21, 22, 24, 26, 27, 30, 32, 33, 51], "There": [0, 2, 4, 10, 19, 30, 31, 32, 33, 38, 43, 55], "sub": [0, 3, 41, 55], "class": [0, 3, 9, 11, 14, 15, 20, 21, 22, 23, 24, 26, 27, 28, 30, 32, 33, 40, 46, 50, 55], "thei": [0, 4, 5, 20, 21, 22, 24, 26, 27, 45, 47, 55, 56], "distinguish": [0, 55], "method": [0, 3, 6, 33, 40, 41, 42, 53, 55], "distribut": [0, 2, 4, 5, 8, 10, 13, 17, 18, 19, 20, 21, 22, 23, 24, 26, 27, 28, 31, 36, 37, 38, 39, 40, 41, 42, 53, 55, 56], "call": [0, 2, 4, 8, 9, 20, 21, 22, 23, 24, 26, 27, 30, 32, 33, 41, 44, 50, 55], "local": [0, 4, 5, 17, 18, 19, 22, 23, 24, 26, 27, 28, 30, 31, 32, 33, 35, 36, 40, 41, 42, 43, 49, 53, 55], "mai": [0, 5, 20, 21, 22, 23, 24, 26, 27, 28, 30, 33, 35, 36, 37, 40, 42, 50, 53, 55], "accord": [0, 2, 4, 5, 24, 30, 31, 32, 33, 36, 55], "index": [0, 2, 4, 9, 12, 15, 24, 26, 27, 30, 31, 32, 33, 36, 42, 49, 51, 53, 
55], "regardless": [0, 36, 56], "mean": [0, 2, 4, 5, 9, 19, 21, 22, 25, 26, 27, 28, 29, 31, 32, 33, 34, 36, 40, 41, 43, 47, 48, 50, 55], "thu": [0, 2, 22, 24, 26, 40, 55], "smaller": [0, 5, 36, 55], "than": [0, 2, 4, 5, 9, 19, 24, 30, 36, 43, 50, 54, 55], "reduct": [0, 2, 9, 22, 36, 38], "per": [0, 2, 4, 5, 6, 31, 34, 41, 45, 55], "global": [0, 4, 5, 8, 9, 20, 21, 22, 24, 26, 27, 31, 32, 33, 36, 40, 41, 42, 50, 51, 53, 55], "reduc": [0, 2, 4, 5, 22, 30, 31, 32, 33, 36, 40, 41, 42, 51, 53, 55], "overal": [0, 5, 36], "much": [0, 36, 55], "less": [0, 4, 30, 36, 55], "made": [0, 2, 55], "some": [0, 2, 4, 5, 23, 24, 30, 35, 41, 43, 46, 48, 55], "larger": [0, 4, 5, 9, 36, 55], "trasact": 0, "iter": [0, 2, 4, 19, 20, 21, 22, 23, 24, 26, 27, 28, 31, 32, 33, 40, 41, 42, 50, 51, 53, 55], "after": [0, 2, 4, 5, 6, 10, 13, 17, 19, 20, 21, 22, 23, 24, 26, 27, 28, 30, 31, 32, 33, 35, 36, 42, 53], "forward": [0, 2, 4, 5, 6, 9, 12, 17, 28, 30, 32, 38, 41, 55], "kernel": [0, 4, 24, 30, 31, 32, 33, 46, 47, 48, 55], "function": [0, 2, 4, 5, 8, 9, 12, 19, 20, 21, 22, 24, 26, 27, 31, 32, 33, 35, 38, 41, 50, 55, 56], "collect": [0, 3, 4, 10, 13, 17, 19, 30, 38, 50, 55], "commun": [0, 2, 4, 5, 36, 55], "librari": [0, 4, 10, 13, 17, 18, 20, 21, 22, 23, 24, 26, 27, 28, 30, 31, 32, 33, 35, 38, 39, 41, 48, 50, 55], "nccl": [0, 4, 22, 31, 36, 40, 41, 42, 51, 53, 55], "should": [0, 2, 4, 5, 8, 9, 10, 14, 20, 21, 22, 24, 28, 30, 31, 32, 33, 35, 36, 41, 47, 48, 49, 50, 55], "sourc": [0, 3, 8, 9, 10, 13, 17, 19, 31, 36, 38, 39, 40, 41, 42, 43, 51, 53, 55], "where": [0, 2, 5, 9, 17, 36, 40, 41, 44, 46, 55], "spars": [0, 3, 4, 5, 9, 19, 22, 24, 25, 26, 27, 30, 31, 37, 38, 40, 41, 42, 43, 51, 53, 55], "an": [0, 2, 4, 5, 8, 9, 15, 18, 19, 20, 21, 22, 23, 24, 26, 27, 28, 30, 31, 33, 35, 36, 37, 38, 39, 41, 42, 43, 46, 47, 50, 53, 55], "arrai": [0, 2, 4, 15, 20, 21, 22, 24, 26, 28, 31, 32, 33, 41, 42, 47, 51, 55], "belong": [0, 4], "last": [0, 2, 4, 5, 9, 20, 21, 22, 26, 33, 41, 47, 48], "second": [0, 
2, 4, 5, 21, 23, 28, 30, 33, 55], "element": [0, 2, 4, 51, 55], "below": [0, 2, 4, 6, 15, 19, 25, 29, 30, 34, 35, 43, 48, 49, 51], "top": [0, 2, 4, 26, 27, 32, 33, 54, 55], "data1": [0, 2, 4, 40, 41, 51, 53], "type": [0, 3, 4, 5, 8, 9, 10, 14, 15, 17, 20, 21, 22, 23, 24, 26, 27, 28, 30, 31, 33, 36, 38, 40, 41, 42, 44, 50, 51, 53, 55], "distributedslot": 0, "max_feature_num_per_sampl": 0, "data2": [0, 40, 41], "binari": [0, 2, 4, 5, 10, 20, 21, 22, 23, 24, 26, 30, 33, 41, 50, 51, 55], "raw": [0, 2, 3, 23, 24, 28, 30, 31, 32, 33, 38, 41, 44, 50, 51, 55], "snapshot": [0, 4, 31, 40, 41, 42, 51, 53], "json": [0, 2, 4, 5, 6, 8, 9, 10, 12, 15, 19, 20, 21, 22, 23, 24, 26, 27, 28, 41, 42, 47, 50, 51, 53, 55], "interv": [0, 4, 5, 31, 40, 41, 42, 51, 53], "checkpoint": [0, 30, 47], "prefix": [0, 4, 5, 41], "snapshot_prefix": [0, 2, 4, 31, 40, 41, 42, 53], "modifi": [0, 4, 17, 30, 33, 41, 55], "dense_model_fil": [0, 4, 5, 41], "sparse_model_fil": [0, 4, 5, 41], "solver": [0, 2, 3, 31, 40, 41, 42, 51, 53, 55], "write": [0, 4, 5, 15, 20, 21, 22, 23, 24, 26, 27, 28, 30, 31, 32, 33, 40, 41, 42, 51, 53, 55], "script": [0, 4, 6, 28, 30, 36, 38, 41, 43, 47, 53, 54, 55], "demonstr": [0, 4, 13, 17, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 38, 41, 43, 44, 52, 55], "uniqu": [0, 2, 4, 5, 51, 55], "preprocess": [0, 4, 30, 40, 43, 54, 55], "e": [0, 4, 5, 6, 9, 10, 20, 21, 22, 23, 24, 26, 27, 30, 31, 32, 33, 35, 36, 41, 46, 47], "g": [0, 2, 4, 5, 10, 20, 21, 22, 23, 24, 25, 26, 27, 29, 30, 31, 32, 33, 34, 35, 36, 41, 53], "offset": [0, 2, 4, 27, 55], "hash": [0, 2, 5, 31, 38, 41, 42, 51, 53, 55], "nnz": [0, 4, 21, 22, 26, 31, 41], "look": [0, 4, 5, 9, 19, 30, 41, 45, 49, 50, 55], "up": [0, 2, 4, 5, 9, 21, 24, 25, 28, 29, 30, 31, 33, 34, 36, 40, 41, 42, 50, 51, 53, 54, 55], "firstli": [0, 53], "guid": [0, 4, 22, 24, 30, 31, 32, 33, 38, 55], "secondli": [0, 53], "data_gener": [0, 4, 31, 38, 41, 55], "gener": [0, 2, 3, 5, 6, 14, 19, 20, 21, 22, 23, 24, 26, 27, 30, 32, 33, 35, 
50, 54, 55], "random": [0, 4, 5, 20, 21, 22, 23, 24, 26, 27, 28, 30, 31, 32, 33, 38, 45, 49, 55], "see": [0, 2, 4, 5, 20, 21, 22, 23, 24, 26, 27, 28, 30, 31, 32, 33, 36, 39, 40, 41, 42, 43, 45, 50, 53, 54, 55], "start": [0, 2, 4, 5, 10, 13, 17, 19, 23, 24, 26, 28, 30, 31, 32, 33, 36, 38, 40, 41, 42, 46, 47, 51, 53, 55], "readm": [0, 40, 41, 55], "thirdli": 0, "huge_ctr": 0, "your_config": 0, "alloc": [0, 4, 5, 26, 30, 31, 32, 33, 40, 41, 42, 55], "accordingli": [0, 33, 55], "necessarili": 0, "exact": [0, 2], "depend": [0, 2, 4, 5, 10, 13, 17, 23, 28, 31, 35, 38, 40, 41, 42, 45, 50, 51, 53, 55], "vocabulari": [0, 2, 21, 31, 40, 42, 51, 53, 55], "workspac": [0, 2, 19, 30, 47, 48, 55], "calcul": [0, 2, 4, 5, 42, 50, 55], "tool": [0, 4, 17, 19, 30, 31, 32, 33, 40, 41, 54, 55], "workspace_s": 0, "usual": [0, 2, 53], "real": [0, 5, 19, 20, 21, 22, 23, 24, 26, 28, 30, 31, 32, 33, 41, 45, 55], "becaus": [0, 2, 5, 6, 19, 26, 27, 30, 41, 50, 54, 55, 56], "non": [0, 2, 4, 5, 10, 20, 21, 22, 23, 24, 26, 30, 31, 33, 35, 40, 41, 42, 43, 53, 55], "uniform": [0, 2, 4, 27, 38], "kei": [0, 2, 4, 5, 9, 10, 12, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 30, 31, 32, 33, 36, 38, 40, 41, 46, 47, 51, 53, 55], "argument": [0, 2, 4, 8, 12, 19, 20, 21, 22, 23, 24, 26, 27, 32, 33, 35, 40, 41, 43, 50, 55], "usag": [0, 3, 4, 5, 10, 19, 30, 31, 32, 33, 40, 42, 55], "replac": [0, 5, 6, 17, 20, 21, 22, 24, 26, 30, 31, 32, 33, 55], "avoid": [0, 2, 4, 5, 20, 21, 22, 26, 30, 33, 36, 41, 55], "wast": [0, 2], "caus": [0, 2, 4, 5, 32, 55], "imbalanc": [0, 2], "add": [0, 3, 10, 13, 15, 17, 19, 21, 25, 29, 30, 31, 34, 35, 38, 40, 41, 42, 43, 50, 51, 53, 55], "maximum": [0, 2, 4, 5, 6, 28, 30, 51, 55], "equat": [0, 4], "_size": [0, 4], "_arrai": [0, 4], "k": [0, 2, 4, 47], "max": [0, 4, 5, 19, 21, 22, 23, 26, 28, 30, 31, 32, 33, 41], "limits_i": [0, 4], "k_i": [0, 4], "list": [0, 2, 3, 5, 6, 9, 20, 21, 24, 26, 27, 28, 30, 32, 33, 40, 41, 42, 45, 46, 47, 50, 51, 55], "inform": [0, 2, 4, 5, 10, 19, 23, 24, 
28, 35, 36, 38, 39, 40, 41, 46, 50, 53, 54, 55], "about": [0, 3, 5, 6, 10, 15, 19, 20, 21, 22, 23, 24, 26, 27, 28, 30, 31, 32, 33, 35, 36, 39, 41, 50, 55], "relat": [0, 2, 4, 5, 24, 35, 53, 55], "amount": [0, 5, 40, 55, 56], "localizedslotsparseembeddinghash": [0, 3, 51, 55], "If": [0, 2, 4, 5, 9, 19, 25, 29, 30, 31, 32, 33, 34, 35, 36, 40, 41, 43, 45, 46, 49, 55], "help": [0, 2, 4, 5, 10, 19, 36, 40, 42, 53, 55], "altern": [0, 4, 5, 30, 43, 55], "both": [0, 2, 4, 5, 6, 26, 27, 35, 36, 41, 49, 50, 53, 54, 55], "localizedslotsparseembeddingonehot": [0, 3, 4], "hybridsparseembed": [0, 4], "workspace_size_per_gpu_in_md": 0, "while": [0, 2, 4, 5, 6, 13, 18, 20, 21, 24, 26, 27, 30, 31, 32, 33, 41, 46, 50, 54, 55, 56], "highli": [0, 2, 55], "reli": [0, 4, 5, 19, 35, 41, 55], "pcie": [0, 34], "connect": [0, 2, 4, 5, 21, 22, 23, 24, 25, 26, 29, 30, 33, 34, 35, 38, 41, 42, 43, 55], "insid": [0, 4, 22, 30, 55], "expect": [0, 50, 55], "150gb": 0, "direct": [0, 50], "It": [0, 2, 4, 5, 8, 10, 12, 13, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 30, 36, 38, 40, 41, 43, 50, 55, 56], "3x": [0, 55], "pci": [0, 20, 21, 22, 23, 24, 26, 33, 41], "convert": [0, 6, 10, 17, 20, 21, 22, 24, 38, 44, 51, 52, 54, 55], "facilit": [0, 10, 30, 55], "process": [0, 2, 4, 5, 8, 24, 30, 31, 32, 33, 36, 40, 42, 45, 46, 49, 54, 55, 56], "save_params_to_fil": [0, 3], "familiar": [0, 4], "pre": [0, 5, 10, 25, 52, 53, 55], "hugectr_criteo": 0, "ipynb": [0, 6, 21, 22, 23, 24, 25, 26, 27, 29, 34, 36, 43, 44, 45, 46, 55], "inher": [0, 38], "extra": [0, 4, 46, 47, 48, 55], "abstract": [0, 2, 4, 50, 55], "awai": [0, 2, 50, 55], "slice": [0, 3, 33, 53, 55], "code": [0, 2, 8, 15, 20, 21, 22, 23, 24, 26, 27, 28, 30, 31, 32, 33, 50, 53, 55, 56], "cooper": 0, "thread": [0, 4, 5, 8, 20, 21, 22, 23, 24, 26, 31, 32, 33, 35, 38, 41, 42, 46, 54, 55], "cta": 0, "launch": [0, 5, 6, 8, 10, 13, 17, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 31, 32, 33, 34, 38, 41, 43, 55], "foremost": 0, "exce": [0, 2, 5, 18, 55], "block": 
[0, 8, 30, 31, 32, 33, 41, 46, 55], "would": [0, 5, 32, 55], "better": [0, 2, 4, 5, 30, 42, 55], "warp": [0, 55], "sake": [0, 41], "occup": [0, 19], "still": [0, 4, 5, 24, 55], "freeli": 0, "architectur": [0, 10, 13, 17, 18, 20, 28, 36, 38, 47, 50, 55], "long": [0, 4, 5, 10, 32, 41, 55], "compli": 0, "limit": [0, 3, 5, 15, 20, 21, 22, 23, 24, 26, 27, 28, 30, 40, 42, 53, 55, 56], "share": [0, 2, 5, 17, 30, 35, 41, 53, 55], "rank": [0, 2, 5, 9, 55], "ipc": [0, 19, 43, 47, 48, 55], "pin": [0, 55], "page": [0, 30, 37, 41, 55], "lock": [0, 55], "resourc": [0, 4, 5, 10, 26, 50, 55, 56], "issu": [0, 2, 5, 30, 33, 35], "option": [0, 2, 4, 5, 6, 19, 22, 24, 30, 33, 35, 36, 38, 41, 42, 46, 53, 55], "docker": [0, 5, 10, 13, 17, 19, 34, 35, 38, 41, 47, 48, 53, 55], "command": [0, 6, 10, 13, 17, 19, 24, 25, 29, 30, 31, 32, 33, 34, 35, 38, 41, 43, 49, 53, 55], "host": [0, 4, 5, 10, 13, 17, 18, 19, 25, 29, 31, 32, 33, 34, 38, 41, 42, 43, 47, 48, 54, 55], "ulimit": [0, 19, 43, 55], "memlock": [0, 19, 43, 55], "stack": [0, 19, 28, 43, 47], "67108864": [0, 19, 43], "leverag": [0, 4, 6, 8, 10, 13, 17, 22, 24, 25, 30, 36, 50, 55], "hirarch": 0, "paramet": [0, 2, 4, 8, 9, 15, 19, 21, 22, 23, 24, 26, 27, 28, 30, 31, 32, 33, 37, 43, 53, 55, 56], "cach": [0, 4, 6, 10, 13, 17, 18, 19, 20, 21, 22, 23, 24, 26, 28, 30, 31, 32, 33, 35, 36, 37, 41, 47, 55], "hierarch": [0, 2, 4, 19, 21, 22, 23, 24, 26, 27, 28, 30, 31, 32, 33, 37, 43, 55], "storag": [0, 4, 5, 10, 13, 17, 18, 20, 28, 30, 31, 32, 33, 36, 40, 41, 42, 55], "encompass": [0, 10, 17], "databas": [0, 4, 10, 13, 17, 18, 19, 20, 21, 22, 23, 24, 26, 28, 30, 31, 32, 33, 36, 41, 46, 55], "backend": [0, 4, 6, 10, 13, 17, 18, 19, 20, 21, 22, 23, 25, 26, 27, 28, 30, 31, 32, 33, 35, 36, 41, 47, 53, 55], "updat": [0, 4, 19, 22, 24, 27, 32, 35, 41, 43, 50, 53, 55], "manag": [0, 5, 10, 26, 30, 31, 32, 33, 41, 46, 55], "exhaust": [0, 5, 55], "constantli": [0, 10], "trigger": [0, 5, 8, 22, 33, 36, 55], "thi": [0, 2, 4, 5, 6, 8, 9, 10, 12, 13, 14, 15, 
19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 53, 54, 55, 56], "messag": [0, 5, 19, 33, 35, 41, 42, 55], "scenario": [0, 36, 55, 56], "either": [0, 4, 5, 9, 19, 20, 21, 22, 23, 24, 26, 27, 28, 38, 40, 42, 53], "enforc": [0, 50], "mode": [0, 2, 4, 5, 6, 10, 13, 17, 19, 24, 25, 29, 30, 31, 32, 33, 34, 35, 36, 38, 40, 41, 42, 43, 51, 53, 55], "hit_rate_threshold": [0, 4, 5, 8, 19, 20, 21, 22, 23, 24, 26, 28, 30, 31, 32, 33, 41], "extend": [0, 2, 5, 10, 21, 22, 26, 27, 55], "enough": [0, 2, 55], "number_of_worker_buffers_in_pool": [0, 4, 5], "pleas": [0, 2, 4, 5, 6, 10, 19, 20, 21, 22, 23, 24, 26, 27, 28, 30, 31, 32, 33, 35, 40, 41, 46, 53, 55, 56], "hp": [0, 4, 18, 25, 27, 29, 36, 43, 55], "talk": [1, 55], "blog": [1, 54, 55], "question": [1, 35], "answer": 1, "contribut": [1, 55], "hugectr": [1, 5, 13, 17, 19, 20, 21, 22, 23, 24, 26, 27, 28, 30, 32, 33, 44, 50, 52, 55], "differ": [2, 4, 5, 10, 17, 18, 19, 22, 23, 28, 31, 33, 34, 35, 36, 40, 41, 45, 55], "correspond": [2, 4, 5, 9, 36, 41, 50, 55], "python": [2, 3, 5, 10, 13, 15, 17, 20, 21, 22, 23, 24, 26, 27, 28, 30, 31, 32, 33, 35, 36, 37, 38, 41, 42, 43, 49, 51, 53, 55, 56], "descript": [2, 15, 40, 42, 44, 55], "each": [2, 4, 5, 8, 9, 10, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 34, 36, 38, 40, 41, 42, 43, 50, 51, 53, 54, 55], "includ": [2, 4, 5, 10, 17, 18, 19, 36, 38, 41, 44, 45, 46, 55], "its": [2, 4, 5, 6, 9, 20, 21, 22, 24, 25, 29, 30, 33, 34, 36, 38, 43, 46, 55], "data": [2, 3, 5, 6, 9, 14, 19, 20, 21, 22, 24, 26, 27, 32, 33, 35, 36, 43, 45, 47, 50, 51, 52, 54, 55, 56], "model": [2, 3, 8, 9, 10, 12, 13, 15, 17, 18, 19, 23, 24, 25, 28, 34, 35, 37, 38, 41, 43, 44, 47, 50, 52, 54, 55, 56], "instanc": [2, 4, 5, 6, 9, 30, 33, 36, 46, 47, 48, 55, 56], "sparseembed": [2, 4, 31, 41, 42, 51, 53, 55], "denselay": [2, 4, 31, 40, 41, 42, 51, 53, 55], "can": [2, 4, 5, 6, 8, 9, 10, 13, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 
31, 32, 33, 34, 35, 36, 38, 40, 41, 43, 44, 45, 46, 50, 53, 54, 55, 56], "access": [2, 5, 10, 13, 17, 18, 25, 29, 30, 31, 34, 40, 41, 42, 43, 51, 53, 55], "label_dim": [2, 4, 5, 31, 40, 41, 42, 51, 53, 55], "integ": [2, 4, 5, 12, 38, 50, 54], "label": [2, 4, 9, 20, 21, 22, 24, 26, 27, 31, 32, 33, 35, 38, 40, 41, 42, 50, 51, 53, 54, 55], "dimens": [2, 4, 9, 20, 21, 22, 24, 26, 27, 32, 33, 55], "1": [2, 4, 5, 6, 8, 9, 10, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 31, 32, 33, 34, 36, 37, 38, 40, 42, 45, 46, 47, 48, 49, 50, 51, 53], "impli": [2, 20, 21, 22, 23, 24, 26, 27, 28, 40, 42, 53], "i": [2, 4, 5, 6, 8, 9, 10, 12, 13, 14, 15, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 38, 39, 40, 41, 42, 43, 45, 46, 47, 48, 49, 50, 51, 53, 54, 55, 56], "For": [2, 4, 5, 6, 9, 10, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 38, 39, 40, 41, 42, 43, 50, 54, 55], "item": [2, 4, 44, 46, 47, 48, 53, 54, 55], "click": [2, 36, 37, 38, 39, 54, 55], "NO": [2, 4, 19, 41], "default": [2, 4, 5, 9, 19, 20, 21, 22, 23, 24, 26, 28, 30, 31, 32, 33, 35, 38, 40, 41, 43, 54, 55], "valu": [2, 4, 5, 8, 9, 21, 22, 23, 24, 26, 27, 28, 30, 31, 32, 33, 35, 38, 40, 41, 47, 48, 49, 51, 55], "user": [2, 4, 5, 10, 13, 19, 20, 21, 22, 23, 24, 26, 27, 28, 30, 36, 38, 40, 42, 44, 45, 50, 51, 53, 54, 55], "label_nam": [2, 4, 31, 40, 41, 42, 51, 53], "string": [2, 4, 5, 8, 9, 12, 15, 31, 41, 45, 50, 55], "tensor": [2, 4, 9, 12, 20, 21, 22, 24, 25, 26, 27, 28, 30, 31, 32, 33, 40, 41, 42, 47, 51, 53, 55], "referenc": [2, 10, 38], "dense_dim": [2, 4, 22, 24, 26, 27, 30, 31, 32, 33, 40, 41, 42, 51, 53, 55], "continu": [2, 4, 5, 30, 43, 46, 50, 55], "featur": [2, 4, 5, 9, 20, 21, 22, 24, 25, 26, 27, 30, 31, 32, 33, 35, 38, 40, 41, 42, 44, 45, 46, 50, 52, 53, 54, 55], "set": [2, 4, 5, 8, 17, 20, 21, 23, 25, 26, 29, 30, 33, 34, 36, 38, 40, 41, 43, 46, 47, 50, 53, 54, 55], "dense_nam": [2, 4, 31, 40, 41, 42, 51, 53], 
"data_reader_sparse_param_arrai": [2, 4, 31, 40, 41, 42, 51, 53], "datareadersparseparam": [2, 4, 31, 40, 41, 42, 51, 53], "categor": [2, 4, 30, 38, 44, 45, 50, 51, 54, 55], "construct": [2, 4, 41, 50, 55], "sparse_nam": 2, "nnz_per_slot": [2, 51, 55], "is_fixed_length": [2, 51, 55], "reader": [2, 4, 31, 35, 38, 40, 41, 42, 51, 53, 55], "int": [2, 4, 5, 8, 9, 27, 28, 30, 31, 32, 33, 41, 42, 51, 55], "which": [2, 4, 5, 6, 8, 9, 12, 13, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 38, 41, 43, 50, 54, 55, 56], "appli": [2, 4, 5, 13, 19, 20, 24, 25, 28, 30, 35, 36, 40, 50, 54, 55], "everi": [2, 4, 32, 55], "slot": [2, 4, 5, 21, 22, 26, 36, 38, 41, 42, 55], "could": [2, 5, 22, 24, 30, 33, 41, 55], "conveni": [2, 10, 13, 17, 19, 25, 29, 34, 38, 43, 55], "all": [2, 4, 5, 6, 8, 9, 10, 13, 17, 19, 20, 21, 22, 23, 24, 26, 27, 28, 30, 31, 32, 33, 34, 35, 36, 38, 40, 41, 42, 43, 50, 51, 53, 54, 55, 56], "same": [2, 4, 5, 6, 9, 12, 17, 20, 21, 23, 24, 25, 26, 28, 30, 32, 33, 36, 50, 54, 55], "Or": 2, "initi": [2, 4, 5, 7, 9, 12, 20, 21, 22, 23, 24, 25, 26, 27, 28, 30, 31, 32, 33, 36, 38, 41, 42, 43, 50, 55], "when": [2, 4, 5, 8, 10, 14, 15, 17, 21, 22, 24, 25, 27, 28, 30, 31, 32, 33, 35, 36, 38, 40, 41, 42, 43, 50, 53, 55], "length": [2, 4, 5, 24, 27, 41], "ident": [2, 5, 24, 30, 33, 55], "note": [2, 4, 5, 9, 19, 21, 22, 23, 24, 26, 27, 30, 31, 32, 33, 35, 36, 40, 41, 43, 46, 47, 48, 51, 53], "rawasync": [2, 4, 55], "onli": [2, 4, 5, 8, 10, 20, 22, 24, 30, 31, 32, 33, 35, 36, 38, 40, 41, 50, 51, 54, 55, 56], "static": [2, 4, 5, 19, 28, 30, 31, 32, 41, 55], "support": [2, 4, 5, 9, 10, 13, 15, 17, 19, 20, 21, 22, 24, 26, 28, 30, 31, 32, 33, 35, 38, 40, 41, 48, 50, 51, 53, 55, 56], "impact": [2, 55], "parquet": [2, 3, 31, 38, 40, 41, 42, 45, 49, 50, 51, 53, 55], "identifi": [2, 38, 47, 55], "whether": [2, 4, 5, 41, 56], "among": [2, 4, 36, 56], "sampl": [2, 4, 5, 6, 10, 13, 17, 28, 30, 31, 35, 36, 38, 40, 41, 42, 55], "true": [2, 4, 5, 8, 19, 20, 21, 22, 
23, 24, 26, 27, 28, 30, 31, 32, 33, 35, 40, 41, 42, 46, 47, 48, 51, 53, 55], "transfer": [2, 10, 41, 55], "time": [2, 5, 8, 17, 20, 21, 22, 23, 24, 26, 28, 30, 31, 32, 33, 40, 41, 42, 44, 45, 46, 50, 51, 53, 54, 55], "13": [2, 4, 6, 21, 23, 24, 26, 27, 28, 30, 31, 32, 33, 40, 41, 42, 47, 51, 53, 54, 55], "26": [2, 4, 5, 6, 19, 20, 21, 22, 23, 24, 26, 27, 28, 30, 31, 32, 33, 40, 42, 47, 51, 53, 54], "wide_data": [2, 4, 42], "2": [2, 4, 5, 6, 8, 9, 10, 17, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 31, 32, 33, 34, 36, 37, 38, 40, 42, 45, 47, 49, 50, 51, 53], "deep_data": [2, 4, 42], "One": 2, "sever": [2, 4, 5, 10, 19, 20, 21, 22, 23, 24, 30, 36, 40, 44, 52, 55], "befor": [2, 4, 5, 8, 9, 10, 21, 22, 23, 24, 26, 27, 35, 38, 40, 41, 42, 43, 47, 53, 55], "embedding_typ": [2, 4, 31, 41, 42, 51, 53], "embedding_t": [2, 4, 28, 31, 41, 42, 51, 53], "doe": [2, 4, 5, 19, 22, 25, 26, 29, 30, 31, 32, 33, 34, 43, 55], "must": [2, 4, 5, 8, 9, 14, 17, 27, 30, 33, 36, 55], "workspace_size_per_gpu_in_mb": [2, 4, 31, 41, 42, 51, 53, 55], "memori": [2, 4, 5, 10, 18, 19, 20, 21, 22, 23, 24, 25, 26, 28, 29, 30, 31, 32, 33, 34, 36, 40, 41, 42, 43, 50, 55, 56], "size": [2, 4, 5, 6, 8, 9, 10, 12, 15, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 30, 31, 32, 33, 34, 36, 40, 41, 42, 46, 50, 51, 53, 54, 55], "megabyt": 2, "gpu": [2, 4, 5, 6, 8, 9, 10, 13, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 48, 50, 51, 53, 54, 55, 56], "big": 2, "hold": [2, 5], "state": [2, 4, 5, 24, 31, 35, 36, 41, 42, 48, 50, 51, 53, 55], "dure": [2, 4, 5, 20, 21, 24, 26, 27, 30, 36, 38, 55], "train": [2, 3, 10, 17, 18, 19, 23, 25, 34, 37, 38, 39, 43, 47, 50, 52, 54, 55, 56], "evalu": [2, 3, 5, 8, 20, 21, 24, 26, 31, 40, 41, 42, 50, 51, 53, 55], "To": [2, 4, 5, 8, 10, 13, 17, 19, 20, 21, 22, 23, 24, 26, 28, 30, 33, 34, 35, 36, 38, 40, 41, 42, 43, 53, 55], "understand": [2, 55], "how": [2, 4, 5, 6, 10, 13, 17, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 
32, 33, 34, 36, 37, 38, 40, 41, 43, 50, 52, 55], "slot_size_arrai": [2, 4, 31, 40, 41, 42, 51, 53, 55], "embedding_vec_s": [2, 4, 20, 21, 22, 23, 24, 26, 28, 31, 32, 33, 41, 42, 51, 53], "vector": [2, 4, 5, 9, 10, 12, 15, 20, 21, 22, 23, 24, 25, 26, 27, 28, 31, 32, 33, 38, 55], "intra": [2, 36, 54], "sum": [2, 4, 9, 21, 40, 41, 42, 51, 53, 55], "sparse_embedding_nam": [2, 4, 31, 41, 42, 51, 53], "bottom_nam": [2, 4, 31, 40, 41, 42, 51, 53], "bottom": [2, 26, 27, 32, 33, 55], "consum": [2, 5, 55, 56], "predefin": 2, "from": [2, 4, 5, 9, 10, 13, 14, 17, 18, 19, 25, 29, 31, 32, 33, 36, 40, 42, 43, 44, 45, 48, 49, 50, 51, 53, 54, 55], "consist": [2, 4, 5, 10, 13, 15, 17, 24, 30, 38, 40, 41, 50, 55], "optparamspi": [2, 3], "dedic": [2, 10, 17, 36, 37, 38, 55, 56], "do": [2, 4, 5, 13, 19, 20, 21, 24, 26, 35, 36, 40, 43, 46, 53, 55], "adopt": [2, 4, 55], "store": [2, 4, 5, 6, 9, 10, 20, 22, 24, 30, 31, 32, 33, 36, 40, 41, 43, 44, 45, 46, 52, 53, 55], "tabl": [2, 4, 5, 6, 9, 10, 12, 13, 15, 17, 18, 20, 22, 24, 25, 26, 27, 29, 31, 32, 33, 34, 35, 36, 38, 41, 42, 43, 51, 53, 55, 56], "get": [2, 4, 14, 36, 40, 41, 42, 46, 48, 55], "indic": [2, 4, 5, 9, 20, 21, 22, 24, 26, 30, 32, 33, 38, 55], "segment": [2, 24, 55], "multipl": [2, 4, 5, 8, 9, 10, 20, 22, 24, 25, 26, 35, 36, 38, 39, 41, 50, 55, 56], "span": 2, "node": [2, 4, 5, 14, 17, 24, 30, 31, 32, 33, 35, 38, 39, 40, 41, 42, 43, 51, 53, 54, 55], "With": [2, 5, 25, 29, 34, 36, 55, 56], "portion": [2, 24], "": [2, 4, 5, 10, 13, 17, 19, 20, 21, 22, 24, 25, 26, 27, 29, 30, 33, 34, 36, 37, 38, 40, 41, 42, 43, 44, 46, 47, 50, 51], "exist": [2, 5, 10, 13, 17, 20, 21, 22, 23, 24, 26, 27, 28, 30, 31, 32, 33, 36, 41, 42, 45, 46, 50, 55, 56], "load": [2, 4, 5, 10, 17, 19, 20, 21, 23, 24, 25, 26, 27, 30, 31, 32, 33, 36, 38, 41, 45, 46, 47, 48, 49, 53, 55], "imbal": [2, 55], "oom": 2, "import": [2, 8, 9, 10, 13, 15, 17, 20, 21, 22, 23, 24, 26, 27, 28, 30, 31, 32, 33, 40, 41, 42, 43, 45, 46, 47, 48, 49, 50, 51, 53, 55], "singl": [2, 4, 
36, 40, 41, 42, 50, 54, 55, 56], "assum": [2, 4, 5, 9, 30, 38, 41, 55], "repres": [2, 9, 19, 22, 26, 40, 50], "id": [2, 4, 5, 9, 20, 21, 22, 23, 24, 25, 26, 29, 30, 33, 34, 35, 40, 41, 44], "ar": [2, 4, 5, 6, 9, 10, 13, 15, 17, 18, 19, 20, 21, 22, 24, 25, 26, 27, 29, 30, 32, 33, 34, 35, 36, 38, 40, 41, 42, 43, 44, 45, 46, 47, 49, 50, 51, 53, 54, 55, 56], "map": [2, 4, 5, 20, 21, 22, 23, 24, 26, 28, 31, 32, 33, 40, 41, 42, 51, 53, 55], "input_key_typ": [2, 4], "By": [2, 5, 13, 35, 36, 43, 50, 55], "32": [2, 4, 5, 6, 8, 21, 23, 25, 29, 30, 32, 33, 34, 40, 41, 42, 43, 46, 47, 51, 53, 55], "bit": [2, 5, 38, 41, 55], "i32": [2, 4], "64": [2, 4, 5, 6, 8, 21, 23, 34, 38, 41, 47], "i64": [2, 4, 20, 21, 22, 23, 24, 26, 28, 30, 31, 32, 33, 41], "allow": [2, 4, 5, 10, 13, 17, 25, 29, 30, 34, 36, 38, 41, 43, 46, 47, 48, 50, 55], "even": [2, 5, 36, 37, 55], "constrain": 2, "addit": [2, 4, 5, 30, 35, 36, 45, 50, 55], "overflow": [2, 31, 32, 33, 55], "verifi": [2, 4, 25, 29, 34, 43, 55], "beyond": [2, 5], "neg": [2, 55], "confid": 2, "disabl": [2, 5, 30, 31, 33, 35, 48, 55], "environ": [2, 4, 10, 13, 17, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 32, 33, 34, 38, 40, 41, 42, 43, 53, 54, 55], "variabl": [2, 4, 17, 20, 21, 22, 24, 26, 27, 30, 31, 32, 33, 35, 41, 53, 55], "hugectr_disable_overflow_check": [2, 55], "23": [2, 4, 6, 10, 17, 23, 24, 25, 30, 32, 33, 34, 40, 41, 42, 47, 48, 53], "sparse_embedding1": [2, 4, 5, 8, 20, 21, 24, 30, 31, 40, 41, 42, 51, 53], "input_data": [2, 42], "unlik": [2, 5, 54], "individu": [2, 4], "locat": [2, 5, 35, 36, 38, 41, 43, 53, 55, 56], "scalabl": [2, 10, 18, 36, 54, 55], "avail": [2, 4, 5, 8, 10, 13, 17, 18, 20, 21, 22, 24, 25, 26, 27, 29, 30, 32, 33, 34, 36, 38, 39, 40, 41, 42, 43, 46, 49, 54, 55, 56], "togeth": [2, 4, 35, 39, 41, 43, 55], "format": [2, 3, 5, 6, 10, 20, 21, 22, 23, 24, 26, 27, 28, 31, 32, 33, 36, 38, 40, 41, 42, 45, 50, 51, 54, 55], "other": [2, 4, 5, 8, 9, 10, 18, 20, 21, 22, 23, 24, 26, 30, 32, 33, 35, 36, 41, 43, 44, 50, 
54, 55], "equip": [2, 5, 30], "dgx": [2, 40], "a100": [2, 6, 10, 13, 17, 19, 34, 35, 38, 43, 48], "1221": 2, "754": [2, 40], "8": [2, 4, 5, 6, 10, 13, 17, 19, 20, 21, 22, 23, 24, 26, 28, 30, 31, 32, 33, 38, 40, 41, 42, 47, 48, 49, 53], "4": [2, 4, 5, 6, 10, 20, 21, 22, 23, 24, 25, 26, 28, 31, 32, 33, 36, 38, 40, 42, 45, 47, 49, 50, 51, 53, 54], "12": [2, 4, 6, 20, 21, 23, 24, 26, 28, 31, 32, 33, 40, 41, 42, 47, 51, 53], "49": [2, 20, 23, 32, 33, 40, 41, 42, 47, 51], "128": [2, 4, 5, 23, 24, 26, 27, 28, 30, 31, 32, 33, 40, 47, 50, 51, 53], "loss": [2, 4, 20, 21, 22, 24, 26, 27, 31, 32, 33, 36, 40, 41, 42, 51, 53, 54, 55], "final": [2, 4, 43, 49, 55], "fuse": [2, 5, 23, 25, 28, 30, 33, 41, 55], "util": [2, 4, 5, 10, 13, 19, 20, 21, 22, 24, 25, 26, 30, 31, 32, 33, 36, 38, 40, 41, 43, 45, 55, 56], "layer_typ": [2, 4, 31, 40, 41, 42, 51, 53], "layer_t": [2, 4, 31, 40, 41, 42, 51, 53, 55], "cast": [2, 31, 32, 33], "innerproduct": [2, 4, 40, 41, 42, 51, 53], "prelu_dic": 2, "str": [2, 4, 5, 8, 9, 20, 21, 22, 23, 24, 26, 27, 28, 32, 33, 40, 42, 49, 51, 55], "top_nam": [2, 4, 31, 40, 41, 42, 51, 53], "output": [2, 4, 6, 9, 14, 19, 20, 21, 22, 23, 24, 25, 26, 27, 29, 30, 31, 32, 33, 34, 35, 40, 41, 42, 43, 46, 48, 51, 53, 55], "num_output": [2, 4, 31, 40, 41, 42, 51, 53], "weight_init_typ": 2, "weight": [2, 4, 6, 9, 10, 20, 21, 22, 24, 26, 27, 30, 31, 32, 33, 36, 38, 41, 42, 51, 53, 55], "initializer_t": 2, "xaviernorm": 2, "xavieruniform": 2, "zero": [2, 4, 5, 20, 21, 22, 23, 24, 26, 28, 36, 41, 49, 51, 55], "bias_init_typ": 2, "bia": [2, 32, 47, 55], "shape": [2, 4, 6, 9, 14, 20, 21, 22, 23, 24, 26, 27, 30, 31, 32, 33, 40, 41, 42, 47, 49, 51, 53, 55], "batch_siz": [2, 14, 21, 22, 26, 27, 30, 31, 32, 33, 41, 55], "ani": [2, 4, 5, 8, 10, 20, 21, 22, 23, 24, 26, 27, 28, 32, 35, 36, 40, 41, 42, 53, 55], "relu1": [2, 40, 41, 42, 51, 53], "fc2": [2, 22, 30, 40, 41, 42, 51, 53], "1024": [2, 4, 5, 6, 19, 21, 22, 24, 26, 27, 30, 31, 32, 33, 40, 41, 42, 47, 48, 49, 51, 53, 54], 
"relu2": [2, 40, 42, 51, 53], "compris": [2, 4, 30, 31, 32, 33, 34, 48, 51], "fulli": [2, 5, 23, 31, 34, 36, 38, 40, 41, 42, 51, 53, 55, 56], "fp16": [2, 6, 24, 30, 36, 55], "fp32": [2, 4, 6, 24, 31, 32, 33, 55], "tf32": [2, 30, 33, 55], "act_typ": [2, 31], "activ": [2, 4, 20, 21, 22, 23, 24, 26, 27, 30, 31, 32, 33, 54, 55], "activation_t": [2, 31], "use_bia": [2, 31], "boolean": [2, 4, 5], "overrid": [2, 4, 26, 53], "bias": 2, "compute_config": 2, "denselayercomputeconfig": [2, 55], "configur": [2, 4, 6, 8, 9, 10, 12, 13, 15, 17, 18, 19, 23, 28, 35, 36, 38, 40, 41, 51, 53, 55], "valid": [2, 4, 5, 35, 42, 50, 51, 53, 55], "flag": [2, 5, 20, 21, 22, 23, 24, 26, 30, 33, 41, 55], "async_wgrad": [2, 55], "fuse_wb": [2, 55], "wgrad": [2, 55], "asynchron": [2, 4, 5, 19, 38, 55], "dgrad": 2, "fals": [2, 4, 5, 19, 21, 22, 23, 24, 26, 27, 28, 30, 31, 32, 33, 35, 40, 41, 42, 46, 47, 50, 51, 53, 55], "bgrad": 2, "compute_config_bottom": 2, "compute_config_top": 2, "mlp1": [2, 31], "512": [2, 4, 5, 24, 25, 29, 30, 31, 32, 33, 34, 40, 43, 47, 50, 51, 53], "256": [2, 4, 5, 6, 20, 21, 22, 23, 24, 26, 27, 28, 30, 31, 32, 33, 40, 42, 47, 53, 55], "interaction1": [2, 31, 40, 53], "interaction_grad": 2, "mlp2": [2, 31], "cross": [2, 35, 38, 40, 41, 42, 53, 54, 55], "network": [2, 4, 5, 14, 17, 20, 21, 22, 23, 24, 26, 30, 31, 32, 33, 35, 36, 38, 40, 41, 42, 43, 44, 47, 50, 51, 53, 54, 55], "explicit": [2, 19, 24, 30, 31, 32, 33], "two": [2, 4, 5, 10, 18, 19, 21, 24, 27, 30, 35, 36, 40, 50, 51, 53, 55], "invent": 2, "v1": [2, 20, 21, 22, 24, 30, 33, 35, 53, 54], "respect": [2, 4, 5, 10, 55], "n": [2, 5, 9, 24, 26, 30, 31, 32, 33, 40, 41, 42, 47, 50], "mathemat": 2, "formula": [2, 55], "those": [2, 4, 30, 55], "x_": 2, "l": [2, 26, 27, 46, 53], "x": [2, 4, 6, 8, 9, 26, 27, 32, 33, 34, 41, 42, 49, 53], "t": [2, 19, 24, 30, 35, 36, 42, 47, 50, 55], "_": [2, 20, 21, 22, 23, 24, 26, 27, 28, 31, 33, 40, 42], "w_": 2, "b_l": 2, "x_l": 2, "w_l": 2, "mathbb": 2, "r": [2, 26, 27, 30, 47, 53], 
"times1": 2, "learnabl": 2, "x_0": 2, "odot": 2, "mathbf": 2, "w": [2, 19, 20, 21, 23, 24, 25, 26, 28, 29, 30, 31, 32, 33, 34, 41, 43, 55], "elementwis": 2, "dot": 2, "_l": 2, "decreas": [2, 5], "complex": [2, 5, 38, 50], "approxim": [2, 5], "factor": [2, 55], "lower": [2, 5, 35, 38], "matric": 2, "u": [2, 5, 6, 19, 25, 29, 30, 34, 35, 41, 43, 53, 55], "v": [2, 25, 29, 30, 34, 43, 47, 48, 55], "project": [2, 4, 10, 17, 18, 38, 39, 55], "correspondingli": [2, 4, 8, 24], "evolv": 2, "num_lay": [2, 30, 53], "posit": [2, 4, 55], "projection_dim": 2, "degrad": [2, 5, 55], "slice11": [2, 53], "multicross1": [2, 53], "6": [2, 4, 19, 20, 21, 22, 23, 24, 26, 28, 30, 31, 32, 33, 36, 38, 40, 41, 42, 47, 50, 53], "thefmorder2": 2, "order": [2, 4, 5, 9, 20, 21, 22, 24, 26, 30, 31, 32, 33, 41, 50, 55], "fm": 2, "linear": [2, 32, 47, 54], "pairwis": 2, "product": [2, 5, 10, 19], "latent": 2, "out_dim": [2, 26, 27], "slice32": 2, "10": [2, 4, 5, 6, 8, 19, 20, 21, 22, 23, 24, 26, 27, 28, 30, 31, 32, 33, 40, 41, 42, 46, 47, 51, 53, 55], "multipli": [2, 9, 47, 55], "space": [2, 5, 30, 42], "weight_dim": 2, "matrix": [2, 4, 9, 36, 51], "slot_dim": 2, "vec_dim": 2, "correctli": [2, 4, 36, 41, 55], "emploi": [2, 4, 5, 28, 30, 31, 32, 33, 55], "result": [2, 4, 5, 9, 24, 28, 30, 31, 32, 33, 35, 36, 41, 50, 54, 55], "wise": [2, 55], "none": [2, 9, 20, 21, 22, 23, 24, 26, 27, 30, 33, 41, 42, 48, 51, 53, 55], "2x": [2, 24], "num_elem": 2, "slice1": 2, "slice2": 2, "eltmultiply1": 2, "implement": [2, 4, 5, 24, 33, 35, 36, 38, 41, 50, 55], "cudnn": [2, 30, 31, 32, 33, 47, 55], "batch": [2, 4, 5, 6, 8, 9, 20, 21, 22, 23, 24, 26, 27, 28, 30, 31, 32, 33, 34, 38, 41, 47, 54, 55], "normal": [2, 40, 41, 42, 50], "float": [2, 4, 5, 9, 30, 32, 33, 47, 48], "exponenti": 2, "averag": [2, 4, 5, 30, 34, 35, 48, 55], "runningmean": 2, "newmean": 2, "ep": [2, 47], "epsilon": [2, 4, 31, 42], "1e": [2, 4, 28, 41, 47, 50], "5": [2, 4, 5, 6, 8, 9, 10, 13, 17, 20, 21, 22, 23, 24, 26, 28, 32, 33, 36, 38, 40, 42, 
45, 46, 47, 49, 50, 51, 53], "gamma_init_typ": 2, "gamma": 2, "beta_init_typ": 2, "beta": [2, 4, 32], "00001": [2, 24], "varianc": 2, "file": [2, 3, 5, 6, 8, 9, 10, 12, 15, 19, 20, 21, 22, 23, 24, 26, 27, 28, 35, 36, 38, 40, 41, 42, 43, 45, 46, 47, 50, 51, 54, 55], "my_snapshot_dense_5000": 2, "find": [2, 4, 5, 30, 35, 38, 50, 54, 55], "norm": [2, 4, 9, 38, 55], "shown": [2, 4, 5, 6, 8, 10, 17, 54], "192325": 2, "003050": 2, "323447": 2, "034817": 2, "091861": 2, "var": [2, 31, 32, 33], "738942": 2, "410794": 2, "370279": 2, "156337": 2, "638146": 2, "759954": 2, "251507": 2, "648882": 2, "176316": 2, "515163": 2, "434012": 2, "422724": 2, "001451": 2, "756962": 2, "126412": 2, "851878": 2, "837513": 2, "694674": 2, "791046": 2, "849544": 2, "694500": 2, "405566": 2, "211646": 2, "936811": 2, "659098": 2, "2d": [2, 55], "3d": [2, 55], "seq_len": 2, "4d": [2, 55], "num_attention_head": 2, "concaten": [2, 20, 21, 24, 26, 27, 28, 30, 32, 33, 38, 51, 55], "axi": [2, 6, 20, 21, 22, 23, 24, 26, 27, 30, 32, 33, 42, 45, 48, 55], "dimension": [2, 38, 55], "num_feas_0": 2, "num_elems_0": 2, "num_fea": [2, 26, 27, 32, 33, 55], "num_elems_1": 2, "num_feas_1": 2, "reshape3": 2, "weight_multiply2": 2, "concat2": [2, 26, 27, 33, 53], "leading_dim": [2, 41, 42, 51, 53], "innermost": 2, "total": [2, 4, 5, 9, 10, 20, 21, 22, 23, 24, 26, 27, 30, 31, 32, 33, 41, 42, 46, 50, 51], "unspecifi": [2, 41], "n_slot": 2, "time_step": 2, "defin": [2, 5, 15, 20, 21, 22, 23, 24, 26, 27, 30, 31, 32, 33, 35, 42, 45, 53], "selected_slot": 2, "ignor": [2, 4, 30, 32, 33, 41, 42, 53], "destin": [2, 55], "placehold": [2, 17, 24, 30, 31, 32, 33], "cannot": [2, 5, 9, 10, 31, 35, 38, 40, 41, 42, 47, 51, 53, 56], "deprec": [2, 4, 5, 41, 43, 55], "futur": [2, 4, 5, 20, 21, 22, 26, 28, 35, 41, 55], "restrict": [2, 4, 10, 55], "tailing_dim": 2, "reshape1": [2, 26, 33, 41, 42, 51, 53], "416": [2, 32, 41, 42, 53], "dim": [2, 24, 30, 31, 32, 33, 47], "select1": 2, "selct": 2, "extract": [2, 10, 20, 21, 22, 24, 
32, 33, 44, 49, 51, 52], "rang": [2, 4, 5, 20, 21, 22, 23, 24, 26, 27, 28, 30, 31, 32, 33, 40, 42, 46, 47, 49, 53, 55], "tupl": [2, 27], "creat": [2, 4, 5, 6, 10, 22, 28, 31, 32, 33, 35, 40, 41, 42, 43, 44, 45, 46, 47, 50, 52, 53, 55], "inclus": 2, "end": [2, 4, 5, 18, 19, 23, 24, 28, 30, 31, 36, 40, 41, 51, 53, 55], "exclus": [2, 19, 30, 55], "overlap": [2, 4, 5, 23, 28, 54, 55], "unless": [2, 20, 21, 22, 23, 24, 26, 27, 28, 40, 42, 43, 53], "revers": 2, "along": [2, 4, 9, 55], "b": [2, 4, 24, 30, 41], "d": [2, 4, 30, 32, 40, 41, 42, 46, 47, 49, 51, 55], "c": [2, 10, 18, 20, 21, 22, 23, 24, 26, 27, 28, 30, 31, 32, 33, 40, 41, 42, 43, 50, 55], "len": [2, 21, 22, 23, 26, 27, 28, 30, 32, 40, 41, 42, 46, 47, 49, 51], "actual": [2, 4, 5, 20, 21, 22, 24, 26, 36, 55], "explicitli": [2, 8, 24, 55], "slice21": 2, "slice22": 2, "weight_multiply1": 2, "3": [2, 4, 5, 6, 8, 9, 19, 20, 21, 22, 23, 24, 26, 27, 28, 31, 32, 33, 36, 37, 38, 40, 42, 45, 47, 49, 50, 51, 53, 54], "copi": [2, 4, 5, 19, 20, 21, 22, 23, 24, 26, 27, 28, 30, 36, 40, 42, 46, 53, 55], "branch": [2, 35, 55], "topologi": [2, 4, 10, 55], "graph": [2, 4, 5, 6, 10, 14, 17, 22, 23, 27, 28, 36, 40, 41, 42, 50, 51, 53, 55], "parser": [2, 30, 31, 32, 33, 40, 42, 55], "intern": [2, 4, 8, 24, 26, 41, 55], "handl": [2, 4, 5, 17, 30, 31, 32, 33, 41, 50, 55], "situat": [2, 5, 41, 55], "behav": 2, "abov": [2, 4, 5, 10, 19, 24, 25, 29, 30, 34, 38, 43], "whilst": 2, "simplifi": [2, 50, 55], "randomli": 2, "zeroiz": 2, "drop": [2, 45, 55], "dropout_r": [2, 42, 53], "rate": [2, 4, 5, 19, 20, 21, 22, 23, 24, 26, 28, 30, 31, 32, 33, 37, 38, 39, 41, 45, 50, 51, 55], "between": [2, 4, 5, 18, 30, 34, 36, 40, 41, 50, 55], "dropout1": [2, 42, 53], "unit": [2, 4, 20, 21, 22, 23, 24, 26, 27, 33, 35, 54, 55], "elu_alpha": 2, "scalar": [2, 55], "satur": 2, "fc1": [2, 4, 22, 30, 40, 41, 42, 51, 53], "elu1": 2, "rectifi": 2, "sigmoid1": 2, "NOT": [2, 24, 53], "captur": 2, "typic": [2, 4, 5, 9], "output_dim": 2, "layer1": [2, 47], "layer3": 
[2, 47], "arbitrari": [2, 9, 36, 55], "manner": [2, 4, 5, 8, 10, 35, 43, 55], "nx": [2, 54], "fc4": [2, 40, 53], "reducesum1": 2, "reducesum2": 2, "remain": [2, 10, 13, 17, 38, 55], "gate": 2, "recurr": 2, "batchsiz": [2, 4, 20, 21, 22, 24, 26, 27, 31, 32, 33, 40, 41, 42, 47, 51, 53], "seqlength": 2, "sequenc": [2, 4, 20, 21, 26, 55], "vector_s": 2, "gru1": 2, "conncat1": 2, "20": [2, 4, 5, 20, 21, 22, 23, 24, 26, 27, 32, 33, 40, 41, 47, 49, 51, 53, 54], "parametr": 2, "adapt": [2, 41], "adjust": [2, 5, 30, 55], "point": [2, 27], "prelu": 2, "dice": 2, "fc_din_i1": 2, "dice_1": 2, "specif": [2, 4, 5, 18, 20, 21, 22, 23, 24, 26, 27, 28, 35, 40, 42, 44, 50, 53, 55], "item1": 2, "scale_item": 2, "sparse_embedding_good": 2, "sparse_embedding_c": 2, "fusedreshapeconcat_item_his_em": 2, "fusedreshapeconcat_item": 2, "accept": [2, 4, 14, 40, 41], "need": [2, 4, 5, 10, 13, 18, 19, 20, 21, 22, 23, 24, 26, 27, 30, 31, 32, 33, 35, 38, 40, 41, 46, 50, 51, 53, 55], "mask": [2, 26, 27, 55], "10000": [2, 4, 20, 21, 22, 23, 24, 26, 28, 30, 31, 32, 33, 41, 42], "step": [2, 4, 5, 6, 8, 10, 20, 21, 22, 23, 24, 26, 27, 31, 32, 33, 34, 35, 36, 40, 41, 50, 53, 55], "softmax_i": 2, "y": [2, 6, 32, 50], "produc": [2, 4, 22, 26, 30, 55], "scale_item1": 2, "item_his1": 2, "sub_ih": 2, "reducemean1": 2, "mutipl": 2, "m": [2, 5, 6, 9, 19, 30, 33, 40, 41, 47, 49, 50, 55], "h": [2, 19, 30, 33, 41], "matrixmutiply1": 2, "text": [2, 4, 24, 43, 44, 49, 50, 51, 52], "cdot": 2, "q": [2, 20, 21, 22, 23, 24, 28, 32, 33, 35, 51], "inner": 2, "pad": [2, 4, 47], "due": [2, 15, 22, 33, 55], "inequ": 2, "attent": [2, 55], "head": [2, 35, 42, 45, 46, 47, 49, 50, 51, 55], "seq_from": 2, "hidden_dim": 2, "seq_to": 2, "queri": [2, 5, 10, 19, 38, 41, 55], "attention_out": 2, "mark": 2, "make": [2, 4, 5, 10, 19, 20, 21, 25, 26, 27, 30, 35, 36, 38, 40, 41, 42, 43, 44, 48, 50, 51, 53, 55], "sure": [2, 5, 19, 30, 35, 36, 41, 42, 50, 53, 55], "max_sequence_len_from": 2, "max_sequence_len_to": 2, "sequence_mask": 2, 
"eight": [2, 54], "num_indic": 2, "gather1": 2, "predict": [2, 3, 5, 31, 32, 33, 41, 50, 51, 55], "use_regular": 2, "regulari": 2, "THe": 2, "regularizer_typ": 2, "regular": [2, 5, 30, 55], "regularizer_t": 2, "l1": 2, "l2": [2, 9], "lambda": [2, 20, 21, 22, 24, 30, 33, 50], "term": [2, 4, 55], "use_regulari": 2, "propag": [2, 4, 5, 17, 41, 55], "phase": [2, 55], "backward": [2, 4, 32, 55], "v3": [2, 55], "7": [2, 4, 10, 13, 17, 20, 21, 22, 23, 24, 26, 28, 30, 32, 33, 37, 38, 40, 41, 42, 47, 53, 54], "releas": [2, 4, 19, 24, 30, 33, 35, 41, 43, 50, 53, 54], "enabl": [2, 4, 5, 6, 10, 13, 19, 20, 21, 22, 23, 24, 26, 28, 30, 31, 32, 33, 35, 36, 38, 40, 41, 42, 51, 53, 55], "placement": [2, 5, 55], "strategi": [2, 8, 19, 22, 25, 55], "compar": [2, 5, 6, 28, 41, 54, 55], "three": [2, 4, 5, 6, 18, 19, 30, 34, 35, 38, 54, 55], "advantag": [2, 5, 55, 56], "previou": [2, 5, 24, 51, 55], "enhanc": [2, 54, 55], "boost": [2, 36, 55], "flexibl": [2, 18, 55], "lookup": [2, 6, 9, 10, 12, 17, 19, 20, 21, 22, 23, 24, 25, 26, 27, 29, 30, 31, 32, 33, 36, 38, 55], "parallel": [2, 4, 5, 10, 38, 40, 46, 55, 56], "object": [2, 4, 5, 20, 21, 22, 24, 30, 33, 41, 42, 50, 55], "max_vocabulary_s": [2, 20, 22, 24, 26, 27, 32, 33, 40], "ev_siz": [2, 40], "config": [2, 8, 19, 23, 24, 26, 27, 30, 31, 32, 33, 35, 40, 41, 53, 55], "organ": [2, 4], "nccl_launch_mod": 2, "group": [2, 4, 5, 30, 35, 50, 55], "potenti": [2, 44, 55], "hang": [2, 55], "mix": [2, 4, 5, 24, 31, 40, 41, 42, 51, 53, 55, 56], "precis": [2, 4, 5, 24, 31, 40, 41, 42, 51, 53, 55], "attribut": [2, 15, 41], "dump": [2, 4, 23, 26, 27, 28, 31, 36, 41, 42, 46, 47, 48, 51, 53, 55], "contain": [2, 4, 5, 6, 8, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 31, 32, 33, 36, 41, 43, 44, 45, 46, 47, 48, 50, 53, 54, 55], "incorrectli": [2, 55], "receiv": [2, 5, 10, 22, 26], "error": [2, 4, 5, 24, 30, 31, 33, 35, 41, 46, 47, 55], "know": [2, 5, 19, 55], "opt_param": [2, 4], "greater": [2, 4, 5, 19, 55], "sgd": [2, 4, 40, 53, 55], "adagrad": [2, 
4, 55], "momentumsgd": [2, 4], "nesterov": [2, 4, 55], "rmsprop": 2, "adam": [2, 4, 20, 21, 22, 24, 27, 31, 32, 33, 41, 42, 51, 54, 55], "ftrl": [2, 4, 55], "203931": [2, 40], "18598": [2, 40], "14092": [2, 40], "7012": [2, 40], "18977": [2, 40], "6385": [2, 40], "1245": [2, 40], "186213": [2, 40], "71328": [2, 40], "67288": [2, 40], "11": [2, 6, 23, 24, 28, 30, 31, 32, 33, 40, 41, 42, 47, 48, 51, 53], "2168": [2, 40], "7338": [2, 40], "61": [2, 23, 40, 42, 47], "932": [2, 31, 40], "15": [2, 5, 8, 23, 30, 32, 33, 40, 41, 42, 47, 51, 53, 55], "204515": [2, 40], "141526": [2, 40], "199433": [2, 40], "60919": [2, 40], "9137": [2, 40], "71": [2, 23, 26, 40, 47], "34": [2, 23, 32, 33, 40, 41, 42, 47, 51, 53], "embedding_table_list": [2, 40], "append": [2, 20, 21, 22, 23, 24, 26, 27, 28, 30, 31, 32, 33, 40, 46, 47], "table_": 2, "use_exclusive_kei": [2, 40], "bool": [2, 5], "comm_strategi": [2, 55], "communicationstrategi": 2, "table_config": [2, 40], "major": [2, 4, 9, 38, 55], "arg": [2, 9, 20, 21, 22, 24, 26, 27, 30, 32, 33, 35, 40, 41, 42, 46], "abl": [2, 5, 19, 36, 55], "address": [2, 4, 5, 25, 29, 31, 34, 41, 43, 55], "challeng": [2, 10, 55], "we": [2, 4, 5, 6, 10, 13, 17, 20, 21, 22, 23, 24, 26, 27, 28, 30, 31, 32, 33, 35, 38, 40, 41, 42, 45, 46, 47, 48, 49, 50, 51, 53, 54, 55], "etp": 2, "significantli": [2, 5, 30, 31, 32, 33, 55], "influenc": 2, "shard_matrix": [2, 40, 55], "num_gpu": [2, 23, 24, 28, 40, 42], "row": [2, 4, 9, 49, 50, 51], "place": [2, 24, 38, 40, 42, 55], "th": [2, 4], "shard_strategi": [2, 40], "mp": [2, 36, 40, 41, 56], "dp": [2, 40, 56], "t0": 2, "t1": 2, "t2": 2, "t3": 2, "And": [2, 4, 53, 55], "embedding_table_nam": [2, 5, 8, 20, 21, 22, 23, 24, 26, 28, 30, 31, 32, 33], "good": [2, 5, 41], "userid": [2, 45, 50, 51], "ebc_config": [2, 40], "num_tabl": [2, 23, 24, 28, 31, 32, 55], "sparse_embed": [2, 4], "interfac": [3, 35, 36, 38, 41, 55], "level": [3, 18, 50, 55], "createsolv": [3, 31, 40, 41, 42, 51, 53, 55], "asyncparam": [3, 55], 
"hybridembeddingparam": [3, 55], "datareaderparam": [3, 31, 40, 41, 42, 51, 53, 55], "createoptim": [3, 31, 40, 41, 42, 51, 53], "layer": [3, 5, 6, 7, 8, 10, 12, 13, 17, 18, 23, 25, 28, 30, 31, 32, 34, 36, 38, 40, 41, 42, 47, 48, 50, 53, 54, 55, 56], "compil": [3, 10, 13, 17, 19, 20, 21, 22, 23, 24, 26, 30, 31, 33, 35, 38, 40, 41, 42, 51, 53, 55], "fit": [3, 10, 31, 36, 40, 41, 42, 50, 51, 53, 55, 56], "summari": [3, 20, 21, 22, 23, 24, 26, 27, 30, 31, 33, 40, 41, 42, 51, 53], "graph_to_json": [3, 31, 41, 42, 51, 53, 55], "construct_from_json": 3, "load_dense_weight": 3, "load_dense_optimizer_st": 3, "load_sparse_weight": [3, 51], "load_sparse_optimizer_st": 3, "freeze_dens": 3, "freeze_embed": [3, 51, 55], "unfreeze_dens": 3, "unfreeze_embed": [3, 55], "reset_learning_rate_schedul": 3, "set_sourc": 3, "low": [3, 5, 10, 13, 17, 18, 20, 21, 22, 24, 26, 27, 28, 32, 33, 36], "learningrateschedul": 3, "get_next": 3, "dataread": [3, 31, 40, 41, 42, 51, 53, 55], "is_eof": 3, "get_learning_rate_schedul": 3, "get_data_reader_train": 3, "get_data_reader_ev": 3, "start_data_read": 3, "set_learning_r": 3, "get_current_loss": 3, "eval": [3, 31, 40, 41, 42, 47, 51, 53, 55], "get_eval_metr": 3, "check_out_tensor": [3, 41, 55], "infer": [3, 6, 8, 10, 13, 17, 18, 19, 25, 27, 28, 29, 31, 32, 33, 34, 36, 37, 38, 43, 51, 55, 56], "inferenceparam": [3, 5, 41, 55], "inferencemodel": [3, 5, 20, 21, 23, 24, 26, 55], "datageneratorparam": [3, 31, 38, 41, 55], "datagener": [3, 31, 38, 41, 55], "datasourceparam": [3, 53, 55], "input": [3, 4, 5, 6, 9, 14, 19, 20, 22, 23, 24, 25, 26, 27, 28, 31, 32, 33, 36, 38, 40, 41, 42, 44, 45, 47, 48, 50, 51, 53, 55], "embed": [3, 4, 6, 9, 10, 12, 13, 15, 17, 18, 20, 21, 23, 24, 25, 28, 29, 31, 32, 33, 34, 36, 37, 38, 42, 44, 49, 50, 52, 53, 55, 56], "distributedslotsparseembeddinghash": [3, 4, 31, 41, 42, 51, 53], "dens": [3, 4, 5, 6, 9, 10, 17, 20, 21, 22, 23, 24, 26, 27, 30, 31, 32, 33, 36, 38, 40, 41, 42, 51, 53, 55], "fullyconnect": [3, 55], "mlp": 
[3, 6, 26, 27, 31, 32, 33, 55], "multicross": [3, 53, 55], "fmorder2": [3, 55], "weightmultipli": 3, "elementwisemultipli": 3, "batchnorm": 3, "layernorm": 3, "concat": [3, 20, 21, 22, 23, 24, 30, 32, 33, 40, 41, 42, 51, 53, 55], "reshap": [3, 20, 21, 22, 23, 24, 26, 27, 32, 33, 41, 42, 51, 53, 55], "select": [3, 19, 55], "dropout": [3, 42, 53, 54, 55], "elu": [3, 55], "relu": [3, 22, 26, 27, 31, 32, 33, 40, 41, 42, 47, 51, 53, 54, 55], "sigmoid": [3, 22, 26, 27, 31, 32, 33, 41, 55], "interact": [3, 10, 13, 17, 19, 25, 26, 27, 29, 31, 32, 33, 34, 36, 38, 40, 43, 53, 54, 55], "reducesum": [3, 42, 55], "gru": [3, 55], "preludic": [3, 55], "scale": [3, 36, 37, 38, 50, 55], "fusedreshapeconcat": [3, 55], "fusedreshapeconcatgener": [3, 55], "softmax": [3, 22, 26, 55], "reducemean": [3, 55], "matrixmutipli": 3, "multiheadattent": [3, 55], "sequencemask": [3, 55], "gather": [3, 32, 33, 46, 55], "binarycrossentropyloss": [3, 31, 40, 41, 42, 51, 53], "crossentropyloss": [3, 55], "multicrossentropyloss": [3, 55], "overview": [3, 4, 5, 40], "us": [3, 4, 5, 6, 8, 9, 14, 15, 18, 20, 21, 22, 23, 25, 26, 27, 28, 29, 34, 35, 36, 37, 41, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56], "known": 3, "embeddingtableconfig": [3, 40], "embeddingcollectionconfig": [3, 40, 55], "embedding_lookup": [3, 6, 9, 20, 21, 23, 24, 30, 33, 40], "shard": [3, 22, 40, 55, 56], "As": [4, 5, 20, 21, 24, 26, 27, 36, 51, 55], "domain": [4, 44, 45], "framework": [4, 10, 13, 17, 18, 20, 24, 28, 34, 35, 36, 37, 38, 39, 41, 54, 55, 56], "focu": 4, "algorithm": [4, 5, 22, 30, 31, 32, 33, 40, 41, 42, 51, 53, 55], "job": [4, 20, 21, 22, 23, 24, 26, 33, 41, 55], "automat": [4, 5, 10, 24, 50, 55], "deploi": [4, 5, 6, 8, 9, 10, 13, 17, 18, 19, 20, 21, 22, 25, 26, 37, 42, 43, 55], "hardwar": [4, 5, 30, 33, 55], "complet": [4, 24, 30, 31, 32, 33, 41, 46, 55], "without": [4, 5, 18, 20, 21, 22, 23, 24, 26, 27, 28, 38, 40, 42, 53, 55], "manual": [4, 20, 21, 23, 24, 26, 55], "been": [4, 20, 24, 31, 32, 33, 41, 
54, 55], "wrap": [4, 22, 27, 36, 43, 56], "meanwhil": [4, 36], "maintain": [4, 5, 10, 35, 55], "who": [4, 55], "control": [4, 5, 19, 24, 30, 31, 32, 33, 55], "friendli": 4, "alreadi": [4, 5, 31, 40, 42, 43, 49, 50, 55], "deep": [4, 10, 18, 20, 21, 22, 23, 24, 26, 30, 35, 36, 37, 38, 50, 51, 54, 55, 56], "learn": [4, 10, 18, 37, 38, 50, 54, 55, 56], "like": [4, 5, 10, 19, 53, 55, 56], "kera": [4, 20, 21, 22, 23, 24, 26, 27, 30, 33, 55], "worthwhil": 4, "switch": [4, 41, 55], "notebook": [4, 20, 21, 22, 23, 24, 26, 27, 28, 30, 31, 32, 33, 36, 41, 42, 44, 45, 46, 47, 48, 49, 50, 51, 53, 55], "workflow": [4, 6, 36, 42, 50, 51, 55], "lot": [4, 30, 55], "core": [4, 6, 20, 21, 22, 23, 24, 26, 30, 33, 34, 41, 42, 54, 55], "structur": [4, 5, 10, 31, 40, 41, 42, 53, 55], "epoch": [4, 31, 40, 41, 42, 51, 53, 55], "simpli": [4, 30, 35, 41, 54, 55], "moreov": 4, "give": [4, 30, 50, 55], "save": [4, 10, 22, 23, 24, 27, 30, 31, 32, 33, 36, 41, 42, 45, 50, 53, 55], "statu": [4, 8, 24, 30, 31, 32, 33, 41, 42, 48], "etc": [4, 19, 37, 40, 55], "return": [4, 5, 8, 9, 12, 20, 21, 22, 23, 24, 26, 27, 28, 30, 31, 32, 33, 40, 41, 47, 50, 55], "custom": [4, 10, 13, 17, 24, 30, 31, 32, 33, 35, 53, 55], "model_nam": [4, 5, 6, 9, 12, 15, 19, 20, 21, 22, 23, 24, 26, 28, 30, 31, 32, 33, 41], "empti": [4, 5, 9, 20, 21, 24, 26, 33, 41, 55], "seed": [4, 26, 31, 40, 41, 42, 51, 53], "lr_polici": 4, "polici": [4, 5], "suppot": 4, "fix": [4, 5, 35, 55], "lrpolicy_t": 4, "lr": [4, 31, 32, 40, 41, 42, 51, 53], "schedul": [4, 5, 42, 55], "001": [4, 31, 40, 41, 42, 53], "warmup_step": [4, 31, 36, 40, 41, 42, 51, 53], "warmup": [4, 19, 31, 40, 41, 42, 51, 53], "within": [4, 5, 10, 13, 23, 38, 43, 47, 48, 54, 55], "decay_start": [4, 31, 36, 40, 41, 42, 51, 53], "decai": [4, 36], "decay_step": [4, 31, 36, 40, 41, 42, 51, 53], "decay_pow": [4, 31, 40, 41, 42, 51, 53], "power": [4, 5, 19, 31, 38, 41, 55], "end_lr": [4, 31, 40, 41, 42, 51, 53], "max_eval_batch": [4, 31, 40, 41, 42, 51, 53, 55], "equal": [4, 5, 
19, 21, 23, 27, 28, 41, 55], "bigger": [4, 50], "bathc": 4, "100": [4, 5, 19, 23, 27, 28, 30, 40, 47, 48, 51, 53], "batchsize_ev": [4, 31, 40, 41, 42, 51, 53, 55], "minibatch": 4, "2048": [4, 6, 30, 47, 49, 51], "here": [4, 19, 20, 21, 22, 23, 24, 25, 26, 29, 30, 34, 35, 36, 38, 40, 41, 43, 53, 54], "worker": [4, 5, 20, 21, 22, 23, 24, 26, 27, 28, 30, 31, 32, 33, 40, 41, 42, 51, 53, 55], "vvgpu": [4, 31, 40, 41, 42, 51, 53], "physic": 4, "numa": [4, 31, 40, 41, 42, 51, 53, 55], "possibl": [4, 18, 36, 41, 55], "repeat_dataset": [4, 31, 40, 41, 42, 51, 53], "repeat": [4, 19, 47], "otherwis": [4, 5, 24, 42, 55], "use_mixed_precis": [4, 5, 31, 51, 53], "enable_tf32_comput": 4, "acceler": [4, 6, 17, 30, 33, 36, 37, 38, 39, 50, 55, 56], "fullyconnectedlay": [4, 36], "interactionlay": [4, 36, 55], "scaler": [4, 5, 31, 40, 41, 42, 51, 53], "metrics_spec": [4, 40], "metric": [4, 6, 19, 20, 21, 24, 26, 30, 34, 55], "auc": [4, 31, 41, 42, 51, 53, 55], "averageloss": [4, 40], "hitrat": [4, 55], "threshold": [4, 5, 20, 21, 22, 23, 24, 26, 31, 32, 33, 41, 55], "metricstyp": [4, 40], "8025": 4, "termin": [4, 35], "reach": [4, 19], "i64_input_kei": [4, 5, 23, 28, 30, 31, 33, 40, 41, 42, 51, 53], "choos": [4, 40, 53], "nvtabular": [4, 38, 43, 44, 45, 51, 52, 55], "use_algorithm_search": [4, 5], "search": [4, 5, 30, 31, 32, 33, 55], "cublasgemmex": [4, 5], "use_cuda_graph": [4, 5, 31, 41, 53, 55], "cuda": [4, 5, 24, 28, 30, 31, 32, 33, 40, 41, 42, 43, 47, 48, 51, 53, 55], "asyncdataread": 4, "hybridembed": 4, "task": [4, 5, 10, 19, 20, 21, 22, 23, 24, 25, 26, 28, 33, 35, 36, 41, 43, 44, 46, 55], "pack": [4, 5, 20, 21, 22, 23, 24, 28, 32, 33, 51, 55], "device_layout": 4, "longer": [4, 50, 55], "train_intra_iteration_overlap": [4, 55], "detect": [4, 30, 31, 32, 33], "toplogi": [4, 55], "tri": [4, 5], "train_inter_iteration_overlap": [4, 55], "fetch": [4, 46, 55], "next": [4, 30, 32, 46, 47, 50, 51, 55], "earlier": 4, "eval_intra_iteration_overlap": [4, 55], "knob": [4, 55], "similar": 
[4, 30, 31, 32, 33, 50, 55], "eval_inter_iteration_overlap": [4, 55], "all_reduce_algo": [4, 55], "allreducealgo": [4, 55], "oneshot": 4, "multi": [4, 5, 21, 34, 35, 38, 42, 43, 45, 46, 50, 53, 54, 55], "requir": [4, 5, 6, 10, 15, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 38, 40, 41, 42, 43, 49, 50, 51, 53, 55], "run": [4, 5, 8, 10, 13, 17, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 31, 32, 33, 34, 35, 36, 38, 41, 42, 47, 48, 53, 54, 55], "grouped_all_reduc": 4, "gradient": [4, 20, 21, 22, 24, 26, 27, 33, 55], "effect": [4, 24, 44, 46, 55], "small": [4, 5, 6, 38, 40, 41, 46, 55], "higher": [4, 5, 24, 46, 55], "hybrid": [4, 6, 27, 55], "num_iterations_statist": 4, "statist": [4, 5, 50], "300": [4, 22, 40, 47, 51, 53], "16384": [4, 6, 8, 30, 41, 54], "read": [4, 5, 30, 31, 32, 33, 38, 41, 46, 47, 50, 53, 54, 55], "done": [4, 26, 30, 31, 40, 41, 42, 50, 51, 53], "async_param": 4, "linux": [4, 24, 31, 41], "aio": 4, "peak": [4, 30, 31, 32, 33], "throughput": [4, 5, 19, 30, 36, 55], "num_thread": [4, 5, 55], "least": [4, 5, 9, 24, 28, 55], "num_batches_per_thread": 4, "work": [4, 8, 14, 18, 20, 21, 22, 26, 30, 35, 41, 50, 55], "simultan": [4, 36, 56], "max_num_requests_per_thread": 4, "io": [4, 6, 10, 13, 17, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 38, 42, 43, 50, 53, 55], "request": [4, 5, 6, 19, 24, 30, 31, 32, 33, 35, 41, 50, 55], "72": [4, 23, 33, 34, 47], "most": [4, 5, 13, 19, 30, 33, 41, 55], "multi_hot_read": 4, "io_depth": 4, "queue": [4, 5, 6, 30, 55], "io_align": [4, 55], "byte": [4, 5, 10, 28, 30, 31, 32, 33, 41, 47, 50], "align": [4, 28, 55], "4096": [4, 6, 30, 31, 32, 33, 34, 41], "shuffl": [4, 32, 42, 50], "fed": [4, 30, 38], "aligned_typ": 4, "alignment_t": 4, "auto": [4, 24, 31, 32, 33, 41], "chosen": [4, 5], "obtain": [4, 10, 13, 20, 21, 22, 23, 24, 26, 27, 28, 30, 38, 40, 42, 50, 53], "unsign": 4, "is_dense_float": 4, "except": [4, 20, 21, 22, 23, 24, 26, 27, 28, 40, 41, 42, 46, 47, 53, 55], 
"thrown": 4, "param": [4, 20, 21, 22, 23, 24, 26, 30, 33], "16": [4, 5, 8, 9, 10, 19, 20, 21, 22, 23, 24, 26, 27, 28, 31, 32, 33, 40, 41, 42, 47, 50, 51, 53, 54, 55], "overcom": [4, 55], "constraint": [4, 5, 30, 55], "impos": [4, 30, 55], "part": [4, 5, 19, 30, 37, 45, 55], "traffic": [4, 55], "over": [4, 5, 19, 35, 50, 54, 55], "improv": [4, 5, 22, 36, 44, 55], "deploy": [4, 5, 6, 10, 18, 20, 21, 24, 26, 30, 34, 35, 36, 55], "convers": [4, 30, 33, 41, 42, 55], "encod": [4, 51], "hybrid_embedding_param": 4, "max_num_frequent_categori": 4, "frequent": [4, 5, 50, 55], "categori": [4, 5, 30, 36, 50, 51, 55], "max_num_infrequent_sampl": 4, "infrequ": [4, 55], "p_dup_max": 4, "probabl": [4, 19, 38], "appear": [4, 19, 30, 38, 55], "onc": [4, 8, 33, 35, 41, 50, 55], "wai": [4, 10, 13, 17, 19, 25, 29, 30, 34, 38, 40, 43, 50, 54, 55], "determin": [4, 5, 20, 21, 22, 23, 24, 26, 27, 28, 40, 42, 53, 55], "nvlink": [4, 34], "max_all_reduce_bandwidth": 4, "max_all_to_all_bandwidth": [4, 55], "efficiency_bandwidth_ratio": 4, "communication_typ": [4, 55], "being": [4, 19, 46, 55], "communicationtyp": [4, 55], "ib_nvlink": [4, 55], "ib_nvlink_hi": [4, 55], "nvlink_singlenod": 4, "protocol": [4, 42, 46, 47, 48, 55], "infiniband": [4, 5], "roce": [4, 55], "special": [4, 36], "gid": [4, 55], "hugectr_roce_gid": [4, 55], "hugectr_roce_tc": [4, 55], "hybrid_embedding_typ": 4, "hybridembeddingtyp": 4, "now": [4, 22, 30, 31, 32, 33, 42, 46, 49, 50, 55], "01": [4, 19, 24, 26, 27, 30, 31, 32, 33, 40, 41, 44, 47, 51, 53, 55], "3e11": 4, "9e11": 4, "warn": [4, 20, 21, 22, 23, 24, 26, 28, 30, 31, 32, 33, 40, 41, 42, 50, 51, 53, 55], "data_reader_typ": [4, 31, 40, 41, 42, 51, 53], "datareadertype_t": [4, 31, 40, 41, 42, 51, 53, 55], "file_list": [4, 31, 38, 41, 53], "txt": [4, 31, 38, 40, 41, 42, 51, 53, 55], "train_data": 4, "bin": [4, 19, 41, 49], "keyset": [4, 55], "show": [4, 5, 8, 18, 19, 20, 24, 25, 26, 33, 40, 41, 46, 54, 55], "eval_sourc": [4, 31, 40, 41, 42, 51, 53], "check_typ": [4, 
31, 40, 41, 42, 51, 53], "mechan": [4, 5, 38, 41, 42, 55], "check_t": [4, 31, 40, 41, 42, 51, 53], "checksum": [4, 53], "cache_eval_data": 4, "num_sampl": [4, 20, 21, 22, 24, 26, 27, 32, 33, 55], "eval_num_sampl": 4, "float_label_dens": 4, "interpret": [4, 55], "log": [4, 25, 29, 30, 31, 32, 33, 34, 35, 38, 41, 42, 43, 54, 55], "f": [4, 19, 20, 21, 22, 23, 24, 28, 30, 32, 33, 35, 41, 42, 46, 47, 48, 49, 51], "num_work": [4, 32, 55], "concurr": [4, 5, 19, 30, 55], "empir": 4, "data_source_param": [4, 53, 55], "hdf": [4, 5, 35, 43, 55], "aw": [4, 5, 35, 43, 55], "s3": [4, 5, 35, 43, 55], "googl": [4, 55], "cloud": [4, 13, 17, 25, 29, 34, 43, 55], "async": [4, 41, 55], "fig": [4, 10, 17, 18, 31, 32, 33, 36, 38, 54], "minimum": [4, 6, 25, 29, 30, 34, 43, 55], "granular": [4, 55], "header": [4, 24, 38, 41, 42, 55], "tabular": [4, 44, 50], "alwai": [4, 9, 55], "constant": [4, 30, 32, 33, 41], "payment": 4, "preced": [4, 18, 55], "yellow": 4, "box": [4, 50], "depict": 4, "reserv": [4, 5, 20, 21, 22, 23, 24, 26, 27, 28, 40, 42, 53], "signific": [4, 5, 22, 41], "exclud": 4, "charg": [4, 41], "definit": [4, 41], "typedef": 4, "struct": [4, 20, 21, 22, 23, 24, 26, 27, 28, 32, 33, 51, 55], "datasetheader_": 4, "error_check": 4, "check_num": 4, "number_of_record": 4, "datasethead": 4, "data_": 4, "check_sum": 4, "char": [4, 15, 41], "checkbit": 4, "slot_": 4, "changeabl": 4, "45": [4, 23, 30, 32, 33, 40, 41, 42, 47, 51, 53, 54], "67": [4, 23, 47], "undefin": [4, 31, 55], "behavior": [4, 5, 30, 41, 55], "given": [4, 9, 45], "assign": [4, 5, 24, 30, 31, 32, 33, 35, 41, 51, 55], "line": [4, 30, 33, 40, 41, 42, 43, 55], "path": [4, 5, 19, 20, 21, 22, 24, 26, 27, 30, 31, 32, 33, 35, 40, 42, 45, 46, 47, 50, 51, 53, 55], "cat": [4, 28, 42, 50, 53], "simple_sparse_embedding_file_list": 4, "simple_sparse_embed": 4, "simple_sparse_embedding0": 4, "simple_sparse_embedding1": 4, "simple_sparse_embedding2": 4, "simple_sparse_embedding3": 4, "simple_sparse_embedding4": 4, 
"simple_sparse_embedding5": 4, "simple_sparse_embedding6": 4, "simple_sparse_embedding7": 4, "simple_sparse_embedding8": 4, "simple_sparse_embedding9": 4, "wdl_norm": 4, "file_list_test": [4, 31, 38, 41, 53], "aspect": 4, "datatyp": [4, 24, 31, 32, 33, 41], "outperform": 4, "disk": [4, 30, 45, 50, 55], "feed": [4, 33, 38, 41, 50], "go": [4, 35, 40, 41, 42, 43, 50], "incorpor": [4, 13], "3g": [4, 55], "wdl_raw": 4, "validation_data": 4, "column": [4, 41, 42, 45, 49, 50, 51, 55], "orient": 4, "open": [4, 5, 10, 13, 17, 20, 21, 22, 23, 24, 28, 30, 31, 32, 33, 36, 38, 39, 41, 46, 47, 48, 49, 51, 55], "free": [4, 41, 42, 55], "apach": [4, 5, 20, 21, 22, 23, 24, 26, 27, 28, 40, 42, 53, 55], "hadoop": [4, 35, 36, 43, 53, 55], "ecosystem": 4, "compress": [4, 38], "nest": [4, 55], "loader": [4, 23, 24, 28, 30, 31, 32, 33, 41], "miss": [4, 5, 19, 20, 21, 22, 23, 24, 26, 28, 30, 31, 32, 33, 40, 41, 47, 51, 55], "int64": [4, 5, 9, 10, 12, 20, 21, 22, 24, 26, 27, 30, 31, 32, 33, 41, 42, 49, 50, 55], "arrang": 4, "numer": [4, 5, 38, 44], "separ": [4, 5, 10, 20, 21, 24, 26, 27, 31, 41, 50, 55], "_metadata": [4, 50, 53, 55], "file_stat": [4, 53], "file_nam": [4, 53], "file0": 4, "num_row": [4, 53], "409600": 4, "file1": 4, "col_nam": [4, 53], "c1": [4, 53], "c2": [4, 53], "c3": [4, 53], "c4": [4, 53], "cont": [4, 42, 53], "i1": 4, "i2": 4, "i3": 4, "parquet_data": 4, "_file_list": [4, 40, 42, 51], "val": [4, 31, 40, 41, 42, 53], "50000": [4, 24], "20000": [4, 20, 21, 24, 42], "whose": [4, 24, 25, 26, 40, 55], "duplic": [4, 36, 50, 55], "ensur": [4, 5, 8, 13, 19, 23, 28, 30, 41, 50, 55], "snippet": 4, "0th": 4, "1st": [4, 21, 40], "third": [4, 18, 19, 35, 43, 55], "60000": 4, "entri": [4, 21, 22, 26, 33, 55], "resid": 4, "folder": [4, 19, 24, 26, 30, 31, 40, 41, 55], "basic": [4, 9, 10, 12, 41], "four": [4, 55], "frame": 4, "edit": [4, 35], "desir": [4, 10, 30, 41, 55], "chang": [4, 5, 19, 20, 21, 26, 30, 36, 50, 55, 56], "hyperparamet": [4, 36, 41], "well": [4, 50, 55], "meticul": 
4, "update_typ": [4, 31, 40, 42, 53], "hit": [4, 5, 19, 20, 21, 22, 23, 24, 26, 31, 32, 33, 41, 50, 51, 55], "lazyglob": 4, "semant": [4, 55], "optimizer_typ": [4, 31, 40, 41, 42, 51, 53], "optimizer_t": [4, 31, 40, 41, 42, 51, 53], "update_t": [4, 31, 40, 42, 53], "beta1": [4, 31, 42], "9": [4, 5, 6, 10, 13, 17, 19, 20, 21, 22, 23, 24, 26, 28, 30, 31, 32, 33, 38, 41, 42, 47, 48, 53], "beta2": [4, 31, 42], "999": [4, 31, 42], "lambda1": 4, "lambda2": 4, "momentum_factor": 4, "atomic_upd": [4, 40, 53], "atom": [4, 41], "0000001": [4, 42], "groupdens": 4, "hugectr_layer_book": 4, "trane": 4, "reader_param": 4, "groupdenselay": [4, 55], "embeddingcollect": 4, "seri": [4, 37, 43, 44], "taken": [4, 9], "dense_lay": 4, "overload": [4, 50, 55], "flexibli": 4, "buffer": [4, 5, 26, 41, 55], "loss_nam": 4, "loss_weight": 4, "match": [4, 5, 8, 24, 30, 38], "through": [4, 5, 10, 13, 36, 37, 38, 39, 41, 45, 47, 49, 50, 55], "num_epoch": [4, 55], "max_it": [4, 31, 40, 41, 42, 51, 53, 55], "2000": [4, 19, 21, 42, 51, 53], "displai": [4, 31, 40, 41, 42, 43, 51, 53, 55], "200": [4, 24, 26, 27, 30, 31, 40, 41, 51, 53], "eval_interv": [4, 31, 40, 41, 42, 51, 53], "execut": [4, 6, 8, 10, 24, 30, 33, 35, 41, 47, 48, 51, 55], "1000": [4, 10, 19, 21, 27, 28, 31, 40, 41, 42, 47, 51, 53, 55], "invalid": 4, "remot": [4, 5, 43, 55], "gc": [4, 5, 53, 55], "localhost": [4, 5, 6, 20, 21, 22, 23, 24, 26, 30, 31, 32, 33, 41, 55], "9000": [4, 5, 42, 53, 55], "dir": [4, 5, 19, 30, 31, 32, 33, 50], "virtual": [4, 5, 41], "style": [4, 5, 55], "region": [4, 5, 53], "offici": [4, 5, 55], "uri": [4, 5, 55], "bucket": [4, 5], "url": [4, 5, 18, 24, 35, 46, 55], "http": [4, 5, 13, 17, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 38, 39, 40, 41, 42, 43, 45, 47, 48, 49, 50, 53, 55], "googleapi": [4, 5, 40, 42, 53], "com": [4, 5, 13, 17, 19, 24, 25, 29, 30, 31, 32, 33, 34, 35, 37, 38, 39, 40, 41, 42, 43, 47, 49, 50, 53, 55], "mpi": [4, 31, 40, 41, 42, 43, 51, 53, 55], "print": [4, 19, 
20, 21, 22, 23, 24, 26, 27, 28, 30, 31, 32, 33, 35, 41, 42, 46, 47, 50, 51, 55], "graph_config_fil": [4, 42, 51, 53], "fine": [4, 55], "tune": 4, "include_dense_network": 4, "whole": [4, 10, 30, 36, 56], "dense_opt_states_fil": [4, 55], "\u2170": 4, "sparse_embedding_fil": 4, "\u2171": 4, "sparse_embedding_files_map": 4, "dict": [4, 8, 20, 21, 22, 24, 26, 27, 32, 33, 41, 46], "sparse_embedding2": [4, 5, 41, 42], "358": [4, 24], "wdl_0_sparse_4000": 4, "wdl_1_sparse_4000": 4, "sparse_opt_states_fil": [4, 55], "sparse_opt_states_files_map": 4, "freez": [4, 32], "criteo": [4, 30, 38, 40, 42, 54, 55], "embedding_nam": 4, "unfreez": 4, "reset": [4, 41, 55], "base_lr": 4, "under": [4, 8, 19, 20, 21, 22, 23, 24, 26, 27, 28, 30, 40, 42, 50, 53, 55], "On": [4, 8, 40, 55], "basi": [4, 20, 21, 22, 23, 24, 26, 27, 28, 40, 42, 53], "expos": 4, "elabor": 4, "datareader32": 4, "datareader64": 4, "re": [4, 30, 35, 38, 41, 46, 55], "form": [4, 25, 38, 41, 55], "begin": [4, 5, 28, 30, 33], "train_data_read": 4, "eval_data_read": 4, "enter": [4, 41], "loop": [4, 22, 33, 41, 50], "later": [4, 20, 21, 22, 23, 24, 26, 27, 28, 30, 31, 32, 33, 50, 55], "emb_vector": [4, 9, 10, 20, 21, 22, 23, 24, 26, 27, 28, 30, 32, 33, 51, 53], "distributedslotembed": [4, 36], "slot_id": 4, "localizedslotembed": [4, 36], "info": [4, 20, 21, 22, 23, 24, 26, 28, 30, 31, 32, 33, 40, 41, 42, 50, 51, 53, 55], "nth": 4, "suffix": [4, 41], "latest": [4, 55], "via": [4, 5, 6, 10, 28, 30, 41, 49, 55], "numpi": [4, 15, 20, 21, 22, 23, 24, 26, 27, 28, 30, 31, 32, 33, 41, 42, 47, 48, 49, 50, 51, 55], "float32": [4, 9, 10, 12, 14, 20, 21, 22, 23, 24, 26, 27, 28, 30, 31, 32, 33, 41, 42, 49, 50, 55], "float16": 4, "flow": 4, "debug": [4, 20, 21, 22, 23, 24, 26, 28, 30, 31, 32, 33, 35, 40, 41, 42, 53, 55], "correct": [4, 30, 41, 55], "intermedi": [4, 5, 35, 55], "easili": [4, 10, 20, 28, 38, 50, 55], "tensor_nam": 4, "tensor_typ": 4, "tensor_t": [4, 41], "1280": [4, 24, 53], "75": [4, 6, 10, 13, 17, 23, 32, 33, 38, 42, 
47], "concat1": [4, 26, 27, 30, 33, 41, 42, 51, 53], "sparse_embedding1_train_flow": 4, "fc1_evaluate_flow": 4, "inferencesess": [4, 41], "server": [4, 6, 19, 21, 22, 23, 24, 26, 27, 28, 31, 32, 33, 35, 37, 43, 48, 53, 54, 55], "tensorflow": [4, 5, 6, 8, 17, 18, 20, 21, 22, 23, 25, 26, 27, 34, 35, 36, 37, 38, 41, 43, 55, 56], "tensorrt": [4, 5, 6, 15, 18, 24, 43, 55], "purpos": [4, 38, 43, 50, 55], "triton": [4, 5, 6, 17, 18, 25, 34, 35, 38, 55], "deriv": [4, 10], "besid": [4, 20, 22, 24, 55], "session": [4, 8, 19, 20, 21, 22, 23, 24, 26, 28, 30, 31, 32, 33, 41], "model_config_path": 4, "inference_param": 4, "max_batchs": [4, 5, 41], "num_batch": [4, 55], "40000": [4, 24], "40960": [4, 31, 41, 53], "960": 4, "cardin": [4, 51], "suitabl": [4, 5, 20, 21, 22, 23, 24, 26, 27, 28, 30, 40, 42, 45, 53, 55], "model_config": 4, "dcn_dense_1000": 4, "dcn0_sparse_1000": 4, "deployed_devic": [4, 5, 41], "use_gpu_embedding_cach": [4, 5, 41, 55], "cache_size_percentag": [4, 5, 41], "inference_model": [4, 20, 21, 26], "pred": [4, 23, 28, 32, 41], "embed_vec_s": [4, 9, 20, 22, 24, 26, 27, 30, 32, 33], "sparse_embedding1_inference_flow": 4, "acknowledg": 4, "synthet": [4, 30, 31, 41, 45, 46, 55], "num_slot": [4, 31, 41, 55], "paruqet": 4, "test_data": 4, "nnz_arrai": [4, 31, 41], "simul": [4, 55], "dist_typ": [4, 31, 41], "distribution_t": [4, 31, 41], "powerlaw": [4, 19, 31, 41], "power_law_typ": [4, 31, 41], "law": [4, 20, 21, 22, 23, 24, 26, 27, 28, 31, 38, 40, 41, 42, 53, 55], "powerlaw_t": [4, 31, 41], "alpha": [4, 19, 31, 32, 38, 41, 55], "medium": [4, 37, 55], "short": [4, 31, 37, 41, 55], "num_fil": [4, 31, 41], "eval_num_fil": [4, 31, 41], "num_samples_per_fil": [4, 31, 41], "5242880": [4, 53], "1310720": 4, "regard": [4, 5, 6, 55], "data_generator_param": [4, 31, 41], "encapsul": [4, 13, 55], "datasourc": 4, "hugect": 4, "filesystemtype_t": [4, 53], "ip": [4, 5, 25, 29, 30, 34, 43, 47, 48, 53], "cluster": [4, 5, 10, 24, 33, 36, 38, 42, 53, 55, 56], "namenod": [4, 53, 55], 
"endpoint": [4, 5, 53], "put": [4, 49], "Will": 4, "listen": [4, 5], "huge": 5, "further": [5, 55], "grant": 5, "abil": [5, 18, 55], "perman": 5, "demo": [5, 19, 36, 55], "offer": [5, 18, 55, 56], "superior": 5, "cpu": [5, 6, 8, 10, 18, 20, 21, 22, 23, 24, 25, 26, 28, 29, 30, 31, 32, 33, 34, 41, 43, 47, 48, 50, 54, 55], "counterpart": [5, 6, 55], "although": [5, 55], "modern": 5, "center": 5, "nvidia": [5, 6, 10, 13, 17, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 31, 32, 33, 34, 35, 37, 38, 39, 40, 41, 42, 43, 49, 50, 53, 54, 55], "increas": [5, 30, 54, 55], "ai": [5, 36], "come": [5, 55], "spearhead": 5, "vastli": [5, 55], "clsuter": 5, "ram": [5, 10], "asid": 5, "retain": [5, 55], "hdd": [5, 10], "sdd": [5, 10], "magnitud": [5, 30], "ddr": 5, "hbm": [5, 27, 55], "cost": 5, "throughout": [5, 55], "latenc": [5, 6, 10, 13, 17, 18, 19, 20, 24, 28, 30, 34, 36, 41, 55], "drr": 5, "act": 5, "therebi": 5, "respons": [5, 20, 21, 22, 23, 24, 26, 27, 28, 30, 31, 32, 33, 40, 41, 42, 53], "redi": [5, 55], "retriev": [5, 9, 10, 13, 17, 20, 24, 26, 28, 36, 55], "rdb": [5, 41], "aof": [5, 41], "seamless": 5, "restart": [5, 41, 46, 47, 48], "particip": 5, "claim": 5, "guarante": [5, 33, 41, 55], "statement": [5, 13, 17, 43], "hiredi": [5, 41], "love": 5, "hear": 5, "experi": [5, 10, 19, 25, 29, 34, 43, 55], "let": [5, 41, 50, 55], "successfulli": [5, 10, 24, 25, 29, 30, 31, 32, 33, 34, 36, 41, 42, 43, 53], "unsuccessfulli": 5, "target": [5, 19, 35, 36, 41, 43, 46, 50, 51, 54, 55], "link": [5, 10, 24, 30, 38, 40, 42, 46, 49, 55], "consid": [5, 20, 21, 26, 36, 49], "compliment": 5, "expand": 5, "capabl": [5, 20, 21, 22, 23, 24, 26, 31, 33, 35, 41, 55, 56], "capac": [5, 18, 36], "entir": [5, 55], "whatev": 5, "reason": [5, 24, 43], "becom": [5, 38, 55], "unavail": [5, 55], "respond": [5, 35], "though": 5, "properti": [5, 50], "emphas": 5, "rough": 5, "guidelin": 5, "often": [5, 44], "ethernet": 5, "rel": 5, "practic": [5, 36], "gb": [5, 25, 29, 34, 40, 41, 42, 43], "few": 5, "tb": 
[5, 55], "mainten": 5, "stream": [5, 23, 28, 30, 33, 41, 46, 55], "extern": [5, 45, 50], "kafka": [5, 55], "downtim": 5, "retrain": [5, 30], "logic": [5, 9], "whenev": 5, "engin": [5, 6, 17, 18, 24, 34, 42, 50, 52, 55], "associ": [5, 55], "resolv": [5, 31, 40, 41, 42, 51, 53, 55], "turn": [5, 6, 23, 50, 55], "represent": 5, "fill": [5, 40, 41], "publish": [5, 41], "certain": 5, "ingest": [5, 55], "stage": [5, 19, 20, 21, 24, 26, 30, 31, 32, 33, 38, 54, 55], "suffici": [5, 55], "attempt": [5, 30, 31, 32, 33, 55], "minim": [5, 55, 56], "recent": [5, 55], "lru": 5, "volatiledatabaseparam": [5, 41], "persistentdatabaseparam": 5, "updatesourceparam": 5, "These": [5, 10, 13, 17, 26, 33, 38, 44, 55], "packag": [5, 10, 17, 22, 24, 26, 30, 32, 33, 36, 38, 46, 47, 48, 49, 50, 55, 56], "supportlonglong": [5, 8, 19, 20, 21, 22, 23, 24, 26, 28, 30, 31, 32, 33], "fuse_embedding_t": [5, 23, 28, 30, 33, 55], "volatile_db": [5, 41], "persistent_db": 5, "update_sourc": 5, "At": [5, 40], "origin": [5, 26, 38, 54, 55], "synchron": [5, 19, 30, 35, 55], "plugin": [5, 6, 18, 22, 23, 25, 26, 28, 29, 35, 37, 38, 43, 55], "torch": [5, 17, 18, 29, 32, 41, 47, 48, 55], "describ": [5, 36], "speak": 5, "rare": [5, 40, 55], "might": [5, 30, 32], "sens": 5, "vari": 5, "heterogen": 5, "network_fil": [5, 19], "number_of_refresh_buffers_in_pool": 5, "thread_pool_s": [5, 28, 33], "cache_refresh_percentage_per_iter": [5, 8, 19, 20, 21, 22, 23, 24, 26, 28, 30, 31, 32, 33, 55], "default_value_for_each_t": [5, 8, 19, 20, 21, 22, 23, 24, 26, 28, 30, 31, 32, 33, 41], "maxnum_des_feature_per_sampl": [5, 19, 20, 21, 22, 23, 24, 26, 28, 30, 31, 32, 33], "embedding_cache_typ": [5, 23, 28, 30, 55], "refresh_delai": [5, 20, 21, 22, 23, 24, 26, 28, 30, 31, 32, 33], "refresh_interv": [5, 20, 21, 22, 23, 24, 26, 28, 30, 31, 32, 33], "maxnum_catfeature_query_per_table_per_sampl": [5, 8, 19, 20, 21, 22, 23, 24, 26, 28, 30, 31, 32, 33], "embedding_vecsize_per_t": [5, 8, 19, 20, 21, 22, 23, 24, 26, 28, 30, 31, 32, 33], 
"divis": 5, "device_id": [5, 26, 41], "devicelist": 5, "goe": 5, "directli": [5, 20, 21, 24, 26, 27, 33, 38, 41, 47, 55], "uvm": [5, 55], "No": [5, 20, 21, 23, 24, 26, 30, 31, 32, 33, 40], "percentag": [5, 6, 19, 20, 21, 22, 23, 24, 26, 28, 30, 31, 32, 33, 41], "pool": [5, 19, 20, 21, 22, 23, 24, 26, 28, 30, 31, 32, 33, 41, 55], "refresh": [5, 19, 20, 21, 22, 23, 24, 26, 28, 30, 31, 32, 33, 41, 55], "increment": [5, 50, 55], "occur": [5, 31, 55], "frequenc": [5, 30, 55], "volum": 5, "std": [5, 41, 47, 55], "hardware_concurr": [5, 55], "delai": [5, 19], "wait": [5, 28, 30, 35, 41, 45], "timer": [5, 41], "servic": [5, 10, 20, 21, 22, 26, 30, 33, 55], "period": [5, 36], "partit": [5, 50, 55], "use_context_stream": [5, 23, 28, 30, 33], "context": [5, 23, 28, 30, 33, 41], "sparse_fil": [5, 8, 19, 20, 21, 22, 23, 24, 26, 28, 30, 31, 32, 33], "wdl_infer": [5, 19], "wdl0_sparse_20000": 5, "wdl1_sparse_20000": 5, "dense_fil": [5, 19, 20, 21, 22, 23, 24, 26, 28, 30, 31, 32, 33], "wdl_dense_20000": 5, "num_of_worker_buffer_in_pool": [5, 8, 19, 20, 21, 22, 23, 24, 26, 28, 30, 31, 32, 33], "num_of_refresher_buffer_in_pool": [5, 19, 20, 21, 22, 23, 24, 26, 28, 30, 31, 32, 33], "deployed_device_list": [5, 8, 19, 20, 21, 22, 23, 24, 26, 28, 30, 31, 32, 33], "max_batch_s": [5, 8, 19, 20, 21, 22, 23, 24, 26, 28, 30, 31, 32, 33], "table1": [5, 23, 28], "table2": [5, 23, 28], "gpucachep": [5, 8, 19, 20, 21, 22, 23, 24, 26, 28, 30, 31, 32, 33], "gpucach": [5, 8, 19, 20, 21, 22, 23, 24, 26, 28, 30, 31, 32, 33], "back": [5, 24, 41, 55], "indvidu": 5, "collabor": 5, "inject": 5, "underli": [5, 10, 13, 17, 38, 55], "redis_clust": [5, 41], "127": [5, 26, 30, 41, 42], "7000": [5, 41, 42], "user_nam": 5, "password": 5, "num_partit": [5, 41], "allocation_r": 5, "268435456": 5, "mib": [5, 30, 31, 32, 33], "shared_memory_s": 5, "17179869184": [5, 41], "gib": [5, 42], "shared_memory_nam": 5, "hctr_mp_hash_map_databas": [5, 41], "shared_memory_auto_remov": [5, 41], "65536": [5, 6, 20, 30, 40], 
"enable_tl": [5, 41, 55], "tls_ca_certif": [5, 41, 55], "cacertbundl": 5, "crt": [5, 41], "tls_client_certif": [5, 41, 55], "client_cert": 5, "pem": [5, 41], "tls_client_kei": [5, 41, 55], "client_kei": 5, "tls_server_name_identif": [5, 41, 55], "overflow_margin": 5, "overflow_polici": 5, "databaseoverflowpolicy_t": 5, "enum_valu": 5, "overflow_resolution_target": 5, "initialize_after_startup": [5, 41], "initial_cache_r": 5, "cache_missed_embed": 5, "update_filt": 5, "filter": [5, 30], "7003": 5, "7004": 5, "7005": 5, "10000000": [5, 40, 42], "evict_random": 5, "hash_map": 5, "multi_process_hash_map": [5, 41, 55], "live": [5, 24], "dev": [5, 41, 42, 55], "shm": [5, 41, 55], "parallel_hash_map": 5, "degre": 5, "hashmap": [5, 20, 21, 22, 23, 24, 26, 28, 30, 31, 32, 33, 41, 55], "split": [5, 30, 42, 47, 55], "evenli": 5, "min": [5, 19, 21, 31, 32, 33], "number_of_cpu_cor": 5, "build": [5, 6, 17, 18, 24, 25, 29, 34, 36, 41, 47, 50, 53, 55, 56], "denot": 5, "word": [5, 9, 41], "upper": 5, "bound": 5, "latter": 5, "particularli": [5, 38], "imag": [5, 25, 29, 34, 35, 38, 43, 44, 46, 47, 49, 51, 52], "insuffici": 5, "nativ": [5, 6, 10, 18, 19, 23, 25, 30, 31, 32, 36, 41, 55], "doc": [5, 24, 30, 31, 32, 33, 41, 55], "symbol": [5, 31, 42], "attach": [5, 41], "remov": [5, 35, 40, 41, 50, 55], "disconnect": [5, 41], "program": [5, 30, 31, 32, 33, 41, 43, 55], "pattern": [5, 55], "account": [5, 50], "divid": [5, 9], "num_paritit": 5, "strictli": 5, "incur": [5, 55], "overhead": [5, 6, 22, 30, 55], "too": [5, 41, 50, 55], "5x": [5, 24, 55], "mass": 5, "chunk": [5, 35, 50, 55], "transmiss": 5, "stabil": [5, 30], "1000000": [5, 53], "conjunct": [5, 55], "262143": 5, "18": [5, 19, 23, 30, 32, 33, 41, 47, 51, 53], "lead": [5, 35, 36, 37, 55], "obscur": 5, "therefor": [5, 30, 35, 41, 55], "tl": [5, 55], "ssl": [5, 55], "secur": [5, 50], "encrypt": [5, 41, 55], "slightli": [5, 55], "filesystem": [5, 30, 55], "certif": [5, 41], "ca": [5, 41], "client": [5, 19, 24, 30, 31, 32, 33, 41, 
42, 55], "privat": [5, 41], "sni": 5, "instabl": 5, "sporad": 5, "consumpt": [5, 55], "occupi": [5, 24, 30, 33, 42], "ideal": 5, "100000000": [5, 50], "reliabl": [5, 50], "condit": [5, 20, 21, 22, 23, 24, 26, 27, 28, 40, 42, 53], "prune": 5, "conduct": [5, 6, 28, 29, 55], "until": [5, 19, 20, 21, 24, 26, 41, 55], "evict_least_us": 5, "lfu": 5, "effort": [5, 55], "evict_oldest": 5, "complic": [5, 41, 50, 56], "comparison": [5, 54, 55], "faster": [5, 54, 55], "deliv": [5, 55], "evict": [5, 41, 55], "doubl": [5, 9, 41], "fraction": 5, "keep": 5, "exactli": [5, 9, 55], "shrunk": 5, "80": [5, 10, 13, 17, 19, 20, 21, 22, 23, 24, 26, 31, 32, 33, 35, 38, 41, 42, 43, 47], "surpass": 5, "content": [5, 20, 21, 22, 23, 24, 26, 27, 28, 40, 41, 42, 53, 55], "previous": [5, 55], "reconnect": 5, "rocksdb": [5, 55], "materi": [5, 50], "immedi": 5, "upon": [5, 55], "startup": [5, 30, 55], "50": [5, 20, 22, 23, 24, 30, 32, 33, 41, 42], "anoth": [5, 41, 42, 55], "written": [5, 20, 21, 24, 26, 30, 33, 43], "send": [5, 6, 24, 30, 31, 32, 33, 41, 55], "hps_": 5, "databasetype_t": [5, 41], "tmp": [5, 32, 42], "read_onli": 5, "rocks_db": 5, "datatabas": 5, "prevent": [5, 38, 41, 55], "found": [5, 9, 20, 21, 23, 24, 26, 30, 32, 33, 36, 40, 41, 43, 51, 53], "Be": 5, "awar": [5, 31], "overwrit": [5, 22, 23, 24, 26, 30, 33, 40, 51, 53], "driver": [5, 40], "nf": 5, "kept": [5, 6], "sync": 5, "kafka_message_queu": 5, "broker": [5, 55], "metadata_refresh_interval_m": 5, "30000": [5, 20, 21, 24], "poll_timeout_m": 5, "500": [5, 40, 51, 53], "receive_buffer_s": 5, "262144": 5, "8192": [5, 6, 30, 40], "failure_backoff_m": 5, "max_commit_interv": 5, "9092": 5, "null": [5, 15, 41, 55], "semicolon": 5, "delimit": 5, "pair": [5, 38, 42, 51], "topic": [5, 10, 36], "metadata": [5, 35, 45, 46, 50, 55], "download": [5, 19, 30, 35, 40, 42, 44, 46, 49, 52, 55], "send_buffer_s": 5, "kafkamessagesink": 5, "push": 5, "millisecond": [5, 55], "dispatch": [5, 22, 26], "fail": [5, 24, 42], "success": [5, 6, 8, 30, 
31, 35, 41, 42, 51, 53], "temporarili": 5, "unreach": 5, "retri": 5, "commit": 5, "independ": [5, 13, 24, 30, 31, 32, 33, 55], "were": [5, 20, 21, 22, 24, 30, 33, 50, 54, 55], "sinc": [5, 23, 24, 30, 31, 32, 33, 36, 38, 40, 41, 55], "create_tf_model": 6, "py": [6, 22, 23, 24, 26, 30, 31, 32, 33, 35, 36, 38, 40, 41, 42, 50, 51, 53, 55], "savedmodel": [6, 8, 21, 25, 26, 30, 33, 55], "16gb": [6, 51, 54], "almost": [6, 20, 21, 24, 26, 54], "furthermor": 6, "create_trt_engin": 6, "summar": [6, 10, 25, 29, 30, 34, 43, 52], "onnx": [6, 17, 43, 55], "surgeri": [6, 17], "experiment": [6, 23, 27, 41, 55], "variablepolici": 6, "save_variable_devic": 6, "common": [6, 24, 26, 27, 36, 53, 54, 56], "baselin": 6, "nn": [6, 9, 10, 12, 20, 21, 22, 24, 28, 30, 32, 33, 47], "lookuplay": [6, 7, 8, 11, 22, 23, 26, 28], "unchang": 6, "integr": [6, 10, 17, 18, 20, 34, 37, 55], "built": [6, 18, 20, 21, 24, 26, 30, 34, 35, 36, 47, 55], "optimum": 6, "131072": 6, "investig": [6, 41], "sxm4": [6, 34], "80gb": [6, 34, 36], "trt": [6, 15, 25, 30, 31, 32, 33, 43, 55], "hps_tensorflow_triton_deployment_demo": [6, 25], "demo_for_tf_trained_model": [6, 34], "repeatedli": [6, 22, 33, 41], "analyz": [6, 30, 55], "serv": [6, 18, 30, 33, 36, 55], "studi": [6, 55], "measur": [6, 30, 50, 55], "perf_analyz": [6, 19, 30], "8000": [6, 24, 30, 31, 32, 33, 42], "categorical_featur": [6, 30, 31, 32, 33], "numerical_featur": [6, 30, 31, 32, 33], "276633": 6, "7912898": 6, "7946796": 6, "7963854": 6, "7971191": 6, "7991237": 6, "7991368": 6, "7998351": 6, "7999728": 6, "8014930": 6, "13554004": 6, "14136456": 6, "14382203": 6, "14382219": 6, "14384425": 6, "14395091": 6, "14395194": 6, "14395215": 6, "14396165": 6, "14671338": 6, "22562171": 6, "25307802": 6, "32394527": 6, "32697105": 6, "32709007": 6, "32709104": 6, "76171875": 6, "806640625": 6, "609375": 6, "04296875": 6, "7919921875": 6, "0986328125": 6, "9453125": [6, 33], "38671875": 6, "3984375": 6, "9462890625": 6, "side": [6, 19, 24, 30, 55], "report": 
[6, 24, 30, 55], "count": [6, 24, 26, 27, 30, 31, 32, 33, 46], "28589": 6, "avg": [6, 30], "562": [6, 41, 47], "usec": [6, 30, 34], "59": [6, 20, 23, 33, 40, 47, 51, 53, 55], "431": [6, 41], "53": [6, 19, 23, 26, 40, 47, 51, 53], "merlin": [6, 10, 13, 17, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 31, 32, 33, 34, 36, 37, 38, 41, 42, 43, 50, 53, 55], "nvcr": [6, 10, 13, 17, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 38, 43, 55], "02": [6, 10, 17, 25, 30, 33, 34, 41, 44, 45, 47, 51, 53], "amd": [6, 34], "epyc": [6, 34], "7742": [6, 34], "processor": [6, 34], "softwar": [6, 10, 13, 17, 20, 21, 22, 23, 24, 26, 27, 28, 38, 40, 42, 46, 53, 55], "22": [6, 23, 24, 30, 32, 33, 47, 51, 53, 55], "cuda11": 6, "microsecond": 6, "figur": [6, 18], "logarithm": 6, "10x": 6, "speedup": [6, 54, 55], "551": [6, 28, 30, 40], "612": 6, "380": [6, 28, 30, 40, 42], "389": 6, "42": [6, 22, 23, 30, 31, 32, 33, 40, 41, 42, 45, 47, 50, 51], "608": [6, 40, 47, 53], "667": [6, 41, 53], "381": [6, 40], "346": [6, 47], "76": [6, 23, 30, 53], "832": 6, "639": 6, "438": [6, 47], "428": [6, 47], "94": [6, 23, 47], "1911": 6, "849": 6, "604": [6, 47], "534": [6, 40], "58": [6, 23, 30, 33, 42, 47, 51, 53], "4580": 6, "1059": 6, "927": [6, 30, 40], "766": 6, "98": [6, 23, 47, 50], "9872": 6, "1459": 6, "1446": 6, "1114": 6, "86": [6, 23, 24, 26, 40, 41, 47], "19643": 6, "2490": 6, "2432": 6, "1767": 6, "35292": 6, "4131": 6, "4355": 6, "3053": 6, "56": [6, 20, 23, 24, 26, 30, 31, 33, 34, 40, 42, 47, 51, 53], "32768": [6, 30], "54090": 6, "7795": 6, "6816": 6, "5247": 6, "31": [6, 23, 31, 32, 33, 40, 41, 42, 47, 51, 53], "107742": 6, "15036": 6, "13012": 6, "10022": 6, "213990": 6, "29374": 6, "25440": 6, "19340": 6, "06": [6, 13, 19, 20, 21, 22, 23, 24, 26, 27, 28, 29, 30, 31, 32, 33, 38, 40, 41, 42, 43, 44, 47, 51, 53], "init": [7, 8, 10, 19, 20, 21, 22, 24, 26, 27, 30, 31, 32, 33, 35, 40, 41, 42, 43, 51, 53, 55], "sparselookuplay": [7, 8, 10, 22], "hierarchical_parameter_serv": 
[8, 9, 10, 20, 21, 22, 23, 24, 26, 27, 28, 55], "kwarg": [8, 9, 20, 21, 22, 23, 24, 26, 27, 30, 32, 33], "abbrevi": [8, 9], "implicitli": [8, 24, 41, 55], "ps_config_fil": [8, 9, 12, 15, 20, 21, 22, 23, 24, 26, 27, 28, 30, 31, 32, 33, 55], "global_batch_s": [8, 9, 20, 21, 22, 23, 24, 26, 27, 32, 33, 55], "constructor": [8, 9, 24], "safe": [8, 32, 41], "implicit": [8, 9, 24, 25], "especi": [8, 55], "cuda_visible_devic": [8, 20, 21, 22, 23, 24, 26, 28, 32, 33, 42, 55], "set_visible_devic": [8, 27, 55], "addition": [8, 44, 55], "visibl": [8, 55], "horovod": [8, 27, 55], "scope": [8, 22, 32, 55], "hvd": [8, 27], "sess": [8, 41], "hps_init": 8, "keyword": 8, "dictionari": [8, 50], "demo_model": [8, 15], "demo_model_spars": 8, "sparse_embedding0": [8, 21, 22, 26, 31, 32, 33], "demo_model2": 8, "demo_model2_sparse_0": 8, "demo_model2_sparse_1": 8, "ok": [8, 41, 48], "wrapper": [9, 12, 24], "embedding_lookup_spars": [9, 10, 21, 22], "table_id": [9, 12, 15, 20, 21, 22, 23, 24, 26, 28, 30, 31, 32, 33, 40], "emb_vec_s": [9, 12, 15, 20, 21, 22, 23, 24, 26, 28, 30, 31, 32, 33], "emb_vec_dtyp": [9, 20, 21, 22, 23, 24, 26], "sparse_lookup_lay": [9, 21, 22, 26], "def": [9, 20, 21, 22, 23, 24, 26, 27, 28, 30, 31, 32, 33, 40, 41, 46, 47, 51], "_infer_step": [9, 20, 21, 26], "embedding_vector": [9, 20, 24, 26, 27, 30, 32, 33, 51], "sp_id": [9, 21, 22, 26], "sp_weight": [9, 21, 22, 26], "enumer": [9, 20, 21, 22, 24, 26, 27, 28, 33, 40, 41, 46, 47, 51], "max_norm": [9, 55], "op": [9, 24, 30, 31, 32, 33, 42, 50, 51, 55], "canon": 9, "sparsetensor": [9, 21, 22, 26], "aggreg": 9, "int32": [9, 12, 14, 15, 23, 28, 30, 31, 32, 33], "sqrtn": 9, "squar": 9, "root": [9, 19, 25, 29, 30, 34, 35, 42, 43, 47, 48, 53, 55], "clip": [9, 42, 55], "d0": 9, "d1": 9, "self": [9, 10, 13, 17, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 32, 33, 34, 38, 41, 43], "3x16": 9, "vector_for_id_1": 9, "vector_for_id_3": 9, "vector_for_id_0": 9, "rais": [9, 27, 40, 41], "typeerror": 9, "neither": 9, "nor": 9, 
"valueerror": [9, 27], "lookup_lay": [9, 20, 21, 23, 24], "Its": [9, 55], "get_shap": 9, "realiz": [10, 17, 20, 28, 36], "toolkit": [10, 13, 17, 20, 38], "face": 10, "hundr": 10, "gigabyt": 10, "qualiti": 10, "engag": 10, "dozen": 10, "mitig": 10, "volatil": [10, 20, 21, 22, 23, 24, 26, 28, 30, 31, 32, 33, 40, 41, 55], "tail": [10, 42, 51, 55], "characterist": [10, 30], "hierarchi": 10, "ssd": [10, 18, 55], "subscrib": [10, 20, 21, 22, 23, 24, 26, 28, 30, 31, 32, 33, 41], "illustr": [10, 36, 55], "sok": [10, 25, 36, 37, 55, 56], "distributedembed": [10, 26], "dissect": 10, "subgraph": 10, "composid": 10, "saver": [10, 26], "dump_to_fil": [10, 26, 27], "sm": [10, 13, 17, 35, 38], "volta": [10, 13, 17, 36, 38], "70": [10, 13, 17, 19, 23, 32, 35, 38, 40, 43, 47, 53], "ture": [10, 13, 17, 36, 38], "amper": [10, 13, 17, 36, 38, 55], "h100": [10, 13, 17, 34, 35, 38], "hopper": [10, 13, 17, 30, 38, 55], "90": [10, 13, 17, 19, 23, 24, 35, 38, 47, 55], "compon": [10, 13, 17, 19, 35, 38, 55], "applic": [10, 13, 17, 19, 20, 21, 22, 23, 24, 26, 27, 28, 38, 40, 41, 42, 46, 47, 48, 53, 55], "portabl": [10, 13, 17, 30, 38], "reproduc": [10, 13, 17, 25, 29, 33, 34, 38, 43], "agnost": [10, 17, 38], "ll": [10, 35, 38, 40, 50, 55], "pull": [10, 13, 17, 19, 35, 38, 48], "rm": [10, 13, 17, 19, 25, 29, 30, 34, 35, 38, 41, 43, 47, 48, 55], "cap": [10, 13, 17, 19, 25, 29, 34, 38, 40, 43], "sys_nic": [10, 13, 17, 19, 25, 29, 34, 38, 43], "python3": [10, 20, 21, 22, 23, 24, 26, 27, 28, 30, 31, 32, 33, 38, 40, 41, 49], "jupyt": [10, 13, 17, 24, 30, 31, 32, 33, 47, 48, 55], "cover": [10, 30, 41, 46], "migrat": [10, 55], "hps_dlrm_benchmark": [10, 17], "md": [10, 17, 55], "inherit": 12, "modul": [12, 19, 20, 21, 22, 23, 24, 26, 27, 28, 30, 32, 33, 35, 41, 42, 43, 47, 55], "hps_torch": [12, 13, 28, 29], "__init__": [12, 20, 21, 22, 23, 24, 26, 27, 28, 30, 32, 33], "har": 13, "seamlessli": [13, 55], "catalog": [13, 17, 38], "explor": 13, "visit": [13, 55], "24": [13, 19, 20, 21, 22, 23, 24, 26, 
27, 28, 29, 30, 31, 32, 33, 38, 40, 41, 42, 43, 47, 51, 53, 54], "hpsplugin": [14, 16], "registr": [14, 15], "hps_trt": [14, 15, 17, 30, 31, 32, 33, 34], "num_keys_per_sampl": 14, "embedding_vector_s": 14, "hpsplugincr": [14, 15, 16], "registri": 14, "trail": 15, "charact": [15, 21, 24, 26], "np": [15, 20, 21, 22, 23, 24, 26, 27, 28, 30, 31, 32, 33, 41, 42, 47, 48, 49, 50, 51], "pluginfield": 15, "hps_conf": 15, "dtype": [15, 20, 21, 22, 23, 24, 26, 27, 28, 30, 31, 32, 33, 41, 42, 49, 50, 55], "string_": 15, "pluginfieldtyp": 15, "popular": [17, 45], "tf2onnx": [17, 30, 33], "hugectr2onnx": [17, 31, 36, 41, 55], "graphsurgeon": 17, "ld_preload": [17, 24, 30, 31, 32, 33], "usr": [17, 19, 22, 24, 26, 30, 31, 32, 33, 35, 41, 43, 49], "lib": [17, 22, 24, 26, 30, 31, 32, 33, 41, 49], "libhps_plugin": [17, 30, 31, 32, 33], "pytorch": [17, 28, 29, 34, 41, 47, 48], "ctype": [17, 30, 31, 32, 33], "cdll": [17, 30, 31, 32, 33], "rtld_global": [17, 30, 31, 32, 33], "subcompon": 18, "meet": [18, 41], "site": 18, "relationship": [18, 55], "highest": 18, "speed": [18, 30, 33, 36, 40, 55], "benefit": 18, "unifi": [18, 49, 55], "extens": [18, 41, 48, 55], "hugectr_backend": 18, "critic": [19, 20, 21, 22, 23, 24, 26, 30, 33, 41], "hps_profil": 19, "benchmark": [19, 34, 41, 54, 55], "trion": 19, "procedur": [19, 55], "embedding_cach": 19, "num_kei": 19, "p": [19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 41, 42, 43, 51, 53, 55], "table_s": 19, "630000": 19, "warmup_iter": 19, "900": [19, 28, 40, 41, 51, 53], "000285m": 19, "000384853m": 19, "median": 19, "000365m": 19, "95": [19, 23, 47], "000428m": 19, "99": [19, 23, 47], "000465m": 19, "009736m": 19, "73973e": 19, "010842m": 19, "0117076m": 19, "011596m": 19, "012219m": 19, "016642m": 19, "027379m": 19, "86236": 19, "dedupl": 19, "019159m": 19, "0272492m": 19, "027262m": 19, "028104m": 19, "029548m": 19, "052309m": 19, "36681": 19, "178875m": 19, "231377m": 19, "227815m": 19, "267493m": 19, "284738m": 19, 
"47672m": 19, "4389": 19, "merg": [19, 30, 35], "007656m": 19, "00850756m": 19, "008434m": 19, "009117m": 19, "011863m": 19, "018697m": 19, "118568": 19, "105163m": 19, "15741m": 19, "153763m": 19, "192302m": 19, "208846m": 19, "402043m": 19, "6503": 19, "52": [19, 23, 31, 40, 41, 47, 51, 53, 55], "021729m": 19, "0227739m": 19, "02253m": 19, "023695m": 19, "025035m": 19, "043024m": 19, "44385": 19, "decompress": 19, "deuniqu": 19, "011247m": 19, "0121274m": 19, "011953m": 19, "013055m": 19, "014706m": 19, "022186m": 19, "83661": 19, "719323": 19, "843972": 19, "854749": 19, "894188": 19, "90276": 19, "918169": 19, "parti": [19, 35, 43, 55], "git": [19, 25, 29, 30, 34, 35, 43, 47, 53], "clone": [19, 30, 35, 47, 53, 55], "cd": [19, 25, 29, 30, 34, 35, 38, 40, 41, 43, 47, 48, 53], "submodul": [19, 35, 43, 53], "recurs": [19, 35, 43, 53], "ngc": [19, 36, 47, 49, 53, 55], "mount": [19, 25, 29, 34, 43, 47], "pwd": [19, 25, 29, 30, 34, 41, 43, 47, 48], "8888": [19, 25, 29, 30, 34, 43], "mkdir": [19, 20, 21, 22, 23, 24, 26, 27, 28, 30, 31, 32, 33, 35, 41, 42, 43, 46, 51, 53], "cmake": [19, 35, 36, 43, 50, 53, 55], "dcmake_build_typ": [19, 35, 43, 53], "dsm": [19, 35, 43, 53, 55], "denable_infer": [19, 30, 35], "ON": [19, 30, 35, 36, 43, 53, 55], "denable_profil": 19, "j": [19, 21, 22, 26, 32, 33, 35, 43, 53], "exit": [19, 41], "hotkei": 19, "histogram": 19, "100000": [19, 22], "hot_key_percentag": 19, "hot_key_coverag": 19, "test": [19, 30, 35, 41, 42, 45, 53, 54, 55], "discard": 19, "database_backend": [19, 55], "vdb": 19, "pdb": 19, "refresh_embeddingcach": 19, "lookup_sess": 19, "e2": 19, "model_sampl": 19, "190813m": 19, "243117m": 19, "238085m": 19, "283761m": 19, "346377m": 19, "511712m": 19, "4200": [19, 51], "075086m": 19, "127312m": 19, "121235m": 19, "166826m": 19, "219295m": 19, "285409m": 19, "8248": 19, "44": [19, 23, 26, 30, 32, 33, 40, 41, 47, 51, 53], "xx": [19, 55], "mutual": [19, 30], "accur": 19, "don": [19, 35, 55], "prepar": [19, 38, 50, 51, 55], 
"everyth": [19, 55], "tritonserv": [19, 24, 30, 31, 32, 33], "your_model_nam": 19, "perf_output": 19, "csv": [19, 45, 46, 49, 50], "verbos": [19, 24, 30, 32, 35, 55], "your_generated_request": 19, "pipelin": [19, 36, 54, 55], "pofil": 19, "copyright": [20, 21, 22, 23, 24, 26, 27, 28, 40, 42, 53], "2021": [20, 21, 22, 23, 24, 26, 27, 28, 37, 40, 42, 53], "corpor": [20, 21, 22, 23, 24, 26, 27, 28, 40, 42, 53], "right": [20, 21, 22, 23, 24, 26, 27, 28, 40, 42, 53], "licens": [20, 21, 22, 23, 24, 26, 27, 28, 40, 42, 53], "complianc": [20, 21, 22, 23, 24, 26, 27, 28, 40, 42, 53], "www": [20, 21, 22, 23, 24, 26, 27, 28, 33, 40, 42, 53], "org": [20, 21, 22, 23, 24, 26, 27, 28, 33, 38, 40, 42, 45, 48, 49, 53, 55], "agre": [20, 21, 22, 23, 24, 26, 27, 28, 35, 40, 41, 42, 53], "AS": [20, 21, 22, 23, 24, 26, 27, 28, 40, 42, 53], "warranti": [20, 21, 22, 23, 24, 26, 27, 28, 40, 42, 53], "OR": [20, 21, 22, 23, 24, 26, 27, 28, 40, 42, 53, 54], "OF": [20, 21, 22, 23, 24, 26, 27, 28, 40, 42, 53], "languag": [20, 21, 22, 23, 24, 26, 27, 28, 37, 40, 42, 53], "govern": [20, 21, 22, 23, 24, 26, 27, 28, 40, 42, 53], "permiss": [20, 21, 22, 23, 24, 26, 27, 28, 35, 40, 42, 53], "intend": [20, 21, 22, 23, 24, 26, 27, 28, 40, 42, 53, 55], "preinstal": [20, 21, 22, 23, 24, 26, 27, 28, 30, 31, 32, 33, 53], "naiv": [20, 23], "neural": [20, 21, 22, 23, 24, 26, 30, 36, 38, 41, 50, 51], "gpu_num": [20, 21, 22, 24, 26, 27, 32, 33], "iter_num": [20, 21, 22, 24, 26, 27, 32, 33], "vocabulary_range_per_slot": [20, 21, 22, 24, 26, 27, 32, 33], "naive_dnn": 20, "dense_model_path": [20, 21, 22, 24, 26, 27], "naive_dnn_dens": 20, "embedding_table_path": [20, 21, 22, 24, 26, 27, 32, 33], "naive_dnn_spars": 20, "saved_path": [20, 21, 22, 24, 26, 27, 33], "naive_dnn_tf_saved_model": 20, "np_key_typ": [20, 21, 22, 24, 26, 27, 32, 33], "np_vector_typ": [20, 21, 22, 24, 26, 27, 32, 33], "tf_key_typ": [20, 21, 22, 23, 24, 26, 27, 33], "tf_vector_typ": [20, 21, 22, 24, 26, 27, 33], "join": [20, 21, 22, 23, 24, 
26, 28, 30, 32, 33, 41, 42, 45, 46, 50, 51], "generate_random_sampl": [20, 21, 22, 24, 26, 27, 32, 33], "key_dtyp": [20, 21, 22, 24, 26, 32, 33], "vocab_rang": [20, 21, 22, 23, 24, 26, 28, 32, 33], "keys_per_slot": [20, 21, 24, 32, 33], "randint": [20, 21, 22, 23, 24, 26, 27, 28, 30, 31, 32, 33], "tf_dataset": [20, 21, 22, 24, 26, 27, 33], "from_tensor_slic": [20, 21, 22, 24, 26, 27, 33], "drop_remaind": [20, 21, 22, 24, 26, 27, 33], "trainmodel": [20, 21, 24], "init_tensor": [20, 22, 24, 26, 30, 32, 33], "super": [20, 21, 22, 23, 24, 26, 27, 28, 30, 32, 33], "initial_valu": [20, 21, 22, 24, 30, 33], "fc_1": [20, 21, 24], "kernel_initi": [20, 21, 23, 24], "ones": [20, 21, 22, 23, 24, 26, 27, 30, 32, 33], "bias_initi": [20, 21, 23, 24], "fc_2": [20, 21, 24], "logit": [20, 21, 22, 23, 24, 26, 27, 30, 32, 33], "learning_r": [20, 21, 22, 24, 26, 27, 33, 36], "loss_fn": [20, 21, 22, 24, 26, 27, 33], "binarycrossentropi": [20, 21, 22, 24, 26, 27, 33], "from_logit": [20, 21, 22, 24, 26], "_train_step": [20, 21, 22, 24, 26, 27, 33], "gradienttap": [20, 21, 22, 24, 26, 27, 33], "tape": [20, 21, 22, 24, 26, 27, 33], "grad": [20, 21, 22, 24, 26, 33], "trainable_vari": [20, 21, 22, 24, 26, 27, 33], "apply_gradi": [20, 21, 22, 24, 26, 27, 33], "zip": [20, 21, 22, 24, 26, 27, 33, 45, 46], "id_tensor": 20, "trained_model": [20, 21, 22, 24, 26, 27, 32, 33], "weights_list": [20, 21, 22, 24, 33], "get_weight": [20, 21, 22, 24, 33], "embedding_weight": [20, 22, 24, 32, 33], "dense_model": [20, 21, 24, 26, 27, 31, 41], "get_lay": [20, 21, 24, 26, 27], "2022": [20, 21, 22, 24, 26, 27, 31, 37, 40, 53, 55], "07": [20, 21, 22, 23, 26, 27, 32, 33, 40, 41, 42, 44, 47, 51, 53], "742983": 20, "platform": [20, 21, 22, 23, 24, 26, 30, 31, 32, 33, 34, 37, 41, 55], "cpu_feature_guard": [20, 21, 22, 23, 24, 26, 30, 33, 41], "cc": [20, 21, 22, 23, 24, 26, 27, 30, 33, 41], "151": [20, 21, 22, 26], "oneapi": [20, 21, 22, 23, 24, 26, 30], "onednn": [20, 21, 22, 23, 24, 26, 30], "avx2": [20, 21, 22, 
26, 41], "fma": [20, 21, 22, 26, 41], "rebuild": [20, 21, 22, 23, 24, 26, 30, 33, 41, 55], "appropri": [20, 21, 22, 23, 24, 26, 30, 33, 41], "present": [20, 21, 22, 24, 30, 33], "track": [20, 21, 22, 24, 30, 33, 41], "strong": [20, 21, 22, 24, 30, 33], "rewritten": [20, 21, 22, 24, 30, 33], "subclass": [20, 21, 22, 24, 30, 33], "_________________________________________________________________": [20, 21, 26], "input_1": [20, 21, 22, 23, 24, 26], "inputlay": [20, 21, 22, 23, 24, 26, 30, 33], "embedding_l": 20, "ookup": 20, "tfoplambda": [20, 21, 22, 23, 24, 26, 30, 33], "48": [20, 21, 23, 26, 30, 32, 33, 40, 41, 47, 51], "12544": [20, 21], "257": [20, 21, 22, 24, 26, 30], "801": [20, 40], "trainabl": [20, 21, 22, 23, 24, 26, 30, 31, 33, 40, 41, 42, 44, 53], "57": [20, 22, 23, 30, 33, 40, 42, 47, 51, 53], "326494": 20, "common_runtim": [20, 21, 22, 23, 24, 26, 33, 41], "gpu_devic": [20, 21, 22, 23, 24, 26, 33, 41], "1525": [20, 21, 22, 26], "replica": [20, 21, 22, 23, 24, 26, 33, 41], "30989": [20, 21, 22, 26], "mb": [20, 21, 22, 23, 24, 26, 33, 41, 50], "tesla": [20, 21, 22, 23, 24, 25, 26, 29, 31, 33, 34, 40, 41, 42, 43, 51, 53], "sxm2": [20, 21, 22, 23, 24, 25, 26, 29, 31, 33, 34, 40, 41, 42, 43, 51, 53], "bu": [20, 21, 22, 23, 24, 26, 33, 40, 41], "0000": [20, 21, 22, 23, 24, 26, 33, 40, 41, 42], "00": [20, 21, 22, 23, 24, 26, 30, 31, 32, 33, 40, 41, 47, 48, 51, 53], "6136": 20, "6875": 20, "4463": 20, "05712890625": 20, "3192": 20, "029296875": 20, "2180": 20, "40283203125": 20, "1419": 20, "980712890625": 20, "879": [20, 40, 42], "0396728515625": 20, "513": [20, 42], "3021240234375": 20, "272": [20, 31, 41], "9712219238281": 20, "129": 20, "147705078125": 20, "21624755859375": 20, "model_1": [20, 21, 22, 24, 26, 30], "input_2": [20, 21, 22, 23, 24, 26], "compile_metr": [20, 21, 24, 26], "645703": 20, "368": [20, 21, 26, 28, 33, 40, 41], "asset": [20, 21, 24, 26, 30, 33], "readi": [20, 21, 24, 26, 30, 31, 32, 33, 40, 49, 53], "load_model": [20, 21, 23, 24, 26], 
"create_and_save_inference_graph": [20, 21, 24, 26], "convert_to_sparse_model": [20, 21, 22, 24, 32, 33], "embeddings_weight": [20, 21, 22, 24, 32, 33], "wb": [20, 21, 22, 23, 24, 28, 30, 31, 32, 33, 46, 47, 48, 51], "key_fil": [20, 21, 22, 23, 24, 28, 32, 33, 51], "vec_fil": [20, 21, 22, 23, 24, 28, 32, 33, 51], "vec": [20, 21, 22, 23, 24, 28, 32, 33, 51], "key_struct": [20, 21, 22, 23, 24, 28, 32, 33, 51], "vec_struct": [20, 21, 22, 23, 24, 28, 32, 33, 51], "wa": [20, 21, 22, 23, 24, 26, 30, 35, 36, 40, 42, 43, 50, 53, 54, 55], "model_2": [20, 21, 24], "input_3": [20, 21, 22, 23, 24, 26], "reshape_1": [20, 21, 22, 23, 24], "12801": 20, "necessari": [20, 21, 26, 30, 35, 55], "peek": [20, 21, 26], "writefil": [20, 21, 22, 23, 24, 26, 30, 31, 32, 33, 40, 41, 42, 51, 53], "inference_with_saved_model": [20, 21, 26], "embedding_vectors_peek": 20, "id_tensors_peek": 20, "pars": [20, 21, 22, 23, 24, 26, 28, 30, 31, 32, 33, 55], "hctr": [20, 21, 22, 23, 24, 26, 28, 30, 31, 32, 33, 40, 41, 42, 53, 55], "25": [20, 21, 23, 28, 32, 33, 40, 42, 47, 50, 51, 53], "009": [20, 40], "rk0": [20, 21, 22, 23, 24, 26, 28, 30, 31, 32, 33, 40, 41, 42, 53, 55], "main": [20, 21, 22, 23, 24, 26, 28, 30, 31, 32, 33, 35, 40, 41, 42, 50, 53, 55], "010": [20, 40, 53], "db": [20, 21, 22, 23, 24, 26, 28, 30, 31, 32, 33, 41, 55], "357": 20, "hps_et": [20, 21, 22, 23, 24, 26, 28, 30, 31, 32, 33, 41], "preallocatedhashmapbackend": [20, 21, 22, 26], "18446744073709551615": [20, 21, 22, 23, 24, 26, 31, 32, 33, 41], "363": [20, 40, 47], "000000": [20, 21, 22, 23, 24, 26, 28, 30, 31, 32, 33, 40, 41, 42, 49, 51, 53], "405": 20, "23265739": 20, "11092357": 20, "09594781": 20, "16974597": 20, "22555737": 20, "20454781": 20, "22397298": 20, "1229516": 20, "12451896": 20, "21348731": 20, "11943579": 20, "2502464": 20, "5283": 20, "17773": 20, "26371": 20, "5043": 20, "17928": 20, "22941": 20, "5154": 20, "18816": 20, "28670": 20, "9014": 20, "16185": 20, "22256": 20, "9893": 20, "14515": 20, "25771": 20, 
"5377": 20, "18265": 20, "28063": 20, "hierarchical_parameter_server_demo": [21, 22, 23, 24, 25, 26, 27], "dive": [21, 22, 23, 24, 26, 27, 37], "slot_num_per_t": 21, "embed_vec_size_per_t": 21, "max_vocabulary_size_per_t": 21, "vocabulary_range_per_slot_per_t": 21, "max_nnz_per_slot_per_t": 21, "multi_table_sparse_input_dens": 21, "multi_table_sparse_input": 21, "multi_table_sparse_input_sparse_0": 21, "multi_table_sparse_input_sparse_1": 21, "multi_table_sparse_input_tf_saved_model": 21, "generate_sparse_kei": [21, 22, 26], "max_nnz_per_slot": 21, "max_nnz_of_all_slot": 21, "max_nnz": [21, 22, 26, 27, 55], "sort": [21, 22, 26, 41, 46, 50], "choic": [21, 22, 26, 40], "dense_shap": [21, 22, 26], "generate_dense_kei": 21, "dense_kei": 21, "assert": [21, 23, 28, 47, 55], "sparse_kei": [21, 22, 26, 27], "init_tensors_per_t": 21, "max_nnz_of_all_slots_per_t": 21, "el": 21, "params0": 21, "params1": 21, "input_shap": 21, "fc_3": 21, "embeddings0": 21, "embeddings1": 21, "math": [21, 30, 55], "embedding_weights_per_t": 21, "51": [21, 23, 40, 41, 42, 47, 51, 53], "09": [21, 30, 41, 42, 47, 51, 53], "676041": 21, "271131": 21, "__________________________________________________________________________________________________": [21, 22, 23, 24, 26, 30, 33], "embedding_look": [21, 22, 24, 30, 33], "up_spars": [21, 22], "embedding_looku": [21, 22, 24, 30, 33], "p_spars": [21, 22], "16640": 21, "29": [21, 22, 23, 26, 27, 30, 32, 33, 40, 41, 42, 47, 51, 53], "441": [21, 41, 47], "14588": 21, "11693": 21, "8232": 21, "9658203125": 21, "6276": 21, "9736328125": 21, "4676": 21, "82861328125": 21, "2921": 21, "1875": 21, "1938": 21, "2447509765625": 21, "1093": 21, "598388671875": 21, "616": 21, "3092651367188": 21, "61248779296875": 21, "input_4": [21, 22, 23, 24], "335404": 21, "absl": [21, 24, 26, 33], "_wrapped_model": [21, 24, 26], "args_0": [21, 24, 26], "unsupport": [21, 24, 26], "renam": [21, 24, 26, 50, 55], "args_0_1": [21, 24, 26], "input_5": [21, 23, 24], "input_6": [21, 
23, 24], "sparseloo": [21, 22, 26], "kuplay": [21, 22, 26], "reshape_2": [21, 23], "reshape_3": [21, 23], "29441": 21, "args_0_3": [21, 26], "embeddings0_peek": 21, "embeddings1_peek": 21, "inputs_peek": [21, 26], "2nd": [21, 40], "495": 21, "855": [21, 47], "33": [21, 23, 26, 30, 31, 32, 33, 40, 42, 47, 51, 53], "195": [21, 30, 47], "201": [21, 47, 53], "212": [21, 28, 41], "9905": 21, "1750": 21, "4223": 21, "20477": 21, "22119": 21, "23797": 21, "6111": 21, "9122444": 21, "76979905": 21, "7415885": 21, "66938084": 21, "90488005": 21, "7773342": 21, "6368773": 21, "276": [21, 47], "1610": 21, "408": [21, 30, 32, 40, 42, 53], "1884": 21, "678": [21, 28, 47], "1762": 21, "369": [21, 40], "1794": 21, "403": 21, "1216": 21, "909": 21, "1427": 21, "28882617": 21, "41947648": 21, "597903": 21, "37505823": 21, "70420146": 21, "38864705": 21, "32224336": 21, "31987724": 21, "43596342": 21, "5383081": 21, "37384593": 21, "6026224": 21, "dnn_dens": 22, "dnn_spars": 22, "dnn_tf_saved_model": 22, "dense_featur": [22, 24, 26, 27], "astyp": [22, 23, 24, 26, 27, 28, 31, 32, 33, 42, 50], "fc3": [22, 30, 40, 42, 51, 53], "input_cat": [22, 26], "input_dens": [22, 26, 27], "concat_fea": 22, "mirroredstrategi": [22, 55], "_replica_loss": 22, "compute_average_loss": 22, "_reshape_input": 22, "_dataset_fn": 22, "input_context": 22, "replica_batch_s": 22, "get_per_replica_batch_s": 22, "num_input_pipelin": 22, "input_pipeline_id": 22, "distribute_datasets_from_funct": 22, "41": [22, 23, 30, 32, 33, 40, 41, 47, 48, 51, 53], "55": [22, 23, 30, 33, 41, 42, 47, 51, 53], "554588": 22, "606412": 22, "608128": 22, "609468": 22, "0a": [22, 40, 41], "610818": 22, "0b": [22, 40, 41], "160": [22, 31, 47, 51], "170": [22, 31, 32], "175104": 22, "262400": [22, 30], "437": [22, 28, 31, 41, 47], "761": [22, 30], "eagerli": 22, "call_for_each_replica": 22, "experimental_run": 22, "dist": [22, 24, 26, 32, 49], "1082": [22, 26, 40], "userwarn": [22, 26, 42, 50], "binary_crossentropi": [22, 26], 
"dispatch_target": [22, 26], "batch_all_reduc": 22, "num_pack": 22, "allreduc": 22, "indexedslic": 22, "broadcast": 22, "perreplica": 22, "1950232": 22, "20766959": 22, "2006835": 22, "21188965": 22, "681": [22, 31, 47], "73474": 22, "691": 22, "33826": 22, "588": [22, 40, 42], "15265": 22, "622": [22, 47], "72485": 22, "9260483": 22, "509967": 22, "0374002": 22, "1059036": 22, "002458": 22, "7079678": 22, "333396": 22, "6451607": 22, "_apply_all_reduc": 22, "_all_reduc": 22, "0x7fba4c2dc1f0": 22, "retrac": [22, 33], "trace": [22, 32, 33, 55], "expens": [22, 33], "excess": [22, 33], "pass": [22, 30, 31, 32, 33, 41, 47, 55], "instead": [22, 24, 33, 38, 41, 55], "outsid": [22, 33, 55], "experimental_relax_shap": 22, "relax": 22, "unnecessari": [22, 33, 55], "controlling_retrac": [22, 33], "api_doc": [22, 33], "8326673": 22, "79405844": 22, "85364443": 22, "92679256": 22, "0x7fba4c2dcdc0": 22, "5796976": 22, "54752666": 22, "57471323": 22, "54845804": 22, "61678064": 22, "647662": 22, "6421599": 22, "6278339": 22, "28049487": 22, "2768654": 22, "2943622": 22, "2805586": 22, "2102679": 22, "368755": 22, "4997649": 22, "5143406": 22, "413176": 22, "42411563": 22, "38453132": 22, "4314984": 22, "pretrainedembed": 22, "fc": [22, 23, 47, 54], "new_fc": 22, "train_with_pretrained_embed": 22, "707": [22, 47], "17": [22, 23, 26, 27, 32, 33, 41, 47, 51, 53, 55], "153": 22, "177": [22, 30, 33], "180": [22, 33], "188": [22, 31, 42], "191": [22, 47, 53], "197": 22, "concat_1": [22, 32], "171": [22, 30, 31, 41], "17934436": 22, "17969523": 22, "18917403": 22, "18102707": 22, "7858478": 22, "68311": 22, "66279": 22, "5826445": 22, "7325904": 22, "7331751": 22, "7210605": 22, "7671325": 22, "62144834": 22, "5696643": 22, "5946336": 22, "64713424": 22, "88115656": 22, "9079187": 22, "98161024": 22, "97925556": 22, "6572284": 22, "6304919": 22, "66552734": 22, "6695935": 22, "2002374": 22, "19162768": 22, "1874283": 22, "19209734": 22, "5284709": 22, "6028371": 22, "5635803": 22, 
"5773235": 22, "74001855": 22, "71915305": 22, "619328": 22, "7890761": 22, "55197906": 22, "5565746": 22, "52792": 22, "6230979": 22, "templat": [23, 38], "emebed": 23, "off": [23, 35, 40, 55], "create_model_for_table_fus": 23, "pytest": [23, 28], "vocab_s": [23, 28], "num_query_kei": [23, 28], "num_it": [23, 28], "list_physical_devic": [23, 27], "set_memory_growth": [23, 27], "set_inter_op_parallelism_thread": 23, "hps_config": [23, 28], "_tabl": [23, 28], "generate_embedding_t": [23, 28], "hugectr_sparse_model": [23, 28, 51], "00025": 23, "set_up_model_fil": [23, 28], "table_nam": [23, 27, 28, 55], "model_file_nam": [23, 28], "concat_embed": 23, "create_savedmodel": 23, "hps_config_json_object": [23, 28], "indent": [23, 28], "outfil": [23, 28], "__name__": [23, 30, 41], "__main__": [23, 30, 32, 41], "2023": [23, 30, 32, 33, 40, 41, 42, 55], "03": [23, 30, 32, 33, 40, 42, 44, 47, 49, 50, 51, 53], "28": [23, 24, 30, 32, 33, 40, 42, 47, 51, 53], "206281": 23, "194": [23, 24, 28, 30], "sse3": [23, 24, 26, 30, 33], "sse4": [23, 24, 26, 30, 33], "avx": [23, 24, 26, 30, 33], "36": [23, 24, 30, 31, 32, 33, 40, 47, 51, 53], "420084": 23, "926162": 23, "1637": 23, "30996": 23, "input_7": [23, 24], "input_8": 23, "embedding_lookup0": 23, "embedding_lookup1": 23, "embedding_lookup2": 23, "embedding_lookup3": 23, "embedding_lookup4": 23, "embedding_lookup5": 23, "embedding_lookup6": 23, "embedding_lookup7": 23, "3328": [23, 32], "reshape_4": 23, "reshape_5": 23, "reshape_6": 23, "reshape_7": 23, "26624": 23, "26625": 23, "625": 23, "38": [23, 30, 31, 32, 33, 41, 47, 51, 53], "079": 23, "use_static_t": [23, 28, 30, 31, 32, 33], "8_tabl": [23, 28], "080": [23, 40], "blank": [23, 24, 28, 30, 31, 32, 33, 41], "547": [23, 40, 47], "table0": [23, 28], "hashmapbackend": [23, 24, 31, 32, 33, 41], "39": [23, 26, 32, 33, 40, 47, 51, 53, 54], "379": [23, 40, 42, 47], "830": 23, "40": [23, 30, 32, 33, 40, 41, 42, 47, 51, 53], "448": 23, "table3": [23, 28], "899": [23, 40, 45], "table4": 
[23, 28], "934": [23, 40], "table5": [23, 28], "43": [23, 26, 32, 33, 41, 47, 48, 53], "097": [23, 30, 40, 42], "table6": [23, 28], "296": [23, 28, 41, 45, 55], "table7": [23, 28], "297": [23, 47], "306": [23, 45, 53], "469": 23, "470": [23, 47], "ec": [23, 24, 28, 30, 31, 32, 33, 55], "475": [23, 28, 47], "lookupsess": [23, 28, 30, 33, 41], "inputs_seq": [23, 28], "elaps": [23, 28], "918038": 23, "325440": 23, "818316": 23, "756": [23, 28, 40], "hps_database_backend": [23, 28], "html": [23, 24, 28, 30, 31, 32, 33, 41, 48, 50, 55], "292": [23, 40, 53], "fused_embedding0": [23, 28], "80000": 23, "299": [23, 26, 31, 40], "406": [23, 40, 47], "407": [23, 40, 41, 47, 53], "14": [23, 24, 30, 31, 32, 33, 40, 41, 42, 47, 49, 51, 53], "19": [23, 32, 33, 40, 41, 42, 47, 51, 53], "21": [23, 30, 32, 33, 41, 47, 49, 51, 53], "27": [23, 26, 30, 31, 32, 33, 40, 41, 42, 47, 51, 53], "30": [23, 26, 32, 33, 40, 41, 47, 50, 51, 53, 55], "35": [23, 31, 32, 33, 40, 47, 51, 53], "37": [23, 24, 30, 31, 32, 33, 40, 41, 42, 47, 53], "46": [23, 31, 32, 33, 40, 41, 42, 47, 51], "47": [23, 31, 32, 33, 41, 47, 51], "54": [23, 30, 31, 32, 40, 41, 42, 47, 51, 53], "60": [23, 26, 27, 35, 38, 47, 53], "62": [23, 42, 45, 47], "63": [23, 31, 47, 53], "65": [23, 47], "66": [23, 24, 47], "68": [23, 47, 50], "69": [23, 47], "73": [23, 26, 30, 32, 47], "74": [23, 30, 47], "77": [23, 40, 42, 47], "78": [23, 47], "79": [23, 42, 47], "81": [23, 47], "82": [23, 47], "83": [23, 40, 47], "84": [23, 26, 40, 47], "85": [23, 24, 26, 30, 40, 41, 47], "87": [23, 47], "88": [23, 47], "89": [23, 24, 40, 41, 47], "91": [23, 42, 47], "92": [23, 30, 47], "93": [23, 47], "96": [23, 47], "97": [23, 42, 47], "9442901611328125": 23, "hps_tf_triton_dens": 24, "hps_tf_triton": 24, "hps_tf_triton_sparse_0": 24, "hps_tf_triton_tf_saved_model": 24, "requestsdependencywarn": 24, "urllib3": 24, "chardet": 24, "doesn": [24, 42, 55], "concated_featur": 24, "919938": 24, "444040": 24, "1532": 24, "30991": 24, "23296": 24, "553": 
[24, 30, 40, 47], "10934": 24, "333984375": 24, "9218": 24, "0703125": 24, "7060": 24, "255859375": [24, 33], "5094": 24, "876953125": 24, "3605": 24, "475830078125": 24, "2593": 24, "270751953125": 24, "1741": 24, "0677490234375": 24, "1045": 24, "5091552734375": 24, "541": [24, 30, 40, 47], "4227905273438": 24, "242": [24, 47], "8596649169922": 24, "hps_tf": [24, 25], "model_repo": [24, 30, 31, 32, 33], "triton_model_repo": 24, "23553": 24, "pbtxt": [24, 30, 31, 32, 33], "mv": [24, 26, 27, 30, 31, 32, 33], "tensorflow_savedmodel": 24, "data_typ": [24, 30, 31, 32, 33], "type_int64": 24, "type_fp32": [24, 30, 31, 32, 33], "output_1": [24, 30, 33], "version_polici": 24, "instance_group": [24, 30, 31, 32, 33], "kind_gpu": [24, 30, 31, 32, 33], "tree": [24, 30, 31, 32, 33, 50, 55], "34mmodel_repo": [24, 31, 32, 33], "00m": [24, 31, 32], "34m1": [24, 31, 32, 33], "34mmodel": 24, "34masset": 24, "keras_metadata": 24, "pb": 24, "saved_model": [24, 30, 33], "34mvariabl": 24, "00000": 24, "34mhps_tf_triton_sparse_0": 24, "background": [24, 30, 31, 32, 33, 41, 55], "merlin_hp": 24, "py3": 24, "x86_64": [24, 31, 41], "egg": 24, "libhierarchical_parameter_serv": 24, "curl": 24, "format_non": 24, "is_shape_tensor": 24, "allow_ragged_batch": 24, "label_filenam": 24, "batch_input": 24, "batch_output": 24, "prioriti": [24, 54], "priority_default": 24, "input_pinned_memori": 24, "output_pinned_memori": 24, "gather_kernel_buffer_threshold": 24, "eager_batch": 24, "dynamic_batch": 24, "preferred_batch_s": 24, "max_queue_delay_microsecond": 24, "preserve_ord": 24, "priority_level": 24, "default_priority_level": 24, "priority_queue_polici": 24, "hps_tf_triton_0": 24, "secondary_devic": 24, "profil": [24, 30, 31, 32, 33, 44, 55], "passiv": 24, "host_polici": 24, "default_model_filenam": [24, 30, 31, 32, 33], "cc_model_filenam": 24, "metric_tag": 24, "model_warmup": 24, "tritoncli": [24, 31, 32, 33], "httpclient": [24, 31, 32, 33], "send_inference_request": 24, "num_request": 24, 
"triton_cli": 24, "inferenceservercli": [24, 31, 32, 33], "is_server_l": 24, "get_model_repository_index": 24, "key_tensor": [24, 41], "dense_tensor": 24, "inferinput": [24, 31, 32, 33], "np_to_triton_dtyp": [24, 31, 32, 33], "set_data_from_numpi": [24, 31, 32, 33], "inferrequestedoutput": [24, 31, 32, 33], "get_respons": [24, 31, 32, 33], "health": 24, "httpsocketpoolrespons": 24, "plain": 24, "post": [24, 54], "bytearrai": 24, "model_vers": [24, 31, 32, 33], "binary_data_s": [24, 31, 32, 33], "fall": 24, "trt_convert": 24, "original_model_path": 24, "new_model_path": 24, "instanti": [24, 50], "trtgraphconverterv2": 24, "input_saved_model_dir": 24, "precision_mod": 24, "trtprecisionmod": 24, "trt_func": 24, "convert_to_tensor": [24, 27], "input_fn": 24, "yield": 24, "output_saved_model_dir": 24, "clear": [24, 30, 32, 33, 50], "prior": [24, 30, 33], "924379": 24, "grappler": 24, "elig": 24, "924537": 24, "single_machin": 24, "928272": 24, "deactiv": 24, "028482": 24, "028568": 24, "061909": 24, "068593": 24, "tf2tensorrt": 24, "trt_optimization_pass": 24, "198": 24, "calibr": 24, "use_calibr": 24, "069761": 24, "952": [24, 40], "noop": 24, "1x": 24, "nonconvert": 24, "deeplearn": 24, "069860": 24, "tf_trt_max_allowed_engin": 24, "minimum_segment_s": 24, "069893": 24, "convert_graph": 24, "799": [24, 30, 40, 47], "candid": 24, "060667": 24, "916": [24, 40], "trtengineop_000_000": 24, "trtengineop": 24, "biasadd": 24, "concatv2": 24, "const": [24, 30, 33, 41], "matmul": [24, 26, 27, 32, 33], "329": [24, 41, 47, 53], "745": [24, 33, 40], "753": [24, 41, 47], "200000": [24, 31, 32, 33], "778": [24, 47], "814": [24, 31, 47], "815": 24, "818078": 24, "104": [24, 28, 42, 47], "818150": 24, "106": [24, 30, 41], "749149": 24, "convert_nod": 24, "1275": 24, "814132": 24, "trt_logger": [24, 30, 31, 32, 33], "defaultlogg": 24, "cpp": [24, 30, 31, 32, 33, 41, 55], "cublaswrapp": 24, "cubla": [24, 30, 31, 32, 33], "817575": 24, "trt_engine_op": 24, "1061": [24, 33], "creation": 
24, "817694": 24, "894": [24, 41], "823806": 24, "34m2": 24, "serial": 24, "numba": [24, 30, 33, 42], "select_devic": [24, 30, 33], "close": [24, 30, 33], "rememb": [24, 30, 35], "kill": 24, "again": [24, 30, 55], "simplest": [25, 29, 34, 43], "isol": [25, 29, 34, 43], "repetit": [25, 29, 34, 43], "prefer": [25, 28, 29, 34], "notebookapp": [25, 29, 34, 43, 48], "token": [25, 29, 34, 43, 48], "web": [25, 29, 30, 34, 37, 43, 55], "browser": [25, 29, 30, 34, 43], "aae96ae9387cd28151868fee318c3b3581a2d794f3b25c6b": [25, 29, 34, 43], "hps_multi_table_sparse_input_demo": 25, "hps_pretrained_model_training_demo": 25, "mirror": [25, 55], "sok_to_hps_dlrm_demo": 25, "sparseoperationkit": [25, 26, 27, 55], "hps_table_fusion_demo": 25, "author": [25, 29, 34, 37, 41, 43], "intel": [25, 29, 34, 43, 54], "xeon": [25, 29, 34, 43, 54], "e5": [25, 29, 34, 43, 54], "2698": [25, 29, 34, 43, 54], "v4": [25, 29, 34, 43, 54, 55], "20ghz": [25, 29, 34, 43, 54], "kingslei": [25, 29, 34, 37, 43], "liu": [25, 29, 34, 37, 43], "hierarchicalparameterserv": [26, 27], "sparse_operation_kit_demo": [26, 27, 55], "sparse_operation_kit": [26, 27, 35, 43, 55], "sy": [26, 27, 30, 33, 41, 42, 47, 50], "unit_test": 26, "test_script": 26, "tf2": 26, "260000": [26, 27, 31, 32, 33], "dlrm_dens": [26, 27], "dlrm_spars": [26, 27, 30], "dlrm_tf_saved_model": [26, 27, 33], "plugin_adam": 26, "arch": [26, 27, 32, 33], "out_activ": [26, 27, 32, 33], "secondorderfeatureinteract": [26, 27, 32, 33], "self_interact": [26, 27], "dot_product": [26, 27, 32, 33], "transpose_b": [26, 27, 33], "ones_lik": [26, 27], "linalg": [26, 27], "band_part": [26, 27], "flat_interact": [26, 27, 32, 33], "boolean_mask": [26, 27], "max_vocabulary_size_per_gpu": [26, 55], "arch_bot": [26, 27, 32, 33], "arch_top": [26, 27, 32, 33], "embedding_lay": [26, 27], "bot_nn": [26, 27, 33], "top_nn": [26, 27, 33], "interaction_op": [26, 27, 33], "interaction_out_dim": [26, 27, 32, 33], "els": [26, 27, 40, 41, 42, 47, 49, 51], "reshape_layer1": 
[26, 27, 33], "dense_x": [26, 27, 32, 33], "concat_featur": [26, 27, 30, 32, 33], "z": [26, 27, 32, 33], "emb_opt": 26, "get_embedding_optim": 26, "dense_opt": [26, 27], "get_dense_optim": 26, "embedding_sav": [26, 27], "load_embedding_valu": 26, "embedding_vari": [26, 27], "other_vari": 26, "split_embedding_variable_from_oth": 26, "emb_grad": 26, "optimizerscop": 26, "experimental_aggregate_gradi": 26, "793169": 26, "152": [26, 42], "323141": 26, "gpu_bfc_alloc": 26, "allow_growth": 26, "tf_force_gpu_allow_growth": 26, "323214": 26, "30997": 26, "078977": 26, "kit_cc": [26, 27], "kit_cc_infra": [26, 27], "src": [26, 27, 41], "107": 26, "local_replica_id": 26, "109": 26, "4287744788": 26, "raw_manag": [26, 27], "132": [26, 30, 53], "embeddingvari": [26, 27], "raw_param": 26, "120": [26, 31, 41, 47], "global_replica_id": 26, "137": [26, 28, 47], "facad": 26, "225": [26, 40, 47], "378": [26, 42, 53], "temporari": 26, "kit_cc_impl": [26, 27], "dumping_funct": [26, 27], "num_total_kei": 26, "total_max_vocabulary_s": 26, "350": 26, "upload": 26, "loop_num": 26, "260": 26, "235": [26, 40], "9379717111587524": 26, "12726": 26, "013671875": 26, "78772735595703": 26, "33247375488281": 26, "48320770263672": 26, "234": [26, 30], "79978942871094": 26, "6663873195648193": 26, "426162719726562": 26, "430748462677002": 26, "768443584442139": 26, "38544": 26, "distributed_embed": 26, "distrib": 26, "4160000": 26, "utedembed": 26, "second_order_feature_interacti": [26, 33], "ion": [26, 33], "second_order_feature_interactio": [26, 33], "127233": 26, "325": [26, 47], "777": [26, 30, 47, 53], "165": 26, "089529": 26, "untrac": [26, 33], "bottom_0_layer_call_fn": [26, 33], "bottom_0_layer_call_and_return_conditional_loss": [26, 33], "bottom_1_layer_call_fn": [26, 33], "bottom_1_layer_call_and_return_conditional_loss": [26, 33], "bottom_2_layer_call_fn": [26, 33], "callabl": [26, 33], "embeddingvariable_kei": [26, 27], "embeddingvariable_valu": [26, 27], "079021": [26, 27], "192": [26, 
27, 41], "147": [26, 27, 40, 47], "18360": [26, 27], "rw": [26, 27, 53], "nobodi": [26, 27], "nogroup": [26, 27], "16640000": [26, 27], "jul": [26, 27], "2080000": [26, 27], "911439": 26, "490542": 26, "165777": 26, "043599": 26, "embeddings_peek": 26, "183": [26, 28, 33, 40], "184": 26, "682": 26, "689": 26, "736": [26, 53], "888": [26, 40], "4486": 26, "5745": 26, "255671": 26, "252879": 26, "252045": 26, "145888": 26, "6825647": 26, "6801282": 26, "68074": 26, "68074226": 26, "6818684": 26, "6809397": 26, "3980061": 26, "3981627": 26, "3980992": 26, "78289294": 26, "7833897": 26, "78293324": 26, "78336245": 26, "78305507": 26, "78301686": 26, "880705": 26, "88164043": 26, "88109225": 26, "87982655": 26, "88028604": 26, "88119066": 26, "8650326": 26, "86442304": 26, "86414057": 26, "8642554": 26, "8640611": 26, "8645548": 26, "783202": 26, "78315204": 26, "78240466": 26, "7826805": 26, "78258413": 26, "7824805": 26, "8573375": 26, "85796195": 26, "85979205": 26, "8595341": 26, "85846806": 26, "85798156": 26, "7563881": 26, "7563928": 26, "7564304": 26, "7563316": 26, "7563634": 26, "7564283": 26, "62020814": 26, "6213356": 26, "62018126": 26, "62036": 26, "6201106": 26, "6201722": 26, "85459447": 26, "85330284": 26, "854774": 26, "854769": 26, "8547034": 26, "85447353": 26, "64481944": 26, "6447684": 26, "6449137": 26, "64472693": 26, "64465916": 26, "64503783": 26, "7852191": 26, "78577": 26, "78521436": 26, "7852911": 26, "78544927": 26, "7853453": 26, "6184057": 26, "61849916": 26, "61735946": 26, "61852926": 26, "61921203": 26, "6175788": 26, "7092892": 26, "7092928": 26, "7092843": 26, "70928746": 26, "70928514": 26, "70928574": 26, "6360293": 26, "6360285": 26, "636029": 26, "63602984": 26, "63602865": 26, "63602734": 26, "69062346": 26, "69038725": 26, "690281": 26, "6907744": 26, "6904431": 26, "6903974": 26, "6840397": 26, "684031": 26, "68404853": 26, "6840508": 26, "68404937": 26, "68404216": 26, "7159784": 26, "71973306": 26, "7159706": 26, "7161063": 
26, "71603465": 26, "71592766": 26, "67292804": 26, "67351913": 26, "67328465": 26, "67328894": 26, "6733438": 26, "67301095": 26, "68593156": 26, "6859398": 26, "68593466": 26, "6859294": 26, "6859311": 26, "68593705": 26, "72352993": 26, "7230278": 26, "72331727": 26, "72321206": 26, "72359455": 26, "7233958": 26, "60178": 26, "6017275": 26, "60140777": 26, "60140765": 26, "60151523": 26, "6015818": 26, "73245263": 26, "73322636": 26, "7328412": 26, "73278296": 26, "7325789": 26, "7329973": 26, "68950844": 26, "69225705": 26, "6898281": 26, "6889306": 26, "68944615": 26, "69020116": 26, "848309": 26, "84465414": 26, "84872234": 26, "8486877": 26, "84938526": 26, "8492384": 26, "701107": 26, "6997489": 26, "70110285": 26, "700902": 26, "7011098": 26, "70111394": 26, "5723409": 26, "5738345": 26, "5723305": 26, "57233423": 26, "57233775": 26, "572342": 26, "82768726": 26, "82793933": 26, "8282728": 26, "8282294": 26, "82802093": 26, "8280283": 26, "6491487": 26, "64926434": 26, "64963746": 26, "64926565": 26, "64935625": 26, "64957225": 26, "5615084": 26, "56340796": 26, "5635457": 26, "5635438": 26, "5613529": 26, "56135494": 26, "9477315": 26, "94783926": 26, "94776624": 26, "9477597": 26, "9477446": 26, "9477345": 26, "74906373": 26, "7491199": 26, "74906075": 26, "7490612": 26, "7490609": 26, "7490617": 26, "6141995": 26, "6144503": 26, "6139838": 26, "6140719": 26, "6141932": 26, "61409426": 26, "6773844": 26, "67902935": 26, "67736465": 26, "6773715": 26, "6773739": 26, "67744035": 26, "700472": 26, "70258003": 26, "69977176": 26, "70001334": 26, "75941193": 26, "7594471": 26, "75891864": 26, "7593392": 26, "75900066": 26, "75923026": 26, "tf_cpp_min_log_level": 27, "local_batch_s": 27, "1200": [27, 31, 51, 53], "tolist": 27, "sok_backend_typ": 27, "selcet": 27, "hkv": [27, 55], "det": [27, 55], "sparse_model_path": 27, "sok_embedding_table_path": 27, "sok_dlrm_spars": 27, "local_rank": 27, "generate_ragged_tensor_sampl": 27, "embedding_table_s": 27, 
"lookup_num": 27, "total_indic": 27, "reduce_sum": 27, "raggedtensor": 27, "from_row_length": 27, "total_data": 27, "dynamicvari": [27, 55], "lookup_spars": [27, 55], "sokemblay": 27, "embedding_dim": [27, 51], "var_typ": 27, "table_num": 27, "sok_var": [27, 55], "init_capac": 27, "max_capac": 27, "reshape_layer_list": 27, "sok_reshap": 27, "sok_concat1": 27, "ret_embed": 27, "embed_vec_dim": 27, "embedding_model": 27, "sok_embed": 27, "dense_reshape1": 27, "dense_concat1": 27, "dense_concat2": 27, "input_spars": 27, "sparse_input": 27, "dense_input": [27, 41], "get_embedding_model": 27, "get_embedding_vari": 27, "get_dense_vari": 27, "tmp_var": 27, "sparse_var": 27, "dense_var": 27, "filter_vari": 27, "embedding_load": [27, 55], "opt": [27, 30, 31, 32, 33, 35, 41, 43, 53], "embedding_var": 27, "embedding_dump": [27, 55], "trainer": 27, "distributedgradienttap": 27, "embedding_opt": 27, "optimizerwrapp": [27, 55], "emb_tap": 27, "dense_grad": 27, "embedding_grad": 27, "input_tupl": 27, "dump_model": 27, "du": 28, "lh": 28, "0m": [28, 33, 41, 49], "40m": 28, "multithread": 28, "jit": [28, 33], "fork": [28, 41], "ground": [28, 41], "truth": [28, 41], "modulelist": 28, "keys_list": 28, "annot": 28, "fut": 28, "preds_seq": 28, "preds_seq_gt": 28, "diff": [28, 41], "mse": [28, 41], "05": [28, 30, 31, 42, 44, 45, 46, 47, 49, 51, 53], "836": [28, 40], "839": [28, 47, 53], "use_hctr_cache_implement": [28, 30, 33, 55], "init_ec": [28, 30, 33], "840": [28, 31, 54], "880": [28, 47, 53], "936": 28, "pointer": 28, "975": 28, "018": [28, 33, 40], "041": [28, 40], "059": [28, 40, 42], "070": [28, 40], "088": [28, 40], "113": 28, "123": [28, 40, 41], "167": [28, 30, 41, 47], "196": [28, 31, 47], "210": [28, 34], "223": 28, "239": 28, "252": [28, 41], "284": [28, 40], "307": [28, 45, 47, 53], "319": [28, 40], "336": [28, 30, 53], "360": [28, 47, 53], "390": [28, 40], "409": [28, 40, 42, 53], "446": 28, "453": [28, 41], "515": [28, 30, 47], "535": 28, "560": [28, 47], "580": [28, 
40], "597": 28, "606": [28, 40, 53], "615": [28, 40, 47], "624": [28, 33, 47], "632": 28, "668": [28, 40, 47, 53], "695": [28, 47, 53], "712": [28, 53], "725": [28, 47], "740": [28, 53], "768": [28, 40], "783": [28, 31, 40], "794": [28, 40, 47], "821": [28, 47], "844": [28, 47], "861": 28, "890": 28, "920": [28, 31, 47], "929": [28, 41, 42], "938": [28, 41], "957": [28, 31, 47], "979": [28, 40], "006": [28, 40, 53], "016": [28, 40], "027": [28, 40], "037": [28, 40], "046": [28, 40], "056": [28, 40], "064": [28, 33, 41, 42], "085": [28, 42], "095": 28, "110": 28, "125": 28, "136": [28, 47], "163": [28, 40, 47], "173": 28, "231": 28, "249": 28, "250": [28, 40, 41, 47], "10996460914611816": 28, "hps_torch_demo": 29, "plugin_lib_nam": [30, 31, 32, 33], "plugin_handl": [30, 31, 32, 33], "08": [30, 31, 33, 41, 47, 51, 53], "552734": 30, "litemodel": 30, "reduced_embed": 30, "reduce_mean": [30, 41], "keepdim": 30, "numrical_featur": [30, 33], "3fc_light": 30, "embedding_lookup_1": 30, "inputlai": 30, "er": 30, "up_1": 30, "reduce_mean_1": 30, "tfoplam": 30, "bda": 30, "p_1": 30, "141": 30, "145408": 30, "065": 30, "Then": [30, 31, 32, 33, 47, 48], "graphsurgoen": [30, 31, 32, 33], "deeplearningexampl": [30, 47], "quick": [30, 35], "spark": 30, "fast": [30, 50], "verif": 30, "final_output_dir": 30, "frequency_limit": 30, "roughli": 30, "quickli": [30, 50, 55], "haven": 30, "feauture_": 30, "npy": [30, 41], "deploy_path": 30, "hps_model": 30, "memb_vector": 30, "mkei": 30, "light": [30, 36], "enable_pagelock": [30, 33], "577492": 30, "runpi": [30, 33], "runtimewarn": [30, 33], "unpredict": [30, 33], "behaviour": [30, 33], "msg": [30, 33], "462": [30, 47, 53], "protobuf": [30, 33, 55], "extrem": [30, 33, 38], "slow": [30, 33], "1557": [30, 33], "tag": [30, 33, 35, 41, 55], "928": [30, 47], "signatur": [30, 33, 41], "serving_default": [30, 33], "signature_def": [30, 33], "440": [30, 31, 40, 41], "8f8d49": [30, 33], "opset": [30, 33], "459": 30, "fold": [30, 33], "482": [30, 
47], "781": [30, 31, 41], "onnx_graphsurgeon": [30, 31, 32, 33], "shape_infer": [30, 31, 32, 33], "import_onnx": [30, 31, 32, 33], "statefulpartitionedcal": [30, 33], "unknown": [30, 32, 33], "hps_node": [30, 32, 33], "attr": [30, 31, 32, 33], "cleanup": [30, 31, 32, 33, 55], "toposort": [30, 31, 32, 33], "export_onnx": [30, 31, 32, 33], "3fc_light_with_hp": 30, "color": 30, "pip": [30, 41, 46, 47, 48, 49, 55], "distinct": 30, "139822124016208": 30, "139821990953120": 30, "unk__6": 30, "logger": [30, 31, 32, 33], "explicit_batch": [30, 31, 32, 33], "networkdefinitioncreationflag": [30, 31, 32, 33], "create_hps_plugin_cr": 30, "trt_version": 30, "__version__": 30, "init_libnvinfer_plugin": 30, "plg_registri": 30, "get_plugin_registri": 30, "plugin_cr": 30, "plugin_creator_list": 30, "hps_plugin_cr": 30, "get_plugin_cr": 30, "build_engine_from_onnx": [30, 31, 32, 33], "onnx_model_path": [30, 31, 32, 33, 41], "builder": [30, 31, 32, 33], "create_network": [30, 31, 32, 33], "onnxpars": [30, 31, 32, 33], "create_builder_config": [30, 31, 32, 33], "builder_config": [30, 31, 32, 33], "rb": [30, 31, 32, 33, 46, 48, 49], "set_flag": 30, "builderflag": [30, 33], "create_optimization_profil": [30, 31, 32, 33], "set_shap": [30, 31, 32, 33], "add_optimization_profil": [30, 31, 32, 33], "build_serialized_network": [30, 31, 32, 33], "serialized_engin": [30, 31, 32, 33], "dynamic_3fc_light": 30, "fout": [30, 31, 32, 33], "memusagechang": [30, 31, 32, 33], "974": [30, 40], "2531": 30, "661": 30, "2943": 30, "735": 30, "lazi": [30, 31, 32, 33, 55], "regist": [30, 31, 32, 33, 46], "plugin_vers": [30, 31, 32, 33], "plugin_namespac": [30, 31, 32, 33], "116": [30, 31, 40, 53], "117": [30, 47], "140": [30, 41], "227": [30, 32, 47], "ktf32": [30, 33], "0088975": 30, "cublaslt": [30, 31, 32, 33, 55], "2952": 30, "12129": 30, "349": [30, 53], "190": 30, "3301": 30, "12319": 30, "against": 30, "persist": [30, 31, 32, 33, 40, 55], "16672": 30, "memusagestat": [30, 31, 32, 33], "1248": 30, 
"blockassign": [30, 31, 32, 33], "shift": [30, 31, 32, 33], "shiftntopdown": [30, 31, 32, 33], "took": [30, 31, 32, 33], "040758m": 30, "905970176": 30, "905969664": 30, "3302": 30, "12397": 30, "12407": 30, "encount": [30, 55], "affect": [30, 55], "accuraci": [30, 55], "subnorm": 30, "dynamic_3fc_lite_hps_trt": 30, "tensorrt_plan": [30, 31, 32, 33], "type_int32": [30, 31, 32, 33], "mmodel_repo": 30, "pythonpath": 30, "tensorflow2": 30, "dlrm_and_dcnv2": 30, "perf_data": 30, "minut": 30, "finish": [30, 31, 40, 41, 42, 53], "spark2json": 30, "dataset_path": 30, "binary_split_converted_data": 30, "num": [30, 31, 40, 41, 42, 51, 53], "2000000": [30, 42], "grpcinferenceservic": 30, "8001": 30, "httpservic": 30, "8002": 30, "sh": [30, 36, 38, 40, 41], "echo": [30, 41], "bash": [30, 38, 40], "25600": 30, "time_window": 30, "window": 30, "5000": [30, 31, 42, 51, 55], "msec": 30, "20941": 30, "1163": 30, "sec": 30, "851": 30, "standard": [30, 50, 55], "deviat": [30, 50], "1184": 30, "p50": 30, "p90": 30, "922": 30, "p95": 30, "977": 30, "p99": 30, "1190": 30, "846": [30, 41], "recv": 30, "343": 30, "108": [30, 47], "12800": 30, "14135": 30, "785": [30, 31, 40], "143": [30, 47], "1264": 30, "286": [30, 31, 40], "1236": 30, "1340": 30, "1374": 30, "1476": 30, "1258": 30, "1166": 30, "889": [30, 40], "619": [30, 40], "156": [30, 41, 47], "6400": 30, "8116": 30, "450": 30, "826": 30, "2206": [30, 32], "391": [30, 42, 47], "2183": 30, "2321": 30, "2368": 30, "2486": 30, "2199": 30, "118": [30, 47, 53], "2081": 30, "1632": 30, "1173": 30, "3200": [30, 51], "5311": 30, "295": [30, 41, 47], "3377": 30, "3349": 30, "3486": 30, "3530": 30, "3820": 30, "3370": 30, "155": 30, "3215": 30, "2591": 30, "162": [30, 45, 47], "2068": 30, "1600": [30, 51, 53], "3518": 30, "5109": 30, "5068": 30, "5242": 30, "5316": 30, "5741": 30, "5104": 30, "4933": 30, "4134": 30, "138": 30, "3742": 30, "800": [30, 31, 40, 41, 51, 53], "1910": 30, "9412": 30, "9384": 30, "9529": 30, "9581": 30, "10106": 
30, "9406": 30, "294": [30, 32, 53], "9112": 30, "7674": 30, "267": 30, "7179": 30, "130": [30, 47], "400": [30, 31, 40, 41, 51, 53], "992": [30, 33], "1033": 30, "18132": 30, "726": 30, "18051": 30, "18257": 30, "18330": 30, "23069": 30, "18125": 30, "1278": 30, "16847": 30, "14999": 30, "476": [30, 40, 47], "14234": 30, "203": [30, 33, 41], "6081": 30, "34878": 30, "34734": 30, "35143": 30, "35288": 30, "40804": 30, "34872": 30, "2584": 30, "32288": 30, "516": [30, 40], "29340": 30, "870": [30, 47], "28111": 30, "270": [30, 31], "253": [30, 40], "053": 30, "71063": 30, "1570": 30, "70749": 30, "71666": 30, "73226": 30, "77979": 30, "71058": 30, "5092": 30, "65966": 30, "60716": 30, "1804": 30, "58482": 30, "333": [30, 40], "argpars": [30, 40, 42, 47], "argumentpars": [30, 40, 42], "glob": [30, 42, 47, 50], "defaultdict": [30, 46], "log_pattern": 30, "inference_benchmark": 30, "cmd_log": 30, "result_log": 30, "extract_result_from_log": 30, "log_path": 30, "job_log_pattern": 30, "readlin": 30, "job_log": 30, "each_job_log": 30, "add_argu": [30, 40, 42], "parse_arg": [30, 40, 42], "perf_result": 30, "idx": [30, 32], "tresult": 30, "prebuilt": 30, "undergo": 30, "soon": 30, "arm": 30, "particular": [30, 35, 50, 55], "yourself": [30, 43], "setup": [30, 35, 43, 55], "contrast": 30, "advis": 30, "slight": 30, "alter": 30, "nvstage": 30, "dockerfil": [30, 35], "ctr": [30, 35, 37, 38, 39, 56], "sed": 30, "duse_huge_pag": 30, "action": [30, 40], "sudo": 30, "180000": 30, "node0": 30, "hugepag": [30, 41], "2048kb": 30, "nr_hugepag": 30, "reus": 30, "accomplish": 30, "suggest": 30, "hctr_src": 30, "chmod": 30, "lab": [30, 47, 48, 55], "predcondit": 30, "outlin": [30, 35], "itself": 30, "data_parquet": [31, 41], "561": [31, 41, 47], "564": [31, 40, 41], "568": [31, 40, 41, 47], "gen_0": [31, 41, 53], "204": [31, 42], "gen_1": [31, 41, 53], "455": 31, "gen_2": [31, 41, 53], "709": 31, "gen_3": [31, 41, 53], "gen_4": [31, 41, 53], "gen_5": [31, 41, 53], "gen_6": [31, 41, 53], 
"gen_7": [31, 41, 53], "gen_8": [31, 41, 53], "gen_9": [31, 41, 53], "411": [31, 47, 53], "gen_10": [31, 41, 53], "650": [31, 40], "gen_11": [31, 41, 53], "885": [31, 40, 47], "gen_12": [31, 41, 53], "gen_13": [31, 41, 53], "341": [31, 47], "gen_14": [31, 41, 53], "577": [31, 41], "gen_15": [31, 41, 53], "818": 31, "827": 31, "066": [31, 53], "537": [31, 40], "751": [31, 40], "mpi4pi": [31, 40, 41, 42, 43, 51, 53], "0001": [31, 32, 42], "dlrm_hugectr_graph": 31, "dlrm_hugectr": 31, "cuipcopenmemhandle_v2": 31, "gnu": [31, 41], "libcuda": 31, "539": [31, 40], "2950905596": 31, "542": [31, 47], "698": 31, "peer": [31, 40, 41, 42, 51, 53], "all2al": [31, 40, 41, 42, 51, 53, 55], "699": [31, 41, 47], "700": [31, 40, 47, 51, 53], "705": [31, 47], "782": [31, 47], "max_vocabulary_size_per_gpu_": [31, 41, 42, 51, 53], "3413333": 31, "791": [31, 40, 41, 47], "analysi": [31, 40, 41, 42, 51, 53, 55], "795": [31, 40], "772": [31, 40, 41, 47], "gpu0": [31, 41, 42, 51, 53], "warm": [31, 36, 41, 42, 51, 53, 55], "480": [31, 40, 53], "001000": [31, 41, 42, 51, 53], "522": [31, 40, 47], "72017": 31, "693168": [31, 53], "64947": 31, "694016": 31, "600": [31, 40, 41, 51, 53], "60927": 31, "69323": 31, "432": [31, 47], "60078": 31, "693079": 31, "050": [31, 40], "60162": 31, "693134": 31, "206": [31, 41], "498656": 31, "156138": 31, "rank0": [31, 41, 42, 51, 53], "456": [31, 47, 53], "958": [31, 40, 47], "optimz": [31, 41, 42, 51, 53], "514": [31, 40, 47], "555": [31, 47], "693": 31, "694": 31, "823": [31, 42, 47], "414": [31, 40], "dlrm_hugectr0_sparse_1000": 31, "dlrm_hugectr_dense_1000": 31, "dlrm_hugectr_dens": 31, "graph_config": [31, 41], "convert_embed": [31, 41], "unknown_1": 31, "elif": [31, 32, 40], "unknown_2": 31, "dlrm_hugectr_with_hp": 31, "dlrm_with_hp": 31, "262": [31, 41], "1014": 31, "886": [31, 40], "1239": 31, "cuda_module_load": [31, 32], "env": [31, 32, 33, 53], "onnx2trt_util": [31, 32, 33], "377": [31, 32, 40, 47], "down": [31, 32, 33], "812": [31, 47], "813": 
31, "189": 31, "239950": 31, "205": [31, 41], "419": [31, 40], "335": [31, 40], "146": 31, "5763": 31, "1314": 31, "5879": 31, "1368": 31, "kfaster_dynamic_shapes_0805": 31, "preview": [31, 55], "34118830080": 31, "20304": 31, "10752": 31, "scratch": [31, 32, 33, 35, 36, 55], "32505856": 31, "4628": 31, "09284m": 31, "48099840": 31, "6321": 31, "1580": 31, "6322": 31, "1590": 31, "ten": [31, 32, 33], "cmdline": [31, 32, 33], "con": [31, 32, 33], "sorrt": [31, 32, 33], "libtriton_tensorrt": [31, 32, 33], "capab": [31, 32, 33], "iliti": [31, 32, 33], "ectori": [31, 32, 33], "bac": [31, 32, 33], "kend": [31, 32, 33], "shutil": [31, 32, 33, 42, 50], "as_numpi": [31, 32, 33], "49642828": 31, "52846366": 31, "99999994": 31, "9999992": 31, "9999905": 31, "dataload": [32, 47], "dlrm_pytorch": 32, "dlrm_pytorch_spars": 32, "onnx_path": 32, "modified_onnx_path": 32, "dlrm_pytorch_with_hp": 32, "tqdm": [32, 46, 47, 48, 51], "tqdmwarn": 32, "iprogress": 32, "ipywidget": [32, 47], "readthedoc": 32, "en": 32, "stabl": 32, "user_instal": 32, "autonotebook": 32, "notebook_tqdm": 32, "sequenti": [32, 47], "add_modul": 32, "_linear_layer_": 32, "_relu_layer_": 32, "inplac": [32, 47, 51], "transpos": 32, "index_select": 32, "from_pretrain": [32, 48], "bot_mlp": 32, "interaction_lay": 32, "top_mlp": 32, "criterion": 32, "bceloss": 32, "x0_iter": 32, "from_numpi": 32, "pin_memori": 32, "drop_last": 32, "x1_iter": 32, "y_iter": 32, "squeez": [32, 33, 47, 48], "zero_grad": 32, "state_dict": [32, 47], "bottom_linear_layer_1": 32, "in_featur": [32, 47], "out_featur": [32, 47], "bottom_relu_layer_1": 32, "bottom_linear_layer_2": 32, "bottom_relu_layer_2": 32, "bottom_linear_layer_3": 32, "bottom_relu_layer_3": 32, "top_linear_layer_1": 32, "479": [32, 33, 40], "top_relu_layer_1": 32, "top_linear_layer_2": 32, "top_relu_layer_2": 32, "top_linear_layer_3": 32, "top_relu_layer_3": 32, "top_linear_layer_4": 32, "top_relu_layer_4": 32, "top_linear_layer_5": 32, "top_relu_layer_5": 32, 
"1652954816818237": 32, "7626148462295532": 32, "1845550537109375": 32, "7347715497016907": 32, "0786197185516357": 32, "9271171689033508": 32, "7060756683349609": 32, "7490934133529663": 32, "8274499773979187": 32, "7962949275970459": 32, "6947690844535828": 32, "7241608500480652": 32, "7649394869804382": 32, "7043794393539429": 32, "6948238611221313": 32, "7003152370452881": 32, "7330600619316101": 32, "711887001991272": 32, "6917610168457031": 32, "7227296233177185": 32, "7232402563095093": 32, "7025701999664307": 32, "6962350010871887": 32, "7100769281387329": 32, "7159318923950195": 32, "6963521242141724": 32, "7058508396148682": 32, "7144895792007446": 32, "7082542181015015": 32, "6955724954605103": 32, "6997341513633728": 32, "7167338132858276": 32, "6962475776672363": 32, "6955674290657043": 32, "7098587155342102": 32, "6992183327674866": 32, "6928209066390991": 32, "6933107972145081": 32, "697549045085907": 32, "6969214677810669": 32, "6935250163078308": 32, "6948344111442566": 32, "7015650868415833": 32, "6928752660751343": 32, "6936203837394714": 32, "6962599158287048": 32, "6941655278205872": 32, "6939643025398254": 32, "6933950185775757": 32, "6970551013946533": 32, "0014": 32, "9997": 32, "9991": 32, "0004": 32, "0005": 32, "0002": 32, "dummy_kei": 32, "dummy_numerical_featur": 32, "randn": 32, "input_nam": 32, "output_nam": [32, 41], "dynamic_ax": 32, "ipykernel_52545": 32, "1281679600": 32, "tracerwarn": 32, "incorrect": 32, "stride": [32, 47], "requires_grad": 32, "gather_output_0": 32, "onnx_nam": 32, "gemm_output_0": 32, "gemm": [32, 55], "transb": 32, "114": [32, 40, 47, 54], "relu_output_0": 32, "1455": 32, "constant_output_0": 32, "cpulongtyp": 32, "reshape_output_0": 32, "allowzero": 32, "concat_output_0": 32, "3456": 32, "transpose_output_0": 32, "perm": 32, "matmul_output_0": 32, "729": [32, 47], "gather_33": 32, "351": [32, 33, 40], "concat_1_output_0": 32, "268": 32, "1035": 32, "497": 32, "1259": 32, "543": [32, 47], "652": [32, 40], 
"653": [32, 40, 47, 53], "209": [32, 47], "220": 32, "280": [32, 41], "433": 32, "331": [32, 47, 53], "144": [32, 33, 47], "5771": 32, "933": [32, 40], "115": [32, 40, 47], "5886": 32, "987": [32, 53], "34103362048": 32, "45142016": 32, "011619m": 32, "58774016": 32, "5933": 32, "1043": 32, "5128022": 32, "51312006": 32, "51246136": 32, "5129204": 32, "51302147": 32, "513144": 32, "dlrm_tf": 33, "dlrm_tf_spars": 33, "032517": 33, "963734": 33, "1638": 33, "30974": 33, "171392": 33, "2197505": 33, "897": [33, 47], "578464": 33, "executor": 33, "1209": 33, "abort": 33, "invalid_argu": 33, "_2": 33, "51200": 33, "892396": 33, "xla": 33, "169": [33, 42, 47], "0x55e0fdfeb330": 33, "892450": 33, "streamexecutor": 33, "897903": 33, "mlir": 33, "dump_mlir_util": 33, "269": [33, 47], "crash": [33, 55], "mlir_crash_reproducer_directori": 33, "379151": 33, "stream_executor": 33, "cuda_dnn": 33, "424": [33, 40], "8902": 33, "502058": 33, "device_compil": 33, "lifetim": [33, 55], "_baseoptim": 33, "_update_step_xla": 33, "0x7fa9660adab0": 33, "reduce_retrac": 33, "68028259277344": 33, "2571352064": 33, "639234": 33, "4132346": 33, "20792958": 33, "5957": 33, "8994140625": 33, "231005": 33, "96875": 33, "185315": 33, "3125": 33, "151740": 33, "43695": 33, "6640625": 33, "45556": 33, "24609375": 33, "131654": 33, "78125": 33, "8805829286575317": 33, "49121": 33, "47265625": 33, "60609": 33, "62109375": 33, "676294": 33, "375": [33, 40, 47], "31208": 33, "66015625": 33, "156789": 33, "65625": 33, "103213": 33, "1015625": 33, "394046783447266": 33, "10789": 33, "5703125": 33, "2716": 33, "05859375": 33, "139559": 33, "130419": 33, "13583": 33, "6923828125": 33, "7378": 33, "22802734375": 33, "81185": 33, "40625": 33, "18370": 33, "3314": 33, "90478515625": 33, "15871": 33, "3154296875": 33, "545": [33, 40, 47], "2841796875": 33, "1281": 33, "3038330078125": 33, "52890": 33, "2550": 33, "232177734375": 33, "4526": 33, "03759765625": 33, "5832462310791": 33, "22301483154297": 33, 
"7525691986084": 33, "034607887268066": 33, "6510401964187622": 33, "275766372680664": 33, "707094430923462": 33, "7623991966247559": 33, "5783321857452393": 33, "8166252374649048": 33, "885994553565979": 33, "912842869758606": 33, "7323049902915955": 33, "7469371557235718": 33, "8475004434585571": 33, "serializ": 33, "get_config": 33, "proper": 33, "248789": 33, "721088": 33, "126": 33, "926": [33, 47], "868": [33, 40, 41, 47], "302": [33, 41, 47, 55], "255": [33, 41, 47, 53], "04": [33, 42, 44, 45, 47, 50, 51, 53], "unsqueez": [33, 47], "dlrm_tf_with_hp": 33, "2013": 33, "4018": 33, "721": 33, "421": [33, 40], "4516": 33, "793": [33, 40, 47], "374": 33, "774": [33, 47], "fp8_quant": [33, 55], "775": 33, "860": [33, 40, 47], "863": 33, "864": 33, "869": [33, 40, 42, 47], "902": [33, 42], "947": [33, 40, 41], "51968": 33, "047": [33, 40], "069": [33, 40, 42], "077": 33, "034216": 33, "8710": 33, "1051": 33, "8711": 33, "18350080": 33, "007954m": 33, "31981568": 33, "8764": 33, "1091": 33, "1101": 33, "00mdlrm_tf_with_hp": 33, "00mconfig": 33, "tritonbackend_modelinstanceiniti": 33, "dlrm_tf_with_hps_0": 33, "34091672": 33, "demo_for_pytorch_trained_model": 34, "demo_for_hugectr_trained_model": 34, "instal": [34, 35, 41, 43, 46, 47, 48, 49, 50, 53, 55], "benchmark_tf_trained_large_model": 34, "147gb": [34, 55], "interconnect": 34, "bz": 34, "2tb": 34, "gen4": 34, "1396": [34, 41], "sxm5": 34, "platinum": 34, "8480c": 34, "gen5": 34, "773": [34, 40, 41, 47], "nvl": 34, "94gb": 34, "grace": [34, 55], "480gb": 34, "c2c": 34, "grate": 35, "interest": 35, "submit": [35, 54], "bug": [35, 55], "review": [35, 44], "think": 35, "priorit": 35, "comment": [35, 55], "propos": 35, "ahead": [35, 55], "pend": [35, 55], "forget": 35, "properli": [35, 41, 55], "ask": 35, "approv": 35, "clarif": [35, 55], "hesit": 35, "promptli": 35, "contributor": [35, 38, 55], "journei": 35, "1007": 35, "56a762eae3f8": 35, "dst_imag": 35, "docker_fil": 35, "rmm_ver": 35, "vnightli": 35, "cudf_ver": 
35, "nvtab_ver": 35, "hugectr_dev_mod": 35, "cli": [35, 41], "varnam": 35, "quiet": 35, "suppress": [35, 55], "devel": 35, "cmake_build_typ": 35, "eval_batch": [35, 55], "enable_multinod": [35, 43], "enable_infer": 35, "enable_hdf": 35, "enable_s3": [35, 53], "amazon": [35, 55], "sdk": [35, 53], "skk": 35, "AND": 35, "denable_hdf": [35, 36], "denable_s3": [35, 53], "denable_multinod": [35, 43], "cmake_install_prefix": [35, 43, 55], "dcmake_install_prefix": [35, 43], "devel_infer": 35, "full": [36, 40, 46, 55], "varieti": 36, "8xa100": 36, "localizedslotembeddinghash": 36, "aren": 36, "distributedslotembeddinghash": 36, "localizedslotembeddingonehot": 36, "gender": 36, "wouldn": 36, "easi": [36, 55], "openmpi": 36, "gpudirect": 36, "dcn_2node_8gpu": 36, "footprint": 36, "tensorcor": 36, "mixed_precis": 36, "arithmet": 36, "underflow": 36, "exchang": [36, 54], "onnx_convert": [36, 41, 55], "hugectr2onnx_demo": 36, "ofembed": 36, "thing": 36, "redund": 36, "novelgpu": 36, "forintegr": 36, "hugectr_wdl_predict": 36, "hierrach": 36, "difficult": 36, "confer": 37, "websit": [37, 39], "titl": [37, 45], "date": 37, "speaker": 37, "video": [37, 44], "episod": 37, "\u52a0\u901f\u7684\u63a8\u8350\u7cfb\u7edf\u6846\u67b6": 37, "joei": 37, "wang": 37, "\u4e2d\u6587": 37, "\u5206\u7ea7\u53c2\u6570\u670d\u52a1\u5668\u5982\u4f55\u52a0\u901f\u63a8\u7406": 37, "\u4f7f\u7528": 37, "\u52a0\u901f": 37, "\u8bad\u7ec3": 37, "gem": 37, "guo": 37, "gtc": 37, "sping": 37, "march": 37, "matthia": [37, 43], "langer": [37, 43], "yingcan": [37, 43], "wei": [37, 43], "yu": 37, "fan": [37, 40], "english": 37, "apsara": 37, "\u63a8\u8350\u7cfb\u7edf": 37, "oct": 37, "spring": 37, "tencent": 37, "advertis": [37, 55], "april": 37, "xiangt": 37, "kong": 37, "Into": 37, "minseok": 37, "lee": 37, "jianb": 37, "dong": 37, "china": 37, "2020": [37, 53], "\u6df1\u5165\u7814\u7a76\u6027\u80fd\u4f18\u5316": 37, "\u6027\u80fd\u63d0\u5347": 37, "\u500d": 37, "\u7684\u9ad8\u6027\u80fd": 37, 
"\u5e7f\u544a\u63a8\u8350\u52a0\u901f\u7cfb\u7edf\u7684\u843d\u5730\u5b9e\u73b0": 37, "\u63a8\u7406\u8fc7\u7a0b": 37, "\u5c06": 37, "\u96c6\u6210\u4e8e": 37, "estim": [37, 38, 39, 55, 56], "2019": 37, "\u52a0\u901f\u7684\u63a8\u8350\u7cfb\u7edf\u8bad\u7ec3": 37, "wechat": 37, "\u5206\u7ea7\u53c2\u6570\u670d\u52a1\u5668\u7cfb\u5217\u4e4b\u4e09": 37, "\u96c6\u6210\u5230tensorflow": 37, "nov": 37, "devblog": 37, "\u5206\u5c42\u53c2\u6570\u670d\u52a1\u5668\u6269\u5c55\u63a8\u8350\u7cfb\u7edf\u63a8\u7406": 37, "august": 37, "shashank": 37, "verma": 37, "wenwen": 37, "gao": 37, "jerri": [37, 43], "shi": [37, 43], "kit": [37, 43, 55], "\u7cfb\u5217\u4e4b\u4e8c": 37, "june": 37, "kunlun": 37, "li": 37, "\u7cfb\u5217\u4e4b\u4e00": 37, "\u5206\u7ea7\u53c2\u6570\u670d\u52a1\u5668\u7cfb\u5217\u4e4b\u4e8c": 37, "\u5206\u7ea7\u53c2\u6570\u670d\u52a1\u5668\u7cfb\u5217\u4e4b\u4e00": 37, "jan": 37, "sept": 37, "vinh": [37, 43], "nguyen": [37, 43], "ann": 37, "spencer": 37, "meituan": 37, "interview": 37, "jun": [37, 40], "huang": 37, "sheng": 37, "luo": 37, "benedikt": 37, "schiffer": 37, "\u6269\u5c55\u548c\u52a0\u901f\u5927\u578b\u6df1\u5ea6\u5b66\u4e60\u63a8\u8350\u7cfb\u7edf": 37, "\u7cfb\u5217\u7b2c": 37, "\u90e8\u5206": 37, "\u7684": 37, "\u8bad\u7ec3\u5927\u578b\u6df1\u5ea6\u5b66\u4e60\u63a8\u8350\u6a21\u578b": 37, "ashish": 37, "sardana": 37, "ir": 37, "aug": 37, "oldridg": 37, "juli": 37, "massiv": [38, 55], "bottleneck": [38, 47], "record": [38, 42, 55], "homogen": 38, "easier": [38, 55], "sign": 38, "advanc": 38, "p100": 38, "pascal": 38, "team": [38, 55], "research": 38, "dcn_norm_generate_train": 38, "wdl_norm_generate_train": 38, "dlrm_raw_generate_train": 38, "dcn_parquet_generate_train": 38, "criteo_data": [38, 40], "panda": [38, 40, 41, 45, 46, 49, 51, 55], "introduct": [39, 55], "overarch": 39, "bring": 39, "cell": [40, 41, 43], "shell": 40, "softlink": 40, "kaggl": [40, 55], "occurr": 40, "postfix": [40, 50], "day_1": 40, "wdl_data": 40, "3rd": 40, "4th": 40, 
"embodi": 40, "5th": 40, "6th": 40, "soft": [40, 42], "project_root": 40, "home": 40, "wget": [40, 41, 42, 46], "cail": [40, 42], "day_0": [40, 42], "gz": [40, 41, 42], "deepfm_data_nvt": 40, "nvt": [40, 42, 50, 51], "ln": [40, 41, 42], "smi": 40, "460": [40, 47], "disp": 40, "uncorr": 40, "ecc": 40, "temp": [40, 42], "perf": [40, 55], "pwr": 40, "mig": 40, "00000000": 40, "33c": 40, "p0": 40, "42w": 40, "300w": 40, "0mib": 40, "16160mib": 40, "35c": 40, "45w": 40, "36c": 40, "44w": 40, "8a": [40, 41], "34c": 40, "41w": 40, "gi": 40, "ci": 40, "pid": 40, "dlrm_train": [40, 55], "use_dynamic_hash_t": 40, "shard_plan": 40, "round_robin": 40, "store_tru": 40, "generate_shard_plan": 40, "target_gpu": 40, "gpu_id": 40, "mp_tabl": 40, "6000": [40, 42], "dp_tabl": 40, "use_embedding_collect": 40, "num_embed": 40, "ebc": 40, "emb_vec_list": 40, "emb_vec": 40, "relu3": [40, 53], "relu4": [40, 53], "fc5": [40, 53], "relu5": [40, 53], "fc6": [40, 53], "relu6": [40, 53], "fc7": [40, 53], "relu7": [40, 53], "fc8": [40, 53], "3508545476": 40, "637": 40, "4714": 40, "4441": 40, "609": [40, 53], "5378": 40, "5339": 40, "4636": 40, "4480": 40, "4949": 40, "5183": 40, "789": [40, 53], "790": 40, "792": [40, 41], "919": [40, 42], "max_row_group_s": [40, 41, 42], "133678": 40, "022": 40, "134102": 40, "029": 40, "0804": 40, "0457": 40, "030": 40, "0183": 40, "032": 40, "1121": 40, "033": 40, "035": 40, "0378": 40, "0222": 40, "038": 40, "0691": 40, "039": 40, "0925": 40, "9636": 40, "043": 40, "9363": 40, "044": 40, "0300": 40, "0261": 40, "9558": 40, "049": 40, "9402": 40, "9871": 40, "052": 40, "0105": 40, "6863": 40, "224": [40, 47], "6589": 40, "330": 40, "7527": 40, "474": 40, "7488": 40, "6785": 40, "646": 40, "6628": 40, "755": [40, 53], "7097": 40, "7332": 40, "040": 40, "4089": 40, "175": [40, 47], "3816": 40, "4753": 40, "467": [40, 47, 53], "617": [40, 47, 54], "4011": 40, "3855": 40, "921": 40, "4324": 40, "063": [40, 42], "4558": 40, "221": 40, "1016": 40, "7546": 40, 
"7253": 40, "410": [40, 47, 53], "8425": 40, "412": [40, 42, 53], "8308": 40, "413": [40, 41, 42, 47, 53], "7957": 40, "9031": 40, "415": [40, 53], "9578": 40, "417": 40, "6531": 40, "418": 40, "6238": 40, "420": [40, 47], "8386": 40, "7410": 40, "422": 40, "7292": 40, "6941": 40, "425": [40, 42], "8015": 40, "426": 40, "8562": 40, "558": [40, 47], "4051": 40, "1921": 40, "567": [40, 41], "1628": 40, "570": [40, 47], "3777": 40, "573": 40, "2800": [40, 51], "576": 40, "2683": 40, "579": 40, "2332": 40, "582": 40, "3406": 40, "585": 40, "3953": 40, "587": [40, 47], "0088": [40, 41], "1824": 40, "1531": 40, "589": [40, 41], "3679": 40, "590": [40, 47], "2703": 40, "591": 40, "2585": 40, "592": 40, "2234": 40, "593": 40, "3308": 40, "595": [40, 47], "457": 40, "data0": 40, "data3": 40, "data4": 40, "data5": 40, "data6": 40, "data7": 40, "data8": 40, "data9": 40, "data10": 40, "data11": 40, "data12": 40, "data13": 40, "data14": 40, "data15": 40, "data16": 40, "data17": 40, "data18": 40, "data19": 40, "data20": 40, "data21": 40, "data22": 40, "data23": 40, "data24": 40, "data25": 40, "embeddingcollection0": 40, "emb_vec0": 40, "emb_vec1": 40, "emb_vec2": 40, "emb_vec3": 40, "emb_vec4": 40, "emb_vec5": 40, "emb_vec6": 40, "emb_vec7": 40, "emb_vec8": 40, "emb_vec9": 40, "emb_vec10": 40, "emb_vec11": 40, "emb_vec12": 40, "emb_vec13": 40, "emb_vec14": 40, "emb_vec15": 40, "emb_vec16": 40, "emb_vec17": 40, "emb_vec18": 40, "emb_vec19": 40, "emb_vec20": 40, "emb_vec21": 40, "emb_vec22": 40, "emb_vec23": 40, "emb_vec24": 40, "emb_vec25": 40, "500000": [40, 41], "458": 40, "14373": 40, "24478": 40, "697": [40, 47], "23782": 40, "142604": 40, "168333": 40, "865": 40, "142137": 40, "25698": 40, "19912": 40, "142685": 40, "1404": 40, "24589": 40, "18021": 40, "143021": 40, "211": [40, 42], "139695": 40, "25073": 40, "245": 40, "16407": 40, "141111": 40, "13893": 40, "24958": 40, "17112": 40, "141069": 40, "138218": 40, "25123": 40, "18422": 40, "135439": 40, "759": 40, "137244": 
40, "25471": 40, "803": [40, 53], "19334": 40, "139792": 40, "136812": 40, "2416": 40, "17574": 40, "140519": 40, "135968": 40, "25386": 40, "18238": 40, "134846": 40, "291": [40, 55], "134873": 40, "23619": 40, "3445591887": 40, "383": 40, "384": [40, 47], "385": [40, 42, 47], "386": 40, "628": 40, "643": [40, 41, 47], "651": [40, 47], "654": 40, "939": [40, 41], "946": [40, 41, 47], "997": 40, "0258": 40, "0417": 40, "0144": 40, "011": [40, 41, 42, 53], "1042": 40, "015": 40, "0339": 40, "020": 40, "024": [40, 41], "0652": 40, "0886": [40, 41], "071": 40, "075": [40, 42], "9285": 40, "084": 40, "9480": 40, "092": [40, 41], "9324": 40, "9792": [40, 42], "101": 40, "0027": 40, "332": [40, 55], "9753": 40, "746": 40, "748": [40, 47], "749": 40, "9675": 40, "752": 40, "9519": 40, "9988": 40, "757": 40, "8738": 40, "8464": 40, "760": 40, "762": [40, 47, 53], "763": 40, "8660": 40, "765": [40, 47], "8503": 40, "767": 40, "8972": 40, "9207": 40, "911": 40, "917": 40, "4128": 40, "924": 40, "4792": 40, "930": 40, "4050": 40, "3894": 40, "937": [40, 41, 47], "4363": 40, "940": [40, 41], "4597": 40, "941": [40, 41], "4031": 40, "942": [40, 47], "3757": 40, "944": [40, 47], "4695": 40, "945": [40, 47], "4656": 40, "3796": 40, "948": [40, 47], "4265": 40, "950": [40, 47], "4500": [40, 51], "841": [40, 47], "842": [40, 47], "251": [40, 53], "143524": 40, "34586": 40, "345": [40, 47], "48449": 40, "142247": 40, "657": 40, "141641": 40, "33134": 40, "40384": 40, "142243": 40, "139913": 40, "33118": 40, "161": 40, "40793": 40, "142713": 40, "138901": 40, "34956": 40, "40618": 40, "140238": 40, "883": 40, "138208": 40, "34071": 40, "38745": 40, "140117": 40, "326": 40, "137638": 40, "34076": 40, "42352": 40, "135055": 40, "727": [40, 47, 53], "137268": 40, "728": 40, "33588": 40, "819": [40, 53], "38619": 40, "139783": 40, "193": 40, "136816": 40, "3762": 40, "43341": 40, "140772": 40, "581": [40, 47], "136368": 40, "3521": 40, "673": 40, "41807": 40, "135264": 40, "985": 40, 
"135726": 40, "34242": 40, "198655838": 40, "517": 40, "730": [40, 47], "731": 40, "732": 40, "896": 40, "907": 40, "913": [40, 47], "914": 40, "915": [40, 47], "969": 40, "002": 40, "004": 40, "005": [40, 53], "007": [40, 53], "008": [40, 53], "012": 40, "013": 40, "014": 40, "017": 40, "021": 40, "023": 40, "025": 40, "081": 40, "121": [40, 41], "423": [40, 41, 47], "505": [40, 53], "145": 40, "559": [40, 42, 47], "747": [40, 47], "275": 40, "091": [40, 42], "133": [40, 42], "361": [40, 42, 53], "0203": 40, "364": [40, 42, 55], "365": [40, 41, 42, 53], "0515": 40, "367": 40, "9460": 40, "0046": 40, "9890": 40, "371": [40, 47], "1355": 40, "372": 40, "8269": 40, "373": [40, 47], "9187": 40, "376": 40, "9500": 40, "8445": 40, "8875": 40, "3660": 40, "525": [40, 47], "4578": 40, "528": 40, "531": [40, 53], "4890": 40, "3835": 40, "538": 40, "4421": 40, "544": [40, 42, 47], "5730": 40, "3562": 40, "546": [40, 47], "548": [40, 47], "550": [40, 47], "3738": 40, "552": [40, 47], "4167": 40, "5632": 40, "594": 40, "599": 40, "144991": 40, "22035": 40, "633": [40, 47], "03885": 40, "144124": 40, "144851": 40, "1863": 40, "98102": 40, "145444": 40, "540": 40, "141821": 40, "18638": 40, "96441": 40, "144249": 40, "139519": 40, "18203": 40, "556": [40, 41, 47], "97548": 40, "140895": 40, "490": 40, "13942": 40, "491": 40, "19363": 40, "533": 40, "97628": 40, "141202": 40, "465": [40, 47], "13947": 40, "18342": 40, "97817": 40, "136504": 40, "138534": 40, "19586": 40, "96355": 40, "14067": 40, "138213": 40, "20188": 40, "98811": 40, "142139": 40, "138044": 40, "19324": 40, "427": [40, 47, 50], "96149": 40, "136835": 40, "137419": 40, "18732": 40, "grow": [40, 55], "1217153067": 40, "506": 40, "485": [40, 41, 47], "486": 40, "662": 40, "669": 40, "670": 40, "671": [40, 42, 47], "672": [40, 47], "862": [40, 49], "866": [40, 47], "871": 40, "872": [40, 53], "873": [40, 41, 42], "875": [40, 53], "876": 40, "878": 40, "881": 40, "882": 40, "884": [40, 47], "949": 40, "055": 40, 
"157": [40, 47], "780": [40, 53], "953": [40, 47], "150": [40, 41, 53], "434": [40, 47], "786": 40, "8152": 40, "787": 40, "7878": 40, "9441": 40, "8699": 40, "8542": 40, "9011": 40, "9246": 40, "797": 40, "7136": 40, "798": 40, "802": [40, 47], "7683": 40, "805": [40, 47], "7996": 40, "806": [40, 47], "8230": 40, "2527": 40, "943": [40, 47], "2253": 40, "3074": 40, "955": [40, 47], "2917": 40, "3386": 40, "961": 40, "3621": 40, "962": [40, 41], "2429": 40, "964": 40, "2156": 40, "965": 40, "3718": 40, "966": [40, 47], "967": [40, 42, 47], "2976": 40, "968": 40, "2820": 40, "3289": 40, "970": 40, "3523": [40, 42], "859": 40, "static_map": 40, "553648128": 40, "142151": 40, "53912": 40, "26107": 40, "141023": 40, "141078": 40, "57008": 40, "10267": 40, "141925": 40, "309": [40, 47, 53], "140561": 40, "55499": 40, "362": [40, 42, 47, 55], "13614": 40, "14338": 40, "139972": 40, "54929": 40, "464": 40, "10246": 40, "141379": 40, "139553": 40, "56729": 40, "11698": 40, "141421": 40, "642": [40, 41], "139362": 40, "56153": 40, "696": 40, "11376": 40, "136499": 40, "138972": 40, "60721": 40, "811": 40, "11548": 40, "141355": 40, "138726": 40, "56329": 40, "10124": 40, "142614": 40, "139617": 40, "5483": 40, "14957": 40, "138442": 40, "138159": 40, "57499": 40, "ensembl": [41, 43, 55], "inferenceon": 41, "739": 41, "638": 41, "715": 41, "986": 41, "142": 41, "218": 41, "hps_demo": [41, 43], "reshape2": [41, 42, 51], "1100": [41, 51, 53], "ground_truth": 41, "2598678435": 41, "565": 41, "566": 41, "636": 41, "808": 41, "810": 41, "21845": 41, "0047": 41, "6921": 41, "0092": 41, "6824": 41, "207": [41, 47], "208": 41, "213": 41, "658": [41, 47], "444961": 41, "693355": 41, "508793": 41, "694358": 41, "422282": 41, "695494": 41, "764": 41, "175263": 41, "691037": 41, "174492": 41, "688767": 41, "503806": 41, "000913": 41, "093": [41, 42], "148": 41, "279": 41, "hps_demo_with_embed": 41, "hps_demo_dense_1000": 41, "sparse_model": [41, 55], "hps_demo0_sparse_1000": 41, 
"hps_demo1_sparse_1000": 41, "hps_demo_without_embed": 41, "parameterserverconfig": 41, "pd": [41, 45, 46, 49, 51], "onnxruntim": 41, "ort": 41, "key_offset": 41, "cumsum": 41, "ps_config": 41, "emb_table_nam": 41, "max_feature_num_per_sample_per_emb_t": 41, "inference_params_arrai": 41, "df": [41, 50, 53], "read_parquet": [41, 50, 51], "dense_input_column": 41, "cat_input1_column": 41, "cat_input2_column": 41, "loc": [41, 51], "to_numpi": 41, "cat_input1": 41, "cat_input2": 41, "embedding1": 41, "flatten": 41, "embedding2": 41, "get_output": 41, "input_fe": 41, "get_input": 41, "sess_ref": 41, "res_ref": 41, "pred_ref": 41, "diff_ref": 41, "mse_ref": 41, "18488": 41, "18470": 41, "4895492": 41, "509022": 41, "38192913": 41, "5264926": 41, "50650454": 41, "47927693": 41, "48954916": 41, "50902206": 41, "38192907": 41, "52649266": 41, "5065045": 41, "4792769": 41, "3887142e": 41, "566238532": 41, "3543": 41, "cleanunusedinitializersandnodearg": 41, "key_to_indice_hash_all_t": 41, "lookup_fromdlpack": [41, 55], "capsul": [41, 55], "to_dlpack": 41, "key_capsul": 41, "out_capsul": 41, "out_put": 41, "from_dlpack": 41, "runtimeerror": 41, "cuda_devic": 41, "is_avail": [41, 42], "10028": 41, "10004": 41, "0307": 41, "0264": 41, "0294": 41, "0151": 41, "0281": 41, "eager": 41, "out_tensor": 41, "out_dlcapsul": 41, "729218": 41, "182": [41, 42], "168630": 41, "1639": 41, "30048": 41, "170043": 41, "30184": 41, "171618": 41, "173095": 41, "174795": 41, "176299": 41, "177782": 41, "179411": 41, "20005": 41, "30047": 41, "20004": 41, "30001": 41, "20037": 41, "02182689": 41, "01806355": 41, "01985828": 41, "0136845": 41, "01738386": 41, "00323257": 41, "unix": 41, "primari": [41, 55], "secondari": 41, "multi_process_hp": 41, "multiprocess": [41, 46], "create_hp": 41, "num_max_process": 41, "subprocess": [41, 46], "getpid": 41, "await": 41, "sleep": 41, "eras": 41, "lost": 41, "delet": [41, 55], "revok": 41, "preserv": 41, "risidu": 41, "monitor": 41, "counter": 41, "destroi": 
41, "far": 41, "1394": 41, "1397": 41, "270453215232": 41, "269706559488": 41, "17179868672": 41, "313": 41, "multiprocesshashmapbackend": [41, 55], "289": [41, 47, 53], "311": [41, 53], "281": 41, "282": 41, "260310085632": 41, "7783505728": 41, "463": [41, 47], "706": 41, "711": [41, 53], "842594773": 41, "3887142264200634e": 41, "497305659": 41, "101124718": 41, "176": [41, 42], "687": 41, "detach": [41, 47, 48], "progress": [41, 55], "mock": 41, "tar": [41, 47], "archiv": [41, 50], "rf": 41, "xf": 41, "sf": 41, "112": [41, 47], "443": 41, "sent": 41, "codeload": 41, "ref": 41, "gzip": [41, 42], "87m": 41, "50mb": 41, "3011655": 41, "tmr": 41, "mkreleasehdr": 41, "broken": [41, 55], "pipe": 41, "34mcc": 41, "33mmakefil": 41, "dep": 41, "sentinel": 41, "gcda": 41, "gcno": 41, "gcov": 41, "lcov": 41, "makefil": 41, "adlist": 41, "quicklist": 41, "ae": 41, "anet": 41, "sd": 41, "zmalloc": 41, "lzf_c": 41, "lzf_d": 41, "pqsort": 41, "zipmap": 41, "sha1": 41, "ziplist": 41, "replic": 41, "t_string": 41, "t_list": 41, "t_set": 41, "t_zset": 41, "t_hash": 41, "pubsub": 41, "intset": 41, "syncio": 41, "crc16": 41, "endianconv": 41, "slowlog": 41, "bio": 41, "rio": 41, "rand": [41, 49], "memtest": 41, "syscheck": 41, "crcspeed": 41, "crc64": 41, "bitop": 41, "notifi": 41, "setproctitl": 41, "hyperloglog": 41, "sparklin": 41, "geo": 41, "lazyfre": 41, "expir": 41, "geohash": 41, "geohash_help": 41, "childinfo": 41, "defrag": 41, "siphash": 41, "rax": 41, "t_stream": 41, "listpack": 41, "localtim": 41, "lolwut": 41, "lolwut5": 41, "lolwut6": 41, "acl": 41, "sha256": 41, "timeout": 41, "setcpuaffin": 41, "monoton": 41, "mt19937": 41, "resp_pars": 41, "call_repli": 41, "script_lua": 41, "function_lua": 41, "redisassert": 41, "cli_common": 41, "distclean": 41, "clean": [41, 42, 50, 55], "linenois": 41, "lua": 41, "jemalloc": 41, "hdr_histogram": 41, "leav": 41, "xo": 41, "commandfilt": 41, "testrdb": 41, "infotest": 41, "misc": 41, "hook": 41, "blockonkei": 41, 
"blockonbackground": 41, "scan": 41, "datatype2": 41, "auth": 41, "keyspace_ev": 41, "blockedcli": 41, "getkei": 41, "getchannel": 41, "test_lazyfre": 41, "defragtest": 41, "keyspec": 41, "zset": 41, "mallocs": 41, "aclcheck": 41, "subcommand": 41, "repli": 41, "cmdintrospect": 41, "eventloop": 41, "moduleconfig": 41, "moduleconfigstwo": 41, "usercal": 41, "pedant": 41, "dredis_stat": 41, "c11": [41, 53], "wall": [41, 50], "wno": 41, "o2": 41, "malloc": 41, "build_tl": 41, "use_systemd": 41, "cflag": 41, "ldflag": 41, "redis_cflag": 41, "redis_ldflag": 41, "prev_final_cflag": 41, "ggdb": 41, "duse_jemalloc": 41, "prev_final_ldflag": 41, "rdynam": 41, "1mmake": 41, "1mhiredi": 41, "c99": 41, "o3": 41, "fpic": 41, "wstrict": 41, "prototyp": 41, "wwrite": 41, "net": [41, 47, 48, 50], "sockcompat": 41, "rc": 41, "libhiredi": 41, "1mlinenois": 41, "1mlua": 41, "dlua_ansi": 41, "denable_cjson_glob": 41, "dlua_use_mkstemp": 41, "myldflag": 41, "lapi": 41, "lcode": 41, "ldebug": 41, "ldo": 41, "ldump": 41, "lfunc": 41, "lgc": 41, "llex": 41, "lmem": 41, "lobject": 41, "lopcod": 41, "lparser": 41, "lstate": 41, "lstring": 41, "ltabl": 41, "ltm": 41, "lundump": 41, "lvm": 41, "lzio": 41, "strbuf": 41, "fpconv": 41, "lauxlib": 41, "lbaselib": 41, "ldblib": 41, "liolib": 41, "lmathlib": 41, "loslib": 41, "ltablib": 41, "lstrlib": 41, "loadlib": 41, "linit": 41, "lua_cjson": 41, "lua_struct": 41, "lua_cmsgpack": 41, "lua_bit": 41, "liblua": 41, "dll": 41, "ranlib": 41, "lm": 41, "luac": 41, "1mhdr_histogram": 41, "dhdr_malloc_includ": 41, "hdr_redis_malloc": 41, "libhdrhistogram": 41, "1mjemalloc": 41, "g0": 41, "lg": 41, "quantum": 41, "je_": 41, "gnu99": 41, "g3": 41, "funrol": 41, "xsltproc": 41, "gcc": 41, "iso": 41, "c89": 41, "crai": 41, "gnu11": 41, "wextra": 41, "wshorten": 41, "wsign": 41, "wundef": 41, "preprocessor": 41, "libstdc": 41, "linkag": 41, "grep": 41, "egrep": 41, "ansi": 41, "stat": 41, "stdlib": 41, "inttyp": 41, "stdint": 41, "unistd": 41, "bigendian": 
41, "void": 41, "intmax_t": 41, "pc": 41, "paus": 41, "nm": 41, "gawk": 41, "mawk": 41, "usabl": [41, 55], "presenc": 41, "malloc_usable_s": 41, "__attribute__": 41, "syntax": 41, "fvisibl": 41, "hidden": [41, 48], "werror": 41, "herror_on_warn": 41, "tls_model": 41, "alloc_s": 41, "gnu_printf": 41, "printf": 41, "bsd": 41, "ld": 41, "autoconf": 41, "memalign": 41, "valloc": 41, "backtrac": 41, "sbrk": 41, "utrac": 41, "__builtin_unreach": 41, "__builtin_ffsl": 41, "__builtin_popcountl": 41, "lg_page": 41, "pthread": 41, "pthread_creat": 41, "lpthread": 41, "dlfcn": 41, "dlsym": 41, "pthread_atfork": 41, "pthread_setname_np": 41, "clock_gettim": 41, "clock_monotonic_coars": 41, "clock_monoton": 41, "mach_absolute_tim": 41, "syscal": 41, "secure_getenv": 41, "sched_getcpu": 41, "sched_setaffin": 41, "issetugid": 41, "_malloc_thread_cleanup": 41, "_pthread_mutex_init_calloc_cb": 41, "__atom": 41, "__sync": 41, "darwin": 41, "osatom": 41, "madvis": 41, "madv_fre": 41, "madv_dontne": 41, "madv_do": 41, "nt": 41, "madv_": 41, "__builtin_clz": 41, "os_unfair_lock_": 41, "glibc": 41, "mutex": 41, "d_gnu_sourc": 41, "strerror_r": 41, "stdbool": 41, "conform": 41, "_bool": 41, "xsl": 41, "manpag": 41, "xml": 41, "jemalloc_macro": 41, "jemalloc_proto": 41, "jemalloc_typedef": 41, "jemalloc_preambl": 41, "jemalloc_test": 41, "stamp": 41, "jeprof": 41, "jemalloc_def": 41, "jemalloc_internal_def": 41, "jemalloc_test_def": 41, "public_symbol": 41, "private_symbol": 41, "awk": 41, "private_symbols_jet": 41, "public_namespac": 41, "public_unnamespac": 41, "jemalloc_protos_jet": 41, "jemalloc_renam": 41, "jemalloc_mangl": 41, "jemalloc_mangle_jet": 41, "revis": [41, 55], "configure_cflag": 41, "specified_cflag": 41, "extra_cflag": 41, "cppflag": 41, "d_reentrant": 41, "cxx": 41, "configure_cxxflag": 41, "specified_cxxflag": 41, "extra_cxxflag": 41, "extra_ldflag": 41, "dso_ldflag": 41, "wl": 41, "sonam": 41, "lstdc": 41, "rpath_extra": 41, "xslroot": 41, "bindir": 41, "datadir": 
41, "includedir": 41, "libdir": 41, "mandir": 41, "man": 41, "srcroot": 41, "abs_srcroot": 41, "objroot": 41, "abs_objroot": 41, "jemalloc_prefix": 41, "jemalloc_private_namespac": 41, "install_suffix": 41, "malloc_conf": 41, "autogen": 41, "experimetal_smallocx": 41, "prof": 41, "libunwind": 41, "libgcc": 41, "xmalloc": 41, "lazy_lock": 41, "oblivi": 41, "libjemalloc": 41, "iinclud": 41, "djemalloc_no_private_namespac": 41, "sym": 41, "arena": 41, "background_thread": 41, "bitmap": 41, "ckh": 41, "ctl": 41, "div": 41, "extent": 41, "extent_dss": 41, "extent_mmap": 41, "malloc_io": 41, "mutex_pool": 41, "nstime": 41, "prng": 41, "rtree": 41, "safety_check": 41, "sc": 41, "sz": 41, "tcach": 41, "test_hook": 41, "ticker": 41, "tsd": 41, "wit": 41, "private_namespac": 41, "gen": 41, "cp": 41, "jemalloc_cpp": 41, "cru": 41, "33madlist": 41, "33mquicklist": 41, "33mae": 41, "33manet": 41, "33mdict": 41, "33mserver": 41, "33msd": 41, "33mzmalloc": 41, "33mlzf_c": 41, "33mlzf_d": 41, "33mpqsort": 41, "33mzipmap": 41, "33msha1": 41, "33mziplist": 41, "33mreleas": 41, "33mnetwork": 41, "33mutil": 41, "33mobject": 41, "33mdb": 41, "33mreplic": 41, "33mrdb": 41, "33mt_string": 41, "33mt_list": 41, "33mt_set": 41, "33mt_zset": 41, "33mt_hash": 41, "33mconfig": 41, "33maof": 41, "33mpubsub": 41, "33mmulti": 41, "33mdebug": 41, "33msort": 41, "33mintset": 41, "33msyncio": 41, "33mcluster": 41, "33mcrc16": 41, "33mendianconv": 41, "33mslowlog": 41, "33meval": 41, "33mbio": 41, "33mrio": 41, "33mrand": 41, "33mmemtest": 41, "33msyscheck": 41, "33mcrcspeed": 41, "33mcrc64": 41, "33mbitop": 41, "33msentinel": 41, "33mnotifi": 41, "33msetproctitl": 41, "33mblock": 41, "33mhyperloglog": 41, "33mlatenc": 41, "33msparklin": 41, "33mredi": 41, "33mgeo": 41, "33mlazyfre": 41, "33mmodul": 41, "33mevict": 41, "33mexpir": 41, "33mgeohash": 41, "33mgeohash_help": 41, "33mchildinfo": 41, "33mdefrag": 41, "33msiphash": 41, "33mrax": 41, "33mt_stream": 41, "33mlistpack": 41, "33mlocaltim": 41, 
"33mlolwut": 41, "33mlolwut5": 41, "33mlolwut6": 41, "33macl": 41, "33mtrack": 41, "33mconnect": 41, "33mtl": 41, "33msha256": 41, "33mtimeout": 41, "33msetcpuaffin": 41, "33mmonoton": 41, "33mmt19937": 41, "33mresp_pars": 41, "33mcall_repli": 41, "33mscript_lua": 41, "33mscript": 41, "33mfunction": 41, "33mfunction_lua": 41, "33mcommand": 41, "1mlink": 41, "1mredi": 41, "1minstal": 41, "33mredisassert": 41, "33mcli_common": 41, "hint": 41, "idea": 41, "conf": 41, "daemon": 41, "appendonli": 41, "7001": 41, "7002": 41, "shutdown": [41, 42, 55], "pkill": 41, "1m": 41, "0mmaster": 41, "5460": 41, "master": 41, "5461": 41, "10922": 41, "10923": 41, "16383": 41, "fa9bb82124685a6438a696cc1562693ccc815ff0": 41, "c6d7ad6353bf568d17a147e65b8198ded9d65717": 41, "5462": 41, "e26ae6cfbeea8a1e6367444445364d963ae17436": 41, "0mwait": 41, "0mm": 41, "coverag": 41, "num_node_connect": 41, "572": [41, 47], "redisclust": 41, "134": [41, 47, 53], "230052244": 41, "setupt": 41, "24mb": 41, "duse_openssl": 41, "use_ssl": 41, "dhiredis_test_ssl": 41, "libhiredis_ssl": 41, "encyrypt": 41, "test_cert": 41, "openssl": 41, "redis_serv": 41, "keyusag": 41, "digitalsignatur": 41, "keyencipher": 41, "hugectr_cli": 41, "nscerttyp": 41, "genrsa": 41, "public": 41, "rsa": 41, "pubout": 41, "dummi": 41, "req": 41, "x509": 41, "subj": 41, "cn": 41, "dai": [41, 42, 54], "cakei": 41, "caseri": 41, "ser": 41, "cacreateseri": 41, "extfil": 41, "subject": 41, "cert": 41, "cacert": 41, "a441806db5506b7600ee8ae794fa01dc31ac83c9": 41, "6fa93392a396aa3c321736234b7eafc86bb1f979": 41, "8e9cd68cc229fcb568a84d7358011201b4246046": 41, "644": [41, 47], "984": 41, "990": 41, "995": 41, "998": 41, "conclud": 41, "022623188": 41, "hugectr_e2": 42, "base_dir": 42, "data_dir": 42, "train_dir": 42, "val_dir": 42, "model_dir": 42, "decom": 42, "unzip": [42, 45], "filterwarn": 42, "simplefilt": 42, "dask_cudf": [42, 50], "dask_cuda": 42, "localcudaclust": 42, "dask": [42, 50], "device_mem_s": 42, "pynvml_mem_s": 42, 
"categorifi": [42, 50], "fillmiss": 42, "get_embedding_s": [42, 50, 51], "basicconfig": 42, "asctim": 42, "setlevel": 42, "notset": 42, "getlogg": 42, "asyncio": 42, "schema": [42, 50], "categorical_column": [42, 50], "continuous_column": 42, "label_column": [42, 50], "criteo_column": 42, "cross_column": 42, "c1_c2": 42, "c3_c4": 42, "num_integer_column": 42, "num_categorical_column": 42, "num_total_column": 42, "dashboard": 42, "dashboard_port": 42, "8787": 42, "tcp": 42, "visible_devic": 42, "delect": 42, "device_limit_frac": 42, "spill": 42, "device_pool_frac": 42, "part_mem_frac": 42, "device_s": 42, "device_limit": 42, "device_pool_s": 42, "part_siz": [42, 50], "fmem": 42, "1e9": 42, "bewar": 42, "n_worker": 42, "device_memory_limit": 42, "dashboard_address": 42, "rmm_pool_siz": 42, "061": 42, "preload": 42, "062": 42, "072": 42, "087": 42, "acc90f7f": 42, "fb72": 42, "11ed": 42, "808f": 42, "54ab3adac0a5": 42, "c0d46f34": 42, "503": 42, "789d4132": 42, "7d07": 42, "451f": 42, "ac": 42, "1867dfa9d7b3": 42, "comm": 42, "33423": 42, "40925": 42, "43851": 42, "nanni": 42, "40143": 42, "6pb36hck": 42, "40769": 42, "44353": 42, "43979": 42, "rv8itza6": 42, "45255": 42, "37165": 42, "46241": 42, "co2ru8ea": 42, "40555": 42, "37399": 42, "36117": 42, "04qjh_rt": 42, "39951": 42, "42631": 42, "46323": 42, "9zzs6cz6": 42, "40815": 42, "37909": 42, "40533": 42, "hq437puc": 42, "42963": 42, "41947": 42, "42201": 42, "kuwua5fi": 42, "39607": 42, "45371": 42, "39667": 42, "ouyeimq6": 42, "train_output": 42, "val_output": 42, "train_input": 42, "val_input": 42, "preprocess_dir_temp_train": 42, "preprocess_dir_temp_v": 42, "makedir": 42, "preprocess_dir_temp": 42, "cudf": [42, 50, 55], "one_path": 42, "rmtree": [42, 50], "train_valid_path": 42, "temp_output": 42, "ddf": 42, "read_csv": [42, 45, 46, 49], "sep": 42, "feature_pair": 42, "to_parquet": [42, 45, 49, 50], "train_path": [42, 50], "valid_path": [42, 50], "categorify_op": 42, "cat_featur": [42, 50], "cont_featur": 42, 
"min_valu": 42, "cross_cat_op": 42, "output_format": 42, "train_ds_iter": 42, "valid_ds_iter": 42, "per_partit": [42, 50], "dict_dtyp": [42, 50], "col": [42, 50], "transform": [42, 44, 47, 48, 50], "to_hugectr": 42, "output_path": [42, 50], "embeddings_dict_cat": 42, "embeddings_dict_cross": 42, "ndask": 42, "1234907": 42, "19683": 42, "13780": 42, "6867": 42, "18490": 42, "6264": 42, "1235": 42, "854680": 42, "114026": 42, "75736": 42, "2159": 42, "7533": 42, "1307783": 42, "404742": 42, "1105613": 42, "87714": 42, "9032": 42, "1577645": 42, "1093030": 42, "187256813049316": 42, "data_path": 42, "model_path": 42, "1581605": 42, "4000": [42, 51], "2720": 42, "1350": 42, "wide_redn": 42, "dropout2": 42, "add1": 42, "21000": 42, "mpiinitservic": 42, "4031005480": 42, "353": [42, 47], "355": 42, "475000": 42, "0018": 42, "7234": 42, "366": 42, "7175": 42, "7946054": 42, "6990506": 42, "0788": 42, "3132": 42, "392": 42, "7372800": 42, "396": [42, 47], "3516": 42, "5847": 42, "397": [42, 47], "2162": 42, "0056": 42, "3464": 42, "874": [42, 50], "429": [42, 53], "70458": 42, "124098": 42, "6176": 42, "130088": 42, "835": 42, "3000": [42, 51], "61959": 42, "101731": 42, "449": 42, "61009": 42, "110557": 42, "738497": 42, "47924": 42, "1046": 42, "10236": 42, "61852": 42, "102157": 42, "771": 42, "58452": 42, "123451": 42, "61023": 42, "122763": 42, "867": [42, 47], "698276": 42, "48087": 42, "487": 42, "0999177": 42, "103": [42, 47], "61106": 42, "0999892": 42, "722": 42, "11000": 42, "61545": 42, "0883301": 42, "348": [42, 53], "12000": [42, 55], "62134": 42, "0828304": 42, "688598": 42, "4733": 42, "13000": 42, "0717": 42, "108287": 42, "14000": 42, "62997": 42, "0745141": 42, "15000": 42, "60764": 42, "0720452": 42, "287": [42, 47], "16000": 42, "61101": 42, "0851126": 42, "758": 42, "685426": 42, "47088": 42, "17000": 42, "0865": 42, "0632745": 42, "18000": 42, "62825": 42, "0742994": 42, "626": 42, "19000": 42, "61035": 42, "0679226": 42, "230": 42, "59954": 42, 
"0779185": 42, "704": 42, "684045": 42, "4736": 42, "733": 42, "119": 42, "398": [42, 47], "611": 42, "903": [42, 47], "788": [42, 47], "5538": 42, "0770708": 42, "But": [43, 55], "development": 43, "repo": 43, "pybind11": 43, "hugectr_e2e_demo_with_nvtabular": 43, "continuous_train": 43, "multi_gpu_offline_infer": 43, "training_and_inference_with_remote_filesystem": 43, "modal": [43, 45, 46, 55], "movi": [43, 44, 45, 50, 52], "movielen": [43, 44, 50, 51, 52, 55], "25m": [43, 44, 46, 49, 52], "xiaolei": 43, "training_with_remote_filesystem": 43, "price": 44, "purchas": 44, "Such": 44, "rich": [44, 46], "poster": [44, 45, 52], "plot": [44, 46, 48], "synopsi": [44, 45, 52], "music": 44, "audio": 44, "lyric": 44, "itinerari": 44, "plan": [44, 55], "attract": 44, "photo": 44, "resnet": [44, 52], "bert": [44, 55], "pretrain": [44, 50], "enrich": [44, 45, 52], "etl": 44, "000": 45, "subsequ": 45, "sklearn": [45, 55], "model_select": 45, "train_test_split": 45, "download_fil": 45, "input_data_dir": [45, 50, 51], "ml": [45, 46, 49], "grouplen": 45, "movieid": [45, 46, 49, 50, 51], "genr": [45, 50], "toi": 45, "stori": 45, "1995": 45, "adventur": 45, "anim": 45, "children": [45, 47], "comedi": 45, "fantasi": 45, "jumanji": 45, "grumpier": 45, "old": [45, 55], "men": 45, "romanc": 45, "exhal": 45, "drama": 45, "father": 45, "bride": 45, "ii": 45, "movies_convert": [45, 50], "timestamp": [45, 55], "1147880044": 45, "1147868817": 45, "1147868828": 45, "665": 45, "1147878820": 45, "1147868510": 45, "simpl": [45, 49, 55], "test_siz": 45, "random_st": 45, "wish": [45, 46], "proce": [45, 46, 49], "sypnopsi": 46, "scrap": 46, "imdbpi": [46, 48], "ipython": [46, 47, 48], "do_shutdown": [46, 47, 48], "meta": 46, "ia": 46, "director": 46, "the_matrix": 46, "get_movi": 46, "0114709": [46, 47, 49], "get_movie_infoset": 46, "imdbid": [46, 49], "nuniqu": 46, "pickl": [46, 47, 48, 49], "cpu_count": 46, "basemanag": 46, "dictproxi": 46, "movies_id": 46, "movies_info": [46, 48], 
"movie_info": [46, 48], "risk": 46, "num_job": 46, "chunk_siz": 46, "proc": 46, "pkl": [46, 47, 48, 49], "highest_protocol": [46, 47, 48], "collect_large_post": 46, "filelist": [46, 47], "targetlist": 46, "largefilelist": 46, "largetargetlist": 46, "target_path": 46, "poster_smal": [46, 47], "jpg": [46, 47], "poster_larg": 46, "download_task": 46, "cmd": 46, "popen": 46, "wc": 46, "nvidia_resnet50": 47, "checkout": [47, 55], "5d6d417ff57e8824ef51573e00e5e21307b39697": 47, "classif": [47, 55], "convnet": 47, "pil": 47, "amp": 47, "autocast": 47, "image_classif": 47, "torchvis": [47, 48], "resnet50": 47, "resnext101_32x4d": 47, "se_resnext101_32x4d": 47, "efficientnet_b0": 47, "efficientnet_b4": 47, "efficientnet_widese_b0": 47, "efficientnet_widese_b4": 47, "efficientnet_quant_b0": 47, "efficientnet_quant_b4": 47, "available_model": 47, "load_jpeg_from_fil": 47, "image_s": 47, "img_transform": 47, "compos": 47, "resiz": 47, "centercrop": 47, "totensor": 47, "img": 47, "no_grad": [47, 48], "wherea": 47, "view": [47, 55], "229": [47, 53], "mono": 47, "channel": 47, "sub_": 47, "div_": 47, "check_quant_weight_correct": 47, "checkpoint_path": 47, "map_loc": 47, "startswith": 47, "quantizers_sd_kei": 47, "_amax": 47, "named_modul": 47, "quantiz": [47, 55], "sd_all_kei": 47, "imgnet_class": 47, "loc_synset_map": 47, "model_arg": 47, "pretrained_from_fil": 47, "nvidia_resnet50_200821": 47, "pth": 47, "resnet50_pyt_amp": 47, "hub": 47, "conv1": 47, "conv2d": 47, "kernel_s": 47, "bn1": 47, "batchnorm2d": 47, "momentum": [47, 55], "affin": 47, "track_running_stat": 47, "maxpool": 47, "maxpool2d": 47, "dilat": 47, "ceil_mod": 47, "conv2": 47, "bn2": 47, "conv3": 47, "bn3": 47, "downsampl": 47, "layer2": 47, "layer4": 47, "avgpool": 47, "adaptiveavgpool2d": 47, "output_s": 47, "61951": 47, "0055323": 47, "0274711": 47, "0055320": 47, "0054197": 47, "1791658": 47, "1288589": 47, "0365653": 47, "2324928": 47, "6000478": 47, "num_bathc": 47, "array_split": 47, "strip": 47, 
"feature_extractor": 47, "feature_dict": [47, 49], "unabl": 47, "0168199": 47, "0118926": 47, "0415856": 47, "0494260": 47, "0810772": 47, "02it": 47, "0049314": 47, "23it": 47, "0066831": 47, "29it": 47, "0888693": 47, "11it": 47, "0067431": 47, "21it": 47, "6522546": 47, "0057811": 47, "5176252": 47, "0112373": 47, "47it": 47, "4636254": 47, "41it": 47, "0365658": 47, "20it": 47, "2124046": 47, "0104469": 47, "14it": 47, "0102493": 47, "17it": 47, "0051792": 47, "65it": 47, "0110017": 47, "0139630": 47, "83it": 47, "0143348": 47, "92it": 47, "0037618": 47, "0040002": 47, "70it": 47, "0317950": 47, "52it": 47, "0850669": 47, "0325258": 47, "6569888": 47, "0037736": 47, "0109303": 47, "44it": 47, "0103882": 47, "59it": 47, "0267287": 47, "24it": 47, "0100033": 47, "43it": 47, "1601215": 47, "46it": [47, 51], "0092028": 47, "0075963": 47, "3267334": 47, "69it": 47, "0059398": 47, "78it": 47, "0122565": 47, "82it": 47, "0052572": 47, "102": [47, 53], "38it": 47, "6404896": 47, "28it": 47, "0027428": 47, "0033883": 47, "0113270": 47, "0022286": 47, "25it": 47, "0068953": 47, "13it": 47, "0042949": 47, "0130297": 47, "15it": 47, "0028207": 47, "0054244": 47, "10it": 47, "1275680": 47, "0036533": 47, "0037297": 47, "51it": 47, "0962736": 47, "0042548": 47, "34it": 47, "0038109": 47, "0104009": 47, "22it": 47, "0180316": 47, "26it": 47, "0071925": 47, "139": 47, "31it": 47, "0087001": 47, "0056910": 47, "0064563": 47, "1720040": 47, "149": 47, "0041112": 47, "16it": 47, "4412528": 47, "0051362": 47, "158": 47, "0029992": 47, "0384309": 47, "0028367": 47, "50it": 47, "0038336": 47, "32it": 47, "0058725": 47, "164": 47, "0113328": 47, "166": 47, "3878542": 47, "33it": 47, "0026465": 47, "0040588": 47, "0086984": 47, "178": 47, "40it": 47, "0309047": 47, "181": 47, "0031405": 47, "185": 47, "0097493": 47, "186": 47, "09it": 47, "0346336": 47, "0078841": 47, "0018795": 47, "9151704": 47, "1417097": 47, "0054223": 47, "0117477": 47, "199": 47, "64it": 47, "0000041": 47, 
"30it": 47, "0028907": 47, "0366179": 47, "0109761": 47, "217": 47, "39it": 47, "7167686": 47, "219": 47, "0048973": 47, "226": 47, "0100112": 47, "3606394": 47, "0021890": 47, "228": 47, "0033874": 47, "0035019": 47, "232": 47, "1228953": 47, "237": 47, "7688990": 47, "0052954": 47, "0092159": 47, "0094349": 47, "0065136": 47, "246": 47, "0027805": 47, "0034904": 47, "248": [47, 53], "18it": 47, "0037522": 47, "06it": 47, "0036301": 47, "254": [47, 53, 55], "0037324": 47, "35it": 47, "0053622": 47, "265": 47, "42it": 47, "7278178": 47, "266": 47, "37it": 47, "0418239": 47, "0040489": 47, "0069280": 47, "08it": 47, "0049143": 47, "0064840": 47, "285": [47, 50], "0070723": 47, "19it": 47, "0057997": 47, "0056072": 47, "7446332": 47, "0076618": 47, "04it": 47, "0290014": 47, "0347330": 47, "303": 47, "36it": 47, "0159620": 47, "304": 47, "0044667": 47, "07it": 47, "0040190": 47, "3088364": 47, "0230367": 47, "0037147": 47, "310": 47, "0033282": 47, "4028134": 47, "312": 47, "1352824": 47, "314": 47, "0079400": 47, "318": 47, "0449869": 47, "0047526": 47, "320": 47, "0095593": 47, "321": [47, 55], "2762334": 47, "322": 47, "0023293": 47, "0024593": 47, "327": 47, "1116182": 47, "328": [47, 53], "0063462": 47, "0119577": 47, "0106727": 47, "0053884": 47, "337": 47, "0037077": 47, "03it": 47, "0040064": 47, "0089108": 47, "0023129": 47, "347": 47, "0044827": 47, "12it": 47, "0067108": 47, "359": 47, "0432432": 47, "0202415": 47, "0074812": 47, "0059311": 47, "0065073": 47, "0052820": 47, "0120865": 47, "0064620": 47, "0068505": 47, "2934916": 47, "0040137": 47, "0071864": 47, "0072973": 47, "387": 47, "0449951": 47, "388": 47, "27it": 47, "0074605": 47, "0328955": 47, "0077294": 47, "393": [47, 53], "0987918": 47, "394": 47, "0067520": 47, "395": 47, "0220016": 47, "0067236": 47, "0085838": 47, "0047561": 47, "0066075": 47, "0123374": 47, "0026143": 47, "0064626": 47, "0822388": 47, "0101664": 47, "0403579": 47, "0070112": 47, "2323633": 47, "0203408": 47, "1167638": 
47, "71it": 47, "0144178": 47, "48it": 47, "0295432": 47, "435": 47, "0123865": 47, "436": 47, "55it": 47, "0110530": 47, "0082817": 47, "45it": 47, "0067525": 47, "0046333": 47, "439": 47, "0248953": 47, "0000033": 47, "0069165": 47, "0000014": 47, "0000027": 47, "05it": 47, "0063531": 47, "0041431": 47, "0831387": 47, "3908598": 47, "0056341": 47, "3833520": 47, "472": 47, "0058660": 47, "0086847": 47, "0074455": 47, "477": 47, "0037990": 47, "481": 47, "1764600": 47, "0372764": 47, "0368576": 47, "0368574": 47, "0366178": 47, "484": 47, "0067118": 47, "488": 47, "0044954": 47, "496": 47, "0078950": 47, "498": 47, "0050957": 47, "0058374": 47, "499": 47, "0027963": 47, "507": 47, "0362590": 47, "508": 47, "0008309": 47, "509": 47, "0065240": 47, "0055022": 47, "0418753": 47, "0070768": 47, "1706680": 47, "518": 47, "3836530": 47, "0050545": 47, "61it": 47, "8752440": 47, "523": 47, "81it": [47, 48], "0019504": 47, "0060117": 47, "526": 47, "1172060": 47, "3280916": 47, "0039502": 47, "3800796": 47, "0074238": 47, "0062032": 47, "0053891": 47, "0184115": 47, "0060968": 47, "0075165": 47, "549": 47, "0076998": 47, "0060176": 47, "0092745": 47, "0079936": 47, "0060747": 47, "2523756": 47, "554": [47, 50], "0092217": 47, "0046906": 47, "0206226": 47, "0086484": 47, "0175471": 47, "0085913": 47, "0233687": 47, "0053214": 47, "0032794": 47, "0040765": 47, "0064541": 47, "0365109": 47, "569": 47, "0337721": 47, "0032234": 47, "0344604": 47, "574": 47, "0041349": 47, "53it": 47, "0180073": 47, "6926486": 47, "583": 47, "0079596": 47, "586": 47, "0140603": 47, "0069745": 47, "0066154": 47, "1745787": 47, "0045995": 47, "0038675": 47, "0068971": 47, "596": 47, "0050205": 47, "598": 47, "0085175": 47, "0424237": 47, "603": [47, 53], "0190524": 47, "3365778": 47, "8119752": 47, "0031742": 47, "610": 47, "0100465": 47, "614": 47, "0072097": 47, "0071771": 47, "0174997": 47, "0033676": 47, "623": [47, 53], "0443567": 47, "0047559": 47, "627": 47, "0260295": 47, "0200768": 47, 
"640": 47, "0245238": 47, "0075679": 47, "0042418": 47, "645": 47, "0036814": 47, "0079756": 47, "0983922": 47, "0058642": 47, "659": 47, "0116016": 47, "663": 47, "0092238": 47, "666": 47, "2226519": 47, "0414982": 47, "0419641": 47, "0040246": 47, "0217168": 47, "674": 47, "0038452": 47, "675": 47, "3155242": 47, "0038255": 47, "0043153": 47, "0072209": 47, "686": 47, "0074797": 47, "688": 47, "2720826": 47, "690": 47, "0068227": 47, "0372765": 47, "0083713": 47, "0252133": 47, "0329913": 47, "703": 47, "0036840": 47, "56it": 47, "0067956": 47, "2195566": 47, "708": 47, "0080549": 47, "714": [47, 53], "0073398": 47, "716": 47, "0038205": 47, "718": 47, "0117220": 47, "719": 47, "0046198": 47, "0060351": 47, "0081568": 47, "0046921": 47, "0034739": 47, "0023251": 47, "0491764": 47, "0090642": 47, "741": [47, 53], "0037928": 47, "743": 47, "0457430": 47, "0057283": 47, "0462519": 47, "0110546": 47, "0045197": 47, "0062523": 47, "750": 47, "0112454": 47, "0065243": 47, "0396171": 47, "0059710": 47, "0080928": 47, "0126004": 47, "1833116": 47, "770": 47, "0075766": 47, "0123860": 47, "0123970": 47, "0323120": 47, "0035301": 47, "1216520": 47, "0028331": 47, "1330015": 47, "0062443": 47, "0485241": 47, "0154467": 47, "776": 47, "5235348": 47, "0191074": 47, "0060168": 47, "779": 47, "0081738": 47, "0379473": 47, "0063381": 47, "4427076": 47, "0173714": 47, "3794028": 47, "0464106": 47, "0090570": 47, "0087829": 47, "0041866": 47, "0444682": 47, "0058110": 47, "0072392": 47, "0080546": 47, "0064482": 47, "809": 47, "0044599": 47, "0439771": 47, "58it": 47, "0021756": 47, "820": 47, "0039676": 47, "0160801": 47, "0032981": 47, "0049854": 47, "2605312": 47, "0367257": 47, "829": 47, "6817944": 47, "0082081": 47, "1146283": 47, "0796335": 47, "0183355": 47, "0218094": 47, "0290820": 47, "845": 47, "1059793": 47, "0025665": 47, "848": 47, "0259786": 47, "854": 47, "0044369": 47, "0031127": 47, "857": 47, "0283644": 47, "0316599": 47, "0118767": 47, "0059758": 47, 
"0122194": 47, "0070404": 47, "0028484": 47, "0166792": 47, "0369903": 47, "0073115": 47, "0284655": 47, "9236264": 47, "892": 47, "0137094": 47, "893": 47, "0064323": 47, "895": 47, "49it": 47, "0062741": 47, "68it": 47, "0084237": 47, "901": 47, "0084273": 47, "4193400": 47, "906": 47, "0124307": 47, "908": 47, "0157383": 47, "0412808": 47, "0161860": 47, "918": 47, "4613254": 47, "2788556": 47, "925": 47, "1437361": 47, "3037582": 47, "0048211": 47, "4516162": 47, "0033932": 47, "0042871": 47, "57it": 47, "0137799": 47, "62it": 47, "1714196": 47, "0025117": 47, "2357144": 47, "1525898": 47, "0098088": 47, "6537238": 47, "0303151": 47, "951": 47, "0315632": 47, "0316352": 47, "0166557": 47, "956": 47, "0066879": 47, "3736766": 47, "0140340": 47, "959": [47, 53], "1570970": 47, "0075364": 47, "0099901": 47, "447": 47, "movies_poster_featur": [47, 49], "61504": 47, "huggingfac": 48, "jupyterlab": 48, "nbclassic": 48, "admin": 48, "pip3": 48, "cu111": 48, "torchaudio": 48, "whl": 48, "torch_stabl": 48, "barttoken": 48, "bartmodel": 48, "facebook": 48, "decod": 48, "return_tensor": 48, "pt": 48, "truncat": 48, "max_length": 48, "output_hidden_st": 48, "last_hidden_st": 48, "62423": [48, 49, 51], "average_embed": 48, "movies_synopsis_embed": [48, 49], "proceed": 49, "poster_featur": 49, "61947": 49, "text_featur": 49, "61291": 49, "tmdbid": 49, "0113497": 49, "8844": 49, "0113228": 49, "15602": 49, "0114885": 49, "31357": 49, "0113041": 49, "11862": 49, "0105812": 49, "feature_arrai": 49, "iterrow": [49, 51], "2049": 49, "poster_feature_": 49, "text_feature_": 49, "3073": [49, 51], "feature_df": [49, 51], "datafram": [49, 50], "poster_feature_0": [49, 51], "poster_feature_1": [49, 51], "poster_feature_2": [49, 51], "poster_feature_3": [49, 51], "poster_feature_4": [49, 51], "poster_feature_5": [49, 51], "poster_feature_6": [49, 51], "poster_feature_7": [49, 51], "poster_feature_8": [49, 51], "text_feature_1014": [49, 51], "text_feature_1015": [49, 51], 
"text_feature_1016": [49, 51], "text_feature_1017": [49, 51], "text_feature_1018": [49, 51], "text_feature_1019": [49, 51], "text_feature_1020": [49, 51], "text_feature_1021": [49, 51], "text_feature_1022": [49, 51], "text_feature_1023": [49, 51], "088281": 49, "036760": 49, "006470": 49, "023553": 49, "000163": 49, "238797": 49, "291230": 49, "197272": 49, "024294": 49, "307049": 49, "789571": 49, "084938": 49, "187339": 49, "061683": 49, "183281": 49, "356245": 49, "289105": 49, "134672": 49, "691380": 49, "045417": 49, "051422": 49, "203168": 49, "617449": 49, "443821": 49, "501953": 49, "736949": 49, "180542": 49, "313696": 49, "274087": 49, "153105": 49, "218745": 49, "187553": 49, "904370": 49, "069441": 49, "026665": 49, "817211": 49, "125072": 49, "173140": 49, "209240": 49, "451933": 49, "491917": 49, "743956": 49, "069061": 49, "900011": 49, "583347": 49, "192817": 49, "224088": 49, "182279": 49, "014646": 49, "004135": 49, "197796": 49, "077938": 49, "215127": 49, "021160": 49, "023108": 49, "394012": 49, "679462": 49, "225475": 49, "196255": 49, "169627": 49, "008575": 49, "172138": 49, "114755": 49, "127861": 49, "003679": 49, "082123": 49, "447287": 49, "002375": 49, "135956": 49, "989514": 49, "808180": 49, "317510": 49, "176658": 49, "078992": 49, "726118": 49, "017430": 49, "249834": 49, "183357": 49, "071451": 49, "644567": 49, "090399": 49, "147284": 49, "pyarrow": 49, "pypi": [49, 55], "satisfi": 49, "33mwarn": 49, "upgrad": [49, 50, 55], "026260": [49, 51], "857608": [49, 51], "410247": [49, 51], "066654": [49, 51], "382803": [49, 51], "899998": [49, 51], "511562": [49, 51], "592291": [49, 51], "565434": [49, 51], "636716": [49, 51], "578369": [49, 51], "996169": [49, 51], "402107": [49, 51], "412318": [49, 51], "859952": [49, 51], "293852": [49, 51], "341114": [49, 51], "727113": [49, 51], "085829": [49, 51], "141265": [49, 51], "721758": [49, 51], "679958": [49, 51], "955634": [49, 51], "391091": [49, 51], "324611": [49, 51], "505211": [49, 
51], "258331": [49, 51], "048264": [49, 51], "161505": [49, 51], "431864": [49, 51], "836532": [49, 51], "525013": [49, 51], "654566": [49, 51], "823841": [49, 51], "818313": [49, 51], "856280": [49, 51], "638048": [49, 51], "685537": [49, 51], "119418": [49, 51], "911146": [49, 51], "470762": [49, 51], "762258": [49, 51], "626335": [49, 51], "768947": [49, 51], "241833": [49, 51], "775992": [49, 51], "236340": [49, 51], "865548": [49, 51], "387806": [49, 51], "668321": [49, 51], "552122": [49, 51], "750238": [49, 51], "863707": [49, 51], "382173": [49, 51], "894487": [49, 51], "565142": [49, 51], "164083": [49, 51], "538184": [49, 51], "980678": [49, 51], "643513": [49, 51], "928519": [49, 51], "794906": [49, 51], "201022": [49, 51], "744666": [49, 51], "962188": [49, 51], "915320": [49, 51], "777534": [49, 51], "904200": [49, 51], "167337": [49, 51], "875194": [49, 51], "180481": [49, 51], "815904": [49, 51], "808288": [49, 51], "036711": [49, 51], "902779": [49, 51], "580946": [49, 51], "772951": [49, 51], "239788": [49, 51], "061874": [49, 51], "162997": [49, 51], "388310": [49, 51], "236311": [49, 51], "162757": [49, 51], "207134": [49, 51], "111078": [49, 51], "250022": [49, 51], "335043": [49, 51], "091674": [49, 51], "121507": [49, 51], "418124": [49, 51], "150020": [49, 51], "803506": [49, 51], "059504": [49, 51], "002342": [49, 51], "932321": [49, 51], "manipul": 50, "terabyt": [50, 55], "rapid": 50, "apt": 50, "graphviz": 50, "ubuntu": 50, "focal": 50, "inreleas": 50, "ppa": 50, "launchpad": 50, "deadsnak": 50, "backport": 50, "33m": 50, "newest": [50, 55], "3build2": 50, "libarchive13": 50, "librhash0": 50, "libuv1": 50, "autoremov": 50, "newli": [50, 55], "columngroup": 50, "column_nam": 50, "op1": 50, "op2": 50, "sound": 50, "joinextern": 50, "left": 50, "acycl": 50, "dag": 50, "visual": 50, "contigu": 50, "fulfil": 50, "v0": [50, 54], "movieid_dup": 50, "_duplic": 50, "lambdaop": 50, "int8": 50, "manifest": 50, "matter": 50, "solv": 50, "break": 50, 
"demand": [50, 55], "hood": 50, "decomposit": 50, "lazili": 50, "couldn": 50, "train_dataset": 50, "100mb": 50, "valid_dataset": 50, "640002432": 50, "troubleshoot": 50, "160000608": 50, "scikit": [50, 55], "981": 50, "0x7fbb086a3370": 50, "\u00b5": 50, "restor": [50, 55], "162542": [50, 51], "56586": [50, 51], "movieid_dupl": [50, 51], "part_0": 50, "26460": 50, "97438": 50, "1704": 50, "105574": 50, "3568": 50, "39464": 50, "127724": 50, "movie_map": 51, "movieid_s": 51, "56581": 51, "209155": 51, "56582": 51, "209157": 51, "56583": 51, "209159": 51, "56584": 51, "209169": 51, "56585": 51, "209171": 51, "set_index": 51, "num_token": 51, "embedding_matrix": 51, "3967": 51, "3072": 51, "17294852": 51, "15285189": 51, "26095702": 51, "75369112": 51, "29602144": 51, "78917433": 51, "13539355": 51, "84843078": 51, "70951219": 51, "10441725": 51, "72871966": 51, "11719463": 51, "18514273": 51, "72422918": 51, "04273015": 51, "1404219": 51, "54169348": 51, "96875489": 51, "08307642": 51, "3673532": 51, "15777258": 51, "01297393": 51, "36267638": 51, "14848055": 51, "82188376": 51, "56516905": 51, "70838085": 51, "45119769": 51, "9273439": 51, "42464321": 51, "henc": [51, 55], "shall": 51, "plu": 51, "pretrained_embedding_s": 51, "convert_pretrained_embeddings_to_sparse_model": 51, "pre_trained_sparse_embed": 51, "hugectr_pretrained_embed": 51, "afterward": [51, 55], "noqa": 51, "pretrained_embed": 51, "10001": 51, "476440390": 51, "275735": 51, "16384000": 51, "256000": 51, "1072": 51, "297110": 51, "581705": 51, "274680": 51, "574425": 51, "746443": 51, "054157": 51, "332273": 51, "564224": 51, "277900": 51, "550730": 51, "764630": 51, "054009": 51, "434429": 51, "536507": 51, "279014": 51, "525059": 51, "773702": 51, "054287": 51, "335757": 51, "532503": 51, "278661": 51, "526352": 51, "779897": 51, "167787": 51, "447136": 51, "547141": 51, "376035": 51, "548916": 51, "784775": 51, "054224": 51, "334735": 51, "540766": 51, "277728": 51, "515882": 51, "786808": 51, 
"054551": 51, "1300": 51, "336372": 51, "531510": 51, "1400": [51, 53], "277408": 51, "511901": 51, "791416": 51, "165986": 51, "1500": 51, "554217": 51, "522047": 51, "279548": 51, "540521": 51, "793460": 51, "054801": 51, "1700": 51, "336303": 51, "525447": 51, "1800": [51, 53], "278906": 51, "523558": 51, "793137": 51, "054431": 51, "1900": 51, "336023": 51, "511348": 51, "384979": 51, "515268": 51, "796599": 51, "172160": 51, "2100": 51, "453174": 51, "526615": 51, "2200": 51, "278781": 51, "536789": 51, "798459": 51, "054509": 51, "2300": 51, "335596": 51, "508902": 51, "2400": 51, "277901": 51, "520411": 51, "798726": 51, "054518": 51, "2500": 51, "444557": 51, "490832": 51, "2600": 51, "279310": 51, "507799": 51, "801325": 51, "164203": 51, "2700": 51, "443310": 51, "519460": 51, "277569": 51, "512426": 51, "800731": 51, "054590": 51, "2900": 51, "336213": 51, "512216": 51, "384833": 51, "522102": 51, "803801": 51, "054133": 51, "3100": 51, "334245": 51, "507463": 51, "279046": 51, "526148": 51, "802950": 51, "070003": 51, "3300": 51, "352114": 51, "504611": 51, "3400": 51, "277292": 51, "502907": 51, "804364": 51, "054315": 51, "3500": 51, "442956": 51, "512927": 51, "3600": 51, "277974": 51, "519042": 51, "806404": 51, "054291": 51, "3700": 51, "335365": 51, "499368": 51, "3800": 51, "277786": 51, "509683": 51, "805164": 51, "064908": 51, "3900": 51, "344106": 51, "508182": 51, "387872": 51, "493841": 51, "808367": 51, "054222": 51, "4100": 51, "335361": 51, "508106": 51, "278802": 51, "519000": 51, "808897": 51, "054320": 51, "4300": 51, "334094": 51, "502797": 51, "4400": 51, "388990": 51, "508890": 51, "809649": 51, "074584": 51, "355005": 51, "505778": 51, "4600": 51, "277275": 51, "532776": 51, "810962": 51, "054498": 51, "4700": 51, "335553": 51, "503001": 51, "4800": 51, "279237": 51, "495762": 51, "808618": 51, "4900": 51, "449926": 51, "503213": 51, "277141": 51, "481138": 51, "810767": 51, "064807": 51, "untrain": 51, "5100": 51, "630313": 51, 
"485568": 51, "5200": 51, "278359": 51, "518924": 51, "811217": 51, "054624": 51, "5300": 51, "336246": 51, "516505": 51, "5400": 51, "384571": 51, "512404": 51, "811464": 51, "054350": 51, "5500": 51, "334675": 51, "500305": 51, "5600": 51, "279563": 51, "484969": 51, "bart": 52, "din": [53, 55], "0bcb014209e219273cb6fd4152df7df713cbac61": 53, "25t09": 53, "53z": 53, "protoc": 53, "4b40fff8bb27201ba07b6fa5651217fb": 53, "jar": 53, "172": 53, "dlrm_parquet": 53, "supergroup": 53, "112247365": 53, "112243637": 53, "112251207": 53, "112241764": 53, "112247838": 53, "112244076": 53, "112253553": 53, "112249557": 53, "112239093": 53, "112249156": 53, "lastli": [53, 55], "label0": 53, "c5": 53, "c6": 53, "c7": 53, "c8": 53, "c9": 53, "c10": 53, "c12": 53, "c13": 53, "c14": 53, "c15": 53, "c16": 53, "c17": 53, "c18": 53, "c19": 53, "c20": 53, "c21": 53, "c22": 53, "c23": 53, "c24": 53, "c25": 53, "c26": 53, "c27": 53, "c28": 53, "c29": 53, "c30": 53, "c31": 53, "c32": 53, "c33": 53, "c34": 53, "c35": 53, "c36": 53, "c37": 53, "c38": 53, "c39": 53, "train_with_hdf": 53, "datasourcetype_t": 53, "405274": 53, "72550": 53, "55008": 53, "222734": 53, "316071": 53, "156265": 53, "220243": 53, "200179": 53, "234566": 53, "335625": 53, "278726": 53, "263070": 53, "312542": 53, "203773": 53, "145859": 53, "117421": 53, "78140": 53, "3648": 53, "156308": 53, "94562": 53, "357703": 53, "386976": 53, "238046": 53, "230917": 53, "156382": 53, "10720": 53, "502": 53, "3218787045": 53, "607": 53, "529": 53, "530": 53, "a10": 53, "21954560": 53, "010000": 53, "716815": 53, "69327": 53, "856": 53, "719486": 53, "693207": 53, "750294": 53, "693568": 53, "721128": 53, "693352": 53, "78435": 53, "499891": 53, "5486": 53, "2728": 53, "693178": 53, "720984": 53, "693292": 53, "756448": 53, "693053": 53, "725832": 53, "693433": 53, "382": 53, "77763": 53, "693193": 53, "500092": 53, "57548": 53, "575": 53, "0_sparse_2000": 53, "_dense_2000": 53, "_opt_dense_2000": 53, "430": 53, "drwxr": 53, 
"xr": 53, "9479684": 53, "functionalit": 53, "dcn_parquet": 53, "train_with_s3": 53, "east": [53, 55], "39884": 53, "39043": 53, "17289": 53, "7420": 53, "20263": 53, "7120": 53, "1543": 53, "slice12": 53, "amazonaw": [53, 55], "pipeline_test": 53, "dcn_model": 53, "569406237": 53, "822": 53, "710": 53, "713": 53, "397821": 53, "2457600": 53, "1453": 53, "25574": 53, "712926": 53, "16987": 53, "701584": 53, "22653": 53, "696012": 53, "16121": 53, "698167": 53, "42367": 53, "695641": 53, "500979": 53, "0735": 53, "6575": 53, "696028": 53, "03696": 53, "693602": 53, "089": 53, "73903": 53, "693618": 53, "10101": 53, "696232": 53, "59704": 53, "50103": 53, "5882": 53, "473": 53, "0_sparse_1000": 53, "_dense_1000": 53, "_opt_dense_1000": 53, "843": 53, "988": 53, "denable_gc": 53, "enable_gc": 53, "gcp": 53, "credenti": 53, "environment": 53, "google_application_credenti": 53, "gcs_kei": 53, "train_with_gc": 53, "1008636636": 53, "308": 53, "323": 53, "236": 53, "22452": 53, "786299": 53, "6347": 53, "738846": 53, "22938": 53, "711017": 53, "63355": 53, "708317": 53, "850": 53, "11226": 53, "697101": 53, "501301": 53, "0298": 53, "6054": 53, "698077": 53, "744573": 53, "697804": 53, "244": 53, "04207": 53, "695543": 53, "761465": 53, "695323": 53, "28151": 53, "695319": 53, "647": 53, "501347": 53, "3576": 53, "664": 53, "804": 53, "submiss": 54, "ve": [54, 55], "billion": 54, "curv": 54, "dual": 54, "dgx1": 54, "emb_dim": 54, "6x": 54, "criteolab": 54, "criteo_script": [54, 55], "criteo2hugectr": 54, "tfrecord": 54, "chart": 54, "seven": 54, "exhibit": 54, "incremental_dump": 55, "illeg": 55, "hierarh": 55, "inconsist": 55, "anymor": 55, "export_predict": 55, "legaci": 55, "corner": 55, "nan": 55, "occasion": 55, "thousand": 55, "happen": 55, "cub": 55, "workaround": 55, "rmm": 55, "mr": 55, "set_current_device_resourc": 55, "356": 55, "hctr_rmm_settabl": 55, "cautiou": 55, "1g": 55, "243": 55, "kafkaproduc": 55, "succe": 55, "unrespons": 55, "reachabl": 55, "joint": 
55, "suit": 55, "omit": 55, "futr": 55, "hierarchicalkv": 55, "cmakelist": 55, "minor": 55, "duse_cudart_stat": 55, "torchscript": 55, "coupl": 55, "regress": 55, "unnessari": 55, "h800": 55, "fp8": 55, "dequant": 55, "x86": 55, "superchip": 55, "hand": 55, "_concat_": 55, "dense_embed": 55, "refin": 55, "devicesegmentedsort": 55, "devicesegmentedradixsort": 55, "led": 55, "datadistributor": 55, "fly": 55, "member": 55, "elimin": 55, "parquet_reader_opt": 55, "set_num_row": 55, "pr": 55, "core23": 55, "hctr_print": 55, "did": 55, "cmak": 55, "ing": 55, "clarifi": 55, "refactor": 55, "dynamic_vari": 55, "adamax": 55, "adadelta": 55, "fault": 55, "wrong": 55, "fusion": 55, "cudadevicesynchron": 55, "embeddingtablecollect": 55, "utest": 55, "unfus": 55, "analayz": 55, "stress": 55, "crosslay": 55, "backpropag": 55, "move": 55, "denselayerswitch": 55, "unfamiliar": 55, "nic": 55, "python_interfac": 55, "strengthen": 55, "datasest": 55, "is_exclusive_kei": 55, "nob": 55, "sparseparam": 55, "misus": 55, "clearer": 55, "violat": 55, "newer": 55, "wdl_predict": 55, "januari": 55, "calendar": 55, "v23": 55, "bst": 55, "conceptu": 55, "arxiv": 55, "ab": 55, "2008": 55, "13535": 55, "redisclusterbackend": 55, "prop": 55, "test_embedding_table_optim": 55, "embedding_collect": 55, "clariti": 55, "bind": 55, "recov": 55, "sometim": 55, "problem": 55, "failur": 55, "macro": 55, "reiniti": 55, "end_offset": 55, "deeprec": 55, "embeddingplann": 55, "embedding_collection_test": 55, "db_type": 55, "boundari": 55, "hierarchc": 55, "co": 55, "round": 55, "trip": 55, "mlplayer": 55, "dgx_a100_mlp": 55, "preprocess_censu": 55, "mmoe": 55, "replicacontext": 55, "4_nvt_process": 55, "dgx_a100_ib_nvlink": 55, "dlpack": 55, "odd": 55, "sector": 55, "unreport": 55, "leak": 55, "table_group_strategi": 55, "table_placement_strategi": 55, "mmoe_parquet": 55, "simplif": 55, "mybucket": 55, "graphic": 55, "epilogu": 55, "corrupt": 55, "improp": 55, "io_block_s": 55, "max_nr_request": 55, 
"stabliz": 55, "globalembeddingdata": 55, "localembeddingdata": 55, "mention": 55, "bullet": 55, "ratio": 55, "subset": 55, "durat": 55, "hadoopfilesystem": 55, "hadoop_filesystem": 55, "hpp": 55, "third_parti": 55, "finer": 55, "deperac": 55, "overlapped_pipelin": 55, "triton_tf_deploi": 55, "cucollect": 55, "embedding_storag": 55, "dynamic_embedding_storag": 55, "interoper": 55, "slurm": 55, "305": 55, "482141": 55, "440781": 55, "46146124601364136": 55, "databasebackend": 55, "budget": 55, "nanosecond": 55, "strict": 55, "caller": 55, "unprocess": 55, "callback": 55, "load_dump": 55, "sst": 55, "find_tabl": 55, "discov": 55, "routin": 55, "inlin": 55, "340": 55, "concept": 55, "matrixmultipli": 55, "goal": 55, "navig": 55, "multiplex": 55, "gbp": 55, "upsert": 55, "diminish": 55, "recal": 55, "4x": 55, "rather": 55, "safer": 55, "ndcg": 55, "smape": 55, "extractor": 55, "261": 55, "inspir": 55, "roc_auc_scor": 55, "unweight": 55, "release_not": 55, "reorgan": 55, "dlrm_kaggle_fp32": 55, "36672493": 55, "301": 55, "standalon": 55, "prodvid": 55, "decoupl": 55, "realli": 55, "embedding_workspace_calcul": 55, "qa": 55, "codebas": 55, "relev": 55, "model_analyz": 55, "won": 55, "bare": 55, "notic": 55, "lessen": 55, "dlrm_benchmark": 55, "uint32_t": 55, "int64_t": 55, "uint32": 55, "all2alldenseembed": 55, "embedding_initi": 55, "randomuniform": 55, "readabl": 55, "robust": 55, "resolut": 55, "recycl": 55, "unload": 55, "invok": 55, "conda": 55, "event": 55, "notebok": 55, "inaccur": 55, "parameter": 55, "meaning": 55, "jabber": 55, "uniformli": 55, "interleav": 55, "shouldn": 55, "graphschedul": 55, "grapschedul": 55, "cudagraph": 55, "gap": 55, "adjac": 55, "grain": 55, "frozen": 55, "unfrozen": 55, "worri": 55, "gpu_cach": 55, "oversubscript": 55, "hmem": 55, "mo": 55, "use_host_memory_p": 55, "ps_type": 55, "vice": 55, "versa": 55, "famili": 55, "poc": 55, "assist": 55, "explain": 55, "involv": 55, "netwoek": 55, "fusedrelubiasfullyconnectedlay": 55, "holist": 
55, "use_overlapped_pipelin": 55, "use_hash_t": 55, "multiworkermirroredstrategi": 55, "concret": 55, "ncf": 55, "gmf": 55, "neumf": 55, "dien": 55, "paraquet": 55, "moment": 55, "stand": 55, "alon": 55, "dotproduct": 55, "__half2": 55, "vocabulary_s": 55, "streamlin": 55, "grasp": 55, "embedding_plugin": 55, "localizedslotsparseembeddinghashonehot": 55, "claus": 55, "max_eval_sampl": 55, "multiplylay": 55, "weightmultiplylay": 55, "perl": 55, "embeddinglay": 55, "helper": 55, "coars": 55, "tensorfloat": 55, "mantissa": 55, "expon": 55, "redesign": 55, "multinod": 55, "hugectr_user_guid": 55, "preprocess_nvt": 55, "hasn": 55, "dl": 56, "vast": 56, "broad": 56, "balanc": 56, "littl": 56}, "objects": {"hierarchical_parameter_server": [[8, 0, 1, "", "Init"], [9, 1, 1, "", "LookupLayer"], [9, 1, 1, "", "SparseLookupLayer"]], "hierarchical_parameter_server.LookupLayer": [[9, 2, 1, "", "call"]], "hierarchical_parameter_server.SparseLookupLayer": [[9, 2, 1, "", "call"]]}, "objtypes": {"0": "py:function", "1": "py:class", "2": "py:method"}, "objnames": {"0": ["py", "function", "Python function"], "1": ["py", "class", "Python class"], "2": ["py", "method", "Python method"]}, "titleterms": {"question": 0, "answer": 0, "1": [0, 30, 41, 43, 54, 55], "who": 0, "ar": 0, "target": 0, "user": 0, "hugectr": [0, 2, 3, 4, 25, 29, 31, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 51, 53, 54], "2": [0, 30, 41, 43, 54, 55], "which": 0, "model": [0, 4, 5, 6, 20, 21, 22, 26, 27, 30, 31, 32, 33, 36, 40, 42, 48, 51, 53], "can": 0, "support": [0, 36], "3": [0, 30, 41, 43, 55], "doe": 0, "tensorflow": [0, 10, 24, 30, 33, 54], "4": [0, 30, 41, 43, 55], "multipl": [0, 21, 33], "node": [0, 36], "ctr": 0, "train": [0, 4, 5, 20, 21, 22, 24, 26, 27, 30, 31, 32, 33, 35, 36, 40, 41, 42, 44, 45, 51, 53], "5": [0, 30, 41, 55], "how": [0, 35], "deal": 0, "huge": 0, "embed": [0, 2, 5, 19, 22, 26, 27, 30, 40, 41, 48, 51], "tabl": [0, 19, 21, 23, 28, 30, 40], "cannot": 0, "store": [0, 49], "singl": 0, "gpu": [0, 
33, 47], "memori": 0, "6": [0, 55], "7": [0, 55], "must": 0, "we": 0, "us": [0, 2, 10, 13, 17, 19, 24, 30, 31, 32, 33, 38, 40, 42], "dgx": [0, 54], "famili": 0, "a100": [0, 54], "run": [0, 30, 40, 43, 50], "8": [0, 55], "without": [0, 41], "infiniband": 0, "9": [0, 55], "i": 0, "ani": 0, "requir": 0, "cpu": 0, "configur": [0, 5, 20, 21, 22, 24, 26, 27, 30, 31, 32, 33], "execut": [0, 5], "10": 0, "what": [0, 55], "specif": [0, 25, 29, 34, 43], "format": [0, 4, 30], "file": [0, 4, 30, 31, 32, 33, 53], "input": [0, 2, 21, 30], "11": [0, 55], "python": [0, 4], "interfac": [0, 4], "12": [0, 55], "do": [0, 30, 31, 32, 33], "synchron": 0, "otherwis": 0, "asynchron": 0, "13": 0, "stream": 0, "14": 0, "slot": 0, "15": 0, "differ": 0, "between": 0, "localizedslotembed": 0, "distributedslotembed": 0, "16": 0, "For": 0, "multi": [0, 36, 41, 44, 49, 52], "dataread": [0, 4], "read": 0, "same": 0, "batch": 0, "data": [0, 4, 28, 30, 31, 38, 40, 41, 42, 44, 46, 49, 53], "each": 0, "step": [0, 30, 45], "17": 0, "As": 0, "parallel": [0, 36], "layer": [0, 2, 4, 9, 20, 21, 22, 24, 26, 27, 33, 51], "get": [0, 19, 20, 21, 22, 23, 24, 26, 27, 28, 30, 45], "all": [0, 46, 47, 48], "lookup": [0, 5, 28, 41], "featur": [0, 36, 47, 48, 49, 51], "from": [0, 20, 21, 22, 23, 24, 26, 27, 28, 30, 34, 35, 38, 41, 46, 47], "18": 0, "set": [0, 35], "claus": 0, "two": 0, "need": 0, "19": 0, "save": [0, 20, 21, 26], "load": [0, 22, 51], "20": 0, "could": 0, "post": 0, "import": 0, "other": 0, "framework": 0, "infer": [0, 4, 5, 20, 21, 23, 24, 26, 30, 35, 41], "deploy": [0, 41], "21": 0, "overlap": 0, "22": 0, "": [0, 48, 54, 55], "valu": 0, "23": [0, 55], "benchmark": [0, 6, 10, 17, 30, 38], "my": 0, "network": 0, "24": [0, 55], "workspace_size_per_gpu_in_mb": 0, "slot_size_arrai": 0, "25": 0, "nvlink": 0, "26": 0, "onli": 0, "server": [0, 5, 7, 10, 13, 17, 18, 20, 25, 29, 30, 36, 41], "27": 0, "28": 0, "pretrain": [0, 22, 47, 48, 51], "29": 0, "construct": 0, "graph": [0, 20, 21, 24, 26, 30, 31, 32, 
33], "branch": 0, "topologi": 0, "30": 0, "good": 0, "practic": 0, "vector": [0, 41], "size": 0, "31": 0, "resolv": 0, "bu": 0, "error": 0, "when": 0, "sampl": 0, "notebook": [0, 10, 13, 17, 25, 29, 34, 40, 43, 52], "32": 0, "log": 0, "pool": 0, "empti": 0, "impli": 0, "addit": 1, "resourc": [1, 6, 39], "class": [2, 4, 12], "method": [2, 4], "spars": [2, 21, 32, 33, 35, 36, 56], "type": 2, "detail": 2, "distributedslotsparseembeddinghash": 2, "localizedslotsparseembeddinghash": 2, "localizedslotsparseembeddingonehot": 2, "dens": 2, "usag": 2, "fullyconnect": 2, "mlp": 2, "multicross": 2, "fmorder2": 2, "weightmultipli": 2, "elementwisemultipli": 2, "batchnorm": 2, "layernorm": 2, "concat": 2, "reshap": 2, "select": 2, "slice": 2, "dropout": 2, "elu": 2, "relu": 2, "sigmoid": 2, "interact": 2, "add": [2, 4], "reducesum": 2, "gru": 2, "preludic": 2, "scale": 2, "fusedreshapeconcat": 2, "fusedreshapeconcatgener": 2, "softmax": 2, "sub": 2, "reducemean": 2, "matrixmutipli": 2, "multiheadattent": 2, "sequencemask": 2, "gather": 2, "binarycrossentropyloss": 2, "crossentropyloss": 2, "multicrossentropyloss": 2, "collect": [2, 40, 46], "about": [2, 4, 38, 40], "overview": [2, 20, 21, 22, 23, 24, 26, 27, 28, 30, 31, 32, 33, 35, 41, 42, 44, 53], "known": [2, 55], "limit": 2, "embeddingtableconfig": 2, "embeddingcollectionconfig": 2, "embedding_lookup": 2, "shard": 2, "api": [3, 4, 7, 11, 16, 40], "document": 3, "high": 4, "level": 4, "solver": 4, "createsolv": 4, "asyncparam": 4, "hybridembeddingparam": 4, "datareaderparam": 4, "dataset": [4, 38, 45], "list": [4, 25, 29, 34, 43], "raw": 4, "parquet": 4, "optparamspi": 4, "createoptim": 4, "compil": 4, "fit": 4, "summari": [4, 36], "graph_to_json": 4, "construct_from_json": 4, "load_dense_weight": 4, "load_dense_optimizer_st": 4, "load_sparse_weight": 4, "load_sparse_optimizer_st": 4, "freeze_dens": 4, "freeze_embed": 4, "unfreeze_dens": 4, "unfreeze_embed": 4, "reset_learning_rate_schedul": 4, "set_sourc": 4, "low": 4, 
"learningrateschedul": 4, "get_next": 4, "is_eof": 4, "get_learning_rate_schedul": 4, "get_data_reader_train": 4, "get_data_reader_ev": 4, "start_data_read": 4, "set_learning_r": 4, "get_current_loss": 4, "eval": 4, "get_eval_metr": 4, "save_params_to_fil": 4, "check_out_tensor": 4, "inferenceparam": 4, "inferencemodel": 4, "predict": 4, "evalu": [4, 54], "gener": [4, 28, 31, 38, 41], "datageneratorparam": 4, "datagener": 4, "sourc": [4, 5, 35], "datasourceparam": 4, "hierarch": [5, 7, 10, 13, 17, 18, 20, 25, 29, 36, 41], "paramet": [5, 7, 10, 13, 17, 18, 20, 25, 29, 36, 41], "databas": 5, "backend": [5, 24], "introduct": [5, 10, 13, 17, 38], "hp": [5, 6, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 19, 20, 21, 22, 23, 24, 26, 28, 30, 31, 32, 33, 34, 41], "background": 5, "architectur": 5, "iter": 5, "updat": 5, "optim": [5, 36], "cach": 5, "param": 5, "syntax": 5, "volatil": 5, "overflow": 5, "common": 5, "persist": 5, "dlrm": [6, 26, 27, 40, 53], "setup": [6, 40, 41, 42, 53], "result": [6, 19], "initi": 8, "sparselookuplay": [9, 21, 26], "lookuplay": [9, 12, 20, 21, 24], "plugin": [10, 11, 12, 13, 14, 15, 16, 17, 30, 31, 32, 33, 34], "benefit": 10, "workflow": [10, 17], "instal": [10, 13, 17, 19, 20, 21, 22, 23, 24, 26, 27, 28, 30, 31, 32, 33, 38], "comput": [10, 13, 17, 38], "capabl": [10, 13, 17, 38], "ngc": [10, 13, 17, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 38, 43], "contain": [10, 13, 17, 30, 34, 35, 38], "exampl": [10, 13, 17, 42, 43, 52, 53], "torch": [11, 12, 13, 28], "creator": 15, "tensorrt": [16, 17, 30, 31, 32, 33, 34], "profil": 19, "build": [19, 27, 30, 31, 32, 33, 35, 38, 43], "creat": [19, 20, 21, 23, 24, 26, 30, 49], "synthet": [19, 38, 49], "measur": 19, "triton": [19, 24, 30, 31, 32, 33], "perf": 19, "analyz": 19, "v": 19, "demo": [20, 22, 23, 26, 27, 28, 31, 32, 33, 41], "nativ": [20, 21, 22, 24, 33], "tf": [20, 21, 22, 23, 24, 30, 33], "pre": [22, 50, 51], "via": 22, "fusion": [23, 28], "savedmodel": [23, 24], "make": 23, 
"deploi": [24, 31, 32, 33], "trt": 24, "quickstart": [25, 29, 34], "pull": [25, 29, 34, 43], "docker": [25, 29, 30, 43], "clone": [25, 29, 34, 43], "repositori": [25, 29, 30, 34, 43], "start": [25, 29, 34, 35, 43, 45], "jupyt": [25, 29, 34, 43], "system": [25, 29, 30, 34, 43, 44, 53], "sok": [26, 27, 35], "larg": 30, "integr": [30, 31, 32, 33], "engin": [30, 31, 32, 33], "step1": [30, 31, 32, 33], "prepar": [30, 31, 32, 33, 40, 42, 53], "147gb": 30, "scratch": [30, 38, 41], "step2": [30, 31, 32, 33], "json": [30, 31, 32, 33], "step3": [30, 31, 32, 33], "convert": [30, 31, 32, 33, 36, 41, 45], "onnx": [30, 31, 32, 33, 36, 41], "surgeri": [30, 31, 32, 33], "step4": 30, "launch": 30, "arm64": 30, "grace": 30, "hooper": 30, "nvidia": [30, 47], "merlin": [30, 35, 39], "imag": 30, "host": 30, "pytorch": 32, "contribut": 35, "new": [35, 55], "code": 35, "your": 35, "develop": 35, "up": 35, "environ": 35, "With": 35, "oper": [35, 36, 56], "kit": [35, 36, 56], "core": 36, "mix": 36, "precis": 36, "sgd": 36, "learn": 36, "rate": 36, "schedul": 36, "hdf": [36, 53], "talk": 37, "blog": 37, "tool": 38, "download": [38, 45, 47, 48], "preprocess": [38, 42, 50], "relat": 39, "thi": 40, "concept": 40, "refer": 40, "an": 40, "follow": 40, "command": 40, "termin": 40, "script": 40, "placement": 40, "strategi": 40, "round": 40, "robin": 40, "uniform": 40, "hybrid": 40, "dynam": 40, "hash": 40, "dlpack": 41, "process": [41, 50], "redi": 41, "cluster": 41, "tl": 41, "ssl": 41, "end": 42, "nvtabular": [42, 50], "wdl": 42, "custom": 43, "option": 43, "recommend": 44, "modal": [44, 49, 52], "movielen": [45, 46], "25m": 45, "split": 45, "valid": 45, "next": 45, "enrich": 46, "scrape": 46, "imdb": 46, "synopsi": [46, 48], "movi": [46, 47, 48, 49, 51], "poster": [46, 47], "extract": [47, 48], "resnet": 47, "50": 47, "cloud": [47, 53], "bart": 48, "text": 48, "summar": 48, "real": 49, "etl": 50, "defin": [50, 51], "our": 50, "pipelin": 50, "check": 50, "output": 50, "non": 51, "trainabl": 51, 
"remot": 53, "dcn": 53, "aw": 53, "s3": 53, "googl": 53, "storag": 53, "perform": 54, "mlperf": 54, "releas": 55, "note": 55, "version": 55, "06": 55, "08": 55, "04": 55, "02": 55, "0": 55, "issu": 55}, "envversion": {"sphinx.domains.c": 2, "sphinx.domains.changeset": 1, "sphinx.domains.citation": 1, "sphinx.domains.cpp": 8, "sphinx.domains.index": 1, "sphinx.domains.javascript": 2, "sphinx.domains.math": 2, "sphinx.domains.python": 3, "sphinx.domains.rst": 2, "sphinx.domains.std": 2, "sphinx.ext.viewcode": 1, "sphinx.ext.intersphinx": 1, "sphinx": 57}, "alltitles": {"Questions and Answers": [[0, "questions-and-answers"]], "1. Who are the target users of HugeCTR?": [[0, "who-are-the-target-users-of-hugectr"]], "2. Which models can be supported in HugeCTR?": [[0, "which-models-can-be-supported-in-hugectr"]], "3. Does HugeCTR support TensorFlow?": [[0, "does-hugectr-support-tensorflow"]], "4. Does HugeCTR support multiple nodes CTR training?": [[0, "does-hugectr-support-multiple-nodes-ctr-training"]], "5. How to deal with the huge embedding table that cannot be stored in a single GPU memory?": [[0, "how-to-deal-with-the-huge-embedding-table-that-cannot-be-stored-in-a-single-gpu-memory"]], "6. Which GPUs are supported in HugeCTR?": [[0, "which-gpus-are-supported-in-hugectr"]], "7. Must we use the DGX family such as DGX A100 to run HugeCTR?": [[0, "must-we-use-the-dgx-family-such-as-dgx-a100-to-run-hugectr"]], "8. Can HugeCTR run without InfiniBand?": [[0, "can-hugectr-run-without-infiniband"]], "9. Is there any requirement of CPU configuration for HugeCTR execution?": [[0, "is-there-any-requirement-of-cpu-configuration-for-hugectr-execution"]], "10. What is the specific format of files as input in HugeCTR?": [[0, "what-is-the-specific-format-of-files-as-input-in-hugectr"]], "11.\t Does HugeCTR support Python interface?": [[0, "does-hugectr-support-python-interface"]], "12. Does HugeCTR do synchronous training with multiple GPUs (and nodes)? 
Otherwise, does it do asynchronous training?": [[0, "does-hugectr-do-synchronous-training-with-multiple-gpus-and-nodes-otherwise-does-it-do-asynchronous-training"]], "13. Does HugeCTR support stream training?": [[0, "does-hugectr-support-stream-training"]], "14. What is a \u201cslot\u201d in HugeCTR?": [[0, "what-is-a-slot-in-hugectr"]], "15. What are the differences between LocalizedSlotEmbedding and DistributedSlotEmbedding?": [[0, "what-are-the-differences-between-localizedslotembedding-and-distributedslotembedding"]], "16. For multi-node\uff0cis DataReader required to read the same batch of data on each node for each step?": [[0, "for-multi-node-is-datareader-required-to-read-the-same-batch-of-data-on-each-node-for-each-step"]], "17. As model parallelism in embedding layers, how does it get all the embedding lookup features from multi-node / multi-gpu?": [[0, "as-model-parallelism-in-embedding-layers-how-does-it-get-all-the-embedding-lookup-features-from-multi-node-multi-gpu"]], "18. How to set data clauses, if there are two embeddings needed?": [[0, "how-to-set-data-clauses-if-there-are-two-embeddings-needed"]], "19. How to save and load models in HugeCTR?": [[0, "how-to-save-and-load-models-in-hugectr"]], "20. Could the post training model from HugeCTR be imported into other frameworks such as TensorFlow for inference deployment?": [[0, "could-the-post-training-model-from-hugectr-be-imported-into-other-frameworks-such-as-tensorflow-for-inference-deployment"]], "21. Does HugeCTR support overlap between different slots?": [[0, "does-hugectr-support-overlap-between-different-slots"]], "22. What if there\u2019s no value in a slot?": [[0, "what-if-there-s-no-value-in-a-slot"]], "23. How can I benchmark my network?": [[0, "how-can-i-benchmark-my-network"]], "24. How to set workspace_size_per_gpu_in_mb and slot_size_array?": [[0, "how-to-set-workspace-size-per-gpu-in-mb-and-slot-size-array"]], "25. 
Is nvlink required in HugeCTR?": [[0, "is-nvlink-required-in-hugectr"]], "26. Is DGX the only GPU server that is required in HugeCTR?": [[0, "is-dgx-the-only-gpu-server-that-is-required-in-hugectr"]], "27. Can HugeCTR run without InfiniBand?": [[0, "id1"]], "28. Does HugeCTR support loading pretrained embeddings in other formats?": [[0, "does-hugectr-support-loading-pretrained-embeddings-in-other-formats"]], "29. How to construct the model graph with branch topology in HugeCTR?": [[0, "how-to-construct-the-model-graph-with-branch-topology-in-hugectr"]], "30. What is the good practice of configuring the embedding vector size?": [[0, "what-is-the-good-practice-of-configuring-the-embedding-vector-size"]], "31. How to resolve the bus error when running HugeCTR samples and notebooks?": [[0, "how-to-resolve-the-bus-error-when-running-hugectr-samples-and-notebooks"]], "32. What does the log \u201cmemory pool is empty\u201d imply for HugeCTR inference?": [[0, "what-does-the-log-memory-pool-is-empty-imply-for-hugectr-inference"]], "Additional Resources": [[1, "additional-resources"]], "HugeCTR Layer Classes and Methods": [[2, "hugectr-layer-classes-and-methods"]], "Input Layer": [[2, "input-layer"]], "Sparse Embedding": [[2, "sparse-embedding"]], "Embedding Types Detail": [[2, "embedding-types-detail"]], "DistributedSlotSparseEmbeddingHash Layer": [[2, "distributedslotsparseembeddinghash-layer"]], "LocalizedSlotSparseEmbeddingHash Layer": [[2, "localizedslotsparseembeddinghash-layer"]], "LocalizedSlotSparseEmbeddingOneHot Layer": [[2, "localizedslotsparseembeddingonehot-layer"]], "Dense Layers": [[2, "dense-layers"]], "Dense Layers Usage": [[2, "dense-layers-usage"]], "FullyConnected Layer": [[2, "fullyconnected-layer"]], "MLP Layer": [[2, "mlp-layer"]], "MultiCross Layer": [[2, "multicross-layer"]], "FmOrder2 Layer": [[2, "fmorder2-layer"]], "WeightMultiply Layer": [[2, "weightmultiply-layer"]], "ElementwiseMultiply Layer": [[2, "elementwisemultiply-layer"]], "BatchNorm 
Layer": [[2, "batchnorm-layer"]], "LayerNorm Layer": [[2, "layernorm-layer"]], "Concat Layer": [[2, "concat-layer"]], "Reshape Layer": [[2, "reshape-layer"]], "Select Layer": [[2, "select-layer"]], "Slice Layer": [[2, "slice-layer"]], "Dropout Layer": [[2, "dropout-layer"]], "ELU Layer": [[2, "elu-layer"]], "ReLU Layer": [[2, "relu-layer"]], "Sigmoid Layer": [[2, "sigmoid-layer"]], "Interaction Layer": [[2, "interaction-layer"]], "Add Layer": [[2, "add-layer"]], "ReduceSum Layer": [[2, "reducesum-layer"]], "GRU Layer": [[2, "gru-layer"]], "PReLUDice Layer": [[2, "preludice-layer"]], "Scale Layer": [[2, "scale-layer"]], "FusedReshapeConcat Layer": [[2, "fusedreshapeconcat-layer"]], "FusedReshapeConcatGeneral Layer": [[2, "fusedreshapeconcatgeneral-layer"]], "Softmax Layer": [[2, "softmax-layer"]], "Sub Layer": [[2, "sub-layer"]], "ReduceMean Layer": [[2, "reducemean-layer"]], "MatrixMutiply Layer": [[2, "matrixmutiply-layer"]], "MultiHeadAttention Layer": [[2, "multiheadattention-layer"]], "SequenceMask Layer": [[2, "sequencemask-layer"]], "Gather Layer": [[2, "gather-layer"]], "BinaryCrossEntropyLoss": [[2, "binarycrossentropyloss"]], "CrossEntropyLoss": [[2, "crossentropyloss"]], "MultiCrossEntropyLoss": [[2, "multicrossentropyloss"]], "Embedding Collection": [[2, "embedding-collection"]], "About the HugeCTR embedding collection": [[2, "about-the-hugectr-embedding-collection"]], "Overview of using the HugeCTR embedding collection": [[2, "overview-of-using-the-hugectr-embedding-collection"]], "Known Limitations": [[2, "known-limitations"]], "EmbeddingTableConfig": [[2, "embeddingtableconfig"]], "EmbeddingCollectionConfig": [[2, "embeddingcollectionconfig"]], "embedding_lookup method": [[2, "embedding-lookup-method"]], "shard method": [[2, "shard-method"]], "HugeCTR API Documentation": [[3, "hugectr-api-documentation"]], "HugeCTR Python Interface": [[4, "hugectr-python-interface"]], "About the HugeCTR Python Interface": [[4, "about-the-hugectr-python-interface"]], 
"High-level Training API": [[4, "high-level-training-api"]], "Solver": [[4, "solver"]], "CreateSolver method": [[4, "createsolver-method"]], "AsyncParam": [[4, "asyncparam"]], "AsyncParam class": [[4, "asyncparam-class"]], "HybridEmbeddingParam": [[4, "hybridembeddingparam"]], "HybridEmbeddingParam class": [[4, "hybridembeddingparam-class"]], "DataReaderParams": [[4, "datareaderparams"]], "DataReaderParams class": [[4, "datareaderparams-class"]], "Dataset formats": [[4, "dataset-formats"]], "Data Files": [[4, "data-files"]], "File List": [[4, "file-list"]], "Raw": [[4, "raw"]], "Parquet": [[4, "parquet"]], "OptParamsPy": [[4, "optparamspy"]], "CreateOptimizer method": [[4, "createoptimizer-method"]], "Layers": [[4, "layers"]], "Model": [[4, "model"], [4, "id2"]], "Model class": [[4, "model-class"]], "add method": [[4, "add-method"]], "compile method": [[4, "compile-method"]], "fit method": [[4, "fit-method"]], "summary method": [[4, "summary-method"]], "graph_to_json method": [[4, "graph-to-json-method"]], "construct_from_json method": [[4, "construct-from-json-method"]], "load_dense_weights method": [[4, "load-dense-weights-method"]], "load_dense_optimizer_states method": [[4, "load-dense-optimizer-states-method"]], "load_sparse_weights method": [[4, "load-sparse-weights-method"]], "load_sparse_optimizer_states method": [[4, "load-sparse-optimizer-states-method"]], "freeze_dense method": [[4, "freeze-dense-method"]], "freeze_embedding method": [[4, "freeze-embedding-method"]], "unfreeze_dense method": [[4, "unfreeze-dense-method"]], "unfreeze_embedding method": [[4, "unfreeze-embedding-method"]], "reset_learning_rate_scheduler method": [[4, "reset-learning-rate-scheduler-method"]], "set_source method": [[4, "set-source-method"], [4, "id1"]], "Low-level Training API": [[4, "low-level-training-api"]], "LearningRateScheduler": [[4, "learningratescheduler"]], "get_next method": [[4, "get-next-method"]], "DataReader": [[4, "datareader"]], "is_eof method": [[4, 
"is-eof-method"]], "get_learning_rate_scheduler method": [[4, "get-learning-rate-scheduler-method"]], "get_data_reader_train method": [[4, "get-data-reader-train-method"]], "get_data_reader_eval method": [[4, "get-data-reader-eval-method"]], "start_data_reading method": [[4, "start-data-reading-method"]], "set_learning_rate method": [[4, "set-learning-rate-method"]], "train method": [[4, "train-method"]], "get_current_loss method": [[4, "get-current-loss-method"]], "eval method": [[4, "eval-method"]], "get_eval_metrics method": [[4, "get-eval-metrics-method"]], "save_params_to_files method": [[4, "save-params-to-files-method"]], "check_out_tensor method": [[4, "check-out-tensor-method"], [4, "id3"]], "Inference API": [[4, "inference-api"]], "InferenceParams": [[4, "inferenceparams"]], "InferenceParams class": [[4, "inferenceparams-class"]], "InferenceModel": [[4, "inferencemodel"]], "InferenceModel class": [[4, "inferencemodel-class"]], "predict method": [[4, "predict-method"]], "evaluate method": [[4, "evaluate-method"]], "Data Generator API": [[4, "data-generator-api"]], "DataGeneratorParams class": [[4, "datageneratorparams-class"]], "DataGenerator": [[4, "datagenerator"]], "DataGenerator class": [[4, "datagenerator-class"]], "generate method": [[4, "generate-method"]], "Data Source API": [[4, "data-source-api"]], "DataSourceParams class": [[4, "datasourceparams-class"]], "Hierarchical Parameter Server Database Backend": [[5, "hierarchical-parameter-server-database-backend"]], "Introduction to the HPS Database Backend": [[5, "introduction-to-the-hps-database-backend"]], "Background": [[5, "background"]], "Architecture": [[5, "architecture"]], "Training and Iterative Model Updates": [[5, "training-and-iterative-model-updates"]], "Execution": [[5, "execution"]], "Inference": [[5, "inference"]], "Training": [[5, "training"], [53, "training"], [53, "id3"]], "Lookup Optimization": [[5, "lookup-optimization"]], "Configuration": [[5, "configuration"]], "Inference 
Parameters and Embedding Cache Configuration": [[5, "inference-parameters-and-embedding-cache-configuration"]], "Inference Params Syntax": [[5, "inference-params-syntax"]], "Inference Parameters": [[5, "inference-parameters"]], "Parameter Server Configuration: Models": [[5, "parameter-server-configuration-models"]], "Volatile Database Configuration": [[5, "volatile-database-configuration"]], "Volatile Database Params Syntax": [[5, "volatile-database-params-syntax"]], "Parameter Server Configuration: Volatile Database": [[5, "parameter-server-configuration-volatile-database"]], "Volatile Database Parameters": [[5, "volatile-database-parameters"]], "Overflow Parameters": [[5, "overflow-parameters"]], "Common Volatile Database Parameters": [[5, "common-volatile-database-parameters"]], "Persistent Database Configuration": [[5, "persistent-database-configuration"]], "Persistent Database Params Syntax": [[5, "persistent-database-params-syntax"]], "Parameter Server Configuration: Persistent Database": [[5, "parameter-server-configuration-persistent-database"]], "Persistent Database Parameters": [[5, "persistent-database-parameters"]], "Update Source Configuration": [[5, "update-source-configuration"]], "Update Source Params Syntax": [[5, "update-source-params-syntax"]], "Parameter Server Configuration: Update Source": [[5, "parameter-server-configuration-update-source"]], "Update Source Parameters": [[5, "update-source-parameters"]], "Benchmark the DLRM Model with HPS": [[6, "benchmark-the-dlrm-model-with-hps"]], "Benchmark Setup": [[6, "benchmark-setup"]], "Results": [[6, "results"]], "Resources": [[6, "resources"]], "Hierarchical Parameter Server API": [[7, "hierarchical-parameter-server-api"]], "HPS Initialize": [[8, "hps-initialize"]], "HPS Layers": [[9, "hps-layers"]], "SparseLookupLayer": [[9, "sparselookuplayer"]], "LookupLayer": [[9, "lookuplayer"]], "Hierarchical Parameter Server Plugin for TensorFlow": [[10, 
"hierarchical-parameter-server-plugin-for-tensorflow"]], "Introduction to the HPS Plugin for TensorFlow": [[10, "introduction-to-the-hps-plugin-for-tensorflow"]], "Benefits of the Plugin for TensorFlow": [[10, "benefits-of-the-plugin-for-tensorflow"]], "Workflow": [[10, "workflow"], [17, "workflow"]], "Installation": [[10, "installation"], [13, "installation"], [17, "installation"], [20, "installation"], [21, "installation"], [22, "installation"], [23, "installation"], [24, "installation"], [26, "installation"], [27, "installation"], [28, "installation"], [30, "installation"], [31, "installation"], [32, "installation"], [33, "installation"]], "Compute Capability": [[10, "compute-capability"], [13, "compute-capability"], [17, "compute-capability"], [38, "compute-capability"]], "Installing HPS Using NGC Containers": [[10, "installing-hps-using-ngc-containers"], [13, "installing-hps-using-ngc-containers"], [17, "installing-hps-using-ngc-containers"]], "Example Notebooks": [[10, "example-notebooks"], [13, "example-notebooks"], [17, "example-notebooks"]], "Benchmark": [[10, "benchmark"], [17, "benchmark"]], "HPS Plugin for Torch API": [[11, "hps-plugin-for-torch-api"]], "HPS Plugin for Torch": [[12, "hps-plugin-for-torch"]], "LookupLayer class": [[12, "lookuplayer-class"]], "Hierarchical Parameter Server Plugin for Torch": [[13, "hierarchical-parameter-server-plugin-for-torch"]], "Introduction to the HPS Plugin for Torch": [[13, "introduction-to-the-hps-plugin-for-torch"]], "HPS Plugin": [[14, "hps-plugin"]], "HPS Plugin Creator": [[15, "hps-plugin-creator"]], "HPS Plugin for TensorRT API": [[16, "hps-plugin-for-tensorrt-api"]], "Hierarchical Parameter Server Plugin for TensorRT": [[17, "hierarchical-parameter-server-plugin-for-tensorrt"]], "Introduction to the HPS Plugin for TensorRT": [[17, "introduction-to-the-hps-plugin-for-tensorrt"]], "Hierarchical Parameter Server": [[18, "hierarchical-parameter-server"], [36, "hierarchical-parameter-server"]], "Profiling HPS": 
[[19, "profiling-hps"]], "HPS profiler": [[19, "hps-profiler"]], "Build and install the HPS Profiler": [[19, "build-and-install-the-hps-profiler"]], "Create a synthetic embedding table": [[19, "create-a-synthetic-embedding-table"]], "Use the HPS Profiler to get the measurement results": [[19, "use-the-hps-profiler-to-get-the-measurement-results"]], "Profile HPS with Triton Perf Analyzer:": [[19, "profile-hps-with-triton-perf-analyzer"]], "HPS Profiler vs. Triton Perf Analyzer:": [[19, "hps-profiler-vs-triton-perf-analyzer"]], "Hierarchical Parameter Server Demo": [[20, "hierarchical-parameter-server-demo"], [41, "hierarchical-parameter-server-demo"]], "Overview": [[20, "overview"], [21, "overview"], [22, "overview"], [23, "overview"], [24, "overview"], [26, "overview"], [27, "overview"], [28, "overview"], [30, "overview"], [31, "overview"], [32, "overview"], [33, "overview"], [41, "overview"], [42, "overview"], [44, "overview"], [53, "overview"]], "Get HPS from NGC": [[20, "get-hps-from-ngc"], [21, "get-hps-from-ngc"], [22, "get-hps-from-ngc"], [23, "get-hps-from-ngc"], [24, "get-hps-from-ngc"], [28, "get-hps-from-ngc"]], "Configurations": [[20, "configurations"], [21, "configurations"], [22, "configurations"], [24, "configurations"], [26, "configurations"], [27, "configurations"], [32, "configurations"], [33, "configurations"]], "Train with native TF layers": [[20, "train-with-native-tf-layers"], [21, "train-with-native-tf-layers"], [22, "train-with-native-tf-layers"], [24, "train-with-native-tf-layers"], [33, "train-with-native-tf-layers"]], "Create the inference graph with HPS LookupLayer": [[20, "create-the-inference-graph-with-hps-lookuplayer"], [24, "create-the-inference-graph-with-hps-lookuplayer"]], "Inference with saved model graph": [[20, "inference-with-saved-model-graph"], [21, "inference-with-saved-model-graph"], [26, "inference-with-saved-model-graph"]], "HPS for Multiple Tables and Sparse Inputs": [[21, "hps-for-multiple-tables-and-sparse-inputs"]], 
"Create the inference graph with HPS SparseLookupLayer and LookupLayer": [[21, "create-the-inference-graph-with-hps-sparselookuplayer-and-lookuplayer"]], "HPS Pretrained Model Training Demo": [[22, "hps-pretrained-model-training-demo"]], "Load the pre-trained embeddings via HPS": [[22, "load-the-pre-trained-embeddings-via-hps"]], "HPS Table Fusion Demo": [[23, "hps-table-fusion-demo"]], "Create TF SavedModel": [[23, "create-tf-savedmodel"]], "Make inference with HPS table fusion": [[23, "make-inference-with-hps-table-fusion"]], "Deploy SavedModel using HPS with Triton TensorFlow Backend": [[24, "deploy-savedmodel-using-hps-with-triton-tensorflow-backend"], [24, "id1"]], "Deploy TF-TRT SavedModel using HPS with Triton TensorFlow Backend": [[24, "deploy-tf-trt-savedmodel-using-hps-with-triton-tensorflow-backend"]], "Hierarchical Parameter Server Notebooks": [[25, "hierarchical-parameter-server-notebooks"], [29, "hierarchical-parameter-server-notebooks"]], "Quickstart": [[25, "quickstart"], [29, "quickstart"], [34, "quickstart"]], "Pull the NGC Docker": [[25, "pull-the-ngc-docker"], [29, "pull-the-ngc-docker"]], "Clone the HugeCTR Repository": [[25, "clone-the-hugectr-repository"], [29, "clone-the-hugectr-repository"], [34, "clone-the-hugectr-repository"]], "Start the Jupyter Notebook": [[25, "start-the-jupyter-notebook"], [29, "start-the-jupyter-notebook"], [34, "start-the-jupyter-notebook"]], "Notebook List": [[25, "notebook-list"], [29, "notebook-list"], [34, "notebook-list"], [43, "notebook-list"]], "System Specifications": [[25, "system-specifications"], [29, "system-specifications"], [34, "system-specifications"], [43, "system-specifications"]], "SOK to HPS DLRM Demo": [[26, "sok-to-hps-dlrm-demo"]], "Get SOK from NGC": [[26, "get-sok-from-ngc"], [27, "get-sok-from-ngc"]], "Train with SOK embedding layers": [[26, "train-with-sok-embedding-layers"]], "Create the inference graph with HPS SparseLookupLayer": [[26, 
"create-the-inference-graph-with-hps-sparselookuplayer"]], "SOK Train DLRM Demo": [[27, "sok-train-dlrm-demo"]], "Build model with SOK embedding layers": [[27, "build-model-with-sok-embedding-layers"]], "Train with SOK models": [[27, "train-with-sok-models"]], "HPS Torch Demo": [[28, "hps-torch-demo"]], "Data Generation": [[28, "data-generation"], [31, "data-generation"], [41, "data-generation"]], "Lookup with Table Fusion": [[28, "lookup-with-table-fusion"]], "HPS TensorRT Plugin Benchmark for TensorFlow Large Model": [[30, "hps-tensorrt-plugin-benchmark-for-tensorflow-large-model"]], "Use NGC": [[30, "use-ngc"], [31, "use-ngc"], [32, "use-ngc"], [33, "use-ngc"]], "1. Create the TF model": [[30, "create-the-tf-model"]], "2. Build the HPS-integrated TensorRT engine": [[30, "build-the-hps-integrated-tensorrt-engine"]], "Step1: Prepare the 147GB embedding table": [[30, "step1-prepare-the-147gb-embedding-table"]], "1.1 Train a 147GB model from scratch": [[30, "train-a-147gb-model-from-scratch"]], "1.2 Get the embedding model file in hps format": [[30, "get-the-embedding-model-file-in-hps-format"]], "Step2: Prepare JSON configuration file for HPS": [[30, "step2-prepare-json-configuration-file-for-hps"]], "Step3: Convert to ONNX and do ONNX graph surgery": [[30, "step3-convert-to-onnx-and-do-onnx-graph-surgery"]], "Step4: Build the TensorRT engine": [[30, "step4-build-the-tensorrt-engine"]], "3. Benchmark HPS-integrated TensorRT engine on Triton": [[30, "benchmark-hps-integrated-tensorrt-engine-on-triton"]], "Step1: Create the model repository": [[30, "step1-create-the-model-repository"]], "Step2: Prepare the benchmark input data": [[30, "step2-prepare-the-benchmark-input-data"]], "Step3: Launch the Triton inference server": [[30, "step3-launch-the-triton-inference-server"]], "Step4: Run the benchmark": [[30, "step4-run-the-benchmark"]], "4. 
Benchmark for ARM64 or Grace + Hooper systems": [[30, "benchmark-for-arm64-or-grace-hooper-systems"]], "Step 1: Build the NVIDIA Merlin docker images": [[30, "step-1-build-the-nvidia-merlin-docker-images"]], "Step 2: Prepare host system for running the docker container": [[30, "step-2-prepare-host-system-for-running-the-docker-container"]], "Step 3: Create the model": [[30, "step-3-create-the-model"]], "Step 4: Prepare data": [[30, "step-4-prepare-data"]], "Step 5: Run benchmark": [[30, "step-5-run-benchmark"]], "HPS TensorRT Plugin Demo for HugeCTR Trained Model": [[31, "hps-tensorrt-plugin-demo-for-hugectr-trained-model"]], "Train with HugeCTR": [[31, "train-with-hugectr"]], "Build the HPS-integrated TensorRT engine": [[31, "build-the-hps-integrated-tensorrt-engine"], [32, "build-the-hps-integrated-tensorrt-engine"], [33, "build-the-hps-integrated-tensorrt-engine"]], "Step1: Prepare JSON configuration file for HPS": [[31, "step1-prepare-json-configuration-file-for-hps"]], "Step2: Convert to ONNX and do ONNX graph surgery": [[31, "step2-convert-to-onnx-and-do-onnx-graph-surgery"], [32, "step2-convert-to-onnx-and-do-onnx-graph-surgery"], [33, "step2-convert-to-onnx-and-do-onnx-graph-surgery"]], "Step3: Build the TensorRT engine": [[31, "step3-build-the-tensorrt-engine"], [32, "step3-build-the-tensorrt-engine"], [33, "step3-build-the-tensorrt-engine"]], "Deploy HPS-integrated TensorRT engine on Triton": [[31, "deploy-hps-integrated-tensorrt-engine-on-triton"], [32, "deploy-hps-integrated-tensorrt-engine-on-triton"]], "HPS TensorRT Plugin Demo for PyTorch Trained Model": [[32, "hps-tensorrt-plugin-demo-for-pytorch-trained-model"]], "Train with PyTorch": [[32, "train-with-pytorch"]], "Step1: Prepare sparse model and JSON configuration file for HPS": [[32, "step1-prepare-sparse-model-and-json-configuration-file-for-hps"], [33, "step1-prepare-sparse-model-and-json-configuration-file-for-hps"]], "HPS TensorRT Plugin Demo for TensorFlow Trained Model": [[33, 
"hps-tensorrt-plugin-demo-for-tensorflow-trained-model"]], "Deploy HPS-integrated TensorRT engine with Triton on multiple GPUs": [[33, "deploy-hps-integrated-tensorrt-engine-with-triton-on-multiple-gpus"]], "HPS Plugin for TensorRT Notebooks": [[34, "hps-plugin-for-tensorrt-notebooks"]], "Pull the Container from NGC": [[34, "pull-the-container-from-ngc"]], "Contributing to HugeCTR": [[35, "contributing-to-hugectr"]], "Overview of Contributing to HugeCTR": [[35, "overview-of-contributing-to-hugectr"]], "Contribute New Code": [[35, "contribute-new-code"]], "How to Start your Development": [[35, "how-to-start-your-development"]], "Set Up the Development Environment With Merlin Containers": [[35, "set-up-the-development-environment-with-merlin-containers"]], "Build HugeCTR Training Container from Source": [[35, "build-hugectr-training-container-from-source"]], "Build HugeCTR Inference Container from Source": [[35, "build-hugectr-inference-container-from-source"]], "Build Sparse Operation Kit (SOK) from Source": [[35, "build-sparse-operation-kit-sok-from-source"]], "HugeCTR Core Features": [[36, "hugectr-core-features"]], "Summary of Core Features": [[36, "summary-of-core-features"]], "Model Parallel Training": [[36, "model-parallel-training"]], "Multi-Node Training": [[36, "multi-node-training"]], "Mixed Precision Training": [[36, "mixed-precision-training"]], "SGD Optimizer and Learning Rate Scheduling": [[36, "sgd-optimizer-and-learning-rate-scheduling"]], "HugeCTR to ONNX Converter": [[36, "hugectr-to-onnx-converter"]], "HDFS Support": [[36, "hdfs-support"]], "Sparse Operation Kit": [[36, "sparse-operation-kit"], [56, "sparse-operation-kit"]], "HugeCTR Talks and Blogs": [[37, "hugectr-talks-and-blogs"]], "Talks": [[37, "talks"]], "Blogs": [[37, "blogs"]], "Introduction to HugeCTR": [[38, "introduction-to-hugectr"]], "About HugeCTR": [[38, "about-hugectr"]], "Installing and Building HugeCTR": [[38, "installing-and-building-hugectr"]], "Installing HugeCTR Using NGC 
Containers": [[38, "installing-hugectr-using-ngc-containers"]], "Building HugeCTR from Scratch": [[38, "building-hugectr-from-scratch"]], "Tools": [[38, "tools"]], "Generating Synthetic Data and Benchmarks": [[38, "generating-synthetic-data-and-benchmarks"]], "Downloading and Preprocessing Datasets": [[38, "downloading-and-preprocessing-datasets"]], "Merlin HugeCTR": [[39, "merlin-hugectr"]], "Related Resources": [[39, "related-resources"]], "HugeCTR Embedding Collection": [[40, "hugectr-embedding-collection"]], "About this Notebook": [[40, "about-this-notebook"]], "Concepts and API Reference": [[40, "concepts-and-api-reference"]], "Setup": [[40, "setup"], [41, "setup"], [42, "setup"]], "Use an Embedding Collection with a DLRM Model": [[40, "use-an-embedding-collection-with-a-dlrm-model"]], "Data Preparation": [[40, "data-preparation"], [42, "data-preparation"], [53, "data-preparation"]], "Run the following commands on the terminal to prepare the data for this notebook": [[40, "run-the-following-commands-on-the-terminal-to-prepare-the-data-for-this-notebook"]], "Prepare the Training Script": [[40, "prepare-the-training-script"]], "Embedding Table Placement Strategy: Round Robin": [[40, "embedding-table-placement-strategy-round-robin"]], "Embedding Table Placement Strategy: Uniform": [[40, "embedding-table-placement-strategy-uniform"]], "Embedding Table Placement Strategy: Hybrid": [[40, "embedding-table-placement-strategy-hybrid"]], "Use Dynamic Hash Table with Round Robin Table Placement Strategy": [[40, "use-dynamic-hash-table-with-round-robin-table-placement-strategy"]], "Train from Scratch": [[41, "train-from-scratch"]], "Convert HugeCTR to ONNX": [[41, "convert-hugectr-to-onnx"]], "1. Inference with HPS & ONNX": [[41, "inference-with-hps-onnx"]], "2. Lookup the Embedding Vector from DLPack": [[41, "lookup-the-embedding-vector-from-dlpack"]], "3. Multi-process inference": [[41, "multi-process-inference"]], "4. 
Redis Cluster deployment (without TLS/SSL)": [[41, "redis-cluster-deployment-without-tls-ssl"]], "5. Redis Cluster deployment (with TLS/SSL)": [[41, "redis-cluster-deployment-with-tls-ssl"]], "HugeCTR End-end Example with NVTabular": [[42, "hugectr-end-end-example-with-nvtabular"]], "Data Preprocessing using NVTabular": [[42, "data-preprocessing-using-nvtabular"]], "Training a WDL model with HugeCTR": [[42, "training-a-wdl-model-with-hugectr"]], "HugeCTR Example Notebooks": [[43, "hugectr-example-notebooks"]], "1. Clone the HugeCTR Repository": [[43, "clone-the-hugectr-repository"]], "2. Pull the NGC Docker and run it": [[43, "pull-the-ngc-docker-and-run-it"]], "3. Customized Building (Optional)": [[43, "customized-building-optional"]], "4. Start the Jupyter Notebook": [[43, "start-the-jupyter-notebook"]], "Training Recommender Systems on Multi-modal Data": [[44, "training-recommender-systems-on-multi-modal-data"]], "MovieLens-25M: Download and Convert": [[45, "movielens-25m-download-and-convert"]], "Getting Started": [[45, "getting-started"]], "Convert the dataset": [[45, "convert-the-dataset"]], "Splitting into train and validation dataset": [[45, "splitting-into-train-and-validation-dataset"]], "Next steps": [[45, "next-steps"]], "MovieLens Data Enrichment": [[46, "movielens-data-enrichment"]], "Scraping data from IMDB": [[46, "scraping-data-from-imdb"]], "Collect synopsis for all movies": [[46, "collect-synopsis-for-all-movies"]], "Scraping movie posters": [[46, "scraping-movie-posters"]], "Movie Poster Feature Extraction with ResNet": [[47, "movie-poster-feature-extraction-with-resnet"]], "Download a pretrained ResNet-50 from NVIDIA GPU cloud": [[47, "download-a-pretrained-resnet-50-from-nvidia-gpu-cloud"]], "Extract features for all movies": [[47, "extract-features-for-all-movies"]], "Movie Synopsis Feature Extraction with Bart text summarization": [[48, "movie-synopsis-feature-extraction-with-bart-text-summarization"]], "Download pretrained BART model": 
[[48, "download-pretrained-bart-model"]], "Extracting embeddings for all movie\u2019s synopsis": [[48, "extracting-embeddings-for-all-movie-s-synopsis"]], "Creating Multi-Modal Movie Feature Store": [[49, "creating-multi-modal-movie-feature-store"]], "Real data": [[49, "real-data"]], "Synthetic data": [[49, "synthetic-data"]], "ETL with NVTabular": [[50, "etl-with-nvtabular"]], "Defining our Preprocessing Pipeline": [[50, "defining-our-preprocessing-pipeline"]], "Running the pipeline": [[50, "running-the-pipeline"]], "Checking the pre-processing outputs": [[50, "checking-the-pre-processing-outputs"]], "Training HugeCTR Model with Pre-trained Embeddings": [[51, "training-hugectr-model-with-pre-trained-embeddings"]], "Loading pretrained movie features into non-trainable embedding layer": [[51, "loading-pretrained-movie-features-into-non-trainable-embedding-layer"]], "Define and train model": [[51, "define-and-train-model"]], "Multi-modal Example Notebooks": [[52, "multi-modal-example-notebooks"]], "HugeCTR Training with Remote File System Example": [[53, "hugectr-training-with-remote-file-system-example"]], "Setup HugeCTR": [[53, "setup-hugectr"]], "Training with HDFS Example": [[53, "training-with-hdfs-example"]], "Training a DLRM model": [[53, "training-a-dlrm-model"]], "Training a DCN model with AWS S3": [[53, "training-a-dcn-model-with-aws-s3"]], "Data preparation": [[53, "id1"], [53, "id2"]], "Training a DCN model with Google Cloud Storage": [[53, "training-a-dcn-model-with-google-cloud-storage"]], "Performance": [[54, "performance"]], "MLPerf on DGX-2 and DGX A100": [[54, "mlperf-on-dgx-2-and-dgx-a100"]], "Evaluating HugeCTR\u2019s Performance on the DGX-1": [[54, "evaluating-hugectr-s-performance-on-the-dgx-1"]], "Evaluating HugeCTR\u2019s Performance on TensorFlow": [[54, "evaluating-hugectr-s-performance-on-tensorflow"]], "Release Notes": [[55, "release-notes"]], "What\u2019s New in Version 24.06": [[55, "what-s-new-in-version-24-06"]], "What\u2019s New in 
Version 23.12": [[55, "what-s-new-in-version-23-12"]], "What\u2019s New in Version 23.11": [[55, "what-s-new-in-version-23-11"]], "What\u2019s New in Version 23.08": [[55, "what-s-new-in-version-23-08"]], "What\u2019s New in Version 23.06": [[55, "what-s-new-in-version-23-06"]], "What\u2019s New in Version 23.04": [[55, "what-s-new-in-version-23-04"]], "What\u2019s New in Version 23.02": [[55, "what-s-new-in-version-23-02"]], "What\u2019s New in Version 4.3": [[55, "what-s-new-in-version-4-3"]], "What\u2019s New in Version 4.2": [[55, "what-s-new-in-version-4-2"]], "What\u2019s New in Version 4.1": [[55, "what-s-new-in-version-4-1"]], "What\u2019s New in Version 4.0": [[55, "what-s-new-in-version-4-0"]], "What\u2019s New in Version 3.9": [[55, "what-s-new-in-version-3-9"]], "What\u2019s New in Version 3.8": [[55, "what-s-new-in-version-3-8"]], "What\u2019s New in Version 3.7": [[55, "what-s-new-in-version-3-7"]], "What\u2019s New in Version 3.6": [[55, "what-s-new-in-version-3-6"]], "What\u2019s New in Version 3.5": [[55, "what-s-new-in-version-3-5"]], "What\u2019s New in Version 3.4.1": [[55, "what-s-new-in-version-3-4-1"]], "What\u2019s New in Version 3.4": [[55, "what-s-new-in-version-3-4"]], "What\u2019s New in Version 3.3.1": [[55, "what-s-new-in-version-3-3-1"]], "What\u2019s New in Version 3.3": [[55, "what-s-new-in-version-3-3"]], "What\u2019s New in Version 3.2.1": [[55, "what-s-new-in-version-3-2-1"]], "What\u2019s New in Version 3.2": [[55, "what-s-new-in-version-3-2"]], "What\u2019s New in Version 3.1": [[55, "what-s-new-in-version-3-1"]], "What\u2019s New in Version 3.0.1": [[55, "what-s-new-in-version-3-0-1"]], "What\u2019s New in Version 3.0": [[55, "whats-new-in-version-3-0"]], "What\u2019s New in Version 2.3": [[55, "what-s-new-in-version-2-3"]], "Known Issues": [[55, "known-issues"]]}, "indexentries": {"init() (in module hierarchical_parameter_server)": [[8, "hierarchical_parameter_server.Init"]], "lookuplayer (class in 
hierarchical_parameter_server)": [[9, "hierarchical_parameter_server.LookupLayer"]], "sparselookuplayer (class in hierarchical_parameter_server)": [[9, "hierarchical_parameter_server.SparseLookupLayer"]], "call() (hierarchical_parameter_server.lookuplayer method)": [[9, "hierarchical_parameter_server.LookupLayer.call"]], "call() (hierarchical_parameter_server.sparselookuplayer method)": [[9, "hierarchical_parameter_server.SparseLookupLayer.call"]]}}) \ No newline at end of file diff --git a/review/pr-458/sparse_operation_kit.html b/review/pr-458/sparse_operation_kit.html deleted file mode 100644 index a65991cfed..0000000000 --- a/review/pr-458/sparse_operation_kit.html +++ /dev/null @@ -1,146 +0,0 @@ - - - - - - - Sparse Operation Kit — Merlin HugeCTR documentation - - - - - - - - - - - - - - - - - - - - - - - - - -
- - -
- -
-
-
-
    
-
-
-
-
- -
-

Sparse Operation Kit

-

    Sparse Operation Kit (SOK) is a Python package that wraps GPU-accelerated operations dedicated to sparse training and inference. It is designed to be compatible with common deep learning (DL) frameworks such as TensorFlow.
    -In sparse training and inference scenarios, such as click-through rate (CTR) estimation, there are vast numbers of parameters that cannot fit into the memory of a single GPU. Many common DL frameworks offer only limited support for model parallelism (MP), which makes it difficult to use all available GPUs in a cluster to accelerate the whole training process.
    -SOK provides broad MP functionality to fully utilize all available GPUs, regardless of whether these GPUs are located in a single machine or across multiple machines. At the same time, SOK takes advantage of the existing data-parallel (DP) capabilities of DL frameworks to accelerate training while minimizing code changes. With SOK embedding layers, you can build a DNN model that mixes MP and DP: MP shards large embedding parameter tables so that they are distributed among the available GPUs to balance the workload, while DP is used for layers that consume few GPU resources.
    
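    The split between model parallelism for embeddings and data parallelism for dense layers can be illustrated with a small, framework-free Python sketch. It assumes a simple modulo (row-wise) sharding rule and simulates each GPU as a Python dictionary; the names `ShardedEmbedding` and `data_parallel_average` are hypothetical and are not part of the SOK API, and SOK's actual placement strategies and communication may differ.

    ```python
    from typing import Dict, List


    class ShardedEmbedding:
        """Hypothetical MP sketch: one embedding table split row-wise across simulated GPUs.

        Not the SOK API. Each key is owned by exactly one simulated GPU (key % num_gpus).
        """

        def __init__(self, vocab_size: int, dim: int, num_gpus: int) -> None:
            self.dim = dim
            self.num_gpus = num_gpus
            # Each simulated GPU stores only the rows it owns.
            self.shards: List[Dict[int, List[float]]] = [dict() for _ in range(num_gpus)]
            for key in range(vocab_size):
                # Deterministic toy initialization so results are reproducible.
                self.shards[self.owner(key)][key] = [float(key)] * dim

        def owner(self, key: int) -> int:
            """Index of the simulated GPU that holds the row for `key`."""
            return key % self.num_gpus

        def lookup(self, keys: List[int]) -> List[List[float]]:
            # A real MP setup would exchange keys and vectors between GPUs
            # (e.g. an all-to-all); here each key is simply routed to its owner shard.
            return [self.shards[self.owner(k)][k] for k in keys]

        def rows_per_gpu(self) -> List[int]:
            """How many embedding rows each simulated GPU stores."""
            return [len(shard) for shard in self.shards]


    def data_parallel_average(grads_per_replica: List[List[float]]) -> List[float]:
        """Hypothetical DP sketch: average the gradients of a dense layer replicated on every GPU."""
        n = len(grads_per_replica)
        return [sum(values) / n for values in zip(*grads_per_replica)]


    if __name__ == "__main__":
        emb = ShardedEmbedding(vocab_size=10, dim=4, num_gpus=4)
        print(emb.rows_per_gpu())
        print(emb.lookup([5, 2]))
        print(data_parallel_average([[1.0, 2.0], [3.0, 4.0]]))
    ```

    With 10 keys and 4 simulated GPUs, no single device holds the whole table, which is the point of sharding a table that is too large for one GPU; the small dense-layer gradients, by contrast, are cheap enough to replicate and average.
    
    
    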

-

    For details, see the SOK Documentation.
    

-_images/workflow_of_embeddinglayer.png -
- - -
-
- -
-
-
-
- - - - - - - \ No newline at end of file