Merge branch 'main'

ashahba · ashahba · commit f77deff881bd · 2023-11-20T15:26:04.000-08:00
diff --git a/docker/hf_k8s/README.md b/docker/hf_k8s/README.md
@@ -71,7 +71,8 @@ The [Docker](https://www.docker.com) container used in this example includes all
 distributed PyTorch training using a Hugging Face model and a fine tuning script. This directory includes the
 [`Dockerfile`](Dockerfile) that was used to build the container.
 
-The container has the following major packages included:
+An image has been published to DockerHub (`intel/ai-workflows:torch-2.0.1-huggingface-multinode-py3.9`) with
+the following major packages included:
 
 | Package Name | Version | Purpose |
 |--------------|---------|---------|
@@ -83,7 +84,10 @@ The container has the following major packages included:
 #### Container Build
 
 The container can be built either using the default package versions from the table above or by specifying your own
-package version using build arguments. Use one of these options to build the container:
+package version using build arguments. This section (and the ["Container Push" section](#container-push)) can be skipped if
+you are using the published `intel/ai-workflows:torch-2.0.1-huggingface-multinode-py3.9` container.
+
+Use one of these options to build the container:
 
 a. The container can be built with the default package versions using the following command:
    ```
@@ -115,7 +119,7 @@ b. The build arguments below that can be provided to install a different version
 
 #### Container Push
 
-After you've built the Docker container using the instructions above, the container needs to be pushed for the
+If you are building your own Docker container using the instructions above, the container needs to be pushed for the
 Kubernetes cluster to have access to the image. If you have a Docker container registry (such as
 [DockerHub](https://hub.docker.com)), you can push the container to that registry. Otherwise, we have alternative
 instructions for getting the container distributed to the cluster nodes by saving the image and copying it to the nodes.
@@ -189,9 +193,8 @@ fine tune the model.
 
 2. Edit your values file based on the parameters that you would like to use and your cluster. Key parameters to look
    at and edit are:
-   * `image.name` based on the image that was pushed to the container registry or copied to the Kubernetes cluster nodes
-   * `image.tag` based on the image tag that was pushed to the container registry or copied to the Kubernetes cluster
-     nodes
+   * `image.name` if you have built your own container; otherwise the default `intel/ai-workflows` image will be used
+   * `image.tag` if you have built your own container; otherwise the default `torch-2.0.1-huggingface-multinode-py3.9` tag will be used
    * `elasticPolicy.minReplicas` and `elasticPolicy.maxReplicas` based on the number of workers being used
    * `distributed.workers` should be set to the number of worker that will be used for the job
    * If you are using `chart/values.yaml` for your own workload, fill in either `train.datasetName` (the name of a
diff --git a/docker/hf_k8s/values.md b/docker/hf_k8s/values.md
@@ -26,8 +26,8 @@ A Kubernetes secret is to store your Hugging Face token.
 
 | Key | Type | Default | Description |
 |-----|------|---------|-------------|
-| `image.name` | string | `intel/ai-workflow` <!-- TODO: Replace with the public image when it's available --> | Name of the image to use for the PyTorch job. The container should include the fine tuning script and all the dependencies required to run the job. |
-| `image.tag` | string | `torch-2.0.1-huggingface-multinode-py3.9` <!-- TODO: Replace with the public tag when it's available --> | The image tag for the container that will be used to run the PyTorch job. The container should include the fine tuning script and all the dependencies required to run the job. |
+| `image.name` | string | `intel/ai-workflows` | Name of the image to use for the PyTorch job. The container should include the fine tuning script and all the dependencies required to run the job. |
+| `image.tag` | string | `torch-2.0.1-huggingface-multinode-py3.9` | The image tag for the container that will be used to run the PyTorch job. The container should include the fine tuning script and all the dependencies required to run the job. |
 | `image.pullPolicy` | string | `IfNotPresent` | Determines when the kubelet will pull the image to the worker nodes. Choose from: `IfNotPresent`, `Always`, or `Never`. If updates to the image have been made, use `Always` to ensure the newest image is used. |
 
 ## Elastic policy