Commit f77deff

Merge branch 'main'
2 parents 1cdc89d + 9d5727f commit f77deff

2 files changed: 11 additions and 8 deletions

docker/hf_k8s/README.md

Lines changed: 9 additions & 6 deletions
@@ -71,7 +71,8 @@ The [Docker](https://www.docker.com) container used in this example includes all
 distributed PyTorch training using a Hugging Face model and a fine tuning script. This directory includes the
 [`Dockerfile`](Dockerfile) that was used to build the container.

-The container has the following major packages included:
+An image has been published to DockerHub (`intel/ai-workflows:torch-2.0.1-huggingface-multinode-py3.9`) with
+the following major packages included:

 | Package Name | Version | Purpose |
 |--------------|---------|---------|
@@ -83,7 +84,10 @@ The container has the following major packages included:
 #### Container Build

 The container can be built either using the default package versions from the table above or by specifying your own
-package version using build arguments. Use one of these options to build the container:
+package version using build arguments. This section (and the ["Container Push" section](#container-push)) can be skipped if
+you are using the published `intel/ai-workflows:torch-2.0.1-huggingface-multinode-py3.9` container.
+
+Use one of these options to build the container:

 a. The container can be built with the default package versions using the following command:
 ```
@@ -115,7 +119,7 @@ b. The build arguments below that can be provided to install a different version

 #### Container Push

-After you've built the Docker container using the instructions above, the container needs to be pushed for the
+If you are building your own Docker container using the instructions above, the container needs to be pushed for the
 Kubernetes cluster to have access to the image. If you have a Docker container registry (such as
 [DockerHub](https://hub.docker.com)), you can push the container to that registry. Otherwise, we have alternative
 instructions for getting the container distributed to the cluster nodes by saving the image and copying it to the nodes.
@@ -189,9 +193,8 @@ fine tune the model.

 2. Edit your values file based on the parameters that you would like to use and your cluster. Key parameters to look
 at and edit are:
-* `image.name` based on the image that was pushed to the container registry or copied to the Kubernetes cluster nodes
-* `image.tag` based on the image tag that was pushed to the container registry or copied to the Kubernetes cluster
-nodes
+* `image.name` if you have built your own container, otherwise the default `intel/ai-workflows` image will be used
+* `image.tag` if you have built your own container, otherwise the default `torch-2.0.1-huggingface-multinode-py3.9` tag will be used
 * `elasticPolicy.minReplicas` and `elasticPolicy.maxReplicas` based on the number of workers being used
 * `distributed.workers` should be set to the number of workers that will be used for the job
 * If you are using `chart/values.yaml` for your own workload, fill in either `train.datasetName` (the name of a

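The README change above lets readers skip the build and push steps by using the published image. As a minimal sketch of what that could look like on a machine with Docker installed, the snippet below assembles the image reference from the name and tag quoted in the diff and pulls it. The `DRY_RUN` variable is a hypothetical switch added only for this illustration; it is not part of the repository.

```shell
# Image name and tag as quoted in the README change above.
IMAGE_NAME="intel/ai-workflows"
IMAGE_TAG="torch-2.0.1-huggingface-multinode-py3.9"
IMAGE="${IMAGE_NAME}:${IMAGE_TAG}"

# DRY_RUN is a hypothetical switch for this sketch: by default the command is
# only printed, so the snippet is safe to run on a machine without Docker.
if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "docker pull ${IMAGE}"
else
    docker pull "${IMAGE}"
fi
```

Running it with `DRY_RUN=0` performs the actual pull. On a Kubernetes cluster the kubelet normally pulls the image itself, so a manual pull is mainly useful for inspecting the container locally.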
docker/hf_k8s/values.md

Lines changed: 2 additions & 2 deletions
@@ -26,8 +26,8 @@ A Kubernetes secret is to store your Hugging Face token.

 | Key | Type | Default | Description |
 |-----|------|---------|-------------|
-| `image.name` | string | `intel/ai-workflow` <!-- TODO: Replace with the public image when it's available --> | Name of the image to use for the PyTorch job. The container should include the fine tuning script and all the dependencies required to run the job. |
-| `image.tag` | string | `torch-2.0.1-huggingface-multinode-py3.9` <!-- TODO: Replace with the public tag when it's available --> | The image tag for the container that will be used to run the PyTorch job. The container should include the fine tuning script and all the dependencies required to run the job. |
+| `image.name` | string | `intel/ai-workflow` | Name of the image to use for the PyTorch job. The container should include the fine tuning script and all the dependencies required to run the job. |
+| `image.tag` | string | `torch-2.0.1-huggingface-multinode-py3.9` | The image tag for the container that will be used to run the PyTorch job. The container should include the fine tuning script and all the dependencies required to run the job. |
 | `image.pullPolicy` | string | `IfNotPresent` | Determines when the kubelet will pull the image to the worker nodes. Choose from: `IfNotPresent`, `Always`, or `Never`. If updates to the image have been made, use `Always` to ensure the newest image is used. |

 ## Elastic policy

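For readers who build their own container, the `image.*` keys in the table above are the ones to override. The following is a rough sketch, not the repository's own tooling, that writes a small override file containing only those keys. The registry path `registry.example.com/my-team/hf-k8s`, the tag `my-build`, and the file name `image-override.yaml` are placeholders assumed for illustration.

```shell
# Hypothetical values for a self-built image; replace with your own registry path and tag.
MY_IMAGE_NAME="registry.example.com/my-team/hf-k8s"
MY_IMAGE_TAG="my-build"
OVERRIDE_FILE="${TMPDIR:-/tmp}/image-override.yaml"

# Write only the image keys documented in values.md; every other key keeps its default.
# pullPolicy is set to Always so that a rebuilt image with the same tag is re-pulled.
cat > "${OVERRIDE_FILE}" <<EOF
image:
  name: ${MY_IMAGE_NAME}
  tag: ${MY_IMAGE_TAG}
  pullPolicy: Always
EOF

# Show the generated file for review.
cat "${OVERRIDE_FILE}"
```

A file like this could be passed to Helm with the `--values` (`-f`) flag alongside the main values file. When using the published image, no override is needed because the defaults already point to it.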