This guide introduces how to run PyTorch jobs on OpenPAI. The following contents show some basic PyTorch examples; other customized PyTorch code can be run similarly.
To run the PyTorch examples in OpenPAI, you need to prepare a job configuration file and submit it through the web portal.
OpenPAI packages the Docker environment required by the job so that users can run it out of the box. Refer to DOCKER.md if you want to customize this example Docker environment. If you have built a customized image and pushed it to Docker Hub, replace our pre-built image `openpai/pai.example.pytorch` with your own.
Here are two job configuration file examples:
```json
{
  "jobName": "pytorch-mnist",
  "image": "openpai/pai.example.pytorch",
  "taskRoles": [
    {
      "name": "main",
      "taskNumber": 1,
      "cpuNumber": 4,
      "memoryMB": 8192,
      "gpuNumber": 1,
      "command": "cd examples/mnist && python main.py"
    }
  ]
}
```
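The `"command"` field of this first job runs `examples/mnist/main.py` inside the container on a single GPU (`"gpuNumber": 1`). Below is a minimal sketch of what such a single-GPU MNIST training script typically looks like; it is an illustration, not the actual script shipped in `openpai/pai.example.pytorch`, and it assumes `torch` and `torchvision` are installed in the image and that the MNIST dataset can be downloaded at runtime.

```python
# Minimal single-GPU MNIST training sketch (illustrative only).
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision import datasets, transforms


class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(28 * 28, 128)
        self.fc2 = nn.Linear(128, 10)

    def forward(self, x):
        x = x.view(x.size(0), -1)  # flatten the 28x28 images
        return self.fc2(F.relu(self.fc1(x)))


def main():
    # Use the GPU requested via "gpuNumber": 1 when it is visible in the container.
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    loader = torch.utils.data.DataLoader(
        datasets.MNIST("./data", train=True, download=True,
                       transform=transforms.ToTensor()),
        batch_size=64, shuffle=True)

    model = Net().to(device)
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)

    model.train()
    for epoch in range(1, 3):
        for data, target in loader:
            data, target = data.to(device), target.to(device)
            optimizer.zero_grad()
            loss = F.cross_entropy(model(data), target)
            loss.backward()
            optimizer.step()
        print(f"epoch {epoch} finished, last batch loss {loss.item():.4f}")


if __name__ == "__main__":
    main()
```

Because the job requests one GPU, `torch.cuda.is_available()` should return true inside the container and training runs on the GPU; otherwise the sketch falls back to CPU.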
```json
{
  "jobName": "pytorch-regression",
  "image": "openpai/pai.example.pytorch",
  "taskRoles": [
    {
      "name": "main",
      "taskNumber": 1,
      "cpuNumber": 4,
      "memoryMB": 8192,
      "gpuNumber": 0,
      "command": "cd examples/regression && python main.py"
    }
  ]
}
```
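The second job requests `"gpuNumber": 0`, so it runs on CPU only. Below is a minimal, hypothetical sketch of a CPU-only regression script in the spirit of `examples/regression/main.py`; the actual script in the image may differ.

```python
# Minimal CPU-only regression sketch (illustrative only).
import torch
import torch.nn as nn


def main():
    # Fit y = 2x + 3 with a single linear layer on synthetic data.
    x = torch.linspace(-1, 1, 200).unsqueeze(1)
    y = 2 * x + 3 + 0.1 * torch.randn_like(x)

    model = nn.Linear(1, 1)
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
    loss_fn = nn.MSELoss()

    for _ in range(500):
        optimizer.zero_grad()
        loss = loss_fn(model(x), y)
        loss.backward()
        optimizer.step()

    print("learned weight/bias:", model.weight.item(), model.bias.item())


if __name__ == "__main__":
    main()
```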
For more details on how to write a job configuration file, please refer to the job tutorial.
Since PAI runs PyTorch jobs in Docker, the training speed on PAI should be comparable to the speed on the host.