PyTorch on OpenPAI

This guide shows how to run a PyTorch job on OpenPAI. It walks through some basic PyTorch examples; other custom PyTorch code can be run in the same way.

Contents

  1. PyTorch MNIST digit recognition examples
  2. PyTorch regression examples

PyTorch MNIST digit recognition examples

To run the PyTorch examples on OpenPAI, prepare a job configuration file and submit it through the web portal.

OpenPAI packages the Docker environment required by the job, so you can use it directly. To customize this example environment, refer to DOCKER.md. If you have built a customized image and pushed it to Docker Hub, replace the pre-built image openpai/pai.example.pytorch with your own.

Here are some example configuration files:

{
  "jobName": "pytorch-mnist",
  "image": "openpai/pai.example.pytorch",
  "taskRoles": [
    {
      "name": "main",
      "taskNumber": 1,
      "cpuNumber": 4,
      "memoryMB": 8192,
      "gpuNumber": 1,
      "command": "cd examples/mnist && python main.py"
    }
  ]
}

PyTorch regression examples

{
  "jobName": "pytorch-regression",
  "image": "openpai/pai.example.pytorch",
  "taskRoles": [
    {
      "name": "main",
      "taskNumber": 1,
      "cpuNumber": 4,
      "memoryMB": 8192,
      "gpuNumber": 0,
      "command": "cd examples/regression && python main.py"
    }
  ]
}
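
The two configurations above differ only in job name, GPU count, and command, so they can also be generated from a script. The following is a minimal sketch, not part of OpenPAI: `make_job_config` and `DEFAULT_IMAGE` are hypothetical names introduced for illustration, and the field values are copied from the examples in this guide.

```python
import json

# Pre-built image named in this guide; pass your own image to override it.
DEFAULT_IMAGE = "openpai/pai.example.pytorch"


def make_job_config(job_name, command, gpu_number, image=DEFAULT_IMAGE):
    """Build a single-task job configuration mirroring the examples in this guide.

    Hypothetical helper: the resource values (4 CPUs, 8192 MB) are the ones
    used in the examples, not recommended defaults.
    """
    return {
        "jobName": job_name,
        "image": image,
        "taskRoles": [
            {
                "name": "main",
                "taskNumber": 1,
                "cpuNumber": 4,
                "memoryMB": 8192,
                "gpuNumber": gpu_number,
                "command": command,
            }
        ],
    }


mnist = make_job_config(
    "pytorch-mnist", "cd examples/mnist && python main.py", gpu_number=1
)
regression = make_job_config(
    "pytorch-regression", "cd examples/regression && python main.py", gpu_number=0
)

# Serialize to JSON text that can be saved to a file and submitted.
mnist_json = json.dumps(mnist, indent=2)
print(mnist_json)
```

To use a customized image, pass it through the `image` argument, for example `make_job_config(..., image="<your-dockerhub-user>/<your-image>")`, where the image name is a placeholder for your own.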

For more details on how to write a job configuration file, refer to the job tutorial.
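
Before submitting, it can help to check that a configuration file contains the fields used in the examples of this guide. The sketch below is an assumption-laden illustration: `find_missing_fields`, `JOB_FIELDS`, and `ROLE_FIELDS` are hypothetical names, and the field lists are taken only from the examples here, not from the full OpenPAI job schema.

```python
import json

# Fields seen in the examples of this guide; NOT the authoritative schema.
JOB_FIELDS = ("jobName", "image", "taskRoles")
ROLE_FIELDS = ("name", "taskNumber", "cpuNumber", "memoryMB", "gpuNumber", "command")


def find_missing_fields(config):
    """Return the paths of expected fields that are absent from config."""
    missing = [field for field in JOB_FIELDS if field not in config]
    for index, role in enumerate(config.get("taskRoles", [])):
        missing += [
            f"taskRoles[{index}].{field}"
            for field in ROLE_FIELDS
            if field not in role
        ]
    return missing


config_text = """
{
  "jobName": "pytorch-regression",
  "image": "openpai/pai.example.pytorch",
  "taskRoles": [
    {
      "name": "main",
      "taskNumber": 1,
      "cpuNumber": 4,
      "memoryMB": 8192,
      "gpuNumber": 0,
      "command": "cd examples/regression && python main.py"
    }
  ]
}
"""

config = json.loads(config_text)
print(find_missing_fields(config))  # prints [] for this complete example
```

A non-empty result lists what to add, for example `taskRoles[0].command` if a task role has no command.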

Note:

Since PAI runs PyTorch jobs in Docker, the training speed on PAI should be similar to the speed on the host.