- Quick start: how to write and submit a CIFAR-10 job
- List of off-the-shelf examples
- List of customized job templates
- Contributing
In this section, we use a CIFAR-10 training job as an example to explain how to write and submit a job in OpenPAI.
CIFAR-10 is an established computer-vision dataset used for image classification.
- Full example of a TensorFlow CIFAR-10 image classification training job on OpenPAI:
{
// Name for the job, must be unique
"jobName": "tensorflow-cifar10",
// URL pointing to the Docker image for all tasks in the job
"image": "openpai/pai.example.tensorflow",
// Data directory existing on HDFS
"dataDir": "/tmp/data",
// Output directory on HDFS
"outputDir": "/tmp/output",
// List of taskRole, one task role at least
"taskRoles": [
{
// Name for the task role
"name": "cifar_train",
// Number of tasks for the task role, no less than 1
"taskNumber": 1,
// CPU number for one task in the task role, no less than 1
"cpuNumber": 8,
// Memory for one task in the task role, no less than 100
"memoryMB": 32768,
// GPU number for one task in the task role, no less than 0
"gpuNumber": 1,
// Executable command for tasks in the task role, cannot be empty
"command": "git clone https://github.com/tensorflow/models && cd models/research/slim && python download_and_convert_data.py --dataset_name=cifar10 --dataset_dir=$PAI_DATA_DIR && python train_image_classifier.py --batch_size=64 --model_name=inception_v3 --dataset_name=cifar10 --dataset_split_name=train --dataset_dir=$PAI_DATA_DIR --train_dir=$PAI_OUTPUT_DIR"
}
]
}
- Save the content to a file and name it cifar10.json.
To submit the job from the OpenPAI web portal, refer to the tutorial "submit a job in web portal".
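The limits noted in the comments of the example job ("no less than 1", "cannot be empty", and so on) can be checked locally before submitting. The following is a minimal sketch and not part of OpenPAI: the helpers `load_job_config` and `validate_job` are hypothetical, treating every listed field as required is an assumption, and comments are assumed to sit on their own lines as in the example, since plain JSON does not permit `//` comments.

```python
import json

# Sample job file in the same style as the example above (comments on their own lines).
SAMPLE = """
{
  // Name for the job
  "jobName": "tensorflow-cifar10",
  "image": "openpai/pai.example.tensorflow",
  "taskRoles": [
    {
      "name": "cifar_train",
      "taskNumber": 1,
      "cpuNumber": 8,
      "memoryMB": 32768,
      "gpuNumber": 1,
      // "//" inside a string value is left untouched
      "command": "git clone https://github.com/tensorflow/models"
    }
  ]
}
"""

# Minimum values taken from the comments in the example job file.
LIMITS = {"taskNumber": 1, "cpuNumber": 1, "memoryMB": 100, "gpuNumber": 0}


def load_job_config(text):
    """Parse a job file after dropping lines that contain only a // comment."""
    kept = [line for line in text.splitlines()
            if not line.lstrip().startswith("//")]
    return json.loads("\n".join(kept))


def validate_job(job):
    """Return a list of human-readable problems; an empty list means no issue found."""
    problems = []
    if not job.get("jobName"):
        problems.append("jobName is missing")
    roles = job.get("taskRoles") or []
    if not roles:
        problems.append("at least one task role is required")
    for role in roles:
        name = role.get("name", "<unnamed>")
        for key, minimum in LIMITS.items():
            value = role.get(key)
            if not isinstance(value, int) or value < minimum:
                problems.append(f"{name}: {key} must be an integer >= {minimum}")
        if not str(role.get("command", "")).strip():
            problems.append(f"{name}: command cannot be empty")
    return problems


if __name__ == "__main__":
    for problem in validate_job(load_job_config(SAMPLE)):
        print(problem)
```

Only whole-line comments are removed, so a `//` inside a string value such as the `git clone` URL survives parsing.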
These examples can be run by submitting the JSON file directly, without any modification.
- tensorflow.cifar10.json: Single GPU training on CIFAR-10 using TensorFlow.
- pytorch.mnist.json: Single GPU training on MNIST using PyTorch.
- pytorch.regression.json: Regression using PyTorch.
- mxnet.autoencoder.json: Autoencoder using MXNet.
- mxnet.image-classification.json: Image classification on MNIST using MXNet.
- serving.tensorflow.json: TensorFlow model serving.
Users can customize and run these jobs on OpenPAI.
- CNTK:
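Customizing a template usually comes down to loading the JSON and overriding a few fields before submission. The sketch below is illustrative only: `customize_job` is a hypothetical helper, the timestamp suffix is just one assumed way to keep `jobName` unique, and the template is assumed to be plain JSON already parsed into a dictionary.

```python
import copy
import time


def customize_job(template, suffix=None, **role_overrides):
    """Return a modified copy of a job template.

    suffix         -- appended to jobName so repeated submissions stay unique
                      (defaults to a timestamp; hypothetical naming scheme)
    role_overrides -- fields applied to every task role, e.g. gpuNumber=2
    """
    job = copy.deepcopy(template)  # leave the original template untouched
    suffix = suffix or time.strftime("%Y%m%d%H%M%S")
    job["jobName"] = f"{job['jobName']}-{suffix}"
    for role in job["taskRoles"]:
        role.update(role_overrides)
    return job


if __name__ == "__main__":
    template = {
        "jobName": "tensorflow-cifar10",
        "taskRoles": [{"name": "cifar_train", "gpuNumber": 1, "memoryMB": 32768}],
    }
    print(customize_job(template, gpuNumber=2, memoryMB=65536))
```

Working on a deep copy keeps the checked-in template reusable for later runs.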
If you want to contribute a job example that can be run on PAI, please open a new pull request.
- Prepare a folder under pai/examples, for example pai/examples/caffe2/
- Prepare example files:
Under the Caffe2 example directory, prepare these files for the example's contribution PR:
- README.md: Introduction to the example
- Dockerfile: The example's dependencies
- OpenPAI job JSON file: The example's OpenPAI job template
- [Optional] Code file: The example's code
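Before opening the pull request, the file list above can be checked with a few lines of Python. This is a rough sketch rather than an official check: `missing_example_files` is a hypothetical helper, and it assumes the job template is any `*.json` file placed directly in the example folder.

```python
from pathlib import Path


def missing_example_files(example_dir):
    """Return the required files that are absent from an example folder."""
    folder = Path(example_dir)
    missing = [name for name in ("README.md", "Dockerfile")
               if not (folder / name).is_file()]
    # Assumption: the OpenPAI job template is a .json file in the folder itself.
    if not any(folder.glob("*.json")):
        missing.append("*.json (OpenPAI job template)")
    return missing


if __name__ == "__main__":
    # Example folder name taken from the contributing steps above.
    for name in missing_example_files("pai/examples/caffe2"):
        print("missing:", name)
```

The optional code file is not checked, since the list marks it as optional.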