PyTorch on PAI docker env

Contents

  1. Basic environment
  2. Advanced environment

Basic environment

PAI runs all jobs in Docker containers.

Install Docker-CE if you have not already. If you do not have a private Docker registry, register an account on Docker Hub, the public Docker registry.
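
Before building anything, it may help to confirm that the Docker CLI is actually on your PATH. The following is a minimal sketch; `check_docker` is a hypothetical helper name introduced here, while `command -v` and `docker --version` are standard.

```shell
# Hypothetical helper: report whether the Docker CLI is available on PATH.
check_docker() {
    if command -v docker >/dev/null 2>&1; then
        # docker --version does not need a running daemon
        echo "docker found: $(docker --version)"
    else
        echo "docker not found; install Docker-CE first" >&2
        return 1
    fi
}
```

Run `check_docker` once in your shell. A non-zero return status means Docker-CE still needs to be installed.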

You can also jump to PyTorch examples using pre-built images on Docker Hub.

To run PyTorch workloads on PAI, we need to build a PyTorch image with GPU support. This takes two steps:

  1. Build a base Docker image for PAI. We provide a base Dockerfile that can be built directly.

    $ cd ../Dockerfiles/cuda8.0-cudnn6
    $ sudo docker build -f Dockerfile.build.base \
    >                   -t pai.build.base:hadoop2.7.2-cuda8.0-cudnn6-devel-ubuntu16.04 .
    $ cd -
  2. Prepare the PyTorch environment in a Dockerfile using the base image.

    Write a PyTorch Dockerfile and save it to Dockerfile.example.pytorch:

    FROM pai.build.base:hadoop2.7.2-cuda8.0-cudnn6-devel-ubuntu16.04
    
    # install git
    RUN apt-get -y update && apt-get -y install git
    
    # install PyTorch and torchvision using pip
    RUN pip install torch torchvision
    
    # clone PyTorch examples
    RUN git clone https://github.com/pytorch/examples.git

    Build the Docker image from Dockerfile.example.pytorch:

    $ sudo docker build -f Dockerfile.example.pytorch -t pai.example.pytorch .

    Push the Docker image to a Docker registry:

    $ sudo docker tag pai.example.pytorch USER/pai.example.pytorch
    $ sudo docker push USER/pai.example.pytorch

    Note: Replace USER with your Docker Hub username. You will be required to log in before pushing the Docker image.
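
The tag and push steps above can be wrapped in a small shell sketch so that the username is supplied in one place. The helper names `image_ref` and `push_image` are hypothetical and not part of PAI; `docker login`, `docker tag`, and `docker push` are standard Docker CLI commands.

```shell
# Hypothetical helper: build the registry reference for a Docker Hub user.
image_ref() {
    # $1: Docker Hub username (takes the place of USER in the steps above)
    echo "$1/pai.example.pytorch"
}

# Hypothetical helper: log in, tag the local image, then push it.
# Each step runs only if the previous one succeeded.
push_image() {
    ref="$(image_ref "$1")"
    sudo docker login &&
    sudo docker tag pai.example.pytorch "$ref" &&
    sudo docker push "$ref"
}
```

For example, `push_image alice` would tag and push `alice/pai.example.pytorch`, assuming `alice` is your Docker Hub username.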

Advanced environment

You can skip this section if you do not need to prepare other dependencies.

You can customize the runtime PyTorch environment in Dockerfile.example.pytorch, for example by adding other dependencies to the Dockerfile (replace PACKAGE with the package names you need):

    FROM pai.build.base:hadoop2.7.2-cuda8.0-cudnn6-devel-ubuntu16.04

    # install other packages using apt-get
    RUN apt-get -y update && apt-get -y install git PACKAGE

    # install other packages using pip
    RUN pip install torch torchvision PACKAGE

    # clone PyTorch examples
    RUN git clone https://github.com/pytorch/examples.git
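
After rebuilding the image from the customized Dockerfile, a quick import check inside a throwaway container can confirm that PyTorch installed correctly. This is a minimal sketch, assuming the base image provides a `python` executable that matches the `pip` used above (this document does not confirm that). `smoke_test` is a hypothetical helper name.

```shell
# Hypothetical helper: import torch inside the image and print its version.
smoke_test() {
    # $1: image name; defaults to the tag built earlier in this guide
    image="${1:-pai.example.pytorch}"
    # --rm removes the container once the check finishes
    sudo docker run --rm "$image" python -c 'import torch; print(torch.__version__)'
}
```

Call `smoke_test` with no arguments to check `pai.example.pytorch`, or pass another image name to check a different tag. The same pattern can be adapted to import any extra package you added.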