
Commit 93802fd

Adds container packages for PyTorch Mask R-CNN Training and Inference (#89)

* Add PyTorch SPR Mask R-CNN package specs, docs, and quickstart files
* Update build arg
* Doc updates for training
* Print status
* Update error handling
* Try to run without pretrained model for training
* Fix else
* Doc update
* Doc updates
* Regenerated dockerfiles
* Add new line at EOF

1 parent 9ece6ea commit 93802fd

31 files changed (+1199, −0 lines)
New file (+87 lines): Dockerfile for the `pytorch-spr-maskrcnn-inference` package.

```dockerfile
# Copyright (c) 2020-2021 Intel Corporation
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================
#
# THIS IS A GENERATED DOCKERFILE.
#
# This file was assembled from multiple pieces, whose use is documented
# throughout. Please refer to the TensorFlow dockerfiles documentation
# for more information.

ARG PYTORCH_IMAGE="model-zoo"
ARG PYTORCH_TAG="pytorch-ipex-spr"

FROM ${PYTORCH_IMAGE}:${PYTORCH_TAG} AS intel-optimized-pytorch

# Build Torch Vision
ARG TORCHVISION_VERSION=v0.8.0

RUN source ~/anaconda3/bin/activate pytorch && \
    git clone https://github.com/pytorch/vision && \
    cd vision && \
    git checkout ${TORCHVISION_VERSION} && \
    python setup.py install

RUN source ~/anaconda3/bin/activate pytorch && \
    pip install matplotlib Pillow pycocotools && \
    pip install yacs opencv-python cityscapesscripts transformers && \
    conda install -y libopenblas psutil && \
    cd /workspace/installs && \
    wget https://github.com/gperftools/gperftools/releases/download/gperftools-2.7.90/gperftools-2.7.90.tar.gz && \
    tar -xzf gperftools-2.7.90.tar.gz && \
    cd gperftools-2.7.90 && \
    ./configure --prefix=$HOME/.local && \
    make && \
    make install && \
    rm -rf /workspace/installs/

ARG PACKAGE_DIR=model_packages

ARG PACKAGE_NAME="pytorch-spr-maskrcnn-inference"

ARG MODEL_WORKSPACE

# ${MODEL_WORKSPACE} and below needs to be owned by root:root rather than the current UID:GID
# this allows the default user (root) to work in k8s single-node, multi-node
RUN umask 002 && mkdir -p ${MODEL_WORKSPACE} && chgrp root ${MODEL_WORKSPACE} && chmod g+s+w,o+s+r ${MODEL_WORKSPACE}

ADD --chown=0:0 ${PACKAGE_DIR}/${PACKAGE_NAME}.tar.gz ${MODEL_WORKSPACE}

RUN chown -R root ${MODEL_WORKSPACE}/${PACKAGE_NAME} && chgrp -R root ${MODEL_WORKSPACE}/${PACKAGE_NAME} && chmod -R g+s+w ${MODEL_WORKSPACE}/${PACKAGE_NAME} && find ${MODEL_WORKSPACE}/${PACKAGE_NAME} -type d | xargs chmod o+r+x

WORKDIR ${MODEL_WORKSPACE}/${PACKAGE_NAME}

ARG MASKRCNN_DIR="/workspace/pytorch-spr-maskrcnn-inference/models/maskrcnn"

RUN source ~/anaconda3/bin/activate pytorch && \
    cd ${MASKRCNN_DIR} && \
    cd maskrcnn-benchmark && \
    python setup.py install && \
    pip install onnx

FROM intel-optimized-pytorch AS release
COPY --from=intel-optimized-pytorch /root/anaconda3 /root/anaconda3
COPY --from=intel-optimized-pytorch /workspace/lib/ /workspace/lib/
COPY --from=intel-optimized-pytorch /root/.local/ /root/.local/

ENV DNNL_MAX_CPU_ISA="AVX512_CORE_AMX"

ENV PATH="~/anaconda3/bin:${PATH}"
ENV LD_PRELOAD="/workspace/lib/jemalloc/lib/libjemalloc.so:$LD_PRELOAD"
ENV MALLOC_CONF="oversize_threshold:1,background_thread:true,metadata_thp:auto,dirty_decay_ms:9000000000,muzzy_decay_ms:9000000000"
ENV BASH_ENV=/root/.bash_profile
WORKDIR /workspace/
RUN yum install -y numactl mesa-libGL && \
    yum clean all && \
    echo "source activate pytorch" >> /root/.bash_profile
```
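The Dockerfile parameterizes the base image and package name via `ARG`s that are supplied at build time. As a rough sketch (the tag and workspace path here are assumptions, not values taken from the package's `build.sh`), an equivalent manual build command could be composed like this:

```shell
# Sketch only: these values mirror the ARGs in the Dockerfile above;
# the package's real build.sh may pass different values.
PYTORCH_IMAGE="model-zoo"
PYTORCH_TAG="pytorch-ipex-spr"
PACKAGE_NAME="pytorch-spr-maskrcnn-inference"
MODEL_WORKSPACE="/workspace"   # assumed: MODEL_WORKSPACE has no default in the Dockerfile

BUILD_CMD="docker build \
  --build-arg PYTORCH_IMAGE=${PYTORCH_IMAGE} \
  --build-arg PYTORCH_TAG=${PYTORCH_TAG} \
  --build-arg PACKAGE_NAME=${PACKAGE_NAME} \
  --build-arg MODEL_WORKSPACE=${MODEL_WORKSPACE} \
  -f ${PACKAGE_NAME}.Dockerfile \
  -t ${PACKAGE_NAME}:latest ."

# Print rather than execute, so the command can be reviewed first.
echo "${BUILD_CMD}"
```

In practice the packaged `build.sh` script (shown in the docs below) wraps this step, so the sketch is only for understanding which build args exist.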
New file (+87 lines): Dockerfile for the `pytorch-spr-maskrcnn-training` package.

```dockerfile
# Copyright (c) 2020-2021 Intel Corporation
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================
#
# THIS IS A GENERATED DOCKERFILE.
#
# This file was assembled from multiple pieces, whose use is documented
# throughout. Please refer to the TensorFlow dockerfiles documentation
# for more information.

ARG PYTORCH_IMAGE="model-zoo"
ARG PYTORCH_TAG="pytorch-ipex-spr"

FROM ${PYTORCH_IMAGE}:${PYTORCH_TAG} AS intel-optimized-pytorch

# Build Torch Vision
ARG TORCHVISION_VERSION=v0.8.0

RUN source ~/anaconda3/bin/activate pytorch && \
    git clone https://github.com/pytorch/vision && \
    cd vision && \
    git checkout ${TORCHVISION_VERSION} && \
    python setup.py install

RUN source ~/anaconda3/bin/activate pytorch && \
    pip install matplotlib Pillow pycocotools && \
    pip install yacs opencv-python cityscapesscripts transformers && \
    conda install -y libopenblas psutil && \
    cd /workspace/installs && \
    wget https://github.com/gperftools/gperftools/releases/download/gperftools-2.7.90/gperftools-2.7.90.tar.gz && \
    tar -xzf gperftools-2.7.90.tar.gz && \
    cd gperftools-2.7.90 && \
    ./configure --prefix=$HOME/.local && \
    make && \
    make install && \
    rm -rf /workspace/installs/

ARG PACKAGE_DIR=model_packages

ARG PACKAGE_NAME="pytorch-spr-maskrcnn-training"

ARG MODEL_WORKSPACE

# ${MODEL_WORKSPACE} and below needs to be owned by root:root rather than the current UID:GID
# this allows the default user (root) to work in k8s single-node, multi-node
RUN umask 002 && mkdir -p ${MODEL_WORKSPACE} && chgrp root ${MODEL_WORKSPACE} && chmod g+s+w,o+s+r ${MODEL_WORKSPACE}

ADD --chown=0:0 ${PACKAGE_DIR}/${PACKAGE_NAME}.tar.gz ${MODEL_WORKSPACE}

RUN chown -R root ${MODEL_WORKSPACE}/${PACKAGE_NAME} && chgrp -R root ${MODEL_WORKSPACE}/${PACKAGE_NAME} && chmod -R g+s+w ${MODEL_WORKSPACE}/${PACKAGE_NAME} && find ${MODEL_WORKSPACE}/${PACKAGE_NAME} -type d | xargs chmod o+r+x

WORKDIR ${MODEL_WORKSPACE}/${PACKAGE_NAME}

ARG MASKRCNN_DIR="/workspace/pytorch-spr-maskrcnn-training/models/maskrcnn"

RUN source ~/anaconda3/bin/activate pytorch && \
    cd ${MASKRCNN_DIR} && \
    cd maskrcnn-benchmark && \
    python setup.py install && \
    pip install onnx

FROM intel-optimized-pytorch AS release
COPY --from=intel-optimized-pytorch /root/anaconda3 /root/anaconda3
COPY --from=intel-optimized-pytorch /workspace/lib/ /workspace/lib/
COPY --from=intel-optimized-pytorch /root/.local/ /root/.local/

ENV DNNL_MAX_CPU_ISA="AVX512_CORE_AMX"

ENV PATH="~/anaconda3/bin:${PATH}"
ENV LD_PRELOAD="/workspace/lib/jemalloc/lib/libjemalloc.so:$LD_PRELOAD"
ENV MALLOC_CONF="oversize_threshold:1,background_thread:true,metadata_thp:auto,dirty_decay_ms:9000000000,muzzy_decay_ms:9000000000"
ENV BASH_ENV=/root/.bash_profile
WORKDIR /workspace/
RUN yum install -y numactl mesa-libGL && \
    yum clean all && \
    echo "source activate pytorch" >> /root/.bash_profile
```
New file (+26 lines):

## Build the container

The <model name> <mode> package has scripts and a Dockerfile that are
used to build a workload container that runs the model. This container
uses the PyTorch/IPEX container as its base, so ensure that you have built
the `pytorch-ipex-spr.tar.gz` container prior to building this model container.

Use `docker images` to verify that you have the base container built. For example:
```
$ docker images | grep pytorch-ipex-spr
model-zoo     pytorch-ipex-spr    fecc7096a11e   40 minutes ago   8.31GB
```

To build the <model name> <mode> container, extract the package and
run the `build.sh` script.
```
# Extract the package
tar -xzf <package name>
cd <package dir>

# Build the container
./build.sh
```

After the build completes, you should have a container called
`<docker image>` that will be used to run the model.
New file (+25 lines):

## Datasets

The [COCO dataset](https://cocodataset.org) is used to run <model name> <mode>.
Download and extract the 2017 validation images and annotations from the
[COCO dataset website](https://cocodataset.org/#download) to a `coco` folder
and unzip the files. After extracting the zip files, your dataset directory
structure should look something like this:
```
coco
├── annotations
│   ├── captions_train2017.json
│   ├── captions_val2017.json
│   ├── instances_train2017.json
│   ├── instances_val2017.json
│   ├── person_keypoints_train2017.json
│   └── person_keypoints_val2017.json
└── val2017
    ├── 000000000139.jpg
    ├── 000000000285.jpg
    ├── 000000000632.jpg
    └── ...
```
The parent of the `annotations` and `val2017` directories (in this example, `coco`)
is the directory that should be used when setting the `DATASET_DIR` environment
variable for <model name> (for example: `export DATASET_DIR=/home/<user>/coco`).
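Before pointing `DATASET_DIR` at a directory, a quick structural check can catch a misplaced extraction. The helper below is a sketch (it is not part of the model package) that verifies only the two pieces inference needs:

```shell
# Hypothetical helper: reports what is missing from the expected
# COCO layout (instances annotations JSON plus val2017 images dir).
check_coco_dir() {
  dir="$1"
  problems=""
  if [ ! -f "${dir}/annotations/instances_val2017.json" ]; then
    problems="${problems} annotations/instances_val2017.json"
  fi
  if [ ! -d "${dir}/val2017" ]; then
    problems="${problems} val2017/"
  fi
  echo "${problems}"
}

result=$(check_coco_dir "${DATASET_DIR:-coco}")
if [ -n "${result}" ]; then
  echo "DATASET_DIR is missing:${result}"
fi
```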
New file (+5 lines):

<!-- 10. Description -->
## Description

This document has instructions for running <model name> <mode> using
Intel-optimized PyTorch.
New file (+32 lines):

## Run the model

Download the pretrained model and set the `PRETRAINED_MODEL` environment variable
to point to the file.
```
curl -O https://download.pytorch.org/models/maskrcnn/e2e_mask_rcnn_R_50_FPN_1x.pth
export PRETRAINED_MODEL=$(pwd)/e2e_mask_rcnn_R_50_FPN_1x.pth
```

After you've downloaded the pretrained model and followed the instructions to
[build the container](#build-the-container) and [prepare the dataset](#datasets),
use the `run.sh` script from the container package to run <model name> <mode>.
Set environment variables to specify the dataset directory, precision to run, and
an output directory. By default, the `run.sh` script will run the
`inference_realtime.sh` quickstart script. To run a different script, specify
the name of the script using the `SCRIPT` environment variable.
```
# Navigate to the container package directory
cd <package dir>

# Set the required environment vars
export PRECISION=<specify the precision to run>
export PRETRAINED_MODEL=<path to the downloaded .pth file>
export DATASET_DIR=<path to the dataset>
export OUTPUT_DIR=<directory where log files will be written>

# Run the container with the inference_realtime.sh quickstart script
./run.sh

# Specify a different quickstart script to run, for example, accuracy.sh
SCRIPT=accuracy.sh ./run.sh
```
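Since `run.sh` expects all four environment variables above to be set, a small pre-flight check can catch a missing one before the container starts. This is a sketch, not part of the package:

```shell
# Hypothetical helper: echoes which of the variables run.sh consumes
# are still unset, or nothing when all four are present.
check_run_env() {
  missing=""
  for var in PRECISION PRETRAINED_MODEL DATASET_DIR OUTPUT_DIR; do
    eval "value=\${$var}"
    if [ -z "${value}" ]; then
      missing="${missing} ${var}"
    fi
  done
  echo "${missing}"
}

missing=$(check_run_env)
if [ -n "${missing}" ]; then
  echo "Set these before calling ./run.sh:${missing}"
fi
```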
New file (+4 lines):

<!--- 80. License -->
## License

Licenses can be found in the model package, in the `licenses` directory.
New file (+8 lines):

<!--- 40. Quick Start Scripts -->
## Quick Start Scripts

| Script name | Description |
|-------------|-------------|
| `inference_realtime.sh` | Runs multi-instance realtime inference using 4 cores per instance for the specified precision (fp32, int8, or bf16). |
| `inference_throughput.sh` | Runs multi-instance batch inference using 24 cores per instance for the specified precision (fp32, int8, or bf16). |
| `accuracy.sh` | Measures the inference accuracy for the specified precision (fp32, int8, or bf16). |
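The realtime and throughput scripts differ in how many cores each instance is pinned to (4 vs. 24), which determines how many instances run in parallel. As an illustration of that partitioning (a sketch; the scripts' actual logic may differ, e.g. in how they detect core counts), the instance count follows from integer division of the available physical cores:

```shell
# Sketch: how many pinned instances fit, given the cores-per-instance
# used by each quickstart script.
instances_for() {
  total_cores="$1"
  cores_per_instance="$2"
  echo $(( total_cores / cores_per_instance ))
}

# For example, on a hypothetical 56-core socket:
echo "realtime:   $(instances_for 56 4) instances"    # 4 cores each
echo "throughput: $(instances_for 56 24) instances"   # 24 cores each
```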
New file (+2 lines):

<!--- 0. Title -->
# PyTorch <model name> <mode>
New file (+16 lines):

## Model Package

The model package includes the Dockerfile and scripts needed to build and
run <model name> <mode> in a container.
```
<package dir>
├── README.md
├── build.sh
├── licenses
│   ├── LICENSE
│   └── third_party
├── model_packages
│   └── <package name>
├── <package dir>.Dockerfile
└── run.sh
```
