Pin versions in AMD Docker #4

Merged · 4 commits · Oct 8, 2024
8 changes: 4 additions & 4 deletions closed/AMD/code/llama2-70b-99.9/README.md
@@ -48,17 +48,17 @@ KV cache scales for the quantized model weights are used and were downloaded fro
To generate results for the full submission, run the command below in an inference container. Logs can be found in `/lab-hist/mlperf-results/$datetime1/$datetime2`.
``` bash
cd /lab-mlperf-inference/code/llama2-70b-99.9/test_VllmFp8
-./run_scenarios.sh
+bash ./run_scenarios.sh
```

To generate results for the Offline scenario only, run the command below in an inference container. Logs can be found in `/lab-hist/mlperf-results/$datetime1/$datetime2/Offline`.
``` bash
cd /lab-mlperf-inference/code/llama2-70b-99.9/test_VllmFp8
-./run_tests_Offline.sh
+bash ./run_tests_Offline.sh
```

To generate results for the Server scenario only, run the command below in an inference container. Logs can be found in `/lab-hist/mlperf-results/$datetime1/$datetime2/Server`.
``` bash
cd /lab-mlperf-inference/code/llama2-70b-99.9/test_VllmFp8
-./run_tests_Server.sh
+bash ./run_tests_Server.sh
```
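A likely reason for the `./script.sh` to `bash script.sh` change is that invoking a script through `bash` does not require the execute bit, which is easily lost when files are copied into a container. Illustrative only, using one of the scripts above:

``` bash
# Direct execution needs the execute permission on the file:
chmod +x run_scenarios.sh && ./run_scenarios.sh

# Running through bash works regardless of the permission bits:
bash ./run_scenarios.sh
```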
10 changes: 2 additions & 8 deletions closed/AMD/code/llama2-70b-99.9/test_VllmFp8/run_scenarios.sh
@@ -11,18 +11,12 @@ export RESULTS_DIR=${LAB_CLOG}/${TS_START_BENCHMARKS}
echo "TS_START_BENCHMARKS=${TS_START_BENCHMARKS}"

echo "Running Offline"
-./run_tests_Offline.sh
+bash run_tests_Offline.sh
echo "Done Offline"

echo "Running Server"
-./run_tests_Server.sh
+bash run_tests_Server.sh
echo "Done Server"

echo "Done Benchmarks"
echo "TS_START_BENCHMARKS=${TS_START_BENCHMARKS}"

echo "Packaging and checking submission results"
python ../submission/package_submission.py \
--base-package-dir ${PACKAGE_DRAFT_DIR} \
--system-name ${SYSTEM_NAME} \
--input-dir ${RESULTS_DIR}
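Since the packaging step above was removed from `run_scenarios.sh`, packaging presumably has to be invoked separately after the benchmarks finish. A minimal sketch, assuming `package_submission.py` and the `PACKAGE_DRAFT_DIR`, `SYSTEM_NAME`, and `RESULTS_DIR` variables keep their original meanings:

``` bash
# Run from the test_VllmFp8 directory after the benchmarks complete.
# RESULTS_DIR is the per-run log directory, ${LAB_CLOG}/${TS_START_BENCHMARKS}.
python ../submission/package_submission.py \
  --base-package-dir ${PACKAGE_DRAFT_DIR} \
  --system-name ${SYSTEM_NAME} \
  --input-dir ${RESULTS_DIR}
```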
closed/AMD/code/llama2-70b-99.9/test_VllmFp8/run_tests_Offline.sh
@@ -12,11 +12,11 @@ echo "TS_START_BENCHMARKS=${TS_START_BENCHMARKS}"
for i in $(seq 1 ${NUM_ITERS})
do
echo "Running $SCENARIO - Performance run $i/$NUM_ITERS"
-ITER=$i ./test_VllmFp8_Offline_perf.sh
+ITER=$i bash test_VllmFp8_Offline_perf.sh
done
echo "Running $SCENARIO - Accuracy"
-./test_VllmFp8_Offline_acc.sh
+bash test_VllmFp8_Offline_acc.sh
echo "Running $SCENARIO - Audit"
-./test_VllmFp8_Offline_audit.sh
+bash test_VllmFp8_Offline_audit.sh
echo "Done"
echo "TS_START_BENCHMARKS=${TS_START_BENCHMARKS}"
echo "TS_START_BENCHMARKS=${TS_START_BENCHMARKS}"
closed/AMD/code/llama2-70b-99.9/test_VllmFp8/run_tests_Server.sh
@@ -11,11 +11,11 @@ echo "TS_START_BENCHMARKS=${TS_START_BENCHMARKS}"
for i in $(seq 1 ${NUM_ITERS})
do
echo "Running $SCENARIO - Performance run $i/$NUM_ITERS"
-ITER=$i ./test_VllmFp8_SyncServer_perf.sh
+ITER=$i bash test_VllmFp8_SyncServer_perf.sh
done
echo "Running $SCENARIO - Accuracy"
-./test_VllmFp8_SyncServer_acc.sh
+bash test_VllmFp8_SyncServer_acc.sh
echo "Running $SCENARIO - Audit"
-./test_VllmFp8_SyncServer_audit.sh
+bash test_VllmFp8_SyncServer_audit.sh
echo "Done SyncServer"
echo "TS_START_BENCHMARKS=${TS_START_BENCHMARKS}"
echo "TS_START_BENCHMARKS=${TS_START_BENCHMARKS}"
8 changes: 4 additions & 4 deletions closed/AMD/code/llama2-70b-99/README.md
@@ -48,17 +48,17 @@ KV cache scales for the quantized model weights are used and were downloaded fro
To generate results for the full submission, run the command below in an inference container. Logs can be found in `/lab-hist/mlperf-results/$datetime1/$datetime2`.
``` bash
cd /lab-mlperf-inference/code/llama2-70b-99.9/test_VllmFp8
-./run_scenarios.sh
+bash ./run_scenarios.sh
```

To generate results for the Offline scenario only, run the command below in an inference container. Logs can be found in `/lab-hist/mlperf-results/$datetime1/$datetime2/Offline`.
``` bash
cd /lab-mlperf-inference/code/llama2-70b-99.9/test_VllmFp8
-./run_tests_Offline.sh
+bash ./run_tests_Offline.sh
```

To generate results for the Server scenario only, run the command below in an inference container. Logs can be found in `/lab-hist/mlperf-results/$datetime1/$datetime2/Server`.
``` bash
cd /lab-mlperf-inference/code/llama2-70b-99.9/test_VllmFp8
-./run_tests_Server.sh
+bash ./run_tests_Server.sh
```
10 changes: 2 additions & 8 deletions closed/AMD/code/llama2-70b-99/test_VllmFp8/run_scenarios.sh
@@ -11,18 +11,12 @@ export RESULTS_DIR=${LAB_CLOG}/${TS_START_BENCHMARKS}
echo "TS_START_BENCHMARKS=${TS_START_BENCHMARKS}"

echo "Running Offline"
-./run_tests_Offline.sh
+bash run_tests_Offline.sh
echo "Done Offline"

echo "Running Server"
-./run_tests_Server.sh
+bash run_tests_Server.sh
echo "Done Server"

echo "Done Benchmarks"
echo "TS_START_BENCHMARKS=${TS_START_BENCHMARKS}"

echo "Packaging and checking submission results"
python ../submission/package_submission.py \
--base-package-dir ${PACKAGE_DRAFT_DIR} \
--system-name ${SYSTEM_NAME} \
--input-dir ${RESULTS_DIR}
closed/AMD/code/llama2-70b-99/test_VllmFp8/run_tests_Offline.sh
@@ -12,11 +12,11 @@ echo "TS_START_BENCHMARKS=${TS_START_BENCHMARKS}"
for i in $(seq 1 ${NUM_ITERS})
do
echo "Running $SCENARIO - Performance run $i/$NUM_ITERS"
-ITER=$i ./test_VllmFp8_Offline_perf.sh
+ITER=$i bash test_VllmFp8_Offline_perf.sh
done
echo "Running $SCENARIO - Accuracy"
-./test_VllmFp8_Offline_acc.sh
+bash test_VllmFp8_Offline_acc.sh
echo "Running $SCENARIO - Audit"
-./test_VllmFp8_Offline_audit.sh
+bash test_VllmFp8_Offline_audit.sh
echo "Done"
echo "TS_START_BENCHMARKS=${TS_START_BENCHMARKS}"
echo "TS_START_BENCHMARKS=${TS_START_BENCHMARKS}"
closed/AMD/code/llama2-70b-99/test_VllmFp8/run_tests_Server.sh
@@ -11,11 +11,11 @@ echo "TS_START_BENCHMARKS=${TS_START_BENCHMARKS}"
for i in $(seq 1 ${NUM_ITERS})
do
echo "Running $SCENARIO - Performance run $i/$NUM_ITERS"
-ITER=$i ./test_VllmFp8_SyncServer_perf.sh
+ITER=$i bash test_VllmFp8_SyncServer_perf.sh
done
echo "Running $SCENARIO - Accuracy"
-./test_VllmFp8_SyncServer_acc.sh
+bash test_VllmFp8_SyncServer_acc.sh
echo "Running $SCENARIO - Audit"
-./test_VllmFp8_SyncServer_audit.sh
+bash test_VllmFp8_SyncServer_audit.sh
echo "Done SyncServer"
echo "TS_START_BENCHMARKS=${TS_START_BENCHMARKS}"
echo "TS_START_BENCHMARKS=${TS_START_BENCHMARKS}"
12 changes: 6 additions & 6 deletions closed/AMD/docker/Dockerfile.llama2
@@ -7,13 +7,13 @@ RUN apt update \
&& rm -rf /var/lib/apt/lists/*

RUN pip install \
-    absl-py \
-    datasets \
-    evaluate \
-    nltk \
+    absl-py==2.1.0 \
+    datasets==2.20.0 \
+    evaluate==0.4.2 \
+    nltk==3.8.1 \
     numpy==1.26.4 \
-    py-libnuma \
-    rouge_score
+    py-libnuma==1.2 \
+    rouge_score==0.1.2

WORKDIR /app
RUN git clone --recurse-submodules https://github.com/mlcommons/inference.git --branch v4.1 --depth 1 mlperf_inference \
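To sanity-check that the pins took effect, one option is to list the installed versions inside the built image. A hypothetical check (the image tag is an assumption based on `build_llama2.sh`):

``` bash
# Each reported version should match the pins in Dockerfile.llama2
# (numpy stays at 1.26.4, which was already pinned).
docker run --rm mlperf/llama_inference:latest \
  pip list | grep -E 'absl-py|datasets|evaluate|nltk|numpy|py-libnuma|rouge'
```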
7 changes: 3 additions & 4 deletions closed/AMD/docker/build_llama2.sh
100644 → 100755
@@ -3,6 +3,7 @@ BASE_IMAGE=rocm/pytorch:rocm6.1.2_ubuntu20.04_py3.9_pytorch_staging
VLLM_REV=799388d722e22ecb14d1011faaba54c4882cc8f5 # MLPerf-4.1
HIPBLASLT_BRANCH=8b71e7a8d26ba95774fdc372883ee0be57af3d28
FA_BRANCH=23a2b1c2f21de2289db83de7d42e125586368e66 # ck_tile - FA 2.5.9
+TRITON_BRANCH=e4a0d93ff1a367c7d4eeebbcd7079ed267e6b06f
RELEASE_TAG=${RELEASE_TAG:-latest}

git clone https://github.com/ROCm/vllm
@@ -11,9 +12,7 @@ git checkout main
git pull
git checkout ${VLLM_REV}
git cherry-pick b9013696b23dde372cccecdbaf69f0c852008844 # optimizations for process output step, PR #104

-docker build --build-arg BASE_IMAGE=${BASE_IMAGE} --build-arg HIPBLASLT_BRANCH=${HIPBLASLT_BRANCH} --build-arg FA_BRANCH=${FA_BRANCH} -f Dockerfile.rocm -t vllm_dev:${VLLM_REV} .
-
popd

-docker build --build-arg BASE_IMAGE=vllm_dev:${VLLM_REV} -f Dockerfile.llama2 -t mlperf/llama_inference:${RELEASE_TAG} ..
+docker build --build-arg BASE_IMAGE=${BASE_IMAGE} --build-arg HIPBLASLT_BRANCH=${HIPBLASLT_BRANCH} --build-arg FA_BRANCH=${FA_BRANCH} --build-arg TRITON_BRANCH=${TRITON_BRANCH} -f vllm/Dockerfile.rocm -t vllm_dev:${VLLM_REV} vllm \
+&& docker build --build-arg BASE_IMAGE=vllm_dev:${VLLM_REV} -f Dockerfile.llama2 -t mlperf/llama_inference:${RELEASE_TAG} ..