51 changes: 51 additions & 0 deletions Makefile
@@ -0,0 +1,51 @@
SRC_DIR := $(shell pwd)

IMAGE_NAME := phos-base-113
DOCKERFILE := $(SRC_DIR)/dockerfiles/build_113.Dockerfile

BUILD_ARGS ?= -i -3 -u -p=false
CLIENT_RUN_CMD ?= python

.PHONY: build clean exec

build-image:
docker build \
--build-arg proxy=http://ipads:[email protected]:11235 \
--progress=plain -f $(DOCKERFILE) -t $(IMAGE_NAME) .

build:
docker run --rm --gpus all \
-v $(SRC_DIR):/root \
--privileged --network=host --ipc=host \
$(IMAGE_NAME) \
bash -c "cd /root/scripts/build_scripts/ && bash build.sh $(BUILD_ARGS)"

server-run:
docker run --rm --gpus all -it \
-v $(SRC_DIR):/root \
--privileged --network=host --ipc=host \
$(IMAGE_NAME) \
bash -c "CUDA_VISIBLE_DEVICES=2 pos_cli --start --target daemon"

client-run:
docker run --rm --gpus all \
-v $(SRC_DIR):/root \
--privileged --network=host --ipc=host \
$(IMAGE_NAME) \
bash -c "cd /root && export LD_LIBRARY_PATH=/root/lib:$$LD_LIBRARY_PATH && export LIBRARY_PATH=/root/lib:$$LIBRARY_PATH && LD_PRELOAD=/root/lib/libxpuclient.so RUST_LOG=error $(CLIENT_RUN_CMD)"

clean:
docker run --rm --gpus all \
-v $(SRC_DIR):/root \
--privileged --network=host --ipc=host \
$(IMAGE_NAME) \
bash -c "cd /root/scripts/build_scripts/ && bash build.sh -c -3"

exec:
docker run --rm --gpus all -it \
-v $(SRC_DIR):/root \
--privileged --network=host --ipc=host \
$(IMAGE_NAME) \
bash
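The `client-run` target above compresses several environment settings into one `bash -c` string. A minimal sketch of what it sets up inside the container (paths assume the `-v $(SRC_DIR):/root` mount used by every target; illustrative only):

```shell
# Sketch of the environment `client-run` assembles inside the container.
# /root is the mount point chosen by `-v $(SRC_DIR):/root`; illustrative only.
PHOS_ROOT=/root
export LD_LIBRARY_PATH="$PHOS_ROOT/lib:${LD_LIBRARY_PATH:-}"
export LIBRARY_PATH="$PHOS_ROOT/lib:${LIBRARY_PATH:-}"
# LD_PRELOAD injects the PhOS client library into the launched process,
# so its GPU API calls are intercepted before reaching the vendor runtime.
CLIENT_ENV="LD_PRELOAD=$PHOS_ROOT/lib/libxpuclient.so RUST_LOG=error"
echo "$CLIENT_ENV python"
```

Overriding `CLIENT_RUN_CMD` (e.g. `make client-run CLIENT_RUN_CMD="python3 train.py"`) swaps the final command while keeping the same interposition environment.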


125 changes: 11 additions & 114 deletions README.md
@@ -63,121 +63,10 @@

<br />

## I. Build and Install PhOS
## I. Quick build

### 💡 Option 1: Build and Install From Source

1. **[Clone Repository]**
First of all, clone this repository **recursively**:

```bash
git clone --recursive https://github.com/SJTU-IPADS/PhoenixOS.git
```

2. **[Start Container]**
PhOS can be built and installed on official vendor images.

> NOTE: PhOS requires libc6 >= 2.29 to compile CRIU from source.

For example, to run PhOS with CUDA 11.3,
one can build on official CUDA images
(e.g., [`nvidia/cuda:11.3.1-cudnn8-devel-ubuntu20.04`](https://hub.docker.com/layers/nvidia/cuda/11.3.1-cudnn8-devel-ubuntu20.04/images/sha256-459c130c94363099b02706b9b25d9fe5822ea233203ce9fbf8dfd276a55e7e95)):


```bash
# enter repository
cd PhoenixOS/scripts/docker

# start and enter container with id 1
bash run_torch_cu113.sh -s 1

# enter / close container (no need to execute here, just listed)
bash run_torch_cu113.sh -e 1 # enter container
bash run_torch_cu113.sh -c 1 # close container
```

Note that it's important to run the docker container with root privilege, as CRIU needs permission to C/R kernel-space memory pages.

3. **[Download Necessary Assets]**
PhOS relies on some assets for building and testing;
please download them by running the following commands:

```bash
# inside container

# download assets
cd /root/scripts/build_scripts
bash download_assets.sh
```

4. **[Build]**
Building PhOS is simple!

PhOS provides a convenient build system, which covers compiling, linking, and installing all PhOS components:

<table>
<tr>
<th width="25%">Component</th>
<th width="75%">Description</th>
</tr>
<tr>
<td><code>phos-autogen</code></td>
<td><b>Autogen Engine</b> for generating most of the Parser and Worker code for a specific hardware platform, based on lightweight notation.</td>
</tr>
<tr>
<td><code>phosd</code></td>
<td><b>PhOS Daemon</b>, which runs continuously in the background, taking control of all GPU devices on the node.</td>
</tr>
<tr>
<td><code>libphos.so</code></td>
<td><b>PhOS Hijacker</b>, which hijacks all GPU API calls on the client side and forwards them to the PhOS Daemon.</td>
</tr>
<tr>
<td><code>libpccl.so</code></td>
<td><b>PhOS Checkpoint Communication Library</b> (PCCL), which provides highly optimized device-to-device state migration. Note that this library is not included in the current release.</td>
</tr>
<tr>
<td><code>unit-testing</code></td>
<td><b>Unit Tests</b> for PhOS, based on GoogleTest.</td>
</tr>
<tr>
<td><code>phos-cli</code></td>
<td><b>Command Line Interface</b> (CLI) for interacting with PhOS.</td>
</tr>
<tr>
<td><code>phos-remoting</code></td>
<td><b>Remoting Framework</b>, which provides highly optimized GPU API remoting. See more details at <a href="https://github.com/SJTU-IPADS/PhoenixOS-Remoting">SJTU-IPADS/PhoenixOS-Remoting</a>.</td>
</tr>
</table>

To build and install all of the above components and other dependencies, simply run the build script inside the container:

```bash
# inside container
cd /root/scripts/build_scripts

# clear old build cache
# -c: clear previous build
# -3: the clean process involves all third-parties
bash build.sh -c -3

# start building
# -3: the build process involves all third-parties
# -i: install after successful building
# -u: build PhOS with unit tests enabled
bash build.sh -i -3 -u
```

For customizing build options, please refer to and modify the available options under `scripts/build_scripts/build_config.yaml`.

If you encounter any build issues, you can find the build logs under `build_log`. Please open a new issue if you get stuck.

### 💡 Option 2: Install From Pre-built Binaries

Will soon be updated, stay tuned :)


<br />
Currently, we don't provide pre-built binaries.
Please check [Build from Source](docs/docs/getting_started/build_from_source.md) for how to build and run from source!

## II. Usage

@@ -194,9 +83,17 @@
Once PhOS is successfully installed, you can now run your program with PhOS support:
1. Start the PhOS daemon (`phosd`), which takes over all GPU resources on the node:

```bash
## If built in an interactive container (or on the host)
pos_cli --start --target daemon
```

or

```bash
## If built with our container
make server-run
```
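Whichever way the daemon is started, a quick sanity check is to probe for the `phosd` process. This is a plain POSIX process check, not a PhOS CLI feature:

```shell
# Probe for the daemon binary `phosd` (named in the component table);
# a generic process check, not part of the PhOS CLI.
STATUS=$(pgrep -x phosd > /dev/null 2>&1 && echo up || echo down)
echo "phosd is $STATUS"
```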

2. To run your program with PhOS support, you need to put a `yaml` configuration file under the directory your program will regard as `$PWD`.
This file contains all the information PhOS needs to hijack your program. An example file looks like:

53 changes: 53 additions & 0 deletions dockerfiles/build_113.Dockerfile
@@ -0,0 +1,53 @@
FROM phoenixos/pytorch:11.3-ubuntu20.04 AS base

ARG DEBIAN_FRONTEND=noninteractive
ARG proxy

# Combine update and install in one layer so a cached `update` can't go stale
RUN apt-get update && apt-get install -y libibverbs-dev libboost-all-dev net-tools \
git-lfs pkg-config python3-pip libelf-dev libssl-dev libgl1-mesa-dev \
libvdpau-dev iputils-ping wget gdb vim nsight-compute-2023.1.1 curl

RUN apt-get update && \
apt-get install -y software-properties-common && \
add-apt-repository -y ppa:ubuntu-toolchain-r/test && \
apt-get update

RUN apt-get install -y g++-9 g++-13

RUN pip3 install meson -i https://mirrors.aliyun.com/pypi/simple/

RUN ln -s /opt/nvidia/nsight-compute/2023.1.1/target/linux-desktop-glibc_2_11_3-x64/ncu /usr/local/bin/ncu

RUN pip config set global.index-url https://mirrors.aliyun.com/pypi/simple/

# Copy build scripts from the project root
COPY scripts/ /scripts
COPY third_party/go1.23.2.linux-amd64.tar.gz /third_party/go1.23.2.linux-amd64.tar.gz


ENV RUSTUP_UPDATE_ROOT=https://mirrors.tuna.tsinghua.edu.cn/rustup/rustup
ENV RUSTUP_DIST_SERVER=https://mirrors.tuna.tsinghua.edu.cn/rustup
RUN mkdir -p /opt/rust

ENV CARGO_HOME=/opt/rust/.cargo
ENV RUSTUP_HOME=/opt/rust/.rustup

RUN curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh -s -- -y --no-modify-path
ENV PATH="/opt/rust/.cargo/bin:${PATH}"
RUN . /opt/rust/.cargo/env

RUN rustup install nightly
RUN rustup default nightly


# Make scripts executable and pre-build third-party dependencies
RUN chmod +x /scripts/build_scripts/*.sh
RUN cd /scripts/build_scripts && bash build.sh -p -b=false -3=true

ENV PATH="/root/bin:${PATH}"
ENV LD_LIBRARY_PATH="/root/lib:${LD_LIBRARY_PATH}"

WORKDIR /root

129 changes: 129 additions & 0 deletions docs/docs/getting_started/build_from_source.md
@@ -0,0 +1,129 @@
# Quick start

This guide will help you build and run PhOS from source.
PhOS provides two build options, and you can choose **either one**.

## Overview of the build

PhOS provides a convenient build system, which covers compiling, linking and installing all PhOS components:

<table>
<tr>
<th width="25%">Component</th>
<th width="75%">Description</th>
</tr>
<tr>
<td><code>phos-autogen</code></td>
<td><b>Autogen Engine</b> for generating most of the Parser and Worker code for a specific hardware platform, based on lightweight notation.</td>
</tr>
<tr>
<td><code>phosd</code></td>
<td><b>PhOS Daemon</b>, which runs continuously in the background, taking control of all GPU devices on the node.</td>
</tr>
<tr>
<td><code>libphos.so</code></td>
<td><b>PhOS Hijacker</b>, which hijacks all GPU API calls on the client side and forwards them to the PhOS Daemon.</td>
</tr>
<tr>
<td><code>libpccl.so</code></td>
<td><b>PhOS Checkpoint Communication Library</b> (PCCL), which provides highly optimized device-to-device state migration. Note that this library is not included in the current release.</td>
</tr>
<tr>
<td><code>unit-testing</code></td>
<td><b>Unit Tests</b> for PhOS, based on GoogleTest.</td>
</tr>
<tr>
<td><code>phos-cli</code></td>
<td><b>Command Line Interface</b> (CLI) for interacting with PhOS.</td>
</tr>
<tr>
<td><code>phos-remoting</code></td>
<td><b>Remoting Framework</b>, which provides highly optimized GPU API remoting. See more details at <a href="https://github.com/SJTU-IPADS/PhoenixOS-Remoting">SJTU-IPADS/PhoenixOS-Remoting</a>.</td>
</tr>
</table>


1. **[Clone Repository]**
First of all, clone this repository **recursively**:

```bash
git clone --recursive https://github.com/SJTU-IPADS/PhoenixOS.git
```

2. **[Download Necessary (Third-Party) Assets]**
PhOS relies on some assets for building and testing;
please download them by running the following commands:

```bash
# download assets
cd path/to/phos/scripts/build_scripts
bash download_assets.sh
```

3. **(Option 1) [Build with our image]**
First, build our pre-released image (if `phos-base-113` is not found on the hub).
This option currently only works for CUDA 11.3:

```bash
make build-image
```

Second, use the image to build PhOS:

```bash
make build BUILD_ARGS="-i -3 -p=false"
```

Use the following to check the available build options:

```bash
make build BUILD_ARGS="-help"
```

3. **(Option 2) [Start an interactive container]**
PhOS can be built and installed on official vendor images (or on the host)
if you don't want to use our pre-built image.

> NOTE: PhOS has some minimal requirements, e.g., it requires libc6 >= 2.29 to compile CRIU from source. Thus, we strongly recommend using our base image as an interactive build environment.

For example, to run PhOS with CUDA 11.3,
you can build on official CUDA images
(e.g., [`nvidia/cuda:11.3.1-cudnn8-devel-ubuntu20.04`](https://hub.docker.com/layers/nvidia/cuda/11.3.1-cudnn8-devel-ubuntu20.04/images/sha256-459c130c94363099b02706b9b25d9fe5822ea233203ce9fbf8dfd276a55e7e95)):


```bash
# enter repository
cd PhoenixOS/scripts/docker

# start and enter container with id 1
bash run_torch_cu113.sh -s 1

# enter / close container (no need to execute here, just listed)
bash run_torch_cu113.sh -e 1 # enter container
bash run_torch_cu113.sh -c 1 # close container
```

> Note that it's important to run the docker container with root privilege, as CRIU needs permission to C/R kernel-space memory pages.

To build and install all of the above components and other dependencies, simply run the build script inside the container:

```bash
# inside container
cd /root/scripts/build_scripts

# clear old build cache
# -c: clear previous build
# -3: the clean process involves all third-parties
bash build.sh -c -3

# start building
# -3: the build process involves all third-parties
# -i: install after successful building
# -u: build PhOS with unit tests enabled
bash build.sh -i -3 -u
```

4. **Build configuration and troubleshooting**
For customizing build options, please refer to and modify the available options under `scripts/build_scripts/build_config.yaml`.

If you encounter any build issues, you can find the build logs under `build_log`; they are typically self-explanatory. Please open a new issue if you get stuck.
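A sketch of one way to surface errors from those logs (the `build_log` layout is assumed rather than documented; adjust the pattern to your failure mode):

```shell
# Scan the most recently modified file under build_log/ for error lines.
# Assumes build_log/ contains plain-text logs; purely illustrative.
if [ -d build_log ]; then
    latest=$(ls -t build_log | head -n 1)
    MSG=$(grep -inE 'error|fatal' "build_log/$latest" | head -n 20)
else
    MSG="no build_log directory found; run build.sh first"
fi
echo "$MSG"
```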