# Plexus

Plexus is a 3D parallel framework for large-scale distributed GNN training.

## Dependencies

To use Plexus, you'll need the following dependencies:

* **Python 3.11.7:** It's recommended to use a virtual environment to manage your Python dependencies. You can create one with `venv`:

  ```bash
  python -m venv <your_env_name>
  source <your_env_name>/bin/activate
  ```

* **CUDA 12.4:** If you'll be running Plexus on GPUs, you'll need CUDA 12.4. On systems that use environment modules to manage software, you can load it with the following command (this is included in the run script for Perlmutter under the examples directory):

  ```bash
  module load cudatoolkit/12.4
  ```

* **NCCL:** The NVIDIA Collective Communications Library (NCCL) is required for multi-GPU communication. On systems that use environment modules (like Perlmutter), you can load it with:

  ```bash
  module load nccl
  ```

* **Python Dependencies:** Once your virtual environment is set up, install the required Python packages using `pip` and the `requirements.txt` file provided in the repository:

  ```bash
  pip install -r requirements.txt
  ```

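Taken together, the steps above can be combined into a single setup script. This is a sketch for a Perlmutter-like system that uses environment modules; the environment name `plexus-env` is a placeholder, and module names may differ on your system:

```bash
# Load the GPU toolchain (module names as on Perlmutter; adjust for your system)
module load cudatoolkit/12.4
module load nccl

# Create and activate an isolated Python environment
python -m venv plexus-env
source plexus-env/bin/activate

# Install Plexus's Python dependencies from the repository root
pip install -r requirements.txt
```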
## Directory Structure

* **benchmarking**: Contains a serial implementation using PyTorch Geometric (PyG) for validation and testing. Additionally, it includes utilities for benchmarking Sparse Matrix-Matrix Multiplication (SpMM) operations, a key component in GNN computations.
* **examples**: Offers a practical demonstration of how to leverage Plexus to parallelize a GNN model. This directory includes example scripts for running the parallelized training, as well as utilities for parsing the resulting performance data.
* **performance**: Houses files dedicated to modeling the performance characteristics of parallel GNN training. This includes models for communication overhead, computation costs (specifically SpMM), and memory utilization.
* **plexus**: Contains the core logic of the Plexus framework. This includes the parallel implementation of a Graph Convolutional Network (GCN) layer, along with utility functions for dataset preprocessing, efficient data loading, and other essential components for distributed GNN training.