The Distributed State Vector Mini App is a quantum circuit simulation tool that leverages distributed computing resources to perform quantum state vector calculations using PennyLane's lightning.gpu backend with NVIDIA cuQuantum. It supports both single-node and multi-node GPU execution through MPI.
- Distributed quantum state vector simulation
- Support for both CPU (lightning.qubit) and GPU (lightning.gpu) backends
- MPI-enabled parallel execution
- Configurable circuit parameters (qubits, layers, runs)
- Optional Jacobian calculation
- Performance metrics collection
- QJIT (Quantum Just-In-Time) compilation support
The distributed state vector mini-app utilizes multiple processing elements (cores, nodes, and GPUs) to benchmark the computational and memory needs of quantum simulations by partitioning and distributing the state vector, i.e., the state of a quantum system. In this motif, coupling is tight and occurs between classical tasks. The state vector is updated by multiplying it with a unitary matrix, and this computation is performed concurrently across processing elements. Depending on the type of operation, an update affects either local qubits or non-local (global) qubits, i.e., qubits whose amplitudes reside on different processing elements. Operations on local qubits can be performed without data exchange, while operations on non-local qubits may require significant data movement; MPI is therefore commonly used to facilitate the communication between tasks. Examples of distributed state vector simulators include Qulacs (CPU/GPU) and cuQuantum's cuStateVec (GPU). Further, several programming frameworks build on cuQuantum to provide distributed state vector simulation, e.g., PennyLane and Qiskit.
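A minimal NumPy sketch (no actual MPI; two array chunks stand in for two ranks) can make the local/global distinction concrete: with the most-significant qubit chosen as the partition index, a gate on a local qubit is applied independently inside each chunk, while a gate on the global qubit amounts to exchanging data between chunks.

```python
import numpy as np

# Random normalized 3-qubit state, split across two "ranks":
# the most-significant qubit (qubit 2) selects the partition,
# so it is the global qubit; qubits 0 and 1 are local.
rng = np.random.default_rng(0)
state = rng.normal(size=8) + 1j * rng.normal(size=8)
state /= np.linalg.norm(state)
chunks = [state[:4].copy(), state[4:].copy()]

X = np.array([[0, 1], [1, 0]], dtype=complex)

def apply_local_x(chunk, qubit, n_local):
    """Apply X to a local qubit of one partition -- no communication."""
    t = chunk.reshape([2] * n_local)       # one axis per local qubit
    axis = n_local - 1 - qubit             # axis 0 = most-significant qubit
    t = np.moveaxis(t, axis, 0)
    t = (X @ t.reshape(2, -1)).reshape([2] * n_local)
    return np.moveaxis(t, 0, axis).reshape(-1)

# Local gate: each partition updates independently.
local_result = np.concatenate([apply_local_x(c, 0, 2) for c in chunks])

# Global gate (X on qubit 2): the partitions themselves are exchanged,
# which is where MPI communication would occur in a real simulator.
global_result = np.concatenate([chunks[1], chunks[0]])

# Verify both against the full, undistributed state vector.
assert np.allclose(local_result, np.kron(np.eye(4), X) @ state)
assert np.allclose(global_result, np.kron(X, np.eye(4)) @ state)
```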
The Distributed State Vector Mini-App implementation uses PennyLane's lightning.gpu to assess the performance of a strongly entangling layered (SEL) circuit with two layers, a pattern frequently used for classification tasks. For gradient calculation, the motif uses adjoint differentiation, a method designed for efficient gradient computation in quantum simulations, with lower memory and computational requirements than other methods like finite differences, which require multiple circuit evaluations.
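As a toy illustration of the adjoint idea (not PennyLane's actual implementation), the sketch below differentiates the expectation value ⟨Z⟩ of a single RY(θ) rotation: one forward pass produces the state, and the gradient follows from one extra matrix-vector product instead of the repeated circuit evaluations that finite differences would need.

```python
import numpy as np

def ry(theta):
    """Single-qubit RY rotation matrix."""
    c, s = np.cos(theta / 2), np.sin(theta / 2)
    return np.array([[c, -s], [s, c]], dtype=complex)

def dry(theta):
    """Derivative of RY with respect to theta."""
    c, s = np.cos(theta / 2), np.sin(theta / 2)
    return 0.5 * np.array([[-s, -c], [c, -s]], dtype=complex)

Z = np.diag([1.0, -1.0]).astype(complex)
ket0 = np.array([1.0, 0.0], dtype=complex)

def adjoint_grad(theta):
    """d<Z>/dtheta via the adjoint trick: 2 Re(<Z psi | dU/dtheta |0>)."""
    psi = ry(theta) @ ket0         # single forward pass
    lam = Z @ psi                  # adjoint state for the observable
    return 2 * np.real(lam.conj() @ (dry(theta) @ ket0))

# <Z> for RY(theta)|0> is cos(theta), so the exact gradient is -sin(theta)
print(adjoint_grad(0.7))  # ≈ -sin(0.7)
```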
parameters = {
"num_runs": 3, # Number of simulation runs
"n_wires": 30, # Number of qubits
"n_layers": 2, # Number of circuit layers
"enable_jacobian": False, # Enable Jacobian calculation
"diff_method": "adjoint", # Differentiation method (adjoint, parameter-shift, None)
"enable_qjit": False, # Enable QJIT compilation
"pennylane_device_config": {
"name": "lightning.gpu", # PennyLane device backend
"mpi": "True", # Enable MPI distribution
"batch_obs": "False" # Enable batch observations
}
}

from mini_apps.quantum_simulation.distributed_state_vector.mini_app import QuantumSimulation
# Configure cluster settings
cluster_config = {
"executor": "pilot",
"config": {
"number_of_nodes": 1,
"gpus_per_node": 4,
# ... other cluster configurations
}
}
# Initialize and run simulation
qs = QuantumSimulation(cluster_config, parameters)
qs.run()

# Using MPI
mpirun -n <num_gpus> python motif.py --n-wires 30 --n-layers 2 --device lightning.gpu --mpi True
# Using SLURM
srun -N <num_nodes> -n <num_gpus> python motif.py --n-wires 30 --n-layers 2 --device lightning.gpu --mpi True

The app generates performance metrics in CSV format, stored in the results directory.
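A short sketch of post-processing those metrics with the standard library; note that the column names used here (`run`, `n_wires`, `runtime_s`) are assumptions for illustration, since the actual CSV schema is produced by the mini-app.

```python
import csv
import io

# Hypothetical metrics file -- the real column names written by the
# mini-app may differ; this is only a parsing pattern.
sample = """run,n_wires,n_layers,runtime_s
0,30,2,12.4
1,30,2,11.9
2,30,2,12.1
"""

rows = list(csv.DictReader(io.StringIO(sample)))
runtimes = [float(r["runtime_s"]) for r in rows]
mean_runtime = sum(runtimes) / len(runtimes)
print(f"mean runtime over {len(rows)} runs: {mean_runtime:.2f} s")
```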
The quantum circuit implements a Strongly Entangling Layer pattern:
- Uses PennyLane's StronglyEntanglingLayers template
- Measures PauliZ expectation values for each qubit
- Supports differentiation through various methods (adjoint, parameter-shift)
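One strongly entangling layer consists of a parameterized rotation on every qubit followed by a ring of CNOT entanglers. The sketch below builds one such layer for three qubits in plain NumPy (illustrative only, not PennyLane's implementation, which also varies the entangler range per layer) and checks that the resulting operator is unitary.

```python
import numpy as np

def rot(phi, theta, omega):
    """General single-qubit rotation RZ(omega) RY(theta) RZ(phi)."""
    rz = lambda a: np.diag([np.exp(-1j * a / 2), np.exp(1j * a / 2)])
    ry = np.array([[np.cos(theta / 2), -np.sin(theta / 2)],
                   [np.sin(theta / 2),  np.cos(theta / 2)]], dtype=complex)
    return rz(omega) @ ry @ rz(phi)

def cnot(control, target, n):
    """CNOT as a 2^n x 2^n permutation matrix on the given wires."""
    dim = 2 ** n
    U = np.zeros((dim, dim), dtype=complex)
    for i in range(dim):
        bits = [(i >> (n - 1 - q)) & 1 for q in range(n)]
        if bits[control]:
            bits[target] ^= 1
        j = sum(b << (n - 1 - q) for q, b in enumerate(bits))
        U[j, i] = 1.0
    return U

def sel_layer(weights):
    """One strongly entangling layer: Rot on each qubit, then a CNOT ring."""
    n = len(weights)
    U = np.eye(1, dtype=complex)
    for w in weights:                  # per-qubit rotations
        U = np.kron(U, rot(*w))
    for q in range(n):                 # ring of entanglers
        U = cnot(q, (q + 1) % n, n) @ U
    return U

rng = np.random.default_rng(1)
U = sel_layer(rng.uniform(0, 2 * np.pi, size=(3, 3)))
print(np.allclose(U.conj().T @ U, np.eye(8)))  # prints True: layer is unitary
```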
- NVIDIA GPUs with cuQuantum support
- MPI-enabled environment for distributed execution
- Sufficient GPU memory for larger qubit counts
- Source:
This repository provides instructions for setting up and running the pennylane-lightning[gpu] backend on NERSC's Perlmutter system. These steps are validated for v0.41.0-rc and are compatible with Cray MPICH, CUDA 12, and Lightning GPU.
- Python 3.10+ (Python 3.11 recommended)
- PennyLane Lightning (v0.41.0)
- Access to NERSC's Perlmutter system
- Cray MPICH and CUDA 12 toolchain
module load python/3.11
module load PrgEnv-gnu cray-mpich cudatoolkit craype-accel-nvidia80
cd $SCRATCH
python -m venv lgpu_env && source lgpu_env/bin/activate
git clone https://github.com/PennyLaneAI/pennylane-lightning.git
cd pennylane-lightning
git checkout latest_release  # tested with v0.41.0
python -m pip install -r requirements-dev.txt && CC=$(which cc) CXX=$(which CC) python -m pip install . --verbose
PL_BACKEND="lightning_gpu" python scripts/configure_pyproject_toml.py
CMAKE_ARGS="-DENABLE_MPI=ON" CC=$(which mpicc) CXX=$(which mpicxx) python -m pip install . --verbose
MPICC="cc -shared" pip install --force-reinstall --no-cache-dir --no-binary=mpi4py mpi4py
export LD_LIBRARY_PATH=$CRAY_LD_LIBRARY_PATH:$LD_LIBRARY_PATH
- Allocate Interactive GPU Job (4 GPUs)
salloc -N 1 -c 32 --qos interactive --time 0:30:00 \
    --constraint gpu --ntasks-per-node=4 \
    --gpus-per-task=1 --gpu-bind=none --account=XYZ
- Run Your Script with MPI
srun -n 4 python myscript.py
- QJIT has JAX as a dependency.
- Test MPI Run of Motif
salloc --account xxx --nodes 2 --qos interactive --time 04:00:00 --constraint gpu --gpus 8
srun -N 2 -n 8 python motif.py --num-runs 1 --n-wires 31 --n-layers 2 --enable-jacobian False --diff-method adjoint --device lightning.gpu --mpi True
- Other potentially useful commands to try:
- clean:
make clean
- Alternative compiler commands:
- Adjust CMakeLists.txt to use the Cray compilers:
set(CMAKE_C_COMPILER "/opt/cray/pe/craype/2.7.30/bin/cc")
set(CMAKE_CXX_COMPILER "/opt/cray/pe/craype/2.7.30/bin/CC")
- Compile:
PL_BACKEND="lightning_gpu" python scripts/configure_pyproject_toml.py
CMAKE_ARGS="-DENABLE_MPI=ON" python -m pip install -e . --config-settings editable_mode=compat -vv
- How to pass a compiler to CMake:
cmake -DCMAKE_C_COMPILER=/path/to/clang -DCMAKE_CXX_COMPILER=/path/to/clang++ <source-directory>