Skip to content

Commit

Permalink
Add native PauliRot implementation in LightningQubit (#834)
Browse files Browse the repository at this point in the history
### Before submitting

Please complete the following checklist when submitting a PR:

- [x] All new features must include a unit test.
If you've fixed a bug or added code that should be tested, add a test to
the
      [`tests`](../tests) directory!

- [x] All new functions and code must be clearly commented and
documented.
If you do make documentation changes, make sure that the docs build and
      render correctly by running `make docs`.

- [x] Ensure that the test suite passes, by running `make test`.

- [x] Add a new entry to the `.github/CHANGELOG.md` file, summarizing
the
      change, and including a link back to the PR.

- [x] Ensure that code is properly formatted by running `make format`. 

When all the above are checked, delete everything above the dashed
line and fill in the pull request template.


------------------------------------------------------------------------------------------------------------

**Context:**
Pauli rotations come up in many places, and importantly in the time
evolution of qchem Hamiltonians. It is therefore worth considering ways
to accelerate their execution.

**Description of the Change:**
Implement `applyPauliRot` in the LM gate kernel. Invoke `applyPauliRot`
directly from the SV class and add bindings to the Python layer. Add
custom decomposition rule for `PauliRot` operations.

**Benefits:**
Faster Pauli rotations. I performed a benchmark on random
`PauliRotation`s (runtime > 1.0 sec and at least 5 of them) through the
Python layer. The data remains noisy with 5 samples because the
performance varies depending on the specific "XYZ" sequence (which
translates into more or less predictable memory access patterns).
Overall, we see an advantage for 3+ qubits and up.


![speedup_vs_ntargets](https://github.com/user-attachments/assets/35549fe3-5668-49f8-b94b-3a6567be91cf)

The speed-ups are also good when parallelizing with OpenMP threads (both
cases using 16 threads)


![speedup_vs_ntargets_omp16](https://github.com/user-attachments/assets/4b543bb1-0da4-47f7-9ecb-8932760f069c)


After rewriting the PauliRot gates as one-qubit gates of sort, writing
AVX kernels for PauliRot may be worth it.

For an entire workflow like time-evolving a QChem Hamiltonian, the gains
are slower coming because `PauliRot` is not necessarily the main
bottleneck at all scales, but it is once the molecules are big enough.
This is shown in the speed-up bar graph below.


![speedup_vs_mol](https://github.com/user-attachments/assets/db6ec7aa-de15-4f2a-947b-5564ea4824f2)

**Possible Drawbacks:**

**Related GitHub Issues:**
[sc-69801]
[sc-71642]

---------

Co-authored-by: ringo-but-quantum <[email protected]>
Co-authored-by: Luis Alfredo Nuñez Meneses <[email protected]>
  • Loading branch information
3 people authored Aug 19, 2024
1 parent 6b42030 commit 83ac9d8
Show file tree
Hide file tree
Showing 17 changed files with 560 additions and 253 deletions.
3 changes: 3 additions & 0 deletions .github/CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -27,6 +27,9 @@

### Improvements

* LightningQubit gains native support for the `PauliRot` gate.
[(#834)](https://github.com/PennyLaneAI/pennylane-lightning/pull/834)

* The `setBasisState` and `setStateVector` methods of `StateVectorLQubit` and `StateVectorKokkos` are overloaded to support PennyLane-like parameters.
[(#843)](https://github.com/PennyLaneAI/pennylane-lightning/pull/843)

Expand Down
8 changes: 8 additions & 0 deletions .github/workflows/tests_lqcpu_python.yml
Original file line number Diff line number Diff line change
Expand Up @@ -45,6 +45,14 @@ jobs:
runs-on: ${{ needs.determine_runner.outputs.runner_group }}

steps:
- name: Make disk space
run: |
sudo apt-get autoremove -y && sudo apt-get autoclean -y && sudo apt-get clean -y
for DIR in /usr/share/dotnet /usr/local/share/powershell /usr/share/swift; do
sudo du -sh $DIR || echo $DIR not found
sudo rm -rf $DIR
done
- uses: actions/setup-python@v5
name: Install Python
with:
Expand Down
2 changes: 1 addition & 1 deletion pennylane_lightning/core/_version.py
Original file line number Diff line number Diff line change
Expand Up @@ -16,4 +16,4 @@
Version number (major.minor.patch[-label])
"""

__version__ = "0.38.0-dev37"
__version__ = "0.38.0-dev38"
Original file line number Diff line number Diff line change
Expand Up @@ -327,9 +327,10 @@ class StateVectorLQubit : public StateVectorBase<PrecisionT, Derived> {
const std::vector<std::size_t> &wires,
bool inverse = false,
const std::vector<PrecisionT> &params = {}) {
auto *arr = this->getData();
auto *arr = BaseType::getData();
DynamicDispatcher<PrecisionT>::getInstance().applyOperation(
kernel, arr, this->getNumQubits(), opName, wires, inverse, params);
kernel, arr, BaseType::getNumQubits(), opName, wires, inverse,
params);
}

/**
Expand All @@ -344,12 +345,12 @@ class StateVectorLQubit : public StateVectorBase<PrecisionT, Derived> {
const std::vector<std::size_t> &wires,
bool inverse = false,
const std::vector<PrecisionT> &params = {}) {
auto *arr = this->getData();
auto *arr = BaseType::getData();
auto &dispatcher = DynamicDispatcher<PrecisionT>::getInstance();
const auto gate_op = dispatcher.strToGateOp(opName);
dispatcher.applyOperation(getKernelForGate(gate_op), arr,
this->getNumQubits(), gate_op, wires, inverse,
params);
BaseType::getNumQubits(), gate_op, wires,
inverse, params);
}

/**
Expand All @@ -371,12 +372,12 @@ class StateVectorLQubit : public StateVectorBase<PrecisionT, Derived> {
PL_ABORT_IF_NOT(controlled_wires.size() == controlled_values.size(),
"`controlled_wires` must have the same size as "
"`controlled_values`.");
auto *arr = this->getData();
auto *arr = BaseType::getData();
const auto &dispatcher = DynamicDispatcher<PrecisionT>::getInstance();
const auto gate_op = dispatcher.strToControlledGateOp(opName);
const auto kernel = getKernelForControlledGate(gate_op);
dispatcher.applyControlledGate(
kernel, arr, this->getNumQubits(), opName, controlled_wires,
kernel, arr, BaseType::getNumQubits(), opName, controlled_wires,
controlled_values, wires, inverse, params);
}
/**
Expand Down Expand Up @@ -435,6 +436,25 @@ class StateVectorLQubit : public StateVectorBase<PrecisionT, Derived> {
}
}

/**
* @brief Apply a PauliRot gate to the state-vector.
*
* @param wires Wires to apply gate to.
* @param inverse Indicates whether to use inverse of gate.
* @param params Rotation angle.
* @param word A Pauli word (e.g. "XYYX").
*/
void applyPauliRot(const std::vector<std::size_t> &wires,
const bool inverse,
const std::vector<PrecisionT> &params,
const std::string &word) {
PL_ABORT_IF_NOT(wires.size() == word.size(),
"wires and word have incompatible dimensions.");
GateImplementationsLM::applyPauliRot<PrecisionT>(
BaseType::getData(), BaseType::getNumQubits(), wires, inverse,
params[0], word);
}

/**
* @brief Apply a single generator to the state-vector using a given
* kernel.
Expand All @@ -447,9 +467,9 @@ class StateVectorLQubit : public StateVectorBase<PrecisionT, Derived> {
[[nodiscard]] inline auto applyGenerator(
Pennylane::Gates::KernelType kernel, const std::string &opName,
const std::vector<std::size_t> &wires, bool adj = false) -> PrecisionT {
auto *arr = this->getData();
auto *arr = BaseType::getData();
return DynamicDispatcher<PrecisionT>::getInstance().applyGenerator(
kernel, arr, this->getNumQubits(), opName, wires, adj);
kernel, arr, BaseType::getNumQubits(), opName, wires, adj);
}

/**
Expand All @@ -462,12 +482,12 @@ class StateVectorLQubit : public StateVectorBase<PrecisionT, Derived> {
[[nodiscard]] auto applyGenerator(const std::string &opName,
const std::vector<std::size_t> &wires,
bool adj = false) -> PrecisionT {
auto *arr = this->getData();
auto *arr = BaseType::getData();
const auto &dispatcher = DynamicDispatcher<PrecisionT>::getInstance();
const auto gen_op = dispatcher.strToGeneratorOp(opName);
return dispatcher.applyGenerator(getKernelForGenerator(gen_op), arr,
this->getNumQubits(), opName, wires,
adj);
BaseType::getNumQubits(), opName,
wires, adj);
}

/**
Expand All @@ -485,12 +505,12 @@ class StateVectorLQubit : public StateVectorBase<PrecisionT, Derived> {
const std::vector<bool> &controlled_values,
const std::vector<std::size_t> &wires, bool adj = false)
-> PrecisionT {
auto *arr = this->getData();
auto *arr = BaseType::getData();
const auto &dispatcher = DynamicDispatcher<PrecisionT>::getInstance();
const auto generator_op = dispatcher.strToControlledGeneratorOp(opName);
const auto kernel = getKernelForControlledGenerator(generator_op);
return dispatcher.applyControlledGenerator(
kernel, arr, this->getNumQubits(), opName, controlled_wires,
kernel, arr, BaseType::getNumQubits(), opName, controlled_wires,
controlled_values, wires, adj);
}

Expand All @@ -510,7 +530,7 @@ class StateVectorLQubit : public StateVectorBase<PrecisionT, Derived> {
const std::vector<std::size_t> &wires,
bool inverse = false) {
const auto &dispatcher = DynamicDispatcher<PrecisionT>::getInstance();
auto *arr = this->getData();
auto *arr = BaseType::getData();
PL_ABORT_IF(wires.empty(), "Number of wires must be larger than 0");
PL_ABORT_IF_NOT(controlled_wires.size() == controlled_values.size(),
"`controlled_wires` must have the same size as "
Expand All @@ -528,7 +548,7 @@ class StateVectorLQubit : public StateVectorBase<PrecisionT, Derived> {
ControlledMatrixOperation::NCMultiQubitOp);
}
}();
dispatcher.applyControlledMatrix(kernel, arr, this->getNumQubits(),
dispatcher.applyControlledMatrix(kernel, arr, BaseType::getNumQubits(),
matrix, controlled_wires,
controlled_values, wires, inverse);
}
Expand Down Expand Up @@ -567,12 +587,12 @@ class StateVectorLQubit : public StateVectorBase<PrecisionT, Derived> {
const std::vector<std::size_t> &wires,
bool inverse = false) {
const auto &dispatcher = DynamicDispatcher<PrecisionT>::getInstance();
auto *arr = this->getData();
auto *arr = BaseType::getData();

PL_ABORT_IF(wires.empty(), "Number of wires must be larger than 0");

dispatcher.applyMatrix(kernel, arr, this->getNumQubits(), matrix, wires,
inverse);
dispatcher.applyMatrix(kernel, arr, BaseType::getNumQubits(), matrix,
wires, inverse);
}

/**
Expand Down Expand Up @@ -651,9 +671,10 @@ class StateVectorLQubit : public StateVectorBase<PrecisionT, Derived> {
* @param branch Branch 0 or 1.
*/
void collapse(const std::size_t wire, const bool branch) {
auto *arr = this->getData();
const std::size_t stride = pow(2, this->num_qubits_ - (1 + wire));
const std::size_t vec_size = pow(2, this->num_qubits_);
auto *arr = BaseType::getData();
const std::size_t stride =
pow(2, BaseType::getNumQubits() - (1 + wire));
const std::size_t vec_size = pow(2, BaseType::getNumQubits());
const auto section_size = vec_size / stride;
const auto half_section_size = section_size / 2;

Expand All @@ -677,14 +698,14 @@ class StateVectorLQubit : public StateVectorBase<PrecisionT, Derived> {
* @brief Normalize vector (to have norm 1).
*/
void normalize() {
auto *arr = this->getData();
PrecisionT norm = std::sqrt(squaredNorm(arr, this->getLength()));
auto *arr = BaseType::getData();
PrecisionT norm = std::sqrt(squaredNorm(arr, BaseType::getLength()));

PL_ABORT_IF(norm < std::numeric_limits<PrecisionT>::epsilon() * 1e2,
"vector has norm close to zero and can't be normalized");

std::complex<PrecisionT> inv_norm = 1. / norm;
for (size_t k = 0; k < this->getLength(); k++) {
ComplexT inv_norm = 1. / norm;
for (size_t k = 0; k < BaseType::getLength(); k++) {
arr[k] *= inv_norm;
}
}
Expand All @@ -695,10 +716,10 @@ class StateVectorLQubit : public StateVectorBase<PrecisionT, Derived> {
* @param index Index of the target element.
*/
void setBasisState(const std::size_t index) {
auto length = this->getLength();
auto length = BaseType::getLength();
PL_ABORT_IF(index > length - 1, "Invalid index");

auto *arr = this->getData();
auto *arr = BaseType::getData();
std::fill(arr, arr + length, 0.0);
arr[index] = {1.0, 0.0};
}
Expand All @@ -712,7 +733,7 @@ class StateVectorLQubit : public StateVectorBase<PrecisionT, Derived> {
void setBasisState(const std::vector<std::size_t> &state,
const std::vector<std::size_t> &wires) {
const auto n_wires = wires.size();
const auto num_qubits = this->getNumQubits();
const auto num_qubits = BaseType::getNumQubits();
std::size_t index{0U};
for (std::size_t k = 0; k < n_wires; k++) {
const auto bit = static_cast<std::size_t>(state[k]);
Expand All @@ -726,7 +747,7 @@ class StateVectorLQubit : public StateVectorBase<PrecisionT, Derived> {
*
*/
void resetStateVector() {
if (this->getLength() > 0) {
if (BaseType::getLength() > 0) {
setBasisState(0U);
}
}
Expand All @@ -743,8 +764,8 @@ class StateVectorLQubit : public StateVectorBase<PrecisionT, Derived> {
PL_ABORT_IF(num_indices != values.size(),
"Indices and values length must match");

auto *arr = this->getData();
const auto length = this->getLength();
auto *arr = BaseType::getData();
const auto length = BaseType::getLength();
std::fill(arr, arr + length, 0.0);
for (std::size_t i = 0; i < num_indices; i++) {
PL_ABORT_IF(i >= length, "Invalid index");
Expand Down Expand Up @@ -774,7 +795,7 @@ class StateVectorLQubit : public StateVectorBase<PrecisionT, Derived> {
void setStateVector(const ComplexT *state,
const std::vector<std::size_t> &wires) {
const std::size_t num_state = exp2(wires.size());
const auto total_wire_count = this->getNumQubits();
const auto total_wire_count = BaseType::getNumQubits();

std::vector<std::size_t> reversed_sorted_wires(wires);
std::sort(reversed_sorted_wires.begin(), reversed_sorted_wires.end());
Expand All @@ -790,15 +811,16 @@ class StateVectorLQubit : public StateVectorBase<PrecisionT, Derived> {

const std::vector<bool> controlled_values(controlled_wires.size(),
false);
auto core_function =
[num_state, &state](std::complex<PrecisionT> *arr,
const std::vector<std::size_t> &indices,
const std::vector<std::complex<PrecisionT>> &) {
for (std::size_t i = 0; i < num_state; i++) {
arr[indices[i]] = state[i];
}
};
GateImplementationsLM::applyNCN(this->getData(), total_wire_count,
auto core_function = [num_state,
&state](ComplexT *arr,
const std::vector<std::size_t> &indices,
const std::size_t offset) {
for (std::size_t i = 0; i < num_state; i++) {
const std::size_t index = indices[i] + offset;
arr[index] = state[i];
}
};
GateImplementationsLM::applyNCN(BaseType::getData(), total_wire_count,
controlled_wires, controlled_values,
wires, core_function);
}
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -171,7 +171,14 @@ void registerBackendClassSpecificBindings(PyClass &pyclass) {

registerGatesForStateVector<StateVectorT>(pyclass);
registerControlledGate<StateVectorT>(pyclass);

pyclass.def(
"applyPauliRot",
[](StateVectorT &sv, const std::vector<std::size_t> &wires,
const bool inverse, const std::vector<ParamT> &params,
const std::string &word) {
sv.applyPauliRot(wires, inverse, params, word);
},
"Apply a Pauli rotation.");
pyclass
.def(py::init([](std::size_t num_qubits) {
return new StateVectorT(num_qubits);
Expand Down
Loading

0 comments on commit 83ac9d8

Please sign in to comment.