Add native PauliRot implementation in LightningQubit (#834)

### Before submitting Please complete the following checklist when submitting a PR: - [x] All new features must include a unit test. If you've fixed a bug or added code that should be tested, add a test to the [`tests`](../tests) directory! - [x] All new functions and code must be clearly commented and documented. If you do make documentation changes, make sure that the docs build and render correctly by running `make docs`. - [x] Ensure that the test suite passes, by running `make test`. - [x] Add a new entry to the `.github/CHANGELOG.md` file, summarizing the change, and including a link back to the PR. - [x] Ensure that code is properly formatted by running `make format`. When all the above are checked, delete everything above the dashed line and fill in the pull request template. ------------------------------------------------------------------------------------------------------------ **Context:** Pauli rotations come up in many places, and importantly in the time evolution of qchem Hamiltonians. It is therefore worth considering ways to accelerate their execution. **Description of the Change:** Implement `applyPauliRot` in the LM gate kernel. Invoke `applyPauliRot` directly from the SV class and add bindings to the Python layer. Add custom decomposition rule for `PauliRot` operations. **Benefits:** Faster Pauli rotations. I performed a benchmark on random `PauliRotation`s (runtime > 1.0 sec and at least 5 of them) through the Python layer. The data remains noisy with 5 samples because the performance varies depending on the specific "XYZ" sequence (which translates into more or less predictable memory access patterns). Overall, we see an advantage for 3+ qubits and up. ![speedup_vs_ntargets](https://github.com/user-attachments/assets/35549fe3-5668-49f8-b94b-3a6567be91cf) The speed-ups are also good when parallelizing with OpenMP threads (both cases using 16 threads) ![speedup_vs_ntargets_omp16](https://github.com/user-attachments/assets/4b543bb1-0da4-47f7-9ecb-8932760f069c) After rewriting the PauliRot gates as one-qubit gates of sort, writing AVX kernels for PauliRot may be worth it. For an entire workflow like time-evolving a QChem Hamiltonian, the gains are slower coming because `PauliRot` is not necessarily the main bottleneck at all scales, but it is once the molecules are big enough. This is shown in the speed-up bar graph below. ![speedup_vs_mol](https://github.com/user-attachments/assets/db6ec7aa-de15-4f2a-947b-5564ea4824f2) **Possible Drawbacks:** **Related GitHub Issues:** [sc-69801] [sc-71642] --------- Co-authored-by: ringo-but-quantum <[email protected]> Co-authored-by: Luis Alfredo Nuñez Meneses <[email protected]>
PennyLaneAI · Aug 19, 2024 · 83ac9d8 · 83ac9d8
1 parent 6b42030
commit 83ac9d8
Show file tree

Hide file tree

Showing 17 changed files with 560 additions and 253 deletions.
diff --git a/.github/CHANGELOG.md b/.github/CHANGELOG.md
@@ -27,6 +27,9 @@
 
 ### Improvements
 
+* LightningQubit gains native support for the `PauliRot` gate.
+  [(#834)](https://github.com/PennyLaneAI/pennylane-lightning/pull/834)
+
 * The `setBasisState` and `setStateVector` methods of `StateVectorLQubit` and `StateVectorKokkos` are overloaded to support PennyLane-like parameters.
   [(#843)](https://github.com/PennyLaneAI/pennylane-lightning/pull/843)
 

diff --git a/.github/workflows/tests_lqcpu_python.yml b/.github/workflows/tests_lqcpu_python.yml
@@ -45,6 +45,14 @@ jobs:
     runs-on: ${{ needs.determine_runner.outputs.runner_group }}
 
     steps:
+      - name: Make disk space
+        run: |
+          sudo apt-get autoremove -y && sudo apt-get autoclean -y && sudo apt-get clean -y
+          for DIR in /usr/share/dotnet /usr/local/share/powershell /usr/share/swift; do
+            sudo du -sh $DIR || echo $DIR not found
+            sudo rm -rf $DIR
+          done
+
       - uses: actions/setup-python@v5
         name: Install Python
         with:

diff --git a/pennylane_lightning/core/_version.py b/pennylane_lightning/core/_version.py
@@ -16,4 +16,4 @@
    Version number (major.minor.patch[-label])
 """
 
-__version__ = "0.38.0-dev37"
+__version__ = "0.38.0-dev38"
diff --git a/pennylane_lightning/core/src/simulators/lightning_qubit/StateVectorLQubit.hpp b/pennylane_lightning/core/src/simulators/lightning_qubit/StateVectorLQubit.hpp
@@ -327,9 +327,10 @@ class StateVectorLQubit : public StateVectorBase<PrecisionT, Derived> {
                         const std::vector<std::size_t> &wires,
                         bool inverse = false,
                         const std::vector<PrecisionT> &params = {}) {
-        auto *arr = this->getData();
+        auto *arr = BaseType::getData();
         DynamicDispatcher<PrecisionT>::getInstance().applyOperation(
-            kernel, arr, this->getNumQubits(), opName, wires, inverse, params);
+            kernel, arr, BaseType::getNumQubits(), opName, wires, inverse,
+            params);
     }
 
     /**
@@ -344,12 +345,12 @@ class StateVectorLQubit : public StateVectorBase<PrecisionT, Derived> {
                         const std::vector<std::size_t> &wires,
                         bool inverse = false,
                         const std::vector<PrecisionT> &params = {}) {
-        auto *arr = this->getData();
+        auto *arr = BaseType::getData();
         auto &dispatcher = DynamicDispatcher<PrecisionT>::getInstance();
         const auto gate_op = dispatcher.strToGateOp(opName);
         dispatcher.applyOperation(getKernelForGate(gate_op), arr,
-                                  this->getNumQubits(), gate_op, wires, inverse,
-                                  params);
+                                  BaseType::getNumQubits(), gate_op, wires,
+                                  inverse, params);
     }
 
     /**
@@ -371,12 +372,12 @@ class StateVectorLQubit : public StateVectorBase<PrecisionT, Derived> {
         PL_ABORT_IF_NOT(controlled_wires.size() == controlled_values.size(),
                         "`controlled_wires` must have the same size as "
                         "`controlled_values`.");
-        auto *arr = this->getData();
+        auto *arr = BaseType::getData();
         const auto &dispatcher = DynamicDispatcher<PrecisionT>::getInstance();
         const auto gate_op = dispatcher.strToControlledGateOp(opName);
         const auto kernel = getKernelForControlledGate(gate_op);
         dispatcher.applyControlledGate(
-            kernel, arr, this->getNumQubits(), opName, controlled_wires,
+            kernel, arr, BaseType::getNumQubits(), opName, controlled_wires,
             controlled_values, wires, inverse, params);
     }
     /**
@@ -435,6 +436,25 @@ class StateVectorLQubit : public StateVectorBase<PrecisionT, Derived> {
         }
     }
 
+    /**
+     * @brief Apply a PauliRot gate to the state-vector.
+     *
+     * @param wires Wires to apply gate to.
+     * @param inverse Indicates whether to use inverse of gate.
+     * @param params Rotation angle.
+     * @param word A Pauli word (e.g. "XYYX").
+     */
+    void applyPauliRot(const std::vector<std::size_t> &wires,
+                       const bool inverse,
+                       const std::vector<PrecisionT> &params,
+                       const std::string &word) {
+        PL_ABORT_IF_NOT(wires.size() == word.size(),
+                        "wires and word have incompatible dimensions.");
+        GateImplementationsLM::applyPauliRot<PrecisionT>(
+            BaseType::getData(), BaseType::getNumQubits(), wires, inverse,
+            params[0], word);
+    }
+
     /**
      * @brief Apply a single generator to the state-vector using a given
      * kernel.
@@ -447,9 +467,9 @@ class StateVectorLQubit : public StateVectorBase<PrecisionT, Derived> {
     [[nodiscard]] inline auto applyGenerator(
         Pennylane::Gates::KernelType kernel, const std::string &opName,
         const std::vector<std::size_t> &wires, bool adj = false) -> PrecisionT {
-        auto *arr = this->getData();
+        auto *arr = BaseType::getData();
         return DynamicDispatcher<PrecisionT>::getInstance().applyGenerator(
-            kernel, arr, this->getNumQubits(), opName, wires, adj);
+            kernel, arr, BaseType::getNumQubits(), opName, wires, adj);
     }
 
     /**
@@ -462,12 +482,12 @@ class StateVectorLQubit : public StateVectorBase<PrecisionT, Derived> {
     [[nodiscard]] auto applyGenerator(const std::string &opName,
                                       const std::vector<std::size_t> &wires,
                                       bool adj = false) -> PrecisionT {
-        auto *arr = this->getData();
+        auto *arr = BaseType::getData();
         const auto &dispatcher = DynamicDispatcher<PrecisionT>::getInstance();
         const auto gen_op = dispatcher.strToGeneratorOp(opName);
         return dispatcher.applyGenerator(getKernelForGenerator(gen_op), arr,
-                                         this->getNumQubits(), opName, wires,
-                                         adj);
+                                         BaseType::getNumQubits(), opName,
+                                         wires, adj);
     }
 
     /**
@@ -485,12 +505,12 @@ class StateVectorLQubit : public StateVectorBase<PrecisionT, Derived> {
                    const std::vector<bool> &controlled_values,
                    const std::vector<std::size_t> &wires, bool adj = false)
         -> PrecisionT {
-        auto *arr = this->getData();
+        auto *arr = BaseType::getData();
         const auto &dispatcher = DynamicDispatcher<PrecisionT>::getInstance();
         const auto generator_op = dispatcher.strToControlledGeneratorOp(opName);
         const auto kernel = getKernelForControlledGenerator(generator_op);
         return dispatcher.applyControlledGenerator(
-            kernel, arr, this->getNumQubits(), opName, controlled_wires,
+            kernel, arr, BaseType::getNumQubits(), opName, controlled_wires,
             controlled_values, wires, adj);
     }
 
@@ -510,7 +530,7 @@ class StateVectorLQubit : public StateVectorBase<PrecisionT, Derived> {
                           const std::vector<std::size_t> &wires,
                           bool inverse = false) {
         const auto &dispatcher = DynamicDispatcher<PrecisionT>::getInstance();
-        auto *arr = this->getData();
+        auto *arr = BaseType::getData();
         PL_ABORT_IF(wires.empty(), "Number of wires must be larger than 0");
         PL_ABORT_IF_NOT(controlled_wires.size() == controlled_values.size(),
                         "`controlled_wires` must have the same size as "
@@ -528,7 +548,7 @@ class StateVectorLQubit : public StateVectorBase<PrecisionT, Derived> {
                     ControlledMatrixOperation::NCMultiQubitOp);
             }
         }();
-        dispatcher.applyControlledMatrix(kernel, arr, this->getNumQubits(),
+        dispatcher.applyControlledMatrix(kernel, arr, BaseType::getNumQubits(),
                                          matrix, controlled_wires,
                                          controlled_values, wires, inverse);
     }
@@ -567,12 +587,12 @@ class StateVectorLQubit : public StateVectorBase<PrecisionT, Derived> {
                             const std::vector<std::size_t> &wires,
                             bool inverse = false) {
         const auto &dispatcher = DynamicDispatcher<PrecisionT>::getInstance();
-        auto *arr = this->getData();
+        auto *arr = BaseType::getData();
 
         PL_ABORT_IF(wires.empty(), "Number of wires must be larger than 0");
 
-        dispatcher.applyMatrix(kernel, arr, this->getNumQubits(), matrix, wires,
-                               inverse);
+        dispatcher.applyMatrix(kernel, arr, BaseType::getNumQubits(), matrix,
+                               wires, inverse);
     }
 
     /**
@@ -651,9 +671,10 @@ class StateVectorLQubit : public StateVectorBase<PrecisionT, Derived> {
      * @param branch Branch 0 or 1.
      */
     void collapse(const std::size_t wire, const bool branch) {
-        auto *arr = this->getData();
-        const std::size_t stride = pow(2, this->num_qubits_ - (1 + wire));
-        const std::size_t vec_size = pow(2, this->num_qubits_);
+        auto *arr = BaseType::getData();
+        const std::size_t stride =
+            pow(2, BaseType::getNumQubits() - (1 + wire));
+        const std::size_t vec_size = pow(2, BaseType::getNumQubits());
         const auto section_size = vec_size / stride;
         const auto half_section_size = section_size / 2;
 
@@ -677,14 +698,14 @@ class StateVectorLQubit : public StateVectorBase<PrecisionT, Derived> {
      * @brief Normalize vector (to have norm 1).
      */
     void normalize() {
-        auto *arr = this->getData();
-        PrecisionT norm = std::sqrt(squaredNorm(arr, this->getLength()));
+        auto *arr = BaseType::getData();
+        PrecisionT norm = std::sqrt(squaredNorm(arr, BaseType::getLength()));
 
         PL_ABORT_IF(norm < std::numeric_limits<PrecisionT>::epsilon() * 1e2,
                     "vector has norm close to zero and can't be normalized");
 
-        std::complex<PrecisionT> inv_norm = 1. / norm;
-        for (size_t k = 0; k < this->getLength(); k++) {
+        ComplexT inv_norm = 1. / norm;
+        for (size_t k = 0; k < BaseType::getLength(); k++) {
             arr[k] *= inv_norm;
         }
     }
@@ -695,10 +716,10 @@ class StateVectorLQubit : public StateVectorBase<PrecisionT, Derived> {
      * @param index Index of the target element.
      */
     void setBasisState(const std::size_t index) {
-        auto length = this->getLength();
+        auto length = BaseType::getLength();
         PL_ABORT_IF(index > length - 1, "Invalid index");
 
-        auto *arr = this->getData();
+        auto *arr = BaseType::getData();
         std::fill(arr, arr + length, 0.0);
         arr[index] = {1.0, 0.0};
     }
@@ -712,7 +733,7 @@ class StateVectorLQubit : public StateVectorBase<PrecisionT, Derived> {
     void setBasisState(const std::vector<std::size_t> &state,
                        const std::vector<std::size_t> &wires) {
         const auto n_wires = wires.size();
-        const auto num_qubits = this->getNumQubits();
+        const auto num_qubits = BaseType::getNumQubits();
         std::size_t index{0U};
         for (std::size_t k = 0; k < n_wires; k++) {
             const auto bit = static_cast<std::size_t>(state[k]);
@@ -726,7 +747,7 @@ class StateVectorLQubit : public StateVectorBase<PrecisionT, Derived> {
      *
      */
     void resetStateVector() {
-        if (this->getLength() > 0) {
+        if (BaseType::getLength() > 0) {
             setBasisState(0U);
         }
     }
@@ -743,8 +764,8 @@ class StateVectorLQubit : public StateVectorBase<PrecisionT, Derived> {
         PL_ABORT_IF(num_indices != values.size(),
                     "Indices and values length must match");
 
-        auto *arr = this->getData();
-        const auto length = this->getLength();
+        auto *arr = BaseType::getData();
+        const auto length = BaseType::getLength();
         std::fill(arr, arr + length, 0.0);
         for (std::size_t i = 0; i < num_indices; i++) {
             PL_ABORT_IF(i >= length, "Invalid index");
@@ -774,7 +795,7 @@ class StateVectorLQubit : public StateVectorBase<PrecisionT, Derived> {
     void setStateVector(const ComplexT *state,
                         const std::vector<std::size_t> &wires) {
         const std::size_t num_state = exp2(wires.size());
-        const auto total_wire_count = this->getNumQubits();
+        const auto total_wire_count = BaseType::getNumQubits();
 
         std::vector<std::size_t> reversed_sorted_wires(wires);
         std::sort(reversed_sorted_wires.begin(), reversed_sorted_wires.end());
@@ -790,15 +811,16 @@ class StateVectorLQubit : public StateVectorBase<PrecisionT, Derived> {
 
         const std::vector<bool> controlled_values(controlled_wires.size(),
                                                   false);
-        auto core_function =
-            [num_state, &state](std::complex<PrecisionT> *arr,
-                                const std::vector<std::size_t> &indices,
-                                const std::vector<std::complex<PrecisionT>> &) {
-                for (std::size_t i = 0; i < num_state; i++) {
-                    arr[indices[i]] = state[i];
-                }
-            };
-        GateImplementationsLM::applyNCN(this->getData(), total_wire_count,
+        auto core_function = [num_state,
+                              &state](ComplexT *arr,
+                                      const std::vector<std::size_t> &indices,
+                                      const std::size_t offset) {
+            for (std::size_t i = 0; i < num_state; i++) {
+                const std::size_t index = indices[i] + offset;
+                arr[index] = state[i];
+            }
+        };
+        GateImplementationsLM::applyNCN(BaseType::getData(), total_wire_count,
                                         controlled_wires, controlled_values,
                                         wires, core_function);
     }

diff --git a/pennylane_lightning/core/src/simulators/lightning_qubit/bindings/LQubitBindings.hpp b/pennylane_lightning/core/src/simulators/lightning_qubit/bindings/LQubitBindings.hpp
@@ -171,7 +171,14 @@ void registerBackendClassSpecificBindings(PyClass &pyclass) {
 
     registerGatesForStateVector<StateVectorT>(pyclass);
     registerControlledGate<StateVectorT>(pyclass);
-
+    pyclass.def(
+        "applyPauliRot",
+        [](StateVectorT &sv, const std::vector<std::size_t> &wires,
+           const bool inverse, const std::vector<ParamT> &params,
+           const std::string &word) {
+            sv.applyPauliRot(wires, inverse, params, word);
+        },
+        "Apply a Pauli rotation.");
     pyclass
         .def(py::init([](std::size_t num_qubits) {
             return new StateVectorT(num_qubits);