Fix applyMatrix in lightning.gpu (#932)
### Before submitting

Please complete the following checklist when submitting a PR:

- [ ] All new features must include a unit test.
      If you've fixed a bug or added code that should be tested, add a test to
      the [`tests`](../tests) directory!

- [ ] All new functions and code must be clearly commented and documented.
      If you do make documentation changes, make sure that the docs build and
      render correctly by running `make docs`.

- [ ] Ensure that the test suite passes, by running `make test`.

- [x] Add a new entry to the `.github/CHANGELOG.md` file, summarizing the
      change, and including a link back to the PR.

- [x] Ensure that code is properly formatted by running `make format`.

When all the above are checked, delete everything above the dashed line and
fill in the pull request template.


------------------------------------------------------------------------------------------------------------

**Context:**

[SC-74899]

`lightning.gpu` stores gate data in an `unordered_map`, keyed by an
(`opsName`, `Param`) pair, so that gate matrices can be reused without
excessive data copies between host and device. However, the way matrix gates
are stored introduces a bug: every matrix gate shares the same `({}, {})`
key, so only the first matrix is ever added to the cache. Subsequent
`applyMatrix` calls hit the cache and always apply that first matrix. This
causes errors at least when applying a `TensorProd` observable with multiple
`Hermitian` observables. The key collision is sketched below.
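
A minimal, self-contained sketch of the collision (hypothetical stand-in
types, not the actual `lightning.gpu` cache API; `std::map` is used here to
avoid hand-rolling a hash for the pair key, while the real cache is an
`unordered_map`):

```cpp
#include <complex>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the gate cache, keyed on (opsName, params).
using GateKey = std::pair<std::string, std::vector<double>>;
using Matrix = std::vector<std::complex<double>>;

std::map<GateKey, Matrix> gate_cache;

// Return the cached matrix for `key`, inserting `m` only on first use.
const Matrix &get_or_add(const GateKey &key, const Matrix &m) {
    // try_emplace is a no-op when the key already exists,
    // so the previously stored matrix wins on a repeated key.
    return gate_cache.try_emplace(key, m).first->second;
}

int main() {
    // Raw-matrix gates carry an empty name and no parameters, so every
    // matrix maps to the same ({}, {}) key:
    const Matrix pauli_z{{1, 0}, {0, 0}, {0, 0}, {-1, 0}};
    const Matrix pauli_x{{0, 0}, {1, 0}, {1, 0}, {0, 0}};
    const auto &first = get_or_add({"", {}}, pauli_z);
    const auto &second = get_or_add({"", {}}, pauli_x); // returns pauli_z!
    return (first == second) ? 0 : 1; // exits 0: pauli_x was never stored
}
```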

To fix this, matrix data is no longer stored in the gate cache; instead, each
matrix is copied from host to device on the fly, as sketched below.
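
Conceptually, the new path behaves like the following sketch, assuming plain
CUDA runtime calls in place of the internal `DataBuffer` wrapper used in the
actual change:

```cpp
#include <complex>
#include <cstddef>
#include <vector>
#include <cuda_runtime.h>

// Sketch of the fix: allocate a device buffer and copy the host matrix over
// on every call, so no cache key is ever consulted and each call applies
// exactly the matrix it was given.
void applyMatrixOnTheFly(const std::vector<std::complex<double>> &gate_matrix) {
    std::complex<double> *d_matrix = nullptr;
    const std::size_t bytes = gate_matrix.size() * sizeof(std::complex<double>);
    cudaMalloc(reinterpret_cast<void **>(&d_matrix), bytes); // device alloc
    cudaMemcpy(d_matrix, gate_matrix.data(), bytes,
               cudaMemcpyHostToDevice); // host -> device, per call
    // ... apply the gate kernel using d_matrix ...
    cudaFree(d_matrix); // release once the operation has completed
}
```

This trades a small per-call transfer for correctness: gate matrices are
2^n x 2^n for n targeted wires, so for typical few-wire operators the copy is
cheap relative to applying the gate itself.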

**Description of the Change:**

**Benefits:**

**Possible Drawbacks:**

**Related GitHub Issues:**

---------

Co-authored-by: ringo-but-quantum <[email protected]>
multiphaseCFD and ringo-but-quantum authored Oct 3, 2024
1 parent 133ab15 commit fdf09bc
Showing 4 changed files with 32 additions and 3 deletions.
.github/CHANGELOG.md (3 additions, 0 deletions)

@@ -98,6 +98,9 @@

 ### Bug fixes

+* Bug fix for `applyMatrix` in `lightning.gpu`. Matrix operator data is not stored in the `cuGateCache` object to support `TensorProd` obs with multiple `Hermitian` obs.
+  [(#932)](https://github.com/PennyLaneAI/pennylane-lightning/pull/932)
+
 * Bug fix for `_pauli_word` of `QuantumScriptSerializer`. `_pauli_word` can process `PauliWord` object: `I`.
   [(#919)](https://github.com/PennyLaneAI/pennylane-lightning/pull/919)
pennylane_lightning/core/_version.py (1 addition, 1 deletion)

@@ -16,4 +16,4 @@
 Version number (major.minor.patch[-label])
 """

-__version__ = "0.39.0-dev36"
+__version__ = "0.39.0-dev37"
pennylane_lightning/core/src/simulators/lightning_gpu/StateVectorCudaMPI.hpp (14 additions, 1 deletion)

@@ -440,6 +440,19 @@ class StateVectorCudaMPI final
                     cuGates::getRot<CFP_t>(params[0], params[1], params[2]);
                 applyDeviceMatrixGate(rot_matrix.data(), ctrls, tgts, false);
             }
+        } else if (opName == "Matrix") {
+            DataBuffer<CFP_t, int> d_matrix{
+                gate_matrix.size(), BaseType::getDataBuffer().getDevTag(),
+                true};
+            d_matrix.CopyHostDataToGpu(gate_matrix.data(), d_matrix.getLength(),
+                                       false);
+            // ensure wire indexing correctly preserved for tensor-observables
+            const std::vector<std::size_t> ctrls_local{ctrls.rbegin(),
+                                                       ctrls.rend()};
+            const std::vector<std::size_t> tgts_local{tgts.rbegin(),
+                                                      tgts.rend()};
+            applyDeviceMatrixGate(d_matrix.getData(), ctrls_local, tgts_local,
+                                  adjoint);
         } else if (par_gates_.find(opName) != par_gates_.end()) {
             par_gates_.at(opName)(wires, adjoint, params);
         } else { // No offloadable function call; defer to matrix passing

@@ -519,7 +532,7 @@ class StateVectorCudaMPI final
                       const std::vector<std::size_t> &wires,
                       bool adjoint = false) {
         PL_ABORT_IF(wires.empty(), "Number of wires must be larger than 0");
-        const std::string opName = {};
+        const std::string opName = "Matrix";
         std::size_t n = std::size_t{1} << wires.size();
         const std::vector<std::complex<PrecisionT>> matrix(gate_matrix,
                                                            gate_matrix + n * n);
pennylane_lightning/core/src/simulators/lightning_gpu/StateVectorCudaManaged.hpp (14 additions, 1 deletion)

@@ -360,6 +360,19 @@ class StateVectorCudaManaged
                     cuGates::getRot<CFP_t>(params[0], params[1], params[2]);
                 applyDeviceMatrixGate(rot_matrix.data(), ctrls, tgts, false);
             }
+        } else if (opName == "Matrix") {
+            DataBuffer<CFP_t, int> d_matrix{
+                gate_matrix.size(), BaseType::getDataBuffer().getDevTag(),
+                true};
+            d_matrix.CopyHostDataToGpu(gate_matrix.data(), d_matrix.getLength(),
+                                       false);
+            // ensure wire indexing correctly preserved for tensor-observables
+            const std::vector<std::size_t> ctrls_local{ctrls.rbegin(),
+                                                       ctrls.rend()};
+            const std::vector<std::size_t> tgts_local{tgts.rbegin(),
+                                                      tgts.rend()};
+            applyDeviceMatrixGate(d_matrix.getData(), ctrls_local, tgts_local,
+                                  adjoint);
         } else if (par_gates_.find(opName) != par_gates_.end()) {
             par_gates_.at(opName)(wires, adjoint, params);
         } else { // No offloadable function call; defer to matrix passing

@@ -439,7 +452,7 @@ class StateVectorCudaManaged
                       const std::vector<std::size_t> &wires,
                       bool adjoint = false) {
         PL_ABORT_IF(wires.empty(), "Number of wires must be larger than 0");
-        const std::string opName = {};
+        const std::string opName = "Matrix";
         std::size_t n = std::size_t{1} << wires.size();
         const std::vector<std::complex<PrecisionT>> matrix(gate_matrix,
                                                            gate_matrix + n * n);
