qjit function execution occurs asynchronously #885
Comments
The attached PDF file below explains the different test cases (scenarios) and includes some analysis and comments.
Hi @mwasfy, thank you for the submission! I would be curious to ask you a few follow-up questions about this solution.
This is rather minor, but I'm also curious why the test function was copy-pasted a few times, rather than re-using one definition. Was there a specific reason for this?
Hi @dime10, thank you for your insightful comments. They cleared up many of my concerns.
Copying and pasting functions: mainly, I didn't want to run into the …
Hi @mwasfy, thanks for your reply!
I believe this is an issue with the PennyLane library not being thread-safe, since it uses a global context to capture quantum instructions in a QNode. Regarding point 1:
This is actually a good point; NumPy does appear to do that for many of its functions. The functions we are interested in are typically quantum functions, which will execute a quantum circuit on a device using the PennyLane library. Do you think this reasoning applies there as well? By the way, I noticed that the parallel functions don't have the …
Hi @dime10, thanks for getting back to me.
Actually, I think that is how it was implemented. Please take a look at the following code snippet. Wouldn't that be the case you are describing as a workaround? (Unless I am not understanding it well.) By the way, there was no use of decorators here for qjit or async.

We'll be running its own machine code on a separate device, so yes, I think the same reasoning would apply. Actually, executing on a separate "quantum" device underscores the importance of such an approach for asynchronous tasks even more.

I think this is more a question of multi-threading vs. multi-processing. Intuitively, I would say this is supposed to be a compute-intensive function, so the answer would be multi-processing. However, if the quantum device acts like an attached co-processor, where we send inputs and wait for outputs, it could be considered a case of IO-bound computation from the main thread's perspective, where multi-threading would be better. Answering that question definitively requires more internal knowledge of how Catalyst and PennyLane work, how the quantum device is connected to the host, and how the two interact.

As a matter of fact, I did test all the parallel functions with @qnode; there was no difference in terms of performance, so I left them out of the final test scenarios I submitted.
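The IO-bound case described above can be sketched with Python's standard library. This is a hypothetical example, not Catalyst code: each call mostly waits (as if on a round-trip to an attached co-processor), so threads overlap the waits even under the GIL:

```python
from concurrent.futures import ThreadPoolExecutor
import time

def io_bound_call(x):
    # Stand-in for sending inputs to a co-processor and waiting for outputs
    time.sleep(0.01)
    return x * x

# Threads overlap the waiting time; for compute-bound simulation,
# ProcessPoolExecutor would be the multi-processing alternative.
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(io_bound_call, range(4)))

print(results)  # [0, 1, 4, 9]
```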
Both the …
Since quantum functions always have classical inputs (e.g. rotation angles) and outputs (e.g. expectation values), a dependency can also be created by using the output of one quantum function as the input of another:

```python
import pennylane as qml

@qml.qnode(qml.device("lightning.qubit", wires=3))
def circuit(phi):
    qml.RX(phi, wires=0)
    qml.CNOT(wires=[0, 2])
    return qml.expval(qml.PauliZ(2))

input = 0.7
output = circuit(input)
_ = circuit(output)
```
Thanks for the clarification. From what you are describing, I guess Catalyst uses lazy compilation. Does it support eager compilation as well?
One more comment about the code I just shared: sometimes it runs perfectly and sometimes it gives the error: …
Context
Thread-Level Speculation is a technique that has been used in various research efforts to speed up general-purpose programs by speculatively executing code downstream of a function call. The idea here is to do something similar to JAX; see Asynchronous Dispatch in the JAX docs.
JAX does not wait for the operation to complete before returning control to the Python program. Instead, JAX returns a `DeviceArray` value, which is a future, i.e., a value that will be produced in the future on an accelerator device but isn't necessarily available immediately. Only when the value of the `DeviceArray` is queried is a blocking call generated.

Consider the following code snippet. Here, `x`, a device array returned as the result of evaluating `f`, is a future `DeviceArray`, and blocking only occurs when a user requests the value of `x` in Python.

Questions:
The assumption here is that this will lead to speedups in the following situation (this assumption needs to be validated, but should be apparent in an interpreted language):
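The code snippet referenced above is not preserved in this extract. Below is a minimal sketch consistent with the names `f`, `g`, and `x` used here, assuming standard `jax.numpy` operations (note that recent JAX versions use `jax.Array` rather than `DeviceArray`):

```python
import jax
import jax.numpy as jnp

@jax.jit
def f(a):
    # Compute-heavy work dispatched asynchronously to the device
    return jnp.dot(a, a.T)

def g():
    # Unrelated Python work that can run while f executes
    return sum(range(1_000))

a = jnp.ones((500, 500))
x = f(a)   # returns a future-like array immediately (async dispatch)
y = g()    # Python proceeds without waiting for f
x.block_until_ready()  # explicit blocking; printing or indexing x also blocks
```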
That is, since `x` is evaluated asynchronously, Python is not blocked awaiting the result of `f` and can simply invoke `g` directly.

Requirements:
- … `qjit`'ted function is executed in parallel with the compiled function.
- … `qjit`'ted function.

Installation Help
Refer to the Catalyst installation guide for how to install a source build of the project.
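For reference, the asynchronous-dispatch behaviour requested above can be prototyped in plain Python with a future-like proxy that only blocks when its value is requested. This is a hypothetical sketch using only the standard library, not Catalyst's implementation; the `async_dispatch` decorator and `FutureValue` class are invented names:

```python
from concurrent.futures import ThreadPoolExecutor

_executor = ThreadPoolExecutor()

class FutureValue:
    """Proxy for a result that may not be ready yet; blocks on access."""
    def __init__(self, future):
        self._future = future

    def value(self):
        # The blocking call happens only here, mirroring how querying
        # a device array triggers blocking in JAX.
        return self._future.result()

def async_dispatch(fn):
    # Hypothetical decorator standing in for an async-dispatching qjit
    def wrapper(*args, **kwargs):
        return FutureValue(_executor.submit(fn, *args, **kwargs))
    return wrapper

@async_dispatch
def f(x):
    return x * 2  # stand-in for compiled quantum machine code

x = f(21)                    # returns immediately; f runs in the background
other_work = sum(range(10))  # Python continues without blocking
print(x.value())             # blocks until f finishes, prints 42
```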