[core][adag] Separate the outputs of execute and execute_async to multiple refs or futures to allow clients to retrieve them one at a time (#46908) #47305
Conversation
Hi Ray developers, I would love to have some help validating
First pass. Looks pretty nice. Thanks for the contribution!
I think generally the approach makes sense! But I am seeing 2 problems.

1. It is a backward-incompatible change and will break vLLM. How complicated is it to support `MultiOutputNode(_multiple_return_refs=True)`? I am not suggesting to do it, but I'd like to measure the complexity.
2. I think although we don't call `ray.get`, `ray.get` is called and deserialization still happens when the DAG ref is deallocated, because of this code: `if not self._ray_get_called:`

python/ray/dag/compiled_dag_node.py (outdated):

    @@ -1564,42 +1570,48 @@ def _execute_until(
            TODO(rui): catch the case that user holds onto the CompiledDAGRefs
            """
            from ray.dag import DAGContext
            if self._max_execution_index < execution_index:
why do we need this if now?
An `ImportError` will occur with the original code when destructing an unused `CompiledDAGRef` (one on which we have not called `get()`). Here is a minimal repro of the original behavior (the `Foo`/`Bar` actor definitions below are assumed; any accumulating actors reproduce the issue):

    import ray
    from ray.dag import InputNode, MultiOutputNode

    # Assumed actor definitions for the repro.
    @ray.remote
    class Foo:
        def __init__(self):
            self.i = 0
        def increment(self, inp):
            self.i += inp
            return self.i

    @ray.remote
    class Bar:
        def __init__(self):
            self.i = 0
        def decrement(self, inp):
            self.i -= inp
            return self.i

    foo = Foo.remote()
    bar = Bar.remote()
    with InputNode() as inp:
        dag = MultiOutputNode([foo.increment.bind(inp), bar.decrement.bind(inp)])
    dag = dag.experimental_compile()
    ref1 = dag.execute(1)
    ref2 = dag.execute(1)
    assert ref2.get() == [2, -2]
    dag.teardown()
    # When exiting the program, ref1.__del__ is invoked. Since get() has not
    # been called on ref1, ref1.get() is invoked during destruction. In
    # _execute_until, even though the DAG won't be executed again, we still
    # attempt to import ray.dag.DAGContext. Since Python is shutting down,
    # an ImportError occurs ("ImportError: sys.meta_path is None, Python is
    # likely shutting down").
The same problem persists with the current behavior if the `if` clause is not introduced. Another benefit of introducing the `if` clause is that we only import the library and calculate `timeout` when necessary: `timeout` is only used when `self._max_execution_index < execution_index` is `True`. We shouldn't import the library when `self._max_execution_index == execution_index`. I should keep the following `while` loop out of the `if` clause, though.
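The guard described above can be sketched in plain Python, with no Ray required. The class and attribute names here (`CompiledDAGSketch`, `imports_performed`) are hypothetical stand-ins, not Ray's internals; the point is only that the import happens inside the guard:

```python
# Minimal sketch of why the `if` guard helps: the import (and any timeout
# computation) only happens when the DAG actually needs to advance, so a
# no-op call during interpreter shutdown never touches the import machinery.

class CompiledDAGSketch:
    def __init__(self):
        self._max_execution_index = -1
        self.imports_performed = 0

    def _execute_until(self, execution_index):
        if self._max_execution_index < execution_index:
            # Stand-in for `from ray.dag import DAGContext`; at real
            # interpreter shutdown this line can raise ImportError.
            import json as _dag_context_stub  # noqa: F401
            self.imports_performed += 1
            while self._max_execution_index < execution_index:
                self._max_execution_index += 1

dag = CompiledDAGSketch()
dag._execute_until(2)
assert dag.imports_performed == 1
dag._execute_until(2)  # already executed: guard is False, no import happens
assert dag.imports_performed == 1
```

The second call falls through the guard entirely, which models the unused-ref destruction path discussed above.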
As I write this, I found out that if the last `CompiledDAGRef` is unused, the `ImportError` persists even when the `if` clause is introduced. Is there guidance for Ray users to always invoke `get()` on every `CompiledDAGRef`? If not, this may be a bug that needs to be addressed in a separate PR. WDYT?
Here is a minimal repro (same `Foo`/`Bar` actors as above):

    foo = Foo.remote()
    bar = Bar.remote()
    with InputNode() as inp:
        dag = MultiOutputNode([foo.increment.bind(inp), bar.decrement.bind(inp)])
    dag = dag.experimental_compile()
    ref1 = dag.execute(1)
    ref2 = dag.execute(1)
    assert ref1.get() == [1, -1]
    dag.teardown()
    # Upon destruction, the DAG will be executed until the latest index.
    # Again, we attempt to import DAGContext during program exit.
For now, I think it's worth keeping the `if` clause and filing another bug to track the dangling `CompiledDAGRef` issue.
I think the issue I mentioned is correlated with your second point. Is there any reason we want to avoid leaking execution results (which is why we currently execute the DAG until the latest index and `get()` all results)?
python/ray/dag/compiled_dag_node.py (outdated):

    if self._max_execution_index + 1 == execution_index:
        # Directly fetch and return without buffering
    while self._max_execution_index < execution_index:
        if len(self._result_buffer) >= self._max_buffered_results:
I think this is wrong now. We should do `len(self._result_buffer) * num_output_channels`.
can you also add a unit test? (or modify existing test)
I think this is okay, it matches the previous semantics. Maybe the naming is not good - it should be the number of DAG executions buffered, not the number of individual results buffered.
Agree. `len(self._result_buffer)` indicates the number of DAG executions, while `len(self._result_buffer) * num_output_channels` represents the total number of outputs. I will adjust the naming of `_max_buffered_results` to `_max_buffered_executions` (any other suggestions?) and add a unit test.
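The executions-vs-outputs distinction can be sketched in plain Python. The names below (`result_buffer`, `max_buffered_results`, `num_output_channels`, `buffer_execution`) mirror the discussion above, but the logic is illustrative rather than Ray's actual implementation:

```python
# Sketch: the buffer holds one entry per *execution*, and each entry holds
# one result per output channel. So len(result_buffer) counts buffered DAG
# executions, and the number of buffered individual outputs is
# len(result_buffer) * num_output_channels.

num_output_channels = 3
max_buffered_results = 2  # counts executions, per the discussion above

result_buffer = {}  # execution_index -> list of per-channel results

def buffer_execution(execution_index, channel_values):
    # Reject new executions once the buffer holds the maximum number
    # of executions, regardless of how many channels each one has.
    if len(result_buffer) >= max_buffered_results:
        raise RuntimeError("too many buffered executions")
    result_buffer[execution_index] = list(channel_values)

buffer_execution(0, ["a0", "b0", "c0"])
buffer_execution(1, ["a1", "b1", "c1"])

assert len(result_buffer) == 2                        # DAG executions buffered
assert len(result_buffer) * num_output_channels == 6  # individual outputs
```

Under this model, an execution whose outputs are only partially fetched still occupies one buffer slot, matching the unit-test behavior described below.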
Actually, I think we would need to change `max_buffered_results` to `max_buffered_executions` if we were to change `_max_buffered_results` to `_max_buffered_executions`. Since there are quite a few references to `max_buffered_results` in the repo (including developer APIs and `python/ray/dag/context.py`, which isn't closely related to the changes in this PR), I'd suggest not modifying the naming now. We can track it as a bug and address it in a follow-up.
As for the unit test, I'll add one tomorrow.
Done adding a unit test for cases where there are multiple outputs. Now, if not all results from an execution index are fetched with `get()`, that execution index still counts toward the number of buffered results. Let me know if this sounds okay to you!
Hey @rkooo567, thank you so much for the review. Let me experiment with a few things you suggested and get back to you tomorrow.
Thanks for this contribution!
Thanks for all of the feedback. I'll keep addressing comments tomorrow and over the weekend. Will ping you as soon as the revision is ready for review.
@rkooo567, I think this should be quite straightforward to support. Essentially, we can ask clients to specify
Please hold off reviewing as I'm now working on making it backward compatible. Thanks!
The change is now backward compatible. Working on unit tests and addressing the remaining comments.
It looks good to me! Will merge once tests pass.
Also, to be clear @stephanie-wang: @jeffreyjeffreywang, it will be a breaking change, right?
Correct, this will be breaking.
Makes sense! Also @jeffreyjeffreywang, do we have issues for all follow-up problems?
Yup, the second one is addressed in this PR. #47662 and #47614 track the remaining follow-up problems.
Actually, is this a known issue? Can you take a look?
@rkooo567 Yeah, I also realized this may be a problem over the weekend. I think we need to support calling
…tiple refs or futures to allow clients to retrieve them one at a time (ray-project#46908) (ray-project#47305)

Why are these changes needed? Currently, if `MultiOutputNode` is used to wrap a DAG's output, you get back a single `CompiledDAGRef` or `CompiledDAGFuture`, depending on whether `execute` or `execute_async` is invoked, that points to a list of all of the outputs. To retrieve one of the outputs, you have to get and deserialize all of them at the same time. This PR separates the output of `execute` and `execute_async` into a list of `CompiledDAGRef` or `CompiledDAGFuture` objects when the output is wrapped by `MultiOutputNode`. This is particularly useful for vLLM tensor parallelism: since all shards return the same results, we only need to fetch the result from one of the workers. Closes ray-project#46908.

Signed-off-by: jeffreyjeffreywang <[email protected]>
Signed-off-by: Jeffrey Wang <[email protected]>
Co-authored-by: jeffreyjeffreywang <[email protected]>
Signed-off-by: ujjawal-khare <[email protected]>
Why are these changes needed?

Currently, if `MultiOutputNode` is used to wrap a DAG's output, you get back a single `CompiledDAGRef` or `CompiledDAGFuture`, depending on whether `execute` or `execute_async` is invoked, that points to a list of all of the outputs. To retrieve one of the outputs, you have to get and deserialize all of them at the same time.

This PR separates the output of `execute` and `execute_async` into a list of `CompiledDAGRef` or `CompiledDAGFuture` objects when the output is wrapped by `MultiOutputNode`. This is particularly useful for vLLM tensor parallelism: since all shards return the same results, we only need to fetch the result from one of the workers.

Related issue number

Resolves #46908
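As a rough model of what this PR enables, the sketch below uses plain Python with hypothetical names (`FakeRef`, `execute`), not Ray's actual API: each output channel gets its own lightweight ref, so fetching one output does not deserialize the others.

```python
import json

class FakeRef:
    """Hypothetical stand-in for a per-channel CompiledDAGRef."""
    def __init__(self, serialized, channel_index, stats):
        self._serialized = serialized      # shared list of serialized outputs
        self._channel_index = channel_index
        self._stats = stats                # shared counter for demonstration

    def get(self):
        # Only this channel's payload is deserialized.
        self._stats["deserialized"] += 1
        return json.loads(self._serialized[self._channel_index])

def execute(values):
    """Model of execute() returning one ref per MultiOutputNode output."""
    stats = {"deserialized": 0}
    serialized = [json.dumps(v) for v in values]
    return [FakeRef(serialized, i, stats) for i in range(len(values))], stats

refs, stats = execute([1, -1])
assert refs[0].get() == 1          # fetch just one shard's result
assert stats["deserialized"] == 1  # the other output was never deserialized
```

In the vLLM tensor-parallelism case described above, this is why fetching from a single worker is enough: the refs for the other shards can simply go unread.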
Checks

- I've signed off every commit (`git commit -s`) in this PR.
- I've run `scripts/format.sh` to lint the changes in this PR.
- If I introduced a method in Tune, I've added it in `doc/source/tune/api/` under the corresponding `.rst` file.