[core][adag] Separate the outputs of execute and execute_async to multiple refs or futures to allow clients to retrieve them one at a time (#46908) #47305

jeffreyjeffreywang · 2024-08-23T22:08:59Z

Why are these changes needed?

Currently, if MultiOutputNode is used to wrap a DAG's output, you get back a single CompiledDAGRef or CompiledDAGFuture, depending on whether execute or execute_async is invoked, that points to a list of all of the outputs. To retrieve one of the outputs, you have to get and deserialize all of them at the same time.

This PR separates the output of execute and execute_async into a list of CompiledDAGRef or CompiledDAGFuture when the output is wrapped by MultiOutputNode. This is particularly useful for vLLM tensor parallelism: since all shards return the same result, we only need to fetch the result from one of the workers.
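To make the motivation concrete, here is a minimal sketch in plain Python. It does not use Ray: `OutputRef` and the `execute` function below are hypothetical stand-ins for a per-output ref and for the compiled DAG's execute call, and `pickle` stands in for whatever serialization the real channels use. It only illustrates why one ref per output lets a client pay for a single deserialization.

```python
import pickle


class OutputRef:
    """Toy stand-in for a per-output ref: keeps serialized bytes until get()."""

    deserialize_count = 0  # class-wide counter, for illustration only

    def __init__(self, payload: bytes):
        self._payload = payload

    def get(self):
        # Deserialization happens only for the ref that is actually read.
        OutputRef.deserialize_count += 1
        return pickle.loads(self._payload)


def execute(shard_outputs):
    """Return one ref per output instead of a single ref to the whole list."""
    return [OutputRef(pickle.dumps(out)) for out in shard_outputs]


# Four tensor-parallel shards that all produce the same result.
refs = execute([{"token_ids": [1, 2, 3]}] * 4)
result = refs[0].get()  # only the first output is deserialized
print(result, OutputRef.deserialize_count)
```

With a single ref wrapping the whole list, the same read would have deserialized all four payloads.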

Related issue number

Resolves #46908

Checks

I've signed off every commit(by using the -s flag, i.e., git commit -s) in this PR.
I've run scripts/format.sh to lint the changes in this PR.
I've included any doc changes needed for https://docs.ray.io/en/master/.
- I've added any new APIs to the API Reference. For example, if I added a
  method in Tune, I've added it in doc/source/tune/api/ under the
  corresponding .rst file.
I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
Testing Strategy
- Unit tests
- Release tests
- This PR is not tested :(


jeffreyjeffreywang · 2024-08-23T22:13:24Z

Hi Ray developers, I would love some help validating the test_execution_schedule_gpu.py tests, as they require a GPU device, which I don't have.

ruisearch42

First pass. Looks pretty nice. Thanks for the contribution!


rkooo567

I think generally the approach makes sense! But I am seeing 2 problems.

- It is a backward-incompatible change and will break vLLM. How complicated is it to support MultiOutputNode(_multiple_return_refs=True)? I am not suggesting we do it, but I'd like to measure the complexity.
- I think that although we don't call ray.get, ray.get is still called and deserialization still happens when the DAG ref is deallocated, because of this code (ray/python/ray/experimental/compiled_dag_ref.py, line 80 in 4aea49f):

      if not self._ray_get_called:

  I think we should improve this part of the code so that it does not deserialize. This part is a little tricky to handle, so we can probably handle it as a follow-up.
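The second concern can be illustrated with a small toy class. This is a sketch of the pattern only, not the actual CompiledDAGRef code: `ToyRef`, its `_get_called` flag, and the `log` list are hypothetical names, and the immediate destructor call relies on CPython reference counting.

```python
class ToyRef:
    """Toy ref that drains itself on deallocation if it was never read."""

    def __init__(self, value, log):
        self._value = value
        self._log = log
        self._get_called = False

    def get(self):
        self._get_called = True
        self._log.append(f"deserialized {self._value}")
        return self._value

    def __del__(self):
        # Same shape as the guard quoted above: an unread ref is read on
        # deallocation, so the deserialization cost is paid anyway.
        if not self._get_called:
            self.get()


log = []
used, unused = ToyRef("a", log), ToyRef("b", log)
used.get()
del used, unused  # CPython runs __del__ as soon as the refcount hits zero
print(log)
```

The ref that was never read still shows up in the log, which is the behavior the follow-up would need to avoid.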

python/ray/dag/compiled_dag_node.py

rkooo567 · 2024-08-28T05:36:14Z

python/ray/dag/compiled_dag_node.py

@@ -1564,42 +1570,48 @@ def _execute_until(

        TODO(rui): catch the case that user holds onto the CompiledDAGRefs
        """
-        from ray.dag import DAGContext
+        if self._max_execution_index < execution_index:


why do we need this if now?

An ImportError occurs with the original code when an unused CompiledDAGRef (one on which get() was never called) is destructed. Here is a minimal repro for the original behavior:

    from ray.dag import InputNode, MultiOutputNode

    # Foo and Bar are Ray actor classes (their definitions are not shown here).
    foo = Foo.remote()
    bar = Bar.remote()
    with InputNode() as inp:
        dag = MultiOutputNode([foo.increment.bind(inp), bar.decrement.bind(inp)])
    dag = dag.experimental_compile()
    ref1 = dag.execute(1)
    ref2 = dag.execute(1)
    assert ref2.get() == [2, -2]
    dag.teardown()
    # When exiting the program, ref1.__del__ is invoked.
    # Since it has not been called with get(), ref1.get() will
    # be invoked subsequently. In _execute_until, even though
    # the DAG won't be executed again, we still attempt to
    # import a library, namely ray.dag.DAGContext. Since
    # Python is shutting down, an ImportError occurs
    # (ImportError: sys.meta_path is None, Python is likely
    # shutting down).

The same problem persists with the current behavior if the if clause is not introduced. Another benefit of the if clause is that the library is imported and the timeout calculated only when necessary: timeout is only used when self._max_execution_index < execution_index is True, so we shouldn't import the library when self._max_execution_index == execution_index. I should keep the following while loop out of the if clause, though.

As I write this, I found that if the last CompiledDAGRef is unused, the ImportError persists even with the if clause. Is there guidance for Ray users to always invoke get() on every CompiledDAGRef? If not, this may be a bug that needs to be addressed in a separate PR. WDYT?

Here is a minimal repro:

    from ray.dag import InputNode, MultiOutputNode

    # Foo and Bar are Ray actor classes (their definitions are not shown here).
    foo = Foo.remote()
    bar = Bar.remote()
    with InputNode() as inp:
        dag = MultiOutputNode([foo.increment.bind(inp), bar.decrement.bind(inp)])
    dag = dag.experimental_compile()
    ref1 = dag.execute(1)
    ref2 = dag.execute(1)
    assert ref1.get() == [1, -1]
    dag.teardown()
    # Upon destruction, the DAG will be executed until the latest index. Again,
    # we attempt to import DAGContext during program exit.

For now, I think it's worth keeping the if clause and filing a separate bug to track the dangling CompiledDAGRef issue.

I think this issue I mentioned is correlated with your second point. Is there any reason why we want to avoid execution result leak (which is why we currently execute the DAG until the latest index and get() all results)?
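The effect of the guard discussed in this thread can be sketched without Ray. Everything below is a hypothetical toy: `ToyCompiledDAG`, `_load_timeout`, and the `context_loads` counter are made-up names, and `_load_timeout` merely stands in for the deferred `DAGContext` import and timeout lookup.

```python
class ToyCompiledDAG:
    """Toy model: load the context only when there is something to read."""

    def __init__(self):
        self._max_execution_index = -1
        self.context_loads = 0

    def _load_timeout(self):
        # Stand-in for the deferred import and timeout calculation.
        self.context_loads += 1
        return 10.0

    def _execute_until(self, execution_index):
        timeout = None
        if self._max_execution_index < execution_index:
            timeout = self._load_timeout()
        # The loop stays outside the guard; it is a no-op when the
        # requested index has already been reached.
        while self._max_execution_index < execution_index:
            assert timeout is not None
            self._max_execution_index += 1  # pretend to read one result


dag = ToyCompiledDAG()
dag._execute_until(1)  # reads indices 0 and 1, loads the context once
dag._execute_until(1)  # already reached: no load
dag._execute_until(0)  # already reached: no load
print(dag.context_loads)
```

As noted above, a guard of this kind does not help when the last ref is the unused one, because the guarded branch is then still taken at shutdown.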

rkooo567 · 2024-08-28T05:38:08Z

python/ray/dag/compiled_dag_node.py

-            if self._max_execution_index + 1 == execution_index:
-                # Directly fetch and return without buffering
+            while self._max_execution_index < execution_index:
+                if len(self._result_buffer) >= self._max_buffered_results:


I think this is wrong now. We should do len(self._result_buffer) * num_output_channels

can you also add a unit test? (or modify existing test)

I think this is okay, it matches the previous semantics. Maybe the naming is not good - it should be the number of DAG executions buffered, not the number of individual results buffered.

Agree. len(self._result_buffer) indicates the number of DAG executions while len(self._result_buffer) * num_output_channels represents the total number of outputs. I will adjust the naming of _max_buffered_results to _max_buffered_executions (any other suggestions?) and add a unit test.

Actually, I think we need to change max_buffered_results to max_buffered_executions if we were to change _max_buffered_results to _max_buffered_executions. Since there are quite a few references to max_buffered_results in the repo (including developer APIs and python/ray/dag/context.py which doesn't have much relevance to the changes in this PR), I'd suggest to not modify the naming now. We can track it as a bug and address it in a follow-up.

As for the unit test, I'll add one tomorrow.

Done adding a unit test for cases when there are multiple outputs. Now, if not all results from an execution index are fetched with get(), that execution index will still count towards the number of buffered results. Let me know if this sounds okay to you!
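The buffering semantics described in this thread can be sketched as follows. This is an assumption-laden toy, not the code in compiled_dag_node.py: `ResultBuffer`, `put`, and `pop` are hypothetical names chosen to show that the limit counts DAG executions, and that a partially read execution still occupies a slot.

```python
class ResultBuffer:
    """Toy buffer whose limit counts executions, not individual outputs."""

    def __init__(self, max_buffered_executions):
        self._max = max_buffered_executions
        self._buffer = {}  # execution_index -> {channel_index: value}

    def put(self, execution_index, outputs):
        if len(self._buffer) >= self._max:
            raise ValueError("too many buffered executions")
        self._buffer[execution_index] = dict(enumerate(outputs))

    def pop(self, execution_index, channel_index):
        outputs = self._buffer[execution_index]
        value = outputs.pop(channel_index)
        if not outputs:
            # The slot is released only once every output has been read.
            del self._buffer[execution_index]
        return value

    def __len__(self):
        return len(self._buffer)


buf = ResultBuffer(max_buffered_executions=2)
buf.put(0, [1, -1])
buf.put(1, [2, -2])
first = buf.pop(0, 0)
size_after_partial_read = len(buf)  # execution 0 still counts
second = buf.pop(0, 1)
size_after_full_read = len(buf)
print(first, second, size_after_partial_read, size_after_full_read)
```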

python/ray/dag/compiled_dag_node.py

jeffreyjeffreywang · 2024-08-29T07:05:23Z

Hey @rkooo567, thank you so much for the review. Let me experiment a few things you suggested and get back to you tomorrow.

stephanie-wang

Thanks for this contribution!

python/ray/dag/compiled_dag_node.py

stephanie-wang · 2024-08-29T21:50:15Z

python/ray/dag/compiled_dag_node.py

-            if self._max_execution_index + 1 == execution_index:
-                # Directly fetch and return without buffering
+            while self._max_execution_index < execution_index:
+                if len(self._result_buffer) >= self._max_buffered_results:


I think this is okay, it matches the previous semantics. Maybe the naming is not good - it should be the number of DAG executions buffered, not the number of individual results buffered.


jeffreyjeffreywang · 2024-08-30T07:19:57Z

Thanks for all of the feedback. I'll keep addressing comments tomorrow and over the weekend. Will ping you as soon as the revision is ready for review.

jeffreyjeffreywang · 2024-08-30T07:28:25Z

It is backward incompatible changes and will break vllm. How complicated is it to support MultiOutputNode(_multiple_return_refs=True)? I am not suggesting to do it, but I'd like to measure the complexity.

@rkooo567, I think this should be quite straightforward to support. Essentially, we can ask clients to specify multiple_return_refs=True when they want a list of refs rather than a single ref wrapping all outputs. execute and execute_async can check whether MultiOutputNode is used to wrap the output and whether multiple_return_refs is set to True. If both conditions are satisfied, we return a list of refs. Otherwise, we fall back to the original behavior. Could you please help me understand vLLM's use case so I can check whether this workaround is sufficient? Also, please let me know if there's any corner case I'm missing.
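The opt-in behavior proposed here could look roughly like the sketch below. It is hypothetical throughout: `ToyRef`, the free function `execute`, and the `is_multi_output` and `multiple_return_refs` parameters are illustrative names rather than Ray's API, and later comments in this thread state that the merged change always returns a list for MultiOutputNode instead.

```python
from dataclasses import dataclass
from typing import Any, List, Union


@dataclass
class ToyRef:
    """Toy ref that simply wraps a value."""

    value: Any

    def get(self) -> Any:
        return self.value


def execute(
    outputs: List[Any],
    is_multi_output: bool,
    multiple_return_refs: bool = False,
) -> Union[ToyRef, List[ToyRef]]:
    """Return a list of refs only when both conditions hold."""
    if is_multi_output and multiple_return_refs:
        return [ToyRef(out) for out in outputs]
    # Fallback: a single ref, wrapping the whole list for multi-output DAGs.
    return ToyRef(outputs if is_multi_output else outputs[0])


legacy = execute([2, -2], is_multi_output=True)
split = execute([2, -2], is_multi_output=True, multiple_return_refs=True)
print(legacy.get(), [ref.get() for ref in split])
```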


jeffreyjeffreywang · 2024-09-02T06:39:20Z

Please hold off reviewing as I'm now working on making it backward compatible. Thanks!


jeffreyjeffreywang · 2024-09-03T00:03:42Z

The change is now backward compatible. Working on unit tests and addressing the remaining comments.


stephanie-wang · 2024-09-11T21:24:42Z

Thank you so much @stephanie-wang for all of the reviews. I'm looking forward to contributing more. 😄

Would you mind taking a final look at my latest update? I resolved some merge conflicts with SangBin's recent changes.

It looks good to me! Will merge once tests pass.

rkooo567 · 2024-09-11T23:29:12Z

also to be clear @stephanie-wang. @jeffreyjeffreywang it will be a breaking change right?

jeffreyjeffreywang · 2024-09-12T01:10:40Z

also to be clear @stephanie-wang. @jeffreyjeffreywang it will be a breaking change right?

Correct, this will be breaking. execute and execute_async on adag with MultiOutputNode will always return a list of refs/futures.
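For the async side, the calling pattern after this change can be sketched with plain asyncio. This is not Ray code: the `execute_async` coroutine below is a hypothetical stand-in that returns one already-resolved `asyncio.Future` per output, just to show a caller awaiting a single element of the returned list.

```python
import asyncio


async def execute_async(shard_outputs):
    """Toy stand-in: return one future per output rather than one for all."""
    loop = asyncio.get_running_loop()
    futures = []
    for out in shard_outputs:
        fut = loop.create_future()
        fut.set_result(out)  # resolved immediately to keep the sketch short
        futures.append(fut)
    return futures


async def main():
    futs = await execute_async(["same result"] * 4)
    # Only the first future is awaited; the others are left untouched.
    return await futs[0], len(futs)


first, count = asyncio.run(main())
print(first, count)
```

Callers that previously awaited one future and indexed into the resulting list would need to index first and await second, which is the breaking part.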

rkooo567 · 2024-09-13T15:54:53Z

makes sense! Also @jeffreyjeffreywang do we have issues for all follow up problems?

- Deserialization still happens in the destructor of CompiledDAGRef/CompiledDAGFuture
- ray.get doesn't support fetching results from a list of CompiledDAGRef
- There isn't sufficient ADAG test coverage for async cases (quite comprehensive for sync cases though)

jeffreyjeffreywang · 2024-09-13T22:47:54Z

makes sense! Also @jeffreyjeffreywang do we have issues for all follow up problems?

Yup, the second one is addressed in this PR. #47662 and #47614 track the remaining follow-up problems.

rkooo567 · 2024-09-16T15:40:54Z

@jeffreyjeffreywang #47684

Actually, is this a known issue? Can you take a look?

jeffreyjeffreywang · 2024-09-16T17:51:13Z

@rkooo567 Yeah, I also realized this may be a problem over the weekend. I think we need to support calling ray.wait on CompiledDAGFuture and a list of CompiledDAGFuture as well.


jeffreyjeffreywang added 7 commits August 19, 2024 00:20

- 99d24d5: Separate the output of ADAG synchronous execute() API to a list of CompiledDAGRef
- 3e277bd: Separate ADAG execute_async() API output into a list of CompiledDAGFutures
- 855e445: Fixing tests in progress
- 8cb739d: Fix ADAG tests (in progress); execute_async can still return a single CompiledDAGFuture
- 05fa1b6: Reformat
- 30a398b: Fix ADAG tests
- 1111c3a: Add async multi-output test case

All seven commits are signed off by jeffreyjeffreywang.

anyscalesam added compiled-graph triage Needs triage (eg: priority, bug/not-bug, and owning component) core Issues that should be addressed in Ray Core labels Aug 26, 2024

ruisearch42 reviewed Aug 26, 2024


stephanie-wang self-assigned this Aug 27, 2024

CR feedback

0853a20

Signed-off-by: jeffreyjeffreywang <[email protected]>

rkooo567 reviewed Aug 28, 2024


stephanie-wang reviewed Aug 29, 2024


stephanie-wang added the @author-action-required The PR author is responsible for the next step. Remove tag to send back to the reviewer. label Aug 29, 2024

CR feedback (WIP)

66654a7

Signed-off-by: jeffreyjeffreywang <[email protected]>

jeffreyjeffreywang added 3 commits September 2, 2024 01:07

- f55b6c5: Indexing consistent with dag_output_channels; stop using dag_output_fetchers for synchronous case
- cfc8db1: Get rid of dag_output_fetchers entirely
- 65edc3f: Return a list of results if MultiOutputNode is used even though there is only a single output channel

All three commits are signed off by jeffreyjeffreywang.

jeffreyjeffreywang added 3 commits September 2, 2024 22:08

- d4a841b: Backward compatibility (WIP)
- 821eaca: Merge remote-tracking branch 'upstream/master' into adag-multi-output
- dca6153: Modify deserialization logic; working on tests

All three commits are signed off by jeffreyjeffreywang.

Adjust tests

59c08f6

Signed-off-by: jeffreyjeffreywang <[email protected]>

stephanie-wang enabled auto-merge (squash) September 11, 2024 21:25

github-actions bot added the go add ONLY when ready to merge, run all tests label Sep 11, 2024

stephanie-wang merged commit a6f923b into ray-project:master Sep 11, 2024
7 checks passed

jeffreyjeffreywang mentioned this pull request Sep 12, 2024

[adag] Avoid deserialization during CompiledDAGRef's deallocation #47614

Open

stephanie-wang mentioned this pull request Sep 22, 2024

[experimental][adag] ray.get doesn't support list of aDAG refs #46808

Closed
