Skip to content

Commit

Permalink
merged with dev
Browse files Browse the repository at this point in the history
Signed-off-by: Maroun Touma <[email protected]>
  • Loading branch information
touma-I committed Dec 10, 2024
2 parents 0c38a4c + c9b49db commit fd0b261
Show file tree
Hide file tree
Showing 20 changed files with 1,970 additions and 7 deletions.
4 changes: 4 additions & 0 deletions .gitignore
Original file line number Diff line number Diff line change
Expand Up @@ -146,3 +146,7 @@ dmypy.json

# Ignore artifacts directory created by the `make save-images` target
artifacts/

*.lock

*.db
6 changes: 3 additions & 3 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -79,8 +79,8 @@ conda install gxx_linux-64
Next, install the data prep toolkit library. This library installs both the python and ray versions of the transforms. For better management of dependencies, it is recommended to install the same tagged version of both the library and the transform.

```bash
pip3 install 'data-prep-toolkit[ray]==0.2.2.dev1'
pip3 install 'data-prep-toolkit-transforms[ray,all]==0.2.2.dev1'
pip3 install 'data-prep-toolkit[ray]==0.2.3.dev0'
pip3 install 'data-prep-toolkit-transforms[ray,all]==0.2.3.dev1'
pip3 install jupyterlab ipykernel ipywidgets

## install custom kernel
Expand Down Expand Up @@ -144,7 +144,7 @@ The matrix below shows the the combination of modules and supported runtimes. Al
| [Web to Parquet](transforms/universal/web2parquet/README.md) | :white_check_mark: | | | |
| **Universal (Code & Language)** | | | | |
| [Exact dedup filter](transforms/universal/ededup/ray/README.md) | :white_check_mark: | :white_check_mark: | | :white_check_mark: |
| [Fuzzy dedup filter](transforms/universal/fdedup/ray/README.md) | | :white_check_mark: | | :white_check_mark: |
| [Fuzzy dedup filter](transforms/universal/fdedup/ray/README.md) | :white_check_mark: | :white_check_mark: | :white_check_mark: | :white_check_mark: |
| [Unique ID annotation](transforms/universal/doc_id/ray/README.md) | :white_check_mark: | :white_check_mark: | :white_check_mark: | :white_check_mark: |
| [Filter on annotations](transforms/universal/filter/python/README.md) | :white_check_mark: | :white_check_mark: | :white_check_mark: | :white_check_mark: |
| [Profiler](transforms/universal/profiler/ray/README.md) | :white_check_mark: | :white_check_mark: | :white_check_mark: | :white_check_mark: |
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -15,13 +15,19 @@
from typing import Any

import ray
from ray.experimental.state.api import list_actors
from data_processing.utils import GB, UnrecoverableException
from ray.actor import ActorHandle
from ray.exceptions import RayError
from ray.experimental.state.api import list_actors
from ray.util.actor_pool import ActorPool


# This value matches the constant `RAY_MAX_LIMIT_FROM_API_SERVER` defined in the ray source code here:
# https://github.com/ray-project/ray/blob/569f7df9067c5654fb57ba7bc4792b3ba5aaa846/python/ray/util/state/common.py#L50-L53

RAY_MAX_ACTOR_LIMIT = 10000


class RayUtils:
"""
Class implementing support methods for Ray execution
Expand Down Expand Up @@ -109,11 +115,13 @@ def operator() -> ActorHandle:
time.sleep(creation_delay)
return clazz.options(**actor_options).remote(params)

cls_name = clazz.__class__.__name__.replace('ActorClass(', '').replace(')','')
cls_name = clazz.__class__.__name__.replace("ActorClass(", "").replace(")", "")
actors = [operator() for _ in range(n_actors)]
for i in range(120):
time.sleep(1)
alive = list_actors(filters=[("class_name", "=", cls_name), ("state", "=", "ALIVE")])
alive = list_actors(
filters=[("class_name", "=", cls_name), ("state", "=", "ALIVE")], limit=RAY_MAX_ACTOR_LIMIT
)
if len(actors) == len(alive):
return actors
# failed - raise an exception
Expand Down
Loading

0 comments on commit fd0b261

Please sign in to comment.