-
Notifications
You must be signed in to change notification settings - Fork 512
Add Modal orchestrator with step operator and orchestrator flavors #3733
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: develop
Are you sure you want to change the base?
Conversation
Important Review skippedAuto reviews are disabled on this repository. Please check the settings in the CodeRabbit UI or the You can disable this status message by setting the Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
SupportNeed help? Create a ticket on our support page for assistance with any issues or questions. Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
Documentation and Community
|
- Adds new Modal orchestrator flavor for serverless pipeline execution - Implements optimized execution modes: pipeline (default) and per_step - Supports GPU/CPU resource configuration with intelligent defaults - Features persistent apps with warm containers for fast execution - Includes comprehensive documentation and examples - Simplifies execution model by removing redundant single_function mode 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <[email protected]>
Documentation Link Check Results✅ Absolute links check passed |
✅ Branch tenant has been deployed! Access it at: https://staging.cloud.zenml.io/workspaces/feature-modal-orchestrator/projects |
elif ( | ||
log_age < 300 | ||
): # Only show logs from last 5 minutes | ||
# This log is recent enough to likely be ours | ||
logger.info(f"{log_msg}") | ||
# Else: skip old logs |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This bit seems maybe like it could use more attention. Also are you sure that we wouldn't have time zone mismatches here etc?
else: | ||
# Fallback to first step's resource settings if no pipeline-level resources | ||
if deployment.step_configurations: | ||
first_step = list(deployment.step_configurations.values())[0] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what is the first step is unrepresentative, though?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should check / just make this really explicit in the docs that this is how we assume this
# Register the orchestrator with explicit credentials | ||
zenml orchestrator register <ORCHESTRATOR_NAME> \ | ||
--flavor=modal \ | ||
--token=<MODAL_TOKEN> \ | ||
--workspace=<MODAL_WORKSPACE> \ | ||
--synchronous=true |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this should use --token-id
and --token-secret
separately as per the code?
### Authentication with different environments | ||
|
||
For production deployments, you can specify different Modal environments: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe could have a little info box in this section (or maybe even above, linking down here) to say that you might want to have two different stacks, each associated with a different modal environment, one for prod and the other for development etc etc.
log_stream_active.set() | ||
start_time = time.time() | ||
|
||
def stream_logs() -> None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Function in function smells a bit wrong and also wondering if we should instead use their Python SDK to stream the logs? https://github.com/modal-labs/modal-client/blob/4177d0b994ac69e01ada7d7a96655c9dcaae570e/modal/cli/utils.py#L24
Possibly something for down the line, though the func-in-func seems off.
if TYPE_CHECKING: | ||
from zenml.integrations.modal.flavors.modal_orchestrator_flavor import ( | ||
ModalOrchestratorConfig, | ||
ModalOrchestratorSettings, | ||
) | ||
|
||
from zenml.integrations.modal.flavors.modal_orchestrator_flavor import ( | ||
ModalExecutionMode, | ||
) | ||
|
||
if TYPE_CHECKING: | ||
from zenml.models import PipelineDeploymentResponse, PipelineRunResponse | ||
from zenml.models.v2.core.pipeline_deployment import PipelineDeploymentBase |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
combine the 'if TYPE_CHECKING' parts?
|
||
The Modal orchestrator supports two execution modes: | ||
|
||
1. **`pipeline` (default)**: Runs the entire pipeline in a single Modal function for maximum speed and cost efficiency |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure I understand why this pipeline
option is max speed. Isn't it running everything sequentially in the same container? Wouldn't running things in parallel in separate Modal function calls run faster?
src/zenml/integrations/modal/flavors/modal_orchestrator_flavor.py
Outdated
Show resolved
Hide resolved
Using the ZenML `modal` integration, you can orchestrate and scale your ML pipelines on [Modal's](https://modal.com/) serverless cloud platform with minimal setup and maximum efficiency. | ||
|
||
The Modal orchestrator is designed for speed and cost-effectiveness, running entire pipelines in single serverless functions to minimize cold starts and optimize resource utilization. | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe some representative screenshot of the Modal UI in here to make the docs a bit friendlier?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i think its fine without
- Extract nested log streaming function into ModalLogStreamer class for better code organization - Remove unreliable timezone-based log filtering that could miss logs due to clock skew - Implement smarter resource fallback: use highest requirements across all steps instead of potentially unrepresentative first step - Add logging for resource selection decisions to improve debugging - Fix function-in-function code smell identified in PR review 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <[email protected]>
- Combine duplicate TYPE_CHECKING blocks into single import section - Improve import organization and reduce redundancy - Maintain all existing functionality while improving code structure 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <[email protected]>
- Import MODAL_ORCHESTRATOR_FLAVOR constant from central location to avoid duplication - Update requirements to modal>=1 after testing compatibility with both orchestrator and step operator - Remove unnecessary utils import that was only for mypy discovery - Maintain consistent import patterns across Modal integration files 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <[email protected]>
Based on PR review feedback: - Fix token authentication examples to use --token-id and --token-secret - Add "When NOT to use it" section with clear tradeoffs and alternatives - Add info boxes for environment separation best practices and cost implications - Document Modal vs Step Operator differences with usage recommendations - Add GPU base image requirements and CUDA compatibility warnings - Clarify execution modes: "pipeline" mode reduces overhead vs enables parallelism - Document resource fallback behavior and warming window defaults - Add container warming cost implications with specific guidance - Remove tracking pixel per review request - Improve overall documentation clarity and completeness 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, assuming tests pass.
src/zenml/integrations/modal/flavors/modal_orchestrator_flavor.py
Outdated
Show resolved
Hide resolved
token_id: Optional[str] = SecretField(default=None) | ||
token_secret: Optional[str] = SecretField(default=None) | ||
workspace: Optional[str] = None | ||
environment: Optional[str] = None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Already exists on superclass
Returns: | ||
True since the orchestrator waits for completion. | ||
""" | ||
return True |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is wrong. All other orchestrators have the synchronous attribute in the config and return that value here I think?
token_id: Optional[str] = SecretField(default=None) | ||
token_secret: Optional[str] = SecretField(default=None) | ||
workspace: Optional[str] = None | ||
environment: Optional[str] = None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Defined on super class
try: | ||
import modal | ||
except ImportError: | ||
modal = None # type: ignore |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
? And then it crashes with a worse error below?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
agreed
logger.info( | ||
f"Running step '{step_name}' remotely (pipeline run: {pipeline_run_id})" | ||
) | ||
sys.stdout.flush() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
?
Exception: If step execution fails. | ||
""" | ||
# Get pipeline run ID for debugging/logging | ||
pipeline_run_id = os.environ.get("ZENML_PIPELINE_RUN_ID", "unknown") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is this, and who sets it?
) | ||
|
||
# Create the configuration and run the step | ||
config = StepEntrypointConfiguration(arguments=args) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This class is not meant to be used in that way, no idea if this will lead to issues
""" | ||
# Use the standard containerized orchestrator build logic | ||
# This ensures ZenML builds the image with all pipeline code | ||
return super().get_docker_builds(deployment) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove
if modal is None: | ||
raise RuntimeError( | ||
"Required dependencies not installed. Please install with: pip install modal" | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if modal is None: | |
raise RuntimeError( | |
"Required dependencies not installed. Please install with: pip install modal" | |
) | |
if modal is None: | |
raise ImportError( | |
"Modal is not installed. Please install with: pip install 'modal>=1.0'" | |
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No this is intentional
Co-authored-by: Michael Schuster <[email protected]>
…ent` to `modal_environment` in settings and configs.- Update references and usage correspondingly.- Refactor logging in `ModalLogStreamer` class.<commit message>
|
||
# Pass pipeline run ID for proper isolation (following other orchestrators' pattern) | ||
if placeholder_run: | ||
environment["ZENML_PIPELINE_RUN_ID"] = str(placeholder_run.id) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@schustmi can i do this? Does this work?
Describe changes
Add Modal Orchestrator Integration
This PR adds a new Modal orchestrator to the ZenML integrations, enabling users
to run complete ML pipelines on Modal's serverless cloud infrastructure with
optimized performance and cost efficiency.
What does this PR do?
pipelines on Modal's cloud platform
maximum speed
for faster execution
intelligent defaults
Implementation Details
File Structure
src/zenml/integrations/modal/
├── orchestrators/ # New directory
│ ├── init.py
│ └── modal_orchestrator.py # Main orchestrator implementation
├── flavors/
│ ├── modal_orchestrator_flavor.py # New flavor definition
│ └── init.py # Updated exports
└── init.py # Updated with orchestrator
registration
Key Features
performance
CPU, 64GB RAM)
starts
compatibility
Usage Example
Why this approach?
inter-step overhead
integration
Breaking Changes
None - this is a new integration that doesn't affect existing functionality.
Dependencies
Note: This orchestrator follows the same patterns as other ZenML orchestrators
(GCP Vertex, Kubernetes) and integrates seamlessly with the existing ZenML
stack architecture.
Note: I also updated the step operator logic to unify it
Pre-requisites
Please ensure you have done the following:
develop
and the open PR is targetingdevelop
. If your branch wasn't based on develop read Contribution guide on rebasing branch to develop.Types of changes