diff --git a/docs/api/cli.md b/docs/api/cli.md
index f20bacb..acf23d9 100644
--- a/docs/api/cli.md
+++ b/docs/api/cli.md
@@ -18,12 +18,15 @@ $ hog [OPTIONS] COMMAND [ARGS]...
 * `run`: Run a Python script on a Globus Compute...
 * `init`: Create a new groundhog script with PEP 723...
 * `add`: Add dependencies or update Python version...
-* `remove`: Remove dependencies or endpoint...
+* `remove`: Remove dependencies from a script's PEP...
 
 ## `hog run`
 
 Run a Python script on a Globus Compute endpoint.
 
+Use -- to pass arguments to parameterized harnesses:
+    hog run script.py harness -- arg1 --option=value
+
 **Usage**:
 
 ```console
@@ -59,6 +62,7 @@ $ hog init [OPTIONS] FILENAME
 
 * `-p, --python TEXT`: Python version specifier (e.g., --python '>=3.11' or -p 3.11)
 * `-e, --endpoint TEXT`: Template config for endpoint with known fields, e.g. --endpoint my-endpoint-uuid. Can also be one of the following pre-configured names: anvil, anvil.gpu, tutorial (e.g. --endpoint anvil.gpu). Can specify multiple.
+* `--log-level TEXT`: Set logging level (DEBUG, INFO, WARNING, ERROR)
 * `--help`: Show this message and exit.
 
 ## `hog add`
@@ -86,7 +90,7 @@ $ hog add [OPTIONS] SCRIPT [PACKAGES]...
 
 ## `hog remove`
 
-Remove dependencies or endpoint configurations from a script's PEP 723 metadata.
+Remove dependencies from a script's PEP 723 metadata.
 
 **Usage**:
 
@@ -102,4 +106,5 @@ $ hog remove [OPTIONS] SCRIPT [PACKAGES]...
 **Options**:
 
 * `-e, --endpoint TEXT`: Remove endpoint or variant configuration (e.g., anvil, anvil.gpu, my_endpoint). Known endpoints: anvil, anvil.gpu, tutorial. Can specify multiple. Note: Removing a base endpoint (e.g., anvil) removes all its variants. Removing a specific variant (e.g., anvil.gpu) leaves the base and other variants intact.
+* `--log-level TEXT`: Set logging level (DEBUG, INFO, WARNING, ERROR)
 * `--help`: Show this message and exit.
diff --git a/docs/concepts/functions-and-harnesses.md b/docs/concepts/functions-and-harnesses.md
new file mode 100644
index 0000000..002c01f
--- /dev/null
+++ b/docs/concepts/functions-and-harnesses.md
@@ -0,0 +1,116 @@
+# Functions and Harnesses
+
+Groundhog scripts use two decorator types: `@hog.function()` for remote-executable code, and `@hog.harness()` for local orchestration.
+
+**TL;DR:** Functions are the core abstraction for running remote or isolated code. Harnesses are a convenience for orchestrating functions from the CLI.
+
+## Functions
+
+A **function** is a unit of work that runs remotely on an HPC cluster. Decorate any Python function with `@hog.function()` to enable remote execution:
+
+```python
+@hog.function(endpoint="anvil")
+def train_model(dataset: str, epochs: int) -> dict:
+    """This code runs on the remote HPC cluster."""
+    import torch
+    # ... training logic ...
+    return {"accuracy": 0.95}
+```
+
+Functions provide four execution modes:
+
+| Method | Where it runs | Behavior |
+|--------|---------------|----------|
+| `func(args)` | Local process | Direct call, no serialization |
+| `func.remote(args)` | HPC cluster | Blocks until complete, returns result |
+| `func.submit(args)` | HPC cluster | Returns immediately with `GroundhogFuture` |
+| `func.local(args)` | Local subprocess | Isolated environment, useful for testing |
+
+The `.remote()`, `.submit()`, and `.local()` methods serialize your arguments, send your entire script to the target environment, and execute in an isolated Python environment managed by uv.
+
+## Harnesses
+
+A **harness** is an entry point that orchestrates function calls. Harnesses run locally on your machine and coordinate remote execution:
+
+```python
+@hog.harness()
+def main():
+    """This code runs locally, orchestrating remote work."""
+    result = train_model.remote("imagenet", epochs=100)
+    print(f"Training complete: {result}")
+```
+
+Run a harness with the `hog run` command:
+
+```bash
+hog run script.py             # Runs the 'main' harness
+hog run script.py my_harness  # Runs a specific harness
+```
+
+### Parameterized harnesses
+
+Harnesses can accept parameters that map to CLI arguments. This makes harnesses reusable without editing code:
+
+```python
+@hog.harness()
+def train(dataset: str, epochs: int = 10, debug: bool = False):
+    """Configurable training harness."""
+    if debug:
+        print(f"Training on {dataset} for {epochs} epochs")
+    result = train_model.remote(dataset, epochs)
+    return result
+```
+
+Pass arguments after a `--` separator:
+
+```bash
+# Positional argument + options
+hog run script.py train -- imagenet --epochs=50
+
+# With debug flag
+hog run script.py train -- imagenet --epochs=50 --debug
+
+# Get help for harness parameters
+hog run script.py train -- --help
+```
+
+The `--` separator distinguishes harness arguments from `hog run` flags. Everything before `--` belongs to `hog run`; everything after goes to the harness.
+
+### Supported parameter types
+
+Harness parameters use [Typer](https://typer.tiangolo.com/) for CLI parsing. Supported types include:
+
+- Basic types: `str`, `int`, `float`, `bool`
+- Path types: `Path`, `pathlib.Path`
+- Optional types: `Optional[str]` becomes an optional CLI argument
+- Enums and `Literal` types for constrained choices
+
+Parameters without defaults become required positional arguments. Parameters with defaults become optional flags.
+
+```python
+@hog.harness()
+def process(
+    input_file: Path,              # Required positional: INPUTFILE
+    output_dir: Path = Path("."),  # Optional flag: --output-dir
+    verbose: bool = False,         # Boolean flag: --verbose / --no-verbose
+):
+    ...
+```
+
+```bash
+hog run script.py process -- data.csv --output-dir=results --verbose
+```
+
+### Default harness with arguments
+
+To pass arguments to the default `main` harness, use `--` without specifying a harness name:
+
+```bash
+hog run script.py -- --epochs=20  # Runs main with epochs=20
+```
+
+## Next steps
+
+- **[Parallel Execution](../examples/parallel-execution.md)** - Use `.submit()` to run functions concurrently
+- **[Parameterized Harness Example](../examples/parameterized-harness.md)** - Complete example with CLI arguments
+- **[Remote Execution Flow](remote-execution.md)** - Understand what happens when you call `.remote()`
diff --git a/docs/examples/configuration.md b/docs/examples/configuration.md
index 9cf8fdc..fb21d8e 100644
--- a/docs/examples/configuration.md
+++ b/docs/examples/configuration.md
@@ -167,6 +167,8 @@ echo 'starting job'
 pip show -qq uv || pip install uv || true # (1)!
 ```
 
+1. Groundhog adds this automatically, just in case `uv` isn't present in the remote environment to bootstrap the groundhog runner. If you or your endpoint administrator could ensure `uv` is available for groundhog, your sailing will be smoother ⛵️🦫.
+
 And the final `endpoint_setup` contains:
 
 ```bash
@@ -175,7 +177,6 @@ module load cuda
 export CUDA_VISIBLE_DEVICES=0
 ```
 
-1. Groundhog adds this automatically, just in case `uv` isn't present in the remote environment to bootstrap the groundhog runner. If you or your endpoint administrator could ensure `uv` is available for groundhog, your sailing will be smoother ⛵️🦫.
 
 This allows you to build up initialization commands from multiple sources.
 
diff --git a/docs/examples/index.md b/docs/examples/index.md
index d060397..0a20d2d 100644
--- a/docs/examples/index.md
+++ b/docs/examples/index.md
@@ -16,6 +16,7 @@ These examples cover the basics of using Groundhog:
 Examples showing how to handle typical workflows:
 
 - **[Parallel Execution](parallel-execution.md)** - Using `.submit()` for concurrent remote execution
+- **[Parameterized Harnesses](parameterized-harness.md)** - Harnesses that accept CLI arguments for runtime configuration
 - **[Endpoint Configuration](configuration.md)** - How the configuration system merges settings from multiple sources (PEP 723, decorators, call-time overrides)
 - **[PyTorch from Custom Sources](pytorch_custom_index.md)** - Configuring uv to install packages from cluster-specific indexes, local paths, or internal mirrors
 - **[Importing Groundhog Functions](imported_function.md)** - Calling Groundhog functions from regular Python scripts, REPLs, and notebooks (includes import safety and `mark_import_safe()`)
diff --git a/docs/examples/parameterized-harness.md b/docs/examples/parameterized-harness.md
new file mode 100644
index 0000000..1436f7b
--- /dev/null
+++ b/docs/examples/parameterized-harness.md
@@ -0,0 +1,57 @@
+# Parameterized Harnesses
+
+This example shows how to create harnesses that accept CLI arguments.
+
+## The script
+
+```python
+--8<-- "examples/parameterized_harness.py"
+```
+
+## Running the example
+
+Run with default parameters:
+
+```bash
+hog run parameterized_harness.py
+```
+
+Pass arguments after the `--` separator:
+
+```bash
+# Positional argument + optional flag
+hog run parameterized_harness.py -- my_dataset --epochs=20
+
+# With debug mode
+hog run parameterized_harness.py -- my_dataset --epochs=5 --debug
+```
+
+View available parameters:
+
+```bash
+hog run parameterized_harness.py -- --help
+```
+
+## How it works
+
+The `main` harness accepts three parameters:
+
+```python
+@hog.harness()
+def main(dataset: str = "default_dataset", epochs: int = 10, debug: bool = False):
+    ...
+```
+
+Typer maps these to CLI arguments:
+
+- `dataset` has a default, so it's an optional positional argument
+- `epochs` becomes `--epochs`
+- `debug` becomes `--debug` / `--no-debug`
+
+The `--` separator tells `hog run` where its own flags end and harness arguments begin.
+
+## See also
+
+- **[Functions and Harnesses](../concepts/functions-and-harnesses.md)** - Conceptual overview
+- **[Hello World](hello-world.md)** - Simplest example with zero-argument harness
+- **[Typer documentation](https://typer.tiangolo.com/)** - CLI parsing library used for harness parameters
diff --git a/docs/getting-started/quickstart.md b/docs/getting-started/quickstart.md
index 2f0d32c..1bfa1d2 100644
--- a/docs/getting-started/quickstart.md
+++ b/docs/getting-started/quickstart.md
@@ -60,7 +60,7 @@ The comment block at the top uses [PEP 723](https://peps.python.org/pep-0723/) i
 ### Functions and harnesses
 
 - **`@hog.function()`**: Decorates a Python function to make it executable remotely
-- **`@hog.harness()`**: Decorates a zero-argument orchestrator function that calls other functions
+- **`@hog.harness()`**: Decorates an orchestrator function that coordinates remote calls. Harnesses can accept parameters passed as CLI arguments (see [Functions and Harnesses](../concepts/functions-and-harnesses.md))
 - **`.remote()`**: Executes the function remotely and blocks until complete (alternatively, use **`.submit()`** for async execution)
 
 ## Add dependencies
diff --git a/docs/index.md b/docs/index.md
index cd41577..5f33e1d 100644
--- a/docs/index.md
+++ b/docs/index.md
@@ -222,9 +222,9 @@ hog run analysis.py
 
 ---
 
-Understand how Groundhog handles PEP 723, serialization, and remote execution
+Understand functions, harnesses, PEP 723, and remote execution
 
-[Concepts →](concepts/pep723.md)
+[Concepts →](concepts/functions-and-harnesses.md)
 
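
The mapping rule documented in this patch (a parameter without a default becomes a required positional, a parameter with a default becomes an optional flag, and a `bool` gets a paired `--flag` / `--no-flag`), together with the `--` split, can be sketched with the standard library alone. This is an illustration only, not Groundhog's implementation: the docs in this patch say Typer does the parsing, and `split_at_separator` and `build_parser` are hypothetical names that exist in neither Groundhog nor Typer. The `train` function here is a local stand-in that makes no remote call.

```python
import argparse
import inspect


def split_at_separator(argv):
    """Split argv at the first `--` into (runner args, harness args)."""
    if "--" in argv:
        i = argv.index("--")
        return argv[:i], argv[i + 1:]
    return argv, []


def build_parser(func):
    """Derive an argparse parser from an annotated function signature."""
    parser = argparse.ArgumentParser(prog=func.__name__, description=func.__doc__)
    for name, param in inspect.signature(func).parameters.items():
        flag = "--" + name.replace("_", "-")
        if param.default is inspect.Parameter.empty:
            # No default: required positional argument.
            parser.add_argument(name, type=param.annotation)
        elif param.annotation is bool:
            # Boolean with a default: paired --flag / --no-flag.
            parser.add_argument(flag, dest=name, default=param.default,
                                action=argparse.BooleanOptionalAction)
        else:
            # Any other default: optional flag.
            parser.add_argument(flag, dest=name, default=param.default,
                                type=param.annotation)
    return parser


def train(dataset: str, epochs: int = 10, debug: bool = False):
    """Stand-in harness: echoes its parsed arguments."""
    return {"dataset": dataset, "epochs": epochs, "debug": debug}


runner_args, harness_args = split_at_separator(
    ["script.py", "train", "--", "imagenet", "--epochs=50", "--debug"]
)
result = train(**vars(build_parser(train).parse_args(harness_args)))
print(runner_args, result)
```

The sketch assumes every parameter carries a plain type annotation (`str`, `int`, `bool`); richer cases listed in the patch, such as `Optional`, enums, and `Literal`, are what a library like Typer handles and are deliberately left out here.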