Merged
9 changes: 7 additions & 2 deletions docs/api/cli.md
@@ -18,12 +18,15 @@ $ hog [OPTIONS] COMMAND [ARGS]...
* `run`: Run a Python script on a Globus Compute...
* `init`: Create a new groundhog script with PEP 723...
* `add`: Add dependencies or update Python version...
* `remove`: Remove dependencies or endpoint...
* `remove`: Remove dependencies from a script's PEP...

## `hog run`

Run a Python script on a Globus Compute endpoint.

Use -- to pass arguments to parameterized harnesses:
hog run script.py harness -- arg1 --option=value

**Usage**:

```console
@@ -59,6 +62,7 @@ $ hog init [OPTIONS] FILENAME

* `-p, --python TEXT`: Python version specifier (e.g., --python '>=3.11' or -p 3.11)
* `-e, --endpoint TEXT`: Template config for endpoint with known fields, e.g. --endpoint my-endpoint-uuid. Can also be one of the following pre-configured names: anvil, anvil.gpu, tutorial (e.g. --endpoint anvil.gpu). Can specify multiple.
* `--log-level TEXT`: Set logging level (DEBUG, INFO, WARNING, ERROR)
* `--help`: Show this message and exit.

## `hog add`
@@ -86,7 +90,7 @@ $ hog add [OPTIONS] SCRIPT [PACKAGES]...

## `hog remove`

Remove dependencies or endpoint configurations from a script's PEP 723 metadata.
Remove dependencies from a script's PEP 723 metadata.

**Usage**:

@@ -102,4 +106,5 @@ $ hog remove [OPTIONS] SCRIPT [PACKAGES]...
**Options**:

* `-e, --endpoint TEXT`: Remove endpoint or variant configuration (e.g., anvil, anvil.gpu, my_endpoint). Known endpoints: anvil, anvil.gpu, tutorial. Can specify multiple. Note: Removing a base endpoint (e.g., anvil) removes all its variants. Removing a specific variant (e.g., anvil.gpu) leaves the base and other variants intact.
* `--log-level TEXT`: Set logging level (DEBUG, INFO, WARNING, ERROR)
* `--help`: Show this message and exit.
116 changes: 116 additions & 0 deletions docs/concepts/functions-and-harnesses.md
@@ -0,0 +1,116 @@
# Functions and Harnesses

Groundhog scripts use two decorator types: `@hog.function()` for remote-executable code, and `@hog.harness()` for local orchestration.

**TL;DR:** Functions are the core abstraction for running remote or isolated code. Harnesses are a convenience for orchestrating functions from the CLI.

## Functions

A **function** is a unit of work that runs remotely on an HPC cluster. Decorate any Python function with `@hog.function()` to enable remote execution:

```python
@hog.function(endpoint="anvil")
def train_model(dataset: str, epochs: int) -> dict:
    """This code runs on the remote HPC cluster."""
    import torch
    # ... training logic ...
    return {"accuracy": 0.95}
```

Functions provide four execution modes:

| Method | Where it runs | Behavior |
|--------|---------------|----------|
| `func(args)` | Local process | Direct call, no serialization |
| `func.remote(args)` | HPC cluster | Blocks until complete, returns result |
| `func.submit(args)` | HPC cluster | Returns immediately with `GroundhogFuture` |
| `func.local(args)` | Local subprocess | Isolated environment, useful for testing |

The `.remote()`, `.submit()`, and `.local()` methods serialize your arguments, send your entire script to the target environment, and execute it there in an isolated Python environment managed by uv.
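
The difference between the blocking and future-returning modes can be illustrated without a cluster. The sketch below is only an analogy built on the standard library's `concurrent.futures`; it does not use Groundhog, and `simulate_work` is a hypothetical stand-in for a remote function body. The `.result()` call shown here belongs to `concurrent.futures.Future`; the interface of `GroundhogFuture` is not described in this document.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def simulate_work(x: int) -> int:
    """Hypothetical stand-in for the body of a remote function."""
    time.sleep(0.05)
    return x * 2


with ThreadPoolExecutor(max_workers=4) as pool:
    # Blocking style (analogous to func.remote): wait for each call in turn.
    blocking_results = [pool.submit(simulate_work, x).result() for x in range(4)]

    # Future style (analogous to func.submit): start every call first,
    # then collect the results once they are all in flight.
    futures = [pool.submit(simulate_work, x) for x in range(4)]
    future_results = [f.result() for f in futures]

print(blocking_results)
print(future_results)
```

Both lists hold the same values; the second pattern simply lets the calls overlap in time, which is the motivation for using a future-returning mode for parallel work.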

## Harnesses

A **harness** is an entry point that orchestrates function calls. Harnesses run locally on your machine and coordinate remote execution:

```python
@hog.harness()
def main():
    """This code runs locally, orchestrating remote work."""
    result = train_model.remote("imagenet", epochs=100)
    print(f"Training complete: {result}")
```

Run a harness with the `hog run` command:

```bash
hog run script.py # Runs the 'main' harness
hog run script.py my_harness # Runs a specific harness
```

### Parameterized harnesses

Harnesses can accept parameters that map to CLI arguments. This makes harnesses reusable without editing code:

```python
@hog.harness()
def train(dataset: str, epochs: int = 10, debug: bool = False):
    """Configurable training harness."""
    if debug:
        print(f"Training on {dataset} for {epochs} epochs")
    result = train_model.remote(dataset, epochs)
    return result
```

Pass arguments after a `--` separator:

```bash
# Positional argument + options
hog run script.py train -- imagenet --epochs=50

# With debug flag
hog run script.py train -- imagenet --epochs=50 --debug

# Get help for harness parameters
hog run script.py train -- --help
```

The `--` separator distinguishes harness arguments from `hog run` flags. Everything before `--` belongs to `hog run`; everything after goes to the harness.
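
As a rough illustration of that split, here is a minimal sketch in plain Python. It is not Groundhog's implementation (the CLI delegates parsing to Typer), and `split_harness_args` is a hypothetical helper introduced only for this example.

```python
def split_harness_args(argv: list[str]) -> tuple[list[str], list[str]]:
    """Split argv at the first `--` into (hog run args, harness args)."""
    if "--" not in argv:
        # No separator: everything belongs to `hog run`.
        return argv, []
    idx = argv.index("--")
    return argv[:idx], argv[idx + 1:]


run_args, harness_args = split_harness_args(
    ["script.py", "train", "--", "imagenet", "--epochs=50"]
)
print(run_args)      # ['script.py', 'train']
print(harness_args)  # ['imagenet', '--epochs=50']
```

Only the first `--` acts as the separator, so any later `--` would be passed through to the harness untouched in this sketch.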

### Supported parameter types

Harness parameters use [Typer](https://typer.tiangolo.com/) for CLI parsing. Supported types include:

- Basic types: `str`, `int`, `float`, `bool`
- Path types: `pathlib.Path`
- Optional types: `Optional[str]` becomes an optional CLI argument
- Enums and `Literal` types for constrained choices

Parameters without defaults become required positional arguments. Parameters with defaults become optional flags.

```python
@hog.harness()
def process(
    input_file: Path,              # Required positional: INPUT_FILE
    output_dir: Path = Path("."),  # Optional flag: --output-dir
    verbose: bool = False,         # Boolean flag: --verbose / --no-verbose
):
    ...
```

```bash
hog run script.py process -- data.csv --output-dir=results --verbose
```
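
The mapping rule stated above (no default means positional, a default means a flag) can be sketched with `inspect.signature`. This is an illustration of the rule only, not how Typer or Groundhog derive the CLI; `describe_cli` is a hypothetical helper, and `process` here is an undecorated copy of the harness signature.

```python
import inspect
from pathlib import Path


def process(input_file: Path, output_dir: Path = Path("."), verbose: bool = False):
    """Undecorated function with the same signature as the harness above."""


def describe_cli(func) -> dict[str, str]:
    """Classify each parameter by the documented rule (hypothetical helper)."""
    described = {}
    for name, param in inspect.signature(func).parameters.items():
        kebab = name.replace("_", "-")
        if param.default is inspect.Parameter.empty:
            # No default: the value must be supplied positionally.
            described[name] = "required positional"
        elif param.annotation is bool:
            # Booleans get an on/off flag pair.
            described[name] = f"boolean flag --{kebab} / --no-{kebab}"
        else:
            described[name] = f"optional flag --{kebab}"
    return described


for name, kind in describe_cli(process).items():
    print(f"{name}: {kind}")
```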

### Default harness with arguments

To pass arguments to the default `main` harness, use `--` without specifying a harness name:

```bash
hog run script.py -- --epochs=20 # Runs main with epochs=20
```

## Next steps

- **[Parallel Execution](../examples/parallel-execution.md)** - Use `.submit()` to run functions concurrently
- **[Parameterized Harness Example](../examples/parameterized-harness.md)** - Complete example with CLI arguments
- **[Remote Execution Flow](remote-execution.md)** - Understand what happens when you call `.remote()`
3 changes: 2 additions & 1 deletion docs/examples/configuration.md
@@ -167,6 +167,8 @@ echo 'starting job'
pip show -qq uv || pip install uv || true # (1)!
```

1. Groundhog adds this automatically, in case `uv` isn't present in the remote environment to bootstrap the groundhog runner. If you or your endpoint administrator can ensure `uv` is available for groundhog, your sailing will be smoother ⛵️🦫.

And the final `endpoint_setup` contains:

```bash
@@ -175,7 +177,6 @@ module load cuda
export CUDA_VISIBLE_DEVICES=0
```

1. Groundhog adds this automatically, just in case `uv` isn't present in the remote environment to bootstrap the groundhog runner. If you or your endpoint administrator could ensure `uv` is available for groundhog, your sailing will be smoother ⛵️🦫.

This allows you to build up initialization commands from multiple sources.

1 change: 1 addition & 0 deletions docs/examples/index.md
@@ -16,6 +16,7 @@ These examples cover the basics of using Groundhog:
Examples showing how to handle typical workflows:

- **[Parallel Execution](parallel-execution.md)** - Using `.submit()` for concurrent remote execution
- **[Parameterized Harnesses](parameterized-harness.md)** - Harnesses that accept CLI arguments for runtime configuration
- **[Endpoint Configuration](configuration.md)** - How the configuration system merges settings from multiple sources (PEP 723, decorators, call-time overrides)
- **[PyTorch from Custom Sources](pytorch_custom_index.md)** - Configuring uv to install packages from cluster-specific indexes, local paths, or internal mirrors
- **[Importing Groundhog Functions](imported_function.md)** - Calling Groundhog functions from regular Python scripts, REPLs, and notebooks (includes import safety and `mark_import_safe()`)
57 changes: 57 additions & 0 deletions docs/examples/parameterized-harness.md
@@ -0,0 +1,57 @@
# Parameterized Harnesses

This example shows how to create harnesses that accept CLI arguments.

## The script

```python
--8<-- "examples/parameterized_harness.py"
```

## Running the example

Run with default parameters:

```bash
hog run parameterized_harness.py
```

Pass arguments after the `--` separator:

```bash
# Required positional + optional flag
hog run parameterized_harness.py -- my_dataset --epochs=20

# With debug mode
hog run parameterized_harness.py -- my_dataset --epochs=5 --debug
```

View available parameters:

```bash
hog run parameterized_harness.py -- --help
```

## How it works

The `main` harness accepts three parameters:

```python
@hog.harness()
def main(dataset: str = "default_dataset", epochs: int = 10, debug: bool = False):
    ...
```

Typer maps these to CLI arguments:

- `dataset` has a default, so it's an optional positional argument
- `epochs` becomes `--epochs`
- `debug` becomes `--debug` / `--no-debug`

The `--` separator tells `hog run` where its own flags end and harness arguments begin.
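
To make the described command line concrete, the sketch below builds an equivalent parser with the standard library's `argparse`. This is a stand-in chosen so the example runs without extra dependencies; Groundhog itself uses Typer, and the parser here is an assumption about the resulting interface based only on the list above.

```python
import argparse

# Mirror the interface described above: an optional positional,
# an integer option, and an on/off boolean flag pair.
parser = argparse.ArgumentParser(prog="main")
parser.add_argument("dataset", nargs="?", default="default_dataset")
parser.add_argument("--epochs", type=int, default=10)
parser.add_argument("--debug", action=argparse.BooleanOptionalAction, default=False)

# Equivalent of: hog run parameterized_harness.py -- my_dataset --epochs=5 --debug
args = parser.parse_args(["my_dataset", "--epochs=5", "--debug"])
print(args.dataset, args.epochs, args.debug)  # my_dataset 5 True

# Equivalent of running with no harness arguments at all.
defaults = parser.parse_args([])
print(defaults.dataset, defaults.epochs, defaults.debug)  # default_dataset 10 False
```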

## See also

- **[Functions and Harnesses](../concepts/functions-and-harnesses.md)** - Conceptual overview
- **[Hello World](hello-world.md)** - Simplest example with zero-argument harness
- **[Typer documentation](https://typer.tiangolo.com/)** - CLI parsing library used for harness parameters
2 changes: 1 addition & 1 deletion docs/getting-started/quickstart.md
@@ -60,7 +60,7 @@ The comment block at the top uses [PEP 723](https://peps.python.org/pep-0723/) i
### Functions and harnesses

- **`@hog.function()`**: Decorates a Python function to make it executable remotely
- **`@hog.harness()`**: Decorates a zero-argument orchestrator function that calls other functions
- **`@hog.harness()`**: Decorates an orchestrator function that coordinates remote calls. Harnesses can accept parameters passed as CLI arguments (see [Functions and Harnesses](../concepts/functions-and-harnesses.md))
- **`.remote()`**: Executes the function remotely and blocks until complete (alternatively, use **`.submit()`** for async execution)

## Add dependencies
4 changes: 2 additions & 2 deletions docs/index.md
@@ -222,9 +222,9 @@ hog run analysis.py

---

Understand how Groundhog handles PEP 723, serialization, and remote execution
Understand functions, harnesses, PEP 723, and remote execution

[Concepts →](concepts/pep723.md)
[Concepts →](concepts/functions-and-harnesses.md)

<!--
- **API Reference**
67 changes: 67 additions & 0 deletions examples/parameterized_harness.py
@@ -0,0 +1,67 @@
# /// script
# requires-python = ">=3.12,<3.13"
# dependencies = []
#
# [tool.uv]
# exclude-newer = "2026-01-08T00:00:00Z"
#
# [tool.hog.anvil]
# endpoint = "5aafb4c1-27b2-40d8-a038-a0277611868f"
# account = "cis250461"
# walltime = "00:30:00"
#
# ///

"""Example demonstrating parameterized harnesses.

Harnesses can accept parameters that map to CLI arguments, making them
reusable without editing code.

Usage:
    # Run with defaults
    hog run parameterized_harness.py

    # Pass arguments after --
    hog run parameterized_harness.py -- my_dataset --epochs=20

    # Enable debug mode
    hog run parameterized_harness.py -- my_dataset --epochs=5 --debug

    # Get help for harness parameters
    hog run parameterized_harness.py -- --help
"""

import groundhog_hpc as hog


@hog.function(endpoint="anvil")
def train_model(dataset: str, epochs: int) -> dict:
    """Simulate model training on the remote endpoint."""
    # In a real script, this would do actual training
    return {
        "dataset": dataset,
        "epochs": epochs,
        "accuracy": 0.85 + (epochs * 0.001),
    }


@hog.harness()
def main(dataset: str = "default_dataset", epochs: int = 10, debug: bool = False):
    """Training harness with configurable parameters.

    Args:
        dataset: Name of the dataset to train on
        epochs: Number of training epochs
        debug: Enable debug output
    """
    if debug:
        print(f"Debug: Training on '{dataset}' for {epochs} epochs")

    result = train_model.remote(dataset, epochs)

    print("Training complete!")
    print(f"  Dataset: {result['dataset']}")
    print(f"  Epochs: {result['epochs']}")
    print(f"  Accuracy: {result['accuracy']:.3f}")

    return result
2 changes: 2 additions & 0 deletions mkdocs.yml
@@ -101,10 +101,12 @@ nav:
- Running Locally: examples/local.md
- Organizing with Classes: examples/methods.md
- Parallel Execution: examples/parallel-execution.md
- Parameterized Harnesses: examples/parameterized-harness.md
- Endpoint Configuration: examples/configuration.md
- Customizing Package Sources: examples/pytorch_custom_index.md
- Importing Groundhog Functions: examples/imported_function.md
- Concepts:
- Functions and Harnesses: concepts/functions-and-harnesses.md
- PEP 723 Metadata: concepts/pep723.md
- Remote Execution Flow: concepts/remote-execution.md
- Serialization: concepts/serialization.md
6 changes: 5 additions & 1 deletion src/groundhog_hpc/app/main.py
@@ -16,7 +16,11 @@

app = typer.Typer(pretty_exceptions_show_locals=False)

app.command(no_args_is_help=True)(run)
# Enable extra args for run command to capture harness arguments after --
app.command(
    no_args_is_help=True,
    context_settings={"allow_extra_args": True, "allow_interspersed_args": False},
)(run)
app.command(no_args_is_help=True)(init)
app.command(no_args_is_help=True)(add)
app.command(no_args_is_help=True)(remove)