11 changes: 8 additions & 3 deletions .exp/design-workflow-1-grok-1-inference-and-sampling.md
@@ -6,7 +6,7 @@ The "Grok-1 Inference and Sampling" workflow provides the machinery to load th

Key inputs: Checkpoint in `./checkpoints/ckpt-0/`, `tokenizer.model`, GPU cluster, prompts as `Request` objects (prompt str, temperature float, nucleus_p float, rng_seed int, max_len int).
Outputs: Generated text strings.
Entry points: `run.py` for test run, or `InferenceRunner().run()` generator for streaming requests.
Entry points: `./install.sh` for an automated full setup (dependency installation, model-weight download via Hugging Face or torrent, and a test run via `run.py`; see [PR #389](https://github.com/xai-org/grok-1/pull/389)); `run.py` directly for inference and sampling, assuming the environment, checkpoints, and tokenizer are already prepared; or the `InferenceRunner().run()` generator for streaming batched requests from custom code.
Relevant files: `run.py`, `runners.py`, `model.py`, `checkpoint.py`, `tokenizer.model`.
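
A minimal usage sketch of the programmatic entry point, assuming the `Request` fields listed above and an `InferenceRunner` constructor along the lines of `run.py`; exact class names and signatures in `runners.py` may differ:

```python
# Hedged sketch, not verbatim run.py: constructor arguments and the send()
# protocol are assumptions based on the design notes above.
from runners import InferenceRunner, ModelRunner, Request

runner = InferenceRunner(
    pad_sizes=(1024,),              # prompt-length buckets to precompile
    runner=ModelRunner(
        model=grok_1_config,        # a model config from model.py (construction omitted)
        checkpoint_path="./checkpoints/",
    ),
    load="./checkpoints/",
    tokenizer_path="./tokenizer.model",
    local_mesh_config=(1, 8),       # e.g. one host with 8 devices
    between_hosts_config=(1, 1),
)
runner.initialize()
gen = runner.run()                  # generator for streaming requests

next(gen)                           # prime the generator
text = gen.send(Request(
    prompt="The answer to life, the universe, and everything is",
    temperature=0.01,
    nucleus_p=0.95,
    rng_seed=42,
    max_len=100,
))
print(text)
```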

The workflow orchestrates model loading, compilation of sharded compute functions, prompt processing (prefill KV cache while sampling first token), and iterative single-token generation using cached attention keys/values, until max length or EOS.
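
The prefill-then-decode pattern can be made concrete with a small self-contained sketch; `prefill`, `decode_step`, and `sample` are hypothetical stand-ins for the compiled, sharded compute functions, passed in as callables:

```python
# Illustrative skeleton of KV-cached generation, not code from model.py or
# runners.py. The EOS check mirrors the description above; the design notes
# later point out that EOS handling may instead be left to application logic.
from typing import Any, Callable, List, Tuple

def generate(
    prefill: Callable[[List[int]], Tuple[Any, Any]],     # prompt -> (logits, kv_cache)
    decode_step: Callable[[int, Any], Tuple[Any, Any]],  # (token, cache) -> (logits, cache)
    sample: Callable[[Any], int],                        # logits -> token id
    prompt_tokens: List[int],
    max_len: int,
    eos_id: int,
) -> List[int]:
    # Prefill: one pass over the whole prompt fills the KV cache and yields
    # the logits from which the first new token is sampled.
    logits, kv_cache = prefill(prompt_tokens)
    token = sample(logits)
    output = [token]
    # Decode: feed only the newest token; attention reads cached keys/values.
    while len(output) < max_len and token != eos_id:
        logits, kv_cache = decode_step(token, kv_cache)
        token = sample(logits)
        output.append(token)
    return output
```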
@@ -46,13 +46,17 @@ The workflow orchestrates model loading, compilation of sharded compute function
```mermaid
sequenceDiagram
participant User
participant InstallSh as install.sh
participant RunPy as run.py
participant IR as InferenceRunner
participant MR as ModelRunner
participant Model as model.py
participant Checkpoint as checkpoint.py
participant JAX as JAX Runtime
User->>RunPy: Execute main()
User->>InstallSh: Execute ./install.sh
InstallSh->>InstallSh: Install system deps, Python pkgs from requirements.txt
InstallSh->>InstallSh: Download checkpoints & tokenizer via HF/torrent
InstallSh->>RunPy: python3 run.py
RunPy->>IR: Create with config, MR, paths, meshes
IR->>MR: initialize(dummy_data, meshes)
MR->>Model: model.initialize(), fprop_dtype=bf16
@@ -64,6 +68,7 @@ sequenceDiagram
IR->>IR: Precompile with dummy prompts for pad_sizes
RunPy->>IR: gen = run() // generator setup with initial memory, settings, etc.
```
Note: Updated to include automated setup via install.sh (PR #389).

## Inference and Sampling Sequence

@@ -127,6 +132,6 @@ sequenceDiagram
- **Error/Edge Cases**: Assumes sufficient memory/GPUs; handles long contexts by left-truncation/padding. No built-in EOS handling (relies on max_len or app logic). Quantized weights require custom unpickling.
- **Performance Notes**: MoE router/experts use JAX vmap/shard_map (serial per-token, inefficient for prod). Focus on correctness/single-host validation.
- **Extensibility**: Modular Haiku design allows custom configs/modules. Generator interface suits serving multiple prompts concurrently.
- **Dependencies & Setup**: `requirements.txt` (jax[cuda12_pip], haiku, etc.). Download ckpt via torrent/HF, place in checkpoints/.
- **Dependencies & Setup**: `./install.sh` ([PR #389](https://github.com/xai-org/grok-1/pull/389)) automates the process: it detects the OS and installs system tools (git, python3, pip, transmission-cli) via apt/dnf/brew, or checks for them on Windows; installs Python dependencies via `pip install -r requirements.txt` plus `huggingface_hub`; downloads `tokenizer.model` and `checkpoints/ckpt-0/*` from [xai-org/grok-1](https://huggingface.co/xai-org/grok-1) with `huggingface-cli`, and the full weights via the torrent magnet link with `transmission-cli`; and finally runs `python run.py` as a smoke test. For manual setup, install the packages in `requirements.txt` (jax[cuda12_pip], haiku, sentencepiece, numpy, etc.), then download the checkpoint from [Hugging Face](https://huggingface.co/xai-org/grok-1) (see the sketch below) or via the torrent magnet `magnet:?xt=urn:btih:5f96d43576e3d386c9ba65b883210a393b68210e&tr=https%3A%2F%2Facademictorrents.com%2Fannounce.php&tr=udp%3A%2F%2Ftracker.coppersurfer.tk%3A6969&tr=udp%3A%2F%2Ftracker.opentrackr.org%3A1337%2Fannounce` and place it in `./checkpoints/`.
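
As an alternative to the `huggingface-cli` invocation, a minimal download sketch using the `huggingface_hub` Python API; the repo id comes from the links above, while the file patterns are assumptions about the repository layout:

```python
# Hedged sketch: fetches the ckpt-0 weights into ./checkpoints/ckpt-0/ and
# tokenizer.model next to run.py, matching the paths the design doc expects.
from huggingface_hub import hf_hub_download, snapshot_download

snapshot_download(
    repo_id="xai-org/grok-1",
    local_dir="./checkpoints",
    allow_patterns=["ckpt-0/*"],   # assumed layout: weights under ckpt-0/
)
hf_hub_download(
    repo_id="xai-org/grok-1",
    filename="tokenizer.model",
    local_dir=".",
)
```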

This document captures the high-level design, derived from code analysis.
43 changes: 43 additions & 0 deletions pr-analysis-389.md
@@ -0,0 +1,43 @@
# PR #389: Workflow Design Impact Analysis

## Affected Workflows
- **Grok-1 Inference and Sampling**: The PR creates `install.sh` which directly addresses and automates the dependencies and setup instructions documented in `.exp/design-workflow-1-grok-1-inference-and-sampling.md`, including installation of packages from `requirements.txt` and downloading checkpoints to `./checkpoints/`. Moreover, the script culminates in executing `run.py`, the primary entry point for this workflow (see [workflows.json](.exp/workflows.json)). This introduces a new automated prerequisite phase to the workflow design without modifying core code.

No other workflows (Model Loading and Initialization, Model Forward Pass and Logits Computation) are impacted: their designs cover runtime operations that assume a prepared environment and files, and make no reference to installation.

## Grok-1 Inference and Sampling Analysis

### Summary of design changes
The PR introduces `install.sh`, a comprehensive Bash script that automates user setup for running the workflow. It detects the operating system, installs necessary system packages and Python dependencies, clones the repository if needed, downloads model weights via Hugging Face CLI and torrent, and launches `run.py` for inference testing. This affects the documented design by:

- Adding a new pre-initialization phase for environment and asset preparation, previously manual.
- Extending the entry point to include `install.sh` as a user-friendly wrapper around `run.py`.
- Enhancing setup documentation with automation details, links to resources, and manual alternatives.

The implementation uses conditional OS logic (sketched below), external tools (apt, brew, dnf, transmission-cli, huggingface-cli), and error-checked commands for reliability. Benefits include cross-platform accessibility, reduced manual effort, and alternative download paths for robustness. Implications include a potential need for elevated privileges, reliance on network availability, and minor script quirks (e.g., redundant cloning when the script is run from inside the repo).
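
A minimal Python rendering of the OS-dispatch-and-check pattern described above (the actual script is Bash; the package list and manager flags here are illustrative):

```python
# Hedged sketch of install.sh's reported structure, not a transcription:
# detect the platform, pick a package manager, and fail loudly on errors.
import platform
import shutil
import subprocess
import sys

TOOLS = ["git", "python3", "transmission-cli"]  # illustrative package list

def install_system_deps() -> None:
    system = platform.system()
    if system == "Linux":
        manager = "apt-get" if shutil.which("apt-get") else "dnf"
        subprocess.run(["sudo", manager, "install", "-y", *TOOLS], check=True)
    elif system == "Darwin":
        subprocess.run(["brew", "install", *TOOLS], check=True)
    else:
        # On Windows the script reportedly only verifies the tools exist.
        missing = [t for t in TOOLS if shutil.which(t) is None]
        if missing:
            sys.exit(f"install these manually: {', '.join(missing)}")
```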

The design document has been updated accordingly: entry points expanded, setup section detailed with PR reference, and initialization sequence diagram revised to prepend setup steps.

```mermaid
graph LR
subgraph before ["Before PR"]
U1[User] --> R[Execute run.py]
R --> I[Initialization & Inference Setup]
end
subgraph after ["After PR"]
U2[User] --> IS[Execute install.sh]
IS --> SD[System deps install OS-specific]
IS --> PP[Python packages from requirements.txt]
IS --> DW[Download checkpoints & tokenizer]
DW --> R2[Execute run.py]
R2 --> I2[Initialization & Inference Setup]
end
before -.->|New setup phase added| after
style IS fill:#90EE90,stroke:#333,stroke-width:4px
style SD fill:#90EE90,stroke:#333,stroke-width:4px
style PP fill:#90EE90,stroke:#333,stroke-width:4px
style DW fill:#90EE90,stroke:#333,stroke-width:4px
style R2 fill:#FFFF00,stroke:#333,stroke-width:4px
```

**Legend**: Green rectangles indicate additions from the PR (new setup components); yellow indicates changes (altered execution path to run.py).