Add arithmetic_chain community environment #420

Open
nevasini1 wants to merge 3 commits into NousResearch:main from nevasini1:feature/arithmetic-chain-env

nevasini1 commented Mar 21, 2026

PR Type

  • RL Environment PR — Environment Snapshot & Zero-Training sections below
  • Non-Environment PR

Description

Adds arithmetic_chain, an environment of procedurally generated multi-step integer word problems. Answers are scored with math_verify on the final \boxed{} value, following the same pattern as the GSM8K-style envs. Uses ManagedServer for training logprobs. No Hugging Face dataset is needed.
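For reviewers who want a feel for what "procedural add/sub/mul chain" means before opening the diff, here is a minimal sketch of one way such a problem could be generated. It is not taken from arithmetic_chain_server.py: the names `ChainProblem`, `OPS`, and `make_chain`, the operand ranges, and the prompt wording are all hypothetical and chosen only for illustration.

```python
import random
from dataclasses import dataclass


@dataclass
class ChainProblem:
    """Hypothetical container: prompt text plus the gold integer answer."""
    prompt: str
    answer: int


# Hypothetical operation table: phrase template and the matching integer op.
OPS = {
    "add": ("add {n}", lambda x, n: x + n),
    "sub": ("subtract {n}", lambda x, n: x - n),
    "mul": ("multiply by {n}", lambda x, n: x * n),
}


def make_chain(rng: random.Random, steps: int = 3) -> ChainProblem:
    """Build one word problem by applying `steps` random integer operations."""
    value = rng.randint(1, 20)  # assumed starting range
    parts = [f"Start with {value}."]
    for _ in range(steps):
        name = rng.choice(sorted(OPS))  # sorted for a stable choice order
        n = rng.randint(2, 9)  # assumed operand range
        phrase, fn = OPS[name]
        value = fn(value, n)  # track the gold answer as the chain grows
        parts.append(f"Then {phrase.format(n=n)}.")
    parts.append("Give the final integer in \\boxed{}.")
    return ChainProblem(prompt=" ".join(parts), answer=value)


if __name__ == "__main__":
    problem = make_chain(random.Random(0))
    print(problem.prompt)
    print("gold:", problem.answer)
```

Because the gold answer is computed while the prompt is built, no dataset download is involved, which is consistent with the "No Hugging Face dataset" note above. Passing a seeded `random.Random` keeps generation reproducible.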


Environment Snapshot

| Field | Your Entry |
| --- | --- |
| Environment Name | arithmetic_chain |
| Short Description | Procedural add/sub/mul chains; model must output final integer in \boxed{}. |
| Category | Verifiable-Reasoning |
| Dataset Needed? | No |
| External Deps | Same as core Atropos stack (math_verify, latex2sympy2_extended, etc.); already in atroposlib. |
| Environmental Variables | None required; use OPENAI_API_KEY only if you point --openai.base_url at a provider that needs it. |
| Compute Footprint Estimate | Lightweight verification (integer equality via parser); rollout cost is dominated by inference API. |

Zero-Training Test Results

Full process run: Not executed in the contributor CI sandbox (no inference key / local vLLM here). Please run locally when reviewing:

python environments/community/arithmetic_chain/arithmetic_chain_server.py process \
  --slurm false \
  --env.total_steps 5 \
  --env.data_path_to_save_groups arithmetic_chain_zero_train.jsonl

(Override --openai.* to match your server; see process --help.)

Offline scoring check (same parse / verify path as score() in arithmetic_chain_server.py):

| Example | Assistant tail | vs gold \boxed{14} | verify |
| --- | --- | --- | --- |
| Good | Step: 3+4=7, 7*2=14. \boxed{14} | correct | True |
| Bad | \boxed{13} | wrong | False |
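The two rows above can be reproduced without any inference endpoint. The sketch below is a stdlib-only stand-in, not the actual scoring path: the PR scores with math_verify, whereas this uses a plain regex to pull the last \boxed{} integer and compare it to the gold value. The names `BOXED`, `last_boxed_int`, and `boxed_matches_gold` are hypothetical.

```python
import re
from typing import Optional

# Assumption: answers are plain (possibly negative) integers inside \boxed{...}.
BOXED = re.compile(r"\\boxed\{\s*(-?\d+)\s*\}")


def last_boxed_int(text: str) -> Optional[int]:
    """Return the integer in the last \\boxed{...} of `text`, or None if absent."""
    matches = BOXED.findall(text)
    return int(matches[-1]) if matches else None


def boxed_matches_gold(assistant_tail: str, gold: int) -> bool:
    """Regex stand-in for the verify step: True only on exact integer equality."""
    predicted = last_boxed_int(assistant_tail)
    return predicted is not None and predicted == gold


if __name__ == "__main__":
    print(boxed_matches_gold("Step: 3+4=7, 7*2=14. \\boxed{14}", 14))  # True
    print(boxed_matches_gold("\\boxed{13}", 14))  # False
```

A regex like this only handles bare integers; math_verify also parses LaTeX expressions, so the real environment may accept forms this sketch rejects.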

W&B link: Not used for this check; enable use_wandb in config/CLI if you want a run URL for the template.


CI / checks (clarification)

  • pre-commit.ci - pr: Should show passing on the latest commit on this branch (including any [pre-commit.ci] auto fixes from pre-commit.com hooks commit). That is the authoritative style/lint check for this repo — not tied to the wording of any intermediate commit title.
  • Import order: An intermediate commit titled “Apply isort import order (ruff)…” may not match the final file: pre-commit.ci can push a small follow-up commit to match this repository’s isort/ruff profile exactly. If you see multiple commits touching imports, trust the latest tree + green pre-commit.ci.
  • Run Tests (GitHub Actions pytest): On PRs from forks, runs sometimes stay in action_required until a maintainer approves workflow execution for the contributor. That state is workflow approval, not “the isort commit failed.” Once approved, any real pytest failures would appear in the job logs.

Contributor local run (optional reference): pytest was run in a disposable venv: 137 passed, 50 skipped, 1 failed in an unrelated test (test_tokenize_for_trainer_mask_len_last_turn_only / HF model load). Your CI matrix may differ.


Test plan (for reviewers)

  • python environments/community/arithmetic_chain/arithmetic_chain_server.py process --help
  • Short process with your inference endpoint (JSONL + HTML output)
  • Optional: serve + run-api + trainer smoke test

Happy to adjust naming, defaults, or reward shaping to match maintainer preferences.

nevasini1 force-pushed the feature/arithmetic-chain-env branch from a5e6f65 to d4cff17 on March 21, 2026 21:42
Procedural multi-step integer chains with boxed answers; uses ManagedServer
and math_verify for scoring. No external dataset required.

Made-with: Cursor
nevasini1 force-pushed the feature/arithmetic-chain-env branch from d4cff17 to e6bc008 on March 21, 2026 21:42