Add arithmetic_chain community environment #420
Open
nevasini1 wants to merge 3 commits into NousResearch:main from …
a5e6f65 to d4cff17
Procedural multi-step integer chains with boxed answers; uses ManagedServer and math_verify for scoring. No external dataset required. Made-with: Cursor
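The procedural generation described in this commit message can be pictured roughly as follows. This is a minimal sketch, not the code in arithmetic_chain_server.py: the function name `make_chain`, the operation set, the operand ranges, and the prompt wording are all assumptions made for illustration.

```python
import random

# Hypothetical operation table; the PR's actual operation set is not shown here.
OPS = {
    "add": lambda x, k: x + k,
    "subtract": lambda x, k: x - k,
    "multiply by": lambda x, k: x * k,
}


def make_chain(seed: int, steps: int = 3) -> tuple[str, str]:
    """Build one multi-step integer problem and its boxed gold answer."""
    rng = random.Random(seed)  # seeded, so the same seed gives the same problem
    value = rng.randint(1, 9)
    parts = [f"Start with {value}."]
    for _ in range(steps):
        name = rng.choice(sorted(OPS))  # sorted for a stable choice order
        k = rng.randint(2, 9)
        value = OPS[name](value, k)
        parts.append(f"Then {name} {k}.")
    parts.append("What is the result? Put the final answer in \\boxed{}.")
    return " ".join(parts), f"\\boxed{{{value}}}"


if __name__ == "__main__":
    question, gold = make_chain(seed=0)
    print(question)
    print(gold)
```

Seeding a local `random.Random` keeps each problem reproducible without an external dataset, which is the property the commit message points at.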
d4cff17 to e6bc008
Made-with: Cursor
for more information, see https://pre-commit.ci
PR Type
Description
Adds arithmetic_chain: procedural multi-step integer word problems. Answers are scored with math_verify + \boxed{}, the same pattern as GSM8K-style envs. Uses ManagedServer for training logprobs. No Hugging Face dataset.

Environment Snapshot
- arithmetic_chain
- … \boxed{}.
- … math_verify, latex2sympy2_extended, etc.), already in atroposlib.
- OPENAI_API_KEY only if you point --openai.base_url at a provider that needs it.

Zero-Training Test Results
Full process run: Not executed in the contributor CI sandbox (no inference key / local vLLM here). Please run locally when reviewing:

    python environments/community/arithmetic_chain/arithmetic_chain_server.py process \
      --slurm false \
      --env.total_steps 5 \
      --env.data_path_to_save_groups arithmetic_chain_zero_train.jsonl

(Override --openai.* to match your server; see process --help.)

Offline scoring check (same parse/verify path as score() in arithmetic_chain_server.py), with verify applied to:
- \boxed{14}
- Step: 3+4=7, 7*2=14. \boxed{14}
- \boxed{13}

W&B link: Not used for this check; enable use_wandb in config/CLI if you want a run URL for the template.

CI / checks (clarification)
- pre-commit.ci - pr: Should show passing on the latest commit on this branch (including any "[pre-commit.ci] auto fixes from pre-commit.com hooks" commit). That is the authoritative style/lint check for this repo; it is not tied to the wording of any intermediate commit title.
- Run Tests (GitHub Actions pytest): On PRs from forks, runs sometimes stay in action_required until a maintainer approves workflow execution for the contributor. That state is workflow approval, not "the isort commit failed." Once approved, any real pytest failures would appear in the job logs.
- Contributor local run (optional reference): pytest was run in a disposable venv: 137 passed, 50 skipped, 1 failed in an unrelated test (test_tokenize_for_trainer_mask_len_last_turn_only / HF model load). Your CI matrix may differ.

Test plan (for reviewers)
- python environments/community/arithmetic_chain/arithmetic_chain_server.py process --help
- process with your inference endpoint (JSONL + HTML output)
- serve + run-api + trainer smoke test

Happy to adjust naming, defaults, or reward shaping to match maintainer preferences.
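For reviewers who want to sanity-check the boxed-answer comparison without installing anything, the idea behind the offline scoring check can be sketched with the standard library alone. This is a simplified stand-in and not the PR's `score()`: it swaps math_verify's parse/verify for a plain regex, handles only integer answers, and takes the last `\boxed{}` in the text. The names `last_boxed_int` and `score_boxed` are hypothetical.

```python
import re

# Matches \boxed{<integer>}; nested braces and non-integer answers are out of scope.
BOXED_INT = re.compile(r"\\boxed\{\s*(-?\d+)\s*\}")


def last_boxed_int(text: str):
    """Return the integer in the last \\boxed{...} of text, or None if absent."""
    matches = BOXED_INT.findall(text)
    return int(matches[-1]) if matches else None


def score_boxed(completion: str, gold: str) -> float:
    """1.0 when the completion's final boxed integer equals the gold one, else 0.0."""
    gold_value = last_boxed_int(gold)
    answer = last_boxed_int(completion)
    if gold_value is None or answer is None:
        return 0.0  # nothing to compare, so no reward
    return 1.0 if answer == gold_value else 0.0


if __name__ == "__main__":
    gold = "\\boxed{14}"
    print(score_boxed("Step: 3+4=7, 7*2=14. \\boxed{14}", gold))  # 1.0
    print(score_boxed("\\boxed{13}", gold))  # 0.0
```

A regex like this would not treat equivalent forms such as `14.0` or `\frac{28}{2}` as correct, which is one reason a symbolic checker like math_verify is the more robust choice for the real environment.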