Commit cfe150d

tooling updates, ty type checking (#49)
* tooling updates, ty type checking
* only install dev dep group
* ignore animation script
1 parent 59a2726 commit cfe150d

82 files changed

Lines changed: 385 additions & 683 deletions

.claude/commands/dependency-update.md

Lines changed: 2 additions & 1 deletion
@@ -4,4 +4,5 @@ The process to go through is:
 2. For each dependency, go to its pypi release history site. For example, for the openai package that is: https://pypi.org/project/openai/#history Get the latest release version.
 3. Now bump the dependency in pyproject.toml. For example, if the current version in the pyproject.toml is `>=1.05,<2.0`, but on Pypi the latest version is 1.11, change the dependency to `>=1.11,<2.0`
 4. If you notice a major version upgrade (ex v2 to v3), let the user know of each of those cases, but do not make the change yourself.
-5. Run `uv sync --all-extras --all-groups` to update the lock file.
+5. Make sure all the checks still pass by running `uv run ruff format && uv run ruff check --fix && uv run ty check` from the root.
+6. Run `uv sync --all-extras --all-groups` to update the lock file.
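Steps 3 and 4 of the dependency-update process can be sketched in Python. This is a minimal illustration and is not part of the commit. `bump_lower_bound` is a hypothetical helper. It assumes every specifier has the `>=LOWER,<UPPER` shape shown in step 3 and compares only major version components.

```python
import re

# Assumes the specifier has exactly the ">=LOWER,<UPPER" shape used in step 3.
SPEC_PATTERN = re.compile(r"^>=(?P<lower>[^,]+),<(?P<upper>.+)$")


def bump_lower_bound(specifier: str, latest: str) -> str:
    """Return the specifier with its lower bound raised to the latest release.

    Raises:
        ValueError: If the specifier has an unsupported shape, or if the latest
            release reaches the upper bound's major version (step 4 says to
            report major upgrades to the user instead of applying them).
    """
    match = SPEC_PATTERN.match(specifier.replace(" ", ""))
    if match is None:
        raise ValueError(f"Unsupported specifier: {specifier!r}")
    upper = match.group("upper")
    # Only the major components are compared; this is a simplification.
    if int(latest.split(".")[0]) >= int(upper.split(".")[0]):
        raise ValueError(f"{latest} is a major upgrade past <{upper}; ask the user")
    return f">={latest},<{upper}"


print(bump_lower_bound(">=1.05,<2.0", "1.11"))  # >=1.11,<2.0
```

The function raises on a major upgrade instead of silently widening the upper bound. This mirrors step 4, which leaves that decision to the user.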

.claude/settings.json

Lines changed: 21 additions & 8 deletions
@@ -2,27 +2,40 @@
   "permissions": {
     "allow": [
       "mcp__ide__getDiagnostics",
-      "Bash(make:*)",
-      "Bash(tree:*)",
-      "Bash(mkdir:*)",
-      "Bash(cp:*)",
       "WebSearch",
-      "WebFetch"
+      "WebFetch",
+      "Bash(cp:*)",
+      "Bash(find:*)",
+      "Bash(mkdir:*)",
+      "Bash(ls:*)",
+      "Bash(xargs:*)",
+      "Bash(tree:*)",
+      "Bash(tee:*)",
+      "Bash(grep:*)",
+      "Bash(git clone:*)",
+      "Bash(uv build:*)",
+      "Bash(uv sync:*)",
+      "Bash(uv run ruff format:*)",
+      "Bash(uv run ruff check:*)",
+      "Bash(uv run ty:*)",
+      "Bash(gh release list:*)",
+      "Bash(gh release view:*)"
     ],
     "deny": [],
     "ask": []
   },
   "hooks": {
     "Stop": [
       {
-        "matcher": "Edit|MultiEdit|Write",
+        "matcher": "Edit|Write",
         "hooks": [
           {
             "type": "command",
-            "command": "make check"
+            "command": "uv run ruff check --fix && uv run ruff format && uv run ty check",
+            "timeout": 10
           }
         ]
       }
     ]
   }
-}
+}
}

.github/workflows/ci.yaml

Lines changed: 36 additions & 0 deletions
@@ -0,0 +1,36 @@
+name: CI
+
+on:
+  push:
+    branches: [main]
+  pull_request:
+    branches: [main]
+
+jobs:
+  quality:
+    name: Code Quality
+    runs-on: ubuntu-latest
+
+    steps:
+      - name: Checkout code
+        uses: actions/checkout@v6
+
+      - name: Install uv
+        uses: astral-sh/setup-uv@v7
+        with:
+          enable-cache: true
+
+      - name: Set up Python
+        run: uv python install
+
+      - name: Install dependencies
+        run: uv sync --frozen --group dev
+
+      - name: Check formatting
+        run: uv run ruff format --check
+
+      - name: Lint
+        run: uv run ruff check
+
+      - name: Type check
+        run: uv run ty check
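To reproduce the `Code Quality` job locally before pushing, you can run its three check steps in the same order. This is a minimal sketch. It assumes `uv` is on `PATH` and the `dev` group is already installed. `run_checks` and its injectable `run` parameter are hypothetical; `run` exists only so the ordering can be exercised without invoking the tools.

```python
import subprocess
from collections.abc import Callable, Sequence

# Mirrors the check steps of the "Code Quality" job, in the same order.
CHECKS: list[list[str]] = [
    ["uv", "run", "ruff", "format", "--check"],
    ["uv", "run", "ruff", "check"],
    ["uv", "run", "ty", "check"],
]


def run_command(command: Sequence[str]) -> int:
    """Run a command in the current directory and return its exit code."""
    return subprocess.run(command, check=False).returncode


def run_checks(
    checks: Sequence[Sequence[str]] = CHECKS,
    run: Callable[[Sequence[str]], int] = run_command,
) -> int:
    """Run each check in order and return the first non-zero exit code, or 0."""
    for command in checks:
        returncode = run(command)
        if returncode != 0:
            # Stop at the first failure, as a failing step ends the CI job.
            return returncode
    return 0
```

Calling `run_checks()` from the project root invokes the real tools. As in the workflow, a formatting failure is reported before lint or type errors.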

.github/workflows/python-tests.yaml

Lines changed: 0 additions & 41 deletions
This file was deleted.

.gitignore

Lines changed: 1 addition & 0 deletions
@@ -5,6 +5,7 @@ media/
 .comparison_results/
 data/tasks/arc-agi-2-*/
 data/tasks/frontier-science-*/
+ai_working/
 
 # Benchmarking generated reports
 benchmark_report.html

.pre-commit-config.yaml

Lines changed: 26 additions & 0 deletions
@@ -0,0 +1,26 @@
+repos:
+  # uv lock file management
+  - repo: https://github.com/astral-sh/uv-pre-commit
+    rev: 0.10.0
+    hooks:
+      - id: uv-lock
+
+  # Ruff linting and formatting
+  - repo: https://github.com/astral-sh/ruff-pre-commit
+    rev: v0.15.0
+    hooks:
+      # Run the linter with --fix (must be before formatter)
+      - id: ruff-check
+        args: [--fix]
+      # Run the formatter
+      - id: ruff-format
+
+  # Type checking with ty
+  - repo: local
+    hooks:
+      - id: ty-check
+        name: ty type check
+        entry: uv run ty check --output-format concise --no-progress
+        language: system
+        types: [python]
+        pass_filenames: false

.python-version

Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
+3.11

.vscode/settings.json

Lines changed: 7 additions & 1 deletion
@@ -7,7 +7,13 @@
     },
     "editor.defaultFormatter": "charliermarsh.ruff"
   },
+  "notebook.formatOnSave.enabled": true,
+  "notebook.codeActionsOnSave": {
+    "notebook.source.fixAll": "explicit",
+    "notebook.source.organizeImports": "explicit"
+  },
+  "python.languageServer": "None",
   "python.testing.unittestEnabled": false,
   "python.testing.pytestEnabled": true,
   "markdown.extension.orderedList.marker": "one"
-}
+}

AGENTS.md

Lines changed: 42 additions & 91 deletions
@@ -1,101 +1,52 @@
-# Project README
-@README.md
-
-
-# Project Dependencies
-The dependencies are defined in the pyproject.toml file:
-@pyproject.toml
-
-You must only use these dependencies in the code you write. If you need to add a new dependency, confirm with the user first before adding it.
-
-## Documentation
-- Some documentation for common dependencies and tooling, such as `uv` is available in the `ai_context/` folder
-
-
-# Core Design Principles
-### Ruthless Simplicity
-- KISS principle taken to heart: Keep everything as simple as possible.
-- Minimize abstractions: Every layer of abstraction must justify its existence
-- Start minimal, grow as needed: Begin with the simplest implementation that meets current needs
-- Avoid future-proofing: Don't build for hypothetical future requirements
-- Question everything: Regularly challenge complexity in the codebase
-
-### Library Usage Philosophy
-- Use libraries as intended: Minimal wrappers around external libraries
-- Direct integration: Avoid unnecessary adapter layers
-- Selective dependency: Add dependencies only when they provide substantial value
-- Understand what you import: No black-box dependencies
-
-### Testing Strategy
-- Emphasis on integration and end-to-end tests
-- Manual testability as a design goal
-- Focus on critical path testing initially
-- Add unit tests for complex logic and edge cases
-
-## Areas to Embrace Complexity
-Some areas justify additional complexity:
-1. Security: Never compromise on security fundamentals
-2. Data integrity: Ensure data consistency and reliability
-3. Core user experience: Make the primary user flows smooth and reliable
-4. Error visibility: Make problems obvious and diagnosable
-
-## Areas to Aggressively Simplify
-Push for extreme simplicity in these areas:
-1. Internal abstractions: Minimize layers between components
-2. Generic "future-proof" code: Resist solving non-existent problems
-3. Edge case handling: Handle the common cases well first
-4. Framework usage: Use only what you need from frameworks
-5. State management: Keep state simple and explicit
-
-## Remember
-- It's easier to add complexity later than to remove it
-- Code you don't write has no bugs
-- Favor clarity over cleverness
-- The best code is often the simplest
-
-
-# Development Guidelines
-## Other Guidelines
-- Do not use emojis unless asked.
-- Do not include excessive print and logging statements.
+# General Instructions
+- This is a production-grade Python package using `uv` as the package and project manager. You must *always* follow best open-source Python practices.
+- Shortcuts are not appropriate. When in doubt, you must work with the user for guidance.
+- Any documentation you write, including in the README.md, should be clear, concise, and accurate like the official documentation of other production-grade Python packages.
+- Make sure any comments in code are necessary. A necessary comment captures intent that cannot be encoded in names, types, or structure. Comments should be reserved for the "why", only used to record rationale, trade-offs, links to specs/papers, or non-obvious domain insights. They should add signal that code cannot.
+- The current code in the package should be treated as an example of high quality code. Make sure to follow its style and tackle issues in similar ways where appropriate.
+- Do not run tests automatically unless asked since they take a while.
+- Don't generate characters that a user could not type on a standard keyboard like fancy arrows.
+- Anything is possible. Do not blame external factors after something doesn't work on the first try. Instead, investigate and test assumptions through debugging through first principles.
+- When writing documentation
+  - Keep it very concise
+  - No emojis or em dashes.
+  - Documentation should be written exactly like it is for production-grade, polished, open-source Python packages.
 - You should only use the dependencies in the provided dependency files. If you need to add a new one, ask first.
-- Do not automatically run scripts, tests, or move/rename/delete files. Ask the user to do these tasks.
-- Read in the entirety of files to get the full context
-- Include `# Copyright (c) Microsoft. All rights reserved` at the top of each Python file (don't worry about __init__.py files)
-- When writing documentation, you write as if you were a professional and experienced developer making their code available publicly on GitHub.
-- Never add back in code or comments that the user has removed or changed.
-- llms.txt is auto generated by `.github/workflows/generate-llms-txt.yaml`. Do not edit it directly.
+- Never add back in code or comments that someone else has removed or changed since you last viewed it.
 
-## Python Development Rules
-- This project uses Python >=3.11, uv as the package and project manager, and Ruff as a linter and code formatter.
+# Python Development Instructions
+- `ty` by Astral is used for type checking. Always add appropriate type hints such that the code would pass ty's type check.
 - Follow the Google Python Style Guide.
-- Instead of importing `Optional` from typing, using the `| `syntax.
-- Always add appropriate type hints such that the code would pass pyright's type check.
-- For type hints, use `list`, not `List`. For example, if the variable is `[{"name": "Jane", "age": 32}, {"name": "Amy", "age": 28}]` the type hint should be `list[dict[str. str | int]]`
-- Always prefer pathlib for dealing with files. Use `Path.open` instead of `open`.
+- After each code change, checks are automatically run. Fix any issues that arise.
+- **IMPORTANT**: The checks will remove any unused imports after you make an edit to a file. So if you need to use a new import, be sure to use it FIRST (or do your edits at the same time) or else it will be automatically removed. DO NOT use local imports to get around this.
+- At this stage of the project, NEVER add imports to __init__.py files. Leave them empty unless absolutely necessary.
+- Always prefer pathlib for dealing with files. Use `Path.open` instead of `open`.
 - When using pathlib, **always** Use `.parents[i]` syntax to go up directories instead of using `.parent` multiple times.
-- When writing multi-line strings, use `"""` instead of using string concatenation. Use `\` to break up long lines in appropriate places.
-- When creating template strings, prefer to use python-liquid. This is the basic sample code:
-```python
-from liquid import render
-
-print(render("Hello, {{ you }}!", you="World"))
-# Hello, World!
-```
 - When writing tests, use pytest and pytest-asyncio.
 - The pyproject.toml is already configured so you do not need to add the `@pytest.mark.asyncio` decorator.
-- Prefer to use loguru instead of logging
-- Follow Ruff best practices such as:
-  - Within an `except` clause, raise exceptions with `raise ... from err` or `raise ... from None` to distinguish them from errors in exception handling
-- Do not use relative imports.
-- Use dotenv to load environment variables for local development. Assume we have a `.env` file
-- Since this is structured as Python package, you should not put `#!/usr/bin/env python` at the top of scripts as that is redundant.
+- Prefer using loguru for logging instead of the built-in logging module. Do not add logging unless requested.
+- NEVER use `# type: ignore`. It is better to leave the issue and have the user work with you to fix it.
+- Don't put types in quotes unless it is absolutely necessary to avoid circular imports and forward references.
+- When writing multi-line strings, use `"""` instead of using string concatenation. Use `\` to break up long lines in appropriate places.
+
+- When constructing long strings like prompts for LLMs, use `python-liquid`'s `render` function:
+```python
+from liquid import render
+
+print(render("Hello, {{ you }}!", you="World"))
+# Hello, World!
+```
+- Include `# Copyright (c) Microsoft. All rights reserved` at the top of each Python file (don't worry about __init__.py files)
 - For Windows compatibility, use encoding='utf-8' when handling files, unless required otherwise.
 - Leave __init__.py files empty unless there is a specific reason to add code there during this early stage of the project.
+- When running scripts with `uv run` make sure you are in the top level of this Python project. All you need to do is `uv run <script relative path>`. You also do not need to do `uv run python`, the `python` is unnecessary.
 
-# Your workflow
-- After making changes, `make check` will be called. If there are any linter or type errors, fix them.
-- `make check` will remove any unused imports. So if you need to use a new import, be sure to use it as you import it or else it will automatically removed.
+# Key Files
 
-## Running Code
-- When running scripts with `uv run` make sure you are in the top level of this Python project. All you need to do is `uv run <script relative path>`. You also do not need to do `uv run python`, the `python` is unnecessary.
+@README.md
+
+@pyproject.toml
+
+@docs/BENCHMARKING.md
+
+@src/eval_recipes/benchmarking/schemas.py
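Several Python rules kept or added in this rewrite can be seen together in one short sketch. It covers pathlib with `Path.open`, `.parents[i]`, `encoding='utf-8'`, and triple-quoted strings with `\` continuations. The sketch is illustrative only, not code from the repository. `project_root`, `read_text_file`, and `describe_run` are hypothetical helpers. The `src/eval_recipes/` layout is inferred from the `@src/eval_recipes/benchmarking/schemas.py` reference.

```python
from pathlib import Path


def project_root(module_file: Path) -> Path:
    """Return the project root for a module directly inside src/eval_recipes/."""
    # parents[2] replaces module_file.parent.parent.parent.
    return module_file.parents[2]


def read_text_file(path: Path) -> str:
    """Read a text file with an explicit UTF-8 encoding for Windows compatibility."""
    with path.open(encoding="utf-8") as f:
        return f.read()


def describe_run(name: str, count: int) -> str:
    """Build a short multi-line message with a triple-quoted string, not concatenation."""
    # A backslash at the end of a line inside the string joins it to the next line.
    return f"""\
Benchmark run {name} finished. \
All results were generated automatically.

Tasks evaluated: {count}"""
```

Following the guidance above, longer templated text such as LLM prompts would use `python-liquid`'s `render` instead of an f-string.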

Makefile

Lines changed: 0 additions & 17 deletions
This file was deleted.
