-
Couldn't load subscription status.
- Fork 17
Add support for evals via Inspect AI #184
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Co-authored-by: Carson Sievert <[email protected]>
Updates Chat.to_solver to initialize the InspectAI state with existing chat history and system prompt if present. Adds a test to verify that chat history and system prompt are preserved when creating a solver.
…into add-chat-solver
Expanded Chat.to_solver to translate rich content (text, images) and handle tool calls for compatibility with InspectAI's message format. Added helper functions for content translation and tool call extraction. Updated tests to cover scenarios with mixed content and tool calls in chat history.
Simplifies content translation logic in Chat class, streamlining handling of different content types and tool calls.
Moved InspectAI translation helpers from _chat.py to new _inspect.py module.
Co-authored-by: Carson Sievert <[email protected]>
Reorganized and cleaned up imports in chatlas/_inspect.py and tests/test_inspect.py.
Added integration tests to verify ChatOpenAI evaluation with inspect_ai, including geography, simple QA, and tool usage scenarios. Also updated example usage in Chat docstring for clarity.
d3288af to
8afe02a
Compare
5aedaf7 to
31d8e46
Compare
31d8e46 to
74c6323
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR adds integration between chatlas and InspectAI evaluation framework, enabling users to evaluate chat models using InspectAI's tooling. The integration provides methods to convert chat instances into InspectAI solvers and export chat histories as evaluation datasets.
Key changes:
- New
Chat.to_solver()method to convert chat instances into InspectAI solvers for evaluations - New
Chat.export_eval()method to export chat histories as JSONL evaluation datasets - Content translation layer between chatlas and InspectAI formats
- Comprehensive test suite covering integration, content translation, and edge cases
- Documentation guide explaining how to use evaluations with chatlas
Reviewed Changes
Copilot reviewed 8 out of 10 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
| chatlas/_inspect.py | New module implementing bidirectional translation between chatlas and InspectAI data structures |
| chatlas/_chat.py | Adds to_solver() and export_eval() methods to Chat class, plus import of time module |
| chatlas/_turn.py | Adds to_inspect_messages() helper method to Turn class |
| tests/test_inspect.py | Comprehensive test suite covering InspectAI integration scenarios |
| pyproject.toml | Adds optional 'eval' extra dependency for inspect-ai |
| docs/misc/evals.qmd | New documentation page explaining evaluation workflows |
| docs/_sidebar.yml | Updates sidebar with new provider references and reorganized sections |
| docs/_quarto.yml | Adds evals documentation to site navigation |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Co-authored-by: Copilot <[email protected]>
This PR adds 2
Chatmethods:.export_eval()and.to_solver(), which makes it easy to export a chat session to an Inspect AI eval and use aChatas a solver.To learn more about how this works, visit the new articles on evals -- https://posit-dev.github.io/chatlas/misc/evals.html
Closes #178