@karangattu
Contributor

@karangattu karangattu commented Sep 25, 2025

This PR adds two Chat methods, .export_eval() and .to_solver(), which make it easy to export a chat session to an Inspect AI eval and to use a Chat as a solver.

To learn more about how this works, visit the new articles on evals -- https://posit-dev.github.io/chatlas/misc/evals.html
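
A minimal sketch of how `.to_solver()` might slot into an Inspect AI task, assuming the `inspect-ai` package is installed. `build_task` and the sample question are hypothetical, chosen only for illustration. `Task`, `Sample`, and `includes` come from Inspect AI; the only chatlas call is the `.to_solver()` method this PR adds, and calling it with no arguments is an assumption.

```python
def build_task(chat):
    """Wrap a chatlas Chat in an Inspect AI Task (illustrative helper, not part of chatlas)."""
    # Imported lazily so this sketch can be defined without inspect-ai installed.
    from inspect_ai import Task
    from inspect_ai.dataset import Sample
    from inspect_ai.scorer import includes

    return Task(
        # One hand-written sample; a real eval would load a dataset from a file.
        dataset=[Sample(input="What is the capital of France?", target="Paris")],
        # Assumption: to_solver() can be called without arguments.
        solver=chat.to_solver(),
        # includes() marks a response correct when it contains the target text.
        scorer=includes(),
    )
```

Passing a `Chat` instance to `build_task` should return a `Task` that Inspect AI can run; the linked evals article is the authoritative reference for the supported options.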

Closes #178

karangattu and others added 6 commits September 29, 2025 21:17
Co-authored-by: Carson Sievert <[email protected]>
Updates Chat.to_solver to initialize the InspectAI state with existing chat history and system prompt if present. Adds a test to verify that chat history and system prompt are preserved when creating a solver.
Expanded Chat.to_solver to translate rich content (text, images) and handle tool calls for compatibility with InspectAI's message format. Added helper functions for content translation and tool call extraction. Updated tests to cover scenarios with mixed content and tool calls in chat history.
Simplifies content translation logic in Chat class, streamlining handling of different content types and tool calls.
Moved InspectAI translation helpers from _chat.py to new _inspect.py module.
Added integration tests to verify ChatOpenAI evaluation with inspect_ai, including geography, simple QA, and tool usage scenarios. Also updated example usage in Chat docstring for clarity.
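
The translation these commits describe (text, images, and tool calls carried over along with the system prompt and history) can be pictured roughly as below. This is a sketch under stated assumptions, not the code in `_inspect.py`: the `Text`, `Image`, `ToolRequest`, and `Turn` classes and the output dictionaries are simplified, hypothetical stand-ins for the chatlas and Inspect AI types.

```python
from dataclasses import dataclass, field
from typing import Any


# Hypothetical stand-ins for chat content types (not the real chatlas classes).
@dataclass
class Text:
    text: str


@dataclass
class Image:
    url: str


@dataclass
class ToolRequest:
    id: str
    name: str
    arguments: dict[str, Any]


@dataclass
class Turn:
    role: str  # "system", "user", or "assistant"
    contents: list[Any] = field(default_factory=list)


def translate_turn(turn: Turn) -> dict[str, Any]:
    """Convert one turn into a plain message dict, splitting out tool calls."""
    content: list[dict[str, Any]] = []
    tool_calls: list[dict[str, Any]] = []
    for item in turn.contents:
        if isinstance(item, Text):
            content.append({"type": "text", "text": item.text})
        elif isinstance(item, Image):
            content.append({"type": "image", "image": item.url})
        elif isinstance(item, ToolRequest):
            tool_calls.append(
                {"id": item.id, "function": item.name, "arguments": item.arguments}
            )
        else:
            raise TypeError(f"Unsupported content type: {type(item).__name__}")
    message: dict[str, Any] = {"role": turn.role, "content": content}
    if tool_calls:
        # Only attach the key when the turn actually requested tools.
        message["tool_calls"] = tool_calls
    return message


def translate_history(turns: list[Turn]) -> list[dict[str, Any]]:
    """Translate every turn in order, so a leading system prompt stays first."""
    return [translate_turn(turn) for turn in turns]
```

Keeping tool calls separate from displayable content mirrors the split the commit message mentions between content translation and tool call extraction.
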
@cpsievert cpsievert force-pushed the add-chat-solver branch 2 times, most recently from 5aedaf7 to 31d8e46 Compare October 28, 2025 15:25
@cpsievert cpsievert marked this pull request as ready for review October 28, 2025 21:40
@cpsievert cpsievert requested a review from Copilot October 28, 2025 21:40
@cpsievert cpsievert changed the title Add ability chat instance to use it as chat.to_solver() Add integration with Inspect AI Oct 28, 2025
Contributor

Copilot AI left a comment

Pull Request Overview

This PR adds integration between chatlas and the InspectAI evaluation framework, enabling users to evaluate chat models using InspectAI's tooling. The integration provides methods to convert chat instances into InspectAI solvers and to export chat histories as evaluation datasets.

Key changes:

  • New Chat.to_solver() method to convert chat instances into InspectAI solvers for evaluations
  • New Chat.export_eval() method to export chat histories as JSONL evaluation datasets
  • Content translation layer between chatlas and InspectAI formats
  • Comprehensive test suite covering integration, content translation, and edge cases
  • Documentation guide explaining how to use evaluations with chatlas
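
To make the JSONL export idea concrete, here is a rough sketch that writes one JSON object per line using the `input` and `target` field names that Inspect AI samples use. The helper names are hypothetical, and the exact record layout produced by `Chat.export_eval()` is not shown in this PR, so treat the shape below as an assumption.

```python
import json
from pathlib import Path


def write_jsonl_samples(pairs, path):
    """Write (input, target) pairs as JSON Lines: one JSON object per line."""
    path = Path(path)
    with path.open("w", encoding="utf-8") as f:
        for user_input, target in pairs:
            # Assumed record shape; the real export may carry more fields.
            f.write(json.dumps({"input": user_input, "target": target}) + "\n")
    return path


def read_jsonl_samples(path):
    """Read the file back into a list of dicts, skipping blank lines."""
    with Path(path).open(encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]
```

One record per line keeps the dataset appendable: a new sample can be added without rewriting the rest of the file.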

Reviewed Changes

Copilot reviewed 8 out of 10 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| chatlas/_inspect.py | New module implementing bidirectional translation between chatlas and InspectAI data structures |
| chatlas/_chat.py | Adds to_solver() and export_eval() methods to Chat class, plus import of time module |
| chatlas/_turn.py | Adds to_inspect_messages() helper method to Turn class |
| tests/test_inspect.py | Comprehensive test suite covering InspectAI integration scenarios |
| pyproject.toml | Adds optional 'eval' extra dependency for inspect-ai |
| docs/misc/evals.qmd | New documentation page explaining evaluation workflows |
| docs/_sidebar.yml | Updates sidebar with new provider references and reorganized sections |
| docs/_quarto.yml | Adds evals documentation to site navigation |

@cpsievert cpsievert changed the title Add integration with Inspect AI Add support for evals via Inspect AI Oct 28, 2025
@cpsievert cpsievert merged commit bc9764d into main Oct 28, 2025
2 of 7 checks passed

Development

Successfully merging this pull request may close these issues.

Feature request: Integrate inspect-ai to evaluate chatlas responses

3 participants