aaronshim (Collaborator)

Summary

This pull request introduces a new mechanism to allow individual rating functions to share their results with subsequent ratings. The primary motivation is to enable more context-aware evaluations, specifically allowing the LLM-based code quality auto-rater to be influenced by the results of earlier ratings, such as the safety-web security scan.

Key Changes

  1. Introduced ratingsContext:

    • A new ratingsContext object is created in the main rateGeneratedCode orchestration function (runner/ratings/rate-code.ts).
    • As each rating function completes, its result is stored in the ratingsContext object, keyed by the rating's unique ID.
    • This context object is now passed down to all rating execution functions (runPerBuildRating, runPerFileRating, runLlmBasedRating).
  2. Created a Centralized RatingsContext Type:

    • To improve code clarity and maintainability, a new type alias RatingsContext was created in runner/ratings/rating-types.ts.
    • All relevant function signatures and type definitions have been updated to use this cleaner, centralized type instead of a verbose inline Record<> type.
  3. Plumbed Context to Auto-Raters:

    • The ratingsContext is now passed through codeQualityRating (runner/ratings/built-in-ratings/code-quality-rating.ts) to the underlying autoRateCode function (runner/ratings/autoraters/code-rater.ts).
    • The context is also passed through autoRateFiles (runner/ratings/autoraters/rate-files.ts) to ensure it's available when the auto-rater is called from other entry points.
  4. Implemented Context-Aware Prompting:

    • As an example of consuming a prior result from the context during prompt generation, autoRateCode now checks the ratingsContext for the result of the safety-web rating.
    • If the safety-web rating has executed, its entire result object is serialized to a JSON string.
    • This JSON string is passed to the prompt renderer under the SAFETY_WEB_RESULTS_JSON variable, making the detailed security scan results available for interpolation in custom code rating prompts.
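The orchestration described above can be sketched roughly as follows. This is a minimal illustration, not the actual implementation: the shapes of `RatingResult` and `Rating`, and the synchronous `run` signature, are assumptions made for brevity; only the names `RatingsContext` and `rateGeneratedCode` come from the PR itself.

```typescript
// Hypothetical sketch of the ratingsContext mechanism. Result and rating
// shapes are simplified assumptions, not the real rating-types.ts types.
interface RatingResult {
  score: number;
  details?: unknown;
}

// Centralized alias, keyed by each rating's unique ID.
type RatingsContext = Record<string, RatingResult>;

interface Rating {
  id: string;
  run: (context: RatingsContext) => RatingResult;
}

function rateGeneratedCode(ratings: Rating[]): RatingsContext {
  const ratingsContext: RatingsContext = {};
  for (const rating of ratings) {
    // Each rating sees the results of every rating that ran before it,
    // so ordering determines what context is available.
    const result = rating.run(ratingsContext);
    ratingsContext[rating.id] = result;
  }
  return ratingsContext;
}
```

Because results accumulate in run order, a later rating (such as the code quality auto-rater) can read an earlier rating's result (such as safety-web) simply by looking up its ID in the context.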

How to Use

A custom code rating prompt configured in an environment's config.js can now access the security results like this:

You are a code quality evaluator. Please assess the following code.

A security scan was previously run on this code. Here are the results:
{{ SAFETY_WEB_RESULTS_JSON }}

Please take these security findings into account during your evaluation.
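The `SAFETY_WEB_RESULTS_JSON` variable referenced in the prompt above might be populated along these lines. This is a hedged sketch: the `safety-web` rating ID and the variable name come from the PR description, while `buildPromptVariables` and the context shape are hypothetical names for illustration.

```typescript
// Hypothetical sketch: surfacing the safety-web result as a prompt variable.
type RatingsContext = Record<string, unknown>;

const SAFETY_WEB_RATING_ID = 'safety-web';

function buildPromptVariables(
  ratingsContext: RatingsContext,
): Record<string, string> {
  const vars: Record<string, string> = {};
  const safetyWebResult = ratingsContext[SAFETY_WEB_RATING_ID];
  if (safetyWebResult !== undefined) {
    // Serialize the entire result object so the prompt template can
    // interpolate it via {{ SAFETY_WEB_RESULTS_JSON }}.
    vars['SAFETY_WEB_RESULTS_JSON'] = JSON.stringify(safetyWebResult);
  }
  return vars;
}
```

If the safety-web rating never ran, the variable is simply absent, so a prompt template should tolerate the placeholder being empty or unset.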
