aaronshim (Collaborator)

Summary

This pull request introduces a new mechanism to allow individual rating functions to share their results with subsequent ratings. The primary motivation is to enable more context-aware evaluations, specifically allowing the LLM-based code quality auto-rater to be influenced by the results of earlier ratings, such as the safety-web security scan.

Key Changes

  1. Introduced ratingsContext:

    • A new ratingsContext object is created in the main rateGeneratedCode orchestration function (runner/ratings/rate-code.ts).
    • As each rating function completes, its result is stored in the ratingsContext object, keyed by the rating's unique ID.
    • This context object is now passed down to all rating execution functions (runPerBuildRating, runPerFileRating, runLlmBasedRating).
  2. Created a Centralized RatingsContext Type:

    • To improve code clarity and maintainability, a new type alias RatingsContext was created in runner/ratings/rating-types.ts.
    • All relevant function signatures and type definitions have been updated to use this cleaner, centralized type instead of a verbose inline Record<> type.
  3. Plumbed Context to Auto-Raters:

    • The ratingsContext is now passed through codeQualityRating (runner/ratings/built-in-ratings/code-quality-rating.ts) to the underlying autoRateCode function (runner/ratings/autoraters/code-rater.ts).
    • The context is also passed through autoRateFiles (runner/ratings/autoraters/rate-files.ts) to ensure it's available when the auto-rater is called from other entry points.
  4. Implemented Context-Aware Prompting:

    • As an example of consuming a prior result from the context during prompt generation, autoRateCode now checks the ratingsContext for the result of the safety-web rating.
    • If the safety-web rating has executed, its entire result object is serialized to a JSON string.
    • This JSON string is passed to the prompt renderer under the SAFETY_WEB_RESULTS_JSON variable, making the detailed security scan results available for interpolation in custom code rating prompts.
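The orchestration described above can be sketched roughly as follows. This is a minimal illustration, not the actual implementation: the shapes of `RatingResult` and `Rating`, and the synchronous `run` signature, are assumptions made for brevity; only the names `RatingsContext` and `rateGeneratedCode` come from the PR itself.

```typescript
// Hypothetical sketch of the ratingsContext mechanism. Result and rating
// shapes are simplified assumptions, not the real rating-types.ts types.
interface RatingResult {
  score: number;
  details?: unknown;
}

// Centralized alias, keyed by each rating's unique ID.
type RatingsContext = Record<string, RatingResult>;

interface Rating {
  id: string;
  run: (context: RatingsContext) => RatingResult;
}

function rateGeneratedCode(ratings: Rating[]): RatingsContext {
  const ratingsContext: RatingsContext = {};
  for (const rating of ratings) {
    // Each rating sees the results of every rating that ran before it,
    // so ordering determines what context is available.
    const result = rating.run(ratingsContext);
    ratingsContext[rating.id] = result;
  }
  return ratingsContext;
}
```

Because results accumulate in run order, a later rating (such as the code quality auto-rater) can read an earlier rating's result (such as safety-web) simply by looking up its ID in the context.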

How to Use

A custom code rating prompt configured in an environment's config.js can now access the security results like this:

You are a code quality evaluator. Please assess the following code.

A security scan was previously run on this code. Here are the results:
{{ SAFETY_WEB_RESULTS_JSON }}

Please take these security findings into account during your evaluation.
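The `SAFETY_WEB_RESULTS_JSON` variable referenced in the prompt above might be populated along these lines. This is a hedged sketch: the `safety-web` rating ID and the variable name come from the PR description, while `buildPromptVariables` and the context shape are hypothetical names for illustration.

```typescript
// Hypothetical sketch: surfacing the safety-web result as a prompt variable.
type RatingsContext = Record<string, unknown>;

const SAFETY_WEB_RATING_ID = 'safety-web';

function buildPromptVariables(
  ratingsContext: RatingsContext,
): Record<string, string> {
  const vars: Record<string, string> = {};
  const safetyWebResult = ratingsContext[SAFETY_WEB_RATING_ID];
  if (safetyWebResult !== undefined) {
    // Serialize the entire result object so the prompt template can
    // interpolate it via {{ SAFETY_WEB_RESULTS_JSON }}.
    vars['SAFETY_WEB_RESULTS_JSON'] = JSON.stringify(safetyWebResult);
  }
  return vars;
}
```

If the safety-web rating never ran, the variable is simply absent, so a prompt template should tolerate the placeholder being empty or unset.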
