
Prompt update evaluation notebook #66

@esmahoney


This task is to create a notebook for experimenting with proposed prompt updates. It takes into account the findings from the extension logs and follows the process documented in the scoring notebook.

The goal is to get a sense of whether the proposed changes improve the translation.
Scope:
baseline vs. control prompt and versioning (easy enough, but a good convention to establish)
scope of the proposed change: includes all targeted items, i.e. number/date preservation, forbidden behaviors, etc.
scoring lock (we have this, but it needs to be locked down to measure effectively)
create a standard test set. We already have much of this, but it is important that we consistently evaluate against the same set
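
One possible shape for the notebook's comparison loop is sketched below. Everything in it is an assumption made for illustration: the prompt labels and texts, the two test items, the `translate` callable, and the `digits_preserved` metric are all hypothetical stand-ins, not the project's actual prompts, data, or scoring method. The idea is only to show versioned prompts, a fingerprinted test set, and one fixed scoring function applied to every prompt version.

```python
import hashlib
import json
import re
from statistics import mean

# Hypothetical prompt registry; labels and texts are placeholders.
PROMPTS = {
    "baseline-v1": "Translate the text.",
    "candidate-v2": "Translate the text. Preserve all numbers and dates exactly.",
}

# Hypothetical fixed test set; real items would come from the existing examples.
TEST_SET = [
    {"id": "t1", "source": "Meeting on 12/03/2024 at 15:00."},
    {"id": "t2", "source": "The price is 1250 euros."},
]


def fingerprint(obj) -> str:
    """Short stable hash, so each run records which prompt and test set it used."""
    payload = json.dumps(obj, sort_keys=True, ensure_ascii=False).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()[:12]


def digits_preserved(source: str, translation: str) -> float:
    """Stand-in metric: share of digit runs in the source found in the translation."""
    source_numbers = re.findall(r"\d+", source)
    if not source_numbers:
        return 1.0
    translated_numbers = set(re.findall(r"\d+", translation))
    kept = sum(number in translated_numbers for number in source_numbers)
    return kept / len(source_numbers)


def evaluate(prompt_version: str, translate) -> dict:
    """Score one prompt version on the fixed test set.

    `translate` is any callable (prompt, text) -> str supplied by the caller.
    """
    prompt = PROMPTS[prompt_version]
    scores = [
        digits_preserved(item["source"], translate(prompt, item["source"]))
        for item in TEST_SET
    ]
    return {
        "prompt_version": prompt_version,
        "prompt_hash": fingerprint(prompt),
        "test_set_hash": fingerprint(TEST_SET),
        "mean_score": mean(scores),
    }
```

Calling `evaluate("baseline-v1", translate)` and `evaluate("candidate-v2", translate)` with the same `translate` callable gives two result rows that are comparable only when their `test_set_hash` values match, which is one way to enforce evaluating against the same set.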
