Description
This task is to create a notebook and experiment with the proposed prompt updates. It takes into consideration the findings from the extension logs and follows the process documented in the scoring notebook.
The goal is to get a sense of whether the proposed changes improve the translation.
Scope:
Baseline vs. control prompt and versioning (easy enough, but a good convention to establish)
Scope of the proposed change. Includes all targeted items, e.g. number/date preservation, forbidden behaviors, etc.
Scoring lock (we have this, but need to lock it down to measure effectively)
Create a standard test set. We have much of this already, but it is important to make sure that we consistently evaluate against the same set
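The scope items above could be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the prompt IDs, prompt texts, `fingerprint`, `run_manifest`, and `numbers_preserved` are all hypothetical names introduced here. The idea is to record a content hash of the prompt and of the test set with every run, so that two runs are only compared when they used the same test set, and to check number/date preservation with a simple token comparison.

```python
import hashlib
import json
import re

# Hypothetical prompt registry. IDs and texts are placeholders,
# not the project's real prompts.
PROMPTS = {
    "baseline-v1": "Translate the text.",
    "candidate-v2": "Translate the text. Preserve all numbers and dates exactly.",
}


def fingerprint(obj) -> str:
    """Return a short, stable hash of any JSON-serializable object."""
    payload = json.dumps(obj, sort_keys=True, ensure_ascii=False).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()[:12]


def run_manifest(prompt_id: str, test_set: list) -> dict:
    """Describe one evaluation run so it can be compared with others later."""
    return {
        "prompt_id": prompt_id,
        "prompt_hash": fingerprint(PROMPTS[prompt_id]),
        "test_set_hash": fingerprint(test_set),
        "n_cases": len(test_set),
    }


# Digits optionally joined by common separators, e.g. 2024-03-15, 10:30, 1,250.50
NUMBER_RE = re.compile(r"\d+(?:[.,/:-]\d+)*")


def numbers_preserved(source: str, translation: str) -> bool:
    """True if source and translation contain the same numeric tokens.

    Assumption: numbers and dates should be copied verbatim. A translation
    that localizes the format (e.g. 03/15 -> 15/03) is flagged as a failure.
    """
    return sorted(NUMBER_RE.findall(source)) == sorted(NUMBER_RE.findall(translation))


# Hypothetical test set with a single case.
test_set = [{"id": 1, "source": "Meeting on 2024-03-15 at 10:30 costs 1,250.50"}]
baseline_run = run_manifest("baseline-v1", test_set)
candidate_run = run_manifest("candidate-v2", test_set)
```

Under these assumptions, a comparison between two runs is meaningful only when their `test_set_hash` values match, and the `prompt_hash` makes it visible when a prompt was edited without bumping its version label.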