
Commit b500eac

[Maxim] Human eval docs
1 parent a538622 commit b500eac

File tree

5 files changed: +3 additions, −0 deletions

offline-evals/via-ui/prompts/human-annotation.mdx

Lines changed: 3 additions & 0 deletions
@@ -48,6 +48,7 @@ The human evaluation set requires the following choices
3. For email-based evaluation requests to SMEs or external annotators, we make it easy to send only the required entries using a sampling rate. The sampling rate can be defined in two ways (illustrated in the sketch after this list):
- Percentage of total entries - This is relevant for large datasets where it is not possible to manually rate every entry.
- Custom logic - This helps send entries of a particular type to raters, e.g. entries that have a low score on the Bias metric (auto eval). By defining these rules, you can make sure your SMEs' time is spent on the most relevant cases.
4. Use the dropdown to select any columns from the dataset. The selected fields will be displayed on the Human Evaluation dashboard and included in the evaluation emails sent to raters.
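
To make the two sampling options above concrete, here is a minimal, hypothetical Python sketch of how such rules could be expressed. The entry structure, the `bias_score` field, and the function names are illustrative assumptions, not Maxim's actual API; in the product these rules are configured through the UI.

```python
import random

# Illustrative only: the entry structure and the "bias_score" field are assumptions,
# not Maxim's API. In the product, sampling rules are configured in the UI.

def sample_by_percentage(entries, percentage):
    """Randomly sample a percentage of all entries for human rating."""
    k = max(1, round(len(entries) * percentage / 100))
    return random.sample(entries, k)

def sample_by_custom_logic(entries, max_bias_score=0.4):
    """Select only entries with a low Bias (auto eval) score,
    so SME time goes to the most relevant cases."""
    return [e for e in entries if e.get("bias_score", 1.0) <= max_bias_score]

entries = [
    {"id": 1, "bias_score": 0.2},
    {"id": 2, "bias_score": 0.9},
    {"id": 3, "bias_score": 0.35},
]

print(sample_by_percentage(entries, 10))   # a 10% random sample (at least one entry)
print(sample_by_custom_logic(entries))     # entries 1 and 3 (low Bias score)
```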

![Human annotation set up](/images/docs/evaluate/how-to/evaluate-prompts/human-annotation-pipeline/set-up-human-annotation.png)

@@ -82,6 +83,8 @@ Human raters can go through the query, retrieved context, output and expected output

Once all entries are completed by a rater, the summary scores and pass/fail results for the human ratings are shown alongside all other auto-evaluation results in the test run report. The human annotation section will show a `Completed` status next to that rater's email. To view the detailed ratings by a particular individual, click the `View details` button and go through the table provided.
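
As a rough illustration of what a summary score and pass/fail result represent, the sketch below averages one rater's scores against an assumed threshold; the 1-5 rating scale and the 3.5 pass threshold are hypothetical, not Maxim's actual computation.

```python
# Hypothetical: the 1-5 scale and the 3.5 pass threshold are assumptions,
# not how Maxim necessarily computes the test run report.
ratings = [4, 5, 3, 4]          # one rater's scores for a metric
pass_threshold = 3.5

summary_score = sum(ratings) / len(ratings)
result = "Pass" if summary_score >= pass_threshold else "Fail"
print(f"Summary score: {summary_score:.2f} -> {result}")  # Summary score: 4.00 -> Pass
```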

Analyze ratings and comments from human annotators. View rating overviews, read detailed feedback, and review rewritten outputs. Filter by specific annotators or apply custom filters to focus your analysis.
![Human review details](/images/docs/evaluate/how-to/evaluate-prompts/human-annotation-pipeline/review-details.png)

If there are particular cases where you would like to use the human-corrected output to build ground truth in your datasets, you can use the [data curation flows](/library/datasets/curate-datasets).
