FEAT: Anthropic Model-Written Evals Dataset #1170
Open: 0xm00n wants to merge 13 commits into Azure:main from 0xm00n:feat/anthropic-model-written-evals-dataset
Conversation
[pull] main from Azure:main
- Implement fetch_sorry_bench_dataset() for sorry-bench/sorry-bench-202503 on Hugging Face
- Support filtering by 44 categories and 21 prompt-style linguistic mutations
- Default to base prompts only
- Add unit tests (10 tests)
- Add integration test
- Update API documentation
…dation
- Change prompt_styles (list) to prompt_style (string) for a simpler API
- Add input validation for categories and prompt_style
- Add VALID_CATEGORIES and VALID_PROMPT_STYLES for input validation
- Fix exception handling to preserve stack traces
- Update tests to match the new API (12 tests, all passing)
Implements support for Anthropic's model-written-evals dataset, which comprises 154 evaluation datasets across 4 categories. Changes:
- Add fetch_anthropic_evals_dataset() function with category filtering
- Support the persona, sycophancy, advanced-ai-risk, and winogenerated categories
- Fetch directly from the GitHub repository, using the API for file discovery
- Add unit tests (12 tests)
- Add integration test
- Update API docs
Contributor
Thanks for adding these! Made a few stylistic suggestions!
Co-authored-by: hannahwestra25 <[email protected]>
Contributor romanlutz reviewed on Nov 12, 2025 and left a comment:
Nice work! Looking forward to having this dataset fetcher in PyRIT 🙂
Description
This PR adds support for the Anthropic model-written-evals dataset to PyRIT. The model-written-evals dataset contains 154 evaluation datasets designed to test LLM behaviors across 4 main categories: persona traits, sycophancy, advanced AI risks, and gender bias. The evaluations use language models to automatically generate test cases across multiple behavioral dimensions.
Dataset: https://github.com/anthropics/evals
Associated Paper: https://arxiv.org/abs/2212.09251
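The file-discovery approach the description outlines (listing each category directory through the GitHub API, then selecting the evaluation files) could look roughly like the sketch below. The helper names and the assumption that evaluations ship as .jsonl files are illustrative; the PR's actual implementation lives in anthropic_evals_dataset.py.

```python
# Rough sketch of GitHub-API file discovery for the anthropics/evals repo,
# as the PR description outlines. Helper names are illustrative.
EVAL_CATEGORIES = ("persona", "sycophancy", "advanced-ai-risk", "winogenerated")


def contents_url(category: str) -> str:
    """Build the GitHub contents-API URL that lists one category directory."""
    if category not in EVAL_CATEGORIES:
        raise ValueError(
            f"Unknown category {category!r}; expected one of {EVAL_CATEGORIES}"
        )
    return f"https://api.github.com/repos/anthropics/evals/contents/{category}"


def select_jsonl(listing: list) -> list:
    """Keep only the .jsonl file entries from a contents-API response.

    Each entry in a contents-API listing is a dict with at least
    'type', 'name', and 'download_url' keys.
    """
    return [
        item["download_url"]
        for item in listing
        if item.get("type") == "file" and item.get("name", "").endswith(".jsonl")
    ]
```

A fetcher built this way needs no repository clone: one API request per category returns the directory listing, and each selected download_url can be retrieved directly.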
Work Completed
- fetch_anthropic_evals_dataset() function in anthropic_evals_dataset.py
- test_anthropic_evals_dataset.py (12 test cases)
- test_fetch_datasets.py
- api.rst
- pyrit/datasets/__init__.py

Related Issue
Contributes to issue #450