Releases · IBM/eval-assist

Support for in-context learning examples in the frontend
Downloading the test case as a notebook now generates evalassist code, not unitxt
Improved fix-instance feature: new model with text difference visualization

Full Changelog: v1.0.0...v1.0.2


29 Oct 21:39

martinscooper

v1.0.1

6f7ee7b

v1.0.1

What's Changed

Add system prompt to in-house judges by @martinscooper in #133
Update documentation by @mclanza in #132
fix: incorrect model used in borderline generation by @martinscooper in #134

More changes:

Correct self_consistency attribute type
Add idx to parser failures logging
JSON parser: sanitize output only if parsing fails
Add json object as the response format for litellm
Convert persona prompt into message format
Add more comments to sanitize_and_parse_json
Use logger instead of root_pkg_logger
Format frontend code
Add more logs to the parser
Improve json sanitizer
First benchmark updates after replacing langchain
Update tests after sanitizer changes
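
The "sanitize output only if parsing fails" change in the list above can be illustrated with a minimal sketch. This is not the project's `sanitize_and_parse_json` implementation; the function names `sanitize_json_text` and `parse_json_lenient` and the sanitizing rules are assumptions made for illustration only.

```python
import json
import re
from typing import Any


def sanitize_json_text(raw: str) -> str:
    """Hypothetical sanitizer: drop code-fence markers and surrounding chatter."""
    # Remove Markdown code-fence markers (three backticks, optionally followed by "json").
    text = re.sub(r"`{3}(?:json)?", "", raw)
    # Keep only the span from the first '{' to the last '}'.
    start, end = text.find("{"), text.rfind("}")
    if start == -1 or end <= start:
        return text.strip()
    return text[start : end + 1]


def parse_json_lenient(raw: str) -> Any:
    """Try a strict parse first; sanitize and retry only when that fails."""
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        # Well-formed output never reaches the sanitizer, so it cannot be altered by it.
        return json.loads(sanitize_json_text(raw))
```

Parsing first means the sanitizer only touches output that is already known to be malformed, which is presumably the motivation for the change.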

New Contributors

@mclanza made their first contribution in #132

Full Changelog: v0.3.2...v1.0.1

Contributors

martinscooper and mclanza


27 Oct 16:37

martinscooper

v1.0.0

6f7ee7b

v1.0.0

Warning: breaking changes were introduced in this version.

Changes:

Criteria's prediction_field was renamed to to_evaluate_field.
DirectInstance and PairwiseInstance models were removed and unified under the Instance model.
Instance now holds only a fields attribute; the context, response and responses fields were removed.
The logic that determines which text is evaluated and which text serves as context was redesigned. The criteria now defines the role of each instance field.
LangChain usage was heavily reduced and replaced by custom logic; the dependency will be removed soon.
The prompts of EvalAssist's in-house judges were changed: they now use system prompts and the message format.
LangChain's output fixer was replaced with custom logic.
Synthetic instance generation was improved.
More tests were added.
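
The breaking changes above can be pictured with a small stand-in model. This sketch does not import or reproduce the evalassist classes: the dataclasses below are simplified stand-ins, `context_fields` and `split_roles` are hypothetical names, and only `fields` and `to_evaluate_field` come from the notes.

```python
from dataclasses import dataclass, field


@dataclass
class Instance:
    # Stand-in for the unified model: only a mapping of field names to text.
    fields: dict[str, str] = field(default_factory=dict)


@dataclass
class Criteria:
    # Stand-in criteria: names the field to judge (formerly prediction_field).
    name: str
    to_evaluate_field: str
    # Hypothetical attribute listing the fields that act as context.
    context_fields: list[str] = field(default_factory=list)


def split_roles(instance: Instance, criteria: Criteria) -> tuple[str, dict[str, str]]:
    """Return the text to evaluate and the context, as assigned by the criteria."""
    text = instance.fields[criteria.to_evaluate_field]
    context = {name: instance.fields[name] for name in criteria.context_fields}
    return text, context


instance = Instance(fields={"question": "What is 2+2?", "answer": "4"})
criteria = Criteria(name="correctness", to_evaluate_field="answer", context_fields=["question"])
text_to_evaluate, context = split_roles(instance, criteria)
```

Under this reading, the instance carries no notion of "response" or "context" on its own; the same instance can be judged on a different field by swapping the criteria.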


15 Oct 02:46

martinscooper

v0.3.2

870f7a0

v0.3.2

What's Changed

Remove async parser by @martinscooper in #131

Full Changelog: v0.3.1...v0.3.2

Contributors

martinscooper


14 Oct 17:39

martinscooper

v0.3.1

aee8d30

v0.3.1

What's Changed

Several improvements by @martinscooper in #130

Full Changelog: v0.3.0...v0.3.1

Contributors

martinscooper


09 Oct 18:11

martinscooper

v0.3.0

1ea2f4b

v0.3.0

What's Changed

Pairwise with tie by @martinscooper in #129

Important:

The in-house DirectJudge's prompt was changed, so you may see slightly different (and better) results.
Some types changed in order to accommodate a pairwise tie as a possible option. Moreover, the pairwise comparison result type was updated to accommodate both global results (selected option and explanation) and detailed results (all-vs-all strategy).
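
One way to picture a result type that carries both a global outcome and all-vs-all details, with a tie as a valid outcome, is sketched below. The class `PairwiseResult`, its attribute names, the `all_vs_all` helper, and the win-count aggregation are all assumptions for illustration; the notes do not describe the actual types or how the judge aggregates comparisons.

```python
from dataclasses import dataclass, field
from itertools import combinations
from typing import Callable, Optional


@dataclass
class PairwiseResult:
    selected_option: str  # name of the winning response, or "tie"
    explanation: str
    # Detailed results: winner (or "tie") for every compared pair.
    per_pair: dict[tuple[str, str], str] = field(default_factory=dict)


def all_vs_all(
    responses: dict[str, str],
    compare: Callable[[str, str], Optional[int]],
) -> PairwiseResult:
    """Compare every pair; `compare` returns 0 (first wins), 1 (second wins) or None (tie)."""
    if not responses:
        raise ValueError("at least one response is required")
    wins = {name: 0 for name in responses}
    per_pair: dict[tuple[str, str], str] = {}
    for a, b in combinations(responses, 2):
        outcome = compare(responses[a], responses[b])
        winner = "tie" if outcome is None else (a, b)[outcome]
        per_pair[(a, b)] = winner
        if winner != "tie":
            wins[winner] += 1
    best = max(wins.values())
    leaders = [name for name, count in wins.items() if count == best]
    # The global result is a tie whenever no single response leads on win count.
    selected = leaders[0] if len(leaders) == 1 else "tie"
    return PairwiseResult(selected, f"win counts: {wins}", per_pair)
```

In practice `compare` would wrap a judge call; here it is left as a plain callable so the aggregation can be exercised without a model.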

Full Changelog: v0.2.4...v0.3.0

Contributors

martinscooper


30 Sep 23:08

martinscooper

v0.2.4

e60ee5b

v0.2.4

What's Changed

Async fix by @martinscooper in #128

Full Changelog: v0.2.3...v0.2.4

Contributors

martinscooper

