Releases: IBM/eval-assist
v1.0.5
v1.0.4
What's Changed
- Remove LangChain dependency
- Fix issue: fastapi module not found when importing judges
Full Changelog: v1.0.3...v1.0.4
v1.0.3
What's Changed
- Add EMNLP paper to the webpage by @martinscooper in #138
- Add configuration panel and unify download actions by @martinscooper in #139
- Improve configuration and sample code generation by @martinscooper in #140
- Fixes by @martinscooper in #141
Full Changelog: v1.0.2...v1.0.3
v1.0.2
What's Changed
- Support for in-context learning examples in the frontend
- Downloading the test case as a notebook now generates EvalAssist code instead of Unitxt code
- Improved fix instance feature: new model with text difference visualization
Full Changelog: v1.0.0...v1.0.2
v1.0.1
What's Changed
- Add system prompt to in-house judges by @martinscooper in #133
- Update documentation by @mclanza in #132
- fix: incorrect model used in borderline generation by @martinscooper in #134
More changes:
- Correct self_consistency attribute type
- Add idx to parser failure logs
- JSON parser: sanitize output only if parsing fails
- Add JSON object as the response format for LiteLLM
- Convert persona prompt into message format
- Add more comments to sanitize_and_parse_json
- Use logger instead of root_pkg_logger
- Format frontend code
- Add more logs to the parser
- Improve json sanitizer
- First benchmark updates after replacing LangChain
- Update tests after sanitizer changes
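The "sanitize output only if parsing fails" change can be pictured with the sketch below. It is a minimal illustration under stated assumptions, not the EvalAssist parser: the function names parse_json_output and sanitize_json_text are hypothetical, and the sanitizer only strips Markdown code fences and keeps the outermost {...} span. The point it shows is that well-formed JSON is parsed untouched, and sanitizing is a fallback.

```python
import json
import re
from typing import Any


def sanitize_json_text(text: str) -> str:
    """Hypothetical sanitizer: drop code fences and keep the outermost {...} span."""
    # LLMs often wrap answers in ```json ... ``` fences; remove the fence markers.
    text = re.sub(r"```(?:json)?", "", text)
    start, end = text.find("{"), text.rfind("}")
    if start == -1 or end < start:
        raise ValueError("no JSON object found in model output")
    return text[start : end + 1]


def parse_json_output(text: str) -> Any:
    """Parse strictly first; sanitize and retry only if the strict parse fails."""
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        return json.loads(sanitize_json_text(text))
```

For example, parse_json_output('Sure! ```json\n{"selected_option": "Yes"}\n```') falls through to the sanitizer and returns {"selected_option": "Yes"}, while a clean '{"a": 1}' never reaches it.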
Full Changelog: v0.3.2...v1.0.1
v1.0.0
Breaking changes were introduced in this version.
Changes:
- Criteria's prediction_field was renamed to to_evaluate_field.
- DirectInstance and PairwiseInstance models were removed and unified under the Instance model.
- Instance now just holds a fields attribute. The context, response and responses fields were removed.
- The logic that determines which text is evaluated and which text serves as context was redesigned. The criteria now define the role of each of the instance fields.
- LangChain usage was heavily reduced and replaced by custom logic; the dependency will be removed soon.
- The prompts of EvalAssist's in-house judges were changed: they now use system prompts and the message format.
- LangChain's output fixers were replaced with custom logic.
- Synthetic instance generation was improved.
- More tests were added.
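To make the new data model concrete, here is a rough sketch of an instance that only holds named fields, with the criteria deciding which field is evaluated. Criteria, Instance and to_evaluate_field come from the notes above, but these dataclasses are hypothetical stand-ins rather than the EvalAssist classes; context_fields and split_instance are assumptions added purely for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Criteria:
    """Hypothetical stand-in for a criteria definition (not the EvalAssist class)."""
    name: str
    description: str
    to_evaluate_field: str  # formerly prediction_field
    # Assumption: the names of the fields that act as context for the judge.
    context_fields: list[str] = field(default_factory=list)


@dataclass
class Instance:
    """Hypothetical stand-in: an instance is just a bag of named text fields."""
    fields: dict[str, str]


def split_instance(criteria: Criteria, instance: Instance) -> tuple[str, dict[str, str]]:
    """Let the criteria decide which field is evaluated and which fields are context."""
    text = instance.fields[criteria.to_evaluate_field]
    context = {name: instance.fields[name] for name in criteria.context_fields}
    return text, context
```

With a criteria whose to_evaluate_field is "answer" and whose context_fields is ["question"], an instance with fields {"question": "What is 2+2?", "answer": "4"} splits into ("4", {"question": "What is 2+2?"}). The same instance could be judged on a different field simply by using different criteria.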
v0.3.2
v0.3.1
v0.3.0
What's Changed
- Pairwise with tie by @martinscooper in #129
Important:
- The in-house DirectJudge's prompt was changed, so you may see slightly different (and better) results.
- Some types were changed to accommodate a pairwise tie as a possible option. Moreover, the pairwise comparison result type was updated to accommodate both global results (selected option and explanation) and detailed results (all-vs-all strategy).
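One way to picture a result type that carries both a global outcome (which may be a tie) and the detailed all-vs-all comparisons is sketched below. Every name here (TIE, PairComparison, PairwiseResult, compare_all_vs_all, shorter_wins) is hypothetical and not the EvalAssist API, and scoring a tie as half a point per option is an assumption made only for this example.

```python
from dataclasses import dataclass
from itertools import combinations
from typing import Callable

TIE = "tie"  # hypothetical sentinel meaning "no option preferred"


@dataclass
class PairComparison:
    """One judge call in the all-vs-all strategy (hypothetical type)."""
    option_a: str
    option_b: str
    winner: str  # option_a, option_b, or TIE
    explanation: str


@dataclass
class PairwiseResult:
    """Global result plus the detailed per-pair results (hypothetical type)."""
    selected_option: str  # the overall winner, or TIE
    explanation: str
    detailed: list[PairComparison]


Judge = Callable[[str, str], tuple[str, str]]  # returns (winner, explanation)


def compare_all_vs_all(options: list[str], judge: Judge) -> PairwiseResult:
    detailed: list[PairComparison] = []
    points = {option: 0.0 for option in options}
    for a, b in combinations(options, 2):
        winner, explanation = judge(a, b)
        detailed.append(PairComparison(a, b, winner, explanation))
        if winner == TIE:
            # Assumption: a tie gives each option half a point.
            points[a] += 0.5
            points[b] += 0.5
        else:
            points[winner] += 1.0
    best = max(points.values())
    leaders = [option for option, score in points.items() if score == best]
    if len(leaders) == 1:
        return PairwiseResult(leaders[0], f"{leaders[0]} scored {best} points", detailed)
    return PairwiseResult(TIE, f"tie between {', '.join(leaders)} at {best} points", detailed)


def shorter_wins(a: str, b: str) -> tuple[str, str]:
    """Toy judge for the example: prefers the shorter text, ties on equal length."""
    if len(a) == len(b):
        return TIE, "same length"
    return (a, "a is shorter") if len(a) < len(b) else (b, "b is shorter")
```

With the toy judge, compare_all_vs_all(["a", "bb", "ccc"], shorter_wins) selects "a", while ["short", "longer answer", "tiny!"] ends in a global tie because "short" and "tiny!" tie head-to-head and each beat the longer option. In both cases the three per-pair results remain available in detailed.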
Full Changelog: v0.2.4...v0.3.0