Skip to content

Conversation

YoYoJa
Copy link
Contributor

@YoYoJa YoYoJa commented Oct 17, 2025

Description

Please add an informative description that covers that changes made by the pull request and link all relevant issues.

If an SDK is being regenerated based on a new API spec, a link to the pull request containing these API spec changes should be included above.

All SDK Contribution checklist:

  • The pull request does not introduce [breaking changes]
  • CHANGELOG is updated for new features, bug fixes or other significant changes.
  • I have read the contribution guidelines.

General Guidelines and Best Practices

  • Title of the pull request is clear and informative.
  • There are a small number of commits, each of which have an informative message. This means that previously merged commits do not appear in the history of the PR. For more information on cleaning up the commits in your PR, see this page.

Testing Guidelines

  • Pull request includes test coverage for the included changes.

Neehar Duvvuri and others added 2 commits October 17, 2025 16:43
@YoYoJa YoYoJa requested a review from a team as a code owner October 17, 2025 20:55
@Copilot Copilot AI review requested due to automatic review settings October 17, 2025 20:55
@github-actions github-actions bot added the Evaluation Issues related to the client library for Azure AI Evaluation label Oct 17, 2025
Copy link
Contributor

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR enhances the Azure AI Evaluation SDK with OpenTelemetry integration for logging evaluation results to Application Insights, and updates the prompty evaluation system to return structured data including token usage statistics.

  • Adds support for logging evaluation results to Application Insights using OpenTelemetry
  • Updates prompty evaluation to return dictionaries with token usage and metadata instead of raw responses
  • Includes comprehensive test coverage for the new evaluation result conversion functionality

Reviewed Changes

Copilot reviewed 24 out of 24 changed files in this pull request and generated 4 comments.

Show a summary per file
File Description
setup.py Adds opentelemetry extra dependency for Application Insights integration
_model_configurations.py Defines new TypedDict structures for evaluation results and App Insights configuration
_legacy/prompty/_utils.py Updates format_llm_response to return structured dictionary with token usage
_legacy/prompty/_prompty.py Changes AsyncPrompty return type to dict for structured response
_evaluators/_common/_base_prompty_eval.py Updates evaluation logic to handle new structured prompty response format
_evaluate/_evaluate.py Adds comprehensive evaluation result conversion and App Insights logging functionality
_common/rai_service.py Enhances content harm response parsing with token usage statistics
_aoai/*.py Updates AOAI grader classes with proper type attributes
tests/unittests/test_evaluate.py Adds comprehensive unit tests for evaluation result conversion
tests/e2etests/test_prompty_async.py Updates tests to handle new structured prompty response format
tests/e2etests/test_mass_evaluate.py Adjusts test assertions for increased evaluation metrics

Comment on lines +568 to +576
input_str = f"{json.dumps(inputs)}" if inputs else ""
if inputs and len(inputs) > 0:
sample_input_json = []
msg = ChatCompletionUserMessageParam(
role="user",
content=input_str,
)
sample_input_json.append(msg)
sample_input = json.dumps(sample_input_json)
Copy link

Copilot AI Oct 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The variable sample_input is referenced on line 592 but is only assigned within the if block on lines 569-576. If inputs is empty or None, sample_input will be undefined when accessed, causing a NameError.

Copilot uses AI. Check for mistakes.

elif client_type == "pf_client":
batch_run_client = ProxyClient(user_agent=UserAgentSingleton().value)
# Ensure the absolute path is passed to pf.run, as relative path doesn't work with
# Ensure the absolute path is Re to pf.run, as relative path doesn't work with
Copy link

Copilot AI Oct 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Corrected incomplete comment from 'Re to pf.run' to 'passed to pf.run'.

Suggested change
# Ensure the absolute path is Re to pf.run, as relative path doesn't work with
# Ensure the absolute path is passed to pf.run, as relative path doesn't work with

Copilot uses AI. Check for mistakes.

result = await prompty(firstName="Bob", question="What is the capital of France?")
assert isinstance(result, ChatCompletion)
response: str = result.choices[0].message.content or ""
result = await prompty(question="What is the capital of France?", firstName="Barbra", lastName="Streisand")
Copy link

Copilot AI Oct 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The function call uses firstName and lastName parameters but line 204 checks for 'Bob' in the response, which doesn't match the provided firstName 'Barbra'. This will cause the assertion to fail.

Copilot uses AI. Check for mistakes.

Comment on lines +1671 to +1678
results: EvaluationResult,
logger: logging.Logger,
eval_id: Optional[str] = None,
eval_run_id: Optional[str] = None,
evaluators: Dict[str, Union[Callable, AzureOpenAIGrader]] = None,
eval_run_summary: Optional[Dict[str, Any]] = None,
eval_meta_data: Optional[Dict[str, Any]] = None,
) -> None:
Copy link

Copilot AI Oct 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function is extremely long (500+ lines) and handles multiple complex responsibilities including result parsing, data transformation, and summary calculation. Consider breaking it into smaller, focused functions for better maintainability and testability.

Copilot uses AI. Check for mistakes.

YoYoJa and others added 24 commits October 17, 2025 14:01
* add eval result converter

* Add result converter

* update converter params to optional

* add eval meta data

* fix type

* remove useless file

* get eval meta data as input

* fix build errors

* remove useless import

* resolve comments

* update

* update comments

* fix checker failure

* add error msg and error code

* Surface evaluator error msg

* update UT

* fix usage

* make eval_meta_data optional

* remove useless lines

* update param name to add underscore

* parse updated annotation results

* update trace_id

* expose sample data for sdk evaluators

* update

* update

* fix UT

* fix tests

* fix test
* add eval result converter

* Add result converter

* update converter params to optional

* add eval meta data

* fix type

* remove useless file

* get eval meta data as input

* fix build errors

* remove useless import

* resolve comments

* update

* update comments

* fix checker failure

* add error msg and error code

* Surface evaluator error msg

* update UT

* fix usage

* make eval_meta_data optional

* remove useless lines

* update param name to add underscore

* parse updated annotation results

* update trace_id

* expose sample data for sdk evaluators

* update

* Fix column mapping bug for AOAI evaluators with custom data mapping (#43429)

* fix nesting bug for custom data mapping

* address comments

* remove extra code and fix test case

* run formatter

* use dumps

* Modify logic for message body on Microsoft.ApplicationInsights.MessageData to include default message for messages with empty body and export logs (#43091)

* Modify logic in PR (#43060) to include default message for messages with empty body and export logs

* Update CHANGELOG

* Update logic as per updated spec

* Addressed comments

* Set-VcpkgWriteModeCache -- add token timeout param for cmake generate's that exceed 1 hour (this can happen in C++ API View) (#43470)

Co-authored-by: Daniel Jurek <[email protected]>

* update

* fix UT

* fix tests

* Added Tests and Samples for Paginated Queries (#43472)

* added tests and samples for paginated queries

* Apply suggestions from code review

Co-authored-by: Copilot <[email protected]>

* added single partition pagination sample

---------

Co-authored-by: Andrew Mathew <[email protected]>
Co-authored-by: Copilot <[email protected]>

* [Test Proxy] Support AARCH64 platform (#43428)

* Delete doc/dev/how_to_request_a_feature_in_sdk.md (#43415)

this doc is outdated

* fix test

* [AutoRelease] t2-iothub-2025-10-03-03336(can only be merged by SDK owner) (#43230)

* code and test

* update pyproject.toml

---------

Co-authored-by: azure-sdk <PythonSdkPipelines>
Co-authored-by: ChenxiJiang333 <[email protected]>

* [AutoRelease] t2-redisenterprise-2025-10-17-18412(can only be merged by SDK owner) (#43476)

* code and test

* update changelog

* update changelog

* Update CHANGELOG.md

---------

Co-authored-by: azure-sdk <PythonSdkPipelines>
Co-authored-by: ChenxiJiang333 <[email protected]>
Co-authored-by: ChenxiJiang333 <[email protected]>

* Extend basic test for "project_client.agents" to do more operations (#43516)

* Sync eng/common directory with azure-sdk-tools for PR 12478 (#43457)

* Updated validate pkg template to use packageInfo

* Fixed typo

* Fixed the right variable to use

* output debug log

* Fixed errors in expression evaluation

* removed debug code

* Fixed an issue in pipeline

* Updated condition for variable setting step

* Join paths of the script path

* Use join-path

* return from the function rather than exit

---------

Co-authored-by: ray chen <[email protected]>

* Reorder error and warning log line processing (#43456)

Co-authored-by: Wes Haggard <[email protected]>

* [App Configuration] - Release 1.7.2 (#43520)

* release 1.7.2

* update change log

* Modify CODEOWNERS for Azure SDK ownership changes (#43524)

Updated CODEOWNERS to reflect new ownership for Azure SDK components.

* Migrate Confidential Ledger library from swagger to typespec codegen (#42664)

* regen

* add default cert endpoint with tsp

* remove refs to old namespace

* update async operation patch

* fix operations patch

* fix header impl

* more header fixes

* revert receipt directory removal

* cspell

* regen certificates under correct namespace

* regen ledger client

* update namespace name

* revert certificate change

* update shared files after regen

* updates

* delete extra files

* cspell

* match return type to current behavior

* cspell

* mypy

* pylint

* update docs

* regen

* regen

* fix patch

* Revert "mypy"

This reverts commit 6351ead.

* add info in tsp_location.yaml

* regen

* update patch files

* update patch files

* fix patch

* update patch files

* regen

* update tsp-location.yaml

* generate certificate client

* update patch files

* fixes

* regen clients

* update pyproject.toml deps

* update assets

* regen

* revert test change

* nit

* fix test input

* regen with new model

* update tests

* update tests

* apiview props

* regen

* update tests

* update assets

* apiview props

* temp relative package updates

* fix name

* fix ledger ci (#43181)

* remove swagger

* remove extra configs

* wip revert package dep temporarily

* update readme

* fix config files

* Revert "wip revert package dep temporarily"

This reverts commit db553c4.

* move tests

* add identity samples

---------

Co-authored-by: catalinaperalta <[email protected]>

* rm certificate files

* update changelog

* misc fixes

* update shared reqs

* test

* pylint

---------

Co-authored-by: catalinaperalta <[email protected]>

* update scripts (#43527)

Co-authored-by: helen229 <[email protected]>

* [AutoPR azure-mgmt-mongocluster]-generated-from-SDK Generation - Python-5459673 (#43448)

* Configurations:  'specification/mongocluster/resource-manager/Microsoft.DocumentDB/MongoCluster/tspconfig.yaml', API Version: 2025-09-01, SDK Release Type: stable, and CommitSHA: 'c5601446fc65494f18157aecbcc79cebcfbab1fb' in SpecRepo: 'https://github.com/Azure/azure-rest-api-specs' Pipeline run: https://dev.azure.com/azure-sdk/internal/_build/results?buildId=5459673 Refer to https://eng.ms/docs/products/azure-developer-experience/develop/sdk-release/sdk-release-prerequisites to prepare for SDK release.

* update changelog

---------

Co-authored-by: ChenxiJiang333 <[email protected]>

* App Configuration Provider - Key Vault Refresh (#41882)

* Sync refresh changes

* Key Vault Refresh

* adding tests and fixing sync refresh

* Updating Async

* Fixed Async Tests

* Updated tests and change log

* Apply suggestions from code review

Co-authored-by: Copilot <[email protected]>

* Fixing merge issue

* Updating comments

* Updating secret refresh

* Update _azureappconfigurationproviderasync.py

* Fixing Optional Endpoint

* fix mypy issue

* fixing async test

* mixing merge

* fixing test after merge

* Update testcase.py

* Secret Provider Base

* removing unused imports

* updating exception

* updating resolve key vault references

* Review comments

* fixing tests

* tox updates

* Updating Tests

* Updating Async to be the same as sync

* Fixing formatting

* fixing tox and unneeded ""

* fixing tox items

* fix cspell + tests recording

* Update test_async_secret_provider.py

* Post Merge updates

* Move cache to shared code

* removed unneeded disabled

* Update Secret Provider

* Updating usage

* Update assets.json

* Updated to make secret refresh update dictionary

* removing _secret_version_cache

* Update assets.json

* Update _secret_provider_base.py

---------

Co-authored-by: Copilot <[email protected]>

* Increment package version after release of azure-appconfiguration (#43531)

* Patch `azure-template` back to `green`  (#43533)

* Update sdk/template/azure-template/pyproject.toml to use `repository` instead of `source`

* added brackets for sql query keyword value (#43525)

Co-authored-by: Andrew Mathew <[email protected]>

* update changelog (#43532)

Co-authored-by: catalinaperalta <[email protected]>

* App Config Provider - Provider Refactor (#43196)

* Code Cleanup

* Move validation to shared file

* Updating Header Check

* Update test_azureappconfigurationproviderbase.py

* moved async tests to aio folder

* post merge updates

---------

Co-authored-by: Ethan Winters <[email protected]>
Co-authored-by: rads-1996 <[email protected]>
Co-authored-by: Azure SDK Bot <[email protected]>
Co-authored-by: Daniel Jurek <[email protected]>
Co-authored-by: Andrew Mathew <[email protected]>
Co-authored-by: Andrew Mathew <[email protected]>
Co-authored-by: Copilot <[email protected]>
Co-authored-by: McCoy Patiño <[email protected]>
Co-authored-by: Yuchao Yan <[email protected]>
Co-authored-by: ChenxiJiang333 <[email protected]>
Co-authored-by: ChenxiJiang333 <[email protected]>
Co-authored-by: Darren Cohen <[email protected]>
Co-authored-by: ray chen <[email protected]>
Co-authored-by: Wes Haggard <[email protected]>
Co-authored-by: Zhiyuan Liang <[email protected]>
Co-authored-by: Matthew Metcalf <[email protected]>
Co-authored-by: catalinaperalta <[email protected]>
Co-authored-by: catalinaperalta <[email protected]>
Co-authored-by: helen229 <[email protected]>
Co-authored-by: Scott Beddall <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Evaluation Issues related to the client library for Azure AI Evaluation

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants