-
Notifications
You must be signed in to change notification settings - Fork 3.2k
[Just for running test]Jessli/convert #43482
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
* add eval result converter * Add result converter * update converter params to optional * add eval meta data * fix type * remove useless file * get eval meta data as input * fix build errors * remove useless import * resolve comments * update * update comments
…into needuv/structured-results-otel-logging
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR enhances the Azure AI Evaluation SDK with OpenTelemetry integration for logging evaluation results to Application Insights, and updates the prompty evaluation system to return structured data including token usage statistics.
- Adds support for logging evaluation results to Application Insights using OpenTelemetry
- Updates prompty evaluation to return dictionaries with token usage and metadata instead of raw responses
- Includes comprehensive test coverage for the new evaluation result conversion functionality
Reviewed Changes
Copilot reviewed 24 out of 24 changed files in this pull request and generated 4 comments.
Show a summary per file
File | Description |
---|---|
setup.py |
Adds opentelemetry extra dependency for Application Insights integration |
_model_configurations.py |
Defines new TypedDict structures for evaluation results and App Insights configuration |
_legacy/prompty/_utils.py |
Updates format_llm_response to return structured dictionary with token usage |
_legacy/prompty/_prompty.py |
Changes AsyncPrompty return type to dict for structured response |
_evaluators/_common/_base_prompty_eval.py |
Updates evaluation logic to handle new structured prompty response format |
_evaluate/_evaluate.py |
Adds comprehensive evaluation result conversion and App Insights logging functionality |
_common/rai_service.py |
Enhances content harm response parsing with token usage statistics |
_aoai/*.py |
Updates AOAI grader classes with proper type attributes |
tests/unittests/test_evaluate.py |
Adds comprehensive unit tests for evaluation result conversion |
tests/e2etests/test_prompty_async.py |
Updates tests to handle new structured prompty response format |
tests/e2etests/test_mass_evaluate.py |
Adjusts test assertions for increased evaluation metrics |
input_str = f"{json.dumps(inputs)}" if inputs else "" | ||
if inputs and len(inputs) > 0: | ||
sample_input_json = [] | ||
msg = ChatCompletionUserMessageParam( | ||
role="user", | ||
content=input_str, | ||
) | ||
sample_input_json.append(msg) | ||
sample_input = json.dumps(sample_input_json) |
Copilot
AI
Oct 17, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The variable sample_input
is referenced on line 592 but is only assigned within the if block on lines 569-576. If inputs
is empty or None, sample_input
will be undefined when accessed, causing a NameError.
Copilot uses AI. Check for mistakes.
elif client_type == "pf_client": | ||
batch_run_client = ProxyClient(user_agent=UserAgentSingleton().value) | ||
# Ensure the absolute path is passed to pf.run, as relative path doesn't work with | ||
# Ensure the absolute path is Re to pf.run, as relative path doesn't work with |
Copilot
AI
Oct 17, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Corrected incomplete comment from 'Re to pf.run' to 'passed to pf.run'.
# Ensure the absolute path is Re to pf.run, as relative path doesn't work with | |
# Ensure the absolute path is passed to pf.run, as relative path doesn't work with |
Copilot uses AI. Check for mistakes.
result = await prompty(firstName="Bob", question="What is the capital of France?") | ||
assert isinstance(result, ChatCompletion) | ||
response: str = result.choices[0].message.content or "" | ||
result = await prompty(question="What is the capital of France?", firstName="Barbra", lastName="Streisand") |
Copilot
AI
Oct 17, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The function call uses firstName
and lastName
parameters but line 204 checks for 'Bob' in the response, which doesn't match the provided firstName 'Barbra'. This will cause the assertion to fail.
Copilot uses AI. Check for mistakes.
results: EvaluationResult, | ||
logger: logging.Logger, | ||
eval_id: Optional[str] = None, | ||
eval_run_id: Optional[str] = None, | ||
evaluators: Dict[str, Union[Callable, AzureOpenAIGrader]] = None, | ||
eval_run_summary: Optional[Dict[str, Any]] = None, | ||
eval_meta_data: Optional[Dict[str, Any]] = None, | ||
) -> None: |
Copilot
AI
Oct 17, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This function is extremely long (500+ lines) and handles multiple complex responsibilities including result parsing, data transformation, and summary calculation. Consider breaking it into smaller, focused functions for better maintainability and testability.
Copilot uses AI. Check for mistakes.
* add eval result converter * Add result converter * update converter params to optional * add eval meta data * fix type * remove useless file * get eval meta data as input * fix build errors * remove useless import * resolve comments * update * update comments * fix checker failure * add error msg and error code * Surface evaluator error msg * update UT * fix usage * make eval_meta_data optional * remove useless lines * update param name to add underscore * parse updated annotation results * update trace_id * expose sample data for sdk evaluators * update * update * fix UT * fix tests * fix test
…sion/ from app insights config
* add eval result converter * Add result converter * update converter params to optional * add eval meta data * fix type * remove useless file * get eval meta data as input * fix build errors * remove useless import * resolve comments * update * update comments * fix checker failure * add error msg and error code * Surface evaluator error msg * update UT * fix usage * make eval_meta_data optional * remove useless lines * update param name to add underscore * parse updated annotation results * update trace_id * expose sample data for sdk evaluators * update * Fix column mapping bug for AOAI evaluators with custom data mapping (#43429) * fix nesting bug for custom data mapping * address comments * remove extra code and fix test case * run formatter * use dumps * Modify logic for message body on Microsoft.ApplicationInsights.MessageData to include default message for messages with empty body and export logs (#43091) * Modify logic in PR (#43060) to include default message for messages with empty body and export logs * Update CHANGELOG * Update logic as per updated spec * Addressed comments * Set-VcpkgWriteModeCache -- add token timeout param for cmake generate's that exceed 1 hour (this can happen in C++ API View) (#43470) Co-authored-by: Daniel Jurek <[email protected]> * update * fix UT * fix tests * Added Tests and Samples for Paginated Queries (#43472) * added tests and samples for paginated queries * Apply suggestions from code review Co-authored-by: Copilot <[email protected]> * added single partition pagination sample --------- Co-authored-by: Andrew Mathew <[email protected]> Co-authored-by: Copilot <[email protected]> * [Test Proxy] Support AARCH64 platform (#43428) * Delete doc/dev/how_to_request_a_feature_in_sdk.md (#43415) this doc is outdated * fix test * [AutoRelease] t2-iothub-2025-10-03-03336(can only be merged by SDK owner) (#43230) * code and test * update pyproject.toml --------- Co-authored-by: azure-sdk <PythonSdkPipelines> Co-authored-by: ChenxiJiang333 <[email protected]> * [AutoRelease] t2-redisenterprise-2025-10-17-18412(can only be merged by SDK owner) (#43476) * code and test * update changelog * update changelog * Update CHANGELOG.md --------- Co-authored-by: azure-sdk <PythonSdkPipelines> Co-authored-by: ChenxiJiang333 <[email protected]> Co-authored-by: ChenxiJiang333 <[email protected]> * Extend basic test for "project_client.agents" to do more operations (#43516) * Sync eng/common directory with azure-sdk-tools for PR 12478 (#43457) * Updated validate pkg template to use packageInfo * Fixed typo * Fixed the right variable to use * output debug log * Fixed errors in expression evaluation * removed debug code * Fixed an issue in pipeline * Updated condition for variable setting step * Join paths of the script path * Use join-path * return from the function rather than exit --------- Co-authored-by: ray chen <[email protected]> * Reorder error and warning log line processing (#43456) Co-authored-by: Wes Haggard <[email protected]> * [App Configuration] - Release 1.7.2 (#43520) * release 1.7.2 * update change log * Modify CODEOWNERS for Azure SDK ownership changes (#43524) Updated CODEOWNERS to reflect new ownership for Azure SDK components. * Migrate Confidential Ledger library from swagger to typespec codegen (#42664) * regen * add default cert endpoint with tsp * remove refs to old namespace * update async operation patch * fix operations patch * fix header impl * more header fixes * revert receipt directory removal * cspell * regen certificates under correct namespace * regen ledger client * update namespace name * revert certificate change * update shared files after regen * updates * delete extra files * cspell * match return type to current behavior * cspell * mypy * pylint * update docs * regen * regen * fix patch * Revert "mypy" This reverts commit 6351ead. * add info in tsp_location.yaml * regen * update patch files * update patch files * fix patch * update patch files * regen * update tsp-location.yaml * generate certificate client * update patch files * fixes * regen clients * update pyproject.toml deps * update assets * regen * revert test change * nit * fix test input * regen with new model * update tests * update tests * apiview props * regen * update tests * update assets * apiview props * temp relative package updates * fix name * fix ledger ci (#43181) * remove swagger * remove extra configs * wip revert package dep temporarily * update readme * fix config files * Revert "wip revert package dep temporarily" This reverts commit db553c4. * move tests * add identity samples --------- Co-authored-by: catalinaperalta <[email protected]> * rm certificate files * update changelog * misc fixes * update shared reqs * test * pylint --------- Co-authored-by: catalinaperalta <[email protected]> * update scripts (#43527) Co-authored-by: helen229 <[email protected]> * [AutoPR azure-mgmt-mongocluster]-generated-from-SDK Generation - Python-5459673 (#43448) * Configurations: 'specification/mongocluster/resource-manager/Microsoft.DocumentDB/MongoCluster/tspconfig.yaml', API Version: 2025-09-01, SDK Release Type: stable, and CommitSHA: 'c5601446fc65494f18157aecbcc79cebcfbab1fb' in SpecRepo: 'https://github.com/Azure/azure-rest-api-specs' Pipeline run: https://dev.azure.com/azure-sdk/internal/_build/results?buildId=5459673 Refer to https://eng.ms/docs/products/azure-developer-experience/develop/sdk-release/sdk-release-prerequisites to prepare for SDK release. * update changelog --------- Co-authored-by: ChenxiJiang333 <[email protected]> * App Configuration Provider - Key Vault Refresh (#41882) * Sync refresh changes * Key Vault Refresh * adding tests and fixing sync refresh * Updating Async * Fixed Async Tests * Updated tests and change log * Apply suggestions from code review Co-authored-by: Copilot <[email protected]> * Fixing merge issue * Updating comments * Updating secret refresh * Update _azureappconfigurationproviderasync.py * Fixing Optional Endpoint * fix mypy issue * fixing async test * mixing merge * fixing test after merge * Update testcase.py * Secret Provider Base * removing unused imports * updating exception * updating resolve key vault references * Review comments * fixing tests * tox updates * Updating Tests * Updating Async to be the same as sync * Fixing formatting * fixing tox and unneeded "" * fixing tox items * fix cspell + tests recording * Update test_async_secret_provider.py * Post Merge updates * Move cache to shared code * removed unneeded disabled * Update Secret Provider * Updating usage * Update assets.json * Updated to make secret refresh update dictionary * removing _secret_version_cache * Update assets.json * Update _secret_provider_base.py --------- Co-authored-by: Copilot <[email protected]> * Increment package version after release of azure-appconfiguration (#43531) * Patch `azure-template` back to `green` (#43533) * Update sdk/template/azure-template/pyproject.toml to use `repository` instead of `source` * added brackets for sql query keyword value (#43525) Co-authored-by: Andrew Mathew <[email protected]> * update changelog (#43532) Co-authored-by: catalinaperalta <[email protected]> * App Config Provider - Provider Refactor (#43196) * Code Cleanup * Move validation to shared file * Updating Header Check * Update test_azureappconfigurationproviderbase.py * moved async tests to aio folder * post merge updates --------- Co-authored-by: Ethan Winters <[email protected]> Co-authored-by: rads-1996 <[email protected]> Co-authored-by: Azure SDK Bot <[email protected]> Co-authored-by: Daniel Jurek <[email protected]> Co-authored-by: Andrew Mathew <[email protected]> Co-authored-by: Andrew Mathew <[email protected]> Co-authored-by: Copilot <[email protected]> Co-authored-by: McCoy Patiño <[email protected]> Co-authored-by: Yuchao Yan <[email protected]> Co-authored-by: ChenxiJiang333 <[email protected]> Co-authored-by: ChenxiJiang333 <[email protected]> Co-authored-by: Darren Cohen <[email protected]> Co-authored-by: ray chen <[email protected]> Co-authored-by: Wes Haggard <[email protected]> Co-authored-by: Zhiyuan Liang <[email protected]> Co-authored-by: Matthew Metcalf <[email protected]> Co-authored-by: catalinaperalta <[email protected]> Co-authored-by: catalinaperalta <[email protected]> Co-authored-by: helen229 <[email protected]> Co-authored-by: Scott Beddall <[email protected]>
Description
Please add an informative description that covers that changes made by the pull request and link all relevant issues.
If an SDK is being regenerated based on a new API spec, a link to the pull request containing these API spec changes should be included above.
All SDK Contribution checklist:
General Guidelines and Best Practices
Testing Guidelines