[THIS-1049] Neat alpha autofix infrastructure#1569

Open
Magssch wants to merge 33 commits into main from feat/autofix-part1

Conversation

@Magssch
Contributor

@Magssch Magssch commented Feb 4, 2026

Description

Add core infrastructure for automatically fixing data models. Introduces FixAction (an immutable data model for field-level changes), FixApplicator (groups fixes by resource, checks for conflicts, and applies them efficiently via in-place mutation of a deep copy), and transform_physical on NeatStore to integrate fix application into the provenance pipeline. The orchestrator and session layer are wired up to support the read-validate-fix flow behind an alpha feature flag.

The code path for the fix functionality is disabled until exposed in a later PR.
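
A minimal sketch of the flow described above, not the code from this PR: fixes are grouped by resource, checked for conflicting field paths, and applied to a deep copy so the input schema is left untouched. The `FixAction` fields, the dict-based schema, and the `_set_path` helper are assumptions made for illustration only.

```python
import copy
from collections import defaultdict
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class FixAction:
    """Hypothetical stand-in: one field-level change on one resource."""

    resource_id: str
    field_path: str  # dotted path, e.g. "constraints.asset__auto"
    new_value: Any  # None means "remove the field"


class FixApplicator:
    """Hypothetical stand-in: applies fixes to a deep copy of the schema."""

    def __init__(self, schema: dict[str, dict], fixes: list[FixAction]) -> None:
        self._schema = schema
        self._fixes = fixes

    def apply_fixes(self) -> dict[str, dict]:
        # Group fixes by the resource they target.
        by_resource: dict[str, list[FixAction]] = defaultdict(list)
        for fix in self._fixes:
            by_resource[fix.resource_id].append(fix)

        fixed = copy.deepcopy(self._schema)  # the input schema is never mutated
        for resource_id, fixes in by_resource.items():
            if resource_id not in fixed:
                raise RuntimeError(f"Resource {resource_id} not found in schema.")
            seen: set[str] = set()
            for fix in fixes:
                # Two fixes on the same field of the same resource conflict.
                if fix.field_path in seen:
                    raise RuntimeError(f"Conflicting fixes for {resource_id}.{fix.field_path}")
                seen.add(fix.field_path)
                _set_path(fixed[resource_id], fix.field_path, fix.new_value)
        return fixed


def _set_path(resource: dict, field_path: str, value: Any) -> None:
    """Set or remove a nested key addressed by a dotted path (hypothetical helper)."""
    *parents, leaf = field_path.split(".")
    node = resource
    for key in parents:
        node = node.setdefault(key, {})
    if value is None:
        node.pop(leaf, None)
    else:
        node[leaf] = value
```

Because the copy is made before any fix is applied, a conflict raised halfway through leaves the caller's schema unchanged.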

Bump

  • Patch
  • Skip

@github-actions

github-actions bot commented Feb 4, 2026

☂️ Python Coverage

current status: ✅

Overall Coverage

Lines Covered Coverage Threshold Status
7016 6487 92% 90% 🟢

New Files

File Coverage Status
cognite/neat/_data_model/_fix.py 96% 🟢
TOTAL 96% 🟢

Modified Files

File Coverage Status
cognite/neat/_data_model/_shared.py 92% 🟢
cognite/neat/_data_model/_snapshot.py 100% 🟢
cognite/neat/_data_model/models/dms/__init__.py 100% 🟢
cognite/neat/_data_model/models/dms/_http.py 100% 🟢
cognite/neat/_data_model/rules/_base.py 100% 🟢
cognite/neat/_data_model/rules/dms/_orchestrator.py 97% 🟢
cognite/neat/_session/_physical.py 71% 🟢
cognite/neat/_store/_provenance.py 100% 🟢
cognite/neat/_store/_store.py 92% 🟢
TOTAL 95% 🟢

updated for commit: 07baacb by action🐍

@codecov

codecov bot commented Feb 4, 2026

Codecov Report

❌ Patch coverage is 85.84071% with 16 lines in your changes missing coverage. Please review.
✅ Project coverage is 91.73%. Comparing base (0f7f8fa) to head (07baacb).

Files with missing lines Patch % Lines
cognite/neat/_session/_physical.py 50.00% 7 Missing ⚠️
cognite/neat/_data_model/_fix.py 93.05% 5 Missing ⚠️
cognite/neat/_data_model/_shared.py 57.14% 3 Missing ⚠️
...ognite/neat/_data_model/rules/dms/_orchestrator.py 66.66% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1569      +/-   ##
==========================================
- Coverage   91.76%   91.73%   -0.03%     
==========================================
  Files         121      122       +1     
  Lines        7065     7161      +96     
==========================================
+ Hits         6483     6569      +86     
- Misses        582      592      +10     
Files with missing lines Coverage Δ
cognite/neat/_data_model/_snapshot.py 98.61% <100.00%> (+0.08%) ⬆️
cognite/neat/_data_model/models/dms/__init__.py 100.00% <100.00%> (ø)
cognite/neat/_data_model/models/dms/_http.py 100.00% <100.00%> (ø)
cognite/neat/_data_model/rules/_base.py 93.75% <100.00%> (ø)
cognite/neat/_store/_provenance.py 100.00% <100.00%> (ø)
cognite/neat/_store/_store.py 89.43% <100.00%> (+0.46%) ⬆️
...ognite/neat/_data_model/rules/dms/_orchestrator.py 96.87% <66.66%> (-3.13%) ⬇️
cognite/neat/_data_model/_shared.py 81.25% <57.14%> (-6.75%) ⬇️
cognite/neat/_data_model/_fix.py 93.05% <93.05%> (ø)
cognite/neat/_session/_physical.py 71.31% <50.00%> (+0.12%) ⬆️

@Magssch Magssch force-pushed the feat/autofix-part1 branch 4 times, most recently from fb4de40 to 4a7321c Compare February 5, 2026 07:31
Add ability for validators to provide automatic fixes for issues they
detect. This implements the core infrastructure and fixes for:

Infrastructure:
- FixAction class extending ResourceChange for atomic schema fixes
- Helper functions for generating auto IDs (constraints, indexes)
- Orchestrator support for apply_fixes parameter
- Tracking of applied fixes in provenance

Fixable validators:
- MissingRequiresConstraint: Adds requires constraints
- SuboptimalRequiresConstraint: Removes suboptimal constraints
- RequiresConstraintCycle: Removes constraints to break cycles
- MissingReverseDirectRelationTargetIndex: Adds indexes

All fixes use __auto suffix for easy identification and are only
applied when explicitly enabled via the orchestrator.
@Magssch Magssch force-pushed the feat/autofix-part1 branch from 4a7321c to 945ab97 Compare February 5, 2026 12:26
@Magssch Magssch changed the title [THIS-1049] Neat alpha autofix capability (part 1) [THIS-1049] Neat alpha autofix infrastructure Feb 5, 2026
@Magssch
Contributor Author

Magssch commented Feb 5, 2026

/gemini review

Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces a FixAction class for applying atomic fixes to schema issues, along with helper functions for generating constraint and index identifiers, ensuring they adhere to CDF's length limits by truncating and hashing if necessary. It also includes validators for identifying and resolving performance-related issues such as missing requires constraints and unindexed reverse direct relations, as well as breaking constraint cycles. The review comments suggest an iterative approach to applying fixes to avoid conflicts and ensure a more robust autofix mechanism, as applying one fix can change the data model in a way that invalidates other pending fixes.
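
The iterative approach suggested in this review can be sketched as a validate, fix one issue, re-validate loop. This is an illustration under stated assumptions, not the orchestrator from this PR: the model is reduced to a mapping of container name to the containers it requires, and `find_cycle_fixes`, `fix_until_stable`, and `_drop_edge` are hypothetical names.

```python
from typing import Callable

Model = dict[str, set[str]]  # container -> containers it requires (toy model)
Fix = Callable[[Model], Model]


def _drop_edge(src: str, dst: str) -> Fix:
    """Build a fix that removes the 'src requires dst' constraint."""

    def fix(model: Model) -> Model:
        updated = {key: set(values) for key, values in model.items()}
        updated[src].discard(dst)
        return updated

    return fix


def find_cycle_fixes(model: Model) -> list[Fix]:
    """Toy validator: propose one fix per edge that is part of a two-node cycle."""
    fixes: list[Fix] = []
    for src in sorted(model):
        for dst in sorted(model[src]):
            if src in model.get(dst, set()):
                fixes.append(_drop_edge(src, dst))
    return fixes


def fix_until_stable(model: Model, validate: Callable[[Model], list[Fix]], max_rounds: int = 10) -> Model:
    """Apply one fix per round and re-validate, so stale fixes are never applied."""
    for _ in range(max_rounds):
        fixes = validate(model)
        if not fixes:
            return model
        model = fixes[0](model)
    raise RuntimeError("Fixes did not converge within max_rounds.")
```

For the cycle `A requires B, B requires A`, the validator proposes two fixes. Applying both at once would remove both constraints; the loop applies the first, re-validates, and stops with `B requires A` intact.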

@Magssch
Contributor Author

Magssch commented Feb 5, 2026

/gemini review

Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces a new autofix infrastructure for DMS validators, allowing FixAction objects to automatically resolve validation issues. The changes include defining FixAction and helper functions for auto-generated IDs, integrating fix application into the DmsDataModelRulesOrchestrator, and implementing fix methods for several performance-related validators. The new fix methods correctly generate FixAction objects to add/remove constraints and indexes. Comprehensive unit and end-to-end tests have been added to verify the functionality of the fix actions and their ability to resolve validation issues.

@Magssch Magssch marked this pull request as ready for review February 5, 2026 16:00
@Magssch Magssch requested a review from a team as a code owner February 5, 2026 16:00
Collaborator

@doctrino doctrino left a comment

I like the FixAction structure. I suggest using the pydantic model_copy method for efficient updates. See comment.
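
The copy-with-update idea behind this suggestion can be sketched as follows. To keep the example dependency-free, the standard library's `dataclasses.replace` is swapped in for pydantic's `model_copy(update=...)`; the `Container` class and its fields are hypothetical.

```python
from dataclasses import dataclass, field, replace


@dataclass(frozen=True)
class Container:
    """Hypothetical resource with a mapping of constraint id -> required container."""

    external_id: str
    constraints: dict[str, str] = field(default_factory=dict)


original = Container("Pump", {"existing": "Equipment"})

# Build a new instance with one field replaced instead of mutating the original.
updated = replace(original, constraints={**original.constraints, "asset__auto": "Asset"})
```

Only the replaced field is rebuilt; every other field is carried over by reference, which avoids a full deep copy of the resource.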

Collaborator

@nikokaoja nikokaoja left a comment

Comments in the first pass are related to:

  • scope of PR
  • readability of code

Magssch and others added 13 commits February 10, 2026 11:21
…xed snapshot

- Merge _fix_actions.py + _fix_helpers.py into _fix.py
- Move as_resource_update onto FixAction as a method (uses model_copy instead of in-place mutation)
- Group fixes by resource ID in orchestrator with conflict detection
- Return fixed_snapshot from orchestrator.run() for provenance
- Revert validator changes to separate follow-up PR

Co-authored-by: Cursor <cursoragent@cursor.com>
@Magssch
Contributor Author

Magssch commented Feb 10, 2026

/gemini review

Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces a robust infrastructure for automatically fixing data model validation issues. The core components like FixAction and FixApplicator are well-designed and the integration into the existing provenance and session management through transform_physical is clean. The changes are extensive but logical, and the addition of tests for the new components is great.

I've found a critical bug in the conflict detection logic within FixApplicator and a high-severity issue related to incorrect provenance tracking for applied fixes. Please see the detailed comments for suggestions on how to resolve these.

for action in self._fix_actions:
    fix_by_resource_id[action.resource_id].append(action)

resources_list_lookup: dict[type, dict[SchemaResourceId, DataModelResource]] = {
Contributor Author

We should likely refactor RequestSchema to use dicts for more efficient lookup instead of having to do the lookup table here. Tracking this here: https://cognitedata.atlassian.net/browse/THIS-1068

Collaborator

Then we can avoid maintaining the SchemaSnapshot class, as that one is exactly the shape your FixApplicator needs

Collaborator

I am wondering whether attaching a resource lookup to RequestSchema could be an alternative @doctrino
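
The lookup table discussed in this thread can be sketched as a dict keyed first by identifier type and then by identifier. This is an assumption-laden illustration: `ViewRef`, `ContainerRef`, `View`, `Container`, `Schema`, `build_lookup`, and `find_resource` are hypothetical stand-ins, not the RequestSchema or SchemaSnapshot classes from NEAT.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ViewRef:
    external_id: str


@dataclass(frozen=True)
class ContainerRef:
    external_id: str


@dataclass
class View:
    ref: ViewRef


@dataclass
class Container:
    ref: ContainerRef


@dataclass
class Schema:
    """Hypothetical list-based schema, similar in spirit to a request payload."""

    views: list[View] = field(default_factory=list)
    containers: list[Container] = field(default_factory=list)


def build_lookup(schema: Schema) -> dict[type, dict[object, object]]:
    """Index each resource list by identifier so lookups are O(1) instead of a list scan."""
    return {
        ViewRef: {view.ref: view for view in schema.views},
        ContainerRef: {container.ref: container for container in schema.containers},
    }


def find_resource(lookup: dict[type, dict[object, object]], resource_id: object) -> object:
    by_id = lookup.get(type(resource_id))
    if by_id is None:
        raise RuntimeError(f"Unsupported resource type {type(resource_id)}.")
    resource = by_id.get(resource_id)
    if resource is None:
        raise RuntimeError(f"Resource {resource_id} not found in schema.")
    return resource
```

If the schema stored its resources in dicts to begin with, `build_lookup` would become unnecessary, which is the refactoring the thread points toward.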

Collaborator

@nikokaoja nikokaoja left a comment

Good work. Some optional refactoring can be done in another PR (not critical).

Comment on lines +76 to +84
if resource_lookup is None:
    raise RuntimeError(
        f"{type(self).__name__}: Unsupported resource type {type(resource_id)}. This is a bug in NEAT."
    )
resource = resource_lookup.get(resource_id)
if resource is None:
    raise RuntimeError(
        f"{type(self).__name__}: Resource {resource_id} not found in schema. This is a bug in NEAT."
    )
Collaborator

nice catch !

Comment on lines +117 to +119
def _check_no_field_path_conflicts(self, changes: list[FieldChange]) -> None:
    """Raise if any changes touch a field_path already modified by a previous change."""
    seen_paths: set[str] = set()
Collaborator

Nice !

Comment on lines +129 to +157
def make_auto_id(base_id: str) -> str:
    """Generate an auto-generated identifier with truncation if needed.

    CDF has a 43-character limit on constraint/index identifiers. This function
    ensures the ID stays within that limit while maintaining uniqueness.

    Args:
        base_id: The primary identifier to use (e.g., external_id or property_id).

    Returns:
        For short base_ids (≤37 chars): "{base_id}__auto"
        For long base_ids (>37 chars): "{truncated_id}_{hash}__auto"
    """
    if len(base_id) <= MAX_BASE_LENGTH_NO_HASH:
        return f"{base_id}{AUTO_SUFFIX}"

    hash_suffix = hashlib.sha256(base_id.encode()).hexdigest()[:HASH_LENGTH]
    truncated_id = base_id[:MAX_BASE_LENGTH_WITH_HASH]
    return f"{truncated_id}_{hash_suffix}{AUTO_SUFFIX}"


def make_auto_constraint_id(dst: ContainerReference) -> str:
    """Generate a constraint identifier for auto-generated requires constraints."""
    return make_auto_id(dst.external_id)


def make_auto_index_id(property_id: str) -> str:
    """Generate an index identifier for auto-generated indexes."""
    return make_auto_id(property_id)
Collaborator

Feel free to move this into utils under a dedicated module, identifiers or similar.
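
A self-contained version of `make_auto_id` might look like the sketch below. The 43-character limit, the `__auto` suffix, and the 37-character threshold come from the docstring and commit message above; `HASH_LENGTH = 8` and the derived `MAX_BASE_LENGTH_WITH_HASH` are assumptions, since the diff excerpt does not show the constant definitions.

```python
import hashlib

AUTO_SUFFIX = "__auto"
MAX_IDENTIFIER_LENGTH = 43  # CDF limit stated in the docstring
HASH_LENGTH = 8  # assumption: not shown in the diff excerpt

# 43 - len("__auto") = 37, matching the "≤37 chars" case in the docstring.
MAX_BASE_LENGTH_NO_HASH = MAX_IDENTIFIER_LENGTH - len(AUTO_SUFFIX)
# Leave room for "_" plus the hash before the suffix (assumed layout).
MAX_BASE_LENGTH_WITH_HASH = MAX_BASE_LENGTH_NO_HASH - HASH_LENGTH - 1


def make_auto_id(base_id: str) -> str:
    """Return "{base_id}__auto", truncating and hashing when it would exceed the limit."""
    if len(base_id) <= MAX_BASE_LENGTH_NO_HASH:
        return f"{base_id}{AUTO_SUFFIX}"

    # The hash is computed over the full base_id, so long ids that share a
    # prefix still map to different identifiers (barring hash collisions).
    hash_suffix = hashlib.sha256(base_id.encode()).hexdigest()[:HASH_LENGTH]
    truncated_id = base_id[:MAX_BASE_LENGTH_WITH_HASH]
    return f"{truncated_id}_{hash_suffix}{AUTO_SUFFIX}"
```

Deriving the two length constants from the limit, rather than hard-coding them, keeps the truncated form at exactly the maximum length if the suffix or hash length changes.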

Comment on lines +76 to +84
def transform_physical(self, activity: Callable, on_success: OnSuccess | None = None) -> Change:
    """Transform the current physical data model and record in provenance."""
    change, transformed_model = self._do_activity(activity, on_success)

    if transformed_model:
        self.physical_data_model.append(transformed_model)

    self.provenance.append(change)
    return change
Collaborator

If possible, transform_physical should return None, with updates to the provenance (change) occurring here, as in read_physical.

This is optional refactoring and does not necessarily need to be done in this PR.
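
One way to read this suggestion is that the store records the applied fixes on the change before appending it to provenance, so callers never need the returned change. The sketch below is a simplified illustration only: `Store`, `Change`, and the `activity` signature are hypothetical and much smaller than NeatStore.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Change:
    """Hypothetical provenance record."""

    description: str
    applied_fixes: list[str] = field(default_factory=list)


@dataclass
class Store:
    """Hypothetical store holding model versions and their provenance."""

    physical_data_model: list[dict] = field(default_factory=list)
    provenance: list[Change] = field(default_factory=list)

    def transform_physical(self, activity: Callable[[], tuple[dict, list[str]]], description: str) -> None:
        """Run the activity, then record both the new model and the complete change."""
        transformed_model, applied_fixes = activity()
        self.physical_data_model.append(transformed_model)
        # The change is fully populated before it enters provenance,
        # so nothing has to be patched onto it by the caller afterwards.
        self.provenance.append(Change(description, applied_fixes))
```

With this shape, the session layer would call `store.transform_physical(...)` and read the latest entry from `store.provenance` if it needs the change.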

Comment on lines +115 to +118
applicator = FixApplicator(self._store.physical_data_model[-1], on_success.pending_fixes)
post_fix_on_success = self._create_on_success()
change = self._store.transform_physical(applicator.apply_fixes, post_fix_on_success)
change.applied_fixes = on_success.pending_fixes
Collaborator

Optional refactoring (in another PR):

_create_on_success can be extended to allow creation of a data model transformer on_success object; _read_validate_fix then becomes simpler.

target_entity: str | None = field(default="FailedEntity")
issues: IssueList | None = field(default=None)
errors: IssueList | None = field(default=None)
applied_fixes: list[FixAction] | None = field(default=None)
Collaborator

Optional nitpick: keep it simple.

Suggested change
- applied_fixes: list[FixAction] | None = field(default=None)
+ fixes: list[FixAction] | None = field(default=None)
