Skip to content

fix(diff): check modified count in identical-files guard#792

Open
gghez wants to merge 1 commit intortk-ai:developfrom
gghez:fix/diff-false-identical
Open

fix(diff): check modified count in identical-files guard#792
gghez wants to merge 1 commit intortk-ai:developfrom
gghez:fix/diff-false-identical

Conversation

@gghez
Copy link
Copy Markdown

@gghez gghez commented Mar 24, 2026

Summary

rtk diff incorrectly reports "Files are identical" when lines differ by only a few characters. Two root causes:

  1. The identical-files guard at line 24 only checked added == 0 && removed == 0, ignoring the modified counter. Lines classified as modifications (similarity > 0.5) incremented modified but not added/removed, so the guard passed and the diff silently swallowed actual differences.

  2. The similarity() function used Jaccard similarity on unique character sets, which over-estimated similarity for lines sharing common characters regardless of position. Replaced it with a positional prefix+suffix ratio that better reflects actual line differences.

Changes

  • Added && diff.modified == 0 to the identical-files guard in run()
  • Replaced similarity() implementation: Jaccard set-based -> positional prefix+suffix metric
  • Added 4 regression tests covering all reproduction cases from the issue
  • Updated existing test_similarity_partial_overlap for the new metric

Test plan

Fixes #781

@CLAassistant
Copy link
Copy Markdown

CLAassistant commented Mar 24, 2026

CLA assistant check
All committers have signed the CLA.

@pszymkowiak pszymkowiak added bug Something isn't working effort-small Quelques heures, 1 fichier labels Mar 24, 2026
@pszymkowiak
Copy link
Copy Markdown
Collaborator

[w] wshm · Automated triage by AI

📊 Automated PR Analysis

🐛 Type bug-fix
🟡 Risk medium

Summary

Fixes a bug where rtk diff falsely reported files as identical when lines differed by only a few characters. The identical-files guard now checks the modified counter in addition to added and removed, and the similarity function was replaced with a positional prefix+suffix metric to avoid over-estimating similarity.

Review Checklist

  • Tests present
  • Breaking change
  • Docs updated

Linked issues: #781


Analyzed automatically by wshm · This is an automated analysis, not a human review.

@aeppling
Copy link
Copy Markdown
Contributor

Hey

We are cleaning up the codebase and improving the project structure for better onboarding. As part of this effort, PR #826 reorganizes src/ from a flat layout into subfolders.

No logic changes — only file moves and import path updates.

What you need to do

Rebase your branch on develop when receiving this comment:

git fetch origin && git rebase origin/develop

Git detects renames automatically. If you get import conflicts, update the paths:

use crate::git;        // now: use crate::cmds::git::git;
use crate::tracking;   // now: use crate::core::tracking;
use crate::config;     // now: use crate::core::config;
use crate::init;       // now: use crate::hooks::init;
use crate::gain;       // now: use crate::analytics::gain;

Need help rebasing? Tag @aeppling

The "Files are identical" check only tested added and removed counters,
ignoring modified lines. When two lines shared enough characters to be
classified as a modification (similarity > 0.5), they incremented
modified but not added/removed, so the guard passed and the diff
reported no changes.

Also replace the Jaccard set-based similarity function with a
positional prefix+suffix metric that better reflects actual line
differences — the old approach used unique character sets, which
over-estimated similarity for lines sharing common characters in
different positions.

Fixes rtk-ai#781

Signed-off-by: gghez <gghez@users.noreply.github.com>
@gghez gghez force-pushed the fix/diff-false-identical branch from a82ce04 to 6ea71e5 Compare March 27, 2026 00:21
@gghez
Copy link
Copy Markdown
Author

gghez commented Mar 27, 2026

Done, rebased on develop. Had a small conflict in the test section since both branches added tests at the end of diff_cmd.rs — merged both sets, all green.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working effort-small Quelques heures, 1 fichier

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants