Skip to content

Conversation

@IanMatthewHuff
Copy link
Member

@IanMatthewHuff IanMatthewHuff commented Nov 20, 2025

The repoInfoTelemetry service was being used by the science team for some model development. This telemetry uses the gitDiffService to provide diffs and there were some git diff patches that were not able to be applied coming specifically from new untracked files.

These new untracked files are actually handled a bit differently as the main git services doesn't pick up these changes. Instead a synthetic patch is created via _getUntrackedChangePatch. This function had some issue with generating an actual synthetic patch that could be applied without fixup.

Fixes:

  • For a totally empty file, correctly omit range / lines
  • For the range header correctly use lines and not byte length
  • Add 'new file mode' header
  • Correctly add 'no newline' footer when needed
  • Add unit tests for this function

Note: This fix is specifically for the usage of this code path for repoInfoTelemetry, but this existing function was used already by gitCommitMessageService and scmChanges tool. In both those changes the git diffs look to be just handled as a string and passed to an LLM, so improving the accuracy here should not cause any issues. But that is why I avoided added in extra (possibly risky) work for checking file permissions or calculating the SHA1 hash (which are not required to apply the patch) into this work.

Manually tested with pulling the telemetry locally and creating and applying a diff patch for:

  1. New untracked file with content and no end new line
  2. New untracked file with content and an ending new line
  3. New fully empty untracked file
  4. New file with just a new line

Patch applied and created the file as it existed before in all four cases.


beforeEach(() => {
// Create mock workspace.fs.readFile if it doesn't exist
if (!vscode.workspace?.fs?.readFile) {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The product code here uses workspace.fs directly and not the file service, so I added this mock here so that my changes for this test were just in this test file as opposed to trying to mess with the top level vscode test shims.


// Add original/modified file paths
// 100644 is standard file mode for new git files. Saves us from trying to check file permissions and handling
// UNIX vs Windows permission differences. Skipping calculating the SHA1 hashes as well since they are not strictly necessary
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I put a comment here on this. Did a bit of research into what the diff patch actually contains and what was required to actually apply a patch.

Using 100644 for the file mode simplifies not having to do extra file permission checks for a synthetic patch and handles the basic git new file scenario. Also the SHA1 hash line is skipped as git patch doesn't require it for applying so it's just more calculations that we don't need.

}

// Range header and content
patch.push(`@@ -0,0 +1,${lines.length} @@`);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bug with the old code was here, where it was using the buffer.length and not the lines length for the range header. This would lead to patches that looked incorrect in size.


return patch.join('\n');
// The patch itself should always end with a newline per git patch standards
return patch.join('\n') + '\n';
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The patch itself should always end with a newline.


// Git standard to add this comment if the file does not end with a newline
if (!content.endsWith('\n')) {
patch.push('\\ No newline at end of file');
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment was also missing in the old code. Prevents you from getting a warning when applying a patch that doesn't end with a new line.

readFileSpy.mockRestore();
});

describe('_getUntrackedChangePatch', () => {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

GitDiffService didn't have unit tests previously. But while looking into this some of the cases (newline at end, newline not at end, empty file) were a bit quirky, so added some quick tests in for those specific cases.

@IanMatthewHuff IanMatthewHuff marked this pull request as ready for review November 20, 2025 19:47
Copilot AI review requested due to automatic review settings November 20, 2025 19:47
Copilot finished reviewing on behalf of IanMatthewHuff November 20, 2025 19:48
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR fixes the _getUntrackedChangePatch method in GitDiffService to generate proper git diff patches for untracked files. The existing implementation had issues with range headers using byte length instead of line count, missing standard git patch headers, and incorrect handling of files without trailing newlines. These issues prevented patches from being applied correctly, affecting the repoInfoTelemetry service used by the science team for model development.

Key Changes:

  • Fixed range header calculation to use line count instead of byte length
  • Added "new file mode 100644" header and proper handling of files without trailing newlines
  • Added comprehensive unit tests covering empty files, files with/without trailing newlines, and single blank line files

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.

File Description
src/extension/prompt/vscode-node/gitDiffService.ts Fixed _getUntrackedChangePatch to generate standards-compliant git patches with correct headers, line counts, and newline handling
src/extension/prompt/vscode-node/test/gitDiffService.spec.ts Added comprehensive unit tests for _getUntrackedChangePatch covering various edge cases including empty files, files with/without trailing newlines

Comment on lines 25 to 31
(vscode as any).workspace = {
...vscode.workspace,
fs: {
...vscode.workspace?.fs,
readFile: vi.fn()
}
};
Copy link

Copilot AI Nov 20, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using 'as any' to override the vscode object is an anti-pattern. According to the project's testing guidelines (CodingGuidelineID: 1000002), you should prefer explicit Mock classes over mutating real instances. Consider creating a MockFileSystemService instead.

Copilot uses AI. Check for mistakes.
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, so I did notice this was a bit odd. My goal here was that I didn't want to mess with the existing gitDiffService since other code paths are using this existing code. And gitDiffService was using workspace.fs directly and not the file system service for file operation. So this test bit here I did so that I could unit test this function without having to change the product code (workspace.fs -> WorkspaceFileService) or make bigger changes to the vscode test shim. If we don't like this for a test file, down to change it to one of the other options.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did remove the 'as any' for a more strict type here, but I think the core issue might still apply?

@IanMatthewHuff
Copy link
Member Author

@Yoyokrazy @sbatten has context on this change. But you can ping me as well if you need any info.

@IanMatthewHuff
Copy link
Member Author

IanMatthewHuff commented Nov 26, 2025

@vritant24 Tagging for context while I'm oof

@sbatten sbatten requested a review from lszomoru December 1, 2025 18:28
@sbatten
Copy link
Member

sbatten commented Dec 1, 2025

@lszomoru I'd really appreciate your review

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants