fix: respect .gitignore when scanning codebases #8

Merged
EduardPetraeus merged 2 commits into main from fix/respect-gitignore
Mar 7, 2026

@EduardPetraeus (Collaborator)

Summary

  • Use `git ls-files --cached --others --exclude-standard -z` as the source of truth for which files belong to a codebase
  • Fall back to hardcoded exclude dirs (`_FALLBACK_EXCLUDE_DIRS`) when the scanned directory is not a git repository
  • Directory tree and architecture pattern detection now use the same git-aware filtering
  • Fixes false positives caused by scanning venvs, node_modules, or other gitignored dirs with non-standard names
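
As a rough illustration of the approach summarized above, here is a minimal sketch of git-aware file collection with a fallback walk. It is not the code from this PR: the function names, the 30-second timeout, and the contents of `FALLBACK_EXCLUDE_DIRS` are assumptions made for the example. Only the `git ls-files` flags come from the description.

```python
from __future__ import annotations

import os
import subprocess
from pathlib import Path

# Hypothetical list: the real _FALLBACK_EXCLUDE_DIRS contents are not shown in this PR.
FALLBACK_EXCLUDE_DIRS = frozenset({".git", ".venv", "venv", "node_modules", "__pycache__"})


def git_listed_files(root: Path) -> list[Path] | None:
    """Return files git considers part of the repo, or None if git cannot answer."""
    cmd = ["git", "ls-files", "--cached", "--others", "--exclude-standard", "-z"]
    try:
        result = subprocess.run(cmd, cwd=root, capture_output=True, check=True, timeout=30)
    except (OSError, subprocess.SubprocessError):
        # git is not installed, root is not inside a repository, or the call timed out.
        return None
    names = [chunk for chunk in result.stdout.split(b"\0") if chunk]
    paths = [root / os.fsdecode(name) for name in names]
    # --cached can list files that were deleted from the working tree; skip those.
    return [p for p in paths if p.is_file()]


def walk_with_excludes(root: Path) -> list[Path]:
    """Fallback for non-git directories: walk the tree, pruning known junk dirs by name."""
    found: list[Path] = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune in place so os.walk never descends into excluded directories.
        dirnames[:] = [d for d in dirnames if d not in FALLBACK_EXCLUDE_DIRS]
        found.extend(Path(dirpath) / name for name in filenames)
    return found


def collect_files(root: Path) -> list[Path]:
    """Prefer git's view of the project; fall back to the name-based walk."""
    listed = git_listed_files(root)
    return sorted(listed if listed is not None else walk_with_excludes(root))
```

The fallback also shows why a name-based list is fragile: a directory such as `.venv-ai/` is not in the hardcoded set, so `walk_with_excludes()` would still descend into it, whereas git skips it as long as it is gitignored.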

Before (scanning .venv-ai/ on HealthReporting)

  • Entry points: dotenv/cli.py, dotenv/main.py (third-party garbage)
  • Architecture: layered, MVC, monorepo (false positives)
  • Docstring coverage: 32% (diluted by venv code)

After (git-aware)

  • Entry points: api/server.py, mcp/server.py (correct)
  • Architecture: layered (correct)
  • Docstring coverage: 50% (real metric)

Changes

  • `analyzers/base.py`: new `_get_git_files()` and `_is_path_gitignored()`; `_collect_files()` is now git-aware
  • `analyzers/code_structure.py`: `_build_directory_tree()` uses `_is_path_gitignored()`
  • `analyzers/pattern_detector.py`: `_detect_architecture_patterns()` uses `_collect_files()` instead of raw rglob
  • 4 new tests (100 total, all green)
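
One way a directory-tree builder could reuse the git file list is to keep only directories that contain at least one listed file. The sketch below illustrates that idea under stated assumptions. It is not the PR's `_is_path_gitignored()`, whose implementation is not shown here and may work differently (for example by calling `git check-ignore`). Both helper names are hypothetical.

```python
from pathlib import PurePosixPath


def dirs_with_listed_files(rel_files):
    """Collect every directory that is an ancestor of at least one git-listed file.

    rel_files: repo-relative POSIX paths, e.g. parsed from `git ls-files` output.
    """
    keep = set()
    for rel in rel_files:
        for parent in PurePosixPath(rel).parents:
            if str(parent) != ".":  # "." is the repo root itself
                keep.add(str(parent))
    return keep


def is_dir_hidden(rel_dir, keep):
    """Treat a directory as ignored when no git-listed file lives under it."""
    return PurePosixPath(rel_dir).as_posix() not in keep


keep = dirs_with_listed_files(["api/server.py", "mcp/server.py", "src/pkg/mod.py"])
print(is_dir_hidden(".venv-ai", keep))  # True: nothing under it is listed by git
print(is_dir_hidden("src/pkg", keep))   # False: contains src/pkg/mod.py
```

A side effect of this design is that empty directories, and directories holding only ignored files, drop out of the tree as well.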

Test plan

  • 100 tests passing (96 existing + 4 new)
  • Verified on HealthReporting repo (229-test Python project with .venv-ai/)
  • Pre-commit hooks passing (gitleaks, ruff, formatting)
  • Claude code-reviewer: WARN → all findings fixed
  • Claude security-reviewer: WARN → all findings fixed

Generated with Claude Code

EduardPetraeus and others added 2 commits March 7, 2026 09:23
Use `git ls-files --cached --others --exclude-standard` to get the set
of repo-relevant files instead of scanning the entire filesystem.
Falls back to hardcoded exclude dirs for non-git repos.

Fixes false positives caused by scanning venv/node_modules with
non-standard names (e.g. .venv-ai) that weren't in the hardcoded list:
- Entry points no longer point to third-party packages in venvs
- Monorepo detection no longer triggered by venv pyproject.toml files
- Naming conventions and code quality metrics reflect actual project code
- Directory tree excludes gitignored directories

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…nsistency

- Use `git ls-files -z` with NUL-delimited output for robust parsing
- Fixes CRLF line-ending bug that would silently drop files on Windows
- Remove hardcoded `.startswith(".")` guard from directory tree level-1,
  letting `_is_path_gitignored()` decide consistently at both levels
- Move subprocess import to module level in test file

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
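
The parsing change in this commit can be illustrated with a small sketch. It assumes, as the commit message describes, that line-based output can reach the parser with CRLF endings on Windows. The helper names and sample paths are made up for the example and are not taken from the PR.

```python
import os


def parse_nul_delimited(raw: bytes) -> list[str]:
    """Split `git ls-files -z` output: paths are separated by NUL bytes."""
    return [os.fsdecode(chunk) for chunk in raw.split(b"\0") if chunk]


def parse_lines_naive(text: str) -> list[str]:
    """Naive newline split, shown only to illustrate the failure mode."""
    return [line for line in text.split("\n") if line]


on_disk = {"src/app.py", "docs/read me.md"}

nul_paths = parse_nul_delimited(b"src/app.py\0docs/read me.md\0")
crlf_paths = parse_lines_naive("src/app.py\r\ndocs/read me.md\r\n")

print(sorted(on_disk & set(nul_paths)))   # ['docs/read me.md', 'src/app.py']
print(sorted(on_disk & set(crlf_paths)))  # []: every path kept a trailing "\r"
```

Because each naively parsed path ends in `\r`, none of them matches a real file, so the files vanish from the scan without any error. With `-z`, git also emits paths verbatim instead of quoting unusual characters, which avoids a second class of mismatches.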
@EduardPetraeus EduardPetraeus merged commit 531f14a into main Mar 7, 2026
1 of 2 checks passed
@EduardPetraeus EduardPetraeus deleted the fix/respect-gitignore branch March 7, 2026 08:52