refactor: consistent cloning & pattern-handling #388

Open
wants to merge 5 commits into
base: main

filipchristiansen
Contributor

@filipchristiansen filipchristiansen commented Jul 6, 2025

✨ Refactor: consistent cloning & pattern-handling

Closes #344
Prepares for #342 (digest prefix/suffix) and #343 (caching)

Why

  • Cloning behaved differently depending on whether the user supplied a branch, tag or commit.
    Caching and reproducibility suffered as a result.
  • Pattern handling lived in query_parser and duplicated ignore/include logic.
    Hard to unit-test and reuse.
  • Windows CI occasionally failed when temp clones contained read-only
    packed-object files.
    Shallow-clone cleanup broke with WinError 5.

What’s new

  • Unified commit resolution
    utils.git_utils.resolve_commit() resolves HEAD, a branch, or a tag to an exact SHA,
    so we always fetch that SHA before checkout.
    Deterministic → enables caching (feat: Implement caching on a per-commit basis #343).
  • New pattern helpers
    utils.pattern_utils.process_patterns() centralises include/exclude parsing
    (moved out of query_parser). Adds thorough tests.
  • Windows-safe repo cleanup
    _handle_remove_readonly on-error callback makes read-only Git objects
    writable and retries shutil.rmtree(), preventing WinError 5 in CI.
  • Public API cleanup (BREAKING)
    • parse_query() removed in favour of
      • parse_remote_repo() (URLs/slugs)
      • parse_local_dir_path() (local paths)
    • clone_repo, ingest_query, parse_query are no longer re-exported from
      gitingest.__init__.
  • Clone rewrite
    • Single entry point that always does: shallow clone → sparse-checkout
      (optional) → fetch commit → checkout → submodule update (optional).
    • _checkout_partial_clone renamed & moved → git_utils.checkout_partial_clone.
  • Server updates
    • query_processor uses new pattern utilities and passes a typed enum.
    • IngestRequest.validate_input_text() now removes any .git suffix.
  • Tests added/updated: clone, pattern, parser, summary
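
To make the step order from the "Clone rewrite" item concrete, here is a minimal sketch that only builds the git command lines in that order. The function name `build_clone_commands`, its parameters, and the exact flags are illustrative assumptions, not the code in clone.py; it takes an already-resolved SHA as input.

```python
from typing import List, Optional


def build_clone_commands(
    url: str,
    local_path: str,
    sha: str,
    subpath: Optional[str] = None,
    include_submodules: bool = False,
) -> List[List[str]]:
    """Hypothetical helper: return git commands in the order described above."""
    partial = bool(subpath) and subpath != "/"

    # 1. Shallow clone without checking out any files yet.
    clone = ["git", "clone", "--single-branch", "--no-checkout", "--depth=1"]
    if partial:
        # Assumption: partial clones skip blobs and start in sparse mode.
        clone += ["--filter=blob:none", "--sparse"]
    clone += [url, local_path]
    commands = [clone]

    git = ["git", "-C", local_path]

    # 2. Optional sparse-checkout limited to the requested subpath.
    if partial:
        commands.append(git + ["sparse-checkout", "set", subpath.lstrip("/")])

    # 3. Fetch exactly the resolved commit, then 4. check it out.
    commands.append(git + ["fetch", "--depth=1", "origin", sha])
    commands.append(git + ["checkout", sha])

    # 5. Optional submodule update.
    if include_submodules:
        commands.append(
            git + ["submodule", "update", "--init", "--recursive", "--depth", "1"]
        )
    return commands
```

Because the same fetch and checkout steps run whether the caller started from a branch, a tag, or a commit, the resulting working tree depends only on the SHA, which is the property the caching work relies on.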

File changes

gitingest

| File | Changes |
| --- | --- |
| `__init__.py` | Stop exposing internal helpers (clone_repo, ingest_query, parse_query) |
| `clone.py` | Rewrite clone_repo to call resolve_commit, sparse checkout, and uniform fetch/checkout steps; move helper & rename |
| `entrypoint.py` | Add _handle_remove_readonly callback for Windows temp-dir cleanup; ingest_async now uses parse_remote_repo / parse_local_dir_path |
| `output_formatter.py` | Add missing Tag prefix in _create_summary_prefix |
| `query_parser.py` | Remove parse_query; move pattern helpers to pattern_utils.py |
| `utils/git_utils.py` | New resolve_commit helper |
| `utils/os_utils.py` | Rename ensure_directory → ensure_directory_exists_or_create |
| `utils/pattern_utils.py` | Consolidated helpers + process_patterns |
| `utils/query_parser_utils.py` | Move _is_valid_pattern to pattern_utils.py |
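
As a rough illustration of what a commit-resolution helper like the one in utils/git_utils.py could do, the sketch below maps a branch, tag, or HEAD to a SHA using `git ls-remote`. The names `pick_sha` and `resolve_commit_sketch`, their signatures, and the precedence order are assumptions; the real resolve_commit() signature is not shown in this PR description.

```python
import subprocess
from typing import Dict, Optional, Sequence


def pick_sha(ls_remote_output: str, candidates: Sequence[str]) -> Optional[str]:
    """Return the SHA of the first candidate ref found in `git ls-remote` output."""
    refs: Dict[str, str] = {}
    for line in ls_remote_output.splitlines():
        # Each line has the form "<sha>\t<refname>".
        sha, _, name = line.partition("\t")
        if name:
            refs[name.strip()] = sha.strip()
    for name in candidates:
        if name in refs:
            return refs[name]
    return None


def resolve_commit_sketch(
    url: str,
    branch: Optional[str] = None,
    tag: Optional[str] = None,
    commit: Optional[str] = None,
) -> str:
    """Hypothetical resolver: an explicit commit wins, then branch, then tag, then HEAD."""
    if commit:
        return commit
    if branch:
        candidates = [f"refs/heads/{branch}"]
    elif tag:
        # Annotated tags list a peeled "^{}" entry that points at the commit itself.
        candidates = [f"refs/tags/{tag}^{{}}", f"refs/tags/{tag}"]
    else:
        candidates = ["HEAD"]

    result = subprocess.run(
        ["git", "ls-remote", url],
        capture_output=True,
        text=True,
        check=True,
    )
    sha = pick_sha(result.stdout, candidates)
    if sha is None:
        raise ValueError(f"Could not resolve {candidates[-1]} for {url}")
    return sha
```

Keeping the parsing in a pure function (`pick_sha` here) means the resolution logic can be unit-tested without network access.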

server

| File | Changes |
| --- | --- |
| `models.py` | Strip .git suffix in validate_input_text |
| `query_processor.py` | Replace parse_query with parse_remote_repo, integrate PatternType, use process_patterns |
| `routers_utils.py` | Coerce pattern_type into PatternType enum |
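
The two server-side input changes are small enough to sketch together. This is an illustration only: the enum member values and the helper names `strip_git_suffix` and `coerce_pattern_type` are assumptions, not the actual code in models.py or routers_utils.py.

```python
from enum import Enum
from typing import Union


class PatternType(str, Enum):
    """Assumed member values; the real enum may differ."""

    INCLUDE = "include"
    EXCLUDE = "exclude"


def strip_git_suffix(input_text: str) -> str:
    """Hypothetical normaliser: trim whitespace and one trailing '.git'."""
    text = input_text.strip()
    return text[: -len(".git")] if text.endswith(".git") else text


def coerce_pattern_type(value: Union[str, PatternType]) -> PatternType:
    """Turn a raw form value into the typed enum; unknown values raise ValueError."""
    return PatternType(value)
```

With a `str`-based enum, callers that still compare against plain strings keep working, while downstream code can rely on a closed set of values.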

tests

| File | Changes |
| --- | --- |
| `conftest.py` | Adjust fixtures |
| `query_parser/test_git_host_agnostic.py` | Switch to parse_remote_repo |
| `query_parser/test_query_parser.py` | Switch to parse_remote_repo; add parse_local_dir_path coverage |
| `test_clone.py` | Update expectations and fixtures |
| `test_pattern_utils.py` | Add tests for _parse_patterns and process_patterns |
| `test_summary.py` | Verify that gitingest.ingest() emits correct summaries |
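
For readers unfamiliar with the pattern helpers that test_pattern_utils.py covers, the following sketch shows one plausible shape for them. Everything here is assumed for illustration: the `_sketch` function names, the default ignore set, the allowed-character rule, and the choice that include patterns override ignore patterns are not taken from the PR.

```python
import re
from typing import Iterable, Optional, Set, Tuple, Union

# Hypothetical defaults, for illustration only.
DEFAULT_IGNORE = {"*.pyc", ".git", "node_modules"}

# Assumed validity rule: letters, digits and a few path/glob characters.
_VALID_PATTERN = re.compile(r"[A-Za-z0-9_\-./+*@]+")

Patterns = Union[str, Iterable[str]]


def parse_patterns_sketch(patterns: Patterns) -> Set[str]:
    """Split comma- or whitespace-separated patterns and validate each one."""
    if isinstance(patterns, str):
        patterns = [patterns]
    parsed: Set[str] = set()
    for chunk in patterns:
        parsed.update(p for p in re.split(r"[,\s]+", chunk) if p)
    for pattern in parsed:
        if not _VALID_PATTERN.fullmatch(pattern):
            raise ValueError(f"Invalid pattern: {pattern!r}")
    return parsed


def process_patterns_sketch(
    exclude: Optional[Patterns] = None,
    include: Optional[Patterns] = None,
) -> Tuple[Set[str], Optional[Set[str]]]:
    """Return (ignore_set, include_set); include_set is None when nothing was given."""
    ignore = set(DEFAULT_IGNORE)
    if exclude:
        ignore |= parse_patterns_sketch(exclude)
    include_set = parse_patterns_sketch(include) if include else None
    if include_set:
        # Assumption: an explicitly included pattern is never ignored.
        ignore -= include_set
    return ignore, include_set
```

Having one function own both sets is what makes the logic easy to test in isolation, which is the stated motivation for moving it out of query_parser.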

@filipchristiansen filipchristiansen changed the title from "resolve commit" to "feat: resolve commit" Jul 9, 2025
@filipchristiansen filipchristiansen force-pushed the resolve-commit branch 3 times, most recently from 7ab4af2 to 831a36d July 9, 2025 20:32
@filipchristiansen filipchristiansen requested a review from Copilot July 9, 2025 20:33
Copilot

This comment was marked as resolved.

@filipchristiansen filipchristiansen force-pushed the resolve-commit branch 6 times, most recently from cf1aa6f to d1224a6 July 9, 2025 23:45
@filipchristiansen filipchristiansen marked this pull request as ready for review July 9, 2025 23:47
Copilot

This comment was marked as resolved.

@filipchristiansen filipchristiansen changed the title from "feat: resolve commit" to "refactor: consistent cloning & pattern-handling" Jul 10, 2025
Add `_handle_remove_readonly` on-error callback for `shutil.rmtree`
that removes the read-only attribute and retries, preventing WinError 5
during temp-repo cleanup in tests.
Development

Successfully merging this pull request may close these issues.

feat: avoid using branch references