[Bug] Gemini provider aborts extraction on transient 504 DeadlineExceeded; only 429 is retried #284

Description

@AdvancedUno

When using the Gemini provider (LLM_PROVIDER=gemini), resume extraction intermittently aborts with:

504 Deadline expired before operation could complete.

The failure happens on individual section calls (e.g. projects, awards). Because _extract_all_sections_separately aborts the entire extraction if any single section fails, one transient 504 kills the whole run and no resume is produced:

❌ Error calling LLM for projects section: 504 Deadline expired before operation could complete.
⚠️ Failed to extract projects section. Aborting extraction to prevent partial/invalid resume data.

Root cause

In GeminiProvider.chat() (models.py):

  1. gemini_model.generate_content(...) is called without a request_options timeout, so the SDK's default per-request gRPC deadline (~60s) applies. Larger sections / slower models (e.g. gemini-2.5-flash, gemini-3.5-flash) routinely exceed this and raise google.api_core.exceptions.DeadlineExceeded (504).
  2. The retry loop only catches ResourceExhausted (429). Transient server-side errors — DeadlineExceeded (504), ServiceUnavailable (503), InternalServerError (500) — are not retried and propagate up, aborting extraction.

This is distinct from #186, which concerns the 429 free-tier RPM limit (already handled by the existing backoff). The 504 timeout path has no retry and no extended deadline.

Steps to reproduce

  1. Set LLM_PROVIDER=gemini, DEFAULT_MODEL=gemini-2.5-flash (or a 3.x flash model), and a valid GEMINI_API_KEY.
  2. Run python score.py <resume>.pdf on a resume with several populated sections.
  3. Intermittently, a section call fails with "504 Deadline expired before operation could complete." and the run aborts before producing an evaluation.

Environment

  • OS: Windows 11
  • Python: 3.11.7
  • google-generativeai: 0.4.0 (as pinned in requirements.txt)
  • Models observed: gemini-2.5-flash, gemini-3.5-flash

Expected behavior

Transient 504 / 503 / 500 errors should be retried with backoff (as 429 already is), and the per-request timeout should be large enough to accommodate normal section calls, so a single transient hiccup doesn't abort the whole extraction.

Proposed fix

  • Pass request_options={"timeout": ...} to generate_content to extend the per-request deadline.
  • Extend the existing exponential-backoff loop to also retry DeadlineExceeded, ServiceUnavailable, and InternalServerError.
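A minimal sketch of the retry half of the proposal, not the actual patch. To stay runnable without the SDK installed, it defines stand-in exception classes; in models.py these would be the real classes from google.api_core.exceptions. The helper name call_with_backoff, the attempt count, and the delay values are all assumptions chosen for illustration.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


# Stand-ins so the sketch runs without the SDK. In models.py these would be
# imported from google.api_core.exceptions instead.
class ResourceExhausted(Exception):
    """429: the only error the current loop retries."""


class DeadlineExceeded(Exception):
    """504: per-request deadline expired."""


class ServiceUnavailable(Exception):
    """503: transient server-side failure."""


class InternalServerError(Exception):
    """500: transient server-side failure."""


RETRYABLE: Tuple[Type[Exception], ...] = (
    ResourceExhausted,
    DeadlineExceeded,
    ServiceUnavailable,
    InternalServerError,
)


def call_with_backoff(
    call: Callable[[], T],
    retryable: Tuple[Type[Exception], ...] = RETRYABLE,
    max_attempts: int = 4,      # assumed value, not taken from the codebase
    base_delay: float = 2.0,    # assumed value, in seconds
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run call(), retrying retryable errors with exponential backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            return call()
        except retryable:
            if attempt == max_attempts:
                raise  # out of attempts: let the caller see the last error
            # Delays grow as base_delay, 2*base_delay, 4*base_delay, ...
            sleep(base_delay * 2 ** (attempt - 1))
    raise RuntimeError("max_attempts must be at least 1")
```

Inside GeminiProvider.chat(), the callable passed in would presumably wrap gemini_model.generate_content(..., request_options={"timeout": ...}), so each attempt also gets the extended deadline. Errors outside the retryable tuple still propagate immediately, which keeps genuine failures (bad API key, invalid request) from being masked by retries.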

I have a fix ready and will open a PR referencing this issue.
