
Ollama models fail to reliably extract resume sections (skills, projects, etc.) that work correctly with Gemini #278

Description

@Srijansarkar17

Problem

When using local Ollama models (e.g., gemma3:4b, qwen3:4b, mistral:7b), the resume section extractor in pdf.py frequently fails to parse structured sections such as skills, projects, and awards. The _extract_all_sections_separately() method returns None for one or more sections, and the fail-fast logic at lines 297–299 then aborts the entire extraction. The same resume parses correctly when a Gemini model is used.

Proposed Solution

The likely cause is that smaller Ollama models do not consistently follow the strict JSON-only output instructions in skills.jinja and system_message.jinja. Although llm_utils.py strips <think> tokens, these models may still emit extra text around the JSON, malformed JSON, or residual reasoning tokens, and any of these makes JSON parsing fail.
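As a rough illustration of more tolerant parsing, the sketch below strips complete <think>...</think> blocks, discards anything before a stray closing </think> tag, and then returns the first substring that decodes as a JSON object, skipping surrounding prose or Markdown fences. It uses only the standard library. The function name extract_json is hypothetical and not part of llm_utils.py; validating the result against the section model (presumably Pydantic, given model_json_schema()) would still happen afterwards.

```python
import json
import re
from typing import Optional

# Complete reasoning blocks, e.g. "<think>...</think>", possibly spanning lines.
_THINK_BLOCK = re.compile(r"<think>.*?</think>", re.DOTALL)


def extract_json(raw: str) -> Optional[dict]:
    """Best-effort recovery of the first JSON object in an LLM reply.

    Hypothetical helper for illustration, not the project's existing code.
    """
    text = _THINK_BLOCK.sub("", raw)
    # A leftover closing tag means reasoning leaked without its opening tag;
    # keep only what follows the last one.
    if "</think>" in text:
        text = text.rsplit("</think>", 1)[1]

    decoder = json.JSONDecoder()
    # Try every "{" as a candidate start, so prose or a Markdown code fence
    # around the JSON is simply skipped.
    for start, ch in enumerate(text):
        if ch != "{":
            continue
        try:
            obj, _end = decoder.raw_decode(text[start:])
        except json.JSONDecodeError:
            continue  # not valid JSON from here; try the next brace
        return obj
    return None  # caller treats this as a parse failure (e.g. triggers a retry)
```

Scanning from each brace is quadratic in the worst case, which is negligible for single-section replies, and it never repairs malformed JSON: it only finds a well-formed object if one is present.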

Possible improvements include:

  • Enforce structured output more strictly

    • The format=model.model_json_schema() argument is already passed in _call_llm_for_section(), but its effectiveness varies across Ollama models. Investigate ways to enforce structured output more reliably.
  • Add per-section retry logic

    • Instead of aborting on the first parsing failure, retry the LLM call for the affected section 2–3 times before marking it as failed.
  • Make fail-fast behavior configurable

    • Allow partial resume extraction instead of returning None for the entire resume when a single section fails. This would improve robustness and enable graceful degradation.
  • Use model-specific prompt variants

    • Smaller Ollama models may perform better with shorter, more constrained prompts. Providing model-specific prompt templates could improve JSON compliance.
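A minimal sketch of the per-section retry and configurable fail-fast ideas, assuming each section is extracted by a callable that returns a parsed dict, or None when the reply cannot be parsed. The names SectionCall, extract_with_retries, extract_sections, and fail_fast are hypothetical and do not exist in pdf.py. With fail_fast=True it keeps the all-or-nothing behaviour described in the problem statement; with fail_fast=False it returns partial results.

```python
from typing import Callable, Dict, Iterable, Optional

# Hypothetical signature: runs one LLM call for a section name and returns the
# parsed section as a dict, or None if the reply could not be parsed.
SectionCall = Callable[[str], Optional[dict]]


def extract_with_retries(
    section: str, call: SectionCall, max_attempts: int = 3
) -> Optional[dict]:
    """Call the LLM for one section up to max_attempts times."""
    for _attempt in range(max_attempts):
        result = call(section)
        if result is not None:
            return result
    return None  # every attempt failed


def extract_sections(
    sections: Iterable[str],
    call: SectionCall,
    *,
    fail_fast: bool = True,
    max_attempts: int = 3,
) -> Optional[Dict[str, Optional[dict]]]:
    """Extract each section, retrying failures before giving up on them."""
    results: Dict[str, Optional[dict]] = {}
    for section in sections:
        parsed = extract_with_retries(section, call, max_attempts)
        if parsed is None and fail_fast:
            return None  # all-or-nothing: one failed section aborts the resume
        # In partial mode a failed section is kept as None so callers can see it.
        results[section] = parsed
    return results
```

Keeping failed sections as None entries, rather than dropping them, lets the caller see exactly which sections need attention while still using the ones that parsed.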

I am willing to submit a PR implementing per-section retry logic and/or improved prompt engineering for Ollama models.
