feat: add job description scoring mode with weighted evaluation and semantic matching #283
Open
Kingsam147 wants to merge 4 commits into
…emantic matching

Adds a second evaluation mode alongside the original HackerRank scoring. When selected, the pipeline reads a job description from job_description.txt and scores the resume against it using a 7-category weighted model:

- Skills Match (30%): LLM extracts required/preferred skills from the JD and checks the resume for each, weighting required skills at 80%
- Experience Match (20%): LLM judges relevance of work history and projects
- Keyword & Semantic Match (15%): Sentence Transformers (all-MiniLM-L6-v2) cosine similarity between JD and resume embeddings
- Job Title Alignment (10%): LLM compares previous titles to the target role
- Education & Certifications (10%): LLM checks degree and cert requirements
- Resume Quality (10%): LLM grades action verbs and quantified achievements
- Missing Critical Requirements (5%): penalises absent must-have qualifications

At startup the user is prompted to choose between the two modes. Choosing mode 2 with an empty job_description.txt exits with a clear error message. Results are written to job_evaluations.csv in development mode.

New files:
- job_description.txt: empty placeholder for the job description input
- prompts/templates/job_description_extraction.jinja
- prompts/templates/job_evaluation_criteria.jinja
- prompts/templates/job_evaluation_system_message.jinja

Modified files:
- models.py: JobDescriptionData, JobCategoryScore, JobScores, LLMJobEvaluationResponse, JobEvaluationData Pydantic models
- evaluator.py: JobDescriptionEvaluator class
- score.py: mode selector, fixed resume.pdf path, routing, output formatter
- transform.py: transform_job_evaluation_response(), removed stale pdb import
- prompts/template_manager.py: registers three new templates
- requirements.txt: adds sentence-transformers
- .gitignore: adds resume.pdf, job_evaluations.csv, package-lock.json
Keeps the empty placeholder in the repo for new users to clone but prevents personal job description content from being pushed.
….txt Both files are tracked as empty placeholders so new users know where to put their inputs. Use 'git update-index --skip-worktree' on each file to prevent personal content from being staged or pushed.
Compares modification timestamps so a replaced resume.pdf triggers a full re-extraction on the next run instead of serving stale cache.
Summary
The existing pipeline evaluates resumes against a fixed rubric hardcoded for a single role (Software Intern at HackerRank). This PR adds a second evaluation mode that accepts any job description and scores the resume against it using a 7-category weighted model, making the tool useful for any role.
At startup the user is prompted to choose between the two modes. The original HackerRank scoring is untouched.
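The startup choice and the empty-file check could look roughly like the sketch below. The function names `parse_mode` and `load_job_description` are hypothetical and are not taken from `score.py`; the only behaviour drawn from this PR is that there are two modes and that mode 2 exits with a clear message when `job_description.txt` is empty.

```python
from pathlib import Path


def parse_mode(choice: str) -> int:
    """Turn the user's answer into mode 1 (HackerRank) or mode 2 (job description)."""
    choice = choice.strip()
    if choice not in {"1", "2"}:
        raise ValueError(f"Expected '1' or '2', got {choice!r}")
    return int(choice)


def load_job_description(path: Path) -> str:
    """Return the job description text, or exit if the file is missing or blank."""
    text = path.read_text(encoding="utf-8").strip() if path.is_file() else ""
    if not text:
        # SystemExit with a string prints the message and exits with status 1.
        raise SystemExit(f"{path} is empty: paste a job description into it before choosing mode 2.")
    return text
```

In a script, `parse_mode(input("Select mode [1/2]: "))` would feed the routing, and `load_job_description(Path("job_description.txt"))` would only be called for mode 2.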
How it works
Input: paste any job description into `job_description.txt` in the project root, place `resume.pdf` in the same directory, then run `python score.py` and select mode 2.

Both files ship as empty placeholders in the repo so new users know exactly where to put their inputs. To prevent personal content from ever being staged or pushed, run `git update-index --skip-worktree` on each file once after cloning.
Pipeline (mode 2):
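- Sentence Transformers (`all-MiniLM-L6-v2`) computes cosine similarity between the job description and resume embeddings for semantic/keyword matching

Scoring weights:

- Skills Match: 30%
- Experience Match: 20%
- Keyword & Semantic Match: 15%
- Job Title Alignment: 10%
- Education & Certifications: 10%
- Resume Quality: 10%
- Missing Critical Requirements: 5%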
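As a rough illustration of the cosine-similarity step and of how the seven category weights might combine, here is a minimal sketch. The weights are the ones stated in this PR's commit message; the dictionary keys, the function names, and the assumption that every category score lies on a 0 to 100 scale are hypothetical, and the plain-Python `cosine_similarity` stands in for whatever the Sentence Transformers embeddings are actually compared with in `evaluator.py`.

```python
import math

# Category weights from the PR description; they sum to 1.0.
# The key names are illustrative, not the field names used in models.py.
WEIGHTS = {
    "skills_match": 0.30,
    "experience_match": 0.20,
    "keyword_semantic_match": 0.15,
    "job_title_alignment": 0.10,
    "education_certifications": 0.10,
    "resume_quality": 0.10,
    "missing_critical_requirements": 0.05,
}


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def weighted_total(scores):
    """Combine per-category scores (assumed 0-100) into one weighted score."""
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"Missing category scores: {sorted(missing)}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)
```

Under these assumptions, a similarity of 0.62 between the two embeddings would be scaled to 62 for the Keyword & Semantic Match category before being passed to `weighted_total`.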
Results are written to `job_evaluations.csv` in development mode (separate from the existing `resume_evaluations.csv`).

Changes
New files:
- `job_description.txt`: empty placeholder for the job description input
- `resume.pdf`: empty placeholder for the candidate resume
- `prompts/templates/job_description_extraction.jinja`
- `prompts/templates/job_evaluation_criteria.jinja`
- `prompts/templates/job_evaluation_system_message.jinja`

Modified files:
- `models.py`: `JobDescriptionData`, `JobCategoryScore`, `JobScores`, `LLMJobEvaluationResponse`, `JobEvaluationData` Pydantic models
- `evaluator.py`: `JobDescriptionEvaluator` class
- `score.py`: mode selector, fixed `resume.pdf` path, routing, new output formatter, `job_evaluations.csv` writing; resume cache now invalidated automatically when `resume.pdf` is replaced (mtime comparison)
- `transform.py`: `transform_job_evaluation_response()`, removed stale `pdb` import
- `prompts/template_manager.py`: registers three new templates
- `requirements.txt`: adds `sentence-transformers`
- `.gitignore`: adds `job_evaluations.csv`, `package-lock.json`; removes `resume.pdf` now that it is a tracked placeholder

Test plan
- … (`gemma3:4b`)
- … (`gemini-2.5-flash`)
- Empty `job_description.txt` in mode 2 exits with a clear error message
- … `resume.pdf` exits with a clear error message
- `job_evaluations.csv` is created and appended to correctly in development mode
- … (`--skip-worktree`)
- Replacing `resume.pdf` with a different file triggers automatic cache invalidation on the next run
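The mtime comparison behind the cache invalidation might be sketched as follows. This assumes the extracted resume data is cached as a file on disk; the name `cache_is_stale` and the idea of a single cache file are hypothetical, since the PR only states that modification timestamps are compared.

```python
from pathlib import Path


def cache_is_stale(source: Path, cache: Path) -> bool:
    """Return True when the cache is missing or older than the source file."""
    if not cache.exists():
        return True
    # A source modified after the cache was written means the cache is out of date.
    return source.stat().st_mtime > cache.stat().st_mtime
```

With a check like this, replacing `resume.pdf` bumps its modification time past the cache's, so the next run re-extracts instead of serving stale data.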