An NLP-powered resume screening system that matches candidates to job descriptions using semantic embeddings. The pipeline loads resume and job datasets (CSV), preprocesses text, builds sentence embeddings with Sentence Transformers, and compares embedding models to select the best one. A Flask web app lets users upload a resume (PDF, DOCX, or TXT) and displays the top matching jobs with similarity scores. The project also includes simple skill and experience extraction from resumes.
Project Overview
Dataset / Input Data
Resume dataset: Resume Dataset/Resume/Resume.csv (e.g. Resume or Resume_str, Category)
Job dataset: Job Dataset/job_descriptions.csv (Job Description, Job Title, Role/Category)
Config: Paths and column names are adjustable in config.py
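The sketch below illustrates one way configurable paths and column names could be used when loading the resume CSV. The constant names and both helper functions are hypothetical (the real config.py may be organized differently), and the standard-library csv module stands in for Pandas to keep the example dependency-free.

```python
import csv

# Hypothetical config values; the real config.py may use different names.
RESUME_CSV = "Resume Dataset/Resume/Resume.csv"
RESUME_TEXT_COLUMNS = ("Resume", "Resume_str")  # candidates, tried in order
RESUME_LABEL_COLUMN = "Category"


def pick_column(fieldnames, candidates):
    """Return the first candidate column present in the CSV header."""
    for name in candidates:
        if name in fieldnames:
            return name
    raise KeyError(f"none of {candidates} found in columns {list(fieldnames)}")


def load_resumes(handle):
    """Read (text, category) pairs from an open CSV file handle."""
    reader = csv.DictReader(handle)
    text_col = pick_column(reader.fieldnames or [], RESUME_TEXT_COLUMNS)
    return [(row[text_col], row[RESUME_LABEL_COLUMN]) for row in reader]
```

A typical call would open the file with `open(RESUME_CSV, newline="", encoding="utf-8")` and pass the handle to `load_resumes`.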
Model / Approach
Embeddings
Sentence Transformers encodes resume and job description text into dense vectors. Multiple embedding models are compared in the notebook, and the best performer is selected.
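A minimal sketch of how candidate embedding models might be compared. The selection metric is an assumption: the notebook's actual criterion is not stated here, so this example uses top-1 category accuracy (does a resume's best-scoring job share its category?). The function names are hypothetical, and each encoder is passed in as a plain callable so the sketch runs without downloading any model.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top1_accuracy(encode, resumes, jobs):
    """Fraction of resumes whose best-scoring job shares their category.

    `resumes` and `jobs` are lists of (text, category) pairs; `encode` maps a
    list of texts to a list of vectors. This metric is an assumption.
    """
    resume_vecs = encode([text for text, _ in resumes])
    job_vecs = encode([text for text, _ in jobs])
    hits = 0
    for vec, (_, category) in zip(resume_vecs, resumes):
        best = max(range(len(jobs)), key=lambda j: cosine(vec, job_vecs[j]))
        hits += jobs[best][1] == category
    return hits / len(resumes)


def pick_best_model(encoders, resumes, jobs):
    """Score every candidate encoder and return (best_name, all_scores)."""
    scores = {name: top1_accuracy(fn, resumes, jobs) for name, fn in encoders.items()}
    return max(scores, key=scores.get), scores
```

With Sentence Transformers, each entry in `encoders` could be the `encode` method of a loaded `SentenceTransformer` instance, keyed by its model name.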
Matching
Cosine similarity is computed between the resume embedding and the precomputed job embeddings. Jobs are ranked by score, and the top matches are shown in the Flask UI with their scores.
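The ranking step can be sketched as follows. This is an illustration under stated assumptions, not the project's code: `normalize` and `top_matches` are hypothetical names, the job vectors are assumed to have been unit-normalized when they were precomputed, and pure Python lists stand in for the NumPy arrays a real pipeline would likely use.

```python
import math


def normalize(vec):
    """Scale a vector to unit length; all-zero vectors are returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)


def top_matches(resume_vec, job_vecs, job_titles, k=5):
    """Return the k (title, score) pairs with the highest cosine similarity.

    `job_vecs` is assumed to be unit-normalized already (done once, offline),
    so cosine similarity reduces to a dot product with the normalized resume.
    """
    query = normalize(resume_vec)
    scored = [
        (title, sum(q * j for q, j in zip(query, vec)))
        for title, vec in zip(job_titles, job_vecs)
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]
```

Normalizing the job embeddings once up front means each upload costs only one normalization plus a dot product per job.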
Extraction
Simple skill and experience extraction from resume text to support matching and display.
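One simple way such extraction could work is keyword matching for skills plus a regular expression for years of experience. The skill vocabulary, the pattern, and both function names below are illustrative assumptions; the project's actual rules may differ.

```python
import re

# Hypothetical skill vocabulary, for illustration only.
SKILLS = ("python", "sql", "flask", "machine learning", "excel")

# Matches figures such as "5 years", "5+ years", "2 yrs", "1.5 year".
YEARS_PATTERN = re.compile(r"(\d+(?:\.\d+)?)\s*\+?\s*(?:years?|yrs?)\b", re.IGNORECASE)


def extract_skills(text, skills=SKILLS):
    """Return vocabulary skills that occur as whole words or phrases in the text."""
    lowered = text.lower()
    return [s for s in skills if re.search(rf"\b{re.escape(s)}\b", lowered)]


def extract_years(text):
    """Return the largest 'N years' figure mentioned, or None if there is none."""
    values = [float(m) for m in YEARS_PATTERN.findall(text)]
    return max(values) if values else None
```

The word-boundary check keeps a short skill such as "sql" from matching inside a longer token such as "MySQL".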
Tools / Frameworks
Tech Stack
Python
Flask
Sentence Transformers
Pandas
scikit-learn
Semantic search / Cosine similarity
PDF/DOCX/TXT upload
Results / Output
Pipeline Outputs
Notebook: models/best_model_config.json, job_embeddings.npz
Flask app: Upload resume → top job matches with scores
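To illustrate the upload step, here is a hedged sketch of choosing a text parser by file extension. The function names and the `parsers` mapping are hypothetical; only a TXT parser is included, because the libraries the app uses for PDF and DOCX are not named here and would have to be plugged in.

```python
from pathlib import Path


def read_txt(data: bytes) -> str:
    """Decode a plain-text resume, replacing any undecodable bytes."""
    return data.decode("utf-8", errors="replace")


def extract_resume_text(filename: str, data: bytes, parsers=None) -> str:
    """Pick a parser by file extension and return the resume's text.

    `parsers` maps an extension (".pdf", ".docx", ".txt") to a function
    taking the raw bytes; only the TXT parser is provided by default.
    """
    available = {".txt": read_txt, **(parsers or {})}
    suffix = Path(filename).suffix.lower()
    if suffix not in {".pdf", ".docx", ".txt"}:
        raise ValueError(f"unsupported resume format: {suffix or 'none'}")
    if suffix not in available:
        raise NotImplementedError(f"no parser registered for {suffix}")
    return available[suffix](data)
```

In a Flask view, the filename and bytes would typically come from the uploaded file object in `request.files` (its `filename` attribute and `read()` method).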
Highlights
- Load resume + job description datasets (CSV) and preprocess text
- Build sentence embeddings and compare embedding models
- Rank resumes/jobs by cosine similarity
- Skill and experience extraction from resumes
- Flask UI: upload PDF/DOCX/TXT and see top job matches with scores