Project Overview
An extractive question-answering system that takes a context paragraph and a question, then returns a span of text from the context as the answer. Transformer models (e.g. DistilBERT, BERT) are fine-tuned on the Stanford Question Answering Dataset (SQuAD). The project includes a full training pipeline with a configurable model name, data path, epoch count, and batch size, plus a Flask web app where users enter a context and a question and receive the extracted answer. Evaluation uses Exact Match and F1; metrics are saved to outputs/eval_metrics.txt. The app loads the fine-tuned model from outputs/final if present, otherwise it falls back to the base model.
Dataset / Input Data
Stanford Question Answering Dataset (SQuAD), stored as JSON under data/ and loaded via src/data_loader.
Model / Approach
Task
Extractive QA: the model predicts the start and end token positions of the answer span within the given context.
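The span-selection step can be sketched with toy numbers. In a real model the per-token start/end scores come from the QA head's logits; the values below are made up for illustration:

```python
# Sketch of extractive-QA span selection: given per-token start/end
# scores from a QA head, pick the highest-scoring valid span
# (start <= end, capped at a maximum answer length).

def best_span(start_logits, end_logits, max_answer_len=30):
    """Return (start, end) token indices maximizing start + end score."""
    best = (0, 0)
    best_score = float("-inf")
    for s, s_score in enumerate(start_logits):
        for e in range(s, min(s + max_answer_len, len(end_logits))):
            score = s_score + end_logits[e]
            if score > best_score:
                best_score, best = score, (s, e)
    return best

# Toy context tokens and invented scores peaking at "paris".
tokens = ["the", "capital", "of", "france", "is", "paris", "."]
start_logits = [0.1, 0.2, 0.0, 0.3, 0.1, 4.0, 0.0]
end_logits = [0.0, 0.1, 0.0, 0.2, 0.1, 3.5, 0.2]

s, e = best_span(start_logits, end_logits)
print(" ".join(tokens[s:e + 1]))  # -> paris
```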
Training
Fine-tuning on SQuAD via src.train. Supports the full dataset or smaller subsets (e.g. --max_train_samples 2000, --max_eval_samples 500), with configurable epochs, batch size, and output directory. Checkpoints and the final model are saved under outputs/.
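A quick-run invocation might look like the following. The flags --model_name, --max_train_samples, --max_eval_samples, and --output_dir appear in this README; --data_path, --epochs, and --batch_size are assumed spellings of the other configurable arguments, so check src/train's argument parser for the exact names.

```shell
# Quick training run on a small SQuAD subset (flag names partly assumed).
python -m src.train \
  --model_name distilbert-base-uncased \
  --data_path data/ \
  --epochs 2 \
  --batch_size 16 \
  --max_train_samples 2000 \
  --max_eval_samples 500 \
  --output_dir outputs
```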
Models
DistilBERT (default), BERT, or other Hugging Face QA models. Compare by running training with different --model_name and --output_dir and inspecting eval_metrics.txt.
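A comparison could be scripted as below. The --model_name and --output_dir flags are from this README; writing eval_metrics.txt inside each run's output directory is an assumption, so adjust the paths if metrics land elsewhere.

```shell
# Sketch: train two base models into separate output dirs, then
# print each run's metrics file side by side (paths assumed).
for model in distilbert-base-uncased bert-base-uncased; do
  python -m src.train --model_name "$model" --output_dir "outputs/$model"
done
grep -H . outputs/*/eval_metrics.txt
```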
Evaluation
Exact Match and F1 score; metrics written to outputs/eval_metrics.txt.
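The two metrics can be sketched in plain Python using the standard SQuAD-style answer normalization (lowercasing, stripping punctuation and articles); this is a minimal re-implementation for illustration, not the project's evaluation code:

```python
# Sketch of SQuAD-style Exact Match and token-level F1.
import re
import string
from collections import Counter

def normalize(text):
    """Lowercase, drop punctuation and articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())

def exact_match(pred, gold):
    return float(normalize(pred) == normalize(gold))

def f1_score(pred, gold):
    pred_toks = normalize(pred).split()
    gold_toks = normalize(gold).split()
    common = Counter(pred_toks) & Counter(gold_toks)
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_toks)
    recall = overlap / len(gold_toks)
    return 2 * precision * recall / (precision + recall)

print(exact_match("The Eiffel Tower", "eiffel tower"))  # 1.0
print(round(f1_score("in Paris France", "Paris"), 2))   # 0.5
```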
Tech Stack & Project Structure
Tools: Python, Hugging Face Transformers (DistilBERT, BERT), and Flask for the web interface.
Structure: app.py, question_answering.ipynb, data/ (SQuAD JSON), src/ (data_loader, preprocess, train, inference), templates/, static/css/, outputs/ (checkpoints, final, eval_metrics.txt).
Results / Output
Checkpoints and the final model are written under outputs/ (the app serves outputs/final when present); Exact Match and F1 scores are written to outputs/eval_metrics.txt.
Highlights
- Extractive QA with Transformers fine-tuned on SQuAD
- Flask web interface for context + question → answer
- CLI training: quick run (small subset) or full training with configurable args
- Support for different base models (e.g. bert-base-uncased, distilbert-base-uncased)
- Evaluation: Exact Match and F1 saved to file
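The web flow in the highlights can be sketched as a minimal Flask app. The qa() stand-in below replaces the real model call (src/inference in this project), and the inline template is a simplification of the templates/ directory:

```python
# Minimal sketch of the Flask flow: a form posts context + question,
# the app extracts an answer and renders it back on the page.
from flask import Flask, request, render_template_string

app = Flask(__name__)

PAGE = """
<form method="post">
  <textarea name="context"></textarea>
  <input name="question">
  <button type="submit">Ask</button>
</form>
<p>{{ answer }}</p>
"""

def qa(context, question):
    # Placeholder: the real app would run the fine-tuned QA model here.
    return context.split(".")[0]

@app.route("/", methods=["GET", "POST"])
def index():
    answer = ""
    if request.method == "POST":
        answer = qa(request.form["context"], request.form["question"])
    return render_template_string(PAGE, answer=answer)

if __name__ == "__main__":
    app.run(debug=True)
```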