A complete NLP pipeline for binary sentiment classification of Amazon product reviews. The project includes exploratory data analysis, text preprocessing with NLTK, multiple classifier comparisons (Logistic Regression and Naive Bayes with TF-IDF and CountVectorizer), and a production Flask web app. Users can paste or type a review and get an instant Positive or Negative prediction with a clean, dark-themed UI.
Project Overview
Dataset / Input Data
Dataset Name / Source: Amazon Fine Food Reviews (Kaggle)
Format: CSV (Reviews.csv)
Size / Info: Product review text; not included in repo (download from Kaggle)
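As a rough sketch of how the CSV might be turned into a binary target, the snippet below loads a tiny in-memory sample in place of Reviews.csv. The `Score` and `Text` column names follow the Kaggle file; the labeling rule (scores above 3 are Positive, below 3 are Negative, and 3 is dropped) is an assumption for illustration, not something this project states.

```python
import io

import pandas as pd

# Tiny stand-in for Reviews.csv; in the project this would be
# pd.read_csv("Reviews.csv") after downloading the file from Kaggle.
sample_csv = io.StringIO(
    "Id,Score,Text\n"
    "1,5,Great coffee\n"
    "2,1,Stale and bitter\n"
    "3,3,It was okay\n"
)
df = pd.read_csv(sample_csv)

# Assumed labeling rule: drop neutral 3-star reviews, then map
# 4-5 stars to 1 (Positive) and 1-2 stars to 0 (Negative).
df = df[df["Score"] != 3].copy()
df["label"] = (df["Score"] > 3).astype(int)

print(df[["Text", "label"]])
```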
Model / Approach
Preprocessing
Lowercase the text, remove non-letter characters, tokenize, remove English stopwords (NLTK), and collapse extra whitespace.
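The preprocessing steps above can be sketched roughly as follows. The function name `clean_text` and the small `FALLBACK_STOPWORDS` set are hypothetical stand-ins so the example runs without downloads; the project itself uses NLTK's English stopword list, and the actual contents of src/preprocessing.py may differ.

```python
import re

# Small stand-in stopword set (an assumption for this sketch). The project
# uses NLTK's English list, e.g. nltk.corpus.stopwords.words("english"),
# which needs a one-time nltk.download("stopwords").
FALLBACK_STOPWORDS = frozenset(
    {"a", "an", "the", "is", "was", "and", "or", "of", "to", "in", "it", "this", "i"}
)


def clean_text(text, stopwords=FALLBACK_STOPWORDS):
    """Lowercase, strip non-letters, tokenize, drop stopwords, rejoin."""
    text = text.lower()
    # Replace every run of non-letter characters with a single space.
    text = re.sub(r"[^a-z]+", " ", text)
    # Whitespace tokenization; split() also discards extra spaces.
    tokens = text.split()
    kept = [token for token in tokens if token not in stopwords]
    return " ".join(kept)


print(clean_text("This is GREAT!!  I loved it, 10/10."))
```

Passing the stopword set as a parameter keeps the function easy to test and lets the NLTK list be swapped in where it is available.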
Vectorization
TfidfVectorizer with max_features=5000, ngram_range=(1, 2) for unigrams and bigrams.
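A minimal illustration of these vectorizer settings on three toy documents (the documents are made up for the example). With `ngram_range=(1, 2)` the vocabulary holds both single words and adjacent word pairs, and `max_features=5000` only starts trimming once the vocabulary exceeds that size.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Toy documents standing in for cleaned review text.
docs = ["great taste great price", "terrible taste", "great coffee"]

vectorizer = TfidfVectorizer(max_features=5000, ngram_range=(1, 2))
X = vectorizer.fit_transform(docs)  # sparse matrix: one row per document

# vocabulary_ maps each unigram or bigram to its column index.
print(sorted(vectorizer.vocabulary_))
print(X.shape)
```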
Deployed Model
Logistic Regression with TF-IDF. Naive Bayes (TF-IDF and CountVectorizer) used in the notebook for comparison.
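The comparison could look roughly like the sketch below, which fits the three model variants on a handful of made-up reviews and records their accuracy. `MultinomialNB` is assumed as the Naive Bayes variant, and reusing the same vectorizer settings for `CountVectorizer` is also an assumption; the notebook may configure these differently.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Made-up cleaned reviews (1 = Positive, 0 = Negative), for illustration only.
train_texts = [
    "great taste love product", "excellent coffee fresh", "best snack ever",
    "terrible stale waste money", "awful taste never again", "bad packaging broken",
]
train_labels = [1, 1, 1, 0, 0, 0]
test_texts = ["love fresh coffee", "stale awful product"]
test_labels = [1, 0]

vec_args = {"max_features": 5000, "ngram_range": (1, 2)}
candidates = {
    "LogReg + TF-IDF": make_pipeline(
        TfidfVectorizer(**vec_args), LogisticRegression(max_iter=1000)
    ),
    "NB + TF-IDF": make_pipeline(TfidfVectorizer(**vec_args), MultinomialNB()),
    "NB + Count": make_pipeline(CountVectorizer(**vec_args), MultinomialNB()),
}

scores = {}
for name, pipeline in candidates.items():
    pipeline.fit(train_texts, train_labels)
    scores[name] = accuracy_score(test_labels, pipeline.predict(test_texts))

best_name = max(scores, key=scores.get)
print(scores, "best:", best_name)
```

Wrapping the vectorizer and classifier in one pipeline means the vectorizer is fitted only on the training split, which avoids leaking test vocabulary into training.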
Tools / Frameworks
Tech Stack
Python 3.10+
Flask
NLTK
scikit-learn
Pandas / NumPy
Matplotlib / Seaborn
HTML / CSS / JS
Results / Output
Evaluation (Notebook)
Accuracy comparison: Table across models
Classification report: Precision, Recall, F1
Confusion matrices: Visualized
Top words: Positive / negative frequency
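For reference, the accuracy, classification report, and confusion matrix can be computed with scikit-learn as sketched here. The label vectors are invented for the example (1 = Positive, 0 = Negative is an assumed encoding) and are not results from this project.

```python
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

# Invented true and predicted labels, for illustration only.
y_true = [1, 0, 1, 1, 0]
y_pred = [1, 0, 0, 1, 1]

accuracy = accuracy_score(y_true, y_pred)

# Rows are true classes, columns are predicted classes, ordered [0, 1].
matrix = confusion_matrix(y_true, y_pred)

report = classification_report(
    y_true, y_pred, target_names=["Negative", "Positive"]
)

print(accuracy)
print(matrix)
print(report)
```

The matrix can then be passed to a plotting helper such as `seaborn.heatmap` to produce the visualized version.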
Highlights
- Reusable preprocessing in src/preprocessing.py
- Best model saved as models/best_model.joblib (vectorizer + classifier)
- Flask UI: textarea + Analyze button, result card (Positive green / Negative red)
- Clear project structure: notebook for training, app.py for inference
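
The split between training and inference could be wired together along these lines. This is a sketch under several assumptions: the vectorizer and classifier are saved as a single scikit-learn pipeline, the model is written to a temporary directory rather than models/best_model.joblib, and the `/predict` route, the `review` JSON field, and the 1 = Positive encoding are hypothetical names chosen for the example, not taken from app.py.

```python
import tempfile
from pathlib import Path

import joblib
from flask import Flask, jsonify, request
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Training side (notebook): fit on toy data and save vectorizer + classifier
# together. Labels: 1 = Positive, 0 = Negative (assumed encoding).
texts = ["great product love it", "excellent taste", "terrible waste of money", "awful stale"]
labels = [1, 1, 0, 0]
model = make_pipeline(
    TfidfVectorizer(max_features=5000, ngram_range=(1, 2)), LogisticRegression()
)
model.fit(texts, labels)

# A temp directory stands in for models/best_model.joblib in this sketch.
model_path = Path(tempfile.mkdtemp()) / "best_model.joblib"
joblib.dump(model, model_path)

# Inference side (app.py): load once at startup, predict per request.
app = Flask(__name__)
loaded_model = joblib.load(model_path)


@app.route("/predict", methods=["POST"])  # hypothetical route name
def predict():
    payload = request.get_json(silent=True) or {}
    review = str(payload.get("review", ""))
    if not review.strip():
        return jsonify(error="Empty review"), 400
    label = int(loaded_model.predict([review])[0])
    return jsonify(sentiment="Positive" if label == 1 else "Negative")
```

Saving the vectorizer and classifier as one object ensures the app applies exactly the vocabulary and weights that were fitted during training.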