Sentiment Analysis on Product Reviews - Case Study

Project Overview

A complete NLP pipeline for binary sentiment classification of Amazon product reviews. The project includes exploratory data analysis, text preprocessing with NLTK, multiple classifier comparisons (Logistic Regression and Naive Bayes with TF-IDF and CountVectorizer), and a production Flask web app. Users can paste or type a review and get an instant Positive or Negative prediction with a clean, dark-themed UI.

Dataset / Input Data

Dataset Name / Source: Amazon Fine Food Reviews (Kaggle)

Format: CSV (Reviews.csv)

Size / Info: Product review text; not included in repo (download from Kaggle)

Model / Approach

Preprocessing

Lowercase the text, remove non-letter characters, tokenize, remove English stopwords (NLTK), and collapse extra whitespace.
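
The preprocessing steps above can be sketched roughly as follows. This is a minimal illustration, not the contents of src/preprocessing.py: the function name `preprocess` is hypothetical, and a tiny inline stopword set stands in for NLTK's English list so the snippet runs without downloading any corpus.

```python
import re

# ASSUMPTION: small stand-in stopword set so the sketch needs no downloads.
# The project uses NLTK's English list, i.e. nltk.corpus.stopwords.words("english").
STOPWORDS = {"a", "an", "and", "the", "is", "it", "this", "was", "i", "of", "to", "in"}


def preprocess(text: str) -> str:
    """Clean one review (hypothetical helper name)."""
    text = text.lower()                          # lowercasing
    text = re.sub(r"[^a-z]+", " ", text)         # replace runs of non-letters with a space
    tokens = text.split()                        # whitespace tokenization
    tokens = [t for t in tokens if t not in STOPWORDS]  # stopword removal
    return " ".join(tokens)                      # rejoining also collapses extra spaces


print(preprocess("This coffee was GREAT!!!  I would buy it again (5 stars)."))
# prints: coffee great would buy again stars
```

Returning a single cleaned string rather than a token list keeps the output directly usable by a scikit-learn vectorizer, which does its own tokenization.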

Vectorization

TfidfVectorizer with max_features=5000, ngram_range=(1, 2) for unigrams and bigrams.

Deployed Model

Logistic Regression with TF-IDF. Naive Bayes (TF-IDF and CountVectorizer) used in the notebook for comparison.
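
A comparison along these lines might look like the sketch below. It assumes labels encoded as 1 = Positive and 0 = Negative, uses invented toy reviews rather than the Kaggle data, and leaves CountVectorizer at its defaults because the project's settings for it are not stated here. The scores it prints say nothing about the real results.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Invented toy data; ASSUMPTION: 1 = Positive, 0 = Negative.
train_texts = [
    "great coffee rich flavor",
    "love this tea will buy again",
    "excellent snack fresh and tasty",
    "best chocolate ever delicious",
    "terrible taste stale and bland",
    "awful product arrived broken",
    "worst coffee bitter and burnt",
    "disgusting flavor waste of money",
]
train_labels = [1, 1, 1, 1, 0, 0, 0, 0]
test_texts = ["delicious fresh coffee", "stale bitter awful taste"]
test_labels = [1, 0]


def tfidf():
    # Fresh vectorizer per pipeline, with the settings described above.
    return TfidfVectorizer(max_features=5000, ngram_range=(1, 2))


candidates = {
    "Logistic Regression + TF-IDF": make_pipeline(tfidf(), LogisticRegression(max_iter=1000)),
    "Naive Bayes + TF-IDF": make_pipeline(tfidf(), MultinomialNB()),
    "Naive Bayes + CountVectorizer": make_pipeline(CountVectorizer(), MultinomialNB()),
}

scores = {}
for name, model in candidates.items():
    model.fit(train_texts, train_labels)
    scores[name] = model.score(test_texts, test_labels)  # accuracy on held-out texts

for name, acc in scores.items():
    print(f"{name}: {acc:.2f}")
```

Wrapping each vectorizer and classifier in one pipeline means the vectorizer is fitted only on training text, which avoids leaking test vocabulary into the features.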

Tools / Frameworks

scikit-learn, NLTK, Pandas, joblib

Tech Stack

Python 3.10+

Flask

NLTK

scikit-learn

Pandas / NumPy

Matplotlib / Seaborn

HTML / CSS / JS

Results / Output

Evaluation (Notebook)

Accuracy comparison: Table across models

Classification report: Precision, Recall, F1

Confusion matrices: Visualized

Top words: Positive / negative frequency
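
The metrics listed above can be computed with scikit-learn as in this sketch. The true and predicted labels are made up to show the mechanics (assuming 1 = Positive, 0 = Negative); they are not the project's results.

```python
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

# Made-up labels for illustration; ASSUMPTION: 1 = Positive, 0 = Negative.
y_true = [1, 1, 1, 0, 0, 1, 0, 1]
y_pred = [1, 1, 0, 0, 1, 1, 0, 1]

acc = accuracy_score(y_true, y_pred)
# Rows are true classes, columns are predicted classes, ordered [Negative, Positive].
cm = confusion_matrix(y_true, y_pred, labels=[0, 1])

print(f"accuracy = {acc}")  # accuracy = 0.75
print(cm.tolist())          # [[2, 1], [1, 4]]
print(classification_report(y_true, y_pred, target_names=["Negative", "Positive"]))
```

The confusion matrix can then be passed to a plotting helper such as Seaborn's `heatmap` for the visualized version.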

Highlights

Reusable preprocessing in src/preprocessing.py
Best model saved as models/best_model.joblib (vectorizer + classifier)
Flask UI: textarea + Analyze button, result card (Positive green / Negative red)
Clear project structure: notebook for training, app.py for inference
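
The split between training and inference can be illustrated with the sketch below. Several details are assumptions: the saved artifact is taken to be a single scikit-learn pipeline (vectorizer + classifier), the `/predict` route and JSON field names are hypothetical, the training data is a toy set, and the model is written to a temporary directory instead of models/best_model.joblib so the snippet is self-contained.

```python
import tempfile
from pathlib import Path

import joblib
from flask import Flask, jsonify, request
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# --- Training side (the notebook's role): fit and persist one artifact. ---
texts = ["great coffee rich flavor", "love this tea", "terrible stale taste", "awful bitter coffee"]
labels = [1, 1, 0, 0]  # ASSUMPTION: 1 = Positive, 0 = Negative

pipeline = make_pipeline(
    TfidfVectorizer(max_features=5000, ngram_range=(1, 2)),
    LogisticRegression(max_iter=1000),
)
pipeline.fit(texts, labels)

# Temporary stand-in for models/best_model.joblib.
model_path = Path(tempfile.mkdtemp()) / "best_model.joblib"
joblib.dump(pipeline, model_path)

# --- Inference side (app.py's role): load once, predict per request. ---
model = joblib.load(model_path)
app = Flask(__name__)


@app.route("/predict", methods=["POST"])  # hypothetical route name
def predict():
    payload = request.get_json(silent=True) or {}
    review = str(payload.get("review", "")).strip()  # hypothetical field name
    if not review:
        return jsonify(error="Empty review"), 400
    label = int(model.predict([review])[0])
    return jsonify(sentiment="Positive" if label == 1 else "Negative")
```

Saving the vectorizer and classifier together guarantees that inference uses exactly the vocabulary and weights fitted during training. In a real app the same preprocessing used in training would be applied to the review before `model.predict`.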

Demo Video

Watch the project demo on Google Drive: the Flask app running, with sentiment analysis in action.
