Sentiment Analysis on Product Reviews

A text classification project that predicts Positive or Negative sentiment from Amazon product reviews. It combines NLTK preprocessing, TF-IDF features with Logistic Regression, and a Flask web app with a dark, modern UI.

Project Overview

A complete NLP pipeline for binary sentiment classification of Amazon product reviews. The project includes exploratory data analysis, text preprocessing with NLTK, multiple classifier comparisons (Logistic Regression and Naive Bayes with TF-IDF and CountVectorizer), and a production Flask web app. Users can paste or type a review and get an instant Positive or Negative prediction with a clean, dark-themed UI.

Dataset / Input Data

Dataset Name / Source: Amazon Fine Food Reviews (Kaggle)
Format: CSV (Reviews.csv)
Size / Info: Product review text; not included in repo (download from Kaggle)

Model / Approach

Preprocessing

Lowercasing, removal of non-letter characters, tokenization, English stopword removal (NLTK), and cleanup of extra whitespace.
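The steps above can be sketched roughly as follows. This is a minimal illustration, not the contents of src/preprocessing.py: the function name clean_text, the whitespace-based tokenization, and the small fallback stopword set (used only when the NLTK stopwords corpus is not installed) are all assumptions made for the example.

```python
import re

try:
    # NLTK's English stopword list; requires nltk.download("stopwords") once.
    from nltk.corpus import stopwords
    STOPWORDS = set(stopwords.words("english"))
except (ImportError, LookupError):
    # Assumption: tiny illustrative fallback so the sketch runs without NLTK data.
    STOPWORDS = {"a", "an", "the", "is", "this", "it", "not", "and", "of", "to"}


def clean_text(text: str) -> str:
    """Lowercase, keep letters only, tokenize, drop stopwords, normalize spaces."""
    text = text.lower()
    # Replace every run of non-letter characters with a single space.
    text = re.sub(r"[^a-z]+", " ", text)
    # Whitespace tokenization (an assumption; the project may use an NLTK tokenizer).
    tokens = text.split()
    tokens = [tok for tok in tokens if tok not in STOPWORDS]
    # Joining with single spaces also removes any extra whitespace.
    return " ".join(tokens)


print(clean_text("This is NOT a good product!!!  5 stars"))  # good product stars
```

Note that NLTK's English stopword list contains "not", so negations are dropped by this kind of cleaning. Whether that matters for a given sentiment model is worth checking on real reviews.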

Vectorization

TfidfVectorizer with max_features=5000, ngram_range=(1, 2) for unigrams and bigrams.

Deployed Model

Logistic Regression on TF-IDF features. Naive Bayes (with both TF-IDF and CountVectorizer features) is trained in the notebook for comparison.
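A comparison of this kind might look like the sketch below. It is not the notebook's code: the toy reviews, the 1 = Positive / 0 = Negative label encoding, the choice of MultinomialNB as the Naive Bayes variant, and reusing the same vectorizer settings for CountVectorizer are all assumptions for illustration.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Hypothetical toy data; 1 = Positive, 0 = Negative (assumed encoding).
train_texts = [
    "love this coffee great flavor", "excellent snack very tasty",
    "great price fast delivery", "delicious tea highly recommend",
    "terrible taste waste of money", "stale cookies very disappointed",
    "awful smell would not buy", "bad quality broke my trust",
]
train_labels = [1, 1, 1, 1, 0, 0, 0, 0]
test_texts = ["great tasty coffee", "terrible stale snack"]
test_labels = [1, 0]


def build(vectorizer_cls, classifier):
    """Pair a vectorizer (same n-gram settings for both types) with a classifier."""
    return make_pipeline(
        vectorizer_cls(max_features=5000, ngram_range=(1, 2)), classifier
    )


candidates = {
    "LogReg + TF-IDF": build(TfidfVectorizer, LogisticRegression(max_iter=1000)),
    "LogReg + Count": build(CountVectorizer, LogisticRegression(max_iter=1000)),
    "NaiveBayes + TF-IDF": build(TfidfVectorizer, MultinomialNB()),
    "NaiveBayes + Count": build(CountVectorizer, MultinomialNB()),
}

# Fit each candidate and record held-out accuracy.
results = {
    name: pipe.fit(train_texts, train_labels).score(test_texts, test_labels)
    for name, pipe in candidates.items()
}

for name, acc in sorted(results.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:22s} accuracy = {acc:.2f}")
```

On data this small the scores say nothing about the real models; the point is only the shape of the comparison loop.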

Tools / Frameworks

scikit-learn, NLTK, Pandas, joblib

Tech Stack

Python 3.10+
Flask
NLTK
scikit-learn
Pandas / NumPy
Matplotlib / Seaborn
HTML / CSS / JS

Results / Output

Evaluation (Notebook)

Accuracy comparison: Table across models
Classification report: Precision, Recall, F1
Confusion matrices: Visualized
Top words: Positive / negative frequency
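For reference, the accuracy, classification report, and confusion matrix listed above can be computed with scikit-learn as in this sketch. The label vectors are invented for the example (they are not project results), and the 0 = Negative / 1 = Positive encoding is an assumption.

```python
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

# Hypothetical labels for illustration only; 0 = Negative, 1 = Positive (assumed).
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]

acc = accuracy_score(y_true, y_pred)

# Rows are true classes, columns are predicted classes, in the order given by labels.
cm = confusion_matrix(y_true, y_pred, labels=[0, 1])

# Per-class precision, recall, and F1.
report = classification_report(
    y_true, y_pred, labels=[0, 1], target_names=["Negative", "Positive"]
)

print(f"accuracy = {acc:.2f}")  # accuracy = 0.75
print(cm)
print(report)
```

In this toy case 6 of 8 predictions are correct, with one false positive and one false negative, so the confusion matrix is [[3, 1], [1, 3]].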

Highlights

  • Reusable preprocessing in src/preprocessing.py
  • Best model saved as models/best_model.joblib (vectorizer + classifier)
  • Flask UI: textarea + Analyze button, result card (Positive green / Negative red)
  • Clear project structure: notebook for training, app.py for inference
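The split between training and inference described in these highlights could be sketched as below. This is an illustration under several assumptions, not the project's app.py: the vectorizer and classifier are bundled as a scikit-learn Pipeline, the route /predict and form field review are hypothetical names, the response is JSON rather than a rendered result card, the model is written to a temporary directory instead of models/, and the shared preprocessing step is omitted for brevity.

```python
import os
import tempfile

import joblib
from flask import Flask, jsonify, request
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# --- Training side (notebook): fit and save vectorizer + classifier together ---
# Hypothetical toy data; 1 = Positive, 0 = Negative (assumed encoding).
texts = [
    "love this coffee great flavor", "excellent snack very tasty",
    "terrible taste waste of money", "stale cookies very disappointed",
]
labels = [1, 1, 0, 0]

pipeline = make_pipeline(
    TfidfVectorizer(max_features=5000, ngram_range=(1, 2)),
    LogisticRegression(max_iter=1000),
)
pipeline.fit(texts, labels)

# Temporary location for the sketch; the project stores models/best_model.joblib.
model_path = os.path.join(tempfile.mkdtemp(), "best_model.joblib")
joblib.dump(pipeline, model_path)

# --- Inference side (app.py): load once at startup, predict per request ---
app = Flask(__name__)
model = joblib.load(model_path)


@app.route("/predict", methods=["POST"])  # hypothetical route name
def predict():
    review = request.form.get("review", "").strip()  # hypothetical field name
    if not review:
        return jsonify(error="Please enter a review."), 400
    label = int(model.predict([review])[0])
    return jsonify(sentiment="Positive" if label == 1 else "Negative")


# Exercise the route without starting a server.
with app.test_client() as client:
    response = client.post("/predict", data={"review": "great tasty coffee"})
    empty_response = client.post("/predict", data={"review": "   "})
    print(response.status_code, response.get_json())
```

Saving the vectorizer and classifier as one object means the app cannot accidentally pair the classifier with a vocabulary it was not trained on.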