AI Atlas
Beginner · ~2 min read · #sentiment-analysis #nlp #text-classification

Sentiment Analysis

Inferring tone from text

Automatically determining the emotional tone (positive, negative, neutral) of text — the foundation of customer review, social media, and feedback analysis.

[Illustration: "Loved the product, fast shipping too." → Positive 86% · Neutral 10% · Negative 4%. Sentiment analysis outputs the emotional tone of text as a probability vector.]
Definition

Sentiment analysis quantifies the writer's attitude. The simplest form is three-way classification (positive / negative / neutral); richer setups score intensity (-1 to +1), multiple dimensions (joy, anger, surprise), or aspect-based sentiment (positive about delivery, negative about price within the same review).

How the approaches evolved:

- Lexicon-based (VADER, AFINN): each word has a preset sentiment score; the sentence score is their sum or average. Fast and explainable; weak on negation, irony, and context.
- Classical ML (Naive Bayes, logistic regression + TF-IDF): a classifier trained on labeled reviews. A solid baseline.
- Pre-transformer deep learning (LSTM, CNN): captures context, but needs more data.
- Transformer-based (BERT, RoBERTa, multilingual variants): today's default. Take a pre-trained model and fine-tune it on 1K–10K labels for production-grade accuracy.

Open models exist for many languages, but label quality is what drives real performance. Morphology, irony, slang, and regional usage are perennial challenges in sentiment.
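
The lexicon-based approach above can be sketched in a few lines. The word scores here are made up for illustration; real tools like VADER ship large, carefully tuned lexicons plus extra rules for punctuation, capitalization, and negation.

```python
# Toy lexicon: word → sentiment score (values invented for this sketch).
LEXICON = {"loved": 3.0, "fast": 1.5, "awful": -3.0, "good": 2.0,
           "bad": -2.5, "okay": 0.5, "never": -0.5}

def lexicon_score(text: str) -> float:
    """Sentence score = mean of known word scores; 0.0 if none match."""
    words = text.lower().replace(",", " ").replace(".", " ").split()
    scores = [LEXICON[w] for w in words if w in LEXICON]
    return sum(scores) / len(scores) if scores else 0.0

print(lexicon_score("Loved the product, fast shipping too."))  # → 2.25
print(lexicon_score("Awful, would never recommend."))          # → -1.75
```

Note what this cannot do: it has no sense of word order, so "not bad" and "bad" score identically — exactly the weakness listed above.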

Analogy

Like a restaurant owner walking the floor after dinner service. From customers' faces, leftovers, snippets servers caught, and tip size, they form an overall sense of the night. Nobody explicitly says "it was good/bad" — they read cues. Sentiment analysis quantifies those cues in text.

Real-world example

A shop processes 50K reviews a day; manual reading is impossible. Three models were compared:

| Model | Accuracy | Time per 1K reviews | Notes |
|-------|----------|---------------------|-------|
| VADER (lexicon) | 72% | <1 s | Misses irony |
| Logistic + TF-IDF | 81% | 2 s | Decent baseline |
| Fine-tuned BERT | 91% | 30 s | Catches context, negation |

Decision: BERT for the daily dashboard, VADER for the live tooltip in customer service — quality where latency allows, speed where it doesn't. Reviews where the star count contradicts the text (3 stars but mostly positive language) go to a manual queue; data quality matters more than tool choice.
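
The routing rule in that last sentence might look like this. `route_review` is a hypothetical helper, not part of any library; the star-to-sentiment mapping is one reasonable choice of threshold.

```python
def route_review(stars: int, text_sentiment: str) -> str:
    """Send reviews whose stars contradict their text to a manual queue."""
    # Map the star rating to an expected sentiment bucket (assumed thresholds).
    expected = "positive" if stars >= 4 else "negative" if stars <= 2 else "neutral"
    # Agreement flows to the dashboard; mismatch gets a human look.
    return "dashboard" if text_sentiment == expected else "manual_queue"

print(route_review(3, "positive"))  # → manual_queue (3 stars, positive text)
print(route_review(5, "positive"))  # → dashboard
```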

Code examples
Hugging Face · ready-made model (Python)
from transformers import pipeline

clf = pipeline(
    "sentiment-analysis",
    model="distilbert-base-uncased-finetuned-sst-2-english",
)

reviews = [
    "Loved the product, fast shipping too.",
    "Awful, would never recommend.",
    "It's okay, not as good as I hoped.",
]

for r in reviews:
    out = clf(r)[0]
    print(f"{out['label']:8s} ({out['score']:.2f})  →  {r}")

Train logistic regression on your own data (Python)
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.metrics import classification_report

pipe = Pipeline([
    ("tfidf", TfidfVectorizer(
        ngram_range=(1, 2),
        min_df=2,
        max_features=50000,
    )),
    ("clf", LogisticRegression(max_iter=1000, C=1.0)),
])

# texts_train / labels_train: your own labeled review splits
pipe.fit(texts_train, labels_train)
preds = pipe.predict(texts_test)
print(classification_report(labels_test, preds))

feats = pipe.named_steps["tfidf"].get_feature_names_out()
coefs = pipe.named_steps["clf"].coef_[0]
top_pos = coefs.argsort()[-15:][::-1]
print("Most positive words:", [feats[i] for i in top_pos])
When to use
  • Scaling reviews / social / feedback analysis
  • Brand monitoring, early crisis detection
  • Content moderation (toxic vs neutral)
  • Measuring emotional response in A/B tests
When not to use
  • Few one-off texts — read them yourself
  • Sarcasm / irony heavy — even modern models struggle
  • Highly domain-specific (legal, medical) — generic models miss nuance
Common pitfalls

Losing negation

'not bad' is positive; 'not good' is negative. Lexicon methods often miss this; transformers handle it better but verify on your data.
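
A minimal fix lexicon methods often apply is a polarity flip when a negator precedes a sentiment word. This sketch uses an invented two-word lexicon; real tools (VADER among them) use more elaborate rules, including scaling rather than a pure flip.

```python
LEXICON = {"bad": -2.5, "good": 2.0}   # toy scores, invented for this sketch
NEGATORS = {"not", "never", "no"}

def score_with_negation(text: str) -> float:
    """Sum word scores, flipping polarity after a negator."""
    words = text.lower().split()
    total = 0.0
    for i, w in enumerate(words):
        if w in LEXICON:
            s = LEXICON[w]
            # Flip the sign if the previous word negates it.
            if i > 0 and words[i - 1] in NEGATORS:
                s = -s
            total += s
    return total

print(score_with_negation("not bad"))   # → 2.5 (positive)
print(score_with_negation("not good"))  # → -2.0 (negative)
```

Even this heuristic breaks on longer-range negation ("not exactly a good experience"), which is why the advice above is to verify on your own data.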

Poor label quality

If sarcasm was labeled 'positive', the model learns that mistake. Track inter-annotator agreement (e.g. Cohen's kappa) and refine the labeling guidelines if it is low.
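
Inter-annotator agreement is easy to measure with scikit-learn's `cohen_kappa_score`. The two label lists below are hypothetical annotations of the same eight reviews; the 0.6 threshold is a common rule of thumb, not a hard standard.

```python
from sklearn.metrics import cohen_kappa_score

# Hypothetical labels from two annotators on the same 8 reviews.
annotator_a = ["pos", "neg", "pos", "neu", "neg", "pos", "neu", "pos"]
annotator_b = ["pos", "neg", "neu", "neu", "neg", "pos", "pos", "pos"]

kappa = cohen_kappa_score(annotator_a, annotator_b)
print(f"Cohen's kappa: {kappa:.2f}")  # → Cohen's kappa: 0.60
```

Kappa corrects raw agreement (here 6/8 = 0.75) for the agreement expected by chance; values below ~0.6 usually mean the guidelines, not the annotators, need work.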

Single-number summaries

An overall 78% positive can hide a feature running 30% positive / 50% negative. Aspect-based sentiment surfaces which parts of the product actually need attention.
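
Once per-aspect labels exist (however they were extracted), the aggregation itself is trivial. The records below are invented for illustration.

```python
from collections import defaultdict

# Hypothetical per-aspect sentiment labels already extracted from reviews.
records = [
    {"aspect": "delivery", "sentiment": "positive"},
    {"aspect": "delivery", "sentiment": "positive"},
    {"aspect": "price",    "sentiment": "negative"},
    {"aspect": "price",    "sentiment": "negative"},
    {"aspect": "price",    "sentiment": "positive"},
]

# Count sentiment labels per aspect.
counts = defaultdict(lambda: defaultdict(int))
for r in records:
    counts[r["aspect"]][r["sentiment"]] += 1

# Report the positive share per aspect instead of one overall number.
for aspect, c in counts.items():
    total = sum(c.values())
    share = c["positive"] / total
    print(f"{aspect}: {share:.0%} positive ({total} mentions)")
```

A dashboard built on this breakdown would flag "price" immediately, where a single overall percentage would not.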