Naive Bayes Example

2 minute read

Naive Bayes

⭐️Simple, fast, and highly effective probabilistic machine learning classifier based on Bayes’ theorem.

\[\log(P(y| W)) \propto \sum_{i=1}^d \log(P(w_i|y)) + \log(P(y))\]

\[P(w_{i}|y)=\frac{count(w_{i},y)+\alpha }{count(y)+\alpha \cdot |V|}\]

Email Classification Problem

Let’s solve an email classification problem, below we have list of emails (tokenized) and labelled as Spam/Not Spam for training.

Training Phase

🏛️Class Priors:

📕 Vocabulary = { Free, Money, Inside, Scan, Win, Cash, Click, Link, Catch, Up Today, Noon, Project, Meeting }

🧮 Class count:

✅ Laplace smoothing: \(\alpha = 1\)

Likelihood of 'free'

\[P(w_{i}|y)=\frac{count(w_{i},y)+\alpha }{count(y)+\alpha \cdot |V|}\]

Inference Time

👉Say a new email 📧 arrives - “Free money today”; lets classify it as Spam/Not Spam.

Spam:

Not Spam:

Final Score 🏏

\[\log(P(y| W)) \propto \sum_{i=1}^d \log(P(w_i|y)) + \log(P(y))\]

Score(Spam) = log(P(Spam)) + log(P(‘free’|S)) + log(P(‘money’|S)) + log(P(‘today’|S))
= log(0.5) + log(0.13) + log(0.087) + log(0.043) = -0.301 -0.886 -1.06 -1.366 = -3.614
Score(Not Spam) = log(P(Not Spam)) + log(P(‘free’|NS)) + log(P(‘money’|NS)) + log(P(‘today’|NS))
= log(0.5) + log(0.047) + log(0.047) + log(0.095) = -0.301 -1.328 -1.328 -1.022 = -3.979

👉Since, Score(Spam) (-3.614 )> Score(Not Spam) (-3.979) , the model chooses ‘Spam’ as the label for the email.

End of Section