Confidence Interval

This section introduces the confidence interval: what it is, how to compute it, and how to interpret it.


📘

Confidence Interval:
A range of values, computed from a sample, that is likely to contain the true population mean.
Instead of a single point estimate, it reports a range of values together with a confidence level, such as 95%.

For a normally distributed population with a known standard deviation, the confidence interval is:

\[ CI = \bar{X} \pm Z\frac{\sigma}{\sqrt{n}} \]

\(\bar{X}\): Sample mean
\(Z\): Z-score corresponding to confidence level
\(n\): Sample size
\( \sigma \): Population Standard Deviation
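
The formula above can be sketched in a few lines of Python. This is a minimal illustration, not a definitive implementation: the function name `z_confidence_interval` and the input values (mean 50, \(\sigma\) = 10, n = 100) are hypothetical, and the population standard deviation is assumed to be known.

```python
from math import sqrt
from statistics import NormalDist

def z_confidence_interval(sample_mean, sigma, n, confidence=0.95):
    """Return (lower, upper) for the mean, assuming a known population sigma."""
    # Two-sided critical value Z, about 1.96 for a 95% confidence level
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sigma / sqrt(n)
    return sample_mean - margin, sample_mean + margin

# Hypothetical inputs, chosen only to illustrate the call
lower, upper = z_confidence_interval(sample_mean=50.0, sigma=10.0, n=100)
print(f"95% CI: ({lower:.2f}, {upper:.2f})")  # 95% CI: (48.04, 51.96)
```

Raising the confidence level widens the interval, while a larger sample size narrows it, because the margin scales with \(Z\) and with \(1/\sqrt{n}\).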

Applications:

  • A/B testing, i.e., comparing two or more versions of a product.
  • ML model performance evaluation, i.e., instead of reporting a single performance score of, say, 85%,
    it is better to report a 95% confidence interval, such as [82.5%, 87.8%].
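
As a rough sketch of the model-evaluation case, one common approach (an assumption here, not a method the text prescribes) is the normal approximation for a proportion: each prediction is treated as correct or incorrect, so \(\sigma\) is approximated by \(\sqrt{p(1-p)}\), where \(p\) is the observed accuracy. The function name `accuracy_interval` and the counts (850 correct out of 1000) are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def accuracy_interval(correct, total, confidence=0.95):
    """Normal-approximation CI for accuracy (an assumed, simplified method)."""
    p = correct / total  # observed accuracy
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    # sigma / sqrt(n) with sigma approximated by sqrt(p * (1 - p))
    margin = z * sqrt(p * (1 - p) / total)
    # Clamp to [0, 1] because accuracy is a proportion
    return max(0.0, p - margin), min(1.0, p + margin)

# Hypothetical test-set result: 850 correct predictions out of 1000
low, high = accuracy_interval(correct=850, total=1000)
print(f"Accuracy 85.0%, 95% CI: [{low:.1%}, {high:.1%}]")
```

This approximation is reasonable for large test sets and accuracies that are not close to 0% or 100%; other methods, such as bootstrapping, can give asymmetric intervals.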

For example:
Suppose we want to measure the average weight of a certain species of dog.
We want to estimate the true population mean \(\mu\) using a confidence interval.
Note: The true average weight is 30 kg, but this is NOT known to us.

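| Sample Number | Sample Mean | 95% Confidence Interval | Did it capture \(\mu\)? |
|---|---|---|---|
| 1 | 29.8 kg | (28.5, 31.1) | Yes |
| 2 | 30.4 kg | (29.1, 31.7) | Yes |
| 3 | 31.5 kg | (30.2, 32.8) | No |
| 4 | 28.1 kg | (26.7, 29.3) | No |
| ... | ... | ... | ... |
| 100 | 29.9 kg | (28.6, 31.2) | Yes |
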
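  • We generated 100 confidence intervals (CIs), each based on a different sample.
  • A 95% CI means that, in the long run, about 95 out of every 100 such CIs will include the true average weight, i.e., \(\mu = 30\) kg, and about 5 out of 100 will miss it. This holds on average over many repetitions; any particular batch of 100 intervals may contain slightly more or fewer misses.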
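
The long-run interpretation can be checked with a small simulation. The sketch below is illustrative only: the true mean of 30 kg comes from the example, while \(\sigma\) = 4 kg, n = 36, the number of trials, and the function name `coverage` are assumptions picked for the demonstration.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean

def coverage(mu=30.0, sigma=4.0, n=36, trials=10_000, confidence=0.95, seed=0):
    """Fraction of simulated CIs that contain the true mean mu."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sigma / sqrt(n)  # same width for every sample (sigma is known)
    hits = 0
    for _ in range(trials):
        # Draw one sample of n weights and build its interval
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        mean = fmean(sample)
        if mean - margin <= mu <= mean + margin:
            hits += 1
    return hits / trials

rate = coverage()
print(f"Fraction of intervals that captured mu: {rate:.3f}")
```

The printed fraction should land close to 0.95, but not exactly on it, which is why the 95% figure is a long-run average rather than a promise about any specific set of intervals.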

💡

Which company offers the better salary?
Below are the salary details from a survey of 50 employees at each company (lpa = lakhs per annum).

| Company | Average Salary (INR) | Standard Deviation |
|---|---|---|
| A | 36 lpa | 7 lpa |
| B | 40 lpa | 14 lpa |

For comparison, let’s calculate the 95% confidence interval for the average salaries of both companies A and B.
We know that:
\( CI = \bar{X} \pm Z\frac{\sigma}{\sqrt{n}} \)
Margin of Error (MoE) \( = Z\frac{\sigma}{\sqrt{n}} \)
Z-score for a 95% CI = 1.96, and \(n = 50\); the standard deviation from the survey is used in place of \(\sigma\).

\(MoE_A = 1.96 \times \frac{7}{\sqrt{50}} \approx 1.94 \)
=> 95% CI for A = \(36 \pm 1.94 \) = [34.06, 37.94]

\(MoE_B = 1.96 \times \frac{14}{\sqrt{50}} \approx 3.88\)
=> 95% CI for B = \(40 \pm 3.88 \) = [36.12, 43.88]

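At first glance, company B's average salary looks clearly better.
However, the 95% CIs of the two companies overlap over the range [36.12, 37.94],
so the survey alone does not clearly establish that B's true average salary is higher than A's.
B's interval is also twice as wide, reflecting the larger spread in its salaries.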
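
The arithmetic above can be reproduced with a short script. This is a sketch under the same assumptions as the worked example (Z = 1.96, n = 50 per company, survey standard deviation used as \(\sigma\)); the helper name `ci` is hypothetical.

```python
from math import sqrt

Z_95 = 1.96  # Z-score for a 95% confidence level
N = 50       # employees surveyed per company

def ci(mean, sd, n=N, z=Z_95):
    """Return (lower, upper) as mean +/- z * sd / sqrt(n)."""
    moe = z * sd / sqrt(n)
    return mean - moe, mean + moe

ci_a = ci(36, 7)   # company A, values in lpa
ci_b = ci(40, 14)  # company B, values in lpa

# The intervals overlap if the larger lower bound is below the smaller upper bound
overlap_low = max(ci_a[0], ci_b[0])
overlap_high = min(ci_a[1], ci_b[1])

print(f"A: [{ci_a[0]:.2f}, {ci_a[1]:.2f}]")  # A: [34.06, 37.94]
print(f"B: [{ci_b[0]:.2f}, {ci_b[1]:.2f}]")  # B: [36.12, 43.88]
if overlap_low < overlap_high:
    print(f"Overlap: [{overlap_low:.2f}, {overlap_high:.2f}]")  # Overlap: [36.12, 37.94]
```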



