Confidence Interval
3 minute read
In this section, we will understand about Confidence Interval.
Confidence Interval:
It is a range of values that is likely to contain the true population mean, based on a sample.
Instead of giving a point estimate, it gives a range of values with confidence level.
For normal distribution, confidence interval :
\(\bar{X}\): Sample mean
\(Z\): Z-score corresponding to confidence level
\(n\): Sample size
\( \sigma \): Population Standard Deviation
Applications:
- A/B testing, i.e., compare 2 or more versions of a product.
- ML model performance evaluation, i.e, instead of giving a single performance score of say 85%,
it is better to provide a 95% confidence interval, such as, [82.5%, 87.8%].
95% confidence interval does NOT mean there is a 95% chance that the true mean lies in the specific calculated interval.
- It just means that if we repeat the sampling process many times, then 95% of of those calculated intervals will capture or contain the true population mean \(\mu\).
- Also, we cannot say there is 95% probability that the true mean is within that specific range because true population mean is a fixed constant, NOT a random variable.
For example:
Let’s suppose we want to measure the average weight of a certain species of dog.
We want to estimate the true population
mean \(\mu\) using confidence interval.
Note: True average weight = 30 kg, but this is NOT known to us.
| Sample Number | Sample Mean | 95% Confidence Interval | Did it capture \(\mu\) ? |
|---|---|---|---|
| 1 | 29.8 kg | (28.5, 31.1) | Yes |
| 2 | 30.4 kg | (29.1, 31.7) | Yes |
| 3 | 31.5 kg | (30.2, 32.8) | No |
| 4 | 28.1 kg | (26.7, 29.3) | No |
| - | - | - | - |
| - | - | - | - |
| - | - | - | - |
| 100 | 29.9 kg | (28.6, 31.2) | Yes |
- We generated 100 confidence intervals(CI) each based on different samples.
- 95% CI guarantees that, in long run, 95 out of 100 CIs will include the true average weight, i.e, \(\mu=30kg\),
and may be will miss 5 out of 100 times.
Suggest which company is offering a better salary?
Below is the details of the salaries based on a survey of 50 employees.
| Company | Average Salary(INR) | Standard Deviation |
|---|---|---|
| A | 36 lpa | 7 lpa |
| B | 40 lpa | 14 lpa |
For comparison, let’s calculate the 95% confidence interval for the average salaries of both companies A and B.
We know that:
\( CI = \bar{X} \pm Z\frac{\sigma}{\sqrt{n}} \)
Margin of Error(MoE) \( = Z\frac{\sigma}{\sqrt{n}} \)
Z-Score for 95% CI = 1.96
\(MoE_A = 1.96*\frac{7}{\sqrt{50}} \approx 1.94 \)
=> 95% CI for A = \(36 \pm 1.94 \) = [34.06, 37.94]
\(MoE_B = 1.96*\frac{14}{\sqrt{50}} \approx 3.88\)
=> 95% CI for B = \(40 \pm 3.88 \) = [36.12, 43.88]
We can see that initially company B’s salary looked obviously better,
but after calculating the 95% CI, we can see that there is a significant overlap in the salaries of two companies,
i.e [36.12, 37.94].
End of Section