Z-Test

7 minute read

Statistics for AI & ML | Full Course Videos

Z-Test

It is a statistical test used to determine whether there is a significant difference between mean of 2 groups or sample and population mean.

It is a parametric test, since it assumes data to be normally distributed.
Appropriate when:
- sample size \(n \ge 30\).
- population standard deviation \(\sigma \) is known.
It is based on Gaussian/normal distribution.
It compares the difference between means relative to standard error, i.e, standard deviation of sampling distribution of sample mean.

There are 2 types of Z-Test:

1-Sample Z-Test: Used to compare the mean of a sample mean with a population mean.
2-Sample Z-Test: Used to compare the sample means of 2 independent samples.

Z-Score

It is a standardized score that measures how many standard deviations a particular data point is away from the population mean \(\mu\).

Transform a normal distribution \(\mathcal{N}(\mu, \sigma^2)\) to a standard normal distribution \(Z \sim \mathcal{N}(0, 1)\).
Standardized score helps us compare values from different normal distributions.

Z-score is calculated as:

\[Z = \frac{x - \mu}{\sigma}\]

x: data point
\(\mu\): population mean
\(\sigma\): population standard deviation

e.g:

Z-score of 1.5 => data point is 1.5 standard deviations above the mean.
Z-score of -2.0 => data point is 2.0 standard deviations below the mean.

Z-score helps to define probability areas:

68% of the data points fall within \(\pm 1 \sigma\).
95% of the data points fall within \(\pm 2 \sigma\).
99.7% of the data points fall within \(\pm 3 \sigma\).

Note:

Z-Test applies the concept of Z-score to sample mean rather than a single data point.

Z-Score | Gaussian Distribution | Hypothesis Testing | Explained with Example

1-Sample Z-Test

It is used to test whether the sample mean \(\bar{x}\) is significantly different from a known population mean \(\mu\).

Test Statistic:

\[ Z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}} \]

\(\bar{x}\): sample mean
\(\mu\): hypothesized population mean
\(\sigma\): population standard deviation
\(n\): sample size
\(\sigma / \sqrt{n}\): standard error of mean

Read more about Standard Error

Note: Test statistic Z follows a standard normal distribution \(Z \sim \mathcal{N}(0, 1)\).

2-Sample Z-Test

It is used to test whether the sample means \(\bar{x_1}\) and \(\bar{x_2}\) of 2 independent samples are significantly different from each other.
Test Statistic:

\[ Z = \frac{\bar{x_1} - \bar{x_2}}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}} \]

\(\bar{x_1}\): sample mean of first sample
\(\bar{x_2}\): sample mean of second sample
\(\sigma_1\): population standard deviation of first sample
\(\sigma_2\): population standard deviation of second sample
\(n_1\): sample size of first sample
\(n_2\): sample size of second sample

Note: Test statistic Z follows a standard normal distribution \(Z \sim \mathcal{N}(0, 1)\).

Average time to run a ML model is 120 seconds with a known standard deviation of 15 seconds.
After applying a new optimisation, and n=100 runs, yields a sample mean of 117 seconds.
Does the optimisation significantly reduce the runtime of the model?
Consider the significance level of 5%.

Null Hypothesis: \(\mu = 120\) seconds, i.e, no change.
Alternative Hypothesis: \(\mu < 120\) seconds => left tailed test.

We will use 1-sample Z-Test to test the hypothesis.
Test Statistic:

\[ t_{obs} = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}} \\[10pt] = \frac{117 - 120}{\frac{15}{\sqrt{100}}} \\[10pt] = \frac{-3}{\frac{15}{10}} = \frac{-30}{15}\\[10pt] => t_{obs} = -2 \]

Since, significance level \(\alpha\) = 5% =0.05.
Critical value \(Z_{0.05}\) = -1.645
Important: Find the value of \(Z_{\alpha}\) in Z-Score Table

images/maths/statistics/one_sample_z_test.png

Our \(t_{obs}\) is much more extreme than the the critical value \(Z_{0.05}\), => p-value < 5%.
Hence, we reject the null hypothesis.
Therefore, there is a statistically significant evidence that the new optimisation reduces the runtime of the model.

Z Test | Gaussian Distribution | Standard Error | Hypothesis Testing | Explained with Example

Z Test of Proportion

It is a statistical hypothesis test used to determine if there is a significant difference between the proportion of a characteristic in two independent samples or to compare a sample proportion to a known population value

It is used when dealing with categorical data, such as, success/failure, male/female, yes/no etc.

It is of 2 types:

1-Sample Z-Test of Proportion: Used to test whether the observed proportion in a sample differs from hypothesized proportion.
2-Sample Z-Test of Proportion: Used to compare whether the 2 independent samples differ in their proportions.

The categorical data,i.e success/failure, is discrete that can be modeled as Bernoulli distribution.
Let’s understand how this Bernoulli random variable can be approximated as a Gaussian distribution for a very large sample size, using Central Limit Theorem.
Read more about Central Limit Theorem

Note:We will not prove the complete thing, but we will understand the concept in enough depth for clarity.

Sampling Distribution of a Proportion

\(Y \sim Bernoulli(p)\)
\(X \sim Binomial(n,p)\)
E[X] = mean = np
Var[X] = variance = np(1-p)
X = total number of successes
p = true probability of success
n = number of trials
Proportion of Success in sample = Sample Proportion = \(\hat{p} = \frac{X}{n}\)
e.g.: If n=100 people were surveyed, and 40 said yes, then \(\hat{p} = \frac{40}{100} = 0.4\)

\[ E[\hat{p}] = \frac{1}{n} E[X] = \frac{np}{n} = p \\[10pt] Var[\hat{p}] = Var[\frac{X}{n}] = \frac{Var[X]}{n^2} = \frac{np(1-p)}{n^2} =\frac{p(1-p)}{n} \\[10pt] \]

By Central Limit Theorem, we can state that for very large ’n’ Binomial distribution’s mean and variance can be used as an approximation for Gaussian/Normal distribution:

\[ X \sim Binomial(n,p) \xrightarrow{n \rightarrow \infty} X \approx N(np, np(1-p)) \\[10pt] \]

Since, \(\hat{p} = \frac{X}{n}\)
We can say that:

\[ \hat{p} \xrightarrow{n \rightarrow \infty} \approx N[p, \frac{p(1-p)}{n}] \\[10pt] \]

Mean = \(\mu_{\hat{p}} = p\) = True proportion of success in the entire population
Standard Error = \(SE_{\hat{p}} = \sqrt{Var[\frac{X}{n}]} = \sqrt{\frac{p(1-p)}{n}}\) = Standard Deviation of the sample proportion

Note: Large Sample Condition - Approximation is only valid when the expected number of successes and failures are both > 10 (sometimes 5).
\(np \ge 10 ~and~ n(1-p) \ge 10\)

1-Sample Z-Test of Proportion

It is used to test whether the observed proportion in a sample differs from hypothesized proportion.
\(\hat{p} = \frac{X}{n}\): Proportion of success observed in a sample
\(p_0\): Specific propotion value under the null hypothesis
\(SE_0\): Standard error of sample proportion under the null hypothesis
Z-Statistic: Measures how many standard errors is the observed sample proportion \(\hat{p}\) away from \(p_0\)
Test Statistic:

\[ Z = \frac{\hat{p} - p_0}{SE_0} = \frac{\hat{p} - p_0}{\sqrt{\frac{p_0(1-p_0)}{n}}} \]

2-Sample Z-Test of Proportion

It is used to compare whether the 2 independent samples differ in their proportions.

Standard test used in A/B testing.

\[ \bar{p} = \frac{total ~ successes}{total ~ sample ~ size} = \frac{x_1+x_2}{n_1+n_2} = \frac{n_1\hat{p_1}+n_2\hat{p_2}}{n_1+n_2} \\[10pt] Standard ~ Error_{\hat{p_1}-\hat{p_2}} = \sqrt{\bar{p}(1-\bar{p})(\frac{1}{n_1} +\frac{1}{n_2})} \\[10pt] Z = \frac{\hat{p_1}-\hat{p_2}}{SE_{\hat{p_1}-\hat{p_2}}} \\[10pt] => Z = \frac{\hat{p_1}-\hat{p_2}}{\sqrt{\bar{p}(1-\bar{p})(\frac{1}{n_1} +\frac{1}{n_2})}} \]

A company wants to compare its 2 different website designs A & B.
Below is the table that shows the data:

Design	# of visitors(n)	# of signups(x)	conversion rate(\(\hat{p} = \frac{x}{n}\))
A	1000	80	0.08
B	1200	114	0.095

Is the design B better, i.e, design B increases conversion rate or proportion of visitors who sign up?
Consider the significance level of 5%.

Null Hypothesis: \(\hat{p_A} = \hat{p_B}\), i.e, no difference in conversion rates of 2 designs A & B.
Alternative Hypothesis: \(\hat{p_B} > \hat{p_A}\) i.e conversion rate of B > A => right tailed test.

Check large sample condition for both samples A & B.
\(n\hat{p_A} = 80 > 10 ~and~ n(1-\hat{p_A}) = 920 > 10\)
Similarly, we can show for B too.

Pooled proportion:

\[ \bar{p} = \frac{x_A+x_B}{n_A+n_B} = \frac{80+114}{1000+1200} = \frac{194}{2200} \\[10pt] => \bar{p}\approx 0.0882 \]

Standard Error(Pooled):

\[ SE=\sqrt{\bar{p}(1-\bar{p})(\frac{1}{n_1} +\frac{1}{n_2})} \\[10pt] = \sqrt{0.0882(1-0.0882)(\frac{1}{1000} +\frac{1}{1200})} \\[10pt] => SE \approx 0.0123 \]

Test Statistic(Z):

\[ t_{obs} = \frac{\hat{p_B}-\hat{p_A}}{SE_{\hat{p_A}-\hat{p_B}}} \\[10pt] = \frac{0.095-0.0882}{0.0123} \\[10pt] => t_{obs} \approx 1.22 \]

Significance level \(\alpha\) = 5% =0.05.
Critical value \(Z_{0.05}\) = 1.645

images/maths/statistics/two_sample_z_test_proportion.png

Since, \(t_{obs} < Z_{0.05}\) => p-value > 5%.
Hence, we fail to reject the null hypothesis.
Therefore, the observed conversion rate of design B is due to random chance; thus, B is not a better design.

Z test of Proportion | Gaussian Distribution | Hypothesis Testing | A/B Testing | Example

Previous: T-Test Next: Chi-Square Test

End of Section