Probability Density Function

Probability Density Function of a Continuous Random Variable

In this section, we will understand the Probability Density Function of a Continuous Random Variable.


📘

Probability Density Function (PDF):
This is a function used for continuous random variables to describe the likelihood of the variable taking a value within a specific range or interval.
Since the probability of a continuous random variable taking any single exact value is zero, we find the probability over a given range instead.
Note: Called ‘density’ because probability is spread continuously over a range of values rather than being concentrated at a single point as in a PMF.
e.g. Uniform, Gaussian, Exponential, etc.

Note: The PDF is a function \(f(x)\) defined for a continuous random variable.
It is the derivative of the Cumulative Distribution Function (CDF) \(F_X(x)\):


\(PDF = f(x) = F_X'(x) = \frac{dF_X(x)}{dx} \)

For example:
Consider a line segment/interval \(\Omega = [0,2] \)
Random variable \(X(\omega) = \omega\),
i.e. \(X(1) = 1 ~and~ X(1.1) = 1.1 \)

$$ F_X(x) = P(X \leq x) = \begin{cases} \frac{x}{2} & \text{if } x \in [0,2] \\ 1 & \text{if } x > 2 \\ 0 & \text{if } x < 0 \end{cases} $$


$$ \begin{aligned} PDF = f_X(x) = \frac{dF_X(x)}{dx} \\ \end{aligned} $$

$$ \text{PDF } = f_X(x) = \begin{cases} \dfrac{1}{2}, & x \in [0,2] \\ 0, & \text{otherwise.} \end{cases} $$


Note: If we know the PDF of a continuous random variable, then we can find the probability of any given region/interval by calculating the area under the curve.
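This note can be checked numerically. The sketch below (plain Python, no external libraries) integrates the example PDF \(f(x) = 1/2\) on \([0,2]\) with a midpoint Riemann sum and compares the area against the CDF difference \(F(1.5) - F(0.5) = 0.5\).

```python
import math

# PDF from the example above: f(x) = 1/2 on [0, 2], 0 elsewhere
def pdf(x):
    return 0.5 if 0 <= x <= 2 else 0.0

def prob(a, b, n=100_000):
    """Approximate P(a <= X <= b) as the area under the PDF (midpoint Riemann sum)."""
    width = (b - a) / n
    return sum(pdf(a + (i + 0.5) * width) for i in range(n)) * width

# Area under f on [0.5, 1.5] should match F(1.5) - F(0.5) = 0.75 - 0.25 = 0.5
print(round(prob(0.5, 1.5), 4))  # → 0.5
```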


📘

Uniform Distribution:
All the outcomes within the given range are equally likely to occur.
Also known as ‘fair’ distribution.
Note: This is a natural starting point to understand randomness in general.

\[ X \sim U(a,b) \]

$$ \begin{aligned} PDF = f(x) = \begin{cases} \frac{1}{b-a} & \text{if } x \in [a,b] \\ 0 & \text{~otherwise } \end{cases} \end{aligned} $$


Mean = Median = \( \frac{a+b}{2} \)
Variance = \( \frac{(b-a)^2}{12} \)

Standard uniform distribution: \( X \sim U(0,1) \)

$$ \begin{aligned} PDF = f(x) = \begin{cases} 1 & \text{if } x \in [0,1] \\ 0 & \text{~otherwise } \end{cases} \end{aligned} $$

PDF in terms of mean (\(\mu\)) and standard deviation (\(\sigma\)) -

$$ \begin{aligned} PDF = f(x) = \begin{cases} \frac{1}{2\sigma\sqrt{3}} & \text{if } \mu -\sigma\sqrt{3} \le x \le \mu + \sigma\sqrt{3}\\ \\ 0 & \text{~otherwise } \end{cases} \end{aligned} $$
For example:

  • Random number generator that generates a random number between 0 and 1.
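As a quick illustration of this example, the sketch below draws samples from the standard uniform distribution with Python's built-in `random` module and checks the sample mean and variance against the formulas above, \((a+b)/2\) and \((b-a)^2/12\).

```python
import random

random.seed(0)
a, b = 0.0, 1.0  # standard uniform U(0, 1)
samples = [random.uniform(a, b) for _ in range(100_000)]

mean = sum(samples) / len(samples)
var = sum((x - mean) ** 2 for x in samples) / len(samples)

print(round(mean, 3))  # theory: (a + b) / 2 = 0.5
print(round(var, 3))   # theory: (b - a)^2 / 12 ≈ 0.083
```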


📘

Gaussian(Normal) Distribution:
It is a continuous probability distribution characterized by a symmetrical, bell-shaped curve: most of the data cluster around the central average, and the frequency of values decreases as they move away from the center.

  • Most outcomes are average; extremely low or extremely high values are rare.
  • Characterised by mean and standard deviation/variance.
  • Peak at mean = median, symmetric around mean.
    Note: Most important and widely used distribution.

    \[ X \sim N(\mu, \sigma^2) \] $$ \begin{aligned} PDF = f(x) = \dfrac{1}{\sqrt{2\pi}\sigma}e^{-\dfrac{(x-\mu)^2}{2\sigma^2}} \\ \end{aligned} $$
    Mean = \(\mu\)
    Variance = \(\sigma^2\)

Standard normal distribution:

\[ Z \sim N(0,1) ~i.e~ \mu = 0, \sigma^2 = 1 \]


Any normal distribution can be standardized using Z-score transformation:

$$ \begin{aligned} Z = \dfrac{X-\mu}{\sigma} \end{aligned} $$

For example:

  • Human height, IQ scores, blood-pressure etc.
  • Measurement of errors in scientific experiments.
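To illustrate the Z-score transformation, the sketch below standardizes a small hypothetical sample of heights (the numbers are illustrative, not real data). Standardized data always has mean 0 and variance 1.

```python
import math

# Hypothetical sample of heights in cm (illustrative values, not real data)
heights = [162.0, 158.5, 171.2, 175.0, 166.3, 169.8, 180.1, 173.4]

mu = sum(heights) / len(heights)
sigma = math.sqrt(sum((h - mu) ** 2 for h in heights) / len(heights))

# Z-score transformation: z = (x - mu) / sigma
z_scores = [(h - mu) / sigma for h in heights]

# After standardization the mean is 0 and the variance is 1
z_mean = sum(z_scores) / len(z_scores)
z_var = sum((z - z_mean) ** 2 for z in z_scores) / len(z_scores)
print(round(abs(z_mean), 6), round(z_var, 6))
```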


📘

Exponential Distribution:
It is used to model the amount of time until a specific event occurs.
Given that:

  1. Events occur with a known constant average rate.
  2. Occurrence of an event is independent of the time since the last event.

    Parameters:
    Rate parameter: \(\lambda\): Average number of events per unit time
    Scale parameter: \(\mu ~or~ \beta \): Mean time between events
    Mean = \(\frac{1}{\lambda}\)
    Variance = \(\frac{1}{\lambda^2}\)
    \[ \lambda = \dfrac{1}{\beta} = \dfrac{1}{\mu}\]
$$ \begin{aligned} PDF = f(x) = \lambda e^{-\lambda x} ~~~ \forall ~~~ x \ge 0 ~~~\&~~~ \lambda > 0 \\ CDF = F(x) = 1 - e^{-\lambda x} ~~~ \forall ~~~ x \ge 0 ~~~\&~~~ \lambda > 0 \end{aligned} $$


💡 At a bank, a teller spends 4 minutes, on average, with every customer. What is the probability that a randomly selected customer will be served in less than 3 minutes?

Mean time to serve 1 customer = \(\mu\) = 4 minutes
So, \(\lambda\) = average number of customers served per unit time = \(1/\mu\) = 1/4 = 0.25 per minute
The probability of serving a customer in less than 3 minutes can be found using the CDF -

\[ F(x) = P(X \le x) = 1 - e^{-\lambda x}\]


$$ \begin{aligned} P(X \leq 3) &= 1 - e^{-0.25 \times 3} \\ &= 1 - e^{-0.75} \\ &= 1 - 0.4724 \\ &\approx 0.5276 \\ &\approx 53\% \end{aligned} $$

So, the probability of a customer being served in less than 3 minutes is approximately 53%.
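The same calculation as a minimal Python sketch, using only the `math` module:

```python
import math

mu = 4        # mean service time in minutes
lam = 1 / mu  # rate: 0.25 customers per minute

def exp_cdf(x, lam):
    """P(X <= x) for an exponential random variable with rate lam."""
    return 1 - math.exp(-lam * x) if x >= 0 else 0.0

p = exp_cdf(3, lam)  # probability of being served in under 3 minutes
print(round(p, 4))   # → 0.5276
```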


💡 At a bank, a teller spends 4 minutes, on average, with every customer. What is the probability that a randomly selected customer will be served in more than 2 minutes?

\[ CDF = F(x) = P(X \le x) = 1 - e^{-\lambda x} \\ \text{Total probability} = P(X \le x) + P(X > x) = 1\\ \Rightarrow 1 - e^{-\lambda x} + P(X > x) = 1 \\ \Rightarrow P(X > x) = e^{-\lambda x} \]


In this case, x = 2 minutes and \(\lambda\) = 0.25 per minute, so

\[ P(X > 2) = e^{-\lambda x} \\ = e^{-0.25 \times 2} \\ = e^{-0.5} \\ = 0.6065 \\ \approx 60.65\% \]


So, the probability of a customer being served in more than 2 minutes is approximately 60.65%.
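The survival-probability calculation can likewise be sketched in Python:

```python
import math

lam = 0.25  # customers per minute (mean service time 4 minutes)

def exp_survival(x, lam):
    """P(X > x) = e^(-lam * x), the complement of the exponential CDF."""
    return math.exp(-lam * x)

p = exp_survival(2, lam)  # probability the service takes more than 2 minutes
print(round(p, 4))        # → 0.6065
```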


💡 Suppose we know that an electronic item has lasted for more than \(t\) days. What is the probability that it will last for an additional \(s\) days?

Task: We want to find the probability that the electronic item lasts for more than \(t+s \) days,
given that it has already lasted for more than \(t\) days.
This is a ‘conditional probability’.
Since,

\[ P(A \mid B) = \dfrac{P(A \cap B)}{P(B)} \]


We want to find: \( P(X > t+s \mid X > t) = ~? \)

\[ P(X > t+s \mid X > t) = \dfrac{P(X > t+s ~and~ X > t)}{P(X > t)} \\ \text{Since } t+s > t, \text{ the event } X > t+s \text{ implies } X > t \\ = \dfrac{P(X > t+s)}{P(X > t)} \\ = \dfrac{e^{-\lambda(t+s)}}{e^{-\lambda t}} \\ = e^{-\lambda t - \lambda s + \lambda t} \\ = e^{-\lambda s} \\ \Rightarrow \text{independent of the elapsed time } t \]

Hence, the probability that the electronic item survives for an additional \(s\) days
is independent of the time \(t\) it has already survived. This is the ‘memoryless’ property of the exponential distribution.
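The memoryless property can also be verified numerically. The sketch below uses an illustrative, assumed rate \(\lambda = 0.1\) per day and shows that \(P(X > t+s \mid X > t)\) equals \(P(X > s)\) for several values of \(t\).

```python
import math

lam = 0.1  # assumed failure rate per day (illustrative, not from the problem)

def survival(x):
    """P(X > x) for an exponential lifetime with rate lam."""
    return math.exp(-lam * x)

# Memorylessness: P(X > t + s | X > t) = P(X > s), whatever t is
s = 5
for t in (0, 10, 100):
    conditional = survival(t + s) / survival(t)
    print(t, round(conditional, 6), round(survival(s), 6))
```

Each printed line shows the same conditional probability, regardless of the elapsed time t.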


Let’s see where the survival probability \( P(X>t) = e^{-\lambda t} \) used above comes from.

Poisson Case:
The probability of observing exactly ‘k’ events in a time interval of length ’t’
with an average rate of \( \lambda \) events per unit time is given by -

\[ PMF = P(X=k) = \dfrac{(\lambda t)^k e^{-\lambda t}}{k!} \]


The event that the waiting time until the next event exceeds ‘t’ is the same as observing ‘0’ events in that interval.
=> We can use the PMF of the Poisson distribution with k=0.

\[ PMF = P(X=k=0) = \dfrac{(\lambda t)^0 e^{-\lambda t}}{0!} = e^{-\lambda t} \]


Exponential Case:
Now, let’s consider the exponential distribution, which models the waiting time ‘T’ until the next event.
The event that the waiting time ‘T’ > t is the same as observing ‘0’ events in that interval.

\[ P(T>t) = 1 - F(t) = e^{-\lambda t} \]

Observation:
The exponential survival probability \( P(T>t) = e^{-\lambda t} \) is a direct consequence of the Poisson probability of ‘0’ events.
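This identity can be confirmed directly. With an illustrative rate and interval length (assumed values, not from any example above), the Poisson probability of 0 events and the exponential survival probability are the same number.

```python
import math

lam = 2.0  # events per unit time (illustrative)
t = 1.5    # interval length

# Poisson: probability of exactly 0 events in an interval of length t
poisson_zero = (lam * t) ** 0 * math.exp(-lam * t) / math.factorial(0)

# Exponential: probability the waiting time until the next event exceeds t
exp_survival = math.exp(-lam * t)

print(poisson_zero == exp_survival)  # → True: both equal e^(-lam*t)
```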


💡 Consider a machine that fails, on average, every 20 hours. What is the probability of having NO failures in the next 10 hours?

Using Poisson:
Average failure rate, \(\lambda\) = 1/20 = 0.05 failures per hour
Time interval, t = 10 hours
Number of events, k = 0
Average number of events in interval, (\(\lambda t\)) = (1/20) * 10 = 0.5
Probability of having NO failures in the next 10 hours = ?

\[ P(X=0) = \dfrac{(\lambda t)^0 e^{-\lambda t}}{0!} = e^{-\lambda t} \\ = e^{-0.5} \approx 0.6065 \\ \approx 60.65\%\]


Using Exponential:
What is the probability that wait time until next failure > 10 hours?
Waiting time, T > t = 10 hours.
Average number of failures per hour, \(\lambda\) = 1/20 per hour

\[ P(T>t) = 1 - F(t) = e^{-\lambda t} \\ P(T>10) = e^{-(1/20) \times 10} \\ = e^{-0.5} \approx 0.6065 \\ \approx 60.65\%\]


Therefore, we have seen that this problem can be solved using both the Poisson and the Exponential distributions.
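A small sketch confirming that both routes give the same number for this problem:

```python
import math

lam = 1 / 20  # failures per hour (mean time between failures: 20 hours)
t = 10        # hours

# Poisson view: probability of k = 0 failures in t hours
p_poisson = (lam * t) ** 0 * math.exp(-lam * t) / math.factorial(0)

# Exponential view: probability the waiting time until the next failure exceeds t
p_exponential = math.exp(-lam * t)

print(round(p_poisson, 4), round(p_exponential, 4))  # → 0.6065 0.6065
```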




End of Section