Probability distributions in R for BSTA 511/611

Author

Meike Niederhausen

Published

October 19, 2024

1 Binomial distribution

1.1 Probability calculations: \(P(X=x)\)

1.1.1 Using formula

Calculate \(P(X=4)\) for a Bin(n=10, p=0.25) distribution.

In general, for a binomial random variable, \[P(X=x) = \binom{n}{x}p^x q^{n-x}\] Thus \(P(X=4)\) for a Bin(n=10, p=0.25) random variable, \[P(X=4) = \binom{10}{4}0.25^4 0.75^{10-4} = \frac{10!}{4!(10-4)!}0.25^4 0.75^{10-4}\] Calculate “directly” in R:

choose(10,4) * 0.25^4 * 0.75^6
[1] 0.145998
# using factorials instead of the choose function:
factorial(10)/(factorial(4)*factorial(6)) * 0.25^4 * 0.75^6
[1] 0.145998

1.1.2 Using dbinom()

Calculate \(P(X=4)\) for a Bin(n=10, p=0.25) distribution.

# P(X = 4) for Bin(n=10, p=0.25) random variable
dbinom(x = 4, size = 10, prob = 0.25) 
[1] 0.145998

Calculate \(P(X=x)\) for all possible values \(x\) for a Bin(n=10, p=0.25) distribution.

# Entire probability distribution:
# P(X = x) for all x=0,1,2,...,10 for Bin(n=10, p=0.25) random variable
dbinom(x = 0:10, size = 10, prob = 0.25) 
 [1] 5.631351e-02 1.877117e-01 2.815676e-01 2.502823e-01 1.459980e-01
 [6] 5.839920e-02 1.622200e-02 3.089905e-03 3.862381e-04 2.861023e-05
[11] 9.536743e-07

In the next section we visualize these probability distribution values.

1.2 Histogram of probability distribution

  • First, create a data frame of the distribution’s possible x values and their respective probabilities:
# Bin(n=10, p=0.25) random variable

binom_probs <- data.frame(x_values = 0:10, 
                          px = dbinom(x = 0:10, size = 10, prob = 0.25)) 
binom_probs
   x_values           px
1         0 5.631351e-02
2         1 1.877117e-01
3         2 2.815676e-01
4         3 2.502823e-01
5         4 1.459980e-01
6         5 5.839920e-02
7         6 1.622200e-02
8         7 3.089905e-03
9         8 3.862381e-04
10        9 2.861023e-05
11       10 9.536743e-07

1.2.1 Histogram

The histogram below is actually created as a bar plot in ggplot:

ggplot(binom_probs, 
       aes(x = x_values, y = px))  + 
  geom_col(fill ="cornflowerblue") +
  labs(x = "x, number of successes",
       y = "P(X=x)") +
  scale_x_continuous(breaks=0:10) +
  labs(title = "Bin(n = 10, p = 0.25) distribution")

1.3 Probability calculations: \(P(X \leq x)\)

\[P(X\leq k) = \sum_{x=0}^{k}\binom{n}{x}p^xq^{n-x}\]

1.3.1 Using formula

Calculate \(P(X \leq 3)\) for a Bin(n=10, p=0.25) distribution:

\[P(X\leq 3) = \sum_{x=0}^{3}\binom{10}{x}0.2.5^x 0.75^{10-x}\] Calculate “directly” in R:

# vector of x values whose probabilities need to be added
x <- 0:3  
# vector of respective binomial prob's of x values
(binom_prob_0_3 <- choose(10,x) * 0.25^x * 0.75^(10-x))
[1] 0.05631351 0.18771172 0.28156757 0.25028229
# add up the probabilities
sum(binom_prob_0_3)
[1] 0.7758751

1.3.2 Using pbinom()

  • \(P(X\leq k)\) = pbinom(q = k, size = n, prob = p, lower.tail = TRUE)

Calculate \(P(X \leq 3)\) for a Bin(n=10, p=0.25) distribution.

# P(X <= 3) for Bin(n=10, p=0.25) random variable
pbinom(q = 3, size = 10, prob = 0.25, lower.tail = TRUE)
[1] 0.7758751
# Note: setting TRUE for the lower.tail option is the default value
# This means that if we do not specify this option, 
# it will give the same result as above:
pbinom(q = 3, size = 10, prob = 0.25) 
[1] 0.7758751

1.4 Probability calculations: \(P(X \geq x)\)

\[P(X\geq k) = \sum_{x=k}^{n}\binom{n}{x}p^xq^{n-x}\]

1.4.1 Using formula

Calculate \(P(X \geq 5)\) for a Bin(n=10, p=0.25) distribution:

\[P(X \geq 5) = \sum_{x=5}^{10}\binom{10}{x}0.2.5^x 0.75^{10-x}\] Calculate “directly” in R:

# vector of x values whose probabilities need to be added
x <- 5:10  
# vector of respective binomial prob's of x values
(binom_prob_5_10 <- choose(10,x) * 0.25^x * 0.75^(10-x))
[1] 5.839920e-02 1.622200e-02 3.089905e-03 3.862381e-04 2.861023e-05
[6] 9.536743e-07
# add up the probabilities
sum(binom_prob_5_10)
[1] 0.07812691

1.4.2 Using pbinom() with lower.tail = TRUE

Calculate \(P(X \geq 5)\) for a Bin(n=10, p=0.25) distribution.

\[P(X \geq 5) = 1 - P(X \leq 4) =1 - \sum_{x=0}^{4}\binom{10}{x}0.2.5^x 0.75^{10-x}\]

# P(X >= 5) for Bin(n=10, p=0.25) random variable
# P(X >= 5) = 1 - P(X <= 4)
1 - pbinom(q = 4, size = 10, prob = 0.25, lower.tail = TRUE)
[1] 0.07812691

1.4.3 Using pbinom() with lower.tail = FALSE

  • \(P(X \geq k) = P(X > k)\) = pbinom(q = k, size = n, prob = p, lower.tail = FALSE)

Calculate \(P(X \geq 5)\) for a Bin(n=10, p=0.25) distribution.

\[P(X \geq 5) = P(X > 4)\]

# P(X >= 5) for Bin(n=10, p=0.25) random variable
# P(X >= 5) = P(X > 4)
pbinom(q = 4, size = 10, prob = 0.25, lower.tail = FALSE)
[1] 0.07812691

2 Normal distribution

2.1 Probability calculations: \(P(X < x)\)

\[ P(X < x) = P\Big(Z < \frac{x-\mu}{\sigma}\Big) \] Calculate \(P(X < 10)\), for \(X \sim N(\mu = 8, \sigma = 2)\).

\[ P(X < 10) = P\Big(Z < \frac{10-8}{2}\Big) = P\Big(Z < 1\Big) \]

# P(X < 10) without using z-scores
pnorm(q = 10, mean = 8, sd = 2, lower.tail = TRUE)
[1] 0.8413447
# P(X < 10) = P(Z < 1) using z-scores
pnorm(q = 1, mean = 0, sd = 1, lower.tail = TRUE)
[1] 0.8413447
# this works too for a standard normal distribution:
pnorm(1)
[1] 0.8413447

2.2 Probability calculations: \(P(X > x)\)

\[ P(X > x) = P\Big(Z > \frac{x-\mu}{\sigma}\Big) = 1-P\Big(Z \leq \frac{x-\mu}{\sigma}\Big) \] Calculate \(P(X > 10)\), for \(X \sim N(\mu = 8, \sigma = 2)\).

\[ P(X > 10) = P\Big(Z > \frac{10-8}{2}\Big) = 1 - P\Big(Z \leq 1\Big) \]

# P(X > 10) without using z-scores
1 - pnorm(q = 10, mean = 8, sd = 2, lower.tail = TRUE)
[1] 0.1586553
# using lower.tail = FALSE:
pnorm(q = 10, mean = 8, sd = 2, lower.tail = FALSE)
[1] 0.1586553
# P(X > 10) = P(Z > 1) using z-scores
1 - pnorm(q = 1, mean = 0, sd = 1, lower.tail = TRUE)
[1] 0.1586553
# using lower.tail = FALSE:
pnorm(q = 1, mean = 0, sd = 1, lower.tail = FALSE)
[1] 0.1586553
# this works too for a standard normal distribution:
pnorm(1, lower.tail = FALSE)
[1] 0.1586553

2.3 Normal curve figure with region shaded in

  • R code to create a figure of the normal distribution curve with the probability region of interest shaded in.

Shade in the region representing \(P(X>10)\), for \(X \sim N(\mu = 8, \sigma = 2)\).

# This code shades in the probability P(X > 10) 
# for X ~ N(mu = 8, sigma = 2)
# Note that I set the upper and lower bounds of the normal curve to be from mu - 4*sigma to mu + 4*sigma since these bounds include almost 100% of the area under the normal curve

mu <- 8   # specify the mean of the normal distribution
std <- 2  # specify the standard deviation of the normal distribution

# specify upper and lower bounds of shaded region below
ggplot(data.frame(x = c(mu-4*std, mu+4*std)), aes(x = x)) + 
  stat_function(fun = dnorm, 
                args = list(mean = mu, sd = std)) + 
  stat_function(fun = dnorm, 
                args = list(mean = mu, sd = std), 
          # specify the upper and lower bounds of the shaded region:
                xlim = c(10, mu+4*std),             
                geom = "area", fill = "darkblue") +
  # the breaks values below might need to be adjusted 
  # if there are too many values showing on the x-axis
  scale_x_continuous(breaks=(mu-4*std):(mu+4*std)) +
  labs(y = "") +
  labs(title = "P(X >10) for a N(mu=8, sigma=2) distribution")

3 Poisson distribution

3.1 Probability calculations: \(P(X=x)\)

3.1.1 Using formula

Calculate \(P(X=3)\) for a \(Pois(\lambda = 5)\) distribution.

In general, for a Poisson random variable, \[P(X=x) = \frac{e^{-\lambda}\lambda^x}{x!}\] Thus the \(P(X=3)\) for a \(Pois(\lambda = 5)\) random variable is \[P(X=3) = \frac{e^{-5}5^3}{3!}\] Calculate “directly” in R:

exp(-5)*(5^3)/factorial(3)
[1] 0.1403739

3.1.2 Using dpois()

Calculate \(P(X=3)\) for a \(Pois(\lambda = 5)\) distribution.

# P(X = 3) for Pois(lambda = 5) random variable
dpois(x = 3, lambda = 5) 
[1] 0.1403739
  • Calculate \(P(X=x)\) for many possible values \(x\) for a \(Pois(\lambda = 5)\) distribution.
  • The possible values of \(x\) for a Poisson distribution are \(x=0, 1, 2, \ldots\), i.e., \(x\) can be infinitely large.
  • Below we look at probabilities just for \(x=0, 1, 2, \ldots, 20\).
# P(X = x) for x=0,1,2,...,100 for Pois(\lambda = 5) random variable
dpois(x = 0:20, lambda = 5) 
 [1] 6.737947e-03 3.368973e-02 8.422434e-02 1.403739e-01 1.754674e-01
 [6] 1.754674e-01 1.462228e-01 1.044449e-01 6.527804e-02 3.626558e-02
[11] 1.813279e-02 8.242177e-03 3.434240e-03 1.320862e-03 4.717363e-04
[16] 1.572454e-04 4.913920e-05 1.445271e-05 4.014640e-06 1.056484e-06
[21] 2.641211e-07
# although we didn't calculate the probabilities for all possible values of x, 
# below we see that the probabilities for the first 21 values almost add up to 1
sum(dpois(x = 0:20, lambda = 5) )
[1] 0.9999999

In the next section we visualize these probability distribution values.

3.2 Histogram of probability distribution

  • First, create a data frame of the distribution’s possible x values and their respective probabilities:
# Pois(lambda = 5) random variable

Poisson_probs <- data.frame(x_values = 0:20, 
                          px = dpois(x = 0:20, lambda = 5)) 
Poisson_probs
   x_values           px
1         0 6.737947e-03
2         1 3.368973e-02
3         2 8.422434e-02
4         3 1.403739e-01
5         4 1.754674e-01
6         5 1.754674e-01
7         6 1.462228e-01
8         7 1.044449e-01
9         8 6.527804e-02
10        9 3.626558e-02
11       10 1.813279e-02
12       11 8.242177e-03
13       12 3.434240e-03
14       13 1.320862e-03
15       14 4.717363e-04
16       15 1.572454e-04
17       16 4.913920e-05
18       17 1.445271e-05
19       18 4.014640e-06
20       19 1.056484e-06
21       20 2.641211e-07

3.2.1 Histogram

The histogram below is actually created as a bar plot in ggplot:

ggplot(Poisson_probs, 
       aes(x = x_values, y = px))  + 
  geom_col(fill ="cornflowerblue") +
  labs(x = "x, number of successes",
       y = "P(X=x)") +
  scale_x_continuous(breaks=0:20) +  # chose 20 since prob's are already tiny for this distribution
  labs(title = "Pois(lambda = 5) distribution")

3.3 Probability calculations: \(P(X \leq x)\)

\[P(X\leq k) = \sum_{x=0}^{k}\frac{e^{-\lambda}\lambda^x}{x!}\]

3.3.1 Using formula

Calculate \(P(X \leq 12)\) for a \(Pois(\lambda = 5)\) distribution:

\[P(X\leq 12) = \sum_{x=0}^{12} \frac{e^{-5}5^x}{x!}\] Calculate “directly” in R:

# vector of x values whose probabilities need to be added
x <- 0:12  
# vector of respective Poisson prob's of x values
(Poisson_prob_0_12 <- exp(-5)*(5^x)/factorial(x))
 [1] 0.006737947 0.033689735 0.084224337 0.140373896 0.175467370 0.175467370
 [7] 0.146222808 0.104444863 0.065278039 0.036265577 0.018132789 0.008242177
[13] 0.003434240
# add up the probabilities
sum(Poisson_prob_0_12)
[1] 0.9979811

3.3.2 Using ppois()

  • \(P(X\leq k)\) = ppois(q = k, lambda = , lower.tail = TRUE)

Calculate \(P(X \leq 12)\) for a \(Pois(\lambda = 5)\) distribution.

# P(X <= 3) for Pois(lambda = 5) random variable
ppois(q = 12, lambda = 5, lower.tail = TRUE)
[1] 0.9979811
# Note: setting TRUE for the lower.tail option is the default value
# This means that if we do not specify this option, 
# it will give the same result as above:
ppois(q = 12, lambda = 5) 
[1] 0.9979811

3.4 Probability calculations: \(P(X \geq x)\)

\[P(X\geq k) = \sum_{x=k}^{\infty} \frac{e^{-\lambda}\lambda^x}{x!}\]

3.4.1 Using formula

Calculate \(P(X \geq 13)\) for a \(Pois(\lambda = 5)\) distribution:

\[P(X \geq 13) = \sum_{x=13}^{\infty} \frac{e^{-5}5^x}{x!}\] Calculate “directly” in R:

# vector of x values whose probabilities need to be added
x <- 13:100 # choose a big number; big enough so that the probabilities for the high x values below are tiny
# vector of respective Poisson prob's of x values
(Poisson_prob_13_100 <- exp(-5)*(5^x)/factorial(x))
 [1] 1.320862e-03 4.717363e-04 1.572454e-04 4.913920e-05 1.445271e-05
 [6] 4.014640e-06 1.056484e-06 2.641211e-07 6.288597e-08 1.429227e-08
[11] 3.107014e-09 6.472947e-10 1.294589e-10 2.489595e-11 4.610361e-12
[16] 8.232787e-13 1.419446e-13 2.365743e-14 3.815715e-15 5.962055e-16
[21] 9.033417e-17 1.328444e-17 1.897777e-18 2.635801e-19 3.561893e-20
[26] 4.686701e-21 6.008592e-22 7.510739e-23 9.159438e-24 1.090409e-24
[31] 1.267918e-25 1.440816e-26 1.600906e-27 1.740116e-28 1.851187e-29
[36] 1.928320e-30 1.967673e-31 1.967673e-32 1.929091e-33 1.854895e-34
[41] 1.749901e-35 1.620279e-36 1.472981e-37 1.315162e-38 1.153650e-39
[46] 9.945263e-41 8.428189e-42 7.023491e-43 5.756959e-44 4.642709e-45
[51] 3.684690e-46 2.878664e-47 2.214357e-48 1.677543e-49 1.251898e-50
[56] 9.205131e-52 6.670385e-53 4.764561e-54 3.355324e-55 2.330086e-56
[61] 1.595950e-57 1.078344e-58 7.188962e-60 4.729580e-61 3.071156e-62
[66] 1.968690e-63 1.246006e-64 7.787539e-66 4.807123e-67 2.931172e-68
[71] 1.765766e-69 1.051051e-70 6.182656e-72 3.594567e-73 2.065843e-74
[76] 1.173775e-75 6.594239e-77 3.663466e-78 2.012894e-79 1.093964e-80
[81] 5.881526e-82 3.128471e-83 1.646564e-84 8.575854e-86 4.420543e-87
[86] 2.255379e-88 1.139080e-89 5.695402e-91
# add up the probabilities
sum(Poisson_prob_13_100)
[1] 0.002018852

3.4.2 Using ppois() with lower.tail = TRUE

Calculate \(P(X \geq 13)\) for a \(Pois(\lambda = 5)\) distribution.

\[P(X \geq 13) = 1 - P(X \leq 12) =1 - \sum_{x=0}^{12} \frac{e^{-5}5^x}{x!}\]

# P(X >= 13) for Pois(lambda = 5) random variable
# P(X >= 13) = 1 - P(X <= 12)
1 - ppois(q = 12, lambda = 5, lower.tail = TRUE)
[1] 0.002018852

3.4.3 Using ppois() with lower.tail = FALSE

  • \(P(X \geq k) = P(X > k)\) = ppois(q = k, size = n, prob = p, lower.tail = FALSE)

Calculate \(P(X \geq 13)\) for a \(Pois(\lambda = 5)\) distribution.

\[P(X \geq 13) = P(X > 12)\]

# P(X >= 13) for Pois(lambda = 5) random variable
# P(X >= 13) = P(X > 12)
ppois(q = 12, lambda = 5, lower.tail = FALSE)
[1] 0.002018852