Normal distribution in R (2024)

HOME STATISTICS WITH R NORMAL DISTRIBUTION R

Statistics with R Distributions

Normal distribution in R (1)

The Normal or Gaussian distribution is the most known and important distribution in Statistics. In this tutorial you will learn what are and what does dnorm, pnorm, qnorm and rnorm functions in R and the differences between them. In consequence, you will learn how to create and plot the Normal distribution in R, calculate probabilities under the curves, the quantiles, Normal random sampling and even how to shade a specific area under a Normal curve.

The normal or gaussian distribution

Among continuous random variables, the most important is the Normal or Gaussian distribution. This variable was introduced by Carl Friedrich in the XIX century for studying error measures.

Let \(X \sim N \left( \mu, \sigma \right)\), namely a random variable following a normal distribution with mean \(\mu\) and standard deviation \(\sigma\):

  • The probability density function (PDF), also known as Bell curve, of \(x\) is \(f(x) = \frac{1}{\sqrt{2\pi \sigma^{2}}} e^{\frac{1}{2} (\frac{x - \mu}{\sigma})^2}\).
  • The cumulative distribution function (CDF) is \(F(x) = P(X \leq x)\).
  • The quantile function is \(Q(p) = F^{-1}(p)\).
  • The expencted mean and variance are \(E(X) = \mu\) and \(Var(X) = \sigma^2\), respectively.

In R there exist the dnorm, pnorm and qnorm functions, which allows calculating the normal density, distribution and quantile function for a set of values. In addition, the rnorm function allows obtaining random observations that follow a normal distibution. The following table summarizes the functions related to the normal distribution:

FunctionDescription
dnormNormal density
(Probability Density Function)
pnormNormal distribution
(Cumulative Distribution Function)
qnormQuantile function of the Normal distribution
rnormNormal random number generation

By default, these functions consider the standard Normal distribution, which has a mean of zero and a standard deviation of one.

Although we will review each function in detail on the corresponding section, in the following illustration you can appreciate the relationship between the dnorm, pnorm and qnorm functions.

Normal distribution in R (2)

The dnorm function

In R, you can make use of the dnorm function to calculate the density function with mean \(\mu\) and standard deviation \(\sigma\) for any value of \(x, \mu\) and \(\sigma\).

dnorm(x, # X-axis values (grid) mean = 0, # Integer or vector representing the mean/s sd = 1, # Integer or vector representing the standard deviation/s log = FALSE) # If TRUE, probabilities are given as log

Consider, for instance, that you want to obtain the PDF for \(x \in (-4, 4)\), with mean 1 and standard deviation of 3. In order to calculate it, you could type:

x <- -4:4# x <- seq(-4, 4, length = 100) # More data pointsdnorm(x, mean = 1, sd = 3) 
0.03315905 0.05467002 0.08065691 0.10648267 0.125794410.13298076 0.12579441 0.10648267 0.08065691

You can also specify vectors to the mean and sd arguments of the function. In the following example, the value of even elements are from \(\mu = 1, \sigma = 3\) and odds are from \(\mu = 2, \sigma = 4\).

dnorm(x, mean = c(1, 2), sd = c(3, 4))
0.03315905 0.04566227 0.08065691 0.07528436 0.125794410.09666703 0.12579441 0.09666703 0.08065691

Plot Normal distribution in R

Creating a normal distribution plot in R is easy. You just need to create a grid for the X-axis for the first argument of the plot function and pass as input of the second the dnorm function for the corresponding grid. In the following example we show how to plot normal distributions for different means and variances.

par(mfrow = c(1, 2))# Grid of X-axis valuesx <- seq(-4, 8, 0.1)#-----------------------------------------# Same standard deviation, different mean#-----------------------------------------# Mean 0, sd 1plot(x, dnorm(x, mean = 0, sd = 1), type = "l", ylim = c(0, 0.6), ylab = "", lwd = 2, col = "red")# Mean 3, sd 1lines(x, dnorm(x, mean = 3, sd = 1), col = "blue", lty = 1, lwd = 2)# Adding a legendlegend("topright", legend = c("0 1", "3 1"), col = c("red", "blue"), title = expression(paste(mu, " ", sigma)), title.adj = 0.9, lty = 1, lwd = 2, box.lty = 0)#-----------------------------------------# Same mean, different standard deviation#-----------------------------------------# Mean 1, sd 1plot(x, dnorm(x, mean = 1, sd = 1), type = "l", ylim = c(0, 1), ylab = "", lwd = 2, col = "red")# Mean 1, sd 0.5lines(x, dnorm(x, mean = 1, sd = 0.5), col = "blue", lty = 1, lwd = 2)# Adding a legendlegend("topright", legend = c("1 1", "1 0.5"), col = c("red", "blue"), title = expression(paste(mu, " ", sigma)), title.adj = 0.75, lty = 1, lwd = 2, box.lty = 0)par(mfrow = c(1, 1))

Normal distribution in R (3)

The pnorm function

The pnorm function gives the Cumulative Distribution Function (CDF) of the Normal distribution in R, which is the probability that the variable \(X\) takes a value lower or equal to \(x\).

The syntax of the function is the following:

pnorm(q, mean = 0, sd = 1, lower.tail = TRUE, # If TRUE, probabilities are P(X <= x), or P(X > x) otherwise log.p = FALSE) # If TRUE, probabilities are given as log

As an example, taking into account that the Normal distribution is symmetric, the probability that the variable will take a value lower than the mean is 0.5:

pnorm(0, mean = 0, sd = 1) # 0.5

pnorm function example

Now, suppose that you have a machine that packages rice inside boxes. The process follows a Normal distribution and it is known that the mean of the weight of each box is 1000 grams and the standard deviation is 10 grams. You can plot the density function typing:

Mean <- 1000Sd <- 10# X grid for non-standard normal distributionx <- seq(-3, 3, length = 100) * Sd + Mean # Density functionf <- dnorm(x, Mean, Sd)plot(x, f, type = "l", lwd = 2, col = "blue", ylab = "", xlab = "Weight")abline(v = Mean) # Vertical line on the mean

Normal distribution in R (4)

First, if you want to calculate the probability of a box weighing less than 1010 grams (\(P(X < 1010) = P(X \leq 1010)\)), you can type the following:

pnorm(1010, Mean, Sd) # 0.8413447 or 84.13%1 - pnorm(1010, Mean, Sd, lower.tail = FALSE) # Equivalent

So the probability of a box weighting less than 1010 grams is 0.8413 or 84.13%, which corresponds to the following area:

lb <- min(x) # Lower boundub <- 1010 # Upper boundx2 <- seq(min(x), ub, length = 100) # New Gridy <- dnorm(x2, Mean, Sd) # Densityplot(x, f, type = "l", lwd = 2, col = "blue", ylab = "", xlab = "Weight")abline(v = ub) polygon(c(lb, x2, ub), c(0, y, 0), col = rgb(0, 0, 1, alpha = 0.5))text(995, 0.01, "84.13%")

Normal distribution in R (5)

As shading the area under the Normal curve can be tricky and requires several lines of code, we have created a simple function to achieve it in a single line:

# mean: mean of the Normal variable# sd: standard deviation of the Normal variable# lb: lower bound of the area# ub: upper bound of the area# acolor: color of the area# ...: additional arguments to be passed to lines functionnormal_area <- function(mean = 0, sd = 1, lb, ub, acolor = "lightgray", ...) { x <- seq(mean - 3 * sd, mean + 3 * sd, length = 100) if (missing(lb)) { lb <- min(x) } if (missing(ub)) { ub <- max(x) } x2 <- seq(lb, ub, length = 100) plot(x, dnorm(x, mean, sd), type = "n", ylab = "") y <- dnorm(x2, mean, sd) polygon(c(lb, x2, ub), c(0, y, 0), col = acolor) lines(x, dnorm(x, mean, sd), type = "l", ...)}

As an example, if you want to shade the area between -1 and 2 of a standard Normal distribution you can type:

normal_area(mean = 0, sd = 1, lb = -1, ub = 2, lwd = 2)

Normal distribution in R (6)

Second, in case that you want to calculate the probability of a box weighing more than 980 grams (\(P(X > 980) = P(X \geq 980)\)) you can use the lower.tail argument.

pnorm(980, Mean, Sd, lower.tail = FALSE) # 0.9772499 or 97.72%1 - pnorm(980, Mean, Sd) # Equivalentpnorm(1020, Mean, Sd) # Equivalent by symmetry

The calculated probability corresponds to the following area:

normal_area(mean = Mean, sd = Sd, lb = 980, acolor = rgb(0, 0, 1, alpha = 0.5))text(1000, 0.01, "97.72%")

Normal distribution in R (7)

Finally, if you want to calculate the probability of a box weighing more than 990 grams and less than 1000 you have to calculate \(P(X \leq 1000) - P(x \leq 990) = P(X < 1000) - P(x <990)\) and hence you can type:

pnorm(1000, Mean, Sd) - pnorm(990, Mean, Sd) # 0.3413447 or 34.13%

You can plot the area with the following code:

normal_area(mean = Mean, sd = Sd, lb = 990, ub = 1000, acolor = rgb(0, 0, 1, alpha = 0.5))text(995, 0.01, "34.13%", srt = 90)

Normal distribution in R (8)

For continuous variables, as \(P(X = x) = 0\), \(P(X \geq x) = P(X > x)\) and \(P(X \leq x) = P(X < x)\).

Plot normal cumulative distribution function in R

With the pnorm function you can also plot the cumulative density function of the Gaussian or Normal distribution in R:

par(mfrow = c(1, 2))# Grid of X-axis valuesx <- seq(-4, 8, 0.1)#-----------------------------------------# Same standard deviation, different mean#-----------------------------------------# Mean 0, sd 1plot(x, pnorm(x, mean = 0, sd = 1), type = "l", ylim = c(0, 1), ylab = "", lwd = 2, col = "red")# Mean 3, sd 1lines(x, pnorm(x, mean = 3, sd = 1), col = "blue", lty = 1, lwd = 2)# Legendlegend("topleft", legend = c("0 1", "3 1"), col = c("red", "blue"), title = expression(paste(mu, " ", sigma)), title.adj = 0.9, lty = 1, lwd = 2, box.lty = 0)#-----------------------------------------# Same mean, different standard deviation#-----------------------------------------# Mean 1, sd 1plot(x, pnorm(x, mean = 1, sd = 1), type = "l", ylim = c(0, 1), ylab = "", lwd = 2, col = "red")# Mean 1, sd 0.5lines(x, pnorm(x, mean = 1, sd = 0.5), col = "blue", lty = 1, lwd = 2)# Legendlegend("topleft", legend = c("1 1", "1 0.5"), col = c("red", "blue"), title = expression(paste(mu, " ", sigma)), title.adj = 0.75, lty = 1, lwd = 2, box.lty = 0)par(mfrow = c(1, 1))

Normal distribution in R (9)

Recall that \(P(X < 0) = 0.5\) for a standard Normal distribution:

x <- seq(-4, 4, 0.1)plot(x, pnorm(x, mean = 0, sd = 1), type = "l", ylim = c(0, 1), ylab = "P(X < x)", lwd = 2, col = "red")segments(0, 0, 0, 0.5, lwd = 2, lty = 2)segments(-4, 0.5, 0, 0.5, lwd = 2, lty = 2)

Normal distribution in R (10)

The qnorm function

The qnorm function allows you to find the quantile (percentile) \(Q\) for any probability \(p\). Hence, the qnorm function is the inverse of the pnorm function. The syntax of qnorm is as follows:

qnorm(p, # Integer or vector of probabilities mean = 0, # Integer or vector of means sd = 1, # Integer or vector of standard deviations lower.tail = TRUE, # If TRUE, probabilities are P(X <= x), or P(X > x) otherwise log.p = FALSE) # If TRUE, probabilities are given as log

As a first example, the quantile for probability 0.5 (\(Q(0.5)\)) on a symmetric distribution is equal to the mean:

qnorm(0.5, mean = 0, sd = 1) # 0

In addition, you can obtain the quantile for any given probability. Note the relation between pnorm and qnorm functions:

x <- pnorm(-1.5, mean = 0, sd = 1) # 0.0668072qnorm(x, mean = 0, sd = 1) # -1.5normal_area(mean = 0, sd = 1, ub = -1.5, lwd = 2, acolor = rgb(0, 0, 1, alpha = 0.5))arrows(-0.5, 0.1, -1.45, 0, lwd = 2, length = 0.2)text(-0.25, 0.13, "-1.5", cex = 1.5)

Normal distribution in R (11)

If you want to calculate, for instance, the quantile \(Q(P(X > 1.5)) = Q(1 - P(X \leq 1.5)) = Q(0.067)\) you can set the lower.tail argument to TRUE.

x <- pnorm(1.5, mean = 0, sd = 1, lower.tail = FALSE) # 0.0668072qnorm(x, mean = 0, sd = 1, lower.tail = FALSE) # 1.5# Equivalent to:# x <- 1- pnorm(1.5, mean = 0, sd = 1) # qnorm(1 - x, mean = 0, sd = 1)normal_area(mean = 0, sd = 1, lb = 1.5, lwd = 2, acolor = rgb(0, 0, 1, alpha = 0.5))arrows(0, 0.1, 1.45, 0, lwd = 2, length = 0.2)text(0, 0.13, "1.5", cex = 1.5)

Normal distribution in R (12)

Plotting the Normal quantile function

You can plot the quantile function of a standard Normal distribution typing the following:

plot(qnorm, pnorm(-4), pnorm(4), lwd = 2, xlab = "p", ylab = "Q(p)")# Equivalent to:# x <- seq(pnorm(-4), pnorm(4), length = 100)# plot(x, qnorm(x, mean = 0, sd = 1), type = "l",# lwd = 2, xlab = "p", ylab = "Q(p)")

Normal distribution in R (13)

The previous plot represents the possible outcomes of the qnorm function for the standard Normal distribution. Recall that \(Q(0.5) = 0\):

plot(qnorm, pnorm(-4), pnorm(4), lwd = 2, xlab = "p", ylab = "Q(p)")segments(0.5, -4, 0.5, 0, lty = 2, lwd = 2)segments(0, 0, 0.5, 0, lty = 2, lwd = 2)

Normal distribution in R (14)

The rnorm function

The rnorm function generates \(n\) observations from the Normal distribution with mean \(\mu\) and standard deviation \(\sigma\). The syntax of the rnorm function in R is the following:

rnorm(n, # Number of observations to be generated mean = 0, # Integer or vector of means sd = 1) # Integer or vector of standard deviations

Hence, you can generate 10 observations of a standard Normal distribution in R with the following code:

rnorm(10)
0.3320065 -0.6454991 -0.3692699 -0.4160260 0.5756340-0.8793241 -0.8095590 0.4926698 -0.8038873 -0.3446890

However, it should be noted that if you don’t specify a seed the output won’t be reproducible. You can make use of the set.seed function to make your code reproducible:

# Seed for reproducibilityset.seed(1)rnorm(1) # -0.6264538

In addition, in the following plot you can observe how increasing the number of observations, the histogram of the data approaches to the true Normal density function:

par(mfrow = c(1, 3))x <- seq(-10, 10, length = 200)set.seed(3)# n = 10hist(rnorm(10, mean = 0, sd = 1), main = "n = 10", xlab = "", prob = TRUE)lines(x, dnorm(x), col = "red", lwd = 2)# n = 100hist(rnorm(100, mean = 0, sd = 1), main = "n = 100", xlab = "", prob = TRUE)lines(x, dnorm(x), col = "red", lwd = 2)# n = 1000hist(rnorm(1000, mean = 0, sd = 1), main = "n = 1000", xlab = "", prob = TRUE)lines(x, dnorm(x), col = "red", lwd = 2)par(mfrow = c(1, 1))

Normal distribution in R (15)

Normal distribution in R (2024)
Top Articles
Latest Posts
Article information

Author: Aron Pacocha

Last Updated:

Views: 6038

Rating: 4.8 / 5 (68 voted)

Reviews: 91% of readers found this page helpful

Author information

Name: Aron Pacocha

Birthday: 1999-08-12

Address: 3808 Moen Corner, Gorczanyport, FL 67364-2074

Phone: +393457723392

Job: Retail Consultant

Hobby: Jewelry making, Cooking, Gaming, Reading, Juggling, Cabaret, Origami

Introduction: My name is Aron Pacocha, I am a happy, tasty, innocent, proud, talented, courageous, magnificent person who loves writing and wants to share my knowledge and understanding with you.