3.7. Measures of Variation#

A measure of variation describes the amount of spread or deviation in the data about the mean. In some data sets, the data values are concentrated closely around the mean. In other data sets, the data values are more widely spread out from the mean. The most common measure of variation, or spread, is the standard deviation.

Standard Deviation#

The standard deviation is a number that measures how far values are from the mean of all the values in a data set.

  • Standard deviation provide a numerical measure of the overall amount of variation in a data set

  • Standard deviation can be used to determine whether a particular data value is close to or far from the mean.

The standard deviation is always positive or zero. The standard deviation is small when all of the values are concentrated close to the mean, exhibiting little variation or spread. The standard deviation is larger when the values are more spread out from the mean, exhibiting more variation.

Standard deviation is calculated slightly differently if the data set represents a population or a sample.

Standard Deviation of a Sample#

For a sample of size \(n\), we use the symbol \(\bar{x}\) to describe the mean. For a sample, the equation for standard deviation, s is given by:

\[ s = \sqrt{\frac{1}{n-1} \sum_{i=1}^n (x_i - \overline{x})^2} \]

Standard Deviation of a Population#

The equation for standard deviation for a population of size \(N\) is given by:

\[ \sigma = \sqrt{\frac{1}{N} \sum_{i=1}^N (x_i - \mu)^2} \]

Note that we can only know µ if we have measured an entire population.

Tip

The symbol used for the standard deviation of a sample is s

The symbol used for the standard deviation of a population is \(\sigma\)

Worked Example

GIVEN:

In an in-person college engineering class, the communute times to get to the class were recorded to the nearest minute. The following data are for the commute times for a sample n=17 students.

12, 20, 20, 20, 25, 27, 30, 35, 35, 35, 40, 40, 45, 50, 54, 85, 90

FIND:

Determine the mean commute time for students in the class and determine the standard deviation of commute times for the class.

SOLUTION:

Calculate the mean, \(\bar{x}\) :

\[ \bar{x} = \frac{12 + 3(20) + 25 + 27 + 30 + 3(35) + 2(40) + 45 + 54 + 85 + 90}{17} \]
\[ \bar{x} = 39 \]

Calculate the standard deviation, s :

We’ll use a table to calculate parts of the standard deviation. Remember the equation for the standard deviation of a sample:

\[ s = \sqrt{\frac{1}{n-1} \sum_{i=1}^n (x_i - \overline{x})^2} \]

Our table with have columns for the data value, the frequency (or number of times this data occurs), deviation from the mean, deviation from the mean squared and frequency multiplied by deviation from the mean.

Data

Frequency

Deviation from the mean

(Deviation from the mean)2

Frequency \(\times\) (Deviation from the mean)2

\(x\)

\(f\)

\((x-\bar{x})\)

\((x-\bar{x})^2\)

\(f\cdot(x-\bar{x})^2\)

\(12\)

\(1\)

\(1 - 39 = -38\)

\((-38)^2 = 1444\)

\(1 \times 1444 = 1444\)

12

1

-27

729

729

20

3

-19

361

1083

25

1

-14

196

196

27

1

-12

144

144

30

1

-9

81

81

35

3

-4

16

48

40

2

1

1

2

45

1

6

36

36

50

1

11

121

121

54

1

15

225

225

85

1

46

2116

2116

90

1

51

2601

2601

sum = 7382

The sample variance, \(s^2\), is equal to the sum of the last column (7382) divided by the total number of data values minus one (17 – 1):

\[ s^{2} = \frac{7382}{17-1} = 461.375 \]

Then we need to take the square root of the variance to find the standard deviation, s:

\[ s = \sqrt{461.375} = 21.4796 \]
\[ s = 21 \ min \]

Variance#

Another measure of variation is variance, which is simply the square of the standard deviation.

For a sample size \(n\), the variance \(s^{2}\) is:

\[ s^{2} = \frac{1}{n-1} \sum_{i=1}^n (x_i - \overline{x})^2 \]

For a population size \(N\), the variance \(\sigma^{2}\):

\[ \sigma^{2} = \frac{1}{N} \sum_{i=1}^N (x_i - \mu)^2 \]