
Histograms


Histogram plots can be created with Matplotlib. A histogram is a type of statistical bar chart. In a histogram, ranges of values (called bins) are assigned to the x-axis, and the count or frequency of data in each range (the number of data points in each bin) is plotted on the y-axis. Matplotlib's plt.hist() function creates histogram plots. To create a histogram with Matplotlib, first import Matplotlib with the standard line:

import matplotlib.pyplot as plt

In our first example, we will also import NumPy with the line import numpy as np. We'll use NumPy's random number generator to create a dataset for the histogram. If using a Jupyter notebook, include the line %matplotlib inline below the imports.

In [1]:
import matplotlib.pyplot as plt
import numpy as np
# if using a Jupyter notebook, include:
%matplotlib inline

For the dataset, define a mean mu = 80 and a standard deviation sigma = 7. NumPy's np.random.normal() function produces an array of random numbers drawn from a normal distribution. A sample of 200 random numbers is a reasonable amount of data to plot. The general format of np.random.normal() is below:

var = np.random.normal(mean, stdev, size=<number of values>)

In [2]:
mu = 80
sigma = 7
x = np.random.normal(mu, sigma, size=200)
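Because the values are random, the dataset (and the histogram) changes on every run. A minimal sketch of one way to make the sample reproducible is below. It uses NumPy's Generator interface (np.random.default_rng()) instead of the np.random.normal() call shown above, and the seed value 42 is an arbitrary assumption, not something from the original example.

```python
import numpy as np

# Assumption: the seed 42 is arbitrary; any fixed integer gives a repeatable sample.
rng = np.random.default_rng(seed=42)

mu = 80     # mean
sigma = 7   # standard deviation

# Draw 200 values from a normal distribution with the given mean and spread.
x = rng.normal(mu, sigma, size=200)

print(x.shape)             # number of values generated
print(x.mean(), x.std())   # should land near mu and sigma, but not exactly on them
```

The sample mean and standard deviation will only approximate mu and sigma, since 200 values is a finite sample.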

Matplotlib's plt.hist() function produces the histogram plot. The first positional argument passed to plt.hist() is a list or array of values. The second positional argument denotes the number of bins on the histogram.

plt.hist(values, num_bins)
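In addition to drawing the plot, plt.hist() returns the bin counts, the bin edges, and the drawn bar objects. The sketch below shows how those return values relate to the number of bins. The names counts, bin_edges and patches are illustrative, the seed is arbitrary, and the matplotlib.use("Agg") line is an assumption that the code runs as a script without a display (it can be dropped in a Jupyter notebook).

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend for running outside a notebook
import matplotlib.pyplot as plt
import numpy as np

# Illustrative data: 200 normally distributed values (arbitrary seed).
values = np.random.default_rng(0).normal(80, 7, size=200)
num_bins = 20

# plt.hist() returns the counts per bin, the bin edges, and the bar artists.
counts, bin_edges, patches = plt.hist(values, num_bins)

print(len(counts))      # one count per bin
print(len(bin_edges))   # one more edge than there are bins
print(counts.sum())     # every value falls in exactly one bin

plt.close()
```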

Similar to Matplotlib line plots, bar plots and pie charts, a set of keyword arguments can be included in the plt.hist() function call. Specifying values for the keyword arguments customizes the histogram.

Example keyword arguments which can be included with plt.hist() are:

  • density=
  • histtype=
  • facecolor=
  • alpha= (opacity)

In [3]:
plt.hist(x, 20,
         density=True,
         histtype='bar',
         facecolor='b',
         alpha=0.5)

plt.title('Histogram')
plt.xlabel('x-axis')
plt.ylabel('y-axis')

plt.show()
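With density=True, the bar heights are no longer raw counts. They are scaled so that the total area of the bars equals 1. The sketch below checks this numerically with np.histogram(), which is the function Matplotlib uses to bin the data for plt.hist(). The seed and variable names are assumptions made for illustration.

```python
import numpy as np

# Illustrative data: 200 normally distributed values (arbitrary seed).
rng = np.random.default_rng(seed=0)
x = rng.normal(80, 7, size=200)

# density=True returns a probability density per bin instead of a count.
density, bin_edges = np.histogram(x, bins=20, density=True)

# Area of each bar = height * width; the areas should add up to 1.
bin_widths = np.diff(bin_edges)
total_area = np.sum(density * bin_widths)

print(total_area)  # expected to be very close to 1.0
```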

The next histogram example involves a list of commute times. Suppose the following commute times were recorded in a survey:

23, 25, 40, 35, 36, 47, 33, 28, 48, 34,
20, 37, 36, 23, 33, 36, 20, 27, 50, 34,
47, 18, 28, 52, 21, 44, 34, 13, 40, 49

We will plot a histogram of these commute times. First, import Matplotlib as in the previous example, and include %matplotlib inline if using a Jupyter notebook. Then build a Python list of commute times from the survey data above.

In [4]:
import matplotlib.pyplot as plt
# if using a Jupyter notebook, include:
%matplotlib inline

commute_times = [23, 25, 40, 35, 36, 47, 33, 28, 48, 34, 20, 37, 36, 23, 33, 36, 20, 27, 50, 34, 47, 18, 28, 52, 21, 44, 34, 13, 40, 49]

Call plt.hist() with the commute_times list and 5 (the number of bins) as positional arguments.

In [5]:
plt.hist(commute_times, 5)

plt.show()
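When only a number of bins is given, the bins have equal width and span from the smallest to the largest value in the data. As a rough check, the sketch below computes the same binning with np.histogram(), which plt.hist() relies on, and prints the edges and counts instead of drawing them.

```python
import numpy as np

commute_times = [23, 25, 40, 35, 36, 47, 33, 28, 48, 34,
                 20, 37, 36, 23, 33, 36, 20, 27, 50, 34,
                 47, 18, 28, 52, 21, 44, 34, 13, 40, 49]

# Five equal-width bins between min(commute_times) and max(commute_times).
counts, bin_edges = np.histogram(commute_times, bins=5)

print(bin_edges)  # 6 edges running from the shortest to the longest commute
print(counts)     # number of commute times in each of the 5 bins
```

Here the edges fall at non-round values (the range from 13 to 52 split into five parts), which is one reason to supply explicit bin edges instead.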

To construct a histogram with specific bin ranges, a list or array of bin edges is supplied to the keyword argument bins=. A table of select keyword arguments used with plt.hist() is below:

keyword argument    description                                      example
bins=               list of bin edges                                bins=[5, 10, 20, 30]
density=            if True, counts are normalized to a density      density=False
histtype=           type of histogram                                histtype='bar'
color=              bar color                                        color='b'
edgecolor=          bar edge color                                   edgecolor='k'
alpha=              bar opacity                                      alpha=0.5

For the next histogram, we will specify bins in 15-minute increments. This means our bin edges are [0, 15, 30, 45, 60]. We will also specify density=False, color='b' (blue), edgecolor='k' (black), and alpha=0.5 (half transparent).

The lines plt.xlabel(), plt.ylabel(), and plt.title() give the histogram axis labels and a title. plt.xticks() specifies the location for the tick labels on the x-axis. Since the bin edges are in 15 minute intervals, it makes sense to space the tick labels in 15 minute intervals as well.

In [6]:
bin_edges = [0,15,30,45,60]

plt.hist(commute_times, bins=bin_edges, density=False, histtype='bar', color='b', edgecolor='k', alpha=0.5)

plt.xlabel('Commute time range (min)')
plt.xticks([0, 15, 30, 45, 60])
plt.ylabel('Number of commuters')
plt.title('Histogram of commute times')

plt.show()
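The bar heights in this histogram can be cross-checked without plotting. The sketch below passes the same bin edges to np.histogram() and prints the number of commuters in each 15-minute range. Each bin includes its left edge and excludes its right edge, except the last bin, which includes both.

```python
import numpy as np

commute_times = [23, 25, 40, 35, 36, 47, 33, 28, 48, 34,
                 20, 37, 36, 23, 33, 36, 20, 27, 50, 34,
                 47, 18, 28, 52, 21, 44, 34, 13, 40, 49]
bin_edges = [0, 15, 30, 45, 60]

# Count how many commute times fall between each pair of consecutive edges.
counts, edges = np.histogram(commute_times, bins=bin_edges)

# Print one line per bin: the time range and its count.
for left, right, count in zip(edges[:-1], edges[1:], counts):
    print(f"{left}-{right} min: {count} commuters")
```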