Histograms
Histogram plots can be created with Matplotlib. A histogram is a type of statistical bar chart. In a histogram, ranges of values (called bins) are assigned to the x-axis. The count or frequency of data in each range (the number of data points in each bin) is plotted on the y-axis. Matplotlib's plt.hist() function creates histogram plots.
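Counting values into bins is the core of what a histogram does. The minimal sketch below does that counting by hand. count_in_bins() is a hypothetical helper written only for illustration; it is not part of Matplotlib or NumPy. Its result is compared with NumPy's np.histogram(), which uses the same convention: each bin includes its left edge, and only the last bin also includes its right edge.

```python
import numpy as np

def count_in_bins(values, edges):
    """Hypothetical helper: count how many values fall in each bin.

    Bin i covers [edges[i], edges[i+1]); the last bin also includes
    its right edge, matching np.histogram's convention.
    """
    counts = [0] * (len(edges) - 1)
    for v in values:
        for i in range(len(edges) - 1):
            is_last_bin = i == len(edges) - 2
            if edges[i] <= v < edges[i + 1] or (is_last_bin and v == edges[i + 1]):
                counts[i] += 1
                break
    return counts

data = [1, 2, 2, 3, 3, 3, 4, 4, 5]
edges = [1, 2, 3, 4, 5]

manual_counts = count_in_bins(data, edges)
numpy_counts, numpy_edges = np.histogram(data, bins=edges)

print(manual_counts)          # [1, 2, 3, 3]
print(numpy_counts.tolist())  # [1, 2, 3, 3]
```

Each count becomes the height of one bar in the histogram.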
To create a histogram with Matplotlib, first import Matplotlib with the standard line:
import matplotlib.pyplot as plt
In our first example, we will also import NumPy with the line import numpy as np. We'll use NumPy's random number generator to create a dataset for the histogram. If using a Jupyter notebook, include the line %matplotlib inline below the imports.
import matplotlib.pyplot as plt
import numpy as np
# if using a Jupyter notebook, include:
%matplotlib inline
Next, we'll create a dataset of 200 normally distributed random numbers with a mean mu = 80 and a standard deviation sigma = 7. NumPy's np.random.normal() function produces an array of random numbers drawn from a normal distribution. 200 values are a reasonable amount for the shape of the distribution to show up in the histogram. The general format of np.random.normal() is below:
var = np.random.normal(mean, stdev, size=<number of values>)
mu = 80
sigma = 7
x = np.random.normal(mu, sigma, size=200)
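np.random.normal() draws new numbers every time it runs, so the histogram changes slightly from run to run. If repeatable data is wanted (an assumption; the example above does not need it), NumPy's newer np.random.default_rng() generator accepts a seed. Its normal() method takes the same mean, standard deviation and size arguments. The seed value 42 below is arbitrary.

```python
import numpy as np

mu = 80
sigma = 7

# A seeded generator returns the same 200 numbers on every run (seed 42 is arbitrary)
rng = np.random.default_rng(seed=42)
x = rng.normal(mu, sigma, size=200)

print(x.shape)  # (200,)
# The sample mean and standard deviation land close to mu and sigma, but not exactly on them
print(x.mean(), x.std())
```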
Matplotlib's plt.hist() function produces the histogram plot. The first positional argument passed to plt.hist() is a list or array of values. The second positional argument sets the number of bins in the histogram.
plt.hist(values, num_bins)
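Besides drawing the plot, plt.hist() returns three objects: the count in each bin, the bin edges, and the drawn bar patches. The short sketch below inspects them. The seeded random data and the non-interactive "Agg" backend are assumptions; the backend lets the code run without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend: no window is opened
import matplotlib.pyplot as plt
import numpy as np

values = np.random.default_rng(0).normal(80, 7, size=200)
num_bins = 20

# counts: values per bin, bin_edges: bin boundaries, patches: the drawn bars
counts, bin_edges, patches = plt.hist(values, num_bins)

print(len(counts))     # 20, one count per bin
print(len(bin_edges))  # 21, one more edge than there are bins
print(counts.sum())    # 200.0, every value lands in exactly one bin
plt.close()
```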
Similar to Matplotlib line plots, bar plots and pie charts, a set of keyword arguments can be included in the plt.hist() function call. Specifying values for the keyword arguments customizes the histogram.
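The density= keyword changes what the y-axis measures. With density=True, the bar heights are scaled so that the total area of the bars equals 1. Each bar's area is its height times its bin width, and those areas are summed over all bins. The y-axis then shows a probability density rather than raw counts.

The minimal check below confirms that property. It assumes the same mu = 80, sigma = 7 setup with 20 bins, a seeded np.random.default_rng() generator for repeatable data, and the "Agg" backend so no display is needed.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen so no window is needed
import matplotlib.pyplot as plt
import numpy as np

x = np.random.default_rng(1).normal(80, 7, size=200)

n, bins, _ = plt.hist(x, 20, density=True, histtype='bar', facecolor='b', alpha=0.5)

# Each bar's area is its height times its bin width; the areas sum to 1
area = np.sum(n * np.diff(bins))
print(area)  # 1.0, up to floating-point rounding
plt.close()
```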
Example keyword arguments which can be included with plt.hist() are:
density=
histtype=
facecolor=
alpha= (opacity)
The code below plots a histogram of x with 20 bins, normalized data, blue bars, and 50% opacity, then adds a title and axis labels:
plt.hist(x, 20, density=True, histtype='bar', facecolor='b', alpha=0.5)
plt.title('Histogram')
plt.xlabel('x-axis')
plt.ylabel('y-axis')
plt.show()
The next histogram example involves a list of commute times. Suppose the following commute times were recorded in a survey:
23, 25, 40, 35, 36, 47, 33, 28, 48, 34, 20, 37, 36, 23, 33, 36, 20, 27, 50, 34, 47, 18, 28, 52, 21, 44, 34, 13, 40, 49
We will plot a histogram of these commute times. First, import Matplotlib as in the previous example, and include %matplotlib inline if using a Jupyter notebook. Then build a Python list of commute times from the survey data above.
import matplotlib.pyplot as plt
# if using a Jupyter notebook, include:
%matplotlib inline
commute_times = [23, 25, 40, 35, 36, 47, 33, 28, 48, 34, 20, 37, 36, 23, 33, 36, 20, 27, 50, 34, 47, 18, 28, 52, 21, 44, 34, 13, 40, 49]
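When plt.hist() is given a number of bins instead of bin edges, it splits the range from the smallest to the largest value into equal-width bins. For the commute data that range is 13 to 52 minutes. Each of 5 bins is therefore (52 - 13) / 5 = 7.8 minutes wide. The sketch below prints the edges and counts that plt.hist() computes for this data. The "Agg" backend is an assumption so the code runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend so no display is needed
import matplotlib.pyplot as plt

commute_times = [23, 25, 40, 35, 36, 47, 33, 28, 48, 34, 20, 37, 36, 23, 33,
                 36, 20, 27, 50, 34, 47, 18, 28, 52, 21, 44, 34, 13, 40, 49]

# With an integer bin count, the bins span min(data) to max(data) in equal widths
width = (max(commute_times) - min(commute_times)) / 5
print(width)  # 7.8

counts, edges, _ = plt.hist(commute_times, 5)
plt.close()

print(edges.round(1).tolist())  # [13.0, 20.8, 28.6, 36.4, 44.2, 52.0]
print(counts.tolist())          # [4.0, 7.0, 9.0, 4.0, 6.0]
```

Bin edges like 20.8 and 28.6 are not convenient for reading commute times, which is one reason to supply bin edges explicitly.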
Next, plt.hist() is called, and the commute_times list and 5 bins are included as positional arguments.
plt.hist(commute_times, 5)
plt.show()
To construct a histogram with specific bin ranges, a list or array of bin edges is supplied to the keyword argument bins=. A table of select keyword arguments used with plt.hist() is below:
keyword argument   description                                  example
bins=              list of bin edges                            bins=[5, 10, 20, 30]
density=           if True, normalize so the total area is 1    density=False
histtype=          type of histogram                            histtype='bar'
color=             bar color                                    color='b'
edgecolor=         bar edge color                               edgecolor='k'
alpha=             bar opacity                                  alpha=0.5
For the next histogram, we will specify bins in 15 min increments. This means our bin edges are [0, 15, 30, 45, 60]. We will also specify density=False, color='b' (blue), edgecolor='k' (black), and alpha=0.5 (half transparent).
The lines plt.xlabel(), plt.ylabel(), and plt.title() give the histogram axis labels and a title. plt.xticks() specifies the locations of the tick labels on the x-axis. Since the bin edges are in 15 minute intervals, it makes sense to space the tick labels in 15 minute intervals as well.
bin_edges = [0, 15, 30, 45, 60]
plt.hist(commute_times, bins=bin_edges, density=False, histtype='bar', color='b', edgecolor='k', alpha=0.5)
plt.xlabel('Commute time range (min)')
plt.xticks([0, 15, 30, 45, 60])
plt.ylabel('Number of commuters')
plt.title('Histogram of commute times')
plt.show()
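With explicit bin edges, each bar counts the commuters in one 15 minute range: [0, 15), [15, 30), [30, 45) and [45, 60]. Only the last bin includes its right edge. The sketch below reads those counts back from the return value of plt.hist(). As a small variation on the code above, it passes bin_edges to plt.xticks() so that the ticks and bin boundaries always line up. The "Agg" backend is an assumption so the code runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend so no display is needed
import matplotlib.pyplot as plt

commute_times = [23, 25, 40, 35, 36, 47, 33, 28, 48, 34, 20, 37, 36, 23, 33,
                 36, 20, 27, 50, 34, 47, 18, 28, 52, 21, 44, 34, 13, 40, 49]
bin_edges = [0, 15, 30, 45, 60]

counts, edges, _ = plt.hist(commute_times, bins=bin_edges, density=False,
                            histtype='bar', color='b', edgecolor='k', alpha=0.5)
# Reusing bin_edges for the tick locations puts a tick on every bin boundary
plt.xticks(bin_edges)
plt.close()

# One line per bin: its range in minutes and how many commuters fall in it
for left, right, count in zip(edges[:-1], edges[1:], counts):
    print(f"{int(left)}-{int(right)} min: {int(count)}")
```

For this survey the counts come out as 1, 10, 13 and 6, which add up to the 30 recorded commute times.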