Histograms
Histogram plots can be created with Matplotlib. A histogram is a type of statistical bar chart. In a histogram, ranges of values (called bins) are assigned to the x-axis, and the count or frequency of data in each range (the number of data points in each bin) is plotted on the y-axis. Matplotlib's plt.hist() function creates histogram plots.
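To make the idea of bins and counts concrete, here is a minimal sketch that counts a small, made-up list of values into three bins by hand. It then checks the result against NumPy's np.histogram(), which does the same counting. The data and bin edges are illustrative assumptions and are not part of the tutorial's examples.

```python
import numpy as np

# Made-up example data (illustrative only)
data = [1, 2, 2, 4, 5, 7, 8, 8, 8, 9]
# Bin edges: the bins are [0, 3), [3, 6) and [6, 10]
edges = [0, 3, 6, 10]

# Count by hand: how many values fall inside each bin?
# The last bin also includes its right edge, matching np.histogram.
manual_counts = []
for i in range(len(edges) - 1):
    lo, hi = edges[i], edges[i + 1]
    is_last = i == len(edges) - 2
    count = sum(1 for v in data if lo <= v < hi or (is_last and v == hi))
    manual_counts.append(count)

# NumPy does the same counting in one call
counts, bin_edges = np.histogram(data, bins=edges)

print(manual_counts)    # [3, 2, 5]
print(counts.tolist())  # [3, 2, 5]
```

The bar heights of a histogram are exactly these counts, one bar per bin.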
To create a histogram with Matplotlib, first import Matplotlib with the standard line:
import matplotlib.pyplot as plt
In our first example, we will also import NumPy with the line import numpy as np. We'll use NumPy's random number generator to create a dataset for the histogram. If using a Jupyter notebook, include the line %matplotlib inline below the imports.
import matplotlib.pyplot as plt
import numpy as np
# if using a Jupyter notebook, include:
%matplotlib inline
For our dataset, we will use a mean mu = 80 and a standard deviation sigma = 7. NumPy's np.random.normal() function produces an array of random numbers with a normal distribution. 200 random numbers is a reasonable amount of data to plot. The general format of np.random.normal() is below:
var = np.random.normal(mean, stdev, size=<number of values>)
mu = 80
sigma = 7
x = np.random.normal(mu, sigma, size=200)
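Because the values are random, each run produces a different array. The minimal sketch below assumes you want repeatable results, so it fixes NumPy's random seed first. The seed value 42 is an arbitrary choice. It then checks that the sample's mean and standard deviation land near mu and sigma.

```python
import numpy as np

np.random.seed(42)  # arbitrary seed so the same 200 numbers are drawn each run

mu = 80
sigma = 7
x = np.random.normal(mu, sigma, size=200)

print(x.shape)  # (200,)
# Close to 80 and 7, but not exactly equal, since x is a random sample
print(round(x.mean(), 1), round(x.std(), 1))
```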
Matplotlib's plt.hist() function produces the histogram plot. The first positional argument passed to plt.hist() is a list or array of values; the second positional argument sets the number of bins on the histogram.
plt.hist(values, num_bins)
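plt.hist() also returns what it drew. The sketch below assumes a non-interactive session, so it selects the Agg backend and no window opens. It unpacks the return value into the bar heights n, the bin edges bins, and the drawn bar patches. With 20 bins there are 21 edges.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: no display needed, render off-screen
import matplotlib.pyplot as plt
import numpy as np

np.random.seed(0)  # arbitrary seed for repeatability
x = np.random.normal(80, 7, size=200)

num_bins = 20
n, bins, patches = plt.hist(x, num_bins)
plt.close()

print(len(n))     # 20 bar heights (counts)
print(len(bins))  # 21 bin edges
print(n.sum())    # 200.0: every value lands in some bin
```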
As with Matplotlib line plots, bar plots, and pie charts, keyword arguments can be included in the plt.hist() function call. Specifying values for the keyword arguments customizes the histogram. Example keyword arguments that can be included with plt.hist() are:
density=
histtype=
facecolor=
alpha= (opacity)
plt.hist(x, 20, density=True, histtype='bar', facecolor='b', alpha=0.5)
plt.title('Histogram')
plt.xlabel('x-axis')
plt.ylabel('y-axis')
plt.show()
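With density=True, the y-axis no longer shows raw counts. Instead, the bar heights are scaled so that the total area of the bars (height times bin width, summed over all bars) equals 1. The short sketch below confirms this using the same kind of data as above. The seed and the headless Agg backend are assumptions for a repeatable, window-free run.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: render off-screen
import matplotlib.pyplot as plt
import numpy as np

np.random.seed(0)  # arbitrary seed
x = np.random.normal(80, 7, size=200)

n, bins, _ = plt.hist(x, 20, density=True, histtype='bar', facecolor='b', alpha=0.5)
plt.close()

widths = np.diff(bins)           # width of each bin
total_area = np.sum(n * widths)  # sum of the bar areas
print(total_area)                # 1.0, up to floating-point rounding
```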
Our next histogram uses a set of commute times (in minutes) recorded in a survey:
23, 25, 40, 35, 36, 47, 33, 28, 48, 34, 20, 37, 36, 23, 33, 36, 20, 27, 50, 34, 47, 18, 28, 52, 21, 44, 34, 13, 40, 49
We will plot a histogram of these commute times. First, import Matplotlib as in the previous example, and include %matplotlib inline if using a Jupyter notebook. Then build a Python list of commute times from the survey data above.
import matplotlib.pyplot as plt
# if using a Jupyter notebook, include:
%matplotlib inline
commute_times = [23, 25, 40, 35, 36, 47, 33, 28, 48, 34, 20, 37, 36, 23, 33, 36, 20, 27, 50, 34, 47, 18, 28, 52, 21, 44, 34, 13, 40, 49]
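Before choosing bins, it can help to look at the spread of the data. A quick check with Python built-ins shows that the survey has 30 commute times ranging from 13 to 52 minutes. Bins covering 0 to 60 minutes would therefore include every value.

```python
commute_times = [23, 25, 40, 35, 36, 47, 33, 28, 48, 34, 20, 37, 36, 23, 33,
                 36, 20, 27, 50, 34, 47, 18, 28, 52, 21, 44, 34, 13, 40, 49]

print(len(commute_times))                      # 30
print(min(commute_times), max(commute_times))  # 13 52
```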
Next, plt.hist() is called with the commute_times list and 5 bins as positional arguments.
plt.hist(commute_times, 5)
plt.show()
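When the number of bins is given as an integer, the bins have equal width and span from the smallest to the largest value. For the commute times (13 to 52 min), 5 bins are each 7.8 minutes wide. Matplotlib's documentation states that plt.hist() bins the data with numpy.histogram, so the sketch below calls np.histogram() directly to show the bin edges and counts behind the plot.

```python
import numpy as np

commute_times = [23, 25, 40, 35, 36, 47, 33, 28, 48, 34, 20, 37, 36, 23, 33,
                 36, 20, 27, 50, 34, 47, 18, 28, 52, 21, 44, 34, 13, 40, 49]

# Same binning as plt.hist(commute_times, 5)
counts, edges = np.histogram(commute_times, bins=5)

print(np.round(edges, 1).tolist())  # [13.0, 20.8, 28.6, 36.4, 44.2, 52.0]
print(counts.tolist())              # [4, 7, 9, 4, 6]
```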
Instead of a number of bins, plt.hist() also accepts a list of bin edges through the keyword argument bins=. A table of select keyword arguments used with plt.hist() is below:
keyword argument   description                                         example
bins=              number of bins or list of bin edges                 bins=[5, 10, 20, 30]
density=           if True, bars are normalized so total area is 1     density=False
histtype=          type of histogram                                   histtype='bar'
color=             bar color                                           color='b'
edgecolor=         bar edge color                                      edgecolor='k'
alpha=             bar opacity                                         alpha=0.5
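When bins= is a list of edges, the bins do not need to be equal width, and values outside the first and last edges are not counted. Each bin includes its left edge but not its right edge, except the last bin, which includes both. The sketch below uses the example edges from the table with hypothetical, made-up data chosen to land on and outside the edges. It assumes the headless Agg backend.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: render off-screen
import matplotlib.pyplot as plt

# Hypothetical data: 3 and 41 lie outside the edges, 30 sits on the last edge
data = [3, 7, 12, 18, 25, 29, 30, 41]
n, bins, _ = plt.hist(data, bins=[5, 10, 20, 30])
plt.close()

# Bins: [5, 10) -> 7;  [10, 20) -> 12, 18;  [20, 30] -> 25, 29, 30
print(n.tolist())  # [1.0, 2.0, 3.0]; 3 and 41 are not counted
```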
For the next histogram, we will specify bins in 15-minute increments. This means our bin edges are [0, 15, 30, 45, 60]. We will also specify density=False, color='b' (blue), edgecolor='k' (black), and alpha=0.5 (half transparent).
The lines plt.xlabel(), plt.ylabel(), and plt.title() give the histogram axis labels and a title. plt.xticks() specifies the locations of the tick labels on the x-axis. Since the bin edges are in 15-minute intervals, it makes sense to space the tick labels in 15-minute intervals as well.
bin_edges = [0, 15, 30, 45, 60]
plt.hist(commute_times, bins=bin_edges, density=False, histtype='bar', color='b', edgecolor='k', alpha=0.5)
plt.xlabel('Commute time range (min)')
plt.xticks([0, 15, 30, 45, 60])
plt.ylabel('Number of commuters')
plt.title('Histogram of commute times')
plt.show()
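The bar heights of this commute-time plot can be read from the return value of plt.hist(). The sketch below assumes the headless Agg backend. As an alternative to typing the edges out, it builds the same edges with np.arange(), then confirms that all 30 commute times are counted.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: render off-screen
import matplotlib.pyplot as plt
import numpy as np

commute_times = [23, 25, 40, 35, 36, 47, 33, 28, 48, 34, 20, 37, 36, 23, 33,
                 36, 20, 27, 50, 34, 47, 18, 28, 52, 21, 44, 34, 13, 40, 49]

bin_edges = np.arange(0, 61, 15)  # 0, 15, 30, 45, 60 (stop value 61 is excluded)
n, bins, _ = plt.hist(commute_times, bins=bin_edges, density=False,
                      histtype='bar', color='b', edgecolor='k', alpha=0.5)
plt.close()

print(bin_edges.tolist())  # [0, 15, 30, 45, 60]
print(n.tolist())          # [1.0, 10.0, 13.0, 6.0]
print(n.sum())             # 30.0
```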