Creating figures and subplots
Defining figures’ subplots
import matplotlib.pyplot as plt
fig = plt.figure(figsize=(12,6), dpi=200)
Add subplots to the figure with fig.add_subplot(nrows, ncols, index):
ax1 = fig.add_subplot(1, 2, 1)
ax2 = fig.add_subplot(2, 2, 2)
ax3 = fig.add_subplot(2, 2, 4)
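The three arguments to add_subplot are the number of rows, the number of columns, and a 1-based index counted row by row. The sketch below is illustrative only (the `geometries` and `left_of_right_column` names are not from the original notes); it reads back each subplot's grid geometry to show how a 1x2 grid and a 2x2 grid combine, so that ax1 fills the left half while ax2 and ax3 stack in the right half.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

fig = plt.figure(figsize=(12, 6))
ax1 = fig.add_subplot(1, 2, 1)  # 1x2 grid, slot 1: the whole left half
ax2 = fig.add_subplot(2, 2, 2)  # 2x2 grid, slot 2: top-right quarter
ax3 = fig.add_subplot(2, 2, 4)  # 2x2 grid, slot 4: bottom-right quarter

geometries = {}
for name, ax in [("ax1", ax1), ("ax2", ax2), ("ax3", ax3)]:
    # get_geometry() starts with (rows, cols, first_cell); cell numbers are 0-based
    geometries[name] = ax.get_subplotspec().get_geometry()
    print(name, geometries[name])

# ax1 ends (in figure coordinates) before the right-hand column begins
left_of_right_column = ax1.get_position().x1 < ax2.get_position().x0
plt.close(fig)
```

Mixing grids this way works because each call only describes where its own Axes sits; the calls do not need to agree on a single grid.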
Plotting in subplots
import numpy as np
x = np.linspace(0, 1, num=20)
y1 = np.square(x)
ax1.plot(x, y1, color='black', linestyle='--')
y2 = np.sin(x)
ax2.plot(x, y2, color='black', linestyle=':')
y3 = np.cos(x)
ax3.plot(x, y3, color='black', linestyle='-.')
The sharex= parameter can be passed when creating subplots to specify that all the subplots should share the same x axis.
fig, (ax1, ax2) = plt.subplots(2, figsize=(12,6), sharex=True)
ax1.plot(x, y1, color='black', linestyle='--')
y2 = np.power(x, 10)
ax2.plot(x, y2, color='black', linestyle='-.')
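A quick way to see what sharex=True does is to change the x range on one subplot and read it back from the other. This is a minimal sketch under that assumption; the names `ax_top`, `ax_bottom`, and `shared_limits` are illustrative.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 1, num=20)
fig, (ax_top, ax_bottom) = plt.subplots(2, figsize=(12, 6), sharex=True)
ax_top.plot(x, np.square(x), color='black', linestyle='--')
ax_bottom.plot(x, np.power(x, 10), color='black', linestyle='-.')

# Changing the x range on one subplot also changes it on the other.
ax_top.set_xlim(0.25, 0.75)
shared_limits = ax_bottom.get_xlim()
print(shared_limits)
plt.close(fig)
```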
Enriching plots with colors, markers, and line styles
The code block that follows plots four different functions and uses the following parameters to modify the appearance:
- The color= parameter is used to assign colors.
- The linewidth= parameter is used to change the width/thickness of the lines.
- The marker= parameter assigns different shapes to mark the data points.
- The markersize= parameter changes the size of those markers.
- The alpha= parameter is used to modify the transparency.
- The drawstyle= parameter changes the default line connectivity to step connectivity between data points for one plot.
fig, (ax1, ax2, ax3, ax4) = plt.subplots(4, figsize=(12,12), sharex=True)
x = np.linspace(0, 10, num=20)
y1 = np.exp(x)
y2 = x ** 3
y3 = np.sin(y2)
y4 = np.random.randn(20)
ax1.plot(x, y1, color='black', linestyle='--', linewidth=5, marker='x', markersize=15)
ax2.plot(x, y2, color='green', linestyle='-.', linewidth=2, marker='^', markersize=10, alpha=0.9)
ax3.plot(x, y3, color='red', linestyle=':', marker='*', markersize=15, drawstyle='steps')
ax4.plot(x, y4, color='green', linestyle='-', marker='s', markersize=15)
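The styling parameters listed above are stored on the Line2D object that plot(…) returns, so they can also be inspected or changed after the call. The following sketch assumes a single line and uses illustrative names (`line`, `props_before`).

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 10, num=20)
fig, ax = plt.subplots(1, figsize=(6, 3))

# plot() returns a list of Line2D objects, one per plotted line
(line,) = ax.plot(x, x ** 3, color='green', linestyle='-.', linewidth=2,
                  marker='^', markersize=10, alpha=0.9)
props_before = (line.get_linewidth(), line.get_marker(),
                line.get_markersize(), line.get_alpha())
print(props_before)

# The same properties can be adjusted later through the matching setters
line.set_linewidth(5)
line.set_drawstyle('steps')
plt.close(fig)
```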
Enriching axes with ticks, labels, and legends
The matplotlib.pyplot.xlim(…) method sets the range of values on the x axis.
The matplotlib.pyplot.xticks(…) method specifies where the ticks show up on the x axis:
plt.xlim([8, 10.5])
plt.xticks([8, 8.42, 8.94, 9.47, 10, 10.5])
plt.plot(x, y1, color='black', linestyle='--', marker='o')
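plt.xlim(…) and plt.xticks(…) act on whichever Axes is currently active. When a figure has several subplots, the Axes-level methods set_xlim(…) and set_xticks(…) make the target explicit. The sketch below is one way to write the same example with those methods; `tick_positions` is an illustrative name.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 10, num=20)
y1 = np.exp(x)

fig, ax = plt.subplots(1, figsize=(12, 6))
ax.plot(x, y1, color='black', linestyle='--', marker='o')
ax.set_xlim(8, 10.5)                             # visible x range
ax.set_xticks([8, 8.42, 8.94, 9.47, 10, 10.5])   # explicit tick positions
tick_positions = [float(t) for t in ax.get_xticks()]
print(ax.get_xlim(), tick_positions)
plt.close(fig)
```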
Change the scale of one of the axes to non-linear using the matplotlib.axes.Axes.set_yscale(…) method.
The matplotlib.axes.Axes.set_xticklabels(…) method changes the labels on the x axis:
fig, ax = plt.subplots(1, figsize=(12,6))
ax.set_yscale('log')
ax.set_xticks(x)
ax.set_xticklabels(list('ABCDEFGHIJKLMNOPQRST'))  # 20 labels, one per tick
ax.plot(x, y1, color='black', linestyle='--', marker='o', label='y=exp(x)')
Add a title to the plot and set labels for the x and y axes
Adding a legend makes the plots easier to interpret. The loc= parameter specifies the location of the legend on the plot; loc='best' lets Matplotlib pick the best location automatically:
ax.set_title('xtickslabel example')
ax.set_xlabel('x labels')
ax.set_ylabel('log scale y values')
ax.legend(loc='best')
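Put together, the axis steps in this section can be sketched as a single self-contained example. It assumes one tick label per tick position (recent Matplotlib versions reject a mismatch), and the names `tick_labels`, `n_ticks`, and `legend_entries` are illustrative.

```python
import string
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 10, num=20)
y = np.exp(x)

fig, ax = plt.subplots(1, figsize=(12, 6))
ax.set_yscale('log')   # exp(x) appears as a straight line on a log y axis
ax.set_xticks(x)       # one tick per data point
tick_labels = list(string.ascii_uppercase[:len(x)])  # one label per tick: 'A'..'T'
ax.set_xticklabels(tick_labels)
ax.plot(x, y, color='black', linestyle='--', marker='o', label='y=exp(x)')

ax.set_title('xtickslabel example')
ax.set_xlabel('x labels')
ax.set_ylabel('log scale y values')
legend = ax.legend(loc='best')  # the label= passed to plot() becomes the legend entry

n_ticks = len(ax.get_xticks())
legend_entries = [t.get_text() for t in legend.get_texts()]
plt.close(fig)
```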
Enriching data points with annotations
Add a text box to our plots
ax.text(1, 10000, 'Generated using numpy and matplotlib')
The matplotlib.axes.Axes.annotate(…) method provides more control over the annotations.
The code block that follows uses the following parameters to control the annotation:
- The xy= parameter specifies the location of the data point.
- The xytext= parameter specifies the location of the text box.
- The arrowprops= parameter accepts a dictionary specifying parameters to control the arrow from the text box to the data point.
- Inside the arrowprops= dictionary, the facecolor key specifies the color of the arrow and the shrink key specifies the fraction of the arrow's total length to trim from both ends.
- The horizontalalignment= and verticalalignment= parameters specify the orientation of the text box relative to the data point.
for i in [5, 10, 15]:
    s = '(x=' + str(x[i]) + ',y=' + str(y1[i]) + ')'
    ax.annotate(s, xy=(x[i], y1[i]), xytext=(x[i]+1, y1[i]-5), arrowprops=dict(facecolor='black', shrink=0.05), horizontalalignment='left', verticalalignment='top')
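As a minimal sketch of a single annotation, the example below annotates one data point and reads the result back from the Annotation object that annotate(…) returns. Formatting the label to a fixed number of decimals is a choice made here for readability, not something the original code does; `point`, `text_pos`, and `label` are illustrative names.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 10, num=20)
y = np.exp(x)

fig, ax = plt.subplots(1, figsize=(12, 6))
ax.plot(x, y, color='black', linestyle='--', marker='o')

i = 10
point = (float(x[i]), float(y[i]))       # the data point the arrow points at
text_pos = (point[0] + 1, point[1] - 5)  # where the text box is placed
label = '(x={:.2f}, y={:.1f})'.format(*point)

ann = ax.annotate(label, xy=point, xytext=text_pos,
                  arrowprops=dict(facecolor='black', shrink=0.05),
                  horizontalalignment='left', verticalalignment='top')
print(ann.get_text(), ann.xy)
plt.close(fig)
```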
The matplotlib.axes.Axes.add_patch(…) method can be used to add different shape annotations.
The code block that follows adds a matplotlib.pyplot.Circle object, which accepts the following:
- The xy= parameter to specify the location
- The radius= parameter to specify the circle radius
- The color= parameter to specify the color of the circle
fig, ax = plt.subplots(1, figsize=(12,6))
ax.plot(x, x, linestyle='--', color='black', marker='*', markersize=15)
for val in x:
    ax.add_patch(plt.Circle(xy=(val, val), radius=0.3, color='darkgray'))
Saving plots to files
fig.savefig('fig.png', dpi=200)
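The output format is inferred from the file extension, and dpi= controls the resolution of raster output. The sketch below writes to a temporary directory (a hypothetical scratch location chosen for this example) and checks that the file starts with the PNG signature. bbox_inches='tight' is an optional extra that trims surrounding whitespace.

```python
import os
import tempfile
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

fig, ax = plt.subplots(1, figsize=(6, 3))
ax.plot([0, 1, 2], [0, 1, 4], color='black')

out_dir = tempfile.mkdtemp()                 # hypothetical scratch location
png_path = os.path.join(out_dir, 'fig.png')  # '.png' selects the PNG writer
fig.savefig(png_path, dpi=200, bbox_inches='tight')
plt.close(fig)

# Read back the first bytes to confirm a PNG file was written
with open(png_path, 'rb') as fh:
    header = fh.read(8)
print(os.path.getsize(png_path), header)
```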
Charting a pandas DataFrame with Matplotlib
import pandas as pd
df = pd.DataFrame(index=range(1000), columns=['Cont value', 'Delta1 value', 'Delta2 value', 'Cat value'])
df['Cont value'] = np.random.randn(1000).cumsum()
df['Delta1 value'] = np.random.randn(1000)
df['Delta2 value'] = np.random.randn(1000)
df['Cat value'] = np.random.permutation(['Very high', 'High', 'Medium', 'Low', 'Very Low']*200)
df['Delta1 discrete'] = pd.cut(df['Delta1 value'], labels=[-2, -1, 0, 1, 2], bins=5).astype(np.int64)
df['Delta2 discrete'] = pd.cut(df['Delta2 value'], labels=[-2, -1, 0, 1, 2], bins=5).astype(np.int64)
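With bins=5, pd.cut splits the observed range of the column into five equal-width intervals and assigns each value the label of the interval it falls in. The labels therefore describe position within the observed range rather than any fixed unit. A small sketch on made-up values (the `values`, `edges`, and `discrete` names are illustrative):

```python
import numpy as np
import pandas as pd

values = pd.Series([-2.0, -1.5, -1.0, 0.0, 0.3, 1.0, 2.0])

# retbins=True also returns the interval edges that pd.cut computed
binned, edges = pd.cut(values, bins=5, labels=[-2, -1, 0, 1, 2], retbins=True)
discrete = binned.astype(np.int64)  # categorical labels -> plain integers

print(edges)
print(discrete.tolist())
```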
Creating line plots of a DataFrame column
Plot the 'Cont value' column as a line plot with the kind='line' parameter:
df.plot(y='Cont value', kind='line', color='black', linestyle='-', figsize=(12,6))
Creating bar plots of a DataFrame column
df.groupby('Cat value')['Delta1 discrete'].value_counts().plot(kind='bar', color='darkgray', title='Occurrence by (cat,Delta1)', figsize=(12,6))
The kind='barh' parameter builds a horizontal bar plot instead of a vertical one:
df.groupby('Delta2 discrete')['Cat value'].value_counts().plot(kind='barh', color='darkgray', title='Occurrence by (Delta2,Cat)', figsize=(12,12))
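The bar labels in these plots come from the two-level index that groupby(…) followed by value_counts() produces: one level for the grouping column and one for the counted column. A minimal sketch on a toy DataFrame (the `toy` data and `counts` name are made up for illustration):

```python
import pandas as pd

toy = pd.DataFrame({
    'Cat value': ['High', 'High', 'Low', 'Low', 'Low'],
    'Delta1 discrete': [0, 0, 1, 0, 0],
})

# One count per (Cat value, Delta1 discrete) pair that occurs in the data
counts = toy.groupby('Cat value')['Delta1 discrete'].value_counts()
print(counts)
```

Each entry of `counts` becomes one bar when `.plot(kind='bar')` or `.plot(kind='barh')` is called on it.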
Creating histogram and density plot of a DataFrame column
df['Delta1 discrete'].plot(kind='hist', color='darkgray', figsize=(12,6), label='Delta1')
plt.legend()
Build an estimate of the Probability Density Function (PDF) by specifying the kind='kde' parameter, which uses Kernel Density Estimation (KDE) and requires SciPy to be installed:
df['Delta2 discrete'].plot(kind='kde', color='black', figsize=(12,6), label='Delta2 kde')
plt.legend()
Creating scatter plots of two DataFrame columns
df.plot(kind='scatter', x='Delta1 value', y='Delta2 value', alpha=0.5, color='black', figsize=(8,8))
Build a matrix of scatter plots on non-diagonal entries and histogram/KDE plot on the diagonal entries of the matrix
pd.plotting.scatter_matrix(df[['Delta1 value', 'Delta2 value']], diagonal='kde', color='black',figsize=(8,8))
Plotting time series data
Create a pandas DataFrame containing prices for two hypothetical trading instruments, A and B. The DataFrame is indexed by a DatetimeIndex of daily dates from 1992 to 2012:
dates = pd.date_range('1992-01-01', '2012-10-22')
time_series = pd.DataFrame(index=dates, columns=['A','B'])
time_series['A'] = np.random.randint(low=-100, high=101, size=len(dates)).cumsum() + 5000
time_series['B'] = np.random.randint(low=-75, high=76, size=len(dates)).cumsum() + 5000
Plotting prices in a line plot
time_series['A'].plot(kind='line', linestyle='-', color='black', figsize=(12,6), label='A')
time_series['B'].plot(kind='line', linestyle='-.', color='darkgray', figsize=(12,6), label='B')
plt.legend()
Plotting price change histograms
The usual next step in financial time series analysis is to inspect changes in price over some duration.
time_series['A_1_delta'] = time_series['A'].shift(-1) - time_series['A'].fillna(0)
time_series['B_1_delta'] = time_series['B'].shift(-1) - time_series['B'].fillna(0)
time_series['A_5_delta'] = time_series['A'].shift(-5) - time_series['A'].fillna(0)
time_series['B_5_delta'] = time_series['B'].shift(-5) - time_series['B'].fillna(0)
time_series['A_20_delta'] = time_series['A'].shift(-20) - time_series['A'].fillna(0)
time_series['B_20_delta'] = time_series['B'].shift(-20) - time_series['B'].fillna(0)
time_series_deltas = time_series[['A_1_delta', 'B_1_delta', 'A_5_delta', 'B_5_delta', 'A_20_delta', 'B_20_delta']].dropna()
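To make the shift arithmetic concrete, the sketch below applies the same pattern to a five-day toy series (the prices are made up). shift(-n) moves each value n rows earlier, so subtracting the unshifted series gives the price n days ahead minus today's price, and the last n rows have no future value and become NaN. In the expressions above, .fillna(0) applies only to the unshifted series, so those trailing NaN rows remain until dropna() removes them.

```python
import pandas as pd

prices = pd.Series([10, 12, 15, 11, 14],
                   index=pd.date_range('2020-01-01', periods=5))

delta_1 = prices.shift(-1) - prices  # next day's price minus today's
delta_2 = prices.shift(-2) - prices  # price two days ahead minus today's

# Keep only the rows where every look-ahead delta is defined
deltas = pd.DataFrame({'delta_1': delta_1, 'delta_2': delta_2}).dropna()
print(deltas)
```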
Plot the price change histogram for A
time_series_deltas['A_20_delta'].plot(kind='hist', color='black', alpha=0.5, label='A_20_delta', figsize=(8,8))
time_series_deltas['A_5_delta'].plot(kind='hist', color='darkgray', alpha=0.5, label='A_5_delta', figsize=(8,8))
time_series_deltas['A_1_delta'].plot(kind='hist', color='lightgray', alpha=0.5, label='A_1_delta', figsize=(8,8))
plt.legend()
Creating price change density plots
time_series_deltas['B_20_delta'].plot(kind='kde', linestyle='-', linewidth=2, color='black', label='B_20_delta', figsize=(8,8))
time_series_deltas['B_5_delta'].plot(kind='kde', linestyle=':', linewidth=2, color='black', label='B_5_delta', figsize=(8,8))
time_series_deltas['B_1_delta'].plot(kind='kde', linestyle='--', linewidth=2, color='black', label='B_1_delta', figsize=(8,8))
plt.legend()
Creating box plots by interval
group_A = time_series[['A']].groupby(pd.Grouper(freq='A'))
group_A.boxplot(color='black', subplots=False, rot=90, figsize=(12,12))
Box plots with whiskers are used for visualising groups of numerical data through their corresponding quartiles:
- The box’s lower bound corresponds to the lower quartile, while the box’s upper bound represents the group’s upper quartile.
- The line within the box displays the value of the median of the interval.
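- The line below the box (the lower whisker) ends at the lowest observation that lies within 1.5 times the interquartile range below the lower quartile, which is the Matplotlib default.
- The line above the box (the upper whisker) ends at the highest observation within the same distance above the upper quartile. Any points beyond the whiskers are drawn individually as outliers.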
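The quantities a box plot draws can be inspected numerically with matplotlib.cbook.boxplot_stats, which computes the statistics Matplotlib uses for its box plots. The sketch below uses a made-up sample with one extreme value. It assumes the default whisker setting, under which whiskers reach the most extreme observations within 1.5 times the interquartile range of the box and anything further out is reported as an outlier.

```python
import numpy as np
from matplotlib import cbook

data = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 100])

# boxplot_stats returns one dict of statistics per dataset
stats = cbook.boxplot_stats(data)[0]

print('lower quartile:', stats['q1'])
print('median:', stats['med'])
print('upper quartile:', stats['q3'])
print('whiskers:', stats['whislo'], stats['whishi'])
print('outliers:', list(stats['fliers']))
```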
Creating lag scatter plots
Visualise the relationships between the different price change variables using the pandas.plotting.scatter_matrix(…) method
pd.plotting.scatter_matrix(time_series[['A_1_delta', 'A_5_delta', 'A_20_delta', 'B_1_delta', 'B_5_delta', 'B_20_delta']], diagonal='kde', color='black', alpha=0.25, figsize=(12,12))
Use the pandas.plotting.lag_plot(…) method with different lag= values to specify different levels of lag to generate the scatter plots between prices and lagged prices for A
fig, (ax1, ax2, ax3) = plt.subplots(3, figsize=(12,12))
pd.plotting.lag_plot(time_series['A'], ax=ax1, lag=1, c='black', alpha=0.2)
pd.plotting.lag_plot(time_series['A'], ax=ax2, lag=7, c='black', alpha=0.2)
pd.plotting.lag_plot(time_series['A'], ax=ax3, lag=20, c='black', alpha=0.2)
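A lag plot is a scatter plot of each observation against the observation a fixed number of steps later. The sketch below builds those pairs by hand for a simulated price series, generated the same way as instrument A but with a seeded generator and an arbitrary length of 5000 (both assumptions for this example), and measures how strongly they are correlated.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # seeded so the sketch is repeatable
steps = rng.integers(low=-100, high=101, size=5000)
prices = pd.Series(steps.cumsum() + 5000)

lag = 7
# Pair each price y(t) with the price `lag` steps later, y(t + lag)
pairs = pd.DataFrame({
    'y(t)': prices.iloc[:-lag].to_numpy(),
    'y(t+lag)': prices.iloc[lag:].to_numpy(),
})
lag_corr = pairs['y(t)'].corr(pairs['y(t+lag)'])
print(len(pairs), lag_corr)
```

For a cumulative-sum series like this one the points cluster around the diagonal, which is why the correlation stays high even at larger lags.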
Creating autocorrelation plots
fig, ax = plt.subplots(1, figsize=(12,6))
pd.plotting.autocorrelation_plot(time_series['A'], ax=ax)
Autocorrelation plots summarise the randomness of a time series. For a random time series, all autocorrelations would be close to 0 for all lags. For a non-random time series, at least one of the autocorrelations would be significantly non-zero.
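The contrast described above can be checked numerically with Series.autocorr. This sketch compares white noise with its cumulative sum (a random walk, like the simulated prices) at a few arbitrary lags. The seed, sample size, and lag choices are assumptions made for the example.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)  # seeded so the sketch is repeatable
noise = pd.Series(rng.standard_normal(5000))  # random series
walk = noise.cumsum()                         # non-random: each value builds on the last

lags = (1, 5, 20)
noise_acf = [noise.autocorr(lag=k) for k in lags]
walk_acf = [walk.autocorr(lag=k) for k in lags]

print('noise:', np.round(noise_acf, 3))
print('walk: ', np.round(walk_acf, 3))
```

The noise autocorrelations sit near 0 at every lag, while the random walk's stay close to 1.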






