Data Visualisation Using Matplotlib

Creating figures and subplots

Defining figures’ subplots

import matplotlib.pyplot as plt
fig = plt.figure(figsize=(12,6), dpi=200)

Add subplots to the figure

ax1 = fig.add_subplot(1, 2, 1)
ax2 = fig.add_subplot(2, 2, 2)
ax3 = fig.add_subplot(2, 2, 4)
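
The three arguments to add_subplot are the number of rows, the number of columns, and the index of the cell, counted row by row starting at 1. The sketch below rebuilds the same layout and reads back where each axes landed. The names ax_left, ax_top_right and ax_bottom_right are chosen here for illustration, and the Agg backend line is only an assumption that the code runs as a script without a display.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for scripts; omit this line in a notebook
import matplotlib.pyplot as plt

fig = plt.figure(figsize=(12, 6))

# Cell 1 of a 1x2 grid: the whole left half of the figure
ax_left = fig.add_subplot(1, 2, 1)
# Cells 2 and 4 of a 2x2 grid: the top-right and bottom-right quarters
ax_top_right = fig.add_subplot(2, 2, 2)
ax_bottom_right = fig.add_subplot(2, 2, 4)

# Bounding boxes in figure coordinates (0 to 1), handy for checking a layout
left_box = ax_left.get_position()
top_box = ax_top_right.get_position()
bottom_box = ax_bottom_right.get_position()
```

Mixing grid shapes like this works because each call only reserves the cell it names; the grids themselves are never drawn.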

Plotting in subplots

import numpy as np
x = np.linspace(0, 1, num=20)
y1 = np.square(x)
ax1.plot(x, y1, color='black', linestyle='--')

y2 = np.sin(x)
ax2.plot(x, y2, color='black', linestyle=':')

y3 = np.cos(x)
ax3.plot(x, y3, color='black', linestyle='-.')

The sharex= parameter can be passed when creating subplots to specify that all the subplots should share the same x axis.

fig, (ax1, ax2) = plt.subplots(2, figsize=(12,6), sharex=True)
ax1.plot(x, y1, color='black', linestyle='--')
y2 = np.power(x, 10)
ax2.plot(x, y2, color='black', linestyle='-.')
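
One way to see what sharing an axis means is to change the x range on one subplot and read it back from the other. This is a minimal sketch; the names ax_top and ax_bottom and the chosen limits are illustrative assumptions.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for scripts; omit this line in a notebook
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 1, num=20)
fig, (ax_top, ax_bottom) = plt.subplots(2, figsize=(12, 6), sharex=True)
ax_top.plot(x, np.square(x), color='black', linestyle='--')
ax_bottom.plot(x, np.power(x, 10), color='black', linestyle='-.')

# Changing the x range on one axes is mirrored on the other
ax_bottom.set_xlim(0.25, 0.75)
top_limits = ax_top.get_xlim()
bottom_limits = ax_bottom.get_xlim()
```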

Enriching plots with colors, markers, and line styles

The code block that follows plots four different functions and uses the following parameters to modify the appearance:

The color= parameter is used to assign colors.
The linewidth= parameter is used to change the width/thickness of the lines.
The marker= parameter assigns different shapes to mark the data points.
The markersize= parameter changes the size of those markers.
The alpha= parameter is used to modify the transparency.
The drawstyle= parameter changes the default line connectivity to step connectivity between data points for one plot.

fig, (ax1, ax2, ax3, ax4) = plt.subplots(4, figsize=(12,12), sharex=True)
x = np.linspace(0, 10, num=20)
y1 = np.exp(x)
y2 = x ** 3
y3 = np.sin(y2)
y4 = np.random.randn(20)

ax1.plot(x, y1, color='black', linestyle='--', linewidth=5, marker='x', markersize=15)
ax2.plot(x, y2, color='green', linestyle='-.', linewidth=2, marker='^', markersize=10, alpha=0.9)
ax3.plot(x, y3, color='red', linestyle=':', marker='*', markersize=15, drawstyle='steps')
ax4.plot(x, y4, color='green', linestyle='-', marker='s', markersize=15)
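
Each call to plot returns the line objects it created, so the styling can be read back when a plot does not look as expected. The sketch below is a hedged illustration on a single axes; the style dictionary is a helper introduced here, and 'steps-post' is used instead of 'steps' only to make the step direction explicit.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for scripts; omit this line in a notebook
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 10, num=20)
fig, ax = plt.subplots(figsize=(12, 4))

# ax.plot returns a list of Line2D objects; unpack the single line
(line,) = ax.plot(
    x, x ** 3,
    color='green', linestyle='-.', linewidth=2,
    marker='^', markersize=10, alpha=0.9,
    drawstyle='steps-post',  # hold each value until the next x position
)

# Read the appearance settings back from the line object
style = {
    'linewidth': line.get_linewidth(),
    'marker': line.get_marker(),
    'markersize': line.get_markersize(),
    'alpha': line.get_alpha(),
    'drawstyle': line.get_drawstyle(),
}
```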

Enriching axes with ticks, labels, and legends

The matplotlib.pyplot.xlim(…) method sets the range of values on the x axis.

The matplotlib.pyplot.xticks(…) method specifies where the ticks show up on the x axis:

plt.xlim([8, 10.5])
plt.xticks([8, 8.42, 8.94, 9.47, 10, 10.5])
plt.plot(x, y1, color='black', linestyle='--', marker='o')

Change the scale of one of the axes to non-linear using the matplotlib.axes.Axes.set_yscale(…) method.

The matplotlib.axes.Axes.set_xticklabels(…) method changes the labels on the x axis:

fig, ax = plt.subplots(1, figsize=(12,6))
ax.set_yscale('log')
ax.set_xticks(x)
ax.set_xticklabels(list('ABCDEFGHIJKLMNOPQRST'))  # 20 labels, one per tick in x
ax.plot(x, y1, color='black', linestyle='--', marker='o', label='y=exp(x)')
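
When tick positions are fixed with set_xticks, the number of labels should match the number of ticks; recent Matplotlib releases reject a mismatch. A minimal sketch that derives the labels from the tick count rather than hard-coding them, assuming letters are acceptable labels (the letters and tick_labels names are illustrative):

```python
import string

import matplotlib
matplotlib.use("Agg")  # headless backend for scripts; omit this line in a notebook
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 10, num=20)
y = np.exp(x)

fig, ax = plt.subplots(figsize=(12, 6))
ax.set_yscale('log')   # exp(x) appears as a straight line on a log axis
ax.set_xticks(x)       # one tick per data point

# Take exactly as many letters as there are ticks
letters = list(string.ascii_uppercase[:len(x)])
tick_labels = ax.set_xticklabels(letters)

ax.plot(x, y, color='black', linestyle='--', marker='o', label='y=exp(x)')
```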

Add a title to the plot and set labels for the x and y axes.

Adding a legend makes the plots easier to interpret. The loc= parameter specifies the location of the legend on the plot; with loc='best', Matplotlib picks the location automatically.

ax.set_title('xtickslabel example')
ax.set_xlabel('x labels')
ax.set_ylabel('log scale y values')
ax.legend(loc='best')

Enriching data points with annotations

Add a text box to our plots

ax.text(1, 10000, 'Generated using numpy and matplotlib')

The matplotlib.axes.Axes.annotate(…) method provides more control over the annotations.

The code block that follows uses the following parameters to control the annotation:

The xy= parameter specifies the location of the data point.
The xytext= parameter specifies the location of the text box.
The arrowprops= parameter accepts a dictionary specifying parameters to control the arrow from the text box to the data point.
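The facecolor= parameter specifies the color of the arrow, and the shrink= parameter specifies the fraction of the arrow's total length to trim from each end, which leaves a small gap at the text box and at the data point.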
The horizontalalignment= and verticalalignment= parameters specify the orientation of the text box relative to the data point.

for i in [5, 10, 15]:
    s = '(x=' + str(x[i]) + ',y=' + str(y1[i]) + ')'
    ax.annotate(s, xy=(x[i], y1[i]), xytext=(x[i]+1, y1[i]-5), arrowprops=dict(facecolor='black', shrink=0.05), horizontalalignment='left', verticalalignment='top')
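
Printing raw floats produces long labels, so a variant with rounded values may be easier to read. The sketch below is self-contained and makes two assumptions of its own: the labels are built with f-strings, and the text is offset by dividing y rather than subtracting, since a fixed subtraction is uneven on a log axis.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for scripts; omit this line in a notebook
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 10, num=20)
y = np.exp(x)

fig, ax = plt.subplots(figsize=(12, 6))
ax.set_yscale('log')
ax.plot(x, y, color='black', linestyle='--', marker='o')

annotations = []
for i in [5, 10, 15]:
    label = f'(x={x[i]:.2f}, y={y[i]:.1f})'
    note = ax.annotate(
        label,
        xy=(x[i], y[i]),              # the data point being described
        xytext=(x[i] + 1, y[i] / 5),  # text placed to the right and below
        arrowprops=dict(facecolor='black', shrink=0.05),
        horizontalalignment='left',
        verticalalignment='top',
    )
    annotations.append(note)
```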

The matplotlib.axes.Axes.add_patch(…) method can be used to add different shape annotations.

The code block that follows adds matplotlib.patches.Circle objects (available as plt.Circle), each of which accepts the following:

The xy= parameter to specify the location
The radius= parameter to specify the circle radius
The color= parameter to specify the color of the circle

fig, ax = plt.subplots(1, figsize=(12,6))
ax.plot(x, x, linestyle='--', color='black', marker='*', markersize=15)
for val in x:
    ax.add_patch(plt.Circle(xy=(val, val), radius=0.3, color='darkgray'))
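
The radius is given in data units, so the circles only look round when both axes use the same scale. A small sketch under that assumption, importing Circle from matplotlib.patches and forcing an equal aspect ratio (a choice made here, not part of the original example):

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for scripts; omit this line in a notebook
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.patches import Circle

x = np.linspace(0, 10, num=20)
fig, ax = plt.subplots(figsize=(6, 6))
ax.plot(x, x, linestyle='--', color='black', marker='*', markersize=15)

# One circle per data point, centred on the point
for val in x:
    ax.add_patch(Circle(xy=(val, val), radius=0.3, color='darkgray'))

# Equal scaling on both axes keeps the circles from being drawn as ellipses
ax.set_aspect('equal')
circle_count = len(ax.patches)
```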

Saving plots to files

fig.savefig('fig.png', dpi=200)
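
The output format is inferred from the file extension. The following sketch writes to a temporary directory so that it does not clutter the working folder; the directory, the bbox_inches='tight' option and the sample data are assumptions made for illustration.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # headless backend for scripts; omit this line in a notebook
import matplotlib.pyplot as plt

fig, ax = plt.subplots(figsize=(6, 3))
ax.plot([0, 1, 2], [0, 1, 4], color='black')

out_dir = tempfile.mkdtemp()
out_path = os.path.join(out_dir, 'fig.png')

# bbox_inches='tight' trims the empty margin around the axes
fig.savefig(out_path, dpi=200, bbox_inches='tight')
plt.close(fig)  # release the figure once it is on disk
```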

Charting a pandas DataFrame with Matplotlib

import pandas as pd

df = pd.DataFrame(index=range(1000), columns=['Cont value', 'Delta1 value', 'Delta2 value', 'Cat value'])

df['Cont value'] = np.random.randn(1000).cumsum()
df['Delta1 value'] = np.random.randn(1000)
df['Delta2 value'] = np.random.randn(1000)
df['Cat value'] = np.random.permutation(['Very high', 'High', 'Medium', 'Low', 'Very Low']*200)
df['Delta1 discrete'] = pd.cut(df['Delta1 value'], labels=[-2, -1, 0, 1, 2], bins=5).astype(np.int64)
df['Delta2 discrete'] = pd.cut(df['Delta2 value'], labels=[-2, -1, 0, 1, 2], bins=5).astype(np.int64)
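
With an integer bins= argument, pd.cut splits the observed range into that many equal-width intervals and assigns each value the label of its interval. A minimal sketch of that step in isolation, using a seeded generator (an assumption added here so the result is repeatable):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
values = pd.Series(rng.standard_normal(1000))

# Five equal-width bins over [min, max], labelled from -2 (lowest) to 2 (highest)
binned = pd.cut(values, bins=5, labels=[-2, -1, 0, 1, 2])

# The result is categorical; convert the labels to plain integers
discrete = binned.astype(np.int64)

# Number of observations per bin, ordered by label
counts = discrete.value_counts().sort_index()
```

Because the bins have equal width rather than equal counts, the middle bins of normally distributed data hold most of the observations.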

Creating line plots of a DataFrame column

Plot 'Cont value' as a line plot by passing kind='line':

df.plot(y='Cont value', kind='line', color='black', linestyle='-', figsize=(12,6))

Creating bar plots of a DataFrame column

df.groupby('Cat value')['Delta1 discrete'].value_counts().plot(kind='bar', color='darkgray', title='Occurrence by (cat,Delta1)', figsize=(12,6))

The kind='barh' parameter builds a horizontal bar plot instead of a vertical one.

df.groupby('Delta2 discrete')['Cat value'].value_counts().plot(kind='barh', color='darkgray', title='Occurrence by (Delta2,Cat)', figsize=(12,12))
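
It can help to look at what groupby(…).value_counts() returns before plotting it: a Series indexed by (group, value) pairs, where each pair becomes one bar. A small sketch on hand-written data (the six rows are made up for illustration):

```python
import pandas as pd

df = pd.DataFrame({
    'Cat value': ['High', 'High', 'Low', 'Low', 'Low', 'High'],
    'Delta1 discrete': [0, 1, 0, 0, -1, 0],
})

# A Series with a two-level index: (Cat value, Delta1 discrete)
counts = df.groupby('Cat value')['Delta1 discrete'].value_counts()

high_zero = counts.loc[('High', 0)]   # rows where Cat value is High and Delta1 is 0
low_minus_one = counts.loc[('Low', -1)]
```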

Creating histogram and density plot of a DataFrame column

df['Delta1 discrete'].plot(kind='hist', color='darkgray', figsize=(12,6), label='Delta1')
plt.legend()

Build a probability density function (PDF) plot by specifying kind='kde', which estimates the PDF using kernel density estimation (KDE).

df['Delta2 discrete'].plot(kind='kde', color='black', figsize=(12,6), label='Delta2 kde')
plt.legend()

Creating scatter plots of two DataFrame columns

df.plot(kind='scatter', x='Delta1 value', y='Delta2 value', alpha=0.5, color='black', figsize=(8,8))

Build a matrix of scatter plots on non-diagonal entries and histogram/KDE plot on the diagonal entries of the matrix

pd.plotting.scatter_matrix(df[['Delta1 value', 'Delta2 value']], diagonal='kde', color='black',figsize=(8,8))

Plotting time series data

Create a pandas DataFrame containing prices for two hypothetical trading instruments, A and B. The DataFrame is indexed by a DatetimeIndex of daily dates from 1992 to 2012.

dates = pd.date_range('1992-01-01', '2012-10-22')
time_series = pd.DataFrame(index=dates, columns=['A','B'])
time_series['A'] = np.random.randint(low=-100, high=101, size=len(dates)).cumsum() + 5000
time_series['B'] = np.random.randint(low=-75, high=76, size=len(dates)).cumsum() + 5000

Plotting prices in a line plot

time_series['A'].plot(kind='line', linestyle='-', color='black', figsize=(12,6), label='A')
time_series['B'].plot(kind='line', linestyle='-.', color='darkgray', figsize=(12,6), label='B')
plt.legend()

Plotting price change histograms

The usual next step in financial time series analysis is to inspect changes in price over some duration.

time_series['A_1_delta'] = time_series['A'].shift(-1) - time_series['A'].fillna(0)
time_series['B_1_delta'] = time_series['B'].shift(-1) - time_series['B'].fillna(0)

time_series['A_5_delta'] = time_series['A'].shift(-5) - time_series['A'].fillna(0)
time_series['B_5_delta'] = time_series['B'].shift(-5) - time_series['B'].fillna(0)

time_series['A_20_delta'] = time_series['A'].shift(-20) - time_series['A'].fillna(0)
time_series['B_20_delta'] = time_series['B'].shift(-20) - time_series['B'].fillna(0)

time_series_deltas = time_series[['A_1_delta', 'B_1_delta', 'A_5_delta', 'B_5_delta', 'A_20_delta', 'B_20_delta']].dropna()
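
The shift(-n) call pulls the value from n rows ahead onto the current row, so subtracting the unshifted series gives the price change over the next n days. A worked example on four made-up prices shows the arithmetic:

```python
import pandas as pd

prices = pd.Series(
    [100, 102, 101, 105],
    index=pd.date_range('1992-01-01', periods=4),
)

# Tomorrow's price aligned with today's date; the last row has no successor
next_day = prices.shift(-1)

# Forward one-day change; dropna removes the final row, which is NaN
one_day_delta = (next_day - prices).dropna()
```

The last n rows have no future price to compare against, which is why the deltas are passed through dropna() before plotting.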

Plot the price change histogram for A

time_series_deltas['A_20_delta'].plot(kind='hist', color='black', alpha=0.5, label='A_20_delta', figsize=(8,8))
time_series_deltas['A_5_delta'].plot(kind='hist', color='darkgray', alpha=0.5, label='A_5_delta', figsize=(8,8))
time_series_deltas['A_1_delta'].plot(kind='hist', color='lightgray', alpha=0.5, label='A_1_delta', figsize=(8,8))
plt.legend()

Creating price change density plots

time_series_deltas['B_20_delta'].plot(kind='kde', linestyle='-', linewidth=2, color='black', label='B_20_delta', figsize=(8,8))
time_series_deltas['B_5_delta'].plot(kind='kde', linestyle=':', linewidth=2, color='black', label='B_5_delta', figsize=(8,8))
time_series_deltas['B_1_delta'].plot(kind='kde', linestyle='--', linewidth=2, color='black', label='B_1_delta', figsize=(8,8))
plt.legend()

Creating box plots by interval

group_A = time_series[['A']].groupby(pd.Grouper(freq='A'))
group_A.boxplot(color='black', subplots=False, rot=90, figsize=(12,12))

Box plots with whiskers visualise groups of numerical data through their quartiles:

The box’s lower bound corresponds to the lower quartile, while the box’s upper bound represents the group’s upper quartile.
The line within the box displays the median of the interval.
The line below the box (the lower whisker) ends at the lowest observation that lies within 1.5 times the interquartile range of the lower quartile, which is Matplotlib's default.
The line above the box (the upper whisker) ends at the highest observation within the same distance of the upper quartile; observations beyond the whiskers are drawn as individual outlier points.
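
The numbers behind a box plot can be inspected without drawing anything. The sketch below uses matplotlib.cbook.boxplot_stats on ten made-up values, assuming the default whisker length of 1.5 times the interquartile range; the single large value is there to show how an outlier is separated from the whisker.

```python
import numpy as np
from matplotlib import cbook

data = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 30])

# boxplot_stats returns one dictionary of statistics per dataset
stats = cbook.boxplot_stats(data)[0]

box_low, median, box_high = stats['q1'], stats['med'], stats['q3']
whisker_low, whisker_high = stats['whislo'], stats['whishi']
outliers = stats['fliers']  # points beyond the whiskers
```

Here the upper whisker stops at 9 rather than 30, because 30 lies more than 1.5 interquartile ranges above the upper quartile.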

Creating lag scatter plots

Visualise the relationships between the different price change variables using the pandas.plotting.scatter_matrix(…) method

pd.plotting.scatter_matrix(time_series[['A_1_delta', 'A_5_delta', 'A_20_delta', 'B_1_delta', 'B_5_delta', 'B_20_delta']], diagonal='kde', color='black', alpha=0.25, figsize=(12,12))

Use the pandas.plotting.lag_plot(…) method with different lag= values to specify different levels of lag to generate the scatter plots between prices and lagged prices for A

fig, (ax1, ax2, ax3) = plt.subplots(3, figsize=(12,12))
pd.plotting.lag_plot(time_series['A'], ax=ax1, lag=1, c='black', alpha=0.2)
pd.plotting.lag_plot(time_series['A'], ax=ax2, lag=7, c='black', alpha=0.2)
pd.plotting.lag_plot(time_series['A'], ax=ax3, lag=20, c='black', alpha=0.2)

Creating autocorrelation plots

fig, ax = plt.subplots(1, figsize=(12,6))
pd.plotting.autocorrelation_plot(time_series['A'], ax=ax)

Autocorrelation plots summarise the randomness of a time series. For a random time series, all autocorrelations would be close to 0 for all lags. For a non-random time series, at least one of the autocorrelations would be significantly non-zero.
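
A single lag can also be checked numerically with Series.autocorr. The sketch below contrasts random daily steps with the price series built from their cumulative sum; the seed and the series length are assumptions chosen for illustration.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Independent daily moves, similar in spirit to the increments used for A
steps = pd.Series(rng.integers(low=-100, high=101, size=5000))

# Prices are the running total of the moves, so each price depends on the last
prices = steps.cumsum() + 5000

noise_lag1 = steps.autocorr(lag=1)   # expected to be close to 0
price_lag1 = prices.autocorr(lag=1)  # expected to be close to 1
```

This is why the autocorrelation plot of A decays slowly: the prices are a cumulative sum, even though the underlying moves are random.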

Author: Zhe
