Vectorised Backtesting

Simple moving average (SMA) based strategies

Using SMAs for buy and sell signal generation is already decades old.

Momentum strategies

These are based on the hypothesis that recent performance will persist for some additional time.

Mean-reversion strategies

Stock prices or prices of other financial instruments tend to revert to some mean level or to some trend level when they have deviated too much from such levels.

Vectorised backtesting should be considered in the following cases:

Simple trading strategies
Interactive strategy exploration
Visualisation as a major goal
Comprehensive backtesting programs

Making use of vectorisation

Vectorisation, or array programming, refers to a programming style where operations on scalars (that is, integer or floating point numbers) are generalised to vectors, matrices, or even multidimensional arrays.

In pure Python, multiplying every element of a list by a scalar requires a for loop or something similar, such as a list comprehension.

v = [1, 2, 3, 4, 5]
sm = [2*i for i in v]

In principle, Python allows one to multiply a list object by an integer, but Python's data model gives back another list object containing the elements of the original object repeated twice, rather than each element doubled:

2 * v
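
As a quick check of the difference described above, the following minimal sketch contrasts list repetition with element-wise doubling. The names `repeated` and `doubled` are introduced here for illustration only.

```python
v = [1, 2, 3, 4, 5]

# Multiplying a list by an integer repeats the list; it does not scale the elements.
repeated = 2 * v
print(repeated)   # [1, 2, 3, 4, 5, 1, 2, 3, 4, 5]

# Element-wise doubling needs an explicit loop or a list comprehension.
doubled = [2 * i for i in v]
print(doubled)    # [2, 4, 6, 8, 10]
```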

Vectorisation with NumPy

import numpy as np
a = np.array(v)
type(a)
2 * a
0.5 * a + 2

a = np.arange(12).reshape(4, 3)
2 * a
a ** 2
a.mean()
np.mean(a)
a.mean(axis=0)
np.mean(a, axis=1)
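
To make the `axis` argument concrete, here is a small sketch on the same 4 x 3 array. The names `col_means` and `row_means` are illustrative and not part of the original snippets.

```python
import numpy as np

a = np.arange(12).reshape(4, 3)   # rows: [0 1 2], [3 4 5], [6 7 8], [9 10 11]

# axis=0 collapses the rows, leaving one mean per column.
col_means = a.mean(axis=0)
print(col_means)   # values: 4.5, 5.5, 6.5

# axis=1 collapses the columns, leaving one mean per row.
row_means = np.mean(a, axis=1)
print(row_means)   # values: 1.0, 4.0, 7.0, 10.0

# Without an axis argument the mean is taken over all twelve elements.
print(a.mean())    # 5.5
```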

Vectorisation with pandas

a = np.arange(15).reshape(5, 3)

import pandas as pd
columns = list('abc')
index = pd.date_range('2021-7-1', periods=5, freq='B')
df = pd.DataFrame(a, columns=columns, index=index)

2 * df
df.sum()
np.mean(df)
df['a'] + df['c']
0.5 * df.a + 2 * df.b - df.c
df['a'] > 5
df[df['a'] > 5]
df['c'] > df['b']
0.15 * df.a + df.b > df.c

Strategies based on simple moving averages

This section covers the basics of backtesting trading strategies that make use of two SMAs, working with end-of-day (EOD) closing data for the EUR/USD exchange rate.

raw = pd.read_csv('https://hilpisch.com/pyalgo_eikon_eod_data.csv', index_col=0, parse_dates=True).dropna()

raw.info()

data = pd.DataFrame(raw['EUR='])

data.rename(columns={'EUR=': 'price'}, inplace=True)

data.info()

data['SMA1'] = data['price'].rolling(42).mean()

data['SMA2'] = data['price'].rolling(252).mean()

data.tail()

A visualisation of the original time series data in combination with the SMAs best illustrates the results.

%matplotlib inline

from pylab import mpl, plt

plt.style.use('seaborn')  # on Matplotlib 3.6 or later, use 'seaborn-v0_8' instead

mpl.rcParams['savefig.dpi'] = 300

mpl.rcParams['font.family'] = 'serif'

data.plot(title='EUR/USD | 42 & 252 days SMAs', figsize=(10, 6))

The next step is to generate signals, or rather market positionings, based on the relationship between the two SMAs. The rule is to go long whenever the shorter SMA is above the longer one and vice versa.

Indicate a long position by 1 and short position by -1.

data['position'] = np.where(data['SMA1'] > data['SMA2'], 1, -1)

data.dropna(inplace=True)

data['position'].plot(ylim=[-1.1, 1.1], title='Market Positioning', figsize=(10, 6))
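
The positioning rule can be tried out without the EUR/USD file. The sketch below assumes a synthetic random-walk price series and illustrative window lengths of 10 and 50 days; the name `toy` and all parameters are hypothetical. It also shows why the rows without a complete long SMA are dropped: comparisons against NaN evaluate to False, so those rows would otherwise be marked as short.

```python
import numpy as np
import pandas as pd

# Synthetic random-walk prices (an assumption, used instead of the EUR/USD data).
rng = np.random.default_rng(0)
index = pd.date_range('2020-01-01', periods=300, freq='B')
price = 1.10 * np.exp(np.cumsum(rng.normal(0, 0.005, len(index))))
toy = pd.DataFrame({'price': price}, index=index)

toy['SMA1'] = toy['price'].rolling(10).mean()   # shorter window (illustrative)
toy['SMA2'] = toy['price'].rolling(50).mean()   # longer window (illustrative)

# Comparisons against NaN are False, so the warm-up rows are marked -1 here.
toy['position'] = np.where(toy['SMA1'] > toy['SMA2'], 1, -1)
warmup_positions = toy['position'].iloc[:49].unique()

# Dropping the rows without a full SMA2 window removes those spurious shorts.
toy.dropna(inplace=True)
print(toy['position'].value_counts())
```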

To calculate the performance of the strategy, first calculate the log returns of the original financial time series. The code to do this is again rather concise due to vectorisation.

data['returns'] = np.log(data['price'] / data['price'].shift(1))

data['returns'].hist(bins=35, figsize=(10, 6))
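
Log returns are convenient here because they add up over time. The following sketch, using four made-up prices, shows that the sum of the log returns equals the log of the total gross return, whereas simple returns have to be compounded.

```python
import numpy as np
import pandas as pd

price = pd.Series([100.0, 110.0, 99.0, 121.0])   # made-up prices

log_ret = np.log(price / price.shift(1))          # first value is NaN
simple_ret = price / price.shift(1) - 1

# Log returns add up: their sum is the log of the total gross return.
total_log = log_ret.sum()                         # NaN is skipped by default
gross = np.exp(total_log)
print(gross)                                      # approximately 1.21 = 121 / 100

# Simple returns do not add up this way; they have to be compounded.
compounded = (1 + simple_ret).prod()
print(simple_ret.sum(), compounded)
```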

Comparing the returns shows that the strategy books a win over the passive benchmark investment:

data['strategy'] = data['position'].shift(1) * data['returns']

data[['returns', 'strategy']].sum()

data[['returns', 'strategy']].sum().apply(np.exp)

data[['returns', 'strategy']].cumsum().apply(np.exp).plot(figsize=(10, 6))
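
The `shift(1)` in the strategy returns matters. The sketch below, with made-up prices and positions, pairs the position held at the end of one bar with the return of the next bar, and contrasts it with an unshifted version that would suffer from foresight bias. The column name `biased` is hypothetical.

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({'price': [100.0, 102.0, 101.0, 103.0, 104.0],
                    'position': [1, 1, -1, -1, 1]})
toy['returns'] = np.log(toy['price'] / toy['price'].shift(1))

# The position decided at the close of bar t-1 earns the return of bar t.
toy['strategy'] = toy['position'].shift(1) * toy['returns']

# Without the shift, each position would be matched with a return that had
# already been realised when the position was chosen (foresight bias).
toy['biased'] = toy['position'] * toy['returns']

print(toy[['returns', 'strategy', 'biased']].sum().apply(np.exp))
```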

Average, annualised risk-return statistics for both the instrument and the strategy:

data[['returns', 'strategy']].mean() * 252

np.exp(data[['returns', 'strategy']].mean() * 252) - 1

data[['returns', 'strategy']].std() * 252 ** 0.5

(data[['returns', 'strategy']].apply(np.exp) - 1).std() * 252 ** 0.5
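
The annualisation above scales the daily mean by 252 and the daily standard deviation by the square root of 252, which is the usual convention when daily returns are assumed to be independent. A minimal sketch with synthetic daily log returns follows; the helper `annualise` and its parameter `periods_per_year` are hypothetical names.

```python
import numpy as np
import pandas as pd

def annualise(log_returns, periods_per_year=252):
    """Return (annualised mean log return, annualised volatility).

    Hypothetical helper; 252 trading days per year is a convention.
    """
    mean = log_returns.mean() * periods_per_year
    vol = log_returns.std() * periods_per_year ** 0.5
    return mean, vol

# Synthetic daily log returns (an assumption, not market data).
rng = np.random.default_rng(42)
daily = pd.Series(rng.normal(0.0004, 0.01, 2520))

ann_mean, ann_vol = annualise(daily)
print(ann_mean, ann_vol)
print(np.exp(ann_mean) - 1)   # the same mean expressed as a simple return
```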

The maximum drawdown and the longest drawdown period

data['cumret'] = data['strategy'].cumsum().apply(np.exp)

data['cummax'] = data['cumret'].cummax()

data[['cumret', 'cummax']].dropna().plot(figsize=(10, 6))

drawdown = data['cummax'] - data['cumret']
drawdown.max()

Determining the longest drawdown period is a bit more involved. It requires the dates at which the gross performance equals its cumulative maximum, that is, where the drawdown is zero. This information is stored in a temporary object. The differences in days between all consecutive such dates are then calculated and the longest period is picked out. Such periods can be as short as one day or longer than 100 days. Here, the longest drawdown period lasts for 596 days, a pretty long period.

temp = drawdown[drawdown == 0]

periods = (temp.index[1:].to_pydatetime() - temp.index[:-1].to_pydatetime())

periods[12:15]

periods.max()
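
The same two calculations can be followed by hand on a tiny example. The sketch below uses seven made-up values of cumulative gross performance on consecutive calendar days; it computes the gaps with `to_series().diff()`, which is one alternative to the `to_pydatetime()` subtraction above.

```python
import pandas as pd

index = pd.date_range('2021-01-01', periods=7, freq='D')
cumret = pd.Series([1.00, 1.10, 1.05, 1.00, 1.20, 1.15, 1.30], index=index)

cummax = cumret.cummax()
drawdown = cummax - cumret
print(drawdown.max())                 # about 0.10, reached on 2021-01-04

# Dates on which the gross performance sits at its running maximum.
highs = drawdown[drawdown == 0]

# Gaps between consecutive highs; the largest gap is the longest drawdown period.
gaps = highs.index.to_series().diff().dropna()
longest = gaps.max()
print(longest)                        # 3 days 00:00:00
```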

Generalising the Approach

SMA Backtesting Class

symbol: RIC (instrument data) to be used
SMA1: for the time window in days for the shorter SMA
SMA2: for the time window in days for the longer SMA
start: for the start date of the data selection
end: for the end date of the data selection

import SMAVectorBacktester as SMA

smabt = SMA.SMAVectorBacktester('EUR=', 42, 252, '2010-1-1', '2019-12-31')

smabt.run_strategy()

%%time

smabt.plot_results()
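
For readers without the `SMAVectorBacktester` module at hand, the idea of wrapping the steps into a class can be sketched as follows. `ToySMABacktester` is a hypothetical, simplified stand-in: it takes a price series directly instead of a symbol with start and end dates, and it is not the module's actual implementation.

```python
import numpy as np
import pandas as pd

class ToySMABacktester:
    """Hypothetical, simplified sketch; not the SMAVectorBacktester module."""

    def __init__(self, price, SMA1, SMA2):
        self.price = price      # pandas Series of prices (instead of a RIC)
        self.SMA1 = SMA1        # window of the shorter SMA
        self.SMA2 = SMA2        # window of the longer SMA
        self.results = None

    def run_strategy(self):
        data = pd.DataFrame({'price': self.price})
        data['returns'] = np.log(data['price'] / data['price'].shift(1))
        data['SMA1'] = data['price'].rolling(self.SMA1).mean()
        data['SMA2'] = data['price'].rolling(self.SMA2).mean()
        data.dropna(inplace=True)
        data['position'] = np.where(data['SMA1'] > data['SMA2'], 1, -1)
        data['strategy'] = data['position'].shift(1) * data['returns']
        data.dropna(inplace=True)
        self.results = data
        # Gross performance of the strategy and its excess over buy-and-hold.
        aperf = np.exp(data['strategy'].sum())
        operf = aperf - np.exp(data['returns'].sum())
        return aperf, operf

# Synthetic random-walk prices (an assumption, not market data).
rng = np.random.default_rng(1)
index = pd.date_range('2020-01-01', periods=400, freq='B')
price = pd.Series(100 * np.exp(np.cumsum(rng.normal(0, 0.01, 400))), index=index)

bt = ToySMABacktester(price, 10, 50)
aperf, operf = bt.run_strategy()
print(aperf, operf)
```

Keeping the windows as attributes makes it easy to rerun the same backtest with other parameter combinations.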

Strategies Based on Momentum

Two basic types of momentum strategies:

Cross-sectional momentum strategies: Selecting from a larger pool of instruments, these strategies buy those instruments that have recently outperformed relative to their peers and sell those instruments that have underperformed. The basic idea is that the instruments continue to outperform and underperform, respectively, at least for a certain period of time.
Time series momentum strategies: These strategies buy those instruments that have recently performed well and sell those instruments that have recently performed poorly. In this case, the benchmark is the past returns of the instrument itself.

Consider end-of-day closing prices for the gold price in USD (XAU=):

data = pd.DataFrame(raw['XAU='])

data.rename(columns={'XAU=':'price'}, inplace=True)

data['returns'] = np.log(data['price'] / data['price'].shift(1))

The simplest time series momentum strategy is to buy the instrument if the last return was positive and to sell it if it was negative.

data['position'] = np.sign(data['returns'])

data['strategy'] = data['position'].shift(1) * data['returns']

data[['returns', 'strategy']].dropna().cumsum().apply(np.exp).plot(figsize=(10, 6))

data['position'] = np.sign(data['returns'].rolling(3).mean())

data['strategy'] = data['position'].shift(1) * data['returns']

data[['returns', 'strategy']].dropna().cumsum().apply(np.exp).plot(figsize=(10, 6))
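
The two variants above differ only in how many past returns feed the signal. A minimal sketch with six made-up returns shows the positions generated from the last return alone and from the mean of the last three returns; the names `pos_1` and `pos_3` are illustrative.

```python
import numpy as np
import pandas as pd

returns = pd.Series([0.01, 0.02, -0.01, -0.03, 0.01, 0.04])   # made-up returns

# One-period momentum: follow the sign of the most recent return.
pos_1 = np.sign(returns)

# Three-period momentum: follow the sign of the mean of the last three returns.
# The first two values are NaN because the window is not yet full.
pos_3 = np.sign(returns.rolling(3).mean())

print(pd.DataFrame({'returns': returns, 'pos_1': pos_1, 'pos_3': pos_3}))
```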

fn = '../data/AAPL_1min_05052020.csv'

data = pd.read_csv(fn, index_col=0, parse_dates=True)

data.info()

data['returns'] = np.log(data['CLOSE'] / data['CLOSE'].shift(1))

to_plot = ['returns']

for m in [1, 3, 5, 7, 9]:
    data['position_%d' % m] = np.sign(data['returns'].rolling(m).mean())
    data['strategy_%d' % m] = (data['position_%d' % m].shift(1) * data['returns'])
    to_plot.append('strategy_%d' % m)

data[to_plot].dropna().cumsum().apply(np.exp).plot(title='AAPL intraday 05. May 2020', figsize=(10, 6), style=['-', '--', '--', '--', '--', '--'])

Momentum Backtesting Class

symbol: RIC (instrument data) to be used
start: for the start date of the data selection
end: for the end date of the data selection
amount: for the initial amount to be invested
tc: for the proportional transaction costs per trade

import MomVectorBacktester as Mom

mombt = Mom.MomVectorBacktester('XAU=', '2010-1-1', '2019-12-31', 10000, 0.0)

mombt.run_strategy(momentum=3)

mombt.plot_results()

mombt = Mom.MomVectorBacktester('XAU=', '2010-1-1', '2019-12-31', 10000, 0.001)

mombt.run_strategy(momentum=3)

mombt.plot_results()
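
How proportional transaction costs enter a vectorised backtest is not shown above. One possible approach, sketched here under the assumption that every change of position counts as one trade and costs a flat `tc` in log-return terms, is to subtract `tc` from the strategy return on each bar where the position changes. The data and the column name `strategy_tc` are made up for illustration.

```python
import numpy as np
import pandas as pd

tc = 0.001   # proportional transaction costs per trade

toy = pd.DataFrame({'returns': [0.004, -0.002, 0.003, 0.001, -0.005],
                    'position': [1.0, 1.0, -1.0, -1.0, 1.0]})
toy['strategy'] = toy['position'].shift(1) * toy['returns']

# A trade takes place whenever the position changes from one bar to the next.
trades = toy['position'].diff().fillna(0) != 0

# Assumption: each trade costs a flat tc, subtracted from that bar's return.
toy['strategy_tc'] = toy['strategy'] - tc * trades

print(toy)
print(int(trades.sum()))   # 2 position changes in this toy series
```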

Strategies Based on Mean Reversion

Mean-reversion strategies rely on a reasoning that is the opposite of momentum strategies.

GLD is the symbol for SPDR Gold Shares, which is the largest physically backed exchange traded fund (ETF) for gold.
GDX is the symbol for the VanEck Vectors Gold Miners ETF, which invests in equity products to track the NYSE Arca Gold Miners Index.

data = pd.DataFrame(raw['GDX'])

data.rename(columns={'GDX': 'price'}, inplace=True)

data['returns'] = np.log(data['price'] / data['price'].shift(1))

SMA = 25

data['SMA'] = data['price'].rolling(SMA).mean()

threshold = 3.5

data['distance'] = data['price'] - data['SMA']

data['distance'].dropna().plot(figsize=(10, 6), legend=True)
plt.axhline(threshold, color='r')
plt.axhline(-threshold, color='r')
plt.axhline(0, color='r')

data['position'] = np.where(data['distance'] > threshold, -1, np.nan)

data['position'] = np.where(data['distance'] < -threshold, 1, data['position'])

data['position'] = np.where(data['distance'] * data['distance'].shift(1) < 0, 0, data['position'])

data['position'] = data['position'].ffill().fillna(0)

data['position'].iloc[SMA:].plot(ylim=[-1.1, 1.1], figsize=(10, 6))

data['strategy'] = data['position'].shift(1) * data['returns']

data[['returns', 'strategy']].dropna().cumsum().apply(np.exp).plot(figsize=(10, 6))
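
The four position assignments above are easiest to follow on a short series. The sketch below applies the same steps to seven made-up distance values with the same threshold of 3.5: short when the price is far above the SMA, long when it is far below, market neutral when the distance changes sign, and otherwise the previous position is carried forward.

```python
import numpy as np
import pandas as pd

threshold = 3.5
toy = pd.DataFrame({'distance': [0.5, 4.0, 2.0, -1.0, -4.0, -2.0, 1.0]})

# Price far above the SMA: go short. Everything else is undecided (NaN) for now.
toy['position'] = np.where(toy['distance'] > threshold, -1, np.nan)
# Price far below the SMA: go long.
toy['position'] = np.where(toy['distance'] < -threshold, 1, toy['position'])
# Distance changes sign, i.e. the price crosses the SMA: go market neutral.
toy['position'] = np.where(toy['distance'] * toy['distance'].shift(1) < 0,
                           0, toy['position'])
# Hold the last decided position until the next signal; start flat.
toy['position'] = toy['position'].ffill().fillna(0)

print(toy['position'].tolist())   # [0.0, -1.0, -1.0, 0.0, 1.0, 1.0, 0.0]
```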

Mean Reversion Backtesting Class

import MRVectorBacktester as MR

mrbt = MR.MRVectorBacktester('GLD', '2010-1-1', '2019-12-31', 10000, 0.001)

mrbt.run_strategy(SMA=43, threshold=7.5)

mrbt.plot_results()

Data Snooping and Overfitting

Author: Zhe
