Simple moving average (SMA) based strategies
Using SMAs for buy and sell signal generation is already decades old.
Momentum strategies
Based on the hypothesis that recent performance will persist for some additional time.
Mean-reversion strategies
Stock prices or prices of other financial instruments tend to revert to some mean level or to some trend level when they have deviated too much from such levels.
Vectorised backtesting should be considered in the following cases:
- Simple trading strategies
- Interactive strategy exploration
- Visualisation as a major goal
- Comprehensive backtesting programs
Making use of vectorisation
Vectorisation, or array programming, refers to a programming style where operations on scalars (that is, integer or floating point numbers) are generalised to vectors, matrices, or even multidimensional arrays.
In pure Python, an operation such as scalar multiplication of every element requires a for loop or something similar, such as a list comprehension.
v = [1, 2, 3, 4, 5]
sm = [2*i for i in v]
In principle, Python allows one to multiply a list object by an integer, but Python's data model returns another list object that contains the elements of the original list twice (the list is repeated, not scaled):
2 * v
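A quick check makes the difference explicit. The sketch below reuses the list `v` from above and is meant only to illustrate the two behaviours side by side.

```python
v = [1, 2, 3, 4, 5]

# Element-wise scaling needs an explicit loop or comprehension in pure Python.
scaled = [2 * i for i in v]

# Multiplying the list itself repeats (concatenates) it instead of scaling it.
repeated = 2 * v

print(scaled)    # [2, 4, 6, 8, 10]
print(repeated)  # [1, 2, 3, 4, 5, 1, 2, 3, 4, 5]
```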
Vectorisation with NumPy
import numpy as np
a = np.array(v)
type(a)
2 * a
0.5 * a + 2
a = np.arange(12).reshape(4, 3)
2 * a
a ** 2
a.mean()
np.mean(a)
a.mean(axis=0)
np.mean(a, axis=1)
Vectorisation with pandas
a = np.arange(15).reshape(5, 3)
import pandas as pd
columns = list('abc')
index = pd.date_range('2021-7-1', periods=5, freq='B')
df = pd.DataFrame(a, columns=columns, index=index)
2 * df
df.sum()
np.mean(df)
df['a'] + df['c']
0.5 * df.a + 2 * df.b - df.c
df['a'] > 5
df[df['a'] > 5]
df['c'] > df['b']
0.15 * df.a + df.b > df.c
Strategies based on simple moving averages
This section covers the basics of backtesting trading strategies that make use of two SMAs, working with end-of-day (EOD) closing data for the EUR/USD exchange rate.
raw = pd.read_csv('https://hilpisch.com/pyalgo_eikon_eod_data.csv', index_col=0, parse_dates=True).dropna()
raw.info()
data = pd.DataFrame(raw['EUR='])
data.rename(columns={'EUR=': 'price'}, inplace=True)
data.info()
data['SMA1'] = data['price'].rolling(42).mean()
data['SMA2'] = data['price'].rolling(252).mean()
data.tail()
A visualisation of the original time series data in combination with the SMAs best illustrates the results.
%matplotlib inline
from pylab import mpl, plt
plt.style.use('seaborn')
mpl.rcParams['savefig.dpi'] = 300
mpl.rcParams['font.family'] = 'serif'
data.plot(title='EUR/USD | 42 & 252 days SMAs', figsize=(10, 6))
The next step is to generate signals, or rather market positionings, based on the relationship between the two SMAs. The rule is to go long whenever the shorter SMA is above the longer one and vice versa.
Indicate a long position by 1 and short position by -1.
data['position'] = np.where(data['SMA1'] > data['SMA2'], 1, -1)
data.dropna(inplace=True)
data['position'].plot(ylim=[-1.1, 1.1], title='Market Positioning', figsize=(10, 6))
To calculate the performance of the strategy, first compute the log returns of the original financial time series. The code to do this is again rather concise due to vectorisation.
data['returns'] = np.log(data['price'] / data['price'].shift(1))
data['returns'].hist(bins=35, figsize=(10, 6))
Comparing the returns shows that the strategy books a win over the passive benchmark investment:
data['strategy'] = data['position'].shift(1) * data['returns']
data[['returns', 'strategy']].sum()
data[['returns', 'strategy']].sum().apply(np.exp)
data[['returns', 'strategy']].cumsum().apply(np.exp).plot(figsize=(10, 6))
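Summing log returns and then applying `np.exp` works because log returns are additive over time, so their sum is the log of the total gross performance. A minimal sketch with made-up prices (the numbers are illustrative assumptions, not values from the EUR/USD data):

```python
import numpy as np
import pandas as pd

# Hypothetical price series, for illustration only.
price = pd.Series([100.0, 102.0, 99.0, 101.0, 105.0])

# Log returns: log(p_t / p_{t-1}); the first entry is NaN.
log_ret = np.log(price / price.shift(1))

# Summing log returns (NaN is skipped) and exponentiating
# recovers the gross performance over the whole period.
gross_from_logs = np.exp(log_ret.sum())

# Direct calculation: last price divided by first price.
gross_direct = price.iloc[-1] / price.iloc[0]

print(gross_from_logs, gross_direct)
```

The same identity is why `cumsum().apply(np.exp)` yields the gross performance path over time.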
Average, annualised risk-return statistics for both the instrument and the strategy
data[['returns', 'strategy']].mean() * 252
np.exp(data[['returns', 'strategy']].mean() * 252) - 1
data[['returns', 'strategy']].std() * 252 ** 0.5
(data[['returns', 'strategy']].apply(np.exp) - 1).std() * 252 ** 0.5
The maximum drawdown and the longest drawdown period
data['cumret'] = data['strategy'].cumsum().apply(np.exp)
data['cummax'] = data['cumret'].cummax()
data[['cumret', 'cummax']].dropna().plot(figsize=(10, 6))
drawdown = data['cummax'] - data['cumret']
drawdown.max()
The determination of the longest drawdown period is a bit more involved. It requires those dates at which the gross performance equals its cumulative maximum (that is, where the drawdown is zero). This information is stored in a temporary object. Then the differences in days between all such consecutive dates are calculated and the longest period is picked out. Such periods can be as short as one day or longer than 100 days. Here, the longest drawdown period lasts for 596 days, a pretty long period.
temp = drawdown[drawdown == 0]
periods = (temp.index[1:].to_pydatetime() - temp.index[:-1].to_pydatetime())
periods[12:15]
periods.max()
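The same steps can be tried on a small synthetic series where the answer is easy to verify by eye. The dates and values below are made up for illustration, and the sketch subtracts the `DatetimeIndex` slices directly instead of converting with `to_pydatetime()`, which is a variation on the code above rather than a copy of it.

```python
import pandas as pd

# Hypothetical cumulative gross performance on business days.
index = pd.date_range('2021-01-04', periods=8, freq='B')
cumret = pd.Series([1.00, 1.05, 1.02, 0.98, 1.06, 1.04, 1.03, 1.08],
                   index=index)

cummax = cumret.cummax()
drawdown = cummax - cumret

# Maximum drawdown in absolute terms (same definition as above).
max_dd = drawdown.max()

# Dates at which the series sits at its running maximum.
highs = drawdown[drawdown == 0].index

# Gaps between consecutive highs; the largest is the longest drawdown period.
periods = highs[1:] - highs[:-1]
longest = periods.max()

print(round(max_dd, 2), longest)
```

Here the running maximum is hit on 4, 5, 8 and 13 January, so the gaps are 1, 3 and 5 calendar days and the longest drawdown period is 5 days.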
Generalising the Approach
SMA Backtesting Class
- symbol: RIC (instrument data) to be used
- SMA1: for the time window in days for the shorter SMA
- SMA2: for the time window in days for the longer SMA
- start: for the start date of the data selection
- end: for the end date of the data selection
import SMAVectorBacktester as SMA
smabt = SMA.SMAVectorBacktester('EUR=', 42, 252, '2010-1-1', '2019-12-31')
smabt.run_strategy()
%%time
smabt.plot_results()
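The `SMAVectorBacktester` module itself is not reproduced in these notes. To give a rough idea of what such a class bundles, here is a minimal sketch under stated assumptions: the class name `TinySMABacktester`, its constructor arguments, the return values of `run_strategy`, and the random-walk test data are all hypothetical and should not be read as the module's actual interface.

```python
import numpy as np
import pandas as pd


class TinySMABacktester:
    """Hypothetical, simplified SMA crossover backtester (not the real module)."""

    def __init__(self, prices, sma1, sma2):
        self.prices = prices  # pd.Series of prices indexed by date
        self.sma1 = sma1      # window of the shorter SMA
        self.sma2 = sma2      # window of the longer SMA
        self.results = None

    def run_strategy(self):
        data = pd.DataFrame({'price': self.prices})
        data['returns'] = np.log(data['price'] / data['price'].shift(1))
        data['SMA1'] = data['price'].rolling(self.sma1).mean()
        data['SMA2'] = data['price'].rolling(self.sma2).mean()
        data.dropna(inplace=True)
        # Long (1) when the shorter SMA is above the longer one, else short (-1).
        data['position'] = np.where(data['SMA1'] > data['SMA2'], 1, -1)
        # Yesterday's position earns today's return (avoids foresight bias).
        data['strategy'] = data['position'].shift(1) * data['returns']
        data.dropna(inplace=True)
        self.results = data
        perf = np.exp(data['strategy'].sum())   # gross strategy performance
        bench = np.exp(data['returns'].sum())   # gross buy-and-hold performance
        return perf, perf - bench


# Synthetic random-walk prices, for illustration only.
rng = np.random.default_rng(0)
index = pd.date_range('2020-01-01', periods=500, freq='B')
prices = pd.Series(100 * np.exp(np.cumsum(rng.normal(0, 0.01, 500))),
                   index=index)

bt = TinySMABacktester(prices, 10, 50)
perf, outperf = bt.run_strategy()
print(perf, outperf)
```

Wrapping the steps in a class mainly makes it cheap to rerun the same backtest with different windows or date ranges.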
Strategies Based on Momentum
Two basic types of momentum strategies:
- Cross-sectional momentum strategies: Selecting from a larger pool of instruments, these strategies buy those instruments that have recently outperformed relative to their peers and sell those instruments that have underperformed. The basic idea is that the instruments continue to outperform and underperform, respectively, at least for a certain period of time.
- Time series momentum strategies: These strategies buy those instruments that have recently performed well and sell those instruments that have recently performed poorly. In this case, the benchmark is the past returns of the instrument itself.
Consider end-of-day closing prices for the gold price in USD (XAU=):
data = pd.DataFrame(raw['XAU='])
data.rename(columns={'XAU=':'price'}, inplace=True)
data['returns'] = np.log(data['price'] / data['price'].shift(1))
The simplest time series momentum strategy is to buy the instrument if the last return was positive and to sell it if it was negative.
data['position'] = np.sign(data['returns'])
data['strategy'] = data['position'].shift(1) * data['returns']
data[['returns', 'strategy']].dropna().cumsum().apply(np.exp).plot(figsize=(10, 6))
data['position'] = np.sign(data['returns'].rolling(3).mean())
data['strategy'] = data['position'].shift(1) * data['returns']
data[['returns', 'strategy']].dropna().cumsum().apply(np.exp).plot(figsize=(10, 6))
fn = '../data/AAPL_1min_05052020.csv'
data = pd.read_csv(fn, index_col=0, parse_dates=True)
data.info()
data['returns'] = np.log(data['CLOSE'] / data['CLOSE'].shift(1))
to_plot = ['returns']
for m in [1, 3, 5, 7, 9]:
    data['position_%d' % m] = np.sign(data['returns'].rolling(m).mean())
    data['strategy_%d' % m] = (data['position_%d' % m].shift(1) * data['returns'])
    to_plot.append('strategy_%d' % m)
data[to_plot].dropna().cumsum().apply(np.exp).plot(title='AAPL intraday 05. May 2020', figsize=(10, 6), style=['-', '--', '--', '--', '--', '--'])
Momentum Backtesting Class
- symbol: RIC (instrument data) to be used
- start: for the start date of the data selection
- end: for the end date of the data selection
- amount: for the initial amount to be invested
- tc: for the proportional transaction costs per trade
import MomVectorBacktester as Mom
mombt = Mom.MomVectorBacktester('XAU=', '2010-1-1', '2019-12-31', 10000, 0.0)
mombt.run_strategy(momentum=3)
mombt.plot_results()
mombt = Mom.MomVectorBacktester('XAU=', '2010-1-1', '2019-12-31', 10000, 0.001)
mombt.run_strategy(momentum=3)
mombt.plot_results()
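The two runs above differ only in the `tc` argument. One common way to account for proportional transaction costs in a vectorised backtest is to subtract `tc` from the strategy's log return on every bar where the position changes. The sketch below assumes that approach; it is not taken from `MomVectorBacktester`, and the returns, positions, and the value of `tc` are made up for illustration.

```python
import pandas as pd

# Hypothetical log returns and positions, for illustration only.
data = pd.DataFrame({
    'returns': [0.01, -0.02, 0.015, 0.005, -0.01],
    'position': [1, 1, -1, -1, 1],
})
tc = 0.001  # assumed proportional cost per trade

# Yesterday's position earns today's return.
data['strategy'] = data['position'].shift(1) * data['returns']

# A trade happens wherever the position differs from the previous bar.
trades = data['position'].diff().fillna(0) != 0

# Charge the cost once on each bar where a trade takes place.
# (Simplification: a flip from long to short is counted as one trade.)
data['strategy_tc'] = data['strategy'] - tc * trades

print(int(trades.sum()))
print(data[['strategy', 'strategy_tc']].sum())
```

With two position changes, the summed log return after costs is lower by exactly `2 * tc`, which shows why frequent trading can erode an otherwise profitable signal.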
Strategies Based on Mean Reversion
Mean-reversion strategies rely on a reasoning that is the opposite of momentum strategies.
- GLD is the symbol for SPDR Gold Shares, which is the largest physically backed exchange traded fund (ETF) for gold.
- GDX is the symbol for the VanEck Vectors Gold Miners ETF, which invests in equity products to track the NYSE Arca Gold Miners Index.
data = pd.DataFrame(raw['GDX'])
data.rename(columns={'GDX': 'price'}, inplace=True)
data['returns'] = np.log(data['price'] / data['price'].shift(1))
SMA = 25
data['SMA'] = data['price'].rolling(SMA).mean()
threshold = 3.5
data['distance'] = data['price'] - data['SMA']
data['distance'].dropna().plot(figsize=(10, 6), legend=True)
plt.axhline(threshold, color='r')
plt.axhline(-threshold, color='r')
plt.axhline(0, color='r')
data['position'] = np.where(data['distance'] > threshold, -1, np.nan)
data['position'] = np.where(data['distance'] < -threshold, 1, data['position'])
data['position'] = np.where(data['distance'] * data['distance'].shift(1) < 0, 0, data['position'])
data['position'] = data['position'].ffill().fillna(0)
data['position'].iloc[SMA:].plot(ylim=[-1.1, 1.1], figsize=(10, 6))
data['strategy'] = data['position'].shift(1) * data['returns']
data[['returns', 'strategy']].dropna().cumsum().apply(np.exp).plot(figsize=(10, 6))
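The three `np.where` calls followed by `ffill()` are compact but easy to misread. A tiny example with a made-up `distance` series shows how the position is built: a signal is set only on the bars where a rule fires, and the last signal is carried forward in between. The numbers are hypothetical and chosen so each rule fires at least once.

```python
import numpy as np
import pandas as pd

threshold = 3.5
# Hypothetical distances between price and SMA, for illustration only.
distance = pd.Series([0.5, 4.0, 2.0, -1.0, -4.0, -2.0, 1.0])

# Short when the price is far above the SMA, otherwise undecided (NaN).
position = np.where(distance > threshold, -1, np.nan)
# Long when the price is far below the SMA.
position = np.where(distance < -threshold, 1, position)
# Go flat when the distance changes sign, i.e. the price crosses the SMA.
position = np.where(distance * distance.shift(1) < 0, 0, position)
# Carry the last signal forward; stay flat before the first signal.
position = pd.Series(position).ffill().fillna(0)

print(position.tolist())  # [0.0, -1.0, -1.0, 0.0, 1.0, 1.0, 0.0]
```

The position is therefore held until the price crosses back over the SMA, not merely until the distance falls below the threshold again.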
Mean Reversion Backtesting Class
import MRVectorBacktester as MR
mrbt = MR.MRVectorBacktester('GLD', '2010-1-1', '2019-12-31', 10000, 0.001)
mrbt.run_strategy(SMA=43, threshold=7.5)
mrbt.plot_results()
Data Snooping and Overfitting









