The Data Set
Suppose that we have a file, AAPL.csv, in a data folder.
fn = '../data/AAPL.csv'
with open(fn, 'r') as f:
    for _ in range(5):
        print(f.readline(), end='')
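Since AAPL.csv itself is not shown, here is a minimal sketch that creates a small CSV file with assumed column names and sample values and prints its first lines the same way:

```python
import csv
import os
import tempfile

# Hypothetical stand-in for AAPL.csv: column names and values are
# made up for illustration only.
rows = [
    ['Date', 'HIGH', 'CLOSE', 'LOW', 'OPEN', 'VOLUME'],
    ['2020-04-01', '248.72', '240.91', '239.13', '246.50', '44054638'],
    ['2020-04-02', '245.15', '244.93', '236.90', '240.34', '41483493'],
]

fn = os.path.join(tempfile.mkdtemp(), 'AAPL.csv')
with open(fn, 'w', newline='') as f:
    csv.writer(f).writerows(rows)

# print the first lines of the file, as above
with open(fn, 'r') as f:
    for _ in range(2):
        print(f.readline(), end='')
```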
Reading from a CSV File with Python
import csv
csv_reader = csv.reader(open(fn, 'r'))
data = list(csv_reader)
Using a csv.DictReader iterator object instead of the standard csv.reader object.
csv_reader = csv.DictReader(open(fn, 'r'))
data = list(csv_reader)
Calculate the mean of the Apple stock closing prices
sum([float(l['CLOSE']) for l in data]) / len(data)
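The same mean calculation can be sketched with an in-memory CSV instead of the (assumed) AAPL.csv file; the CLOSE column name and values are made up:

```python
import csv
import io

# Sketch: mean of a CLOSE column via csv.DictReader on synthetic data.
raw = 'Date,CLOSE\n2020-04-01,240.91\n2020-04-02,244.93\n'
data = list(csv.DictReader(io.StringIO(raw)))
mean = sum(float(l['CLOSE']) for l in data) / len(data)
print(mean)  # arithmetic mean of the two closing values
```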
Reading from a CSV File with pandas
Use pandas to work with the Apple stock price data set.
import pandas as pd
data = pd.read_csv(fn, index_col=0, parse_dates=True)
data.info()
data.tail()
Calculate the mean of the Apple stock closing values
data['CLOSE'].mean()
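A minimal sketch of the same calculation on a small synthetic frame (the CLOSE column name is carried over from above, the values are made up):

```python
import pandas as pd

# Synthetic stand-in for the Apple closing prices.
data = pd.DataFrame(
    {'CLOSE': [240.91, 244.93, 241.41]},
    index=pd.to_datetime(['2020-04-01', '2020-04-02', '2020-04-03']),
)
print(data['CLOSE'].mean())  # mean over the three closing values
```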
Exporting to Excel and JSON
pandas allows one to export data in the form of Excel spreadsheet files as well as JSON files.
data.to_excel('data/aapl.xls', 'AAPL')
data.to_json('data/aapl.json')
ls -n data/
Reading from Excel and JSON
data_copy_1 = pd.read_excel('data/aapl.xls', 'AAPL', index_col=0)
data_copy_1.head()
data_copy_2 = pd.read_json('data/aapl.json')
data_copy_2.head()
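The JSON round trip can be sketched on a small synthetic frame; the Excel round trip works the same way but requires an Excel writer/reader package (e.g. openpyxl) to be installed:

```python
import os
import tempfile

import pandas as pd

# Synthetic frame (made-up values) written to and read back from JSON.
data = pd.DataFrame({'CLOSE': [240.91, 244.93]},
                    index=pd.to_datetime(['2020-04-01', '2020-04-02']))
path = os.path.join(tempfile.mkdtemp(), 'aapl.json')
data.to_json(path)
data_copy = pd.read_json(path)
print(data_copy)
```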
!rm data/*
Working with Open Data Sources
Obtain a key by signing up for a free Quandl account.
Data requests requiring an API key expect the key to be provided as the parameter api_key.
import configparser
config = configparser.ConfigParser()
config.read('../pyalgo.cfg')
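The pyalgo.cfg file itself is not shown; a hypothetical example of its structure (section and option names assumed, values placeholders) that configparser can read:

```python
import configparser

# Assumed structure of a pyalgo.cfg-style file; the key values
# below are placeholders, not real credentials.
cfg = '''
[quandl]
api_key = YOUR_API_KEY

[eikon]
app_key = YOUR_APP_KEY
'''
config = configparser.ConfigParser()
config.read_string(cfg)
print(config['quandl']['api_key'])
```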
import quandl as q
data = q.get('BCHAIN/MKPRU', api_key=config['quandl']['api_key'])
data.info()
data['Value'].resample('A').last()
Quandl also provides diverse data sets for single stocks, like end-of-day stock prices, stock fundamentals, or data sets related to options traded on a certain stock.
data = q.get('FSE/SAP_X', start_date='2018-1-1', end_date='2020-05-01', api_key=config['quandl']['api_key'])
data.info()
The API key can also be configured permanently with the Python wrapper.
q.ApiConfig.api_key = 'YOUR_API_KEY'
The Quandl platform also offers premium data sets for which a subscription or fee is required.
q.ApiConfig.api_key = config['quandl']['api_key']
vol = q.get('VOL/MSFT')
vol.iloc[:, :10].info()
vol[['IvMean30', 'IvMean60', 'IvMean90']].tail()
Eikon Data API
Refinitiv is one of the biggest financial data and news providers in the world. Its current desktop flagship product is Eikon, which is the equivalent to the Terminal by Bloomberg, the major competitor in the data services field.
Refinitiv has streamlined its API landscape and has released a Python wrapper package, called eikon, for the Eikon Data API, which is installed via
pip install eikon
Get an Eikon app_key via the App Key Generator (APPKEY) application in either Eikon or Workspace
import eikon as ek
ek.set_app_key(config['eikon']['app_key'])
help(ek)
Retrieving Historical Structured Data
symbols = ['AAPL.O', 'MSFT.O', 'GOOG.O']
data = ek.get_timeseries(symbols, start_date='2020-01-01', end_date='2020-05-01', interval='daily', fields=['*'])
data.keys()
type(data['AAPL.O'])
data['AAPL.O'].info()
data['AAPL.O'].tail()
%%time
data = ek.get_timeseries(symbols, start_date='2020-08-14', end_date='2020-08-15', interval='minute', fields='*')
print(data['GOOG.O'].loc['2020-08-14 16:00:00':'2020-08-14 16:00:00'])
for sym in symbols:
print('\n' + sym + '\n', data[sym].iloc[-300:-295])
Compare the performance of the following code, which retrieves tick data.
%%time
data = ek.get_timeseries(symbols[0], start_date='2020-08-14 15:00:00', end_date='2020-08-14 15:30:00', interval='tick', fields='*')
data.info()
data.head()
resampled = data.resample('30s', label='right').agg({'VALUE': 'last', 'VOLUME': 'sum'})
resampled.tail()
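The same last/sum aggregation can be sketched on a few synthetic ticks (column names VALUE and VOLUME as in the Eikon tick data, values made up):

```python
import pandas as pd

# Synthetic ticks within a one-minute window.
idx = pd.to_datetime([
    '2020-08-14 15:00:05', '2020-08-14 15:00:25',
    '2020-08-14 15:00:35', '2020-08-14 15:00:55',
])
ticks = pd.DataFrame({'VALUE': [100.0, 100.5, 101.0, 100.8],
                      'VOLUME': [10, 20, 30, 40]}, index=idx)

# 30-second bars: last price per bar, summed volume per bar.
resampled = ticks.resample('30s', label='right').agg(
    {'VALUE': 'last', 'VOLUME': 'sum'})
print(resampled)
```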
Retrieving Historical Unstructured Data
The code that follows retrieves news headlines for a fixed time interval that includes Apple Inc. as a company and “Macbook” as a word.
headlines = ek.get_news_headlines(query='R:AAPL.0 macbook', count=5, date_from='2020-4-1', date_to='2020-5-1')
story = headlines.iloc[0]
news_text = ek.get_news_story(story['storyId'])
from IPython.display import HTML
HTML(news_text)
Storing Financial Data Efficiently
The helper function generate_sample_data() generates a sample financial data set in tabular form of arbitrary size.
from sample_data import generate_sample_data
print(generate_sample_data(rows=5, cols=4))
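The implementation of generate_sample_data() is not reproduced here; the following is a hypothetical sketch of such a helper based on NumPy pseudo-random numbers (column names, start date, and value range are assumptions and may differ from the real sample_data.py):

```python
import numpy as np
import pandas as pd

def generate_sample_data(rows, cols, freq='1min'):
    '''Hypothetical sketch: DataFrame of pseudo-random walk columns
    No0, No1, ... on a regular time grid, shifted to positive values.'''
    rows = int(rows)
    index = pd.date_range('2021-1-1', periods=rows, freq=freq)
    dr = np.random.standard_normal((rows, cols)).cumsum(axis=0)
    data = 100 + dr - dr.min(axis=0)  # shift so each column starts >= 100
    return pd.DataFrame(data, index=index,
                        columns=['No%d' % i for i in range(cols)])

print(generate_sample_data(rows=5, cols=4).round(4))
```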
Storing DataFrame Objects
The storage of a pandas DataFrame object as a whole is made simple by the pandas HDFStore wrapper functionality for the HDF5 binary storage standard. It allows one to dump complete DataFrame objects in a single step to a file-based database object.
%time data = generate_sample_data(rows=5e6, cols=10).round(4)
data.info()
Open an HDFStore object on disk and write the DataFrame object to it.
h5 = pd.HDFStore('data/data.h5', 'w')
%time h5['data'] = data
ls -n data/data.*
h5.close()
Read the data from the file-based HDFStore object.
h5 = pd.HDFStore('data/data.h5', 'r')
%time data_copy = h5['data']
data_copy.info()
h5.close()
rm data/data.h5
There is another, more flexible way of writing the data from a DataFrame object to an HDFStore object.
%time data.to_hdf('data/data.h5', 'data', format='table')
ls -n data/data.*
%time data_copy = pd.read_hdf('data/data.h5', 'data')
data_copy.info()
The advantage of this approach is that one can work with the table object on disk like with any other table object of the PyTables package, which pandas uses under the hood.
import tables as tb
h5 = tb.open_file('data/data.h5', 'r')
h5.root.data.table[:3]
h5.close()
rm data/data.h5
Using TsTables
%%time
data = generate_sample_data(rows=2.5e6, cols=5, freq='1s').round(4)
data.info()
Some more imports and the creation of the TsTables table object.
import tstables
import tables as tb
class desc(tb.IsDescription):
    '''Description of TsTables table structure.'''
    timestamp = tb.Int64Col(pos=0)
    No0 = tb.Float64Col(pos=1)
    No1 = tb.Float64Col(pos=2)
    No2 = tb.Float64Col(pos=3)
    No3 = tb.Float64Col(pos=4)
h5 = tb.open_file('data/data.h5ts', 'w')
ts = h5.create_ts('/', 'data', desc)
Write the sample data stored in a DataFrame object to the table object on disk
%time ts.append(data)
Read subsets of the data from a TsTables table object. TsTables returns the data already as a DataFrame object.
import datetime
start = datetime.datetime(2021, 1, 2)
end = datetime.datetime(2021, 1, 3)
%time subset = ts.read_range(start, end)
start = datetime.datetime(2021, 1, 2, 12, 30, 0)
end = datetime.datetime(2021, 1, 5, 17, 15, 30)
%time subset = ts.read_range(start, end)
subset.info()
h5.close()
rm data/*
Storing Data with SQLite3
The DataFrame class provides the method to_sql() to write data to a table in a relational database.
%time data = generate_sample_data(1e6, 5, '1min').round(4)
data.info()
import sqlite3 as sq3
con = sq3.connect('data/data.sql')
%time data.to_sql('data', con)
ls -n data/data.*
The size on disk of 100+ MB indicates that there is quite some overhead when using relational databases.
Consider a query that selects all those rows where the value in column No1 is greater than 105 and the value in column No2 is less than 108.
query = 'SELECT * FROM data WHERE No1 > 105 and No2 < 108'
%time res = con.execute(query).fetchall()
res[:5]
len(res)
con.close()
rm data/*
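The same SELECT pattern can be sketched against an in-memory SQLite database with a few hand-made rows (table and column names as above, sample values made up):

```python
import sqlite3 as sq3

# In-memory database with a minimal 'data' table.
con = sq3.connect(':memory:')
con.execute('CREATE TABLE data (No1 real, No2 real)')
con.executemany('INSERT INTO data VALUES (?, ?)',
                [(104.5, 107.0), (105.5, 107.5), (106.0, 109.0)])

# Only the middle row satisfies both conditions.
query = 'SELECT * FROM data WHERE No1 > 105 and No2 < 108'
res = con.execute(query).fetchall()
print(res)
con.close()
```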









