FXMasterCourse

Trading Order Out of Chaos


January 26, 2021 by Corvin Codirla 2 Comments

Improving your Python Backtesting – From DataFrames to Cython [Part 1]

Intro

Backtesting is every systematic trader’s basic tool. And Python is becoming the lingua franca of programming. So using Python for backtesting and getting fast results should be possible!

Yes and no!

In this article, we’ll cover how to really improve your Python backtesting and boost your speeds by several orders of magnitude!

First a quick table of cons / pros of using Python:

| Pros | Cons |
|---|---|
| Quick implementation time [Python’s forte!] | You’ll grow really old waiting for the results |
| Gazillions of libraries for fancy output | |
| Gazillions of libraries for fancy analysis | |

And it’s that one con that’s the real biggy.

You’re a trader, so by definition you already have the attention span of a goldfish. It stands to reason, therefore, that waiting for a couple of seconds for a backtesting result to come back is an eternity.

It really is when you’re looking at a portfolio of strategies, a portfolio of assets, a portfolio of both, or if you’re running any sort of optimization.

So, in this article we’ll cover some simple and more sophisticated ways of improving our timing. We’ll start out with pure Python solutions and in Part 2 of this series we’ll cover the more sophisticated Cython module set, to squeeze the last ounce out of our code.

To keep ourselves on the straight and narrow, we’ll use Java and C implementations as benchmarks. Of course, these languages trounce Python. But by the end of our journey you’ll agree that we don’t have to give up the comfortable life Python offers to get massive speed improvements.

Background

The above statements might cause people to immediately say “Vectorize your Code using Pandas and NumPy!”

Agreed, in many circumstances you can speed up execution by calling vectorized maths functions from Pandas and NumPy.

But, let’s face it, the challenge for trading is that your decision now depends on a bunch of state variables from the last n time steps. This is where vectorization fails and you’re now in the world of having to write for-do loops.

And even Event Driven backtesters reduce to for-do loops when you run simulations on historical data.

So, the real challenge is running fast for-do loops in Python.
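To make the point about state concrete, here is a minimal sketch using made-up numbers (the arrays below are illustrative, not SPY data). The entry and exit signals vectorize without any trouble, but whether we actually hold a position on a given bar depends on what we held on the bar before, and that is the part that doesn’t vectorize naturally.

```python
import numpy as np

# Hypothetical values for five bars.
close = np.array([100.0, 102.0, 110.0, 108.0, 95.0])
ma_short = np.array([101.0, 103.0, 105.0, 107.0, 104.0])
rsi = np.array([10.0, 50.0, 90.0, 60.0, 5.0])

# Stateless signals: each element depends only on that bar, so NumPy handles them in one go.
entry_signal = rsi < 20
exit_signal = close > ma_short

# The position is stateful: acting on a signal depends on the previous bar's holding,
# so each step needs the result of the step before it.
in_market = np.zeros(len(close), dtype=bool)
holding = False
for i in range(len(close)):
    if holding and exit_signal[i]:
        holding = False
    elif not holding and entry_signal[i]:
        holding = True
    in_market[i] = holding

print(in_market)
```

Once position sizing and cash enter the picture, as they do in the system below, the dependence on the previous step only gets stronger.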

Is this possible?

Setup

To experiment with and validate the various methods of speeding up our backtesting for-do loops, we’ll use a straightforward trading system: the RSI(2) applied to the SPY from January 1993 until now.

Written in pseudo code it looks like:

// cash and position are arrays; the index for "today" is implicit
// C[0] is today's close

for all history do:
   cash today = cash yesterday
   position today = position yesterday

   // Exit long
   if position today > 0 and C[0] > MA[Close, 5]
      cash today = cash today + position today * C[0]
      position today = 0

   // Enter long
   if position today = 0 and C[0] > MA[Close, 200] and RSI[Close, 2] < 20
      position today = cash today / C[0]
      cash today = cash today - position today * C[0]

Since we simply want to focus on the efficiency of the for loops we’ll pre-calculate the Moving Averages and Relative Strength Indicators and simply look up the values in their corresponding array storages.
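As a rough sketch of that pre-calculation step, the snippet below adds the three indicator columns with pandas. Several things here are assumptions rather than the code behind the timings in this article: the function name `add_indicators`, the column names, the synthetic random-walk prices standing in for SPY, and the RSI variant, which uses simple rolling averages of gains and losses (Wilder’s original RSI uses a different smoothing).

```python
import numpy as np
import pandas as pd


def add_indicators(df, short_window=5, long_window=200, rsi_window=2):
    """Return a copy of df (needs a 'close' column) with ma_short, ma_long and rsi added."""
    out = df.copy()
    out["ma_short"] = out["close"].rolling(short_window).mean()
    out["ma_long"] = out["close"].rolling(long_window).mean()

    # RSI from simple rolling averages of up-moves and down-moves (an assumption).
    delta = out["close"].diff()
    avg_gain = delta.clip(lower=0).rolling(rsi_window).mean()
    avg_loss = (-delta).clip(lower=0).rolling(rsi_window).mean()
    # Equivalent to 100 - 100 / (1 + avg_gain / avg_loss); flat prices give NaN.
    out["rsi"] = 100 * avg_gain / (avg_gain + avg_loss)
    return out


# Synthetic random-walk prices, purely for illustration.
rng = np.random.default_rng(0)
dates = pd.bdate_range("1993-01-29", periods=1000)
prices = pd.DataFrame(
    {"close": 100 * np.exp(np.cumsum(rng.normal(0, 0.01, 1000)))}, index=dates
)
data = add_indicators(prices)
```

The warm-up rows of each indicator are NaN. Comparisons against NaN evaluate to False, so a loop that only trades on `>` and `<` conditions simply does nothing until the indicators are populated.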

Python Backtesting – DataFrames

The usual way somebody could implement this in Python is to:

  1. load the data into a DataFrame using the ubiquitous Pandas library
  2. add columns to the DataFrame for moving averages and relative strength
  3. loop over the rows of the data frame to calculate cash and position values over the lifetime of the system

So, it’s worthwhile to see how looping is implemented using DataFrames, since in and of itself it’s not the most obvious, and it’s the first place to remove bottlenecks.

Method 1: Indexing using dates

A naïve method would use the date index of the DataFrame to retrieve the values from the matrix, while looping over all dates.

For 7,000 rows this gives an execution time of 7.3 seconds.

This is pretty impressive slowness.

Here’s the code skeleton:

dt_range = df.index

for d in dt_range:
    if df.loc[d, 'close'] > df.loc[d, 'ma_short']:
        ...

You get the picture.
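For anyone who wants more than the picture, here is one way the full loop could look. It’s a sketch, not the code that produced the timing above: the function name `backtest_loc`, the starting cash, and the tiny hand-made DataFrame (four rows of invented values) are all assumptions chosen so the result is easy to check by hand.

```python
import pandas as pd

# Four invented rows standing in for the SPY history.
df = pd.DataFrame(
    {
        "close": [100.0, 102.0, 110.0, 108.0],
        "ma_short": [101.0, 103.0, 105.0, 107.0],
        "ma_long": [90.0, 90.0, 91.0, 92.0],
        "rsi": [10.0, 50.0, 90.0, 60.0],
    },
    index=pd.date_range("2021-01-04", periods=4, freq="D"),
)


def backtest_loc(df, start_cash=1000.0):
    """Run the RSI(2) rules with a separate label-based .loc lookup for every value."""
    cash, shares = start_cash, 0.0
    for d in df.index:
        close = df.loc[d, "close"]
        # Exit long
        if shares > 0 and close > df.loc[d, "ma_short"]:
            cash += shares * close
            shares = 0.0
        # Enter long
        if shares == 0 and close > df.loc[d, "ma_long"] and df.loc[d, "rsi"] < 20:
            shares = cash / close
            cash -= shares * close
    return cash, shares


final_cash, final_shares = backtest_loc(df)
print(final_cash, final_shares)
```

On this toy data the system buys 10 shares at 100 on the first bar and sells them at 110 on the third. Note that every single value costs a full label lookup through `.loc`.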

Method 2: Using the iterators from DataFrames [iterrows]

The above example might be intuitive since it queries data for specific days instead of using integer indices to get array / list / series values.

However, a more natural manner of accessing values in the matrix while looping would be to use the built-in constructs.

The first one you come across is the built-in DataFrame method iterrows.

For the same 7,000 rows the time taken to complete the loop is 1.3 seconds.

Here’s the code skeleton:

for ix, row in df.iterrows():
    if long and (row.close > row.ma_short):
        ...
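Filled out into something runnable, the iterrows version might look like the sketch below. The function name `backtest_iterrows`, the starting cash, and the four-row DataFrame of invented values are assumptions for illustration. One thing worth noticing is that iterrows hands back every row as a full pandas Series, which is a likely source of the remaining per-row cost.

```python
import pandas as pd

# Four invented rows standing in for the SPY history.
df = pd.DataFrame(
    {
        "close": [100.0, 102.0, 110.0, 108.0],
        "ma_short": [101.0, 103.0, 105.0, 107.0],
        "ma_long": [90.0, 90.0, 91.0, 92.0],
        "rsi": [10.0, 50.0, 90.0, 60.0],
    },
    index=pd.date_range("2021-01-04", periods=4, freq="D"),
)


def backtest_iterrows(df, start_cash=1000.0):
    """Run the RSI(2) rules using DataFrame.iterrows()."""
    cash, shares = start_cash, 0.0
    for ix, row in df.iterrows():
        # `row` is a pandas Series built for this one row.
        # Exit long
        if shares > 0 and row.close > row.ma_short:
            cash += shares * row.close
            shares = 0.0
        # Enter long
        if shares == 0 and row.close > row.ma_long and row.rsi < 20:
            shares = cash / row.close
            cash -= shares * row.close
    return cash, shares


first_label, first_row = next(df.iterrows())
print(type(first_row).__name__)
print(backtest_iterrows(df))
```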

So firstly, there is hope: just by changing the approach we’ve sped the loop up by a factor of five. But it’s still pretty lousy. Imagine wanting to loop over the stocks in the S&P 500. At 1.3 seconds per stock, a single pass would take you roughly 11 minutes, and any sort of optimization multiplies that many times over.

Who has time to sit around for that!!

Is there a better way?

Method 3: A second iterator from DataFrames [itertuples]

It’s really remarkable that there are two methods which are so similar in behavior yet in terms of performance are light years apart.

Replace df.iterrows() by df.itertuples().  Here is the code skeleton:

for row in df.itertuples():
    if long and row.close > row.ma_short:
        ...

The syntax change is minimal; however, the run time drops from 1.3 seconds to … wait for it … 0.03 seconds.

Yes, you’ve read that right! The same code, same logic, and same container [the DataFrame] and we’ve sped up the code by a factor of 43.

This is pretty insane, right?
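Here is a sketch of the complete loop with itertuples. As before, the function name `backtest_itertuples`, the starting cash, and the four invented rows are assumptions for illustration. The difference from iterrows is that each row arrives as a lightweight namedtuple (with the index in the `Index` field) rather than as a pandas Series.

```python
import pandas as pd

# Four invented rows standing in for the SPY history.
df = pd.DataFrame(
    {
        "close": [100.0, 102.0, 110.0, 108.0],
        "ma_short": [101.0, 103.0, 105.0, 107.0],
        "ma_long": [90.0, 90.0, 91.0, 92.0],
        "rsi": [10.0, 50.0, 90.0, 60.0],
    },
    index=pd.date_range("2021-01-04", periods=4, freq="D"),
)


def backtest_itertuples(df, start_cash=1000.0):
    """Run the RSI(2) rules using DataFrame.itertuples()."""
    cash, shares = start_cash, 0.0
    for row in df.itertuples():
        # `row` is a namedtuple: row.Index, row.close, row.ma_short, ...
        # Exit long
        if shares > 0 and row.close > row.ma_short:
            cash += shares * row.close
            shares = 0.0
        # Enter long
        if shares == 0 and row.close > row.ma_long and row.rsi < 20:
            shares = cash / row.close
            cash -= shares * row.close
    return cash, shares


first_row = next(df.itertuples())
print(first_row._fields)
print(backtest_itertuples(df))
```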

Method 4: Forget about DataFrames and pack everything into lists

So, if DataFrames can work so well, and DataFrames are actually nothing more than complex wrappers around simple arrays, what happens if we just throw out the wrapping and use the arrays themselves? I.e., shove the data into Python lists and loop over those?

The overhead in programming is a bit more, since we now need to explicitly code for each individual list we want to use, but…

… the time for performing the 7,000 loop iterations now has become 0.003 seconds!

You read that right! So, if we were to analyze the S&P 500 stocks it would take a total of 1.5 seconds. Which is much better than lounging around in front of the screen waiting for iterrows to finish.
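A minimal sketch of the list-based loop follows, mirroring the pseudo code from the Setup section with explicit `cash` and `position` lists. The function name `backtest_lists`, the starting cash, and the four invented bars are assumptions for illustration. With a real DataFrame you would pull each column out once before the loop, for example with `df["close"].tolist()`.

```python
def backtest_lists(close, ma_short, ma_long, rsi, start_cash=1000.0):
    """Run the RSI(2) rules over plain Python lists; returns the cash and position lists."""
    n = len(close)
    cash = [start_cash] * n
    position = [0.0] * n
    for i in range(n):
        if i > 0:
            # Carry yesterday's state forward.
            cash[i] = cash[i - 1]
            position[i] = position[i - 1]
        # Exit long
        if position[i] > 0 and close[i] > ma_short[i]:
            cash[i] += position[i] * close[i]
            position[i] = 0.0
        # Enter long
        if position[i] == 0 and close[i] > ma_long[i] and rsi[i] < 20:
            position[i] = cash[i] / close[i]
            cash[i] -= position[i] * close[i]
    return cash, position


# Four invented bars standing in for the SPY history.
close = [100.0, 102.0, 110.0, 108.0]
ma_short = [101.0, 103.0, 105.0, 107.0]
ma_long = [90.0, 90.0, 91.0, 92.0]
rsi = [10.0, 50.0, 90.0, 60.0]

cash, position = backtest_lists(close, ma_short, ma_long, rsi)
print(cash)
print(position)
```

The extra programming overhead is visible in the signature: every column the system needs has to be passed in and indexed by hand.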

Conclusion

Part 1 of this series took a monstrous 7.3-second loop, backtesting a simple system in Python, and reduced it to 0.003 seconds. That’s an improvement of roughly 2,400 times. Nothing to be sniffed at, and it only involved some basic rewriting of code!

All we did was to acknowledge that DataFrames are great for storing data and applying some math functions to the columns (or rows) in a vectorized fashion. But when it comes to looping, we might as well go down the old-fashioned way of using plain arrays (the closest built-in equivalent in Python being lists).

So where to next?

In Part 2 of speeding up Python backtesting we’ll start to delve into the Cython module set. This does something funky: it takes your Pythonesque source [a file that ends in .pyx] and transpiles it to the C language. In so doing it can perform a bunch of optimizations which your Python interpreter wasn’t built to do, since it has to deal with the most generic use cases. However, you have the option of giving Cython very specific indications as to how your source code is supposed to be used.

The end result is an even bigger speed up!

Do we come close to Java and C on this simple loop?

Check in to Part 2 where we unveil the Cython results as well as provide a link to the GitHub code so you can check it out for yourself!

Here’s the summary of the speed-ups to date of our Python backtesting, with the corresponding comparisons to a Java / C implementation:

| Implementation | Time for RSI(2) backtest |
|---|---|
| Python – DataFrame, date indexing | 7.3 s |
| Python – DataFrame, iterrows | 1.3 s |
| Python – DataFrame, itertuples | 0.03 s |
| Python – Lists | 0.003 s |
| Java | 0.00005 s |
| C | 0.00002 s |
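To put those timings in perspective, the short snippet below turns them into speed-up factors relative to date indexing. The numbers are copied from the summary; only the dictionary and variable names are made up. It shows lists coming in roughly 2,400 times faster than date indexing, while still sitting about 150 times behind C.

```python
# Timings in seconds, taken from the summary above.
timings = {
    "DataFrame, date indexing": 7.3,
    "DataFrame, iterrows": 1.3,
    "DataFrame, itertuples": 0.03,
    "Lists": 0.003,
    "Java": 0.00005,
    "C": 0.00002,
}

baseline = timings["DataFrame, date indexing"]
speedups = {name: baseline / seconds for name, seconds in timings.items()}

for name, factor in speedups.items():
    print(f"{name:<26} {factor:>10,.0f}x")

# How far the pure-Python list loop still is from C.
lists_vs_c = timings["Lists"] / timings["C"]
print(f"Lists vs C: {lists_vs_c:,.0f}x slower")
```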

Filed Under: Trading Strategies

Comments

  1. Mason says

    February 5, 2021 at 6:57 pm

    Hi, good to see you’re back on the blog. Found a lot of very useful info out of this and your other posts.

    • Corvin Codirla says

      February 23, 2021 at 10:52 pm

      Hi Mason!

      Thank you very much!
