Unix Trader: Modeling Mean Reversion for Statistical Arbitrage

The goal of a classical statistical arbitrage strategy is to extract returns by holding combinations of instruments that follow a mean reverting process. A classic example is a so-called Pairs Trading Strategy that benefits from the mean reverting relationship between correlated stocks such as Coca Cola and Pepsi. A second example is a Yield Curve Butterfly Strategy that attempts to capture the mean reverting relationship between medium, long, and short-term interest rates. Mean reversion is an attractive property as one can extract returns from the market by selling the peaks and buying the troughs of the wave. The amplitude and frequency are two characteristics of a wave that are important in stat arb. The amplitude determines the profit potential of a single cycle in the wave and the frequency determines how many of these opportunities arise over a period of time. We can simulate a sin wave to illustrate these ideas with the following form $$ y(t) = A \cdot \text {sin} ( 2 \pi ft ) $$

where A is amplitude, and f is frequency. The following figure shows two sin waves. The red wave has a frequency of 1 while the yellow wave has a frequency of 2. Thus, the latter wave oscillates more frequently and so presents more trading opportunities.

Of course, the greater the amplitude of the waves the greater the profit potential from an individual trade. The following plot shows a sin wave with an amplitude of 1 in red, and an amplitude of 2 in yellow, with both waves having a frequency of 2. In this case we prefer to trade the yellow series as we can make more profit in the same amount of time.

Although it is possible to find mean-reverting trades, in practice the waves are noisey. The introduction of noise makes it more difficult to statistically identify mean reverting behavior and also makes risk management a more challenging task. We can simulate noisey waves by including a noise term in the above equation $$ y(t) = A \cdot \text {sin} ( 2 \pi ft) + \epsilon_t $$
where $\epsilon_t$ is a noise variable that follows some distribution. The following figure shows a sin wave with and without normally distributed noise.

In my experience the noise tends to be non normal. For example, the Bund-Bobl-Schatz butterfly spread is mean-reverting, but it is subject to irregular extreme outliers that can result in large losses. To illustrate this, the following figure shows a sin wave with and without Student T distributed noise with 3 degrees of freedom. From the figure it is clear that there are multiple occassions where huge fat tailed events could have resulted in margin calls on positions that at the time were fading the spread.

In finance the most popular model for explaining mean reverting behaviour is the Ornstein-Uhlenbeck (OH) model. The stochastic differential equation (SDE) for the OH process is given by

$$ d S_t = \lambda ( \mu - S_t ) dt + \sigma d W_t $$
with $\lambda$ the mean reversion rate, $\mu$ the mean, and $\sigma$ the volatility. A useful metric is the Half-Life, denoted H, which indicates the number of periods it takes the process to revert from the peak half way to the mean. The Half-Life is defined as
$$ H = \frac{\text{log} (2) } {\lambda} $$
All other things held constant, we prefer a low Half-Life as this results in more trading signals over a period of time.

A previous blog post outlines simple closed form solutions for calibrating the OH model parameters to a given dataset using the least squares regression method, and the maximum likelihood method. The author provides Matlab code for the calibrations. I decided to implement his functions in Python. The Python implementation for the least squares calibration is given below

def OU_Calibrate_LS(S,delta):
    from numpy import sum,power,sqrt,log,exp
    n = len(S)-1

    Sx = sum(S[0:-1])
    Sy = sum(S[1: ])
    Sxx = sum(power(S[0:-1],2))
    Sxy = sum( S[0:-1] * S[1: ])
    Syy = sum(power(S[1: ],2))

    a  = ( n*Sxy - Sx*Sy ) / ( n*Sxx - power(Sx,2) )
    b = ( Sy - a*Sx ) / n
    sd = sqrt( (n*Syy - power(Sy,2) - a*(n*Sxy - Sx*Sy) )/n/(n-2) )

    lambda_ = -log(a)/delta
    mu     = b/(1-a)
    sigma  =  sd * sqrt( -2*log(a)/delta/(1-pow(a,2)) )
    return (mu,sigma,lambda_)

And the maximum likelihood implementation is given by

def OU_Calibrate_ML(S,delta):
    from numpy import sum,power,sqrt,log,exp
    n = len(S)-1

    Sx = sum(S[0:-1])
    Sy = sum(S[1: ])
    Sxx = sum(power(S[0:-1],2))
    Sxy = sum( S[0:-1] * S[1: ])
    Syy = sum(power(S[1: ],2))

    mu  = (Sy*Sxx - Sx*Sxy) / ( n*(Sxx - Sxy) - (power(Sx,2) - Sx*Sy) )

    lambda_ = -log( (Sxy - mu*Sx - mu*Sy + n*power(mu,2)) / (Sxx -2*mu*Sx + n*power(mu,2)) ) / delta
    a = exp(-lambda_*delta)
    sigmah2 = (Syy - 2*a*Sxy + power(a,2)*Sxx - 2*mu*(1-a)*(Sy - a*Sx) + n*power(mu,2)*power(1-a,2))/n
    sigma = sqrt(sigmah2*2*lambda_/(1-power(a,2)))

    return (mu,sigma,lambda_)

In a future post I will investigate a number of potential mean reverting opportunities within the futures market using the OH model as a guide. Thanks for reading!

Popular Posts

Friday, 4 October 2013

Modeling Mean Reversion for Statistical Arbitrage

No comments:

Post a Comment