Thursday, September 17, 2015

Diversified Statistical Arbitrage: Dynamically combining mean reversion and momentum investment strategies - James Velissaris 2010


A dynamically adjusted strategy between mean-reversion and momentum (2008, 2009). Stocks are grouped together using PCA. The idiosyncratic return is calculated by comparing the return of the stock to the return of the entire group. This residual return often oscillates around a long-term mean. This strategy is dollar neutral and has high turnover. The medium-term momentum strategy trades the 9 sector ETFs, based on technical trading rules. Dynamic allocation was done between the 11 strategies, with rebalancing at the end of each month. Out-of-sample IR of 2.27, with a beta of 35%.

Equity mean reversion model

The decomposition of the stock returns is given by $$r_t = \alpha + \sum_{j=1}^n \beta_j F_{jt} + \epsilon_t.$$ PCA of the normalized returns (after centering and normalizing the data in a 252-day moving window) is used and the first 12 factors are retained. The eigenportfolio returns $F_{jt}$ are given by $\sum_i \frac{v^{(j)}_i}{\sigma_i}R_{it}$. We further neglect the drift in returns. The model we implement for the residual is $dX_t=k(m-X_t)dt+\sigma dW_t$. The mean reversion time is $\tau = 1/k$. Only stocks with a mean reversion time within 20 days are used. For the s-score $s=\frac{X_t-m}{\sigma_{eq}}$, go short at +1.25 and exit at +0.75 (and symmetrically for longs). Trading cost is 10 bps. The model is two-times levered per side, or four-times levered gross (industry standard).
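The s-score rule above can be sketched as follows. This is only an illustration: the notes do not define $\sigma_{eq}$, so the code assumes the stationary standard deviation of the OU process, $\sigma/\sqrt{2k}$, and assumes $k$ is expressed per trading day. The function names are hypothetical.

```python
import math

def s_score(x, m, k, sigma):
    """Distance of the residual x from its mean m, in units of the equilibrium std."""
    # Assumption: sigma_eq is the stationary std of dX = k(m - X)dt + sigma dW.
    sigma_eq = sigma / math.sqrt(2.0 * k)
    return (x - m) / sigma_eq

def is_tradable(k, max_days=20.0):
    """Keep a stock only if tau = 1/k is within max_days (k assumed per trading day)."""
    return k > 0 and 1.0 / k <= max_days

def next_position(s, position, entry=1.25, exit_level=0.75):
    """Return -1 (short), 0 (flat) or +1 (long) given the s-score and current position."""
    if position == 0:
        if s >= entry:
            return -1          # residual is rich: go short
        if s <= -entry:
            return 1           # residual is cheap: go long
        return 0
    if position == -1 and s <= exit_level:
        return 0               # short closed once s falls back to +0.75
    if position == 1 and s >= -exit_level:
        return 0               # long closed once s rises back to -0.75
    return position
```

The gap between the entry and exit thresholds acts as a hysteresis band, so a position is not opened and closed repeatedly while the s-score hovers around a single level.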

Momentum strategy

S&P500 industry sector ETFs, S&P500 ETF and SPY. 60-day and 5-day exponential moving averages (EMA) are used. The signal is long if the 5d EMA has been above the 60d EMA for the previous 4 or more trading days. In all other scenarios the signal is short. Trades are not rebalanced, and a 10 bps cost is assumed.
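A minimal sketch of the crossover signal, under two assumptions that are not in the notes: the EMA uses the common smoothing factor $2/(\text{span}+1)$ seeded with the first price, and the current day counts towards the 4-day confirmation. The function names are hypothetical.

```python
def ema(prices, span):
    """Exponential moving average with smoothing factor 2 / (span + 1)."""
    alpha = 2.0 / (span + 1.0)
    out = [prices[0]]                     # assumed seed: first observation
    for p in prices[1:]:
        out.append(alpha * p + (1.0 - alpha) * out[-1])
    return out

def momentum_signal(prices, fast=5, slow=60, confirm=4):
    """+1 once the fast EMA has been above the slow EMA for `confirm` days, else -1."""
    fast_ema, slow_ema = ema(prices, fast), ema(prices, slow)
    signals, run = [], 0
    for f, s in zip(fast_ema, slow_ema):
        run = run + 1 if f > s else 0     # length of the current "fast above slow" streak
        signals.append(1 if run >= confirm else -1)
    return signals

rising = [100.0 + i for i in range(30)]
falling = [100.0 - i for i in range(30)]
rising_signal = momentum_signal(rising)
falling_signal = momentum_signal(falling)
```

On a steadily rising series the signal starts short and flips long once the streak reaches four days; on a falling series it stays short throughout.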

In-sample analysis

The 2005-2007 in-sample period shows the mean-reversion strategy doing much better than momentum, with an IR of 1.28. The equally weighted strategy has an IR of 0.49.

Optimization and out-of-sample results

There are returns to be made by dynamically optimizing the weights of different strategies. We can use quadratic programming with the objective function and constraints $$\min_x \frac{1}{2}x^THx+f^Tx \quad \text{s.t.} \quad Ax \le b, \quad A_{eq}x=b_{eq}, \quad lb \le x \le ub.$$
An important input into the process is the lower and upper bounds for each variable. Using expected returns and allocation targets, we can customize the optimization process to best suit our portfolio specifications. The goal of this optimization is to maximize the Sharpe ratio of the diversified portfolio with a penalty for marginal risk contribution. The portfolio was optimized at the end of each month using the returns from the previous 252 trading days. No transaction cost was applied beyond the flat 10 bps per trade. The diversified strategy IR is 2.27 vs a static allocation IR of 1.56, out-of-sample. The mean reversion strategy has a beta exposure. Optimization can be used to control beta, volatility and leverage, as well as drawdowns.


  • Potential benefit of including both mean-reversion and momentum in portfolio.
  • Did not hedge the beta risk using SPY, but can be done.
  • Momentum signal using PCA eigen-portfolios is not apparent at individual stock level.
  • Potentially greater alpha at finer time scales.
  • Varying time-scales with signal decay for both momentum and mean reversion can be useful.

Wednesday, September 16, 2015

Scaling by correlation matrix

We analyze the effect of scaling a signal by the inverse of the correlation matrix here. We start by assuming that the two assets $A_1$ and $A_2$ have unit variance. This reduces the covariance matrix to the correlation matrix. We assume a simple correlation matrix of the form $$\begin{bmatrix} 1 & c \\ c & 1 \end{bmatrix}.$$ Now let's say we have generated a signal of $\mu_1$ and $\mu_2$ for the two assets before scaling. This means that the unscaled portfolio can be written as $$\mu_1 A_1 + \mu_2 A_2.$$ Now the inverse of the correlation matrix is $$\frac{1}{1-c^2}\begin{bmatrix} 1 & -c \\ -c & 1\end{bmatrix}.$$ This makes the scaled signal ($\Sigma^{-1}\mu$) $$\frac{\mu_1-c\mu_2}{1-c^2}A_1+\frac{\mu_2-c\mu_1}{1-c^2}A_2.$$ We can see that the 'scaled signal' depends on both the 'original signal' ($\mu_1$ and $\mu_2$) and the correlation value ($c$). Another way to look at the 'scaled signal' is to write the portfolio as $$\mu_1\left[\frac{1}{1-c^2}A_1-\frac{c}{1-c^2}A_2\right] + \mu_2\left[\frac{1}{1-c^2}A_2-\frac{c}{1-c^2}A_1\right].$$ This is another way of saying that we trade the same original signal but replace the assets $A_1$ and $A_2$ with the spreads $\left[\frac{1}{1-c^2}A_1-\frac{c}{1-c^2}A_2\right]$ and $\left[\frac{1}{1-c^2}A_2-\frac{c}{1-c^2}A_1\right]$. In the table below we look at this 'spread' for different values of the correlation coefficient $c$. We also show the 'altered' signal value for the assets $A_1$ and $A_2$.
$$\begin{array}{c|cc|cc}
c & \text{spread for } \mu_1 & \text{spread for } \mu_2 & \text{signal on } A_1 & \text{signal on } A_2 \\ \hline
+0.9 & 5.3A_1-4.7A_2 & 5.3A_2-4.7A_1 & 5.3\mu_1-4.7\mu_2 & 5.3\mu_2-4.7\mu_1 \\
+0.5 & 1.3A_1-0.7A_2 & 1.3A_2-0.7A_1 & 1.3\mu_1-0.7\mu_2 & 1.3\mu_2-0.7\mu_1 \\
+0.1 & 1.0A_1-0.1A_2 & 1.0A_2-0.1A_1 & 1.0\mu_1-0.1\mu_2 & 1.0\mu_2-0.1\mu_1 \\
0.0 & A_1 & A_2 & \mu_1 & \mu_2 \\
-0.1 & 1.0A_1+0.1A_2 & 1.0A_2+0.1A_1 & 1.0\mu_1+0.1\mu_2 & 1.0\mu_2+0.1\mu_1 \\
-0.5 & 1.3A_1+0.7A_2 & 1.3A_2+0.7A_1 & 1.3\mu_1+0.7\mu_2 & 1.3\mu_2+0.7\mu_1 \\
-0.9 & 5.3A_1+4.7A_2 & 5.3A_2+4.7A_1 & 5.3\mu_1+4.7\mu_2 & 5.3\mu_2+4.7\mu_1
\end{array}$$
For high absolute correlations, as long as $\mu_1$ and $\mu_2$ are comparable the total portfolio positions stay within limits. But if $\mu_1$ and $\mu_2$ differ substantially, huge positive and negative positions can be created, which may be undesirable. This is a likely scenario, as signals are based on recently updated information while the correlations rely on a slow-moving window.
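The two-asset scaling can be checked numerically with a few lines. This sketch hard-codes the closed-form $2\times 2$ inverse from above; the function name is hypothetical.

```python
def scale_by_inverse_correlation(mu1, mu2, c):
    """Apply the inverse of [[1, c], [c, 1]] to the signal (mu1, mu2)."""
    det = 1.0 - c * c                      # determinant of the correlation matrix
    return (mu1 - c * mu2) / det, (mu2 - c * mu1) / det

# Comparable signals with c = 0.9 are shrunk to 1 / (1 + c) each.
agree = scale_by_inverse_correlation(1.0, 1.0, 0.9)
# Opposite signals with c = 0.9 are amplified to roughly +10 and -10.
disagree = scale_by_inverse_correlation(1.0, -1.0, 0.9)
```

With $c=0.9$, two equal signals each shrink to about 0.53, while two opposite signals of the same size become positions of roughly $\pm 10$, which is the blow-up described above.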

What if we add a third asset $A_3$ with signal $\mu_3$ which is uncorrelated to the first two assets? We have the correlation matrix as $$\begin{bmatrix} 1 & c & 0 \\ c & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix},$$ the inverse of this matrix is $$\frac{1}{1-c^2}\begin{bmatrix}1 & -c & 0\\ -c & 1 & 0 \\ 0 & 0 & 1-c^2\end{bmatrix}.$$ This results in the following 'altered' portfolio $$\frac{\mu_1-c\mu_2}{1-c^2}A_1+\frac{\mu_2-c\mu_1}{1-c^2}A_2+\mu_3A_3.$$ This shows that the signal of the uncorrelated asset is not changed.

Pairs trading the commodity futures curve - Antti Nikkanen

Notes on Antti Nikkanen Master's thesis Aug 2012

Ch1. Introduction

A commodity futures trading strategy which exploits the roll returns of commodity futures as its main driver of excess return. To minimize the volatility of returns, a pairs trading methodology is used to trade the futures curve, with a Sharpe of 3. Liquidity is taken into account with a trading cost of 3.3 bps. Commodities are still a relatively unknown asset class because of the lack of good data, and because a commodity future is a derivative security and a short-maturity claim on a real asset, with pronounced seasonality in price levels and volatility.

Ch2. Literature Review

Hong and Yogo (2012) show that aggregate basis (ratio of futures price to commodity price) is the most important predictor of commodity returns. The main factor behind the fluctuation of the aggregate basis is hedging pressure (how much producers short commodity futures to hedge their long positions in the underlying spot).

Erb and Harvey (2006) show that roll returns explain more than 90% of long-run cross-sectional variation of commodity futures returns over 1982-2004. The time-series variation of future returns is mostly explained by spot price movement. To become spot neutral the author creates spreads.

Fuertes and Miffre (2010) show a tactical strategy of shorting contangoed futures and going long backwardated futures. They also include momentum.

Gorton and Rouwenhorst (2005) state that commodity futures returns are negatively correlated with equity and bond returns. But this low correlation exists only in 'normal' markets. The spread strategy reduces correlation even in 'abnormal' markets.

Ch3. Theory

Commodity markets do not fit the CAPM (Bodie and Rosansky 1980) because it is difficult to make a distinction between systematic risk/return and unsystematic risk/return. Also, the price is dependent on demand and supply factors, not perceived adequate risk premiums.

Some stocks (like the Finnish mining company Talvivaara) closely follow the price of the underlying commodity (nickel). But many companies, especially the oil companies, have hedged away their oil exposure, e.g. ExxonMobil. Commodity ETFs may have large tracking error, e.g. USO is an oil ETF but massively lagged the movements in oil prices after the 2008 crash because it rolled its portfolio in times of negative roll returns. GLD, on the other hand, tracks spot gold quite closely.

Less than 1% of futures contracts result in delivery of the underlying asset. Commodity futures do not represent direct exposures to actual commodities. They are bets on expected future spot prices (Gorton and Rouwenhorst 2005). The relationship between the futures and spot price is $F=Se^{(r+c-y)(T-t)}$, where $r$ is the risk-free rate, $c$ is the storage cost (storage facilities, insurance, inspections, transportation and maintenance, spoilage and financing) and $y$ is the convenience yield (ability to profit from local supply-demand imbalances, leasing of gold to jewelry manufacturers).
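As a quick worked example of the cost-of-carry relationship, the sketch below evaluates $F=Se^{(r+c-y)(T-t)}$ for illustrative (made-up) rates. The futures price sits above spot when $r+c>y$ and below spot when the convenience yield dominates.

```python
import math

def futures_price(spot, r, c, y, time_to_maturity):
    """Cost-of-carry futures price; rates are continuously compounded, time in years."""
    return spot * math.exp((r + c - y) * time_to_maturity)

# Illustrative inputs only, not taken from the thesis.
contango_price = futures_price(100.0, r=0.05, c=0.02, y=0.00, time_to_maturity=1.0)
backwardated_price = futures_price(100.0, r=0.05, c=0.02, y=0.10, time_to_maturity=1.0)
```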

Economics of backwardation and contango

Upward sloping (contango) and downward sloping (backwardation) curves are determined by demand, supply and seasonal changes. For a hedger who is inherently long (a petroleum producer long crude through its exposure to oil exploration, development, refining and marketing), speculators will take on the long risk only if the futures price is sufficiently discounted versus the spot price, i.e. the curve is in backwardation (Anson 2009). Contango occurs for commodities in which the hedger is inherently short the commodity (e.g. an aircraft manufacturer that does not own aluminum mines and is willing to purchase futures contracts for future aluminum delivery). Hence, the profit for the speculator is determined by the hedgers' demand for risk capital, not by the long-term price trends of the commodity markets (Anson 2009).

Hicks' rational expectations hypothesis states that the price of an asset for delivery in the future must be the market's current forecast of the spot price on the future delivery date (the spot does not move in the absence of any further information). This has not proven useful in practice. Storage models have explained observed prices better; they state that the relationship between the spot and the future depends on current and expected future storage levels (i.e. inventory). This means the spot price is also expected to move through maturity. A difficult-to-store commodity (NG) has a steep forward curve. When inventories are high relative to demand the curve will be upward-sloping, and when they are tight it will be downward-sloping (Till, Feldman 2006). These difficult-to-store commodities (HO, HG, LC, LH) have the highest average excess returns versus easy-to-store commodities.

Commodity futures returns composition

The commodity futures return is the sum of the spot return, the risk-free rate and the roll return. Commodity markets are prone to sudden spot price rises but show a mean-reverting tendency over longer periods.


Managed futures strategies are generally trend following, in contrast to market timing strategies where statistical techniques are used to predict the trends before they become apparent. They are either technical or fundamental, run in either a systematic or a discretionary manner. Most are technical and systematic. Bridgewater, an exception, is fundamental and systematic: e.g. in 2008 they spotted the possibility of either an inflationary or a deflationary deleveraging through a contraction in private credit growth, a declining stock market and a widening credit spread, and adjusted their positions based on the inflationary deleveragings of 1920s Germany and 1980s Latin America and the deflationary deleveragings of the Great Depression in the 1930s and Japan in the 1990s (Schwager 2012).

A hedge against inflation

In inflationary periods, usually long commodity future positions benefit and stock and bond returns are negatively impacted, because the purchasing power of the money declines and earning power of the corporation erodes.

Pairs Trading

The Johansen test can check the cointegration of multiple time series at a time. Pairs trading is a relative-value strategy and does not care about the absolute value of the assets. With stocks, it is more common that just one of the assets is over- or under-priced (Gatev, Goetzmann, Rouwenhorst 2006). For the futures curve, even the underpriced contracts usually have a negative expected return when in contango.

The main reason to pairs trade the futures curve is to hedge price movement risk and capture only the roll-return part of the commodity futures return. This strategy could be made dynamically adjusting to be more profitable.

For two time series to move together there needs to be something called the error correction, which causes correction of prices and hence mean reversion. Usually the order of integration is first determined with a unit root test before running an actual cointegration test (crucial to check with common sense and graphics). Augmented Dickey-Fuller test takes care of the autocorrelation in the difference variable series. Johansen test is based on the error-correction representation of the VAR equation and testing for reduced rank and then using Granger's representation theorem to get the cointegration vector.

Ch4. Empirical work

1991 to 2012. Daily frequency of 12 nearest contracts of 20 commodities. Transaction cost of 3.3 bps per leg per trade and contracts with open interest less than 20000 not traded.


  1. Determine the shape (contango vs backwardation) by taking the difference of the first five contracts, and taking an average of them. $$\frac{1}{5}\sum_{i=1}^5(f_i-f_{i+1}).$$
  2. If the result is positive (backwardation), go long the 'most' backwardated contract (maximum absolute slope), which is equivalently the one most out of line with the other points on the curve in terms of cointegration. The position is taken in the further contract.
  3. The short position is determined by taking the smallest value of differenced contracts and going short on the further contract.
  4. The pair is chosen only if both have open interest more than 20000.
  5. If contango, the process is the same but reversed: take a short position in the largest absolute difference and a long position in the smallest absolute difference.
  6. At the start of each month the portfolio is set up for next 30 days, with equal weights.
All the commodity curves are found to be cointegrated. The information ratio is 3.1 for monthly rebalancing. All assets show positive returns. These can be split between roll returns (alpha generation) and hedged returns (to reduce volatility). Feeder cattle is invested only 3% of the time period while CL is invested 100%. The daily traded strategy is similar, with higher trading costs but still good returns.
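The selection steps above can be sketched for a single commodity on a single rebalancing date. This is a rough reading of the rules, not the thesis's code: contracts are indexed from 0 (nearest), both legs are ranked by absolute difference, and the open interest filter is applied to the two chosen contracts. All names are hypothetical.

```python
def select_pair(curve, open_interest, min_oi=20000, n_diffs=5):
    """Pick the long and short contract indices from a futures curve, or None."""
    # Differences f_i - f_{i+1} over the first n_diffs adjacent contract pairs.
    diffs = [curve[i] - curve[i + 1] for i in range(n_diffs)]
    backwardation = sum(diffs) / n_diffs > 0
    steepest = max(range(n_diffs), key=lambda i: abs(diffs[i]))
    flattest = min(range(n_diffs), key=lambda i: abs(diffs[i]))
    # Positions are taken in the further contract of each differenced pair.
    if backwardation:
        long_leg, short_leg = steepest + 1, flattest + 1
    else:
        long_leg, short_leg = flattest + 1, steepest + 1
    if open_interest[long_leg] <= min_oi or open_interest[short_leg] <= min_oi:
        return None                        # pair rejected on liquidity grounds
    shape = "backwardation" if backwardation else "contango"
    return {"shape": shape, "long": long_leg, "short": short_leg}

liquid = [50000] * 6
backwardated = select_pair([100.0, 97.0, 96.0, 95.5, 95.2, 95.0], liquid)
contangoed = select_pair([95.0, 95.2, 95.5, 96.0, 97.0, 100.0], liquid)
```

In the backwardated example the long leg lands on the second-nearest contract (the steepest drop) and the short leg on the sixth (the flattest part of the curve); the contango example reverses the legs.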


  1. The current strategy is suboptimal in terms of when to trade.
  2. Entry should be based on price deviations from the equilibrium level.
  3. Best 5 instead of all would produce better results.
  4. Choose the 'hedging pair' using the signed difference of the futures prices rather than the absolute price difference. This would capture the (rare) instances where the futures curve has elements of both backwardation and contango.

Four Essays in Stat Arb - Jozef Rudy

These are my notes on the PhD thesis 'Four Essays in Statistical Arbitrage in Equity Markets' by Jozef Rudy. Hoping to implement some of these eventually.

Ch 1 - Introduction

This is just a summary chapter. The work is mostly about pairs trading and its modifications, concentrating on daily trading but also applying high-frequency data and other modifications. There is also a chapter on mean reversion strategies, which fall under statistical arbitrage.

The standard market approach is daily sampling (Gatev 2006). In the standard form, the edge such strategies provide seems to be dissipating. Going to higher frequency can potentially achieve higher information (Aldridge 2009). Nonstandard half-daily sampling frequency and using ETFs can further help the performance.

Ch 2 - Literature Review

Nunzio Tartaglia is credited with developing pairs trading at Morgan Stanley in the 1980s. It was hugely successful, but profits have come down recently. That is why one needs to go to higher frequencies (Marshall et al. 2010). Similarly, Schulmeister (2007) finds that technical strategies are profitable, but only at higher frequencies. That motivates the half-daily timeframe.

Engle and Granger (1987) brought cointegration into the limelight. Johansen (1988) developed the critical test. For a pair, the simpler method is to first calculate the beta using $P_{1t}=\beta P_{2t}+\epsilon_t$, then check the residual using the Augmented Dickey-Fuller (ADF) unit root test at 95% confidence using
$$\Delta \epsilon_t = \phi+\gamma\epsilon_{t-1}+\sum_{i=1}^{p}\alpha_i\Delta \epsilon_{t-i}+u_t.$$
We include the most significant lags iteratively and then test the null hypothesis of no cointegration, $\gamma=0$, against the alternative $\gamma<0$.

For more than two assets one needs to use the Johansen method. The non-parametric distance method (Gatev 2006) and the stochastic approach (Mudchanatongsuk 2008) have also been used.

Time-adaptive models like the Kalman filter have been shown to be superior to rolling-window OLS-based methods due to the forward-looking methodology of the former. Double exponential smoothing-based prediction models can give results comparable to the Kalman filter but run an order of magnitude faster.

'Market neutral' hedge funds are generally pairs trading kind of funds.

Ch 3 - Stats Arb. and HF data 

The main innovation is to apply the statistical arbitrage technique of pairs trading to high-frequency equity data (Eurostoxx 50 stocks). This is done for frequencies from 5-minute intervals (IR~3) to daily (IR~1). Pairs are chosen based on the best in-sample IR and the highest in-sample t-stats of the ADF test of the residuals of the cointegrating regression sampled at daily frequency. The 5 best pairs are chosen. The simplest method is the Engle and Granger (1987) cointegration approach. To make the beta parameter adaptive the following techniques can be used: rolling OLS, the DESP model and the Kalman filter.

Cointegration model

Take pairs from same industry based on economic reasoning and apply OLS regression on them:
$$Y_t=\beta X_t + \epsilon_t$$
Then test the residuals of the OLS regression for stationarity using the Augmented Dickey-Fuller unit root test.
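A minimal sketch of the two steps on synthetic data. It is deliberately simplified: the unit-root regression has no lagged differences (a plain Dickey-Fuller regression rather than the augmented one), and the t-statistic is not compared against critical values, which are non-standard for this test. In practice a library routine such as `statsmodels.tsa.stattools.adfuller` would normally be used instead. The function names and the synthetic series are assumptions.

```python
import math
import random

def ols_beta_no_intercept(y, x):
    """Slope of Y_t = beta * X_t + eps_t, estimated without a constant."""
    return sum(a * b for a, b in zip(y, x)) / sum(b * b for b in x)

def dickey_fuller_t(resid):
    """gamma and its t-statistic in  d e_t = phi + gamma * e_{t-1} + u_t."""
    lagged = resid[:-1]
    delta = [b - a for a, b in zip(resid[:-1], resid[1:])]
    n = len(delta)
    mean_l, mean_d = sum(lagged) / n, sum(delta) / n
    sxx = sum((v - mean_l) ** 2 for v in lagged)
    sxy = sum((v - mean_l) * (d - mean_d) for v, d in zip(lagged, delta))
    gamma = sxy / sxx
    phi = mean_d - gamma * mean_l
    sse = sum((d - phi - gamma * v) ** 2 for v, d in zip(lagged, delta))
    std_err = math.sqrt(sse / (n - 2) / sxx)
    return gamma, gamma / std_err

# Synthetic pair: X is a random walk, Y = 2 X plus a stationary AR(1) error.
rng = random.Random(0)
x, noise, level, e = [], [], 100.0, 0.0
for _ in range(500):
    level += rng.gauss(0.0, 1.0)
    e = 0.5 * e + rng.gauss(0.0, 1.0)
    x.append(level)
    noise.append(e)
y = [2.0 * xv + ev for xv, ev in zip(x, noise)]

beta = ols_beta_no_intercept(y, x)
residuals = [yv - beta * xv for yv, xv in zip(y, x)]
gamma, t_stat = dickey_fuller_t(residuals)
```

For this synthetic pair the estimated beta should be close to 2 and the t-statistic strongly negative, consistent with stationary residuals.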

Rolling OLS

Similarly we can calculate the rolling beta using rolling OLS. This approach suffers from 'ghost effect', 'lagging effect' and 'drop-out-effect'. The window can be optimized for maximum in-sample IR. This was around 200 periods. This was used for out of sample.

Double Exponential smoothing prediction model

We first calculate $\beta_t=Y_t/X_t$. We then do double smoothing by:
$$S_t = \alpha \beta_t+(1-\alpha)S_{t-1}$$
$$T_t=\alpha S_t + (1-\alpha)T_{t-1}$$
Using these the prediction of beta at time period $t+1$ is 
$$\hat{\beta}_{t+1} = \Bigg[2S_t-T_t\Bigg] + k \Bigg[\frac{\alpha}{1-\alpha}(S_t-T_t)\Bigg].$$
$k$ is the number of look-back periods. The optimized values of $\alpha$ and $k$ are 0.8126 and 30.
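The recursion translates directly into a short loop. In this sketch the two smoothers are seeded with the first observed ratio, which is an assumption (the notes do not say how $S_0$ and $T_0$ are initialised), and the function name is hypothetical.

```python
def desp_forecast(ratios, alpha=0.8126, k=30):
    """Double exponential smoothing prediction of beta from the ratios Y_t / X_t."""
    s = t = ratios[0]                          # assumed initialisation of S and T
    for b in ratios[1:]:
        s = alpha * b + (1.0 - alpha) * s      # first smoothing pass
        t = alpha * s + (1.0 - alpha) * t      # second smoothing pass
    level = 2.0 * s - t
    trend = alpha / (1.0 - alpha) * (s - t)
    return level + k * trend

flat_forecast = desp_forecast([1.5] * 50)
trend_forecast = desp_forecast([1.0 + 0.01 * i for i in range(100)])
```

For a constant ratio the forecast returns that constant; for a ratio that grows linearly by 0.01 per period, the forecast extrapolates the last value by $k$ times the slope.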

Time-varying parameter model with Kalman filter

This is better suited than OLS to adaptive parameter estimation. The measurement equation is
$$Y_t=\beta_t X_t+\epsilon_t$$
and the state equation is
$$\beta_t=\beta_{t-1}+\eta_t.$$
The idea of adding the second equation is based on the intuition that beta has some characteristic, i.e. auto-correlation, which can be added as information for better estimation. The noise ratio is to be optimized, yielding $3\times 10^{-7}$.
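A scalar Kalman filter for the time-varying beta can be sketched as below. The code assumes a random-walk state equation $\beta_t=\beta_{t-1}+\eta_t$ and normalises the measurement noise variance to 1, so that only the noise ratio (state variance over measurement variance) matters for the filtered mean. The starting values `beta0` and `p0` are arbitrary assumptions, and the function name is hypothetical.

```python
def kalman_beta(y, x, noise_ratio=3e-7, beta0=0.0, p0=1.0):
    """Filtered beta_t for Y_t = beta_t X_t + eps_t with a random-walk beta."""
    beta, p = beta0, p0
    betas = []
    for yt, xt in zip(y, x):
        p = p + noise_ratio                    # predict: state variance grows by Q/H
        innovation_var = xt * xt * p + 1.0     # measurement variance normalised to 1
        gain = p * xt / innovation_var
        beta = beta + gain * (yt - beta * xt)  # update with the prediction error
        p = (1.0 - gain * xt) * p
        betas.append(beta)
    return betas

x = [50.0 + (i % 5) for i in range(200)]
steady = kalman_beta([2.0 * v for v in x], x)

# Beta jumps from 2 to 3 halfway through: a larger ratio adapts faster.
shifted = [(2.0 if i < 100 else 3.0) * v for i, v in enumerate(x)]
slow = kalman_beta(shifted, x, noise_ratio=3e-7)
fast = kalman_beta(shifted, x, noise_ratio=1e-3)
```

The second example illustrates the trade-off behind the noise ratio: a larger ratio lets beta follow a structural shift quickly, while a smaller one gives a smoother but slower estimate.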

The pair trading model

Choosing the pairs within an industry makes us immune to industry-wide shocks. The spread between the pairs is calculated as $z_t=P_{Y_t}-\beta_{t}P_{X_t}$. We did not include a constant in any of the models. This spread is normalized by subtracting the mean and dividing by the standard deviation. Entry is at 2 standard deviations and exit near 0.5 standard deviations. Once the entry is triggered we wait one period before we enter. We choose a money-neutral investment by putting equal money in the two sides (irrespective of the $\beta$). There is no rebalancing. When the normalized spread returns to its long-term mean, this is caused by a combination of two things: real reversal of the spread and adaptation of beta to a new equilibrium value. This leads to a less than total reversal in dollar value even when the spread has totally reversed.
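The entry and exit logic, including the one-period wait, can be sketched as a small state machine on the normalized spread. Here only entries are delayed by one period (exits act immediately), which is one reading of the rule above, and the money-neutral sizing helper is an illustrative assumption. All names are hypothetical.

```python
def pair_positions(z, entry=2.0, exit_level=0.5):
    """Spread position per period: -1 short the spread, +1 long, 0 flat."""
    pos, pending, out = 0, 0, []
    for v in z:
        if pending != 0:                 # entry was triggered last period: enter now
            pos, pending = pending, 0
        elif pos == -1 and v <= exit_level:
            pos = 0
        elif pos == 1 and v >= -exit_level:
            pos = 0
        if pos == 0 and pending == 0:    # look for a new trigger while flat
            if v >= entry:
                pending = -1
            elif v <= -entry:
                pending = 1
        out.append(pos)
    return out

def money_neutral_shares(capital_per_side, price_y, price_x, side):
    """Equal dollars on both legs; side=+1 is long Y and short X."""
    return side * capital_per_side / price_y, -side * capital_per_side / price_x

example = pair_positions([0.0, 2.5, 1.8, 1.0, 0.4, 0.0])
```

In the example the trigger at 2.5 leads to a short spread position one period later, which is closed when the normalized spread falls to 0.4.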

In sample indicators are used with the objective to identify out of sample performance:
1) t-stat from ADF test on the residuals of the OLS regression.
2) the information ratio
3) half life of mean-reversion. 
The half-life is given by $\ln(2)/k$, where $k$ is the median unbiased estimate of the mean-reversion strength in the OU equation
$$dz_t = k(\mu-z_t)dt+\sigma dW_t$$
where $z_t$ is the value of the spread and $\sigma$ is the standard deviation. The higher the $k$, the faster the spread tends to revert to its long-term mean. The in-sample IR is also used as a metric (an IR of 2 means the strategy is profitable every month, an IR of 3 means it is profitable every day). The IR is overestimated if the returns are auto-correlated.
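A rough way to get the half-life from data is to regress the spread change on the lagged spread and read $k$ off the slope. The sketch below uses a plain OLS estimate on a unit time step, which is a simplification: the thesis uses a median unbiased estimator, which this is not. The function name is hypothetical.

```python
import math

def half_life(spread):
    """Half-life ln(2)/k with k taken from an OLS fit of dz_t on z_{t-1} (dt = 1)."""
    lagged = spread[:-1]
    delta = [b - a for a, b in zip(spread[:-1], spread[1:])]
    n = len(delta)
    mean_l, mean_d = sum(lagged) / n, sum(delta) / n
    sxx = sum((v - mean_l) ** 2 for v in lagged)
    sxy = sum((v - mean_l) * (d - mean_d) for v, d in zip(lagged, delta))
    k = -sxy / sxx            # discretised OU: dz = k * (mu - z), so the slope is -k
    if k <= 0:
        return math.inf       # no measurable mean reversion
    return math.log(2.0) / k

# A spread that closes 10% of its gap to zero each period gives k = 0.1.
decaying = [10.0 * 0.9 ** i for i in range(60)]
hl = half_life(decaying)
```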

Out of sample performance 

A trading cost of 30 bps one way is assumed. The best result is obtained at the 30-minute interval. The Kalman filter is the best of the four methods (fixed beta, rolling OLS, DESP and Kalman), and has the smoothest beta (Table 3-3).

Further investigations

Relationship between the in-sample t-stats and the out-of-sample information ratio

The in-sample t-stats for the fit are positively correlated with the out-of-sample information ratio for frequencies up to 10 minutes. Beyond this the correlation is statistically indistinguishable from 0.

Relationship between t-stats for different high-frequency and pairs

That trading pairs have similar t-stats across all frequencies is confirmed by the first principal component explaining almost all of the variance (after standardizing the t-stats of the ADF test for all pairs). This has the following implication: once a pair has been found to be co-integrated at a certain frequency, it tends to be co-integrated across all frequencies.

Does cointegration in daily data imply higher frequency cointegration 

The correlation between the t-stats (of the ADF test) of daily data and 5-minute data has a bootstrapped confidence interval of [-0.03,0.33]. Hence, co-integration found at daily frequency implies there is co-integration at the 5-minute interval as well.

Does in-sample information ratio and the half-life of mean reversion indicate what the out-of-sample information ratio will be?

Using bootstrapping, the confidence bounds indicate that the in-sample information ratio can positively predict the out-of-sample information ratio to a certain extent. Also, there is a negative relation between the half-life of mean reversion and the subsequent out-of-sample information ratio.

A diversified pair trading strategy

Using the indicators presented above, the best 5 pairs are selected. Best in-sample IR gives attractive out-of-sample performance. Half-life of mean reversion does not work. The in-sample t-stat of the ADF test of the cointegrating regression works as an indicator only for the 5- to 10-minute strategies. A combination is worse than the individual indicators. Finally, a daily IR of 1.34 and a high-frequency IR of 3.24 come out better than a simple long position.

Ch 4 - Profitable Pair Trading: A comparison using the S&P 100 constituent stocks and the 100 Most liquid ETFs

The greatest known risk to pairs trading is a stock going bankrupt. ETFs can avoid that. But are they equally profitable? It turns out they are more profitable than stocks based on an adaptive long-short strategy (IR of 1 vs 0), extending the in-sample period (1.7 vs 0.2) and preselecting pairs based on in-sample IR (2.93 vs 0.46). The ratio can be made time adaptive via the Kalman filter. The pairs trading strategy in its basic form might be becoming unprofitable.

Datastream is used to get data for the 100 most liquid ETFs and the S&P100 stocks. In-sample periods of 3/4 and 5/6 of the data are used. Based on whether or not there is cointegration, 428 ETF pairs and 693 stock pairs are evaluated.


Bollinger bands are used with 20 day moving window with 2 standard deviation windows for entry/exit triggers, in general. These parameters are optimized for max in-sample IR and differ from one pair to another.
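For reference, a minimal Bollinger band calculation with the default 20-period window and 2-standard-deviation width. Using the population standard deviation is an assumption here, since the notes do not say which convention is used, and the function name is hypothetical.

```python
import statistics

def bollinger_bands(series, window=20, width=2.0):
    """List of (lower, middle, upper) bands, one per full trailing window."""
    bands = []
    for i in range(window - 1, len(series)):
        chunk = series[i - window + 1:i + 1]
        mid = statistics.mean(chunk)
        sd = statistics.pstdev(chunk)      # assumed: population standard deviation
        bands.append((mid - width * sd, mid, mid + width * sd))
    return bands

ramp_bands = bollinger_bands([float(v) for v in range(1, 21)])
flat_bands = bollinger_bands([3.0] * 25)
```

An entry trigger then corresponds to the spread crossing outside the upper or lower band, with the window and width tuned per pair as described above.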


The spread is calculated from prices using an adaptive beta from the Kalman filter. The noise ratio $Q/H$ is optimized: increasing the ratio makes the beta more adaptive and decreasing it makes the beta smoother. A constant term is not used, to reduce the number of parameters. We invest the same amount of dollars on each side of the trade. Once invested, we wait for the spread to revert. The initial money-neutral positions are not dynamically rebalanced.

Out of sample results 

With 75% in-sample the IRs for ETFs and stocks are 1.06 and 0.08 respectively. These increase to 1.71 and 0.22 respectively for 83% in-sample. The ETFs used are index trackers, thus they contain lower idiosyncratic risk than shares. Index divergence is more likely to reverse than stock divergence, where the reason could be more fundamental. The much better results of ETFs could also be a result of stronger autocorrelations of ETF pairs compared to shares. Lower traded volumes (only marginally lower) also make the ETF market less competitive.

Results for the best 50 pairs

The correlation between in-sample and out-of-sample IR is 0.24 and 0.14 for ETFs and Stocks. This motivates using better performing in-sample pairs in out of sample. This increases the IR to 1.58 and 0.13 for 75% in-sample case for ETFs and Stocks respectively. And an IR of 2.93 and 0.46 for 83% in-sample case.


  1. ETFs are better than Stocks because of non-existence of idiosyncratic risk in ETFs.
  2. Decreasing out-of-sample period improves performance. Hence, re-estimating the model once per week will improve the results.
  3. In-sample IR predicts out of sample IR.

Ch 5 - Mean Reversion based on Autocorrelation: A comparison using the S&P 100 constituents and the 100 most liquid ETFs

A simple strategy based on the normalized previous period's return and the actual conditional autocorrelation can give traders an edge. ETFs are more suitable than stocks, and half-daily frequency improves the performance.


  1. Form pairs with 30 days trailing conditional correlation above the threshold of 0.8
  2. Eliminate pairs with a previous day's normalized spread returns smaller than 1.
  3. Select pairs with first order autocorrelation within certain bounds.
Two different samplings - daily and half-daily are used, with 4 year in sample and out of sample period.

Contrarian profits, explained by overreaction hypothesis causing negative autocorrelation, have decreased in recent periods (Khandani and Lo 2007). Higher frequencies still have some juice (Dunis et al 2010). Market neutral strategies have been shown to be exposed to general market factors. S&P 100 stocks and 100 ETFs are used with investment exactly for one trading period.


The JPMorgan (1996) method is used to calculate conditional (time-varying) volatility and conditional correlation (cutoff 0.8), over a period of 30 days. $$cov(r_A, r_B)_t=\lambda cov(r_A,r_B)_{t-1}+(1-\lambda)r_A r_B,$$ where $\lambda$ is the constant 0.94, corresponding to 30 days. The return of the spread is simply the difference of the returns of the constituents. The conditional autocorrelation of the pair is calculated as $$\rho_t=\frac{cov(r_t,r_{t-1})_t}{\sigma_t \sigma_{t-1}},$$ where $r_t$ is the return of the spread. The conditional autocovariance of the pair is calculated as $$cov(r_t,r_{t-1})_t=\lambda cov(r_t,r_{t-1})_{t-1}+(1-\lambda)r_t r_{t-1}.$$ The normalized return of the spread is simply $$R_t=\frac{r_t}{\sigma_t}.$$ We only trade pairs with normalized returns above 1. If the autocorrelation is negative we bet on a reversal; otherwise we bet the pair will continue to move in the same direction as in the current period, with each pair held only for one period. The 5 best pairs with the highest normalized returns are chosen.
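The recursions above can be sketched for a single spread return series as follows. Several details are assumptions rather than the thesis's choices: the conditional variance uses the same exponentially weighted recursion as the covariance, the variance is seeded with the first squared return and the autocovariance with zero, and the threshold of 1 is applied to the absolute normalized return. The function names are hypothetical.

```python
import math

def conditional_autocorrelation(spread_returns, lam=0.94):
    """List of (rho_t, R_t) pairs for t >= 1 using exponentially weighted moments."""
    var = spread_returns[0] ** 2           # assumed seed for the conditional variance
    cov = 0.0                              # assumed seed for the lag-1 autocovariance
    out = []
    for t in range(1, len(spread_returns)):
        r, r_prev = spread_returns[t], spread_returns[t - 1]
        var_prev = var
        var = lam * var + (1.0 - lam) * r * r
        cov = lam * cov + (1.0 - lam) * r * r_prev
        if var <= 0.0 or var_prev <= 0.0:
            out.append((0.0, 0.0))         # undefined until some variance is observed
            continue
        sigma, sigma_prev = math.sqrt(var), math.sqrt(var_prev)
        out.append((cov / (sigma * sigma_prev), r / sigma))
    return out

def trade_direction(rho, normalized_return, threshold=1.0):
    """+1 buy the spread, -1 sell it, 0 no trade, for the next single period."""
    if abs(normalized_return) <= threshold:
        return 0
    sign = 1 if normalized_return > 0 else -1
    return -sign if rho < 0 else sign      # negative autocorrelation: bet on reversal

alternating = [0.01 * (-1) ** i for i in range(200)]
rho_alt, norm_alt = conditional_autocorrelation(alternating)[-1]
rho_trend, _ = conditional_autocorrelation([0.01] * 200)[-1]
```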

Trading results

Trading cost of 20 bps per pair trade is assumed. Net of cost IR for in-sample and out-of-sample top 1, 5, 10 and 20 best pairs for different autocorrelation ranges is all negative for stocks. The results are positive both for in-sample and out-of-sample for ETFs (5, 10, 20 pairs) for the range -0.4 to 0 (but not -1 to -0.4).

For half-daily frequency results are better but still not good enough for shares. For ETFs the results are stupendous for the full negative autocorrelation range. Positive autocorrelation range is not that productive.

The out-of-sample results are consistent until 2009, after which they are flat. Adding more pairs makes the equity curve more consistent.

Ch 6 - Profitable Mean Reversion after large price drops: A story of Day and Night in the S&P500, 400 Mid Cap and 600 Small Cap Indices

Open-to-close (day) and close-to-open (night) have information. The worst performing shares during the day (resp. night) are bought and held during night (resp. day). The alpha is not explained by Fama and French 3-factors and Carhart 5-factors.

Literature review

Contrarian returns have been declining (Khandani and Lo 2007). Most strategies use close-to-close information and do not take the opening prices into account. The existence of contrarian profits can be explained by the overreaction hypothesis (Lo and MacKinlay 1990), with a negative autocorrelation assumption. De Bondt (1985) shows that with 3-year rebalancing past losers beat past winners, with the outperformance continuing as late as 5 years after the portfolios have been formed. Predictability of short-term returns is exploited either by momentum or by reversion. Serletis and Rosenberg (2009) show that the Hurst exponent for the four major US stock market indices during 1971-2006 displays mean-reverting behavior. Bali (2008) finds that the speed of mean reversion is higher during periods of large falls in prices.

De Gooijer et al. (2009) find a non-linear relationship between the overnight price and the opening price. Cliff et al. (2008) show that night returns are positive while day returns are 0. The effect is partly driven by the higher opening prices which decline during the first trading hour of the session.

Financial Data

The constituent stocks of the S&P 500, S&P 400 MidCap and S&P 600 SmallCap are used. Data are adjusted prices from 2000-2010. Trading cost is 5 bps one way. We calculate open-to-close day returns and close-to-open night returns. The average return of holding the shares during the day and during the night is very similar for the constituent stocks of the S&P 500 index and is slightly positive for both. For the S&P 400 MidCap the day returns are positive and the overnight returns negative, similar to the S&P 600 SmallCap. These differences are not profitable after trading cost.

Trading Strategy

Exploit the mean reverting behavior of the largest losers either during the day or night. Version 1 (day holding) buys n worst performing shares during the close-to-open period (decision period) with shares bought at the market open and sold at market close, equally weighted. Version 2 (night holding) buys n worst performing shares during the open-to-close period (decision period). The Benchmark strategy buys the n worst losers based on full day returns.
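The decision step of both versions reduces to computing the day and night returns and picking the $n$ largest losers. A minimal sketch, with made-up tickers and prices and hypothetical function names:

```python
def day_and_night_returns(prev_close, open_price, close):
    """Return (open-to-close day return, close-to-open night return)."""
    night = open_price / prev_close - 1.0
    day = close / open_price - 1.0
    return day, night

def worst_performers(decision_returns, n):
    """Tickers of the n largest losers over the decision period, worst first."""
    return sorted(decision_returns, key=decision_returns.get)[:n]

# Version 1: rank on the night return, then hold from the open to the close.
night_returns = {"AAA": -0.05, "BBB": 0.01, "CCC": -0.02}   # illustrative values
to_buy_at_open = worst_performers(night_returns, 2)
```

Version 2 is the same selection applied to the day returns, with the shares bought at the close and held overnight.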

Strategy Performance

For the S&P 600 small cap, the first two deciles (stocks with the largest decline during the decision period) produce high IRs and the last two negative IRs (a short strategy will work, which is not examined here). This holds true for both day and night strategies. There is a clear structure present going from top to bottom deciles. Overreaction is not as strong for mid cap stocks as it is for small caps. But the pattern is similar and extreme deciles are profitable.

The benchmark strategy (close-to-close decision period with the subsequent close-to-close as holding period) has been unprofitable more recently for the small cap, mid cap and S&P 500 cross sections. Version 1 and Version 2 have been more profitable.

Park (1995) claims that the profitability of the mean reversion strategy disappears once the average bid-ask price is used instead of the closing price, i.e. the most significant part of the close-to-close contrarian profit is caused by the bid-ask bounce and is not achievable in practice. The two versions shown here are better than the benchmark (close-to-close) and hence this strategy is immune to bid-ask bounce.

Multi-factor Models

Style factors:
  • CAPM model by Sharpe (1964) - market returns.
  • Fama and French 3-factor model (1992) - Mkt, small-big, value-growth.
  • Adj. Carhart's 5 factor model (1997) - Mkt, small-big, value-growth, Momentum: High returns - low returns (M2 to M12), reversion: low returns - high returns (M1).
$\alpha$ comes out positive for each case. Momentum factor turns out to be negative while the reversal factor comes out positive, as expected.
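As a rough sketch of the regression behind these models (the notation is mine, not from the source: $r_{p,t}$ is the strategy return, $r_{f,t}$ the risk-free rate, and $MKT_t$, $SMB_t$, $HML_t$, $MOM_t$, $REV_t$ the market, size, value, momentum and reversal factor returns; using excess returns on the left-hand side is also an assumption): $$r_{p,t} - r_{f,t} = \alpha + \beta_1 MKT_t + \beta_2 SMB_t + \beta_3 HML_t + \beta_4 MOM_t + \beta_5 REV_t + \epsilon_t.$$ Keeping only the $MKT_t$ term gives the CAPM regression, and dropping $MOM_t$ and $REV_t$ gives the 3-factor model. In this notation the result above reads $\alpha > 0$, $\beta_4 < 0$ and $\beta_5 > 0$.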

Ch 7 - General Conclusions

Two ways to improve trading results:
  1. Using more data - higher frequency, bigger universe. Even including opening prices can be hugely beneficial. Getting the opening price and processing it instantly is a challenge.
  2. Using advanced modeling - a Kalman filter can be fast and efficient compared with OLS. Factor neutralizing the pairs ratio (not only industry neutral as done here) can further improve the results. Neural networks and SVMs can be used to predict the future direction of spreads instead of using a fixed standard deviation level of the spread as the entry rule.
Delving more into model complexity, as opposed to data complexity, would be more beneficial.

Monday, September 7, 2015

Option trading: Pricing and Volatility strategies and techniques - Euan Sinclair

Traders are pragmatic and interested in results, but focusing on the process can take care of the results, and good traders learn this. Good traders also tend to be intellectually parsimonious because of the demands of trading, yet more knowledge brings more adaptability in uncertain times.

Derivatives traders don't need technical or fundamental analysis but a sound knowledge of market structure and arbitrage relationships. Causality needs to be investigated in every model.

Ch1. History

Options are not a modern invention. They have a longer history than either stocks or bonds. Options are legal contracts, and hence subject to changes in the legal system (e.g. the Dutch tulip crash of 1636). The South Sea bubble crash of 1720 involved a form of call options. The first exchange to list standardized contracts was the CBOE in 1973. The Black-Scholes-Merton model was published the same year. Options may well have been a tool in speculative bubbles, but were not the root cause. They are indispensable for modern risk management.

Ch2. Introduction to Options

One must simply know all the details of the instrument's specifications. For example, the fact that FXP gives twice the negative daily return of FXI does not mean that the compounded returns over a longer period will have the same relationship. Key words are: options, right not obligation, underlying, premium, maturity. Options can be created out of thin air, as long as there is the ability to collateralize them. They have nonlinear payoffs.
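The compounding point can be checked with a two-day toy example. The daily returns below are hypothetical and the sketch ignores fees and tracking error; it only illustrates that a daily -2x relationship does not carry over to the compounded return.

```python
def compound(returns):
    """Compound a sequence of simple period returns into a total return."""
    total = 1.0
    for r in returns:
        total *= 1.0 + r
    return total - 1.0


# Hypothetical daily returns of the underlying: up 10%, then down 10%.
underlying_daily = [0.10, -0.10]
# An idealized fund that delivers exactly -2x the underlying each day.
inverse_2x_daily = [-2.0 * r for r in underlying_daily]

underlying_total = compound(underlying_daily)   # 1.1 * 0.9 - 1, about -1%
inverse_total = compound(inverse_2x_daily)      # 0.8 * 1.2 - 1, about -4%
naive_total = -2.0 * underlying_total           # about +2%
```

The underlying loses about 1% over the two days, so a naive -2x reading suggests a gain of about 2%, yet the daily-rebalanced fund loses about 4%.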

Specifications for an option contract

  1. Option type - calls and puts. 
  2. Underlying asset - certain number of stocks, indices (times a multiple), futures.
  3. Strike price - exercise price
  4. Expiration date - last date on which the option exists.
  5. Exercise style - American and European. Bermudan (on specific days).
  6. Contract unit - multiplier. Need to be aware of the effects of corporate actions. 

Uses of options

Replication of options using the underlying is possible but expensive, so options are not redundant. The subtle difference between the option and the replicating portfolio in the underlying is where professional traders make money.

  1. Hedging - A position in the underlying can be protected from falls by buying a protective put. The presence of hedging activity shows the fallacy in methods that use the number of outstanding puts or calls to predict the direction of any underlying security.
  2. Speculation - If we think stocks will fall we can buy a put. Out-of-the-money puts give greater leverage.
  3. Creation of structured products - e.g. an equity linked note. Investors are torn between fear and greed. Equity linked notes are an ideal product for them: they promise the principal and give an upside if the index is over a certain percentage.
  4. Volatility trading - A position in options and the underlying can be used to trade changes in volatility (and not direction or returns).
  5. Structured product arbitrage - Many financial products contain option-like features, e.g. a convertible bond. These can be replicated, hedged or speculated on using options.
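The hedging use in item 1 can be illustrated with the payoff at expiry of a stock plus a protective put. This is a sketch with hypothetical numbers; the function names are mine, and financing costs and dividends are ignored.

```python
def put_payoff(spot, strike):
    """Payoff of a long put at expiry."""
    return max(strike - spot, 0.0)


def protective_put_pnl(spot_at_expiry, purchase_price, strike, premium):
    """P&L at expiry of one share plus one put bought for `premium`."""
    stock_pnl = spot_at_expiry - purchase_price
    return stock_pnl + put_payoff(spot_at_expiry, strike) - premium


# Hypothetical position: stock bought at 100, put struck at 95 costing 2.
crash = protective_put_pnl(70.0, 100.0, 95.0, 2.0)   # loss is capped
rally = protective_put_pnl(120.0, 100.0, 95.0, 2.0)  # upside less premium
floor = 95.0 - 100.0 - 2.0                           # worst-case P&L
```

Below the strike the put gains what the stock loses, so the loss never exceeds `floor`, while above the strike the position keeps the upside minus the premium paid.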

Market structure

An options trade can be placed with a broker after completing the securities account, options account and Options Clearing Corporation risk disclosure agreements. Market or limit orders for calls or puts can be placed with the contract details provided. The main exchanges in the US are Boston, Chicago Board, International Securities, NASDAQ Options, NYSE Alternext and Philadelphia. These markets are linked on a real-time basis. Ticks are generally either $0.05 or $0.01 in the US. There is also a private inter-dealer market called the 'call-around market'. The United States equity options market is served by a single clearing house, the Options Clearing Corporation (OCC), which the exchanges collectively own. The appropriate cash transfer happens the next business day. Transaction costs include broker and exchange commissions. Margins are of two types - strategy based margin and portfolio based margin.

Ch3. Arbitrage Bounds for Option Prices

The law of one price is behind these bounds. Sometimes what appears to be an arbitrage is merely a situation with larger than anticipated transaction costs, or unconsidered risk. The futures price of a stock is related to the spot price $S$ by $F=Se^{rT}$, where $r$ is the risk free rate and $T$ is the time to delivery. This follows from the absence of arbitrage. Different borrowing and lending rates give a no-arbitrage band instead of a single value. Dividends and storage costs should be properly accommodated in the stock price. If interest rates are positively correlated with the underlying, the futures are slightly more valuable than the forwards.
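A small sketch of the pricing relation and the band, under stated assumptions: continuous compounding, no dividends or storage costs, no short-selling frictions, and hypothetical rates. The function names are mine. The lower edge uses the lending rate (short the stock, lend the proceeds, buy the forward) and the upper edge uses the borrowing rate (borrow, buy the stock, sell the forward).

```python
import math


def forward_price(spot, rate, years):
    """No-arbitrage forward price F = S * exp(r * T)."""
    return spot * math.exp(rate * years)


def no_arbitrage_band(spot, lend_rate, borrow_rate, years):
    """Range of forward prices that admits no cash-and-carry arbitrage."""
    low = forward_price(spot, lend_rate, years)
    high = forward_price(spot, borrow_rate, years)
    return low, high


# Hypothetical inputs: spot 100, lend at 3%, borrow at 5%, one year.
low, high = no_arbitrage_band(100.0, 0.03, 0.05, 1.0)
single = forward_price(100.0, 0.05, 1.0)
```

With equal borrowing and lending rates the band collapses to the single value $Se^{rT}$.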

We can use this information to get bounds on options, which if violated can be exploited.

  1. American options are always at least as expensive as European ones, for both calls and puts. $c\le C$ and $p\le P$.
  2. A call can't cost more than the underlying. $c\le S$.
  3. A put can never be worth more than the strike price (discounted to the present for European puts). $P\le X$ and $p\le Xe^{-rt}$.
  4. The minimum value of a call option is $c\ge S-Xe^{-rt}$. $C \ge Max(0,S-X)$.
  5. The minimum value of a put option is $p \ge Xe^{-rt}-S$. $P \ge Max(0,X-S)$.
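The European bounds in the list can be turned into a simple quote checker. This is a sketch under assumptions: the function name is hypothetical, transaction costs and dividends are ignored, and the lower bounds are floored at zero because an option price cannot be negative.

```python
import math


def european_bound_violations(c, p, S, X, r, t):
    """List which European option bounds a pair of quotes violates.

    c, p: European call and put prices; S: spot; X: strike;
    r: risk free rate (continuously compounded); t: time to expiry in years.
    """
    disc_X = X * math.exp(-r * t)   # strike discounted to the present
    violations = []
    if c > S:
        violations.append("c <= S")
    if p > disc_X:
        violations.append("p <= X e^{-rt}")
    if c < max(0.0, S - disc_X):    # floored at zero (assumption noted above)
        violations.append("c >= S - X e^{-rt}")
    if p < max(0.0, disc_X - S):
        violations.append("p >= X e^{-rt} - S")
    return violations


# Hypothetical quotes: S = 100, X = 95, r = 5%, one year to expiry.
cheap_call = european_bound_violations(8.0, 1.0, 100.0, 95.0, 0.05, 1.0)
fair_quotes = european_bound_violations(12.0, 1.0, 100.0, 95.0, 0.05, 1.0)
```

In the first case the call trades below $S-Xe^{-rt}$ (about 9.63 here), so `cheap_call` flags that bound; the second set of quotes violates none of the bounds.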