Sunday, August 23, 2015

Range-based estimation of Stochastic Volatility Models

Alizadeh, Brandt and Diebold (2001)

The authors show theoretically, numerically and empirically that the range is not only a highly efficient volatility proxy but is also approximately Gaussian (in logs) and robust to microstructure noise. Two-factor models, with one persistent factor and one mean-reverting factor, do a better job of describing the high- and low-frequency dynamics of volatility simultaneously, explaining both the autocorrelation of volatility and the volatility of volatility.

Volatility is not constant. It is both time-varying and predictable. Gaussian quasi-maximum likelihood estimation (QMLE) of stochastic volatility models falls by the wayside because the usual volatility proxies, log absolute or squared returns, are far from Gaussian. The range, the difference between the highest and lowest log security prices over an interval, is a much better basis for estimation: it is a more efficient proxy and its log is nearly normal.

Sunday, August 16, 2015

Expected Returns on Major Asset Classes - Ilmanen (2012)

This article is written by Ilmanen from AQR. It is a very good summary of fundamental ideas (mostly smart beta) floating in different asset classes. 

Ch1 Introduction
Asset class expected returns and risk premia are time varying and somewhat predictable.

Ch2 Equity Risk Premium
Ch3 Bond Risk Premium
Ch4 Credit Risk Premium
Ch5 Alternative Risk Premium

Saturday, August 15, 2015

Algorithmic Trading - Ernest Chan

Profits are not derived from some subtle, complicated cleverness of the strategy but from an intrinsic inefficiency in the market that is hidden in plain sight. 

Two kinds of strategies - Mean reversion (ADF test, Hurst exponent, variance ratio test, half-life, Johansen test) using linear, Bollinger band and Kalman filter implementations - both temporal and cross-sectional. Momentum (roll returns, forced asset sales and purchases, news, sentiment, order flow). Nuances - data-snooping bias, survivorship bias, primary vs consolidated quotes, venue dependence of currency quotes, short-sale constraints, construction of futures continuous contracts, closing vs settlement prices, regime shifts. Kelly criterion for risk management, risk indicators, Monte Carlo simulations. Lessons - never manually override, being under-leveraged is better, strategy performance mean reverts, overconfidence in a strategy is a poison pill.

Ch1 - Backtesting and automated execution

It is important to establish the statistical significance of the backtest numbers. Regime shifts cannot be brushed aside; they need to be investigated. A good backtesting platform is the lifeblood of a productive endeavour. Common pitfalls:
  1. Look-ahead bias - using future information to construct signal. Trading and back-testing should be on same platform
  2. Data-snooping bias - use out-of-sample testing and cross-validation, and make the model as simple as possible with as few parameters as possible, assuming a simple Gaussian distribution and linear price prediction and allocation formulas, e.g. $rank_s=\sum_i^n sign(i) rank_s(i)$. Use a walk-forward test as the final, true out-of-sample test. After all this, be happy if live trading generates a Sharpe ratio better than half its backtest value.
  3. Stock splits and dividend adjustments.
  4. Survivorship bias in stock databases - get data that includes delisted stocks.
  5. Primary vs consolidated stock prices - use tradable data. Be skeptical in a healthy way.
  6. Venue Dependence of currency quotes - get the data where it will be traded.
  7. Short sale constraint - hard to borrow stocks (huge short interest) should be modeled realistically. This data is broker dependent. Backtest will be inflated otherwise. 
  8. Futures continuous contracts - take care of roll jumps.
  9. Futures close versus settlement prices - use synchronous prices, settlement is preferred.

Statistical significance of backtesting: Hypothesis testing

Hypothesis testing: under the null hypothesis that the true value (e.g. the average return) is zero, we find the probability $p$ of observing a test statistic at least as large as the one measured. A very low $p$ rejects the null hypothesis and gives credence to the number. There are three methods to evaluate the probability distribution needed for the statistical significance of a backtest on a finite sample:

  1. Gaussian distribution: a test statistic (Sharpe ratio per period $\times \sqrt{n}$, with $n$ the number of return observations) of 2.326 or more means $p$ is less than 0.01.
  2. Monte Carlo: generate simulated historical price data and feed it into our strategy to determine the empirical probability distribution of profits.
  3. Generate sets of simulated trades, with the same number of long and short entry trades and the same average holding period as in the backtest, but distributed randomly in time.
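
As a minimal sketch of the first (Gaussian) method, the snippet below converts a per-period Sharpe ratio and a sample size into the test statistic and its one-sided tail probability. The function name `sharpe_p_value` and the example inputs are illustrative assumptions, not taken from the book.

```python
import math

def sharpe_p_value(sharpe_per_period, n_periods):
    """Return (test statistic, one-sided p-value) under a Gaussian null of zero mean return."""
    z = sharpe_per_period * math.sqrt(n_periods)
    # Upper-tail probability of a standard normal, P(Z > z)
    p = 0.5 * math.erfc(z / math.sqrt(2.0))
    return z, p

# Hypothetical backtest: daily Sharpe of 0.1 over 600 trading days
z, p = sharpe_p_value(0.1, 600)
print(f"test statistic = {z:.3f}, p-value = {p:.4f}")
```

The same function can be reused with the empirical distributions of methods 2 and 3 replaced by counting how many simulated outcomes exceed the observed one.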
Failure to reject the null hypothesis can still inspire insights, whereas a successful rejection may be a slightly weaker proposition. Seriously flawed strategies:

  • Annualized returns of 30% and Sharpe of 0.3 and draw down duration of 2 years.
  • Strategy worse than buy and hold.
  • Survivorship biased dataset.
  • Neural net with 100 nodes with Sharpe 6.
  • High frequency strategies with high Sharpe, not taking into account market response.

Will a backtest be predictive of future returns?

Regime shifts are important to identify, by observing the market and statistically if possible.
  • Decimalization of US stocks in 2001. Profits of statistical arbitrage strategies decreased and profits of high frequency strategies increased.
  • The 2008 crisis decreased average daily trading volume and caused a subsequent decrease in average volatility, but with an increasing frequency of sudden outbursts. Profits of mean-reverting strategies generally decreased. A multi-year bear market in momentum strategies started as well.
  • The 2007 obsolescence of the NYSE block trade and removal of the old uptick rule for short sales.

Ch2 - The Basics of Mean Reversion

Financial price series behave like geometric random walks; it is the returns that are distributed around a mean of zero, but we cannot trade the mean reversion of returns (serial anti-correlation of returns, which is the same thing as mean reversion of prices, can be traded). Although the price series of individual instruments are usually not mean reverting (as tested with the ADF test, Hurst exponent and variance ratio test), we can manufacture many portfolios whose price series are. This is called cointegration, and it can be tested using the CADF test and the Johansen test. All of this is time-series mean reversion. The other type is cross-sectional mean reversion (the short-term relative returns of the instruments are serially anti-correlated).

A mean-reverting series is one where the change in price in the next period is proportional to the difference between the mean price and the current price. The ADF test checks whether we can reject the null hypothesis that the proportionality constant is zero. For a stationary price series, the variance of the log of prices increases more slowly than that of a geometric random walk, i.e. as a sublinear function of time. The variance scales as $\tau^{2H}$, where $\tau$ is the time separating two price measurements and $H$ is the Hurst exponent; if $H$ is less than 0.5, the price series is mean reverting. The variance ratio test can be used to see whether we can reject the null hypothesis that the Hurst exponent is actually 0.5.

ADF test

If a price series is mean reverting, then when the price level is higher than the mean the next move is expected to be downward, and vice versa. We can describe the price change dynamics via $$\Delta p_t = \lambda p_{t-1} + (\mu + \beta t) + \sum_{i=1}^{k} \alpha_i\Delta p_{t-i}+\epsilon_t,$$ where $\Delta p_t = p_t-p_{t-1}$, etc. The ADF test has the null hypothesis $\lambda=0$. If the null hypothesis can be rejected, the price series is not a random walk and mean reverts. Since we expect mean reversion, the test statistic $\lambda/SE(\lambda)$ has to be negative, and more negative than the critical value. Once the model is fit, we can use the Durbin-Watson statistic (approximately $2(1-\rho)$, where $\rho$ is the lag-1 autocorrelation of the residuals) to check whether the residuals are autocorrelated.
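
The regression behind the test can be sketched in a few lines. This is a simplified Dickey-Fuller regression, assuming no trend term and no lagged differences ($\beta=0$, $k=0$); it is not a full ADF implementation, and the resulting statistic has to be compared with Dickey-Fuller critical values rather than the usual t-distribution. The helper names and the simulated series are illustrative assumptions.

```python
import math
import random

def ols_with_tstat(x, y):
    """OLS of y on x with an intercept; returns (slope, intercept, t-statistic of the slope)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    rss = sum((b - intercept - slope * a) ** 2 for a, b in zip(x, y))
    se_slope = math.sqrt(rss / (n - 2) / sxx)
    return slope, intercept, slope / se_slope

def dickey_fuller_regression(prices):
    """Regress p_t - p_{t-1} on p_{t-1}; returns (lambda, mu, lambda / SE(lambda))."""
    lagged = prices[:-1]
    delta = [b - a for a, b in zip(prices[:-1], prices[1:])]
    return ols_with_tstat(lagged, delta)

# Simulated mean-reverting series around 100 with true lambda = -0.1 (assumed for illustration)
random.seed(0)
prices = [100.0]
for _ in range(999):
    prices.append(prices[-1] - 0.1 * (prices[-1] - 100.0) + random.gauss(0.0, 1.0))

lam, mu, t_stat = dickey_fuller_regression(prices)
print(f"lambda = {lam:.3f}, mu = {mu:.2f}, test statistic = {t_stat:.2f}")
```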

Hurst exponent and variance ratio test

A stationary price series means that the prices diffuse more slowly than a geometric random walk would. The variance for a time period $\tau$ is defined as $Var(\tau)=\langle|z(t+\tau)-z(t)|^2\rangle$, where $z=\log(p)$. For the geometric random walk we know this variance is $\sim\tau$, but for a mean-reverting or trending process it is $\sim \tau^{2H}$, where $H$ is the Hurst exponent. $H=0.5$ for a random walk, $H>0.5$ for a trending series and $H<0.5$ for a mean-reverting series. $H$ serves as an indicator of the degree of mean reversion or trendiness. The statistical significance of $H$ can be provided by the variance ratio test, which tests whether the following ratio is equal to 1: $$\frac{Var[z_t-z_{t-\tau}]}{\tau Var[z_t-z_{t-1}]}.$$
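
A rough way to estimate $H$ is to fit the slope of $\log Var(\tau)$ against $\log\tau$ and halve it. The sketch below does this on two simulated log-price series; it uses the mean squared change as $Var(\tau)$, as defined above (so any drift is neglected), and it computes only the point estimate of the variance ratio, not its significance. The function names, lag range and simulated series are assumptions for illustration.

```python
import math
import random

def lagged_variance(z, tau):
    """Mean squared change of z over a lag of tau bars: <|z(t+tau) - z(t)|^2>."""
    diffs = [z[i + tau] - z[i] for i in range(len(z) - tau)]
    return sum(d * d for d in diffs) / len(diffs)

def hurst_exponent(z, taus):
    """Half the OLS slope of log Var(tau) against log tau."""
    xs = [math.log(tau) for tau in taus]
    ys = [math.log(lagged_variance(z, tau)) for tau in taus]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope / 2.0

def variance_ratio(z, tau):
    """Var(tau) / (tau * Var(1)); close to 1 for a random walk."""
    return lagged_variance(z, tau) / (tau * lagged_variance(z, 1))

random.seed(1)
walk = [0.0]        # log prices of a random walk
reverting = [0.0]   # strongly mean-reverting log prices
for _ in range(5000):
    walk.append(walk[-1] + random.gauss(0.0, 0.01))
    reverting.append(0.5 * reverting[-1] + random.gauss(0.0, 0.01))

taus = list(range(1, 21))
h_walk = hurst_exponent(walk, taus)
h_rev = hurst_exponent(reverting, taus)
vr_walk = variance_ratio(walk, 10)
print(f"H(random walk) = {h_walk:.2f}, H(mean reverting) = {h_rev:.2f}, VR(10) = {vr_walk:.2f}")
```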

Half life of mean reversion

In practical trading we can be successful with less demanding tests. We just need $\lambda$ negative enough to make a trading strategy practical, even if we can't reject the null hypothesis. $\lambda$ measures how quickly a price reverts to its mean. Converting the difference equation to a continuous form (ignoring the trend and the lagged differences) gives $$dp_t=(\lambda p_{t-1}+\mu)dt + d\epsilon,$$ whose expected value solves to $$E[p_t] = p_0 e^{\lambda t} - \frac{\mu}{\lambda}\left(1-e^{\lambda t}\right),$$ so the deviation from the long-run mean $-\mu/\lambda$ decays as $e^{\lambda t}$. The expected time for half decay is $-\log(2)/\lambda.$ Notice that $\lambda$ is negative. This also determines the natural lookback period of our strategy: some small multiple of the half-life, which avoids brute-force optimization of the lookback period.
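
A minimal sketch of the half-life calculation, assuming $\lambda$ is estimated by regressing $\Delta p_t$ on $p_{t-1}$ with an intercept. The noise-free example series is constructed (as an assumption, purely for illustration) so that the deviation from a mean of 50 shrinks by 5% per bar, i.e. $\lambda=-0.05$ and a half-life of about 13.9 bars.

```python
import math

def half_life(prices):
    """Estimate lambda by OLS of (p_t - p_{t-1}) on p_{t-1} and return -ln(2) / lambda."""
    x = prices[:-1]
    y = [b - a for a, b in zip(prices[:-1], prices[1:])]
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    lam = (sum((a - mx) * (b - my) for a, b in zip(x, y))
           / sum((a - mx) ** 2 for a in x))
    if lam >= 0:
        raise ValueError("lambda is not negative: no mean reversion to measure")
    return -math.log(2.0) / lam

# Deterministic illustration: p_t = 50 + 10 * 0.95**t
prices = [50.0 + 10.0 * 0.95 ** t for t in range(100)]
hl = half_life(prices)
print(f"half-life = {hl:.2f} bars")
```

The `ValueError` branch reflects the point made above: a non-negative $\lambda$ means there is no mean reversion and the half-life is undefined.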

A linear Mean-reverting trading strategy

Once the tests confirm mean reversion, and the half-life is appropriate for our expected holding period, we determine the normalized deviation of the price from its moving average (with a lookback period equal to the half-life) and maintain a number of units of the asset negatively proportional to this normalized deviation. Given a price series that passed the statistical tests for stationarity, or at least one with a short enough half-life, we can be assured that we can eventually find a profitable trading strategy, maybe just not the one that we had backtested.
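
The position rule can be sketched as follows: the number of units held is the negative of the rolling z-score of the price. The function name, the use of the sample standard deviation, and the toy price list are illustrative assumptions.

```python
import math

def linear_mr_positions(prices, lookback):
    """Units held at each bar: minus the z-score of the price over the trailing lookback window."""
    positions = [0.0] * len(prices)
    for t in range(lookback - 1, len(prices)):
        window = prices[t - lookback + 1 : t + 1]
        mean = sum(window) / lookback
        std = math.sqrt(sum((p - mean) ** 2 for p in window) / (lookback - 1))
        if std > 0:
            positions[t] = -(prices[t] - mean) / std
    return positions

# Toy example: the last price sits above its 5-bar average, so the position is short
prices = [1.0, 2.0, 3.0, 4.0, 5.0]
positions = linear_mr_positions(prices, lookback=5)
print(positions)
```

In a backtest the position decided at bar `t` would earn the price change from `t` to `t+1`, which avoids look-ahead bias.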


We can proactively create a portfolio of individual price series so that the  market value series of this portfolio is stationary. This is the notion of cointegration. The most common combination is a pair.

Cointegrated Augmented Dickey-Fuller Test (CADF) 

Since we do not know a priori what hedge ratio we should use to combine the pair, the usual mean reversion tests cannot be applied directly. Following the Engle and Granger (1987) procedure, we first determine the optimal hedge ratio by running a linear regression between the two price series, use this hedge ratio to form a portfolio, and finally run a stationarity test on this portfolio. Switching the order of the price series changes the hedge ratio (the two ratios are not exact reciprocals). Generally only one of the two is the right choice: the one that yields the most negative test statistic.
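
The first two steps of the procedure, and the order dependence of the hedge ratio, can be illustrated with a small sketch. The price lists are made-up numbers and `ols_fit` is a hypothetical helper; the stationarity test on the resulting spread is left out.

```python
def ols_fit(x, y):
    """OLS of y on x with an intercept; returns (slope, intercept)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    return slope, my - slope * mx

# Hypothetical prices of two related instruments
x = [10.0, 11.0, 12.0, 13.0, 14.0, 15.0]
y = [20.5, 21.8, 24.4, 25.9, 28.3, 29.6]

h_y_on_x, _ = ols_fit(x, y)   # hedge ratio when y is the dependent variable
h_x_on_y, _ = ols_fit(y, x)   # hedge ratio when x is the dependent variable

# Portfolio: long 1 unit of y, short h_y_on_x units of x
spread = [b - h_y_on_x * a for a, b in zip(x, y)]

print(h_y_on_x, 1.0 / h_x_on_y)   # not equal unless the fit is perfect
```

The product of the two slopes equals the squared correlation of the two series, which is why the ratios are reciprocal only when the fit is perfect.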

Johansen Test

For more than two variables we need to use the Johansen test. We first present the matrix form of the equation $$\Delta P_t = \Gamma P_{t-1} + M + \sum_{i=1}^{k}A_i \Delta P_{t-i}+\pmb{\epsilon}_t. $$ If $\Gamma=\pmb{0}$ (or each eigenvalue is 0), we do not have cointegration. If the rank of the matrix is $r$ and the number of price series is $n$, then the number of independent portfolios that can be formed by various linear combinations of the cointegrating price series is equal to $r$. The Johansen test does the analysis based on the trace statistic and the eigenvalue statistic. The null hypothesis $r=0$ (no cointegrating relationship) should be rejected to find mean reversion, followed by $r\le 1$, ..., up to $r \le n-1$. If all these hypotheses are rejected, then we have $r=n$. The eigenvectors found can be used as our hedge ratios for the individual price series to form a stationary portfolio.

Unlike the CADF test, the Johansen test is independent of the order of the price series, so the question of which of the two (generally non-reciprocal) hedge ratios to use does not arise. The cointegrating relationship is strongest for the eigenvector with the largest eigenvalue, which has the shortest half-life of mean reversion.

Linear Mean-reverting trading on a portfolio

We determine the portfolio vector and accumulate units of the portfolio in proportion to the negative z-score of the 'unit' portfolio's price, where the unit portfolio is determined by the Johansen eigenvector. Obviously, we cannot really enter and exit an infinitesimal number of shares whenever the price moves by an infinitesimal amount. To avoid data snooping, we should determine the Johansen vector in a moving window fashion (unlike in the book). The lookback can be the half-life. The shorter the half-life, the more significant the results.

Pros and cons of mean-reverting strategies

Portfolio trading is the most profitable and has the most opportunities. There is also often a good fundamental story behind a mean-reverting pair. The Canadian and Australian markets are cointegrated because both are commodity-based economies. GDX and GLD cointegrate because the value of gold-mining companies is very much based on the value of gold. Even when a cointegrating pair falls apart, we can often understand the reason. And with understanding comes remedy. This availability of fundamental reasoning is in contrast to many momentum strategies whose only justification is that there are investors who are slower than we are in reacting to the news, i.e. there are greater fools out there. Mean-reverting strategies also span a great variety of time scales.

Unfortunately, it is the seemingly high consistency of mean-reverting strategies that may lead to their sudden breakdown. This often happens when leverage is at its maximum after an unbroken string of successes. Hence, risk management is particularly important and difficult, since the usual stop losses cannot be logically deployed.

Ch3 - Implementing Mean Reversion Strategies

In practice, we do not necessarily need true stationarity or cointegration in order to implement a successful mean reversion strategy. We can capture short-term or seasonal mean reversion (during specific period of the day or under specific conditions), and liquidate our positions before the prices go to their next equilibrium level. Conversely, not all stationary series will lead to great profits, particularly when the half life is longer. A more practical version of the implementation is using Bollinger bands. Kalman filter can be used for better estimation of the hedge ratio.

Trading pairs using price spreads, log price spreads, or ratios

Consider the 'unit' portfolio time series $y$ as the trading signal, which is just a weighted sum of the constituent prices. Instead of prices we could also use log prices (with different estimated coefficients, provided the resulting series is stationary). Unlike with prices, the coefficients on log prices do not represent numbers of shares of the constituents. To understand the log price relationship we take the time difference of this equation, which gives a linear combination of returns.

Price based portfolio's constants represent number of shares, while return based series' constants represent the market value of the assets together with a cash component implicitly included. Note that a cash component must be implicitly included because the constants are the market values, and there is no other way that the market value of the portfolio can change with time. This cash does not show up in the difference equation because it is constant from $t-1$ to $t$ and is rebalanced then. This requires the trader to constantly rebalance the portfolio, which is necessitated by using the log of prices.

The ratio $p_1/p_2$ does not necessarily form a stationary series, but it may have an advantage when the underlying pair is not truly cointegrating yet short-term mean reversion is present. It also keeps the hedge ratio at 1. This may come in handy in currency trading, where ratios of currency pairs may have a real meaning.

Bollinger Bands

The linear strategy deployed till now is not practical as it does not limit the deployed capital. Bollinger bands can be used to state the entry Z-score and exit Z-scores. The performance of the example improves but there are additional parameters introduced.
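
A Bollinger-band rule amounts to a small state machine on the z-score of the price (or spread). The sketch below holds at most one unit long or short; the thresholds `entry_z=1.0` and `exit_z=0.0` and the function name are assumptions chosen for illustration, and the z-scores are taken as given.

```python
def bollinger_positions(zscores, entry_z=1.0, exit_z=0.0):
    """Return +1 (long), -1 (short) or 0 (flat) for each bar, given the z-score series."""
    position = 0
    out = []
    for z in zscores:
        if position == 0:
            if z < -entry_z:
                position = 1       # price far below the mean: buy
            elif z > entry_z:
                position = -1      # price far above the mean: sell short
        elif position == 1 and z >= -exit_z:
            position = 0           # long exits once the price has reverted
        elif position == -1 and z <= exit_z:
            position = 0           # short exits once the price has reverted
        out.append(position)
    return out

signals = bollinger_positions([0.2, -1.5, -0.8, 0.1, 1.3, 0.5, -0.2])
print(signals)
```

Because the position is capped at one unit, the capital deployed is bounded, at the cost of two extra parameters to choose.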

Does Scaling-in work?

Scaling-in (or averaging-in) is the idea that one invests more as the price deviates further from the mean (assuming mean reversion happens). This is what a linear mean-reverting strategy does. It reduces price impact and can make profits even when the price never reverts to its mean. Multiple entries and exits using several Bollinger bands can mimic this. Schoenberg and Corwin (2010) show that entering or exiting at two or more Bollinger bands is never optimal, with the implicit assumption that the probabilities of price changes are constant, which is not the case in practice. Practically, scaling-in may well outperform the all-in method out-of-sample.

Kalman Filter as Dynamic Linear Regression

What is the best way to estimate the hedge ratio when it can vary with time? A moving window can produce ghost effects as observations enter and drop out of the window. Exponential weighting (EWM) can improve this, but it is not clear whether it is optimal. The Kalman filter is an optimal linear algorithm that updates the expected value of a hidden variable based on the latest value of an observable variable. If we assume the noises are Gaussian and the relationships are linear, it is the best filter available. We need to identify the observable and hidden variables and the observation and state transition model matrices. In the measurement equation $$y_t=x_t \beta_t + \epsilon_t,$$ $y_t$ is the observable price and $x_t$, the other price series augmented with ones (an $N\times 2$ matrix), is the observation model matrix. $\beta_t$ is the $2\times 1$ hidden variable denoting both the intercept and the slope. $V_{\epsilon}$ is the variance of the Gaussian noise $\epsilon_t$. Next we make the crucial assumption that the regression coefficient at time $t$ is the same as that at time $t-1$ plus noise, $$\beta_t=\beta_{t-1}+\omega_{t-1},$$ where $\omega$ is also a Gaussian noise with covariance $V_{\omega}$, i.e. the state transition model here is just the identity matrix.

Kalman filter can now iteratively generate the expected value of the hidden variables $\beta$ given an observation at time $t$, which not only includes the dynamic hedge ratio between the two assets, but also the 'moving average' of the spread. It also generates an estimate of the standard deviation of the forecast error of the observable variable which can be used in place of the moving standard deviation of the Bollinger band! We also need to specify $V_{\epsilon}$ and $V_{\omega}$.

Let $R(t|t-1)$ be $Cov(\beta_t-\hat{\beta}(t|t-1))$, the covariance of the error of the hidden variable estimates, and treat $x(t)$ as a $1\times 2$ row vector. Given the quantities $\hat{\beta}(t-1|t-1)$ and $R(t-1|t-1)$ at time $t-1$, the prediction equations are $$\hat{\beta}(t|t-1)=\hat{\beta}(t-1|t-1) \quad (\mbox{State prediction})$$ $$R(t|t-1)=R(t-1|t-1)+V_{\omega} \quad (\mbox{State Covariance prediction})$$ $$\hat{y}(t)=x(t)\hat{\beta}(t|t-1)\quad (\mbox{Measurement prediction})$$ $$Q(t)=x(t)R(t|t-1)x(t)^T+V_{\epsilon}\quad (\mbox{Measurement Variance prediction}),$$ where $e(t)=y(t)-x(t)\hat{\beta}(t|t-1)$ is the forecast error for $y(t)$ given observations up to $t-1$, and $Q(t)$ is $Var(e(t))$, the variance of the forecast error. After observing the measurement at time $t$, the Kalman filter update equations are $$\hat{\beta}(t|t)=\hat{\beta}(t|t-1)+K(t)e(t)\quad (\mbox{State update})$$ $$R(t|t)=R(t|t-1)-K(t)x(t)R(t|t-1)\quad (\mbox{State Covariance update}),$$ where $K(t)$ is the Kalman gain given by $$K(t)=R(t|t-1)x(t)^T/Q(t).$$ To start the recursions, we assume $\hat{\beta}(1|0)=0$ and $R(0|0)=0.$ $V_{\omega}$ and $V_{\epsilon}$ need to be provided or estimated from data (Rajamani and Rawlings 2009). Following Montana we assume $V_{\omega}=\frac{\delta}{1-\delta}I$, where $\delta$ is between 0 and 1. If $\delta=0$ the filter reduces to OLS with fixed coefficients, while as $\delta$ approaches 1 the estimated $\beta$ will fluctuate wildly. The optimal values can be obtained from training data; we pick $\delta=0.0001$ and $V_{\epsilon}=0.001.$
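
The recursion is short enough to write out directly. The sketch below is one possible pure-Python rendering for the two-dimensional state, assuming the state is ordered as (slope, intercept), each observation row is $[x_t, 1]$, and the covariance prediction is applied at every step including the first. The function name and the noise-free test data ($y = 2x + 1$) are illustrative assumptions.

```python
import math

def kalman_regression(x, y, delta=0.0001, v_eps=0.001):
    """Dynamic regression y_t = slope_t * x_t + intercept_t + noise via a Kalman filter.

    Returns a list of (slope, intercept, forecast error e, forecast std sqrt(Q)) per bar.
    """
    v_w = delta / (1.0 - delta)              # V_omega = delta / (1 - delta) * I
    beta = [0.0, 0.0]                        # beta_hat(1|0) = 0
    R = [[0.0, 0.0], [0.0, 0.0]]             # R(0|0) = 0
    out = []
    for xt, yt in zip(x, y):
        row = [xt, 1.0]
        # State covariance prediction: R(t|t-1) = R(t-1|t-1) + V_omega
        R = [[R[0][0] + v_w, R[0][1]], [R[1][0], R[1][1] + v_w]]
        # Measurement prediction and forecast error
        e = yt - (row[0] * beta[0] + row[1] * beta[1])
        # R x^T, then Q = x R x^T + V_eps
        Rx = [R[0][0] * row[0] + R[0][1] * row[1],
              R[1][0] * row[0] + R[1][1] * row[1]]
        Q = row[0] * Rx[0] + row[1] * Rx[1] + v_eps
        K = [Rx[0] / Q, Rx[1] / Q]           # Kalman gain
        # State update and covariance update (x R equals (R x^T)^T because R is symmetric)
        beta = [beta[0] + K[0] * e, beta[1] + K[1] * e]
        R = [[R[i][j] - K[i] * Rx[j] for j in range(2)] for i in range(2)]
        out.append((beta[0], beta[1], e, math.sqrt(Q)))
    return out

# Noise-free illustration with a fixed relationship y = 2x + 1
xs = [15.0 + 10.0 * math.sin(0.3 * t) for t in range(1000)]
ys = [2.0 * v + 1.0 for v in xs]
slope, intercept, last_error, last_std = kalman_regression(xs, ys)[-1]
print(f"slope = {slope:.3f}, intercept = {intercept:.3f}")
```

In a trading application `e` plays the role of the deviation of the spread from its mean and `sqrt(Q)` the role of the Bollinger band width.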

Kalman Filter as market-making model

We are concerned here with a single mean-reverting price series, and we want to find its mean price and standard deviation. This is a favorite model for market makers to update their estimate of the mean price of an asset (Sinclair 2010). The mean price $m_t$ is the hidden variable and the price $y_t$ is the observable variable. $$y_t = m_t + \epsilon_t \quad (\mbox{Measurement equation})$$ $$m_t=m_{t-1}+\omega_{t-1}\quad (\mbox{State equation})$$ $$m(t|t)=m(t|t-1)+K(t)(y(t)-m(t|t-1))\quad (\mbox{State update})$$ The variance of the forecast is $$Q(t)=Var(m(t))+V_{\epsilon},$$ the Kalman gain is $$K(t)=R(t|t-1)/(R(t|t-1)+V_{\epsilon}),$$ and $$R(t|t)=(1-K(t))R(t|t-1)\quad (\mbox{State Variance update}).$$ To make these equations more practical, practitioners make further assumptions about the measurement error $V_{\epsilon}$, which measures the uncertainty in the observed transaction price. If the trade size is large the uncertainty is small, and vice versa. So $V_{\epsilon}$ becomes a time-dependent function of the trade size $T$: $$V_{\epsilon}=R(t|t-1)\left( \frac{T_{max}}{T}-1\right),$$ which is non-negative for $T\le T_{max}$ and makes the Kalman gain $K(t)=T/T_{max}$.
If $T=T_{max}$ there is no uncertainty, the Kalman gain is 1 and the estimated mean price is exactly equal to the observed price! $T_{max}$ can be some fraction of the total trading volume of the previous day. This is similar to the VWAP approach to determining the mean price or fair value, along with the time-weighted average price.
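
One update step of this scalar filter can be sketched as below. It assumes the measurement variance is taken as $R(t|t-1)(T_{max}/T-1)$, so that it vanishes at $T=T_{max}$ and never goes negative, and that the state variance prediction is $R(t|t-1)=R(t-1|t-1)+V_{\omega}$. The function name, the capping of the trade size at $T_{max}$ and the numbers are illustrative assumptions.

```python
def update_mean_price(m_prev, r_prev, price, trade_size, t_max, v_omega):
    """One Kalman step for the hidden mean price; returns (new mean, new variance, gain)."""
    r_pred = r_prev + v_omega                    # R(t|t-1)
    size = min(trade_size, t_max)                # trades above T_max count as fully informative
    v_eps = r_pred * (t_max / size - 1.0)        # assumed size-dependent measurement variance
    gain = r_pred / (r_pred + v_eps)             # equals size / t_max
    m_new = m_prev + gain * (price - m_prev)     # state update
    r_new = (1.0 - gain) * r_pred                # state variance update
    return m_new, r_new, gain

# A trade of a quarter of T_max moves the mean estimate a quarter of the way to the trade price
m_small, r_small, k_small = update_mean_price(100.0, 0.5, 101.0, 250.0, 1000.0, 0.01)
# A trade of size T_max moves the estimate all the way
m_full, r_full, k_full = update_mean_price(100.0, 0.5, 101.0, 1000.0, 1000.0, 0.01)
print(m_small, k_small, m_full, k_full)
```

With this choice the update is a trade-size-weighted average of the previous estimate and the latest price, which is why it resembles a VWAP.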

The danger of data errors

Data errors are particularly insidious in both backtesting and executing mean-reverting strategies. 'Outliers' inflate the backtest performance of a mean-reversion strategy (Thomas Falkenberry 2002), but they suppress the backtest performance of a momentum strategy. In live trading they produce wrong trades for both kinds of strategy.

Ch4 - Mean Reversion of Stocks and ETFs

Stocks from the same sector are good candidates for forming pairs, and diversification is easy. Simple mean-reverting strategies actually work better for ETF pairs and triplets than for stocks. In the short term (seasonally), most stocks exhibit mean-reverting properties under normal circumstances (when there isn't any news on the stock). Over the long term stock prices follow a geometric random walk. Index arbitrage is another familiar mean-reverting strategy: stocks vs futures, stocks vs ETFs. Profits have decreased, so the strategy has to be modified. Cross-sectional mean reversion is prevalent in baskets of stocks. The statistical tests for time-series mean reversion are largely irrelevant for cross-sectional mean reversion. Because mean reversion is so attractive and so easy to find, its profits have decreased.

The difficulties of trading stock pairs

Daily-frequency mean reversion is a game of the past. The intraday and seasonal mean-reverting properties are still exploitable. Out-of-sample cointegration is difficult to find. It is difficult to consistently make profits in mean reversion unless one has a fundamental understanding of each of the companies and can exit a position in time, before bad news on one of them becomes public. The law of large numbers does not come to the rescue (due to lack of independence) because the small profits gained by the 'good' pairs are completely overwhelmed by the large losses of the pairs that have gone 'bad'. Further, there are short-sale constraints, which can result in a short squeeze. The new alternative uptick rule also creates uncertainty in both backtesting and live trading. Once the circuit breaker is triggered, we are essentially forbidden to send short market orders. Since profits have decreased, it becomes imperative to enter and exit positions intraday to capture the best prices. Avoiding overnight positions also avoids the changes in fundamental valuations that plague longer-term positions. Bid-ask spreads and quote sizes have become very small due to the prevalence of dark pools, iceberg orders, high frequency trading and the decimalization of US stock prices. So pairs traders, who act as a type of market maker, find that their market-making profits have decreased as well. But pairs in other countries and US ETF pairs are still profitable.

Trading ETF Pairs (and Triplets) 

Once found to be cointegrating, ETF pairs are less likely than stock pairs to fall apart in out-of-sample data, because the fundamental economics of a basket changes more slowly than that of a single company (e.g. EWA-EWC, the Australian and Canadian ETFs). We need to find ETFs that are exposed to common economic factors, e.g. country ETFs or sector ETFs (retail fund RTH vs consumer staples fund XLP).

Another kind of ETF pair is a commodity ETF against an ETF of companies that produce that commodity, e.g. GLD vs GDX. They cointegrated until 2008, after which the oil shock made oil a big part of mining expenses. Introducing USO to form a triplet, we find the three cointegrated. The oil fund USO and the energy sector fund XLE do not cointegrate because USO tracks oil futures and not spot oil. Mean reversion trading of such pairs would be much less risky if the commodity fund held the actual commodity rather than the futures.

Intraday Mean Reversion: Buy-on-Gap Model

Daily prices are indeed geometric random walks. However, there are many instances of seasonal mean reversion at intraday time frames, even for stocks.

Select all stocks near the market open whose returns from their previous day's low to today's open are more than 1 standard deviation below zero, with the standard deviation based on the daily close-to-close returns of the past 90 days. These are the 'gapped down' stocks. Apply a momentum filter by requiring their open prices to be higher than the 20-day moving average of the closing prices. Buy the top 10 stocks in this list and liquidate the positions at the end of the day. A short strategy can be constructed similarly. The rationale is that for an up-trending stock, if the stock is down before the open, panic selling will depress it further, but it will appreciate over the course of the day. Usually, a stock that has dropped a little has a better chance of reversal than one that has dropped a lot, because the latter is often due to negative news, whose effect is permanent and less likely to revert. The fact that a stock is higher than its long-term moving average attracts selling pressure from larger players with longer horizons. This demand for liquidity at the open may exaggerate the downward pressure on the price, but liquidity-driven moves are more likely to revert when such demand vanishes. The long-only strategy may present some risk management challenges and have low capacity.

For a realistic backtest one can use pre-open prices (e.g. at ARCA) to determine the trading signals. Execution at the open price also cannot be guaranteed, which introduces signal noise. Intraday data can be used for more realistic numbers. Primary exchange prices should be used rather than consolidated prices. Short-sale strategies suffer from the short-sale constraint pitfall. This strategy is well known among traders and there are many variations on the same theme. A hedged version can be traded which is long the stocks but short the index futures. Sector restrictions can be applied. The buying period can be extended beyond the market open. Intraday profit caps can be imposed. The lesson is: price series may not exhibit mean reversion when sampled with daily bars but can exhibit strong mean reversion during specific periods. This is conditional seasonality at work at a shorter time scale.

Arbitrage between an ETF and its component stocks

Index arbitrage trades on the difference in value between a portfolio of stocks constituting the index and the futures on that index. If the stocks are weighted the same way as in the index construction, the cointegration is too tight to be exploited. Sophisticated traders can still profit by trading intraday, at high frequency. In order to increase these differences, we can select only a subset of the stocks in the index to form the portfolio. The same idea can be applied to an ETF and its constituents. One selection method is to pick all the stocks that cointegrate individually with the ETF with 90 percent probability using the Johansen test. Then we form a portfolio of these stocks with equal weights. We reconfirm using the Johansen test that this long-only portfolio still cointegrates with the ETF (e.g. SPY). We are using log prices, so the weights are the capital allocated to each stock, and we expect to rebalance every day. After the cointegration is confirmed in-sample, we can backtest the linear mean reversion strategy. We can't test all the stocks and the index together via the Johansen test because the test can take only a limited number of symbols and would admit long-short positions, which we may not want because they may double-short some stocks and increase specific risk.

Another method of constructing long-only portfolio is to first test each stock vs the index using Johansen test. This subset is then used via constrained optimization method (e.g. genetic algorithm or simulated annealing) to minimize the average absolute difference between this stock portfolio price and the index price series. The variable of optimization are the hedge ratios, with the constraint that all weights are positive. Short sale constraint is less harmful here as there is enough diversification.

Cross-sectional mean reversion: A linear long-short model

In "cross-sectional" mean reversion strategies, we bet that the short-term returns of individual stocks relative to the rest of the basket will revert. These strategies generally don't work for futures and currencies. We rely on the serial anti-correlation of these relative returns to generate profits: we expect the under-performers to outperform and vice versa. We should not expect profits from each stock, as some may serve as hedges. In the strategy proposed by Khandani and Lo (2007), the weights are $$w_i=-\frac{(r_i-\langle r_j\rangle)}{\sum_k |r_k-\langle r_j\rangle|},$$ where $r_i$ is the daily return of the $i^{th}$ stock and $\langle r_j\rangle$ is the average daily return of all the stocks in the index. We rebalance every day to a gross exposure of \$1. Backtesting on a smaller-cap universe will usually generate even higher returns.
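
The weight formula translates directly into code. In this sketch the function name and the example returns are made up for illustration; the weights sum to zero (dollar neutral) and their absolute values sum to one.

```python
def khandani_lo_weights(returns):
    """Weights proportional to minus each stock's return relative to the cross-sectional mean."""
    mean = sum(returns) / len(returns)
    deviations = [r - mean for r in returns]
    gross = sum(abs(d) for d in deviations)
    if gross == 0:
        return [0.0] * len(returns)   # all returns identical: nothing to trade
    return [-d / gross for d in deviations]

# Hypothetical one-day returns of five stocks
daily_returns = [0.02, -0.01, 0.00, -0.03, 0.02]
weights = khandani_lo_weights(daily_returns)
print(weights)   # losers get positive weights, winners negative
```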

The return of this strategy can be enhanced by using the returns from the previous close to today's open to determine the weights for entry at the open. All the positions will be liquidated at the market close, thus turning it into an intraday strategy. This open-to-close strategy will have double the transaction cost and will have signal noise like the buy-on-gap model.

There are possibly other factors that are better at predicting cross-sectional mean reversion of stock prices than the relative returns that we have used. One popular variable is the price-earnings ratio from the last quarter, or perhaps projected earnings estimated by analysts or the companies themselves. If price moves are justified by fundamentals, mean reversion will not occur, so ranking stocks by P/E ratio helps us avoid shorting such stocks.

Ch5 - Mean Reversion of Currencies and Futures

Most CTAs are momentum based. Most currency and futures pairs will not cointegrate, and most portfolios of currencies or futures do not exhibit cross-sectional mean reversion. Mean reversion opportunities are limited but not non-existent, e.g. futures calendar spreads and volatility futures vs stock index futures. A currency portfolio must be valued in a single base currency and rollover interest must be accounted for.

Trading Currency cross-rates

Commodity currencies (Australian dollar, Canadian dollar, South African Rand, Norwegian Krone) may be cointegrated.  Liquidity in currencies is higher compared to corresponding stock index ETFs. Higher leverage can be employed. There are no short-sale constraints. And it trades around the clock and we can employ stop losses in a meaningful way (If market is closed for a long period stop losses are useless as the market can gap up or down when it reopens). 

  1. In the pair AUD.ZAR, AUD is the base currency and ZAR is the quote currency.
  2. A quote of 5 for AUD.ZAR means it takes 5 ZAR to buy 1 AUD.
  3. Buying 100 AUD.ZAR means buying 100 AUD and selling 500 ZAR. 
  4. Very few brokers offer AUD.ZAR as a cross-rate. So we have to buy 100 USD.ZAR and sell 100 USD.AUD to effectively buy 100 AUD.ZAR.
  5. USD.ZAR/USD.AUD is the synthetic pair for AUD.ZAR
The non-local currencies should be regularly converted to the base currency (USD) to remove currency risk. In order to interpret the eigenvector from the Johansen test as capital weights, the two price series must have the same quote currency. The trades can then be placed with the currency pairs in the right order. If we find cointegration between two entirely different cross-rates, care should be taken to calculate the returns correctly. The key step in backtesting currency arbitrage strategies is not the complexity of the strategies, but the right way to prepare the data series for cointegration tests, and the right formula to measure returns!
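The synthetic pair in the list above can be sketched in a few lines. This is only an illustration: the function names and the quotes are hypothetical, chosen so that the synthetic price reproduces the quote of 5 used in the example.

```python
def synthetic_cross_rate(usd_quote: float, usd_base: float) -> float:
    """Price of BASE.QUOTE implied by USD.QUOTE and USD.BASE.

    For AUD.ZAR this is USD.ZAR / USD.AUD.
    """
    if usd_quote <= 0 or usd_base <= 0:
        raise ValueError("quotes must be positive")
    return usd_quote / usd_base


def synthetic_legs(usd_notional: float) -> dict:
    """Legs that go long BASE.QUOTE: buy USD.QUOTE and sell USD.BASE."""
    return {"USD.QUOTE": +usd_notional, "USD.BASE": -usd_notional}


# Hypothetical quotes (not market data).
usd_zar = 7.5  # ZAR per 1 USD
usd_aud = 1.5  # AUD per 1 USD
aud_zar = synthetic_cross_rate(usd_zar, usd_aud)  # ZAR per 1 AUD
legs = synthetic_legs(100)  # buy 100 USD.ZAR, sell 100 USD.AUD
```

Dividing the two USD quotes gives both legs the same quote currency, which is what the Johansen eigenvector needs before it can be read as capital weights.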

Rollover interests in currency strategy

A feature of trading currency cross-rates is the interest rate differential earned or paid if the position is held overnight (until or beyond 5 PM ET). If we are long the currency pair B.Q, the interest we earn is $i_B-i_Q$; when $i_Q>i_B$ this is negative and we pay it. This is called the rollover interest. For $T+2$ settlement, if $T+3$ is a weekend or holiday for either currency, the days on which the market stays closed are added to the interest, so a position held past 5 PM ET on Wednesday accrues the extra weekend rollover interest. USD.CAD and USD.MXN settle at $T+1$, so for them it is a position held past 5 PM ET on Thursday that accrues the weekend interest.

For intraday positions the rollover interest is zero. For a long-short dollar-neutral equity portfolio the financing cost can be ignored, and for futures positions it is zero. For currency cross-rates, however, we should add the rollover interest to the percentage change of the cross-rate, i.e. we need to modify the return calculation for cross-rate strategies.
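A minimal sketch of such a modified return calculation, written in log-return form. The function name, the daily-rate inputs, the 365-day convention and the `rollover_days` multiplier (for example 3 when weekend interest accrues) are assumptions made for illustration, not a formula taken from the book.

```python
import math


def cross_rate_log_return(price_prev, price_now, i_base, i_quote, rollover_days=1):
    """Daily log return of a long B.Q position including rollover interest.

    i_base and i_quote are daily interest rates of the base and quote currency.
    rollover_days is how many days of interest accrue for this overnight hold.
    """
    price_change = math.log(price_now) - math.log(price_prev)
    carry = rollover_days * (math.log(1.0 + i_base) - math.log(1.0 + i_quote))
    return price_change + carry


# Hypothetical numbers: flat price, base currency yields 3% and quote currency 6% a year.
i_b = 0.03 / 365
i_q = 0.06 / 365
one_day = cross_rate_log_return(5.0, 5.0, i_b, i_q)
wednesday = cross_rate_log_return(5.0, 5.0, i_b, i_q, rollover_days=3)
```

With a flat price the long position loses money here because $i_Q>i_B$, and the Wednesday hold loses three times as much.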

Trading Futures Calendar Spread

In reality, calendar spreads do not generally mean-revert. To understand why, we need to understand the drivers of futures returns in general. Roll returns and spot returns constitute the total returns of a future. An ETF of commodity producers (XLE) may cointegrate with the spot prices but not with futures prices because of the presence of the roll returns. If we assume that spot and roll returns are truly constant throughout time ($F(t,T)=Ce^{\alpha t}e^{\gamma (t-T)}$, where $\alpha$ is the spot return and $\gamma$ is the roll return), we can use linear regression to estimate their values. Log spot prices can be directly regressed on time, but to find the roll return we need to regress the log futures prices on the time to maturity. The fit will differ across maturities and dates, so in practice $\gamma$ will vary with time. In general, average roll returns are much larger in magnitude than spot returns.
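The two regressions can be sketched on synthetic data generated from the constant-return model itself. The parameter values and helper names below are hypothetical, and a hand-rolled least-squares slope is used only to keep the example dependency-free.

```python
import math


def ols_slope(x, y):
    """Slope of the least-squares line of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / sxx


# Hypothetical model parameters (per year): spot return alpha, roll return gamma.
C, alpha, gamma = 50.0, 0.05, -0.10


def futures_price(t, T):
    """F(t, T) = C * exp(alpha * t) * exp(gamma * (t - T))."""
    return C * math.exp(alpha * t) * math.exp(gamma * (t - T))


# Roll return: at a fixed date t0, regress log F on maturity T; the slope is -gamma.
t0 = 1.0
maturities = [t0 + k / 12 for k in range(1, 6)]
log_f = [math.log(futures_price(t0, T)) for T in maturities]
gamma_hat = -ols_slope(maturities, log_f)

# Spot return: the spot is F(t, t) = C * exp(alpha * t); regress log spot on t.
times = [k / 12 for k in range(24)]
log_s = [math.log(futures_price(t, t)) for t in times]
alpha_hat = ols_slope(times, log_s)
```

On real data the fit would be repeated at each date, which is how the time variation of $\gamma$ shows up.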

Volatility, and hence the VIX index, is mean reverting. But the VX futures are not; they just inexorably decline, all due to roll returns, which have been mostly negative. If we define the spread as the difference of the log prices of the two legs (far contract with maturity $T_2$ minus near contract with maturity $T_1$) and keep the market value of the two legs the same at every period, it turns out to be $\gamma(T_1-T_2)$, which depends only on the roll return and is independent of the spot price. Hence, the spread captures only the roll-return component of returns. This log spread series mean-reverts.
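The expression for the spread follows directly from the constant-return model $F(t,T)=Ce^{\alpha t}e^{\gamma (t-T)}$ quoted earlier. A short derivation, assuming the spread is taken as the far leg (maturity $T_2$) minus the near leg (maturity $T_1$), which is the sign convention that reproduces the expression above:

$$\log F(t,T_2)-\log F(t,T_1)=\left[\log C+\alpha t+\gamma(t-T_2)\right]-\left[\log C+\alpha t+\gamma(t-T_1)\right]=\gamma(T_1-T_2)$$

The $\log C$ and $\alpha t$ terms cancel, so the spot price drops out and only the roll return $\gamma$ remains.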

Do calendar spreads mean-revert?

We may expect the two legs of a calendar spread to be cointegrated and hence the spread to mean-revert, but in reality roll returns derail our intuition. The difference of the log prices of the two legs (far minus near), keeping the market value of the two legs the same at every period, is simply $\gamma(T_1-T_2)$, which depends only on the roll return and is independent of the spot price. Hence, we are trading only the roll-return part of the total returns. The log spread series indeed mean-reverts. Applying the strategy with a holding period of 61 trading days for each pair, rolling 10 days before expiration, with contracts 1 year apart, we get an IR of 1.3 for CL.

Seasonality is often a prominent feature of commodities. For a particular market, only calendar spreads of certain months (and certain months apart) mean-revert. The same spread mean-reversion strategy can be applied to VX calendar spreads, but it does not work! However, the back/front price ratio for VX is mean reverting (verified by the ADF test), but only from 2008, when a regime change happened. This gives an IR of 1.5 from Oct 2008 to Apr 2012.

Futures Inter-market Spreads

It is almost impossible to find futures with different underlyings whose spread is mean reverting. Prices need to be synchronous for mean reversion to be tested (taking care of the contract multipliers). The possible candidates are
  • Crack spread - the 3:2:1 ratio does not have a mean reverting behavior, fails ADF
  • CL:BZ - fails ADF. BZ has outperformed CL due to the increase in production in the US, the pipeline bottleneck at Cushing and geopolitical concerns like the Iranian embargo, which affected Europe and hence BZ more than the US.
  • basket of CL, BZ, RB and HO

Volatility Futures versus Equity Index Futures

Volatility is anti-correlated with the equity market index: when the market goes down, volatility shoots up, and to a lesser extent, vice versa. There appear to be two regimes in a plot of VX vs ES futures - before 2008 and after 2008 (low vol but with greater range!). Applying linear regression or the Johansen test to a mixture of both regimes would not be correct. Hence, we apply the Engle-Granger procedure to post-2008 data only, after multiplying prices by the contract multipliers. The IR is 1.4 for the two years 2010 to 2012. There is also a VX-ES momentum strategy discussed in the next chapter.

Ch6 - Interday Momentum Strategies

Causes of momentum - persistence of roll returns, particularly of their sign (futures), slow diffusion of news, forced sales or purchases by funds, and market manipulation by high-frequency traders. Momentum comes in a time-series and a cross-sectional version. Interday momentum suffers from a recently discovered weakness, which affects intraday momentum less.

Tests for Time series Momentum

Time series momentum means that past returns are positively correlated with future returns. The correlation between past n-day and future m-day returns can be tested, or alternatively the correlation of their signs. For long term trends we can check the Hurst exponent or the variance ratio test to rule out the random walk hypothesis. We present numbers for TU, the two-year treasury future.

In computing the correlations of pairs of returns, we must take care not to use overlapping data. If the look-back is greater than the holding period, we have to shift forward by the holding period to generate a new returns pair. If the holding period is greater than the look-back, we have to shift forward by the look-back period. With a one-day shift the correlations would still be correct, but the t-stats would be artificially high. Hence, to estimate the right p-value, non-overlapping windows are essential. The numbers for TU use look-backs of 60 and 250 days with holding periods of 10 to 25 days. The Hurst exponent is 0.44, while the variance ratio test fails to reject the random walk hypothesis - the time series exhibits momentum and mean reversion at different time frames.
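The shifting rule can be sketched as follows. The helper names and the toy price path are hypothetical, and the example only shows how the return pairs are stepped forward, not a significance test.

```python
import math


def return_pairs(prices, lookback, holding):
    """Build (past return, future return) pairs using the shifting rule above.

    The window moves forward by the holding period when the look-back is longer,
    and by the look-back period otherwise.
    """
    step = holding if lookback > holding else lookback
    pairs = []
    t = lookback
    while t + holding <= len(prices) - 1:
        past = prices[t] / prices[t - lookback] - 1.0
        future = prices[t + holding] / prices[t] - 1.0
        pairs.append((past, future))
        t += step
    return pairs


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return sxy / math.sqrt(sxx * syy)


# Toy price path (not market data): a slow drift plus a cycle.
prices = [100.0 + 5.0 * math.sin(k / 7.0) + 0.1 * k for k in range(300)]
pairs = return_pairs(prices, lookback=60, holding=10)
rho = pearson([p for p, _ in pairs], [f for _, f in pairs])
```

Stepping by the shorter of the two periods yields far fewer pairs than a one-day shift, which is the point: the sample size entering the p-value is not inflated.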

Time Series Strategies

The paper by Moskowitz, Ooi and Pedersen (2012) presents time-series momentum with a 12-month look-back and a 1-month holding period. This can be rolled over every day, with 1/25th of the capital invested each day. For TU the IR is 1.0 with very low margin.
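One way to read the daily roll-over is as 25 overlapping sleeves, each opened on a different day and held for the full holding period. The sketch below follows that reading; the function name, the default of 250 trading days for 12 months and the use of the sign of the look-back return as the signal are assumptions.

```python
def tsmom_positions(prices, lookback=250, holding=25):
    """Net position per day, as a fraction of capital in [-1, 1].

    Each day a sleeve of 1/holding of capital is opened in the direction of the
    sign of the look-back return, and every sleeve is held for `holding` days.
    """
    n = len(prices)
    signal = [0] * n
    for t in range(lookback, n):
        r = prices[t] / prices[t - lookback] - 1.0
        signal[t] = (r > 0) - (r < 0)  # +1, 0 or -1
    positions = []
    for t in range(n):
        live_sleeves = signal[max(0, t - holding + 1): t + 1]
        positions.append(sum(live_sleeves) / holding)
    return positions


# Tiny illustration with short windows so the ramp-up is visible.
demo = tsmom_positions([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], lookback=3, holding=2)
```

In the tiny example the position ramps from 0 to 0.5 to 1.0 as the sleeves fill up.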

Why do many futures returns exhibit serial correlation, and why does it occur only at a fairly long time scale? The explanation lies in the roll returns. Typically a future stays in contango or backwardation over long periods of time. The spot return, however, can vary rapidly. So, over longer horizons, if roll returns dominate the spot returns, we will get serial correlation (corn is an exception). Hence, using lagged roll returns as the signal might be cleaner. Applying this raises the IR of TU to 2.1, with reduced drawdowns as well!

Other possible entry signals are - buy when the price reaches an N-day high, when the price exceeds the N-day moving average or exponential moving average, when the price exceeds the upper Bollinger band, or when the number of up days exceeds the number of down days in a moving period. Alexander filter - buy when the daily return moves up by at least x percent, and then sell and go short if the price moves down by at least x percent from a subsequent high.

Sometimes a combination of mean-reverting and momentum strategies may work better. One example strategy on CL - buy at the market close if the price is lower than that of 30 days ago and higher than that of 40 days ago; vice versa for shorts. There are mutual funds selling diversified momentum indicators. The true test is always true out-of-sample testing.

Extracting Roll Returns through Futures versus ETF arbitrage

If the future is in contango, buy the underlying and short the future; and vice versa if it is in backwardation. This arbitrage strategy is likely to result in a shorter holding period and a lower risk, since in the previous strategy we needed to hold the future for a long time before the noisy spot return could be averaged out.

The logistics of buying and especially shorting the underlying asset are not simple. But ETFs for many precious metals can be found. In contrast to owning futures, owning an ETF (e.g. GLD) actually incurs financing cost, which generally eats up the roll returns. ETFs and futures settle at different times, so asynchronicity is a pitfall.

Outside precious metals it is difficult to find ETFs that hold the underlying commodities. But ETFs containing commodity producing companies often cointegrate with the spot price of those commodities. We can use these ETFs as a proxy for the spot prices. One example is an arbitrage between the energy sector ETF XLE and the ETF USO (the WTI crude oil futures CL have a different closing time). Short USO and go long XLE whenever CL is in contango, and vice versa for backwardation. The IR is 1.0 from 2006 to 2012.

VX does not have an underlying trading commodity, but a basket of options, which is very hard to replicate. But we can find an index highly correlated or anti-correlated with the spot returns. In case of VIX, the familiar ETF SPY fits the bill, because it has insignificant roll returns. 

Volatility futures versus equity index futures: redux

VX is highly anti-correlated with ES. We can use the large roll return magnitude of VX and the small roll return magnitude of ES to develop a momentum strategy. If the price of the front contract of VX is higher than that of VIX by 0.1 point times the number of trading days until settlement (contango), short 0.4 front contracts of VX and short 1 front contract of ES, holding for a day. Vice versa for backwardation. VX forward prices do not fall on a straight line, so the curve can't be used to estimate the roll returns as for other commodities. The hedge ratio is based on the regression fit between VX and ES prices (not between returns!). This gives a Sharpe of 1 from 2010 to 2012.
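The entry rule can be written down as a small function. This is a sketch of the rule as summarized above; the function name is hypothetical, and reading "vice versa" as buying both contracts when the front VX is below VIX by the same margin is an interpretation.

```python
def vx_es_signal(vx_front, vix, days_to_settlement, points_per_day=0.1, hedge_ratio=0.4):
    """One-day positions (in contracts) for the VX-ES roll-return strategy.

    Contango (VX above VIX by the threshold): short VX and short ES.
    Backwardation (VX below VIX by the threshold): long VX and long ES.
    """
    basis = vx_front - vix
    threshold = points_per_day * days_to_settlement
    if basis > threshold:
        return {"VX": -hedge_ratio, "ES": -1.0}
    if basis < -threshold:
        return {"VX": hedge_ratio, "ES": 1.0}
    return {"VX": 0.0, "ES": 0.0}


# Hypothetical prices: the front VX trades 3 points above VIX with 20 trading days left.
contango_trade = vx_es_signal(vx_front=23.0, vix=20.0, days_to_settlement=20)
flat_trade = vx_es_signal(vx_front=21.0, vix=20.0, days_to_settlement=20)
```

Both legs carry the same sign because VX and ES are anti-correlated, so the ES leg hedges the VX leg's price exposure and leaves mainly the VX roll return.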

Cross-sectional strategies

If we believe that commodities' spot prices are positively correlated with economic growth or some other macroeconomic indices, we can just buy a portfolio of futures in backwardation, and simultaneously short a portfolio of futures in contango, leaving us with favorable roll returns. Daniel and Moskowitz (2011) describe cross-sectional momentum with a longer holding period. Ranking on 12-month returns and holding for 1 month, the long-short portfolio gives good performance, but not during the great financial crisis. The same strategy works for universes of world stocks, currencies, international stocks and US stocks.

We can rank the stocks by many factors other than lagged returns as well. Total returns can be decomposed into spot return and roll return. Similarly, they can be decomposed into market return and factor returns. A cross-sectional portfolio will eliminate the market component. We can rank based on fundamentals like earnings growth, book to price ratio or some linear combination thereof. Or we could use statistical factors such as those from PCA. All these factors except PCA change very slowly, resulting in long holding periods. For futures we could use GDP growth, inflation rate or PCA.

News Sentiment as fundamental factor

With machine-readable news it is now possible to programmatically capture all news items, not just earnings and merger and acquisition activities. A sentiment score can be assigned based on the price impact of the article. The aggregate of these sentiment scores across multiple news articles on a stock was found to be predictive of its future return. Hafez and Xie (2012) give a sorting based on RavenPack's sentiment score with an IR of 5.3 before costs. This also demonstrates that slow diffusion of news is a cause of stock momentum. Other vendors are Recorded Future and Thomson Reuters News Analytics. Newsware offers a low-cost version of news feeds. Bloomberg Event-Driven Trading, Dow Jones Elementized News Feed, and Thomson Reuters Machine Readable News are the lower latency and better coverage options.

The general mood of the society using 'Twitter' feeds is predictive of market index itself (Bollen, Mao, Zeng 2010).

Mutual Funds Asset Fire Sale and Forced Purchases

Coval and Stafford (2007) found that mutual funds experiencing large redemptions are likely to reduce or eliminate their existing stock positions. This is not surprising as mutual funds are mostly fully invested with no cash reserves. Funds experiencing large capital inflows increase their existing positions rather than investing in newer ideas. The 'fire sale' by poorly performing mutual funds causes the stocks to experience negative returns; it is contagious and causes further redemptions at other funds. The same situation occurs in reverse for stocks held by superbly performing mutual funds with large capital inflows. This order flow based momentum is applicable at all time scales.

A factor can be constructed to measure the selling (buying) pressure on a stock based on the net percentage of funds holding it that experienced redemptions (inflows). For stock $i$, fund $j$ and quarter $t$ this can be defined as $$\mathrm{Pressure}(i,t)=\frac{\sum_j \mathcal{I}(\mathrm{Buy}_{i,j,t}\mid \mathrm{flow}_{j,t}>5\%) - \sum_j\mathcal{I}(\mathrm{Sell}_{i,j,t}\mid \mathrm{flow}_{j,t}<-5\%)}{\sum_j \mathcal{I}(\mathrm{Own}_{i,j,t-1})}$$ where $\mathcal{I}$ is the indicator function and the denominator counts the funds that held stock $i$ at the end of the previous quarter. Weighting Buy by NAV may give better results. Coval and Stafford (with quarterly updates) found that the market-neutral portfolio formed by shorting the stocks with the highest selling pressure and buying the stocks with the highest buying pressure generates annualized returns of about 17 percent before transaction costs.
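A small sketch of the pressure factor for a single stock and quarter. The record layout (`flow`, `shares_prev`, `shares_now`) and the fund data are hypothetical, and reading the denominator as the number of funds that held the stock in the previous quarter is an assumption about the formula above.

```python
def pressure(fund_records, flow_threshold=0.05):
    """Net fraction of holding funds that bought under inflows or sold under outflows.

    fund_records: one dict per fund with keys
      'flow'        - fund flow over the quarter as a fraction of assets
      'shares_prev' - shares of the stock held at the end of the previous quarter
      'shares_now'  - shares of the stock held at the end of this quarter
    """
    buys = sells = owners = 0
    for rec in fund_records:
        if rec["shares_prev"] > 0:
            owners += 1
        change = rec["shares_now"] - rec["shares_prev"]
        if change > 0 and rec["flow"] > flow_threshold:
            buys += 1
        elif change < 0 and rec["flow"] < -flow_threshold:
            sells += 1
    if owners == 0:
        return 0.0
    return (buys - sells) / owners


# Hypothetical holdings of one stock by four funds.
funds = [
    {"flow": 0.10, "shares_prev": 100, "shares_now": 150},   # forced buy
    {"flow": -0.08, "shares_prev": 200, "shares_now": 50},   # forced sell
    {"flow": -0.10, "shares_prev": 100, "shares_now": 0},    # forced sell
    {"flow": 0.01, "shares_prev": 50, "shares_now": 80},     # buy, but flow too small
]
stock_pressure = pressure(funds)
```

Here one forced buy against two forced sells among four holders gives a pressure of -0.25, so the stock would rank toward the short side.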

Furthermore, capital flows into and out of mutual funds can be predicted with good accuracy based on their past performance and capital flows (herd like behavior of retail investors). Based on these predictions we can predict the future values of the pressure factors noted above, i.e. we can front-run the mutual funds in our selling of the stocks that are currently owned. This front running strategy generates another 17 percent annualized before transaction cost. 

Finally, since these stocks experience such selling and buying pressure due to liquidity-driven reasons, and suffer suppression or elevation of their prices (not because of fundamental reason) they often mean-revert after the pressure is over. Indeed, buying stocks that experienced the most selling pressure in the t-4 up to t-1 quarters, and vice versa, generates another 7 percent annualized returns. 

Combining all three strategies (momentum, front running, and mean reverting) generates a total return of about 41 percent before transaction costs. The slippage might be significant because of the delay in mutual fund holdings data at the end of the quarter. The data from the Center for Research in Security Prices (CRSP) costs 10K a year.

Apart from mutual funds, index funds and levered ETFs ignite similar momentum as well. In fact, forced asset sales and purchases by hedge funds can also lead to momentum in stocks, as in the August 2007 quant funds meltdown.

Pros and Cons of momentum strategies

Momentum strategies are diametrically opposite to mean-reverting strategies. Starting with the cons, it is harder to create profitable momentum strategies and they tend to have lower Sharpe than mean-reverting strategies because

  1. They have long look-back periods, so independent trading signals are few and far between, leading to lower Sharpe.
  2. Momentum crashes - these strategies perform miserably for several years after crashes. During this period momentum is replaced by mean reversion. Momentum crashes are caused by the strong rebound of short positions following a market crisis.
  3. The duration over which news-driven momentum remains in force gets progressively shorter as more traders catch on to it. This constant shortening of the holding period has no predictable schedule.
Looking at the pros:
  1. Ease of risk management - there are two common types of exit strategies for momentum: time-based and stop-loss. Stop-losses are consistent with momentum strategies. If momentum has changed direction we should enter the opposite position, and hence this change of momentum serves as a natural stop-loss. Stop-losses are not consistent with mean-reversion. Hence, momentum losses are always limited.
  2. Momentum strategies can thrive in risky environment as well. For mean-reverting strategies the upside is limited by the mean they will revert to but the downside can be unlimited. For momentum strategies, their upside is unlimited, while their downside is limited. The more often 'black swan' event occurs, the more likely that a momentum strategy will benefit from them. The thicker the tails of the return distribution curve, or higher the kurtosis, the better that market is for momentum strategies. 
  3. Most futures and currencies exhibit momentum, allowing us to truly diversify the risk across different asset-classes and countries. 

Ch7 - Intraday Momentum Strategies

Time series momentum typically operates over long horizons (a month or longer), resulting in lower Sharpe and lower statistical significance due to infrequent independent trading signals. These strategies also suffer from underperformance after crashes. Short-term intraday strategies do not suffer from these drawbacks. Apart from roll returns, the other three causes of momentum also operate at the intraday time frame. An additional cause of intraday momentum is the 'triggering of stops', which gives rise to breakout strategies.

Intraday momentum can be triggered by specific events beyond just price actions like corporate news of earning announcements, analyst recommendation changes, or macro-economic news. Intraday momentum can also be triggered by actions of large funds, e.g. daily rebalancing of leveraged ETFs leads to short-term momentum.  Finally, the imbalance of bid and ask sizes, the changes in order flows, or nonuniform distribution of stop orders can all induce momentum in prices. 

Opening Gap strategy

Buy when the instrument gaps up, and short when it gaps down. This works best for Dow Jones STOXX 50, producing an IR of 1.4 from 2004 to 2012. For currencies the daily "open" and "close" need to be defined differently: the close at 5 PM ET and the open at 5 AM ET (the London open). The same strategy for GBPUSD has an IR of 1.3 from 2007-2012. Overnight or weekend gaps trigger momentum because they accumulate un-acted information. The execution of stop orders often leads to momentum because a cascade effect may trigger stop orders placed further away from the open price as well. Alternatively, there may be significant events that occurred overnight.

News driven momentum strategy

Slow diffusion of news creates momentum over the days, hours, or seconds following earnings announcements and other corporate and macroeconomic news. Post-earnings announcement drift (PEAD) still exists, but its duration has shortened. As recently as 2011, good returns could be made by entering at the market open after an earnings announcement made after the previous close, buying the stock if the return is very positive and shorting it if the return is very negative, and liquidating the position at the day's close. has such data. We can use the 90-day moving standard deviation of the previous-close-to-next-day's-open return as the benchmark for deciding whether the announcement is 'surprising' enough to generate the post-announcement drift. For a universe of S&P 500 stocks an IR of 1.5 is available from 2011-2012. This can be levered up 4 times as it is an intraday strategy. Holding the positions overnight is not rewarding; the overnight returns are negative. 10-20 years ago PEAD lasted 1-2 days; more recently the momentum has shortened.
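The 'surprise' test can be sketched as a comparison of the announcement-day gap with its own recent volatility. The function name and the `entry_multiple` of 0.5 standard deviations are assumptions made for illustration; the text above only specifies the 90-day moving standard deviation as the benchmark.

```python
import statistics


def pead_signal(overnight_returns, lookback=90, entry_multiple=0.5):
    """+1 (buy at the open), -1 (short at the open) or 0 (no trade).

    overnight_returns: previous-close-to-open returns, oldest first, where the
    last element is the gap on the morning after the earnings announcement.
    """
    if len(overnight_returns) < lookback + 1:
        return 0  # not enough history to form the benchmark
    history = overnight_returns[-lookback - 1:-1]  # excludes the announcement gap
    sigma = statistics.stdev(history)
    gap = overnight_returns[-1]
    if gap > entry_multiple * sigma:
        return 1
    if gap < -entry_multiple * sigma:
        return -1
    return 0


# Hypothetical history: 90 quiet overnight returns of +/-1%, then the announcement gap.
quiet = [0.01, -0.01] * 45
long_entry = pead_signal(quiet + [0.03])
short_entry = pead_signal(quiet + [-0.03])
no_entry = pead_signal(quiet + [0.001])
```

Any position opened this way would be closed at the same day's close, in line with the intraday nature of the strategy.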

Drift due to other events

Other events that can produce drift include earnings guidance, analyst ratings and recommendation changes, same store sales, and airline load factors (provided by Dow Jones Newswire and delivered in machine-readable form by Newsware). See Hafez 2011 for a comprehensive list. Mergers and acquisitions can also support news-momentum strategies. It is interesting to note that the acquiree's stock price falls more than the acquirer's after the initial announcement of the acquisition.

Index composition changes generate buying and selling and create momentum. When a stock is added to an index there is buying pressure immediately after the announced changes. These drift horizons have changed from days to intraday.

Macroeconomic events such as the Federal Open Market Committee's rate decisions or the release of the latest consumer price index do not produce any significant momentum in EURUSD. Clare and Courtenay (2001) report that UK macroeconomic data releases and Bank of England interest rate announcements induced momentum in GBPUSD for up to 10 minutes.

Levered ETF Strategy

The constant leverage requirement has some counter-intuitive consequences. After a big drop, the positions in the levered portfolio must be substantially reduced to keep the leverage constant, and vice versa after a big gain. The same applies to the portfolio held by a leveraged ETF. This rebalancing happens at the market close and produces momentum. As a strategy, buy DRN (a real estate ETF) if the return from the previous day's close to 15 minutes before the market close is greater than 2 percent, and sell if the return is smaller than -2 percent. Exit at the market close. This gives an IR of 1.8 from 2011-2012.
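The arithmetic behind the rebalancing can be worked out from first principles, assuming no fees, no financing costs and no fund flows during the day. The function names and the numbers are hypothetical.

```python
def rebalance_amount(nav, leverage, index_return):
    """Exposure a constant-leverage fund must add (+) or shed (-) at the close."""
    exposure_before = leverage * nav * (1.0 + index_return)  # exposure drifts with the index
    nav_after = nav * (1.0 + leverage * index_return)        # NAV moves by leverage * return
    exposure_target = leverage * nav_after                    # exposure needed to restore leverage
    return exposure_target - exposure_before


def late_day_signal(return_since_prev_close, threshold=0.02):
    """Entry rule from the text: +1 buy, -1 sell, 0 no trade, 15 minutes before the close."""
    if return_since_prev_close > threshold:
        return 1
    if return_since_prev_close < -threshold:
        return -1
    return 0


# Hypothetical funds with NAV 100 on a day when the index falls 2%.
bull_3x = rebalance_amount(100.0, 3, -0.02)   # 3x long fund
bear_3x = rebalance_amount(100.0, -3, -0.02)  # 3x inverse fund
```

Under these assumptions the amount simplifies to nav * leverage * (leverage - 1) * index_return, so both the long and the inverse fund trade in the direction of the day's move, which is why the rebalancing reinforces momentum into the close.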

As the aggregate assets of these ETFs increase, the returns of the strategy increase. The total AUM of levered ETFs is 19 billion (Cheng and Madhavan 2009), which can create a big order at the close. Rodier, Haryanto, Shum and Hejazi (2012) have updated this analysis.

The flow of investors' cash also affects the momentum. A large inflow into a long leveraged ETF will cause positive momentum in the underlying's price. A large inflow into a short leveraged ETF will cause negative momentum.

High frequency strategies

Most of them extract information from the order book, e.g. if the bid size is much bigger than the ask size, expect the price to tick up, and vice versa (Maslov and Mills 2001). The effect is stronger for lower volume stocks. Books on microstructure (Arnuk and Saluzzi 2012, Durbin 2010, Harris 2003, Sinclair 2010) describe many high-frequency momentum strategies. 'Ratio trades' can be used for momentum profits in markets that fill orders on a pro-rata basis, such as eurodollar futures on CME. 'Ticking or quote matching' can be used when the bid-ask spread is bigger than two ticks, and there is expectation of an uptick.

'Momentum ignition' creates an illusion of buying pressure (or selling pressure). This works in markets with time priority for orders. 'Flipping' can be used to generate an artificial imbalance. Private data feeds from exchanges, like ITCH from Nasdaq, EDGX from Direct Edge, and PITCH from BATS, can be used to detect flippers.

These strategies and defenses show that high-frequency traders can profit from slower traders only. Due to this, quote sizes have decreased and large orders are broken into smaller orders. 'Stop hunting' strategies exploit the short-term momentum when a support or resistance level is breached. These levels are either reported daily by banks or are simply round numbers in the proximity of current price levels. The momentum arises because a large number of stop orders are placed at or near the support and resistance levels.

Order flow information is a good predictor of price movements, because market makers can distill important fundamental information from order flow and set the bid and ask accordingly. The urgency of using market orders indicates that the information is new and not widely known. For stocks and futures we can monitor and record every tick and determine whether a transaction took place at the bid or the ask. We can then compute the cumulative or average order flow over some look-back period and use that to predict whether the price will move up or down.
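A minimal sketch of the order flow calculation described above. The tick layout, the function names and the rule of ignoring trades that print between the bid and the ask are assumptions.

```python
def signed_order_flow(trades):
    """Signed size per trade: + if executed at the ask, - if executed at the bid.

    trades: list of (price, size, bid, ask) tuples in time order.
    Trades printed strictly inside the spread are treated as unsigned (0).
    """
    flow = []
    for price, size, bid, ask in trades:
        if price >= ask:
            flow.append(size)    # buyer-initiated
        elif price <= bid:
            flow.append(-size)   # seller-initiated
        else:
            flow.append(0)
    return flow


def order_flow_signal(trades, lookback):
    """+1, -1 or 0 from the sign of cumulative order flow over the last `lookback` trades."""
    recent = signed_order_flow(trades)[-lookback:]
    total = sum(recent)
    return (total > 0) - (total < 0)


# Hypothetical ticks: (price, size, bid, ask).
ticks = [
    (10.01, 100, 10.00, 10.01),
    (10.00, 300, 10.00, 10.01),
    (10.01, 50, 10.00, 10.01),
]
flows = signed_order_flow(ticks)
signal_all = order_flow_signal(ticks, lookback=3)
signal_last = order_flow_signal(ticks, lookback=1)
```

The look-back could equally be a time window rather than a trade count; the choice here is only for brevity.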

Ch8 - Risk Management

Risk aversion - an average human being needs the potential to make 2 to compensate for the risk of losing 1 (which is why a Sharpe of 2 is so appealing; Kahneman 2011). This dislike of risk is not rational. The goal should be maximizing long-term equity growth. The key concept is the prudent use of leverage, which can be optimized using the Kelly formula or numerical methods that maximize the compounded growth rate. In the short term drawdown control is much more important. Drawdowns can be limited by stop-losses, but this is problematic. The other way is constant proportion portfolio insurance, which tries to maximize the upside of the account in addition to preventing large drawdowns. Finally, stopping trading when leading indicators of risk signal a high risk of loss can be an effective loss-avoidance technique.

Optimal Leverage

For managing one's own money, where maximizing net worth over the long term is what matters and short-term drawdowns and volatility of returns do not, the optimal leverage is the one that maximizes the long-term compounded growth rate.

Kelly Formula
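For a single strategy, the Kelly formula in its common continuous (Gaussian) form sets the optimal leverage to the mean excess return divided by the variance of returns. A minimal sketch under that assumption; the function name, the use of the sample variance and the return series are hypothetical.

```python
import statistics


def kelly_leverage(excess_returns):
    """Kelly-optimal leverage f = mean / variance of per-period excess returns.

    Assumes roughly Gaussian returns; uses the sample variance.
    """
    mean = statistics.mean(excess_returns)
    variance = statistics.variance(excess_returns)
    if variance == 0:
        raise ValueError("variance of returns is zero")
    return mean / variance


# Hypothetical per-period excess returns (not real strategy results).
returns = [0.02, 0.0, 0.02, 0.0]
f_star = kelly_leverage(returns)
```

In practice the estimate is very sensitive to errors in the mean and variance, which is why running below the Kelly leverage is the more prudent choice.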

Optimization of Expected Growth Rate using simulated returns

Optimization of historical growth rate

Maximum drawdown

Constant Proportion portfolio Insurance

Stop Loss

Risk Indicators