## Saturday, June 20, 2015

### Developing high frequency equity trading models: Infantino and Itzhaki (2010)

Seconds to minutes horizon. PCA based equity market neutral reversal strategy combined with regime switching gives handsome results.

#### Ch 1: Introduction

We want a short-term valuation, and to identify whether the market regime will act in line with or against that valuation. With so much noise, we should not expect high precision in our solutions. We only need to be slightly precise to generate decent alpha in a high-frequency environment, with approximate holding periods on the order of seconds to minutes. By the fundamental law of active management, $IR = IC \sqrt{Breadth}$, where $IR$ is the information ratio, $IC$ is the information coefficient (the correlation between predicted and realized values), and $Breadth$ is the number of independent decisions made by a trading strategy in one year. An $IC$ of 0.05 is huge!
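A quick back-of-the-envelope calculation shows why breadth dominates at high frequency (purely illustrative numbers of my own, assuming fully independent decisions, which is optimistic):

```python
import math

# Fundamental law of active management: IR = IC * sqrt(Breadth).
# Illustrative assumptions: 50 stocks, one independent decision per minute,
# 390 trading minutes per day, 252 trading days per year.
ic = 0.05
breadth = 50 * 390 * 252        # ~4.9 million decisions per year
ir = ic * math.sqrt(breadth)    # a "tiny" IC still yields a very large IR
```

The point is that at a seconds-to-minutes horizon, even a barely-better-than-noise predictor can compound into strong risk-adjusted performance.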

Ultra high frequency traders (millisecond technology players) make their profits by providing liquidity. They do not attempt to correct the mispricing in high frequency domain (second to minute), due to their shorter holding periods (Jurek, Yang 2007).

The model is the mean-reversion model described in Khandani and Lo (2007) - 'What happened to the quants?' - which analyzed the quant meltdown of August 2007. The weight of security $i$ at date $t$ is given by,
$$w_{i,t}=-\frac{1}{N}(R_{i,t-k}-R_{m,t-k})$$
where $R_{m,t-k}=\frac{1}{N}\sum_{i=1}^{N} R_{i,t-k}$. This is a market neutral strategy. Daily re-balancing corresponds to $k=1$. These weights produce huge IRs at daily frequency, and even more impressive numbers as the frequency is increased from 60 minutes to 5 minutes. This assumes every security has a CAPM beta close to 1 (which will be addressed using PCA).
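The contrarian weighting above is simple enough to sketch directly (a minimal sketch with made-up returns; variable names are mine):

```python
import numpy as np

def contrarian_weights(returns_lag: np.ndarray) -> np.ndarray:
    """Khandani-Lo contrarian weights: short recent winners, long recent losers.

    returns_lag: vector of lagged returns R_{i,t-k} for the N securities.
    """
    n = returns_lag.size
    market = returns_lag.mean()          # equal-weighted market return R_{m,t-k}
    return -(returns_lag - market) / n   # w_{i,t} = -(1/N)(R_{i,t-k} - R_{m,t-k})

# Weights sum to zero by construction, which is what makes the book market neutral.
r = np.array([0.02, -0.01, 0.00, 0.03])
w = contrarian_weights(r)
```

Note that the biggest recent winner gets the largest short weight, and the biggest loser the largest long weight, so the portfolio bets on cross-sectional reversal.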

Avellaneda and Lee (2010) describe statistical arbitrage with holding periods from seconds to weeks. Pairs trading is the fundamental idea, based on the expectation that one stock tracks the other after controlling for beta, via the following relationship for stocks P and Q:
$$\frac{dP_t}{P_t}=\alpha dt+\beta \frac{dQ_t}{Q_t}+dX_t,$$
where $X_t$ is the mean reverting process to be traded on. The stock returns can be decomposed to systematic and idiosyncratic components by using PCA giving
$$\frac{dP_t}{P_t}=\alpha dt+\sum_{j=1}^{n}\beta_j F_t^{(j)}+dX_t,$$
where $F^{(j)}_t$ represent the risk factors of the  market/cluster under consideration.

These ideas will be merged and utilized in a slightly different sense in this paper.

#### Ch 2: The model

Log returns and cumulative returns: This chapter covers only the predictive part; the regime switch is tackled in the next chapter. We use log returns ($\ln(1+r_t)$, assumed to be normal) because compounding them is easy (they add across time) and normality holds better when compounding returns over a longer period. Further, prices have a log-normal distribution and log returns are a close approximation of simple returns, i.e. $\ln(1+r) \approx r$ for $r \ll 1$. Also, by using cumulative returns we take advantage of the CLT, and build a model to predict cumulative returns from the cumulative returns of the principal components.
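The additivity that makes log returns convenient is easy to verify (toy prices of my own):

```python
import math

# Log returns are additive across time: the cumulative log return over a
# window is the sum of the per-period log returns.
prices = [100.0, 101.0, 99.5, 102.0]
log_rets = [math.log(p1 / p0) for p0, p1 in zip(prices, prices[1:])]

cumulative = sum(log_rets)                 # sum of per-period log returns
direct = math.log(prices[-1] / prices[0])  # log return over the full window
# The two agree exactly; simple returns would need a product of (1 + r) terms.
```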

PCA: We use PCA for valuation, with OLS for predictive modeling. This is statistical in nature, as the 'identity' of the risk factor is not of concern. At the seconds time frame, instead of debt-to-equity ratio, current ratio, and interest coverage, it is the positioning and flow of hedge funds, brokers, and asset managers that is much more of a driving factor. Orthogonality of the principal components also avoids multicollinearity in the OLS. PCA has also been shown to identify market factors without bias toward market capitalization. Finally, PCA implicitly uses the variance-covariance matrix of returns, giving a different reversion threshold for each stock based on its particular combination of PCs. This addresses the basic flaw of having to assume a general threshold for the entire universe, with a CAPM beta close to one for every security.

Model description: The steps are -
1) Define the stock universe - 50 stocks randomly chosen from the S&P 500 (a universe of 1000 would need clustering techniques). Top-of-the-book bid-ask quotes were collected from tick data for each trading day of 2009.
2) Intervalize dataset - one-second intervals using the first mid-price quote of the second.
3) Calculate log-returns - calculate log returns on the one-second mid-prices.
4) PCA - For N assets and T time steps, demean and calculate the eigenvectors for the first k eigenvalues (of covariance matrix $\Sigma$) as columns into $\Phi$ and then calculate the dimensionally reduced returns $D$ of principal components.
$$D = [\Phi^T(X-M)^T]^T$$
where $M$ is the mean vector of $X$.
5) Build prediction model - Following Campbell, Lo and MacKinlay (1997), we regress future accumulated log returns on the trailing H-period sums of the dimensionally reduced returns in the principal component space:
$$r_{t+1}+\dots+r_{t+H}=\beta_1\sum_{i=0}^{H} D_{t-i,1}+\dots+\beta_{k}\sum_{i=0}^{H} D_{t-i,k}+\eta_{t+H,H}$$
which can be represented in matrix form as
$$S = \hat{D}_t B.$$
To form the mean-value signal we add back the mean:
$$\hat{S}=S + M_t.$$
The base assumption is that the principal components capture the main risk factors that should drive the stocks' returns in a systematic way, and the residuals are the noise we will try to get rid of. If the last H-period accumulated log returns have been higher than the signal, we assume that the stock is overvalued and thus place a sell order. Thus the final signal is $\hat{S} - \sum_{i=0}^{H} r_{t-i}$.
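The steps above can be sketched end to end (a minimal sketch on synthetic returns; the sizes `T`, `N`, `k`, `H` and the plain `lstsq` OLS are my simplifications of the paper's setup):

```python
import numpy as np

rng = np.random.default_rng(0)
T, N, k, H = 500, 10, 3, 5          # time steps, stocks, PCs kept, horizon

X = rng.normal(0, 1e-3, size=(T, N))   # one-second log returns (synthetic)
M = X.mean(axis=0)                     # mean vector of X

# Step 4: PCA via the covariance matrix of the demeaned returns.
cov = np.cov(X - M, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)     # ascending eigenvalues
Phi = eigvecs[:, ::-1][:, :k]              # top-k eigenvectors as columns
D = (X - M) @ Phi                          # D = [Phi^T (X - M)^T]^T

# Step 5: regress future H-period cumulative returns of each stock on the
# trailing H-period sums of the k principal-component return series.
def trailing_sums(A, H):
    return np.array([A[t - H:t + 1].sum(axis=0) for t in range(H, len(A) - H)])

F = trailing_sums(D, H)                                        # regressors
Y = np.array([X[t + 1:t + H + 1].sum(axis=0) for t in range(H, T - H)])
B, *_ = np.linalg.lstsq(F, Y, rcond=None)                      # OLS betas

S_hat = F @ B + M                      # predicted signal, mean added back
past = trailing_sums(X, H)             # realized trailing H-period returns
signal = S_hat - past                  # > 0: undervalued (buy); < 0: sell
```

On random noise the fitted betas are of course meaningless; the sketch only shows the mechanics of projecting onto the PCs, regressing cumulative returns, and forming the reversal signal.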

Since this is a liquidity-providing strategy, trading costs should hurt relatively less. An execution lag of 1 second is assumed.

Results: the basic model shows a negative Sharpe of -1.84 for 2009, with a drawdown of -65% at an annualized volatility of 16.6%. Returns were positive in the first quarter and negative thereafter.

#### Ch 3: Regime switching model

The mean-reversion model by itself is not profitable at all times. Changes in market behavior beyond the fair value (sentiment?) have to be detected. The two main regimes in which the market operates are momentum and mean reversion. Under a momentum regime we expect returns to diverge further from the theoretical returns. The adaptive market hypothesis applies particularly well to the high-frequency world, where 'poker' is played and irrationality is common (Lo and Mueller, 2010).

Since the principal components are the main risk factors, they are the ones that can justify the two regimes. The momentum regime is related to the sprouting of dislocations in the market, measured by the cross-sectional volatility of the principal components, $\sigma_D(t)$. The key observation: as short-term changes in $\sigma_D$ become more pronounced (identified by very narrow peaks in the $\sigma_D$ time series), cumulative returns from the basic mean-reversion strategy seem to decrease (momentum sets in). Changes in $\sigma_D$ over time are denoted $\psi = d\sigma_D/dt$, and the cumulative returns of the basic strategy by $\rho(t)$. We define the measure $E_H$ as:
$$E_H(t)=\sqrt{\sum_{i=0}^H[\psi(t-i)]^2}.$$
There is a fairly consistent negative correlation between $E_H(t-1)$ and $\rho(t)$. This allows the strength of the mean-reversion strategy in the next second to be identified: the sprouting of principal-component dislocations at time $t$ triggers momentum at time $t+1$. The regime-switching strategy then follows $E_H(t)-E_H(t-1)$ at time $t$. If this value is greater than zero, we understand that the dislocation is increasing and we trade on 'momentum'; otherwise we stick to 'mean-reversion' behavior. In effect, 'momentum' is linked to the 'acceleration' of $\sigma_D$.
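The regime decision reduces to a few lines (a sketch under my own assumptions: $\psi$ as a discrete first difference of a per-second $\sigma_D$ series, and the decision taken at the last observation):

```python
import numpy as np

def regime(sigma_D: np.ndarray, H: int) -> str:
    """Decide momentum vs. mean-reversion from the cross-sectional
    PC-volatility series sigma_D (one value per second).

    E_H(t) = sqrt( sum_{i=0..H} psi(t-i)^2 ), with psi = d(sigma_D)/dt.
    """
    psi = np.diff(sigma_D)                          # discrete d(sigma_D)/dt
    E = lambda t: np.sqrt(np.sum(psi[t - H:t + 1] ** 2))
    t = len(psi) - 1
    # Rising E_H means dislocations are accelerating: trade momentum.
    return "momentum" if E(t) - E(t - 1) > 0 else "mean-reversion"

# A sudden spike in sigma_D flips the switch to momentum; a flat series
# keeps the strategy in its default mean-reversion mode.
spike = np.r_[np.full(20, 0.01), 0.05]
flat = np.full(30, 0.01)
```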

Results: After applying the regime switching conditioning the Sharpe is +7.67 (2009) with a max drawdown of -1.45% at 10.03% annualized volatility.

#### Ch 4: Potential improvements

1) Clustering - A different set of stocks may give us very different PCs. To counter that we can cluster the stocks into smaller buckets, each characterized by their PCs.
2) Selection of eigenvectors - One can build a time series of the number of eigenvectors that maximizes the Sharpe at each point in time, and then fit an auto-regressive model on it to determine the number of eigenvectors to use going forward.
3) NLS - Instead of taking a simple sum of the returns, one can take a weighted sum using beta functions, turning the OLS problem into an NLS one.
4) Other - Numerical speed-ups can be obtained by using SVD instead of explicit covariance-matrix computations, using the Levenberg-Marquardt algorithm for the NLS, and using GPUs.

#### Ch 5: Conclusions

An alpha source is found in the gap between the ultra-high-frequency and traditional statistical arbitrage environments.