## Sunday, June 28, 2015

## Saturday, June 20, 2015

### Developing high frequency equity trading models: Infantino and Itzhaki (2010)

Seconds to minutes horizon. PCA based equity market neutral reversal strategy combined with regime switching gives handsome results.

#### Ch 1: Introduction

We want a short-term valuation and a way to identify the regime, that is, whether the market will act in line with or against that valuation. With so much noise, we should not expect high precision in our solutions. We only need to be slightly precise to generate decent alpha in a high frequency environment, with holding periods on the order of seconds to minutes. By the fundamental law of active management, $IR = IC \sqrt{Breadth}$, where $IR$ is the information ratio, $IC$ is the information coefficient (correlation between predicted and realized values) and $Breadth$ is the number of independent decisions made on a trading strategy in one year. An $IC$ of 0.05 is huge!

Ultra high frequency traders (millisecond technology players) make their profits by providing liquidity. They do not attempt to correct the mispricing in the high frequency domain (second to minute), due to their shorter holding periods (Jurek, Yang 2007).
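As a rough illustration of the fundamental law above, with hypothetical numbers that are not taken from the paper: keeping $IC = 0.05$ and assuming 10,000 independent decisions per year gives

$$IR = IC\sqrt{Breadth} = 0.05 \times \sqrt{10000} = 0.05 \times 100 = 5,$$

whereas the same $IC$ with only 100 independent decisions per year gives $IR = 0.05 \times 10 = 0.5$. The breadth available at seconds-to-minutes horizons is what makes a small $IC$ valuable.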

The model is a mean-reversion model as described in Khandani and Lo (2007), 'What happened to the quants?', where it was used to analyze the quant meltdown of August 2007. The weight of security i at date t is given by

$$w_{i,t}=-\frac{1}{N}(R_{i,t-k}-R_{m,t-k})$$

where $R_{m,t-k}=\frac{1}{N}\sum_{i=1}^{N} R_{i,t-k}$. This is a market neutral strategy (the weights sum to zero). Daily re-balancing corresponds to $k=1$. These produce huge IRs at daily frequency and even more impressive numbers as the frequency is increased from 60 mins to 5 mins. This assumes every security has a CAPM beta close to 1 (which will be addressed using PCA).
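A minimal sketch of the contrarian weights above, assuming NumPy. The function name `contrarian_weights` and the four sample returns are made up for illustration and do not come from the paper.

```python
import numpy as np

def contrarian_weights(lagged_returns):
    """w_i = -(1/N) * (R_i - R_m), where R_m is the equal-weighted mean return."""
    r = np.asarray(lagged_returns, dtype=float)
    n = r.size
    market = r.mean()              # R_{m,t-k}
    return -(r - market) / n       # losers get positive weight, winners negative

# Hypothetical lagged returns for four stocks (illustrative numbers only).
returns = np.array([0.02, -0.01, 0.00, 0.03])
weights = contrarian_weights(returns)
print(weights)        # one weight per stock
print(weights.sum())  # zero up to rounding: the portfolio is dollar neutral
```

The weights sum to zero by construction, which is the sense in which the strategy is market neutral when every beta is close to 1.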

Avellaneda and Lee (2010) describe statistical arbitrage with holding period from seconds to weeks. Pairs trading is the fundamental idea based on the expectation that one stock tracks the other, after controlling for beta in the following relationship, for stock P and Q

$$\frac{dP_t}{P_t}=\alpha dt+\beta \frac{dQ_t}{Q_t}+dX_t,$$

where $X_t$ is the mean reverting process to be traded on. The stock returns can be decomposed into systematic and idiosyncratic components by using PCA, giving

$$\frac{dP_t}{P_t}=\alpha dt+\sum_{j=1}^{n}\beta_j F_t^{(j)}+dX_t,$$

where $F^{(j)}_t$ represent the risk factors of the market/cluster under consideration.

These ideas will be merged and utilized in a slightly different sense in this paper.

#### Ch 2: The model

Log returns and cumulative returns: This chapter covers only the predictive part of the model. Regime switching will be tackled in the next chapter. We use log returns ($\ln(1+r_t)$, assumed to be normal) because compounding of returns is easy (log returns add) and normality holds when compounding returns over a larger period. Further, prices have a log-normal distribution and log returns are a close approximation of real returns, i.e. $\ln(1+r) \approx r$ for $r \ll 1$. Also, by using cumulative returns we take advantage of the CLT, and build a model to predict cumulative returns with the cumulative returns of the principal components.

PCA: We use PCA for valuation, with OLS for predictive modeling. The approach is statistical in nature, as the 'identity' of each risk factor is not cared about. At the seconds time frame, it is the positioning and flow of hedge funds, brokers and asset managers, rather than Debt to Equity ratio, Current ratio and Interest coverage, that is the main driving factor. Orthogonality of the principal components also avoids multi-collinearity in OLS. PCA has also been shown to identify market factors without bias to market capitalization. Finally, PCA implicitly uses the variance-covariance matrix of returns, giving a different threshold for each stock's reversion, based on a different combination of PCs for each of them. This addresses the basic flaw of having to assume a general threshold for the entire universe, with a CAPM based beta close to one for every security.

Model description: The steps are -

1) Define the stock universe - 50 stocks randomly chosen from the S&P 500. A universe of 1000 stocks would need clustering techniques. Top of the book bid-ask quotes were collected from the tick data for each trading day of 2009.

2) Intervalize dataset - one-second intervals using the first mid-price quote of the second.

3) Calculate log-returns - calculate log returns on the one-second mid-prices.

4) PCA - For N assets and T time steps, demean the returns matrix $X$, compute the eigenvectors of the covariance matrix $\Sigma$ belonging to the first k (largest) eigenvalues, place them as columns of $\Phi$, and then calculate the dimensionally reduced returns $D$ in principal component space.

$$D = [\Phi^T(X-M)^T]^T$$

where $M$ is the mean vector of $X$.

5) Build prediction model - Following Campbell, Lo and MacKinlay (1997), we regress future accumulated log returns on the last H-period sums of the dimensionally reduced returns in the principal component space:

$$r_{t+1}+\dots+r_{t+H}=\beta_1\sum_{i=0}^{H} D_{t-i,1}+\dots+\beta_{k}\sum_{i=0}^{H} D_{t-i,k}+\eta_{t+H,H}$$

Which can be represented in matrix form as:

$$S = \hat{D_t}B.$$

To form the mean-value signal we add back the mean

$$\hat{S}=S + M_t$$

The base assumption is that the principal components explain the main risk factors that should drive the stock's returns in a systematic way, and the residuals are the noise we will try to get rid of. If we see that the last H-period accumulated log-returns have been higher than the signal, we assume that the stock is overvalued and thus place a sell order. Thus the final signal is $\hat{S} - \sum_{i=0}^{H} r_{t-i}$.
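The PCA, regression and signal steps above can be sketched as follows, assuming NumPy and simulated returns. This is not the authors' implementation: the helper names are hypothetical, and the notes do not define $M_t$ precisely, so the sketch assumes it is the sample mean of the H-period accumulated returns and fits the regression on demeaned targets.

```python
import numpy as np

def trailing_sum(A, window):
    """Row j holds the sum of A[j : j + window] (a window ending at j + window - 1)."""
    c = np.vstack([np.zeros((1, A.shape[1])), np.cumsum(A, axis=0)])
    return c[window:] - c[:-window]

def pca_reversion_signal(X, k, H):
    """X: (T, N) one-period log returns. Returns the (N,) signal at the last time step."""
    M = X.mean(axis=0)
    Xc = X - M                                         # demeaned returns
    eigvals, eigvecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
    Phi = eigvecs[:, np.argsort(eigvals)[::-1][:k]]    # top-k eigenvectors as columns
    D = Xc @ Phi                                       # equals [Phi^T (X - M)^T]^T

    D_sum = trailing_sum(D, H + 1)    # sum_{i=0}^{H} D_{t-i}, first row is t = H
    past = trailing_sum(X, H + 1)     # sum_{i=0}^{H} r_{t-i}, first row is t = H
    fwd = trailing_sum(X, H)          # row j = r_j + ... + r_{j+H-1}

    predictors = D_sum[:-H]           # t = H, ..., T-1-H
    targets = fwd[H + 1:]             # r_{t+1} + ... + r_{t+H} for the same t
    M_t = targets.mean(axis=0)        # ASSUMPTION: the mean that is added back
    B, *_ = np.linalg.lstsq(predictors, targets - M_t, rcond=None)

    S_hat = D_sum[-1] @ B + M_t       # mean-value signal at the last time step
    return S_hat - past[-1]           # negative: overvalued (sell); positive: buy

# Simulated returns purely for illustration: 500 seconds, 10 stocks.
rng = np.random.default_rng(0)
X = rng.normal(scale=1e-4, size=(500, 10))
signal = pca_reversion_signal(X, k=3, H=5)
print(signal.shape)
```

On pure noise like this the signal carries no information; the point is only to show how the window sums, the regression and the final comparison line up in time.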

Since this is a liquidity providing strategy, trading cost should hurt less, relatively. A lag of 1 second is assumed.

Results: The basic strategy shows a negative Sharpe of -1.84 for 2009, with a drawdown of -65% at an annualized volatility of 16.6%. Returns are positive for the first quarter and negative afterwards.

#### Ch3: Regime switching model

The mean reversion model by itself is not profitable at all times. Changes in market behavior have to be determined beyond the fair value (sentiment?). The two main regimes in which the market works are momentum and mean-reversion. Under the momentum regime we expect the returns to diverge further from the theoretical returns. The adaptive market hypothesis is particularly applicable to the high frequency world, where 'poker' is played and irrationality would be common (Lo and Mueller 2010).

Since principal components are the main risk factors, they are the ones that can justify the two regimes. The momentum regime is related to the sprouting of dislocations in the market - measured by the cross sectional volatility of the principal components, $\sigma_D(t)$. The key observation is:

*as the short term changes in $\sigma_D$ appeared to be more pronounced (identified by very narrow peaks in the $\sigma_D$ time series), cumulative returns from the basic mean-reversion strategy seemed to decrease (or momentum sets up).*

Changes in $\sigma_D$ over time are defined by $\psi = d\sigma_D/dt$ and the cumulative returns of the basic strategy by $\rho(t)$. We define the measure $E_H$ as:

$$E_H(t)=\sqrt{\sum_{i=0}^{H}[\psi(t-i)]^2}.$$

There is a pretty consistent negative correlation between $E_H(t-1)$ and $\rho(t)$. This allows us to identify the strength of the mean reversion strategy in the next second. The sprouting of principal component dislocation at time t triggers momentum at time t+1. The regime switching strategy would then follow $E_H(t)-E_H(t-1)$ at time t. If this value is greater than zero, we understand that the dislocation is increasing and we trade on 'momentum'; otherwise we stick to 'mean-reversion' behavior. We see that 'momentum' seems to be linked to the 'acceleration' of $\sigma_D$.
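A minimal sketch of the regime rule described above, assuming NumPy. It assumes $\sigma_D(t)$ is the standard deviation across the k principal components at time t and approximates $\psi = d\sigma_D/dt$ by a first difference; the function name `regime_flags` and the simulated input are hypothetical.

```python
import numpy as np

def regime_flags(D, H):
    """D: (T, k) principal-component returns. Returns one regime label per usable time step."""
    sigma_D = D.std(axis=1)                  # cross-sectional volatility of the PCs
    psi = np.diff(sigma_D)                   # ASSUMPTION: first difference approximates d(sigma_D)/dt
    # E_H(t) = sqrt(sum_{i=0}^{H} psi(t-i)^2), computed as a rolling sum over H+1 terms
    E = np.sqrt(np.convolve(psi ** 2, np.ones(H + 1), mode="valid"))
    dE = np.diff(E)                          # E_H(t) - E_H(t-1)
    return np.where(dE > 0, "momentum", "mean-reversion")

# Simulated principal-component returns purely for illustration.
rng = np.random.default_rng(0)
D = rng.normal(size=(200, 5))
flags = regime_flags(D, H=10)
print(flags[:5], flags.shape)
```

Each label would be used to decide whether the next trade follows the reversal signal or goes against it.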

Results: After applying the regime switching conditioning the Sharpe is +7.67 (2009) with a max drawdown of -1.45% at 10.03% annualized volatility.

#### Ch4: Potential improvements

1) Clustering - A different set of stocks may give us very different PCs. To counter that we can cluster the stocks into smaller buckets, each characterized by their PCs.

2) Selection of Eigenvectors - Selection of eigenvectors could be better. One can get a time series of the number of eigenvectors that maximizes the Sharpe at a given time, and then run an auto-regressive model to determine the number of eigenvectors to use in future.

3) NLS - Instead of taking simple sum of the returns, one can do weighted sum using beta functions, changing it from OLS to an NLS problem.

4) Other - Numerical speed can be gained by using SVD instead of covariance matrix computations, the Levenberg-Marquardt algorithm for NLS, and GPUs.

#### Ch5: Conclusions

Alpha source found between the ultra high frequency and traditional statistical arbitrage environments.

## Friday, June 19, 2015

### Trends in Quantitative Finance: Fabozzi, Focardi and Kolm (2006)

As the last few lines of the preface to this article say, it is "an excellent and comprehensive survey of the challenges one meets in using quantitative methods for portfolio construction and forecasting." It is written for a non-technical audience but is extremely relevant to technical people too, as an appetizer. The right 'putting pin on the board' article to read before you indulge in your own personal niche research.

#### Ch 1: Forecasting financial markets

A price/return process is predictable if its distribution depends on the present information set and is unpredictable if its distribution is time-invariant.

The market's partial predictability is theoretically inevitable. Markets are not made up of rational agents but of agents practicing bounded rationality.

#### Ch 2: General Equilibrium Theories - concepts and applicability

Market equilibrium is not unique. Same asset in various states can have different equilibrium values.

#### Ch 3: Extended framework for Applying Modern Portfolio Theory

Robust mean-variance optimization is computationally efficient and can be cast as a second-order cone problem. Approaches are - empirical, factor based, clustering, Bayesian, stochastic volatility. Departure from normality is addressed using Monte Carlo. Concentrates on dispersion and downside risk.

#### Ch 4 : Equity Tactical strategies (2d-2m)

CAPM (2 fund separation theorem) is an unconditional, static model. The world is conditional, dynamic. Bounded rationality and asymmetry of information create repeated patterns.

1) Delayed response: Leader companies are affected by news first, which then diffuses to lagging companies (Kanas and Kouretas 2005). Bhargava and Malhotra (2006) use co-integration to give a definitive answer to the distribution of agent response to the same information.

2) Momentum: 3 to 12 months. Lo and MacKinlay (1990) analyze a tool to detect whether momentum depends on individual asset or cross-auto-correlation effects.

3) Reversal: less used and more potential. Timing the end of momentum is a necessary ingredient. Jegadeesh and Titman (1993, 2001) and Lewellen(2005) document it.

4) Co-integration and Mean Reversion: Returns are already stationary, so co-integration applies to prices, not returns. Identifying pairs is the common approach but not the most successful. Common trends and co-integration relationships should be explored overall.

Static models are not predictive, dynamic are. Only simple dynamic models can be statistically estimated. This is restricted by limited data for complex models and risks and transaction cost considerations.

#### Ch 5: Equity Strategic Ideas (mths - yrs)

Aggregation over time: fractal idea is not completely right. Regressive and auto-regressive models are defined by time horizon of correlation and auto-correlation decay. GARCH does not remain invariant after time aggregation. Regime-shift models also exhibit time scale. Using high frequency correlation to estimate long term correlations.

Market behavior at different time horizon: Day and weeks – depend on trading practices and how traders react to news, long run – quantity of money, global economic performance.

Recognizing regime shifts: discrete shift (e.g. Hamilton model) vs continuous shift (e.g. GARCH).

Estimating Approximate models on moving windows: Once regime break is detected, the moving window should become agile to account for that. Static model causes exponentially diverging price, which empirically is a Pareto distribution. Dynamic model can do right long term modeling. Linear models can't capture long term regime shifts, but only periodic movements with a fixed period.

Nonlinear coupling of two dynamic models has been successful – GARCH, Hamilton, Markov.

Mean Reversion of log of prices: compounding – If auto-correlation less than 1, process oscillates around a trend.

Variance ratio test: If the variance of returns grows less rapidly than linearly with the horizon, there is mean reversion (Lo and MacKinlay 1988). The test differentiates a trend-stationary process from a random walk with linear trend. The variance of a random walk keeps on growing with time, but the variance of a trend-stationary process remains constant.
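A minimal sketch of the idea behind the variance ratio, assuming NumPy and simulated data. It computes only the raw ratio of the q-period return variance to q times the one-period variance, without the bias corrections or test statistic of the formal Lo and MacKinlay test; the function name and the AR(1) coefficient of 0.5 are illustrative choices.

```python
import numpy as np

def variance_ratio(log_prices, q):
    """Var(q-period change) / (q * Var(1-period change)); about 1 for a random walk."""
    p = np.asarray(log_prices, dtype=float)
    r1 = np.diff(p)                 # one-period changes
    rq = p[q:] - p[:-q]             # overlapping q-period changes
    return rq.var(ddof=1) / (q * r1.var(ddof=1))

rng = np.random.default_rng(0)
shocks = rng.normal(size=20000)

random_walk = np.cumsum(shocks)     # shocks accumulate and never decay

mean_reverting = np.empty_like(shocks)   # stationary AR(1): shocks die out
mean_reverting[0] = shocks[0]
for t in range(1, shocks.size):
    mean_reverting[t] = 0.5 * mean_reverting[t - 1] + shocks[t]

vr_rw = variance_ratio(random_walk, 10)
vr_mr = variance_ratio(mean_reverting, 10)
print(vr_rw, vr_mr)
```

The random walk should give a ratio near one, while the mean-reverting series gives a ratio well below one because its variance stops growing with the horizon.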

Central tendency - Stock prices can only have linear or stochastic trend.

Time diversification – Less risky in long run vs short run.

#### Ch 6: Machine Learning

Machine learning is progressive learning. AI is useful, but its merits and limits are now more clearly understood. The link between algorithmic process and creative problem solving is the concept of searching.

Neural Network – To generalize, the number of layers and nodes has to be restricted. Hertz, Krogh and Palmer (1991) give a mathematical introduction. Used widely with mild success.

SVM – Performance generally superior to NN.

Classification, regression Trees – satisfactory results using CART.

Genetic Algorithms – satisfactory to remarkable prediction. Used in asset allocation (Armano, Marchesi and Murru 2005). Used to select predictors for NN (Thawornwong and Enke 2004).

Text mining - Automatic text handling has shown promise.

The possibility of replacing intuition and judgment is remote. ML is more extensive and nonlinear handling of broader set of data.

#### Ch 7: Model Selection, Data snooping, over-fitting, and model risk

Alpha ideas need creativity, but testing and analysis of models is a well-defined method with a scientific foundation. Simpler models are always better than complex models if they explain equally well. There is a trade-off between model complexity and forecasting ability. To avoid looking for chance patterns, one must stick rigorously to the paradigm of statistical tests. Exceptional patterns (over a smaller subset) are generally spurious. Good practice calls for testing any model against a surrogate random sample generated with the same statistical characteristics as the empirical sample. Out of sample testing should be the norm. Trying a new model on already used data will 'always involve some data snooping'. Re-sampling and cross validation should be used for parameter selection.

If the market is evolutionary (changing slowly), the model can be calibrated on moving windows. If it has regime shifts, Markov switching models can be used, or alternatively random coefficient models (Longford 1993), which average the results of different models.

#### Ch 8: Predictive Models of Return

Risk and returns are partially predictable. The more difficult question is how return predictability can be turned into a profit - to keep risk return trade-off positive.

1) Regressive models - regress return on predictive factors

Two types of dependence of Y on X: in the first, both the distribution and the expected value of Y depend on X; in the second, the distribution of Y depends on X but the expected value of Y does not. The concept of regression does not imply any notion of time - time dependence must be checked, e.g. correlation between noise terms. R-square measures the goodness of fit.

a) Static regressive models - not predictive, like CAPM, but uncover unconditional dependence.

b) Predictive regressive models - lagged predictive regression. Distributed lag models to uncover rate of change.

2) Linear Auto-regressive models - a variable is regressed on its own lagged past values. The single variable case is called AR and the multiple variable case is called VAR. Vector auto-regressive models can capture cross-auto-correlations, and should be used with few factors to reduce the number of variables. Two types:

a) stable VAR models - generate stationary processes. The response of a stable VAR model to each shock is a sum of exponential decays (with exponent <1) - e.g. EWMA, with the most recent shock having the most effect. Some solutions may be oscillatory, with some damped, but all remain stationary and exhibit auto-correlation.

b) unstable VAR models - explosive (exponent >1) or integrated (exponent =1). In an integrated process, shocks accumulate and never decay (e.g. log price), and error terms are auto-correlated. Generally, the first difference gives a stationary process. A VAR process with individually integrated series may have a linear combination that is stationary - this is called **co-integration** (Engle and Granger 1987). Regression of non-stationary (e.g. trending) time series can't rely on R-square or detrending. If both sides have integrated processes, test for co-integration to ascertain that the regression is meaningful. For n integrated processes with k co-integrating relationships (of possible 1 to n-1), there will be n-k common trends such that every other solution can be expressed as a linear regression on these common trends.

3) Dynamic Factor Models - predictive regressive models where the predictive factors follow a VAR model. These are a compact formulation of state-space models with observable and hidden state variables. The motivation is generally to reduce dimensionality. Both integrated and stable processes can be modeled. The solutions are sums of exponentials. The ability to mix levels and differences adds to forecasting possibilities.

4) Hidden-variable models - best known linear state-space models are ARCH/GARCH. nonlinear state-space models are MS-VAR (Markov switching vector auto-regressive) family. e.g. Hamilton model (Hamilton 1989) - one random walk with drift for economic expansion and other with a smaller drift for periods of economic recession - switching regime model.
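To make the co-integration idea from the list above concrete, here is a minimal sketch assuming NumPy and simulated data: two integrated series share one common random-walk trend, and the residual of a regression of one on the other is stationary. The AR(1) coefficient is used only as an informal persistence check; a real Engle-Granger test would apply a unit-root test with the appropriate critical values. All names and parameters are illustrative.

```python
import numpy as np

def ar1_coefficient(series):
    """Least-squares slope of the demeaned series on its own first lag."""
    s = np.asarray(series, dtype=float)
    s = s - s.mean()
    return float(s[1:] @ s[:-1] / (s[:-1] @ s[:-1]))

rng = np.random.default_rng(1)
n = 5000
trend = np.cumsum(rng.normal(size=n))       # one common stochastic trend
x = trend + rng.normal(size=n)              # integrated series 1
y = 2.0 * trend + rng.normal(size=n)        # integrated series 2, same trend

# Step 1: regress y on x (with intercept) to estimate the co-integrating relation.
A = np.column_stack([np.ones(n), x])
(intercept, slope), *_ = np.linalg.lstsq(A, y, rcond=None)

# Step 2: the residual (spread) should be stationary if x and y are co-integrated.
spread = y - intercept - slope * x
print(slope, ar1_coefficient(x), ar1_coefficient(spread))
```

Here n = 2 series with k = 1 co-integrating relationship leave n - k = 1 common trend, matching the counting rule in the text.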

#### Ch 9: Model Estimation

Determining the parameters of the model. Robust estimation is becoming more important to discount noisy inputs. Being a function of the data, an estimator is a random variable and hence has a sampling distribution. Finite data size is a challenge. Estimation always has a probabilistic interpretation. Three methods:

1) Least-square estimation - OLS method of best projection on subspace.

2) Maximum-Likelihood - maximizing the likelihood of the sample given an assumption about the underlying distribution.

3) Bayesian estimation - the posterior distribution is the prior distribution multiplied by the likelihood. One needs to know the form of the distribution.

Robust estimation

Matrices - PCA can be used to reduce dimension, select smaller set of factors and reduce noise.

Regression models - the optimal linear estimator is OLS; if residuals are normal it coincides with ML. Auto-correlation of residuals does not invalidate OLS, but makes it less efficient. OLS can be made more robust by making it less sensitive to outliers. Variable selection can be done by penalized objective functions (Ridge, Lasso, elastic net). Clustering makes estimates more robust by averaging - shrinkage, random coefficient model, contextual regressions (Sorensen, Hua, Qian 2005).

Vector auto-regressive models - multivariate OLS. For specific assumptions and limits ML can be used. Unstable VAR is generally concerned with co-integration detection and uses state of the art ML methods (Johansen 1991, Banerjee and Hendry 1992). Bayesian VAR (BVAR) can be used, e.g. the Litterman (1986) method.

Linear hidden-variable models - Kalman filter. Two main methods - ML based and subspace based.

Nonlinear hidden-variable models - MS-VAR can be estimated using EM algorithm.
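As a small illustration of the penalized objective functions mentioned under regression models above, the following sketch compares OLS with a ridge penalty on simulated data, assuming NumPy. The function name `ridge_fit`, the penalty value and the data are illustrative assumptions, not taken from the article.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Solve (X'X + lam * I) beta = X'y; lam = 0 reduces to plain OLS."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

rng = np.random.default_rng(2)
X = rng.normal(size=(200, 5))
X[:, 4] = X[:, 3] + 0.01 * rng.normal(size=200)   # two nearly collinear predictors
y = X[:, 0] + 0.5 * rng.normal(size=200)          # only the first predictor matters

beta_ols = ridge_fit(X, y, 0.0)
beta_ridge = ridge_fit(X, y, 10.0)
print(np.linalg.norm(beta_ols), np.linalg.norm(beta_ridge))
```

The ridge coefficients are shrunk towards zero, which stabilizes the estimates on the nearly collinear pair at the cost of some bias.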

#### Ch 10: Practical consideration with optimizer

Software is available, but understanding of the procedure is important for robustness and accuracy. Fortunately in finance, most problems have one unique optimal solution. Standard forms are - linear, quadratic, convex, conic, nonlinear. Markowitz's optimization is quadratic. Quadratic, convex and conic problems have unique solutions.

Solving the optimization problem - formulate problem, choose optimizer, solve problem. Re-sampled optimization can be used.
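A minimal sketch of the simplest quadratic case, assuming NumPy: the fully invested minimum-variance portfolio, which has a closed-form solution. This is a special case of Markowitz's problem with no return target and no long-only constraint; the covariance matrix below is made up for illustration.

```python
import numpy as np

def min_variance_weights(cov):
    """Closed-form solution of: minimize w' C w subject to sum(w) = 1."""
    cov = np.asarray(cov, dtype=float)
    ones = np.ones(cov.shape[0])
    raw = np.linalg.solve(cov, ones)   # C^{-1} 1 without forming the inverse
    return raw / raw.sum()             # rescale so the weights sum to one

# Hypothetical covariance matrix for three assets (illustrative numbers only).
cov = np.array([[0.04, 0.006, 0.0],
                [0.006, 0.09, 0.01],
                [0.0, 0.01, 0.16]])
w = min_variance_weights(cov)
equal = np.full(3, 1 / 3)
print(w)
print(w @ cov @ w, equal @ cov @ equal)   # optimized vs equal-weight variance
```

Because the objective is a convex quadratic with a positive definite matrix, this solution is the unique optimum; adding inequality constraints would require a numerical quadratic programming solver.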

#### Ch 11: Industry survey

Survey coverage: US $4 trillion of assets under management, 2005, US and European managers.

Equity return forecasting techniques - Simple methods with economic intuition are preferred. Momentum and reversal are most widely used. Regression using predictors like financial ratios is the bread and butter. There is a desire to combine company fundamentals with sentiment. Auto-regressive, co-integration, state-space, regime-switching and nonlinear methods (NN, DT) are central in some companies. Growing interest in high-frequency data.

Models based on exogenous predictors - operating efficiency, financial strength, earning quality, capital expenditure etc as core bottom-up equity model. Fundamental combined with momentum/reversal models.

Momentum and reversal models - use of multiple time horizon, most widely used, turnover is a concern - weighting and penalty functions used to mitigate it.

co-integration models - performance sensitive to liquidity and volatility. many firms use it as it models short term dynamics and long term equilibrium.

Markov-switching/regime-switching models - not used widely as market timing is difficult to predict.

auto-regressive models - Not been widely evaluated! A step ahead of momentum models. Over-fitting is cautioned.

state-space models - not widely used.

nonlinear models - potential in NN, DT. Some already use it.

models of higher-moment dynamics - e.g GARCH, used by very few.

Model risk mitigation techniques - mostly co-variance matrix estimation.

Bayesian estimation - hard to implement explicitly but used implicitly.

Shrinkage/Averaging - widely used.

random coefficient models - hardly used. randomization of data to ensure against over-fitting used.

Optimization techniques - half the firms use it, other half distrust it.

Robust optimization - re-sampling methods

multistage stochastic optimization - hardly used.

#### Ch 12: Today and tomorrow

Changing markets is a reality. Supervision is required. Regression is the king today. Text mining may be becoming hot.