Modelling
Also called 'factor combination'.
1. Motivation and approach
We describe how the factors are combined to produce robust forecasts.
We build a modelling framework allowing the use of diverse machine learning models.
We believe that the specific machine learning technique used represents only one (although important) part of the modelling process.
Therefore, the framework should not be built around or be too dependent on the ML model.
The framework approach provides scalability, modularity and robustness.
A cross-sectional fit is chosen over a per-instrument fit because it reduces overfitting and makes it easy to add new instruments on the fly.
The model is recalibrated monthly, and goodness of fit is tested one month forward, out of sample, on the validation set. The lookback window is itself calibrated, allowing the model to memorize market dynamics over a chosen timeframe.
Monthly recalibration lets the model take the most recent market regime into account, whether bullish, bearish or sideways.
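The monthly recalibration scheme above can be sketched as a walk-forward loop: refit on a rolling lookback window, then score one month forward out of sample. This is an illustrative sketch, not the framework's implementation; `fit_model` is a hypothetical placeholder for whichever ML technique is plugged in, and only needs to return an object with a `score` method.

```python
import numpy as np
import pandas as pd

def walk_forward_fit(returns: pd.Series, factors: pd.DataFrame,
                     fit_model, lookback_months: int = 12) -> pd.Series:
    """Refit every month on a rolling lookback window and score the
    fitted model one month forward, out of sample."""
    periods = returns.index.to_period("M")
    months = periods.unique()
    scores = {}
    for i in range(lookback_months, len(months)):
        train = months[i - lookback_months:i]   # calibrated lookback window
        test = months[i]                        # one month forward, out of sample
        in_train = periods.isin(train)
        in_test = periods == test
        model = fit_model(factors[in_train], returns[in_train])
        scores[test] = model.score(factors[in_test], returns[in_test])
    return pd.Series(scores)
```

Because the loop only sees data up to the end of each training window, every score is a genuine out-of-sample measurement for that month.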
2. Residualizing coin returns
a. Why do we residualize instrument returns?
Quantitative traders often residualize instrument returns as part of their trading strategies for several reasons. Residualization refers to the process of removing the predictable or systematic components from an instrument's returns, allowing traders to focus on the residual or unexplained component.
Here are the reasons why we residualize instrument returns:
i. Signal extraction: Residualization allows traders to extract and analyze the specific signals or anomalies associated with an instrument's returns. By filtering out the common market factors, traders can identify unique patterns or deviations from the expected behavior. These signals can be utilized to develop trading strategies that capitalize on the instrument's individual characteristics.
ii. Alpha generation: Quantitative traders aim to generate alpha, which refers to returns that are in excess of the market's overall performance. Residualizing returns helps traders in identifying and exploiting sources of alpha that are specific to an instrument. By focusing on the unexplained component of returns, traders can develop strategies that aim to profit from the unique attributes or inefficiencies of individual instruments.
iii. Statistical modelling: Residualization is often employed in statistical modelling and econometric analysis. By removing the systematic components from returns, traders can obtain cleaner and more accurate estimates of the instrument's specific effects. This allows for more robust modelling and analysis, helping traders make informed decisions based on reliable data.
Residualizing instrument returns is a common practice in quantitative trading to isolate specific risk, extract signals, generate alpha, improve statistical modelling, and enhance portfolio diversification. By separating the systematic and idiosyncratic components of returns, we can gain valuable insights and develop effective strategies tailored to individual coins.
b. How to residualize instrument returns?
Residualizing instrument returns involves a two-step process:
i. Estimate the systematic component: The first step is to build a model that explains the instrument's returns using the systematic or common factors. By fitting this model to historical data, we obtain the estimated systematic component.
ii. Calculate the residual component: Once we have estimated the systematic component, we calculate the residual component by subtracting the estimated systematic component from the actual instrument returns. The residual component represents the unexplained or idiosyncratic part of the returns that is specific to the instrument.
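The two steps above can be sketched in Python. Plain OLS stands in here for whatever factor model is actually used, so this is an illustrative sketch rather than the framework's implementation:

```python
import numpy as np

def residualize(returns: np.ndarray, factors: np.ndarray) -> np.ndarray:
    """Two-step residualization of one instrument's return series.

    Step 1: estimate the systematic component by regressing the
    returns on the factor matrix (plain OLS, with an intercept).
    Step 2: subtract the fitted systematic component; what remains
    is the idiosyncratic (residual) return.
    """
    # Step 1: fit the systematic component
    X = np.column_stack([np.ones(len(factors)), factors])
    beta, *_ = np.linalg.lstsq(X, returns, rcond=None)
    systematic = X @ beta
    # Step 2: residual = actual - systematic
    return returns - systematic
```

By construction the residuals are orthogonal to the factors, so any signal found in them is specific to the instrument rather than inherited from the common drivers.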
3. Machine learning
The framework uses three machine learning techniques.
a. The Exponentially Weighted Lasso regression
The exponentially weighted Lasso regression is a variant of the Lasso (Least Absolute Shrinkage and Selection Operator) regression that introduces exponential weighting to the regularization term. It is used for variable selection and parameter estimation in statistical modelling, particularly when dealing with time series data or data with temporal dependencies.
In traditional Lasso regression, the regularization term is the sum of the absolute values of the coefficients multiplied by a tuning parameter (lambda). This penalty term encourages sparse solutions by driving some of the coefficients to zero, effectively performing variable selection.
In exponentially weighted Lasso regression, the regularization term is modified by introducing an exponential decay factor that assigns different weights to the coefficients based on their temporal proximity. The objective is to promote variable selection while considering the temporal dependencies in the data. This is particularly useful when dealing with time series or panel data where observations are correlated over time or across entities.
The exponentially weighted Lasso regression estimates beta by minimizing

    sum_t w_t * (y_t - x_t' beta)^2 + lambda * sum_j |beta_j|

where x_t' denotes the t-th row of X, and:
- y is the response variable,
- X is the design matrix of predictor variables,
- beta is the vector of coefficients to be estimated,
- lambda is the tuning parameter that controls the strength of the regularization,
- w is the exponential weighting factor, which assigns different weights to the coefficients based on their temporal proximity.
The exponential weighting factor w can be defined in different ways, depending on the specific application and desired temporal decay. For example, it can be defined as a function of time lags or decay rates.
The exponentially weighted Lasso regression combines the benefits of the Lasso regularization, which promotes sparse solutions and variable selection, with the consideration of temporal dependencies through the exponential weighting. This makes it a useful tool for modelling time series data while accounting for the dynamics and dependencies present in the data.
Note that the Lasso's L1 penalty drives some feature weights exactly to zero, whereas the Ridge L2 penalty still assigns small non-zero weights to useless features.
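One common way to realize this, sketched below under the assumption that the exponential weights are applied to the observations in the squared-error term, is to pass exponentially decaying sample weights to scikit-learn's Lasso. The `halflife` parameter and the `fit_ew_lasso` helper are illustrative choices, not part of the framework:

```python
import numpy as np
from sklearn.linear_model import Lasso

def fit_ew_lasso(X: np.ndarray, y: np.ndarray,
                 lam: float = 0.01, halflife: int = 60) -> Lasso:
    """Lasso with exponentially decaying observation weights.

    Observations are ordered oldest to newest; the observation at
    lag k from the most recent gets weight 0.5 ** (k / halflife),
    so recent data dominates the squared-error term while the L1
    penalty still drives irrelevant coefficients exactly to zero.
    """
    n = len(y)
    lags = np.arange(n - 1, -1, -1)      # most recent observation has lag 0
    w = 0.5 ** (lags / halflife)         # exponential temporal decay
    model = Lasso(alpha=lam, fit_intercept=True)
    model.fit(X, y, sample_weight=w)
    return model
```

With this weighting, a regime shift late in the sample pulls the coefficients toward the new regime much faster than an unweighted fit would.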
b. XGBoost and multi-task LSTM will be added in future framework updates