Data pipeline

Note: the beta data pipeline is called 'alex'.

We focus on robust, straightforward, agile and modular development, and draw on our financial markets experience to clean, normalize and prepare the data for ingestion by our models. Financial markets data poses specific challenges that must be handled carefully; the details are outside the scope of this page.

The aim of the data pipeline is to pull data from the data providers and prepare datasets in the format ingested by the quantitative models factory.

Data types:

Note: A futures contract is a legal agreement, facilitated through a futures exchange, to buy or sell a standardized asset on a specific date or during a specific month (or with no expiry date, in the case of perpetual contracts).

1. Cryptocurrencies OHLC (Open, High, Low, Close): perpetual futures trade data aggregated into hourly bins and split into Open, High, Low and Close columns. Hourly bins provide enough granularity for the multi-day horizon forecasting models.

2. Global macro futures: global equities, global fixed income, currencies and commodities from the CME, CBOT, COMEX and NYMEX exchanges, aggregated into hourly bins.
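
To illustrate what aggregating trades into hourly OHLC bins means, here is a minimal sketch using only the Python standard library. It is not the pipeline's actual code: the function name `hourly_ohlc` and the input format (a list of `(timestamp, price)` pairs) are assumptions made for this example.

```python
from datetime import datetime, timezone


def hourly_ohlc(trades):
    """Aggregate (timestamp, price) trades into hourly OHLC bars.

    `trades` is an iterable of (datetime, float) pairs. This input
    format is a hypothetical one chosen for the illustration.
    Returns a dict keyed by the start of each hourly bin.
    """
    bars = {}
    # Process trades in chronological order so that "open" is the first
    # price seen in a bin and "close" is the last one.
    for ts, price in sorted(trades, key=lambda t: t[0]):
        bin_start = ts.replace(minute=0, second=0, microsecond=0)
        bar = bars.get(bin_start)
        if bar is None:
            bars[bin_start] = {
                "open": price, "high": price, "low": price, "close": price,
            }
        else:
            bar["high"] = max(bar["high"], price)
            bar["low"] = min(bar["low"], price)
            bar["close"] = price
    return bars


# Made-up sample trades, for illustration only.
trades = [
    (datetime(2024, 1, 1, 0, 5, tzinfo=timezone.utc), 100.0),
    (datetime(2024, 1, 1, 0, 40, tzinfo=timezone.utc), 103.5),
    (datetime(2024, 1, 1, 0, 59, tzinfo=timezone.utc), 101.0),
    (datetime(2024, 1, 1, 1, 10, tzinfo=timezone.utc), 99.0),
]
bars = hourly_ohlc(trades)
```

Each bar is keyed by the start of its hour, so the first three sample trades fall into the 00:00 bin and the fourth into the 01:00 bin.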

There are two levels of preparation:

1. L1 level: the raw data format as obtained directly from the exchange

2. L2 level: the data prepared in a format understood by the quant models factory (time bins, etc.)
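
One possible L2 preparation step is sketched below: placing hourly bars on a complete, regular hourly grid. This is an illustration under stated assumptions, not the pipeline's actual logic. It assumes the input has already been reduced to hourly bars keyed by bin start, and the gap policy (carrying the last close forward and flagging the row) is a choice made for this example only. The name `to_hourly_grid` is hypothetical.

```python
from datetime import datetime, timedelta, timezone


def to_hourly_grid(bars, start, end):
    """Place hourly bars on a regular grid from start (inclusive) to end (exclusive).

    Hours with no bar are filled by carrying the last close forward and
    marked with filled=True. This gap policy is an assumption made for
    the example. Hours before the first available bar are skipped.
    """
    rows = []
    last_close = None
    ts = start
    while ts < end:
        bar = bars.get(ts)
        if bar is not None:
            rows.append({"ts": ts, **bar, "filled": False})
            last_close = bar["close"]
        elif last_close is not None:
            rows.append({
                "ts": ts,
                "open": last_close, "high": last_close,
                "low": last_close, "close": last_close,
                "filled": True,
            })
        ts += timedelta(hours=1)
    return rows


# Made-up bars with a missing 01:00 bin, for illustration only.
t0 = datetime(2024, 1, 1, 0, 0, tzinfo=timezone.utc)
sample_bars = {
    t0: {"open": 100.0, "high": 103.5, "low": 100.0, "close": 101.0},
    t0 + timedelta(hours=2): {"open": 99.0, "high": 99.5, "low": 98.0, "close": 98.5},
}
rows = to_hourly_grid(sample_bars, t0, t0 + timedelta(hours=3))
```

Keeping an explicit `filled` flag lets downstream consumers decide whether to keep, drop or reweight synthetic rows rather than silently treating them as traded prices.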

