Code Organization

Code organization of amp

Conventions

  • In this document we use the following conventions:
    • Comments: """foobar is ..."""
    • Dirs and subdirs: /foobar
    • Files: foobar.py
    • Objects: FooBar
    • Markdown files: foobar.md

  • The directories, subdirectories, and objects are listed in order of their dependencies (from innermost to outermost)

  • When a dir in one repo has the same role as a dir in an included repo, we add a suffix derived from the repo name to make the two unique
    • E.g., the dataflow dir in lemonade is called dataflow_lem

  • We assume that there is no filename repeated across different repos
    • This holds for notebooks, tests, and Python files
    • To disambiguate, we add a suffix to make the filename unique (e.g., _lem)

  • Since the code is split across different repos for access-protection reasons, we assume that, if the repos were merged into a single one, the corresponding dirs could be collapsed (e.g., //amp/dataflow and //lime/dataflow_lem) without violating the dependencies
    • TODO(gp): Not sure about this

  • E.g.,
    • We want to build a HistoricalDataSource (from //amp/dataflow/system/source_nodes.py) containing an IgReplayedTimeMarketDataInterface (from //lime/market_data_lime/eg_market_data.py)
    • The object could be called IgHistoricalDataSource, since it's a specialization of a HistoricalDataSource using IG data
    • The file:
      • Can't go in //lime/market_data, since dataflow_lime depends on market_data
      • Needs to go in //lime/dataflow_lime/system
      • Can be called eg_historical_data_source.py
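
For concreteness, a minimal sketch of that specialization. The constructor signatures below are assumptions for illustration, not the actual interfaces in source_nodes.py or eg_market_data.py:

```python
# Hypothetical sketch: the constructor signatures are assumptions, not the
# actual interfaces in the repo.
from dataflow.system.source_nodes import HistoricalDataSource
from market_data_lime.eg_market_data import IgReplayedTimeMarketDataInterface


class IgHistoricalDataSource(HistoricalDataSource):
    """A `HistoricalDataSource` specialized to IG data."""

    def __init__(self, nid: str) -> None:
        # Build the IG-specific market data interface and inject it into the
        # generic historical source node.
        market_data = IgReplayedTimeMarketDataInterface()
        super().__init__(nid, market_data)
```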

Finding deps

Using invoke find_dependency

> i find_dependency --module-name "amp.dataflow.model" --mode "find_lev2_deps" --ignore-helpers --only-module dataflow

Using grep

  • To check for dependencies between one module (e.g., dataflow/model) and another (e.g., dataflow/system):

    ```bash
    (cd dataflow/model/; jackpy "import ") | grep -v notebooks | grep -v test | grep -v init | grep "import dataflow.system" | sort
    ```

Using Pydeps

  • Install:

    ```bash
    pip install pydeps
    pip install dot
    ```

  • Test on a small part of the repo:

    ```bash
    pydeps . --only helpers -v --show-dot -o deps.dot
    ```

  • Run on helpers:

    ```bash
    pydeps --only helpers -x helpers.test -x helpers.old -x helpers.telegram_notify -vv --show-dot -o deps.html --max-bacon 2 --reverse
    ```

Component dirs

Useful utilities are:

> dev_scripts/vi_all_py.sh im

```bash
> tree.sh -d 1 -p im_v2/ccxt/data
im_v2/ccxt/data/
|-- client/
|-- extract/
|-- qa/
`-- __init__.py

4 directories, 1 file
```

helpers/

  • helpers/
    • """Low-level helpers that are general and not specific to this project"""

core/

  • core/
    • """Low-level helpers that are specific to this project"""
    • /config
      • Config
        • """A dict-like object used to configure workflows"""
        • See the usage sketch after this list
    • event_study/
    • artificial_signal_generators.py
    • features.py
    • finance.py
    • signal_processing.py
    • statistics.py
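
A hypothetical usage sketch of the dict-like Config above; the key names and the add_subconfig helper are assumptions, not the actual core/config API:

```python
# Hypothetical sketch of a dict-like config object; `add_subconfig` and the
# keys below are assumptions, not the actual `core/config` API.
from typing import Any, Dict


class Config(Dict[str, Any]):
    """A dict that can also hold nested sub-configs."""

    def add_subconfig(self, key: str) -> "Config":
        subconfig = Config()
        self[key] = subconfig
        return subconfig


config = Config()
config["universe"] = "ccxt_v7"
resample_config = config.add_subconfig("resample")
resample_config["rule"] = "5T"
print(config)
```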

devops/

  • devops/
    • TODO(gp): reorg

dev_scripts/

  • /dev_scripts
    • TODO(gp): reorg

im/

  • sorrentum_sandbox/
    • common/
      • validate.py: implement QaCheck and DatasetValidator
    • TODO(gp): Move this to im_v2
  • im/
    • TODO(gp): Merge with im_v2
  • im_v2/
    • TODO(gp): Rename im_v2 to datapull
    • airflow/
      • """Airflow DAGs for downloading, archiving, QA, reconciliation"""
      • airflow_utils/
    • aws/
      • """Deploy Airflow DAG definitions"""
      • aws_update_task_definition.py
    • binance/
      • """ETL for Binance data"""
    • bloomberg/
      • """ETL for Bloomberg data"""
    • ccxt/
      • """ETL for CCXT"""
      • data/
        • client/
        • extract/
          • extractor.py
        • qa/
      • db/
      • universe/
      • utils.py
    • common/
      • data/
        • client/
          • """Objects to read data from the ETL pipeline"""
        • extract/
          • """Scripts and utils to perform various extract functions"""
          • extract_utils.py: library functions
        • qa/
          • """Utils to perform QA checks"""
        • transform/
          • """Scripts and utils to transform data"""
          • E.g., convert PQ to CSV, resample data
          • transform_utils.py
      • data_snapshot/
      • db/
        • db_utils.py: manage the IM Postgres DB
      • extract/
        • extract_utils.py
      • universe/
        • full_symbol.py: implement the full symbol (e.g., binance:BTC_USDT); see the sketch after this list
        • universe.py: get universe versions and components
        • universe_utils.py
    • crypto_chassis/
      • """ETL for CryptoChassis"""
    • devops/
    • ig/
    • kibot/
    • mock1/
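
The full-symbol convention above can be sketched as follows; the function below is a hypothetical stand-in for the real helpers in im_v2/common/universe/full_symbol.py:

```python
# Hypothetical stand-in for the helpers in
# im_v2/common/universe/full_symbol.py.
from typing import Tuple


def parse_full_symbol(full_symbol: str) -> Tuple[str, str]:
    """Split a full symbol like `binance:BTC_USDT` into exchange and symbol."""
    exchange_id, symbol = full_symbol.split(":")
    return exchange_id, symbol


assert parse_full_symbol("binance:BTC_USDT") == ("binance", "BTC_USDT")
```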

market_data/

  • market_data/
    • """Interface to read price data"""
    • MarketData
    • ImClientMarketData
    • RealTimeMarketData
    • ReplayedMarketData
    • TODO(gp): Move market_data to datapull/market_data
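
An illustrative sketch of the hierarchy above; the method name and signatures are assumptions, not the actual market_data interface:

```python
# Illustrative only: the method names / signatures are assumptions, not the
# actual `market_data` interface.
import abc

import pandas as pd


class MarketData(abc.ABC):
    """Abstract interface to read price data."""

    @abc.abstractmethod
    def get_data_for_interval(
        self, start_ts: pd.Timestamp, end_ts: pd.Timestamp
    ) -> pd.DataFrame:
        ...


class ReplayedMarketData(MarketData):
    """Replay previously saved data as if it were arriving in real time."""

    def __init__(self, df: pd.DataFrame) -> None:
        self._df = df

    def get_data_for_interval(
        self, start_ts: pd.Timestamp, end_ts: pd.Timestamp
    ) -> pd.DataFrame:
        mask = (self._df.index >= start_ts) & (self._df.index < end_ts)
        return self._df[mask]
```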

dataflow/

  • dataflow/
    • """DataFlow module"""
    • core/
      • nodes/
        • """Implementation of DataFlow nodes that don't depend on anything outside of this directory"""
        • base.py
          • FitPredictNode (see the sketch after this list)
          • DataSource
        • sources.py
          • FunctionDataSource
          • DfDataSource
          • ArmaDataSource
        • sinks.py
          • WriteCols
          • WriteDf
        • transformers.py
        • volatility_models.py
        • sklearn_models.py
        • unsupervised_sklearn_models.py
        • supervised_sklearn_models.py
        • regression_models.py
        • sarimax_models.py
        • gluonts_models.py
        • local_level_models.py
      • Dag
      • DagBuilders
      • DagRunners
      • ResultBundle
    • pipelines/
      • """DataFlow pipelines that use only core nodes"""
      • event_study/
      • features/
        • """General feature pipelines"""
      • price/
        • """Pipelines computing prices"""
      • real_times/
        • TODO(gp): -> dataflow/system
      • returns/
        • """Pipelines computing returns"""
      • dataflow_example.py
        • NaivePipeline
    • system/
      • """DataFlow pipelines with anything that depends on code outside of DataFlow"""
      • source_nodes.py
        • DataSource
        • HistoricalDataSource
        • RealTimeDataSource
      • sink_nodes.py
        • ProcessForecasts
      • RealTimeDagRunner
    • model/
      • """Code for evaluating a DataFlow model"""

oms/

  • oms/
    • """Order management system"""
    • architecture.md
    • Broker
    • Order
    • OrderProcessor
    • Portfolio
    • ForecastProcessor
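
A hypothetical sketch of how the OMS objects above relate; all fields and methods are assumptions, not the actual oms interfaces:

```python
# Hypothetical sketch of the OMS objects' relationships; all fields and
# methods are assumptions, not the actual `oms` interfaces.
import dataclasses
from typing import Dict, List


@dataclasses.dataclass
class Order:
    asset_id: int
    num_shares: float


class Broker:
    """Submit orders to a market (real or simulated)."""

    def submit_orders(self, orders: List[Order]) -> None:
        for order in orders:
            print(f"Submitting {order}")


class Portfolio:
    """Track current holdings, updated from executed orders."""

    def __init__(self, broker: Broker) -> None:
        self._broker = broker
        self._holdings: Dict[int, float] = {}

    def execute(self, orders: List[Order]) -> None:
        self._broker.submit_orders(orders)
        for order in orders:
            self._holdings[order.asset_id] = (
                self._holdings.get(order.asset_id, 0.0) + order.num_shares
            )
```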

optimizer/

  • /optimizer

research_amp/

  • /research_amp

dataflow dependencies

  • dataflow/core
    • Should not depend on anything else in dataflow
  • dataflow/pipelines
    • -> core, since it needs the nodes
  • dataflow/model
    • -> core
  • dataflow/backtest
    • """Contains all the code to run a backtest"""
    • -> core
    • -> model
  • dataflow/system
    • -> core
    • -> backtest
    • -> model
    • -> pipelines

  • TODO(gp): Move backtest up
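
A hypothetical script to spot violations of these rules, run from the repo root; the directory-to-forbidden-imports map is distilled from the list above:

```python
# Hypothetical checker for the dependency rules above; run from the repo root.
import pathlib
import re

# Map each dir to the `dataflow` modules it must not import (an assumption
# distilled from the rules above).
FORBIDDEN = {
    "dataflow/core": [
        "dataflow.pipelines",
        "dataflow.model",
        "dataflow.backtest",
        "dataflow.system",
    ],
    "dataflow/model": [
        "dataflow.pipelines",
        "dataflow.backtest",
        "dataflow.system",
    ],
}

for dir_, banned_mods in FORBIDDEN.items():
    for path in pathlib.Path(dir_).rglob("*.py"):
        text = path.read_text()
        for mod in banned_mods:
            if re.search(
                rf"^\s*(import|from)\s+{re.escape(mod)}\b", text, re.MULTILINE
            ):
                print(f"{path}: imports {mod}")
```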

Top level dirs

```bash
(cd amp; tree -L 1 -d --charset=ascii -I "*test*|*notebooks*" 2>&1 | tee /tmp/tmp)
.
|-- core
|-- dataflow
|-- helpers
|-- im
|-- im_v2
|-- infra
|-- market_data
|-- oms
|-- optimizer
`-- research_amp
```

Invariants

  • We assume that there is no file with the same name, either in the same repo or across different repos
  • In case of a name collision, we prepend as many dirs as needed to make the filename unique
    • E.g., the files below should be renamed:

      ```bash
      > ffind.py utils.py | grep -v test
      ./amp/core/config/utils.py -> amp/core/config/config_utils.py
      ./amp/dataflow/core/utils.py -> amp/dataflow/core/core_utils.py
      ./amp/dataflow/model/utils.py -> amp/dataflow/model/model_utils.py
      ```
  • Note that this rule makes the naming of files depend on the history of the code, but it minimizes the churn of names
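
A hypothetical plain-Python stand-in for the ffind.py-based check above, listing Python files whose basename is not unique:

```python
# Hypothetical stand-in for the `ffind.py`-based check above: list Python
# files whose basename is not unique in the tree; run from the repo root.
import collections
import pathlib

files_by_name = collections.defaultdict(list)
for path in pathlib.Path(".").rglob("*.py"):
    if path.name != "__init__.py" and "test" not in str(path):
        files_by_name[path.name].append(str(path))

for name, paths in sorted(files_by_name.items()):
    if len(paths) > 1:
        print(name, "->", paths)
```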

Misc

  • To execute a command from vim, put the cursor on the line with the command and run:

    ```bash
    :exec '!'.getline('.')
    :read /tmp/tmp
    ```

  • To inline the output of a command in vim:

    ```bash
    !(cd amp; tree -v --charset=ascii -I "*test*|*notebooks*" market_data 2>&1 | tee /tmp/tmp)
    :read /tmp/tmp
    ```

  • Print only dirs:

    ```bash
    tree -d
    ```

  • Print only dirs up to a certain level:

    ```bash
    tree -L 1 -d
    ```

  • Sort alphanumerically:

    ```bash
    tree -v
    ```

  • Print the full name so that one can also grep:

    ```bash
    tree -v -f --charset=ascii -I "test|notebooks" | grep amp | grep -v dev_scripts

    |-- dataflow/system
    |   |-- dataflow/system/__init__.py
    |   |-- dataflow/system/real_time_dag_adapter.py
    |   |-- dataflow/system/real_time_dag_runner.py
    |   |-- dataflow/system/research_dag_adapter.py
    |   |-- dataflow/system/sink_nodes.py
    |   `-- dataflow/system/source_nodes.py
    ```