Workflow

KaizenFlow workflow explanation

This document is a roadmap of most activities that Quants, Quant devs, and DevOps can perform using KaizenFlow.

For each activity we point to the relevant resources (e.g., documents in docs, notebooks) in the repo.

A high-level description of KaizenFlow is in the KaizenFlow White Paper.

Work organization

Issues workflow explained
amp/docs/work_organization/ck.issue_workflow.explanation.md
GitHub and ZenHub workflows explained
/docs/work_organization/all.use_github_and_zenhub.how_to_guide.md
TODO(Grisha): add more from /docs/work_organization/.

Set-up

TODO(gp): Add pointers to the docs we ask to read during the on-boarding

Documentation_meta

The dir docs/documentation_meta contains documents about writing the documentation
Conventions and suggestions on how to create diagrams in the documentation
/docs/documentation_meta/all.architecture_diagrams.explanation.md
A summary of how to write how-to guides, tutorials, explanations, and references according to the Diataxis framework
/docs/documentation_meta/all.diataxis.explanation.md
Writing documentation in Google Docs
/docs/documentation_meta/all.gdocs.how_to_guide.md
Writing documentation in Markdown
/docs/documentation_meta/all.writing_docs.how_to_guide.md
Plotting in LaTeX
/docs/documentation_meta/plotting_in_latex.how_to_guide.md

Quant workflows

The life of a Quant is spent between:

Exploring the raw data
Computing features
Building models to predict output given features
Assessing models

These activities are mapped in KaizenFlow as follows:

Exploring the raw data
This is performed by reading data using DataPull in a notebook and performing exploratory analysis
Computing features
This is performed by reading data using DataPull in a notebook and creating some DataFlow nodes
Building models to predict output given features
This is performed by connecting DataFlow nodes into a Dag
Assessing models
This is performed by running data through a Dag in a notebook or in a Python script and post-processing the results in an analysis notebook
Comparing models
The parameters of a model are exposed through a Config, and models are compared by sweeping over a list of Configs
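
The mapping above can be sketched in plain Python. This is a minimal illustration only and does not use the KaizenFlow API: the names `build_dag`, `run_dag`, and `sweep`, the toy price series, and the plain dict standing in for a Config are all assumptions made for the example. It shows nodes connected into a graph, parameters exposed through a config, and a sweep over a list of configs.

```python
from graphlib import TopologicalSorter
from itertools import product


def build_dag(config):
    """Return (nodes, deps) for a toy three-node pipeline (hypothetical)."""
    window = config["window"]
    threshold = config["threshold"]

    def load_prices(_inputs):
        # Stand-in for reading data through DataPull (toy values).
        return [100.0, 101.0, 103.0, 102.0, 105.0, 107.0]

    def compute_feature(inputs):
        # Feature: price change over `window` steps.
        prices = inputs["load_prices"]
        return [prices[i] - prices[i - window] for i in range(window, len(prices))]

    def predict(inputs):
        # Model: signal 1 when the feature exceeds `threshold`, else 0.
        return [1 if x > threshold else 0 for x in inputs["compute_feature"]]

    nodes = {
        "load_prices": load_prices,
        "compute_feature": compute_feature,
        "predict": predict,
    }
    # Map each node to the set of nodes it depends on.
    deps = {
        "load_prices": set(),
        "compute_feature": {"load_prices"},
        "predict": {"compute_feature"},
    }
    return nodes, deps


def run_dag(nodes, deps):
    """Execute the nodes in topological order and return all outputs."""
    outputs = {}
    for name in TopologicalSorter(deps).static_order():
        inputs = {dep: outputs[dep] for dep in deps[name]}
        outputs[name] = nodes[name](inputs)
    return outputs


def sweep(param_grid):
    """Build one config per parameter combination and run the DAG for each."""
    keys = sorted(param_grid)
    results = []
    for values in product(*(param_grid[k] for k in keys)):
        config = dict(zip(keys, values))
        outputs = run_dag(*build_dag(config))
        results.append((config, outputs["predict"]))
    return results


results = sweep({"window": [1, 2], "threshold": [1.5]})
for config, signal in results:
    print(config, signal)
```

Keeping the parameters in the config rather than hard-coding them in the nodes is what makes the sweep a simple loop: the graph definition stays the same and only the config changes between runs.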

`DataPull`

Universe

Universe explanation
/docs/datapull/ck.universe.explanation.md
Analyze universe metadata
/im_v2/common/universe/notebooks/Master_universe_analysis.ipynb
/im_v2/ccxt/notebooks/Master_universe.ipynb

Dataset signature

Organize and label datasets
Helps to uniquely identify datasets across different sources, types, attributes etc.
/docs/datapull/all.data_schema.explanation.md
/docs/datapull/ck.handle_datasets.how_to_guide.md
Inspect RawData
/im_v2/common/notebooks/Master_raw_data_gallery.ipynb
/im_v2/common/data/client/im_raw_data_client.py
Convert data types
/im_v2/common/data/transform/convert_csv_to_pq.py
/im_v2/common/data/transform/convert_pq_to_csv.py
Data download pipelines explanation
/docs/datapull/ck.binance_bid_ask_data_pipeline.explanation.md
/docs/datapull/ck.binance_ohlcv_data_pipeline.explanation.md
Download data in bulk
/im_v2/common/data/extract/download_bulk.py
/im_v2/ccxt/data/extract/download_exchange_data_to_db.py
TODO(Juraj): technically this could be joined into one script and also generalized for more sources
Download data in real time over a given time interval
/im_v2/common/data/extract/periodic_download_exchange_data_to_db.py
Archive data
Optimizes storage performance and cost by moving older data from a database such as PostgreSQL to S3
Suitable for high-frequency, high-volume real-time order book data
/im_v2/ccxt/db/archive_db_data_to_s3.py
Resampling data
/docs/datapull/all.datapull_derived_data.explanation.md
/im_v2/common/data/transform/resample_daily_bid_ask_data.py
ImClient
/docs/datapull/all.im_client.reference.ipynb
MarketData
/docs/datapull/all.market_data.reference.ipynb
How to QA data
/docs/datapull/ck.datapull_data_quality_assurance.reference.md
/im_v2/ccxt/data/qa/notebooks/data_qa_bid_ask.ipynb
/im_v2/ccxt/data/qa/notebooks/data_qa_ohlcv.ipynb
/im_v2/common/data/qa/notebooks/cross_dataset_qa_ohlcv.ipynb
/im_v2/common/data/qa/notebooks/cross_dataset_qa_bid_ask.ipynb
/research_amp/cc/notebooks/Master_single_vendor_qa.ipynb
/research_amp/cc/notebooks/Master_cross_vendor_qa.ipynb
/research_amp/cc/notebooks/compare_qa.periodic.airflow.downloaded_websocket_EOD.all.bid_ask.futures.all.ccxt_cryptochassis.all.v1_0_0.ipynb
How to load Bloomberg data
/im_v2/common/notebooks/CmTask5424_market_data.ipynb
TODO: Generalize the name and make it Master_
Kibot guide
/docs/datapull/ck.kibot_data.explanation.md
/docs/datapull/ck.kibot_timing.reference.md
Interactive Brokers guide
/docs/datapull/ck.run_ib_connect.how_to_guide.md
/docs/datapull/ck.use_ib_metadata_crawler.how_to_guide.md
How to run IM app
/docs/datapull/ck.run_im_app.how_to_guide.md
TODO(gp): Reorg
/research_amp/cc/notebooks/Master_single_vendor_qa.ipynb
/research_amp/cc/notebooks/Master_model_performance_analyser.old.ipynb
/research_amp/cc/notebooks/Master_machine_learning.ipynb
/research_amp/cc/notebooks/Master_cross_vendor_qa.ipynb
/research_amp/cc/notebooks/Master_model_performance_analyser.ipynb
/research_amp/cc/notebooks/Master_crypto_analysis.ipynb
/research_amp/cc/notebooks/Master_model_prediction_analyzer.ipynb
/research_amp/cc/notebooks/Master_Analysis_CrossSectionalLearning.ipynb
/im/app/notebooks/Master_IM_DB.ipynb
/im/ib/metadata/extract/notebooks/Master_analyze_ib_metadata_crawler.ipynb
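
The "Archive data" item above describes moving older data out of a database and into S3. The selection step of such a job can be sketched as follows. This is a hedged illustration and not the logic of `archive_db_data_to_s3.py`: the function name `split_rows_for_archival`, the row layout, and the retention period are assumptions, and the database read and S3 write are deliberately left out.

```python
from datetime import datetime, timedelta, timezone


def split_rows_for_archival(rows, now, retention):
    """Partition rows into (to_archive, to_keep) by timestamp.

    Rows strictly older than `now - retention` are selected for archival.
    """
    cutoff = now - retention
    to_archive = [row for row in rows if row["timestamp"] < cutoff]
    to_keep = [row for row in rows if row["timestamp"] >= cutoff]
    return to_archive, to_keep


# Toy rows standing in for order book snapshots stored in the database.
now = datetime(2024, 1, 10, tzinfo=timezone.utc)
rows = [
    {"timestamp": now - timedelta(days=9), "bid": 100.0, "ask": 100.5},
    {"timestamp": now - timedelta(days=4), "bid": 101.0, "ask": 101.5},
    {"timestamp": now - timedelta(days=3), "bid": 102.0, "ask": 102.5},
    {"timestamp": now - timedelta(hours=1), "bid": 103.0, "ask": 103.5},
]
to_archive, to_keep = split_rows_for_archival(rows, now, retention=timedelta(days=3))
print(len(to_archive), "rows to archive,", len(to_keep), "rows to keep")
```

In a real job the archived rows would be written to the object store and verified before being deleted from the database, so that a failed upload never loses data.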

`DataFlow`

DAG

General concepts of DataFlow
Introduction to KaizenFlow, DAG nodes, DataFrame as unit of computation, DAG execution
- /docs/dataflow/all.computation_as_graphs.explanation.md
DataFlow data format
- /docs/dataflow/all.dataflow_data_format.explanation.md
Different views of System components, Architecture
- /docs/dataflow/all.dataflow.explanation.md
Conventions for representing time series
- /docs/dataflow/all.time_series.explanation.md
Explanation of how to debug a DAG
- /docs/dataflow/all.dag.explanation.md
Learn how to build a DAG
Build a DAG with two nodes
- /docs/dataflow/all.build_first_dag.tutorial.ipynb
Build a more complex DAG implementing a simple risk model
- /docs/dataflow/all.build_simple_risk_model_dag.tutorial.ipynb
Best practices to follow while building DAG
- /docs/dataflow/all.best_practice_for_building_dags.explanation.md
Learn how to run a DAG
Overview, DagBuilder, Dag, DagRunner
- /docs/dataflow/ck.run_batch_computation_dag.explanation.md
Configure a simple risk model, build a DAG, generate data and connect data source to the DAG, run the DAG
- /docs/dataflow/ck.run_batch_computation_dag.tutorial.ipynb
Build a DAG from a Mock2 DagBuilder and run it
- /docs/kaizenflow/all.run_Mock2_pipeline_in_notebook.how_to_guide.ipynb
General intro about model simulation
Property of tilability, batch vs streaming
- /docs/dataflow/all.batch_and_streaming_mode_using_tiling.explanation.md
Time semantics, How clock is handled, Flows
- /docs/dataflow/all.timing_semantic_and_clocks.md
Phases of evaluation of Dags
- /docs/dataflow/all.train_and_predict_phases.explanation.md
Event study explanation
- /docs/dataflow/ck.event_study.explanation.md
Run a simulation of a DataFlow system
Overview, Basic concepts, Implementation details
- /docs/dataflow/ck.run_backtest.explanation.md
How to build a system, run research backtesting, Process results of backtesting, How to run replayed time simulation, Running experiments
- /docs/dataflow/ck.run_backtest.how_to_guide.md
Simulation output explanation
- /docs/dataflow/all.simulation_output.reference.md
Run a simulation sweep using a list of Config parameters
/docs/dataflow/ck.run_backtest.how_to_guide.md
TODO(gp): @grisha do we have anything here? It's like the stuff that Dan does
TODO(Grisha): @Dan, add a link to the doc here once it is ready
Post-process the results of a simulation
Build the Config dict, Load tile results, Compute portfolio bar metrics, Compute aggregate portfolio stats
/dataflow/model/notebooks/Master_research_backtest_analyzer.ipynb
TODO(Grisha): is showcasing an example with fake data enough? We could use Mock2 output
Analyze a DataFlow model in detail
Build Config, Initialize ModelEvaluator and ModelPlotter
/dataflow/model/notebooks/Master_model_analyzer.ipynb
TODO(gp): @grisha what is the difference with the other?
TODO(Grisha): ask Paul about the notebook
Analyze features computed with DataFlow
Read features from a Parquet file and perform some analysis
- /dataflow/model/notebooks/Master_feature_analyzer.ipynb
TODO(gp): Grisha do we have a notebook that reads data from ImClient/MarketData and performs some analysis?
TODO(Grisha): create a tutorial notebook for analyzing features using some real (or close to real) data
Mix multiple DataFlow models
/dataflow/model/notebooks/Master_model_mixer.ipynb
TODO(gp): add more comments
Exporting PnL and trades
/dataflow/model/notebooks/Master_save_pnl_and_trades.ipynb
/docs/dataflow/ck.export_alpha_data.explanation.md
/docs/dataflow/ck.export_alpha_data.how_to_guide.md
/docs/dataflow/ck.load_alpha_and_trades.tutorial.ipynb
/docs/dataflow/ck.load_alpha_and_trades.tutorial.py
TODO(gp): add more comments
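
The list above mentions the property of tilability and batch vs streaming execution. As a rough illustration of the general idea (not KaizenFlow's implementation), the sketch below computes a trailing mean once over the whole series and once tile by tile, where each tile is extended backwards with enough history to fill the lookback window. The function names and the toy series are assumptions made for this example.

```python
def rolling_mean(values, window):
    """Return the trailing mean for every position that has a full window."""
    return [
        sum(values[i - window + 1 : i + 1]) / window
        for i in range(window - 1, len(values))
    ]


def rolling_mean_tiled(values, window, tile_size):
    """Compute the same trailing mean one tile at a time.

    Each tile is extended backwards by `window - 1` points of history so
    that its first output already has a full window available.
    """
    first = window - 1  # index of the first position with a full window
    out = []
    for start in range(first, len(values), tile_size):
        end = min(start + tile_size, len(values))
        chunk = values[start - (window - 1) : end]
        out.extend(rolling_mean(chunk, window))
    return out


series = [1.0, 2.0, 4.0, 7.0, 11.0, 16.0, 22.0, 29.0, 37.0, 46.0]
batch = rolling_mean(series, window=3)
tiled = rolling_mean_tiled(series, window=3, tile_size=4)
print(batch == tiled)
```

When a computation only needs a bounded amount of history, as here, the tiled and the single-batch runs produce the same values, which is what allows one graph to be run over the whole dataset or in smaller pieces.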

System

Learn how to build a System
TODO(gp): @grisha what do we have for this?
TODO(Grisha): add a tutorial notebook that builds a System and explain the flow step-by-step
Configure a full system using a Config
Fill the SystemConfig, build all the components and run the System
/docs/dataflow/system/all.use_system_config.tutorial.ipynb
Create an ETL batch process using a System
/dataflow_amp/system/risk_model_estimation/run_rme_historical_simulation.py
TODO(Grisha): add an explanation doc and consider converting into a Jupyter notebook.
Create an ETL real-time process
DagBuilder, Dag, DagRunner
- /docs/dataflow/system/ck.build_real_time_dag.explanation.md
Build a DAG that runs in real time
- /dataflow_amp/system/realtime_etl_data_observer/scripts/run_realtime_etl_data_observer.py
- TODO(Grisha): consider converting into a Jupyter notebook.
Build a System that runs in real time
- /dataflow_amp/system/realtime_etl_data_observer/scripts/DataObserver_template.run_data_observer_simulation.py
- TODO(Grisha): consider converting into a Jupyter notebook.
Run a batch simulation of a Mock2 System
Description of the forecast system, Description of the System, Run a backtest, Explanation of the backtesting script, Analyze the results
/docs/kaizenflow/all.run_Mock2_in_batch_mode.how_to_guide.md
Build the config, Load tiled results, Compute portfolio bar metrics, Compute aggregate portfolio stats
/docs/kaizenflow/all.analyze_Mock2_pipeline_simulation.how_to_guide.ipynb
Run an end-to-end timed simulation of Mock2 System
/docs/kaizenflow/all.run_end_to_end_Mock2_system.tutorial.md
/dataflow_amp/system/mock2/scripts/run_end_to_end_Mock2_system.py
TODO(gp): reorg the following files
/oms/notebooks/Master_PnL_real_time_observer.ipynb
/oms/notebooks/Master_bid_ask_execution_analysis.ipynb
/oms/notebooks/Master_broker_debugging.ipynb
/oms/notebooks/Master_broker_portfolio_reconciliation.ipynb
/oms/notebooks/Master_c1b_portfolio_vs_portfolio_reconciliation.ipynb
/oms/notebooks/Master_dagger_reconciliation.ipynb
/oms/notebooks/Master_execution_analysis.ipynb
/oms/notebooks/Master_model_qualifier.ipynb
/oms/notebooks/Master_multiday_system_reconciliation.ipynb
/oms/notebooks/Master_portfolio_vs_portfolio_reconciliation.ipynb
/oms/notebooks/Master_portfolio_vs_research_stats.ipynb
/oms/notebooks/Master_system_reconciliation_fast.ipynb
/oms/notebooks/Master_system_reconciliation_slow.ipynb
/oms/notebooks/Master_system_run_debugger.ipynb

Quant dev workflows

DataPull

Learn how to create a DataPull adapter for a new data source
/docs/datapull/all.dataset_onboarding_checklist.reference.md
/docs/datapull/ck.add_new_data_source.how_to_guide.md
How to update CCXT version
/docs/datapull/all.update_CCXT_version.how_to_guide.md
Download DataPull historical data
?
Onboard new exchange
/docs/datapull/ck.onboarding_new_exchange.md
Put a DataPull source in production with Airflow
/docs/datapull/ck.create_airflow_dag.tutorial.md
- TODO(gp): This file is missing
/docs/datapull/ck.develop_an_airflow_dag_for_production.explanation.md
- TODO(Juraj): See https://github.com/cryptokaizen/cmamp/issues/6444
Add QA for a DataPull source
Compare OHLCV bars
/im_v2/ccxt/data/client/notebooks/CmTask6537_One_off_comparison_of_Parquet_and_DB_OHLCV_data.ipynb
TODO(Grisha): review and generalize
How to import Bloomberg historical data
/docs/datapull/ck.process_historical_data_without_dataflow.tutorial.ipynb
How to import Bloomberg real-time data
TODO(*): add doc.
TODO(gp): Add docs
/docs/datapull/ck.binance_trades_data_pipeline.explanation.md
/docs/datapull/ck.database_schema_update.how_to_guide.md
/docs/datapull/ck.datapull.explanation.md
/docs/datapull/ck.relational_database.explanation.md

DataFlow

All software components
/docs/dataflow/ck.data_pipeline_architecture.reference.md

TradingOps workflows

Trading execution

MLOps workflows

Deploying

Model deployment in production
/docs/deploying/all.model_deployment.how_to_guide.md
Run production system
/docs/deploying/ck.run_production_system.how_to_guide.md
Model references
/docs/deploying/ck.supported_models.reference.md

Monitoring

Monitor system
/docs/monitoring/ck.monitor_system.how_to_guide.md
System reconciliation explanation
/docs/monitoring/ck.system_reconciliation.explanation.md
System reconciliation how-to guide
/docs/monitoring/ck.system_reconciliation.how_to_guide.md

DevOps workflows

The documentation outlines the architecture and deployment processes for the Kaizen Infrastructure, leveraging a blend of AWS services, Kubernetes for container orchestration, and traditional EC2 for virtualized computing. Emphasizing Infrastructure as Code (IaC), the project employs Terraform for provisioning and Ansible for configuration, ensuring a maintainable and replicable environment.

Overview

Development and deployment stages
/docs/infra/ck.development_stages.explanation.md
S3 Buckets overview
/docs/infra/ck.s3_buckets.explanation.md
This document provides an overview of the S3 buckets utilized by Kaizen Technologies.

Current set-up description

Detailed steps for setting up the Kaizen infrastructure
/docs/infra/ck.kaizen_infrastructure.reference.md
EC2 servers overview
/docs/infra/ck.ec2_servers.explanation.md

Set up infra

Implementation of Auto Scaling in the Kubernetes set-up, focusing on the Cluster Autoscaler (CA), Horizontal Pod Autoscaler (HPA), and Auto Scaling Groups (ASG)
/docs/infra/all.auto_scaling.explanation.md
Compare AWS RDS instance types and storage performance
/docs/infra/all.rds.comparison.md
Set up S3 buckets with Terraform
/docs/infra/ck.set_up_s3_buckets.how_to_guide.md
AWS API Key rotation guide
/docs/infra/ck.aws_api_key_rotation.how_to_guide.md
Amazon Elastic File System (EFS) overview
/docs/infra/ck.aws_elastic_file_system.explanation.md
Client VPN endpoint creation with Terraform
/docs/infra/ck.create_client_vpn_endpoint.how_to_guide.md
Set up AWS Client VPN
/docs/infra/ck.set_up_aws_client_vpn.how_to_guide.md
Utility server application set-up overview
/docs/infra/ck.set_up_utility_server_app.how_to_guide.md
Storing secret information (API keys, login credentials, access tokens etc.)
/docs/infra/ck.storing_secrets.explanation.md
