Workflow
KaizenFlow workflow explanation
This document is a roadmap of most activities that Quants, Quant devs, and
DevOps can perform using KaizenFlow.
For each activity we point to the relevant resources (e.g., documents in docs,
notebooks) in the repo.
A high-level description of KaizenFlow is available in the KaizenFlow White Paper.
Work organization
- Issues workflow explained
amp/docs/work_organization/ck.issue_workflow.explanation.md
- GitHub and ZenHub workflows explained
/docs/work_organization/all.use_github_and_zenhub.how_to_guide.md
- TODO(Grisha): add more from /docs/work_organization/.
Set-up
- TODO(gp): Add pointers to the docs we ask to read during the on-boarding
Documentation_meta
- The dir docs/documentation_meta contains documents about writing the documentation
- Conventions and suggestions on how to create diagrams in the documentation
- /docs/documentation_meta/all.architecture_diagrams.explanation.md
- A summary of how to create how-to guides, tutorials, explanations, and references according to the Diataxis framework
- Writing documentation in Google Docs
- Writing documentation in Markdown
- Plotting in LaTeX
- /docs/documentation_meta/plotting_in_latex.how_to_guide.md
Quant workflows
The life of a Quant is spent between:
- Exploring the raw data
- Computing features
- Building models to predict output given features
- Assessing models
These activities are mapped in KaizenFlow as follows:
- Exploring the raw data
- This is performed by reading data using DataPull in a notebook and performing exploratory analysis
- Computing features
- This is performed by reading data using DataPull in a notebook and creating some DataFlow nodes
- Building models to predict output given features
- This is performed by connecting DataFlow nodes into a Dag
- Assessing models
- This is performed by running data through a Dag in a notebook or in a Python script and post-processing the results in an analysis notebook
- Comparing models
- The parameters of a model are exposed through a Config, and a sweep is then run over lists of Config values
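The mapping above can be illustrated with a toy sketch. It does not use the KaizenFlow API: read_prices, compute_returns, moving_average, build_dag, and run_dag are hypothetical stand-ins, a plain list of floats replaces the DataFrame, and the DAG is reduced to a linear chain of nodes. It only shows the shape of the workflow: read data, compute features in nodes, connect the nodes, run them, and sweep over a list of configs.

```python
from typing import Callable, Dict, List

# Hypothetical stand-ins for illustration only; not the KaizenFlow API.
Series = List[float]
Node = Callable[[Series], Series]


def read_prices() -> Series:
    """Stand-in for reading raw data (the role DataPull plays)."""
    return [100.0, 101.0, 99.0, 102.0, 104.0, 103.0]


def compute_returns(prices: Series) -> Series:
    """Feature node: simple returns between consecutive prices."""
    return [curr / prev - 1.0 for prev, curr in zip(prices, prices[1:])]


def moving_average(window: int) -> Node:
    """Build a node that smooths its input with a trailing moving average."""

    def node(values: Series) -> Series:
        return [
            sum(values[i - window + 1 : i + 1]) / window
            for i in range(window - 1, len(values))
        ]

    return node


def build_dag(config: Dict[str, int]) -> List[Node]:
    """Connect nodes into a linear chain, parameterized by a config."""
    return [compute_returns, moving_average(config["window"])]


def run_dag(nodes: List[Node], data: Series) -> Series:
    """Run the data through each node in order."""
    for node in nodes:
        data = node(data)
    return data


# Sweep over a list of configs and collect one result per config.
configs = [{"window": 2}, {"window": 3}]
results = {
    config["window"]: run_dag(build_dag(config), read_prices())
    for config in configs
}
for window, signal in results.items():
    print(window, len(signal))
```

Keeping the model parameters in a config dict, rather than hard-coding them in the nodes, is what makes the sweep a one-line loop: each config produces its own chain of nodes and its own result.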
DataPull
- General intro to DataPull
- /docs/datapull/ck.datapull.explanation.md
- /docs/datapull/all.datapull_qa_flow.explanation.md
- /docs/datapull/all.datapull_client_stack.explanation.md
- /docs/datapull/all.datapull_sandbox.explanation.md
- /docs/datapull/ck.ccxt_exchange_timestamp_interpretation.reference.md
Universe
- Universe explanation
- Analyze universe metadata
- /im_v2/common/universe/notebooks/Master_universe_analysis.ipynb
- /im_v2/ccxt/notebooks/Master_universe.ipynb
Dataset signature
- Organize and label datasets
- Helps to uniquely identify datasets across different sources, types, attributes etc.
- /docs/datapull/all.data_schema.explanation.md
- Inspect RawData
- /im_v2/common/notebooks/Master_raw_data_gallery.ipynb
- Convert data types
- /im_v2/common/data/transform/convert_csv_to_pq.py
- Data download pipelines explanation
- /docs/datapull/ck.binance_bid_ask_data_pipeline.explanation.md
- /docs/datapull/ck.binance_ohlcv_data_pipeline.explanation.md
- Download data in bulk
- /im_v2/common/data/extract/download_bulk.py
- /im_v2/ccxt/data/extract/download_exchange_data_to_db.py
- TODO(Juraj): technically this could be joined into one script and also generalized for more sources
- Download data in real time over a given time interval
- /im_v2/common/data/extract/periodic_download_exchange_data_to_db.py
- Archive data
- Helps optimize data storage performance and costs by transferring older data from a store such as Postgres to S3
- Suitable for high-frequency, high-volume real-time order book data
- Resampling data
- /docs/datapull/all.datapull_derived_data.explanation.md
- ImClient
- MarketData
- How to QA data
- /docs/datapull/ck.datapull_data_quality_assurance.reference.md
- /im_v2/ccxt/data/qa/notebooks/data_qa_bid_ask.ipynb
- /im_v2/ccxt/data/qa/notebooks/data_qa_ohlcv.ipynb
- /im_v2/common/data/qa/notebooks/cross_dataset_qa_ohlcv.ipynb
- /im_v2/common/data/qa/notebooks/cross_dataset_qa_bid_ask.ipynb
- /research_amp/cc/notebooks/Master_single_vendor_qa.ipynb
- /research_amp/cc/notebooks/Master_cross_vendor_qa.ipynb
- How to load Bloomberg data
- /im_v2/common/notebooks/CmTask5424_market_data.ipynb
- TODO: Generalize the name and make it Master_
- Kibot guide
- /docs/datapull/ck.kibot_data.explanation.md
- Interactive broker guide
- /docs/datapull/ck.run_ib_connect.how_to_guide.md
- How to run IM app
- /docs/datapull/ck.run_im_app.how_to_guide.md
- TODO(gp): Reorg
- /research_amp/cc/notebooks/Master_single_vendor_qa.ipynb
- /research_amp/cc/notebooks/Master_model_performance_analyser.old.ipynb
- /research_amp/cc/notebooks/Master_machine_learning.ipynb
- /research_amp/cc/notebooks/Master_cross_vendor_qa.ipynb
- /research_amp/cc/notebooks/Master_model_performance_analyser.ipynb
- /research_amp/cc/notebooks/Master_crypto_analysis.ipynb
- /research_amp/cc/notebooks/Master_model_prediction_analyzer.ipynb
- /research_amp/cc/notebooks/Master_Analysis_CrossSectionalLearning.ipynb
- /im/app/notebooks/Master_IM_DB.ipynb
- /im/ib/metadata/extract/notebooks/Master_analyze_ib_metadata_crawler.ipynb
DataFlow
Meta
- Best practices for Quant research
- /docs/dataflow/ck.research_methodology.explanation.md
- TODO(Grisha): ck.* -> all.*?
- A list of all the available generic notebooks, each with a short description
- /docs/dataflow/ck.master_notebooks.reference.md
- TODO(Grisha): does this belong to DataFlow?
- TODO(Grisha): ck.master_notebooks... -> all.master_notebooks?
DAG
- General concepts of DataFlow
- Introduction to KaizenFlow, DAG nodes, DataFrame as unit of computation, DAG execution
- DataFlow data format
- Different views of System components, Architecture
- Conventions for representing time series
- Explanation of how to debug a DAG
- Learn how to build a DAG
- Build a DAG with two nodes
- Build a more complex DAG implementing a simple risk model
- Best practices to follow while building a DAG
- Learn how to run a DAG
- Overview, DagBuilder, Dag, DagRunner
- Configure a simple risk model, build a DAG, generate data and connect data source to the DAG, run the DAG
- Build a DAG from a Mock2 DagBuilder and run it
- General intro about model simulation
- Property of tilability, batch vs streaming
- Time semantics, How clock is handled, Flows
- Phases of evaluation of Dags
- Event study explanation
- Run a simulation of a DataFlow system
- Overview, Basic concepts, Implementation details
- How to build a system, run research backtesting, Process results of backtesting, How to run replayed time simulation, Running experiments
- Simulation output explanation
- Run a simulation sweep using a list of Config parameters
- /docs/dataflow/ck.run_backtest.how_to_guide.md
- TODO(gp): @grisha do we have anything here? It's like the stuff that Dan does
- TODO(Grisha): @Dan, add a link to the doc here once it is ready
- Post-process the results of a simulation
- Build the Config dict, Load tile results, Compute portfolio bar metrics, Compute aggregate portfolio stats
- /dataflow/model/notebooks/Master_research_backtest_analyzer.ipynb
- TODO(Grisha): is showcasing an example with fake data enough? We could use Mock2 output
- Analyze a DataFlow model in detail
- Build Config, Initialize ModelEvaluator and ModelPlotter
- /dataflow/model/notebooks/Master_model_analyzer.ipynb
- TODO(gp): @grisha what is the difference with the other?
- TODO(Grisha): ask Paul about the notebook
- Analyze features computed with DataFlow
- Read features from a Parquet file and perform some analysis
- TODO(gp): Grisha do we have a notebook that reads data from ImClient/MarketData and performs some analysis?
- TODO(Grisha): create a tutorial notebook for analyzing features using some real (or close to real) data
- Mix multiple DataFlow models
- /dataflow/model/notebooks/Master_model_mixer.ipynb
- TODO(gp): add more comments
- Exporting PnL and trades
- /dataflow/model/notebooks/Master_save_pnl_and_trades.ipynb
- /docs/dataflow/ck.export_alpha_data.explanation.md
- /docs/dataflow/ck.export_alpha_data.how_to_guide.md
- /docs/dataflow/ck.load_alpha_and_trades.tutorial.ipynb
- /docs/dataflow/ck.load_alpha_and_trades.tutorial.py
- TODO(gp): add more comments
System
- Learn how to build a System
- TODO(gp): @grisha what do we have for this?
- TODO(Grisha): add a tutorial notebook that builds a System and explains the flow step-by-step
- Configure a full system using a Config
- Fill the SystemConfig, build all the components and run the System
- Create an ETL batch process using a System
- /dataflow_amp/system/risk_model_estimation/run_rme_historical_simulation.py
- TODO(Grisha): add an explanation doc and consider converting into a Jupyter notebook.
- Create an ETL real-time process
- DagBuilder, Dag, DagRunner
- Build a DAG that runs in real time
- /dataflow_amp/system/realtime_etl_data_observer/scripts/run_realtime_etl_data_observer.py
- TODO(Grisha): consider converting into a Jupyter notebook.
- Build a System that runs in real time
- /dataflow_amp/system/realtime_etl_data_observer/scripts/DataObserver_template.run_data_observer_simulation.py
- TODO(Grisha): consider converting into a Jupyter notebook.
- Batch simulation of a Mock2 System
- Description of the forecast system, Description of the System, Run a backtest, Explanation of the backtesting script, Analyze the results
- /docs/kaizenflow/all.run_Mock2_in_batch_mode.how_to_guide.md
- Build the config, Load tiled results, Compute portfolio bar metrics, Compute aggregate portfolio stats
- /docs/kaizenflow/all.analyze_Mock2_pipeline_simulation.how_to_guide.ipynb
- Run an end-to-end timed simulation of a Mock2 System
- /docs/kaizenflow/all.run_end_to_end_Mock2_system.tutorial.md
- /dataflow_amp/system/mock2/scripts/run_end_to_end_Mock2_system.py
- TODO(gp): reorg the following files
- /oms/notebooks/Master_PnL_real_time_observer.ipynb
- /oms/notebooks/Master_bid_ask_execution_analysis.ipynb
- /oms/notebooks/Master_broker_debugging.ipynb
- /oms/notebooks/Master_broker_portfolio_reconciliation.ipynb
- /oms/notebooks/Master_c1b_portfolio_vs_portfolio_reconciliation.ipynb
- /oms/notebooks/Master_dagger_reconciliation.ipynb
- /oms/notebooks/Master_execution_analysis.ipynb
- /oms/notebooks/Master_model_qualifier.ipynb
- /oms/notebooks/Master_multiday_system_reconciliation.ipynb
- /oms/notebooks/Master_portfolio_vs_portfolio_reconciliation.ipynb
- /oms/notebooks/Master_portfolio_vs_research_stats.ipynb
- /oms/notebooks/Master_system_reconciliation_fast.ipynb
- /oms/notebooks/Master_system_reconciliation_slow.ipynb
- /oms/notebooks/Master_system_run_debugger.ipynb
Quant dev workflows
DataPull
- Learn how to create a DataPull adapter for a new data source
- /docs/datapull/all.dataset_onboarding_checklist.reference.md
- How to update CCXT version
- Download DataPull historical data
- ?
- Onboard new exchange
- Put a DataPull source in production with Airflow
- /docs/datapull/ck.create_airflow_dag.tutorial.md
- TODO(gp): This file is missing
- /docs/datapull/ck.develop_an_airflow_dag_for_production.explanation.md
- TODO(Juraj): See https://github.com/cryptokaizen/cmamp/issues/6444
- Add QA for a DataPull source
- Compare OHLCV bars
- /im_v2/ccxt/data/client/notebooks/CmTask6537_One_off_comparison_of_Parquet_and_DB_OHLCV_data.ipynb
- TODO(Grisha): review and generalize
- How to import Bloomberg historical data
- /docs/datapull/ck.process_historical_data_without_dataflow.tutorial.ipynb
- How to import Bloomberg real-time data
- TODO(*): add doc.
- TODO(gp): Add docs
- /docs/datapull/ck.binance_trades_data_pipeline.explanation.md
- /docs/datapull/ck.database_schema_update.how_to_guide.md
- /docs/datapull/ck.datapull.explanation.md
- /docs/datapull/ck.relational_database.explanation.md
DataFlow
- All software components
- /docs/dataflow/ck.data_pipeline_architecture.reference.md
TradingOps workflows
Trading execution
Intro
- Binance trading terms
- /docs/oms/broker/ck.binance_terms.reference.md
Components
- OMS explanation
- /docs/oms/ck.oms.explanation.md
- CCXT log structure
- /docs/oms/broker/ck.ccxt_broker_logs_schema.reference.md
Testing
- Replayed CCXT exchange explanation
- /docs/oms/broker/ck.replayed_ccxt_exchange.explanation.md
- How to generate broker test data
- /docs/oms/broker/ck.generate_broker_test_data.how_to_guide.md
Procedures
- Trading procedures (e.g., trading account information)
- /docs/trading_ops/ck.trading.how_to_guide.md
- How to run broker only/full system experiments
- /docs/trading_ops/ck.trade_execution_experiment.how_to_guide.md
- Execution notebooks explanation
- /docs/oms/broker/ck.execution_notebooks.explanation.md
MLOps workflows
- Encrypt a model
- /docs/dataflow/ck.release_encrypted_models.explanation.md
- /docs/dataflow/ck.release_encrypted_models.how_to_guide.md
Deploying
- Model deployment in production
- /docs/deploying/all.model_deployment.how_to_guide.md
- Run production system
- /docs/deploying/ck.run_production_system.how_to_guide.md
- Model references
- /docs/deploying/ck.supported_models.reference.md
Monitoring
- Monitor system
- /docs/monitoring/ck.monitor_system.how_to_guide.md
- System reconciliation explanation
- /docs/monitoring/ck.system_reconciliation.explanation.md
- System reconciliation how-to guide
- /docs/monitoring/ck.system_reconciliation.how_to_guide.md
DevOps workflows
This documentation describes the architecture and deployment processes of the Kaizen infrastructure, which combines AWS services, Kubernetes for container orchestration, and traditional EC2 instances for virtualized computing. The project follows Infrastructure as Code (IaC): Terraform handles provisioning and Ansible handles configuration, which keeps the environment maintainable and reproducible.
Overview
- Development and deployment stages
- S3 Buckets overview
- /docs/infra/ck.s3_buckets.explanation.md
- This document provides an overview of the S3 buckets utilized by Kaizen Technologies.
Current set-up description
- Document details steps for setting up Kaizen infrastructure
- EC2 servers overview
- /docs/infra/ck.ec2_servers.explanation.md
Set up infra
- Document the implementation of Auto Scaling in the Kubernetes setup, focusing on the Cluster Autoscaler (CA), Horizontal Pod Autoscaler (HPA), and Auto Scaling Groups (ASG)
- Compare AWS RDS instance types and storage performance
- Set up S3 buckets with Terraform
- AWS API Key rotation guide
- Amazon Elastic File System (EFS) overview
- Client VPN endpoint creation with Terraform
- Set up AWS Client VPN
- Utility server application set-up overview
- Storing secret information (API keys, login credentials, access tokens, etc.)
- /docs/infra/ck.storing_secrets.explanation.md