# KaizenFlow workflow explanation

This document is a roadmap of most activities that Quants, Quant devs, and
DevOps can perform using KaizenFlow.

For each activity we point to the relevant resources in the repo (e.g.,
documents in `docs/`, notebooks).

A high-level description of KaizenFlow is given in the KaizenFlow White Paper.
## Work organization

- Issues workflow explained
  - amp/docs/work_organization/ck.issue_workflow.explanation.md
- GitHub and ZenHub workflows explained
  - /docs/work_organization/all.use_github_and_zenhub.how_to_guide.md
- TODO(Grisha): add more from /docs/work_organization/
## Set-up

- TODO(gp): Add pointers to the docs we ask to read during the on-boarding
## Documentation meta

- The dir `docs/documentation_meta` contains documents about writing the
  documentation
- Conventions and suggestions on how to create diagrams in the documentation
  - /docs/documentation_meta/all.architecture_diagrams.explanation.md
- A summary of how to create how-to guides, tutorials, explanations, and
  reference docs according to the Diataxis framework
- Writing documentation in Google Docs
- Writing documentation in Markdown
- Plotting in LaTeX
  - /docs/documentation_meta/plotting_in_latex.how_to_guide.md
## Quant workflows

The life of a Quant is spent between:

- Exploring the raw data
- Computing features
- Building models to predict output given features
- Assessing models

These activities are mapped in KaizenFlow as follows:

- Exploring the raw data
  - This is performed by reading data using `DataPull` in a notebook and
    performing exploratory analysis
- Computing features
  - This is performed by reading data using `DataPull` in a notebook and
    creating some `DataFlow` nodes
- Building models to predict output given features
  - This is performed by connecting `DataFlow` nodes into a `Dag`
- Assessing models
  - This is performed by running data through a `Dag` in a notebook or in a
    Python script and post-processing the results in an analysis notebook
- Comparing models
  - The parameters of a model are exposed through a `Config` and then swept
    over lists of `Config`s (see the sketch after this list)
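To make the model-comparison step concrete, below is a minimal sketch of
sweeping over a list of configurations. It uses plain dictionaries and a
hypothetical `run_model()` placeholder instead of the actual KaizenFlow
`Config` API, so treat it as an illustration of the pattern rather than the
real interface.

```python
import itertools
from typing import Any, Dict, List


def build_config_list() -> List[Dict[str, Any]]:
    """
    Build one config per combination of the swept parameters.
    """
    # Hypothetical model parameters to sweep over.
    lookbacks = [5, 20, 60]
    thresholds = [0.0, 0.5]
    configs = []
    for lookback, threshold in itertools.product(lookbacks, thresholds):
        configs.append(
            {
                "model": {"lookback": lookback, "threshold": threshold},
                "data": {"universe": "top100", "frequency": "5T"},
            }
        )
    return configs


def run_model(config: Dict[str, Any]) -> float:
    """
    Placeholder: build and run a Dag with this config, return a metric.
    """
    return 0.0


# Sweep: run every config and collect the results for comparison.
results = {i: run_model(cfg) for i, cfg in enumerate(build_config_list())}
```

Each config fully determines one model run, so the collected results can be
compared side by side across the swept parameters.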
### DataPull

- General intro to `DataPull`
  - /docs/datapull/ck.datapull.explanation.md
  - /docs/datapull/all.datapull_qa_flow.explanation.md
  - /docs/datapull/all.datapull_client_stack.explanation.md
  - /docs/datapull/all.datapull_sandbox.explanation.md
  - /docs/datapull/ck.ccxt_exchange_timestamp_interpretation.reference.md
#### Universe

- Universe explanation
- Analyze universe metadata
  - /im_v2/common/universe/notebooks/Master_universe_analysis.ipynb
  - /im_v2/ccxt/notebooks/Master_universe.ipynb
#### Dataset signature

- Organize and label datasets
  - Helps to uniquely identify datasets across different sources, types,
    attributes, etc. (a hypothetical signature is sketched below)
  - /docs/datapull/all.data_schema.explanation.md
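To illustrate what a dataset signature looks like, here is a sketch that
splits a dotted signature string into named attributes. Both the attribute
names and the example signature below are hypothetical; the authoritative
schema is defined in the data_schema explanation doc above.

```python
# Hypothetical attribute order; see /docs/datapull/all.data_schema.explanation.md
# for the authoritative dataset schema.
SIGNATURE_KEYS = [
    "download_mode",
    "downloading_entity",
    "action_tag",
    "data_format",
    "data_type",
    "asset_type",
    "universe",
    "vendor",
    "exchange",
    "version",
]


def parse_signature(signature: str) -> dict:
    """
    Split a dotted dataset signature into a dict of named attributes.
    """
    tokens = signature.split(".")
    assert len(tokens) == len(SIGNATURE_KEYS), "Malformed signature"
    return dict(zip(SIGNATURE_KEYS, tokens))


# Hypothetical example signature.
sig = "bulk.airflow.downloaded_1min.parquet.ohlcv.futures.v7.ccxt.binance.v1_0_0"
print(parse_signature(sig))
```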
- Inspect `RawData`
  - /im_v2/common/notebooks/Master_raw_data_gallery.ipynb
- Convert data types
  - /im_v2/common/data/transform/convert_csv_to_pq.py
- Data download pipelines explanation
  - /docs/datapull/ck.binance_bid_ask_data_pipeline.explanation.md
  - /docs/datapull/ck.binance_ohlcv_data_pipeline.explanation.md
- Download data in bulk
  - /im_v2/common/data/extract/download_bulk.py
  - /im_v2/ccxt/data/extract/download_exchange_data_to_db.py
  - TODO(Juraj): technically this could be joined into one script and also
    generalized for more sources
- Download data in real time over a given time interval
  - /im_v2/common/data/extract/periodic_download_exchange_data_to_db.py
- Archive data
  - Helps with optimizing data storage performance / costs by transferring
    older data from a storage like Postgres to S3
  - Suitable for high-frequency, high-volume real-time order book data
- Resampling data (a minimal pandas sketch appears after this list)
  - /docs/datapull/all.datapull_derived_data.explanation.md
- `ImClient`
- `MarketData`
- How to QA data
  - /docs/datapull/ck.datapull_data_quality_assurance.reference.md
  - /im_v2/ccxt/data/qa/notebooks/data_qa_bid_ask.ipynb
  - /im_v2/ccxt/data/qa/notebooks/data_qa_ohlcv.ipynb
  - /im_v2/common/data/qa/notebooks/cross_dataset_qa_ohlcv.ipynb
  - /im_v2/common/data/qa/notebooks/cross_dataset_qa_bid_ask.ipynb
  - /research_amp/cc/notebooks/Master_single_vendor_qa.ipynb
  - /research_amp/cc/notebooks/Master_cross_vendor_qa.ipynb
- How to load `Bloomberg` data
  - /im_v2/common/notebooks/CmTask5424_market_data.ipynb
  - TODO: Generalize the name and make it Master_
- Kibot guide
  - /docs/datapull/ck.kibot_data.explanation.md
- Interactive Brokers guide
  - /docs/datapull/ck.run_ib_connect.how_to_guide.md
- How to run the IM app
  - /docs/datapull/ck.run_im_app.how_to_guide.md
- TODO(gp): Reorg
  - /research_amp/cc/notebooks/Master_single_vendor_qa.ipynb
  - /research_amp/cc/notebooks/Master_model_performance_analyser.old.ipynb
  - /research_amp/cc/notebooks/Master_machine_learning.ipynb
  - /research_amp/cc/notebooks/Master_cross_vendor_qa.ipynb
  - /research_amp/cc/notebooks/Master_model_performance_analyser.ipynb
  - /research_amp/cc/notebooks/Master_crypto_analysis.ipynb
  - /research_amp/cc/notebooks/Master_model_prediction_analyzer.ipynb
  - /research_amp/cc/notebooks/Master_Analysis_CrossSectionalLearning.ipynb
  - /im/app/notebooks/Master_IM_DB.ipynb
  - /im/ib/metadata/extract/notebooks/Master_analyze_ib_metadata_crawler.ipynb
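As referenced in the "Resampling data" item above, here is a minimal pandas
sketch of resampling 1-minute OHLCV bars into 5-minute bars. The column names
are assumptions; the actual derived-data flow is described in
/docs/datapull/all.datapull_derived_data.explanation.md.

```python
import pandas as pd

# Toy 1-minute OHLCV data indexed by timestamp; column names are assumed.
idx = pd.date_range("2024-01-01 09:30", periods=10, freq="1min", tz="UTC")
df = pd.DataFrame(
    {
        "open": 100.0,
        "high": 101.0,
        "low": 99.0,
        "close": 100.5,
        "volume": 10.0,
    },
    index=idx,
)

# Resample to 5-minute bars with the standard OHLCV aggregations.
df_5min = df.resample("5min").agg(
    {
        "open": "first",
        "high": "max",
        "low": "min",
        "close": "last",
        "volume": "sum",
    }
)
print(df_5min)
```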
### DataFlow

#### Meta

- Best practices for Quant research
  - /docs/dataflow/ck.research_methodology.explanation.md
  - TODO(Grisha): `ck.*` -> `all.*`?
- A list of all the available generic notebooks, each with a short description
  - /docs/dataflow/ck.master_notebooks.reference.md
  - TODO(Grisha): does this belong to `DataFlow`?
  - TODO(Grisha): `ck.master_notebooks...` -> `all.master_notebooks`?
#### DAG

- General concepts of `DataFlow`
  - Introduction to KaizenFlow, DAG nodes, DataFrame as the unit of
    computation, DAG execution
  - DataFlow data format
  - Different views of System components, architecture
  - Conventions for representing time series
- Explanation of how to debug a DAG
- Learn how to build a `DAG` (see the sketch below)
  - Build a `DAG` with two nodes
  - Build a more complex `DAG` implementing a simple risk model
  - Best practices to follow while building a `DAG`
- Learn how to run a `DAG`
  - Overview, `DagBuilder`, `Dag`, `DagRunner`
  - Configure a simple risk model, build a DAG, generate data and connect the
    data source to the DAG, run the DAG
  - Build a DAG from a `Mock2` `DagBuilder` and run it
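As referenced above, here is a conceptual sketch of a two-node DAG in which
each node transforms a DataFrame and execution order follows the edges. The
`Node` / `Dag` classes below are hypothetical stand-ins written for
illustration only; the actual KaizenFlow `Dag` API is covered in the tutorials
listed above.

```python
from typing import Callable, List

import pandas as pd


class Node:
    """A node that transforms a DataFrame (hypothetical, for illustration)."""

    def __init__(self, nid: str, func: Callable[[pd.DataFrame], pd.DataFrame]):
        self.nid = nid
        self.func = func


class Dag:
    """A linear DAG that executes nodes in insertion order (hypothetical)."""

    def __init__(self) -> None:
        self._nodes: List[Node] = []

    def append(self, node: Node) -> None:
        self._nodes.append(node)

    def run(self, df: pd.DataFrame) -> pd.DataFrame:
        for node in self._nodes:
            df = node.func(df)
        return df


# Node 1: compute 1-bar returns from a close price column (assumed name).
compute_rets = Node("rets", lambda df: df.assign(ret=df["close"].pct_change()))
# Node 2: z-score the returns over a rolling window.
zscore = Node(
    "zscore",
    lambda df: df.assign(
        zret=(df["ret"] - df["ret"].rolling(20).mean())
        / df["ret"].rolling(20).std()
    ),
)

dag = Dag()
dag.append(compute_rets)
dag.append(zscore)
prices = pd.DataFrame({"close": range(100, 160)}, dtype=float)
out = dag.run(prices)
```

In the real framework, nodes are connected into a true DAG with named inputs
and outputs rather than a linear pipeline; the tutorials above cover the
actual API.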
- General intro about model simulation
  - Property of tileability, batch vs. streaming
  - Time semantics, how the clock is handled, flows
  - Phases of evaluation of `Dag`s
- Event study explanation
- Run a simulation of a `DataFlow` system
  - Overview, basic concepts, implementation details
  - How to build a system, run research backtesting, process the results of
    backtesting, run a replayed time simulation, run experiments
- Simulation output explanation
- Run a simulation sweep using a list of `Config` parameters
  - /docs/dataflow/ck.run_backtest.how_to_guide.md
  - TODO(gp): @grisha do we have anything here? It's like the stuff that Dan
    does
  - TODO(Grisha): @Dan, add a link to the doc here once it is ready
- Post-process the results of a simulation (see the portfolio-metrics sketch
  after this list)
  - Build the `Config` dict, load tile results, compute portfolio bar metrics,
    compute aggregate portfolio stats
  - /dataflow/model/notebooks/Master_research_backtest_analyzer.ipynb
  - TODO(Grisha): is showcasing an example with fake data enough? We could use
    Mock2 output
- Analyze a `DataFlow` model in detail
  - Build `Config`, initialize `ModelEvaluator` and `ModelPlotter`
  - /dataflow/model/notebooks/Master_model_analyzer.ipynb
  - TODO(gp): @grisha what is the difference with the other?
  - TODO(Grisha): ask Paul about the notebook
- Analyze features computed with `DataFlow`
  - Read features from a Parquet file and perform some analysis
  - TODO(gp): Grisha do we have a notebook that reads data from
    ImClient/MarketData and performs some analysis?
  - TODO(Grisha): create a tutorial notebook for analyzing features using some
    real (or close to real) data
- Mix multiple `DataFlow` models
  - /dataflow/model/notebooks/Master_model_mixer.ipynb
  - TODO(gp): add more comments
- Exporting PnL and trades
  - /dataflow/model/notebooks/Master_save_pnl_and_trades.ipynb
  - /docs/dataflow/ck.export_alpha_data.explanation.md
  - /docs/dataflow/ck.export_alpha_data.how_to_guide.md
  - /docs/dataflow/ck.load_alpha_and_trades.tutorial.ipynb
  - /docs/dataflow/ck.load_alpha_and_trades.tutorial.py
  - TODO(gp): add more comments
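As referenced in the "Post-process the results of a simulation" item, here is
a minimal sketch of computing per-bar portfolio PnL and one aggregate stat
from holdings and returns. The DataFrame layout (assets as columns, bars as
rows) and the annualization factor are assumptions for illustration, not the
notebook's actual interface.

```python
import numpy as np
import pandas as pd

# Toy inputs: per-bar holdings (in dollars) and per-bar asset returns.
idx = pd.date_range("2024-01-01", periods=252, freq="1D")
rng = np.random.default_rng(seed=0)
cols = ["AAA", "BBB", "CCC"]
holdings = pd.DataFrame(
    rng.normal(size=(252, 3)) * 1e4, index=idx, columns=cols
)
rets = pd.DataFrame(
    rng.normal(scale=1e-3, size=(252, 3)), index=idx, columns=cols
)

# Per-bar portfolio PnL: holdings at the previous bar times this bar's returns.
pnl = (holdings.shift(1) * rets).sum(axis=1)

# Aggregate stats: total PnL and annualized Sharpe ratio (daily bars assumed).
sharpe = pnl.mean() / pnl.std() * np.sqrt(252)
print(f"total PnL={pnl.sum():.2f}, Sharpe={sharpe:.2f}")
```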
### System

- Learn how to build a `System`
  - TODO(gp): @grisha what do we have for this?
  - TODO(Grisha): add a tutorial notebook that builds a System and explain the
    flow step-by-step
- Configure a full system using a `Config`
  - Fill the `SystemConfig`, build all the components, and run the `System`
- Create an ETL batch process using a `System`
  - /dataflow_amp/system/risk_model_estimation/run_rme_historical_simulation.py
  - TODO(Grisha): add an explanation doc and consider converting into a
    Jupyter notebook
- Create an ETL real-time process (see the sketch after this list)
  - `DagBuilder`, `Dag`, `DagRunner`
  - Build a DAG that runs in real time
  - /dataflow_amp/system/realtime_etl_data_observer/scripts/run_realtime_etl_data_observer.py
  - TODO(Grisha): consider converting into a Jupyter notebook
- Build a `System` that runs in real time
  - /dataflow_amp/system/realtime_etl_data_observer/scripts/DataObserver_template.run_data_observer_simulation.py
  - TODO(Grisha): consider converting into a Jupyter notebook
- Batch simulation of a `Mock2` `System`
  - Description of the forecast system, description of the System, run a
    backtest, explanation of the backtesting script, analyze the results
  - /docs/kaizenflow/all.run_Mock2_in_batch_mode.how_to_guide.md
  - Build the config, load tiled results, compute portfolio bar metrics,
    compute aggregate portfolio stats
  - /docs/kaizenflow/all.analyze_Mock2_pipeline_simulation.how_to_guide.ipynb
- Run an end-to-end timed simulation of a `Mock2` `System`
  - /docs/kaizenflow/all.run_end_to_end_Mock2_system.tutorial.md
  - /dataflow_amp/system/mock2/scripts/run_end_to_end_Mock2_system.py
- TODO(gp): reorg the following files
  - /oms/notebooks/Master_PnL_real_time_observer.ipynb
  - /oms/notebooks/Master_bid_ask_execution_analysis.ipynb
  - /oms/notebooks/Master_broker_debugging.ipynb
  - /oms/notebooks/Master_broker_portfolio_reconciliation.ipynb
  - /oms/notebooks/Master_c1b_portfolio_vs_portfolio_reconciliation.ipynb
  - /oms/notebooks/Master_dagger_reconciliation.ipynb
  - /oms/notebooks/Master_execution_analysis.ipynb
  - /oms/notebooks/Master_model_qualifier.ipynb
  - /oms/notebooks/Master_multiday_system_reconciliation.ipynb
  - /oms/notebooks/Master_portfolio_vs_portfolio_reconciliation.ipynb
  - /oms/notebooks/Master_portfolio_vs_research_stats.ipynb
  - /oms/notebooks/Master_system_reconciliation_fast.ipynb
  - /oms/notebooks/Master_system_reconciliation_slow.ipynb
  - /oms/notebooks/Master_system_run_debugger.ipynb
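As referenced in the "Create an ETL real-time process" item above, here is a
minimal sketch of a real-time loop that wakes up once per bar, pulls the
latest data, and appends a derived value to storage. It uses a hypothetical
`get_latest_bar()` data source and plain `time.sleep()` scheduling for
illustration; the real flow is implemented by `DagRunner` and the scripts
listed above.

```python
import datetime
import time


def get_latest_bar() -> dict:
    """
    Hypothetical data source returning the latest completed OHLCV bar.
    """
    return {"close": 100.0, "volume": 10.0}


def wait_for_next_bar(bar_sec: int) -> None:
    """
    Sleep until the next bar boundary on the wall clock.
    """
    time.sleep(bar_sec - time.time() % bar_sec)


# Short bar period for demo purposes; a real system would use, e.g., 60s.
bar_sec = 5
storage = []
for _ in range(3):
    # Align to the bar grid, then process the newly completed bar.
    wait_for_next_bar(bar_sec)
    bar = get_latest_bar()
    # Derived feature: dollar volume for the bar.
    bar["dollar_volume"] = bar["close"] * bar["volume"]
    bar["ts"] = datetime.datetime.now(datetime.timezone.utc).isoformat()
    storage.append(bar)
print(storage)
```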
## Quant dev workflows

### DataPull

- Learn how to create a `DataPull` adapter for a new data source
  - /docs/datapull/all.dataset_onboarding_checklist.reference.md
- How to update the CCXT version
- Download `DataPull` historical data
  - ?
- Onboard a new exchange
- Put a `DataPull` source in production with Airflow
  - /docs/datapull/ck.create_airflow_dag.tutorial.md
  - TODO(gp): This file is missing
  - /docs/datapull/ck.develop_an_airflow_dag_for_production.explanation.md
  - TODO(Juraj): See https://github.com/cryptokaizen/cmamp/issues/6444
- Add QA for a `DataPull` source
- Compare OHLCV bars
  - /im_v2/ccxt/data/client/notebooks/CmTask6537_One_off_comparison_of_Parquet_and_DB_OHLCV_data.ipynb
  - TODO(Grisha): review and generalize
- How to import `Bloomberg` historical data
  - /docs/datapull/ck.process_historical_data_without_dataflow.tutorial.ipynb
- How to import `Bloomberg` real-time data
  - TODO(*): add doc
- TODO(gp): Add docs
  - /docs/datapull/ck.binance_trades_data_pipeline.explanation.md
  - /docs/datapull/ck.database_schema_update.how_to_guide.md
  - /docs/datapull/ck.datapull.explanation.md
  - /docs/datapull/ck.relational_database.explanation.md
### DataFlow

- All software components
  - /docs/dataflow/ck.data_pipeline_architecture.reference.md
## TradingOps workflows

### Trading execution

#### Intro

- Binance trading terms
  - /docs/oms/broker/ck.binance_terms.reference.md

#### Components

- OMS explanation
  - /docs/oms/ck.oms.explanation.md
- CCXT log structure
  - /docs/oms/broker/ck.ccxt_broker_logs_schema.reference.md

#### Testing

- Replayed CCXT exchange explanation
  - /docs/oms/broker/ck.replayed_ccxt_exchange.explanation.md
- How to generate broker test data
  - /docs/oms/broker/ck.generate_broker_test_data.how_to_guide.md

#### Procedures

- Trading procedures (e.g., trading account information)
  - /docs/trading_ops/ck.trading.how_to_guide.md
- How to run broker-only / full-system experiments
  - /docs/trading_ops/ck.trade_execution_experiment.how_to_guide.md
- Execution notebooks explanation
  - /docs/oms/broker/ck.execution_notebooks.explanation.md
## MLOps workflows

- Encrypt a model
  - /docs/dataflow/ck.release_encrypted_models.explanation.md
  - /docs/dataflow/ck.release_encrypted_models.how_to_guide.md

### Deploying

- Model deployment in production
  - /docs/deploying/all.model_deployment.how_to_guide.md
- Run the production system
  - /docs/deploying/ck.run_production_system.how_to_guide.md
- Model references
  - /docs/deploying/ck.supported_models.reference.md

### Monitoring

- Monitor the system
  - /docs/monitoring/ck.monitor_system.how_to_guide.md
- System reconciliation explanation
  - /docs/monitoring/ck.system_reconciliation.explanation.md
- System reconciliation how-to guide
  - /docs/monitoring/ck.system_reconciliation.how_to_guide.md
## DevOps workflows

This documentation outlines the architecture and deployment processes for the
Kaizen infrastructure, which combines AWS services, Kubernetes for container
orchestration, and traditional EC2 instances for virtualized computing.
Emphasizing Infrastructure as Code (IaC), the project employs Terraform for
provisioning and Ansible for configuration, ensuring a maintainable and
replicable environment.
### Overview

- Development and deployment stages
- S3 buckets overview
  - /docs/infra/ck.s3_buckets.explanation.md
  - This document provides an overview of the S3 buckets utilized by Kaizen
    Technologies

### Current set-up description

- Steps for setting up the Kaizen infrastructure
- EC2 servers overview
  - /docs/infra/ck.ec2_servers.explanation.md
### Set up infra

- Implementation of auto-scaling in the Kubernetes setup, focusing on the
  Cluster Autoscaler (CA), Horizontal Pod Autoscaler (HPA), and Auto Scaling
  Groups (ASG)
- Compare AWS RDS instance types and storage performance
- Set up S3 buckets with Terraform
- AWS API key rotation guide
- Amazon Elastic File System (EFS) overview
- Client VPN endpoint creation with Terraform
- Set up the AWS Client VPN
- Utility server application set-up overview
- Storing secret information (API keys, login credentials, access tokens,
  etc.), as sketched below
  - /docs/infra/ck.storing_secrets.explanation.md
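As referenced in the last item, here is a minimal sketch of fetching a secret
from AWS Secrets Manager with `boto3`. The secret name is hypothetical; the
project's actual conventions for storing secrets are described in
/docs/infra/ck.storing_secrets.explanation.md.

```python
import json

import boto3


def get_secret(secret_name: str, region_name: str = "us-east-1") -> dict:
    """
    Fetch a JSON secret from AWS Secrets Manager and return it as a dict.
    """
    client = boto3.client("secretsmanager", region_name=region_name)
    response = client.get_secret_value(SecretId=secret_name)
    return json.loads(response["SecretString"])


# Hypothetical secret name, e.g., exchange API credentials.
creds = get_secret("ck/binance/trading_api_key")
```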