Integrate Repos
How to integrate repos
Concepts
- We have two dirs storing two forks of the same repo
- Files are touched (e.g., added, modified, deleted) in each forks
- The most problematic files are the files that are modified in both forks
- Files that are added or deleted in one fork, should be added / deleted also in the other fork
- Often we can integrate "by directory", i.e., finding entire directories that were touched in one branch but not in the other
- In this case we can simply copy the entire dir from one repo to the other
- 
Other times we need to integrate each file 
- 
There are various interesting Git reference points: 
- The branch point for each fork, at which the integration branch was started
- The last integration point for each fork, at which the repos are the same, or at least aligned
Invariants for the integration workflows
- The user runs commands in an abs dir, e.g., /Users/saggese/src/{amp1,cmamp1}
- The user refers in the command line to dir_basename, which is the basename of the integration directories (e.g.,amp1,cmamp1,kaizenflow1)
- The src_dir_basenameis the one where the command is issued
- The dst_dir_basenameis assumed to be parallel to thesrc_dir_basename
- The dirs are then transformed in absolute dirs abs_src_dir
Integration process
Preparation
- 
Pull master 
- 
Crete the integration branches 
```bash
cd cmamp1 git checkout master i integrate_create_branch --dir-basename cmamp1 cd kaizenflow1 git checkout master i integrate_create_branch --dir-basename kaizenflow1 ```
- In one line
bash
  cd $HOME/cmamp1 && \
    git checkout master && \
    i integrate_create_branch --dir-basename cmamp1 && \
    cd $HOME/kaizenflow1 && \
    git checkout master && \
    i integrate_create_branch --dir-basename kaizenflow1
- Remove white spaces from both source and destination repos:
```bash
dev_scripts/clean_up_text_files.sh git commit -am "Remove white spaces"; git push
- One should still run the regressions out of paranoia since some golden outcomes can be changed> i gh_create_pr --no-draft > i gh_workflow_list ```
- Remove empty files:
```bash
find . -type f -empty -print | grep -v .git | grep -v init | grep -v ".log$" | grep -v ".txt$" | xargs git rm
`` - TODO(gp): Add this step todev_scripts/clean_up_text_files.sh`
- Align lib_tasks.py:
```bash
vimdiff ~/src/{cmamp1, kaizenflow1}/tasks.py; diff_to_vimdiff.py --dir1 ~/src/cmamp1 --dir2 ~/src/kaizenflow1 --subdir helpers ```
- Lint both dirs:
```bash
cd amp1 i lint --dir-name . --only-format cd cmamp1 i lint --dir-name . --only-format ```
or at least the files touched by both repos:
```bash
i integrate_files --file-direction only_files_in_src cat tmp.integrate_find_files_touched_since_last_integration.cmamp1.txt tmp.integrate_find_files_touched_since_last_integration.amp1.txt | sort | uniq >files.txt FILES=$(cat files.txt) i lint --only-format -f "$FILES" ``` - This should be done as a single separated PR to be reviewed separately
- Align lib_tasks.py: ```bashvimdiff ~/src/{amp1,cmamp1}/tasks.py; diff_to_vimdiff.py --dir1 ~/src/amp1 --dir2 ~/src/cmamp1 --subdir helpers ``` 
Integration
- Create the integration branches:
```bash
cd amp1 i integrate_create_branch --dir-basename amp1 i integrate_create_branch --dir-basename kaizenflow1 cd cmamp1 i integrate_create_branch --dir-basename cmamp1 ```
- Check what files were modified in each fork since the last integration:
```bash
i integrate_files --file-direction common_files i integrate_files --file-direction common_files --src-dir-basename cmamp1 --dst-dir-basename kaizenflow1
i integrate_files --file-direction only_files_in_src i integrate_files --file-direction only_files_in_dst ```
- Look for directory touched on only one branch:
  ```bashi integrate_files --file-direction common_files --mode "print_dirs" i integrate_files --file-direction only_files_in_src --mode "print_dirs" i integrate_files --file-direction only_files_in_dst --mode "print_dirs" ``` 
- If we find dirs that are touched in one branch but not in the other we can copy / merge without running risks
```bash
i integrate_diff_dirs --subdir $SUBDIR -c ```
- Check which change was made in each side since the last integration
```bash # Find the integration point:
i integrate_files --file-direction common_files ... last_integration_hash='813c7e763'
# Diff the changes in each side from the integration point:
i git_branch_diff_with -t hash -h 813c7e763 -f ... git difftool 813c7e763 ... ```
- Check which files are different between the dirs:
```bash
i integrate_diff_dirs ```
- Diff dir by dir
```bash
i integrate_diff_dirs --subdir dataflow/system ```
- Copy by dir
```bash
i integrate_diff_dirs --subdir market_data -c ```
- Sync a dir to handle moved files
- Assume that there is a dir where files were moved
  ```bashinvoke integrate_diff_dirs ... ... Only in .../cmamp1/.../alpha_numeric_data_snapshots: alpha ... Only in .../amp1/.../alpha_numeric_data_snapshots: latest ``` 
- You can accept one side with:
  ```bashinvoke integrate_rsync $(pwd)/marketing ``` 
- This corresponds to:
  ```bashrsync --delete -a -r {src_dir}/ {dst_dir}/ ``` 
Double-check the integration
- Check that the regressions are passing on GH
```bash
i gh_create_pr --no-draft ```
- Check the files that were changed in both branches (i.e., the "problematic ones") since the last integration and compare them to the base in each branch
```bash
cd amp1 i integrate_diff_overlapping_files --src-dir-basename "amp1" --dst-dir-basename "cmamp1" cd cmamp1 i integrate_diff_overlapping_files --src-dir-basename "cmamp1" --dst-dir-basename "amp1" ```
- Read the changes to Python files:
```bash
cd amp1 i git_branch_diff_with -t base --keep-extensions py cd cmamp1 i git_branch_diff_with -t base --keep-extensions py ```
- Quickly scan all the changes in the branch compared to the base:
  ```cd amp1 i git_branch_diff_with -t base cd cmamp1 i git_branch_diff_with -t base ``` 
Run tests
- Check amp/cmampusing GH actions:
```bash
i gh_create_pr --no-draft i pytest_collect_only i gh_workflow_list ```
- Check lemon dev1
```bash # Clean everything.
git reset --hard; git clean -fd; git pull; (cd amp; git reset --hard; git clean -fd; git pull)
i git_pull
AM_BRANCH=AmpTask1786_Integrate_20220916 (cd amp; gco $AM_BRANCH)
i pytest_collect_only i pytest_buildmeister
i git_branch_create -b $AM_BRANCH ```
- 
Check limeon dev4
- 
Check orangeon dev1
- 
Check dev_toolson dev1