Conda Environment Obsolete
THIS IS OBSOLETE AFTER DOCKER DEV CONTAINER
Conda flow
(optional) Install anaconda
- For the AWS machine there is already a central conda, so there is no need for users to install
- For a laptop you need to install it yourself
- You need anaconda3
Configure anaconda
- Configure anaconda for your shell using:
```bash
conda init bash ```
- Anaconda3 adds a snippet of code in your
.bashrc
Create conda environment
- This is needed to install all the packages that are required for development:
```bash
cd $DST_DIR ./dev_scripts/create_conda.develop.sh ```
-
This script takes 5 mins to run
-
You need to create an environment for every server you use (e.g., for the AWS server
research
, for your laptop) - You can reuse the same environment for multiple Git clients
Check conda environment
- Check that your conda environment exists:
```bash
conda info --envs # conda environments: # base /anaconda3 develop * /home/
/.conda/envs/develop ```
Configure conda environment
- Every time you open a shell you need to activate the development environment run:
```bash
source dev_scripts/setenv_....sh ```
- This script:
- Activates the conda environment
- Sets environment variables
- Makes sure things are working properly
Delete / recreate environment
Overwrite a conda environment with create_conda.py
- You can use the option
--delete_env_if_exists
to overwrite a conda env, creating it from scratch -
This is the typical approach
-
There are some pre-packaged command lines to create the standard environments, e.g.,
./dev_scripts/create_conda.develop.sh
-
The
create_conda.py
help as some useful examples of command lines, see the help: ```bashcreate_conda.py -h ```
Manually delete a conda environment
- You can delete a conda environment by simply deleting the corresponding directory
- The conda command tries to be smart removing the packages and leaving the dir, but IMO it doesn't always work
- You look at the environments with:
```bash
conda info --envs # conda environments: # ... develop * /Users/
/.conda/envs/develop ... ``` - Then you can delete with:
```bash
rm -rf /Users/
/.conda/envs/develop ``` - It's a good idea to move it so you can resume it if something goes wrong:
```bash
mv /Users/
/.conda/envs/develop > /Users/ /.conda/envs/develop.OLD ``` - Note that
develop.OLD
might not work anymore since all the links are broken by the move
To delete the entire conda installation (advanced users)
- This is a dangerous operation, since it deletes the executable
conda
- You want to do this only when your environment is screwed up: a more expert team member can help you diagnose it
- If you want to delete your conda installation, find the base env
```bash
conda info --envs base /anaconda3 ... ```
- Run
rm -rf /anaconda3
- A good idea is to move it so you can resume the state
Update anaconda
- To update anaconda (i.e., the framework that manages conda packages and
conda
executable)
```bash
conda activate base # Remove index cache, lock files, tarballs, unused cache packages, and source # cache. conda clean --all conda update conda conda update anaconda conda -V conda 4.7.12 ```
- You can try to activate one environment
```bash
conda activate amp_develop which python /Users/saggese/.conda/envs/amp_develop/bin/python ```
Configure user credentials
-
For now this topic is obsolete. All development with AWS is running on a server side (or locally) in a docker container. Here you can find the documentation the link
-
Update the user credential files in
amp/helpers/user_credentials.py
- Commit this so all your clients are configured
- Typically you can just copy-paste a portion of the configuration of another user
Be patient
- The
create_conda.py
flow is designed to make our projects portable across: - Platforms (e.g., macOS, Linux)
- Different installation of OSes (e.g., GP's laptop vs Paul's laptop) with all the peculiar ways we install and manage servers and laptops
- Different versions of conda
- Different versions of python 3.x
-
Different versions of python packages
-
There is no easy way to make sure that
create_conda.py
works for everybody - We can only make sure that Jenkins builds the environment correctly in its set-up by following the process described above
- Try to follow the steps one by one, using a clean shell, cutting and pasting commands
- If you hit a problem, be patient, ping GP / Paul, and we will extend the script to handle the quirks of your set-up
Conda bloat
-
"Conda bloat" refers to the situation when there are more packages in the conda recipe than what strictly needed to allow us to make progress.
-
The effects of conda bloat are:
- Longer time for
conda
to solve the dependencies, download the packages, and update the environment - Additional dependencies forcing
conda
to install older packages that can create subtle issues in our environment (we want our code always to be forward compatible, but we can't afford our code to be back compatible) - People starting pinning down packages to get deterministic library behaviors
-
In the worst case, not being able to install the environment at all due to conflicting
conda
requirements -
On the one side, we want to minimize "conda bloat".
- On the other side, we want to be able to experiment with packages.
Minimize conda bloat
- To minimize conda bloat, our process consists of adding a package to the conda recipe when a new package is actually needed by code and to run unit tests
- Having code using the library and running unit tests for that code should happen together
- The rule is "Important code should be tested, and if code is not important it should not be in the repo at all"
- Thus a corollary is that all code in the repo should be tested
Conda environment lifecycle
Experimental conda environment
-
On the other side we want to be free to experiment with a package that can save us tons of development time.
-
The proposed approach is:
- Install the package in whatever way you want on top of your standard conda
environment (
conda
,pip
, whatever) - Be aware that some packages are not easy to add to our production system (e.g. packages written by students trying to graduate, non-coding savvy PhDs and professors)
- Create a new local experimental conda environment with:
create_conda.py --env_name develop.nlp
-
If you want to reproduce your environment or share it (e.g., among the NLP team) you can branch
dev_scripts/install/requirements/develop.yaml
and modify it, then you can create an environment programmatically with:create_conda.py --env_name develop.nlp --req_file dev_scripts/install/requirements/develop.nlp.yaml
-
We can make this process more automated by generalizing the scripts we already have.
Releasing a new conda environment
- Once the new package is added to the official conda environment, we should:
- Test the new conda environment locally, by creating a fresh environment and running all the tests
- Compare the list of packages installed before / after the change:
git diff amp/dev_scripts/install/conda_envs/amp_develop.saggese.Darwin.gpmac.lan.txt
- E.g., Check whether installed pandas went from 0.25 to 0.14.1
- Use the
dev
build to make sure Jenkins is happy with it, especially when we develop on a different OS than Linux -
Send an email to the team asking them to recreate the environment
-
Typically GP takes care of getting all this fun stuff to work, but you are welcome to try locally to minimize surprises.
Conda maintenance (only for admins)
Updating conda itself
- To update conda itself you can run:
> conda activate base
> conda --version
3.7.1
> conda update anaconda
> conda --version
3.8.0
Cleaning conda packages
- One can clean up the entire cache of packages with:
> conda clean --yes --all
- This operation:
- Affects the conda system for all the users
- Is just about deleting cached artifacts so it should not have destructive effects