This short workshop aims to show familiarize participants with conda
environments. For many new and existing Python programmers, Anaconda/Miniconda has become the default way to install the language and manage third-party libraries. While the graphical user interface (Navigator) offers plenty of features, it pays off to learn to use the command-line, specially for those working with version control systems, such as git, or in CLI environments, like compute clusters.
Ideally, you will leave this workshop loving conda
environments. More seriously, our main goal is to demonstrate the advantages of using different conda
environments for separate projects. In addition, by the end of the workshop you will be able to install, upgrade, and remove packages using the command-line interface of conda
.
There are no hard requirements to follow this workshop. It helps if you are familiar with command-line interfaces, but we will explain all the commands as they are introduced.
To follow this workshop, please install any version of Anaconda or Miniconda. We highly recommend installing one that supports Python 3.x, as Python 2 is no longer supported since January 2020. When prompted to let the installer initialize conda
, click on yes. This setting will automatically enable conda
on all terminal sessions by default.
We will also make use of the terminal. Terminal choices depend on your operative system. Mac OS and Linux users are lucky in that they already have one pre-installed. On Mac OS, you might want to look at iTerm2 for a nicer terminal experience, but the default Terminal application is good enough for this tutorial. For Windows users, you can simply use the Anaconda Prompt.
After installing Anaconda or Miniconda and opening a terminal window, you should be greeted with a prompt that looks like this, minus the user and host names:
(base) joaor@desktop $
This prefix - (base)
- is the way conda
tells us that we are in the base environment. So what's an environment? An environment is a folder that contains a specific set of packages you have installed. In other words, an environment is an isolated installation of Python with some extra libraries installed. Since scientific packages often depend on very specific version of certain libraries, using dedicated environments helps minimize clashes between these dependencies. More importantly, dedicated environments makes your life easier if installing a package breaks ten others - worst case scenario, you just delete and recreate the environment. Without environments you'd have to reinstall Anaconda/Miniconda. So, as a rule of thumb, you never want to install packages in the base
environment unless you have a really good reason to do so.
The major downside of using environments is that you will need some additional disk space. Each environment should include its own Python interpreter, so if you have twenty environments, you will have twenty Python installations somewhere on your hard drive. Thankfully, computers today come with several hundred GBs of disk space, or even TBs, so this is really not an issue in the vast majority of cases.
Now that we understand what conda
environments are, let's create our first. Type the following in your terminal:
(base) joaor@desktop $ conda create --name myenv python=3.8
This command will create a new environment called myenv
with Python 3.8 as its default interpreter. Importantly, environment names cannot have spaces. The python=3.8
part of the command instructs conda
to install the python package, at version 3.8. This version specification is particularly useful. If you'd want to run a library that is only supported in Python 2.7, you would create an environment with python=2.7
. There are plenty more options that you can pass to conda create
, which you can find out about with the option --help
.
Depending on your internet connection, conda
might take a while to collect the necessary information to run the command. Sooner or later, you will be shown a list of packages that will be downloaded and installed, followed by a prompt asking you if you want to proceed. Type y
and conda
will download and install those packages, and setup your environment. When you repeat this operation later on, conda
will be smart about which packages it needs to download. If you previously downloaded a version of a package you need, it will use that cached version, saving you the time (and trouble) to download it from a remote server.
Once the command finishes, notice that you remain on the (base)
environment. Creating an environment does not automatically trigger a change in environment. To move to - or to activate - the myenv
environment, you need to use the conda activate
command, as follows:
(base) joaor@desktop $ conda activate myenv
(myenv) joaor@desktop $
The prompt prefix changes to indicate the environment we are in changed. If you were to run python
, it would launch the interpreter stored in the myenv
environment folder. If you wanted to return to the base
environment, you would issue the command conda deactivate
. As of now, every Python script runs through this version of the interpreter, having access only to the modules installed on this environment. So let's go ahead and install some packages.
Installing packages in an environment requires a bit information about conda
as a package manager. Packages in the conda
cloud are distributed through channels. Default installations, like yours, have access to the defaults
channel which contains various common modules for scientific Python, such as numpy
. To install a package, you use the following command:
(myenv) joaor@desktop $ conda install numpy
The resulting dialogue is similar to that when we first created the environment, and for a good reason: the process is essentially the same. When we created the environment, we included python=3.8
, which is itself a package. More specifically, we asked for version 3.8 of the python
package. Similarly, we could have asked for a specific version of numpy
using the same syntax, e.g. numpy=1.15
. The conda
manager will then compute whatever dependencies it needs to download and install to get the package we requested working, or die trying. Sometimes, packages have conflicting dependencies. For example, version 1.15 of numpy
might ask for a particular version of mkl
which is incompatible with the version of scipy
you have installed. In this case, conda
will prompt if you want to downgrade
packages, if it found a way to solve the dependencies constraints. Otherwise, it might just quit with an error. Again, the good thing with having separate environments.
To install a package that is not on the defaults
channel, we have to specify which channel we want to search in. Let's try and install biopython
, which is only available through the conda-forge
channel:
(myenv) joaor@desktop $ conda install -c conda-forge biopython
You can verify that bioptyhon
is indeed installed by trying to import it in your environment:
(myenv) joaor@desktop $ python -c 'import Bio; print(Bio.__version__)'
1.76
If you try and run this same command on the base
environment, it will give an error because biopython
is not installed there:
(myenv) joaor@desktop $ conda activate base
(base) joaor@desktop $ python -c 'import Bio; print(Bio.__version__)'
Traceback (most recent call last):
File "<string>", line 1, in <module>
ModuleNotFoundError: No module named 'Bio'
Let us go back to our myenv
environment before proceeding:
(base) joaor@desktop $ conda activate myenv
In addition to specifying channels directly when you install a package, you can also permanently add a channel to your conda
configuration. You can either do this globally, affecting all environments, or per environment. We recommend the latter, as it is less likely to surprise you in the future. Let's add conda-forge
as a permanent channel for our myenv
environment.
(myenv) joaor@desktop $ conda config --env --add channels conda-forge
By default, using --add
adds the specified channel at the top of the channel priority list. In short, this means that conda
will look there first to see if a package is available. If you would rather add a channel at the bottom of the priority list, for instance if you only want to use that channel for a very specific package, then replace --add
by --append
in the command above. There are other, more advanced, ways of specifying channel priorities that go beyond the scope of the introduction. For more information, read the documentation.
Installing packages is useful, but so is removing them. To remove a package, e.g. biopython
, you can run the following command:
(myenv) joaor@desktop $ conda remove biopython
Note that this removes that package alone, not its dependencies. Because of how the dependency solving process works, it is safer to let the user remove packages manually.
The next command we will cover in this workshop lets us export the configuration of an environment to a file, so that we can share it with others. Instead of bundling the packages themselves, conda
exports a list of the package names and their versions, as we have them on our system. In addition to package details, the file contains also the list of all channels we defined in our configuration, both globally and environment-specific. Finally, the file is written in YAML
, a human-readable text format that we can inspect manually and edit if necessary. Let's export the myenv
environment to a file:
(myenv) joaor@desktop $ conda env export --no-builds --file myenv.yaml
There are some options to unpack in this command. First, we do not use conda
but the conda env
subcommand, which is a more advanced script to manage environments. We also pass a --no-builds
option, which tells conda
to specify only the package names and versions. By default, conda
would have exported the build information for each package, which you can think of as a very precise version that is sometimes specific to the operative system. While this is great for reproducibility, it is very likely that you will end up not being able to share your environment across different systems.
Finally, it is recommended to open the resulting yaml
file and keep only the packages you installed yourself. For example, we only installed python
and numpy
, so we would want to remove all but these two entries from the yaml
file. Why? First, because these would be very likely enough to reproduce our results (to the numerical precision of the computers). And second, because the conda
of whoever tries to replicate our environment from our file would solve dependencies on its own, as it sees best, without running into compatibility issues caused by machine-dependent package versions/settings. We might also want to remove the prefix
line, although there are no real consequences if you do not. As a result, our yaml
file would look like this:
name: myenv
channels:
- defaults
- https://repo.anaconda.com/pkgs/main
- https://repo.anaconda.com/pkgs/free
dependencies:
- numpy=1.18.1
- python=3.8.2
Now let's do the reverse operation and create an environment from a yaml
file. You will find these files often in GitHub repositories, so it is handy to know how to use them. Let's open a text editor and make some changes to our myenv.yaml
file, so that it looks like this:
name: myenv
channels:
- defaults
- https://repo.anaconda.com/pkgs/main
- https://repo.anaconda.com/pkgs/free
dependencies:
- numpy
- python=2.7
Then, run the following command on your terminal:
(myenv) joaor@desktop $ conda env create --name myenv_py27 --file myenv.yaml
The --name
option defines the name of the environment we will create and overrides the setting on the yaml
file. As you will see in the command's output, conda
will fetch all the necessary dependencies to include numpy
in our environment, which is based on python
version 2.7.
Finally, let's see how we remove environments. Removing environments is useful when you make mistakes and environments become unusable, or just because you finished a project and you need to clear some disk space. The command to remove an environment is the following:
(myenv) joaor@desktop $ conda env remove --name myenv_py27
Running this command will delete all packages installed in that environment and remove the environment folder. Naturally, you cannot delete the environment you are on - you must first use conda deactivate
or move to another environment.
And that's it. There is much more to learn about conda
, so go ahead and browse the documentation for more information. However, these few commands we learned are sufficient to get you started and to help you manage your Python projects a tiny bit more efficiently.
Let us know if you have any questions or feedback, you can reach me at [email protected]
. Thank you for following along and good luck with your conda
and Python adventures!
Content licensed under CC-BY-4.0. For details, see the LICENSE file on the repository.
Last updated: 20 May 2020