Creating projects#
In this tutorial, we will walk through how to set up a new Python project in conda
using an environment.yml
file. This file will help you keep track of your
dependencies and share your project with others. We cover how to create your
project, add a simple Python program and update it with new dependencies.
Requirements#
To follow along, you will need a working conda installation. Please head over to our installation guide for instructions on how to get conda installed if you do not have it.
This tutorial relies heavily on using your computer's terminal (Command Prompt or PowerShell
on Windows), so it is also important to have a working familiarity with using basic commands
such as cd
and ls
.
Creating the project's files#
To start off, we will need a directory that will contain the files for our project. This can be created with the following command:
mkdir my-project
In this directory, we will now create a new environment.yaml
file, which will hold the
dependencies for our Python project. In your text editor (e.g. VSCode, PyCharm, vim, etc.),
create this file and add the following:
name: my-project
channels:
- defaults
dependencies:
- python
Let's briefly go over what each part of this file means.
- Name#
The name of your environment. Here, we have chosen the name "my-project", but this can be anything you want.
- Channels#
Channels specify where you want conda to search for packages. We have chosen the
defaults
channel, but others such asconda-forge
orbioconda
are also possible to list here.- Dependencies#
All the dependencies that you need for your project. So far, we have just added
python
because we know it will be a Python project. We will add more later.
Creating our environment#
Now that we have written a basic environment.yml
file, we can create and activate an environment
from it. To do so, run the following commands:
conda env create --file environment.yml
conda activate my-project
Creating our Python application#
With our new environment with Python installed, we can create a simple Python program.
In your project folder, create a main.py
file and add the following:
def main():
print("Hello, conda!")
if __name__ == "__main__":
main()
We can run our simple Python program by running the following command:
python main.py
Hello, conda!
Updating our project with new dependencies#
If you want your project to do more than the simple example above, you can use one of the thousands of available packages on conda channels. To demonstrate this, we will add a new dependency so that we can pull in some data from the internet and perform a basic analysis.
To perform the data analysis, we will be relying on the Pandas
package. To add this to our project, we will need to update our environment.yml
file:
name: my-project
channels:
- defaults
dependencies:
- python
- pandas # <-- This is our new dependency
Once we have done that, we can run the conda env update
command to install the new package:
conda env update --file environment.yml
Now that our dependencies are installed, we will download some data to use for our analysis. For this, we will use the U.S. Environmental Protection Agency's Walkability Index dataset available on data.gov. You can download this with the following command:
curl -O https://edg.epa.gov/EPADataCommons/public/OA/EPA_SmartLocationDatabase_V3_Jan_2021_Final.csv
Tip
If you do not have curl
, you can visit the above link with a web browser to download it.
For our analysis, we are interested in knowing what percentage of U.S. residents live in highly
walkable areas. This is a question that we can easily answer using the pandas
library.
Below is an example of how you might go about doing that:
import pandas as pd
def main():
"""
Answers the question:
What percentage of U.S. residents live highly walkable neighborhoods?
"15.26" is the threshold on the index for a highly walkable area.
"""
csv_file = "./EPA_SmartLocationDatabase_V3_Jan_2021_Final.csv"
highly_walkable = 15.26
df = pd.read_csv(csv_file)
total_population = df["TotPop"].sum()
highly_walkable_pop = df[df["NatWalkInd"] >= highly_walkable]["TotPop"].sum()
percentage = (highly_walkable_pop / total_population) * 100.0
print(
f"{percentage:.2f}% of U.S. residents live in highly" "walkable neighborhoods."
)
if __name__ == "__main__":
main()
Update your main.py
file with the code above and run it. You should get the following
answer:
python main.py
10.69% of Americans live in highly walkable neighborhoods
Conclusion#
You have just been introduced to creating your own data analysis project by using
the environment.yml
file in conda. As the project grows, you may wish to add more dependencies
and also better organize the Python code into separate files and modules.
For even more information about working with environments and environment.yml
files,
please see Managing Environments.