This repository contains a Cookiecutter template for Python projects. Although it is designed with data science in mind, it covers most common uses of Python.
First, make sure Cookiecutter is installed:
pip install -U cookiecutter
Then, create a project based on this template:
cookiecutter https://github.com/sdsc-innovation/cookiecutter-python.git
This will prompt you for:
- a project name, used in `README.md`;
- a project slug (in `kebab-case`), used for the top-level directory name;
- a module name (in `snake_case`);
- whether you would like to use Ruff to format and analyze your code;
- whether you would like to use pytest to write unit tests.
The module name may also be a short generic identifier, such as `lib` or `helper`, if you do not plan to use it externally.
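To illustrate the two naming conventions requested by the prompts, here is a minimal sketch that derives a kebab-case slug and a snake_case module name from a project name. The helper functions and the example project name are hypothetical; they are not part of the template, which simply asks you to type these values.

```python
import re


def to_kebab_case(name: str) -> str:
    """Lowercase the name and join its alphanumeric words with hyphens."""
    words = re.findall(r"[A-Za-z0-9]+", name)
    return "-".join(word.lower() for word in words)


def to_snake_case(name: str) -> str:
    """Lowercase the name and join its alphanumeric words with underscores."""
    words = re.findall(r"[A-Za-z0-9]+", name)
    return "_".join(word.lower() for word in words)


project_name = "My Data Project"  # hypothetical answer to the first prompt
project_slug = to_kebab_case(project_name)  # suited to a directory name
module_name = to_snake_case(project_name)  # suited to an importable module

print(project_slug)  # my-data-project
print(module_name)  # my_data_project
```

The slug may contain hyphens because it only names a directory, whereas the module name has to be a valid Python identifier to be importable.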
The generated folder structure is straightforward:
- code is stored as an installable module, in `src`;
- unit tests have a dedicated folder in `tests`;
- notebooks have their own folder;
- and, by default, a `data` folder is added, as a suggestion.
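The layout described above can be sketched as follows. This is only an approximation built in a temporary directory: the names `your-project`, `your_module` and `notebooks` are assumptions made for illustration, and the actual generated project contains additional files.

```python
import tempfile
from pathlib import Path

# Hypothetical names: the slug, module and notebook folder are placeholders.
PROJECT_SLUG = "your-project"
MODULE_NAME = "your_module"


def create_layout(root: Path) -> list[str]:
    """Create the described folders under root and return their relative paths."""
    folders = [
        Path("src") / MODULE_NAME,  # installable module
        Path("tests"),  # unit tests
        Path("notebooks"),  # notebooks
        Path("data"),  # suggested data folder
    ]
    for folder in folders:
        (root / folder).mkdir(parents=True, exist_ok=True)
    # An __init__.py marks the folder as an importable package.
    (root / "src" / MODULE_NAME / "__init__.py").touch()
    return sorted(p.relative_to(root).as_posix() for p in root.rglob("*"))


with tempfile.TemporaryDirectory() as tmp:
    layout = create_layout(Path(tmp) / PROJECT_SLUG)
    print("\n".join(layout))
```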
You can now create an empty repository on GitHub or GitLab (i.e. without an initial README file), and initialize it locally:
cd your-project
git init --initial-branch=main
git remote add origin https://github.com/you/your-project.git
git add .
git commit -m "Initial commit"
git push --set-upstream origin main
Apart from adding code, both in `.py` files and notebooks, it is recommended to make sure that dependencies are properly configured. More information about version specifiers can be found in PEP 440.
- `requirements.txt` should contain the exact (a.k.a. pinned) versions of the dependencies required during development. Note that `pipreqs` can be used to automatically infer which packages are used in your code (whereas `pip freeze` would list all installed packages). Also note that `-e .` is used to install your newly created module in editable mode.
- `environment.yml`, by default, delegates to `requirements.txt`.
- `pyproject.toml` lists the install dependencies of your module, under section `tool.poetry.dependencies`. They should represent the minimal versions of the dependencies required during regular usage. Therefore, this typically excludes any development tool, such as `ruff` or `pytest`.
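As a rough illustration of what "pinned" means for development requirements, the sketch below flags lines that do not use an exact `==` version. The regular expression is a deliberate simplification of PEP 440 (it does not handle environment markers or URLs), and the package names and versions in the example are placeholders rather than the template's actual content.

```python
import re

# A pinned requirement uses "==" with a version that has no wildcard.
PINNED = re.compile(r"^[A-Za-z0-9._-]+(\[[^\]]+\])?==[^*\s]+$")


def unpinned_requirements(text: str) -> list[str]:
    """Return requirement lines that are not pinned to an exact version."""
    unpinned = []
    for raw_line in text.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop trailing comments
        if not line or line.startswith("-"):  # skip blanks and options like "-e ."
            continue
        if not PINNED.match(line):
            unpinned.append(line)
    return unpinned


# Hypothetical file content; package names and versions are placeholders.
example = """\
numpy==1.26.4
pandas>=2.0
ruff==0.4.4  # development tool
-e .
"""

print(unpinned_requirements(example))  # ['pandas>=2.0']
```

A range such as `pandas>=2.0` is what one would expect among the install dependencies in `pyproject.toml`, whereas exact pins belong in `requirements.txt`.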
If unit tests are enabled during template instantiation, CI/CD configuration files are provided both for GitHub and GitLab. Keep only the one that applies to your scenario.
Please refer to the generated `README.md` for more details, in particular to install dependencies and register pre-commit hooks.
- Python 3.10 is chosen as a minimum, as 3.9 will reach end-of-life in 2025. We recommend 3.11, as some packages may not fully support 3.12 yet.
- A `pyproject.toml` file is the recommended way to store configuration.
- PyPA's Setuptools is used as build backend, as this is historically the most common solution. However, other options are discussed in Python Packaging User Guide, such as Hatch or Poetry.
- No `setup.py` is provided, as typical projects do not need to access low-level Setuptools configuration. Note that `setup.py` and Setuptools are not deprecated as a build backend; building C extensions will require this file.
- `requirements.txt` is used to define development dependencies. An `environment.yml` is also provided for Conda users. Installation dependencies should be defined in `pyproject.toml`.
- The `src` layout is used, to enforce proper use of editable installation during development.
- Ruff is used for code formatting and linting, using mostly the default configuration.
- pytest is used for unit testing, as the standard `unittest` module tends to be more verbose.
- mypy is suggested for static type checking.
- `.gitlab-ci.yml` is pre-configured to run unit tests on GitLab CI/CD.
- `.github/workflows/pytest.yml` is provided to run unit tests on GitHub Actions.
- pre-commit hooks are configured to enforce Ruff, and also some quality-of-life built-in tools.
- No default license file is provided, as this template does not necessarily target open source projects. By default, copyright applies; choose a license if you would like to open your project!
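To make the remark about pytest and `unittest` verbosity concrete, here is a small side-by-side sketch of the same check written in both styles. The `mean` function is a hypothetical stand-in for code that would live in your module under `src`; it is not generated by the template.

```python
import unittest


def mean(values: list[float]) -> float:
    """Hypothetical function that would live in the module under src."""
    if not values:
        raise ValueError("mean() requires at least one value")
    return sum(values) / len(values)


# pytest style: a plain function with a bare assert, collected by name.
def test_mean():
    assert mean([1.0, 2.0, 3.0]) == 2.0


# unittest style: the same check needs a class and an assertion method.
class MeanTest(unittest.TestCase):
    def test_mean(self):
        self.assertEqual(mean([1.0, 2.0, 3.0]), 2.0)


if __name__ == "__main__":
    test_mean()  # pytest would discover and call this automatically
    unittest.main(argv=["example"], exit=False)
```

Note that pytest can also collect `unittest.TestCase` classes, so both styles can coexist in the `tests` folder if needed.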
- Python Release Cycle
- Cookiecutter templates:
- https://github.com/audreyfeldroy/cookiecutter-pypackage
- https://github.com/drivendata/cookiecutter-data-science
- https://github.com/sourcery-ai/python-best-practices-cookiecutter
- https://github.com/SwissDataScienceCenter/renku-project-template/tree/master/python-minimal
- https://github.com/cmdoret/renku-project-template/tree/master/python-datasci
- https://github.com/khuyentran1401/data-science-template
- `.gitattributes` best practices
- Flake8 configuration
- Black configuration
- Pylint configuration
- Ruff configuration and rules
- Poetry basic usage
- Semantic versioning