This lab outlines a recipe for building a standardised Python server that can be run in a container. Our major aims are to provide:
- An API that will eventually sit behind a reverse proxy
- A Postgres server for development
- A Redis server for development
- A healthcheck endpoint that validates that the API can reach the database
- Worker processes that process tasks in the background (using TaskIQ)
- A Dockerfile for development and production
- Application level logging
- CSRF protection (see #52, also see the official guide)
- Basic endpoints for authentication (JWT and OTP based), along with recommendations for encryption algorithms
Additionally, we provide:
- Examples of working with Stripe for payments including handling webhooks
- A pattern for separating the API endpoints for the hooks at scale
In production (see the Terraform lab project) we will use Terraform and Helm to provision infrastructure and deploy the app in pods. Ideally Postgres and Redis would be provisioned as hosted products (Linode is yet to offer this); in the short term they will be installed from official Charts.
TODO:
- Convert the project to be a Cookiecutter template
The approach taken in this guide is to document the tools and commands as they are, and not build additional tooling or abstractions. The aim is to educate the user on the tools and how to use them.
Ultimately the output of this lab will be consumed as the app and worker for the Terraform Lab.
All docker compose files depend on the following environment variables, which can be set by either exporting them before you run the commands or by declaring them in your .env file:
- `PROJ_NAME` is a prefix that is used to label resources and object stores
- `PROJ_FQDN` is the domain name of the application
- `ORG_NAME` is the organisation for your Docker registry; Anomaly defaults to using Github
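A minimal `.env` might look like the following (all values here are placeholders, not defaults shipped with the template):

```
PROJ_NAME=labs
PROJ_FQDN=labs.example.com
ORG_NAME=anomaly
```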
Python 3.11 requires the Tkinter package to be installed; it's unclear why this is the case and why the base Docker image does not contain it. Take a look at the Dockerfile where we install this package via apt.
On macOS we manage this via Homebrew:
```shell
brew install [email protected]
```
Handy commands and tools, which we use to help with the infrastructure:
Use openssl to generate a random hex string, where the 20 is the length:
```shell
openssl rand -hex 20
```
This can be used to generate secrets which the application uses; more notes on how to cycle secrets follow in this guide.
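The same result can be produced from Python's standard library should you need to generate a secret from code; a minimal sketch using the stdlib `secrets` module, which is designed for exactly this:

```python
import secrets

# Equivalent of `openssl rand -hex 20`: 20 random bytes
# rendered as 40 hexadecimal characters
token = secrets.token_hex(20)
print(len(token))  # → 40
```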
The above is wrapped up as a Task target; you need to supply the length of the hash as a parameter:

```shell
task crypt:hash -- 32
```

If you are using the development docker-compose.yml it exposes the following ports to the host machine:
- `5432` - standard port for `postgres` so you can use a developer tool to inspect the database
- `15672` - RabbitMQ web dashboard (HTTP)
- `9000` - MinIO web server for exchanging S3 compatible objects (HTTPS, see configuration details)
- `9001` - MinIO web dashboard (HTTPS)

Some of these ports should not be exposed in production.
The following Python packages make the standard set of tools for our projects:
- SQLAlchemy - A Python object relational mapper (ORM)
- alembic - A database migration tool
- FastAPI - A fast, simple, and flexible framework for building HTTP APIs
- pydantic - A data validation library that is central to the design of FastAPI
- TaskIQ - An `asyncio` compatible task queue processor that uses RabbitMQ and Redis, and has FastAPI-like design e.g Dependencies
- pendulum - A timezone aware datetime library
- pyotp - A One-Time Password (OTP) generator
Packages are managed using poetry, docs available here.
Python packages should be upgraded manually to ensure that there aren't any breaking changes.
Use poetry to list a set of outdated packages:
```shell
cd src/
poetry show -o
```

this will produce a list of outdated packages. Some of these will be dependencies of our dependencies, so start by upgrading the top level project dependencies (e.g FastAPI or SQLAlchemy):

```shell
poetry add SQLAlchemy@latest
```

If you haven't installed a particular package e.g. starlette then be wary of forcefully upgrading it as it might break its parent dependency e.g. FastAPI.
Note: It's a very good habit not to have packages that you don't use. Please review the package list for every project. This also applies to any handlers, e.g `stripe`; if your application does not use payments then please disable these.
Directory structure for our application:

```
src/
├─ tests/
├─ labs/
|  ├─ routers/     -- FastAPI routers
|  ├─ tasks/       -- TaskIQ tasks
|  ├─ models/      -- SQLAlchemy models
|  ├─ dto/         -- Data Transfer Objects
|  ├─ alembic/     -- Alembic migrations
|  ├─ __init__.py  -- FastAPI app
|  ├─ api.py       -- ASGI app that uvicorn serves
|  ├─ broker.py    -- TaskIQ broker configuration
|  ├─ settings     -- pydantic based settings
|  ├─ db.py        -- SQLAlchemy session management
|  └─ utils/       -- App wide utility functions
├─ pyproject.toml
├─ poetry.lock
```
FastAPI is a simple, flexible, and powerful Python framework for building HTTP APIs that builds upon the popular pydantic and typing libraries. Our general design for API endpoints is to break them into packages.
Each submodule must define a router on which the handlers defined in the submodule are mounted. This router should then be bubbled up to the main router in the __init__.py file, and so on until we reach the top of the routers package.
In the routers package we import the top level routers as router_modulename and mount them on the router_root. If your router needs to be prefixed with a URL then use the prefix parameter when mounting the router:
```python
from fastapi import APIRouter

from .auth import router as router_auth
from .ext import router as router_ext

router_root = APIRouter()

# Auth exposes routes on the root according to the OAuth2 spec
router_root.include_router(
    router_auth,
)

# Prefixed router with /ext
router_root.include_router(
    router_ext,
    prefix="/ext",
)
```

`api.py` imports the `router_root` and mounts it, thus mounting all routers in your application. Never modify `api.py` if you want to keep up to date with the template.
FastAPI camel cases the method name as the short description and uses the docstring as documentation for each endpoint. Markdown is allowed in the docstring.
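As an illustration of that behaviour (a sketch only, not FastAPI's actual implementation, and the exact casing in the docs UI may differ), a handler named `get_user_by_id` surfaces in the generated docs with a short description roughly like:

```python
def summary_from_name(name: str) -> str:
    """Approximate how a handler's function name becomes
    the short description shown in the generated docs."""
    return name.replace("_", " ").title()

print(summary_from_name("get_user_by_id"))  # → Get User By Id
```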
When running behind a Reverse Proxy (which would almost always be the case for our applications), FastAPI can accept the root path in numerous ways. This tells FastAPI where the application is mounted, i.e how the requests are going to be forwarded to the top router, so it can strip away the prefix before routing the requests. For example the root FastAPI application might be mounted on /api, and FastAPI will need to strip away the /api before handling the request.
FastAPI's Behind a Proxy document describes how to set this up. Where possible the recommended way is to pass this in via the uvicorn invocation, see the Dockerfile for the api container:

```
ENTRYPOINT ["uvicorn", "labs.api:app", "--host=0.0.0.0", "--port=80", "--root-path=/api", "--reload"]
```

Failing that, you can pass the argument in the FastAPI constructor.
There are times that you don't want the endpoint to be included in the documentation, which in turn makes client code generators ignore the endpoint. FastAPI has an include_in_schema parameter in the decorator, which is set to True by default. This can be set to False to exclude the endpoint from the documentation.
FastAPI provides a really nice, clean way to build out endpoints. We recommend that each endpoint must:
- Explicitly define the `status_code` it will return for positive responses
- Throw `HTTPException` on errors with the proper HTTP error code and a descriptive message (see the `status` package provided by FastAPI)
- Provide a `pydantic` schema for the request and response body (where appropriate)
- Provide a summary of the operation (no matter how trivial) which will make for better documentation
```python
from uuid import UUID

from fastapi import (
    APIRouter,
    Depends,
    HTTPException,
    status
)

# User, UserResponse, AsyncSession and get_async_session come
# from the application's models, schema and db modules
router = APIRouter()

@router.get(
    "/{id}",
    summary="Get a particular user",
    status_code=status.HTTP_200_OK
)
async def get_user_by_id(
    id: UUID,
    session: AsyncSession = Depends(get_async_session)
) -> UserResponse:
    """ Get a user by their id
    """
    user = await User.get(session, id)
    if not user:
        raise HTTPException(
            status.HTTP_404_NOT_FOUND,
            "User not found"
        )
    return user
```

Ensure that handlers never have unreferenced or unused variables.
Anomaly puts great emphasis on code readability and standards. These centre around the design principles proposed by the language, and around protocols (e.g RESTful responses, JSON, etc), which we recommend strictly following.
Our web-client defines the standards for the front end. It's important to note the differences that both environments have and the measure to translate between them. For example:
Python snake case is translated to camel case in JavaScript. So my_var becomes myVar in JavaScript. This is done by the pydantic library when it serialises the data to JSON.
```python
from pydantic import BaseModel, ConfigDict

def to_lower_camel(name: str) -> str:
    """
    Converts a snake_case string to lowerCamelCase
    """
    upper = "".join(word.capitalize() for word in name.split("_"))
    return upper[:1].lower() + upper[1:]

class User(BaseModel):
    first_name: str
    last_name: str = None
    age: float

    model_config = ConfigDict(
        from_attributes=True,
        alias_generator=to_lower_camel,
    )
```

Source: CamelCase Models with FastAPI and Pydantic by Ahmed Nafies
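As a quick sanity check of the conversion (the helper is re-declared here so the snippet is self-contained):

```python
def to_lower_camel(name: str) -> str:
    # Same logic as the helper above, repeated for a runnable snippet
    upper = "".join(word.capitalize() for word in name.split("_"))
    return upper[:1].lower() + upper[1:]

print(to_lower_camel("my_var"))      # → myVar
print(to_lower_camel("first_name"))  # → firstName
```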
It is important to pay attention to such detail, and doing what is right for the environment and language.
To assist with this, src/labs/schema/utils.py provides the class AppBaseModel which inherits from pydantic's BaseModel and configures it to use the to_lower_camel function to convert snake case to camel case. If you inherit from AppBaseModel you will automatically get this behaviour:
```python
from .utils import AppBaseModel

class MyModel(AppBaseModel):
    first_name: str
    last_name: str = None
    age: float
```

FastAPI will try and generate an operation_id based on the path of the router endpoint, which usually ends up being a convoluted string. This was originally reported in labs-web-client. You can provide an operation_id in the decorator e.g:
```python
@app.get("/items/", operation_id="some_specific_id_you_define")
```

which would result in the client generating a function like someSpecificIdYouDefine().
For consistency the FastAPI docs show a wrapper function that globally re-writes the operation_id to the function name. This does put the onus on the developer to name the function correctly.
As of FastAPI 0.99.x the constructor takes a generate_unique_id_function parameter, a callable that returns the operation id. If you name your python functions properly then you can use them as the operation id. api.py features this simple function to help with it:
```python
def generate_operation_id(route: APIRoute) -> str:
    """
    With a little help from FastAPI docs
    https://bit.ly/3rXeAvH

    Globally use the path name as the operation id thus
    making things a lot more readable, note that this requires
    you name your functions really well.

    Read more about this on the FastAPI docs
    https://shorturl.at/vwz03
    """
    return route.name
```

The project uses TaskIQ to manage task queues. TaskIQ supports asyncio, has FastAPI-like design ideas e.g dependency injection, and can be tightly coupled with FastAPI.
TaskIQ is configured as recommended for production use, with taskiq-aio-pika as the broker and taskiq-redis as the result backend.
broker.py in the root of the project configures the broker using:
```python
broker = (
    AioPikaBroker(str(settings.amqp.dsn),)
    .with_result_backend(redis_result_backend)
)
```

`api.py` uses FastAPI events to start and shutdown the broker. As their documentation notes:
Calling the startup method is necessary. If you don't call it, you may get an undefined behaviour.
```python
# TaskIQ configuration so we can share FastAPI dependencies in tasks
@app.on_event("startup")
async def app_startup():
    if not broker.is_worker_process:
        await broker.startup()

# On shutdown, we need to shutdown the broker
@app.on_event("shutdown")
async def app_shutdown():
    if not broker.is_worker_process:
        await broker.shutdown()
```

We recommend creating a tasks.py file under each router directory to keep the tasks associated with each router group next to them. Tasks can be defined by simply calling the task decorator on the broker:
```python
@broker.task
async def send_account_verification_email() -> None:
    import logging
    logging.error("Kicking off send_account_verification_email")
```

and to kick it off simply use the kiq method from the FastAPI handlers:
```python
@router.get("/verify")
async def verify_user(request: Request):
    """Verify an account
    """
    await send_account_verification_email.kiq()
    return {"message": "hello world"}
```

There are various powerful options for queuing tasks; both scheduled and periodic tasks are supported.
Towards the end of broker.py you will notice the following override:
```python
# For testing we use the InMemory broker, this is set
# if an environment variable is set, please note you
# will require pytest-env for environment vars to work
env = os.environ.get("ENVIRONMENT")
if env and env == "pytest":
    from taskiq import InMemoryBroker
    broker = InMemoryBroker()
```

which allows us to use the InMemoryBroker for testing. This is because FastAPI provides its own testing infrastructure which routes the calls internally, and the RabbitMQ broker and redis backend are not available.
Note: you will need to install `pytest-env` for this to work, and be sure to set the `ENVIRONMENT` environment variable to `pytest`. Refer to `pyproject.toml` to see how we configure it for the template.
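With `pytest-env` installed, the relevant fragment of `pyproject.toml` would look something like this (a sketch; the template's own file is authoritative):

```toml
[tool.pytest.ini_options]
env = [
    "ENVIRONMENT=pytest",
]
```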
SQLAlchemy is making a move towards its 2.0 syntax; this is available as of v1.4, which is what we currently target as part of our template. This also brings the use of asyncpg, which allows us to use asyncio with SQLAlchemy.
First and foremost we use the asyncpg driver to connect to PostgreSQL. Refer to the property postgres_async_dsn in config.py.
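The only difference between the standard and async DSNs is the driver portion of the URL scheme. The values below are placeholders for illustration; the real DSN is assembled from settings in config.py:

```python
# Placeholder credentials and hostnames for illustration only
sync_dsn = "postgresql://postgres:postgres@db:5432/labs"
async_dsn = "postgresql+asyncpg://postgres:postgres@db:5432/labs"

# Only the scheme changes; credentials, host and database are identical
assert async_dsn == sync_dsn.replace("postgresql://", "postgresql+asyncpg://")
```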
asyncio and the new query mechanism affects the way you write queries to load objects referenced by relationships. Consider the following models and relationships:
```python
from typing import Optional

from sqlalchemy import ForeignKey
from sqlalchemy.orm import (
    DeclarativeBase,
    Mapped,
    mapped_column,
    relationship,
)

# Used by the ORM layer to describe models
class Base(DeclarativeBase):
    """
    SQLAlchemy 2.0 style declarative base class
    https://bit.ly/3WE3Srg
    """
    pass

# Note: the key columns and back-references below are implied by
# the relationships and have been filled in for completeness
class Catalogue(Base):
    __tablename__ = "catalogue"

    id: Mapped[int] = mapped_column(primary_key=True)
    name: Mapped[str]
    description: Mapped[Optional[str]]

    # Catalogues are made of one or more products
    products = relationship("Product",
        back_populates="catalogue",
        lazy="joined"
    )

class Product(Base):
    __tablename__ = "product"

    id: Mapped[int] = mapped_column(primary_key=True)
    catalogue_id: Mapped[int] = mapped_column(ForeignKey("catalogue.id"))
    name: Mapped[str]
    description: Mapped[Optional[str]]

    catalogue = relationship("Catalogue", back_populates="products")

    # Products have one or more prices
    prices = relationship("Price",
        primaryjoin="and_(Product.id==Price.product_id, Price.active==True)",
        back_populates="product"
    )

class Price(Base):
    __tablename__ = "price"

    id: Mapped[int] = mapped_column(primary_key=True)
    product_id: Mapped[int] = mapped_column(ForeignKey("product.id"))
    active: Mapped[bool]
    name: Mapped[str]
    description: Mapped[Optional[str]]
    amount: Mapped[float]

    product = relationship("Product", back_populates="prices")
```

For you to be able to access the Products and then the related Prices you would have to use the selectinload option to ensure that SQLAlchemy is able to load the related objects. This is because the async session does not support the implicit lazy loading that SQLAlchemy defaults to.
```python
from sqlalchemy.orm import selectinload

query = select(cls).options(selectinload(cls.products).\
    selectinload(Product.prices)).\
    where(cls.id == id)
results = await async_db_session.execute(query)
```

Note how the `selectinload` is chained to the `products` relationship and then the `prices` relationship.
Our base project provides several Mixins, a handy one being the ModelCRUDMixin (in src/labs/models/utils.py). It's very likely that you will want to write multiple getters for your models. To facilitate this we encourage each Model to override _base_get_query and return a query with the selectinload options applied.
```python
@classmethod
def _base_get_query(cls):
    query = select(cls).options(selectinload(cls.products).\
        selectinload(Product.prices))
    return query
```

This is then used by the get method in the ModelCRUDMixin to load the related objects and apply any further conditions, or orders:
```python
@classmethod
async def get(cls, async_db_session, id):
    query = cls._base_get_query().where(cls.id == id)
    results = await async_db_session.execute(query)
    (result,) = results.one()
    return result
```

To initialise alembic, activate the virtualenv created by poetry:
```shell
cd src/
poetry shell
```

and run the initialiser script for async mode:

```shell
alembic init -t async alembic
```

In alembic.ini the first parameter is the location of the alembic script, set to the following by default:
```
script_location = alembic
```

change this to be relative to the project:

```
script_location = labs:alembic
```
Since we want alembic to configure itself dynamically (e.g Development container, Production container) we need to empty out the value set in alembic.ini:
```
# From
sqlalchemy.url = driver://user:pass@localhost/dbname

# to, as it will be configured in env.py
sqlalchemy.url =
```

You need to import the following in env.py; relative imports don't seem to be allowed (pending investigation):
```python
# App level imports
from labs.settings import config as settings
from labs.db import Base
from labs.models import *
```

and then in env.py import the application configuration and set the sqlalchemy.url option; I've decided to do this just after the config variable is assigned:
```python
# this is the Alembic Config object, which provides
# access to the values within the .ini file in use.
config = context.config

# Read the app config and set sqlalchemy.url
config.set_main_option("sqlalchemy.url", settings.postgres_dsn)
```

Lastly you have to assign your declarative_base to the target_metadata variable in env.py, so find the line:
```python
target_metadata = None
```

and change it to:

```python
target_metadata = Base.metadata
```

Note that the `Base` comes from the above imports, and we import everything from our models package so alembic tracks all the models in the application.
And finally you should be able to run your initial migration:
```shell
docker compose exec api sh -c "alembic -c /opt/labs/alembic.ini revision --autogenerate -m 'init db'"
```

producing the following output:

```
INFO  [alembic.runtime.migration] Context impl PostgresqlImpl.
INFO  [alembic.runtime.migration] Will assume transactional DDL.
  Generating /opt/labs/alembic/versions/4b2dfa16da8f_init_db.py ... done
```
followed by upgrading to the latest revision:

```shell
docker compose exec api sh -c "alembic -c /opt/labs/alembic.ini upgrade head"
```

producing the following output:

```
INFO  [alembic.runtime.migration] Context impl PostgresqlImpl.
INFO  [alembic.runtime.migration] Will assume transactional DDL.
INFO  [alembic.runtime.migration] Running upgrade  -> 4b2dfa16da8f, init db
```
During development you may find the need to nuke the database altogether and start all over again. We provide a handy way of re-creating the schema using the Task target task db:init, which simply runs the SQLAlchemy create_all method.
You are still left in a position where the database is unaware of alembic's state. This is because the alembic_version table is no longer present in the database. To restore the state we need to recreate the table and insert a record with the HEAD SHA.
We provide task db:alembic:heads which runs the alembic heads command. You can also use task db:alembic:attach to query the HEAD SHA and recreate the table populated with the SHA.
Post this point you should be back where you were and use migrations as you'd expect to.
MinIO is able to run with TLS enabled; all you have to do is provide it a certificate. By default MinIO looks for certificates in ${HOME}/.minio/certs. You can generate certificates and mount them into the container:

```yaml
    volumes:
      - minio-data:/data
      - .cert:/root/.minio/certs
```

This will result in the dashboard being available via HTTPS and the signed URLs will be TLS enabled.
Since we use TLS enabled endpoints for development, running MinIO in secure mode will satisfy any browser security policies.
The template provides a SQLAlchemy table called S3FileMetadata; this is used to store metadata about file uploads.
The client sends a request with the file name, size and mime type; the endpoint creates a S3FileMetadata and returns a pre-signed upload URL, which the client must post the contents of the file to.
The client can take as long as it needs to upload the contents, but must begin uploading within the signed URL's lifetime, e.g five minutes from when the URL is generated.
The template is designed to schedule a task to check if the object made it to the store. It continues to check this for a period of time and marks the file to be available if the contents are found on the object store.
The client must keep polling back to the server to see if the file is eventually available.
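The bounded checking described above, on either the worker or the client side, can be sketched as a simple retry loop. Here `object_exists` is a hypothetical stand-in for whatever call you make against the object store or availability endpoint; none of these names come from the template:

```python
import time

def wait_for_object(object_exists, attempts=5, delay=0.0):
    """Poll a check function a bounded number of times.

    `object_exists` is any zero-argument callable that returns
    True once the uploaded contents are visible.
    """
    for _ in range(attempts):
        if object_exists():
            return True
        time.sleep(delay)
    return False

# Simulate a file whose contents appear on the third check
checks = iter([False, False, True])
print(wait_for_object(lambda: next(checks)))  # → True
```

In the template the worker side of this is scheduled as a TaskIQ task, while the client simply repeats its availability request.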
Task is a task runner / build tool that aims to be simpler and easier to use than, for example, GNU Make. While it's useful to know the actual commands, it's easy to use a tool like task to make things easier on a daily basis:
- `eject` - eject the project from a template
- `build:image` - builds a publishable docker image
- `crypt:hash` - generate a random cryptographic hash
- `db:alembic` - arbitrary alembic command in the container
- `db:alembic:heads` - shows the HEAD SHA for alembic migrations
- `db:alembic:attach` - join the database container to alembic migrations
- `db:init` - initialise the database schema
- `db:migrate` - migrates models to HEAD
- `db:rev` - create a database migration, pass a string as commit string
- `dev:psql` - postgres shell on the db container
- `dev:pyshell` - get a python session on the api container
- `dev:sh` - get a bash session on the api container
- `dev:test` - runs tests inside the server container
It's important that we put files that should not be copied across into the containers in the .dockerignore file. Please try to keep this up to date as your project grows. The best security is the absence of files that are not required.
Our current standard image is python:3.10-slim-buster; this should be updated as the infrastructure changes.
While it is possible to pass multiple environment files to a container, we are trying to see if it's possible to keep it down to a single file.
The Dockerfile for the API simply copies the contents of the src directory and then uses poetry to install the packages including the application itself.
Note: `virtualenvs.create` is set to `false` for containers as virtualenvs are not required.
We run the application using uvicorn and pass in --root-path=/api for FastAPI to work properly when behind a reverse proxy. FastAPI recommends setting this at the server level; setting the flag in FastAPI is the last resort.
Dockerfile is the configuration referenced by docker-compose.yml for development and Dockerfile.prod is the configuration referenced by docker-compose.prod.yml for production. For Kubernetes based deployment please reference Dockerfile.prod.
The template provides a Dockerfile for production; this uses multi-stage builds to produce a slimmer image for production.
There's a fair bit of documentation available around deploying uvicorn for production. It does suggest that we use a process manager like gunicorn but it might be irrelevant depending on where we are deploying. For example if the application is deployed in a Kubernetes cluster then each pod would sit behind a load balancer and/or a content distribution network (CDN) and the process manager would be redundant.
The production container does have the `postgres` client installed to provide you access to `psql`; this is rather handy for initialising the database or performing any manual patches.
Many times you will want to interactively get a shell to postgres to update the database. Our containers have the postgres client installed. If you have a file called .pgpass in /root/ then you can use psql directly without having to enter a password.
Remember that the container has very limited software installed, so you will need to save the contents of .pgpass using:
```shell
echo "kubernetescluster-aurora-cluster.cluster-cc3g.ap-southeast-2.rds.amazonaws.com:5432:harvest:dbuser:password" > ~/.pgpass
```

Once you have done that you can use kubectl to execute psql directly:

```shell
kubectl exec -it server-565497855b-md96l -n harvest -- /usr/bin/psql -U dbuser -h kubernetescluster-aurora-cluster.cluster-cc3g.ap-southeast-2.rds.amazonaws.com -d harvest
```

Note that you have to specify the hostname and username as well as the database name. The password is read from the `.pgpass` file.
Once you have this working you can pipe the contents of a SQL file from your local machine to the container.
```shell
cat backup.sql | kubectl exec -it server-565497855b-md96l -n harvest -- /usr/bin/psql -U dbuser -h kubernetescluster-aurora-cluster.cluster-cc3g.ap-southeast-2.rds.amazonaws.com -d harvest
```

We recommend the use of a registry such as Github Container Registry to host your images. Assuming you are using GitHub, you can build your images using the following command:

```shell
docker build -t "ghcr.io/anomaly/python-lab-server-api:v0.1.0" -f Dockerfile.api .
```

where v0.1.0 is the version of the image and Dockerfile.api is the Dockerfile to use. python-lab-server is the name of the package that will be published on GitHub Container Registry. To publish the image use the following command:

```shell
docker push ghcr.io/anomaly/python-lab-server-api:v0.1.0
```

where anomaly is the organisation on GitHub and v0.1.0 is the version of the image.
Ensure you tag the release on your version control system and write thorough release notes.
Note that if you are building on Apple Silicon, the images are by default built for the arm architecture; if you are going to deploy to amd64 you must specify this as an argument: --platform=linux/amd64.
- Deploying FastAPI apps with HTTPS powered by Traefik by Sebastián Ramírez
- How to Inspect a Docker Image’s Content Without Starting a Container by James Walker
- Poetry sub packages, an open issue to support sub packages in Poetry, which will be handy in splitting up our code base further.
- Using find namespaces or find namespace package
SQLAlchemy specific resources:
- FastAPI with Async SQLAlchemy, SQLModel, and Alembic by Michael Herman
- SQLAlchemy Async ORM is Finally Here! by Ahmed Nafies
- Better Jinja - Jinja syntax highlighting for VS Code by @SamuelColvin (accepted as part of PAP #47)
Contents of this repository are licensed under the Apache 2.0 license.