A Python web application built on Flask that analyzes an asset at a given URL and stores its textual and embedding representations in MongoDB Atlas.
A vector search can then be performed on the embeddings.
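As a rough illustration of what such a vector search can look like, the sketch below builds an aggregation pipeline around the Atlas `$vectorSearch` stage. The index name `vector_index`, the field names `embedding`, `url`, and `text`, and the helper `build_vector_search_pipeline` are assumptions made for this example, not names taken from the application.

```python
from typing import Any


def build_vector_search_pipeline(
    query_vector: list[float],
    index_name: str = "vector_index",  # hypothetical index name
    path: str = "embedding",           # hypothetical field holding the embedding
    limit: int = 5,
    num_candidates: int = 100,
) -> list[dict[str, Any]]:
    """Build an aggregation pipeline that runs an Atlas $vectorSearch query."""
    # Atlas requires numCandidates to be at least as large as limit.
    if limit > num_candidates:
        raise ValueError("limit must not exceed num_candidates")
    return [
        {
            "$vectorSearch": {
                "index": index_name,
                "path": path,
                "queryVector": query_vector,
                "numCandidates": num_candidates,
                "limit": limit,
            }
        },
        {
            # Return only the (assumed) fields of interest plus the match score.
            "$project": {
                "_id": 0,
                "url": 1,
                "text": 1,
                "score": {"$meta": "vectorSearchScore"},
            }
        },
    ]
```

With PyMongo, the resulting list can be passed to `collection.aggregate(pipeline)`. The query vector has to come from the same embedding model that produced the stored embeddings.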
The application uses Replicate to run AI models and Hookdeck to reliably receive asynchronous results from Replicate.
At present, the application supports analyzing audio assets and retrieving the transcribed contents. However, there is a framework in place to support other asset types such as text, HTML, images, and video.
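One way to picture a framework for multiple asset types is a small registry that maps a content type to a processor function. This is only a sketch under assumptions: the names `PROCESSORS`, `register`, `process_audio`, and `dispatch` are hypothetical and do not reflect how the application is actually structured.

```python
from typing import Callable, Dict

# Maps a top-level content type (for example "audio") to a processor function.
PROCESSORS: Dict[str, Callable[[str], str]] = {}


def register(kind: str):
    """Decorator that registers a processor for one top-level content type."""
    def decorator(func: Callable[[str], str]) -> Callable[[str], str]:
        PROCESSORS[kind] = func
        return func
    return decorator


@register("audio")
def process_audio(url: str) -> str:
    # Placeholder: a real processor would start a transcription job here.
    return f"transcription requested for {url}"


def dispatch(content_type: str, url: str) -> str:
    """Pick a processor from the part of the content type before the slash."""
    kind = content_type.split("/", 1)[0].strip().lower()
    try:
        processor = PROCESSORS[kind]
    except KeyError:
        raise ValueError(f"unsupported asset type: {content_type}") from None
    return processor(url)
```

Supporting another asset type would then mean registering one more function, for example under `"image"` or `"video"`, without touching `dispatch`.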
The following diagram shows the sequence in which assets are submitted within the Flask application and processed by Replicate, and how the results are sent via webhooks through Hookdeck back to the Flask application and stored in MongoDB.
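The last leg of that sequence, turning a webhook payload into something to store, could be sketched as follows. The function assumes the webhook body is a Replicate prediction object with `id`, `status`, `output`, and `error` fields. The helper name `result_from_prediction` and the stored field names are hypothetical.

```python
from typing import Any, Optional


def result_from_prediction(payload: dict[str, Any]) -> Optional[dict[str, Any]]:
    """Turn a Replicate prediction webhook payload into fields to store.

    Returns None while the prediction is still running.
    """
    status = payload.get("status")
    if status in ("starting", "processing"):
        # Nothing to store yet; a later webhook carries the final result.
        return None
    if status == "succeeded":
        return {
            "replicate_id": payload.get("id"),
            "status": "succeeded",
            "output": payload.get("output"),
        }
    # Failed, canceled, or an unexpected status: keep the error for inspection.
    return {
        "replicate_id": payload.get("id"),
        "status": status or "unknown",
        "error": payload.get("error"),
    }
```

A Flask route receiving the webhook could call this on `request.get_json()` and, when the result is not `None`, write it to the matching asset document in MongoDB.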
To run the application you need:
- A free Hookdeck account
- The Hookdeck CLI installed
- A trial MongoDB Atlas account
- A Replicate account
- Python 3
- Poetry for package management
Activate the virtual environment:
poetry shell
Install dependencies:
poetry install
Create a .env file with the following configuration, replacing the values as indicated:
# A secret used for signing session cookies
# https://flask.palletsprojects.com/en/2.3.x/config/#SECRET_KEY
SECRET_KEY=""
# MongoDB Atlas connection string
MONGODB_CONNECTION_URI=""
# Hookdeck Project API Key
# Hookdeck Dashboard -> Settings -> Secrets
HOOKDECK_PROJECT_API_KEY=""
# Replicate API Token
REPLICATE_API_TOKEN=""
# Hookdeck Source URLs
# These will be automatically populated for you in the next step
AUDIO_WEBHOOK_URL=""
EMBEDDINGS_WEBHOOK_URL=""
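A small check like the one below can catch an incomplete configuration before the app starts. It is a minimal sketch that assumes the values from .env have already been loaded into the process environment (for example by python-dotenv, which is an assumption about the setup). The helper name `missing_settings` is hypothetical.

```python
import os
from typing import Mapping

# The keys listed in the .env template above.
REQUIRED_KEYS = (
    "SECRET_KEY",
    "MONGODB_CONNECTION_URI",
    "HOOKDECK_PROJECT_API_KEY",
    "REPLICATE_API_TOKEN",
    "AUDIO_WEBHOOK_URL",
    "EMBEDDINGS_WEBHOOK_URL",
)


def missing_settings(env: Mapping[str, str] = os.environ) -> list[str]:
    """Return the required keys that are absent or blank in env."""
    return [key for key in REQUIRED_KEYS if not env.get(key, "").strip()]


if __name__ == "__main__":
    missing = missing_settings()
    if missing:
        raise SystemExit("Missing configuration: " + ", ".join(missing))
    print("All required settings are present.")
```

Because the two webhook URLs are only populated by the connection-creation step, the check is most useful once that step has been run.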
Run the following to create Hookdeck connections to receive webhooks from Replicate:
poetry run python create-hookdeck-connections.py
Run the following to create the search indexes within MongoDB:
Warning: You may need some data within MongoDB before you can create the indexes.
poetry run python create-indexes.py
Run the app:
poetry run python -m flask --app app --debug run
Create a local tunnel with the Hookdeck CLI to receive webhooks from the two Hookdeck connections:
hookdeck listen 5000 '*'
Navigate to localhost:5000 within your web browser.