A full demo app (API + front-end) to search and review hotels, powered by GenAI, LLMs and embedding vectors.
The demo features:
- LLM-processed user profile;
- vector-search-powered hotel review search;
- LLM-generated hotel summaries based on reviews + user profile (to maximize relevance);
- caching of prompt/response LLM interactions.
Tech stack:
- Astra DB as the vector database;
- OpenAI for the LLM and the embeddings;
- React+Typescript for the client;
- FastAPI, LangChain and CassIO for the API.
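As a rough, non-authoritative sketch of how these pieces fit together (this is not the repo's actual code: the table name, the environment-variable names and the exact library signatures below are assumptions matching LangChain/CassIO versions around the time of writing):

```python
# Illustrative sketch only, NOT the repo's actual code: vector search over
# hotel reviews plus Cassandra-backed LLM caching, wired through CassIO.
# Table name and environment-variable names are made up for this example.
import os

import cassio
import langchain
from cassio.config import check_resolve_keyspace, check_resolve_session
from langchain.cache import CassandraCache
from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores import Cassandra

# Connect to Astra DB with the token and database ID (see Prerequisites)
cassio.init(
    token=os.environ["ASTRA_DB_APPLICATION_TOKEN"],
    database_id=os.environ["ASTRA_DB_ID"],
)
session, keyspace = check_resolve_session(), check_resolve_keyspace()

# Vector store over the review texts (hypothetical table name)
review_store = Cassandra(
    embedding=OpenAIEmbeddings(),
    session=session,
    keyspace=keyspace,
    table_name="hotel_reviews_demo",
)

# Cache prompt/response pairs of LLM calls on the same database
langchain.llm_cache = CassandraCache(session=session, keyspace=keyspace)

# "Vector-search-powered hotel review search"
for doc in review_store.similarity_search("quiet room with a sea view", k=3):
    print(doc.page_content)
```

With a cache like this in place, repeated identical prompts are answered from the database rather than triggering a new OpenAI call; refer to the actual code for the real table layout and prompts.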
Note: at the time of writing, pending a PR to LangChain, a custom fork of that repo is installed instead of the released package.
You need:
- an Astra Vector Database (free tier is fine!). You'll be asked to supply a Database Administrator token (the string starting with `AstraCS:...`);
- likewise, have your Database ID ready, as you will have to enter it;
- an OpenAI API Key. (More info here; note that, out of the box, this demo supports only OpenAI, unless you tinker with the code.)
Note: If you have switched Astra to the New Vector Developer Experience UI, click here for instructions on the DB credentials.
Go to your database dashboard and click on the "Connection Details" button on the right. A dialog will open with instructions for connecting. You'll do two things:
- click "Generate Token" and copy the
AstraCS:...
string in its entirety once that appears on the dialog; - locate the
api_endpoint=...
line in the Python code example. The database ID is the sequence afterhttps://
and before the dash + region name (e.g.-us-east1
) in the definition of the endpoint. It looks like01234567-89ab-cdef-0123-456789abcdef
(and has always this length).
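If you prefer scripting it, extracting the database ID from the endpoint is a one-liner; a small sketch (the endpoint string below is a made-up example):

```python
# Pull the database ID out of an Astra API endpoint (made-up example endpoint).
import re

api_endpoint = "https://01234567-89ab-cdef-0123-456789abcdef-us-east1.apps.astra.datastax.com"

# The database ID is the 36-character UUID right after "https://",
# just before the "-<region>" suffix.
match = re.match(r"https://([0-9a-f-]{36})-", api_endpoint)
if match:
    print(match.group(1))  # 01234567-89ab-cdef-0123-456789abcdef
```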
Click this button, confirm opening of the workspace (you might need to log in to Gitpod in the process) and wait 3-4 minutes for the full setup to complete: instructions will show up in the console below, where you'll be prompted for the connection details and your OpenAI API key.
In the meantime, the app will open in the top panel.
Create a `python3.8+` virtualenv and `pip install -r requirements.txt`.
Note: this demo has been tested with Python versions 3.8, 3.9 and 3.10. Please stick to these Python versions, otherwise you'll likely be unable to install all required dependencies (until newer wheels for them are published).
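If you are not sure which interpreter your virtualenv uses, a quick check from within Python:

```python
# Check the active interpreter is in the tested range (3.8 - 3.10)
import sys

assert (3, 8) <= sys.version_info[:2] <= (3, 10), (
    f"Python {sys.version.split()[0]} found; this demo is tested on 3.8-3.10 only."
)
print("Python version OK")
```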
Copy `.env.template` to `.env` and fill in the values (see Prerequisites above).
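Before moving on, you can optionally sanity-check that the settings are picked up. Note that the variable names below are just placeholders for this sketch; the authoritative list is whatever `.env.template` contains:

```python
# Optional sanity check that the .env settings are readable.
# Variable names here are placeholders; see .env.template for the real ones.
import os

from dotenv import load_dotenv  # provided by the python-dotenv package

load_dotenv()
required = ["ASTRA_DB_APPLICATION_TOKEN", "ASTRA_DB_ID", "OPENAI_API_KEY"]
missing = [name for name in required if not os.environ.get(name)]
if missing:
    raise SystemExit(f"Missing settings in .env: {', '.join(missing)}")
print("All required settings found.")
```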
There are a few scripts to run in sequence which create the necessary tables on the database and fill them with data. Simply launch the following scripts one after the other (the script names make it clear what part of the setup they do):
python -m setup.2-populate-review-vector-table
python -m setup.3-populate-hotels-and-cities-table
python -m setup.4-create-users-table
python -m setup.5-populate-reviews-table
Note: the repo comes with a dataset ready for ingestion into the DB, i.e. already cleaned and converted to the correct format (embedding calculation included!). If you are curious about how it was prepared, the scripts used for that are included as well - to be clear, you do not need to run them.
Show me the dataset preprocessing steps, I'm curious
Download `Datafiniti_Hotel_Reviews_Jun19.csv` from here (unzip if necessary) and put it into `setup/original`.
Refine the original CSV into its "cleaned" version for later use:
python -m setup.0-clean-csv
This script calculates embedding vectors for all reviews (it actually combines review title and body in a certain way, and the resulting string is what is sent to the OpenAI embedding service):
python -m setup.1-augment-with-embeddings
Note: this step is time-consuming and makes a number of calls to the OpenAI API on your account. This is why, to save time and (your) money, the script stores the resulting vectors in a `precalculated_embeddings.json` file (which uses a custom compression scheme, see the code!), so that the "populate review vector table" step does not need to calculate them anymore. We included the precalculated embeddings in the repo: this is why you can start the setup from step 2.
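For the curious, the embedding call in step 1 looks roughly like the sketch below with the pre-1.0 `openai` package; how title and body are really combined, and which model is used, are defined in the script, so the values here are plain assumptions:

```python
# Rough sketch of embedding one review with the pre-1.0 openai package.
# The exact title/body combination and model choice live in the repo's script;
# the values here are assumptions for illustration.
import os

import openai

openai.api_key = os.environ["OPENAI_API_KEY"]

title = "Great location"
body = "Steps from the beach, friendly staff, would come back."
review_text = f"{title}. {body}"  # placeholder for the script's actual combination

response = openai.Embedding.create(
    input=[review_text],
    model="text-embedding-ada-002",  # assumed embedding model
)
vector = response["data"][0]["embedding"]
print(f"{len(vector)}-dimensional embedding")  # 1536 for ada-002
```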
In the console with the virtual environment active, run:
uvicorn api:app
# (optionally add "--reload")
Once you see the `Uvicorn running on [address:port]` message, the API is ready.
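A quick way to check it from another console is to request FastAPI's auto-generated docs page (served at `/docs` by default, unless the app disables it; the address below assumes uvicorn's default `127.0.0.1:8000`):

```python
# Liveness check against the running API (uvicorn's default address assumed).
import requests  # pip install requests if not already present

resp = requests.get("http://127.0.0.1:8000/docs", timeout=5)
print(resp.status_code)  # 200 means the API is up and serving
```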
Note: you need a recent version of Node.js installed.
On another console, go to the `client` directory and run `npm install` to get all dependencies.
Issue the command `npm start`. If the app does not open by itself, go to `localhost:3000` in your browser.
Note: if you start the API on another address/port, you can specify it like this instead:
REACT_APP_API_BASE_URL="http://10.1.1.2:6789" npm start
Note: do not worry if you see some API requests being issued twice. This is due to the React (v18+) app running in development mode with Strict Mode enabled. See here for more. Behaviour in a production build would be fine.