This is a starter repo designed to be cloned and modified as the base of an AI-powered app on Coherence. It uses OpenAI, LangChain, and Pinecone, with Firebase for real-time chat storage and a SQLite file stored in the repo itself for metadata.
This is the foundation for the Cloud Whitepaper Index, where you can chat with an AI Cloud Architect referencing over 500 AWS & GCP whitepapers. It should be very easy (and fun!) to modify this into a "chat with your own docs" app. Even if you're not building a chat app, it's an easy way to get started with these amazing technologies and build the app you're excited about.
Documents are placed into the `files` directory. PDF and HTML files are supported. Metadata is placed in a CSV file called `metadata.csv` in the `files` directory. The indexing process runs locally (or using a Coherence Workspace) and it:
- parses each file
- links it to the metadata row (based on the filename == the `Name` column of the CSV)
- breaks it into chunks to be embedded and stored in Pinecone (using `PyPDFLoader` or `UnstructuredHTMLLoader` from LangChain)
- embeds the chunks (using `vectorstore.add_documents` from LangChain)
- saves a row in SQLite with the document and metadata, along with additional AI-augmented metadata such as a description, keywords, etc.
- the docs are now available to chat with! (the prompts in `app.py` will customize the responses you can get based on your document types)
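The file-to-metadata linking step above can be sketched with the standard library alone. This is a minimal illustration, not the repo's actual `cli.py`: the function names `read_metadata` and `plan_indexing` are hypothetical, and it assumes the `Name` column holds the full filename including its extension. It only returns the name of the LangChain loader that would be used, so it runs without LangChain installed.

```python
import csv
from pathlib import Path

# Loader names mirror the LangChain classes mentioned above; this sketch only
# returns the name, so it does not need LangChain to run.
LOADER_BY_SUFFIX = {".pdf": "PyPDFLoader", ".html": "UnstructuredHTMLLoader"}


def read_metadata(csv_path):
    """Index metadata rows by their Name column."""
    with open(csv_path, newline="", encoding="utf-8") as f:
        return {row["Name"]: row for row in csv.DictReader(f)}


def plan_indexing(files_dir):
    """Pair each supported file with its metadata row and loader name."""
    files_dir = Path(files_dir)
    metadata = read_metadata(files_dir / "metadata.csv")
    plan = []
    for path in sorted(files_dir.iterdir()):
        loader = LOADER_BY_SUFFIX.get(path.suffix.lower())
        if loader is None:
            continue  # skips metadata.csv and unsupported file types
        row = metadata.get(path.name)  # assumption: Name is the full filename
        if row is None:
            print(f"no metadata row for {path.name}, skipping")
            continue
        plan.append({"path": path, "loader": loader, "metadata": row})
    return plan
```

Each entry in the returned plan carries what the later steps need: the path to load, which loader to use, and the metadata row to attach to the chunks.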
The reason we are using an in-repo SQLite file for the database is to keep the lifecycle of the data here to one DB and one Pinecone index. This allows us to keep to the free tier on Pinecone and is also just simple to reason about. Since there are no user-generated uploads here, it also "just works." You could certainly modify it to use a "real" DB like Postgres and Coherence would make that easy, too - your app would just need to do some logic to find the right Pinecone index for each database instance.
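To make the in-repo SQLite idea concrete, here is a minimal sketch of saving one document row with Python's built-in `sqlite3` module. The table name, column names, and the `save_document` helper are assumptions for illustration; the repo's real schema may differ.

```python
import json
import sqlite3


def save_document(conn, name, metadata, description, keywords):
    """Insert or update one indexed document (hypothetical schema)."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS documents (
               name TEXT PRIMARY KEY,
               metadata TEXT,      -- JSON-encoded CSV row
               description TEXT,   -- AI-generated summary
               keywords TEXT       -- JSON-encoded list
           )"""
    )
    conn.execute(
        "INSERT OR REPLACE INTO documents VALUES (?, ?, ?, ?)",
        (name, json.dumps(metadata), description, json.dumps(keywords)),
    )
    conn.commit()


if __name__ == "__main__":
    # ":memory:" keeps the demo side-effect free; the repo would use a file path.
    conn = sqlite3.connect(":memory:")
    save_document(conn, "a.pdf", {"Name": "a.pdf"}, "Example description", ["cloud"])
    print(conn.execute("SELECT name, keywords FROM documents").fetchall())
```

Keying the table on the filename means re-running the indexer replaces a document's row instead of duplicating it.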
- Clone the repo
- Authenticate Firebase by adding your Firebase `.json` file to the `backend` directory and pushing to a private repo on GitHub (or modify the code to load it from an env var or another source of auth for more security)
- Onboard the app into Coherence by installing the GitHub app and authorizing your cloud (AWS or GCP)
- Add the required env vars
  - `OPENAI_API_KEY`
  - `PINECONE_API_KEY`
  - `FIREBASE_URL`
- Launch a Workspace on your first feature
- Upload some docs to the `files` directory, and add the corresponding metadata to `metadata.csv`
- Run `cocli exec api python cli.py --parse_all`
- Use the `Dev Preview` in the Workspace to test out the app
- Deploy to your cloud by using the Workspace to push to GitHub
- Create as many full-stack branch environments as you want by adding new Features
- Modify the app to fulfill your dreams!
- Tweak the frontend to add your own branding & prompts
- Modify the prompts in `app.py` to get better responses
- Share your work with the community!
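For the env var step above, a small startup check can catch a missing key before the app makes its first API call. This is a sketch under assumptions: the helpers `missing_env_vars` and `require_env` are hypothetical and not part of the repo; only the three variable names come from the list above.

```python
import os

REQUIRED_ENV_VARS = ("OPENAI_API_KEY", "PINECONE_API_KEY", "FIREBASE_URL")


def missing_env_vars(env=os.environ):
    """Return the required variable names that are unset or empty."""
    return [name for name in REQUIRED_ENV_VARS if not env.get(name)]


def require_env(env=os.environ):
    """Raise a clear error if any required variable is missing."""
    missing = missing_env_vars(env)
    if missing:
        raise RuntimeError("Missing env vars: " + ", ".join(missing))
    return {name: env[name] for name in REQUIRED_ENV_VARS}
```

Calling `require_env()` once at startup, for example near the top of `app.py`, would surface a misconfigured Workspace immediately with a message naming the missing variables.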