Episode-3

Episode 3: How Much Does an LLM Cost for My Company?

Nowadays, every company is looking to include Generative AI in their daily operations, but it is often unclear how much it will cost them. In this episode, we will break down these costs for you. We will assume that a company wants to implement a chatbot with 2 users that answers questions based on specific company data.

Introduction

The CEO of Company XYZ has requested a chatbot that uses sales documents to answer questions such as: "Where does the customer who bought items today live?" or "How many purchase orders were made today for each item?" A dashboard would arguably be a better fit for these questions, but the CEO envisions customers eventually using this chatbot to track their own orders without going through the current call center department. The data consists of purchase orders in PDF form. What are we going to do? We will walk through parts of our architecture, such as AWS Lambda, Amazon RDS, Amazon Bedrock, and Amazon SageMaker, and arrive at the total cost of an LLM application.

LLM Architecture Application

Amazon S3 for PDF Uploads: Our sales department manually uploads all purchase orders (POs) as PDF files to an Amazon S3 bucket.
Data Extraction and Preprocessing with LangChain: A first Lambda function uses the LangChain framework to extract and preprocess data from the PDFs stored in S3.
Text Embedding with Hugging Face Model: We use the pre-trained 'all-MiniLM-L6-v2' Sentence Transformer model from Hugging Face to generate text embeddings.
Database Storage with Amazon RDS Postgres: Our database system is Amazon RDS Postgres, enhanced with the pgvector extension for storing and querying vector data.
Lambda Function for Vector Database Retrieval: A second Lambda function retrieves information from the vector database, using embedding similarity to find the stored text most relevant to the user's question so that it can be passed to the LLaMa-2-7B model.
Foundation Model Storage with Amazon Bedrock: We use Amazon Bedrock to store the LLaMa-2-7B foundation model, which has 7 billion parameters.
User Interaction via Application: The application interfaces with the user, allowing them to pose questions and receive responses generated through the system.
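The retrieval step in the list above can be sketched in plain Python. This is a minimal illustration, not the project's actual Lambda code: the function names, the toy 3-dimensional vectors, and the sample purchase-order text are all hypothetical. In the real pipeline the vectors would be 384-dimensional embeddings from 'all-MiniLM-L6-v2', and the similarity search would run inside Postgres through pgvector rather than in Python.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_chunks(question_vec, stored, k=2):
    """Return the k stored text chunks most similar to the question vector."""
    scored = [(cosine_similarity(question_vec, vec), text) for text, vec in stored]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored[:k]]


def build_prompt(question, chunks):
    """Assemble the context and the question into a single prompt for the LLM."""
    context = "\n".join(f"- {chunk}" for chunk in chunks)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"


# Hypothetical toy data: (chunk text, embedding) pairs standing in for pgvector rows.
STORED_CHUNKS = [
    ("PO-1001: 3 units of item A, shipped to Austin", [0.9, 0.1, 0.0]),
    ("PO-1002: 1 unit of item B, shipped to Denver", [0.1, 0.9, 0.0]),
    ("Holiday schedule for the sales team", [0.0, 0.1, 0.9]),
]

# Hypothetical question embedding; a real one would come from the embedding model.
question_vec = [1.0, 0.0, 0.0]
best = top_k_chunks(question_vec, STORED_CHUNKS, k=1)
prompt = build_prompt("Where was item A shipped?", best)
```

The prompt built this way is what the second Lambda function would send on to the foundation model.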

Architecture Cost

Please note that the architecture I describe is not only suitable for Company XYZ but also incorporates two of the most commonly used AI/ML services provided by AWS (Amazon Bedrock and Amazon Sagemaker). While this architecture can be optimized for cost and scalability, I strongly recommend conducting your own research before applying it to your project.

Service	Description	Dimension
Amazon RDS	Amazon RDS for PostgreSQL has supported the pgvector extension since May 2023. This database allows us to store and search embeddings.	Instance: db.m3.medium; vCPU: 1; Memory: 3.75 GiB; Utilization: 100% of the month; Storage: 30 GB; Hourly rate: 0.095 USD; Storage cost (monthly): 3.45 USD; RDS Proxy cost (monthly): 21.90 USD; PostgreSQL instance cost (monthly): 69.35 USD; Total cost of Amazon RDS: 94.70 USD
AWS Lambda	The first Lambda function handles data extraction and preprocessing with LangChain, retrieving the PDF files stored in the Amazon S3 bucket. The second Lambda function is responsible for vector database retrieval, finding the stored content related to the received question. This process runs every time a question is asked.	Container size: 128 MB, 512 MB ephemeral storage; 2 Lambda functions used for authorization; Container size: 256 MB, 512 MB ephemeral storage, 5 requests per second with 20 seconds average compute time; Total cost of AWS Lambda: 20.89 USD
Amazon SageMaker	This service includes a SageMaker Studio Notebook for text embedding, using the Sentence Transformer model 'all-MiniLM-L6-v2'.	Instance: ml.g4dn.12xlarge; GPU: 4; Memory: 192 GiB; vCPU: 48; Total cost of Amazon SageMaker: 508.56 USD
Amazon Bedrock	When using Amazon Bedrock to integrate the LLaMa 2 13B model (the one currently available), one of the challenges is estimating the number of tokens we will send on a daily basis.	Approximate cost; Total cost of Amazon Bedrock: 21.87 USD
Amazon S3	The Amazon S3 bucket will receive POST requests to upload each PDF and GET requests to retrieve them. A Lambda function will then process these PDFs, breaking them down and uploading the data to our pgvector database.	Calculation based on S3 Standard, Object Lambda, and Data Transfer; Storage: 30 GB; S3 Standard cost (monthly): 0.77 USD; Data Transfer cost (monthly): 2.70 USD; Total cost of Amazon S3: 3.48 USD
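The Amazon Bedrock row notes that the hard part is estimating token volume. A rough sketch of that estimate follows. Every number in the example call is a placeholder assumption (request volume, tokens per request, and per-1,000-token prices), not the inputs behind the 21.87 USD figure in the table. Check the Amazon Bedrock pricing page for the current rates of the model you use.

```python
def bedrock_monthly_cost(
    requests_per_day,
    input_tokens_per_request,
    output_tokens_per_request,
    price_per_1k_input,
    price_per_1k_output,
    days_per_month=30,
):
    """Estimate the monthly on-demand cost in USD for a token-priced model."""
    input_tokens = requests_per_day * input_tokens_per_request * days_per_month
    output_tokens = requests_per_day * output_tokens_per_request * days_per_month
    input_cost = input_tokens / 1000 * price_per_1k_input
    output_cost = output_tokens / 1000 * price_per_1k_output
    return input_cost + output_cost


# Placeholder assumptions: 100 questions per day, each sending 1,000 tokens
# (question plus retrieved context) and receiving 200 tokens back.
# The prices are illustrative values only, not quoted rates.
estimate = bedrock_monthly_cost(
    requests_per_day=100,
    input_tokens_per_request=1000,
    output_tokens_per_request=200,
    price_per_1k_input=0.00075,
    price_per_1k_output=0.001,
)
print(f"Estimated monthly Bedrock cost: {estimate:.2f} USD")
```

Because retrieved context is sent with every question, input tokens usually dominate, so the number and size of chunks returned from the vector database is the main lever on this cost.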

This proposed pricing and architecture are tailored to the needs of Company XYZ. The analysis shows that Amazon SageMaker is by far the most expensive component for our use case, at 508.56 USD per month. We might be able to reduce costs by optimizing our approach or by considering a switch to Amazon Bedrock, especially since we already have the Lambda function in place (Next Steps). The total estimated monthly cost is 649.50 USD. Additional elements might be needed, such as an API Gateway to facilitate direct communication with the web, but the current proposal focuses solely on model development, data extraction, and storage.
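The total can be checked by summing the per-service figures from the table. The sketch below uses only numbers stated in the table, plus one assumption: a 730-hour month, which is what makes the 0.095 USD hourly rate come out to the 69.35 USD instance cost. The variable names are illustrative.

```python
from decimal import Decimal

# Assumption: a 730-hour month, consistent with 69.35 USD at 0.095 USD per hour.
HOURS_PER_MONTH = 730

# Amazon RDS breakdown, using the figures from the table.
rds_instance = Decimal("0.095") * HOURS_PER_MONTH
rds_storage = Decimal("3.45")
rds_proxy = Decimal("21.90")
rds_total = rds_instance + rds_storage + rds_proxy

# Monthly cost per service in USD, as listed in the table.
MONTHLY_COSTS = {
    "Amazon RDS": rds_total,
    "AWS Lambda": Decimal("20.89"),
    "Amazon SageMaker": Decimal("508.56"),
    "Amazon Bedrock": Decimal("21.87"),
    "Amazon S3": Decimal("3.48"),
}

total = sum(MONTHLY_COSTS.values(), Decimal("0"))

# Print each service with its share of the monthly total, largest first.
for service, cost in sorted(MONTHLY_COSTS.items(), key=lambda item: item[1], reverse=True):
    share = cost / total * 100
    print(f"{service:<18}{cost:>8.2f} USD  {share:5.1f}%")
print(f"{'Total':<18}{total:>8.2f} USD")
```

Run as written, this shows SageMaker accounting for roughly 78% of the monthly bill, which is why it is the first candidate for optimization.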

Reference

References that helped me create the architecture and logic

Build a powerful question answering bot with Amazon SageMaker, Amazon OpenSearch Service, Streamlit, and LangChain
Building AI-powered search in PostgreSQL using Amazon SageMaker and pgvector
RAG with Amazon Bedrock and PGVector on Amazon RDS
Text Embeddings Pipeline for Retrieval Augmented Generation (RAG)
Easy Serverless RAG with Knowledge Base for Amazon Bedrock
LangChain Bedrock Documentation
Generate Embeddings using Amazon Bedrock and LangChain

References that helped me estimate the cost

Generative AI Application Builder on AWS
RAG AI-LLM Databases on AWS: do not pay for oversized, go Serverless instead
Understanding AWS Bedrock: Basics, Pricing, and Cost Optimization
What is AWS Bedrock? AWS Bedrock Pricing Simplified
Amazon Bedrock pricing
Amazon S3 pricing
