You can choose one or both of the RAG and Text2SQL sample agents to try out evaluations.
- Set up a Langfuse project using either the cloud version (https://www.langfuse.com) or the AWS self-hosted option (https://github.com/aws-samples/deploy-langfuse-on-ecs-with-fargate/tree/main/langfuse-v3)
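  The evaluation notebooks send traces to this project, so keep the project's API keys and host URL handy. As a minimal connectivity check you can use the Langfuse Python SDK as sketched below; the key values are placeholders, and the assumption that the notebooks read the standard `LANGFUSE_*` environment variables should be confirmed against the setup cells in `setup-environment.ipynb`.

  ```python
  import os
  from langfuse import Langfuse

  # Placeholder keys from your Langfuse project settings (Settings -> API Keys).
  os.environ["LANGFUSE_PUBLIC_KEY"] = "pk-lf-your-public-key"
  os.environ["LANGFUSE_SECRET_KEY"] = "sk-lf-your-secret-key"
  # Cloud: https://cloud.langfuse.com ; self-hosted: the URL of your deployment.
  os.environ["LANGFUSE_HOST"] = "https://cloud.langfuse.com"

  # Verify that the credentials and host are valid before running any evaluation.
  langfuse = Langfuse()
  print("Langfuse reachable:", langfuse.auth_check())
  ```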
- If you are using the self-hosted option and want to see model costs, you must create a model definition in Langfuse for "us.anthropic.claude-3-5-sonnet-20241022-v2:0". Instructions can be found here: https://langfuse.com/docs/model-usage-and-cost#custom-model-definitions
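  The model definition is normally created in the Langfuse UI as the linked docs describe. If you prefer scripting, a hedged sketch against the Langfuse public Models API is shown below; the host, keys, and per-token prices are placeholders (verify against current Amazon Bedrock pricing), and the payload fields should be checked against the docs above.

  ```python
  import requests

  # Placeholders: your self-hosted Langfuse URL and project API keys.
  LANGFUSE_HOST = "https://your-langfuse-host.example.com"
  AUTH = ("pk-lf-your-public-key", "sk-lf-your-secret-key")  # basic auth: public / secret key

  payload = {
      "modelName": "us.anthropic.claude-3-5-sonnet-20241022-v2:0",
      # Regex matched against the model name recorded on each generation.
      "matchPattern": r"(?i)^(us\.anthropic\.claude-3-5-sonnet-20241022-v2:0)$",
      "unit": "TOKENS",
      # Example USD prices per input/output token; confirm against Bedrock pricing.
      "inputPrice": 0.000003,
      "outputPrice": 0.000015,
  }

  resp = requests.post(f"{LANGFUSE_HOST}/api/public/models", auth=AUTH, json=payload)
  resp.raise_for_status()
  print(resp.json())
  ```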
- Create a SageMaker notebook instance in your AWS account
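  The instance can be created in the SageMaker console; if you prefer to script it, a minimal boto3 sketch is shown below. The instance name, size, and execution role ARN are placeholders, and the role needs access to Amazon Bedrock and the other services used by the sample agents.

  ```python
  import boto3

  sagemaker = boto3.client("sagemaker")

  # Placeholder: an existing SageMaker execution role with Bedrock permissions.
  ROLE_ARN = "arn:aws:iam::111122223333:role/YourSageMakerExecutionRole"

  sagemaker.create_notebook_instance(
      NotebookInstanceName="bedrock-agent-evaluation",  # any unique name
      InstanceType="ml.t3.medium",                      # enough to drive the notebooks
      RoleArn=ROLE_ARN,
      VolumeSizeInGB=20,
  )

  # Block until the instance is InService, then open Jupyter from the console.
  waiter = sagemaker.get_waiter("notebook_instance_in_service")
  waiter.wait(NotebookInstanceName="bedrock-agent-evaluation")
  ```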
- Open a terminal and navigate to the SageMaker/ folder within the instance
cd SageMaker/
- Clone this repository
git clone https://github.com/aws-samples/amazon-bedrock-agent-evaluation-framework
- Navigate to the repository and install the necessary requirements
cd amazon-bedrock-agent-evaluation-framework/
pip3 install -r requirements.txt
- Go to the blog_sample_agents/ folder and open 0-Notebook-environment/setup-environment.ipynb to set up your Jupyter environment
- Choose the conda_python3 kernel for the SageMaker notebook
- Follow the respective agent notebooks to deploy each agent and evaluate it with a benchmark dataset!
- Run through the RAG/Text2SQL notebooks to create the agents in your AWS account (WARNING: because the sample SQL queries are optimized for different database engines, some of the more complex Text2SQL sample questions may not work or may receive a low evaluation score)
- Check the Langfuse console for traces and evaluation metrics (refer to the 'Navigating the Langfuse Console' section in the root README)
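  Besides browsing the console, traces can also be pulled programmatically through the Langfuse public API, for example to post-process evaluation results. A small sketch is below; the host and keys are placeholders, and the exact fields on each trace object should be checked against the Langfuse API reference.

  ```python
  import requests

  LANGFUSE_HOST = "https://cloud.langfuse.com"  # or your self-hosted URL (placeholder)
  AUTH = ("pk-lf-your-public-key", "sk-lf-your-secret-key")

  # Fetch the most recent traces produced by the evaluation runs.
  resp = requests.get(
      f"{LANGFUSE_HOST}/api/public/traces",
      auth=AUTH,
      params={"limit": 20, "page": 1},
  )
  resp.raise_for_status()

  for trace in resp.json()["data"]:
      print(trace["id"], trace.get("name"), trace.get("totalCost"))
  ```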