You can choose one or both of the RAG and Text2SQL sample agents to try out evaluations.
- Set up a Langfuse project using either the cloud version (https://www.langfuse.com) or the AWS self-hosted option (https://github.com/aws-samples/deploy-langfuse-on-ecs-with-fargate/tree/main/langfuse-v3)
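  The evaluation notebooks send traces to this project, so keep the project's API keys and host URL handy. As a minimal connectivity check you can use the Langfuse Python SDK as sketched below; the key values are placeholders, and the assumption that the notebooks read the standard `LANGFUSE_*` environment variables should be confirmed against the setup cells in `setup-environment.ipynb`.

  ```python
  import os
  from langfuse import Langfuse

  # Placeholder keys from your Langfuse project settings (Settings -> API Keys).
  os.environ["LANGFUSE_PUBLIC_KEY"] = "pk-lf-your-public-key"
  os.environ["LANGFUSE_SECRET_KEY"] = "sk-lf-your-secret-key"
  # Cloud: https://cloud.langfuse.com ; self-hosted: the URL of your deployment.
  os.environ["LANGFUSE_HOST"] = "https://cloud.langfuse.com"

  # Verify that the credentials and host are valid before running any evaluation.
  langfuse = Langfuse()
  print("Langfuse reachable:", langfuse.auth_check())
  ```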
- If you are using the self-hosted option and want to see model costs, you must create a model definition in Langfuse for "us.anthropic.claude-3-5-sonnet-20241022-v2:0". Instructions can be found here: https://langfuse.com/docs/model-usage-and-cost#custom-model-definitions
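  The model definition is normally created in the Langfuse UI as the linked docs describe. If you prefer scripting, a hedged sketch against the Langfuse public Models API is shown below; the host, keys, and per-token prices are placeholders (verify against current Amazon Bedrock pricing), and the payload fields should be checked against the docs above.

  ```python
  import requests

  # Placeholders: your self-hosted Langfuse URL and project API keys.
  LANGFUSE_HOST = "https://your-langfuse-host.example.com"
  AUTH = ("pk-lf-your-public-key", "sk-lf-your-secret-key")  # basic auth: public / secret key

  payload = {
      "modelName": "us.anthropic.claude-3-5-sonnet-20241022-v2:0",
      # Regex matched against the model name recorded on each generation.
      "matchPattern": r"(?i)^(us\.anthropic\.claude-3-5-sonnet-20241022-v2:0)$",
      "unit": "TOKENS",
      # Example USD prices per input/output token; confirm against Bedrock pricing.
      "inputPrice": 0.000003,
      "outputPrice": 0.000015,
  }

  resp = requests.post(f"{LANGFUSE_HOST}/api/public/models", auth=AUTH, json=payload)
  resp.raise_for_status()
  print(resp.json())
  ```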
- Create a SageMaker notebook instance in your AWS account
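  The instance can be created in the SageMaker console; if you prefer to script it, a minimal boto3 sketch is shown below. The instance name, size, and execution role ARN are placeholders, and the role needs access to Amazon Bedrock and the other services used by the sample agents.

  ```python
  import boto3

  sagemaker = boto3.client("sagemaker")

  # Placeholder: an existing SageMaker execution role with Bedrock permissions.
  ROLE_ARN = "arn:aws:iam::111122223333:role/YourSageMakerExecutionRole"

  sagemaker.create_notebook_instance(
      NotebookInstanceName="bedrock-agent-evaluation",  # any unique name
      InstanceType="ml.t3.medium",                      # enough to drive the notebooks
      RoleArn=ROLE_ARN,
      VolumeSizeInGB=20,
  )

  # Block until the instance is InService, then open Jupyter from the console.
  waiter = sagemaker.get_waiter("notebook_instance_in_service")
  waiter.wait(NotebookInstanceName="bedrock-agent-evaluation")
  ```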
- Open a terminal and navigate to the SageMaker/ folder within the instance
cd SageMaker/
- Clone this repository
git clone https://github.com/aws-samples/amazon-bedrock-agent-evaluation-framework
- Navigate to the repository and install the necessary requirements
cd amazon-bedrock-agent-evaluation-framework/
pip3 install -r requirements.txt
- Go to the blog_sample_agents/ folder and open 0-Notebook-environment/setup-environment.ipynb to set up your Jupyter environment
- Choose the conda_python3 kernel for the SageMaker notebook
- Follow the respective agent notebooks to deploy each agent and evaluate it with a benchmark dataset!
- Run through the RAG/Text2SQL notebooks to create the agents in your AWS account (WARNING: because the sample SQL queries are optimized for different database engines, some of the more complex Text2SQL sample questions may not work or may receive a low evaluation score)
- Check the Langfuse console for traces and evaluation metrics (refer to the 'Navigating the Langfuse Console' section in the root README)
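  Besides browsing the console, traces can also be pulled programmatically through the Langfuse public API, for example to post-process evaluation results. A small sketch is below; the host and keys are placeholders, and the exact fields on each trace object should be checked against the Langfuse API reference.

  ```python
  import requests

  LANGFUSE_HOST = "https://cloud.langfuse.com"  # or your self-hosted URL (placeholder)
  AUTH = ("pk-lf-your-public-key", "sk-lf-your-secret-key")

  # Fetch the most recent traces produced by the evaluation runs.
  resp = requests.get(
      f"{LANGFUSE_HOST}/api/public/traces",
      auth=AUTH,
      params={"limit": 20, "page": 1},
  )
  resp.raise_for_status()

  for trace in resp.json()["data"]:
      print(trace["id"], trace.get("name"), trace.get("totalCost"))
  ```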