This module shows how we benchmark question answering over CSV data. There are several components: setting up the environment, creating and uploading a dataset of labeled questions and answers, and evaluating several question-answering methods against it.
To set up, you should install all required packages:
pip install -r requirements.txt
You then need to set a few environment variables. This module relies heavily on LangSmith, so you need to set its environment variables:
export LANGCHAIN_TRACING_V2="true"
export LANGCHAIN_ENDPOINT=https://api.langchain.plus
export LANGCHAIN_API_KEY=...
This also uses OpenAI, so you need to set that environment variable:
export OPENAI_API_KEY=...
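As an optional sanity check before going further (this is not one of the module's scripts), you can confirm these variables are picked up by the LangSmith client; the sketch below assumes the langsmith Python package comes in via requirements.txt.

# Optional sanity check (not one of the module's scripts): confirm the
# environment variables above are actually picked up by the LangSmith client.
import os

from langsmith import Client

assert os.environ.get("LANGCHAIN_API_KEY"), "LANGCHAIN_API_KEY is not set"
assert os.environ.get("OPENAI_API_KEY"), "OPENAI_API_KEY is not set"

client = Client()  # reads LANGCHAIN_ENDPOINT / LANGCHAIN_API_KEY from the environment
datasets = list(client.list_datasets())  # fails fast on a bad key or endpoint
print(f"Connected to LangSmith; found {len(datasets)} dataset(s).")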
To create the dataset to evaluate against, we set up a simple Streamlit app that logged questions, answers, and feedback to LangSmith. We then annotated examples in LangSmith and added them to the dataset we were creating. For more details on how to do this generally, see this cookbook.
When doing this, you probably want to specify a project for all runs to be logged to:
export LANGCHAIN_PROJECT="Titanic CSV"
The streamlit_app.py file contains the exact code used to run the application. You can run it with:
streamlit run streamlit_app.py
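Purely as an illustration of that logging pattern (streamlit_app.py is the actual implementation and may differ), a minimal sketch might look like the following; the CSV filename, the pandas-agent choice, and the feedback key are assumptions, and Streamlit rerun/session-state handling is glossed over.

# Hypothetical sketch of logging questions, answers, and feedback to LangSmith.
# streamlit_app.py contains the real code; filenames and keys here are assumed.
import pandas as pd
import streamlit as st
from langchain.callbacks import collect_runs
from langchain.chat_models import ChatOpenAI
from langchain_experimental.agents import create_pandas_dataframe_agent
from langsmith import Client

client = Client()
df = pd.read_csv("titanic.csv")  # assumed filename for the Titanic data
agent = create_pandas_dataframe_agent(ChatOpenAI(temperature=0), df)

question = st.text_input("Ask a question about the Titanic passengers")
if question:
    # collect_runs captures the traced run so feedback can be attached to it.
    with collect_runs() as cb:
        answer = agent.run(question)
        run_id = cb.traced_runs[0].id
    st.write(answer)
    good, bad = st.columns(2)
    if good.button("Correct"):
        client.create_feedback(run_id, key="user_score", score=1)
    if bad.button("Incorrect"):
        client.create_feedback(run_id, key="user_score", score=0)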
See data.csv for the data points we labeled.
In order to evaluate, we first upload our data to LangSmith, with the dataset name Titanic CSV. This is done in upload_data.py. You can run this with:
python upload_data.py
This allows us to track different evaluation runs against this dataset.
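upload_data.py holds the real upload logic; a minimal sketch of this kind of upload, assuming data.csv has question and answer columns (the column names and input/output keys below are assumptions), could look like this.

# Hypothetical sketch of uploading the labeled examples to LangSmith.
# upload_data.py is the real script; the "question"/"answer" column names are assumed.
import pandas as pd
from langsmith import Client

client = Client()
dataset = client.create_dataset(
    dataset_name="Titanic CSV",
    description="Labeled questions and answers about the Titanic passenger CSV.",
)
df = pd.read_csv("data.csv")
for _, row in df.iterrows():
    client.create_example(
        inputs={"input": row["question"]},
        outputs={"output": row["answer"]},
        dataset_id=dataset.id,
    )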
We then use a standard qa evaluator to evaluate whether the generated answers are correct or not.
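In code, this pairing of an agent with the qa evaluator can be sketched roughly as follows (the CSV filename and agent construction are assumptions; each of the scripts listed below presumably follows this general shape).

# Hypothetical sketch of the shared evaluation harness; each method script
# wires in its own agent or chain. The "qa" evaluator grades generated answers
# against the labeled answers in the dataset.
import pandas as pd
from langchain.chat_models import ChatOpenAI
from langchain.smith import RunEvalConfig, run_on_dataset
from langchain_experimental.agents import create_pandas_dataframe_agent
from langsmith import Client


def agent_factory():
    # Rebuild the agent for each example so runs do not share state.
    df = pd.read_csv("titanic.csv")  # assumed filename
    return create_pandas_dataframe_agent(ChatOpenAI(temperature=0), df)


run_on_dataset(
    client=Client(),
    dataset_name="Titanic CSV",
    llm_or_chain_factory=agent_factory,
    evaluation=RunEvalConfig(evaluators=["qa"]),
)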
We include scripts for evaluating a few different methods:
Pandas agent with GPT-3.5: run with python pandas_agent_gpt_35.py
Results:
Pandas agent with GPT-4: run with python pandas_agent_gpt_4.py
Results:
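The two pandas-agent scripts presumably differ mainly in which model is handed to the agent constructor; a hypothetical sketch of that one difference:

# Hypothetical: the GPT-4 script presumably swaps the model passed to the same
# pandas-agent constructor used for the GPT-3.5 run.
import pandas as pd
from langchain.chat_models import ChatOpenAI
from langchain_experimental.agents import create_pandas_dataframe_agent

df = pd.read_csv("titanic.csv")  # assumed filename
agent = create_pandas_dataframe_agent(ChatOpenAI(model="gpt-4", temperature=0), df)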
PandasAI: you need to install additional packages:
pip install beautifulsoup4 pandasai
Then run with python pandas_ai.py
Results (note that token tracking is off because this method does not use LangChain):
A custom agent equipped with a custom prompt and some custom tools (Python REPL and vectorstore): run with python custom_agent.py
Results:
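custom_agent.py defines the actual prompt and tools; purely to illustrate the ingredients named above (a Python REPL tool plus a vectorstore-backed lookup tool), a sketch under assumed names and a default prompt might look like this. It additionally assumes faiss-cpu is available for the vectorstore.

# Illustrative sketch only; custom_agent.py contains the real custom prompt and
# tools. Tool names, the vectorstore contents, and the filename are assumptions.
import pandas as pd
from langchain.agents import AgentType, initialize_agent
from langchain.agents.agent_toolkits import create_retriever_tool
from langchain.chat_models import ChatOpenAI
from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores import FAISS
from langchain_experimental.tools import PythonREPLTool

df = pd.read_csv("titanic.csv")  # assumed filename

# A small vectorstore over column names and sample values, so the agent can
# look up what is actually in the CSV before writing pandas code.
texts = [f"{col}: {df[col].dropna().unique()[:20].tolist()}" for col in df.columns]
vectorstore = FAISS.from_texts(texts, OpenAIEmbeddings())
retriever_tool = create_retriever_tool(
    vectorstore.as_retriever(),
    name="column_lookup",
    description="Look up valid column names and example values in the CSV.",
)

agent = initialize_agent(
    [PythonREPLTool(), retriever_tool],
    ChatOpenAI(model="gpt-4", temperature=0),
    agent=AgentType.OPENAI_FUNCTIONS,
)
print(agent.run("How many passengers survived?"))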