Allow input csv and output csv in experiment run #103

Merged · 7 commits · Sep 24, 2024
README.md (2 changes: 1 addition & 1 deletion)
@@ -33,7 +33,7 @@ A pre-print for this work is available on [arXiv](https://arxiv.org/abs/2408.118

The benefit of _asynchronous querying_ is that it allows for multiple requests to be sent to an API _without_ having to wait for the LLM's response, which is particularly useful to fully utilise the rate limits of an API. This is especially useful when an experiment file contains a large number of prompts and/or has several models to query. [_Asynchronous programming_](https://docs.python.org/3/library/asyncio.html) is simply a way for programs to avoid getting stuck on long tasks (like waiting for an LLM response from an API) and instead keep running other things at the same time (to send other queries).
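To illustrate the general idea, here is a minimal sketch using Python's `asyncio` (not `prompto`'s actual implementation; `fake_api_call` is a hypothetical stand-in for a real API request):

```python
import asyncio

async def fake_api_call(prompt: str) -> str:
    # Stand-in for an LLM API request: while we await the "response",
    # the event loop is free to send the other queries
    await asyncio.sleep(1)
    return f"response to: {prompt}"

async def main() -> None:
    prompts = ["prompt 1", "prompt 2", "prompt 3"]
    # Send all queries concurrently instead of waiting for each response in turn
    responses = await asyncio.gather(*(fake_api_call(p) for p in prompts))
    print(responses)

asyncio.run(main())  # finishes in about 1 second rather than 3
```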

With `prompto`, you are able to define your experiments of LLMs in a jsonl file where each line contains the prompt and any parameters to be used for a query of a model from a specific API. The library will process the experiment file and query models and store results. You are also able to query _multiple_ models from _different_ APIs in a single experiment file and `prompto` will take care of querying the models _asynchronously_ and in _parallel_.
With `prompto`, you are able to define your experiments of LLMs in a jsonl or csv file where each line/row contains the prompt and any parameters to be used for a query of a model from a specific API. The library will process the experiment file and query models and store results. You are also able to query _multiple_ models from _different_ APIs in a single experiment file and `prompto` will take care of querying the models _asynchronously_ and in _parallel_.

The library is designed to be extensible and can be used to query different models.

docs/experiment_file.md (20 changes: 20 additions & 0 deletions)
@@ -2,6 +2,8 @@

An experiment file is a [JSON Lines (jsonl)](https://jsonlines.org/) file that contains the prompts for the experiments along with any other parameters or metadata that is required for the prompt. Each line in the jsonl file is a valid JSON value that defines a particular input to the LLM for which we will obtain a response. We often refer to a single line in the jsonl file as a "`prompt_dict`" (prompt dictionary).

From `prompto` version 0.2.0 onwards, it's also possible to use `csv` files as input to the pipeline. See the [CSV input section](#csv-input) for more details.

For all models/APIs, we require the following keys in the `prompt_dict`:

* `prompt`: the prompt for the model
@@ -15,6 +17,9 @@ For all models/APIs, we require the following keys in the `prompt_dict`:

In addition, there are other optional keys that can be included in the `prompt_dict`:

* `id`: a unique identifier for the prompt
* This is a string that uniquely identifies the prompt, which is useful for tracking responses and matching them back to the original prompts
* It is not strictly required, but is often useful to have
* `parameters`: the parameter settings / generation config for the query (given as a dictionary)
* This is a dictionary that contains the parameters for the query. The parameters are specific to the model and the API being used. For example, for the Gemini API (`"api": "gemini"`), some parameters to configure are `temperature`, `max_output_tokens`, `top_p` and `top_k`, which control the generation of the response. For the OpenAI API (`"api": "openai"`), some of these parameters are named differently: for instance, the maximum number of output tokens is set using the `max_tokens` parameter, and `top_k` is not available. For Ollama (`"api": "ollama"`), the parameters are different again, e.g. the maximum number of tokens to predict is set using `num_predict`
* See the documentation for the specific API for the list of parameters that can be set and their default values
Expand All @@ -23,3 +28,18 @@ In addition, there are other optional keys that can be included in the `prompt_d
* Note that you can use parallel processing without using the "group" key, but using this key gives you full control to group the prompts in a way that makes sense for your use case. See the [specifying rate limits documentation](rate_limits.md) for more details on parallel processing

Lastly, there are other optional keys that are only available for certain APIs/models. For example, for the Gemini API, you can have a `multimedia` key which is a list of dictionaries defining the multimedia files (e.g. images/videos) to be used in the prompt to a multimodal LLM. For these, see the documentation for the specific API/model for more details.
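To make the structure concrete, here is an illustrative sketch of building an experiment file with the keys described above and writing it out as jsonl (the model names, parameter values and group label are placeholders, not recommendations):

```python
import json

# Two illustrative prompt_dicts: one with the optional "id", "parameters"
# and "group" keys, one with only the required keys
prompt_dicts = [
    {
        "id": "id-0",
        "prompt": "What is the capital of France?",
        "api": "gemini",
        "model_name": "gemini-1.5-flash",
        "parameters": {"temperature": 0.7, "max_output_tokens": 100},
        "group": "geography",
    },
    {
        "prompt": "What is the capital of Germany?",
        "api": "openai",
        "model_name": "gpt-3.5-turbo",
    },
]

# Each prompt_dict becomes one line of the jsonl experiment file
with open("experiment.jsonl", "w") as f:
    for prompt_dict in prompt_dicts:
        f.write(json.dumps(prompt_dict) + "\n")
```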

## CSV input

When using CSV input, each `prompt_dict` is defined as a row in the CSV file. The CSV file should have a header row whose columns correspond to the keys described above, with the exception of the `parameters` key: each parameter (each key in the dictionary) should have its own column in the CSV file, _prefixed with "parameters-"_. For example, if you have a `temperature` parameter in the `parameters` dictionary, you should have a column named `parameters-temperature` in the CSV file, and the values for the parameters go in the corresponding columns.

For example, the following jsonl and csv inputs are equivalent:

```json
{"id": "id-0", "prompt": "What is the capital of France?", "api": "openai", "model_name": "gpt-3.5-turbo", "parameters": {"temperature": 0.5, "max_tokens": 100}}
```

```csv
id,prompt,api,model_name,parameters-temperature,parameters-max_tokens
id-0,What is the capital of France?,openai,gpt-3.5-turbo,0.5,100
```
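If you already have a jsonl experiment file, this mapping is straightforward to apply programmatically. Below is a minimal sketch (not part of `prompto` itself) that flattens each `parameters` dictionary into `parameters-`-prefixed columns as described above:

```python
import csv
import json

def jsonl_to_csv(jsonl_path: str, csv_path: str) -> None:
    # Flatten each prompt_dict's "parameters" dictionary into
    # "parameters-<name>" keys, per the convention described above
    rows = []
    with open(jsonl_path) as f:
        for line in f:
            prompt_dict = json.loads(line)
            parameters = prompt_dict.pop("parameters", {})
            for name, value in parameters.items():
                prompt_dict[f"parameters-{name}"] = value
            rows.append(prompt_dict)

    # Use the union of keys across all rows as the CSV header;
    # DictWriter leaves cells blank for keys a given row does not have
    fieldnames = sorted({key for row in rows for key in row})
    with open(csv_path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)

jsonl_to_csv("experiment.jsonl", "experiment.csv")
```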
File renamed without changes.
@@ -0,0 +1,4 @@
24-09-2024, 09:14: Error (i=1, id=9): NotImplementedError - API unknown-api not recognised or implemented
24-09-2024, 09:14: Error (i=2, id=10): NotImplementedError - API unknown-api not recognised or implemented
24-09-2024, 09:14: Error (i=3, id=11): NotImplementedError - API unknown-api not recognised or implemented
24-09-2024, 09:14: Completed experiment: test.jsonl! Experiment processing time: 3.703 seconds, Average time per query: 1.234 seconds

This file was deleted.

This file was deleted.

@@ -1,3 +1,3 @@
{"id": 9, "prompt": ["Hello", "My name is Bob and I'm 6 years old", "How old am I next year?"], "api": "test", "model_name": "test", "parameters": {"candidate_count": 1, "max_output_tokens": 64, "temperature": 1, "top_k": 40}, "response": "This is a test response"}
{"id": 10, "prompt": ["Can you give me a random number between 1-10?", "What is +5 of that number?", "What is half of that number?"], "api": "test", "model_name": "test", "parameters": {"candidate_count": 1, "max_output_tokens": 128, "temperature": 0.5, "top_k": 40}, "response": "This is a test response"}
{"id": 11, "prompt": "How many theaters are there in London's South End?", "api": "test", "model_name": "test", "response": "This is a test response"}
{"id": 9, "prompt": ["Hello", "My name is Bob and I'm 6 years old", "How old am I next year?"], "api": "test", "model_name": "test", "parameters": {"candidate_count": 1, "max_output_tokens": 64, "temperature": 1, "top_k": 40}, "timestamp_sent": "24-09-2024-09-14-39", "response": "This is a test response"}
{"id": 10, "prompt": ["Can you give me a random number between 1-10?", "What is +5 of that number?", "What is half of that number?"], "api": "test", "model_name": "test", "parameters": {"candidate_count": 1, "max_output_tokens": 128, "temperature": 0.5, "top_k": 40}, "timestamp_sent": "24-09-2024-09-14-40", "response": "This is a test response"}
{"id": 11, "prompt": "How many theaters are there in London's South End?", "api": "test", "model_name": "test", "timestamp_sent": "24-09-2024-09-14-41", "response": "ValueError - This is a test error which we should handle and return"}
@@ -0,0 +1,2 @@
24-09-2024, 09:14: Error (i=3, id=11): ValueError - This is a test error which we should handle and return
24-09-2024, 09:14: Completed experiment: test2.jsonl! Experiment processing time: 3.615 seconds, Average time per query: 1.205 seconds

This file was deleted.

This file was deleted.
