
Commit

start updating docs
rchan26 committed Sep 10, 2024
1 parent 0b80209 commit 54cecbd
Showing 2 changed files with 11 additions and 7 deletions.
6 changes: 5 additions & 1 deletion README.md
@@ -25,7 +25,7 @@

`prompto` is a Python library which facilitates processing of experiments of Large Language Models (LLMs) stored as jsonl files. It automates _asynchronous querying of LLM API endpoints_ and logs progress.
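As a rough illustration, each line of an experiment file is a JSON object describing one query; the field names shown here (`id`, `api`, `model_name`, `prompt`, `parameters`) are assumptions based on typical usage and should be checked against the library's documented input format:

```
{"id": 0, "api": "gemini", "model_name": "gemini-1.0-pro", "prompt": "What is the capital of France?", "parameters": {"temperature": 0.5}}
{"id": 1, "api": "openai", "model_name": "gpt-4o", "prompt": "Summarise this paragraph in one sentence.", "parameters": {"temperature": 0.5}}
```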

`prompto` derives from the Italian word "_pronto_" which means "_ready_". It could also mean "_I prompt_" in Italian (if "_promptare_" was a verb meaning "_to prompt_").
`prompto` derives from the Italian word "_pronto_" which means "_ready_" (or "hello" when answering the phone). It could also mean "_I prompt_" in Italian (if "_promptare_" was a verb meaning "_to prompt_").

A pre-print for this work is available on [arXiv](https://arxiv.org/abs/2408.11847). If you use this library, please see the [citation](#citation) below. For the experiments in the pre-print, see the [system demonstration examples](./examples/system-demo/README.md).

@@ -41,6 +41,10 @@ For more details on the library, see the [documentation](./docs/README.md) where

See below for [installation instructions](#installation) and [quickstarts for getting started](#getting-started) with `prompto`.

## `prompto` for Evaluation

`prompto` can also be used as an evaluation tool for LLMs. For details on how to use `prompto` for evaluation, see the [evaluation docs](./docs/evaluation.md).

## Available APIs and Models

The library supports querying several APIs and models. The following APIs are currently supported:
12 changes: 6 additions & 6 deletions docs/commands.md
@@ -31,14 +31,14 @@ Note that if the experiment file is already in the input folder, we will not mak

### Automatic evaluation using an LLM-as-judge

It is possible to automatically run an LLM-as-judge evaluation of the responses by using the `--judge-location` and `--judge` arguments of the CLI. See the [Create judge file](#create-judge-file) section for more details on these arguments.
It is possible to automatically run an LLM-as-judge evaluation of the responses by using the `--judge-folder` and `--judge` arguments of the CLI. See the [Create judge file](#create-judge-file) section for more details on these arguments.

For instance, to run an experiment file with automatic evaluation using a judge, you can use the following command:
```
prompto_run_experiment \
--file path/to/experiment.jsonl \
--data-folder data \
--judge-location judge \
--judge-folder judge \
--judge gemini-1.0-pro
```

@@ -75,13 +75,13 @@ prompto_check_experiment \

## Create judge file

Once an experiment has been run and responses to prompts have been obtained, it is possible to use another LLM as a "judge" to score the responses. This is useful for evaluating the quality of the responses obtained from the model. To create a judge file, you can use the `prompto_create_judge` command, passing in the file containing the completed experiment and a folder (i.e. the judge location) containing the judge template and settings to use. To see all arguments of this command, run `prompto_create_judge --help`.
Once an experiment has been run and responses to prompts have been obtained, it is possible to use another LLM as a "judge" to score the responses. This is useful for evaluating the quality of the responses obtained from the model. To create a judge file, you can use the `prompto_create_judge` command, passing in the file containing the completed experiment and a folder (i.e. the judge folder) containing the judge template and settings to use. To see all arguments of this command, run `prompto_create_judge --help`.

To create a judge file for a particular experiment file, with the judge location as `./judge` and using the judge `gemini-1.0-pro`, you can use the following command:
To create a judge file for a particular experiment file, with the judge folder as `./judge` and using the judge `gemini-1.0-pro`, you can use the following command:
```
prompto_create_judge \
--experiment-file path/to/experiment.jsonl \
--judge-location judge \
--judge-folder judge \
--judge gemini-1.0-pro
```

@@ -92,7 +92,7 @@ In `judge`, you must have two files:

See for example [this judge example](./../examples/evaluation/judge/) which contains example template and settings files.
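For concreteness, a minimal judge folder might look like the following. The file names (`template.txt`, `settings.json`), the `{RESPONSE}` placeholder, and the settings fields are illustrative assumptions; check them against the linked judge example:

```
# judge/template.txt -- prompt template; {RESPONSE} is a hypothetical placeholder
# that would be filled with each completed response
Rate the quality of the following response from 1 to 10.
Response: {RESPONSE}

# judge/settings.json -- each top-level key is a judge identifier
# that can be passed to the --judge flag
{
    "gemini-1.0-pro": {
        "api": "gemini",
        "model_name": "gemini-1.0-pro",
        "parameters": {"temperature": 0}
    }
}
```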

The judge specified with the `--judge` flag should be a key in the `settings.json` file in the judge location. You can create different judge files using different LLMs as judge by specifying a different judge identifier from the keys in the `settings.json` file.
The judge specified with the `--judge` flag should be a key in the `settings.json` file in the judge folder. You can create different judge files using different LLMs as judge by specifying a different judge identifier from the keys in the `settings.json` file.

## Obtain missing results jsonl file

