diff --git a/README.md b/README.md
index 1a4e4ba0..a4092555 100644
--- a/README.md
+++ b/README.md
@@ -25,7 +25,7 @@
 
 `prompto` is a Python library which facilitates processing of experiments of Large Language Models (LLMs) stored as jsonl files. It automates _asynchronous querying of LLM API endpoints_ and logs progress.
 
-`prompto` derives from the Italian word "_pronto_" which means "_ready_". It could also mean "_I prompt_" in Italian (if "_promptare_" was a verb meaning "_to prompt_").
+`prompto` derives from the Italian word "_pronto_" which means "_ready_" (or "hello" when answering the phone). It could also mean "_I prompt_" in Italian (if "_promptare_" were a verb meaning "_to prompt_").
 
 A pre-print for this work is available on [arXiv](https://arxiv.org/abs/2408.11847). If you use this library, please see the [citation](#citation) below. For the experiments in the pre-print, see the [system demonstration examples](./examples/system-demo/README.md).
 
@@ -41,6 +41,10 @@ For more details on the library, see the [documentation](./docs/README.md) where
 
 See below for [installation instructions](#installation) and [quickstarts for getting started](#getting-started) with `prompto`.
 
+## `prompto` for Evaluation
+
+`prompto` can also be used as an evaluation tool for LLMs. For details on how to use `prompto` for evaluation, see the [evaluation docs](./docs/evaluation.md).
+
 ## Available APIs and Models
 
 The library supports querying several APIs and models. The following APIs are currently supported are:
diff --git a/docs/commands.md b/docs/commands.md
index 98034b18..a1f30ff5 100644
--- a/docs/commands.md
+++ b/docs/commands.md
@@ -31,14 +31,14 @@ Note that if the experiment file is already in the input folder, we will not mak
 
 ### Automatic evaluation using an LLM-as-judge
 
-It is possible to automatically run a LLM-as-judge evaluation of the responses by using the `--judge-location` and `--judge` arguments of the CLI. See the [Create judge file](#create-judge-file) section for more details on these arguments.
+It is possible to automatically run an LLM-as-judge evaluation of the responses by using the `--judge-folder` and `--judge` arguments of the CLI. See the [Create judge file](#create-judge-file) section for more details on these arguments.
 
 For instance, to run an experiment file with automatic evaluation using a judge, you can use the following command:
 
 ```
 prompto_run_experiment \
     --file path/to/experiment.jsonl \
     --data-folder data \
-    --judge-location judge \
+    --judge-folder judge \
     --judge gemini-1.0-pro
 ```
@@ -75,13 +75,13 @@ prompto_check_experiment \
 
 ## Create judge file
 
-Once an experiment has been ran and responses to prompts have been obtained, it is possible to use another LLM as a "judge" to score the responses. This is useful for evaluating the quality of the responses obtained from the model. To create a judge file, you can use the `prompto_create_judge` command passing in the file containing the completed experiment and to a folder (i.e. judge location) containing the judge template and settings to use. To see all arguments of this command, run `prompto_create_judge --help`.
+Once an experiment has been run and responses to prompts have been obtained, it is possible to use another LLM as a "judge" to score the responses. This is useful for evaluating the quality of the responses obtained from the model. To create a judge file, you can use the `prompto_create_judge` command, passing in the file containing the completed experiment and a folder (i.e. the judge folder) containing the judge template and settings to use. To see all arguments of this command, run `prompto_create_judge --help`.
 
-To create a judge file for a particular experiment file with a judge-location as `./judge` and using judge `gemini-1.0-pro` you can use the following command:
+To create a judge file for a particular experiment file, with the judge folder `./judge` and using the judge `gemini-1.0-pro`, you can use the following command:
 
 ```
 prompto_create_judge \
     --experiment-file path/to/experiment.jsonl \
-    --judge-location judge \
+    --judge-folder judge \
     --judge gemini-1.0-pro
 ```
@@ -92,7 +92,7 @@ In `judge`, you must have two files:
 
 See for example [this judge example](./../examples/evaluation/judge/) which contains example template and settings files.
 
-The judge specified with the `--judge` flag should be a key in the `settings.json` file in the judge location. You can create different judge files using different LLMs as judge by specifying a different judge identifier from the keys in the `settings.json` file.
+The judge specified with the `--judge` flag should be a key in the `settings.json` file in the judge folder. You can create different judge files using different LLMs as the judge by specifying a different judge identifier from the keys in the `settings.json` file.
 
 ## Obtain missing results jsonl file
 