Add option to pass in multiple template text files for LLM-as-judge eval #99

Merged · 15 commits · Sep 11, 2024
small fixes for eval docs
rchan26 committed Sep 11, 2024

commit dc07252a3d0f60789c1e44c74d205309b8ccc6f8
4 changes: 4 additions & 0 deletions docs/evaluation.md
@@ -9,6 +9,8 @@ To perform an LLM-as-judge evaluation, we essentially treat this as just _anothe

Therefore, given a _completed_ experiment file (i.e., a jsonl file where each line is a json object containing the prompt and response from a model), we can create another experiment file where the prompts are generated using some judge evaluation template and the completed response file. We must specify the model that we want to use as the judge. We call this a _judge_ experiment file and we can use `prompto` again to run this experiment and obtain the judge evaluation responses.
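As a deliberately simplified sketch of this mechanic, the snippet below builds a judge experiment file by substituting each prompt/response pair from a completed experiment into a template string. The template text, file names, judge API/model and the extra fields (`id`, `model_name`) are hypothetical placeholders; in practice the template comes from the judge folder described below.

```python
import json
from string import Template

# Hypothetical judge template; real templates live in the judge folder described below.
judge_template = Template(
    "Rate the following response to the prompt on a scale of 1-10.\n"
    "Prompt: ${prompt}\nResponse: ${response}"
)

judge_lines = []
with open("completed_experiment.jsonl") as f:  # hypothetical completed experiment file
    for line in f:
        completed = json.loads(line)
        judge_lines.append(
            {
                "id": f"judge-{completed['id']}",  # illustrative field names
                "api": "openai",                   # the judge API you choose
                "model_name": "gpt-4o",            # hypothetical judge model
                "prompt": judge_template.substitute(
                    prompt=completed["prompt"], response=completed["response"]
                ),
            }
        )

# Write the judge experiment file, ready to be run with prompto like any other experiment.
with open("judge_experiment.jsonl", "w") as f:
    for d in judge_lines:
        f.write(json.dumps(d) + "\n")
```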

Also see the [Running LLM-as-judge experiment notebook](https://alan-turing-institute.github.io/prompto/examples/evaluation/running_llm_as_judge_experiment/) for a more detailed walkthrough of using the library to create and run judge evaluations.

### Judge folder

To run an LLM-as-judge evaluation, you must first create a _judge folder_ consisting of:
@@ -128,6 +130,8 @@ def my_scorer(prompt_dict: dict) -> dict:
return prompt_dict
```

Also see the [Running experiments with custom evaluations](https://alan-turing-institute.github.io/prompto/examples/evaluation/running_experiments_with_custom_evaluations/) notebook for a more detailed walkthrough of using the library with custom scoring functions.

### Using a scorer in `prompto`

In Python, to use a scorer when processing an experiment, pass a list of scoring functions to the `Experiment.process()` method. For instance, you can use the `match` and `includes` scorers as follows:
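(A minimal sketch only: the import paths `prompto.settings`, `prompto.experiment` and `prompto.scorer`, the keyword argument name `evaluation_funcs`, and the assumption that `process()` is a coroutine are inferred from this description, so check the prompto documentation for the exact names.)

```python
import asyncio

# Assumed import locations; verify against the prompto documentation.
from prompto.settings import Settings
from prompto.experiment import Experiment
from prompto.scorer import match, includes

settings = Settings(data_folder="data")  # "data" is a placeholder folder
experiment = Experiment("experiment.jsonl", settings=settings)

async def main():
    # Pass the scoring functions when processing the experiment
    # (the keyword argument name is an assumption).
    await experiment.process(evaluation_funcs=[match, includes])

asyncio.run(main())
```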
@@ -27,12 +27,12 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## Environment setup\n",
"\n",
"In this experiment, we will use the Anthropic API, but feel free to edit the input file provided to use a different API and model.\n",
"\n",
"When using `prompto` to query models from the Anthropic API, lines in our experiment `.jsonl` files must have `\"api\": \"anthropic\"` in the prompt dict. \n",
"\n",
"## Environment variables\n",
"\n",
"For the [Anthropic API](https://alan-turing-institute.github.io/prompto/docs/anthropic/), there are two environment variables that could be set:\n",
"- `ANTHROPIC_API_KEY`: the API key for the Anthropic API\n",
"\n",