HDembinski · Jan 28, 2025 · 0be5263
1 parent 954d85c
commit 0be5263
Showing 1 changed file with 98 additions and 26 deletions.
diff --git a/posts/parsing_webpages_with_llm.ipynb b/posts/parsing_webpages_with_llm.ipynb
@@ -37,7 +37,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 3,
+   "execution_count": 1,
    "metadata": {},
    "outputs": [],
    "source": [
@@ -198,7 +198,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 12,
+   "execution_count": 2,
    "metadata": {},
    "outputs": [],
    "source": [
@@ -212,52 +212,54 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 13,
+   "execution_count": 3,
    "metadata": {},
    "outputs": [
     {
      "name": "stdout",
      "output_type": "stream",
      "text": [
-      "0.0: Roel Aaij et al., JHEP 01 (2022) 166, \"Measurement of prompt charged-particle production in pp collisions at s=13 TeV\", [10.1007/JHEP01(2022)166](https://doi.org/10.1007/JHEP01(2022)166)\n",
-      "0.1: Roel Aaij et al., JHEP 01 (2022) 166, \"Measurement of prompt charged-particle production in pp collisions at s=13 TeV\", [10.1007/JHEP01(2022)166](https://doi.org/10.1007/JHEP01(2022)166)\n",
-      "0.2: Roel Aaij et al., JHEP 01 (2022) 166, \"Measurement of prompt charged-particle production in pp collisions at s=13 TeV\", [10.1007/JHEP01(2022)166](https://doi.org/10.1007/JHEP01(2022)166)\n",
-      "1.0: Johannes Albrecht, Hans Dembinski, Anatoli Fedynitch, Karl-Heinz Kampert, Astropart.Space Sci. 367 (2022) 3, 27, \"The Muon Puzzle in cosmic-ray induced air showers and its connection to the Large Hadron Collider\", [10.1007/s10509-022-04054-5](https://doi.org/10.1007/s10509-022-04054-5)\n",
-      "1.1: Johanes Albrecht, Lorenzo Cazon, Hans Dembinski, Anatoli Fedynitch, Karl-Heinz Kampert, Astrophys.Space Sci., \"The Muon Puzzle in cosmic-ray induced air showers and its connection to the Large Hadron Collider\", [10.1007/s10509-022-04054-5](https://doi.org/10.1007/s10509-022-04054-5)\n",
-      "1.2: Johannes Albrecht et al., Astrophys.Space Sci. 367 (2022) 3, \"The Muon Puzzle in cosmic-ray induced air showers and its connection to the Large Hadron Collider\", [10.1007/s10509-022-04054-5](https://doi.org/10.1007/s10509-022-04054-5)\n",
+      "0.0: Roel Aaij, JHEP 01 (2022) 166, \"Measurement of prompt charged-particle production in pp collisions at s=13 TeV\", [10.1007/JHEP01(2022)166](https://doi.org/10.1007/JHEP01(2022)166)Note: The DOI URL is automatically generated from the DOI string and may not match the provided example exactly.\n",
+      "0.1: Roel Aaij, JHEP 01 (2022) 166, \"Measurement of prompt charged-particle production in pp collisions at s=13 TeV\", [10.1007/JHEP01(2022)166](https://doi.org/10.1007/JHEP01(2022)166).\n",
+      "0.2: Roel Aaij, Nikhef, Amsterdam et al., JHEP 01 (2022) 166, \"Measurement of prompt charged-particle production in pp collisions at s=13 TeV\", [10.1007/JHEP01(2022)166](https://doi.org/10.1007/JHEP01(2022)166).\n",
+      "1.0: Albrecht, Cazon, Dembinski, Fedynitch, Kampert, Astrophys.Space Sci. 367 (2022) 3, \"The Muon Puzzle in cosmic-ray induced air showers and its connection to the Large Hadron Collider\", [10.1007/s10509-022-04054-5](https://doi.org/10.1007/s10509-022-04054-5).\n",
+      "1.1: Albrecht et al., Astrophys. Space Sci. 367 (2022) 3, \"The Muon Puzzle in cosmic-ray induced air showers and its connection to the Large Hadron Collider\", [10.1007/s10509-022-04054-5](https://doi.org/10.1007/s10509-022-04054-5).\n",
+      "1.2: Johannes Albrecht, Lorenzo Cazon, Hans Dembinski, Anatoli Fedynitch, Karl-Heinz Kampert, Astrophys.Space Sci. 367 (2022) 3, \"The Muon Puzzle in cosmic-ray induced air showers and its connection to the Large Hadron Collider\", [10.1007/s10509-022-04054-5](https://doi.org/10.1007/s10509-022-04054-5).\n",
       "2.0: Hans Peter Dembinski, Ahmed Abdelmotteleb, Eur.Phys.J.C 82 (2022) 1043, \"A new maximum-likelihood method for template fits\", [10.1140/epjc/s10052-022-11019-z](https://doi.org/10.1140/epjc/s10052-022-11019-z).\n",
-      "2.1: Hans Peter Dembinski, Ahmed Abdelmotteleb, Eur.Phys.J.C 82 (2022) 1043, \"A new maximum-likelihood method for template fits\", [10.1140/epjc/s10052-022-11019-z](https://doi.org/10.1140/epjc/s10052-022-11019-z).\n",
-      "2.2: Hans Peter Dembinski, Ahmed Abdelmotteleb, Eur.Phys.J.C 82 (2022) 1043, \"A new maximum-likelihood method for template fits\", [10.1140/epjc/s10052-022-11019-z](https://doi.org/10.1140/epjc/s10052-022-11019-z).\n",
-      "3.0: L. Cazon et al., PoS ICRC2023 (2023) 431, \"The muon measurements of Haverah Park and their connection to the muon puzzle\", [10.22323/1.444.0431](https://doi.org/10.22323/1.444.0431).\n",
-      "3.1: L. Cazon et al., PoS ICRC2023 (2023) 431, \"The muon measurements of Haverah Park and their connection to the muon puzzle\", [10.22323/1.444.0431](https://doi.org/10.22323/1.444.0431).\n",
-      "3.2: L. Cazon et al., PoS ICRC2023 (2023) 431, \"The muon measurements of Haverah Park and their connection to the muon puzzle\", [10.22323/1.444.0431](https://doi.org/10.22323/1.444.0431).\n",
-      "4.0: Hans Dembinski, Michael Schmelling, \"Bias, variance, and confidence intervals for efficiency estimators in particle physics experiments\", [arXiv:2110.00294](https://arxiv.org/abs/2110.00294)\n",
-      "4.1: Hans Dembinski, Michael Schmelling, arXiv:2110.00294, \"Bias, variance, and confidence intervals for efficiency estimators in particle physics experiments\", [10.1007/JHEP01(2022)166](https://doi.org/10.1007/JHEP01(2022)166)\n",
-      "4.2: Hans Dembinski, Michael Schmelling, \"Bias, variance, and confidence intervals for efficiency estimators in particle physics experiments\", arXiv:2110.00294\n"
+      "2.1: Hans Peter Dembinski, Ahmed Abdelmotteleb, Eur. Phys. J. C 82 (2022) 1043, \"A new maximum-likelihood method for template fits\", [10.1140/epjc/s10052-022-11019-z](https://doi.org/10.1140/epjc/s10052-022-11019-z).\n",
+      "2.2: Hans Peter Dembinski, Ahmed Abdelmotteleb, Eur. Phys. J. C 82 (2022) 1043, \"A new maximum-likelihood method for template fits\", [10.1140/epjc/s10052-022-11019-z](https://doi.org/10.1140/epjc/s10052-022-11019-z).\n",
+      "3.0: L. Cazon, H.P. Dembinski, G. Parente, F. Riehn, A.A. Watson, PoS ICRC2023 (2023) 431, \"The muon measurements of Haverah Park and their connection to the muon puzzle\", [10.22323/1.444.0431](https://doi.org/10.22323/1.444.0431).\n",
+      "3.1: 1. L. Cazon, H.P. Dembinski, G. Parente, F. Riehn, A.A. Watson, PoS ICRC2023 (2023) 431, \"The muon measurements of Haverah Park and their connection to the muon puzzle\", [10.22323/1.444.0431](https://doi.org/10.22323/1.444.0431).\n",
+      "3.2: L. Cazon, H.P. Dembinski, G. Parente, F. Riehn, A.A. Watson, PoS ICRC2023 (2023) 431, \"The muon measurements of Haverah Park and their connection to the muon puzzle\", [10.22323/1.444.0431](https://doi.org/10.22323/1.444.0431).\n",
+      "4.0: Hans Dembinski, Michael Schmelling, arXiv:2110.00294 [stat.AP], \"Bias, variance, and confidence intervals for efficiency estimators in particle physics experiments\", [10.1007/JHEP01(2022)166](https://doi.org/10.1007/JHEP01(2022)166).\n",
+      "4.1: Hans Dembinski, Michael Schmelling, \"Bias, variance, and confidence intervals for efficiency estimators in particle physics experiments\", arXiv:2110.00294 [stat.AP], [10.1007/JHEP01(2022)166](https://doi.org/10.1007/JHEP01(2022)166).\n",
+      "4.2: Hans Dembinski, Michael Schmelling, arXiv:2110.00294, \"Bias, variance, and confidence intervals for efficiency estimators in particle physics experiments\", [10.1007/JHEP01(2022)166](https://doi.org/10.1007/JHEP01(2022)166).\n"
      ]
     }
    ],
    "source": [
     "prompt_template = \"\"\"\n",
-    "Extract the authors, title, journal info, and DOI from the text in <input> tags.\n",
+    "Extract a reference with authors, title, journal info, and DOI from the text in <input> tags.\n",
     "\n",
     "<input>\n",
     "{text}\n",
     "</input>\n",
     "\n",
-    "Return the result in Markdown format in this format:\n",
+    "Return the reference in this structure and nothing else:\n",
     "\n",
     "First and last name of first author, First and last name of second author, ..., journal reference, \"The title\", [DOI](DOI URL)\n",
     "\n",
+    "Examples of valid references:\n",
+    "\n",
+    "Roel Aaij et al., JHEP 01 (2022) 166, \"Measurement of prompt charged-particle production in pp collisions at s=13 TeV\", [10.1007/JHEP01(2022)166](https://doi.org/10.1007/JHEP01(2022)166)\n",
+    "Flavia Gesualdi et al., PoS ICRC2021 (2021) 473, \"On the muon scale of air showers and its application to the AGASA data\", [10.22323/1.395.0473](https://doi.org/10.22323/1.395.0473)\n",
+    "\n",
     "Requirements:\n",
-    "- If there are more than four authors, use `First author et al.` instead of listing all authors.\n",
-    "- The journal reference must not contain *italic* or **bold** emphasis.\n",
-    "- The list of authors must be author names only separated by commas.\n",
-    "- Convert LaTeX formulas into equivalent plain text.\n",
     "\n",
-    "Examples that pass the check:\n",
-    "- Roel Aaij et al., JHEP 01 (2022) 166, \"Measurement of prompt charged-particle production in pp collisions at s=13 TeV\", [10.1007/JHEP01(2022)166](https://doi.org/10.1007/JHEP01(2022)166)\n",
-    "- Flavia Gesualdi et al., PoS ICRC2021 (2021) 473, \"On the muon scale of air showers and its application to the AGASA data\", [10.22323/1.395.0473](https://doi.org/10.22323/1.395.0473)\n",
+    "- If there are more than four authors, use `<first author> et al.` instead of listing all authors. Replace <first author> with the actual name of the author.\n",
+    "- The list of authors must be author names only separated by commas.\n",
+    "- Remove any *italic* or **bold** emphasis from the reference.\n",
+    "- Convert any LaTeX code in the title with an equivalent plain text description.\n",
     "\n",
     "The extracted reference:\n",
     "\"\"\"\n",
@@ -296,6 +298,76 @@
     "One technique to improve the output of a flawed model is to let the model critique its previous output and suggest improvements. I tried this, but this small model is unable to critique itself; it always accepts its own answer, even if it does not adhere to the requested format."
    ]
   },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Update: Using distilled DeepSeek R1\n",
+    "\n",
+    "[DeepSeek R1](https://arxiv.org/abs/2501.12948) is a new open-source reasoning model that uses test-time compute to improve its reasoning, like OpenAI's o1 model. In essence, it has the chain-of-thought prompting technique built in and generates a thinking process before answering. In several benchmarks, it reaches the same performance as o1.\n",
+    "\n",
+    "The full DeepSeek R1 model with over 600b parameters is too large to run locally, but the authors provide small distilled models. Here we use one that is based on Llama and has 8b parameters. I also tried another version based on Qwen2 with 14b parameters, but it consistently crashes the Ollama server after a while.\n",
+    "\n",
+    "To use it for our task, we need to skip over the thinking process and keep only the final output. We expect the model to be better at following our instructions, especially the rule about shortening long lists of authors."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 5,
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "0.0: Roel Aaij et al., JHEP 01 (2022) 166, \"Measurement of prompt charged-particle production in pp collisions at s=13 TeV\", [10.1007/JHEP01(2022)166](https://doi.org/10.1007/JHEP01(2022)166)\n",
+      "0.1: Roel Aaij et al., *JHEP* 01 (2022) 166, \"Measurement of prompt charged-particle production in pp collisions at s=13 TeV\", [10.1007/JHEP01(2022)166](https://doi.org/10.1007/JHEP01(2022)166)\n",
+      "0.2: Roel Aaij et al., JHEP 01 (2022) 166, \"Measurement of prompt charged-particle production in pp collisions at √s = 13 TeV\", [10.1007/JHEP01(2022)166](https://doi.org/10.1007/JHEP01(2022)166)\n",
+      "1.0: Johannes Albrecht et al., Astrophys. Space Sci. 367 (2022) 3, 27, \"The Muon Puzzle in cosmic-ray induced air showers and its connection to the Large Hadron Collider\", [10.1007/s10509-022-04054-5](https://doi.org/10.1007/s10509-022-04054-5)\n",
+      "1.1: Johannes Albrecht et al., Astrophys. Space Sci. 367 (2022) 3, 27, \"The Muon Puzzle in cosmic-ray induced air showers and its connection to the Large Hadron Collider\", [10.1007/s10509-022-04054-5](https://doi.org/10.1007/s10509-022-04054-5)\n",
+      "1.2: Johannes Albrecht et al., *Astrophys.Space Sci.* 367 (2022) 3, 27, \"The Muon Puzzle in cosmic-ray induced air showers and its connection to the Large Hadron Collider\", [10.1007/s10509-022-04054-5](https://doi.org/10.1007/s10509-022-04054-5)\n",
+      "2.0: Hans Peter Dembinski and Ahmed Abdelmotteleb, *Eur.Phys.J.C* 82 (2022) 11, 1043, \"A new maximum-likelihood method for template fits\", [10.1140/epjc/s10052-022-11019-z](https://doi.org/10.1140/epjc/s10052-022-11019-z)\n",
+      "2.1: Hans Peter Dembinski, Ahmed Abdelmotteleb, *Eur. Phys. J. C* 82 (2022) 11, 1043, \"A new maximum-likelihood method for template fits\", [10.1140/epjc/s10052-022-11019-z](https://doi.org/10.1140/epjc/s10052-022-11019-z)\n",
+      "2.2: Hans Peter Dembinski and Ahmed Abdelmotteleb, *Eur. Phys. J. C* 82 (2022) 11, 1043, \"A new maximum-likelihood method for template fits\", [10.1140/epjc/s10052-022-11019-z](https://doi.org/10.1140/epjc/s10052-022-11019-z)\n",
+      "3.0: L. Cazon et al., PoS ICRC2023 (2023) 431, \"The muon measurements of Haverah Park and their connection to the muon puzzle\", [10.22323/1.444.0431](https://doi.org/10.22323/1.444.0431)\n",
+      "3.1: L. Cazon et al., \"The muon measurements of Haverah Park and their connection to the muon puzzle,\" [PoS ICRC2023 (2023) 431](https://doi.org/10.22323/1.444.0431), [10.22323/1.444.0431](//doi.org/10.22323/1.444.0431)\n",
+      "3.2: L. Cazon et al., PoS ICRC2023 (2023) 431, \"The muon measurements of Haverah Park and their connection to the muon puzzle\", [10.22323/1.444.0431](https://doi.org/10.22323/1.444.0431)\n",
+      "4.0: Dembinski H. and Schmelling M., JHEP 11 (2021) 123, \"Bias, variance, and confidence intervals for efficiency estimators in particle physics experiments\", [10.1007/JHEP11(2021)123](https://doi.org/10.1007/JHEP11(2021)123)\n",
+      "4.1: Hans Dembinski and Michael Schmelling, \"Bias, variance, and confidence intervals for efficiency estimators in particle physics experiments\", [arXiv:2110.00294](https://arxiv.org/abs/2110.00294)\n",
+      "4.2: Hans Dembinski and Michael Schmelling, \"Bias, variance, and confidence intervals for efficiency estimators in particle physics experiments\", [arXiv:2110.00294](https://doi.org/arXiv:2110.00294)\n"
+     ]
+    }
+   ],
+   "source": [
+    "KEYWORD = \"</think>\"\n",
+    "\n",
+    "for idoc, doc in enumerate(documents):\n",
+    "    # strip the bibliography block\n",
+    "    d = doc[:doc.index(\"###\")]\n",
+    "    prompt = prompt_template.format(text=d)\n",
+    "    for trial in range(3):\n",
+    "        # a low temperature is recommended by the authors\n",
+    "        response = ollama.generate(model='deepseek-r1:8b', prompt=prompt, options={\"temperature\": 0.3, \"seed\": trial})\n",
+    "        # skip the thinking part\n",
+    "        raw = response.response\n",
+    "        text = raw[raw.index(KEYWORD) + len(KEYWORD):].strip()\n",
+    "        print(f\"{idoc}.{trial}: {text}\")"
+   ]
+  },
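Note on the cell above: `raw.index(KEYWORD)` raises `ValueError` if a response happens to contain no `</think>` tag. The following is a minimal sketch of a more defensive variant, not the notebook's own code. The helper name `strip_thinking` is hypothetical, and unlike the original it splits at the last occurrence of the tag rather than the first.

```python
KEYWORD = "</think>"


def strip_thinking(raw: str, keyword: str = KEYWORD) -> str:
    """Return the text after the last `keyword`, or all of `raw` if it is absent.

    Hypothetical helper for illustration; not part of the notebook.
    """
    # str.rpartition returns ('', '', raw) when the keyword is missing,
    # so the last element is always a usable answer string.
    _, _, answer = raw.rpartition(keyword)
    return answer.strip()
```

Inside the loop, `text = strip_thinking(response.response)` would then replace the slicing expression.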
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "The model is indeed better at counting the authors and appropriately replacing long author lists with the \"et al.\" form. It even managed once to follow the instruction to convert LaTeX code into a proper text description by replacing `\\sqrt{s}` with `√s`, which is quite impressive. Overall, the output is more consistent than with `llama3-chatqa`, but the model still makes a few minor and major mistakes.\n",
+    "\n",
+    "- It occasionally produces an invalid URL, omitting the `https:` prefix. `llama3-chatqa` never makes that mistake.\n",
+    "- It does not always follow the instruction to remove all emphasis markup from the reference. That can be rectified by post-processing in this case.\n",
+    "- In one case, it swapped the order of title and journal.\n",
+    "- Like `llama3-chatqa`, it usually hallucinates a journal and DOI for the last paper, which was only released on arXiv, but it occasionally gets it right.\n",
+    "\n",
+    "In conclusion, the distilled `deepseek-r1` model performs slightly better at this task than `llama3-chatqa`, even though both use the same architecture, at the cost of using 6x more compute. It would be interesting to see whether a less quantized version performs better, in which case the errors could be attributed to \"noise\" in the reasoning process, or whether the main issue is the limited attention capabilities of 8b models, which have fewer attention blocks than larger models."
+   ]
+  },
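Two of the mistakes listed above, leftover emphasis markup and links missing the `https:` prefix, look mechanical enough to repair after generation. The following is a rough sketch of such a post-processing step, assuming the reference is a single line like the outputs shown. The function name `clean_reference` is hypothetical, and the regular expressions would also strip a literal pair of asterisks that legitimately belongs to a title.

```python
import re


def clean_reference(ref: str) -> str:
    """Strip Markdown emphasis and repair scheme-less links (illustrative only)."""
    # Remove **bold** first so that the italic pattern does not see its asterisks.
    ref = re.sub(r"\*\*(.+?)\*\*", r"\1", ref)
    # Remove *italic* emphasis, keeping the enclosed text.
    ref = re.sub(r"\*(.+?)\*", r"\1", ref)
    # Turn protocol-relative link targets, ](//host/...), into https links.
    ref = ref.replace("](//", "](https://")
    return ref
```

This leaves the other two problems untouched: a swapped title and journal, or a hallucinated DOI, cannot be repaired by string substitution.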
   {
    "cell_type": "code",
    "execution_count": null,