Hello, I need help with this prompt generated by Kor, which is struggling to extract data from a small portion of text. Is there a way to improve it please? I added just one example, but it doesn't seem sufficient!

Prompt:

Your goal is to extract structured information from the user's input that matches the form described below. When extracting information please make sure it matches the type information exactly. Do not add any attributes that do not appear in the schema shown below.

```TypeScript
information: { // Extracting chemichal informations for an fds file. Please filter and remove duplicate items.
 substances: Array<{ // The substances (CAS number, EC number, name).
  name: string //
  ec: string //
  cas: string //
 }>
}
```

Please output the extracted information in JSON format. Do not output anything except for the extracted information. Do not add any clarifying information. Do not add any fields that are not in the schema. If the text contains attributes that do not appear in the schema, please ignore them. All output must be in JSON format and follow the schema specified above. Wrap the JSON in tags.

Input: Identification (CE) 1272/2008 67/548/CEE Nota % INDEX: 601-004-00-0 GHS02, GHS04 F+ C 2.5 <= x % < 10 CAS: 106-97-8 Dgr F+;R12 [1] EC: 203-448-7 Flam. Gas 1, H220 REACH: 01-2119474691-32-xxx x BUTANE (CONTENANT MOINS DE 0.1% DE BUTADIÈNE)

Output: {"information": {"substances": [{"substances": {"name": "BUTANE (CONTENANT MOINS DE 0.1% DE BUTADIÈNE)", "cas": "106-97-8", "ec": "203-448-7"}}]}}

Input: ["SECTION 3 : COMPOSITION/INFORMATIONS SUR LES COMPOSANTS\n3.2. Mélanges\nComposition :\nIdentification (CE) 1272/2008 67/548/CEE Nota %\nINDEX: 601-004-00-0 GHS02, GHS04 F+ C 2.5 <= x % < 10\nCAS: 106-97-8 Dgr F+;R12 [1]\nEC: 203-448-7 Flam. Gas 1, H220\nREACH:\n01-2119474691-32-xxx\nx\nBUTANE (CONTENANT\nMOINS DE 0.1% DE\nBUTADIÈNE)\nMade under licence of European Label System, Software of INFODYNE ( )\nQuick-FDS [17972-36332-18464-010285] - 2017-03-15 - 10:05:32IMPEC AEROSOL - 3.5\nINDEX: 603-117-00-0 GHS02, GHS07 Xi,F [1] 1 <= x % < 2.5\nCAS: 67-63-0 Dgr Xi;R36\nEC: 200-661-7 Flam. Liq. 2, H225 F;R11\nREACH: Eye Irrit. 2, H319 R67\n01-2119457558-25 STOT SE 3, H336\nPROPANE-2-OL\nINDEX: 007-010-00-4 GHS03, GHS06, GHS09 T,N,O 0 <= x % < 1\nCAS: 7632-00-0 Dgr T;R25\nEC: 231-555-9 Ox. Sol. 3, H272 N;R50\nREACH: Acute Tox. 3, H301 O;R8\n01-2119471836-27 Aquatic Acute 1, H400\nM Acute = 1\nNITRITE DE SODIUM\nINDEX: 601-029-00-7 GHS02, GHS07, GHS09 Xi,N 0 <= x % < 1\nCAS: 5989-27-5 Wng Xi;R38-R43\nEC: 227-813-5 Flam. Liq. 3, H226 N;R50/53\nREACH: Skin Irrit. 2, H315 R10\n01-2119529223-47-xxx Skin Sens. 1, H317\nx Aquatic Acute 1, H400\nM Acute = 1\n(R)-P-MENTHA-1,8-DIE Aquatic Chronic 1,\nNE H410\nM Chronic = 1\nInformations sur les composants :\n[1] Substance pour laquelle il existe des valeurs limites d'exposition sur le lieu de travail.\n"]

Output:

Result:
Expected Result:
The EC and CAS codes of the first substance are correct; the rest are not. Any ideas on how to fix this would be much appreciated.
Replies: 1 comment
Please follow: https://eyurtsev.github.io/kor/guidelines.html

The general expectation is that there will be errors. Using a better LLM, a fine-tuned LLM, or more examples, or specifying `input_formatter`, will all help reduce the number of errors.
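
One way to act on this advice, sketched under assumptions rather than taken from Kor itself: split the composition table into one chunk per substance before extraction, and cross-check the CAS and EC numbers the model returns against a plain regex pass. The helpers below (`split_substance_blocks`, `identifiers`, `cas_checksum_ok`) are hypothetical names, and the sketch assumes each substance entry starts with an `INDEX:` line and that the text contains real newlines rather than escaped `\n` sequences.

```python
import re
from typing import Dict, List, Optional

# Hypothetical helpers, not part of Kor: they only pre-process the SDS text
# and cross-check the identifiers that an LLM returns.

CAS_RE = re.compile(r"CAS:\s*(\d{2,7}-\d{2}-\d)")
EC_RE = re.compile(r"EC:\s*(\d{3}-\d{3}-\d)")


def split_substance_blocks(text: str) -> List[str]:
    """Return one chunk per substance.

    Assumes every substance entry starts with "INDEX:", as in the sample
    input from the question.
    """
    blocks = re.findall(r"INDEX:.*?(?=INDEX:|\Z)", text, flags=re.DOTALL)
    return [block.strip() for block in blocks]


def identifiers(block: str) -> Dict[str, Optional[str]]:
    """Pull the CAS and EC numbers out of one block with plain regexes."""
    cas = CAS_RE.search(block)
    ec = EC_RE.search(block)
    return {
        "cas": cas.group(1) if cas else None,
        "ec": ec.group(1) if ec else None,
    }


def cas_checksum_ok(cas: str) -> bool:
    """Check the CAS check digit: weighted digit sum modulo 10."""
    digits = cas.replace("-", "")
    body, check = digits[:-1], int(digits[-1])
    # Weights start at 1 for the digit next to the check digit and grow leftwards.
    total = sum(int(d) * w for w, d in enumerate(reversed(body), start=1))
    return total % 10 == check


# Two entries copied from the input in the question, with real newlines.
sample = (
    "INDEX: 601-004-00-0 GHS02, GHS04 F+ C 2.5 <= x % < 10\n"
    "CAS: 106-97-8 Dgr F+;R12 [1]\n"
    "EC: 203-448-7 Flam. Gas 1, H220\n"
    "INDEX: 603-117-00-0 GHS02, GHS07 Xi,F [1] 1 <= x % < 2.5\n"
    "CAS: 67-63-0 Dgr Xi;R36\n"
    "EC: 200-661-7 Flam. Liq. 2, H225 F;R11\n"
)

blocks = split_substance_blocks(sample)
found = [identifiers(block) for block in blocks]
```

Each block could then be passed to the extraction chain on its own, so the model never has to pair identifiers across several substances at once, and any returned CAS number that disagrees with `found` or fails `cas_checksum_ok` can be flagged for review.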