Please download this PDF file with detailed instructions to run your own LLMs locally:
Teach Yourself to Run LLMs (Large Language Models) on your local computer. Free.
This repository contains details, code, and sample data to help you learn how to build clinical trial-specific Large Language Models ("LLMs") and then run them on your local computer.
This work is done in collaboration with Jan Philip Göpfert.
This presentation provides context to the materials in this repository.
"Clinical trial-specific LLM to auto-generate Protocols and SAPs." ("SAP" stands for Statistical Analysis Plan.) Here's the full presentation file for download.
Summary: "TrialTwin is building a software platform with Natural Language Generation (“NLG”) capabilities using Natural Language Processing (“NLP”) and other software-driven linguistic processors. Our platform will programmatically extract (and encode) both meaning and context from massive amounts of Open Data into a Domain-specific Large Language Model (“LLM”). Our LLM will then be able to programmatically generate highly-realistic and domain-specific new content. Build your own trial-specific LLM, here are the pieces."
[Tuesday 04 June 2024] The Python files will be posted this week.
Please download the PDF file at the top of this file for detailed instructions.
You can download sample Open Data to use with the LLMs. The Open Data comes from two US government sources:
- ClinicalTrials.gov
- Food and Drug Administration
The ClinicalTrials.gov website provides information about 400,000 clinical trials worldwide.
These files include sample PDFs downloaded from ClinicalTrials.gov:
- 10 actual PDFs
- Pre-generated indices for those 10 PDFs.
Each SQLite3 file contains details of the clinical trials sponsored by one company. The files also contain the full text of individual Protocols, Statistical Analysis Plans ("SAP"), and Informed Consent Forms ("ICF").
Download the free, multi-OS DB Browser for SQLite to open and query SQLite3 database files.
Sponsor | Download SQLite3 file |
---|---|
Abbott | 04 MB |
Abbvie | 08.2 MB |
AstraZeneca | 17.4 MB |
Bayer | 06.1 MB |
Bristol Myers Squibb | 16.5 MB |
Johnson & Johnson | 04.2 MB |
Pfizer | 18.4 MB |
Roche | 18.3 MB |
Sanofi | 09.8 MB |
Table 1: SQLite3 files
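If you prefer to inspect one of the SQLite3 files in Table 1 from Python rather than from DB Browser for SQLite, the following is a minimal sketch using only the standard library. The table names inside each file are not documented here, so the sketch discovers them from `sqlite_master` instead of assuming a schema. The function name `list_tables` and the file name in the usage comment are hypothetical, chosen only for illustration.

```python
import sqlite3
from pathlib import Path


def list_tables(db_path):
    """Return {table_name: row_count} for every user table in a SQLite3 file.

    Hypothetical helper: the schema of the downloadable files is not
    documented in this README, so tables are discovered at run time.
    """
    path = Path(db_path).resolve()
    if not path.is_file():
        raise FileNotFoundError(path)

    # Open the file read-only so the downloaded database is never modified.
    conn = sqlite3.connect(path.as_uri() + "?mode=ro", uri=True)
    try:
        names = [
            row[0]
            for row in conn.execute(
                "SELECT name FROM sqlite_master "
                "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' "
                "ORDER BY name"
            )
        ]
        counts = {}
        for name in names:
            # Escape embedded double quotes so the identifier stays valid.
            quoted = '"' + name.replace('"', '""') + '"'
            counts[name] = conn.execute(
                f"SELECT COUNT(*) FROM {quoted}"
            ).fetchone()[0]
        return counts
    finally:
        conn.close()


# Usage (assumed file name; use the name of the file you downloaded):
#     for table, rows in list_tables("pfizer.sqlite3").items():
#         print(table, rows)
```

Once you know the table names, you can run ordinary `SELECT` queries through the same `sqlite3` connection pattern.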
The Drugs@FDA website "... includes most of the drug products approved since 1939. The majority of patient information, labels, approval letters, reviews, and other information are available for drug products approved since 1998."
Here you can download pre-generated index files for 100 records and 500 records of the Drugs@FDA dataset.
Contact José C. Lacal at [email protected] with any questions or suggestions for improvement.
Connect with me and follow me on LinkedIn.
We help Life Sciences companies accelerate their data management processes.
Please take a look at our capabilities.