Building, running, and testing LLMs on your local computer.

JLacal/local-llm

Running Clinical Trial-specific LLMs Locally

Please download this PDF file with detailed instructions to run your own LLMs locally:

Teach Yourself to Run LLMs (Large Language Models) in your local computer. Free.

This repository contains details, code, and sample data to help you learn how to build clinical trial-specific Large Language Models ("LLMs"). You will then be able to run the LLM on your local computer.

This work is done in collaboration with Jan Philip Göpfert.

PHUSE SDE presentation, May 30 2024.

This presentation provides context to the materials in this repository.

"Clinical trial-specific LLM to auto-generate Protocols and SAPs." ("SAP" stands for Statistical Analysis Plan.) Here's the full presentation file for download.

Summary: "TrialTwin is building a software platform with Natural Language Generation (“NLG”) capabilities using Natural Language Processing (“NLP”) and other software-driven linguistic processors. Our platform will programmatically extract (and encode) both meaning and context from massive amounts of Open Data into a Domain-specific Large Language Model (“LLM”). Our LLM will then be able to programmatically generate highly-realistic and domain-specific new content. Build your own trial-specific LLM, here are the pieces."

Python Code

[Tuesday 04 June 2024] The Python files will be posted this week.

Please download the PDF file at the top of this file for detailed instructions.

Tutorial_01.py

Tutorial_02.py

Sample Data

You can download sample Open Data to use with the LLMs. The Open Data comes from two US government sources:

  • ClinicalTrials.gov
  • Food and Drug Administration

ClinicalTrials.gov

The ClinicalTrials.gov website provides information about 400,000 clinical trials worldwide.

These files include sample PDFs downloaded from ClinicalTrials.gov:

Each SQLite3 file contains details of the clinical trials sponsored by one company. The files also contain the full text of individual Protocols, Statistical Analysis Plans ("SAP"), and Informed Consent Forms ("ICF").

Download the free, multi-OS DB Browser for SQLite to open and query SQLite3 database files.
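If you prefer to inspect a downloaded file from Python instead of a GUI, the sketch below lists the tables in a SQLite3 file and counts the rows in each. It uses only Python's built-in `sqlite3` module. The table layout of the sample files is not documented here, so the code discovers table names from `sqlite_master` rather than assuming any schema. The function name `list_tables` and the file name in the usage comment are hypothetical.

```python
import sqlite3
from pathlib import Path


def list_tables(db_path):
    """Return {table_name: row_count} for every user table in a SQLite3 file."""
    # Open read-only through a file: URI so that a mistyped path raises an
    # error instead of silently creating a new, empty database.
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    try:
        # sqlite_master is SQLite's built-in catalog of schema objects.
        names = [
            row[0]
            for row in conn.execute(
                "SELECT name FROM sqlite_master "
                "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' "
                "ORDER BY name"
            )
        ]
        counts = {}
        for name in names:
            # Table names cannot be bound as parameters, so quote them instead.
            quoted = '"' + name.replace('"', '""') + '"'
            counts[name] = conn.execute(
                f"SELECT COUNT(*) FROM {quoted}"
            ).fetchone()[0]
        return counts
    finally:
        conn.close()


# Usage (hypothetical file name; substitute the file you downloaded):
#     for table, rows in list_tables("sponsor_sample.sqlite3").items():
#         print(table, rows)
```

Once you know the table names, you can run ordinary `SELECT` queries against them with the same connection pattern.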

SQLite3 database file sample

| Sponsor | Download SQLite3 file |
| --- | --- |
| Abbott | 4 MB |
| Abbvie | 8.2 MB |
| AstraZeneca | 17.4 MB |
| Bayer | 6.1 MB |
| Bristol Myers Squibb | 16.5 MB |
| Johnson & Johnson | 4.2 MB |
| Pfizer | 18.4 MB |
| Roche | 18.3 MB |
| Sanofi | 9.8 MB |

Table 1: SQLite3 files

Drugs@FDA

The Drugs@FDA website "... includes most of the drug products approved since 1939. The majority of patient information, labels, approval letters, reviews, and other information are available for drug products approved since 1998."

Here you can download pre-generated index files for 100 records and 500 records of the Drugs@FDA dataset.

Contact Me

Contact José C. Lacal ([email protected]) with any questions or suggestions for improvement.

Connect with me and follow me on LinkedIn.

Support our work!

We help Life Sciences companies accelerate their data management processes.

Please take a look at our capabilities.
