AJFOWLER/MedicalCorpusMaker

MedicalCorpusMaker

Create a custom medical corpus using the PubMed API.

This was created to support building a custom corpus for machine learning / natural language processing.

It works by searching PubMed, a large medical index, for custom search terms.

The abstracts are then collated and returned either as the whole corpus or as a Counter object holding every term and its frequency. Set count_list to True to get the Counter (the default is False).
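To make the two return shapes concrete, here is a minimal sketch using only the standard library. The `collate` helper, the sample abstracts, and the lowercase word tokenisation are assumptions for illustration, not necessarily how MedicalCorpusMaker does it internally.

```python
import re
from collections import Counter

# Hypothetical stand-ins for abstracts returned by a PubMed search.
sample_abstracts = [
    "Renal function declines with age.",
    "Muscle mass and renal function are related.",
]


def collate(abstracts, count_list=False):
    """Join abstracts into one corpus, or count term frequencies.

    The tokenisation (lowercase runs of letters and digits) is an
    assumption made for this sketch.
    """
    corpus = " ".join(abstracts)
    if not count_list:
        return corpus
    tokens = re.findall(r"[a-z0-9]+", corpus.lower())
    return Counter(tokens)


corpus = collate(sample_abstracts)                   # one long string
counts = collate(sample_abstracts, count_list=True)  # Counter of terms
print(counts.most_common(2))
```

A Counter is convenient here because it behaves like a dictionary and also offers `most_common()` for ranking terms.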

An email address must be provided to enable API searching of PubMed via Biopython.

Multiple search terms can be used, passed as separate arguments separated by commas.
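For context, this is roughly how an email address and a search term are handed to PubMed through Biopython's `Bio.Entrez` module. It is a sketch under stated assumptions: `fetch_abstracts`, `fetch_all`, `retmax`, and the `entrez` parameter are hypothetical names, and searching each term separately before joining the results is a guess at the behaviour, not a description of this repository's code.

```python
def fetch_abstracts(email, term, retmax=20, entrez=None):
    """Return plain-text abstracts for one search term.

    `retmax` and `entrez` are illustrative parameters; `entrez` lets a
    stand-in object replace Bio.Entrez so the function can run offline.
    """
    if entrez is None:
        from Bio import Entrez as entrez  # requires Biopython
    entrez.email = email  # NCBI asks API users to identify themselves

    # Step 1: search PubMed for the IDs of matching articles.
    handle = entrez.esearch(db="pubmed", term=term, retmax=retmax)
    record = entrez.read(handle)
    handle.close()
    ids = list(record["IdList"])
    if not ids:
        return ""

    # Step 2: fetch the abstracts for those IDs as plain text.
    handle = entrez.efetch(db="pubmed", id=",".join(ids),
                           rettype="abstract", retmode="text")
    text = handle.read()
    handle.close()
    return text


def fetch_all(email, *terms, **kwargs):
    """Fetch each term separately and join the results (an assumption)."""
    return "\n".join(fetch_abstracts(email, t, **kwargs) for t in terms)


# Needs network access and Biopython:
# corpus = fetch_all("you@example.com", "renal", "muscle")
```

Importing Biopython inside the function keeps the sketch importable on machines where it is not installed.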

make_corpus(email, *terms, count_list=False)

Example use:

email = 'you@example.com'  # placeholder: use your own email address
term_1 = 'renal'
term_2 = 'muscle'

corpus = make_corpus(email, term_1, term_2)
# returns the whole corpus

corpus_counted = make_corpus(email, term_1, term_2, count_list=True)
# returns a Counter object

The corpus can then be used in, for example, a custom text splitter.
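One possible reading of "custom text splitter" is a word segmenter that breaks unspaced text into the most likely sequence of corpus terms. The sketch below assumes that reading; `split_text`, `max_word_len`, and the example counts are hypothetical and not part of this repository.

```python
import math
from collections import Counter


def split_text(text, counts, max_word_len=20):
    """Split an unspaced string into the most likely known words.

    Uses dynamic programming over log word frequencies taken from a
    Counter such as the one returned with count_list=True.
    """
    total = sum(counts.values())
    # best[i] = (score, start index of the last word) for text[:i]
    best = [(0.0, 0)] + [(-math.inf, 0)] * len(text)
    for end in range(1, len(text) + 1):
        for start in range(max(0, end - max_word_len), end):
            freq = counts.get(text[start:end], 0)
            if freq > 0 and best[start][0] > -math.inf:
                score = best[start][0] + math.log(freq / total)
                if score > best[end][0]:
                    best[end] = (score, start)
    if best[-1][0] == -math.inf:
        return [text]  # no full segmentation found; return input unchanged
    # Walk the back-pointers from the end to recover the words.
    words, end = [], len(text)
    while end > 0:
        start = best[end][1]
        words.append(text[start:end])
        end = start
    return words[::-1]


# Made-up frequencies standing in for a counted corpus.
counts = Counter({"renal": 5, "muscle": 3, "function": 4, "re": 1, "nal": 1})
print(split_text("renalmusclefunction", counts))
```

A domain-specific corpus helps here because medical terms that are rare in general English get realistic frequencies.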

Changes and pull requests are welcome.

Requirements:
Biopython
