14 changes: 14 additions & 0 deletions medcat-service/README.md
@@ -389,3 +389,17 @@ The main settings that can be used to improve the performance when querying larg
MedCAT parameters are defined in the selected `envs/medcat*` file.

For details on available MedCAT parameters please refer to [the official GitHub repository](https://github.com/CogStack/cogstack-nlp/blob/main/medcat-v2/).

## Local development

For local development, set up a Python virtual environment, install dependencies with pip, and make sure to also install the local MedCAT core library (the `medcat-v2` folder) in editable mode.

```bash
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt -r requirements-dev.txt
SETUPTOOLS_SCM_PRETEND_VERSION="2.4.0-dev0" pip install -e "../medcat-v2[meta-cat,spacy]"
bash start_service_debug.sh

# Service will run on localhost:8000
```
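
Once the service is up, a quick smoke test can confirm that it responds. The sketch below uses only the Python standard library. It assumes the service exposes a `POST /api/process` endpoint that accepts a body of the form `{"content": {"text": ...}}` and that it listens on port 8000 as noted above; the helper names `build_process_request` and `send` are illustrative and not part of the service.

```python
import json
import urllib.request

# Assumption: the debug service listens on this address (see the note above).
BASE_URL = "http://localhost:8000"


def build_process_request(text: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build (but do not send) a POST request for the assumed /api/process endpoint."""
    payload = json.dumps({"content": {"text": text}}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/api/process",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 30.0) -> dict:
    """Send the request and decode the JSON response; needs the service to be running."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read())
```

With the service running, `send(build_process_request("John had been diagnosed with acute Kidney Failure"))` should return the annotation result as a dictionary. Adjust the URL if a non-empty root path is configured.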
2 changes: 1 addition & 1 deletion medcat-service/medcat_service/config.py
@@ -38,7 +38,7 @@ class Settings(BaseSettings):
     )

     app_root_path: str = Field(
-        default="/",
+        default="",
         description="The Root Path for the FastAPI App",
         examples=["/medcat-service"],
     )
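
The diff does not state the motivation for this change. One plausible reading, sketched below, is that a root path is prepended to route paths, so a value of `/` would produce a doubled slash while an empty string leaves routes untouched. The helper `public_url` is hypothetical and only models plain concatenation, and the route `/api/info` is used purely for illustration; neither is taken from the service code.

```python
def public_url(root_path: str, route: str) -> str:
    """Hypothetical helper: prefix a route with the application root path."""
    return f"{root_path}{route}"


print(public_url("/", "/api/info"))                # //api/info
print(public_url("", "/api/info"))                 # /api/info
print(public_url("/medcat-service", "/api/info"))  # /medcat-service/api/info
```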
121 changes: 121 additions & 0 deletions medcat-service/medcat_service/demo/demo_content.py
@@ -0,0 +1,121 @@

short_example = "John had been diagnosed with acute Kidney Failure the week before"


long_example = """Description: Intracerebral hemorrhage (very acute clinical changes occurred immediately).
CC: Left hand numbness on presentation; then developed lethargy later that day.

HX: On the day of presentation, this 72 y/o RHM suddenly developed generalized weakness and lightheadedness, and could not rise from a chair. Four hours later he experienced sudden left hand numbness lasting two hours. There were no other associated symptoms except for the generalized weakness and lightheadedness. He denied vertigo.

He had been experiencing falling spells without associated LOC up to several times a month for the past year.

MEDS: procardia SR, Lasix, Ecotrin, KCL, Digoxin, Colace, Coumadin.

PMH: 1)8/92 evaluation for presyncope (Echocardiogram showed: AV fibrosis/calcification, AV stenosis/insufficiency, MV stenosis with annular calcification and regurgitation, moderate TR, Decreased LV systolic function, severe LAE. MRI brain: focal areas of increased T2 signal in the left cerebellum and in the brainstem probably representing microvascular ischemic disease. IVG (MUGA scan)revealed: global hypokinesis of the LV and biventricular dysfunction, RV ejection Fx 45% and LV ejection Fx 39%. He was subsequently placed on coumadin severe valvular heart disease), 2)HTN, 3)Rheumatic fever and heart disease, 4)COPD, 5)ETOH abuse, 6)colonic polyps, 7)CAD, 8)CHF, 9)Appendectomy, 10)Junctional tachycardia.
""" # noqa: E501

article_footer = """
## Disclaimer
This software is intended solely for testing purposes and non-commercial use. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED.

[email protected] for more information.

Please note this is a limited version of MedCAT and it is not trained or validated by clinicians.
""" # noqa: E501

anoncat_example = """Patient Information:

Name: John Parkinson
Date of Birth: February 12, 1958
Gender: Male
Address: 789 Wellness Lane, Healthville, HV 56789
Phone: (555) 555-1234
Email: [email protected]
Emergency Contact:

Name: Mary Parkinson
Relationship: Spouse
Phone: (555) 555-5678
Insurance Information:

Insurance Provider: HealthWell Assurance
Policy Number: HW765432109
Group Number: G876543
Medical History:

Allergies:

None reported
Medications:

Levodopa/Carbidopa for Parkinson's disease symptoms
Pramipexole for restless legs syndrome
Lisinopril for hypertension
Atorvastatin for hyperlipidemia
Metformin for Type 2 Diabetes
Medical Conditions:

Parkinson's Disease (diagnosed on June 20, 2015)
Hypertension
Hyperlipidemia
Type 2 Diabetes
Osteoarthritis
Vital Signs:

Blood Pressure: 130/80 mmHg
Heart Rate: 72 bpm
Temperature: 98.4°F
Respiratory Rate: 18 breaths per minute
Recent Inpatient Stay (Dates: September 1-10, 2023):

Reason for Admission: Acute exacerbation of Parkinson's symptoms, pneumonia, and uncontrolled diabetes.

Interventions:

Neurology Consultation for Parkinson's disease management adjustments.
Antibiotic therapy for pneumonia.
Continuous glucose monitoring and insulin therapy for diabetes control.
Physical therapy sessions to maintain mobility.
Complications:

Delirium managed with close monitoring and appropriate interventions.
Discharge Plan:

Medication adjustments for Parkinson's disease.
Follow-up appointments with neurologist, endocrinologist, and primary care.
Home health care for continued physical therapy.
Follow-up Visits:

Date: October 15, 2023

Reason for Visit: Post-discharge Follow-up
Notes: Stable Parkinson's symptoms, pneumonia resolved. Adjusted diabetes medications for better control.
Date: December 5, 2023

Reason for Visit: Neurology Follow-up
Notes: Fine-tuned Parkinson's medication regimen. Recommended ongoing physical therapy.
""" # noqa: E501

anoncat_help_content = """Demo app for the deidentification of private health information using the CogStack AnonCAT model

Please DO NOT test with any real sensitive PHI data.

Local validation and fine-tuning available via [MedCATtrainer](
https://github.com/CogStack/cogstack-nlp/tree/main/medcat-trainer).
Email us, [[email protected]](mailto:[email protected]), to discuss model access,
model performance, and your use case.

The following PHI items have been trained:

| PHI Item | Description |
|----------|-------------|
| NHS Number | UK National Health Service Numbers. |
| Name | All names, first, middle, last of patients, relatives, care providers etc. Importantly, does not redact conditions that are named after a name, e.g. "Parkinson's disease". |
| Date of Birth | DOBs. Does not include other dates that may be in the record, i.e. dates of visit etc. |
| Hospital Number | A unique number provided by the hospital. Distinct from the NHS number |
| Address Line | Address lines - first, second, third or fourth |
| Postcode | UK postal codes - 6 or 7 alphanumeric codes as part of addresses |
| Telephone Number | Telephone numbers, extensions, mobile / cell phone numbers |
| Email | Email addresses |
| Initials | Patient, relatives, care provider name initials. |
""" # noqa: E501
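
To make the table above concrete, here is a minimal sketch of how detected PHI spans could be replaced by their labels. It is not the AnonCAT implementation: the `redact` helper, the sample sentence, and the hand-written span are all assumptions for illustration. It also shows the behaviour described for the Name item, where a condition named after a person is left in place.

```python
def redact(text: str, spans: list[tuple[int, int, str]]) -> str:
    """Replace each (start, end, label) span in the text with "[label]"."""
    # Work from right to left so earlier offsets stay valid after each replacement.
    for start, end, label in sorted(spans, reverse=True):
        text = text[:start] + f"[{label}]" + text[end:]
    return text


sample = "Name: John Parkinson, treated for Parkinson's disease."
name = "John Parkinson"
start = sample.index(name)

# Hand-written span standing in for a model prediction.
spans = [(start, start + len(name), "Name")]
print(redact(sample, spans))  # Name: [Name], treated for Parkinson's disease.
```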
153 changes: 153 additions & 0 deletions medcat-service/medcat_service/demo/demo_logic.py
@@ -0,0 +1,153 @@
"""
This module provides conversion utilities between the MedCAT output format
and the exact format expected by Gradio components, specifically aligning
with the output schema of Hugging Face Transformers pipelines (e.g., for
NER highlighting). Use these definitions and helper functions to bridge
MedCAT's annotation results and Gradio's interactive demo expectations.
"""

import logging

from pydantic import BaseModel

from medcat_service.dependencies import get_medcat_processor, get_settings
from medcat_service.types import ProcessAPIInputContent
from medcat_service.types_entities import Entity

logger = logging.getLogger(__name__)


class EntityAnnotation(BaseModel):
    """
    Expected data format for NER in Gradio
    """

    entity: str
    score: float
    index: int
    word: str
    start: int
    end: int


headers = ["Pretty Name", "Identifier", "Confidence Score", "Start Index", "End Index", "ID"]


class EntityAnnotationDisplay(BaseModel):
    """
    Display data format for use in a datatable
    """

    pretty_name: str
    identifier: str
    score: float
    start: int
    end: int
    id: int
    # Missing meta anns


class EntityResponse(BaseModel):
    """
    Expected data format of the Gradio HighlightedText component
    """

    entities: list[EntityAnnotation]
    text: str


def convert_annotation_to_ner_model(entity: Entity, index: int) -> EntityAnnotation:
    return EntityAnnotation(
        entity=entity.get("cui", "UNKNOWN"),
        score=entity.get("acc", 0.0),
        index=index,
        word=entity.get("detected_name", ""),
        start=entity.get("start", -1),
        end=entity.get("end", -1),
    )


def convert_annotation_to_display_model(entity: Entity) -> EntityAnnotationDisplay:
    return EntityAnnotationDisplay(
        pretty_name=entity.get("pretty_name", ""),
        identifier=entity.get("cui", "UNKNOWN"),
        score=entity.get("acc", 0.0),
        start=entity.get("start", -1),
        end=entity.get("end", -1),
        id=entity.get("id", -1),
        # medcat-demo-app/webapp/demo/views.py
        # if key == 'meta_anns':
        # meta_anns=ent.get("meta_anns", {})
        # if meta_anns:
        # for meta_ann in meta_anns.keys():
        # new_ent[meta_ann]=meta_anns[meta_ann]['value']
    )


def convert_entity_dict_to_annotations(entity_dict_list: list[dict[str, Entity]]) -> list[EntityAnnotation]:
    annotations: list[EntityAnnotation] = []
    for entity_dict in entity_dict_list:
        for key, entity in entity_dict.items():
            annotations.append(convert_annotation_to_ner_model(entity, index=int(key)))
    return annotations


def convert_entity_dict_to_display_model(entity_dict_list: list[dict[str, Entity]]) -> list[EntityAnnotationDisplay]:
    logger.debug("Converting entity dict to display model")
    annotations: list[EntityAnnotationDisplay] = []
    for entity_dict in entity_dict_list:
        for key, entity in entity_dict.items():
            annotations.append(convert_annotation_to_display_model(entity))
    return annotations


def convert_display_model_to_list_of_lists(entity_display_model: list[EntityAnnotationDisplay]) -> list[list[str]]:
    return [
        [str(getattr(entity, field)) for field in EntityAnnotationDisplay.model_fields]
        for entity in entity_display_model
    ]


def perform_named_entity_resolution(input_text: str):
    """
    Performs clinical coding by processing the input text with MedCAT to extract and
    annotate medical concepts (entities).

    This method is used as the main function for the Gradio MedCAT demo and MCP server,
    enabling users to input free text and receive automatic annotation and coding of clinical entities.

    Args:
        input_text (str): The input text to be processed and annotated for medical entities by MedCAT.

    Returns:
        Tuple:
            - dict: A dictionary following the NER response model (EntityResponse), containing the
              original text and the list of detected entities.
            - list[list[str]]: A datatable-compatible list of lists, where each sublist represents an
              entity annotation and its attributes for display purposes.

        Both elements are None when the input is empty or contains only whitespace.
    """
    logger.debug("Performing named entity resolution")
    if not input_text or not input_text.strip():
        return None, None

    processor = get_medcat_processor(get_settings())
    input = ProcessAPIInputContent(text=input_text)

    result = processor.process_content(input.model_dump())

    entity_ner_format: list[EntityAnnotation] = convert_entity_dict_to_annotations(result.annotations)

    logger.debug("Converting entity dict to display model")
    annotations_as_display_format = convert_entity_dict_to_display_model(result.annotations)
    response_datatable_format = convert_display_model_to_list_of_lists(annotations_as_display_format)

    response: EntityResponse = EntityResponse(entities=entity_ner_format, text=input_text)
    result = response.model_dump(), response_datatable_format
    logger.debug("Returning final result")
    return result
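
The two output shapes produced by this module can be illustrated without MedCAT or Gradio installed. The sketch below uses plain dictionaries in place of the Pydantic models. The annotation is hand-written with a placeholder CUI and is not real MedCAT output, and the helper names `to_highlight_payload` and `to_rows` are hypothetical stand-ins for the conversion functions above.

```python
from typing import Any

# Sample text from the demo content; the annotation is hand-written for illustration.
text = "John had been diagnosed with acute Kidney Failure the week before"
mention = "acute Kidney Failure"
start = text.index(mention)

# Assumed shape: one dict per document, keyed by the entity index as a string.
annotations: list[dict[str, dict[str, Any]]] = [
    {
        "0": {
            "pretty_name": "Acute kidney failure",
            "cui": "C0000000",  # placeholder identifier, not a real code
            "acc": 0.9,
            "detected_name": mention,
            "start": start,
            "end": start + len(mention),
            "id": 0,
        }
    }
]


def to_highlight_payload(text: str, annotations: list[dict[str, dict[str, Any]]]) -> dict[str, Any]:
    """Build the text-plus-entities dict used for highlighted-text display."""
    entities = [
        {
            "entity": ent.get("cui", "UNKNOWN"),
            "score": ent.get("acc", 0.0),
            "index": int(key),
            "word": ent.get("detected_name", ""),
            "start": ent.get("start", -1),
            "end": ent.get("end", -1),
        }
        for doc in annotations
        for key, ent in doc.items()
    ]
    return {"text": text, "entities": entities}


# Column order mirrors `headers`: name, identifier, score, start, end, id.
DISPLAY_KEYS = ("pretty_name", "cui", "acc", "start", "end", "id")


def to_rows(annotations: list[dict[str, dict[str, Any]]]) -> list[list[str]]:
    """Flatten each annotation into a row of strings for a datatable."""
    return [[str(ent.get(key, "")) for key in DISPLAY_KEYS] for doc in annotations for ent in doc.values()]


payload = to_highlight_payload(text, annotations)
rows = to_rows(annotations)
```

Note that the row layout depends on field order: the datatable columns line up with `headers` only as long as the fields are listed in the same order, which is worth keeping in mind if fields such as meta annotations are added later.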