Merged
21 commits
ac4d879
CU-869bydfx8: Initial commit for plugin download stuff
mart-r Jan 28, 2026
add3035
CU-869bydfx8: Allow ignoring remote data
mart-r Jan 28, 2026
a228f45
CU-869bydfx8: Fix remote data URL
mart-r Jan 28, 2026
9ee3ecb
CU-869bydfx: Add tests for new modules
mart-r Jan 28, 2026
b571260
CU-869bydfx8: Allow SSH-based URLs
mart-r Jan 28, 2026
9502888
CU-869bydfx8: Fix gliner plugin data
mart-r Jan 28, 2026
fd9b146
CU-869bydfx8: Fix some json indentation inconsistencies
mart-r Jan 28, 2026
7e7af8e
CU-869bydfx8: Make plugins case-insensitive and '-' vs '_' insensitive
mart-r Jan 28, 2026
5c759ed
CU-869bydfx8: Add installation instructions to MissingPluginError
mart-r Jan 28, 2026
82b45e9
CU-869bydfx8: Add some documentation regarding plugins and curated pl…
mart-r Jan 28, 2026
f04d237
CU-869bydfx8: Treat data as a package to allow for its package-based …
mart-r Jan 28, 2026
9378683
CU-869bydfx8: Add built in plugin catalog to the built wheel to inclu…
mart-r Jan 28, 2026
49ceee3
CU-869bydfx8: Fix minor linting issue
mart-r Jan 28, 2026
bba5be1
CU-869bydfx8: Fix another minor linting issue
mart-r Jan 29, 2026
2d58c9a
CU-869bydfx8: Fix linting issue, finally
mart-r Jan 29, 2026
4462f5a
CU-869bydfx8: Some refactor to keep common data together and avoid re…
mart-r Jan 29, 2026
5bb40af
CU-869bydfx8: Update models to pydantic ones for easier parsing
mart-r Jan 29, 2026
a5e07d3
CU-869bydfx8: Update catalog tests in line with recent changes
mart-r Jan 29, 2026
408e107
CU-869bydfx8: Avoid using upper case collections for typing
mart-r Jan 29, 2026
d1b9680
CU-869bydfx8: Add a few catalog merge tests
mart-r Jan 29, 2026
5d40c64
CU-869bydfx8: Fix default plugin catalog auth marker
mart-r Jan 30, 2026
34 changes: 34 additions & 0 deletions medcat-v2/README.md
@@ -90,6 +90,40 @@ pip install "medcat[deid]" # for DeID models
pip install "medcat[spacy,meta-cat,deid,rel-cat,dict-ner]" # for all of the above
```

### Installing plugins

MedCAT v2 supports **external plugins** that can provide new components (e.g. alternative NER models, addons, tokenizers) via Python entry points.

- **Curated plugins**: The `medcat.plugins.catalog` module ships with a curated plugin catalog that can be updated from a remote JSON file.
- **Installer**: The `medcat.plugins.installer.PluginInstallationManager` wraps a `pip`-based installer and knows how to resolve a compatible plugin version for your current MedCAT version.
- **CLI**: You can install curated plugins directly from the command line:

```bash
python -m medcat install-plugins medcat-gliner
```

This will:

- look up `medcat-gliner` in the curated catalog,
- resolve a version compatible with your installed MedCAT,
- and install it using `pip`.
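The three steps above can be sketched in plain Python. This is a minimal illustration, not MedCAT's implementation: `CATALOG`, `parse_version`, `satisfies` and `resolve` are hypothetical names, and the hand-rolled comparison is a simplified stand-in for `packaging`'s `SpecifierSet`, which the catalog module uses for the real matching. It only understands plain dotted integer versions and stops before the `pip` step.

```python
import operator

# Hypothetical, trimmed-down catalog entry modelled on plugin_catalog.json.
CATALOG = {
    "medcat-gliner": {
        "compatibility": [
            {"medcat_version": ">=2.5.0,<3.0.0", "plugin_version": "main"},
        ],
    },
}

# Two-character operators come first so ">=" is matched before ">".
_OPS = [(">=", operator.ge), ("<=", operator.le), ("==", operator.eq),
        ("!=", operator.ne), (">", operator.gt), ("<", operator.lt)]


def parse_version(text):
    """Turn '2.5.0' into (2, 5, 0); short versions are padded with zeros."""
    parts = tuple(int(part) for part in text.strip().split("."))
    return parts + (0,) * (3 - len(parts))


def satisfies(version, spec):
    """Check a version against a comma-separated spec like '>=2.5.0,<3.0.0'."""
    for clause in spec.split(","):
        clause = clause.strip()
        for symbol, compare in _OPS:
            if clause.startswith(symbol):
                bound = parse_version(clause[len(symbol):])
                if not compare(parse_version(version), bound):
                    return False
                break
        else:
            raise ValueError(f"Unsupported clause: {clause!r}")
    return True


def resolve(plugin_name, medcat_version):
    """Look up a plugin and return the first compatible plugin version."""
    # Same idea as the catalog lookup: ignore case and '_' vs '-'.
    key = plugin_name.lower().replace("_", "-")
    entry = CATALOG.get(plugin_name) or CATALOG.get(key)
    if entry is None:
        raise KeyError(plugin_name)
    for compat in entry["compatibility"]:
        if satisfies(medcat_version, compat["medcat_version"]):
            return compat["plugin_version"]
    raise LookupError(f"No compatible version of {plugin_name}")


print(resolve("MedCAT_Gliner", "2.6.1"))  # prints: main
```

The resolved value (`main` here) is what would then be handed to the installer as the version or ref to install.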

You can also:

- pass `--dry-run` to show what would be installed without making changes:

```bash
python -m medcat install-plugins --dry-run medcat-gliner
```

- override the version/ref explicitly (e.g. when testing a branch or tag):

```bash
python -m medcat install-plugins medcat-gliner --force-version main
```

If a plugin requires authentication (for example, private Git repositories), MedCAT will log a warning and the installer will surface pip’s error messages if credentials are missing or incorrect.

### Version / update checking

MedCAT now has the ability to check for newer versions of itself on PyPI (or a local mirror of it).
10 changes: 10 additions & 0 deletions medcat-v2/docs/architecture.md
@@ -28,6 +28,16 @@ All components are registered in a central registry. This means you can:
### Plugins
**Plugins** are external Python packages that provide new component implementations or other functionality. They integrate with MedCAT through Python entry points, allowing automatic discovery and registration without modifying MedCAT's core code.

MedCAT v2 also includes a **curated plugin catalog** and an **installer**:

- `medcat.plugins.catalog.PluginCatalog` maintains a list of known plugins, their metadata, and MedCAT compatibility rules (e.g. “this plugin supports `>=2.5.0,<3.0.0`”).
- `medcat.plugins.installer.PluginInstallationManager` uses that catalog to select a compatible version and install it (currently via `pip`), with support for:
  - PyPI packages
  - Git repositories (including subdirectories such as monorepo layouts)
  - Direct URLs (e.g. wheels or tarballs)
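
To make the three source kinds concrete, here is one way each could be turned into an argument for `pip install`. This is a sketch under assumptions, not the installer's code: `to_pip_requirement` is a hypothetical helper, and only `github_ssh_subdir` is a source type that appears in the bundled catalog; the `pypi` and `url` labels are invented for the example. The Git form relies on pip's documented `git+ssh://...@ref#subdirectory=...` syntax.

```python
def to_pip_requirement(name, source, source_type, ref=None, subdirectory=None):
    """Build the string passed to `pip install` for one plugin source."""
    if source_type == "pypi":  # assumed label
        return f"{source}=={ref}" if ref else source
    if source_type == "github_ssh_subdir":
        # scp-style "git@host:org/repo.git" -> "ssh://git@host/org/repo.git"
        user_host, _, path = source.partition(":")
        url = f"git+ssh://{user_host}/{path}"
        if ref:
            url += f"@{ref}"
        if subdirectory:
            url += f"#subdirectory={subdirectory}"
        return f"{name} @ {url}"
    if source_type == "url":  # assumed label
        return source
    raise ValueError(f"Unknown source type: {source_type!r}")


print(to_pip_requirement(
    "medcat-gliner",
    "git@github.com:CogStack/cogstack-ops.git",
    "github_ssh_subdir",
    ref="main",
    subdirectory="medcat-gliner",
))
```

The SSH form only works when the user has credentials for the repository, which is why such catalog entries are marked as requiring authentication.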

The curated catalog can be updated from a remote JSON file, and plugins can be installed either programmatically or via the `python -m medcat install-plugins ...` CLI.
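
The remote update amounts to layering the fetched JSON over the packaged one. The sketch below shows that idea on plain dictionaries. It is a simplification with hypothetical names (`merge_catalogs`, `LOCAL`, `REMOTE`) and made-up data; the real catalog works on pydantic models and also checks that both entries share the same source spec before merging.

```python
import copy

# Descriptive fields that are refreshed from the newer entry.
_REFRESHED = ("display_name", "description", "homepage", "requires_auth")


def merge_catalogs(local, remote):
    """Return a new catalog dict with remote data layered over local."""
    merged = copy.deepcopy(local)
    merged["version"] = remote["version"]
    merged["last_updated"] = remote["last_updated"]
    for name, info in remote["plugins"].items():
        current = merged["plugins"].get(name)
        if current is None:
            # Plugin only known remotely: take it as-is.
            merged["plugins"][name] = copy.deepcopy(info)
            continue
        for field in _REFRESHED:
            if field in info:
                current[field] = info[field]
        # Replace entries for known plugin versions, append new ones.
        by_version = {
            comp["plugin_version"]: comp
            for comp in current.get("compatibility", [])
        }
        for comp in info.get("compatibility", []):
            by_version[comp["plugin_version"]] = dict(comp)
        current["compatibility"] = list(by_version.values())
    return merged


# Hypothetical data for illustration only.
LOCAL = {
    "version": "1.0", "last_updated": "2026-01-28",
    "plugins": {"demo": {
        "description": "old text",
        "compatibility": [
            {"medcat_version": ">=2.5.0,<3.0.0", "plugin_version": "main"},
        ],
    }},
}
REMOTE = {
    "version": "1.1", "last_updated": "2026-02-01",
    "plugins": {"demo": {
        "description": "new text",
        "compatibility": [
            {"medcat_version": ">=2.5.0,<4.0.0", "plugin_version": "main"},
            {"medcat_version": ">=3.0.0", "plugin_version": "v2"},
        ],
    }},
}

merged = merge_catalogs(LOCAL, REMOTE)
print(merged["plugins"]["demo"]["compatibility"])
```

Working on a deep copy keeps the packaged catalog untouched, so a failed or partial remote fetch can always fall back to it.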

---

## Working with Core Components
4 changes: 3 additions & 1 deletion medcat-v2/medcat/__main__.py
@@ -1,9 +1,11 @@
import sys
from medcat.utils.download_scripts import main as __download_scripts
from medcat.plugins.cli import install_plugins_command as __install_plugins


_COMMANDS = {
    "download-scripts": __download_scripts,
    "install-plugins": __install_plugins,
}


242 changes: 242 additions & 0 deletions medcat-v2/medcat/plugins/catalog.py
@@ -0,0 +1,242 @@
"""Management of the curated plugin catalog."""

import json
import logging
from typing import Optional
import importlib.resources
import requests

from packaging.specifiers import SpecifierSet
from packaging.version import Version
from pydantic import BaseModel, Field

from .downloadable import PluginSourceSpec

logger = logging.getLogger(__name__)


LOCAL_CATALOG_PATH = (
    importlib.resources.files('medcat.plugins.data') /
    'plugin_catalog.json'
)


class PluginCompatibility(BaseModel):
    medcat_version: str
    plugin_version: str

class PluginInfo(BaseModel):
    name: str
    display_name: str
    description: str
    source_spec: PluginSourceSpec
    homepage: str
    compatibility: list[PluginCompatibility]
    requires_auth: bool = False

    def can_merge(self, other: 'PluginInfo') -> bool:
        """Check whether two plugin infos can be merged.

        This checks that the name and the source spec are the same.
        In that case the two objects likely refer to the same plugin,
        though one of them might have updated information.

        Args:
            other (PluginInfo): The other plugin info.

        Returns:
            bool: Whether they can be merged.
        """
        return (
            self.name == other.name and
            self.source_spec == other.source_spec)

    def merge(self, other: 'PluginInfo', prefer_other: bool = True) -> None:
        """Merge other plugin info into this one.

        The "other" plugin info is usually the newer one, so its data is
        preferred by default.

        Args:
            other (PluginInfo): The other plugin info.
            prefer_other (bool): Whether to prefer other. Defaults to True.

        Raises:
            UnmergablePluginInfo: If the infos cannot be merged.
        """
        if not self.can_merge(other):
            raise UnmergablePluginInfo(self, other)
        if prefer_other:
            self.display_name = other.display_name
            self.description = other.description
            self.homepage = other.homepage
            self.requires_auth = other.requires_auth
        existing_plugin_versions = {
            cur.plugin_version for cur in self.compatibility}
        for other_comp in other.compatibility:
            if other_comp.plugin_version not in existing_plugin_versions:
                # Plugin version not seen before: add its compatibility
                self.compatibility.append(other_comp)
            elif prefer_other:
                # Known plugin version: overwrite the existing entry in place
                prev_index = [
                    idx for idx, cur in enumerate(self.compatibility)
                    if cur.plugin_version == other_comp.plugin_version][0]
                self.compatibility[prev_index] = other_comp


class CatalogModel(BaseModel):
    """Pydantic model for the top-level catalog JSON."""
    plugins: dict[str, PluginInfo] = Field(default_factory=dict)
    version: str
    last_updated: str

    def merge(self, other: 'CatalogModel', prefer_other: bool = True) -> None:
        """Merge another catalog into this one.

        Args:
            other (CatalogModel): The other catalog to merge.
            prefer_other (bool): Whether to prefer other. Defaults to True.
        """
        if prefer_other:
            self.version = other.version
            self.last_updated = other.last_updated
        for plugin_name, info in other.plugins.items():
            if plugin_name not in self.plugins:
                self.plugins[plugin_name] = info
            elif prefer_other:
                self.plugins[plugin_name].merge(info, prefer_other=prefer_other)


class PluginCatalog:
    """Manages the catalog of curated plugins."""

    REMOTE_CATALOG_URL = (
        "https://raw.githubusercontent.com/CogStack/cogstack-nlp/main/"
        "medcat-v2/medcat/plugins/data/plugin_catalog.json"
    )

    def __init__(self, use_remote: bool = True):
        """
        Initialize the plugin catalog.

        Args:
            use_remote: Whether to attempt fetching the remote catalog
        """
        self._catalog: CatalogModel = CatalogModel(
            version="N/A", last_updated='N/A', plugins={})
        self._load_local_catalog()
        if use_remote:
            try:
                self._update_from_remote()
            except Exception as e:
                logger.debug(f"Could not fetch remote catalog: {e}")

    def _load_local_catalog(self):
        """Load the catalog from the packaged JSON file."""
        try:
            catalog_data = LOCAL_CATALOG_PATH.read_text()
            self._parse_catalog(json.loads(catalog_data))
            logger.debug("Loaded local plugin catalog")
        except Exception as e:
            logger.warning(f"Could not load local catalog: {e}")

    def _update_from_remote(self, timeout: int = 5):
        """Fetch and update from the remote catalog."""
        response = requests.get(self.REMOTE_CATALOG_URL, timeout=timeout)
        response.raise_for_status()

        self._parse_catalog(response.json())
        logger.info("Updated plugin catalog from remote source")

    def _parse_catalog(self, data: dict):
        """Parse catalog JSON data into PluginInfo objects.

        This uses Pydantic models for schema validation and forward
        compatibility, so that adding fields to the JSON does not require
        rewriting this method.
        """
        payload = CatalogModel.model_validate(data)
        self._catalog.merge(payload)

    def get_plugin(self, name: str) -> Optional[PluginInfo]:
        """Get plugin info by name."""
        plugin = self._catalog.plugins.get(name)
        if plugin:
            return plugin
        # try lower case and with "-" instead of "_"
        return self._catalog.plugins.get(name.lower().replace("_", "-"))

    def list_plugins(self) -> list[PluginInfo]:
        """List all available plugins."""
        return list(self._catalog.plugins.values())

    def is_curated(self, name: str) -> bool:
        """Check if a plugin is in the curated catalog."""
        # Reuse get_plugin so the same case and "-"/"_" handling applies
        return self.get_plugin(name) is not None

    def get_compatible_version(
        self,
        plugin_name: str,
        medcat_version: str
    ) -> str:
        """
        Get compatible plugin version for given MedCAT version.

        Args:
            plugin_name: Name of the plugin
            medcat_version: MedCAT version string

        Raises:
            NoSuchPluginException: If the plugin wasn't found / known.
            NoCompatibleSpecException: If no compatibility spec matches
                the given MedCAT version.

        Returns:
            The plugin version (or ref) of the first matching spec
        """
        plugin = self.get_plugin(plugin_name)
        if not plugin:
            raise NoSuchPluginException(plugin_name)

        medcat_ver = Version(medcat_version)

        # The first entry whose MedCAT specifier matches wins
        for compat in plugin.compatibility:
            spec = SpecifierSet(compat.medcat_version)
            if medcat_ver in spec:
                return compat.plugin_version

        raise NoCompatibleSpecException(plugin, medcat_ver)


# Global catalog instance
_catalog: Optional[PluginCatalog] = None


def get_catalog() -> PluginCatalog:
    """Get the global plugin catalog instance."""
    global _catalog
    if _catalog is None:
        _catalog = PluginCatalog()
    return _catalog


class NoSuchPluginException(ValueError):

    def __init__(self, plugin_name: str) -> None:
        super().__init__(
            f"No plugin by the name '{plugin_name}' is known to MedCAT")


class NoCompatibleSpecException(ValueError):

    def __init__(self, plugin: PluginInfo, medcat_ver: Version) -> None:
        super().__init__(
            f"Was unable to find a version of the plugin {plugin.name} "
            f"that was compatible with MedCAT version {medcat_ver}. "
            f"Plugin details: {plugin}")


class UnmergablePluginInfo(ValueError):

    def __init__(self, info1: PluginInfo, info2: PluginInfo) -> None:
        super().__init__(
            "The two plugin infos cannot be merged:\n"
            f"One:\n{info1}\nand two:\n{info2}"
        )
31 changes: 31 additions & 0 deletions medcat-v2/medcat/plugins/cli.py
@@ -0,0 +1,31 @@
"""CLI entrypoint for MedCAT commands."""

import sys

from medcat.plugins.installer import PluginInstallationManager


# TODO: plugin listing and stuff like that


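def install_plugins_command(*args: str) -> int:
    """Install the named plugins and return a process exit code.

    Arguments starting with "--" are treated as options; everything else
    is treated as a plugin name. Only "--dry-run" is acted upon here.
    """
    opts = [arg for arg in args if arg.startswith("--")]
    plugins = [arg for arg in args if arg not in opts]
    dry_run = "--dry-run" in opts

    manager = PluginInstallationManager()

    if not plugins:
        print("Error: No plugins specified", file=sys.stderr)
        return 1

    results = manager.install_multiple(plugins, dry_run=dry_run)

    failed = [name for name, success in results.items() if not success]

    if failed:
        print(f"Failed to install: {', '.join(failed)}", file=sys.stderr)
        return 1

    # Avoid claiming an installation happened during a dry run
    outcome = "Would install" if dry_run else "Successfully installed"
    print(f"{outcome}: {', '.join(results.keys())}")
    return 0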
2 changes: 2 additions & 0 deletions medcat-v2/medcat/plugins/data/__init__.py
@@ -0,0 +1,2 @@
"""Packaged resources for MedCAT plugins (e.g. curated plugin catalog)."""

24 changes: 24 additions & 0 deletions medcat-v2/medcat/plugins/data/plugin_catalog.json
@@ -0,0 +1,24 @@
{
  "version": "1.0",
  "last_updated": "2026-01-28",
  "plugins": {
    "medcat-gliner": {
      "name": "medcat-gliner",
      "display_name": "MedCAT-gliner",
      "description": "Gliner based NER for MedCAT",
      "source_spec": {
        "source": "git@github.com:CogStack/cogstack-ops.git",
        "source_type": "github_ssh_subdir",
        "subdirectory": "medcat-gliner"
      },
      "homepage": "https://github.com/CogStack/cogstack-ops/tree/main/medcat-gliner",
      "compatibility": [
        {
          "medcat_version": ">=2.5.0,<3.0.0",
          "plugin_version": "main"
        }
      ],
      "requires_auth": true
    }
  }
}