Modularization of LORIS Python #1339

@MaximeBICMTL

Introduction

LORIS Python is a codebase whose code lives in a single package made up of the lib and scripts directories. This codebase contains several pipelines whose isolation from each other varies greatly: the DICOM importer code lives almost exclusively in the lib/import_dicom_study directory, the BIDS converter code mostly lives in the lib/dcm2bids_imaging_pipeline_lib directory, and the (current) BIDS importer lives in several files spread out across the codebase (lib/bidsreader.py, lib/candidate.py, lib/eeg.py, lib/mri.py, lib/session.py...).

I believe this is not a sustainable way to do development. Since all the pipelines live in a single package, the boundaries between them are often blurry in practice, the pipelines themselves are less discoverable, and adding or modifying a pipeline becomes very hard to review and merge because it risks impacting the whole codebase (which partly explains the BIDS importer refactor situation). Moreover, I personally want to add new pipelines to LORIS Python in the future, notably for MEG support and imaging upload. Some of these pipelines may contain consequential features (probably an HTTP server), so I want them to be fully isolated, opt-in, and disabled by default.

To enforce better segmentation and modularity in the codebase, I propose dividing LORIS Python into several packages that can be installed, reviewed, or replaced independently.

Architecture

The new architecture I propose, given the current pipelines in the codebase, looks like this:

python/
├──loris_util/
├──loris_bids_importer/
├──loris_bids_reader/
├──loris_core/
├──loris_dicom_importer/
├──loris_eeg_chunker/
└──tests/
pyproject.toml

Notice that the lib and scripts directories have been replaced by several loris_x packages, each of which declares its own dependencies, including dependencies on other packages. There are four kinds of packages:

  • Packages that do not require a LORIS installation (loris_util, loris_bids_reader, loris_eeg_chunker).
  • The core package used to interact with the LORIS environment or database (loris_core).
  • The pipeline-specific packages that contain the libraries and scripts relevant to a pipeline (loris_dicom_importer, loris_bids_importer).
  • The main LORIS package, which is declared at the root level and contains the other packages.
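
To make the layering concrete, here is a minimal sketch that writes the package relationships down as data and checks them. Only the "pipelines depend on loris_core" edges come from this proposal; every other edge in DEPENDENCIES, as well as the function names, are assumptions made for illustration.

```python
from graphlib import TopologicalSorter

# Hypothetical dependency map: package -> packages it depends on.
# Only "pipeline packages depend on loris_core" is stated in the proposal;
# the remaining edges are assumptions for the sake of the example.
DEPENDENCIES = {
    "loris_util": set(),
    "loris_eeg_chunker": set(),
    "loris_bids_reader": {"loris_util"},
    "loris_core": {"loris_util"},
    "loris_dicom_importer": {"loris_core", "loris_util"},
    "loris_bids_importer": {"loris_core", "loris_bids_reader", "loris_util"},
}

# Packages that must stay usable without a LORIS installation.
STANDALONE = {"loris_util", "loris_bids_reader", "loris_eeg_chunker"}


def transitive_dependencies(package: str) -> set[str]:
    """Return every package reachable from `package` through DEPENDENCIES."""
    seen: set[str] = set()
    stack = list(DEPENDENCIES[package])
    while stack:
        dependency = stack.pop()
        if dependency not in seen:
            seen.add(dependency)
            stack.extend(DEPENDENCIES[dependency])
    return seen


def layering_violations() -> list[str]:
    """List standalone packages that directly or indirectly pull in loris_core."""
    return sorted(
        package for package in STANDALONE
        if "loris_core" in transitive_dependencies(package)
    )


def build_order() -> list[str]:
    """Return an order in which every package comes after its dependencies."""
    return list(TopologicalSorter(DEPENDENCIES).static_order())
```

A check like layering_violations() could run in CI so that a standalone package never starts depending on loris_core by accident.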

Each package should follow the conventional src-layout, which looks like this:

loris_x/
├──src/
│   └──loris_x/
│        └── code files...
├──pyproject.toml
└──README.md
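
As a rough sketch, the layout above can be checked mechanically. The helper name src_layout_problems is hypothetical, and it assumes the import package inside src/ has the same name as the package directory, as in the tree above.

```python
from pathlib import Path


def src_layout_problems(package_dir: Path) -> list[str]:
    """Return the src-layout entries missing from `package_dir` (empty if none)."""
    name = package_dir.name
    expected = [
        package_dir / "pyproject.toml",
        package_dir / "README.md",
        # Assumption: the import package is named after its directory.
        package_dir / "src" / name,
    ]
    return [
        str(path.relative_to(package_dir))
        for path in expected
        if not path.exists()
    ]
```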

Migration

A major question is obviously how to move from the current monolithic architecture to the modular one. It is actually not that hard, and can be accomplished in the following steps:

  1. Turn the "environment variables patching" project into a real package (done, see Add Python build backend (packaging PR 4) #1320).
  2. Make the EEG chunker directory into a real package (see Make the EEG chunker into a standalone Python package #1338). This is also convenient because I learned that C-BRAIN is currently pulling the whole of LORIS-MRI / LORIS Python to use the chunker, which could be avoided if it is made into a standalone package.
  3. Extract lib/util out of lib into its own loris_util package, so that it can be used by any other package. This is not a hard breaking change because this module was only added in the last version, which has not yet been adopted by any project.
  4. Extract the BIDS reader (loris_bids_reader) out of the new BIDS importer (BIDS importer refactor #1325) into its own package. The BIDS importer is basically made up of two parts: one that reads the BIDS dataset, and the other that ingests its data in the database. The former part is completely independent of LORIS, and should therefore be put in its own package to make it more isolated and reusable.
  5. Make the lib and scripts directories into the LORIS core package (loris_core). This can be accomplished simply by putting the scripts directory inside lib, renaming the latter to loris_core, and adding the package metadata (pyproject.toml).
  6. Extract the DICOM importer and new BIDS importer into their specific loris_dicom_importer and loris_bids_importer packages, which depend on loris_core. Those are pipelines that I wrote and that are already very well isolated, so this is rather trivial.
  7. Extract the other pipelines into their own specific packages (such as the DICOM archive to BIDS converter). This is an optional step, as these pipelines can also simply live in the loris_core package until we decide to move them out.

I can write the PRs for all the steps except the last one myself (which, as said, is not required anyway). While it seems like a lot of breaking changes at first glance, this is mostly just moving code around and adding package metadata. From a current user's perspective, the change mostly comes down to updating imports, which is a simple find-and-replace operation (and is also checked by CI!):

  • from lib import x → from loris_core import x
  • from lib.util import x → from loris_util import x
  • from lib.imaging_lib.bids import x → from loris_bids_reader import x
  • from chunking import x → from loris_eeg_chunker.chunking import x (already done)
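
For projects with many scripts, the find-and-replace can be automated. The following is a minimal sketch, not an official migration tool: the names REWRITES, rewrite_import_line, and rewrite_source are hypothetical, it only handles `from ... import ...` lines, and applying the lib → loris_core rule to submodules (for instance lib.database) is my assumption rather than something listed above.

```python
import re

# Old module prefix -> new module prefix, most specific first so that
# `lib.util` is not caught by the generic `lib` rule.
# Assumption: the `lib` rule also applies to submodules such as `lib.database`.
REWRITES = [
    ("lib.imaging_lib.bids", "loris_bids_reader"),
    ("lib.util", "loris_util"),
    ("chunking", "loris_eeg_chunker.chunking"),
    ("lib", "loris_core"),
]

# Matches `from <module> import ...`, keeping indentation and the line ending.
FROM_IMPORT = re.compile(r"^(\s*from\s+)([\w.]+)(\s+import\b.*)$", re.DOTALL)


def rewrite_import_line(line: str) -> str:
    """Rewrite one `from <module> import ...` line; return other lines unchanged."""
    match = FROM_IMPORT.match(line)
    if match is None:
        return line
    prefix, module, rest = match.groups()
    for old, new in REWRITES:
        # Match the module itself or one of its submodules, never a mere
        # textual prefix (so `library` is not confused with `lib`).
        if module == old or module.startswith(old + "."):
            return prefix + new + module[len(old):] + rest
    return line


def rewrite_source(source: str) -> str:
    """Apply the import rewrites to every line of a Python source string."""
    return "".join(
        rewrite_import_line(line) for line in source.splitlines(keepends=True)
    )
```

Running the function twice gives the same result as running it once, since the new module names do not match any of the old prefixes.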

Use

To install LORIS Python with this modular architecture, simply use pip install $LORIS_MRI_DIR_PATH, exactly as it works now in CI (and as I also do on my VMs). This installs the main LORIS package and all the other packages it depends on (that is, loris_core and all the pipelines). The only subtlety is that to edit a specific package, you probably need to use pip install -e $YOUR_PACKAGE_PATH (-e stands for --editable) so that changes in the code are reflected in the package behavior. This will likely need to be documented in a docs/python/Packaging.md file (just as I have written documentation for the database abstraction and tooling).
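
For developers who want several packages editable at once with plain pip, a small helper could generate the commands. This is only a sketch: the function names are hypothetical, it assumes every package is a direct subdirectory containing a pyproject.toml, and it does not deal with installation order or with how dependencies between local packages get resolved.

```python
import sys
from pathlib import Path


def find_package_dirs(packages_root: Path) -> list[Path]:
    """Return the direct subdirectories of `packages_root` that hold a pyproject.toml."""
    return sorted(
        child for child in packages_root.iterdir()
        if child.is_dir() and (child / "pyproject.toml").is_file()
    )


def editable_install_commands(packages_root: Path) -> list[list[str]]:
    """Build one `pip install -e <package>` command per package, without running it."""
    return [
        [sys.executable, "-m", "pip", "install", "-e", str(package_dir)]
        for package_dir in find_package_dirs(packages_root)
    ]
```

Each returned command can then be passed to subprocess.run(command, check=True) once the list has been reviewed.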

Moreover, I also plan to provide an optional UV configuration that allows installing all the packages as editable at once using uv pip install -e $LORIS_MRI_DIR_PATH. For those who don't know, UV is a third-party package manager by the same people who made Ruff (the linter we use), which has gained a lot of traction in recent years, notably for its speed and multi-package management. In any case, note that this configuration is fully optional and that the whole installation also works with pip.

Overrides

There is a potential problem: the current LORIS-MRI Python override process is not documented anywhere, which makes it impossible to fully understand. I would like to schedule a call with @cmadjar so she can show me what it looks like in practice and I can better address her needs.

In any case, any package in this modular architecture can be replaced by an alternative version of that package that contains all the overrides needed for a project.

Conclusion

As said in the introduction, this is a rather critical project for me: I need this modularity to develop the new MEG pipeline, and I think the BIDS importer refactor mess has shown how much we need more isolation between the pipelines.

Thank you for reading. Please share your thoughts and questions in the issue comments.
