Introduction
LORIS Python is a codebase whose code lives in a single package made up of the lib and scripts directories. This codebase contains several pipelines whose isolation from each other varies greatly: the DICOM importer code almost exclusively lives in the lib/import_dicom_study directory, the BIDS converter code mostly lives in the lib/dcm2bids_imaging_pipeline_lib directory, and the (current) BIDS importer lives in several files spread out across the codebase (lib/bidsreader.py, lib/candidate.py, lib/eeg.py, lib/mri.py, lib/session.py...).
I believe this is not a sustainable way to do development: since all the pipelines live in a single package, the boundaries between those are often blurry in practice, the pipelines themselves are less discoverable, and adding or modifying a pipeline becomes very hard to review and to merge as it risks impacting the whole codebase (which partly explains the BIDS importer refactor situation). Moreover, I personally want to add new pipelines to LORIS Python in the near or far future, notably for MEG support and imaging upload. Some of these pipelines may contain consequential features (probably an HTTP server), so I want those to be fully isolated, opt-in, and disabled by default.
In order to enforce better segmentation and modularity in the codebase, I propose to divide LORIS-Python into several packages, which can be installed, reviewed, or replaced independently.
Architecture
The new architecture I propose, given the current pipelines in the codebase, looks like this:
python/
├──loris_util/
├──loris_bids_importer/
├──loris_bids_reader/
├──loris_core/
├──loris_dicom_importer/
├──loris_eeg_chunker/
└──tests/
pyproject.toml
Notice that the lib and scripts directories have been replaced by several loris_x packages, each of which declares its own dependencies, including dependencies on other packages. There are basically four kinds of packages:
- Packages that do not require a LORIS installation (loris_util, loris_bids_reader, loris_eeg_chunker).
- The core package used to interact with the LORIS environment or database (loris_core).
- The pipeline-specific packages that contain the libraries and scripts relevant to a pipeline (loris_dicom_importer, loris_bids_importer).
- The main LORIS package, which is declared at the root level and contains the other packages.
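As a rough illustration of the layering described above, the following sketch models the packages as a dependency graph and checks two properties: the graph has no cycles, and the packages meant to work without a LORIS installation never pull in loris_core. Only the edges from the importers to loris_core come from this proposal; every other edge in DEPENDENCIES is an assumption made for the example, and the helper names are hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical dependency map: package -> packages it depends on.
# Only "the importers depend on loris_core" is stated in the proposal;
# the remaining edges are assumptions made for illustration.
DEPENDENCIES = {
    "loris_util": set(),
    "loris_bids_reader": {"loris_util"},
    "loris_eeg_chunker": set(),
    "loris_core": {"loris_util"},
    "loris_dicom_importer": {"loris_core"},
    "loris_bids_importer": {"loris_core", "loris_bids_reader"},
}

# Packages that must stay usable without a LORIS installation.
STANDALONE = {"loris_util", "loris_bids_reader", "loris_eeg_chunker"}


def transitive_dependencies(package: str) -> set[str]:
    """Return every package reachable from `package` through its dependencies."""
    seen: set[str] = set()
    stack = list(DEPENDENCIES[package])
    while stack:
        dependency = stack.pop()
        if dependency not in seen:
            seen.add(dependency)
            stack.extend(DEPENDENCIES[dependency])
    return seen


def layering_violations() -> list[str]:
    """List the standalone packages that (transitively) depend on loris_core."""
    return sorted(
        package
        for package in STANDALONE
        if "loris_core" in transitive_dependencies(package)
    )


# static_order() raises graphlib.CycleError if the graph contains a cycle,
# and otherwise yields each package after all of its dependencies.
install_order = list(TopologicalSorter(DEPENDENCIES).static_order())

if __name__ == "__main__":
    print("Install order:", install_order)
    print("Layering violations:", layering_violations())
```

A check of this kind could run in CI so that a pull request adding a dependency from a standalone package to loris_core fails early.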
Each package should follow the conventional src-layout, which looks like this:
loris_x/
├──src/
│ └──loris_x/
│ └── code files...
├──pyproject.toml
└──README.md
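To make the layout concrete, here is a minimal sketch that scaffolds an empty package with this structure. The scaffold_package function is hypothetical, the __init__.py file and the placeholder version are assumptions, and the generated pyproject.toml only contains the bare [project] table (the build backend configuration is deliberately left out).

```python
from pathlib import Path
import tempfile


def scaffold_package(root: Path, name: str) -> Path:
    """Create an empty src-layout package called `name` under `root`."""
    package_dir = root / name
    source_dir = package_dir / "src" / name
    source_dir.mkdir(parents=True, exist_ok=True)

    # Assumption: each package is a regular package with an __init__.py file.
    (source_dir / "__init__.py").touch()
    (package_dir / "README.md").write_text(f"# {name}\n")

    # Minimal metadata with a placeholder version; real packages would also
    # declare their dependencies and build backend here.
    (package_dir / "pyproject.toml").write_text(
        f'[project]\nname = "{name}"\nversion = "0.0.0"\n'
    )
    return package_dir


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        created = scaffold_package(Path(tmp), "loris_x")
        for path in sorted(created.rglob("*")):
            print(path.relative_to(created))
```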
Migration
A major question is obviously how to move from the current monolithic architecture to the modular one. It is actually not that hard, and can be accomplished in the following few steps:
- Turn the "environment variables patching" project into a real package (done, see Add Python build backend (packaging PR 4) #1320).
- Make the EEG chunker directory into a real package (see Make the EEG chunker into a standalone Python package #1338). This is also convenient because I learned that C-BRAIN is currently pulling the whole of LORIS-MRI / LORIS Python to use the chunker, which could be avoided if it is made into a standalone package.
- Extract lib/util out of lib into its own loris_util package, so that it can be used by any other package. This is not a hard breaking change because this module was only added in the last version, which has not been adopted yet by any project.
- Extract the BIDS reader (loris_bids_reader) out of the new BIDS importer (BIDS importer refactor #1325) into its own package. The BIDS importer is basically made up of two parts: one that reads the BIDS dataset, and the other that ingests its data in the database. The former part is completely independent of LORIS, and should therefore be put in its own package to make it more isolated and reusable.
- Make the lib and scripts directories into the LORIS core package (loris_core). This can be accomplished simply by putting the scripts directory inside lib, renaming the latter to loris_core, and adding the package metadata (pyproject.toml).
- Extract the DICOM importer and new BIDS importer into their specific loris_dicom_importer and loris_bids_importer packages, which depend on loris_core. Those are pipelines that I wrote and that are already very well isolated, so this is rather trivial.
- Extract the other pipelines into their own specific packages (such as the DICOM archive to BIDS converter). This is an optional step, as these pipelines can also simply live in the loris_core package until we decide to move them out.
I can write the PRs for all the steps except the last one myself (which, as said, is not required anyway). While it seems like a lot of breaking changes at first glance, this is mostly just moving code around and adding package metadata. From a current user's perspective, this results mainly in updating imports, which is essentially a find-and-replace operation (which is also checked by CI!):
- from lib import x → from loris_core import x
- from lib.util import x → from loris_util import x
- from lib.imaging_lib.bids import x → from loris_bids_reader import x
- from chunking import x → from loris_eeg_chunker.chunking import x (already done)
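The find-and-replace above could be automated with a small script. The following is only a sketch under stated assumptions: the rewrite_imports function and the IMPORT_RENAMES table are hypothetical names, the rules are limited to the four mappings listed above, and only import statements that start a line are handled (imports hidden in strings or built dynamically are not).

```python
import re

# (old module prefix, new module prefix), ordered from most to least specific
# so that `lib.util` is rewritten before the generic `lib` rule can match it.
IMPORT_RENAMES = [
    ("lib.imaging_lib.bids", "loris_bids_reader"),
    ("lib.util", "loris_util"),
    ("lib", "loris_core"),
    ("chunking", "loris_eeg_chunker.chunking"),
]


def rewrite_imports(source: str) -> str:
    """Rewrite `from X import ...` and `import X` lines to the new package names."""
    for old, new in IMPORT_RENAMES:
        # Match the module name only right after `from` or `import` at the
        # start of a line; `\b` prevents `lib` from matching `library`.
        pattern = re.compile(
            rf"^([ \t]*(?:from|import)[ \t]+){re.escape(old)}\b",
            flags=re.MULTILINE,
        )
        source = pattern.sub(lambda match, new=new: match.group(1) + new, source)
    return source


if __name__ == "__main__":
    example = (
        "from lib import x\n"
        "from lib.util import x\n"
        "from lib.imaging_lib.bids import x\n"
        "from chunking import x\n"
    )
    print(rewrite_imports(example))
```

Submodules of lib that have no dedicated rule fall through to the generic rule, so an import of lib.candidate becomes an import of loris_core.candidate, which matches the plan of renaming lib to loris_core.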
Use
To install LORIS Python with this modular architecture, simply use pip install $LORIS_MRI_DIR_PATH, exactly as it works now in CI (and as I also do on my VMs). This installs the main LORIS package and all the other packages it depends on (that is, loris_core and all the pipelines). The only subtlety is that to edit a specific package, you probably need to use pip install -e $YOUR_PACKAGE_PATH (-e stands for --editable) so that the changes in the code are reflected in the package behavior. This will likely need to be documented in a docs/python/Packaging.md file (just as I have written some documentation for the database abstraction and tooling).
Moreover, I also plan to provide an optional UV configuration that allows installing all the packages as editable at once using uv pip install -e $LORIS_MRI_DIR_PATH. For those who don't know, UV is a third-party package manager by the same people who made Ruff (the linter we use), which has gained a lot of traction in recent years, notably for its speed and multi-package management. In any case, note that this configuration is fully optional and that the whole installation also works with pip.
Overrides
There is a potential problem in that the current LORIS-MRI Python override process does not have a single piece of documentation and is therefore impossible to fully understand. I would like to schedule a call with @cmadjar so she can show me what it looks like in practice and I can better answer her needs.
In any case, any package in this modular architecture can be replaced by an alternative version of that package that contains all the overrides needed for a project.
Checklist
- Main package (Add Python build backend (packaging PR 4) #1320, Rename the main Python package #1340)
- EEG chunker package (Make the EEG chunker into a standalone Python package #1338)
- Utilities package
- BIDS reader package
- Core package
- DICOM importer package
- BIDS importer package
- Add an optional UV configuration for package management
- Write documentation about LORIS Python packaging
- Modify CI to test each package individually instead of only as a whole
Conclusion
As said in the introduction, this is a rather critical project for me, as I need that modularity to develop the new MEG pipeline, and as I think the BIDS importer refactor mess has shown how much we need more isolation between the pipelines.
Thank you for reading. Please share your thoughts and questions in the issue comments.