This repository contains the Python source code accompanying the paper.
In Multi-Instance Learning (MIL), each training object is represented by a set of feature vectors (a bag) and a single label. In our implementation, each example (a molecule) is represented by a bag of instances (its conformers), and the label (a bioactivity value) is available only for the bag (the molecule), not for the individual instances (conformations).
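To make the bag/label structure concrete, here is a minimal sketch in Python with NumPy. It assumes toy random descriptors (4 features per conformer) and uses mean pooling followed by ridge regression, one of the simplest MIL baselines. The functions `mean_pool`, `fit_ridge` and `predict` are hypothetical illustrations, not part of the QSARmil or milearn APIs.

```python
import numpy as np

# Hypothetical toy data: each bag is one molecule, each row is one conformer's
# descriptor vector. Bags may contain different numbers of conformers.
rng = np.random.default_rng(0)
bags = [rng.normal(size=(n, 4)) for n in (3, 5, 2, 4, 6)]
# One bioactivity value per bag (molecule); no labels exist for single conformers.
y = np.array([5.1, 6.3, 4.8, 5.9, 7.0])


def mean_pool(bags):
    """Collapse each bag (n_instances x n_features) into one vector by averaging its instances."""
    return np.vstack([bag.mean(axis=0) for bag in bags])


def fit_ridge(X, y, alpha=1.0):
    """Closed-form ridge regression with an unpenalised intercept."""
    X1 = np.hstack([np.ones((X.shape[0], 1)), X])
    penalty = alpha * np.eye(X1.shape[1])
    penalty[0, 0] = 0.0  # do not shrink the intercept
    return np.linalg.solve(X1.T @ X1 + penalty, X1.T @ y)


def predict(w, bags):
    """Predict one value per bag from its mean-pooled representation."""
    X = mean_pool(bags)
    return w[0] + X @ w[1:]


X = mean_pool(bags)  # shape (n_bags, n_features)
w = fit_ridge(X, y)  # intercept followed by feature weights
preds = predict(w, bags)  # one prediction per molecule
```

Because the mean ignores the order and the number of instances, the model can score a molecule with any number of conformers. Averaging is only one pooling choice; max pooling or attention-based weighting of instances are common alternatives in MIL.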
To install the package:

pip install qsarmil

The modelling pipeline is based on two supplementary packages:
- QSARmil: molecular multi-instance machine learning
- milearn: multi-instance machine learning in Python
Refer to these packages for more examples and use cases.
The original datasets are available in the datasets folder, which contains 200 ligand bioactivity datasets extracted from ChEMBL. See the Notebook for a usage example.