Commit 618f680

roshkjr, roshan, and sreenathnair authored
Resolves bugs in writing cif file and parity similarity calculation (#28)
* corrected the reversal of bondtype from zero to dative
* use mol from sanitized result
* enable parsing a subsets of ccds from components cif file
* fix: fix error in writing cif file
  This fixes the error in generating inchikey while inchi is missing
* style: formating
* chore: removed unnecessary None
* doc: add download badge and CLC feature
* chore: add citation file
* fix: write rdkit properties if it is present only
* fix: correct use of length of list in python
* fix: change bondtype from dative to zero for parity method
  SMARTS with dative bondtype fails to find substructures, hence dative bond types are changed to zero
* test: add HEM for parity test
* chore: linting and formatting
* test: removed parity test using HEM to check if the segmentation fault in githubworkflow is due to this
* fix: remove PDBe from unichem mapping sources
* chore: Update dependencies and package management
  Update the project's dependencies and package management to use Poetry instead of pip. This includes installing Poetry and running `poetry install --with tests` to install the project dependencies. Remove the installation of `rdkit==2023.09.6` and `pre-commit`.
* chore: Update pre-commit hooks and dependencies
  Update the pre-commit hooks to use the Ruff pre-commit hook repository and remove the black and flake8 hooks. Also, add the rST Formatter hook from the rstfmt repository.
* chore: Update tests.yml to use Poetry for pre-commit and pytest commands
  Update the tests.yml file to use Poetry for the pre-commit and pytest commands. This ensures that the project's dependencies are installed correctly and that the pre-commit hooks are run using Poetry. The changes include replacing "pre-commit install" with "poetry run pre-commit install" and "pytest --cov=pdbeccdutils" with "poetry run pytest --cov=pdbeccdutils".
* Minor formatting
* chore: Install pre-commit package in tests.yml
  Add the installation of the pre-commit package in the tests.yml file to ensure that pre-commit hooks are run during the testing process. This will help catch and fix any code style or formatting issues before committing the changes.
* chore: Update docs and publish pipelines to use poetry
* test: add HEM to parity test
* bump up version number
* 🔥 remove __init__ file
* 🔧 update documentation action to use poetry
* 🔧 update publish action to use poetry
* 🔧 update test action to use poetry
* 🎨 move details from setup to pyproject.toml file
* ♻️ use version information from pyproject.toml
* 🩹 add CCDC to unichem resources
* ✏️ fix typos
* ✏️ fix typos
* ♻️ refactor configs
* 🎨 use single function to get properties of rdkit objects
* 🎨 use rdkit_object_property function insted of get_componet_atom_id
* ✨ get name of clc from entities
* 🩹 import importlib.metadata
* 🎨 linting and formatting
* 🩹 removed "data" from the path
* 🔧 add poetry.toml file
* 🔧 update poetry.lock file
* 🔧 remove pre-commit from test workflow
* 🎨 linting and formatting
* 📝 replace github downloads with pypi downloads
* bump up version
* 📝 update changelog for release 0.8.6
* 🔧 create hook to generate poetry.lock file
* 📝 update readme with installation using poetry
* 🔧 update the rdkit version number
* 🔧 update poetry lock file

---------

Co-authored-by: roshan <roshan@ebi.ac.uk>
Co-authored-by: Sreenath Sasidharan Nair <sreenath@ebi.ac.uk>
1 parent dbe3b87 commit 618f680

39 files changed: +1626 -235 lines changed

.github/workflows/documentation.yml

Lines changed: 10 additions & 4 deletions
@@ -20,12 +20,18 @@ jobs:
         uses: actions/setup-python@v4
         with:
           python-version: "3.10"
-      - run: |
-          pip install rdkit
-          pip install -e ".[docs]"
+      - name: Install poetry
+        uses: abatilo/actions-poetry@v2
+      - name: Define a cache for the virtual environment based on the dependencies lock file
+        uses: actions/cache@v3
+        with:
+          path: ./.venv
+          key: venv-${{ hashFiles('poetry.lock') }}
+      - name: Install the package with doc dependencies
+        run: poetry install --with docs
       - run: |
           cd doc
-          make html
+          poetry run sphinx-build -b html . _build/html
       - name: Deploy pages
         uses: peaceiris/actions-gh-pages@v3
         with:
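
The cache step above keys the `.venv` directory on `hashFiles('poetry.lock')`, so the cached environment is reused until the lock file changes. The idea can be sketched in Python as below. This is an illustration only: `venv_cache_key` and the sample lock contents are hypothetical, and the digest is a plain SHA-256 of one file, which is not necessarily the exact value GitHub Actions computes.

```python
import hashlib
import tempfile
from pathlib import Path


def venv_cache_key(lock_file: Path, prefix: str = "venv-") -> str:
    """Build a cache key from the SHA-256 digest of the lock file contents."""
    digest = hashlib.sha256(lock_file.read_bytes()).hexdigest()
    return f"{prefix}{digest}"


# Demonstration with made-up lock file contents (not the real poetry.lock).
with tempfile.TemporaryDirectory() as tmp:
    lock = Path(tmp) / "poetry.lock"
    lock.write_text("example-package 1.0\n")
    key_before = venv_cache_key(lock)

    # Any change to the lock file yields a different key, i.e. a cache miss.
    lock.write_text("example-package 1.1\n")
    key_after = venv_cache_key(lock)

print(key_before != key_after)
```

Because the key depends only on the lock file, editing `pyproject.toml` without regenerating `poetry.lock` would still hit the old cache.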

.github/workflows/publish.yml

Lines changed: 11 additions & 6 deletions
@@ -25,14 +25,19 @@ jobs:
         uses: actions/setup-python@v3
         with:
           python-version: "3.10"
-      - name: Install dependencies
-        run: |
-          python -m pip install --upgrade pip
-          pip install build
+      - name: Install poetry
+        uses: abatilo/actions-poetry@v2
+      - name: Define a cache for the virtual environment based on the dependencies lock file
+        uses: actions/cache@v3
+        with:
+          path: ./.venv
+          key: venv-${{ hashFiles('poetry.lock') }}
+      - name: Install the package with doc dependencies
+        run: poetry install --without docs,tests
       - name: Build package
-        run: python -m build
+        run: poetry build
       - name: Publish package
-        uses: pypa/gh-action-pypi-publish@27b31702a0e7fc50959f5ad993c78deac1bdfc29
+        uses: pypa/gh-action-pypi-publish@release/v1
         with:
           user: ${{ secrets.PYPI_USERNAME }}
           password: ${{ secrets.PYPI_PASSWORD }}

.github/workflows/tests.yml

Lines changed: 9 additions & 6 deletions
@@ -18,10 +18,13 @@ jobs:
         uses: actions/setup-python@v4
         with:
           python-version: "3.10"
-
+      - name: Install poetry
+        uses: abatilo/actions-poetry@v2
+      - name: Define a cache for the virtual environment based on the dependencies lock file
+        uses: actions/cache@v3
+        with:
+          path: ./.venv
+          key: venv-${{ hashFiles('poetry.lock') }}
       - run: |
-          pip install rdkit==2023.09.6
-          pip install -e ".[tests]"
-          pip install pre-commit
-          pre-commit install && pre-commit run --all
-      - run: pytest --cov=pdbeccdutils
+          poetry install --with tests
+          poetry run pytest --cov=pdbeccdutils

.pre-commit-config.yaml

Lines changed: 19 additions & 9 deletions
@@ -13,15 +13,25 @@ repos:
       - id: fix-byte-order-marker
       - id: end-of-file-fixer
       - id: check-ast
+      - id: no-commit-to-branch

-  - repo: https://github.com/ambv/black
-    rev: 22.3.0
+  - repo: https://github.com/astral-sh/ruff-pre-commit
+    rev: v0.3.5 # Ruff version.
     hooks:
-      - id: black
-
-  - repo: https://github.com/pycqa/flake8
-    rev: 3.9.1
+      - id: ruff
+        args: [--fix]
+      - id: ruff-format
+  - repo: https://github.com/dzhu/rstfmt
+    rev: v0.0.14
     hooks:
-      - id: flake8
-        args: ["--max-line-length=88", "--ignore=E501,W503"]
-        exclude: \.cif$
+      - id: rstfmt
+        name: rST Formatter
+  - repo: https://github.com/python-poetry/poetry
+    rev: "1.8.2"
+    hooks:
+      - id: poetry-check
+      - id: poetry-lock
+        args:
+          - "--no-update"
+          - "--check"
+

CHANGELOG.md

Lines changed: 7 additions & 0 deletions
@@ -1,5 +1,12 @@
 # Changelog

+## RELEASE 0.8.6 - Oct 28, 2024
+
+### Features
+* Enable parsing of a subset of CCDs from the Chemical Component Dictionary
+* Added CCDC to UniChem resources
+
+
 ## RELEASE 0.8.5 - May 26, 2024

 ### Features

CITATION.cff

Lines changed: 53 additions & 0 deletions
@@ -0,0 +1,53 @@
+cff-version: 1.2.0
+message: "If you use this software, please cite it as below."
+authors:
+- family-names: "Kunnakkattu"
+  given-names: "Ibrahim Roshan"
+  orcid: "https://orcid.org/0000-0002-8646-0969"
+- family-names: "Pravda"
+  given-names: "Lukas"
+- family-names: "Yuan"
+  given-names: "Qi"
+- family-names: "S.Smart"
+  given-names: "Oliver"
+- family-names: "Nadzirin"
+  given-names: "Nurul"
+- family-names: "Anyango"
+  given-names: "Stephen"
+- family-names: "Nair"
+  given-names: "Sreenath"
+
+title: "PDBe CCDUtils"
+version: 0.8.5
+date-released: 22/05/2024
+url: "https://github.com/PDBeurope/ccdutils"
+preferred-citation:
+  type: article
+  authors:
+  - family-names: "Kunnakkattu"
+    given-names: "Ibrahim Roshan"
+    orcid: "https://orcid.org/0000-0002-8646-0969"
+  - family-names: "Choudhary"
+    given-names: "Preeti"
+    orcid: "https://orcid.org/0000-0003-2340-3278"
+  - family-names: "Pravda"
+    given-names: "Lukas"
+  - family-names: "Yuan"
+    given-names: "Qi"
+  - family-names: "S.Smart"
+    given-names: "Oliver"
+  - family-names: "Nadzirin"
+    given-names: "Nurul"
+  - family-names: "Anyango"
+    given-names: "Stephen"
+  - family-names: "Nair"
+    given-names: "Sreenath"
+  - family-names: "Velankar"
+    given-names: "Sameer"
+    orcid: "https://orcid.org/0000-0002-8439-5964"
+  doi: "10.1186/s13321-023-00786-w"
+  journal: "Journal of Cheminformatics"
+  month: 12
+  title: "PDBe CCDUtils: an RDKit-based toolkit for handling and analysing small molecules in the Protein Data Bank"
+  volume: 15
+  year: 2023
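
A note on the `date-released` field: CFF 1.2.0 expects an ISO 8601 date (`YYYY-MM-DD`), whereas the value committed in this file is written day-first with slashes. The conversion can be sketched as follows, assuming the input really is `DD/MM/YYYY`; `to_iso_date` is a hypothetical helper for illustration and is not part of the repository.

```python
from datetime import datetime


def to_iso_date(value: str, fmt: str = "%d/%m/%Y") -> str:
    """Parse a day-first date string and return it in ISO 8601 form."""
    return datetime.strptime(value, fmt).date().isoformat()


# The value from CITATION.cff, rewritten in the form CFF validators accept.
iso_date = to_iso_date("22/05/2024")
print(iso_date)
```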

README.md

Lines changed: 54 additions & 47 deletions
@@ -1,76 +1,83 @@
-[![CodeFactor](https://www.codefactor.io/repository/github/pdbeurope/ccdutils/badge/master)](https://www.codefactor.io/repository/github/pdbeurope/ccdutils/overview/master) ![PYPi](https://img.shields.io/pypi/v/pdbeccdutils?color=green&style=flat) ![GitHub](https://img.shields.io/github/license/pdbeurope/ccdutils) ![ccdutils documentation](https://github.com/PDBeurope/ccdutils/workflows/ccdutils%20documentation/badge.svg) ![ccdutils tests](https://github.com/PDBeurope/ccdutils/workflows/ccdutils%20tests/badge.svg)
+[![CodeFactor](https://www.codefactor.io/repository/github/PDBeurope/ccdutils/badge/master)](https://www.codefactor.io/repository/github/PDBeurope/ccdutils/overview/master) ![PYPi](https://img.shields.io/pypi/v/pdbeccdutils?color=green&style=flat) ![GitHub](https://img.shields.io/github/license/PDBeurope/ccdutils) ![ccdutils documentation](https://github.com/PDBeurope/ccdutils/workflows/ccdutils%20documentation/badge.svg) ![ccdutils tests](https://github.com/PDBeurope/ccdutils/workflows/ccdutils%20tests/badge.svg) ![PyPI Downloads](https://img.shields.io/pypi/dm/pdbeccdutils)
+

 # pdbeccdutils

-* A set of python tools to deal with PDB chemical components definitions
-for small molecules, taken from the [wwPDB Chemical Component Dictionary](https://www.wwpdb.org/data/ccd) and [wwPDB The Biologically Interesting Molecule Reference Dictionary](https://www.wwpdb.org/data/bird)
+An RDKit-based python toolkit for parsing and processing small molecule definitions in [wwPDB Chemical Component Dictionary](https://www.wwpdb.org/data/ccd) and [wwPDB The Biologically Interesting Molecule Reference Dictionary](https://www.wwpdb.org/data/bird).`pdbeccdutils` provides streamlined access to all metadata of small molecules in the PDB and offers a set of convenient methods to compute various properties of small molecules using RDKIt such as 2D depictions, 3D conformers, physicochemical properties, matching common fragments and scaffolds, mapping to small-molecule databases using UniChem.
+
+## Features

-* The tools use:
-* [RDKit](http://www.rdkit.org/) for chemistry. Presently tested with `2022.09.4`
+* `gemmi` CCD read/write.
+* Generation of 2D depictions (`No image available` generated if the flattening cannot be done) along with the quality check.
+* Generation of 3D conformations.
+* Fragment library search (PDBe hand-curated library, ENAMINE, DSI).
+* Chemical scaffolds (Murcko scaffold, Murcko general, BRICS).
+* Lightweight implementation of [parity method](https://doi.org/10.1016/j.str.2018.02.009) by Jon Tyzack.
+* RDKit molecular properties per component.
+* UniChem mapping.
+* Generating complete representation of multiple [Covalently Linked Components (CLC)](https://www.ebi.ac.uk/pdbe/news/introducing-covalently-linked-components)
+
+## Dependencies
+
+* [RDKit](http://www.rdkit.org/) for small molecule representation. Presently tested with `2023.9.6`
 * [GEMMI](https://gemmi.readthedocs.io/en/latest/index.html) for parsing mmCIF files.
 * [scipy](https://www.scipy.org/) for depiction quality check.
 * [numpy](https://www.numpy.org/) for molecular scaling.
 * [networkx](https://networkx.org/) for bound-molecules.

-* Please note that the project is under active development.

-## Installation instructions
+## Installation

-* `pdbeccdutils` requires RDKit to be installed.
-The official RDKit documentation has [installation instructions for a variety of platforms](http://www.rdkit.org/docs/Install.html).
-For Linux/macOS this is most easily done using the Anaconda Python with commands similar to:
+create a [virtual environment](https://packaging.python.org/en/latest/guides/installing-using-pip-and-virtual-environments/#create-and-use-virtual-environments) and install using pip

-```console
-conda create -n rdkit-env rdkit python=3.9
-conda activate rdkit-env
-```
-
-* Once you have installed RDKit, as described above then install `pdbeccdutils` using `pip`:
-
-```console
+```bash
 pip install pdbeccdutils
 ```

-## Features
+## Contribution
+We encourage you to contribute to this project. The package uses [poetry](https://python-poetry.org/) for packaging and dependency management. You can develop locally using:

-* `gemmi` CCD read/write.
-* Generation of 2D depictions (`No image available` generated if the flattening cannot be done) along with the quality check.
-* Generation of 3D conformations.
-* Fragment library search (PDBe hand-curated library, ENAMINE, DSI).
-* Chemical scaffolds (Murcko scaffold, Murcko general, BRICS).
-* Lightweight implementation of [parity method](https://doi.org/10.1016/j.str.2018.02.009) by Jon Tyzack.
-* RDKit molecular properties per component.
-* UniChem mapping.
+```bash
+git clone https://github.com/PDBeurope/ccdutils.git
+cd ccdutils
+pip install poetry
+poetry install --with tests,docs
+pre-commit install
+```

-## TODO list
+The pre-commit hook will run linting, formatting and update `poetry.lock`. The `poetry.lock` file will lock all dependencies and ensure that they match pyproject.toml versions.

-* Add more unit/regression tests to get higher code coverage.
-* Further improvements of the documentation.
+To add a new dependency

+```bash
+# Latest resolvable version
+poetry add <package>

-## Documentation
+# Optionally fix a version
+poetry add <package>@<version>
+```
+
+To change a version of a dependency, either edit pyproject.toml and run:

-The documentation depends on the following packages:
+```bash
+poetry sync --with dev
+```

-* `sphinx`
-* `sphinx_rtd_theme`
-* `myst-parser`
-* `sphinx-autodoc-typehints`
+or

-Note that `sphinx` needs to be a part of the virtual environment, if you want to generate documentation by yourself.
-Otherwise it cannot pick `rdkit` module. `sphinx_rtd_theme` is a theme providing nice `ReadtheDocs` mobile friendly style.
+```bash
+poetry add <package>@<version>
+```

-* Generate *.rst* files to be included as a part of the documentation. Inside the directory `pdbeccdutils/doc` run the following commands to generate documentation.
-* Alternatively, use the `myst-parser` package to get the Markdown working.

-Use the following to generate initial markup files to be used by sphinx. This needs to be used when adding another sub-packages.
+## Documentation

-```console
-sphinx-apidoc -f -o /path/to/output/dir ../pdbeccdutils/
-```
+The documentation is generated using `sphinx` in `sphinx_rtd_theme` and hosted on GitHub Pages. To generate the documentation locally,

-Use this to re-generate the documentation from the doc/ directory:
+```bash
+cd doc
+poetry run sphinx-build -b html . _build/html

-```console
-make html
+# See the documentation at http://localhost:8080.
+python -m http.server 8080 -d _build/html
 ```

doc/conf.py

Lines changed: 3 additions & 3 deletions
@@ -13,17 +13,17 @@
 # documentation root, use os.path.abspath to make it absolute, like shown here.
 #

-import pdbeccdutils
+import importlib.metadata

 # region Project information
 project = "pdbeccdutils"
 copyright = "2020, Protein Data Bank in Europe"
 author = "Protein Data Bank in Europe"

 # The short X.Y version
-version = pdbeccdutils.__version__
+version = importlib.metadata.version("pdbeccdutils")
 # The full version, including alpha/beta/rc tags
-release = pdbeccdutils.__version__
+release = importlib.metadata.version("pdbeccdutils")

 # endregion
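
With this change `doc/conf.py` reads the version from the installed distribution metadata rather than from a package attribute, so Sphinx needs `pdbeccdutils` installed in the build environment (for example via `poetry install`). A hedged sketch of the same lookup with a fallback for an uninstalled package is shown below; `package_version` and the `"0.0.0"` default are assumptions for illustration, not code from this commit.

```python
import importlib.metadata


def package_version(dist_name: str, default: str = "0.0.0") -> str:
    """Return the installed version of dist_name, or default if it is absent."""
    try:
        return importlib.metadata.version(dist_name)
    except importlib.metadata.PackageNotFoundError:
        # Raised when no installed distribution has this name.
        return default


# Same two settings as in conf.py, but tolerant of a missing installation.
version = package_version("pdbeccdutils")
release = version
print(version)
```

Calling the lookup once and reusing the result keeps `version` and `release` in step.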

pdbeccdutils/__init__.py

Lines changed: 0 additions & 1 deletion
This file was deleted.

pdbeccdutils/computations/parity_method.py

Lines changed: 15 additions & 4 deletions
@@ -5,6 +5,8 @@
 from rdkit import Chem
 from rdkit.Chem import rdFMCS

+from pdbeccdutils.helpers import mol_tools
+from rdkit.Chem import BondType
 from pdbeccdutils.core.models import ParityResult


@@ -88,8 +90,17 @@ def compare_molecules(template, query, thresh=0.01, exact_match=False):
         ParityResult: Result of the PARITY comparison.
     """

-    template_atoms = template.GetNumAtoms()
-    query_atoms = query.GetNumAtoms()
+    template_copy = Chem.RWMol(template)
+    query_copy = Chem.RWMol(query)
+
+    # changing bondtype from DATIVE to ZERO as the SMARTS with DATIVE bondtype were missing
+    # substructures using GetSubstructMatches (e.g. HEM)
+    # refer rdkit github issue https://github.com/rdkit/rdkit/issues/7280
+    mol_tools.change_bonds_type(template_copy, BondType.DATIVE, BondType.ZERO)
+    mol_tools.change_bonds_type(query_copy, BondType.DATIVE, BondType.ZERO)
+
+    template_atoms = template_copy.GetNumAtoms()
+    query_atoms = query_copy.GetNumAtoms()

     min_num_atoms = min(template_atoms, query_atoms)
     max_sim_score = float(min_num_atoms) / float(
@@ -101,15 +112,15 @@ def compare_molecules(template, query, thresh=0.01, exact_match=False):

     if not exact_match:
         mcs_graph = rdFMCS.FindMCS(
-            [template, query],
+            [template_copy, query_copy],
             bondCompare=rdFMCS.BondCompare.CompareAny,
             atomCompare=rdFMCS.AtomCompare.CompareAny,
             timeout=40,
             completeRingsOnly=True,
         )
     else:
         mcs_graph = rdFMCS.FindMCS(
-            [template_copy, query_copy],
+            [template, query],
             bondCompare=rdFMCS.BondCompare.CompareOrderExact,
             atomCompare=rdFMCS.AtomCompare.CompareElements,
             timeout=40,
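
The parity change above works on editable copies of both molecules and rewrites every DATIVE bond as a ZERO bond before the MCS search, leaving the caller's molecules untouched. The body of `mol_tools.change_bonds_type` is not part of this diff, so the sketch below uses a hypothetical stand-in, `replace_bond_type`, built only from standard RDKit calls, and a hand-built two-atom toy molecule instead of a real CCD entry such as HEM.

```python
from rdkit import Chem
from rdkit.Chem import BondType


def replace_bond_type(mol, old_type, new_type):
    """Return an editable copy of mol with every old_type bond set to new_type."""
    mol_copy = Chem.RWMol(mol)
    for bond in mol_copy.GetBonds():
        if bond.GetBondType() == old_type:
            bond.SetBondType(new_type)
    return mol_copy


# Toy molecule: one nitrogen donating to one iron atom (built by hand, unsanitized).
toy = Chem.RWMol()
n_idx = toy.AddAtom(Chem.Atom(7))
fe_idx = toy.AddAtom(Chem.Atom(26))
toy.AddBond(n_idx, fe_idx, BondType.DATIVE)

converted = replace_bond_type(toy, BondType.DATIVE, BondType.ZERO)

print(toy.GetBondWithIdx(0).GetBondType())        # bond type on the input molecule
print(converted.GetBondWithIdx(0).GetBondType())  # bond type on the converted copy
```

Copying first matters because the comparison function receives molecules owned by the caller; changing bond types in place would alter them for any later use.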
