Commit fbace43

add scripts and ReadMe from 2017 mof subset paper NO_JIRA
1 parent 74a5289 commit fbace43

5 files changed

+324
-34
lines changed

scripts/ReadMe.md

+37-25
@@ -1,45 +1,57 @@
-## Contents
+# Contents

-This folder contains scripts submitted by users or CCDC scientists for anyone to use freely.
+## Concat Mol2

-### Hydrogen bond propensity
-- Writes a `.docx report` of a hydrogen bond propensity calculation for any given `.mol2`/refcode.
+- Concatenates mol2 files present in working directory to a single `.mol2` file.

-### Multi-component hydrogen bond propensity
-- Performs a multi-component HBP calculation for a given library of co-formers.
+## Create CASTEP Input

-### Packing similarity dendrogram
-- Construct a dendrogram for an input set of structures based on packing-similarity analysis.
+- Creates input files (`.cell` and `.param`) for a given compound through Mercury.

-### GOLD-multi
-- Use the CSD Docking API and the multiprocessing module to parallelize GOLD docking.
+## Create GAUSSIAN Input
+
+- Create GAUSSIAN input file (`.gjf`) for a given CSD refcode or `.mol2` file.
+
+## Find Binding Conformation

-### Find Binding Conformation
 - Generates idealized conformers for ligands and evaluates their RMSD to the conformation in the PDB.

-### Concat Mol2
-- Concatenates mol2 files present in working directory to a single `.mol2` file.
+## GOLD-multi

-### Create CASTEP Input
-- Creates input files (`.cell` and `.param`) files for a given compound through Mercury.
+- Use the CSD Docking API and the multiprocessing module to parallelize GOLD docking.

-### Create GAUSSIAN Input
-- Create GAUSSIAN input file (`.gjf`) for a given CSD refcode or `.mol2` file.
+## Hydrogen bond propensity
+
+- Writes a `.docx` report of a hydrogen bond propensity calculation for any given `.mol2`/refcode.
+
+## MOF subset 2017 Chem Mater publication
+
+- Two scripts that were supplementary information in the publication "Development of a Cambridge Structural Database Subset:
+A Collection of Metal–Organic Frameworks for Past, Present, and Future" DOI: <https://doi.org/10.1021/acs.chemmater.7b00441>
+
+## Multi-component hydrogen bond propensity
+
+- Performs a multi-component HBP calculation for a given library of co-formers.
+
+## Packing similarity dendrogram
+
+- Construct a dendrogram for an input set of structures based on packing-similarity analysis.
+
+## Particle Rugosity

-### Particle Rugosity
 - Calculates the simulated BFDH particle rugosity weighted by facet area.

-## Tips
-A section for top tips in using the repository and GitHub.
-### Searching tips:
+## Tips
+
+A section for top tips in using the repository and GitHub.
+
+### Searching tips

 The search bar in GitHub allows you to search for keywords mentioned in any file throughout the repository (in the main branch).

 It is also possible to filter which file type you are interested in.

-For example:
-"hydrogen bond"
+For example:
+"hydrogen bond"

 <img src="../assets/search.gif" width="500px">
-
-
@@ -0,0 +1,119 @@
+#
+# This script can be used for any purpose without limitation subject to the
+# conditions at http://www.ccdc.cam.ac.uk/Community/Pages/Licences/v2.aspx
+#
+# This permission notice and the following statement of attribution must be
+# included in all copies or substantial portions of this script.
+#
+# 2016-12-15: created by S. B. Wiggin, the Cambridge Crystallographic Data Centre
+# 2024-07-02: minor update to include using ccdc utilities to find the solvent file
+
+"""
+Script to identify and remove bound solvent molecules from a MOF structure.
+
+Solvents are identified using a defined list.
+Output in CIF format includes only framework component with all monodentate solvent removed.
+"""
+#######################################################################
+
+import os
+import glob
+import argparse
+
+from ccdc import io
+from ccdc import utilities
+
+#######################################################################
+
+arg_handler = argparse.ArgumentParser(description=__doc__)
+arg_handler.add_argument(
+    'input_file',
+    help='CSD .gcd file from which to read MOF structures'
+)
+arg_handler.add_argument(
+    '-o', '--output-directory',
+    help='Directory into which to write stripped structures'
+)
+arg_handler.add_argument(
+    '-m', '--monodentate', default=False, action='store_true',
+    help='Whether or not to strip all unidentate (or monodentate) ligands from the structure'
+)
+arg_handler.add_argument(
+    '-s', '--solvent-file',
+    help='Location of solvent file'
+)
+
+args = arg_handler.parse_args()
+if not args.output_directory:
+    args.output_directory = os.path.dirname(os.path.abspath(args.input_file))
+
+# Define the solvent smiles patterns
+if not args.solvent_file:
+    args.solvent_file = utilities.Resources().get_ccdc_solvents_dir()
+
+if os.path.isdir(args.solvent_file):
+    solvent_smiles = [
+        io.MoleculeReader(f)[0].smiles
+        for f in glob.glob(os.path.join(args.solvent_file, '*.mol2'))
+    ]
+else:
+    solvent_smiles = [m.smiles for m in io.MoleculeReader(args.solvent_file)]
+
+
+#######################################################################
+
+
+def is_multidentate(c, mol):
+    """
+    Check for components bonded to metals more than once.
+    If monodentate is not specified in the arguments, skip this test.
+    """
+    if not args.monodentate:
+        return True
+    got_one = False
+    for a in c.atoms:
+        orig_a = mol.atom(a.label)
+        if any(x.is_metal for b in orig_a.bonds for x in b.atoms):
+            if got_one:
+                return True
+            got_one = True
+    return False
+
+
+def is_solvent(c):
+    """Check if this component is a solvent."""
+    return c.smiles == 'O' or c.smiles in solvent_smiles
+
+
+def has_metal(c):
+    """Check if this component has any metals."""
+    return any(a.is_metal for a in c.atoms)
+
+
+# Iterate over entries
+try:
+    for entry in io.EntryReader(args.input_file):
+        if entry.has_3d_structure:
+            # Ensure labels are unique
+            mol = entry.molecule
+            mol.normalise_labels()
+            # Use a copy
+            clone = mol.copy()
+            # Remove all bonds containing a metal atom
+            clone.remove_bonds(b for b in clone.bonds if any(a.is_metal for a in b.atoms))
+            # Work out which components to remove
+            to_remove = [
+                c
+                for c in clone.components
+                if not has_metal(c) and (not is_multidentate(c, mol) or is_solvent(c))
+            ]
+            # Remove the atoms of selected components
+            mol.remove_atoms(
+                mol.atom(a.label) for c in to_remove for a in c.atoms
+            )
+            # Write the CIF
+            entry.crystal.molecule = mol
+            with io.CrystalWriter('%s/%s_stripped.cif' % (args.output_directory, entry.identifier)) as writer:
+                writer.write(entry.crystal)
+except RuntimeError:
+    print('File format not recognised')
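
The selection rule in the command-line script (cut every bond that touches a metal, split what is left into components, then drop metal-free components that are either solvent or bound to metal through fewer than two atoms) can be sketched without the CSD Python API. The following is a minimal illustration on a toy graph, not the script's implementation: `ELEMENTS`, `BONDS`, `METALS`, `is_water` and every function name here are hypothetical stand-ins, and the solvent test is a simple element count rather than the SMILES match the script uses.

```python
from collections import defaultdict

# Toy structure (hypothetical, not taken from the CSD): atom label -> element.
ELEMENTS = {
    'Zn1': 'Zn', 'Zn2': 'Zn',
    'O1': 'O', 'H1': 'H', 'H2': 'H',   # water bound to Zn1
    'O2': 'O', 'C1': 'C', 'O3': 'O',   # linker fragment bridging Zn1 and Zn2
    'N1': 'N', 'C2': 'C',              # fragment bound to Zn2 through N1 only
}
BONDS = [
    ('Zn1', 'O1'), ('O1', 'H1'), ('O1', 'H2'),
    ('Zn1', 'O2'), ('O2', 'C1'), ('C1', 'O3'), ('O3', 'Zn2'),
    ('Zn2', 'N1'), ('N1', 'C2'),
]
METALS = {'Zn'}  # element symbols treated as metals in this toy model


def components_without_metal_bonds(elements, bonds):
    """Split the structure into components after dropping metal-containing bonds."""
    metals = {label for label, el in elements.items() if el in METALS}
    adjacency = defaultdict(set)
    for a, b in bonds:
        if a in metals or b in metals:
            continue  # plays the role of clone.remove_bonds(...) in the script
        adjacency[a].add(b)
        adjacency[b].add(a)
    components, seen = [], set()
    for start in elements:
        if start in seen:
            continue
        component, stack = set(), [start]
        while stack:
            atom = stack.pop()
            if atom not in component:
                component.add(atom)
                stack.extend(adjacency[atom] - component)
        seen |= component
        components.append(component)
    return components, metals


def metal_bound_atoms(component, bonds, metals):
    """Atoms of the component that are bonded to a metal in the original graph."""
    bound = set()
    for a, b in bonds:
        if a in component and b in metals:
            bound.add(a)
        if b in component and a in metals:
            bound.add(b)
    return bound


def is_water(component):
    """Toy solvent test: the component is exactly one O and two H atoms."""
    return sorted(ELEMENTS[a] for a in component) == ['H', 'H', 'O']


def select_for_removal(elements, bonds, is_solvent, strip_monodentate=False):
    """Return the components the rule would strip from the structure."""
    components, metals = components_without_metal_bonds(elements, bonds)
    to_remove = []
    for component in components:
        if component & metals:
            continue  # metal nodes are always kept
        # Without the monodentate option every component counts as multidentate.
        multidentate = (not strip_monodentate
                        or len(metal_bound_atoms(component, bonds, metals)) >= 2)
        if not multidentate or is_solvent(component):
            to_remove.append(component)
    return to_remove


default = select_for_removal(ELEMENTS, BONDS, is_water)
strict = select_for_removal(ELEMENTS, BONDS, is_water, strip_monodentate=True)
print([sorted(c) for c in default])
print([sorted(c) for c in strict])
```

In this toy model only the water is stripped by default, while `strip_monodentate=True` also strips the fragment bound through `N1`; the bridging linker survives in both cases because two of its atoms are bonded to metals.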
@@ -0,0 +1,98 @@
+#
+# This script can be used for any purpose without limitation subject to the
+# conditions at http://www.ccdc.cam.ac.uk/Community/Pages/Licences/v2.aspx
+#
+# This permission notice and the following statement of attribution must be
+# included in all copies or substantial portions of this script.
+#
+# 2016-12-15: created by S. B. Wiggin, the Cambridge Crystallographic Data Centre
+# 2024-07-02: minor update to include using ccdc utilities to find the solvent file
+
+"""
+Script to identify and remove bound solvent molecules from a MOF structure.
+
+Solvents are identified using a defined list.
+Output in CIF format includes only framework component with all monodentate solvent removed.
+"""
+#######################################################################
+
+import os
+import glob
+
+from ccdc import io
+from ccdc import utilities
+from mercury_interface import MercuryInterface
+
+#######################################################################
+
+helper = MercuryInterface()
+solvent_smiles = []
+
+# Define the solvent smiles patterns
+solvent_file = utilities.Resources().get_ccdc_solvents_dir()
+
+if os.path.isdir(solvent_file):
+    solvent_smiles = [
+        io.MoleculeReader(f)[0].smiles
+        for f in glob.glob(os.path.join(solvent_file, '*.mol2'))
+    ]
+
+else:
+    html_file = helper.output_html_file
+    f = open(html_file, "w")
+    f.write('<br>')
+    f.write('Sorry, unable to locate solvent files in the CCDC directory')
+    f.write('<br>')
+    f.close()
+    # a user-defined solvent directory could be added here instead
+
+#######################################################################
+
+
+def is_solvent(c):
+    """Check if this component is a solvent."""
+    return c.smiles == 'O' or c.smiles in solvent_smiles
+
+
+def has_metal(c):
+    """Check if this component has any metals."""
+    return any(a.is_metal for a in c.atoms)
+
+
+entry = helper.current_entry
+if entry.has_3d_structure:
+    # Ensure labels are unique
+    mol = entry.molecule
+    mol.normalise_labels()
+    # Use a copy
+    clone = mol.copy()
+    # Remove all bonds containing a metal atom
+    clone.remove_bonds(b for b in clone.bonds if any(a.is_metal for a in b.atoms))
+    # Work out which components to remove
+    to_remove = [
+        c
+        for c in clone.components
+        if not has_metal(c) and is_solvent(c)
+    ]
+    # Remove the atoms of selected components
+    mol.remove_atoms(
+        mol.atom(a.label) for c in to_remove for a in c.atoms
+    )
+    # Write the CIF
+    entry.crystal.molecule = mol
+    with (io.CrystalWriter('%s/%s_stripped.cif' % (helper.options['working_directory_path'], entry.identifier)) as
+          writer):
+        writer.write(entry.crystal)
+    html_file = helper.output_html_file
+    f = open(html_file, "w")
+    f.write('<br>')
+    f.write('Cif file containing MOF framework without monodentate solvent written to your output directory')
+    f.write('<br>')
+    f.close()
+else:
+    html_file = helper.output_html_file
+    f = open(html_file, "w")
+    f.write('<br>')
+    f.write('Sorry, this script will only work for CSD entries containing atomic coordinates')
+    f.write('<br>')
+    f.close()
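
The Mercury script repeats the same open, write, close sequence three times to report a status message. One possible tidy-up, shown here only as a sketch, is a small helper built on a `with` block so the file is closed even if a write fails. The name `write_html_message` is hypothetical and not part of the commit; in the script the path would come from `helper.output_html_file`, whereas the demonstration below uses a temporary file so that it runs without Mercury.

```python
import os
import tempfile


def write_html_message(html_path, message):
    """Write one status message, wrapped in <br> tags, to an HTML report file."""
    # The context manager closes the file even if a write raises.
    with open(html_path, "w") as handle:
        handle.write('<br>')
        handle.write(message)
        handle.write('<br>')


# Demonstration with a temporary file standing in for helper.output_html_file.
demo_path = os.path.join(tempfile.mkdtemp(), 'report.html')
write_html_message(
    demo_path,
    'Sorry, this script will only work for CSD entries containing atomic coordinates',
)
with open(demo_path) as handle:
    demo_contents = handle.read()
print(demo_contents)
```

The bytes written are the same as in the script's three separate `f.write` calls; only the file handling differs.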
@@ -0,0 +1,56 @@
+# MOF solvent removal
+
+## Summary
+
+Scripts included in the supporting information of the article "Development of a Cambridge Structural Database Subset:
+A Collection of Metal–Organic Frameworks for Past, Present, and Future", Peyman Z. Moghadam, Aurelia Li,
+Seth B. Wiggin, Andi Tao, Andrew G. P. Maloney, Peter A. Wood, Suzanna C. Ward, and David Fairen-Jimenez
+*Chem. Mater.* **2017**, 29, 7, 2618–2625, DOI: <https://doi.org/10.1021/acs.chemmater.7b00441>
+
+The two scripts are essentially equivalent: one is designed to be run through the Mercury CSD Python API menu to
+remove solvent from a single structure present in the visualiser; the second runs from the command line
+and takes a list of CSD entries (a `.gcd` file) to run through the solvent removal process in bulk.
+
+## Requirements
+
+Tested with CSD Python API 3.9.18
+
+## Licensing Requirements
+
+CSD-Core
+
+## Instructions on running
+
+For the script Mercury_MOF_solvent_removal.py:
+
+- In Mercury, pick **CSD Python API** in the top-level menu, then **Options…** in the resulting pull-down menu.
+- The Mercury Scripting Configuration control window will be displayed; from the *Additional Mercury Script Locations*
+  section, use the **Add Location** button to navigate to the folder containing the script.
+- It will then be possible to run the script directly from the CSD Python API menu, with the script running on the structure
+  shown in the visualiser.
+
+For the script Command_prompt_MOF_solvent_removal.py:
+
+```cmd
+python Command_prompt_MOF_solvent_removal.py <search_results>.gcd
+```
+
+```cmd
+positional arguments:
+  input_file            CSD .gcd file from which to read MOF structures
+
+optional arguments:
+  -h, --help            show this help message and exit
+  -o OUTPUT_DIRECTORY, --output-directory OUTPUT_DIRECTORY
+                        Directory into which to write stripped structures
+  -m, --monodentate
+                        Whether or not to strip all unidentate (or monodentate) ligands from the structure
+  -s SOLVENT_FILE, --solvent-file SOLVENT_FILE
+                        The location of a solvent file
+```
+
+## Author
+
+*S.B.Wiggin* (2016)
+
+> For feedback or to report any issues please contact [[email protected]](mailto:[email protected])
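
As a rough illustration of how the options listed in the help text combine, the sketch below rebuilds an equivalent `argparse` parser and parses a sample command line in-process. It is an assumption-laden stand-in rather than the script itself: `build_parser` and `resolve_output_directory` are hypothetical names, and `results.gcd` is a placeholder file name. The output directory falls back to the folder of the input file, resolved to an absolute path first because `os.path.dirname` returns an empty string for a bare file name.

```python
import argparse
import os


def build_parser():
    """Recreate the options shown in the help text (hypothetical helper)."""
    parser = argparse.ArgumentParser(
        description='Remove bound solvent from MOF structures')
    parser.add_argument(
        'input_file',
        help='CSD .gcd file from which to read MOF structures')
    parser.add_argument(
        '-o', '--output-directory',
        help='Directory into which to write stripped structures')
    parser.add_argument(
        '-m', '--monodentate', default=False, action='store_true',
        help='Also strip ligands bound to metal through a single atom')
    parser.add_argument(
        '-s', '--solvent-file',
        help='Location of solvent file')
    return parser


def resolve_output_directory(args):
    """Use the given directory, or fall back to the input file's folder."""
    if args.output_directory:
        return args.output_directory
    # abspath first: dirname('results.gcd') alone would be an empty string.
    return os.path.dirname(os.path.abspath(args.input_file))


# Equivalent to: python Command_prompt_MOF_solvent_removal.py results.gcd -m
args = build_parser().parse_args(['results.gcd', '-m'])
print(args.monodentate, resolve_output_directory(args))

# Equivalent to: ... results.gcd -o stripped -s my_solvents.mol2
custom = build_parser().parse_args(
    ['results.gcd', '-o', 'stripped', '-s', 'my_solvents.mol2'])
print(resolve_output_directory(custom), custom.solvent_file)
```

`stripped` and `my_solvents.mol2` are placeholder names chosen for the example.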
