
Tutorial02

Claudio Bantaloukas edited this page Nov 2, 2021 · 1 revision

Tutorial 2: Solving a Structure from a Low Resolution Data Set

Introduction

The aim of this tutorial is to guide you through the structure solution of chlorpropamide. It assumes that you have completed Tutorial 1. In this tutorial you will learn how to:

  • Handle structures that are more flexible than hydrochlorothiazide.

  • Solve a structure from a low resolution data set.

  • Recognise one of the potential pitfalls of global optimisation, i.e. local minima.

Data

The data set Tutorial_2.xye is a synchrotron X-ray diffraction data set collected on BM16 at the European Synchrotron Radiation Facility. The incident wavelength was 0.800077 Å.

Stage 1. Reading the data

  • Open DASH and select the directory where the data resides.

  • Select View data / determine peak positions and click Next >.

  • Select the file Tutorial_2.xye using the Browse... button.

  • Click Next >.

  • Check that the wavelength and radiation source have been set correctly and click Next >.

Stage 2. Examining the data

Note that this data set was collected quickly at the end of a day’s beamtime, and so only extends to 22° 2θ. Hence the data set extends to a resolution of only ~2 Å. Truncate the data to start at 1.5° to remove the data points affected by the beam stop and then subtract the background using the default window value of 100 and click Next >.
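
The ~2 Å figure follows from Bragg's law, d = λ / (2 sin θ). The following is a minimal Python sketch of that arithmetic, using the wavelength and the upper 2θ limit quoted in this tutorial. The function name is ours and is purely illustrative; it is not part of DASH.

```python
import math

def d_spacing(two_theta_deg, wavelength):
    """Return the d-spacing (same units as wavelength) from Bragg's law."""
    theta = math.radians(two_theta_deg / 2.0)
    return wavelength / (2.0 * math.sin(theta))

WAVELENGTH = 0.800077   # Å, incident wavelength from the Data section
TWO_THETA_MAX = 22.0    # degrees 2θ, upper limit of the data set

d_min = d_spacing(TWO_THETA_MAX, WAVELENGTH)
print(f"Resolution limit: {d_min:.2f} Å")
```

The result is about 2.1 Å, consistent with the "~2 Å" quoted above.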

Stage 3. Fitting the peaks to determine the exact peak positions

Select the first twenty peaks using the method described in Tutorial 1: Step-by-Step Structure Solution of Hydrochlorothiazide.

Here is a guide to the positions (° 2θ) of the first 20 peaks:

3.4383 6.1080 6.8792 8.5344 8.9466
9.4316 9.9800 10.1033 10.2499 10.3269
10.7041 11.1635 11.3767 11.5027 12.2579
12.3053 13.3092 13.4047 13.5143 13.5696
  • Click Next >.

  • Select Run> to run DICVOL or use another indexing program as described in Tutorial 1.

Stage 4. Indexing

Your indexing program may reveal a number of possible unit cells. The unit cell with the highest figures of merit should be orthorhombic with a volume of ~1266 Å3. DICVOL, for example, returns an orthorhombic cell with a = 26.66826 Å, b = 9.08435 Å, c = 5.22571 Å and volume = 1265.999 Å3 with figures of merit M(20) = 107.1 and F(20) = 506.6.

Closer inspection of the other unit cells that are suggested by the indexing program will reveal that many of them are slight monoclinic distortions of the above unit cell, with almost identical volumes and lattice parameters and beta ~90°. Other suggestions generally have much lower figures of merit and can be ruled out immediately.

Considering that the orthorhombic unit cell has the best figures of merit, and that it is usually best to try the simplest option first, we will proceed to the next stage assuming an orthorhombic unit cell, with the lattice parameters given above.
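
As a rough consistency check on the orthorhombic cell, the cell volume and a few expected peak positions can be computed from the DICVOL lattice parameters. The sketch below is an illustration only: the Miller indices (200), (210) and (400) are our own assignment to the first three peaks in the Stage 3 guide list, not something stated by the indexing output, and no zero-point correction is applied.

```python
import math

WAVELENGTH = 0.800077                  # Å
A, B, C = 26.66826, 9.08435, 5.22571   # Å, orthorhombic cell from DICVOL

def cell_volume(a, b, c):
    """Volume of an orthorhombic cell (all angles 90 degrees)."""
    return a * b * c

def two_theta(h, k, l, a=A, b=B, c=C, wavelength=WAVELENGTH):
    """2θ in degrees for reflection (hkl) of an orthorhombic cell."""
    inv_d_sq = (h / a) ** 2 + (k / b) ** 2 + (l / c) ** 2
    d = 1.0 / math.sqrt(inv_d_sq)
    return 2.0 * math.degrees(math.asin(wavelength / (2.0 * d)))

volume = cell_volume(A, B, C)
print(f"Cell volume: {volume:.1f} Å^3")
for hkl in [(2, 0, 0), (2, 1, 0), (4, 0, 0)]:
    print(hkl, f"{two_theta(*hkl):.3f}")
```

The computed positions fall within about 0.001° of the listed values 3.4383, 6.1080 and 6.8792.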

Stage 5. Stop and Think

Does the cell make sense? In this case we estimate the molecular volume to be ~340 Å3 from the formula C10H13N2O3SCl, using approximate atomic volumes of 15 Å3 for C, N and O, 25 Å3 for S and Cl, and 5 Å3 for H. Given the unit cell volume of ~1266 Å3, this very rough approximation tells us that the cell is most likely to accommodate 4 molecules. At this point, your knowledge of space group frequencies should suggest that P212121 is a strong possibility. (A list of space groups and their frequencies is given in Appendix D of the DASH User Guide.)
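
The estimate can be reproduced in a few lines. This is a minimal sketch of the arithmetic only, using the per-atom volumes and formula quoted above; the variable names are ours.

```python
# Approximate atomic volumes in Å^3, as quoted in the text
ATOM_VOLUMES = {"C": 15.0, "N": 15.0, "O": 15.0, "S": 25.0, "Cl": 25.0, "H": 5.0}

# Atom counts for C10H13N2O3SCl
FORMULA = {"C": 10, "H": 13, "N": 2, "O": 3, "S": 1, "Cl": 1}

CELL_VOLUME = 1266.0  # Å^3, from indexing

molecular_volume = sum(ATOM_VOLUMES[el] * n for el, n in FORMULA.items())
molecules_per_cell = CELL_VOLUME / molecular_volume

print(f"Estimated molecular volume: {molecular_volume:.0f} Å^3")
print(f"Estimated molecules per cell: {molecules_per_cell:.2f}")
```

With these increments the sum comes to 340 Å3, in line with the rough estimate quoted above, and the ratio rounds to 4 molecules per cell.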

Stage 6. Checking the Cell and Determining the Space Group

The space group P222 will automatically have been selected. The presence of some excess tick marks indicates probable systematic absences; this means that a space group of higher symmetry might be more appropriate. Scroll through some of the possible space groups. You will see that some of the space groups can be ruled out immediately; for example, face-centred and body-centred lattices leave some peaks unaccounted for. Many of the primitive lattice space groups appear likely from the tick mark positions. In this situation, where more than one possible space group exists, it is logical to begin with the most frequently occurring space group. In this case, the most frequently occurring orthorhombic space group is P212121, so select this (number 19), confirm visually that it matches the data and click Next >.

Stage 7. Extracting Intensities

Choose 7 isolated peaks from across the pattern. Fit these peaks using the method described in Tutorial 1 and then carry out the Pawley refinement. The initial 3 cycles of least squares refinement involve only the terms corresponding to the background and to the individual reflection intensities; accept these three cycles. The next 5 cycles of least squares refinement involve the terms describing background, intensities, unit cell and zero point. These refinement details will be suggested automatically by DASH.

When these cycles are complete check the difference line; this should be almost flat by this point. The final Pawley χ2 should be between 3 and 4.

Accept this Pawley fit and save it as Tutorial_2.sdi (exit from DASH, if you wish, at this point in the tutorial).

Stage 8. Molecule Construction

Construct a 3D molecular description of the molecule using your favourite modelling software and save it in pdb, mol or mol2 format. This can be done, for example, by importing an ISIS/Draw sketch into WebLabViewer (see Tutorial 1 for further details). Save this as Tutorial_2.pdb, Tutorial_2.mol or Tutorial_2.mol2. (If you do not have a model building program to hand, a file, Tutorial_2.mol2, is supplied with the tutorial.)

Stage 9. Setting up the Structure Solution Run

  • Start DASH as before and select Simulated annealing structure solution from the Wizard.

  • Select the Tutorial_2.sdi file.

  • Click on the icon and select either Tutorial_2.pdb, Tutorial_2.mol, or Tutorial_2.mol2 (the file that you created in Stage 8); a Z-matrix file called Tutorial_2_1.zmatrix will be generated automatically.

  • Read in the Tutorial_2_1.zmatrix file and click Next >.

Note that as Z = 4 for P212121, it follows that Z’ = 1 because we know from Stage 5 that the cell is most likely to accommodate 4 molecules. Therefore, only one Z-matrix is required.

At this point DASH will confirm that there are 12 independent parameters. These parameters are listed when you click on Next >. There are 3 parameters describing the positional co-ordinates, 4 (of which 3 are independent) describing the molecular orientation within the unit cell and 6 variable torsion angles. All F boxes are unticked by default, indicating that all 13 listed parameters (12 of them independent) are allowed to vary during structure solution. Click Next > to proceed to the Simulated Annealing Protocol window. The default values can be used for this example, so click Next >, then Solve > to begin the simulated annealing.

NB: Keen chemists should resist the urge to restrict the torsional rotations pertaining to the two bonds around the carbonyl group!

Stage 10. Monitoring Structure Solution Progress

The progress of the structure solution can be followed by monitoring the profile χ2 and the difference plot.

Once a profile χ2 of approximately 10 - 12 or less is reached, you can be sure that a very good structure has been found, as this value is only ~3 times the Pawley χ2 value. Finalise the solution by selecting the Local minimisation button and accepting the answer.

If your final profile χ2 is a bit higher than 10, you are clearly close and perhaps only a single atom at the end of the chain is slightly misplaced. Take a close look at the output structure and read the section below.

Stage 11. Examining the Output Structure

View the structure using the View button in the Simulated Annealing Status window. The structure should be chemically reasonable in terms of molecular conformation and intermolecular distances. The potential for H-bonding is obvious.

We can examine similar structures in the Cambridge Structural Database, and observe that where there are more acceptors, O, than donors NH, firstly the donors must be satisfied, and secondly, bifurcation of the H-bonds is quite common. This is what we see below, using the Mercury visualiser with Packing and H-bond switched on:

If you have time, try doing several SA solution runs and compare the results. This is easy to do in DASH: in the Simulated Annealing Protocol window there is the option to start a set of runs, each with a different seed for the random number sequence. If you decide that reasonable termination criteria are Max. number of moves / run = 2,000,000 and Multiplier for Pawley χ2 = 3.5, the runs will terminate either at move number 2,000,000, or when the profile χ2 falls below a value of 3.5 times the χ2 for the Pawley fit. The best solution files are stored in sequence; if you called your run fit1, the output files are fit1_001.pdb, fit1_002.pdb, etc.
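
The termination rule and output file naming described above can be sketched as follows. This is an illustration of the stated logic only, not DASH's own code; the function names and default arguments are hypothetical.

```python
def run_should_stop(moves, profile_chi2, pawley_chi2,
                    max_moves=2_000_000, multiplier=3.5):
    """Stop at the move limit, or once the profile chi-squared drops
    below multiplier times the Pawley chi-squared."""
    return moves >= max_moves or profile_chi2 < multiplier * pawley_chi2

def solution_filenames(run_name, n_runs):
    """Names of the sequentially numbered best-solution files."""
    return [f"{run_name}_{i:03d}.pdb" for i in range(1, n_runs + 1)]

# Example: Pawley chi-squared of 3.0, so the threshold is 10.5
print(run_should_stop(500_000, 10.0, 3.0))    # below threshold
print(run_should_stop(500_000, 12.0, 3.0))    # above threshold, moves remain
print(solution_filenames("fit1", 3))
```

With a Pawley χ2 between 3 and 4, a multiplier of 3.5 corresponds to a profile χ2 threshold of roughly 10.5 to 14.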

The accuracy of the solution can be assessed by comparing these independent solutions. An example is given here of an output structure (red) with a final profile χ2 only slightly higher than that of the lowest χ2 solutions found in a set of runs.

In this case, it is a structure that differs only slightly from the correct structures, corresponding to a local minimum with a profile χ2 only slightly higher than that of the correct crystal structure. The H-bonding scheme is correct, but there are small differences in the terminal side-chain torsion angles.

Stage 12. Applying Modal Torsion Angle Restraints

This section demonstrates the use of modal torsion angle ranges during the simulated annealing stage in DASH, and shows how this can be facilitated using the CSD Portfolio, which now includes Mogul. Mogul is a molecular geometry database which forms part of the CSD Portfolio and is available separately from the CCDC.

  • Press <Back (from the Solution Summary dialogue) to return to the introductory Wizard Window. Choose the option Simulated annealing structure solution. Reload the z-matrix file for Tutorial 2 and then proceed to the Parameter Bounds dialogue box as before.

  • Two methods of accessing torsion angle distributions from the CSD are provided:
    Using DASH with Mogul.
    Using DASH with the CSD, ConQuest and Mercury.

Using DASH with Mogul

  • If DASH has access to Mogul, the distribution of each torsion angle in the CSD will be examined using Mogul, and restricted ranges determined from these data will be applied. Scroll down the parameters until the torsion angles are visible.

  • The first torsion angle listed in the table is S8:N11:C12:O13. Click on the Modal button. If the correct path to Mogul is present in the DASH Configuration window (access this from the top-level menu by selecting Options, then Configuration) then a histogram of the Mogul hits for the selected torsion angle will appear. If a path to Mogul is not present in the Configuration window, hit the Browse... button in this window and find the location of your installation of the Mogul executable. If a standard installation of Mogul has been performed, DASH should automatically pick up the path to Mogul from the Windows Registry.

  • 47 hits for the torsion angle have been found by Mogul, and these can be viewed by clicking on the View Structures tab in the Mogul window. If individual bars of the histogram are selected (deselect all hits in histogram and then click on the histogram bars of interest) only these structures are displayed in the View Structures window. For example, if the bars around 180° are selected and the structures viewed then Refcodes QERXUK, TOHBUN and TOHBUN01 are displayed.

  • Returning to the histogram in the Results and analysis pane, it is clear that the torsion angle is most often found to be around 0° with a very small percentage of structures found with torsion angles of 180°. (It should be noted that the Mogul histogram displays all torsion angles, positive and negative, on positive axes, i.e. 0-180°.) Close the Mogul window (select File from the top-level menu and click on Exit in the pull-down menu).

  • The Modal Torsion Angle Ranges window of DASH will now be displayed. DASH performs a very simple analysis of the distribution of torsion angles returned from Mogul and, if it recognises the torsion angle distribution, will recommend a range of torsion angles to be searched during the simulated annealing; these are displayed in the Sampling Ranges section of the window.

  • These ranges are only a recommendation and can be edited and altered. To alter the torsion angle ranges, type the new value in the boxes labelled Lower and Upper. DASH will calculate the other torsion angle ranges depending on whether the torsion angle has been chosen to be bimodal or trimodal; these ranges are displayed in the grey boxes. To accept the torsion angles displayed click on OK. To reject modal torsion angle ranges, click on Non Modal. Clicking Cancel will remove any edits made to the torsion angle ranges since OK was last clicked.

  • In this case, the suggestion of torsion angle ranges of -20° to 0° and 0° to 20° is appropriate and the ranges suggested by DASH should be accepted by clicking on OK. The Parameter Bounds dialogue box will be displayed and the torsion angle S8:N11:C12:O13 will be displayed in red, indicating that modal torsion angle ranges have been applied.

  • Next click on the Modal button for the next torsion angle, C15:N14:C12:N11. Again, a histogram generated by Mogul will appear and this time it will show a very clear distribution of torsion angles around 180°. When the Mogul window is closed the Modal Torsion Angle Ranges dialogue box will be shown with a recommended torsion angle distribution of Bimodal around 180 degrees. The torsion angle ranges displayed in the Sampling Ranges boxes are satisfactory so click OK. This procedure should be repeated for all torsion angle ranges.

  • For torsion angle C4:S8:N11:C12, the histogram displayed in Mogul shows a cluster of data around 50° to 100°. Upon closing the Mogul window, DASH recommends a bimodal torsion angle range of 45° to 135°. This range adequately covers the distribution returned by Mogul and can be accepted by clicking OK. If you wish, the range can be narrowed by editing the Upper bounds box.

  • In the case of torsion angle C16:C15:N14:C12, DASH cannot process the torsion angle distribution returned from Mogul as it does not recognise the shape of the distribution. Modal torsion angle ranges can either be entered manually in the Sampling Ranges boxes (for example a lower bound of 50° and an upper bound of 180° could be used) or no torsion angle ranges need be applied. In this case, click the Non-Modal button.

  • For torsion angle C3:C4:S8:N11 a bimodal distribution is recommended by DASH, 45° to 135°. This covers the majority of the torsion angle distribution returned from Mogul. However, if this range is accepted and OK is clicked, a warning will pop up stating that the initial value of the torsion angle is not within the defined ranges. In this case it is acceptable to change the initial value of the torsion angle to, for example, 50°. Clicking OK now will apply the torsion angle ranges.

  • For torsion angle C17:C16:C15:N14 the Mogul histogram shows peaks at approximately 60° and 180°, indicating a trimodal distribution. Upon closing the Mogul window DASH recommends a trimodal distribution with ranges -150° to 150°, 30° to 90° and -30° to -90°. These ranges are appropriate so click on OK to accept them.

  • Out of the 6 torsion angles, modal ranges have been set for 5 of them. Proceed through the simulated annealing, as before.

  • A simulated annealing run with 10 starts, a maximum of 10 million moves and random seeds 315, 159 was performed with the modal torsion angle ranges recommended by DASH. Of the 10 runs, 8 had a profile χ2 below 10 and the average number of moves required was 646 750. A similar run performed without modal torsion angles resulted in 7/10 solutions with a profile χ2 below 10 and an average of 1 828 450 moves.
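
The warning about an initial value lying outside the modal ranges amounts to a simple interval test. The sketch below illustrates it under two assumptions of ours: that the complementary bimodal range for 45° to 135° is -135° to -45° (by analogy with the bimodal entries in the CSD results table later in this tutorial), and that a range written with its lower bound above its upper bound, such as 160° to -160°, wraps through 180°. The initial value 27.90° is the one listed for C3:C4:S8:N11 in that table. This is not DASH's own code.

```python
def in_range(angle, lower, upper):
    """True if angle (degrees, -180 to 180) lies in [lower, upper].
    If lower > upper the range is taken to wrap through 180 degrees."""
    if lower <= upper:
        return lower <= angle <= upper
    return angle >= lower or angle <= upper

def in_any_range(angle, ranges):
    """True if angle falls inside at least one (lower, upper) range."""
    return any(in_range(angle, lo, hi) for lo, hi in ranges)

# Assumed bimodal ranges for C3:C4:S8:N11
BIMODAL_45_135 = [(45.0, 135.0), (-135.0, -45.0)]

print(in_any_range(27.90, BIMODAL_45_135))  # initial value: outside, triggers warning
print(in_any_range(50.0, BIMODAL_45_135))   # edited value: inside

# A range that wraps through 180 degrees
print(in_any_range(179.72, [(160.0, -160.0), (-20.0, 20.0)]))
```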

Using DASH with the CSD, ConQuest and Mercury

  • If you have access to the Cambridge Structural Database (CSD), ConQuest and Mercury you can perform the following torsion angle searches for yourself. If not, results for the searches are given. The first torsion angle listed in the Parameter Bounds dialogue box is S8:N11:C12:O13 and it has an initial value of 0.25°. Draw an appropriate fragment in ConQuest and define the torsion angle of interest. A screenshot of a query used is given below:
  • By viewing in the Data Analysis module of Mercury the torsion angles returned, it is clear that this torsion angle is well described by a bimodal distribution at -160° to 160° and -20° to 20°:
  • Return to the Parameter Bounds dialogue box in DASH, hit the Modal button in the row of the S8:N11:C12:O13 torsion angle. The Modal Torsion Angle Ranges dialogue box will pop up and it is here that the determined ranges can be entered:

  • In the Lower box enter -20.00 and in the Upper box enter 20.00. Since the Bimodal radio button is active (at the top of the dialogue box) the complementary bimodal range at -160.00° and 160.00° will be determined and displayed. Once you are satisfied that the correct ranges are displayed, press OK. This will return you to the Parameter Bounds dialogue box and the row of the S8:N11:C12:O13 torsion angle will be displayed in red, indicating that modal ranges are active.

  • Should you wish to define a trimodal torsion angle range, enter the upper and lower bounds of a single range in the Upper and Lower boxes (for example -160° to 160°). Hitting the Trimodal radio button will generate two further torsion angle ranges at +/- 120° from the initial range you have specified (for example at 40° to 80°, and -40° to -80°).

  • The following table details the results of searches performed in the CSD v5.36 for all the six torsion angles of this molecule:

Torsion Angle      Initial value (°)   Mode       Modal Ranges (°)                           Number of Observations
S8:N11:C12:O13     0.25                Bimodal    -160 to 160 and -20 to 20                  126
C15:N14:C12:N11    179.72              Bimodal    -160 to 160 and -20 to 20                  51
C4:S8:N11:C12      65.08               Bimodal    50 to 90 and -50 to -90                    117
C16:C15:N14:C12    -179.37             Unimodal   90 to -90                                  415
C3:C4:S8:N11       27.90[^1]           Bimodal    60 to 120 and -60 to -120                  5080
C17:C16:C15:N14    -178.54             Trimodal   -160 to 160, 40 to 80 and -40 to -80       287

  • Enter the above torsion angle ranges and start the simulated annealing process.

  • In our hands a simulated annealing run started with random seeds 159 and 314 gave 10/10 solutions in an average of 1199250 moves when no restraints were applied; 3/10 solutions had a profile χ2 value below 10.0. With the above restraints applied, and therefore the search space reduced, 10/10 runs (starting with the same random seeds) solved and the average number of moves required was 796250. All 10 solutions had a profile χ2 below 10.0.

  • Thus if you have a problem that is proving difficult to solve, with no restraints applied during simulated annealing, it may be valuable to see if there are torsion angle ranges that can be defined (from a search of the CSD) to reduce the search space.
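
The trimodal rule described above (two further ranges at +/- 120° from the one entered) can be sketched as follows. This is a hedged illustration, not DASH's implementation: the range is described here by a centre and half-width, which is our own parameterisation, and angles are wrapped onto (-180°, 180°].

```python
def wrap(angle):
    """Map an angle in degrees onto the interval (-180, 180]."""
    wrapped = (angle + 180.0) % 360.0 - 180.0
    return 180.0 if wrapped == -180.0 else wrapped

def trimodal_ranges(centre, half_width):
    """Return three (lower, upper) ranges: the one given plus two
    copies shifted by +120 and -120 degrees."""
    ranges = []
    for shift in (0.0, 120.0, -120.0):
        c = wrap(centre + shift)
        ranges.append((wrap(c - half_width), wrap(c + half_width)))
    return ranges

# The range written "-160 to 160" in the text is centred on 180 with half-width 20
for lower, upper in trimodal_ranges(180.0, 20.0):
    print(f"{lower:.0f} to {upper:.0f}")
```

For a centre of 180° and a half-width of 20° this yields the range through 180° (160° to -160°) plus -80° to -40° and 40° to 80°, matching the example ranges quoted for the trimodal case.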

Stage 13. Conclusion

Global optimisation processes may locate local minima, particularly if (a) Z’ > 1 or (b) the data are of limited resolution. The above example shows that false minima can superficially look chemically sensible. This is hardly surprising, as they lie at a point on the χ2 hypersurface very close to the global minimum of the crystal structure. Accordingly, it is always prudent to run a structure solution multiple times (with different random number seeds) to ensure that a consistent minimum is reached.

References

DICVOL Program:
D. Louer & M. Louer (1972) J. Appl. Crystallogr. 5, 271-275.
A. Boultif & D. Louer (1991) J. Appl. Crystallogr. 24, 987-993.

Single crystal structure (CSD reference code BEDMIG):
C.H. Koo, S.I. Cho, Y.H. Yeon (1980) Arch. Pharm. Res. 3, 37.

Retrieval of Crystallographically-Derived Molecular Geometry Information:
I.J. Bruno, J.C. Cole, M. Kessler, J. Luo, W.D.S. Motherwell, L.H. Purkis, B.R. Smith, R. Taylor, R.I. Cooper, S.E. Harris, A.G. Orpen (2004) J. Chem. Inf. Comput. Sci. 44, 2133-2144.