Updated Extra MD Analysis notebook

- Formatted header
- Added TOC and sections
- Added some explanatory text for each section
- Added reference to MDAnalysis manual
CCPBioSim · Sep 6, 2024 · 29532e7
1 parent fecb90f
commit 29532e7
Showing 1 changed file with 74 additions and 103 deletions.
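
For readers of this commit: the new explanatory text in section 5 describes the radius of gyration (Rg) and the end-to-end distance. The snippet below is a minimal, plain-Python sketch of what those two quantities mean. It is not the notebook's code, which works on MDAnalysis `AtomGroup`s. The helper names (`center_of_mass`, `radius_of_gyration`, `end_to_end_distance`) and the toy coordinates are hypothetical, and the sketch assumes the usual mass-weighted definition of Rg.

```python
import math


def center_of_mass(positions, masses):
    """Mass-weighted mean position of a set of 3D points."""
    total_mass = sum(masses)
    return tuple(
        sum(m * p[k] for p, m in zip(positions, masses)) / total_mass
        for k in range(3)
    )


def radius_of_gyration(positions, masses):
    """Mass-weighted root mean square distance of the atoms from their center of mass."""
    com = center_of_mass(positions, masses)
    total_mass = sum(masses)
    weighted_sq = sum(
        m * sum((p[k] - com[k]) ** 2 for k in range(3))
        for p, m in zip(positions, masses)
    )
    return math.sqrt(weighted_sq / total_mass)


def end_to_end_distance(first_atom, last_atom):
    """Euclidean distance between two terminal atoms (e.g. first N and last C)."""
    return math.dist(first_atom, last_atom)


# Toy data (hypothetical): three equal-mass atoms on the x axis.
toy_positions = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
toy_masses = [1.0, 1.0, 1.0]

rg = radius_of_gyration(toy_positions, toy_masses)
dist = end_to_end_distance(toy_positions[0], toy_positions[-1])
print(f"Rg = {rg:.4f}, end-to-end = {dist:.4f}")
```

In the notebook itself the per-frame loop reads these values from the atom selections, so a sketch like this is only useful for checking intuition on small, hand-made coordinates.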
diff --git a/5_Analysis_MDAnalysis/5_Extra_p24_analysis.ipynb b/5_Analysis_MDAnalysis/5_Extra_p24_analysis.ipynb
@@ -16,12 +16,40 @@
     "**Author**: Dr Matteo Degiacomi ([email protected])"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "id": "3d12646a-486f-43b8-aaee-76157acd66cf",
+   "metadata": {},
+   "source": [
+    "**Jupyter cheat sheet:**\n",
+    "- to run the currently highlighted cell, hold <kbd>&#x21E7; Shift</kbd> and press <kbd>&#x23ce; Enter</kbd>;\n",
+    "- to get help for a specific function, place the cursor within the function's brackets, hold <kbd>&#x21E7; Shift</kbd>, and press <kbd>&#x21E5; Tab</kbd>;\n",
+    "\n",
+    "<div class=\"alert alert-info\"><b> Remember: variables persist between cells</b> \n",
+    "    \n",
+    "Be aware that it is the order of execution of cells that is important in a Jupyter notebook, not the <em>order</em> in which they appear. Python will remember <em>all</em> the code that was run previously, including any variables you have defined, irrespective of the order in the notebook. Therefore if you define variables lower down the notebook and then (re)run cells further up, those defined further down will still be present. </div> "
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "8e969ae8-2e99-48d4-aa1d-c0c090c274e9",
+   "metadata": {},
+   "source": [
+    "## Table of Contents\n",
+    "\n",
+    "[1.   Introduction](#intro)  \n",
+    "[2.   Root Mean Square Deviations (RMSDs)](#rmsd)    \n",
+    "[3.   Pairwise RMSD](#p_rmsd)  \n",
+    "[4.   Root Mean Square Fluctuation (RMSF)](#rmsf)   \n",
+    "[5.   Radius of gyration and end-to-end distance](#rgyr)    "
+   ]
+  },
   {
    "cell_type": "markdown",
    "id": "ea2fe146-eea8-46f7-8696-ff5fa5cb823d",
    "metadata": {},
    "source": [
-    "## Google Colab setup\n",
+    "## 0. Google Colab setup\n",
     "<div class=\"alert alert-warning\">\n",
     "<b>Attention:</b> Please only run the following cells if you are using Colab! These cells install necessary packages and download data.</div>"
    ]
@@ -65,33 +93,13 @@
     "os.chdir(f\"CCP5_Simulation_of_BioMolecules{os.sep}4_Analysis_MDAnalysis\")"
    ]
   },
-  {
-   "cell_type": "markdown",
-   "id": "3d12646a-486f-43b8-aaee-76157acd66cf",
-   "metadata": {},
-   "source": [
-    "## Jupyter cheat sheet\n",
-    "\n",
-    "- to run the currently highlighted cell, hold <kbd>&#x21E7; Shift</kbd> and press <kbd>&#x23ce; Enter</kbd>;\n",
-    "- to get help for a specific function, place the cursor within the function's brackets, hold <kbd>&#x21E7; Shift</kbd>, and press <kbd>&#x21E5; Tab</kbd>;"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "1d143168-1643-4caa-907e-15c87cdfb52d",
-   "metadata": {},
-   "source": [
-    "<div class=\"alert alert-warning\"><b> REMEMBER: variables persist between cells</b> \n",
-    "    \n",
-    "Be aware that it is the order of execution of cells that is important in a Jupyter notebook, not the <em>order</em> in which they appear. Python will remember <em>all</em> the code that was run previously, including any variables you have defined, irrespective of the order in the notebook. Therefore if you define variables lower down the notebook and then (re)run cells further up, those defined further down will still be present. </div> "
-   ]
-  },
   {
    "cell_type": "markdown",
    "id": "fe8671d2-93ec-4fb3-b733-a6d47f818a2b",
    "metadata": {},
    "source": [
-    "## Introduction"
+    "## 1. Introduction\n",
+    "<a id='intro'></a>"
    ]
   },
   {
@@ -141,15 +149,16 @@
    "id": "30b29017-8118-4ea9-9476-b0471cfb10c0",
    "metadata": {},
    "source": [
-    "## Root Mean Square Deviations (RMSD)"
+    "## 2. Root Mean Square Deviations (RMSDs)\n",
+    "<a id='rmsd'></a>"
    ]
   },
   {
    "cell_type": "markdown",
    "id": "47d163d5-4f5c-4397-9610-e26a38e3bae5",
    "metadata": {},
    "source": [
-    "Let's demonstrate how the time evolution of RMSD with respect to the first frame changes with different alignments and atoms of interest."
+    "Let's demonstrate how the time evolution of [RMSD](https://docs.mdanalysis.org/1.1.1/documentation_pages/analysis/rms.html) with respect to the first frame changes with different alignments and atoms of interest."
    ]
   },
   {
@@ -172,6 +181,14 @@
     "R_D2.run()"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "id": "2147a729-4b24-42e6-b289-4e51eda09c5f",
+   "metadata": {},
+   "source": [
+    "Now, let's plot everything!"
+   ]
+  },
   {
    "cell_type": "code",
    "execution_count": null,
@@ -229,7 +246,7 @@
    "id": "eb99eb0f-fc8a-4a2a-b74b-f575b26bbc07",
    "metadata": {},
    "source": [
-    "For the first slide on RMSD, let's also plot only a single RMSD profile"
+    "For the first slide in the lecture featuring RMSD, let's also plot only a single RMSD profile"
    ]
   },
   {
@@ -257,7 +274,16 @@
    "id": "408d030b-a5c6-446f-9840-bf67820e6443",
    "metadata": {},
    "source": [
-    "## pairwise RMSD"
+    "## 3. Pairwise RMSD\n",
+    "<a id='p_rmsd'></a>"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "ee67f243-2f0c-495b-a310-ea141b7b6f3a",
+   "metadata": {},
+   "source": [
+    "Now, let's generate a [pairwise RMSD](https://userguide.mdanalysis.org/stable/examples/analysis/alignment_and_rms/pairwise_rmsd.html) plot, i.e., a surface plot reporting on the RMSD of each conformation vs each other conformation."
    ]
   },
   {
@@ -292,15 +318,16 @@
    "id": "9d30c163-f7a7-489a-b9fe-06608d9cd245",
    "metadata": {},
    "source": [
-    "## Root Mean Square Fluctuations (RMSF)"
+    "## 4. Root Mean Square Fluctuations (RMSF)\n",
+    "<a id='rmsf'></a>"
    ]
   },
   {
    "cell_type": "markdown",
    "id": "e4e4c72a-4255-4fd2-8ecf-502d212c697b",
    "metadata": {},
    "source": [
-    "We start by defining a function that aligns the trajectory and calculates the RMSF of a selection of interest."
+    "The Root Mean Square Fluctuation ([RMSF](https://userguide.mdanalysis.org/stable/examples/analysis/alignment_and_rms/rmsf.html)) reports on the amount of displacement of an amino acid w.r.t. its mean position during the simulation. We start by defining a function that aligns the trajectory and calculates the RMSF of a selection of interest."
    ]
   },
   {
@@ -395,7 +422,16 @@
    "id": "453510f6-2bd4-4856-a871-3c28bafd4f49",
    "metadata": {},
    "source": [
-    "## Radius of gyration and end-to-end distance"
+    "## 5. Radius of gyration and end-to-end distance\n",
+    "<a id='rgyr'></a>"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "d9e0f916-9497-42e6-a782-cbd51e7e12ec",
+   "metadata": {},
+   "source": [
+    "To calculate radius of gyration (Rg) and end-to-end distance of a protein, we will create a few <code>AtomGroup</code>s. The radius of gyration is a quantity that can be directly extracted from any <code>AtomGroup</code> (here, we will select the whole protein). N-terminus and C-terminus coordinates, necessary to calculate the end-to-end distance, can be extracted as the first and last atom in <code>AtomGroup</code>s containing coordinates of N and C atoms, respectively."
    ]
   },
   {
@@ -405,7 +441,6 @@
    "metadata": {},
    "outputs": [],
    "source": [
-    "#u = mda.Universe(\"trajectory_formatted.pdb\")\n",
     "nterm = u.select_atoms('name N')[0]\n",
     "cterm = u.select_atoms('name C')[-1]\n",
     "bb = u.select_atoms('protein')\n",
@@ -421,6 +456,14 @@
     "    rg.append(rgyr)"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "id": "c1dfd9f2-2d55-4ab1-a9d2-13685b853947",
+   "metadata": {},
+   "source": [
+    "Let's now plot the quantities we have extracted for each simulation snapshot!"
+   ]
+  },
   {
    "cell_type": "code",
    "execution_count": null,
@@ -444,78 +487,6 @@
     "fig.savefig(\"rg_dist_p24.png\")"
    ]
   },
-  {
-   "cell_type": "markdown",
-   "id": "86865f4e-ea12-4fda-860d-99ba2f250daf",
-   "metadata": {},
-   "source": [
-    "## Hydrogen bonds"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "ecae01ea-b687-43ac-ae3c-3b86d37a4859",
-   "metadata": {},
-   "source": [
-    "The function below can work the hydrogen bonds in your protein. Can you work out how to use it?"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "1657a552-0442-4c57-a25a-f1bc019c30e1",
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "def hbonds(hydrogens, acceptors):\n",
-    "    \n",
-    "    \"\"\" this function calculates hydrogen bonds \"\"\"\n",
-    "    \n",
-    "    acc_idx, hyd_idx = idx.T\n",
-    "    \n",
-    "    idx, dists = mda.lib.distances.capped_distance(acceptors.positions, \n",
-    "                                                   hydrogens.positions, \n",
-    "                                                   max_cutoff=3.0,\n",
-    "                                                   box=acceptors.dimensions)    \n",
-    "\n",
-    "    \n",
-    "    acc_idx, hyd_idx = idx.T\n",
-    "\n",
-    "    # select potential hydrogen bonds to check angles\n",
-    "    potential_hbond_acceptors = acceptors[acc_idx]\n",
-    "    potential_hbond_hydrogens = hydrogens[hyd_idx]\n",
-    "\n",
-    "    # select hydrogen bond donors by looping over hydrogens and selecting the bonded oxygens\n",
-    "    potential_hbond_donors = sum(h.bonded_atoms[0] for h in potential_hbond_hydrogens)\n",
-    "    \n",
-    "    angles = mda.lib.distances.calc_angles(potential_hbond_acceptors.positions,\n",
-    "                                  potential_hbond_hydrogens.positions,\n",
-    "                                  potential_hbond_donors.positions, \n",
-    "                                  box=u.dimensions)\n",
-    "    #convert to degrees\n",
-    "    angles = np.rad2deg(angles)\n",
-    "    \n",
-    "    #check angles are larger than 130 degrees\n",
-    "    angle_idx = np.where(angles >= 130.0)\n",
-    "    \n",
-    "    hbond_acceptors = potential_hbond_acceptors[angle_idx]\n",
-    "    hbond_hydrogens = potential_hbond_hydrogens[angle_idx]\n",
-    "    hbond_donors = potential_hbond_donors[angle_idx]\n",
-    "    \n",
-    "    return hbond_acceptors, hbond_hydrogens, hbond_donors"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "0f7002d1-05a0-473c-93eb-384676f12848",
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "# Try using the hbonds function here!\n",
-    "\n"
-   ]
-  },
   {
    "cell_type": "markdown",
    "id": "960746cd-7b96-4f92-82ed-608e23dea2d7",