This repository contains code that was used to analyse Base Specific In Situ Sequencing (BaSISS) data, create maps of cancer clones and describe their distinctive immunological and phenotypic properties.
It accompanies the publication "Spatial genomics maps the structure, character and evolution of cancer clones", available at bioRxiv.
After cloning the repository with
git clone git@github.com:gerstung-lab/BaSISS.git
create a new conda environment with all the necessary packages installed and activate it:
conda env create -f environment.yml
conda activate basiss
The complete BaSISS and ISS datasets necessary to run the analysis are deposited at Sanger's FTP server ftp://ftp.sanger.ac.uk/pub/cancer/LomakinEtAl_BaSISS. Be aware that the complete dataset is ~60 GB in size.
Bulk tissue WGS data are deposited in the European Genome-phenome Archive and are available for download on request (https://ega-archive.org/datasets) with the following accessions: EGAD00001002696 (P2 samples, with IDs PD14780a, PD14780b, PD14780d and PD14780e) and EGAD00001000898 (P1 samples, with IDs PD9694a, PD9694b, PD9694c and PD9694d).
Registered fluorescence microscopy images from ISS experiments have been deposited at BioImage Archive (https://www.ebi.ac.uk/bioimage-archive/) under accession number S-BIAD537.
Public data used for single-cell RNA sequencing analysis were obtained from the NCBI’s Gene Expression Omnibus (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE176078). Source data are provided with this paper.
An interactive viewer of the generated data and inferred clone maps is accessible at https://www.cancerclonemaps.org/
All the essential steps for BaSISS analysis are shown in the example notebooks.
The experimental data consist of several layers of information; the main ones are 1) BaSISS or ISS signals, 2) the background tissue image (DAPI) and 3) selected regions of interest. To make downstream analysis easier, we store these layers in a single basiss.preprocessing.Sample object.
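To illustrate what bundling these layers buys you, here is a minimal sketch of such a container. `SampleSketch`, its field names and the `signals_in_roi` helper are hypothetical and are not the basiss.preprocessing.Sample API; the sketch only assumes signals are stored as pixel coordinates plus a decoded target, the DAPI image as a 2D array, and regions of interest as boolean masks of the same shape.

```python
from dataclasses import dataclass, field

import numpy as np


@dataclass
class SampleSketch:
    """Hypothetical stand-in for the layers a BaSISS sample bundles together.

    NOT the basiss.preprocessing.Sample API; field names are assumptions.
    """
    signal_xy: np.ndarray      # (n_signals, 2) decoded signal positions, (x, y) in pixels
    signal_target: np.ndarray  # (n_signals,) decoded target (gene or allele) per signal
    dapi: np.ndarray           # 2D DAPI background image
    rois: dict = field(default_factory=dict)  # ROI name -> boolean mask shaped like dapi

    def __post_init__(self):
        # keep the layers consistent with each other
        if len(self.signal_xy) != len(self.signal_target):
            raise ValueError("each signal needs exactly one decoded target")
        for name, mask in self.rois.items():
            if mask.shape != self.dapi.shape:
                raise ValueError(f"ROI {name!r} does not match the DAPI image shape")

    def signals_in_roi(self, name):
        """Return the targets of all signals that fall inside the named ROI mask."""
        mask = self.rois[name]
        # round coordinates to pixel indices: x -> column, y -> row
        cols = np.clip(np.round(self.signal_xy[:, 0]).astype(int), 0, mask.shape[1] - 1)
        rows = np.clip(np.round(self.signal_xy[:, 1]).astype(int), 0, mask.shape[0] - 1)
        return self.signal_target[mask[rows, cols]]


# toy usage with made-up values
dapi = np.zeros((10, 10))
roi = np.zeros((10, 10), dtype=bool)
roi[:5, :5] = True
s = SampleSketch(np.array([[1.0, 1.0], [8.0, 8.0]]), np.array(["ERBB2", "ESR1"]), dapi, {"tumour": roi})
in_tumour = s.signals_in_roi("tumour")  # only the ERBB2 signal lies inside the mask
```

Keeping the ROI masks on the same pixel grid as the DAPI image is what lets a single lookup answer "which signals belong to this region".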
In situ sequencing signals do not directly reveal which cell they are derived from. This limits downstream analysis, because differential expression analysis becomes ambiguous: the same changes could arise either from a change in expression programs or from a change in local cell composition. In this notebook we assign ISS signals to nuclei and then assign cell types based on selected marker genes with the basiss.sc_annot.iss_annotation function.
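The two steps, signal-to-nucleus assignment followed by marker-based cell typing, can be sketched as below. This is not the basiss.sc_annot.iss_annotation implementation: it assumes nuclei are summarised by centroids, assigns each signal to the nearest centroid within a hypothetical `max_dist` cutoff, and labels each nucleus by whichever cell type's (hypothetical) marker genes collect the most reads.

```python
import numpy as np


def assign_signals_to_nuclei(signal_xy, nuclei_xy, max_dist=10.0):
    """Index of the nearest nucleus centroid per signal, or -1 if none is within max_dist.

    Brute-force pairwise distances; a KD-tree would be preferable for real data sizes.
    """
    d = np.linalg.norm(signal_xy[:, None, :] - nuclei_xy[None, :, :], axis=-1)
    nearest = d.argmin(axis=1)
    nearest_d = d[np.arange(len(signal_xy)), nearest]
    return np.where(nearest_d <= max_dist, nearest, -1)


def count_matrix(assignment, signal_gene, genes, n_nuclei):
    """Nuclei x genes count matrix; unassigned signals (-1) are dropped."""
    gene_idx = {g: i for i, g in enumerate(genes)}
    counts = np.zeros((n_nuclei, len(genes)), dtype=int)
    for cell, gene in zip(assignment, signal_gene):
        if cell >= 0 and gene in gene_idx:
            counts[cell, gene_idx[gene]] += 1
    return counts


def annotate_by_markers(counts, genes, markers):
    """Label each nucleus with the cell type whose marker genes have the most reads.

    markers: hypothetical dict cell_type -> list of marker genes.
    Nuclei without any marker reads are labelled 'unassigned'.
    """
    types = list(markers)
    scores = np.stack(
        [counts[:, [genes.index(g) for g in markers[t]]].sum(axis=1) for t in types], axis=1
    )
    labels = np.array(types, dtype=object)[scores.argmax(axis=1)]
    labels[scores.sum(axis=1) == 0] = "unassigned"
    return labels


# toy usage: two nuclei, four signals (the last one is too far from any nucleus)
nuclei = np.array([[0.0, 0.0], [50.0, 50.0]])
signals = np.array([[1.0, 1.0], [2.0, 0.0], [49.0, 51.0], [200.0, 200.0]])
genes = ["KRT8", "PTPRC"]
assignment = assign_signals_to_nuclei(signals, nuclei)
counts = count_matrix(assignment, ["KRT8", "KRT8", "PTPRC", "KRT8"], genes, len(nuclei))
labels = annotate_by_markers(counts, genes, {"epithelial": ["KRT8"], "immune": ["PTPRC"]})
```

The distance cutoff matters: without it, stray or misdecoded signals far from any nucleus would still inflate the counts of the nearest cell.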
The main novelty of the experimental approach is the ability to trace multiple cancer lineage-specific alleles in space. However, their spatial pattern is difficult to interpret directly and requires a further deconvolution step. In this section, we generate continuous spatial clone maps using a statistical algorithm that exploits BaSISS signals as well as local cell counts (derived from the DAPI channel during the fluorescence microscopy of BaSISS) using two-dimensional Gaussian processes. The variational Bayesian model also accounts for unspecific or wrongly decoded signals and variable probe efficiency, and is augmented by variant allele fractions from the bulk genomic sequencing data.
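To make the ingredients of this model concrete, the sketch below runs it "forwards": it draws one 2D Gaussian process field per clone on a grid, turns the fields into clone proportions, and computes Poisson rates for allele-specific signals from cell counts, genotypes, probe efficiencies and a background rate. It does not perform the variational inference, and several choices are assumptions: the softmax link, the squared-exponential kernel, the binary `genotype` matrix (no copy number) and the scalar `background` term. It is not the BaSISS implementation.

```python
import numpy as np


def rbf_kernel(xy, lengthscale, variance=1.0):
    """Squared-exponential covariance between 2D points."""
    sq = ((xy[:, None, :] - xy[None, :, :]) ** 2).sum(-1)
    return variance * np.exp(-0.5 * sq / lengthscale**2)


def simulate_clone_map(n_side, n_clones, lengthscale, rng):
    """Draw one GP field per clone on an n_side x n_side grid; softmax them into proportions."""
    g = np.arange(n_side, dtype=float)
    xy = np.stack(np.meshgrid(g, g, indexing="ij"), -1).reshape(-1, 2)
    K = rbf_kernel(xy, lengthscale) + 1e-6 * np.eye(len(xy))  # jitter keeps Cholesky stable
    L = np.linalg.cholesky(K)
    f = L @ rng.standard_normal((len(xy), n_clones))  # (tiles, clones) correlated GP draws
    f -= f.max(axis=1, keepdims=True)  # numerically stable softmax
    p = np.exp(f)
    return p / p.sum(axis=1, keepdims=True)


def expected_counts(props, cells, genotype, efficiency, background):
    """Poisson rate of each allele-specific probe in each tile.

    props: (tiles, clones), cells: (tiles,), genotype: (clones, probes) in {0, 1},
    efficiency: (probes,), background: rate of unspecific or misdecoded signals.
    """
    carrier_fraction = props @ genotype  # fraction of cells carrying each allele
    return cells[:, None] * carrier_fraction * efficiency[None, :] + background


rng = np.random.default_rng(0)
props = simulate_clone_map(8, 3, lengthscale=3.0, rng=rng)
cells = rng.poisson(20, size=len(props)).astype(float)
genotype = np.array([[1, 0], [1, 1], [0, 0]])  # hypothetical nested tree: clone x mutation
rate = expected_counts(props, cells, genotype, np.array([0.1, 0.05]), background=0.2)
reads = rng.poisson(rate)  # simulated mutant-signal counts per tile
```

Inference reverses this direction: given observed `reads` and `cells`, a model of this kind would estimate the GP fields (and hence the clone maps) together with the efficiencies and the background rate.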
Clone mapping in case 1 (PD9694), which includes two oestrogen receptor (ER) positive invasive primary breast cancers (ER1 and ER2, or PD9694a and PD9694c) and three samples with ductal carcinoma in situ (D1, D2 and D3, or PD9694d, PD9694l and PD9694m). In addition, samples D1, ER1 and ER2 have technical replicates (consecutive slides) that serve as validation.
Clone mapping in case 2 (PD14780), which includes two ER negative invasive breast cancers (TN1 and TN2, or PD14780a and PD14780d) and a draining axillary lymph node that contains metastatic cells (LN1, or PD14780e).
In this section we show that:
- Raw signal distributions, although noisy, make sense, as they replicate the nested structure of clonal evolution (the data make sense)
- Laser capture microdissection validates the inferred clonal composition (the model works correctly)
- BaSISS and ISS signals correlate with bulk WGS and bulk RNA data respectively, validating the panel of selected genes
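The correlation checks in the last bullet can be sketched as follows: aggregate allele-specific signals into a per-mutation mutant fraction and compare it with bulk variant allele fractions using Pearson and Spearman correlation. The pseudocount, the helper names and all numbers are hypothetical; the toy values are synthetic and are not data from the study.

```python
import numpy as np


def mutant_fraction(mut_counts, wt_counts, pseudocount=0.5):
    """Fraction of allele-specific signals decoding to the mutant allele, per mutation."""
    mut = np.asarray(mut_counts, dtype=float)
    wt = np.asarray(wt_counts, dtype=float)
    return (mut + pseudocount) / (mut + wt + 2 * pseudocount)


def pearson(x, y):
    """Pearson correlation coefficient."""
    return float(np.corrcoef(x, y)[0, 1])


def spearman(x, y):
    """Spearman rank correlation (no tie correction): Pearson on ranks."""
    rx = np.argsort(np.argsort(x))
    ry = np.argsort(np.argsort(y))
    return pearson(rx, ry)


# synthetic numbers for illustration only
mut = [120, 40, 5, 60]
wt = [130, 160, 200, 90]
bulk_vaf = [0.45, 0.18, 0.03, 0.38]
frac = mutant_fraction(mut, wt)
r_pearson = pearson(frac, bulk_vaf)
r_spearman = spearman(frac, bulk_vaf)
```

For the ISS comparison the same functions apply to per-gene counts against bulk RNA expression, typically after a log transform such as `np.log1p` so that a few highly expressed genes do not dominate the Pearson coefficient.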
After performing spatial lineage tracing of cancer clones, we integrate genetic clone maps with multimodal spatial data layers, such as histology, expression and cell composition. We find that genetically similar regions can be scattered across wide areas yet maintain similar transcriptional and histological features and foster recurrent ecosystems.
In addition, we find that genetic progression, which encapsulates the historical order of events, does not necessarily translate faithfully to transitions in histological state that are commonly assumed to reflect the stages of cancer progression. For example, genetically similar clones can exist in both pre-invasive and invasive stages.
In this notebook we combine multiple levels of information obtained previously, such as histology, cell composition, cell type specific expression and genetic maps, using basiss.histology.Histogenomic_associations. Then we construct input data for differential composition and expression analysis and display the results of phenotype-genotype association.
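Building the input for a differential composition analysis essentially means joining the layers per region. The sketch below does this for one hypothetical layout: clone-map tiles and annotated cells are each labelled with a region (e.g. an annotated duct), each region gets the clone with the highest average proportion, and cell-type counts are kept raw so that a count model can use them. `summarise_regions` and its argument names are illustrative assumptions, not the basiss.histology.Histogenomic_associations interface.

```python
import numpy as np


def summarise_regions(region_id, clone_props, cell_region, cell_type, clone_names, cell_types):
    """Per-region genotype/phenotype summary.

    region_id:   (tiles,) region label of each clone-map tile
    clone_props: (tiles, clones) clone proportions from the clone map
    cell_region: (cells,) region label of each annotated cell
    cell_type:   (cells,) annotated cell type of each cell
    """
    region_id = np.asarray(region_id)
    cell_region = np.asarray(cell_region)
    cell_type = np.asarray(cell_type)
    out = {}
    for r in np.unique(region_id):
        mean_props = clone_props[region_id == r].mean(axis=0)  # average clone mix in region
        k = int(mean_props.argmax())
        types_here = cell_type[cell_region == r]
        out[str(r)] = {
            "dominant_clone": clone_names[k],
            "clone_fraction": float(mean_props[k]),
            # raw counts rather than proportions, suited to a count (e.g. Poisson) model
            "cell_counts": np.array([(types_here == t).sum() for t in cell_types]),
        }
    return out


# toy usage with made-up regions, clones and cells
summary = summarise_regions(
    region_id=["duct1", "duct1", "duct2"],
    clone_props=np.array([[0.8, 0.2], [0.6, 0.4], [0.1, 0.9]]),
    cell_region=["duct1", "duct1", "duct2", "duct2", "duct2"],
    cell_type=["tumour", "T cell", "tumour", "macrophage", "macrophage"],
    clone_names=["clone A", "clone B"],
    cell_types=["tumour", "T cell", "macrophage"],
)
```

Calling the dominant clone per region is only one simple way to attach a genotype label; keeping the full mean proportion vector would preserve information about mixed regions.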
Clone-associated environment composition and expression data are modelled with generalised linear mixed models. The modelling results are then passed back to the Clone specific phenotype and environment analysis notebook.
To run this notebook, numpyro should be installed with pip install numpyro.
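As an illustration of the kind of model meant here, the sketch below evaluates the log joint density of a Poisson generalised linear mixed model: counts per region depend on an exposure (e.g. total cells), a fixed effect per clone and a random intercept per sample. The structure, the parameter names and the flat priors on fixed effects are assumptions for illustration, not the notebook's actual model. In numpyro the same model would be written with `numpyro.sample` statements and fitted with, for example, `numpyro.infer.NUTS`; plain NumPy is used here so the sketch runs without numpyro.

```python
import math

import numpy as np


def poisson_glmm_log_joint(y, exposure, clone_idx, sample_idx,
                           beta0, beta_clone, u_sample, sigma_u):
    """Log joint density of a Poisson GLMM with a random intercept per sample.

    y[i]        ~ Poisson(exposure[i] * exp(beta0 + beta_clone[clone_idx[i]] + u_sample[sample_idx[i]]))
    u_sample[j] ~ Normal(0, sigma_u)
    Fixed effects (beta0, beta_clone) have flat priors here for simplicity.
    """
    y = np.asarray(y, dtype=float)
    log_rate = (np.log(exposure) + beta0
                + np.asarray(beta_clone)[clone_idx]
                + np.asarray(u_sample)[sample_idx])
    # Poisson log-likelihood: y*log(rate) - rate - log(y!)
    loglik = np.sum(y * log_rate - np.exp(log_rate)) - sum(math.lgamma(v + 1) for v in y)
    u = np.asarray(u_sample, dtype=float)
    # Gaussian log-density of the random intercepts
    logprior = np.sum(-0.5 * (u / sigma_u) ** 2 - math.log(sigma_u) - 0.5 * math.log(2 * math.pi))
    return float(loglik + logprior)


# toy usage: one observation of 5 cells of a given type in a region with exposure 1
at_zero = poisson_glmm_log_joint([5], [1.0], [0], [0], 0.0, [0.0], [0.0], 1.0)
at_mle = poisson_glmm_log_joint([5], [1.0], [0], [0], math.log(5), [0.0], [0.0], 1.0)
```

The random intercept absorbs sample-level differences (e.g. overall immune infiltration of a slide), so the clone effects `beta_clone` describe differences between clones rather than between samples.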
Artem Lomakin, Jessica Svedlund, Carina Strell, Milana Gataric, Artem Shmatko, Gleb Rukhovich, Jun Sung Park, Young Seok Ju, Stefan Dentro, Vitalii Kleshchevnikov, Vasyl Vaskivskyi, Tong Li, Omer Ali Bayraktar, Sarah Pinder, Andrea L. Richardson, Sandro Santagata, Peter J. Campbell, Hege Russnes, Moritz Gerstung, Mats Nilsson & Lucy R. Yates.
Spatial genomics maps the structure, nature and evolution of cancer clones.
Nature 611, 594–602 (2022). https://doi.org/10.1038/s41586-022-05425-2