Skip to content

Depository of all R files for the "Extensive protein dosage compensation in aneuploid human cancers" BioRxiv manuscript. Authors: KM Schukken and JM Sheltzer

Notifications You must be signed in to change notification settings

kschukken/CCLE_protein_dosage_compensation

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

5 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

CCLE_protein_dosage_compensation

Depository of all R files for Extensive protein dosage compensation in aneuploid human cancers

This is a repository for the BioRxiv preprint "Extensive protein dosage compensation in aneuploid human cancers". By Klaske M Schukken and Jason M Sheltzer

The Summary_of_files.xlsx is an overview of which R file generates which datasets and graphs, also listed below.

Many of the code files need additional datasets retrieved from outside sources-- The source of the datasets are often listed in the ## Get data ## section of the R files. If data source is not listed in the R code itself, it will be credited in the "Extensive protein dosage compensation in aneuploid human cancers" BioRXiv pre-print. Several intermediate datasets are made available for ease of use.

All code was build on R version 3.6.3 GUI 1.70 El Capitan build, and R studio version 1.1.383.

In order to run the code, open the .R files, download the indicated datasets, set the proper working directory, and run through the code.

ORGANIZATION

R CODE:

Protein_RNA_filtered_CellLine.R

  • All figures
  • Filter protein expression and RNA expression data to isolate cell lines with RNA expression, protein expression, and arm call data

Protein_filtered_analysis_3cat.R

  • Figure 1, S1
  • Calculate protein expression difference for all genes upon chrm arm gain or loss (ex. gain chrm 5, protein expression diff in chrm 1,2,3,4,5,6, etc. )

RNA_filtered_analysis_3cat.R

  • Figure 1, S1
  • Calculates RNA expression difference for all genes upon chrm arm gain or loss (ex. gain chrm 5, RNA expression diff in chrm 1,2,3,4,5,6, etc. )

ChrmArm.Scatterplot.Protein.RNA.R

  • Figure 1
  • Protein and RNA expression difference per chrm arm scatterplot (ex. Gain chromosome 5q: protein expression difference on chrm 5q, and on other chromosomes)

DNA_Protein_RNA_filtered.R

  • Figure 1, S1
  • Make graphs of chrm arm difference from neutral-ploidy, for specific cells (ex. MCF7)

Protein_RNA_Heatmap_filtered_min10percategory.R

  • Figure 1
  • Generate heatmaps: protein/RNA expression difference upon chrm arm gain/loss, and additional summary heatmap

Protein_RNA_expression.PerCell_v2.R

  • Figure 2,3,5,6, S2, S3, S5
  • Difference upon chrm gain/loss dataset(s), categorize (scaling, buffering, anti-scaling). Boxplots of RNA or Protein expression vs DNA copy number category (gain, neutral, loss). Near diploid vs near triploid difference analysis. High aneuploidy score and low aneuploidy score difference analysis.

Protein_RNA_1d.plot_v2

  • Figure 3, 6, S2, S5
  • Plot 1dimentional density plot of gene expression changes per group (Protein and RNA). Also look at low vs high aneuploidy cells only

gProfile_Quantile_Bargraph_v2.R

  • Figure 3, 4, S2 ,S4, S5
  • Make graphs of key Gene Ontology enrichment terms (for multiple analysis')

Yeast_proteomics_v2.R

  • Figure S3
  • Aneuploid yeast analysis (Protein & RNA)

DownSyndrome_analysis_v2.R

  • Figure S3
  • Down syndrome fibroblast analysis (Protein and RNA)

Stingele_Protein_filter_v2.R

  • Figure S3
  • Stable aneuploid cell line analysis (Protein and RNA)

Aneuploid_Score.R

  • Figure 4, S4
  • Generate cellular aneuploidy scores

Protein_expression_data_GeneScore.R

  • Figure 4
  • Protein expression correlate with cellular aneuploidy, and high/low aneuploid cell only analysis

CCLE_RNA_Analysis_geneAn.R

  • Figure S4
  • RNA aneuploidy score correlation analysis

Variance_aneuploidy.R

  • Figure 5
  • Calculate RNA and Protein neutral-ploidy variance

Protein_buffering_factors_v2.R

  • Figure 5
  • Protein buffering factors. Calculate ROC AUC and make boxplots of scores/values

TSG.OG_CNV_Difference_v2.R

  • Figure 6
  • Oncogene and tumor suppressor gene difference upon chromosome copy number changes and gene copy number changes

ADDITIONAL FILES:

arm_calls_data_bendavid.csv

  • Aneuploid arm calls per chromosome arm per cell line analyzed. Used to generate aneuploidy score per cell line, and used to identify which cell lines have chromosome arm gains and losses per chromosome arm. Data from Cohen-Sharir et al. 2021 Nature, from the lab of Dr. Uri Ben-David.

bp_per_arm.csv

  • number of basepairs per chromosome arm. Used to generate a basepair-based aneuploidy score

Score2.aneuploid.cell_line.csv

  • Cellular aneuploidy scores per cell line. Multupiple forms of the aneuploidy score are calculated, the gene_ploidy_score was used in the manuscript unless otherwise indicated.

RNA_Protein_Shared_cells.csv

  • Cell lines with both RNA expression data and protein expression data. Not all of these cells are also present in the Cohen-Sharir arm call data.

Protein_location_info.csv

  • Information about the chromosome arm location per gene.

Protein_ID_info.csv

  • Protein_IDs matched with gene symbol and uniprot accesion

RNA.Protein_loss.neutral.gain_Difference_Pvalue_min10points.csv

  • Dataset of gene expression difference on cells with chromosome gain or loss, relative to cells with a neutral ploidy for chromosome arm a gene is located on.

RNA.Protein_loss.neutral.gain_Difference_Pvalue_min10points_3cat.csv

  • Dataset of gene expression difference on cells with chromosome gain or loss, relative to cells with a neutral ploidy for chromosome arm a gene is located on.
    with gene classifications as either "Anti-Scaling", "Buffering" or "Scaling" upon chromosome gain and loss

Trisomy 21 comparison.xlsx

  • Protein expression difference between Down Syndrome (Ts21) fibroblasts and control fibroblasts, set relative to cancer cell line protein expression difference. Yeast data from: Liu et al. 2017, Nature Communications. Systematic proteome and proteostasis profiling in human Trisomy 21 fibroblast cells

yeast-human aneuploidy.xlsx

  • Yeast protein expression difference upon chromosome gain. Data originated from: Dephoure et al. 2014, eLife; 3:e03023 Quantitative proteomic analysis reveals posttranslational responses to aneuploidy in yeast

AmplificationRatio_perGene1.75.csv

  • Calculation of the percent of cells in the filtered dataset with gene amplifications per gene. Used in initial factor analysis, but not included in the final manuscript. Data from CCLE gene copy number data.

Protein.AllFactors.csv

  • A list of all genes, difference upon chromosome gain or loss, and factor scores/values per gene. 5' UTR length and 3' UTR length were not included, because there are frequently multiple 5' or 3' UTR lengths per gene, and this duplication can scew data when investigating other factors. See Protein_buffering_factors_v2.R, to incorperate 5' UTR length and 3'UTR length if desired.

Factor.ROCAUC.correlationscore.csv

  • A list of all investigated factors, their AUC ROC values with regard to buffering, scaling and anti-scaling genes, and the correlation coefficients with RNA and Protein expression difference.

About

Depository of all R files for the "Extensive protein dosage compensation in aneuploid human cancers" BioRxiv manuscript. Authors: KM Schukken and JM Sheltzer

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages