How to run modeling steps when comparing within a single variate #71

gbc529 · 2022-07-11T23:15:19Z

gbc529
Jul 11, 2022

Hello,

I am looking to compare methylation levels from samples taken from different regions within the same tissue. This data is all within a single column of my meta data "Sample.Region" and the three options are "S, O, C". My code for pre-processing and running the DML is below:

`

Pre-processing_____:

setwd("/projects/b1122/gannon/CUB/data/IDAT_Files")
list.files(pattern='*.idat')

#load sesame
library(sesame)
library(sesameData)
library(SummarizedExperiment)
library(tidyverse)
library(ggrepel)
library(data.table)

#Run preprocessing using opensesame. Output of Beta-values
taka_betas <- openSesame(".")

#Make SummarizedExperiment Object
pdata <- read.csv("/path/to/meta/file/meta.csv")
pdata[is.na(pdata)] = 0
pdata <- column_to_rownames(pdata, "IDAT")
pdata_wBMI<- pdata[,1:6]
betas_t <- t(taka_betas)
se <- SummarizedExperiment(t(betas_t), colData = pdata_wBMI)
se

Modeling Differential Methylation____:

packageVersion('sesame')

#NOTE: BMI & Age not included because not a factor or character
se_ok = (checkLevels(assay(se), colData(se)$ID) &
checkLevels(assay(se), colData(se)$Sample.Region) &
checkLevels(assay(se),colData(se)$Race) &
checkLevels(assay(se),colData(se)$Case.Control))

sum(se_ok)

se_filtered = se[se_ok,]

#Set all variables to factors for analysis
colData(se_filtered)$Case.Control= relevel(factor(colData(se_filtered)$Case.Control), ref='case') #case is reference since it is the only one (random)
colData(se_filtered)$ID= relevel(factor(colData(se_filtered)$ID), ref='ID-006') #ID-006 is reference since it is the lowest numbered sample (random)
colData(se_filtered)$Age= relevel(factor(colData(se_filtered)$Age), ref='53') #53 is mean age for cohort
colData(se_filtered)$Sample.Region= relevel(factor(colData(se_filtered)$Sample.Region), ref='S') #S is reference because we know S methylation levels will be high
colData(se_filtered)$Race= relevel(factor(colData(se_filtered)$Race), ref='Black or African American') #"Black or African American" race is reference (random)
colData(se_filtered)$BMI= relevel(factor(colData(se_filtered)$BMI), ref='27.1') #27.1 is mean BMI for cohort

str(colData(se_filtered))
betas_filtered <- assay(se_filtered)

#Summarize data (START HERE)

smry_se = DML(se_filtered, ~Sample.Region) #Time consuming
smry_se
smry_se[[1]] #inspects summary results
res_se <- summaryExtractTest(smry_se) #creates df of DML data above

#size of data frame:
dim(res_se)

colnames(res_se)
"Probe_ID" "Est_X.Intercept." "Est_Sample.Region" "Est_Sample.RegionC" "Est_Sample.RegionO" "Pval_X.Intercept." "Pval_Sample.Region" "Pval_Sample.RegionC" "Pval_Sample.RegionO" "FPval_Sample.Region" "Eff_Sample.Region"

head(res_se)
`

Is there no Est_, Pval_, etc. for "S" because I set it as the reference?
What does the stand alone "Est_Sample.Region" value showing?
How can I run a pairwise comparison O to C, O to S, and S to C?

Thank you in advance for any help! I have also copied my sessionInfo below

`> sessionInfo()
R version 4.1.1 (2021-08-10)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Red Hat Enterprise Linux Server 7.5 (Maipo)

Matrix products: default
BLAS: /hpc/software/R/4.1.1/lib64/R/lib/libRblas.so
LAPACK: /hpc/software/R/4.1.1/lib64/R/lib/libRlapack.so

locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats4 stats graphics grDevices utils datasets methods base

other attached packages:
[1] ggrepel_0.9.1 data.table_1.14.2 forcats_0.5.1 stringr_1.4.0
[5] dplyr_1.0.9 purrr_0.3.4 readr_2.1.2 tidyr_1.2.0
[9] tibble_3.1.7 ggplot2_3.3.6 tidyverse_1.3.1 sesame_1.15.4
[13] sesameData_1.13.36 ExperimentHub_2.2.1 AnnotationHub_3.2.2 BiocFileCache_2.2.1
[17] dbplyr_2.2.1 SummarizedExperiment_1.24.0 Biobase_2.54.0 GenomicRanges_1.46.1
[21] GenomeInfoDb_1.30.1 IRanges_2.28.0 MatrixGenerics_1.6.0 matrixStats_0.62.0
[25] S4Vectors_0.32.4 BiocGenerics_0.40.0

loaded via a namespace (and not attached):
[1] fs_1.5.2 bitops_1.0-7 lubridate_1.8.0
[4] bit64_4.0.5 filelock_1.0.2 RColorBrewer_1.1-3
[7] httr_1.4.3 backports_1.4.1 tools_4.1.1
[10] utf8_1.2.2 R6_2.5.1 DBI_1.1.3
[13] colorspace_2.0-3 withr_2.5.0 tidyselect_1.1.2
[16] bit_4.0.4 curl_4.3.2 compiler_4.1.1
[19] preprocessCore_1.56.0 rvest_1.0.2 cli_3.3.0
[22] xml2_1.3.3 DelayedArray_0.20.0 scales_1.2.0
[25] rappdirs_0.3.3 digest_0.6.29 XVector_0.34.0
[28] pkgconfig_2.0.3 htmltools_0.5.2 fastmap_1.1.0
[31] readxl_1.4.0 rlang_1.0.3 rstudioapi_0.13
[34] RSQLite_2.2.14 shiny_1.7.1 generics_0.1.3
[37] jsonlite_1.8.0 wheatmap_0.2.0 BiocParallel_1.28.3
[40] RCurl_1.98-1.7 magrittr_2.0.3 GenomeInfoDbData_1.2.7
[43] Matrix_1.3-4 Rcpp_1.0.9 munsell_0.5.0
[46] fansi_1.0.3 lifecycle_1.0.1 stringi_1.7.6
[49] yaml_2.3.5 zlibbioc_1.40.0 plyr_1.8.7
[52] grid_4.1.1 blob_1.2.3 parallel_4.1.1
[55] promises_1.2.0.1 crayon_1.5.1 lattice_0.20-44
[58] haven_2.5.0 Biostrings_2.62.0 hms_1.1.1
[61] KEGGREST_1.34.0 pillar_1.7.0 reshape2_1.4.4
[64] reprex_2.0.1 glue_1.6.2 BiocVersion_3.14.0
[67] modelr_0.1.8 BiocManager_1.30.18 png_0.1-7
[70] vctrs_0.4.1 tzdb_0.3.0 httpuv_1.6.5
[73] cellranger_1.1.0 gtable_0.3.0 assertthat_0.2.1
[76] cachem_1.0.6 mime_0.12 xtable_1.8-4
[79] broom_1.0.0 later_1.3.0 AnnotationDbi_1.56.2
[82] memoise_2.0.1 ellipsis_0.3.2 interactiveDisplayBase_1.32.0`

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

How to run modeling steps when comparing within a single variate #71

{{title}}

Replies: 0 comments

Select a reply

How to run modeling steps when comparing within a single variate #71

gbc529 Jul 11, 2022

Pre-processing_____:

Modeling Differential Methylation____:

Replies: 0 comments

gbc529
Jul 11, 2022