eCOMET is a platform for processing LC-MS/MS data for chemical ecology. It gets input from MZMine and SIRIUS. The major toolboxes of eCOMET are as follows:

Following input files are required. See '/raw_data' of demo files.
- Feature quantification table generated by MZMine4 (
features.csv) This file is generated by [Feature list methods → Export feature list → CSV]. See https://mzmine.github.io/mzmine_documentation/module_docs/io/feat-list-export.html - Annotation of features by SIRIUS (
canopus_formula_summary.tsvandstructure_identification.tsv) These are included in the standard output of SIRIUS. Make sure you run all SIRIUS → ZODIAC → CSI:FingerID → CANOPUS. Then export as tsv. - Metadata of samples (
metadata.csv) Metadata should have three or more columns; the first three being sample, group, and mass. Additional information can be provided (see Part 9. Regression with metadata). - (Optional) Custom annotation of features ('custom_DB_glucosinolates.csv') The custom DB includes three columns: compound, mz, rt
- (Optional) Chemical similarity table from MZMine 4. The metric can be cosine, DreaMS, and MS2DeepScore (cosine.csv, dreams.csv, and ms2deepscore.csv) To obtain these data, you need to run molecular networking at MZMine4 for each metric. See https://mzmine.github.io/mzmine_documentation/module_docs/group_spectral_net/molecular_networking.html Set the minimal similarity value as 0 to get all pairwise similarity. These tables are impoted and processed for heatmap visualization and chemical diversity quantification (see 10. Chemical Diversity Measures)
The demo files are metabolomics analysis files of Arabidopsis thaliana (Col-0) attacked by two different herbivores (Spodoptera litura; sl, and Lipaphis erysimi; le). Eight replicates were sampled and analyzed for each group. See metadata.csv for more details.
All input files are imported to construct a mmo object. The mmo object will be used in downstream statistics.
###
# 1.1. Give directories
mzmine_featuredir <- 'raw_data/250724_features_ms2.csv'
metadatadir <- "raw_data/250728_mmo_metadata.csv"
canopus_formuladir <- "raw_data/canopus_formula_summary.tsv"
canopus_structuredir <- "raw_data/structure_identifications.tsv"
gls_db <- 'raw_data/250428_GLS_Ath_simplegrad_MMO.csv'
cos_dir <- 'raw_data/250728_sim_ms2_(modified)_cosine.csv'
dreams_dir <- 'raw_data/250728_sim_dreams.csv'
m2ds_dir <- 'raw_data/250728_sim_ms2deepscore.csv'
First import the mzmine feature list and metadata to create a mmo object.
mmo <- GetMZmineFeature(mzmine_featuredir, metadatadir) head(mmo$featre_data)
then add annotation generated from sirius to the object
mmo <- AddSiriusAnnot(mmo, canopus_structuredir, canopus_formuladir) head(mmo$sirius_annot)
Custom annotation can be added based on m/z and RT.
mmo <- AddCustomAnnot(mmo, DB = gls_db, mztol = 5, rttol = 0.1) # Add annotation
then process the quantification data
# Normalize data
mmo <- ReplaceZero(mmo, method = 'one') # Replace 0 and NA values by 1
mmo <- MassNormalization(mmo) # Normalize peak area by sample mass in metadata
mmo <- MeancenterNormalization(mmo) # Add mean-centered area
mmo <- LogNormalization(mmo) # Add log-transformed area
mmo <- ZNormalization(mmo) # Add Zscore
The chemical similarity tables are imported then transformed into distance matrix
# Import chemical distance data for chemical diversity analyses mmo <- AddChemDist(mmo, cos_dir = cos_dir, dreams_dir = dreams_dir, m2ds_dir = m2ds_dir)
Check every data are imported well
summary(mmo)
The mmo object contains following information
- Feature quantification value ($feature_data)
- Metadata ($metadata)
- Pairwise comparison data ($pairwise)
- Log, mean-centered, and Z-transformed feature quantification value ($log, $meancentered, and #zscore)
- Sirius annotation ($sirius_annot)
- cosine distance ($cos.dissim)
- DreaMS distance ($dreams.dissim)
- MS2DeepScore distance (m2ds.dissim)
All features have id (from mzmine) and feature (mz_rt generated by GetMZmineFeature()). Both can be interchangable by FeatureToID(mmo, features) and IDToFeature(mmo, IDs).
Some feature groups may deserve particular interest. For the demo file, glucosinolates and flavonoids are. We can briefly annotate those metabolites using annotation from sirius.
features <- mmo$sirius_annot
# Get list of glucosinolated using custom annoatation DB
gls_hits <- mmo$custom_annot %>% filter(lengths(custom_annot) > 0)
GLSs <- gls_hits %>% pull(feature)
# Get list of flavonoids using sirius annotation
FLVs <- features %>% filter(str_detect(features[[46]], "Flavonoid")) %>% pull(feature)
By using PLS-DA plot, the general distribution of metabolic profile of each sample and group can be visualized. For each group give colors
# Define your custom colors for each group
custom_colors <- c("ctrl" = "#999999", "sl1" = "#fdcdac", "le1" = "#b3e2cd")
#Or automatically give colors
custom_colors <- setNames(brewer.pal(length(unique(mmo$metadata$group)), "Set3"), unique(mmo$metadata$group))
Then plot PLS-DA
PLSDAplot(mmo, color = custom_colors, outdir = 'plots/plsda/PLSDA_Z.pdf', normalization = 'Z')
PLSDAplot(mmo, color = custom_colors, topk = 0, outdir = 'plots/plsda/PLSDA_Z_noLoadings.pdf', normalization = 'Z')
The topk parameter can be adjusted to plot loadings of top features on the plot. The normalization can be None, Log, Meancentered, and Z.
If a subset of features are to be used for ploting, set filter_feature = TRUE then provide the list of features to the feature_list
If a subset of groups are to be plotted, set 'filter_group = TRUE' then provide the list of groups to the group_list.
# Glucosinolates only
PLSDAplot(mmo, custom_colors, topk = 0, outdir = 'plots/plsda_GLS.pdf', normalization = 'Z', filter_feature = TRUE, feature_list = GLSs)
Many analyses targets to find Differentially Accumulated Metabolites (DAMs). DAMs can be defined by thresholds of log2-fold change and adjusted p-value. Those two metrics are calculated by following code. Note that the divisor group is at the left.
# 3.1. Add pairwise comparisons mmo <- PairwiseComp(mmo, 'ctrl', 'sl1') mmo <- PairwiseComp(mmo, 'ctrl', 'le1')
Then DAMs of each comparison can be extracted.
# 3.2. Get DAMs from the comparisons
DAMs <- GetDAMs(mmo, fc_cutoff = 0.5849625, pval_cutoff = 0.1) # log2(1.5) = 0.5849625
DAMs_up <- DAMs$DAMs_up
DAMs_down <- DAMs$DAMs_down
head(DAMs_up)
The cutoffs for DAMs can be adjusted in the GetDAMs function
How many features are overlapping and specific are the common questions following DAM analysis. It can be visualized by UpSet plot and Venn diagram. The inputs for both are identical and can be prepared as following
# 4.1. Define groups
VennInput <- list(
sl1.up = DAMs_up$ctrl_vs_sl1.up,
le1.up = DAMs_up$ctrl_vs_le1.up
)
Then can be plotted as following
# 4.2. Plot
# 4.2.1. Venn Diagram
ggvenn(VennInput, stroke_size = 0.5, set_name_size = 4, show_percentage = FALSE) +
theme(legend.position = "none")
ggsave("plots/Venn/Venn_Upreg.pdf", height = 5, width = 5)
# 4.2.2. Upset plot
pdf("plots/Venn/Upset_Upreg.pdf", 7, 5)
upset(fromList(VennInput), nsets=10, nintersects=20,order.by='freq', mainbar.y.label='Features in Set', line.size=1, point.size=4, shade.color='white', text.scale=1, show.numbers=FALSE)
dev.off()
The pairwise comparion can be visualized by a volcano plot. Following code generates volcano plots for all comparison made in section 3.
comparison_columns <- colnames(mmo$pairwise)
log2FC_columns <- grep("log2FC", comparison_columns, value = TRUE)
comparisons <- sub("_$", "", unique(sub("log2FC", "", log2FC_columns))) # Remove trailing underscore from comparisons
for (comp in comparisons){
VolcanoPlot(mmo, comp = comp, topk = 10, outdir = paste('plots/volcano/volcano_', comp, '.pdf', sep = ''))
}
A heatmap is a great way to visualize the whole metabolome. Typically the features are clustered by the distribution pattern but alternatively the chemical similarity between features can be visualized. For heatmap, required inputs can be generated from mmo as follows:
# 6.1. Generate inputs for heat map
HMinput <- GenerateHeatmapInputs(mmo, filter_feature = FALSE, feature_list = feature_list,
filter_group = FALSE, group_list = group_list,
summarize = 'mean', control_group = 'ctrl',
normalization = 'Z', distance = 'dreams')
summary(HMinput)
Where filter_feature and filter_group can be used on desire. The 'summarize' argument can be 'mean' or 'fold_change'. If 'fold_change' is used, the 'control_group' should be noted. If the chemical similarities are to be plotted, the distance metric to be used should be designated in 'distance' as one of dreams, cosine, or m2ds. Then the class-level annotation table is generated.
# 6.2. Generate NPC-based annotation table for heatmap
sirius_annot <- mmo$sirius_annot
# Get NPC Annotations
NPC_pathway <- unique(sirius_annot[[32]])
NPC_pathway <- NPC_pathway[!is.na(NPC_pathway) & NPC_pathway != ""] # remove NA and empty
NPC_superclass <- unique(sirius_annot[[34]])
NPC_superclass <- NPC_superclass[!is.na(NPC_superclass) & NPC_superclass != ""] # remove NA and empty
NPC_class <- unique(sirius_annot[[36]])
NPC_class <- NPC_class[!is.na(NPC_class) & NPC_class != ""] # remove NA and empty
sirius_annot_filtered <- sirius_annot %>%
# select(id = 1, NPC_pathway = 30, NPC_class = 34) %>%
select(id = 1, NPC_class = 36, NPC_superclass = 34, NPC_pathway = 32) %>%
# select(id = 1, NPC_pathway = 30, NPC_superclass = 32) %>%
filter(id %in% rownames(distance_matrix)) # get features with fingerprints
rownames(sirius_annot_filtered) <- sirius_annot_filtered$id
sirius_annot_filtered$id <- NULL
ann_colors = list(
NPC_pathway = setNames(brewer.pal(length(NPC_pathway), "Set2"), NPC_pathway),
NPC_class = setNames(viridis(length(NPC_class)), NPC_class),
NPC_superclass = setNames(viridis(length(NPC_superclass)), NPC_superclass)
)
The ann_colors will be plotted along with the heapmap to show which chemical class each features are included in. The heatmap can be visualized as follows:
pdf("plots/heatmap/dreams_total_Z.pdf", width = 15, height = 20)
pheatmap(mat = HMinput$FC_matrix,
cluster_rows = TRUE, #do not change
clustering_distance_rows = HMinput$dist_matrix, # Delete this line for UPGMA clustering of rows
cluster_cols = FALSE,
clustering_method = "average", #UPGMA
show_rownames = TRUE,
show_colnames = TRUE,
annotation_row = sirius_annot_filtered, # Dataframe with the rownames are identical with 'mat' and gives annotation
annotation_colors = ann_colors,
cellwidth = 25,
cellheight = 0.05,
treeheight_row = 100,
fontsize_row = 3,
fontsize_col = 15,
scale = 'none',
annotation_names_row = TRUE,
# labels_row = row_label,
border_color = NA,
color = colorRampPalette(c("blue", "white", "red"))(100))
dev.off()
Biological questions ask which class of chemical compounds are enriched in a set of compounds of interest (e.g., DAMs from above). This is analogue to the Gene Ontology enrichment analysis performed in transcriptomics. In MMO, NPC and Classyfire terms annotated by Canopus of SIRIUS are used to perform chemical class enrichment analysis of given list of features. The enrichment score of each term is calculated to plot the number of each term and the significance.
# 7.1. For a single set of features, a detailed enrichment plot can be generated
# There are two plotting styles available
CanopusListEnrichmentPlot(mmo, DAMs_up$ctrl_vs_sl1.up, pthr = 0.1, outdir = 'plots/enrichment/sl1_up.pdf', height = 6, width = 6)
CanopusListEnrichmentPlot_2(mmo, DAMs_up$ctrl_vs_sl1.up, pthr = 0.1, outdir = 'plots/enrichment/sl1_up_2.pdf', topn = 10, height = 6, width = 6)
# 7.2. For a list of sets features, a summary enrichment plot can be generated
# The summary enrichment plot can be generated for either a single level of CANOPUS classification (8.2.1) or for all levels (8.2.2)
# 7.2.1. For a single level of CANOPUS classification
term_levels <- c('NPC_class', 'NPC_superclass', 'NPC_pathway', 'ClassyFire_most_specific', 'ClassyFire_level5', 'ClassyFire_subclass', 'ClassyFire_class', 'ClassyFire_superclass')
CanopusLevelEnrichmentPlot(mmo, DAMs_up, term_level = 'NPC_class', pthr = 0.1, prefix = 'plots/enrichment/DAMs_up_NPC_class.pdf')
# 7.2.2. For all levels of CANOPUS classification
# All levels, or onlt NPC or ClassyFire
terms <- c('NPC', 'ClassyFire', 'all_terms')
CanopusAllLevelEnrichmentPlot(mmo, DAMs_up, term_level = 'all_terms', pthr = 0.1, prefix = 'plots/enrichment/DAMs_up_all_terms', width = 8, height = 12)
We are interested in finding anti-herbivore resistive compounds from the plant metabolome and testing whether such compounds are upregulated by insect attack. To do so, we first fit linear mixed model to the amount of each feature and the herbivore performance fed on each plant sample (see metadata). The negative effect size of the model represents the resistive value of the feature. We then test whether resistive features are upregulated (log2FC > 1) by plotting the effect size of the LMM and the log2FC as scatter plot. In the demo file, resistive compounds (which have neative effect sizes) were upregulated, which implies plants produce resistive compounds properly in response to sl attack. The analysis can be done as follows:
# To perform regression, the phenotype of interest should be defined in the metadata as a column
# 8.1. Regression of individual feature against a phenotype
# model can be 'lm' or 'pearson' or 'lmm'
# groups can be a vector of group names, or a single group name which has the phenotpye values in the metadata
FeaturePerformanceRegression(mmo, target = '477.0636_7.5687', phenotype = 'sl', groups = c('sl1'),
model = 'lm', normalization = 'Z',
output = paste('plots/phenotype_regression/477.0636_7.5687_sl_lm.pdf'))
# 8.2. Regression of all features against a phenotype and plotting the results
sl.lm <- GetPerformanceFeatureRegression(mmo, phenotype = 'sl', groups = c('sl1'),
DAM.list = list(sl.up = DAMs_up$ctrl_vs_sl1.up, sl.down = DAMs_down$ctrl_vs_sl1.down), comparisons = c('ctrl_vs_sl1'))
sl.lm.sig <- sl.lm %>% filter(p_value < 0.1)
head(sl.lm.sig)
sl.cor <- GetPerformanceFeatureCorrelation(mmo, phenotype = 'sl', groups = c('sl1'),
DAM.list = list(sl.up = DAMs_up$ctrl_vs_sl1.up, sl.down = DAMs_down$ctrl_vs_sl1.down), comparisons = c('ctrl_vs_sl1'))
sl.cor.sig <- sl.cor %>% filter(p_value < 0.1)
head(sl.cor.sig)
# One may want to plot the regression results along with the fold change values
PlotFoldchangeResistanceRegression(sl.lm.sig, fold_change = 'ctrl_vs_sl1_log2FC',
color = c('sl.up' = '#d42525ff', 'sl.down' = '#281e99ff'),
output_dir = 'plots/phenotype_regression/sl_lm_sig.pdf')
The chemical diversity is one of the key parameters in ecological studies. MMO quantifies the chemical diversity idices using idea from ChemoDiv package. The fingerprint distances are treated as phylogenetic diversity as in measuring taxonomic diversity. Alpha diversity can be calculated as following.
# 9.1. Alpha diversity by Hill numbers
# q : Hill number order, 0 for richness, 1 for Shannon, 2 for Simpson
# mode : weighted for the chemical structure (use distance argument), unweighted for no chemical weight
# filter_feature : if TRUE, only features in the feature_list are used
# feature_list : a list of features to filter the alpha diversity calculation
alphadiv <- GetAlphaDiversity(mmo, q = 3, normalization = 'Log', mode = 'weighted', distance = 'dreams', filter_feature = FALSE, feature_list = NULL)
# 9.1.1 Plot the alpha diversity
ggplot(alphadiv, aes(x = group, y = hill_number)) +
geom_boxplot(outlier.shape = NA) +
geom_beeswarm(size = 0.5) +
theme_classic() +
labs(title = 'Alpha Diversity by Hill Numbers', x = 'Group', y = 'Hill Number') +
theme(legend.position = "none", axis.text.x = element_text(angle = 45, vjust = 1, hjust=1))
ggsave('plots/alphadiv/dreams_q3_log.pdf', width = 5, height = 5)
# Test for significant differences between groups with ANOVA
anova <- anova_tukey_dunnett(alphadiv, 'hill_number ~ group')
write_anova(anova, 'plots/alphadiv/anova_hill_number.csv')
# 9.1.2 Plot the feature distribution
# Function to plot histogram of values for each feature for given groups
groups <- c('ctrl', 'sl1', 'le1')
group.mean <- GetGroupMeans(mmo, normalization = 'Z', filter_feature = FALSE, feature_list = NULL)[,-1]
long.group.mean <- data.frame(value = double(), group = character())
for (group in groups) {
group_data <- data.frame(value = group.mean[, group], group = group)
colnames(group_data) <- c('value', 'group')
long.group.mean <- rbind(long.group.mean, group_data)
}
ggplot(data = long.group.mean, aes(x = value, fill = group, color = group)) +
geom_density(position = 'identity', alpha = 0) +
theme_classic() +
labs(x = "Normalized peak intensity", y = "Density") +
# scale_fill_manual(values = custom_colors) +
theme(legend.position = "right")
ggsave('plots/alphadiv/density_function_Z.pdf', height = 6, width = 6)
metadata <- mmo$metadata
feature <- mmo$zscore
# The distribution should be shown!
plot_data <- data.frame(value = double(), rank = double(), sample = character(), group = character())
for (group in groups) {
group_samples <- metadata %>% filter(group == !!group) %>% pull(sample)
for (sample in group_samples) {
sample_data <- unlist(as.vector(feature[, sample]))
sorted_data <- sort(sample_data, decreasing = TRUE)
sample_plot_data <- data.frame(value = sorted_data, rank = seq_along(sorted_data), group = group, sample = sample)
plot_data <- rbind(plot_data, sample_plot_data)
}
}
ggplot(plot_data, aes(x = rank, y = value, color = group)) +
stat_summary(fun.data = mean_se, geom = "errorbar", width = 0.2, position = position_dodge(width = 0.5), alpha = 0.06) +
stat_summary(fun = mean, geom = "line", position = position_dodge(width = 0.5)) +
theme_classic() +
labs(title = "Sorted Feature", x = "Rank", y = "Value")
ggsave('plots/alphadiv/sorted_intensity_Z.png', height = 6, width = 6)
# 9.2. Beta diversity
# Calculate beta diversity for different normalizations and methods
# For unweighted beta diversity, Bray-Curtis or Jaccard distance can be used
bray <- GetBetaDiversity(mmo, method = 'bray', normalization = 'Log', filter_feature = FALSE, feature_list = NULL)
jaccard <- GetBetaDiversity(mmo, method = 'jaccard', normalization = 'Log', filter_feature = FALSE, feature_list = NULL)
# For weighted beta diversity, Generalized UniFrac can be used
guni <- GetBetaDiversity(mmo, method = 'Gen.Uni', normalization = 'Log', distance = 'dreams', filter_feature = FALSE, feature_list = NULL)
guni.0 <- guni[,,'d_0'] # GUniFrac with alpha 0
guni.05 <- guni[,,'d_0.5'] # GUniFrac with alpha 0.5
guni.1 <- guni[,,'d_1'] # GUniFrac with alpha 1
# 9.2.1. NMDS plots for beta diversity
nmds <- metaMDS(guni.05, k = 2, try = 50, trymax = 100)
nmds_coords <- as.data.frame(scores(nmds))
groups <- c()
for (col in colnames(mmo$feature_data)[-c(1, 2)]) {
groups <- append(groups, metadata[metadata$sample == col, ]$group)
}
nmds_coords$group <- groups
ggplot(nmds_coords, aes(x = NMDS1, y = NMDS2, color = group)) +
geom_point(size = 3) +
#geom_text_repel(aes(label = group), size = 3) +
theme_classic() +
stat_ellipse(level = 0.90) +
labs(x = "NMDS1", y = "NMDS2") +
theme(legend.position = "right")
ggsave('plots/betadiv/NDMS_guni05.pdf', height = 6, width = 6)
# 9.2.2. Quantification of beta diversity
group_distances <- CalculateGroupBetaDistance(mmo, beta_div = guni.05, reference_group = 'ctrl', groups = c('le1', 'sl1'))
ggplot(group_distances, aes(x = group, y = distance)) +
geom_boxplot(outlier.shape = NA) +
geom_beeswarm(size = 0.5) +
theme_classic() +
labs(x = "Group", y = "Beta Diversity")
ggsave('plots/betadiv/group_dist.pdf', height = 6, width = 6)
ExportFeaturesToCSV(mmo, feature_list = DAMs_up$ctrl_vs_sl1.up, normalization = 'None', output_dir = 'output/sl1_up_features.csv')