Improve Spec2Vec integration for macOS compatibility across run.py, annotation.py, and annotation_refined.py #20

@Mattesimone

Below are the modifications I made to get the software running correctly on macOS.

Environment
OS: macOS (Sequoia 15.6)
Architecture: arm64 (M1 Max)
MS2LDA: version 2.0.1

1. Suggested improvements in run.py

1.1 Add default annotation path handling
Introduce a helper function to ensure Spec2Vec paths are valid and automatically set defaults if missing or incorrect:

import os

def set_default_annotation_paths(annotation_parameters):
    # NOTE: this base path is specific to my conda environment; adjust it for your installation.
    base_path = "/Users/matteosimone/miniconda3/envs/ms2lda/lib/python3.11/site-packages/ms2lda/Add_On/Spec2Vec/model_positive_mode/"
    defaults = {
        "s2v_model_path": base_path + "150225_Spec2Vec_pos_CleanedLibraries.model",
        "s2v_library_embeddings": base_path + "150225_CleanedLibraries_Spec2Vec_pos_embeddings.npy",
        "s2v_library_db": base_path + "150225_CombLibraries_spectra.db",
    }
    # Replace any missing or non-existent path with its default (modifies the dict in place)
    for key, default_path in defaults.items():
        current_path = annotation_parameters.get(key)
        if current_path is None or not os.path.isfile(current_path):
            print(f"WARNING: '{key}' invalid or not found ('{current_path}'). Setting to default: '{default_path}'")
            annotation_parameters[key] = default_path

Invocation where Spec2Vec parameters are initialized:

# Ensure that the Spec2Vec paths are correct and valid
set_default_annotation_paths(annotation_parameters)
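
The base path above is tied to one user's conda environment. A minimal sketch of a more portable variant, assuming all three files live in a single directory supplied by the caller; `S2V_DEFAULT_FILES`, `package_dir`, and `resolve_annotation_paths` are hypothetical names, not part of MS2LDA:

```python
import importlib.util
import os
from pathlib import Path

# File names taken from the snippet above; the directory is supplied by the caller.
S2V_DEFAULT_FILES = {
    "s2v_model_path": "150225_Spec2Vec_pos_CleanedLibraries.model",
    "s2v_library_embeddings": "150225_CleanedLibraries_Spec2Vec_pos_embeddings.npy",
    "s2v_library_db": "150225_CombLibraries_spectra.db",
}


def package_dir(package_name):
    """Return the directory of an installed package, or None if it cannot be located."""
    spec = importlib.util.find_spec(package_name)
    if spec is None or spec.origin is None:
        return None
    return Path(spec.origin).parent


def resolve_annotation_paths(annotation_parameters, base_dir):
    """Replace missing or invalid Spec2Vec paths with defaults under base_dir.

    Modifies annotation_parameters in place and returns the replaced keys.
    """
    base_dir = Path(base_dir)
    replaced = []
    for key, filename in S2V_DEFAULT_FILES.items():
        current = annotation_parameters.get(key)
        if current is None or not os.path.isfile(current):
            annotation_parameters[key] = str(base_dir / filename)
            replaced.append(key)
    return replaced
```

The base directory could then be derived from the installed package, for example `package_dir("MS2LDA") / "Add_On" / "Spec2Vec" / "model_positive_mode"`. The package name and layout here are inferred from the paths in this report and may differ in other installations.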

1.2 Save dataset reference in visualization parameters
Near the end of `run()`, right before calling save_visualization_data, add dataset to the saved visualization dictionary when the motif count is below 500:

# Save additional visualization data
if n_motifs < 500:
    parameters_for_viz = {
        "dataset": dataset,      # added line
        "n_motifs": n_motifs,
        "n_iterations": n_iterations,
        "dataset_parameters": dataset_parameters,
        "train_parameters": train_parameters,
        "model_parameters": model_parameters,
        "convergence_parameters": convergence_parameters,
        "annotation_parameters": annotation_parameters,
        "motif_parameter": motif_parameter,
        "preprocessing_parameters": preprocessing_parameters,
        "fingerprint_parameters": fingerprint_parameters,
    }

1.3 Correct Spec2Vec model path reference
Ensure the correct absolute path is used when loading the model:

def s2v_annotation(motif_spectra, annotation_parameters):
    # Correct absolute path to the Spec2Vec model
    path_model = annotation_parameters.get(
        "s2v_model_path",
        "/Users/matteosimone/miniconda3/envs/ms2lda/lib/python3.11/site-packages/ms2lda/Add_On/Spec2Vec/model_positive_mode/150225_Spec2Vec_pos_CleanedLibraries.model"
    )
    print(f"DEBUG [run.py] path_model: {path_model}")

2. Suggested improvements in annotation.py

2.1 Ensure Spec2Vec can be imported
Explicitly append the Spec2Vec package path to avoid “module not found” errors:

import sys
sys.path.append("/Users/matteosimone/miniconda3/envs/ms2lda/lib/python3.11/site-packages/spec2vec")

from spec2vec import Spec2Vec
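
Before changing `sys.path`, it can help to check where (or whether) `spec2vec` currently resolves. A small diagnostic sketch using only the standard library; `describe_import` is a hypothetical helper name:

```python
import importlib.util


def describe_import(name):
    """Report where a top-level module would be imported from, without importing it."""
    try:
        spec = importlib.util.find_spec(name)
    except (ImportError, ValueError):
        spec = None
    if spec is None:
        return f"{name}: not importable from the current sys.path"
    return f"{name}: found at {spec.origin}"


print(describe_import("spec2vec"))
```

Entries on `sys.path` are expected to be directories that contain packages (such as `site-packages`), not the package directory itself. If the check above already finds `spec2vec`, the extra `sys.path.append` may not be needed.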

2.2 Add function load_s2v_and_library
Introduce a dedicated function to load both the Spec2Vec model and the associated spectral library database.

import sqlite3

from gensim.models import Word2Vec

def load_s2v_and_library(path_model, path_library):
    """
    Loads the Spec2Vec model and the spectral library database.

    Parameters
    ----------
    path_model : str
        Path to the Spec2Vec model file (gensim Word2Vec format).
    path_library : str
        Path to the SQLite file containing the spectral library.

    Returns
    -------
    s2v_similarity : Spec2Vec
        Spec2Vec similarity object built from the loaded model.
    library : sqlite3.Connection
        Open connection to the spectral library SQLite database.
    """
    # Load the Word2Vec model
    w2v_model = Word2Vec.load(path_model)
    s2v_similarity = Spec2Vec(
        model=w2v_model,
        intensity_weighting_power=0.5,
        allowed_missing_percentage=100.0
    )

    # Open the SQLite database connection (the caller is responsible for closing it)
    library = sqlite3.connect(path_library)

    return s2v_similarity, library
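
One thing worth noting about the loader: `sqlite3.connect` creates a new, empty database file when the given path does not exist, so a wrong `path_library` would not fail at this point. A minimal sketch of a stricter opener, assuming the library only needs to be read; `open_library_readonly` is a hypothetical name, not part of MS2LDA:

```python
import os
import sqlite3
from pathlib import Path


def open_library_readonly(path_library):
    """Open an existing SQLite file read-only; fail instead of creating it."""
    if not os.path.isfile(path_library):
        raise FileNotFoundError(f"Spectral library not found: {path_library}")
    # A file: URI with mode=ro asks SQLite for a read-only connection.
    uri = Path(path_library).resolve().as_uri() + "?mode=ro"
    return sqlite3.connect(uri, uri=True)
```

Inside `load_s2v_and_library`, this could stand in for the plain `sqlite3.connect(path_library)` call if silent creation of an empty database is a concern.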

3. Suggested improvements in annotation_refined.py

Import the loader at the top of the script so that the Spec2Vec model and library can be loaded explicitly:

from MS2LDA.Add_On.Spec2Vec.annotation import load_s2v_and_library

At the end of the script, I added the following to load the model and library and print some debug output:

import os

path_model = "/Users/matteosimone/miniconda3/envs/ms2lda/lib/python3.11/site-packages/ms2lda/Add_On/Spec2Vec/model_positive_mode/150225_Spec2Vec_pos_CleanedLibraries.model"
path_library = "/Users/matteosimone/miniconda3/envs/ms2lda/lib/python3.11/site-packages/ms2lda/Add_On/Spec2Vec/model_positive_mode/150225_CombLibraries_spectra.db"

print("DEBUG: path_model =", path_model)
print("Exists path_model?", os.path.exists(path_model))
print("Exists path_library?", os.path.exists(path_library))

s2v_similarity, library = load_s2v_and_library(path_model, path_library)
print("Model loaded ...")
