Accessing actual used peaks #6

@florian-huber

As @sdrogers mentioned: in the first preprint version, Fig 5b shows all possible matches between peaks, but it might be nicer to display only the ones that get selected by the greedy algorithm used in ModifiedCosine.

Here is a quick pointer to where you would have to look for that in matchms: the actual selection of matching peaks is done in score_best_matches from matchms.similarity.spectrum_similarity_functions. Currently this outputs only the number of used matching pairs, so it needs to be modified to also output the used pairs themselves.

from typing import List, Tuple

import numpy


def score_best_matches_simile(matching_pairs: numpy.ndarray, spec1: numpy.ndarray,
                              spec2: numpy.ndarray, mz_power: float = 0.0,
                              intensity_power: float = 1.0) -> Tuple[float, List[int]]:
    """Calculate cosine-like score by summing the products of matched peaks.
    Requires a list of matching peaks sorted by intensity product. Returns the
    score and the row indices (into matching_pairs) of the pairs actually used."""
    score = 0.0
    used_matches = []
    used1 = set()
    used2 = set()
    for i in range(matching_pairs.shape[0]):
        if matching_pairs[i, 0] not in used1 and matching_pairs[i, 1] not in used2:
            score += matching_pairs[i, 2]
            used1.add(matching_pairs[i, 0])  # Every peak can only be paired once
            used2.add(matching_pairs[i, 1])  # Every peak can only be paired once
            used_matches.append(i)  # Remember which candidate pair was selected

    # Normalize score:
    spec1_power = spec1[:, 0] ** mz_power * spec1[:, 1] ** intensity_power
    spec2_power = spec2[:, 0] ** mz_power * spec2[:, 1] ** intensity_power

    score = score/(numpy.sum(spec1_power ** 2) ** 0.5 * numpy.sum(spec2_power ** 2) ** 0.5)
    return score, used_matches
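
To illustrate what the greedy selection keeps and what it drops, here is a minimal sketch on toy data. It does not call matchms. The helper name greedy_select and the toy array are made up for illustration, and the array layout (index in spec1, index in spec2, intensity product) is assumed to match the function above.

```python
import numpy as np

def greedy_select(matching_pairs):
    """Return row indices of the pairs picked greedily (each peak used at most once).

    Hypothetical helper: same selection loop as above, without the scoring.
    """
    used1, used2, picked = set(), set(), []
    for i, (idx1, idx2, _) in enumerate(matching_pairs):
        if idx1 not in used1 and idx2 not in used2:
            used1.add(idx1)
            used2.add(idx2)
            picked.append(i)
    return picked

# Toy candidates: (peak index in spec1, peak index in spec2, intensity product),
# already sorted by descending intensity product.
matching_pairs = np.array([
    [0, 0, 0.9],
    [0, 1, 0.6],  # skipped: peak 0 of spec1 is already paired
    [1, 1, 0.5],
    [2, 1, 0.2],  # skipped: peak 1 of spec2 is already paired
])

picked = greedy_select(matching_pairs)
used_pairs = matching_pairs[picked]  # the rows one would draw in the figure
print(picked)  # [0, 2]
```

Only rows 0 and 2 survive, so a plot based on used_pairs would show two connections instead of four.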

Unfortunately, the other functions involved are built in as subfunctions of ModifiedCosine, so you would essentially have to write your own edited version of that class (or skip the class and build a function instead).

class ModifiedCosineSimile(BaseSimilarity):
    ...
    def pair(...
    
        spec1 = get_peaks_array(reference)
        spec2 = get_peaks_array(query)
        matching_pairs = get_matching_pairs()
        if matching_pairs.shape[0] == 0:
            return None
        score, used_matches = score_best_matches_simile(matching_pairs, spec1, spec2,
                                   self.mz_power, self.intensity_power)
        return score, [matching_pairs[i] for i in used_matches]
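
For the "skip the class and build a function instead" route, a rough standalone sketch could look like the following. It is plain numpy and not the matchms implementation. The names collect_pairs and modified_cosine_with_pairs are hypothetical, and the sketch assumes spectra are (n, 2) arrays of (m/z, intensity), that mass_shift is the precursor m/z difference applied to the second spectrum, and that mz_power = 0 and intensity_power = 1.

```python
import numpy as np

def collect_pairs(spec1, spec2, tolerance, shift=0.0):
    """All candidate (i, j, intensity product) with |mz1 - (mz2 + shift)| <= tolerance."""
    pairs = []
    for i, (mz1, int1) in enumerate(spec1):
        for j, (mz2, int2) in enumerate(spec2):
            if abs(mz1 - (mz2 + shift)) <= tolerance:
                pairs.append((i, j, int1 * int2))
    return pairs

def modified_cosine_with_pairs(spec1, spec2, mass_shift, tolerance=0.1):
    """Return (score, used_pairs); used_pairs holds the greedily selected matches."""
    # Candidates from direct matches plus matches after shifting spec2.
    pairs = collect_pairs(spec1, spec2, tolerance)
    pairs += collect_pairs(spec1, spec2, tolerance, shift=mass_shift)
    pairs.sort(key=lambda p: p[2], reverse=True)  # strongest products first

    used1, used2, used_pairs, score = set(), set(), [], 0.0
    for i, j, product in pairs:
        if i not in used1 and j not in used2:  # every peak is paired at most once
            used1.add(i)
            used2.add(j)
            used_pairs.append((i, j, product))
            score += product

    norm = np.sqrt(np.sum(spec1[:, 1] ** 2)) * np.sqrt(np.sum(spec2[:, 1] ** 2))
    return (score / norm if norm > 0 else 0.0), used_pairs

# Toy spectra; spec2 looks like spec1 with two fragments shifted by 10 m/z.
spec1 = np.array([[100.0, 1.0], [150.0, 0.5], [200.0, 0.2]])
spec2 = np.array([[100.0, 1.0], [140.0, 0.5], [190.0, 0.3]])
score, used_pairs = modified_cosine_with_pairs(spec1, spec2, mass_shift=10.0)
print([(i, j) for i, j, _ in used_pairs])
```

The returned used_pairs carry peak indices into both spectra, which is what a plot of the selected matches would need.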
