Skip to content

modified the source code to exclude duplicated metabolite info.#703

Merged
htsugawa merged 1 commit intomasterfrom
gcms_annotation_pipline_changed
Feb 12, 2026
Merged

modified the source code to exclude duplicated metabolite info.#703
htsugawa merged 1 commit intomasterfrom
gcms_annotation_pipline_changed

Conversation

@htsugawa
Copy link
Contributor

@htsugawa htsugawa commented Feb 8, 2026

No description provided.

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR aims to reduce duplicated metabolite annotations by (a) deduplicating GC/MS MSP top-hit reporting across features and (b) adjusting alignment-level annotation deduplication to treat previously demoted (“Putative”) spots differently.

Changes:

  • GC/MS annotation: track “used” references with a custom MoleculeMsReference equality comparer when OnlyReportTopHitInMspSearch is enabled.
  • Alignment deduplication: exclude spots whose Name starts with "Putative" from dedup passes and change the putative prefix formatting.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 7 comments.

File Description
src/MSDIAL5/MsdialGcMsApi/Algorithm/Annotation.cs Adds a custom HashSet comparer for deduplicating MoleculeMsReference in top-hit MSP reporting.
src/MSDIAL5/MsdialCore/Algorithm/Alignment/SpotAction.cs Changes alignment annotation deduplication filtering and modifies how low-score duplicates are labeled as putative.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

static void ChangeAnnotationToLowScore(AlignmentSpotProperty spot) {
spot.MatchResults.Representative.IsReferenceMatched = false;
spot.Name = "putative: " + spot.Name;
//spot.MatchResults.Representative.IsReferenceMatched = false;
Copy link

Copilot AI Feb 8, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ChangeAnnotationToLowScore() no longer clears spot.MatchResults.Representative.IsReferenceMatched. That means (for spots where reference-matched status comes from MatchResults) the “deduplicated” low-score spots may still be treated as reference-matched throughout the pipeline (e.g., blank filtering keeps ref-matched features). If the intent is to demote duplicates to putative, this should still update match flags (and/or clear the representative result) rather than only prefixing the name.

Suggested change
//spot.MatchResults.Representative.IsReferenceMatched = false;
var representative = spot.MatchResults?.Representative;
if (representative != null) {
representative.IsReferenceMatched = false;
}

Copilot uses AI. Check for mistakes.
Comment on lines 58 to 61
spotList = spots
.Where(n => n.IsReferenceMatched(_evaluator))
.Where(n => n.IsReferenceMatched(_evaluator) && !n.Name.StartsWith("Putative"))
.Where(n => !string.IsNullOrEmpty(n.MatchResults?.Representative?.InChIKey) && n.MatchResults.Representative.InChIKey.Length > 1)
.OrderByDescending(spot => spot.MatchResults.Representative.InChIKey)
Copy link

Copilot AI Feb 8, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

StartsWith("Putative") uses current-culture, case-sensitive comparison and will also exclude names like "PutativeX" (no colon). If this is meant to detect the prefix added by ChangeAnnotationToLowScore, consider checking for the exact prefix (e.g., "Putative:") with an explicit StringComparison (Ordinal/OrdinalIgnoreCase) to avoid unintended exclusions.

Copilot uses AI. Check for mistakes.
Comment on lines +33 to 36
var spotList = spots.Where(n => n.IsReferenceMatched(_evaluator) && !n.Name.StartsWith("Putative")).OrderByDescending(spot => spot.MatchResults.Representative.LibraryID).ToList();
var currentPeakId = 0;
var currentLibraryId = spotList[currentPeakId].MatchResults.Representative.LibraryID;

Copy link

Copilot AI Feb 8, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Test coverage: this change alters the alignment deduplication behavior (Putative prefix filtering + match flag behavior), but the existing RefineMspDuplicateTest in LcmsAlignmentRefinerTests is currently marked [Ignore], and there don’t appear to be active tests covering MatchResultAnnotationDeduplicator. Adding/enabling a test for the updated behavior would help prevent regressions.

Copilot uses AI. Check for mistakes.
Comment on lines +55 to 56
var used = new HashSet<MoleculeMsReference>(new MoleculeMsReferenceEqualityComparer());
foreach (var (container, i) in containers.WithIndex().OrderByDescending(p => p.Item1.Representative.TotalScore)) {
Copy link

Copilot AI Feb 8, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new HashSet(new MoleculeMsReferenceEqualityComparer()) is unsafe because the comparer’s Equals()/GetHashCode() are inconsistent: Equals() can return true via Name/ScanID matching even when only one object has InChIKey/ScanID set, but GetHashCode() prefers the first available field on each object. This breaks HashSet invariants (same item may not be found / duplicates may slip through). Consider switching to a key-based approach (e.g., HashSet of normalized string keys and add all applicable keys), or redefine equality to use a single canonical key selection that GetHashCode mirrors.

Copilot uses AI. Check for mistakes.
Comment on lines +104 to +121
public bool Equals(MoleculeMsReference x, MoleculeMsReference y) {
if (ReferenceEquals(x, y)) return true;
if (x is null || y is null) return false;

if (!string.IsNullOrEmpty(x.InChIKey) && !string.IsNullOrEmpty(y.InChIKey)) {
if (x.InChIKey == y.InChIKey) return true;
}

if (x.ScanID != 0 && y.ScanID != 0 && x.ScanID == y.ScanID) {
return true;
}

if (!string.IsNullOrEmpty(x.Name) && !string.IsNullOrEmpty(y.Name)) {
if (x.Name.ToLower() == y.Name.ToLower()) return true;
}

return false;
}
Copy link

Copilot AI Feb 8, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

MoleculeMsReferenceEqualityComparer violates the requirement that if Equals(x,y) is true then GetHashCode(x) must equal GetHashCode(y). With the current “match on any of InChIKey/ScanID/Name” logic, two objects can be equal by Name while hashing by different fields, which will break HashSet/Dictionary behavior. The comparer should either (1) compute hash from a single canonical key that Equals also uses, or (2) avoid OR-equality and instead deduplicate by maintaining multiple key sets (InChIKey/ScanID/Name) separately.

Copilot uses AI. Check for mistakes.
Comment on lines +116 to +118
if (!string.IsNullOrEmpty(x.Name) && !string.IsNullOrEmpty(y.Name)) {
if (x.Name.ToLower() == y.Name.ToLower()) return true;
}
Copy link

Copilot AI Feb 8, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using Name.ToLower() for comparisons/hash codes is culture-sensitive and allocates. Prefer a case-insensitive ordinal comparison (e.g., string.Equals(..., StringComparison.OrdinalIgnoreCase)) and a matching hash strategy (e.g., StringComparer.OrdinalIgnoreCase.GetHashCode(name)) to avoid locale-dependent behavior.

Copilot uses AI. Check for mistakes.
Comment on lines +33 to 36
var spotList = spots.Where(n => n.IsReferenceMatched(_evaluator) && !n.Name.StartsWith("Putative")).OrderByDescending(spot => spot.MatchResults.Representative.LibraryID).ToList();
var currentPeakId = 0;
var currentLibraryId = spotList[currentPeakId].MatchResults.Representative.LibraryID;

Copy link

Copilot AI Feb 8, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new filter !n.Name.StartsWith("Putative") can make spotList empty, but the code immediately indexes spotList[0] (line 35), which will throw. Add an early return when spotList.Count == 0 before accessing spotList[currentPeakId] (and consider using a more specific prefix like "Putative:" with an explicit StringComparison).

Copilot uses AI. Check for mistakes.
@htsugawa htsugawa merged commit 420489e into master Feb 12, 2026
14 of 15 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant

Comments