modified the source code to exclude duplicated metabolite info. by htsugawa · Pull Request #703 · systemsomicslab/MsdialWorkbench

htsugawa · 2026-02-08T14:42:05Z

No description provided.

Copilot

Pull request overview

This PR aims to reduce duplicated metabolite annotations by (a) deduplicating GC/MS MSP top-hit reporting across features and (b) adjusting alignment-level annotation deduplication to treat previously demoted (“Putative”) spots differently.

Changes:

GC/MS annotation: track “used” references with a custom MoleculeMsReference equality comparer when OnlyReportTopHitInMspSearch is enabled.
Alignment deduplication: exclude spots whose Name starts with "Putative" from dedup passes and change the putative prefix formatting.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 7 comments.

File	Description
src/MSDIAL5/MsdialGcMsApi/Algorithm/Annotation.cs	Adds a custom `HashSet` comparer for deduplicating `MoleculeMsReference` in top-hit MSP reporting.
src/MSDIAL5/MsdialCore/Algorithm/Alignment/SpotAction.cs	Changes alignment annotation deduplication filtering and modifies how low-score duplicates are labeled as putative.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot · 2026-02-08T14:52:42Z

src/MSDIAL5/MsdialCore/Algorithm/Alignment/SpotAction.cs

        static void ChangeAnnotationToLowScore(AlignmentSpotProperty spot) {
-            spot.MatchResults.Representative.IsReferenceMatched = false;
-            spot.Name = "putative: " + spot.Name;
+            //spot.MatchResults.Representative.IsReferenceMatched = false;


ChangeAnnotationToLowScore() no longer clears spot.MatchResults.Representative.IsReferenceMatched. That means (for spots where reference-matched status comes from MatchResults) the “deduplicated” low-score spots may still be treated as reference-matched throughout the pipeline (e.g., blank filtering keeps ref-matched features). If the intent is to demote duplicates to putative, this should still update match flags (and/or clear the representative result) rather than only prefixing the name.

Suggested change

//spot.MatchResults.Representative.IsReferenceMatched = false;

var representative = spot.MatchResults?.Representative;

if (representative != null) {

representative.IsReferenceMatched = false;

}

Copilot · 2026-02-08T14:52:42Z

src/MSDIAL5/MsdialCore/Algorithm/Alignment/SpotAction.cs

            spotList = spots
-                       .Where(n => n.IsReferenceMatched(_evaluator))
+                       .Where(n => n.IsReferenceMatched(_evaluator) && !n.Name.StartsWith("Putative"))
                       .Where(n => !string.IsNullOrEmpty(n.MatchResults?.Representative?.InChIKey) && n.MatchResults.Representative.InChIKey.Length > 1)
                       .OrderByDescending(spot => spot.MatchResults.Representative.InChIKey)


StartsWith("Putative") uses current-culture, case-sensitive comparison and will also exclude names like "PutativeX" (no colon). If this is meant to detect the prefix added by ChangeAnnotationToLowScore, consider checking for the exact prefix (e.g., "Putative:") with an explicit StringComparison (Ordinal/OrdinalIgnoreCase) to avoid unintended exclusions.

Copilot · 2026-02-08T14:52:43Z

src/MSDIAL5/MsdialCore/Algorithm/Alignment/SpotAction.cs

+            var spotList = spots.Where(n => n.IsReferenceMatched(_evaluator) && !n.Name.StartsWith("Putative")).OrderByDescending(spot => spot.MatchResults.Representative.LibraryID).ToList();
            var currentPeakId = 0;
            var currentLibraryId = spotList[currentPeakId].MatchResults.Representative.LibraryID;



Test coverage: this change alters the alignment deduplication behavior (Putative prefix filtering + match flag behavior), but the existing RefineMspDuplicateTest in LcmsAlignmentRefinerTests is currently marked [Ignore], and there don’t appear to be active tests covering MatchResultAnnotationDeduplicator. Adding/enabling a test for the updated behavior would help prevent regressions.

Copilot · 2026-02-08T14:52:43Z

src/MSDIAL5/MsdialGcMsApi/Algorithm/Annotation.cs

+                    var used = new HashSet<MoleculeMsReference>(new MoleculeMsReferenceEqualityComparer());
                    foreach (var (container, i) in containers.WithIndex().OrderByDescending(p => p.Item1.Representative.TotalScore)) {


The new HashSet(new MoleculeMsReferenceEqualityComparer()) is unsafe because the comparer’s Equals()/GetHashCode() are inconsistent: Equals() can return true via Name/ScanID matching even when only one object has InChIKey/ScanID set, but GetHashCode() prefers the first available field on each object. This breaks HashSet invariants (same item may not be found / duplicates may slip through). Consider switching to a key-based approach (e.g., HashSet of normalized string keys and add all applicable keys), or redefine equality to use a single canonical key selection that GetHashCode mirrors.

Copilot · 2026-02-08T14:52:43Z

src/MSDIAL5/MsdialGcMsApi/Algorithm/Annotation.cs

+        public bool Equals(MoleculeMsReference x, MoleculeMsReference y) {
+            if (ReferenceEquals(x, y)) return true;
+            if (x is null || y is null) return false;
+
+            if (!string.IsNullOrEmpty(x.InChIKey) && !string.IsNullOrEmpty(y.InChIKey)) {
+                if (x.InChIKey == y.InChIKey) return true;
+            }
+
+            if (x.ScanID != 0 && y.ScanID != 0 && x.ScanID == y.ScanID) {
+                return true;
+            }
+
+            if (!string.IsNullOrEmpty(x.Name) && !string.IsNullOrEmpty(y.Name)) {
+                if (x.Name.ToLower() == y.Name.ToLower()) return true;
+            }
+
+            return false;
+        }


MoleculeMsReferenceEqualityComparer violates the requirement that if Equals(x,y) is true then GetHashCode(x) must equal GetHashCode(y). With the current “match on any of InChIKey/ScanID/Name” logic, two objects can be equal by Name while hashing by different fields, which will break HashSet/Dictionary behavior. The comparer should either (1) compute hash from a single canonical key that Equals also uses, or (2) avoid OR-equality and instead deduplicate by maintaining multiple key sets (InChIKey/ScanID/Name) separately.

Copilot · 2026-02-08T14:52:43Z

src/MSDIAL5/MsdialGcMsApi/Algorithm/Annotation.cs

+            if (!string.IsNullOrEmpty(x.Name) && !string.IsNullOrEmpty(y.Name)) {
+                if (x.Name.ToLower() == y.Name.ToLower()) return true;
+            }


Using Name.ToLower() for comparisons/hash codes is culture-sensitive and allocates. Prefer a case-insensitive ordinal comparison (e.g., string.Equals(..., StringComparison.OrdinalIgnoreCase)) and a matching hash strategy (e.g., StringComparer.OrdinalIgnoreCase.GetHashCode(name)) to avoid locale-dependent behavior.

Copilot · 2026-02-08T14:52:43Z

src/MSDIAL5/MsdialCore/Algorithm/Alignment/SpotAction.cs

+            var spotList = spots.Where(n => n.IsReferenceMatched(_evaluator) && !n.Name.StartsWith("Putative")).OrderByDescending(spot => spot.MatchResults.Representative.LibraryID).ToList();
            var currentPeakId = 0;
            var currentLibraryId = spotList[currentPeakId].MatchResults.Representative.LibraryID;



The new filter !n.Name.StartsWith("Putative") can make spotList empty, but the code immediately indexes spotList[0] (line 35), which will throw. Add an early return when spotList.Count == 0 before accessing spotList[currentPeakId] (and consider using a more specific prefix like "Putative:" with an explicit StringComparison).

modified the source code to exclude duplicated metabolite info.

0d5c691

htsugawa requested review from YukiMatsuzawa and Copilot February 8, 2026 14:42

Copilot started reviewing on behalf of htsugawa February 8, 2026 14:42 View session

Copilot AI reviewed Feb 8, 2026

View reviewed changes

htsugawa merged commit 420489e into master Feb 12, 2026
14 of 15 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

modified the source code to exclude duplicated metabolite info.#703

modified the source code to exclude duplicated metabolite info.#703
htsugawa merged 1 commit intomasterfrom
gcms_annotation_pipline_changed

htsugawa commented Feb 8, 2026

Uh oh!

Copilot AI left a comment

Uh oh!

Copilot AI Feb 8, 2026

Uh oh!

Copilot AI Feb 8, 2026

Uh oh!

Copilot AI Feb 8, 2026

Uh oh!

Copilot AI Feb 8, 2026

Uh oh!

Copilot AI Feb 8, 2026

Uh oh!

Copilot AI Feb 8, 2026

Uh oh!

Copilot AI Feb 8, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Comments

-            //spot.MatchResults.Representative.IsReferenceMatched = false;
+            var representative = spot.MatchResults?.Representative;
+            if (representative != null) {
+                representative.IsReferenceMatched = false;
+            }

		var used = new HashSet<MoleculeMsReference>(new MoleculeMsReferenceEqualityComparer());
		foreach (var (container, i) in containers.WithIndex().OrderByDescending(p => p.Item1.Representative.TotalScore)) {

Conversation

htsugawa commented Feb 8, 2026

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Copilot AI Feb 8, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Feb 8, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Feb 8, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Feb 8, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Feb 8, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Feb 8, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Feb 8, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Comments