modified the source code to exclude duplicated metabolite info.#703
modified the source code to exclude duplicated metabolite info.#703
Conversation
There was a problem hiding this comment.
Pull request overview
This PR aims to reduce duplicated metabolite annotations by (a) deduplicating GC/MS MSP top-hit reporting across features and (b) adjusting alignment-level annotation deduplication to treat previously demoted (“Putative”) spots differently.
Changes:
- GC/MS annotation: track “used” references with a custom
MoleculeMsReferenceequality comparer whenOnlyReportTopHitInMspSearchis enabled. - Alignment deduplication: exclude spots whose
Namestarts with"Putative"from dedup passes and change the putative prefix formatting.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 7 comments.
| File | Description |
|---|---|
| src/MSDIAL5/MsdialGcMsApi/Algorithm/Annotation.cs | Adds a custom HashSet comparer for deduplicating MoleculeMsReference in top-hit MSP reporting. |
| src/MSDIAL5/MsdialCore/Algorithm/Alignment/SpotAction.cs | Changes alignment annotation deduplication filtering and modifies how low-score duplicates are labeled as putative. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| static void ChangeAnnotationToLowScore(AlignmentSpotProperty spot) { | ||
| spot.MatchResults.Representative.IsReferenceMatched = false; | ||
| spot.Name = "putative: " + spot.Name; | ||
| //spot.MatchResults.Representative.IsReferenceMatched = false; |
There was a problem hiding this comment.
ChangeAnnotationToLowScore() no longer clears spot.MatchResults.Representative.IsReferenceMatched. That means (for spots where reference-matched status comes from MatchResults) the “deduplicated” low-score spots may still be treated as reference-matched throughout the pipeline (e.g., blank filtering keeps ref-matched features). If the intent is to demote duplicates to putative, this should still update match flags (and/or clear the representative result) rather than only prefixing the name.
| //spot.MatchResults.Representative.IsReferenceMatched = false; | |
| var representative = spot.MatchResults?.Representative; | |
| if (representative != null) { | |
| representative.IsReferenceMatched = false; | |
| } |
| spotList = spots | ||
| .Where(n => n.IsReferenceMatched(_evaluator)) | ||
| .Where(n => n.IsReferenceMatched(_evaluator) && !n.Name.StartsWith("Putative")) | ||
| .Where(n => !string.IsNullOrEmpty(n.MatchResults?.Representative?.InChIKey) && n.MatchResults.Representative.InChIKey.Length > 1) | ||
| .OrderByDescending(spot => spot.MatchResults.Representative.InChIKey) |
There was a problem hiding this comment.
StartsWith("Putative") uses current-culture, case-sensitive comparison and will also exclude names like "PutativeX" (no colon). If this is meant to detect the prefix added by ChangeAnnotationToLowScore, consider checking for the exact prefix (e.g., "Putative:") with an explicit StringComparison (Ordinal/OrdinalIgnoreCase) to avoid unintended exclusions.
| var spotList = spots.Where(n => n.IsReferenceMatched(_evaluator) && !n.Name.StartsWith("Putative")).OrderByDescending(spot => spot.MatchResults.Representative.LibraryID).ToList(); | ||
| var currentPeakId = 0; | ||
| var currentLibraryId = spotList[currentPeakId].MatchResults.Representative.LibraryID; | ||
|
|
There was a problem hiding this comment.
Test coverage: this change alters the alignment deduplication behavior (Putative prefix filtering + match flag behavior), but the existing RefineMspDuplicateTest in LcmsAlignmentRefinerTests is currently marked [Ignore], and there don’t appear to be active tests covering MatchResultAnnotationDeduplicator. Adding/enabling a test for the updated behavior would help prevent regressions.
| var used = new HashSet<MoleculeMsReference>(new MoleculeMsReferenceEqualityComparer()); | ||
| foreach (var (container, i) in containers.WithIndex().OrderByDescending(p => p.Item1.Representative.TotalScore)) { |
There was a problem hiding this comment.
The new HashSet(new MoleculeMsReferenceEqualityComparer()) is unsafe because the comparer’s Equals()/GetHashCode() are inconsistent: Equals() can return true via Name/ScanID matching even when only one object has InChIKey/ScanID set, but GetHashCode() prefers the first available field on each object. This breaks HashSet invariants (same item may not be found / duplicates may slip through). Consider switching to a key-based approach (e.g., HashSet of normalized string keys and add all applicable keys), or redefine equality to use a single canonical key selection that GetHashCode mirrors.
| public bool Equals(MoleculeMsReference x, MoleculeMsReference y) { | ||
| if (ReferenceEquals(x, y)) return true; | ||
| if (x is null || y is null) return false; | ||
|
|
||
| if (!string.IsNullOrEmpty(x.InChIKey) && !string.IsNullOrEmpty(y.InChIKey)) { | ||
| if (x.InChIKey == y.InChIKey) return true; | ||
| } | ||
|
|
||
| if (x.ScanID != 0 && y.ScanID != 0 && x.ScanID == y.ScanID) { | ||
| return true; | ||
| } | ||
|
|
||
| if (!string.IsNullOrEmpty(x.Name) && !string.IsNullOrEmpty(y.Name)) { | ||
| if (x.Name.ToLower() == y.Name.ToLower()) return true; | ||
| } | ||
|
|
||
| return false; | ||
| } |
There was a problem hiding this comment.
MoleculeMsReferenceEqualityComparer violates the requirement that if Equals(x,y) is true then GetHashCode(x) must equal GetHashCode(y). With the current “match on any of InChIKey/ScanID/Name” logic, two objects can be equal by Name while hashing by different fields, which will break HashSet/Dictionary behavior. The comparer should either (1) compute hash from a single canonical key that Equals also uses, or (2) avoid OR-equality and instead deduplicate by maintaining multiple key sets (InChIKey/ScanID/Name) separately.
| if (!string.IsNullOrEmpty(x.Name) && !string.IsNullOrEmpty(y.Name)) { | ||
| if (x.Name.ToLower() == y.Name.ToLower()) return true; | ||
| } |
There was a problem hiding this comment.
Using Name.ToLower() for comparisons/hash codes is culture-sensitive and allocates. Prefer a case-insensitive ordinal comparison (e.g., string.Equals(..., StringComparison.OrdinalIgnoreCase)) and a matching hash strategy (e.g., StringComparer.OrdinalIgnoreCase.GetHashCode(name)) to avoid locale-dependent behavior.
| var spotList = spots.Where(n => n.IsReferenceMatched(_evaluator) && !n.Name.StartsWith("Putative")).OrderByDescending(spot => spot.MatchResults.Representative.LibraryID).ToList(); | ||
| var currentPeakId = 0; | ||
| var currentLibraryId = spotList[currentPeakId].MatchResults.Representative.LibraryID; | ||
|
|
There was a problem hiding this comment.
The new filter !n.Name.StartsWith("Putative") can make spotList empty, but the code immediately indexes spotList[0] (line 35), which will throw. Add an early return when spotList.Count == 0 before accessing spotList[currentPeakId] (and consider using a more specific prefix like "Putative:" with an explicit StringComparison).
No description provided.