Conversation
Cristianetaniguti
commented
May 21, 2025
- When a marker ID contains more than one "_", the segment after the last "_" is taken as the position and the remaining segments are collapsed into the chromosome ID
- If more than one base-pair polymorphism is found between Ref_0001 and Alt_0002, the tag is discarded because it is not possible to identify which polymorphism is the target or where it is located
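The two rules above can be sketched in base R. This is only an illustration, not the code in this PR: `split_marker_id` and `mismatch_positions` are hypothetical helper names, the marker ID is made up, and the comparison assumes two ungapped sequences of equal length rather than the alignment object the package uses.

```r
# Hypothetical helper: split a marker ID at its LAST underscore.
# Everything before it becomes the chromosome ID, the rest is the position.
split_marker_id <- function(marker_id) {
  parts <- strsplit(marker_id, "_", fixed = TRUE)[[1]]
  n <- length(parts)
  list(
    chr = paste(parts[-n], collapse = "_"),  # "" if there is no underscore
    pos = as.integer(parts[n])
  )
}

# Hypothetical helper: positions where two equal-length sequences differ.
mismatch_positions <- function(ref, alt) {
  stopifnot(nchar(ref) == nchar(alt))
  which(strsplit(ref, "")[[1]] != strsplit(alt, "")[[1]])
}

id <- split_marker_id("chr_1A_000123")   # made-up marker ID

# Keep a tag only when exactly one base differs between Ref and Alt.
one_snp <- length(mismatch_positions("ACGTAC", "ACGAAC")) == 1
two_snp <- length(mismatch_positions("ACGTAC", "TCGAAC")) == 1
```

Here `id$chr` keeps the multi-part chromosome name, while a tag with two differing bases fails the single-polymorphism check and would be discarded.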
Pull Request Overview
This PR refines marker ID parsing for strawberry, preserving multi-part chromosome IDs and ensuring only single-base polymorphisms are processed.
- Add check.names = FALSE to CSV reads and data frame creation to retain original column names
- Collapse all but the last underscore-separated segment of CloneID into the chromosome ID
- Modify comparison logic to only accept single-base polymorphisms and discard tags with multiple mismatches
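As a small illustration of the first point, the sketch below reads the same CSV text with and without `check.names = FALSE`. The column names and the inline CSV are invented for the example and are not taken from the package's data.

```r
# Made-up CSV with column names that are not syntactically valid R names.
csv_text <- "CloneID,sample-1,2nd sample\nchr_1A_000123,0,1\n"

# Default behaviour: names are passed through make.names().
default_names <- names(read.csv(text = csv_text))

# With check.names = FALSE the original headers are kept verbatim.
kept_names <- names(read.csv(text = csv_text, check.names = FALSE))

# data.frame() accepts the same argument when building a table by hand.
df <- data.frame(`sample-1` = 0, check.names = FALSE)
```

With the default setting, sample IDs containing dashes, spaces, or leading digits are silently rewritten, which can break later matching against the original sample names.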
if(isBotLoci) one_tag$AlleleSequence <- sapply(one_tag$AlleleSequence, function(x) as.character(reverseComplement(DNAString(x))))

chr <- sapply(strsplit(cloneID, "_"), function(x) x[-length(x)])
if(length(chr > 1)) chr <- paste(chr, collapse = "_")
The condition uses length(chr > 1), which returns the length of the logical vector chr > 1 rather than testing how many segments there are. It should be length(chr) > 1 to correctly check for multiple segments before collapsing.
Suggested change:
- if(length(chr > 1)) chr <- paste(chr, collapse = "_")
+ if(length(chr) > 1) chr <- paste(chr, collapse = "_")
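The difference between the two conditions can be seen on a toy vector. This is a minimal sketch with made-up values, not the package's data.

```r
chr <- "chr"                      # a single segment (made-up value)

misplaced <- length(chr > 1)      # length of the logical vector chr > 1
intended  <- length(chr) > 1      # is there more than one segment?

# if() treats any non-zero number as TRUE, so the misplaced parenthesis
# makes the branch run even for a single segment.
branch_taken_misplaced <- if (misplaced) TRUE else FALSE
branch_taken_intended  <- if (intended) TRUE else FALSE

# Collapsing a single element returns it unchanged.
collapsed <- paste(chr, collapse = "_")
```

Because collapsing a single element is a no-op, the misplaced parenthesis may not change the result for non-empty input, but the corrected form states the intent and avoids relying on that coincidence.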
# If target alternative have N, discard whole tag
if(all(alt_base %in% c("A", "T", "C", "G"))) {
# Always only one polymorphism, if there are more than one, not sure which is the target
if(all(alt_base %in% c("A", "T", "C", "G")) & length(align@pattern@mismatch@unlistData) == 1) {
Use && instead of & for scalar logical operations in if conditions to avoid unintended vectorized behavior.
Suggested change:
- if(all(alt_base %in% c("A", "T", "C", "G")) & length(align@pattern@mismatch@unlistData) == 1) {
+ if(all(alt_base %in% c("A", "T", "C", "G")) && length(align@pattern@mismatch@unlistData) == 1) {
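A short sketch of how the two operators differ, using toy values only: `&` is element-wise and always evaluates both sides, while `&&` works on single values and skips the right-hand side when the left-hand side already decides the result.

```r
# Element-wise: returns one result per element.
vectorized <- c(TRUE, FALSE) & c(TRUE, TRUE)

# Short-circuit: the right-hand side is never evaluated here.
short_circuit <- FALSE && stop("never evaluated")

# `&` evaluates both operands, so the same expression raises the error.
eager <- tryCatch(
  FALSE & stop("evaluated"),
  error = function(e) "error"
)
```

In an `if` condition built from two scalar checks, `&&` therefore gives a single TRUE or FALSE and avoids evaluating the second check when the first one fails.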
clust <- makeCluster(n.cores)
#clusterExport(clust, c("hap_seq","add_ref_alt"))
#clusterExport(clust, c("hap_seq","add_ref_alt", "nsamples"))
The clusterExport call is commented out, so hap_seq, nsamples, and add_ref_alt won't be available on the workers. Uncomment or add the export call before parLapply to ensure these objects are passed to the cluster.
Suggested change:
- #clusterExport(clust, c("hap_seq","add_ref_alt", "nsamples"))
+ clusterExport(clust, c("hap_seq", "add_ref_alt", "nsamples"))
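For context, the export pattern looks roughly like this with the `parallel` package. The objects `scale_factor` and `scale_value` are hypothetical stand-ins for `hap_seq`, `add_ref_alt`, and `nsamples`, and the cluster size of 2 is an arbitrary choice for the example.

```r
library(parallel)

# Hypothetical objects that the worker function depends on.
scale_factor <- 10
scale_value <- function(x) x * scale_factor

clust <- makeCluster(2)

# Copy the named objects from the current environment to every worker.
clusterExport(clust, c("scale_factor", "scale_value"), envir = environment())

# The anonymous function runs on the workers and looks up scale_value there.
result <- parLapply(clust, 1:3, function(i) scale_value(i))

stopCluster(clust)
```

Whether an export is strictly required depends on how the worker function captures its environment, so it may be worth checking whether the package's call already carries these objects before uncommenting the line.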
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@             Coverage Diff              @@
##           development      #37   +/-   ##
===============================================
+ Coverage        84.13%   84.14%   +0.01%
===============================================
  Files               16       16
  Lines             1147     1148       +1
===============================================
+ Hits               965      966       +1
  Misses             182      182