EVA-4059 duplicate SS accession QC job #493

Merged
nitin-ebi merged 3 commits into EBIvariation:master from nitin-ebi:duplicate-ss-acc-qc
Feb 5, 2026

Conversation

@nitin-ebi
Contributor

No description provided.

nitin-ebi self-assigned this on Feb 2, 2026

public class DuplicateSSAccQCWriterConfiguration {
@Bean(DUPLICATE_SS_ACC_QC_WRITER)
@StepScope
DuplicateSSAccQCWriter duplicateSSAccQCWrite(InputParameters parameters) {
Contributor

Suggested change
- DuplicateSSAccQCWriter duplicateSSAccQCWrite(InputParameters parameters) {
+ DuplicateSSAccQCWriter duplicateSSAccQCWriter(InputParameters parameters) {

Contributor Author

corrected

}

@Override
public void open(org.springframework.batch.item.ExecutionContext executionContext) throws ItemStreamException {
Contributor

should just import ExecutionContext

Contributor Author

updated

Member

Why does this one use the full path when the others do not?

Contributor Author

It should not. It comes from copy-paste, where IntelliJ sometimes inserts the fully qualified name automatically instead of adding an import.
Updated.

if (duplicateSSAccQCResultList != null && !duplicateSSAccQCResultList.isEmpty()) {
appendToFile(duplicateSSAccQCResultList);
} else {
logger.info("No duplicate SS IDs in the batch to append");
Contributor

Not sure we need to log in this case, as it's the normal case and might bloat the logs

Contributor Author

removed
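
For readers following this thread, the agreed behaviour (append only when the batch has results, and stay silent in the normal empty case) can be sketched as below. This is a minimal, framework-free illustration and not the code in the PR: `QuietDuplicateWriter`, its `write` method, and the use of plain strings as results are all hypothetical names chosen for the example.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.List;

// Hypothetical stand-in for the writer discussed above; not the project's actual class.
class QuietDuplicateWriter {
    private final Path outputFile;

    QuietDuplicateWriter(Path outputFile) {
        this.outputFile = outputFile;
    }

    // Appends one line per result. A null or empty batch is the normal case,
    // so it returns without writing and without logging.
    void write(List<String> results) throws IOException {
        if (results == null || results.isEmpty()) {
            return;
        }
        Files.write(outputFile, results, StandardCharsets.UTF_8,
                StandardOpenOption.CREATE, StandardOpenOption.APPEND);
    }
}
```

If the empty case ever needs to be visible, a counter reported once at the end of the step would probably be cheaper than one log line per batch.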

}

@Test
public void contextLoads() {
Contributor

Is this needed?

Contributor Author

deleted

/**
* Read all SubmittedVariant Accessions from VCF file in batches
*/
public class DuplicateSSAccQCFileReader implements ItemStreamReader<List<Long>> {
Member

Does this need to be called DuplicateSSAccQCFileReader? Nothing makes it specific to finding duplicates, and it could be reused elsewhere if we want to read IDs from a VCF.

Contributor Author

updated
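
To illustrate why a generic name fits, here is a rough sketch of a reader that only knows how to pull accessions out of a VCF in batches and has no duplicate-specific logic. It is not the class from the PR: `VcfAccessionBatchReader` is a hypothetical name, it deliberately avoids Spring Batch so it runs on its own, and it assumes accessions appear in the VCF ID column as `ss` followed by digits.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.util.ArrayList;
import java.util.List;

// Hypothetical, framework-free stand-in; the PR's reader implements
// Spring Batch's ItemStreamReader<List<Long>> instead.
class VcfAccessionBatchReader implements AutoCloseable {
    private final BufferedReader reader;
    private final int batchSize;

    VcfAccessionBatchReader(Reader source, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        this.reader = new BufferedReader(source);
        this.batchSize = batchSize;
    }

    // Returns the next batch of accessions, or null once the input is exhausted.
    // A batch can slightly exceed batchSize when one record carries several IDs.
    List<Long> read() throws IOException {
        List<Long> batch = new ArrayList<>(batchSize);
        String line;
        while (batch.size() < batchSize && (line = reader.readLine()) != null) {
            if (line.isEmpty() || line.startsWith("#")) {
                continue; // skip blank lines and VCF header lines
            }
            String[] columns = line.split("\t", 4);
            if (columns.length < 3) {
                continue; // not a data record
            }
            // The third VCF column is ID; multiple IDs are separated by ';'.
            for (String id : columns[2].split(";")) {
                if (id.matches("ss\\d{1,18}")) { // assumed accession format
                    batch.add(Long.parseLong(id.substring(2)));
                }
            }
        }
        return batch.isEmpty() ? null : batch;
    }

    @Override
    public void close() throws IOException {
        reader.close();
    }
}
```

Returning null at end of input mirrors the Spring Batch reader convention, so wrapping a class like this in an `ItemStreamReader` should be straightforward.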

Query query = query(where(ACCESSION_FIELD).in(sveAccessions).and(REMAPPED_FROM_FIELD).exists(false));
logger.info("Issuing find in EVA collection for SVEs containing the given accessions : {}", query);
List<SubmittedVariantEntity> evaResults = mongoTemplate.find(query, SubmittedVariantEntity.class);
List<DbsnpSubmittedVariantEntity> dbsnpResults = mongoTemplate.find(query, DbsnpSubmittedVariantEntity.class);
Member

Do we need to query DbsnpSubmittedVariantEntity.class here, since all new SS IDs will be added exclusively to SubmittedVariantEntity.class?

Contributor Author

Updated.
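
Once only one collection is queried, the check on the returned documents reduces to counting accessions. The sketch below assumes that a duplicate means more than one returned document sharing the same accession, which this thread does not spell out; `DuplicateAccessionFinder` and `findDuplicates` are hypothetical names, and the MongoDB query itself is left out so the example runs on its own.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of the PR: works on the accessions of the
// documents a single-collection query returned.
class DuplicateAccessionFinder {

    // Returns the accessions that occur more than once, in first-seen order.
    static List<Long> findDuplicates(List<Long> accessionsOfFoundDocuments) {
        Map<Long, Integer> counts = new LinkedHashMap<>();
        for (Long accession : accessionsOfFoundDocuments) {
            counts.merge(accession, 1, Integer::sum);
        }
        List<Long> duplicates = new ArrayList<>();
        for (Map.Entry<Long, Integer> entry : counts.entrySet()) {
            if (entry.getValue() > 1) {
                duplicates.add(entry.getKey());
            }
        }
        return duplicates;
    }
}
```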

nitin-ebi requested a review from tcezard on February 5, 2026, 11:43
nitin-ebi merged commit 453be80 into EBIvariation:master on Feb 5, 2026
2 checks passed
3 participants