[Input samplesheet validation] Sample names with point can result in accidental sample merge #102

Open
TimHHH opened this issue May 24, 2022 · 5 comments · Fixed by #135

Comments

@TimHHH
Collaborator

TimHHH commented May 24, 2022

Two samples whose names share the same text before a point can be merged unintentionally, e.g.:
Votintseva2017.614406.m
Votintseva2017.614406.1
In this case the two distinct samples are interpreted as a single sample, namely Votintseva2017.614406.
Hence we should make a note in the manual that points are not allowed in the sample sheet.
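
To make the failure mode concrete, here is a minimal sketch that reports which sample names would collapse into one if everything after the last point were dropped. That truncation rule is an assumption inferred from the example above, not a confirmed description of how the pipeline parses names, and `find_point_collisions` is a hypothetical helper, not part of the pipeline.

```python
from collections import defaultdict


def find_point_collisions(sample_names):
    """Group sample names by the text before their last point.

    Returns only the groups holding more than one distinct name, i.e.
    the names that would collapse into one sample if everything after
    the last point were dropped (assumed behaviour, see lead-in).
    """
    groups = defaultdict(list)
    for name in sample_names:
        # rpartition returns ("", "", name) when there is no point,
        # so fall back to the full name in that case.
        head, sep, _tail = name.rpartition(".")
        key = head if sep else name
        if name not in groups[key]:
            groups[key].append(name)
    return {key: names for key, names in groups.items() if len(names) > 1}


names = ["Votintseva2017.614406.m", "Votintseva2017.614406.1", "SampleA"]
collisions = find_point_collisions(names)
print(collisions)
# {'Votintseva2017.614406': ['Votintseva2017.614406.m', 'Votintseva2017.614406.1']}
```

A check like this could warn about existing sample sheets even before points are formally disallowed.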

@abhi18av
Member

abhi18av commented Aug 1, 2022

I'm thinking about addressing this more generally by creating samplesheet validation logic.

A Python script would be triggered to validate the samplesheet, covering all of these nuances. What do you think?

CC @LennertVerboven

@abhi18av abhi18av changed the title Sample names with point can result in accidental sample merge [Input samplesheet validation] Sample names with point can result in accidental sample merge Aug 30, 2022
@abhi18av
Member

Notes from meeting on 30-Aug-2022

  • Sample names (i.e. the sample column) should not have dots (non-dash symbols). Add a list of symbols not allowed.
  • All fields must have a value (no empty cells)
  • Checks for dots in reference genome names (SNPEFF / default_configs)
  • TODO: Evaluate quoted strings in the samplesheet
  • Read-1 should be different from Read-2
  • (OPTIONAL) Both of these files should exist
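
The checks in these notes could be sketched per row roughly as follows. This is only an illustration, not the script that was later added to the repository: the column names `sample`, `r1` and `r2`, the helper `validate_row`, and the allowed character set (letters, digits, dashes, underscores) are all assumptions, since the list of disallowed symbols is still to be defined.

```python
import os
import re

# Assumption: only letters, digits, dashes and underscores are allowed.
ALLOWED_SAMPLE_NAME = re.compile(r"[A-Za-z0-9_-]+")


def validate_row(row, check_files=False):
    """Return a list of error messages for one samplesheet row (a dict)."""
    errors = []

    # Every field must have a non-blank value.
    for column, value in row.items():
        if value is None or not str(value).strip():
            errors.append(f"empty value in column '{column}'")

    # Sample names must not contain dots or other disallowed symbols.
    sample = row.get("sample") or ""
    if sample and not ALLOWED_SAMPLE_NAME.fullmatch(sample):
        errors.append(f"sample name '{sample}' contains disallowed symbols")

    # Read-1 and Read-2 must point to different files.
    r1, r2 = row.get("r1") or "", row.get("r2") or ""
    if r1 and r1 == r2:
        errors.append("r1 and r2 refer to the same file")

    # Optional: both read files must exist on disk.
    if check_files:
        for column, path in (("r1", r1), ("r2", r2)):
            if path and not os.path.isfile(path):
                errors.append(f"file in column '{column}' does not exist: {path}")

    return errors


# Illustrative rows; the file names are made up.
good = {"sample": "Votintseva2017-614406-m", "r1": "a_R1.fastq.gz", "r2": "a_R2.fastq.gz"}
bad = {"sample": "Votintseva2017.614406.m", "r1": "a_R1.fastq.gz", "r2": "a_R1.fastq.gz"}
print(validate_row(good))
print(validate_row(bad))
```

Returning a list of messages rather than raising on the first problem lets a user fix every issue in the samplesheet in one pass.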

@TimHHH @LennertVerboven please feel free to add other validations.

@TimHHH
Collaborator Author

TimHHH commented Sep 1, 2022

The item "Checks for dots in reference genome names (SNPEFF / default_configs)" can be dropped. Our pipeline is not designed to use other reference genomes, because downstream processes require H37Rv. However, modifying XBS-nf to run with a different reference genome is certainly doable for those with a programming background.

Another requirement: no two rows should have exactly the same Study, Sample, Library, and Attempt values (at least the attempt number should differ).
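
The uniqueness requirement above could be checked along these lines. This is a sketch under assumptions: the lowercase column names in `KEY_COLUMNS` and the helper `find_duplicate_keys` are hypothetical, and the example rows are made up.

```python
from collections import Counter

# Assumption: these column names match the samplesheet header.
KEY_COLUMNS = ("study", "sample", "library", "attempt")


def find_duplicate_keys(rows, key_columns=KEY_COLUMNS):
    """Return the key tuples that occur in more than one row."""
    counts = Counter(tuple(row[col] for col in key_columns) for row in rows)
    return [key for key, count in counts.items() if count > 1]


# Illustrative rows: the first and third share all four key values.
rows = [
    {"study": "StudyA", "sample": "S1", "library": "L1", "attempt": "1"},
    {"study": "StudyA", "sample": "S1", "library": "L1", "attempt": "2"},
    {"study": "StudyA", "sample": "S1", "library": "L1", "attempt": "1"},
]
duplicates = find_duplicate_keys(rows)
print(duplicates)
# [('StudyA', 'S1', 'L1', '1')]
```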

@TimHHH TimHHH assigned TimHHH and LennertVerboven and unassigned TimHHH Sep 6, 2022
@abhi18av
Member

The initial work was done by @LennertVerboven and added here: https://github.com/TORCH-Consortium/xbs-nf/blob/master/bin/sample_sheet_validation.py

@abhi18av
Member

abhi18av commented Dec 5, 2022

TODO: @abhi18av Need to add another check for any duplicates in the samplesheet.

@abhi18av abhi18av reopened this Dec 5, 2022
3 participants