update schemas and sample files (virusseq#18)
* update schemas and sample files

* new line
jaserud authored Apr 12, 2021
1 parent 324b5f1 commit d403906
Showing 15 changed files with 199 additions and 101 deletions.
4 changes: 1 addition & 3 deletions README.md
@@ -8,9 +8,7 @@ A docker-compose setup is available with all required services for MUSE in `./co
 ## Run:
 Move to compose folder: `cd compose`

-Start containers: `docker compose up -d`
-
-Initialize minio: `sh ./init.sh`
+Initialize dependency services: `./init-dep.sh`

 Note: Sometimes SONG-server might fail to start if DB is not ready yet; restart should fix it

2 changes: 1 addition & 1 deletion compose/docker-compose.yaml
@@ -10,7 +10,7 @@ services:
 SPRING_FLYWAY_ENABLED: "true"
 SPRING_FLYWAY_LOCATIONS: "classpath:flyway/sql,classpath:db/migration"
 SPRING_PROFILES: demo, auth
-JWT_DURATIONMS: 300000 # expire tokens in 5 min for local testing
+JWT_DURATIONMS: 1800000 # expire tokens in 30 min for local testing
 expose:
 - "8080"
 ports:
3 changes: 3 additions & 0 deletions compose/init.sh → compose/init-dep.sh
100644 → 100755
@@ -1,3 +1,6 @@
+echo Start dependency services
+docker-compose up -d
+
 echo Make default test bucket
 docker run --rm \
 --network host \
2 changes: 1 addition & 1 deletion compose/song-db-init/song-init.sql

Large diffs are not rendered by default.

5 changes: 5 additions & 0 deletions pom.xml
@@ -91,6 +91,11 @@
 <artifactId>velocity</artifactId>
 <version>1.7</version>
 </dependency>
+<dependency>
+<groupId>org.apache.commons</groupId>
+<artifactId>commons-text</artifactId>
+<version>1.9</version>
+</dependency>
 <dependency>
 <groupId>org.springframework.boot</groupId>
 <artifactId>spring-boot-starter-test</artifactId>
2 changes: 1 addition & 1 deletion sample-files/test.fasta → sample-files/L00210314.fasta
@@ -1,4 +1,4 @@
->MuseTest/Qc-L00244359/2020 seq_method:Illumina_NexteraFlex|assemb_method:ivar|snv_call_method:ivar
+>MuseTest/Qc-L00210314/2020 seq_method:Illumina_NexteraFlex|assemb_method:ivar|snv_call_method:ivar
 NNNNNNNNNNNNNNNNNTCCCAGGTAACAAACCAACCAACTTTCGATCTCTTGTAGATCT
 AGATGATTACCAAGGTAAACCTTTGGAATTTGGTGCCACTTCTGCTGCTCTTCAACCTGA
 AGAAGAGCAAGAAGAAGATTGGTTAGATGATGATAGTCAACAAACTGTTGGTCAACAAGA
48 changes: 48 additions & 0 deletions sample-files/L00212401.fasta
@@ -0,0 +1,48 @@
>MuseTest/Qc-L00212401/2020 seq_method:Illumina_NexteraFlex|assemb_method:ivar|snv_call_method:ivar
NNNNNNNNNNNNNNNNNTCCCAGGTAACAAACCAACCAACTTTCGATCTCTTGTAGATCT
AGATGATTACCAAGGTAAACCTTTGGAATTTGGTGCCACTTCTGCTGCTCTTCAACCTGA
AGAAGAGCAAGAAGAAGATTGGTTAGATGATGATAGTCAACAAACTGTTGGTCAACAAGA
CGGCAGTGAGGACAATCAGACAACTACTATTCAAACAATTGTTGAGGTTCAACCTCAATT
AGAGATGGAACTTACACCAGTTGTTCAGACTATTGAAGTGAATAGTTTTAGTGGTTATTT
CAACCAAATGTGCCTTTCAACTCTCATGAAGTGTGATCATTGTGGTGAAACTTCATGGCA
GACGGGCGATTTTGTTAAAGCCACTTGCGAATTTTGTGGCACTGAGAATTTGACTAAAGA
AGGTGCCACTACTTGTGGTTACTTACCCCAAAATGCTGTTGTTAAAATTTATTGTCCAGC
GCTTGATGGCTTTATGGGTAGAATTCGATCTGTCTATCCAGTTGCGTCACCAAATGAATG
CAACCAAATGTGCCTTTCAACTCTCATGAAGTGTGATCATTGTGGTGAAACTTCATGGCA
GACGGGCGATTTTGTTAAAGCCACTTGCGAATTTTGTGGCACTGAGAATTTGACTAAAGA
AGGTGCCACTACTTGTGGTTACTTACCCCAAAATGCTGTTGTTAAAATTTATTGTCCAGC
ATGTCACAATTCAGAAGTAGGACCTGAGCATAGTCTTGCCGAATACCATAATGAATCTGG
CTTGAAAACCATTCTTCGTAAGGGTGGTCGCACTATTGCCTTTGGAGGCTGTGTGTTCTC
TTATGTTGGTTGCCATAACAAGTGTGCCTATTGGGTTCCACGTGCTAGCGCTAACATAGG
TTGTAACCATACAGGTGTTGTTGGAGAAGGTTCCGAAGGTCTTAATGACAACCTTCTTGA
AATACTCCAAAAAGAGAAAGTCAACATCAATATTGTTGGTGACTTTAAACTTAATGAAGA
GATCGCCATTATTTTGGCATCTTTTTCTGCTTCCACAAGTGCTTTTGTGGAAACTGTGAA
AGGTTTGGATTATAAAGCATTCAAACAAATTGTTGAATCCTGTGGTAATTTTAAAGTTAC
AAAAGGAAAAGCTAAAAAAGGTGCCTGGAATATTGGTGAACAGAAATCAATACTGAGTCC
TCTTTATGCATTTGCATCAGAGGCTGCTCGTGTTGTACGATCAATTTTCTCCCGCACTCT
TGAAACTGCTCAAAATTCTGTGCGTGTTTTACAGAAGGCCGCTATAACAATACTAGATGG
AATTTCACAGTATTCACTGAGACTCATTGATGCTATGATGTTCACATCTGATTTGGCTAC
TAACAATCTAGTTGTAATGGCCTACATTACAGGTGGTGTTGTTCAGTTGACTTCGCAGTG
GCTAACTAACATCTTTGGCACTGTTTATGAAAAACTCAAACCCGTCCTTGATTGGCTTGA
AGAGAAGTTTAAGGAAGGTGTAGAGTTTCTTAGAGACGGTTGGGAAATTGTTAAATTTAT
CTCAACCTGTGCTTGTGAAATTGTCGGTGGACAAATTGTCACCTGTGCAAAGGAAATTAA
GGAGAGTGTTCAGACATTCTTTAAGCTTGTAAATAAATTTTTGGCTTTGTGTGCTGACTC
AGAAGGTGATTGTGAAGAAGAAGAGTTTGAGCCATCAACTCAATATGAGTATGGTACTGA
AGATGATTACCAAGGTAAACCTTTGGAATTTGGTGCCACTTCTGCTGCTCTTCAACCTGA
AGAAGAGCAAGAAGAAGATTGGTTAGATGATGATAGTCAACAAACTGTTGGTCAACAAGA
CGGCAGTGAGGACAATCAGACAACTACTATTCAAACAATTGTTGAGGTTCAACCTCAATT
AGAGATGGAACTTACACCAGTTGTTCAGACTATTGAAGTGAATAGTTTTAGTGGTTATTT
AAAACTTACTGACAATGTATACATTAAAAATGCAGACATTGTGGAAGAAGCTAAAAAGGT
AAAACCAACAGTGGTTGTTAATGCAGCCAATGTTTACCTTAAACATGGAGGAGGTGTTGC
AGGAGCCTTAAATAAGGCTACTAACAATGCCATGCAAGTTGAATCTGATGATTACATAGC
TACTAATGGACCACTTAAAGTGGGTGGTAGTTGTGTTTTAAGCGGACACAATCTTGCTAA
GCTTGATGGCTTTATGGGTAGAATTCGATCTGTCTATCCAGTTGCGTCACCAAATGAATG
CAACCAAATGTGCCTTTCAACTCTCATGAAGTGTGATCATTGTGGTGAAACTTCATGGCA
GACGGGCGATTTTGTTAAAGCCACTTGCGAATTTTGTGGCACTGAGAATTTGACTAAAGA
AGGTGCCACTACTTGTGGTTACTTACCCCAAAATGCTGTTGTTAAAATTTATTGTCCAGC
GCTTGATGGCTTTATGGGTAGAATTCGATCTGTCTATCCAGTTGCGTCACCAAATGAATG
CAACCAAATGTGCCTTTCAACTCTCATGAAGTGTGATCATTGTGGTGAAACTTCATGGCA
GACGGGCGATTTTGTTAAAGCCACTTGCGAATTTTGTGGCACTGAGAATTTGACTAAAGA
AGGTGCCACTACTTGTGGTTACTTACCCCAAAATGCTGTTGTTAAAATTTATTGTCCAGC
TTTAGTAGTGCTATCCCCATGTGATTTTAATAGCTTCNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNN
5 changes: 3 additions & 2 deletions sample-files/metadata.tsv
@@ -1,2 +1,3 @@
-studyId submitterSampleId submitterSpecimenId gender submitterDonorId host_age
-COVIDPR Qc-L00244359 CoV-2-001-SP Female CoV-2-001 80
+specimen collector sample ID sample collected by sequence submitted by sample collection date sample collection date precision geo_loc_name (country) geo_loc_name (state/province/territory) geo_loc_name (city) organism isolate purpose of sampling purpose of sampling details NML submitted specimen type anatomical material anatomical part body product environmental material environmental site collection device collection method host (scientific name) host disease host age host age unit host age bin host gender purpose of sequencing purpose of sequencing details sequencing date library ID sequencing instrument sequencing protocol name raw sequence data processing method dehosting method consensus sequence software name breadth of coverage value depth of coverage value consensus genome length Ns per 100 kbp reference genome accession lineage/clade name lineage/clade analysis software name lineage/clade analysis software version variant designation variant evidence
+Qc-L00210314 Not Provided Laboratoire de santé publique du Québec (LSPQ) 2021-03-01 month Canada Quebec Montreal Severe acute respiratory syndrome coronavirus 2 Canada/Qc-L00210314/2020 Cluster/Outbreak investigation Not Provided Not Applicable Not Provided Not Provided Not Provided Not Provided Not Provided Not Provided Not Provided Homo sapiens COVID-19 41 year 40 - 49 Female Baseline surveillance (random sampling) Not Provided 2020-05-01 exp_001 Oxford Nanopore 1D_DNA_MinION Trimmomatic v. 0.38 Nanostripper bcftools 95% 400x 38677 330 NC_045512.2 B.1.1.7 Pangolin 2.1.10 Not Applicable Sequencing
+Qc-L00212401 Not Provided Laboratoire de santé publique du Québec (LSPQ) 2021-03-04 month Canada Quebec Montreal Severe acute respiratory syndrome coronavirus 2 Canada/Qc-L00212401/2020 Cluster/Outbreak investigation Not Provided Not Applicable Not Provided Not Provided Not Provided Not Provided Not Provided Not Provided Not Provided Homo sapiens COVID-19 42 year 40 - 49 Male Baseline surveillance (random sampling) Not Provided 2020-08-01 exp_002 Oxford Nanopore 1D_DNA_MinION Trimmomatic v. 0.38 Nanostripper bcftools 98% 400x 28677 240 NC_045512.2 B.1.1.4 Pangolin 2.1.10 Not Provided Sequencing
@@ -7,15 +7,13 @@
 import com.fasterxml.jackson.databind.ObjectMapper;
 import com.fasterxml.jackson.databind.node.JsonNodeFactory;
 import com.fasterxml.jackson.databind.node.ObjectNode;
-import java.io.StringWriter;
 import java.util.*;
 import java.util.concurrent.ConcurrentHashMap;
 import java.util.function.BiFunction;
 import java.util.function.BinaryOperator;
 import lombok.*;
 import lombok.extern.slf4j.Slf4j;
-import org.apache.velocity.VelocityContext;
-import org.apache.velocity.app.Velocity;
+import org.apache.commons.text.StringSubstitutor;
 import org.cancogenvirusseq.muse.config.MuseAppConfig;
 import org.cancogenvirusseq.muse.exceptions.submission.PayloadFileMapperException;
 import org.cancogenvirusseq.muse.model.SubmissionFile;
@@ -95,11 +93,10 @@ private static ObjectNode fromJsonStr(String jsonStr) {

 private static String convertRecordToPayload(
 Map<String, String> valuesMap, String payloadTemplate) {
-val context = new VelocityContext();
-valuesMap.forEach(context::put);
-val writer = new StringWriter();
-Velocity.evaluate(context, writer, "", payloadTemplate);
-return writer.toString();
+val sub = new StringSubstitutor(valuesMap);
+// throw error if valuesMap is missing template values in payloadTemplate
+sub.setEnableUndefinedVariableException(true);
+return sub.replace(payloadTemplate);
 }

private static JsonNode createFilesObject(SubmissionFile submissionFile) {
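
The hunk above swaps Velocity for `StringSubstitutor` from Apache Commons Text and enables its undefined-variable check, so a template placeholder with no matching TSV column raises an error. The sketch below imitates that behaviour with the JDK only, so it runs without the Commons Text dependency. It is a stand-in, not the library's implementation, and the names `TemplateSketch` and `render` are hypothetical.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical JDK-only illustration of ${name} substitution with a
// fail-fast check for missing keys. This is NOT Commons Text itself.
class TemplateSketch {
  // Matches ${...} and captures the text between the braces as the key.
  private static final Pattern PLACEHOLDER = Pattern.compile("\\$\\{([^}]+)\\}");

  // Replaces every ${key} with its value; throws if the map lacks a key.
  static String render(String template, Map<String, String> values) {
    Matcher m = PLACEHOLDER.matcher(template);
    StringBuilder out = new StringBuilder();
    while (m.find()) {
      String key = m.group(1);
      if (!values.containsKey(key)) {
        throw new IllegalArgumentException("No value for template variable: " + key);
      }
      // quoteReplacement keeps '$' and '\' in the value from being interpreted.
      m.appendReplacement(out, Matcher.quoteReplacement(values.get(key)));
    }
    m.appendTail(out);
    return out.toString();
  }

  public static void main(String[] args) {
    // Keys may contain spaces, as the TSV headers in this commit do.
    String template = "{\"host_gender\": \"${host gender}\", \"host_age\": ${host age}}";
    System.out.println(render(template, Map.of("host gender", "Female", "host age", "41")));
    try {
      render(template, Map.of("host gender", "Female")); // "host age" is missing
    } catch (IllegalArgumentException e) {
      System.out.println("rejected: " + e.getMessage());
    }
  }
}
```

Failing on a missing key, rather than leaving the literal `${...}` in the output, presumably keeps a half-filled payload from being submitted downstream.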
@@ -73,7 +73,7 @@ private ParserValidResult parseAndValidate(String[] lines, String[] headers) {

 Map<String, String> record = new HashMap<>();
 for (int i = 0; i < headers.length; ++i) {
-record.put(headers[i], i > data.length ? "" : data[i]);
+record.put(headers[i], i >= data.length ? "" : data[i]);
 }

// collect field errors for record
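
The one-character change from `>` to `>=` matters when a row has fewer cells than there are headers. With `i > data.length`, the index `i == data.length` still reaches `data[i]`, which is out of bounds. A minimal sketch of the corrected loop follows. `RowPaddingSketch` and `toRecord` are hypothetical names, and the short row is an assumed input (for instance, a row whose trailing cells were empty).

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration of the bounds check fixed in this commit.
class RowPaddingSketch {
  // Maps each header to its cell, using "" when the row has fewer cells than headers.
  static Map<String, String> toRecord(String[] headers, String[] data) {
    Map<String, String> record = new LinkedHashMap<>();
    for (int i = 0; i < headers.length; ++i) {
      // i >= data.length covers i == data.length, which "i > data.length" missed.
      record.put(headers[i], i >= data.length ? "" : data[i]);
    }
    return record;
  }

  public static void main(String[] args) {
    String[] headers = {"organism", "isolate", "host age"};
    String[] shortRow = {"Severe acute respiratory syndrome coronavirus 2", "Canada/Qc-L00210314/2020"};
    // "host age" has no cell in shortRow, so it is stored as an empty string.
    System.out.println(toRecord(headers, shortRow));
  }
}
```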
@@ -30,7 +30,7 @@ public MuseAppConfig() {
 val mapper = new ObjectMapper();
 val fieldSchemas = mapper.readValue(url1, new TypeReference<List<TsvFieldSchema>>() {});

-URL url2 = Resources.getResource("payload-template.vm");
+URL url2 = Resources.getResource("payload-template");

 this.tsvFieldSchemas = ImmutableList.copyOf(fieldSchemas);
 this.expectedTsvHeaders =
@@ -78,6 +78,8 @@ public Flux<Tuple3<String, Upload, SubmissionFile>> toStreamOfPayloadUploadAndSu
 .originalFilePair(List.of(submissionFile.getFileName()))
 .build();

+log.debug(payload.toPrettyString());
+
 return Tuples.of(payload.toString(), upload, submissionFile);
 }));
 }
82 changes: 82 additions & 0 deletions src/main/resources/payload-template
@@ -0,0 +1,82 @@
{
"analysisType": {
"name": "consensus_sequence"
},
"studyId": "COVIDPR",
"samples": [
{
"submitterSampleId": "${specimen collector sample ID}",
"matchedNormalSubmitterSampleId": null,
"sampleType": "Total RNA",
"specimen": {
"submitterSpecimenId": "${specimen collector sample ID}",
"tumourNormalDesignation": "Normal",
"specimenTissueSource": "Other",
"specimenType": "Normal"
},
"donor": {
"submitterDonorId": "${specimen collector sample ID}",
"gender": "${host gender}"
}
}
],
"sample_collection": {
"sample_collected_by": "${sample collected by}",
"sequence_submitted_by": "${sequence submitted by}",
"sample_collection_date": "${sample collection date}",
"sample_collection_date_precision": "${sample collection date precision}",
"geo_loc_country": "${geo_loc_name (country)}",
"geo_loc_province": "${geo_loc_name (state/province/territory)}",
"geo_loc_city": "${geo_loc_name (city)}",
"organism": "${organism}",
"isolate": "${isolate}",
"purpose_of_sampling": "${purpose of sampling}",
"purpose_of_sampling_details": "${purpose of sampling details}",
"nml_submitted_specimen_type": "${NML submitted specimen type}",
"anatomical_material": "${anatomical material}",
"anatomical_part": "${anatomical part}",
"body_product": "${body product}",
"environmental_material": "${environmental material}",
"environmental_site": "${environmental site}",
"collection_device": "${collection device}",
"collection_method": "${collection method}"
},
"host": {
"host_scientific_name": "${host (scientific name)}",
"host_disease": "${host disease}",
"host_age": ${host age},
"host_age_unit": "${host age unit}",
"host_age_bin": "${host age bin}",
"host_gender": "${host gender}"
},
"experiment": {
"purpose_of_sequencing": "${purpose of sequencing}",
"purpose_of_sequencing_details": "${purpose of sequencing details}",
"sequencing_date": "${sequencing date}",
"library_id": "${library ID}",
"sequencing_instrument": "${sequencing instrument}",
"sequencing_protocol_name": "${sequencing protocol name}"
},
"sequence_analysis": {
"raw_sequence_data_processing_method": "${raw sequence data processing method}",
"dehosting_method": "${dehosting method}",
"consensus_sequence_software_name": "${consensus sequence software name}",
"consensus_sequence_software_version": null,
"metrics": {
"breadth_of_coverage": "${breadth of coverage value}",
"depth_of_coverage": "${depth of coverage value}",
"consensus_genome_length": ${consensus genome length},
"Ns_per_100kbp": ${Ns per 100 kbp}
},
"reference_genome_accession": "${reference genome accession}",
"bioinformatics_protocol": null
},
"lineage_analysis": {
"lineage_name": "${lineage/clade name}",
"lineage_analysis_software_name": "${lineage/clade analysis software name}",
"lineage_analysis_software_version": "${lineage/clade analysis software version}",
"variant_designation": "${variant designation}",
"variant_evidence": "${variant evidence}",
"variant_evidence_details": null
}
}
62 changes: 0 additions & 62 deletions src/main/resources/payload-template.vm

This file was deleted.

66 changes: 45 additions & 21 deletions src/main/resources/tsv-schema.json
@@ -1,23 +1,47 @@
 [
-{ "name": "studyId", "valueType": "string" },
-{
-"name": "submitterSampleId",
-"valueType": "string"
-},
-{
-"name": "submitterSpecimenId",
-"valueType": "string"
-},
-{
-"name": "gender",
-"valueType": "string"
-},
-{
-"name": "submitterDonorId",
-"valueType": "string"
-},
-{
-"name": "host_age",
-"valueType": "number"
-}
+{ "name": "specimen collector sample ID", "valueType": "string" },
+{ "name": "sample collected by", "valueType": "string" },
+{ "name": "sequence submitted by", "valueType": "string" },
+{ "name": "sample collection date", "valueType": "string" },
+{ "name": "sample collection date precision", "valueType": "string" },
+{ "name": "geo_loc_name (country)", "valueType": "string" },
+{ "name": "geo_loc_name (state/province/territory)", "valueType": "string" },
+{ "name": "geo_loc_name (city)", "valueType": "string" },
+{ "name": "organism", "valueType": "string" },
+{ "name": "isolate", "valueType": "string" },
+{ "name": "purpose of sampling", "valueType": "string" },
+{ "name": "purpose of sampling details", "valueType": "string" },
+{ "name": "NML submitted specimen type", "valueType": "string" },
+{ "name": "anatomical material", "valueType": "string" },
+{ "name": "anatomical part", "valueType": "string" },
+{ "name": "body product", "valueType": "string" },
+{ "name": "environmental material", "valueType": "string" },
+{ "name": "environmental site", "valueType": "string" },
+{ "name": "collection device", "valueType": "string" },
+{ "name": "collection method", "valueType": "string" },
+{ "name": "host (scientific name)", "valueType": "string" },
+{ "name": "host disease", "valueType": "string" },
+{ "name": "host age", "valueType": "number" },
+{ "name": "host age unit", "valueType": "string" },
+{ "name": "host age bin", "valueType": "string" },
+{ "name": "host gender", "valueType": "string" },
+{ "name": "purpose of sequencing", "valueType": "string" },
+{ "name": "purpose of sequencing details", "valueType": "string" },
+{ "name": "sequencing date", "valueType": "string" },
+{ "name": "library ID", "valueType": "string" },
+{ "name": "sequencing instrument", "valueType": "string" },
+{ "name": "sequencing protocol name", "valueType": "string" },
+{ "name": "raw sequence data processing method", "valueType": "string" },
+{ "name": "dehosting method", "valueType": "string" },
+{ "name": "consensus sequence software name", "valueType": "string" },
+{ "name": "breadth of coverage value", "valueType": "string" },
+{ "name": "depth of coverage value", "valueType": "string" },
+{ "name": "consensus genome length", "valueType": "number" },
+{ "name": "Ns per 100 kbp", "valueType": "number" },
+{ "name": "reference genome accession", "valueType": "string" },
+{ "name": "lineage/clade name", "valueType": "string" },
+{ "name": "lineage/clade analysis software name", "valueType": "string" },
+{ "name": "lineage/clade analysis software version", "valueType": "string" },
+{ "name": "variant designation", "valueType": "string" },
+{ "name": "variant evidence", "valueType": "string" }
]
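
The schema marks three fields as `number` (`host age`, `consensus genome length`, `Ns per 100 kbp`), and these are the same fields that `payload-template` inserts without surrounding quotes. A non-numeric value in one of them would therefore produce invalid JSON. The sketch below shows one way such a type check could look. It is an assumption about the general approach, not the repository's validator, and `SchemaCheckSketch`, `fieldErrors`, and `isNumber` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical check of TSV values against the valueType declared in tsv-schema.json.
class SchemaCheckSketch {
  // schema maps field name -> "string" or "number"; record maps field name -> raw cell text.
  // Returns one message per field whose value does not fit its declared valueType.
  static List<String> fieldErrors(Map<String, String> schema, Map<String, String> record) {
    List<String> errors = new ArrayList<>();
    for (Map.Entry<String, String> field : schema.entrySet()) {
      String value = record.getOrDefault(field.getKey(), "");
      if ("number".equals(field.getValue()) && !isNumber(value)) {
        errors.add(field.getKey() + ": expected number, got '" + value + "'");
      }
    }
    return errors;
  }

  // True when the text parses as a double; empty or free-text cells do not.
  static boolean isNumber(String value) {
    try {
      Double.parseDouble(value.trim());
      return true;
    } catch (NumberFormatException e) {
      return false;
    }
  }

  public static void main(String[] args) {
    Map<String, String> schema = new LinkedHashMap<>();
    schema.put("host age", "number");
    schema.put("host gender", "string");

    Map<String, String> record = new LinkedHashMap<>();
    record.put("host age", "Not Provided"); // free text in a numeric field
    record.put("host gender", "Female");

    fieldErrors(schema, record).forEach(System.out::println);
  }
}
```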
