Merge pull request #8 from broadinstitute/multiple-inputs
Support multiple inputs
mtomko authored Dec 13, 2022
2 parents 7e51712 + 86d1b41 commit 5a39702
Showing 22 changed files with 8,286 additions and 62 deletions.
2 changes: 1 addition & 1 deletion .github/workflows/ci.yml
@@ -16,7 +16,7 @@ jobs:
- uses: actions/[email protected]
- uses: jrouly/scalafmt-native-action@v2
with:
version: '3.3.2'
version: '3.5.8'
- name: Set up JDK 8
uses: actions/[email protected]
with:
2 changes: 1 addition & 1 deletion .github/workflows/release.yml
@@ -15,7 +15,7 @@ jobs:
- run: git config --global user.name "GPP Informatics"
- uses: jrouly/scalafmt-native-action@v2
with:
version: '3.3.2'
version: '3.5.8'
- name: Set up JDK 8
uses: actions/[email protected]
with:
2 changes: 1 addition & 1 deletion .scalafmt.conf
@@ -1,5 +1,5 @@
# PoolQ3 .scalafmt configuration
version=3.3.2
version=3.5.8
style = IntelliJ

maxColumn = 120
3 changes: 3 additions & 0 deletions CHANGELOG.md
@@ -1,5 +1,8 @@
# Changelog

## 3.6.0
* Support reading multiple reads files sequentially as part of a single run

## 3.5.0
Released as open source under a BSD 3-Clause license

17 changes: 13 additions & 4 deletions docs/MANUAL.md
@@ -1,7 +1,7 @@
# PoolQ
PoolQ is a counter for indexed samples from next-gen sequencing of pooled DNA.

*This documentation covers PoolQ version 3.5.0 (last updated 11/22/2022).*
*This documentation covers PoolQ version 3.6.0 (last updated 12/05/2022).*

## Background
The Broad Institute Genetic Perturbation Platform (GPP) uses Illumina sequencing to tally the
@@ -141,6 +141,15 @@ case, the file with shorter reads is assumed to contain the column barcodes, whi
longer reads is assumed to contain the row barcodes. This mode of operation requires that FASTQ
record IDs match between the two input files.

Certain sequencing technologies produce multiple reads files for a single sequencing run. Earlier
versions of PoolQ could process such files only after they were concatenated into a single file.
PoolQ can now read multiple files one after another, accumulating and reporting read counts for the
files in aggregate. To process multiple files at the command line, provide the input files as a
comma-separated list; PoolQ processes the files in the order they are given. This option also works
with split reads (where row and column barcodes are split between files), but you must provide the
same number of files of each type, listed in corresponding order.

### Reference Files
Reference files map DNA barcodes to their associated identifiers. PoolQ uses two reference files to
define the rows and columns found in the counts file.
@@ -481,7 +490,7 @@ PoolQ you will need a Java 8 JDK. You can download an appropriate JRE or JDK fro
You can download PoolQ from an as yet undetermined location. The file you download is a ZIP file
that you will need to unzip. In most cases, this is as simple as right-clicking on the zip file, and
selecting something like "extract contents" from the popup menu. This will create a new folder on
your computer named `poolq-3.5.0`, with the following contents:
your computer named `poolq-3.6.0`, with the following contents:

* `poolq3.jar`
* `poolq3.bat`
@@ -531,7 +540,7 @@ You can run PoolQ from any Windows, Mac, or Linux machine, but it requires some
how to launch programs from the command line on your given operating system.

1. Open a terminal window for your operating system
2. Change directories to the `poolq-3.5.0` directory
2. Change directories to the `poolq-3.6.0` directory
* On Windows, run:

> `poolq3.bat`
@@ -547,7 +556,7 @@ how to launch programs from the command line on your given operating system.
If you successfully launched PoolQ, you should see a usage message explaining all of the
command-line options:

poolq 3.5.0
poolq 3.6.0
Usage: poolq [options]

--row-reference <file> reference file for row barcodes (i.e., constructs)
89 changes: 65 additions & 24 deletions src/main/scala/org/broadinstitute/gpp/poolq3/PoolQConfig.scala
@@ -10,10 +10,11 @@ import java.nio.file.{Files, Path, Paths}

import scala.collection.mutable

import cats.data.{NonEmptyList => Nel}
import cats.syntax.all._
import org.broadinstitute.gpp.poolq3.PoolQConfig.DefaultPath
import org.broadinstitute.gpp.poolq3.reports.{GctDialect, PoolQ2Dialect, PoolQ3Dialect, ReportsDialect}
import org.broadinstitute.gpp.poolq3.types.ReadIdCheckPolicy
import org.broadinstitute.gpp.poolq3.types.{PoolQException, ReadIdCheckPolicy}
import scopt.{OptionParser, Read}

final case class PoolQInput(
@@ -25,8 +26,31 @@ final case class PoolQInput(
reverseRowReads: Option[Path] = None,
colReads: Option[Path] = None,
reads: Option[Path] = None,
readIdCheckPolicy: ReadIdCheckPolicy = ReadIdCheckPolicy.Strict
)
readIdCheckPolicy: ReadIdCheckPolicy = ReadIdCheckPolicy.Strict,
// These are companions to rowReads, reverseRowReads, colReads, and reads; they are added as
// separate fields so that existing code constructing PoolQInput still compiles (source compatibility).
addlRowReads: List[Path] = Nil,
addlReverseRowReads: List[Path] = Nil,
addlColReads: List[Path] = Nil,
addlReads: List[Path] = Nil
) {

def readsSourceE: Either[Exception, ReadsSource] = (rowReads, reverseRowReads, colReads, reads) match {
case (None, None, None, Some(r)) => Right(ReadsSource.SelfContained(Nel(r, addlReads)))
case (Some(rr), None, Some(cr), None) =>
val rs = ReadsSource.Split(Nel(cr, addlColReads), Nel(rr, addlRowReads))
if (rs.forward.length == rs.index.length) Right(rs)
else Left(PoolQException("Number of row and column reads files must match"))
case (Some(rr), Some(rrr), Some(cr), None) =>
val rs = ReadsSource.PairedEnd(Nel(cr, addlColReads), Nel(rr, addlRowReads), Nel(rrr, addlReverseRowReads))
if (rs.forward.length == rs.index.length && rs.forward.length == rs.reverse.length) Right(rs)
else Left(PoolQException("Number of row, column, and reverse reads files must match"))
case _ => Left(PoolQException("Conflicting input options"))
}

def readsSource: ReadsSource = readsSourceE.fold(e => throw e, rs => rs)

}
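The new `addl*` fields are appended after `readIdCheckPolicy`, each defaulting to `Nil`, so existing calls such as `PoolQInput(reads = Some(p))` keep compiling, while `readsSourceE` and `readsSource` offer a validating view and a throwing view of the same inputs. Below is a simplified, dependency-free sketch of that pattern; `InputSketch` and `InputException` are hypothetical names, and a plain `List[String]` stands in for `Nel[Path]`.

```scala
// Hypothetical exception type standing in for PoolQException.
final case class InputException(msg: String) extends RuntimeException(msg)

// Hypothetical, simplified stand-in for PoolQInput: the original Option field keeps its
// position, and the additional-files list is appended with a default so old call sites compile.
final case class InputSketch(reads: Option[String] = None, addlReads: List[String] = Nil) {

  // Validating view: Left if no first reads file was supplied.
  def readsE: Either[Exception, List[String]] = reads match {
    case Some(r) => Right(r :: addlReads)
    case None    => Left(InputException("No reads file provided"))
  }

  // Throwing view for callers that have already validated the configuration.
  def readsList: List[String] = readsE.fold(e => throw e, rs => rs)
}
```

Because `addlReads` has a default, `InputSketch(reads = Some("a.fastq"))` compiles exactly as it would have before the field existed.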

final case class PoolQOutput(
countsFile: Path = Paths.get("counts.txt"),
@@ -61,7 +85,12 @@ final case class PoolQConfig(
noopConsumer: Boolean = false
) {

def isPairedEnd = input.reverseRowReads.isDefined && reverseRowBarcodePolicyStr.isDefined
def isPairedEnd =
reverseRowBarcodePolicyStr.isDefined &&
(input.readsSourceE match {
case Right(ReadsSource.PairedEnd(_, _, _)) => true
case _ => false
})

}

@@ -71,6 +100,13 @@ object PoolQConfig {

implicit private[this] val readPath: Read[Path] = implicitly[Read[File]].map(_.toPath)

implicit private[this] val readPaths: Read[(Path, List[Path])] = implicitly[Read[Seq[File]]].map { files =>
files.toList.map(_.toPath) match {
case Nil => throw new IllegalArgumentException("No argument provided")
case (x :: xs) => (x, xs)
}
}
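`readPaths` builds on scopt's `Read[Seq[File]]`, which splits an option value on commas, and reshapes the result into a first path plus the remaining paths so that an empty list is rejected. The following dependency-free sketch shows the same reshaping; `splitFiles` is a hypothetical helper, and unlike the scopt-based reader it also trims whitespace and drops empty entries.

```scala
object SplitFilesSketch {
  // Hypothetical helper: split a comma-separated option value into a first file plus any
  // additional files, preserving the order in which they were given.
  def splitFiles(arg: String): Either[String, (String, List[String])] =
    arg.split(",").toList.map(_.trim).filter(_.nonEmpty) match {
      case Nil     => Left("No files provided")
      case x :: xs => Right((x, xs))
    }
}
```

Returning a `(head, tail)` pair rather than a bare list mirrors how the commit keeps the old `Option[Path]` field for the first file and stores the rest in the new `addl*` lists.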

implicit private[this] val readReadIdCheckPolicy: Read[ReadIdCheckPolicy] =
implicitly[Read[String]].map(ReadIdCheckPolicy.forName)

@@ -106,29 +142,31 @@ object PoolQConfig {
c.copy(input = c.input.copy(globalReference = Some(f.toPath)))
}

opt[Path]("row-reads")
.valueName("<file>")
.action((f, c) => c.copy(input = c.input.copy(rowReads = Some(f))))
opt[(Path, List[Path])]("row-reads")
.valueName("<files>")
.action { case ((p, ps), c) => c.copy(input = c.input.copy(rowReads = Some(p), addlRowReads = ps)) }
.text("required if reads are split between two files")
.validate(existsAndIsReadable)
.validate { case (p, ps) => (p :: ps).traverse_(existsAndIsReadable) }

opt[Path]("rev-row-reads")
.valueName("<file>")
.action((f, c) => c.copy(input = c.input.copy(reverseRowReads = Some(f))))
opt[(Path, List[Path])]("rev-row-reads")
.valueName("<files>")
.action { case ((p, ps), c) =>
c.copy(input = c.input.copy(reverseRowReads = Some(p), addlReverseRowReads = ps))
}
.text("required for processing paired-end sequencing data")
.validate(existsAndIsReadable)
.validate { case (p, ps) => (p :: ps).traverse_(existsAndIsReadable) }

opt[Path]("col-reads")
.valueName("<file>")
.action((f, c) => c.copy(input = c.input.copy(colReads = Some(f))))
opt[(Path, List[Path])]("col-reads")
.valueName("<files>")
.action { case ((p, ps), c) => c.copy(input = c.input.copy(colReads = Some(p), addlColReads = ps)) }
.text("required if reads are split between two files")
.validate(existsAndIsReadable)
.validate { case (p, ps) => (p :: ps).traverse_(existsAndIsReadable) }

opt[Path]("reads")
.valueName("<file>")
.action((f, c) => c.copy(input = c.input.copy(reads = Some(f))))
opt[(Path, List[Path])]("reads")
.valueName("<files>")
.action { case ((p, ps), c) => c.copy(input = c.input.copy(reads = Some(p), addlReads = ps)) }
.text("required if reads are contained in a single file")
.validate(existsAndIsReadable)
.validate { case (p, ps) => (p :: ps).traverse_(existsAndIsReadable) }
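Each reads option now validates every path it receives with `(p :: ps).traverse_(existsAndIsReadable)`. With cats, `traverse_` over an `Either`-returning check yields the first `Left` encountered, or `Right(())` if every check passes. The sketch below reproduces that behavior with only the standard library; this `existsAndIsReadable` is a hypothetical stand-in and may not match PoolQ's own helper exactly.

```scala
import java.nio.file.{Files, Path}

object ValidateAllSketch {
  // Hypothetical check: Right(()) if the path exists and is readable, Left(message) otherwise.
  def existsAndIsReadable(p: Path): Either[String, Unit] =
    if (Files.isReadable(p)) Right(()) else Left(s"$p does not exist or is not readable")

  // Check every path in order and report the first failure, like cats' traverse_ on Either.
  def validateAll(paths: List[Path]): Either[String, Unit] =
    paths.foldLeft[Either[String, Unit]](Right(())) { (acc, p) =>
      acc.flatMap(_ => existsAndIsReadable(p))
    }
}
```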

opt[ReadIdCheckPolicy]("read-id-check-policy")
.valueName("<policy>")
@@ -272,16 +310,19 @@ object PoolQConfig {
// umi
val umiInfo = (config.input.umiReference, config.umiBarcodePolicyStr).tupled

def files(name: String, path: Option[Path], addl: List[Path]): Option[(String, String)] =
path.map(file => (name, (file :: addl).map(_.getFileName.toString).mkString(",")))

// input files
val input = config.input
args += (("row-reference", input.rowReference.getFileName.toString))
args += (("col-reference", input.colReference.getFileName.toString))
umiInfo.map(_._1).foreach(file => args += (("umi-reference", file.getFileName.toString)))
input.globalReference.foreach(file => args += (("global-reference", file.getFileName.toString)))
input.rowReads.foreach(file => args += (("row-reads", file.getFileName.toString)))
input.reverseRowReads.foreach(file => args += (("rev-row-reads", file.getFileName.toString)))
input.colReads.foreach(file => args += (("col-reads", file.getFileName.toString)))
input.reads.foreach(file => args += (("reads", file.getFileName.toString)))
files("row-reads", input.rowReads, input.addlRowReads).foreach(t => args += t)
files("rev-row-reads", input.reverseRowReads, input.addlReverseRowReads).foreach(t => args += t)
files("col-reads", input.colReads, input.addlColReads).foreach(t => args += t)
files("reads", input.reads, input.addlReads).foreach(t => args += t)
args += (("read-id-check-policy", input.readIdCheckPolicy.name))
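When the configuration is collected as name/value argument pairs, the new `files` helper joins the first path and any additional paths into a single comma-separated value, keeping only each file's name. Here is a standalone copy of that logic for experimentation; `FilesArgSketch` is only a hypothetical wrapper object.

```scala
import java.nio.file.Path

object FilesArgSketch {
  // Same logic as the `files` helper: combine the first and additional paths into one
  // comma-separated value made of file names only (directories are dropped).
  def files(name: String, path: Option[Path], addl: List[Path]): Option[(String, String)] =
    path.map(file => (name, (file :: addl).map(_.getFileName.toString).mkString(",")))
}
```

For example, `/data/a.fastq` plus `/data/b.fastq` under the name `reads` yields `Some(("reads", "a.fastq,b.fastq"))`, and an absent first path yields `None`, so that option is simply omitted.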

// run control
20 changes: 20 additions & 0 deletions src/main/scala/org/broadinstitute/gpp/poolq3/ReadsSource.scala
@@ -0,0 +1,20 @@
/*
* Copyright (c) 2022 The Broad Institute, Inc. All rights reserved.
*
* SPDX-License-Identifier: BSD-3-Clause
*/
package org.broadinstitute.gpp.poolq3

import java.nio.file.Path

import cats.data.{NonEmptyList => Nel}

sealed trait ReadsSource extends Product with Serializable

object ReadsSource {

final case class SelfContained(paths: Nel[Path]) extends ReadsSource
final case class Split(index: Nel[Path], forward: Nel[Path]) extends ReadsSource
final case class PairedEnd(index: Nel[Path], forward: Nel[Path], reverse: Nel[Path]) extends ReadsSource

}
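`ReadsSource` makes the three supported input layouts explicit, so consumers such as the barcode source can pattern match on one sealed type instead of a tuple of four `Option`s. In the split and paired-end layouts, the files of each kind are expected to correspond by position, which is why `readsSourceE` requires the lists to have equal lengths. The sketch below is a hypothetical, dependency-free mirror of the ADT (plain `List[String]` in place of `Nel[Path]`) with a helper that lines the files up by position.

```scala
sealed trait SourceSketch extends Product with Serializable

object SourceSketch {
  final case class SelfContained(paths: List[String]) extends SourceSketch
  final case class Split(index: List[String], forward: List[String]) extends SourceSketch
  final case class PairedEnd(index: List[String], forward: List[String], reverse: List[String]) extends SourceSketch

  // Hypothetical helper: group files that correspond by position. The match is checked for
  // exhaustiveness because the trait is sealed. Note that zip silently truncates lists of
  // unequal length, which is why the lengths must be validated up front.
  def aligned(source: SourceSketch): List[List[String]] = source match {
    case SelfContained(paths)  => paths.map(List(_))
    case Split(index, forward) => index.zip(forward).map { case (i, f) => List(i, f) }
    case PairedEnd(index, forward, reverse) =>
      index.zip(forward).zip(reverse).map { case ((i, f), r) => List(i, f, r) }
  }
}
```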
85 changes: 58 additions & 27 deletions src/main/scala/org/broadinstitute/gpp/poolq3/barcode/package.scala
@@ -7,7 +7,10 @@ package org.broadinstitute.gpp.poolq3

import java.nio.file.Path

import org.broadinstitute.gpp.poolq3.parser.{CloseableIterable, FastqParser, SamParser, TextParser}
import scala.collection.mutable

import org.broadinstitute.gpp.poolq3.ReadsSource
import org.broadinstitute.gpp.poolq3.parser.{CloseableIterable, CloseableIterator, FastqParser, SamParser, TextParser}
import org.broadinstitute.gpp.poolq3.types.{BamType, FastqType, Read, ReadsFileType, SamType, TextType}

package object barcode {
@@ -19,39 +22,31 @@ package object barcode {
colBarcodePolicy: BarcodePolicy,
umiBarcodePolicyOpt: Option[BarcodePolicy]
): CloseableIterable[Barcodes] =
(config.rowReads, config.reverseRowReads, config.colReads, config.reads) match {
case (Some(row), None, Some(col), _) =>
(config.readsSource, revRowBarcodePolicyOpt) match {
case (ReadsSource.Split(index, forward), None) =>
new TwoFileBarcodeSource(
parserFor(row),
parserFor(col),
parserFor(forward.toList),
parserFor(index.toList),
rowBarcodePolicy,
colBarcodePolicy,
umiBarcodePolicyOpt,
config.readIdCheckPolicy
)

case (Some(row), Some(revRow), Some(col), _) =>
revRowBarcodePolicyOpt match {
case None =>
throw new IllegalArgumentException("Paired end sequencing mode requires a reverse barcode policy")
case Some(revRowBarcodePolicy) =>
new ThreeFileBarcodeSource(
parserFor(row),
parserFor(revRow),
parserFor(col),
rowBarcodePolicy,
revRowBarcodePolicy,
colBarcodePolicy,
umiBarcodePolicyOpt,
config.readIdCheckPolicy
)
}

case (None, None, None, Some(reads)) =>
new SingleFileBarcodeSource(parserFor(reads), rowBarcodePolicy, colBarcodePolicy, umiBarcodePolicyOpt)

case (ReadsSource.PairedEnd(index, forward, reverse), Some(revRowBarcodePolicy)) =>
new ThreeFileBarcodeSource(
parserFor(forward.toList),
parserFor(reverse.toList),
parserFor(index.toList),
rowBarcodePolicy,
revRowBarcodePolicy,
colBarcodePolicy,
umiBarcodePolicyOpt,
config.readIdCheckPolicy
)
case (ReadsSource.SelfContained(paths), None) =>
new SingleFileBarcodeSource(parserFor(paths.toList), rowBarcodePolicy, colBarcodePolicy, umiBarcodePolicyOpt)
case _ =>
throw new IllegalArgumentException("Either reads or row and column reads files must be specified")
throw new IllegalArgumentException("Incompatible reads and barcode policy settings")
}

def parserFor(file: Path): CloseableIterable[Read] =
@@ -62,4 +57,40 @@ package object barcode {
case None => throw new IllegalArgumentException(s"File $file is of an unknown file type")
}

def parserFor(files: List[Path]): CloseableIterable[Read] =
parserFor[Path, Read](files, p => parserFor(p).iterator)

private[barcode] def parserFor[A, B](sources: List[A], mkIterator: A => CloseableIterator[B]): CloseableIterable[B] =
new CloseableIterable[B] {

override def iterator: CloseableIterator[B] = new CloseableIterator[B] {

private val queue: mutable.Queue[A] = mutable.Queue.from(sources)

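// the iterator over the file currently being read, if any
private var current: CloseableIterator[B] = _

// advance through the queue, closing each exhausted iterator once the next one is open
override def hasNext: Boolean = {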
var currentHasNext = if (current == null) false else current.hasNext
while (!currentHasNext && queue.nonEmpty) {
val head = queue.dequeue()
if (head != null) {
val old = current
current = mkIterator(head)
if (old != null) {
old.close()
}
currentHasNext = current.hasNext
}
}
currentHasNext
}

// call hasNext first so that next() also advances past exhausted files
override def next(): B = if (!hasNext) throw new NoSuchElementException else current.next()

override def close(): Unit = Option(current).foreach(_.close())

}

}

}
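The new `parserFor(files: List[Path])` overload presents several files as a single `CloseableIterable`: each file is opened only when the previous one is exhausted, and the exhausted iterator is closed. The following sketch restates that technique using only the standard library; `ClosingIterator` is a hypothetical stand-in for PoolQ's `CloseableIterator`, and this version closes the exhausted iterator just before opening the next one, whereas the commit closes it just after.

```scala
import scala.collection.mutable

// Hypothetical minimal stand-in for PoolQ's CloseableIterator.
trait ClosingIterator[A] extends Iterator[A] with AutoCloseable

object ConcatSketch {
  // Lazily chain iterators: each source is opened only when the previous one is exhausted,
  // and the exhausted iterator is closed before the next one is opened.
  def concat[A, B](sources: List[A], open: A => ClosingIterator[B]): ClosingIterator[B] =
    new ClosingIterator[B] {
      private val pending = mutable.Queue(sources: _*)
      private var current: Option[ClosingIterator[B]] = None

      override def hasNext: Boolean = {
        while (!current.exists(_.hasNext) && pending.nonEmpty) {
          current.foreach(_.close())
          current = Some(open(pending.dequeue()))
        }
        current.exists(_.hasNext)
      }

      // Delegating through hasNext keeps next() correct even if the caller skips hasNext.
      override def next(): B =
        if (hasNext) current.get.next() else throw new NoSuchElementException("no more elements")

      // Only the last opened iterator can still be open at this point.
      override def close(): Unit = current.foreach(_.close())
    }
}
```

Opening sources lazily keeps at most one file handle open at a time, which matters when a run is split across many large reads files.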
Original file line number Diff line number Diff line change
@@ -82,7 +82,7 @@ final class ScoringConsumer(
(parsedBarcode.row, parsedBarcode.revRow, parsedBarcode.col) match {
case (f @ Some(_), revRowOpt, None) =>
// a forward row barcode region was found; extract the sequence and update stats
updateRowBarcodePositionStats(f, (if (pairedEndMode) revRowOpt else None))
updateRowBarcodePositionStats(f, if (pairedEndMode) revRowOpt else None)

case (f @ Some(parsedRow), None, Some(parsedCol)) =>
updateRowBarcodePositionStats(f, None)
Original file line number Diff line number Diff line change
@@ -4,6 +4,7 @@
* SPDX-License-Identifier: BSD-3-Clause
*/
package org.broadinstitute.gpp.poolq3

import org.broadinstitute.gpp.poolq3.parser.ReferenceEntry

/** Provides classes implementing reference databases as well as utility functions used by the various reference
Original file line number Diff line number Diff line change
@@ -0,0 +1,9 @@
/*
* Copyright (c) 2022 The Broad Institute, Inc. All rights reserved.
*
* SPDX-License-Identifier: BSD-3-Clause
*/
package org.broadinstitute.gpp.poolq3.types

/** General exception raised by PoolQ */
final case class PoolQException(msg: String) extends RuntimeException(msg)