An implementation of DNABERT using PyTorch and the deepbio-toolkit library.
- Install the dbtk-dnabert package

```bash
pip install dbtk-dnabert
```

- Pull the pre-trained DNABERT model

```python
from dnabert import DnaBert

# Load the pre-trained model
model = DnaBert.from_pretrained("SirDavidLudwig/dnabert", revision="64d-silva16s-250bp")
```

- Embed DNA sequences

```python
import torch

# Sequences to embed
sequences = [
"ACTGAATGAGAC",
"TTGAGTAGCCAA"
]
# Tokenize sequences
sequence_tokens = torch.tensor([model.tokenizer(sequence) for sequence in sequences])
# Embed sequences
output = model(sequence_tokens)
# Sequence-level embeddings from class token
embeddings = output["class"]
# Sequence-level embeddings from averaged tokens
embeddings = output["tokens"].mean(dim=1)
```
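
The sequence-level embeddings are ordinary tensors and can be used directly for downstream comparisons. A minimal sketch, assuming `embeddings` is a `(num_sequences, embedding_dim)` float tensor as produced above:

```python
import torch.nn.functional as F

# Compare the two example sequences via their embeddings
# (assumes `embeddings` is a (num_sequences, embedding_dim) float tensor)
similarity = F.cosine_similarity(embeddings[0], embeddings[1], dim=0)
print(f"Cosine similarity: {similarity.item():.3f}")
```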
Available pre-trained models:

| Model Name | Embedding Dim. | Maximum Length | Pre-training Dataset |
|---|---|---|---|
| 64d-silva16s-250bp | 64 | 250bp | Silva 16S |
| 768d-silva16s-250bp | 768 | 250bp | Silva 16S |
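
The model names in the table correspond to the `revision` argument shown earlier, so a different variant can be pulled by changing that string; for example, the 768-dimensional model:

```python
from dnabert import DnaBert

# Pull the 768-dimensional variant by its revision tag
model = DnaBert.from_pretrained("SirDavidLudwig/dnabert", revision="768d-silva16s-250bp")
```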
Template model configurations can be generated using the `dbtk model config` command.
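
A hypothetical invocation, assuming the command prints the template YAML to stdout (the actual CLI arguments may differ):

```bash
# Capture a template model config for later editing
# (assumes the command writes YAML to stdout; the real interface may differ)
dbtk model config > ./configs/models/my_dnabert.yaml
```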
The model can be pre-trained using the supplied configurations with the command:
```bash
dbtk model fit \
    -c ./configs/datamodules/pretrain_silva_16s_250bp.yaml \
    -c ./configs/models/pretrain_dnabert_768d_250bp.yaml \
    -c ./configs/trainers/pretrainer.yaml \
    ./logs/dnabert_768d_250bp
```

The trained model can be exported to a Hugging Face model with the following command:

```bash
dbtk model export ./logs/dnabert_768d_250bp/last.ckpt ./exports/dnabert_768d_250bp
```
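
Assuming the export directory follows the standard Hugging Face layout, the exported model can then be loaded locally the same way as the hosted checkpoint (a sketch; `from_pretrained` accepting a local path is an assumption):

```python
from dnabert import DnaBert

# Load the exported model from the local export directory
# (assumes from_pretrained accepts a local path, as Hugging Face-style APIs typically do)
model = DnaBert.from_pretrained("./exports/dnabert_768d_250bp")
```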