PrimeSpecPCR: Species-Specific Primer Design Toolkit

Introduction

PrimeSpecPCR is a tool for designing species-specific oligonucleotides for quantitative PCR (qPCR) applications in microbiology, environmental science, food safety, and clinical diagnostics. It automates the entire workflow from retrieving genetic sequences from public databases to generating highly specific primer-probe sets optimized for target organisms. The toolkit has been validated through laboratory testing, confirming that computationally designed primers perform successfully under real PCR conditions.

The toolkit addresses a critical challenge in molecular diagnostics: developing oligonucleotides that can reliably discriminate between closely related species. PrimeSpecPCR achieves this through a rigorous, multi-stage approach that systematically identifies conserved regions within a target species while simultaneously evaluating cross-reactivity with non-target organisms.

Features

Automated Sequence Retrieval: Direct access to NCBI GenBank sequences using taxonomy IDs
Gene Feature Detection: Automatic identification and categorization of genes from retrieved sequences
Advanced Multiple Sequence Alignment: Integration with MAFFT for high-quality alignments with customizable parameters
Consensus Sequence Generation: Creation of consensus sequences with adjustable thresholds
Optimized Primer and Probe Design: Implementation using the Primer3-py library for thermodynamic assessment, structural evaluation, and generation of qPCR-optimized primer-probe sets
Specificity Testing: In silico validation against GenBank to identify potential cross-reactivity
Species-Specificity Verification: Taxonomic assessment of primer matches to ensure target specificity
Interactive Results Visualization: HTML dashboards for exploring primer specificity profiles
User-Friendly GUI Interface: Step-by-step workflow guidance with contextual help
Flexible Configuration: Customizable parameters for all analysis stages
Logging: Detailed tracking of each analysis step for reproducibility
Laboratory Validated: Experimentally verified primer performance through PCR amplification and Sanger sequencing

System Requirements

Operating Systems:
- Linux (Ubuntu 20.04+, Debian 10+, Fedora 32+)
- macOS (10.15 Catalina or newer)
- Windows support is experimental and not fully tested
Software Dependencies:
- Python 3.8 or higher
- MAFFT (v7.450 or higher) for sequence alignment
- Internet connection for accessing NCBI databases
Python Package Dependencies:
- biopython (≥1.79)
- primer3-py (≥0.6.1)
- pandas (≥1.3.0)
- numpy (≥1.20.0)
- tqdm (≥4.61.0)
- validators (≥0.18.2)
- requests (≥2.25.0)
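- tkinter (for the GUI; part of the Python standard library, but on Linux it is installed through the system package manager, e.g. python3-tk)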
NCBI Credentials:
- Valid email address
- NCBI API key (generated in your NCBI account settings; background: https://ncbiinsights.ncbi.nlm.nih.gov/2017/11/02/new-api-keys-for-the-e-utilities/)

Installation

Linux Installation

Install system dependencies:

sudo apt update
sudo apt install python3 python3-pip python3-tk mafft

Clone the repository:

git clone https://github.com/Adv20202/PrimeSpecPCR.git
cd PrimeSpecPCR

Install Python dependencies:
```
pip3 install -r requirements.txt
```
Run the application:
```
python3 run.py
```

Note: For Fedora/RHEL-based systems, use dnf instead of apt and python3-tkinter instead of python3-tk.

macOS Installation

Install Homebrew (if not already installed):

/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"

Install system dependencies:
```
brew install python3 mafft
```

Clone the repository:

git clone https://github.com/Adv20202/PrimeSpecPCR.git
cd PrimeSpecPCR

Install Python dependencies:
```
pip3 install -r requirements.txt
```
Run the application:
```
python3 run.py
```

Installing from Source

For users who prefer to build from source or need to customize the installation:

Clone the repository:

git clone https://github.com/Adv20202/PrimeSpecPCR.git
cd PrimeSpecPCR

Create a virtual environment (recommended):

python3 -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

Install dependencies:
```
pip install -r requirements.txt
```
Install MAFFT if not already installed:
- Linux: sudo apt install mafft
- macOS: brew install mafft
Run the installation script to check all dependencies:
```
python install_dependencies.py
```
Launch the application:
```
python run.py
```

Workflow Overview

PrimeSpecPCR employs a systematic, sequential workflow divided into four main modules:

Genetic Sequence Retrieval: Downloads and organizes genetic sequences from NCBI based on taxonomy ID
Multiple Sequence Alignment: Aligns sequences and generates consensus sequences to identify conserved regions
PCR Primer Design: Designs primer-probe sets optimized for qPCR applications
Primer Specificity Testing: Evaluates primer specificity against GenBank sequences

Each module builds upon the outputs of the previous one, creating an analysis pipeline for species-specific primer design.

Modules

Module 1: Genetic Sequence Retrieval

Purpose: Automatically retrieve genetic sequences from NCBI GenBank for specified taxonomic IDs (TaxIDs).

Input:

TaxID number(s) for target organism(s)
NCBI email address and API key
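
TaxIDs are entered as a comma-separated list in the GUI. A minimal sketch of how such input could be cleaned before it is sent to NCBI is shown below; the function name `parse_taxids` and its rules (positive integers only, duplicates dropped) are assumptions for illustration, not PrimeSpecPCR's actual validation code.

```python
def parse_taxids(text):
    """Turn a comma-separated string into a list of unique, positive TaxIDs.

    Hypothetical helper; the order of first appearance is preserved.
    """
    taxids = []
    for token in text.split(","):
        token = token.strip()
        if not token:
            continue  # tolerate trailing or doubled commas
        if not (token.isascii() and token.isdigit()) or int(token) == 0:
            raise ValueError(f"invalid TaxID: {token!r}")
        taxid = int(token)
        if taxid not in taxids:
            taxids.append(taxid)
    return taxids


print(parse_taxids("12345, 67890,12345"))  # [12345, 67890]
```

The TaxIDs above are placeholders. This check is purely syntactic; whether a TaxID actually exists can only be confirmed against the NCBI Taxonomy database.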

Process:

Validates the provided TaxID(s) against NCBI Taxonomy database
Searches for all available nucleotide sequences for the specified organism
Analyzes sequences to identify and categorize gene features
Groups sequences by gene annotations with consistent identifiers
Presents a ranked list of genes based on sequence availability
Allows selection of genes for further analysis
For each selected gene, displays candidate reference sequences
Uses selected reference sequences for BLAST searches to identify homologous regions
Downloads and organizes sequences for selected genes
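
The search step can be approximated with NCBI's E-utilities. The sketch below only builds an `esearch` request URL for one TaxID and sends nothing over the network. The helper name `build_esearch_url`, the `retmax` default, the placeholder TaxID, and the placeholder credentials are assumptions for illustration; PrimeSpecPCR's own retrieval code may be organized differently.

```python
from urllib.parse import urlencode

EUTILS_ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_esearch_url(taxid, email, api_key, retmax=500):
    """Build an esearch URL listing nucleotide records for one TaxID.

    Hypothetical helper, not part of PrimeSpecPCR.
    """
    taxid = int(taxid)
    if taxid <= 0:
        raise ValueError("TaxID must be a positive integer")
    params = {
        "db": "nuccore",
        # txid<N>[Organism:exp] matches the taxon and all of its descendants
        "term": f"txid{taxid}[Organism:exp]",
        "retmax": retmax,
        "retmode": "json",
        "email": email,
        "api_key": api_key,
    }
    return f"{EUTILS_ESEARCH}?{urlencode(params)}"


# Placeholder TaxID and credentials; replace with real values before use.
url = build_esearch_url(12345, "you@example.org", "YOUR_API_KEY")
print(url)
```

The returned URL can be fetched with any HTTP client; the JSON response lists record IDs that a later `efetch` call can download as FASTA.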

Output:

FASTA files containing sequences for each selected gene
Statistics file summarizing gene distribution
Log file detailing the retrieval process

Directory: 1_/

Key Features:

Batch processing for efficient handling of large datasets
Automatic gene feature detection
Guided reference sequence selection
BLAST-based homology search to ensure sequence relevance

Module 2: Multiple Sequence Alignment

Purpose: Align the retrieved gene sequences to identify conserved and variable regions within the target organism.

Input:

FASTA files from Module 1
User selection of files to align
Reference sequence selection for each alignment group

Process:

Displays available FASTA files from Module 1
Allows grouping of files for alignment
Sorts sequences by length for easier reference selection
Performs high-quality alignment using MAFFT
Filters sequences based on length threshold (excludes sequences >3000 nt)
Generates consensus sequences from alignments
Applies user-defined threshold for consensus generation
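
To illustrate what a consensus threshold does, the sketch below builds a column-wise majority consensus from an already aligned set of sequences. It is a simplified stand-in under stated assumptions (gaps are ignored, `N` marks columns below the threshold), not the algorithm PrimeSpecPCR necessarily uses.

```python
from collections import Counter


def consensus(aligned, threshold=0.8, ambiguous="N"):
    """Column-wise majority consensus over equal-length aligned sequences.

    A column yields its most common base only if that base's share of the
    non-gap characters reaches `threshold`; otherwise `ambiguous` is emitted.
    Columns consisting only of gaps are skipped.
    """
    if not aligned:
        return ""
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("all aligned sequences must have the same length")

    out = []
    for column in zip(*aligned):
        bases = [b.upper() for b in column if b != "-"]
        if not bases:
            continue  # gap-only column
        base, count = Counter(bases).most_common(1)[0]
        out.append(base if count / len(bases) >= threshold else ambiguous)
    return "".join(out)


alignment = ["ACGTAC-T", "ACGTTC-T", "ACGAAC-T", "ACGTACGT"]
print(consensus(alignment, threshold=0.75))  # ACGTACGT
print(consensus(alignment, threshold=0.80))  # ACGNNCGT
```

Raising the threshold turns the two columns with 75% agreement into `N`, which shows why a very strict threshold can leave too few unambiguous positions for primer design.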

Output:

MAFFT alignment files
BLAST-like alignment visualizations
Consensus sequences in FASTA format
Alignment quality statistics

Directory: 2_/

Key Features:

Integration with MAFFT for accurate alignments
Interactive file and reference sequence selection
Consensus generation with adjustable thresholds
Alignment quality assessment
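
A length filter followed by a MAFFT call might look like the sketch below. The FASTA parser, `filter_by_length`, and `run_mafft` are illustrative names; `--auto` is a standard MAFFT option, but the options PrimeSpecPCR actually passes to MAFFT are not documented here and may differ.

```python
import subprocess


def read_fasta(text):
    """Parse FASTA text into a list of (header, sequence) tuples."""
    records, header, chunks = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def filter_by_length(records, max_len=3000):
    """Keep records whose sequence is at most `max_len` nucleotides long."""
    return [(h, s) for h, s in records if len(s) <= max_len]


def mafft_command(input_fasta):
    """Command line for MAFFT; the alignment is written to stdout."""
    return ["mafft", "--auto", input_fasta]


def run_mafft(input_fasta, output_fasta):
    """Run MAFFT (must be on PATH) and save the alignment to a file."""
    with open(output_fasta, "w") as handle:
        subprocess.run(mafft_command(input_fasta), stdout=handle, check=True)


demo = ">short\nACGT\nACGT\n>long\n" + "A" * 3001 + "\n"
kept = filter_by_length(read_fasta(demo))
print([header for header, _ in kept])  # ['short']
```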

Module 3: PCR Primer Design

Purpose: Design PCR primer-probe sets optimized for qPCR based on the consensus sequences.

Input:

Consensus sequences from Module 2
User-specified primer design parameters (amplicon size range, etc.)

Process:

Loads consensus sequences from Module 2
Configures Primer3 with optimized parameters for qPCR applications
Implements iterative design to generate diverse primer sets
Evaluates thermodynamic properties (Tm, GC content, self-complementarity)
Ranks primer sets by quality scores
Provides multiple alternative primer sets for each target
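
As a rough illustration of the properties evaluated in this step, the sketch below computes GC content and an approximate melting temperature for a primer. It uses the simple Wallace rule as a stand-in; Primer3 relies on nearest-neighbor thermodynamics, so these numbers will not match Primer3's output. All function names are hypothetical.

```python
def gc_content(seq):
    """Percentage of G and C bases in a primer."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


def wallace_tm(seq):
    """Rough Tm in degrees C by the Wallace rule: 2*(A+T) + 4*(G+C).

    Only a coarse estimate, intended for short oligonucleotides.
    """
    seq = seq.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


def reverse_complement(seq):
    """Reverse complement of a DNA sequence made of A, C, G, T, N."""
    return seq.upper().translate(str.maketrans("ACGTN", "TGCAN"))[::-1]


primer = "ATGCATGCATGCATGCATGC"  # artificial 20-mer used only as an example
print(gc_content(primer))            # 50.0
print(wallace_tm(primer))            # 60
print(reverse_complement("ATGC"))    # GCAT
```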

Output:

CSV files containing primer set information
Detailed thermodynamic and structural analyses
Primer set rankings based on design quality

Directory: 3_/

Key Features:

Optimized for real-time PCR applications
Comprehensive thermodynamic analysis
Progressive region exclusion for diverse primer options
Customizable design parameters via PCR_primer_settings.txt

Module 4: Primer Specificity Testing

Purpose: Test the designed primers against GenBank sequences to ensure target specificity and identify potential cross-reactivity.

Input:

Primer sets from Module 3
Optional target taxonomy IDs for specificity evaluation

Process:

Performs BLAST searches for each primer against GenBank
Retrieves matching sequences and aligns primers to matches
Evaluates specificity based on mismatch patterns and positions
Places special emphasis on 3' region specificity
Categorizes matches based on binding potential
Associates sequences with taxonomy information
Generates interactive HTML report for result visualization
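
The idea of weighting 3' mismatches more heavily can be sketched as follows. The window size, the thresholds, and the category labels are hypothetical values chosen for the example; the actual scoring in Module 4 is configured through primer_specificity_settings.txt and may work differently.

```python
def mismatch_profile(primer, site, three_prime_window=5):
    """Count mismatches between a primer and an equal-length binding site.

    Both strings are written 5'->3' in the primer's orientation, so a
    perfectly matching site is identical to the primer.
    """
    if len(primer) != len(site):
        raise ValueError("primer and site must have the same length")
    positions = [
        i for i, (p, s) in enumerate(zip(primer.upper(), site.upper())) if p != s
    ]
    cutoff = len(primer) - three_prime_window
    return {
        "total": len(positions),
        "three_prime": sum(1 for i in positions if i >= cutoff),
        "positions": positions,
    }


def binding_category(profile, max_total=3, max_three_prime=0):
    """Classify a match; thresholds are illustrative assumptions."""
    if profile["total"] == 0:
        return "perfect match"
    if profile["three_prime"] > max_three_prime or profile["total"] > max_total:
        return "unlikely to amplify"
    return "may amplify"


primer = "ATGCATGCATGCATGCATGC"
internal = "ATACATGCATGCATGCATGC"   # one mismatch near the 5' end
terminal = "ATGCATGCATGCATGCATGA"   # one mismatch at the 3' terminal base
print(binding_category(mismatch_profile(primer, internal)))  # may amplify
print(binding_category(mismatch_profile(primer, terminal)))  # unlikely to amplify
```

Both sites differ from the primer by a single base, yet only the 3' terminal mismatch is treated as blocking, which is the behavior the 3' emphasis is meant to capture.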

Output:

HTML dashboard with primer specificity profiles
Specificity rankings for primer sets
Visual alignment representations
Taxonomic analysis of potential matches

Directory: 4_/

Key Features:

Region-specific mismatch analysis
Structured scoring system for binding potential
Taxonomy-based specificity assessment
Interactive visualization of results
Customizable specificity parameters via primer_specificity_settings.txt
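
A taxonomy-based assessment can be pictured as splitting predicted binding sites into on-target and off-target hits. The sketch below assumes each hit is already annotated with a TaxID and a yes/no binding call; the tuple layout, the function name, and the accessions are made up for the example.

```python
def specificity_summary(hits, target_taxids):
    """Summarize hits given as (accession, taxid, binds) tuples.

    Only hits predicted to bind are counted. `specificity` is the share of
    binding hits that belong to a target TaxID, or None without any hits.
    """
    targets = set(target_taxids)
    on_target, off_target = [], []
    for accession, taxid, binds in hits:
        if not binds:
            continue
        (on_target if taxid in targets else off_target).append(accession)
    total = len(on_target) + len(off_target)
    return {
        "on_target": on_target,
        "off_target": off_target,
        "specificity": len(on_target) / total if total else None,
    }


# Placeholder accessions and TaxIDs, not real database records.
hits = [("ACC1", 111, True), ("ACC2", 111, True),
        ("ACC3", 222, True), ("ACC4", 333, False)]
summary = specificity_summary(hits, target_taxids={111})
print(summary["off_target"])             # ['ACC3']
print(round(summary["specificity"], 2))  # 0.67
```

This comparison is by exact TaxID. Treating subspecies or strains as on-target would additionally require looking up each hit's taxonomic lineage.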

Using the GUI Interface

Figure 1: PrimeSpecPCR's main interface showing the configuration panel (left), program output (right), and input controls (bottom).

The PrimeSpecPCR graphical user interface provides a user-friendly environment for executing the entire workflow. The interface is divided into several key sections:

Configuration Panel

Located on the left side of the interface, this panel contains:

User Parameters:
- Email input field: For your NCBI registered email
- API Key input field: For your NCBI API key
- TaxID input field: For entering taxonomy IDs (comma-separated)
- "Save Configuration" button: Saves your settings for future use
Module Control Buttons:
- Sequential buttons for running each of the four modules
Process Control:
- "Stop Current Process" button: Halts the currently running module
- "Restart Application" button: Refreshes the application

Module Controls

The module buttons are organized sequentially, reflecting the workflow order:

Genetic Sequence Retrieval: Initiates the sequence download process
Multiple Sequence Alignment: Starts the alignment process
PCR Primer Design: Launches the primer design module
Primer Specificity Testing: Begins the specificity analysis

These buttons should be used in order, as each module depends on the output of the previous one.

Help and Guidance Panel

This panel provides context-sensitive help based on the current operation:

Initial Setup: Shows information about required credentials
Module-Specific Help: Displays relevant guidance when a module is selected
Input Guidance: Provides detailed explanations when user input is required

The help content updates dynamically based on the current program state and user actions.

Log and Output Panel

Located on the right side of the interface, this panel shows:

Process Logs: Real-time updates from the running module
Error Messages: Notifications about any issues encountered
Progress Indicators: Information about current processing status

This panel provides transparency into the program's operation and helps with troubleshooting.

Input Controls

At the bottom of the interface:

Input Field: For responding to program prompts
Send Button: Submits your input to the running module
Input Message: Yellow-highlighted area that displays the current prompt

These controls are enabled when the program requires user input and disabled otherwise.

Working with Examples

PrimeSpecPCR includes several example datasets located in the examples/ directory. Each example demonstrates the application of the toolkit to different target organisms:

Aspergillus tubingensis: Fungal pathogen
Aviadenovirus: Viral animal pathogen
Blumeria graminis: Plant pathogenic fungus
Burkholderia glumae: Bacterial plant pathogen
Daubentonia madagascariensis: Mammalian species
Erwinia psidii: Bacterial plant pathogen
Prevotella dentalis: Bacterial human pathogen
Puccinia recondita: Wheat leaf rust fungus
Treponema denticola: Bacterial human pathogen
Welwitschia mirabilis: Gymnosperm plant species

Each example folder contains its own README file with specific information about:

The target organism and its relevance
Special considerations for primer design
Recommended parameters
Expected results
Validation information (where available)

To use an example, navigate to its directory and follow the instructions in its README file.

Experimental Validation

PrimeSpecPCR has been validated through laboratory testing using biological samples. The experimental validation confirms that computationally designed primers perform successfully in laboratory PCR conditions.

Validation Dataset

The validation study tested primer sets designed for five different organisms:

Blumeria graminis f. sp. tritici (wheat powdery mildew)
Blumeria hordei (barley powdery mildew)
Capsella bursa-pastoris (shepherd's purse)
Equisetum arvense (field horsetail)
Zymoseptoria tritici (wheat septoria leaf blotch)

All validation data is available in the exp_evaluation/ directory.

Validation Methodology

The experimental validation included:

PCR Amplification: Testing primer sets under standard PCR conditions
Gel Electrophoresis: Confirming amplification products of expected sizes
Sanger Sequencing: Verifying the sequence identity of the PCR products
BLAST Analysis: Confirming species specificity of amplified sequences
Comparative Analysis: Benchmarking against Primer3-designed primers

Validation Data Structure

The experimental validation data is organized by organism in the exp_evaluation/ directory:

For each tested organism:

1_/, 2_/, 3_/, 4_/ - Complete PrimeSpecPCR pipeline outputs
PCR_verification/ - Gel electrophoresis images
Primer3_verification/ - Comparative Primer3 results
Sanger_sequencing/ - Raw (.ab1) and processed (.seq) sequencing files
Sequence_analysis/ - BLAST alignment results
readme.md - Detailed experimental protocols

Key Validation Results

Success Rate: All tested primer sets successfully amplified target sequences
Specificity Confirmation: Sanger sequencing confirmed species-specific amplification
Comparative Performance: PrimeSpecPCR identified one primer set not found by Primer3 alone
Reproducibility: Duplicate PCR reactions showed consistent results

Accessing Validation Data

The complete validation dataset is available in the exp_evaluation/ directory of this repository. It includes:

Gel images showing successful amplifications
Raw Sanger sequencing files (.ab1 format)
Processed sequence data and BLAST results

Note: For complete methodological details and experimental conditions, please refer to the associated publication [citation to be added upon publication].

Command Line Usage

While the GUI is recommended for most users, advanced users can also run PrimeSpecPCR from the command line, for example as part of an automated workflow:

# Running individual modules
python 1_Genetic_Background.py
python 2_MSA_Alignment.py
python 3_PCR_Primers_Design.py
python 4_Primers_Specificity.py

# Running with interactive mode for direct command line interaction
python 1_Genetic_Background.py --interactive

Command-line usage requires the same dependencies as the GUI version but allows for better integration with bioinformatics pipelines.
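
For pipeline integration, the four scripts can be chained from a small wrapper. The sketch below is one possible approach, assuming it is started from the repository root and that each script exits with a non-zero status on failure; the wrapper functions are not part of PrimeSpecPCR.

```python
import subprocess
import sys

MODULE_SCRIPTS = [
    "1_Genetic_Background.py",
    "2_MSA_Alignment.py",
    "3_PCR_Primers_Design.py",
    "4_Primers_Specificity.py",
]


def module_commands(first=1, last=4, python=sys.executable):
    """Return the command lines for modules `first` through `last`."""
    if not 1 <= first <= last <= len(MODULE_SCRIPTS):
        raise ValueError("module numbers must satisfy 1 <= first <= last <= 4")
    return [[python, script] for script in MODULE_SCRIPTS[first - 1:last]]


def run_modules(first=1, last=4):
    """Run the modules in order and stop at the first failure.

    stdin is inherited, so interactive prompts still reach the terminal.
    """
    for command in module_commands(first, last):
        subprocess.run(command, check=True)


# Show what would be executed without running anything.
for command in module_commands():
    print(" ".join(command))
```

Calling `run_modules(3, 4)` would rerun only primer design and specificity testing, which is useful when the outputs in 1_/ and 2_/ already exist.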

Parameter Files

PCR_primer_settings.txt

This file controls the primer design parameters in Module 3. Key settings include:

Primer Size Parameters: Length constraints for primers and probes
Tm Parameters: Melting temperature ranges and differentials
GC Content Parameters: Acceptable GC percentage ranges
Structural Parameters: Settings to prevent hairpins and dimers
Salt and DNA Concentration: Physical conditions for Tm calculations

Edit this file to customize primer design for specific applications or conditions.
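
The layout of PCR_primer_settings.txt is not described here. Assuming a simple `KEY=VALUE` format with `#` comments (an assumption that should be checked against the actual file), the settings could be read as sketched below. The keys in the example are Primer3 tag names used for illustration only.

```python
def _coerce(value):
    """Convert a string to int or float when possible, else keep the string."""
    for cast in (int, float):
        try:
            return cast(value)
        except ValueError:
            pass
    return value


def parse_settings(text):
    """Parse KEY=VALUE lines; text after '#' and blank lines are ignored.

    Hypothetical reader for an assumed file format.
    """
    settings = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line or "=" not in line:
            continue
        key, value = (part.strip() for part in line.split("=", 1))
        settings[key] = _coerce(value)
    return settings


example = """
# illustrative values, not the shipped defaults
PRIMER_OPT_SIZE=20
PRIMER_MIN_TM = 57.0   # inline comment
PRIMER_MAX_TM = 63.0
"""
print(parse_settings(example))
```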

primer_specificity_settings.txt

This file configures the specificity testing parameters in Module 4:

Search Parameters: BLAST settings for database searches
Specificity Stringency: Mismatch tolerances and critical regions
Cache Settings: Directory configurations for sequence storage

Modify these settings to adjust the stringency of specificity testing based on your research needs.

Troubleshooting

MAFFT Installation Issues:

Ensure MAFFT is properly installed: mafft --version
On Linux, install directly with: sudo apt install mafft
On macOS, install with: brew install mafft
If MAFFT still cannot be found, check the system PATH or use the built-in MAFFT installation option
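
The same check can be scripted. The sketch below looks for MAFFT on the PATH and asks it for its version; `mafft_version` is a hypothetical helper, and because some MAFFT builds print the version to stderr, both output streams are inspected.

```python
import shutil
import subprocess


def mafft_version(executable="mafft"):
    """Return MAFFT's version string, or None if it is not on the PATH."""
    path = shutil.which(executable)
    if path is None:
        return None
    result = subprocess.run(
        [path, "--version"], capture_output=True, text=True, timeout=30
    )
    # Use whichever stream actually contains the version text.
    return (result.stdout or result.stderr).strip() or None


version = mafft_version()
print(version if version else "MAFFT not found on PATH")
```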

NCBI API Access:

Verify your API key is valid and correctly entered
Check your internet connection
Ensure you're not exceeding NCBI's usage limits
Use the built-in rate limiting settings to prevent API blocks
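
NCBI allows up to 3 E-utilities requests per second without an API key and up to 10 per second with one. A minimal client-side limiter that respects these limits is sketched below; the class is illustrative and independent of PrimeSpecPCR's built-in rate limiting settings.

```python
import time


class RateLimiter:
    """Space out calls so they never exceed NCBI's per-second limit."""

    def __init__(self, has_api_key, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = 1.0 / (10 if has_api_key else 3)
        self._clock = clock
        self._sleep = sleep
        self._last = None

    def wait(self):
        """Block until enough time has passed since the previous call."""
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now


limiter = RateLimiter(has_api_key=True)
for _ in range(3):
    limiter.wait()  # issue one NCBI request right after each call
print(f"minimum spacing: {limiter.min_interval:.3f} s")
```

The `clock` and `sleep` parameters exist only so the limiter can be exercised without real delays, for example in unit tests.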

Alignment Failures:

Check for unusual characters in sequence files
Try reducing the number of sequences being aligned
Adjust memory allocation if working with very large sequences
Use the length filtering option to exclude excessively long sequences

Primer Design Issues:

Inspect consensus sequences for high ambiguity (N) content
Adjust the primer design parameters for more permissive conditions
Verify the amplicon size range is appropriate for your target regions
Check that the consensus threshold wasn't set too high
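
A quick way to inspect a consensus sequence for ambiguity is to measure its share of non-ACGT characters and the longest unambiguous stretch. The helper names below are hypothetical, and what counts as too much ambiguity depends on the amplicon size you need.

```python
def ambiguity_fraction(seq, unambiguous="ACGT"):
    """Fraction of positions that are not A, C, G, or T."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return sum(1 for base in seq if base not in unambiguous) / len(seq)


def longest_clean_stretch(seq, unambiguous="ACGT"):
    """Length of the longest run without ambiguous characters."""
    best = current = 0
    for base in seq.upper():
        current = current + 1 if base in unambiguous else 0
        best = max(best, current)
    return best


consensus_seq = "ACGTNNACGTACGTNACG"  # toy sequence for illustration
print(round(ambiguity_fraction(consensus_seq), 3))  # 0.167
print(longest_clean_stretch(consensus_seq))         # 8
```

If the longest clean stretch is shorter than the minimum amplicon size, no primer pair can be placed in it, and lowering the consensus threshold in Module 2 is worth trying.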

GUI Interface Problems:

Verify tkinter is installed: python3 -c "import tkinter; print(tkinter.TkVersion)"
Check for console output indicating missing packages
Run the install_dependencies.py script to resolve dependency issues

Citations

When using PrimeSpecPCR in your research, please cite:

[Citation information will be added upon publication]

PrimeSpecPCR also incorporates several external tools and libraries that should be acknowledged:

MAFFT: Katoh K, Standley DM. MAFFT multiple sequence alignment software version 7: improvements in performance and usability. Mol Biol Evol. 2013;30(4):772-780.
Biopython: Cock PJ, Antao T, Chang JT, et al. Biopython: freely available Python tools for computational molecular biology and bioinformatics. Bioinformatics. 2009;25(11):1422-1423.

License

PrimeSpecPCR is released under the MIT License. See the LICENSE file for details.

Acknowledgements

We gratefully acknowledge the support of:

NCBI for providing access to their genetic databases
The developers of MAFFT, Primer3-py library, and other open-source tools incorporated in this project
The research community for valuable feedback and testing

For questions, suggestions, or contributions, please contact the developer at akuzdralinski@pjwstk.edu.pl
