Development #145

Merged
alex-sandercock merged 59 commits into main from development
Sep 23, 2025

@alex-sandercock
Collaborator

This pull request introduces a major update to the Docker build and deployment workflow for the BIGapp project, refactors the Dockerfile setup for improved efficiency and multi-architecture support, and adds new VCF sanity checking functionality to the R Shiny application. It also updates the app version and makes several improvements to the genotype matrix formatting and filtering logic.

Docker build and deployment improvements:

  • Added a new GitHub Actions workflow (.github/workflows/dockerhub-on-version.yml) to automatically build and push multi-architecture Docker images (amd64 and arm64) to DockerHub when the package version changes, including manifest creation for multi-arch support.
  • Refactored the main Dockerfile to use a dependency image (bigapp-deps), improving build caching and separating base dependencies from app-specific installation. Runtime improvements include health checks and proper user setup.
  • Introduced Dockerfile.deps to build and cache R/CRAN/Bioc dependencies efficiently, including fallback logic for missing packages and version snapshotting for auditability.

VCF sanity checking and error handling:

  • Added new exported functions (vcf_sanity_check, vcf_sanity_messages) and integrated VCF sanity checks into both the DosageCall and Filtering modules, providing user feedback and halting analysis if checks fail. [1] [2] [3]

Genotype matrix and relationship matrix logic:

  • Updated get_relationship_mat to use the Gmatrix function with explicit parameters (ploidy.correction, ratio, missingValue) for more robust polyploid support and consistency. [1] [2]
  • Improved genotype matrix formatting in format_geno_matrix to restrict rrBLUP codification to diploid and Gmatrix scenarios only.

User interface and filtering enhancements:

  • Updated the app UI to display the dynamic package version using utils::packageVersion.
  • Enhanced the filtering module to include a placeholder for download UI, better error messages for malformed VCF files, and improved sample/marker tracking during filtering. [1] [2] [3] [4]
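The dynamic version display mentioned above can be sketched as follows. This is only an illustration: the helper name `version_label` is hypothetical, and `"stats"` stands in for the app's own package so that the snippet runs on any R installation.

```r
# Build a version label from installed package metadata.
# `version_label` is a hypothetical helper, not a function from the PR.
# "stats" is a stand-in package name; the app would pass its own package name.
version_label <- function(pkg = "stats") {
  # packageVersion() returns a 'package_version' object; convert it for display
  paste0("Version ", as.character(utils::packageVersion(pkg)))
}

label <- version_label()
print(label)
```

Because the label is read from the installed package's DESCRIPTION, bumping the version there is enough to update the UI text.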

Version and dependency updates:

  • Bumped the package version to 1.5.1 in DESCRIPTION, and removed the BIGr remote from Imports (now handled via Docker dependencies). [1] [2]

Let me know if you want to dive deeper into any of these areas or need help understanding how the new Docker workflow or VCF sanity checks work!

Copilot AI left a comment

Pull Request Overview

This pull request implements a comprehensive update to the BIGapp project with new VCF file validation capabilities, Docker infrastructure improvements, and enhanced genotype matrix processing. The update introduces VCF sanity checking across all modules that process genomic data and modernizes the build/deployment workflow with multi-architecture Docker support.

Key changes:

  • Added comprehensive VCF sanity checking functionality with exported functions for validation and user feedback
  • Refactored Docker build system with dependency image separation and multi-architecture support via GitHub Actions
  • Enhanced genotype matrix processing to use explicit Gmatrix parameters for improved polyploid support
  • Updated app version display to use dynamic package versioning and added citation information across help files

Reviewed Changes

Copilot reviewed 36 out of 36 changed files in this pull request and generated 8 comments.

| File | Description |
| --- | --- |
| R/vcf_sanity_check.R | New comprehensive VCF validation functions with checks for headers, columns, format consistency, and data integrity |
| tests/testthat/test-vcf_sanity_check.R | Test coverage for VCF sanity checking functionality |
| R/utils.R | Added VCF compression detection utility and integrated sanity checks into read_geno_file function |
| R/GS_functions.R | Updated genotype matrix formatting and relationship matrix generation with explicit Gmatrix parameters |
| Dockerfile/Dockerfile.deps | Modernized Docker build with dependency separation and multi-architecture support |
| .github/workflows/dockerhub-on-version.yml | Automated Docker build and push workflow triggered by version changes |
| R/mod_*.R | Integration of VCF sanity checks across all genomic analysis modules |

}
gzfile(vcf_path, open = "rt")
} else if(is_gz == "bzip2 (.bz2)") {
if (verbose) warning("File is compressed th bzip2 (.bz2), which is not supported.")

Copilot AI Aug 25, 2025

Typo in warning message: 'th' should be 'with'.

Suggested change
- if (verbose) warning("File is compressed th bzip2 (.bz2), which is not supported.")
+ if (verbose) warning("File is compressed with bzip2 (.bz2), which is not supported.")
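For context, one common way to implement the kind of compression detection this snippet relies on is to inspect the file's leading magic bytes. The sketch below is an assumption about the approach, not the PR's actual utility: the function name `detect_compression` and the labels `"gzip (.gz)"` and `"none"` are hypothetical, and only the `"bzip2 (.bz2)"` label appears in the reviewed code.

```r
# Hypothetical helper: classify a file by its first bytes.
# gzip (and bgzip) files start with 1f 8b; bzip2 files start with "BZh" (42 5a 68).
detect_compression <- function(path) {
  magic <- readBin(path, what = "raw", n = 3L)
  if (length(magic) >= 2L && identical(magic[1:2], as.raw(c(0x1f, 0x8b)))) {
    "gzip (.gz)"      # label assumed for illustration
  } else if (length(magic) >= 3L && identical(magic, as.raw(c(0x42, 0x5a, 0x68)))) {
    "bzip2 (.bz2)"    # label taken from the snippet under review
  } else {
    "none"            # label assumed for illustration
  }
}

# Usage: write the same VCF header line in three encodings and classify each file.
header <- "##fileformat=VCFv4.2"

tmp_plain <- tempfile(fileext = ".vcf")
writeLines(header, tmp_plain)

tmp_gz <- tempfile(fileext = ".vcf.gz")
con <- gzfile(tmp_gz, open = "wt")
writeLines(header, con)
close(con)

tmp_bz <- tempfile(fileext = ".vcf.bz2")
con <- bzfile(tmp_bz, open = "wt")
writeLines(header, con)
close(con)

results <- c(
  plain = detect_compression(tmp_plain),
  gz    = detect_compression(tmp_gz),
  bz2   = detect_compression(tmp_bz)
)
print(results)
```

Checking magic bytes rather than the file extension means a misnamed file is still classified by its actual content.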

if(is.null(checks$checks)){
shinyalert(
title = "File Error",
text = checks$message,

Copilot AI Aug 25, 2025

The function accesses checks$message in line 437, but this property doesn't exist in the vcf_sanity_check object structure (which has checks, messages, duplicates, ploidy_max). This should be checks$messages.

Suggested change
- text = checks$message,
+ text = checks$messages,
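One detail worth keeping in mind when evaluating this suggestion: on a plain R list, `$` performs partial name matching, so `checks$message` may silently resolve to `messages` instead of returning NULL. The sketch below assumes the `vcf_sanity_check` object is an ordinary list carrying an S3 class and no custom `$` method. That is an assumption about the implementation, and the field values are made up for illustration.

```r
# Stand-in object with the four fields named in the review comment.
# The values are illustrative only.
res <- structure(
  list(
    checks     = NULL,
    messages   = "File could not be read.",
    duplicates = NULL,
    ploidy_max = NA
  ),
  class = "vcf_sanity_check"
)

partial <- res$message       # `$` partially matches the unique prefix of "messages"
exact   <- res[["message"]]  # `[[` matches exactly by default, so this is NULL

print(partial)
print(is.null(exact))
```

Under that assumption the original line would still display the message, but spelling the field name out in full is more robust, and setting `options(warnPartialMatchDollar = TRUE)` during development surfaces such accesses as warnings.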

if (!inherits(x, "vcf_sanity_check")) stop("Object must be of class 'vcf_sanity_check'.")

if(is.null(x$checks)) {
warning(paste("Checks could not be performed. Because:", x$message))

Copilot AI Aug 25, 2025

Similar to the previous issue, x$message should be x$messages since the vcf_sanity_check object structure uses 'messages' not 'message'.

Suggested change
- warning(paste("Checks could not be performed. Because:", x$message))
+ warning(paste("Checks could not be performed. Because:", x$messages))

warning_if_true = NULL)

print(checks)
print(checks_result)

Copilot AI Aug 25, 2025

Debug print statements should be removed from production code.

Suggested change
- print(checks_result)

warning_if_true = NULL)

print(checks)
print(checks_result)

Copilot AI Aug 25, 2025

Debug print statements should be removed from production code.

Suggested change
- print(checks_result)

- read_geno_file <- function(file_path, requires = c("GT")){
+ read_geno_file <- function(file_path, requires = c("GT"), ploidy, check=TRUE) {
if (grepl("\\.csv$", file_path)) {
geno <- read.csv(geno_path, header = TRUE, row.names = 1, check.names = FALSE)

Copilot AI Aug 25, 2025

Variable 'geno_path' is undefined. It should be 'file_path' based on the function parameter.

Suggested change
- geno <- read.csv(geno_path, header = TRUE, row.names = 1, check.names = FALSE)
+ geno <- read.csv(file_path, header = TRUE, row.names = 1, check.names = FALSE)

# --- CHROM and POS column checks ---
chrom_pos <- do.call(rbind, chrom_pos)
checks["chrom_info"] <- all(chrom_pos[,1] != "." | chrom_pos[,1] != "" | !is.na(chrom_pos[,1]))
checks["pos_info"] <- all(chrom_pos[,2] != "." | chrom_pos[,2] != "" | !is.na(chrom_pos[,2]))

Copilot AI Aug 25, 2025

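Logical error in condition: because the comparisons are joined with OR (|), every non-missing value evaluates to TRUE (a value cannot equal both "." and ""), and NA values yield NA, so the check can never be FALSE. Use AND (&) operators to check that values are NOT missing: all(chrom_pos[,1] != "." & chrom_pos[,1] != "" & !is.na(chrom_pos[,1]))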

Suggested change
- checks["pos_info"] <- all(chrom_pos[,2] != "." | chrom_pos[,2] != "" | !is.na(chrom_pos[,2]))
+ checks["chrom_info"] <- all(chrom_pos[,1] != "." & chrom_pos[,1] != "" & !is.na(chrom_pos[,1]))
+ checks["pos_info"] <- all(chrom_pos[,2] != "." & chrom_pos[,2] != "" & !is.na(chrom_pos[,2]))
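The difference between the two conditions can be seen on a small made-up vector. The `chrom` values below are hypothetical and stand in for one column of `chrom_pos`. The vector holds one valid entry and three kinds of missing entry.

```r
# Hypothetical CHROM values: valid, ".", empty string, and NA
chrom <- c("chr1", ".", "", NA)

# Original condition, joined with OR
or_rows <- chrom != "." | chrom != "" | !is.na(chrom)

# Suggested condition, joined with AND
and_rows <- chrom != "." & chrom != "" & !is.na(chrom)

print(or_rows)        # TRUE TRUE TRUE NA
print(and_rows)       # TRUE FALSE FALSE FALSE
print(all(or_rows))   # NA
print(all(and_rows))  # FALSE
```

With OR, each non-missing value passes because it cannot equal both "." and "" at once, and the NA element propagates as NA, so `all()` never returns FALSE. With AND, every kind of missing entry is flagged, including NA, because `NA & FALSE` evaluates to FALSE.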

# --- CHROM and POS column checks ---
chrom_pos <- do.call(rbind, chrom_pos)
checks["chrom_info"] <- all(chrom_pos[,1] != "." | chrom_pos[,1] != "" | !is.na(chrom_pos[,1]))
checks["pos_info"] <- all(chrom_pos[,2] != "." | chrom_pos[,2] != "" | !is.na(chrom_pos[,2]))

Copilot AI Aug 25, 2025

Same logical error as above: with OR (|) operators every non-missing value evaluates to TRUE and NA values yield NA, so the check can never be FALSE. Use AND (&) operators: all(chrom_pos[,2] != "." & chrom_pos[,2] != "" & !is.na(chrom_pos[,2]))

Suggested change
- checks["pos_info"] <- all(chrom_pos[,2] != "." | chrom_pos[,2] != "" | !is.na(chrom_pos[,2]))
+ checks["chrom_info"] <- all(chrom_pos[,1] != "." & chrom_pos[,1] != "" & !is.na(chrom_pos[,1]))
+ checks["pos_info"] <- all(chrom_pos[,2] != "." & chrom_pos[,2] != "" & !is.na(chrom_pos[,2]))

@alex-sandercock alex-sandercock merged commit 04e80e2 into main Sep 23, 2025
4 of 12 checks passed