Development by alex-sandercock · Pull Request #145 · Breeding-Insight/BIGapp

alex-sandercock · 2025-08-25T20:21:55Z

This pull request introduces a major update to the Docker build and deployment workflow for the BIGapp project, refactors the Dockerfile setup for improved efficiency and multi-architecture support, and adds new VCF sanity checking functionality to the R Shiny application. It also updates the app version and makes several improvements to the genotype matrix formatting and filtering logic.

Docker build and deployment improvements:

Added a new GitHub Actions workflow (.github/workflows/dockerhub-on-version.yml) to automatically build and push multi-architecture Docker images (amd64 and arm64) to DockerHub when the package version changes, including manifest creation for multi-arch support.
Refactored the main Dockerfile to use a dependency image (bigapp-deps), improving build caching and separating base dependencies from app-specific installation. Runtime improvements include health checks and proper user setup.
Introduced Dockerfile.deps to build and cache R/CRAN/Bioc dependencies efficiently, including fallback logic for missing packages and version snapshotting for auditability.
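This page does not include Dockerfile.deps itself. The sketch below is a hypothetical example of the kind of R script a dependency stage might run to cover the two ideas mentioned: fallback installation and version snapshotting.

- The function names are illustrative, not BIGapp's.
- The installer chain shown in the comments assumes network access and the BiocManager package.

```r
# Hypothetical sketch, not taken from BIGapp's Dockerfile.deps.
# Tries each installer in turn on whatever is still missing and
# returns the packages that remain uninstalled at the end.
install_with_fallback <- function(pkgs, installers) {
  still_missing <- function() setdiff(pkgs, rownames(utils::installed.packages()))
  for (installer in installers) {
    todo <- still_missing()
    if (length(todo) == 0) break
    # A failing installer should not abort the build; the next one gets a turn.
    try(installer(todo), silent = TRUE)
  }
  still_missing()
}

# Writes a CSV with the name, version and library path of every installed
# package, so the exact dependency set of an image can be audited later.
snapshot_versions <- function(out_file) {
  ip <- utils::installed.packages()[, c("Package", "Version", "LibPath"), drop = FALSE]
  utils::write.csv(as.data.frame(ip, stringsAsFactors = FALSE), out_file, row.names = FALSE)
  invisible(out_file)
}

# Example installer chain (assumes network access and BiocManager):
# installers <- list(
#   function(p) install.packages(p, repos = "https://cloud.r-project.org"),
#   function(p) BiocManager::install(p, update = FALSE, ask = FALSE)
# )
# leftover <- install_with_fallback(c("vcfR", "VariantAnnotation"), installers)
# snapshot_versions("/opt/deps-versions.csv")
```

Returning the leftover packages lets the build script decide whether a missing package is fatal, rather than hiding the failure.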

VCF sanity checking and error handling:

Added new exported functions (vcf_sanity_check, vcf_sanity_messages) and integrated VCF sanity checks into both the DosageCall and Filtering modules, providing user feedback and halting analysis if checks fail. [1] [2] [3]
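The page does not show `vcf_sanity_check()` itself. As a rough illustration of the kind of structural checks involved, here is a much smaller hypothetical helper; `minimal_vcf_check` is not part of BIGapp.

- It reads the first lines of an uncompressed VCF and returns a named logical vector, one element per check.
- It assumes the file carries genotype columns, so a FORMAT column is expected.

```r
# Hypothetical helper, not BIGapp's vcf_sanity_check(): a few structural checks
# on the first `n_lines` lines of an uncompressed VCF with genotype columns.
minimal_vcf_check <- function(path, n_lines = 1000L) {
  lines <- readLines(path, n = n_lines, warn = FALSE)
  header_idx <- grep("^#CHROM", lines)
  checks <- c(
    fileformat = length(lines) > 0 && startsWith(lines[1], "##fileformat=VCF"),
    chrom_line = length(header_idx) == 1
  )
  fixed_cols <- c("#CHROM", "POS", "ID", "REF", "ALT",
                  "QUAL", "FILTER", "INFO", "FORMAT")
  if (checks[["chrom_line"]]) {
    cols <- strsplit(lines[header_idx], "\t", fixed = TRUE)[[1]]
    # The nine fixed columns must come first, in order, followed by samples.
    checks["required_cols"] <- length(cols) > length(fixed_cols) &&
      identical(cols[seq_along(fixed_cols)], fixed_cols)
    # Every non-empty data line must have as many fields as the header.
    body <- lines[-seq_len(header_idx)]
    body <- body[nzchar(body)]
    n_fields <- lengths(strsplit(body, "\t", fixed = TRUE))
    checks["consistent_fields"] <- length(body) > 0 && all(n_fields == length(cols))
  } else {
    checks[c("required_cols", "consistent_fields")] <- FALSE
  }
  checks
}
```

Returning every check rather than stopping at the first failure makes it easy to turn the result into a list of user-facing messages, as the DosageCall and Filtering modules are described as doing.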

Genotype matrix and relationship matrix logic:

Updated get_relationship_mat to use the Gmatrix function with explicit parameters (ploidy.correction, ratio, missingValue) for more robust polyploid support and consistency. [1] [2]
Improved genotype matrix formatting in format_geno_matrix to restrict rrBLUP codification to diploid and Gmatrix scenarios only.
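For readers unfamiliar with what a genomic relationship matrix computes, the following base-R sketch builds a VanRaden-style G matrix from an individuals-by-markers dosage matrix. It uses the commonly cited polyploid generalization G = ZZ' / (ploidy * sum(p(1 - p))).

This is not AGHmatrix's `Gmatrix()` implementation:

- It does not model the `ploidy.correction` or `ratio` options.
- Missing dosages are simply mean-imputed, an assumption standing in for real `missingValue` handling.
- It assumes no marker is entirely missing.

```r
# Hypothetical helper, not AGHmatrix::Gmatrix().
# M: individuals x markers matrix of allele dosages in 0..ploidy, NA = missing.
vanraden_G <- function(M, ploidy = 2) {
  p <- colMeans(M, na.rm = TRUE) / ploidy   # allele frequency per marker
  Z <- sweep(M, 2, ploidy * p)              # center each marker on its mean dosage
  Z[is.na(Z)] <- 0                          # missing dosage -> marker mean (Z = 0)
  denom <- ploidy * sum(p * (1 - p))        # equals 2 * sum(p(1 - p)) when ploidy = 2
  tcrossprod(Z) / denom                     # Z %*% t(Z) / denom
}

# Usage: G <- vanraden_G(dosage_matrix, ploidy = 4)
```

Because every marker column of Z sums to zero after centering, each row of G also sums to zero. This makes a quick sanity check on any implementation that uses this scaling.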

User interface and filtering enhancements:

Updated the app UI to display the dynamic package version using utils::packageVersion.
Enhanced the filtering module to include a placeholder for download UI, better error messages for malformed VCF files, and improved sample/marker tracking during filtering. [1] [2] [3] [4]
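A footer showing the installed version can be built directly from `utils::packageVersion()`. The helper below is hypothetical: neither its name `app_version_label` nor its fallback string comes from BIGapp. It falls back to that fixed string when the package cannot be found, because `packageVersion()` raises an error in that case.

```r
# Hypothetical helper: build a footer label from the installed package version.
app_version_label <- function(pkg = "BIGapp") {
  v <- tryCatch(as.character(utils::packageVersion(pkg)),
                error = function(e) NA_character_)
  if (is.na(v)) "version unknown" else paste0("v", v)
}

# In a Shiny UI this could be used as, e.g., shiny::tags$footer(app_version_label())
```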

Version and dependency updates:

Bumped the package version to 1.5.1 in DESCRIPTION, and removed the BIGr remote from Imports (now handled via Docker dependencies). [1] [2]


Copilot

Pull Request Overview

This pull request implements a comprehensive update to the BIGapp project with new VCF file validation capabilities, Docker infrastructure improvements, and enhanced genotype matrix processing. The update introduces VCF sanity checking across all modules that process genomic data and modernizes the build/deployment workflow with multi-architecture Docker support.

Key changes:

Added comprehensive VCF sanity checking functionality with exported functions for validation and user feedback
Refactored Docker build system with dependency image separation and multi-architecture support via GitHub Actions
Enhanced genotype matrix processing to use explicit Gmatrix parameters for improved polyploid support
Updated app version display to use dynamic package versioning and added citation information across help files

Reviewed Changes

Copilot reviewed 36 out of 36 changed files in this pull request and generated 8 comments.

Show a summary per file

File	Description
R/vcf_sanity_check.R	New comprehensive VCF validation functions with checks for headers, columns, format consistency, and data integrity
tests/testthat/test-vcf_sanity_check.R	Test coverage for VCF sanity checking functionality
R/utils.R	Added VCF compression detection utility and integrated sanity checks into read_geno_file function
R/GS_functions.R	Updated genotype matrix formatting and relationship matrix generation with explicit Gmatrix parameters
Dockerfile/Dockerfile.deps	Modernized Docker build with dependency separation and multi-architecture support
.github/workflows/dockerhub-on-version.yml	Automated Docker build and push workflow triggered by version changes
R/mod_*.R	Integration of VCF sanity checks across all genomic analysis modules

_{Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.}

Copilot · 2025-08-25T20:23:44Z

R/vcf_sanity_check.R

+    }
+    gzfile(vcf_path, open = "rt") 
+  } else if(is_gz == "bzip2 (.bz2)") {
+    if (verbose) warning("File is compressed th bzip2 (.bz2), which is not supported.")


Typo in warning message: 'th' should be 'with'.

Suggested change

- if (verbose) warning("File is compressed th bzip2 (.bz2), which is not supported.")
+ if (verbose) warning("File is compressed with bzip2 (.bz2), which is not supported.")
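The bzip2 branch above depends on knowing how the file is compressed. One common way to detect this, independent of the file extension, is to read the leading magic bytes:

- gzip files start with 0x1f 0x8b.
- bzip2 files start with the ASCII characters `BZh`.

A minimal sketch follows; `detect_compression` is a hypothetical name, not BIGapp's actual detection utility.

```r
# Hypothetical helper: classify a file by its leading magic bytes.
detect_compression <- function(path) {
  # readBin() opens the path in binary mode, so no transparent decompression happens.
  magic <- readBin(path, what = "raw", n = 3L)
  if (length(magic) >= 2L && magic[1] == as.raw(0x1f) && magic[2] == as.raw(0x8b)) {
    "gzip"
  } else if (identical(magic, charToRaw("BZh"))) {
    "bzip2"
  } else {
    "none"
  }
}
```

Checking bytes rather than the extension also flags a compressed file whose name lacks `.gz`.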

Copilot · 2025-08-25T20:23:44Z

R/vcf_sanity_check.R

+  if(is.null(checks$checks)){
+    shinyalert(
+      title = "File Error",
+      text = checks$message,


The function accesses checks$message in line 437, but this property doesn't exist in the vcf_sanity_check object structure (which has checks, messages, duplicates, ploidy_max). This should be checks$messages.

Suggested change

- text = checks$message,
+ text = checks$messages,
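One R detail is worth knowing here. This assumes the result is a plain list with an S3 class and no custom `$` method, which is not confirmed by the page.

- `$` performs partial name matching, so `checks$message` would probably still resolve to the `messages` element today.
- Relying on that is fragile: adding any other element whose name starts with `message` would make the match ambiguous, and `$` would then return NULL. The suggested fix is therefore still the right one.
- `[[` matches names exactly by default.

The object below is a hypothetical stand-in with the element names listed in the comment; its values are placeholders.

```r
# Hypothetical stand-in for a vcf_sanity_check result (values are placeholders).
res <- structure(
  list(checks = NULL, messages = "placeholder message",
       duplicates = NULL, ploidy_max = NULL),
  class = "vcf_sanity_check"
)

res$message        # partial match: returns the "messages" element
res[["message"]]   # NULL: [[ matches names exactly by default
res$messages       # the explicit, robust spelling
```

Setting `options(warnPartialMatchDollar = TRUE)` during development makes R warn whenever `$` falls back to a partial match.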

Copilot · 2025-08-25T20:23:45Z

R/vcf_sanity_check.R

+  if (!inherits(x, "vcf_sanity_check")) stop("Object must be of class 'vcf_sanity_check'.")
+
+  if(is.null(x$checks)) {
+    warning(paste("Checks could not be performed. Because:", x$message))


Similar to the previous issue, x$message should be x$messages since the vcf_sanity_check object structure uses 'messages' not 'message'.

Suggested change

- warning(paste("Checks could not be performed. Because:", x$message))
+ warning(paste("Checks could not be performed. Because:", x$messages))

Copilot · 2025-08-25T20:23:45Z

R/mod_Filtering.R

+                                         warning_if_true = NULL)
+
+    print(checks)
+    print(checks_result)


Debug print statements should be removed from production code.

Suggested change (delete the line)

- print(checks_result)

Copilot · 2025-08-25T20:23:45Z

R/mod_Filtering.R

+                                         warning_if_true = NULL)
+
+    print(checks)
+    print(checks_result)


Debug print statements should be removed from production code.

Suggested change (delete the line)

- print(checks_result)

Copilot · 2025-08-25T20:23:45Z

R/utils.R

-read_geno_file <- function(file_path, requires = c("GT")){
+read_geno_file <- function(file_path, requires = c("GT"), ploidy, check=TRUE) {
  if (grepl("\\.csv$", file_path)) {
    geno <- read.csv(geno_path, header = TRUE, row.names = 1, check.names = FALSE)


Variable 'geno_path' is undefined. It should be 'file_path' based on the function parameter.

Suggested change

- geno <- read.csv(geno_path, header = TRUE, row.names = 1, check.names = FALSE)
+ geno <- read.csv(file_path, header = TRUE, row.names = 1, check.names = FALSE)
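This kind of typo can slip through because R looks up free variables only when they are evaluated.

- A function body that references an undefined `geno_path` is defined without complaint and fails only when the CSV branch actually runs.
- If an object called `geno_path` happens to exist in an enclosing environment, for example the global environment during interactive testing, it is silently used instead.

The sketch below uses hypothetical, reduced versions of the CSV branch.

```r
# Hypothetical reduced versions of the CSV branch of read_geno_file().
read_csv_buggy <- function(file_path) {
  read.csv(geno_path, header = TRUE, row.names = 1, check.names = FALSE)  # typo
}
read_csv_fixed <- function(file_path) {
  read.csv(file_path, header = TRUE, row.names = 1, check.names = FALSE)
}

csv <- tempfile(fileext = ".csv")
write.csv(data.frame(S1 = c(0, 1, 2), row.names = c("m1", "m2", "m3")), csv)

# Defining read_csv_buggy() succeeded; calling it fails with
# "object 'geno_path' not found" when no such object exists in an enclosing environment.
err  <- tryCatch(read_csv_buggy(csv), error = function(e) conditionMessage(e))
geno <- read_csv_fixed(csv)
```

`R CMD check` reports references like this as "no visible binding for global variable" notes, which is another way to catch them before they reach a user.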

Copilot · 2025-08-25T20:23:46Z

R/vcf_sanity_check.R

+    # --- CHROM and POS column checks ---
+    chrom_pos <- do.call(rbind, chrom_pos)
+    checks["chrom_info"] <- all(chrom_pos[,1] != "." | chrom_pos[,1] != "" | !is.na(chrom_pos[,1]))
+    checks["pos_info"] <- all(chrom_pos[,2] != "." | chrom_pos[,2] != "" | !is.na(chrom_pos[,2]))


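Logical error in condition: a value cannot equal both "." and "" at the same time, so joining the tests with OR (|) makes every non-missing element TRUE (and an NA element NA), which means the check can never return FALSE. Use AND (&) so that every value must be non-missing: all(chrom_pos[,1] != "." & chrom_pos[,1] != "" & !is.na(chrom_pos[,1]))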

Suggested change

- checks["pos_info"] <- all(chrom_pos[,2] != "." | chrom_pos[,2] != "" | !is.na(chrom_pos[,2]))
+ checks["chrom_info"] <- all(chrom_pos[,1] != "." & chrom_pos[,1] != "" & !is.na(chrom_pos[,1]))
+ checks["pos_info"] <- all(chrom_pos[,2] != "." & chrom_pos[,2] != "" & !is.na(chrom_pos[,2]))
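The difference is easy to see on a small vector containing one valid value, the two placeholder encodings and an NA. The values are illustrative only.

```r
vals <- c("chr1", ".", "", NA)

or_check  <- vals != "." | vals != "" | !is.na(vals)
and_check <- vals != "." & vals != "" & !is.na(vals)

or_check         # TRUE TRUE TRUE NA: the placeholders pass
and_check        # TRUE FALSE FALSE FALSE: only "chr1" passes
all(or_check)    # NA, never FALSE
all(and_check)   # FALSE, so the check fails as intended
```

With the OR version, `all()` returns NA rather than FALSE when NAs are present. Any downstream `if ()` on that element would then stop with "missing value where TRUE/FALSE needed" instead of reporting a failed check.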

Copilot · 2025-08-25T20:23:46Z

R/vcf_sanity_check.R

+    # --- CHROM and POS column checks ---
+    chrom_pos <- do.call(rbind, chrom_pos)
+    checks["chrom_info"] <- all(chrom_pos[,1] != "." | chrom_pos[,1] != "" | !is.na(chrom_pos[,1]))
+    checks["pos_info"] <- all(chrom_pos[,2] != "." | chrom_pos[,2] != "" | !is.na(chrom_pos[,2]))


Same logical error as above, here for the POS column: the OR (|) chain means the check can never return FALSE. Use AND (&) operators: all(chrom_pos[,2] != "." & chrom_pos[,2] != "" & !is.na(chrom_pos[,2]))

Suggested change

- checks["pos_info"] <- all(chrom_pos[,2] != "." | chrom_pos[,2] != "" | !is.na(chrom_pos[,2]))
+ checks["chrom_info"] <- all(chrom_pos[,1] != "." & chrom_pos[,1] != "" & !is.na(chrom_pos[,1]))
+ checks["pos_info"] <- all(chrom_pos[,2] != "." & chrom_pos[,2] != "" & !is.na(chrom_pos[,2]))

polyrad_bugfix

Cristianetaniguti and others added 30 commits April 23, 2025 15:24

#32 in progress

1d21ee3

in progress

0e99c89

avoid if length > 1

198913c

add error

6c44f19

Merge pull request #138 from Breeding-Insight/noUpdog_fix

8a8da90

avoid if length > 1

Merge pull request #139 from Breeding-Insight/development

7100112

Development

Merge branch 'vcf_sanity_check' of https://github.com/Breeding-Insigh…
…t/BIGapp into vcf_sanity_check

e6a3047

check FORMAT consistancy + tests

beef565

GS ok #32

2d383ae

GWAS ok #32

1534267

Diversity ok #32

c2b39de

PCA tab ok #32

b2f015d

Dosage tab ok #32

4144a97

GSAcc tab ok #32

88db67c

DAPC tab ok #32

eb5c568

fix check

7d62494

Switch to Gmatrix

f012389

#32 adding check for un/compressed files

b86b9e3

change to point error

de5498f

removed ref/alt warnings

25ca4b8

updated version

520044a

Merge pull request #140 from Breeding-Insight/vcf_sanity_check

28315c4

Add vcf sanity check - issue #32

Merge branch 'development' into Gmatrix

bf605d5

updated test

48b9246

Merge pull request #141 from Breeding-Insight/Gmatrix

09dc6d0

Switch to Gmatrix

issue #32 complete

54de06e

vcf check never breaks + viewpoly output

8890234

update Dockerfile

d4e41ca

version 1.3.1

52db70b

docker fast build

8a6cb41

Cristianetaniguti and others added 19 commits August 15, 2025 17:03

add vcf_sanity_check to filter tab + export PCA tab

1fa8481

max markers as warning

0737930

error if doesnt have .gz but is compressed

71c5823

up version + BIGr from CRAN

2ad082f

run arm64 #92

a925296

fix format #92

f91617f

sintax fix #92

a303e3d

change schedule for test #92

b3157e4

use a native ARM64 runner #92

3f04519

add BIG citation to help files

e546fa9

add i

9f1a10d

deps manifest okay

bb6e208

adjust version on footer

5abfd30

update manifest decl.

21b00d9

up version

901b813

update images

ef0b54d

option to save removed sample list

0549b03

Merge branch 'development' into vcf_sanity_check

30527fa

Merge pull request #143 from Breeding-Insight/vcf_sanity_check

91f9b9a

Vcf sanity check

alex-sandercock requested review from Cristianetaniguti and Copilot August 25, 2025 20:22

Copilot AI reviewed Aug 25, 2025

View reviewed changes

Cristianetaniguti added 2 commits September 5, 2025 14:48

polyrad_bugfix

1ad81b3

bugfix pipe search

a6e0ee4

Cristianetaniguti approved these changes Sep 9, 2025

View reviewed changes

alex-sandercock and others added 4 commits September 9, 2025 13:58

retain RefAlt info fix

3ed5535

Merge branch 'main' into development

d6a4fce

Merge pull request #147 from Breeding-Insight/polyrad_bugfix

ea5b19e

polyrad_bugfix

fix check

46788c9

alex-sandercock merged commit 04e80e2 into main Sep 23, 2025
4 of 12 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Development#145

Development#145
alex-sandercock merged 59 commits into main from development

alex-sandercock commented Aug 25, 2025

Uh oh!

Copilot AI left a comment

Uh oh!

Copilot AI Aug 25, 2025

Uh oh!

Copilot AI Aug 25, 2025

Uh oh!

Copilot AI Aug 25, 2025

Uh oh!

Copilot AI Aug 25, 2025

Uh oh!

Copilot AI Aug 25, 2025

Uh oh!

Copilot AI Aug 25, 2025

Uh oh!

Copilot AI Aug 25, 2025

Uh oh!

Copilot AI Aug 25, 2025

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

	if (verbose) warning("File is compressed th bzip2 (.bz2), which is not supported.")
	if (verbose) warning("File is compressed with bzip2 (.bz2), which is not supported.")

	warning(paste("Checks could not be performed. Because:", x$message))
	warning(paste("Checks could not be performed. Because:", x$messages))

	geno <- read.csv(geno_path, header = TRUE, row.names = 1, check.names = FALSE)
	geno <- read.csv(file_path, header = TRUE, row.names = 1, check.names = FALSE)

	checks["pos_info"] <- all(chrom_pos[,2] != "." \| chrom_pos[,2] != "" \| !is.na(chrom_pos[,2]))
	checks["chrom_info"] <- all(chrom_pos[,1] != "." & chrom_pos[,1] != "" & !is.na(chrom_pos[,1]))
	checks["pos_info"] <- all(chrom_pos[,2] != "." & chrom_pos[,2] != "" & !is.na(chrom_pos[,2]))

Conversation

alex-sandercock commented Aug 25, 2025

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull Request Overview

Reviewed Changes

Uh oh!

Copilot AI Aug 25, 2025

Choose a reason for hiding this comment

Uh oh!

Copilot AI Aug 25, 2025

Choose a reason for hiding this comment

Uh oh!

Copilot AI Aug 25, 2025

Choose a reason for hiding this comment

Uh oh!

Copilot AI Aug 25, 2025

Choose a reason for hiding this comment

Uh oh!

Copilot AI Aug 25, 2025

Choose a reason for hiding this comment

Uh oh!

Copilot AI Aug 25, 2025

Choose a reason for hiding this comment

Uh oh!

Copilot AI Aug 25, 2025

Choose a reason for hiding this comment

Uh oh!

Copilot AI Aug 25, 2025

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants