
feature: runWithOutputs() #185

Closed
andersone1 wants to merge 7 commits into main from outputs-various



@andersone1 andersone1 commented Dec 18, 2025

Feature: Track script side effects with runWithOutputs()

Summary

This PR introduces runWithOutputs(), a utility that executes an R script in isolation and monitors the file system for changes. It is useful for auditing data pipelines, confirming that scripts produce the expected outputs, and identifying unintended side effects.

Changes

🚀 New Functionality

  • runWithOutputs(): Executes an R script via callr::rscript and identifies created or modified files by comparing file system snapshots (path, size, and modification time) before and after execution.
  • Isolated Execution: Runs the script in a separate R process so that it cannot read or modify the calling session's global environment.
  • Automatic Reporting: Prints a clean, YAML-formatted summary of changed files to the console using cli.
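
A minimal sketch of how such a file system snapshot might be taken, using only base R. The helper name snapshot_files() and its arguments are hypothetical and are not taken from this PR; the actual implementation may differ.

```r
# Hypothetical helper, not the PR's implementation: record path, size, and
# modification time for every file under `root`, skipping excluded directories.
snapshot_files <- function(root = ".", exclude_dirs = c(".git", "renv")) {
  # Relative paths of all files (hidden ones included), "/"-separated
  rel_path <- list.files(root, recursive = TRUE, all.files = TRUE)

  # Drop files whose top-level path component is an excluded directory
  top_dir <- vapply(strsplit(rel_path, "/", fixed = TRUE), `[`, character(1), 1)
  rel_path <- rel_path[!(top_dir %in% exclude_dirs)]

  info <- file.info(file.path(root, rel_path), extra_cols = FALSE)
  data.frame(
    rel_path = rel_path,
    size = info$size,
    modification_time = info$mtime,
    stringsAsFactors = FALSE
  )
}

# Usage: snapshot a scratch directory containing one tracked and one excluded file
root <- tempfile("snapshot-demo-")
dir.create(file.path(root, ".git"), recursive = TRUE)
writeLines("a", file.path(root, "a.txt"))
writeLines("ref", file.path(root, ".git", "HEAD"))
snap <- snapshot_files(root)
```

Taking one snapshot before the script runs and one after gives two tables that can then be compared row by row.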

🛠 Infrastructure & Dependencies

  • New Dependencies: Added callr, here, yaml, and purrr to Imports.

  • Package Documentation: Added a new "Execution Tools" category to the pkgdown site and exported the function in NAMESPACE.

  • Global Variables: Updated R/reviewPackage.R to declare rel_path, modification_time, and size as global variables so that R CMD check does not flag them as undefined.

🧪 Testing

Added a test suite (tests/testthat/test-runWithOutputs.R) covering:

  • Creation of new files and modification of existing files.
  • Detection of "touched" files (timestamp changes with identical content).
  • Correct behavior of exclude_dirs (e.g., ignoring .git or renv changes).
  • Graceful error handling when a script fails.

Example Usage

# Run a script and capture the relative paths of generated files
outputs <- runWithOutputs("scripts/process_data.R")

# Example Console Output:
# ── runWithOutputs('scripts/process_data.R') ─────────────────── START
# [Script output here...]
# ── runWithOutputs() ──────────────────────────────────────── COMPLETE
# ✔ Files saved by this run:
# outputs:
#   - data/processed_results.csv
#   - plots/diagnostic_plot.png

Technical Notes

  • Comparison Logic: The function uses dplyr::anti_join on a composite key of (path, modification_time, size). This ensures that even if a file's size remains identical, a change in modification time (a "touch") is still captured.
  • Path Handling: All paths are resolved relative to the project root (defaulting to here::here()) for consistency across different environments.
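
The comparison described above can be sketched with two toy snapshots. The data below is invented purely for illustration, and the direction of the join (rows of the "after" snapshot with no exact match in "before") is an assumption about how the function applies dplyr::anti_join.

```r
library(dplyr)

# Toy snapshots; values are illustrative only
t0 <- as.POSIXct("2025-12-18 10:00:00", tz = "UTC")
t1 <- as.POSIXct("2025-12-18 10:05:00", tz = "UTC")

before <- data.frame(
  rel_path = c("data/raw.csv", "data/processed_results.csv"),
  size = c(120, 300),
  modification_time = c(t0, t0)
)

after <- data.frame(
  rel_path = c("data/raw.csv", "data/processed_results.csv",
               "plots/diagnostic_plot.png"),
  size = c(120, 300, 2048),
  modification_time = c(t0, t1, t1)  # second file touched, third file new
)

# Keep rows of `after` that have no exact (path, mtime, size) match in `before`
changed <- anti_join(after, before,
                     by = c("rel_path", "modification_time", "size"))
changed$rel_path
# "data/processed_results.csv" "plots/diagnostic_plot.png"
```

With this join direction, a file deleted by the script would not appear in the result, since it has no row in the "after" snapshot.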

@andersone1 andersone1 changed the title Outputs various feature: runWithOutputs() Dec 18, 2025
@seth127 seth127 marked this pull request as draft December 18, 2025 22:17
@andersone1 andersone1 closed this Jan 21, 2026
@andersone1 andersone1 deleted the outputs-various branch January 21, 2026 20:21