Skip to content

write_derived: skip metadata regeneration unless csv/spec-list actually change#255

Open
andersone1 wants to merge 38 commits intomainfrom
check-if-need-update
Open

write_derived: skip metadata regeneration unless csv/spec-list actually change#255
andersone1 wants to merge 38 commits intomainfrom
check-if-need-update

Conversation

@andersone1
Copy link
Collaborator

@andersone1 andersone1 commented Feb 13, 2026

📝 Summary

  • Skip unnecessary metadata regeneration: xpt and define documents are now only regenerated when the .csv or spec-list content actually changes (md5-based check), avoiding noisy version-control diffs from embedded timestamps.
  • Add spec diffing: New execute_spec_diffs() function detects added, removed, and updated spec variables/fields alongside the existing data diffs.
  • Redesigned diff summary output: Replaces the old diffs.csv artifact with a human-readable diff-summary.txt that includes SVN author/revision/date metadata, structured into Data Changes and Spec Changes sections.
  • Improved diff formatting: Data diffs now report absolute counts with parenthetical deltas (e.g., 1020 (+5)) instead of descriptive strings. Variable-level diffs are also separated from standard summary rows (Rows, Columns, Subjects).
  • Legacy cleanup: Automatically detects and removes old-style metadata folders containing diffs.csv before regenerating.
  • Removed dead code: Deleted get_base_df(), diffdf_value_changes_to_df(), get_sdtm_lookup(), and the bundled sdtm-lookup.yaml data file.
  • Dependency cleanup: Replaced stringr with tidyselect for the comma check (now uses grepl with fixed = TRUE for speed); added fs as a dependency.

📂 New Files

File Purpose
R/get-svn-baseline.R Replaces get_base_df() — retrieves baseline from SVN with author/rev/date metadata, supports arbitrary file readers.
R/execute-spec-diffs.R Compares two spec-list objects and summarizes changes.
R/build-run-summary-lines.R Formats the structured diff summary for console and file output.

🧪 Test Coverage

  • Added: test-write-derived-scenarios.R (693 lines) covering end-to-end scenarios for write_derived().
  • Added: Tests for get_svn_baseline(), execute_spec_diffs(), and build_run_summary_lines().
  • Removed: Obsolete tests for deleted functions (get_base_df, diffdf_value_changes_to_df).
  • Net change: +1,201 / -119 lines in tests.

…ly change

  - Always write the output CSV and `spec-list.yml`.
  - Determine `.needs_update` from old vs new MD5 of:
    - `.file` (CSV)
    - `<meta>/spec-list.yml`

  `.needs_update` is TRUE in these cases:
  1. CSV did not exist before this call.
  2. `spec-list.yml` did not exist before this call.
  3. CSV content changed.
  4. `spec-list.yml` content changed.

  When `.needs_update` is TRUE:
  - regenerate XPT
  - rerender define output
  - run diff generation only if `base_df` exists and `.execute_diffs` is TRUE
  - write `diffs.csv` only if generated diffs are non-empty (no delete/overwrite when empty)
  - as a result, `diffs.csv` reflects the most recent run that produced at least one diff

  When `.needs_update` is FALSE:
  - skip XPT and define regeneration
  - skip diff generation (even if `.prev_file` changes)
  - keep any existing `diffs.csv` unchanged (no delete/overwrite)

  tests:
  - assert `diffs.csv` is not created when no diffs are found
  - assert XPT is not rewritten when rerunning with unchanged CSV/spec-list
  - assert existing `diffs.csv` is preserved (mtime unchanged) when no new diffs are generated
  - assert diff generation is skipped when `.needs_update` is FALSE
…leanup

  - write human-readable latest-data-diff.txt (timestamp/user header + same diff rows)
  - keep skip-if-unchanged behavior intact
  - if metadata contains legacy diffs.csv, show clear cli warning and recreate folder
  - update docs and tests for new output + backward-compatibility migration
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant