Separating provenance data from answers #1230

Merged
JackWilb merged 8 commits into dev from zc/separateProv on May 22, 2026

@ZachCutler04
Contributor

Give a longer description of what this PR addresses and why it's needed

We decided to separate the provenance data (which can get quite long) from the answers data, so that the answers payload doesn't grow and cause slowdowns in studies that record a lot of provenance.

Added a few tests, but this is a bit difficult to test. I made sure that it's backwards compatible, so anything stored in the old provenance format will still work, with the new format favored when both are present. Replays still work, which afaik is the only place we actually use provenance. The exported JSON no longer includes provenance data, and a separate button has been added to download provenance (similar to the audio). This might make analysis slightly more difficult, since we'd have to merge the provenance back in to actually analyse it, but I think it makes sense as the default, to keep JSON downloads fast.
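For anyone who does need provenance during analysis, the merge step mentioned above could look roughly like the sketch below. The shapes `ExportedParticipant` and `ProvenanceExport` and the function `mergeProvenance` are assumptions made for illustration; the actual export layout is not described in this PR.

```typescript
// Hypothetical shapes: the real export formats are not shown in this PR.
type ProvenanceGraph = Record<string, unknown>;

interface ExportedAnswer {
  answer: Record<string, unknown>;
}

interface ExportedParticipant {
  participantId: string;
  // Assumed: answers keyed by a per-trial identifier.
  answers: Record<string, ExportedAnswer>;
}

// Assumed layout of the separate provenance download:
// participantId -> trial identifier -> provenance graph.
type ProvenanceExport = Record<string, Record<string, ProvenanceGraph>>;

type AnswerWithProvenance = ExportedAnswer & { provenance?: ProvenanceGraph };

interface MergedParticipant {
  participantId: string;
  answers: Record<string, AnswerWithProvenance>;
}

// Attach each trial's provenance graph (when one exists) to its answer.
function mergeProvenance(
  participants: ExportedParticipant[],
  provenance: ProvenanceExport,
): MergedParticipant[] {
  return participants.map((participant) => {
    const byTrial = provenance[participant.participantId] ?? {};
    const answers: Record<string, AnswerWithProvenance> = {};
    for (const [trialId, answer] of Object.entries(participant.answers)) {
      const graph = byTrial[trialId];
      // Trials without provenance are copied through unchanged.
      answers[trialId] = graph === undefined
        ? { ...answer }
        : { ...answer, provenance: graph };
    }
    return { participantId: participant.participantId, answers };
  });
}
```

Keeping the merge as an explicit, opt-in step matches the trade-off described above: the default JSON export stays small, and only analyses that use provenance pay for loading it.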

@ZachCutler04 ZachCutler04 requested review from JackWilb and Copilot May 18, 2026 06:29
@github-actions

github-actions Bot commented May 18, 2026

🪓 PR closed, deleted preview.

Contributor

Copilot AI left a comment

Pull request overview

This PR updates the study runtime + storage layer to keep Trrack provenance graphs out of the per-trial answers payload and instead persist provenance as separate per-participant task assets (similar to audio/screen recordings), improving performance and export/download speed while maintaining backward compatibility for legacy inline provenance.

Changes:

  • Removed inline provenanceGraph from StoredAnswer and added a dedicated StoredProvenance model plus normalization/migration helpers.
  • Added StorageEngine support for saving/loading provenance under a new provenance/ asset namespace, including migration behavior when legacy inline provenance is encountered.
  • Updated replay/analysis UI to fetch provenance via getProvenance(...) (falling back to legacy inline provenance when present) and added/updated tests around provenance splitting and persistence.
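A minimal sketch of what a "strip and extract" helper for legacy inline provenance might look like. Only the field name `provenanceGraph` comes from the summary above; `splitProvenance`, `isRecord`, and the return shape are hypothetical and are not taken from `src/store/provenance.ts`.

```typescript
type ProvenanceGraph = Record<string, unknown>;

interface SplitResult {
  // The stored answer with any inline provenance removed, or null if the input was unusable.
  answer: Record<string, unknown> | null;
  // The extracted legacy provenance graph, or null if none was present.
  provenance: ProvenanceGraph | null;
}

function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === 'object' && value !== null && !Array.isArray(value);
}

// Hypothetical helper: tolerate null and non-object input, and split the
// legacy inline `provenanceGraph` field away from the rest of the answer.
function splitProvenance(stored: unknown): SplitResult {
  if (!isRecord(stored)) {
    return { answer: null, provenance: null };
  }
  const { provenanceGraph, ...rest } = stored;
  return {
    answer: rest,
    provenance: isRecord(provenanceGraph) ? provenanceGraph : null,
  };
}
```

Accepting `unknown` instead of a typed answer is a deliberate choice in this sketch: legacy records may be malformed, and the helper should degrade to "no provenance" without throwing.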

Reviewed changes

Copilot reviewed 14 out of 14 changed files in this pull request and generated no comments.

Show a summary per file
| File | Description |
| --- | --- |
| src/store/types.ts | Introduces StoredProvenance and removes provenance from StoredAnswer. |
| src/store/store.tsx | Stops initializing stored answers with inline provenance. |
| src/store/provenance.ts | Adds helpers to normalize/extract/strip legacy inline provenance and split it from answers. |
| src/store/provenance.spec.ts | Unit tests for provenance helper behaviors (null/non-object tolerance and stripping). |
| src/store/hooks/useNextStep.ts | Persists provenance separately via storageEngine.saveProvenance(...) alongside saving answers. |
| src/store/hooks/useNextStep.spec.tsx | Updates mocks/fixtures to accommodate separate provenance persistence. |
| src/storage/types.ts | Updates documentation to reflect provenance being stored separately from answers. |
| src/storage/tests/highLevel.spec.ts | Adds coverage ensuring saveAnswers strips inline provenance and stores it as a provenance asset. |
| src/storage/engines/utils/participantDataRecovery.spec.ts | Updates fixtures for the removed inline provenance field. |
| src/storage/engines/types.ts | Implements saveProvenance/getProvenance, migrates legacy inline provenance in saveAnswers, and includes provenance in copy/delete/snapshot flows. |
| src/components/audioAnalysis/TaskProvenanceTimeline.tsx | Switches the timeline input from answers-derived provenance to an explicit StoredProvenance prop. |
| src/components/audioAnalysis/AudioProvenanceVis.tsx | Loads provenance via storageEngine.getProvenance(...) with legacy fallback and updates all provenance reads accordingly. |
| src/analysis/individualStudy/thinkAloud/ThinkAloudFooter.tsx | Fetches provenance via storage engine for legend building, with legacy fallback. |
| src/analysis/individualStudy/summary/utils.test.ts | Updates fixtures for the removed inline provenance field. |
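The save/load flow with legacy fallback described in the file summary can be illustrated with a small in-memory stand-in. This is a sketch under stated assumptions, not the project's StorageEngine: the method names `saveProvenance` and `getProvenance` and the `provenance/` namespace appear in the review, but the class `InMemoryProvenanceStore`, the parameter lists, and the key layout are invented for illustration.

```typescript
type ProvenanceGraph = Record<string, unknown>;

// Assumed shape of a legacy answer that still carries inline provenance.
interface LegacyAnswer {
  provenanceGraph?: ProvenanceGraph;
}

// Hypothetical in-memory stand-in for a storage engine.
class InMemoryProvenanceStore {
  private assets = new Map<string, ProvenanceGraph>();

  // Assumed key layout under the `provenance/` namespace.
  private key(participantId: string, task: string): string {
    return `provenance/${participantId}/${task}`;
  }

  async saveProvenance(
    participantId: string,
    task: string,
    graph: ProvenanceGraph,
  ): Promise<void> {
    this.assets.set(this.key(participantId, task), graph);
  }

  // Prefer the separately stored asset; fall back to legacy inline
  // provenance when no asset exists; otherwise return null.
  async getProvenance(
    participantId: string,
    task: string,
    legacyAnswer?: LegacyAnswer,
  ): Promise<ProvenanceGraph | null> {
    const stored = this.assets.get(this.key(participantId, task));
    return stored ?? legacyAnswer?.provenanceGraph ?? null;
  }
}
```

The lookup order (new asset first, legacy inline second) mirrors the "favoring the new format" behavior stated in the PR description.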

@JackWilb JackWilb merged commit 3af76fb into dev May 22, 2026
6 of 7 checks passed
@JackWilb JackWilb deleted the zc/separateProv branch May 22, 2026 19:53
JackWilb added a commit that referenced this pull request May 28, 2026
* Refine startup error handling

* Handle startup storage fallback and resume alerts

* Inline Shell startup helpers

* Guard Shell startup participant lookup fallback

* Fix typo in library calvi question description

* Fix UI break when user rejects audio recording

* Add aria-disabled and tab index -1 to mic error icon

* Revert record screen field in library-screen-recording

* Add screen recording icon to timeline

* Address PR comment

* Fix mic icon bug if mic permission is disabled

* Fix back button enabled in the study replay

* Fix replay task bug

* Bump the npm_and_yarn group across 1 directory with 2 updates

Bumps the npm_and_yarn group with 2 updates in the / directory: [uuid](https://github.com/uuidjs/uuid) and [postcss](https://github.com/postcss/postcss).


Updates `uuid` from 11.1.0 to 14.0.0
- [Release notes](https://github.com/uuidjs/uuid/releases)
- [Changelog](https://github.com/uuidjs/uuid/blob/main/CHANGELOG.md)
- [Commits](uuidjs/uuid@v11.1.0...v14.0.0)

Updates `postcss` from 8.5.6 to 8.5.12
- [Release notes](https://github.com/postcss/postcss/releases)
- [Changelog](https://github.com/postcss/postcss/blob/main/CHANGELOG.md)
- [Commits](postcss/postcss@8.5.6...8.5.12)

---
updated-dependencies:
- dependency-name: uuid
  dependency-version: 14.0.0
  dependency-type: direct:production
  dependency-group: npm_and_yarn
- dependency-name: postcss
  dependency-version: 8.5.12
  dependency-type: indirect
  dependency-group: npm_and_yarn
...

Signed-off-by: dependabot[bot] <support@github.com>

* Speed up first load by clearing the hot path from un-necessary awaits

Co-authored-by: Copilot <copilot@github.com>

* Prevent metadata writes in demo mode

* Fix showing app header warning

* Address PR comments

* Refactor GlobalConfigParser to reduce config fetches

* Enhance Shell component with loading overlay and completion check logic

* Add error handling and user feedback for participant completion check in Shell component

* Refactor Shell component to improve loading state handling and simplify rendering logic

* Refactor AuthProvider to enhance loading state handling and conditionally render children based on route match

* Eliminate unnecessary await

* Handle transient completion-status lookup failures to allow study entry

* Fix small issue with operators

* Refactor isStorageStartupFailure function to simplify logic and remove unused parameter

* Fix microphone permission handling in AppHeader component tests

* Fix replay previous-step navigation

* Fix replay bug and remove test fixture

* Remove currentTrial query param plumbing

* Fix dynamic backward navigation

* Refactor tests to preserve dynamic child routes from pathname and improve navigation assertions

* Initial plan

* feat: add component auto-advance timeout options

Agent-Logs-Url: https://github.com/revisit-studies/study/sessions/85bdb0ee-0034-45cf-9535-f19f250395ff

* test: refine timeout warning messaging

Agent-Logs-Url: https://github.com/revisit-studies/study/sessions/85bdb0ee-0034-45cf-9535-f19f250395ff

* fix: handle auto-advance warning placeholders consistently

Agent-Logs-Url: https://github.com/revisit-studies/study/sessions/85bdb0ee-0034-45cf-9535-f19f250395ff

* refactor: simplify NextButton and useNextStep hooks

- Remove unnecessary useMemo calls for trivial config value reads
  in NextButton (nextButtonDisableTime, nextButtonEnableTime,
  nextOnEnter, previousButtonText)
- Remove unnecessary useMemo for modes.dataCollectionEnabled
  destructuring in useNextStep
- Fix consistent-return lint error by unconditionally returning
  cleanup function in nextOnEnter useEffect
- Fix prefer-destructuring lint warning in useNextStep

* fix: correct E2E test setup for timeout component study

- Use correct tab name 'Tests' instead of 'Test Studies'
- Activate the Tests tab before opening the study
- Handle custom studyEndMsg instead of relying on default message
- Simplify study card locator to avoid strict mode violation

* Address feedback from review

* Separating provenance data from answers (#1230)

* moving provenance

* ensuring fallback works

* Persist provenance as separate assets

* Add provenance bulk download export

* Ignore coverage output

* Fix type error in test

---------

Co-authored-by: Jack Wilburn <jackwilburn@tutanota.com>

* Fix useNextStep skip evaluation typing

* Fix Shell completion check error visibility

* Fix test

---------

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: Jay Kim <76601570+yeonkim1213@users.noreply.github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Copilot <copilot@github.com>
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: ZachCutler04 <zach.t.cutler@gmail.com>

3 participants