Separating provenance data from answers #1230
Merged
Pull request overview
This PR updates the study runtime and storage layer to keep Trrack provenance graphs out of the per-trial answers payload. Provenance is instead persisted as separate per-participant task assets (similar to audio/screen recordings), which improves performance and export/download speed while maintaining backward compatibility for legacy inline provenance.
Changes:
- Removed inline `provenanceGraph` from `StoredAnswer` and added a dedicated `StoredProvenance` model plus normalization/migration helpers.
- Added `StorageEngine` support for saving/loading provenance under a new `provenance/` asset namespace, including migration behavior when legacy inline provenance is encountered.
- Updated replay/analysis UI to fetch provenance via `getProvenance(...)` (falling back to legacy inline provenance when present) and added/updated tests around provenance splitting and persistence.
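
To make the first bullet concrete, here is a minimal sketch of how legacy inline provenance could be split from an answer. The type shapes and the `splitProvenance` and `isPlainObject` names are assumptions made for illustration; the actual helpers in `src/store/provenance.ts` may be named and typed differently.

```typescript
// Hypothetical shapes for illustration only; the real StoredAnswer and
// StoredProvenance types in src/store/types.ts may differ.
type StoredProvenance = Record<string, unknown>;

interface StoredAnswer {
  answer: Record<string, unknown>;
  startTime: number;
  endTime: number;
}

// A legacy answer may still carry its provenance inline.
type LegacyStoredAnswer = StoredAnswer & { provenanceGraph?: unknown };

// Tolerates null, arrays, and primitives, which are treated as "no provenance".
function isPlainObject(value: unknown): value is Record<string, unknown> {
  return typeof value === 'object' && value !== null && !Array.isArray(value);
}

// Returns the answer with the inline field stripped, plus any provenance found.
function splitProvenance(input: LegacyStoredAnswer): {
  answer: StoredAnswer;
  provenance: StoredProvenance | null;
} {
  const { provenanceGraph, ...answer } = input;
  const provenance = isPlainObject(provenanceGraph) ? provenanceGraph : null;
  return { answer, provenance };
}
```

Under these assumptions, a caller would persist `answer` in the answers payload and hand `provenance` (when non-null) to the storage engine separately.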
Reviewed changes
Copilot reviewed 14 out of 14 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| src/store/types.ts | Introduces StoredProvenance and removes provenance from StoredAnswer. |
| src/store/store.tsx | Stops initializing stored answers with inline provenance. |
| src/store/provenance.ts | Adds helpers to normalize/extract/strip legacy inline provenance and split it from answers. |
| src/store/provenance.spec.ts | Unit tests for provenance helper behaviors (null/non-object tolerance and stripping). |
| src/store/hooks/useNextStep.ts | Persists provenance separately via storageEngine.saveProvenance(...) alongside saving answers. |
| src/store/hooks/useNextStep.spec.tsx | Updates mocks/fixtures to accommodate separate provenance persistence. |
| src/storage/types.ts | Updates documentation to reflect provenance being stored separately from answers. |
| src/storage/tests/highLevel.spec.ts | Adds coverage ensuring saveAnswers strips inline provenance and stores it as a provenance asset. |
| src/storage/engines/utils/participantDataRecovery.spec.ts | Updates fixtures for the removed inline provenance field. |
| src/storage/engines/types.ts | Implements saveProvenance/getProvenance, migrates legacy inline provenance in saveAnswers, and includes provenance in copy/delete/snapshot flows. |
| src/components/audioAnalysis/TaskProvenanceTimeline.tsx | Switches the timeline input from answers-derived provenance to an explicit StoredProvenance prop. |
| src/components/audioAnalysis/AudioProvenanceVis.tsx | Loads provenance via storageEngine.getProvenance(...) with legacy fallback and updates all provenance reads accordingly. |
| src/analysis/individualStudy/thinkAloud/ThinkAloudFooter.tsx | Fetches provenance via storage engine for legend building, with legacy fallback. |
| src/analysis/individualStudy/summary/utils.test.ts | Updates fixtures for the removed inline provenance field. |
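
The "legacy fallback" read path mentioned for `AudioProvenanceVis.tsx` and `ThinkAloudFooter.tsx` can be sketched roughly as follows. This is an illustration under stated assumptions, not the PR's code: `InMemoryProvenanceStore`, `loadProvenanceWithFallback`, the key layout, and the method signatures are hypothetical stand-ins for the real `StorageEngine`.

```typescript
// Hypothetical type; the real StoredProvenance may be more specific.
type StoredProvenance = Record<string, unknown>;

// In-memory stand-in for a storage engine (assumed signatures).
class InMemoryProvenanceStore {
  private assets = new Map<string, string>();

  // Assumed key layout under a "provenance/" namespace.
  private key(participantId: string, taskName: string): string {
    return `provenance/${participantId}/${taskName}`;
  }

  async saveProvenance(
    participantId: string,
    taskName: string,
    provenance: StoredProvenance,
  ): Promise<void> {
    this.assets.set(this.key(participantId, taskName), JSON.stringify(provenance));
  }

  async getProvenance(
    participantId: string,
    taskName: string,
  ): Promise<StoredProvenance | null> {
    const raw = this.assets.get(this.key(participantId, taskName));
    return raw === undefined ? null : (JSON.parse(raw) as StoredProvenance);
  }
}

// Prefer the separately stored asset; fall back to legacy inline provenance.
async function loadProvenanceWithFallback(
  store: InMemoryProvenanceStore,
  participantId: string,
  taskName: string,
  legacyAnswer: { provenanceGraph?: unknown } | undefined,
): Promise<StoredProvenance | null> {
  const stored = await store.getProvenance(participantId, taskName);
  if (stored !== null) return stored;
  const inline = legacyAnswer?.provenanceGraph;
  return typeof inline === 'object' && inline !== null && !Array.isArray(inline)
    ? (inline as StoredProvenance)
    : null;
}
```

Checking the separate asset first means a participant that has both formats is read from the new one, which matches the precedence described in the PR.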
JackWilb approved these changes May 22, 2026
JackWilb added a commit that referenced this pull request May 28, 2026
* Refine startup error handling * Handle startup storage fallback and resume alerts * Inline Shell startup helpers * Guard Shell startup participant lookup fallback * Fix typo in library calvi question description * Fix UI break when user rejects audio recording * Add aria-disabled and tab index -1 to mic error icon * Revert record screen field in library-screen-recording * Add screen recording icon to timeline * Address PR comment * Fix mic icon bug if mic permission is disabled * Fix back button enabled in the study replay * Fix replay task bug * Bump the npm_and_yarn group across 1 directory with 2 updates Bumps the npm_and_yarn group with 2 updates in the / directory: [uuid](https://github.com/uuidjs/uuid) and [postcss](https://github.com/postcss/postcss). Updates `uuid` from 11.1.0 to 14.0.0 - [Release notes](https://github.com/uuidjs/uuid/releases) - [Changelog](https://github.com/uuidjs/uuid/blob/main/CHANGELOG.md) - [Commits](uuidjs/uuid@v11.1.0...v14.0.0) Updates `postcss` from 8.5.6 to 8.5.12 - [Release notes](https://github.com/postcss/postcss/releases) - [Changelog](https://github.com/postcss/postcss/blob/main/CHANGELOG.md) - [Commits](postcss/postcss@8.5.6...8.5.12) --- updated-dependencies: - dependency-name: uuid dependency-version: 14.0.0 dependency-type: direct:production dependency-group: npm_and_yarn - dependency-name: postcss dependency-version: 8.5.12 dependency-type: indirect dependency-group: npm_and_yarn ... 
Signed-off-by: dependabot[bot] <support@github.com> * Speed up first load by clearing the hot path from un-necessary awaits Co-authored-by: Copilot <copilot@github.com> * Prevent metadata writes in demo mode * Fix showing app header warning * Address PR comments * Refactor GlobalConfigParser to reduce config fetches * Enhance Shell component with loading overlay and completion check logic * Add error handling and user feedback for participant completion check in Shell component * Refactor Shell component to improve loading state handling and simplify rendering logic * Refactor AuthProvider to enhance loading state handling and conditionally render children based on route match * Eliminate unnecessary await * Handle transient completion-status lookup failures to allow study entry * Fix small issue with operators * Refactor isStorageStartupFailure function to simplify logic and remove unused parameter * Fix microphone permission handling in AppHeader component tests * Fix replay previous-step navigation * Fix replay bug and remove test fixture * Remove currentTrial query param plumbing * Fix dynamic backward navigation * Refactor tests to preserve dynamic child routes from pathname and improve navigation assertions * Initial plan * feat: add component auto-advance timeout options Agent-Logs-Url: https://github.com/revisit-studies/study/sessions/85bdb0ee-0034-45cf-9535-f19f250395ff * test: refine timeout warning messaging Agent-Logs-Url: https://github.com/revisit-studies/study/sessions/85bdb0ee-0034-45cf-9535-f19f250395ff * fix: handle auto-advance warning placeholders consistently Agent-Logs-Url: https://github.com/revisit-studies/study/sessions/85bdb0ee-0034-45cf-9535-f19f250395ff * refactor: simplify NextButton and useNextStep hooks - Remove unnecessary useMemo calls for trivial config value reads in NextButton (nextButtonDisableTime, nextButtonEnableTime, nextOnEnter, previousButtonText) - Remove unnecessary useMemo for modes.dataCollectionEnabled destructuring 
in useNextStep - Fix consistent-return lint error by unconditionally returning cleanup function in nextOnEnter useEffect - Fix prefer-destructuring lint warning in useNextStep * fix: correct E2E test setup for timeout component study - Use correct tab name 'Tests' instead of 'Test Studies' - Activate the Tests tab before opening the study - Handle custom studyEndMsg instead of relying on default message - Simplify study card locator to avoid strict mode violation * Address feedback from review * Separating provenance data from answers (#1230) * moving provenance * ensuring fallback works * Persist provenance as separate assets * Add provenance bulk download export * Ignore coverage output * Fix type error in test --------- Co-authored-by: Jack Wilburn <jackwilburn@tutanota.com> * Fix useNextStep skip evaluation typing * Fix Shell completion check error visibility * Fix test --------- Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: Jay Kim <76601570+yeonkim1213@users.noreply.github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Copilot <copilot@github.com> Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com> Co-authored-by: ZachCutler04 <zach.t.cutler@gmail.com>
Give a longer description of what this PR addresses and why it's needed
We decided to separate the provenance data (which can get quite long) from the answers data so that the answers payload doesn't grow and cause slowdowns in studies with a lot of provenance data.
Added a few tests, but this is a bit difficult to test. I did make sure that it's backwards compatible, so anything in the old provenance format will still work, with the new format taking precedence. Replays still work, which afaik is the only place we actually use provenance. The exported JSON no longer includes provenance data, and a separate button has been added to download provenance (similar to the audio). This might make analysis slightly more difficult, since we'd have to merge provenance back in if you actually wanted to analyse it, but I think it makes sense as the default, to keep JSON download speeds fast.
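
For anyone who does need provenance during analysis, merging the two downloads back together could look something like the sketch below. The record shapes (`ExportedAnswer`, `ExportedProvenance`) and the `mergeProvenance` function are assumptions for illustration; the actual export formats are not described here and may be structured differently.

```typescript
// Hypothetical export shapes; the real download formats may differ.
type StoredProvenance = Record<string, unknown>;

interface ExportedAnswer {
  participantId: string;
  taskName: string;
  answer: Record<string, unknown>;
}

interface ExportedProvenance {
  participantId: string;
  taskName: string;
  provenance: StoredProvenance;
}

type AnswerWithProvenance = ExportedAnswer & { provenance: StoredProvenance | null };

// Builds a collision-safe lookup key from participant and task.
function provenanceKey(participantId: string, taskName: string): string {
  return JSON.stringify([participantId, taskName]);
}

// Joins provenance onto answers; answers without provenance get null.
function mergeProvenance(
  answers: ExportedAnswer[],
  provenance: ExportedProvenance[],
): AnswerWithProvenance[] {
  const byKey = new Map<string, StoredProvenance>();
  for (const p of provenance) {
    byKey.set(provenanceKey(p.participantId, p.taskName), p.provenance);
  }
  return answers.map((a) => ({
    ...a,
    provenance: byKey.get(provenanceKey(a.participantId, a.taskName)) ?? null,
  }));
}
```

Keeping the merge as an explicit analysis step leaves the default JSON export small, and only analyses that use provenance pay the cost of loading it.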