@bobbai00 (Contributor) commented Sep 22, 2025

Overview

This PR adds a set of example workflows and datasets, along with the Dockerfile for building an image that loads them. The image has been used to load the example data since April 2025, but it has been managed separately in another repository. Moving it into the main repository makes it easier to maintain, and we can later curate more example datasets and workflows.

Changes

  • Added texera-example-data-loader.dockerfile
  • Added two example datasets and two example workflows
  • Added setup scripts for the example datasets and workflows

@bobbai00 self-assigned this Sep 22, 2025
@bobbai00 changed the title from "feat(deployment): add the image of loading example datasets and workflows" to "feat(deployment): add example datasets, workflows and the dockerfile for the loader image" Sep 22, 2025
@bobbai00 requested a review from aicam on September 22, 2025 at 06:42
@bobbai00 force-pushed the feat/texera-examples-loader branch from b543591 to 47f6ad7 on September 22, 2025 at 06:44
@chenlica (Contributor) commented

@kunwp1 Can you also review this PR?

@bobbai00 force-pushed the feat/texera-examples-loader branch from 47f6ad7 to df90d03 on September 22, 2025 at 21:04

@kunwp1 (Contributor) left a comment

@bobbai00 I reviewed your PR and left some comments.

Contributor

Right now we keep two example workflows as hard-coded JSON in the repo. If the workflow JSON spec changes, these files can silently break. To avoid regressions, could we adopt one of these approaches?

  1. Generate on setup: Have the setup script create the two example JSON files dynamically, either from the current schema or with a small generator.
  2. Enforce consistency in CI: Add a guard that fails when the JSON spec (or template) changes without updating the example workflows.
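For option 2, a minimal sketch of such a guard, assuming the workflow JSON spec lives in a single file and that a hypothetical `examples.lock.json` committed next to the example workflows records the spec's SHA-256 digest. The file names, the script name `check_example_workflows.py`, and the helpers below are all placeholders, not existing paths or functions in the repo.

```python
import hashlib
import json
import sys
from pathlib import Path


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's raw bytes."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def write_lock(schema_path: Path, lock_path: Path) -> None:
    """Record the current spec digest in the (hypothetical) lock file.

    Meant to be run by hand after the example workflows were updated
    for the new spec.
    """
    lock_path.write_text(json.dumps({"schema_sha256": file_digest(schema_path)}, indent=2))


def check_examples_in_sync(schema_path: Path, lock_path: Path) -> bool:
    """Return True if the spec still matches the digest recorded in the lock file."""
    recorded = json.loads(lock_path.read_text())["schema_sha256"]
    return recorded == file_digest(schema_path)


def main(argv: list[str]) -> int:
    """Exit code for CI: 0 = in sync, 1 = spec changed, 2 = bad usage."""
    if len(argv) != 2:
        print("usage: check_example_workflows.py SCHEMA_PATH LOCK_PATH", file=sys.stderr)
        return 2
    schema_path, lock_path = Path(argv[0]), Path(argv[1])
    if check_examples_in_sync(schema_path, lock_path):
        return 0
    print(
        f"{schema_path} changed since the example workflows were last updated; "
        f"update the examples, then regenerate {lock_path}.",
        file=sys.stderr,
    )
    return 1


# A script entry point would call: sys.exit(main(sys.argv[1:]))
```

The check is deliberately coarse: any byte-level change to the spec fails CI until someone looks at the examples and regenerates the lock with `write_lock`, which is what turns a silent break into a visible one.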

Contributor

Could we add brief docs for each setup script: what it does and when to run it?

Contributor

ditto

4 participants