Skip to content

chore: shared pipeline under same catalog with compatibility #6143

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 5 commits into from
May 30, 2025

Conversation

shuiyisong
Copy link
Contributor

@shuiyisong shuiyisong commented May 21, 2025

I hereby agree to the terms of the GreptimeDB CLA.

Refer to a related PR or issue link (optional)

What's changed and what's your intention?

As mentioned here, remove the schema in the pipeline information and cache key, so that all databases under the same catalog can access the shared pipelines.

Note: This is a breaking change. If the pipeline is already in use, check if pipelines have the same names but different schemas before upgrading.

Allow shared pipeline under the same catalog.

  1. The newly created pipeline would be labeled under an empty string schema. Note, this would include the 'update' process, which means only the empty string schema is getting 'updated' or created.
  2. The query of the pipeline would try to be as compatible as possible

cache

  1. Try to find the pipeline labeled empty string schema, return if found.
  2. Try to find the pipeline with the current schema, return if found.
  3. Search all pipelines with the name and version suffix: if exactly one is found, return; else, throw an error.

database

  1. Find all pipelines with the given name and version
  2. If only one result is found, use the result
  3. Check if there's an empty string schema result, use it if found
  4. Check if there's a result labeled with the current schema, use it if found
  5. Throw an error for multiple schema pipelines are found, but none of them is under the empty string schema and the current schema

PR Checklist

Please convert it to a draft if some of the following conditions are not met.

  • I have written the necessary rustdoc comments.
  • I have added the necessary unit tests and integration tests.
  • This PR requires documentation updates.
  • API changes are backward compatible.
  • Schema or data changes are backward compatible.

@shuiyisong shuiyisong requested a review from a team as a code owner May 21, 2025 06:53
@github-actions github-actions bot added the docs-not-required This change does not impact docs. label May 21, 2025
@killme2008
Copy link
Contributor

Note: This is a breaking change. If the pipeline is already in use, check if pipelines have the same names but different schemas before upgrading.

I am afraid some users already use it in this way. The breaking may not be acceptable.

@sunng87
Copy link
Member

sunng87 commented May 21, 2025

Is it possible to do it in a semi-compatible way:

  • Always find pipeline by name
    • if there are pipelines with same name from different schema, use the pipeline of current schema
    • otherwise use the only pipeline returned (this is the breaking behaviour)
  • Do not store schema for newly created pipeline so there won't be duplicated pipelines
  • When updated existed pipeline, keep the schema name

@shuiyisong
Copy link
Contributor Author

shuiyisong commented May 21, 2025

Is it possible to do it in a semi-compatible way:

* Always find pipeline by name
  
  * if there are pipelines with same name from different schema, use the pipeline of current schema
  * otherwise use the only pipeline returned (this is the breaking behaviour)

* Do not store schema for newly created pipeline so there won't be duplicated pipelines

* When updated existed pipeline, keep the schema name

There is one obscure case. If there are multiple pipelines with the same name but none of them is under the current schema.

This should be rare, we can throw an error if it occurs. Is this acceptable?

@sunng87
Copy link
Member

sunng87 commented May 21, 2025

This should be rare, we can throw an error if it occurs.

Sounds good.

@shuiyisong shuiyisong force-pushed the chore/remove_schema_in_pipeline branch from b8bd0bf to a71ee54 Compare May 22, 2025 06:36
@shuiyisong shuiyisong changed the title chore!: remove schema in pipeline info chore: shared pipeline under same catalog with compatibility May 22, 2025
@shuiyisong
Copy link
Contributor Author

@sunng87 @killme2008 PTAL

@shuiyisong shuiyisong mentioned this pull request May 19, 2025
10 tasks
@killme2008
Copy link
Contributor

I wonder that the cloud catalog doesn't contain a public schema; how do we process it? @shuiyisong

@shuiyisong
Copy link
Contributor Author

I wonder that the cloud catalog doesn't contain a public schema; how do we process it? @shuiyisong

This doesn't check the actual existence of the schema, which in this case is public. It's merely a string field in the greptime_private.pipelines.

@shuiyisong shuiyisong requested a review from sunng87 May 23, 2025 07:53
@paomian
Copy link
Contributor

paomian commented May 27, 2025

Rest LGTM

@sunng87 sunng87 enabled auto-merge May 28, 2025 07:18
@sunng87 sunng87 added this pull request to the merge queue May 30, 2025
Merged via the queue into GreptimeTeam:main with commit af6cf99 May 30, 2025
38 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
docs-not-required This change does not impact docs.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants