Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix postgres_cdc input #3075

Merged
merged 8 commits into from
Dec 13, 2024
Merged

Fix postgres_cdc input #3075

merged 8 commits into from
Dec 13, 2024

Conversation

mihaitodor
Copy link
Collaborator

@mihaitodor mihaitodor commented Dec 12, 2024

Allow quoted identifiers for the table names.

Note: This is a very naive implementation and we might want to instead just quote the identifier like @rockwotj suggested (and implemented here for Snowflake).

If we always quote, then it's a bit unclear how to validate the identifiers because we can't tell when it genuinely needs to be quoted to apply the alternate validation rules: https://www.postgresql.org/docs/current/sql-syntax-lexical.html

Quoted identifiers can contain any character, except the character with code zero. (To include a double quote, write two double quotes.) This allows constructing table or column names that would otherwise not be possible, such as ones containing spaces or ampersands. The length limitation still applies.

Also, users might get confused if they specify the table name as upper case in the config for a non-quoted table:

> CREATE TABLE MIHAI(ID INT NOT NULL, CATEGORY VARCHAR);
> INSERT INTO "MIHAI"(ID, CATEGORY) VALUES(13, 'FOOBAR');
relation "MIHAI" does not exist
LINE 1: INSERT INTO "MIHAI"(ID, CATEGORY) VALUES(13, 'FOOBAR')

How to test

Run Postgres and connect to it via pgcli:

$ docker run --rm -it -e POSTGRES_USER=testuser -e "POSTGRES_PASSWORD=testpass" -e POSTGRES_DB=testdb -p5432:5432 postgres -c wal_level=logical
$ pgcli postgres://testuser:testpass@localhost:5432/testdb

Create a quoted table:

> CREATE SCHEMA "TestABC";
> CREATE TABLE "TestABC"."FooBar"(ID INT NOT NULL, CATEGORY VARCHAR);
> CREATE TABLE "TestABC"."BarFoo"(ID INT NOT NULL, CATEGORY VARCHAR);

Run Connect:

input:
  postgres_cdc:
    dsn: postgres://testuser:testpass@localhost:5432/testdb?sslmode=disable
    include_transaction_markers: true
    stream_snapshot: false
    snapshot_memory_safety_factor: 1
    snapshot_batch_size: 0
    schema: public # No default (required)
    tables:
      - '"FooBar"'
      - '"BarFoo"'
    checkpoint_limit: 1024
    temporary_slot: false
    slot_name: "mihaifoo" # No default (optional)
    pg_standby_timeout: 10s
    pg_wal_monitor_interval: 3s
    max_parallel_snapshot_tables: 1
    auto_replay_nacks: true

output:
  stdout: {}

Insert rows:

> INSERT INTO "TestABC"."FooBar"(ID, CATEGORY) VALUES(13, 'FOOBAR');
> INSERT INTO "TestABC"."BarFoo"(ID, CATEGORY) VALUES(13, 'FOOBAR');

Comment out the "BarFoo" table in the Connect config, restart it and insert another row in "FooBar".

@mihaitodor mihaitodor requested a review from rockwotj December 12, 2024 19:02
@mihaitodor mihaitodor force-pushed the mihaitodor-fix-postgres-cdc branch 2 times, most recently from c8148a6 to 22076dc Compare December 13, 2024 02:35
Allow quoted identifiers for the table names

Signed-off-by: Mihai Todor <[email protected]>
@mihaitodor mihaitodor force-pushed the mihaitodor-fix-postgres-cdc branch from 22076dc to ec0d162 Compare December 13, 2024 02:37
@rockwotj rockwotj force-pushed the mihaitodor-fix-postgres-cdc branch from 99f61fc to d8f02c4 Compare December 13, 2024 09:52
There is currently a mess of "what does this string mean?", which means
it's time to introduce some typesafety to this problem.

TableFQN is a Schema+Table pair that is prevalidated to not have SQL
injection opportunities and we can pass these around to make things a
bit more clear as well as ensure we're handling quoted stuff correctly.
@rockwotj rockwotj force-pushed the mihaitodor-fix-postgres-cdc branch from d8f02c4 to 20e90bf Compare December 13, 2024 10:00
@rockwotj
Copy link
Collaborator

Okay updates here:

  • Added a new type to clarify we always are working with table FQN (with schema)
  • We only validate identifiers once - at the package entrypoint
  • We validate quoted identifiers

@rockwotj
Copy link
Collaborator

I also updated one of our integration tests to use a PascalCase name to codify the test that Mihai wrote in the PR.

thank you @mihaitodor for this! If you have time today can you double check my commits look OK? If so please merge.

@rockwotj rockwotj requested a review from ligfx December 13, 2024 10:02
@rockwotj rockwotj force-pushed the mihaitodor-fix-postgres-cdc branch from b654c6d to 9b77304 Compare December 13, 2024 10:10
@rockwotj
Copy link
Collaborator

Technically this also opens us up to allowing a single input with different schemas - not sure if you see that as needed @ligfx

Signed-off-by: Mihai Todor <[email protected]>
Signed-off-by: Mihai Todor <[email protected]>
Copy link

@ligfx ligfx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me and works for me locally. Love the normalization of passed in identifiers.

@@ -363,28 +363,72 @@ func SQLQuery(sql string, args ...any) (string, error) {
return query.Sanitize(args...)
}

// ValidatePostgresIdentifier checks if a string is a valid PostgreSQL identifier
// QuotePostgresIdentifier returns the valid escaped identifier.
func QuotePostgresIdentifier(name string) string {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

return `"` + name.Replace(`"`, `""`) + `"`

?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah that's simpler. This happens to be faster, but this isn't on a critical path. I'll fix it next time I hack on this input

@rockwotj rockwotj force-pushed the mihaitodor-fix-postgres-cdc branch from 90218b1 to 9e48cc7 Compare December 13, 2024 16:21
@rockwotj rockwotj force-pushed the mihaitodor-fix-postgres-cdc branch from e15bfa5 to 5606ca9 Compare December 13, 2024 16:53
@rockwotj rockwotj merged commit 881e6b2 into main Dec 13, 2024
4 checks passed
@rockwotj rockwotj deleted the mihaitodor-fix-postgres-cdc branch December 13, 2024 16:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants