[source-redshift] Can't read schema if schema contains late-binding views #48832

Open
1 task
JakeCowton opened this issue Dec 6, 2024 · 2 comments

Comments

@JakeCowton

Connector Name

source-redshift

Connector Version

0.5.2

What step the error happened?

Configuring a new connector

Relevant information

If a Redshift source contains a late-binding view, building a connection from that source fails: schema discovery throws a NullPointerException (null value in entry: isNullable=null) while scanning the schema for streams.

If you select only schemas that don't contain late-binding views, then it works fine.

Relevant log output

2024-12-06 13:58:20 platform > Checking if airbyte/source-redshift:0.5.2 exists...
2024-12-06 13:58:20 platform > airbyte/source-redshift:0.5.2 was found locally.
2024-12-06 13:58:20 platform > Creating docker container = source-redshift-discover-b0eaaae5-78fd-4501-bd51-12414cb30fca-0-kdxgm with resources io.airbyte.config.ResourceRequirements@4a11f47[cpuRequest=,cpuLimit=8,memoryRequest=,memoryLimit=60000M,additionalProperties={}] and allowedHosts null
2024-12-06 13:58:20 platform > Preparing command: docker run --rm --init -i -w /data/b0eaaae5-78fd-4501-bd51-12414cb30fca/0 --log-driver none --name source-redshift-discover-b0eaaae5-78fd-4501-bd51-12414cb30fca-0-kdxgm --network host -v airbyte_workspace:/data -v oss_local_root:/local -e DEPLOYMENT_MODE=OSS -e WORKER_CONNECTOR_IMAGE=airbyte/source-redshift:0.5.2 -e AUTO_DETECT_SCHEMA=true -e LAUNCHDARKLY_KEY= -e SOCAT_KUBE_CPU_REQUEST=0.1 -e SOCAT_KUBE_CPU_LIMIT=2.0 -e FIELD_SELECTION_WORKSPACES= -e USE_STREAM_CAPABLE_STATE=true -e AIRBYTE_ROLE=dev -e WORKER_ENVIRONMENT=DOCKER -e APPLY_FIELD_SELECTION=false -e WORKER_JOB_ATTEMPT=0 -e OTEL_COLLECTOR_ENDPOINT=http://host.docker.internal:4317 -e FEATURE_FLAG_CLIENT=config -e AIRBYTE_VERSION=0.63.13 -e WORKER_JOB_ID=b0eaaae5-78fd-4501-bd51-12414cb30fca --cpus=8 --memory=60000M airbyte/source-redshift:0.5.2 discover --config source_config.json
2024-12-06 13:58:20 platform > Reading messages from protocol version 0.2.0
2024-12-06 13:58:20 platform > 2024-12-06 13:58:20 INFO i.a.i.s.r.RedshiftSource(main):140 - starting source: class io.airbyte.integrations.source.redshift.RedshiftSource
2024-12-06 13:58:21 platform > 2024-12-06 13:58:21 INFO i.a.c.i.b.IntegrationCliParser(parseOptions):126 - integration args: {discover=null, config=source_config.json}
2024-12-06 13:58:21 platform > 2024-12-06 13:58:21 INFO i.a.c.i.b.IntegrationRunner(runInternal):132 - Running integration: io.airbyte.integrations.source.redshift.RedshiftSource
2024-12-06 13:58:21 platform > 2024-12-06 13:58:21 INFO i.a.c.i.b.IntegrationRunner(runInternal):133 - Command: DISCOVER
2024-12-06 13:58:21 platform > 2024-12-06 13:58:21 INFO i.a.c.i.b.IntegrationRunner(runInternal):134 - Integration config: IntegrationConfig{command=DISCOVER, configPath='source_config.json', catalogPath='null', statePath='null'}
2024-12-06 13:58:21 platform > 2024-12-06 13:58:21 WARN c.n.s.JsonMetaSchema(newValidator):278 - Unknown keyword order - you should define your own Meta Schema. If the keyword is irrelevant for validation, just use a NonValidationKeyword
2024-12-06 13:58:21 platform > 2024-12-06 13:58:21 WARN c.n.s.JsonMetaSchema(newValidator):278 - Unknown keyword airbyte_secret - you should define your own Meta Schema. If the keyword is irrelevant for validation, just use a NonValidationKeyword
2024-12-06 13:58:21 platform > 2024-12-06 13:58:21 INFO c.z.h.HikariDataSource(<init>):79 - HikariPool-1 - Starting...
2024-12-06 13:58:21 platform > 2024-12-06 13:58:21 INFO c.z.h.HikariDataSource(<init>):81 - HikariPool-1 - Start completed.
2024-12-06 13:58:21 platform > 2024-12-06 13:58:21 INFO c.z.h.p.PoolBase(getAndSetNetworkTimeout):537 - HikariPool-1 - Driver does not support get/set network timeout for connections. ([Amazon][JDBC](10220) Driver does not support this optional feature.)
2024-12-06 13:58:21 platform > 2024-12-06 13:58:21 INFO i.a.i.s.r.RedshiftSource(discoverInternal):101 - No schemas explicitly set on UI to process, so will process all of existing schemas in DB
2024-12-06 13:58:21 platform > 2024-12-06 13:58:21 INFO i.a.c.i.s.j.AbstractJdbcSource(discoverInternal):169 - Internal schemas to exclude: [catalog_history, information_schema, pg_catalog, pg_internal]
2024-12-06 13:58:27 platform > 2024-12-06 13:58:27 INFO c.z.h.HikariDataSource(close):349 - HikariPool-1 - Shutdown initiated...
2024-12-06 13:58:27 platform > 2024-12-06 13:58:27 INFO c.z.h.HikariDataSource(close):351 - HikariPool-1 - Shutdown completed.
2024-12-06 13:58:27 platform > 2024-12-06 13:58:27 ERROR i.a.c.i.b.AirbyteExceptionHandler(uncaughtException):64 - Something went wrong in the connector. See the logs for more details.
2024-12-06 13:58:27 platform > java.lang.NullPointerException: null value in entry: isNullable=null
2024-12-06 13:58:27 platform >      at com.google.common.collect.CollectPreconditions.checkEntryNotNull(CollectPreconditions.java:33) ~[guava-33.0.0-jre.jar:?]
2024-12-06 13:58:27 platform >      at com.google.common.collect.ImmutableMapEntry.<init>(ImmutableMapEntry.java:54) ~[guava-33.0.0-jre.jar:?]
2024-12-06 13:58:27 platform >      at com.google.common.collect.ImmutableMap.entryOf(ImmutableMap.java:345) ~[guava-33.0.0-jre.jar:?]
2024-12-06 13:58:27 platform >      at com.google.common.collect.ImmutableMap$Builder.put(ImmutableMap.java:454) ~[guava-33.0.0-jre.jar:?]
2024-12-06 13:58:27 platform >      at io.airbyte.cdk.integrations.source.jdbc.AbstractJdbcSource.getColumnMetadata(AbstractJdbcSource.java:248) ~[airbyte-cdk-db-sources-0.20.4.jar:?]
2024-12-06 13:58:27 platform >      at io.airbyte.cdk.db.jdbc.JdbcDatabase$1.tryAdvance(JdbcDatabase.java:84) ~[airbyte-cdk-core-0.20.4.jar:?]
2024-12-06 13:58:27 platform >      at java.base/java.util.Spliterator.forEachRemaining(Spliterator.java:332) ~[?:?]
2024-12-06 13:58:27 platform >      at java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:509) ~[?:?]
2024-12-06 13:58:27 platform >      at java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:499) ~[?:?]
2024-12-06 13:58:27 platform >      at java.base/java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:921) ~[?:?]
2024-12-06 13:58:27 platform >      at java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234) ~[?:?]
2024-12-06 13:58:27 platform >      at java.base/java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:682) ~[?:?]
2024-12-06 13:58:27 platform >      at io.airbyte.cdk.db.jdbc.DefaultJdbcDatabase.bufferedResultSetQuery(DefaultJdbcDatabase.java:57) ~[airbyte-cdk-core-0.20.4.jar:?]
2024-12-06 13:58:27 platform >      at io.airbyte.cdk.integrations.source.jdbc.AbstractJdbcSource.discoverInternal(AbstractJdbcSource.java:171) ~[airbyte-cdk-db-sources-0.20.4.jar:?]
2024-12-06 13:58:27 platform >      at io.airbyte.cdk.integrations.source.jdbc.AbstractJdbcSource.discoverInternal(AbstractJdbcSource.java:258) ~[airbyte-cdk-db-sources-0.20.4.jar:?]
2024-12-06 13:58:27 platform >      at io.airbyte.integrations.source.redshift.RedshiftSource.discoverInternal(RedshiftSource.java:102) ~[io.airbyte.airbyte-integrations.connectors-source-redshift-0.50.50.jar:?]
2024-12-06 13:58:27 platform >      at io.airbyte.integrations.source.redshift.RedshiftSource.discoverInternal(RedshiftSource.java:30) ~[io.airbyte.airbyte-integrations.connectors-source-redshift-0.50.50.jar:?]
2024-12-06 13:58:27 platform >      at io.airbyte.cdk.integrations.source.relationaldb.AbstractDbSource.discoverWithoutSystemTables(AbstractDbSource.java:268) ~[airbyte-cdk-db-sources-0.20.4.jar:?]
2024-12-06 13:58:27 platform >      at io.airbyte.cdk.integrations.source.relationaldb.AbstractDbSource.discover(AbstractDbSource.java:126) ~[airbyte-cdk-db-sources-0.20.4.jar:?]
2024-12-06 13:58:27 platform >      at io.airbyte.cdk.integrations.base.IntegrationRunner.runInternal(IntegrationRunner.java:159) ~[airbyte-cdk-core-0.20.4.jar:?]
2024-12-06 13:58:27 platform >      at io.airbyte.cdk.integrations.base.IntegrationRunner.run(IntegrationRunner.java:125) ~[airbyte-cdk-core-0.20.4.jar:?]
2024-12-06 13:58:27 platform >      at io.airbyte.integrations.source.redshift.RedshiftSource.main(RedshiftSource.java:141) ~[io.airbyte.airbyte-integrations.connectors-source-redshift-0.50.50.jar:?]
2024-12-06 13:58:27 platform > Discover job subprocess finished with exit code 1
2024-12-06 13:58:27 INFO i.a.c.t.HeartbeatUtils(withBackgroundHeartbeat):64 - Stopping temporal heartbeating...
2024-12-06 13:58:27 platform > 
2024-12-06 13:58:27 INFO i.a.c.t.HeartbeatUtils(withBackgroundHeartbeat):73 - Temporal heartbeating stopped.
2024-12-06 13:58:27 platform > ----- END DISCOVER SOURCE CATALOG -----
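
For anyone reading the trace: the top frames show Guava's ImmutableMap.Builder.put rejecting a null value for the key isNullable inside AbstractJdbcSource.getColumnMetadata. A plausible but unconfirmed reading is that the driver metadata carries no nullability value for late-binding view columns. Below is a minimal sketch of that failure mode and one possible null-tolerant alternative. It uses the JDK's Map.of as a stand-in for Guava (both reject null values), and the class name, method names, and the "default to nullable" choice are all hypothetical, not the connector's actual code.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

public class NullableMetadataSketch {

    // Mirrors the failing pattern: a null-hostile immutable map.
    // Map.of throws NullPointerException if any key or value is null,
    // much like ImmutableMap.Builder.put in the stack trace.
    public static Map<String, Object> strictMetadata(String columnName, Boolean isNullable) {
        return Map.of("columnName", columnName, "isNullable", isNullable);
    }

    // Hypothetical null-tolerant variant: when nullability is unknown (null),
    // assume the column is nullable instead of failing discovery.
    public static Map<String, Object> lenientMetadata(String columnName, Boolean isNullable) {
        Map<String, Object> metadata = new HashMap<>();
        metadata.put("columnName", columnName);
        metadata.put("isNullable", isNullable != null ? isNullable : Boolean.TRUE);
        return Collections.unmodifiableMap(metadata);
    }

    public static void main(String[] args) {
        try {
            strictMetadata("order_id", null);
        } catch (NullPointerException e) {
            System.out.println("strict map rejected a null isNullable value");
        }
        // The lenient variant falls back to a default instead of throwing.
        System.out.println("lenient isNullable = "
                + lenientMetadata("order_id", null).get("isNullable"));
    }
}
```

Treating unknown nullability as nullable is the conservative choice for a sketch like this, since it never promises a NOT NULL guarantee the source cannot back up. Whether that is the right fix for the connector is a separate question.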

Contribute

  • Yes, I want to contribute
@olegflo

olegflo commented Dec 9, 2024

+1, had a similar issue here #48856 (closed in favour of this issue).

I tested in my setup and could reproduce the point mentioned about schemas with and without late-binding views: discovery started working for me after I specified a schema without late-binding views 🍏

@JakeCowton curious, how did you come to this finding? Was it mentioned anywhere in the logs or elsewhere?

@JakeCowton
Author

@olegflo

how did you come to this finding? Was it mentioned anywhere in the logs or elsewhere?

Outbound syncs from Redshift started failing as soon as I introduced late-binding views. If I excluded schemas that had the late-binding views it worked fine, so I'm fairly certain the error is caused by them in some way.
