Automatically infer schema of a stream from Avro schema #912

@zliang-min

Currently, we can attach an Avro schema to an external stream so that the stream uses the schema to read and write data, like this:

CREATE FORMAT SCHEMA my_avro_schema AS
$$
{
  "type": "record",
  "name": "Data",
  "fields" : [
    {"name": "a", "type": "int"},
    {"name": "b", "type": "string"},
    {"name": "c", "type": "float"}
  ]
}
$$
TYPE = Avro;

CREATE EXTERNAL STREAM example (
a int,
b string,
c float32
) SETTINGS type = 'kafka', brokers = 'some-broker:9092', topic = 'my-topic', data_format = 'Avro', data_schema = 'my_avro_schema';

This works well, except that we have to write the schema twice: once in the format schema and once in the external stream definition. This is cumbersome and error-prone, especially when the schema is complex.

What we want is that, when an external stream is attached to an Avro schema, the stream's schema is inferred automatically from the Avro schema. The above example would then become:

CREATE EXTERNAL STREAM example SETTINGS type = 'kafka', brokers = 'some-broker:9092', topic = 'my-topic', data_format = 'Avro', data_schema = 'my_avro_schema';

Users can then run DESC example to see the stream's schema, which was inferred from the Avro schema.
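
To make the expected inference concrete, here is a minimal Python sketch of the mapping, not the actual implementation. The function name `infer_columns` and the table `AVRO_TO_COLUMN_TYPE` are hypothetical. Only the `int`, `string`, and `float` mappings come from the example above; the other entries are assumptions, and nested records, unions, and logical types are deliberately left out.

```python
import json

# Avro primitive type -> stream column type.
# int, string, and float are taken from the example in this issue;
# the remaining entries are assumptions made for illustration.
AVRO_TO_COLUMN_TYPE = {
    "int": "int",
    "string": "string",
    "float": "float32",
    "long": "int64",      # assumption
    "double": "float64",  # assumption
    "boolean": "bool",    # assumption
}


def infer_columns(avro_schema_json: str) -> list[tuple[str, str]]:
    """Return (column name, column type) pairs for a flat Avro record."""
    schema = json.loads(avro_schema_json)
    if schema.get("type") != "record":
        raise ValueError("top-level Avro schema must be a record")

    columns = []
    for field in schema["fields"]:
        avro_type = field["type"]
        # Complex types (dicts, union lists) are out of scope for this sketch.
        if not isinstance(avro_type, str) or avro_type not in AVRO_TO_COLUMN_TYPE:
            raise ValueError(
                f"unsupported Avro type for field {field['name']!r}: {avro_type!r}"
            )
        columns.append((field["name"], AVRO_TO_COLUMN_TYPE[avro_type]))
    return columns


MY_AVRO_SCHEMA = """
{
  "type": "record",
  "name": "Data",
  "fields": [
    {"name": "a", "type": "int"},
    {"name": "b", "type": "string"},
    {"name": "c", "type": "float"}
  ]
}
"""

if __name__ == "__main__":
    # Print one "name type" pair per line, like a column list.
    for name, column_type in infer_columns(MY_AVRO_SCHEMA):
        print(f"{name} {column_type}")
```

For the schema in this issue, the sketch yields the same three columns (a int, b string, c float32) that currently have to be written by hand in CREATE EXTERNAL STREAM. Raising an error on unsupported types, rather than guessing, is one possible design choice; the real feature may handle them differently.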

This is similar to how ClickHouse and Timeplus external tables generate their schemas automatically, without the user specifying them explicitly.
