
Commit d9e1754

AVRO-3666: Separate parsing from Schema class
This enables pluggable parser implementations, so multiple schema formats can be parsed with the same code.
1 parent da98719 commit d9e1754


13 files changed: +1081 additions, -26 deletions

doc/content/en/docs/++version++/Getting started (Java)/_index.md

Lines changed: 3 additions & 3 deletions
@@ -77,7 +77,7 @@ You may also build the required Avro jars from source. Building Avro is beyond t

 ## Defining a schema

-Avro schemas are defined using JSON. Schemas are composed of primitive types (null, boolean, int, long, float, double, bytes, and string) and complex types (record, enum, array, map, union, and fixed). You can learn more about Avro schemas and types from the specification, but for now let's start with a simple schema example, user.avsc:
+Avro schemas are defined using JSON or IDL (the latter requires an extra dependency). Schemas are composed of primitive types (null, boolean, int, long, float, double, bytes, and string) and complex types (record, enum, array, map, union, and fixed). You can learn more about Avro schemas and types from the specification, but for now let's start with a simple schema example, user.avsc:

 ```json
 {"namespace": "example.avro",
@@ -209,10 +209,10 @@ Data in Avro is always stored with its corresponding schema, meaning we can alwa
 Let's go over the same example as in the previous section, but without using code generation: we'll create some users, serialize them to a data file on disk, and then read back the file and deserialize the users objects.

 ### Creating users
-First, we use a Parser to read our schema definition and create a Schema object.
+First, we use a SchemaParser to read our schema definition and create a Schema object.

 ```java
-Schema schema = new Schema.Parser().parse(new File("user.avsc"));
+Schema schema = new SchemaParser().parse(new File("user.avsc"));
 ```

 Using this schema, let's create some users.
Lines changed: 74 additions & 0 deletions
@@ -0,0 +1,74 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ *     https://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.avro;
+
+import java.io.IOException;
+import java.net.URI;
+import java.util.Collection;
+
+/**
+ * Schema parser for a specific schema format.
+ *
+ * <p>
+ * The {@link SchemaParser} class uses this interface, supporting text based
+ * schema sources.
+ * </p>
+ *
+ * <h2>Note to implementers:</h2>
+ *
+ * <p>
+ * Implementations are located using a {@link java.util.ServiceLoader}. See that
+ * class for details.
+ * </p>
+ *
+ * <p>
+ * You can expect that schemas being read are invalid, so you are encouraged to
+ * return {@code null} upon parsing failure where the input clearly doesn't make
+ * sense (e.g., reading "/**" when expecting JSON). If the input is likely in
+ * the correct format, but invalid, throw a {@link SchemaParseException}
+ * instead.
+ * </p>
+ *
+ * <p>
+ * Note that throwing anything other than a {@code SchemaParseException} will
+ * abort the parsing process, so reserve that for rethrowing exceptions.
+ * </p>
+ *
+ * @see java.util.ServiceLoader
+ */
+public interface FormattedSchemaParser {
+  /**
+   * Parse a schema from a text based source. Can use the base location of the
+   * schema (e.g., the directory where the schema file lives) if available.
+   *
+   * <p>
+   * Implementations should add all named schemas they parse to the collection.
+   * </p>
+   *
+   * @param types           a mutable collection of known types; parsed named
+   *                        schemata will be added
+   * @param baseUri         the base location of the schema, or {@code null} if
+   *                        not known
+   * @param formattedSchema the schema as text
+   * @return the parsed schema, or {@code null} if the format is not supported
+   * @throws IOException          when the schema cannot be read
+   * @throws SchemaParseException when the schema cannot be parsed
+   */
+  Schema parse(Collection<Schema> types, URI baseUri, CharSequence formattedSchema)
+      throws IOException, SchemaParseException;
+}
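The Javadoc above sets out a three-way contract for implementations: return `null` when the text is clearly not in the parser's format, throw `SchemaParseException` when it looks like the right format but is invalid, and add every parsed named schema to the mutable `types` collection. The self-contained sketch below illustrates that contract without depending on Avro. `ToySchema`, `ToyParseException`, `ToyFormatParser`, `NamePrefixParser` and the `name:<identifier>` text format are all hypothetical stand-ins. The dispatch loop in `parseWithAny` is likewise an assumption about how a caller such as `SchemaParser` might combine parsers (its code is not part of this diff): it remembers a parse exception and keeps trying the other parsers, while any other exception aborts.

```java
import java.net.URI;
import java.util.Collection;
import java.util.List;

// Hypothetical stand-ins that mirror the shape of FormattedSchemaParser; not Avro's API.
final class ParserContractSketch {

  /** Stand-in for org.apache.avro.Schema: just a named type. */
  static final class ToySchema {
    private final String name;

    ToySchema(String name) {
      this.name = name;
    }

    String name() {
      return name;
    }
  }

  /** Stand-in for SchemaParseException, which is unchecked in Avro as well. */
  static final class ToyParseException extends RuntimeException {
    ToyParseException(String message) {
      super(message);
    }
  }

  /** Same parameter shape as FormattedSchemaParser.parse, without IOException. */
  interface ToyFormatParser {
    ToySchema parse(Collection<ToySchema> types, URI baseUri, CharSequence text);
  }

  /** Parses the hypothetical format "name:<identifier>". */
  static final class NamePrefixParser implements ToyFormatParser {
    @Override
    public ToySchema parse(Collection<ToySchema> types, URI baseUri, CharSequence text) {
      String s = text.toString().trim();
      if (!s.startsWith("name:")) {
        return null; // clearly not this format: let another parser try
      }
      String name = s.substring("name:".length());
      if (name.isEmpty() || !Character.isJavaIdentifierStart(name.charAt(0))) {
        // Right format, invalid content: report it instead of returning null.
        throw new ToyParseException("invalid name: '" + name + "'");
      }
      ToySchema schema = new ToySchema(name);
      types.add(schema); // make the named schema known to later parses
      return schema;
    }
  }

  /** Tries each parser in order; the first non-null result wins. */
  static ToySchema parseWithAny(List<ToyFormatParser> parsers, Collection<ToySchema> types, CharSequence text) {
    ToyParseException firstFailure = null;
    for (ToyFormatParser parser : parsers) {
      // Only ToyParseException is caught; any other exception propagates and aborts.
      try {
        ToySchema result = parser.parse(types, null, text);
        if (result != null) {
          return result;
        }
      } catch (ToyParseException e) {
        if (firstFailure == null) {
          firstFailure = e; // remember the failure, but give other formats a chance
        }
      }
    }
    if (firstFailure != null) {
      throw firstFailure;
    }
    throw new ToyParseException("no parser recognized the input");
  }
}
```

Returning `null` for unrecognized input keeps "not my format" cheap and silent, while an exception is reserved for input the parser is fairly sure it owns.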
Lines changed: 90 additions & 0 deletions
@@ -0,0 +1,90 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ *     https://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.avro;
+
+import java.io.IOException;
+import java.net.URI;
+import java.util.ArrayList;
+import java.util.Collection;
+
+/**
+ * Schema parser for JSON formatted schemata. This initial implementation simply
+ * delegates to the {@link Schema.Parser} class, though it should be refactored
+ * out of there.
+ *
+ * <p>
+ * Note: this class is intentionally not available via the Java
+ * {@link java.util.ServiceLoader}, as its use is hardcoded as fallback when no
+ * service exists. This enables users to reliably override the standard JSON
+ * parser as well.
+ * </p>
+ */
+public class JsonSchemaParser implements FormattedSchemaParser {
+  /**
+   * <p>
+   * Parse a schema written in the internal (JSON) format without any validations.
+   * </p>
+   *
+   * <p>
+   * Using this method is only safe if used to parse a write schema (i.e., a
+   * schema used to read Avro data). Other usages, for example by generated Avro
+   * code, can cause interoperability problems.
+   * </p>
+   *
+   * <p>
+   * Use with care and sufficient testing!
+   * </p>
+   *
+   * @param fragments one or more strings making up the schema (some schemata
+   *                  exceed the compiler limits)
+   * @return the parsed schema
+   */
+  public static Schema parseInternal(String... fragments) {
+    StringBuilder buffer = new StringBuilder();
+    for (String fragment : fragments) {
+      buffer.append(fragment);
+    }
+    return new JsonSchemaParser().parse(new ArrayList<>(), buffer, true);
+  }
+
+  @Override
+  public Schema parse(Collection<Schema> schemas, URI baseUri, CharSequence formattedSchema)
+      throws IOException, SchemaParseException {
+    return parse(schemas, formattedSchema, false);
+  }
+
+  private Schema parse(Collection<Schema> schemas, CharSequence formattedSchema, boolean skipValidation)
+      throws SchemaParseException {
+    // TODO: refactor JSON parsing out of the Schema class
+    Schema.Parser parser;
+    if (skipValidation) {
+      parser = new Schema.Parser(Schema.NameValidator.NO_VALIDATION);
+      parser.setValidateDefaults(false);
+    } else {
+      parser = new Schema.Parser();
+    }
+    if (schemas != null) {
+      parser.addTypes(schemas);
+    }
+    Schema schema = parser.parse(formattedSchema.toString());
+    if (schemas != null) {
+      schemas.addAll(parser.getTypes().values());
+    }
+    return schema;
+  }
+}
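The class comment above says `JsonSchemaParser` is deliberately not registered with `java.util.ServiceLoader` and is instead hardcoded as the fallback, so that users can still override JSON parsing. One plausible reading, sketched below under that assumption, is that a caller tries every discovered implementation first and appends the built-in JSON parser last; a user-supplied JSON parser then claims the input before the built-in one sees it. `TextParser`, `BuiltInJsonParser` and the `json:` result marker are hypothetical; only `ServiceLoader.load(Class, ClassLoader)` is real JDK API. Real providers would be listed in a `META-INF/services/` file named after the service interface's binary name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.ServiceLoader;

// Hypothetical sketch of "discovered parsers first, built-in fallback last"; not Avro's code.
final class ParserDiscoverySketch {

  /** Stand-in for FormattedSchemaParser: returns null if the text is not in its format. */
  interface TextParser {
    String parse(CharSequence text);
  }

  /** Plays the role of JsonSchemaParser; deliberately NOT registered as a service. */
  static final class BuiltInJsonParser implements TextParser {
    @Override
    public String parse(CharSequence text) {
      String s = text.toString().trim();
      // Crude format sniffing: a JSON schema is an object, an array (union) or a string.
      boolean looksLikeJson = s.startsWith("{") || s.startsWith("[") || s.startsWith("\"");
      return looksLikeJson ? "json:" + s : null;
    }
  }

  /** Service-loaded parsers in discovery order, followed by the hardcoded fallback. */
  static List<TextParser> parsers(ClassLoader loader) {
    List<TextParser> result = new ArrayList<>();
    for (TextParser discovered : ServiceLoader.load(TextParser.class, loader)) {
      result.add(discovered);
    }
    result.add(new BuiltInJsonParser()); // always last, so any registered parser wins
    return result;
  }

  /** Returns the first non-null result, or fails if no parser claims the text. */
  static String parse(List<TextParser> parsers, CharSequence text) {
    for (TextParser parser : parsers) {
      String result = parser.parse(text);
      if (result != null) {
        return result;
      }
    }
    throw new IllegalArgumentException("unsupported schema format");
  }
}
```

Appending the fallback in code, rather than registering it, keeps its position fixed at the end no matter in which order the class path exposes provider files.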
