Commit 7bebfeb

Refactor walk and update docs (#1202)
1 parent 4dbdc9f commit 7bebfeb

68 files changed

+3358
-1107
lines changed

README.md

Lines changed: 271 additions & 253 deletions
Large diffs are not rendered by default.

doc/1.x/README.md

Lines changed: 648 additions & 0 deletions
Large diffs are not rendered by default.
File renamed without changes.

doc/1.x/compatibility.md

Lines changed: 181 additions & 0 deletions
@@ -0,0 +1,181 @@
1+
## Compatibility with JSON Schema versions
2+
3+
[![Supported Dialects](https://img.shields.io/endpoint?url=https%3A%2F%2Fbowtie.report%2Fbadges%2Fjava-com.networknt-json-schema-validator%2Fsupported_versions.json)](https://bowtie.report/#/implementations/java-networknt-json-schema-validator)
4+
[![Draft 2020-12](https://img.shields.io/endpoint?url=https%3A%2F%2Fbowtie.report%2Fbadges%2Fjava-com.networknt-json-schema-validator%2Fcompliance%2Fdraft2020-12.json)](https://bowtie.report/#/dialects/draft2020-12)
5+
[![Draft 2019-09](https://img.shields.io/endpoint?url=https%3A%2F%2Fbowtie.report%2Fbadges%2Fjava-com.networknt-json-schema-validator%2Fcompliance%2Fdraft2019-09.json)](https://bowtie.report/#/dialects/draft2019-09)
6+
[![Draft 7](https://img.shields.io/endpoint?url=https%3A%2F%2Fbowtie.report%2Fbadges%2Fjava-com.networknt-json-schema-validator%2Fcompliance%2Fdraft7.json)](https://bowtie.report/#/dialects/draft7)
7+
[![Draft 6](https://img.shields.io/endpoint?url=https%3A%2F%2Fbowtie.report%2Fbadges%2Fjava-com.networknt-json-schema-validator%2Fcompliance%2Fdraft6.json)](https://bowtie.report/#/dialects/draft6)
8+
[![Draft 4](https://img.shields.io/endpoint?url=https%3A%2F%2Fbowtie.report%2Fbadges%2Fjava-com.networknt-json-schema-validator%2Fcompliance%2Fdraft4.json)](https://bowtie.report/#/dialects/draft4)
9+
10+
By default, the `pattern` keyword and the `regex` format use the JDK regular expression implementation, which is not ECMA-262 compliant and therefore not compliant with the JSON Schema specification. The library can, however, be configured to use an ECMA-262 compliant regular expression implementation such as `GraalJS` or `Joni`.
11+
12+
Annotation processing and reporting are implemented. Note that the collection of annotations will have an adverse performance impact.
13+
14+
The library implements the Flag, List and Hierarchical output formats defined in the [Specification for Machine-Readable Output for JSON Schema Validation and Annotation](https://github.com/json-schema-org/json-schema-spec/blob/8270653a9f59fadd2df0d789f22d486254505bbe/jsonschema-validation-output-machines.md).
15+
16+
The implementation supports the use of custom keywords, formats, vocabularies and meta-schemas.
17+
18+
### Known Issues
19+
20+
There are currently no known issues with the required functionality from the specification.
21+
22+
The following are the test results after running the [JSON Schema Test Suite](https://github.com/json-schema-org/JSON-Schema-Test-Suite) as of 18 Jun 2024 using version 1.4.1. As the test suite is continuously updated, the results may change over time. In the table, `r` counts required test cases and `o` counts optional test cases.
23+
24+
| Implementations | Overall | DRAFT_03 | DRAFT_04 | DRAFT_06 | DRAFT_07 | DRAFT_2019_09 | DRAFT_2020_12 |
25+
|-----------------|-------------------------------------------------------------------------|-------------------------------------------------------------------|---------------------------------------------------------------------|--------------------------------------------------------------------|------------------------------------------------------------------------|----------------------------------------------------------------------|------------------------------------------------------------------------|
26+
| NetworkNt | pass: r:4803 (100.0%) o:2372 (100.0%)<br>fail: r:0 (0.0%) o:0 (0.0%) | | pass: r:610 (100.0%) o:251 (100.0%)<br>fail: r:0 (0.0%) o:0 (0.0%) | pass: r:822 (100.0%) o:318 (100.0%)<br>fail: r:0 (0.0%) o:0 (0.0%) | pass: r:906 (100.0%) o:541 (100.0%)<br>fail: r:0 (0.0%) o:0 (0.0%) | pass: r:1220 (100.0%) o:625 (100.0%)<br>fail: r:0 (0.0%) o:0 (0.0%) | pass: r:1245 (100.0%) o:637 (100.0%)<br>fail: r:0 (0.0%) o:0 (0.0%) |
27+
28+
### Legend
29+
30+
| Symbol | Meaning |
31+
|:------:|:----------------------|
32+
| 🟢 | Fully implemented |
33+
| 🟡 | Partially implemented |
34+
| 🔴 | Not implemented |
35+
| 🚫 | Not defined |
36+
37+
### Keywords Support
38+
39+
| Keyword | Draft 4 | Draft 6 | Draft 7 | Draft 2019-09 | Draft 2020-12 |
40+
|:---------------------------|:-------:|:-------:|:-------:|:-------------:|:-------------:|
41+
| $anchor | 🚫 | 🚫 | 🚫 | 🟢 | 🟢 |
42+
| $defs | 🚫 | 🚫 | 🚫 | 🟢 | 🟢 |
43+
| $dynamicAnchor | 🚫 | 🚫 | 🚫 | 🚫 | 🟢 |
44+
| $dynamicRef | 🚫 | 🚫 | 🚫 | 🚫 | 🟢 |
45+
| $id | 🟢 | 🟢 | 🟢 | 🟢 | 🟢 |
46+
| $recursiveAnchor | 🚫 | 🚫 | 🚫 | 🟢 | 🚫 |
47+
| $recursiveRef | 🚫 | 🚫 | 🚫 | 🟢 | 🚫 |
48+
| $ref | 🟢 | 🟢 | 🟢 | 🟢 | 🟢 |
49+
| $vocabulary | 🚫 | 🚫 | 🚫 | 🟢 | 🟢 |
50+
| additionalItems | 🟢 | 🟢 | 🟢 | 🟢 | 🟢 |
51+
| additionalProperties | 🟢 | 🟢 | 🟢 | 🟢 | 🟢 |
52+
| allOf | 🟢 | 🟢 | 🟢 | 🟢 | 🟢 |
53+
| anyOf | 🟢 | 🟢 | 🟢 | 🟢 | 🟢 |
54+
| const | 🚫 | 🟢 | 🟢 | 🟢 | 🟢 |
55+
| contains | 🚫 | 🟢 | 🟢 | 🟢 | 🟢 |
56+
| contentEncoding | 🚫 | 🚫 | 🟢 | 🟢 | 🟢 |
57+
| contentMediaType | 🚫 | 🚫 | 🟢 | 🟢 | 🟢 |
58+
| contentSchema | 🚫 | 🚫 | 🚫 | 🟢 | 🟢 |
59+
| definitions | 🟢 | 🟢 | 🟢 | 🚫 | 🚫 |
60+
| dependencies | 🟢 | 🟢 | 🟢 | 🚫 | 🚫 |
61+
| dependentRequired | 🚫 | 🚫 | 🚫 | 🟢 | 🟢 |
62+
| dependentSchemas | 🚫 | 🚫 | 🚫 | 🟢 | 🟢 |
63+
| enum | 🟢 | 🟢 | 🟢 | 🟢 | 🟢 |
64+
| exclusiveMaximum (boolean) | 🟢 | 🚫 | 🚫 | 🚫 | 🚫 |
65+
| exclusiveMaximum (numeric) | 🚫 | 🟢 | 🟢 | 🟢 | 🟢 |
66+
| exclusiveMinimum (boolean) | 🟢 | 🚫 | 🚫 | 🚫 | 🚫 |
67+
| exclusiveMinimum (numeric) | 🚫 | 🟢 | 🟢 | 🟢 | 🟢 |
68+
| if-then-else | 🚫 | 🚫 | 🟢 | 🟢 | 🟢 |
69+
| items | 🟢 | 🟢 | 🟢 | 🟢 | 🟢 |
70+
| maxContains | 🚫 | 🚫 | 🚫 | 🟢 | 🟢 |
71+
| minContains | 🚫 | 🚫 | 🚫 | 🟢 | 🟢 |
72+
| maximum | 🟢 | 🟢 | 🟢 | 🟢 | 🟢 |
73+
| maxItems | 🟢 | 🟢 | 🟢 | 🟢 | 🟢 |
74+
| maxLength | 🟢 | 🟢 | 🟢 | 🟢 | 🟢 |
75+
| maxProperties | 🟢 | 🟢 | 🟢 | 🟢 | 🟢 |
76+
| minimum | 🟢 | 🟢 | 🟢 | 🟢 | 🟢 |
77+
| minItems | 🟢 | 🟢 | 🟢 | 🟢 | 🟢 |
78+
| minLength | 🟢 | 🟢 | 🟢 | 🟢 | 🟢 |
79+
| minProperties | 🟢 | 🟢 | 🟢 | 🟢 | 🟢 |
80+
| multipleOf | 🟢 | 🟢 | 🟢 | 🟢 | 🟢 |
81+
| not | 🟢 | 🟢 | 🟢 | 🟢 | 🟢 |
82+
| oneOf | 🟢 | 🟢 | 🟢 | 🟢 | 🟢 |
83+
| pattern | 🟢 | 🟢 | 🟢 | 🟢 | 🟢 |
84+
| patternProperties | 🟢 | 🟢 | 🟢 | 🟢 | 🟢 |
85+
| prefixItems | 🚫 | 🚫 | 🚫 | 🚫 | 🟢 |
86+
| properties | 🟢 | 🟢 | 🟢 | 🟢 | 🟢 |
87+
| propertyNames | 🚫 | 🟢 | 🟢 | 🟢 | 🟢 |
88+
| readOnly | 🚫 | 🚫 | 🟢 | 🟢 | 🟢 |
89+
| required | 🟢 | 🟢 | 🟢 | 🟢 | 🟢 |
90+
| type | 🟢 | 🟢 | 🟢 | 🟢 | 🟢 |
91+
| unevaluatedItems | 🚫 | 🚫 | 🚫 | 🟢 | 🟢 |
92+
| unevaluatedProperties | 🚫 | 🚫 | 🚫 | 🟢 | 🟢 |
93+
| uniqueItems | 🟢 | 🟢 | 🟢 | 🟢 | 🟢 |
94+
| writeOnly | 🚫 | 🚫 | 🟢 | 🟢 | 🟢 |
95+
96+
In accordance with the specification, unknown keywords are treated as annotations. This is customizable by configuring an unknown keyword factory on the respective meta-schema.
97+
98+
#### Content Encoding
99+
100+
Since Draft 2019-09, the `contentEncoding` keyword does not generate assertions.
101+
102+
#### Content Media Type
103+
104+
Since Draft 2019-09, the `contentMediaType` keyword does not generate assertions.
105+
106+
#### Content Schema
107+
108+
The `contentSchema` keyword does not generate assertions.
109+
110+
#### Pattern
111+
112+
By default, the `pattern` keyword uses the JDK regular expression implementation to evaluate regular expressions.
113+
114+
This is not ECMA-262 compliant and is thus not compliant with the JSON Schema specification. It is, however, more likely to be the desired behavior, as downstream processing will most likely use the default JDK regular expression implementation as well.
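
The difference can be observed with the JDK alone. The following is a minimal sketch that does not use the library; the class name `JdkRegexDifferences` and its helper methods are made up for illustration. It shows two JDK behaviors that ECMA-262 is understood not to share: possessive quantifiers such as `a++` compile, and `$` also matches just before a trailing line terminator.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

// Illustrative only: plain java.util.regex, no library classes involved.
class JdkRegexDifferences {

    // Returns true if java.util.regex accepts the given pattern.
    static boolean compilesInJdk(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    // Unanchored search using find(), as described for the JDK implementation.
    static boolean jdkFind(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        // Possessive quantifier: a JDK extension to the regular expression syntax.
        System.out.println("a++ compiles in the JDK: " + compilesInJdk("a++"));
        // In the JDK, $ also matches before a final line terminator.
        System.out.println("^abc$ finds \"abc\\n\": " + jdkFind("^abc$", "abc\n"));
        // \z only matches at the very end of the input.
        System.out.println("^abc\\z finds \"abc\\n\": " + jdkFind("^abc\\z", "abc\n"));
    }
}
```

A schema whose `pattern` relies on either behavior may therefore validate differently once an ECMA-262 compliant implementation is configured.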
115+
116+
The library can be configured to use an ECMA-262 compliant regular expression validator, implemented using either [GraalJS](https://github.com/oracle/graaljs) or [Joni](https://github.com/jruby/joni). To do so, pass the respective `GraalJSRegularExpressionFactory` or `JoniRegularExpressionFactory` instance to `setRegularExpressionFactory`.
117+
118+
This also requires adding the `org.graalvm.js:js` or `org.jruby.joni:joni` dependency.
119+
120+
```xml
<dependency>
    <!-- Used to validate ECMA 262 regular expressions -->
    <!-- Approximately 50 MB in dependencies -->
    <!-- GraalJSRegularExpressionFactory -->
    <groupId>org.graalvm.js</groupId>
    <artifactId>js</artifactId>
    <version>${version.graaljs}</version>
</dependency>

<dependency>
    <!-- Used to validate ECMA 262 regular expressions -->
    <!-- Approximately 2 MB in dependencies -->
    <!-- JoniRegularExpressionFactory -->
    <groupId>org.jruby.joni</groupId>
    <artifactId>joni</artifactId>
    <version>${version.joni}</version>
</dependency>
```
139+
140+
#### Format
141+
142+
Since Draft 2019-09 the `format` keyword only generates annotations by default and does not generate assertions.
143+
144+
This can be configured on a per-schema basis by using a meta-schema with the appropriate vocabulary.
145+
146+
| Version | Vocabulary | Value |
147+
|:----------------------|---------------------------------------------------------------|-------------------|
148+
| Draft 2019-09 | `https://json-schema.org/draft/2019-09/vocab/format` | `true` |
149+
| Draft 2020-12 | `https://json-schema.org/draft/2020-12/vocab/format-assertion`| `true`/`false` |
150+
151+
This behavior can be overridden to generate assertions by setting the `setFormatAssertionsEnabled` option to `true`.
152+
153+
| Format | Draft 4 | Draft 6 | Draft 7 | Draft 2019-09 | Draft 2020-12 |
154+
|:----------------------|:-------:|:-------:|:-------:|:-------------:|:-------------:|
155+
| date | 🚫 | 🚫 | 🟢 | 🟢 | 🟢 |
156+
| date-time | 🟢 | 🟢 | 🟢 | 🟢 | 🟢 |
157+
| duration | 🚫 | 🚫 | 🚫 | 🟢 | 🟢 |
158+
| email | 🟢 | 🟢 | 🟢 | 🟢 | 🟢 |
159+
| hostname | 🟢 | 🟢 | 🟢 | 🟢 | 🟢 |
160+
| idn-email | 🚫 | 🚫 | 🟢 | 🟢 | 🟢 |
161+
| idn-hostname | 🚫 | 🚫 | 🟢 | 🟢 | 🟢 |
162+
| ipv4 | 🟢 | 🟢 | 🟢 | 🟢 | 🟢 |
163+
| ipv6 | 🟢 | 🟢 | 🟢 | 🟢 | 🟢 |
164+
| iri | 🚫 | 🚫 | 🟢 | 🟢 | 🟢 |
165+
| iri-reference | 🚫 | 🚫 | 🟢 | 🟢 | 🟢 |
166+
| json-pointer | 🚫 | 🟢 | 🟢 | 🟢 | 🟢 |
167+
| relative-json-pointer | 🚫 | 🟢 | 🟢 | 🟢 | 🟢 |
168+
| regex | 🚫 | 🚫 | 🟢 | 🟢 | 🟢 |
169+
| time | 🚫 | 🚫 | 🟢 | 🟢 | 🟢 |
170+
| uri | 🟢 | 🟢 | 🟢 | 🟢 | 🟢 |
171+
| uri-reference | 🚫 | 🟢 | 🟢 | 🟢 | 🟢 |
172+
| uri-template | 🚫 | 🟢 | 🟢 | 🟢 | 🟢 |
173+
| uuid | 🚫 | 🚫 | 🟢 | 🟢 | 🟢 |
174+
175+
##### Unknown Formats
176+
177+
When the format assertion vocabularies are used in a meta-schema, unknown formats result in assertions, in accordance with the specification. If the format assertion vocabularies are not used, unknown formats result in assertions only if format assertions are enabled and `setStrict("format", true)` is set.
178+
179+
##### Footnotes
180+
1. Note that validation is optional for some of the keywords/formats.
181+
2. Refer to the corresponding JSON Schema specification for more information on whether a keyword/format is optional.

doc/1.x/ecma-262.md

Lines changed: 89 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,89 @@
1+
# Regular Expressions
2+
3+
For the `pattern` keyword and the `regex` format validators, the library provides three built-in options.
4+
5+
A custom implementation can be made by implementing `com.networknt.schema.regex.RegularExpressionFactory` to return a custom implementation of `com.networknt.schema.regex.RegularExpression`.
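
As a rough sketch of what such a custom implementation might look like, the example below caches compiled JDK patterns. To stay self-contained it declares local stand-ins for the two interfaces; the method names `getRegularExpression` and `matches` are assumptions about the shape of the library interfaces and should be checked against the library version in use. The names `CustomRegexSketch` and `CachingJdkRegularExpressionFactory` are hypothetical.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.regex.Pattern;

class CustomRegexSketch {

    // Local stand-ins so the sketch compiles on its own.
    // ASSUMPTION: these mirror the interfaces in com.networknt.schema.regex.
    interface RegularExpression {
        boolean matches(String value);
    }

    interface RegularExpressionFactory {
        RegularExpression getRegularExpression(String regex);
    }

    // Hypothetical factory that compiles each pattern once and reuses it.
    static class CachingJdkRegularExpressionFactory implements RegularExpressionFactory {
        private final Map<String, RegularExpression> cache = new ConcurrentHashMap<>();

        @Override
        public RegularExpression getRegularExpression(String regex) {
            return cache.computeIfAbsent(regex, key -> {
                Pattern pattern = Pattern.compile(key);
                // find() keeps the match unanchored.
                return value -> pattern.matcher(value).find();
            });
        }
    }

    public static void main(String[] args) {
        RegularExpressionFactory factory = new CachingJdkRegularExpressionFactory();
        RegularExpression expression = factory.getRegularExpression("es");
        System.out.println(expression.matches("expression"));
    }
}
```

In a real project the class would implement the library interfaces directly instead of the stand-ins.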
6+
7+
| Regular Expression Factory | Description |
8+
|--------------------------------------------------|----------------------------------------------------|
9+
| `JDKRegularExpressionFactory` | Uses Java's standard `java.util.regex` and calls the `find()` method. Note that `matches()` is not called as that attempts to match the entire string, implicitly adding anchors. This is the default implementation and does not require any additional libraries. |
10+
| `JoniRegularExpressionFactory` | Uses `org.joni.Regex` with `Syntax.ECMAScript`. This requires adding the `org.jruby.joni:joni` dependency which will require about 2MB. |
11+
| `GraalJSRegularExpressionFactory` | Uses GraalJS with `new RegExp(pattern, 'u')`. This requires adding the `org.graalvm.js:js` dependency which will require about 50MB. |
12+
13+
## Specification
14+
15+
The use of Regular Expressions is specified in JSON Schema at https://json-schema.org/draft/2020-12/json-schema-core#name-regular-expressions.
16+
17+
```
18+
Keywords MAY use regular expressions to express constraints, or constrain the instance value to be a regular expression. These regular expressions SHOULD be valid according to the regular expression dialect described in ECMA-262, section 21.2.1 [ecma262].
19+
20+
Regular expressions SHOULD be built with the "u" flag (or equivalent) to provide Unicode support, or processed in such a way which provides Unicode support as defined by ECMA-262.
21+
22+
Furthermore, given the high disparity in regular expression constructs support, schema authors SHOULD limit themselves to the following regular expression tokens:
23+
24+
individual Unicode characters, as defined by the JSON specification [RFC8259];
25+
simple character classes ([abc]), range character classes ([a-z]);
26+
complemented character classes ([^abc], [^a-z]);
27+
simple quantifiers: "+" (one or more), "*" (zero or more), "?" (zero or one), and their lazy versions ("+?", "*?", "??");
28+
range quantifiers: "{x}" (exactly x occurrences), "{x,y}" (at least x, at most y, occurrences), {x,} (x occurrences or more), and their lazy versions;
29+
the beginning-of-input ("^") and end-of-input ("$") anchors;
30+
simple grouping ("(...)") and alternation ("|").
31+
Finally, implementations MUST NOT take regular expressions to be anchored, neither at the beginning nor at the end. This means, for instance, the pattern "es" matches "expression".
32+
```
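
The unanchored requirement in the last paragraph of the quote can be illustrated with the JDK alone. This is a minimal sketch, not library code; the class name `UnanchoredMatchDemo` and its methods are invented for the example. It contrasts `find()`, which searches anywhere in the input, with `matches()`, which requires the whole input to match.

```java
import java.util.regex.Pattern;

// Illustrative only: contrasts find() and matches() from java.util.regex.
class UnanchoredMatchDemo {

    // Unanchored: succeeds if the pattern occurs anywhere in the input.
    static boolean unanchored(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    // Implicitly anchored: succeeds only if the entire input matches.
    static boolean anchored(String regex, String input) {
        return Pattern.compile(regex).matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println("find():    " + unanchored("es", "expression"));
        System.out.println("matches(): " + anchored("es", "expression"));
    }
}
```

With `find()` the pattern `es` matches `expression`, as the specification requires; with `matches()` it would not.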
33+
34+
## Considerations when selecting implementation
35+
36+
If strict compliance with the regular expression dialect described in ECMA-262 is required, only the `GraalJS` implementation meets that criterion.
37+
38+
The `Joni` implementation is configured to attempt to match the ECMA-262 regular expression dialect. However, this dialect does not come from its upstream, `Oniguruma`, and is not directly maintained by the Joni maintainers. The current implementation has known issues with matching inputs that contain newlines and with respecting the `^` and `$` anchors.
39+
40+
The `JDK` implementation is the default and uses `java.util.regex` with the `find()` method.
41+
42+
As the implementations are also used to check that a value is itself a valid regular expression, using `format` `regex`, one consideration is how that regular expression is subsequently used. For instance, if the system that consumes the input is implemented in JavaScript, the `GraalJS` implementation will ensure that the regular expression works there. If the consuming system is implemented in Java, the `JDK` implementation may be the better choice.
43+
44+
## Configuration of implementation
45+
46+
The following test case shows how to pass a config object to use the `GraalJS` factory.
47+
48+
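```java
import static org.junit.jupiter.api.Assertions.assertFalse;

import java.util.Set;
import org.junit.jupiter.api.Test;
import com.networknt.schema.InputFormat;
import com.networknt.schema.JsonSchema;
import com.networknt.schema.JsonSchemaFactory;
import com.networknt.schema.SchemaValidatorsConfig;
import com.networknt.schema.SpecVersion.VersionFlag;
import com.networknt.schema.ValidationMessage;
import com.networknt.schema.regex.GraalJSRegularExpressionFactory;

public class RegularExpressionTest {
    @Test
    public void testInvalidRegexValidatorECMA262() throws Exception {
        // Select the GraalJS implementation for regular expressions
        SchemaValidatorsConfig config = SchemaValidatorsConfig.builder()
                .regularExpressionFactory(GraalJSRegularExpressionFactory.getInstance())
                .build();
        JsonSchemaFactory factory = JsonSchemaFactory.getInstance(VersionFlag.V202012);
        JsonSchema schema = factory.getSchema("{\r\n"
                + "  \"format\": \"regex\"\r\n"
                + "}", config);
        // Format assertions are off by default since Draft 2019-09, so enable them
        Set<ValidationMessage> errors = schema.validate("\"\\\\a\"", InputFormat.JSON, executionContext -> {
            executionContext.getExecutionConfig().setFormatAssertionsEnabled(true);
        });
        assertFalse(errors.isEmpty());
    }
}
```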
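
The instance in the test above is the JSON text `"\\a"`, which decodes to the two-character string `\a`. The JDK treats `\a` as the alert (bell) character U+0007, so the default JDK factory would presumably accept it as a valid regular expression, whereas in ECMA-262 Unicode mode `\a` is understood not to be a valid escape. The JDK half of that claim can be checked with the sketch below, which uses only `java.util.regex`; the class name `BellEscapeDemo` is made up for illustration.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

// Illustrative only: shows how the JDK interprets the pattern used in the test.
class BellEscapeDemo {

    // The regular expression under test: a backslash followed by 'a'.
    static final String REGEX = "\\a";

    // Returns true if java.util.regex accepts the pattern.
    static boolean jdkAccepts(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    // Returns true if the pattern is found in the input.
    static boolean jdkFind(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        System.out.println("JDK accepts the pattern: " + jdkAccepts(REGEX));
        // The bell character is U+0007.
        System.out.println("Matches the bell character: " + jdkFind(REGEX, String.valueOf((char) 7)));
        System.out.println("Matches the letter a: " + jdkFind(REGEX, "a"));
    }
}
```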
66+
67+
## Performance
68+
69+
The following is the relative performance of the different implementations.
70+
71+
```
72+
Benchmark Mode Cnt Score Error Units
73+
RegularExpressionBenchmark.graaljs thrpt 6 362696.226 ± 15811.099 ops/s
74+
RegularExpressionBenchmark.graaljs:gc.alloc.rate thrpt 6 2584.386 ± 112.708 MB/sec
75+
RegularExpressionBenchmark.graaljs:gc.alloc.rate.norm thrpt 6 7472.003 ± 0.001 B/op
76+
RegularExpressionBenchmark.graaljs:gc.count thrpt 6 130.000 counts
77+
RegularExpressionBenchmark.graaljs:gc.time thrpt 6 144.000 ms
78+
RegularExpressionBenchmark.jdk thrpt 6 2776184.321 ± 41838.479 ops/s
79+
RegularExpressionBenchmark.jdk:gc.alloc.rate thrpt 6 1482.565 ± 22.343 MB/sec
80+
RegularExpressionBenchmark.jdk:gc.alloc.rate.norm thrpt 6 560.000 ± 0.001 B/op
81+
RegularExpressionBenchmark.jdk:gc.count thrpt 6 74.000 counts
82+
RegularExpressionBenchmark.jdk:gc.time thrpt 6 78.000 ms
83+
RegularExpressionBenchmark.joni thrpt 6 1810229.581 ± 35230.798 ops/s
84+
RegularExpressionBenchmark.joni:gc.alloc.rate thrpt 6 1463.887 ± 28.483 MB/sec
85+
RegularExpressionBenchmark.joni:gc.alloc.rate.norm thrpt 6 848.003 ± 0.001 B/op
86+
RegularExpressionBenchmark.joni:gc.count thrpt 6 73.000 counts
87+
RegularExpressionBenchmark.joni:gc.time thrpt 6 77.000 ms
88+
```
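
To make the comparison easier to read, the throughput scores above can be turned into ratios. The sketch below simply divides the reported `ops/s` figures; the class name `BenchmarkRatios` is hypothetical and the numbers are copied from the benchmark output, not re-measured.

```java
// Illustrative only: derives throughput ratios from the scores reported above.
class BenchmarkRatios {

    // Throughput scores in ops/s, copied from the benchmark output.
    static final double GRAALJS_OPS = 362696.226;
    static final double JDK_OPS = 2776184.321;
    static final double JONI_OPS = 1810229.581;

    // How many times faster the first implementation is than the second.
    static double ratio(double fasterOps, double slowerOps) {
        return fasterOps / slowerOps;
    }

    public static void main(String[] args) {
        System.out.printf("JDK vs GraalJS:  %.2fx%n", ratio(JDK_OPS, GRAALJS_OPS));
        System.out.printf("Joni vs GraalJS: %.2fx%n", ratio(JONI_OPS, GRAALJS_OPS));
        System.out.printf("JDK vs Joni:     %.2fx%n", ratio(JDK_OPS, JONI_OPS));
    }
}
```

On these figures the JDK implementation delivers roughly 7.7 times the throughput of GraalJS and about 1.5 times that of Joni, while Joni is about 5 times faster than GraalJS.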
89+
