Skipping array elements on a domain specific parse error #291

@jonahaapala

Description

First off, I just found this library less than a month ago and I've been blown away by how declarative it makes my code. I was looking for a way to create a domain-specific JSON parser and had started on one myself, but since finding this project I really want to make it work for my use case. With that in mind, I am struggling to express a way to handle certain kinds of parsing errors that doesn't completely bail on parsing the current payload.

My Problem

The API I own receives JSON payloads that are always an array of batches of data, and there are certain validations that, if violated, must not fail parsing entirely. Instead, I need to skip to the next batch in the array and continue parsing from there. For example:

[
  {
    "something": {
      "key1": "hello",
      "key2": 1337
    },
    "listOfThings": [
      {
        "key3": "world",
        "key4": true,
        "key5": {
          "foo": "bar",
          "baz": 42.0
        }
      },
      ...
    ]
  },
  ...
]

The Java code to model this would be as follows:

@CompiledJson
public record Batch(Something something, List<Thing> listOfThings) {}
public record Something(String key1, int key2) {}
public record Thing(String key3, boolean key4, Map<String, Object> key5) {}

DslJson<?> json = new DslJson<>(...);
List<Batch> batches = json.deserializeList(Batch.class, payloadBytes, payloadBytes.length);

The generated code will then require that the value associated with "key1" (for instance) be a String. If it is not a String, then a parsing error is thrown and I can no longer continue parsing the payload. This requirement that the value be a String is not a JSON parsing error, but a domain specific error. As long as it's valid JSON, what I would like is to have the option to just skip the rest of the current object and continue parsing from the next array element. For error reporting back to our users, though, I would also like to have some ability to know that parsing failed, but I'll come back to that.
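To make the desired behavior concrete, here is a self-contained sketch of the per-element bulkhead I'm describing, using only the standard library. ElementParser is a hypothetical stand-in for the generated per-element reader, not a dsl-json type; the point is only that a failure skips that one element and records the error rather than aborting the whole payload:

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public class LenientParsing {
    // Hypothetical stand-in for a generated per-element reader.
    interface ElementParser<T> {
        T parse(String raw) throws IOException;
    }

    // Parse each element independently; a failure skips that element
    // and records the message for user-facing error reporting.
    static <T> List<T> parseLeniently(List<String> rawElements,
                                      ElementParser<T> parser,
                                      List<String> errors) {
        List<T> out = new ArrayList<>();
        for (String raw : rawElements) {
            try {
                out.add(parser.parse(raw));
            } catch (IOException ex) {
                errors.add(ex.getMessage()); // skip this element, keep going
            }
        }
        return out;
    }
}
```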

Another example of a domain-specific requirement is when using some of the @JsonAttribute fields. For example, if a certain field is mandatory, then right now its absence would fail parsing of the larger payload rather than just the current object being parsed. Again, so long as it's valid JSON, it is desirable for these object-specific rules not to propagate to the rest of the payload. I've tried my hand at a few approaches to resolving these needs with the existing machinery, but I haven't come up with an ideal solution. As mentioned up top, I'm brand new to this library, so I may be missing some feature that makes this simple; I'm hoping you can point me in the right direction if that's the case!

Approach 1: Custom Converter

My first thought when trying to approach the type mismatch issue was to use a custom converter where I could bulkhead the parse exception:

@JsonConverter(target = String.class)
public class StringConverter {
    public static String read(JsonReader<?> reader) throws IOException {
        // Only consume a string when the next token actually starts one;
        // otherwise skip the whole (still valid JSON) value and fall back to null.
        if (reader.last() == '"') return reader.readSimpleString();
        reader.skip();
        return null;
    }

    // Converters also need a write side; delegate to the default behavior.
    public static void write(JsonWriter writer, String value) {
        if (value == null) writer.writeNull();
        else writer.writeString(value);
    }
}

This works, but feels a bit boilerplate-y. For numbers, it gets a little less straightforward:

@JsonConverter(target = long.class)
public class LongConverter {
    public static long read(JsonReader<?> reader) throws IOException {
        // Parse as a generic value and check the type ourselves.
        Object o = ObjectConverter.deserializeObject(reader);
        return o instanceof Long l ? l : -1L;
        // try { return NumberConverter.deserializeLong(reader); }
        // catch (IOException ex) { reader.skip(); return -1L; }
    }

    public static void write(JsonWriter writer, long value) {
        NumberConverter.serialize(value, writer);
    }
}

At first I thought about trying to parse assuming it's a long, which I think may still work, but I'm not sure I can rely on that (again, assuming it's all still valid JSON). It looks like deserializeLong() may already skip past the whole token via scanNumber(), without needing skip()? But you can see how having library support would be preferable to rolling my own. It's also questionable what value it should return to indicate an error.

Instead, it might be nice if when a parse exception occurs where the issue is a mismatched type, the generated code could catch that exception and skip the rest of the object. Or maybe if this was a specific feature of lists/arrays, it would bubble up to the iteration loop and skip until the start of the next array object.

Approach 2: Iterate Over

My particular use case really only cares to bulkhead elements of arrays. I did notice the nifty iterateOver() method, which logically treats each element as effectively its own JSON blob. This allows for something like the following:

Iterator<Batch> batchItr = json.iterateOver(Batch.class, payload);
List<Batch> batches = new ArrayList<>();

while (batchItr.hasNext()) {
    try {
        batches.add(batchItr.next());
    } catch (IOException ex) {
        // Still need to skip past the rest of the object
    }
}

This puts error handling back in the hands of the user, especially if the exception were its own type and invalid-JSON parsing errors could still bubble up completely. However, there is still the need for a method that would skip the rest of the partially parsed object. I've tinkered with doing that myself, but again it would be nice to have support provided by the library for more confidence. Perhaps next() or hasNext() could even be aware when the reader is in this state and self-correct?
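For what it's worth, the resync step itself doesn't seem to need anything library-specific. Here is a minimal standalone sketch of what I've been tinkering with (stdlib only, and assuming the payload is valid JSON and that pos sits somewhere inside the element that failed): it tracks bracket depth and string state until the failed element closes.

```java
public class JsonResync {
    // Scan forward from pos (inside a partially parsed array element) to the
    // index just past that element's closing '}' or ']', so iteration can
    // resume at the following comma or at the end of the enclosing array.
    // Assumes the input is valid JSON.
    static int skipToElementEnd(byte[] json, int pos) {
        int depth = 0;
        boolean inString = false;
        for (int i = pos; i < json.length; i++) {
            byte b = json[i];
            if (inString) {
                if (b == '\\') i++;             // skip the escaped character
                else if (b == '"') inString = false;
            } else if (b == '"') {
                inString = true;
            } else if (b == '{' || b == '[') {
                depth++;
            } else if (b == '}' || b == ']') {
                if (--depth < 0) return i + 1;  // closed the element we were inside
            }
        }
        return json.length;
    }
}
```

Braces and quotes inside string values are handled, but this is only a sketch of the idea; library support would still be preferable.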

Wrapping up

I recognize that trying to implement something like this probably opens up a whole host of questions depending on the approach taken:

  1. Should this only apply to arrays whereby regardless of object depth this should skip to the next array element?
  2. Or is there a more general problem here where domain failures like this just fail the object whose field just failed to parse?
  3. For general bulkheading converters, what is the default value? There are Java defaults, and I know the library has mechanisms like static jsonDefault() methods, but again I'm still new to all of this.

This was a lot of words, but hopefully my question makes sense. To recap: is there a natural way to bulkhead domain parsing errors so that they only affect the current object being parsed? And by "domain errors" I mean to exclude general invalid-JSON parsing errors and focus only on errors that arise from domain-specific requirements like expected types, nullability, mandatory fields, etc.

Thanks for any help/insight you can provide me here!
