Better handling of previous siblings when streaming

Hello!

I understand the general idea of streaming:

1. You iterate over the XML document.
2. At each element end, you check whether it matches the streaming expression and, if it does, you return it.
3. You remove all previous siblings of the match so that you don't have the entire DOM in memory (keeping it would defeat the purpose of streaming).
4. GOTO 1
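
To make sure we are talking about the same thing, here is a minimal sketch of that loop using only the standard library. It is not the library's actual code: the `node` type, `streamMatches`, `siblingNames` and `run` are hypothetical names I made up for illustration, matching is by plain element name instead of XPath, and text nodes are ignored.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"io"
	"strings"
)

// node is a hypothetical, minimal stand-in for a DOM node.
type node struct {
	name     string
	parent   *node
	children []*node
}

// streamMatches reads the document token by token (step 1) and calls visit
// for every element named target (step 2). After visit returns, the match
// and all of its previous siblings are dropped from the tree (step 3).
func streamMatches(r io.Reader, target string, visit func(match *node)) error {
	dec := xml.NewDecoder(r)
	cur := &node{name: "#document"}
	for { // step 4: keep going until the input is exhausted
		tok, err := dec.Token()
		if err == io.EOF {
			return nil
		}
		if err != nil {
			return err
		}
		switch t := tok.(type) {
		case xml.StartElement:
			child := &node{name: t.Name.Local, parent: cur}
			cur.children = append(cur.children, child)
			cur = child
		case xml.EndElement:
			done := cur
			cur = cur.parent
			if done.name == target {
				visit(done)
				// The match is the last child seen so far, so clearing the
				// slice removes it together with every previous sibling.
				cur.children = nil
			}
		}
	}
}

// siblingNames lists the element names still attached to n's parent.
func siblingNames(n *node) []string {
	var names []string
	for _, c := range n.parent.children {
		names = append(names, c.name)
	}
	return names
}

// run records, for each <child> match, what its parent still contains.
func run(doc string) ([][]string, error) {
	var seen [][]string
	err := streamMatches(strings.NewReader(doc), "child", func(m *node) {
		seen = append(seen, siblingNames(m))
	})
	return seen, err
}

func main() {
	seen, err := run(`<xml><root><name>Name</name><child/><child/></root></xml>`)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for i, names := range seen {
		fmt.Printf("match %d sees %v\n", i+1, names)
	}
}
```

In this model the first match still sees `name` next to it, while the second match sees only itself, because `name` was dropped together with the first match.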

An unfortunate side effect is that some use cases that could work in theory do not, because the previous-sibling removal is very aggressive.

Take this input XML as an example:

```
<xml>
    <root>
        <name>Name</name>
        <child>
            <child_name>Child name1</child_name>
        </child>
        <child>
            <child_name>Child name2</child_name>
        </child>
    </root>
</xml>
```

For a streaming expression `//root/child`, one might expect an expression such as `/parent::root/name`, evaluated against each match, to work. Because of the cleanup logic, however, it works only for the first streaming match. On the next `Read()` invocation, the match and all its previous siblings (namely the `name` element) are removed, and the parser returns the second match. This time the parent expression no longer works, because the parent node no longer has the `name` element as a child.

A possible solution is to modify the `StreamParser::Read` method so that the cleanup loop only removes nodes where `QuerySelector(node, sp.p.streamElementXPath) != nil`. This would remove matching nodes but keep everything else. I understand that it would increase memory usage and potentially slow down streaming, so it could be made configurable in the streaming parser constructor function.
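
As a rough illustration of the idea (not a patch against the library), the selective cleanup could look something like the sketch below. Everything in it is an assumption on my side: the slice-based `node` type, the `pruneBefore` and `names` helpers, the `keepNonMatching` switch and the `isTarget` callback, which stands in for the `QuerySelector` check.

```go
package main

import "fmt"

// node is a hypothetical, minimal stand-in for a DOM node.
type node struct {
	name     string
	children []*node
}

// pruneBefore cleans up parent after the child at index match was returned.
// With keepNonMatching false it drops the match and every previous sibling
// (the aggressive behavior described above). With keepNonMatching true it
// drops only those nodes for which isTarget reports true.
// Siblings after the match are never touched.
func pruneBefore(parent *node, match int, keepNonMatching bool, isTarget func(*node) bool) {
	kept := make([]*node, 0, len(parent.children))
	for i, c := range parent.children {
		switch {
		case i > match:
			kept = append(kept, c) // not reached by the stream yet
		case keepNonMatching && !isTarget(c):
			kept = append(kept, c) // e.g. <name>, stays visible to later matches
		}
	}
	parent.children = kept
}

// names lists the element names of parent's remaining children.
func names(parent *node) []string {
	out := make([]string, 0, len(parent.children))
	for _, c := range parent.children {
		out = append(out, c.name)
	}
	return out
}

func main() {
	isChild := func(n *node) bool { return n.name == "child" }
	newRoot := func() *node {
		return &node{name: "root", children: []*node{
			{name: "name"}, {name: "child"}, {name: "child"},
		}}
	}

	aggressive := newRoot()
	pruneBefore(aggressive, 1, false, isChild)
	fmt.Println("aggressive:", names(aggressive))

	selective := newRoot()
	pruneBefore(selective, 1, true, isChild)
	fmt.Println("selective: ", names(selective))
}
```

After the first `child` is consumed, the aggressive variant leaves only the second `child` under `root`, while the selective variant also keeps `name`. The cost is that every non-matching sibling (presumably including whitespace text nodes between matches) stays in memory for the lifetime of the parent, which is why I would make it opt-in.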

If that's still not something you are comfortable with, it would be great to export the functions `getQuery` and `createParser` as well as the `parser` struct, so that it is possible to write a custom StreamParser implementation.

I am happy to provide a PR for each option 😄

Better handling of previous siblings when streaming #130
