Better handling of previous siblings when streaming #130

@Andrewsville

Hello!

I understand the general idea of streaming:

  1. You iterate over the XML document.
  2. On element end you check if it matches the streaming expression and if it does, you return it.
  3. You remove all previous siblings of the match so that the entire DOM is not kept in memory (which would defeat the purpose of streaming).
  4. GOTO 1
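
To make the steps above concrete, here is a minimal sketch of such a loop built only on the standard library's encoding/xml. It is not xmlquery's code: the Node and Stream types, NewStream, and trace are hypothetical names for illustration, and matching is simplified to an element name instead of an XPath expression.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"io"
	"strings"
)

// Node is a hypothetical, minimal DOM node used only in this sketch.
type Node struct {
	Name     string
	Parent   *Node
	Children []*Node
}

// Stream is a toy stand-in for a stream parser; it is not xmlquery's type.
type Stream struct {
	dec   *xml.Decoder
	cur   *Node  // element currently being built
	prev  *Node  // match returned by the previous Read call
	match string // element name to stream (assumption: name match, no XPath)
}

func NewStream(r io.Reader, match string) *Stream {
	return &Stream{dec: xml.NewDecoder(r), cur: &Node{}, match: match}
}

// Read returns the next matching element, or an error such as io.EOF.
func (s *Stream) Read() (*Node, error) {
	// Step 3: drop the previous match together with all of its previous siblings.
	if s.prev != nil {
		p := s.prev.Parent
		for i, c := range p.Children {
			if c == s.prev {
				// Copy the tail so the dropped nodes can be garbage collected.
				p.Children = append([]*Node(nil), p.Children[i+1:]...)
				break
			}
		}
		s.prev = nil
	}
	for { // Steps 1 and 4: keep iterating over the document.
		tok, err := s.dec.Token()
		if err != nil {
			return nil, err
		}
		switch t := tok.(type) {
		case xml.StartElement:
			n := &Node{Name: t.Name.Local, Parent: s.cur}
			s.cur.Children = append(s.cur.Children, n)
			s.cur = n
		case xml.EndElement:
			done := s.cur
			s.cur = done.Parent
			if done.Name == s.match { // Step 2: return the match on element end.
				s.prev = done
				return done, nil
			}
		}
	}
}

// trace records, for every match, which children its parent still holds.
func trace(input, match string) string {
	s := NewStream(strings.NewReader(input), match)
	var seen []string
	for {
		n, err := s.Read()
		if err != nil {
			break // io.EOF or a syntax error ends the sketch
		}
		var names []string
		for _, c := range n.Parent.Children {
			names = append(names, c.Name)
		}
		seen = append(seen, strings.Join(names, ","))
	}
	return strings.Join(seen, " | ")
}

func main() {
	fmt.Println(trace(`<root><name>N</name><child/><child/></root>`, "child"))
}
```

Running this prints `name,child | child`: the parent still holds the name element at the first match, but not at the second.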

An unfortunate side effect is that some use cases that could theoretically work don't, because the previous-sibling removal is very aggressive.

Take this input XML as an example:

<xml>
    <root>
        <name>Name</name>
        <child>
            <child_name>Child name1</child_name>
        </child>
        <child>
            <child_name>Child name2</child_name>
        </child>
    </root>
</xml>

For the streaming expression //root/child, one might expect that evaluating /parent::root/name against each match would work. Because of the cleanup logic, however, it works only for the first streaming match. On the next Read() invocation, that match and all of its previous siblings (namely the name element) are removed, and the parser returns the second match. This time the parent expression no longer works, because the parent node no longer contains the name element.

A possible solution is to modify the StreamParser::Read method so that the cleanup loop removes only nodes where QuerySelector(node, sp.p.streamElementXPath) != nil. This would remove matching nodes but keep everything else. I understand that it would increase memory usage and could slow down streaming, so it could be made configurable in the streaming parser constructor function.
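
A rough sketch of what such a selective cleanup could look like, on a toy tree and not on xmlquery's real types. The Node type, the prune function, its keepNonMatching flag, and the isChild predicate are all hypothetical; in the library the predicate would be the QuerySelector check mentioned above.

```go
package main

import (
	"fmt"
	"strings"
)

// Node is a hypothetical minimal tree node for this sketch only.
type Node struct {
	Name     string
	Parent   *Node
	Children []*Node
}

// prune removes prev and its previous siblings from prev.Parent.
// With keepNonMatching set, only previous siblings for which matches
// returns true are dropped (the proposal). Without it, every previous
// sibling is dropped (the aggressive cleanup).
func prune(prev *Node, matches func(*Node) bool, keepNonMatching bool) {
	p := prev.Parent
	kept := make([]*Node, 0, len(p.Children))
	passed := false // true once the loop has walked past prev
	for _, c := range p.Children {
		switch {
		case passed:
			kept = append(kept, c) // following siblings are never touched
		case keepNonMatching && c != prev && !matches(c):
			kept = append(kept, c) // non-matching previous sibling survives
		}
		if c == prev {
			passed = true
		}
	}
	p.Children = kept
}

// isChild stands in for the streaming expression check.
func isChild(n *Node) bool { return n.Name == "child" }

// build returns a root holding name, child, child, plus the first child.
func build() (root, first *Node) {
	root = &Node{Name: "root"}
	for _, name := range []string{"name", "child", "child"} {
		root.Children = append(root.Children, &Node{Name: name, Parent: root})
	}
	return root, root.Children[1]
}

// names lists the element names of n's children, comma separated.
func names(n *Node) string {
	out := make([]string, 0, len(n.Children))
	for _, c := range n.Children {
		out = append(out, c.Name)
	}
	return strings.Join(out, ",")
}

func main() {
	root, first := build()
	prune(first, isChild, false)
	fmt.Println("aggressive:", names(root))

	root, first = build()
	prune(first, isChild, true)
	fmt.Println("selective: ", names(root))
}
```

With the flag off only the second child remains under root; with it on, the name element survives alongside it. The cost is the one noted above: everything that does not match stays in memory for the rest of the stream.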

If that's still not something you are comfortable with, it would be great to export the functions getQuery and createParser, as well as the parser struct, so that it is possible to create a custom StreamParser implementation.

I am happy to provide a PR for each option 😄
