Hello!
I understand the general idea of streaming:
1. You iterate over the XML document.
2. On each element end, you check whether the element matches the streaming expression and, if it does, you return it.
3. You remove the match and all of its previous siblings so that you don't keep the entire DOM in memory (that would defeat the whole purpose of streaming).
4. GOTO 1
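The loop above can be sketched in Go with the standard library's `encoding/xml` tokenizer. This is a simplified model under stated assumptions, not xmlquery's implementation: `Node`, `streamer`, `isRootChild` (a stand-in for the streaming expression `//root/child` from the example in this issue) and the other helpers are hypothetical, and only element nodes and their trimmed text are kept.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"io"
	"strings"
)

// Node is a hypothetical, minimal element node; it is not xmlquery's Node.
type Node struct {
	Name     string
	Text     string
	Parent   *Node
	Children []*Node
}

// streamer is a hypothetical model of the streaming loop.
type streamer struct {
	dec   *xml.Decoder
	cur   *Node            // innermost open element
	last  *Node            // match returned by the previous Read
	match func(*Node) bool // stand-in for the streaming XPath
}

func newStreamer(r io.Reader, match func(*Node) bool) *streamer {
	return &streamer{dec: xml.NewDecoder(r), cur: &Node{Name: "#document"}, match: match}
}

// Read returns the next matching element, or an error (io.EOF at the end).
func (s *streamer) Read() (*Node, error) {
	if s.last != nil {
		// Step 3: detach the previous match and all of its previous siblings.
		// Copying the tail lets the dropped nodes be garbage-collected.
		p := s.last.Parent
		for i, c := range p.Children {
			if c == s.last {
				p.Children = append([]*Node(nil), p.Children[i+1:]...)
				break
			}
		}
		s.last = nil
	}
	for { // Step 1: keep iterating over the document
		tok, err := s.dec.Token()
		if err != nil {
			return nil, err
		}
		switch t := tok.(type) {
		case xml.StartElement:
			n := &Node{Name: t.Name.Local, Parent: s.cur}
			s.cur.Children = append(s.cur.Children, n)
			s.cur = n
		case xml.CharData:
			s.cur.Text += strings.TrimSpace(string(t))
		case xml.EndElement:
			n := s.cur
			s.cur = n.Parent
			if s.match(n) { // Step 2: on element end, test the expression
				s.last = n
				return n, nil
			}
		}
	}
}

// isRootChild plays the role of the streaming expression //root/child.
func isRootChild(n *Node) bool {
	return n.Name == "child" && n.Parent != nil && n.Parent.Name == "root"
}

// childText returns the text of n's first child element with the given name.
func childText(n *Node, name string) string {
	for _, c := range n.Children {
		if c.Name == name {
			return c.Text
		}
	}
	return ""
}

// streamChildNames reads every match and collects its child_name text.
func streamChildNames(doc string) []string {
	s := newStreamer(strings.NewReader(doc), isRootChild)
	var names []string
	for {
		m, err := s.Read()
		if err != nil {
			return names // io.EOF (or, in this sketch, any parse error) ends the stream
		}
		names = append(names, childText(m, "child_name"))
	}
}

const input = `<xml><root>
  <name>Name</name>
  <child><child_name>Child name1</child_name></child>
  <child><child_name>Child name2</child_name></child>
</root></xml>`

func main() {
	fmt.Println(streamChildNames(input)) // [Child name1 Child name2]
}
```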
An unfortunate side effect is that some use cases that could work in theory fail because the previous-sibling removal is very aggressive.
Take this input XML as an example:

```xml
<xml>
  <root>
    <name>Name</name>
    <child>
      <child_name>Child name1</child_name>
    </child>
    <child>
      <child_name>Child name2</child_name>
    </child>
  </root>
</xml>
```
For the streaming expression `//root/child`, one may expect that an expression such as `/parent::root/name` would also work. But because of the cleanup logic, it works only for the first streaming match. On the next `Read()` invocation, the match along with all of its previous siblings (namely the `name` element) is removed, and the parser returns the second match. This time the parent expression no longer works, because the `name` element has already been removed from the parent node.
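To make the failure concrete, here is a small Go model that builds by hand the tree held between the two `Read()` calls, applies cleanup as described above (the previous match plus all earlier siblings), and then looks for `name` through the parent of each match. It is an illustration under that assumption, not xmlquery code; `Node`, `removeMatchAndPrevSiblings`, `findChild` and `replay` are hypothetical helpers.

```go
package main

import "fmt"

// Node is a hypothetical, minimal element node; it is not xmlquery's Node.
type Node struct {
	Name     string
	Parent   *Node
	Children []*Node
}

// add appends a new child element named name to p and returns it.
func (p *Node) add(name string) *Node {
	c := &Node{Name: name, Parent: p}
	p.Children = append(p.Children, c)
	return c
}

// removeMatchAndPrevSiblings models the current cleanup: the previous
// match and every sibling before it are detached, whatever they are.
func removeMatchAndPrevSiblings(m *Node) {
	p := m.Parent
	for i, c := range p.Children {
		if c == m {
			p.Children = append([]*Node(nil), p.Children[i+1:]...)
			return
		}
	}
}

// findChild returns n's first child element with the given name, or nil.
func findChild(n *Node, name string) *Node {
	for _, c := range n.Children {
		if c.Name == name {
			return c
		}
	}
	return nil
}

// replay simulates two Read() calls for //root/child on the example
// document and reports whether <name> is reachable via each match's parent.
func replay() (first, second bool) {
	root := (&Node{Name: "xml"}).add("root")
	root.add("name")
	m1 := root.add("child") // returned by the first Read()
	first = findChild(m1.Parent, "name") != nil
	removeMatchAndPrevSiblings(m1) // cleanup at the start of the second Read()
	m2 := root.add("child")        // parsed and returned by the second Read()
	second = findChild(m2.Parent, "name") != nil
	return first, second
}

func main() {
	first, second := replay()
	fmt.Println("name reachable from 1st match:", first)  // true
	fmt.Println("name reachable from 2nd match:", second) // false
}
```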
A possible solution would be to modify the `StreamParser::Read` method so that the cleanup loop only removes nodes for which `QuerySelector(node, sp.p.streamElementXPath) != nil`. This would remove matching nodes but keep everything else. I understand that this would increase memory usage and potentially slow down streaming, so it could be made configurable in the streaming parser's constructor function.
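A minimal sketch of the proposed selective cleanup, again on a hand-built model rather than xmlquery itself. The `isStreamMatch` predicate stands in for the `QuerySelector(node, sp.p.streamElementXPath) != nil` check proposed above, and `keepNonMatching` is a hypothetical option name for the constructor flag, not an existing xmlquery parameter.

```go
package main

import "fmt"

// Node is a hypothetical, minimal element node; it is not xmlquery's Node.
type Node struct {
	Name     string
	Parent   *Node
	Children []*Node
}

// add appends a new child element named name to p and returns it.
func (p *Node) add(name string) *Node {
	c := &Node{Name: name, Parent: p}
	p.Children = append(p.Children, c)
	return c
}

// cleanup runs at the start of Read(). The previous match m is always
// detached. Earlier siblings are detached too, except that with
// keepNonMatching set only those satisfying isStreamMatch are dropped.
func cleanup(m *Node, isStreamMatch func(*Node) bool, keepNonMatching bool) {
	p := m.Parent
	var kept []*Node
	seen := false
	for _, c := range p.Children {
		switch {
		case seen:
			kept = append(kept, c) // siblings after m are untouched
		case c == m:
			seen = true // drop the previous match itself
		case keepNonMatching && !isStreamMatch(c):
			kept = append(kept, c) // keep context such as <name>
		}
	}
	p.Children = kept
}

// replay simulates two Read() calls on the example document and reports
// whether <name> is reachable from the second match's parent, plus how
// many children root holds at that point.
func replay(keepNonMatching bool) (nameVisible bool, rootChildren int) {
	isStreamMatch := func(n *Node) bool { // stand-in for //root/child
		return n.Name == "child" && n.Parent != nil && n.Parent.Name == "root"
	}
	root := (&Node{Name: "xml"}).add("root")
	root.add("name")
	m1 := root.add("child")                     // returned by the first Read()
	cleanup(m1, isStreamMatch, keepNonMatching) // start of the second Read()
	m2 := root.add("child")                     // returned by the second Read()
	for _, c := range m2.Parent.Children {
		if c.Name == "name" {
			nameVisible = true
		}
	}
	return nameVisible, len(root.Children)
}

func main() {
	for _, keep := range []bool{false, true} {
		visible, n := replay(keep)
		fmt.Printf("keepNonMatching=%v: name visible=%v, root children=%d\n", keep, visible, n)
	}
}
```

In this model the selective mode leaves `root` with two children (`name` and the current match) instead of one. In a real tree, every non-matching node between matches (for example whitespace-only text nodes) would likely be kept as well and accumulate over a long stream, which is the memory cost that makes an opt-in flag sensible.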
If that's still not something you are comfortable with, it would be great to export the `getQuery` and `createParser` functions as well as the `parser` struct, so that users can write their own `StreamParser` implementation.
I am happy to provide a PR for either option 😄