Performance bottleneck in prune_unwanted_nodes causing 200ms per call #750
Labels: question (Further information is requested)
When profiling the trafilatura.bare_extraction method on some pages that took us a while to parse, I found significant performance issues in the extract_content method.
Root cause: too many calls to prune_unwanted_nodes, with each call taking up to ~200 ms.

Steps to reproduce:
bare_extraction
prune_unwanted_nodes
timing

Questions