Parsing document with a lot of HTML tags is slow #181
Comments
Have you tried to debug it with Blackfire or some other profiler?
I haven't yet, but I can add that the specific content is not that important; the number of tags is. So it looks like this library has a problem parsing big HTML pages. FYI, DOMDocument parses the sample in less than a second.
So, it looks like DOMElement::appendChild() is the main bottleneck. Here are some performance stats showing how the number of tags makes a difference. PHP 7.4.
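To check whether appendChild() itself scales badly with the number of tags, a micro-benchmark along these lines could help. This is only a sketch under assumptions: the function name `timeAppend`, the flat `<div>`/`<span>` structure, and the tag counts are hypothetical and not taken from this thread, and whether a slowdown shows up will depend on the PHP version.

```php
<?php
// Hypothetical micro-benchmark (not the script used in this thread):
// times creating and appending $count <span> elements under one <div>,
// either without a namespace or in the given namespace.
function timeAppend(int $count, ?string $namespace): float
{
    $doc = new DOMDocument();
    $root = $namespace === null
        ? $doc->createElement('div')
        : $doc->createElementNS($namespace, 'div');
    $doc->appendChild($root);

    $start = microtime(true);
    for ($i = 0; $i < $count; $i++) {
        $child = $namespace === null
            ? $doc->createElement('span')
            : $doc->createElementNS($namespace, 'span');
        $root->appendChild($child);
    }

    return microtime(true) - $start;
}

// Doubling the tag count shows whether the time grows linearly or faster.
foreach ([1000, 2000, 4000] as $count) {
    $plain = timeAppend($count, null);
    $namespaced = timeAppend($count, 'http://www.w3.org/1999/xhtml');
    printf("%5d tags: plain %.4fs, namespaced %.4fs\n", $count, $plain, $namespaced);
}
```

If the namespaced column grows much faster than the plain one, that would point at namespace handling in ext/dom rather than at the parser's tokenizer.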
Can you try to benchmark
Nope, and it's the other way round (more tags, better time per tag). What's more, the following script is blazingly fast (<1 sec).
Hmm, weird...
This turned out to be a PHP issue that can be worked around by doing:

    $html5 = new Masterminds\HTML5([
        'disable_html_ns' => true
    ]);
    $node = $html5->loadHTML($html);

The perf issue was introduced by https://github.com/php/php-src/blob/35e0a91db717fe441a89ca9554d8843d8ee63112/ext/dom/php_dom.c and php/php-src@84b90f6
Thanks for the workaround. With it, my initial test script takes 8 seconds, which is not that bad. DOMDocument needs 0.3 seconds. Did you already create a ticket in PHP's bug tracker?
I have a script that generates an HTML sample that is ~1.5MB in size. It emulates a real-world example. Then I parse it.
And here's the result:
I tested this with 2.7.0 and some older versions with no success. A sample half that size works, but it takes 27 seconds to finish (so it's not linear).
Cross-ref: roundcube/roundcubemail#7331
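For anyone wanting to reproduce this, a tag-heavy sample can be generated and timed roughly as follows. This is a sketch under assumptions, not the original script: `generateSample`, the table markup, and the row count are hypothetical, chosen only to produce a document on the order of 1.5MB. It times PHP's built-in DOMDocument as a baseline; the parser under test can be swapped in at the marked spot.

```php
<?php
// Hypothetical generator: the tag mix and row count are assumptions,
// not the reporter's actual script.
function generateSample(int $rows): string
{
    $html = '<html><body><table>';
    for ($i = 0; $i < $rows; $i++) {
        $html .= "<tr><td><b>row</b> <i>$i</i></td>"
            . "<td><a href=\"#$i\">link</a></td></tr>";
    }

    return $html . '</table></body></html>';
}

$html = generateSample(20000);

// Collect libxml warnings internally instead of emitting PHP warnings.
libxml_use_internal_errors(true);

// Baseline: PHP's built-in parser. Replace these two lines with the
// parser under test to compare.
$start = microtime(true);
$doc = new DOMDocument();
$doc->loadHTML($html);
$elapsed = microtime(true) - $start;

printf(
    "%d bytes, %d elements, %.3fs\n",
    strlen($html),
    $doc->getElementsByTagName('*')->length,
    $elapsed
);
```

Running it with several row counts (for example half and full size) makes it easy to see whether parse time grows linearly with the number of tags.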