Parsing document with a lot of HTML tags is slow

I have a script that generates an HTML sample of about 1.5 MB, emulating a real-world document, and then parses it:
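```php
<?php
require __DIR__ . '/vendor/autoload.php'; // Composer autoloader for masterminds/html5

// Build the sample document by appending the same fragment 20,000 times.
$html = '<HTML><BODY>';
$lines = 20000;
while ($lines--) {
    $html .= '&gt;&gt; ';
}

// Parse it with the HTML5 parser; loadHTML() returns a DOMDocument.
$html5 = new Masterminds\HTML5();
$node = $html5->loadHTML($html);
```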
Here's the result:
```
PHP Fatal error: Maximum execution time of 120 seconds exceeded in vendor/masterminds/html5/src/HTML5/Parser/DOMTreeBuilder.php on line 433
PHP Stack trace:
PHP 1. {main}() test.php:0
PHP 2. Masterminds\HTML5->loadHTML() test.php:23
PHP 3. Masterminds\HTML5->parse() vendor/masterminds/html5/src/HTML5.php:98
PHP 4. Masterminds\HTML5\Parser\Tokenizer->parse() vendor/masterminds/html5/src/HTML5.php:174
PHP 5. Masterminds\HTML5\Parser\Tokenizer->consumeData() vendor/masterminds/html5/src/HTML5/Parser/Tokenizer.php:89
PHP 6. Masterminds\HTML5\Parser\Tokenizer->tagOpen() vendor/masterminds/html5/src/HTML5/Parser/Tokenizer.php:132
PHP 7. Masterminds\HTML5\Parser\Tokenizer->tagName() vendor/masterminds/html5/src/HTML5/Parser/Tokenizer.php:284
PHP 8. Masterminds\HTML5\Parser\DOMTreeBuilder->startTag() vendor/masterminds/html5/src/HTML5/Parser/Tokenizer.php:388
```
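The trace only shows that the full sample runs past the time limit. To see how parse time grows with input size, a small timing harness can help. This is a minimal sketch assuming PHP 7.4 or later; `buildSample()`, `timeParse()`, `measureScaling()` and `growthRatios()` are hypothetical helpers, not part of masterminds/html5, and `buildSample()` simply reuses the per-line fragment from the script above.

```php
<?php
// Hypothetical helpers for measuring how parse time scales with input size.
// None of these functions are part of masterminds/html5.

// Builds a sample like the repro script: a fixed prefix plus $lines copies
// of the same per-line fragment.
function buildSample(int $lines): string
{
    return '<HTML><BODY>' . str_repeat('&gt;&gt; ', $lines);
}

// Returns the wall-clock seconds spent inside $parse($input).
function timeParse(callable $parse, string $input): float
{
    $start = microtime(true);
    $parse($input);
    return microtime(true) - $start;
}

// Times $parse on samples of $start, 2*$start, 4*$start, ... lines
// ($steps samples in total) and returns [lines => seconds].
function measureScaling(callable $parse, int $start, int $steps): array
{
    $timings = [];
    for ($i = 0, $lines = $start; $i < $steps; $i++, $lines *= 2) {
        $timings[$lines] = timeParse($parse, buildSample($lines));
    }
    return $timings;
}

// Ratio of each timing to the previous one. Roughly 2 per doubling suggests
// linear growth, roughly 4 suggests quadratic growth. A zero timing is never
// used as a divisor.
function growthRatios(array $timings): array
{
    $ratios = [];
    $previous = null;
    foreach ($timings as $seconds) {
        if ($previous !== null && $previous > 0.0) {
            $ratios[] = $seconds / $previous;
        }
        $previous = $seconds;
    }
    return $ratios;
}
```

With the library autoloaded, calling `measureScaling(fn (string $html) => $html5->loadHTML($html), 2500, 4)` would time samples of 2,500, 5,000, 10,000 and 20,000 lines; passing the result to `growthRatios()` and seeing values well above 2 would point to super-linear behaviour.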
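The same repro fails with 2.7.0 and several older versions. A sample half that size does parse, but it takes 27 seconds. If parse time grew linearly, the full sample should finish in roughly 54 seconds, well within the 120-second limit, so the growth is clearly not linear.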
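Under the rough assumption that parse time follows a power law $t(n) \propto n^k$ in the input size $n$, the two measurements (27 s for half the sample, more than 120 s for the full one) put a lower bound on the exponent:

$$
2^{k} = \frac{t(n)}{t(n/2)} > \frac{120\,\mathrm{s}}{27\,\mathrm{s}} \approx 4.44
\quad\Longrightarrow\quad
k > \log_2 4.44 \approx 2.15
$$

A linear parser ($k = 1$) would only double the time. Because the full run was killed at the limit, this is only a lower bound, and the simple model ignores fixed overhead such as building the input string.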

Cross-ref: https://github.com/roundcube/roundcubemail/issues/7331

Parsing document with a lot of HTML tags is slow #181
