Turn any PCRE pattern into an easy-to-work-with Abstract Syntax Tree (AST) so you can understand, validate, and safely optimize regexes in PHP.
This library is alpha. It parses and analyzes real-world patterns but is not fully validated against the entire PCRE spec.
Current status: core parsing validated • ReDoS detection fixed • behavioral compliance tests green • 140 tests / 284 assertions • Symfony + Rector + PHPStan integrations available.
- Parse PCRE patterns into a traversable AST.
- Get plain-English explanations for complex regexes.
- Validate semantics (lookbehinds, backreferences, nested quantifiers).
- Detect and score ReDoS risks before deployment.
- Generate sample strings and optimized patterns.
- Works with PHP 8.4+ and integrates with Symfony, Rector, and PHPStan.
composer require yoeunes/regex-parser

Requires PHP 8.4+ and ext-mbstring.

<?php
use RegexParser\Regex;
$regex = Regex::create();
$pattern = '/(?<email>[\\w.-]+@[\\w.-]+\\.\\w+)/i';
// 1) Explain it (plain English)
echo $regex->explain($pattern);
// 2) Validate it (syntax + semantics + ReDoS)
$result = $regex->validate($pattern);
echo $result->isValid ? 'OK' : $result->error;
// 3) Generate a matching sample
echo $regex->generate($pattern); // e.g. [email protected]
// 4) Check safety score
echo $regex->analyzeReDoS($pattern)->severity->value; // safe/low/...

use RegexParser\Regex;
use RegexParser\Exception\ParserException;
try {
$ast = Regex::create()->parse('/^Hello (?<name>\w+)!$/i');
echo $ast->flags; // i
} catch (ParserException $e) {
echo $e->getMessage();
}

use RegexParser\Regex;
$regex = Regex::create();
$result = $regex->validate('/(a+)*b/');
echo $result->isValid ? 'OK' : $result->error; // Potential catastrophic backtracking: nested quantifiers detected.
$result = $regex->validate('/(?<!a*b)/');
echo $result->isValid ? 'OK' : $result->error; // Variable-length quantifiers (*) are not allowed in lookbehinds.

use RegexParser\Regex;
echo Regex::create()->explain('/(foo|bar){1,2}?/s');

Output:
Regex matches (with flags: s):
Start Quantified Group (between 1 and 2 times (as few as possible)):
Start Capturing Group:
EITHER:
Literal: 'foo'
OR:
Literal: 'bar'
End Group
End Quantified Group
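
An indented explanation like the one above falls out of a recursive walk over the tree. The sketch below is only an illustration: it uses plain arrays as hypothetical stand-in nodes (the keys type, value, children, child, min, and max are assumptions, not the library's node classes), and its wording only approximates the real output.

```php
<?php
declare(strict_types=1);

/**
 * Render a hypothetical array-based regex tree as indented plain English.
 * Node shapes are illustrative only, not RegexParser's AST classes.
 */
function explainNode(array $node, int $depth = 0): string
{
    $pad = str_repeat('  ', $depth);

    switch ($node['type']) {
        case 'literal':
            return $pad . "Literal: '" . $node['value'] . "'\n";

        case 'alternation':
            $out = '';
            foreach ($node['children'] as $i => $child) {
                $out .= $pad . ($i === 0 ? 'EITHER:' : 'OR:') . "\n";
                $out .= explainNode($child, $depth + 1);
            }
            return $out;

        case 'group':
            $out = $pad . "Start Capturing Group:\n";
            foreach ($node['children'] as $child) {
                $out .= explainNode($child, $depth + 1);
            }
            return $out . $pad . "End Group\n";

        case 'quantifier':
            $out = $pad . sprintf(
                "Start Quantified Group (between %d and %d times):\n",
                $node['min'],
                $node['max']
            );
            $out .= explainNode($node['child'], $depth + 1);
            return $out . $pad . "End Quantified Group\n";

        default:
            throw new InvalidArgumentException('Unknown node type: ' . $node['type']);
    }
}

// Hand-built tree for (foo|bar){1,2}
$tree = [
    'type' => 'quantifier', 'min' => 1, 'max' => 2,
    'child' => [
        'type' => 'group',
        'children' => [[
            'type' => 'alternation',
            'children' => [
                ['type' => 'literal', 'value' => 'foo'],
                ['type' => 'literal', 'value' => 'bar'],
            ],
        ]],
    ],
];

echo explainNode($tree);
```

Each node type owns its own opening and closing lines, so nesting depth alone drives the indentation.
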
use RegexParser\Regex;
echo Regex::create()->generate('/[a-f0-9]{4}-[a-f0-9]{4}/'); // e.g. c4e1-9b2a

use RegexParser\Regex;
echo Regex::create()->optimize('/(?:a|b|c)/'); // /[abc]/

Create a custom NodeVisitorInterface to analyze or transform patterns.
use RegexParser\Regex;
use RegexParser\NodeVisitor\DumperNodeVisitor;
$ast = Regex::create()->parse('/^(?<id>\d+)/');
$dumper = new DumperNodeVisitor();
echo $ast->accept($dumper);

use RegexParser\Regex;
$regex = Regex::create();
$literals = $regex->extractLiterals('/user_(\d+)@example\.com/');
$prefix = $literals->getLongestPrefix(); // user_
$subject = '[email protected]';
if (!str_contains($subject, $prefix)) {
return false; // Skip regex entirely
}

use RegexParser\Regex;
$analysis = Regex::create()->analyzeReDoS('/(a+)+b/');
echo $analysis->severity->value; // critical/high/...
echo $analysis->score; // 0-10
$isOkForRoutes = !$analysis->exceedsThreshold(\RegexParser\ReDoS\ReDoSSeverity::HIGH);
$isOkForUserInput = !$analysis->exceedsThreshold(\RegexParser\ReDoS\ReDoSSeverity::LOW);
// IDE-friendly tolerant parsing: returns partial AST + errors list instead of throwing.
$result = Regex::create()->parseTolerant('/(a+/');
var_dump($result->hasErrors()); // true
echo $result->errors[0]->getMessage(); // e.g. "Unclosed group"

Severity levels: SAFE, LOW, MEDIUM, UNKNOWN, HIGH, CRITICAL (2^n worst cases; UNKNOWN means analysis could not complete safely).
Limitations: heuristic/static only; quantified alternations with complex character classes may still warn conservatively, and deeply recursive backreference/subroutine patterns can evade detection. Treat UNKNOWN as a signal to fail closed.
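
A minimal sketch of what failing closed on UNKNOWN can look like in application code. The Severity enum, its rank() method, and isAcceptable() are hypothetical stand-ins written for this example; they are not the library's ReDoSSeverity or exceedsThreshold(), whose exact comparison semantics are not shown here.

```php
<?php
declare(strict_types=1);

// Hypothetical stand-in enum; the library ships its own ReDoSSeverity.
enum Severity: string
{
    case Safe = 'safe';
    case Low = 'low';
    case Medium = 'medium';
    case Unknown = 'unknown';
    case High = 'high';
    case Critical = 'critical';

    /** Numeric rank used only for comparisons in this sketch. */
    public function rank(): int
    {
        return match ($this) {
            self::Safe => 0,
            self::Low => 1,
            self::Medium => 2,
            self::High => 3,
            self::Critical => 4,
            self::Unknown => PHP_INT_MAX, // handled explicitly in isAcceptable()
        };
    }
}

/** Accept a pattern only when its severity is known and within the limit. */
function isAcceptable(Severity $actual, Severity $maxAllowed): bool
{
    if ($actual === Severity::Unknown) {
        return false; // fail closed: the analysis could not complete
    }

    return $actual->rank() <= $maxAllowed->rank();
}

var_dump(isAcceptable(Severity::Medium, Severity::High));  // bool(true)
var_dump(isAcceptable(Severity::Unknown, Severity::High)); // bool(false)
var_dump(isAcceptable(Severity::High, Severity::Low));     // bool(false)
```

Rejecting UNKNOWN before any comparison keeps an incomplete analysis from ever being read as a low score.
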
- Security: parse-first flow catches dangerous backtracking paths before runtime.
- Static analysis: AST visitors let you lint, rewrite, and document patterns with real structure instead of brittle string checks.
- ReDoS prevention: complexity scoring and path analysis detect catastrophic cases earlier than preg_match failures.
- Parse with Regex::create()->parse($pattern) and compile back using the CompilerNodeVisitor.
- Run preg_match($compiled, $subject) and compare against the AST-driven evaluator or visitors to ensure flags, delimiters, and groups match.
- Keep failing cases as fixtures to guard against drift between the parser and PHP's PCRE engine.
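
The comparison step in the list above can be sketched with preg_match alone. This is an illustration under assumptions: findRoundTripMismatches() is a hypothetical helper, and the recompiled pattern is written by hand here, where in practice it would come from compiling the AST back to a string.

```php
<?php
declare(strict_types=1);

/**
 * Compare two patterns over a list of subjects and return the subjects
 * where the match result or the captured groups differ.
 *
 * @param list<string> $subjects
 * @return list<string>
 */
function findRoundTripMismatches(string $original, string $recompiled, array $subjects): array
{
    $mismatches = [];
    foreach ($subjects as $subject) {
        $a = preg_match($original, $subject, $groupsA);
        $b = preg_match($recompiled, $subject, $groupsB);
        if ($a !== $b || $groupsA !== $groupsB) {
            $mismatches[] = $subject;
        }
    }

    return $mismatches;
}

// Hand-written stand-in for the output of a compile step.
$original   = '/^Hello (?<name>\w+)!$/i';
$recompiled = '/^Hello (?<name>\w+)!$/i';
$subjects   = ['Hello World!', 'hello bob!', 'Goodbye', ''];

var_dump(findRoundTripMismatches($original, $recompiled, $subjects)); // empty array
```

Any subject returned by the helper is a candidate fixture: it pins down an input on which the two patterns disagree.
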
- Fuzz the parser with random/edge-case inputs to ensure it never crashes or hangs on malformed patterns.
- Combine short seeds (lookbehinds, nested quantifiers, named groups) with mutation to surface parser and lexer edge cases.
- Keep regressions as deterministic tests so production builds stay resilient.
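
A rough sketch of seed-plus-mutation fuzzing as described above. Everything here is illustrative: mutate() and fuzz() are hypothetical helpers, and the stand-in parser simply asks PCRE whether a pattern compiles, so a real run would pass the library's parser and its exception class instead.

```php
<?php
declare(strict_types=1);

/** Mutate a seed by inserting, deleting, or replacing one character. */
function mutate(string $seed, string $alphabet): string
{
    $pos  = mt_rand(0, max(0, strlen($seed) - 1));
    $char = $alphabet[mt_rand(0, strlen($alphabet) - 1)];

    return match (mt_rand(0, 2)) {
        0 => substr($seed, 0, $pos) . $char . substr($seed, $pos),     // insert
        1 => substr($seed, 0, $pos) . substr($seed, $pos + 1),         // delete
        2 => substr($seed, 0, $pos) . $char . substr($seed, $pos + 1), // replace
    };
}

/**
 * Run $parse over mutated seeds and collect the inputs that raise
 * anything other than the expected exception class.
 *
 * @param list<string> $seeds
 * @return list<string>
 */
function fuzz(callable $parse, array $seeds, int $iterations, string $expectedException): array
{
    $crashes  = [];
    $alphabet = '()[]{}?*+|\\^$.<>=!:a1-';
    for ($i = 0; $i < $iterations; $i++) {
        $input = mutate($seeds[$i % count($seeds)], $alphabet);
        try {
            $parse($input);
        } catch (Throwable $e) {
            if (!$e instanceof $expectedException) {
                $crashes[] = $input; // unexpected failure: keep as a regression case
            }
        }
    }

    return $crashes;
}

mt_srand(1234); // fixed seed so any failure is reproducible

// Stand-in parser: lets PCRE compile the pattern and throws if it is invalid.
$parse = static function (string $pattern): void {
    if (@preg_match($pattern, '') === false) {
        throw new InvalidArgumentException('Invalid pattern: ' . $pattern);
    }
};

$seeds   = ['/(?<=ab)c/', '/(a+)*b/', '/(?<name>\w+)/'];
$crashes = fuzz($parse, $seeds, 500, InvalidArgumentException::class);
echo count($crashes), " unexpected failures\n";
```

Seeding the generator makes each run repeatable, which is what lets a surprising input be promoted to a deterministic test.
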
Parsing is CPU-heavy; cache ASTs to PHP files for Opcache to warm:
use RegexParser\Regex;
$regex = Regex::create(['cache' => __DIR__ . '/var/cache/regex']);
$ast = $regex->parse('/[A-Z][a-z]+/');

Or plug in your application cache (PSR-6/16) for shared keys:
use RegexParser\Regex;
use RegexParser\Cache\PsrCacheAdapter;
use RegexParser\Cache\PsrSimpleCacheAdapter;
// PSR-6 (CacheItemPoolInterface)
$cache = new PsrCacheAdapter($yourPool, prefix: 'route_login_');
$regex = Regex::create(['cache' => $cache]);
// PSR-16 (SimpleCache)
$cache = new PsrSimpleCacheAdapter($yourSimpleCache, prefix: 'constraint_user_email_');
$regex = Regex::create(['cache' => $cache]);

Pass a writable directory string to Regex::create(['cache' => '/path']) or a custom CacheInterface implementation. Use null (default) to disable.
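
To illustrate why caching to PHP files suits Opcache, here is a sketch of the general technique, not the library's cache implementation. cachedParse() is a hypothetical helper, and the stand-in builder returns a plain array rather than a real AST.

```php
<?php
declare(strict_types=1);

/**
 * Load a cached value for $pattern from a PHP file, or build and store it.
 * The value must be exportable with var_export() (scalars and arrays here).
 */
function cachedParse(string $pattern, string $cacheDir, callable $build): array
{
    $file = $cacheDir . '/' . hash('sha256', $pattern) . '.php';

    if (is_file($file)) {
        return require $file; // compiled PHP files can be served from Opcache
    }

    $value = $build($pattern);

    if (!is_dir($cacheDir) && !mkdir($cacheDir, 0777, true) && !is_dir($cacheDir)) {
        throw new RuntimeException('Cannot create cache directory: ' . $cacheDir);
    }

    // Write to a temp file, then rename, so readers never see a partial file.
    $tmp = $file . '.' . bin2hex(random_bytes(4)) . '.tmp';
    file_put_contents($tmp, '<?php return ' . var_export($value, true) . ';');
    rename($tmp, $file);

    return $value;
}

// Stand-in "parser" that returns a plain array instead of a real AST.
$build = static fn (string $p): array => ['pattern' => $p, 'length' => strlen($p)];

$dir    = sys_get_temp_dir() . '/regex-cache-demo';
$first  = cachedParse('/[A-Z][a-z]+/', $dir, $build); // builds and writes on a cold cache
$second = cachedParse('/[A-Z][a-z]+/', $dir, $build); // reads the PHP file
var_dump($first === $second); // bool(true)
```

Because the cache entry is ordinary PHP source, a warm Opcache can return it without re-reading or re-parsing the file.
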
composer require yoeunes/regex-parser

// src/Validator/Constraints/ValidRegex.php
namespace App\Validator\Constraints;
use Symfony\Component\Validator\Constraint;
#[\Attribute]
class ValidRegex extends Constraint
{
public string $message = 'The regex pattern "{{ pattern }}" is invalid: {{ error }}';
}

// src/Validator/Constraints/ValidRegexValidator.php
namespace App\Validator\Constraints;
use RegexParser\Regex;
use Symfony\Component\Validator\Constraint;
use Symfony\Component\Validator\ConstraintValidator;
use Symfony\Component\Validator\Exception\UnexpectedTypeException;
class ValidRegexValidator extends ConstraintValidator
{
public function validate($value, Constraint $constraint): void
{
if (!$constraint instanceof ValidRegex) {
throw new UnexpectedTypeException($constraint, ValidRegex::class);
}
if (null === $value || '' === $value) {
return;
}
$regex = Regex::create();
$result = $regex->validate($value);
if (!$result->isValid) {
$this->context->buildViolation($constraint->message)
->setParameter('{{ pattern }}', $value)
->setParameter('{{ error }}', $result->error)
->addViolation();
}
}
}

// In a form
use App\Validator\Constraints\ValidRegex;
use Symfony\Component\Form\AbstractType;
use Symfony\Component\Form\Extension\Core\Type\TextType;
use Symfony\Component\Form\FormBuilderInterface;
class RegexPatternType extends AbstractType
{
public function buildForm(FormBuilderInterface $builder, array $options): void
{
$builder->add('pattern', TextType::class, [
'label' => 'Regex Pattern',
'constraints' => [
new ValidRegex(),
],
]);
}
}

composer require --dev rector/rector

<?php
use Rector\Config\RectorConfig;
use RegexParser\Rector\RegexOptimizationRector;
return RectorConfig::configure()
->withPaths([__DIR__ . '/src'])
->withRules([RegexOptimizationRector::class]);

vendor/bin/rector process --dry-run

composer require --dev phpstan/phpstan

includes:
    - vendor/yoeunes/regex-parser/extension.neon
parameters:
    level: max
    paths:
        - src

vendor/bin/phpstan analyze

# Full test suite
./vendor/bin/phpunit
# Targeted suites
./vendor/bin/phpunit tests/Unit
./vendor/bin/phpunit tests/Integration
./vendor/bin/phpunit tests/Integration/BehavioralComplianceTest.php

Run the validation script:
php validate_library.php

Expected output:
Test 1: Sample Generation 4/4 PASSED ✓
Test 2: ReDoS Detection 4/4 PASSED ✓
Test 3: PCRE Feature Coverage 12/12 PASSED ✓
Test 4: Round-trip Validation 4/4 PASSED ✓
Test 5: Invalid Pattern Detection 3/3 PASSED ✓
OVERALL: 27/27 tests passed (100%)
Behavioral Compliance Tests: 19/19 tests, 128 assertions - ALL PASS ✓
Web demo:
php server.php
# open http://localhost:5000

php bin/regex-parser '/your_regex_here/flags'

Example:
php bin/regex-parser '/(?<email>[\w.-]+@[\w.-]+\.\w+)/i'

See CONTRIBUTING.md for code of conduct, dev setup, and PR guidelines.
- Class not found: run composer install, then composer dump-autoload.
- PHPStan memory issues: php -d memory_limit=512M vendor/bin/phpstan analyze.
- Pattern fails to parse: ensure the pattern is valid PCRE syntax and check the position reported in the error message.
- ReDoS false positives/backreferences: update to the latest version.
Literal extraction can speed up checks with prefixes/suffixes:
| Pattern | Subject | Without Optimization | With Optimization | Speedup |
|---|---|---|---|---|
| /user_\d+/ | "admin_123" | 1.2μs | 0.1μs | 12x faster |
| /error: .*/ | "info: msg" | 2.5μs | 0.2μs | 12.5x faster |
| /\d{3}-\d{2}-\d{4}/ | "abc-def-ghij" | 3.1μs | 0.15μs | 20x faster |
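
For readers who want to measure this on their own hardware, the following is a small sketch of a literal prefilter and a timing loop. matchWithPrefilter() and timeIt() are hypothetical helpers, the required literal is passed in by hand rather than extracted, and the timings it prints will vary by machine and PCRE build, so they should not be expected to match the table.

```php
<?php
declare(strict_types=1);

/** Run the regex only when the required literal is present in $subject. */
function matchWithPrefilter(string $pattern, string $literal, string $subject): bool
{
    if (!str_contains($subject, $literal)) {
        return false; // the regex cannot match without its required literal
    }

    return preg_match($pattern, $subject) === 1;
}

/** Average time per call in nanoseconds over $runs iterations. */
function timeIt(callable $fn, int $runs = 100_000): float
{
    $start = hrtime(true);
    for ($i = 0; $i < $runs; $i++) {
        $fn();
    }

    return (hrtime(true) - $start) / $runs;
}

$pattern = '/user_\d+/';
$subject = 'admin_123';

$plain    = timeIt(static fn () => preg_match($pattern, $subject) === 1);
$filtered = timeIt(static fn () => matchWithPrefilter($pattern, 'user_', $subject));

printf("plain: %.1f ns, prefiltered: %.1f ns\n", $plain, $filtered);
```

The prefilter is only safe when the literal is required by every possible match, which is why it should come from analysis of the pattern rather than a guess.
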
MIT License. See LICENSE.