
Commit fdca058

Merge pull request #73 from yoeunes/dev
Enhances Regex API with new features and improvements
2 parents afdfd57 + a0d2724 commit fdca058

31 files changed (+2831, -444 lines)

README.md

Lines changed: 62 additions & 36 deletions
@@ -68,10 +68,7 @@ $result = $regex->validate('/^[a-z0-9._%+-]+@[a-z0-9.-]+\.[a-z]{2,}$/i');
 if ($result->isValid()) {
     echo "OK ✅\n";
 } else {
-    echo "Invalid regex:\n";
-    foreach ($result->errors as $error) {
-        echo '- '.$error->getMessage()."\n";
-    }
+    echo "Invalid regex: ".$result->getErrorMessage()."\n";
 }
 ```

@@ -138,14 +135,14 @@ $pattern = '/^(a+)+$/'; // classic catastrophic backtracking example
 $analysis = $regex->analyzeReDoS($pattern);

 echo "Severity: ".$analysis->severity->name.PHP_EOL;
+echo "Score: ".$analysis->score.PHP_EOL;
+
+if (!$analysis->isSafe()) {
+    echo "Hotspot: ".($analysis->vulnerablePart ?? 'unknown').PHP_EOL;

-foreach ($analysis->vulnerabilities as $vuln) {
-    echo sprintf(
-        "- [%s] %s (at position %d)\n",
-        $vuln->severity->name,
-        $vuln->message,
-        $vuln->position
-    );
+    foreach ($analysis->recommendations as $recommendation) {
+        echo "- ".$recommendation.PHP_EOL;
+    }
 }

 // Quick boolean check (for CI, input validation, etc.)
@@ -158,6 +155,18 @@ Under the hood it inspects quantifiers, nested groups, backreferences and charac

 ---

+## Configuration / Options
+
+`Regex::create()` accepts a small, validated option array (or a `RegexOptions` value object via `RegexOptions::fromArray()`):
+
+- `max_pattern_length` (int, default: `Regex::DEFAULT_MAX_PATTERN_LENGTH`).
+- `cache` (`null` | path string | `RegexParser\Cache\CacheInterface`).
+- `redos_ignored_patterns` (list of strings to skip in ReDoS analysis).
+
+Unknown or invalid keys throw `RegexParser\Exception\InvalidRegexOptionException`.
+
+---
+
 ## Advanced Usage

 ### Parsing bare patterns vs PCRE strings
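The new Configuration / Options section can be illustrated with a short sketch. It assumes the library is installed with Composer (the `vendor/autoload.php` path is an assumption) and that `Regex` lives in the `RegexParser` namespace, which the diff implies but never states. The option keys, the `Regex::DEFAULT_MAX_PATTERN_LENGTH` constant and the exception class are the ones documented in the hunk; the misspelled key is purely illustrative.

```php
<?php

declare(strict_types=1);

use RegexParser\Exception\InvalidRegexOptionException;
use RegexParser\Regex; // namespace assumed, not stated in the diff

// Assumption: installed via Composer; adjust the path to your project.
require __DIR__.'/vendor/autoload.php';

// Every key below is one of the documented option keys.
$regex = Regex::create([
    'max_pattern_length' => Regex::DEFAULT_MAX_PATTERN_LENGTH,
    'cache' => null,                // null = no cache
    'redos_ignored_patterns' => [], // analyse every pattern
]);

// An unknown key (a misspelling of max_pattern_length) should be rejected.
$rejected = false;
try {
    Regex::create(['max_length' => 1000]);
} catch (InvalidRegexOptionException $e) {
    $rejected = true;
    echo 'Rejected option: '.$e->getMessage().PHP_EOL;
}
```

Validating options in `create()` means a typo fails loudly at construction time instead of silently falling back to a default.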
@@ -168,6 +177,12 @@ Most high‑level methods (`parse`, `validate`, `analyzeReDoS`) expect a **full
 $ast = $regex->parse('/pattern/ims');
 ```

+If you only have the body, `parsePattern()` will wrap delimiters/flags for you:
+
+```php
+$ast = $regex->parsePattern('a|b', '#', 'i');
+```
+
 If you already have just the pattern body, you can go lower‑level:

 ```php
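The `parsePattern()` signature added to the API reference in this commit defaults `$delimiter` to `'/'` and `$flags` to `''`, so a bare body and a full PCRE string can describe the same pattern. A minimal sketch, assuming a Composer install and the `RegexParser` namespace for `Regex`; it does not claim the resulting ASTs compare equal with `==`.

```php
<?php

declare(strict_types=1);

use RegexParser\Regex; // namespace assumed, not stated in the diff

require __DIR__.'/vendor/autoload.php'; // assumption: Composer install

$regex = Regex::create();

// Full PCRE string: delimiters and flags are part of the input.
$fromString = $regex->parse('#a|b#i');

// Bare body: parsePattern() adds the '#' delimiter and the 'i' flag.
$fromBody = $regex->parsePattern('a|b', '#', 'i');

// With the defaults ('/' and no flags) this describes the same pattern as '/\d+/'.
$withDefaults = $regex->parsePattern('\d+');

// Named arguments (PHP 8+) let you pass only the flags.
$caseless = $regex->parsePattern('[a-z]+', flags: 'i');
```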
@@ -212,7 +227,7 @@ if ($pattern instanceof AlternationNode) {

 Each node exposes:

-* `startPos` / `endPos`: byte offsets in the original pattern
+* `startPosition` / `endPosition`: byte offsets in the original pattern
 * Node‑specific properties (e.g. `QuantifierNode::$min`, `$max`, `$type`)

 ---
@@ -225,26 +240,28 @@ For experts: the “right” way to analyse patterns is to implement your own vi
 namespace App\Regex;

 use RegexParser\Node\LiteralNode;
-use RegexParser\Node\NodeInterface;
-use RegexParser\NodeVisitor\NodeVisitorInterface;
+use RegexParser\Node\QuantifierNode;
+use RegexParser\Node\RegexNode;
+use RegexParser\Node\SequenceNode;
+use RegexParser\NodeVisitor\AbstractNodeVisitor;

 /**
- * @implements NodeVisitorInterface<int>
+ * @extends AbstractNodeVisitor<int>
  */
-final class LiteralCountVisitor implements NodeVisitorInterface
+final class LiteralCountVisitor extends AbstractNodeVisitor
 {
-    public function visitRegexNode(\RegexParser\Node\RegexNode $node): int
+    public function visitRegex(RegexNode $node): int
     {
         return $node->pattern->accept($this);
     }

-    public function visitLiteralNode(LiteralNode $node): int
+    public function visitLiteral(LiteralNode $node): int
     {
         return 1;
     }

     // Aggregate over sequences and groups:
-    public function visitSequenceNode(\RegexParser\Node\SequenceNode $node): int
+    public function visitSequence(SequenceNode $node): int
     {
         $sum = 0;
         foreach ($node->children as $child) {
@@ -255,12 +272,10 @@ final class LiteralCountVisitor implements NodeVisitorInterface
     }

     // For nodes you don't care about, just recurse or return 0
-    public function visitQuantifierNode(\RegexParser\Node\QuantifierNode $node): int
+    public function visitQuantifier(QuantifierNode $node): int
     {
         return $node->node->accept($this);
     }
-
-    // ... implement the remaining methods (or extend an AbstractVisitor)
 }
 ```

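To run the updated `LiteralCountVisitor`, pass it to the root node's `accept()` method, the same entry point the visitor uses for child nodes. A minimal sketch, assuming a Composer autoloader that maps `App\Regex` to your sources and the `RegexParser` namespace for `Regex`. The AST shape for `ab+c` is an assumption, and the diff does not show what `AbstractNodeVisitor` returns for node types the visitor leaves alone, so no specific count is claimed.

```php
<?php

declare(strict_types=1);

use App\Regex\LiteralCountVisitor; // the visitor from the README
use RegexParser\Regex;             // namespace assumed, not stated in the diff

// Assumption: Composer install with App\ autoloading configured.
require __DIR__.'/vendor/autoload.php';

$regex = Regex::create();

// Chosen to contain (presumably) only literals, a sequence and a quantifier,
// the node kinds the visitor overrides.
$ast = $regex->parse('/ab+c/');

// accept() dispatches to visitRegex(), which recurses into the pattern.
$count = $ast->accept(new LiteralCountVisitor());

echo 'Literal nodes: '.$count.PHP_EOL;
```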
@@ -351,6 +366,8 @@ From lowest to highest:
 * `HIGH` — clear ReDoS risk; avoid on untrusted input.
 * `CRITICAL` — classic catastrophic patterns (nested `+`/`*` etc.).

+`analyzeReDoS()` returns a `ReDoSAnalysis` with the severity, score, vulnerable substring (if any), and recommendations. `isSafe()` simply calls `analyzeReDoS()` and returns `true` only for severities considered safe/low (or below the optional threshold you pass in).
+
 You choose what to tolerate:

 ```php
@@ -445,6 +462,7 @@ final readonly class Regex
     public static function create(array $options = []): self;

     public function parse(string $regex): Node\RegexNode;
+    public function parsePattern(string $pattern, string $delimiter = '/', string $flags = ''): Node\RegexNode;

     public function parseTolerant(string $regex): TolerantParseResult;

@@ -460,7 +478,7 @@ final readonly class Regex

     public function analyzeReDoS(string $regex): ReDoS\ReDoSAnalysis;

-    public function isSafe(string $regex, ReDoS\ReDoSSeverity $threshold): bool;
+    public function isSafe(string $regex, ?ReDoS\ReDoSSeverity $threshold = null): bool;

     public function getLexer(): Lexer;
     public function getParser(): Parser;
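With `$threshold` now nullable and defaulting to `null`, `isSafe()` can be called with just the pattern. The sketch contrasts the default call, an explicit threshold and the full `analyzeReDoS()` report. It assumes a Composer install and the `RegexParser` / `RegexParser\ReDoS` namespaces implied by the relative names `ReDoS\ReDoSSeverity` and `ReDoS\ReDoSAnalysis`. `ReDoSSeverity` is treated as a PHP enum because the README reads `->name` from it. No boolean outcomes are asserted, since the threshold comparison is described only informally.

```php
<?php

declare(strict_types=1);

use RegexParser\ReDoS\ReDoSSeverity; // namespace assumed from ReDoS\ReDoSSeverity
use RegexParser\Regex;               // namespace assumed, not stated in the diff

require __DIR__.'/vendor/autoload.php'; // assumption: Composer install

$regex = Regex::create();
$pattern = '/^(a+)+$/'; // nested quantifiers, the README's catastrophic example

// No threshold: only severities the library considers safe/low pass.
$safeByDefault = $regex->isSafe($pattern);

// Explicit threshold: per the README, severities below HIGH pass.
$safeBelowHigh = $regex->isSafe($pattern, ReDoSSeverity::HIGH);

// Full report, e.g. for CI logs.
$analysis = $regex->analyzeReDoS($pattern);
printf(
    "default=%s, below HIGH=%s, severity=%s\n",
    var_export($safeByDefault, true),
    var_export($safeBelowHigh, true),
    $analysis->severity->name,
);
```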
@@ -471,25 +489,33 @@ Return types like `ValidationResult`, `LiteralSet`, `ReDoSAnalysis` are small, w

 ---

-## Versioning & BC Policy
+## Exceptions

-RegexParser follows **Semantic Versioning**:
+- `Regex::create()` throws `InvalidRegexOptionException` for unknown/invalid options.
+- `parse()` / `parsePattern()` can throw `LexerException`, `SyntaxErrorException` (syntax/structure), `RecursionLimitException` (too deep), and `ResourceLimitException` (pattern too long).
+- `parseTolerant()` wraps those errors into `TolerantParseResult` instead of throwing.
+- `validate()` converts parser/lexer errors into a `ValidationResult` (no exception on invalid input).
+- `analyzeReDoS()` / `isSafe()` share the same parsing exceptions as `parse()`; `isSafe()` is a boolean wrapper around `analyzeReDoS()`.

-* **1.0.0** — Initial stable release.
-* **1.x** — No breaking changes to:
+Generic runtime errors (e.g., wrong argument types) are not part of the stable API surface.

-  * public methods of `Regex`,
-  * AST node constructors & properties,
-  * `NodeVisitorInterface`,
-  * ReDoS public API.
+---
+
+## Versioning & BC Policy
+
+RegexParser follows **Semantic Versioning**:

-We reserve the right to:
+* **Stable for 1.x** (API surface we commit to keep compatible):
+  * Public methods and signatures on `Regex`.
+  * Value objects: `ValidationResult`, `TolerantParseResult`, `LiteralSet`, `ReDoS\ReDoSAnalysis`.
+  * Main exception interfaces/classes: `RegexParserExceptionInterface`, parser/lexer exceptions, `InvalidRegexOptionException`.
+  * Supported option keys for `Regex::create()` / `RegexOptions`.

-* Add new methods (with sensible defaults).
-* Add new node types in minor versions (without changing existing ones).
-* Improve analysis heuristics and error messages.
+* **Best-effort, may evolve within 1.x**:
+  * AST node classes and `NodeVisitorInterface` (new node types/visit methods can be added).
+  * Built-in visitors and analysis heuristics.

-Breaking changes will be released as **2.0**.
+If you maintain custom visitors, plan to adjust them when new nodes appear. Breaking changes beyond this policy land in **2.0.0**.

 ---

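The new Exceptions section separates the throwing `parse()` family from the non-throwing `validate()`. A sketch of both styles, assuming a Composer install, the `RegexParser` namespace for `Regex`, and that `RegexParserExceptionInterface` (named in the BC policy without a namespace) sits in `RegexParser\Exception` next to `InvalidRegexOptionException`. Catching the interface instead of `SyntaxErrorException`, `LexerException` and so on is a design choice: it keeps one handler working if more exception classes are added, provided they implement the interface.

```php
<?php

declare(strict_types=1);

use RegexParser\Exception\RegexParserExceptionInterface; // namespace assumed
use RegexParser\Regex;                                   // namespace assumed

require __DIR__.'/vendor/autoload.php'; // assumption: Composer install

$regex = Regex::create();
$input = '/(unclosed/'; // missing closing parenthesis

// validate(): reports the problem in a ValidationResult instead of throwing.
$result = $regex->validate($input);
if (!$result->isValid()) {
    echo 'Invalid regex: '.$result->getErrorMessage().PHP_EOL;
}

// parse(): throws; one catch covers lexer, syntax and limit exceptions
// (assuming they all implement the shared interface).
$caught = false;
try {
    $regex->parse($input);
} catch (RegexParserExceptionInterface $e) {
    $caught = true;
    echo $e::class.': '.$e->getMessage().PHP_EOL;
}
```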

docs/EXTENDING_GUIDE.md

Lines changed: 4 additions & 4 deletions
@@ -90,7 +90,7 @@ readonly class CalloutNode extends AbstractNode
 ```

 **Key Points**:
-- Extend `AbstractNode` (provides `startPos`, `endPos`)
+- Extend `AbstractNode` (provides `startPosition`, `endPosition`)
 - Use `readonly` properties (immutability)
 - Implement `accept()` method
 - Use strict types: `declare(strict_types=1)`
@@ -265,8 +265,8 @@ class CalloutNodeTest extends TestCase
         $node = new CalloutNode(null, 0, 4);

         $this->assertNull($node->number);
-        $this->assertEquals(0, $node->startPos);
-        $this->assertEquals(4, $node->endPos);
+        $this->assertEquals(0, $node->getStartPosition());
+        $this->assertEquals(4, $node->getEndPosition());
     }

     public function testCalloutWithNumber(): void
@@ -536,7 +536,7 @@ return null; // Silent failure

 ### 3. Parser Position Tracking

-**Error**: Incorrect `startPos`/`endPos` values
+**Error**: Incorrect `startPosition`/`endPosition` values

 **Fix**: Carefully track `$this->pos` before and after parsing
