-
Notifications
You must be signed in to change notification settings - Fork 43
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
StackOverflowException in Xeger #5
Comments
As @ymotton points out the following regular expression also throws:
|
@moodmosaic Did you guys ever figure this out? |
@edwardskrod, no we haven't. See also #13 (comment). |
There is a non-zero probability that pretty much any non-trivial regular expression will throw a StackOverflowException. This is because generally there will be cycles in its automaton. Also, fun fact about the linked issue: If we pick one of |
Please do. There's no need for us to strictly follow the Java version (dk.brics.automaton and Xeger). 🚀 |
I finally had a little time to think about this properly, First of all, the linked document is obviously not relevant to this discussion as it deals with undirected graphs. Under current conditions, there are some regular expressions where the expected number of steps is exponential (for example: I thought about possible alternatives and there is a lot of them. It all depends on our requirements.
Of course, I assume we restrict generated words to some finite subset (like words of length at most These requirements go against each other. For example, in general you can't have both 1) and 2) ( As for the Xeger algorithm, I'd rather get rid of it completely. However, I understand that a lot of code may depend on its probability distribution. We can maybe create a modified version which will stop after a random number of steps and generate the rest of the word without using cycles. This shouldn't break the distribution too much and it will reduce the expected number of steps greatly. What do you think? |
I would definitely go with different generators, with an API that also supports LINQ, like this one. This one is from Hedgehog but the idea about how the API can look like is similar. We can have a (All examples I use come from Hedgehog, the property testing tool I'm working on, but the concept of data generation applies here as well.) At this point we might wonder if it's worth keeping the Xeger algorithm around; if I could go back in time, I wouldn't port Xeger and instead I would only port dk.brics.automaton instead of both... For now, I would keep Xeger around, perhaps refactor it to use the (new) Gen class but I wouldn't change its public methods if possible, to avoid breaking other peoples code until the next major version bump. |
The following regular expressions create a StackOverflowException:
The
Xeger
class keep traversing the Automaton's States resulting to a StackOverflowException. This happens in theIList<Transition> GetSortedTransitions()
method of theState
class.The issue appeared after this commit which addresses this issue.
The text was updated successfully, but these errors were encountered: