
System.InvalidOperationException: state #44

Open
rawwool opened this issue Aug 22, 2018 · 4 comments

Comments


rawwool commented Aug 22, 2018

Xeger throws `System.InvalidOperationException: state` when trying to generate a string for this email regular expression:
`^(?=.{6,50}$)([\w-.]+)@(([[0-9]{1,3}.[0-9]{1,3}.[0-9]{1,3}.)|(([\w-]+.)+))([a-zA-Z]{2,4}|[0-9]{1,3})(]?)$`

moodmosaic (Owner) commented

Thank you for reporting this!

Project Fare turns Regular Expressions into Automatons by applying the algorithms of dk.brics.automaton and xeger.
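The generation step of this approach can be pictured roughly as a random walk over an automaton. You start at the initial state, repeatedly pick a random outgoing transition and a random character within its range, and you may stop whenever the current state is accepting. The Python below is a minimal sketch under stated assumptions and is not Fare's code. The automaton for the toy pattern `ab*c` is written out by hand in the hypothetical `TRANSITIONS`/`ACCEPTING` tables instead of being built from the regex. Stopping is decided by a simple coin flip, not by xeger's exact choice rule.

```python
import random

# Hand-written DFA for the toy pattern ab*c. This is a hypothetical example table,
# not output from Fare or dk.brics.automaton. Each transition is
# (min_char, max_char, target_state), mirroring character-range transitions.
TRANSITIONS = {
    0: [("a", "a", 1)],
    1: [("b", "b", 1), ("c", "c", 2)],
    2: [],
}
ACCEPTING = {2}


def generate(transitions, accepting, start=0, rng=random):
    """Build one string by a random walk from `start`.

    At an accepting state the walk stops on a coin flip, and it always stops
    when there is nowhere else to go. A non-accepting state with no outgoing
    transitions is a dead end, so this sketch raises ValueError there.
    """
    out = []
    state = start
    while True:
        edges = transitions[state]
        if state in accepting and (not edges or rng.random() < 0.5):
            return "".join(out)
        if not edges:
            raise ValueError(f"dead end at non-accepting state {state}")
        low, high, target = rng.choice(edges)
        # Pick any character inside the transition's range.
        out.append(chr(rng.randint(ord(low), ord(high))))
        state = target


rng = random.Random(0)
samples = [generate(TRANSITIONS, ACCEPTING, rng=rng) for _ in range(5)]
print(samples)
```

In the real libraries, the hand-written table would be replaced by an automaton compiled from the pattern. That compilation is the job dk.brics.automaton does in the Java original.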

Unfortunately, I don't have an answer to your question, as Project Fare is essentially a port of the Java projects above. We'd have to try

`^(?=.{6,50}$)([\w-.]+)@(([[0-9]{1,3}.[0-9]{1,3}.[0-9]{1,3}.)|(([\w-]+.)+))([a-zA-Z]{2,4}|[0-9]{1,3})(]?)$`

in Java and compare the results.

You could use a different pattern, or a different engine to turn the Regular Expression into an Automaton. For example, you can use the Rex engine.

gukoff (Contributor) commented Jun 19, 2024

The problem is this part: `[\w-.]`.

Xeger interprets `[\w-.]` as a range from `w` to `.`, as in `[A-Za-z]`.

Change it to `[\w\-.]` or `[\w.-]`, and it will work.
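Here is a quick, runnable cross-check of the two suggested rewrites. Python's `re` module is a different engine from Fare/Xeger and is used here only because it is easy to run. It does not read `[\w-.]` as a range from `w` to `.`. It rejects the class outright with `re.error`, because `\w` cannot be a range endpoint. That different engines disagree here is one more reason to avoid the ambiguous form. Both rewrites compile and accept a literal `-` and `.`. The helper `compiles` and the `SAMPLE` string are illustrative and not part of Fare.

```python
import re

# Python's re is a different engine from Fare/Xeger; it is used here only as a
# runnable cross-check of the suggested character-class rewrites.
ORIGINAL = r"[\w-.]+"
REWRITES = [r"[\w\-.]+", r"[\w.-]+"]
SAMPLE = "first.last-name_1"  # illustrative local part containing '.', '-' and '_'


def compiles(pattern):
    """Return True if Python's re accepts the pattern, False on re.error."""
    try:
        re.compile(pattern)
    except re.error:
        return False
    return True


# Python refuses to use the class escape \w as a range endpoint, so this prints False.
print(ORIGINAL, compiles(ORIGINAL))

for pattern in REWRITES:
    # In both rewrites '-' is a literal member of the class, not a range separator.
    print(pattern, re.fullmatch(pattern, SAMPLE) is not None)
```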

moodmosaic (Owner) commented Jun 19, 2024

@gukoff, thank you. PRs are more than welcome. (In this case, I think a possible PR would be a test case demonstrating this, but it could still be valuable.)

gukoff (Contributor) commented Jun 19, 2024

The current behaviour is correct; see the docs:

> Because a positive character group can include both a set of characters and a character range, a hyphen character (-) is always interpreted as the range separator unless it is the first or last character of the group.
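The placement rule in the quoted documentation can be checked with a short Python sketch. Python's `re` module is again only a runnable stand-in, since the quoted rule comes from the docs linked in the thread and not from Python. Python does follow the same convention. A hyphen between two characters forms a range. A leading, trailing, or escaped hyphen is a literal. The `CASES` table is a hypothetical set of examples.

```python
import re

# Python's re is only a runnable stand-in; it follows the same hyphen-placement
# convention as the documentation quoted above.
CASES = [
    (r"[a-z]", "hyphen between two characters: a range"),
    (r"[-az]", "leading hyphen: literal"),
    (r"[az-]", "trailing hyphen: literal"),
    (r"[a\-z]", "escaped hyphen: literal in any position"),
]

for pattern, note in CASES:
    rx = re.compile(pattern)
    has_hyphen = rx.fullmatch("-") is not None  # does the class contain '-'?
    covers_b = rx.fullmatch("b") is not None    # does it cover 'b' (inside a..z)?
    print(pattern, "hyphen:", has_hyphen, "b:", covers_b, f"({note})")
```

Escaping, as in `[\w\-.]`, does not depend on where the hyphen sits, so it stays correct if the class is later reordered.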

Let me check whether I can improve the error message.
