Major flaw with $ logic #2
Comments
Hello Ed, thanks for the comment. This is not actually a bug, it's a design choice, which I'll try to explain now.

Noam doesn't support Perl regexes (or any other particular regex flavor). The language for defining regular expressions in Noam is intentionally extremely simple and minimal, closely akin to something you'd find in any automata/languages textbook. The goal of a Noam regular expression is to define a regular language, nothing more and nothing less. Specifically, the goal of regular expressions in Noam is not to enable users to match and slice up parts of text. That would really be a silly thing to reimplement, as JavaScript regexes already do that job.

In this context, defining the start or the end of the string is meaningless: both are implicitly there at the start and end of the regular expression. You can't search for a match somewhere in your string. You can only test whether the whole string is in the language defined by a regular expression or not.

With that in mind, very early on we decided to use the dollar symbol to represent the empty string (usually denoted by epsilon in textbooks) so that regular expressions containing it would be more readable and less error-prone. For example, you can't define an optional "a" with the regular expression "a?" as you normally might, because the question mark is not an operator at all. You'd use something like "a|$", which looks nicer than "a|". When you're defining a language, epsilons will be much more frequent than empty strings would be in a regex you were using to match text.

You can see a full explanation of the language for defining regular expressions here http://ivanzuzak.info/noam/webapps/regex_simplifier/ or in a comment around the 1450th line here https://github.com/izuzak/noam/blob/master/src/noam.re.js where the string representation of regular expressions is defined.

I agree it might be helpful if we made this more explicit in the readme, but Noam started out with finite automata, their manipulation and visualization, and regular expressions were added afterwards primarily to make it easy to define languages. Hope this clears it up. Cheers!
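For readers more used to JavaScript regexes, the whole-string semantics described above can be approximated with a small sketch. It does not use Noam at all: `inLanguage` is a hypothetical helper written for this illustration, and it assumes that Noam's "a|$" corresponds to the JavaScript pattern "a|" once both anchors are made implicit.

```javascript
// Hypothetical helper, not part of Noam: emulates whole-string language
// membership with a JavaScript RegExp by anchoring the pattern at both ends.
function inLanguage(jsPattern, input) {
  return new RegExp("^(?:" + jsPattern + ")$").test(input);
}

// Assumed JavaScript equivalent of Noam's "a|$" ("a" or the empty string).
const optionalA = "a|";

console.log(inLanguage(optionalA, ""));   // true: the empty string is in the language
console.log(inLanguage(optionalA, "a"));  // true
console.log(inLanguage(optionalA, "aa")); // false
console.log(inLanguage(optionalA, "ba")); // false: no searching inside the string
```

Because the anchors are always present, there is nothing left for a start or end marker to express, which is why the dollar symbol is free to mean epsilon.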
Hi Ivan, Thanks for the kind reply. Your explanation makes perfect sense, given that That said, may I encourage putting a disclaimer at the top of the page? The Best regards, On Fri, Jan 10, 2014 at 11:34 AM, Ivan Budiselic
Thanks for the suggestion, I'm inclined to agree that we should make this Ivan On Fri, Jan 10, 2014 at 7:15 PM, edcottrell [email protected]:
Agreed, I came looking for the same thing.
FYI, there is a major flaw in this regex simplifier's logic. "$" does not represent the empty string; it represents the end of a string (or, with the "/m" modifier, the end of a line). So, "$+" is meaningless, and "$a" can never match anything.

For example, "foo$" matches "foo" but not "foobar".

Debuggex Demo
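The anchor behaviour described in this report can be checked directly with JavaScript's built-in regexes. This is only a sketch of the Perl-style reading of "$", using JavaScript as a stand-in for that family of flavors; the variable names are made up for the illustration, and none of it reflects how Noam itself interprets the symbol.

```javascript
// "$" as an end-of-input anchor in a JavaScript RegExp.
const endAnchored = /foo$/;
console.log(endAnchored.test("foo"));      // true
console.log(endAnchored.test("foobar"));   // false: "foo" is not at the end
console.log(endAnchored.test("a foo"));    // true: test() searches within the string

// With the m flag, "$" also matches at the end of each line.
console.log(/foo$/.test("foo\nbar"));      // false
console.log(/foo$/m.test("foo\nbar"));     // true

// Nothing can follow the end of the input, so "$a" never matches.
const afterEnd = /$a/;
console.log(afterEnd.test("a"));           // false
console.log(afterEnd.test("banana"));      // false
```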