Match start and end of string #13
It's currently only intentional for as long as I'm still musing on the nicest way to represent it in a DFA. Gilbert told me it's possible to handle arbitrary lookahead and lookbehind with derivatives, but I still don't get his description. Or I could generalise derivatives to consume boundary information as well as characters.
Is something like this on the virtual in-your-head roadmap? I want to benchmark OMRN for matching of terminals of an LL(1) language and therefore only care about the longest match from START.
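To make "longest match from START" concrete, here is a minimal sketch using plain Brzozowski derivatives. It is not OMRN code: the S-expression regex representation and the names NULLABLE-P, DERIVATIVE, DEAD-P and LONGEST-MATCH-FROM-START are all hypothetical, and it interprets the regex directly instead of compiling a DFA as OMRN does.

```lisp
;;; Hypothetical sketch; none of these names come from one-more-re-nightmare.
;;; A regex is :empty-set, :empty-string, a character,
;;; (:seq r s), (:or r s) or (:star r).

(defun nullable-p (re)
  "True if RE matches the empty string."
  (cond ((eq re :empty-string) t)
        ((atom re) nil)                 ; :empty-set or a character
        (t (ecase (first re)
             (:seq (and (nullable-p (second re)) (nullable-p (third re))))
             (:or (or (nullable-p (second re)) (nullable-p (third re))))
             (:star t)))))

(defun derivative (re ch)
  "The regex matching whatever may follow CH in a match of RE."
  (cond ((characterp re) (if (char= re ch) :empty-string :empty-set))
        ((atom re) :empty-set)          ; :empty-set or :empty-string
        (t (destructuring-bind (op a &optional b) re
             (ecase op
               (:seq (let ((left `(:seq ,(derivative a ch) ,b)))
                       (if (nullable-p a)
                           `(:or ,left ,(derivative b ch))
                           left)))
               (:or `(:or ,(derivative a ch) ,(derivative b ch)))
               (:star `(:seq ,(derivative a ch) ,re)))))))

(defun dead-p (re)
  "True if RE can never match any string, so scanning may stop."
  (cond ((eq re :empty-set) t)
        ((atom re) nil)
        (t (ecase (first re)
             (:seq (or (dead-p (second re)) (dead-p (third re))))
             (:or (and (dead-p (second re)) (dead-p (third re))))
             (:star nil)))))

(defun longest-match-from-start (re string
                                 &key (start 0) (end (length string)))
  "Return the end index of the longest match of RE beginning at START, or NIL."
  (loop with best = (and (nullable-p re) start)
        for index from start below end
        do (setf re (derivative re (char string index)))
           (when (dead-p re) (return best))
           (when (nullable-p re) (setf best (1+ index)))
        finally (return best)))
```

With the regex `(:seq #\f (:star #\o))`, scanning "food" gives an end index of 3, while scanning "oodf" gives NIL because the match is only attempted at the start. The sketch does no simplification of derivatives, so the expression grows as it scans; it only illustrates the anchored longest-match behaviour being asked for.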
It is, but I'm not sure of just what to do. At the moment it's possible I could have a function that just scans once from the start, but the issue is about handling ^ and $, which is more general, and there are a couple of ways to do them.
A way to match just at the start wouldn't be too hard still. Rather than wrapping every RE in |
I don't know my way through the code well enough to give this a stab. Could you point me to the right lines to edit or some branch so I could run an initial vague benchmark for my case?
Here is a huge kludge to make OMRN generate code to only scan at the start of string. Note that it makes all regular expressions only match at the start of the string; I'd need to make separate caches for lexing(?) and grepping functions, which requires more changes. Also please pull from master as I needed to not hard-code one thing that should be a use of the

(in-package :one-more-re-nightmare)

(defclass scan-from-start ()
  ())

;;; Copied verbatim from the START-CODE method for SCAN-EVERYTHING. Oh no
(defmethod start-code ((strategy scan-from-start) states)
  (destructuring-bind (state) states
    (let ((expression (state-expression state)))
      (cond
        ((state-never-succeeds-p state)
         ;; Just return immediately if we're told to match nothing.
         `(start (return)))
        ((re-empty-p expression)
         ;; Succeed for every character?
         `(start
           (cond
             ((> position end)
              (return))
             (t
              (incf position)
              ,(setf-from-assignments (state-exit-effects state))
              (win ,@(win-locations (state-exit-map state)))))))
        (t
         `(start
           (setf position start)
           (go ,(find-state-name state :bounds-check))))))))

(defmethod initial-states ((strategy scan-from-start) expression)
  ;; Don't do parallel GREP scanning stuff.
  (list (alpha (add-tags expression) (empty-set))))

(defmethod macros-for-strategy append ((strategy scan-from-start))
  '((restart ()
     ;; Don't actually restart the DFA.
     '(return))))

(defun make-default-strategy (layout expression)
  (declare (ignore layout expression))
  (make-instance (dynamic-mixins:mix 'scan-from-start 'call-continuation)))

After which:

CL-USER> (one-more-re-nightmare:first-match "f" "food")
#(0 1)
CL-USER> (one-more-re-nightmare:first-match "f" "oodf")
NIL
Great project. Is it intentional that a caret ^ does not match the start of the string and a dollar $ does not match the end? Wikipedia seems to think those are included in POSIX regex.