add (til) PEG special #1528
I have often encountered this pattern, so I can feel the pain here. Still, I don't see it as a big problem. I don't fully understand all the changes in the PR, but it feels like a lot of code to me :-D.
For the key=value example, how about:

(peg/match ~(split "=" (capture (to -1)))
           "key=value")
# =>
@["key" "value"]

This seems to avoid repeating the separator. Maybe there are other examples where this approach doesn't work as well?

Update: another idea:

(peg/match ~(sequence (capture (to "="))
                      1
                      (capture (to -1)))
           "key=value")
# =>
@["key" "value"]
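The two alternatives above stop behaving the same way when the value itself contains the separator. This illustration is not from the thread. `split` cuts at every "=", while the `sequence` version consumes only the first one:

```janet
# split divides the remaining input at *every* "=" occurrence.
(pp (peg/match ~(split "=" (capture (to -1))) "key=a=b"))
# @["key" "a" "b"]

# The sequence version captures up to the first "=", skips one byte,
# then captures everything that is left.
(pp (peg/match ~(sequence (capture (to "=")) 1 (capture (to -1))) "key=a=b"))
# @["key" "a=b"]
```

The second form also relies on the separator being exactly one byte (the `1`). That assumption is one reason the general case needs something else.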
I pushed a new version with a slightly cleaner implementation (replace …).

@sogaiu, my example was pretty simple, so the alternatives you propose would work for it, but imagine something more complicated, e.g.:

I don't know of a way to do this in general (with an arbitrary PEG as the separator) without this rule repetition. This is also useful with …, which is where I mostly wanted it in Advent of Code problems last year.
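When the separator is an arbitrary PEG, existing specials seem to offer roughly one option. You can name the separator once in a grammar and refer to it twice. The sketch below is minimal and assumes a separator of either "=" or ":". The rule name `:sep` and the binding `kv-grammar` are illustrative. The separator is still matched twice: once by `(to :sep)` and once to step over it.

```janet
# Illustrative grammar: :sep can be any PEG (here "=" or ":").
# It is referenced twice: once to find where the key ends, once to skip it.
(def kv-grammar
  (peg/compile
    ~{:sep (+ "=" ":")
      :main (* (capture (to :sep)) :sep (capture (to -1)))}))

(pp (peg/match kv-grammar "key=value"))  # @["key" "value"]
(pp (peg/match kv-grammar "key:value"))  # @["key" "value"]
(pp (peg/match kv-grammar "keyvalue"))   # nil (no separator found)
```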
Zooming out on this one step further: wouldn't it be nice if the PEG module were extensible from within the language, so we could define new "combinators" like this one on a per-project basis without needing to modify the language as a whole? That would let itches like this one be scratched per project, without adding baggage to every deployment of PEG across the entire Janet universe, and without requiring anyone to plumb around in the C to get a bespoke capture like this one working.
Welllll, you can already write helper combinators that use existing PEG machinery; this is (similarly to …)
I don't find the short forms readily comprehensible, so apologies for "translating" below (I hope I didn't mess any up!). For:

What came to mind for a peg to handle the two examples didn't involve an exact repetition of …:

(peg/match ~(sequence (capture (to :s))
                      :s+
                      (number :d+))
           "foo 123")
# =>
@["foo" 123]

Perhaps this particular case is a matter of how our perceptions happened to view things. (Just to be clear, I'm not trying to claim the peg I wrote above is better or anything.) Regarding:
Faced with this example, what surfaced here was:

(peg/match ~(sequence (sub (to ";")
                           (split ", " (capture :w+)))
                      1
                      :s+
                      (number :d+))
           "foo, bar, baz; 123")
# =>
@["foo" "bar" "baz" 123]

This is longer, though more straightforward to me (though I wrote it, so I'm not sure how much the latter point is worth). I don't find this much of a length difference significant, but the comprehension angle matters to me (particularly from the maintenance, investigation, and learn-from-other-people's-code perspectives). I am not used to …
So I've been thinking about this, and while I like …

So I have what I think is a better idea: a …
(til sep subpattern) is a specialized (sub) that behaves like (sub (to sep) subpattern), but advances over the input like (thru sep).
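The semantics described here can be approximated with existing specials, at the cost of matching `sep` twice. The sketch below is minimal. `til-equivalent` is a hypothetical helper name, not part of Janet. It wraps the trailing separator in `(drop ...)` so that its captures are discarded.

```janet
# Hypothetical helper (not part of Janet) approximating (til sep subpattern):
#   (sub (to sep) subpattern) runs subpattern against the text before sep,
#   (drop sep) then steps over sep, as (thru sep) would, discarding its captures.
# Unlike a native special, sep is matched twice here.
(defn til-equivalent [sep subpattern]
  ~(sequence (sub (to ,sep) ,subpattern) (drop ,sep)))

# key=value: capture the key, step over "=", capture the rest.
(pp (peg/match ~(sequence ,(til-equivalent "=" '(capture (to -1)))
                          (capture (to -1)))
               "key=value"))
# @["key" "value"]
```

Several of the PR's tests, rewritten with this helper, give the same results: the basic match, the end-of-window check with `-1`, both failure cases, and "advances past the first occurrence of sep".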
@ianthehenry To help me digest the new version, please mention whether it differs from what you mentioned in this discussion. In general, I find the currently proposed changes (especially to the C code) far less worrying, so 👍 on that front.

If this proceeds further, I wonder if we could consider alternate names. I think … For newcomers (and folks who need to "rediscover" later, like yours truly), maybe something a bit more "distant" would be less confusing. Perhaps …

For others who might not have checked, there are tests in 9529062, which I'll reproduce below (pardon my translation; it will likely help my comprehension and possibly make the combined content accessible to a wider audience):

# basic matching
(peg/match ~(til "d" "abc")
"abcdef")
# =>
@[]
# second pattern can't see past the first occurrence of first pattern
(peg/match ~(til "d"
(sequence "abc" -1))
"abcdef")
# =>
@[]
# fails if first pattern fails
(peg/match ~(til "x" "abc")
"abcdef")
# =>
nil
# fails if second pattern fails
(peg/match ~(til "abc" "x")
"abcdef")
# =>
nil
# discards captures from initial pattern
(peg/match ~(til (capture "d")
(capture "abc"))
"abcdef")
# =>
@["abc"]
# positions inside second match are still relative to the entire input
(peg/match ~(sequence "one\ntw"
(til 0
(sequence (position) (line) (column))))
"one\ntwo\nthree\n")
# =>
@[6 2 3]
# advances to the end of the first pattern's first occurrence
(peg/match ~(sequence (til "d" "ab")
"e")
"abcdef")
# =>
@[]
@sogaiu wrote:
I see the utility in having a way to write terse PEGs for this situation, but I'm a bit wary of the name, too. To me … For alternative names, I also like …
Would you mind expanding a bit on …?
Oh lol, I honestly forgot that I had implemented this before. Yeah, it's the same proposal that I made a year ago. Apparently I had already figured out how to do it the “simple” way back then, and had to independently rediscover this API after trying something more complicated. Baby brain is something else. I'm not married to the name; I agree, I think …
@sogaiu wrote:
I was thinking 'over' in the sense of 'step over' or 'pass over'. That is to say, to move past but to ignore.

@ianthehenry wrote:
I don't mean to belabour the point but just to explain things more clearly than I did in the original message: my perspective is that function names have two purposes in programs. The first is the identification (or referent) purpose (i.e. you're telling the machine, 'do the instruction referred to by the identifier X'). The second purpose is mnemonic (i.e. you're reminding the consumer, 'this does X'). It's for the latter reason that I don't like
At the risk of excessive bikeshedding, if you want something shorter, there's also …

# current
(peg/match ~(til "d" "abc") "abcdef") # => @[]
# alternatives (in alphabetical order)
(peg/match ~(by "d" "abc") "abcdef") # => @[]
(peg/match ~(over "d" "abc") "abcdef") # => @[]
(peg/match ~(pass "d" "abc") "abcdef") # => @[]
(peg/match ~(skip "d" "abc") "abcdef") # => @[]
(peg/match ~(sub-til "d" "abc") "abcdef") # => @[]
(peg/match ~(until "d" "abc") "abcdef") # => @[]
Oh, and I should have added that I lean towards …
@pyrmont Thanks for that explicit listing of the different names. I share the same concern regarding the "mnemonic" angle.

@ianthehenry Thanks for the clarification regarding the version of … I can see how … ATM, I find …
Here’s a different tack: the operation, really, is
So I would still prefer a shorter name and want to consider |
Cast in the light of viewing things as a kind of "limited"
If so, perhaps there could be a shorthand name like … Kind of like how …
That's a really interesting idea. For complete consistency with …

(peg/match ~(* (split :s+ ':w+) '(to -1) 1) "a b c d e")
# => ["a b c d e"] (nothing gets split)
(peg/match ~(* (split :s+ ':w+) '(to -1) 2) "a b c d e")
# => ["a" "b c d e"] (first occurrence gets split; first separator is nowhere to be seen)

But that seems unintuitive to me. I think the behavior of …

(peg/match ~(* (split :s+ ':w+) '(to -1) 1) "a b c d e")
# => ["a" "b c d e"] (1 means match one separator)
(peg/match ~(* (split :s+ ':w+) '(to -1) 2) "a b c d e")
# => ["a" "b" "c d e"] (2 means match 2 separators)

That is the behavior I'd expect. The inconsistency gives me a little pause, but I think it makes sense in this case. I have trouble imagining a case where this generality is useful, though; I have never actually wanted a limited split beyond this "parse to the next delimiter" case. I think I would actually invert it, and say …
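This sketch assumes the integer is meant as a limit on how many separators `split` consumes, the "2 means match 2 separators" reading. Under that assumption, the behavior can be emulated with existing specials. `split-limited` is a hypothetical helper, not a Janet special. It repeats the `sub`/`to` pattern n times and leaves the rest of the input unconsumed.

```janet
# Hypothetical helper (not a Janet special): match subpattern against each of
# the first n separator-delimited pieces, stepping over n separators.
(defn split-limited [sep subpattern n]
  (def steps @[])
  (repeat n
    # Window up to the next sep, run subpattern inside it, then skip sep.
    (array/push steps ~(sub (to ,sep) ,subpattern) ~(drop ,sep)))
  ~(sequence ,;steps))

(pp (peg/match ~(sequence ,(split-limited :s+ '(capture :w+) 1) (capture (to -1)))
               "a b c d e"))
# @["a" "b c d e"]
(pp (peg/match ~(sequence ,(split-limited :s+ '(capture :w+) 2) (capture (to -1)))
               "a b c d e"))
# @["a" "b" "c d e"]
```

If fewer than n separators are present, the whole pattern fails. That seems consistent with "match n separators", but a real special might choose differently.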
@ianthehenry Are the parens misplaced in your most recent examples? You talk about an optional integer argument to the …
Ah, that sounds very nice! I haven't tested it out, but it seems plausible.
Also sounds good 👍

Regarding the name... to repeat what was stated earlier:

I like how the name itself (… Although, as was mentioned earlier, … As a historical note, I'm leaving a link to this PR, which also mentions a …
This is... maybe not a real pull request; this is an idea with an implementation and some tests. But it's an idea that adds a bit of complexity to the PEG engine and is maybe not worth it.
I often want to write a PEG somewhere in between (to) and (thru), usually while parsing something like key=value: capture everything up to "=", skip over the "=", and then match everything after. This is a little clumsy right now, because you have to repeat the separator (and even though this doesn't actually matter in any case where I've done this, it's a little sad that you evaluate the separator PEG one more time than is necessary).
So (til) is a special that makes this easy to write. It captures like (to), but advances like (thru).

And this is... weird. Nowhere else is there a PEG that captures and advances differently; captures are defined by how they advance the input. Until now. So this PR adds complexity to PEG rules, as each PEG rule can basically return two things now: how far to advance, and how far to capture (if currently capturing).
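With existing specials, the captured text always matches how far the pattern advanced. The examples below make this concrete, using `(position)` to report where matching ended up. As described above, the proposed (til) would sit between the two. It would capture "key" like the first example but leave the position at 4 like the second.

```janet
# (to "=") consumes "key": the capture and the advance agree (index 3).
(pp (peg/match ~(sequence (capture (to "=")) (position)) "key=value"))
# @["key" 3]

# (thru "=") consumes "key=": again the capture and the advance agree (index 4).
(pp (peg/match ~(sequence (capture (thru "=")) (position)) "key=value"))
# @["key=" 4]
```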
I'm mostly hesitant about this because it's not 100% obvious how each other PEG special should interact with (til). I made judgment calls that I think are reasonable, but I could see a case for, e.g., either of these behaviors:

(I chose the latter because it's slightly simpler to implement and this will never come up in practice, but I feel slightly weird about it.)
I think this test summarizes (til) best: