Fix generating partially valid tokens #3

Open: wants to merge 2 commits into base branch main


mattiasarro commented May 30, 2023

Matching a regex partially can lead to generating a token that makes the whole generated sequence invalid, even if a substring of the token would have produced valid output.

The other option would be to tweak `complete_re` so that we run the `if stop_after_match:` block after appending each character of the token (rather than the full token text) to the output text, but that's less clean. Or is that needed to generate some output sequences, which can only occur by generating a larger invalid token and then pruning the output?

Edit: it looks like we need the latter approach; see the latest commit.

Matching a regex partially can lead to generating a token which causes the whole generated sequence to be invalid.
When using `partial=True`, we ensure we don't generate invalid output, but this also makes it impossible to generate certain output sequences. Therefore, it's necessary to allow generating tokens that match only partially, and then keep the substring of that token which matches the regex.
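The truncation step described above can be sketched roughly as follows. This is not the PR's code: `append_token` is a hypothetical helper, and it assumes the standard-library `re.fullmatch` as the "output is complete" test. It appends the token one character at a time and stops at the first prefix that fully matches, which is the character-by-character `stop_after_match` check discussed in the description.

```python
import re
from typing import Tuple


def append_token(output: str, token: str, pattern: str) -> Tuple[str, bool]:
    """Append `token` to `output` one character at a time (hypothetical helper).

    As soon as the text so far fully matches `pattern`, the rest of the token
    is discarded. Returns the new output text and whether generation should stop.
    """
    compiled = re.compile(pattern)
    text = output
    for ch in token:
        text += ch
        if compiled.fullmatch(text):
            # This prefix of the token completes the match; the remaining
            # characters could only push the output past a valid match.
            return text, True
    # No prefix completed the match: keep the whole token and continue.
    return text, False


# The full token would give "12345", which can never match \d{3},
# but keeping only its first character completes the match.
print(append_token("12", "345", r"\d{3}"))  # ('123', True)
```

The sketch stops at the shortest completing prefix, which only makes sense under stop-after-match semantics. It also does not decide whether a token is worth trying in the first place; it assumes a separate filter (for example, a partial-match check) has already done that.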
mattiasarro changed the title from "ReTokenFilter.is_valid_token: partial=False" to "Fix generating partially valid tokens" on May 31, 2023

freckletonj commented Sep 5, 2023

I'm interested in this solution too, as I was having the same parserllm issues as in r2d4/parserllm#4.

Also, the outlines project may interest you. They precompile valid continuations, and then inference happens in O(c).
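The precompiled-continuations idea can be illustrated with a toy index: for each automaton state, record which vocabulary tokens keep the automaton alive and which state each one ends in, so a decoding step becomes a dictionary lookup rather than a scan over the vocabulary. This is a hedged sketch, not the outlines API: the DFA for `\d+(\.\d+)?` is written by hand (a real implementation would derive it from the regex), and `step`, `build_index`, and the vocabulary are hypothetical.

```python
from typing import Dict, List, Optional

# Hand-written DFA for the toy pattern \d+(\.\d+)?
#   0: start, 1: integer digits, 2: just saw ".", 3: fraction digits
STATES = (0, 1, 2, 3)
ACCEPTING = {1, 3}  # states where the text so far is a complete match


def step(state: int, ch: str) -> Optional[int]:
    """Return the next DFA state, or None if `ch` leads to a dead state."""
    if "0" <= ch <= "9":
        return {0: 1, 1: 1, 2: 3, 3: 3}[state]
    if ch == "." and state == 1:
        return 2
    return None


def build_index(vocab: List[str]) -> Dict[int, Dict[str, int]]:
    """For every state, map each token that keeps the DFA alive to its end state."""
    index: Dict[int, Dict[str, int]] = {}
    for state in STATES:
        allowed: Dict[str, int] = {}
        for token in vocab:
            current: Optional[int] = state
            for ch in token:
                current = step(current, ch)
                if current is None:
                    break  # token leaves the language; not allowed here
            if current is not None:
                allowed[token] = current
        index[state] = allowed
    return index


VOCAB = ["1", "23", ".", ".5", "a", "4.2", ".."]
INDEX = build_index(VOCAB)  # built once, before decoding

# During decoding, the allowed tokens are a lookup on the current state.
print(sorted(INDEX[0]))  # ['1', '23', '4.2']
print(sorted(INDEX[1]))  # ['.', '.5', '1', '23', '4.2']
```

The cost of walking every token through the automaton is paid once up front; after that, masking the vocabulary at each step only needs the current state.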

The issue I have with outlines, though, is their abominable Lark support; their example is slooow: https://github.com/normal-computing/outlines/blob/main/examples/parsing.py
