Export an array of all tokens from ct_token_map #577


Merged (1 commit, May 24, 2025)

Conversation

taminomara
Contributor

@taminomara commented May 23, 2025

This helps with writing structured input adapters for fuzzing. When fuzzing a parser specifically (as opposed to fuzzing the lexer and parser at the same time), we'd like to supply it with an array of valid lexemes. This export lets us build such an array without manually listing every token in the fuzzing entry point.

Note that I didn't implement this functionality for generated lexers because there's already a way to get all tokens via mod_l::lexerdef().iter_rules().
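For context, the exported array might look roughly like this. This is a hand-written sketch, not the actual generated output: the module name `token_map` and the token constants below are made up for illustration (`ct_token_map` generates the real ones from the grammar).

```rust
// Hypothetical sketch of a ct_token_map-style module with the new TOKENS
// array; the module name `token_map` and these token IDs are illustrative.
mod token_map {
    pub const T_INT: u32 = 0;
    pub const T_PLUS: u32 = 1;
    pub const T_ID: u32 = 2;
    // New in this PR: a single array listing every token constant,
    // so a fuzzer can draw valid token IDs without listing them by hand.
    pub const TOKENS: &[u32] = &[T_INT, T_PLUS, T_ID];
}

fn main() {
    // A fuzzing entry point can now pick IDs straight from the array.
    assert!(token_map::TOKENS.contains(&token_map::T_PLUS));
    assert_eq!(token_map::TOKENS.len(), 3);
}
```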

Example of a fuzzing implementation after this PR:

use libfuzzer_sys::fuzz_target;
use libfuzzer_sys::arbitrary::{Arbitrary, Unstructured};
use lrlex::DefaultLexeme;

#[derive(Debug)]
struct Token(u32, String);

impl<'a> Arbitrary<'a> for Token {
    fn arbitrary(u: &mut Unstructured<'a>) -> libfuzzer_sys::arbitrary::Result<Self> {
        // Pick a valid token ID from the exported array, plus arbitrary text.
        Ok(Token(*u.choose(token_map::TOKENS)?, u.arbitrary()?))
    }
}

fuzz_target!(|data: Vec<Token>| {
    let mut text = String::new();
    let lexemes: Vec<_> = data.into_iter().map(|tok| {
        let lexeme = DefaultLexeme::new(
            tok.0,
            text.len(),
            tok.1.len(),
        );
        text.push_str(&tok.1);
        lexeme
    }).collect();

    // Run parser...
});

@ltratt
Member

ltratt commented May 23, 2025

This is a part of the system I haven't thought about for a while. Is it possible to do the same thing with mod_l::lexerdef().iter_rules().map(|x| x.name()).collect() or similar? [Warning: untried!]

@taminomara
Contributor Author

> Is it possible to do the same thing with mod_l::lexerdef().iter_rules().map(|x| x.name()).collect() or similar?

Yes, this seems to work.
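For the generated-lexer route, the shape of that call can be illustrated with a self-contained mock. The `Rule` and `LexerDef` types below only imitate the relevant slice of lrlex's API for demonstration purposes; they are assumptions, not the real definitions.

```rust
// Mock of the generated-lexer approach discussed above: collect every
// named token via iter_rules(). These types imitate lrlex's API shape.
struct Rule {
    name: Option<String>,
}

impl Rule {
    // lrlex rules may be anonymous, hence the Option.
    fn name(&self) -> Option<&str> {
        self.name.as_deref()
    }
}

struct LexerDef {
    rules: Vec<Rule>,
}

impl LexerDef {
    fn iter_rules(&self) -> impl Iterator<Item = &Rule> {
        self.rules.iter()
    }
}

fn main() {
    let lexerdef = LexerDef {
        rules: vec![
            Rule { name: Some("INT".to_string()) },
            Rule { name: Some("PLUS".to_string()) },
        ],
    };
    // Mirrors mod_l::lexerdef().iter_rules().map(|x| x.name()).collect(),
    // skipping anonymous rules.
    let names: Vec<&str> = lexerdef.iter_rules().filter_map(|r| r.name()).collect();
    assert_eq!(names, ["INT", "PLUS"]);
}
```

This covers a generated lexer; as noted below, a custom lexer that only uses ct_token_map has no equivalent, which is what motivates the exported array.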

@ltratt
Member

ltratt commented May 23, 2025

OK, then I think we don't need to generate the array?

@taminomara
Contributor Author

> OK, then I think we don't need to generate the array?

That will only work when the user has a generated lexer. If there's a custom lexer with ct_token_map, then there's no way to get a full array of tokens.

@ltratt
Member

ltratt commented May 23, 2025

> If there's a custom lexer with ct_token_map, then there's no way to get a full array of tokens.

I take your point.

@taminomara force-pushed the master branch 4 times, most recently from e8356d9 to 49ba5e3 on May 24, 2025
@ratmice
Collaborator

ratmice commented May 24, 2025

It took a bit of head scratching until I grokked it (building a token stream directly rather than an intermediate vector!), but once it clicked, it all seemed fine to me.

Seems fine to me now, unless Laurence has any further comments.

@ltratt
Member

ltratt commented May 24, 2025

@ratmice Thanks for the review!

@taminomara Thanks for the PR!

@ltratt ltratt added this pull request to the merge queue May 24, 2025
Merged via the queue into softdevteam:master with commit 7831d2d May 24, 2025
2 checks passed