Skip to content

Latest commit

 

History

History
68 lines (61 loc) · 3.06 KB

flap-a-deterministic-parser-with-fused-lexing.md

File metadata and controls

68 lines (61 loc) · 3.06 KB

flap: A Deterministic Parser with Fused Lexing

  • Authors: Jeremy Yallop, Ningning Xie and Neel Krishnaswami
  • Conference: PLDI 2023
  • DOI: 10.1145/3591269
  • Source: gihub:yallop/ocaml-flap
  • Abstract:

    Lexers and parsers are typically defined separately and connected by a token stream. This separate definition is important for modularity and reduces the potential for parsing ambiguity. However, materializing tokens as data structures and case-switching on tokens comes with a cost.

    We show how to fuse separately-defined lexers and parsers, drastically improving performance without compromising modularity or increasing ambiguity. We propose a deterministic variant of Greibach Normal Form that ensures deterministic parsing with a single token of lookahead and makes fusion strikingly simple, and prove that normalizing context free expressions into the deterministic normal form is semantics-preserving. Our staged parser combinator library, flap, provides a standard interface, but generates specialized token-free code that runs two to six times faster than ocamlyacc on a range of benchmarks.

BibTeX
@article{10.1145/3591269,
  author = {Yallop, Jeremy and Xie, Ningning and Krishnaswami, Neel},
  title = {flap: A Deterministic Parser with Fused Lexing},
  year = {2023},
  issue_date = {June 2023},
  publisher = {Association for Computing Machinery},
  address = {New York, NY, USA},
  volume = {7},
  number = {PLDI},
  url = {https://doi.org/10.1145/3591269},
  doi = {10.1145/3591269},
  abstract = {Lexers and parsers are typically defined separately and connected by
  a token stream. This separate definition is important for modularity and reduces
  the potential for parsing ambiguity. However, materializing tokens as data
  structures and case-switching on tokens comes with a cost. We show how to fuse
  separately-defined lexers and parsers, drastically improving performance without
  compromising modularity or increasing ambiguity. We propose a deterministic
  variant of Greibach Normal Form that ensures deterministic parsing with a single
  token of lookahead and makes fusion strikingly simple, and prove that
  normalizing context free expressions into the deterministic normal form is
  semantics-preserving. Our staged parser combinator library, flap, provides a
  standard interface, but generates specialized token-free code that runs two to
  six times faster than ocamlyacc on a range of benchmarks.},
  journal = {Proc. ACM Program. Lang.},
  month = jun,
  articleno = {155},
  numpages = {24},
  keywords = {fusion, lexing, multi-stage programming, optimization, parsing}
}

Notes

@brendanzab:

A continuation of their prior paper, A typed, algebraic approach to parsing, this time with the addition of fused lexers.