diff --git a/src/SUMMARY.md b/src/SUMMARY.md
index c3786707f..1f256b8ad 100644
--- a/src/SUMMARY.md
+++ b/src/SUMMARY.md
@@ -8,6 +8,7 @@
     - [Input format](input-format.md)
     - [Keywords](keywords.md)
     - [Identifiers](identifiers.md)
+    - [Frontmatter](frontmatter.md)
     - [Comments](comments.md)
     - [Whitespace](whitespace.md)
     - [Tokens](tokens.md)
diff --git a/src/frontmatter.md b/src/frontmatter.md
new file mode 100644
index 000000000..541e64cb9
--- /dev/null
+++ b/src/frontmatter.md
@@ -0,0 +1,39 @@
+r[frontmatter]
+# Frontmatter
+
+r[frontmatter.syntax]
+```grammar,lexer
+@root FRONTMATTER ->
+    FRONTMATTER_FENCE HORIZONTAL_WHITESPACE* INFOSTRING? HORIZONTAL_WHITESPACE* LF
+    (FRONTMATTER_LINE LF )*
+    FRONTMATTER_FENCE[^matched-fence] HORIZONTAL_WHITESPACE* LF
+
+FRONTMATTER_FENCE -> `---` `-`*
+
+INFOSTRING -> (XID_Start | `_`) ( XID_Continue | `-` | `.` )*
+
+FRONTMATTER_LINE -> (~INVALID_FRONTMATTER_LINE_START (~INVALID_FRONTMATTER_LINE_CONTINUE)*)?
+
+INVALID_FRONTMATTER_LINE_START -> (FRONTMATTER_FENCE[^escaped-fence] | LF)
+
+INVALID_FRONTMATTER_LINE_CONTINUE -> LF
+```
+
+[^matched-fence]: The closing fence must have the same number of `-` as the opening fence.
+[^escaped-fence]: A `FRONTMATTER_FENCE` at the beginning of a `FRONTMATTER_LINE` is invalid only if it has at least as many `-` as the opening `FRONTMATTER_FENCE`.
+
+Frontmatter is an optional section holding content for external tools, which can read it without full knowledge of the Rust grammar.
+
+r[frontmatter.document]
+Frontmatter may be preceded only by a [shebang] and whitespace.
+
+r[frontmatter.fence]
+Each delimiter is referred to as a "fence." The opening and closing fences must each be at the start of a line and must be a matching pair of three or more hyphens (`-`). A fence may be followed by horizontal whitespace.
+
+r[frontmatter.infostring]
+The opening fence may be followed by an infostring that identifies the intended purpose of the content. An infostring may be followed by horizontal whitespace.
+
+r[frontmatter.body]
+The body of the frontmatter may contain any content except a line starting with at least as many hyphens (`-`) as the fences.
+
+[shebang]: input-format.md#shebang-removal
diff --git a/src/input-format.md b/src/input-format.md
index cf35b2959..9d7008868 100644
--- a/src/input-format.md
+++ b/src/input-format.md
@@ -59,6 +59,11 @@ This prevents an [inner attribute] at the start of a source file being removed.
 > [!NOTE]
 > The standard library [`include!`] macro applies byte order mark removal, CRLF normalization, and shebang removal to the file it reads. The [`include_str!`] and [`include_bytes!`] macros do not.
 
+r[input.frontmatter]
+## Frontmatter removal
+
+After optional whitespace, [frontmatter] may appear next in the input.
+
 r[input.tokenization]
 ## Tokenization
 
@@ -69,4 +74,5 @@ The resulting sequence of characters is then converted into tokens as described
 [comments]: comments.md
 [Crates and source files]: crates-and-source-files.md
 [_shebang_]: https://en.wikipedia.org/wiki/Shebang_(Unix)
+[frontmatter]: frontmatter.md
 [whitespace]: whitespace.md
diff --git a/src/whitespace.md b/src/whitespace.md
index b398d0c95..b274d611d 100644
--- a/src/whitespace.md
+++ b/src/whitespace.md
@@ -4,23 +4,32 @@ r[lex.whitespace]
 r[whitespace.syntax]
 ```grammar,lexer
 @root WHITESPACE ->
-      U+0009 // Horizontal tab, `'\t'`
-    | U+000A // Line feed, `'\n'`
-    | U+000B // Vertical tab
-    | U+000C // Form feed
-    | U+000D // Carriage return, `'\r'`
-    | U+0020 // Space, `' '`
-    | U+0085 // Next line
-    | U+200E // Left-to-right mark
-    | U+200F // Right-to-left mark
-    | U+2028 // Line separator
-    | U+2029 // Paragraph separator
-
-TAB -> U+0009 // Horizontal tab, `'\t'`
-
-LF -> U+000A // Line feed, `'\n'`
-
-CR -> U+000D // Carriage return, `'\r'`
+      END_OF_LINE
+    | IGNORABLE_CODE_POINT
+    | HORIZONTAL_WHITESPACE
+
+END_OF_LINE ->
+      U+000A // line feed, `'\n'`
+    | U+000B // vertical tabulation
+    | U+000C // form feed
+    | U+000D // carriage return, `'\r'`
+    | U+0085 // next line
+    | U+2028 // LINE SEPARATOR
+    | U+2029 // PARAGRAPH SEPARATOR
+
+IGNORABLE_CODE_POINT ->
+      U+200E // LEFT-TO-RIGHT MARK
+    | U+200F // RIGHT-TO-LEFT MARK
+
+HORIZONTAL_WHITESPACE ->
+      U+0009 // horizontal tab, `'\t'`
+    | U+0020 // space, `' '`
+
+TAB -> U+0009 // horizontal tab, `'\t'`
+
+LF -> U+000A // line feed, `'\n'`
+
+CR -> U+000D // carriage return, `'\r'`
 ```
 
 r[lex.whitespace.intro]
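
For reviewers who want to try the proposed `FRONTMATTER` grammar on sample inputs, here is a minimal sketch of a scanner that follows the rules added in `src/frontmatter.md`. It is not the compiler's implementation. The names `Frontmatter`, `scan_frontmatter`, `is_horizontal_whitespace`, and `is_infostring` are hypothetical. It assumes shebang removal and CRLF normalization have already run, and it approximates `XID_Start`/`XID_Continue` with `char::is_alphabetic`/`char::is_alphanumeric` because the standard library has no XID predicates.

```rust
/// Hypothetical result type: the pieces of a well-formed frontmatter block.
#[derive(Debug, PartialEq)]
struct Frontmatter<'a> {
    infostring: Option<&'a str>,
    body: &'a str,
    rest: &'a str,
}

/// HORIZONTAL_WHITESPACE from the grammar: tab or space.
fn is_horizontal_whitespace(c: char) -> bool {
    c == '\t' || c == ' '
}

/// Approximation of INFOSTRING (assumption: is_alphabetic/is_alphanumeric
/// stand in for XID_Start/XID_Continue).
fn is_infostring(s: &str) -> bool {
    let mut chars = s.chars();
    match chars.next() {
        Some(c) if c.is_alphabetic() || c == '_' => {}
        _ => return false,
    }
    chars.all(|c| c.is_alphanumeric() || c == '_' || c == '-' || c == '.')
}

/// Ok(None): no frontmatter. Ok(Some(..)): well-formed frontmatter.
/// Err(..): an opening fence was found but the block is malformed.
fn scan_frontmatter(src: &str) -> Result<Option<Frontmatter<'_>>, &'static str> {
    let mut offset = 0;
    let mut lines = src.split_inclusive('\n');

    // Skip whitespace-only lines; the opening fence must start a line.
    let opening = loop {
        match lines.next() {
            None => return Ok(None),
            Some(line) if line.trim().is_empty() => offset += line.len(),
            Some(line) => break line,
        }
    };

    // A fence is three or more hyphens (ASCII, so count == byte length).
    let hyphens = opening.chars().take_while(|&c| c == '-').count();
    if hyphens < 3 {
        return Ok(None);
    }
    let Some(opening) = opening.strip_suffix('\n') else {
        return Err("opening fence line must end with a line feed");
    };

    // Optional infostring, surrounded by optional horizontal whitespace.
    let info = opening[hyphens..].trim_matches(is_horizontal_whitespace);
    let infostring = if info.is_empty() {
        None
    } else if is_infostring(info) {
        Some(info)
    } else {
        return Err("invalid infostring");
    };

    offset += opening.len() + 1; // +1 for the LF stripped above
    let body_start = offset;

    for line in lines {
        let n = line.chars().take_while(|&c| c == '-').count();
        if n >= hyphens {
            // Such a line cannot be body text, so it must be the closing
            // fence: same hyphen count, then only horizontal whitespace, then LF.
            let tail_ok = line[n..]
                .strip_suffix('\n')
                .map_or(false, |t| t.chars().all(is_horizontal_whitespace));
            if n != hyphens || !tail_ok {
                return Err("line starts with a fence but is not a valid closing fence");
            }
            return Ok(Some(Frontmatter {
                infostring,
                body: &src[body_start..offset],
                rest: &src[offset + line.len()..],
            }));
        }
        offset += line.len();
    }
    Err("missing closing fence")
}
```

The sketch reads the grammar literally: a closing fence at end of input without a trailing line feed is reported as an error, and a body line may start with hyphens only if there are fewer of them than in the opening fence. Under those assumptions, calling `scan_frontmatter` on an input that opens with a four-hyphen fence treats a `---` line as ordinary body text.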