From bd462972090242542cf88dd66002484f9af23ee7 Mon Sep 17 00:00:00 2001
From: Alisdair Meredith
Date: Sat, 23 Nov 2024 10:59:28 +0100
Subject: [PATCH] [lex] Provide unicode name for all control characters

This commit does not touch the new-line character as paper P2348.
It restricts itself to consistent use of the unicode character name
for space, horizontal tab, and vertical tab.

Compared to PR #7359 it deliberately does not touch the grammar
that would necessitate a review by core review. The intent is to
rebase that PR if this one lands.
---
 source/lex.tex | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/source/lex.tex b/source/lex.tex
index 1ed432dda3..38351d54d8 100644
--- a/source/lex.tex
+++ b/source/lex.tex
@@ -140,9 +140,9 @@
 would arise from a source file ending with an unclosed \tcode{/*}
 comment.
 \end{footnote}
-Each comment\iref{lex.comment} is replaced by one space character. New-line characters are
+Each comment\iref{lex.comment} is replaced by one \unicode{0020}{space} character. New-line characters are
 retained. Whether each nonempty sequence of whitespace characters other
-than new-line is retained or replaced by one space character is
+than new-line is retained or replaced by one \unicode{0020}{space} character is
 unspecified.
 As characters from the source file are consumed
 to form the next preprocessing token
@@ -882,7 +882,8 @@
 \end{footnote}
 operators, and other separators.
 \indextext{whitespace}%
-Blanks, horizontal and vertical tabs, newlines, formfeeds, and comments
+Comments and the characters \unicode{0020}{space}, \unicode{0009}{character tabulation},
+\unicode{000b}{line tabulation}, \unicode{000c}{form feed}, and new-line
 (collectively, ``whitespace''), as described below,
 are ignored except as they serve to separate tokens.
 \begin{note}