This proposal provides a suggested concrete lexical syntax for comments.
Comments serve a variety of purposes in existing programming languages. The primary use cases are:
- Documentation: human-readable commentary explaining to users and future maintainers of an API what function it performs and how to use it. Such comments are typically attached to function declarations, class definitions, public member declarations, at file scope, and similar levels of granularity in an API.
/// A container for a collection of connected widgets.
class WidgetAssembly {
  /// Improve the appearance of the assembly if possible.
  void decorate(bool repaint_all = false);
  // ...
};
- Implementation comments: human-readable commentary explaining intent and mechanism to future readers or maintainers of code, or summarizing the behavior of code to avoid readers or maintainers needing to read it in detail. Such comments are typically used when such details may not be readily apparent from the code itself or may require non-trivial work to infer, and tend to be short.
void WidgetAssembly::decorate(bool repaint_all) {
  // ...
  // Paint all the widgets that have been changed since last time.
  for (auto &w : widgets) {
    if (repaint_all || w.modified > last_decorate)
      w.paint();
  }
  last_decorate = now();
  // ...
}
- Syntactic disambiguation comments: comments that contain code or pseudocode intended to allow the human reader to more easily parse the code in the same way that the compiler does.
void WidgetAssembly::decorate(bool repaint_all /*= false*/) {
  // ...
/*static*/ std::unique_ptr<WidgetAssembly> WidgetAssembly::make() {
  // ...
  assembly.decorate(/*repaint_all=*/true);
  // ...
} // end namespace WidgetLibrary
- Disabled code: comments that contain regions of code that have been disabled, because the code is incomplete or incorrect, or in order to isolate a problem while debugging, or as reference material for a change in progress. It is often considered bad practice to check such comments into version control.
In C++, there are three different ways in which comments are expressed in practice:
Single-line comments (and sometimes multiline comments) are expressed in C++ using `// ...`:
// The next line declares a variable.
int n; // This is a comment about 'n'.
(These are sometimes called "BCPL comments".)
- Can appear anywhere (at the start of a line or after tokens).
- Can contain any text (other than newline).
- End at the end of the logical line.
- Can be continued by ending the comment with `\` (or `??/` in C++14 and earlier).
- Unambiguous with non-comment syntax.
- "Nest" in that `//` within `//` has no effect.
- Do not nest with other kinds of comment.
This comment syntax is often used to express documentation (sometimes with a Doxygen-style `///` introducer) and implementation comments.
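The line-continuation rule listed above can be surprising in practice. The following is a minimal C++ sketch of that hazard; the function name `continued_comment_demo` is hypothetical and chosen only for illustration.

```cpp
// A line comment that ends in a backslash swallows the following line,
// because line splicing happens before comments are recognized.
inline int continued_comment_demo() {
  int n = 1;
  // Should the counter be bumped here? The backslash at the end matters: \
  n = 2;
  return n;
}
```

Here the assignment `n = 2;` is part of the comment, so the function returns 1. Many compilers warn about this pattern.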
Comments within lines (or sometimes multiline comments) are expressed in C++ using `/*...*/`:
f(/*size*/5, /*initial value*/1);
- Can appear anywhere (at the start of a line or after tokens).
- Can contain any text (other than `*/`).
- End at the `*/` delimiter (whose two characters might be separated by a `\` line continuation).
- Ambiguous with non-comment syntax: `int a=1, *b=&a, c=a/*b;` though this is not a problem in practice.
- Do not nest: the first `*/` ends the comment.
This comment syntax is often used to express syntactic disambiguation comments, and is sometimes used for disabled code. Some coding styles also use this comment style for longer documentation comments (sometimes with a Doxygen-style `/**` introducer).
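The ambiguity and non-nesting properties of block comments described above can be illustrated with a short C++ sketch. The helper names are hypothetical; each function returns a value that depends on where the compiler decides a comment begins and ends.

```cpp
// With a space, `/ *b` divides by the value that b points to.
inline int divide_with_space() {
  int a = 6, d = 3, *b = &d;
  return a / *b;
}

// Without the space, the slash-star pair opens a comment that runs to the
// first star-slash, so the expression below is just `a + 1`.
inline int divide_without_space() {
  int a = 6, d = 3, *b = &d;
  (void)b;  // b is unused once its dereference is swallowed by the comment
  return a/*b; this text is part of the comment */ + 1;
}

// Block comments do not nest: the first star-slash closes the comment,
// so the `+ 1` below is live code.
inline int first_close_wins() {
  return 1 /* outer /* inner */ + 1;
}
```

The first function returns 2, the second returns 7, and the third returns 2. A compiler may warn about the comment opener inside the comment in the last function.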
Blocks of code are often commented out in C++ programs using `#if 0`:
#if 0
int n;
#endif
- Can appear only at the start of a logical line.
- Can only contain sequences of preprocessing tokens (including invalid tokens such as `'`, but not including unterminated multiline string literals).
- End at the matching `#endif` delimiter.
- Unambiguous with any other syntax.
- Nest properly, and can have other kinds of comments nested within.
This syntax is generally only used for disabled code.
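As a small illustration of the nesting behavior listed above, the following C++ sketch disables a region that itself contains a block comment and a nested `#if 0` group. The function name `disabled_region_demo` is hypothetical.

```cpp
inline int disabled_region_demo() {
  int n = 0;
#if 0
  n += 1; /* a block comment inside a disabled region */
#if 0
  n += 10;
#endif
  n += 100;  // still disabled: the inner #endif closed only the inner group
#endif
  n += 1000;
  return n;
}
```

Only the final addition is compiled, so the function returns 1000.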
We provide only one kind of comment, which starts with `//` and runs to the end of the line. No code is permitted prior to a comment on the same line, and the `//` introducing the comment is required to be followed by whitespace.
This comment syntax is intended to support implementation comments and (experimentally) disabled code. The documentation use case is not covered, with the intent that a separate (non-comment) facility is explored for this use case. The syntactic disambiguation use case is not covered, with the intent that the language syntax is designed in a way that avoids this use case.
A comment is a lexical element beginning with the characters `//` and running to the end of the line. We have no mechanism for physical line continuation, so a trailing `\` does not extend a comment to subsequent lines.
Experimental: There can be no text other than horizontal whitespace before the `//` characters introducing a comment. Either all of a line is a comment, or none of it.
The character after the `//` is required to be a whitespace character. Newline is a whitespace character, so a line containing only `//` is a valid comment. The end of the file also constitutes whitespace.
All comments are removed prior to formation of tokens.
Example:
// This is a comment and is ignored. \
This is not a comment.
var Int: x; // error, trailing comments not allowed
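The rules above can be approximated by a per-line classifier. The following C++17 sketch is illustrative only and is not the Carbon toolchain's lexer. The names `LineKind` and `classify_line` are hypothetical. It assumes that only space and tab count as whitespace within a line, and that string literals are simple `"..."` sequences with backslash escapes, which this proposal does not specify.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical classification of one physical source line.
enum class LineKind {
  Comment,          // the whole line is a comment
  Reserved,         // starts with `//` not followed by whitespace
  TrailingComment,  // `//` appears after other text: rejected
  Other,            // no comment on this line
};

inline bool is_horizontal_space(char c) { return c == ' ' || c == '\t'; }

// `line` holds one line without its terminating newline. There is no line
// continuation, so each line is classified independently of its neighbors.
inline LineKind classify_line(std::string_view line) {
  std::size_t i = 0;
  while (i < line.size() && is_horizontal_space(line[i])) ++i;

  if (line.substr(i, 2) == "//") {
    // The end of the line (newline or end of file) counts as whitespace.
    if (i + 2 == line.size() || is_horizontal_space(line[i + 2]))
      return LineKind::Comment;
    return LineKind::Reserved;
  }

  // Assumption: skip over simple double-quoted string literals so that a
  // `//` inside a string is not mistaken for a comment introducer.
  bool in_string = false;
  for (; i < line.size(); ++i) {
    char c = line[i];
    if (in_string) {
      if (c == '\\') ++i;  // skip the escaped character
      else if (c == '"') in_string = false;
    } else if (c == '"') {
      in_string = true;
    } else if (line.substr(i, 2) == "//") {
      return LineKind::TrailingComment;
    }
  }
  return LineKind::Other;
}
```

Applied to the three lines of the example above, this sketch classifies the first as a comment despite its trailing `\`, the second as ordinary text, and the third as a rejected trailing comment.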
Experimental: No support for block comments is provided. Commenting out larger regions of human-readable text or code is accomplished by commenting out every line in the region.
There is little value in supporting block comments for the implementation comments use case. We expect such comments to typically be short, and in existing C++ codebases with long implementation comments, it is typical for line comments rather than block comments to be used. Therefore, as we consider the documentation use case to be out of scope, and intend for the syntactic disambiguation use case to be solved by language syntax, the sole purpose of block comments would be for disabled code. Block comments could provide more ergonomic support for intra-line disabled code and multiline blocks of disabled code.
Existing block comment syntaxes are not a great fit for the use case of disabling code. The `/* ... */` block comment syntax does not nest in C++, and cannot be used to reliably comment out a block of code because it can be terminated by a `*/` appearing in a `//` comment or in a string literal. The `#if 0 ... #endif` syntax would not be a good fit in Carbon, as we do not intend to have a preprocessor in general, and it requires the text in between to consist of a mostly-valid token sequence, disallowing certain forms of incomplete code.
We should be reluctant to invent something new: it is hard to justify the cost of introducing a novel syntax for the transient and rare use case of disabling code. And similarly, we should be reluctant to use existing syntax with novel semantics, such as a `/* ... */` comment that tokenizes its contents, to avoid surprise to C++ developers.
The disabled code use cases can be addressed with line comments, by commenting out each line in the intended region, and reflowing or duplicating lines when disabling code within a line. That may be cumbersome, but it's unclear whether that burden is sufficient to warrant introducing another form of comment into the language. By providing no such form of comments, we aim to discover if the resulting friction warrants a language addition.
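Commenting out every line of a region is mechanical enough that an editor or script can do it. The following C++ sketch shows one possible approach; the helper names `comment_out` and `uncomment` are hypothetical. Because each call adds or removes exactly one `// ` prefix per line, a region that already contains comments can be disabled and re-enabled without losing them.

```cpp
#include <string>
#include <vector>

// Prefix every line with a comment marker. Empty lines become a bare `//`,
// which is a valid comment because end of line counts as whitespace.
inline std::vector<std::string> comment_out(
    const std::vector<std::string>& lines) {
  std::vector<std::string> out;
  out.reserve(lines.size());
  for (const std::string& line : lines)
    out.push_back(line.empty() ? std::string("//") : "// " + line);
  return out;
}

// Remove one level of comment marker from every line that has one.
inline std::vector<std::string> uncomment(
    const std::vector<std::string>& lines) {
  std::vector<std::string> out;
  out.reserve(lines.size());
  for (const std::string& line : lines) {
    if (line == "//")
      out.push_back("");
    else if (line.rfind("// ", 0) == 0)  // line starts with "// "
      out.push_back(line.substr(3));
    else
      out.push_back(line);  // not a commented-out line; leave untouched
  }
  return out;
}
```

Calling `uncomment(comment_out(region))` gives back the original region, including any comment lines it contained.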
Comments in which the `//` characters are not followed by whitespace are reserved for future extension. Anticipated possible extensions are block comments, documentation comments, and code folding region markers.
We anticipate the possibility of adding additional kinds of comment in the future. Reserving syntactic space in comment syntax, in a way that is easy for programs to avoid, allows us to add such additional kinds of comment as a non-breaking change.
We could include a feature similar to C-style block comments, as a way to provide comments that attach to some element of the program smaller than a line. In C++ code, such comments are frequently used to annotate function parameter names and similar syntactic disambiguation use cases:
render(/*use_world_coords=*/true, /*draw_frame=*/false);
We expect these use cases to be addressed by extensions to Carbon's grammar, such as by adding named parameters or annotation syntax, to allow such utterances to be expressed as code rather than as comments, so they are meaningful to both the Carbon programmer and the Carbon language tools.
We could permit trailing comments on a line that contains other content. Such comments are most frequently used in our sample C++ corpus to describe the meaning of an entity, label, or close brace on the same line:
namespace N {
int n; // number of hats
enum Mode {
mode1, // first mode
mode2 // second mode
};
} // end namespace N
In all cases but the last, we expect it to be reasonable to move the comment to before the declaration. The case of the "end namespace" comment is another instance of the syntactic disambiguation use case, which we expect to be addressed by grammar changes. In general, we should avoid any syntax that would need disambiguation comments, either by promoting those comments to the language grammar or by altering the syntax until the comment is unnecessary, such as by not providing a delimited scope syntax for describing the contents of large scopes such as namespaces and packages. For example:
// This declares the namespace N but does not open a scope.
namespace N;
// This declares a member of namespace N.
@"Number of hats."
var Int: N.n;
enum N.Mode {
@"First mode."
mode1;
@"Second mode."
mode2;
}
Intra-line comments present a challenge for code formatting tools, which would need to understand what part of the program syntax the comment "attaches to" in order to properly reflow the comment with the code. This concern is mitigated, but not fully eliminated, by requiring comments to always be on their own line. We could restrict text comments to appear in only certain syntactic locations to fully resolve this concern, but doing so would remove the flexibility to insert comments in arbitrary places:
match (x) {
case .Foo(1, 2,
// This might be 3 or 4 depending on the size of the Foo.
Int: n) => { ... }
}
We could allow intra-line comments and still retain some idea of what the comment syntactically attaches to by using a directionality marker in the comment:
match (x) {
case .Foo(1, 2, //> either 3 or 4 >// Int: n) => { ... }
case .Foo(2, Int: n //< either 3 or 4 <//, 5) => { ... }
}
Even with an understanding of how comments attach, line wrapping such comments is a complex challenge. For example, formatting in a situation with aligned trailing comments across multiple lines requires special handling:
var Int: quality = 3; // The quality of the widget. It should always
// be between 1 and 9.
var Int: blueness = 72; // The blueness of the widget, as a percentage.
Here, a tool that renames `blueness` to `blue_percent` may need to reflow the comment following `quality` as well as the comment following `blueness`.
Moreover, if the last line becomes too long, keeping the comment on the same
line as the variable may become untenable, requiring a more substantive
rewriting:
// The blueness of the widget, as a percentage.
var Int: blue_percent = Floor(ComputeBluenessRatio() * 100);
The decision to not support trailing and intra-line comments is experimental and should be revisited if we find there is a need for such comments in the context of the complete language design.
No support is provided for multi-line text comments. Instead, the intent is that such comments are expressed by prepending each line with the same `//` comment marker.
Requiring each line to repeat the comment marker improves readability by removing a source of non-local state, and removes a needless source of stylistic variability. The resulting style of comment is common in other languages and well-supported by editors. Even in C and C++ code that uses `/* ... */` to comment out a block of human-readable text, it is common to include a `*` at the start of each comment continuation line.
We considered various different options for block comments. Our primary goal was to permit commenting out a large body of Carbon code, which may or may not be well-formed (including code that contains a block comment, meaning that such comments would need to nest). Alternatives considered included:
- Fully line-oriented block comments, which would remove lines without regard for whether they are nested within a string literal, with the novel feature of allowing some of the contents of a block string literal to be commented out. This alternative has the disadvantage that it would result in surprising behavior inside string literals containing Carbon code.
- Fully lexed block comments, in which a token sequence between the opening and closing comment marker is produced and discarded, with the lexing rules relaxed somewhat to avoid rejecting ill-formed code. This would be analogous to C and C++'s `#if 0` ... `#endif`. This alternative has the disadvantage that it would be unable to cope with incomplete code fragments, such as an unterminated block string literal. It would also be somewhat inefficient to process compared to non-lexing syntaxes, but that's likely to be largely irrelevant given that block comments are expected to be transient.
- A hybrid approach, with `//\{` and `//\}` delimiters that are invalid in non-raw string literals, and with an indentation requirement for raw string literals only. This alternative has the disadvantage of introducing additional complexity into the lexical rules by treating different kinds of string literals differently.
- Use of `/*` and `*/` as comment markers. This alternative has the disadvantage that it risks confusion by using similar syntax to C and C++ but with divergent semantics.
However, given the limited use cases for such comments and a desire to minimize our inventiveness, we are not pursuing any of these options in this proposal.
We could add a distinct comment syntax for documentation comments, perhaps treating documentation comments as producing real tokens rather than being stripped out by the lexer. However, during discussion, there was significant support for using a syntax that does not resemble a comment for representing documentation. For example, we could introduce an attribute syntax, such as using `@ <expression>` as a prefix to a declaration to attach attributes. Then a string literal attribute can be treated as documentation:
@"Get the size of the thing."
fn GetThingSize() -> Int;
@"""
Rate the quality of the widget.
Returns a quality factor between 0.0 and 1.0.
"""
fn RateQuality(
@"The widget to rate."
Widget: w,
@"A widget quality database."
QualityDB: db) -> Float;
This use case will be explored by a future proposal.
Some code editors are able to "fold" regions of a source file in order to ease navigation. In some cases, these fold regions can be customized by the use of comment lines. For example, in VS Code, this is accomplished with comments containing `#region` and `#endregion`:
// #region Functions F and G
fn f() { ... }
fn g() { ... }
// #endregion
Supporting such markers as normal text within line comments requires no additional effort. However, we could consider introducing a specific Carbon syntax for region comments, in order to encourage a common representation across code editors. Such support is not covered by this proposal, but could be handled by a new form of comment.
- Some comment syntax is necessary to support software evolution, readable and understandable code, and many other goals of Carbon.
- A single, simple, and consistent comment style supports Carbon's goals of code that is easy to read and understand, and of fast development tools.
- The experiment of restricting comments to be the only non-whitespace text on a line supports Carbon's goal of software evolution.
- The careful open lexical space left supports Carbon's goal of language evolution.
- The use of `//` as the primary syntax marking comments supports interoperability with C++-trained programmers and codebases by avoiding unnecessary and unhelpful churn of comment syntax.