STATUS: Up-to-date on 09-Aug-2022, including proposals up through #1327.
- Introduction
- Code and comments
- Build modes
- Types are values
- Primitive types
- Values, objects, and expressions
- Composite types
- Expressions
- Declarations, Definitions, and Scopes
- Patterns
- Name-binding declarations
- Functions
- User-defined types
- Names
- Generics
- Bidirectional interoperability with C and C++
- Unfinished tales
This documentation describes the design of the Carbon language, and the rationale for that design. This documentation is an overview of the Carbon project in its current state, written for the builders of Carbon and for those interested in learning more about Carbon.
This document is not a complete programming manual, nor does it provide detailed and comprehensive justification for design decisions. These descriptions are found in linked dedicated designs.
This document includes much that is provisional or placeholder. This means that the syntax used, language rules, standard library, and other aspects of the design have things that have not been decided through the Carbon process. This preliminary material fills in gaps until aspects of the design can be filled in. Features that are provisional have been marked as such on a best-effort basis.
Here is a simple function showing some Carbon code:
import Math;
// Returns the smallest factor of `n` > 1, and
// whether `n` itself is prime.
fn SmallestFactor(n: i32) -> (i32, bool) {
  let limit: i32 = Math.Sqrt(n) as i32;
  var i: i32 = 2;
  while (i <= limit) {
    let remainder: i32 = n % i;
    if (remainder == 0) {
      Carbon.Print("{0} is a factor of {1}", i, n);
      return (i, false);
    }
    if (i == 2) {
      i = 3;
    } else {
      // Skip even numbers once we get past `2`.
      i += 2;
    }
  }
  return (n, true);
}
Carbon is a language that should feel familiar to C++ and C developers. This example has familiar constructs like imports, comments, function definitions, typed arguments, and expressions. Statements and declarations are terminated with a `;` or something in curly braces `{`...`}`.
A few other features that are unlike C or C++ may stand out. First, declarations start with introducer keywords. `fn` introduces a function declaration, and `var` introduces a variable declaration.
The example starts with an `import` declaration. Carbon imports are more like C++ modules than textual inclusion during preprocessing using `#include`.
The `import` declaration imports a library from a package. It must appear at the top of a Carbon source file, the first thing after the optional `package` declaration. Libraries can optionally be split into api and implementation files, like C++'s header and source files but without requiring a source file in any case.
This declaration from the example:
import Math;
imports the default library from package `Math`. The names from this library are accessible as members of `Math`, like `Math.Sqrt`. The `Carbon.Print` function comes from the `Carbon` package's prelude library, which is imported by default. Unlike C++, the namespaces of different packages are kept separate, so there are no name conflicts.
Carbon comments must be on a line by themselves starting with `//`:
// Returns the smallest factor of `n` > 1, and
// whether `n` itself is prime.
...
// Skip even numbers once we get past `2`.
A function definition consists of:
- the `fn` keyword introducer,
- the function's name,
- a parameter list in round parens `(`...`)`,
- an optional `->` and return type, and
- a body inside curly braces `{`...`}`.
fn SmallestFactor(n: i32) -> (i32, bool) {
  ...
  return (i, false);
  ...
  return (n, true);
}
The body of the function is an ordered sequence of statements and declarations. Function execution ends when it reaches a `return` statement or the end of the function body. `return` statements can also specify an expression whose value is returned.
Here `i32` refers to a signed integer type, with 32 bits, and `bool` is the boolean type. Carbon also has floating-point types like `f32` and `f64`, and string types.
A variable declaration has three parts:
- the `var` keyword introducer,
- the name followed by a `:` and a type, declared the same way as a parameter in a function signature, and
- an optional initializer.
var i: i32 = 2;
You can modify the value of a variable with an assignment statement:
i = 3;
...
++i;
...
i += 2;
Constants are declared with the `let` keyword introducer. The syntax parallels variable declarations except the initializer is required:
let limit: i32 = Math.Sqrt(n) as i32;
...
let remainder: i32 = n % i;
The initializer `Math.Sqrt(n) as i32` is an expression. It first calls the `Math.Sqrt` function with `n` as the argument. Then, the `as` operator casts the floating-point return value to `i32`. Lossy conversions like that must be done explicitly.
Other expressions include `n % i`, which applies the binary `%` modulo operator with `n` and `i` as arguments, and `remainder == 0`, which applies the `==` comparison operator producing a `bool` result. Expression return values are ignored when expressions are used as statements, as in this call to the `Carbon.Print` function:
Carbon.Print("{0} is a factor of {1}", i, n);
Function calls consist of the name of the function followed by the comma-separated argument list in round parentheses `(`...`)`.
Control flow statements, including `if`, `while`, `for`, `break`, and `continue`, change the order that statements are executed, as they do in C++:
while (i <= limit) {
  ...
  if (remainder == 0) {
    ...
  }
  if (i == 2) {
    ...
  } else {
    ...
  }
}
Every code block in curly braces `{`...`}` defines a scope. Names are visible from their declaration until the end of the innermost scope containing it. So `remainder` in the example is visible until the curly brace `}` that closes the `while`.
The example function uses a tuple, a composite type, to return multiple values. Both tuple values and types are written using a comma-separated list inside parentheses. So `(i, false)` and `(n, true)` are tuple values, and `(i32, bool)` is their type.
Struct types are similar, except their members are referenced by name instead of position. The example could be changed to use structs instead as follows:
// Return type of `{.factor: i32, .prime: bool}` is a struct
// with an `i32` field named `.factor`, and a `bool` field
// named `.prime`.
fn SmallestFactor(n: i32) -> {.factor: i32, .prime: bool} {
  ...
  if (remainder == 0) {
    // Return a struct value.
    return {.factor = i, .prime = false};
  }
  ...
  // Return a struct value.
  return {.factor = n, .prime = true};
}
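As a sketch of the calling side, a caller can read the fields of the returned struct by name instead of by position. `ReportFactor` is a hypothetical function, assumed to live in the same file as the struct-returning `SmallestFactor` above.

```carbon
// Hypothetical caller of the struct-returning `SmallestFactor`.
fn ReportFactor(n: i32) {
  let result: {.factor: i32, .prime: bool} = SmallestFactor(n);
  // Fields are accessed by name rather than by position.
  if (result.prime) {
    Carbon.Print("{0} is prime", n);
  } else {
    Carbon.Print("{0} is the smallest factor of {1}", result.factor, n);
  }
}
```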
All source code is UTF-8 encoded text. Comments, identifiers, and strings are allowed to have non-ASCII characters.
var résultat: String = "Succès";
Comments start with two slashes `//` and go to the end of the line. They are required to be the only non-whitespace on the line.
// Compute an approximation of π
References:
- Source files
- Lexical conventions
- Proposal #142: Unicode source files
- Proposal #198: Comments
The behavior of the Carbon compiler depends on the build mode:
- In a development build, the priority is diagnosing problems and fast build time.
- In a performance build, the priority is fastest execution time and lowest memory usage.
- In a hardened build, the first priority is safety and second is performance.
References: Safety strategy
Expressions compute values in Carbon, and these values are always strongly typed much like in C++. However, an important difference from C++ is that types are themselves modeled as values; specifically, compile-time-constant values of type `type`. This has a number of consequences:
- Names for types are in the same namespace shared with functions, variables, namespaces, and so on.
- The grammar for writing a type is the expression grammar, not a separate grammar for types. As a result, Carbon doesn't use angle brackets `<`...`>` in types, since `<` and `>` are used for comparison in expressions.
- Function call syntax is used to specify parameters to a type, like `HashMap(String, i64)`.
References:
- Proposal #2360: Types are values of type `type`
A value used in a type position, like after a `:` in a variable declaration or the return type after a `->` in a function declaration, must:
- be a compile-time constant, so the compiler can evaluate it at compile time, and
- have a defined implicit conversion to type `type`.
The actual type used is the result of the conversion to type `type`. Of course this includes values that already are of type `type`, but it also allows some non-type values to be used in a type position.
For example, the value `(bool, bool)` represents a tuple of types, but is not itself a type, since it doesn't have type `type`. It does have a defined implicit conversion to type `type`, which results in the value `(bool, bool) as type`. This means `(bool, bool)` may be used in a type position. `(bool, bool) as type` is the type of the value `(true, false)` (among others), so this code is legal:
var b: (bool, bool) = (true, false);
There is some need to be careful here, since the declaration makes it look like the type of `b` is `(bool, bool)`, when in fact it is `(bool, bool) as type`.
`(bool, bool) as type` and `(bool, bool)` are different values since they have different types: the first has type `type`, and the second has type `(type, type) as type`.
In addition to the types of tuples, this also comes up with struct types and facets.
Primitive types fall into the following categories:
- the boolean type `bool`,
- signed and unsigned integer types,
- IEEE-754 floating-point types, and
- string types.
These are made available through the prelude.
The type `bool` is a boolean type with two possible values: `true` and `false`.
The names `bool`, `true`, and `false` are keywords.
Comparison expressions produce `bool` values. The condition arguments in control-flow statements, like `if` and `while`, and `if`-`then`-`else` conditional expressions take `bool` values.
References:
- Question-for-leads issue #750: Naming conventions for Carbon-provided features
- Proposal #861: Naming conventions
The signed-integer type with bit width `N` may be written `iN`, as long as `N` is a positive multiple of 8. For example, `i32` is a signed 32-bit integer.
Signed-integer overflow is a programming error:
- In a development build, overflow will be caught immediately when it happens at runtime.
- In a performance build, the optimizer can assume that such conditions don't occur. As a consequence, if they do, the behavior of the program is not defined.
- In a hardened build, overflow does not result in undefined behavior. Instead, either the program will be aborted, or the arithmetic will evaluate to a mathematically incorrect result, such as a two's complement result or zero.
The unsigned-integer types may be written `uN`, with `N` a positive multiple of 8. Unsigned integer types wrap around on overflow; we strongly advise that they are not used except when those semantics are desired. These types are intended for bit manipulation or modular arithmetic as often found in hashing, cryptography, and PRNG use cases.
Values which can never be negative, like sizes, but for which wrapping does not make sense, should use signed integer types.
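To illustrate a case where wrapping is the desired behavior, here is a minimal sketch of one step of the 32-bit FNV-1a hash, which relies on multiplication modulo 2^32. The function name `Fnv1aStep` is hypothetical; `16777619` is the standard 32-bit FNV prime.

```carbon
// Hypothetical helper: mixes one byte into a 32-bit FNV-1a hash.
// The multiplication is expected to overflow; `u32` wraps around,
// which is exactly the modular arithmetic the hash relies on.
fn Fnv1aStep(hash: u32, byte: u8) -> u32 {
  return (hash ^ (byte as u32)) * 16777619;
}
```

Starting from the 32-bit FNV offset basis `2166136261` and applying this step to each byte yields the FNV-1a hash. Writing the same code with `i32` would turn the intended overflow into a programming error.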
Identifiers of the form `iN` and `uN` are type literals, resulting in the corresponding type.
Not all operations will be supported for all bit sizes. For example, division may be limited to integers of at most 128 bits due to LLVM limitations.
Open question: Bit-field (1, 2) support will need some way to talk about non-multiple-of-eight-bit integers, even though Carbon will likely not support pointers to those types.
References:
- Numeric type literal expressions
- Question-for-leads issue #543: pick names for fixed-size integer types
- Question-for-leads issue #750: Naming conventions for Carbon-provided features
- Proposal #861: Naming conventions
- Proposal #1083: Arithmetic expressions
- Proposal #2015: Numeric type literal syntax
Integers may be written in decimal, hexadecimal, or binary:
- `12345` (decimal)
- `0x1FE` (hexadecimal)
- `0b1010` (binary)
Underscores (`_`) may be used as digit separators. Numeric literals are case-sensitive: `0x`, `0b` must be lowercase, whereas hexadecimal digits must be uppercase. Integer literals never contain a `.`.
Unlike in C++, literals do not have a suffix to indicate their type. Instead, numeric literals have a type derived from their value, and can be implicitly converted to any type that can represent that value.
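A short sketch of the integer literal forms described above. The function and variable names are arbitrary; each literal is implicitly converted to the declared type because its value fits in that type.

```carbon
fn LiteralExamples() {
  // Decimal, with `_` as a digit separator.
  var million: i32 = 1_000_000;
  // Hexadecimal: lowercase `0x` prefix, uppercase digits.
  var mask: u32 = 0xFF_FF;
  // Binary: 0b1010_1010 is 170, which fits in `u8`.
  var bits: u8 = 0b1010_1010;
  // No suffix is needed: the literal's value determines what it converts to.
  var wide: i64 = 42;
}
```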
References:
Floating-point types in Carbon have IEEE-754 semantics, use the round-to-nearest rounding mode, and do not set any floating-point exception state. They are named with type literals, consisting of `f` and the number of bits, which must be a multiple of 8. These types will always be available: `f16`, `f32`, and `f64`. Other sizes may be available, depending on the platform, such as `f80`, `f128`, or `f256`.
Carbon also supports the `BFloat16` format, a 16-bit truncation of a "binary32" IEEE-754 format floating point number.
References:
- Numeric type literal expressions
- Question-for-leads issue #543: pick names for fixed-size integer types
- Question-for-leads issue #750: Naming conventions for Carbon-provided features
- Proposal #861: Naming conventions
- Proposal #1083: Arithmetic expressions
- Proposal #2015: Numeric type literal syntax
Floating-point types along with user-defined types may be initialized from real-number literals. Decimal and hexadecimal real-number literals are supported:
- `123.456` (digits on both sides of the `.`)
- `123.456e789` (optional `+` or `-` after the `e`)
- `0x1.Ap123` (optional `+` or `-` after the `p`)
As with integer literals, underscores (`_`) may be used as digit separators.
Real-number literals always have a period (`.`) and a digit on each side of the period. When a real-number literal is interpreted as a value of a floating-point type, its value is the representable real number closest to the value of the literal. In the case of a tie, the nearest value whose mantissa is even is selected.
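A sketch of the real-number literal forms above, with arbitrary variable names. The hexadecimal case is worked out in a comment: `0x1.8` is 1.5 in hexadecimal, and `p1` multiplies it by 2^1.

```carbon
fn RealLiteralExamples() {
  var a: f64 = 123.456;
  // Exponent with an explicit sign; there is a digit on each side of `.`.
  var b: f64 = 1.5e-3;
  // Hexadecimal: 0x1.8 is 1.5, and `p1` scales it by 2^1, giving 3.0.
  var c: f32 = 0x1.8p1;
  // Written as `1.0e6`: a form like `1e6` has no period, so it is not a
  // real-number literal.
  var d: f64 = 1.0e6;
}
```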
References:
Note: This is provisional, no design for string types has been through the proposal process yet.
There are two string types:
- `String` - a byte sequence treated as containing UTF-8 encoded text.
- `StringView` - a read-only reference to a byte sequence treated as containing UTF-8 encoded text.
There is an implicit conversion from `String` to `StringView`.
References:
- Question-for-leads issue #750: Naming conventions for Carbon-provided features
- Proposal #820: Implicit conversions
- Proposal #861: Naming conventions
String literals may be written on a single line using a double quotation mark (`"`) at the beginning and end of the string, as in `"example"`.
Multi-line string literals, called block string literals, begin and end with three single quotation marks (`'''`), and may have a file type indicator after the first `'''`.
// Block string literal:
var block: String = '''
The winds grow high; so do your stomachs, lords.
How irksome is this music to my heart!
When such strings jar, what hope of harmony?
I pray, my lords, let me compound this strife.
-- History of Henry VI, Part II, Act II, Scene 1, W. Shakespeare
''';
The indentation of a block string literal's terminating line is removed from all preceding lines.
Strings may contain escape sequences introduced with a backslash (`\`). Raw string literals are available for representing strings with `\`s and `"`s.
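A sketch of an escaped string and a raw string. It assumes the `#"`...`"#` raw-string delimiters from Proposal #199, which this overview does not spell out, and uses the provisional `String` type; the variable names are arbitrary.

```carbon
fn StringExamples() {
  // `\t` and `\n` are escape sequences for a tab and a newline.
  var escaped: String = "Name:\tCarbon\n";
  // In a raw string, `\` and `"` are ordinary characters; the literal
  // only ends at `"#`. The contents are: C:\path "quoted" here
  var raw: String = #"C:\path "quoted" here"#;
}
```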
References:
- String literals
- Proposal #199: String literals
Carbon has both abstract values and concrete objects. Carbon values are things like `42`, `true`, and `i32` (a type value). Carbon objects have storage where values can be read and written. Storage also allows taking the address of an object in memory in Carbon.
References:
A Carbon expression produces a value, references an object, or initializes an object. Every expression has a category, similar to C++:
- Value expressions produce abstract, read-only values that cannot be modified or have their address taken.
- Reference expressions refer to objects with storage where a value may be read or written and the object's address can be taken.
- Initializing expressions require storage to be provided implicitly when evaluating the expression. The expression then initializes an object in that storage. These are used to model function returns, which can construct the returned value directly in the caller's storage.
Expressions in one category can be converted to any other category when needed. The primitive conversion steps used are:
- Value binding converts a reference expression into a value expression.
- Direct initialization converts a value expression into an initializing expression.
- Copy initialization converts a reference expression into an initializing expression.
- Temporary materialization converts an initializing expression into a reference expression.
References:
Value expressions are further broken down into three expression phases:
- A template constant has a value known at compile time, and that value is available during type checking, for example to use as the size of an array. These include literals (integer, floating-point, string), concrete type values (like `f64` or `Optional(i32*)`), expressions in terms of constants, and values of `template` parameters.
- A symbolic constant has a value that will be known at the code generation stage of compilation when monomorphization happens, but is not known during type checking. This includes checked-generic parameters, and type expressions with checked-generic arguments, like `Optional(T*)`.
- A runtime value has a dynamic value only known at runtime.
Template constants and symbolic constants are collectively called compile-time constants and correspond to declarations using `:!`.
Carbon will automatically convert a template constant to a symbolic constant, or any value to a runtime value:
graph TD;
A(template constant)-->B(symbolic constant)-->C(runtime value);
D(reference expression)-->C;
Template constants convert to symbolic constants and to runtime values. Symbolic constants will generally convert into runtime values if an operation that inspects the value is performed on them. Runtime values will convert into template or symbolic constants if constant evaluation of the runtime expression succeeds.
Note: Conversion of runtime values to other phases is provisional.
References:
- Proposal #2200: Template generics
- Proposal #2964: Expression phase terminology
- Proposal #3162: Reduce ambiguity in terminology
A tuple is a fixed-size collection of values that can have different types, where each value is identified by its position in the tuple. An example use of tuples is to return multiple values from a function:
fn DoubleBoth(x: i32, y: i32) -> (i32, i32) {
  return (2 * x, 2 * y);
}
Breaking this example apart:
- The return type is a tuple of two `i32` types.
- The expression uses tuple syntax to build a tuple of two `i32` values.
Both of these are expressions using the tuple syntax `(<expression>, <expression>)`. The only difference is the type of the tuple expression: one is a tuple of types, the other a tuple of values. In other words, a tuple type is a tuple of types.
The components of a tuple are accessed positionally, so element access uses subscript syntax, but the index must be a compile-time constant:
fn DoubleTuple(x: (i32, i32)) -> (i32, i32) {
  return (2 * x[0], 2 * x[1]);
}
Tuple types are structural.
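Building on `DoubleBoth` above, a caller can store the returned tuple and read its elements with constant indices. `SumOfDoubles` is a hypothetical name used only for illustration.

```carbon
// Hypothetical caller that combines the two components of a tuple result.
fn SumOfDoubles(a: i32, b: i32) -> i32 {
  let doubled: (i32, i32) = DoubleBoth(a, b);
  // Tuple indices must be compile-time constants.
  return doubled[0] + doubled[1];
}
```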
Note: This is provisional, no design for tuples has been through the proposal process yet. Many of these questions were discussed in dropped proposal #111.
References: Tuples
Carbon also has structural types whose members are identified by name instead of position. These are called structural data classes, also known as struct types or structs.
Both struct types and values are written inside curly braces (`{`...`}`). In both cases, they have a comma-separated list of members that start with a period (`.`) followed by the field name.
- In a struct type, the field name is followed by a colon (`:`) and the type, as in: `{.name: String, .count: i32}`.
- In a struct value, called a structural data class literal or a struct literal, the field name is followed by an equal sign (`=`) and the value, as in `{.key = "Joe", .count = 3}`.
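A minimal sketch combining the two forms: a variable whose type is a struct type, initialized from a struct literal with matching field names. The function name `StructExample` is hypothetical, and the provisional `String` type is assumed.

```carbon
fn StructExample() -> i32 {
  // The struct literal's field names match those of the struct type.
  var entry: {.name: String, .count: i32} = {.name = "Joe", .count = 3};
  // Fields are read and written by name.
  entry.count += 1;
  return entry.count;
}
```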
References:
The type of pointers-to-values-of-type-`T` is written `T*`. Carbon pointers do not support pointer arithmetic; the only pointer operations are:
- Dereference: given a pointer `p`, `*p` gives the value `p` points to as a reference expression. `p->m` is syntactic sugar for `(*p).m`.
- Address-of: given a reference expression `x`, `&x` returns a pointer to `x`.
There are no null pointers in Carbon. To represent a pointer that may not refer to a valid object, use the type `Optional(T*)`.
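A sketch of the two pointer operations, using hypothetical functions `Reset` and `PointerExample`. Taking `&entry` works because a `var` binding is a reference expression.

```carbon
// Hypothetical helper: clears a counter through a pointer.
fn Reset(p: {.name: String, .count: i32}*) {
  // `p->count` is syntactic sugar for `(*p).count`.
  p->count = 0;
}

fn PointerExample() -> i32 {
  var entry: {.name: String, .count: i32} = {.name = "Joe", .count = 3};
  // `entry` is a `var`, so it is a reference expression and `&entry` is valid.
  let p: {.name: String, .count: i32}* = &entry;
  Reset(p);
  // Dereferencing yields a reference expression for `entry`.
  return (*p).count;
}
```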
Future work: Perhaps Carbon will have stricter pointer provenance or restrictions on casts between pointers and integers.
References:
The type of an array holding 4 `i32` values is written `[i32; 4]`. There is an implicit conversion from tuples to arrays of the same length as long as every component of the tuple may be implicitly converted to the destination element type. In cases where the size of the array may be deduced, it may be omitted, as in:
var i: i32 = 1;
// `[i32;]` equivalent to `[i32; 3]` here.
var a: [i32;] = (i, i, i);
Elements of an array may be accessed using square brackets (`[`...`]`), as in `a[i]`:
a[i] = 2;
Carbon.Print(a[0]);
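A sketch of iterating over an array with a runtime index, assuming the provisional array syntax above. The function `Sum` is hypothetical.

```carbon
// Hypothetical helper: sums a fixed-size array of three elements.
fn Sum(a: [i32; 3]) -> i32 {
  var total: i32 = 0;
  var i: i32 = 0;
  while (i < 3) {
    total += a[i];
    ++i;
  }
  return total;
}
```

A call like `Sum((1, 2, 3))` would rely on the implicit tuple-to-array conversion described above.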
TODO: Slices
Note: This is provisional, no design for arrays has been through the proposal process yet.
Expressions describe some computed value. The simplest example would be a literal number like `42`: an expression that computes the integer value 42.
Some common expressions in Carbon include:
- Literals
- Names and member access
- Operators:
  - Arithmetic: `-x`, `1 + 2`, `3 - 4`, `2 * 5`, `6 / 3`, `5 % 3`
  - Bitwise: `2 & 3`, `2 | 4`, `3 ^ 1`, `^7`
  - Bit shift: `1 << 3`, `8 >> 1`
  - Comparison: `2 == 2`, `3 != 4`, `5 < 6`, `7 > 6`, `8 <= 8`, `8 >= 8`
  - Conversion: `2 as i32`
  - Logical: `a and b`, `c or d`, `not e`
  - Indexing: `a[3]`
  - Function call: `f(4)`
  - Pointer: `*p`, `p->m`, `&x`
  - Move: `~x`
- Conditionals: `if c then t else f`
- Parentheses: `(7 + 8) * (3 - 1)`
When an expression appears in a context in which an expression of a specific type is expected, implicit conversions are applied to convert the expression to the target type.
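A sketch combining a few of these expressions in hypothetical helpers `Abs` and `InRange`. The parentheses around the comparisons are added for readability.

```carbon
fn Abs(x: i32) -> i32 {
  // A conditional expression: both branches have type `i32`.
  return if x < 0 then -x else x;
}

fn InRange(x: i32, lo: i32, hi: i32) -> bool {
  // Comparisons produce `bool` values, which `and` combines.
  return (x >= lo) and (x <= hi);
}
```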
References:
- Expressions
- Proposal #162: Basic Syntax
- Proposal #555: Operator precedence
- Proposal #601: Operator tokens
- Proposal #680: And, or, not
- Proposal #702: Comparison operators
- Proposal #845: as expressions
- Proposal #911: Conditional expressions
- Proposal #1083: Arithmetic expressions
- Proposal #2006: Values, variables, pointers, and references
Declarations introduce a new name and say what that name represents.
For some kinds of entities, like functions, there are two kinds of declarations: forward declarations and definitions. For those entities, there should be exactly one definition for the name, and at most one additional forward declaration that introduces the name before it is defined, plus any number of declarations in a `match_first` block. Forward declarations can be used to separate interface from implementation, such as to declare a name in an api file that is defined in an impl file. Forward declarations also allow entities to be used before they are defined, such as to allow cyclic references. A name that has been declared but not defined is called incomplete, and in some cases there are limitations on what can be done with an incomplete name. Within a definition, the defined name is incomplete until the end of the definition is reached, but is complete in the bodies of member functions because they are parsed as if they appeared after the definition.
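As an illustration of the cyclic-reference case, two hypothetical mutually recursive functions can be written by forward-declaring one of them. This sketch assumes non-negative inputs.

```carbon
// Forward declaration: `IsOdd` can be called before it is defined.
fn IsOdd(n: i32) -> bool;

// Assumes `n >= 0`.
fn IsEven(n: i32) -> bool {
  return if n == 0 then true else IsOdd(n - 1);
}

// The single definition matching the forward declaration above.
fn IsOdd(n: i32) -> bool {
  return if n == 0 then false else IsEven(n - 1);
}
```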
A name is valid until the end of the innermost enclosing scope. There are a few kinds of scopes:
- the outermost scope, which includes the whole file,
- scopes that are enclosed in curly braces (`{`...`}`), and
- scopes that encompass a single declaration.
For example, the names of the parameters of a function or class are valid until the end of the declaration. The name of the function or class itself is visible until the end of the enclosing scope.
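A small sketch of these rules, with a hypothetical function `ScopeExample`: a name declared inside a braced block stops being visible at that block's closing brace.

```carbon
fn ScopeExample(flag: bool) -> i32 {
  // `flag` stays in scope until the end of this definition.
  var total: i32 = 0;
  if (flag) {
    // `bonus` is visible only until the `}` that closes this block.
    let bonus: i32 = 5;
    total += bonus;
  }
  // Naming `bonus` here would be an error; `total` is still in scope.
  return total;
}
```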
References:
Note: This is provisional, no design for patterns has been through the proposal process yet.
A pattern says how to receive some data that is being matched against. There are two kinds of patterns:
- Refutable patterns can fail to match based on the runtime value being matched.
- Irrefutable patterns are guaranteed to match, so long as the code type-checks.
In the introduction, function parameters, variable `var` declarations, and constant `let` declarations use a "name `:` type" construction. That construction is an example of an irrefutable pattern, and in fact any irrefutable pattern may be used in those positions. `match` statements can include both refutable patterns and irrefutable patterns.
References:
- Pattern matching
- Proposal #162: Basic Syntax
- Proposal #2188: Pattern matching syntax and semantics
The most common irrefutable pattern is a binding pattern, consisting of a new name, a colon (`:`), and a type. It binds the matched value of that type to that name. It can only match values that may be implicitly converted to that type. An underscore (`_`) may be used instead of the name to match a value but without binding any name to it.
Binding patterns default to `let` bindings. The `var` keyword is used to make it a `var` binding.
- A `let` binding binds a name to a value, so the name can be used as a value expression. This means the value cannot be modified, and its address generally cannot be taken.
- A `var` binding creates an object with dedicated storage, and so the name can be used as a reference expression which can be modified and has a stable address.
A `let`-binding may be implemented as an alias for the original value (like a `const` reference in C++), or it may be copied from the original value (if it is copyable), or it may be moved from the original value (if it was a temporary). The Carbon implementation's choice among these options may be indirectly observable, for example through side effects of the destructor, copy, and move operations, but the program's correctness must not depend on which option the Carbon implementation chooses.
A compile-time binding uses `:!` instead of a colon (`:`) and can only match compile-time constants, not run-time values. A `template` keyword before the binding selects a template binding instead of a symbolic binding.
The keyword `auto` may be used in place of the type in a binding pattern, as long as the type can be deduced from the type of a value in the same declaration.
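A sketch contrasting the binding kinds above inside a hypothetical function `BindingExample`.

```carbon
fn BindingExample() -> i64 {
  // `let` binding: a value expression, so `a` cannot be assigned to.
  let a: i64 = 1;
  // `var` binding: an object with storage, so `b` can be modified.
  var b: i64 = a;
  b += 1;
  // `auto` deduces the type `i64` from the initializer `b * 2`.
  let c: auto = b * 2;
  return c;
}
```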
References:
There are also irrefutable destructuring patterns, such as tuple destructuring. A tuple destructuring pattern looks like a tuple of patterns. It may only be used to match tuple values whose components match the component patterns of the tuple. An example use is:
// `Bar()` returns a tuple consisting of an
// `i32` value and a 2-tuple of `f32` values.
fn Bar() -> (i32, (f32, f32));
fn Foo() -> i64 {
  // Pattern in `var` declaration:
  var (p: i64, _: auto) = Bar();
  return p;
}
The pattern used in the `var` declaration destructures the tuple value returned by `Bar()`. The first component pattern, `p: i64`, corresponds to the first component of the value returned by `Bar()`, which has type `i32`. This is allowed since there is an implicit conversion from `i32` to `i64`. The result of this conversion is assigned to the name `p`. The second component pattern, `_: auto`, matches the second component of the value returned by `Bar()`, which has type `(f32, f32)`.
Additional kinds of patterns are allowed in `match` statements, that may or may not match based on the runtime value of the `match` expression:
- An expression pattern is an expression, such as `42`, whose value must be equal to match.
- A choice pattern matches one case from a choice type, as described in the choice types section.
- A dynamic cast pattern tests the dynamic type, as described in inheritance.
See `match` for examples of refutable patterns.
References:
- Pattern matching
- Question-for-leads issue #1283: how should pattern matching and implicit conversion interact?
There are two kinds of name-binding declarations:
- constant declarations, introduced with `let`, and
- variable declarations, introduced with `var`.
There are no forward declarations of these; all name-binding declarations are definitions.
A `let` declaration matches an irrefutable pattern to a value. In this example, the name `x` is bound to the value `42` with type `i64`:
let x: i64 = 42;
Here `x: i64` is the pattern, which is followed by an equal sign (`=`) and the value to match, `42`. The names from binding patterns are introduced into the enclosing scope.
References:
A `var` declaration is similar, except with `var` bindings, so `x` here is a reference expression for an object with storage and an address, and so may be modified:
var x: i64 = 42;
x = 7;
Variables with a type that has an unformed state do not need to be initialized in the variable declaration, but do need to be assigned before they are used.
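A sketch of deferred initialization in a hypothetical function `Choose`. It assumes `i64` is one of the types with an unformed state (see Proposal #257); every path assigns `x` before it is read.

```carbon
fn Choose(flag: bool) -> i64 {
  // Declared without an initializer; assumes `i64` has an unformed state.
  var x: i64;
  // Both branches assign `x` before it is used.
  if (flag) {
    x = 1;
  } else {
    x = 2;
  }
  return x;
}
```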
References:
- Binding patterns and local variables with `let` and `var`
- Proposal #162: Basic Syntax
- Proposal #257: Initialization of memory and variables
- Proposal #339: Add `var <type> <identifier> [ = <value> ];` syntax for variables
- Proposal #618: var ordering
- Proposal #2006: Values, variables, pointers, and references
If `auto` is used as the type in a `var` or `let` declaration, the type is the static type of the initializer expression, which is required.
var x: i64 = 2;
// The type of `y` is inferred to be `i64`.
let y: auto = x + 3;
// The type of `z` is inferred to be `bool`.
var z: auto = (y > 1);
References:
Constant `let` declarations may occur at a global scope as well as local and member scopes. However, there are currently no global variables.
Note: The semantics of global constant declarations and absence of global variable declarations is currently provisional.
We are exploring several different ideas for how to design less bug-prone patterns to replace the important use cases programmers still have for global variables. We may be unable to fully address them, at least for migrated code, and be forced to add some limited form of global variables back. We may also discover that their convenience outweighs any improvements afforded.
Functions are the core unit of behavior. For example, this is a forward declaration of a function that adds two 64-bit integers:
fn Add(a: i64, b: i64) -> i64;
Breaking this apart:
- `fn` is the keyword used to introduce a function.
- Its name is `Add`. This is the name added to the enclosing scope.
- The parameter list in parentheses (`(`...`)`) is a comma-separated list of irrefutable patterns.
- It returns an `i64` result. Functions that return nothing omit the `->` and return type.
You would call this function like `Add(1, 2)`.
A function definition is a function declaration that has a body block instead of a semicolon:
fn Add(a: i64, b: i64) -> i64 {
  return a + b;
}
The names of the parameters are in scope until the end of the definition or declaration. The parameter names in a forward declaration may be omitted using `_`, but must match the definition if they are specified.
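A sketch of the rule above, assuming (as the text says) that names omitted with `_` in a forward declaration need not match the definition; how declarations are matched is the subject of issue #1132. `Scale` is a hypothetical function.

```carbon
// Forward declaration with the parameter names omitted.
fn Scale(_: i64, _: i64) -> i64;

// Definition: the parameter names are supplied here.
fn Scale(value: i64, factor: i64) -> i64 {
  return value * factor;
}
```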
References:
- Functions
- Proposal #162: Basic Syntax
- Proposal #438: Add statement syntax for function declarations
- Question-for-leads issue #476: Optional argument names (unused arguments)
- Question-for-leads issue #1132: How do we match forward declarations with their definitions?
The bindings in the parameter list default to `let` bindings, and so the parameter names are treated as value expressions. This is appropriate for input parameters. This binding will be implemented using a pointer, unless it is legal to copy and copying is cheaper.
If the `var` keyword is added before the binding pattern, then the arguments will be copied (or moved from a temporary) to new storage, and so can be mutated in the function body. The copy ensures that any mutations will not be visible to the caller.
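A sketch of a `var` parameter in a hypothetical function `CountDown`: `n` is modified in the body, but the caller's argument is unaffected because the parameter has its own storage.

```carbon
fn CountDown(var n: i32) -> i32 {
  var steps: i32 = 0;
  while (n > 0) {
    // Mutates the local copy only.
    --n;
    ++steps;
  }
  return steps;
}
```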
Use a pointer parameter type to represent an input/output parameter, allowing a function to modify a variable of the caller's. This makes the possibility of those modifications visible: by taking the address using `&` in the caller, and dereferencing using `*` in the callee.
Outputs of a function should prefer to be returned. Multiple values may be returned using a tuple or struct type.
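The minimal sketch below contrasts the parameter styles described above and adds a tuple return. The functions `CountDown`, `AddTo`, `DivMod`, and `Demo` are hypothetical, and `Assert` is used as in the other examples in this document:
```carbon
// `var` gives the callee its own mutable copy of `n`;
// the caller's argument is unchanged.
fn CountDown(var n: i32) -> i32 {
  var steps: i32 = 0;
  while (n > 0) {
    n -= 1;
    steps += 1;
  }
  return steps;
}

// Input/output parameter: the caller passes an address,
// and the callee writes through the pointer.
fn AddTo(total: i32*, amount: i32) {
  *total += amount;
}

// Multiple outputs are returned together as a tuple.
fn DivMod(a: i32, b: i32) -> (i32, i32) {
  return (a / b, a % b);
}

fn Demo() {
  var total: i32 = 10;
  // The `&` makes the possible modification visible at the call site.
  AddTo(&total, 5);
  Assert(total == 15);
  Assert(CountDown(3) == 3);
  let (q: i32, r: i32) = DivMod(17, 5);
  Assert(q == 3 and r == 2);
}
```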
If `auto` is used in place of the return type, the return type of the function is inferred from the function body. It is set to the common type of the static types of the arguments to the `return` statements in the function. This is not allowed in a forward declaration.
```carbon
// Return type is inferred to be `bool`, the type of `a > 0`.
fn Positive(a: i64) -> auto {
  return a > 0;
}
```
A block is a sequence of statements. A block defines a scope and, like other scopes, is enclosed in curly braces (`{`...`}`). Each statement is terminated by a semicolon or block. Expressions, assignments, and `var` and `let` declarations are valid statements.
Statements within a block are normally executed in the order they appear in the source code, except when modified by control-flow statements.
The body of a function is defined by a block, and some control-flow statements have their own blocks of code. These are nested within the enclosing scope. For example, here is a function definition with a block of statements defining the body of the function, and a nested block as part of a `while` statement:
```carbon
fn Foo() {
  Bar();
  while (Baz()) {
    Quux();
  }
}
```
References:
- Blocks and statements
- Proposal #162: Basic Syntax
- Proposal #2665: Semicolons terminate statements
Blocks of statements are generally executed sequentially. Control-flow statements give additional control over the flow of execution and which statements are executed.
Some control-flow statements include blocks. Those blocks will always be within curly braces `{`...`}`.
```carbon
// Curly braces { ... } are required.
if (condition) {
  ExecutedWhenTrue();
} else {
  ExecutedWhenFalse();
}
```
This is unlike C++, which allows control-flow constructs to omit curly braces around a single statement.
References:
- Control flow
- Proposal #162: Basic Syntax
- Proposal #623: Require braces
`if` and `else` provide conditional execution of statements. An `if` statement consists of:
- An `if` introducer followed by a condition in parentheses. If the condition evaluates to `true`, the block following the condition is executed; otherwise, it is skipped.
- This may be followed by zero or more `else if` clauses, whose conditions are evaluated if all prior conditions evaluate to `false`, each with a block that is executed if its condition evaluates to `true`.
- A final optional `else` clause, with a block that is executed if all conditions evaluate to `false`.
For example:
```carbon
if (fruit.IsYellow()) {
  Carbon.Print("Banana!");
} else if (fruit.IsOrange()) {
  Carbon.Print("Orange!");
} else {
  Carbon.Print("Vegetable!");
}
```
This code will:
- Print `Banana!` if `fruit.IsYellow()` is `true`.
- Print `Orange!` if `fruit.IsYellow()` is `false` and `fruit.IsOrange()` is `true`.
- Print `Vegetable!` if both of the above return `false`.
References:
- Control flow
- Proposal #285: if/else
References: Loops
`while` statements loop for as long as the condition evaluates to `true`. For example, this prints `0`, `1`, `2`, then `Done!`:
```carbon
var x: i32 = 0;
while (x < 3) {
  Carbon.Print(x);
  ++x;
}
Carbon.Print("Done!");
```
`for` statements support range-based looping, typically over containers. For example, this prints each `String` value in `names`:
```carbon
for (var name: String in names) {
  Carbon.Print(name);
}
```
The `break` statement immediately ends a `while` or `for` loop. Execution will continue starting from the end of the loop's scope. For example, this processes steps until a manual step is hit (if no manual step is hit, all steps are processed):
```carbon
for (var step: Step in steps) {
  if (step.IsManual()) {
    Carbon.Print("Reached manual step!");
    break;
  }
  step.Process();
}
```
References:
- `break`
- Proposal #340: Add C++-like `while` loops
- Proposal #353: Add C++-like `for` loops
The `continue` statement immediately goes to the next iteration of a `while` or `for` loop. In a `while` loop, execution continues with the `while` condition. For example, this prints all non-empty lines of a file, using `continue` to skip empty lines:
```carbon
var f: File = OpenFile(path);
while (!f.EOF()) {
  var line: String = f.ReadLine();
  if (line.IsEmpty()) {
    continue;
  }
  Carbon.Print(line);
}
```
References:
- `continue`
- Proposal #340: Add C++-like `while` loops
- Proposal #353: Add C++-like `for` loops
The `return` statement ends the flow of execution within a function, returning execution to the caller.
```carbon
// Prints the integers 1 .. `n` and then
// returns to the caller.
fn PrintFirstN(n: i32) {
  var i: i32 = 0;
  while (true) {
    i += 1;
    if (i > n) {
      // None of the rest of the function is
      // executed after a `return`.
      return;
    }
    Carbon.Print(i);
  }
}
```
If the function returns a value to the caller, that value is provided by an expression in the return statement. For example:
```carbon
fn Sign(i: i32) -> i32 {
  if (i > 0) {
    return 1;
  }
  if (i < 0) {
    return -1;
  }
  return 0;
}
Assert(Sign(-3) == -1);
```
References:
- `return`
- `return` statements
- Proposal #415: `return`
- Proposal #538: `return` with no argument
To avoid a copy when returning a variable, add a `returned` prefix to the variable's declaration and use `return var` instead of returning an expression, as in:
```carbon
fn MakeCircle(radius: i32) -> Circle {
  returned var c: Circle;
  c.radius = radius;
  // `return c` would be invalid because `returned` is in use.
  return var;
}
```
This takes the place of the "named return value optimization" of C++.
`match` is a control-flow construct similar to `switch` of C and C++, and mirrors similar constructs in other languages, such as Swift. The `match` keyword is followed by an expression in parentheses, whose value is matched, in order, against the `case` declarations, each of which contains a refutable pattern. The refutable pattern may optionally be followed by an `if` expression, which may use the names from bindings in the pattern.
The code for the first matching `case` is executed. An optional `default` block may be placed after the `case` declarations; it will be executed if none of the `case` declarations match.
An example `match` is:
```carbon
fn Bar() -> (i32, (f32, f32));
fn Foo() -> f32 {
  match (Bar()) {
    case (42, (x: f32, y: f32)) => {
      return x - y;
    }
    case (p: i32, (x: f32, _: f32)) if (p < 13) => {
      return p * x;
    }
    case (p: i32, _: auto) if (p > 3) => {
      return p * Pi;
    }
    default => {
      return Pi;
    }
  }
}
```
Note: This is provisional; no design for `match` statements has been through the proposal process yet.
References:
- Pattern matching
- Question-for-leads issue #1283: how should pattern matching and implicit conversion interact?
Nominal classes, or just classes, are a way for users to define their own data structures or record types.
This is an example of a class definition:
```carbon
class Widget {
  var x: i32;
  var y: i32;
  var payload: String;
}
```
Breaking this apart:
- This defines a class named `Widget`. `Widget` is the name added to the enclosing scope.
- The name `Widget` is followed by curly braces (`{`...`}`) containing the class body, making this a definition. A forward declaration would instead have a semicolon (`;`).
- Those braces delimit the class' scope.
- Fields, or instance variables, are defined using `var` declarations. `Widget` has two `i32` fields (`x` and `y`), and one `String` field (`payload`).
The order of the field declarations determines the fields' memory-layout order.
Classes may have other kinds of members beyond fields declared in their scope:
- Class functions
- Methods
- `alias`
- `let` to define class constants. TODO: Another syntax to define constants associated with the class like `class let` or `static let`?
- `class`, to define a member class or nested class
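The sketch below shows a class that uses several of these member kinds. The names `Circle`, `Center`, `Make`, and `Demo` are hypothetical. Class constants are left out because their syntax is still a TODO:
```carbon
class Circle {
  // Nested class; named `Circle.Center` from outside `Circle`.
  class Center {
    var x: i32;
    var y: i32;
  }
  // Class function; `Self` names `Circle` here.
  fn Make(c: Center, radius: i32) -> Self {
    return {.center = c, .radius = radius};
  }
  var center: Center;
  var radius: i32;
  // `alias` member: a second name for the `radius` field.
  alias r = radius;
}

fn Demo() {
  let c: Circle = Circle.Make({.x = 0, .y = 0}, 5);
  // Both names refer to the same field.
  Assert(c.r == c.radius);
}
```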
Within the scope of a class, the unqualified name `Self` can be used to refer to the class itself.
Members of a class are accessed using the dot (`.`) notation, so given an instance `dial` of type `Widget`, `dial.payload` refers to its `payload` field.
Both structural data classes and nominal classes are considered class types, but they are commonly referred to as "structs" and "classes" respectively when that is not confusing. Like structs, classes refer to their members by name. Unlike structs, classes are nominal types.
References:
- Classes
- Proposal #722: Nominal classes and methods
- Proposal #989: Member access expressions
There is an implicit conversion defined between a struct literal and a class type with the same fields, in any scope that has access to all of the class' fields. This may be used to assign or initialize a variable with a class type, as in:
```carbon
var sprocket: Widget = {.x = 3, .y = 4, .payload = "Sproing"};
sprocket = {.x = 2, .y = 1, .payload = "Bounce"};
```
Classes may also contain class functions. These are functions that are accessed as members of the type, like static member functions in C++, as opposed to methods that are members of instances. They are commonly used to define a function that creates instances. Carbon does not have separate constructors like C++ does.
```carbon
class Point {
  // Class function that instantiates `Point`.
  // `Self` in class scope means the class currently being defined.
  fn Origin() -> Self {
    return {.x = 0, .y = 0};
  }
  var x: i32;
  var y: i32;
}
```
Note that if the definition of a function is provided inside the class scope, the body is treated as if it was defined immediately after the outermost class definition. This means that members such as the fields will be considered declared even if their declarations are later in the source than the class function.
The `returned var` feature can be used if the address of the instance being created is needed in a factory function, as in:
```carbon
class Registered {
  fn Make() -> Self {
    returned var result: Self = {...};
    StoreMyPointerSomewhere(&result);
    return var;
  }
}
```
This approach can also be used for types that can't be copied or moved.
Class type definitions can include methods:
```carbon
class Point {
  // Method defined inline
  fn Distance[self: Self](x2: i32, y2: i32) -> f32 {
    var dx: i32 = x2 - self.x;
    var dy: i32 = y2 - self.y;
    return Math.Sqrt(dx * dx + dy * dy);
  }
  // Mutating method declaration
  fn Offset[addr self: Self*](dx: i32, dy: i32);
  var x: i32;
  var y: i32;
}
// Out-of-line definition of the method declared in the class body
fn Point.Offset[addr self: Self*](dx: i32, dy: i32) {
  self->x += dx;
  self->y += dy;
}
var origin: Point = {.x = 0, .y = 0};
Assert(Math.Abs(origin.Distance(3, 4) - 5.0) < 0.001);
origin.Offset(3, 4);
Assert(origin.Distance(3, 4) == 0.0);
```
This defines a `Point` class type with two integer data members `x` and `y` and two methods `Distance` and `Offset`:
- Methods are defined as class functions with a `self` parameter inside square brackets `[`...`]` before the regular explicit parameter list in parens `(`...`)`.
- Methods are called using the member syntax, `origin.Distance(`...`)` and `origin.Offset(`...`)`.
- `Distance` computes and returns the distance to another point, without modifying the `Point`. This is signified using `[self: Self]` in the method declaration.
- `origin.Offset(`...`)` does modify the value of `origin`. This is signified using `[addr self: Self*]` in the method declaration. Since calling this method requires taking the non-`const` address of `origin`, it may only be called on reference expressions.
- Methods may be declared lexically inline like `Distance`, or lexically out of line like `Offset`.
References:
- Methods
- Proposal #722: Nominal classes and methods
The philosophy of inheritance support in Carbon is to focus on use cases where inheritance is a good match, and use other features for other cases. For example, mixins for implementation reuse and generics for separating interface from implementation. This allows Carbon to move away from multiple inheritance, which doesn't have as efficient an implementation strategy.
Classes by default are final, which means they may not be extended. A class may be declared as allowing extension using either the `base class` or `abstract class` introducer instead of `class`. An `abstract class` is a base class that may not itself be instantiated.
```carbon
base class MyBaseClass { ... }
```
Either kind of base class may be extended to get a derived class. Derived classes are final unless they are themselves declared `base` or `abstract`.
Classes may only extend a single class. Carbon only supports single inheritance, and will use mixins instead of multiple inheritance.
```carbon
base class MiddleDerived {
  extend base: MyBaseClass;
  ...
}
class FinalDerived {
  extend base: MiddleDerived;
  ...
}
// ❌ Forbidden: class Illegal { extend base: FinalDerived; ... }
// may not extend `FinalDerived` since not declared `base` or `abstract`.
```
A base class may define virtual methods. These are methods whose implementation may be overridden in a derived class. By default, methods are non-virtual; the declaration of a virtual method must be prefixed by one of these three keywords:
- A method marked `virtual` has a definition in this class but not in any base.
- A method marked `abstract` does not have a definition in this class, but must have a definition in any non-`abstract` derived class.
- A method marked `impl` has a definition in this class, overriding any definition in a base class.
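The minimal sketch below uses all three keywords. `Shape`, `Square`, `Area`, `Sides`, and `TotalArea` are hypothetical names, and construction of a `Square` is left out:
```carbon
abstract class Shape {
  // `abstract`: no definition here; every non-abstract
  // derived class must provide one.
  abstract fn Area[self: Self]() -> f64;
  // `virtual`: defined here, and derived classes may override it.
  virtual fn Sides[self: Self]() -> i32 {
    return 0;
  }
}

class Square {
  extend base: Shape;
  // `impl`: overrides the definitions inherited from `Shape`.
  impl fn Area[self: Self]() -> f64 {
    return self.side * self.side;
  }
  impl fn Sides[self: Self]() -> i32 {
    return 4;
  }
  var side: f64;
}

// Calls through a `Shape*` dispatch on the dynamic type, so
// passing a `Square*` runs `Square`'s override of `Area`.
fn TotalArea(a: Shape*, b: Shape*) -> f64 {
  return a->Area() + b->Area();
}
```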
A pointer to a derived class may be cast to a pointer to one of its base classes. Calling a virtual method through a pointer to a base class will use the overriding definition provided in the derived class. Base classes with virtual methods may use run-time type information in a match statement to dynamically test whether the dynamic type of a value is some derived class, as in:
```carbon
var base_ptr: MyBaseClass* = ...;
match (base_ptr) {
  case dyn p: MiddleDerived* => { ... }
}
```
For purposes of construction, a derived class acts like its first field is called `base` with the type of its immediate base class.
```carbon
class MyDerivedType {
  extend base: MyBaseType;
  fn Make() -> MyDerivedType {
    return {.base = MyBaseType.Make(), .derived_field = 7};
  }
  var derived_field: i32;
}
```
Abstract classes can't be instantiated, so instead they should define class functions returning `partial Self`. Those functions should be marked `protected` so they may only be used by derived classes.
```carbon
abstract class AbstractClass {
  protected fn Make() -> partial Self {
    return {.field_1 = 3, .field_2 = 9};
  }
  // ...
  var field_1: i32;
  var field_2: i32;
}
// ❌ Error: can't instantiate abstract class
var abc: AbstractClass = ...;
class DerivedFromAbstract {
  extend base: AbstractClass;
  fn Make() -> Self {
    // AbstractClass.Make() returns a
    // `partial AbstractClass` that can be used as
    // the `.base` member when constructing a value
    // of a derived class.
    return {.base = AbstractClass.Make(),
            .derived_field = 42 };
  }
  var derived_field: i32;
}
```
References:
- Classes: Inheritance
- Proposal #777: Inheritance
- Proposal #820: Implicit conversions
Class members are by default publicly accessible. The `private` keyword prefix can be added to the member's declaration to restrict it to members of the class or any friends. A `private virtual` or `private abstract` method may be implemented in derived classes, even though it may not be called.
Friends may be declared using a `friend` declaration inside the class naming an existing function or type. Unlike C++, `friend` declarations may only refer to names resolvable by the compiler, and don't act like forward declarations.
`protected` is like `private`, but also gives access to derived classes.
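The sketch below shows how these access levels interact. `Counter`, `StepCounter`, `Value`, `Bump`, and `Step` are hypothetical names. `friend` is left out because its exact syntax is not shown here:
```carbon
base class Counter {
  // Public by default.
  fn Value[self: Self]() -> i32 {
    return self.count;
  }
  // Accessible from `Counter` and classes derived from it.
  protected fn Bump[addr self: Self*]() {
    self->count += 1;
  }
  // Accessible only from `Counter` (and its friends).
  private var count: i32;
}

class StepCounter {
  extend base: Counter;
  fn Step[addr self: Self*]() {
    // ✅ Allowed: `Bump` is protected.
    self->Bump();
    // ❌ Error if uncommented: `count` is private to `Counter`.
    // self->count += 1;
  }
}
```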
References:
- Access control for class members
- Question-for-leads issue #665: `private` vs `public` syntax strategy, as well as other visibility tools like `external`/`api`/etc.
- Proposal #777: Inheritance
- Question-for-leads issue #971: Private interfaces in public API files
A destructor for a class is custom code executed when the lifetime of a value of that type ends. Destructors are defined with the `destructor` keyword followed by either `[self: Self]` or `[addr self: Self*]` (as is done with methods) and the block of code in the class definition, as in:
```carbon
class MyClass {
  destructor [self: Self] { ... }
}
```
or:
```carbon
class MyClass {
  // Can modify `self` in the body.
  destructor [addr self: Self*] { ... }
}
```
The destructor for a class is run before the destructors of its data members. The data members are destroyed in reverse order of declaration. Derived classes are destroyed before their base classes.
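The sketch below illustrates these ordering rules. `Part`, `Machine`, and `Demo` are hypothetical. It assumes that nested struct literals convert to the `Part` fields in the same way a struct literal converts to any class type:
```carbon
class Part {
  destructor [self: Self] {
    Carbon.Print("{0} destroyed", self.name);
  }
  var name: String;
}

class Machine {
  destructor [self: Self] {
    Carbon.Print("Machine destroyed");
  }
  var first: Part;
  var second: Part;
}

fn Demo() {
  var m: Machine = {.first = {.name = "first"},
                    .second = {.name = "second"}};
  // When `m`'s lifetime ends, the rules above call for:
  // the `Machine` destructor, then `second`, then `first`.
}
```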
A destructor in an abstract or base class may be declared `virtual`, as with methods. Destructors in classes derived from one with a `virtual` destructor must be declared with the `impl` keyword prefix. It is illegal to delete an instance of a derived class through a pointer to a base class unless the base class's destructor is declared `virtual` or `impl`. To delete a pointer to a non-abstract base class when it is known not to point to a value with a derived type, use `UnsafeDelete`.
References:
- Classes: Destructors
- Proposal #1154: Destructors
For every type `MyClass`, there is the type `const MyClass` such that:
- The data representation is the same, so a `MyClass*` value may be implicitly converted to a `(const MyClass)*`.
- A `const MyClass` reference expression may automatically convert to a `MyClass` value expression, the same way that a `MyClass` reference expression can.
- If member `x` of `MyClass` has type `T`, then member `x` of `const MyClass` has type `const T`.
- While all of the member names in `MyClass` are also member names in `const MyClass`, the effective API of a `const MyClass` reference expression is a subset of `MyClass`, because `addr` methods are only valid if they accept a `const Self*`.
Note that `const` binds more tightly than postfix-`*` for forming a pointer type, so `const MyClass*` is equal to `(const MyClass)*`.
This example uses the definition of `Point` from the "methods" section:
```carbon
var origin: Point = {.x = 0, .y = 0};
// ✅ Allowed conversion from `Point*` to
// `const Point*`:
let p: const Point* = &origin;
// ✅ Allowed conversion of `const Point` reference expression
// to `Point` value expression.
let five: f32 = p->Distance(3, 4);
// ❌ Error: mutating method `Offset` excluded
// from `const Point` API.
p->Offset(3, 4);
// ❌ Error: mutating method `AssignAdd.Op`
// excluded from `const i32` API.
p->x += 2;
```
Types indicate that they support unformed states by implementing a particular interface, otherwise variables of that type must be explicitly initialized when they are declared.
An unformed state for an object is one that satisfies the following properties:
- Assignment from a fully formed value is correct using the normal assignment implementation for the type.
- Destruction must be correct using the type's normal destruction implementation.
- Destruction must be optional. The behavior of the program must be equivalent whether the destructor is run or not for an unformed object, including not leaking resources.
A type might have more than one in-memory representation for the unformed state, and those representations may be the same as valid fully formed values for that type. For example, all values are legal representations of the unformed state for any type with a trivial destructor like `i32`. Types may define additional initialization for the hardened build mode. For example, in this mode integers are set to `0` when in an unformed state.
Any operation on an unformed object other than destruction or assignment from a fully formed value is an error, even if its in-memory representation is that of a valid value for that type.
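A minimal sketch of these rules, assuming that `i32` supports an unformed state as described above. `Demo` is a hypothetical name. This section does not say whether the error is diagnosed at compile time or at run time:
```carbon
fn Demo() {
  // `i32` supports an unformed state, so no initializer is required.
  var total: i32;
  // ❌ Error if uncommented: reading `total` while it is unformed.
  // Carbon.Print(total);
  // ✅ Assignment from a fully formed value is allowed.
  total = 42;
  // ✅ `total` is now fully formed and may be used normally.
  Carbon.Print(total);
}
```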
Carbon will allow types to define whether and how they are moved. This can happen when returning a value from a function or by using the move operator `~x`. This leaves `x` in an unformed state and returns its old value.
Note: This is provisional. The move operator was discussed but not proposed in accepted proposal #257: Initialization of memory and variables.
Mixins allow reuse with different trade-offs compared to inheritance. Mixins focus on implementation reuse, such as might be done using CRTP or multiple inheritance in C++.
TODO: The design for mixins is still under development. The details here are provisional. The mixin use case was included in accepted proposal #561: Basic classes: use cases, struct literals, struct types, and future work.
A choice type is a tagged union that can store different types of data in storage large enough to hold the largest of them. A choice type has a name, and a list of cases separated by commas (`,`). Each case has a name and an optional parameter list.
```carbon
choice IntResult {
  Success(value: i32),
  Failure(error: String),
  Cancelled
}
```
The value of a choice type is one of the cases, plus the values of the parameters to that case, if any. A value can be constructed by naming the case and providing values for the parameters, if any:
```carbon
fn ParseAsInt(s: String) -> IntResult {
  var r: i32 = 0;
  for (c: i32 in s) {
    if (not IsDigit(c)) {
      // Equivalent to `IntResult.Failure(...)`
      return .Failure("Invalid character");
    }
    // ...
  }
  return .Success(r);
}
```
Choice type values may be consumed using a `match` statement:
```carbon
match (ParseAsInt(s)) {
  case .Success(value: i32) => {
    return value;
  }
  case .Failure(error: String) => {
    Display(error);
  }
  case .Cancelled => {
    Terminate();
  }
}
```
They can also represent an enumerated type, if no additional data is associated with the choices, as in:
```carbon
choice LikeABoolean { False, True }
```
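The following sketch consumes such an enumeration with `match`, mirroring the `IntResult` example above. `ToBool` is a hypothetical function, and the `match` syntax remains provisional:
```carbon
fn ToBool(b: LikeABoolean) -> bool {
  match (b) {
    case .True => {
      return true;
    }
    case .False => {
      return false;
    }
  }
}
```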
References:
- Sum types
- Proposal #157: Design direction for sum types
- Proposal #162: Basic Syntax
Names are introduced by declarations and are valid until the end of the scope in which they appear. Code may not refer to a name at a point in the source before its declaration. In executable scopes such as function bodies, names declared later are not found. In declarative scopes such as packages, classes, and interfaces, it is an error to refer to names declared later, except that inline class member function bodies are parsed as if they appeared after the class.
A name in Carbon is formed from a sequence of letters, numbers, and underscores, and starts with a letter. We intend to follow Unicode's Annex 31 in selecting valid identifier characters, but a concrete set of valid characters has not been selected yet.
- Files are grouped into libraries, which are in turn grouped into packages.
- Libraries are the granularity of code reuse through imports.
- Packages are the unit of distribution.
Each library must have exactly one `api` file. This file includes declarations for all public names of the library. Definitions for those declarations must be in some file in the library, either the `api` file or an `impl` file.
Every package has its own namespace. This means libraries within a package need to coordinate to avoid name conflicts, but not across packages.
Files start with an optional package declaration, consisting of:
- optionally, the `package` keyword followed by an identifier specifying the package name,
- optionally, the `library` keyword followed by a string with the library name,
- either `api` or `impl`, and
- a terminating semicolon (`;`).
For example:
```carbon
// Package name is `Geometry`.
// Library name is "Shapes".
// This file is an `api` file, not an `impl` file.
package Geometry library "Shapes" api;
```
Parts of this declaration may be omitted:
- If the package keyword is not specified, as in `library "Widgets" api;`, the file contributes to the `Main` package. No other package may import from the `Main` package, and it cannot be named explicitly.
- If the library keyword is not specified, as in `package Geometry api;`, this file contributes to the default library.
- If both keywords are omitted, the package declaration must be omitted entirely. In this case, the file is an `impl` file belonging to the default library of the `Main` package, which implicitly has an empty `api` file. This library is used to define the entry point for the program, and tests and smaller examples may choose to reside entirely within this library. No other library can import this library, even from within the `Main` package.
If the default library of the `Main` package contains a function named `Run`, that function is the program entry point. Otherwise, the program's entry point may be defined in another language, such as by defining a C++ `main` function.
Note: Valid signatures for the entry point have not yet been decided.
After the package declaration, files may include `import` declarations. The `import` keyword is followed by the package name, `library` followed by the library name, or both. If the library is omitted, the default library for that package is imported.
All `import` declarations must appear before all other non-`package` declarations in the file.
The package name must be omitted when importing a library from the current package.
```carbon
// Import the "Vertex" library from the package containing this file.
import library "Vertex";
```
The `import library ...` syntax adds all the public top-level names within the given library to the top-level scope of the current file as `private` names, and similarly for names in namespaces.
Every `impl` file automatically imports the `api` file for its library. Attempting to perform an import of the current library is invalid.
```carbon
package MyPackage library "Widgets" impl;
// ❌ Error, this import is performed implicitly.
import MyPackage library "Widgets";
```
The default library for a package does not have a string name, and is instead named with the `default` keyword.
```carbon
// Import the default library from the same package.
import library default;
```
It is an error to use the `import library default;` syntax in the `Main` package.
When the package name is specified, the `import` declaration imports a library from another package.
```carbon
package MyPackage impl;
// Import the "Vector" library from the `LinearAlgebra` package.
import LinearAlgebra library "Vector";
// Import the default library from the `ArbitraryPrecision` package.
import ArbitraryPrecision;
```
The syntax `import PackageName ...` introduces the name `PackageName` as a `private` name naming the given package. Importing additional libraries from that package makes additional members of `PackageName` visible.
It is an error to specify the name of the current package. The package name must be omitted when importing from the same package.
It is an error to specify `library default` in a package-qualified import. Instead, omit the `library` portion of the declaration.
It is an error to specify the package name `Main`. Libraries in the `Main` package can only be imported from within that package.
The names visible from an imported library are determined by these rules:
- Declarations in an `api` file are by default public, which means visible to any file that imports that library. This matches class members, which are also default public.
- A `private` prefix on a declaration in an `api` file makes the name library private. This means the name is visible in the file and all `impl` files for the same library.
- The visibility of a name is determined by its first declaration, considering `api` files before `impl` files. The `private` prefix is only allowed on the first declaration.
- A name declared in an `impl` file and not the corresponding `api` file is file private, meaning visible in just that file. Its first declaration must be marked with a `private` prefix. TODO: This needs to be finalized in a proposal to resolve inconsistency between #665 and #1136.
- Private names don't conflict with names outside the region they're private to: two different libraries can have different private names `foo` without conflict, but a private name conflicts with a public name in the same scope.
At most one `api` file in a package transitively used in a program may declare a given name public.
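The sketch below applies these rules to a two-file library. It reuses the `Geometry` package and "Shapes" library from the earlier example. The functions `Area`, `IsValid`, and `Product` are hypothetical, and the two files are shown in one block, separated by comments:
```carbon
// ---- api file ----
package Geometry library "Shapes" api;
// Public by default: visible to any file importing "Shapes".
fn Area(width: f64, height: f64) -> f64;
// Library private: visible here and in this library's
// `impl` files, but not to importers.
private fn IsValid(width: f64, height: f64) -> bool;

// ---- impl file (imports the api file implicitly) ----
package Geometry library "Shapes" impl;
// File private: first declared in this `impl` file.
private fn Product(a: f64, b: f64) -> f64 {
  return a * b;
}
// Definitions of names first declared in the api file;
// `private` only appears on a name's first declaration.
fn IsValid(width: f64, height: f64) -> bool {
  return width >= 0.0 and height >= 0.0;
}
fn Area(width: f64, height: f64) -> f64 {
  if (not IsValid(width, height)) {
    return 0.0;
  }
  return Product(width, height);
}
```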
References:
- Exporting entities from an API file
- Question-for-leads issue #665: `private` vs `public` syntax strategy, as well as other visibility tools like `external`/`api`/etc.
- Proposal #752: api file default public
- Proposal #931: Generic impls access (details 4)
- Question-for-leads issue #1136: what is the top-level scope in a source file, and what names are found there?
The top-level scope in a file is the scope of the package. This means:
- Within this scope (and its sub-namespaces), all visible names from the same package appear. This includes names from the same file, names from the `api` file of a library when inside an `impl` file, and names from imported libraries of the same package.
- In scopes where package members might have a name conflict with something else, the syntax `package.Foo` can be used to name the `Foo` member of the current package.
In this example, the names `F` and `P` are used in a scope where they could mean two different things, and qualifications are needed to disambiguate:
```carbon
import P;
fn F();
class C {
  fn F();
  class P {
    fn H();
  }
  fn G() {
    // ❌ Error: ambiguous whether `F` means
    // `package.F` or `package.C.F`.
    F();
    // ✅ Allowed: fully qualified
    package.F();
    package.C.F();
    // ✅ Allowed: unambiguous
    C.F();
    // ❌ Error: ambiguous whether `P` means
    // `package.P` or `package.C.P`.
    P.H();
    // ✅ Allowed
    package.P.H();
    package.C.P.H();
    C.P.H();
  }
}
```
References:
- Code and name organization
- Proposal #107: Code and name organization
- Proposal #752: api file default public
- Question-for-leads issue #1136: what is the top-level scope in a source file, and what names are found there?
A `namespace` declaration defines a name that may be used as a prefix of names declared afterward. When defining a member of a namespace, other members of that namespace are considered in scope and may be found by name lookup without the namespace prefix. In this example, package `P` defines some of its members inside a namespace `N`:
```carbon
package P api;
// Defines namespace `N` within the current package.
namespace N;
// Defines namespaces `M` and `M.L`.
namespace M.L;
fn F();
// ✅ Allowed: Declares function `G` in namespace `N`.
private fn N.G();
// ❌ Error: `Bad` hasn't been declared.
fn Bad.H();
fn J() {
  // ❌ Error: No `package.G`
  G();
}
fn N.K() {
  // ✅ Allowed: Looks in both `package` and `package.N`.
  // Finds `package.F` and `package.N.G`.
  F();
  G();
}
// ✅ Allowed: Declares function `R` in namespace `M.L`.
fn M.L.R();
// ✅ Allowed: Declares function `Q` in namespace `M`.
fn M.Q();
```
Another package importing `P` can refer to the public members of that namespace by prefixing with the package name `P` followed by the namespace:
```carbon
import P;
// ✅ Allowed: `F` is public member of `P`.
P.F();
// ❌ Error: `N.G` is a private member of `P`.
P.N.G();
// ✅ Allowed: `N.K` is public member of `P`.
P.N.K();
// ✅ Allowed: `M.L.R` is public member of `P`.
P.M.L.R();
// ✅ Allowed: `M.Q` is public member of `P`.
P.M.Q();
```
References:
- "Namespaces" in "Code and name organization"
- "Package and namespace members" in "Qualified names and member access"
- Proposal #107: Code and name organization
- Proposal #989: Member access expressions
- Question-for-leads issue #1136: what is the top-level scope in a source file, and what names are found there?
Our naming conventions are:
- For idiomatic Carbon code:
  - `UpperCamelCase` will be used when the named entity cannot have a dynamically varying value. For example, functions, namespaces, or compile-time constant values. Note that `virtual` methods are named the same way to be consistent with other functions and methods.
  - `lower_snake_case` will be used when the named entity's value won't be known until runtime, such as for variables.
- For Carbon-provided features:
  - Keywords and type literals will use `lower_snake_case`.
  - Other code will use the conventions for idiomatic Carbon code.
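The short sketch below applies these conventions. `TemperatureLog`, `Latest`, `latest_reading`, `ConvertToFahrenheit`, and `scaled_value` are hypothetical names:
```carbon
// UpperCamelCase: classes, methods, and functions
// cannot have a dynamically varying value.
class TemperatureLog {
  fn Latest[self: Self]() -> f64 {
    return self.latest_reading;
  }
  // lower_snake_case: a field's value is only known at runtime.
  var latest_reading: f64;
}

fn ConvertToFahrenheit(celsius: f64) -> f64 {
  // Parameters and locals are runtime values: lower_snake_case.
  let scaled_value: f64 = celsius * 9.0 / 5.0;
  return scaled_value + 32.0;
}
```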
`alias` declares a name as equivalent to another name, for example:
```carbon
alias NewName = SomePackage.OldName;
```
Note that the right-hand side of the equal sign (`=`) is a name, not a value, so `alias four = 4;` is not allowed. This allows `alias` to work with entities like namespaces, which aren't values in Carbon.
This can be used during an incremental migration when changing a name. For example, `alias` would allow you to have two names for a data field in a class while clients were migrated between the old name and the new name.
```carbon
class MyClass {
  var new_name: String;
  alias old_name = new_name;
}
var x: MyClass = {.new_name = "hello"};
Carbon.Assert(x.old_name == "hello");
```
Another use is to include a name in a public API. For example, `alias` may be used to include a name from an interface implementation as a member of a class or named constraint, possibly renamed:
```carbon
class ContactInfo {
  impl as Printable;
  impl as ToPrinterDevice;
  alias PrintToScreen = Printable.Print;
  alias PrintToPrinter = ToPrinterDevice.Print;
  ...
}
```
References:
- Aliases
- "Aliasing" in "Code and name organization"
- alias a name from an interface impl
- alias a name in a named constraint
- Proposal #107: Code and name organization
- Proposal #553: Generics details part 1
- Question-for-leads issue #749: Alias syntax
- Proposal #989: Member access expressions
The general principle of Carbon name lookup is that we look up names in all relevant scopes, and report an error if the name is found to refer to more than one different entity. So Carbon requires disambiguation by adding qualifiers instead of doing any shadowing of names. Member name lookup follows a similar philosophy. For an example, see the "package scope" section.
Unqualified name lookup walks the semantically-enclosing scopes, not only the lexically-enclosing ones. So when a lookup is performed within fn MyNamespace.MyClass.MyNestedClass.MyFunction(), we will look in MyNestedClass, MyClass, MyNamespace, and the package scope, even when the lexically-enclosing scope is the package scope. This means that the definition of a method will look for names in the class' scope even if it is written lexically out of line:
class C {
fn F();
fn G();
}
fn C.G() {
// ✅ Allowed: resolves to `package.C.F`.
F();
}
Carbon also rejects cases that would be invalid if all declarations in the file, including ones appearing later, were visible everywhere, not only after their point of appearance:
class C {
fn F();
fn G();
}
fn C.G() {
F();
}
// ❌ Error: use of `F` in `C.G` would be ambiguous
// if this declaration were earlier.
fn F();
References:
- Name lookup
- "Qualified names and member access" section of "Expressions"
- Qualified names and member access
- Principle: Information accumulation
- Proposal #875: Principle: information accumulation
- Proposal #989: Member access expressions
- Question-for-leads issue #1136: what is the top-level scope in a source file, and what names are found there?
Common types that we expect to be used universally are made available in every file, as if there were a special "prelude" package that was imported automatically into every api file. Dedicated type literal syntaxes like i32 and bool refer to types defined within this package, based on the "all APIs are library APIs" principle.
TODO: Prelude provisionally imports the Carbon package which includes common facilities, like
References:
- Name lookup
- Principle: All APIs are library APIs
- Question-for-leads issue #750: Naming conventions for Carbon-provided features
- Question-for-leads issue #1058: How should interfaces for core functionality be named?
- Proposal #1280: Principle: All APIs are library APIs
Generics allow Carbon constructs like functions and classes to be written with compile-time parameters to generalize across different values of those parameters. For example, this Min function has a type* parameter T that can be any type that implements the Ordered interface.
fn Min[T:! Ordered](x: T, y: T) -> T {
// Can compare `x` and `y` since they have
// type `T` known to implement `Ordered`.
return if x <= y then x else y;
}
var a: i32 = 1;
var b: i32 = 2;
// `T` is deduced to be `i32`
Assert(Min(a, b) == 1);
// `T` is deduced to be `String`
Assert(Min("abc", "xyz") == "abc");
Since the T parameter is in the deduced parameter list in square brackets ([...]) before the explicit parameter list in parentheses ((...)), the value of T is determined from the types of the explicit arguments instead of being passed as a separate explicit argument.
(*) Note: T here may be thought of as a type parameter, but its values are actually facets, which are values usable as types. The T in this example is not itself a type.
References: TODO: Revisit
- Generics: Overview
- Proposal #524: Generics overview
- Proposal #553: Generics details part 1
- Proposal #950: Generic details 6: remove facets
- Proposal #2360: Types are values of type type
The :! marks it as a compile-time binding pattern, and so T is a compile-time parameter. Compile-time parameters may either be checked or template, and default to checked.
"Checked" here means that the body of Min is type checked when the function is defined, independent of the specific values T is instantiated with, and name lookup is delegated to the constraint on T (Ordered in this case). This type checking is equivalent to saying the function would pass type checking given any type T that implements the Ordered interface. Subsequent calls to Min only need to check that the deduced value of T implements Ordered.
The parameter could alternatively be declared to be a template generic parameter by prefixing with the template keyword, as in template T:! type.
fn Convert[template T:! type](source: T, template U:! type) -> U {
var converted: U = source;
return converted;
}
fn Foo(i: i32) -> f32 {
// Instantiates with the `T` implicit argument set to `i32` and the `U`
// explicit argument set to `f32`, then calls with the runtime value `i`.
return Convert(i, f32);
}
A template parameter can still use a constraint. The Min example could have been declared as:
fn TemplatedMin[template T:! Ordered](x: T, y: T) -> T {
return if x <= y then x else y;
}
Carbon templates follow the same fundamental paradigm as C++ templates: they are instantiated when called, resulting in late type checking, duck typing, and lazy binding.
One difference from C++ templates is that Carbon template instantiation is not controlled by the SFINAE rule of C++ (1, 2) but by explicit constraints declared in the function signature and evaluated at compile time.
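For readers less familiar with SFINAE ("substitution failure is not an error"), the plain C++ sketch below shows the rule in action. An overload whose signature fails to substitute is silently dropped from the candidate set instead of causing an error, so overload selection is driven by type traits hidden in the signature. Classify is a hypothetical name used only for illustration.

```cpp
#include <string>
#include <type_traits>

// For non-integral `T`, `std::enable_if_t<false, int>` fails to substitute,
// so this overload is discarded (SFINAE) rather than causing an error.
template <typename T, std::enable_if_t<std::is_integral_v<T>, int> = 0>
std::string Classify(T) {
  return "integral";
}

// Selected for every other type, by the same mechanism.
template <typename T, std::enable_if_t<!std::is_integral_v<T>, int> = 0>
std::string Classify(T) {
  return "other";
}
```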
TODO: The design for template constraints is still under development.
The expression phase of a checked parameter is a symbolic constant, whereas the expression phase of a template parameter is a template constant. A binding pattern using :! is a compile-time binding pattern; more specifically, a template binding pattern if it uses template, and a symbolic binding pattern if it does not.
Although checked generics are generally preferred, templates enable translation of code between C++ and Carbon, and address some cases where the type checking rigor of checked generics is problematic.
References:
- Templates
- Proposal #553: Generics details part 1
- Question-for-leads issue #949: Constrained template name lookup
- Proposal #2138: Checked and template generic terminology
- Proposal #2200: Template generics
Interfaces specify a set of requirements that a type might satisfy. Interfaces act both as constraints on types a caller might supply and capabilities that may be assumed of types that satisfy that constraint.
interface Printable {
// Inside an interface definition `Self` means
// "the type implementing this interface".
fn Print[self: Self]();
}
An interface is a kind of facet type, and the values of this type are facets, which are values usable as types.
In addition to function requirements, interfaces can contain:
- requirements that other interfaces be implemented or interfaces that this interface extends
- associated facets and other associated constants
- interface defaults
- final interface members
Types only implement an interface if there is an explicit impl declaration that they do. Simply having a Print function with the right signature is not sufficient.
// Class `Text` does not implement the `Printable` interface.
class Text {
fn Print[self: Self]();
}
class Circle {
var radius: f32;
// This `impl` declaration establishes that `Circle` implements
// `Printable`.
impl as Printable {
fn Print[self: Self]() {
Carbon.Print("Circle with radius: {0}", self.radius);
}
}
}
In this case, Print is not a direct member of Circle, but:
- Circle may be passed to functions expecting a type that implements Printable.
fn GenericPrint[T:! Printable](x: T) {
// Look up into `T` delegates to `Printable`, so this
// finds `Printable.Print`:
x.Print();
}
- The members of Printable such as Print may be called using compound member access syntax (1, 2) to qualify the name of the member, as in:
fn CirclePrint(c: Circle) {
// Succeeds, even though `c.Print()` would not.
c.(Printable.Print)();
}
To include the members of the interface as direct members of the type, use the extend keyword, as in extend impl as Printable. This is only permitted on impl declarations in the body of a class definition.
Without extend, implementations don't have to be in the same library as the type definition, subject to the orphan rule (1, 2) for coherence.
Interfaces and implementations may be forward declared by replacing the definition scope in curly braces ({...}) with a semicolon.
References:
- Generics: Interfaces
- Generics: Implementing interfaces
- Proposal #553: Generics details part 1
- Proposal #731: Generics details 2: adapters, associated types, parameterized interfaces
- Proposal #624: Coherence: terminology, rationale, alternatives considered
- Proposal #989: Member access expressions
- Proposal #990: Generics details 8: interface default and final members
- Proposal #1084: Generics details 9: forward declarations
- Question-for-leads issue #1132: How do we match forward declarations with their definitions?
- Proposal #2360: Types are values of type type
- Proposal #2760: Consistent class and interface syntax
A function can require type arguments to implement multiple interfaces (or other facet types) by combining them using an ampersand (&):
fn PrintMin[T:! Ordered & Printable](x: T, y: T) {
// Can compare since type `T` implements `Ordered`.
if (x <= y) {
// Can call `Print` since type `T` implements `Printable`.
x.Print();
} else {
y.Print();
}
}
The body of the function may call functions that are in either interface, except for names that are members of both. In that case, use the compound member access syntax (1, 2) to qualify the name of the member, as in:
fn DrawTies[T:! Renderable & GameResult](x: T) {
if (x.(GameResult.Draw)()) {
x.(Renderable.Draw)();
}
}
References:
- Combining interfaces by anding type-of-types
- Question-for-leads issue #531: Combine interfaces with + or &
- Proposal #553: Generics details part 1
Member lookup into a template parameter is done in the actual value provided by the caller, in addition to any constraints. This means member name lookup and type checking for anything dependent on the template parameter can't be completed until the template is instantiated with a specific concrete type. When the constraint is just type, this gives semantics similar to C++ templates.
class Game {
fn Draw[self: Self]() -> bool;
impl as Renderable {
fn Draw[self: Self]();
}
}
fn TemplateDraw[template T:! type](x: T) {
// Calls `Game.Draw` when `T` is `Game`:
x.Draw();
}
fn ConstrainedTemplateDraw[template T:! Renderable](x: T) {
// ❌ Error when `T` is `Game`: Finds both `T.Draw` and
// `Renderable.Draw`, and they are different.
x.Draw();
}
fn CheckedGenericDraw[T:! Renderable](x: T) {
// Always calls `Renderable.Draw`, even when `T` is `Game`:
x.Draw();
}
This allows a safe transition from template to checked generics. Constraints can be added incrementally, with the compiler verifying that the semantics stay the same. If adding the constraint would change which function gets called, an error is triggered, as in ConstrainedTemplateDraw from the example. Once all constraints have been added, it is safe to remove the word template to switch to a checked parameter.
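For comparison with the C++ template semantics mentioned above, the plain C++ sketch below behaves like TemplateDraw with the constraint type. The call x.Draw() is looked up in whatever concrete type T turns out to be, at instantiation, so unrelated types with differently typed Draw members are both accepted. The Game and Sprite types are hypothetical stand-ins.

```cpp
#include <string>

struct Game {
  bool Draw() const { return true; }
};

struct Sprite {
  std::string Draw() const { return "sprite"; }
};

// Like `TemplateDraw[template T:! type]`: `Draw` is found in the concrete
// `T` when the template is instantiated (duck typing).
template <typename T>
auto TemplateDraw(const T& x) {
  return x.Draw();
}
```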
References:
- Proposal #989: Member access expressions
- Proposal #2200: Template generics
An associated constant is a member of an interface whose value is determined by the implementation of that interface for a specific type. These values are set to compile-time values in implementations, and so use the :! compile-time binding pattern syntax inside a let declaration without an initializer.
This allows types in the signatures of functions in the interface to vary. For example, an interface describing a stack might use an associated constant to represent the type of elements stored in the stack.
interface StackInterface {
let ElementType:! Movable;
fn Push[addr self: Self*](value: ElementType);
fn Pop[addr self: Self*]() -> ElementType;
fn IsEmpty[self: Self]() -> bool;
}
Then different types implementing StackInterface can specify different values for the ElementType member of the interface using a where clause:
class IntStack {
extend impl as StackInterface where .ElementType = i32 {
fn Push[addr self: Self*](value: i32);
// ...
}
}
class FruitStack {
extend impl as StackInterface where .ElementType = Fruit {
fn Push[addr self: Self*](value: Fruit);
// ...
}
}
Many Carbon entities, not just functions, may be made generic by adding checked or template parameters.
Classes may be defined with an optional explicit parameter list. All parameters to a class must be compile-time, and so defined with :!, either with or without the template prefix. For example, to define a stack that can hold values of any type T:
class Stack(T:! type) {
fn Push[addr self: Self*](value: T);
fn Pop[addr self: Self*]() -> T;
var storage: Array(T);
}
var int_stack: Stack(i32);
In this example:
- Stack is a type parameterized by a type T.
- T may be used within the definition of Stack anywhere a normal type would be used.
- Array(T) instantiates generic type Array with its argument set to T.
- Stack(i32) instantiates Stack with T set to i32.
The values of type parameters are part of a type's value, and so may be deduced in a function call, as in this example:
fn PeekTopOfStack[T:! type](s: Stack(T)*) -> T {
var top: T = s->Pop();
s->Push(top);
return top;
}
// `int_stack` has type `Stack(i32)`, so `T` is deduced to be `i32`:
PeekTopOfStack(&int_stack);
Choice types may be parameterized similarly to classes:
choice Result(T:! type, Error:! type) {
Success(value: T),
Failure(error: Error)
}
Interfaces are always parameterized by a Self type, but in some cases they will have additional parameters.
interface AddWith(U:! type);
Interfaces without parameters may only be implemented once for a given type, but a type can have distinct implementations of AddWith(i32) and AddWith(BigInt).
Parameters to an interface determine which implementation is selected for a type, in contrast to associated constants which are determined by the implementation of an interface for a type.
An impl declaration may be parameterized by adding forall [compile-time parameter list] after the impl keyword introducer, as in:
impl forall [T:! Printable] Vector(T) as Printable;
impl forall [Key:! Hashable, Value:! type]
HashMap(Key, Value) as Has(Key);
impl forall [T:! Ordered] T as PartiallyOrdered;
impl forall [T:! ImplicitAs(i32)] BigInt as AddWith(T);
impl forall [U:! type, T:! As(U)]
Optional(T) as As(Optional(U));
Generic implementations can create a situation where multiple impl definitions apply to a given type and interface query. The specialization rules pick which definition is selected. These rules ensure:
- Implementations have coherence, so the same implementation is always selected for a given query.
- Libraries will work together as long as they pass their separate checks.
- A generic function can assume that some impl will be successfully selected if it can see an impl that applies, even though another more-specific impl may be selected.
Implementations may be marked final to indicate that they may not be specialized, subject to some restrictions.
References:
- Generic or parameterized impl declarations
- Proposal #624: Coherence: terminology, rationale, alternatives considered
- Proposal #920: Generic parameterized impls (details 5)
- Proposal #983: Generics details 7: final impls
- Question-for-leads issue #1192: Parameterized impl syntax
- Proposal #1327: Generics: impl forall
Carbon generics have a number of other features, including:
- Named constraints may be used to disambiguate when combining two interfaces that have name conflicts. Named constraints define facet types, and may be implemented and otherwise used in place of an interface.
- Template constraints are a kind of named constraint that can contain structural requirements. For example, a template constraint could match any type that has a function with a specific name and signature without any explicit declaration that the type implements the constraint. Template constraints may only be used as requirements for template parameters.
- An adapter type is a type with the same data representation as an existing type, so you may cast between the two types, but can implement different interfaces or implement interfaces differently.
- Additional requirements can be placed on the associated facets of an interface using where constraints.
- Implied constraints allow some constraints to be deduced and omitted from a function signature.
- Planned dynamic erased types can hold any value with a type implementing an interface, and allow the functions in that interface to be called using dynamic dispatch, for some interfaces marked "dyn-safe". Note: Provisional.
- Planned variadics support variable-length parameter lists. Note: Provisional.
Determining whether two types must be equal in a checked-generic context is in general undecidable, as has been shown in Swift.
To make compilation fast, the Carbon compiler will limit its search to a depth of 1, only identifying types as equal if there is an explicit declaration that they are equal in the code, such as in a where constraint. There will be situations where two types must be equal as the result of combining these facts, but the compiler will return a type error since it did not realize they are equal due to the limit of the search. An observe...== declaration may be added to describe how two types are equal, allowing more code to pass type checking.
An observe declaration showing types are equal can increase the set of interfaces the compiler knows that a type implements. It is also possible that knowing a type implements one interface implies that it implements another, from an interface requirement or generic implementation. An observe...impls declaration may be used to observe that a type implements an interface.
Each use of an operator in an expression is translated into a call to a method of an interface. For example, if x has type T and y has type U, then x + y is translated into a call to x.(AddWith(U).Op)(y). So overloading of the + operator is accomplished by implementing interface AddWith(U) for type T. In order to support implicit conversion of the first operand to type T and the second operand to type U, add the like keyword to both types in the impl declaration, as in:
impl like T as AddWith(like U) where .Result = V {
// `Self` is `T` here
fn Op[self: Self](other: U) -> V { ... }
}
When the operand types and result type are all the same, this is equivalent to implementing the Add interface:
impl T as Add {
fn Op[self: Self](other: Self) -> Self { ... }
}
The interfaces that correspond to each operator are given by:
- Arithmetic:
  - -x: Negate
  - x + y: Add or AddWith(U)
  - x - y: Sub or SubWith(U)
  - x * y: Mul or MulWith(U)
  - x / y: Div or DivWith(U)
  - x % y: Mod or ModWith(U)
- Bitwise and shift operators:
  - ^x: BitComplement
  - x & y: BitAnd or BitAndWith(U)
  - x | y: BitOr or BitOrWith(U)
  - x ^ y: BitXor or BitXorWith(U)
  - x << y: LeftShift or LeftShiftWith(U)
  - x >> y: RightShift or RightShiftWith(U)
- Comparison:
  - x == y, x != y: overloaded by implementing Eq or EqWith(U)
  - x < y, x > y, x <= y, x >= y: overloaded by implementing Ordered or OrderedWith(U)
- Conversion:
  - x as U is rewritten to use the As(U) interface
  - Implicit conversions use ImplicitAs(U)
- Indexing:
  - x[y] is rewritten to use the IndexWith or IndirectIndexWith interface.
- TODO: Dereference: *p
- TODO: Move: ~x
- TODO: Function call: f(4)
The logical operators cannot be overloaded.
Operators that result in reference expressions, such as dereferencing *p and indexing a[3], have interfaces that return the address of the value. Carbon automatically dereferences the pointer to form the reference expression.
Operators that can take multiple arguments, such as the function call operator f(4), have a variadic parameter list. TODO: Variadics are still provisional.
Whether and how a value supports other operations, such as being copied, swapped, or set into an unformed state, is also determined by implementing corresponding interfaces for the value's type.
References:
- Operator overloading
- Proposal #702: Comparison operators
- Proposal #820: Implicit conversions
- Proposal #845: as expressions
- Question-for-leads issue #1058: How should interfaces for core functionality be named?
- Proposal #1083: Arithmetic expressions
- Proposal #1191: Bitwise operators
- Proposal #1178: Rework operator interfaces
There are some situations where the common type for two types is needed:
- A conditional expression like if c then t else f returns a value with the common type of t and f.
- If there are multiple parameters to a function with a type parameter, it will be set to the common type of the corresponding arguments, as in:
fn F[T:! type](x: T, y: T);
// Calls `F` with `T` set to the
// common type of `G()` and `H()`:
F(G(), H());
- The inferred return type of a function with auto return type is the common type of its return statements.
The common type is specified by implementing the CommonTypeWith interface:
// Common type of `A` and `B` is `C`.
impl A as CommonTypeWith(B) where .Result = C { }
The common type is required to be a type that both types have an implicit conversion to.
References:
- if expressions
- Proposal #911: Conditional expressions
- Question-for-leads issue #1077: find a way to permit impls of CommonTypeWith where the LHS and RHS type overlap
Interoperability, or interop, is the ability to call C and C++ code from Carbon code and the other way around. This ability achieves two goals:
- Allows sharing a code and library ecosystem with C and C++.
- Allows incremental migration to Carbon from C and C++.
Carbon's approach to interop is most similar to Java/Kotlin interop, where the two languages are different, but share enough of a runtime model that data from one side can be used from the other. For example, C++ and Carbon will use the same memory model.
The design for interoperability between Carbon and C++ hinges on:
- The ability to interoperate with a wide variety of code, such as classes/structs and templates, not just free functions.
- A willingness to expose the idioms of C++ into Carbon code, and the other way around, when necessary to maximize performance of the interoperability layer.
- The use of wrappers and generic programming, including templates, to minimize or eliminate runtime overhead.
This feature will have some restrictions; only a subset of Carbon APIs will be available to C++ and a subset of C++ APIs will be available to Carbon.
- To achieve simplification in Carbon, its programming model will exclude some rarely used and complex features of C++. For example, there will be limitations on multiple inheritance.
- C or C++ features that compromise the performance of code that doesn't use that feature, like RTTI and exceptions, are in particular subject to revision in Carbon.
The goals for interop include:
- Support mixing Carbon and C++ toolchains
- Compatibility with the C++ memory model
- Minimize bridge code
- Unsurprising mappings between C++ and Carbon types
- Allow C++ bridge code in Carbon files
- Carbon inheritance from C++ types
- Support use of advanced C++ features
- Support basic C interoperability
The non-goals for interop include:
- Full parity between a Carbon-only toolchain and mixing C++/Carbon toolchains
- Never require bridge code
- Convert all C++ types to Carbon types
- Support for C++ exceptions without bridge code
- Cross-language metaprogramming
- Offer equivalent support for languages other than C++
Note: This is provisional, no design for importing C++ has been through the proposal process yet.
A C++ library header file may be imported into Carbon using an import declaration of the special Cpp package.
// like `#include "circle.h"` in C++
import Cpp library "circle.h";
This adds the names from circle.h into the Cpp namespace. If circle.h defines some names in a namespace shapes { ... } scope, those will be found in Carbon's Cpp.shapes namespace.
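As an illustration, a circle.h along the following lines would, under this provisional design, make shapes::Circle and shapes::Area visible in Carbon as Cpp.shapes.Circle and Cpp.shapes.Area. The header contents are hypothetical, written in plain C++.

```cpp
// Hypothetical contents of "circle.h".
#ifndef CIRCLE_H_
#define CIRCLE_H_

namespace shapes {

struct Circle {
  double radius;
};

// `inline` so the definition can live in the header.
inline double Area(const Circle& c) {
  return 3.14159265358979323846 * c.radius * c.radius;
}

}  // namespace shapes

#endif  // CIRCLE_H_
```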
In the other direction, Carbon packages can export a header file to be #included from C++ files.
// like `import Geometry` in Carbon
#include "geometry.carbon.h"
Generally Carbon entities will be usable from C++ and C++ entities will be usable from Carbon. This includes types, functions, and constants. Some entities, such as Carbon interfaces, won't be able to be translated directly.
C and C++ macros that define constants will be imported as constants. Otherwise, C and C++ macros will be unavailable in Carbon. C and C++ typedefs would be translated into type constants, as if declared using a let.
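The hypothetical C++ header below annotates each declaration with how it would be treated under the provisional rules just described. The names are invented for illustration, and the exact form the imported constants take in Carbon is not specified here.

```cpp
#include <cstdint>

// Defines a constant: would be imported into Carbon as a constant.
#define MAX_SHAPES 64

// A function-like macro does not define a constant, so it would be
// unavailable in Carbon.
#define SQUARE(x) ((x) * (x))

// Would be translated into a type constant, as if declared using `let`.
typedef std::int32_t ShapeId;
```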
Carbon functions and types that satisfy some restrictions may be annotated as exported to C as well, like C++'s extern "C" marker.
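For reference, this is what C++'s extern "C" marker does on the C++ side. It gives a function C language linkage, so its symbol name is not C++-mangled and C code can call it. Its parameter and return types should be ones C can represent, such as a struct with standard layout. Point and ManhattanLength are hypothetical names used only for this plain C++ sketch.

```cpp
#include <type_traits>

// A plain struct whose layout C can describe identically.
struct Point {
  int x;
  int y;
};
static_assert(std::is_standard_layout_v<Point>,
              "Point must have a C-compatible layout");

// C language linkage: a C caller would declare this as
// `int ManhattanLength(struct Point p);`.
extern "C" int ManhattanLength(Point p) {
  return (p.x < 0 ? -p.x : p.x) + (p.y < 0 ? -p.y : p.y);
}
```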
Note: This reflects goals and plans. No specific design for the implementation has been through the proposal process yet.
Carbon itself will not have a stable ABI for the language as a whole, and most language features will be designed around not having any ABI stability. Instead, we expect to add dedicated language features that are specifically designed to provide an ABI-stable boundary between two separate parts of a Carbon program. These ABI-resilient language features and API boundaries will be opt-in and explicit. They may also have functionality restrictions to make them easy to implement with strong ABI resilience.
When interoperating with already compiled C++ object code or shared libraries, the C++ interop may be significantly less feature rich than otherwise. This is an open area for us to explore, but we expect to require re-compiling C++ code in order to get the full ergonomic and performance benefits when interoperating with Carbon. For example, recompilation lets us ensure Carbon and C++ can use the same representation for key vocabulary types.
However, we expect to have full support for the C ABI when interoperating with already-compiled C object code or shared libraries. We expect Carbon's bridge code functionality to cover similar use cases as C++'s extern "C" marker in order to provide full bi-directional support here. The functionality available across this interop boundary will of course be restricted to what is expressible in the C ABI, and types may need explicit markers to have guaranteed ABI compatibility.
Note: This is provisional, no design for this has been through the proposal process yet.
Operator overloading is supported in Carbon, but is done by implementing an interface instead of defining a method or nonmember function as in C++.
Carbon types implementing an operator overload using an interface should get the corresponding operator overload in C++. So implementing ModWith(U) in Carbon for a type effectively implements operator% in C++ for that type. This also works in the other direction, so C++ types implementing an operator overload are automatically considered to implement the corresponding Carbon interface. So implementing operator% in C++ for a type also implements interface ModWith(U) in Carbon. However, there may be edge cases around implicit conversions or overload selection that don't map completely into Carbon.
In some cases, the operation might be written differently in the two languages. In those cases, they are matched according to which operation has the most similar semantics rather than using the same symbols. For example, the ^x operation and BitComplement interface in Carbon correspond to the ~x operation and operator~ function in C++. Similarly, the ImplicitAs(U) Carbon interface corresponds to implicit conversions in C++, which can be written in multiple different ways. Other C++ customization points like swap will correspond to a Carbon interface, on a case-by-case basis.
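The plain C++ sketch below shows the C++ side of two of these pairings, using a hypothetical Flags type. Per the correspondence described above, its operator~ would match Carbon's ^x and BitComplement. Its non-explicit converting constructor is one of the ways C++ spells an implicit conversion, here from std::uint8_t (Carbon's u8) to Flags, which would match an ImplicitAs(Flags) conversion.

```cpp
#include <cstdint>

struct Flags {
  std::uint8_t bits;

  // Not `explicit`: an implicit conversion from `std::uint8_t`,
  // the kind of conversion that corresponds to Carbon's `ImplicitAs`.
  constexpr Flags(std::uint8_t b) : bits(b) {}
};

// C++ `~x`, corresponding to Carbon's `^x` and `BitComplement`.
constexpr Flags operator~(Flags f) {
  return Flags(static_cast<std::uint8_t>(~f.bits));
}

// Takes `Flags`; callable with a plain `std::uint8_t` via the conversion.
constexpr std::uint8_t Bits(Flags f) { return f.bits; }
```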
Some operators will only exist or be overridable in C++, such as logical operators or the comma operator. In the unlikely situation where those operators need to be overridden for a Carbon type, that can be done with a nonmember C++ function.
Carbon interfaces with no C++ equivalent, such as CommonTypeWith(U), may be implemented for C++ types out-of-line in Carbon code. To satisfy the orphan rule (1, 2), each C++ library will have a corresponding Carbon wrapper library that must be imported instead of the C++ library if the Carbon wrapper exists. TODO: Perhaps it will automatically be imported, so a wrapper may be added without requiring changes to importers?
Note: This is provisional, no design for this has been through the proposal process yet.
Carbon supports both checked and template generics. This provides a migration path for C++ template code:
- C++ template -> Carbon template: This involves migrating the code from C++ to Carbon. If that migration is faithful, the change should be transparent to callers.
- Carbon template -> Carbon template with constraints: Constraints may be added one at a time. Adding a constraint never changes the meaning of the code as long as it continues to compile. Compile errors will point to types for which an implementation of missing interfaces is needed. A temporary template implementation of that interface can act as a bridge during the transition.
- Carbon template with constraints -> Carbon checked generic: Once all callers work after all constraints have been added, the template parameter may be switched to a checked generic.
Carbon will also provide direct interop with C++ templates in many ways:
- Ability to call C++ templates and use C++ templated types from Carbon.
- Ability to instantiate a C++ template with a Carbon type.
- Ability to instantiate a Carbon generic with a C++ type.
We expect the best interop in these areas to be based on a Carbon-provided C++ toolchain. However, even when using Carbon's generated C++ headers for interop, we will include the ability where possible to use a Carbon generic from C++ as if it were a C++ template.
Note: This is provisional, no design for this has been through the proposal process yet.
The Carbon integer types, like i32 and u64, are considered equal to the corresponding fixed-width integer types in C++, like int32_t and uint64_t, provided by <stdint.h> or <cstdint>. The basic C and C++ integer types like int, char, and unsigned long are available in Carbon inside the Cpp namespace given an import Cpp; declaration, with names like Cpp.int, Cpp.char, and Cpp.unsigned_long. C++ types are considered different if C++ considers them different, so C++ overloads are resolved the same way. Carbon conventions for implicit conversions between integer types apply here, allowing them whenever the numerical value for all inputs may be preserved by the conversion.
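The plain C++ sketch below shows why keeping C++'s type distinctions matters. long and long long are distinct types in C++ even on platforms where both are 64 bits wide, and overload resolution depends on that distinction. The TypeName overload set is hypothetical.

```cpp
#include <cstdint>
#include <string>
#include <type_traits>

// Two separate overloads, even where `long` and `long long` have
// the same width.
std::string TypeName(long) { return "long"; }
std::string TypeName(long long) { return "long long"; }

static_assert(!std::is_same_v<long, long long>);

// The fixed-width aliases, which correspond to Carbon's `i32` and `u64`.
static_assert(sizeof(std::int32_t) == 4);
static_assert(sizeof(std::uint64_t) == 8);
```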
Other C and C++ types are equal to Carbon types as follows:
| C or C++ | Carbon |
|---|---|
| bool | bool |
| float | f32 |
| double | f64 |
| T* | Optional(T*) |
| T[4] | [T; 4] |
Further, C++ reference types like T& will be translated to T* in Carbon, which is Carbon's non-null pointer type.
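To make the mapping concrete, the plain C++ sketch below has comments giving the Carbon parameter types these declarations would have under the provisional table above. The function names are hypothetical. A reference can never be null, so it fits Carbon's non-null pointer. A raw pointer may be null, so callers have to handle that case, which Optional makes explicit.

```cpp
struct Circle {
  double radius;
};

// `Circle&` would be seen from Carbon as a non-null `Circle*`.
void Grow(Circle& c, double amount) { c.radius += amount; }

// `Circle*` may be null, so it would be seen as `Optional(Circle*)`.
double RadiusOr(Circle* c, double fallback) {
  return c != nullptr ? c->radius : fallback;
}
```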
Carbon will work to have idiomatic vocabulary view types for common data structures, like std::string_view and std::span, that map transparently between C++ and the Carbon equivalents. This will include data layout so that even pointers to these types translate seamlessly, contingent on a suitable C++ ABI for those types, potentially by re-compiling the C++ code with a customized ABI. We will also explore how to expand coverage to similar view types in other libraries.
However, Carbon's containers will be distinct from the C++ standard library containers in order to maximize our ability to improve performance and leverage language features like checked generics in their design and implementation.
Where possible, we will also try to provide implementations of Carbon's standard library container interfaces for the relevant C++ container types so that they can be directly used with checked-generic Carbon code. This should allow checked-generic code in Carbon to work seamlessly with both Carbon and C++ containers without performance loss or constraining the Carbon container implementations. In the other direction, Carbon containers will satisfy C++ container requirements, so templated C++ code can operate directly on Carbon containers as well.
Carbon has single inheritance, allowing C++ classes that use inheritance to be migrated. The data representation will be consistent, so that Carbon classes may inherit from C++ classes and the other way around, even with virtual methods.
C++ multiple inheritance and CRTP will be migrated using a combination of Carbon features. Carbon mixins support implementation reuse and Carbon interfaces allow a type to implement multiple APIs. However, there may be limits on the degree of interop available with multiple inheritance across the C++ <-> Carbon boundaries.
Carbon dyn-safe interfaces may be exported to C++ as an abstract base class. The reverse operation is also possible using a proxy object implementing a C++ abstract base class and holding a pointer to a type implementing the corresponding interface.
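The reverse direction described above, a proxy implementing a C++ abstract base class by forwarding to an object that implements the corresponding interface, can be pictured in plain C++. All names here (Shape, CarbonSquare, ShapeProxy, TotalArea) are hypothetical, and CarbonSquare merely stands in for a Carbon type implementing a dyn-safe interface; how Carbon will actually expose such types to C++ is not specified here.

```cpp
#include <string>

// An abstract base class declared in existing C++ code.
class Shape {
 public:
  virtual ~Shape() = default;
  virtual double Area() const = 0;
  virtual std::string Name() const = 0;
};

// Hypothetical stand-in for a Carbon type implementing a corresponding
// interface. Note that it does not inherit from the C++ base class.
struct CarbonSquare {
  double side;
  double Area() const { return side * side; }
  std::string Name() const { return "square"; }
};

// Proxy: implements the C++ abstract base class by forwarding each virtual
// call through a pointer to the object that implements the interface.
template <typename Impl>
class ShapeProxy final : public Shape {
 public:
  explicit ShapeProxy(const Impl* impl) : impl_(impl) {}
  double Area() const override { return impl_->Area(); }
  std::string Name() const override { return impl_->Name(); }

 private:
  const Impl* impl_;  // Not owned; the pointee must outlive the proxy.
};

// Existing C++ code that only knows about the abstract base class.
double TotalArea(const Shape& a, const Shape& b) { return a.Area() + b.Area(); }
```

The proxy holds a pointer rather than a copy, so the implementing object keeps its identity and any state changes are visible through the C++ interface.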
References: TODO
Note: Everything in this section is provisional and forward looking.
Carbon's premise is that C++ users can't give up performance to get safety. Even if some isolated users can make that tradeoff, they share code with performance-sensitive users. Any path to safety must preserve the performance of C++ today. This rules out garbage collection and many other options. The only well-understood mechanism for achieving safety without giving up performance is compile-time safety. The leading example of how to achieve this is Rust.
The difference between Rust's approach and Carbon's is that Rust starts with safety and Carbon starts with migration. Rust supports interop with C, and there is ongoing work to improve the C++ interop story and develop migration tools. However, there is a large gap in programming models between the two languages, which generally requires revising the architecture. So far, the common pattern in the Rust community has been to "rewrite it in Rust" (1, 2, 3). Carbon's approach is to focus on migration from C++, including seamless interop, and then incrementally improve safety.
The first impact of the safety strategy on Carbon's design is the need for the building blocks of this level of compile-time safety. We look at existing languages like Rust and Swift to understand what fundamental capabilities they ended up needing. The two components that stand out are:
- Expanded type system that includes more semantic information.
- More pervasive use of type system abstractions (typically checked generics).
For migrating C++ code, we also need the ability to add features and migrate code to use those new features incrementally and over time. This requires designing the language with evolution baked in on day one. This impacts a wide range of features:
- At the lowest level, a simple and extensible syntax and grammar.
- Tools and support for adding and removing APIs.
- Scalable migration strategies, including tooling support.
Rust shows the value of expanded semantic information in the type system, such as precise lifetimes. This is hard to do in C++ because it has too many kinds of references and pointers, which multiplies the complexity of the type system. Carbon is attempting to compress C++'s type variations into just values and pointers.
Rust also shows the value of functions parameterized by lifetimes. Since lifetimes are only used to establish safety properties of the code, there is no reason to pay the cost of monomorphization for those parameters. So we need a checked-generics system that can reason about code before it is instantiated, unlike C++ templates.
In conclusion, there are two patterns in how Carbon diverges from C++:
- Simplify and remove things to create space for new safety features. This trivially requires breaking backwards compatibility.
- Re-engineer foundations to model and enforce safety. This is complex and difficult to do in C++ without first simplifying the language.
This leads to Carbon's incremental path to safety:
- Keep your performance, your existing codebase, and your developers.
- Adopt Carbon through a scalable, tool-assisted migration from C++.
- Address initial, easy safety improvements starting day one.
- Shift the Carbon code onto an incremental path towards memory safety over the next decade.
References: Safety strategy
TODO:
TODO: References need to be evolved. Needs a detailed design and a high level summary provided inline.
Carbon provides metaprogramming facilities that look similar to regular Carbon code. These are structured, and do not offer arbitrary inclusion or preprocessing of source text such as C and C++ do.
References: Metaprogramming
TODO: References need to be evolved. Needs a detailed design and a high level summary provided inline.
References: Pattern matching
For now, Carbon does not have language features dedicated to error handling, but we would consider adding some in the future. At this point, errors are represented using choice types like Result and Optional.
This is similar to the story for Rust, which started using Result, then added the ? operator for convenience, and is now considering adding more (1, 2).
Carbon provides some higher-order abstractions of program execution, as well as the critical underpinnings of such abstractions.
TODO:
TODO:
TODO:
TODO: