var statement

Pull request

Problem

The var statement is noted in the language overview, but is provisional — no justification has been provided. Variable declarations are fundamental, and it should be clear to what degree the current syntax is adopted.

It's expected that after the adoption of this proposal, var syntax will still not be finalized: the proposal is an experiment.

Constants

Although constants are naturally related to variables, this proposal does not include any syntax for constants. This is expected to be revisited later.

Background

Terminology

In this proposal, "variable" is defined as an identifier referring to a mutable value.

Out of scope features

Questions have come up about:

  • The type system
  • Type checking
  • Scoping

All of these are important features. However, in the interest of small proposals, they are out of scope of this proposal.

Language comparison

Variables are standard in many languages. Some forms to consider are:

  • C++:

    int x;
    int y = 0;
    bool a = true, *b = nullptr, c;
  • Python:

    x = None
    y = 0
    z: int = 7  # Added by PEP 526.
  • Swift:

    var x = 0
    var y: Int = 0
    var z: Int
  • TypeScript:

    let y: Number = 0;
    var x = 0;  // Legacy from JavaScript.
  • Rust:

    let mut x = 0;
    let mut y: i32 = 0;
    let mut z: i32;
  • Go:

    var x = 0
    y := 0
    var z int
    var a, b = true, false
  • Visual Basic:

    Dim x As Integer = 3

Proposal

Carbon should adopt var <type> <identifier> [ = <value> ]; syntax for variable statements.

Considerations for this syntax are:

  • Type and identifier ordering: The ordering of <type> before <identifier> reflects the typical C++ ordering syntax.
    • Some C++ syntax can put type information after the identifier, such as int x[6];. Carbon should be expected to place that as part of the type.
  • var introducer keyword: The use of var makes it clearer for readers to skim and see where variables are being declared. It also reduces complexity and potential ambiguity in language parsing.
  • One variable: In C++, multiple variables can be declared in a single statement. An equivalent Carbon syntax may end up looking like var (Int x, String y) = (0, "foo");, so limiting to one declaration is not fundamentally restrictive. However, by breaking with C++ and requiring the full type to be specified with each identifier, we achieve two important things:
    • It's clear what the full type is, preventing difficult-to-read statements with a mix of stack variables, pointers, and similar.
    • As the language grows, a function returning a tuple may be assigned to distinctly named and typed variables.
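
As a rough C++ illustration of the two points above (this is not Carbon code), the sketch below shows how one C++ declaration statement can mix a pointer and a non-pointer, and how a tuple result can be unpacked into variables that each carry their full type. `MakeRecord` and `UnpackRecord` are hypothetical names introduced only for this example.

```cpp
#include <cassert>
#include <string>
#include <tuple>
#include <type_traits>

// One C++ statement, two different types: the '*' attaches to the declarator
// `ptr`, not to the shared `int`, so `value` is a plain int.
int *ptr = nullptr, value = 0;
static_assert(std::is_same<decltype(ptr), int*>::value, "ptr is int*");
static_assert(std::is_same<decltype(value), int>::value, "value is int");

// Hypothetical helper returning a tuple, standing in for a Carbon function.
std::tuple<int, std::string> MakeRecord() {
  return std::make_tuple(0, std::string("foo"));
}

// Each target variable is declared with its full type before unpacking.
// Returns the unpacked integer and asserts that the string half arrived.
int UnpackRecord() {
  int x = -1;
  std::string y;
  std::tie(x, y) = MakeRecord();
  assert(y == "foo");
  return x;
}
```

Calling `UnpackRecord()` returns the integer half of the tuple. The explicit `int x` and `std::string y` declarations play the role that `(Int x, String y)` plays in the possible Carbon form mentioned above.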

Experiment: The ordering of type and identifier will be researched. For more information, see the alternative.

Executable semantics form

Example bison syntax for executable semantics is:

statement:
  "var" expression identifier optional_assignment ";"
| /* preexisting statements elided */
;

optional_assignment:
  /* empty */
| "=" expression
;
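
To make the shape of this rule concrete, here is a minimal hand-written recognizer in C++. It is only a sketch under simplifying assumptions: the type and the value are single tokens, whereas the grammar above allows arbitrary expressions for both. `VarStatement`, `Tokenize`, `ParseVar`, and `CheckParseVarExamples` are hypothetical names and are not part of executable semantics.

```cpp
#include <cassert>
#include <cctype>
#include <string>
#include <vector>

// Parsed form of one statement; `ok` is false on any syntax error.
struct VarStatement {
  bool ok = false;
  std::string type;
  std::string name;
  std::string init;  // Empty when optional_assignment is empty.
};

// Splits on whitespace and treats '=' and ';' as standalone tokens.
std::vector<std::string> Tokenize(const std::string& source) {
  std::vector<std::string> tokens;
  std::string current;
  for (char c : source) {
    const bool punct = (c == '=' || c == ';');
    if (punct || std::isspace(static_cast<unsigned char>(c))) {
      if (!current.empty()) {
        tokens.push_back(current);
        current.clear();
      }
      if (punct) tokens.push_back(std::string(1, c));
    } else {
      current += c;
    }
  }
  if (!current.empty()) tokens.push_back(current);
  return tokens;
}

// Recognizes: "var" type identifier [ "=" value ] ";"
// Assumption: type and value are single tokens.
VarStatement ParseVar(const std::string& source) {
  const std::vector<std::string> t = Tokenize(source);
  auto is_punct = [](const std::string& s) { return s == "=" || s == ";"; };

  VarStatement result;
  const bool plain = t.size() == 4;
  const bool with_init = t.size() == 6 && t[3] == "=" && !is_punct(t[4]);
  if (!plain && !with_init) return result;
  if (t.front() != "var" || t.back() != ";") return result;
  if (is_punct(t[1]) || is_punct(t[2])) return result;

  result.type = t[1];
  result.name = t[2];
  if (with_init) result.init = t[4];
  result.ok = true;
  return result;
}

// Example checks; call from main().
void CheckParseVarExamples() {
  assert(ParseVar("var Int x = 0;").ok);
  assert(ParseVar("var Int x;").init.empty());
  assert(!ParseVar("Int x;").ok);  // Missing the var introducer.
}
```

Because every statement must begin with the `var` token, the recognizer can reject a non-declaration after inspecting the first token, which is the parsing benefit the introducer keyword is meant to provide.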

Rationale based on Carbon's goals

Carbon needs variables in order to be writable by developers. That functionality needs a syntax.

Relevant goals are:

Caveats

var name may change

The name var could still change. However, it's used with similar meaning in other languages including Swift, Go, and TypeScript, and so it's reasonable to expect it will not.

Changing to let mut

The idea that var may change includes the possibility that var may become something like let mut in Rust. However, this is not assumed by this proposal.

Multiple identifiers in one statement

Although var (Int x, String y) = (0, "foo"); syntax is mentioned, this proposal is not intended to propose such a syntax. It's noted primarily to explain the likely path, and to show that this proposal does not rule out such abbreviated syntax. That should probably be covered as part of tuples.

Update provisional pattern matching syntax

Pattern matching syntax in the overview uses syntax similar to Int: x. As part of removing the colon between type and identifier from the provisional var syntax, that syntax should be changed to remove the :. Details should be resolved as part of the eventual pattern matching proposal, but if changes are needed to add a separator, the var syntax should be updated to remain consistent. The precise form of that implementation will be part of normal Carbon evolution.

For example, replacing fn Sum(Int: a, Int: b) -> Int; with fn Sum(Int a, Int b) -> Int; and case (Int: p, (Float: x, Float: _)) if (p < 13) => { with case (Int p, (Float x, Float _)) if (p < 13) => {.

Update provisional $ syntax

Variables using Type:$ and similar should drop the :, as in Type$.

Alternatives considered

Noted alternatives are key differences from C++ variable declaration syntax.

No var introducer keyword

The intent of the var statement is to improve readability and parsability, and it's related to fn for functions. Although code is more succinct without introducers, the noted benefits are expected to be significant. Most other modern languages use similar introducers, and so this break from C++ is adopting a different norm.

Name of the var statement introducer

var is used with a similar meaning in several other languages, including Swift, Go, and JavaScript. let is used by TypeScript. let mut is used by Rust, with let used for constants (this use of let alone is consistent with other languages). In general var appears to be a more common choice.

Colon between type and identifier

The use of a colon (:) between the type and identifier is intended to reduce potential parsing ambiguity, and to make reading code easier. As proposed, there is no colon between the type and identifier.

Syntax ambiguity

Using a colon or other separator could make it easier to avoid certain kinds of ambiguities. For example, suppose we decided to use a postfix * operator to form pointer types, as in C++. In such a setup, we could have code like the following:

var T * x = 3;
var T * x = 3 y;

In the first statement, * is a unary operator and so T* is the type and x is the identifier. However, in the second statement, * is a binary operator and so T * x = 3 is the type, and y is the identifier; the resulting compiler errors may be confusing to users. Furthermore, the place of 3 could be taken by an arbitrarily complex expression; this could cause resolving the ambiguity between unary and binary * to require unbounded look-ahead, adversely impacting code compilation time goals.

Consider instead the code:

var T *: x = 3;
var T * x = 3: y;

The colon makes it unambiguous whether the * in each case is unary or binary with only one token of look-ahead. More importantly, this syntax immediately calls the reader's attention to the fact that the second declaration has a highly unusual type.

There are other ways of resolving ambiguities like this. For example, we could avoid allowing the same operator to have both postfix and infix forms, or we could distinguish them by the presence or absence of whitespace. However, even if we avoid formal ambiguity by such means, a separator like : may be useful for reducing visual ambiguity for human readers.
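
C++ already has a related visual ambiguity, which may help show why this matters to readers. In the sketch below, the same tokens `T * x` are a pointer declaration in one scope and a multiplication in another, depending only on what `T` names. This is an analogy in C++, not a statement about Carbon's grammar; the namespace and function names are hypothetical.

```cpp
namespace as_type {
struct T {};

// T names a type here, so `T * x` declares x as a pointer to T.
constexpr bool DeclaresPointer() {
  T * x = nullptr;
  return x == nullptr;
}
}  // namespace as_type

namespace as_value {
// T and x are ints here, so the same tokens `T * x` are a multiplication.
constexpr int Multiplies() {
  int T = 6;
  int x = 7;
  return T * x;
}
}  // namespace as_value
```

A reader has to know what `T` is before the line can be understood, which is the kind of burden an explicit separator is meant to reduce.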

Confusion with other languages and alternatives

One of the disadvantages of : is that with var Int: x, ordering is inconsistent with other languages using :, such as Rust and Swift, which would say var x: Int.

It may be worth considering other syntax options. A few to consider are:

var(Int) x;
var Int# x;
var Int @x;
var Int -> x;

These aren't part of the proposed syntax mainly because it's not clear any would gain as much support as :. However, this is an opportunity to make suggestions and see if there's a good compromise.

Use in pattern matching

The old draft pattern matching proposal used : as a separator. In pattern matching, the : may be particularly important to distinguish between value matching and type name matching. However, the pattern matching proposal should examine these choices and alternatives before we reach a conclusion that : is necessary for pattern matching. Per syntax ambiguity, it is expected that : has some advantages, but it may not turn out to make a difference to the compiler due to prevailing constraints on type expression syntax.

This proposal suggests we update provisional pattern matching syntax to match the proposed var syntax.

Advantages and disadvantages

Advantages:

  • Reduces syntax ambiguity.
    • This should improve readability and parsability.
    • It should make it easier to debug issues during development.

Disadvantages:

  • Deviates from the common syntax used by most languages with the type before the identifier, including C, C++, Java, and C#.
    • Changing from C++ is especially significant because of Carbon's goals for interoperability and migration which will mean an especially large portion of Carbon developers will be actively reading both Carbon and C++ code.
  • Other notable languages that use : in variable statements, including Swift, put the type after the identifier.

Conclusion

Right now the proposal is to not have anything between the type and identifier in order to avoid cross-language ambiguity, and to retain syntax that is closer to C++. However, the ultimate decision may hinge on type and identifier ordering, as well as related future evolution.

Type after identifier

There are many languages that put the type after the identifier. A common format used by Swift and Rust is var x: Int.

It's worth considering the sentence-like readings:

  • var x Int (or var x: Int) may be read as "declare x as an int" or "make a variable x and give it int storage".
  • var Int x may be read as "declare an int called x" or "make a variable with int storage called x".

These readings might be of similar quality, and are presented to offer different perspectives on how to read the possible statement orderings.

Ordering as a way to quickly answer questions

Ordering is essentially a question of pairing identifiers and types. This can be cast as asking which question developers consider more important when reading code:

  1. What is the type of variable x?
  2. What is the identifier of the Int variable?

We assert the first question is the more important one: developers will see an identifier in later code, and want to know its type. However, how do we determine which order is better for this purpose?

Unfortunately, little research has been done on this. All we're aware of right now is a study from an unpublished undergraduate project from Germany. The study was done in Java with 50 students. Its data indicates that it's faster to answer question 1 if the type comes first, and faster to answer question 2 if the identifier comes first. We do not want to make decisions based on the study because it isn't published, studied a small group, and doesn't directly compare possible var syntaxes; however, it still influences our thoughts.

Syntax popularity

When considering what to use for now, we can consider the popularity of various languages. The top 10 on several sources (with percentages noted by sources that have them) are:

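| TIOBE | Pct | GitHut | Pct | PYPL | Pct | Octoverse |
| --- | --- | --- | --- | --- | --- | --- |
| C | 16% | JavaScript | 19% | Python | 30% | JavaScript |
| Java | 11% | Python | 16% | Java | 17% | Python |
| Python | 11% | Java | 11% | JavaScript | 8% | Java |
| C++ | 7% | Go | 8% | C# | 7% | TypeScript |
| C# | 4% | C++ | 7% | C and C++ | 7% | C# |
| Visual Basic | 4% | Ruby | 7% | PHP | 6% | PHP |
| JavaScript | 2% | TypeScript | 7% | R | 4% | C++ |
| PHP | 2% | PHP | 6% | Objective-C | 4% | C |
| SQL | 2% | C# | 4% | Swift | 2% | Shell |
| Assembly | 2% | C | 3% | TypeScript | 2% | Ruby |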

Sources:

For these languages:

  • C, C++, C#, Objective-C, and Java put the identifier after the type.
    • These use <type> <identifier>, with no keyword.
  • Python, Go, TypeScript, Visual Basic, SQL, and Swift put the type after the identifier.
    • Python uses <identifier>: <type>, with no keyword. This was added in Python 3.6, and reflects language evolution.
    • Go uses var <identifier> <type>, with no colon.
    • TypeScript and Swift use var <identifier>: <type>.
    • Visual Basic uses Dim <identifier> As <type>.
    • SQL uses DECLARE @<identifier> AS <type>, where AS is optional.
  • JavaScript, R, Ruby, PHP, Shell, and Assembly language do not specify types in the same ways as other languages.

Type elision

For Carbon, there has been discussion of requiring auto (or a similar explicit syntax marker) instead of allowing developers to elide the type entirely from a var statement. In other words, while var Int x = 0; is valid, and var auto x = 0; is equivalent, there is no form such as var x = 0; which removes the type entirely.

Most languages that write var x: Int also allow eliding the type when assigning a value. For example, Go allows x := 0 and Swift allows var x = 0. As a result, those languages have no need for an auto keyword.

This would be more surprising with var Int x syntax because removing the Int now places the identifier immediately after var, where the type normally is. This may be subtly confusing to developers. However, if auto is required for explicitness, the issue is moot.

Retaining auto does not eliminate the need to consider type elision as part of advantages and disadvantages: if the type is put after the identifier and var x: auto syntax is used, it now becomes an inconsistency with other languages. This inconsistency would be a Carbon innovation that may confuse developers, leading to long-term pressure to remove auto for consistency with similar languages, and thus a disadvantage.
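
C++ already follows the "explicit marker" model described here: a declaration must carry either a type or `auto` in the type position. The short sketch below checks that the two spellings yield the same type. The variable names are hypothetical, and the mapping to Carbon's `var Int x` and `var auto x` forms is an analogy only.

```cpp
#include <type_traits>

int explicit_x = 0;  // Analogous to `var Int x = 0;`.
auto deduced_x = 0;  // Analogous to `var auto x = 0;`.

// Both declarations produce an int; only the spelling differs.
static_assert(
    std::is_same<decltype(explicit_x), decltype(deduced_x)>::value,
    "auto deduces the same type as the explicit declaration");
```

In both lines something still occupies the position before the identifier, which is the property that keeps `var Int x` and `var auto x` visually consistent.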

Advantages and disadvantages

Advantages:

  • Consistent with languages that put a : in the type declaration.
    • Notable languages using x: Int syntax include Swift, Rust, Kotlin and TypeScript. Go is similar but does not include :.
    • Swift puts argument labels before variable declaration. Keeping the identifier first allows consistency with Swift's argument label syntax while also keeping names adjacent.
  • More opportunity to unify a concept of putting the identifier immediately after the keyword.
    • fn, import, and package will all have the identifier immediately after the keyword, with non-identifier content following.
      • It's expected that a typical function declaration will look like fn Foo(Int bar), with var only added when storage for a copy is required. Thus, this advantage is primarily about the function identifier Foo, not parameter identifiers.
    • Other cases, such as alias, are likely flexible and could follow var in concept by putting the resulting name at the end. That is, alias To = From; for consistency with var x: Int; versus alias From as To; similar to var Int x; ordering.
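
For comparison, C++ itself contains both alias orderings discussed in the last point. The sketch below is only an analogy: `NewNameFirst` and `NewNameLast` are hypothetical names, and Carbon's alias syntax is not decided by this proposal.

```cpp
#include <type_traits>

using NewNameFirst = int;  // New name first, like `alias To = From;`.
typedef int NewNameLast;   // New name last, like `alias From as To;`.

// Both spellings introduce another name for the same type.
static_assert(std::is_same<NewNameFirst, NewNameLast>::value,
              "both aliases name int");
```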

Disadvantages:

  • If Carbon doesn't support type elision, it creates an inconsistency with other notable languages using x: Int syntax.
  • The ordering of x: Int is inconsistent with C++ variable syntax.
  • Early signs are that putting the identifier first makes it slower for developers to answer the question, "what is the type of variable x?"
  • Popular languages tend to use int x syntax, including C, Java, C++, and C#. Other notable languages include Groovy and Dart.

Conclusion

We should conduct a larger study on the topic of type and identifier ordering and syntax. Until then, we should adopt C++-like syntax. This meets the migration sub-goal "Familiarity for experienced C++ developers with a gentle learning curve", and allows applying the higher-priority goal "Code that is easy to read, understand, and write" if supporting evidence is found.

Experiment: The ordering of type and identifier will be researched.