Every Elm value has a header of 32 bits in size. It's defined in types.h
-----------------------------------------------------
| tag (4 bits) | size (28 bits) | Elm data |
-----------------------------------------------------
size
is measured in words, where a word is either 32 or 64 bits, depending on the target platform. It makes sense to use words rather than bytes because all values are aligned to word boundaries anyway. For example in a 32-bit system, we'll always place our values at addresses that evenly divide by 4 bytes. Real CPUs are optimised to work faster when pointers are aligned this way.
The only individual value that can get really large in practice is String
. (Lists don't count, they are made up of many Cons cells.) A maximum value of 228-1 for size
corresponds to 1 GB on a 32-bit system or 4 GB on a 64-bit system.
We always use 32-bit headers, even on 64-bit systems. 1GB is large enough, there's no point increasing the header size. Wasm is always 32 bits but since we're using C as an intermediate language, we can also create native 64-bit binaries. That's how I run most of my tests.
To explain how the type tag in the header works, we need to discuss constrained type variables. This is the feature of Elm that allows some functions like ++
, +
and >
, to work on more than one, but not all types.
To facilitate this, we insert a "tag" as metadata into the byte level representation of every Elm value. The tag is a 4-bit number carrying information about the type and memory layout of the value. For example, the low-level implementation for ++
needs to know whether its arguments are Lists or Strings because the memory layout for each is totally different. Using the tag data, it can decide which of two code branches to execute.
Tag | Type | number | comparable | appendable |
---|---|---|---|---|
0 | Int |
✓ | ✓ | |
1 | Float |
✓ | ✓ | |
2 | Char |
✓ | ||
3 | String |
✓ | ✓ | |
4 | List |
✓ | ✓ | |
5 | Tuple2 |
✓ | ||
6 | Tuple3 |
✓ | ||
7 | Custom | |||
8 | Record | |||
9 | Closure |
The remaining 6 possible values (a
→f
) are reserved for Garbage Collector record-keeping data.)
For more details see the header file defining the relevant structs, or see the output of some tests in your browser.
The C language doesn't allow you to pass functions around as values, nor to "partially apply" them. But it does allow function pointers to be passed around as values. We represent an Elm function as a data structure called Closure
that contains a pointer to a C function and pointers to any partially-applied arguments. This structure is used by the "function application" operator Utils_apply
, which implements features like partial application, higher-order functions, and so on.
Let's look at the Wasm representation of the partially applied Elm function (+) 5 : Int -> Int
. This function adds 5 to any integer, and its representation is shown in the diagram below.
The header indicates that it's a Closure
with a size
of 4 words (a "word" being 32 bits). It has one applied value (n_values=1
) and expects 2 values to be applied in total (max_values=2
). The evaluator
field points to the C function Basics_add_eval
, which will be called when the last argument is applied. The values[0]
field points to the partially applied argument literal_int_5
.
A working example of all of this can be the tests for the apply
operator. Read the source or run the tests in your browser. You can also check out a blog post I wrote about how to implement Elm first-class functions in WebAssembly. (The implementation of n_values
and max_values
has changed since the post was written but otherwise it's the same.)
A good intro to Elm extensible records can be found here. In this project they are split into two C structs, Record
and FieldSet
, defined in types.h.
Field names are represented as integer "field IDs". The compiler would convert every field name in the program to a unique ID, using the same kind of optimisation the Elm 0.19 compiler uses to shorten fieldnames in --optimize
mode.
The Record
struct stores only the values, in ascending order of the corresponding field IDs. The field IDs themselves are stored in a FieldSet
, a single structure shared by all values of the same Record type, in ascending order. To access a field by its field ID, we first look up the field ID in the FieldSet
. If it's in the nth position, then the corresponding value will also be in the nth position in the Record itself.
Elm has special functions for accessing records, prefixed by a dot, like .name
, which can be applied to any Record type that contains a field called name
. It's implemented using a Kernel function that takes the field ID as an Elm Int
, and the record itself.
access : Int -> r -> a
access fieldId record =
-- Kernel C code
-- 1. Look up the FieldSet that this record points to
-- 2. Find the index of `fieldId` in that FieldSet (binary search)
-- 3. Return the value found at the value at that same index in `record`
The compiler would insert code to create each accessor function by partially applying the relevant fieldId
to access
function in the generated code.
-- Compiler inserts something roughly equivalent to this to to define `.name`
.name record = access 123 record -- where 123 is the field ID for 'name'
The implementation is in utils.c (see access_eval
). The code is unsafe if the field does not actually exist in the record, but it can only be called in compiler-generated code.
Elm's record update syntax is r2 = { r1 | field1 = newVal1, field2 = newVal2 }
Currently, Elm implements this using a JavaScript function. We do something similar here with a C function called record_update
, found in utils.c. A pseudo-code version is below.
Clone the original record
For each field ID to be updated
Find the index of the field ID in the FieldSet
Change the pointer in the clone at the same index to point at the updated value
Read the source code or run the tests in your browser.
In this project, all values are "boxed" - i.e. they have a header that contains some metadata. They're all stored on the heap, and are referred to via a pointer. This setup makes a lot of sense for more complex value types like lists, tuples, records, strings. But for integers it can be a lot of overhead. The +
operator has to fetch two structures from memory, separate the integer from its header, add the numbers, wrap the new value in a new data structure, and write it back to memory. For a numerical expression like a-(b+c)*d
, or more complex expressions, this can be expensive.
Many language implementations "unbox" integers, so they're represented directly as machine integers without any wrapper or metadata. This can be a big performance gain for some common code patterns, but it requires a lot of book-keeping. It can be hard to tell the difference between integers and pointers, you need some system to keep track of what's what.
In this project I've avoided unboxing integers because it seems like it would be a major piece of work. I'd rather try to build a working implementation first, and optimise later.
However there are some relatively simple compiler optimisations that could reduce the cost of boxing in the nearer term. For a start, we could translate an Elm expression like a-(b+c)*d
into the equivalent expression in C, and box only the final result rather than the result of each subexpression. This optimisation should be limited to just the code generator. In fact the Elm compiler's JS code generator already has some special handling for numerical operators.