There are really two stacks:
-
data stack for function arguments and local values
-
call stack for return addresses and debug data
(top) +---------------+ (lower address) | 2nd local | +---------------+ | 1st local | base +===============+ | 1st non-local | \ +---------------+ > tuple with nonlocals | 2nd non-local | / (args) +===============+ | 1st argument | \ +---------------+ > tuple with arguments | 2nd argument | / (ret) +---------------+ (higher address)
Data stack grows down, to lower addresses.
Arguments are pushed in reversed order, which has two benefits:
-
Functions can consume arguments from top: pull first arg, pull second arg, push result.
-
Tuple arguments are equivalent to unpacked fields. It doesn’t matter if each argument is passed separately or all of them are passed together as a tuple.
Non-locals are pushed as a tuple, above the arguments. They are effectively pushed in reversed order, the same way as the arguments.
Only the base
address is saved in the Call Stack, the other addresses are
computed by the compiler and encoded directly into instructions.
For example, when compiling the function, compiler knows if it has
any non-locals, so it can compute (args)
offset from base
and use that
when referencing arguments. It can compute address of each argument in the
same way.
Everything below base
is static - the addresses don’t change
while the function runs. On the other hand, the locals area above base
is dynamic, and the stack (top)
can move freely.
Call stack is maintained solely by the VM. When a function reaches end of its code, the VM is responsible for popping the frame from the call stack and restoring program counter of the caller site.
The call stack frame contains:
-
return address: function / instruction
-
base pointer: parameters and non-locals are below, local values above
The base pointer is accessible to program via instructions which take offset from current base as their parameter.
The caller pushes args to stack, in reversed order (1st arg last). Then it either calls a static function (CALL), or executes function object (EXECUTE). If the function object contains closure, it is unpacked on stack by the VM before calling the function.
Static function:
Caller code: Stack when entering the called function: PUSH <2nd arg> base +===============+ PUSH <1st arg> | 1st argument | CALL <static func> +---------------+ | 2nd argument | +---------------+
Function object:
Caller code: Stack when entering the called function: PUSH <2nd arg> base +===============+ PUSH <1st arg> | non-locals | PUSH <function object> +===============+ EXECUTE | 1st argument | +---------------+ | 2nd argument | +---------------+
The called function is responsible for consuming the arguments and non-locals. Those have to be replaced by returned value before exiting.
Callee code: Stack when returning to the caller: PUSH <ret val> base +===============+ DROP <N args + nonloc> . . // RETURN top +---------------+ | return value | +---------------+
Machine instructions like ADD
, GREATER_THAN
behave in the same way:
they expect the operands in reversed order and replace them with the result.
Some values live on the heap, with only a pointer on the stack. All heap values are reference-counted. When all references are removed from stack, the heap value is freed immediately.
As all values are immutable, we can still think about the values as if they were directly on the stack. An optimizing compiler can decide if a value of specific kind goes to heap or stack, and it may also decide to modify the heap value in place, if it has only a single reference.
All heap values have a header with refcount and pointer to deleter:
4B refcount zB deleter (function pointer) xB data
The size of data is not part of the header, but may be the first item of the data (this is the case for strings and arrays).
String values live on heap. A pointer to the heap is pushed to stack in place of the value.
Heap value:
(4B+zB header) 4B size xB UTF-8 data (x = size)
The header is common for all heap values, allowing universal management for heap values of any type.
The string itself has 32bit size and UTF-8 data. The size is in bytes.
The number of Unicode characters is not directly saved,
it has to be computed by walking through the data whenever needed.
If this is an issue, use [Char]
, which is written as UTF-32 string.
Heap value:
(4B+zB header) 4B length (number of elements) 2B size of deleter data vB deleter data (v = size of deleter data) xB element data (x = length * size of element)
First field is LEB128-encoded number of elements. The size of each element is not directly saved, it must be passed via TypeInfo. This means that inspecting the heap value is not possible without knowing the actual type of the value. Note that without the type, it also wouldn’t be possible to tell String from List etc.
A list element can be of any type, including another list, string or tuple. These types may need a deleter, to decref their heap slots. The deleter data contains offsets of all heap slots in an element. It is dynamically sized, encoded as an array of LEB128 integers. The size of deleter data is written in 2B integer field, so the data can be skipped when reading elements. If the element doesn’t contain any heap slots, the deleter data are empty. Otherwise, it contains relative offsets of the slots. First one is offset from beginning of an element to first heap slot, next is offset from this slot to next one, and final one is offset to the end of the element (no heap slot there). The deleter gathers these values, skips each offset bytes, calls decref on the heap slot, then skips the final bytes and repeats with next element.
04 first heap slot is 4 bytes from beginning 10 another heap slot is 16 bytes from previous (20 bytes from beginning) 08 final 8 bytes to reach end of the element
Note that sum of the offsets is equivalent to size of the element.
Heap value:
(4B+zB header) zB function (pointer) xB closure values (tuple)
A closure is composed of a function and tuple of values. The function is stored as a pointer to Function object. It’s the system pointer, so its size depends on system machine, e.g. for 64bit CPU, it’s 8 bytes.
The closure values are a tuple of nonlocals and partial call arguments. The size of the tuple and its content depends on the function’s type information.
The instructions and their operands are encoded in a bytecode representation. There are 256 possible opcodes, each of which can have any number of bytes as operands. Some instructions use variable-length operands, so the instruction code is not enough to know how many following bytes are the operands. We have to look inside their bits.
Example instructions:
-
NOOP
- Do nothing. Useful as placeholder or replacement of deleted code when generating the machine bytecode. -
LOGICAL_OR
- Pull two Bools from stack and push a Bool result of||
operation back. -
SUBSCRIPT
- Has a type T as an instruction operand. Pulls a list of T and an Int index from stack. Pushes result of subscript operation back.
Note
|
All instructions are documented in Code.h and Machine.cpp .
|
Types are stored in modules. Any type from the current module can be referenced by an instruction. The information that needs to be encoded in the instruction operands is the index of the type in the module’s type table.
Built-in primitive types are expected to be referenced frequently, so they have special encoding: the first 32 indices are reserved for them. Types from current module are encoded as 32+idx using LEB128 encoding.
Variable length integers are used for encoding various indexes. The most-significant bit is reserved as the continuation bit, the other bits carry the actual data.
Continuation bit:
-
0
- This is the final byte. -
1
- More bytes need to be processed.
The data bits are joined in little-endian order, i.e. the first byte always contains the least-significant bits, the bits from the following bytes are then appended as more significant bits (without shifting the already collected bits).
Example:
-
0b_0000_0000
⇒ 0 -
0b_0000_1000
⇒ 8 -
0b_0111_1111
⇒ 127 (maximal single-byte value) -
0b_1000_1000, 0b_0101_1001
⇒0b101_1001_000_1000
⇒ 11400
Spare bits from the previous byte (which already contain some other information) can be used as the first "byte" of Varint encoding. In this case, the first byte is shorter, providing less information bits to the output integer. All other rules still apply.
Example: First byte has only 5 bits (contributing 4 bits to the result):
-
0b_1_1000, 0b_0101_1001
⇒0b101_1001_1000
⇒ 1432
Let’s see simple if-expression and the resulting unoptimized bytecode.
if num > 10 then 10 else 0;
Translates to:
LOAD_STATIC (10) CALL0 (num Void -> Int32) CALL (gt Int32 Int32 -> Bool) JUMP_IF_NOT .else LOAD_STATIC (10) JUMP .end .else: LOAD_STATIC (0) .end:
The same happens with a multi-branch if-expression.
if num > 10 then 10
if num > 5 then 5
else 0;
Translates to:
LOAD_STATIC (10) CALL0 (num Void -> Int32) CALL (gt Int32 Int32 -> Bool) JUMP_IF_NOT .cond1 LOAD_STATIC (10) JUMP .end .cond1: LOAD_STATIC (5) CALL0 (num Void -> Int32) CALL (gt Int32 Int32 -> Bool) JUMP_IF_NOT .else LOAD_STATIC (5) JUMP .end .else: LOAD_STATIC (0) .end:
The logic is always the same:
-
condition code
-
JUMP_IF_NOT to next condition (or else branch if it was the last one)
-
then-expression code
-
JUMP to the end (an instruction right after the whole if-expression)
-
(repeat for another if-then branch)
-
else-expression code