@@ -10,6 +10,111 @@ make test-simple
How to run `life.fs` and the hash_xors

```sh
- ./compile-and-run.sh forth_programs/life/life.fs < forth_programs/life/starting_board.txt
- ./compile-and-run.sh forth_programs/hash_xor/hash_xor.fs < forth_programs/hash_xor/hash_input.txt
+ ./compile-and-run.sh vm forth_programs/life/life.fs < forth_programs/life/starting_board.txt
+ ./compile-and-run.sh vm forth_programs/hash_xor/hash_xor.fs < forth_programs/hash_xor/hash_input.txt
```
+
+ ## Performance Measurements
+
+ | Test                             | Gforth  | mcp-forth vm -O3 | mcp-forth x86 -O3 | C equivalent -O3 |
+ | -------------------------------- | ------- | ---------------- | ----------------- | ---------------- |
+ | SPI pixel data compression       | 12.754s | 1m48.735s        | 12.978s           | 0.432s           |
+
+ See the benchmarks directory for the test source files.
+
+ ## Why
+
+ I need portable driver code.
+
+ Imagine a display module with a Forth program in its ROM. The host retrieves the program
+ and executes it to copy updated areas to the display. This display is not trivial to control:
+ it receives a compressed stream of data over SPI. The Forth program has a function with
+ a host-expected signature like `( x1 y1 x2 y2 src -- )` which implements the compression
+ algorithm and uses host-provided facilities to perform the SPI transmission. It needs
+ to be fast, so mcp-forth must be capable of producing reasonably efficient machine code. Forth was
+ chosen over C because I need the compiler to use a minimal amount of
+ memory during compilation. The mcp-forth compiler is intended to run on hosts which are
+ MCUs with roughly 100 kB to 10 MB of RAM. A secondary requirement is that the host can cross-compile
+ Forth programs to run on peripheral MCUs with as little as 8 kB of RAM.
+
+ ## Supported Architectures
+
+ The bytecode VM is one of the available compiler backends. It exists mainly for ease of development,
+ but it also has merit on its own, such as being portable to platforms for which there isn't a
+ native backend yet. It will likely always produce the smallest binaries across architectures
+ because it encodes opcodes and operands as a stream of variable-length numbers.
+ It is the default choice for testing new Forth code because it has overflow/underflow and overrun/underrun
+ checks where the other backends may forgo safety in favor of speed. In summary, it is the default
+ choice unless speed is a requirement.
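
To illustrate why a stream of variable-length numbers keeps bytecode small, here is a minimal sketch of one common scheme (LEB128-style, 7 payload bits per byte plus a continuation bit). This is an assumption for illustration only: the actual encoding used by the mcp-forth VM is not described here, and the names `varint_encode` and `varint_decode` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: encode v as little-endian base-128.
   Writes 1 to 5 bytes into out and returns the number written. */
static size_t varint_encode(uint32_t v, uint8_t *out)
{
    size_t n = 0;
    while (v >= 0x80) {
        out[n++] = (uint8_t)(v | 0x80); /* low 7 bits, continuation bit set */
        v >>= 7;
    }
    out[n++] = (uint8_t)v;              /* final byte, continuation bit clear */
    return n;
}

/* Hypothetical helper: decode one number from in, store it in *v,
   and return the number of bytes consumed (at most 5). */
static size_t varint_decode(const uint8_t *in, uint32_t *v)
{
    uint32_t result = 0;
    unsigned shift = 0;
    size_t n = 0;
    uint8_t byte;
    do {
        byte = in[n++];
        result |= (uint32_t)(byte & 0x7F) << shift;
        shift += 7;
    } while ((byte & 0x80) && shift < 32);
    assert(n <= 5);
    *v = result;
    return n;
}
```

With a scheme like this, opcodes and small operands below 128 cost a single byte each, and only rare large operands pay for up to five bytes.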
+
+ Currently supported architectures:
+
+ - Interpreted bytecode VM (explained above)
+ - x86-32
+
+ Planned:
+
+ - ARM Thumb (Cortex M0+)
+ - Xtensa (LX6, i.e. ESP32)
+
+ ## 32 Bits
+
+ For simplicity, mcp-forth only supports 32-bit targets. The C code that implements the runtimes assumes
+ that both `int` and `void *` are 32 bits wide. The Forth cell size in mcp-forth is always 4 bytes, so pointers
+ can be handled transparently as integers. The compiler itself works on 64-bit machines, so cross-compiling
+ Forth programs on a 64-bit host is possible, but running the output is only possible if
+ the host supports some kind of 32-bit mode where pointers are 32 bits wide. This means that
+ Apple M1, M2, M3, and M4 processors cannot execute the output of the mcp-forth compiler, even with the
+ VM runtime, because they have no support for 32-bit programs. An emulator such as QEMU is required.
+
+ ## Non-standard Quirks
+
+ - `C' <word>` works like `' <word>` except it creates a C function pointer from the word so that
+ Forth words can be used as C callback functions. The number of parameters and the optional
+ return value are derived from the `( -- )` signature, and an error is raised at compile time
+ if the signature is missing. The `( -- )` signature has no other semantic meaning besides this.
+ - Currently, defined words can only be used after they're defined; otherwise a compile-time
+ error is raised.
+ - Any word that was not found at compile time is a runtime dependency and must be provided by
+ the runtime.
+ - Gforth's "compile time only words" can be used outside of functions in mcp-forth.
+ - `UNLOOP` is not required (and will be a no-op if added in the future).
+
+ ## Minutiae
+
+ ### Iterative "Fragment Solving"
+
+ Fragments, i.e. snippets of machine code, have variable sizes depending on their operands. If a jump
+ instruction jumps somewhere nearby, it may need only 1 byte to encode the offset; otherwise it needs 4 bytes.
+ Literal values are similar: an immediate literal may be loaded into a register differently
+ depending on its size, and some architectures require multiple instructions to load larger immediate
+ literal values.
+
+ Since the jump distance may not be known when a jump fragment is created, the
+ collection of all fragments must be solved iteratively at the end of compilation to
+ achieve optimal packing.
+
+ Question: will iterative solving ever cause the compiler to hang in an infinite loop that it can't solve?
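
One way to approach the termination question is to make the solver monotonic: start every jump at its short encoding and only ever grow it. The sketch below assumes a simplified model that is not taken from the mcp-forth source: a hypothetical `Fragment` struct, jumps of 2 bytes (opcode plus 1-byte offset) or 5 bytes (opcode plus 4-byte offset), and offsets measured from the end of the jump. Because a promoted jump is never shrunk again, each pass that changes anything promotes at least one jump, so the loop runs at most (number of jumps + 1) passes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed sizes: opcode + 1-byte offset, or opcode + 4-byte offset. */
enum { JUMP_SHORT = 2, JUMP_LONG = 5 };

/* Hypothetical fragment record for this sketch. */
typedef struct {
    int is_jump;     /* nonzero if this fragment is a jump */
    uint32_t size;   /* current encoded size in bytes */
    uint32_t target; /* jumps only: index of the fragment jumped to (may be n) */
} Fragment;

/* Solve fragment sizes. offsets must have room for n + 1 entries;
   offsets[n] ends up holding the total code size.
   Returns the number of passes performed. */
static unsigned solve_fragments(Fragment *f, size_t n, uint32_t *offsets)
{
    unsigned passes = 0;
    int changed;

    /* Start optimistic: every jump uses the short encoding. */
    for (size_t i = 0; i < n; i++)
        if (f[i].is_jump)
            f[i].size = JUMP_SHORT;

    do {
        changed = 0;
        passes++;

        /* Lay out all fragments using their current sizes. */
        offsets[0] = 0;
        for (size_t i = 0; i < n; i++)
            offsets[i + 1] = offsets[i] + f[i].size;

        /* Promote any short jump whose target is out of range.
           Long jumps are never demoted, so this cannot oscillate. */
        for (size_t i = 0; i < n; i++) {
            if (!f[i].is_jump || f[i].size == JUMP_LONG)
                continue;
            assert(f[i].target <= n);
            int64_t rel = (int64_t)offsets[f[i].target] - (int64_t)offsets[i + 1];
            if (rel < INT8_MIN || rel > INT8_MAX) {
                f[i].size = JUMP_LONG;
                changed = 1;
            }
        }
    } while (changed);

    return passes;
}
```

As long as growing one fragment never shortens another jump's distance (for example, no alignment padding), the layout this loop settles on is also the smallest consistent one. A solver that is allowed to shrink fragments again loses this guarantee and would need an explicit pass limit.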
+
+ ### Optimizing Compiler Memory Usage
+
+ The compiler allocates a few arrays which it repeatedly appends elements to during compilation
+ and resizes when their capacities are exceeded. There are no small allocations, since
+ N separate allocations of a small struct may carry more overhead than one contiguous array of N structs.
+
+ Strings referring to source code tokens are not allocated arrays of bytes.
+ They are pointers into the source code and a length.
+
+ Since pointers to structs would be invalidated whenever an array is resized, references to structs
+ are stored as array indices instead of pointers.
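
The three ideas above (growable arrays, tokens as pointer plus length, and indices instead of pointers) can be sketched together as follows. All type and function names here (`Token`, `Word`, `WordArray`, `next_token`, `token_equals`, `word_push`) are hypothetical and chosen for illustration; they are not the compiler's actual data structures.

```c
#include <assert.h>
#include <ctype.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* A token is a slice of the source text, not a copy. */
typedef struct {
    const char *start;
    uint32_t len;
} Token;

/* Records refer to each other by array index, which stays valid
   when realloc moves the array. UINT32_MAX means "none". */
typedef struct {
    Token name;
    uint32_t prev; /* index of the previously pushed record */
} Word;

typedef struct {
    Word *items;
    uint32_t count;
    uint32_t capacity;
} WordArray;

/* Return the next whitespace-delimited token; len == 0 at end of input. */
static Token next_token(const char **cursor)
{
    const char *p = *cursor;
    while (*p && isspace((unsigned char)*p))
        p++;
    Token t = { p, 0 };
    while (*p && !isspace((unsigned char)*p))
        p++;
    t.len = (uint32_t)(p - t.start);
    *cursor = p;
    return t;
}

static int token_equals(Token t, const char *s)
{
    return strlen(s) == t.len && memcmp(t.start, s, t.len) == 0;
}

/* Append a record, doubling the capacity when full. Returns its index. */
static uint32_t word_push(WordArray *a, Token name)
{
    if (a->count == a->capacity) {
        uint32_t cap = a->capacity ? a->capacity * 2 : 2;
        Word *p = realloc(a->items, (size_t)cap * sizeof *p);
        if (!p)
            abort();
        a->items = p;
        a->capacity = cap;
    }
    Word w = { name, a->count ? a->count - 1 : UINT32_MAX };
    a->items[a->count] = w;
    return a->count++;
}
```

An index returned by `word_push` early on can still be used after later pushes have reallocated `items`, whereas a `Word *` taken at the same time could be dangling.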
+
+ Question: is it a good or bad idea to reduce memory usage in the following ways?
+
+ - Storing strings as only a pointer with no length. The strings are whitespace-terminated since they
+ point inside the source code. A special case is needed for a string that is the last token in the
+ source with no following whitespace.
+ - Extending the previous point, using 16-bit offsets into the source instead of 32-bit pointers.
+ - Using 16-bit indices instead of 32-bit ints for indices into arrays of structs.
+ - Where integer types narrower than 32 bits are used to store offsets, dynamically promoting all members of
+ an array of offsets as needed. The first offset that exceeds 65535 would
+ cause the array to be converted from an array of 16-bit offsets to an array of 32-bit offsets.
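
To make the last idea concrete, here is a minimal sketch of an offset array that starts with 16-bit elements and is promoted to 32-bit elements the first time a value above 65535 is pushed. The `OffsetArray` type and its functions are hypothetical, and this version copies into a freshly allocated buffer during promotion, so both buffers exist briefly at the same time. That temporary peak is part of the trade-off the question is asking about.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical offset array whose element width is 2 or 4 bytes. */
typedef struct {
    void *data;
    uint32_t count;
    uint32_t capacity;
    uint8_t width; /* bytes per element: 2 or 4 */
} OffsetArray;

static uint32_t offsets_get(const OffsetArray *a, uint32_t i)
{
    assert(i < a->count);
    return a->width == 2 ? ((const uint16_t *)a->data)[i]
                         : ((const uint32_t *)a->data)[i];
}

/* Convert every element from 16 bits to 32 bits. */
static void offsets_promote(OffsetArray *a)
{
    if (a->capacity > 0) {
        uint32_t *wide = malloc((size_t)a->capacity * sizeof *wide);
        if (!wide)
            abort();
        for (uint32_t i = 0; i < a->count; i++)
            wide[i] = ((const uint16_t *)a->data)[i];
        free(a->data);
        a->data = wide;
    }
    a->width = 4;
}

static void offsets_push(OffsetArray *a, uint32_t v)
{
    /* The first value that does not fit in 16 bits triggers promotion. */
    if (a->width == 2 && v > UINT16_MAX)
        offsets_promote(a);

    if (a->count == a->capacity) {
        uint32_t cap = a->capacity ? a->capacity * 2 : 8;
        void *p = realloc(a->data, (size_t)cap * a->width);
        if (!p)
            abort();
        a->data = p;
        a->capacity = cap;
    }

    if (a->width == 2)
        ((uint16_t *)a->data)[a->count] = (uint16_t)v;
    else
        ((uint32_t *)a->data)[a->count] = v;
    a->count++;
}
```

An array is initialized as `OffsetArray a = { NULL, 0, 0, 2 };`. Every access then pays for a branch on `width`, which is the cost to weigh against the memory saved on sources smaller than 64 KiB.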