| 1 | +# Interpreter Architecture |
| 2 | + |
| 3 | +AIScript's interpreter follows a traditional compilation pipeline (lexing, parsing, type checking, code generation, and bytecode execution), extended with AI-specific features such as `prompt` and agents. This chapter helps new contributors understand the system's key components and how they interact. |
| 4 | + |
| 5 | +## Overview |
| 6 | + |
| 7 | +Source code passes through five main stages in the AIScript interpreter: |
| 8 | + |
| 9 | +**Lexical Analysis** → **Parsing** → **Type Checking** → **Code Generation** → **Virtual Machine Execution** |
| 10 | + |
| 11 | + |
| 12 | + |
| 13 | +This design allows for clear separation of concerns while maintaining flexibility for language evolution. Let's explore each component in detail. |
| 14 | + |
| 15 | +## Lexical Analysis (Lexer) |
| 16 | + |
| 17 | +The [lexer](https://github.com/aiscriptdev/aiscript/tree/main/aiscript-lexer) is the first stage of compilation, responsible for converting source code text into a stream of tokens. Each token represents a meaningful unit in the language (like keywords, identifiers, operators, and literals). |
| 18 | + |
| 19 | +```rust |
| 20 | +// Example of how the lexer works: |
| 21 | +// Input: "let x = 10 + 20;" |
| 22 | +// Output: [Token(Let), Token(Identifier, "x"), Token(Equal), Token(Number, "10"), |
| 23 | +// Token(Plus), Token(Number, "20"), Token(Semicolon)] |
| 24 | +``` |
| 25 | + |
| 26 | +Key responsibilities: |
| 27 | +- Breaking source code into tokens |
| 28 | +- Handling string/numeric literals |
| 29 | +- Managing line numbers for error reporting |
| 30 | +- Skipping whitespace and comments |
| 31 | +- Recognizing keywords and operators |
| 32 | + |
| 33 | +Important structures in the lexer: |
| 34 | +- `TokenType` enum: Defines all possible token types |
| 35 | +- `Token` struct: Contains the token type, lexeme (original text), and line number |
| 36 | +- `Scanner` struct: Manages the scanning state and provides methods for token consumption |
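To make the roles of these three structures concrete, here is a minimal sketch of a scanner that handles only the tokens from the example above. The variant names, field names, and the `tokenize` helper are illustrative assumptions; the real `aiscript-lexer` crate covers far more token types and may be organized differently.

```rust
// Hypothetical, simplified token kinds; the real `TokenType` enum is much larger.
#[derive(Debug, Clone, PartialEq)]
pub enum TokenType { Let, Identifier, Number, Equal, Plus, Semicolon, Eof }

#[derive(Debug, Clone, PartialEq)]
pub struct Token<'a> {
    pub kind: TokenType,
    pub lexeme: &'a str, // original source text of the token
    pub line: u32,       // line number, kept for error reporting
}

pub struct Scanner<'a> {
    source: &'a str,
    start: usize,   // byte offset where the current token begins
    current: usize, // byte offset of the next unread character
    line: u32,
}

impl<'a> Scanner<'a> {
    pub fn new(source: &'a str) -> Self {
        Scanner { source, start: 0, current: 0, line: 1 }
    }

    fn peek(&self) -> Option<char> {
        self.source[self.current..].chars().next()
    }

    fn advance(&mut self) -> Option<char> {
        let c = self.peek()?;
        self.current += c.len_utf8();
        Some(c)
    }

    fn advance_while(&mut self, pred: impl Fn(char) -> bool) {
        while self.peek().map_or(false, |c| pred(c)) {
            self.advance();
        }
    }

    fn make(&self, kind: TokenType) -> Token<'a> {
        Token { kind, lexeme: &self.source[self.start..self.current], line: self.line }
    }

    pub fn next_token(&mut self) -> Result<Token<'a>, String> {
        // Skip whitespace, counting newlines so tokens carry the right line.
        while let Some(c) = self.peek() {
            if !c.is_whitespace() { break; }
            if c == '\n' { self.line += 1; }
            self.advance();
        }
        self.start = self.current;
        let c = match self.advance() {
            Some(c) => c,
            None => return Ok(self.make(TokenType::Eof)),
        };
        match c {
            '=' => Ok(self.make(TokenType::Equal)),
            '+' => Ok(self.make(TokenType::Plus)),
            ';' => Ok(self.make(TokenType::Semicolon)),
            c if c.is_ascii_digit() => {
                self.advance_while(|c| c.is_ascii_digit());
                Ok(self.make(TokenType::Number))
            }
            c if c.is_ascii_alphabetic() || c == '_' => {
                self.advance_while(|c| c.is_ascii_alphanumeric() || c == '_');
                // Keywords are identifiers whose text matches a reserved word.
                let kind = match &self.source[self.start..self.current] {
                    "let" => TokenType::Let,
                    _ => TokenType::Identifier,
                };
                Ok(self.make(kind))
            }
            other => Err(format!("line {}: unexpected character '{}'", self.line, other)),
        }
    }
}

/// Collects all tokens up to (but not including) end of input.
pub fn tokenize<'a>(source: &'a str) -> Result<Vec<Token<'a>>, String> {
    let mut scanner = Scanner::new(source);
    let mut tokens = Vec::new();
    loop {
        let token = scanner.next_token()?;
        if token.kind == TokenType::Eof {
            return Ok(tokens);
        }
        tokens.push(token);
    }
}
```

In this sketch, `tokenize("let x = 10 + 20;")` yields the seven tokens listed in the example above, each borrowing its lexeme from the source string rather than copying it.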
| 37 | + |
| 38 | +## Parsing |
| 39 | + |
| 40 | +The [parser](https://github.com/aiscriptdev/aiscript/tree/main/aiscript-parser) converts the token stream into an Abstract Syntax Tree (AST), which represents the hierarchical structure of the program. AIScript uses a recursive descent parser with Pratt parsing for expressions. |
| 41 | + |
| 42 | +Key components: |
| 43 | +- AST node definitions |
| 44 | +- Parsing functions for statements and expressions |
| 45 | +- Precedence handling for operators |
| 46 | +- Error recovery mechanisms |
| 47 | +- Type annotation handling |
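Pratt parsing handles operator precedence by giving each infix operator a pair of binding powers. The following is a minimal sketch of the technique for arithmetic expressions only; the `Tok` and `Expr` types, the binding-power values, and the `parse` helper are assumptions made for illustration and do not mirror the actual `aiscript-parser` definitions.

```rust
// Hypothetical token and AST types for a tiny arithmetic language.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Tok { Num(f64), Plus, Minus, Star, Slash, LParen, RParen }

#[derive(Debug, PartialEq)]
pub enum Expr {
    Number(f64),
    Negate(Box<Expr>),
    Binary { op: Tok, left: Box<Expr>, right: Box<Expr> },
}

pub struct Parser<'a> {
    tokens: &'a [Tok],
    pos: usize,
}

impl<'a> Parser<'a> {
    pub fn new(tokens: &'a [Tok]) -> Self {
        Parser { tokens, pos: 0 }
    }

    fn peek(&self) -> Option<Tok> {
        self.tokens.get(self.pos).copied()
    }

    fn advance(&mut self) -> Option<Tok> {
        let tok = self.peek();
        if tok.is_some() {
            self.pos += 1;
        }
        tok
    }

    // (left, right) binding powers of infix operators; higher binds tighter.
    // A right power above the left power makes the operator left-associative.
    fn infix_power(op: Tok) -> Option<(u8, u8)> {
        match op {
            Tok::Plus | Tok::Minus => Some((1, 2)),
            Tok::Star | Tok::Slash => Some((3, 4)),
            _ => None,
        }
    }

    pub fn parse_expr(&mut self, min_power: u8) -> Result<Expr, String> {
        // Prefix position: a number, a unary minus, or a parenthesized group.
        let mut left = match self.advance() {
            Some(Tok::Num(n)) => Expr::Number(n),
            Some(Tok::Minus) => Expr::Negate(Box::new(self.parse_expr(5)?)),
            Some(Tok::LParen) => {
                let inner = self.parse_expr(0)?;
                match self.advance() {
                    Some(Tok::RParen) => inner,
                    other => return Err(format!("expected ')', found {:?}", other)),
                }
            }
            other => return Err(format!("unexpected token {:?}", other)),
        };
        // Infix position: keep consuming operators that bind at least as
        // tightly as the caller requires.
        while let Some(op) = self.peek() {
            let (left_power, right_power) = match Self::infix_power(op) {
                Some(powers) => powers,
                None => break,
            };
            if left_power < min_power {
                break;
            }
            self.advance();
            let right = self.parse_expr(right_power)?;
            left = Expr::Binary { op, left: Box::new(left), right: Box::new(right) };
        }
        Ok(left)
    }
}

/// Parses a complete expression and rejects trailing tokens.
pub fn parse(tokens: &[Tok]) -> Result<Expr, String> {
    let mut parser = Parser::new(tokens);
    let expr = parser.parse_expr(0)?;
    match parser.peek() {
        None => Ok(expr),
        Some(tok) => Err(format!("unexpected trailing token {:?}", tok)),
    }
}
```

With these binding powers, `1 + 2 * 3` parses as `1 + (2 * 3)` and `1 - 2 - 3` parses as `(1 - 2) - 3`. Statements would be handled by ordinary recursive descent functions that call `parse_expr` wherever an expression is expected.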
| 48 | + |
| 49 | +The parser also performs some early validation, such as: |
| 50 | +- Checking for valid syntax |
| 51 | +- Validating enum variants |
| 52 | +- Ensuring valid function declarations |
| 53 | +- Validating match patterns |
| 54 | + |
| 55 | +## Type Checking and Resolution |
| 56 | + |
| 57 | +AIScript includes a type checking system ([ty/resolver.rs](https://github.com/aiscriptdev/aiscript/tree/main/aiscript-vm/src/ty)) that validates types at compile time when possible. This improves error detection before runtime and enables better performance optimizations. |
| 58 | + |
| 59 | +Main features: |
| 60 | +- Type annotation validation |
| 61 | +- Class and enum type checking |
| 62 | +- Validation of object literals against class definitions |
| 63 | +- Function parameter type checking |
| 64 | +- Error type validation |
| 65 | + |
| 66 | +The type resolver is introduced early in the parsing phase to catch type errors as soon as possible. |
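As an illustration of one check from the list above, validating an object literal against a class definition, here is a small sketch. The `Type` and `ClassDef` types and the `check_object_literal` function are hypothetical and much simpler than the resolver in `ty/`; they only show the general shape of such a check.

```rust
use std::collections::BTreeMap;

// Hypothetical, reduced set of types for illustration.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Type { Number, Str, Bool }

pub struct ClassDef {
    pub name: String,
    pub fields: BTreeMap<String, Type>,
}

impl ClassDef {
    pub fn new(name: &str, fields: &[(&str, Type)]) -> Self {
        ClassDef {
            name: name.to_string(),
            fields: fields.iter().map(|(field, ty)| (field.to_string(), *ty)).collect(),
        }
    }
}

/// Checks the (field, type) pairs of an object literal against a class,
/// collecting every problem instead of stopping at the first one.
pub fn check_object_literal(class: &ClassDef, literal: &[(&str, Type)]) -> Result<(), Vec<String>> {
    let mut errors = Vec::new();
    for (field, actual) in literal {
        match class.fields.get(*field) {
            None => errors.push(format!("class {} has no field '{}'", class.name, field)),
            Some(expected) if expected != actual => errors.push(format!(
                "field '{}' of class {} expects {:?}, found {:?}",
                field, class.name, expected, actual
            )),
            Some(_) => {}
        }
    }
    for field in class.fields.keys() {
        if !literal.iter().any(|(name, _)| *name == field.as_str()) {
            errors.push(format!("missing field '{}' for class {}", field, class.name));
        }
    }
    if errors.is_empty() { Ok(()) } else { Err(errors) }
}
```

Reporting all errors at once, rather than failing on the first, is a design choice of this sketch that tends to make compile-time diagnostics more useful.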
| 67 | + |
| 68 | +## Code Generation |
| 69 | + |
| 70 | +The [code generator](https://github.com/aiscriptdev/aiscript/blob/main/aiscript-vm/src/compiler/codegen.rs) transforms the AST into bytecode that can be executed by the virtual machine. This phase also performs several optimizations. |
| 71 | + |
| 72 | +Key aspects: |
| 73 | +- Generation of VM opcodes from AST nodes |
| 74 | +- Handling variable scope and closures |
| 75 | +- Managing function parameters and defaults |
| 76 | +- Implementing control flow (if/else, loops, etc.) |
| 77 | +- Error handling code generation |
| 78 | +- Enum and class compilation |
| 79 | + |
| 80 | +The code generator produces a set of functions with associated bytecode chunks, which are then executed by the VM. |
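The core idea of AST-to-bytecode translation can be sketched with a tiny expression language. The `Expr`, `Chunk`, and `compile` names below are assumptions for illustration, and the opcode set is cut down to four instructions; the real `codegen.rs` also handles scopes, closures, control flow, and more.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum OpCode {
    Constant(u8), // push constants[index] onto the stack
    Add,
    Subtract,
    Return,
}

// Hypothetical AST for arithmetic expressions.
pub enum Expr {
    Number(f64),
    Add(Box<Expr>, Box<Expr>),
    Subtract(Box<Expr>, Box<Expr>),
}

#[derive(Debug, Default)]
pub struct Chunk {
    pub code: Vec<OpCode>,
    pub constants: Vec<f64>,
}

impl Chunk {
    // The operand of `Constant` is a u8, so a chunk holds at most 256 constants.
    fn add_constant(&mut self, value: f64) -> Result<u8, String> {
        if self.constants.len() > u8::MAX as usize {
            return Err("too many constants in one chunk".to_string());
        }
        self.constants.push(value);
        Ok((self.constants.len() - 1) as u8)
    }
}

// Post-order walk: emit code for both operands, then the operator, so the
// operands are on the stack when the operator executes.
fn emit_expr(expr: &Expr, chunk: &mut Chunk) -> Result<(), String> {
    match expr {
        Expr::Number(n) => {
            let index = chunk.add_constant(*n)?;
            chunk.code.push(OpCode::Constant(index));
        }
        Expr::Add(left, right) => {
            emit_expr(left, chunk)?;
            emit_expr(right, chunk)?;
            chunk.code.push(OpCode::Add);
        }
        Expr::Subtract(left, right) => {
            emit_expr(left, chunk)?;
            emit_expr(right, chunk)?;
            chunk.code.push(OpCode::Subtract);
        }
    }
    Ok(())
}

pub fn compile(expr: &Expr) -> Result<Chunk, String> {
    let mut chunk = Chunk::default();
    emit_expr(expr, &mut chunk)?;
    chunk.code.push(OpCode::Return);
    Ok(chunk)
}
```

Compiling `10 + 20` in this sketch produces `Constant(0), Constant(1), Add, Return` with the constant pool `[10.0, 20.0]`.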
| 81 | + |
| 82 | +## Virtual Machine |
| 83 | + |
| 84 | +The [virtual machine](https://github.com/aiscriptdev/aiscript/blob/main/aiscript-vm/) is a stack-based interpreter that executes the generated bytecode. It maintains execution state and provides runtime facilities like garbage collection. |
| 85 | + |
| 86 | +Important components: |
| 87 | +- Call frames for function invocation |
| 88 | +- Value stack for computations |
| 89 | +- Global and local variable storage |
| 90 | +- Upvalue handling for closures |
| 91 | +- Garbage collection (via `gc_arena`) |
| 92 | +- Runtime error handling |
| 93 | + |
| 94 | +The VM also handles built-in functions, modules, and AI operations. |
| 95 | + |
| 96 | +## Value Representation |
| 97 | + |
| 98 | +Values in AIScript are represented using a tagged union approach, allowing efficient storage and manipulation of different data types: |
| 99 | + |
| 100 | +```rust |
| 101 | +pub enum Value<'gc> { |
| 102 | + Number(f64), |
| 103 | + Boolean(bool), |
| 104 | + String(InternedString<'gc>), |
| 105 | + IoString(Gc<'gc, String>), |
| 106 | + Closure(Gc<'gc, Closure<'gc>>), |
| 107 | + NativeFunction(NativeFn<'gc>), |
| 108 | + Array(GcRefLock<'gc, Vec<Value<'gc>>>), |
| 109 | + Object(GcRefLock<'gc, Object<'gc>>), |
| 110 | + Enum(GcRefLock<'gc, Enum<'gc>>), |
| 111 | + EnumVariant(Gc<'gc, EnumVariant<'gc>>), |
| 112 | + Class(GcRefLock<'gc, Class<'gc>>), |
| 113 | + Instance(GcRefLock<'gc, Instance<'gc>>), |
| 114 | + BoundMethod(Gc<'gc, BoundMethod<'gc>>), |
| 115 | + Module(InternedString<'gc>), |
| 116 | + Agent(Gc<'gc, Agent<'gc>>), |
| 117 | + Nil, |
| 118 | +} |
| 119 | +``` |
| 120 | + |
| 121 | +This design allows for efficient operations while supporting garbage collection and reference semantics. |
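Runtime operations dispatch on the enum tag with `match`. The sketch below uses a cut-down, hypothetical `Value` that swaps the `gc_arena` pointers for `Rc` so it compiles on its own; the `type_name` and `add` methods are illustrative and are not taken from the AIScript source.

```rust
use std::rc::Rc;

// Simplified stand-in for the real `Value<'gc>`; `Rc` replaces the GC pointers.
#[derive(Debug, Clone, PartialEq)]
pub enum Value {
    Number(f64),
    Boolean(bool),
    String(Rc<str>),
    Array(Rc<Vec<Value>>),
    Nil,
}

impl Value {
    /// Human-readable tag name, useful in runtime error messages.
    pub fn type_name(&self) -> &'static str {
        match self {
            Value::Number(_) => "number",
            Value::Boolean(_) => "boolean",
            Value::String(_) => "string",
            Value::Array(_) => "array",
            Value::Nil => "nil",
        }
    }

    /// Adds two numbers; any other combination is a runtime type error.
    pub fn add(&self, other: &Value) -> Result<Value, String> {
        match (self, other) {
            (Value::Number(a), Value::Number(b)) => Ok(Value::Number(a + b)),
            (a, b) => Err(format!("cannot add {} and {}", a.type_name(), b.type_name())),
        }
    }
}
```

Small variants such as `Number` and `Boolean` are stored inline, while larger data sits behind a pointer, so cloning a value copies at most a pointer and bumps a reference count in this sketch (or copies a `Gc` pointer in the real enum).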
| 122 | + |
| 123 | +## Memory Management |
| 124 | + |
| 125 | +AIScript uses the [gc_arena](https://github.com/kyren/gc-arena) crate for memory management, which provides: |
| 126 | +- Tracing garbage collection |
| 127 | +- Memory safety through lifetime parameters |
| 128 | +- Efficient allocation and collection |
| 129 | +- Collection of reference cycles, which a tracing collector reclaims without separate cycle detection |
| 130 | + |
| 131 | +All heap-allocated objects are wrapped in `Gc` or `GcRefLock` pointers, allowing the garbage collector to track and manage memory. |
| 132 | + |
| 133 | +## AI Integration |
| 134 | + |
| 135 | +AIScript has special handling for AI operations: |
| 136 | +- `prompt` for sending requests to AI models |
| 137 | +- `Agent` system for complex AI interactions |
| 138 | +- AI function compilation and execution |
| 139 | + |
| 140 | +## OpCode System |
| 141 | + |
| 142 | +AIScript uses a bytecode instruction set defined in [chunk.rs](https://github.com/aiscriptdev/aiscript/blob/main/aiscript-vm/src/chunk.rs): |
| 143 | + |
| 144 | +```rust |
| 145 | +pub enum OpCode { |
| 146 | + Constant(u8), // Load constant value |
| 147 | + Return, // Return from function |
| 148 | + Add, Subtract, // Arithmetic operations |
| 149 | + GetLocal(u8), // Get local variable |
| 150 | + SetLocal(u8), // Set local variable |
| 151 | + // ... many more instructions |
| 152 | +} |
| 153 | +``` |
| 154 | + |
| 155 | +Each instruction operates on the VM's stack and affects program execution flow. |
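To show how such instructions drive a stack machine, here is a minimal dispatch loop over the opcodes listed above. It is a sketch under simplifying assumptions: values are plain `f64`, there is a single call frame, and locals live in stack slots counted from the bottom of the stack. The real VM works with `Value<'gc>`, call frames, and many more instructions.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum OpCode {
    Constant(u8), // push constants[index]
    Add,
    Subtract,
    GetLocal(u8), // push a copy of the value in the given stack slot
    SetLocal(u8), // copy the top of the stack into the given stack slot
    Return,       // pop the result and stop
}

pub fn run(code: &[OpCode], constants: &[f64]) -> Result<f64, String> {
    let mut stack: Vec<f64> = Vec::new();
    let mut ip = 0; // instruction pointer
    loop {
        let op = *code.get(ip).ok_or("instruction pointer ran past end of chunk")?;
        ip += 1;
        match op {
            OpCode::Constant(index) => {
                let value = *constants.get(index as usize).ok_or("constant index out of range")?;
                stack.push(value);
            }
            OpCode::Add | OpCode::Subtract => {
                // The right operand was pushed last, so it is popped first.
                let right = stack.pop().ok_or("stack underflow")?;
                let left = stack.pop().ok_or("stack underflow")?;
                stack.push(if op == OpCode::Add { left + right } else { left - right });
            }
            OpCode::GetLocal(slot) => {
                let value = *stack.get(slot as usize).ok_or("local slot out of range")?;
                stack.push(value);
            }
            OpCode::SetLocal(slot) => {
                let value = *stack.last().ok_or("stack underflow")?;
                *stack.get_mut(slot as usize).ok_or("local slot out of range")? = value;
            }
            OpCode::Return => {
                return stack.pop().ok_or_else(|| "stack underflow".to_string());
            }
        }
    }
}
```

For example, running `Constant(0), Constant(1), Add, Return` with constants `[10.0, 20.0]` returns `30.0`. Every failure path returns an error instead of panicking, which is how a VM can turn malformed bytecode into a runtime error report.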
| 156 | + |
| 157 | +## How to Contribute |
| 158 | + |
| 159 | +Now that you understand the architecture, here are some ways to contribute: |
| 160 | + |
| 161 | +1. **Start Small**: Look for issues labeled "good first issue" in our GitHub repository. |
| 162 | + |
| 163 | +2. **Improve Error Messages**: Clear error messages help users debug their code. The parser and VM error systems are good places to contribute improvements. |
| 164 | + |
| 165 | +3. **Add Language Features**: New syntax features typically require changes to the lexer, parser, code generator, and VM. |
| 166 | + |
| 167 | +4. **Optimize Performance**: Look for opportunities to improve bytecode generation or VM execution. |
| 168 | + |
| 169 | +5. **Enhance Type System**: Contribute to the type resolver to improve static analysis capabilities. |
| 170 | + |
| 171 | +6. **Fix Bugs**: Bug fixes are always valuable contributions. |
| 172 | + |
| 173 | +Before making significant changes, please open a GitHub issue to discuss your approach with the community. This ensures your efforts align with the project's goals and direction. |
| 174 | + |
| 175 | +## Development Workflow |
| 176 | + |
| 177 | +1. Fork the repository on GitHub |
| 178 | +2. Create a feature branch |
| 179 | +3. Make your changes, following our code style |
| 180 | +4. Add tests for your changes |
| 181 | +5. Run the existing test suite to ensure nothing breaks |
| 182 | +6. Submit a pull request with a clear description of your changes |
| 183 | + |
| 184 | +## Code Organization Conventions |
| 185 | + |
| 186 | +- Each major component has its own module |
| 187 | +- Opt for composition over inheritance |
| 188 | +- Prefer immutable data where possible |
| 189 | +- Use descriptive naming |
| 190 | +- Document public APIs with comments |
| 191 | +- Follow Rust's naming conventions |
| 192 | + |
| 193 | +The AIScript interpreter is designed to be modular and extensible, making it possible for contributors to work on different parts independently. We're excited to see what you'll build with us! |