Merged
97 commits
4dc61de
no LamPrint downstream of tc
billhails Jan 14, 2026
c252140
no typeof downstream of tc
billhails Jan 14, 2026
e23ae01
stages 1-4 complete, build passes
billhails Jan 15, 2026
313a65c
stages 1-4 complete, build passes
billhails Jan 15, 2026
31ae3fb
no-op desugaring in place
billhails Jan 16, 2026
623be5d
no-op desugaring in place
billhails Jan 16, 2026
4a18522
desugared print
billhails Jan 16, 2026
d1101f7
desugared let
billhails Jan 16, 2026
4ba8dbb
desugared let*
billhails Jan 16, 2026
ab02872
desugared typeof
billhails Jan 16, 2026
95f9177
desugared construct
billhails Jan 16, 2026
a4c5125
desugared deconstruct
billhails Jan 16, 2026
d01966f
desugared constant
billhails Jan 17, 2026
0ed24c5
desugared TypeConstructorInfo
billhails Jan 17, 2026
8a593b0
desugared makeTuple
billhails Jan 17, 2026
fe49e9e
desugared tag
billhails Jan 17, 2026
dc00f36
desugared tupleIndex
billhails Jan 17, 2026
a9ce9ea
desugared typedefs
billhails Jan 17, 2026
4352943
desugared typedefs
billhails Jan 17, 2026
14f0f9e
desugared intList
billhails Jan 17, 2026
f019ff1
makeVec is now just a list of args
billhails Jan 17, 2026
62176f5
clean up after makeVec change
billhails Jan 17, 2026
babc2c4
unified MinExprList
billhails Jan 17, 2026
f56f956
removed unused lookup name
billhails Jan 17, 2026
c853b3b
flag to view desugared code
billhails Jan 17, 2026
4865d9f
pp should never abort
billhails Jan 18, 2026
96d3029
more to do
billhails Jan 18, 2026
59c6f4c
yet more to do
billhails Jan 18, 2026
4c57c41
a bit less to do
billhails Jan 18, 2026
f58bbc2
new common utils.yaml
billhails Jan 22, 2026
beb1944
eq functions now generated to the main file
billhails Jan 22, 2026
92b4529
compare is now eq throughout the generator
billhails Jan 22, 2026
02f2500
clear distinction between cmpFn and eqFn
billhails Jan 22, 2026
4c4d873
AgnosticFileId replaced by generated FileId
billhails Jan 22, 2026
6b0c964
two fewer mallocs
billhails Jan 22, 2026
f40a781
gradually replacing the few remaining mallocs/frees
billhails Jan 23, 2026
12cb612
no malloc or free in the Pratt parser
billhails Jan 24, 2026
6a673bc
all memory now GC, except membuf in builtin io
billhails Jan 24, 2026
3481a32
html breaks instead of newlines in mermaid output
billhails Jan 24, 2026
150b046
cleaned todo, removed vestiges of old AgnosticFileId
billhails Jan 24, 2026
53060cd
fixed unsigned short limit, added builtin getdec
billhails Jan 24, 2026
2262b05
scratch/README
billhails Jan 24, 2026
bbd4431
prep for bytecode fix
billhails Jan 25, 2026
96cbb4b
disallow non functional declarations in namespaces
billhails Jan 25, 2026
72359b7
better error messages in step.c
billhails Jan 25, 2026
c4e6daa
hopefully improved agent instructions
billhails Jan 25, 2026
9bd2367
made more use of the make<Union>_<Field>() functions
billhails Jan 27, 2026
a465139
fn/rewrite is better, also improved some error messages
billhails Jan 28, 2026
1a346cc
codegen improvement proposal (mine)
billhails Jan 28, 2026
d88a900
preliminary generator refactoring complete
billhails Jan 29, 2026
f7b1a4a
added external flag always False
billhails Jan 29, 2026
36797d9
better externals solution implemented
billhails Jan 29, 2026
54dd435
generated comments now also indicate the implementing class
billhails Jan 30, 2026
a161391
prep for removal of primapp from fn/rewrite/minexpr.fn
billhails Jan 31, 2026
8eea8ed
simplified minexpr.fn and finished constant_folding.fn
billhails Jan 31, 2026
71913fb
simplified minexpr.fn and finished constant_folding.fn
billhails Jan 31, 2026
1b5d1df
prep for unification fix
billhails Feb 1, 2026
fad4c5c
unification bug fixed
billhails Feb 1, 2026
a4372cf
subtle unification bug squashed
billhails Feb 1, 2026
d6c8cf9
subtle unification bug fix
billhails Feb 2, 2026
2424807
accounts for recent bug fixes
billhails Feb 2, 2026
8e1bcd7
pseudo-unification now supports multiple occurances
billhails Feb 2, 2026
be2b434
tests less noisy
billhails Feb 2, 2026
0fcb360
tests a lot less noisy
billhails Feb 2, 2026
63f0683
identified over-application bug
billhails Feb 2, 2026
c766bae
fixed another unification bug
billhails Feb 3, 2026
a35a1ee
basic tpmc_match.c refactoring
billhails Feb 3, 2026
9c21ed0
split mixture into sub-functions
billhails Feb 3, 2026
8dc9cd7
section headings
billhails Feb 3, 2026
0a5ce1e
bespoke comparator proposal
billhails Feb 4, 2026
0e76174
fixed convergence test in the type checker
billhails Feb 4, 2026
550f3c8
fixed a bug in the constant folding prototype
billhails Feb 4, 2026
03d888a
bespoke equality spec done
billhails Feb 5, 2026
72242ec
bespoke comparator implementation complete
billhails Feb 5, 2026
87b2c80
bespoke comparator implementation complete
billhails Feb 5, 2026
5440020
constant folding prototype uses commutative operators
billhails Feb 6, 2026
a4076ed
prep for unify assignment
billhails Feb 6, 2026
1c860b9
unify assignment case B working
billhails Feb 6, 2026
7d56d9c
can_happen takes explicit PI
billhails Feb 6, 2026
1177c67
macro replaced by lazy fn in parser
billhails Feb 6, 2026
425c723
replaced the term "macro" with "lazy" throughout
billhails Feb 7, 2026
7333491
fixed README wrt lazy fn
billhails Feb 7, 2026
2423d65
shared SymbolList type
billhails Feb 7, 2026
96bbe51
moved bits of re-usable code into utils_helper.[ch]
billhails Feb 7, 2026
2f9b81a
basic set operations on SymbolSets
billhails Feb 7, 2026
a345b8b
added wchar equivalents to utils, updated agents docs to mention utils
billhails Feb 7, 2026
7245fc8
prep for (non-)lazy operators
billhails Feb 7, 2026
2607059
fixed memory leak
billhails Feb 8, 2026
8e29c38
lazy operators done
billhails Feb 8, 2026
85b3653
catch and report occurs-in pattern matching error
billhails Feb 9, 2026
b46d50e
currying prototype
billhails Feb 9, 2026
c5c45a6
unpacking tuple assignment allows wildcards
billhails Feb 10, 2026
34ae476
test_tuple_assign
billhails Feb 10, 2026
a8fdeb1
maybe.some -> maybe.just so "some" can be used for amb.some_of
billhails Feb 11, 2026
22e582c
conversionError now just can_happen
billhails Feb 11, 2026
1605cab
removed substError
billhails Feb 11, 2026
9822d43
vcan_happen
billhails Feb 11, 2026
804 changes: 96 additions & 708 deletions .github/copilot-instructions.md

Large diffs are not rendered by default.

2 changes: 1 addition & 1 deletion Makefile
Expand Up @@ -252,7 +252,7 @@ $(TEST_DEP): $(DEPDIR)/%.d: $(TSTDIR)/src/%.c .generated | $(DEPDIR)

test: $(TEST_TARGETS) $(TARGET) $(UNIDIR)/unicode.db
for t in $(TSTDIR)/fn/test_*.fn ; do echo '***' $$t '***' ; ./$(TARGET) --include=fn --assertions-accumulate $$t || exit 1 ; done
- for t in $(TSTDIR)/fn/fail_*.fn ; do echo '***' $$t '(expect to see an error) ***' ; ! ./$(TARGET) --include=fn --assertions-accumulate $$t || exit 1 ; done
+ for t in $(TSTDIR)/fn/fail_*.fn ; do echo '***' $$t '***' ; ! ./$(TARGET) --include=fn --assertions-accumulate $$t >/dev/null 2>&1 || exit 1 ; done
for t in $(TEST_TARGETS) ; do echo '***' $$t '***' ; $$t || exit 1 ; done
@echo "All tests passed."

11 changes: 7 additions & 4 deletions README.md
Expand Up @@ -84,7 +84,7 @@ oi --> scanner
parser --> ast(AST) -->
lc([Lambda Conversion]):::process --> tpmc([Pattern Matching Compiler]):::process
lc <---> pg([Print Function Generator]):::process
- lc <---> me([Macro Expansion]):::process
+ lc <---> me([Lazy Function Expansion]):::process
tpmc --> vs([Variable Substitution]):::process
vs --> lc
lc <--> des([Desugaring]):::process
Expand All @@ -97,7 +97,9 @@ tc --> lambda2(Plain Lambda Form)
lambda2 --> ci([Constructor Inlining]):::process
ci --> lambda3(Inlined Lambda)
subgraph anf-rewrite-2
- alpha(["ɑ-Conversion"]):::process
+ desugaring(["Desugaring"]):::process
+ desugaring --> lambda_ds(desugared lambda)
+ lambda_ds --> alpha(["ɑ-Conversion"]):::process
alpha --> lambda_a(alphatized lambda)
lambda_a --> anfr([ANF Rewrite]):::process
anfr --> lambda_b(New ANF)
Expand All @@ -109,7 +111,7 @@ subgraph anf-rewrite-2
lambda_c --> betar(["β-Reduction WiP"]):::process
betar --> lambda_d(simplified)
end
- lambda3 --> alpha
+ lambda3 --> desugaring
lambda3 --> anfc
lambda_a --> anfc([A-Normal Form Conversion]):::process
anfc --> anf(ANF)
Expand All @@ -124,14 +126,15 @@ bc --> cekf([CEKF Runtime VM]):::process
The "anf-rewrite-2" section is a WiP on the `anf-rewrite-2` branch. Although that branch started as a rewrite of the ANF transform, it became apparent that the CEK machine itself was blocking optimizations and so the intention is to target a more "traditional" register machine with an eye towards LLVM in the longer term. On that branch the ɑ-conversion is complete and incorporated (though it achieves nothing for the ANF path it is required for CPS.) The ANF rewrite is complete but abandoned, and the CPS transform is also complete.

The various components named in the diagram above are linked to their implementation entry point here:

* Scanner [pratt_scanner.c](src/pratt_scanner.c)
* Parser [pratt_parser.c](src/pratt_parser.c)
* AST [ast.yaml](src/ast.yaml)
* Lambda Conversion [lambda_conversion.c](src/lambda_conversion.c)
* Tpmc [tpmc_logic.c](src/tpmc_logic.c)
* Print Function Generator [print_generator.c](src/print_generator.c)
* Variable Substitution [lambda_substitution.c](src/lambda_substitution.c)
- * Macro Expansion [macro_substitution.c](src/macro_substitution.c)
+ * Lazy Function Expansion [lazy_substitution.c](src/lazy_substitution.c)
* Plain Lambda Form [lambda.yaml](src/lambda.yaml)
* Simplification [lambda_simplify.c](src/lambda_simplify.c)
* Type Checking [tc_analyze.c](src/tc_analyze.c)
41 changes: 41 additions & 0 deletions docs/BETTER-EXTERNAL.md
@@ -0,0 +1,41 @@
# Proposal for Replacing the `external` Section in Yaml Files

Currently, and embarrassingly, the `external` section in the yaml files is just a synonym for the `primitives` section. The intention of the `external` section was to describe specifically memory-managed types from other yaml files, but that requires adding information that is already available from those files.

The idea then is quite simple: rather than the `external` section listing details of those other types:

```yaml
external:
TcType:
meta:
brief: external type from the type-checker
description: A type-checker type referenced by the ANF code.
data:
cname: "struct TcType *"
printFn: printTcType
markFn: markTcType
valued: true
IntMap:
meta:
...
```

it would just contain references to the files themselves:

```yaml
external:
- !include tc.yaml
```

Those includes could be parsed recursively by the existing parser and added to the current catalog.
This causes a couple of problems, but they should be easy enough to solve:

1. How to avoid generating code for the external nodes? Each entry in the catalog has an additional `external` flag; if it is true, the catalog dispatchers skip over that entry.
2. How to avoid mutually recursive includes? The `include` feature of the yaml is implemented in `generate/loader.py`, and that code could be extended to quietly return an empty object if it sees a file it has already loaded (or started to load).
3. How to avoid primitives being re-entered and causing duplicates? The same solution as for recursive includes: `primitives.yaml` would only be parsed once.
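
The include guard from points 2 and 3 can be sketched as follows. This is a minimal illustration only, not the code in `generate/loader.py`: `FAKE_FILES`, `load_once` and the `seen` set are hypothetical names, and an in-memory dictionary stands in for parsing real yaml files.

```python
# Hypothetical stand-in for files on disk: file name -> already-parsed content.
# Note that anf.yaml and tc.yaml include each other, and both include primitives.
FAKE_FILES = {
    "anf.yaml": {"external": ["tc.yaml", "primitives.yaml"], "structs": {"Exp": {}}},
    "tc.yaml": {"external": ["anf.yaml", "primitives.yaml"], "structs": {"TcType": {}}},
    "primitives.yaml": {"primitives": {"int": {}}},
}


def load_once(name, seen):
    """Return the parsed document for `name`, or {} if loading already started."""
    if name in seen:
        return {}  # quietly return an empty object, as the proposal suggests
    seen.add(name)  # mark *before* recursing so that cycles terminate
    doc = dict(FAKE_FILES[name])  # shallow copy; the stored data is not mutated
    doc["external"] = [load_once(child, seen) for child in doc.get("external", [])]
    return doc


seen = set()
root = load_once("anf.yaml", seen)
```

Because a file is recorded as seen before its own includes are followed, the `anf.yaml`/`tc.yaml` cycle terminates and `primitives.yaml` is expanded exactly once, at the first place it is reached.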

I'd imagine that the entire parse and catalog-injection section in `generate.py` would become a function that gets handed an object resulting from a yaml file, along with an `external` boolean flag. It would call itself recursively with each element of any `external` section that it finds.
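
A rough sketch of what such a function might look like, under stated assumptions: the catalog is modelled as a plain dictionary, the section names `structs`, `unions` and `primitives` are placeholders, nested documents are assumed to sit under an `external` key, and `inject` and `names_to_generate` are hypothetical names rather than anything in `generate.py`.

```python
def inject(doc, catalog, external=False):
    """Add every definition in `doc` to `catalog`, tagging it with `external`."""
    for section in ("structs", "unions", "primitives"):  # assumed section names
        for name, body in doc.get(section, {}).items():
            # setdefault keeps the first definition seen, so nothing is duplicated
            catalog.setdefault(name, {"body": body, "external": external})
    for child in doc.get("external", []):
        # anything reached through an include is external to the current file
        inject(child, catalog, external=True)


def names_to_generate(catalog):
    """Stand-in for a catalog dispatcher: external entries are skipped."""
    return [name for name, entry in catalog.items() if not entry["external"]]


doc = {
    "structs": {"Exp": {}},
    "external": [{"structs": {"TcType": {}}, "external": []}],
}
catalog = {}
inject(doc, catalog)
```

Here `TcType` is present in the catalog, so `Exp` could refer to its definition, but only `Exp` would have code generated for it.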

The `config` section could be ignored or only partially used if it is `external`.

Of course there will likely be other problems (the `#include` directives will probably need looking at), but the advantages of having direct access to the original definitions are obvious.
4 changes: 4 additions & 0 deletions docs/OPERATORS.md
@@ -1,5 +1,9 @@
# Operators (and macros)

Important update if you are reading this: the keyword `macro` has since been
replaced with the sequence `lazy fn` in the parser to better reflect its
nature. However the semantics are unchanged.

Some issues with the initial implementation.

I'd thought I could get away with a pure parser-only implementation of
8 changes: 7 additions & 1 deletion docs/TODO.md
Expand Up @@ -2,8 +2,15 @@

More of a wish-list than a hard and fast plan.

* Simplify
* Add a beta-reduction pass (after cps-transform).
* Add an eta-reduction pass.
* Add a constant/operator folding pass after beta and eta reduction.
* Target LLVM
* `syntax` construct that allows large-scale syntactic structures to be defined by the user.
* Clean Up.
* Generate
* Move all signatures into `signature_helper.py`, not just the shared ones.
* More numbers:
* NaN for division by Zero etc.
* Matrices.
Expand All @@ -29,4 +36,3 @@ More of a wish-list than a hard and fast plan.
* (internal) have a NEWZ variant of NEW that bzero's its result.
* Builtins
* `now()` builtin returns current time in milliseconds.
* `iter()` returns consecutive integers starting from 0
138 changes: 138 additions & 0 deletions docs/agent/anf.md
@@ -0,0 +1,138 @@
# ANF (Administrative Normal Form) Conversion

Converts lambda expressions to A-Normal Form where all intermediate computations are named.

## What ANF Does

Transforms nested expressions into a flat sequence of let-bindings where:

- **Atomic expressions (aexp)**: Variables, constants, lambdas - always terminate, never error
- **Complex expressions (cexp)**: Function applications, conditionals - may not terminate or may error
- All complex subexpressions become let-bound temporary variables

Example transformation:

```scheme
(a (b c) (d e))
=>
(let (t$1 (d e))
(let (t$2 (b c))
(a t$2 t$1)))
```

## The Algorithm (from Matt Might's blog)

**Core Idea**: Walk expressions depth-first, replacing complex subexpressions with fresh variables, accumulating let-bindings on the way back out.

**Key functions** (all in `src/anf_normalize.c`):

- `normalize(LamExp, tail)` - Main entry point, dispatches on LamExp type
- `replaceLamExp(LamExp, replacements)` - Converts LamExp to Aexp, accumulating replacements
- `letBind(body, replacements)` - Wraps body in let-bindings from replacements table
- `wrapTail(exp, tail)` - Optionally wraps expression in additional let-binding

**The `tail` parameter**: Continuation-like - represents the "rest of the computation" to wrap the current expression in. NULL means this is the final result.

## Implementation Pattern

Most normalize functions follow this pattern:

1. Create a `LamExpTable` for tracking replacements (hash table mapping fresh symbols to LamExps)
2. Call `replaceLamExp()` on subexpressions, which:
- If subexpr is atomic (var/constant), return it as Aexp
- If subexpr is complex (application), generate fresh symbol, add to replacements, return symbol as Aexp
3. Build the ANF construct with replaced Aexps
4. Call `wrapTail(exp, tail)` to optionally wrap in outer binding
5. Call `letBind(exp, replacements)` to wrap in all accumulated let-bindings
6. Return the wrapped expression
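
The pattern above can be illustrated with a toy model. This is a sketch in Python, not a port of `src/anf_normalize.c`: expressions are modelled as strings (atomic) and tuples (applications), an ordered list replaces the `LamExpTable` hash table, the `tail` parameter is omitted, and every function name is hypothetical.

```python
import itertools


def is_atomic(exp):
    # Assumption: strings are variables/constants, tuples are applications.
    return not isinstance(exp, tuple)


def replace_exp(exp, replacements, counter):
    """Return an atomic stand-in for exp, recording a binding if it is complex."""
    if is_atomic(exp):
        return exp
    # Flatten the subexpressions first so inner bindings are recorded earlier.
    flat = tuple(replace_exp(sub, replacements, counter) for sub in exp)
    name = f"t${next(counter)}"
    replacements.append((name, flat))
    return name


def let_bind(body, replacements):
    """Wrap body in one let per replacement, earliest binding outermost."""
    for name, value in reversed(replacements):
        body = ("let", (name, value), body)
    return body


def normalize(exp):
    """Normalize an application whose arguments may be nested applications."""
    if is_atomic(exp):
        return exp
    replacements = []
    counter = itertools.count(1)
    flat = tuple(replace_exp(sub, replacements, counter) for sub in exp)
    return let_bind(flat, replacements)


result = normalize(("a", ("b", "c"), ("d", "e")))
# result == ("let", ("t$1", ("b", "c")),
#            ("let", ("t$2", ("d", "e")), ("a", "t$1", "t$2")))
```

The numbering and order of the temporaries here follow from this sketch's left-to-right traversal and need not match what the real implementation emits.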

## Critical Data Flow

```text
LamExp (lambda.yaml)
↓ normalize()
→ replaceLamExp() + LamExpTable
→ Aexp (atomic expressions)
→ Build ANF structure (Exp/Cexp)
→ wrapTail()
→ letBind() - wraps in let-bindings
Exp (anf.yaml)
```

## Known Complexity Issues

1. **Deeply Nested Functions**: The normalize functions have 30+ dispatch cases, one per LamExp type. Each follows slightly different logic.

2. **GC Protection Overhead**: Extensive use of PROTECT/UNPROTECT macros throughout due to allocations during traversal. Easy to get wrong.

3. **Tail Threading**: The `tail` parameter threads through recursion but its purpose isn't always clear. Sometimes NULL, sometimes accumulated let-bindings.

4. **Dual Type System**: Must track both LamExp (input) and Aexp/Cexp/Exp (output) simultaneously. Easy to confuse which type is which.

5. **Replacements Table**: The `LamExpTable` accumulates symbol→LamExp mappings that become let-bindings, but their lifetime and scope aren't always obvious.

## Debugging ANF

```bash
# Enable debug output
# Uncomment DEBUG_ANF in src/common.h

# Dump ANF for inspection
./bin/fn --dump-anf path/to/file.fn
```

**Watch for**:

- Incorrect nesting of let-bindings
- Fresh symbol collisions (shouldn't happen but indicates `freshSymbol()` issues)
- GC crashes (usually from missing PROTECT/UNPROTECT)
- Type mismatches between LamExp and ANF structures

## Potential Improvements

1. **Simplify normalize dispatch**: Could the 30+ cases share more common code?
2. **Clearer tail semantics**: Document when tail is NULL vs. non-NULL
3. **Reduce PROTECT overhead**: Could intermediate allocations be batched?
4. **Better error messages**: When ANF conversion fails, why?
5. **Refactor replacements**: The hash table approach works but is it the clearest?

## Key Files

- `src/anf_normalize.c` - The implementation (1100+ lines)
- `src/anf.yaml` - ANF data structures (Exp, Aexp, Cexp)
- `src/lambda.yaml` - Input lambda structures
- `docs/ANF.md` - Original algorithm notes

## References

- [Matt Might's ANF blog post](https://matt.might.net/articles/a-normalization/)

## Tail Recursion & Wrapping Pitfalls

**Correct tail wrapping is critical**. The `tail` parameter in `normalize` functions represents the "continuation" or "context" that the current expression should return to.

### Incorrect Wrapping (Breaks Tail Recursion)

Wraps the result `t$1` in a new `let` *after* the recursive call returns, forcing a stack frame.

```scheme
;; Source: tail_call(x)
;; Bad ANF:
(let (t$1 (tail_call x))
t$1)
;; This is NOT a tail call!
```

### Correct Wrapping (Preserves Tail Recursion)

If `tail` is passed down correctly, the recursive call becomes the body of the `let` chain.

```scheme
;; Source: tail_call(x)
;; Good ANF:
(tail_call x)
;; No wrapping, jumps directly
```

**Rule of Thumb**: When normalizing a function call, if it is in tail position (i.e., `tail` parameter is NULL or empty identity), **do not bind it to a variable** just to return that variable. Return the `AppExp` (or `Cexp`) directly.
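
One way to spot the bad pattern while debugging is a small check over dumped ANF. The sketch below is hypothetical and not part of the codebase: it models ANF as nested Python tuples of the form `("let", (name, value), body)` and rewrites any `let` whose body is just its own bound variable back to the bare call.

```python
def strip_redundant_let(exp):
    """Rewrite (let (v call) v) to call, working inwards through let bodies."""
    if isinstance(exp, tuple) and len(exp) == 3 and exp[0] == "let":
        (name, value), body = exp[1], exp[2]
        body = strip_redundant_let(body)
        if body == name:
            # The let only names the result to return it: the call was in
            # tail position and should not have been bound.
            return value
        return ("let", (name, value), body)
    return exp


bad = ("let", ("t$1", ("tail_call", "x")), "t$1")
good = strip_redundant_let(bad)
```

If this function changes its input, the normalizer bound a tail call somewhere; in the example `good` is the bare `("tail_call", "x")`.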