Wave Field LLM — Language Modeling Through Physics

An alternative language model architecture that replaces O(n²) self-attention with wave equation dynamics on continuous fields. O(n log n) complexity, within 5% of standard transformer quality.

What if language models could propagate information the way physics propagates waves — through fields, interference, and conservation laws?

Key Results

Model	Test PPL	Test Acc	Complexity	Params
Standard Transformer	5.9	51.0%	O(n²)	~6M
Wave Field V3.5	6.2	50.5%	O(n log n)	~6M

WikiText-2, character tokenizer, 30 epochs, same hyperparameters. Within 5% of standard transformer quality.

Computational Savings at Scale

Sequence Length	Standard O(n²)	Wave O(n log n)	Savings
512	134M ops	14.3M ops	9x
2,048	2.1B ops	68M ops	31x
8,192	34B ops	319M ops	107x
32,768	550B ops	1.5B ops	367x

What Makes This Different

This is not a modification of an existing architecture. It's a new approach where:

Tokens live on a continuous field — not just in discrete sequence positions
Information propagates via damped wave equations — each attention head is a physical wave with 3 learnable parameters (frequency, damping, phase)
Heads self-organize into roles — local grammar, medium-range context, long-range document structure
Physics-based diagnostics — every bug from V3.0 to V3.5 was found by inspecting physical quantities (energy flow, conservation, causality), not by guessing

Wave Kernel

Each attention head is parameterized as a damped oscillation:

k(t) = exp(-α·t) · cos(ω·t + φ)    for t ≥ 0 (causal)

Parameter	Controls	What It Learns
ω (frequency)	Oscillation speed	Attention pattern periodicity
α (damping)	Decay rate	How far back to attend
φ (phase)	Offset	Head diversity

Convolution is computed via FFT in O(n log n).

Architecture

Input tokens
    │
[Token Embedding + Sinusoidal Position Encoding]
    │
[Wave Field Layer ×N]
    │── Pre-norm
    │── Wave Field Attention:
    │     │── QKV projection
    │     │── Absolute position mapping (token_i → field_pos = i × stride)
    │     │── Bilinear scatter (deposit values onto continuous field)
    │     │── Wave convolution via FFT — O(n log n)
    │     │── Static multi-field coupling (cross-head interactions)
    │     │── Content-dependent gating
    │     │── Bilinear gather (read from field)
    │── Pre-norm FFN (GELU)
    │── Field Interference (every 3 layers)
    │
[LayerNorm → Output Projection (weight-tied)]
    │
Next token logits

How It Compares

Feature	Transformer	Mamba	Hyena	Wave Field
Complexity	O(n²)	O(n)	O(n log n)	O(n log n)
Content-dependent	Yes (Q·K)	Yes (selective)	Yes (gating)	Yes (gating)
Kernel type	Learned (full)	State-space	Implicit NN	Physics wave (3 params/head)
Multi-scale	Arbitrary	Via channels	Via order	Wave frequencies
Cross-head interaction	None	None	None	Static coupling
Interpretability	Attention maps	Opaque	Opaque	Physics quantities

Quick Start

git clone https://github.com/badaramoni/wave-field-llm.git
cd wave-field-llm
pip install -r requirements.txt

Train on WikiText-2

from src import WaveFieldTransformer

model = WaveFieldTransformer(
    vocab_size=8000,
    embedding_dim=256,
    num_layers=6,
    num_heads=8,
    ffn_dim=1024,
    field_size=1024,
    max_seq_len=256,
)

logits, loss = model(input_ids, labels=target_ids)

Generation Samples

Wave Field V3.5 (BPE tokenizer, WikiText-2)

[The president of the]
The president of the Li @-@ 28 is a rectagonal vait. It was written
by Vigada and Herlla, which has a chapel of 3.6 m (13 ft) above the south

[In the year]
In the year of the German Republic, Dinness and Chester Couz was given
to be in its first most successful tour.

[He was born in]
He was born in a category of the second half-time season, after
finishing back to London in January and February.

Project Structure

wave-field-llm/
├── src/
│   ├── wave_field_attention.py       # Core V3.5: wave kernels, bilinear scatter/gather, coupling
│   ├── wave_field_transformer.py     # Full model: layers, interference, embeddings
│   ├── causal_field_attention.py     # V1/V2 field attention (historical)
│   ├── causal_field_transformer.py   # V1/V2 transformer (historical)
│   └── global_context.py            # O(n) global context via causal pooling
├── benchmarks/
│   ├── benchmark_wikitext2.py        # WikiText-2 benchmark
│   ├── train_wave_v35_bpe.py         # V3.5 + BPE training
│   └── train_100m_bpe.py            # 100M parameter scaling experiment
├── diagnostics/
│   ├── diagnose_physics.py           # Physics-based model diagnostics
│   └── diagnose_bpe.py              # BPE tokenizer diagnostics
├── tokenizers/
│   ├── field_tokenizer_v2.py         # Words-first tokenizer, zero UNK
│   ├── field_tokenizer_v3.py         # V3 tokenizer with BPE support
│   └── field_aware_tokenizer.py      # Co-occurrence based tokenizer
├── docs/
│   ├── WAVE_FIELD_V3.md             # Full technical writeup
│   ├── BENCHMARK_RESULTS.md          # All benchmark data
│   └── ARCHITECTURE.md              # V1 architecture (historical)
├── tests/
│   └── test_causality.py            # Causality verification
├── requirements.txt
├── LICENSE
└── README.md

The Journey: V1 → V3.5

This architecture went through 6 major revisions. Every bug was found through physics-based diagnostics — something no other architecture supports:

Version	What Happened	How Diagnosed
V3.0	Initial physics architecture. Beat standard transformer on Shakespeare (PPL 13.5 vs 16.5)	—
V3.1	Conservation shortcut bug	Energy flow trace showed residual amplification
V3.1→WikiText	Future token leak (PPL 1.1, garbage generation)	Training PPL impossibly low → data leak
V3.2	FFT wraparound leaking future info	Causality test showed leakage
V3.5	Position shifting during generation	Kernel energy fell on empty field region
V3.5	Conservation crushing sparse fields	Short sequences → conservation rescales to zero

See docs/WAVE_FIELD_V3.md for the full technical story.

Current Status & Known Limitations

What works:

Within 5% of standard transformer on WikiText-2 (character tokenizer, 6M params)
Clean English generation with BPE tokenizer
Physics-based debugging that catches bugs no profiler can find

Known gap:

With BPE (8K vocab), there's a capacity bottleneck: Wave PPL 170.7 vs Standard PPL 91.4
This is a model capacity issue, not an architecture flaw (proven by the 5% gap at small vocab)
Currently scaling to 100M parameters to close this gap

What's next:

100M parameter training with 768-dim embeddings (addresses the vocab pressure)
Long-context benchmarks at 4K-128K tokens (where the O(n log n) advantage matters)
Hybrid architectures: wave attention for long-range + standard attention for local

Citation

If you use this work, please cite:

@software{wave_field_llm,
  title={Wave Field LLM: Language Modeling Through Physics},
  author={Badaramoni Avinash},
  year={2026},
  url={https://github.com/badaramoni/wave-field-llm}
}

License

MIT License. See LICENSE.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Wave Field LLM — Language Modeling Through Physics

Key Results

Computational Savings at Scale

What Makes This Different

Wave Kernel

Architecture

How It Compares

Quick Start

Train on WikiText-2

Generation Samples

Wave Field V3.5 (BPE tokenizer, WikiText-2)

Project Structure

The Journey: V1 → V3.5

Current Status & Known Limitations

Citation

License

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 1 Commit
benchmarks		benchmarks
diagnostics		diagnostics
docs		docs
src		src
tests		tests
tokenizers		tokenizers
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
requirements.txt		requirements.txt

Folders and files

Latest commit

History

Repository files navigation

Wave Field LLM — Language Modeling Through Physics

Key Results

Computational Savings at Scale

What Makes This Different

Wave Kernel

Architecture

How It Compares

Quick Start

Train on WikiText-2

Generation Samples

Wave Field V3.5 (BPE tokenizer, WikiText-2)

Project Structure

The Journey: V1 → V3.5

Current Status & Known Limitations

Citation

License

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages