Lokal ML (@lokal-ml)

The Local-First Mobile LLM Infrastructure. Zero friction. Pure Rust. Native Edge AI.

Warning

Active Development — Not Production Ready
The Rust core (hardware profiler, resumable downloader, GGUF inference engine, TalaDB RAG embedder) is implemented. The React Native JSI bridge and TypeScript API surface are still being wired up. APIs are unstable and subject to change. Contributions and early feedback are very welcome — star the repo to follow progress.

The Problem

Running Small Language Models (SLMs) like Gemma directly on mobile devices is the future of privacy-first, zero-latency applications. But today, the Developer Experience (DX) is fundamentally broken:

The App Store Trap: Bundling a 1.5 GB+ quantized .gguf model directly into the app binary inflates the install size, which hurts install conversion and runs into app store cellular download limits.
The C++ Boilerplate: React Native and Flutter developers are forced to wrestle with complex C++ wrappers, asynchronous bridging overhead, and memory leaks just to stream tokens.
The RAG Fragmentation: Building offline Retrieval-Augmented Generation (RAG) requires developers to manually stitch together text chunkers, separate embedding models, and local vector databases.

Architecture

Layer	Description
🦀 Pure Rust Core	Memory-safe GGUF inference via `llama-cpp-2` — Metal GPU on iOS/macOS, NEON on Android ARM64, CPU fallback everywhere else
⚡ Zero-Overhead Bridging	Direct JSI for React Native, `flutter_rust_bridge` FFI for Flutter — per-token streaming via C-ABI callbacks, no async bottlenecks
📦 Shell & Fetch Delivery	Resumable background downloader with HTTP Range support and SHA-256 integrity verification. Device hardware is profiled before any download to prevent OOM crashes. Initial app binary stays < 50 MB
🧠 Plug-and-Play Local RAG	Optional TalaDB plugin: auto-chunks text, runs `all-MiniLM-L6-v2` locally for 384-dim embeddings, persists vectors in TalaDB's HNSW index — Rust-to-Rust, zero serialisation overhead
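
To make the "Shell & Fetch Delivery" row concrete, here is a minimal sketch of the decision logic behind an HTTP Range resume. It is not the Rust downloader's actual code: `resumeRangeHeader` and `resumeAction` are hypothetical names, and the only things assumed are standard HTTP semantics (206 Partial Content, 200 OK, 416 Range Not Satisfiable).

```typescript
// Illustrative only: hypothetical helpers, not part of the Lokal ML API.

type ResumeAction = "append" | "restart" | "verify" | "fail";

/** Build the request headers that resume a download at `bytesOnDisk`. */
function resumeRangeHeader(bytesOnDisk: number): Record<string, string> {
  if (!Number.isInteger(bytesOnDisk) || bytesOnDisk < 0) {
    throw new RangeError("bytesOnDisk must be a non-negative integer");
  }
  // An open-ended range ("bytes=N-") asks for everything from offset N onward.
  // A fresh download sends no Range header at all.
  return bytesOnDisk === 0 ? {} : { Range: `bytes=${bytesOnDisk}-` };
}

/** Decide what to do with the response, given its HTTP status. */
function resumeAction(status: number, bytesOnDisk: number): ResumeAction {
  if (status === 206) return "append"; // server honoured the range
  if (status === 200) return "restart"; // server sent the whole file; overwrite the partial
  // 416 means the offset is at or past the end of the file, so the partial
  // file may already be complete; the checksum decides.
  if (status === 416 && bytesOnDisk > 0) return "verify";
  return "fail";
}
```

Whichever branch is taken, the finished file would still go through the SHA-256 check before it is handed to the inference engine.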

Repository Structure

lokal-ml/
├── packages/
│   ├── lokal-ml-core/                # 🦀 Rust: hardware profiler, resumable downloader, GGUF engine
│   ├── lokal-ml-taladb/              # 🦀 Rust: text chunker, MiniLM embedder, TalaDB vector injector
│   ├── lokal-ml-react-native/        # 📱 React Native JSI bridge + TypeScript API
│   │   └── rust/                     #    C-ABI FFI layer (cbindgen → lokal-ml.h)
│   └── lokal-ml-taladb-plugin/       # 🔌 @lokal-ml/taladb-plugin TypeScript wrapper
└── registry/
    └── models.json                   # Model manifest (URLs, SHA-256, min RAM requirements)

Model Registry

The registry ships 9 models across three device tiers. Gemma 4 E-series is our top recommendation — these are Google's purpose-built edge models with 128K context and multimodal support, tuned specifically for on-device deployment.

Note: Gemma and MedGemma models require accepting Google's terms on HuggingFace before the download URL resolves. Qwen3 models are fully open.

⭐ Recommended — Gemma 4 Edge (flagship phones, 5–8 GB RAM)

Model ID	Active / Total Params	File Size	Min RAM	Notes
`gemma4-e2b`	2.3B / 5B	3.46 GB Q4_K_M	5 GB	Best overall. Any-to-any, 128K ctx
`gemma4-e4b`	4.5B / 8B	5.41 GB Q4_K_M	7 GB	Best quality. Any-to-any, 128K ctx

"Any-to-any" = text + image + audio input. Both require accepting Google's Gemma terms on HuggingFace.

Compact (mid-range phones, iPad — 3.5–5 GB RAM)

Model ID	Params	File Size	Min RAM	Notes
`gemma3-4b`	4B	3.16 GB QAT Q4_0	4.5 GB	Official Google quant. Multimodal*
`medgemma-4b`	4B	2.49 GB Q4_K_M	3.5 GB	Medical text + image. 128K ctx
`qwen3-4b`	4B	2.50 GB Q4_K_M	3.5 GB	256K ctx. No HF token needed

* Gemma 3 4B vision requires a separate mmproj GGUF (851 MB); the registry URL is text-only.

Nano (any modern phone, ≤ 2.5 GB RAM)

Model ID	Params	File Size	Min RAM	Notes
`gemma3-1b`	1B	1.00 GB QAT Q4_0	1.5 GB	Official Google quant. Ultra-fast
`qwen3-1.7b`	1.7B	1.83 GB Q8_0	2.5 GB	256K ctx. No HF token needed
`qwen3-0.6b`	0.6B	639 MB Q8_0	1 GB	Smallest chat model. No HF token

Embedding (RAG only)

Model ID	Size	Notes
`all-minilm-l6-v2`	~23 MB	384-dim vectors for TalaDB RAG
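
As a rough illustration of how the Min RAM column can drive model selection, the sketch below filters the chat models from the tables above by a device's memory. The `RegistryEntry` shape and the `modelsThatFit` helper are assumptions made for this example; the real schema of `registry/models.json` and the profiler's logic may differ (for instance, it may consider available rather than total RAM).

```typescript
// Hypothetical shape; the actual fields in registry/models.json may differ.
interface RegistryEntry {
  id: string;
  minRamGb: number;
  gated: boolean; // true if Google's terms must be accepted on HuggingFace
}

// Values copied from the registry tables above.
const CHAT_MODELS: RegistryEntry[] = [
  { id: "gemma4-e4b", minRamGb: 7, gated: true },
  { id: "gemma4-e2b", minRamGb: 5, gated: true },
  { id: "gemma3-4b", minRamGb: 4.5, gated: true },
  { id: "medgemma-4b", minRamGb: 3.5, gated: true },
  { id: "qwen3-4b", minRamGb: 3.5, gated: false },
  { id: "qwen3-1.7b", minRamGb: 2.5, gated: false },
  { id: "gemma3-1b", minRamGb: 1.5, gated: true },
  { id: "qwen3-0.6b", minRamGb: 1, gated: false },
];

/** IDs of models whose Min RAM fits the device, most demanding first. */
function modelsThatFit(deviceRamGb: number, allowGated = true): string[] {
  return CHAT_MODELS
    .filter((m) => m.minRamGb <= deviceRamGb && (allowGated || !m.gated))
    .sort((a, b) => b.minRamGb - a.minRamGb) // stable sort keeps table order on ties
    .map((m) => m.id);
}
```

For a 4 GB device, `modelsThatFit(4, false)` would return only the Qwen3 models that fit, which is a convenient default when you cannot rely on users having accepted the Gemma terms.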

Bring your own GGUF

Pass an absolute path instead of a model ID to load any GGUF you manage yourself:

const ai = await Lokal.init({ model: '/path/to/your/custom.gguf' });

Useful for private fine-tunes, enterprise models, or any quantisation not in the registry. The hardware profiler still runs, but SHA-256 verification is your responsibility.
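
Since integrity checking is left to you for custom files, one option is to record a digest when you publish the model and compare it after download. The sketch below uses Node's built-in `crypto` and `fs` modules, so it suits a build or release script rather than on-device code; `sha256OfFile` and `digestsMatch` are hypothetical helper names, not part of the Lokal ML API.

```typescript
import { createHash } from "node:crypto";
import { createReadStream } from "node:fs";

/** Stream a file through SHA-256 so multi-gigabyte GGUFs never sit in memory. */
async function sha256OfFile(path: string): Promise<string> {
  const hash = createHash("sha256");
  for await (const chunk of createReadStream(path)) {
    hash.update(chunk);
  }
  return hash.digest("hex");
}

/** Compare two hex digests, ignoring case and surrounding whitespace. */
function digestsMatch(actual: string, expected: string): boolean {
  return actual.trim().toLowerCase() === expected.trim().toLowerCase();
}
```

A release script could call `sha256OfFile('/path/to/your/custom.gguf')` once and ship the resulting hex string alongside the download URL, so the app has something to compare against.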

Packages

Package	Status	Description
`@lokal-ml/react-native`	🚧 In Development	Core engine — hardware check, model download, GGUF inference
`@lokal-ml/taladb-plugin`	🚧 In Development	Optional RAG layer — offline vector memory via TalaDB

Developer Experience

import { Lokal, ModelManager } from '@lokal-ml/react-native';
import { TalaPlugin } from '@lokal-ml/taladb-plugin';
import { openDB } from '@taladb/react-native';

// Run this inside an async function; setProgress and setReply stand in for your app's state setters.
// Any ID from the Model Registry works here.
const MODEL_ID = 'gemma4-e2b';

// 1. Profiler prevents OOM crashes on older devices
const canRun = await ModelManager.checkRequirements(MODEL_ID);
if (!canRun) {
  console.log('Device cannot run local AI; falling back to cloud.');
  return;
}

// 2. Resumable background download (Wi-Fi enforced, fires only if not cached)
await ModelManager.downloadModel(MODEL_ID, {
  requireWifi: true,
  onProgress: (p) => setProgress(p),
});

// 3. Connect to local-first storage & initialise engine
const db = await openDB('local_data.db');
const ai = await Lokal.init({
  model: MODEL_ID,
  plugins: [new TalaPlugin({ db, collection: 'knowledge_base' })],
});

// 4. Ingest your documents (auto-chunked + auto-embedded locally)
await ai.plugins.TalaRAG.ingest({
  data: [{ id: 'policy_1', text: 'Enterprise SLAs require a 2-hour response time...' }],
});

// 5. Stream tokens as they are generated, with RAG context retrieved locally
await ai.chat({
  prompt: 'What is the enterprise SLA response time?',
  useRAG: true,
  // process.stdout does not exist in React Native, so append to UI state instead
  onToken: (token) => setReply((prev) => prev + token),
});
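
Step 4 relies on the plugin splitting documents into chunks before embedding. This README does not describe the chunking strategy, so the following is only a sketch of one common approach, fixed-size windows with overlap; `chunkText` and its default sizes are assumptions for illustration and are not taken from the TalaDB plugin.

```typescript
/**
 * Split text into windows of at most `maxChars` characters, where each
 * window repeats the last `overlap` characters of the previous one so that
 * sentences cut at a boundary still appear intact in one chunk.
 */
function chunkText(text: string, maxChars = 512, overlap = 64): string[] {
  if (maxChars <= 0 || overlap < 0 || overlap >= maxChars) {
    throw new RangeError("require maxChars > 0 and 0 <= overlap < maxChars");
  }
  const chunks: string[] = [];
  const step = maxChars - overlap; // how far the window advances each time
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + maxChars));
    // Stop once the current window has reached the end of the text.
    if (start + maxChars >= text.length) break;
  }
  return chunks;
}
```

Each chunk would then be embedded separately (384 dimensions with `all-MiniLM-L6-v2`) and stored as its own vector, which is why overlap matters for retrieval quality.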

App Store Compliance

✅ Initial app binary < 50 MB — no model weights bundled
✅ Weights fetched post-install via HTTP Range (resumable, survives backgrounding)
✅ requireWifi: true enforced by default
✅ Files stored in OS-designated app data directory (excluded from iCloud backup)

Development

Prerequisites: Rust stable, cmake, Node ≥ 18, pnpm ≥ 9

# Clone
git clone https://github.com/thinkgrid-labs/lokal-ml
cd lokal-ml

# Rust workspace (requires cmake for llama.cpp)
cargo fmt --all -- --check
cargo clippy --workspace -- -D warnings
cargo check --workspace
cargo test --workspace

# JS packages
pnpm install
pnpm typecheck

CI runs fmt, clippy, check, and test on every push via GitHub Actions, plus cross-compilation checks for aarch64-apple-ios and the three primary Android ABIs (aarch64, armv7, x86_64).
