A single-file, static web app to read GGUF model metadata directly in the browser and estimate memory usage (RAM/VRAM) for a chosen context window and KV cache quantization.
- Works with remote URLs that support HTTP Range requests (e.g., many Hugging Face files)
- Works with local
.gguffiles (drag-and-drop via file picker) - Detects sharded models (e.g.,
-00001-of-00013) and sums total size - No server required; everything runs client-side
Option A: Open the file directly
- Open
index.htmlin a modern browser (Chrome, Edge, Safari). - Paste a GGUF URL or choose a local
.gguffile and click the corresponding button.
Option B: Serve locally (helps with some CORS setups)
cd path/to/model-memory-calculator
python -m http.server 8000
# Then open http://localhost:8000 in your browser- GGUF URL: Paste a direct link to a
.gguf(e.g., a Hugging Face “resolve/main” URL). Many hosts allow partial download via HTTP Range. - Or choose a local GGUF file: Uses the browser’s File API; no upload leaves your machine.
- Context size (tokens): Select a preset (e.g., 4K, 16K, 128K) or choose "Custom…" and enter any positive integer token length.
- KV cache quantization: Choose how keys/values are stored in memory. Options show approximate bytes per value.
- Verbose: Prints debug logs of what’s read and how size is determined.
Click “Read URL” or “Read File”. If successful, you’ll see:
- Extracted params:
attention_heads,kv_heads,hidden_layers,hidden_size,split_count(if present) - Memory estimate: model size + KV cache size at your chosen context/quantization
- GGUF parsing: Reads just enough of the GGUF header to extract:
.attention.head_count.attention.head_count_kv.block_count.embedding_lengthsplit.count(if present)
- Remote file size:
- Tries
HEADto getContent-Length. - Falls back to a
Range: bytes=0-0request and readsContent-Range.
- Tries
- Sharded models:
- Detects
-00001-of-000NNstyle patterns in URLs or usessplit.countmetadata. - Sums sizes across parts (remote) or estimates total from a single shard (local) when possible.
- Detects
- KV cache estimate:
- Uses a simplified formula:
bytes_per_value × hidden_size × hidden_layers × context_tokens. - Shows total as:
Model + KV(MB/GB). Actual usage can vary by backend/implementation.
- Uses a simplified formula:
- GGUF versions: Supports GGUF v1–v3 headers for the fields listed above.
- CORS & Range: Remote hosts must allow cross-origin requests and HTTP Range. If not, size detection may fail; download the file and use the local option instead.
- Range ignored: Some servers respond
200without honoringRange. The app avoids downloading the full body for size only; estimates can fail in this case. - Sanity limits: Very long strings/arrays in metadata are bounded to avoid huge reads.
- Estimates only: KV cache math is intentionally simplified. Different runtimes store KV differently (e.g., layout, precision, per-head factors).
- “Failed to read params.”
- The file may not be GGUF or uses unsupported/unexpected metadata. Try another file or update the URL.
- “Could not determine file size or compute usage (CORS/Range?).”
- The remote host may block CORS or not report size via
HEAD/Range. Try serving the page locally, a different host, or the local file picker.
- The remote host may block CORS or not report size via
- Split detection issues
- Ensure URLs use a stable pattern (e.g.,
-00001-of-000xx) or thatsplit.countis present in metadata.
- Ensure URLs use a stable pattern (e.g.,
- The local file option never uploads your file; parsing happens entirely in your browser.
- For remote URLs, the app performs small range requests to read the header and determine file size. It aborts early once required metadata is read.
- No build step required. The app is a single page:
index.html— All logic and UI
- Open in a browser or serve with any static server.
This project is licensed under the terms in LICENSE.