Update Neuralwatt models: remove dead GLM 5.1 entries, add GLM 5.2 variants (fast, short, short & fast), and add cache-read pricing #2730

Open
bakhtiar-id wants to merge 1 commit into anomalyco:dev from bakhtiar-id:sync-neuralwatt-260622

bakhtiar-id commented Jun 22, 2026

Summary

This updates the neuralwatt provider catalog within the narrow refresh scope that was requested:

  • removes deprecated/dead GLM 5.1-era Neuralwatt entries
  • adds the missing GLM 5.2 variants
  • adds cache_read pricing to all remaining Neuralwatt model configs
  • updates the Neuralwatt README to reflect the current model lineup and cache pricing policy

Existing non-GLM Neuralwatt IDs were intentionally left unchanged.

What Changed

Removed dead Neuralwatt GLM entries

Deleted these model configs:

  • providers/neuralwatt/models/glm-5-fast.toml
  • providers/neuralwatt/models/glm-5.1-fast.toml
  • providers/neuralwatt/models/zai-org/GLM-5.1-FP8.toml

Added new GLM 5.2 variants

Added these model configs:

  • providers/neuralwatt/models/glm-5.2-fast.toml
  • providers/neuralwatt/models/glm-5.2-short.toml
  • providers/neuralwatt/models/glm-5.2-short-fast.toml

Configured pricing and limits from Neuralwatt’s live catalog:

  • GLM 5.2 family pricing:
    • input = 1.45
    • output = 4.5
    • cache_read = 0.3625
  • Short variants use:
    • context = 199_984
    • output = 199_984
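
To make the values above concrete, here is a minimal sketch of how a short-variant entry might be represented and sanity-checked. The `ModelConfig` shape, the `cost`/`limit` grouping, and the `checkConfig` helper are assumptions for illustration only; the actual TOML schema and the checks run by `bun validate` may differ.

```typescript
// Hypothetical shape for illustration; the repository's real schema may differ.
interface ModelConfig {
  cost: { input: number; output: number; cache_read: number };
  limit: { context: number; output: number };
}

// Values taken from this PR description for the GLM 5.2 short variants.
const glm52Short: ModelConfig = {
  cost: { input: 1.45, output: 4.5, cache_read: 0.3625 },
  limit: { context: 199_984, output: 199_984 },
};

// Hypothetical sanity checks a reviewer might run by hand.
function checkConfig(config: ModelConfig): string[] {
  const problems: string[] = [];
  if (config.limit.output > config.limit.context) {
    problems.push("output limit exceeds context limit");
  }
  if (config.cost.cache_read >= config.cost.input) {
    problems.push("cache_read is not cheaper than input");
  }
  return problems;
}

console.log(checkConfig(glm52Short)); // no problems expected for these values
```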

Updated cache-read pricing across remaining Neuralwatt models

Added cache_read to all remaining Neuralwatt model TOMLs, set to 25% of each model's input price per Neuralwatt's documented cache-read rate:

  • GLM 5.2 family: 0.3625
  • Kimi K2.5 family: 0.13
  • Kimi K2.6 family: 0.1725
  • Kimi K2.7 Code: 0.2375
  • Qwen3.5 397B family: 0.1725
  • Qwen3.6 35B family: 0.0725
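
The 25% rule can be spot-checked with a few lines of arithmetic. This is a sketch that assumes the rate applies to the input price, which is consistent with the GLM 5.2 figures in this PR (input 1.45, cache_read 0.3625); `cacheReadPrice` is a hypothetical helper, not something in the repository.

```typescript
// Assumption: cache reads are billed at a flat 25% of the input price.
const CACHE_READ_RATE = 0.25;

// Hypothetical helper for illustration only.
function cacheReadPrice(inputPrice: number): number {
  // Round to 4 decimal places so the result is a clean value for a TOML file.
  return Math.round(inputPrice * CACHE_READ_RATE * 1e4) / 1e4;
}

console.log(cacheReadPrice(1.45)); // 0.3625, the GLM 5.2 family value listed above
```

The input prices of the other families are not listed in this description, so only the GLM 5.2 value is checked here.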

cache_write was intentionally left unchanged.

Reasoning Options Note

GLM 5.2 reasoning options remain:

  • high
  • max

This matches Neuralwatt’s chat-completions docs, which describe high and max as the native GLM 5.2 reasoning levels, with lower OpenAI-style values normalized onto them.
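
As a rough illustration of that normalization, the sketch below collapses lower OpenAI-style levels onto the two native ones. The exact mapping is not spelled out in this PR, so the choice to map `minimal`, `low`, and `medium` to `high` is an assumption, and `normalizeReasoningLevel` is a hypothetical name rather than anything in Neuralwatt's API.

```typescript
type NativeLevel = "high" | "max";

// Hypothetical helper. Assumption: every level below "high" maps to "high".
function normalizeReasoningLevel(requested: string): NativeLevel {
  switch (requested) {
    case "max":
      return "max";
    case "high":
    case "medium":
    case "low":
    case "minimal":
      return "high";
    default:
      // Reject values that are neither native nor known OpenAI-style levels.
      throw new Error(`unknown reasoning level: ${requested}`);
  }
}

console.log(normalizeReasoningLevel("low"));
console.log(normalizeReasoningLevel("max"));
```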

Validation

Validated successfully with:

bun validate

Sources

Neuralwatt models API: https://api.neuralwatt.com/v1/models
Neuralwatt models docs: https://portal.neuralwatt.com/docs/api/models
Neuralwatt chat completions docs: https://portal.neuralwatt.com/docs/api/chat-completions
