Every token matters. Built for AI agents, not humans.
zquery is a search engine designed specifically for AI agents. Unlike traditional search APIs that return walls of text optimized for human reading, zquery delivers progressive, token-efficient answers that let AI decide how deep to go.
# Install
curl -sSL https://zquery.dev/install | bash
# One-shot answer (~42 tokens)
$ zq "claude opus 4 context window"
1,000,000 tokens (since 2026-03)
# With sources
$ zq "claude opus 4 context window" -d 2
Claude Opus 4 context window:
• Scale: 1M tokens
• Released: 2026-03 official announcement
• Pricing: prompt caching -50%
sources: docs.anthropic.com, anthropic.com/news
# AI agent mode (JSON + token budget)
$ zq "..." --format json --budget 500Every AI search API today is built on top of human search engines — optimized for humans, not machines.
| Service | What AI gets |
|---|---|
| Google Search API | Links + snippets, AI must fetch each one |
| Tavily | Fixed-format summaries, no control over depth |
| Perplexity API | English-first, poor Chinese quality |
| Serper | Essentially a Google proxy |
Common problem: none of them understand what AI actually needs.
AI agents default to Layer 1. They decide if they need more.
// Layer 1: Direct answer (~42 tokens, default)
GET /v1/query?q=claude+opus+4+context&depth=1
{
"answer": "1,000,000 tokens (since 2026-03)",
"confidence": 0.95,
"tokens_used": 42,
"more_available": true,
"continuation": "eyJxIjoi..."
}
// Layer 2: Key points + data (~400 tokens)
GET /v1/query?continuation=...&depth=2
{
"answer": "Claude Opus 4 context window:",
"points": [
{ "text": "Scale: 1M tokens", "cite": 1 },
{ "text": "Released: 2026-03 official", "cite": 1 },
{ "text": "Pricing: prompt caching -50%", "cite": 2 }
],
"sources_preview": [
{ "id": 1, "title": "Claude Opus 4 Release Notes", "domain": "docs.anthropic.com" },
{ "id": 2, "title": "What's New in Opus 4", "domain": "anthropic.com/news" }
],
"tokens_used": 387
}
// Layer 3: Full synthesis with citations (~1500 tokens)
// Layer 4: Raw content chunks (~5000 tokens, on-demand)
Query → [Intent Classification] → [Multi-Source Retrieval] → [Content Extraction] → [Layered Cache] → [Response]
│ │
│ ├─ Bing Search API (real-time index)
│ ├─ Whitelist fast-path (Wikipedia/MDN/Official Docs)
│ └─ Chinese-first sources (zhihu/WeChat/CSDN)
│
└─ Query type: factual / tutorial / news / code / opinion
→ different retrieval strategies
Tech stack:
- Backend: Go (single binary)
- Content extraction: go-readability
- LLM synthesis: Claude Haiku ($0.25/M tokens)
- Cache: Redis (Layer 1-2 cached 24h)
- Deploy: Docker, single VPS to start
Cost per query: ~$0.004 → price at $0.01 → 60% margin
| Tavily | zquery | |
|---|---|---|
| Cost | Free tier → then pay | Always cheap (Bing API cost only) |
| Chinese | Mediocre | Native (zhihu/WeChat/MDN-CN priority) |
| Response format | Fixed | Progressive (Layer 1-4) |
| Token cost | Opaque | Returns tokens_used every call |
| Self-hostable | ❌ | ✅ |
| Source control | Black box | Customizable whitelist |
| CLI | ❌ | ✅ |
# OpenAI Function Calling
{
"name": "zquery",
"description": "Search the web for current information",
"parameters": {
"query": {"type": "string"},
"depth": {"type": "integer", "enum": [1, 2, 3, 4], "default": 1},
"budget": {"type": "integer", "description": "max tokens to use"}
}
}Works with:
- OpenAI Function Calling
- Anthropic Tool Use
- MCP (Claude Desktop / Cursor / Windsurf)
- LangChain / LlamaIndex plugins
- ZyHive (first showcase)
| Plan | Free | Pro $19/mo | Team $99/mo | Enterprise |
|---|---|---|---|---|
| Calls | 100/day | 10K/mo | 100K/mo | Unlimited |
| Depth | 1-2 | All | All | All |
| Chinese optimization | ✅ | ✅ | ✅ | ✅ |
| Custom source whitelist | ❌ | ❌ | ✅ | ✅ |
| Self-hosted | ❌ | ❌ | ❌ | ✅ |
| SLA | best-effort | 99% | 99.5% | 99.9% |
- Go backend + Bing Search API + Haiku synthesis
- Chinese + English support
- CLI tool (
zq) - Open source: github.com/Zyling-ai/zquery
- ZyHive integration (first real user)
- zquery.dev domain
- API key management + billing (Stripe)
- Layered cache (Redis)
- MCP protocol adapter
- Documentation site (VitePress)
- Cursor / Claude Desktop integration guide
- LangChain / LlamaIndex plugin
- SDK: Python / Go / TypeScript
- Usage dashboard
- Vertical editions (finance, legal, medical, code)
- Enterprise private deployment
- Long-term memory (ZyHive memory tree integration)
- Tavily (2023): $100M+ valuation (Series A)
- Exa.ai: $700M valuation
- Chinese market: completely empty
- Cursor, Windsurf, Manus, AutoGPT all looking for reliable search backends
Built by zyling.ai
Tagline: "Every token matters."