feat: agentic loop, vision, Gemini/Ollama, structured outputs, prompt caching #3
Open
VDurocher wants to merge 14 commits into
…ompt caching
Feature 1 — Agentic ReAct loop
- run(messages:tools:executor:maxSteps:) default impl in AIAgent protocol
- Auto-executes tool calls until model returns a final text response
- Throws agentLoopExceeded when maxSteps is reached
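
The loop described above can be sketched roughly as follows. This is a simplified, synchronous illustration, not the PR's implementation: `ModelReply`, `LoopError`, and `runLoop` are hypothetical names, history is reduced to plain strings, and the real `run(messages:tools:executor:maxSteps:)` is presumably async.

```swift
// Hypothetical, reduced types for illustration; not the PR's AIAgent API.
enum ModelReply {
    case text(String)                              // final answer, ends the loop
    case toolCall(name: String, arguments: String) // model asks for a tool
}

enum LoopError: Error, Equatable {
    case agentLoopExceeded(steps: Int)
}

/// Calls `model` until it returns text, executing each requested tool in between.
func runLoop(
    history: [String],
    maxSteps: Int,
    model: ([String]) throws -> ModelReply,
    executor: (String, String) throws -> String
) throws -> String {
    var history = history
    for _ in 0..<maxSteps {
        let reply = try model(history)
        switch reply {
        case .text(let answer):
            return answer
        case .toolCall(let name, let arguments):
            // Record both the call and its result so the next turn sees them.
            let result = try executor(name, arguments)
            history.append("assistant called \(name)(\(arguments))")
            history.append("tool \(name) returned \(result)")
        }
    }
    // The model never produced a final text answer within the step budget.
    throw LoopError.agentLoopExceeded(steps: maxSteps)
}
```

A model that keeps requesting tools is cut off after `maxSteps` model calls, which is what bounds the cost of a misbehaving tool chain.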
Feature 2 — Vision multi-modal
- New AIImageContent enum (.url / .data) with Sendable+Codable conformances
- AIMessage gains images field; user(_:images:) factory updated
- OpenAI: ContentPart encodes images as data URLs (base64)
- Anthropic: ContentBlock.image encodes URL and base64 sources
- Gemini: GeminiPart.inlineData encodes base64 bytes
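
For the OpenAI case, the data-URL encoding might look like the sketch below. `ImageContent` is a hypothetical stand-in for the PR's `AIImageContent`, and passing a remote URL through unchanged is an assumption, not something the commit message states.

```swift
import Foundation

// Hypothetical stand-in for the PR's AIImageContent enum.
enum ImageContent {
    case url(URL)
    case data(Data, mimeType: String)
}

/// Returns the string an OpenAI-style image part could carry.
func imageURLString(for image: ImageContent) -> String {
    switch image {
    case .url(let url):
        // Assumption: remote URLs are forwarded as-is.
        return url.absoluteString
    case .data(let bytes, let mimeType):
        // Inline bytes become a base64 data URL: data:<mime>;base64,<payload>
        return "data:\(mimeType);base64,\(bytes.base64EncodedString())"
    }
}
```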
Feature 3 — Google Gemini + Ollama providers
- New GeminiClient actor with generateContent / streamGenerateContent
- API key passed as ?key= query param; SSE streaming with alt=sse
- GeminiScalar Codable enum for outbound function call args encoding
- GeminiArgValue Decodable enum for inbound function call args decoding
- Ollama reuses OpenAIClient with http://localhost:11434/v1 base URL
- AIConfiguration.validate() skips API key check for Ollama provider
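
As a rough sketch of the query-parameter approach, a request URL could be assembled with `URLComponents` as below. The function name `geminiURL` is hypothetical, and `apiVersion` and `model` are caller-supplied placeholders; the host and path shape follow Google's public REST endpoint rather than anything shown in this PR.

```swift
import Foundation

/// Builds a Gemini REST URL with the API key in the query string.
/// `apiVersion` and `model` are supplied by the caller (illustrative values only).
func geminiURL(apiVersion: String, model: String, apiKey: String, stream: Bool) -> URL? {
    var components = URLComponents()
    components.scheme = "https"
    components.host = "generativelanguage.googleapis.com"
    let method = stream ? "streamGenerateContent" : "generateContent"
    components.path = "/\(apiVersion)/models/\(model):\(method)"
    var query = [URLQueryItem(name: "key", value: apiKey)]
    if stream {
        // alt=sse asks the server to frame the stream as Server-Sent Events.
        query.append(URLQueryItem(name: "alt", value: "sse"))
    }
    components.queryItems = query
    return components.url
}
```

Because the key travels in the URL, any logging of request URLs should redact the query string.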
Feature 4 — Structured outputs
- send<T: Decodable>(messages:as:) generic method on AIAgent protocol
- sendForJSON() protocol hook overridden per provider for native JSON mode
- OpenAI: response_format {type:"json_object"}
- Gemini: responseMimeType "application/json" in generationConfig
- Anthropic: falls back to system prompt injection (no native JSON mode)
- Strips ```json ... ``` markdown fences before decoding
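
The fence-stripping step could be approximated as follows. `stripCodeFence` and `decodeReply` are hypothetical helpers written for this note; the PR's actual `send(messages:as:)` may handle more cases (for example, text before or after the fence).

```swift
import Foundation

/// Removes one surrounding Markdown code fence (optionally tagged json), if present.
func stripCodeFence(_ text: String) -> String {
    let fence = String(repeating: "`", count: 3)
    var body = text.trimmingCharacters(in: .whitespacesAndNewlines)
    guard body.count >= 6, body.hasPrefix(fence), body.hasSuffix(fence) else {
        return body  // no fence: return the trimmed text unchanged
    }
    body.removeFirst(3)
    body.removeLast(3)
    if body.hasPrefix("json") {
        body.removeFirst(4)  // drop the language tag
    }
    return body.trimmingCharacters(in: .whitespacesAndNewlines)
}

/// Decodes a model reply into `T` after removing any surrounding fence.
func decodeReply<T: Decodable>(_ text: String, as type: T.Type) throws -> T {
    let json = stripCodeFence(text)
    return try JSONDecoder().decode(type, from: Data(json.utf8))
}
```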
Feature 5 — Anthropic prompt caching
- AIMessage gains cacheControl: Bool field; system(_:cached:) factory updated
- AIMessageWithTools.asHistoryMessage carries tool calls for history encoding
- AnthropicClient detects cacheControl messages and adds anthropic-beta header
- SystemContent supports both string and blocks format for cached system messages
- ContentBlock encodes cache_control: {type:"ephemeral"} when cached
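
A minimal sketch of how a cached text block might encode, assuming reduced hypothetical types (`TextBlock`, `CacheControl`) in place of the PR's `ContentBlock`. The beta header value is the one named in the PR summary.

```swift
import Foundation

// Hypothetical, reduced types; the PR's ContentBlock covers more block kinds.
struct CacheControl: Encodable {
    let type = "ephemeral"
}

struct TextBlock: Encodable {
    let type = "text"
    let text: String
    let cacheControl: CacheControl?  // nil is omitted from the JSON entirely

    enum CodingKeys: String, CodingKey {
        case type, text
        case cacheControl = "cache_control"
    }
}

/// Builds a system text block, attaching cache_control only when requested.
func systemBlock(_ text: String, cached: Bool) -> TextBlock {
    TextBlock(text: text, cacheControl: cached ? CacheControl() : nil)
}

/// Extra headers to send when at least one message is flagged for caching.
func anthropicBetaHeaders(hasCachedContent: Bool) -> [String: String] {
    hasCachedContent ? ["anthropic-beta": "prompt-caching-2024-07-31"] : [:]
}
```

Keeping `cacheControl` optional means uncached messages serialize exactly as before, so the change stays invisible to requests that never opt in.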
Bug fix: Bool must precede Int/Double in Any-based switch to avoid NSNumber
ambiguity when decoding JSON booleans via JSONSerialization (both clients).
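
The same ordering rule can be illustrated on a typed Codable path. The sketch below uses a hypothetical `Scalar` enum and `JSONDecoder`; it is not the PR's `Any`-based switch and deliberately avoids `JSONSerialization`, so it shows the ordering idea rather than the exact fix.

```swift
import Foundation

// Hypothetical scalar type; not the PR's AnthropicScalar or GeminiScalar.
enum Scalar: Decodable, Equatable {
    case bool(Bool)
    case int(Int)
    case double(Double)
    case string(String)

    init(from decoder: Decoder) throws {
        let container = try decoder.singleValueContainer()
        // Bool is tried first so a JSON true/false is never claimed by a numeric case.
        if let value = try? container.decode(Bool.self) {
            self = .bool(value)
        } else if let value = try? container.decode(Int.self) {
            self = .int(value)
        } else if let value = try? container.decode(Double.self) {
            self = .double(value)
        } else {
            let value = try container.decode(String.self)
            self = .string(value)
        }
    }
}
```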
- Add Gemini, Ollama, vision, agentic loop, structured outputs, prompt caching
- Full convenience initializer list for all 20+ supported models
- New sections: Agentic Loop, Vision, Gemini, Ollama, Structured Outputs, Prompt Caching
- Updated architecture diagram and API reference
- Updated SwiftAIAgentCore.swift umbrella doc comments
…cancellation
SEC-02: Validate MIME type in AIImageContent against allowlist (image/jpeg, image/png, image/gif, image/webp); throws AIError.invalidContext on unknown types
SEC-03: Enforce 20 MB max size on inline image data; prevents OOM on large inputs
SEC-05: Cap SSE buffer at 1 MB per message in NetworkClient.stream(); throws AIError.streamingError on overflow, preventing unbounded memory growth from malformed or malicious SSE streams
SEC-06: Truncate HTTP error body to 500 chars in NetworkClient.execute(); prevents leaking large or sensitive provider error responses
SEC-08: Validate tool names against the declared tool list in the agentic loop; throws AIError.invalidContext if the model requests an unknown tool
SEC-09: Add onTermination handler to streamCompletion() in OpenAIClient, AnthropicClient, and GeminiClient; cancels the inner Task when the consumer stops iterating, preventing network task leaks
SEC-10: Validate URL scheme in AIImageContent.validated(); only http/https accepted, which prevents file:// or internal URLs being forwarded to provider APIs
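
The image checks in SEC-02, SEC-03, and SEC-10 might be sketched as below. `ImageValidationError` and both function names are hypothetical (the PR reports these failures as `AIError.invalidContext`), and reading "20 MB" as 20 MiB is an assumption.

```swift
import Foundation

// Hypothetical error type standing in for AIError.invalidContext.
enum ImageValidationError: Error, Equatable {
    case unsupportedMIMEType(String)
    case tooLarge(bytes: Int)
    case unsupportedScheme(String)
}

let allowedImageMIMETypes: Set<String> = ["image/jpeg", "image/png", "image/gif", "image/webp"]
let maxInlineImageBytes = 20 * 1024 * 1024  // assumption: "20 MB" means 20 MiB

/// Rejects inline image data with an unknown MIME type or an oversized payload.
func validateInlineImage(_ data: Data, mimeType: String) throws {
    guard allowedImageMIMETypes.contains(mimeType.lowercased()) else {
        throw ImageValidationError.unsupportedMIMEType(mimeType)
    }
    guard data.count <= maxInlineImageBytes else {
        throw ImageValidationError.tooLarge(bytes: data.count)
    }
}

/// Accepts only http and https URLs, so file:// and similar schemes are never forwarded.
func validateImageURL(_ url: URL) throws {
    let scheme = url.scheme?.lowercased() ?? ""
    guard scheme == "http" || scheme == "https" else {
        throw ImageValidationError.unsupportedScheme(scheme)
    }
}
```

Note that a scheme check alone does not stop an https URL from pointing at an internal host; that would need a separate host or address check.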
GeminiPart CodingKeys: use camelCase (inlineData, functionCall, functionResponse). Gemini REST API v1 uses camelCase throughout. Snake_case was silently dropping vision and tool call parts.
GenerateContentResponse.Part and Candidate: remove redundant explicit CodingKeys that mapped camelCase to camelCase. Swift default synthesis already handles this correctly.
FunctionResponse.name: use metadata tool_name (function name) instead of tool_call_id. Gemini function_response.name must be the function name, not the opaque call identifier.
AIAgentProtocol agentic loop: add tool_name to tool result metadata alongside tool_call_id so Gemini can correctly identify function responses.
AnthropicScalar decoder: move Bool check before Int to avoid JSON boolean ambiguity in the Codable path.
AnthropicInput.toJSONString: replace JSONSerialization with JSONEncoder on the typed AnthropicScalar dict. This eliminates NSNumber ambiguity and removes the only remaining use of [String: Any] in the encode path.
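
To see why a key-name mismatch drops parts silently rather than failing, consider this reduced sketch. `Part` and `InlineData` are hypothetical simplifications of the PR's types; the behavior shown is a property of Swift's synthesized Codable with optional fields, not a statement about what the Gemini server accepts.

```swift
import Foundation

// Hypothetical, reduced types; the PR's GeminiPart has more cases.
struct InlineData: Codable, Equatable {
    let mimeType: String
    let data: String  // base64-encoded bytes
}

struct Part: Codable, Equatable {
    var text: String?
    var inlineData: InlineData?  // synthesized key is "inlineData" (camelCase)
}

/// Decodes a single part; unknown keys are ignored and missing optionals become nil.
func decodePart(_ json: String) -> Part? {
    try? JSONDecoder().decode(Part.self, from: Data(json.utf8))
}
```

With optional properties, a payload that uses `inline_data` still decodes successfully, only with `inlineData` set to nil, which is why this kind of bug produces no error.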
…tructured outputs
Summary
5 new features added to the AIAgent protocol and all provider clients.
Feature 1: Agentic ReAct Loop
- run(messages:tools:executor:maxSteps:) default impl on AIAgent protocol
- Appends asHistoryMessage (assistant message carrying tool calls) + tool results to history at each step
- Throws AIError.agentLoopExceeded(steps:) when maxSteps is reached
Feature 2: Vision / Multi-Modal
- AIImageContent enum: .url(URL) and .data(Data, mimeType: String)
- AIMessage gains an images: [AIImageContent]? field; user(_:images:) factory updated
- OpenAI: data:<mime>;base64,... URLs in a ContentPart array
- Anthropic: source: {type: "url"/"base64", ...} blocks
- Gemini: inline_data: {mime_type, data} parts
Feature 3: Google Gemini + Ollama
- GeminiClient actor: generateContent, streamGenerateContent (SSE with alt=sse), function calling
- API key passed as ?key= query parameter (not a header)
- GeminiScalar Codable enum for outbound function call args; GeminiArgValue Decodable for inbound
- Ollama reuses OpenAIClient with http://localhost:11434/v1 (zero extra code)
- AIConfiguration.validate() skips API key check for Ollama
Feature 4: Structured Outputs
- send<T: Decodable & Sendable>(messages:as:) generic method on AIAgent protocol
- sendForJSON() protocol hook overridden per provider for native JSON mode
- OpenAI: response_format: {type: "json_object"}
- Gemini: responseMimeType: "application/json" in generationConfig
- Strips ```json ... ``` markdown fences before decoding
Feature 5: Anthropic Prompt Caching
- AIMessage gains cacheControl: Bool; system(_:cached:) factory updated
- AnthropicClient detects cached messages and adds anthropic-beta: prompt-caching-2024-07-31 header
- SystemContent supports both string format (no cache) and blocks array (with cache_control)
- ContentBlock encodes cache_control: {type: "ephemeral"} on text and image blocks when flagged
Bug Fixes
- Bool must precede Int/Double in Any-based switch statements to avoid NSNumber ambiguity when decoding JSON booleans via JSONSerialization. Fixed in both AnthropicScalar.from and GeminiScalar.from
Test plan
- swift build passes with zero errors and zero warnings under Swift 6.0 strict concurrency
- ollamaLlama32() connects to a local server and sends a message
- Agentic loop stops at maxSteps
- send(messages:as:) decodes a known JSON fixture
- anthropic-beta header is present when cacheControl: true