feat: agentic loop, vision, Gemini/Ollama, structured outputs, prompt caching by VDurocher · Pull Request #3 · VDurocher/Swift-AI-Agent-Core

VDurocher · 2026-04-09T20:05:41Z

Summary

Adds five features to the AIAgent protocol and all provider clients.

Feature 1 — Agentic ReAct Loop

run(messages:tools:executor:maxSteps:) default impl on AIAgent protocol
Automatically executes tool calls in a loop until the model returns a final text response
Appends asHistoryMessage (assistant message carrying tool calls) + tool results to history at each step
Throws AIError.agentLoopExceeded(steps:) when maxSteps is reached
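
The loop described above can be sketched roughly as follows. This is a minimal illustration, not the PR's implementation: `ToolCall`, `ModelReply`, `HistoryItem`, `LoopError`, and `runLoop` are hypothetical stand-ins, and the sketch is kept synchronous for brevity.

```swift
// Hypothetical stand-ins for the library's types; every name here is illustrative.
struct ToolCall { let id: String; let name: String; let arguments: String }

enum ModelReply {
    case text(String)            // final answer: the loop stops
    case toolCalls([ToolCall])   // the model wants tools executed first
}

enum HistoryItem {
    case user(String)
    case assistantToolCalls([ToolCall])
    case toolResult(callID: String, output: String)
}

enum LoopError: Error, Equatable { case agentLoopExceeded(steps: Int) }

/// Calls `model` repeatedly, executing requested tools, until it returns text.
func runLoop(
    history initial: [HistoryItem],
    maxSteps: Int,
    model: ([HistoryItem]) throws -> ModelReply,
    executor: (ToolCall) throws -> String
) throws -> String {
    var history = initial
    for _ in 0..<maxSteps {
        switch try model(history) {
        case .text(let answer):
            return answer
        case .toolCalls(let calls):
            // Record the assistant turn that requested the tools, then each result.
            history.append(.assistantToolCalls(calls))
            for call in calls {
                history.append(.toolResult(callID: call.id, output: try executor(call)))
            }
        }
    }
    throw LoopError.agentLoopExceeded(steps: maxSteps)
}

// Scripted model: requests the "add" tool once, then answers from its result.
func scriptedModel(_ history: [HistoryItem]) -> ModelReply {
    for case .toolResult(callID: _, output: let output) in history {
        return .text("The sum is \(output)")
    }
    return .toolCalls([ToolCall(id: "call_1", name: "add", arguments: "2,3")])
}

// Toy tool: sums comma-separated integers.
func addTool(_ call: ToolCall) -> String {
    let sum = call.arguments.split(separator: ",").compactMap { Int($0) }.reduce(0, +)
    return String(sum)
}

let answer = try runLoop(history: [.user("What is 2 + 3?")], maxSteps: 4,
                         model: scriptedModel, executor: addTool)
print(answer) // The sum is 5
```

Bounding the loop with `maxSteps` means a model that keeps requesting tools produces an error instead of running indefinitely.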

Feature 2 — Vision / Multi-Modal

New AIImageContent enum: .url(URL) and .data(Data, mimeType: String)
AIMessage gains an images: [AIImageContent]? field; user(_:images:) factory updated
OpenAI: images encoded as data:<mime>;base64,... URLs in a ContentPart array
Anthropic: images encoded as source: {type:"url"/"base64", ...} blocks
Gemini: images encoded as inlineData: {mimeType, data} parts
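
As a rough illustration of the OpenAI encoding above, inline bytes can be turned into a data URL like this. `ImageContent` and `openAIImageURL(for:)` are hypothetical names that mirror the enum described in this section; the actual ContentPart encoding in the PR may differ.

```swift
import Foundation

// Hypothetical mirror of the enum described above; not the library's declaration.
enum ImageContent {
    case url(URL)
    case data(Data, mimeType: String)
}

/// Builds the string an OpenAI-style image content part would carry.
func openAIImageURL(for image: ImageContent) -> String {
    switch image {
    case .url(let url):
        // Remote images are passed through unchanged.
        return url.absoluteString
    case .data(let bytes, let mimeType):
        // Inline bytes become a data URL: data:<mime>;base64,<payload>
        return "data:\(mimeType);base64,\(bytes.base64EncodedString())"
    }
}

// The first four bytes of a PNG signature, used only as sample input.
let png = ImageContent.data(Data([0x89, 0x50, 0x4E, 0x47]), mimeType: "image/png")
print(openAIImageURL(for: png)) // data:image/png;base64,iVBORw==
```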

Feature 3 — Google Gemini + Ollama

New GeminiClient actor: generateContent, streamGenerateContent (SSE with alt=sse), function calling
API key passed as ?key= query parameter (not a header)
GeminiScalar Codable enum for outbound function call args; GeminiArgValue Decodable for inbound
Ollama reuses OpenAIClient with http://localhost:11434/v1 — zero extra code
AIConfiguration.validate() skips API key check for Ollama
12 new convenience initializers (Gemini + Ollama + Claude 3.7/4.x + GPT-4.1)
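
A sketch of how the `?key=` query parameter and `alt=sse` might be assembled with `URLComponents`. The host, the `v1beta` path segment, and the function name `geminiURL` are assumptions for illustration; the GeminiClient in this PR may build its URLs differently.

```swift
import Foundation

/// Builds a Gemini-style REST URL with the API key in the query string.
/// Host and path layout are assumptions, not taken from the PR.
func geminiURL(model: String, apiKey: String, streaming: Bool) -> URL? {
    var components = URLComponents()
    components.scheme = "https"
    components.host = "generativelanguage.googleapis.com"
    let method = streaming ? "streamGenerateContent" : "generateContent"
    components.path = "/v1beta/models/\(model):\(method)"
    // The key travels as a query item, not as a header.
    var query = [URLQueryItem(name: "key", value: apiKey)]
    // alt=sse asks for server-sent events on the streaming endpoint.
    if streaming { query.append(URLQueryItem(name: "alt", value: "sse")) }
    components.queryItems = query
    return components.url
}

let streamURL = geminiURL(model: "gemini-2.0-flash", apiKey: "TEST_KEY", streaming: true)
let plainURL = geminiURL(model: "gemini-2.0-flash", apiKey: "TEST_KEY", streaming: false)
```

Because the key is part of the URL, it is worth keeping request URLs out of logs.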

Feature 4 — Structured Outputs

send<T: Decodable & Sendable>(messages:as:) generic method on AIAgent protocol
sendForJSON() protocol hook overridden per provider for native JSON mode
- OpenAI: response_format: {type: "json_object"}
- Gemini: responseMimeType: "application/json" in generationConfig
- Anthropic: system prompt injection (no native JSON mode)
Strips ```json ... ``` markdown fences before decoding
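
The fence-stripping step before decoding could look roughly like this. `stripMarkdownFence`, `decodeStructured`, and `Sample` are hypothetical names; this sketch only handles a single fence wrapping the whole reply, which is an assumption about the inputs.

```swift
import Foundation

/// Removes a leading fence (with an optional "json" tag) and a trailing fence.
func stripMarkdownFence(_ text: String) -> String {
    let fence = String(repeating: "`", count: 3)
    var body = text.trimmingCharacters(in: .whitespacesAndNewlines)
    // Only strip when the reply is wrapped by both an opening and a closing fence.
    guard body.count >= 6, body.hasPrefix(fence), body.hasSuffix(fence) else { return body }
    body.removeFirst(3)
    body.removeLast(3)
    // Drop the optional language tag on the opening fence line.
    if body.hasPrefix("json") { body.removeFirst(4) }
    return body.trimmingCharacters(in: .whitespacesAndNewlines)
}

/// Decodes a model reply into `T` after removing any markdown fence.
func decodeStructured<T: Decodable>(_ type: T.Type, from reply: String) throws -> T {
    try JSONDecoder().decode(type, from: Data(stripMarkdownFence(reply).utf8))
}

struct Sample: Decodable, Equatable { let name: String; let count: Int }

let ticks = String(repeating: "`", count: 3)
let fencedReply = "\(ticks)json\n{\"name\": \"example\", \"count\": 42}\n\(ticks)"
let sample = try decodeStructured(Sample.self, from: fencedReply)
print(sample.name, sample.count) // example 42
```

Stripping is still useful with native JSON modes, since some models wrap JSON in a fence even when asked not to.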

Feature 5 — Anthropic Prompt Caching

AIMessage gains cacheControl: Bool; system(_:cached:) factory updated
AnthropicClient detects cached messages and adds anthropic-beta: prompt-caching-2024-07-31 header
SystemContent supports both string format (no cache) and blocks array (with cache_control)
ContentBlock encodes cache_control: {type: "ephemeral"} on text and image blocks when flagged
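
One way to get the two system formats described above is an enum with a custom `encode(to:)`. The sketch below is an assumption about the shape, not the PR's code: `TextBlock`, `SystemField`, `SystemRequest`, and `extraHeaders(cached:)` are hypothetical, and only text blocks are covered.

```swift
import Foundation

struct CacheControl: Encodable { let type = "ephemeral" }

// Hypothetical text-only block; the PR's ContentBlock also covers images.
struct TextBlock: Encodable {
    let type = "text"
    let text: String
    let cacheControl: CacheControl?   // nil is omitted from the JSON

    enum CodingKeys: String, CodingKey {
        case type, text
        case cacheControl = "cache_control"
    }
}

// Plain string when uncached, array of blocks when cached.
enum SystemField: Encodable {
    case plain(String)
    case blocks([TextBlock])

    func encode(to encoder: Encoder) throws {
        var container = encoder.singleValueContainer()
        switch self {
        case .plain(let text): try container.encode(text)
        case .blocks(let blocks): try container.encode(blocks)
        }
    }
}

struct SystemRequest: Encodable { let system: SystemField }

func encodeSystem(_ text: String, cached: Bool) throws -> String {
    let field: SystemField = cached
        ? .blocks([TextBlock(text: text, cacheControl: CacheControl())])
        : .plain(text)
    let encoder = JSONEncoder()
    encoder.outputFormatting = [.sortedKeys]
    return String(decoding: try encoder.encode(SystemRequest(system: field)), as: UTF8.self)
}

// The beta header is only needed when at least one message is cached.
func extraHeaders(cached: Bool) -> [String: String] {
    cached ? ["anthropic-beta": "prompt-caching-2024-07-31"] : [:]
}

print(try encodeSystem("Be brief.", cached: false))
// {"system":"Be brief."}
print(try encodeSystem("Be brief.", cached: true))
// {"system":[{"cache_control":{"type":"ephemeral"},"text":"Be brief.","type":"text"}]}
```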

Bug Fixes

Bool must precede Int/Double in Any-based switch statements to avoid NSNumber ambiguity when decoding JSON booleans via JSONSerialization — fixed in both AnthropicScalar.from and GeminiScalar.from

Test plan

swift build passes with zero errors and zero warnings under Swift 6.0 strict concurrency
OpenAI: send a message, stream a message, send with tools, send with images (GPT-4o)
Anthropic: send a message, stream, tool use, prompt caching (Claude Sonnet 4.6)
Gemini: send a message, stream, tool use, JSON mode (Gemini 2.0 Flash)
Ollama: ollamaLlama32() connects to local server, send a message
Agentic loop: verify multi-step tool call resolves correctly within maxSteps
Structured outputs: send(messages:as:) decodes a known JSON fixture
Prompt caching: confirm anthropic-beta header is present when cacheControl: true
Vision: attach JPEG data to a GPT-4o and Claude message, verify non-nil response

…ompt caching

Feature 1 — Agentic ReAct loop
- run(messages:tools:executor:maxSteps:) default impl in AIAgent protocol
- Auto-executes tool calls until model returns a final text response
- Throws agentLoopExceeded when maxSteps is reached

Feature 2 — Vision multi-modal
- New AIImageContent enum (.url / .data) with Sendable+Codable conformances
- AIMessage gains images field; user(_:images:) factory updated
- OpenAI: ContentPart encodes images as data URLs (base64)
- Anthropic: ContentBlock.image encodes URL and base64 sources
- Gemini: GeminiPart.inlineData encodes base64 bytes

Feature 3 — Google Gemini + Ollama providers
- New GeminiClient actor with generateContent / streamGenerateContent
- API key passed as ?key= query param; SSE streaming with alt=sse
- GeminiScalar Codable enum for outbound function call args encoding
- GeminiArgValue Decodable enum for inbound function call args decoding
- Ollama reuses OpenAIClient with http://localhost:11434/v1 base URL
- AIConfiguration.validate() skips API key check for Ollama provider

Feature 4 — Structured outputs
- send<T: Decodable>(messages:as:) generic method on AIAgent protocol
- sendForJSON() protocol hook overridden per provider for native JSON mode
- OpenAI: response_format {type:"json_object"}
- Gemini: responseMimeType "application/json" in generationConfig
- Anthropic: falls back to system prompt injection (no native JSON mode)
- Strips ```json ... ``` markdown fences before decoding

Feature 5 — Anthropic prompt caching
- AIMessage gains cacheControl: Bool field; system(_:cached:) factory updated
- AIMessageWithTools.asHistoryMessage carries tool calls for history encoding
- AnthropicClient detects cacheControl messages and adds anthropic-beta header
- SystemContent supports both string and blocks format for cached system messages
- ContentBlock encodes cache_control: {type:"ephemeral"} when cached

Bug fix: Bool must precede Int/Double in Any-based switch to avoid NSNumber ambiguity when decoding JSON booleans via JSONSerialization (both clients).

- Add Gemini, Ollama, vision, agentic loop, structured outputs, prompt caching
- Full convenience initializer list for all 20+ supported models
- New sections: Agentic Loop, Vision, Gemini, Ollama, Structured Outputs, Prompt Caching
- Updated architecture diagram and API reference
- Updated SwiftAIAgentCore.swift umbrella doc comments

…cancellation

SEC-02: Validate MIME type in AIImageContent against allowlist (image/jpeg, image/png, image/gif, image/webp) — throws AIError.invalidContext on unknown types
SEC-03: Enforce 20 MB max size on inline image data — prevents OOM on large inputs
SEC-05: Cap SSE buffer at 1 MB per message in NetworkClient.stream() — throws AIError.streamingError on overflow, preventing unbounded memory growth from malformed or malicious SSE streams
SEC-06: Truncate HTTP error body to 500 chars in NetworkClient.execute() — prevents leaking large or sensitive provider error responses
SEC-08: Validate tool names against the declared tool list in the agentic loop — throws AIError.invalidContext if the model requests an unknown tool
SEC-09: Add onTermination handler to streamCompletion() in OpenAIClient, AnthropicClient, and GeminiClient — cancels the inner Task when the consumer stops iterating, preventing network task leaks
SEC-10: Validate URL scheme in AIImageContent.validated() — only http/https accepted, prevents file:// or internal URLs being forwarded to provider APIs
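
The image checks in SEC-02, SEC-03, and SEC-10 might be sketched as below. All names (`ImageValidationError`, `ImageLimits`, `validateInlineImage`, `validateImageURL`) are hypothetical, the PR itself throws AIError.invalidContext, and reading "20 MB" as 20 × 1024 × 1024 bytes is an assumption.

```swift
import Foundation

// Hypothetical error type; the PR reports these cases via AIError.invalidContext.
enum ImageValidationError: Error, Equatable {
    case unsupportedMIMEType(String)
    case tooLarge(bytes: Int)
    case unsupportedScheme(String)
}

enum ImageLimits {
    static let allowedMIMETypes: Set<String> =
        ["image/jpeg", "image/png", "image/gif", "image/webp"]
    static let maxInlineBytes = 20 * 1024 * 1024   // assumes 20 MB means 20 MiB
}

/// Checks the MIME allowlist and the size cap; callers pass `data.count`.
func validateInlineImage(byteCount: Int, mimeType: String) throws {
    guard ImageLimits.allowedMIMETypes.contains(mimeType.lowercased()) else {
        throw ImageValidationError.unsupportedMIMEType(mimeType)
    }
    guard byteCount <= ImageLimits.maxInlineBytes else {
        throw ImageValidationError.tooLarge(bytes: byteCount)
    }
}

/// Accepts only http and https, so file:// and other schemes are rejected.
func validateImageURL(_ url: URL) throws {
    let scheme = url.scheme?.lowercased() ?? ""
    guard scheme == "http" || scheme == "https" else {
        throw ImageValidationError.unsupportedScheme(scheme)
    }
}
```

Running these checks before building the request keeps oversized or unexpected inputs from ever reaching the network layer.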

GeminiPart CodingKeys: use camelCase (inlineData, functionCall, functionResponse) — Gemini REST API v1 uses camelCase throughout. Snake_case was silently dropping vision and tool call parts.
GenerateContentResponse.Part and Candidate: remove redundant explicit CodingKeys that mapped camelCase to camelCase — Swift default synthesis already handles this correctly.
FunctionResponse.name: use metadata tool_name (function name) instead of tool_call_id — Gemini function_response.name must be the function name, not the opaque call identifier.
AIAgentProtocol agentic loop: add tool_name to tool result metadata alongside tool_call_id so Gemini can correctly identify function responses.
AnthropicScalar decoder: move Bool check before Int to avoid JSON boolean ambiguity in the Codable path.
AnthropicInput.toJSONString: replace JSONSerialization with JSONEncoder on the typed AnthropicScalar dict — eliminates NSNumber ambiguity and removes the only remaining use of [String: Any] in the encode path.
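
To illustrate the CodingKeys point above: with no explicit CodingKeys, Swift's synthesized Codable conformance uses the property names as JSON keys, so camelCase properties produce camelCase keys. `SketchInlineData` and `SketchPart` are hypothetical, simplified stand-ins for the PR's GeminiPart.

```swift
import Foundation

// Simplified stand-ins; the PR's GeminiPart has more cases and fields.
struct SketchInlineData: Codable, Equatable {
    let mimeType: String
    let data: String
}

struct SketchPart: Codable, Equatable {
    var text: String? = nil
    var inlineData: SketchInlineData? = nil
    // No CodingKeys enum: synthesized keys are "text" and "inlineData".
}

let part = SketchPart(inlineData: SketchInlineData(mimeType: "image/png", data: "iVBORw=="))
let partJSON = String(decoding: try JSONEncoder().encode(part), as: UTF8.self)
print(partJSON)

// Decoding a camelCase payload works with the same synthesized keys.
let incoming = Data("{\"inlineData\":{\"mimeType\":\"image/png\",\"data\":\"AAAA\"}}".utf8)
let decodedPart = try JSONDecoder().decode(SketchPart.self, from: incoming)
```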


VDurocher added 14 commits March 25, 2026 09:41

test: add AIConversationTests with mock agent — history, reset, undo

820d05e

feat: add single-prompt structured output overload send(_:as:)

5cd7034

test: add AIAgentProtocolTests covering default implementations and structured outputs

33fd7ec

feat: add AIRateLimiter — token-bucket throttle for API calls

5889e37

test: add AIRateLimiterTests covering load and utilization

9113648

chore: add CHANGELOG, SwiftLint config, and fix CI matrix

fcdcdad

feat: add AIConversation stateful wrapper for chat-style interactions

dab1e37

feat: add AIRetryPolicy with exponential backoff and jitter

764f03e

feat: add AIStreamingSession — stateful token-by-token streaming wrapper

75b1520

test: add AIStreamingSessionTests — history, reset, token callback

a15ed0e
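
The AIRetryPolicy commit listed above mentions exponential backoff with jitter. A generic sketch of that technique follows, using the "full jitter" variant; `backoffDelay` and its default values are assumptions and are not taken from the commit.

```swift
import Foundation

/// Delay before retry number `attempt` (0-based).
/// The exponential ceiling is capped, then scaled by a jitter fraction in 0...1.
func backoffDelay(
    attempt: Int,
    base: TimeInterval = 0.5,
    cap: TimeInterval = 30,
    jitterFraction: Double = Double.random(in: 0...1)
) -> TimeInterval {
    // base * 2^attempt, limited so late attempts do not wait unboundedly.
    let ceiling = min(cap, base * pow(2, Double(attempt)))
    return ceiling * jitterFraction
}

// With the jitter pinned to 1.0 the ceiling itself is returned.
print(backoffDelay(attempt: 3, jitterFraction: 1.0)) // 4.0
```

Randomizing the delay spreads out retries from many clients that failed at the same moment, which a fixed exponential schedule would not do.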

VDurocher wants to merge 14 commits into master from feature/vision-agentic-multimodel
