Summary
The Go SDK AI package lacks WithAudioFile, WithAudioURL, and generic WithFile multimodal helper options that exist in the Python SDK, forcing Go agents to hand-build content parts for audio-capable models.
Context
sdk/go/ai/ exposes WithImageFile and WithImageURL but has no equivalent for audio inputs or generic file types. The Python SDK's multimodal.py supports audio and generic file content parts, enabling agents to work with Anthropic, Gemini, and OpenAI audio models using the same ergonomic option style. Go agents targeting these models must manually construct ContentPart slices, bypassing the SDK's abstraction layer and duplicating provider-specific encoding logic. This parity gap will grow as multimodal models expand.
Scope
In Scope
- Add WithAudioFile(path string, mediaType string) RequestOption that reads a local audio file, base64-encodes it, and appends the appropriate content part.
- Add WithAudioURL(url string, mediaType string) RequestOption for URL-referenced audio.
- Add WithFile(path string, mediaType string) RequestOption as a generic file content part helper.
- Ensure the content part format is compatible with at least the OpenAI and Anthropic provider schemas already supported by the image helpers.
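The three helpers listed above could be sketched roughly as follows. This is a minimal illustration, not the SDK's actual code: the ContentPart, Request, and RequestOption definitions, the "audio" and "file" part type names, and the newInlinePart and withLocalFile helpers are all hypothetical stand-ins, and the real types in sdk/go/ai may look different.

```go
package main

import (
	"encoding/base64"
	"fmt"
	"os"
)

// ContentPart, Request, and RequestOption are hypothetical stand-ins for the
// SDK's real types; the actual definitions in sdk/go/ai may differ.
type ContentPart struct {
	Type      string // "audio" or "file" (assumed part type names)
	MediaType string // passed explicitly by the caller, never detected
	Data      string // base64-encoded bytes for inline parts
	URL       string // set instead of Data for URL-referenced parts
}

type Request struct {
	Parts []ContentPart
}

type RequestOption func(*Request) error

// newInlinePart base64-encodes raw bytes into an inline content part.
func newInlinePart(kind, mediaType string, raw []byte) ContentPart {
	return ContentPart{
		Type:      kind,
		MediaType: mediaType,
		Data:      base64.StdEncoding.EncodeToString(raw),
	}
}

// withLocalFile is the shared read, encode, and append path that both
// file-backed helpers reuse (hypothetical helper name).
func withLocalFile(kind, path, mediaType string) RequestOption {
	return func(r *Request) error {
		raw, err := os.ReadFile(path)
		if err != nil {
			return fmt.Errorf("read %s file %q: %w", kind, path, err)
		}
		r.Parts = append(r.Parts, newInlinePart(kind, mediaType, raw))
		return nil
	}
}

// WithAudioFile attaches a local audio file as a base64-encoded part.
func WithAudioFile(path, mediaType string) RequestOption {
	return withLocalFile("audio", path, mediaType)
}

// WithFile attaches any local file as a base64-encoded generic part.
func WithFile(path, mediaType string) RequestOption {
	return withLocalFile("file", path, mediaType)
}

// WithAudioURL attaches audio by reference; nothing is downloaded or encoded.
func WithAudioURL(url, mediaType string) RequestOption {
	return func(r *Request) error {
		r.Parts = append(r.Parts, ContentPart{Type: "audio", MediaType: mediaType, URL: url})
		return nil
	}
}

func main() {
	req := &Request{}
	opt := WithAudioURL("https://example.com/clip.mp3", "audio/mpeg")
	if err := opt(req); err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(req.Parts[0].Type, req.Parts[0].MediaType, req.Parts[0].URL)
}
```

Routing both file-backed helpers through one internal function keeps the encoding logic in a single place, which is the same reuse the existing image helper is expected to allow.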
Out of Scope
- Implementing audio transcription or output generation — input content parts only.
- Adding video content part helpers — a separate follow-up.
- Changing the existing WithImageFile / WithImageURL behavior.
Files
sdk/go/ai/multimodal.go — add WithAudioFile, WithAudioURL, WithFile option functions
sdk/go/ai/request.go — extend ContentPart type / provider serialization to handle audio and file part types if not already present
sdk/go/ai/multimodal_test.go — unit tests: each helper correctly encodes content, attaches correct media type, produces valid provider-specific JSON
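One way the unit tests might verify encoding and media type is a JSON round trip: serialize a part, parse it back, decode the base64 payload, and compare it with the original bytes. The sketch below assumes a hypothetical ContentPart wire shape; the JSON field names and the checkRoundTrip helper are illustrative only and do not reflect the SDK's or any provider's actual schema.

```go
package main

import (
	"bytes"
	"encoding/base64"
	"encoding/json"
	"fmt"
)

// ContentPart is a hypothetical wire shape; the JSON field names are
// assumptions, not the SDK's or any provider's actual schema.
type ContentPart struct {
	Type      string `json:"type"`
	MediaType string `json:"media_type"`
	Data      string `json:"data,omitempty"`
	URL       string `json:"url,omitempty"`
}

// checkRoundTrip marshals a part, unmarshals it, and verifies that the
// media type and the decoded payload match what the caller supplied.
func checkRoundTrip(part ContentPart, wantMediaType string, wantRaw []byte) error {
	encoded, err := json.Marshal(part)
	if err != nil {
		return fmt.Errorf("marshal: %w", err)
	}
	var got ContentPart
	if err := json.Unmarshal(encoded, &got); err != nil {
		return fmt.Errorf("unmarshal: %w", err)
	}
	if got.MediaType != wantMediaType {
		return fmt.Errorf("media type = %q, want %q", got.MediaType, wantMediaType)
	}
	raw, err := base64.StdEncoding.DecodeString(got.Data)
	if err != nil {
		return fmt.Errorf("decode base64: %w", err)
	}
	if !bytes.Equal(raw, wantRaw) {
		return fmt.Errorf("payload mismatch: got %d bytes, want %d", len(raw), len(wantRaw))
	}
	return nil
}

func main() {
	raw := []byte("fake audio bytes")
	part := ContentPart{
		Type:      "audio",
		MediaType: "audio/wav",
		Data:      base64.StdEncoding.EncodeToString(raw),
	}
	if err := checkRoundTrip(part, "audio/wav", raw); err != nil {
		fmt.Println("FAIL:", err)
		return
	}
	fmt.Println("ok")
}
```

In the real test file the same check would more naturally live in a table-driven test that runs once per helper and per provider serializer.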
Acceptance Criteria
- WithAudioFile reads a local audio file and attaches it as a base64-encoded content part with the specified media type
- WithAudioURL attaches a URL-referenced audio content part
- WithFile attaches a generic file content part (base64-encoded)
- Tests pass (go test ./sdk/go/...)
- Linting passes (make lint)
Notes for Contributors
Severity: MEDIUM
Use sdk/python/agentfield/ai/multimodal.py as the reference implementation. Check the existing WithImageFile implementation for the base64-encode + content-part-append pattern and reuse it rather than duplicating it. The media type should be passed explicitly by the caller (do not try to detect it from the file extension) to keep the helpers simple and avoid magic.