
feat: Support Gemini client for Gemini API and Vertex AI #5524

Draft · wants to merge 9 commits into main
Conversation

@yu-iskw commented Feb 13, 2025

Why are these changes needed?

This pull request introduces support for Google's Gemini API and Vertex AI into the autogen-ext package. The changes include two new client implementations, GeminiChatCompletionClient and VertexAIChatCompletionClient, which let users interact with Gemini models for advanced chat completions (a minimal usage sketch follows the feature list below). The new clients support:

  • Long Context Handling: Efficiently manage extended conversations with context caching.
  • Vision/Multimodal Inputs: Process image inputs and other multimedia data.
  • Function Calling: Integrate function/tool calling capabilities within chat interactions.
  • Structured Output: Handle responses in JSON format for easy post-processing.
  • Robust Error Handling & Streaming Responses: Improve the reliability and interactivity of chat completions.
  • Token Management: Accurate token counting and remaining token calculations.
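
For illustration, a minimal usage sketch of the proposed GeminiChatCompletionClient; the import path and constructor arguments (model, api_key) are assumptions based on this description and may differ from the final implementation.

import asyncio
import os

from autogen_core.models import UserMessage
from autogen_ext.models.gemini import GeminiChatCompletionClient  # hypothetical import path

async def main() -> None:
    # Assumed constructor arguments; the actual signature may differ.
    client = GeminiChatCompletionClient(
        model="gemini-2.0-flash",
        api_key=os.getenv("GEMINI_API_KEY"),
    )
    result = await client.create([UserMessage(content="Explain how AI works", source="user")])
    print(result.content)

asyncio.run(main())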

Related issue number

#3741
Closes #5528

Signed-off-by: Yu Ishikawa <[email protected]>
@yu-iskw (Author) commented Feb 13, 2025

@microsoft-github-policy-service agree

@ekzhu (Collaborator) commented Feb 14, 2025

Vision/Multimodal Inputs: Process image inputs and other multimedia data.

@yu-iskw does this client support multimodal output as well?

@yu-iskw (Author) commented Feb 14, 2025

@ekzhu Good point. I am looking for a better approach to support both text generation and image generation with Gemini, since the method and configuration for each are different. I would appreciate any ideas on how to handle this.

As far as I know, OpenAI and Azure OpenAI expose text and image generation through the same API, so there is no need to switch APIs depending on whether we want to generate text or images. I suppose it would be good to add a field to the ModelInfo class indicating whether a model supports image generation. That way, we can select the appropriate API for a user's request based on that information.

class ModelInfo(TypedDict, total=False):
    vision: Required[bool]
    """True if the model supports vision, aka image input, otherwise False."""
    function_calling: Required[bool]
    """True if the model supports function calling, otherwise False."""
    json_output: Required[bool]
    """True if the model supports json output, otherwise False. Note: this is different to structured json."""
    family: Required[ModelFamily.ANY | str]
    """Model family should be one of the constants from :py:class:`ModelFamily` or a string representing an unknown model family."""

If there is no effective way to do this at the moment and it would require changing a core component such as ModelInfo, I think it might be good to start with text generation only and support image generation later.

[UPDATE]
I have come up with a tentative solution to handle this. We can add a model family such as IMAGEN_3_0 and use that information to determine whether a Gemini model supports image generation.
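
A rough sketch of what that could look like; the IMAGEN_3_0 constant and the supports_image_generation helper are hypothetical and only illustrate the idea:

# Hypothetical additions to illustrate the idea; names and placement are not final.
class ModelFamily:
    GEMINI_2_0_FLASH = "gemini-2.0-flash"
    IMAGEN_3_0 = "imagen-3.0"  # proposed family for image generation models

_IMAGE_GENERATION_FAMILIES = {ModelFamily.IMAGEN_3_0}

def supports_image_generation(family: str) -> bool:
    """Return True if the model family is known to support image generation."""
    return family in _IMAGE_GENERATION_FAMILIES

# The client could then branch on this, e.g. call generate_images for
# image generation families and generate_content otherwise.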

Sample Code

Text Generation

import os

from google import genai
from google.genai import types

client = genai.Client(api_key=os.getenv("GEMINI_API_KEY"))
response = client.models.generate_content(
    model="gemini-2.0-flash",
    contents="Explain how AI works",
    config=types.GenerateContentConfig(
        temperature=0.5,
    ),
)

Image Generation

from io import BytesIO

from google import genai  # type: ignore[import]
from google.genai import types  # type: ignore[import]
from PIL import Image

client = genai.Client(vertexai=True, location="us-central1")

response = client.models.generate_images(
    model="imagen-3.0-generate-002",
    prompt="Fuzzy bunnies in my kitchen",
    config=types.GenerateImagesConfig(
        number_of_images=4,
    ),
)
for generated_image in response.generated_images:
    image = Image.open(BytesIO(generated_image.image.image_bytes))
    image.show()

@yu-iskw (Author) commented Feb 14, 2025

NOTE: We can retrieve model information programmatically. With the Gemini API, the response includes supported_actions such as predict and generateContent. However, the Vertex AI API returns less information.

import json
import os
from pprint import pprint

from google import genai

models = [
    "gemini-1.5-flash",
    "gemini-1.5-pro",
    "gemini-2.0-flash",
    "imagen-3.0-generate-002",
    "text-embedding-004",
]

# Gemini API
print("==================== Gemini API ====================")
client = genai.Client(api_key=os.getenv("GEMINI_API_KEY"))
for model_name in models:
    model = client.models.get(model=model_name)
    pprint(f"{model_name}: {json.dumps(model.to_json_dict(), indent=2)}")

# Vertex AI
print("==================== Vertex AI ====================")
client = genai.Client(vertexai=True, location="us-central1")
for model_name in models:
    model = client.models.get(model=model_name)
    pprint(f"{model_name}: {json.dumps(model.to_json_dict(), indent=2)}")
Model Information (Gemini API)
==================== Gemini API ====================
('gemini-1.5-flash: {\n'
 '  "name": "models/gemini-1.5-flash",\n'
 '  "display_name": "Gemini 1.5 Flash",\n'
 '  "description": "Alias that points to the most recent stable version of '
 'Gemini 1.5 Flash, our fast and versatile multimodal model for scaling across '
 'diverse tasks.",\n'
 '  "version": "001",\n'
 '  "tuned_model_info": {},\n'
 '  "input_token_limit": 1000000,\n'
 '  "output_token_limit": 8192,\n'
 '  "supported_actions": [\n'
 '    "generateContent",\n'
 '    "countTokens"\n'
 '  ]\n'
 '}')
('gemini-1.5-pro: {\n'
 '  "name": "models/gemini-1.5-pro",\n'
 '  "display_name": "Gemini 1.5 Pro",\n'
 '  "description": "Stable version of Gemini 1.5 Pro, our mid-size multimodal '
 'model that supports up to 2 million tokens, released in May of 2024.",\n'
 '  "version": "001",\n'
 '  "tuned_model_info": {},\n'
 '  "input_token_limit": 2000000,\n'
 '  "output_token_limit": 8192,\n'
 '  "supported_actions": [\n'
 '    "generateContent",\n'
 '    "countTokens"\n'
 '  ]\n'
 '}')
('gemini-2.0-flash: {\n'
 '  "name": "models/gemini-2.0-flash",\n'
 '  "display_name": "Gemini 2.0 Flash",\n'
 '  "description": "Gemini 2.0 Flash",\n'
 '  "version": "2.0",\n'
 '  "tuned_model_info": {},\n'
 '  "input_token_limit": 1048576,\n'
 '  "output_token_limit": 8192,\n'
 '  "supported_actions": [\n'
 '    "generateContent",\n'
 '    "countTokens",\n'
 '    "bidiGenerateContent"\n'
 '  ]\n'
 '}')
('imagen-3.0-generate-002: {\n'
 '  "name": "models/imagen-3.0-generate-002",\n'
 '  "display_name": "Imagen 3.0 002 model",\n'
 '  "description": "Vertex served Imagen 3.0 002 model",\n'
 '  "version": "002",\n'
 '  "tuned_model_info": {},\n'
 '  "input_token_limit": 480,\n'
 '  "output_token_limit": 8192,\n'
 '  "supported_actions": [\n'
 '    "predict"\n'
 '  ]\n'
 '}')
('text-embedding-004: {\n'
 '  "name": "models/text-embedding-004",\n'
 '  "display_name": "Text Embedding 004",\n'
 '  "description": "Obtain a distributed representation of a text.",\n'
 '  "version": "004",\n'
 '  "tuned_model_info": {},\n'
 '  "input_token_limit": 2048,\n'
 '  "output_token_limit": 1,\n'
 '  "supported_actions": [\n'
 '    "embedContent"\n'
 '  ]\n'
 '}')
Model Information (Vertex AI)
==================== Vertex AI ====================
('gemini-1.5-flash: {\n'
 '  "name": "publishers/google/models/gemini-1.5-flash",\n'
 '  "version": "default",\n'
 '  "tuned_model_info": {}\n'
 '}')
('gemini-1.5-pro: {\n'
 '  "name": "publishers/google/models/gemini-1.5-pro",\n'
 '  "version": "default",\n'
 '  "tuned_model_info": {}\n'
 '}')
('gemini-2.0-flash: {\n'
 '  "name": "publishers/google/models/gemini-2.0-flash",\n'
 '  "version": "default",\n'
 '  "tuned_model_info": {}\n'
 '}')
('imagen-3.0-generate-002: {\n'
 '  "name": "publishers/google/models/imagen-3.0-generate-002",\n'
 '  "version": "default",\n'
 '  "tuned_model_info": {}\n'
 '}')
('text-embedding-004: {\n'
 '  "name": "publishers/google/models/text-embedding-004",\n'
 '  "version": "default",\n'
 '  "tuned_model_info": {}\n'
 '}')
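
One way the client could use this is to inspect supported_actions (when it is available, i.e. with the Gemini API) to decide whether to call generate_images or generate_content. A minimal sketch, with the generate helper being a hypothetical routing function:

from google import genai

def generate(client: genai.Client, model_name: str, prompt: str):
    """Hypothetical helper: route to the image or text API based on model metadata."""
    model = client.models.get(model=model_name)
    actions = model.supported_actions or []
    if "predict" in actions and "generateContent" not in actions:
        # Imagen-style models only expose predict, so use the image generation API.
        return client.models.generate_images(model=model_name, prompt=prompt)
    # Default to text/multimodal content generation.
    return client.models.generate_content(model=model_name, contents=prompt)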

@ekzhu (Collaborator) commented Feb 14, 2025

@siscanu please provide your feedback on this PR here.

@chengyu-liu-cs commented
Another thing to take into account is how to use the tools provided by Google in AutoGen together with other FunctionTools.

Just out of curiosity, do other clients (like OpenAIChatCompletionClient) support the use of tools provided by Google or others? Or do users need to create a wrapper function?

@yu-iskw (Author) commented Feb 20, 2025

Another thing to take into account is how to use the tools provided by Google in AutoGen together with other FunctionTools.

Just out of curiosity, do other clients (like OpenAIChatCompletionClient) support the use of tools provided by Google or others? Or do users need to create a wrapper function?

First of all, we should enable users to use AutoGen tools with the client; a rough sketch of how an AutoGen tool schema could be mapped to a Gemini function declaration is shown after the list below. On top of that, it would be good to discuss how to support google-genai tools with this AutoGen Gemini client. There are two possible directions:

  1. Implement a function/class to convert a Gemini tool into an AutoGen tool.
  2. Allow Gemini tools to be bound directly to this Gemini client.
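
Regarding the first sentence above (using AutoGen tools with this client), a minimal sketch of how an AutoGen tool schema could be translated into a google-genai function declaration; the autogen_tool_to_gemini helper is hypothetical, and the parameters field may need an extra mapping from JSON Schema to Gemini's Schema format:

from autogen_core.tools import FunctionTool
from google.genai import types

def autogen_tool_to_gemini(tool: FunctionTool) -> types.Tool:
    """Hypothetical helper: build a Gemini Tool from an AutoGen tool schema."""
    schema = tool.schema  # ToolSchema: name, description, parameters (JSON Schema)
    declaration = types.FunctionDeclaration(
        name=schema["name"],
        description=schema.get("description", ""),
        parameters=schema.get("parameters"),  # may need JSON Schema -> Gemini Schema conversion
    )
    return types.Tool(function_declarations=[declaration])

# Usage example with a simple Python function wrapped as an AutoGen tool.
async def get_weather(city: str) -> str:
    return f"The weather in {city} is sunny."

weather_tool = FunctionTool(get_weather, description="Get the weather for a city.")
gemini_tool = autogen_tool_to_gemini(weather_tool)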

@yu-iskw (Author) commented Feb 20, 2025

I am still working on it, but the direction for how to implement the client is becoming clear.
