
Gemini Client #5522

Closed
wants to merge 2 commits into from
Conversation


gziz (Contributor) commented Feb 13, 2025

Why are these changes needed?

PR Message

Hi, I just read the conversation that @yu-iskw, @rohanthacker, and @ekzhu had in this PR. I've arrived a bit late to the party.

I started working on the client because I was playing with and testing the Gemini models on AutoGen, and the new Gemini SDK (google-genai) had just been released, so it seemed like a good moment. Additionally, I had read about the need for this client in previous issues.

The Gemini SDK has lots of features, far more than OpenAI's. The implementation I'm sharing doesn't include them all; however, I wanted to share the current state so we don't duplicate work.

Worth mentioning: it includes the important features (all of them, I think) from the AutoGen OpenAI client.

Here are the features the provided Gemini client supports:

  • Generate text (well, of course)
  • JSON mode and structured outputs
  • Function calling, i.e. tools
  • Streaming tokens, as defined in create_stream and tested using the chainlit example.
  • Passing images to the model; e.g. I have tested the client with M1, where the web surfer sends images.

Missing features & TODOs

  • Test with VertexAI
  • Image generation, i.e. the model returning a generated image.
  • Add the rest of the models to model_info.py
  • Fix the many pyright/mypy warnings

Some preliminary tests I have run:

  • Works with M1 and chainlit.
  • Ran the test_gemini tests inside test_openai_model_client.py

Important things to consider about the behavior of the Gemini SDK

  • By default, the Gemini SDK tries to execute functions (tools) itself, which creates a conflict, since AutoGen is expected to execute these tools.
    • I had to explicitly disable automatic_function_calling in the create_args config.
  • Gemini doesn't have a json_output config but rather a response_mime_type config, which supports not only JSON but also Enums.
  • For the reasoning models, thoughts are currently not provided in the API (source). However, the Response schema does have a thought field, so I included the necessary code to handle it in case the feature is provided in the future.
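The two SDK quirks above boil down to a couple of entries in the create_args config. Here is a minimal sketch; the key names mirror google-genai's GenerateContentConfig fields, but they are assumptions here, not an excerpt from this PR's code, so verify them against the SDK:

```python
# Sketch of the create_args adjustments described above (hypothetical;
# key names assumed to mirror google-genai's GenerateContentConfig).
create_args = {
    # AutoGen executes tools itself, so the SDK's automatic function
    # calling must be disabled to avoid the conflict described above.
    "automatic_function_calling": {"disable": True},
    # Gemini has no json_output flag; JSON mode is requested via the
    # response MIME type instead (the same mechanism covers enum output).
    "response_mime_type": "application/json",
}
```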

I’ll continue working on and testing the client; meanwhile, it would be good to get some thumbs up. Thanks!

Related issue number

#3741



yu-iskw commented Feb 13, 2025

@gziz I have been implementing almost the same thing as this pull request, though my implementation isn't finished yet. If you don't mind, I'd like to address this. What do you think?

#5524


gziz (Contributor, Author) commented Feb 13, 2025

No problem, would love to review! @yu-iskw 👍


ekzhu (Collaborator) commented Feb 13, 2025

@gziz Thanks for the PR; let's move forward with #5524.


yu-iskw commented Feb 13, 2025

@gziz Thank you for your understanding. I will let you know when I finish the implementation.
