-
Notifications
You must be signed in to change notification settings - Fork 1.7k
Processing image/ multi modal responses in function tool results? #787
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
You're correct in identifying that current function_tool support in the OpenAI Agents SDK does not yet support returning images as function outputs directly in a structured or visualizable way
Manual No structured way to keep an image in the agent’s memory for later tools. Save the image and upload a public or internal URL. @function_tool
def check_panel_on_grafana(start_time: str, end_time: str):
data = get_data_from_grafana(start_time, end_time)
image_path = plot_graph_and_save(data) # Save locally
image_url = upload_to_temp_storage(image_path) # Upload to cloud
description = f"Panel from {start_time} to {end_time}"
return {
"image_url": image_url,
"description": description
} connet with the agent agent = Agent(
name = "agent name...",
instructions = "your system prompt",
tools=[
check_panel_on_grafana,
analyze_grafana_panel,
search_logs_on_elasticsearch
]
)
# run the code with runner class |
I don't think providing the url would work, LLM wouldn't be able to see the image. |
This issue is stale because it has been open for 7 days with no activity. |
@rm-openai I think this is one of the biggest issues with the sdk |
I have seen related discussion: #341
and a related PR: #654
But seems like function tools don't support returning images as outputs yet.
I wonder what's the best workaround we'd have around this, or whether including images in the outputs would make sense for my use case?
For context, I'm building a PagerDuty alert root cause analysis agent with access to tools like this:
For the
check_panel_on_grafana
tool, since time series data could be huge, I was thinking I'd first plot the data as an image, and then feed the image into LLM along with some descriptions (start time, end time, panel name, etc.).I was thinking of just returning both the image and the text directly in the function output. Seems like that's not supported yet though.
Is my best workaround something like this? Call LLM directly and return the results?
but i guess
call_chatgpt_directly
won't have context to all the previous actions done by the agent thus far, and also, we only return text so all future actions won't get to see the actual image.The text was updated successfully, but these errors were encountered: