Optimize image + text prompt ordering for better results #384

Merged on Jul 29, 2025 (1 commit)


tberends
Contributor

@tberends tberends commented Jul 23, 2025

Description

This PR improves the ordering of content in requests that combine images with text prompts. Following Google's Gemini API best practices, text prompts are now placed after image parts in the contents array when using a single image with text.

Type of change

  • Bug fix (non-breaking change which fixes an issue)

How has this change been tested, please provide a testcase or example of how you tested the change?

According to the Gemini API documentation on image prompts, when using a single image with text, the recommended approach is to place the text prompt after the image part in the contents array. This ordering has been shown to produce significantly better results in practice.

In our testing with Piping and Instrumentation Diagrams (P&IDs) using object detection, this reordering led to drastically improved accuracy in bounding box positioning. The object labels were already accurate before the change; what improved considerably with the new prompt ordering was the spatial precision of the detected elements.
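
For illustration, the ordering described above can be sketched as follows. This is a minimal sketch and not the code changed in this PR: `ImagePart` and `build_contents` are hypothetical names, and the image part is a plain stand-in rather than a real SDK type. It only shows the single-image case, where the image comes first and the text prompt comes last in the contents list.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass(frozen=True)
class ImagePart:
    """Hypothetical stand-in for an image part; not a real SDK type."""
    mime_type: str
    data: bytes


# A part is either an image or a plain text prompt.
Part = Union[ImagePart, str]


def build_contents(image: ImagePart, prompt: str) -> List[Part]:
    """Build the contents list for a single image plus a text prompt.

    The image part goes first and the text prompt goes last, which is
    the ordering this PR adopts for the single-image case.
    """
    return [image, prompt]


if __name__ == "__main__":
    image = ImagePart(mime_type="image/png", data=b"\x89PNG")
    contents = build_contents(image, "Detect all valves and return bounding boxes.")
    # Print only the type names to show the order of the parts.
    print([type(part).__name__ for part in contents])
```

In a real request, a list ordered this way would be passed as the `contents` argument of the Gemini client call; the exact part types depend on the SDK in use.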

@CLAassistant

CLAassistant commented Jul 29, 2025

CLA assistant check
All committers have signed the CLA.

@SkalskiP SkalskiP merged commit c7988b1 into roboflow:main Jul 29, 2025
1 check passed
@SkalskiP
Collaborator

Hi @tberends 👋🏻 Thanks a lot for this PR. It has been merged. Would you be willing to update the sv.Detections.from_vlm section of the supervision docs? We share prompting tips there, and I think adding this information would make a lot of sense.
