Submission: duvo-eye-1, Holo-3.1-35B-A3B + LoRA (72.9% overall, self-reported) #29
Open
tomascupr wants to merge 2 commits into
PROMPT_TEMPLATE contained {"x":...} which .format(instruction=...) parsed
as a replacement field, raising KeyError: '"x"'. Double the braces so the
JSON example survives .format(). Verified against eval_screenspot_pro.py.
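The failure is easy to reproduce in isolation. The sketch below uses a hypothetical template string, because the actual PROMPT_TEMPLATE text is not shown in this excerpt. str.format reads {"x": ...} as a replacement field named "x", so it raises KeyError. The doubled-brace version renders as literal JSON.

```python
# Hypothetical templates: the real PROMPT_TEMPLATE in models/duvo_eye_1.py is not shown here.
BROKEN_TEMPLATE = 'Locate: {instruction}\nAnswer as JSON like {"x": 500, "y": 500}.'
FIXED_TEMPLATE = 'Locate: {instruction}\nAnswer as JSON like {{"x": 500, "y": 500}}.'


def render(template: str, instruction: str) -> str:
    """Fill the {instruction} field; str.format treats every single brace as a field delimiter."""
    return template.format(instruction=instruction)


try:
    render(BROKEN_TEMPLATE, "Save button")
except KeyError as err:
    # '{"x": 500, ...}' is parsed as a field named '"x"', which is not a keyword argument
    print("KeyError:", err)  # prints: KeyError: '"x"'

# Doubled braces ({{ and }}) are emitted as literal { and }
print(render(FIXED_TEMPLATE, "Save button"))
```

Only the template needs escaping. The instruction text is passed as a value, so braces inside a benchmark instruction are inserted verbatim.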
Update: I ran this end-to-end under your harness: 72.87% overall, matching the 72.9% reported in the PR (our harness, 1153/1581) to within a single sample, with zero malformed outputs. Two small notes:
Happy to adjust anything for the merge.
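Here is a quick arithmetic check of the "within a single sample" claim. 1153/1581 is 72.93%, and the only correct count out of 1,581 that rounds to 72.87% is 1152, so the two runs differ by exactly one sample. The pct helper is illustrative and is not part of either harness.

```python
TOTAL = 1581  # full English split of ScreenSpot-Pro, per the PR


def pct(correct: int, total: int = TOTAL) -> float:
    """Accuracy as a percentage (illustrative helper)."""
    return 100.0 * correct / total


# Every correct-count out of 1,581 whose accuracy rounds to the 72.87% seen under the repo's harness
matches = [k for k in range(TOTAL + 1) if round(pct(k), 2) == 72.87]
print(matches)              # [1152]
print(round(pct(1153), 2))  # 72.93, reported as 72.9% in the PR
print(1153 - matches[0])    # 1
```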
duvo-eye-1 — 72.9% on ScreenSpot-Pro (self-reported)
Model: duvoai/duvo-eye-1 (public weights)
Authors: Duvo
Method
Holo-3.1-35B-A3B (3B active) with a LoRA adapter (rank 64, alpha 128, lr 7e-5, 1 epoch) trained on duvoai/SynthUI (14.9k synthetic enterprise-UI rows). No benchmark data was used in training. The model emits {"x": int, "y": int} in [0, 1000] relative to the input image. The point is then scaled to absolute pixels using the original image dimensions.
Results (full English split, all 1,581 samples)
0.0% unparseable outputs (1,581/1,581 valid coordinate responses). For reference, H Company's published ScreenSpot-Pro number for the base Holo-3.1-35B-A3B is 71.5 (their model card).
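For anyone wiring up the same output format elsewhere, here is a minimal sketch of how a response might be validated and mapped back to pixels. Several details are assumptions for illustration only:

- The validity rule: a JSON object with integer x and y in [0, 1000].
- The scaling: x * width / 1000.
- The hit test: the point must fall inside an (x1, y1, x2, y2) ground-truth box.

parse_point, to_pixels and is_hit are hypothetical names, not functions from models/duvo_eye_1.py or bench_eval.py. Because the coordinates are normalized per axis, mapping them back only needs the original width and height, whatever resizing the model's input went through.

```python
import json
from typing import Optional, Tuple

# Hypothetical helpers: the validity rule and scaling below are assumptions, not the adapter's code.


def parse_point(response: str) -> Optional[Tuple[int, int]]:
    """Return (x, y) in [0, 1000] from a '{"x": int, "y": int}' reply, or None if unparseable."""
    try:
        obj = json.loads(response)
    except json.JSONDecodeError:
        return None
    if not isinstance(obj, dict):
        return None
    x, y = obj.get("x"), obj.get("y")
    # bool is a subclass of int in Python, so reject it explicitly
    if not all(isinstance(v, int) and not isinstance(v, bool) for v in (x, y)):
        return None
    if not (0 <= x <= 1000 and 0 <= y <= 1000):
        return None
    return x, y


def to_pixels(point: Tuple[int, int], width: int, height: int) -> Tuple[float, float]:
    """Map normalized [0, 1000] coordinates onto the original image size."""
    x, y = point
    return x * width / 1000.0, y * height / 1000.0


def is_hit(px: float, py: float, bbox: Tuple[float, float, float, float]) -> bool:
    """Point-in-box test against an (x1, y1, x2, y2) ground-truth box."""
    x1, y1, x2, y2 = bbox
    return x1 <= px <= x2 and y1 <= py <= y2


# Usage with made-up values
point = parse_point('{"x": 250, "y": 500}')
px, py = to_pixels(point, width=3840, height=2160)  # (960.0, 1080.0)
print(is_hit(px, py, (900, 1050, 1000, 1100)))      # True
print(parse_point("click the Save button"))         # None
```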
Protocol (honest notes)
bench_eval.py, included in the predictions dataset), not this repo's eval_screenspot_pro.py. The adapter in this PR reproduces the protocol inside your harness.
chat_template_kwargs: {"enable_thinking": false}), guided JSON decoding.
mm-processor-kwargs {"max_pixels": 8000000}.
screenspot-pro/bench_sspro_highres_duvo-eye-1.predictions.jsonl). The category table above was recomputed from those raw outputs joined against the dataset annotations.
Files changed
models/duvo_eye_1.py: adapter implementing ground_only_positive() (same interface as models/holo1_5.py), so results can be reproduced with this repo's harness.
model_factory.py: added duvo_eye_1 model type.
Usage