Fix or remove task that fails 95% of the time on 90% of models #51

## Problem

At least one PinchBench task fails 95% of the time on 90% of the models evaluated. A task that almost every model fails adds little signal for telling models apart, so it skews aggregate results and wastes compute.

## Action Needed

Identify which task this is and either:
- Fix the underlying issue (if it's a test bug)
- Remove it from the benchmark suite (if it's genuinely too hard)
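
A minimal sketch of how the task could be identified. It assumes run results are available as `(model, task, passed)` records. PinchBench's actual result format is not described in this issue, so the record shape, the function name `find_broken_tasks`, and the default thresholds are hypothetical.

```python
from collections import defaultdict


def find_broken_tasks(records, fail_rate=0.95, model_share=0.90):
    """Return tasks that fail at least `fail_rate` of runs on at least
    `model_share` of the models that ran them.

    records: iterable of (model, task, passed) tuples (hypothetical format).
    """
    # (task, model) -> [failures, runs]
    counts = defaultdict(lambda: [0, 0])
    for model, task, passed in records:
        entry = counts[(task, model)]
        entry[0] += 0 if passed else 1
        entry[1] += 1

    tested_models = defaultdict(int)   # task -> models that ran it
    failing_models = defaultdict(int)  # task -> models at or above fail_rate
    for (task, _model), (fails, runs) in counts.items():
        tested_models[task] += 1
        if fails / runs >= fail_rate:
            failing_models[task] += 1

    return sorted(
        task
        for task, n in tested_models.items()
        if failing_models[task] / n >= model_share
    )
```

Any task this returns is a candidate for inspection. The script only flags it and cannot tell a test bug from a task that is genuinely too hard, so that decision still needs a manual look at the task definition and a few failing transcripts.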

## Notes

Look for tasks with consistently low pass rates across diverse model types (not just small/cheap models).
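
To check that a low pass rate is not limited to small or cheap models, pass rates can be broken down by model group. The sketch below assumes the same hypothetical `(model, task, passed)` records plus a hand-written mapping from model name to a group label. The labels, function names, and the 5% threshold are illustrative and are not part of PinchBench.

```python
from collections import defaultdict


def pass_rate_by_group(records, model_group):
    """Pass rate of each task within each model group.

    records: iterable of (model, task, passed) tuples (hypothetical format).
    model_group: dict mapping model name to a label such as "small"
                 or "frontier" (hypothetical labels).
    """
    totals = defaultdict(lambda: [0, 0])  # (task, group) -> [passes, runs]
    for model, task, passed in records:
        key = (task, model_group.get(model, "unknown"))
        totals[key][0] += 1 if passed else 0
        totals[key][1] += 1
    return {key: passes / runs for key, (passes, runs) in totals.items()}


def low_everywhere(rates, threshold=0.05):
    """Tasks whose pass rate is at or below `threshold` in every group."""
    by_task = defaultdict(list)
    for (task, _group), rate in rates.items():
        by_task[task].append(rate)
    return sorted(t for t, rs in by_task.items() if max(rs) <= threshold)
```

A task that stays near zero in every group, including the strongest models, is more likely to have a broken test or an unreasonable difficulty level than one that only the smaller models fail.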
