Today, our official results, which are shown by default, are all run on the same hardware and through the same inference provider, OpenRouter.
However, there have also been many unofficial submissions run on different hardware, through different providers, etc.
I'm of two minds about how to handle this. On the one hand, I want to provide a consistent platform for comparing models against one another. Hardware and providers are variables that can affect that comparison, hence the current methodology of controlling for them.
On the other hand, I want the results to give a real-world view of how people are using OpenClaw. That's why today you can see every submission, from everyone around the world, if you check unofficial results.
The question for this issue, and the one I'd like some input on, is: how should we display results by default? Should we continue the way we're going? Should we add other harnesses or providers to the official results? Or should we just show all results by default?