Autobench the stdlib with LLMs

In our work, we often use LLMs to automate benchmarking and generate “stacked commits.” Each commit in the stack delivers a measurable performance improvement, and the series is then submitted either as a single consolidated pull request or as separate PRs. Recently, lihaoyi used a “Ralph loop” to optimize a compiler, boosting its speed by over 50%. I think a similar approach could work for the Scala standard library, even though it is already quite fast. Since JMH scores give a clear before-and-after comparison, I am confident that LLMs would be well suited to this task. What does everyone think?
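To make the acceptance step concrete, here is a minimal sketch of how such a loop might decide which commits stay in the stack. It is not taken from any existing tool: `Score`, `CommitGate`, `improves`, `stack`, and the 2% `minGain` threshold are all hypothetical names and values, and the scores are assumed to be JMH throughput figures (higher is better) with their reported error margins.

```scala
// Hypothetical sketch: all names and the 2% threshold are assumptions.
// Scores are assumed to be JMH throughput (ops/s, higher is better).
final case class Score(mean: Double, error: Double)

object CommitGate {
  /** True only if the candidate's pessimistic score (mean - error) exceeds the
    * baseline's optimistic score (mean + error) by at least `minGain`. */
  def improves(baseline: Score, candidate: Score, minGain: Double = 0.02): Boolean = {
    val lowerCandidate = candidate.mean - candidate.error
    val upperBaseline  = baseline.mean + baseline.error
    lowerCandidate > upperBaseline * (1.0 + minGain)
  }

  /** Walks the candidate commits in order. A commit is kept only if it improves
    * on the current baseline, and a kept commit becomes the new baseline. */
  def stack(baseline: Score, candidates: List[(String, Score)]): List[String] = {
    val (_, kept) = candidates.foldLeft((baseline, List.empty[String])) {
      case ((current, acc), (id, score)) =>
        if (improves(current, score)) (score, id :: acc) else (current, acc)
    }
    kept.reverse
  }
}
```

Requiring the gain to clear both error margins is a deliberately conservative choice: it rejects commits whose improvement could be benchmark noise, at the cost of discarding some small but real wins.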


It’s not a bad idea in principle, but I worry about automated LLM usage from a pure cost perspective.

We’re starting to see LLM prices rise toward the actual cost of running the services, which is on the order of ten times what people have been paying until now.

So I’d be cautious about using it in an overly automated way, given that the Scala Center very much isn’t made of money. As an occasional tool it’s plausible, but I wouldn’t bake it into processes too much.


Currently, both OpenAI and Anthropic (Claude) offer dedicated support via OSS accounts. The free plans on platforms like OpenRouter or OpenCode are enough to identify certain issues, but paying for API calls at scale is prohibitively expensive. If anyone has unused credits available, it is certainly worth a try.