We could enable an AI code-review bot on the Scala repo, e.g. GitHub Copilot, so that people can request an AI code review. Greptile also seems to offer a free license for OSS projects.
I’m skeptical. We have mandatory GitHub Copilot code reviews at work, and the majority of things that they bring up are picayune at best, and frequently just plain wrong. They raise a lot of fairly “small-minded” details that just don’t understand the larger context, the code style of the project, and so on. They are especially bad at Scala 3.8 – I get many, many Copilot reviews that say “this won’t compile” which are empirically wrong: best I can tell, they’re pretty bad about telling the difference between Scala 2 and 3.
It’s all getting better, relatively rapidly, and about one time in five they find something that I decide is worth actually paying attention to. But relatively few are critical, and in the aggregate they slow the process down and require more human effort per PR in order to evaluate all those suggestions.
(Note: I’m not an AI skeptic. I use GH Copilot every day at my day job, and Claude for my own Querki. I use this stuff a lot. But that means I’m acutely aware of its limitations – it’s far from consistently good.)
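To make the Scala 2 versus Scala 3 confusion concrete, here is a minimal sketch of valid Scala 3 that a Scala 2 compiler rejects, which is roughly the kind of code a reviewer with Scala 2 habits might flag as “won’t compile”. All names (`Severity`, `Show`, `render`) are hypothetical and chosen only for illustration; this is not taken from any real review.

```scala
// Scala 3 only: every construct below is rejected by a Scala 2 compiler.
// All names are hypothetical, for illustration.

// `enum` with indentation syntax replaces sealed trait + case objects.
enum Severity:
  case Nit, Warning, Blocker

// Extension methods replace Scala 2 implicit classes.
extension (s: Severity)
  def blocksMerge: Boolean = s == Severity.Blocker

// A type class with one abstract method.
trait Show[A]:
  def show(a: A): String

// `given` replaces an implicit val; the lambda is SAM-converted to Show.
given Show[Severity] = (s: Severity) => s.toString.toLowerCase

// `using` replaces an implicit parameter list.
def render[A](a: A)(using sh: Show[A]): String = sh.show(a)

@main def scala3Demo(): Unit =
  println(render(Severity.Blocker))
  println(Severity.Nit.blocksMerge)
```

Pointing the tool at the exact Scala version of the project may reduce this class of false positive, though that is an assumption rather than something measured here.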
I generally find that with best-in-class LLM tools, my Scala code quality goes down when I get them involved unless I use a lot of discretion.
Now, the volume certainly goes way, way up. If I am trying to get a project coded quickly, I will absolutely use them.
However, as a code review tool, I find that while they do find things I wouldn’t have found myself due to lack of personal attention, they also “find” a lot of things that are fine, or which are a tradeoff and the alternatives are worse.
For now, I think code reviews should be owned by an actual person who has some reasonable amount of familiarity with Scala. (Maybe in another generation or two this will change, since the last couple of generations brought things up to the point where I found them useful for writing Scala 3.) If they want to use an LLM to help, great! But just turning on something automated without human oversight is, I think, more likely to result in distracting clutter than beneficial insights.
Again, we should keep evaluating. I just don’t think it’s time to pull that trigger yet.
I’ve put in several days total time writing Scala guideline markdown for Anthropic Claude. It takes a lot of up-front, project-specific investment of time to get it to be helpful. Without that it tends to write Python- or Java-style code, with minimal types and no use of Scala-specific features. Insist that it at a minimum compiles the code before having you look at it. However, if you let it run tests, it’ll reach a point where it decides to delete the test code unless you forbid it. It also isn’t mature enough to commit on its own. The effort will also expose a lot of minor differences in style with teammates. Be ready for that.
Once you’ve made the investment then it does much better with code generation than review. It is especially good at using really flat DSL-style Scala libraries. It doesn’t think enough to create them, but with some guidance can extend them. It is surprisingly good at solving Cats Effect type signature puzzles when I force it to loop with the compiler.
Now that I’ve made that investment the prompt “Suggest six improvements to the files we’ve worked on for this issue” will give me maybe three things to change now plus one new issue to do later. Its code review suggestions haven’t been good. Our working relationship is I review its work before I commit, and I don’t let it commit on its own.
Try not to think of the AI as a person. If it were a person it would be a sociopath with a 20-minute memory and no aesthetic sense. It’d be the sort of intern you fire before lunch on its first day.
I’ve gotten them to the point where the generated code quality is usually good. But it’s required evolving some increasingly serious AGENTS.md files, and I prompt aggressively, including telling it where I expect refactorings. My typical PR prompt is several paragraphs and a bullet list. (And I still usually do a couple of rounds of “fix this, refactor that” before submitting the PR – it’s almost never good enough on the first try.)
But yeah, we still require double-code-review by humans (as well as the person driving the PR), and I don’t expect that to change for at least the next year.
Overall, it’s still a big win, speeding up my coding by a factor of three to five I’d guess. And I use a separate high-quality LLM (Kagi Assistant) for technical research almost every day, which is vastly faster than old-fashioned Googling. But I’m still deeply skeptical about the vibe-coding approach – we’re nowhere near there yet, at least for serious code.
Not sure what model is used by Copilot, but in general using the best available model (gpt-5.5 high or xhigh) with precise context and instructions makes a huge difference. For example, provide pointers to the documentation for the exact Scala version in the project, explicit instructions on what you want the review to focus on, etc.
GitHub Copilot only supports the medium setting for Opus 4.7 and the high setting for GPT 5.5, which is in theory insufficient for in-depth reviews. The final review and optimization should indeed be done by humans, and we can’t follow OpenClaw’s example of writing and merging entirely with AI. Still, using an LLM for review can help identify issues early, allowing a PR to reach a ready state quickly.
Thanks for sharing. Setting up a good workflow and good prompts takes a lot of time. I brought this up because I think our experts could capture their Scala 3 compiler expertise in the repository. With those skills fed to the AI, it can review our PRs with higher quality and consistency.
Yes, same. Using the same approach.
I haven’t found one universally good set of instructions; each project tends to end up with its own set that pushes it towards what I consider good coding practice for that project. That is not because my style changes, but because the types of deviation from what I consider good style change depending on the project. I find it more reliable to fix the project-specific leanings than to have one big list of all rules, because when the list gets too big, it is more likely to ignore parts of it.
So I tend to have a small list of core rules with things like “Do not swallow errors. Propagate error branches until they reach the user. Let exceptions propagate or package them into an error-aware sum type. Make sure thread boundaries always pack exceptions into error-aware types.” And then depending on what it does wrong, I add specific instructions about that. (“Seriously, I mean it, do NOT swallow errors!!” except elaborated into what not to do with matching, with foreach, with toOption, etc.)
(In case you can’t tell, I have the most fights about error handling.)
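As a rough illustration of the error-handling rules quoted above, here is a minimal sketch using only the Scala standard library. `AppError`, `parsePort`, and `onWorker` are hypothetical names invented for this example, not part of anyone's actual rules file, and a real project might well use an effect library instead of `Future`.

```scala
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration.*
import scala.util.{Success, Try}

// Hypothetical error-aware sum type for this sketch.
enum AppError:
  case BadInput(raw: String, cause: Throwable)
  case WorkerFailed(cause: Throwable)

// Swallows the error: the caller cannot tell why parsing failed.
def parsePortSwallowing(raw: String): Option[Int] =
  Try(raw.toInt).toOption

// Keeps the error branch so it can propagate towards the user.
def parsePort(raw: String): Either[AppError, Int] =
  Try(raw.toInt).toEither.left.map(AppError.BadInput(raw, _))

// Thread boundary: a non-fatal exception thrown on the worker is packed
// into the sum type instead of living on as a failed Future.
def onWorker[A](body: => A)(using ExecutionContext): Future[Either[AppError, A]] =
  Future(body).transform((t: Try[A]) =>
    Success(t.toEither.left.map(AppError.WorkerFailed(_)))
  )

@main def errorDemo(): Unit =
  given ExecutionContext = ExecutionContext.global
  println(parsePortSwallowing("80a")) // the failure reason is gone
  println(parsePort("80a"))           // a Left carrying the cause
  println(Await.result(onWorker("oops".toInt), 5.seconds))
```

The `toOption` variant is the kind of pattern the "do NOT swallow errors" rule is aimed at; the `Either` variants keep the failure visible in the type.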
Yes, it’s best to use top-tier models like GPT 5.5 and Opus 4.7, but the Scala repositories also need some engineering support to capture expert knowledge, so that any AI can be used effectively.
GitHub Copilot’s not a model, it’s a shell on top of a bunch of models that you choose among. In practice, my experience cited above is almost entirely using Claude Opus 4.6 and Codex 5.5 – basically the state of the art – plus a little Opus 4.7 in my personal work.
(I didn’t even consider this stuff to be worth my time until Opus and Codex.)
Would love to see some of the things folks find most useful in their agent files.
A few very broad categories, off the top of my head:
- Nuts and bolts: how to build, how to test, coverage rules, stuff like that.
- Paradigm information. (Eg, that it’s a relatively pure Typelevel stack, no cheating; the way we’re doing dependency injection.)
- Testing approach. (Eg, no indeterminacy; favor scenario tests instead of unit tests where appropriate; allow COVERAGE-OFF for belt-and-suspenders edge cases that really mean “the world is broken”, but otherwise require 100% coverage.)
- Coding style. (Favor smallish methods, refactor aggressively; favor newtypes instead of primitives.)
- Lots of project-specific details, some of them generated by the LLM itself.
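For the “favor newtypes instead of primitives” item, one possible reading in Scala 3 is opaque type aliases. The sketch below assumes that interpretation; `UserId`, `OrderId`, and `cancelOrder` are hypothetical names, and a project might equally use a newtype library instead.

```scala
// Hypothetical newtypes over Long, encoded as Scala 3 opaque types.
object Ids:
  opaque type UserId = Long
  object UserId:
    def apply(raw: Long): UserId = raw
    extension (id: UserId) def value: Long = id

  opaque type OrderId = Long
  object OrderId:
    def apply(raw: Long): OrderId = raw
    extension (id: OrderId) def value: Long = id

import Ids.*

// Outside `Ids` the two types are distinct, so swapping the arguments,
// e.g. cancelOrder(OrderId(2L), UserId(1L)), does not type-check.
def cancelOrder(user: UserId, order: OrderId): String =
  s"user ${user.value} cancelled order ${order.value}"

@main def newtypeDemo(): Unit =
  println(cancelOrder(UserId(1L), OrderId(2L)))
```

Opaque types erase to the underlying `Long` at runtime, so this style adds compile-time safety without wrapper allocation.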