The role of the Scala language and compiler and tooling in the age of LLM-supported "automatic coding"

I think we have to be quite cautious about stating what there isn’t. We’re in very early days of figuring out how LLMs and transformers encode higher-level concepts especially as you nest them more and more deeply. “We looked a bit and didn’t find it yet” is the state of most knowledge, not “we looked so extensively that we know that it isn’t there”.

1 Like

A year has passed since this thread, and given how fast things moved in the agentic coding space, it feels like the right time to pick this discussion back up.

Perspectives from outside the Scala world

Three recent pieces frame the current debate sharply:

In “A Language For Agents” published yesterday, Armin Ronacher (Flask/Ruff/Sentry creator) argues we may need new languages designed around how agents actually work. His checklist: agents want local reasoning, greppable symbols, explicit types over inference (“a language where you need the LSP to know the type creates two experiences”), results over exceptions, braces over whitespace, dependency-aware builds, no macros/re-exports/aliasing. Provocative for Scala — we score well on some (strong types, braces, Either/ZIO, incremental compilation) and poorly on others (givens as invisible context, deep inference, macros, package object re-exports).

Meanwhile, Steve Yegge (co-author of Vibe Coding) has been running 20-30 Claude Code agents in parallel and landed on Go as his language of choice: “Go is just… boring. When the diffs go by, you can always understand it.” His take: simplicity is an evolutionary advantage when agents do the writing and humans do the reviewing.

On the other end, Gabriella Gonzalez (Haskell for All) makes the case in “Beyond Agentic Coding” that current agentic tools don’t actually improve productivity — citing studies where outcomes were equal or worse when measuring completed goals rather than lines produced. His alternative vision: AI as a proof-search engine over formal specifications, where rich type systems become the central advantage.

These three positions map neatly onto what was already debated here — @Ichoran’s “Python wins via simplicity,” @odersky’s “rich types for specifications,” and @djspiewak’s “give models TASTy.”

Ecosystem updates — and a reality check on MCP

Metals gained MCP support (v1.5.3 May 2025, updated v1.6.1 Jul 2025) — typed symbol search, compilation, test execution, build imports. MCP as a standard went mainstream: adopted by every major agent platform, donated to the Linux Foundation’s Agentic AI Foundation (Dec 2025), reaching 97M monthly SDK downloads.

In practice, though, MCP ran into a serious problem: context bloat. Servers front-load every tool definition into the agent’s context window at startup. GitHub’s MCP server alone consumes roughly 25% of Sonnet’s context. With 4-5 servers connected, users saw 60-80K tokens gone before typing a single prompt — and paradoxically, more tools led to worse agent performance (wrong tool selection, hallucinated parameters). Two responses emerged:

  • CLI + Skills as a leaner pattern. Many power users replaced MCP servers with markdown-based skill files and bash scripts that use progressive disclosure (~30 tokens of frontmatter each at startup, full instructions loaded on demand). Example: instead of keeping GitHub MCP loaded, a /gh-pr skill wrapping gh pr create does the same job at a fraction of the context cost.

  • Tool Search (Anthropic, Jan 2026) addresses this from the protocol side — agents now discover and load tools on demand rather than upfront, cutting overhead by ~85%. Enabled by default in Claude Code. Still, per-session token cost tends to favor the CLI+Skills approach.
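To make the skill pattern concrete, a hypothetical `/gh-pr` skill could be a single markdown file: the YAML frontmatter (a few dozen tokens) is all the agent sees at startup, and the body is loaded only when the skill is invoked. The `name`/`description` fields follow Claude Code’s published skill format, but the content here is purely illustrative:

```markdown
---
name: gh-pr
description: Create a GitHub pull request for the current branch using the gh CLI.
---

# Creating a pull request

1. Make sure the branch is pushed: `git push -u origin HEAD`
2. Open the PR from the current branch: `gh pr create --fill`
3. Report the resulting PR URL back to the user.
```

The trade-off versus an MCP server is visible right in the file: the agent pays ~2 lines of context until the moment it actually needs the instructions.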

For Metals specifically, the tool surface is modest (~9-15 tools vs. GitHub’s 40+ or Docker’s 135), so the bloat is less acute. But the broader takeaway applies: how much context an integration consumes matters as much as the protocol powering it. Worth asking: would a set of lightweight Scala-specific Skills calling scala-cli/sbt/metals/mill CLI commands serve daily agent workflows better than a full MCP server?

Other developments worth flagging: TypePilot (EPFL, Oct 2025) showed multi-step agentic generation where Scala’s type system actively hardens code against vulnerabilities — real evidence for the “types as agent guardrails” thesis. Reasoning models have also improved markedly: Claude Opus 4.6, Gemini 3 Pro, and GPT-5.2 all handle Scala 3 (including Cats Effect and ZIO) far better than the o3-mini @Ichoran tested against last year.

Open questions — looking for practitioner input

  1. Day-to-day experience — who here is using agentic tools (Claude Code, Cursor, Copilot agent mode) with Scala regularly? Have you tried the Metals MCP integration, and if so, did context overhead push you toward CLI+Skills instead?

  2. Ronacher’s agent-friendliness checklist for Scala — how do we actually score on greppability, local reasoning, type explicitness, macro minimalism, build-system awareness? Where does Scala create the most friction for agents?

  3. TASTy as agent context: @djspiewak floated this idea a year ago. Metals MCP takes a step in that direction. Has anyone experimented further? Could TASTy be surfaced through a lightweight Skill rather than a full MCP server?

  4. CLAUDE.md / AGENTS.md for Scala repos — these instruction files are now standard across many ecosystems. Should the Scala community publish a recommended template for sbt/Mill/scala-cli projects? This might be the simplest high-impact thing we can do right now.

  5. The specification + proof path — capabilities tracking is on the Scala roadmap, TypePilot shows type-guided generation works in practice. Is there a realistic path to Scala becoming the language where agentic output is verifiably correct by construction?

  6. Boring vs. expressive — does Yegge’s Go argument hold for Scala? Does simplicity win outright when agents write code, or does Scala’s type system generate enough verification value to offset the complexity agents face?
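On question 4, even a minimal template would help. A sketch of what a community-recommended CLAUDE.md might contain for an sbt project follows — every command and convention below is illustrative, not an established standard:

```markdown
# CLAUDE.md

## Build & test
- Compile: `sbt --client compile` (prefer the thin client; do not spawn a fresh JVM per command)
- Run one suite: `sbt --client "testOnly com.example.FooSpec"`
- Format before committing: `sbt --client scalafmtAll`

## Conventions
- Scala 3 syntax; give explicit result types to public members.
- Model errors with `Either` or typed error channels, not exceptions.
- Never edit generated sources under `target/`.
```

Variants for Mill (`./mill`) and scala-cli (`scala-cli compile .`) would be near-identical, which is what makes a shared template cheap to maintain.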

Scala has genuinely distinctive assets here — TASTy, the type system, early MCP integration, the capabilities roadmap — but realizing that potential requires designing for agent consumption intentionally, not as an afterthought. Curious what others are seeing on the ground.

4 Likes

Thanks for this thorough and thoughtful input, @guersam!

Regarding language design targeting both AI agents and human readers/creators, I think the trade-off depends a lot on the agentic use case and on whether humans look at the code in e.g. reviews and maintenance. I assume we could have a balanced AI-friendly AND human-friendly Scala style, perhaps enforced by formatters and linting, and even supported by a compiler rewrite, so that inferred types are automatically expanded in relevant places, braces are inserted to signal long scopes, etc. I agree that it is interesting to investigate whether an agent supported by a typed abstract syntax tree could give the best trade-off between rich, accurate information and context spending.

Looking forward to any input from Scala vibe coders with various use cases in AI-supported software engineering!

1 Like

I believe that’s a meaningless position given how fast they evolve and how fast “how agents actually work” changes. LLMs don’t even share a foundation - most use transformers, but some are based on LSTM. Just recently they could only really write Python properly, now they’re pretty good at Rust and are getting proficient with Scala. We shouldn’t bother too much about the details of how they work or try to optimize for that, as that’ll keep changing for the foreseeable future.

5 Likes

There are 3 “readers” of the code in order of significance:

  • developers/maintainers
  • compiler
  • anything else, like scalafmt and random AI

Language creators should focus mainly on the first group, with the main focus on ergonomics for developers - the compiler should adapt to what humans write and understand, not to what is easier to implement in the compiler.

If humans can understand the source code, then humans will be able to create tools around it.

4 Likes

We are working on something like that. “Verifiably correct” asks for a lot, but “verified to stay within given boundaries” is achievable. We should have something to show in a month or so.

13 Likes

This used to be true, but we’re entering an era where I don’t think it’s certain to stay true.

What if the future is that we look at actual code about as much as we look at assembly language instructions now? Yes, a few people do (hence godbolt etc.), but mostly you write code. Are we really sure we’re going to be spending much time looking at code, as opposed to conversing about capabilities, in the future? Yes, there are a variety of significant challenges (e.g. code reuse enables changes/fixes in a way that massive duplication does not, and LLMs presently lean heavily towards the latter), but we have a lot of effort devoted to tackling the challenges.

So I think we have to be a bit more cautious these days. Languages that have capabilities not easily replicated by human or LLM (type systems and various other proofs, for instance) ought to be a bigger win in the space than languages that don’t (e.g. Go) in the long run, but there’s also a lot of path-dependence to where we end up.

3 Likes

While I understand this point, I think there is a very important distinction:
Compilers are cheap (comparatively):

  1. They are cheap to run: basically any CPU can do it.
    Compare to AI, where you either need a beefy GPU or have to pay for a subscription somewhere.
  2. They are cheap to make: again, a regular computer is fine.
    For AI it’s even worse: you need giant data centers to train those things.

Not to speak of the salaries to make all that run, on top of the development costs

Currently AI companies are hemorrhaging money, especially at their highest subscription tiers
So the question becomes: can it be ubiquitous if users have to pay the real costs?

And maybe yes! A lot of technologies were really expensive and then became really cheap (flights, for example), but others didn’t (like the Concorde)!

Furthermore, the current trend is to make AIs as similar to humans as possible, since that is how we define intelligence.
So making a language better for humans will make it better for AIs.
The reverse might not be true: making it better for AIs might not make it better for us.

So I believe we should really focus on making the language better for humans; this will prove beneficial whatever happens with AI.
(Even in the case of a singularity, I’d still write Scala for fun!)

6 Likes

I spend some time reading a bug report and noting that it looks like another bug report and is probably subsumed by it.

I think the way to make a developer more productive is not by improving tools that prove results but by tools that identify what is already known to be unproven, the detritus of tickets.

This was the thesis of a tool from 20 years ago, that organizations have institutional knowledge that is inaccessible because of silos. If you capture that knowledge, which exists in informal chats or perhaps bug reports, and make it available to queries, then people can work better; though you must also respect boundaries for IP, that is, secrets.

I don’t want my compiler to explain to me why it can’t prove some type conformance.

I want it to tell me my problem sounds like this known problem discussed on this ticket or that forum.

I especially don’t want the compiler to ask me to “run again with -explain -llm” to access this information.

6 Likes

I concur, but it’s worth interrogating that.

The reason almost nobody bothers to look at assembly language any more is that everyone is more or less 100% confident that the generated assembly language is not only more efficient than a human can generate, but also that it is essentially certain that it is correct. That’s very much not the case yet for LLM-generated code – it’s plausible that we may get there, but by no means certain.

In my immense collection of buttons (~750 of them: I choose one at random each morning), I have one that reads, “It is trivial to write a program that conforms to the spec. It is impossible to write a spec that says what you want.” That’s very much in my mind these days, since it succinctly describes the hardest problem for LLM generation.

At least for the moment, it seems like a huge benefit is being able to quickly validate the generated code, with as little effort as possible. That suggests to me that the priority for the current phase of evolution is a combination of expressiveness – it’s clear what the code is doing – and provability – having confidence that the code can’t be broken in hidden ways.

My suspicion is that both of these characteristics are going to be helpful in the long run. It’s by no means obvious, but it feels like the likely evolutionary path, even as things improve to the point where it’s less and less necessary to do manual checking.

5 Likes

I agree, and given how LLMs work, I think many of the things that make that easier for LLMs and humans are the same (like strong typechecking, because neither human nor LLM can think it through as quickly and reliably as the compiler).

But I don’t know whether the human review will be an only-for-the-moment thing.

3 Likes

I was talking to some Atlassian people this week. The theory (not an endorsement) is that you could export chats, transcripts, etc. to a Jira issue, and then their agent has access to this index across a whole org. It remains to be seen whether there is enough computing power to scale, but they already claim to have 30 billion objects indexed; deep text is a whole other story.

1 Like

I have some thoughts on this. Up front disclaimers: I work for Nvidia, so I both have a literally vested interest in AI’s success and a ton of exposure to a lot of the present and future of this space. My opinions are my own. Nothing I’ve written here is beyond what any other reasonably informed person would know (though I have probably used more of the frontier models more extensively than most people, so I have a pretty wide sampling to draw on).

Scala is in a strange place with respect to vibe coding. On the positive side, it has extremely good tooling for expressing and automatically verifying guardrails and requirements at scalable levels of granularity. The Metals MCP is absurdly good and very potent, but even more importantly, the language itself carrots you to effectively leverage the type system and to properly layer your architecture. Martin used to talk all the time about how Scala was designed to be a compositional language, and it succeeds quite well at that when you’re using it optimally. More on the importance of this in a bit. Scala also has a uniquely powerful ecosystem of libraries and frameworks. I realize I’m a bit biased on this front as well, but seriously everyone, y’all have no idea how good you have it until you try a different language. Did you know that Go’s channels have a kernel mutex which wraps around every operation since they couldn’t be bothered to implement lock-free concurrency? Now you do.

On the negative side, knowledge of Scala idioms is quite limited within the training corpora. This leads to some odd issues like unholy blending of functional and imperative styles (I built some immutable data structures the other day and it happily plopped return statements in the middle of otherwise very functional implementations). It can also lead to some weird hilarity when you really get down to it. The corpus of pure functional Scala is even more limited than the corpus of Scala, and I’ve personally written a significant fraction of it. I have my own quirks and idioms when I hand write code that are pretty identifiable when you know what you’re looking for (inner and back names come to mind), and it’s kind of hilarious to see the model happily regurgitating my own idioms back at me (stochastically blended with recognizable idioms of other prolific authors like Mike Pilquist). This ends up being a stylistic issue more than anything, but it speaks to how tiny the Scala-specific training corpus is compared to other languages if the model is literally overfitting on my own work.
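As an invented illustration of that blending: a model will sometimes emit something like the first definition below inside an otherwise purely functional codebase, where idiomatic Scala is the one-liner that follows:

```scala
// Imperative style a model may splice into functional code:
// a mutable cursor plus an early return.
def firstEven(xs: List[Int]): Option[Int] = {
  var rest = xs
  while (rest.nonEmpty) {
    if (rest.head % 2 == 0) return Some(rest.head)
    rest = rest.tail
  }
  None
}

// The functional equivalent, same behavior:
def firstEvenFp(xs: List[Int]): Option[Int] =
  xs.find(_ % 2 == 0)
```

Both behave identically; the stylistic problem is that the model oscillates between the two registers within a single file, which makes diffs harder to review.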

As noted repeatedly earlier, this doesn’t make the LLMs bad at writing Scala per se. Again, they seem to generalize quite well cross-language. It just makes them inconsistent at rendering Scala.

Even worse, the JVM and its mechanisms for separate compilation have a long-standing and very undeserved negative reputation. Three decades on, I think that reputation is pretty much unshakeable. This sucks, and it doubly sucks since Scala’s fate is inseparable from the JVM’s. When people are picking languages today, they care a lot less about the way the syntax makes them feel and a lot more about how easy it is to deploy and operate. Go has a million terrible flaws but it does have very fast tooling (which really matters when the agent is running test build actions a hundred times in a loop) and statically linked binaries. Scala can also have these things if you use it in a certain way, but it’s not the default mode.

Let’s take a step back… What makes a good agentic language? I would posit the answer is one in which it is possible to concisely express and comprehend automatically verifiable requirements at varying scales of granularity. I spend a lot of time reviewing and nudging LLMs into different type signatures or module boundaries. Honestly even more time than I spend reviewing their test coverage. This is something that Scala is already extremely good at. Like, really really good. What’s more, as agentic tooling continues to evolve, human review will happen further up the architectural ladder (covering progressively larger modules). Used properly, Scala scales really gracefully up this spectrum in a way very few other languages do (as Gabriella points out, this is a property shared with other functional languages, but Scala is easily the most mainstream of that lot).

The state of the art agentic tooling today is really exceptionally good but only if you can get it into a box and let it hallucinate the contents. The practical size of those boxes is getting larger over time, but until we get to AGI, there will always be some granularity of boxing. The walls of the box need to be automatically verifiable, usually with a combination of tests, external tooling (this gives you a cross-check on your problem space), and human-readable type signatures. If you have to pick one battlefield in which to be really good in the AI era, this is it.

I don’t think we need to make real changes to the language in order to fully exploit this. In a meaningful sense, we’re already there. What we do need to do is lean further into the tooling and ecosystem. As I said a year ago, exposing TASTy is an extremely effective strategy, and so is the Metals MCP. I don’t really accept the context window bloat problem since the alternative is turning raw compiler output into tokens and parsing it, but MCP authors definitely need to remember to progressively reveal information rather than just plopping it in a big dump.

As an aside, build tools which have low startup time (sbt --client, sbtn, mill) are insanely important since agents don’t seem to like interacting with persistent background shells like sbt (and they probably never will, since such tools are very unusual; fun fact, bazel is written in Java but its interaction model is similar to any other mainstream tool like go or cmake).

Additionally, it would be extremely valuable if we could go back in time ten years and, as a community, settle on a standard way of writing Scala. I obviously have my preferences on this, but frankly those preferences don’t particularly matter. It is objectively optimal for agentic tooling to lean heavily in a functional direction (in the composable, parametric, and immutable sense, not in the “avoid subtyping at all cost” sense since that really doesn’t matter). Questions like direct style vs flatMap vs for are also irrelevant, though again it would be really helpful if we just picked one. If we had a more uniform corpus of public code, we could expect much more uniform and easier to review LLM generated outputs (try vibe coding in Go or TypeScript once or twice and you’ll see exactly how this is useful), but sadly it may be too late to fix this. The next best thing is to have a curated set of published prompts which anecdotally help the frontier models do sane things. A standard set of cursor rules and CLAUDE.md bullets would go a long way.

Finally, we need to celebrate the strengths of the JVM and the strengths of the library ecosystem much more heavily. Languages are a means to an end, and that end is in one sense a runnable thing, and in another sense is a conduit to the functionality exposed by the underlying platform and ecosystem. On the former front, the JVM is extremely good at most cases that don’t involve fast startup, but people don’t believe it, so we probably need to lean more and more into Scala Native. Building up a strong “statically link your binaries” muscle would go a long way to winning the “I’m starting a new project, which language do I pick” battle. Additionally, and critically, we really need to celebrate what we have. You seriously cannot understand the kind of advanced alien technology which exists within the Scala ecosystem until you do serious work without it. It is absolutely criminal that we aren’t selling this a lot harder.

None of this is new news tbh.

On the other end, Gabriel Gonzalez (Haskell for All)

Off topic sidebar, but please don’t dead name people. I assume it wasn’t intentional, but her name is Gabriella.

12 Likes

Excellent analysis–or at least one I almost entirely agree with (on every point, save the ones I mention below).

I don’t think that’s quite fair. Go was also too lazy to distinguish between streaming and single-use select statements, which consequently means that they are building and unbuilding listeners like crazy. This means that under concurrent use, you typically have multiple steps that need to stay in sync on every operation. A mutex is a very reasonable control structure for that.

And I think the key there was simplicity more than laziness. There is one thing: select. You use it. Or you use it a bunch of times in a loop; doesn’t matter, it works the same way. Just select.

In Rust, crossbeam/mpsc channels could be used in select! but more often not, because there’s usually a smarter way to do it. Go doesn’t intend for you to do it the smarter way. Do it the obvious way! And for that obvious way, it’s not clear that lock-free would be a win.

Yes, this!!!

I would be surprised if this doesn’t change as they move more into desktop control and are interacting with all sorts of other persistent stuff.

But this is exactly backwards. I can run JVM code anywhere on anything, pretty much, with two lines on the command-line: one to install scala-cli, and the second to compile and locally run the project. Terrible for fast startup, but it runs rings around everything else in terms of end-user-reliability and ease-of-development. You don’t need to set up a cross-platform build action on GitHub or anything. You just write your code, run it like that, and it runs everywhere. With GPU access, if you pull in the bytedeco stuff.

It’s kind of crazy how easy it is.

The cross-platform binary stuff is way way better these days than it used to be, but still, Scala-on-JVM has about the best possible story if startup time isn’t an issue. Playing catchup with others, while losing some features, and losing on performance compared to the native experts, seems like not the best strategy. (Native should have a good story, but the good story should be, “And when you need native, you can still have your lovely, powerful language,” not “Hey teams, reach for native by default because mumble mumble cross platform.”)

2 Likes

I mean, Cats Effect does the same thing. Due to the fact that parallelism always caps out at nproc, the contention is firmly within the window where optimistic locking golly-wallops pessimistic, even with extremely simplistic implementations. Take a look at Fs2’s Channel and the non-Async Queue implementations, for example.
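For readers unfamiliar with the optimistic pattern under discussion: the core move is an immutable snapshot plus a compare-and-set retry, instead of taking a lock. A toy sketch of the shape of the technique (nothing like Fs2’s actual `Channel` implementation):

```scala
import java.util.concurrent.atomic.AtomicReference
import scala.annotation.tailrec

// Toy lock-free queue: state is an immutable List in an AtomicReference.
// Operations read a snapshot, build the new state, and publish it with
// compareAndSet; on contention they simply retry (optimistic locking).
final class CasQueue[A] {
  private val state = new AtomicReference[List[A]](Nil)

  @tailrec
  final def enqueue(a: A): Unit = {
    val old = state.get()
    if (state.compareAndSet(old, old :+ a)) () // published successfully
    else enqueue(a)                            // lost the race, retry
  }

  @tailrec
  final def dequeue(): Option[A] = {
    val old = state.get()
    old match {
      case Nil => None
      case head :: tail =>
        if (state.compareAndSet(old, tail)) Some(head)
        else dequeue() // someone else won, retry against the new state
    }
  }
}
```

With parallelism capped at nproc, the retry loop almost never spins in practice, which is why the optimistic approach tends to beat a pessimistic mutex in this regime.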

Multi-shot listeners are nice but they’re not really required.

I think this is a good example of where we’re kind of evaluating the wrong things. Working on projects and testing/trialling/iterating locally is relatively easy in any language. They all have environmental setup hurdles. If anything, Scala’s story is, as you pointed out, leading the pack since we don’t need to worry about virtual environments or lock files or similar.

I was referring to the deployment story. Hopefully it’s clear that statically linked binaries are the superior option when you’re talking about tools (meant to be launched directly by users), but they’re also the simplest and most ergonomic option when talking about deployed backend services. This is ironically the area where the JVM shines from a performance and operational tooling standpoint (heap dumps omg), but people don’t see that as acutely as they see the fact that it’s dead, dead simple to containerize a go/rust/scala-native binary and toss it over the wall, and you know it’s going to properly respect cgroups on the first try. And of course, don’t get me started on serverless, container provision times (which really do matter), or multicore dependence.

And this is all on top of the JVM’s really terrible (and undeserved) reputation. Like, you don’t need to sell me on how good it really is, and of course I don’t personally have any trouble containerizing a Scala JVM app and getting it right on the first try. I’m trying to conduit the masses here.

2 Likes

I agree for backend services, assuming you’re not doing something as lightweight as AWS lambdas–deployment is envisioned as “throw binary over the wall”, and you can’t count on very much being there especially if you might switch service providers, so you may as well statically link as much as possible.

So, yes, the simpler it is to throw a binary over the wall, the less it feels like friction. Even if the amount of friction in getting a JVM in shouldn’t be a blocker, one can notice the difference.

For tools meant to be launched by users, I think it’s less clear. Practically every program of nontrivial complexity has some initial setup to do; what really matters is that it actually works. If that mechanism is local compilation, pulling a JVM and packages from Maven, or tweaking some config files and running a binary, who cares? It needs to work and not take too long (and if it takes a while, be faster next time).

These concerns might be rather different soon because of AI agents, though. “Get this working in Docker” and “install VS Code for me” seem like pretty low-hanging fruit. The answer here might change faster than we can adapt to what the answer is now. Complicated deployment yesterday might turn into trivial just-ask-for-it deployment tomorrow.

What won’t change is the difference you get in scalability and robustness based on whether the AI workbox is proofy or is test-suite-driven.

The optimism can be abstracted into the mutex itself, which Go does at least as of 16 years ago. (And Java does also with synchronized among other things.) You lose the data-awareness that you have if your channel does it itself, and if you could have done your entire operation with that atomic update, that’s obviously a loss. But with Go channels, as far as I understand it, the usual use case can’t be done with an atomic update anyway because that could only work if nobody was waiting, which would only happen if you’re failing to feed your pipeline adequately. So the mutex just abstracts the usual case.

(Aside: at least for now, fiddly concurrency is a very good case for “whatever the LLM uses, it should be programmer-readable”. I at least have had a hard time getting them to produce ideal concurrent code.)

1 Like

Appreciate your valuable, detailed thoughts!

Off topic sidebar, but please don’t dead name people. I assume it wasn’t intentional, but her name is Gabriella.

It wasn’t intentional at all; my memory was quite outdated. Just updated the original post, thanks for the correction!

3 Likes
  1. Tooling. It’s incredibly complex. When I read something like “A Beginner's Guide to Using Scala Metals With its Model Context Protocol Server”, it’s full of WTF for me as a relative beginner. Why do I need to install and run VS Code? The build process already spawns sbt. In an ideal world it should be one command (or one sentence in CLAUDE.md): to start the MCP server in the background, run …, if one is not yet available.
    Interestingly, Claude Code is quite usable without MCP.
  2. Multi-project. With today’s setup, people often wait for AI output in one project while working on another. But when sbtn from project A connects to the sbt server of project B (which is blocked), we have a catastrophe.
  3. Scala has great potential to be a language for AI, because developers and assistants work on common artifacts, which should have strict semantics (unlike natural language). And because Scala allows for condensing things (via metaprogramming or DSLs), it enables building an observable program with greater functionality. What will help is analysis (but the compiler is already doing this).
  4. One opportunity I periodically think about: if we have domain models expressed in Scala, it would be good to prove facts about the domain at compile time and store proof artifacts.
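A lightweight version of item 4 is already expressible with Scala 3 opaque types: validate once in a smart constructor, and the type itself becomes the proof artifact the compiler carries to every use site. A minimal sketch (all names are invented):

```scala
// Invariant: a Percentage is always in [0, 100]. Only the smart
// constructor can produce the type, so every downstream function
// gets the proof for free, checked at compile time.
opaque type Percentage = Int

object Percentage {
  def from(n: Int): Either[String, Percentage] =
    if (n >= 0 && n <= 100) Right(n) else Left(s"$n is out of range")

  extension (p: Percentage) def value: Int = p
}

// No runtime re-validation needed: the type guarantees the range.
def applyDiscount(price: BigDecimal, off: Percentage): BigDecimal =
  price * (100 - off.value) / 100
```

Capabilities and richer refinement would strengthen this, but even this pattern means an agent cannot construct an invalid value without the compiler objecting.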
2 Likes

Is this an issue you have encountered? I almost always have multiple projects open at the same time and I always use sbt --client. And never encountered any problems. At least not any that are related to having multiple sbt servers running. I don’t use the agentic coding mode often though, but I don’t see how that would make a difference.

[insert Obama giving himself a medal meme]
But yes that is absolutely correct.

I think it depends on the size of the project, and it becomes visible when the compilation time reaches minutes.