In-source evaluation of expressions (and in-source unit tests)

mfcold · December 24, 2025, 9:08am

Lean4 has a feature that is awesome for teaching, learning, and incremental programming: real-time evaluation of source code expressions in the editor, without the need to compile or run code manually.

This is done by writing the command #eval underneath your definitions.

def f (x : ℤ) : ℤ := x * 2

#eval f 8 -- 16
#eval f 16 -- 32

This is similar to interactive notebooks for Python and .worksheet.sc files for Scala, but has the advantage that it can be used in real practical codebases with no setup burden, rather than being limited to purely educational and literature contexts.

There is also a #guard command which serves as in-source unit tests that can be left in source to serve as documentation.

#guard f 8 = 18 -- error, not true

This is a huge win for reducing the overhead of unit testing. I personally used to be a passionate advocate for writing tests, yet over the years have stopped entirely out of framework fatigue. I find it too cumbersome in Scala to justify for smaller projects. (I hate the directory structure it imposes, I hate needing to memorize and use library constructs like assertEquals, needing to write verbose class declarations for test suites, needing to write English descriptions of tests as strings, needing to re-research how to set up the build tooling for it on every new project, etc.) Not to mention how inaccessible it is for beginners.
Having a simple way to write simple in-source unit tests would make it feel much more worthwhile. Rust is another language that offers this.

I would love to see something like this in Scala. Given that we already have .worksheet.sc files and websites like Scastie for evaluating top-level expressions, I assume the architecture is mostly there?

Pre-Proposal Idea: ‘#’ as a new top-level keyword

As far as I know, the most sensible implementation for Scala would be to do exactly what .worksheet.sc already does, but for all .scala files, and activated only on lines that begin with a special keyword (I propose # as I think this is a free symbol in the language), e.g.

def f(x: Int): Int = x * 2

# f(8) // 16

I don’t think we should offer multiple forms like Lean does in distinguishing #eval from #guard. Rather, I think we should only offer #, and just interpret Boolean expressions as unit tests that emit IDE errors when they evaluate to false.

def f(x: Int): Int = x * 2

# f(8)       // 16       (for temporary experimentation, you'll delete this)
# f(6) == 12 // ✅ true  (a unit test, you'll commit this)
# f(6) == 22 // ❌ false (a unit test failing, emits a file error until fixed)

Simpler and more elegant.
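
To make the rule above concrete, here is a minimal sketch of how an evaluator might classify the result of a # line. This is only an illustration of the proposed semantics: Outcome and interpret are hypothetical names, not part of the compiler, Metals, or any existing tool.

```scala
// Hypothetical names: Outcome and interpret do not exist in any real tool.
enum Outcome:
  case Shown(text: String) // non-Boolean result: display it next to the line
  case Passed              // Boolean true: the line acts as a passing test
  case Failed              // Boolean false: the line would be flagged as an error

// Classifies an already-evaluated result the way the proposal describes.
def interpret(result: Any): Outcome = result match
  case b: Boolean => if b then Outcome.Passed else Outcome.Failed
  case other      => Outcome.Shown(String.valueOf(other))

def f(x: Int): Int = x * 2

@main def demo(): Unit =
  println(interpret(f(8)))       // Shown(16)
  println(interpret(f(6) == 12)) // Passed
  println(interpret(f(6) == 22)) // Failed
```

One consequence of this design is that an expression which is legitimately Boolean can never be "just shown"; it is always treated as a test.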

Feedback and ideas appreciated. How feasible would this be to implement in the compiler and vscode? What would I start with if I wanted to experiment on getting this to work on my own?

spamegg1 · December 24, 2025, 10:04am

I don’t think # or any special keyword is necessary or desirable (except for the purpose of imitating Lean syntax); like in worksheets, plain top-level expressions can be enough. If you wanted them not to be evaluated, you would have to comment them out anyway (regardless of #).

I’m not sure why you’re not satisfied with worksheets though? It solves all the problems you hate. I ported hundreds of interactive Dr. Racket files with unit tests to Scala worksheets successfully and satisfactorily. Thanks to the Doodle library I even ported the interactive animations. Does the file extension bother you that much? You can always change it back to .scala later if you want; I’ve done so hundreds of times. If it’s for non-educational purposes, you wouldn’t want to leave those evaluations polluting the file anyway, right? And for educational purposes you’d want to leave them there.

Probably nothing is necessary or possible in the compiler; Lean was designed as an interactive prover from the beginning, aimed at non-programmers and mathematicians, and not for building applications; so its compiler must be fundamentally different than Scala’s. My understanding is that Lean has a “kernel” that implements the CoC type-checker, and everything else around it (VS Code extension, etc.) interacts with this kernel. Essentially, every Lean source file is an “interactive worksheet” in VS Code. It evaluates even if you don’t use #eval or #check or #guard. The kernel will evaluate the types where you place your mouse cursor, at all times. To get an executable binary you have to do additional stuff on the command line (with lake build), which also uses the kernel but differently.

Also I would have to say that Lean’s “everything is a worksheet” can sometimes be an awful experience. It can consume too much CPU/RAM or crash, or become very slow / unresponsive. Non-interactive, performance aware Lean code (in Mathlib for example) is unreadable to humans. In my educational code, I make a clear distinction between “lightweight, interactive exploration with worksheets / scripts” (.worksheet.sc or .sc) and “more serious, possibly performance heavy code to be run” (.scala).

Probably the best approach is to ask @tgodzik and maybe raise a Metals feature request. You could probably implement it as a Metals feature or a VS Code extension / add-on; basically all you need is to look at how worksheets are done and hack it so that it can work for .scala files.

mfcold · December 24, 2025, 10:52am

Because I want this in a real software engineering context with zero friction cost. When I’m iterating on a design, it’s useful to see what expressions evaluate to. What that currently means is going to Scastie to experiment, and then copy-pasting my finished implementation back into VS Code (this is usually less work than it would be to rename my file extension to .worksheet.sc and then rename it back to .scala). I would also find needing to rename very irritating: it requires too much mouse and hand movement and GUI interaction, and it’s not always obvious when I’m “finished” experimenting, so who knows how many times I would need to do this.

For the scenario of boolean expressions, I would want to leave those in the file to serve as self-documenting unit tests and examples of how the function works. I think that is a great alternative to /** comments.

def max(a: Int, b: Int): Int = if a >= b then a else b

# max(3,4) == 4
# max(1,-1) == 1

I would strongly prefer this over a natural English comment for documentation. I hate seeing English generally and would rather see compile-verified examples, the code speaks for itself.

Currently Scala doesn’t allow top-level expressions, and I figured changing that would be too big of an ask. It’s also possible that having # would make IDE integration easier, like treating them as documentation for the functions on hover? But sure, it’s also a nice idea if it can be made to work for .scala

spamegg1 · December 24, 2025, 11:30am

That’s interesting; I think your approach to software engineering is quite different than most people, or I’m just ignorant… maybe that approach (leaving stuff like that in) is more popular in the Python world? I remember some Python libraries putting unit tests directly in triple-quoted doc strings. Although it seems more of an “interpreted language world” kind of thing.

I still think you can just use worksheets for what you are describing. Worksheets allow you to import stuff from your normal .scala files. So you work on your .scala file, with a separate worksheet “scratch” file open, and switch between with Ctrl+PageUp/PageDown. No need to rename your file extension. I do this all the time, I even add my Scratch.worksheet.sc to my .gitignore, and it has all the imports I need ready.

It’s true this still has some friction (switching between files) but I think it’s acceptable. You might be wanting a bit too much perfection. But that’s just me, nothing wrong with wanting stuff I guess.

mfcold · December 24, 2025, 11:53am

I should re-mention that in-source unit tests are in Rust and loved by many.

This is also something people set up in Haskell using ‘doctest’.

Real-time IDE checking of unit tests is also made popular in the js/ts ecosystem with the Wallaby extension.

It is less popular yes, but seems to be an emerging and exciting approach to testing.

spamegg1 · December 24, 2025, 12:06pm

Yep, I’m ignorant as predicted

OK I think I’m getting a better understanding of these things. What you want is more like “in-doc real-time unit test checking” type of thing. OK. That can probably be done in Metals, it could use the mechanisms of worksheets in the background. I also see the point of the # in that case.

Yeah it sounds like a fine idea! I get it now! You got my support. When you were talking about Lean, I got a completely different impression of the purpose and reasons. It might be better to explain it as “Rust-like unit tests”.

The issue would be more cultural; the Java/Scala world’s way of doing things is very different. Not sure if there would be many people over on this side who would use that style of software engineering. The preferred way of doing things over here (separate test files / folders) seems to be exactly the stuff you hate. I wouldn’t want to be in your shoes, it would make me very miserable. Over time I just learned to accept whatever preferred way each language has, and don’t care too much beyond it.

mfcold · December 24, 2025, 1:54pm

I assume getting # to be allowed as top-level in source files would need to be a compiler change. Do you know if this can be made legal through a compiler plugin, or would it need to touch more fundamental parts? (In other words, am I going to need to fork the compiler and tinker with its internals to experiment with a proof of concept?)

The compiler doesn’t need to do much with the new keyword, other than type check its contents. The IDE is what would evaluate the expressions. I’m imagining it as analogous to defining a function that is meant to be thrown away in the compilation stage.

# f(8) == 16
// basically syntax sugar for
def erase_me_after_type_checking =
  f(8) == 16

Though there’s probably a more serious way to do this.

Quafadas · December 24, 2025, 2:09pm

You may be interested in this;

Which I believe may carry you some way towards your desire…

spamegg1 · December 24, 2025, 3:08pm

Not sure but you might be right, recently with the advent of Scala-cli we got “using directives” that start with //>, I wonder how that was done? Maybe it could be done similarly? For example improvement: Support using directives in worksheets by tgodzik · Pull Request #22957 · scala/scala3 · GitHub

kai · December 25, 2025, 12:07am

NB: we have a Scala port of doctest, sbt-doctest

jducoeur · December 25, 2025, 1:12am

Just an alternative viewpoint: this approach is basically a non-sequitur in my usual world (relatively large-scale business applications), for the same reason traditional unit tests largely are – it’s too small-scale to be relevant.

Meaningful tests in that environment need to be at the module/application level, because business applications tend to be pipelines, where the problems mostly show up in mismatches in assumptions between the pieces, so you need to test at the large scale in order to catch most bugs. (Full details on how I test in that world, if anyone is curious – there’s a lot to it.)

None of which is to say this idea isn’t plausible, just that it’s not going to replace other approaches.

charpov · December 25, 2025, 1:43am

I feel the same way, even on smaller (but tricky) programs. My tests typically need quite a bit of setup: givens, contexts, threads, resources, randomly generated data, post-condition checking code, etc. Testing a single function on a known output is a small fraction of the tests. I wouldn’t benefit much from the proposal. (I’m already using worksheets for the smaller stuff.)

mfcold · December 25, 2025, 3:31am

This may not replace more externalized testing setups for more involved problems, but this is really useful for the mental workflow of iterating on a design and ensuring its correctness - think “defining a complicated regex match function”, or “defining formulas and arithmetic operations on physics vectors” or testing the final output or ensuring certain properties about the output on a small composition of 3 or 4 combinator pipelines. I personally would leave many of these in the source code to serve as documentation and persistent unit tests, but you don’t have to do that, it’s still beneficial as a temporary assistance for discovering the correct implementations.

The worksheets aren’t optimal for serious project use-cases. Repeatedly renaming files is an irritating effort, can cause temporary issues and distractions with the OS or source control (Windows often refuses to rename files in VSCode until the window is reloaded), and even keeping a dedicated scratch.worksheet.sc in your workspace that you momentarily switch to whenever you want to experiment is not pain-free: I want to see everything at once in the same file, watching what expressions evaluate to and which unit tests pass as I edit the implementation above (and I don’t want to cramp my monitor’s horizontal space by putting the files side-by-side, plus the irritating GUI interaction that would require to get right), and I also don’t want to have to write out imports… And I wouldn’t be able to commit these to source control as permanent unit tests and documentation…

None of these are catastrophically prohibitive and it is cool that we already have worksheets, but they are worse than what some other languages offer. And it seems fairly easy to close that gap. Of course I would like the languages I use professionally to feel as modern and premium as possible, even if it is only a small upgrade. This would benefit some users’ workflows by a lot.

mfcold · December 25, 2025, 4:02am

That’s cool, but I’m also still not a fan of these docstring testing approaches as they (as far as I know?) are materially just comments that don’t benefit from intellisense, auto-completion, type checking, and syntax highlighting. Also, no real-time printing.

Ichoran · December 25, 2025, 4:30am

Maybe part of the answer is just to use a different framework?

I had the same irritation and decided I’d write a testing framework that I was happy with, and stop when I stopped being irritated.

Recreating the entire directory structure for tests and non-tests was irritating. But mill already has module/src and module/test/src for that–at which point it’s just the source leaves which was no longer irritating for me. So switching from sbt to mill solved that enough so that even though it wasn’t what I would have thought was ideal, it was now okay.

Writing tests was also irritating. So I decided to see how much I needed to replace before I stopped being irritated. Like you, what I found most annoying was not having the little one-liners like you showed. So my one-liners now look like

T ~ f(8)  ==== 16
T ~ f(32) ==== 64
T ~ f(64) ==== 128 --: typed[Int]   // Insists the type matches

T ~ Seq(1, 2, 3).filter(_ % 2 == 0)   ==== Seq(2)
T ~ Array(1, 2, 3).filter(_ % 2 == 0) =**= Seq(2)  // Matches element-by-element

T ! """Seq(2).take("two")"""  // Test passes if code does not compile

Because this is Just Plain Scala, I don’t have to worry about handling things like code blocks or whatever in some special way with #-statements. T is the testing object; ~ binds tightly to the next expression as a lambda (use { ... } to get a block), ==== executes the test, capturing exceptions and so on, and the thing on the right is also captured as a by-name parameter and evaluated once lazily as needed.
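
For readers who want to see how the pieces described above can fit together, here is a minimal sketch of a DSL in the same spirit. It is an assumption-laden illustration, not the kse3 implementation: the Pending class is a hypothetical name, ==== here simply returns a Boolean instead of reporting to a test runner, and the --: typed, =**= and T ! forms are not covered.

```scala
import scala.util.Try

// Hypothetical sketch, NOT the kse3 implementation.
// Holds the captured left-hand expression until ==== runs it.
final class Pending[A](actual: () => A):
  // Executes the test; a non-fatal exception on either side counts as a failure.
  def ====(expected: => A): Boolean =
    Try(actual() == expected).getOrElse(false)

object T:
  // By-name parameter: the expression is captured here, not evaluated yet.
  // `~` has higher precedence than `====`, so `T ~ a ==== b` parses as `(T ~ a) ==== b`.
  def ~[A](actual: => A): Pending[A] = Pending(() => actual)

def f(x: Int): Int = x * 2

@main def runChecks(): Unit =
  println(T ~ f(8) ==== 16)                                // true
  println(T ~ Seq(1, 2, 3).filter(_ % 2 == 0) ==== Seq(2)) // true
  println(T ~ f(8) ==== 18)                                // false
  println(T ~ "x".toInt ==== 0)                            // false: the exception is captured
```

The operator names do the parsing work: because Scala assigns precedence by an operator's first character, no compiler support is needed for the one-liner shape.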

And for me, that was enough.

I now find my tests to be easy to write and clear to read.

Because I use JUnit4 under the hood to do the test evaluation, there’s some annoying clutter still, but it doesn’t annoy me enough for me to put any effort into fixing it.

I’d love it if I could write in-companion-object tests too. I can think of a few methods that wouldn’t require very much help from the compiler.

But, honestly, I’m already pretty happy, after having had exactly the itch that you seem to be describing. So it’s not worth it for me, at least, to put more effort into it.

If you want to literally just use the test framework I built, it’s in kse3. But my advice isn’t really that–it’s to see if you can’t push on the problem yourself enough to make something you like, if you have the flexibility to choose how your projects are structured.

mfcold · December 25, 2025, 4:57am

That’s awesome work. I’ll still advocate for this language feature as an ideal solution (especially the real-time evaluation for experimenting, not testing), but I’m very glad a framework like yours exists. It solves much of what I hate about all the Java ones.

noresttherein · December 25, 2025, 5:45am

I got here because of the comment about hating to scroll through scaladocs. IntelliJ has actions you can bind to key shortcuts for collapsing and expanding all docs in a file (as well as a single doc, but I believe it already has a default binding). This makes the argument irrelevant in the context of Idea, and it will quite likely be easier to create an IDE plugin instead of changing the compiler. A lot of the friction you mention can also be greatly reduced by using key shortcuts. I don’t use a mouse at all at work, and rebound all caret navigation keys directly under the main fingers (think VI or WASD scheme), which I cannot recommend enough.

If this is not enough, this could largely be implemented as a macro, if you were willing to give up on a bit of syntax convenience.

Your idea is valid and interesting, but

it’s a bit niche,
So are other people’s ideas, and some of them will necessarily conflict with yours,
Scala is amazing at implementing all kinds of DSLs, which seems like good enough potential solution,
Good feature ideas are for rigid languages with a design philosophy of making all code uniform and following the creator’s vision, while Scala’s was ‘go at it and have fun’ and being surprised by creative use cases

Overall, I think it would probably be better as an IDE extension, if you want to see the evaluated values of everything. Again, Idea does a lot of it by default (showing the evaluated values during debugging, and types on every line statically), and I found that every time I come up with a feature suggestion, someone replies with a plugin that already does that or almost that. I certainly saw some implementing similar ideas; you might want to have a look. If not, then an optional library or SBT plugin: correct me if I am wrong, but an annotation can have any expression as an argument, and you may parse it during build. For example, I use something like this for documentation purposes:

trait extensionClass[T](conversion: Any=>T) extends Annotation

@extensionClass(A.methodsOfA) class A //type inferred, shown in scaladocs

object A {
  implicit class methodsOfA(private val self: A) extends AnyVal {
  //extension methods
  }
}

I am willing to bet the argument may be a macro. Maybe you could look into it?

mfcold · December 25, 2025, 6:06am

I also don’t use a mouse, but even with my fast keyboard shortcuts in vscode the worksheet solution would still involve a lot of time wasted (in ways I can explain if anyone is skeptical). Nothing beats being able to go one line underneath a definition, write # ... and have immediate feedback.

Yes, it’s a newer and less known approach. Still, everyone should consider that major projects like Rust and Lean4 found it worthwhile enough to offer as a built-in feature. It is becoming less niche.

I would be fine with this, or with a DSL, but I imagine this would require constant boilerplate anytime I want to use it. The ifdef(“test”) snippets @Quafadas linked are a good case in point for what I mean. It’s not what a modern language experience looks like: it’s a painful, ugly workaround. Of course ideas for improving that are welcome on this thread if anyone has any.

tgodzik · December 29, 2025, 10:22am

As a quick note from me, doing something that was initially suggested is possible and I did play around with it:

Though I wasn’t 100% sure about whether we should introduce it. And we would need to make it efficient, so that no additional cost is added when this is not used. The PoC is available in GitHub - tgodzik/metals at inline-evaluator

mfcold · December 30, 2025, 6:19pm

Very cool. Though the problem with doing this within comments is that you don’t have typo detection, auto completion, type checking, syntax highlighting, etc. I assume this implementation also suffers from that?

I would then ask, could we not simply make >>> work as a top level token, no // needed?
