Case Class toString new behavior proposal (with implementation)

metasim · July 16, 2018, 12:34pm

Another option is to provide an asMap[String, Any], so as to avoid the .zip(productIterator). Either way, if case classes provided something more or less like productFields this discussion would probably be moot. People can write whatever mix-in they want to get toFancyString, etc.

joshlemer · July 16, 2018, 12:41pm

@dwijnand I think I will have a go at implementing a @classicToString annotation or a ClassicToString trait which reverts to the old behavior, in the PR.

Keep in mind, it is already easy enough to implement the classic toString as an override:

import scala.runtime.ScalaRunTime._toString
trait ClassicToString { this: Product => 
  def toString = _toString(this)
}
case class A(i: Int, b: Int) extends ClassicToString

jducoeur · July 16, 2018, 2:12pm

It’s all anecdotal guesswork, but I suspect you’re wrong. It’s not all that unusual to have tests that compare a bunch of printouts to an expected output. (IIUC, Dotty’s own test suite does some of this.)

I’m sure it’s less common that comparing values using equals, yes. But I’d still be kind of surprised if this didn’t break hundreds, maybe thousands, of projects in a way that couldn’t be automatically migrated using scalafix. That’s a serious price…

kevinwright · July 16, 2018, 8:45pm

Ultimately, I have to agree with the voices here that state changing the established behaviour of toString would by an entirely unacceptable breaking change.

We all want better string representations of case classes, but toString most certainly isn’t the way to do it. Scala isn’t Kotlin, we don’t need to try and shoehorn such behaviour into a special-case one-size-fits-all solution like Kotlin does.

We have typeclasses.

With Dotty/Scala 3 the situation is going to be better still, with typeclasses and some degree of generic derivation baked into the heart of the language - this is the right place to target. We could easily have a range of typeclasses supporting existing behaviour, quoted strings, exposed parameter names, multi-line trees (like pprint), graphviz .dot syntax, etc, etc

There’s no good reason to force just one solution on everyone, and even less reason to break the existing approach that people will be assuming in their tests.

Ichoran · July 16, 2018, 8:52pm

Just in my own code I know of a bunch of places where I don’t override toString because it does what I want, and there are probably a bunch more I’ve never thought about. Some of this involves unit tests against hard-coded strings, others don’t.

We don’t get to redo stuff like this any longer. If you want an easy way to get alternate behavior, it’s worth thinking about, but changing long-standing conventions isn’t.

(Also–I prefer the existing behavior regardless, because it’s externally usable if you pick the name carefully; the debug version is pretty useless for everything except for debugging, and even then it’s unhelpful unless you’ve forgotten what your case class is. But that’s irrelevant; the point is that this is a major breaking change.)

(Also, the Rust comparison isn’t fair, because struct initializers require the field name, while Scala case classes do not. So the stringification matches the usage.)

dwijnand · July 17, 2018, 9:15am

Why I might not have your experience, I already know how true that is. People relying on the iteration order of Sets, comes to mind, because they unit tested with sets up to size 4. But I’m not sure if that’s enough reason not to do it.

“conventions”? It’s just the current implementation, and I don’t think it can no longer be redone.

lihaoyi · July 17, 2018, 10:34am

If we add productFields to the case class instances, I could implement this as part of https://github.com/lihaoyi/PPrint without needing to change the built-in toString method

olafurpg · July 17, 2018, 12:14pm

It turns out the compiler already has commented-out code to generate a method productElementName (same as productFields) that would allow you to implement this functionality as a library. The feature was disabled out of concern for bytecode bloat. I took a stab at reviving the commented-out code in this commit here https://github.com/scala/scala/commit/8430f69f72d10e437563f3e14ed329bb246f317c I added a unit test to show how it works and ran :javap -c to measure the bytecode size. I’m not experienced in reading javap -c output so I would some appreciate help there.

If the bytecode overhead is acceptable, I think it’s worth discussing whether to re-enable productElementName since it makes it possible pretty-print case classes in a far more readable way without breaking existing code that relies on toString.

olafurpg · July 17, 2018, 12:54pm

After more consideration, I am concerned productElementName doesn’t go far enough for those who want to change toString and it already goes too far for those who don’t want this feature. More bloated bytecode for every case class definition would be a regression in some code-bases.

I believe a more interesting discussion is the general problem: how can we make case-classes more flexible? Currently, all case class must have the same encoding but sometimes you prefer a custom combination of hashCode, equals, toString, copy, unapply. It’s a tough problem with no obvious answer.

adriaanm · July 17, 2018, 1:27pm

Intriguing After playing around with javap a bit, I think it’s about 56 + 10*fieldCount

Breakdown for each field:

3 bytes for the string info (the actual characters are already in the constant pool)
4 bytes for the tableswitch entry
2 bytes for the ldc to load the string
1 byte for areturn

Fixed cost: 56 (??) bytes for the productElementName method. Not sure about this one, ran out of time before I could run a linear regression

lihaoyi · July 18, 2018, 7:29am

It makes sense that the strings are already in the constant pool; they’re used for debug information, and you can already get access to them via the Paranamer library (that’s what e.g. jackson-module-scala does)

In that case the additional bloat from making them available should be trivial, and we should just do it

olafurpg · July 18, 2018, 11:39am

Thank you for the analysis @adriaanm! The estimate would need to be validated on the actual produced jars, but I think if this estimate is correct then the change is absolutely worth it. If anyone is interested in picking this up, feel free to build on top of the changes in this PR here https://github.com/scala/scala/pull/6951

jducoeur · July 18, 2018, 2:16pm

Is this true of linked Scala.js? I honestly have no idea, but it’s a much bigger deal over there. Bloat in the JVM is kind of “meh”; bloat in JS is deadly critical, and it’s not obvious to me that the strings are typically retained in fullOptJS. @sjrd?

jeantil · July 18, 2018, 3:06pm

from the discussion so far I understand that toString will not be changed but that an alternative representation may be introduced.
If that is the case I humbly suggest to actually quote the strings in the output.
Here is an example to illustrate what I mean :

case class Point(x:Int,y:Int)
Point(1,1).toString

Currently yields the string Point(1,1) which you can conveniently copy and paste back into code especially when working from the repl or writing tests. This doesn’t change with the addition of field names.
Now consider the following :

case class Person(firstName:String,lastName:String)
Person("martin","odersky").toString

yields Person(martin, ordersky) notice how the quotes are lost ? this means you can no longer copy/paste this back into code without some more editing. I wish it would output Person("martin", "odersky") instead.

I stopped counting the number of students that were surprised by that behaviour when teaching the Lightbend’s Fast Track to Scala/Scala Language for Professionals class.
I understand that it simply delegates to the toString call on java.lang.string which removes the quotes but I still would like the behaviour changed (while still honoring null properly) it would also make empty strings stand out better. This would probably also break a lot of toString usasge but it may make more sense if a new string serialization mechanisms is introduced in the language/standard library.

sjrd · July 18, 2018, 3:19pm

If no one calls productElementNames, then the Scala.js optimizer will remove that method, and the strings it contains with it. If someone does call that method, well then they’re using the strings, so they’ll be kept, but it can hardly be considered “bloat” when it’s actually used.

joshlemer · July 18, 2018, 4:50pm

@jeantil how would you treat the case Person("\"", "")?

jeantil · July 18, 2018, 7:27pm

@joshlemer The idea is for the string representation to be exactly the code typed to create the instance, I would there fore expect expect the string to contain the escape. If I were to implement it, I would probably look at how it’s done in https://commons.apache.org/proper/commons-lang/javadocs/api-2.6/org/apache/commons/lang/StringEscapeUtils.html#escapeJava(java.lang.String) and start from there.

olafurpg · July 19, 2018, 5:20am

I recommend you check out pprint as it works pretty much as you’re describing http://www.lihaoyi.com/PPrint/

olafurpg · July 19, 2018, 5:33am

The PR https://github.com/scala/scala/pull/6951 adding productElementNames has been reviewed by Jason Zaugg from the compiler team with actionable feedback if someone wants to pick this up.

jeantil · July 19, 2018, 1:22pm

thanks for the pointer, since this thread was originally about changing the toString behaviour, I wish it would behave like pprint

but I will look into integrating pprint for my projects thought that won’t help with teaching scala classes.