What is the purpose of toString in Scala?

Like many others before me, I’ve created my own Show type class. That gives me the freedom to design its methods and conventions as I see fit, but it feels like good practice for the toString method to respect the existing conventions. As far as possible, users of one’s classes and traits should not be surprised by the implementations of standard methods. But what are the conventions for toString?

The Docs just state:

Returns a string representation of the object.
The default representation is platform dependent.

Is toString meant to uniquely identify an object? 3.0, for example, tells you the value is not an Int, but it could be a Float or a Double. Given that one at least knows the type, is toString meant to fully define an object regardless of its character length? Some objects can easily require thousands or tens of thousands of characters for their specification. Is toString allowed to be multi-line? Is it OK for it to contain blank lines?
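To make the 3.0 point concrete, here is a small sketch. It assumes the JVM; other platforms may render numbers differently. The object name ToStringAmbiguity is only for illustration.

```scala
object ToStringAmbiguity {
  val d: Double = 3.0
  val f: Float  = 3.0f
  val i: Int    = 3

  // On the JVM both floating-point values render identically,
  // so the string alone cannot tell a Float from a Double.
  val doubleText: String = d.toString
  val floatText: String  = f.toString

  // The Int rendering has no decimal point, so it is distinguishable.
  val intText: String = i.toString
}
```

So the string narrows the type down but does not pin it, which is the ambiguity asked about above.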

Is HTML allowed within toString? Is the full Unicode character set permissible within toString implementations?

Or is toString just what it is? A rough and ready, poorly specified method that is extremely useful for development but not to be relied upon for production code. Either way I think it would be useful if the Docs could give more guidance.

AFAIK, scala.Any.toString is an alias for java.lang.Object.toString, which is documented thus:

Returns a string representation of the object. In general, the toString method returns a string that “textually represents” this object. The result should be a concise but informative representation that is easy for a person to read. It is recommended that all subclasses override this method.

The toString method for class Object returns a string consisting of the name of the class of which the object is an instance, the at-sign character `@', and the unsigned hexadecimal representation of the hash code of the object. In other words, this method returns a string equal to the value of:

getClass().getName() + '@' + Integer.toHexString(hashCode())
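As a rough Scala illustration of the formula quoted above (assuming the JVM), a class that does not override toString gets the default form, while a case class gets a compiler-generated one. The names DefaultToString, Plain, Point and defaultRepr are made up for this sketch.

```scala
object DefaultToString {
  // A plain class that does not override toString.
  class Plain

  // A case class: the compiler generates a toString for it.
  case class Point(x: Int, y: Int)

  // Rebuilds the default representation by hand, following the quoted formula.
  def defaultRepr(obj: AnyRef): String =
    obj.getClass.getName + "@" + Integer.toHexString(obj.hashCode())
}
```

For an instance p of Plain, p.toString and DefaultToString.defaultRepr(p) should agree, whereas Point(1, 2) prints as Point(1,2).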

Yes, not very specific. But if we wanted that description to change (after being like this for many years), we would have to lobby the Java community for that.

Apart from development and testing, toString is sometimes used to construct messages for exceptions, which sometimes may be exposed to the end user. You probably want it concise and easily readable, no blank lines, no HTML and no non-ASCII characters unless they are part of a String or similar text data. Definitely not a full specification.

Arguably toString is not underspecified but overspecified by any implementation.

Related discussion

Side note: this is a good example of a question someone learning Scala might have to which the answer is “first know a beginner level of Java, now given that answer:…”.

I think a really good goal would be to document every time this comes up and try to write explanations that don’t lean on Java. While I know some parts of the JDK are basically lifted into Scala (see the implementations in Scala.js and Scala Native), I still think it would be nice to have fully self-contained answers to these questions (especially answering what Any and AnyRef are and what methods they have, without invoking Java).

Thanks for the link. I notice the last post mentions Java 8 reflection for method. It strikes me that not only should we not have to use reflection to get basic data such as the strings for type names, field names etc., but we shouldn’t have to wait for run time at all. These, along with a string for source code position, would be my priority candidates for additions to the Standard Library. Whether they are provided by the compiler itself or by standard library macros doesn’t matter.

Whether you wish to use toString, create your own inherited toStr or str methods for your own objects, or create your own Show type class system, I would suggest that the fundamental error underlying that discussion is the assumption that our stringification (is that a word?) needs can be satisfied by a single (parameter-less) property. Surely it is reasonable to want to display a case class both with and without method names.

I also feel it is pretty vital to be able to control decimal precision, including in the string output of compound classes, and to control the depth to which a compound class is displayed.
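A minimal sketch of what a parameterised Show could look like, as opposed to a single parameter-less property. Everything here (ShowSketch, ShowOptions, the option names, Point, Segment) is hypothetical and only meant to illustrate the three knobs mentioned: names on or off, decimal precision, and display depth.

```scala
object ShowSketch {
  // Hypothetical display settings; names and defaults are illustrative only.
  final case class ShowOptions(fieldNames: Boolean = false, decimals: Int = 2, maxDepth: Int = 2)

  trait Show[A] {
    def show(a: A, opts: ShowOptions): String
  }

  // Doubles are rendered with a fixed number of decimals, independent of locale.
  implicit val showDouble: Show[Double] = new Show[Double] {
    def show(a: Double, opts: ShowOptions): String =
      ("%." + opts.decimals + "f").formatLocal(java.util.Locale.ROOT, a)
  }

  final case class Point(x: Double, y: Double)
  final case class Segment(start: Point, end: Point)

  implicit val showPoint: Show[Point] = new Show[Point] {
    def show(p: Point, opts: ShowOptions): String =
      if (opts.maxDepth <= 0) "Point(...)" // depth budget used up: elide the contents
      else {
        val inner = opts.copy(maxDepth = opts.maxDepth - 1)
        val x = showDouble.show(p.x, inner)
        val y = showDouble.show(p.y, inner)
        if (opts.fieldNames) s"Point(x = $x, y = $y)" else s"Point($x, $y)"
      }
  }

  implicit val showSegment: Show[Segment] = new Show[Segment] {
    def show(s: Segment, opts: ShowOptions): String =
      if (opts.maxDepth <= 0) "Segment(...)"
      else {
        val inner = opts.copy(maxDepth = opts.maxDepth - 1)
        val a = showPoint.show(s.start, inner)
        val b = showPoint.show(s.end, inner)
        if (opts.fieldNames) s"Segment(start = $a, end = $b)" else s"Segment($a, $b)"
      }
  }

  // Convenience entry point: picks the instance implicitly, options are explicit.
  def show[A](a: A, opts: ShowOptions = ShowOptions())(implicit s: Show[A]): String =
    s.show(a, opts)
}
```

With the defaults, a Segment prints in full with two decimals; with maxDepth = 1 the nested points collapse to Point(...). The instances are written by hand here, whereas a real design would presumably derive them.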

Triggered!

My conclusion after doing Java and Scala forever is that toString must be explicitly and only for debugging and diagnostics (same for cats.Show). I intentionally make them unsuitable for end users and also unsuitable for serialization, so I can change them whenever I want without the danger that someone out there is depending on specific behavior. I learned this the hard way.

If the intent is that there is a two-way relationship between values and their associated strings then this must be made explicit. For example if you have a Prism[String,A] you are making a very precise statement that the elements in A map 1:1 with a subset of strings. This is not the case for things like Int.toString and String.toInt … that relationship is actually quite a bit more complex because every Int has many string representations. I did a tiresome talk about this general idea at Scala Exchange a few years ago.
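A rough sketch of the distinction, assuming Scala 2.13 or later for toIntOption. The Prism below is a hand-rolled stand-in with just two functions, not the API of any optics library, and the names PrismSketch, lenient and canonicalInt are made up for the example.

```scala
object PrismSketch {
  // Hand-rolled stand-in for an optics Prism: a partial read and a total write.
  final case class Prism[S, A](getOption: S => Option[A], reverseGet: A => S)

  // Lenient parsing: several different strings map to the same Int,
  // so String -> Int -> String does not round-trip in general.
  def lenient(s: String): Option[Int] = s.toIntOption

  // A prism only makes sense if it accepts exactly the strings that
  // reverseGet can produce, i.e. the canonical form of each Int.
  val canonicalInt: Prism[String, Int] =
    Prism[String, Int](
      s => s.toIntOption.filter(i => i.toString == s),
      i => i.toString
    )
}
```

Here lenient accepts "5", "+5" and "005" alike, while canonicalInt.getOption rejects everything except "5", so whatever it accepts is guaranteed to round-trip through reverseGet.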

In any case I think the doc for toString should be something like “An arbitrary value for debugging purposes, with no particular meaning, never to be used for anything that matters”.

I second that with all my soul. I’d also add that .toString is a horror to look up in code: it’s not even there half the time! (Yes, that’s said in a comic tone but, as with Rob, the fun is there to hide all the scars from prod disintegration caused by it.)

So yes, my wish-list is:

  • avoid toString. Make its use, implicit or explicit, a compilation error (unless a compiler flag like --yes-I-really-want-to-be-bitten-in-prod-by-toString-and-make-it-a-lifetime-supported-API is enabled)

And my enforced law in apps:

  • use .debugString or something similarly clear for debug log/ops messages,
  • use explicit serialisation methods for (de)serialisation, with a dedicated, versioned API, split away from business objects and their debugString (they rarely have the same dev life cycle)
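One way the two rules above might look in code, as a sketch only. The User class, the DebugString type class, the UserCodecV1 object and its pipe-separated format are all invented for illustration and say nothing about how any real application does it.

```scala
object DebugVsWire {
  final case class User(id: Long, name: String)

  // Debug-only rendering: free to change at any time, never parsed back.
  trait DebugString[A] {
    def debugString(a: A): String
  }

  implicit val userDebug: DebugString[User] = new DebugString[User] {
    def debugString(u: User): String = s"User#${u.id}<${u.name}>"
  }

  // Versioned wire format, kept apart from the business object.
  object UserCodecV1 {
    val version: Int = 1

    def encode(u: User): String = s"v${version}|${u.id}|${u.name}"

    // The split limit of 3 keeps any '|' characters inside the name intact.
    def decode(s: String): Option[User] =
      s.split("\\|", 3) match {
        case Array("v1", id, name) => id.toLongOption.map(n => User(n, name))
        case _                     => None
      }
  }
}
```

The debug string can be reworded freely because nothing reads it back, while a change to the wire format would go into a new codec version rather than touching User or its debugString.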