Better number literals

NthPortal · June 3, 2018, 1:12am

I would like to propose that Scala add support for underscores in number literals.

Since Java 7, Java has supported underscores in number literals. These can enhance readability as follows:

|  Welcome to JShell -- Version 10.0.1
|  For an introduction type: /help intro

jshell> 1_000_000_000
$1 ==> 1000000000

I think most people would agree that adding underscores drastically improves readability, to the point where with the underscores, you can tell that the number above is 1 billion at a glance, but without them, you have to tediously count the zeros.

Additionally, underscores are supported for floating point literals, as well as for hex and binary literals.

Other languages which support underscores in number literals include Python (since 3.6), Rust, and several others.
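
For comparison, here is a rough sketch of how the same kind of literal might read in Scala if the proposal were adopted. The object and value names are made up for illustration, and the snippet assumes a compiler that accepts underscores in number literals (Scala 2.13 and later do).

```scala
// Illustrative only: requires a compiler that accepts underscores in literals.
object UnderscoreLiterals {
  // Grouping by thousands makes the magnitude visible at a glance.
  val billion: Int = 1_000_000_000
  val trillion: Long = 1_000_000_000_000L

  // Hex literals can be grouped too, for example by 16-bit halves.
  val magic: Int = 0xCAFE_BABE
}
```

The underscores are purely visual: each value is identical to the same literal written without them.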

som-snytt · June 3, 2018, 5:09am

I had to tediously count the underscores. But frankly, Scala cannot have too many uses for underscore. This just underscores my point. My Java friend added underscore support to his property file reading framework. Personally, by the time I need a separator character, I’d prefer either scientific notation (which requires that I don both my scientist hat and my notary hat) or different units.

dragos · June 4, 2018, 10:34am

I’m in favor of this. Scala has picked up Java syntax for almost all literals and generally follows Java when it comes to primitives. As such, I would find it more surprising that it doesn’t support it than the other way around.

I don’t see compatibility issues with existing code bases, nor it introducing ambiguities in the grammar. It’s in the same ballpark as trailing commas: it’s nice if you need it, it’s harmless if you don’t.

One thing should be said about existing tools that parse Scala (scalafmt & friends, scalariform, Eclipse/IntelliJ/VSCode): they will need to be updated. Perhaps this is something for Scala 3.

NthPortal · June 4, 2018, 11:47pm

Well, one third as tedious as counting zeros at least

Naetmul · June 5, 2018, 4:54am

Scientific notation and units are not for this usage.
More importantly, scientific notation produces floating-point values, so it can lose precision.

You mean you want to represent 1234574812938 as 1.234574812938E12 or 1.234574812938T /* tera */?
It doesn’t make sense at all!
The purpose of digit group separators is not like this.

Compare the four:

val a = 1234574812938L
val b = (1.234574812938E12).toLong // May lose precision
val c = (1.234574812938T).toLong // May lose precision
val d = 1_234_574_812_938L
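
A small sketch of when the round trip through Double actually loses digits. The object and value names are invented for illustration. The number above happens to fit in a Double's 53-bit significand, so it survives; a Long just above 2^53 does not.

```scala
object PrecisionSketch {
  // 1234574812938 is below 2^53, so the Double holds it exactly.
  val small: Long = 1.234574812938E12.toLong

  // 2^53 + 1 cannot be represented as a Double; it rounds to 2^53.
  val original: Long = 9007199254740993L
  val roundTripped: Long = original.toDouble.toLong
}
```

So the floating-point spelling is safe only for some values, whereas a plain Long literal (with or without separators) is always exact.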

Naetmul · June 5, 2018, 6:29am

Integer literals are usually of type Int, or of type Long when followed by a L or l suffix.

from Scala Language Specification

OK. 1.234574812938T should have been 1.234574812938.T.

RichType · June 5, 2018, 10:42am

This seems totally unnecessary to me. What about:

//Int extension methods
def million: Int = thisInt *             1000000
def billion: Long = thisInt.toLong *     1000000000L
def trillion: Long = thisInt.toLong *    1000000000000L
def quadrillion: Long = thisInt.toLong * 1000000000000000L
//Long extension method
def million: Long = thisLong * 1000000L
def billion: Long = thisLong * 1000000000L

And then for any non-simple, many digit numbers just compose using the above, Int literals and + or -.
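
One way the extension methods above could be wired up so they compile, as a minimal sketch. The names MagnitudeSyntax and IntMagnitudes are made up, and thousand is added only for illustration. Note that the results are computed at runtime rather than being literals, and that the Int-returning methods overflow for large receivers.

```scala
object MagnitudeSyntax {
  // Hypothetical wrapper exposing the extension methods on Int.
  implicit class IntMagnitudes(private val thisInt: Int) extends AnyVal {
    def thousand: Int = thisInt * 1000
    def million: Int = thisInt * 1000000
    def billion: Long = thisInt.toLong * 1000000000L
  }

  // Composing a non-round number from magnitudes and a plain Int literal.
  val specialNum: Int = 10.million + 531
}
```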

arthurp · June 5, 2018, 2:20pm

This works for round numbers (10 million) but fails for arbitrary numbers like 1000_0531.

curoli · June 5, 2018, 4:04pm

Hello,

I’m not sure how often people have non-round integer literals with more than three digits in their code, but if you really need something:

Welcome to Scala 2.12.4 (OpenJDK 64-Bit Server VM, Java 1.8.0_171).
Type in expressions for evaluation. Or try :help.

implicit class SuperLong(l: Long) { def m: Long = 1000L * l; def m(l2: Long) = 1000L * l + l2 }
defined class SuperLong

import scala.language.postfixOps
import scala.language.postfixOps

123 m
res0: Long = 123000

123 m 456 m 789
res1: Long = 123456789

Best, Oliver

RichType · June 5, 2018, 5:22pm

val specialNum = 10.million + 531

AMatveev · June 5, 2018, 6:31pm

Just for fun
:))
If a BigDecimal literal existed (see the "Bigdecimal literal" thread),
there would be an easy way to get the digit count.

static int integerDigits(BigDecimal n) {
    n = n.stripTrailingZeros();
    return n.precision() - n.scale();
}

So we can easily make a “||” operator:

def || (b: BigDecimal): BigDecimal = {
   // shift v left by the number of integer digits in b, then append b
   v * BigDecimal(10).pow(integerDigits(b)) + b
}

It will work for arbitrary numbers like 1000b || 0531b

:))
Seriously, I don’t think there is anything better than string interpolation.
Unfortunately it doesn’t work with pattern matching.
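
A minimal sketch of what such an interpolator could look like. The interpolator name n and the wrapper names are hypothetical. This version parses at runtime, so a malformed number fails only when the code runs, not at compile time.

```scala
object NumberInterpolator {
  implicit class NumberContext(private val sc: StringContext) extends AnyVal {
    // Hypothetical `n` interpolator: drops underscores, then parses a Long.
    def n(args: Any*): Long = {
      require(args.isEmpty, "interpolated arguments are not supported")
      sc.parts.mkString.replace("_", "").toLong
    }
  }

  val value: Long = n"1000_0531"
}
```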

NthPortal · June 6, 2018, 2:36am

I would like to note that a lot of the suggestions here:

are no longer constants, which is not great
only work for round numbers, and not well for arbitrary constants
don’t work well in other bases

These other suggestions don’t address the fact that underscores can also make a binary or hex literal more readable.

For example (I’m going to draw heavily from Python’s syntax and examples here):

val magicNumber = 0xCAFE_BABE

val flags = 0b_0011_1111_0100_1110

NthPortal · June 6, 2018, 2:39am

but what if you need 10_200_531 instead? 10.million + 200.thousand + 531? This can become quite verbose.

nafg · June 6, 2018, 3:01am

I think the main objection to a compile-time string interpolator was macros’ future. Maybe we could solve that while avoiding a language change by adding the string interpolator(s) to the standard library. We already have the f interpolator.

Incidentally, how does Dotty use the standard library when it has macros?

smarter · June 6, 2018, 3:24am

If you try to use any Scala 2 macro like the f"" interpolator in Dotty, you’ll get a MethodNotFoundException at runtime (we should probably detect this at compile time).

odersky · June 7, 2018, 4:18pm

Lots of good suggestions here.

I am completely in favor of allowing _ in numeric literals. If someone wants to open an issue laying down the desired syntax, we can act on this quickly.

Regarding string interpolators: I am not sure why they need to be whitebox macros. The type of a string interpolator such as f could well be (Any*)String. It is then the job of the interpolator to refuse any argument that does not conform to its format string. So f is still a macro, but not a whitebox macro.

Concerning BigDecimal literals themselves, I think only extensible solutions are worth considering. Don’t stop at BigDecimals. Can we have a scheme where we can have arbitrary user-defined types that allow some form of numeric literal? One way to do it is to say that a numeric literal N is of type T if

T is the expected type at the point where the literal is defined
There is an instance of typeclass Literally[T] defined for T where

trait Literally[A] {
  def fromString(s: String): A
}

So, the following would both work, assuming BigDecimal and BigInt define Literally instances that
convert strings of digits to the type itself.

val x: BigDecimal = 1234567890.9
(123: BigInt)

They would expand to

val x: BigDecimal = BigDecimal.literally.fromString("1234567890.9")
(BigInt.literally.fromString("123"): BigInt)

This assumes that BigDecimal has a companion object like

object BigDecimal {
  implicit def literally: Literally[BigDecimal] = new {
    def fromString(digits: String) = new BigDecimal(digits)
  }
}

We can improve on this solution by making fromString a macro that checks the digits string
for syntactic validity.
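
The scheme can be approximated in today's Scala, as a rough sketch only. The literal helper is a hypothetical stand-in for the rewrite the compiler would perform, the instances live in the Literally companion rather than in the BigDecimal and BigInt companions, and nothing here checks the digits at compile time.

```scala
object LiterallySketch {
  trait Literally[A] {
    def fromString(s: String): A
  }

  object Literally {
    implicit val bigDecimal: Literally[BigDecimal] = new Literally[BigDecimal] {
      def fromString(s: String): BigDecimal = BigDecimal(s)
    }
    implicit val bigInt: Literally[BigInt] = new Literally[BigInt] {
      def fromString(s: String): BigInt = BigInt(s)
    }
  }

  // Hypothetical helper: picks the instance for the expected type A.
  def literal[A](digits: String)(implicit ev: Literally[A]): A =
    ev.fromString(digits)

  // The expected type drives which instance is selected.
  val x: BigDecimal = literal("1234567890.9")
  val y = literal[BigInt]("123")
}
```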

One issue is how to define a syntax for general numeric literals N without mentioning the result type. One candidate syntax would be:

a digit 0-9
followed by a sequence of digits or letters,
which can also contain one or more '_'s, if followed by a digit or letter,
which can also contain one or more '.'s if followed by a digit.

This would allow for many different ways to write numbers (even IP addresses would be covered!). It does not cover floating point numbers with exponents, but I am not sure these are worth generalizing.
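
One possible reading of the candidate syntax, written as a regular expression. This is an interpretation, not a specification: it assumes that a run of underscores must be followed by a digit or letter, and that each dot must be followed by a digit. The names are invented for illustration.

```scala
object CandidateSyntax {
  // digit, then any mix of: digit/letter, underscores + digit/letter, dot + digit
  private val Literal =
    """[0-9](?:[0-9A-Za-z]|_+[0-9A-Za-z]|\.[0-9])*""".r

  def isCandidate(s: String): Boolean =
    Literal.pattern.matcher(s).matches()
}
```

Under this reading, 1_000_000, 0xCAFE_BABE, 123bd and 192.168.0.1 are accepted, while 1_, 1. and 1e-5 are rejected.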

smarter · June 7, 2018, 4:46pm

I like this proposal, but having to rely on the expected type makes for verbose expressions, e.g:

(1232432432: BigDecimal) + (3432432: BigDecimal)

I would rather write:

1232432432bd + 3432432bd

This would also be consistent with the fact that today I can write 3432432l. We could allow this by rewriting:

123foo

as:

NumericalContext("123").foo

That would nicely mirror the string interpolator syntax.
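
A sketch of the desugared form, written out by hand since the suffix syntax itself does not exist. NumericalContext is the name used above; the bd and bi suffixes and the BigSuffixes wrapper are assumptions for illustration, added the same way custom string interpolators are added to StringContext.

```scala
object NumericalContextSketch {
  final case class NumericalContext(digits: String)

  // Suffixes are plain extension methods, so libraries could add their own.
  implicit class BigSuffixes(private val ctx: NumericalContext) extends AnyVal {
    def bd: BigDecimal = BigDecimal(ctx.digits)
    def bi: BigInt = BigInt(ctx.digits)
  }

  // What `1232432432bd + 3432432bd` would be rewritten to.
  val sum: BigDecimal =
    NumericalContext("1232432432").bd + NumericalContext("3432432").bd
}
```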

AMatveev · June 7, 2018, 5:02pm

It looks very useful.
It would be great if it worked with pattern matching,
for example if NumericalContext provided an unapply method:

n match {
   case 1232432432bd => println("great")
}
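
How that could work, sketched by hand in the same way string interpolator patterns work today. Everything here is an assumption: the suffix is modelled as an inner object bd with an unapply, and the pattern is written in its desugared form because the literal syntax does not exist.

```scala
object NumericalPatternSketch {
  final case class NumericalContext(digits: String) {
    // Hypothetical suffix object usable both as a value and as a pattern.
    object bd {
      def apply(): BigDecimal = BigDecimal(digits)
      def unapply(value: BigDecimal): Boolean = value == BigDecimal(digits)
    }
  }

  private val ctx = NumericalContext("1232432432")

  def describe(n: BigDecimal): String = n match {
    case ctx.bd() => "great" // stands in for `case 1232432432bd`
    case _        => "other"
  }
}
```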

AMatveev · June 7, 2018, 6:33pm

If I understand correctly
https://scala-lang.org/files/archive/spec/2.12/13-syntax-summary.html
In the code:

    1 match {
      case -1 => println("-1")
      case _ => println("_")
    }

the '-' is part of the literal.

So maybe we can use '+' or '-' together with a type alias:

type Bd = BigDecimal
val a = Bd+1234567890.9
val b = Bd-1234567890.9
a + b match {
  case Bd+0 => 
  case _ => 
}

It would work with any letters.
For example:

val date = Dd+1d.2m.2004y_00h.5min.24sec

SethTisue · June 8, 2018, 3:46am

me too.

Seth (SIP committee member)