Pre SIP: Named tuples

Ichoran · December 9, 2023, 9:52pm

Tuples are specialized, but not to arbitrary arity. Try (1, 2).getClass!

tarsa · December 9, 2023, 10:21pm

To make things clear for observers (I’m sure Martin is fully aware of how tuples work, but this post is for people not knowing it), here are a few facts about tuples (Martin presented an advantage of tuples, so I’ll present disadvantages):

Tuple is a class with as many parameters as there are elements in tuple, so for each tuple length there is a separate class (up until about 22 after which there is TupleXXL which boxes elements into Array[Object]). Type parameters of a tuple are erased to Object, so e.g. (1, 'a', "xyz") is stored as Tuple3[Object, Object, Object], i.e. primitives are boxed. Since type parameters of tuples are erased and replaced with casts (at usage site), JVM has to add some guards for that (that adds a bit of overhead) and there is constant boxing and unboxing of primitives (or, more precisely, all value classes). That’s a lot of overhead. If it’s a major contributor to computational complexity then it pays off to e.g. use extra case class to avoid such boxing and casts. But as Martin said - each extra case class definition is extra work for JVM to load that extra class and do all sort of optimizations and other stuff.

JVM can do escape analysis for tuples, but it can do escape analysis for case classes too. Note that HotSpot is probably weak at escape analysis if you have non-trivial code e.g. you have collections of tuples as intermediary steps in collections processing. But that’s usually not a bottleneck anyway - big-O notation is more important than constant factor in computational complexity.

In Scala 2 yes (even if only to Tuple2), but are they in Scala 3? Long tuples (i.e. ones with many parameters, currently that means > 2) are not specialized and named tuples are most useful for long tuples.

Anyway, here’s code (note it’s Scala 2.13) that uses Tuple3 (which isn’t specialized):

object Main {
  def timeAndPrint[T](description: String)(action: => T): Unit = {
    val startTime = System.currentTimeMillis()
    val result = action
    val endTime = System.currentTimeMillis()
    println(s"$description: Total time ${endTime - startTime} ms. Result: $result")
  }

  def main(args: Array[String]): Unit = {
    val arraySize = 10_000_000
    for (_ <- 1 to 10) {
      timeAndPrint("Using tuples") {
        Array.tabulate(arraySize)(x => x)
          .map(x => (x, x / 2, x + 2))
          .map(x => x._1 + x._2 + x._3)
          .sum
      }
      timeAndPrint("Using case classes") {
        // use case class instead of a tuple
        case class IntermediateStep(a: Int, b: Int, c: Int)
        Array.tabulate(arraySize)(x => x)
          .map(x => IntermediateStep(x, x / 2, x + 2))
          .map(x => x.a + x.b + x.c)
          .sum
      }
    }
  }
}

Running it shows some wild variability. Probably refactoring that into JMH microbenchmark would allow to get reliable timings. Anyway, under Java 11 the timings are more stable than under Java 8 on my machine, so I’ve used Java 11 (by ‘Java’ I mean OpenJDK without GraalVM) with -Xms5g -Xmx5g -XX:+AlwaysPreTouch to hopefully get more stability and the results are:

Using tuples: Total time 1032 ms. Result: -723182784
Using case classes: Total time 508 ms. Result: -723182784
Using tuples: Total time 856 ms. Result: -723182784
Using case classes: Total time 295 ms. Result: -723182784
Using tuples: Total time 1317 ms. Result: -723182784
Using case classes: Total time 594 ms. Result: -723182784
Using tuples: Total time 841 ms. Result: -723182784
Using case classes: Total time 486 ms. Result: -723182784
Using tuples: Total time 863 ms. Result: -723182784
Using case classes: Total time 416 ms. Result: -723182784
Using tuples: Total time 883 ms. Result: -723182784
Using case classes: Total time 405 ms. Result: -723182784
Using tuples: Total time 858 ms. Result: -723182784
Using case classes: Total time 472 ms. Result: -723182784
Using tuples: Total time 881 ms. Result: -723182784
Using case classes: Total time 419 ms. Result: -723182784
Using tuples: Total time 830 ms. Result: -723182784
Using case classes: Total time 429 ms. Result: -723182784
Using tuples: Total time 838 ms. Result: -723182784
Using case classes: Total time 423 ms. Result: -723182784

As you see, after stabilization, case classes are about 2x faster than tuples in this particular scenario. Take it with a large pinch of salt as the results are still pretty unstable.

This post is only about the performance differences between tuples and case classes, i.e. mainly to disprove the intuitions about tuples’ supposed extraordinary performance. Otherwise it’s not meant to be a major factor in debate about merit of named tuples.

Quick summary of my post:

with case classes you pay extra cost once for each case class definition, but then potentially get improved performance for each case class instance
with tuples you don’t pay extra cost for each tuple type (as there is small finite number of them), but then potentially you pay performance penalty for each tuple instance you create

Overall, the performance difference has to be measured to get certainty on what’s the impact. In simple cases the escape analysis removes tuples’ overhead, but in general it’s not easy to predict when that optimization will kick in (especially if someone is not into JVM optimizations).

djx314 · December 11, 2023, 9:14pm

Great! Can not wait for

type Model[T] = (String, List[Model[T]])

Jasper-M · December 12, 2023, 10:33am

I don’t think that recursive type aliases are related to this SIP.

djx314 · December 13, 2023, 2:25pm

case class Model1(nickName: String, sub: List[Model1])
type Model2 = (String, List[Model2])

There will be this serious concern.

Ichoran · December 13, 2023, 10:11pm

I changed the type test encoding and restricted some of the allowable type manipulations to make things easier, and now have library-level named tuples up to arity 9 (no macros, but lots of inlining). Here’s a runnable example, if you use scala-cli:

//> using scala 3.4.0-RC1-bin-20231114-18ada51-NIGHTLY
//> using dep com.github.ichoran::kse3-eio:0.2.8
//> using mainClass the.Main

package the

import scala.language.experimental.relaxedExtensionImports
import kse.basics._
import kse.basics.labels._

// Label any type with a string constant: Int \ "eel"
// Label any value: 5 \ "eel" (now has type Int \ "eel")
// If you have a tuple of size 9 or less and every value is
//   lablelled, you can refer to elements by label using ~
// If labels are known, infer from unlabelled by using `label`
//   (.labelled for singletons)
// To pull out a subset of things by name, use `.pick("name1", "name2")`
// To update defaults according to any names that match,
//   use `.updatedBy`
// Labels aren't printed; it's just an opaque type around the tuple.

object Main:
  val default = (0 \ "eel", false \ "cod", 0 \ "salmon")
  var mutable = default
  mutable = (0, true, 5).label  // Infer labels on request
  val update = (true \ "cod", 3 \ "eel", 'm' \ "minnow")
  
  def main(args: Array[String]): Unit =
    println(default ~ "cod")
    println(mutable ~ "cod")
    println(update.pick("minnow", "eel"))
    println(default.updatedBy(update))
    println(mutable.updatedBy(update))

Ichoran · December 15, 2023, 1:46am

I’ve been playing around a bit with ad-hoc type naming and I think that this reveals that if we say that named tuples and method arguments are specific instances of a generalized capability, just like unnamed tuples have a clear isomorphism with function arguments, then there is only one right answer.

Consider a function f: ((A, B, C) => Unit). This is so naturally associated with g: (((A, B, C)) => Unit), the tupled version, that f.tupled gives exactly the latter–all it does is unpack the tuple into the arguments of f. And although it’s not built in, it’s hopefully very obvious how to perform g.untupled to get f back (i.e. pack the arguments, pass to g).

Now consider a method def foo[A, B, C](a: A, b: B, c: C): Unit. We have the same natural isomorphism between its argument list and a single argument list def bar[A, B, C](args: (a = A, b = B, c = C)): Unit as we did with f and f.tupled where we now have a named tuple to preserve the argument names.

But at this point, why do we even need to think about tuples? We can simply think about a function h: A => Unit and an equivalent (named) method, baz[A](a: A): Unit.

How can we use baz? Well, we can baz(myA). And we can baz(a = myA). But we can’t baz(b = myA) or baz(myB).

But how can we use h? Only h(myA). h(a = myA) doesn’t work (nor does h(b = myA), h(myB), etc.).

This tells us that if we are going to assign a type to a = myA, it has to be a supertype of A. Furthermore, if we say that in fact a: A is itself a named type, then we can get a perfect correspondence between method and function: h: (a: A) => Unit Now we can call h exactly the same way that we can call baz, assuming that a = myA is an upcast from type A to type a: A.

So, by introducing the idea of named types, where a named type is a supertype of the type it names, we heal one of the irregularities in Scala, which is that methods have named parameters and functions can’t.

Functions of named types do have named parameters!

One could imagine different ways to put names in the hierarchy, but given how named arguments already work, we already made the decision.

So, if we at least conceptually buy the idea of named types, then a tuple (a: A, b: B, c: C) is literally just an ordinary tuple, except the types inside are all named. Since tuples are covariant, by ordinary tuple rules, (A, B, C) is a subtype of (a: A, b: B, c: C)–A is a subtype of a: A, etc…

So if we buy the unification of function and method arguments via named types, then the only consistent choice for tuples is that they are the subtype of any named tuple.

If we don’t like the idea of named types (or for some odd reason love that it’s impossible to give functions argument names), then we can pick whatever we want.

So at this point my mind is conditionally made up. If Scala implements named types as a feature, then they have to be supertypes of the corresponding unnamed types, and then of course named tuples are supertypes of regular tuples (and also Martin’s first encoding of named tuples is the one to use). Otherwise, as far as I can tell, we’re free to ad-hoc choose whatever.

jeremyrsmith · December 15, 2023, 2:12am

Thank you for specifying this more explicitly. I like the idea of named types, and, like you (?), I think it naturally leads to named <:< unnamed (edit: Oops, it looks like we drew the opposite conclusions. Anyway…)

I wanted to say a lot of the things you said, but these threads get really long and have so many opinions, that it ends up with nobody’s points (from the larger community, at least) really being seriously considered.

jeremyrsmith · December 15, 2023, 2:36am

The argument here comes from the notion that a function over named tuples would be substitutable for a function over the same types (but unnamed). And the use case example comes from (I think?) the assumption that one could tuple-ize a function in a way that would yield a named-tuple-accepting function.

In that case, naming is in a contravariant position, so you must arrive at the conclusion that a: A is a supertype of A.

But I don’t agree with that premise. It’s pretty intuitive that a T named foo is more specific than any old T, and if we’re drawing the opposite conclusion then we’ve made a wrong turn.

Let’s imagine all the variance cases:

Should (foo: A, bar: B) be substitutable for (A, B)? (I’d say yes)
Should (A, B) be substitutable for (foo: A, bar: B)? (I’d say no – e.g. in the case of (base: Double, exponent: Double), this would defeat the purpose)
Should ((foo: A, bar: B)) => C be substitutable for ((A, B) => C)? Again, I’d say no – for the same reason as (2). You shouldn’t be able to pass unnamed tuples where names are expected; it defeats the purpose.
Should ((A, B)) => C be substitutable for ((foo: A, bar: B)) => C? Yes, I think so – that function doesn’t care about names, so it can handle values that have them by simply ignoring them.
Should C => (foo: A, bar: B) be substitutable for C => (A, B)? I’d say yes – the caller of that function doesn’t care about names in the result, so they can simply discard them.
Should C => (A, B) be substitutable for C => (foo: A, bar: B)? Again, I’d say no. The caller cares about the naming, or they wouldn’t have specified the type that way. Allowing unnamed results makes naming pointless.

jeremyrsmith · December 15, 2023, 2:45am

The point I want to emphasize (and I wanted to say this earlier in response to Martin but settled for a joke because I don’t expect to get anywhere) is that the arguments for named >: unnamed seem to come down to “saving people some keystrokes, because it’s annoying to specify names”. I’d argue that people who feel that way can avoid names. That’s not a good reason to reverse the intuitive type relationship.

Edit: And, some thought and discussion around “what does (or could) naming itself mean for the type system” (as opposed to just tacking on named tuples in isolation) didn’t happen. So I appreciate that @Ichoran did start discussing that.

bishabosha · December 15, 2023, 9:00am

This is already solved in Scala 3 with dependent function types:

val foo: (x: Int) => x.type = (x: Int) => x
foo(x = 23)

but true, the dependent type is not inferred automatically

Ichoran · December 15, 2023, 7:44pm

Huh, somehow I missed that!

So it seems that we’ve already made the decision, and it is that named tuples are simply tuples. Despite technically being different,

val f: (x: Int, y: Int) => Int = _ + _
val g: (Int, Int) => Int = _ + _

have the same type and can substitute anywhere for each other, with no restrictions as far as I can see. Since for function arguments, named =:= unnamed, it should be true for tuples also.

This is pretty disappointing, honestly, because it makes named tuples a lot less useful; they’re informational only, since anything can conform to anything else. But otherwise it creates a pretty annoying inconsistency.

Ichoran · December 15, 2023, 8:14pm

Yes, I agree that this is my claim, but it’s a little more that what you stated: it is that we already made that choice as a consequence of three things (1) named method arguments have optional names, and (2) there is an injection of methods into functions that could and should be made bijective, and (3) there is a bijection between function parameters and tuples. (The italicized part is arguable, but very natural.)

So when you ask your questions, I think the answers also have to explain where we want to subvert the parallels / bijections we already have.

For example, if you say

then you say "because named tuples ARE NOT bijections with function arguments, because def powM(base: Double, exponent: Double) very much can be called with pow(3.0, 1.5) and we want it to be this way; but powF: (base = Double, exponent = Double) => Double cannot be called with powF((3.0, 5.0)) and we want this to be that way too.

If we’re going to say so, then I think we need to be very clear about the generalizations that we are and are not supporting.

We could have two different ideas of named types: name-optional and name-required. Then requirednamed <:< unnamed <:< optionalnamed.

We could write required names as name = Type, but optional names as name: Type, for instance.

But, again, I don’t think it’s useful to answer intuitive questions without forcing one’s intuition to explore all the consequences. This is a good way to intuit yourself into trouble and inconsistency, because intuition is a usual-case specialist, giving quick answers that work right most of the time.

Unfortunately, in programming, we are constantly exposed to the unusual cases too, and large systems can fail to work if they come out wrong, so we’d better force ourselves to think those through, too.

Meanwhile, I’m reasonably content with my (Double \ "base", Double \ "exponent") as a “named is unrelated to unnamed” named type / named tuple system. It’s not perfect. Even reduced to a single symbol, it feels a bit heavy. It doesn’t solve any lack of parallels that exist. But neither does it introduce any lack of parallels that previously weren’t there, and you can’t beat the safety if that’s what you need.

For instance, just because you can discard the names in => (foo: A, bar: B) it doesn’t follow that you should without explicitly declaring your indifference to the careful curation of what things were called. It’s less likely to be an issue than if it’s an input, but it isn’t necessarily safe.

tarsa · December 16, 2023, 10:19am

why? (instanceOfModel1: Model1).toTuple will result in a value of type:

type Model3 = (String, List[Model1])

i.e. there won’t be any recursive type conversion.

Sporarum · December 16, 2023, 11:49am

What is the extractee type for the following ?

val (sum: Int, product: Int) = p

If we have named tuples, it’s not clear whether I’m labeling the elements, or imposing a constraint on their names (in addition to labeling them)

And of course Scala users are used to the labeling mindset already,

Note that if instead of named tuples we have named types, we can be explicit (if a bit repetitive):

val (a: (sum: Int), b: (product: Int)) = p

Sporarum · December 16, 2023, 12:11pm

I’ll also give my opinion with regards to Qualified Types (a not-even-experimental feature, i.e. take everything with a big grain of salt).

There were two ideas relevant here:

The name in the predicate does not matter: (x: Int with x > 0) =:= (y: Int with y > 0)
We can reuse an identifier already present

So for example we can write:

def posOnly(x: Int with x > 0) = ???

posOnly( 4: (y: Int with y > 0) )

But as you can see, this clashes with the behavior of named tuples:

(a: Int with a > 0) =:= (x: Int with x > 0)

(a: Int, b: Int) !=:= (x: Int, y: Int)

(a: Int with a > 0, b: Int with b > 0) ?=:= (x: Int with x > 0, y: Int with y > 0)

This is not exclusive to qualified types of course, but would be much more widespread than:

(a: Int, b: Int) => a.type  =:=  (x: Int, y: Int) => x.type

((a: Int, b: Int)) => Int !=:=  ((x: Int, y: Int)) => Int

((a: Int, b: Int)) => a.type  ?=:=  ((x: Int, y: Int)) => x.type

Sporarum · December 16, 2023, 12:28pm

Overall, the best choice might be to have a different syntax for labelled types and named types:

Something like:

(x: Int) =:= (y: Int)

(x :: Int) !=:= (y :: Int)

(Probably not :: as it would be ambiguous with List’s ::)

This solves all the issues I raised above:

val (sum: Int, product: Int) = p // expected: (Int, Int) and binds sum, product
val (sum :: Int, product :: Int) = p // expected: (sum :: Int, product :: Int) and binds sum, product

val (a: sum :: Int, b: product :: Int) = p // expected: (sum :: Int, product :: Int)  and binds a, b


(x: Int with x > 0) =:= (y: Int with y > 0)
(a :: (x: Int with x > 0)) !=:= (b :: (x: Int with x > 0))

(a :: Int with a > 0) !=:= (b :: Int with b > 0) // reuse identifier

(a: Int, b: Int) => a.type  =:=  (x: Int, y: Int) => x.type
(a :: Int, b :: Int) => a.type  !=:=  (x :: Int, y :: Int) => x.type

// and of course:
(a: Int, b: Int) =:= (x: Int, y: Int)
(a :: Int, b :: Int) !=:= (x :: Int, y :: Int)

I don’t think we can we ignore that sometimes you want to refer to an element of a type, and sometimes you want to name a type to make it distinct from different names

Ichoran · December 16, 2023, 8:28pm

Yeah, it’s looking like this might be necessary, and I called the latter one (x = Int) in my last post above, because that’s kind of the partly-there syntax already.

Sporarum · December 17, 2023, 1:22pm

I’d expect x = Int to bind the type Int to x, whereas we want to bind elements of the type Int to x, an example:

type x = Int
val b: x.type = ??? //error: Not Found: x

val x: Int = ???
val a: x.type = ??? // works !

type MyType = (x: Int) => x.type // refers to element of type Int

Ichoran · December 17, 2023, 5:19pm

I would expect that when the word type is in front of it, certainly. Otherwise, it seems to me as good a candidate as any for a general-purpose naming operator.