Tuples are specialized, but not to arbitrary arity. Try (1, 2).getClass
!
To make things clear for observers (Iām sure Martin is fully aware of how tuples work, but this post is for people not knowing it), here are a few facts about tuples (Martin presented an advantage of tuples, so Iāll present disadvantages):
Tuple is a class with as many parameters as there are elements in tuple, so for each tuple length there is a separate class (up until about 22 after which there is TupleXXL which boxes elements into Array[Object]
). Type parameters of a tuple are erased to Object, so e.g. (1, 'a', "xyz")
is stored as Tuple3[Object, Object, Object]
, i.e. primitives are boxed. Since type parameters of tuples are erased and replaced with casts (at usage site), JVM has to add some guards for that (that adds a bit of overhead) and there is constant boxing and unboxing of primitives (or, more precisely, all value classes). Thatās a lot of overhead. If itās a major contributor to computational complexity then it pays off to e.g. use extra case class to avoid such boxing and casts. But as Martin said - each extra case class definition is extra work for JVM to load that extra class and do all sort of optimizations and other stuff.
JVM can do escape analysis for tuples, but it can do escape analysis for case classes too. Note that HotSpot is probably weak at escape analysis if you have non-trivial code e.g. you have collections of tuples as intermediary steps in collections processing. But thatās usually not a bottleneck anyway - big-O notation is more important than constant factor in computational complexity.
In Scala 2 yes (even if only to Tuple2), but are they in Scala 3? Long tuples (i.e. ones with many parameters, currently that means > 2) are not specialized and named tuples are most useful for long tuples.
Anyway, hereās code (note itās Scala 2.13) that uses Tuple3 (which isnāt specialized):
object Main {
def timeAndPrint[T](description: String)(action: => T): Unit = {
val startTime = System.currentTimeMillis()
val result = action
val endTime = System.currentTimeMillis()
println(s"$description: Total time ${endTime - startTime} ms. Result: $result")
}
def main(args: Array[String]): Unit = {
val arraySize = 10_000_000
for (_ <- 1 to 10) {
timeAndPrint("Using tuples") {
Array.tabulate(arraySize)(x => x)
.map(x => (x, x / 2, x + 2))
.map(x => x._1 + x._2 + x._3)
.sum
}
timeAndPrint("Using case classes") {
// use case class instead of a tuple
case class IntermediateStep(a: Int, b: Int, c: Int)
Array.tabulate(arraySize)(x => x)
.map(x => IntermediateStep(x, x / 2, x + 2))
.map(x => x.a + x.b + x.c)
.sum
}
}
}
}
Running it shows some wild variability. Probably refactoring that into JMH microbenchmark would allow to get reliable timings. Anyway, under Java 11 the timings are more stable than under Java 8 on my machine, so Iāve used Java 11 (by āJavaā I mean OpenJDK without GraalVM) with -Xms5g -Xmx5g -XX:+AlwaysPreTouch
to hopefully get more stability and the results are:
Using tuples: Total time 1032 ms. Result: -723182784
Using case classes: Total time 508 ms. Result: -723182784
Using tuples: Total time 856 ms. Result: -723182784
Using case classes: Total time 295 ms. Result: -723182784
Using tuples: Total time 1317 ms. Result: -723182784
Using case classes: Total time 594 ms. Result: -723182784
Using tuples: Total time 841 ms. Result: -723182784
Using case classes: Total time 486 ms. Result: -723182784
Using tuples: Total time 863 ms. Result: -723182784
Using case classes: Total time 416 ms. Result: -723182784
Using tuples: Total time 883 ms. Result: -723182784
Using case classes: Total time 405 ms. Result: -723182784
Using tuples: Total time 858 ms. Result: -723182784
Using case classes: Total time 472 ms. Result: -723182784
Using tuples: Total time 881 ms. Result: -723182784
Using case classes: Total time 419 ms. Result: -723182784
Using tuples: Total time 830 ms. Result: -723182784
Using case classes: Total time 429 ms. Result: -723182784
Using tuples: Total time 838 ms. Result: -723182784
Using case classes: Total time 423 ms. Result: -723182784
As you see, after stabilization, case classes are about 2x faster than tuples in this particular scenario. Take it with a large pinch of salt as the results are still pretty unstable.
This post is only about the performance differences between tuples and case classes, i.e. mainly to disprove the intuitions about tuplesā supposed extraordinary performance. Otherwise itās not meant to be a major factor in debate about merit of named tuples.
Quick summary of my post:
- with case classes you pay extra cost once for each case class definition, but then potentially get improved performance for each case class instance
- with tuples you donāt pay extra cost for each tuple type (as there is small finite number of them), but then potentially you pay performance penalty for each tuple instance you create
Overall, the performance difference has to be measured to get certainty on whatās the impact. In simple cases the escape analysis removes tuplesā overhead, but in general itās not easy to predict when that optimization will kick in (especially if someone is not into JVM optimizations).
Great! Can not wait for
type Model[T] = (String, List[Model[T]])
I donāt think that recursive type aliases are related to this SIP.
case class Model1(nickName: String, sub: List[Model1])
type Model2 = (String, List[Model2])
There will be this serious concern.
I changed the type test encoding and restricted some of the allowable type manipulations to make things easier, and now have library-level named tuples up to arity 9 (no macros, but lots of inlining). Hereās a runnable example, if you use scala-cli:
//> using scala 3.4.0-RC1-bin-20231114-18ada51-NIGHTLY
//> using dep com.github.ichoran::kse3-eio:0.2.8
//> using mainClass the.Main
package the
import scala.language.experimental.relaxedExtensionImports
import kse.basics._
import kse.basics.labels._
// Label any type with a string constant: Int \ "eel"
// Label any value: 5 \ "eel" (now has type Int \ "eel")
// If you have a tuple of size 9 or less and every value is
// lablelled, you can refer to elements by label using ~
// If labels are known, infer from unlabelled by using `label`
// (.labelled for singletons)
// To pull out a subset of things by name, use `.pick("name1", "name2")`
// To update defaults according to any names that match,
// use `.updatedBy`
// Labels aren't printed; it's just an opaque type around the tuple.
object Main:
val default = (0 \ "eel", false \ "cod", 0 \ "salmon")
var mutable = default
mutable = (0, true, 5).label // Infer labels on request
val update = (true \ "cod", 3 \ "eel", 'm' \ "minnow")
def main(args: Array[String]): Unit =
println(default ~ "cod")
println(mutable ~ "cod")
println(update.pick("minnow", "eel"))
println(default.updatedBy(update))
println(mutable.updatedBy(update))
Iāve been playing around a bit with ad-hoc type naming and I think that this reveals that if we say that named tuples and method arguments are specific instances of a generalized capability, just like unnamed tuples have a clear isomorphism with function arguments, then there is only one right answer.
Consider a function f: ((A, B, C) => Unit)
. This is so naturally associated with g: (((A, B, C)) => Unit)
, the tupled version, that f.tupled
gives exactly the latterāall it does is unpack the tuple into the arguments of f
. And although itās not built in, itās hopefully very obvious how to perform g.untupled
to get f
back (i.e. pack the arguments, pass to g
).
Now consider a method def foo[A, B, C](a: A, b: B, c: C): Unit
. We have the same natural isomorphism between its argument list and a single argument list def bar[A, B, C](args: (a = A, b = B, c = C)): Unit
as we did with f
and f.tupled
where we now have a named tuple to preserve the argument names.
But at this point, why do we even need to think about tuples? We can simply think about a function h: A => Unit
and an equivalent (named) method, baz[A](a: A): Unit
.
How can we use baz
? Well, we can baz(myA)
. And we can baz(a = myA)
. But we canāt baz(b = myA)
or baz(myB)
.
But how can we use h
? Only h(myA)
. h(a = myA)
doesnāt work (nor does h(b = myA)
, h(myB)
, etc.).
This tells us that if we are going to assign a type to a = myA
, it has to be a supertype of A
. Furthermore, if we say that in fact a: A
is itself a named type, then we can get a perfect correspondence between method and function: h: (a: A) => Unit
Now we can call h
exactly the same way that we can call baz
, assuming that a = myA
is an upcast from type A
to type a: A
.
So, by introducing the idea of named types, where a named type is a supertype of the type it names, we heal one of the irregularities in Scala, which is that methods have named parameters and functions canāt.
Functions of named types do have named parameters!
One could imagine different ways to put names in the hierarchy, but given how named arguments already work, we already made the decision.
So, if we at least conceptually buy the idea of named types, then a tuple (a: A, b: B, c: C)
is literally just an ordinary tuple, except the types inside are all named. Since tuples are covariant, by ordinary tuple rules, (A, B, C)
is a subtype of (a: A, b: B, c: C)
āA
is a subtype of a: A
, etcā¦
So if we buy the unification of function and method arguments via named types, then the only consistent choice for tuples is that they are the subtype of any named tuple.
If we donāt like the idea of named types (or for some odd reason love that itās impossible to give functions argument names), then we can pick whatever we want.
So at this point my mind is conditionally made up. If Scala implements named types as a feature, then they have to be supertypes of the corresponding unnamed types, and then of course named tuples are supertypes of regular tuples (and also Martinās first encoding of named tuples is the one to use). Otherwise, as far as I can tell, weāre free to ad-hoc choose whatever.
Thank you for specifying this more explicitly. I like the idea of named types, and, like you (?), I think it naturally leads to named <:< unnamed (edit: Oops, it looks like we drew the opposite conclusions. Anywayā¦)
I wanted to say a lot of the things you said, but these threads get really long and have so many opinions, that it ends up with nobodyās points (from the larger community, at least) really being seriously considered.
The argument here comes from the notion that a function over named tuples would be substitutable for a function over the same types (but unnamed). And the use case example comes from (I think?) the assumption that one could tuple-ize a function in a way that would yield a named-tuple-accepting function.
In that case, naming is in a contravariant position, so you must arrive at the conclusion that a: A
is a supertype of A
.
But I donāt agree with that premise. Itās pretty intuitive that a T
named foo
is more specific than any old T
, and if weāre drawing the opposite conclusion then weāve made a wrong turn.
Letās imagine all the variance cases:
- Should
(foo: A, bar: B)
be substitutable for(A, B)
? (Iād say yes) - Should
(A, B)
be substitutable for(foo: A, bar: B)
? (Iād say no ā e.g. in the case of(base: Double, exponent: Double)
, this would defeat the purpose) - Should
((foo: A, bar: B)) => C
be substitutable for((A, B) => C)
? Again, Iād say no ā for the same reason as (2). You shouldnāt be able to pass unnamed tuples where names are expected; it defeats the purpose. - Should
((A, B)) => C
be substitutable for((foo: A, bar: B)) => C
? Yes, I think so ā that function doesnāt care about names, so it can handle values that have them by simply ignoring them. - Should
C => (foo: A, bar: B)
be substitutable forC => (A, B)
? Iād say yes ā the caller of that function doesnāt care about names in the result, so they can simply discard them. - Should
C => (A, B)
be substitutable forC => (foo: A, bar: B)
? Again, Iād say no. The caller cares about the naming, or they wouldnāt have specified the type that way. Allowing unnamed results makes naming pointless.
The point I want to emphasize (and I wanted to say this earlier in response to Martin but settled for a joke because I donāt expect to get anywhere) is that the arguments for named >: unnamed seem to come down to āsaving people some keystrokes, because itās annoying to specify namesā. Iād argue that people who feel that way can avoid names. Thatās not a good reason to reverse the intuitive type relationship.
Edit: And, some thought and discussion around āwhat does (or could) naming itself mean for the type systemā (as opposed to just tacking on named tuples in isolation) didnāt happen. So I appreciate that @Ichoran did start discussing that.
This is already solved in Scala 3 with dependent function types:
val foo: (x: Int) => x.type = (x: Int) => x
foo(x = 23)
but true, the dependent type is not inferred automatically
Huh, somehow I missed that!
So it seems that weāve already made the decision, and it is that named tuples are simply tuples. Despite technically being different,
val f: (x: Int, y: Int) => Int = _ + _
val g: (Int, Int) => Int = _ + _
have the same type and can substitute anywhere for each other, with no restrictions as far as I can see. Since for function arguments, named =:= unnamed
, it should be true for tuples also.
This is pretty disappointing, honestly, because it makes named tuples a lot less useful; theyāre informational only, since anything can conform to anything else. But otherwise it creates a pretty annoying inconsistency.
Yes, I agree that this is my claim, but itās a little more that what you stated: it is that we already made that choice as a consequence of three things (1) named method arguments have optional names, and (2) there is an injection of methods into functions that could and should be made bijective, and (3) there is a bijection between function parameters and tuples. (The italicized part is arguable, but very natural.)
So when you ask your questions, I think the answers also have to explain where we want to subvert the parallels / bijections we already have.
For example, if you say
then you say "because named tuples ARE NOT bijections with function arguments, because def powM(base: Double, exponent: Double)
very much can be called with pow(3.0, 1.5)
and we want it to be this way; but powF: (base = Double, exponent = Double) => Double
cannot be called with powF((3.0, 5.0))
and we want this to be that way too.
If weāre going to say so, then I think we need to be very clear about the generalizations that we are and are not supporting.
We could have two different ideas of named types: name-optional and name-required. Then requirednamed <:< unnamed <:< optionalnamed
.
We could write required names as name = Type
, but optional names as name: Type
, for instance.
But, again, I donāt think itās useful to answer intuitive questions without forcing oneās intuition to explore all the consequences. This is a good way to intuit yourself into trouble and inconsistency, because intuition is a usual-case specialist, giving quick answers that work right most of the time.
Unfortunately, in programming, we are constantly exposed to the unusual cases too, and large systems can fail to work if they come out wrong, so weād better force ourselves to think those through, too.
Meanwhile, Iām reasonably content with my (Double \ "base", Double \ "exponent")
as a ānamed is unrelated to unnamedā named type / named tuple system. Itās not perfect. Even reduced to a single symbol, it feels a bit heavy. It doesnāt solve any lack of parallels that exist. But neither does it introduce any lack of parallels that previously werenāt there, and you canāt beat the safety if thatās what you need.
For instance, just because you can discard the names in => (foo: A, bar: B)
it doesnāt follow that you should without explicitly declaring your indifference to the careful curation of what things were called. Itās less likely to be an issue than if itās an input, but it isnāt necessarily safe.
why? (instanceOfModel1: Model1).toTuple
will result in a value of type:
type Model3 = (String, List[Model1])
i.e. there wonāt be any recursive type conversion.
What is the extractee type for the following ?
val (sum: Int, product: Int) = p
If we have named tuples, itās not clear whether Iām labeling the elements, or imposing a constraint on their names (in addition to labeling them)
And of course Scala users are used to the labeling mindset already,
Note that if instead of named tuples we have named types, we can be explicit (if a bit repetitive):
val (a: (sum: Int), b: (product: Int)) = p
Iāll also give my opinion with regards to Qualified Types (a not-even-experimental feature, i.e. take everything with a big grain of salt).
There were two ideas relevant here:
- The name in the predicate does not matter:
(x: Int with x > 0) =:= (y: Int with y > 0)
- We can reuse an identifier already present
So for example we can write:
def posOnly(x: Int with x > 0) = ???
posOnly( 4: (y: Int with y > 0) )
But as you can see, this clashes with the behavior of named tuples:
(a: Int with a > 0) =:= (x: Int with x > 0)
(a: Int, b: Int) !=:= (x: Int, y: Int)
(a: Int with a > 0, b: Int with b > 0) ?=:= (x: Int with x > 0, y: Int with y > 0)
This is not exclusive to qualified types of course, but would be much more widespread than:
(a: Int, b: Int) => a.type =:= (x: Int, y: Int) => x.type
((a: Int, b: Int)) => Int !=:= ((x: Int, y: Int)) => Int
((a: Int, b: Int)) => a.type ?=:= ((x: Int, y: Int)) => x.type
Overall, the best choice might be to have a different syntax for labelled types and named types:
Something like:
(x: Int) =:= (y: Int)
(x :: Int) !=:= (y :: Int)
(Probably not ::
as it would be ambiguous with Listās ::
)
This solves all the issues I raised above:
val (sum: Int, product: Int) = p // expected: (Int, Int) and binds sum, product
val (sum :: Int, product :: Int) = p // expected: (sum :: Int, product :: Int) and binds sum, product
val (a: sum :: Int, b: product :: Int) = p // expected: (sum :: Int, product :: Int) and binds a, b
(x: Int with x > 0) =:= (y: Int with y > 0)
(a :: (x: Int with x > 0)) !=:= (b :: (x: Int with x > 0))
(a :: Int with a > 0) !=:= (b :: Int with b > 0) // reuse identifier
(a: Int, b: Int) => a.type =:= (x: Int, y: Int) => x.type
(a :: Int, b :: Int) => a.type !=:= (x :: Int, y :: Int) => x.type
// and of course:
(a: Int, b: Int) =:= (x: Int, y: Int)
(a :: Int, b :: Int) !=:= (x :: Int, y :: Int)
I donāt think we can we ignore that sometimes you want to refer to an element of a type, and sometimes you want to name a type to make it distinct from different names
Yeah, itās looking like this might be necessary, and I called the latter one (x = Int)
in my last post above, because thatās kind of the partly-there syntax already.
Iād expect x = Int
to bind the type Int
to x
, whereas we want to bind elements of the type Int
to x
, an example:
type x = Int
val b: x.type = ??? //error: Not Found: x
val x: Int = ???
val a: x.type = ??? // works !
type MyType = (x: Int) => x.type // refers to element of type Int
I would expect that when the word type
is in front of it, certainly. Otherwise, it seems to me as good a candidate as any for a general-purpose naming operator.