Pre SIP: Named tuples

alvae · December 4, 2023, 9:37pm

I don’t think we should understand named fields in opposition to positional ones. Names are just another way to refer to a position in a tuple. Otherwise we’re no longer talking about a tuple.

Isn’t that re-inventing case classes?

case class Person(name: String = "Bob", age: Int, length: Double)

odersky · December 4, 2023, 9:42pm

But that’s the crux of the matter! r.named is not definable, it needs a magic conversion. The definition you tried to give for named is not legal code.

I’ll shut up now. I think we are turning in circles.

bishabosha · December 4, 2023, 11:18pm

it should be defined as

extension [Ts <: Tuple](ts: Ts)
  inline def named[Ns <: Tuple]: NamedTuple[Ns, Ts] = ts.asInstanceOf[NamedTuple[Ns, Ts]]
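A self-contained sketch of the same idea that runs without the proposed compiler support. `NamedTup` below is a hypothetical opaque stand-in for the proposal's `NamedTuple`, not its actual definition. Because the names live only in the type, `.named` and `.unnamed` are zero-cost, and neither direction is a subtype of the other.

```scala
object NamedTuples:
  // Hypothetical stand-in for the proposal's NamedTuple: at runtime it is
  // just the tuple of values; the names Ns exist only at the type level.
  opaque type NamedTup[Ns <: Tuple, Vs <: Tuple] = Vs

  extension [Vs <: Tuple](vs: Vs)
    // Zero-cost opt-in; Ns is normally inferred from the expected type.
    def named[Ns <: Tuple]: NamedTup[Ns, Vs] = vs

  extension [Ns <: Tuple, Vs <: Tuple](nt: NamedTup[Ns, Vs])
    // Zero-cost conversion back to the plain tuple.
    def unnamed: Vs = nt

import NamedTuples.*

type Person = NamedTup[("name", "age"), (String, Int)]

// Ns = ("name", "age") is taken from the expected type Person.
val bob: Person = ("Bob", 33).named
// val oops: Person = ("Bob", 33) // does not compile: no subtyping either way
```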

and if there’s no expected type, then it will infer NamedTuple[Nothing, Ts], which I guess will confuse the type printer

Ichoran · December 5, 2023, 12:16am

That’s because I was writing pseudocode. I thought the intent was clear enough.

Here’s a working library-level implementation of named 2-tuples, which are opaque wrappers around 2-tuples but are not subtypes of tuples, and which use a non-magical .named exactly as indicated in lrytz’s code (posted on Scastie, the interactive Scala playground).

(To be fair, transparent inline is hard to distinguish from magic sometimes. But anyway, it’s not novel magic; it’s the magic we already have.)

Because it’s library-level, I can’t use .name but instead have to use $ "name". That’s where a little compiler magic would be needed. (R uses $ instead of . for member access.)
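A minimal sketch of what a library-level `$ "name"` accessor could look like. This is not Ichoran's actual Scastie code; the opaque `NT` type, `make`, and the `IndexOf` match type are illustrative assumptions. The name is captured as a singleton string type, mapped to a position at compile time, and the value is read from that position.

```scala
import scala.compiletime.constValue
import scala.compiletime.ops.int.S

object NamedLookup:
  // Hypothetical named tuple: names Ns at the type level, values Vs at runtime.
  opaque type NT[Ns <: Tuple, Vs <: Tuple] = Vs

  def make[Ns <: Tuple, Vs <: Tuple](vs: Vs): NT[Ns, Vs] = vs

  // Position of name N in the name tuple Ns; does not reduce if N is absent.
  type IndexOf[Ns <: Tuple, N] <: Int = Ns match
    case N *: _    => 0
    case _ *: rest => S[IndexOf[rest, N]]

  extension [Ns <: Tuple, Vs <: Tuple](nt: NT[Ns, Vs])
    // Field access by name, resolved to a fixed index when inlined.
    inline def $[N <: String & Singleton](name: N): Tuple.Elem[Vs, IndexOf[Ns, N]] =
      nt.productElement(constValue[IndexOf[Ns, N]])
        .asInstanceOf[Tuple.Elem[Vs, IndexOf[Ns, N]]]

import NamedLookup.*

val bob = make[("name", "age"), (String, Int)](("Bob", 33))
val age: Int = bob $ "age"
```

Asking for a name that is not in `Ns` leaves `IndexOf` irreducible, so the call fails to compile rather than failing at runtime.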

The issue isn’t whether it can be implemented, but rather how we wish for the feature to work. unnamed <:< named is elegant but facilitates usage that is the polar opposite of one main use case of names: to safely disambiguate.

MateuszKowalewski · December 5, 2023, 3:34am

I disagree with the claims that named tuples need to provide some high level of type safety.

The opposite is the case: (Named) tuples are a construct for the case where you didn’t commit to strong typing (yet)!

They are for “liquid structures”—for explorative, or very generic work.

You start out with a tuple, just defining the types of your fields. In the next step you commit to some names for the fields as you figured that out. As things evolve further you give the baby a name and create a case class…

The whole point is that you don’t need to commit to the strongest typing right from the get-go!

That’s exactly the issue with strong typing: it’s hard to use in cases where you haven’t figured everything out yet, or where the structures you work with are very dynamic or generic.

I want named tuples so I have some middle ground between tuples and case classes. But of course they aren’t a replacement for case classes.

I also don’t buy any examples like:

case class Rectangle(origin: (x: Int, y: Int), dimensions: (width: Int, height: Int))

def makeRectangle: Rectangle = {
  val o: (Int, Int) = computeOrigin()
  val d: (Int, Int) = computeDimension()
  Rectangle(d, o) // Oops
}

If you want strong typing just make origin and dimensions take better defined types.

With the argument above you would actually need to disallow positional arguments to functions entirely. All calls would need to spell out the parameter names. Always. Because otherwise there is the potential that you confuse some arguments.

Think for example of a Dimension case class like this here:

case class Dimension(width: Int, height: Int)

The compiler won’t prevent me from writing something like:

def getDimension =
   val w: Int = computeWidth()
   val h: Int = computeHeight()
   Dimension(h, w) // Oops!

When you use completely underspecified types like Ints or Strings, that’s always the danger with positional arguments. Still, nobody ever proposed getting rid of them just because they’re error-prone.

Instead you would refactor to stronger types step by step when the need arises. You could for example introduce opaque type aliases for your loosely typed Int fields…
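A minimal sketch of that refactoring, applied to the `Dimension` example above. `computeWidth` and `computeHeight` are hypothetical stubs returning fixed values, and the `Geometry` object is an assumption for illustration. With distinct opaque types, swapping the two arguments becomes a type error, while the runtime representation stays a plain `Int`.

```scala
object Geometry:
  // Opaque aliases: plain Int at runtime, distinct types at compile time.
  opaque type Width = Int
  object Width:
    def apply(i: Int): Width = i
    extension (w: Width) def value: Int = w

  opaque type Height = Int
  object Height:
    def apply(i: Int): Height = i
    extension (h: Height) def value: Int = h

import Geometry.*

case class Dimension(width: Width, height: Height):
  def area: Int = width.value * height.value

// Hypothetical stand-ins for the helpers used in the post.
def computeWidth(): Int = 3
def computeHeight(): Int = 4

def getDimension: Dimension =
  val w = Width(computeWidth())
  val h = Height(computeHeight())
  Dimension(w, h) // Dimension(h, w) would now be rejected by the compiler
```

The extension methods live in the companion objects so that the two `value` methods do not clash after erasure (both opaque types erase to `Int`).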

lihaoyi · December 5, 2023, 3:49am

I agree that Nothing, overloading, and implicits are existing cases of “compile-time” LSP violation. But I argue that they indeed fit the bill for “endless stream of edge cases”! We were just discussing two different edge cases around Nothing inference a few days ago, and I think most would agree that the existence of overloading makes everything more complicated, with tons of features that don’t work well together with it (default values, target typing, …). I don’t think adding more such compile-time LSP violations is desirable if there are easy alternatives available.

I think there is enough confusion in this thread to suggest that despite the technical validity of unnamed <: named, it’s pretty counterintuitive to a majority of users. There is certainly a mental model where unnamed <: named makes sense, but it seems that it is simply not the model that everyone already has in their heads.

Given unnamed <: named seems empirically confusing, assuming we don’t want to go with python-ish named <: unnamed, I like @soronpo’s idea of just having all tuples be named, with unnamed tuples just being named tuples with the names "_1", "_2", etc.:

That would both keep things simple since there’s only one real implementation of named tuples with a thin desugaring for unnamed tuples, but also keep the two types distinct so named =!= unnamed.
We can add conversions in either direction, between named -> unnamed, unnamed -> named, and even between different named -> named tuples. Given that they are all the same runtime values, these conversions would be zero runtime cost, and IMO having people explicitly opt-in to convert between them is a very good idea for reasons @Ichoran and others have already brought up.
It opens up the possibility of mixed named/unnamed tuples for free, since in the end they’re all just named tuples with some of the names being "_n", there’s no additional complexity to worry about: you can append them, mix them in any order, and it should “just work”
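A sketch of this "everything is named" model, assuming a hypothetical opaque `NT` type in which an unnamed pair is simply a named pair with the names `"_1"` and `"_2"`. Renaming is an explicit call that does nothing at runtime, since all variants share the same underlying tuple.

```scala
object AllNamed:
  // Hypothetical model: every tuple carries names; an "unnamed" pair
  // is just one whose names are "_1" and "_2".
  opaque type NT[Ns <: Tuple, Vs <: Tuple] = Vs

  type Unnamed2[A, B] = NT[("_1", "_2"), (A, B)]

  def pair[A, B](a: A, b: B): Unnamed2[A, B] = (a, b)

  extension [Ns <: Tuple, Vs <: Tuple](nt: NT[Ns, Vs])
    // Explicit, zero-cost conversion to any other set of names.
    // (A fuller version would also check that Ns2 has the same arity.)
    def renamed[Ns2 <: Tuple]: NT[Ns2, Vs] = nt
    def values: Vs = nt

import AllNamed.*

val p: Unnamed2[Int, Int] = pair(3, 4)
// unnamed -> named and back again, each direction opted into explicitly
val point: NT[("x", "y"), (Int, Int)] = p.renamed
val back: Unnamed2[Int, Int] = point.renamed
```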

MateuszKowalewski · December 5, 2023, 7:31am

Nevertheless, nobody ever proposed doing away with overloading…

The reason is simple: People expect that overloading works. Actually, people even expect that “something like overloading” works in case of extension methods! Because overloading “feels natural”, no matter the technical challenges.

People also expect that they can use positional arguments in functions, even though all functions have named parameter lists! Even that’s “unsafe”, because you can confuse different parameters when they have the same types.

What’s the “easy alternative”?

Nobody proposed anything like that in that thread here until now.

The most realistic counter-proposal so far was actually to deliver a crippled feature—mostly like what they did in C#—and “just” make Tuples and NamedTuples completely unrelated types. This feels very unsatisfying!

You’re able to deduce that from a sample of around 10 people? Respect for that statistical insight. (And we don’t even have an unambiguous “majority” here, btw. But decisions about technical innovations can’t be made by majority vote anyway. Otherwise we would still be programming in assembly to this very day…)

People who ever used positional arguments in functions would disagree…

Besides that: this feature is not out yet, so such claims make no sense anyway. You can’t have a mental model of something you have never used.

But there is actually a very strong analogy to positional arguments in functions! So my guess would be that most people wouldn’t be surprised that you can pass an unnamed tuple where a named one is expected.

But given their previous experience with calling functions that have (named) parameter lists using positional arguments, I guess people would be very surprised if that didn’t work. (At least not without additional ceremony: ceremony that can’t be explained by the language, but needs outright magic that only the compiler could provide in some ad hoc way.)

Come on! Not even I would reach for such a stretch! (And I really love to exaggerate in arguments).

Emm, no.

That’s a contradiction.

If unnamed tuples are named tuples, there can’t be any distinction on the type level.

I hope that’s the case anyway. Because at runtime that should be just “flat structs” no matter what.

Again that argument?

Consequently you would need to ask users to explicitly “convert” positional parameter lists to named parameters on every function call.

That’s just plain ridiculous.

Well, that’s just restating my observation that the supertype of named tuples is ParameterLists… Because all the things mentioned here are already possible with parameter lists.

But I clearly see that reworking the language in that direction is out of scope for this proposal here.

That’s not an issue as this could be done in some follow up (maybe Scala 4, or so).

I would disagree. There is plenty of subtle complexity in here!

Parameter list handling in Scala is already super complex. You would need the exact same complexity to handle “mixed” tuples.

I’m not saying that’s undesirable, but it’s nothing you would do in the first iteration of the feature. That’s just too much to ask for in one iteration.

som-snytt · December 5, 2023, 7:42am

said nobody ever. Or do I mean never.

Nobody expects that overloading works, and it doesn’t feel natural.

Everybody knows that overloading is evil and is unnatural demon spawn.

My poignant observation is that unnamed <: named is the new PartialFunction <: Function.

lrytz · December 5, 2023, 7:48am

Yeah, seeing r.named working I think I prefer keeping the types separate. That would actually minimize compiler / language magic, either direction of subtyping has to be built in.

EDIT: though the ergonomics still matter - I think everyone agrees that an unnamed tuple literal can be used where a named tuple type is expected.

alvae · December 5, 2023, 8:03am

If names matter so little, then I don’t think we need named tuples in the first place.

I have already stated my position and won’t repeat my arguments but there are a couple of statements that rubbed me the wrong way:

I once gave a talk in which I said we should get rid of overloading in PLs because that’s a confusing feature. I’m sure I’m not the only one.

Nobody proposed anything that you found satisfying. Making unnamed tuples and named tuples unrelated is an alternative, and several people have put forward arguments to justify this approach. It is a little disappointing to see these arguments pushed aside because “[it] feels very unsatisfying”.

I have enough respect for the people on this forum to trust that they have good intuition about language design. If a feature seems so divisive to 10 of these people, then I’d also conclude that something is fishy.

Named tuples have been used in other languages for years. That is enough time to build a mental model of the feature.

While breaking precedent with tradition should always be on the table, it should also require substantial evidence of being beneficial. I don’t think we gathered enough understanding about the consequences of unnamed <: named to make definitive statements here.

The fact that you’re not moved by that argument doesn’t make it invalid. It has been put forward by several people who seemed genuinely worried about a loss of type safety. This question sounds like you’re asking these people to just forget about it and move on.

This statement might be infuriating to Swift users. As I mentioned in an earlier post, Swift arguments are always positional and those that have labels must always be “named”. Millions of Swift users are used to this approach and at least half of them find it useful.

MateuszKowalewski · December 5, 2023, 8:45am

You’re definitely not the only one.

There are people who reject overloading for its theoretical challenges. For example, the Flix language doesn’t include overloading for this exact reason. But Flix solves this differently, by allowing functions to take parameters of structural types, and these structural types (records) are first-class constructs. In Flix, named parameters are modeled by passing in row-polymorphic records.

https://doc.flix.dev/records.html#named-parameters-with-records

But there are also more than enough people who think that Java like overloading isn’t even enough. People built whole languages on so called multi-methods, which are kind of “maximally overloaded” functions. That’s the other end of the spectrum…

That’s a much better argument. The former tried “statistics”…

First of all a lot of people didn’t use these languages. Not everybody used all kinds of languages.

But I used some C# actually. Nothing exciting, just a few kLOC. But after this experience I’m quite sure I don’t find C# very exciting. It has a lot of checkboxes ticked in the feature list, but almost everything “feels” like an 80% solution. Including named tuples. (The lack of Nothing, or of any advanced type system features really, may be a reason for that.)

Where do you want to take that evidence from?

You would need to push that feature out to the people so you could gather some feedback from field usage.

I think we saw some very compelling arguments.

Calling functions is one of the most fundamental things in programming languages, especially in functional ones. Having features in your toolbox that allow for a kind of “dynamic” programming in a statically typed language is one of the selling points for Scala.

Scala already has plenty of tools to ensure the highest level of type safety. But it lacks a little on the other side of the spectrum, where you’re interested in more dynamic data. Scala is actively losing to Python for exactly this reason! It’s a pain to do explorative data processing in the stiff corset of a strongly typed language (when you don’t have any escape hatch).

Scala is trying to cover this use-case with features like structural types, and now named tuples.

Things in that space also need to work in a loose, “quick and dirty” way. You don’t want to do type acrobatics to transform some data that doesn’t have a proper schema!

That’s why I think it’s important to make the feature very convenient to use, even if this means losing out in terms of absolute “safety”.

odersky · December 5, 2023, 8:58am

I completely agree we need to research links to data frames and relational algebra. I made a start in my PR where I implemented one kind of join in file named-tuple-strawman-2.scala. It would be great if others could help and add to this.

What do you mean by “built in”? It’s a simple lower bound declaration in a library.

I hear the arguments against subtyping, but I don’t find them convincing. I believe library operations will be a lot more pleasant to use if there is subtyping. And the safety concerns look overblown to me. There’s also the point that all other languages that have named tuples do allow automatic injection. I am not arguing that we should in any case copy what others do, but if we deviate we have a burden of proof that explains why. This burden of proof has not been met, IMO.

If I write

val bob: Person = ("Bob", 33)
bob.zip(("Miller", "years"))

I find it un-Scala like to demand that one writes either

val bob: Person = (name = "Bob", age = 33)
bob.zip(name = "Miller", age = "years")

or

val bob: Person = ("Bob", 33).named
bob.zip(("Miller", "years").named)

It’s more verbose and ugly and gains us nothing (and, no, making an exception for literals won’t cut it, it would just introduce another irregularity). It feels to me we have lost the spirit of Scala. Why do this here, but not for named arguments? It’s just an arbitrary restriction to make people’s life harder in the interest of some hypothetical safety. By the very same argument you could say it’s easy to accidentally swap arguments to functions, so let’s make named arguments mandatory! You can have that opinion, but that language would not be Scala.
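For comparison, a sketch of how the explicit `.named` variant plays out if named and unnamed tuples stay unrelated. `NT`, `named` and `zipNamed` are hypothetical names for illustration (a name other than `zip` avoids clashing with the standard `zip` method on tuples). The zip requires both operands to carry the same names and keeps them in the result; the value types are computed by the standard `Tuple.Zip` match type.

```scala
object NamedZip:
  // Hypothetical named tuple: names Ns at the type level, values Vs at runtime.
  opaque type NT[Ns <: Tuple, Vs <: Tuple] = Vs

  extension [Vs <: Tuple](vs: Vs)
    // Explicit, zero-cost opt-in from a plain tuple.
    def named[Ns <: Tuple]: NT[Ns, Vs] = vs

  extension [Ns <: Tuple, Vs <: Tuple](nt: NT[Ns, Vs])
    def values: Vs = nt
    // Field-wise zip; both sides must carry the same names Ns.
    def zipNamed[Ws <: Tuple](other: NT[Ns, Ws]): NT[Ns, Tuple.Zip[Vs, Ws]] =
      Tuple.fromArray(nt.productIterator.zip(other.productIterator).toArray)
        .asInstanceOf[Tuple.Zip[Vs, Ws]]

import NamedZip.*

type Person = NT[("name", "age"), (String, Int)]

val bob: Person = ("Bob", 33).named
val labels: NT[("name", "age"), (String, String)] = ("Miller", "years").named
val zipped = bob.zipNamed(labels)
```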

You can dismiss all that and weigh your priorities differently. But in the face of that and the evidence that all the other languages do the same, I think we have to agree to disagree on this point.

MateuszKowalewski · December 5, 2023, 9:12am

Well, it gains some explicitness.

That is clearly a value in some contexts.

But I think it’s counter-productive in the context where (named) tuples are most desirable, namely in some code dealing with ad hoc data (like often found in Spark processing, or DB query code).

I’m usually arguing for “theoretical purity”. But not in this case here, as I think this kind of “purity” ruins the most compelling use-case.

Also I hope now everybody sees clearly the parallel to positional function arguments. That’s just something that can’t be dismissed, imho!

Dear SIP Committee members, please try to look at the thing from the use case perspective and not insist on theoretical purity where it makes no sense. That’s just not Scala.

AMatveev · December 5, 2023, 9:21am

Will it be possible?

Use reflection (type tags) to create a factory for any named tuple.
Use match types to convert the types in a named tuple.
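A sketch of the second point, assuming the representation discussed in this thread (values stored as a plain tuple, names tracked only in the type). `NT`, `make` and `optional` are hypothetical; the standard `Tuple.Map` match type computes the new value types, and the names pass through unchanged.

```scala
object NamedGeneric:
  // Hypothetical named tuple: names Ns at the type level, values Vs at runtime.
  opaque type NT[Ns <: Tuple, Vs <: Tuple] = Vs

  def make[Ns <: Tuple, Vs <: Tuple](vs: Vs): NT[Ns, Vs] = vs

  extension [Ns <: Tuple, Vs <: Tuple](nt: NT[Ns, Vs])
    def values: Vs = nt
    // Match-type transformation of the value types, names unchanged:
    // (String, Int) becomes (Option[String], Option[Int]).
    def optional: NT[Ns, Tuple.Map[Vs, Option]] =
      Tuple.fromArray(nt.productIterator.map(Some(_)).toArray)
        .asInstanceOf[Tuple.Map[Vs, Option]]

import NamedGeneric.*

val bob = make[("name", "age"), (String, Int)](("Bob", 33))
val opts: (Option[String], Option[Int]) = bob.optional.values
```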

odersky · December 5, 2023, 9:49am

Yes, the representation of named tuples as NamedTuple instances is exposed. So one can use the usual generic programming for tuples also for named tuples. named-tuple-strawman-2.scala in my PR shows some examples (it’s still in a rough state).

EDIT that was intended as a reply to:

alvae · December 5, 2023, 12:01pm

This program won’t compile in Swift:

func f<S: Sequence<(a: Int, b: Bool)>>(_ s: S) {}
f([(1, true), (2, true)]) // Error!

We’ll get this error:

global function ‘f’ requires the types ‘(Int, Bool)’ and ‘(a: Int, b: Bool)’ be equivalent

The conversion from (Int, Bool) to (a: Int, b: Bool) will only work if it happens with a literal (my suggestion) or at some specific AST positions, like assignments and return values. These are the confusing rules that I think we should not emulate.

I have enough lines of Swift under my belt to confidently say that:

The above error is not making my life harder.
I would not be sad if we only had literal conversions and not the other ones.

But anyway, I’m happy to rest my case now.

My closing argument is that I think insisting on a subtyping relationship is making us miss an opportunity to explore a way to improve the whole tuple conversion problem. It is orders of magnitude harder to add restrictions than it is to lift them later. If we started without subtyping, we’d have a chance to better identify the conversions that really cause pain, if any, and we’d be better equipped to know how to solve them. If we rush into one, it will be difficult to go back if we realize that there was a hidden trap. Besides conversions, I am also quite convinced that the absence of subtyping would make tuple operations simpler to define; and perhaps more importantly than anything else, it would better define the role of named tuples by creating less overlap with case classes. You can see my original post for my rationale.

IMO, the fact that we can nicely fit a subtyping relationship in the type system should weigh less than the cost of making that relationship counterintuitive. In fact, I’ll even say that it is an appeal to purity that runs against @MateuszKowalewski’s pleading. Further, the correspondence to named parameters seems incidental. Named parameters do not work like tuples in Scala because they are not positional. I’m also not convinced that modeling functions as arrows from tuples to tuples is an idea that scales in practice. It is appealing because it works on the surface but then you get into passing conventions, default arguments, etc.

Ichoran · December 5, 2023, 7:48pm

Actually, one of the biggest headaches I have is getting same-typed positional arguments in functions straight. I used one of the examples before: ranges specified by start and end vs ones specified by start and length. It’s even worse with coordinates. Is it (x0, y0, width, height)? Is it (x0, xN, y0, yN)? Maybe (cx, rx, cy, ry)? I return things in tuples and forget the order and muck stuff up that way all the time, alas. If I could get this straight reliably enough, I might like Python more than I do. So if it were possible to make some named arguments mandatory, I would leap on that in an instant.

I complain not for some abstract reason of purity, but rather because it’s an actual pain point. One of the worst remaining, actually; almost everything else in Scala 3 is a joy to work with. (I don’t see how this could be easily fixed, alas; much of the problem comes from Java libraries.)

However, in coming up with my working example of named opaque tuples without subtyping, I realized that I’m almost completely satisfied with a pure library-level solution if I want it the non-subtyped way, and I think most everyone else could be too. So the compiler can do whatever, and if it turns out not to be the preferred solution, people don’t have to use it and can have nearly seamless functionality with what they prefer.

Sporarum · December 5, 2023, 8:32pm

Except for the fact that once it’s in the compiler, it has to stay there forever !

It would be a shame to discover later that we decided on the wrong sub-typing relation, and be unable to change it !
(Deprecating a feature is kinda possible, replacing a feature by its opposite is impossible, even when breaking compatibility like in Scala 2 to Scala 3, as that would be extremely confusing to users)

I think the overwhelming take-away from this thread is that there is no consensus among us on what is best, even though we all want named tuples !
Therefore I think the wisest choice is to do as some have proposed:
No sub-typing between named and unnamed, in either direction, until we have enough examples under our belt to see what is more useful in practice

P.S: I really don’t buy the parallel with function parameters, for example the return type of a function does not depend on the correct ordering of parameters, whereas named tuples differ:

def foo(x: Int, y: Int) = x + y
foo(x = 2, y = 4) // : Int
foo(y = 4, x = 2) // : Int

(x = 2, y = 4) // : (x: Int, y: Int)
(y = 4, x = 2) // : (y: Int, x: Int)

jeremyrsmith · December 5, 2023, 10:20pm

You know, people say this about static types all the time

lihaoyi · December 6, 2023, 4:20am

I mean, the most popular language in the world is Python, and Python has named tuples and does not allow automatic injection, as I have shown above; in fact its subtype relationship is the opposite of this proposal’s. @alvae demonstrates the same for Swift. This argument seems objectively false.

Swift does this: named arguments are named, positional arguments are positional, according to how they are defined, and enforces that the order of the named arguments must match the definition. Python allows keyword-only arguments as well, albeit opt-in and without ordering enforcement. Swift maybe you could argue is just a legacy of Objective-C, but Python chose to add these explicitly in PEP 3102, which is an excellent read for why these are a reasonable idea.

IMO opt-in keyword-only arguments would be a strict improvement over Scala allowing every callsite to make a different choice, similar to how:

Scala 3’s strict empty-parens-in-method-call handling is better than Scala 2’s “use as many or as few empty parens as you like” at the callsite,
Scala 3 chose to only allow alphanumeric methods whose definition sites are marked infix to be used infix, vs. Scala 2 letting each callsite make a choice.
Scala’s definition-site variance is better than Java’s use-site variance.
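As a concrete illustration of the infix point, Scala 3 lets the definition site opt in with the `infix` modifier; `Vec` and `dot` are made-up names for this example. Infix use of alphanumeric methods without the modifier is flagged by the compiler under newer source settings.

```scala
case class Vec(x: Int, y: Int):
  // The definition site decides that `dot` may be used in infix position.
  infix def dot(that: Vec): Int = x * that.x + y * that.y

val d = Vec(1, 2) dot Vec(3, 4)
```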

In all cases, they remove an unnecessary degree of freedom and room for error, while preserving the flexibility at the definition site to dictate how the callsites will function.

Maybe not everyone agrees on the details, but keyword-only arguments are not some unspeakable abomination that you seem to suggest it is. It’s a pretty reasonable choice, that many languages have made, that would actually fit perfectly into the Scala 3 goal of trying to remove unnecessary flexibility from the Scala language while preserving its core expressiveness.

I don’t think this is true? Since when was Scala the “convert everything to everything” language? We literally just discussed how we can limit or remove implicit conversions, so as to explicitly discourage “converting everything to everything” as a way of using Scala. We literally brought up the removal of JavaConversions’ convenient “convert everything to everything” import in this thread!

Every language has cases where it doesn’t do something implicitly because there is no obvious/agreed-upon semantics, I don’t see why users would be upset at all. It would simply be another day that ends with a “y”. I would expect professional software engineers to reject their colleagues’ code at code review if written in such a way that they could not agree on a semantic intuition, even if the code was technically valid.

@soronpo, @Ichoran, @alvae, and others have all made good arguments grounded in facts, often with code snippets demonstrating the ground truth. I have tried to provide these as well. The arguments in favor of unnamed <: named have not been convincing.

@alvae and @Sporarum are right to say that it is harder to remove things than to add them. This is objectively true. We all want named tuples, and we all disagree on which possible subtyping is a terrible idea. The obvious path forward is to go with named tuples without subtyping, have concise explicit conversions (as we do with .asJava, .asScala, or the proposed .convert), and leave the door open to adding subtyping or adding implicit conversions later.
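For reference, this is the explicit, opt-in style of `.asJava`/`.asScala` mentioned above, using the standard `scala.jdk.CollectionConverters`, the explicit successor to the removed implicit `JavaConversions`.

```scala
import scala.jdk.CollectionConverters.*

val scalaList: List[Int] = List(1, 2, 3)

// Explicit conversion to a java.util.List wrapper (elements are not copied).
val javaList: java.util.List[Int] = scalaList.asJava

// ...and back: asScala on a java.util.List gives a mutable.Buffer view.
val roundTrip: List[Int] = javaList.asScala.toList
```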