3 Questions about unsigned numeric primitives and Valhalla

In 2015 @sjrd and @densh drafted a SIP for the addition of 4 “primitive” types to the standard library for representing unsigned integers: UByte, UShort, UInt and ULong.
However, based on the following quoted requirement, the SIP was rejected because the performance of unsigned AnyVals was inferior to native built-in signed JVM primitives.

Since they will inevitably be “branded” as primitive data types, unsigned integer types should be as efficient as signed integer types.

Although it was rejected, I have some questions about a few quotes from the SIP, this time in the context of the Valhalla language model. Answer if you want.

Topics:

Scala has custom-made AnyVals and other Object-Oriented abstractions on top of primitive types that can allow for additional, zero-overhead “primitive” types

  1. Is the Valhalla project trying to achieve something like Scala’s AnyVal (inlined classes with zero overhead)? If so, does Scala benefit at all?

Java cannot decently add new primitive types for compatibility reasons

  1. Valhalla also suffers from boxing/unboxing of primitives; does that mean the quote above still holds true?
  2. Does JEP 402: Classes for the Basic Primitives (Preview) improve it at all?

arrays of unsigned integer types will suffer from the same performance penalty as arrays of user-defined AnyVals, because the elements will be boxed.

  1. Is there any hope for this ever to be addressed at the JVM level?

Pertinent references:
JEP draft: Value Objects (Preview)
JEP 402: Classes for the Basic Primitives (Preview)
JEP 401: Primitive Classes (Preview)


So apologies if I’ve misunderstood the SIPs, but for myself the most helpful thing would be an unsigned Int that was a subset of Int. Again just speaking for myself, the next most useful thing would be a non-negative floating point number that was a subset of Double. It really is a bit odd that here we are in 2022 and still have no natural number in the type system: a natural number type that can just be widened to an Int when required.

What would also be helpful is if Avoid bridge clashes… could be fixed. This, at least for me, causes problems in Scala.js on Scala 3, but not on the JVM or Native. I get round it by stripping the AnyVal from the source file for JS. This is unfortunate because it’s actually in Scala.js where AnyVals can be most needed, given the reduced resources of the browser and the less smart translation of high-level code into its final compiled form.

  1. Scala’s AnyVal desugars to static calls with the value as just another argument in most cases. AnyVal is limited to wrapping a single value. Meanwhile, from what I understand, Valhalla aims to alter the JVM to truly support something more lightweight than an object. These new types are not limited to a single value. The new Valhalla types are not just a trick with static methods.

  2. As I see it, if we truly do get what we want from Valhalla, then you can make unsigned data types as efficient as the primitives we currently have. They will box in all the cases where primitives will box.

  3. From what I understand, you can view JEP 402 as declaring int and Integer to be basically the same. In some ways it’s quite similar to how Int is currently defined in the Scala library: it’s just another AnyVal class given a bit of special treatment by the compiler. Java will do something similar in JEP 402.

  4. Valhalla is an attempt at addressing this at the JVM level
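
To make the points about AnyVal above a bit more concrete, here is a minimal sketch (not taken from the SIP). The names Meters and MetersDemo are made up for illustration. It shows a value class whose method calls avoid allocation in the simple case, and an array of that class, which holds boxed elements as the SIP quote describes.

```scala
// Hypothetical example types; compiles with Scala 2.13 and Scala 3.
final class Meters(val value: Double) extends AnyVal {
  // The compiler turns this into a static-style helper that receives
  // the underlying Double as an ordinary argument.
  def +(that: Meters): Meters = new Meters(value + that.value)
}

object MetersDemo {
  // Takes and returns plain doubles after erasure: no wrapper is allocated here.
  def total(a: Meters, b: Meters): Meters = a + b

  // An Array[Meters] stores wrapper objects, not raw doubles.
  def boxedArray(): Array[Meters] = Array(new Meters(1.0), new Meters(2.0))

  def main(args: Array[String]): Unit = {
    println(total(new Meters(1.5), new Meters(2.5)).value) // 4.0
    // The runtime element class is the wrapper class, not double.
    println(boxedArray().getClass.getComponentType)
  }
}
```

The array case is exactly where a JVM-level flattened layout would make a difference, since the static-method trick cannot reach it.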


To check out Java’s Valhalla project in general, go here:
https://openjdk.java.net/projects/valhalla/

And to specifically check out how they are super-charging Java’s primitives (which is what is required to get to performant unsigned integer types), go here:
https://openjdk.java.net/jeps/169


Thanks @RichType, @Katrix and @chaotic3quilibrium, I will try to respond to you. I found this fragment from JEP 401: Primitive Classes (Preview) useful and highlighted some pertinent parts.

Primitive classes give developers the capability to define new primitive types that aren’t subject to these limitations. Programs can make use of class features without giving up any of the performance benefits of primitives.

Applications of developer-declared primitives include:

  • Numbers of varieties not supported by the basic primitives, such as unsigned bytes, 128-bit integers, and half-precision floats;
  • Points, complex numbers, colors, vectors, and other multi-dimensional numerics;
  • Numbers with units—sizes, rates of change, currency, etc.;
  • Bitmasks and other compressed encodings of data;
  • Map entries and other data structure internals;
  • Data-carrying tuples and multiple returns;
  • Aggregations of other primitive types, potentially multiple layers deep

It’s odd to get an integer, {..., -3, -2, -1, 0, 1, 2, 3, ...}, from String.length() and array.length when the correct API representation is the whole numbers, {0, 1, 2, 3, ...}.
How cumbersome is it that Scala’s numeric type system, the most fundamental type system, is stuck with and limited by the initial primitive specification of the runtime?

I really hope so. Another question may be: if Valhalla really delivers developer-declared primitives with the same performance as the built-in ones, should Scala get its own unsigned integers or wait for Java to add them to the standard library?


I echo @RichType’s observation that it’s 2022, we don’t have natural numbers out of the box, and that’s frustrating. Personally, I’d trade a lot of the Scala 3 feature set for a built-in natural number refinement. It would close so many opportunities for bugs and make types more expressive.

Are we limited by the JVM here? Couldn’t we distinguish between richer refined compile-time types and the underlying runtime type? For example, a type for an “Int between 0 and 10” that erases to a regular Int at runtime. Of course, such richer refined types could not be available for libraries distributed solely as JVM bytecode, since the extra info has been discarded by then. But I guess they would be available for TASTy libs. Personally, I’d grab that tradeoff with both hands.

I think the matter of refined subtypes of the regular primitive types is largely orthogonal to Valhalla.

-Ben


It’s curious that Kotlin had a similar proposal, and it was approved: unsigned-types · Kotlin/KEEP. It started as experimental in Kotlin 1.3; now it’s in beta and integrated with the compiler, which won’t let a program compile if you try to assign a negative value like -2 to a kotlin.UInt.

Just to clarify, by no means am I suggesting Scala must follow Kotlin; they were born for different purposes. Still, I believe they will reach some level of feature convergence over time.

I guess that also answers my question:

If Java adds its own unsigned primitives to the standard library, Kotlin could map its own to the Java ones.


This is actually the goal of some existing libraries like Iron or Refined.

This means this is entirely possible in Scala and it is even easier in Scala 3 with inlines and scala.compiletime. We almost don’t need a single macro to achieve this in 3.x and thanks to opaque types, this is reachable without boxing/unboxing overhead.
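
As a rough illustration of that approach (a hand-rolled sketch, not the actual API of Iron or Refined), here is a natural-number type built from an opaque type plus scala.compiletime. The names Naturals, Nat and Nat.from are invented for this example, and it needs Scala 3.

```scala
import scala.compiletime.error

object Naturals:
  // Hypothetical type: a Nat is an Int at runtime and widens to Int for free.
  opaque type Nat <: Int = Int

  object Nat:
    // For literals: a negative constant is rejected at compile time.
    inline def apply(inline n: Int): Nat =
      inline if n >= 0 then n
      else error("Nat requires a non-negative literal")

    // For values only known at runtime: validate and return an Option.
    def from(n: Int): Option[Nat] = if n >= 0 then Some(n) else None

@main def natDemo(): Unit =
  import Naturals.*
  val three: Nat = Nat(3)
  val asInt: Int = three     // widening to Int needs no conversion
  println(asInt + 1)         // 4
  println(Nat.from(-2))      // None
  // Nat(-2) does not compile: "Nat requires a non-negative literal"
```

The upper bound `<: Int` is a design choice that gives the “widen to Int when required” behaviour asked for earlier in the thread; dropping it would keep Nat fully separate from Int.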


Typelevel Spire Unsigned Types somehow do the job for now.


It seems the Valhalla project has made progress and is reaching the stabilization stage.

Valhalla is available on JDK early-access builds at
https://jdk.java.net/valhalla/

2024 conference (Valhalla architects said it’s mostly ready)

I don’t think it would be required to revisit SIP-26 - Unsigned Integers, because most unsigned numeric classes would probably be added to the Java standard library, so Scala can just use them.

As for Kotlin, it did implement unsigned integer types as inline classes, which would be the second feature that conflicts with one newly added by the JVM (the other being green threads vs coroutines), even though the conflict only arises on the JVM and not on JS/Native.


Do you have a source for that claim? I have the impression that the Java architects don’t ever want to introduce such things.

That’s a good point. They mentioned converting existing primitives to value classes but not adding new ones. However, the whole point of Valhalla is being able to do that, so it’s fair to say it may happen, if not officially then through community leads like Apache or JetBrains. In any case there is value in reusing whatever becomes standard in the Java world for interop, also because Scala already has a few of them, like Typelevel Spire.

At minute 8:50 of Goetz’s talk:

it certainly doesn’t scale to add a 100 new primitive types and are these eight primitive types exactly the ones we want for the entire future of computing? hell no, right where’s my float 16, where’s my complex number

Scala’s opaque types also make it comparatively easy to implement unsigned arithmetic types (as long as nothing is bigger than 64 bits). I’ve done that in my own library, and it’s…fine, kinda, except the interop is a little bit clunky.
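
For readers who haven’t tried it, a minimal sketch of the opaque-type approach might look like the following. This is not the library mentioned above: the names Unsigned, UInt and fromBits are assumptions for illustration, and only a handful of operations are shown. It needs Scala 3 and leans on the JDK’s static helpers in java.lang.Integer.

```scala
object Unsigned:
  // Hypothetical type: a UInt is a plain Int at runtime, reinterpreted as unsigned.
  opaque type UInt = Int

  object UInt:
    // Reinterprets the 32 bits of an Int as an unsigned value.
    def fromBits(bits: Int): UInt = bits

  extension (a: UInt)
    def bits: Int = a
    // Two's-complement addition gives the same bits for signed and unsigned.
    def +(b: UInt): UInt = a + b
    // Division and comparison do differ, so delegate to the JDK helpers.
    def /(b: UInt): UInt = Integer.divideUnsigned(a, b)
    def <(b: UInt): Boolean = Integer.compareUnsigned(a, b) < 0
    def toLong: Long = Integer.toUnsignedLong(a)
    def show: String = Integer.toUnsignedString(a)

@main def unsignedDemo(): Unit =
  import Unsigned.*
  val max = UInt.fromBits(-1)               // all 32 bits set
  println(max.show)                         // 4294967295
  println((max / UInt.fromBits(2)).show)    // 2147483647
  println(UInt.fromBits(2) < max)           // true
```

The clunky part shows up at the boundaries: every Int coming from another API has to pass through something like fromBits, and printing needs show because toString still sees the signed Int.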

But it seems you can’t generate a tableswitch with opaque types.

He also said (I’m not sure in which talk) that they’re thinking about adding typeclasses and a predefined set of operators to Java, but if they implement it, it’s probably far away in the future. Maybe it would work somewhat like in Rust?

Wouldn’t extending AnyVal also work?

case class UInt32(val raw: Int) extends AnyVal {
  def /(that: UInt32): UInt32 =
    UInt32(Integer.divideUnsigned(raw, that.raw))
}

println(UInt32(-345) / UInt32(42534))

Works in Scala 2 and Scala 3 mode too.

extends AnyVal doesn’t have a “scream at me if you’re boxing this” mode. Opaque types are more reliable when you need performance. But yes, you can extend AnyVal also.

Tangential - there’s an open ticket for the screaming: Warn when boxing value classes · Issue #12271 · scala/bug · GitHub


I’ve never understood the case for “custom primitive types” in a programming language. Something like that can’t really exist.

In the end everything is just some (currently) 64 bit (or some “vector” bundling a few of these into one entity). How you interpret them is arbitrary.

The only question that matters is for which interpretations you have efficient HW support. This is a question of hardware (and runtime), not programming language.

Only if you can directly use the hardware features for some interpretation it makes sense to declare something “a primitive value”, imho.

If you can’t use hardware features it makes no sense to worry about having or not having some certain type of “primitives”. They will be anyway just mapped by software at runtime onto whatever the hardware supports natively.

So if you add for example custom f16 or u64 types this will make exactly no difference to using f32 or i64 when it comes to performance as long as the runtime doesn’t map them to real hardware features. (On the JVM you can’t control that.)

What’s indeed much more important and useful are static refinements of some types. For that you don’t need any support from the runtime, and it makes no difference what they are “in reality” at runtime. But that’s about correctness, not performance.

Valhalla is not about static refinements.

Valhalla is more about memory layout of custom (value) types on the JVM. Currently all you have for any user-defined type is Object, which isn’t compact, needs a header (which can be even larger than the actual value), and is held together by pointers instead of being a contiguous “memory block”.

OTOH Valhalla won’t give you access to hardware interpretations of some memory, AFAIK. Defining a f16 won’t make it more efficient than a f32 (even if you had hardware that supports that). Defining a u64 won’t let you use CPU instructions for unsigned arithmetic for your new type. It will be still “faked”—as you can do already now.

There are ways to access some HW features on the JVM, at least through intrinsics, but that’s not Valhalla, that’s mostly the new vector API. To define new efficient primitives one would likely need to combine both. (IDK whether this is even possible or planned.)

So as long as the JVM won’t give you access to the relevant HW features even Valhalla won’t give you more performant unsigned numeric values than you can have already now. (64 bit register vs. 64 bit register, interpretation as “unsigned” in software)
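
To illustrate what “interpretation as unsigned in software” can look like today, here is a small sketch using only the static helpers that java.lang.Integer has had since Java 8. The object name SameBits is made up. What the JIT compiles these calls down to is outside what this sketch shows.

```scala
object SameBits {
  val a: Int = -1 // bit pattern 0xFFFFFFFF, i.e. 4294967295 when read as unsigned
  val b: Int = 2

  // Addition is identical for both interpretations of the bits.
  val sum: Int = a + b
  // Division depends on the interpretation.
  val signedQuotient: Int = a / b
  val unsignedQuotient: Int = Integer.divideUnsigned(a, b)
  // So does ordering.
  val unsignedGreater: Boolean = Integer.compareUnsigned(a, b) > 0
  // Widening to Long without sign extension.
  val widened: Long = Integer.toUnsignedLong(a)

  def main(args: Array[String]): Unit = {
    println(sum)              // 1
    println(signedQuotient)   // 0
    println(unsignedQuotient) // 2147483647
    println(unsignedGreater)  // true
    println(widened)          // 4294967295
  }
}
```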

That’s at least my understanding. Please correct me if I’m wrong anywhere here! I think I don’t know enough about all that low-level stuff. So really happy to learn more! :sweat_smile:


I don’t really care about performance, but about having the right data type. For example, this is perfectly valid Scala code, and the same holds for Java array sizes:

List(1, 2, 3, 4, 5).size == -1

It shouldn’t be possible to compare a collection size with a negative number. Rust is a good example: Vec::len returns usize.

The same goes for screen size, or anything related to physical counts.

The reason SIP-26 - Unsigned Integers was rejected was a 5% performance hit. Kotlin didn’t care and I wouldn’t either, but Valhalla can remove the 5% performance hit that justified rejecting the SIP.