3 Questions about unsigned numeric primitives and Valhalla

Ichoran · January 15, 2025, 7:47pm

Can you explain how this could work? The problem with new data types is that equals is primitive-friendly, which actually is a fair bit of work to pull off, and the more datatypes you have to check, the worse it is.

Because you explicitly don’t have those guarantees with opaque types, you at least are warned that it’s going to break.

Furthermore, with opaque types, you don’t need to have any performance penalty at all. Everything that is a performance penalty in practice is under the control of the compiler team and can be elided; everything can be made (with inline) to boil down to simply the minimum bare operators needed to do the math, with whatever hardware support the JVM or other platforms offer (and you can’t do better than that).
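As a rough illustration of "the minimum bare operators", here is a sketch in Java of unsigned 64-bit operations carried directly on a plain `long`, with no wrapper object. The class `UnsignedLongOps` and its method names are made up for this example; only the `java.lang.Long` helpers are real JDK methods.

```java
// Hypothetical helper: treats a plain `long` as an unsigned 64-bit value.
final class UnsignedLongOps {
    private UnsignedLongOps() {}

    // Addition and multiplication yield the same bits for signed and unsigned operands.
    static long add(long a, long b) { return a + b; }
    static long mul(long a, long b) { return a * b; }

    // Division, remainder and ordering differ, so they use the JDK's unsigned helpers.
    static long div(long a, long b) { return Long.divideUnsigned(a, b); }
    static long rem(long a, long b) { return Long.remainderUnsigned(a, b); }
    static boolean lessThan(long a, long b) { return Long.compareUnsigned(a, b) < 0; }

    static String show(long a) { return Long.toUnsignedString(a); }

    public static void main(String[] args) {
        long max = -1L; // all 64 bits set: 2^64 - 1 when read as unsigned
        System.out.println(show(max));          // 18446744073709551615
        System.out.println(show(div(max, 2L))); // 9223372036854775807
        System.out.println(lessThan(1L, max));  // true
    }
}
```

Nothing here allocates; an opaque type over `Long` would presumably wrap the same operations behind a distinct static type.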

It’s only for types that are bigger than 64 bits that Valhalla makes any difference. Those, right now, are forced to box because you can’t return them unless they’re in a box. (You can pass them by decomposing them into multiple parameters, but you can’t return them.)
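To make the "you can pass them but not return them" point concrete, here is a minimal sketch of a 128-bit unsigned addition on the current JVM. `UInt128Sketch` and the `UInt128` record are hypothetical names for illustration, not an existing library type.

```java
final class UInt128Sketch {
    // Hypothetical 128-bit value: two 64-bit halves inside a heap-allocated box.
    record UInt128(long hi, long lo) {}

    // The inputs can be passed unboxed as four longs, but the two-word
    // result has to come back inside an object.
    static UInt128 add(long aHi, long aLo, long bHi, long bLo) {
        long lo = aLo + bLo;
        // The low word overflowed iff the wrapped sum is (unsigned) below an operand.
        long carry = Long.compareUnsigned(lo, aLo) < 0 ? 1L : 0L;
        return new UInt128(aHi + bHi + carry, lo);
    }

    public static void main(String[] args) {
        // (2^64 - 1) + 1 = 2^64, i.e. hi = 1, lo = 0
        System.out.println(add(0L, -1L, 0L, 1L)); // UInt128[hi=1, lo=0]
    }
}
```

Whether the JIT removes that allocation via escape analysis depends on inlining; the method signature itself still requires an object.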

saulpalv · January 16, 2025, 12:43am

Not sure I'm following, but I'll try to answer two aspects using Goetz's slide.

Valhalla value classes have automatic equals based on the value stored in memory (not pointer but actual value)

Sounds like they are copied by value, similar to structs in other languages: no boxing takes place, only temporary data duplication, and the first copy is cleaned up by the GC when it goes out of scope.

Ichoran · January 16, 2025, 12:58am

But this doesn’t help the Scala “more primitives” situation, does it?

In BoxesRunTime.java in the Scala library, we have

    public static boolean equalsNumNum(java.lang.Number xn, java.lang.Number yn) {
        int xcode = typeCode(xn);
        int ycode = typeCode(yn);
        switch (ycode > xcode ? ycode : xcode) {
        case INT:
            return xn.intValue() == yn.intValue();
        case LONG:
            return xn.longValue() == yn.longValue();
        case FLOAT:
            return xn.floatValue() == yn.floatValue();
        case DOUBLE:
            return xn.doubleValue() == yn.doubleValue();
        default:
            if ((yn instanceof ScalaNumber) && !(xn instanceof ScalaNumber))
                return yn.equals(xn);
        }
        if (xn == null)
            return yn == null;

        return xn.equals(yn);
    }

which has to get longer if it’s going to handle more boxed numerics in an Any context. That can only slow things down.

Valhalla doesn’t help you compare UInt64 to Long. If you’re allowed to compare state to state, you’ll be wrong for negative numbers. If you’re not, you have wider branches/tableswitches to handle the comparison. You can’t get it for free.
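A small sketch of the negative-number problem, assuming a hypothetical UInt64 whose state is stored in a `long`. The class and method names are invented for the example.

```java
final class MixedSignEquality {
    // Naive "state to state" comparison: just compare the 64 bits.
    static boolean bitsEqual(long signed, long unsignedBits) {
        return signed == unsignedBits;
    }

    // Value-aware comparison: a negative signed value can never equal an
    // unsigned one, so the sign has to be checked before comparing bits.
    static boolean valuesEqual(long signed, long unsignedBits) {
        return signed >= 0 && signed == unsignedBits;
    }

    public static void main(String[] args) {
        long signed = -1L;       // the Long value -1
        long unsignedBits = -1L; // the UInt64 value 2^64 - 1, same bit pattern
        System.out.println(bitsEqual(signed, unsignedBits));   // true, which is wrong
        System.out.println(valuesEqual(signed, unsignedBits)); // false
        System.out.println(valuesEqual(42L, 42L));             // true
    }
}
```

The extra sign check is the kind of added branch the post is talking about.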

tarsa · January 16, 2025, 2:02am

I guess the JVM has special optimizations for comparing value classes based on profiling, i.e. if the types at a given place are the same in every invocation, then the equality operator gets optimized for those types. Google ‘acmp valhalla -assassin’ or something like that.

saulpalv · January 16, 2025, 6:05am

That would not be sound or allowed by the type system when comparing primitives: they should be compared with == on operands of the same type, not with Object.equals. For objects boxing primitives it’s a different story.
If you want to compare UInt64 to Long as primitives, you need to convert one to the type of the other first, like in Rust. For objects, that would require an equals matrix of one-to-one primitive comparisons, and that mechanism would be a task for the Scala core team.
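A sketch of "convert first, then compare with ==" in Java, using a hypothetical unsigned 32-bit value stored in an `int`. `ConvertThenCompare` is an invented name; `Integer.toUnsignedLong` is a real JDK method.

```java
final class ConvertThenCompare {
    // `u32Bits` is assumed to hold an unsigned 32-bit value in an int.
    // Zero-extending it to long makes both operands the same type.
    static boolean uint32EqualsLong(int u32Bits, long y) {
        return Integer.toUnsignedLong(u32Bits) == y;
    }

    public static void main(String[] args) {
        int allOnes = -1; // 4294967295 when read as unsigned
        System.out.println(uint32EqualsLong(allOnes, 4294967295L)); // true
        System.out.println(uint32EqualsLong(allOnes, -1L));         // false
    }
}
```

This works because `long` can hold every unsigned 32-bit value. For UInt64 versus Long neither type can hold every value of the other, so a plain conversion is not enough and a range check is needed as well.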

OndrejSpanel · January 16, 2025, 7:24am

The original quote is:

Value objects are compared for equality (==) by their state, not by identity

Your sentence gives me the (perhaps unintended) impression that Valhalla primitives are compared bitwise, but the quote does not say that. Scala case class comparison is also by state, and can be arbitrarily complex (and slow).
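To illustrate "by state" without it being a bit comparison, here is a sketch using a Java record as a stand-in for a Scala case class. It says nothing about how Valhalla itself implements ==, and the names `StateEqualityDemo` and `Basket` are invented.

```java
import java.util.ArrayList;
import java.util.List;

final class StateEqualityDemo {
    // A record's equals() compares components; for a List that means walking the elements.
    record Basket(List<Integer> items) {}

    public static void main(String[] args) {
        Basket a = new Basket(new ArrayList<>(List.of(1, 2, 3)));
        Basket b = new Basket(new ArrayList<>(List.of(1, 2, 3)));
        System.out.println(a == b);      // false: two distinct objects
        System.out.println(a.equals(b)); // true: same state, found by traversing both lists
    }
}
```

The cost of `equals` here grows with the size of the list, which is the "arbitrarily complex (and slow)" case.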

no boxing takes place, only temporary data duplication, then the first is cleaned by the GC when going out of scope

In many cases variables are local objects. That allows using registers or stack to store the variables, therefore they may never even enter the area of GC interest, just like you do not garbage collect current Int or Float.

saulpalv · January 16, 2025, 4:51pm

I would expect that, but please correct this reasoning if you find an issue.

At the beginning of the talk Goetz mentions identity is required for mutability, which requires a known memory location and pointers

Value classes have no identity: they are immutable, and they do not reference or point to something but store the actual value.

Having said that, comparison by reference (as with Java classes) sounds a little awkward for value classes, because there may be no references to the internal data: Valhalla uses a flattened data layout in memory. And if comparison is not by reference, I don’t see what else it can be but by value. You need something physical to compare; in other words, it can be a bitwise comparison of pointers or a bitwise comparison of values, but any kind of comparison is ultimately bitwise. Let me know if there is a better way to reason about this.

Yes, by state reference, but Valhalla seems to me like state value.

tarsa · January 16, 2025, 10:55pm

I’ve done some experiments with a not-so-recent Valhalla OpenJDK build:

    $ ~/devel/jdk-20-valhalla/bin/java --version
    openjdk 20-valhalla 2023-03-21
    OpenJDK Runtime Environment (build 20-valhalla+1-75)
    OpenJDK 64-Bit Server VM (build 20-valhalla+1-75, mixed mode, sharing)

The test code, with the comparison results in comments:

    public class comparisons_under_valhalla {
        public static primitive class MyValueClass1 {
            public final int value;
            MyValueClass1(int value) { this.value = value; }
        }
        public static primitive class MyValueClass2 {
            public final int value;
            MyValueClass2(int value) { this.value = value; }
        }
        public static void main(String[] args) {
            var valueObject1 = new MyValueClass1(-700);
            var valueObject1b = new MyValueClass1(-700);
            var valueObject2 = new MyValueClass2(-700);
            // prints true
            System.out.println(valueObject1.value == valueObject2.value);
            // prints false, doesn't compile without casting to Object
            System.out.println((Object) valueObject1 == (Object) valueObject2);
            // prints true
            System.out.println(valueObject1 == valueObject1b);
            // prints true
            System.out.println((Object) valueObject1 == (Object) valueObject1b);
        }
    }

MateuszKowalewski · January 17, 2025, 4:51am

That’s true.

But this has nothing to do with “primitive types”, and their memory layout. (Actually struct would be a better name for this construct.)

Not allowing things like negative sizes is a job for the type system, and has nothing to do with the runtime representation of values.

Opaque types already help with creating such custom types, complete with adequate type checking. The nice thing is that they hide the runtime representation, so one could even change it later on, for example from normal objects to some nice compact struct in case it’s a composite value.

It can only be bitwise as long as you compare values of the same type.

But if you add new numeric value types, you also want to compare them to each other. In that case a bit-by-bit comparison doesn’t necessarily work.

You need to convert one value. As you can’t do in-place updates, this needs a copy of some arbitrarily large entity. (Think “primitive” Vectors or Matrices.)

Of course you need to know what to convert to what and how. This needs some switch / table lookup.

I think that was what @Ichoran tried to explain. (At least, that’s my understanding.)
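A minimal sketch of the "what to convert to what" lookup, assuming a hypothetical boxed `UInt` type next to the standard `Integer` and `Long`. None of these names come from the Scala library; the real `equalsNumNum` dispatches on type codes instead.

```java
final class NumericDispatchSketch {
    // Hypothetical boxed unsigned 32-bit number; not part of the JDK or Scala.
    record UInt(int bits) {
        long widened() { return Integer.toUnsignedLong(bits); }
    }

    // Widen any supported number to long. Every new numeric type adds a branch here.
    static long toLong(Object n) {
        if (n instanceof Integer i) return i;        // sign-extend
        if (n instanceof Long l) return l;
        if (n instanceof UInt u) return u.widened(); // zero-extend
        throw new IllegalArgumentException("unsupported: " + n);
    }

    static boolean numEquals(Object x, Object y) {
        return toLong(x) == toLong(y);
    }

    public static void main(String[] args) {
        System.out.println(numEquals(new UInt(-1), 4294967295L)); // true
        System.out.println(numEquals(new UInt(-1), -1));          // false
        System.out.println(numEquals(5, 5L));                     // true
    }
}
```

Widening to `long` only works while every supported type fits in it; adding an unsigned 64-bit type would need yet another case.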

saulpalv · January 18, 2025, 12:51am

That may be true in the Platonic world of ideas, but on earth the SIP committee’s perception of the underlying mechanism prevented the SIP for unsigned numbers from seeing the light of day.

I would expect it to be two-staged: check that the types match, then compare the values if they do.

Converting makes sense for building atoms like numbers. For composed entities there is no need to copy; just compare their constituent atoms:

collection type
collection size
then for each item
* item type
* item value
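The staged comparison listed above could look roughly like this in Java. It is a sketch with an invented name (`TwoStageEquals`), and it assumes the collections are lists with non-null items.

```java
import java.util.ArrayList;
import java.util.List;

final class TwoStageEquals {
    // Compares type first, then size, then each item's type and value.
    static boolean sameCollection(List<?> a, List<?> b) {
        if (a.getClass() != b.getClass()) return false; // collection type
        if (a.size() != b.size()) return false;         // collection size
        for (int i = 0; i < a.size(); i++) {
            Object x = a.get(i);
            Object y = b.get(i);
            if (x.getClass() != y.getClass()) return false; // item type
            if (!x.equals(y)) return false;                 // item value
        }
        return true;
    }

    public static void main(String[] args) {
        List<Integer> ints = new ArrayList<>(List.of(1, 2));
        List<Integer> sameInts = new ArrayList<>(List.of(1, 2));
        List<Long> longs = new ArrayList<>(List.of(1L, 2L));
        System.out.println(sameCollection(ints, sameInts)); // true
        System.out.println(sameCollection(ints, longs));    // false: item types differ
    }
}
```

Note that under this scheme an `Integer` 1 and a `Long` 1L are never equal, whereas Scala's current `==` treats `1` and `1L` as equal.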