Urgent need for improved equality in IArray (Scala 3) - Case Class Incompatibility

arie-benichou · May 1, 2025, 3:06pm

    val ia1_1 = IArray(1,2,3,4)
    println(ia1_1.mkString("|"))
    
    val ia2_1 = IArray(1,2,3,4)
    println(ia2_1.mkString("|"))

    val ia1_2 = IArray.from(Array(1,2,3,4))
    println(ia1_2.mkString("|"))

    val ia2_2 = IArray.from(Array(1,2,3,4))    
    println(ia1_2.mkString("|"))

    println(IArray.equals(ia1_1, ia2_1))
    println(IArray.equals(ia1_2, ia2_2))

Compilation Errors :

    Found     : (ia1_1 : IArray[Int])
    Required  : IArray[AnyRef]

    Found     : (ia2_1 : IArray[Int])
    Required  : IArray[AnyRef]

    Found     : (ia1_2 : IArray[Int])
    Required  : IArray[AnyRef]

    Found     : (ia2_2 : IArray[Int])
    Required  : IArray[AnyRef]

IArray.equals contract is severely broken:

Should use Any instead of AnyRef
Should return true if both underlying arrays have the same elements
Should handle nested arrays

Ichoran · May 1, 2025, 5:15pm

The contract isn’t broken. The contract is the contract: you can do Object easily and everything else is hard, so it does the easy thing. That’s the contract!

Do you want to try to do a harder thing? You can write it yourself! You have to call IArray.equals manually anyway. If you want Java deepEquals, just call it. Every IArray[T] is just Array[T] so you can asInstanceOf it over. You can write a helper function that does it:

extension [T <: AnyRef](a: IArray[T])
  def deepEquals[U <: AnyRef](b: IArray[U]) =
    java.util.Arrays.deepEquals(a.asInstanceOf[Array[AnyRef]], b.asInstanceOf[Array[AnyRef]])

Now you can deepEquals it. But wait! What if you want deepEquals on any type of array? It turns out that you can’t express that in Java so it doesn’t support it–save for System.arrayCopy which takes Object instead of Array because there’s no superclass for a general Array (!!)–so you have to do it manually. And then you have to decide if Array(1, 2, 3) is or is not equal to Array(1L, 2L, 3L). In Scala, 1 == 1L, 2 == 2L, and 3 == 3L, but in Java Array(1, 2, 3).equals(Array(1L, 2L, 3L)) is false, and there is no java.util.Arrays.equals method that takes Array[Int] and Array[Long]. Hence, deepEquals, unlike Scala collection equals, will return false if there are embedded Arrays with identical elements but a different primitive type.

But then you realize that Array[String] and Array[CharSequence] and Array[AnyRef] are actually all different class types, but they all return true for isInstanceOf[Array[AnyRef]] even though Array[AnyRef] is not a superclass of Array[CharSequence] etc.! So you can totally go

if xs.isInstanceOf[Array[AnyRef]] then
  val ys = xs.asInstanceOf[Array[AnyRef]]
  ys(0) = "Uh-oh"

and that will totally work even if it’s an Array[Option[Boolean]]. And crash later with a ClassCastException when you try to use the Option[Boolean] that is actually a string.

Around about this time, most people throw up their hands and declare arrays broken by design, and use Scala collections which have sensible policies for all these things.

But if you haven’t thrown up your hands, and you still want to keep at it, you can use asInstanceOf[Array[Char]] etc. to detect primitive arrays, or you can just accept that you can’t even call Array(1, 2).deepEquals(Array(1L, 2L)) (or, more likely, def foo[X, Y](xs: Array[X], ys: Array[Y]) = xs.deepEquals(ys)); or you can pattern match out all the primitives for comparisons like that and defer to java.util.Arrays.equals if the types match, even though it has different behavior than every other collection type, or write your own recursive routine.

But the designers of the Scala library decided, sensibly enough, that this is all a minefield and if you want well-behaved collections, you can use the collections library. Including wrapped Arrays, if that’s what you want–the wrapper smooths out all the wrinkles.

So, wherever you might be tempted to use IArray, think about whether collection.immutable.ArraySeq would do instead. Sometimes it doesn’t; ArraySeq is not guaranteed to hold primitives as primitives; ArraySeq(1, 2, 3) wraps a primitive Array(1, 2, 3) (Array[Int]) but ArraySeq("a", "bc", "def").map(_.length) is also typed as ArraySeq[Int] but stores the values as boxed because map acts generically.

Anyway, the bottom line of all of this is that Array and IArray allow high-performance primitive arrays to be easily accessible in Scala as well as Java, but they don’t shield you from all the weirdnesses of primitive arrays. If you want to build your own shields, go for it! You can write extension methods. But the rest of us have just given up and decided that we will use them very intentionally as the weird things that they are.

Ichoran · May 1, 2025, 5:27pm

You have fooled yourself here. The ImmutableArray constructor demands an Array[Any]. So ImmutableArray(Array(1, 2, 3, 4)) understands that Array(1, 2, 3, 4) is an Array of Any (i.e. Object under the hood). So it boxes 1, 2, 3, 4. You’re not actually using primitive arrays here.

If you tried val a1 = Array(1, 2, 3, 4); val ia1 = ImmutableArray(a1), it wouldn’t compile because a1 is a primitive array of Int.

arie-benichou · May 1, 2025, 7:56pm

That is true. However, it does not mean there is no problem with the actual IArray.equals implementation.

I understand that the purposes of IArray is to offer an illusion of an immutable array through opaque type in order to make it very efficient by avoiding the need for a wrapper and by avoiding boxing unboxing, and by leveraging java efficient handling of array.

This conversation should not try to negate by all means the fact that there is no problem with the actual IArray.equals method, since there is a problem obviously.

It should be heading us to ask the following questions:

What is the contract / purpose of the actual IArray.equals method ?

If this method does not behave the way one could expect according to the documentation, should the method be corrected, or should the documentation be updated ?

Perhaps, this method should not be implemented at all, since it can not be implemented in a universal way ?

Perhaps, there is a need for a IArray entity that is dedicated to primitive java types, and another IArray entity that is dedicated to non primitive types, if the design intent is efficiency ?

arie-benichou · May 1, 2025, 7:56pm

Here is an updated version that allows to understand better the problem regardind IArray.equals.
It strongly suggests that IArray.equals can mislead users by being implemented in IArray and by being public in its actual implementation.

And since the purpose and the intent of IArray has been focused on efficiency while having the benefice of immutability, it strongly suggests that IArray.equals would better not be universal.

So there is a need for a change somewhere and some questions to be asked about this matter.

object MyImmutableArray {

  def apply(mutableArray: Array[Int]) = IArray(mutableArray)

  def equals(left: IArray[Int], right: IArray[Int]): Boolean = {
    java.util.Arrays.equals(left.asInstanceOf[Array[Int]], right.asInstanceOf[Array[Int]])
  }

  def deep2Equals(left: IArray[IArray[Int]], right: IArray[IArray[Int]]): Boolean = {
    if (left.size != right.size) return false
    for (i <- 0 until left.size) {
      if (!MyImmutableArray.equals(left(i), right(i))) return false
    }
    true
  }

  def main(args: Array[String]): Unit = {

    val ia1 = IArray(1, 2, 3, 4)
    assert(ia1.mkString("|") == "1|2|3|4")

    val ia2 = IArray(1, 2, 3, 4, 5)
    assert(ia2.mkString("|") == "1|2|3|4|5")

    val ia3 = IArray.from(Array(1, 2, 3, 4))
    assert(ia3.mkString("|") == "1|2|3|4")

    val ia4 = IArray.from(Array(1, 2, 3, 4, 5))
    assert(ia4.mkString("|") == "1|2|3|4|5")

    assert(MyImmutableArray.equals(ia1, ia2) == false)
    assert(MyImmutableArray.equals(ia1, ia3) == true)
    assert(MyImmutableArray.equals(ia2, ia4) == true)
    assert(MyImmutableArray.equals(ia1, ia4) == false)

    // According to the actual public API, one would be able to write the following :
    // assert(IArray.equals(ia1, ia2) == false) // will not compile
    // assert(IArray.equals(ia1, ia3) == true) // will not compile
    // assert(IArray.equals(ia2, ia4) == true) // will not compile
    // assert(IArray.equals(ia1, ia4) == false) // will not compile

    val nia1 = IArray(
      IArray(1, 2, 3),
      IArray(4, 5, 6),
      IArray(7, 8, 9)
    )

    val nia2 = IArray(
      IArray(1, 2, 3),
      IArray(4, 5, 6),
      IArray(7, 8, 9)
    )

    val nia3 = IArray(
      IArray(1, 2, 0),
      IArray(4, 0, 6),
      IArray(0, 8, 9)
    )

    assert(MyImmutableArray.deep2Equals(nia1, nia2) == true)
    assert(MyImmutableArray.deep2Equals(nia1, nia3) == false)

    // According to the actual public API, one would be able to write the following :
    // assert(IArray.equals(nia1, nia2) == true) // will not compile
    // assert(IArray.equals(nia2, nia3) == false) // will not compile
    // assert(IArray.equals(nia1, nia3) == false) // will not compile

  }

}

Ichoran · May 1, 2025, 9:19pm

No, there isn’t, because IArray is a primitive array if it is used with a primitive type. Because it’s an opaque type, it inherits that behavior from Array.

Now, I understand the complaint that IArray has lots of handy methods* but its equals is pretty lame in comparison. Fair enough, I suppose. Now, you may not know that by far the most common way to implement generic operations which are backed by an array is to have an Array[Object] (aka Array[AnyRef]) behind the scenes, with the interface taking care of casting to the correct type-parameterized object. So, actually, if one is going to have any equals method, the one that’s there is the most useful.

(* It’s partly an illusion. A lot of these have poor performance because they use standard generics instead of specialization and/or inlining; if you were after the methods you may as well just use ArraySeq or somesuch.)

That doesn’t necessarily mean that it couldn’t get others, but there there’s another constraint: compatibility simultaneously with Array and with Scala 2.

Array only has equals for pairs of Array[AnyRef]. (Which, since equality is in covariant position, works on subtypes at least for well behaved types–equality with subclassing is its own can of worms.) IArray matches Array. And Array is fixed (for now) by Scala 2 compatibility.

So there are compelling reasons–maybe not fully adequate, but not for no reason–to have things as they are. The very most helpful method is provided. The others are not. The type system will check to make sure you use the most helpful one only in cases where it’s valid.

And you can add anything else you want to explicitly gain more functionality. That is really part of the core ethos of Scala–it’s incredibly flexible, so it’s a big deal if something is in the standard library that’s bad (because people will use it and be burned), but it’s not a big deal if something isn’t there because almost anything can be created.

So, no, there is not an obvious problem.

If you learn Scala well enough to be able to use it for projects of nontrivial complexity, you will understand (1) the ways in which Arrays are weird (regrettable, but that’s what we inherited from Java), and (2) the method signature. The signature is possibly misleading to someone who doesn’t understand arrays or signatures very well, but the top priority is being correct and making it easy to learn is secondary. There are cases where the second priority is embraced (e.g. scaladoc mostly doesn’t give the full signature of collections methods because the parts that are required for correctness and feature-completeness necessarily look complicated and esoteric). I don’t think this is one of them, though.

Anyway, someday the standard library will be unlocked from compatibility constraints, and it will make sense to think about what additions are sensible. I don’t see any harm in overloading the non-floating-point primitive arrays to use standard equality, if you happen to have that type. It won’t work for opaque types placed into arrays, though! And floating-point types have weird equality. Do you really mean that Array(Double.NaN) != Array(Double.NaN) just because that’s how NaN works? What about Array(0.0 / 0.0) which has a different type of NaN? Java’s Arrays.equals makes certain choices, and one could just copy the API, but there’s a discussion to be had there. (Whole other can of worms about == on fp not defining equivalence classes and how to understand the compatibility of total orderings with partial equivalence classes–we must have spent a hundred hours talking that one through when trying to support default .sorted. Note that Rust punts on that–you can’t, by default, sort a Vec<f64>!)

Happily, though, you can roll your own any way you want. Because it is structurally impossible to have == equality work the desired way on opaque types, and you have to use explicit methods anyway, you have full freedom to pick your favorite behavior.

This doesn’t mean that there should never be improvements, only that they shouldn’t be felt as terribly pressing, because the type signature is as clear as it can be, and although it’s disappointing to be missing functionality that you would like to have, it’s possible (albeit somewhat annoying) to get the functionality you want.