Is ArraySeq fit for purpose?

scala> collection.immutable.ArraySeq(1, 2, 3)
res0: scala.collection.immutable.ArraySeq[Int] = ArraySeq(1, 2, 3)

scala> res0.unsafeArray
res1: Array[_] = Array(1, 2, 3)

scala> res1.isInstanceOf[Array[Int]]
res2: Boolean = true

scala> res0.map(_ * 2)
res3: scala.collection.immutable.ArraySeq[Int] = ArraySeq(2, 4, 6)

scala> res3.unsafeArray
res4: Array[_] = Array(2, 4, 6)

scala> res4.isInstanceOf[Array[Int]]
res5: Boolean = false

Can you be more specific about the undesired effect(s)?

That looks like buggy behavior to me. Please open a ticket at https://github.com/scala/bug

It seems buggy in two distinct ways:

  • res1 has the wrong compile-time type (should be Array[Int])
  • res4 has the wrong runtime type (it’s an Array[Object], as .getClass reveals, but should be an Array[Int])

but I suspect there’s a single underlying cause, so I’d suggest a single ticket.
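For reference, both symptoms can be confirmed in one snippet via `.getClass`, as observed in the transcript above (Scala 2.13):

```scala
// Reproducing both symptoms: the backing array starts out primitive,
// but map re-wraps the result in an Array[Object].
import scala.collection.immutable.ArraySeq

val a = ArraySeq(1, 2, 3)
val before = a.unsafeArray.getClass.getName            // "[I" — a primitive int[]
val after  = a.map(_ * 2).unsafeArray.getClass.getName // "[Ljava.lang.Object;" — boxed
```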

The map method is boxing primitive value types, in this case Int.

I discovered this when I tried to make a Functor instance for Arrays. Of course I couldn’t, because the map signature for Arrays takes an implicit ClassTag parameter. Then I thought: hang on, how is ArraySeq doing a map without a ClassTag? I’d already created a Functor instance for ArraySeq without the compiler complaining. So then I went to the REPL to check the boxing.
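To illustrate the mismatch described here, a minimal sketch with a hand-rolled Functor type class (hypothetical, standing in for whichever library’s Functor was actually being implemented):

```scala
import scala.collection.immutable.ArraySeq

// Minimal Functor type class (hypothetical).
trait Functor[F[_]] { def map[A, B](fa: F[A])(f: A => B): F[B] }

// This compiles: ArraySeq#map needs no ClassTag.
val arraySeqFunctor: Functor[ArraySeq] = new Functor[ArraySeq] {
  def map[A, B](fa: ArraySeq[A])(f: A => B): ArraySeq[B] = fa.map(f)
}

// The analogous instance for Array does not compile, because Array's map
// requires an implicit ClassTag[B] that Functor.map cannot supply:
// val arrayFunctor: Functor[Array] = new Functor[Array] {
//   def map[A, B](fa: Array[A])(f: A => B): Array[B] = fa.map(f) // no ClassTag[B]
// }
```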

#11761

The Scaladoc for s.c.i.ArraySeq.unsafeArray makes clear that this is all expected behavior:

The wrapped mutable Array that backs this ArraySeq. Any changes to this array will break the expected immutability. Its element type does not have to be equal to the element type of this ArraySeq. A primitive ArraySeq can be backed by an array of boxed values and a reference ArraySeq can be backed by an array of a supertype or subtype of the element type.

I believe that throwing away the type of the underlying Array is necessary for s.c.i.ArraySeq to compile as covariant when Array isn’t.

It might be nice to find a way to get map to avoid boxing when mapping to a Java primitive type, but that seems like a possible optimization, not a necessity for correctness. I’m not so sure, though. I think you’re going to get boxing when returning a primitive through a method with a generic return type like Function1’s apply anyway, so unboxing might trade space for time, rather than being a pure win.
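As a concrete sketch of that trade-off (a hypothetical helper, not the stdlib’s implementation): a ClassTag-based map can produce a primitive result array, but each call to f still goes through Function1’s generic apply, so the computed values are boxed and then immediately unboxed into the array.

```scala
import scala.reflect.ClassTag

// Hypothetical ClassTag-based map: the result array is unboxed (Array[Int],
// Array[Double], ...) when B is primitive, but f's results are still boxed
// in flight because Function1.apply is generic here.
def mapTagged[A, B: ClassTag](src: Array[A])(f: A => B): Array[B] = {
  val out = new Array[B](src.length)
  var i = 0
  while (i < src.length) { out(i) = f(src(i)); i += 1 }
  out
}
```

So this trades the boxed result collection for per-element box/unbox churn, which is exactly the “trade space for time” caveat.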

Won’t @uncheckedVariance suffice?

That’ll actually lead to runtime errors:

@ import annotation.unchecked.uncheckedVariance 
import annotation.unchecked.uncheckedVariance

@ class C1[+A](arrVal: Array[A]) { val arr: Array[A @uncheckedVariance] = arrVal } 
defined class C1

@ new C1(Array(1,2,3)) 
java.lang.ClassCastException: [I cannot be cast to [Ljava.lang.Object;
  ammonite.$sess.cmd1$C1.<init>(cmd1.sc:1)
  ammonite.$sess.cmd2$.<clinit>(cmd2.sc:1)


@ class C2[+A](arrVal: Array[A]) { val arr: Array[_] = arrVal } 
defined class C2

@ new C2(Array(1,2,3)) 
res4: C2[Int] = ammonite.$sess.cmd3$C2@7b3085a3

The reason is that, under the covers, the underlying field has the bytecode type [Ljava/lang/Object; (roughly Array[java.lang.Object]) in the first case but Ljava/lang/Object; (roughly java.lang.Object) in the second case.

package ostrat
import reflect.ClassTag, annotation.unchecked.uncheckedVariance

/** Using Att as temporary name, can be switched to Arr later to replace
 *   type alias for ArraySeq. */
class Att[+A](val array: Array[A] @uncheckedVariance) extends AnyVal
{ @inline def length = array.length
  def apply(index: Int): A = array(index)
  @inline def foreach[U](f: A => U): Unit = array.foreach(f)

  def map[B](f: A => B)(implicit ct: ClassTag[B]): Att[B] =
  { var count = 0
    val newArray: Array[B] = new Array[B](length)
    while (count < length) { newArray(count) = f(array(count)); count += 1}
    new Att(newArray)
  }
}

object Att
{ def apply[A](inp: A*)(implicit ct: ClassTag[A]): Att[A] =
 new Att[A](inp.toArray)
}

And the following tests all pass:

package ostrat
import utest._

object ArrTest extends TestSuite
{
  val tests = Tests
  {
    val at1: Att[Int] = Att(1, 2, 3)
    val at2 = at1.map(_ * 2)
    val at3 = at2.map(_ > 3)
    'test1 -
    {
      assert(at1.length == 3)
      assert(at2(2) == 6)
      assert(at2.array.isInstanceOf[Array[Int]])
      assert(at3(1) == true)
      assert(at3.array.isInstanceOf[Array[Boolean]])
    }
  }
}

It’s still possible to trigger a runtime failure under that definition:

Welcome to the Ammonite Repl 1.7.1
(Scala 2.13.0 Java 1.8.0_121)
If you like Ammonite, please support our development at www.patreon.com/lihaoyi
@ {
  import reflect.ClassTag
  
  /** Using Att as temporary name, can be switched to Arr later to replace type alias for ArraySeq. */
  class Att[+A](val array: Array[A] @scala.annotation.unchecked.uncheckedVariance) extends AnyVal
  { @inline def length = array.length
    def apply(index: Int): A = array(index)
    @inline def foreach[U](f: A => U): Unit = array.foreach(f)
  
    def map[B](f: A => B)(implicit ct: ClassTag[B]): Att[B] =
    { var count = 0
      val newArray: Array[B] = new Array[B](length)
      while (count < length) { newArray(count) = f(array(count)); count += 1}
      new Att(newArray)
    }
  }
  
  object Att
  { def apply[A](inp: A*)(implicit ct: ClassTag[A]): Att[A] = new Att[A](inp.toArray)
  }
  } 
import reflect.ClassTag

/** Using Att as temporary name, can be switched to Arr later to replace type alias for ArraySeq. */

defined class Att
defined object Att

@ def attArr(att: Att[Any]) = att.array 
defined function attArr

@ attArr(Att(1,2,3)) 
java.lang.ClassCastException: [I cannot be cast to [Ljava.lang.Object;
  ammonite.$sess.cmd1$.attArr(cmd1.sc:1)
  ammonite.$sess.cmd2$.<clinit>(cmd2.sc:1)



edit: Note that the previous snippet requires scrolling to see the whole thing. At least on my system, it’s really hard to tell that I didn’t just re-post your definition back at you.

The particular examples of explicit use of Any wouldn’t bother me too much; however, when I came to implement the ++ method(s), I ran into problems. So I think it’s time to abandon a single Array wrapper class with a type parameter extending Any, and to experiment with replacing ArraySeq with an AnyRef Array wrapper plus one wrapper class for each of the atomic value types.

The correct class would be found by an implicit. If it works, then I don’t see why the evidence objects couldn’t be used to specialise other types like List and Option. In fact I did start to try to create specialised Option classes some time back, but abandoned it because I so rarely use Option in performance-critical situations.

The erased array type is not only needed because of variance, it’s mostly about primitive boxing. You can have an ArraySeq[Int] backed by an Array[Integer] and vice versa.

ArraySeq already has subclasses for all the different primitive and reference array types. You could try to add overloads of map for all specialized Function1 return types and then implement all of them individually in all primitive ArraySeq subclasses.

We already experimented with runtime checks and they did not pay off. Yes, you can take a ClassTag for the return type and create a specialized array, but that means you’re doing an extra unboxing step, because you’re still calling a non-specialized Function1. If you have to box anyway, it’s usually better to keep it boxed. In order to avoid boxing you have to implement a separate code path for every specialized Function1 type. Doing it at runtime (so you could specialize some important cases without all the duplication) is too expensive. And that’s only for map. It gets even uglier in flatMap (no way to specialize on the argument type; the return type could be done based on Splitter).

So, so far, so good as far as I can make out. I’ve only implemented append and map here.

package ostrat
import annotation.unchecked.uncheckedVariance, reflect.ClassTag

trait ArrBuild[+T] extends Function1[Int, ArrN[T]]

/** If successful name will be changed to Arr. */
trait ArrN[+A] extends Any
{
  def array: Array[A] @uncheckedVariance
  @inline def length: Int = array.length
  @inline def apply(index: Int): A = array(index)

  def map[B](f: A => B)(implicit ev: ArrBuild[B]): ArrN[B] =
  {
    val res = ev(length)
    var count = 0
    while (count < length)
    { res.array(count) = f(apply(count))
      count += 1
    }
    res
  }

  protected[this] def internalAppend[AA >: A, B <: AA](opArray: Array[B], newArray: Array[AA]): Unit =
  {
    val opLength = opArray.length
    var count = 0
    while (count < length)
    { newArray(count) = apply(count)
      count += 1
    }
    var count2 = 0
    while (count2 < opLength)
    { newArray(count) = opArray(count2)
      count += 1; count2 += 1
    }
  }
}

And I’ve only implemented final classes for T <: AnyRef, Ints and Doubles:

final class ArrR[+A <: AnyRef](val array: Array[A] @uncheckedVariance) extends AnyVal with ArrN[A]
{
  def ++[AA >: A <: AnyRef](operand: ArrR[AA] @uncheckedVariance)(implicit ct: ClassTag[AA]): ArrR[AA] =
  {
    val newArray: Array[AA] = new Array[AA](length + operand.length)
    internalAppend(operand.array, newArray)
    new ArrR(newArray)
  }
}

object ArrR
{ def apply[A <: AnyRef](inp: A*)(implicit ct: ClassTag[A]): ArrR[A] =
    new ArrR[A](inp.toArray)
}

trait ArrValue[A] extends Any with ArrN[A]
{ def newArr(length: Int): ArrValue[A]

  def ++(operand: ArrValue[A]): ArrValue[A] =
  {
    val res: ArrValue[A] = newArr(length + operand.length)
    internalAppend(operand.array, res.array)
    res
  }
}

final class ArrI(val array: Array[Int]) extends AnyVal with ArrValue[Int]
{ override def newArr(length: Int): ArrI =
    new ArrI(new Array[Int](length))
}

object ArrI
{ def apply(inp: Int*): ArrI = new ArrI(inp.toArray)
}

final class ArrD(val array: Array[Double]) extends AnyVal with ArrValue[Double]
{ override def newArr(length: Int): ArrD =
    new ArrD(new Array[Double](length))
}

The implicit build instances are in my package object except for the AnyRef instance. The package object inherits from a LowPriority trait that contains the AnyRef implicit:

implicit val arrIBuildImplicit: ArrBuild[Int] =
 len => new ArrI(new Array[Int](len))
implicit val arrDBuildImplicit: ArrBuild[Double] =
 len => new ArrD(new Array[Double](len))
trait LowPriority
{ implicit def arrRBuildImplicit[T <: AnyRef](implicit ct: ClassTag[T]): ArrBuild[T] =
 len => new ArrR[T](new Array[T](len))
}

To implement flatMap I will need ArrayBuffer builder implicit instances, but I think I can implement all methods with those 2 builder functions. All the tests so far pass.

package ostrat
import utest._

object ArrTest extends TestSuite
{
  trait MyT{ def i: Int}
  case class My1(i: Int) extends MyT
  case class My2(i: Int) extends MyT
  val tests = Tests
  {
    val ar1: ArrR[My1] = ArrR(My1(1), My1(2), My1(3))
    val ar2: ArrR[My2] = ArrR(My2(4), My2(5))
    val ar12: ArrR[MyT] = ar1 ++ ar2
    val ai1 = ArrI(1, 2, 3, 4)
    val ai2 = ArrI(5, 6)
    val ai12 = ai1 ++ ai2
    val am1 = ar1.map(_.i * 10)
    val am2 = am1.map(i => My2(i + 5))
    val am3 = am2.map(_.i * 0.5)
    'test1 -
    {
      ar1.length ==> 3
      ar12.length ==> 5
      ar12(4).i ==> 5
      ai12.length ==> 6
      ai12(0) ==> 1
      ai12(5) ==> 6
      am1(1) ==> 20
      assert(am1.array.isInstanceOf[Array[Int]])
      am2(2).i ==> 35
      assert(am3.array.isInstanceOf[Array[Double]])
      am3(0) ==> 7.5
    }
  }
}

Note the backing Array types, no unnecessary boxing.
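The flatMap mentioned above could be sketched over raw arrays with a plain ArrayBuffer (a hypothetical ClassTag-based helper, rather than the ArrBuild evidence used in the code above):

```scala
import scala.collection.mutable.ArrayBuffer
import scala.reflect.ClassTag

// Hypothetical flatMap: collect results into a buffer, then copy them
// into a primitive result array via the ClassTag.
def flatMapTagged[A, B: ClassTag](src: Array[A])(f: A => Array[B]): Array[B] = {
  val buf = new ArrayBuffer[B]
  var i = 0
  while (i < src.length) { buf ++= f(src(i)); i += 1 }
  buf.toArray // ClassTag[B] makes this an unboxed array for primitive B
}
```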

I didn’t look in detail, but are you sure there’s no boxing and/or unboxing anywhere? Value classes box when calling inherited members. Specialized Function1 classes don’t work optimally in unspecialized generic contexts.

That won’t work. You’re still using a generic Function1. We already have a ClassTagSeqFactory available, all you really need to do in ArraySeq is add an overload for map that makes use of it. As I mentioned before, the reason why we’re not doing it is performance. With your generic map implementation every element of the source array gets boxed and every computed result gets unboxed. You need a separate map for each combination of specialized source (Int, Long, Float, Double) and target (Int, Long, Float, Double, Boolean, Unit) type. There is no way to separate them (so you could implement it with n+m code paths instead of n*m) due to the way specialization works.

OTOH, we should consider using primitive types in appendedAll and prependedAll (and by extension concat). If both sides are ArraySeqs with the same primitive type it would be better to keep this type.
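A rough sketch of what that could look like, as a hypothetical helper over the public ArraySeq API (not the proposed stdlib change itself):

```scala
import scala.collection.immutable.ArraySeq

// If both sides are backed by arrays of the same runtime class, build the
// result with that same element type instead of falling back to Array[Object].
def concatKeepingType[A](xs: ArraySeq[A], ys: ArraySeq[A]): ArraySeq[A] = {
  val a = xs.unsafeArray
  val b = ys.unsafeArray
  if (a.getClass == b.getClass) {
    val out = java.lang.reflect.Array
      .newInstance(a.getClass.getComponentType, a.length + b.length)
    System.arraycopy(a, 0, out, 0, a.length)
    System.arraycopy(b, 0, out, a.length, b.length)
    ArraySeq.unsafeWrapArray(out.asInstanceOf[Array[A]])
  } else xs ++ ys // mixed backing types: fall back to the generic path
}
```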

I successfully used https://github.com/google/allocation-instrumenter at some point to check if a piece of code allocates any boxed primitives.

Edit: https://gist.github.com/lrytz/900dcc6aba5479ae615098bb0d74b1ee

Sorry I didn’t quite follow that. Are you saying I need n * m or n + m? It strikes me that just specialising Double would be a big win. So if I’ve understood you correctly that would require 4 code paths for each method. Int would also be extremely useful for some of my code that uses integer coordinates. But that would put it up to 9 code paths per method? Long and Float would be nice, but I don’t see any need to specialise Boolean, let alone Short or Byte.

And do the same specialisation rules apply to Scala.js?

You can’t specialize on the m parameter types and n return types separately. There is no abstraction for “specialized Function1 that takes an Int” or “specialized Function1 that returns a Double”, they only come in pairs.

You need an overload for each specialized return type:

abstract class ArraySeq[A] {
  def map[B](f: A => B) = ...
  def map(f: A => Int): ArraySeq.OfInt = ...
  def map(f: A => Double): ArraySeq.OfDouble = ...
  ...
}

You can either leave them abstract or give them base implementations so you don’t have to implement the combinations for which you don’t want specialization. But you can’t delegate to the generic implementation and return a boxed ArraySeq because of erasure. Maybe they should really be abstract because this is exactly the kind of semi-specialized version that we don’t want for performance reasons.

Then you need to implement/override them in the specialized subclasses:

final class OfInt extends ArraySeq[Int] {
  override def map(f: Int => Int): OfInt = ...
}

These implementations will make use of the specialized function types.
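For illustration, one such fully specialized implementation might look like this standalone sketch (OfInt here is a toy class echoing the overload scheme above, not the stdlib’s ArraySeq.ofInt):

```scala
// Toy specialized wrapper: mapping with an Int => Int touches no boxed
// values, because Function1[Int, Int] is one of the specialized pairs.
final class OfInt(val elems: Array[Int]) {
  def map(f: Int => Int): OfInt = {
    val out = new Array[Int](elems.length)
    var i = 0
    while (i < elems.length) {
      out(i) = f(elems(i)) // calls the Int-specialized apply directly
      i += 1
    }
    new OfInt(out)
  }
}
```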

With 10 manually specialized ArraySeq subclasses and 6 specialized function return types you’re looking at 60 separate implementations per method. And you will still get a performance degradation from the ones that don’t have a corresponding specialized function type but now have to return a specialized ArraySeq subclass due to erasure (and this is not a corner case; it affects every transformation from a non-primitive type to a primitive type!).
