Why can't we have compound value types on Jvm and Js?

RichType · October 5, 2017, 2:21pm

Perhaps the main attraction of Scala-Native is to have compound value types, or structs. Obviously The Jvm and Js run times don’t support them. Now maybe I’m missing something obvious, but I don’t see why they couldn’t exist as a compile time fictions. When used within a method / function, no heap object would need to be created and the run time could just perform operations on the native platform supported components. The Struct could be passed as multiple parameters, its atomic platform components.

The only place there would have to be object creation is for return types. They would have to be returned as Tuples behind the scenes. However particularly in code with a lot of tail loop recursion, function return is only a small percentage of processor operations so you could get maybe up to 95% of the benefits of compound value types even on platforms where they are not natively supported.

I notice that both the System V and the Microsoft x86-64 calling conventions only use one register for return values, using the stack for 2nd and subsequent values. I’ve often wondered if this was a mistake. But maybe the reason for this is that in typical code for more values are passed as parameters then are returned from functions. Hence supporting the proposition that native compound return types are not vital to fairly efficient code.

There would then be the question of dealing with collections of compound value types, but my guess is that this should be doable using byte arrays. Again maybe there is something I’m missing, so I’m happy to be corrected if my thinking is nieve.

LPTK · October 5, 2017, 4:35pm

We can’t use byte arrays on the JVM unless we’re willing to pay a big cost for reconstructing values with the right type on every access. However, I think it would be easy to use type classes to direct the creation of a struct of arrays instead of an array of structs. Something close to what Array already does (with ClassTag).

class FlatArray[T:StructRep](init: => T) { ... }

This would be backed by a normal Array for T that are not structs, and by several arrays for structs (one per field), which would avoid boxing.

Adapting collections to make use of this would probably be a little harder (it would require using the type class everywhere). Currently ArrayBuffer for example doesn’t even bother with Array's ClassTag and uses an internal Array[AnyRef].

In general, I agree that it would be very nice to have value classes that erase to several values when used in parameters, class fields and array elements.

Probably related: Automating ad hoc data representation transformations.

densh · October 5, 2017, 4:39pm

Exploring better syntax for structs and making them more natural is one of the goals for 0.4 release of Scala Native. We’ve found that tuple-like CStructN[T1,..., TN] to be too cumbersome in practice. Some initial design notes are already available at https://github.com/scala-native/scala-native/issues/897

Implementation-wise, passing structs around is generally a complete non-issue on Scala Native. LLVM provides excellent support for this and we’ll just use built-in struct and array value types for this. Passing structs across C boundary requires some additional care to respect C ABI but it’s feasible to support it.

RichType · October 5, 2017, 5:29pm

@densh My big concern is maintaining a common code base across native, Jvm and Js. Will I be able to write higher level code that uses a common interface, while utilising Scala native structs in my Scala native versions? Currently my source code is 99% platform independent and only 1% Jvm / Scala.js specific. Can Scala native maintain a common syntax with the Jvm /Js platforms while still leveraging native efficiency? If we had compound data types this would not only be a great boon for the JVM / Js platforms but make keeping a common syntax much easier.

@LPTK Do we have to reconstruct values from the wrapped byte Array every time? The map, flatMap and foreach functions on the Compound Value collection class could act on the constituent members without ever reconstructing the Compound value.

I use Vec2(x: Double, y: Double) a lot in my code and at one point I created a specialist collection for Vec2’s using a wrapped Array[Double ], but then i started using other classes such as;

case class Dist2(x: Dist, y: Dist)
class Dist(val metres: Double) extends AnyVal

So I scrapped my specialist collection code and started wondering about a more generic solution. Should I create a general wrapped array class for the product of 2 Doubles, for a struct of any number of Doubles or a more general case for all compound value types?

Anyway I presume this must be a common problem that deserves a standardised language / library solution.

LPTK · October 5, 2017, 7:23pm

I don’t know a way to not do so, unless you use an unsafe “off-heap memory” hack. @densh can probably confirm (or contradict).

RichType · October 6, 2017, 11:41am

@LPTK OK whats wrong with this in terms of efficiency? Its pretty rough and ready but hopefully you get the idea.

trait AtomVal[A]
{
   def numBytes: Int   
   def loadVal(byteBuf: java.nio.ByteBuffer, posn: Int): A 
   def storeVal(byteBuf: java.nio.ByteBuffer, posn: Int, value: A): Unit 
}

class ValSeq2[A, B](length: Int)(implicit val ev1: AtomVal[A], ev2: AtomVal[B])
{   
   @inline def elemLen: Int= ev1.numBytes + ev2.numBytes
   @inline def bytesLen = elemLen * length
   val arr: Array[Byte] = new Array[Byte](elemLen * length)
   val buf = java.nio.ByteBuffer.wrap(arr)   
   def storeAt(posn: Int, inp1: A, inp2: B): Unit =
   {
      val offset: Int = posn * elemLen
      ev1.storeVal(buf, offset, inp1)
      ev2.storeVal(buf, offset + ev1.numBytes, inp2)
   }
   def loadCompound(posn: Int): (A, B) =
   {
      val offset: Int = posn * elemLen
      val r1 = ev1.loadVal(buf, offset)
      val r2 = ev2.loadVal(buf, offset + ev1.numBytes)
      (r1, r2)
   }
   def map[R](f: (A, B) => R): Seq[R] =
   {
      var acc: Seq[R] = Seq()
      var i: Int = 0      
      while(i < bytesLen)
      {
         val el1: A = ev1.loadVal(buf, i)
         i += ev1.numBytes
         val el2: B = ev2.loadVal(buf, i)
         i += ev2.numBytes
         acc :+= f(el1, el2)
      }
      acc
   }
}

object ValSeq2 {
   def apply[A, B](len: Int)(implicit ev1: AtomVal[A], ev2: AtomVal[B]): ValSeq2[A, B] = new ValSeq2[A, B](len)   
}

package object pExp
{  
   implicit object IntImplicitAtomVal extends AtomVal[Int]
   {
      val numBytes = 4
      override  def loadVal(byteBuf: java.nio.ByteBuffer, posn: Int): Int = byteBuf.getInt(posn) 
      override def storeVal(byteBuf: java.nio.ByteBuffer, posn: Int, value: Int): Unit = byteBuf.putInt(posn, value)
   }
   
   implicit object DoubleImplicitAtomVal extends AtomVal[Double]
   {
       val numBytes = 8
       override def loadVal(byteBuf: java.nio.ByteBuffer, posn: Int): Double = byteBuf.getDouble(posn)
       override def storeVal(byteBuf: java.nio.ByteBuffer, posn: Int, value: Double): Unit = byteBuf.putDouble(posn, value)       
   }
   
   implicit object BooleanImplicitAtomVal extends AtomVal[Boolean]
   {
      val numBytes = 1
      override  def loadVal(byteBuf: java.nio.ByteBuffer, posn: Int): Boolean = byteBuf.get(posn) match
      {
         case 1 => true
         case 0 => false
         case _ => throw new Exception("Wrong value in Boolean Byte")
      }
      override def storeVal(byteBuf: java.nio.ByteBuffer, posn: Int, value: Boolean): Unit =
         byteBuf.put(posn, if(value) 1.toByte else 0.toByte)
   }   
}

object ExpApp extends App
{
   val c = ValSeq2[Int, Double](4)
   c.storeAt(0, -457, 2.12)
   c.storeAt(3, -5, 800.8)
   println(c.loadCompound(0))
   println(c.loadCompound(3))
   val c2 = ValSeq2[Int, Boolean](3)
   c2.storeAt(0, 100, true)
   c2.storeAt(1, 789, true)
   c2.storeAt(2, 5, false)   
   println(c2.loadCompound(1))
   println(c2.loadCompound(2))
   println(c2.map((i, b) => if (b) i * 2 else i))
}

tarsa · October 6, 2017, 8:20pm

It deserves a JVM-level solution which is already in the works. Look at project Valhalla: Project Valhalla Project Valhalla is not only about compound value types but also about specializing generics to avoid boxing as much as it is feasible while retaining full functionality (ok, except identity-related features as lack of identity is the foundation of value types).

Using byte arrays on JVM and JS for representing value classes would prohibit them from having reference type members.

Value types as proposed in Project Valhalla can contain reference types, primitve types as well as other (nested) value types - so it seems JVM will be able to fully support value types of any form.

OTOH it doesn’t seem that optimizations for compound value types in JS are feasible. While JS has DataView type that provides various methods for viewing the buffer as double, long, int, etc it doesn’t expose any method to save or load references to/ from byte buffer.

Remember also that C of course allows to store references (or pointers) in structs. Therefore, if you want a way for encoding compound value types that would work (ie would eg allow flat arrays of value types) on all platforms (native, JVM and JS) then it wouldn’t be compatible with C structs already supported by Scala-Native.

LPTK · October 8, 2017, 6:18pm

Your loadCompound method reconstructs a tuple of values on every access. At this point, why not just store the tuple in the first place?

Also, collections will want to abstract over the structs that they store, without resorting to ad-hoc ValSeqN alternatives. As soon as you make collections generic, you can’t even use the appropriate FunctionN function in methods like foreach. You’ll have to use Function1 everywhere, so you’ll need to box here too.

RichType · October 8, 2017, 9:46pm

@tarsa Ok here’s a new platform independent file:

sealed trait AtomVal[A] { def numBytes: Int }

object AtomVal
{   
   implicit case object IntAtom extends AtomVal[Int] { def numBytes: Int = 4 }
   implicit case object DoubleAtom extends AtomVal[Double] { def numBytes: Int = 8 }
}

abstract class Struct2[A, B, R](implicit val ev1: AtomVal[A], val ev2:AtomVal[B])
{   
   def elemLen: Int = ev1.numBytes + ev2.numBytes
   val apply: (A, B) => R
}

class Struct3[A, B, C, R](implicit val ev1: AtomVal[A], val ev2:AtomVal[B], val ev3: AtomVal[C])
{   
   def elemLen: Int = ev1.numBytes + ev2.numBytes + ev3.numBytes
}

Here’s a scala.js implementation, but one that can have exactly the same interface as a Jvm implementation:

class ValSeq2[A, B, R](length: Int)(implicit ev: Struct2[A, B, R])
{   
   @ inline def elemLen = ev.elemLen   
   def atom1Len = ev.ev1.numBytes
   def bytesLen = elemLen * length
   val buf = new typedarray.ArrayBuffer(bytesLen)
   val vw = new typedarray.DataView(buf)
   def storeElem(ind: Int, inp1: A, inp2: B): Unit = { storeAtom1(ind, inp1); storeAtom2(ind, inp2) }
   def getElem(ind: Int): R = ev.apply(getAtom1(ind), getAtom2(ind))
   
   def sMap[C, D, R2](f: (A, B) => (C, D))(implicit evR2: Struct2[C, D, R2]): ValSeq2[C, D, R2] =
   {
      val vs2 = new ValSeq2[C, D, R2](length)
      (0 until length).foreach(i =>
         {
            val a = getAtom1(i)
            val b = getAtom2(i)
            val (c, d) = f(a, b)
            vs2.storeAtom1(i, c)
            vs2.storeAtom2(i, d)
         })
      vs2
   }
   
   def getTuple(ind: Int): (A, B) = (getAtom1(ind), getAtom2(ind))
   
   def storeAtom1(ind: Int, inp: A): Unit =
   {
      val byteInd = ind * elemLen
      ev.ev1.asInstanceOf[AnyRef] match
      {
         case AtomVal.IntAtom => vw.setInt32(byteInd, inp.asInstanceOf[Int])
         case AtomVal.DoubleAtom => vw.setFloat64(byteInd, inp.asInstanceOf[Double])
      }
   }
   def getAtom1(ind: Int): A =
   {
      val byteInd = ind * elemLen
      ev.ev1.asInstanceOf[AnyRef] match
      {
         case AtomVal.IntAtom => vw.getInt32(byteInd).asInstanceOf[A]
         case AtomVal.DoubleAtom => vw.getFloat64(byteInd).asInstanceOf[A]
      }
   }
   def storeAtom2(ind: Int, inp: B): Unit =
   {
      val byteInd = ind * elemLen + atom1Len
      ev.ev2.asInstanceOf[AnyRef] match
      {
         case AtomVal.IntAtom => vw.setInt32(byteInd, inp.asInstanceOf[Int])
         case AtomVal.DoubleAtom =>vw.setFloat64(byteInd, inp.asInstanceOf[Double])
      }
   }
   def getAtom2(ind: Int): B =
   {
      val byteInd = ind * elemLen + atom1Len
      ev.ev2.asInstanceOf[AnyRef] match
      {
         case AtomVal.IntAtom => vw.getInt32(byteInd).asInstanceOf[B]
         case AtomVal.DoubleAtom => vw.getFloat64(byteInd).asInstanceOf[B]
      }
   }
}

I’d like to get rid of the And B type parameters on the ValSeq2. There must be away to derive A and B from the implicit “ev: Struct2[_, _, R]” parameter.

Here’s some example classes:

case class ExID(v1 :Int, v2: Double)
object ExID
{
   implicit object Ints2Implicit extends Struct2[Int, Double, ExID]
   { override val apply = ExID.apply }
}

case class ExDD(v1 :Double, v2: Double)
object ExDD
{
   implicit object Ints2Implicit extends Struct2[Double, Double, ExDD]
   { override val apply = ExDD.apply }
}

And here’s a simple example using the sMap method:

object EJsApp
{
   def main(args: Array[String]): Unit =
   {
      val v = new ValSeq2[Int, Double, ExID](3)
      v.storeElem(0, 27, 56.001)
      v.storeElem(1, -987, -1987.001)
      v.storeElem(2, 2, 2.02)
      (0 until 3).foreach(i => println(v.getElem(i)))
      val v2 = v.sMap[Double, Double, ExDD]((a, b) => (a + 3.0, b -56))
      (0 until 3).foreach(i => println(v2.getElem(i)))
   }
}

@LPTK I’ve just studied your response and I think your suggestion of using multiple arrays is a superior solution. Either way the problem of collections of values structs seems eminently solvable. And I would still assert my original contention that we could have 80-90% of the benefits of structs with the current Jvm and Js platforms.

I do think that having value classes without object references would be very useful. Such data is a lot easier to serialise and persist, although your multiple arrays would allow refs anyway.

tarsa · October 10, 2017, 10:18pm

@RichType
Given that your proposition is incompatible with both Scala’s AnyVal and Scala-Native’s CStruct you basically need a completely separate construct. What’s stopping you from implementing it as eg a macro? Even if it would be in a simplified form you still would have a concrete mechanism to benchmark.

I think that the idea of unrolling a compound parameter to multiple method parameters severely limits the polymorphism opportunities. A method with different number of arguments can’t override method from superclass.

Overall, (I know it may sound rude, but) I think the proposed idea is too cumbersome to use and too limited to become common. Small community and (relatively) high complexity would mean lots of bugs and high maintenance cost for little benefit. OTOH maybe I’m somewhat biased because I’m counting on Project Valhalla which is a complete and powerful solution.

RichType · December 31, 2017, 12:19pm

@tarsa quoting the wiki:

“Minimal Value Types
Early not-stable prototype for Value Types with a subset of functionality for expert experimentation”

strikes me as somewhat different to “a complete and powerful solution.”

We don’t know what Valhalla will deliver, when or if ever. However the JDK has shown a consistent pattern both under Sun and Oracle of delivering disappointing substandard solutions years late.

I also feel that pure compound value types (no refs) are a very valuable tool as they don’t need to be searched by the garbage collector, they are easier to make thread safe and to persist. Collections of pure Compound value types can also be compressed and serialised more safely easily.