Json and a first-class extensible records syntax

drdozer · June 24, 2019, 4:18pm

There are a ton of Scala JSON libraries. But for all of them, there’s a significant amount of “getting in the way”: arcane imports, different syntaxes, and so on. I’ve been doing some TypeScript hacking, and not that surprisingly, JSON just works in that world. The Dotty type system is able to represent just about everything that exists in TypeScript. So is there a good reason why we can’t have a lightweight, optionally typed extensible record system that “just works” and gives a zero-boilerplate way to interface with things like JSON? I’m not suggesting we repeat the experiment of embedding JSON into the language like we did XML, but surely there’s some way we can come up with a generic data value expression syntax that can then be ‘read into’ case classes, JSON, XML, and the rest, by virtue of having the right implicit in scope?

The lightest weight solution I can think of is to have a ‘named tuples’ type that gives something that shadows the tuples-as-hlist structures but with a constant field for the property name. If that was bolted onto some macro/inline magic, then the data expressions could be rendered directly into the appropriate calls, without going via an intermediate representation.

val dataExpr = (
  name = "Matthew",
  age = 43)
// NamedTuple["name".type, String]::NamedTuple["age".type, Int]::NamedTupleNil

case class NameAge(name: String, age: Int)
val exprAsCaseClass: NameAge = (
  name = "Matthew",
  age = 43)
// CstrArg[String], CstrArg[Int] => NamedTuple[....] => NameAge

val exprAsJson: Json = (
  name = "Matthew",
  age = 43)
// JsonProp[String], JsonProp[Int] => NamedTuple[...] => Json
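
The `NamedTuple[...] :: ...` shape in the comments above can be approximated in current Scala with an HList-style encoding and implicit derivation. The following is only a minimal sketch under stated assumptions: `Field`, `Cons`, `ToJson` and `toJson` are hypothetical names, the shape given to `JsonProp` is a guess, the field name is kept as a runtime `String` rather than a literal type such as `"name".type`, and string escaping is ignored. It also builds exactly the kind of intermediate structure that the proposal hopes macro/inline expansion could remove.

```scala
object NamedTupleSketch {
  // A single named field. Assumption: the name is a runtime String here;
  // the proposal would carry it in the type as a literal such as "name".type.
  final case class Field[V](name: String, value: V)

  // HList-style spine: Cons(field, tail), terminated by NamedTupleNil.
  sealed trait NamedTuple
  case object NamedTupleNil extends NamedTuple
  final case class Cons[V, T <: NamedTuple](head: Field[V], tail: T) extends NamedTuple

  // How to render one property value as JSON text (no string escaping).
  trait JsonProp[V] { def render(value: V): String }
  object JsonProp {
    implicit val stringProp: JsonProp[String] = new JsonProp[String] {
      def render(value: String): String = "\"" + value + "\""
    }
    implicit val intProp: JsonProp[Int] = new JsonProp[Int] {
      def render(value: Int): String = value.toString
    }
  }

  // Derived field by field, so a value type without a JsonProp
  // instance is rejected at compile time.
  trait ToJson[T <: NamedTuple] { def props(t: T): List[String] }
  object ToJson {
    implicit val nilToJson: ToJson[NamedTupleNil.type] =
      new ToJson[NamedTupleNil.type] {
        def props(t: NamedTupleNil.type): List[String] = Nil
      }
    implicit def consToJson[V, T <: NamedTuple](
        implicit prop: JsonProp[V], rest: ToJson[T]): ToJson[Cons[V, T]] =
      new ToJson[Cons[V, T]] {
        def props(t: Cons[V, T]): List[String] =
          ("\"" + t.head.name + "\":" + prop.render(t.head.value)) :: rest.props(t.tail)
      }
  }

  def toJson[T <: NamedTuple](t: T)(implicit tj: ToJson[T]): String =
    tj.props(t).mkString("{", ",", "}")

  // Stand-in for the proposed (name = "Matthew", age = 43) expression.
  val dataExpr = Cons(Field("name", "Matthew"), Cons(Field("age", 43), NamedTupleNil))
  val json: String = toJson(dataExpr) // {"name":"Matthew","age":43}
}
```

The target format is chosen purely by which implicit instances are in scope, which is the "read into" idea from the first post; a second type class in the same style could build `NameAge` instead.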

The hope would be that wherever possible, code handling named tuple data would be seeing someCallback(name: String, value: V) or anotherCallback[Name, V](value: V), and that the expressions themselves would only exist as nested function calls, and ideally not even that if the callbacks were macros that generated tightly-coupled code.
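
The callback shape described above can be sketched as a sink that receives each property directly. This is a hedged illustration, not a design: `PropSink`, `JsonWriter` and `writeNameAge` are hypothetical names, `JsonProp` is given an assumed shape, and string escaping is ignored. `writeNameAge` is written by hand here to stand in for what a macro or inline expansion of the data expression might reduce to.

```scala
object CallbackSketch {
  // Assumed shape: renders one property value as JSON text (no escaping).
  trait JsonProp[V] { def render(value: V): String }
  object JsonProp {
    implicit val stringProp: JsonProp[String] = new JsonProp[String] {
      def render(value: String): String = "\"" + value + "\""
    }
    implicit val intProp: JsonProp[Int] = new JsonProp[Int] {
      def render(value: Int): String = value.toString
    }
  }

  // Receives each property as a plain call; no tuple or tree is
  // allocated to represent the record as a whole.
  trait PropSink {
    def prop[V](name: String, value: V)(implicit jp: JsonProp[V]): Unit
  }

  // One possible sink: appends JSON text to a buffer as properties arrive.
  final class JsonWriter extends PropSink {
    private val sb = new StringBuilder("{")
    private var first = true
    def prop[V](name: String, value: V)(implicit jp: JsonProp[V]): Unit = {
      if (!first) sb.append(',')
      first = false
      sb.append('"').append(name).append("\":").append(jp.render(value))
    }
    def result(): String = sb.toString + "}"
  }

  // Hand-written stand-in for the expansion of (name = "Matthew", age = 43).
  def writeNameAge(sink: PropSink): Unit = {
    sink.prop("name", "Matthew")
    sink.prop("age", 43)
  }

  val writer = new JsonWriter
  writeNameAge(writer)
  val json: String = writer.result() // {"name":"Matthew","age":43}
}
```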

For data without a compile-time-known schema, we’d need the property names to be first-class rather than hidden in the type system.

I know various things like this already exist, particularly in shapeless, but they all come with a lot of ceremony and a not insignificant run-time cost, particularly in setting up and tearing down the intermediate representation, but also in the syntax and the game of guess-the-imports.

I have a use for semi-structured data like this in many places, particularly anything that touches databases or statistics.

lavrov · June 24, 2019, 7:07pm

Records were proposed by @odersky a while ago: https://github.com/lampepfl/dotty-feature-requests/issues/8. Are they still being considered for Scala 3?

odersky · June 25, 2019, 8:40am

https://dotty.epfl.ch/docs/reference/changed-features/structural-types.html

contains a discussion of how records can be implemented using Dotty’s improved structural types. I believe it would be possible to adapt this for JSON. If a standard way of doing so emerges, it would make a good candidate for inclusion in the standard library.
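
For readers who have not followed the link, the record encoding there looks roughly like the sketch below. It requires Scala 3 (Dotty), and the wrapper object `RecordSketch` and the value `summary` are added here only for illustration. Every field access such as `person.name` is rewritten by the compiler into `selectDynamic("name")` plus a cast, so access is by key, and the `asInstanceOf[Person]` is unchecked: a missing field only fails at run time.

```scala
// Requires Scala 3 (Dotty): structural member selection on a Selectable.
object RecordSketch {
  // A record backed by a map from field name to value.
  class Record(elems: (String, Any)*) extends Selectable {
    private val fields = elems.toMap
    // Called by the compiler for each structural field access.
    def selectDynamic(name: String): Any = fields(name)
  }

  // The static view of the record: field names and types.
  type Person = Record { val name: String; val age: Int }

  // Unchecked cast: nothing verifies that the map matches Person.
  val person = new Record("name" -> "Emma", "age" -> 42).asInstanceOf[Person]
  val summary = s"${person.name} is ${person.age} years old."
}
```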

AMatveev · June 25, 2019, 1:18pm

If I understand correctly, structural types access fields by key.
The documentation says that this is intended for database access.

It is very important to note that most JDBC drivers access columns by index.
I have written a test which illustrates that processing large data by key is significantly slower:

by key
  used memory:408000000
  total time:2.458646836
by index
  used memory:17600000
  total time:0.031717541
ratio
  memory:23.18181818181818181818181818181818
  time:77.51694357390442090072493324750491

So I am sure good database support should provide:

access by index
It significantly improves scalability.
match type transformation
It is absolutely necessary for a good library to be able to transform a “column named tuple” into a “value named tuple” (access by name is the feature we most want for Slick).

I think it can be done with the Flyweight pattern:

the compiler should provide values: Product and meta in the case of constants
the compiler should provide meta in the case of collections and factories
the library should provide creation of runtime rows
the library should provide mapping between different row types

// Sketch of the proposed API; the (image = c.image) named-tuple syntax is hypothetical.
trait Row extends Product {
  def getMeta(): RowMeta
}
object QueryBuilder {
  // Body omitted in the sketch.
  def executeByQuery[T <: QueryColumns, E: QueryToRow[T]](qc: T)(implicit meta: Meta[E]): List[E] = ???
}
object Main {
  def main(): Unit = {
    QueryBuilder.executeByQuery(
      for (c <- coffees) yield (image = c.image)
    ).foreach { r =>
      println(r.image)
    }
  }
}
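
The Flyweight idea above can be sketched in plain Scala as follows. This is a minimal illustration under assumptions, not the proposed compiler support: `RowMeta` is given an invented shape, `ArrayRow` and `get` are hypothetical names, and the coffee data is made up. The column names and their positions live once in a shared `RowMeta`, while each row holds only its values and a reference to that meta.

```scala
object FlyweightRowSketch {
  // Shared once per result set: column names and their positions.
  final class RowMeta(val columns: Vector[String]) {
    private val indexByName: Map[String, Int] = columns.zipWithIndex.toMap
    def indexOf(name: String): Int = indexByName(name)
  }

  trait Row extends Product {
    def getMeta(): RowMeta
  }

  // Per row: only the values array plus a reference to the shared meta.
  final class ArrayRow(meta: RowMeta, values: Array[Any]) extends Row {
    def getMeta(): RowMeta = meta
    def productArity: Int = values.length
    def productElement(n: Int): Any = values(n) // access by index
    def canEqual(that: Any): Boolean = that.isInstanceOf[ArrayRow]
  }

  // Access by name costs one lookup in the shared meta, then an index read.
  def get(row: Row, name: String): Any =
    row.productElement(row.getMeta().indexOf(name))

  // Made-up sample data.
  val meta = new RowMeta(Vector("name", "image"))
  val rows: List[Row] = List(
    new ArrayRow(meta, Array[Any]("Colombian", "colombian.png")),
    new ArrayRow(meta, Array[Any]("Espresso", "espresso.png"))
  )

  // Resolve the column position once, then read every row by index.
  val imageIdx: Int = meta.indexOf("image")
  val images: List[Any] = rows.map(_.productElement(imageIdx))
}
```

With compiler support, the name-to-index resolution done here by `meta.indexOf` could in principle happen at compile time, so that `r.image` becomes a direct index read.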

Test code

object RowPerformanceTest {
  val arraySize = 100000
  var keyArray: Array[java.util.HashMap[String,BigDecimal]] = _
  var indexArray: Array[Array[BigDecimal]] = _
  var startTime:Long = _
  var startUsedMemory:Long = _
  var endTime: Long = _
  var endUsedMemory: Long = _
  def begin(): Unit ={
    keyArray = Array.ofDim[java.util.HashMap[String,BigDecimal]](arraySize)
    indexArray = Array.ofDim[Array[BigDecimal]](arraySize)
    Runtime.getRuntime.gc()
    startTime = System.nanoTime()
    startUsedMemory = Runtime.getRuntime.totalMemory() - Runtime.getRuntime.freeMemory()
  }
  def end(): Unit = {
    endTime = System.nanoTime()
    Runtime.getRuntime.gc()
    endUsedMemory = Runtime.getRuntime.totalMemory() - Runtime.getRuntime.freeMemory()
  }
  def main(args: Array[String]): Unit = {
    def testByKey(): Unit ={
      var i = 0
      while(i<keyArray.length){
        val map = new java.util.HashMap[String,BigDecimal]()
        var j = 0
        while(j<40){
          map.put(s"column_$j",BigDecimal(j))
          j=j+1
        }
        keyArray(i)= map
        i=i+1
      }
    }
    def testByIndex(): Unit ={
      var i = 0
      while(i<indexArray.length){
        val array = Array.ofDim[BigDecimal](40)
        var j = 0
        while(j<40){
          array(j)=BigDecimal(j)
          j=j+1
        }
        indexArray(i)= array
        i=i+1
      }
    }
    // Each test runs twice; the second run overwrites the first run's
    // measurements, so the first run acts as a warm-up.
    begin()
    testByKey()
    end()
    begin()
    testByKey()
    end()
    println("by key")
    val byKeyUsedMemory = BigDecimal(endUsedMemory-startUsedMemory) // bytes
    val byKeyTotalTime =  BigDecimal(endTime-startTime)/1000000000 // seconds
    println(s"  used memory:$byKeyUsedMemory")
    println(s"  total time:$byKeyTotalTime")
    begin()
    testByIndex()
    end()
    begin()
    testByIndex()
    end()
    println("by index")
    val byIndexUsedMemory = BigDecimal(endUsedMemory-startUsedMemory) // bytes
    val byIndexTotalTime =  BigDecimal(endTime-startTime)/1000000000 // seconds
    println(s"  used memory:$byIndexUsedMemory")
    println(s"  total time:$byIndexTotalTime")
    println("ratio")
    println(s"  memory:${byKeyUsedMemory/byIndexUsedMemory}")
    println(s"  time:${byKeyTotalTime/byIndexTotalTime}")
  }
}