JSON and a first-class extensible records syntax

#1

There are a ton of Scala JSON libraries, but all of them involve a significant amount of “getting in the way”: arcane imports, different syntaxes, and so on. I’ve been doing some TypeScript hacking and, not that surprisingly, JSON just works in that world. The Dotty type system is able to represent just about everything that exists in TypeScript. So is there a good reason why we can’t have a lightweight, optionally typed extensible record system that “just works” and gives a zero-boilerplate way to interface with things like JSON? I’m not suggesting we repeat the experiment of embedding JSON into the language like we did with XML, but surely there’s some way we can come up with a generic data value expression syntax that can then be ‘read into’ case classes, JSON, XML, and the rest, by virtue of having the right implicit in scope?

The lightest-weight solution I can think of is a ‘named tuples’ type that shadows the tuples-as-HList structure but carries a constant field for each property name. If that were bolted onto some macro/inline magic, the data expressions could be rendered directly into the appropriate calls, without going via an intermediate representation.

val dataExpr = (
  name = "Matthew",
  age = 43)
// NamedTuple["name".type, String]::NamedTuple["age".type, Int]::NamedTupleNil

case class NameAge(name: String, age: Int)
val exprAsCaseClass: NameAge = (
  name = "Matthew",
  age = 43)
// CstrArg[String], CstrArg[Int] => NamedTuple[....] => NameAge

val exprAsJson: Json = (
  name = "Matthew",
  age = 43)
// JsonProp[String], JsonProp[Int] => NamedTuple[...] => Json

The hope would be that wherever possible, code handling named tuple data would be seeing someCallback(name: String, value: V) or anotherCallback[Name, V](value: V), and that the expressions themselves would only exist as nested function calls, and ideally not even that if the callbacks were macros that generated tightly-coupled code.
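To make the callback idea concrete, here is a minimal Scala 3 sketch that uses no new syntax. `Field`, `JsonProp`, `JsonFields` and `toJson` are hypothetical names made up for this illustration, and the string rendering is deliberately naive (no escaping). Each field carries its name as a literal type, and given resolution walks the tuple so that every field is handled by a `render(name, value)` call, with no intermediate record structure beyond the tuple itself.

```scala
// Hypothetical encoding: a value tagged with its field name as a literal type.
final case class Field[K <: String, V](value: V)

// Per-field callback in the someCallback(name: String, value: V) style.
trait JsonProp[V]:
  def render(name: String, value: V): String

object JsonProp:
  // Naive quoting, no escaping: enough for a sketch only.
  private def quote(s: String): String = "\"" + s + "\""

  given JsonProp[String] with
    def render(name: String, value: String): String =
      quote(name) + ":" + quote(value)

  given JsonProp[Int] with
    def render(name: String, value: Int): String =
      quote(name) + ":" + value.toString

// Walks a tuple of Fields, one callback per field.
trait JsonFields[T <: Tuple]:
  def render(fields: T): List[String]

object JsonFields:
  given JsonFields[EmptyTuple] with
    def render(fields: EmptyTuple): List[String] = Nil

  given [K <: String, V, Rest <: Tuple](using
      name: ValueOf[K],
      prop: JsonProp[V],
      rest: JsonFields[Rest]
  ): JsonFields[Field[K, V] *: Rest] with
    def render(fields: Field[K, V] *: Rest): List[String] =
      prop.render(name.value, fields.head.value) :: rest.render(fields.tail)

def toJson[T <: Tuple](fields: T)(using jf: JsonFields[T]): String =
  jf.render(fields).mkString("{", ",", "}")

// Stand-in for the proposed (name = "Matthew", age = 43) expression.
val dataExpr =
  Field["name", String]("Matthew") *: Field["age", Int](43) *: EmptyTuple

val rendered: String = toJson(dataExpr)
```

The ceremony of writing `Field["name", String](...)` by hand is exactly what the proposed syntax would remove; the sketch only shows that the target of such a desugaring can already be expressed.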

For data without a compile-time-known schema, we’d need the property names to be first-class rather than hidden in the type system.
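As a rough illustration of the run-time case, the sketch below keeps property names as ordinary values. `DynRecord` and `fromCsv` are hypothetical names invented here, and the CSV handling is intentionally simplistic (no quoting or escaping).

```scala
// Hypothetical run-time record: field names are plain values, not types.
final case class DynRecord(fields: Vector[(String, Any)]):
  // Look a field up by name; None if the schema does not contain it.
  def get(name: String): Option[Any] =
    fields.collectFirst { case (n, v) if n == name => v }

  // Names in declaration order, available for inspection at run time.
  def names: Vector[String] = fields.map(_._1)

// Build a record whose schema is only known once the header has been read.
def fromCsv(header: String, line: String): DynRecord =
  DynRecord(header.split(",").toVector.zip(line.split(",").toVector))

val row = fromCsv("name,age", "Matthew,43")
```

Here `row.names` and `row.get("age")` work without any compile-time knowledge of the columns, at the cost of `Any`-typed values.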

I know various things like this already exist, particularly in shapeless, but they all come with a lot of ceremony and a not insignificant run-time cost, particularly in setting up and tearing down the intermediate representation, but also in the syntax and the game of guess-the-imports.

I have a use for semi-structured data like this in many places, particularly anything that touches databases or statistics.

#2

Records were proposed by @odersky a while ago: https://github.com/lampepfl/dotty-feature-requests/issues/8. Are they still being considered for Scala 3?

#3

https://dotty.epfl.ch/docs/reference/changed-features/structural-types.html

contains a discussion of how records can be implemented using Dotty’s improved structural types. I believe it would be possible to adapt this for JSON. If a standard way of doing so emerges, it would make a good candidate for inclusion in the standard library.
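For readers who have not opened the linked page, the record encoding it describes looks roughly like the following in Scala 3 syntax. `Record` and `Person` are user-defined names in this sketch, not standard library types.

```scala
// A record backed by a Map; field selection goes through selectDynamic.
class Record(elems: (String, Any)*) extends Selectable:
  private val fields = elems.toMap
  def selectDynamic(name: String): Any = fields(name)

// The structural type supplies the static view of the fields.
type Person = Record {
  val name: String
  val age: Int
}

// The cast is unchecked: nothing verifies that the map matches the type.
val person = Record("name" -> "Emma", "age" -> 42).asInstanceOf[Person]

val personName: String = person.name // compiles to selectDynamic("name")
val personAge: Int = person.age      // compiles to selectDynamic("age")
```

Every selection is a map lookup by key plus a cast.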

#4

If I understand correctly, structural types access fields by key.
The documentation says that they are intended for database access.

It is very important to note that most JDBC drivers access columns by index.
I have written a test which illustrates that processing large data sets by key is significantly slower:

by key
  used memory:408000000
  total time:2.458646836
by index
  used memory:17600000
  total time:0.031717541
ratio
  memory:23.18181818181818181818181818181818
  time:77.51694357390442090072493324750491

So I am sure that good database support should provide:

  • access by index
    It significantly improves scalability.
  • match type transformation
    A good library absolutely needs to be able to transform a “column named tuple” into a “value named tuple”. (Access by name is the feature we most want for Slick.)

I think it can be done with the Flyweight pattern:

  • the compiler should provide values: Product and meta in the case of constants
  • the compiler should provide meta in the case of collections and factories
  • the library should provide creation of the runtime row
  • the library should provide mapping between different row types
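trait Row extends Product {
  // Shared metadata (column names and positions) for all rows of one query.
  def getMeta(): RowMeta
}
object QueryBuilder {
  // Sketch only: QueryColumns, QueryToRow and Meta stand for library types that do not exist yet.
  def executeByQuery[T <: QueryColumns, E: QueryToRow[T]](qc: T)(implicit meta: Meta[E]): List[E] = ???
}
object Main {
  def main(): Unit = {
    // Uses the named-tuple syntax proposed in this thread.
    QueryBuilder.executeByQuery(
      for (c <- coffees) yield (image = c.image)
    ).foreach { r =>
      println(r.image)
    }
  }
}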
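A minimal runnable sketch of the flyweight idea, assuming nothing about the eventual compiler support: all rows of a result share one metadata object that maps column names to positions, each row stores only an array of values, and a name is resolved to an index once per query, not once per row. `RowMeta` and `ArrayRow` are hypothetical names used for illustration.

```scala
// Shared (flyweight) metadata: one instance per query result.
final class RowMeta(val columns: Vector[String]):
  private val positions: Map[String, Int] = columns.zipWithIndex.toMap
  def indexOf(column: String): Int = positions(column)

// A row holds only its values; names live in the shared RowMeta.
final class ArrayRow(val meta: RowMeta, values: Array[Any]) extends Product:
  def productArity: Int = values.length
  def productElement(n: Int): Any = values(n)
  def canEqual(that: Any): Boolean = that.isInstanceOf[ArrayRow]
  // Convenience access by name: one map lookup, then an array read.
  def byName(column: String): Any = values(meta.indexOf(column))

val meta = RowMeta(Vector("image", "price"))
val rows = Vector(
  ArrayRow(meta, Array[Any]("a.png", BigDecimal(3))),
  ArrayRow(meta, Array[Any]("b.png", BigDecimal(4)))
)

// Resolve the column name once, then read every row by index.
val imageIdx = meta.indexOf("image")
val images = rows.map(_.productElement(imageIdx))
```

With compiler support, `r.image` could in principle be rewritten to the indexed read, so the by-name syntax would not imply a by-key lookup per row.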

Test code

object RowPerformanceTest {
  val arraySize = 100000
  var keyArray: Array[java.util.HashMap[String,BigDecimal]] = _
  var indexArray: Array[Array[BigDecimal]] = _
  var startTime:Long = _
  var startUsedMemory:Long = _
  var endTime: Long = _
  var endUsedMemory: Long = _
  def begin(): Unit ={
    keyArray = Array.ofDim[java.util.HashMap[String,BigDecimal]](arraySize)
    indexArray = Array.ofDim[Array[BigDecimal]](arraySize)
    Runtime.getRuntime.gc()
    startTime = System.nanoTime()
    startUsedMemory = Runtime.getRuntime.totalMemory() - Runtime.getRuntime.freeMemory()
  }
  def end(): Unit = {
    endTime = System.nanoTime()
    Runtime.getRuntime.gc()
    endUsedMemory = Runtime.getRuntime.totalMemory() - Runtime.getRuntime.freeMemory()
  }
  def main(args: Array[String]): Unit = {
    def testByKey(): Unit ={
      var i = 0
      while(i<keyArray.length){
        val map = new java.util.HashMap[String,BigDecimal]()
        var j = 0
        while(j<40){
          map.put(s"column_$j",BigDecimal(j))
          j=j+1
        }
        keyArray(i)= map
        i=i+1
      }
    }
    def testByIndex(): Unit ={
      var i = 0
      while(i<keyArray.length){
        val array = Array.ofDim[BigDecimal](40)
        var j = 0
        while(j<40){
          array(j)=BigDecimal(j)
          j=j+1
        }
        indexArray(i)= array
        i=i+1
      }
    }
    begin()
    testByKey()
    end()
    begin()
    testByKey()
    end()
    println("by key")
    val byKeyUsedMemory = BigDecimal(endUsedMemory-startUsedMemory)
    val byKeyTotalTime =  BigDecimal(endTime-startTime)/1000000000
    println(s"  used memory:$byKeyUsedMemory")
    println(s"  total time:$byKeyTotalTime")
    begin()
    testByIndex()
    end()
    begin()
    testByIndex()
    end()
    println("by index")
    val byIndexUsedMemory = BigDecimal(endUsedMemory-startUsedMemory)
    val byIndexTotalTime =  BigDecimal(endTime-startTime)/1000000000
    println(s"  used memory:$byIndexUsedMemory")
    println(s"  total time:$byIndexTotalTime")
    println("ratio")
    println(s"  memory:${byKeyUsedMemory/byIndexUsedMemory}")
    println(s"  time:${byKeyTotalTime/byIndexTotalTime}")
  }
}


#5

Thanks. Although JSON is an obvious use case, my two concerns are a concise syntax that supports migration between explicitly typed and untyped expressions, and a mechanism that allows zero-overhead compile-time rewriting of a record into the underlying API calls. Ideally the structural type should only ever be a fiction of the source-level representation, and should always be compiled down to a concrete representation, be it JSON, XML, a Map[String, Any], a js.Object, or whatever it is.

Compare the ceremony of:

val person = Record("name" -> "Emma", "age" -> 42).asInstanceOf[Person]

with

val person = (name = "Emma", age = 42)

Or, if you happen to have a Person type handy,

val person: Person = (name = "Emma", age=42)

Think of a structural type as a named argument list waiting for the function/constructor that will turn it into a first-class value.
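The “named argument list waiting for a constructor” reading can be approximated today with Scala 3 mirrors. In the sketch below, `construct` is a hypothetical helper, not an existing API: it takes name/value pairs, reorders them according to the case class’s field labels at the inlined call site, and hands them to the constructor. Unlike the proposed syntax, the names and value types here are only checked at run time.

```scala
import scala.compiletime.constValueTuple
import scala.deriving.Mirror

case class Person(name: String, age: Int)

// Hypothetical helper: feed name/value pairs to A's constructor by label.
inline def construct[A](fields: (String, Any)*)(using m: Mirror.ProductOf[A]): A =
  // Field labels of A, in declaration order, known at compile time.
  val labels = constValueTuple[m.MirroredElemLabels].toList.map(_.toString)
  val byName = fields.toMap
  // Throws NoSuchElementException if a label is missing from the arguments.
  val ordered = labels.map(byName).toArray[Any]
  m.fromProduct(Tuple.fromArray(ordered))

// Argument order does not matter, as with named arguments.
val emma = construct[Person]("age" -> 42, "name" -> "Emma")
```

A first-class syntax could do the same reordering and checking entirely at compile time and emit a direct constructor call.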
