Discussion about structural types (index-based)

It is said in Programmatic Structural Types:
Some usecases, such as modelling database access, are more awkward in statically typed languages than in dynamically typed languages: With dynamically typed languages, it’s quite natural to model a row as a record or object, and to select entries with simple dot notation.

Unfortunately, a dynamic approach has serious disadvantages.

  • It performs poorly on big data
  • It is error-prone; the compiler does not catch typos

Is it possible to solve these disadvantages at all?

I think it is possible if we move the dynamics from a single row to the whole dataset.
That makes it possible to improve performance significantly and to minimize the chance of errors.

Unfortunately, I do not have the experience to write a good SIP. I can only propose an idea. Please take it into consideration.

First, we need a row abstraction.

  trait Record extends Product {
      def set(i: Int, v: Any): Record
      def get(i: Int): Any
  }

  object Record {
      def create(): Record = ...
  }

  type Person = Record {
    val id: Long
    val name: String
    val age: Int
    val checkDate: Date
  }

So we would be able to write

  val p:Person = (name="name1", age=1)

It could desugar to

  val p:Person = Record.create().set(1,"name1").set(2,1).asInstanceOf[Person]
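For illustration, a minimal array-backed row behind such a Record could look like this (a sketch; it is kept standalone here rather than extending the trait above, and the fixed arity passed to the constructor is an assumption):

```scala
// Sketch of an array-backed row: both set and get are O(1) index accesses,
// with no per-cell hashing of column names.
final class ArrayRecord(size: Int) {
  private val cells = new Array[Any](size)
  def set(i: Int, v: Any): ArrayRecord = { cells(i) = v; this }
  def get(i: Int): Any = cells(i)
}

// (name = "name1", age = 1) would then desugar roughly to:
val p = new ArrayRecord(4).set(1, "name1").set(2, 1)
```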

In practice, any data factory would use dynamic binding to create rows.
The performance improvement comes from doing that additional work once per dataset rather than once per row.

object Dao {
  def queryAll[T <: Record]()(implicit meta: RecordMeta[T]): Traversable[T] = {
     new Traversable[T] {
        def foreach[U](body: T => U): Unit = {
            // open dataset
            val s = ConnectionContext.connection.executeQuery(...)
            // create the meta map once per dataset
            val mapArray = new Array[Int](meta.size)
            ....
            var hasRow = s.next
            while (hasRow) {
               val r = Record.create().asInstanceOf[T]
               var i = 0
               while (i < mapArray.length) {
                  // index access seems about 70 times cheaper than access by key;
                  // we could even use a zero-copy approach
                  r.set(i, s.get(mapArray(i)))
                  i += 1
               }
               body(r)
               hasRow = s.next
            }
        }
     }
  }
  def updateAll[T <: Record](tableName: String)(data: ArrayBuilder[T] => Unit)(implicit meta: RecordMeta[T]): Unit = ...
}

So we would be able to write:

  Dao.update[(id:Long,checkDate:Date)]("person"){ b => 
    for(p <- Dao.queryAll[Person]()){
     println(p.age)
     b+= (id = p.id, checkDate = sysdate)
    }
  }

It could desugar to

  Dao.update[(id:Long,checkDate:Date)]("person"){ b => 
    for(p <- Dao.queryAll[Person]()){
     println(p.get(2).asInstanceOf[Int])
     b+= Record.create().set(0,p.get(0)).set(1,sysdate)
    }
  }

I think if the number of columns is greater than 5, such an approach will improve readability significantly and reduce the chance of errors, at the same runtime speed.

It would be great if there were a way to transform record types with match types (but that is a long way off, so here is only a very short example):

val q: Seq[(name: Column[String], age: Column[Int])] =
  for (c <- Persons) yield (name = c.name, age = c.age)
for (r <- q.result) {
  // r: (name: String, age: Int)
}

See also:

Scala 3’s structural types are pretty much intended for this use case, AFAIK…

I am unsure whether I understand you correctly.
It seems the reference is broken.
If you mean the Proposal for programmatic structural types,

I think it does not solve the task, because it performs poorly on big data.
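For context, here is a minimal sketch of how Scala 3's programmatic structural types (Selectable) model such a row; every field access is rewritten to a selectDynamic call, i.e. a by-key map lookup per cell, which is the cost being discussed (the Row class and Person alias are illustrative assumptions):

```scala
// Scala 3: selecting a field of a structural type backed by Selectable
// is rewritten to selectDynamic(fieldName), a by-key lookup on every access.
class Row(values: Map[String, Any]) extends Selectable {
  def selectDynamic(name: String): Any = values(name)
}

type Person = Row { val name: String; val age: Int }

val p = new Row(Map("name" -> "name1", "age" -> 1)).asInstanceOf[Person]
// p.name compiles to p.selectDynamic("name").asInstanceOf[String]
```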

The code is still confusing. You haven’t provided a definition of Dao.update. Could you prepare full working code (in current Scala or Dotty) that anyone can run without figuring out the missing pieces? Then add commentary about where the extra record syntax would help.

Throwing unfinished code will produce unfinished ideas.

1 Like

Sorry, I don’t understand the motivation again :frowning:

The question was: where is the performance gain?

It seems obvious to me.

The syntax is very similar to Dotty’s programmatic structural types, so I think that is obvious too.

Which questions still need to be clarified?

Glad to hear it. Can you restate it so it’s clear to the rest of us?

2 Likes

Odersky has written:

  • I believe structural types are used rarely not primarily because they have bad performance

I disagree with him. There is a big difference between a row and a cell. But I don’t believe that a working example can change the situation. If we could use “for” to iterate over cells, I would be happy. But currently that is not possible, so we have to use classes in situations where they are inconvenient.

Yes, it is simple enough. The real problem is to prove it.

If you work in another domain area, there is a wall between us. That is a usual situation, so of course I don’t blame anyone.

Access by key is 77 times slower than access by index. If you need to iterate over 100 000 × 20 cells, that is a really significant overhead.
Of course the final difference will be smaller, but it will still be significant.
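The ratio can be sanity-checked with a small micro-benchmark sketch (all names here are made up for illustration; without JIT warm-up the measured ratio is only indicative and will vary by JVM and data):

```scala
def timeNanos(body: => Unit): Long = {
  val t0 = System.nanoTime(); body; System.nanoTime() - t0
}

val cols = 20
val rowCount = 100000
val names = Array.tabulate(cols)(i => "col" + i)
val byKey: Map[String, Int] = names.zipWithIndex.toMap   // cells addressed by column name
val byIndex: Array[Int] = Array.tabulate(cols)(identity) // the same cells addressed by position

var sink = 0L // accumulate results so the loops cannot be optimized away
val tKey = timeNanos {
  var r = 0
  while (r < rowCount) {
    var c = 0
    while (c < cols) { sink += byKey(names(c)); c += 1 }
    r += 1
  }
}
val tIndex = timeNanos {
  var r = 0
  while (r < rowCount) {
    var c = 0
    while (c < cols) { sink += byIndex(c); c += 1 }
    r += 1
  }
}
val ratio = tKey.toDouble / tIndex // > 1 is expected; the exact factor varies
```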

In another discussion you showed just a 7x performance difference (instead of the 77x here) on a very simple benchmark, so something’s not adding up. Maybe a raw java.util.HashMap is just much faster than the column-name-to-column-index mapping that is built into your JDBC driver? Have you tried creating your own Map[String, Int] for the column-name-to-column-index mapping and then using only the column index when invoking the JDBC API?
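That suggestion might look like the following sketch (columnIndexMap is a hypothetical helper; the commented part shows how it would plug into the JDBC API):

```scala
// Build the name -> index map once per result set; every subsequent
// cell read then uses only the integer index.
def columnIndexMap(labels: Seq[String]): Map[String, Int] =
  labels.zipWithIndex.map { case (label, i) =>
    label.toLowerCase -> (i + 1) // JDBC column indices are 1-based
  }.toMap

// With a java.sql.ResultSet `rs`, the map would be built from its metadata:
//   val md  = rs.getMetaData
//   val idx = columnIndexMap((1 to md.getColumnCount).map(md.getColumnLabel))
//   while (rs.next()) { val name = rs.getString(idx("sgdsname")); /* ... */ }
```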

2 Likes

That was a test for pure dynamic objects. I have also said in other discussions that I have experienced a performance decrease of about 1.5 - 3 times.

We usually work with JDBC by index.

I think what you want are named tuples, for which:

namedTuple {a:Int,b:Int} != namedTuple {b:Int,a:Int}

holds.

Records should be order independent, meaning:

record {a:Int,b:Int} == record {b:Int,a:Int}
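The distinction can be illustrated with plain Scala values (a sketch: ordinary tuples stand in for the proposed named tuples, and a Map stands in for an order-independent record):

```scala
// Order-dependent, like named tuples: the position of a field matters
val nt1: (Int, Int) = (1, 2) // would be (a = 1, b = 2)
val nt2: (Int, Int) = (2, 1) // the same fields written in the other order
val tuplesDiffer = nt1 != nt2 // reordering changes the value

// Order-independent, like a record keyed by field name
val r1 = Map("a" -> 1, "b" -> 2)
val r2 = Map("b" -> 2, "a" -> 1)
val recordsEqual = r1 == r2 // only the name -> value pairs matter
```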

Tarsa has noticed that tuples are case classes.
So I have chosen a more neutral name for the topic, because I think it would be better to have the ability to use any data structure. That would allow, for example, zero-copy implementations on direct buffers. But quite frankly, named tuples are a solution, at least for our current tasks.

There is a very interesting feature: Multi-Stage Programming

I have the same question: how could I implement well-performing glue code in such a paradigm?

I don’t understand why database access is more awkward to model with statically typed languages than with dynamically typed languages.

Personally, I find ORM with annotations to be the best option for solving this kind of task.

The same for SOAP (XML) and REST (JSON).

The only case I could imagine where multi-staging is applicable is with semi-structured databases like NoSQL ((Mongo | Couch)-DB).
For strongly nested structures, it may be better to generate a more efficient flat record that can be accessed more quickly when requesting deep members.
And it only makes sense if this flat structure is accessed very often, in order to offset the additional work of transforming a JSON node into a flat struct at runtime.

Nonetheless, hash-like performance still applies when accessing members, as you don’t know whether they exist.
Moreover, indexing doesn’t make sense in this case, as you don’t know the order of the members pulled from the database.

You are right. It is very useful; we also use ORM for many tasks.

It is a tricky question.

I think the answer is that popular static languages are object-oriented, so they simply do not try to provide comfortable access to relational data. Dynamic languages have no such problems, because their structures are very flexible.

I would want the best of both worlds from Scala. That is a really difficult task in practice.
I respect Scala very much for its ability to solve such tasks.

We currently solve such tasks by doing the mapping once per dataset instead of once per row.
I have tried to illustrate that in this topic.
It is quite a common optimization strategy; see JDBC batching.
Quite frankly, if business logic cannot be batch processed, it is not scalable.

It can be illustrated by an example; let’s look at it in some dynamic language:

recs = sql("""
   select g.idGds
            ,g.idGdsType
            ,g.sGdsName
            ,sum(i.Cost) nCost
            ....
            ,avg(i.n20) n20
     from gds g
     left join gds_type t on t.idgs=g.id 
     .....
    group by ....
""")

for g in recs.sortBy(it.idGdsType) do {
    println(g.sGdsName)
    ...
}

So we can see very little boilerplate code.
A static language just cannot provide such succinctness.
One can say: you can use POJO classes (case classes).
But in practice there would be an unwanted class for each query. We can easily imagine how comfortable it is to program without anonymous classes or lambda expressions. Quite frankly, it is not comfortable at all.
Scala can offer the “for” and “Dynamic” abstractions, but they have a dynamic nature: they do not have static performance, and they are error-prone as well.

One can say it is a rare case, and they would be right. It is a rare case when you use a database only to store some state. But it is a very common case when you write, for example, an ERP system.
IMHO it is not a rare case in the whole world either.
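The “Dynamic” abstraction mentioned above can be sketched as follows; field access compiles to a by-key lookup, and a misspelled column name still compiles, failing only at runtime:

```scala
import scala.language.dynamics

// A row backed by a name -> value map, accessed with dot notation via scala.Dynamic
class DynRow(cells: Map[String, Any]) extends Dynamic {
  def selectDynamic(name: String): Any = cells(name) // by-key and unchecked
}

val g = new DynRow(Map("sGdsName" -> "apples", "idGdsType" -> 3))
val gdsName = g.sGdsName // compiles to g.selectDynamic("sGdsName")
```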

What language is that? I’m confused, because g seems to be the loop variable, but it is also used outside of the loop.

Sorry, it seems I have made a typo, but I cannot see where.

It is the abstract dynamic language of my dreams.
But I can make something similar in Groovy or JEXL if it is really important.
For example, this is from one of our test cases for JEXL:

    var l= sql("select 1 d, 2 e").asList()
    for(r:l){
      println(r.d);
      println(r.e);
    }

One can say: you can use POJO classes (case classes).
But in practice there would be an unwanted class for each query. We can easily imagine how comfortable it is to program without anonymous classes or lambda expressions.

Then it would be wise to provide some kind of anonymous case class, i.e. an anonymous named tuple.
The main point here is that you need some kind of type lambda that takes the original table type (a case class) as an argument, plus the projected fields (the select), and then synthesizes a new anonymous case class containing only the provided fields with the corresponding types.

Either some kind of RTTI about the types would be necessary, or one uses a macro for this task.
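Without macros, such a projection can at least be approximated today with plain tuples (a sketch; the field names are lost in the result type, which is exactly what a synthesized anonymous case class or named tuple would fix):

```scala
case class Gds(idGds: Long, idGdsType: Int, sGdsName: String)

val table = List(Gds(1L, 10, "apples"), Gds(2L, 20, "pears"))

// "select sGdsName, idGdsType": the result type (String, Int) is anonymous,
// but the names sGdsName and idGdsType are gone from the type
val projected: List[(String, Int)] = table.map(g => (g.sGdsName, g.idGdsType))
```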

My dream would be to throw out any kind of SQL, because it is the worst part of relational databases: it is nice when it fits, but, as with any declarative language, it becomes a mess once your query gets more complex.

I think the best approach is to have repositories of types:


private val personRepo: Repository[Person] = ...
private val vehicleRepo: Repository[Vehicle] = ...
private val houseRepo: Repository[House] = ...

Person, Vehicle, and House are either ORM-annotated or must implement a specific Repository trait in which primary and foreign keys are defined.
Then you create a Person and throw it into the personRepo, which automatically adds a row to vehicleRepo and houseRepo for you. For this to work, references between objects need to be mapped to primary and foreign keys, either by annotation or by trait implementation.
You search a repo simply by filtering, update the objects you get from it, and that’s it; the synchronization is done automatically for you.
Or you filter objects in some repository and project some fields out of them, yielding a new structural type with the projected fields.
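A minimal in-memory sketch of that repository idea (a hypothetical API: the names, the key function, and the absence of cross-repository synchronization are all simplifications):

```scala
import scala.collection.mutable

trait Repository[T] {
  def add(value: T): Unit
  def update(value: T): Unit
  def filter(p: T => Boolean): Seq[T]
}

// Keys come from a function instead of annotations, to keep the sketch short
final class InMemoryRepository[T](key: T => Long) extends Repository[T] {
  private val rows = mutable.LinkedHashMap.empty[Long, T]
  def add(value: T): Unit = rows(key(value)) = value
  def update(value: T): Unit = rows(key(value)) = value
  def filter(p: T => Boolean): Seq[T] = rows.values.filter(p).toSeq
}

case class Person(id: Long, name: String)

val personRepo: Repository[Person] = new InMemoryRepository[Person](_.id)
```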

Maybe the Slick framework already provides this kind of functionality. I don’t know, but I think this is the right direction for a maximum of comfort, or laziness.