Discussion about structural types (index-based)

Access by key is 77 times slower than by index. If you need to iterate over 100 000 * 20 cells, that is a really significant overhead.
Of course the final difference will be smaller, but it will still be significant.

In the other discussion you showed just a 7x performance difference (instead of 77x here) on a very simple benchmark, so something’s not adding up. Maybe raw java.util.HashMap is just much faster than the column-name-to-column-index mapping that is built into your JDBC driver? Have you tried creating your own Map[String, Int] for the column-name-to-index mapping and then using only the column index when invoking the JDBC API?
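
To make that concrete, here is a minimal sketch (not from the thread) of resolving column names to indexes once per ResultSet and then reading cells only by index inside the loop; the column names are just examples:

    import java.sql.ResultSet

    // Sketch only: build the name-to-index map once per result set,
    // then use nothing but indexes in the hot per-row loop.
    def readAll(rs: ResultSet): Vector[(Long, String)] = {
      val meta = rs.getMetaData
      val columnIndex: Map[String, Int] =
        (1 to meta.getColumnCount).map(i => meta.getColumnLabel(i) -> i).toMap

      val idGds    = columnIndex("idGds")     // example column names
      val sGdsName = columnIndex("sGdsName")

      val rows = Vector.newBuilder[(Long, String)]
      while (rs.next()) {
        rows += ((rs.getLong(idGds), rs.getString(sGdsName)))  // index-based access per row
      }
      rows.result()
    }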


It is a test for pure dynamic objects. I have also said in other discussions that I have experienced a performance decrease of about 1.5 to 3 times.

We usually work with JDBC by index.

I think what you want are named tuples, for which:

namedTuple {a:Int,b:Int} != namedTuple {b:Int,a:Int}

holds.

Records should be order independent, meaning:

record {a:Int,b:Int} == record {b:Int,a:Int}
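
For what it’s worth, a minimal sketch (assuming a recent Scala 3 release where named tuples exist, which is not the case at the time of this discussion) shows the order-sensitive side of that distinction:

    // Named tuple types are order-sensitive, matching the first relation above.
    val ab: (a: Int, b: Int) = (a = 1, b = 2)
    // val ba: (b: Int, a: Int) = ab   // does not compile: the two types differ
    // An order-independent record type would instead identify these two shapes.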

Tarsa has noticed that tuples are case classes.
So I have chosen a more neutral name for the topic, because I think it would be better to have the ability to use any data structure. That would allow, for example, zero-copy solutions on direct buffers. But quite frankly, named tuples are a solution, at least for our current tasks.

There is a very interesting feature: Multi-Stage Programming

I have the same question: how could I implement high-performance glue code in such a paradigm?

I don’t understand why database access is more awkward to model with statically typed languages than with dynamically typed languages.

Personally, I find ORM with annotations to be the best option for solving this kind of task.

The same for SOAP (XML) and REST (JSON).

The only case I could imagine where multi-staging is applicable is with semi-structured databases like NoSQL ((Mongo | Couch) - DB).
For strongly nested structures it may be better to generate a more efficient flat record which is accessed more quickly when requesting deep members.
And it only makes sense if this flat structure is accessed very often, in order to counterbalance the additional work needed to transform a JSON node into a flat struct at runtime.

Nonetheless, hash-like performance still applies for accessing members, as you don’t know whether they exist.
Moreover, indexing doesn’t make sense in this case, as you don’t know the order of the members pulled from the database.
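
To sketch the flattening idea from the previous paragraph (none of this is an existing library; nested Maps stand in for parsed JSON):

    type Json = Map[String, Any]   // stand-in for a parsed JSON object

    // Walk the nested structure once, producing dotted paths and their values.
    def flatten(node: Json, prefix: String = ""): Vector[(String, Any)] =
      node.toVector.flatMap {
        case (k, child: Map[_, _]) => flatten(child.asInstanceOf[Json], prefix + k + ".")
        case (k, leaf)             => Vector((prefix + k) -> leaf)
      }

    val doc: Json = Map("gds" -> Map("idGds" -> 1, "name" -> "Tea"), "cost" -> 10)
    val paths     = flatten(doc)
    val slotOf    = paths.map(_._1).zipWithIndex.toMap   // hash once per document shape
    val flat      = paths.map(_._2).toArray              // deep reads become array access
    val gdsName   = flat(slotOf("gds.name"))             // "Tea"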

You are right. It is very useful; we also use an ORM for many tasks.

It is a tricky question.

I think the answer is that popular static languages are object oriented, so they just do not try to provide comfortable access to relational data. There are no such problems in dynamic languages because they have very flexible structures.

I would want the best of both worlds from Scala. It is a really difficult task in practice.
I respect Scala very much for its ability to solve such tasks.

We currently solve such tasks by building the mapping once per dataset instead of doing it for each row.
I have tried to illustrate it in this topic.
It is quite a common optimization strategy; see JDBC Batch.
Quite frankly, if business logic cannot be batch processed, it is not scalable.
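
For reference, a minimal JDBC batching sketch (the table and column names are only illustrative):

    import java.sql.Connection

    // Sketch only: accumulate statements and send them as one batch,
    // instead of one round trip per row.
    def insertAll(conn: Connection, goods: Seq[(Long, String)]): Unit = {
      val ps = conn.prepareStatement("insert into gds (idGds, sGdsName) values (?, ?)")
      try {
        goods.foreach { case (id, name) =>
          ps.setLong(1, id)
          ps.setString(2, name)
          ps.addBatch()
        }
        ps.executeBatch()
      } finally ps.close()
    }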

It can be illustrated by an example; let’s look at it in some dynamic language:

recs = sql("""
   select g.idGds
            ,g.idGdsType
            ,g.sGdsName
            ,sum(i.Cost) nCost
            ....
            ,avg(i.n20) n20
     from gds g
     left join gds_type t on t.idgs=g.id 
     .....
    group by ....
""")

for g in recs.sortBy(it.idGdsType) do {
    println(g.sGdsName)
    ...
}

So we can see very little boilerplate code.
A static language just cannot be this succinct.
One can say: you can use POJO classes (case classes).
But in practice there will be an unwanted class for each query. We can easily imagine how comfortable it is to program without anonymous classes or lambda expressions; quite frankly, it is not comfortable at all.
Scala can offer the “for” and “Dynamic” abstractions, but they have a dynamic nature: they do not have static performance, and they are error prone as well.
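
As a sketch of what that Dynamic-based row access would look like (DynRow and its field map are hypothetical, not an existing library API):

    import scala.language.dynamics

    class DynRow(values: Map[String, Any]) extends Dynamic {
      def selectDynamic(name: String): Any = values(name)   // resolved by name at runtime
    }

    val g = new DynRow(Map("sGdsName" -> "Tea", "idGdsType" -> 2))
    val n = g.sGdsName    // succinct, but typed as Any and looked up in a hash map
    // g.sGdsNme          // would also compile; fails only at runtime, hence "error prone"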

One can say it is a rare case, and they will be right: it is a rare case when you use a database only for storing some state. But it is a very common case when you write, for example, an ERP system.
IMHO it is not a rare case in the wider world either.

What language is that? I’m confused, because g seems to be the loop variable, but it is also used outside of the loop.

Sorry, it seems I have made a typo, but I cannot see where.

It is an abstract dynamic language of my dreams.
But I can write something similar in Groovy or JEXL if it is really important.
For example, this is from our test case for JEXL:

    var l= sql("select 1 d, 2 e").asList()
    for(r:l){
      println(r.d);
      println(r.e);
    }

One can say: you can use POJO classes (case classes).
But in practice there will be an unwanted class for each query. We can easily imagine how comfortable it is to program without anonymous classes or lambda expressions.

Then it would be wise to provide some kind of anonymous case class, i.e. an anonymous named tuple.
The main point here is that you need some kind of type lambda taking the original table type (case class) as an argument plus the projected fields (select), and then synthesizing a new anonymous case class containing only the provided fields with the corresponding types.

Either some kind of RTTI (runtime type information) would be necessary to do this, or one uses a macro for the task.
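
One way to approximate that synthesized anonymous record in Scala 3 is with Selectable; this is only a sketch, and SqlRecord / GdsProjection are hypothetical names (in a real design the refinement type would be produced by a macro):

    class SqlRecord(fields: Map[String, Any]) extends Selectable {
      def selectDynamic(name: String): Any = fields(name)
    }

    // The "anonymous named tuple" for `select idGds, sGdsName` becomes a refinement type:
    type GdsProjection = SqlRecord { val idGds: Long; val sGdsName: String }

    val row = new SqlRecord(Map("idGds" -> 1L, "sGdsName" -> "Tea")).asInstanceOf[GdsProjection]
    val name: String = row.sGdsName   // statically typed, resolved through selectDynamic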

My dream would be to throw out any kind of SQL, because it is the worst part of relational databases: it is nice when it fits, but it becomes a mess once your query grows more complex, as with any declarative language.

I think the best approach is to have repositories of types:


private val personRepo: Repository[Person] = ???
private val vehicleRepo: Repository[Vehicle] = ???
private val houseRepo: Repository[House] = ???

Person, Vehicle and House are either ORM-annotated or must implement a specific Repository trait where primary and foreign keys are defined.
Then you create a Person and throw it into the personRepo, which automatically adds a row in vehicleRepo and houseRepo for you. For this to work, references between objects need to be mapped to primary and foreign keys, either by annotation or by a trait implementation.
You search a repo simply by filter, update the objects you get from the repo, and that’s it; the synchronization is done for you automatically.
Or you filter objects in some repository and project some fields out of it, yielding a new structural type with the projected fields.

Maybe the Slick framework already provides this kind of stuff; I don’t know, but I think this is the right direction for a maximum of comfort or laziness.
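
A hypothetical sketch of that repository abstraction (none of these names come from an existing library):

    trait Repository[T] {
      def add(entity: T): T                  // persists the entity and its referenced objects
      def filter(p: T => Boolean): Seq[T]    // loads matching entities
      def update(entity: T): Unit            // synchronizes changes back to the table
    }

    final case class Person(id: Long, name: String, vehicleIds: Seq[Long])

    // Usage as described above: throw a Person into the repo, then query it back by filter.
    def example(personRepo: Repository[Person]): Unit = {
      personRepo.add(Person(0L, "Ada", Nil))
      val found = personRepo.filter(_.name.nonEmpty)
      found.foreach(p => personRepo.update(p.copy(name = p.name.trim)))
    }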

I agree with you; IIUC there is no such abstraction in Scala :frowning:

I hope it will never happen in the world of databases. I do not want to do by hand such a big volume of work that can be done automatically by declarative languages.
I understand the desire to convert all tasks to an ORM, but an ORM is not a silver bullet. The Oracle database solves an amount of work that is difficult to imagine from the outside; it is a nightmare to try solving such tasks manually.

I think in Anorm, you can write something like:

SQL("SELECT * FROM MyTable")().foreach { row =>
  println(row[Int]("a"))  // "a" and "b" are placeholder column names
  println(row[Int]("b"))
}

It is often repeated that structural types are slow, but do they have to be? I don’t fully grok invokedynamic on the JVM, but it seems to me that it could potentially be used to give structural types nearly the same performance as regular access.

I have always thought it was simply that no one has had the time and interest to make scalac emit the correct code to do this.

If this is correct, it could be a big change to scala: the ability to use structural types with (nearly) no perf hit would allow us to write libraries that are much less tightly coupled yet still type-safe and fast.
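
For context, a minimal Scala 2 example of the kind of structural call being discussed; the access to x.name below is the part that currently goes through reflection:

    import scala.language.reflectiveCalls

    object StructuralCallDemo {
      type HasName = { def name: String }

      def greet(x: HasName): String = "Hello, " + x.name  // structural member access

      case class Person(name: String)

      def main(args: Array[String]): Unit =
        println(greet(Person("Ada")))  // works for any type with `name`, but pays the reflective cost
    }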

We use Anorm; it is a very good library.
I do not understand your example.
What type does row have?
If it is scala.Dynamic, I do not see any difficulty, but it will just be slower than tuples. And tuples are hard to read and refactor when there is a large number of columns.

scalac already uses invokedynamic to implement structural calls: scala/src/compiler/scala/tools/nsc/transform/CleanUp.scala at 3bb7dbd8f047267f66e33ae89d18f51667a686e0 · scala/scala · GitHub. Unfortunately invokedynamic is not the silver bullet you would hope it’d be, so structural calls are still significantly slower than regular ones.

The type of row is anorm.Row, I believe.

OK, let’s not guess.
Here is the source code:

  /**
   * Returns parsed column.
   *
   * @param name Column name
   * @param c Column mapping
   *
   * {{{
   * import anorm.Column.columnToString // mapping column to string
   *
   * val res: (String, String) = SQL("SELECT * FROM Test").map(row =>
   *   row("code") -> row("label") // string columns 'code' and 'label'
   * )
   * }}}
   */
  def apply[B](name: String)(implicit c: Column[B]): B =
    unsafeGet(SqlParser.get(name)(c))

Let us navigate further.

  def get[T](name: String)(implicit extractor: Column[T]): RowParser[T] =
    RowParser { row =>
      (for {
        col <- row.get(name)
        res <- extractor.tupled(col)
      } yield res).fold(Error(_), Success(_))
    }

Let’s see row.get(name):

private[anorm] def get(a: String): MayErr[SqlRequestError, (Any, MetaDataItem)] = for {
    m <- MayErr(metaData.get(a.toUpperCase).toRight(ColumnNotFound(a, this)))
    data <- {
      def d = if (a.indexOf(".") > 0) {
        // if expected to be a qualified (dotted) name
        columnsDictionary.get(m.column.qualified.toUpperCase).
          orElse(m.column.alias.flatMap(aliasesDictionary.get(_)))

      } else {
        m.column.alias.flatMap(a => aliasesDictionary.get(a.toUpperCase)).
          orElse(columnsDictionary.get(m.column.qualified.toUpperCase))
      }

      MayErr(d.toRight(
        ColumnNotFound(m.column.qualified, metaData.availableColumns)))
    }
  } yield (data, m)

Quite frankly, it is not good from a performance point of view.
I can make it faster with JEXL.
So we love Anorm, but we cannot use this part of its API everywhere.

I’m just going to link to my post with examples of other systems with index-based records, just so that we don’t think that it’s something novel or not implemented before.

If you know all the labels in the record (e.g. row types without subtyping), calculating the index is trivial by just hashing the label name at compile time.
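
A minimal runtime illustration of that idea (in the systems linked above, the label-to-slot computation would happen at compile time rather than through a Map):

    def demo(): Any = {
      val labels = Vector("idGds", "idGdsType", "sGdsName")            // the full, known label set
      val slotOf: Map[String, Int] = labels.sorted.zipWithIndex.toMap  // stand-in for the compile-time hash

      // A row is then a plain array, and every field access is direct indexing.
      val row = new Array[Any](labels.size)
      row(slotOf("sGdsName")) = "Tea"
      row(slotOf("sGdsName"))
    }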