Dotty-Quill issues with Embedded and Decoding

deusaquilus · January 24, 2021, 8:11am

@nafg You’ve made some great points about Embedded so I’d like to reply to them and generally talk about Dotty-Quill here.

For reference:

@deusaquilus what about being able to define an explicit projection? I guess it’s different in Slick where the building blocks are columns so you can map to/from tuples, Quill needs named-case-class-fields… but then shouldn’t it be possible to go lower level than that? Case class field names are a nice convenience but they don’t seem like a good building block to build abstractions on top of
Typeclasses don’t feel like the right tool for the job to me
Let me turn the question around though, why don’t nested case classes work by default? You can probably give a more definitive answer, my guess would be:
(1) Some backends support nested data
(2) Just because a field is a case class doesn’t mean it doesn’t get mapped to a single field
(3) Since field names are usually significant, should the name of the field of the embedded case class just be ignored? That feels inconsistent. (As in, case class Outer(field1: Int, ignoredName: SomeEmbeddedCaseClass))

@nafg You really hit the nail on the head, it’s actually all of these points. Doing Dotty Quill, I’m in the process of considering where Embedded is actually needed. In the current Quill, it is used in 3 places:

Outermost field expansion, e.g. query[Person] (given a Person(name, age)) would be select p.* but it expands to select p.name, p.age right before transforms. So there we need to know that for some Person(n: Name); Name(first, last) extends Embedded it would need to be select p.n.first, p.n.last, and the n needs to be hidden.
Parsing of arbitrary fields such as p.n. For example, in query[Person].map(p => p.n) we need to know that the n is hidden before any expansions happen.
Quat-Making where we need to know that a case class represents an actual row (or a part of a row) as opposed to a user-defined type.
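
To make the first point concrete, here is a minimal sketch of field expansion with a hidden embedded name. It does not use Quill at all: Shape, Leaf, SubRow, expand and selectAll are hypothetical names invented for illustration, and Person(n: Name, age: Int) is an assumed shape combining the two examples above.

```scala
object FieldExpansionSketch {
  // Hypothetical, simplified description of a row's shape (this is not Quill's Quat).
  sealed trait Shape
  case object Leaf extends Shape                                        // a single column
  final case class SubRow(fields: List[(String, Shape)]) extends Shape // an embedded sub-row

  // Flatten a shape into column references. The name of a field that holds
  // a sub-row is dropped, which is what "hidden" means above.
  def expand(alias: String, fields: List[(String, Shape)]): List[String] =
    fields.flatMap {
      case (fieldName, Leaf)  => List(s"$alias.$fieldName")
      case (_, SubRow(inner)) => expand(alias, inner)
    }

  // Assumed shapes: Name(first, last) is embedded in Person(n: Name, age: Int).
  val nameShape: Shape = SubRow(List("first" -> Leaf, "last" -> Leaf))
  val person: List[(String, Shape)] = List("n" -> nameShape, "age" -> Leaf)

  def selectAll(table: String, alias: String, fields: List[(String, Shape)]): String = {
    val columns = expand(alias, fields).mkString(", ")
    s"select $columns from $table $alias"
  }
}
```

With these definitions, FieldExpansionSketch.selectAll("Person", "p", FieldExpansionSketch.person) yields select p.first, p.last, p.age from Person p. The expansion only works because the shape says which fields are sub-rows, which is the information Embedded carries today.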

The need for Embedded[T]

Now, in some of these places it is possible to replace knowing that an Embedded[T] is present with knowing that an Encoder[T] does not exist. I.e. if you encounter a Product type for which there is no encoder, you can probably assume that it is a sub-row as opposed to a user-defined type. There are two problems with this, however:

Things like Spark make heavy use of nested row types, so we cannot assume that the presence or absence of an Encoder for something has any meaning. In the Spark case, we can maybe assume that all product types for which there is a Spark Product-Encoder are not to be expanded the way an Embedded object is, or possibly we can skip the expansion stage entirely because Quats give us enough information anyway.
In dotty-quill there will be support for Lazy Lifts, which basically means lifting something without any existing encoders. This allows quotation to be done in a static context, i.e. without having to import an actual context variable, which is one of the biggest pain points of Quill contexts today (this is somewhat alleviated by implicit function types, but only partially). Right now, quotation can already be done from a static context, but in order to know that lifting something is possible we need to summon its Encoder. This latter restriction is being relaxed slightly for lazy lifts, where we can keep around some data in a Quotation and summon its encoder before the run function. The consequence of this is that even if you don’t have an encoder around for some piece of data (represented by a case class), it can still be a value as opposed to an embedded entity.
Finally, I think it is very realistic to have a case with a datastore that needs both Embedded entities as well as row-level nested types. Databases like Postgres and Redshift already allow nested objects to one extent or another. Making the Embedded-or-not decision at the level of an entire datastore seems a little too fat-fingered.
Anyway, I’ve made some big efforts to cut out Embedded from Dotty-Quill but it is still needed in the parser, at least in the case of Lazy Lifts.
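
The "no Encoder means sub-row" heuristic, and the way it misfires when an encoder is simply not in scope yet, can be sketched with ordinary implicit prioritisation. Everything here is hypothetical and stands in for the real machinery: Encoder is a bare placeholder trait (not Quill's), and Kind, Classify, kindOf, Name and Money are names made up for this example.

```scala
object EncoderAbsenceSketch {
  // Hypothetical stand-in for a context's encoder type class.
  trait Encoder[T]

  sealed trait Kind
  case object Value  extends Kind // encoded as a single column
  case object SubRow extends Kind // expanded into its fields

  // Evidence recording how the heuristic classifies T.
  final class Classify[T](val kind: Kind)

  trait LowPriorityClassify {
    // Fallback: a Product with no Encoder in scope is assumed to be a sub-row.
    implicit def noEncoder[T <: Product]: Classify[T] = new Classify[T](SubRow)
  }

  object Classify extends LowPriorityClassify {
    // Preferred: anything with an Encoder in scope is a single value.
    implicit def hasEncoder[T](implicit enc: Encoder[T]): Classify[T] = new Classify[T](Value)
  }

  def kindOf[T](implicit c: Classify[T]): Kind = c.kind

  final case class Name(first: String, last: String)           // intended as a sub-row
  final case class Money(amount: BigDecimal, currency: String) // intended as one column value
}
```

With no encoders in scope, kindOf[Name] and kindOf[Money] both come out as SubRow, so Money is misclassified. Only once an implicit Encoder[Money] is visible does kindOf[Money] become Value. A lazy lift is exactly the situation where that encoder is not visible at quotation time, so the heuristic alone cannot tell the two case classes apart.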

Coproducts and By-Field Extraction

Another point that I’ll briefly touch on is the notion of encoding based on field name vs field position (i.e. index). Doing the latter is quite compelling because of its simplicity, and we currently do this during decoding. However, the one serious problem with this is that if you want to use this approach with static queries, it is completely impossible to do coproducts. Say you have something like this:

sealed trait Customer { def id: Int; def etype: String }
object Customer {
  case class Person(override val id: Int, override val etype: String, firstName: String, lastName: String) extends Customer
  case class Robot(override val id: Int, override val etype: String, serialNumber: String) extends Customer
}

Let’s say you then have some kind of function ResultSet => (T <: Customer) that will let you know what sub-class of Customer a row actually is (let’s say the actual implementation uses a column etype that could be either "Person" or "Robot"). Still, when you do this:

query[Customer]

… how do you know what fields to select from the DB since during compile-time you have no idea whether the row should be Person or Robot?

The only decent approach that I’ve come up with is to gather all the fields from both Person and Robot, select all of them from the database, and assume it is a Table-Per-Hierarchy mapping (as the Hibernate people would put it). That means I would make the query something like this:

query[Customer]
// select c.id, c.etype, c.firstName, c.lastName, c.serialNumber from Customer c
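
A rough sketch of how that column list could be assembled is below. The per-variant field lists are written out by hand here, on the assumption that a macro would derive them from the case classes at compile time. unionColumns and selectAll are hypothetical helpers, not Quill functions.

```scala
object CoproductColumnsSketch {
  // Field names per variant, hand-written for illustration only.
  val personFields: List[String] = List("id", "etype", "firstName", "lastName")
  val robotFields: List[String]  = List("id", "etype", "serialNumber")

  // Union of all variant fields, keeping first-seen order and dropping repeats.
  def unionColumns(variants: List[List[String]]): List[String] =
    variants.flatten.distinct

  def selectAll(table: String, alias: String, columns: List[String]): String = {
    val qualified = columns.map(column => s"$alias.$column").mkString(", ")
    s"select $qualified from $table $alias"
  }

  val customerQuery: String =
    selectAll("Customer", "c", unionColumns(List(personFields, robotFields)))
}
```

Here customerQuery evaluates to the select statement shown above. Shared fields (id, etype) appear once, and the variant-specific ones follow in the order the variants were listed.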

Now imagine what the ResultSet for this kind of approach looks like.

| Id | eType    | firstName | lastName | serialNumber |
|----|----------|-----------|----------|--------------|
| 1  | "person" | Joe       | Bloggs   | null         |
| 2  | "robot"  | null      | null     | 123AC35DG    |

It’s basically impossible to match the indexes of the result-set row with the actual type being decoded because we’ve effectively spliced together all the fields. The only viable approach is to map by column name instead of by index.
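
As an illustration of by-name decoding, here is a minimal sketch that stands a plain Map in for the ResultSet. Row, required, decodeVariant and decode are hypothetical names. Column values are kept as nullable strings for brevity, and the etype comparison is made case-insensitive as an assumption, since both "Person" and "person" appear above.

```scala
object ByNameDecodingSketch {
  sealed trait Customer { def id: Int; def etype: String }
  object Customer {
    final case class Person(id: Int, etype: String, firstName: String, lastName: String) extends Customer
    final case class Robot(id: Int, etype: String, serialNumber: String) extends Customer
  }

  // Hypothetical stand-in for one ResultSet row: column name -> nullable text value.
  type Row = Map[String, Option[String]]

  private def required(row: Row, name: String): Either[String, String] =
    row.get(name).flatten.toRight(s"column '$name' is missing or null")

  // Only the columns that belong to the chosen variant are read, by name.
  private def decodeVariant(id: Int, etype: String, row: Row): Either[String, Customer] =
    etype.toLowerCase match {
      case "person" =>
        for {
          first <- required(row, "firstName")
          last  <- required(row, "lastName")
        } yield Customer.Person(id, etype, first, last)
      case "robot" =>
        required(row, "serialNumber").map(serial => Customer.Robot(id, etype, serial))
      case other =>
        Left(s"unknown etype: $other")
    }

  def decode(row: Row): Either[String, Customer] =
    for {
      idText   <- required(row, "id")
      id       <- scala.util.Try(idText.toInt).toOption.toRight(s"column 'id' is not an integer: $idText")
      etype    <- required(row, "etype")
      customer <- decodeVariant(id, etype, row)
    } yield customer
}
```

Because every lookup goes through the column name, the null columns that belong to the other variant are never touched, and the position of a column in the spliced result set no longer matters.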

A very similar issue occurs with the encoder for case classes, i.e. CaseClassLift, where we could potentially have a Customer.Person or Customer.Robot passed into query[Customer].insert(Customer.Person(....)) and we need to know which fields to pull out for insertion.
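
One possible shape for the encoding side is sketched below. It assumes a Table-Per-Hierarchy layout where the statically known column list is the union of all variant fields, and a variant that lacks a column binds NULL for it. That layout is an assumption for illustration, not something Quill does today, and allColumns, fieldsOf and bindValues are hypothetical names.

```scala
object VariantInsertSketch {
  sealed trait Customer { def id: Int; def etype: String }
  object Customer {
    final case class Person(id: Int, etype: String, firstName: String, lastName: String) extends Customer
    final case class Robot(id: Int, etype: String, serialNumber: String) extends Customer
  }

  // Column order fixed up front, as a static insert statement would need.
  val allColumns: List[String] = List("id", "etype", "firstName", "lastName", "serialNumber")

  // Fields of the runtime variant, keyed by name.
  def fieldsOf(c: Customer): Map[String, Any] = c match {
    case Customer.Person(id, etype, first, last) =>
      Map("id" -> id, "etype" -> etype, "firstName" -> first, "lastName" -> last)
    case Customer.Robot(id, etype, serial) =>
      Map("id" -> id, "etype" -> etype, "serialNumber" -> serial)
  }

  // Values in the fixed column order. None marks a column this variant lacks (to be bound as NULL).
  def bindValues(c: Customer): List[Option[Any]] = {
    val fields = fieldsOf(c)
    allColumns.map(name => fields.get(name))
  }
}
```

For Customer.Robot(2, "Robot", "123AC35DG") this gives List(Some(2), Some("Robot"), None, None, Some("123AC35DG")), which mirrors the second row of the table above. As with decoding, the values are located by field name and only then placed into positions.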

Up to now, Quill has not supported coproducts in any reasonable form and we have mostly gotten away with it (after all, Spark has not supported them either). The reason for this is that Scala 2’s support of coproducts via sealed traits has always been idiosyncratic. This will change with Scala 3 enums, however, which are essentially first-class coproducts in the language. That means that to be aligned with the Scala 3 feature set, an effort must be made to support coproducts, which means that some of Quill’s core design principles need to be revisited.

Now if Quill were not using macros and porting Quill to Scala 3 were just a matter of messing around with types and getting code to compile, I would not be thinking about any of these things. Indeed, this is the case to some degree for Slick, Doobie (probably), and some other frameworks. However, because Scala 2 macros are being completely thrown out and Quill needs a radical redesign, we might as well get it right from the start.

nafg · January 24, 2021, 4:30pm

What do you think about explicit mapped projections, a la Slick’s <> operator?