GSOC 2017 - Slick

My name is Michael Viveros and I am interested in participating in Scala’s Slick project for GSOC 2017, @cvogt @szeiger.

Below are some questions.

  1. What does the topic:type-providers label in the repo’s Issues refer to?
    I think it means anything related to converting a database type to a Scala type and vice-versa.
    I didn’t see a specific section about it in the docs so I’m thinking it spans lots of topics (schemas, queries, code-gen).

  2. Which parts of Slick would the GSOC project be focussing on the most?
    I looked at the Labels of the Issues list and the most popular topics were “documentation” and “type-providers”. I did not see any labels related to priority.

  3. What should I be doing next?
    Things I have thought of so far:
    -Learn the basics of Scala (coursera course)
    -Review the Slick code base and try to fix some issues - I found 10 open, effort: easy issues

Michael

topic:type-providers refers to the code generator. It was initially supposed to work like .NET type providers with a macro-based implementation but never materialized (because we didn’t get the required macro support in Scala), so we settled for code generation (which may actually be better for many users).

We don’t really use priority labels but anything with 1 - Ready is a good place to start. These are issues that have been triaged and considered a good idea that could (and should) be implemented.

Some ideas off the top of my head for medium-sized projects:

  • Prototype a Java API (possibly using Jinq for bytecode analysis)

  • Implement a non-blocking backend for Postgres or MySQL

  • Reactive Streams improvements: Process streams of DBIOActions with back-pressure handling; read LOBs from result sets through sub-streams

I’ve got some Postgres experience so I’d definitely be interested in this project. I think I found the main issue about supporting an async driver.

Do you have any recommended next steps?

I will start looking more into the Slick code base (specifically the Postgres profile) and the async-postgresql driver referenced in the issue.

The first step would be a simple asynchronous backend that extends RelationalBackend but not JdbcBackend. The DBIO interpreter in BasicBackend should be perfectly usable for both blocking and non-blocking I/O. It has some special support for SynchronousDatabaseAction that an asynchronous driver (unlike JdbcBackend) simply wouldn’t use. You need to add a new DBIO type for asynchronous database actions and make the interpreter handle it correctly. You also need a way to configure database connections and handle transactions appropriately.

The result of this work should be a backend that allows you to instantiate a configured Database and run DBIO actions on it that consist of an asynchronous equivalent of SimpleJdbcAction and all standard DBIO combinators.
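To make the shape of step 1 a bit more concrete, here is a rough sketch using only the Scala standard library. It is not Slick code: `AsyncConnection`, `MiniDBIO`, `AsyncAction`, `MiniInterpreter` and `EchoConnection` are all hypothetical names invented for illustration, and Slick's real DBIOAction hierarchy and interpreter are considerably richer. The only point is that an action type returning a Future can sit next to the usual combinators and be handled by one interpreter.

```scala
import scala.concurrent.{ExecutionContext, Future}

// Hypothetical stand-in for a non-blocking driver connection (not a real driver API).
trait AsyncConnection {
  def query(sql: String): Future[Vector[String]]
}

// A tiny DBIO-like algebra with just enough combinators for a for-comprehension.
sealed trait MiniDBIO[A] {
  def flatMap[B](f: A => MiniDBIO[B]): MiniDBIO[B] = FlatMap[A, B](this, f)
  def map[B](f: A => B): MiniDBIO[B] = FlatMap[A, B](this, a => Pure(f(a)))
}
final case class Pure[A](value: A) extends MiniDBIO[A]
final case class FlatMap[A, B](source: MiniDBIO[A], f: A => MiniDBIO[B]) extends MiniDBIO[B]
// The new action type: it hands back a Future instead of blocking a thread.
final case class AsyncAction[A](run: AsyncConnection => Future[A]) extends MiniDBIO[A]

object MiniDBIO {
  // Rough analogue of a "simple" action that just sends a SQL string.
  def sql(statement: String): MiniDBIO[Vector[String]] = AsyncAction(_.query(statement))
}

object MiniInterpreter {
  // One interpreter handles pure values, sequencing, and the asynchronous action.
  def run[A](action: MiniDBIO[A], conn: AsyncConnection)(implicit ec: ExecutionContext): Future[A] =
    action match {
      case Pure(value)        => Future.successful(value)
      case AsyncAction(body)  => body(conn)
      case FlatMap(source, f) => run(source, conn).flatMap(a => run(f(a), conn))
    }
}

// In-memory stub so the sketch can be exercised without a database.
final class EchoConnection extends AsyncConnection {
  def query(sql: String): Future[Vector[String]] =
    Future.successful(Vector("result of: " + sql))
}
```

Connection configuration and transaction handling are left out entirely here; in a real backend they are probably the harder part.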

Step 2 is a profile like AsyncPostgresProfile extends SqlProfile that can compile the supported basic operations (defined in RelationalProfile) to asynchronous actions. Since you need to use the SQL code generator, which is currently defined in JdbcStatementBuilderComponent with a dependency on JdbcProfile, this will require some refactoring to pull the statement builder up to the SqlProfile level.
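The refactoring idea in step 2 can be illustrated with a toy layering. Again this is only a sketch under assumptions: `SqlLayer`, `BlockingLayer`, `AsyncLayer` and `EchoAsyncLayer` are made-up names, and the string-building here has nothing to do with how Slick's statement builder actually works. It only shows why moving SQL generation into the shared layer lets a blocking and a non-blocking layer reuse it.

```scala
import scala.concurrent.Future

// Hypothetical shared layer: SQL text is built here, with no execution model attached.
trait SqlLayer {
  def quote(identifier: String): String =
    "\"" + identifier.replace("\"", "\"\"") + "\""

  def buildSelect(table: String, columns: Seq[String]): String =
    columns.map(quote).mkString("select ", ", ", " from " + quote(table))
}

// Blocking layer: reuses the builder and executes synchronously.
trait BlockingLayer extends SqlLayer {
  def execute(sql: String): Vector[String]
  def select(table: String, columns: Seq[String]): Vector[String] =
    execute(buildSelect(table, columns))
}

// Non-blocking layer: reuses the same builder and returns a Future.
trait AsyncLayer extends SqlLayer {
  def executeAsync(sql: String): Future[Vector[String]]
  def selectAsync(table: String, columns: Seq[String]): Future[Vector[String]] =
    executeAsync(buildSelect(table, columns))
}

// Stub implementation that echoes the generated SQL instead of talking to a database.
object EchoAsyncLayer extends AsyncLayer {
  def executeAsync(sql: String): Future[Vector[String]] =
    Future.successful(Vector(sql))
}
```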

At this point you should be able to write a TestDB and run Testkit on the new profile. All tests that are based on RelationalProfile should run and pass.

The final phase is making SqlProfile more useful. All our SqlProfile implementations at the moment implement JdbcProfile, so there was no real need to make SqlProfile useful. It should be possible to refactor many operations from JdbcProfile into SqlProfile.

Well, I think the best way is to first create a “true” async driver for PostgreSQL (PostgreSQL is actually async, so it might benefit the most).
Currently there is already one async driver that could probably be integrated into Slick: https://github.com/mauricio/postgresql-async/ However, I don’t think that this driver is suitable.
The problem with this driver is that there can only be one query per connection, which means that without correct pooling this driver is probably not really useful at all.

A better approach would probably be to use either Finagle (I have no experience with that) or Akka Streams and represent a connection as a stream. Each “transaction” can be a stream as well, which means that you can configure how many queries are in the queue.
I actually toyed with that kind of idea already: https://github.com/schmitch/akka-pg However, besides a demonstration that it might be possible, I didn’t get very far.
Currently I use this driver to have a simple pub/sub over a single PostgreSQL instance (more instances don’t work since notifies won’t get replicated), which works fully async.
What is still missing is a way to reconnect the flow; however, that can be implemented relatively easily in future Akka versions, thanks to jroper: https://github.com/akka/akka/pull/23367

I think the good thing about an akka-stream database driver would not only be the async nature, but also the back-pressure of reactive-streams, so that there is no possible way to kill the database server with too many queries.
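To illustrate the “configurable queue” idea without pulling in Akka, here is a much cruder sketch built on plain Futures. `BoundedQueryQueue`, `maxPending` and `runQuery` are hypothetical names, and rejecting a query when the queue is full is only a rough stand-in for real reactive-streams back-pressure, where the producer would be slowed down through demand signalling rather than handed a failure. It assumes a connection that can run only one query at a time and a `runQuery` that never throws synchronously.

```scala
import scala.collection.mutable
import scala.concurrent.{ExecutionContext, Future, Promise}

// Serializes queries over one connection and caps how many may wait in line.
final class BoundedQueryQueue(maxPending: Int, runQuery: String => Future[String])(
    implicit ec: ExecutionContext) {

  private case class Pending(sql: String, promise: Promise[String])
  private val waiting = mutable.Queue.empty[Pending]
  private var busy = false // true while a query is in flight

  // Returns a failed Future when the waiting line is full, so callers cannot flood the server.
  def submit(sql: String): Future[String] = synchronized {
    if (waiting.size >= maxPending)
      Future.failed(new IllegalStateException("query queue is full"))
    else {
      val promise = Promise[String]()
      waiting.enqueue(Pending(sql, promise))
      if (!busy) startNext()
      promise.future
    }
  }

  // Runs the next waiting query, if any, and chains to the one after it on completion.
  private def startNext(): Unit = synchronized {
    if (waiting.isEmpty) busy = false
    else {
      busy = true
      val next = waiting.dequeue()
      runQuery(next.sql).onComplete { result =>
        next.promise.complete(result)
        startNext()
      }
    }
  }
}
```

With `maxPending = 1`, one query can be in flight and one can wait; a third `submit` fails immediately until the first query finishes.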

Edit TL;DR:
What I basically wanted to say is that I think before Slick gets integration for an async driver, Scala should first have a stable and good-looking driver that has a nice internal design.

@MichaelViveros do you have an update? It’s a cool idea.

No, sorry, I got accepted into GSOC for another project so I haven’t looked into anything for Slick since my initial post. I think another student got accepted to GSOC for Slick though so maybe they looked into the async driver, https://summerofcode.withgoogle.com/projects/#4870442316726272

Ah. Tx for the update.