In multi-stage compilation, should we use a standard serialisation method to ship objects through stages?

Wow, didn’t expect this to happen :-]

No surprise that Chill is what Java serialization should strive to be. But BooPickle & ProtoBuf (both produce binary data) really stand out here, which kind of proves my point that human readability can be sacrificed.

Will either of them become as universally applicable as Chill? Regardless of the outcome, I would propose adding the simplest implementation to dotty core staging, and then encouraging people to improve on it with the better options.

You would also run into issues if you try to lift a large object. There are hard limits on how big literals in a classfile can get (How to run Scala code at compile time? - #9 by Jasper-M - Question - Scala Users).
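To make the limit concrete: a string literal is stored in the class file as a CONSTANT_Utf8 entry whose byte length is an unsigned 16-bit field, so one literal holds at most 65535 bytes of modified UTF-8. `DataOutputStream.writeUTF` uses the same encoding and throws `UTFDataFormatException` past the same limit, so it can serve as a quick check. This is a minimal sketch; the object `ClassfileLimit` and the helper `fitsInClassfileConstant` are hypothetical names, and the helper only approximates what the compiler and JVM enforce.

```scala
import java.io.{ByteArrayOutputStream, DataOutputStream, UTFDataFormatException}

object ClassfileLimit:
  // Hypothetical helper. A CONSTANT_Utf8 entry stores its byte length in an
  // unsigned 16-bit field, i.e. at most 65535 bytes of modified UTF-8.
  // writeUTF uses that encoding and throws once the result exceeds 65535 bytes.
  def fitsInClassfileConstant(s: String): Boolean =
    val out = new DataOutputStream(new ByteArrayOutputStream())
    try
      out.writeUTF(s)
      true
    catch case _: UTFDataFormatException => false
```

ASCII characters take one byte each, while a character such as `é` takes two, so the limit depends on content, not just on `String.length`. Base64 output is pure ASCII, which makes chunk sizes easy to reason about.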


Thanks a lot @Jasper-M, I’ll modify the code so that it can easily take multiple UTF-8 encoded strings.

This is just a demonstration in Scala 2; I expect Scala 3 to have some mechanism for creating a headless Expr[_] directly from serialized data.
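Splitting the payload across several string literals can be sketched as follows. It is a minimal sketch under stated assumptions: `Base64Chunks`, `encode` and `decode` are hypothetical names, and the chunk size 32768 is taken from the prototypes later in this thread. Because 32768 is a multiple of 4, every full chunk is a whole number of Base64 quanta, so each chunk can be decoded on its own and the byte arrays concatenated.

```scala
import java.util.Base64

object Base64Chunks:
  // Chunk size used by the prototypes in this thread. It is a multiple of 4,
  // so each full chunk decodes independently of its neighbours.
  val MaxLiteralLength = 32768

  // Hypothetical helper: bytes -> Base64 text -> chunks of at most MaxLiteralLength chars.
  def encode(bytes: Array[Byte]): Seq[String] =
    Base64.getEncoder.encodeToString(bytes).grouped(MaxLiteralLength).toSeq

  // Hypothetical helper: decode each chunk separately and concatenate the bytes.
  def decode(chunks: Seq[String]): Array[Byte] =
    val decoder = Base64.getDecoder
    chunks.map(chunk => decoder.decode(chunk)).foldLeft(Array.emptyByteArray)(_ ++ _)
```

Using `grouped` with `foldLeft` also tolerates an empty payload, whereas `sliding(...)` followed by `reduce(_ ++ _)` throws on an empty sequence. If the chunk size were not a multiple of 4, the chunks would have to be joined with `mkString` before decoding.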

@Jasper-M, I hope the new version is up to your standard:

Could something similar be useful for the Scala 3 compiler?

Here is a prototype that implements the AutoLift idea above in Scala 3.

import java.io.{ByteArrayInputStream, ByteArrayOutputStream, ObjectInputStream, ObjectOutputStream}
import java.util.Base64
import scala.quoted.*

sealed trait SerializableExpr[T]:
  def apply(x: T)(using Quotes): Expr[T]

sealed trait DeSerializableExpr[T]:
  def unapply(x: Expr[T])(using Quotes): Option[T]

object SerializableExpr {

  private inline val MAX_LITERAL_LENGTH = 32768

  lazy val encoder: Base64.Encoder = Base64.getEncoder
  lazy val decoder: Base64.Decoder = Base64.getDecoder

  def apply[T: SerializableExpr](x: T)(using Quotes): Expr[T] =
    summon[SerializableExpr[T]].apply(x)

  def unapply[T: DeSerializableExpr](x: Expr[T])(using Quotes): Option[T] =
    summon[DeSerializableExpr[T]].unapply(x)

  given serializableExpr[T <: Serializable : Type]: SerializableExpr[T] with {
    def apply(x: T)(using Quotes): Expr[T] =
      val stringsExpr = Varargs(serialize(x).map(Expr(_)))
      '{ deserialize[T]($stringsExpr*) }
  }

  given deSerializableExpr[T <: Serializable : Type]: DeSerializableExpr[T] with {
    def unapply(x: Expr[T])(using Quotes): Option[T] =
      x match
        case '{ deserialize[T](${Varargs(stringExprs)}*) } =>
          Exprs.unapply(stringExprs).map(strings => deserialize(strings*))
        case _ => None
  }

  private def serialize(x: Serializable): Seq[String] = {
    val bOStream = new ByteArrayOutputStream()
    val oOStream = new ObjectOutputStream(bOStream)
    oOStream.writeObject(x)
    val serialized = encoder.encodeToString(bOStream.toByteArray)
    serialized.sliding(MAX_LITERAL_LENGTH, MAX_LITERAL_LENGTH).toSeq
  }

  private def deserialize[T <: Serializable](strings: String*) = {
    val bytes = strings.map(decoder.decode).reduce(_ ++ _)
    val bIStream = new ByteArrayInputStream(bytes)
    val oIStream = new ObjectInputStream(bIStream)
    val v = oIStream.readObject()
    v.asInstanceOf[T]
  }
}

object App {
  import SerializableExpr.given

  def example(using Quotes) = {
    val serializedExpr = SerializableExpr("abc")
    serializedExpr match
      case SerializableExpr(value) => println(value)
      case _ =>
  }
}

Actually, I did not need the DeSerializableExpr to port AutoLift.

Here is a shorter version that uses ToExpr. However, it creates an expression that is harder to extract using FromExpr.

import java.io.{ByteArrayInputStream, ByteArrayOutputStream, ObjectInputStream, ObjectOutputStream}
import java.util.Base64
import scala.quoted.*

object SerializableExpr {

  private inline val MAX_LITERAL_LENGTH = 32768

  lazy val encoder: Base64.Encoder = Base64.getEncoder
  lazy val decoder: Base64.Decoder = Base64.getDecoder

  given serializableExpr[T <: Serializable : Type]: ToExpr[T] with {
    def apply(x: T)(using Quotes): Expr[T] =
      val stringsExpr = Varargs(serialize(x).map(Expr(_)))
      '{ deserialize[T]($stringsExpr*) }
  }

  private def serialize(x: Serializable): Seq[String] = {
    val bOStream = new ByteArrayOutputStream()
    val oOStream = new ObjectOutputStream(bOStream)
    oOStream.writeObject(x)
    val serialized = encoder.encodeToString(bOStream.toByteArray)
    serialized.sliding(MAX_LITERAL_LENGTH, MAX_LITERAL_LENGTH).toSeq
  }

  private def deserialize[T <: Serializable](strings: String*) = {
    val bytes = strings.map(decoder.decode).reduce(_ ++ _)
    val bIStream = new ByteArrayInputStream(bytes)
    val oIStream = new ObjectInputStream(bIStream)
    val v = oIStream.readObject()
    v.asInstanceOf[T]
  }
}

object App {
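  // `.given` is required: in Scala 3 a wildcard `.*` import does not bring givens
  // into scope, so Expr("abc") would fall back to the built-in ToExpr[String].
  import SerializableExpr.given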

  def example(using Quotes) = {
    val serializedExpr: Expr[String] = Expr("abc")
  }
}

@nicolasstucki Thanks a lot! I hope I can be helpful later if you plan to add an (extendable) variant of this to the dotty metaprogramming core.

According to my recent poll, 55.6% of developers say they don’t need it (https://twitter.com/tribbloid/status/1520501655376175105). Maybe all they need is a little push :-]

Extendable in what way?

Using Java Serialization is almost certainly a Bad Idea™ for reasons I don’t think I need to repeat here. But the idea in general makes sense, and could be implemented using some other serialization library that’s less problematic.

In fact, I implemented exactly this in my Python implementation of syntactic macros and hygienic quasiquotes Reference — MacroPy3 1.1.0 documentation. I don’t know enough about Scala 3 macros to talk about the details, but at a high level, yes, it is handy and seems to work.


@nicolasstucki how about making the following changes?

  • SerializableExpr becomes an extendable trait, with
    • Serializable replaced by an abstract (dependent) type member, &
    • def serialize and def deserialize as abstract methods that return/take BLOBs instead of strings

In this case, the threat @lihaoyi mentioned would have only minimal impact: we can easily define a JavaSerializableExpr where Java serialization works, and switch to, e.g., KryoSerializableExpr or UPickleSerializableExpr whenever necessary.
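One possible reading of this proposal, sketched without the macro layer: a codec trait whose bound is an abstract type member and whose methods work on byte arrays (BLOBs), plus two interchangeable backends. All names here (`BlobCodec`, `Bound`, `JavaBlobCodec`, `Utf8StringCodec`) are hypothetical; Kryo or upickle backends would be further objects of the same shape, but their APIs are not shown here.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, ObjectInputStream, ObjectOutputStream}
import java.nio.charset.StandardCharsets

// Hypothetical extendable codec: `Bound` replaces the hard-coded Serializable bound,
// and serialize/deserialize produce/consume BLOBs rather than strings.
trait BlobCodec:
  type Bound
  def serialize[T <: Bound](x: T): Array[Byte]
  def deserialize[T <: Bound](bytes: Array[Byte]): T

// Java-serialization backend, for types where Java serialization is acceptable.
object JavaBlobCodec extends BlobCodec:
  type Bound = java.io.Serializable

  def serialize[T <: Bound](x: T): Array[Byte] =
    val bytes = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(bytes)
    try out.writeObject(x) finally out.close()
    bytes.toByteArray

  def deserialize[T <: Bound](bytes: Array[Byte]): T =
    val in = new ObjectInputStream(new ByteArrayInputStream(bytes))
    try in.readObject().asInstanceOf[T] finally in.close()

// A deliberately trivial second backend, showing that callers can swap codecs.
object Utf8StringCodec extends BlobCodec:
  type Bound = String

  def serialize[T <: Bound](x: T): Array[Byte] = x.getBytes(StandardCharsets.UTF_8)

  def deserialize[T <: Bound](bytes: Array[Byte]): T =
    new String(bytes, StandardCharsets.UTF_8).asInstanceOf[T]
```

A ToExpr built on top of this would, under the same assumptions, encode the BLOB into chunked string literals and splice a call to the chosen codec's `deserialize` into the generated code.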

BTW, it looks like your last version can already create an Expr[_] from BLOBs with 0 runtime overhead (by compiling them directly into inlined constants in the JVM bytecode), so my old concern should no longer apply.


Someone should create a library defining this abstraction and some basic implementations.


@nicolasstucki do you accept PRs? I thought you were a maintainer of the scala3 reflection library.

I did not understand the question. Which PRs? Yes, I am the maintainer.

@nicolasstucki I was hoping to add the PR to the scala3 compiler, as the quote & splice API is still somewhat experimental and its implementation may need to adapt in the future.

Others may have different plans.

I see. I imagined it would be better to implement it first in a standalone library. This would make it simpler to implement and to try out all serialization frameworks, such as Kryo and upickle, in one place. We would be able to implement and stabilize it faster. Later, we could move the interface definition and the basic Java serialization implementation to the standard library.

We could try adding it as experimental in the standard library, but then, to cross-validate with other serialization libraries, we would need to wait for full release cycles. I feel this would be much slower and require more work than the first approach.

I could start a repo with this code and add the basic functionality, but might need some help.


Thanks a lot, that all makes sense. Will do & publish.


Finished! Special thanks to @nicolasstucki and @DmytroMitin.

This leads to some surprisingly simple macros, e.g. a TypeTag implementation in under 100 lines.

There are some serialization errors:

AutoLiftSpec.scala:11:36: Exception occurred while executing macro expansion.
java.io.NotSerializableException: scala.quoted.runtime.impl.TypeImpl

but they are easy to fix
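The error above is ordinary Java serialization behaviour: every object reachable from the value being written must implement `java.io.Serializable`, and `TypeImpl` does not. The sketch below reproduces that with a hypothetical stand-in class `OpaqueHandle`, and shows the usual way out, which is also what the TypeTag further down does with its `pickle` field: keep only a serializable representation instead of the compiler object itself. `CapturedHandle`, `CapturedPickle` and `SerializationCheck` are hypothetical names.

```scala
import java.io.{ByteArrayOutputStream, NotSerializableException, ObjectOutputStream}

// Hypothetical stand-in for a compiler object such as TypeImpl: not Serializable.
final class OpaqueHandle(val description: String)

// Capturing the handle directly makes the whole value unserializable.
final case class CapturedHandle(handle: OpaqueHandle)

// Keeping only a serializable representation (like TypeTag's `pickle`) works.
final case class CapturedPickle(pickle: List[String])

object SerializationCheck:
  // Returns the serialized size in bytes, or the exception Java serialization raised.
  def trySerialize(x: AnyRef): Either[NotSerializableException, Int] =
    val bytes = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(bytes)
    try
      out.writeObject(x)
      out.close()
      Right(bytes.size())
    catch case e: NotSerializableException => Left(e)
```

The exception message names the offending class, which is how `scala.quoted.runtime.impl.TypeImpl` shows up in the error above.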


I’ve seen that work on fixing the serialization error has already started in dotty.tools.dotc.quoted.PickledQuotes. Good news! I’ll play with it more and see if the lightweight TypeTag can be published.

Partially successful! The pickler can move a Type across compilers using the following code:

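import java.io.{ByteArrayInputStream, ByteArrayOutputStream, ObjectInputStream, ObjectOutputStream}
import java.util.Base64
import scala.quoted.*
import scala.quoted.runtime.QuoteUnpickler
// compiler internals (scala3-compiler artifact), not a public API
import scala.quoted.runtime.impl.{QuotesImpl, TypeImpl}
import dotty.tools.dotc.core.Contexts.Context
import dotty.tools.dotc.quoted.PickledQuotes

trait SerialisingLowering[T] extends ToExpr[T] {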

  def apply(x: T)(
      using
      Quotes
  ): Expr[T] = {
    val stringsExpr: Expr[Seq[String]] = Varargs(serialize(x).map(Expr(_)))

    deserialize(stringsExpr)
  }

  protected def serialize(x: T): Seq[String]

  protected def deserialize(expr: Expr[Seq[String]])(
      using
      Quotes
  ): Expr[T]
}

object SerialisingLowering {

  object JVMNativeLowering {

    @transient lazy val encoder: Base64.Encoder = Base64.getEncoder
    @transient lazy val decoder: Base64.Decoder = Base64.getDecoder

    private inline val MAX_LITERAL_LENGTH = 32768

    protected def deserializeImpl[T](strings: Seq[String]): T = {
      val bytes = strings.map(decoder.decode).reduce(_ ++ _)
      val bIStream = new ByteArrayInputStream(bytes)
      val oIStream = new ObjectInputStream(bIStream)
      val v = oIStream.readObject()
      v.asInstanceOf[T]
    }

    def deserialize[T: Type](expr: Expr[Seq[String]])(
        using
        Quotes
    ): Expr[T] = {

      '{ deserializeImpl[T]($expr) }
    }

    object Implicits {

      given only[T <: Serializable: Type]: JVMNativeLowering[T] = JVMNativeLowering[T]()
    }
  }

  // Type[T] is required by JVMNativeLowering.deserialize, called below
  class JVMNativeLowering[T <: Serializable: Type] extends SerialisingLowering[T] {

    import JVMNativeLowering.*

    override protected def serialize(x: T): Seq[String] = {
      val bOStream = new ByteArrayOutputStream()
      val oOStream = new ObjectOutputStream(bOStream)
      oOStream.writeObject(x)
      val serialized = encoder.encodeToString(bOStream.toByteArray)
      serialized.sliding(MAX_LITERAL_LENGTH, MAX_LITERAL_LENGTH).toSeq
    }

    override protected def deserialize(expr: Expr[Seq[String]])(
        using
        Quotes
    ): Expr[T] = {

      JVMNativeLowering.deserialize(expr)
    }
  }

}

case class TypeTag[T](
    pickle: List[String]
) extends Serializable {

  import TypeTag.*

  def runtimeClass: Class[T] = ???

  def unbox(quotes: Quotes): Type[T] = {

    given Quotes = quotes

    val q = quotes.asInstanceOf[QuoteUnpickler]

    val cucumber: Type[Nothing] = q.unpickleTypeV2(pickle, null)

    cucumber.asInstanceOf[Type[T]]
  }
}

object TypeTag {
  import SerialisingLowering._

  def compile[T]()(
      using
      Type[T],
      Quotes
  ): Expr[TypeTag[T]] = {

    val tt = implicitly[Type[T]]
    val qq = implicitly[Quotes]

    val raw = (tt, qq) match {

      case (t: TypeImpl, q: QuotesImpl) =>
        given ctx: Context = q.ctx

        val tree = t.typeTree
        val pickle = PickledQuotes.pickleQuote(tree)

        val txt = tree.showIndented(2)

        println(txt)
        val raw = TypeTag[T](pickle)

        raw
      case _ =>
        ???
    }

    // JVMNativeLowering is already in scope via `import SerialisingLowering._`
    import JVMNativeLowering.Implicits.given
    //      given toExpr: ToExpr[TypeTag[T]] = JVMNativeSerializingToExpr.Implicits.only[TypeTag[T]]

    Expr.apply[TypeTag[T]](raw)
  }

  inline given get[T]: TypeTag[T] = ${ compile[T]() }
}

Test code:

object Fixture {

  lazy val testCompiler: staging.Compiler = staging.Compiler.make(getClass.getClassLoader)

}


      val tag1 = TypeTag.get[Seq[T1[Int]]]

      testCompiler.run { q =>

        given Quotes = q
        given Context = q.asInstanceOf[QuotesImpl].ctx

        val t1: Type[Seq[T1[Int]]] = tag1.unbox(q)
        val str = t1.asInstanceOf[TypeImpl].typeTree.show
        
        println(str)

        '{}
      }

This displays the original type: