Compiling Scala to Python as a new platform for data engineering and AI

Looks promising:


I have experience using and writing numerical library code in Python, Fortran and other languages. I think it’s important not to discount the sheer amount of engineering that went into numpy, pandas and scipy. They aren’t just wrappers over C libraries either: most of the code in these projects (including a lot of the hard numerical-analysis code) is also in Python.

The Julia programming language is a good example here. The project has been going on for years and has a fraction of the adoption that the Python data stack has, in part because users of numerical software (especially outside of ML, where things are often fuzzier) have a very low tolerance for bugs. When I was writing physics simulations, I was already obsessed with my papers being right; discovering that a result is undermined by a bug in a numerical library is nightmare material.

In my opinion, this is a niche that is well served by Python and R, and I don’t think expending huge engineering effort to target it with Scala is worth it.


Thanks for your experience-based input!

The goal is not to compile to Python for its own sake but to find an easy way for Scala programmers to access Python libraries, including the Python code in them and not only the underlying C libraries.

After all the insightful comments in this thread, I’m inclined to view Scala Native as the most promising starting point for providing some (easily maintained) way of accessing numpy, pandas, torch, etc., especially since there seems to be a way to access Python memory from a C program, and Scala Native can call any C code available to the linker. And GPU code is platform-specific (until Project Babylon is ready).

So I’m thinking that a way forward (hopefully not unrealistic?) is to create a shim around how Python memory is accessed from C, to allow memory sharing across languages.
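The memory-sharing idea can be illustrated from the Python side with the buffer protocol (PEP 3118), which is how numpy and other C-backed objects hand raw memory to C code without copying. Below is a minimal sketch using only the standard library: `array.array` stands in for a numpy array, and `ctypes` plays the role of the C (or, hypothetically, Scala Native) caller. `c_style_sum` is a made-up helper; this shows the mechanism, not a design for the shim itself.

```python
import array
import ctypes

# Python-owned memory: three C doubles (a stand-in for a numpy array's data).
data = array.array("d", [1.0, 2.0, 3.0])

# memoryview exposes the buffer-protocol metadata that C code would receive
# via PyObject_GetBuffer: element format, item size and shape. No copy is made.
view = memoryview(data)

# Wrap the same bytes as a C array of doubles. from_buffer requires a writable
# buffer and shares memory with `data` instead of copying it.
c_array = (ctypes.c_double * len(data)).from_buffer(data)

# A raw double* to the first element, as a C function would receive it.
ptr = ctypes.cast(c_array, ctypes.POINTER(ctypes.c_double))


def c_style_sum(p, n):
    """Hypothetical helper: read n doubles through a raw pointer, like C code."""
    return sum(p[i] for i in range(n))


# Writes through the C-level pointer are visible to the Python object.
ptr[0] = 42.0
print(data[0], c_style_sum(ptr, len(data)))  # 42.0 47.0
```

The pointer is only valid while `data` is alive and not resized; `array` refuses to resize while buffers are exported (it raises `BufferError`), which is the kind of lifetime rule a cross-language shim would have to respect.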

I found this, but I’m not sure this old post is up to date with how things are done today if you want to access a PyObject from a C program:

What do you think?


This seems like an amazing idea to explore. Would this be conceptually similar to how it’s done in Blender, maybe?

As far as I understand, they embed the Python interpreter into C/C++. It worked really well when I tried it, although that was probably 10 years ago.

I don’t know, but this could be a game changer for Scala Native, making Python and Scala devs work seamlessly together, and not only within AI.

One could also expose Scala APIs and libraries for Python scripting.
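On the Python side, one existing way to call into a native library is `ctypes`. Recent Scala Native versions can, as far as I know, build a shared library whose entry points are marked `@exported`; the sketch below assumes such a library exists and uses the C standard library's `strlen` as a stand-in for it, so the only real API exercised here is `ctypes` itself. `native_strlen` is a hypothetical wrapper name.

```python
import ctypes
import ctypes.util

# Stand-in for a hypothetical Scala Native shared library: the C library.
# find_library may return None on some platforms; CDLL(None) then falls back
# to the symbols already loaded into the current process (POSIX only).
lib = ctypes.CDLL(ctypes.util.find_library("c"))

# Declare the C signature so ctypes marshals arguments and results correctly:
# size_t strlen(const char *s);
lib.strlen.argtypes = [ctypes.c_char_p]
lib.strlen.restype = ctypes.c_size_t


def native_strlen(s: str) -> int:
    """Hypothetical Python-facing wrapper around a native entry point."""
    return lib.strlen(s.encode("utf-8"))


print(native_strlen("scala"))  # 5
```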


I would love this to happen.
I was trying out Scala, F# and some other languages some time ago and eventually went with F# because of its awesome transpiler project Fable (https://fable.io/).

Not only does it transpile to JS, but also TS and Python with type hints.

It has been sooo useful for me in the following ways:

  • I do not like Python and TS that much, but their ecosystems are very hard to beat. Fable gives me the best of both worlds: an expressive and safe language with huge libraries that I can just leverage.
  • I can focus on learning the language (F#, Scala, etc.) instead of learning both the language and its ecosystem. Being able to keep using all the libraries I already know in the target ecosystem (e.g. zod in TS for validation, numpy and pandas for data work in Python) allows me to get productive and hands-on during the learning/transition.
  • Similar story on the tooling side. For example, I haven’t had time to learn FsUnit or whatever the .NET test setup is, but that is fine: I can just use pytest.

(A bit more background: I am doing/learning DDD style, so most of my core logic is in F#, which then calls some “platform” code written in Python.)
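The split described above (core logic in one language, “platform” code in Python) is essentially a ports-and-adapters layout. Here is a minimal Python sketch of that boundary, with all names hypothetical: in the setup described, `summarize_order` would be F# transpiled by Fable, while the adapter stays hand-written Python that could wrap pandas, a database client, and so on.

```python
from dataclasses import dataclass
from typing import Protocol


class OrderRepository(Protocol):
    """Port: what the core needs from the platform (hypothetical name)."""

    def load_quantities(self, order_id: str) -> list[int]: ...


@dataclass(frozen=True)
class OrderSummary:
    order_id: str
    total_items: int


def summarize_order(repo: OrderRepository, order_id: str) -> OrderSummary:
    """Core logic: a pure domain rule that only talks to the port."""
    quantities = repo.load_quantities(order_id)
    if any(q < 0 for q in quantities):
        raise ValueError("quantities must be non-negative")
    return OrderSummary(order_id, sum(quantities))


class InMemoryOrders:
    """Adapter: 'platform' code that satisfies the port structurally."""

    def __init__(self, data: dict[str, list[int]]) -> None:
        self._data = data

    def load_quantities(self, order_id: str) -> list[int]:
        return self._data[order_id]


print(summarize_order(InMemoryOrders({"A-1": [2, 3]}), "A-1"))
# OrderSummary(order_id='A-1', total_items=5)
```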


The philosophy behind Mojo, from its creator. I think it was a great insight.


Cross-posting this for reference:


Recently I was experimenting with a Python backend for Scala 3. It draws inspiration largely from the implementation of Scala.js. The objective of this project is to support interop with Python and thus open up the entire Python ecosystem (with all its efficient, native bindings for ML, data processing, etc.) for us. Here is a small example project: https://github.com/linyxus/cappy-example . It has Scala facades for numpy and rich, with small demonstrations of each. Feel free to play with it!


Cool! There is Starlark, and I wanted to implement one in Scala.

Another one is Dart: if we can compile Scala to Dart, then we can write Flutter apps with Scala.

I am not a Python user and I am not sure of the implications, but I find this amazing, despite seeing no particular use for it myself.

I am curious:

Thanks for your interest! To answer your questions:

  • It started (and now mostly remains) purely as a hobby side project, just for fun. It took about one month to get to the status quo. It could become more serious in the future: the bet for Python as a backend is that it opens up Python’s rich, extensive and performant C-FFI-based ecosystem for us. There are certainly various projects on the JVM that work on C FFI too, but Python already has a well-designed model and a huge ecosystem with extensive tooling and bindings. (And we want all of these things to benefit Scala!) I was also wondering whether it is possible to position Scala’s Python backend relative to Python the way TypeScript is positioned relative to JavaScript: a gradual, opt-in, flexible yet strongly typed interface for an existing language.
  • Claude was used extensively. As I said, it started just for fun and was largely vibe coded. I used LLMs extensively for prototyping and development (and I was actually astonished at how efficient it can be, with agent teams investigating 10 bugs in parallel). On the other hand, I reviewed and audited the code it wrote and every bugfix it proposed. And the overall architecture, of course, was designed by me (I don’t trust LLMs for that, yet :P, but we did debate a lot and it provided quite a lot of useful feedback).
  • It’s a standalone backend with almost no changes to the compiler itself. With some effort it could become a separate compiler plugin, just like Scala Native. It is currently a fork of the entire compiler simply because that was more convenient for me. So the maintenance effort should be comparable to Scala Native and Scala.js.

This is very interesting. Do you have plans for distributing library code? Would that be just the IR, or standalone Python output (i.e. could dynamic linking be possible)?

Second: other parts of the program that aren’t using Python facades will be using the Scala and Java stdlib compiled to Python, which I imagine is probably slower than idiomatic Python. What strategies should be employed there?

For distributing library code, I will probably distribute only the IR files for Scala libraries, and both the TASTy and the IR files for Scala facades of Python libraries (typed interfaces for Python libs).

And for compiling Scala to be used from Python: great question! Actually, Martin proposed integrating CPython with the Scala Native backend: compiling Scala to use Python’s heap model and producing objects that can be directly linked into Python.