Compiling Scala to Python as a new platform for data engineering and AI

Looks promising:

3 Likes

I have experience using and writing numerical library code in Python, Fortran and other languages. I think it’s important not to discount the sheer amount of engineering that went into numpy, pandas and scipy. They aren’t just wrappers over C libraries either: most of the code in these projects (including a lot of the hard numerical analysis code) is also written in Python.

The Julia programming language is a good example here. The project has been going on for years and has a fraction of the adoption that the Python data stack has. That is in part because users of numerical software (especially outside of ML, where things are often fuzzier) have a very low tolerance for bugs. When I was writing physics simulations, I was already obsessed with my papers being right; discovering that some result is undermined by a bug in a numerical library is nightmare material.

In my opinion, this is a niche that is well served by Python and R, and I don’t think expending huge engineering efforts to target it with Scala is worth it.

4 Likes

Thanks for your experience-based input!

The goal is not to compile to Python for its own sake but to find an easy way for Scala programmers to get access to the Python libraries themselves, not only to the code in the underlying C libraries.

After all the insightful comments in this thread, I’m inclined to view Scala Native as the most promising starting point for providing some (easily maintained) way of accessing numpy, pandas, torch etc., especially since there seems to be a way to access Python memory from a C program, and Scala Native can just call any C code available to the linker. And GPU code is platform-specific anyway (until Project Babylon is ready).

So I’m thinking that a way forward (hopefully not unrealistic?) is to create some shim around how python memory is accessed from C to allow for memory sharing across languages.
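A minimal sketch of the kind of memory sharing meant here, written in plain Python with only the standard library rather than Scala Native. It assumes the mechanism would be Python's buffer protocol (the C-level counterpart is `PyObject_GetBuffer`); the `bytearray` is just a stand-in for a numpy array or torch tensor, and the variable names are made up for illustration.

```python
import ctypes

# A bytearray stands in for memory owned by a Python object
# (assumption: real use would target a numpy array or a tensor).
buf = bytearray(8 * 4)  # room for four C doubles

# View the same bytes as a C array of doubles, without copying.
DoubleArray4 = ctypes.c_double * 4
c_view = DoubleArray4.from_buffer(buf)

# Writes made through the C-typed view...
for i in range(4):
    c_view[i] = i * 1.5

# ...are visible through a Python-side memoryview of the same buffer.
py_view = memoryview(buf).cast("d")
shared_values = py_view.tolist()

# The raw address is what a native caller would ultimately work with.
address = ctypes.addressof(c_view)
```

Both views refer to the same bytes, so a write on either side is seen by the other with no copy in between. A Scala Native shim would presumably obtain the pointer and shape through the C API instead of `ctypes`, which is not shown here.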

I found this but I’m not sure this old post is up to date with how things are done today, if you want to access PyObject from a C program:

What do you think?

2 Likes

This seems like an amazing idea to explore. Would this be conceptually similar to how it’s done in Blender, maybe?

They embed the Python interpreter into C/C++, as far as I understand. It worked really well when I tried it, although that was probably 10 years ago.

I don’t know, but this could be a game changer for Scala Native, making Python and Scala devs work seamlessly together, and not only within AI.

One could also expose Scala APIs and libraries for Python scripting.

1 Like

I would love this to happen.
I was trying out Scala, F# and some other languages some time ago and eventually went with F# because of their awesome transpiler project Fable (https://fable.io/).

Not only does it transpile to JS, but also TS and Python with type hints.

It has been sooo useful for me in the following aspects:

  • I do not like Python and TS that much, but their ecosystems are very hard to beat. It gives me the best of both worlds: an expressive and safe language with a huge set of libraries that I can just leverage.
  • I can focus on learning the language (F#, Scala, etc.) instead of learning both the language and its ecosystem. Being able to continue using all the libraries I already know in the target ecosystem (e.g. zod in TS for validation, numpy and pandas for data stuff in Python) allows me to get productive and hands-on during the learning/transition.
  • Similar story on the tooling side. For example, I haven’t had time to learn FsUnit or whatever the usual setup for .NET is, but that is fine, I can just use pytest.

(A bit more background: I am doing/learning DDD style, so I have most of the core logic in F#, which then calls some “platform” code written in Python.)

3 Likes

The philosophy behind Mojo, from its creator. I think it was a great insight.

1 Like