Seems there are many languages
I'm sympathetic to this line of reasoning: in airy theory, it seems like it could be quite useful. The question is what the level of effort would be, even at a ballpark level.
If this were a commercial project, I'd task somebody with a time-bounded research spike: spend, say, a month digging into this seriously, kicking the tires and experimenting, with an aim to finish with a report on what's easy and what's not, where the major problems are, and maybe a bit of simple prototype that cross-compiles a small subset of Scala successfully.
Whether any of the stakeholders have a sufficiently senior engineer to spare for a month is, of course, another question. (If somebody is between gigs and looking for a project to work on in the meantime, this might be a terribly interesting one.)
More various thoughts, after pondering this for a while more (note that my knowledge of Python is extremely shallow, so it's possible that I'm off-base in some places):
Some of the above discussion is talking about Python dialects. I suspect that we should be cautious there: the goal (as I understand it) is being able to work in conventional Python environments, with the major Python AI libraries. So a dialect only seems relevant if it can speak to those libraries.
Based on the history of ScalaJS and ScalaNative, I would guess that a significant fraction of the effort here would be spent dealing with:
- JVM types. That's probably mostly not rocket science, but it's the big chunk of the iceberg below water. We assume a bunch of these types in routine Scala programming, and we would need reasonably API-compatible implementations of them in order to do much. (Possibly the Java project has made progress there; I don't know.)
- Concurrency. Idiomatic Scala code tends to assume that Future is a thing, and translating those idioms often involves creating a fair amount of our own runtime in order to support that assumption.
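To make the concurrency point concrete, here is a minimal sketch of the kind of runtime shim a Python target might need so that Scala-style Future chaining has something to land on. Everything here is an assumption for illustration: the class name ScalaLikeFuture and its methods (apply, map, flat_map) are hypothetical, and a real backend could just as plausibly build on asyncio rather than on the standard library's concurrent.futures as done here.

```python
from concurrent.futures import Future, ThreadPoolExecutor


class ScalaLikeFuture:
    """Hypothetical wrapper adding map/flat_map to concurrent.futures.Future."""

    def __init__(self, inner):
        self._inner = inner

    @classmethod
    def apply(cls, executor, fn, *args):
        # Rough analogue of Scala's Future { ... }: run fn on the executor.
        return cls(executor.submit(fn, *args))

    def map(self, f):
        out = Future()

        def on_done(done):
            # Propagate either the transformed value or the failure.
            try:
                out.set_result(f(done.result()))
            except Exception as exc:
                out.set_exception(exc)

        self._inner.add_done_callback(on_done)
        return ScalaLikeFuture(out)

    def flat_map(self, f):
        out = Future()

        def on_outer(done):
            try:
                nxt = f(done.result())  # f must return a ScalaLikeFuture
            except Exception as exc:
                out.set_exception(exc)
                return

            def on_inner(inner_done):
                try:
                    out.set_result(inner_done.result())
                except Exception as exc:
                    out.set_exception(exc)

            nxt._inner.add_done_callback(on_inner)

        self._inner.add_done_callback(on_outer)
        return ScalaLikeFuture(out)

    def result(self, timeout=None):
        return self._inner.result(timeout)


# Usage: chain two asynchronous steps, roughly like a Scala for-comprehension.
with ThreadPoolExecutor(max_workers=2) as pool:
    chained = (
        ScalaLikeFuture.apply(pool, lambda: 20)
        .map(lambda x: x + 1)
        .flat_map(lambda x: ScalaLikeFuture.apply(pool, lambda: x * 2))
    )
    answer = chained.result(timeout=5)
```

Even this toy version has to decide how failures propagate and on which thread callbacks run, which hints at why this layer tends to grow into "our own runtime".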
That said, translating all of Scala might be a second-order priority here. The right cognate might not be ScalaJS, but Spark.
The analogy is pretty precise: a major use case where people do their exploratory work in Python, but often write their production code in Scala for reliability and maintainability. We tend to pay relatively little attention to Spark here, but my observations suggest that it accounts for a large fraction of the actual Scala work happening on the enterprise scene.
AFAIK, Spark doesn't support quite all of Scala; it wouldn't astonish me if a subset also sufficed for at least a lot of practical AI use cases. So we might not have to solve the entire problem in order to get something initially useful, as a solid proof of concept.
But really, I suspect we can talk all we like, but we're probably going to need somebody seizing the bull by the horns and starting to commit code in anger in order to start having fully-informed opinions.
A "hello world" would be a start; a backend sufficient to do some basic AI calls would be a genuine proof of concept, and it wouldn't surprise me if you could get there with a fairly modest subset of functionality. And that proof of concept might well suffice to get folks jumping onto the project. If someone's feeling ambitious, it seems like a fine focus.
Scala doesn't seem to have much success with backends other than the JVM. I haven't seen any real world use case for Scala Native till now. I'm very pessimistic about a new backend.
I think a more realistic option is to deepen cooperation with the OpenJDK community. HAT (Heterogeneous Accelerator Toolkit) is a subproject of Project Babylon, which provides a GPU backend for the JVM platform. Here is a good article introducing it: Babylon OpenJDK: A Guide for Beginners and Comparison with TornadoVM. I thought maybe it would be possible to reuse parts of it to provide the GPU backend for Scala.
I think ScalaJS has been a success as a platform outside the JVM, and it has been production-ready for quite some time. Scala Native is not yet 1.0; once it reaches production readiness and stability, it might well take off like ScalaJS did, especially for use cases where GraalVM is not attractive.
Project Babylon may bring nice GPU access, but there is still the big landscape of Python tools, e.g. those available on Hugging Face, that might be interesting to tap into.
JVM types. That's probably mostly not rocket science, but it's the big chunk
Storch has made a whole forest of various types of different precisions etc: storch/core/src/main/scala/torch/DType.scala at main · sbrunk/storch · GitHub
Yes, the fact that Scala Native has not yet been officially released may be the reason why almost no one uses it. But even with Scala.js as a precedent, Scala Native has been in development for ten years without an official version, which is itself worthy of vigilance. This makes me pessimistic about whether a new backend can be put into practical use.
Additionally, although Python has a long history, it is not a legacy platform, and there aren't many people actively trying to abandon it. Borrowing from the Python ecosystem can enhance Scala's capabilities, but it also faces competition from Python. I think competing with Python is very challenging, and relying on the Python ecosystem makes it hard for Scala to build its own ecosystem. It can only serve as a glue language, and users can easily revert to Python.
Yeah, but that's exactly my point: one place where Scala is quite successful is working hand-in-glove with Python, for Spark.
Python is a fine language for exploration and development, but a somewhat weak one for long-lived products. We've had sustained success with large companies using Python to design their Spark environments, and Scala to maintain them.
I suspect that that's where Scala fits into the AI equation. We shouldn't try to supplant Python: we'll just lose. But providing a robust, strongly-typed way to build long-lived, highly maintainable AI applications seems like a very plausible niche. Indeed, using Spark very explicitly as the well-established proof of concept for the model seems like a likely way to sell it into the enterprise market.
We shouldn't be thinking in terms of competition, but complementarity. Python and Scala fit into different parts of the lifecycle for these sorts of tools: we should lean into that.
Recently, I was also interested in working with Python and was looking for interop; however, I agree more with @Glavo regarding the approach.
Even the recent survey from VirtusLab shows quite low native adoption, and even JS is still niche in the Scala ecosystem. So, in my opinion, instead of investing in yet another project it is better to double down on the existing stack: improving native integration (including Python), and improving tooling so that integration with native code is trivial (with or without Panama or Babylon). With the ever-shrinking Scala community, I just don't think it's feasible to maintain yet another compilation target.
Also, it might be worth shifting priorities to WASM, which also seems to be getting traction in the AI space.
I agree that Spark is successful and does the right things: It has built an ecosystem with its own core competitiveness and allowed Python to use its ecosystem. But I think the Python backend does the exact opposite. Think about this: if Spark was written in Python and the Scala API was just a binding, could you still get those users to switch from Python to Scala? I don't think so. I think Scala being the native language of Spark played a huge role in this process. After losing this core advantage, Python's advantage in terms of number of users is enough to overwhelm most languages.
In my opinion, the right way to get more Python users to try Scala is to develop more frameworks in Scala and create Python bindings, like Spark, rather than the other way around. Even Scala Native is much more important than Python backend in this regard, because CPython is notoriously slow, and using Python as the backend would greatly undermine Scala's advantages, while implementing libraries in Scala Native and creating Python APIs looks more attractive. In addition, I think technologies such as GraalPython are also worth considering.
"It's better to put resources elsewhere."
I agree with @bjornregnell that arguments of that form are not helpful. Either someone finds this idea interesting and somehow gets the time to do it, and it will be done, or no one does, and it won't. It's not like we have someone with nothing on their hands that we would choose to assign this project to, as opposed to something else.
"Scala.js is niche"
Well … it's just not. Several surveys agree on about 20% of Scala users using Scala.js. Name any other cross-compiling language where there is such a high ratio of people targeting JS. Now you'll say: but 20% of the Scala userbase is nothing compared to, e.g., TypeScript users. Sure, that's true; but if that's your metric, you can also leave Scala/JVM behind so
"Scala Native is not production ready, so how would Scala/Python make it?"
Now it's time to look at the technical aspects. Scala's core expertise lies in how to interoperate with the host languages of its target platforms. Without that, none of its platforms would stand a chance. Scala/JVM works because it can leverage Java and other JVM libraries. Scala.js can leverage JavaScript libraries. Both those platforms have a really good story for interop, and that is why they work.
Writing a compiler backend is easy (the compiler part of Scala.js had a full prototype in 2 months). Designing language features for interoperability is the true challenge of targeting a new platform. It is also the main challenge with Scala.js-on-Wasm-without-JS.
Scala Native follows suit in the problem it's trying to address. However, the interop problem is much harder on Native than on the JVM or JS. The gap between a GCed object model and a linear memory model is huge. Much bigger than between the statically typed nature of Scala and the dynamically typed nature of JavaScript (for Scala.js, the biggest gap was overloading semantics; not static typing at all). That's why Scala Native has not reached the level of maturity that Scala.js has. It's not because it's lacking resources (on the core, there are more resources on Scala Native than Scala.js).
Python is semantically much closer to JS than to native. Writing a backend for Python can probably be prototyped in a few months, like for JS. Designing interop will take longer, but based on the experience of Scala.js and Scala/JVM before it, we could have something decent to play with in a few more months.
A significant amount of work in writing Scala.js was to write the JDK libraries. This work has already been reused to a large extent by Scala Native. It is even easier now, because we're making it less dependent on JS in order to target standalone Wasm. That's a whole lot of work that a Python backend will get for free.
So this is very much doable. Yes, it's going to be slow to execute; but no slower than Python itself (Scala.js is not slower than JS; sometimes it's faster). Yes, it's going to be yet another backend. But if there is one person who finds that challenge interesting and has a bit of experience writing a compiler, they could rely on a lot of existing work, knowledge and expertise in the Scala compiler landscape.
I don't see it: in what way is the fact that Spark is written in Scala even relevant? In my experience, most Spark users, Python and Scala users alike, aren't aware of the fact that Spark is written in Scala.
What matters is that Python makes it easy to explore your data, and Scala makes it easy to productionize those explorations. Far as I can tell, that's why companies use them. Spark's origin doesn't matter; what matters is that Scala results in more maintainable systems than Python. And it looks to me like the same arguments likely apply to the AI space.
That doesn't mean it would succeed, of course. But "you're probably already using Scala for something similar, for the following reasons" is a good argument to put into a whitepaper.
I debated internally a lot on the answers before I realized it's because I'm at odds with the premise itself here of the SIP-meeting discussing this. Where did you garner sentiment that the Scala community cares about this? It has certainly not been my experience in 20 years at 7+ different places. Or maybe it's not about what the Scala community wants but about something you think we should care about and that would be beneficial but we don't know it yet ("we" here meaning the community).
Naturally one is free to work on whatever they want, but it seems here you are being propelled by the interest in addressing a pain point. I'd like to know where you (the SIP-meeting you, not singular you) got this idea, since from where I stand those good intentions seem unjustified.
I think it'd scratch the "I want to work with those libraries without being forced to use Python in my personal projects because no company is going to let me do this anyway" itch, like when I wrote a scala-lua transpiler to do scripts for some game engines, but I don't believe 1 in a hundred Scala developers are in that position.
The other technical questions were masterfully addressed by srjd.
All in all, these kinds of questions of "would working on this be of interest to the community" are always terrible: Scala-contributors is the worst kind of representation of the Scala community. It's like going into a rich private neighborhood to ask for sentiment regarding political matters. You'll only get the most slanted and narrow perspectives possible. Frankly I'd rather you just do it (or not) instead of asking here, and I say that as a member of the community that sometimes happens to visit Scala contributors, because every other Scala programmer I know does not and would not.
By the end of each SIP-meeting, when we have gone through all pending SIP-proposals, we often, if time permits, discuss various things related to the Scala language. This time I proposed to discuss Scala for AI and the relation to existing AI tools in Python - so don't "blame" the entire set of SIP members for setting some kind of "agenda" that you may disapprove of - I am the only one to "blame" for bringing this up.
Or maybe it's not about what the Scala community wants but about something you think we should care about
I have some anecdotal evidence (I cannot claim any scientific validity but at least anecdotes from independent sources) that people have considered Scala for AI applications, but chose Python, not because of the language, but because of the AI tools available. Thus, I have a hypothesis (true or false) that it would be useful for some Scala developers if there could be some improved interop. I have no plans of doing this myself, but I think there are opportunities beyond the most shallow interop such as just communicating between different processes which will be even slower than Python as parsing and unparsing will be needed for each data transfer.
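As a rough illustration of the "parsing and unparsing" cost mentioned above, the sketch below simulates a single request/response hop between two processes using JSON as the wire format. It is only an assumption-laden toy: the function name cross_process_call is hypothetical, no real sockets or pipes are involved, and an actual bridge might use a more compact encoding. The point is simply that the payload is encoded and decoded twice per call, on top of the real work.

```python
import json


def cross_process_call(values):
    """Simulate one hop to a separate process and back (hypothetical helper)."""
    # Caller side: "unparse" the data into bytes for the wire.
    wire_request = json.dumps(values).encode("utf-8")
    # Callee side: parse the bytes back into native objects.
    received = json.loads(wire_request.decode("utf-8"))
    # The actual work, which is tiny compared with the marshalling around it.
    result = [v * 2.0 for v in received]
    # Callee side: serialize the result; caller side: parse it again.
    wire_response = json.dumps(result).encode("utf-8")
    decoded = json.loads(wire_response.decode("utf-8"))
    return decoded, len(wire_request) + len(wire_response)


doubled, bytes_on_wire = cross_process_call([1.0, 2.5, 4.0])
```

For three floats this is negligible, but the marshalling work grows with the size of the data, which is exactly where tensor-sized payloads hurt.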
always terrible […] instead of asking here, and I say that as a member of the community
I am a bit perplexed by your negative tone, but I may have misinterpreted the underlying sentiment. My apologies if I upset you by starting this post - that was not my intent. I just wanted to elicit different views on the topic, for those who care to chime in - this is after all a discussion forum. Thanks anyway for taking the time to express your views.
No intention of blaming on my part; I was just trying to understand how it happened so I could properly answer your questions.
Regarding the anecdotal evidence, I think that was my main point at the end regarding where one garners sentiment on a topic and the plurality of it. I'm also sure that every human is likely to inflate the non-scientific numbers of their perception based on what they consider better (like when you think, this is better for them even though they don't know it yet!). All done with the best of intentions.
I am a bit perplexed by your negative tone
I tried as hard as I could to keep it neutral, though dissenting. It's hard not to take or convey dissent as negativity though. My apologies that my efforts weren't good enough.
Thanks for your reply. No worries.
Compiling to Python may seem like a bad idea to many and a good idea to some, but I am still happy for all the insightful answers in this thread and I am learning a lot. Thanks again.
In terms of artificial intelligence, Scala has fallen far behind many other languages, especially Rust, Python, and Go. We lack some crucial libraries and toolchains, and they are currently very incomplete. There is a shortage of talent in Scala. It has even fallen behind some niche languages. This is a disgrace to us. We must enrich our AI core libraries and toolchains and promote them well. We need scala-numpy, scala-pandas, scala-plot, scala-torch, scala-transformers, scala-pickle, scala-gym, scala-vllm, scala-triton, scala-tensorRT, scala-deepspeed, and scala-sglang.
Agreed, but I think a lot of those should be JVM or native bindings to the native code versions, not some sort of weird compile-to-Python backend. There is some reason to use Scala as even-slower-Python-that-keeps-track-of-types-better, but not very much.
If Scala has the basics at Python-speed but by default lets you do all the custom stuff at JVM speed rather than Python speed, that already is an argument to use Scala instead of Python even if what you're doing is all data-dependent so whether you have types or not is relatively unimportant.
This doesn't mean that the Python backend is necessarily undesirable; you might want that as a way to write in Scala and consume in Python. But if we're doing it as a way to consume Python libraries, it's hard for me to see how it's a big win over simply using Python or using an interface layer like ScalaPy, SoS in Jupyter, Polynote, etc.
I guess that is also true for Java then?
Storch had its last commit in Feb 2024 and it is still on PyTorch 2.1, while PyTorch is moving fast and is now on 2.7.
Developing and maintaining all those libs you mention from scratch would need many new maintainers and contributors diving in. And it would require GPU support, which is not available yet if I understand the JVM situation correctly.
So compiling to Python could perhaps be a quicker way to develop the interop…
Well, as pointed out, much performance-critical Python actually runs at C speed. And it's also about accessing NVIDIA's latest GPUs conveniently, possibly by standing on the shoulders of those who already did it…
The traditional problem with JVM/Python interop is that each has a separate heap and data needs to be copied between them. This tends to be too slow for large data, which is typical of AI applications. I don't know whether this is about to change with Panama or other efforts.
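The copy-versus-share distinction can be shown in miniature inside a single Python process. This is not a JVM/Python bridge and says nothing about what Panama does; it is just a sketch, using only the standard library's array and memoryview, of the difference between handing over a copy of a numeric buffer and handing over a view onto the same memory.

```python
from array import array

# A small buffer of doubles standing in for a large tensor.
data = array("d", [1.0, 2.0, 3.0])

# Copying: a second, independent buffer (what separate heaps force on you).
copied = array("d", data)

# Sharing: a view over the same underlying memory, no element copy.
view = memoryview(data)

# Mutate the original; only the shared view observes the change.
data[0] = 99.0

copy_first = copied[0]   # still the old value
view_first = view[0]     # reflects the update
```

With a copy, cost scales with the size of the data on every transfer; with a shared view, the transfer itself is essentially constant-time, which is the property an interop layer between two runtimes would want to reach.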